(1) |
(2) |
The median of the d_{i} values is found when the values are sorted from smallest to largest and then the value in the middle is selected. The median is delightfully well behaved even if some of your data values happen to be near infinity. Analytically, the median arises from the optimization
\min_{m_{1}} \sum_{i} |m_{1} - d_{i}|    (3)
(4) |
(5) |
(6) |
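The two descriptions of the median (middle of the sorted values, and minimizer of the sum of absolute residuals) can be checked numerically. The following is a minimal sketch, not the author's code: the sample values, the helper name `l1_misfit`, and the brute-force scan over candidate values are all assumptions made for illustration.

```python
import statistics

def l1_misfit(m, data):
    """Sum of absolute residuals |m - d_i| for a trial value m."""
    return sum(abs(m - d) for d in data)

# Hypothetical data values, then the same values with one huge noise burst.
data = [1.0, 1.0, 2.0, 3.0, 5.0]
noisy = data[:-1] + [1.0e9]

median_clean = statistics.median(data)   # middle of the sorted values
median_noisy = statistics.median(noisy)  # unchanged by the burst
mean_noisy = statistics.mean(noisy)      # dragged far away by the burst

# Brute-force scan: which trial value gives the smallest L1 misfit?
candidates = [k / 100 for k in range(0, 601)]
best = min(candidates, key=lambda m: l1_misfit(m, data))

print(median_clean, median_noisy, mean_noisy, best)
```

With an odd number of values the scan lands on the sorted middle value, while the mean of the noisy set is dominated by the single burst.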
Before this chapter, our model building was all based on the L_{2} norm. The median is clearly a good idea for data containing large bursts of noise, but the median is a single value while geophysical models are made from many unknown elements. The L_{1} norm offers us the new opportunity to build multiparameter models where the data includes huge bursts of noise.
Yet another average is the ``mode,'' which is the most commonly occurring value. For example, in the number sequence (1,1,2,3,5) the mode is 1 because it occurs the most times. Mathematically, the mode minimizes the zero norm of the residual, namely L_{0} = \sum_{i} |m_{0}-d_{i}|^{0}. To see why, notice that when we raise a residual to the zero power, the result is 0 if d_{i}=m_{0} (taking 0^{0} to be 0), and it is 1 if d_{i} \ne m_{0}. Thus, the L_{0} sum of the residuals is the total number of residuals less those for which d_{i} matches m_{0}. The minimum of L_{0}(m) is the mode m=m_{0}. The zero power function is nondifferentiable at the place of interest so we do not look at the gradient.
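The counting argument can be sketched in a few lines. This is only an illustration under the convention that a zero residual raised to the zero power contributes 0; the helper name `l0_misfit` is hypothetical, and only values that occur in the data are tried as candidates, since any other value would mismatch every datum.

```python
def l0_misfit(m, data):
    """Number of residuals m - d_i that are nonzero (zero residuals count as 0)."""
    return sum(1 for d in data if d != m)

data = [1, 1, 2, 3, 5]

# Try each distinct data value as the candidate m and keep the best one.
mode = min(sorted(set(data)), key=lambda m: l0_misfit(m, data))

print(mode, l0_misfit(mode, data))
```

For the sequence (1,1,2,3,5) the candidate 1 leaves three mismatches while every other candidate leaves four, so the scan selects 1.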
L_{2}(m) and L_{1}(m) are convex functions of m (nonnegative second derivative wherever it exists), and this fact leads to the triangle inequalities for the L_{p} norms with p \ge 1 and assures slopes lead to a unique (if p>1) bottom. Because there is no triangle inequality for L_{0}, it should not be called a ``norm'' but a ``measure.''
Because most values are at the mode, the mode is where a probability function is maximum. The mode occurs with the maximum likelihood. It is awkward to contemplate the mode for floating-point values where the probability is minuscule (and irrelevant) that any two values are identical. A more natural concept is to think of the mode as the bin containing the most values.
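A binned mode for floating-point values might be sketched as follows. The function name `binned_mode`, the bin width, and the sample values are assumptions chosen for illustration; the result depends on the chosen width, and ties between equally full bins are broken arbitrarily here.

```python
import math
from collections import Counter

def binned_mode(values, width):
    """Return the center of the fixed-width bin that holds the most values."""
    # Assign each value to an integer bin index, then count the occupants.
    counts = Counter(math.floor(v / width) for v in values)
    index, _ = counts.most_common(1)[0]
    return (index + 0.5) * width

# Hypothetical floating-point data: no two values are exactly equal.
values = [0.98, 1.02, 1.05, 2.40, 3.10, 1.11, 7.75]
center = binned_mode(values, width=0.5)

print(center)
```

Here three of the seven values fall in the bin from 1.0 to 1.5, so its center is reported even though no value repeats exactly.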