We can formalize the situation.
A parametric equation for a line is
$\mathbf{x} = \mathbf{x}_0 + \alpha\,\Delta\mathbf{x}$,
where $\alpha$
is the parameter for moving on the line.
The process of selecting $\alpha$
is called "line search."
Think of a two-dimensional example
where the vector of unknowns $\mathbf{x}$
has just two components, x1 and x2.
Then the size of the residual vector, $\mathbf{r}\cdot\mathbf{r}$,
can be
displayed with a contour plot in the plane of (x1,x2).
Our ellipsoidal bowl has ellipsoidal contours of constant altitude.
As we move in a line across this space by adjusting the line parameter $\alpha$,
equation (45)
gives our altitude.
This equation has a unique minimum because it is a parabola in $\alpha$.
As we approach the minimum,
our trajectory becomes tangential to a contour line in (x1,x2)-space.
This is where we stop.
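The stopping rule above can be checked numerically. The following is a minimal sketch, assuming the residual has the form r = F x - d, so that along the line the residual is r + alpha dr with dr = F dx; equation (45) is not reproduced here, so this form is an assumption. The operator `F`, data `d`, starting point `x0`, and all function names are made-up illustration values, not taken from the text.

```python
# Sketch of one exact line search for least squares, assuming r = F x - d.
# F, d, and x0 below are hypothetical illustration values.

def matvec(F, x):
    """Return F x for a matrix stored as a list of rows."""
    return [sum(fij * xj for fij, xj in zip(row, x)) for row in F]

def rmatvec(F, r):
    """Return F^T r (the transpose applied to a residual-sized vector)."""
    return [sum(F[i][j] * r[i] for i in range(len(F))) for j in range(len(F[0]))]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def residual(F, d, x):
    return [fx - di for fx, di in zip(matvec(F, x), d)]

def line_search_step(F, d, x):
    """Slide from x along the gradient direction to the minimum of r.r on that line."""
    r = residual(F, d, x)
    dx = rmatvec(F, r)   # search direction, proportional to the gradient of r.r
    dr = matvec(F, dx)   # change in the residual per unit alpha
    # Along the line, r.r = (r + alpha dr).(r + alpha dr) is a parabola in alpha;
    # setting its derivative to zero gives the minimizing alpha.
    alpha = -dot(r, dr) / dot(dr, dr)   # assumes dr is nonzero (not yet at the bottom)
    x_new = [xi + alpha * dxi for xi, dxi in zip(x, dx)]
    return x_new, dx, alpha

F = [[1.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
d = [1.0, 2.0, 3.0]
x0 = [0.0, 0.0]

x_stop, dx0, alpha0 = line_search_step(F, d, x0)
g_stop = rmatvec(F, residual(F, d, x_stop))   # gradient direction at the stopping point

print("alpha =", alpha0, "x_stop =", x_stop)
print("new gradient . old direction =", dot(g_stop, dx0))
```

The last printed value should be zero to rounding error: at the stopping point the gradient has no component along the line just traveled, which is the same statement as the trajectory being tangent to a contour.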
Now we compute our new residual $\mathbf{r}$
and we compute the new gradient
$\Delta\mathbf{x} = \mathbf{g} = \mathbf{F}^T \mathbf{r}$.
OK, we are ready for the next slide down.
When we turn ourselves from "parallel to a contour line"
to the direction of the new gradient $\Delta\mathbf{x}$,
which is "perpendicular to that contour",
we are turning $90^\circ$.
Our path to the bottom of the bowl will be made of many segments,
each turning $90^\circ$
from the previous.
We will need an infinite number of such steps to reach the bottom.
It happens that the amazing conjugate-direction method
would reach the bottom in just two jumps
(because (x1,x2) is a two-dimensional space).
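The sawtooth behavior of steepest descent described above can be reproduced in a few lines. This is a rough sketch under stated assumptions, not a definitive implementation: it assumes a residual of the form r = F x - d and an exact line search at every step, and the 2-by-2 operator `F`, the data `d`, the starting point, and the function names are all hypothetical illustration choices.

```python
import math

# Sketch of repeated steepest-descent steps with exact line search,
# assuming r = F x - d. F, d, and the starting point are hypothetical.

def matvec(F, x):
    return [sum(fij * xj for fij, xj in zip(row, x)) for row in F]

def rmatvec(F, r):
    return [sum(F[i][j] * r[i] for i in range(len(F))) for j in range(len(F[0]))]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def steepest_descent(F, d, x, n_steps):
    """Return the final x, the list of search directions, and r.r after each step."""
    directions, misfits = [], []
    for _ in range(n_steps):
        r = [fx - di for fx, di in zip(matvec(F, x), d)]
        dx = rmatvec(F, r)                    # gradient direction F^T r
        dr = matvec(F, dx)
        alpha = -dot(r, dr) / dot(dr, dr)     # minimum of the parabola in alpha
        x = [xi + alpha * dxi for xi, dxi in zip(x, dx)]
        r = [ri + alpha * dri for ri, dri in zip(r, dr)]
        directions.append(dx)
        misfits.append(dot(r, r))
    return x, directions, misfits

def turn_angle_degrees(a, b):
    """Angle between two successive search directions."""
    cosine = dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))

F = [[1.0, 0.0], [0.0, 3.0]]
d = [1.0, 3.0]   # the bottom of the bowl is at x = (1, 1), where r.r is zero
x_final, directions, misfits = steepest_descent(F, d, [0.0, 0.0], 6)
angles = [turn_angle_degrees(a, b) for a, b in zip(directions, directions[1:])]

print("turn angles:", angles)
print("misfit after each step:", misfits)
```

Each printed turn angle should be 90 degrees to rounding error, and the misfit shrinks at every step while remaining nonzero after all six steps, which illustrates why the path needs many segments to approach the bottom.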
Missing figure (ls-sawtooth) A search path for steepest descent.