We’ve all seen (∂h/∂x)dx + (∂h/∂y)dy + (∂h/∂z)dz many times and have used it to calculate without understanding what the various symbols actually mean. In my case, it’s just another example of mouthing mathematical incantations without understanding them, something I became very good at at a young age (see https://luysii.wordpress.com/2022/06/27/the-chinese-room-argument-understanding-math-and-the-imposter-syndrome/ for the gory details).
And now, within a month of my 85th birthday, I finally understand what’s going on, after reading only the first 25 pages of “Elementary Differential Geometry” (revised second edition, 2006) by Barrett O’Neill.
I was pointed to it by the marvelous Visual Differential Geometry by Tristan Needham, about which I’ve written 3 posts — this link has references to the other two — https://luysii.wordpress.com/2022/03/07/visual-differential-geometry-and-forms-q-take-3/
He describes O’Neill’s book as follows. “First published in 1966, this trail-blazing text pioneered the use of Forms at the undergraduate level. Today more than a half-century later, O’Neill’s work remains, in my view the single most clear-eyed, elegant and (ironically) modern treatment of the subject available — present company excepted! — at the undergraduate level”
It took a lot of work to untangle the notation (typical of all works in Differential Geometry). There is an old joke “differential geometry is the study of properties that are invariant under change of notation” which is funny because it is so close to the truth (John M. Lee)
So armed with no more than calculus 101, knowing what a vector space is, and a good deal of notational patience, the meaning of (∂h/∂x)dx + (∂h/∂y)dy + (∂h/∂z)dz (including what dx, dy and dz really are) should be within your grasp.
We begin with R^3, the set of triples of real numbers (a_1, a_2, a_3), where _ means that 1, 2, 3 are taken as subscripts. Interestingly, these triples aren’t what O’Neill calls vectors; his vectors will be defined shortly. All 3 components of a triple can be multiplied by a real number c, giving (c*a_1, c*a_2, c*a_3). Pairs of triples can be added component by component. This makes R^3 into a vector space (which O’Neill calls Euclidean 3-space), the elements of which are triples (which O’Neill calls points). But a vector, for O’Neill, is a pair of points p = (p_1, p_2, p_3) and v = (v_1, v_2, v_3); we’ll see why shortly.
A tangent vector to R^3 at the point p is written v_p and is defined as an ordered pair of points (p, v) where
p is the point of application of v_p (aka the tail of v_p)
v is the vector part of v_p (it fixes the direction and length of the arrow)
It is visualized as an arrow whose tail is at p and whose tip (barb) is at p + v (remember you are allowed to add points). In the visualization of v_p, v does not appear.
The tangent space of R^3 at p is written T_pR^3 and is the set of vectors (p, v) such that p is constant and v varies over all possible points.
Each p in R^3 has its own tangent space, and tangent vectors in different tangent spaces can’t be added.
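Here is a minimal sketch of that definition in Python, for readers who think better in code. The class name TangentVector and its methods are my own illustration, not anything from O’Neill; the only point is that a tangent vector carries both p and v, and that addition is refused when the points of application differ.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class TangentVector:
    """A tangent vector v_p, stored as the ordered pair (p, v)."""
    p: Point  # point of application (tail of the arrow)
    v: Point  # vector part

    def tip(self) -> Point:
        # The arrow's tip sits at p + v, not at v itself.
        return tuple(pi + vi for pi, vi in zip(self.p, self.v))

    def __add__(self, other: "TangentVector") -> "TangentVector":
        # Addition only makes sense inside one tangent space T_pR^3.
        if self.p != other.p:
            raise ValueError("tangent vectors at different points cannot be added")
        return TangentVector(self.p, tuple(a + b for a, b in zip(self.v, other.v)))


a = TangentVector((1.0, 2.0, 3.0), (1.0, 0.0, 0.0))
b = TangentVector((1.0, 2.0, 3.0), (0.0, 1.0, 0.0))
print((a + b).v)  # (1.0, 1.0, 0.0)
print(a.tip())    # (2.0, 2.0, 3.0)
```

Trying a + c where c lives at a different point p raises a ValueError, which is the code version of "different tangent spaces".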
Next up: functions.
A real-valued function on R^3 is written
f : R^3 –> R^1 (the real numbers)
f : (a_1, a_2, a_3) |–> c (some real number)
This is typical of the way functions are written in more advanced math, with the first line giving the domain (R^3) of the function and the range of the function (R^1) and the second line giving what happens to a typical element of the domain on application of the function to it.
O’Neill assumes that all the functions on domain R^3 have continuous partial derivatives of all orders. So the functions are smooth, differentiable or C^infinity; take your pick, since O’Neill uses the three terms interchangeably.
The assumption of differentiability means that you have some mechanism for seeing how close two points are to each other. He doesn’t say it until later, but this assumes the usual distance metric using the Pythagorean theorem — if you’ve taken calc. 101 you know what these are.
For mental visualization it’s better to think of the function as going from R^2 (x and y variables, i.e. the Euclidean plane) to the real numbers. This is the classic topographic map, which tells you how high the ground is at each point.
Now at last we’re getting close to (∂h/∂x)dx + (∂h/∂y)dy + (∂h/∂z)dz.
So now you’re on a ridge ascending to the summit of your favorite mountain. The height function tells you how high you are where you’re standing (call this point p), but what you really want to know is which way to go to get to the peak. You want to find a direction in which height is increasing. Enter the directional derivative (of the height function). Clearly height drops off on either side of the ridge and increases or decreases along the ridge. Equally clearly there is no single directional derivative here (as there would be for a function g : R^1 –> R^1). The directional derivative depends on p (where you are) and v (the direction you choose), which is why O’Neill defines tangent vectors by two points (p and v).
So the directional derivative requires two functions
the height function h : R^3 –> R^1
the direction function f : R^1 –> R^3, f : t |–> p + t*v. This gives a line through p going in direction v
So the directional derivative of h at p in the direction v is
d/dt (h (p + t*v)) |_t=0 ; that is, differentiate h (p + t*v) with respect to t and then set t = 0
Causing me a lot of confusion, O’Neill writes the directional derivative as v_p[h], which gives you no sense that a derivative of anything is involved. This is his equation (with the function called h here)
v_p[h] = d/dt (h (p + t*v)) |_t=0
Notice that changing p changes the directional derivatives. At a smooth peak of the mountain, for example, every one of them is zero, because the ground is momentarily level in all directions before it falls away. This is why O’Neill defines tangent vectors using two points (p, v).
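To see the dependence on both p and v with actual numbers, here is a small sketch that estimates d/dt h(p + t*v) at t = 0 by a central difference. The function hill is a made-up height function on the plane (in the spirit of the topographic map), and the step size eps is an arbitrary choice; neither comes from O’Neill.

```python
def directional_derivative(h, p, v, eps=1e-6):
    """Approximate d/dt h(p + t*v) at t = 0 by a central difference in t."""
    ahead = h(*(pi + eps * vi for pi, vi in zip(p, v)))
    behind = h(*(pi - eps * vi for pi, vi in zip(p, v)))
    return (ahead - behind) / (2 * eps)


def hill(x, y):
    # Hypothetical height function: a single summit at the origin.
    return 10.0 - x**2 - 4.0 * y**2


p = (1.0, 0.5)
print(directional_derivative(hill, p, (-1.0, 0.0)))           # about +2: climbing
print(directional_derivative(hill, p, (0.0, 1.0)))            # about -4: descending
print(directional_derivative(hill, (2.0, 0.5), (-1.0, 0.0)))  # about +4: same v, new p
```

Same function, three different answers, depending on where you stand and which way you face.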
Now a few more functions and the chain rule and we’re about done.
x : R^3 –> R^1
x : (v_1, v_2, v_3 ) |–> v_1
similarly y : R^3 –> R^1 picks out the y coordinate of (v_1, v_2, v_3), i.e. v_2, and z : R^3 –> R^1 picks out v_3
Let’s look at p + t*v in coordinate form, remembering what p and v are that way
p + t*v = ( p_1 + t * v_1, p_2 + t * v_2, p_3 + t * v_3)
Remember that we defined f (t) = p + t*v
so df/dt = d( p + t*v )/dt
expanding
df/dt = d( p_1 + t * v_1, p_2 + t * v_2, p_3 + t * v_3)/dt = (v_1, v_2, v_3)
Let’s be definite about what h : R^3 –> R^1 actually is
h : (x, y, z) |–> x^2 * y^3 * z^4, a function of three variables, meaning you must use partial derivatives
so ∂h/∂x = 2 * x * y^3 * z^4, ∂h/∂y = 3 * x^2 * y^2 * z^4, ∂h/∂z = 4 * x^2 * y^3 * z^3
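If you don’t trust your partial derivatives (I often don’t trust mine), a quick numerical check helps. The sketch below writes out the three partials of h = x^2 * y^3 * z^4 by hand and compares each with a finite-difference estimate at one point. The helper name partial, the step eps and the test point are my own choices.

```python
def h(x, y, z):
    return x**2 * y**3 * z**4


def grad_h(x, y, z):
    """The three partial derivatives of h, worked out by hand."""
    return (2 * x * y**3 * z**4,      # dh/dx
            3 * x**2 * y**2 * z**4,   # dh/dy
            4 * x**2 * y**3 * z**3)   # dh/dz


def partial(f, point, i, eps=1e-6):
    """Central-difference estimate of the i-th partial derivative of f at point."""
    up, down = list(point), list(point)
    up[i] += eps
    down[i] -= eps
    return (f(*up) - f(*down)) / (2 * eps)


p = (1.0, 2.0, 1.0)
print(grad_h(*p))                            # (16.0, 12.0, 32.0)
print([partial(h, p, i) for i in range(3)])  # approximately the same three numbers
```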
Look at v_p[h] = d/dt (h (p + t*v)) |_t=0 again
It’s really v_p[h] = d/dt (h (f (t))) |_t=0
so it’s time for the chain rule
d/dt (h (f (t))) = (dh/df) * (df/dt)
dh/df in coordinates is really
(∂h/∂x, ∂h/∂y,∂h/∂z)
df/dt in coordinates is really
(v_1, v_2, v_3)
The chain rule multiplies these two triples term by term and then adds the results
so what you have is d/dt (h (f (t))) = ∂h/∂x * v_1 + ∂h/∂y * v_2 + ∂h/∂z * v_3, with the partial derivatives evaluated at f (t) = p + t*v
I left one thing out. The |_t=0
Setting t = 0 means the partial derivatives are evaluated at f (0) = p, so you plug in the coordinates of p and what you get is
v_p[h] = ∂h/∂x * v_1 + ∂h/∂y * v_2 + ∂h/∂z * v_3
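A worked example makes the formula less abstract. Taking h = x^2 * y^3 * z^4 with p = (1, 1, 1) and v = (1, 2, 3) (numbers I picked to keep the arithmetic easy), the partials at p are 2, 3 and 4, so the formula gives 2*1 + 3*2 + 4*3 = 20. The sketch below computes that and compares it with a direct finite-difference estimate of d/dt h(p + t*v) at t = 0; the function names are mine.

```python
def h(x, y, z):
    return x**2 * y**3 * z**4


def v_p_of_h(p, v):
    """Chain-rule formula: partials of h at p, each times the matching v_i, summed."""
    x, y, z = p
    partials = (2 * x * y**3 * z**4, 3 * x**2 * y**2 * z**4, 4 * x**2 * y**3 * z**3)
    return sum(d * vi for d, vi in zip(partials, v))


def slope_along_line(p, v, eps=1e-6):
    """Direct central-difference estimate of d/dt h(p + t*v) at t = 0."""
    ahead = h(*(pi + eps * vi for pi, vi in zip(p, v)))
    behind = h(*(pi - eps * vi for pi, vi in zip(p, v)))
    return (ahead - behind) / (2 * eps)


p, v = (1.0, 1.0, 1.0), (1.0, 2.0, 3.0)
print(v_p_of_h(p, v))          # 20.0
print(slope_along_line(p, v))  # approximately 20
```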
We need one more definition. Recall that the tangent space of R^3 at p is written T_pR^3 and is the set of vectors (p, v) such that p is constant and v varies over all possible points.
The set of all tangent vectors at all points of R^3 (the union of all the tangent spaces) is written TR^3
Finally on p. 24 O’Neill defines what you’ve all been waiting for : dh
dh : TR^3 –> R^1
dh : v_p |–> v_p[h] = ∂h/∂x * v_1 + ∂h/∂y * v_2 + ∂h/∂z * v_3
One last bit of manipulation — what is dx (and dy and dz)?
we know that the function x is defined as follows
x : R^3 –> R^1
x : (v_1, v_2, v_3 ) |–> v_1
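so dx applied to v_p is v_p[x] = ∂x/∂x * v_1 + ∂x/∂y * v_2 + ∂x/∂z * v_3 = 1 * v_1 + 0 * v_2 + 0 * v_3
which is just v_1 (and similarly dy gives v_2 and dz gives v_3)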
so at (very) long last we have
dh : TR^3 –> R^1
dh : v_p |–> v_p[h] = ∂h/∂x * dx(v_p) + ∂h/∂y * dy(v_p) + ∂h/∂z * dz(v_p), or with the v_p left off, dh = (∂h/∂x)dx + (∂h/∂y)dy + (∂h/∂z)dz
Remember ∂h/∂x, ∂h/∂y, ∂h/∂z are all evaluated at p = (p_1, p_2, p_3)
So it’s a (fairly) simple matter to apply dh to any point p in R^3 and any direction (v_1, v_2, v_3) in R^3 to get the directional derivative
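To make the last step concrete, here is a sketch in which dx, dy and dz are honest functions that eat a tangent vector (p, v) and return a number, and dh is assembled from them exactly as in the formula. It uses the example h = x^2 * y^3 * z^4; passing p and v as two separate tuples is just my convenience, not O’Neill’s notation.

```python
def dx(p, v):
    # dx ignores the point of application and returns the first component of v.
    return v[0]


def dy(p, v):
    return v[1]


def dz(p, v):
    return v[2]


def dh(p, v):
    """dh for h = x^2 * y^3 * z^4: partials evaluated at p, times dx, dy, dz of (p, v)."""
    x, y, z = p
    return ((2 * x * y**3 * z**4) * dx(p, v)
            + (3 * x**2 * y**2 * z**4) * dy(p, v)
            + (4 * x**2 * y**3 * z**3) * dz(p, v))


p = (1.0, 2.0, 1.0)
print(dh(p, (1.0, 0.0, 0.0)))   # 16.0, which is just dh/dx at p
print(dh(p, (1.0, 0.0, -1.0)))  # -16.0, since 16*1 + 12*0 + 32*(-1) = -16
```

Feeding in the direction (1, 0, 0) recovers ∂h/∂x at p, which is a decent sanity check that dx really does pick out v_1.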
Amen. Selah.