Category Archives: Math

High level mathematicians look like normal people

Have you ever had the pleasure of taking a course from someone who wrote the book? I did. I audited a course at Amherst from Prof. David Cox, one of the three authors of “Ideals, Varieties, and Algorithms.” It was uncanny to listen to him lecture (without any notes) as if he were reading from the book. It was also rather humbling to have a full professor correcting your homework. We had Dr. Cox for several hours each week (all 11 or 12 of us). This is why Amherst is such an elite school. Ditto for Princeton back in the day, when Physics 103 was taught by John Wheeler 3 hours a week. Physics 103 wasn’t for the high-powered among us who were going to be professional physicists (Heinz Pagels, Jim Hartle); it was for premeds and engineers.

Dr. Cox had one very useful pedagogical device — everyone had to ask a question at the beginning of class, Cox being of the opinion that there is no such thing as a dumb question in math.

Well, Dr. Cox and his co-authors (Little and O’Shea) got an award from the American Mathematical Society for their book. There’s an excerpt below. You should follow the link to the review to see what the three look like, along with two other awardees. Go to any midsize American city at lunchtime, and you’d be hard pressed to pick four of the five out of the crowd of middle-aged men walking around. Well, almost — one guy would be hard to pick out of the noonday crowd in Williamsburg Brooklyn or Tel Aviv. Four are extremely normal looking guys, not flamboyant or bizarre in any way. This is certainly true of the way Dr. Cox comports himself. The exception proving the rule, however, is Raymond Smullyan, who was my instructor in a complex variables course back in the day — quite an unusual and otherworldly individual — there’s now a book about him.

Here’s part of the citation. The link also contains bios of all.

“Even more impressive than its clarity of exposition is the impact it has had on mathematics. CLO, as it is fondly known, has not only introduced many to algebraic geometry, it has actually broadened how the subject could be taught and who could use it. One supporter of the nomination writes, “This book, more than any text in this field, has moved computational algebra and algebraic geometry into the mathematical mainstream. I, and others, have used it successfully as a text book for courses, an introductory text for summer programs, and a reference book.”
Another writer, who first met the book in an REU two years before it was published, says, “Without this grounding, I would have never survived my first graduate course in algebraic geometry.” This theme is echoed in many other accounts: “I first read CLO at the start of my second semester of graduate school…. Almost twenty years later I can still remember the relief after the first hour of reading. This was a math book you could actually read! It wasn’t just easy to read but the material also grabbed me.”
For those with a taste for statistics, we note that CLO has sold more than 20,000 copies, it has been cited more than 850 times in MathSciNet, and it has 5,000 citations recorded by Google Scholar. However, these numbers do not really tell the story. Ideals, Varieties, and Algorithms was chosen for the Leroy P. Steele Prize for Mathematical Exposition because it is a rare book that does it all. It is accessible to undergraduates. It has been a source of inspiration for thousands of students of all levels and backgrounds. Moreover, its presentation of the theory of Groebner bases has done more than any other book to popularize this topic, to show the powerful interaction of theory and computation in algebraic geometry, and to illustrate the utility of this theory as a tool in other sciences.”

Types of variables you need to know to understand thermodynamics

I’ve been through the first 200 pages of Dill’s book “Molecular Driving Forces” (2003), which is all about thermodynamics and statistical mechanics, things that must be understood to have any hope of understanding cellular biophysics. There are a lot of variables to consider (with multiple names for some) and they fall into 7 non-mutually-exclusive types.

Here they are with a few notes about them

1. Thermodynamic State Variables: These are the classics — Entropy (S), Internal Energy (U), Helmholtz Free Energy (F), Gibbs Free Energy (G), Enthalpy (H).
All are continuous functions of their Natural Variables (see next) so they can be differentiated. Their differentials are exact.

2. Natural variables of a thermodynamic state variable — these are the continuous variables in terms of which a state variable is expressed; when an extremum (maximum or minimum) of the state variable with respect to them is found, the state function won’t change with time (i.e. it is at equilibrium). Here they are for the 5 state functions. T is Temperature, V is Volume, N is number of molecules, S and U are what you think, and p is pressure.

State Function — Symbol — Natural Variables
Helmholtz Free Energy — F — T, V, N
Entropy — S — U, V, N
Internal Energy — U — S, V, N
Gibbs Free Energy — G — T, p, N
Enthalpy — H — S, p, N

Note that U and S are both state variables and natural variables of each other. Note also (for maximum confusion) that Helmholtz free energy is not H but F, and that H is Enthalpy, not Helmholtz free energy.
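The five state functions are connected by the standard textbook relations H = U + pV, F = U - TS, and G = H - TS = F + pV (these relations aren’t in the table above, and the numbers below are purely illustrative), so a few lines of code can serve as a crib sheet:

```python
# Standard relations among the five state functions.
# The state point (U, T, S, p, V) is made up for illustration.
def enthalpy(U, p, V):
    return U + p * V          # H = U + pV

def helmholtz(U, T, S):
    return U - T * S          # F = U - TS

def gibbs(U, T, S, p, V):
    return U + p * V - T * S  # G = U + pV - TS

U, T, S, p, V = 100.0, 300.0, 0.2, 1.0e5, 1.0e-3

H = enthalpy(U, p, V)
F = helmholtz(U, T, S)
G = gibbs(U, T, S, p, V)

assert abs(G - (H - T * S)) < 1e-9   # G = H - TS
assert abs(G - (F + p * V)) < 1e-9   # G = F + pV
```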

3. Extensive variable — can only be characterized by how much there is of it. This includes all 5 thermodynamic state variables (F, S, U, G, H) along with V (volume) and N (number of molecules).  Extensive variables are also known as degrees of freedom.

4. Intensive variable — temperature, pressure, and ratios of state and natural variables (actually the derivative of a state variable with respect to a natural variable — temperature is defined this way: T = partial U / partial S).

5. Control variables — these are under the experimenter’s control, and are usually kept constant. They are also known as constraints, and most are intensive (volume isn’t). Examples: constant temperature, constant volume, constant pressure.

6. Conjugate variables. Here we need the total differential of a state variable (which exists for all of them) in terms of its natural variables to understand what is going on.

Since U is a continuous function of each of S, V, and N

we have

dU = (partial U / partial S) dS + (partial U / partial V) dV + (partial U / partial N) dN

= T dS – p dV + mu dN ; mu is the chemical potential

So T is conjugate to S, p is conjugate to V, and mu is conjugate to N ; note that each pair of conjugates has one intensive variable (T, p, mu) and one extensive one ( S, V, N). Clearly the derivatives ( T, p, mu) are intensive.
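A numerical sanity check may help here. The sketch below is my own, not Dill’s: it uses the monatomic ideal gas, whose internal energy in its natural variables has the form U(S, V, N) = A * N^(5/3) * V^(-2/3) * exp(2S/3Nk), with A an assumed, arbitrary prefactor. Finite differences then recover the conjugates T = partial U / partial S and p = -(partial U / partial V), and the recovered pair obeys pV = NkT:

```python
import math

# Monatomic ideal gas internal energy in its natural variables S and V
# (N held fixed).  A is an arbitrary prefactor (it absorbs Planck's
# constant, the molecular mass, etc.); only the functional form matters.
k = 1.380649e-23   # Boltzmann constant, J/K
N = 1.0e22         # number of molecules (assumed)
A = 1.0e-40        # assumed prefactor

def U(S, V):
    return A * N**(5 / 3) * V**(-2 / 3) * math.exp(2.0 * S / (3.0 * N * k))

S0, V0 = 0.5, 1.0e-3     # an arbitrary state point
dS, dV = 1.0e-6, 1.0e-9  # steps for central differences

T = (U(S0 + dS, V0) - U(S0 - dS, V0)) / (2 * dS)    # T =  dU/dS
p = -(U(S0, V0 + dV) - U(S0, V0 - dV)) / (2 * dV)   # p = -dU/dV

assert T > 0 and p > 0
assert abs(p * V0 - N * k * T) / (p * V0) < 1e-4    # recovers pV = NkT
```

The intensive conjugates (T, p) really do pop out as derivatives of the extensive state function U with respect to its extensive natural variables.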

7. None of the above — work (w) and heat (q)

Thermodynamics can be difficult to master unless these distinctions are clear. Another source of difficulty is that what you really want is to maximize (S) or minimize (U, H, F, G) the state variables — the problem is that you have no way to directly measure the two crucial ones (U, S) and have to infer what they are from various derivatives and control variables. You can measure changes in S and U between two temperatures by using heat capacities. That’s just like spectroscopy, where all you measure is the difference between energy levels, not the energy levels themselves. But it is the minimum values of U, G, H, F and the maximum value of S which determine what you want to know.

There’s more to come about Dill’s book. I’ve found a few mistakes and have corresponded with him about various things that seem ambiguous (to me at least). As mentioned earlier, in grad school 44 years ago, I audited a statistical mechanics course taught by E. Bright Wilson himself. I never had the temerity to utter a word to him. How things have changed for the better, to be able to Email an author and get a response. He’s been extremely helpful and patient.

The pleasures of enough time

One of the joys of retirement is the ability to take the time to fully understand the math behind statistical mechanics and thermodynamics (on which large parts of chemistry are based — cellular biophysics as well). I’m going through some biophysics this year reading “Physical Biology of the Cell” 2nd Edition and “Molecular Driving Forces” 2nd Edition. Back in the day, what with other courses, research, career plans and hormones to contend with, there just wasn’t enough time.

To really understand the derivation of the Boltzmann equation, you must understand Lagrange multipliers, which requires an understanding of the gradient and where it comes from. To understand the partition function you must understand change of variables in an integral, and to understand that you must understand why the determinant of the Jacobian matrix of a set of independent vectors is the volume multiplier you need.

These were all math tools whose use was fairly simple and which didn’t require any understanding of where they came from. What a great preparation for a career in medicine, where we understood very little of why we did the things we did, not because of lack of time but because the deep understanding of the systems we were mucking about with simply didn’t (and doesn’t) exist. It was intellectually unsatisfying, but you couldn’t argue with the importance of what we were doing. Things are better now with the accretion of knowledge, but if we really understood things perfectly we’d have effective treatments for cancer and Alzheimer’s. We don’t.

But in the pure world of math, whether a human creation or existing outside of us all, this need not be accepted.

I’m not going to put page after page of derivation of the topics mentioned in the second paragraph, but mention a few things to know which might help you when you’re trying to learn about them, and point you to books (with page numbers) that I’ve found helpful.

Let’s start with the gradient. If you remember it at all, you know that it’s a way of taking a continuous real valued function of several variables and making a vector of it. The vector has the miraculous property of pointing in the direction of greatest change in the function. How did this happen?

The most helpful derivation I’ve found was in Thomas’ textbook of calculus (9th edition, pp. 957 and following). Yes, Thomas — the same book I used as a freshman 60 years ago! Like most living things that have aged, it’s become fat. Thomas is now up to the 13th edition.

The simplest example of a continuous real valued function is a topographic map. Thomas starts with the directional derivative — which is how the function height(north, east) changes in the direction of a vector whose absolute value is 1. That’s the definition — to get something you can actually calculate, you need to know the chain rule, and how to put a path on the topo map. The derivative of the real valued function in the direction of a unit vector turns out to be the dot product of the gradient vector and any vector at that point whose absolute value is 1. The unit vector can point any direction but the value of the derivative (the dot product) will be greatest when the unit vector points in the direction of the gradient vector. That’s where the magic comes from. If you’re slightly shaky on linear algebra, vectors and dot products — here’s a (hopefully explanatory) link to some basic linear algebra — This is the first in a series — just follow the links.
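You can watch the magic happen numerically. The sketch below uses a toy height function of my own (not Thomas’s): the directional derivative computed directly from the definition matches the dot product with the gradient, and sweeping the unit vector around the circle shows the maximum occurs along the gradient direction, with value equal to the gradient’s length:

```python
import math

# f plays the role of the topo map: height as a function of (east, north).
def f(x, y):
    return x**2 + 3.0 * y**2

def grad_f(x, y, h=1e-6):
    # numerical gradient via central differences
    return ((f(x + h, y) - f(x - h, y)) / (2 * h),
            (f(x, y + h) - f(x, y - h)) / (2 * h))

def directional_derivative(x, y, ux, uy, h=1e-6):
    # rate of change of f in the direction of the unit vector (ux, uy)
    return (f(x + h * ux, y + h * uy) - f(x - h * ux, y - h * uy)) / (2 * h)

x0, y0 = 1.0, 2.0
gx, gy = grad_f(x0, y0)
gnorm = math.hypot(gx, gy)

# 1. The directional derivative IS the dot product with the gradient.
for theta in (0.0, 0.7, 2.1, 4.0):
    ux, uy = math.cos(theta), math.sin(theta)
    dd = directional_derivative(x0, y0, ux, uy)
    assert abs(dd - (gx * ux + gy * uy)) < 1e-5

# 2. Among all unit vectors, the derivative is largest along the gradient,
#    and its maximum value is the length of the gradient.
best = max(directional_derivative(x0, y0, math.cos(t), math.sin(t))
           for t in [i * 2 * math.pi / 720 for i in range(720)])
assert abs(best - gnorm) < 1e-3
```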

The discussion of Lagrange multipliers (which is essentially the relation between two gradients — one of a function, the other of a constraint) in Dill pp. 68–72 is only fair, and I did a lot more work to understand it (which can’t be reproduced here).
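For a toy version of what the Lagrange multiplier condition asserts (my own example, not Dill’s): maximize f(x, y) = xy subject to the constraint g(x, y) = x + y - 1 = 0. At the constrained maximum, grad f must be parallel to grad g:

```python
# Maximize f(x, y) = x*y on the line x + y = 1 by brute-force search,
# then verify the Lagrange condition grad f = lambda * grad g there.
def f(x, y):
    return x * y

# Walk along the constraint (y = 1 - x) and find the best point.
best_x = max((i / 10000.0 for i in range(10001)), key=lambda x: f(x, 1 - x))
assert abs(best_x - 0.5) < 1e-3       # the maximum is at x = y = 1/2

# grad f = (y, x); grad g = (1, 1).  At (1/2, 1/2) both components of
# grad f are equal, so grad f is parallel to grad g with lambda = 1/2.
gx, gy = (1 - best_x), best_x
assert abs(gx - gy) < 1e-3
```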

For an excellent discussion of wedge product and why the volume multiplier in an integral must be the determinant of the Jacobian — see Callahan Advanced Calculus p. 41 and exercise 2.15 p. 61, the latter being the most important. It explains why things work this way in 2 dimensions. The exercise takes you through the derivation step by step asking you to fill in some fairly easy dots. Even better is  exercise 2.34 on p. 67 which proves the same thing for any collection of n independent vectors in R^n.

The Jacobian is just the determinant of a square matrix, something familiar from linear algebra. The numbers are just the coefficients of the vectors at a given point. But in integrals we’re changing dx and dy to something else — dr and dTheta when you go to polar coordinates. Why a matrix here? Because if differential calculus is about anything, it is about linearization of nonlinear functions, which is why you can use a matrix of derivatives (the Jacobian matrix) for dx and dy.

Why is this important for statistical mechanics? Because one of the integrals you must evaluate is of exp(-ax^2) from -infinity to +infinity, and the switch to polar coordinates is the way to do it. You also must evaluate integrals of this type to understand the kinetic theory of ideal gases.
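The polar-coordinate trick gives the integral of exp(-ax^2) over the whole line as SQRT[pi/a]. A crude midpoint-rule check (my own sketch) confirms it numerically:

```python
import math

# Numerical check of: integral of exp(-a x^2) over the line = sqrt(pi/a).
# exp(-x^2) is negligible beyond |x| = 8, so a finite range suffices.
def gaussian_integral(a, lo=-8.0, hi=8.0, n=100000):
    dx = (hi - lo) / n
    # composite midpoint rule
    return dx * sum(math.exp(-a * (lo + (i + 0.5) * dx) ** 2) for i in range(n))

I1 = gaussian_integral(1.0)
I2 = gaussian_integral(2.0)
assert abs(I1 - math.sqrt(math.pi)) < 1e-6
assert abs(I2 - math.sqrt(math.pi / 2)) < 1e-6
```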

Not necessary in this context, but one of the best discussions of the derivative in its geometric context I’ve ever seen is on pp. 105–106 of Callahan’s book.

So these are some pointers and hints, not a full discussion — I hope it makes the road easier for you, should you choose to take it.


A book recommendation, not a review

My first encounter with a topology textbook was not a happy one. I was in grad school, knowing I’d leave in a few months to start med school, with plenty of time on my hands and enough money to do what I wanted. I’d always liked math and had taken calculus, including advanced calculus and differential equations, in college. Grad school and quantum mechanics meant more differential equations, series solutions of same, matrices, eigenvectors and eigenvalues, etc. etc. I liked the stuff. So I’d heard topology was cool — Mobius strips, Klein bottles, wormholes (from John Wheeler) etc. etc.

So I opened a topology book to find on page 1

A topology is a set with certain selected subsets called open sets satisfying two conditions
1. The union of any number of open sets is an open set
2. The intersection of a finite number of open sets is an open set

Say what?

In an effort to help, on page two the book provided another definition

A topology is a set with certain selected subsets called closed sets satisfying two conditions
1. The union of a finite number of closed sets is a closed set
2. The intersection of any number of closed sets is a closed set

Ghastly. No motivation. No idea where the definitions came from or how they could be applied.
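Unmotivated as they are, the axioms are at least easy to check mechanically. Here’s a toy finite example of my own (not from any of the books mentioned) on a three-point set; note that on a finite set, closure under pairwise unions and intersections gives closure under all the unions and intersections the axioms demand:

```python
from itertools import combinations

# Check whether a collection of subsets of X is a topology on X.
X = frozenset({1, 2, 3})

def is_topology(X, opens):
    if frozenset() not in opens or X not in opens:
        return False                 # the empty set and X must be open
    for a, b in combinations(opens, 2):
        if a | b not in opens:       # unions of open sets are open
            return False
        if a & b not in opens:       # intersections of open sets are open
            return False
    return True

good = {frozenset(), frozenset({1}), frozenset({1, 2}), X}
bad = {frozenset(), frozenset({1}), frozenset({2}), X}  # {1} | {2} = {1,2} missing

assert is_topology(X, good)
assert not is_topology(X, bad)
```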

Which brings me to “An Introduction to Algebraic Topology” by Andrew H. Wallace. I recommend it highly, even though algebraic topology is just a branch of topology, and fairly specialized at that.


Because in a wonderful, leisurely and discursive fashion, he starts out with the intuitive concept of nearness, applying it to the classic analytic geometry of the plane. He then moves on to continuous functions from one plane to another, explaining why they must preserve nearness. Then he abstracts what nearness must mean in terms of the classic Pythagorean distance function. Topological spaces are first defined in terms of nearness and neighborhoods, and only after 18 pages does he define open sets in terms of neighborhoods. It’s a wonderful exposition, explaining why open sets must have the properties they have. He doesn’t even get to algebraic topology until p. 62, explaining point set topological notions such as connectedness, compactness, homeomorphisms etc. etc. along the way.

This is a recommendation, not a review, because I’ve not read the whole thing. But it’s a great explanation of why the definitions in topology must be the way they are.

It won’t set you back much — I paid $12.95 for the Dover edition (not sure when).

Why some of us gamble

If you are one of the hapless schlubs who bought a Powerball ticket or two and didn’t win (like me), modern neuroscience can tell you why (but not without a bit of pompous terminology). They call a small chance of winning large amounts (e.g. Powerball) along with a large chance of losing a little a “positively skewed gamble.” Impressed? I’m not.

Nonetheless [ Neuron vol. 89 pp. 63 – 69 ’16 ] is a very interesting paper. Functional Magnetic Resonance Imaging (fMRI) has shown that increased blood flow in one area of the brain (the nucleus accumbens) predicts risk seeking choices, while increased blood flow in another (the anterior insula) predicts aversion to risk. The unproven assumption behind fMRI is that increased blood flow is due to increased neural activity.

The neurochemistry of the two regions is quite different. The accumbens gets dopamine input while the insula gets norepinephrine projections.

BTW the insula is buried in man. Our cortex has grown so much (particularly in the frontal region) that it folds over on itself, burying the insular cortex.

We are now actually able to measure axon bundles (white matter fiber tracts) in the living brain, using something called diffusion weighted MRI. By and large, fiber tracts tend to have the fibers parallel and running in the same direction. It is far easier for water to flow along parallel fibers than across them, and this is what the technique measures. For the mathematically inclined, what is actually measured is a tensor field, because at any given point in the brain it varies in direction, unlike a vector field which points just one way at a given point (the mathematically inclined know that this is a simplification because vectors are actually a type of tensor).

At any rate, the work used diffusion weighted MRI to study the white matter tracts connecting the two areas. The larger the tract between insula and accumbens, the more risk averse an individual was. The implication being that a larger tract is a better connection between the two. So your nucleus accumbens is your impulsive child, and the anterior insula is your mother telling you not to do anything crazy.

Fascinating, but like all this stuff it needs to be replicated, as it probably confirms the original idea for the study.

Two Christmas Presents

Two Christmas presents for you.  Yes Christmas presents.  I refuse to be culturally castrated by the professionally aggrieved.

The first is a link to a great scientific website — It’s primarily about math and physics, with some biology thrown in. Imagine the News and Views section of Nature or the Perspectives section of Science on steroids.

Quanta is an editorially independent division of the Simons Foundation. And what is that, you enquire? It is the answer to “If you’re so smart, why ain’t you rich?” Jim Simons is both much smarter and much richer than you and I. You can read more about him in a book I’m about to review on the blog — “The Physics of Wall Street.”

Simons was a very accomplished mathematician, winning prizes with a friend, James Ax, in the 60’s and 70’s — not quite the Fields Medal, but up there. The Chern–Simons 3-form is part of string theory. The two founded Renaissance Technologies in the late 80’s, a stock fund using mathematical techniques to beat the market. And beat it they did, averaging 40% a year (after fees, which were hefty). Even in the most recent market blowout in 2008 they were up 80% for the year. The firm employs about 200 people, mostly mathematicians and physicists. It was described by an MIT math prof as “the best mathematics and physics department in the world.”

At any rate, after becoming a multibillionaire, Simons established his foundation, of which Quanta is a small part. It’s very good, with some heavies writing for it — such as Ingrid Daubechies, full prof of math at Princeton, who did a good deal of the early work on wavelets.

I haven’t read it all, but the math coverage is incredible, mostly about the latest and greatest new results and why they are important, placing them in context. Physics isn’t forgotten, and the lead article concerns the philosophy of science and how it’s a-changin’ à la string theory, which is light-years away from an experimental test of any of it.

Your second Christmas present is a Joke

The pope visited Colorado 22 years ago. A little known fact about him is that he loved to drive. Although Colorado is famed for the Rockies, the eastern half is high plains, so flat that you can see Pike’s peak from 100 miles away across the plains. At any rate the pope was being driven by his chauffeur from Colorado Springs to Denver on the Interstate, when the pope asked if he could drive. “Only if we go out on the plains where no one will see you” said the chauffeur.

So they switched when they got about 30 miles out in the middle of nowhere with the pope driving and the chauffeur in the back seat both behind tinted opaque windows. The pope started driving, really enjoying it, going faster and faster. He got up to 85 when a state trooper pulled them over.

Where’s the fire saith the trooper. He blanched when the driver’s window came down and he saw who was driving, and called headquarters. Arrest him came the answer. The trooper said I’m not sure, this guy is very big. I don’t care how big he is, arrest him. Are you sure. Yes.

I dunno boss, this guy is so big he’s got the pope driving for him.

Merry Christmas and Happy New Year to all

Are you sure you know everything your protein is up to?

Just because you know one function of a protein doesn’t mean you know them all. A recent excellent review of the (drumroll) executioner caspases [ Neuron vol. 88 pp. 461 – 474 ’15 ] brings this to mind. Caspases control a form of cell death called apoptosis, in which a cell goes gently into the good night without causing a fuss (particularly inflammation and alerting the immune system that something bad killed it). They are enzymes which chop up other proteins and cause the activation of other proteins which chop up DNA. They cause the inner leaflet of the plasma membrane to expose itself (particularly phosphatidyl serine which tells nearby scavenger cells to ‘eat me’).

The answer to the mathematical puzzle in the previous post will be found at the end of this one.

In addition to containing an excellent review of the various steps turning caspases on and off, the review talks about all the things activated caspases do in the nervous system without killing the neuron containing them. Among them are neurite outgrowth and regeneration of peripheral nerve axons after transection. Well that’s pathology, but one executioner caspase (caspase3) is involved in the millisecond to millisecond functioning of the nervous system — e.g. long term depression of neurons (LTD), something quite important to learning.

Of course, such potentially lethal activity must be under tight control, and there are 8 inhibitors of apoptosis (IAPs) of which 3 bind the executioners. We also have inhibitors of IAPs (SMAC, HTRA2) — wheels within wheels.

Are there any other examples where a protein discovered by one of its functions turns out to have others? Absolutely. One example is cytochrome c, which was found as it shuttles electrons to complex IV in the electron transport chain of mitochondria. Certainly a crucial function. However, when the mitochondrion stops functioning, either because it is told to or because something bad happens, cytochrome c is released from mitochondria into the cytoplasm, where it then activates caspase3, one of the executioner caspases.

Here’s another. Enzymes which hook amino acids onto tRNA are called tRNA synthetases (abbreviated aaRS, for aminoacyl-tRNA synthetase). However, one of them (called EPRS), when phosphorylated due to interferon gamma activity, becomes part of a complex of proteins which silences specific genes involved in the inflammatory response (it stops their mRNAs from being translated).

Yet another tRNA synthetase, when released from the cell, triggers an inflammatory response.

Naturally molecular biologists have invented a fancy word for the process of evolving a completely different function for a molecule — exaptation (to contrast it with adaptation).

Note the word molecule — exaptation isn’t confined to proteins. [ Cell vol. 160 pp. 554 – 566 ’15 ] discusses exaptation as something which happens to promoters and enhancers. This work looked at the promoters and enhancers active in the liver in 20 mammalian species — all the enhancers were rapidly evolving.


Answer to the mathematical puzzle of the previous post. R is the set of 4 straight lines bounding a square centered at (0,0)

Here’s why proving it has an inside and an outside isn’t enough to prove the Jordan Curve Theorem

No. The argument for R uses its geometry (the boundary is made of straight line segments). The problem is that an embedding f: S^1 -> R^2 may be convoluted, say something of the Hilbert curve sort.

An incorrect proof of the Jordan Curve Theorem – can you find what’s wrong with it?

Every closed curve in an infinite flat plane divides it into a bounded part and an unbounded part (inside and outside, if you’re not particular). This is so screamingly obvious that for a long time no one thought it needed proof. Bolzano changed all that about 200 years ago, but a proof was not forthcoming until Jordan gave one (thought by most to be defective) in 1887.

The proof is long and subtle. The one I’ve read uses the Brouwer fixed point theorem, which itself uses the fact that fundamental group of a circle is infinite cyclic (and that’s just for openers). You begin to get the idea.

Imagine the 4 points (1,1), (1,-1), (-1,1) and (-1,-1), the vertices of a square centered at (0, 0). Now connect the vertices by straight lines (no diagonals) and you have the border of the square (call it R).

We’re already several pages into the proof, when the author makes the statement that R “splits R^2 (the plane) into two components.”

It seemed to me that this is exactly what the Jordan Curve theorem is trying to prove. I wrote the author saying ‘why not claim victory and go home?’

I got the following back

“It is obvious that the ‘interior’ of a rectangle R is path connected. It is only a bit less obvious – but still very easy – to show that the ‘exterior’ of R is also connected. The rest of the claim is to show that every path alpha from a point alpha(0)=P inside the rectangle R to a point alpha(1)=Q out of it must cross the boundary of R. The set of numbers S = {i : alpha(k) is in interior(R) for every k≤i} is not empty (0 is there), and it is bounded from above by 1. So j=supS exists. Then, since the exterior and the interior of R are open, j must be on the boundary of R. So, the interior and the exterior are separate components of R^2 \ R. So, there are two of them.”

Well the rectangle is topologically equivalent (homeomorphic) to a circle.

So why isn’t this enough?  It isn’t ! !

Answer to follow in the next post. Here’s the link — go to the end of the post —

Time to get busy

Well I asked for it (the answer sheets to my classmate’s book on general relativity). It came today all 347 pages of it + a small appendix “Light Orbits in the Schwarzschild Geometry”. It’s one of the few times the old school tie has actually been of some use. The real advantages of going to an elite school are (1) the education you can get if you want (2) the people you meet back then or subsequently. WRT #1 — the late 50s was the era of the “Gentleman’s C”.

It should be fun. The book is the exact opposite of the one I’d been working on, which put the math front and center. This one puts the physics first and the math later on. I’m glad I’m reading it second, because as an undergraduate and graduate student I became adept at mouthing mathematical incantations without really understanding what was going on. I think most of my math now is reasonably solid. I did make a lot of detours I probably didn’t need to make — manifold theory, some serious topology — but that was fun as well.

When you’re out there away from University studying on your own, you assume everything you don’t understand is due to your stupidity. This isn’t always the case (although it usually is), and I’ve found errors in just about every book I’ve studied hard, and my name features on errata web pages of most of them. For one example see

The many ways the many tensor notations can confuse you

This post is for the hardy autodidacts attempting to learn tensors on their own. If you use multiple sources, you’ll find that they define the same terms used to describe tensors in diametrically opposed ways, so that just when you thought you knew what terms like covariant and contravariant tensor meant, another source defines them completely differently, leading you to wonder about (1) your intelligence (2) your sanity.

Tensors involve vector spaces and their bases. This post assumes you know what they are. If you don’t understand how a vector can be expressed in terms of coordinates relative to a basis, pick up any book on linear algebra.

Tensors can be defined by the way their elements transform under a change of coordinate basis. This is where the terms covariant and contravariant come from. By the way when Einstein says that physical quantities must transform covariantly, he means they transform like tensors do (even contravariant tensors).

True enough, but this approach doesn’t help you understand the term tensor product or the weird ⊗ notation (an x within a circle) used to describe it.

The best way to view tensors (from a notational point of view) is to look on them as functions which take finite Cartesian products of vectors and covectors and produce a single real number.

To understand what a covector (aka dual vector) is, you must understand the inner product (aka dot product).

The definition of inner product (dot product) of a vector V with itself written < V | V >, probably came from the notion of vector length. Given the standard basis in two dimensional space E1 = (1,0) and E2 = (0,1) all vectors V can be written as x * E1 + y * E2 (x is known as the coefficient of E1). Vector length is given by the good old Pythagorean theorem as SQRT[ x^2 + y^2]. The dot product (inner product) is just x^2 + y^2 without the square root.

In 3 dimensions the distance of a point (x, y, z) from the origin is SQRT [x^2 + y^2 + z^2]. The definition of vector length (or distance) easily extends (by analogy) to n dimensions where the length of V is SQRT[x1^2 + x2^2 + . . . . + xn^2] and the dot product is x1^2 + x2^2 + . . . . + xn^2. Length is always a non-negative real number.

The definition of inner product also extends to the dot product of two different vectors V and W, where V = v1 * E1 + v2 * E2 + . . . + vn * En and W = w1 * E1 + . . . + wn * En — e.g. < V | W > = v1 * w1 + v2 * w2 + . . . + vn * wn. Again always a real number, but not always positive, as any of the v’s and w’s can be negative.

So, if you hold W constant you can regard it as a function on the vector space in which V and W reside which takes any V and produces a real number. You can regard V the same way if you hold it constant.

Now with some of the complications which mathematicians love, you can regard the set of functions { W } operating on a vector space, as a vector space itself. Functions can be added (by their results) and can be multiplied by a real number (a scalar). The set of functions { W } regarded as a vector space is called the dual vector space.

Well, if { W } along with function addition and scalar multiplication is a vector space, it must have a basis. Everything I’ve ever read about tensors involves finite dimensional vector spaces. So assume the vector space A is n dimensional, where n is a positive integer, and call its basis vectors the ordered set a1, . . . , an. The dual vector space (call it B) is also n dimensional, with another basis, the ordered set b1, . . . , bn.

The bi are chosen so that their dot product with elements of A’s basis is the Kronecker delta, e.g. if i = j then < bi | aj > = 1. If i doesn’t equal j then < bi | aj > = 0. This can be done by a long and horrible process (back in the day before computer algebra systems) called Gram–Schmidt orthonormalization. Assume this can be done. If you’re a true masochist, have a look at the Wikipedia article on the Gram–Schmidt process.

Notice what we have here. Any particular element of the dual space B (a real valued function operating on A), call it f, can be written down as f1 * b1 + . . . + fn * bn. It will take any vector in A (written g1 * a1 + . . . + gn * an) and give you f1 * g1 + . . . + fn * gn, which is a real number. Basically, any element (say bj) of the basis of dual space B just looks at a vector in A and picks out the coefficient of aj (when it forms the dot product with the vector in A).
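Here is a small sketch of the dual basis doing exactly this coefficient-picking. It assumes the basis vectors a1, a2, a3 are stacked as the columns of a matrix A; the dual basis b1, b2, b3 then falls out as the rows of A's inverse (a shortcut that replaces the Gram–Schmidt grind):

```python
import numpy as np

# A (non-orthogonal) basis of R^3: columns of A are a1, a2, a3.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

# The dual basis b1, b2, b3 can be read off as the rows of A's inverse.
B = np.linalg.inv(A)

# Check < bi | aj > = Kronecker delta:
print(np.allclose(B @ A, np.eye(3)))  # True

# Each bj picks out the coefficient of aj: build g = 2*a1 + 3*a2 - 1*a3 ...
g = A @ np.array([2.0, 3.0, -1.0])
# ... and the dual basis recovers exactly those coefficients:
print(B @ g)  # [2. 3. -1.]
```

The point of the last line: even though g's raw components look nothing like (2, 3, -1), pairing g with each dual basis vector recovers the coefficients in the a-basis.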

Now (at long last) we can begin to look at the contradictory ways tensors are described. The most fruitful way is to look at them as the product of individual dot products between a vector and a dual vector.

To summarize — the whole point of tensor use in physics is that tensors describe physical quantities which are ‘out there’ independently of the coordinates used to describe them. A hot dog has a certain length independently of its description in inches or centimeters. Change your viewpoint and its coordinates in space will change as well (the hot dog doesn’t care about this). Tensors are a way to accomplish this.

It’s too good to pass up: the length of the hot dog stays the same no matter how many times you (non-invasively) measure it. This is completely different from the situation in quantum mechanics, and is one of the reasons that quantum mechanics has never been unified with general relativity (which is a theory of gravity based on tensors).

Remember that the dot product pairs a dual vector V with a vector W: < V | W >. If you change the basis of vector W (so W has different coordinates), the basis of dual vector V must also change (to keep the dot product the same). A choice must be made as to which of the two concurrent basis changes is fundamental (actually neither is, as they both are).

Mathematics has chosen the basis of vector W as fundamental.

When you change the basis of W, the coefficients of W must change in the opposite way (to keep the vector length constant). The coefficients of W are said to change contravariantly. What about the coefficients of V? The basis of V changes oppositely to the basis of W (i.e. contravariantly), so the coefficients of V must change oppositely to that, which is the same way the basis of W changes, i.e. covariantly. Confused? Nonetheless, that’s the way they are named.
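A numeric sketch may untangle this. Assume the new basis vectors are given, in terms of the old ones, by the columns of an (arbitrary invertible) matrix M. Then the coefficients of W transform by M's inverse (contravariantly), the coefficients of the dual vector V transform by M's transpose (covariantly, the same direction as the basis), and the pairing < V | W > survives unchanged:

```python
import numpy as np

# Change of basis in R^3: columns of M express the new basis vectors
# in terms of the old ones (any invertible M will do; this one is made up).
M = np.array([[2.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])

c = np.array([1.0, 2.0, 3.0])    # coefficients of vector W in the old basis
f = np.array([4.0, -1.0, 0.5])   # coefficients of dual vector V in the old basis

# Coefficients of W transform contravariantly (by M^-1, opposite to the basis):
c_new = np.linalg.solve(M, c)
# Coefficients of V transform covariantly (by M^T, the same way as the basis):
f_new = M.T @ f

# The pairing < V | W > is unchanged by the basis change:
print(np.dot(f, c), np.dot(f_new, c_new))  # equal
```

The invariance in the last line is the whole point: the number < V | W > is "out there" regardless of which basis you compute it in.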

Vectors and covectors and other mathematical entities such as differentials, metrics and gradients are labelled as covariant or contravariant by the way their numerical coefficients change with a change in basis.

So the coefficients of vector W transform contravariantly, and the coefficients of dual vector V transform covariantly. This is true even though the coefficients of V and W always transform contravariantly (i.e. oppositely) to the way their own basis transforms.

An immense source of confusion.

As mentioned above, one can regard vectors and dual vectors as real valued functions on elements of a vector space. So (adding to the confusion) vectors and dual vectors are both tensors. Vectors are contravariant tensors, and dual vectors are covariant tensors.

Now we form Cartesian products of vectors W (now called V) and covectors V (hereafter called V* to keep them straight).

We get something like this: V x V x V x V* x V*, a Cartesian product of 3 contravariant vectors and 2 dual vectors.

To get a real number out of them we form the tensor product V* ® V* ® V* ® V ® V, where the first V* pairs with the first V to produce a real number, the second pairs with the second, . . . , and the last slot pairs with the last to produce a real number. All the real numbers produced are multiplied together to give the result.

Why not just call  V* ® V* ® V* ® V ® V a product? Well, each V and V* is an n dimensional vector space, and the tensor product V ® V is an n^2 dimensional space (and  V* ® V* ® V* ® V ® V is an n^5 dimensional vector space). When we form the product of two numbers (real or complex) we just get another number of the same species (real or complex). The tensor product of two n dimensional vector spaces is not another n dimensional space, hence the need for the adjective modifying the name product. The dot product nomenclature is much the same: the dot product of two vectors is not another vector, but a real number.

Here is yet another source of confusion. What we really have is a tensor product V* ® V* ® V* ® V ® V operating on a Cartesian product of vectors and covectors (tensors themselves) V x V x V x V* x V* to produce a real number.
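This whole setup can be checked numerically. The sketch below (numpy, with arbitrary random entries; numerically a covector is just another array) builds the tensor product of 3 covectors and 2 vectors, confirms it lives in an n^5 dimensional space, and verifies that feeding it a Cartesian product of 3 vectors and 2 covectors yields exactly the product of the five individual pairings:

```python
import numpy as np

n = 2
rng = np.random.default_rng(0)

# Three covector factors (V*) and two vector factors (V), all in R^n:
a1, a2, a3 = rng.normal(size=(3, n))
u1, u2 = rng.normal(size=(2, n))

# The tensor product a1 (x) a2 (x) a3 (x) u1 (x) u2 has n^5 components:
T = np.einsum('i,j,k,l,m->ijklm', a1, a2, a3, u1, u2)
print(T.shape)  # (2, 2, 2, 2, 2) -- n^5 = 32 components

# Feed it 3 vectors and 2 covectors (the Cartesian product V x V x V x V* x V*):
v1, v2, v3 = rng.normal(size=(3, n))
w1, w2 = rng.normal(size=(2, n))
out = np.einsum('ijklm,i,j,k,l,m->', T, v1, v2, v3, w1, w2)

# Same number as multiplying the five individual pairings together:
pairings = (a1 @ v1) * (a2 @ v2) * (a3 @ v3) * (w1 @ u1) * (w2 @ u2)
print(np.isclose(out, pairings))  # True
```

So the (3, 2) tensor really is "the product of individual dot products," as claimed above; the einsum is just that product, carried out in one shot.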

Tensors can be named by their operands, making this a 3 contravariant, 2 covariant tensor — a (3, 2) tensor.

Other books name them by their operator (e.g. the tensor product), making it a 3 covariant, 2 contravariant tensor — a (2, 3) tensor.

If you don’t get this settled when you switch books, you’ll think you don’t really understand what contravariant and covariant mean (when in fact you do). Mercifully, one constancy in notation is that the contravariant number always comes first (or on top) and the covariant number second (or on bottom).

Hopefully this is helpful.  I wish I’d had this spelled out when I started.

