Category Archives: Math

Logically correct, operationally horrible

A med school classmate who graduated from the University of Chicago was fond of saying, “That’s how it works in practice, but how does it work in theory?”

Exactly the opposite happened when I had to do some programming. It shows the exact difference between computer science and mathematics.

Basically I had to read a large TextEdit file (between 2 and 8 megabytes) into a FileMaker table, and do something similar 15 times. The files ranged in size from 20,000 to 70,000 lines (each delimited by a carriage return). They needed to be broken up into 1,000 records.

Each record began with “Begin Card Xnnnn” and ended with “End Card Xnnnn”, so it was easy to see where each of the 1,000 cards began and ended. So a program was written to

1. Look for “Begin Card Xnnnn”
2. Count the number of lines until “End Card Xnnnn” was found
3. Open a new record in FileMaker
4. Put the data from card Xnnnn into a field of the record
5. Repeat 1,000 times.

Before I started, I checked the program out on smaller files with 1, 5, 10, 50, 100, 200, 500 cards.

The first program used a variable called “lineCounter” which just pointed to the line being processed. As each line was read, the pointer was advanced.

It was clear that the runtime was seriously nonlinear: 10 cards took more than twice the time that 5 cards did. Even worse, the more cards in the file the worse things got, so that 1,000 cards took over an hour.

Although the logic of using an advancing pointer to select and retrieve lines was impeccable, the implementation was not.

You’ve really not been given enough information to figure out what went wrong but give it a shot before reading further.

I was thinking of the LineCounter variable as a memory pointer (which it was), similar to memory pointers in C.

But it wasn’t — to get to line 25,342, the high level command in FileMaker — MiddleValues (Text; Starting_Line; Number_of_Lines_to_get) — had to start at the beginning of the file, examine each character for a carriage return, keep a running count of carriage returns, and stop after 25,342 lines had been counted.

So what happened to run time?

Assume the line pointer had to read every line from the top of the file on each fetch (not exactly true, but close enough).

Given n lines in the file, the total lines read is the sum of 1 to n — which turns out to be (n^2 + n)/2. (Derivation at the end.)

So when there were 2n lines in the file, the runtime went up by nearly 4 times (exactly (4n^2 + 2n)/2).

So run times scaled in a polynomial fashion: k*n lines would scale as (k^2*n^2 + k*n)/2.

At least it wasn’t exponential time, which would have scaled as 2^n.
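The scaling is easy to check with a toy cost model. This is a Python sketch, not FileMaker script syntax, and the function names are mine:

```python
def scan_cost_pointer(n_cards, lines_per_card):
    """Total lines examined when every card fetch rescans from line 1,
    which is what MiddleValues effectively did: fetching card k costs
    about k * lines_per_card line reads."""
    return sum(k * lines_per_card for k in range(1, n_cards + 1))

def scan_cost_consume(n_cards, lines_per_card):
    """Total lines examined when processed lines are thrown away:
    each line is read exactly once."""
    return n_cards * lines_per_card
```

Doubling the number of cards roughly quadruples the pointer cost, while the consume-from-the-top cost merely doubles.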

How to solve the problem?

Think about it before reading further

The trick was to always work at the first lines of the file: get one card, then throw those lines away, so each scan starts over at what is now the top. The speedup was impressive.
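A minimal Python sketch of the linear-time idea (the real program was a FileMaker script; here each line is examined exactly once, which is what discarding processed lines achieves):

```python
def parse_cards(text):
    """Split a carriage-return-delimited file into card records,
    reading each line exactly once."""
    records, card, in_card = [], [], False
    for line in text.split("\r"):
        if line.startswith("Begin Card"):
            in_card, card = True, [line]
        elif line.startswith("End Card"):
            card.append(line)
            records.append("\r".join(card))
            in_card = False
        elif in_card:
            card.append(line)
    return records
```

Each card is assembled as the lines stream by, so 1,000 cards costs the same per line as 10 cards.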

It really shows the difference between math and computer science. Both use logic, but computer science uses more

Derivation of sum of 1 to n.

Consider a square n little squares on a side. The total number of little squares is n^2. Throw away the diagonal, giving n^2 – n. The number of squares left is twice the sum of 1 to n – 1. So divide n^2 – n by 2 to get the sum of 1 to n – 1, and add back n, giving (n^2 + n)/2.
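In symbols, the same picture:

```latex
\sum_{k=1}^{n} k \;=\; \underbrace{\frac{n^2 - n}{2}}_{\text{sum of } 1 \text{ to } n-1} +\; n \;=\; \frac{n^2 + n}{2} \;=\; \frac{n(n+1)}{2}
```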

Entangled points

The terms limit point, cluster point, and accumulation point don’t really match the concept point set topology is trying to capture.

As usual, the motivation for any topological concept (including this one) lies in the real numbers.

1 is a limit point of the open interval (0, 1) of real numbers. Any open interval containing 1 also contains elements of (0, 1). 1 is entangled with the set (0, 1) given the usual topology of the real line.

What is the usual topology of the real line (i.e., how are its open sets defined)? It’s the set of open intervals, together with their arbitrary unions and their finite intersections.

In this topology no open set can separate 1 from the set (0, 1) — i.e., they are entangled.

So call 1 an entangled point. This way of thinking allows you to think of open sets as separators of points from sets.

Hausdorff thought this way when he described the separation axioms (Trennungsaxiome), describing points and sets that open sets could and could not separate.

The most useful collection of open sets satisfies Trennungsaxiom #2, giving a Hausdorff topological space. There are enough open sets that any two distinct points are contained in two disjoint open sets.

Thinking of limit points as entangled points gives you a more coherent way to think of continuous functions between topological spaces. They never separate a set and any of its entangled points in the domain when they map them to the target space. At least to me, this is far more satisfactory (and actually equivalent) than the usual definition of continuity: the inverse image of an open set in the target space is an open set in the domain.

Clarity of thought and ease of implementation are two very different things. It is much easier to prove/disprove that a function is continuous using the usual definition than using the preservation of entangled points.
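For the record, the equivalence can be stated in one line (writing cl A, the closure of A, for A together with all its entangled points):

```latex
f : X \to Y \ \text{is continuous} \iff f(\operatorname{cl} A) \subseteq \operatorname{cl} f(A) \ \text{for every } A \subseteq X
```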

Organic chemistry could certainly use some better nomenclature. Why not call an SN1 reaction (Substitution Nucleophilic 1) SN-pancake — as the 4 carbons left after the bond is broken form a plane. Even better SN2 should be called SN-umbrella, as it is exactly like an umbrella turning inside out in the wind.

Norbert Wiener

In the Cambridge, Mass. of the early 60s the name Norbert Wiener was spoken in hushed tones. Widely regarded as a genius among the assembled genii of Cambridge, that was all I knew about him aside from the fact that he got a bachelor’s degree in math from Tufts at age 14. As a high school student I tried to read Cybernetics, a widely respected book he wrote in 1948, and found it incomprehensible.

Surprisingly, his name never came up again in any undergraduate math courses, graduate chemistry and physics courses, extensive readings on programming and computation (until now).

From PNAS vol. 114 pp. 1281 – 1286 ’17 –“In their seminal theoretical work, Norbert Wiener and Arturo Rosenblueth showed in 1946 that the self-sustained activity in the cardiac muscle can be associated with an excitation wave rotating around an obstacle. This mechanism has since been very successfully applied to the understanding of the generation and control of malignant electrical activity in the heart. It is also well known that self-sustained excitation waves, called spirals, can exist in homogeneous excitable media. It has been demonstrated that spirals rotating within a homogeneous medium or anchored at an obstacle are generically expected for any excitable medium.”

That sounds a lot like atrial fibrillation, a serious risk factor for strokes, and something I dealt with all the time as a neurologist. Any theoretical input about what to do for it would be welcome.

A technique has been developed to cure the arrhythmia. Here it is. “Recently, an atrial defibrillation procedure was clinically introduced that locates the spiral core region by detecting the phase-change point trajectories of the electrophysiological wave field and then, by ablating that region, restores sinus rhythm.” The technique is now widely used, and one university hospital (Ohio State) says that they are doing over 600 per year.

“This is clearly at odds with the Wiener–Rosenblueth mechanism because a further destruction of the tissue near the spiral core should not improve the situation.” It’s worse than that because the summary says “In the case of a functionally determined core, an ablation procedure should even further stabilize the rotating wave”

So theory was happily (for the patients) ignored. Theorists never give up and the paper goes on to propose a mechanism explaining why the Ohio State procedure should work. Here’s what they say.

“Here, we show theoretically that fundamentally in any excitable medium a region with a propagation velocity faster than its surrounding can act as a nucleation center for reentry and can anchor an induced spiral wave. Our findings demonstrate a mechanistic underpinning for the recently developed ablation procedure.”

It certainly has the ring of post hoc ergo propter hoc about it.

The strangeness of mathematical proof

I’ve written about Urysohn’s Lemma before, and a copy of that post will be found at the end. I decided to plow through the proof, since coming up with it is regarded by Munkres (the author of a widely used book on topology) as very creative. Here’s how he introduces it:

“Now we come to the first deep theorem of the book, a theorem that is commonly called the “Urysohn lemma”. . . . It is the crucial tool used in proving a number of important theorems. . . . Why do we call the Urysohn lemma a ‘deep’ theorem? Because its proof involves a really original idea, which the previous proofs did not. Perhaps we can explain what we mean this way: By and large, one would expect that if one went through this book and deleted all the proofs we have given up to now and then handed the book to a bright student who had not studied topology, that student ought to be able to go through the book and work out the proofs independently. (It would take a good deal of time and effort, of course, and one would not expect the student to handle the trickier examples.) But the Urysohn lemma is on a different level. It would take considerably more originality than most of us possess to prove this lemma.”

I’m not going to present the proof, just comment on one of the tools used to prove it: a list of all the rational numbers found in the interval from 0 to 1, with no repeats.

Munkres gives the start of the list, and you can see why it would eventually include all the rational numbers. Here it is:

0, 1, 1/2, 1/3, 2/3, 1/4, 3/4, 1/5 . . .

Note that 2/4 is missing (because it equals 1/2, which is already on the list). It would be fairly easy to write a program to produce the list, but a computer running the program would never stop. In addition it would be slow: given a denominator n, it would include 1/n and (n – 1)/n in the list, but to rule out repeats in between it would have to perform on the order of n – 2 divisions. If it had a way of knowing whether a number was prime it could just put in 1/prime, 2/prime, . . . , (prime – 1)/prime without the division. But although there are lists of primes for small integers, there is no general way to find them, so brute force is required. So for a denominator of 10^n, that means about 10^n – 2 divisions. Once the numbers get truly large, there isn’t enough matter in the universe to represent them, nor is there enough time since the big bang to do the calculations.
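A Python sketch of such a program (the gcd test plays the role of the trial divisions; the generator never terminates, it just keeps producing fractions forever):

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd

def rationals_in_unit_interval():
    """Generate every rational in [0, 1] exactly once, in Munkres's order:
    0, 1, then for each denominator q >= 2 the fractions p/q in lowest terms."""
    yield Fraction(0)
    yield Fraction(1)
    for q in count(2):
        for p in range(1, q):
            if gcd(p, q) == 1:   # skips repeats such as 2/4 (= 1/2, already listed)
                yield Fraction(p, q)

# The first eight terms reproduce the list above.
first8 = list(islice(rationals_in_unit_interval(), 8))
```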

However, the proof proceeds blithely on after showing the list — this is where the strangeness comes in. It basically uses the complete list of rational numbers as indexes for the infinite number of open sets to be found in a normal topological space. The proof relies on the assumption of infinite divisibility of space (inherent in the definition of a normal topological space), something totally impossible physically.

So we’re in the never to be seen land of completed infinities (of time, space, numbers of operations). It’s remarkable that this stuff applies to the world we inhabit, but it does, and anyone wishing to understand physics at a deep level must come to grips with mathematics at this level.

Here’s the old post

Urysohn’s Lemma

The above quote is from one of the standard topology texts for undergraduates (or perhaps the standard text) by James R. Munkres of MIT. It appears on page 207 of 514 pages of text. Lee’s text book on Topological Manifolds gets to it on p. 112 (of 405). For why I’m reading Lee see

Well it is a great theorem, and the proof is ingenious, and understanding it gives you a sense of triumph that you actually did it, and a sense of awe about Urysohn, a Russian mathematician who died at 26. Understanding Urysohn is an esthetic experience, like a Dvorak trio or a clever organic synthesis [ Nature vol. 489 pp. 278 – 281 ’12 ].

Clearly, you have to have a fair amount of topology under your belt before you can even tackle it, but I’m not even going to state or prove the theorem. It does bring up some general philosophical points about math and its relation to reality (e.g. the physical world we live in and what we currently know about it).

I’ve talked about the large number of extremely precise definitions to be found in math (particularly topology). Actually what topology is about, is space, and what it means for objects to be near each other in space. Well, physics does that too, but it uses numbers — topology tries to get beyond numbers, and although precise, the 202 definitions I’ve written down as I’ve gone through Lee to this point don’t mention them for the most part.

Essentially topology reasons about our concept of space qualitatively, rather than quantitatively. In this, it resembles philosophy which uses a similar sort of qualitative reasoning to get at what are basically rather nebulous concepts — knowledge, truth, reality. As a neurologist, I can tell you that half the cranial nerves, and probably half our brains are involved with vision, so we automatically have a concept of space (and a very sophisticated one at that). Topologists are mental Lilliputians trying to tack down the giant Gulliver which is our conception of space with definitions, theorems, lemmas etc. etc.

Well one form of space anyway. Urysohn talks about normal spaces. Just think of a closed set as a Russian Doll with a bright shiny surface. Remove the surface, and you have a rather beat up Russian doll — this is an open set. When you open a Russian doll, there’s another one inside (smaller but still a Russian doll). What a normal space permits you to do (by its very definition), is insert a complete Russian doll of intermediate size, between any two Dolls.

This all sounds quite innocent until you realize that between any two Russian dolls an infinite number of concentric Russian dolls can be inserted. Where did they get a weird idea like this? From the number system of course. Between any two distinct rational numbers p/q and r/s where p, q, r and s are whole numbers, you can always insert a new one halfway between. This is where the infinite regress comes from.

For mathematics (and particularly for calculus) even this isn’t enough. The square root of two isn’t a rational number (one of the great Euclid proofs), but you can get as close to it as you wish using rational numbers. So there are an infinite number of irrational numbers between any two rational numbers. In fact that’s essentially how the real numbers are defined — by fiat, that any set of real numbers bounded above has a least upper bound (think 1, 1.4, 1.41, 1.414, . . . defining the square root of 2).

What does this skullduggery have to do with space? It says essentially that space is infinitely divisible, and that you can always slice and dice it as finely as you wish. This is the calculus of Newton and the relativity of Einstein. It clearly is right, or we wouldn’t have GPS systems (which actually require a relativistic correction).

But it’s clearly wrong as any chemist knows. Matter isn’t infinitely divisible, Just go down 10 orders of magnitude from the visible and you get the hydrogen atom, which can’t be split into smaller and smaller hydrogen atoms (although it can be split).

It’s also clearly wrong as far as quantum mechanics goes — while space might not be quantized, there is no reasonable way to keep chopping it up once you get down to the elementary particle level. You can’t know where they are and where they are going exactly at the same time.

This is exactly one of the great unsolved problems of physics — bringing relativity, with its infinitely divisible space, together with quantum mechanics, where the very meaning of space becomes somewhat blurry (if you can’t know exactly where anything is).

Interesting isn’t it?

Tensors yet again

In the grad school course on abstract algebra I audited a decade or so ago, the instructor began the discussion about tensors by saying they were the hardest thing in mathematics. Unfortunately I had to drop this section of the course due to a family illness. I’ve written about tensors before and their baffling notation and nomenclature. The following is yet another way to look at them which may help with their confusing terminology.

First, this post will assume you have a significant familiarity with linear algebra. I’ve written a series of posts on the subject if you need a brush up — pretty basic — here’s a link to the first post —
All of them can be found here —

Here’s another attempt to explain them — which will give you the background on dual vectors you’ll need for this post —

To the physicist, tensors really represent a philosophical position — that there are shapes and processes external to us which are real, and independent of the way we choose to describe them mathematically, e.g. by locating their various parts and physical extents in some sort of coordinate system. That approach is described here —

Zee in one of his books defines a tensor as something that transforms like a tensor (honest to god). Neuenschwander in his book says, “What kind of a definition is that supposed to be, that doesn’t tell you what it is that is changing?”

The following approach may help — it’s from an excellent book which I’ve not completely gotten through — “An Introduction to Tensors and Group Theory for Physicists” by Nadir Jeevanjee.

He says that tensors are just functions that take a bunch of vectors and return a number (either real or complex). It’s a good idea to keep the volume tensor (which takes 3 vectors and returns a real number) in mind while reading further. The tensor function has just one other constraint — it must be multilinear. Amazingly, it turns out that this is all you need.

Tensors are named by the number of vectors (written V) and dual vectors (written V*) they massage to produce the number. This is fairly weird when you think of it. We don’t name sin (x) by x because this wouldn’t distinguish it from the zillion other real valued functions of a single variable.

So an (r, s) tensor is named by the ordered array of its operands — (V, . . . , V, V*, . . . , V*) with r V’s first and s V*’s next in the array. The array tells you what the tensor function must be.

How can Jeevanjee get away with this? Amazingly, multilinearity is all you need. Recall that the great thing about the linearity of any function or operator on a vector space is that ALL you need to know is what the function or operator does to the basis vectors of the space. The effect on ANY vector in the vector space then follows by linearity.

Going back to the volume tensor, whose operand is (V, V, V) with the same vector space (R^3) for all 3 V’s: how many basis vectors are there for V x V x V? There are 3 for each V, meaning that there are 3^3 = 27 possible basis vectors. You probably remember the formula for the volume enclosed by 3 vectors (call them u, v, w). The 3 components of u are u1, u2 and u3.

The volume tensor calculates volume as (u cross product v) dot product w.
Writing the calculation out

Volume = u1*v2*w3 – u1*v3*w2 + u2*v3*w1 – u2*v1*w3 + u3*v1*w2 – u3*v2*w1. What about the other 21 combinations of basis vectors? Their coefficients are all zero, but they are all present in the tensor.

While any tensor taking two vectors can be expressed as a square matrix, the volume tensor with 27 components clearly cannot be (it’s a 3 x 3 x 3 array). So don’t confuse tensors with matrices (as I did).
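To make the “function of vectors” view concrete, here is the volume tensor in plain Python, with a numerical check of multilinearity in the first slot (the helper lin_combo is mine, for illustration):

```python
def volume(u, v, w):
    """The volume tensor on R^3: takes three vectors, returns a number.
    This is (u cross v) dot w written out component by component."""
    return ( u[0]*v[1]*w[2] - u[0]*v[2]*w[1]
           + u[1]*v[2]*w[0] - u[1]*v[0]*w[2]
           + u[2]*v[0]*w[1] - u[2]*v[1]*w[0] )

def lin_combo(a, u, b, up):
    """The vector a*u + b*up, componentwise."""
    return tuple(a*x + b*y for x, y in zip(u, up))

# The unit cube spanned by the standard basis vectors has volume 1.
e1, e2, e3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)
```

Multilinearity means volume(a*u + b*u', v, w) = a*volume(u, v, w) + b*volume(u', v, w), and likewise in the other two slots.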

Note that the formula for volume implicitly used the usual standard orthogonal coordinates for R^3. What would it be in spherical coordinates? You’d have to use a change of basis matrix to (r, theta, phi). Actually you’d have to have 3 of them, as basis vectors in V x V x V are 3-place arrays. This gives the horrible subscript and superscript notation by which tensors are usually defined. So rather than memorizing how tensors transform you can derive things like

T_i’^j’ = (A^k_i’)*(B^j’_l) * T_k^l, where _ before a letter means subscript, ^ before a letter means superscript, A^k_i’ and B^j’_l are the change of basis matrix and its inverse, and the Einstein summation convention is used. Note that the change of basis formula for the components of the volume tensor would have 3 such matrices, not two as I’ve shown.

One further point. You can regard a dual vector as a function that takes a vector and returns a number — so a dual vector is a (1, 0) tensor. Similarly you can regard a vector as a function that takes a dual vector and returns a number, so vectors are (0, 1) tensors. So vectors and dual vectors are actually tensors as well.

The distinction between describing what a tensor does (e.g. its function) and what its operands actually are caused me endless confusion. You write a tensor operating on a dual vector as a (0, 1) tensor, but a dual vector is a (1,0) considered as a function.

None of this discussion applies to the tensor product, which is an entirely different (but similar) story.

Hopefully this helps

Spot the flaw

Mathematical talent varies widely. It was a humbling thing a few years ago to sit in an upper level college math class on Abstract Algebra with a 16 year old high school student taking the course, listening with one ear while he did his German homework. He was later a double summa in both math and physics at Yale. So do mathematicians think differently? A fascinating paper implies that they use different parts of their brain doing math than when not doing it. The paper has one methodological flaw — see if you can find it.

[ Proc. Natl. Acad. Sci. vol. 113 pp. 4887 – 4889, 4909 – 4917 ’16 ] 15 expert mathematicians and 15 nonmathematicians with comparable academic qualifications were studied (3 literature, 3 history, 1 philosophy, 2 linguistics, 1 antiquity, 3 graphic arts and theater, 1 communication, 1 heritage conservation — fortunately no feminist studies). They had to listen to mathematical and nonmathematical statements and decide whether each was true, false, or meaningless. The nonmathematical statements referred to general knowledge of nature and history. All this while they were embedded in a magnetic resonance imager, so that functional MRI (fMRI) could be performed.

In mathematicians there was no overlap of the math responsive network (e.g. the areas of the brain activated by doing math) with the areas activated by sentence comprehension and general semantic knowledge.

The same brain networks were activated by all types of mathematical statements (geometry, analysis, algebra and topology) as opposed to nonmathematical statements. The areas activated were the dorsal parietal, ventrolateral temporal and bilateral frontal. This was only present in the expert mathematicians (and only in response to mathematical statements). These areas are outside those associated with language (inferior frontal gyrus of the left hemisphere). The activated areas are also involved in visual processing of arabic numbers and simple calculation. The activated areas in mathematicians were NOT those related to language or general knowledge.

So what’s wrong with the conclusion? The editorialist (pp. 4887 – 4889) pointed this out but I thought of it independently.

All you can say is that experts working in their field of expertise use different parts of their brain than they use for general knowledge. The nonMathematicians should have been tested in their field of expertise. Shouldn’t be hard to do.

High level mathematicians look like normal people

Have you ever had the pleasure of taking a course from someone who wrote the book? I did. I audited a course at Amherst from Prof. David Cox, who was one of three authors of “Ideals, Varieties and Algorithms.” It was uncanny to listen to him lecture (without any notes) as if he were reading from the book. It was also rather humbling to have a full professor correcting your homework. We had Dr. Cox for several hours each week (all 11 or 12 of us). This is why Amherst is such an elite school. Ditto for Princeton back in the day, when Physics 103 was taught by John Wheeler 3 hours a week. Physics 103 wasn’t for the high powered among us who were going to be professional physicists (Heinz Pagels, Jim Hartle); it was for premeds and engineers.

Dr. Cox had one very useful pedagogical device — everyone had to ask a question at the beginning of class, Cox being of the opinion that there is no such thing as a dumb question in math.

Well, Dr. Cox and his co-authors (Little and O’Shea) got an award from the American Mathematical Society for their book. There’s an excerpt below. You should follow the link to the review to see what the three look like along with two other awardees. Go to any midsize American city at lunchtime, and you’d be hard pressed to pick four of the five out of the crowd of middle aged men walking around. Well almost — one guy would be hard to pick out of the noonday crowd in Williamsburg Brooklyn or Tel Aviv. Four are extremely normal looking guys, not flamboyant or bizarre in any way. This is certainly true of the way Dr. Cox comports himself. The exception proving the rule however, is Raymond Smullyan, who was my instructor in a complex variables course back in the day — quite an unusual and otherworldly individual — there’s now a book about him.

Here’s part of the citation. The link also contains bios of all.

“Even more impressive than its clarity of exposition is the impact it has had on mathematics. CLO, as it is fondly known, has not only introduced many to algebraic geometry, it has actually broadened how the subject could be taught and who could use it. One supporter of the nomination writes, “This book, more than any text in this field, has moved computational algebra and algebraic geometry into the mathematical mainstream. I, and others, have used it successfully as a text book for courses, an introductory text for summer programs, and a reference book.”
Another writer, who first met the book in an REU two years before it was published, says, “Without this grounding, I would have never survived my first graduate course in algebraic geometry.” This theme is echoed in many other accounts: “I first read CLO at the start of my second semester of graduate school…. Almost twenty years later I can still remember the relief after the first hour of reading. This was a math book you could actually read! It wasn’t just easy to read but the material also grabbed me.”
For those with a taste for statistics, we note that CLO has sold more than 20,000 copies, it has been cited more than 850 times in MathSciNet, and it has 5,000 citations recorded by Google Scholar. However, these numbers do not really tell the story. Ideals, Varieties, and Algorithms was chosen for the Leroy P. Steele Prize for Mathematical Exposition because it is a rare book that does it all. It is accessible to undergraduates. It has been a source of inspiration for thousands of students of all levels and backgrounds. Moreover, its presentation of the theory of Groebner bases has done more than any other book to popularize this topic, to show the powerful interaction of theory and computation in algebraic geometry, and to illustrate the utility of this theory as a tool in other sciences.”

Types of variables you need to know to understand thermodynamics

I’ve been through the first 200 pages of Dill’s book “Molecular Driving Forces” (2003), which is all about thermodynamics and statistical mechanics, things that must be understood to have any hope of understanding cellular biophysics. There are a lot of variables to consider (with multiple names for some) and they fall into 7 non-mutually exclusive types.

Here they are, with a few notes about each.

1. Thermodynamic state variables: These are the classics — Entropy (S), Internal Energy (U), Helmholtz Free Energy (F), Gibbs Free Energy (G), Enthalpy (H).
All are continuous functions of their natural variables (see next) so they can be differentiated. Their differentials are exact.

2. Natural variables of a thermodynamic state variable — these are continuous variables such that, when an extremum (maximum or minimum) of the state variable with respect to them is found, the state function won’t change with time (i.e., the system is at equilibrium). Here they are for the 5 state functions. T is temperature, V is volume, N is number of molecules, S and U are what you think, and p is pressure.

State Function — Symbol — Natural Variables
Helmholtz Free Energy — F — T, V, N
Entropy — S — U, V, N
Internal Energy — U — S, V, N
Gibbs Free Energy — G — T, p, N
Enthalpy — H — S, p, N

Note that U and S are both state variables and natural variables of each other. Note also (for maximum confusion) that Helmholtz free energy is not H but F, and that H is enthalpy, not Helmholtz free energy.

3. Extensive variables — can only be characterized by how much there is of them. This includes all 5 thermodynamic state variables (F, S, U, G, H) along with V (volume) and N (number of molecules). Extensive variables are also known as degrees of freedom.

4. Intensive variables — temperature, pressure, and ratios of state and natural variables (actually the derivative of a state variable with respect to a natural variable — temperature is actually defined this way: T = (partial U / partial S)).

5. Control variables — these are under the experimenter’s control, and are usually kept constant. They are also known as constraints, and most are intensive (volume isn’t). Examples: constant temperature, constant volume, constant pressure.

6. Conjugate variables — here we need the total differential of a state variable (which exists for all of them) in terms of its natural variables to understand what is going on.

Since U is a continuous function of each of S, V, and N

we have

dU = (partial U / partial S) dS + (partial U / partial V) dV + (partial U / partial N) dN

= T dS – p dV + mu dN ; mu is the chemical potential

So T is conjugate to S, p is conjugate to V, and mu is conjugate to N ; note that each pair of conjugates has one intensive variable (T, p, mu) and one extensive one ( S, V, N). Clearly the derivatives ( T, p, mu) are intensive.

7. None of the above — work (w) and heat (q)

Thermodynamics can be difficult to master unless these are clear. Another source of difficulty is that what you really want is to maximize (S) or minimize (U, H, F, G) the state variables — the problem is you have no way to directly measure the two crucial ones (U, S), and have to infer what they are from various derivatives and control variables. You can measure changes in S and U between two temperatures by using heat capacities. That’s just like spectroscopy, where all you measure is the difference between energy levels, not the energy levels themselves. But it is the minimum values of U, G, H, F and the maximum value of S which determine what you want to know.
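For example, with a constant-volume heat capacity Cv that can be treated as constant between the two temperatures (an assumption; with a temperature-dependent Cv you would integrate numerically instead), the changes are one-liners:

```python
import math

def delta_U(Cv, T1, T2):
    """Change in internal energy: dU = Cv dT, so with constant Cv,
    Delta U = Cv * (T2 - T1)."""
    return Cv * (T2 - T1)

def delta_S(Cv, T1, T2):
    """Change in entropy: dS = Cv dT / T, so with constant Cv,
    Delta S = Cv * ln(T2 / T1)."""
    return Cv * math.log(T2 / T1)
```

Only the differences are accessible this way; the absolute values of U and S are not.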

There’s more to come about Dill’s book. I’ve found a few mistakes and have corresponded with him about various things that seem ambiguous (to me at least). As mentioned earlier, in grad school 44 years ago, I audited a statistical mechanics course taught by E. Bright Wilson himself. I never had the temerity to utter a word to him. How things have changed for the better, to be able to Email an author and get a response. He’s been extremely helpful and patient.

The pleasures of enough time

One of the joys of retirement is the ability to take the time to fully understand the math behind statistical mechanics and thermodynamics (on which large parts of chemistry are based — cellular biophysics as well). I’m going through some biophysics this year reading “Physical Biology of the Cell” 2nd Edition and “Molecular Driving Forces” 2nd Edition. Back in the day, what with other courses, research, career plans and hormones to contend with, there just wasn’t enough time.

To really understand the derivation of the Boltzmann equation, you must understand Lagrange multipliers, which requires an understanding of the gradient and where it comes from. To understand the partition function you must understand change of variables in an integral, and to understand that you must understand why the determinant of the Jacobian matrix of a set of independent vectors is the volume multiplier you need.
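For orientation, the Lagrange multiplier step in the Boltzmann derivation can be compressed to a few lines (a sketch, suppressing Boltzmann’s constant): maximize the entropy S = – Σ pᵢ ln pᵢ subject to normalization and fixed average energy:

```latex
\mathcal{L} = -\sum_i p_i \ln p_i \;-\; \alpha\Big(\sum_i p_i - 1\Big) \;-\; \beta\Big(\sum_i p_i E_i - U\Big)

\frac{\partial \mathcal{L}}{\partial p_i} = -\ln p_i - 1 - \alpha - \beta E_i = 0
\quad\Longrightarrow\quad
p_i = \frac{e^{-\beta E_i}}{\sum_j e^{-\beta E_j}}
```

Normalization fixes the multiplier alpha, and the denominator is the partition function.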

These were all math tools whose use was fairly simple and which didn’t require any understanding of where they came from. What a great preparation for a career in medicine, where we understood very little of why we did the things we did, not because of lack of time but because the deep understanding of the systems we were mucking about with simply didn’t (and doesn’t) exist. It was intellectually unsatisfying, but you couldn’t argue with the importance of what we were doing. Things are better now with the accretion of knowledge, but if we really understood things perfectly we’d have effective treatments for cancer and Alzheimer’s. We don’t.

But in the pure world of math, whether a human creation or existing outside of us all, this need not be accepted.

I’m not going to put page after page of derivation of the topics mentioned in the second paragraph, but will mention a few things worth knowing which might help you when you’re trying to learn about them, and point you to books (with page numbers) that I’ve found helpful.

Let’s start with the gradient. If you remember it at all, you know that it’s a way of taking a continuous real valued function of several variables and making a vector of it. The vector has the miraculous property of pointing in the direction of greatest change in the function. How did this happen?

The most helpful derivation I’ve found was in Thomas’ textbook of calculus (9th Edition, pp. 957 and following). Yes, Thomas — the same book I used as a freshman 60 years ago! Like most living things that have aged, it’s become fat. Thomas is now up to the 13th edition.

The simplest example of a continuous real valued function is a topographic map. Thomas starts with the directional derivative — which is how the function height(north, east) changes in the direction of a vector whose absolute value is 1. That’s the definition — to get something you can actually calculate, you need to know the chain rule, and how to put a path on the topo map. The derivative of the real valued function in the direction of a unit vector turns out to be the dot product of the gradient vector and any vector at that point whose absolute value is 1. The unit vector can point in any direction, but the value of the derivative (the dot product) will be greatest when the unit vector points in the direction of the gradient vector. That’s where the magic comes from. If you’re slightly shaky on linear algebra, vectors and dot products — here’s a (hopefully explanatory) link to some basic linear algebra. This is the first in a series — just follow the links.
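
You can watch the magic happen numerically. This is my own toy sketch (the function f(x, y) = x² + 3y is made up, not from Thomas): compute the directional derivative as the dot product of the gradient with a unit vector, scan all directions, and confirm the maximum lands on the gradient direction with value |grad f|.

```python
from math import cos, sin, pi, sqrt

def grad_f(x, y):
    """Gradient of the toy function f(x, y) = x**2 + 3*y."""
    return (2 * x, 3.0)

def directional_derivative(x, y, theta):
    """Derivative of f at (x, y) in the unit direction (cos theta, sin theta):
    the dot product of grad f with the unit vector."""
    gx, gy = grad_f(x, y)
    return gx * cos(theta) + gy * sin(theta)

# Scan 3600 directions at the point (1, 1); the maximum should be
# |grad f| = sqrt(2**2 + 3**2) = sqrt(13), attained along the gradient.
vals = [directional_derivative(1.0, 1.0, 2 * pi * k / 3600) for k in range(3600)]
print(max(vals), sqrt(13))  # the two numbers nearly agree
```

The scan is the numerical version of the statement above: the dot product is largest when the unit vector lines up with the gradient.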

The discussion of Lagrange multipliers (which is essentially the relation between two gradients — one of the function being optimized, the other of the constraint) in Dill pp. 68–72 is only fair, and I did a lot more work to understand it (which can’t be reproduced here).
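
Here is a toy illustration of the gradient relation (my example, not Dill’s): maximize f(x, y) = x·y subject to the constraint g(x, y) = x + y − 10 = 0. At the constrained maximum the two gradients must be parallel — that is, grad f = λ · grad g for some multiplier λ.

```python
# Brute-force the constrained maximum, then check the Lagrange condition:
# at the optimum, grad f and grad g are parallel (their 2-D cross
# product vanishes).
def grad_f(x, y):
    return (y, x)        # partial derivatives of f = x*y

def grad_g(x, y):
    return (1.0, 1.0)    # partial derivatives of g = x + y - 10

# On the constraint, y = 10 - x; scan x from 0 to 10 for the maximum of f.
best_x = max((k / 1000.0 for k in range(10001)), key=lambda x: x * (10 - x))
best_y = 10 - best_x
gf, gg = grad_f(best_x, best_y), grad_g(best_x, best_y)
cross = gf[0] * gg[1] - gf[1] * gg[0]   # zero when the gradients are parallel
print(best_x, best_y, cross)            # 5.0 5.0 0.0
```

The cross product vanishing at (5, 5) is exactly the “two gradients” relation; λ here is 5.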

For an excellent discussion of the wedge product and why the volume multiplier in an integral must be the determinant of the Jacobian — see Callahan, Advanced Calculus, p. 41 and exercise 2.15 on p. 61, the latter being the most important. It explains why things work this way in 2 dimensions. The exercise takes you through the derivation step by step, asking you to fill in some fairly easy dots. Even better is exercise 2.34 on p. 67, which proves the same thing for any collection of n independent vectors in R^n.

The Jacobian is just the determinant of a square matrix, something familiar from linear algebra. The entries of the matrix are just the partial derivatives of the new coordinates with respect to the old, evaluated at a given point. But in integrals we’re changing dx and dy to something else — dr and dTheta when you go to polar coordinates. Why a matrix here? Because if differential calculus is about anything, it is about linearization of nonlinear functions, which is why you can use a matrix of derivatives (the Jacobian matrix) for dx and dy.
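
A quick numerical sketch of the volume-multiplier claim (my own check, not Callahan’s derivation): push a tiny (dr, dTheta) rectangle through the polar map (x, y) = (r cos θ, r sin θ) and compare the area of its image with det(Jacobian) = r times dr·dTheta.

```python
from math import cos, sin, pi

def polar_to_xy(r, t):
    """The polar-coordinate map (r, theta) -> (x, y)."""
    return (r * cos(t), r * sin(t))

r, t, dr, dt = 2.0, pi / 6, 1e-5, 1e-5
p0 = polar_to_xy(r, t)
p1 = polar_to_xy(r + dr, t)        # image of the edge along dr
p2 = polar_to_xy(r, t + dt)        # image of the edge along dTheta
e1 = (p1[0] - p0[0], p1[1] - p0[1])
e2 = (p2[0] - p0[0], p2[1] - p0[1])
# Area of the image parallelogram via the 2-D cross product of its edges.
area = abs(e1[0] * e2[1] - e1[1] * e2[0])
ratio = area / (dr * dt)
print(ratio)  # approximately r = 2.0, the Jacobian determinant of the polar map
```

The tiny rectangle’s area gets multiplied by r — which is exactly why dx dy becomes r dr dTheta.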

Why is this important for statistical mechanics? Because one of the integrals you must evaluate is that of exp(-ax^2) from -infinity to +infinity, and the switch to polar coordinates is the way to do it. You also must evaluate integrals of this type to understand the kinetic theory of ideal gases.
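
The polar-coordinate trick gives the closed form sqrt(pi/a) for that integral (square the integral, switch to polar coordinates, and the r from the Jacobian makes it elementary). Here’s a crude numerical check of the closed form — my sketch, just a midpoint-rule sum, not anything from the books cited:

```python
from math import exp, sqrt, pi

def gaussian_integral(a, lo=-50.0, hi=50.0, n=200000):
    """Midpoint-rule estimate of the integral of exp(-a*x**2) over the
    real line (the tails beyond |x| = 50 are negligible for a ~ 1)."""
    h = (hi - lo) / n
    return sum(exp(-a * (lo + (k + 0.5) * h) ** 2) for k in range(n)) * h

a = 2.0
print(gaussian_integral(a), sqrt(pi / a))  # both about 1.2533
```

The numerical estimate and sqrt(pi/a) agree to several decimal places, which is as close as a sum gets to vouching for the polar-coordinate derivation.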

Not necessary in this context, but one of the best discussions of the derivative in its geometric context I’ve ever seen is on pp. 105–106 of Callahan’s book.

So these are some pointers and hints, not a full discussion — I hope it makes the road easier for you, should you choose to take it.


A book recommendation, not a review

My first encounter with a topology textbook was not a happy one. I was in grad school, knowing I’d leave in a few months to start med school, with plenty of time on my hands and enough money to do what I wanted. I’d always liked math and had taken calculus, including advanced calculus and differential equations, in college. Grad school and quantum mechanics meant more differential equations, series solutions of same, matrices, eigenvectors and eigenvalues, etc. etc. I liked the stuff. So I’d heard topology was cool — Mobius strips, Klein bottles, wormholes (from John Wheeler) etc. etc.

So I opened a topology book to find on page 1

A topology is a set with certain selected subsets called open sets satisfying two conditions
1. The union of any number of open sets is an open set
2. The intersection of a finite number of open sets is an open set

Say what?

In an effort to help, on page two the book provided another definition

A topology is a set with certain selected subsets called closed sets satisfying two conditions
1. The union of a finite number of closed sets is a closed set
2. The intersection of any number of closed sets is a closed set

Ghastly. No motivation. No idea where the definitions came from or how they could be applied.
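
The one thing those page-1 axioms do have going for them is that they’re mechanically checkable. Here’s a small sketch of my own (the sets are made up, not from any book) that tests the open-set axioms on a three-point set — for a finite collection, checking pairwise unions and intersections is enough:

```python
def is_topology(X, opens):
    """Check the open-set axioms for a finite collection of subsets of X:
    empty set and X are open, and pairwise unions / intersections of
    open sets are open (pairwise suffices for a finite collection)."""
    opens = set(opens)
    if frozenset() not in opens or frozenset(X) not in opens:
        return False
    for A in opens:
        for B in opens:
            if A | B not in opens:     # unions of open sets must be open
                return False
            if A & B not in opens:     # intersections must be open
                return False
    return True

X = {1, 2, 3}
T = {frozenset(), frozenset({1}), frozenset({1, 2}), frozenset(X)}
print(is_topology(X, T))   # True
# Fails: {1} union {2} = {1, 2} is missing from the collection.
bad = {frozenset(), frozenset({1}), frozenset({2}), frozenset(X)}
print(is_topology(X, bad))  # False
```

Of course, passing the checker tells you nothing about *why* the axioms look this way — which is exactly the complaint.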

Which brings me to “An Introduction to Algebraic Topology” by Andrew H. Wallace. I recommend it highly, even though algebraic topology is just a branch of topology and fairly specialized at that.


Because in a wonderful, leisurely and discursive fashion, he starts out with the intuitive concept of nearness, applying it to classic analytic geometry of the plane. He then moves on to continuous functions from one plane to another, explaining why they must preserve nearness. Then he abstracts what nearness must mean in terms of the classic Pythagorean distance function. Topological spaces are first defined in terms of nearness and neighborhoods, and only after 18 pages does he define open sets in terms of neighborhoods. It’s a wonderful exposition, explaining why open sets must have the properties they have. He doesn’t even get to algebraic topology until p. 62, explaining point set topological notions such as connectedness, compactness, homeomorphisms etc. etc. along the way.

This is a recommendation, not a review, because I’ve not read the whole thing. But it’s a great explanation for why the definitions in topology must be the way they are.

It won’t set you back much — I paid $12.95 for the Dover edition (not sure when).