Category Archives: Math

Axiomatize This !

“Analyze This” is a very funny 1999 sendup of the Mafia and psychiatry with Robert De Niro and Billy Crystal.  For some reason the diagram on p. 7 of Barrett O’Neill’s book “Elementary Differential Geometry” (revised 2nd edition, 2006) made me think of it.

O’Neill’s book was highly recommended by the wonderful “Visual Differential Geometry and Forms” by Tristan Needham — as “the single most clear-eyed, elegant and (ironically) modern treatment of the subject available — present company excepted !”

So O’Neill starts by defining a point as an ordered triple of real numbers.  Then he defines R^3 as the set of such points, along with the ability to add them and to multiply them by real numbers.

O’Neill then defines a tangent vector (written v_p) as a pair of points p and v in R^3, where p is the point of application (aka the tail of the tangent vector) and v is its vector part (which determines the tip of the tangent vector).

All terribly abstract but at least clear and unambiguous until he says — “We shall always picture v_p as the arrow from the point p to the point p + v”.

The picture is a huge leap and impossible to axiomatize (hence “Axiomatize This !”).   Actually the (mental) picture came first and gave rise to all these definitions and axioms.

The picture is figure 1.1 on p. 7 — it’s a stick figure of a box shaped like an orange crate sitting in a drawing of R^3 with 3 orthogonal axes (none of which is or can be axiomatized).  p sits at one vertex of the box, and p + v at another.  An arrow is drawn from p to p + v (with a barb at p + v) which is then labeled v_p.  Notice also that the point v appears nowhere in the diagram.

What the definitions and axioms are trying to capture is our intuition of what a (tangent) vector really is.

So on p. 7 what are we actually doing?  We’re looking at a plane in visual R^3 with a bunch of ‘straight’ lines on it.  Photons from that plane go to our (nearly) spherical eye, which clearly is no longer a plane.  My late good friend Peter Dodwell, psychology professor at Queen’s University in Ontario, told me that the retinal image actually preserves angles of the image (i.e., it’s conformal). 1,000,000 nerve fibers from each eye go back to our brain (don’t try to axiomatize them).   The information each fiber carries is far more processed than that of a single pixel (retinal photoreceptor), but that’s another story, and perhaps one that could be axiomatized with a lot of work.

100 years ago Wilder Penfield noted that blood flowing through a part of the brain which was active looked red rather than blue (because it contained more oxygen).  That’s the way the brain appears to work.  Any part of the brain doing something gets more blood flow than it needs, so it can’t possibly suck out all the oxygen the blood carries.  Decades of work and zillions of researchers have studied the mechanisms by which this happens.  We know a lot more, but still not enough.

Today we don’t have to open the skull as Penfield did, but just do a special type of Magnetic Resonance Imaging (MRI) called functional MRI (fMRI) to watch changes in vessel oxygenation (or lack of it) as conscious people perform various tasks.

When we look at that simple stick figure on p. 7, roughly half of our brain lights up on fMRI, to give us the perception that that stick figure really is something in 3 dimensional space (even though it isn’t).  Axiomatizing that would require us to know what consciousness is (which we don’t) and trace it down to the activity of billions of neurons and trillions of synapses between them.

So what O’Neill is trying to do, is tie down the magnificent Gulliver which is our perception of space with Lilliputian strands of logic.

You’ve got to admire mathematicians for trying.

Book review: The Theoretical Minimum (volume 1)

Volume I of the Theoretical Minimum by Leonard Susskind is a book I wish I had 61 years ago, although I doubt that I would have had the time for it then that I do now.  My educational and social background would have uniquely suited me for it.

Start even earlier, in the fall of 1957: Freshman Physics for premeds and engineers at Princeton, taught by none other than John Wheeler, typical of the way Princeton didn’t reserve its star faculty for graduate students (unlike Harvard).  As an 18 year old from a small high school, terrified of calculus and worried that I wasn’t smart enough, I turned down an offer to move up to the advanced physics and math classes after doing fairly well on the first physics test.  We studied Newton’s laws, some thermodynamics, electricity and magnetism (but I don’t recall the Maxwell equations). What I do remember is Wheeler bringing in Niels Bohr to talk to the class (actually he appeared to mumble in Danish).

Fast forward to the spring of 1961 and grad school in Chemistry at Harvard and the quantum mechanics course, given not to teach us much physics, but to give us a solid introduction to the quantum numbers describing atomic orbitals by solving the Schrodinger equation.

What follows is a long detour through how we did it.  Feel free to skip to the **** for the main thread of this post.

Recursion relations are no stranger to the differential equations course, where you learn to (tediously) find them for a polynomial series solution for the differential equation at hand. I never really understood them, but I could use them (like far too much math that I took back then).

So it wasn’t a shock when the QM instructor back then got to them in the course of solving the hydrogen atom (with its radially symmetric potential). First the equation had to be expressed in spherical coordinates (r, theta and phi), which made the Laplacian look rather fierce. Then the equation was split into 3, each involving one of r, theta or phi. The easiest to solve was the one involving phi, which involved only a complex exponential. But the periodic nature of the solution made the magnetic quantum number fall out. Pretty good, but nothing earthshaking.

Recursion relations made their appearance with the series solutions of the radial and the theta equations. So it was plug and chug time with series solutions and recursion relations so things wouldn’t blow up (or as Dr. Gouterman put it, the electron has to be somewhere, so the wavefunction must be zero at infinity). MEGO (My Eyes Glazed Over) until all of a sudden there were the principal quantum number (n) and the azimuthal quantum number (l) coming directly out of the recursions.
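(For the curious, here is roughly what that looks like in a standard modern treatment, a sketch following current textbooks rather than the exact substitutions we used in 1961.  Write the radial solution as a power series; the Schrodinger equation then forces the coefficients to obey the recursion

c_(j+1) = [ 2(j + l + 1 - n) / ((j + 1)(j + 2l + 2)) ] * c_j

If the series never terminates, the wavefunction blows up at infinity, and the electron has to be somewhere, as Dr. Gouterman said.  So some numerator must vanish: j + l + 1 - n = 0 for some nonnegative integer j, which forces n to be an integer greater than l.  The principal and azimuthal quantum numbers drop straight out of the recursion.)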

When I first realized what was going on, it really hit me. I can still see the room and the people in it (just as people can remember exactly where they were and what they were doing when they heard about 9/11 or (for the oldsters among you) when Kennedy was shot — I was cutting a physiology class in med school). The realization that what I had considered mathematical diddle, in some way was giving us the quantum numbers and the periodic table, and the shape of orbitals, was a glimpse of incredible and unseen power. For me it was like seeing the face of God and the closest thing to a religious experience I’ve ever had.

****

So in the quantum mechanics course it was Lagrangians and Hamiltonians, stuff I’d never been exposed to.  My upbringing had trained me long before college to mouth incantations in a language I didn’t understand and convince people (but not myself) that I did, and I felt this way about a lot of math, so H = T + V and L = T – V was no problem at all.  I decided to audit a mechanics course being given to understand what H and L were all about, but the (intentionally nameless) prof was an obnoxious example of a Harvard professor showing how smart he was and how dumb you were.  So I quit and remained ignorant of what H and L were really all about until Susskind’s book.

I read it on a 16 day trip to Iceland, about 1 chapter a day, thinking about the contents as we drove the 800 mile ring road, then going back and reading the chapters again and again.  Obviously, this was not something I had the time for as a grad student.

The book is marvelous, and clear.  Although informal and full of jokes, it was “not written for airheads” as the authors say.  At the end you will understand why the Lagrangian and Hamiltonian were invented (to make the solution of Newton’s equations of motion easier).  You will see the action explained (but not its origin, which is saved for another volume on quantum mechanics) and the Euler-Lagrange equation derived.  Poisson brackets appear, and are explained, and look very much like the commutator of quantum mechanics.  Failure of commutation is widespread throughout math and physics, and the failure of two infinitesimal paths to commute when applied sequentially is what curvature is all about.

Now with a better understanding of the Action and the Lagrangian under my belt, I’ll have to reread a lot of stuff, particularly Tony Zee’s book “Quantum Field Theory, as Simply as Possible”.

The following might be skipped unless you’re interested in how I became expert in mouthing incantations in a language I didn’t understand, and later used it in college and grad school.

The Chinese room argument was first published in a 1980 article by American philosopher John Searle. He imagines himself alone in a room following a computer program for responding to Chinese characters slipped under the door. Searle understands nothing of Chinese, and yet, by following the program for manipulating symbols and numerals just as a computer does, he sends appropriate strings of Chinese characters back out under the door, and this leads those outside to mistakenly suppose there is a Chinese speaker in the room.

So it was with me and math as an undergraduate, due to a history dating back to age 10.  I hit college being very good at manipulating symbols whose meaning I was never given to understand.  I grew up 45 miles from the nearest synagogue.  My fanatically religious grandfather thought it was better not to attend services at all than to drive up there on the Sabbath.  My father was a young lawyer building a practice, and couldn’t close his office on Friday.   So my father taught me how to read Hebrew letters and reproduce how they sound, so I could read from the Torah at my Bar Mitzvah (which I did, comprehending nothing).  Since I’m musical, learning the cantillations under the letters wasn’t a problem.
Thanks to Susskind I no longer feel that way about Hamiltonians, Lagrangians and Action.

MegaMillions and the Gambler’s fallacy

What better time to discuss the gambler’s fallacy than tonight 150 minutes or so before MegaMillions lets us know which set of 5 numbers will win over a billion dollars (before taxes).

People generally fail to produce truly random sequences, overusing alternating patterns and avoiding repeating ones — the gambler’s fallacy that repeating sequences are rare (hence improbable).

Although the gambler’s fallacy is just that, a fallacy, it has a neural basis in terms of the way we think [ Proc. Natl. Acad. Sci. vol. 112 pp. 3788 – 3792 ’23 ].

Here’s why.  There is a surprising amount of systematic structure lurking within random sequences.  Let’s toss a fair coin where the probability of a head or a tail is exactly .5.

 

Record the average amount of time (number of tosses) for a pattern to first occur in a sequence (the waiting time). It is significantly longer for a repetition (head head — HH, or tail tail — TT) than for an alternation, HT or TH.  The waiting time for HH or TT is 6 tosses, while that for HT or TH is only 4 tosses.  This is in spite of the fact that repetitions and alternations are equally probable (occurring once every four tosses — the same mean time statistics).

This is because repetitions are more bunched in time; they come in bursts, with greater spacing between the bursts, compared with alternations.  So repetitions appear less common, hence (to the gambler) less probable.

The difference comes from the fact that repetitions can build on each other (the sequence HHH contains two instances of HH), while alternations cannot.  What differs is the variance in the distribution of the interarrival times of the patterns, which is larger for repetitions than for alternations.
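If you want to check this yourself, here’s a minimal Python sketch (my own, not taken from the PNAS paper) that estimates the mean waiting time for a repetition (HH) versus an alternation (HT):

```python
import random

def mean_waiting_time(pattern, trials=100_000):
    """Average number of tosses of a fair coin until `pattern` first appears."""
    total = 0
    for _ in range(trials):
        history = ""
        while not history.endswith(pattern):
            history += random.choice("HT")
        total += len(history)
    return total / trials

print("HH:", mean_waiting_time("HH"))  # about 6 tosses
print("HT:", mean_waiting_time("HT"))  # about 4 tosses
```

Run it and the 6 versus 4 difference shows up immediately, even though HH and HT each occur, on average, once every four tosses.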

So there is a logical basis for the gambler’s fallacy.  Which reminds me.  Did you buy your ticket yet?  Time’s a-wasting.

James Hartle R. I. P.

Jim Hartle, one of the smartest guys in my college class, has died, and of Alzheimer’s disease, showing once again that intelligence does not absolutely protect against Alzheimer’s (although the more educated you are, the less likely you are to get it [ Int. J. Epidemiol. vol. 49, issue 4, August 2020, pp. 1163–1172 ]).

He studied with John Wheeler as a Princeton undergraduate, got his PhD with Murray Gell-Mann and worked so extensively with Stephen Hawking that he was asked to speak at Hawking’s funeral.

My total person to person contact with Jim may have lasted 10 minutes (at my 50th reunion).  I knew all sorts of physics majors in the class, but he wasn’t one of them.  I knew about him only because I read our 50th reunion book, and found out how distinguished he was.  I found him relaxed, friendly and far from overbearing, unlike some of the physics majors I knew.
This was typical of just about everyone at the 50th.  A classmate’s wife (from Chile) described classmates at the 25th as a bunch of roosters.
We did correspond a bit, and he did send me the answer sheets to the problems in his book on Gravity (which I’ve never gone through, preferring to get the math under my belt first rather than mouth various incantations which I didn’t understand).  Jim is the reason I started studying Math and Physics in earnest, hoping to have something intelligent to say to him at the next reunion.  He wasn’t present at the 55th, COVID-19 ended the 60th, and now he’s gone.
Two more examples of brilliant men you might know of who died of Alzheimer’s are Daniel Quillen (Harvard ’61), who won the Fields Medal, and Claude Shannon.

The physics department at the University of California, Santa Barbara has an obituary describing much of his work:

https://www.physics.ucsb.edu/news/announcement/2132

How far we’ve come from the McCulloch-Pitts neuron

The McCulloch-Pitts neuron was described in 1943.  It consists of a bunch of inputs (dendrites), some excitatory, some inhibitory, which are simply summed (integrated), the result determining the output (whether the axon of the neuron fires or doesn’t).  Hooking such neurons together could instantiate a variety of Boolean functions and ultimately a Turing machine.

The McCulloch-Pitts neuron really isn’t that far from the ‘neurons’ in the neural nets which underlie the spectacular achievements of artificial intelligence (ChatGPT etc. etc.)   The neuron of the neural net is nothing more than a set of inputs, a set of weights, and an activation function. It translates those inputs into a single output, which can then be picked up as input by another layer of neurons later on.

The major difference between the computations of a linked bunch of neurons in the two models (McCulloch-Pitts and neural net) is that in McCulloch-Pitts the same set of inputs always gives the same output, while in a neural net it doesn’t.  The reason is that the weights on the inputs to each neuron in the net can be (and are) adjusted, depending on how close the output of the net is to the target (which in the case of ChatGPT is how accurately it predicts the next word in a sample of text).
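To make the contrast concrete, here is a toy sketch of my own (not taken from any of the papers discussed) of the two kinds of ‘neuron’:

```python
def mcculloch_pitts(inputs, threshold):
    """McCulloch-Pitts: sum the inputs (+1 excitatory, -1 inhibitory) and
    fire if the sum reaches the threshold. Nothing in here is ever adjusted."""
    return 1 if sum(inputs) >= threshold else 0

def net_neuron(inputs, weights, bias):
    """Neural-net neuron: a weighted sum passed through an activation function
    (here ReLU). The weights and bias are what training adjusts."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return max(0.0, total)

x = [1.0, 1.0, -1.0]

# The McCulloch-Pitts output is fixed forever by the wiring:
print(mcculloch_pitts([1, 1, -1], threshold=2))            # 0, every time

# The net neuron's output changes as training adjusts the weights:
print(net_neuron(x, weights=[0.5, 0.5, 0.5], bias=0.0))    # 0.5
print(net_neuron(x, weights=[0.9, 0.8, 0.1], bias=0.0))    # 1.6, after a weight update
```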

There is a huge debate going on as to whether ChatGPT and similar neural nets understand what they are doing and whether they are/will become conscious.

So does ChatGPT explain how our brains do what they do?  Not at all.  Our neurons are doing far more than integrating input and firing.  This was brought home in a paper focused on something entirely different, the gamma oscillations of brain electrical activity (Neuron vol. 111 pp. 936 – 953 ’23).  People have been studying brain rhythms since Hans Berger discovered the alpha rhythm just shy of a century ago.  The electroencephalogram (EEG) measures the various rhythms as they occur over the brain.  Back in the day when I was starting out in neurology (1967), it was one of the few diagnostic tools we had.  It wasn’t very good, and a cynical attending described it as useless but not worthless (because you could charge for it).

The gray matter of the surface of our brains (cerebral cortex) is gray because it is packed with the cell bodies of neurons — some 100,000 under each square millimeter of cortex.  Somehow they are wired together so that they can produce coherent rhythmic electrical activity as they fire.

The best place to study how a bunch of neurons produces rhythms is the hippocampus, an area crucial in forming memories and one of the earliest places the senile plaques of Alzheimer’s disease show up.

Unlike the jumble of neurons in the cortex, the large neurons of the hippocampus are all lined up and oriented the same way, like trees in a forest.  All the cell bodies lie in roughly the same layer, with the major dendrite (the apical dendrite) going up like the trunk of a tree, and the dendrites near the cell body spreading out like the roots.

Technology has marched on, and it is now possible to fashion electrodes, which can measure neuronal electrical activity along the trunk, and watch it in real time.

Figure 2b on p. 941 shows that different parts of the trunk of these hippocampal neurons show rhythmic activity at different frequencies at any given time.  Not only that, but as time passes each area of the trunk (the apical dendrite) changes the frequency of its rhythmic activity.  This is light years away from the integrate-and-fire model of McCulloch-Pitts, or the adjustment of weights on the inputs to the neurons of a neural net.

It shows that each of these neurons is a complex processor of information (a computer if you will).  Even though artificial intelligence has made great strides, it really isn’t telling us how the brain does what it does.

Finally if you want to see what genius looks like, check out the life of Walter Pitts — https://en.wikipedia.org/wiki/Walter_Pitts  — corresponding with Bertrand Russell about Principia Mathematica at age 12, studying with Carnap at the University of Chicago at 15, all while he was homeless.

 

What does (∂h/∂x)dx + (∂h/∂y)dy + (∂h/∂z)dz really mean?

We’ve all seen (∂h/∂x)dx + (∂h/∂y)dy + (∂h/∂z)dz many times and have used it to calculate without understanding what the various symbols actually mean.  In my case, it’s just another example of mouthing mathematical incantations without understanding them, something I became very good at at a young age — see https://luysii.wordpress.com/2022/06/27/the-chinese-room-argument-understanding-math-and-the-imposter-syndrome/ for the gory details.

And now, finally, within a month of my 85th birthday, I finally understand what’s going on by reading only the first 25 pages of “Elementary Differential Geometry” revised second edition 2006 by Barrett O’Neill.

I was pointed to it by the marvelous Visual Differential Geometry by Tristan Needham, about which I’ve written 3 posts — this link has references to the other two — https://luysii.wordpress.com/2022/03/07/visual-differential-geometry-and-forms-q-take-3/

He describes O’Neill’s book as follows.  “First published in 1966, this trail-blazing text pioneered the use of Forms at the undergraduate level.  Today more than a half-century later, O’Neill’s work remains, in my view the single most clear-eyed, elegant and (ironically) modern treatment of the subject available — present company excepted! — at the undergraduate level”

It took a lot of work to untangle the notation (typical of all works in Differential Geometry). There is an old joke that “differential geometry is the study of properties that are invariant under change of notation”, which is funny because it is so close to the truth (John M. Lee).

So armed with no more than calculus 101, knowing what a vector space is,  and a good deal of notational patience, the meaning of (∂h/∂x)dx + (∂h/∂y)dy + (∂h/∂z)dz (including what dx, dy and dz really are) should be within your grasp.

We begin with R^3, the set of triples of real numbers (a_1, a_2, a_3) (where _ means that 1, 2, 3 are taken as subscripts). Interestingly, these triples aren’t vectors to O’Neill; vectors will be defined shortly.  All 3 components of a triple can be multiplied by a real number c, giving (c*a_1, c*a_2, c*a_3), and pairs of triples can be added.  This makes R^3 a vector space (which O’Neill calls Euclidean 3-space), whose elements are triples (which O’Neill calls points).  But that is not how O’Neill defines a vector, which is a pair of points p = (p_1, p_2, p_3) and v = (v_1, v_2, v_3) — we’ll see why shortly.

A tangent vector to R^3 at the point p (written v_p) is defined as an ordered pair of points (p, v) where

p is the point of application of v_p (aka the tail of v_p)

v is the vector part of v_p (aka the tip of v_p)

It is visualized as an arrow whose tail is at p and whose tip (barb) is at  p + v (remember you are allowed to add points).  In the visualization of v_p, v does not appear.

The tangent space of R^3 at p is written T_pR^3 and is the set of vectors (p, v) such that p is constant and v varies over all possible points.

Each p in R^3 has its own tangent space, and tangent vectors in different tangent spaces can’t be added.

Next up functions.

A real value function on R^3 is written

f :  R^3 –> R^1 (the real numbers)

f : (a_1, a_2, a_3) |—> c (some real number)

This is typical of the way functions are written in more advanced math, with the first line giving the domain (R^3) of the function and the range of the function (R^1) and the second line giving what happens to a typical element of the domain on application of the function to it.

O’Neill assumes that all the functions on domain R^3 have continuous derivatives of all orders.  So the functions are smooth, differentiable or C^infinity — take your pick — they are equivalent.

The assumption of differentiability means that you have some mechanism for seeing how close two points are to each other.  He doesn’t say it until later, but this assumes the usual distance metric using the Pythagorean theorem — if you’ve taken calc. 101 you know what these are.

For mental visualization it’s better to think of the function as going from R^2 (x and y variables, i.e., the Euclidean plane) to the real numbers.  This is the classic topographic map, which tells how high over the ground you are at each point.

Now at last we’re getting close to (∂h/∂x)dx + (∂h/∂y)dy + (∂h/∂z)dz.

So now you’re on a ridge ascending to the summit of your favorite mountain.  The height function tells you how high you are where you’re standing (call this point p), but what you really want to know is which way to go to get to the peak.  You want to find a direction in which height is increasing.   Enter the directional derivative (of the height function).  Clearly height drops off on either side of the ridge and increases or decreases along the ridge.   Equally clearly there is no single directional derivative here (as there would be for a function g : R^1 –> R^1).  The directional derivative depends on p (where you are) and on v, the direction you choose — this is why O’Neill defines tangent vectors by two points (p and v)

So the directional derivative requires two functions

the height function h : R^3 –> R^1

the direction function f : t |—> p + t*v where t is in R^1.  This gives a line through p going in direction v

So the directional derivative of  h at p is

d/dt  (h (p + t*v)) | _t = 0  ; that is, take the derivative of h(p + t*v) with respect to t and evaluate it at t = 0

Causing me a lot of confusion, O’Neill gives the directional derivative the following name v_p[h] — which gives you no sense that a derivative of anything is involved.  This is his equation

v_p[h] = d/dt  (h (p + t*v)) | _t = 0

Notice that changing p (say to the peak of the mountain) changes the directional derivative —  all of them point down.   This is why O’Neill defines tangent vectors using two points (p, v).

Now a few more functions and the chain rule and we’re about done.

x :    R^3                    –>  R^1

x : (v_1, v_2, v_3 ) |–>  v_1

similarly y : R^3 –> R^1 picks out the y coordinate of (v_1, v_2, v_3 ), i.e., v_2

Let’s look at p + t*v in coordinate form, remembering what p and v are that way

p + t*v  = ( p_1 + t * v_1, p_2 + t * v_2, p_3 + t * v_3)

Remember that we defined f(t) = p + t*v

so df/dt = d( p + t*v )/dt

expanding

df/dt = d( p_1 + t * v_1, p_2 + t * v_2, p_3 + t * v_3)/dt = (v_1, v_2, v_3)

Let’s be definite about what h : R^3 –> R^1 actually is

h : (x, y, z) |—> x^2 * y^3 *z ^4 meaning you must use partial derivatives

so ∂h/∂x = 2x * y^3 * z^4,  ∂h/∂y = 3x^2 * y^2 * z^4,  ∂h/∂z = 4x^2 * y^3 * z^3

Look at v_p[h] = d/dt  (h (p + t*v)) | _t = 0 again

It’s really v_p[h] = d/dt (h(f(t))) |_t = 0

so it’s time for the chain rule

d/dt (h(f(t))) = (dh/df) * (df/dt)

dh/df in coordinates is really

(∂h/∂x, ∂h/∂y,∂h/∂z)

df/dt in coordinates is really

(v_1, v_2, v_3)

But the chain rule is applied to each of the three coordinates

so, coordinate by coordinate, what you have is the three products  ∂h/∂x * v_1,  ∂h/∂y * v_2,  ∂h/∂z * v_3

I left one thing out: the |_t = 0.

So you plug in the numbers (evaluating everything at p, since t = 0 puts you at f(0) = p) and sum the three products, and what you get is

v_p[h] = ∂h/∂x * v_1 +  ∂h/∂y * v_2 +  ∂h/∂z * v_3
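Here’s a quick numerical sanity check (a sketch of my own) using the example h = x^2 * y^3 * z^4 from above: compute v_p[h] once from the partial derivatives and once directly from the definition d/dt (h(p + t*v)) at t = 0.

```python
def h(x, y, z):
    return x**2 * y**3 * z**4

def directional_derivative_from_partials(p, v):
    """v_p[h] = dh/dx * v_1 + dh/dy * v_2 + dh/dz * v_3, partials evaluated at p."""
    x, y, z = p
    dhdx = 2 * x * y**3 * z**4
    dhdy = 3 * x**2 * y**2 * z**4
    dhdz = 4 * x**2 * y**3 * z**3
    return dhdx * v[0] + dhdy * v[1] + dhdz * v[2]

def directional_derivative_from_definition(p, v, dt=1e-6):
    """d/dt h(p + t*v) evaluated at t = 0, by a central difference."""
    plus = h(*(pi + dt * vi for pi, vi in zip(p, v)))
    minus = h(*(pi - dt * vi for pi, vi in zip(p, v)))
    return (plus - minus) / (2 * dt)

p, v = (1.0, 2.0, 1.0), (0.5, -1.0, 2.0)
print(directional_derivative_from_partials(p, v))    # 60.0
print(directional_derivative_from_definition(p, v))  # ~60.0
```

The two numbers agree, which is all the formula is claiming.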

We need one more definition. Recall that the tangent space of R^3 at p is written T_pR^3 and is the set of vectors (p, v) such that p is constant and v varies over all possible points.

The collection of all tangent spaces over R^3 (equivalently, the set of all tangent vectors at all points of R^3) is written TR^3

Finally on p. 24 O’Neill defines what you’ve all been waiting for :  dh

dh : TR^3 –> R^1

dh : v_p  |——>  v_p[h] = ∂h/∂x * v_1 +  ∂h/∂y * v_2 +  ∂h/∂z * v_3

One last bit of manipulation — what is dx (and dy and dz)?

we know that  the function x is defined as follows

x :    R^3                    –>  R^1

x : (v_1, v_2, v_3 ) |–>  v_1

so, applying the same formula with x in place of h,

dx : v_p |——> v_p[x] = ∂x/∂x * v_1 + ∂x/∂y * v_2 + ∂x/∂z * v_3 = 1 * v_1 + 0 * v_2 + 0 * v_3

which is just v_1 (and similarly dy picks out v_2 and dz picks out v_3)

so at (very) long last we have

dh : TR^3 –> R^1

dh : v_p  |——>  v_p[h], which is exactly  dh = ∂h/∂x * dx +  ∂h/∂y * dy +  ∂h/∂z * dz

Remember ∂h/∂x, ∂h/∂y,  ∂h/∂z are all evaluated at p = (p_1, p_2, p_3)

So it’s a (fairly) simple matter to apply dh to any tangent vector v_p (any point p in R^3 together with any direction (v_1, v_2, v_3)) to get the directional derivative.
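To see dh as a machine that eats tangent vectors, here is the same example written so that dx, dy and dz are themselves little functions that pick out v_1, v_2 and v_3 (again a sketch of my own, not O’Neill’s notation):

```python
# A tangent vector v_p is an ordered pair of points (p, v)
def dx(vp): return vp[1][0]   # dx picks out v_1
def dy(vp): return vp[1][1]   # dy picks out v_2
def dz(vp): return vp[1][2]   # dz picks out v_3

def dh(vp):
    """dh = dh/dx * dx + dh/dy * dy + dh/dz * dz, with the partial derivatives
    of h = x^2 * y^3 * z^4 evaluated at the point of application p."""
    (x, y, z), _ = vp
    return (2 * x * y**3 * z**4) * dx(vp) \
         + (3 * x**2 * y**2 * z**4) * dy(vp) \
         + (4 * x**2 * y**3 * z**3) * dz(vp)

vp = ((1.0, 2.0, 1.0), (0.5, -1.0, 2.0))
print(dh(vp))   # 60.0, the same directional derivative as before
```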

Amen. Selah.

What AlphaZero ‘knows’ about Chess

I’m a lousy chess player.  When I was 12, my 7 year old brother beat me regularly.  Fortunately chess ability doesn’t correlate with intelligence, as another pair of brothers will show.   The younger brother could beat the older one at similar ages, but as the years passed, the older brother became a Rhodes scholar, while the severely handicapped younger brother (due to encephalitis at one month of age) is currently living in a group home.

A fascinating paper [ Proc. Natl. Acad. Sci vol. 119 e2206625119 ’22 ] opens the black box of AlphaZero, a neural net which is currently the world champion, to see what it ‘knows’ about chess as it relentlessly plays itself to build up expertise.

The paper is highly technical and I don’t claim to understand all of it (or even most of it), but it’s likely behind a paywall, so you’ll have to content yourself with this unless you subscribe ($235/year for the online edition).

The first computer chess machines used a bunch of rules developed by expert chess players.  Neural nets require training.  For picture classification they required thousands and thousands of pictures, and feedback about whether they got it right or wrong.  Then the strength (weight) of the connections between elements of the net (its ‘neurons’) was adjusted up if the answer was correct, down otherwise.  This is supervised learning.

Game playing machines are unsupervised; they just play thousands to millions of games against themselves (AlphaZero played one million).  Gradually they get better and better, until they beat humans and the earlier rule based machines.  A net that has played 32,000 games beats the same net after 16,000 games in 100 games out of 100.  However, the 128,000-game net beats the 64,000-game net only 64 times out of 100.

Then they had a world chess champion (V.K.) analyze how the machine was playing.

Between 16,000 and 32,000 games, the net began to understand the relative values of the pieces (anything vs. pawns, queen vs. rook etc. etc.)

Between 32,000 and 64,000 games, king safety appeared.

Between 64,000 and 128,000 games, a sense of which attack was most likely to succeed appeared.

Showing that there is no perfect strategy, separate 1,000,000 runs of the machine settled on two variants of the (extremely popular) Ruy Lopez opening.

They studied recorded human games (between experts, or they wouldn’t have been recorded) from the past 500 years.  Initially most people played the same way, with variants appearing as the years passed.  The neural net was just the opposite, trying lots of different things initially and subsequently settling on a few approaches.

All in all, a fascinating look inside the black box of a neural net.

History repeats itself

The stories of young and not so young Russians running for the exits to escape the Tsar’s (Putin’s) army resonate with me, as both my grandfathers did exactly that in the 1880s – 1890s.  What really brought it home was the fact, which we just found out this year, that my mother’s father’s last name was not what we’d thought it was. He was adopted by another family with many children and given their name to avoid conscription.  My father’s father got out because he was the first born son, and in line for a lifetime in the Tsar’s army.

Ah Russia !  The gift that keeps on giving.

Here’s another example which is part of an old post

Hitler’s gifts (and Russia’s gift)

In the summer of 1984 Barack Obama was at Harvard Law, his future wife was a Princeton undergraduate, and Edward Frenkel, a 16 year old mathematical prodigy, was being examined for admission to Moscow State University. He didn’t get in because he was Jewish. His blow by blow description of the 5 hour oral exam designed to exclude him on pp. 28 – 38 of his book “Love & Math” is as painful to read as it must have been for him to write.

Harvard recognized his talent, and made him a visiting professor at age 21, later enrolling him in grad school so he could get a PhD. He’s now a Stanford prof.

Here’s a link to the full post — https://luysii.wordpress.com/2013/10/27/hitlers-gifts-and-russias-gift/

 

Book Review: Proving Ground, Kathy Kleiman

Proving Ground is a fascinating book about the 6 women who programmed the first programmable computer, the ENIAC (Electronic Numerical Integrator And Computer).  Prior to this, the women had been ‘computers’ in the 1940s sense of the term: people who sat in front of calculating machines and performed lengthy numerical computations, solving differential equations to find the path of an artillery shell one bloody addition/subtraction/multiplication/division at a time.  When World War II started and the men were off in the army, the search was on for women with a mathematical background who could do this.

A single trajectory took a day to calculate, and each trajectory had to be separately calculated for different wind currents, air temperature, speed and weight of the shell.  The computations were largely done at the Moore School of Engineering at Penn and were way too slow (although accurate) to produce the numbers of trajectories the army needed.

Enter Dr. John Mauchly, who had an idea of how to do this using vacuum tubes, and a brilliant 23 year old engineer, J. Presper Eckert, who could instantiate it. The army committed money to building the machine, which came in 42 monster boxes 8 feet tall, 2 feet wide and what looks like 4 feet deep.

Six of the best and brightest of these trajectory computers were recruited to figure out how to wire the boxes together to mimic the trajectory calculations they had already been doing.  If you’ve ever done any programming, you’ll know that having a definite target to mimic with software makes life much easier.

Going a bit deeper, if you’ve done any programming in machine language, you know about registers, the addition and logical unit, hard wired memory, alterable memory.

Here’s what the 6 women were given by Dr. Eckert (without ever seeing the monster boxes)

1. A circuit diagram of each box, showing how this vacuum tube activated that vacuum tube etc. etc. The 42 boxes contained 18,000 vacuum tubes.  Vacuum tubes and transistors are similar in that they conduct electricity in only one direction and can be turned on and off.

2. A block diagram — which showed how the functions of a unit or system interrelate

3. A logical diagram — the places for dials, switches, plugs and cables on the front of the 42 units.

So given this, the 6 had to figure out what each unit did, and how to wire them together to mimic the trajectory calculations they had been doing.

They did it, and initially without being able to enter the room with the boxes (because they didn’t have the proper security clearance).  Eventually they got it and were able to figure out how to wire the boxes together.

If that isn’t brilliant enough, because the calculations were still taking too long, they invented parallel programming.

For those of you who know computing, that should be enough to make you thirst for more detail.

The book contains a lot of sociology.  The women were treated like dirt by the higher ups (but not by Mauchly or Eckert).  When the time came to show ENIAC off to the brass (both academic and military), they were tasked with serving coffee and hanging up coats.  When Kleiman found pictures of them with ENIAC and asked who they were, she was told they were ‘refrigerator ladies’ — whose function was similar to the barely clothed models draped over high powered automobiles to sell them.

I’ll skip the book’s sociology for some sociology of my own.  The book has biographies and much fascinating detail about all 6 women.  I grew up near Philly, and know the buildings at Penn where this was done (I went to Penn Med). Two of the 6 were graduates of Chestnut Hill College, a small Catholic school west of Philly.  The girl across the street went there.  Her mother was born in County Donegal and cleaned houses.  Her father dropped out of high school at 16 to support his widowed mother.  No social services between the two world wars, wasn’t that terrible etc. etc.  Her father worked in a lumberyard, yet the two of them sent both children to college, and owned their own home (eventually free of debt).  The Chestnut Hill grad I know became an editor at Harcourt Brace, her brother became a millionaire insurance executive.  It would be impossible for two working class people to do this today where I grew up (or probably in most places).

What is dx really?

“Differential geometry is the study of properties that are invariant under change of notation”  — Preface p. vii of “Introduction to Smooth Manifolds” J. M. Lee second edition.  Lee says this is “funny primarily because it is so close to the truth”.

Having ascended to a substantial base camp for the assault on the Einstein Field equations (e.g. understanding the Riemann curvature tensor), I thought I’d take a break and follow Needham’s advice about another math book “Elementary Differential Geometry” 2nd edition (revised 2006) by Barrett O’Neill.  “First published in 1966, this trail-blazing text pioneered the use of Forms at the undergraduate level.  Today, more than a half-century later, O’Neill’s work remains, in my view, the single most clear-eyed, elegant and (ironically) modern treatment of the subject available — present company excepted! — at the undergraduate level.”

Anyone taking calculus has seen plenty of dx’s — in derivatives, in integrals etc. etc..  They’re rarely explained.  O’Neill will get you there in just the first 24 pages.  One more page and you’ll understand

df = (∂f/∂x) * dx + (∂f/∂y) * dy + (∂f/∂z) * dz

which you’ve doubtless seen before, primarily as hieroglyphics, before you moved on.

Is it easy?  No, not unless you read definitions and internalize them immediately.  The definitions are very clearly explained.

His definition of a vector is a bit different — two points in Euclidean 3-space (written R^3, which is the only space he talks about in the first 25 pages).  His 3-space is actually a vector space in which points can be added and multiplied by scalars.

You’ll need to bone up on the chain rule from calculus 101.

A few more definitions — natural coordinate functions, tangent space to R^3 at p, vector field on R^3, pointwise principle, natural frame field, Euclidean coordinate function (written x_i, where i is in { 1, 2, 3 } ), derivative of a function in direction of vector v (e.g. directional derivative), operation of a vector on a function, tangent vector, 1-form, dual space. I had to write them down to keep them straight as they’re mixed in paragraphs containing explanatory text.

and

at long last,

differential of x_i (written dx_i)

All is not perfect.  On p. 28 you are introduced to the alternation rule

dx_i ^ dx_j = – dx_j ^ dx_i with no justification whatsoever

On p. 30 you are given the formula for the exterior derivative of a one form, again with no justification.  So it’s back to mumbling incantations and plug and chug.