Category Archives: Linear Algebra Survival Guide for Quantum Mechanics

The Representation of group G on vector space V is really a left action of the group on the vector space

Say what? What does this have to do with quantum mechanics? Quite a bit. Practically everything in fact. Most chemists learn quantum mechanics because they want to see where atomic orbitals come from. So they stagger through the solution of the Schrodinger equation, where the quantum numbers emerge from the recursion relations of its power series solutions.

Forget the Schrodinger equation (for now), quantum mechanics is really written in the language of linear algebra. Feynman warned us not to consider ‘how it can be like that’, but at least you can understand the ‘that’ — e.g. linear algebra. In fact, the instructor in a graduate course in abstract algebra I audited opened the linear algebra section with the remark that the only functions mathematicians really understand are the linear ones.

The definitions used (vector space, inner product, matrix multiplication, Hermitian operator) are obscure and strange. You can memorize them and mumble them as incantations when needed, or you can understand why they are the way they are and where they come from. So if you are a bit rusty on your linear algebra I’ve written a series of 9 posts on the subject — here’s a link to the first– just follow the links after that.

Just to whet your appetite, all of quantum mechanics consists of manipulation of a particular vector space called Hilbert space. Yes all of it.

Representations are a combination of abstract algebra and linear algebra, and are crucial in elementary particle physics. In fact elementary particles are representations of abstract symmetry groups.

So in what follows, I’ll assume you know what vector spaces and linear transformations are, and how a linear transformation is represented by a matrix. I’m not going to explain what a group is, but it isn’t terribly complicated. If you don’t know about groups, stop here and learn the basics first. The Wiki article is too detailed for what you need to know.

The title of the post really threw me, and understanding requires significant unpacking of the definitions, but you need to know this if you want to proceed further in physics.

So we’ll start with a group G, its operation *, and its identity element e.

Next we have a set called X — just that, a bunch of elements (called x, y, . . .) with no further structure imposed — you can’t add elements, and you can’t multiply them by real numbers. If you could (with a few more details), you’d have a vector space (see the survival guide).

Definition of Left Action (LA) of G on set X

LA : G x X –> X

LA : ( g, x ) |–> (g . x)

Such that the following two properties hold

1. For all x in X : LA : (e, x) |–> e . x = x

2. For all g1 and g2 in G : LA : ( (g1 * g2), x ) |–> (g1 * g2) . x = g1 . (g2 . x)
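The two properties are easy to check concretely. Here’s a minimal sketch (my own, not from the post) using the cyclic group of order 4 acting on 4-tuples by rotation:

```python
# A sketch (not from the post): the cyclic group Z4 = {0,1,2,3} under addition
# mod 4, acting on 4-tuples by rotation. 'act' plays the role of LA.
def op(g1, g2):          # the group operation *
    return (g1 + g2) % 4

def act(g, x):           # LA : (g, x) |-> g . x  (rotate the tuple left by g)
    return x[g:] + x[:g]

x = ('a', 'b', 'c', 'd')
e = 0                    # identity element

# Property 1: the identity acts trivially
assert act(e, x) == x

# Property 2: acting by g1 * g2 equals acting by g2 first, then g1
for g1 in range(4):
    for g2 in range(4):
        assert act(op(g1, g2), x) == act(g1, act(g2, x))
```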

Given a vector space V, define GL(V) as the set of invertible linear transformations (LTs) of V. GL(V) becomes a group if you let composition of linear transformations be its operation (it’s all in the survival guide).

Now for the definition of representation of Group G on vector space V

It is a function

rho: G –> GL(V)

rho: g |–> LTg : V –> V linear ; LTg == Linear Transformation labeled by group element g

The representation rho defines a left group action on V

LA : (g, v) |–> LTg(v) — this satisfies the two properties of a left action given above — think about it.
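To see that this really works, here is a small numpy sketch (mine, using the cyclic group C4 as an assumed example), where rho sends each group element to a rotation matrix in GL(R^2):

```python
import numpy as np

# A sketch (my own example, not from the post): the cyclic group C4, written
# additively mod 4, represented on V = R^2 by 90-degree rotation matrices.
# rho(g) is the linear transformation LTg.
def rho(g):
    theta = g * np.pi / 2
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

# rho is a homomorphism: rho(g1 * g2) = rho(g1) composed with rho(g2) ...
for g1 in range(4):
    for g2 in range(4):
        assert np.allclose(rho((g1 + g2) % 4), rho(g1) @ rho(g2))

# ... so (g, v) |-> rho(g) @ v satisfies both left-action properties.
v = np.array([1.0, 2.0])
assert np.allclose(rho(0) @ v, v)        # the identity element acts trivially
```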

Now you’re ready for some serious study of quantum mechanics. When you read that the representation is acting on some vector space, you’ll know what they are talking about.

Want to understand Quantum Computing — buy this book

As quantum mechanics enters its second century, quantum computing has been hot stuff for the last third of it, beginning with Feynman’s lectures on computation in 1984 – 86.  Articles on quantum computing appear all the time in Nature, Science and even the mainstream press.

Perhaps you tried to understand it 20 years ago by reading Nielsen and Chuang’s massive tome Quantum Computation and Quantum Information.  I did, and gave up.  At 648 pages and nearly half a million words, it’s something only for people entering the field.  Yet quantum computers are impossible to ignore.

That’s where a new book “Quantum Computing for Everyone” by Chris Bernhardt comes in.  You need little more than high school trigonometry and determination to get through it.  It is blazingly clear.  No term is used before it is defined and there are plenty of diagrams.   Of course Bernhardt simplifies things a bit.  Amazingly, he’s able to avoid the complex number system. At 189 pages and under 100,000 words it is not impossible to get through.

Not being an expert, I can’t speak for its completeness, but all the stuff I’ve read about in Nature and Science is there — no cloning, entanglement, Ed Fredkin (and his gate), Grover’s algorithm, Shor’s algorithm, the RSA algorithm.  As a bonus there is a clear explanation of Bell’s theorem.

You don’t need a course in quantum mechanics to get through it, but it would make things easier.  Most chemists (for whom this blog is basically written) have had one.  This plus a background in linear algebra would certainly make the first 70 or so pages a breeze.

Just as a book on language doesn’t get into the fonts it can be written in, the book doesn’t get into how such a computer can be physically instantiated.  What it does do is tell you how the basic guts of the quantum computer work.  Amazingly, they are just matrices (explained in the book) which change one basis for representing qubits (explained) into another.  These are the quantum gates — “just operations that can be described by orthogonal matrices” (p. 117).  The computation comes in by sending qubits through the gates (operating on vectors by matrices).
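For the curious, the idea fits in a few lines of numpy (my own code; the Hadamard gate H is a standard example of such an orthogonal matrix, not something I’m quoting from the book):

```python
import numpy as np

# A sketch of the central idea: a qubit is a unit vector, and a gate is an
# orthogonal matrix that changes the basis in which the qubit is written.
H = np.array([[1,  1],
              [1, -1]]) / np.sqrt(2)    # the Hadamard gate, real entries only

zero = np.array([1.0, 0.0])             # the |0> qubit
plus = H @ zero                         # 'computation': send the qubit through the gate

assert np.allclose(plus, [1/np.sqrt(2), 1/np.sqrt(2)])
assert np.allclose(H.T @ H, np.eye(2))      # orthogonal: columns are an orthonormal basis
assert np.allclose(H @ (H @ zero), zero)    # H is its own inverse
```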

Be prepared to work.  The concepts (although clearly explained) come thick and fast.

Linear algebra is basic to quantum mechanics.  Superposition of quantum states is nothing more than a linear combination of vectors.  When I audited a course on QM 10 years ago to see what had changed in 50 years, I was amazed at how little linear algebra was emphasized.  You could do worse than read a series of posts on my blog titled “Linear Algebra Survival Guide for Quantum Mechanics” — there are 9 — start here and follow the links — you may find them helpful.

From a mathematical point of view entanglement (discussed extensively in the book) is fairly simple — philosophically it’s anything but — and the following was described by a math prof as concise and clear:

The book is a masterpiece — kudos to Bernhardt

The pleasures of enough time

One of the joys of retirement is the ability to take the time to fully understand the math behind statistical mechanics and thermodynamics (on which large parts of chemistry are based — cellular biophysics as well). I’m going through some biophysics this year reading “Physical Biology of the Cell” 2nd Edition and “Molecular Driving Forces” 2nd Edition. Back in the day, what with other courses, research, career plans and hormones to contend with, there just wasn’t enough time.

To really understand the derivation of the Boltzmann equation, you must understand Lagrange multipliers, which requires an understanding of the gradient and where it comes from. To understand the partition function you must understand change of variables in an integral, and to understand that you must understand why the determinant of the Jacobian matrix of a set of independent vectors is the volume multiplier you need.

These were all math tools whose use was fairly simple and which didn’t require any understanding of where they came from. What a great preparation for a career in medicine, where we understood very little of why we did the things we did, not because of lack of time but because the deep understanding of the systems we were mucking about with simply didn’t (and doesn’t) exist. It was intellectually unsatisfying, but you couldn’t argue with the importance of what we were doing. Things are better now with the accretion of knowledge, but if we really understood things perfectly we’d have effective treatments for cancer and Alzheimer’s. We don’t.

But in the pure world of math, whether a human creation or existing outside of us all, this need not be accepted.

I’m not going to put page after page of derivation of the topics mentioned in the second paragraph, but mention a few things to know which might help you when you’re trying learn about them, and point you to books (with page numbers) that I’ve found helpful.

Let’s start with the gradient. If you remember it at all, you know that it’s a way of taking a continuous real valued function of several variables and making a vector of it. The vector has the miraculous property of pointing in the direction of greatest change in the function. How did this happen?

The most helpful derivation I’ve found was in Thomas’ textbook of calculus (9th Edition, pp. 957 onward).  Yes Thomas — the same book I used as a freshman 60 years ago!  Like most living things that have aged, it’s become fat.  Thomas is now up to the 13th edition.

The simplest example of a continuous real valued function is a topographic map. Thomas starts with the directional derivative — which is how the function height(north, east) changes in the direction of a vector whose absolute value is 1. That’s the definition — to get something you can actually calculate, you need to know the chain rule, and how to put a path on the topo map. The derivative of the real valued function in the direction of a unit vector turns out to be the dot product of the gradient vector and any vector at that point whose absolute value is 1. The unit vector can point any direction but the value of the derivative (the dot product) will be greatest when the unit vector points in the direction of the gradient vector. That’s where the magic comes from. If you’re slightly shaky on linear algebra, vectors and dot products — here’s a (hopefully explanatory) link to some basic linear algebra — This is the first in a series — just follow the links.
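If you want to convince yourself numerically, here’s a short numpy sketch (mine, using a made-up height function, not anything from Thomas):

```python
import numpy as np

# Numeric check: the directional derivative equals (gradient . unit vector),
# and is largest in the gradient's direction.
def h(p):                              # 'height' as a function of (east, north)
    x, y = p
    return x**2 + 3*y**2

def grad_h(p):
    x, y = p
    return np.array([2*x, 6*y])

p = np.array([1.0, 2.0])
eps = 1e-6

best = -np.inf
for theta in np.linspace(0, 2*np.pi, 721):
    u = np.array([np.cos(theta), np.sin(theta)])   # a unit vector
    d_num = (h(p + eps*u) - h(p)) / eps            # numeric directional derivative
    assert abs(d_num - grad_h(p) @ u) < 1e-4       # it equals gradient . u
    best = max(best, grad_h(p) @ u)

# The maximum over all directions is the length of the gradient itself
assert abs(best - np.linalg.norm(grad_h(p))) < 1e-3
```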

The discussion of Lagrange multipliers (which is essentially the relation between two gradients — one of a function, the other of a constraint) in Dill pp. 68 – 72 is only fair, and I did a lot more work to understand it (which can’t be reproduced here).

For an excellent discussion of wedge product and why the volume multiplier in an integral must be the determinant of the Jacobian — see Callahan Advanced Calculus p. 41 and exercise 2.15 p. 61, the latter being the most important. It explains why things work this way in 2 dimensions. The exercise takes you through the derivation step by step asking you to fill in some fairly easy dots. Even better is  exercise 2.34 on p. 67 which proves the same thing for any collection of n independent vectors in R^n.

The Jacobian is just the determinant of a square matrix, something familiar from linear algebra.  The numbers are just the coefficients of the vectors at a given point.  But in integrals we’re changing dx and dy to something else — dr and dTheta when you go to polar coordinates.  Why a matrix here?  Because if differential calculus is about anything it is about linearization of nonlinear functions, which is why you can use a matrix of partial derivatives (the Jacobian matrix) for dx and dy.
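A quick numpy sketch (my own) of both claims: the polar-coordinate Jacobian has determinant r (the familiar dx dy = r dr dTheta), and the determinant of two independent vectors really is an area multiplier:

```python
import numpy as np

# For x = r cos(t), y = r sin(t), the matrix of partial derivatives
# has determinant r.
def jacobian(r, t):
    return np.array([[np.cos(t), -r*np.sin(t)],    # dx/dr, dx/dtheta
                     [np.sin(t),  r*np.cos(t)]])   # dy/dr, dy/dtheta

r, t = 2.0, 0.7
assert np.isclose(np.linalg.det(jacobian(r, t)), r)

# And |det| of two independent column vectors is the area of the
# parallelogram they span -- the volume multiplier of the linear map.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
assert np.isclose(abs(np.linalg.det(A)), 5.0)
```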

Why is this important for statistical mechanics?  Because one of the integrals you must evaluate is that of exp(-ax^2) from -infinity to +infinity, and the switch to polar coordinates is the way to do it.  You also must evaluate integrals of this type to understand the kinetic theory of ideal gases.
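Here’s a numeric sanity check (my own code) of what the polar-coordinate trick gives you: the integral of exp(-ax^2) over the whole real line is sqrt(pi/a).

```python
import numpy as np

# Brute-force the Gaussian integral on [-10, 10] (the tails beyond are
# negligible) and compare with the closed form sqrt(pi/a).
a = 2.0
dx = 1e-4
x = np.arange(-10, 10, dx)
integral = np.exp(-a * x**2).sum() * dx

assert abs(integral - np.sqrt(np.pi / a)) < 1e-6
```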

Not necessary in this context, but one of the best discussions of the derivative in its geometric context I’ve ever seen is on pp. 105 – 106 of Callahan’s book.

So these are some pointers and hints, not a full discussion — I hope it makes the road easier for you, should you choose to take it.


How formal tensor mathematics and the postulates of quantum mechanics give rise to entanglement

Tensors continue to amaze. I never thought I’d get a simple mathematical explanation of entanglement, but here it is. Explanation is probably too strong a word, because it relies on the postulates of quantum mechanics, which are extremely simple but which lead to extremely bizarre consequences (such as entanglement). As Feynman famously said, ‘no one understands quantum mechanics’. Despite that, it’s never made a prediction not confirmed by experiment, so the theory is correct even if we don’t understand ‘how it can be like that’. A century of correct experimental predictions is not to be sneezed at.

If you’re a bit foggy on just what entanglement is, have a look at the link. Even better, read the book by Zeilinger referred to in it (if you have the time).

Actually you don’t even need all the postulates for quantum mechanics (as given in the book “Quantum Computation and Quantum Information by Nielsen and Chuang). No differential equations. No Schrodinger equation. No operators. No eigenvalues. What could be nicer for those thirsting for knowledge? Such a deal ! ! ! Just 2 postulates and a little formal mathematics.

Postulate #1: “Associated to any isolated physical system is a complex vector space with inner product (that is, a Hilbert space) known as the state space of the system. The system is completely described by its state vector, which is a unit vector in the system’s state space.” If this is unsatisfying, see the explication of it on p. 80 of Nielsen and Chuang (where the postulate appears).

Because the linear algebra underlying quantum mechanics seemed to be largely ignored in the course I audited, I wrote a series of posts called Linear Algebra Survival Guide for Quantum Mechanics. The first should be all you need, but there are several more.

Even though I wrote a post on tensors, showing how they were a way of describing an object independently of the coordinates used to describe it, I didn’t even discuss another aspect of tensors — multilinearity — which is crucial here. The post itself can be viewed at

Start by thinking of a simple tensor as a vector in a vector space. The tensor product is just a way of combining vectors in vector spaces to get another (and larger) vector space. So the tensor product isn’t a product in the sense that multiplication of two objects (real numbers, complex numbers, square matrices) produces another object of exactly the same kind.

So mathematicians use a special symbol for the tensor product — a circle with an x inside. I’m going to use something similar ‘®’ because I can’t figure out how to produce the actual symbol. So let V and W be the quantum mechanical state spaces of two systems.

Their tensor product is just V ® W. Mathematicians can define things any way they want. A crucial aspect of the tensor product is that it is multilinear. So if v and v’ are elements of V, then v + v’ is also an element of V (because two vectors in a given vector space can always be added). Similarly w + w’ is an element of W if w and w’ are. Adding to the confusion when trying to learn this stuff is the fact that all vectors are themselves tensors.

Multilinearity of the tensor product is what you’d think

(v + v’) ® (w + w’) = v ® (w + w’ ) + v’ ® (w + w’)

= v ® w + v ® w’ + v’ ® w + v’ ® w’

You get all 4 tensor products in this case.
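numpy’s Kronecker product behaves exactly like the tensor product of coordinate vectors, so the expansion above can be checked directly (my own sketch, not from the post):

```python
import numpy as np

# Multilinearity of the tensor product, checked with np.kron playing
# the role of (x): (v+v') (x) (w+w') expands into all 4 products.
v, vp = np.array([1.0, 2.0]), np.array([0.0, 3.0])
w, wp = np.array([4.0, 5.0]), np.array([6.0, 7.0])

lhs = np.kron(v + vp, w + wp)
rhs = (np.kron(v, w) + np.kron(v, wp) +
       np.kron(vp, w) + np.kron(vp, wp))

assert np.allclose(lhs, rhs)      # all 4 tensor products appear
```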

This brings us to Postulate #2 (actually #4 in the book, on p. 94 — we don’t need the other two — I told you this was fairly simple).

Postulate #2 “The state space of a composite physical system is the tensor product of the state spaces of the component physical systems.”

Where does entanglement come in? Patience, we’re nearly done. One now must distinguish simple and non-simple tensors. Each of the 4 tensor products in the sum on the last line is simple, being the tensor product of two vectors.

What about v ® w’ + v’ ® w ?? It isn’t simple because there is no way to get it as simple_tensor1 ® simple_tensor2. So it’s called a compound tensor. (v + v’) ® (w + w’) is a simple tensor because v + v’ is just another single element of V (call it v”) and w + w’ is just another single element of W (call it w”).

So the tensor product (v + v’) ® (w + w’) of elements of the two state spaces can be understood as though V is in state v” and W is in state w”.

v ® w’ + v’ ® w can’t be understood this way. The full system can’t be understood by considering V and W in isolation, i.e. the two subsystems V and W are ENTANGLED.
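A concrete way to see the difference (my own sketch, not from the post): reshape the tensor into a matrix. Product (simple) tensors give a rank 1 matrix; entangled states don’t.

```python
import numpy as np

# A vector in V (x) W is a simple (product) tensor exactly when, reshaped
# into a matrix, it has rank 1.
v,  w  = np.array([1.0, 0.0]), np.array([1.0, 0.0])   # basis vectors of V and W
vp, wp = np.array([0.0, 1.0]), np.array([0.0, 1.0])

simple    = np.kron(v + vp, w + wp)          # (v+v') (x) (w+w') -- a product state
entangled = np.kron(v, wp) + np.kron(vp, w)  # v (x) w' + v' (x) w

assert np.linalg.matrix_rank(simple.reshape(2, 2)) == 1     # simple tensor
assert np.linalg.matrix_rank(entangled.reshape(2, 2)) == 2  # not a product: entangled
```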

Yup, that’s all there is to entanglement (mathematically at least). The paradoxes of entanglement, including Einstein’s ‘spooky action at a distance’, are left for you to explore — again, Zeilinger’s book is a great source.

But how can it be like that, you ask? Feynman said not to start thinking these thoughts, and if he didn’t know, do you expect a retired neurologist to tell you? Please.

An old year’s resolution

One of the things I thought I was going to do in 2012 was learn about relativity.  For why, see the link.  Also, my cousin’s new husband wrote a paper on a new way of looking at it.  I’ve been putting him off, as I thought I should know the old way first.

I knew that general relativity involved lots of math such as manifolds and the curvature of space-time.  So rather than read verbal explanations, I thought I’d learn the math first.  I started reading John M. Lee’s two books on manifolds.  The first involves topological manifolds, the second involves manifolds with extra structure (smoothness) permitting calculus to be done on them.  Distance is not a topological concept, but is absolutely required for calculus — that’s what the smoothness is about.

I started with “Introduction to Topological Manifolds” (2nd. Edition) by John M. Lee.  I’ve got about 34 pages of notes on the first 95 pages (25% of the text), and made a list of the definitions I thought worth writing down — there are 170 of them. Eventually I got through a third of its 380 pages of text.  I thought that might be enough to help me read his second book “Introduction to Smooth Manifolds” but I only got through 100 of its 600 pages before I could see that I really needed to go back and completely go through the first book.

This seemed endless, and would probably take 2 more years.  This shouldn’t be taken as a criticism of Lee — his writing is clear as a bell.  One of the few criticisms of his books is that they are so clear, you think you understand what you are reading when you don’t.

So what to do?  A prof at one of the local colleges, James J. Callahan, wrote a book called “The Geometry of Spacetime” which concerns special and general relativity.  I asked if I could audit the course on it he’d been teaching there for decades.  Unfortunately he said “been there, done that” and had no plans ever to teach the course again.

Well, for the last month or so, I’ve been going through his book.  It’s excellent, with lots of diagrams and pictures, and wide margins for taking notes.  A symbol table would have been helpful, as would answers to the excellent (and fairly difficult) problems.

This also explains why there have been no posts in the past month.

The good news is that the only math you need for special relativity is calculus and linear algebra.  Really nothing more.  No manifolds.  At the end of the first third of the book (about 145 pages) you will have a clear understanding of

1. time dilation — why time slows down for moving objects

2. length contraction — why moving objects shrink

3. why two observers looking at the same event can see it happening at different times.

4. the Michelson Morley experiment — but the explanation of it in the Feynman lectures on physics 15-3, 15-4 is much better

5. The kludge Lorentz used to make Maxwell’s equations obey the Galilean principle of relativity (that the laws of mechanics look the same to all inertial observers)

6. How Einstein derived Lorentz’s kludge purely by assuming the velocity of light is constant for all observers, never mind how they are moving relative to each other.  Reading how he did it is like watching a master sculptor at work.

Well, I’ll never get through the rest of Callahan by the end of 2012, but I can see doing it in a few more months.  You could conceivably learn linear algebra by reading his book, but it would be tough.  I’ve written some fairly simplistic background linear algebra for quantum mechanics posts — you might find them useful.

One of the nicest things was seeing clearly what it means for different matrices to represent the same transformation, and why you should care.  I’d seen this many times in linear algebra, but never appreciated how simple reflection through an arbitrary line through the origin becomes when you (1) rotate the line onto the x axis by arctan(y/x) radians, (2) change the y coordinate to -y by an incredibly simple matrix, and (3) rotate back by the original angle.

That’s why any two n x n matrices X and Y represent the same linear transformation if they are related by the invertible matrix Z in the following way  X = Z^-1 * Y * Z
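Here is the reflection example as a numpy sketch (my own code), with the rotation playing the role of Z in X = Z^-1 * Y * Z:

```python
import numpy as np

# Reflect across the line through the origin at angle t by rotating the
# line to the x axis, flipping y, and rotating back.
t = 0.6                                    # any angle

def rot(a):
    return np.array([[np.cos(a), -np.sin(a)],
                     [np.sin(a),  np.cos(a)]])

F = np.array([[1.0,  0.0],                 # the 'incredibly simple' matrix: y -> -y
              [0.0, -1.0]])

# X = Z^-1 * Y * Z with Z = rot(-t), Z^-1 = rot(t), Y = F
reflect = rot(t) @ F @ rot(-t)

# The same transformation written directly in the original coordinates:
direct = np.array([[np.cos(2*t),  np.sin(2*t)],
                   [np.sin(2*t), -np.cos(2*t)]])

assert np.allclose(reflect, direct)        # same transformation, different matrices
```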

Merry Christmas and Happy New Year (none of that Happy Holidays crap for me)

Willock pp. 51 – 104

This is a continuation of my notes, as I read “Molecular Symmetry” by David J. Willock.  As you’ll see, things aren’t going particularly well.  Examples of concepts are great once they’ve been defined, but in this book it’s examples first, definitions later (if ever).

p. 51 — Note all the heavy lifting required to produce an object with only (italics) C4 symmetry (figure 3.6).  First, you need 4 objects in a plane (so they rotate into each other), separated by 90 degrees.  That’s far from enough, as there are multiple planes of symmetry for 4 objects in a plane (I count 5 — how many do you get?).  So you need another 4 objects in a plane parallel to the first.  These objects must be a different distance from the symmetry axis, otherwise the object will have a C2 axis of symmetry midway between the two planes.  Lastly, no object in the second plane can lie on a line parallel to the axis of symmetry which contains an object in the first plane — e.g. the two groups of 4 must be staggered relative to each other.  It’s even more complicated for S4 symmetry.

p. 51 — The term classes of operation really hasn’t been defined (except by example).   Also this is the first example of (the heading of) a character table — which hasn’t been defined at this point.

p. 52 — Note H2O2 has C2 symmetry because it is not (italics) planar.  Ditto for 1,2 (S, S) dimethyl cyclopropane.  (More importantly, this is true for disulfide bonds between cysteines forming cystines — a way of tying parts of proteins to each other.)

p. 55 — Pay attention to the nomenclature: Cnh means that an axis of degree n is present along with a horizontal plane of symmetry.  Cnv means that, instead, a vertical plane of symmetry is present (along with the Cn axis)

p. 57 — Make sure you understand why C4h doesn’t have vertical planes of symmetry.

p. 59 — A bizarre pedagogical device — defining groups whose first letter is D by something they are not (italics) — which itself (cubic groups) is at present undefined.  

Willock then regroups by defining what Dn actually is.

It’s a good exercise to try to construct the D4 point group yourself. 

p. 61 — “It does form a subgroup” — If subgroup was ever defined, I missed it.  Subgroup is not in the index (neither is group !).  Point group is in the index, and point subgroup is as well appearing on p. 47 — but point subgroup isn’t defined there.  

p. 62 — Note the convention — the Z direction is perpendicular to the plane of a planar molecule.

p. 64 — Why are linear molecules called Cinfinity ? — because any rotation around the axis of symmetry (the molecule itself) leaves the molecule unchanged, and there are an infinity of such rotations.

p. 67 — Ah,  the tetrahedron embedded in a cube — exactly the way an organic chemist should think of the sp3 carbon bonds.  Here’s a mathematical problem for you.  Let the cube have sides of 1, the bonds as shown in figure 3.27, the carbon in the very center of the cube — now derive the classic tetrahedral bond angle — answer at the end of this post. 

p. 67 — 74 — The discussions of symmetries in various molecules is exactly why you should have the conventions for naming them down pat.  

p. 75 — in the second paragraph affect should be effect (at least in American English)

p. 76 — “Based on the atom positions alone we cannot tell the difference between the C2 rotation and the sigma(v) reflection, because either operation swaps the positions of the hydrogen atoms.”   Do we ever want to actually do this (for water that is)? Hopefully this will turn out to be chemically relevant. 

p. 77 — Note that the definition of character refers to the effect of a symmetry operation on one of an atom’s orbitals (not its position).  Does this only affect atoms whose position is not (italics) changed by the symmetry operation?  Very important to note that the character is -1 only on reversal of the orbital — later on, non-integer characters will be seen.  Note also that each symmetry operation produces a character (number) for each orbital, so there are (number of symmetry operations) * (number of orbitals) characters in a character table.

p. 77 – 78 — Note that the naming of the orbitals is consistent with what has gone on before.  p(z) is in the plane of the molecule because that’s where the axis of rotation is.

Labels are introduced for each of the possible standard sets of characters (but standard set really isn’t defined).  A standard set (of sets of characters??) is an irreducible representation for the group.

Is one set of characters an irreducible representation by itself, or is it a bunch of them?  The index claims that this is the definition of irreducible representation, but given the ambiguity about what a standard set of characters actually is (italics), we don’t really know what an irreducible representation actually is.  This is definition by example, a device foreign to math, but possibly a good pedagogical one — we’ll see.  But at this point, I’m not really clear what an irreducible representation actually is.

p. 77 — In a future edition, it would be a good idea to label the x, y and z axes (and even perhaps draw in the px, py and pz orbitals), and, if possible, put figure 4.2 on the same page as table 4.2.  Eventually things get figured out, but it takes a lot of page flipping.

p. 79 — Further tightening of the definition of a representation — it’s one row of a character table.

p. 79 — Nice explanation of orbital phases, but do electrons in atoms know or care about them?

p. 80 — Note that the x–y axes are rotated 90 degrees in going from figure 4.4a to figure 4.4b (why?).  Why talk about d orbitals?  They’re empty in H2O but possibly not in other molecules with C2v symmetry.

p. 80 — Affect should be effect (at least in American English)

p. 81 — B1 x B2 = A2 doesn’t look like a sum to me.  If you actually summed them you’d get 2 for E, -2 for C2, and 0 for the other two.  It does look like the product though.
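This is easy to check with the standard C2v character table rows (my own sketch; operation order E, C2, sigma_v, sigma_v’):

```python
import numpy as np

# The four irreducible representation rows of the C2v character table,
# in the operation order E, C2, sigma_v, sigma_v'.
A1 = np.array([1,  1,  1,  1])
A2 = np.array([1,  1, -1, -1])
B1 = np.array([1, -1,  1, -1])
B2 = np.array([1, -1, -1,  1])

assert (B1 * B2 == A2).all()                        # elementwise PRODUCT, not sum
assert (B1 + B2 == np.array([2, -2, 0, 0])).all()   # the sum matches the numbers above
```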

pp. 81 – 82 — Far from sure what is going on in section 4.3

p.82 — Table 4.4b does look like multiplication of the elements of B1 by itself. 

p. 82 — Not sure when basis vectors first made their appearance, possibly here.  I slid over this on first reading since basis vectors were quite familiar to me from linear algebra (see the category).  But again, the term is used here without really being defined.  Probably so as not to confuse, the first basis vectors shown are at 90 degrees to each other (x and y), but later on (p. 85) they don’t have to be — the basis vectors point along the 3 hydrogens of ammonia.

p. 83 — Very nice way to bring in matrices, but it’s worth noting that each matrix stands for just one symmetry operation.  But each matrix lets you see what happens to all (italics) the basis vectors you’ve chosen.

p. 84 — Get very clear in your mind that when you see an expression of the form

symmetry_operation1 symmetry_operation2 

juxtaposed to each other — that you do symmetry_operation2  FIRST.
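In matrix language (a sketch of mine, not Willock’s notation) this is just associativity of matrix multiplication: the rightmost matrix hits the vector first.

```python
import numpy as np

# Two symmetry operations as matrices: in a product, the RIGHTMOST one
# acts first, because (op1 @ op2) @ v = op1 @ (op2 @ v).
C4    = np.array([[0, -1], [1, 0]])     # 90-degree rotation
sigma = np.array([[1, 0], [0, -1]])     # reflection across the x axis

v = np.array([1, 2])

assert ((C4 @ sigma) @ v == C4 @ (sigma @ v)).all()       # sigma acts first
assert not ((C4 @ sigma) @ v == (sigma @ C4) @ v).all()   # order matters
```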

p. 87 — Notice that the term character is acquiring a second meaning here — it is no longer the effect of a symmetry operation on one of an atom’s orbitals; it’s the effect of a symmetry operation on a whole set of basis elements.

p. 88 — Notice that in BF3, the basis vectors no longer align with the bonds (as they did in NH3), meaning that you can choose the basis vectors any way you want.  

p. 89 — Figure 4.9 could be markedly improved.  One must distinguish between two types of lines (interrupted and continuous), and two types of arrowheads (solid and barbed), making for confusion in the diagrams where they all appear together (and often superimposed).

Given the orbitals as combinations of two basis vectors, the character of a symmetry operation acting on a basis vector acquires yet another meaning — how much of the original orbital is left after the symmetry operation.

p. 91 — A definition of irreducible representations as the ‘simplest’ symmetry behavior.  Simplest is not defined.  Also for the first time it is noted that symmetries can be of orbitals or vibrations.  We already know they can be of the locations of the atoms in a molecule.  

Section 4.8 is extremely confusing.

p. 92 — We now find out what was going on with a character sum of 2 on p. 81 — the sums were 2 and 0 because the representations were reducible.


p. 93 (added 29 Jan ’12) — We later find out (p. 115) that the number of irreducible representations of a point group is the number of classes.  The index says that a class is defined as an ‘equivalent set of operations’ — but how two distinct operations are equivalent is never defined, just illustrated.

p. 100 — Great to have the logic behind the naming of the labels used for irreducible representations (even if they are far from intuitive)

p. 101 — There is no explanation of the difference between basis vector and basis function. 

All in all, a very difficult chapter to untangle.  I’m far from sure I understand from p. 92 – 100.  However, hope lies in future chapters and I’ll push on.  I think it would be very difficult to learn from this book (so far) if you were totally unfamiliar with symmetry.  

Answer to the problem on p. 67.  Let the sides of the cube be of length 1.  The bonds are all the same length, so the carbon must be in the center of the cube.  Any two of the bonds end at opposite corners of a face of the cube, so the ends of the bonds are sqrt(2) apart.  Now drop a perpendicular from the carbon in the center to the middle of this line; it has length 1/2.  So we have a right triangle with legs of 1/2 and sqrt(2)/2, and the bond angle is 2 * arctan(sqrt(2)).  Arctan(sqrt(2)) is 54.7356 degrees, giving the angle as 109.47 degrees.
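The same answer falls out of a few lines of numpy (my own check): put the carbon at the cube’s center and take the dot product of two bond vectors.

```python
import numpy as np

# Carbon at the center of a unit cube; two bonds point to alternating corners.
center = np.array([0.5, 0.5, 0.5])
b1 = np.array([0.0, 0.0, 0.0]) - center      # bond to corner (0,0,0)
b2 = np.array([1.0, 1.0, 0.0]) - center      # bond to corner (1,1,0)

cos_angle = b1 @ b2 / (np.linalg.norm(b1) * np.linalg.norm(b2))
angle = np.degrees(np.arccos(cos_angle))

assert np.isclose(cos_angle, -1.0/3.0)       # the classic cos(theta) = -1/3
assert np.isclose(angle, 109.4712, atol=1e-3)
```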

Linear Algebra survival guide for Quantum Mechanics -IX

The heavy lifting is pretty much done.  Now for some fairly spectacular results, and then back to reading Clayden et al.  To make things concrete, let Y be a 3 dimensional vector with complex coefficients c1, c2 and c3.  The coefficients multiply a set of basis vectors (which exist, since all finite and infinite dimensional vector spaces have a basis).  The glory of abstraction is that we don’t actually have to worry about what the basis vectors actually are, just that they exist.  We are free to use their properties, one of which is orthogonality (I may not have proved this; you should if I haven’t).  So the column vector is

c1

c2

c3
and the corresponding row vector (the conjugate transpose)  is 

c1*  c2*  c3*

Next, I’m going to write a Hermitian matrix M as follows, where Aij is an arbitrary complex number (we’ll derive below just what being Hermitian forces on the Aij).

A11  A12   A13

A21  A22  A23

A31  A32  A33

Now form the product of the row vector with M

                        A11  A12  A13
( c1*  c2*  c3* )  x    A21  A22  A23    =    ( X  Y  Z )
                        A31  A32  A33
The net effect is to form another row vector with 3 components.   All we need for what I want to prove  is an explicit formula for  X

X =  c1*(A11) + c2*(A21) + c3*(A31)

When we  multiply the row vector obtained by the column vector on the right we get

c1 [ c1*(A11) + c2*(A21) + c3*(A31) ] + c2 [ Y ] + c3 [ Z ]  — which by assumption must be a real number 

Next, form the product of M with the column vector 

                   c1

                   c2

                   c3

A11  A12  A13      X’

A21  A22  A23      Y’

A31  A32  A33      Z’

This time all we need is X’  which is c1(A11) + c2(A12) + c3(A13)

When we multiply the column vector obtained by the row vector on the left we get

c1* [  c1(A11) + c2(A12) + c3(A13) ] + c2* Y’ + c3* Z’ — the same number as 

c1 [ c1*(A11) + c2*(A21) + c3*(A31) ] + c2 [ Y ] + c3 [ Z ]

Notice that c1, c2, c3 can each be any of the infinite number of complex numbers without disturbing the equality.  The ONLY way this can happen for every possible choice of c1, c2 and c3 is if the corresponding terms match up, e.g. if

c1*[c1(A11)] = c1[c1*(A11)]  — this is obviously true

and c1*[c2(A12)] = c1[c2*(A21)]  — something fishy

and c1*[c3(A13)] = c1[c3*(A31)]  — ditto

The last two equalities look a bit strange.  If you go back to LASGFQM – II, you will see that c1*(c2) does NOT equal c1(c2*).  However, 

c1*(c2)  does  equal [ c1 (c2*) ]*.  They aren’t the same, but each is the complex conjugate of the other. This means that to make

c1*[c2(A12)] = c1[c2*(A21)],      we need A12 = A21*  or  A12* = A21, which is the same thing.

So just by following the postulate of quantum mechanics about the type of linear transformation (called Hermitian) which can result in a measurement, we find that the matrix representing the linear transformation, the Hermitian matrix, has the property that Mij  = Mji*  (the first letter is the row index and the second is the column index).  This also means that the diagonal elements of any Hermitian matrix are real.  Now when I first bumped up against Hermitian matrices they were DEFINED this way, making them seem rather magical.  Hermitian matrices are in fact natural, and they do just what quantum mechanics wants them to do. 
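If you have numpy handy, the whole derivation can be checked numerically.  The sketch below is my addition (the construction M = A + A† is a standard way to manufacture a Hermitian matrix; the variable names are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
# Start from an arbitrary complex matrix A; then M = A + (A conjugate transpose)
# obeys Mij = Mji* by construction
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
M = A + A.conj().T

assert np.allclose(M, M.conj().T)           # M equals its conjugate transpose
assert np.allclose(M.diagonal().imag, 0.0)  # so the diagonal entries are real

# and the 'measurement' (row vector) M (column vector) comes out real
c = rng.normal(size=3) + 1j * rng.normal(size=3)
measurement = c.conj() @ M @ c
print(abs(measurement.imag) < 1e-12)        # True
```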

Some more nomenclature:  Mij  = Mji* means that a Hermitian matrix equals its conjugate transpose   (which is another even more obscure way to define them). The conjugate transpose of a matrix is called the adjoint.  This means that the row vector as we’ve defined it is the adjoint of the column vector.  This  also  is why Hermitian matrices are called self-adjoint.   

That’s about it. Hopefully when you see this stuff in the future, you won’t be just mumbling incantations.   But perhaps you are wondering, where are the eigenvectors, where are the eigenvalues in all this?  What happened to the Schrodinger equation beloved in song and story?   That’s for the course you’re taking, but briefly and without explanation, the basis vectors I’ve been talking about (without explicitly describing them) all arise as follows:

Any Hermitian operator times wavefunction = some number times same wavefunction.  [1]

Several points:  many Hermitian operators change one wave function into another, so [ 1 ] doesn’t always hold.

IF [1] does hold the wavefunction is called an eigenfunction, and  ‘some number’ is the eigenvalue.  

There is usually a set of eigenfunctions for a given Hermitian operator — these are the basis functions (basis vectors of the infinite dimensional Hilbert space) of the vector space I was describing.  You find them by solving the Schrodinger equation H Psi = E Psi — that’s for your course, but at least now you know the lingo.   Hopefully, these last few words are less frustrating than the way Tom Wolfe ended “The Bonfire of the Vanities” years ago — the book just stopped rather than ended.  
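For the numerically inclined, numpy makes equation [1] concrete in a finite dimensional space.  A sketch I’ve added (the matrix entries are made up; any Hermitian matrix would do):

```python
import numpy as np

# A Hermitian "operator" on a 3 dimensional space (the entries are made up)
M = np.array([[2.0,    1 - 1j, 0.0],
              [1 + 1j, 3.0,    1j ],
              [0.0,    -1j,    1.0]])
assert np.allclose(M, M.conj().T)   # check that it really is Hermitian

# eigh is numpy's eigenvalue solver for Hermitian matrices
evals, evecs = np.linalg.eigh(M)    # columns of evecs are the eigenvectors

print(evals.dtype)                  # float64 -- the eigenvalues are real
for lam, v in zip(evals, evecs.T):
    assert np.allclose(M @ v, lam * v)   # M v = (some number) v, i.e. [1]
# The eigenvectors are orthonormal: a basis for the space
assert np.allclose(evecs.conj().T @ evecs, np.eye(3))
```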

I thought the course I audited was excellent, but we never even got into bonding.  Nonetheless, I think the base it gave was quite solid and it’s time to find out.  Michelle Francl recommended “Modern Quantum Chemistry” by Attila (yes Attila ! ) Szabo and Neil Ostlund as the next step.  You can’t beat the price as it’s a Dover paperback.  I’ve taken a brief look at “Molecular Quantum Mechanics” by Atkins and Friedman — it starts with the postulates and moves on from there.  Plenty of pictures and diagrams, but no idea how good it is.  Finally, 40 years ago I lived across the street from a Physics grad student (whose name I can’t recall), and the real hot stuff back then was a book by Prugovecki called “Quantum Mechanics in Hilbert Space”.  Being a pack rat, I still have it. We’ll see. 

One further point.  I sort of dumped on Giancoli’s book on Physics, which I bought when the course was starting up 9/09 — pretty pictures and all that.  Having been through the first 300 pages or so (all on mechanics), I must say it’s damn good.  The pictures are appropriate, the diagrams well thought out, the exposition clear and user friendly without being sappy. 

Time to delve.  

Amen Selah

Linear Algebra survival guide for Quantum Mechanics – VIII

Quantum mechanics has never made an incorrect prediction.  What does it predict? Numbers basically, and real numbers at that.  When you read a dial, or measure an energy in a spectrum, you get a (real) number.  Imaginary currents exist, but I don’t know if you can measure them (I’ll ask the EE who just married into the family this weekend).   So couple the real number output of a measurement with the postulate of quantum mechanics that tells you how to get them, and out pop Hermitian matrices.  

A variety of equivalent postulate systems for QM exist  (Atkins uses 5, our instructor used 4).  All of them say that the state of the system is described by a wavefunction  (which we’re going to think of as a vector, since we’re in linear algebra land).  In LASGFQM – V the  equivalence of the integral of a function and a vector in infinite dimensional space was explained.  LASGFQM – VII explained why every linear transformation can be represented by a matrix, and why every matrix represents a linear transformation.  

An operator is just a linear transformation of a vector space to itself.  This means that if we’re dealing with a finite dimensional vector space, the matrix representing the operator will be square.   Recalling the rules for matrix multiplication (LASGFQM – IV), this means that you can do things like this 

            x  x  x

            x  x  x

            x  x  x

y  y  y               giving  the row vector  xy  xy  xy

 and things like this (a matrix multiplying a column vector z)

                   z

                   z

                   z

x  x  x           xz

x  x  x           xz

x  x  x           xz

giving the column vector on the right.

Of course, way back at the beginning it was explained why, in the inner product of a vector V with itself, one factor has to be the complex conjugate (V*) of the other (so that the inner product of a vector with itself is a real number), and in LASGFQM – VI it was explained why multiplying a row vector by a column vector gives a number.  Here it is

            z

            z

            z

y  y  y     yz

So given that < V | V > really means < V* | V > to physicists, the inner product can be regarded as just another form of matrix multiplication, with the row vector being the conjugate transpose of the column vector.    
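A quick numpy sketch of this point (my addition; the vector is arbitrary):

```python
import numpy as np

V = np.array([1 + 2j, 3 - 1j, 0.5j])   # an arbitrary complex vector

# < V | V > : the conjugate transpose (row vector) times the column vector
inner = V.conj() @ V
print(inner.real)          # 15.25
print(inner.imag == 0.0)   # True: a vector with itself gives a real number

# Reversing the order (column first, row second) gives a 3 x 3 matrix
outer = np.outer(V, V.conj())
print(outer.shape)         # (3, 3)
```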

If you reverse the order of multiplication (column vector first, row vector second), you get an n x n matrix, not a number.   It should be pretty clear by now that you can multiply all 3 matrices together (row vector, n x n matrix, column vector) as long as you keep the order correct.  After all this huffing and puffing, you wind up with — drum roll — a number, which is complex because the vectors of quantum mechanics have complex coefficients (another one of the postulates). 

We’re at a fairly high level of abstraction here.  We haven’t chosen a basis, but all vector spaces have one (even infinite vector spaces).   We’ll talk about them in the next (and probably final) post.

Call the column vector Y, the row vector X, and the matrix M.  We have X M Y = some number.  It should be clear that it doesn’t matter which two matrices we multiply together first, e.g. (X M) Y = X (M Y).
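This grouping rule is easy to verify numerically; a sketch with numpy (random matrix and vector, my addition):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))  # an arbitrary matrix
col = rng.normal(size=3) + 1j * rng.normal(size=3)          # the column vector
row = col.conj()                                            # its conjugate transpose

# Grouping doesn't matter, and the result is a single (complex) number
assert np.allclose((row @ M) @ col, row @ (M @ col))
```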

Recall that differentiation and integration are linear operators, so they can be represented by matrices.  The wavefunction is represented by a column vector.  Various things you want to know (kinetic energy, position) are represented by linear operators in QM.  

Here’s the postulate: For a given wavefunction Y,  any measurement on it (given by a linear operator M ) is always a REAL number  and is given by  the

conjugate transpose of Y  times  M times Y (the column vector).   

You have to accept the postulate (because it works ! ! !)  as the QM instructor  said many times.   Don’t ask how it can be like that (Feynman).   

This postulate is all that it takes to make the linear transformation M a very special one — e.g. a Hermitian matrix, with all sorts of interesting properties. Hermite described these matrices in 1855, long before QM.  I’ve tried to find out what he was working on without success.  More about the properties of Hermitian matrices next time, but to whet your appetite, if an element of M is written  Mij, where i is the row and j is the column, and Mij is a complex number, then Mji is the complex conjugate of Mij.  Believe it or not, this all follows from the postulate.
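To see the postulate in action numerically, here is a sketch I’ve added with numpy (the 2 x 2 matrices are made-up examples; H obeys Mji = Mij*, N deliberately does not):

```python
import numpy as np

H = np.array([[1.0, 2 - 1j],
              [2 + 1j, 3.0]])   # Hermitian: H21 is the conjugate of H12
N = np.array([[1.0, 2 - 1j],
              [2 - 1j, 3.0]])   # NOT Hermitian: N21 equals N12 instead

v = np.array([1 + 1j, 2 - 3j])  # an arbitrary state vector

# conjugate transpose of v, times the matrix, times v
print((v.conj() @ H @ v).imag)  # 0.0 -- the Hermitian matrix gives a real number
print((v.conj() @ N @ v).imag)  # 2.0 -- the non-Hermitian one does not
```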

Linear Algebra survival guide for Quantum Mechanics – VII

In linear algebra all the world’s a matrix (even vectors). Everyone (except me in the last post) numbers matrix elements by the following subscript convention — the row always comes first, then the column (mnemonic: Roman Catholic).  Similarly, matrix size is always written  a x b, where a is the number of rows and b the number of columns.  Vectors in quantum mechanics are written both ways, as column vectors (n x 1) or as row vectors (1 x n).

Vectors aren’t usually called matrices, but matrices they are when it comes to multiplication. Vectors can be multiplied by a matrix (or multiply a matrix) using the usual matrix multiplication rules.  That’s one reason the example in LASGFQM – VI was so tedious — I wanted to show how matrices of different sizes could be multiplied together.  The order of the matrices is crucial.  The first matrix A must have the same number of columns  that the second matrix (B) has rows — otherwise it just doesn’t work.  The product matrix has the number of rows of matrix A and the columns of matrix B.  

So  it is possible to form  A B where A is 3 x 4 and B is 4 x 5 giving a 3 x 5 matrix, but B A makes no sense.  If you get stuck use the Hubbard method of writing them out (see the last post).  Here is a 3 x 3 matrix (A) multiplying a 3 x 1 matrix (vector B)

            B11

            B21

            B31

A11 A12 A13     A11*B11 + A12*B21 + A13*B31  — this is a single number

A21 A22 A23     A21*B11 + A22*B21 + A23*B31  — ditto

A31 A32 A33     A31*B11 + A32*B21 + A33*B31

AB is just another 3 x 1 vector.  So the matrix just transforms one 3 dimensional vector into another.

You should draw a similar diagram and see why B A is impossible.  What about

C  (1 x 3) times D (3 x 3)?  You get CD, a 1 x 3 matrix (row vector), back.

                          D11 D12 D13

                         D21 D22 D23

                         D31 D32  D33

C11 C12 C13                                    What is CD12?
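If you want to check the shape rules with numpy (a sketch I’ve added; not part of the original exercise), they look like this:

```python
import numpy as np

A = np.ones((3, 3))   # a 3 x 3 matrix
B = np.ones((3, 1))   # a 3 x 1 column vector
C = np.ones((1, 3))   # a 1 x 3 row vector

print((A @ B).shape)  # (3, 1): matrix times column is another column
print((C @ A).shape)  # (1, 3): row times matrix is another row

# B A is impossible: B's 1 column doesn't match A's 3 rows
try:
    B @ A
except ValueError:
    print("B A makes no sense")
```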

Suppose we get concrete and make B the column vector whose coefficients are 1, 0, 0 — that is, B is the first basis vector itself

            1

            0

            0

A11 A12 A13     A11

A21 A22 A23     A21

A31 A32 A33     A31

The first time I saw this, I didn’t understand it.   I thought  the mathematicians were going back to the old Cartesian system of standard orthonormal vectors.  They weren’t doing this at all.  Recall that we’re in a vector space and the column vector is really the 3 coefficients multiplying  the 3 basis vectors (which are not specified).  So you don’t have to mess around with choosing a basis; the result is true for ALL bases of a 3 dimensional vector space.  The power of abstraction.  The first column of A shows what the first basis vector goes to (in general), the second column shows what the second basis vector goes to.  Back in LASGFQM – IV, it was explained why any linear transformation (call it T) of a basis vector (call it C1) to another vector space must look like this

T(C1) =  t11 * D1 + t12 * D2 + …   for however many basis vectors vector space D has.

 Well, in the above example we’re going from a 3 dimensional vector space to another, and the first column of matrix A tells us what basis vector #1 goes to.  This is why every linear transformation can be represented by a matrix and every matrix represents a linear transformation.  Sometimes abstraction saves a lot of legwork.  

A more geometric way to look at all this is to regard an  n x n matrix multiplying an n x 1 vector as moving it around in n dimensional space (keeping one end fixed at the origin — see below).  So 

1  0  0 

0  1  0 

0  0  2

just multiplies the third basis vector by 2 leaving the other two alone.  

The notation is consistent. Recall that any linear transformation must leave the zero vector unchanged (see LASGFQM – I for a proof).  Given the rules for multiplying a matrix times a vector, this happens with a column vector which is all zeros.

The geometrically inclined can start thinking about what the possible linear transformations can do to three dimensional space (leaving the origin fixed).  Rotations about the origin are one possibility, expansion or contraction along a single basis vector are two more, projections down to a 2 dimensional plane or a 1 dimensional line are two more.  There are others (particularly when we’re in a vector space with complex numbers for coefficients — e.g. all of quantum mechanics). 
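Here is a numpy sketch (my addition) of three of these transformations acting on an arbitrary vector:

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])   # coefficients of an arbitrary vector

# Expansion along the third basis vector (the matrix shown above)
stretch = np.diag([1.0, 1.0, 2.0])

# Rotation by 90 degrees about the third axis
theta = np.pi / 2
rotate = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])

# Projection down onto the plane of the first two basis vectors
project = np.diag([1.0, 1.0, 0.0])

print(stretch @ v)    # [1. 2. 6.]
print(project @ v)    # [1. 2. 0.]
# All of them leave the zero vector fixed at the origin
zero = np.zeros(3)
for M in (stretch, rotate, project):
    assert np.allclose(M @ zero, zero)
```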

Up next time, eigenvectors, adjoints, and (hopefully) Hermitian operators.  That will be about it.  The point of these posts (which are far more extensive than I thought they would be when I started out) is to show you how natural the language of linear algebra is, once you see what’s going on under the hood.  It is not to teach quantum mechanics, which I’m still learning to see how it is used in chemistry.  QM is far from natural (although it describes the submicroscopic world — whether it can ever describe the world we live in is another question), but, if these posts are any good at all, you should be able to understand the language in which QM is expressed.

Linear Algebra survival guide for Quantum Mechanics – VI

Why is linear algebra like real estate?   Well, in linear algebra the 3 most important things are notation, notation, notation.  I’ve shown how two sequential linear transformations can be melded into one, but you’ve seen nothing about the matrix representation of a linear transformation.  

Here’s the playing field from LASGFQM – IV again.  There are 3 vector spaces, A, B and C of dimensions 3, 4, and 5, with bases {A1, A2, A3}, {B1, B2, B3, B4} and {C1, C2, C3, C4, C5}.  Then there is linear transformation T which transforms A into B, and linear transformation S which transforms B into C.

We have T(A1) = AB11 * B1 + AB12 * B2 + AB13 *B3 + AB14*B4

S(B1) = BC11 *C1 + BC12 *C2 + BC13 *C3 + BC14 * C4 + BC15 * C5
S(B2) = BC21 *C1 + BC22 *C2 + BC23 *C3 + BC24 * C4 + BC25 * C5
S(B3) = BC31 *C1 + BC32 *C2 + BC33 *C3 + BC34 * C4 + BC35 * C5
S(B4) = BC41 *C1 + BC42 *C2 + BC43 *C3 + BC44 * C4 + BC45 * C5

To see the symmetry of what is going on you may have to make the print size smaller so the equations don’t slop over the linebreak. 

So after some heavy lifting we eventually arrived at: 

S(T(A1)) = AB11 * ( BC11 * C1  +  BC12 * C2  +  BC13 * C3   +   BC14 * C4   +   BC15 * C5 ) +

                AB12 * ( BC21 * C1  +  BC22 * C2  +  BC23 * C3   +   BC24 * C4   +   BC25 * C5 ) +

                AB13 * ( BC31 * C1  +  BC32 * C2  +  BC33 * C3   +   BC34 * C4   +   BC35 * C5 ) +

                 AB14 * ( BC41 * C1  +  BC42 * C2  +  BC43 * C3   +   BC44 * C4   +   BC45 * C5 )

So that 

S(T(A1)) = (AB11 * BC11 + AB12 * BC21 + AB13 * BC31 + AB14 * BC41) C1  +  

         (AB11 * BC12 + AB12 *BC22 + AB13 * BC32 + AB14 * BC42)  C2 + 

   etc. etc. 

All very open and above board, and obtained  just by plugging the B”s in terms of the C’s into the A’s in terms of the B’s to get the A’s in terms of the C’s.  

Notice that what we could call AC11 is just AB11 * BC11 + AB12 * BC21 + AB13 * BC31 + AB14 * BC41 and AC12 is just AB11 * BC12 + AB12 *BC22 + AB13 * BC32 + AB14 * BC42.  We need another 13 such sums to be able to express a vector in A (which is a unique linear combination of A1, A2, A3 because the three of them are a basis) in terms of the 5 C basis vectors.  It’s dreary but it can be done, and you just saw part of it.  

You don’t want to figure this out all the time.  So represent T as a rectangular array with 4 rows and 3 columns

AB11   AB21  AB31
AB12   AB22  AB32
AB13   AB23  AB33
AB14   AB24  AB34

Represent S as a rectangular array with 5 rows and 4 columns 

BC11   BC21   BC31  BC41
BC12   BC22  BC32  BC42
BC13   BC23  BC33  BC43
BC14   BC24  BC34  BC44
BC15   BC25  BC35  BC45

Now plunk the array of AB’s on top of (and to the right) of the array of BC’s

                                                 AB11   AB21  AB31
                                                AB12   AB22  AB32
                                                AB13   AB23  AB33
                                                AB14   AB24  AB34
BC11   BC21   BC31  BC41  AC11
BC12   BC22  BC32  BC42
BC13   BC23  BC33  BC43
BC14   BC24  BC34  BC44
BC15   BC25  BC35  BC45

Recall that (after much tedious algebra) we obtained that

AC11 was just AB11 * BC11 + AB12 * BC21 + AB13 * BC31 + AB14 * BC41

But AC11 is just as if the first row of the BC array were a vector and the first column of the AB array were also a vector and you formed the dot product.  Well, they are, and you did just that to find element AC11 of the array representing the linear transformation from A to C.  Do this 14 more times to get all 15 possible combinations of 3 As and 5 Cs and you get an array of numbers with 5 rows and 3 columns.  This is the AC matrix, and this is why matrix multiplication is the way it is.

Note: we have multiplied a 5 row times 4 column array by a 4 row 3 column array.  Recall that you can only form the inner product of vectors with the same number of components (e.g. they have to be in vector spaces of the same dimension).  

We have T: A to B (dimension 3 to dimension 4)

                 S: B to C (dimension 4 to dimension 5)

     This is written as ST (convention has it that the transformation on the right is always done first — this takes some getting used to, but at least everyone follows it, so it’s like medical school — the appendix is on the right, just remember it).   Notice that  TS makes absolutely no sense.   S takes you to a vector space of dimension 5, then T tries to start with a different vector space.   This is why, when multiplying arrays (matrices), the number of columns of the matrix on the left must match the number of rows of the matrix on the right (or the top as I’ve drawn it — thanks to John and Barbara Hubbard and their great book on Vector Calculus).  If the two matrices are rectangular (as we have here), only one order of multiplication is possible.  
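The whole construction can be checked with numpy (a sketch I’ve added, using the standard row-column convention rather than the transposed indices in the arrays above):

```python
import numpy as np

rng = np.random.default_rng(2)
T = rng.normal(size=(4, 3))   # T: A (dim 3) -> B (dim 4), a 4 x 3 matrix
S = rng.normal(size=(5, 4))   # S: B (dim 4) -> C (dim 5), a 5 x 4 matrix

# The matrix of the composite ST: columns of S (4) match rows of T (4)
ST = S @ T
print(ST.shape)               # (5, 3): dimension 3 in, dimension 5 out

# Doing T first and then S agrees with the single matrix ST
a = rng.normal(size=3)        # an arbitrary vector in A
assert np.allclose(S @ (T @ a), ST @ a)

# TS makes no sense: T's 3 columns don't match S's 5 rows
try:
    T @ S
except ValueError:
    print("TS is impossible")
```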

More notation, and an apology.  Matrix T is a 4 row by 3 column matrix — this is always written as a 4 x 3 matrix.  Similarly for the coefficients of each element which I have in some way screwed up (but at least I did so consistently).  Invariably the matrix element (just a number) in the 3rd column of the fourth row is written element43 — If you look at what I’ve written everything is bassackwards.  Sorry, but the principles are correct. The mnemonic for the order of the coefficients is Roman Catholic (row column), a nonscatological mnemonic for once. 

That’s a lot of tedium, but it does explain why matrix multiplication is the way it is.  Notice a few other things.  The matrices you saw were 4 x 3 and 5 x 4, but 3 x 1 matrices are possible as well.  Such matrices are called column vectors.  Similarly 1 x 3 matrices exist and are called row vectors.  So what do you get if you multiply a 1 x 3 vector by a 3 x 1 vector?  

You get a 1 x 1 matrix, or a number.  This is another way to look at the inner product of two vectors.  Usually vectors are written as column vectors ( n x 1 ), with n rows and 1 column.  1 x n row vectors are known as the transpose of the column vector. 

That’s plenty for now.  Hopefully the next post will be more interesting.  However, physics needs to calculate things and see if the numbers they get match up with experiment.  This means that they must choose a basis for each vector space, and express each vector as an array of coefficients of that basis.  Mathematicians avoid this where possible, just using the properties of vector space bases to reason about linear transformations, and the properties of various linear transformations to reason about bases.  You’ll see the power of this sort of thinking in the next post.  If you ever study differentiable manifolds you’ll see it in spades.