Category Archives: Linear Algebra Survival Guide for Quantum Mechanics

An old year’s resolution

One of the things I thought I was going to do in 2012 was learn about relativity.   For why see http://luysii.wordpress.com/2012/09/11/why-math-is-hard-for-me-and-organic-chemistry-is-easy/.  Also my cousin’s new husband wrote a paper on a new way of looking at it.  I’ve been putting him off as I thought I should know the old way first.

I knew that general relativity involved lots of math such as manifolds and the curvature of space-time.  So rather than read verbal explanations, I thought I’d learn the math first.  I started reading John M. Lee’s two books on manifolds.  The first involves topological manifolds, the second involves manifolds with extra structure (smoothness) permitting calculus to be done on them.  Distance is not a topological concept, but is absolutely required for calculus — that’s what the smoothness is about.

I started with “Introduction to Topological Manifolds” (2nd. Edition) by John M. Lee.  I’ve got about 34 pages of notes on the first 95 pages (25% of the text), and made a list of the definitions I thought worth writing down — there are 170 of them. Eventually I got through a third of its 380 pages of text.  I thought that might be enough to help me read his second book “Introduction to Smooth Manifolds” but I only got through 100 of its 600 pages before I could see that I really needed to go back and completely go through the first book.

This seemed endless, and would probably take 2 more years.  This shouldn’t be taken as a criticism of Lee — his writing is clear as a bell.  One of the few criticisms of his books is that they are so clear, you think you understand what you are reading when you don’t.

So what to do?  A prof at one of the local colleges, James J. Callahan, wrote a book called “The Geometry of Spacetime” which concerns special and general relativity.  I asked if I could audit the course on it he’d been teaching there for decades.  Unfortunately he said “been there, done that” and had no plans ever to teach the course again.

Well, for the last month or so, I’ve been going through his book.  It’s excellent, with lots of diagrams and pictures, and wide margins for taking notes.  A symbol table would have been helpful, as would answers to the excellent (and fairly difficult) problems.

This also explains why there have been no posts in the past month.

The good news is that the only math you need for special relativity is calculus and linear algebra.  Really nothing more.  No manifolds.  At the end of the first third of the book (about 145 pages) you will have a clear understanding of

1. time dilation — why time slows down for moving objects

2. length contraction — why moving objects shrink

3. why two observers looking at the same event can see it happening at different times.

4. the Michelson-Morley experiment — but the explanation of it in the Feynman lectures on physics 15-3, 15-4 is much better

5. The kludge Lorentz used to make Maxwell’s equations obey the Galilean principle of relativity (e.g. Newton’s first law)

6. How Einstein derived Lorentz’s kludge purely by assuming the velocity of light was constant for all observers, never mind how they were moving relative to each other.  Reading how he did it is like watching a master sculptor at work.

Well, I’ll never get through the rest of Callahan by the end of 2012, but I can see doing it in a few more months.  You could conceivably learn linear algebra by reading his book, but it would be tough.  I’ve written some fairly simplistic background linear algebra for quantum mechanics posts — you might find them useful. https://luysii.wordpress.com/category/linear-algebra-survival-guide-for-quantum-mechanics/

One of the nicest things was seeing clearly what it means for different matrices to represent the same transformation, and why you should care.  I’d seen this many times in linear algebra, but here I finally saw how simple reflection through an arbitrary line through the origin becomes when you (1) rotate the line onto the x axis by arctan(y/x) radians, (2) change the y coordinate to -y — by an incredibly simple matrix — and (3) rotate back to the original angle.

That’s why any two n x n matrices X and Y represent the same linear transformation if they are related by an invertible matrix Z in the following way: X = Z^-1 * Y * Z.
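
Here’s a minimal numpy sketch of that recipe (the particular line, through the origin and the point (4, 3), and all the variable names are my own choices, not Callahan’s):

```python
import numpy as np

def rotation(theta):
    """Counterclockwise rotation of the plane by theta radians."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

theta = np.arctan2(3.0, 4.0)        # the line through the origin and the point (4, 3)
Z = rotation(-theta)                # step (1): rotate the line down onto the x axis
Y = np.array([[1.0,  0.0],          # step (2): the incredibly simple matrix, y -> -y
              [0.0, -1.0]])
X = np.linalg.inv(Z) @ Y @ Z        # step (3): rotate back; X = Z^-1 * Y * Z

print(np.round(X @ np.array([4.0, 3.0]), 6))   # a point on the line stays put: [4. 3.]
print(np.round(X @ np.array([-3.0, 4.0]), 6))  # a vector perpendicular to it flips: [3. -4.]
```

X and Y are different matrices, but they represent the same reflection, just written with respect to two different coordinate systems.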

Merry Christmas and Happy New Year (none of that Happy Holidays crap for me)

Willock pp. 51 – 104

This is a continuation of my notes, as I read “Molecular Symmetry” by David J. Willock.  As you’ll see, things aren’t going particularly well.  Examples of concepts are great once they’ve been defined, but in this book it’s examples first, definitions later (if ever).

p. 51 — Note all the heavy lifting required to produce an object with only (italics) C4 symmetry (figure 3.6).  First, you need 4 objects in a plane (so they rotate into each other), separated by 90 degrees.  That’s far from enough, as there are multiple planes of symmetry for 4 objects in a plane (I count 5 — how many do you get?).  So you need another 4 objects in a plane parallel to the first.  These objects must be a different distance from the symmetry axis, otherwise the object will have a C2 axis of symmetry midway between the two planes.  Lastly, no object in the second plane can lie on a line parallel to the axis of symmetry which contains an object in the first plane — i.e. the two groups of 4 must be staggered relative to each other.    It’s even more complicated for S4 symmetry.

p. 51 — The term classes of operation really hasn’t been defined (except by example).   Also this is the first example of (the heading of) a character table — which hasn’t been defined at this point.

p. 52 — Note H2O2 has C2 symmetry because it is not (italics) planar.   Ditto for (S,S)-1,2-dimethylcyclopropane (more importantly, this is true for disulfide bonds between cysteines forming cystines — a way of tying parts of proteins to each other).

p. 55 — Pay attention to the nomenclature: Cnh means that an n-fold axis (Cn) is present along with a horizontal plane of symmetry.  Cnv means that, instead, a vertical plane of symmetry is present (along with the Cn axis).

p. 57 — Make sure you understand why C4h doesn’t have vertical planes of symmetry.

p. 59 — A bizarre pedagogical device — defining groups whose first letter is D by something they are not (italics) — which itself (cubic groups) is at present undefined.  

Willock then regroups by defining what Dn actually is.

It’s a good exercise to try to construct the D4 point group yourself. 

p. 61 — “It does form a subgroup” — If subgroup was ever defined, I missed it.  Subgroup is not in the index (neither is group!).  Point group is in the index, and point subgroup is as well, appearing on p. 47 — but point subgroup isn’t defined there.

p. 62 — Note the convention — the Z direction is perpendicular to the plane of a planar molecule.

p. 64 — Why are linear molecules called Cinfinity ? — because any rotation around the axis of symmetry (the molecule itself) leaves the molecule unchanged, and there are an infinity of such rotations.

p. 67 — Ah,  the tetrahedron embedded in a cube — exactly the way an organic chemist should think of the sp3 carbon bonds.  Here’s a mathematical problem for you.  Let the cube have sides of 1, the bonds as shown in figure 3.27, the carbon in the very center of the cube — now derive the classic tetrahedral bond angle — answer at the end of this post. 

p. 67 — 74 — The discussions of symmetries in various molecules are exactly why you should have the conventions for naming them down pat.

p. 75 — in the second paragraph affect should be effect (at least in American English)

p. 76 — “Based on the atom positions alone we cannot tell the difference between the C2 rotation and the sigma(v) reflection, because either operation swaps the positions of the hydrogen atoms.”   Do we ever want to actually do this (for water that is)? Hopefully this will turn out to be chemically relevant. 

p. 77 — Note that the definition of character refers to the effect of a symmetry operation on one of an atom’s orbitals (not its position).  Does this only affect atoms whose position is not (italics) changed by the symmetry operation?  Very important to note that the character is -1 only on reversal of the orbital — later on, non-integer characters will be seen.  Note also that each symmetry operation produces a character (number) for each orbital, so there are (number of symmetry operations) * (number of orbitals) characters in a character table.

p. 77 – 78 — Note that the naming of the orbitals is consistent with what has gone on before.  p(z) is in the plane of the molecule because that’s where the axis of rotation is.

Labels are introduced for each of the possible standard sets of characters (but standard set really isn’t defined).  A standard set (of sets of characters??) is an irreducible representation for the group.  

Is one set of characters an irreducible representation by itself or is it a bunch of them? The index claims that this is the definition of irreducible representation, but given the ambiguity about what a standard set of characters actually is (italics), we don’t really know what an irreducible representation actually is.   This is definition by example, a pedagogical device foreign to math, but possibly a good pedagogical device — we’ll see.  But at this point, I’m not really clear what an irreducible representation actually is.

p. 77 — In a future edition, it would be a good idea to label the x, y and z axes (and even perhaps draw in the px, py and pz orbitals), and, if possible, put figure 4.2 on the same page as table 4.2.  Eventually things get figured out, but it takes a lot of page flipping.

p. 79 — Further tightening of the definition of a representation — it’s one row of a character table.

p. 79 — Nice explanation of orbital phases, but do electrons in atoms know or care about them?

p. 80 — Note that the x-y axes are rotated 90 degrees in going from figure 4.4a to figure 4.4b (why?).   Why talk about d orbitals? — they’re empty in H2O but possibly not in other molecules with C2v symmetry.

p. 80 — Affect should be effect (at least in American English)

p. 81 — B1 x B2 = A2 doesn’t look like a sum to me.  If you actually summed them you’d get 2 for E, -2 for C2, and 0 for the other two.  It does look like the product though.

pp. 81 – 82 — Far from sure what is going on in section 4.3

p.82 — Table 4.4b does look like multiplication of the elements of B1 by itself. 
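
Since the book is being coy about it, here is the C2v character table as a few Python lists (my own transcription of the standard table, with the operations in the order E, C2, sigma_v(xz), sigma_v'(yz)); multiplying B1 and B2 element by element gives A2, while summing them gives the 2, -2, 0, 0 noted above:

```python
# C2v character table, operations in the order E, C2, sigma_v(xz), sigma_v'(yz)
c2v = {
    "A1": [1,  1,  1,  1],
    "A2": [1,  1, -1, -1],
    "B1": [1, -1,  1, -1],
    "B2": [1, -1, -1,  1],
}

product = [a * b for a, b in zip(c2v["B1"], c2v["B2"])]
total   = [a + b for a, b in zip(c2v["B1"], c2v["B2"])]

print(product, product == c2v["A2"])   # [1, 1, -1, -1] True  -> B1 x B2 = A2
print(total)                           # [2, -2, 0, 0]        -> the 'sum' on p. 81, not an irreducible representation
print([a * a for a in c2v["B1"]])      # [1, 1, 1, 1]         -> B1 x B1 = A1 (table 4.4b)
```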

p. 82 — Not sure when basis vectors first made their appearance, possibly here.  I slid over this on first reading since basis vectors were quite familiar to me from linear algebra (see the category http://luysii.wordpress.com/category/linear-algebra-survival-guide-for-quantum-mechanics/ ).  But again, the term is used here without really being defined.  Probably so as not to confuse, the first basis vectors shown are at 90 degrees to each other (x and y), but later on (p. 85) they don’t have to be — there the basis vectors point along the 3 hydrogens of ammonia.

p. 83 — Very nice way to bring in matrices, but it’s worth noting that each matrix stands for just one symmetry operation.  But each matrix lets you see what happens to all (italics) the basis vectors you’ve chosen.

p. 84 — Get very clear in your mind that when you see an expression of the form

symmetry_operation1 symmetry_operation2 

juxtaposed to each other — that you do symmetry_operation2  FIRST.

p. 87  – Notice that the term character is acquiring a second meaning here — it no longer is the effect of a symmetry operation on one of an atom’s orbitals (not the atom’s position), it’s the effect of a symmetry operation on a whole set of basis elements.

p. 88 — Notice that in BF3, the basis vectors no longer align with the bonds (as they did in NH3), meaning that you can choose the basis vectors any way you want.  

p.89 — Figure 4.9 could be markedly improved.  One must distinguish between two types of lines (interrupted and continuous), and two types of arrowheads (solid and barbed), making for confusion in the diagrams where they all appear together (and often superimposed).

Given the orbitals as combinations of two basis vectors, the character of a symmetry operation acting on a basis vector acquires yet another meaning — how much of the original orbital is left after the symmetry operation.

p. 91 — A definition of irreducible representations as the ‘simplest’ symmetry behavior.  Simplest is not defined.  Also for the first time it is noted that symmetries can be of orbitals or vibrations.  We already know they can be of the locations of the atoms in a molecule.  

Section 4.8 is extremely confusing.

p. 92 — We now find out what was going on with the character sum of 2 on p. 81 — the sums were 2 and 0 because the representations were reducible.

 

p. 93 (added 29 Jan ’12) — We later find out (p. 115) that the number of irreducible representations of a point group is the number of classes.  The index says that class is defined as an ‘equivalent set of operations’ — but how two distinct operations are equivalent is never defined, just illustrated.

p. 100 — Great to have the logic behind the naming of the labels used for irreducible representations (even if they are far from intuitive)

p. 101 — There is no explanation of the difference between basis vector and basis function. 

All in all, a very difficult chapter to untangle.  I’m far from sure I understand pp. 92 – 100.  However, hope lies in future chapters and I’ll push on.  I think it would be very difficult to learn from this book (so far) if you were totally unfamiliar with symmetry.

Answer to the problem on p. 67.  Let the sides of the cube be of length 1.  The bonds are all the same length, so the carbon must be in the center of the cube.  Any two of the bonds point to opposite corners of a face of the cube, so the ends of those bonds are sqrt(2) apart.   Now drop a perpendicular from the carbon in the center to the middle of the line joining them; it has length 1/2.  So we have a right triangle with legs of 1/2 and sqrt(2)/2, and the half-angle at the carbon has tangent (sqrt(2)/2)/(1/2) = sqrt(2).  The answer is therefore 2 * arctan(sqrt(2)).  Arctan(sqrt(2)) is 54.7356 degrees, giving the bond angle as 109.47 degrees.
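
If you’d rather let the computer do the trigonometry, here’s a quick check of that answer (a sketch of my own, using the cube picture from p. 67):

```python
import numpy as np

center = np.array([0.5, 0.5, 0.5])        # carbon at the center of a unit cube
end1 = np.array([0.0, 0.0, 0.0])          # two of the four alternating corners (bond ends)
end2 = np.array([1.0, 1.0, 0.0])

b1 = end1 - center                        # the two bond vectors
b2 = end2 - center
cos_angle = np.dot(b1, b2) / (np.linalg.norm(b1) * np.linalg.norm(b2))

print(np.degrees(np.arccos(cos_angle)))        # 109.4712...
print(np.degrees(2 * np.arctan(np.sqrt(2))))   # the same thing, from the right triangle
```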

Linear Algebra survival guide for Quantum Mechanics -IX

The heavy lifting is pretty much done.  Now for some fairly spectacular results, and then back to reading Clayden et. al.  To make things concrete, let Y be a 3 dimensional vector with complex coefficients c1, c2 and c3.  The coefficients multiply a set of basis vectors (which exist since all finite and infinite dimensional vector spaces have a basis).  The glory of abstraction is that we don’t actually have to worry about what the basis vectors actually are, just that they exist.  We are free to use their properties, one of which is orthogonality (I may not have proved this — you should if I haven’t). So the column vector is

c1

c2

c3

and the corresponding row vector (the conjugate transpose)  is 

c1*  c2*  c3*

Next, I’m going to write a 3 x 3 matrix M as follows, where each Aij is an arbitrary complex number (we’re about to discover that M has to be Hermitian).

A11  A12   A13

A21  A22  A23

A31  A32  A33

Now form the product

                            A11  A12   A13

                           A21  A22  A23

                           A31  A32  A33

c1*  c2*  c3*      X      Y       Z

The net effect is to form another row vector with 3 components.   All we need for what I want to prove  is an explicit formula for  X

X =  c1*(A11) + c2*(A21) + c3*(A31)

When we  multiply the row vector obtained by the column vector on the right we get

c1 [ c1*(A11) + c2*(A21) + c3*(A31) ] + c2 [ Y ] + c3 [ Z ]  – which by assumption must be a real number 

Next, form the product of M with the column vector 

                               c1

                               c2 

                               c3

A11  A12   A13     X’

A21  A22  A23     Y’

A31  A32  A33     Z’

This time all we need is X’  which is c1(A11) + c2(A12) + c3(A13)

When we multiply the column vector just obtained by the row vector on the left we get

c1* [ c1(A11) + c2(A12) + c3(A13) ] + c2* Y’ + c3* Z’

Because matrix multiplication is associative (it doesn’t matter whether you do the row-times-matrix step or the matrix-times-column step first), this is exactly the same number as

c1 [ c1*(A11) + c2*(A21) + c3*(A31) ] + c2 [ Y ] + c3 [ Z ]

By the postulate that number is real, so it must also equal its own complex conjugate.  Conjugating the second expression gives

c1* [ c1(A11*) + c2(A21*) + c3(A31*) ] + c2* [ Y* ] + c3* [ Z* ]

Notice that c1, c2, c3 can each be any of the infinite number of complex numbers, without disturbing the equality.  The ONLY way the first expression can equal this conjugate for every choice of c1, c2 and c3 is if the corresponding terms match up:

c1*[c1(A11)] = c1*[c1(A11*)]  – so A11 = A11*, i.e. A11 is real

and c1*[c2(A12)] = c1*[c2(A21*)]  – so A12 = A21*, or A12* = A21, which is the same thing

and c1*[c3(A13)] = c1*[c3(A31*)]  – so A13 = A31*

If the last two look a bit strange, go back to LASGFQM – II: c1*(c2) does NOT equal c1(c2*), so the conjugation can’t simply be shuffled off the matrix element and onto the coefficients.  The requirement that the measurement be real has to be absorbed by the matrix elements themselves.

So just by following the postulate of quantum mechanics about the type of linear transformation (called Hermitian) which can result in a measurement, we find that the matrix representing the linear transformation, the Hermitian matrix, has the property that Mij  = Mji*  (the first letter is the row index and the second is the column index).  This also means that the diagonal elements of any Hermitian matrix are real.  Now when I first bumped up against Hermitian matrices they were DEFINED this way, making them seem rather magical.  Hermitian matrices are in fact natural, and they do just what quantum mechanics wants them to do. 
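
Here’s a numerical sanity check of all that (my own sketch, with random entries and nothing physical about them): build a matrix obeying Mij = Mji* and sandwich it between a complex vector and its conjugate transpose; the result is real, while a matrix violating the condition gives a complex number.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))

M_hermitian = A + A.conj().T      # forces Mij = Mji*; note the diagonal comes out real
M_arbitrary = A                   # a generic complex matrix, for contrast

c = rng.normal(size=3) + 1j * rng.normal(size=3)   # arbitrary complex coefficients c1, c2, c3

print(c.conj() @ M_hermitian @ c)   # imaginary part is zero (up to roundoff): a real 'measurement'
print(c.conj() @ M_arbitrary @ c)   # generally has a nonzero imaginary part
```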

Some more nomenclature:  Mij  = Mji* means that a Hermitian matrix equals its conjugate transpose   (which is another even more obscure way to define them). The conjugate transpose of a matrix is called the adjoint.  This means that the row vector as we’ve defined it is the adjoint of the column vector.  This  also  is why Hermitian matrices are called self-adjoint.   

That’s about it. Hopefully when you see this stuff in the future, you won’t be just mumbling incantations.   But perhaps you are wondering, where are the eigenvectors, where are the eigenvalues in all this?  What happened to the Schrodinger equation beloved in song and story?   That’s for the course you’re taking, but briefly and without explanation, the basis vectors I’ve been talking about (without explicitly describing them) all result as follows:

Any Hermitian operator times wavefunction = some number times same wavefunction.  [1]

Several points:  a given Hermitian operator changes most wavefunctions into different wavefunctions, so [ 1 ] doesn’t hold for every wavefunction.

IF [1] does hold the wavefunction is called an eigenfunction, and  ‘some number’ is the eigenvalue.  

There is usually a set of eigenfunctions for a given Hermitian operator — these are the basis functions (basis vectors of the infinite dimensional Hilbert space) of the vector space I was describing.  You find them by finding solutions of the Schrodinger equation H Psi = E Psi, but that’s for your course; at least now you know the lingo.   Hopefully, these last few words are less frustrating than the way Tom Wolfe ended “The Bonfire of the Vanities” years ago — the book just stopped rather than ended.
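
For the finite dimensional case you can see all of this on the screen.  A small numpy sketch (mine, not from the course): a Hermitian matrix has real eigenvalues, and its eigenvectors form an orthonormal basis.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = A + A.conj().T                          # a Hermitian 'operator' on a 4 dimensional space

eigenvalues, eigenvectors = np.linalg.eigh(H)   # eigh is written specifically for Hermitian matrices

print(eigenvalues)                                # all real, even though H is complex
v = eigenvectors[:, 0]                            # the columns are the eigenvectors
print(np.allclose(H @ v, eigenvalues[0] * v))     # True:  H v = (some number) v
print(np.allclose(eigenvectors.conj().T @ eigenvectors, np.eye(4)))  # True: an orthonormal basis
```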

I thought the course I audited was excellent, but we never even got into bonding.  Nonetheless, I think the base it gave was quite solid and it’s time to find out.  Michelle Francl recommended “Modern Quantum Chemistry” by Attila (yes Attila!) Szabo and Neil Ostlund as the next step.  You can’t beat the price as it’s a Dover paperback.  I’ve taken a brief look at “Molecular Quantum Mechanics” by Atkins and Friedman — it starts with the postulates and moves on from there.  Plenty of pictures and diagrams, but no idea how good it is.  Finally, 40 years ago I lived across the street from a Physics grad student (whose name I can’t recall), and the real hot stuff back then was a book by Prugovecki called “Quantum Mechanics in Hilbert Space”.  Being a pack rat, I still have it. We’ll see.

One further point.  I sort of dumped on Giancoli’s book on Physics, which I bought when the course was starting up 9/09 — pretty pictures and all that.  Having been through the first 300 pages or so (all on mechanics), I must say it’s damn good.  The pictures are appropriate, the diagrams well thought out, the exposition clear and user friendly without being sappy.

Time to delve.  

Amen Selah

Linear Algebra survival guide for Quantum Mechanics – VIII

Quantum mechanics has never made an incorrect prediction.  What does it predict? Numbers basically, and real numbers at that.  When you read a dial, or measure an energy in a spectrum, you get a (real) number.  Imaginary currents exist, but I don’t know if you can measure them (I’ll ask the EE who just married into the family this weekend).   So couple the real number output of a measurement with the postulate of quantum mechanics that tells you how to get them, and out pop Hermitian matrices.

A variety of equivalent postulate systems for QM exist  (Atkins uses 5, our instructor used 4).  All of them say that the state of the system is described by a wavefunction  (which we’re going to think of as a vector, since we’re in linear algebra land).  In LASGFQM – V the equivalence of the integral of a function and a vector in infinite dimensional space was explained.  LASGFQM – VII explained why every linear transformation can be represented by a matrix, and why every matrix represents a linear transformation.

An operator is just a linear transformation of a vector space to itself.  This means that if we’re dealing with a finite dimensional vector space, the matrix representing the operator will be square.   Recalling the rules for matrix multiplication (LASGFQM – IV), this means that you can do things like this 

            x  x  x

            x  x  x

            x  x  x

y  y  y               giving  the row vector  xy  xy  xy

 and things like this 

                          z

                          z

                          z

            x  x  x          giving the  column vector  xz

            x  x  x                                                           xz

            x  x  x                                                           xz

Of course way back at the beginning it was explained why, in the inner product of a vector V with itself, one factor has to be the complex conjugate (V*) of the other (so that the inner product of a vector with itself is a real number), and in LASGFQM – VI it was explained why multiplying a row vector by a column vector gives a number.  Here it is

                 z

                 z

                  z

y  y  y      yz

So given that < V | V > really means < V* | V > to physicists, the inner product can be regarded as just another form of matrix multiplication, with the row vector being the conjugate transpose of the column vector.    

If you reverse the order of multiplication (column vector first, row vector second), you get an n x n matrix, not a number.   It should be pretty clear by now that you can multiply all 3 matrices together (row vector, n x n matrix, column vector) as long as you keep the order correct.  After all this huffing and puffing, you wind up with — drum roll — a number, which is complex because the vectors of quantum mechanics have complex coefficients (another one of the postulates).
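
In numpy you can watch the shapes do exactly this (my own toy numbers), keeping the column and row as explicit 3 x 1 and 1 x 3 arrays:

```python
import numpy as np

col = np.array([[1 + 2j], [3j], [2 - 1j]])   # a 3 x 1 column vector
row = col.conj().T                           # its conjugate transpose, a 1 x 3 row vector
M = np.arange(9).reshape(3, 3) + 0j          # any 3 x 3 matrix

print((row @ M @ col).shape)   # (1, 1) -- the triple product collapses to a single number
print((col @ row).shape)       # (3, 3) -- reverse the order and you get a matrix instead
print((row @ M).shape)         # (1, 3) -- row times matrix: another row vector
print((M @ col).shape)         # (3, 1) -- matrix times column: another column vector
```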

We’re at a fairly high level of abstraction here.  We haven’t chosen a basis, but all vector spaces have one (even infinite vector spaces).   We’ll talk about them in the next (and probably final) post.

Call the column vector Y, the row vector X, and the matrix M.  We have X M Y = some number.  It should be clear that it doesn’t matter which two matrices we multiply together first, e.g. (X M) Y = X (M Y).

Recall that differentiation and integration are linear operators, so they can be represented by matrices.  The wavefunction is represented by a column vector.  Various things you want to know (kinetic energy, position) are represented by linear operators in QM.  

Here’s the postulate: For a given wavefunction Y,  any measurement on it (given by a linear operator M ) is always a REAL number  and is given by  the

conjugate transpose of Y  times  M times Y (the column vector).   

You have to accept the postulate (because it works ! ! !)  as the QM instructor  said many times.   Don’t ask how it can be like that (Feynman).   

This postulate is all that it takes to make the linear transformation M a very special one — e.g. a Hermitian matrix, with all sorts of interesting properties. Hermite described these matrices in 1855, long before QM.  I’ve tried to find out what he was working on without success.  More about the properties of Hermitian matrices next time, but to whet your appetite, if an element of M is written  Mij, where i is the row and j is the column, and Mij is a complex number, then Mji is the complex conjugate of Mij.  Believe it or not, this all follows from the postulate.

Linear Algebra survival guide for Quantum Mechanics – VII

In linear algebra all the world’s a matrix (even vectors). Everyone (except me in the last post) numbers matrix elements by the following subscript convention — the row always comes first, then the column (mnemonic Roman Catholic).  Similarly matrix size is always written a x b, where a is the number of rows and b the number of columns.  Vectors in quantum mechanics are written both ways, as column vectors (n x 1) or as row vectors (1 x n).

Vectors aren’t usually called matrices, but matrices they are when it comes to multiplication. Vectors can be multiplied by a matrix (or multiply a matrix) using the usual matrix multiplication rules.  That’s one reason the example in LASGFQM – VI was so tedious — I wanted to show how matrices of different sizes could be multiplied together.  The order of the matrices is crucial.  The first matrix A must have the same number of columns  that the second matrix (B) has rows — otherwise it just doesn’t work.  The product matrix has the number of rows of matrix A and the columns of matrix B.  

So  it is possible to form  A B where A is 3 x 4 and B is 4 x 5 giving a 3 x 5 matrix, but B A makes no sense.  If you get stuck use the Hubbard method of writing them out (see the last post).  Here is a 3 x 3 matrix (A) multiplying a 3 x 1 matrix (vector B)

                            B11

                            B21

                            B31

A11 A12 A13     A11*B11 + A12*B21 + A13*B31  – this is a single number

A21 A22 A23    A21*B11 + A22*B21 + A23*B31 — ditto

A31 A32 A33   etc.

AB is just another 3 x 1 vector.  So the matrix just transforms one 3 dimensional vector into another

You should draw a similar diagram and see why B A is impossible.  What about

C (1 x 3) times D (3 x 3)?  You get CD, a 1 x 3 matrix (row vector), back.

                          D11 D12 D13

                         D21 D22 D23

                         D31 D32  D33

C11 C12 C13                                    What is CD12?

Suppose we get concrete and make B into a column vector of the following type

                            1

                            0

                            0

A11 A12 A13     A11

A21 A22 A23    A21

A31 A32 A33    A31

The first time I saw this, I didn’t understand it.   I thought the mathematicians were going back to the old Cartesian system of standard orthonormal vectors.  They weren’t doing this at all.  Recall that we’re in a vector space and the column vector is really the 3 coefficients multiplying the 3 basis vectors (which are not specified).  So you don’t have to mess around with choosing a basis; the result is true for ALL bases of a 3 dimensional vector space.  The power of abstraction.  The first column of A shows what the first basis vector goes to (in general), the second column shows what the second basis vector goes to.  Back in LASGFQM – IV, it was explained why any linear transformation (call it T) of a basis vector (call it C1) to another vector space must look like this

T(C1) =  t11 * D1 + t12 * D2 + . ..   for however many basis vectors vector space D has.

Well, in the above example we’re going from a 3 dimensional vector space to another, and the first column of matrix A tells us what basis vector #1 goes to.  This is why every linear transformation can be represented by a matrix and every matrix represents a linear transformation.  Sometimes abstraction saves a lot of legwork.
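
A two-line numpy check of that claim (with a matrix of my own whose entries are named after their positions): multiplying by the coefficient vector (1, 0, 0) picks out the first column.

```python
import numpy as np

A = np.array([[11, 12, 13],
              [21, 22, 23],
              [31, 32, 33]])      # entry 'rc' sits in row r, column c

e1 = np.array([1, 0, 0])          # the coefficients of the first basis vector, whatever the basis is
print(A @ e1)                     # [11 21 31] -- the first column of A: where basis vector #1 goes
```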

A more geometric way to look at all this is to regard an  n x n matrix multiplying an n x 1 vector as moving it around in n dimensional space (keeping one end fixed at the origin — see below).  So 

1  0  0 

0  1  0 

0  0  2

just multiplies the third basis vector by 2 leaving the other two alone.  

The notation is consistent. Recall that any linear transformation must leave the zero vector unchanged (see LASGFQM – I for a proof).  Given the rules for multiplying a matrix times a vector, this happens with a column vector which is all zeros.

The geometrically inclined can start thinking about what the possible linear transformations can do to three dimensional space (leaving the origin fixed).  Rotations about the origin are one possibility, expansion or contraction along a single basis vector are two more, projections down to a 2 dimensional plane or a 1 dimensional line are two more.  There are others (particularly when we’re in a vector space with complex numbers for coefficients — e.g. all of quantum mechanics). 

Up next time, eigenvectors, adjoints, and (hopefully) Hermitian operators.  That will be about it.  The point of these posts (which are far more extensive than I thought they would be when I started out) is to show you how natural the language of linear algebra is, once you see what’s going on under the hood.  It is not to teach quantum mechanics, which I’m still learning to see how it is used in chemistry.  QM is far from natural (although it describes the submicroscopic world — whether it can ever describe the world we live in is another question), but, if these posts are any good at all, you should be able to understand the language in which QM is expressed.

Linear Algebra survival guide for Quantum Mechanics – VI

Why is linear algebra like real estate?   Well, in linear algebra the 3 most important things are notation, notation, notation.  I’ve shown how two sequential linear transformations can be melded into one, but you’ve seen nothing about the matrix representation of a linear transformation.  

Here’s the playing field from LASGFQM – IV again.  There are 3 vector spaces, A, B and C of dimensions 3, 4, and 5, with bases {A1, A2, A3}, {B1, B2, B3, B4} and {C1, C2, C3, C4, C5}.  Then there is linear transformation T which transforms A into B, and linear transformation S which transforms B into C.

We have T(A1) = AB11 * B1 + AB12 * B2 + AB13 *B3 + AB14*B4

S(B1) = BC11 *C1 + BC12 *C2 + BC13 *C3 + BC14 * C4 + BC15 * C5
S(B2) = BC21 *C1 + BC22 *C2 + BC23 *C3 + BC24 * C4 + BC25 * C5
S(B3) = BC31 *C1 + BC32 *C2 + BC33 *C3 + BC34 * C4 + BC35 * C5
S(B4) = BC41 *C1 + BC42 *C2 + BC43 *C3 + BC44 * C4 + BC45 * C5

To see the symmetry of what is going on you may have to make the print size smaller so the equations don’t slop over the linebreak. 

So after some heavy lifting we eventually arrived at: 

T(A1) = AB11 * ( BC11 * C1  +  BC12 * C2  +  BC13 * C3   +   BC14 * C4   +   BC15 * C5 ) +

                AB12 * ( BC21 * C1  +  BC22 * C2  +  BC23 * C3   +   BC24 * C4   +   BC25 * C5 ) +

                AB13 * ( BC31 * C1  +  BC32 * C2  +  BC33 * C3   +   BC34 * C4   +   BC35 * C5 ) +

                 AB14 * ( BC41 * C1  +  BC42 * C2  +  BC43 * C3   +   BC44 * C4   +   BC45 * C5 )

So that 

S(T(A1)) = (AB11 * BC11 + AB12 * BC21 + AB13 * BC31 + AB14 * BC41) C1  +

         (AB11 * BC12 + AB12 *BC22 + AB13 * BC32 + AB14 * BC42)  C2 + 

   etc. etc. 

All very open and above board, and obtained just by plugging the B’s in terms of the C’s into the A’s in terms of the B’s to get the A’s in terms of the C’s.

Notice that what we could call AC11 is just AB11 * BC11 + AB12 * BC21 + AB13 * BC31 + AB14 * BC41 and AC12 is just AB11 * BC12 + AB12 * BC22 + AB13 * BC32 + AB14 * BC42.  We need another 13 such sums to be able to express a vector in A (which is a unique linear combination of A1, A2, A3 because the three of them are a basis) in terms of the 5 C basis vectors.  It’s dreary but it can be done, and you just saw part of it.

You don’t want to figure this out all the time.  So represent T as a rectangular array with 4 rows and 3 columns

AB11   AB21  AB31
AB12   AB22  AB32
AB13   AB23  AB33
AB14   AB24  AB34

Represent S as a rectangular array with 5 rows and 4 columns 

BC11   BC21   BC31  BC41
BC12   BC22  BC32  BC42
BC13   BC23  BC33  BC43
BC14   BC24  BC34  BC44
BC15   BC25  BC35  BC45

Now plunk the array of AB’s on top of (and to the right) of the array of BC’s

                                                 AB11   AB21  AB31
                                                AB12   AB22  AB32
                                                AB13   AB23  AB33
                                                AB14   AB24  AB34
BC11   BC21   BC31  BC41  AC11
BC12   BC22  BC32  BC42
BC13   BC23  BC33  BC43
BC14   BC24  BC34  BC44
BC15   BC25  BC35  BC45

Recall that (after much tedious algebra) we obtained that

AC11 was just AB11 * BC11 + AB12 * BC21 + AB13 * BC31 + AB14 * BC41

But AC11 is just what you’d get if the first row of the BC array were a vector and the first column of the AB array were also a vector and you formed their dot product.  Well they are, and you did just that to find element AC11 of the array representing the linear transformation from A to C.  Do this 14 more times to get all 15 possible combinations of 3 As and 5 Cs and you get an array of numbers with 5 rows and 3 columns.  This is the AC matrix, and this is why matrix multiplication is the way it is.

Note: we have multiplied a 5 row times 4 column array by a 4 row 3 column array.  Recall that you can only form the inner product of vectors with the same numbers of components (e.g. they have to be in vector spaces of the same dimension).  

We have T: A to B (dimension 3 to dimension 4)

                 S: B to C (dimension 4 to dimension 5)

     This is written as ST (convention has it that the transformation on the right is always done first — this takes some getting used to, but at least everyone follows it, so it’s like medical school — the appendix is on the right, just remember it).   Notice that TS makes absolutely no sense.   S takes you to a vector space of dimension 5, then T tries to start with a different vector space.   This is why when multiplying arrays (matrices) the number of columns of the matrix on the left must match the number of rows of the matrix on the right (or the top as I’ve drawn it — thanks to John and Barbara Hubbard and their great book on Vector Calculus).  If the two matrices are rectangular (as we have here), only one way of multiplication is possible.
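
Here’s the bookkeeping in numpy (random entries, with the dimensions chosen to match the example; a sketch of mine, not Hubbard’s): S is 5 x 4, T is 4 x 3, the product S @ T is 5 x 3 and does both transformations at once, and the reversed product simply refuses to exist.

```python
import numpy as np

rng = np.random.default_rng(2)
T = rng.normal(size=(4, 3))       # T: 3 dimensional space A -> 4 dimensional space B
S = rng.normal(size=(5, 4))       # S: 4 dimensional space B -> 5 dimensional space C

ST = S @ T                        # 'do T first, then S'
print(ST.shape)                   # (5, 3)

a = rng.normal(size=3)            # any vector in A
print(np.allclose(ST @ a, S @ (T @ a)))   # True: the composed matrix gives the same answer

try:
    T @ S                         # TS makes no sense: 3 columns can't meet 5 rows
except ValueError as err:
    print("TS fails:", err)
```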

More notation, and an apology.  Matrix T is a 4 row by 3 column matrix — this is always written as a 4 x 3 matrix.  Similarly for the coefficients of each element which I have in some way screwed up (but at least I did so consistently).  Invariably the matrix element (just a number) in the 3rd column of the fourth row is written element43 — If you look at what I’ve written everything is bassackwards.  Sorry, but the principles are correct. The mnemonic for the order of the coefficients is Roman Catholic (row column), a nonscatological mnemonic for once. 

That’s a lot of tedium, but it does explain why matrix multiplication is the way it is.  Notice a few other things.  The matrices you saw were 4 x 3 and 5 x 4, but 3 x 1 matrices are possible as well.  Such matrices are called column vectors.  Similarly 1 x 3 matrices exist and are called row vectors.  So what do you get if you multiply a 1 x 3 vector by a 3 x 1 vector?  

You get a 1 x 1 matrix, or a number.  This is another way to look at the inner product of two vectors.  Usually vectors are written as column vectors (n x 1), with n rows and 1 column.  The 1 x n row vectors are known as the transpose of the column vector.

That’s plenty for now.  Hopefully the next post will be more interesting.  However, physics needs to calculate things and see if the numbers they get match up with experiment.  This means that they must choose a basis for each vector space, and express each vector as an array of coefficients of that basis.  Mathematicians avoid this where possible, just using the properties of vector space bases to reason about linear transformations, and the properties of various linear transformations to reason about bases.  You’ll see the power of this sort of thinking in the next post.  If you ever study differentiable manifolds you’ll see it in spades.

Linear Algebra survival guide for Quantum Mechanics – V

We’ve established a pretty good base camp for the final assault.  It’s time to acclimate to the altitude, look around and wax a bit philosophic.  What’s happened to the integrals and derivatives in all of this?  A vector is a vector and its components can be differentiated, but linear algebra never talks about integrating vectors.  During the QM course, I was constantly bombarding the instructor with questions about things I didn’t understand.  Finally, he said that he wished the students were asking those sorts of questions.  I told him they were just doing what most people do on their first exposure to QM — trying to survive.  That’s certainly the way I was the first time around QM.  True for calculus as well.    I quickly learned to ignore what a Riemann integral really is — the limit of an infinite sum of products.  Cut the baloney, to integrate something just find the antiderivative.  We all know that.   Well, that’s pretty much true for continuous functions and the problems you meet in Calculus I.  

Well, you’re not in Kansas anymore, and to understand why an infinite dimensional vector is like an integral, you’ve got to go back to Riemann’s definition of the integral of a function.  You start with some finite interval (infinite intervals come later).  Then you chop it up into many (say 100) smaller nonoverlapping but contiguous subintervals (each of which has a finite nonzero length).   Then you pick one value of the function in each of the subintervals (which can’t be infinite or the process fails), multiply it by the length of its subinterval and form the sum of all 100 products (which is just a number after all).   Then you chop each of the subintervals into subsubintervals and repeat the process, obtaining a new number.  If the series of numbers approaches a limit as the process proceeds, then the integral exists and is a number.  Purists will note that I’ve skipped all sorts of analysis, such as requiring that each interval be a compact (closed and bounded) set of real numbers, that the function be continuous on the intervals (so that it reaches a maximum and a minimum on each interval), and that if the integral exists, the sums of the maxima times the interval lengths and the sums of the minima times the interval lengths approach each other, etc. etc.  Parenthetically, the best analysis book I’ve met is “Understanding Analysis” by Stephen Abbott.

As you subdivide, the lengths of the sub-sub- . . . subintervals get smaller and smaller (and of course the subintervals more numerous).  What if you call each of the subintervals a dimension rather than an interval, and the value of the function on it the coefficient of the vector in that dimension?  Then as the number of subintervals increases, the plot of the function values you’ve chosen gets closer and closer to the curve itself, so that plotting a very high dimensional vector looks just like the continuous function you started with.  This is why an infinite dimensional vector looks like the integral of a function (and why quantum mechanics uses them).

Now imagine a linear transformation of this vector into another vector in the same infinite dimensional space, and you’re almost to what quantum mechanics means by an operator.  Inner products of infinite dimensional vectors can be defined (with just a minor bit of heavy lifting).  Just multiply the coefficients of the two vectors in each dimension together and form the sum over all dimensions.  The infinite sum really can converge: let the nth coefficient of vector #1 be 1/2^n, and that of vector #2 be 1/3^n; the sum of even an infinite number of such products is finite (it’s a geometric series in 1/6).   This implies that to be of use in QM the coefficients of any of its infinite vectors must form a convergent series.
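
To see the vector-becomes-integral picture numerically, here’s a little sketch (the functions and grid sizes are my choices): sample two functions on more and more subintervals of [0, 1], treat the samples as vector coefficients, and watch the length-weighted inner product settle down to the integral of their product.

```python
import numpy as np

def inner(f, g, n):
    """Riemann-style inner product: sample f and g once per subinterval of [0, 1]."""
    x = (np.arange(n) + 0.5) / n          # one sample point in each of the n subintervals
    return np.sum(f(x) * g(x)) / n        # coefficient * coefficient * subinterval length

for n in (10, 100, 1000, 100000):
    print(n, inner(np.sin, np.cos, n))    # approaches the integral of sin(x)cos(x) over [0, 1] = 0.35404...
```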

Now, what if some (or all) of the coefficients are complex numbers?  No problem, because of the way inner product of vectors with complex coefficients was defined in the second post of the series.  The inner product of (even an infinite dimensional ) complex vector with itself is guaranteed to be a real number.  You’re almost in the playing field of QM, e.g. Hilbert space — an infinite dimensional space with an inner product defined on it.  The only other thing needed for Hilbert space is something called completeness, something I don’t understand well enough to explain, but it means something like plugging up the holes in the space, in the same way that the real numbers plug the holes in the rational numbers. 

Certainly not in Kansas anymore, and apparently barely in Physics either.  It’s time to respond to Wavefunction’s comment on the last post: “It’s interesting that if you are learning “practical” quantum mechanics such as quantum chemistry, you can get away with a lot without almost any linear algebra. One has to only take a look at popular QC texts like Levine, Atkins, or Pauling and Wilson; almost no LA there.”  So what’s the point of all these posts?

It’s back to Feynman and another of his famous quotes: “I think I can safely say that nobody understands quantum mechanics.” This from 1965.  A look at Louisa Gilder’s recent book “The Age of Entanglement” should convince you that, on a deep level, no one still does.  Feynman also warns us not to start thinking ‘how can it be like that’ (so did the instructor in the QM course).  So why all this verbiage?

Because all QM follows from a few simple postulates, and these postulates are written in linear algebra.  Hopefully at the end of this, you’ll understand the language in which QM is written, so any difficulty will be with the underlying structure of QM (which is plenty), not the way QM is expressed (or why it is expressed the way it is).

Next up, vector and matrix notation, what the adjoint is, and why it’s important.  If you begin thinking hard about the inner product of two different complex vectors (even the finite ones) you’ll see that usually a complex number will result.  How does QM avoid this (since all measurable values must be real — one of the postulates)?  Adjoints and Hermitian operators are the way out.  There’s still some pretty hard stuff ahead.

Linear Algebra survival guide for Quantum Mechanics – IV

 The point of this post is to show from whence  the weird definition of matrix multiplication comes, and why it simply MUST be the way it is. Actually matrices don’t appear in this post, just the underlying equations they represent.   We’re dealing with spaces of finite dimension at this point (infinite dimensional spaces come later).  Such spaces have a basis — meaning a collection of elements (basis vectors) which are enough to describe every element of the space UNIQUELY, as a linear combination.  

To make things a bit more concrete, think of good old 3 dimensional space with basis vectors E1 = (1,0,0) aka i, E2 = (0,1,0) aka j, and E3 = (0,0,1) aka k.  Every point in this space is uniquely described as a1 * E1 + a2 * E2 + a3 * E3 — e.g. a linear combination of the 3 basis vectors.  You can also think of each point as a vector from the origin (0,0,0) to the point (a1,a2,a3).  Once you establish what the basis is, each vector is specified by its (unique) triple of numerical coordinates (a1, a2, a3).  Choose a different basis and you get a different set of coordinates, but you always get no more and no less than 3 coordinates — that’s what dimension is all about.  Note that the combination of basis vectors is linear (no powers greater than 1).

So now we’re going to consider several spaces, namely A, B and C of dimensions 3, 4 and 5.  Their basis vectors are the set {A1, A2, A3 } for A,  {B1, B2, B3, B4 } for B — fill in the dots for C. 

What does a linear transformation from A to B look like?   Because of the way things have been set up, there is really no choice at all. 

Consider any vector of A — it must be of the form a1 * A1  +  a2 * A2  +  a3 * A3 , e.g. a linear combination of the basis vectors {A1, A2, A3}  – where the { } notation means set.  For any given vector in A, a1 a2 and a3 are uniquely determined.  Sorry to stress this so much but uniqueness is crucial.

Similarly any vector of C must be of the form  c1 * C1 + c2 * C2 + c3 * C3 + c4 * C4 + c5 * C5.  Go back and fill in the dots for B. 

Any linear function T from A to B must satisfy

T (X + Y) = T(X) + T(Y)

where X and Y are vectors in A and T(X), T(Y) are vectors in B.  So what?  A lot.  We only have to worry about what T does to A1, A2 and A3.  Why ? ?  Because the {Ai} are  basis vectors, and because of the second thing a linear function must satisfy

T ( number * X) = number * (T ( X))  so combining both properties

T (a1 * A1 + a2 * A2 + a3 * A3) = a1 * T(A1) + a2 * T(A2) + a3 * T(A3)

All we have to worry about is what T does to the 3 basis vectors of A.  Everything else follows easily enough.

So what is T(A1) ?  Well, it’s a vector in B.  Since B has a basis T(A1) is a unique linear combination of them.  Now the nomenclature will shift a bit. I’m going to write T(A1) as follows.

T(A1)  =  AB11 * B1  +  AB12 * B2  +  AB13 * B3  +  AB14 * B4

AB signifies that the function is from space A to space B; the numbers after AB are to be taken as subscripts.  Terms of art:  linear functions between vector spaces are usually called linear transformations.  When the vector spaces on either end of the transformation are the same, the linear transformation is called a linear operator (or operator for short).  Sound familiar?   An example of a linear operator in 3 dimensional space would just be a rotation of the coordinate axes, leaving the origin fixed.  For why the origin has to be fixed if the transformation is to be linear, see the first post in the series.

Fill in the dots for T(A2) = AB21 * B1 + . . . 

T(A3) = AB31 * B1 + . . . 

Now for a blizzard of (similar and pretty simple) algebra.  Consider the linear transformation from B to C. Call the transformation S.  I’m going to stop putting the Bi’s and Ci’s in bold; you know they are basis vectors.  Also, in what follows, to get the equations to line up on top of each other you might have to make the characters smaller (say by holding down the Command and the minus key at the same time — in the Apple world).

S(B1)  =  BC11 * C1  +  BC12 * C2  +  BC13 * C3   +   BC14 * C4   +   BC15 * C5

S(B2)  =  BC21 * C1  +  BC22 * C2  +  BC23 * C3   +   BC24 * C4   +   BC25 * C5
S(B3)  =  BC31 * C1  +  BC32 * C2  +  BC33 * C3   +   BC34 * C4   +   BC35 * C5
S(B4)  =  BC41 * C1  +  BC42 * C2  +  BC43 * C3   +   BC44 * C4   +   BC45 * C5

It’s pretty simple to plug S(Bi) into T(A1). 

Recall that T(A1) = AB11 * B1  +  AB12 * B2  + AB13 * B3  + AB14 * B4

So we  get

T(A1) = AB11 * ( BC11 * C1  +  BC12 * C2  +  BC13 * C3   +   BC14 * C4   +   BC15 * C5 ) +

                AB12 * ( BC21 * C1  +  BC22 * C2  +  BC23 * C3   +   BC24 * C4   +   BC25 * C5 ) +

                AB13 * ( BC31 * C1  +  BC32 * C2  +  BC33 * C3   +   BC34 * C4   +   BC35 * C5 ) +

                 AB14 * ( BC41 * C1  +  BC42 * C2  +  BC43 * C3   +   BC44 * C4   +   BC45 * C5 )

So now we have a linear transformation of space A to space C, just by simple substitution.   Do you see the pattern yet? If not just collect terms of A1 in terms of {C1, C2, C3, C4, C5}.  It’s easy to do as they are all above each other.  If we write

S(T(A1))  = AC11 * C1  +  AC12 * C2  +  AC13 * C3  + AC14 * C4  +  AC15 * C5 

you can see that AC13 = AB11 * BC13 + AB12 * BC23   + AB13 * BC33   +  AB14 * BC43.  This is the sum of 4 terms, each of the form AB1x * BCx3, where x runs from 1 to 4.

This should look very familiar if you know the formula for matrix multiplication.  If not don’t sweat it, I’ll discuss matrices next time, but you’ve basically  just seen them (they’re just a compact way of representing the above equations).   Linear transformations between (appropriately dimensioned) vector spaces can always be mushed together (combined) like this.  Why? (1) all finite dimensional vector spaces have a basis, with all that goes with them  and (2) linear transformations are a very special type of function (according to an instructor in a graduate algebra course — the only type of function mathematicians understand completely).  

It is the very simple algebra of combining linear transformations between finite dimensional vector spaces that makes matrix multiplication exactly what it is.  It simply can’t be anything else.  Now you know.   Quantum mechanics is written in this language, the syntax of which is the linear transformation, the representation the matrix.  Remarkably, when Heisenberg formulated quantum mechanics this way, he knew nothing about matrices.  A Hilbert trained mathematician and physicist (Max Born) had to tell him what he was really doing.  So much for the notion that physicists shoehorn our view of the world into a mathematical mold.  Amazingly, the mathematics always seems to get there first (Newton excepted). 


Linear Algebra survival guide for Quantum Mechanics – III

Before leaving the dot product, it should be noted that there are all sorts of nice geometric things you can do with it — such as defining the angle between two vectors (and in a space with any finite number of dimensions to boot).  But these are things which are pretty intuitive (because they are geometric) so I’m not going to go into them.  When the dot product of two vectors is zero they are said to be orthogonal to each other (e.g. at right angles to each other).  You saw this with the dot product of E1 = (1,0) and E2 = (0,1) in the other post.  But it also works with any two vectors at right angles, such as X = (1,1) and Y = (1,-1).

The notion of dimension seems pretty simple, until you start to think about it (consider fractals).  We cut our vector teeth on vectors in 3 dimensional space — e.g. E1 = (1,0,0) aka i, E2 = (0,1,0) aka j, and E3 = (0,0,1) aka k.  Any point in 3 dimensional space can be expressed as a linear combination of them — e.g. (x, y, z) = x * E1 + y * E2 + z * E3.   The crucial point about this way of representing a given point is that the representation is unique.  In math lingo, E1, E2, and E3 are said to be linearly independent, and if you study abstract algebra you will run up against the following (rather obscure) definition — a collection of vectors is linearly independent if the only way to get them to add up to the zero vector (0, 0, . . .) is to multiply each of them by the real number zero.  X and Y by themselves are linearly independent, but X, Y and (1,0) = E1 are not, as 1 * X + 1 * Y + (-2) * E1 = (0, 0).  This definition is used in lots of proofs in abstract algebra, but it totally hides what is really going on.  Given a linearly independent set of vectors, the representation of any other vector as a linear combination of them is UNIQUE.  Given a set of vectors V1, V2, . . . we can always represent the zero vector as 0 * V1 + 0 * V2 + . . .   If there is no other way to get the zero vector from them, then V1, V2, . . . are linearly independent.  That’s where the criterion comes from, but uniqueness is what is crucial.

It’s intuitively clear that you need two vectors to represent points in the plane, 3 to represent points in space, etc. etc.  So the dimension of any vector space is the maximum number of linearly independent vectors it contains.  The number of pairs of linearly independent vectors in the plane is infinite (just consider rotating the x and y axes). But the plane has dimension 2 because any 3 vectors in the plane are never linearly independent.   Spaces can have any number of dimensions, and quantum mechanics deals with a type of infinite dimensional space called Hilbert space (I’ll show how to get your mind around this in a later post).  As an example of a space with a large number of dimensions, consider the stock market.  Each stock in it occupies a separate dimension, with the price (or the volume, or the total number of shares outstanding) as the number to multiply that dimension by.  You don’t have a complete description of the stock market vector until you say what’s going on with each stock (dimension).

Suppose you now have a space of dimension n, and a collection of n linearly independent vectors, so that any other n-dimensional vector can be uniquely expressed (can be uniquely represented) as a linear combination of the n vectors.  The collection of n vectors is then called a basis of the vector space.  There is no reason the vectors of the basis have to be at right angles to each other (in fact in “La Geometrie” of Descartes, which gave rise to the term Cartesian coordinates, the axes were NOT at right angles to each other, and didn’t even go past the first quadrant).  So (1,0) and (1,1) is a perfectly acceptable basis for the plane.  The pair are linearly independent — try getting them to add to (0, 0) with nonzero coefficients.

Quantum mechanics wants things nicer than this.  First, all the basis vectors are normalized — given a vector V’ we want to form a vector V pointing in the same direction such that < V | V > = 1.  Not hard to do — < V’ | V’ > is just a real number after all (call it x), so V is just V’/SQRT[x].  There was an example of this technique in the previous post in the series.

Second (and this is the hard part), quantum mechanics wants all its normalized basis vectors to be orthogonal to each other — e.g. if I and J are basis vectors, < I | J > = 1 if I = J, and 0 if I doesn’t equal J.  Such a function is called the Kronecker delta function (or delta(i,j)).  How do you accomplish this?  By a true algebraic horror known as Gram-Schmidt orthogonalization.  It is a ‘simple’ algorithm in which you take dot products of two vectors and then subtract them from another vector.  I never could get the damn thing to work on problems years ago in grad school, and developed another name for it which I’ll leave to your imagination (where is Kyle Finchsigmate when you really need him?).  But work it does, so the basis vectors (the pure wavefunctions) of quantum mechanical space are both normalized and orthogonal to each other (e.g. they are orthonormal).  Since they are a basis, any other wave function has a UNIQUE representation in terms of them (these are the famous mixed states or the superposition states of quantum mechanics).
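
For the record, the horror fits in a few lines of numpy (a sketch of mine, written for complex vectors so it matches the QM setting): subtract off the projections onto the vectors you’ve already built, then normalize what’s left.

```python
import numpy as np

def gram_schmidt(vectors):
    """Turn a list of linearly independent (complex) vectors into an orthonormal set."""
    basis = []
    for v in vectors:
        w = v.astype(complex)
        for b in basis:                       # subtract the projection onto each earlier basis vector
            w = w - np.vdot(b, w) * b         # np.vdot conjugates its first argument
        basis.append(w / np.linalg.norm(w))   # normalize what's left
    return basis

vs = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])]
basis = gram_schmidt(vs)
for i, b in enumerate(basis):
    for j, c in enumerate(basis):
        print(i, j, round(abs(np.vdot(b, c)), 6))   # 1.0 on the diagonal, 0.0 everywhere else
```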

If you’ve already studied a bit of QM, the basis vectors are the eigenvectors of the various quantum mechanical operators.  If not, don’t sweat it, this will be explained in the next post.  That’s a fair amount of background and terminology.  But it’s necessary for you to understand why matrix multiplication is the way it is, why matrices represent linear transformation, and why quantum mechanical operators are basically linear transformations.  That’s all coming up.

Linear Algebra survival guide for Quantum Mechanics – II

Before pushing on to the complexities of the dot product of two complex vectors, it’s worthwhile thinking about why the dot product isn’t a product as we’ve come to know products.  Consider E1 = (1, 0) and E2 = (0, 1).  Their dot product is 1 * 0 + 0 * 1, or zero.  Not your father’s product.  You’re not in Kansas any more.  Abstract algebraists love such things and call them zero divisors, because neither of them is zero itself, yet when ‘multiplied’ together they produce zero.

This is not just mathematical trivia, as any two vectors we can dot together and get zero are called orthogonal. Such vectors are particularly important for quantum mechanics, because (to get ahead a bit) all the eigenvectors we can interrogate by experiment to get any sort of measurement (energy, angular momentum etc. etc.) are orthogonal to each other.   The dot product of V = 3 * E1 + 4 * E2 with itself is 25.    We can make < V | V > = 1 by multiplying V by 1/SQRT(25) — check it out.  Such a vector is said to be normalized.  Any vector you meet in quantum mechanics can and should be normalized, and usually is, except on your homework, where you forgot to do it and got the wrong answer. Vectors which are both orthogonal to each other and normalized are called (unsurprisingly) orthonormal.

I’d love to be able to put subscripts on the variables, but at this point I can’t, so here are the naming conventions once again.

x^2 means x times x (or x squared)
x1 means x with subscript 1 (when x is a small letter)
x57 (note two integers follow the x not one) means a matrix element with the first number for the Row and the second for the Column — mnemonic Roman Catholic
X, V, etc. are to be taken as vectors (I’ve got no way to put an arrow on top of them)
E1, E2, . . . are the standard basis vectors — E1 = (1, 0, 0, . . .), E2 = (0, 1, 0, . . .), En = (0, 0, . . . , 1); Ei stands for any of them
# stands for any number (which can be real or complex)
i (in italics) always stands for the SQRT[-1]
* has two meanings. When separated by spaces such as x * x it means multiply e.g. x^2
When next to a vector V* or a letter x* it means the complex conjugate of the vector or the number (see later)

The dot product of a vector V with itself can be written 3 ways:  V.V, < V, V > and < V | V >.  Since physicists use the last one, that’s what I’ll stick to (mostly).

Recall that to get a real number from the dot product of a complex vector with itself, one must multiply the vector V by its complex conjugate V*.  Here’s what the complex conjugate is again.  Given a complex number z = a + ib, its complex conjugate (written z*) is a – ib.

z * z* (note the different uses of *) =   a^2 + b^2, which is a real nonNegative number because a and b are both real.  Note that conjugating a complex number twice doesn’t change it –e.g.  z** = z.

This modification of the definition of dot product for complex vectors leads to significant complications. Why? When V, W are vectors with complex coefficients, < V | W > is NOT the same as < W | V >, unlike the case where the vectors have all real coefficients.  Here’s why.  No matter how many components a complex vector has, the dot product is only a sum of the products of just two complex numbers with each other (see the previous post).  The product of two complex numbers is just another one, as is the sum of any (finite) number of complex numbers.  This means that multiplying a mere two complex numbers together will be enough to see the problem. To avoid confusion with V and W which are vectors, I’ll call the complex numbers p and q. Remember that p1, p2, q1 and q2 are all real numbers and i is just, well, i (the number which when multiplied by itself gives –1).

p = p1 + p2i,       q = q1 + q2i

p* = p1 – p2i,      q* = q1 – q2i

p times q* = (p1 + p2i) * (q1 – q2i) = (p1 * q1 + p2 * q2) + i (p2 * q1 – p1 * q2)

p* times q = (p1 – p2i) * (q1 + q2i) =  (p1 * q1 + p2 * q2) + i(p1 * q2 – p2 * q1)

Note that the terms which multiply i are NOT the same (but they are the negative of each other).   So what does < V | W > mean?  Recall that

V = v1 * E1 + v2 * E2 + . … vn * En

W = w1 * E1 + . . + wn * En

< V | W > = v1 * w1 + v2 * w2 + . . . + vn * wn ; here the * means multiplication, not complex conjugation.

Remember that v1, w1, v2, etc. are now complex numbers, and you’ve just seen that v1* times w1 is NOT the same as v1 times w1*.  Clearly a convention is called for. Malheureusement, physicists use one convention and mathematicians use the other.   Since this is about quantum mechanics, here’s what physicists mean by < V | W >.  They mean the dot product of V* (whose coefficients are the complex conjugates v1*, . . . , vn*) with W.  More explicitly they mean V* . W, but when written in physics notation < V | W >, the * isn’t mentioned (but never forget that it’s there).

Now  v1 * w1 + v2 * w2 + . . . + vn * wn is just another complex number — say z =  x + iy.  To form its complex conjugate we just negate the iy term to get z* = x – iy

Look at

p times q* = (p1 + p2i) *  (q1 – q2i) = (p1 * q1 + p2 * q2) + i (p2 * q1 – p1 * q2)

p* times q = (p1 – p2i) * (q1 + q2i) =  (p1 * q1 + p2 * q2) + i(p1 * q2 – p2 * q1)

once again.  Notice that p times q* is just the complex conjugate of p* times q

So if < V | W > = v1* * w1 + v2* * w2 + . . . + vn* * wn = x + iy ;  here * means 2 different things, complex conjugation when next to vi and multiplication when between vi and wi (sorry for the horrible notation, hopefully  someone knows how to get subscripts into all this).

By the physics convention < W | V > is w1* * v1 + w2* * v2 + . . . + wn* * vn.  Since p times q* is just the complex conjugate of p* times q,  w1* * v1 is the complex conjugate of w1 * v1*.  This means  w1* * v1 + w2* * v2 + . . . + wn* * vn = x – iy.

In shorthand < V | W > = < W | V >*, something you may have seen and puzzled over.  It’s all a result of wanting the dot product of a complex vector to be a real number.  Not handed down on tablets of stone, but the response to a problem.
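
A quick numpy check of that identity (vectors of my own invention); np.vdot conjugates its first argument, which is exactly the physicists’ convention:

```python
import numpy as np

V = np.array([1 + 2j, 3 - 1j, 0 + 4j])
W = np.array([2 - 3j, 1 + 1j, 5 + 0j])

vw = np.vdot(V, W)            # < V | W > : conjugate V's coefficients, multiply, sum
wv = np.vdot(W, V)            # < W | V >

print(vw, wv)                 # two different complex numbers ...
print(vw == wv.conjugate())   # ... but each is the complex conjugate of the other: True
print(np.vdot(V, V))          # (31+0j): the dot product of a vector with itself is real
```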

Next up, vector spaces, linear transformations on them (operators) and their matrix representation.  I hope to pump subsequent posts  out one after the other, but I’m having some minor surgery on the 6th, so there may be a lag.
