Category Archives: Linear Algebra Survival Guide for Quantum Mechanics

Linear Algebra survival guide for Quantum Mechanics – IV

The point of this post is to show where the weird definition of matrix multiplication comes from, and why it simply MUST be the way it is. Actually matrices don’t appear in this post, just the underlying equations they represent. We’re dealing with spaces of finite dimension at this point (infinite dimensional spaces come later). Such spaces have a basis — meaning a collection of elements (basis vectors) which are enough to describe every element of the space UNIQUELY, as a linear combination.

To make things a bit more concrete, think of good old 3 dimensional space with basis vectors E1 = (1,0,0) aka i, E2 = (0,1,0) aka j, and E3 = (0,0,1) aka k. Every point in this space is uniquely described as a1 * E1 + a2 * E2 + a3 * E3 — e.g. a linear combination of the 3 basis vectors. You can also think of each point as a vector from the origin (0,0,0) to the point (a1,a2,a3). Once you establish what the basis is, each vector is specified by its (unique) triple of numerical coordinates (a1, a2, a3). Choose a different basis and you get a different set of coordinates, but you always get no more and no less than 3 coordinates — that’s what dimension is all about. Note that the combination of basis vectors is linear (no powers greater than 1).

So now we’re going to consider several spaces, namely A, B and C of dimensions 3, 4 and 5.  Their basis vectors are the set {A1, A2, A3 } for A,  {B1, B2, B3, B4 } for B — fill in the dots for C. 

What does a linear transformation from A to B look like?   Because of the way things have been set up, there is really no choice at all. 

Consider any vector of A — it must be of the form a1 * A1 + a2 * A2 + a3 * A3, e.g. a linear combination of the basis vectors {A1, A2, A3} — where the { } notation means set. For any given vector in A, a1, a2 and a3 are uniquely determined. Sorry to stress this so much, but uniqueness is crucial.

Similarly any vector of C must be of the form  c1 * C1 + c2 * C2 + c3 * C3 + c4 * C4 + c5 * C5.  Go back and fill in the dots for B. 

Any linear function T from A to B must satisfy

T (X + Y) = T(X) + T(Y)

where X and Y are vectors in A and T(X), T(Y) are vectors in B. So what? A lot. We only have to worry about what T does to A1, A2 and A3. Why? Because the {Ai} are basis vectors, and because of the second thing a linear function must satisfy

T (number * X) = number * T(X), so combining both properties

T (a1 * A1 + a2 * A2 + a3 * A3) = a1 * T(A1) + a2 * T(A2) + a3 * T(A3)

All we have to worry about is what T does to the 3 basis vectors of A.  Everything else follows easily enough.

So what is T(A1)? Well, it’s a vector in B. Since B has a basis, T(A1) is a unique linear combination of the basis vectors of B. Now the nomenclature will shift a bit. I’m going to write T(A1) as follows.

T(A1)  =  AB11 * B1  +  AB12 * B2  +  AB13 * B3  +  AB14 * B4

AB signifies that the function is from space A to space B; the numbers after AB are to be taken as subscripts. Terms of art: linear functions between vector spaces are usually called linear transformations. When the vector spaces on either end of the transformation are the same, the linear transformation is called a linear operator (or operator for short). Sound familiar? An example of a linear operator in 3 dimensional space would just be a rotation of the coordinate axes, leaving the origin fixed. For why the origin has to be fixed if the transformation is to be linear, see the first post in the series.
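If you want to see linearity in action rather than just in symbols, here is a minimal Python sketch (the language, the angle and the test vectors are my own choices for illustration — nothing in the post depends on them) of a rotation about the z axis, written as a plain function on triples since matrices are being saved for the next post. Both linearity properties hold (up to floating point rounding), and the origin stays put.

```python
import math

def rotate_z(v, theta=math.pi / 6):
    """Rotate the 3-vector v by angle theta about the z axis -- a sample linear operator."""
    x, y, z = v
    return (math.cos(theta) * x - math.sin(theta) * y,
            math.sin(theta) * x + math.cos(theta) * y,
            z)

def add(v, w):
    return tuple(vi + wi for vi, wi in zip(v, w))

def scale(c, v):
    return tuple(c * vi for vi in v)

X, Y, c = (1.0, 2.0, 3.0), (-4.0, 0.5, 2.0), 7.0

# thing 1: T(X + Y) = T(X) + T(Y)
print(rotate_z(add(X, Y)))
print(add(rotate_z(X), rotate_z(Y)))
# thing 2: T(c * X) = c * T(X)
print(rotate_z(scale(c, X)))
print(scale(c, rotate_z(X)))
# and the origin stays fixed
print(rotate_z((0.0, 0.0, 0.0)))
```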

Fill in the dots for T(A2) = AB21 * B1 + . . . 

T(A3) = AB31 * B1 + . . . 

Now for a blizzard of (similar and pretty simple) algebra. Consider the linear transformation from B to C. Call the transformation S. I’m going to stop putting the Bi’s and Ci’s in bold; you know they are basis vectors. Also, in what follows, to get the equations to line up on top of each other you might have to make the characters smaller (say by holding down the Command and the minus key at the same time — in the Apple world).

S(B1)  =  BC11 * C1  +  BC12 * C2  +  BC13 * C3   +   BC14 * C4   +   BC15 * C5

S(B2)  =  BC21 * C1  +  BC22 * C2  +  BC23 * C3   +   BC24 * C4   +   BC25 * C5
S(B3)  =  BC31 * C1  +  BC32 * C2  +  BC33 * C3   +   BC34 * C4   +   BC35 * C5
S(B4)  =  BC41 * C1  +  BC42 * C2  +  BC43 * C3   +   BC44 * C4   +   BC45 * C5

It’s pretty simple to apply S to T(A1): by linearity S(T(A1)) = AB11 * S(B1) + AB12 * S(B2) + AB13 * S(B3) + AB14 * S(B4), and we just plug in the expressions for S(Bi) above.

Recall that T(A1) = AB11 * B1  +  AB12 * B2  + AB13 * B3  + AB14 * B4

So we  get

S(T(A1)) = AB11 * ( BC11 * C1  +  BC12 * C2  +  BC13 * C3   +   BC14 * C4   +   BC15 * C5 ) +

                AB12 * ( BC21 * C1  +  BC22 * C2  +  BC23 * C3   +   BC24 * C4   +   BC25 * C5 ) +

                AB13 * ( BC31 * C1  +  BC32 * C2  +  BC33 * C3   +   BC34 * C4   +   BC35 * C5 ) +

                 AB14 * ( BC41 * C1  +  BC42 * C2  +  BC43 * C3   +   BC44 * C4   +   BC45 * C5 )

So now we have a linear transformation from space A to space C, just by simple substitution. Do you see the pattern yet? If not, just collect the coefficients of C1, C2, C3, C4 and C5 in the expression for S(T(A1)). It’s easy to do as the terms are lined up above each other. If we write

S(T(A1))  = AC11 * C1  +  AC12 * C2  +  AC13 * C3  + AC14 * C4  +  AC15 * C5 

you can see that AC13 = AB11 * BC13 + AB12 * BC23 + AB13 * BC33 + AB14 * BC43. This is the sum of 4 terms, each of the form AB1x * BCx3, where x runs from 1 to 4.

This should look very familiar if you know the formula for matrix multiplication.  If not don’t sweat it, I’ll discuss matrices next time, but you’ve basically  just seen them (they’re just a compact way of representing the above equations).   Linear transformations between (appropriately dimensioned) vector spaces can always be mushed together (combined) like this.  Why? (1) all finite dimensional vector spaces have a basis, with all that goes with them  and (2) linear transformations are a very special type of function (according to an instructor in a graduate algebra course — the only type of function mathematicians understand completely).  
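If you prefer to check the bookkeeping numerically, here is a small Python sketch (numpy and the random coefficients are just for illustration). It builds the AC coefficients by exactly the substitution carried out above — with the convention that AB[i][j] is the coefficient of B(j+1) in T(A(i+1)), and similarly for BC — and then confirms the result is what numpy calls the matrix product.

```python
import numpy as np

rng = np.random.default_rng(0)
AB = rng.integers(-3, 4, size=(3, 4))   # AB[i][j]: coefficient of B(j+1) in T(A(i+1))
BC = rng.integers(-3, 4, size=(4, 5))   # BC[j][k]: coefficient of C(k+1) in S(B(j+1))

# Compose by brute-force substitution, exactly as in the equations above:
AC = np.zeros((3, 5), dtype=int)
for i in range(3):          # which basis vector of A
    for k in range(5):      # which basis vector of C
        for j in range(4):  # sum over the basis of B
            AC[i, k] += AB[i, j] * BC[j, k]

# numpy's matrix product gives the same table of numbers
print(np.array_equal(AC, AB @ BC))   # True
```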

It is the very simple algebra of combining linear transformations between finite dimensional vector spaces that makes matrix multiplication exactly what it is.  It simply can’t be anything else.  Now you know.   Quantum mechanics is written in this language, the syntax of which is the linear transformation, the representation the matrix.  Remarkably, when Heisenberg formulated quantum mechanics this way, he knew nothing about matrices.  A Hilbert trained mathematician and physicist (Max Born) had to tell him what he was really doing.  So much for the notion that physicists shoehorn our view of the world into a mathematical mold.  Amazingly, the mathematics always seems to get there first (Newton excepted). 


Linear Algebra survival guide for Quantum Mechanics – III

Before leaving the dot product, it should be noted that there are all sorts of nice geometric things you can do with it — such as defining the angle between two vectors (and in a space with any finite number of dimensions to boot). But these are things which are pretty intuitive (because they are geometric) so I’m not going to go into them. When the dot product of two vectors is zero they are said to be orthogonal to each other (e.g. at right angles to each other). You saw this with the dot product of E1 = (1,0) and E2 = (0,1) in the other post. But it also works with any two vectors at right angles, such as X = (1,1) and Y = (1,-1).
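A quick numerical check, for those who like one (Python with numpy; the vectors are the ones from the text):

```python
import numpy as np

X = np.array([1.0, 1.0])
Y = np.array([1.0, -1.0])
E1 = np.array([1.0, 0.0])

print(np.dot(X, Y))    # 0.0 -> X and Y are orthogonal

# the geometric payoff: cos(angle) = < X | E1 > / (length of X * length of E1)
cos_angle = np.dot(X, E1) / (np.linalg.norm(X) * np.linalg.norm(E1))
print(np.degrees(np.arccos(cos_angle)))    # 45.0 -- the angle between (1,1) and the x axis
```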

The notion of dimension seems pretty simple, until you start to think about it (consider fractals). We cut our vector teeth on vectors in 3 dimensional space (e.g. E1 = (1,0,0) aka i, E2 = (0,1,0) aka j, and E3 = (0,0,1) aka k). Any point in 3 dimensional space can be expressed as a linear combination of them — e.g. (x, y, z) = x * E1 + y * E2 + z * E3. The crucial point about this way of representing a given point is that the representation is unique. In math lingo, E1, E2, and E3 are said to be linearly independent, and if you study abstract algebra you will run up against the following (rather obscure) definition — a collection of vectors is linearly independent if the only way to get them to add up to the zero vector (0, 0, . . .) is to multiply each of them by the real number zero. X and Y by themselves are linearly independent, but X, Y and (1,0) = E1 are not, as 1 * X + 1 * Y + (-2) * E1 = (0, 0). This definition is used in lots of proofs in abstract algebra, but it totally hides what is really going on. Given a linearly independent set of vectors, the representation of any other vector as a linear combination of them is UNIQUE. Given a set of vectors V1, V2, . . . we can always represent the zero vector as 0 * V1 + 0 * V2 + . . . If there is no other way to get the zero vector from them, then V1, V2, . . . are linearly independent. That’s where the criterion comes from, but uniqueness is what is crucial.
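Here is a small Python sketch of the same point (the rank function is a matrix notion we haven’t officially met yet, but all it reports is the maximum number of linearly independent rows):

```python
import numpy as np

X, Y, E1 = [1, 1], [1, -1], [1, 0]

# rank = number of linearly independent vectors among the rows
print(np.linalg.matrix_rank(np.array([X, Y])))        # 2 -> X and Y are linearly independent
print(np.linalg.matrix_rank(np.array([X, Y, E1])))    # 2 -> the three together are not

# the explicit dependence from the text: 1 * X + 1 * Y + (-2) * E1 = (0, 0)
print(np.array(X) + np.array(Y) - 2 * np.array(E1))
```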

It’s intuitively clear that you need two vectors to represent points in the plane, 3 to represent points in space, etc. etc. So the dimension of any vector space is the maximum number of linearly independent vectors it contains. The number of pairs of linearly independent vectors in the plane is infinite (just consider rotating the x and y axes). But the plane has dimension 2 because 3 vectors in the plane are never linearly independent. Spaces can have any number of dimensions, and quantum mechanics deals with a type of infinite dimensional space called Hilbert space (I’ll show how to get your mind around this in a later post). As an example of a space with a large number of dimensions, consider the stock market. Each stock in it occupies a separate dimension, with the price (or the volume, or the total number of shares outstanding) as a number to multiply that dimension by. You don’t have a complete description of the stock market vector until you say what’s going on with each stock (dimension).

Suppose you now have a space of dimension n, and a collection of n linearly independent vectors, so that any other n-dimensional vector can be uniquely expressed (can be uniquely represented) as a linear combination of the n vectors. The collection of n vectors is then called a basis of the vector space. There is no reason the vectors of the basis have to be at right angles to each other (in fact in “La Geometrie” of Descartes, which gave rise to the term Cartesian coordinates, the axes were NOT at right angles to each other, and didn’t even go past the first quadrant). So (1,0) and (1,1) is a perfectly acceptable basis for the plane. The pair are linearly independent — try getting them to add to (0, 0) with nonzero coefficients.
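To see uniqueness of representation in a basis that is not orthogonal, here is a short Python sketch (the target vector (3, 5) is just something I picked for illustration):

```python
import numpy as np

# columns are the basis vectors (1,0) and (1,1) -- not orthogonal, but still a basis of the plane
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])

v = np.array([3.0, 5.0])

coeffs = np.linalg.solve(B, v)   # the unique coefficients of v in this basis
print(coeffs)                    # [-2.  5.]  ->  v = -2 * (1,0) + 5 * (1,1)
print(B @ coeffs)                # [3. 5.]  -- reconstruction check
```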

Quantum mechanics wants things nicer than this. First, all the basis vectors are normalized — given a vector V’ we want to form a vector V pointing in the same direction such that < V | V > = 1. Not hard to do — < V’ | V’ > is just a real number after all (call it x), so V is just V’/SQRT[x]. There was an example of this technique in the previous post in the series.

Second (and this is the hard part), quantum mechanics wants all its normalized basis vectors to be orthogonal to each other — e.g. if I and J are basis vectors, < I | J > = 1 if I = J, and 0 if I doesn’t equal J. Such a function is called the Kronecker delta function (or delta(i,j)). How do you accomplish this? By a true algebraic horror known as Gram Schmidt orthogonalization. It is a ‘simple’ algorithm in which you take dot products of two vectors and then subtract them from another vector. I never could get the damn thing to work on problems years ago in grad school, and developed another name for it which I’ll leave to your imagination (where is Kyle Finchsigmate when you really need him?). But work it does, so the basis vectors (the pure wavefunctions) of quantum mechanical space are both normalized and orthogonal to each other (e.g. they are orthonormal). Since they are a basis, any other wave function has a UNIQUE representation in terms of them (these are the famous mixed states or the superposition states of quantum mechanics).
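For the curious, here is a minimal sketch of classical Gram Schmidt in Python (numpy’s vdot conjugates its first argument, which matches the physics convention discussed in the previous post; the tolerance for discarding dependent vectors is my own choice). It turns the non-orthogonal basis (1,0), (1,1) from above into an orthonormal one.

```python
import numpy as np

def gram_schmidt(vectors):
    """Classical Gram-Schmidt: subtract off the components along the vectors
    already processed, then normalize whatever is left."""
    ortho = []
    for v in vectors:
        w = np.array(v, dtype=complex)
        for u in ortho:
            w = w - np.vdot(u, w) * u          # remove the component of w along u
        norm = np.sqrt(np.vdot(w, w).real)
        if norm > 1e-12:                       # drop vectors dependent on earlier ones
            ortho.append(w / norm)
    return ortho

basis = gram_schmidt([[1, 0], [1, 1]])
for u in basis:
    print(u)
# orthonormality check: < u_i | u_j > should be the Kronecker delta
print([[complex(np.round(np.vdot(a, b), 10)) for b in basis] for a in basis])
```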

If you’ve already studied a bit of QM, the basis vectors are the eigenvectors of the various quantum mechanical operators.  If not, don’t sweat it, this will be explained in the next post.  That’s a fair amount of background and terminology.  But it’s necessary for you to understand why matrix multiplication is the way it is, why matrices represent linear transformation, and why quantum mechanical operators are basically linear transformations.  That’s all coming up.

Linear Algebra survival guide for Quantum Mechanics – II

Before pushing on to the complexities of the dot product of two complex vectors, it’s worthwhile thinking about why the dot product isn’t a product as we’ve come to know products. Consider E1 = (1, 0) and E2 = (0, 1). Their dot product is 1 * 0 + 0 * 1, or zero. Not your father’s product. You’re not in Kansas any more. Abstract algebraists love such things and call them zero divisors, because neither of them is zero themselves yet when ‘multiplied’ together they produce zero.

This is not just mathematical trivia, as any two vectors we can dot together and get zero are called orthogonal. Such vectors are particularly important for quantum mechanics, because (to get ahead a bit) all the eigenvectors we can interrogate by experiment to get any sort of measurement (energy, angular momentum etc. etc.) are orthogonal to each other. The dot product of V = 3 * E1 + 4 * E2 with itself is 25. We can make < V | V > = 1 by multiplying V by 1/SQRT(25) — check it out. Such a vector is said to be normalized. Any vector you meet in quantum mechanics can and should be normalized, and usually is, except on your homework, where you forgot to do it and got the wrong answer. Vectors which are both orthogonal to each other and normalized are called (unsurprisingly) orthonormal.
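In Python (numpy, for illustration only), the V = 3 * E1 + 4 * E2 example looks like this:

```python
import numpy as np

V = np.array([3.0, 4.0])                     # V = 3 * E1 + 4 * E2

print(np.dot(V, V))                          # 25.0
V_normalized = V / np.sqrt(np.dot(V, V))     # divide by SQRT(25)
print(V_normalized)                          # [0.6 0.8]
print(np.dot(V_normalized, V_normalized))    # 1.0 -- normalized
```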

I’d love to be able to put subscripts on the variables, but at this point I can’t, so here are the naming conventions once again.

x^2 means x times x (or x squared)
x1 means x with subscript 1 (when x is a small letter)
x57 (note two integers follow the x not one) means a matrix element with the first number for the Row and the second for the Column — mnemonic Roman Catholic
X, V, etc. etc. are to be taken as vectors (I’ve got no way to put an arrow on top of them)
E1, E2 are the standard basis vectors — E1 = (1, 0, 0, . . .), E2 = (0, 1, 0, . . .), En = (0, 0, . . . , 1); Ei stands for any of them
# stands for any number (which can be real or complex)
i (in italics) always stands for the SQRT[-1]
* has two meanings. When separated by spaces such as x * x it means multiply e.g. x^2
When next to a vector V* or a letter x* it means the complex conjugate of the vector or the number (see later)

The dot product of a vector V with itself can be written 3 ways: V.V, < V, V > and < V | V >. Since physicists use the last one, that’s what I’ll stick to (mostly).

Recall that to get a real number from the dot product of a complex vector with itself, one must multiply the vector V by its complex conjugate V*. Here’s what the complex conjugate is again. Given a complex number z = a + ib, its complex conjugate (written z*) is a – ib.

z * z* (note the different uses of *) = a^2 + b^2, which is a nonnegative real number because a and b are both real. Note that conjugating a complex number twice doesn’t change it — e.g. z** = z.

This modification of the definition of dot product for complex vectors leads to significant complications. Why? When V, W are vectors with complex coefficients, < V | W > is NOT the same as < W | V >, unlike the case where the vectors have all real coefficients. Here’s why. No matter how many components a complex vector has, the dot product is only a sum of products of just two complex numbers with each other (see the previous post). The product of two complex numbers is just another one, as is the sum of any (finite) number of complex numbers. This means that multiplying a mere two complex numbers together will be enough to see the problem. To avoid confusion with V and W which are vectors, I’ll call the complex numbers p and q. Remember that p1, p2, q1 and q2 are all real numbers and i is just, well, i (the number which when multiplied by itself gives –1).

p = p1 + p2i,       q = q1 + q2i

p* = p1 – p2i,      q* = q1 – q2i

p times q* = (p1 + p2i) * (q1 – q2i) = (p1 * q1 + p2 * q2) + i (p2 * q1 – p1 * q2)

p* times q = (p1 – p2i) * (q1 + q2i) =  (p1 * q1 + p2 * q2) + i(p1 * q2 – p2 * q1)

Note that the terms which multiply i are NOT the same (but they are the negative of each other).   So what does < V | W > mean?  Recall that

V = v1 * E1 + v2 * E2 + . . . + vn * En

W = w1 * E1 + . . . + wn * En

< V | W > = v1 * w1 + v2 * w2 + . . . + vn * wn ; here the * means multiplication, not complex conjugation.

Remember that v1, w1, v2, etc. are now complex numbers, and you’ve just seen that v1* times w1 is NOT the same as v1 times w1*. Clearly a convention is called for. Malheureusement, physicists use one convention and mathematicians use the other. Since this is about quantum mechanics, here’s what physicists mean by < V | W >. They mean the dot product of V* (whose coefficients are the complex conjugates of v1, . . . , vn) with W. More explicitly they mean V* . W, but when written in physics notation < V | W >, the * isn’t mentioned (but never forget that it’s there).

Now  v1 * w1 + v2 * w2 + . . . + vn * wn is just another complex number — say z =  x + iy.  To form its complex conjugate we just negate the iy term to get z* = x – iy

Look at

p times q* = (p1 + p2i) *  (q1 – q2i) = (p1 * q1 + p2 * q2) + i (p2 * q1 – p1 * q2)

p* times q = (p1 – p2i) * (q1 + q2i) =  (p1 * q1 + p2 * q2) + i(p1 * q2 – p2 * q1)

once again.  Notice that p times q* is just the complex conjugate of p* times q
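Python handles complex numbers natively, so you can watch this happen with any two numbers you like (the particular p and q below are arbitrary choices of mine):

```python
p = 2.0 + 3.0j      # p1 = 2, p2 = 3
q = -1.0 + 4.0j     # q1 = -1, q2 = 4

pq_star = p * q.conjugate()          # p times q*
p_star_q = p.conjugate() * q         # p* times q

print(pq_star, p_star_q)                # (10-11j) (10+11j) -- same real part, opposite imaginary part
print(pq_star == p_star_q.conjugate())  # True: p times q* is the complex conjugate of p* times q
```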

So if < V | W > = v1* * w1 + v2* * w2 + . . . + vn* * wn = x + iy ;  here * means 2 different things, complex conjugation when next to vi and multiplication when between vi and wi (sorry for the horrible notation, hopefully  someone knows how to get subscripts into all this).

By the physics convention < W | V > is w1* * v1 + w2* * v2 + . . . + wn* * vn. Since p times q* is just the complex conjugate of p* times q, w1* * v1 is the complex conjugate of v1* * w1. This means w1* * v1 + w2* * v2 + . . . + wn* * vn = x – iy.

In shorthand < V | W > = < W | V >*, something you may have seen and puzzled over.  It’s all a result of wanting the dot product of a complex vector to be a real number.  Not handed down on tablets of stone, but the response to a problem.
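numpy’s vdot happens to use the physics convention (it conjugates its first argument), so the whole story can be checked in a few lines (the complex vectors below are arbitrary illustrations):

```python
import numpy as np

V = np.array([1 + 2j, 3 - 1j, 0 + 0.5j])
W = np.array([2 - 1j, -1 + 1j, 4 + 0j])

vw = np.vdot(V, W)    # <V|W> = v1* * w1 + v2* * w2 + v3* * w3
wv = np.vdot(W, V)    # <W|V>

print(vw, wv)
print(np.isclose(vw, np.conj(wv)))          # True:  < V | W > = < W | V >*
print(np.isclose(np.vdot(V, V).imag, 0.0))  # True:  < V | V > is a real (nonnegative) number
```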

Next up, vector spaces, linear transformations on them (operators) and their matrix representation.  I hope to pump subsequent posts  out one after the other, but I’m having some minor surgery on the 6th, so there may be a lag.

Next post — https://luysii.wordpress.com/2010/01/11/linear-algebra-survival-guide-for-quantum-mechanics-iiiqrtq/

Linear Algebra survival guide for Quantum Mechanics – I

Every tenor man has to learn to play Body and Soul and every budding chemist has got to learn some quantum mechanics. Forget the Schrodinger equation (for now), quantum mechanics is really written in the language of linear algebra. Feynman warned us not to consider ‘how it can be like that’, but at least you can understand the ‘that’ — e.g. linear algebra. In fact, the instructor in a graduate course in abstract algebra I audited opened the linear algebra section with the remark that the only functions mathematicians really understand are the linear ones.

The definitions used (inner product, matrix multiplication, Hermitian operator) are obscure and strange. You can memorize them and mumble them as incantations when needed, or you can understand why they are the way they are and where they come from — the point of these posts. They were not handed down on tablets of stone.

In what follows, I’ll assume you have some concept of what a vector is (at least in 3 dimensional space) and what complex numbers are. It’s hard to imagine anyone studying quantum mechanics without knowing them to begin with or studying them concurrently.

If someone can tell me how to get mathematical symbols into Kubrick, the template I’m using for this, please post a comment. The guy who wrote Kubrick could not. So the notation which follows is pretty horrible.

x^2 means x times x (or x squared)
x1 means x with subscript 1 (when x is a small letter)
x57 (note two integers follow the x not one) means a matrix element with the first number for the Row and the second for the Column — mnemonic Roman Catholic
X, V, etc. etc. are to be taken as vectors (I’ve got no way to put an arrow on top of them)
E1, E2, are the standard basis vectors — E1 = (1, 0 , 0 . . ), E2 = (0, 1, 0, .. ), En = (0, 0, … 1), Ei  stands for any of them
# stands for any number (which can be real or complex)
i (in italics) always stands for the SQRT[-1]
* has two meanings. When separated by spaces such as x * x it means multiply e.g. x^2
When next to a vector V* or a letter x* it means the complex conjugate of the vector or the number (see later)

The dot product of a vector V (see below) can be written 3 ways: V.V, < V, V > and < V | V >. Since physicists use the last one, that’s what I’ll stick to.

Linear algebra concerns vector spaces, but before delving into them it’s worthwhile thinking about what linear actually means. Only two things really. A linear function f satisfies
f(a + b) = f(a) + f(b) — thing 1
and
# * f(a) = f(# * a) — thing 2

So what is a NONlinear function? Simple, anything with any variable raised to a power greater than 1. Such as f(x) = x^2. Not linear because
f(a+b) = (a+b)^2 = a^2 + 2 * a * b + b^2 which is not the same as
f(a) + f(b)

Thing 2 implies that if f is linear then f(0) = 0, because f(0) = f(0 * a) = 0 * f(a) = 0.
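A two-minute check in Python (the particular linear function 5 * x and the numbers are arbitrary choices):

```python
def f_linear(x):
    return 5 * x        # any f(x) = (fixed number) * x is linear

def f_square(x):
    return x ** 2       # the nonlinear example from the text

a, b, c = 2.0, 3.0, 7.0

# thing 1 and thing 2 hold for the linear function ...
print(f_linear(a + b) == f_linear(a) + f_linear(b))    # True
print(f_linear(c * a) == c * f_linear(a))              # True

# ... and thing 1 fails for x^2 (the cross term 2 * a * b gets in the way)
print(f_square(a + b), f_square(a) + f_square(b))      # 25.0 vs 13.0
```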

Quantum mechanics deals in functions (called wavefunctions) and also in operators. Operators change functions into other functions (so they operate on functions the way functions operate on numbers). Differentiation and integration are good examples of operators. They are also good examples of linear operators. The derivative of the sum of two functions is the sum of the derivative of each function taken separately. Ditto for integrals.
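You can even check the linearity of differentiation numerically — here is a sketch using a central-difference approximation in place of the exact derivative (the functions sin and exp and the point 0.7 are arbitrary choices of mine):

```python
import math

def deriv(f, x, h=1e-6):
    """Central-difference approximation -- a stand-in for the differentiation operator."""
    return (f(x + h) - f(x - h)) / (2 * h)

f, g, x0 = math.sin, math.exp, 0.7

# derivative of a sum = sum of the derivatives (agreement up to the approximation error)
print(deriv(lambda t: f(t) + g(t), x0))
print(deriv(f, x0) + deriv(g, x0))

# and scaling passes straight through
c = 3.0
print(deriv(lambda t: c * f(t), x0), c * deriv(f, x0))
```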

All operators in quantum mechanics are linear operators (this is actually one of the postulates of QM and one of the reasons understanding linear algebra is worthwhile).

The long road to matrix multiplication, eigenvectors and Hermitian matrices begins with the inner product. One can do a lot of linear algebra without them. “Linear Algebra Done Right” by Sheldon Axler doesn’t mention them for the first 99 of its 245 pages. The book is quite clear but only presents the mathematical bones of linear algebra without any physics flesh.

The definition of inner product (dot product) of a vector V with itself, written < V | V >, probably came from the notion of vector length. Given the standard basis in two dimensional space E1 = (1,0) and E2 = (0,1), all vectors V can be written as x * E1 + y * E2 (x is known as the coefficient of E1). Vector length is given by the good old Pythagorean theorem as SQRT[x^2 + y^2]. The dot product (inner product) is just x^2 + y^2 without the square root.

In 3 dimensions the distance of a point (x, y, z) from the origin is SQRT [x^2 + y^2 + z^2]. The definition of vector length (or distance) easily extends (by analogy) to n dimensions where the length of V is SQRT[x1^2 + x2^2 + . . . . + xn^2] and the dot product is x1^2 + x2^2 + . . . . + xn^2. Length is always a non-negative real number.
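In Python (numpy), length and dot product in any number of dimensions look like this (the 4-dimensional vector is just an example I made up):

```python
import numpy as np

V = np.array([1.0, -2.0, 3.0, 0.5])      # a 4-dimensional example

dot = np.dot(V, V)                       # x1^2 + x2^2 + . . . + xn^2
length = np.sqrt(dot)

print(dot, length)
print(np.isclose(length, np.linalg.norm(V)))   # True -- norm is the Pythagorean length
```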

The definition of inner product also extends to the dot product of two different vectors V and W where V = v1 * E1 + v2 * E2 + . . . + vn * En, W = w1 * E1 + . . . + wn * En — e.g. V . W = v1 * w1 + v2 * w2 + . . . + vn * wn. Again always a real number, but not always positive as any of the v’s and w’s can be negative. Notice that < V | W > = < W | V > in what we’ve done so far (because we’ve assumed that all the v’s and w’s are real numbers).

However quantum mechanics deals in vectors in which the coefficients of Ei (e.g. v1, w5 etc. etc.) can be complex numbers (which are always of the form a + bi where a and b are real numbers). It’s one of the things about QM you must accept — remember Feynman — don’t ask why it has to be that way. The complex conjugate of v = a + bi is defined as a – bi (written v*). Multiplying v * v* together (note the two distinct uses of *) gives a^2 + b^2, which is a nonnegative (it could be zero) real number since a and b are both real numbers.

All observables in quantum mechanics are real numbers, but the vectors representing quantum states are complex vectors (they have complex coefficients). So to get a real number from the dot product of a complex vector with itself, one must multiply the vector V by its complex conjugate V*.

This modification of the definition of dot product for complex vectors leads to significant complications. Why? When V, W are vectors with complex coefficients, < V | W > is not the same as < W | V >, unlike the case when the vectors have all real coefficients.

That’s all for now. In the next post I’ll explain how quantum mechanics gets around this.

Next post — https://luysii.wordpress.com/2010/01/06/linear-algebra-survival-guide-for-quantum-mechanics-ii/