Category Archives: Theological implications of simple chemistry

Would a wiring diagram of the brain help you understand it?

Every budding chemist sits through a statistical mechanics course, in which the insanity and inutility of knowing the position and velocity of each and every one of the 10^23 molecules of a mole or so of gas in a container is brought home.  Instead we need to know the average energy of the molecules and the volume they are confined in, to get the pressure and the temperature.

However, people are taking the first approach in an attempt to understand the brain.  They want a ‘wiring diagram’ of the brain, i.e. a list of every neuron and, for each neuron, a list of the other neurons connected to it, and a third list for each neuron of the neurons it is connected to.  For the non-neuroscientist: the connections are called synapses, and they essentially communicate in one direction only (true to a first approximation but no further, as there is strong evidence that communication goes both ways, with one of the ‘other way’ transmitters being endogenous marihuana).  This is why you need the second and third lists.

Clearly a monumental undertaking and one which grows more monumental with the passage of time.  Starting out in the 60s, it was estimated that we had about a billion neurons (no one could possibly count each of them).  This is where the neurological urban myth of the loss of 10,000 neurons each day came from.  For details see

The latest estimate [ Science vol. 331 p. 708 ’11 ] is that we have 80 billion neurons connected to each other by 150 trillion synapses.  Well, that’s not a mole of synapses but it is about a quarter of a nanoMole of them. People are nonetheless trying to see which areas of the brain are connected to each other to at least get a schematic diagram.
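The mole comparison is easy to check. Here is a minimal Python sketch, using only the two estimates quoted above and Avogadro's number; the variable names are just for illustration. It comes out near a quarter of a nanomole, and also gives the average number of synapses per neuron.

```python
# Express the quoted synapse count as a chemical amount.
AVOGADRO = 6.02214076e23   # entities per mole (exact by SI definition)

NEURONS = 80e9             # estimate quoted in the post
SYNAPSES = 150e12          # estimate quoted in the post

synapse_moles = SYNAPSES / AVOGADRO
synapse_nanomoles = synapse_moles * 1e9
synapses_per_neuron = SYNAPSES / NEURONS

print(f"{synapse_nanomoles:.2f} nanomoles of synapses")
print(f"{synapses_per_neuron:.0f} synapses per neuron on average")
```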

Even if you had the complete wiring diagram, nobody’s brain is strong enough to comprehend it.  I strongly recommend looking at the pictures found in Nature vol. 471 pp. 177 – 182 ’11 to get a sense of the  complexity of the interconnection between neurons and just how many there are.  Figure 2 (p. 179) is particularly revealing showing a 3 dimensional reconstruction using the high resolutions obtainable by the electron microscope.  Stare at figure 2.f. a while and try to figure out what’s going on.  It’s both amazing and humbling.

But even assuming that someone or something could, you still wouldn’t have enough information to figure out how the brain is doing what it clearly is doing.  There are at least 3 reasons.

1. Synapses, to a first approximation, are excitatory (turn on the neuron to which they are attached, making it fire an impulse) or inhibitory (preventing the neuron to which they are attached from firing in response to impulses from other synapses).  A wiring diagram alone won’t tell you this.

2. When I was starting out, the following statement would have seemed impossible.  It is now possible to watch synapses in the living brain of an awake animal for extended periods of time.  But we now know that synapses come and go in the brain.  The various papers don’t all agree on just what fraction of synapses last more than a few months, but it’s early times.  Here are a few references [ Neuron vol. 69 pp. 1039 – 1041 ’11, ibid vol. 49 pp. 780 – 783, 877 – 887 ’06 ].  So the wiring diagram would have to be updated constantly.

3. Not all communication between neurons occurs at synapses.  Certain neurotransmitters are generally released into the higher brain elements (cerebral cortex) where they bathe neurons and affect their activity without any synapses for them (it’s called volume neurotransmission).  Their importance in psychiatry and drug addiction is unparalleled.  Examples of such volume transmitters include serotonin, dopamine and norepinephrine.  Drugs of abuse affecting their action include cocaine and amphetamine.  Drugs treating psychiatric disease affecting them include the antipsychotics, the antidepressants and probably the antimanics.

Statistical mechanics works because one molecule is pretty much like another. This certainly isn’t true for neurons. Have a look at  This is of the cerebral cortex — neurons are fairly creepy looking things, and no two shown are carbon copies.

The mere existence of 80 billion neurons and their 150 trillion connections (if the numbers are in fact correct) poses a series of puzzles.  There is simply no way that the 3.2 billion nucleotides of our genome can code for each and every neuron, each and every synapse.  The construction of the brain from the fertilized egg must be in some sense statistical.  Remarkable that it happens at all.  Embryologists are intensively working on how this happens; thousands of papers on the subject appear each year.
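One crude way to put numbers on this is to count bits. The sketch below is only an illustration under a stated assumption: that a complete wiring list would name the target neuron of every synapse, costing log2(number of neurons) bits each. That encoding is hypothetical, chosen for simplicity, and the genome is counted at 2 bits per nucleotide.

```python
import math

GENOME_NUCLEOTIDES = 3.2e9   # figure quoted in the post
NEURONS = 80e9               # figure quoted in the post
SYNAPSES = 150e12            # figure quoted in the post

# Four possible nucleotides means 2 bits per position.
genome_bits = GENOME_NUCLEOTIDES * 2

# ASSUMPTION: each synapse is recorded by naming its target neuron,
# which needs log2(NEURONS) bits (about 36).
bits_per_synapse = math.log2(NEURONS)
wiring_bits = SYNAPSES * bits_per_synapse

shortfall = wiring_bits / genome_bits
print(f"genome capacity : {genome_bits:.2e} bits")
print(f"explicit wiring : {wiring_bits:.2e} bits")
print(f"wiring list is roughly {shortfall:,.0f} times larger than the genome")
```

Under these assumptions an explicit list is several hundred thousand times too large for the genome, which is the sense in which construction has to be statistical.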

As my brain slowly recovers from (or at least gets used to) the chemical assault on it by inhaled corticosteroids and muscarinic anticholinergic drugs, I’m having a lot of fun reading a book by Melanie Mitchell, “Complexity: A Guided Tour”, but her view of neurons is simplistic in the extreme; hopefully that will improve in the last 100 pages.  A book review will follow.  The whole book is quite relevant to the question: just what would you accept as an explanation of how the brain does what it does?  The question leads into some deep philosophic minefields, but they can’t be avoided.  That’s for another time.

Hopefully, I’ll be able to get back to Anslyn and Dougherty in the coming week (after taxes).

Why aren’t we all dead ?

Anslyn & Dougherty is even more fun than Clayden et al.  It’s far more advanced, and I’m certainly glad I read Clayden first.  On p. 24 they talk about the polarizability of molecules, something distinct from the dipole moment of the molecule.  Polarizability is the ability of the molecule’s electron distribution to distort in the presence of an electric field.  I was surprised to find that the usual suspects (e.g. water) aren’t that polarizable and that the champs are hydrocarbons.   They don’t say how polarizability is measured, but I’ll take them at face value.

We wouldn’t exist without the membranes enclosing our cells, which are largely hydrocarbon.  Chemists know that fatty acids have one end (the carboxyl group) which dissolves in water while the rest is pure hydrocarbon.  The classic is stearic acid: 18 carbons in a straight chain with a carboxyl group at one end.  3 molecules of stearic acid are esterified to glycerol in beef tallow (forming a triglyceride).  The pioneers hydrolyzed it to make soap. Saturated fatty acids of 18 carbons or more are solid at body temperature (soap certainly is), but cellular membranes are fairly fluid, and proteins embedded in them move around pretty quickly.  Why?  Because most fatty acids found in biologic membranes over 16 carbons have double bonds in them.  Guess whether they are cis or trans.   Hint:  the isomer used packs less well into crystals.  You’ve got it, all the double bonds found in oleic (18 carbons, 1 double bond) and arachidonic (20 carbons, 4 double bonds) acids are cis, which keeps membranes fluid as well.  (I originally wrote trans here; thanks to PostDoc for pointing out the error.)  The cis double bond essentially puts a 60 degree kink in the hydrocarbon chain, making it much more difficult to pack in a liquid crystal type structure with all the hydrocarbon chains stretched out.   Then there’s cholesterol, which makes up 1/5 or so of membranes by weight; it also breaks up the tendency of fatty acid hydrocarbon chains to align with each other because it doesn’t pack with them very well.  So cholesterol is another fluidizer of membranes.

How thick is the cellular membrane?  If you figure the hydrocarbon chains of a saturated fatty acid stretched out as far as they can go, you get 1.54 Angstroms * cosine (30 degrees)  = 1.33 Angstroms/carbon, times 16 = 21 Angstroms.  Now double that because cellular membranes are lipid bilayers, meaning that they are made of two layers of hydrocarbons facing each other, with the hydrophilic ends (carboxyls, phosphate groups) pointing outward.  So we’re up to 42 Angstroms of thickness for the hydrocarbon part of the membrane.  Add another 10 Angstroms or so on each side for the hydrophilic ends (which include things like serine, choline etc. etc.) and you’re up to about 60 Angstroms thickness for the membrane (which is usually cited as 70 Angstroms; I don’t know why).
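The thickness estimate can be reproduced in a few lines. This is a minimal sketch of the arithmetic above, with one labeled assumption: the 10 Angstrom allowance for the hydrophilic ends is applied to each face of the bilayer, which is what brings the total to roughly 60 Angstroms.

```python
import math

CC_BOND_ANGSTROM = 1.54     # carbon-carbon single bond length
ZIGZAG_ANGLE_DEG = 30.0     # projection angle used in the post
CARBONS_PER_CHAIN = 16

def chain_length(n_carbons: int) -> float:
    """Length of a fully extended chain, projected along its axis."""
    per_carbon = CC_BOND_ANGSTROM * math.cos(math.radians(ZIGZAG_ANGLE_DEG))
    return n_carbons * per_carbon

leaflet = chain_length(CARBONS_PER_CHAIN)   # one layer of hydrocarbon
hydrocarbon_core = 2 * leaflet              # bilayer: two layers tail to tail

# ASSUMPTION: about 10 Angstroms of hydrophilic head group on each face.
HEADGROUP_PER_FACE = 10.0
total_thickness = hydrocarbon_core + 2 * HEADGROUP_PER_FACE

print(f"one leaflet      : {leaflet:.1f} Angstroms")
print(f"hydrocarbon core : {hydrocarbon_core:.1f} Angstroms")
print(f"whole membrane   : {total_thickness:.1f} Angstroms")
```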

Neurologists and neurophysiologists spend a lot of time thinking about membranes, particularly those of neurons.  In all these years, I’ve never heard anyone talk about hydrocarbon polarizability.  It ought to be a huge factor in membrane function.  Why?  Because of the enormous electric field across the membranes enclosing all our cells (not just our neurons).  The potential across the membranes is usually given as 70 milliVolts (inside negatively charged, outside positively charged).  Why is this a big deal?

Because the electric field across our membranes is huge.  70 x 10^-3 volts is 70 milliVolts.  70 Angstroms is 7 nanoMeters (7 x 10^-9 meters).  Divide 70 x 10^-3 volts by 7 x 10^-9 meters and you get a field of 10,000,000 Volts/meter.   If hydrocarbons are ever going to polarize they should in this environment.  The college physics book I bought for the Quantum Mechanics course a while ago, “Physics for Scientists and Engineers” 4th edition p. 662, talks about lightning.  The potential difference leading to the discharge is the same number: 10,000,000 Volts.  This results in a much smaller electric field (probably by a factor of 1,000) because clouds aren’t 1 meter off the ground.
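For anyone who wants to check the division, here is a short sketch treating the membrane as a uniform slab, so that the field is simply the potential divided by the thickness. The 1,000 meter cloud height is an assumption chosen to match the "factor of 1,000" guess above, not a figure from the physics book.

```python
def field_strength(potential_volts: float, distance_m: float) -> float:
    """Uniform-field approximation: E = V / d, in volts per meter."""
    return potential_volts / distance_m

MEMBRANE_POTENTIAL_V = 70e-3    # 70 milliVolts
MEMBRANE_THICKNESS_M = 70e-10   # 70 Angstroms = 7 nanometers

membrane_field = field_strength(MEMBRANE_POTENTIAL_V, MEMBRANE_THICKNESS_M)

LIGHTNING_POTENTIAL_V = 10_000_000
CLOUD_HEIGHT_M = 1_000.0        # ASSUMPTION: cloud base about 1 km up

lightning_field = field_strength(LIGHTNING_POTENTIAL_V, CLOUD_HEIGHT_M)
ratio = membrane_field / lightning_field

print(f"membrane field : {membrane_field:.3g} V/m")
print(f"lightning field: {lightning_field:.3g} V/m")
print(f"membrane field is about {ratio:.0f} times stronger")
```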

So why don’t our cells collapse and we die?  I don’t know.

Here are a few Physics 102 questions for the cognoscenti out there.

1. Potential difference is due to charge separation.  Assume a flat membrane 1 micron square and 70 Angstroms thick.  How much charge must be separated to account for a potential of 70 milliVolts?  Answer in number of charges rather than Coulombs.

2. Now let’s get real.  We’re talking about neuronal processes here.  So let’s talk about a cylindrical membrane 1 micron long (remember that some neuronal processes, such as those going from your spinal cord to your big toe, are a million times longer than this).  Diameters of our nerve fibers range from 1 micron to 25 microns.  Ignoring the complication of the myelin sheath, how much charge must be separated to produce a potential across the membrane of the neuronal process of 70 milliVolts?
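One possible way to attack both questions is to treat the membrane as a parallel-plate capacitor, Q = ε0·εr·A·V/d. The sketch below does that under loudly stated assumptions: a relative permittivity of 2 for the hydrocarbon interior (a typical textbook value for hydrocarbons, not a number from this post), and a thin-shell approximation for the cylinder, whose area is taken as π × diameter × length.

```python
import math

EPSILON_0 = 8.8541878128e-12         # vacuum permittivity, F/m
ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs per charge
RELATIVE_PERMITTIVITY = 2.0          # ASSUMPTION: hydrocarbon-like interior
THICKNESS_M = 70e-10                 # 70 Angstroms
POTENTIAL_V = 70e-3                  # 70 milliVolts

def charges_separated(area_m2: float) -> float:
    """Number of elementary charges on one face of a parallel-plate capacitor."""
    capacitance = EPSILON_0 * RELATIVE_PERMITTIVITY * area_m2 / THICKNESS_M
    return capacitance * POTENTIAL_V / ELEMENTARY_CHARGE

def cylinder_area(diameter_m: float, length_m: float) -> float:
    """Thin-shell approximation: ignore the difference of inner and outer radius."""
    return math.pi * diameter_m * length_m

flat_patch = charges_separated(1e-6 * 1e-6)                   # question 1
thin_fiber = charges_separated(cylinder_area(1e-6, 1e-6))     # 1 micron fiber
thick_fiber = charges_separated(cylinder_area(25e-6, 1e-6))   # 25 micron fiber

print(f"flat 1 micron square   : about {flat_patch:,.0f} charges")
print(f"1 micron diameter tube : about {thin_fiber:,.0f} charges")
print(f"25 micron diameter tube: about {thick_fiber:,.0f} charges")
```

With these assumptions the flat patch needs on the order of a thousand separated charges; a different choice of permittivity scales every answer proportionally.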

The more you think about life, the more remarkable it becomes.

Some New Year’s thank you’s – II

Continuing the thank you’s  from your division CEO.

Fourth: The gals in the steno pool.  Over and over they type out parts of the business plan when needed.  The plan itself is immense. War and Peace (English translation) has over half a million words and 3,100,000 characters.  The business plan comes in a weird language with only 4 characters.  But group two of them together and you have 16 possibilities, group 3 and you have 64. The plan itself contains 3.2 billion of these weird characters, or well over 1000 copies of Tolstoy’s epic.
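The counting in this paragraph is quick to verify. A minimal sketch, using only the figures quoted above:

```python
ALPHABET_SIZE = 4                      # the plan's 4 "weird characters"

pairs = ALPHABET_SIZE ** 2             # possible groups of two
triplets = ALPHABET_SIZE ** 3          # possible groups of three

PLAN_CHARACTERS = 3_200_000_000        # length of the business plan
WAR_AND_PEACE_CHARACTERS = 3_100_000   # English translation, as quoted

copies = PLAN_CHARACTERS / WAR_AND_PEACE_CHARACTERS

print(f"{pairs} pairs, {triplets} triplets")
print(f"the plan is about {copies:.0f} copies of War and Peace")
```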

No one is sure just how accurate they are, but if they’re like the group of stenos that copies the whole plan when the organization sprouts a new division, it’s impressive.  About one mistake in 100 million. For details see “”

Strangely, for a long time soi-disant experts thought most of the plan was junk.  It was strangely repetitive and because it didn’t code for the division’s buildings it was dismissed.  Now we know that some parts of the plan not coding for buildings tell us where to put them, when to make them and how many to make.  We now know that the girls are transcribing at least half of the plan, and perhaps most of it.  The experts for a time thought that this was like the turnings from a lathe, intellectual chaff if you will, but now they’re not quite so smug.

Fifth: Manufacturing.  Row on row of factories turning out (prefab) buildings.  So much so that from the air (which, 100 years ago was the only way to see our division) it was thought to be unique to our division class (it was called Nissl substance).  There are a few factories in the far reaches of the division, but most factories are right here in the center with me.  One of shipping’s big jobs is to get the buildings where they’re supposed to go.  All this manufacturing and shipping consumes a lot of energy, so much so that even though our set of divisions constitutes just 3% or less of the organization we consume 20% of the energy of the entire enterprise.

Sixth: Communications.  This is both a curse and a blessing.  Our division receives about 1,000 incoming lines from other divisions and they never shut up.  Not only that but they call as often as once every thousandth of a second.  Some of this is handled right at the incoming line, but guess who has to absorb all this information and decide whether or not to send it on.  The decision has been described by some as a computation, but it is far from straightforward.

The outgoing communications don’t use shipping (far too slow).  Special buildings all over the periphery of the organization exist to send things out so that information can go down the 1 meter or so in around 1/100th of a second.  If we were using the trucker analogy of going 90,000 miles instead of a meter, this would be 50 times the speed of light.
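A quick check of the speed comparison, as a sketch using the figures in the paragraph above plus standard values for the speed of light and the length of a mile:

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458   # exact by SI definition
METERS_PER_MILE = 1609.344

AXON_LENGTH_M = 1.0                    # "the 1 meter or so"
TRANSIT_TIME_S = 0.01                  # "around 1/100th of a second"

conduction_speed = AXON_LENGTH_M / TRANSIT_TIME_S

# Trucker analogy: the same trip scaled up to 90,000 miles.
SCALED_DISTANCE_MILES = 90_000
scaled_speed = SCALED_DISTANCE_MILES * METERS_PER_MILE / TRANSIT_TIME_S
multiple_of_c = scaled_speed / SPEED_OF_LIGHT_M_PER_S

print(f"real conduction speed: {conduction_speed:.0f} m/s")
print(f"scaled-up speed      : {multiple_of_c:.1f} times the speed of light")
```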

Sometimes we have to really step up the pace of our messages.  Pity the poor divisions connected to the cervical spinal cord where commands to move the fingers are received.  The boss has been practicing his piano like a banshee, and is now able to play 10 notes of a C scale in a second.  That’s one message every 100 milliSeconds.  He complains if they arrive unevenly.

Even as busy as the division is, I occasionally wonder about the organization as a whole (the job of the CEO is to think about the larger picture).  I wonder how many divisions there are, and what or who organized us.  Amazingly, no one knows just how many divisions of our type there actually are.  Estimates years ago were in the millions.  Now they’re in the billions.  No one has ever actually counted us; estimates are all we have.  Hell of a way to run an organization.  Who decides which incoming lines hit our division?  I’m not sure how the division figures out who to send our messages to.  It doesn’t seem conscious.

I’m pretty sure that the business plan can’t specify this sort of stuff.  With only 3.2 billion characters each of which is one of 4 possibilities, this isn’t enough to individually address each of the billions of divisions of the organization.  How did our division ever find the division which controls the gastrocnemius and soleus anyway?  Rumor has it that the entire organization with its billions of divisions arose from just one division like me.  Sort of the big bang of business.  Apparently this happens again and again.  Very hard for this CEO to believe that it all arises by chance.  I’ve been told that I lack sufficient faith that this is so.

Well anyway, our division has done a great job in the past year and we look forward to the next.  I did hear that the boss is thinking of learning to play the organ.  Heaven help us.

Some New Year’s thank you’s – I

Even though I’m the CEO of a tiny department of a very large organization, it’s time to thank those unsung divisions that make it all possible.  It’s been a very good year. Thanks in part to our work, the boss is a lot more adept at using the pedal when he plays the piano.

First: thanks to the guys in shipping and receiving.  Kinesin moves the stuff out and Dynein brings it back home.  Think of how far they have to go.  The head office sits in area 4 of the cerebral cortex and K & D have to travel about 3 feet down to the motorneurons in the first sacral segment of the spinal cord controlling the gastrocnemius and soleus, so the boss can press the pedal on his piano when he wants. Like all good truckers, they travel on the highway.  But instead of rolling they jump.  The highway is pretty lumpy being made of 13 rows of tubulin dimers.

Now chemists are very detail oriented and think in terms of Angstroms (10^-10 meters) about the size of a hydrogen atom. As CEO and typical of cell biologists, I have to think in terms of the big picture, so I think in terms of nanoMeters (10^-9 meters).  Each tubulin dimer is 80 nanoMeters long, and K & D essentially jump from one to the other in 80 nanoMeter steps.  Now the boss is shrinking as he gets older, but my brothers working for players in the NBA have to go more than a meter to contract the gastrocnemius and soleus (among other muscles) to help their bosses jump.  So split the distance and call the distance they have to go one Meter.  How many jumps do Kinesin and Dynein have to make to get there? Just 10^9/80 — call it 10,000,000. The boys also have to jump from one microtubule to another, as the longest microtubule in our division is at most 100 microns (.1 milliMeter).  So even in the best of cases they have to make at least 10,000 transfers between microtubules.  It’s a miracle they get the job done at all.

To put this in perspective, consider a tractor trailer (not a truck — the part with the motor is the tractor, and the part pulled is the trailer — the distinction can be important, just like the difference between rifle and gun as anyone who’s been through basic training knows quite well).  Say the trailer is 48 feet long, and let that be comparable to the 80 nanoMeters K and D have to jump. That’s 10,000,000 jumps of 48 feet or 90,909 miles.  It’s amazing they get the job done.
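The trucking arithmetic can be sketched as follows, taking the post's own figures as given (80 nanoMeters per jump, a 1 Meter trip, 100 micron microtubules, a 48 foot trailer). The step length is a parameter, so a different value can be substituted.

```python
NM_PER_METER = 10**9
FEET_PER_MILE = 5280

STEP_NM = 80                    # jump length used in the post
TRIP_M = 1                      # cortex to spinal motor neuron, rounded
MICROTUBULE_MAX_NM = 100_000    # 100 microns, the longest microtubule
TRAILER_FT = 48

jumps = TRIP_M * NM_PER_METER // STEP_NM                  # exact count
transfers = TRIP_M * NM_PER_METER // MICROTUBULE_MAX_NM   # minimum hand-offs

ROUNDED_JUMPS = 10_000_000      # the post's "call it 10,000,000"
miles_rounded = ROUNDED_JUMPS * TRAILER_FT / FEET_PER_MILE
miles_exact = jumps * TRAILER_FT / FEET_PER_MILE

print(f"jumps needed        : {jumps:,}")
print(f"microtubule changes : {transfers:,}")
print(f"trailer-length trip : {miles_rounded:,.0f} miles (rounded jump count)")
print(f"                      {miles_exact:,.0f} miles (exact jump count)")
```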

Second: Thanks to probably the smallest member of the team.  The electron.  Its brain has to be tiny, yet it has mastered quantum mechanics because it knows how to tunnel through a potential barrier.   In order to produce the fuel for K and D it has to tunnel some 20 Angstroms from the di-copper center (CuA) to heme a in cytochrome C oxidase (COX).  Is the electron conscious? Who knows?  I don’t tell it what to do.   Now COX is just a part of one of our larger divisions, the power plant (the mitochondrion).

Third: The power plant.  Amazing to think that it was once (a billion years or more ago) a free living bacterium.  Somehow back in the mists of time one of our predecessors captured it.  The power plant produces gas (ATP) for the motors to work.  It’s really rather remarkable when you think of it.   Instead of carrying a tank of ATP, kinesin and dynein literally swim in the stuff, picking it up from the surroundings as they move down the microtubule.  Amazingly the entire division doesn’t burn up, but just uses the ATP when and where needed.  No spontaneous combustion.

There are some other unsung divisions to talk about (I haven’t forgotten you ladies in the steno pool, and your incredible accuracy — 1 mistake per 100,000,000 letters [ Science vol. 328 pp. 636 – 639 ’10 ]).  But that’s for next time.

To think that our organization arose by chance, working by finding a slightly better solution to problems it faces, boggles this CEO’s mind (but that’s the current faith; so good to see such faith in an increasingly secular world).

The essential strangeness of the proteins that make us up


It doesn’t take much energy to denature a protein.  About .4 kiloJoules/amino acid, so that a protein of 100 amino acids loses its function (denatures) with an energy input of 40 kiloJoules/Mole, or about the energy required to break two measly hydrogen bonds [ Voet and Voet Biochemistry Ed. 3 p. 258 ].  Covalent bonds are a lot stronger, with carbon carbon single bonds and C-H bonds ten times stronger.   All you have to do to denature chymotrypsin is pull apart its catalytic triad of histidine at position #57, aspartic acid at #102 and serine at #195.  Clearly to get these 3 amino acids together the protein backbone has to turn and twist in space. Separating them doesn’t take much energy.
Amazingly, denature many of them and they spontaneously reform the active structure.  Certainly the first such protein studied this way (ribonuclease by Anfinsen) did just that, leading to the idea that the 3 dimensional structure of a protein was determined by the linear sequence of its amino acids along the backbone.
Over the decades crystal structure of protein after protein was solved by X-ray crystallography, and everyone came to think of proteins as having ‘a’ structure.  It was quickly found that there are parts in many proteins that won’t sit still even for crystallography, and it is now estimated [ Proc. Natl. Acad. Sci. vol. 103 pp. 12353 – 12358 ’06 ] that 30% of all proteins have stretches of over 30 amino acids that are intrinsically disordered.
Now sight your eye at the alpha carbon of one of the amino acids of a protein, looking toward the carbonyl carbon.  There are 3 conformational energy minima the carbonyl can adopt.  For a protein of 100 amino acids that’s potentially 3^99, or about 10^47, conformations.  This is clearly an overestimate because of self intersection, but still quite large.  Yet to be crystallizable the protein must choose just one of them and it must be lower in energy by 2 hydrogen bonds than all the rest.
Now think like a chemist and think about the side chains of the amino acids.  The hydrocarbon types (alanine, glycine, valine, leucine, isoleucine and perhaps methionine) can dissolve in each other.  Hydrogen bonding is possible between the serine and threonine and any carbonyl on the side chain or any of the amines. Salt bridges are possible between the two acids and 3 of the bases.   The list goes on and on.   Yet somehow the 195+ amino acids of chymotrypsin spontaneously form this one shape.  As a chemist I find this incredibly strange and unlikely.  Among the 10^47 conformations of a 100 amino acid protein are there none within 40 kiloJoules of ‘the’ structure?  If there are, are the energy barriers so high that it is never found?
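The size of that conformational count is easy to pin down with exact integer arithmetic. A minimal sketch, assuming 3 minima per residue and 99 rotatable links in a 100 amino acid chain as in the text; it gives a 48 digit number, about 1.7 × 10^47.

```python
import math

STATES_PER_RESIDUE = 3   # conformational minima per link, as in the text
RESIDUES = 100

# 99 links between 100 residues, each with 3 choices.
conformations = STATES_PER_RESIDUE ** (RESIDUES - 1)

exponent = math.log10(conformations)
digits = len(str(conformations))

print(f"3^99 is about 10^{exponent:.1f} ({digits} digits)")
```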
We’ve seen this happen so often we’ve gotten used to it, but speaking as a former chemist, I find this behavior incredibly strange.  I probably know enough math now to really delve into the physical chemistry of protein folding, but haven’t gotten around to it yet.  But saying that proteins fall down a potential energy funnel seems (to me) like just a fancy way of saying they fold into one shape.
My guess is that an incredibly small fraction of the possibilities in protein space have these properties.  There is an experiment which could possibly prove me wrong.  See gedanken-experiment/.
I mean you don’t even have to be a chemist to see what I’m talking about.  Back in the day, girls used to wear charm bracelets, with little charms hanging off the chain. Some of the charms attract each other, others have the opposite effect. Make one with 100 charms of 20 different types, throw it into a pail of oil and agitate the pail so it doesn’t sink. Do you think just one shape would result?
I think our biochemical sense of wonder has been dulled by what we’ve found so far. For some thoughts on this see   Just this month [ Proc. Natl. Acad. Sci. vol. 107 pp. 17710 – 17715 ’10 ] a new player in bone formation was found.  It’s oleic acid esterified to the hydroxyl group of serine.  How many more things are there like this out there?
This has nothing to do with mutation, or the evolution of protein structure by natural selection.  That’s for next time. But if proteins with one or a few structures are as rare as I think them to be, it’s going to be tough to get new proteins with this property from old ones by mutation.  Once obtained,  natural selection can go to work on them.  The problem is getting to them in the first place.



On the improbability of our existence

There have been some great critical responses to some of the posts, which deserved a reply long ago. All the posts criticized involve either a chemical, molecular biological or numeric argument about the macromolecules making us up.  Here they are in a semi-logical sequence.  I’ll deal with the actual criticisms in the next post(s).

Two posts involve simple calculations about how many distinct proteins or polynucleotides life could have made given the mass of the earth to do so and 14 billion years.  Here are the links. (1) (2)

No one has criticized the correctness of the calculations, which show that life on earth could have made only an infinitesimal fraction of the possible proteins of only 100 amino acids, or polynucleotides of 100 bases.  If you disagree say so now.  There has been severe criticism of the implication that evolution works by randomly trying out all such possibilities.  I didn’t really say that, and will deal with this in the next post.  I do think that all of us agree that mutations occur randomly (recombination hotspots excepted) so that the grist for the evolutionary mill is formed essentially willy nilly.    If you disagree say so now.

What I was really getting at is that I find the proteins which make us up rather miraculous in that (1) they have one or just a few conformations which give them a fairly stable shape; they certainly do or we wouldn’t be here.  For details see  (2) their side chains don’t react with each other.  For details see 

I think proteins with such magical properties are exceedingly uncommon.  So how would you know how common such proteins actually are ?  While possible in theory, the experiment to investigate the structures of a random sequence of amino acids is impossible to carry out fully.  For details see htts://  It still might give an answer if nearly every random sequence  of say 60 – 100 amino acids had just one or a few structures.  

I’m unimpressed with the argument that there are only 1000 or so protein folds, which significantly narrows the search space. There are huge numbers of proteins in the microorganisms living in the sea which far outnumber what we’ve already studied. Even if correct, how would random mutation find them?   I’d love to see the results of the ‘glass eye’ experiment — for details see

Finally, I must admit that these speculations provided a certain degree of comfort as I watched patients I was unable to help get worse and worse and finally die.  For details see – If our existence is as miraculous as I think it to be, then what really needs to be explained is not suffering and disease, but health and the gift of life.  At long last, a semi-answer to Camus “The Plague” which affected me profoundly as an undergraduate years ago.

Gentlemen, start your engines

On 29 July, Derek Lowe had a short post about Craig Venter, along with a short quote by Venter describing Francis Collins as a government administrator rather than a scientist, presumably because of Collins’ religious beliefs.  It drew some 76 comments as of today.   Most of the comments concerned whether religion and science were compatible or not.

Here’s an amusing one

11. bboooooya on July 30, 2010 7:19 AM writes…

“faith or science”

Really? I’ve never seen a discrete electron or neutron myself, but I believe that they exist.

Not able to contain myself any longer, I entered the fray with:

34. retread on July 30, 2010 12:52 PM writes…

Well, to accept that the complexity of cellular biochemistry arose by chance, just from purely random exploration of protein space, requires a faith that trumps anything in Genesis. For details see If you find anything wrong with the purely combinatorial arguments given there, please post a comment there.

Note:  When I was posting for “The Skeptical Chymist”, I used the nom de plume Retread.  When I started Chemiotics II, the name was taken, so I’m using Luysii presently.

This produced

40. daen on July 30, 2010 4:16 PM writes…

@retread: Interesting how some people are so happy to trot out combinatorial complexity arguments to dismiss the possibility of proteins arising through evolution (especially naive, error-filled ones that ignore the fact that it is not random but directed, that many functional proteins consist of repeated sub-groups, that many proteins share functional domains, and so on, all assumptions which prune the combinatorial tree by dozens of orders of magnitude), and yet do not blink at invoking the existence of an omniscient, omnipotent being of infinitely greater complexity to create these complex proteins …

Something about swallowing camels while straining at gnats springs to mind.

To which I replied

43. REtread on July 30, 2010 6:35 PM writes…

#42 Daen: I am far from happy to trot out combinatorial arguments to dismiss the possibility of the present degree of protein complexity and structure arising by chance. I find many of the uses religion has been and is being put to absolutely horrible. I do not like where my arguments seem to lead. They need to be refuted (but I don’t see how).

You need far more than ‘dozens of orders of magnitude’ to trim down protein space so all aspects of it can be explored. The current champ is titin with 30,000 amino acids, 300 modules of three types (1) immunoglobulinlike, (2) type III fibronectin, and (3) unique PEVK insertions. Even linking them together in any particular order is one in 3^300 possibilities, a number larger than all the baryons in the universe.

Only 1.5% of the genome codes for amino acids, but nearly all of it is transcribed, so proteins are only a small part of the story. Molecular biologists are fixed on proteins (they know lots about them, and the technology to study them has been developing for decades). But there is far more to the story. For just how protein-centric molecular biologists are see the current post about Autism Spectrum Disorder.
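The titin figure in the comment above can be checked with exact integers. A short sketch; the 10^80 baryon count is a commonly quoted rough estimate and is an assumption here, not a number from the thread.

```python
import math

MODULE_TYPES = 3       # immunoglobulin-like, fibronectin III, PEVK
MODULE_COUNT = 300     # modules in titin, as stated in the comment

# ASSUMPTION: roughly 10^80 baryons in the observable universe.
BARYONS_IN_UNIVERSE = 10 ** 80

orderings = MODULE_TYPES ** MODULE_COUNT
orderings_log10 = math.log10(orderings)

print(f"3^300 is about 10^{orderings_log10:.0f}")
print(f"exceeds baryon estimate: {orderings > BARYONS_IN_UNIVERSE}")
```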


Since then we’ve had an example of the good and evil to which religion can be put, an example so perfect that I could never have made it up — the slaughter of 10 medical workers in Afghanistan (in the name of religion of course).


This was followed by

48. Wavefunction on July 30, 2010 9:27 PM writes…

Daen is right. I thought we had already made headway into addressing the combinatorial arguments against protein structure and function. Once we accept the co-operative nature of self-assembly, things begin to look much more reasonable. Even a computer program like Rosetta (which is considered state-of-the-art as far as predicting protein folding is concerned) can pare down the vast space of possible protein folding intermediates to a manageable few by using well-established motifs from known protein structures. If this can be done in a few hours by a computer program for a decent-sized protein, I don’t see why it would require an act of faith to believe that nature could implement such a strategy over billions of years.

To which I replied

49. retread on July 30, 2010 10:04 PM writes…

#47 Wavefunction: Of course Rosetta can do this. It starts with proteins which already are known to fold into one shape, to find the how another protein (which is known to have one shape) folds into it. Rosetta is basically starting with the answers in hand, and a question which is known to have an answer.

I’ve got to get some stuff I posted on the Skeptical Chymist back when I was writing for them up on my site for you folks to chew on, but I’m going to be visiting family until the middle of next week. In the meantime have a look at

If mutation is truly random (and it seems to be) I don’t see how nature has the time, space or mass to “do it”.

Followed by:

52. daen on July 31, 2010 5:09 AM writes…

@retread: You’re missing the elephant in the room, which is so often overlooked by those who invoke a purely combinatorial approach to questions of how functional biological systems arise. The elephant is that all proteins do something useful, which is non-random. A naive combinatorial approach based on pure random chance does not take into account the equally sound physical principles of natural selection, which is anything but random. An organism alive today exists in a state of extreme adaptation, from its gross morphology down to its molecular biology. Working backwards, at every step of the way, its ancestors survived. Mutations conferring an adaptive survival advantage upon those ancestors can be traced backward, generation by generation. Other mutations, which may have been deleterious or which did not confer sufficient advantage, have been lost. Surely you know this; it is at the heart of the modern evolutionary synthesis. So to invoke a pure random chance argument and express surprise at the vast numbers it throws up is incongruous and, worse, plain wrong. Your argument is utterly specious.

Followed by:

56. LeeH on August 1, 2010 10:48 AM writes…

@retread: For a simple concrete example of how good solutions to problems that are seemingly infinite can be generated “randomly” simply Google genetic algorithm solutions to the travelling salesman problem.

Briefly, if a salesman has to travel between multiple cities and you want to know the best (i.e. shortest) way to do it, once you consider a rather trivial number of cities you are considering, if done exhaustively, more possible paths than there are atoms in the universe. Yet, using “random” selective methods (such as GAs) you can have Excel generate good solutions in a matter of minutes.

Perhaps this implies that God is somehow, in a divine, intelligent way, extending his mighty hand into Microsoft’s product. A more likely explanation is that invoking combinatoric arguments, without truly understanding combinatorics, is not the way to refute the conclusions of thousands of man-years of consistent evidence.

Followed by:

68. Wavefunction on August 2, 2010 2:15 PM writes…

@Retread: The point was that Rosetta uses a mix-and-match strategy which makes the conformational space required to be searched much smaller than what would result from random search alone. Nature proceeds in a similar way, non-randomly accumulating pre-existing fragments from known protein structures. It would indeed be miraculous if it were purely random. But it’s really not, and this argument is quite well-trodden.

Using another guy’s blog for the back and forth about this question didn’t seem quite kosher, so I’ve put up two of the posts I wrote for the Skeptical Chymist (they are the previous two on this blog) which explain the thinking behind my original comment. How life came into being is one of the most profound questions we can ask. Even though presumably scientific, there is no way it can be disentangled from its theological and philosophic implications. Aren’t we fortunate to live when we live, know the chemistry and physics that we know, and possess some of the data needed to address it on a nonintuitive basis?

So start your engines and comment away (either on the previous two posts or this one).  I’ll eventually respond to all of them, but it may be a while, as on the 15th I leave for “Band Camp for Adults” for a week.

Molecular Biology survival guide for Chemists – II: What DNA is transcribed into

We have 3 RNA polymerases which transcribe DNA into RNA.  Transcription starts at the 3′  end of one of the members of the DNA helix and proceeds toward the 5′ end.  However the RNA produced starts at the 5′ end and proceeds toward the 3′ end.  Why transcribe you might ask?  Because the chemical language is the same — DNA and RNA are both polynucleotides.  The Guanine in DNA codes for Cytosine in RNA, etc. etc.
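The pairing rule can be sketched in a few lines of Python. This is only an illustration: the function name `transcribe` and the six-base template are made up for the example, and the template is written 3′ to 5′ so that the RNA comes out reading 5′ to 3′.

```python
# Base-pairing rule for transcription: each base on the DNA template strand
# specifies its partner in the RNA (adenine pairs with uracil, since RNA has no thymine).
DNA_TO_RNA = {"G": "C", "C": "G", "T": "A", "A": "U"}

def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (written 5' to 3') made from a template strand written 3' to 5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())

# Hypothetical template strand, in the order the polymerase reads it (3' -> 5')
print(transcribe("TACGGT"))
```

Real polymerases also need promoters, start sites and termination signals, none of which this toy models.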

RNA polymerase I (Pol I to you) transcribes the genes for the RNA found in the ribosome (ribosomal RNA also known as rRNA), RNA polymerase II (Pol II) transcribes the genes for proteins into messenger RNA (mRNA), while RNA polymerase III (Pol III) transcribes the genes for transfer RNA (tRNA) and a lot more. Med students love mnemonics, so here’s one — I makes rRNA, II makes mRNA, III makes tRNA — so the polymerases and the products are in (semi) alphabetical order.

The ribosome is an incredible molecular machine — it contains several RNAs (called rRNAs) containing in total about 4,500 nucleotides and about 50 proteins. The molecular mass is about 2,500,000 Daltons. Its job, and its only job as far as we know, is to translate the mRNA into protein. Why translate? Because polynucleotides and proteins are chemically quite different. So information is being translated from one language to another. Transfer RNAs (tRNAs) are involved. Each different tRNA brings just one specific amino acid to the ribosome, which then stitches the amino acid to the growing protein. Since we have 64 possible codons for amino acids (that’s 4^3), we have an abundance of tRNA genes in our DNA, well over 400.

Now it’s time to speak of mRNA or, actually, pre-mRNA. The previous post noted that most genes come in pieces, parts coding for amino acids (called exons) and parts between the exons, called the introns. Pol II knows nothing of them, just as the CPU knows nothing of the series of bits it is fed in a program. It just starts transcribing DNA at a certain point, making mRNA willy nilly, intron and exon, and finally quitting.

As mentioned in the previous post, dystrophin has over 2 million nucleotides in its DNA, all of which are transcribed into RNA. The parts of the RNA actually coding for amino acids are under 15,000 nucleotides long, so all the introns must be spliced out. This is the function of the spliceosome — another huge molecular machine. It contains 5 RNAs (called small nuclear RNAs, aka snRNAs), along with 50 or so proteins with a total molecular mass again of around 2,500,000 Daltons. Splicing out introns is a tricky process which is still being worked on. Mistakes are easy to make, and different tissues will splice the same pre-mRNA in different ways. All this happens in the nucleus before the mRNA is shipped outside where the ribosome can get at it.

There are some incredible fail safe mechanisms here.  The spliceosome associates a few proteins with the spliced together exon/exon junction, so that if and when the mRNA is read (translated) by the ribosome, if a termination codon occurs too early in the gene, truncating the protein prematurely, a process called nonsense mediated decay destroys the defective mRNA.

The mature mRNA just before it is ready to leave the nucleus has several parts. From the 5′ end it has a bunch of nucleotides prior to the first codon for the protein (always an AUG which codes for methionine). This is called the 5′ UnTranslated Region (5′ UTR). U, by the way, stands for Uridine which is the nucleotide in RNA corresponding to thymine in DNA. Then there is the protein coding part, then there is the 3′ part which is not translated into protein (called the 3′ UnTranslated Region, 3′ UTR). When Pol II is finished transcribing the gene, a long stretch of adenines (polyAdenine aka polyA) is added somewhere in the 3′ UTR. It is added about 30 nucleotides downstream (3′ to) an AAUAAA sequence found in the 3′ UTRs of most protein coding genes. There are some 20 – 260 adenines in a row in the polyA tract. Addition is important, as polyA protects the mRNA from degradation — very few things in the cell hang around forever. Each time the ribosome translates the mRNA into protein some adenines are lost, so for those of you familiar with computer programming, you can regard the polyA as a loop counter.
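For the programmers, the loop-counter analogy might look something like the toy below. It is strictly an analogy: the function name, the starting tail length and the number of adenines lost per round are all invented for illustration, not measured values.

```python
def translation_rounds(tail_length: int, lost_per_round: int) -> int:
    """Count how many times the message can be translated before the tail runs out.

    Both arguments are hypothetical illustration values, not measurements.
    """
    rounds = 0
    while tail_length >= lost_per_round:
        tail_length -= lost_per_round  # each pass of the ribosome trims the tail
        rounds += 1
    return rounds

# A 200-adenine tail losing 10 adenines per pass (made-up numbers)
print(translation_rounds(200, 10))
```

A longer tail simply means more passes through the loop before the message becomes fair game for degradation.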

The 3′ UTR also contains sites where yet another type of RNA (called microRNA) binds.  Genes for microRNA  are also transcribed by Pol II.  Their precursor (pre-microRNA) is then extensively processed (I’ll spare you the gory details)  to form mature microRNAs, which, as the name implies, are rather short — only 20 – 22 nucleotides.  MicroRNAs represent one of the many forms of control on the amount of a given protein that a cell contains. They basepair with complementary sequences in the 3′ UTR of mRNAs and either (1) inhibit protein synthesis of the mRNA by the ribosome or (2) cause degradation of the mRNA.  It’s important to note that a given microRNA can control the levels of many different proteins, if the complementary region is present in their 3′ UTRs.  Also the 3′ UTR of a given mRNA can have regions complementary to many different microRNAs.

That’s quite a bit to throw at you.  I’ve omitted a lot of the complexity, to make the goings on as simple and clear as possible.  Hopefully, I haven’t violated Einstein’s dictum “Everything should be made as simple as possible, but not simpler”.  I think what I’ve said is quite accurate, but comments and corrections are always welcome.

The more I know about the goings on inside our cells, the more impressed I become, and the greater the leap of faith I must make to accept that this all arose by chance.


The next article in the series —

How many distinct RNA polymers can be made using the mass of the earth to do so?

The comments by Another O-chemist and Yggdrasil on the last post were excellent, and just the type I’d hoped to get, but before responding I’d like to throw this post into the mix.

Why RNA? Because that’s what the earliest forms of life were made of according to the best current speculations. What is the mass of the average RNA nucleotide? (base + sugar + phosphate). Phosphate has a mass of 96 Daltons, ribose a mass of 115 Daltons, and the ‘average’ base has a mass of (112 + 115 + 134 + 150)/4 = 128. So the average mass of an RNA nucleotide is 96 + 115 + 128 = 339 or very nearly 3 nucleotides per kiloDalton.

As before, according to Halliday’s Physics 6th Edition the mass of the earth is 6 * 10^27 grams. Assume the earth is entirely made of C, H, O, N and P in just the proportions we need. By the calculations in the previous post, a kilodalton has a mass of 10^-21 grams. Each position in the polyribonucleotide can be one of the 4 bases.

Now it’s time to calculate the number of distinct possibilities for a polyribonucleotide of length n. Pretty simple — it’s just 4^n. Order is crucial, just as united has a different meaning from untied, GACU is different from AGCU (the nucleotides of RNA are abbreviated A, G, C, U).
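A quick way to convince yourself of the 4^n count is to enumerate the orderings for small n. The sketch below uses Python's standard `itertools.product`; the helper name `all_orderings` is made up for the example.

```python
from itertools import product

BASES = "AGCU"  # the four RNA nucleotides

def all_orderings(n: int) -> list[str]:
    """List every distinct polyribonucleotide of length n (practical only for small n)."""
    return ["".join(seq) for seq in product(BASES, repeat=n)]

tetramers = all_orderings(4)
print(len(tetramers), 4 ** 4)                    # the list length matches 4^n
print("GACU" in tetramers, "AGCU" in tetramers)  # both present, and distinct
print(4 ** 21)                                   # exact count for n = 21
```

Enumeration stops being feasible very quickly, which is the whole point; the closed form 4^n is all that is needed from here on.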

So for a polyribonucleotide length of n = 21 there are 4,398,046,511,104 distinct orderings of the nucleotides. Each ordering has a mass of 21/3 = 7 kiloDaltons. That’s 4 trillion of them. We’re up to a mass for all orderings of about 3 * 10^-8 of a gram (tens of nanoGrams) without breathing hard.

Length 42 gets us to 19,342,813,113,834,066,795,298,816 — about 2 * 10^25 possibilities, each with a mass of 42/3 = 14 kiloDaltons. That is an aggregate of roughly 3 * 10^5 grams, a few hundred kilograms.

Given the current ratio of the genetic code of 3 nucleotides/amino acid, that’s only enough for a 14 amino acid peptide. Now the 64 possible 3 nucleotide codons only code for 20 amino acids + 1 stop codon, so there is some coding overkill in these numbers.

Not so fast. Consider progeria, a terrible (but fortunately rare) disease — only 50 kids with it worldwide. [ Nature vol. 440 pp. 32 – 34 ’06 ]. Unfortunates with progeria age rapidly and die of old age diseases (heart attack and stroke) in their teens. [ Nature vol. 423 pp. 293 – 298, 298 – 301 ’03 ] The defective gene has been found and is Lamin A (a component of the nucleus which helps to shape it). 18/20 cases showed a de novo mutation at the same place in the gene (1825 C –> T) — in codon #608. This doesn’t change the amino acid (which is glycine) but results in a cryptic splice site within exon 11 resulting in the production of a protein with 50 amino acids missing near the carboxy terminus (but the carboxy terminal end of the protein is still there and can be farnesylated). The truncated mutant is called progerin.

So even two distinct codons mapping to the same amino acid can have profoundly different effects. Further examples include the exonic splicing enhancers and inhibitors. For details see my post of 20 Jan ’09 “The Death of the Synonymous Codon” under Chemiotics in the blog of “The Skeptical Chymist”. It’s too long to go into here but pretty interesting.

Onward and upward. 4^60 is 1,329,227,995,784,915,872,903,807,060,280,344,576 or about 10^36 polynucleotides 60 bases long each with a mass of 20 kiloDaltons. The mass of all 10^36 of them is then 2 x 10^37 kiloDaltons. Recall that a kiloDalton is 10^-21 grams, so this group has an aggregate mass of about 2 x 10^16 grams. It’s pretty clear that by the time we get to a polynucleotide of 90 units we’ll have exhausted the mass of the earth.

4^90 is about 1.5 x 10^54
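For anyone who wants to check the arithmetic, here is a rough sketch in Python using the same roundings as above (3 nucleotides per kiloDalton, 10^-21 grams per kiloDalton, 6 x 10^27 grams for the earth). The function names are made up for the example, and the answer is only as good as those roundings.

```python
EARTH_MASS_G = 6e27        # Halliday's figure, as quoted in this post
KILODALTON_G = 1e-21       # rounded mass of one kiloDalton in grams
NUCLEOTIDES_PER_KDA = 3    # roughly 339 Daltons per nucleotide

def aggregate_mass_grams(n: int) -> float:
    """Mass of one molecule of every distinct RNA of length n."""
    count = 4 ** n                          # exact integer count of orderings
    mass_each_kda = n / NUCLEOTIDES_PER_KDA
    return count * mass_each_kda * KILODALTON_G

def first_length_exceeding(limit_grams: float) -> int:
    """Smallest length whose full collection outweighs limit_grams."""
    n = 1
    while aggregate_mass_grams(n) < limit_grams:
        n += 1
    return n

for n in (60, 90):
    print(n, aggregate_mass_grams(n))
print(first_length_exceeding(EARTH_MASS_G))
```

Python integers are exact at any size, so `4 ** n` does not suffer the rounding that a pocket calculator introduces at these magnitudes.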

The ribosome is thought to be a molecular fossil of the RNA world. Although there are some 50 proteins to be found on its surface, its catalytic center is pure RNA. How large are the RNAs of the ribosome? Here’s what Molecular Biology of the Cell 4th edition says (p. 343). The eukaryotic ribosome has a molecular mass of 4.2 megaDaltons and is an 80S particle (S stands for Svedberg unit). It is composed of a 60S subunit of mass 2.8 megaDaltons and a 40S subunit of mass 1.4 megaDaltons. The 60S subunit has 3 ribosomal RNAs of 5S (120 nucleotides), 28S (4700 nucleotides) and 5.8S (160 nucleotides). The 40S subunit has a single 18S rRNA of 1900 nucleotides.

I leave it to the readers to propose a mechanism to achieve this combinatorial feat. I’m satisfied that the above argument shows that randomly trying out all possibilities and coming up with the RNAs of the ribosome is physically impossible. In some way the nucleotides of the ribosomal RNAs must be linked together consistently. RNA-dependent polymerases are known which can do it (but they are proteins). Assume that there exists an RNA which can act as the enzyme to link RNA nucleotides together (the way the ribosome links amino acids together) — a big assumption, but one which current speculation seems to require. Such an enzyme made out of RNA (a ribozyme) must have a pre-existing template of ribosomal RNA to do so. Where did the template come from? How did it arise?

And so grubby old chemistry, the province of nerds and other lower forms of animal life, puts us in direct contact with profound questions of existence. Perhaps it will supply an answer as well.

How many proteins can be made using the entire mass of the earth to do so?

The mass of the earth is given by my physics book (Halliday 6th Ed.) as 6 x 10^27 grams. If we made just one molecule of each protein containing n amino acids linked together, when would we run out of material? Make a guess. I found the results surprising.

Assume the earth is made of nothing but hydrogen, oxygen, nitrogen, carbon and sulfur. Clearly not true, but we’re going for what mathematicians call an upper bound. If mathematicians can get away with things like “consider a spherical cow” I can get away with this. (The cognoscenti may wish to go for a least upper bound). Proteins are linear chains of 20 different amino acids ranging in mass from glycine at 75 Daltons to tryptophan at 204. When linked together by an amide (peptide) bond, 18 Daltons of mass are lost (water is split out). So figure the average amino acid at 100 Daltons (roughly).

So there are 20 x 20 = 400 distinct proteins of 2 amino acids, 8000 with 3, 160,000 with 4, 3,200,000 with just 5. Shorties like this are called peptides (or polypeptides) and just when you start calling them proteins seems to be a matter of taste.

We’re figuring the mass of the typical amino acid at 100 Daltons, but a Dalton doesn’t have much mass. It is 1/12 the mass of a single atom of carbon-12, Avogadro’s number (about 6 x 10^23) of which have a mass of 12 grams. So one Dalton has a mass of 10^-24 grams (roughly).

The number of distinct proteins containing n amino acids is 20^n. The mass of each protein (in Daltons) is (roughly) 100 x n — depending on the amino acids chosen. The mass of the collection of distinct proteins of length n in grams is (20^n) x (100 x n) x (10^-24). It’s clear that we’re over 1 gram for the collection at only 24 amino acids (as 20^24 = 2^24 x 10^24 is much larger than 10^24). How far over? 2^24 x 100 x 24 = 40,265,318,400 = 4 x 10^10 grams.

As noted, the mass of the earth is 6 x 10^27 grams. So we’re not too far away at 24 amino acids. Certainly no farther away than another 17 amino acids as 20^17 is much greater than 10^17.

So, the mass of the earth (which isn’t all carbon, hydrogen, etc… ) isn’t enough to make just one molecule of each of the possible proteins 41 amino acids long. 41 amino acids is a very small protein (some would call it a polypeptide). Just about every protein of biological interest is much larger. The champ is a muscle protein called titin which has 27,000+ amino acids.
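The formula is easy to check by machine. A minimal Python sketch, using the same rough figures as the text (100 Daltons per amino acid, 10^-24 grams per Dalton, 6 x 10^27 grams for the earth); the function names are invented for the example.

```python
EARTH_MASS_G = 6e27     # Halliday's figure, as quoted above
DALTON_G = 1e-24        # rounded mass of one Dalton in grams
AVG_RESIDUE_DA = 100    # rough average mass of one amino acid in a chain

def collection_mass_grams(n: int) -> float:
    """Mass of one molecule of every distinct protein of length n."""
    return (20 ** n) * (AVG_RESIDUE_DA * n) * DALTON_G

def first_length_exceeding(limit_grams: float) -> int:
    """Smallest chain length whose full collection outweighs limit_grams."""
    n = 1
    while collection_mass_grams(n) < limit_grams:
        n += 1
    return n

print(collection_mass_grams(24))                 # the 24 amino acid collection
print(collection_mass_grams(41) > EARTH_MASS_G)  # the 41 amino acid claim
print(first_length_exceeding(EARTH_MASS_G))
```

With these roundings the crossover actually arrives a few residues before 41, so 41 is a comfortable bound rather than the exact threshold.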

So what? It means that chemists will never be able to explore more than a tiny morsel of the space of possible proteins. Perhaps computationally we will (I doubt it), but that’s the subject of a future post.

The above is a post I wrote for “The Skeptical Chymist” back in April of 2008 (using the nom de plume Retread). I hoped for a lot of comments (particularly showing how I was wrong, as being correct has a lot of implications). I did get the following interesting comment from Param Priya Singh.

Really Good! However this may not be true. Because the situation which has been discussed is only valid if all possible polypeptides are made- all at once. But in biological reality it may not be the case. What if the sequence space has been explored (by nature) gradually during millions of years? In that case at a particular instance not all, but a limited (but still very large) subset is being explored and is being evolved under the selective pressure. From Param Priya Singh

to which I replied

Param — thanks for your comments. Consider the following: Let us suppose there is a super-industrious post-doc who can make a new protein every nanosecond (reusing the atoms). There are 60 * 60 * 24 * 365 = 31,536,000 ~ 10^7 seconds in a year and 10^10 years (more or less) since the big bang. This is 10^9 * 10^7 * 10^10 = 10^26 different 41 amino acid proteins he could make since the dawn of time. But there are 20^41 = 2^41 * 10^41 proteins of length 41 amino acids. 2^41 = 2,199,023,255,552 ~ 10^12. So he has only tested 10^26 of 10^53 possible 41 amino acid proteins in all this time.

As per your suggestion, this is making one protein at a time. However, even if the hapless post-doc was able to use the entire mass of the earth (6 x 10^27 grams) every nanosecond to make a different set of proteins (one molecule of each), he would never have made all the possibilities for a protein of length of one of the two chains of hemoglobin (141 or 146 amino acids) since time began. Hemoglobin just isn’t that big as proteins go (the gene mutated in cystic fibrosis has well over 1000).
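Both versions of the post-doc thought experiment can be put into a few lines of Python. This is a back-of-the-envelope sketch under the same assumptions as the reply (one nanosecond per batch, 10^10 years, 100 Daltons per amino acid, 10^-24 grams per Dalton), plus the generous extra assumption that no sequence is ever made twice. It works in powers of ten because the numbers get unwieldy; the function names are made up.

```python
import math

SECONDS_PER_YEAR = 60 * 60 * 24 * 365                            # 31,536,000
NANOSECONDS_SINCE_BIG_BANG = 10**9 * SECONDS_PER_YEAR * 10**10   # about 3 x 10^26
EARTH_MASS_G = 6e27
DALTON_G = 1e-24
AVG_RESIDUE_DA = 100

def log10_fraction_one_at_a_time(n: int) -> float:
    """log10 of the fraction of length-n proteins made at one protein per nanosecond."""
    return math.log10(NANOSECONDS_SINCE_BIG_BANG) - n * math.log10(20)

def log10_fraction_whole_earth(n: int) -> float:
    """Same, but every nanosecond the whole earth is turned into distinct length-n proteins."""
    molecules_per_batch = EARTH_MASS_G / (n * AVG_RESIDUE_DA * DALTON_G)
    return math.log10(molecules_per_batch) + log10_fraction_one_at_a_time(n)

print(log10_fraction_one_at_a_time(41))   # one protein at a time, 41 amino acids
print(log10_fraction_whole_earth(141))    # whole earth per nanosecond, hemoglobin alpha chain
print(log10_fraction_whole_earth(146))    # hemoglobin beta chain
```

A result of -27, say, means that one part in 10^27 of the possibilities has been sampled.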

So write in and show me the mistakes in all this. If it stands, this back of the envelope calculation poses severe problems for a very popular theory.