A chemical Gedanken experiment

This post is mostly something I posted on the Skeptical Chymist 2 years ago. Along with the previous post “Why should a protein have just one shape (or any shape for that matter)”, it will be referred to in the next one, “Gentlemen start your motors”, concerning the improbability of the chemistry underlying our existence and whether it is reasonable to believe that it arose by chance.

In the early days of quantum mechanics Einstein and Bohr threw thought experiments (gedanken experiments) at each other like teenagers tossing cherry bombs.  None of the gedanken experiments were regarded as remotely possible back then, although thanks to Bell and Aspect, quantum nonlocality and entanglement now have a solid experimental basis.  To read more about this you can’t do much better than “The Age of Entanglement” by Louisa Gilder. 

Frankly, I doubt that most strings of amino acids have a dominant shape (e.g., biological meaning), and even if they did, they couldn’t find it quickly enough (the Levinthal paradox). For details see the previous post.

How would you prove me wrong? The same way you’d prove a pair of dice was loaded. Just make (using solid-phase protein synthesis à la Merrifield) a bunch of random strings of amino acids (each 41 amino acids long) and see how many have a dominant shape. Any sequence forming a crystal does have a dominant shape; if the sequence doesn’t crystallize, use NMR to look at it in solution. You can’t make all of them, because the earth doesn’t have enough mass to do so (see http://luysii.wordpress.com/2009/12/20/how-many-proteins-can-be-made-using-the-entire-earth-mass-to-do-so/). That’s why this is a gedanken experiment: it can’t possibly be performed in toto.

Even so, the experiment is over (and I’m wrong) if even 1% of the proteins you make turn out to have a dominant shape.

However, choosing a random string of amino acids is far from trivial. Some amino acids appear more frequently than others depending on the protein. Proteins are definitely not a random collection of amino acids. Consider collagen. In its various forms (there are over 20, coded for by at least 30 distinct genes) collagen accounts for 25% of body protein. Statistically, each of the 20 amino acids should account for 5% of the protein, yet one amino acid (glycine) accounts for 30% and proline another 15%. Even knowing this, the statistical chances of producing 300 copies in a row of glycine–any amino acid–any amino acid by a random distribution of the glycines are less than zilch. But one type of bovine collagen protein has over 300 such copies in its 1042 amino acids.
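To make the contrast between a uniform and a skewed composition concrete, here is a minimal Python sketch that draws random 41-residue strings either uniformly (5% per amino acid) or with collagen-like weights. The names `random_peptide` and `collagen_like` are hypothetical, and splitting the remaining 55% evenly over the other 18 amino acids is an assumption made purely for illustration; only the 30% glycine and 15% proline figures come from the text.

```python
import random

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_peptide(length=41, weights=None, rng=None):
    """Draw a random sequence; weights=None means every residue is equally likely."""
    rng = rng or random.Random()
    return "".join(rng.choices(AMINO_ACIDS, weights=weights, k=length))


# Collagen-like composition: glycine 30%, proline 15% (from the text).
# ASSUMPTION: the remaining 55% is shared equally by the other 18 residues.
other = 0.55 / 18
collagen_like = [
    0.30 if aa == "G" else 0.15 if aa == "P" else other
    for aa in AMINO_ACIDS
]

rng = random.Random(0)  # fixed seed so the run is repeatable
print(random_peptide(rng=rng))                         # uniform composition
print(random_peptide(weights=collagen_like, rng=rng))  # glycine/proline rich
```

Weighting the draw changes only the composition of the strings. It does nothing to reproduce the ordering seen in real proteins, which is the harder part of the problem.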

One further example of the nonrandomness of proteins. If you were picking out a series of letters randomly hoping to form a word, you would not expect a series of 10 ‘a’s to show up. But we normally contain many such proteins, and for some reason too many copies of the repeated amino acid produce some of the neurological diseases I (ineffectually) battled as a physician. Normal people have 11 to 34 glutamines in a row in a huge (molecular mass 384 kiloDaltons; that’s over 3000 amino acids) protein known as huntingtin. In those unfortunate individuals with Huntington’s chorea, the number of repeats expands to over 40. One of Max Perutz’s last papers [Proc. Natl. Acad. Sci. USA 99, 5591–5595 (2002)] tried to figure out why this was so harmful.

On to the actual experiment. Suppose you had made 1,000,000 distinct random sequence proteins containing 41 amino acids and none of them had a dominant shape. This proves/disproves nothing. 10^6 is fewer than the possibilities inherent in a string of 5 amino acids, and you’ve only explored 10^6/(20^41) of the possibilities.
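The two numbers in that paragraph are easy to check. The sketch below is only arithmetic on the figures given in the text (20 amino acids, length 41, one million sequences tested); the function name `sequence_count` is a hypothetical helper.

```python
N_AMINO_ACIDS = 20


def sequence_count(length: int) -> int:
    """Number of distinct amino acid strings of the given length."""
    return N_AMINO_ACIDS ** length


tested = 10**6

# A string of only 5 amino acids already has more possibilities than 10^6.
print(sequence_count(5))              # 3200000

# Fraction of the 41-residue sequence space covered by a million proteins.
fraction = tested / sequence_count(41)
print(f"{fraction:.1e}")              # 4.5e-48
```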

Would Karl Popper, philosopher of science, even allow the question of how commonly proteins have a dominant shape to be called scientific? Much of what I know about Popper comes from a fascinating book “Wittgenstein’s Poker” and it isn’t pleasant. Questions not resolvable by experiment fall outside Popper’s canon of questions scientific. The gedanken experiment described can resolve the question one way, but not the other. In this respect it’s like the halting problem in computer science (there is no general rule to tell if a program will terminate).

Would Ludwig Wittgenstein, uberphilosopher, think the question philosophical? Probably not. His major work “Tractatus Logico-Philosophicus” concludes with “What we cannot speak of we must pass over in silence”. While he’s the uberphilosopher he’s also the antiscientist. It’s exactly what we don’t know which leads to the juiciest speculation and most creative experiments in any field of science. That’s what I loved about organic chemistry years ago (and now). It is nearly always possible to design a molecule from scratch to test an idea. There was no reason to make [7]paracyclophane, other than to get up close and personal with the ring current.

If the probability or improbability of our existence, to which the gedanken experiment speaks, isn’t a philosophical question, what is?

Back then, this post produced the following excellent comment.

I’m not sure your assessment of what Popper would regard as science is accurate. Popper advocated “falsifiability”, i.e. that a statement cannot be proved true, only false. Non-scientific statements are those for which evidence that they are false cannot be found. You are in fact giving a perfect example of a situation where falsifiability is useful. If you tested, as you suggested, a million random proteins and many of them formed structures reliably, this would in fact disprove the hypothesis fairly conclusively (if only probabilistically). The fact that the test was passed by the first million proteins would be evidence that the theory was true (though obviously not concrete).

Also, it is relatively easy to choose what random proteins to make. Just use a random number generator (a pseudorandom generator would do too, probably). It doesn’t matter that they would be unlikely to produce a specific sequence generated in nature, as we specifically want to look at random sequences. The idea that 300 glycines is particularly unusual if protein generation is random is probably one which should be treated with a degree of caution. As the sequence was not specified as an unusual sequence beforehand, there are a large number of possible sequences that you could have seized on, and so care is needed.

This is only the most obvious experiment that could be carried out to test this idea, and I’m sure with advances, there is the distinct chance that more ingenious ways could be devised.

Additionally, the mass restriction is not in fact terribly useful except as an illustration that there is a massively large number of proteins, as once you have made and tested a protein, you can in fact reuse its atoms to make another protein.

Finally, I haven’t read Wittgenstein, but that final quote does not really support your statement that he is “anti-science” or would be against the production of novel cyclophanes. Organic chemistry clearly lies in the realm of “what we can speak”, as we are in fact speaking about it. 

Posted by: MCliffe

My response —

MCliffe — thank you for your very thoughtful comments on the post. It’s great to know that someone out there is reading them.

Popper and the logical positivists solved many philosophical problems by declaring them meaningless (which Popper later took to mean not falsifiable). Things got to such a point in the 50s that Bertrand Russell was moved to come up with the meaningful (to most) but non-falsifiable statement: In the event of a nuclear war we shall all be dead.

You are quite right that it is easy to make a random sequence of amino acids using a computer. It’s been shown again and again that our intuitive notion of randomness is usually incorrect. I chose collagen because it is the most common protein in our body, and because it is highly nonrandom. Huntingtin was used because I dealt with its effects as a Neurologist (and because there are 8 more diseases with too many identical amino acids in a row — all of which for some unfathomable reason produce neurologic disease — they are called triplet diseases because it takes 3 nucleotides of DNA to code for a single amino acid).

Even accepting 300 glycines in 1000 or so amino acids (collagen) and putting that frequency into the random generator and turning it on, we would not expect those 300 glycines to appear at positions n+1, n+4, n+7, . . . , n+898 randomly.
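A rough number can be put on "would not expect". The sketch below assumes each position is drawn independently with a 30% chance of glycine, which is a simplification and not a model of how collagen arose. It computes the chance that 300 positions spaced three apart are all glycine, then applies a union bound over every place the run could start in a 1042-residue chain.

```python
import math

P_GLY = 0.30          # glycine frequency quoted for collagen
REPEATS = 300         # glycine-X-Y triplets in a row
CHAIN_LENGTH = 1042   # residues in the bovine collagen chain from the post

# ASSUMPTION: positions are independent draws.
# Chance that one fixed set of 300 positions, 3 apart, is all glycine.
log10_p_fixed = REPEATS * math.log10(P_GLY)

# Union bound: the run of 300 triplets can start at any of these positions.
starts = CHAIN_LENGTH - 3 * REPEATS + 1
log10_p_any = math.log10(starts) + log10_p_fixed

print(f"fixed register: about 10^{log10_p_fixed:.0f}")            # 10^-157
print(f"any start (upper bound): about 10^{log10_p_any:.0f}")     # 10^-155
```

MCliffe's caution still applies: this is the probability of one pattern picked out after the fact, so it measures how nonrandom collagen looks, not how surprised we should be that some striking pattern exists somewhere.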

The idea of using the atoms over and over to escape the mass restriction is clever. Unfortunately it runs up against a time restriction. Let us suppose there is a super-industrious post-doc who can make a new protein every nanosecond (reusing the atoms). There are 60 * 60 * 24 * 365 = 31,536,000 ~ 10^7 seconds in a year and 10^10 years (more or less) since the big bang. At 10^9 proteins per second, this is 10^9 * 10^7 * 10^10 = 10^26 different proteins he could have made since the dawn of time. But there are 20^41 = 2^41 * 10^41 proteins of length 41 amino acids, and 2^41 = 2,199,023,255,552 ~ 10^12. So he has tested only 10^26 of roughly 10^53 possible 41 amino acid proteins in all this time.
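For anyone who wants to check the post-doc's productivity without rounding to powers of ten, here is the same estimate in Python. The constants are the ones used in the text (one protein per nanosecond, 10^10 years); the variable names are mine.

```python
import math

SECONDS_PER_YEAR = 60 * 60 * 24 * 365   # 31,536,000
PROTEINS_PER_SECOND = 10**9             # one new protein every nanosecond
YEARS_SINCE_BIG_BANG = 10**10           # order of magnitude used in the text

made = PROTEINS_PER_SECOND * SECONDS_PER_YEAR * YEARS_SINCE_BIG_BANG
possible = 20**41                       # all sequences of length 41

print(f"made:     about 10^{math.log10(made):.1f}")             # 10^26.5
print(f"possible: about 10^{math.log10(possible):.1f}")         # 10^53.3
print(f"fraction: about 10^{math.log10(made / possible):.1f}")  # 10^-26.8
```

Keeping the dropped factors (3.15 seconds-wise, 2.2 in 2^41) shifts each exponent by a fraction, but the conclusion is unchanged: about one part in 10^27 of the sequence space gets tested.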

This is what I was getting at by saying the gedanken experiment was not a priori falsifiable: we lack the time, space and mass to run it to completion. As you note, it could well end quite early if I’m wrong. Suppose 10^9/10^53 of the proteins DO have a dominant shape; the postdoc will be very unlikely to find any of them.

I think your final point is well taken. My reading of “Wittgenstein’s Poker” is that what he was saying in his last sentence really was “What we cannot speak (with certainty) of we must pass over in silence”. We cannot speak of the outcome of this Gedanken experiment with any degree of certainty.

Once again Thanks

Retread (now Luysii)



  • Wavefunction  On August 8, 2010 at 11:17 pm

    I will read your thoughtful post in detail later, but for now I have a question. What do you mean by “shape” in this context? Would you say an alpha-helix constitutes a particular shape? If yes, then there is no doubt that many strings of amino acids have this shape. Or consider the classic cross-beta pattern of amyloid, with longitudinal and transverse x-ray reflections at roughly 10 Å and 5 Å. If you consider this pattern to be a shape, then we know for a fact that virtually every sequence of amino acids can be cajoled to adopt this shape under specific conditions.

    On a different note, going back to my previous comments, Popper is now much less influential than he was. Falsifiability is a generally very useful concept but it’s far from being a litmus test of a scientific hypothesis. Popper also neglects the use of what’s called “prior knowledge” in judging a hypothesis, something that’s of key value in any scientific theory. For example see the work of Quine, described in, among other books, “What Science Knows” by James Franklin. “Wittgenstein’s Poker” and “The Age of Entanglement” are excellent books and I remember being so impressed by the former that I read it in one all-night session.

  • luysii  On August 9, 2010 at 2:28 pm

    It’s a pleasure to be discussing things like this which have some basis in science and fact.

    For an example of why I found neurology so irritating (but nonetheless important), consider the following sentence from the current issue of Neuron (vol. 67 p. 183) “Although system infections, pregnancy and puerperium, use of illicit drugs and MENTAL STRESS often trigger a stroke, precisely how these factors exert their effect remains unclear.” Really? Was there a wave of strokes after 9/11 in NYC? Are there going to be any in Pakistan when it dries out from the floods, or in the survivors of the Chinese mudslide? I had to plow through treacle like this for years.

    Earlier in the same section of the article (amazingly titled “The Science of Stroke”) we find the statement “Many of the damaging effects of oxidative stress on blood vessels are related to the biological inactivation of NO (nitric oxide) by the free radical superoxide which reduces NO bioavailability and prevents its beneficial effects.” Pretty impressive, no?

    However, without skipping a beat, later in the SAME paragraph we find “In addition ROS (Reactive Oxygen Species) can directly promote inflammation . . . by inducing the expression of .. proinflammatory genes through NFkappaB activation. One such NFkappaB dependent gene product, inducible nitric oxide synthase, produces large amounts of NO, which alters vascular structure and function through nitration and nitrosylation of critical proteins.” Well, which is it: nitric oxide as friend or foe? The author has it both ways (and never seems to notice the contradiction).

    Amazing, and Neuron is one of the premier journals of neuroscience, a member of the Cell family of journals.

    Rant over. By ‘shape’ I mean the entire 3 dimensional structure of a protein as seen in crystallography. Consider the catalytic triad of chymotrypsin, made of histidine #57, aspartic acid #102 and serine #195, forming the active site of the enzyme. To function as an enzyme these 3 amino acids must be brought together and held there. That’s what I mean by shape. Moving the three apart also explains just what denaturation is, and why it takes so little energy (10 kiloCalories/mole) to denature enzymes.

  • a. nonymaus  On February 11, 2014 at 11:25 am

    To have a set of random sequences of amino acids give information that could inform biology, it can’t be simplemindedly random. Glycine shows up a lot because it is cheap from a metabolic standpoint. If I, as a microbe, can use glycine rather than isoleucine, I will and I’ll burn those extra carbons for more delicious ATP. So, the random selection should be weighted on the basis of metabolic expense. Otherwise we’re doing very arcane polymer chemistry instead of biology.
