The emperor has no clothes

As an old organic chemist, I’ve always been fascinated with the size of proteins (n functional groups in a protein of length n, not counting the amide bonds), and the myriad shapes they can assume.  It seems nothing short of miraculous (to me at least) that the proteins making us up assume just a few shapes out of the nearly 3^n possible shapes (avoiding self-intersection removes a few).

This has been ‘explained’ by the potential energy funnel, down which newly formed proteins slide to their final few destinations.  Now I took quantum mechanics 56+ years ago, and back then a lot of heavy lifting was required just to calculate the potential energy surface required to bring two hydrogen atoms together to form molecular hydrogen.

I’ve never seen a potential energy surface for a protein actually calculated, and I’m not sure molecular dynamics simulations do this (please correct me if I’m wrong).

So I was glad to see the following in a paper by


Jacob Gershon-Cohen Professor of Medical Science
Professor of Biochemistry and Biophysics

at my alma mater Penn Med (the hell with the Perelmans; Penn sold itself out to the Perelmans very cheaply).

“A critical feature of the funneled ELT (Energy Landscape Theory) model is that the many-pathway residue-level conformational search must be biased toward native-like interactions. Otherwise, as noted by Levinthal, an unguided random search would require a very long time. How this bias might be implemented in terms of real protein interactions has never been discovered. One simply asserts that natural evolution has made it so, formulates this view as a so-called principle of minimal frustration, and attributes it to the shape of the funneled energy landscape.”

 Proc. Natl. Acad. Sci. vol. 114 pp. 8253 – 8258 ’17.

So the potential energy funnel of energy landscape theory is not something you can calculate explicitly (like a gravitational or an electrical potential), but just a high-falutin’ description of what happens inside our cells, masquerading as an explanation.

So when does a description become an explanation?  Newton famously said Hypotheses non fingo (Latin for “I feign no hypotheses”) when discussing the action at a distance which his theory of gravity entailed.

Well it becomes an explanation when you can use the description to predict and define new phenomena — e.g. using Newton’s laws to send a projectile to Jupiter, using Einstein’s theory of gravitation to predict black holes and gravitational waves etc. etc.

In this sense Energy Landscape Theory is just words.  If it weren’t, you could predict the shape an arbitrary string of amino acids would assume (and you can’t).  Theory does work fairly well when folding algorithms are given a protein of known shape (but not published), but try them out on an arbitrary string, which I don’t think has been done.

But it gets worse.  ELT sweeps the problem of why a protein should have one (or a few) shapes under the rug, by assuming that it does.  I’m far from convinced that this is the case in general, which means that the proteins which make us up are quite special.

I’ll conclude with an earlier post on this subject, which basically says that an experiment to decide the issue, while possible in theory, is physically impossible to perform in full.

A chemical Gedanken experiment

This post is mostly something I posted on the Skeptical Chymist 2 years ago.  Along with the previous post “Why should a protein have just one shape (or any shape for that matter)”, it will be referred to in the next one, “Gentlemen start your motors”, concerning the improbability of the chemistry underlying our existence and whether it is reasonable to believe that it arose by chance.

In the early days of quantum mechanics Einstein and Bohr threw thought experiments (gedanken experiments) at each other like teenagers tossing cherry bombs.  None of the gedanken experiments were regarded as remotely possible back then, although thanks to Bell and Aspect, quantum nonlocality and entanglement now have a solid experimental basis.  To read more about this you can’t do much better than “The Age of Entanglement” by Louisa Gilder.

Frankly, I doubt that most strings of amino acids have a dominant shape (i.e., biological meaning), and even if they did, they couldn’t find it quickly enough (the Levinthal paradox).  For details see the previous post.

How would you prove me wrong? The same way you’d prove a pair of dice was loaded. Just make (using solid-phase protein synthesis a la Merrifield) a bunch of random strings of amino acids (each 41 amino acids long) and see how many have a dominant shape. Any sequence forming a crystal does have a dominant shape; if the sequence doesn’t crystallize, use NMR to look at it in solution. You can’t make all of them, because the earth doesn’t have enough mass to do so. That’s why this is a gedanken experiment: it can’t possibly be performed in toto.

Even so, the experiment is over (and I’m wrong) if even 1% of the proteins you make turn out to have a dominant shape.
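To put a rough number on “even 1%”, here is a minimal sketch. It assumes each randomly chosen sequence independently has the same chance of having a dominant shape; the function name and the 95% and 99% targets are illustrative choices, not part of the original post.

```python
import math

def samples_needed(fraction_folding: float, confidence: float) -> int:
    """Smallest n with P(at least one folder among n random sequences) >= confidence.

    Assumption: draws are independent and each sequence has a dominant
    shape with probability `fraction_folding`.
    """
    # P(no folder in n draws) = (1 - f)^n; solve (1 - f)^n <= 1 - confidence for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fraction_folding))

for confidence in (0.95, 0.99):
    n = samples_needed(0.01, confidence)
    print(f"{confidence:.0%} chance of at least one hit needs {n} sequences")
```

Under these assumptions a few hundred random sequences would probably be enough to end the experiment early if the 1% figure held.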

However, choosing a random string of amino acids is far from trivial. Some amino acids appear more frequently than others depending on the protein. Proteins are definitely not a random collection of amino acids. Consider collagen. In its various forms (there are over 20, coded for by at least 30 distinct genes) collagen accounts for 25% of body protein. Statistically, each of the 20 amino acids should account for 5% of the protein, yet one amino acid (glycine) accounts for 30% and proline another 15%. Even knowing this, the statistical chances of producing 300 copies in a row of glycine–any amino acid–any amino acid by a random distribution of the glycines are less than zilch. But one type of bovine collagen protein has over 300 such copies in its 1042 amino acids.
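“Less than zilch” can be given an order of magnitude. The sketch below assumes every position is drawn independently and asks only for the chance that 300 pre-chosen positions (every third residue) are all glycine. It ignores the choice of starting position and the question of picking the pattern after the fact, and the function name is hypothetical.

```python
import math

def log10_chance_all_glycine(p_glycine: float, n_positions: int) -> float:
    """log10 of the chance that n pre-chosen positions are all glycine.

    Assumption: each position is an independent draw with the same
    glycine probability. Logs are used because the raw probability is
    far below what is convenient to read as a float.
    """
    return n_positions * math.log10(p_glycine)

# 5% is the uniform 1-in-20 frequency; 30% is the glycine share quoted for collagen.
log10_uniform = log10_chance_all_glycine(0.05, 300)
log10_collagen_mix = log10_chance_all_glycine(0.30, 300)

print(f"uniform frequencies:     about 10^{log10_uniform:.0f}")
print(f"collagen's 30% glycine:  about 10^{log10_collagen_mix:.0f}")
```

Even with glycine weighted at 30%, the chance comes out near 10^-157 under this simple model.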

One further example of the nonrandomness of proteins. If you were picking out a series of letters randomly hoping to form a word, you would not expect a series of 10 ‘a’s to show up. But we normally contain many such proteins, and for some reason too many copies of the repeated amino acid produce some of the neurological diseases I (ineffectually) battled as a physician. Normal people have 11 to 34 glutamines in a row in a huge protein (molecular mass 384 kiloDaltons, which is over 3000 amino acids) known as huntingtin. In those unfortunate individuals with Huntington’s chorea, the number of repeats expands to over 40. One of Max Perutz’s last papers [Proc. Natl. Acad. Sci. USA 99, 5591–5595 (2002)] tried to figure out why this was so harmful.

On to the actual experiment. Suppose you had made 1,000,000 distinct random sequence proteins containing 41 amino acids and none of them had a dominant shape. This proves/disproves nothing. 10^6 is fewer than the possibilities inherent in a string of 5 amino acids, and you’ve only explored 10^6/(20^41) of the possibilities.
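The two counts in that paragraph are easy to check with exact integer arithmetic. This is only a sketch of the arithmetic; the variable names are mine.

```python
# Number of amino acids to choose from at each position.
ALPHABET = 20

sequences_tested = 10**6
length_5_space = ALPHABET**5     # all possible strings of 5 amino acids
length_41_space = ALPHABET**41   # all possible strings of 41 amino acids

# Python integers are arbitrary precision, so both counts are exact.
print(f"20^5  = {length_5_space:,}")
print(f"20^41 = {length_41_space:.3e}")

# Fraction of the 41-residue sequence space covered by a million sequences.
fraction_explored = sequences_tested / length_41_space
print(f"fraction explored = {fraction_explored:.2e}")
```

A million sequences is indeed fewer than the 3,200,000 possible 5-residue strings, and covers only a few parts in 10^48 of the 41-residue space.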

Would Karl Popper, philosopher of science, even allow the question of how commonly proteins have a dominant shape to be called scientific? Much of what I know about Popper comes from a fascinating book “Wittgenstein’s Poker” and it isn’t pleasant. Questions not resolvable by experiment fall outside Popper’s canon of questions scientific. The gedanken experiment described can resolve the question one way, but not the other. In this respect it’s like the halting problem in computer science (there is no general rule to tell if a program will terminate).

Would Ludwig Wittgenstein, uberphilosopher, think the question philosophical? Probably not. His major work “Tractatus Logico-Philosophicus” concludes with “What we cannot speak of we must pass over in silence”. While he’s the uberphilosopher he’s also the antiscientist. It’s exactly what we don’t know which leads to the juiciest speculation and most creative experiments in any field of science. That’s what I loved about organic chemistry years ago (and now). It is nearly always possible to design a molecule from scratch to test an idea. There was no reason to make [7]paracyclophane, other than to get up close and personal with the ring current.

If the probability or improbability of our existence, to which the gedanken experiment speaks, isn’t a philosophical question, what is?

Back then, this post produced the following excellent comment.

I’m not sure your assessment of what Popper would regard as science is accurate. Popper advocated “falsifiability”, i.e. that a statement cannot be proved true, only false. Non-scientific statements are those for which evidence that they are false cannot be found. You are in fact giving a perfect example of a situation where falsifiability is useful. If you tested, as you suggested, a million random proteins and many of them formed structures reliably, this would in fact disprove the hypothesis fairly conclusively (if only probabilistically). The fact that the test was passed by the first million proteins would be evidence that the theory was true (though obviously not concrete).

Also, it is relatively easy to choose what random proteins to make. Just use a random number generator (a pseudorandom generator would do too, probably). It doesn’t matter that they would be unlikely to produce a specific sequence generated in nature, as we specifically want to look at random sequences. The idea that 300 glycines is particularly unusual if protein generation is random is probably one which should be treated with a degree of caution. As the sequence was not specified as an unusual sequence beforehand, there are a large number of possible sequences that you could have seized on, and so care is needed.

This is only the most obvious experiment that could be carried out to test this idea, and I’m sure with advances, there is the distinct chance that more ingenious ways could be devised.

Additionally, the mass restriction is not in fact terribly useful except as an illustration that there is a massively large number of proteins, as once you have made and tested a protein, you can in fact reuse its atoms to make another protein.

Finally, I haven’t read Wittgenstein, but that final quote does not really support your statement that he is “anti-science” or would be against the production of novel cyclophanes. Organic chemistry clearly lies in the realm of “what we can speak”, as we are in fact speaking about it.

Posted by: MCliffe

My response —

MCliffe — thank you for your very thoughtful comments on the post. It’s great to know that someone out there is reading them.

Popper and the logical positivists solved many philosophical problems by declaring them meaningless (which Popper later took to mean not falsifiable). Things got to such a point in the 50s that Bertrand Russell was moved to come up with the meaningful (to most) but non-falsifiable statement: In the event of a nuclear war we shall all be dead.

You are quite right that it is easy to make a random sequence of amino acids using a computer. It’s been shown again and again that our intuitive notion of randomness is usually incorrect. I chose collagen because it is the most common protein in our body, and because it is highly nonrandom. Huntingtin was used because I dealt with its effects as a neurologist (and because there are 8 more diseases with too many identical amino acids in a row, all of which for some unfathomable reason produce neurologic disease; they are called triplet diseases because it takes 3 nucleotides of DNA to code for a single amino acid).

Even accepting 300 glycines in 1000 or so amino acids (collagen) and putting that frequency into the random generator and turning it on, we would not expect those 300 glycines to appear at positions n+1, n+4, n+7, . . . , n+898 randomly.

The idea of using the atoms over and over to escape the mass restriction is clever. Unfortunately it runs up against a time restriction. Let us suppose there is a super-industrious post-doc who can make a new protein every nanosecond (reusing the atoms). There are 60 * 60 * 24 * 365 = 31,536,000 ~ 10^7 seconds in a year and 10^10 years (more or less) since the big bang. This is 10^9 * 10^7 * 10^10 = 10^26 different proteins he could have made since the dawn of time. But there are 20^41 = 2^41 * 10^41 proteins of length 41 amino acids, and 2^41 = 2,199,023,255,552 ~ 10^12. So he has only tested 10^26 of 10^53 possible 41 amino acid proteins in all this time.
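The post-doc arithmetic can be redone without rounding, as a check on the orders of magnitude. The sketch below uses the same inputs as the paragraph above (one protein per nanosecond, a 365-day year, 10^10 years); the variable names are illustrative.

```python
# Inputs taken from the paragraph above.
proteins_per_second = 10**9               # one new protein every nanosecond
seconds_per_year = 60 * 60 * 24 * 365     # 365-day year
years_since_big_bang = 10**10             # "more or less"

proteins_made = proteins_per_second * seconds_per_year * years_since_big_bang
sequence_space = 20**41                   # all 41-residue sequences

print(f"seconds per year  = {seconds_per_year:,}")
print(f"proteins made     = {proteins_made:.2e}")
print(f"sequence space    = {sequence_space:.2e}")
print(f"fraction tested   = {proteins_made / sequence_space:.2e}")
```

Keeping the factors the post rounds away (3.15 in the seconds per year, 2.2 in 2^41) changes neither exponent: roughly 10^26 proteins made out of roughly 10^53 possible.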

This is what I was getting at by saying the gedanken experiment was not a priori falsifiable: we lack the time, space and mass to run it to completion. As you note, it could well end quite early if I’m wrong. But suppose only 10^9 of the 10^53 proteins DO have a dominant shape; the postdoc will be very unlikely to find any of them.
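How unlikely? A rough sketch, assuming the post-doc’s 10^26 sequences are a random sample and treating the number of hits as Poisson (a standard approximation when the sample is a tiny fraction of the space). The function name is hypothetical.

```python
import math

def chance_of_any_hit(n_tested: int, n_folders: int, n_total: int) -> float:
    """Approximate P(at least one folder among the sequences tested).

    Assumption: the tested sequences are a random sample, so the expected
    number of hits is n_tested * n_folders / n_total and the hit count is
    roughly Poisson with that mean.
    """
    expected_hits = n_tested * n_folders / n_total
    # 1 - exp(-x), computed with expm1 so that very small x is not rounded to 0.
    return -math.expm1(-expected_hits)

p_hit = chance_of_any_hit(n_tested=10**26, n_folders=10**9, n_total=10**53)
print(f"chance of finding even one folder = {p_hit:.1e}")
```

With these numbers the expected number of hits is about 10^-18, so the post-doc almost certainly finds nothing even though a billion folders exist.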

I think your final point is well taken. My reading of “Wittgenstein’s Poker” is that what he was saying in his last sentence really was “What we cannot speak (with certainty) of we must pass over in silence”. We cannot speak of the outcome of this Gedanken experiment with any degree of certainty.

Once again, thanks



  • Bryan  On August 9, 2017 at 12:11 pm

    You may be interested in reading this recent paper from David Baker’s lab in which they made an attempt at the thought experiment you proposed:

  • luysii  On August 9, 2017 at 7:33 pm

    Bryan: Thanks

    I read the paper back in the day. Here’s the abstract: Proteins fold into unique native structures stabilized by thousands of weak interactions that collectively overcome the entropic cost of folding. Although these forces are “encoded” in the thousands of known protein structures, “decoding” them is challenging because of the complexity of natural proteins that have evolved for function, not stability. We combined computational protein design, next-generation gene synthesis, and a high-throughput protease susceptibility assay to measure folding and stability for more than 15,000 de novo designed miniproteins, 1000 natural proteins, 10,000 point mutants, and 30,000 negative control sequences. This analysis identified more than 2500 stable designed proteins in four basic folds—a number sufficient to enable us to systematically examine how sequence determines folding and stability in uncharted protein space. Iteration between design and experiment increased the design success rate from 6% to 47%, produced stable proteins unlike those found in nature for topologies where design was initially unsuccessful, and revealed subtle contributions to stability as designs became increasingly optimized. Our approach achieves the long-standing goal of a tight feedback cycle between computation and experiment and has the potential to transform computational protein design into a data-driven science.

    However, what they did is use the data on proteins which DO fold into stable structures to design more. This is obviously important for drug design.

    Although it’s close to what I had in mind, they really aren’t starting with random strings of amino acids and seeing what they would do. My guess is that they wouldn’t find stable structures. But it’s only a guess.
