The emperor has no clothes

As an old organic chemist, I’ve always been fascinated with the size of proteins (n functional groups in a protein of length n — not counting the amide bonds), and the myriad shapes they can assume.  It seems nothing short of miraculous (to me at least) that the proteins making us up assume just a few shapes out of the nearly 3^n possible backbone conformations (avoiding self intersection removes a few).

This has been ‘explained’ by the potential energy funnel, down which newly formed proteins slide to their final few destinations.  Now I took quantum mechanics 56+ years ago, and back then a lot of heavy lifting was required just to calculate the potential energy surface for bringing two hydrogen atoms together to form molecular hydrogen.

I’ve never seen a potential energy surface for a protein actually calculated, and I’m not sure molecular dynamics simulations do this (please correct me if I’m wrong).

So I was glad to see the following in a paper by

S. WALTER ENGLANDER, Ph.D.

Jacob Gershon-Cohen Professor of Medical Science
Professor of Biochemistry and Biophysics

at my alma mater Penn Med (the hell with the Perelmans; Penn sold itself out to the Perelmans very cheaply).

“A critical feature of the funneled ELT (Energy Landscape Theory) model is that the many-pathway residue-level conformational search must be biased toward native-like interactions. Otherwise, as noted by Levinthal, an unguided random search would require a very long time. How this bias might be implemented in terms of real protein interactions has never been discovered. One simply asserts that natural evolution has made it so, formulates this view as a so-called principle of minimal frustration, and attributes it to the shape of the funneled energy landscape.”

 Proc. Natl. Acad. Sci. vol. 114 pp. 8253 – 8258 ’17.

So the potential energy funnel of energy landscape theory is not something you can calculate explicitly (like a gravitational or an electrical potential), but just a high-falutin’ description of what happens inside our cells, masquerading as an explanation.

So when does a description become an explanation?  Newton famously said Hypotheses non fingo (Latin for “I feign no hypotheses”) when discussing the action at a distance which his theory of gravity entailed.

Well it becomes an explanation when you can use the description to predict and define new phenomena — e.g. using Newton’s laws to send a projectile to Jupiter, using Einstein’s theory of gravitation to predict black holes and gravitational waves etc. etc.

In this sense Energy Landscape Theory is just words.  If it weren’t, you could predict the shape an arbitrary string of amino acids would assume (and you can’t).  Folding algorithms do work fairly well when given a protein whose shape is known (but not yet published), but try them out on an arbitrary string — which I don’t think has been done.

But it gets worse.  ELT sweeps the problem of why a protein should have one (or a few) shapes under the rug, by assuming that they do.  I’m far from convinced that this is the case in general, which means that the proteins which make us up are quite special.

I’ll conclude with an earlier post on this subject, which basically says that an experiment to decide the issue, while possible in theory, is physically impossible to perform in full.

A chemical Gedanken experiment

This post is mostly something I posted on the Skeptical Chymist 2 years ago.  Along with the previous post “Why should a protein have just one shape (or any shape for that matter)”, it will be referred to in the next one, “Gentlemen start your motors”, concerning the improbability of the chemistry underlying our existence and whether it is reasonable to believe that it arose by chance.

In the early days of quantum mechanics Einstein and Bohr threw thought experiments (gedanken experiments) at each other like teenagers tossing cherry bombs.  None of the gedanken experiments were regarded as remotely possible back then, although thanks to Bell and Aspect, quantum nonlocality and entanglement now have a solid experimental basis.  To read more about this you can’t do much better than “The Age of Entanglement” by Louisa Gilder.

Frankly, I doubt that most strings of amino acids have a dominant shape (and hence biological meaning), and even if they did, they couldn’t find it quickly enough (the Levinthal paradox).  For details see the previous post.

How would you prove me wrong? The same way you’d prove a pair of dice was loaded. Just make (using solid-phase protein synthesis a la Merrifield) a bunch of random strings of amino acids (each 41 amino acids long) and see how many have a dominant shape. Any sequence forming a crystal does have a dominant shape; if the sequence doesn’t crystallize, use NMR to look at it in solution. You can’t make all of them, because the earth doesn’t have enough mass to do so (see https://luysii.wordpress.com/2009/12/20/how-many-proteins-can-be-made-using-the-entire-earth-mass-to-do-so/). That’s why this is a gedanken experiment — it can’t possibly be performed in toto.
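
For the mass claim, here is a rough back-of-the-envelope sketch in Python. The 110 Dalton average residue mass and the Earth mass are my own round numbers, not figures from the linked post:

```python
# Rough back-of-the-envelope: could the Earth supply enough mass to make
# one copy of every possible 41-residue protein?  Round numbers, illustration only.
N_RESIDUES = 41
N_AMINO_ACIDS = 20
AVG_RESIDUE_MASS_DA = 110      # assumed average residue mass, in Daltons
DALTON_KG = 1.66e-27           # one Dalton in kilograms
EARTH_MASS_KG = 5.97e24

n_sequences = N_AMINO_ACIDS ** N_RESIDUES                 # 20^41, about 2.2e53
mass_per_chain_kg = N_RESIDUES * AVG_RESIDUE_MASS_DA * DALTON_KG
total_mass_kg = n_sequences * mass_per_chain_kg

print(f"possible 41-mers: {n_sequences:.1e}")
print(f"mass for one copy of each: {total_mass_kg:.1e} kg")
print(f"Earth masses needed: {total_mass_kg / EARTH_MASS_KG:.1e}")   # ~3e5 Earths
```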

Even so, the experiment is over (and I’m wrong) if even 1% of the proteins you make turn out to have a dominant shape.

However, choosing a random string of amino acids is far from trivial. Some amino acids appear more frequently than others depending on the protein. Proteins are definitely not a random collection of amino acids. Consider collagen. In its various forms (there are over 20, coded for by at least 30 distinct genes) collagen accounts for 25% of body protein. If amino acid usage were uniform, each of the 20 amino acids would account for 5% of the protein, yet one amino acid (glycine) accounts for 30% and proline for another 15%. Even knowing this, the statistical chances of producing 300 copies in a row of glycine–any amino acid–any amino acid by a random distribution of the glycines are less than zilch. But one type of bovine collagen protein has over 300 such copies in its 1042 amino acids.
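
To put a number on ‘less than zilch’, here is a minimal sketch, assuming (my assumption, purely for illustration) that glycine occurs independently at each position with 30% probability:

```python
from math import log10

P_GLY = 0.30       # assumed per-position glycine frequency
N_GLYCINES = 300   # the glycine of each Gly-X-X repeat: 300 specified positions

# Probability that all 300 specified positions (every third residue) are glycine,
# if each position is drawn independently with probability 0.30.
log_p = N_GLYCINES * log10(P_GLY)
print(f"probability ~ 10^{log_p:.0f}")   # about 10^-157, effectively zero
```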

One further example of the nonrandomness of proteins. If you were picking out a series of letters randomly hoping to form a word, you would not expect a series of 10 ‘a’s to show up. But we normally contain many such proteins, and for some reason too many copies of the repeated amino acid produce some of the neurological diseases I (ineffectually) battled as a physician. Normal people have 11 to 34 glutamines in a row in a huge (molecular mass 384 kiloDaltons — that’s over 3000 amino acids) protein known as huntingtin. In those unfortunate individuals with Huntington’s chorea, the number of repeats expands to over 40. One of Max Perutz’s last papers [Proc. Natl. Acad. Sci. USA 99, 5591–5595 (2002)] tried to figure out why this was so harmful.

On to the actual experiment. Suppose you had made 1,000,000 distinct random sequence proteins containing 41 amino acids and none of them had a dominant shape. This proves/disproves nothing. 10^6 is fewer than the possibilities inherent in a string of 5 amino acids, and you’ve only explored 10^6/(20^41) of the possibilities.

Would Karl Popper, philosopher of science, even allow the question of how commonly proteins have a dominant shape to be called scientific? Much of what I know about Popper comes from a fascinating book “Wittgenstein’s Poker” and it isn’t pleasant. Questions not resolvable by experiment fall outside Popper’s canon of questions scientific. The gedanken experiment described can resolve the question one way, but not the other. In this respect it’s like the halting problem in computer science (there is no general rule to tell if a program will terminate).

Would Ludwig Wittgenstein, uberphilosopher, think the question philosophical? Probably not. His major work “Tractatus Logico-Philosophicus” concludes with “What we cannot speak of we must pass over in silence”. While he’s the uberphilosopher he’s also the antiscientist. It’s exactly what we don’t know which leads to the juiciest speculation and most creative experiments in any field of science. That’s what I loved about organic chemistry years ago (and now). It is nearly always possible to design a molecule from scratch to test an idea. There was no reason to make [7]paracyclophane, other than to get up close and personal with the ring current.

If the probability or improbability of our existence, to which the gedanken experiment speaks, isn’t a philosophical question, what is?

Back then, this post produced the following excellent comment.

I’m not sure your assessment of what Popper would regard as science is accurate. Popper advocated “falsifiability”, i.e. that a statement cannot be proved true, only false. Non-scientific statements are those for which evidence that they are false cannot be found. You are in fact giving a perfect example of a situation where falsifiability is useful. If you tested, as you suggested, a million random proteins and many of them formed structures reliably, this would in fact disprove the hypothesis fairly conclusively (if only probabilistically). The fact that the test was passed by the first million proteins would be evidence that the theory was true (though obviously not concrete).

Also, it is relatively easy to choose what random proteins to make. Just use a random number generator (a pseudorandom generator would do too, probably). It doesn’t matter that they would be unlikely to produce a specific sequence generated in nature, as we specifically want to look at random sequences. The idea that 300 glycines is particularly unusual if protein generation is random is probably one which should be treated with a degree of caution. As the sequence was not specified as an unusual sequence beforehand, there are a large number of possible sequences that you could have seized on, and so care is needed.

This is only the most obvious experiment that could be carried out to test this idea, and I’m sure with advances, there is the distinct chance that more ingenious ways could be devised.

Additionally the mass restriction is not in fact terribly useful except as an illustration that there is a massively large number of proteins, as once you have made and tested a protein, you can in fact reuse its atoms to make another protein.

Finally, I haven’t read Wittgenstein, but that final quote does not really support your statement that he is “anti-science” or would be against the production of novel cyclophanes. Organic chemistry clearly lies in the realm of “what we can speak”, as we are in fact speaking about it.

Posted by: MCliffe

My response —

MCliffe — thank you for your very thoughtful comments on the post. It’s great to know that someone out there is reading them.

Popper and the logical positivists solved many philosophical problems by declaring them meaningless (which Popper later took to mean not falsifiable). Things got to such a point in the 50s that Bertrand Russell was moved to come up with the meaningful (to most) but non-falsifiable statement — In the event of a nuclear war we shall all be dead.

You are quite right that it is easy to make a random sequence of amino acids using a computer. It’s been shown again and again that our intuitive notion of randomness is usually incorrect. I chose collagen because it is the most common protein in our body, and because it is highly nonrandom. Huntingtin was used because I dealt with its effects as a Neurologist (and because there are 8 more diseases with too many identical amino acids in a row — all of which for some unfathomable reason produce neurologic disease — they are called triplet diseases because it takes 3 nucleotides of DNA to code for a single amino acid).

Even accepting 300 glycines in 1000 or so amino acids (collagen) and putting that frequency into the random generator and turning it on, we would not expect those 300 glycines to appear at positions n, n+3, n+6, . . . , n+897 randomly.

The idea of using the atoms over and over to escape the mass restriction is clever. Unfortunately it runs up against a time restriction. Let us suppose there is a super-industrious post-doc who can make a new protein every nanosecond (reusing the atoms). There are 60 * 60 * 24 * 365 = 31,536,000 ~ 10^7 seconds in a year and 10^10 years (more or less) since the big bang. This is 10^9 * 10^7 * 10^10 = 10^26 different proteins he could make since the dawn of time. But there are 20^41 = 2^41 * 10^41 proteins of length 41 amino acids, and 2^41 = 2,199,023,255,552 ~ 2 * 10^12. So he has only tested 10^26 of the roughly 10^53 possible 41 amino acid proteins in all this time.
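
The same order-of-magnitude argument in a few lines of Python (the rates and ages are just the round numbers used above):

```python
from math import log10

PROTEINS_PER_SECOND = 1e9                 # one protein per nanosecond (heroic post-doc)
SECONDS_PER_YEAR = 60 * 60 * 24 * 365     # about 3.15e7
YEARS_SINCE_BIG_BANG = 1e10

proteins_made = PROTEINS_PER_SECOND * SECONDS_PER_YEAR * YEARS_SINCE_BIG_BANG
possible_41mers = 20 ** 41

print(f"proteins made since the big bang: {proteins_made:.1e}")      # ~3e26
print(f"possible 41-residue proteins:     {possible_41mers:.1e}")    # ~2.2e53
print(f"fraction explored: 10^{log10(proteins_made / possible_41mers):.0f}")
```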

This is what I was getting at by saying the gedanken experiment was not a priori falsifiable — we lack the time, space and mass to run it to completion. As you note, it could well end quite early if I’m wrong. Suppose only 10^9 of the 10^53 possible proteins DO have a dominant shape — the postdoc would be very unlikely to find any of them.

I think your final point is well taken. My reading of “Wittgenstein’s Poker” is that what he was saying in his last sentence really was “What we cannot speak (with certainty) of we must pass over in silence”. We cannot speak of the outcome of this Gedanken experiment with any degree of certainty.

Once again, thanks.

What our DNA looks like inside a living cell

Time to rewrite the textbooks.  DNA in the living cell looks nothing like the pictures that have appeared in textbooks for years. Gone are the 30 nanoMeter fiber and higher order structures.

Here is the old consensus of how DNA in the nucleus is organized.

There are two different structural models of the 30 nanoMeter fiber: (1) the solenoid — diameter 33 nanoMeters with 6 nucleosomes every 11 nanoMeters along the axis; (2) the two-start zigzag fiber — diameter 27 – 30 nanoMeters with 5 – 6 nucleosomes every 11 nanoMeters.
The 30 nanoMeter fiber is thought to assemble into helically folded 120 nanoMeter chromonema, 300 – 700 nanoMeter chromatids and mitotic chromosomes (1,400 nanoMeters).  The chromonema structures (measured between 100 and 130 nanoMeters) are based on electron micrographic studies of permeabilized nuclei from which other components have been extracted with detergents and high salt to visualize chromatin — hardly physiologic.

Got all that?  Good, now forget it.  It’s wrong.

First off, forget nanoMeters.  Organic chemists think in Angstroms — the diameter of the smallest atom Hydrogen is almost exactly 1 Angstrom, making it the perfect organic chemical yardstick.  If you must think in nanoMeters, just divide the number of Angstroms by 10.

First, a few numbers to get started.
 The classic form of DNA is B-DNA (this is still correct): https://en.wikipedia.org/wiki/Nucleic_acid_double_helix.  Each nucleotide pair is 3.4 Angstroms above the next and there are 10.4 base pairs per turn of the helix (so 1 full turn of B-DNA is 3.4 * 10.4 = 35.36 Angstroms).  The diameter of B-DNA is 19 Angstroms.

The nucleosome consists of 147 base pairs of DNA wrapped around a central mass made of 8 histone proteins. The histone octamer is made of two copies each of histones H2A, H2B, H3 and H4.  The core particle in its entirety is 100 Angstroms in diameter and 57 Angstroms along the axis of the disk and possesses nearly dyadic symmetry.  There are 1.65 turns of DNA around the histone octamer, and during the trip there are 14 contact points between histones and DNA.
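
Just to see how those numbers hang together, here is a small sketch using only the figures quoted above:

```python
RISE_PER_BP = 3.4          # Angstroms per base pair in B-DNA
BP_PER_TURN = 10.4
BP_PER_NUCLEOSOME = 147

helix_pitch = RISE_PER_BP * BP_PER_TURN                 # ~35.4 Angstroms per full turn
dna_per_nucleosome = BP_PER_NUCLEOSOME * RISE_PER_BP    # ~500 Angstroms of contour length

print(f"B-DNA pitch: {helix_pitch:.2f} Angstroms")
print(f"DNA wrapped per nucleosome: {dna_per_nucleosome:.0f} Angstroms")
# About 500 Angstroms of DNA wound 1.65 times around a disk only ~100 Angstroms across:
# the first level of compaction.
```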

Now on to the actual paper [ Science vol. 357 pp. 354 – 355, 370, eaag0025 1 –> 13 ‘ 17 ]  The movies contained within alone are worth a year’s subscription to Science.

To visualize DNA in living cells the authors invented a technique called Chromatin Electron Microscopy Tomography (ChromEMT).

 DNA is transparent to electrons.  They use a fluorescent DNA binding dye, DRAQ5 (Deep Red fluorescing AnthraQuinone Nr. 5). For a structure see — http://onlinelibrary.wiley.com/doi/10.1002/1097-0320(20000801)40:4%3C280::AID-CYTO4%3E3.0.CO;2-7/full.  It has 3 (probably aromatic) rings fused together like anthracene, so it could easily intercalate between the bases of the double helix.  Then there are OH groups and amines to bind to the backbone.  The dye gets into cells easily.  Most importantly, DRAQ5 produces reactive oxygen species when hit by the right kind of light.  Somehow they get diaminobenzidine into the cells, which the reactive oxygen species polymerizes to polybenzimidazole.

 We’re not done yet.  The polymer is also transparent to electrons, but it can react with good old osmium tetroxide (which is electron dense), permitting visualization of DNA by electron microscopy (at last).

  The technique is the first that can be used in living cells.  It shows that chromatin in the nucleus is mostly organized as a disordered polymer 50 to 240 Angstroms in diameter.  This is consistent with beads on a string (with nucleosomes being the beads).  They found little evidence for higher order structures (the 300 to 1,200 Angstrom fibers of classic textbook models — which are in fact based on in vitro visualization of non-native chromatin).  The 30 nanoMeter chromatin fiber (300 Angstroms) is nowhere to be seen.  However, they do find 300 Angstrom fibers using their new method, but only in nuclei purified from hypotonically lysed chicken RBCs treated with MgCl2 (hardly physiologic).

       They were able to make a movie of an electron micrograph of the nucleus using eight tilts of the stage. There is more DNA at the nuclear rim (as that’s where most of the heterochromatin is), but you still see the little 50 – 240 Angstrom (5 – 24 nanoMeter) circles (just more of them the closer you get to the nuclear membrane).
      Another movie of a mitotic chromosome shows the same little circles (50 – 240 Angstroms) just packed together more closely.  You just see a lot of them, but there is no obvious bunching of them into higher structures.
     The technique (ChromEMT) is amazing in that it allows the ultrastructure of individual chromatin chains, megabase domains and mitotic chromosomes to be resolved and visualized as a continuum in serial slices.  They found that the 5 – 12 and 12 – 24 nanoMeter chromatin diameters were the same regardless of how heavily the chromatin was compacted.
      The paper is incredible and worth a year’s subscription to Science.  It likely is behind a paywall.
It’s hard to get your mind around the amount of compaction involved in getting the meter of DNA of the human genome into a nucleus.  Molecular Biology of the Cell 4th Edition p. 198 put it this way —  Compacting the meter of DNA into a 6 micron nucleus is like compacting 24 miles of very fine thread into a tennis ball.
I actually wrote a series of posts, trying to put the amount of compaction into human scale.  Here is the first post — follow the links at the end to the others.

The cell nucleus and its DNA on a human scale – I

The nucleus is a very crowded place, filled with DNA, proteins packing up DNA, proteins patching up DNA, proteins opening up DNA to transcribe it etc. Statements like this produce no physical intuition of the sizes of the various players (to me at least).  How do you go from the 1 Angstrom hydrogen atom, the 3.4 Angstrom thickness per nucleotide (base) of DNA, the roughly 20 Angstrom diameter of the DNA double helix, to any intuition of what it’s like inside a spherical nucleus with a diameter of 10 microns?

How many bases are in the human genome?  It depends on who you read — but 3 billion (3 * 10^9) is a lowball estimate — Wikipedia has 3.08, some sources have 3.4 billion.  3 billion is a nice round number.  How physically long is the genome?  Put the DNA into the form seen in most textbooks — e.g. the double helix.  Well, an Angstrom is one ten billionth (10^-10) of a meter, and multiplying it out we get

3 * 10^9 (bases/genome) * 3.4 * 10^-10 (meters/base) = 1 (meter).

The diameter of a typical nucleus is 10 microns (10 millionths of a meter == 10 * 10^-6 = 10^-5 meter).  So we’ve got to fit the textbook picture of our genome into something 100,000 times smaller. We’ll definitely have to bend it like Beckham.

As a chemist I think in Angstroms, as a biologist in microns and millimeters, but as an American I think in feet and inches.  To make this stuff comprehensible, think of driving from New York City to Seattle.  It’s 2840 miles or 14,995,200 feet (according to one source on the internet). Now we’re getting somewhere.  I know what a foot is, and I’ve driven most of those miles at one time or other.  Call it 15 million feet, and pack this length down by a factor of 100,000.  It’s 150 feet, half the size of a (US) football field.

Next, consider how thick DNA is relative to its length.  20 Angstroms is 20 * 10^-10 meters or 2 nanoMeters (2 * 10^-9 meters), so our DNA is 500 million times longer than it is thick.  What is 1/500,000,000 of 15,000,000 feet?  Well, it’s 3% of a foot which is .36 of an inch, very close to 3/8 of an inch.   At least in my refrigerator that’s a pair of cooked linguini twisted around each other (the double helix in edible form).  The twisting is pretty tight, a complete turn of the two strands every 35.36 angstroms, or about 1 complete turn every 1.8 thicknesses, more reminiscent of fusilli than linguini, but fusilli is too thick.  Well, no analogy is perfect.  If it were, it would be a description.   One more thing before moving on.

How thinly should the linguini be sliced to split it apart into the constituent bases?  There are roughly 6 bases/thickness, and since the thickness is 3/8 of an inch, about 1/16 of an inch.  So relative to driving from NYC to Seattle, just throw a base out the window every 1/16th of an inch, and you’ll be up to 3 billion before you know it.
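
Here is the whole scaling exercise in a few lines of Python, using the same round numbers as above (a sketch of the arithmetic, nothing more):

```python
# Scale the genome (as B-DNA) down by the same factor that turns
# a NYC-to-Seattle drive into something you can picture.
GENOME_BP = 3e9
RISE_PER_BP_M = 3.4e-10          # meters per base pair
DNA_DIAMETER_M = 2e-9            # ~20 Angstroms
NUCLEUS_DIAMETER_M = 1e-5        # 10 microns
TRIP_FEET = 2840 * 5280          # NYC to Seattle, ~15 million feet

genome_length_m = GENOME_BP * RISE_PER_BP_M          # ~1 meter
compaction = genome_length_m / NUCLEUS_DIAMETER_M    # ~100,000-fold

scaled_length_ft = TRIP_FEET / compaction             # ~150 feet
scaled_thickness_in = TRIP_FEET * 12 * (DNA_DIAMETER_M / genome_length_m)  # ~0.36 inch
slice_per_bp_in = scaled_thickness_in / (DNA_DIAMETER_M / RISE_PER_BP_M)   # ~1/16 inch

print(f"genome length: {genome_length_m:.2f} m, compaction factor: {compaction:,.0f}")
print(f"scaled genome: {scaled_length_ft:.0f} ft long, {scaled_thickness_in:.2f} in thick")
print(f"one base pair every {slice_per_bp_in:.3f} in along the drive")
```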

You’ve been so good following along to this point that you get tickets for 50 yard line seats in the Superdome.  You’re sitting far enough back so that you’re 75 feet above the field, placing you right at the equator of our 150 foot sphere. The north and south poles of the sphere are over the 50 yard line, halfway between the two sides.  You are about to watch the grounds crew pump 15,000,000 feet of linguini into the sphere. Will it burst?  We know it won’t (or we wouldn’t exist).  But how much of the sphere will the linguini take up?

The volume of any sphere is 4/3 * pi * radius^3.  So the volume of our sphere of 10 microns diameter is 4/3 * 3.14 * 5 * 5 * 5 = 523 cubic microns. There are 10^18 cubic microns in a cubic meter, so our spherical nucleus has a volume of 523 * 10^-18 cubic meters.  What is the volume of the DNA cylinder? Its radius is 10 Angstroms or 1 nanoMeter.  So its volume is 1 meter (the length of the stretched out DNA) * pi * (10^-9 meter)^2 = 3.14 * 10^-18 cubic meters, or 3.14 cubic microns.

Even though it’s 15,000,000 feet long, the volume of the linguini is only 3.14/523 of the sphere.  Plenty of room for the grounds crew who begin reeling it in at 60 miles an hour.  Since they have 2840 miles of the stuff to reel in, we’ll have to come back in a few days to watch the show.  While we’re waiting, we might think of how anything can be accurately located in 2840 miles of linguini in a 150 foot sphere.
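
And the volume and reel-in arithmetic, again as a small Python sketch of the numbers quoted above:

```python
from math import pi

NUCLEUS_RADIUS_UM = 5.0          # 10 micron diameter nucleus
DNA_RADIUS_M = 1e-9              # 10 Angstroms
DNA_LENGTH_M = 1.0               # the stretched-out genome
TRIP_MILES = 2840                # the scaled-up linguini
REEL_MPH = 60

nucleus_vol_um3 = (4 / 3) * pi * NUCLEUS_RADIUS_UM ** 3       # ~523 cubic microns
dna_vol_um3 = pi * DNA_RADIUS_M ** 2 * DNA_LENGTH_M * 1e18    # ~3.14 cubic microns

print(f"nucleus volume: {nucleus_vol_um3:.0f} cubic microns")
print(f"DNA volume:     {dna_vol_um3:.2f} cubic microns "
      f"({100 * dna_vol_um3 / nucleus_vol_um3:.1f}% of the nucleus)")
print(f"reel-in time:   {TRIP_MILES / REEL_MPH / 24:.1f} days at {REEL_MPH} mph")
```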

Here’s a link to the next paper in the series

https://luysii.wordpress.com/2010/03/23/the-cell-and-its-nucleus-on-a-human-scale-ii/

Dynamic allostery

It behooves drug chemists to know as much as they can about protein allostery, since so many of their drugs attempt to manipulate it.  An earlier post discussed dynamic allostery, which is essentially a change in ligand binding affinity without structural change in the protein binding the ligand.  A new paper challenges the concept.

First here’s the old post, and then the new stuff

Remember entropy? — Take II

Organic chemists have a far better intuitive feel for entropy than most chemists. Condensations such as the Diels Alder reaction decrease it, as does ring closure. However, when you get to small ligands binding proteins, everything seems to be about enthalpy. Although binding energy is always talked about, mentally it appears to be enthalpy (H) rather than Gibbs free energy (F).

A recent fascinating editorial and paper [ Proc. Natl. Acad. Sci. vol. 114 pp. 4278 – 4280, 4424 – 4429 ’17 ] shows how evolution has used entropy to determine when a protein (CzrA) binds to DNA and when it doesn’t. As usual, advances in technology permit us to see this (e.g. multidimensional heteronuclear nuclear magnetic resonance). This allows us to determine the motion of side chains (methyl groups), backbones etc. etc. When CzrA binds to DNA, methyl side chains on the protein move more, increasing entropy (deltaS). We all know the Gibbs free energy of reaction (deltaF) isn’t just enthalpy (deltaH) but deltaH – TdeltaS, so an increase in deltaS pushes deltaF lower, meaning the reaction proceeds in that direction.
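
To make the bookkeeping concrete, here is a toy calculation (the numbers are invented for illustration, not taken from the CzrA paper):

```python
T = 298.0          # Kelvin
delta_H = -10.0    # kcal/mol, hypothetical binding enthalpy
delta_S = 0.010    # kcal/(mol*K), hypothetical INCREASE in entropy on binding

# deltaF = deltaH - T*deltaS : a positive deltaS makes deltaF more negative,
# i.e. it favors binding even with no change in enthalpy.
delta_F = delta_H - T * delta_S
print(f"deltaF = {delta_F:.1f} kcal/mol (vs {delta_H:.1f} from enthalpy alone)")
# deltaF = -13.0 kcal/mol: the entropy term contributes -3 kcal/mol here.
```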

Binding of zinc redistributes these side chain motions so that entropy decreases, and the protein moves off DNA. The authors call this dynamics-driven allostery. The fascinating thing is that this may happen without any conformational change in CzrA.

I’m not sure that molecular dynamics simulations are good enough to pick this up. Fortunately newer NMR techniques can measure it. Just another complication for the hapless drug chemist thinking about protein ligand interactions.

A recent paper [ Proc. Natl. Acad. Sci. vol. 114 pp. 6563 – 6568 ’17 ] went into more detail about measuring side chain motions as a surrogate for conformational entropy.  It can now be measured by NMR.  They define complete restriction of the methyl group symmetry axis as 1 and complete disorder as 0, and state that ‘a variety of models’ imply that the value is LINEARLY related to conformational entropy, making it an ‘entropy meter’.  They state that measurement of fast internal side chain motion is largely restricted to the methyl group — this makes me worry that other side chains (which they can’t measure) are moving as well and contributing to entropy.

The authors studied some 28 protein/ligand systems, and found that the contribution of conformational entropy to ligand binding can be favorable, negligible or unfavorable.

What is bothersome to the authors (and to me) is that there were no obvious structural correlates between the degree of conformation entropy and protein structure.  So it’s something you measure not something you predict, making life even more difficult for the computational chemist studying protein ligand interactions.

Now the new stuff [ Proc. Natl. Acad. Sci. vol. 114 pp. 7480 – 7482, E5825 – E5834 ’17 ].  It’s worth considering what ‘no structural change’ means.  Proteins are moving all the time.  Bonds are vibrating at rates up to 10^15 times a second.  Methyl groups are rotating, hydrogen bonds are being made and broken.  I think we can assume that no structural change means no change in the protein backbone.

The work studied a protein of interest to neurological function, the PDZ3 domain — found on the receiving side of a synapse (post-synaptic side).  Ligand binding produced no change in the backbone, but there were significant changes in the distribution of electrons — which the authors describe as an enthalpic rather than an entropic effect.  Hydrogen bonds and salt bridges changed.  Certainly any change in the charge distribution would affect the pKa’s of acids and bases. The changes in charge distribution the ligand would see, due to protons ionizing from acids and binding to bases, would certainly change ligand binding, even forgetting van der Waals effects.

How do neural nets do what they do?

Isn’t it enough that neural nets beat humans at chess, go and checkers, drive cars, recognize faces, find out what plays Shakespeare did and didn’t write?  Not at all.  Figuring out how they do what they do may allow us to figure out how the brain does what it does.

Science recently had a great bunch of articles on neural nets, deep learning [ Science vol. 356 pp. 16 – 30 ’17 ].  Chemists will be interested in p. 27 “Neural networks learn the art of chemical synthesis”.  The articles are quite accessible to the scientific layman.

To this retired neurologist, the most interesting of the bunch was the article (pp. 22 – 27) describing attempts to figure out how neural nets do what they do. Welcome to the world of the neuroscientist, where a similar problem has engaged us for centuries.  DARPA is spending $70 million on exactly this, according to the article.

If you are a little shaky on all this — I’ve copied a previous post on the subject (along with a few comments it inspired) below the ****

Here are four techniques currently in use:

  1. Counterfactual probes — the classic black box technique — vary the input (text, images, sound, . . .) and watch how it affects the output.  It goes by the fancy name of Local Interpretable Model agnostic Explanations (LIME).  This identifies the parts of the input most important to the net’s original judgment (see the sketch after this list).
  2. Start with a black image or a zeroed out array of text and transition step by step toward the example being tested.  Then you watch the jumps in certainty the net makes, and you can figure out what it thinks is important.
  3. The Generalized Additive Model (GAM) is a statistical technique based on linear regression.  It operates on the data to massage it.  The net is then presented with a variety of GAM transformations and studied to see which are best at massaging the data so the machine can make a correct decision.
  4. Glass Box wires monotonic relationships (e.g. the price of a house goes up with the number of square feet) INTO the neural net — allowing better control of what it does.
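
Below is a minimal sketch of the idea behind technique 1: perturb the input one piece at a time and see how much the output moves. The `model` function is a stand-in for any trained net; none of this comes from the LIME paper itself.

```python
import numpy as np

def model(x: np.ndarray) -> float:
    """Stand-in for a trained network: returns a score for input vector x."""
    weights = np.array([0.1, 2.0, -0.5, 0.0])   # hypothetical learned weights
    return float(1 / (1 + np.exp(-weights @ x)))

def importance_by_perturbation(x: np.ndarray) -> np.ndarray:
    """Zero out one feature at a time and record how much the score changes."""
    baseline = model(x)
    scores = np.zeros(len(x))
    for i in range(len(x)):
        perturbed = x.copy()
        perturbed[i] = 0.0                       # 'counterfactual': feature i removed
        scores[i] = abs(baseline - model(perturbed))
    return scores

x = np.array([1.0, 1.0, 1.0, 1.0])
print(importance_by_perturbation(x))   # the feature with the largest weight moves the score most
```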

The articles don’t appear to be behind a paywall, so have at it.

***

NonAlgorithmic Intelligence

Penrose was right. Human intelligence is nonAlgorithmic. But that doesn’t mean that our physical brains produce consciousness and intelligence using quantum mechanics (although all matter is what it is because of quantum mechanics). The parts (even small ones like neurotubules) contain so much mass that their associated wavelength is too small to exhibit quantum mechanical effects. Here Penrose got roped in by Kauffman thinking that neurotubules were the carriers of the quantum mechanical indeterminacy. They aren’t; they are just too big. The dimer of alpha and beta tubulin contains 900 amino acids — a mass of around 90,000 Daltons (or 90,000 hydrogen atoms — which are small enough to show quantum mechanical effects).

So why was Penrose right? Because neural nets which are inherently nonAlgorithmic are showing intelligent behavior. AlphaGo which beat the world champion is the most recent example, but others include facial recognition and image classification [ Nature vol. 529 pp. 484 – 489 ’16 ].

Nets are trained on real world images and told whether they are right or wrong. I suppose this is programming of a sort, but it is certainly nonAlgorithmic. As the net learns from experience it adjusts the strength of the connections between its neurons (synapses if you will).

So it should be a simple matter to find out just how AlphaGo did it — just get a list of the neurons it contains, and the number and strengths of the synapses between them. I can’t find out just how many neurons and connections there are, but I do know that thousands of CPUs and graphics processors were used. I doubt that there were 80 billion neurons or a trillion connections between them (which is what our brains are currently thought to have).

Just print out the above list (assuming you have enough paper) and look at it. Will you understand how AlphaGo won? I seriously doubt it. You will understand it less well than looking at a list of the positions and momenta of 80 billion gas molecules will tell you its pressure and temperature. Why? Because in statistical mechanics you assume that the particles making up an ideal gas are featureless, identical and do not interact with each other. This isn’t true for neural nets.

It also isn’t true for the brain. Efforts are underway to find a wiring diagram of a small area of the cerebral cortex. The following will get you started — https://www.quantamagazine.org/20160406-brain-maps-micron-program-iarpa/

Here’s a quote from the article to whet your appetite.

“By the end of the five-year IARPA project, dubbed Machine Intelligence from Cortical Networks (Microns), researchers aim to map a cubic millimeter of cortex. That tiny portion houses about 100,000 neurons, 3 to 15 million neuronal connections, or synapses, and enough neural wiring to span the width of Manhattan, were it all untangled and laid end-to-end.”

I don’t think this will help us understand how the brain works any more than the above list of neurons and connections from AlphaGo. There are even more problems with such a list. Connections (synapses) between neurons come and go (and they increase and decrease in strength as in the neural net). Some connections turn on the receiving neuron, some turn it off. I don’t think there is a good way to tell what a given connection is doing just by looking at a slice of it under the electron microscope. Lastly, some of our most human attributes (emotion) are due not to connections between neurons but to release of neurotransmitters generally into the brain, not at the very localized synapse, so they won’t show up on a wiring diagram. This is called volume neurotransmission, and the transmitters are serotonin, norepinephrine and dopamine. Not convinced? Among agents modifying volume neurotransmission are cocaine, amphetamine, antidepressants, antipsychotics. Fairly important.

So I don’t think we’ll ever truly understand how the neural net inside our head does what it does.

 Here are a few of the comments

“So why was Penrose right? Because neural nets which are inherently nonAlgorithmic are showing intelligent behavior. ”

I picked up a re-post of this comment on Quanta and thought it best to reply to you directly. Though this appears to be your private blog I can’t seem to find a biography, otherwise I’d address you by name.

My background is computer science generally and neural networks (with the requisite exposure to statistical mechanics) in particular and I must correct the assertion you’ve made here; neural nets are in fact both algorithmic and even repeatable in their performance.

I think what you’re trying to say is the structure of a trained network isn’t known by the programmer in advance; rather than build a trained intelligence, the programmer builds an intelligence that may be trained. The method is entirely mathematical though, mostly based on the early work of Boltzmann and explorations of the Monte-Carlo algorithms used to model non-linear thermodynamic systems.

For a good overview of the foundations, I suggest J.J. Hopfield’s “Neural networks and physical systems with emergent collective computational abilities”, http://www.pnas.org/content/79/8/2554.abstract

Regards,
Scott.

Scott — thanks for your reply. I blog anonymously because of the craziness on the internet. I’m a retired neurologist, long interested in brain function both professionally and esthetically. I’ve been following AI for 50 years.

I even played around with LISP, which was the language of AI in the early days, read Minsky on Perceptrons, worried about the big Japanese push in AI in the 80’s when they were going to eat our lunch etc. etc.

I think you’d agree that compared to LISP and other AI of that era, neural nets are nonAlgorithmic. Of course setting up virtual neurons DOES involve programming.

The analogy with the brain is near perfect. You can regard our brains as the output of the embryologic programming with DNA as the tape.

But that’s hardly a place to stop. How the brain and neural nets do what they do remains to be determined. A wiring diagram of the net is available but really doesn’t tell us much.

Again thanks for responding.

Scott

Honestly it would be hard for me to accept that the nets I worked on weren’t algorithmic since they were literally based on formal algorithms derived directly from statistical mechanics, most of which was based on Boltzmann’s work back in the 19th century. Hopfield, who I truly consider the “father” of modern neural computing, is a physicist (now at Princeton I believe). Most think he was a computer scientist back in the 70’s when he did his basic work at Cal Tech, but that’s not really the case.

I understand what you’re trying to say, that the actual training portion of NN development isn’t algorithmic, but the NN software itself is and it’s extremely precise in its structure, much more so than say, for instance, a bubble sort. It’s pretty edgy stuff even now.

I began working on NN’s in ’82 after reading Hopfield’s seminal paper, I was developing an AI aimed at self-diagnosing computer systems for a large computer manufacturer now known as Hewlett-Packard (at the time we were a much smaller R&D company who were later acquired). We also explored expert systems and ultimately deployed a solution based on KRL, which is a LISP development environment built by a small Stanford AI spinoff. It ended up being a dead end; that was an argument I lost (I advocated the NN direction as much more promising but lost mostly for political reasons). Now I take great pleasure in gloating 🙂 even though I’m no longer commercially involved with either the technology or that particular company.

Luysii

I thought we were probably in agreement about what I said. Any idea on how to find out just how many ‘neurons’ (and how many levels) there are in AlphaGo? It would be interesting to compare the numbers with our current thinking about the numbers of cortical neurons and synapses (which grows ever larger year by year).

Who is the other Scott — Aaronson? 100 years ago he would have been a Talmudic scholar (as he implies by the title of his blog).

Yes neural nets are still edgy, and my son is currently biting his fingernails in a startup hoping to be acquired which is heavily into an application of neural nets (multiple patents etc. etc.)

A possible new player

Drug development is very hard because we don’t know all the players inside the cell. A recent paper describes an entirely new class of player — circular DNA derived from an ancient virus.  The authoress is Laura Manuelidis, who would have been a med school classmate had I chosen to go to Yale med instead of Penn.  She is the last scientist standing who doesn’t believe Prusiner’s prion hypothesis.  Being female, she couldn’t marry the boss’s daughter, so she married the boss instead: Elias Manuelidis, a Yale neuropathologist who would be 99 today had he not passed away at 72 in 1992.

The circular DNAs go by the name of SPHINX, an acronym for Slow Progressive Hidden INfections of X origin.  They have no sequences in common with bacterial or eukaryotic DNA, but there is some homology to a virus infecting Acinetobacter, a wound pathogen common in soil and water.

How did she find them?  By doggedly pursuing the idea that neurodegenerative diseases such as Creutzfeldt-Jakob Disease (CJD) and scrapie were due to an infectious agent triggering aggregation of the prion protein.

As she says:  “The cytoplasm of CJD and scrapie-infected cells, but not control cells, also contains virus-like particle arrays and because we were able to isolate these nuclease-protected particles with quantitative recovery of infectivity, but with little or no detectable PrP (Prion Protein), we began to analyze protected nucleic acids. Using Φ29 rolling circle amplification, several circular DNA sequences of <5 kb (kilobases) with ORFs (Open Reading Frames) were thereby discovered in brain and cultured neuronal cell lines. These circular DNA sequences were named SPHINX elements for their initial association with slow progressive hidden infections of X origin."

SPHINX itself codes for a 324 amino acid protein, which is found in human brain, concentrated in synaptic boutons.  Strangely, even though the DNAs are presumably of viral origin, they contain intervening sequences which don't code for protein.

The use of rolling circle amplification is quite clever, as it will copy only circular DNA.

Stanley Prusiner is sure to weigh in.  Remarkably, Prusiner was at Penn Med when I was and was even in my med school fraternity (Nu Sigma Nu)  primarily a place to eat lunch and dinner.  I probably ate with him, but have no recollection of him whatsoever.

Circular DNAs outside chromosomes are called plasmids. Bacteria are full of them. The best known eukaryote containing plasmids is yeast. Perhaps we have them as well. Manuelidis may be the first person to look.

Should you take aspirin after you exercise?

I just got back from a beautiful four and a half mile walk around a reservoir behind my house.  I always take 2 adult aspirin after things like this.  A recent paper implies that perhaps I should not [ Proc. Natl. Acad. Sci. vol. 114 pp. 6675 – 6684 ’17 ].  Here’s why.

Muscle has a set of stem cells all its own.  They are called satellite cells.  After injury they proliferate and make new muscle. One of the triggers for this is a prostaglandin known as PGE2 — https://en.wikipedia.org/wiki/Prostaglandin_E2 — clearly a delightful structure for the organic chemist to make.  It binds to a receptor on the satellite cell (called EP4R), following which all sorts of things happen, which will make sense to you if you know some cellular biochemistry.  Activation of EP4R triggers the cyclic AMP (cAMP)/phosphoCREB pathway.  This activates Nurr1, a transcription factor which causes cellular proliferation.

Why no aspirin? Because it inhibits cyclo-oxygenase which forms the 5 membered ring of PGE2.

I think you should still take aspirin afterwards, as the injury produced in the paper was pretty severe — muscle toxins, cold injury etc. etc. Probably the weekend warriors among you don’t damage your muscles that much.

A few further points about aspirin and the NSAIDs

Now aspirin is an NSAID (NonSteroid AntiInflammatory Drug) — along with a zillion others (advil, anaprox, ansaid, clinoril, daypro, dolobid, feldene, indocin — etc. etc. a whole alphabet’s worth). It is rather different in that it has an acetyl group on the benzene ring.  Could it be an acetylating agent for things like histones and transcription factors, producing far more widespread effects than those attributable to cyclo-oxygenase inhibition?   I’ve looked at the structures of a few of them — some have CH2-COOH moieties in them, which might be metabolized to an acetyl group — doubtful.  Naproxen (Anaprox, Naprosyn) does have an acetyl group — but the other 13 structures I looked at do not.

Another possible negative of aspirin after exercise, is the fact that inhibition of platelet cyclo-oxygenase makes it harder for them to stick together and form clots (this is why it is used to prevent heart attack and stroke). So aspirin might result in more extensive micro-hemorrhages in muscle after exercise (if such things exist).

Gotterdamerung — The Twilight of the GWAS

Life may be like a well, but cellular biochemistry and gene function is like a mattress.  Push on it anywhere and everything changes, because it’s all hooked together.  That’s the only conclusion possible if a review of genome wide association studies (GWAS) is correct [ Cell vol. 169 pp. 1177 – 1186 ’17 ].

 It’s been a scandal for years that GWAS studies, as they grow larger and larger, are still missing large amounts of the heritability of known very heritable conditions (e.g. schizophrenia, height).  It’s been called the dark matter of the genome (i.e. we know it’s there, but we don’t know what it is).

If you’re a little shaky about how GWAS works have a look at https://luysii.wordpress.com/2014/08/24/tolstoy-rides-again-schizophrenia/ — it will come up again later in this post.

We do know that less than 10% of the SNPs found by GWAS lie in protein coding genes — this means either that they are randomly distributed, or that they are in regions controlling gene expression.  Arguing for randomness — the review states that the heritability contributed by each chromosome tends to be closely proportional to chromosome length.  Schizophrenia is known to be quite heritable, and monozygotic twins have a concordance rate of 40%.  Yet an amazing study (which is quoted but which I have not read) estimates that nearly 100% of all 1 megabase windows in the human genome contribute to schizophrenia heritability (Nature Genet. vol. 47 pp. 1385 – 1392 ’15). Given the 3.2 gigaBase size of our genome, that’s 3,200 one-megabase windows.

Another example is the GIANT study of the heritability of height.  The study was based on 250,000 people, and some 697 genome-wide significant loci were found.  In aggregate they explain a mere SIXTEEN PERCENT.

So what is going on?

It gets back to the link posted earlier. The title —  “Tolstoy rides again”  isn’t a joke.  It refers to the opening sentence of Anna Karenina — “Happy families are all alike; every unhappy family is unhappy in its own way”.  So there are many routes to schizophrenia (and they are spread all over the genome).

The authors of the review think that larger and larger GWAS studies (some are planned with over a million participants) are not going to help and are probably a waste of money.  Whether the review is Gotterdamerung for GWAS isn’t clear, but it is provocative. The review is new, and it will be interesting to see the response from the GWAS people.

So what do they think is going on?  Namely that everything in organismal and cellular biochemistry, genetics and physiology is related to everything else.  Push on it in one place and like a box spring mattress, everything changes.  The SNPs found outside the DNA coding for proteins are probably changing the control of protein synthesis of all the genes.

The dark matter of the genome is ‘the plan’ which makes the difference between animate and inanimate matter.   For more on this please see — https://luysii.wordpress.com/2015/12/15/it-aint-the-bricks-its-the-plan-take-ii/

Fascinating and enjoyable to be alive at such a time in genetics, biochemistry and molecular biology.

Happy Fourth of July

Only immigrants truly appreciate this country.  So it’s worth repeating an earlier post about them. Happy fourth of July.

Hitler’s gifts (and Russia’s gift)

In the summer of 1984 Barack Obama was at Harvard Law, his future wife was a Princeton undergraduate, and Edward Frenkel, a 16 year old mathematical prodigy, was being examined for admission to Moscow State University. He didn’t get in because he was Jewish. His blow by blow description of the 5 hour exam on pp. 28 – 38 of his book “Love & Math” is as painful to read as it must have been for him to write.

A year earlier the left in Europe had mobilized against the placement of Pershing missiles in Europe by president Reagan, already known there as a crude and witless former actor, but, unfortunately possessed of nuclear weapons. Tens of thousands marched. He had even called the Soviet Union an Evil Empire that year. Leftists the world over were outraged. How unsophisticated to even admit the possibility of evil. Articles such as “Reagan’s image in Europe does not help Allies in deploying American missiles” appeared in the liberal press.

The hatred of America is nothing new for the left.

Reset the clock to ’60 – ’62 when I was a grad student in the Harvard Chemistry department. The best place to meet women was the International house. It had a piano, and a Polish guy who played Chopin better than I did. It had a ping pong table, and another Polish guy who beat me regularly. The zeitgeist at Harvard back then, was that America was rather crude (the Ugly American was quite popular), boorish and unappreciative of the arts, culture etc. etc.

One woman I met was going on and on about this, particularly the condition of the artist in America, and how much better things were in Europe. I brought up Solzhenitzen, and the imprisonment of dissidents over there. Without missing a beat, she replied that this just showed how important the Russian government thought writers and artists were. This was long before Vietnam.

It was definitely a Saul on the road to Damascus moment for me. When the left began spelling America, Amerika in the 60s and 70s, I just ignored it.

Fast forward to this fall, and the Nobels. The 7th Chemistry Nobel bestowed on a department member when I was there went to Marty Karplus. The others were Woodward, Corey, Lipscomb, Gilbert, Hoffman, Bloch. While Bill Lipscomb was a Kentucky gentleman to a T (and a great guy), Hoffman spent World War II hiding out in an attic, his father being in a concentration camp (guess why). Konrad Bloch (who looked as teutonic as they come) also got out of Europe due to his birth. Lastly Karplus got out of Europe as a child for the same reason. Don Voet, a fellow grad student, whose parents got out of Europe for (I’ll make you guess), used to say that the Universal Scientific Language was — broken English.

So 3/7 of the Harvard Chemistry Nobels are Hitler and Europe’s gifts to America.

Russia, not to be outdone, gave us Frenkel. Harvard recognized his talent and gave him a visiting professorship at age 21, later enrolling him in grad school so he could get a PhD. He’s now a Stanford prof.

So the next time someone touts the “European model” of anything, ask them about Kosovo, or any of this.

***

Those of you in training should consider the following. You really won’t know how good what you are getting is until 50 years or so have passed. That’s not to say Harvard Chemistry’s reputation wasn’t very good back then. Schleyer said ‘now you’re going to Mecca’ when he heard I’d gotten in.

Also to be noted is that all 7 future Nobelists in the early 60s weren’t resting on their laurels but actively creating them. The Nobels all came later.

Remember entropy? — Take II

Organic chemists have a far better intuitive feel for entropy than most chemists. Condensations such as the Diels Alder reaction decrease it, as does ring closure. However, when you get to small ligands binding proteins, everything seems to be about enthalpy. Although binding energy is always talked about, mentally it appears to be enthalpy (H) rather than Gibbs free energy (F).

A recent fascinating editorial and paper [ Proc. Natl. Acad. Sci. vol. 114 pp. 4278 – 4280, 4424 – 4429 ’17 ] shows how evolution has used entropy to determine when a protein (CzrA) binds to DNA and when it doesn’t. As usual, advances in technology permit us to see this (e.g. multidimensional heteronuclear nuclear magnetic resonance). This allows us to determine the motion of side chains (methyl groups), backbones etc. etc. When CzrA binds to DNA, methyl side chains on the protein move more, increasing entropy (deltaS), and as we all know, the Gibbs free energy of reaction (deltaF) isn’t just enthalpy (deltaH) but deltaH – TdeltaS, so an increase in deltaS pushes deltaF lower, meaning the reaction proceeds in that direction.

Binding of zinc redistributes these side chain motions so that entropy decreases, and the protein moves off DNA. The authors call this dynamics-driven allostery. The fascinating thing is that this may happen without any conformational change in CzrA.

I’m not sure that molecular dynamics simulations are good enough to pick this up. Fortunately newer NMR techniques can measure it. Just another complication for the hapless drug chemist thinking about protein ligand interactions.

A recent paper [ Proc. Natl. Acad. Sci. vol. 114 pp. 6563 – 6568 ’17 ] went into more detail about measuring side chain motions as a surrogate for conformational entropy.  It can now be measured by NMR.  They define complete restriction of the methyl group symmetry axis as 1 and complete disorder as 0, and state that ‘a variety of models’ imply that the value is LINEARLY related to conformational entropy, making it an ‘entropy meter’.  They state that measurement of fast internal side chain motion is largely restricted to the methyl group — this makes me worry that other side chains (which they can’t measure) are moving as well and contributing to entropy.

The authors studied some 28 protein/ligand systems, and found that the contribution of conformational entropy to ligand binding can be favorable, negligible or unfavorable.

What is bothersome to the authors (and to me) is that there were no obvious structural correlates between the degree of conformation entropy and protein structure.  So it’s something you measure not something you predict, making life even more difficult for the computational chemist studying protein ligand interactions.

Correctly taken to task by two readers and some breaking news

I should have amended the previous post to say I mistrust unverified models.  Here are two comments

#1 Andyextance

  • “Leaving aside the questions of the reliability of models in different subjects, and whether all of your six reasons truly relate to models, I have one core question: Without models, how can we have any idea about what the future might hold? Models may not always be right – but as long as they have some level of predictive skill they can often at least be a guide.”

    Absolutely correct — it’s all about prediction, not plausibility.

#2 Former Bell Labs denizen

“And yet you board a commercial airliner without hesitation, freely trusting your life to the models of aerodynamics, materials science, control system theory, electronics, etc. that were used in designing the aircraft. Similar comments apply to entering a modern skyscraper, or even pushing the brake pedal on your automobile.
Perhaps what you are really saying is that you don’t trust models until their correctness is demonstrated by experience; after that, you trust them. Hey, nothing to disagree with there.”
Correct again.
Breaking news
This just in — too late for yesterday’s post — the climate models have overestimated the amount of warming to be expected this century.  The source is an article in Nature Geoscience (2017) doi:10.1038/ngeo2973 — behind a paywall — but here’s the abstract:
In the early twenty-first century, satellite-derived tropospheric warming trends were generally smaller than trends estimated from a large multi-model ensemble. Because observations and coupled model simulations do not have the same phasing of natural internal variability, such decadal differences in simulated and observed warming rates invariably occur. Here we analyse global-mean tropospheric temperatures from satellites and climate model simulations to examine whether warming rate differences over the satellite era can be explained by internal climate variability alone. We find that in the last two decades of the twentieth century, differences between modelled and observed tropospheric temperature trends are broadly consistent with internal variability. Over most of the early twenty-first century, however, model tropospheric warming is substantially larger than observed; warming rate differences are generally outside the range of trends arising from internal variability. The probability that multi-decadal internal variability fully explains the asymmetry between the late twentieth and early twenty-first century results is low (between zero and about 9%). It is also unlikely that this asymmetry is due to the combined effects of internal variability and a model error in climate sensitivity. We conclude that model overestimation of tropospheric warming in the early twenty-first century is partly due to systematic deficiencies in some of the post-2000 external forcings used in the model simulations.
 
Unfortunately the abstract doesn’t quantify ‘generally smaller’.
 
Models whose predictions are falsified by data are not to be trusted.
 
Yet another reason Trump was correct to get the US out of the Paris accords — in addition to the reasons he used: no method of verification, no penalties for failure to reduce CO2, etc. etc.  The US would tie itself in economic knots trying to live up to it, while other countries would emit pious goals for reduction and do very little.
In addition, I find it rather intriguing that the article was not published in Nature Climate Change — http://www.nature.com/nclimate/index.html — which would seem to be the appropriate place.  Perhaps it’s just too painful for them.