Category Archives: Chemistry (relatively pure)

The flying Wallendas of the synapse

Is anything similar to the flying Wallendas ( https://en.wikipedia.org/wiki/The_Flying_Wallendas) going on in the synapse? The first electron micrographs of the synaptic cleft back in the day showed a clear space about 400 Angstroms (40 nanoMeters) thick.  Well we now know that there are tons of proteins occupying this space — a copy of a previous post

The bouillabaisse of the synaptic cleft

appears after the **** at the end of this post.  It shows just how many proteins occupy that clear space. Could a presynaptic protein directly bond to a postsynaptic protein across the cleft (perhaps with the help of a third or fourth Wallenda protein between the two)?  A nice review [ Neuron vol. 96 pp. 680 – 696 ’17b ] http://www.cell.com/neuron/fulltext/S0896-6273(17)30935-2 sets out what is known.

We know that neurexins (presynaptic) bind to neuroligins (postsynaptic) across the cleft.  This is the best studied pair, and most of the earlier post discusses what is known about them.

Figure 1c p. 682 is particularly fascinating as it shows that there are many more molecules which shake hands across the cleft.  Even more interesting is the fact that just where they are relative to the center/periphery of the synapse isn’t shown for the neurexin/neuroligin pair and the LAR/Strk pair (i.e. among the best studied pairs) because apparently this isn’t known.   The ephrins/ephrin pair and the syncam pair are in the center, while N-cadherin is shown at the edge.

One of the crucial elements of the post-synaptic membrane, the AMPA receptor (AMPAR) for glutamic acid, protrudes its amino terminal domain 1/3 of the way across the cleft (assuming it is 40 nanoMeters thick).

Postsynaptic receptors are said to be clustered in nanoDomains 80 – 100 nanoMeters in diameter. Similarly, presynaptic RIM nanoClusters are the same size and are said to be aligned with postSynaptic nanoClusters of PSD95 as measured by 3D-STORM, currently the most cutting edge technique we have for visualizing these things [ Nature vol. 536 pp. 210 – 214 ’16 ].

So, all in all, the paper is fascinating and shows how much more there is to know.

Unfortunately the paper contains one statement which raises my chemical hackles;  “A consistent prediction across models is that the glutamate concentration profile reaches a very high peak (over 1 milliMolar), but only for a brief time period (100 milliSeconds) and over a small distance (100 nanoMeters).” Glutamate is the major excitatory neurotransmitter in brain and is what binds to AMPAR.

Models are lovely, but how many molecules of glutamic acid are they talking about?  It’s easy (but tedious) to figure this out.

We know the volume they are talking about: a cylinder 100 nanoMeters in diameter (50 nanoMeters in radius) and 40 nanoMeters tall (the width of the synaptic cleft).   So it contains pi * 50^2 * 40 = 314,159 cubic nanoMeters; round this down to 3 * 10^5 cubic nanoMeters. A liter is a cube .1 meters (10 centimeters) on a side. 10 centimeters is 10^8 nanoMeters, meaning that a liter contains (10^8)^3 = 10^24 cubic nanoMeters.

A 1 molar solution of anything contains 6 * 10^23 molecules per liter (Avogadro’s number), so a 1 milliMolar solution (of glutamate in this case) contains 6 * 10^20 molecules/liter or  6 * 10^-4 molecules per cubic nanoMeter. Multiply this by the volume of the cylinder and you get a grand total of about 180 molecules of glutamic acid in the cylinder.
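
An estimate like this is easy to check by scripting it. The sketch below is only an illustration: the function and constant names are my own, the cylinder volume is taken as pi * r^2 * h with r equal to half the stated diameter, and it uses the unrounded Avogadro's number, so it will not match hand-rounded figures digit for digit.

```python
import math

AVOGADRO = 6.022e23           # molecules per mole
NM3_PER_LITER = (1e8) ** 3    # a liter is a cube 10 cm = 1e8 nm on a side

def molecules_in_cylinder(diameter_nm, height_nm, conc_molar):
    """Number of solute molecules in a cylinder at a given molar concentration."""
    radius_nm = diameter_nm / 2
    volume_nm3 = math.pi * radius_nm ** 2 * height_nm
    molecules_per_nm3 = conc_molar * AVOGADRO / NM3_PER_LITER
    return volume_nm3 * molecules_per_nm3

# 100 nm wide, 40 nm tall (the cleft width), 1 milliMolar glutamate
print(round(molecules_in_cylinder(100, 40, 1e-3)))
```

Changing the diameter or the concentration shows how quickly the count moves: the number of molecules scales with the square of the diameter.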

If I’ve done the calculations correctly (and I think I have), “a very high peak (over 1 milliMolar)” is basically scientific garbage, the concept of concentration being stretched far beyond its range of meaningful applicability.

I’d love to stand corrected if my calculations are incorrect. Just make a comment.

 

*****

The bouillabaisse of the synaptic cleft

The synaptic cleft is so small ( under 400 Angstroms, i.e. 40 nanoMeters ) that it can’t be seen with the light microscope ( the smallest wavelength of visible light is 3,900 Angstroms, i.e. 390 nanoMeters).  This led to a bruising battle between Cajal and Golgi just over a century ago over whether the brain was actually made of cells.  Even though Golgi’s work led to the delineation of single neurons he thought the brain was a continuous network.  They both won the Nobel in 1906.

Semifast forward to the mid 60s when I was in medical school.  We finally had the electron microscope, so we could see synapses. They showed up as small CLEAR spaces (i.e. electrons passed through them easily, leaving them white) between neurons.  Neurotransmitters were being discovered at the same time and the synapse was to be the analogy to vacuum tubes, which could pass electricity in just one direction (yes, the transistor although invented hadn’t been used to make anything resembling a computer; the Intel 4004 wasn’t until the 70s).  Of course now we know that information flows back and forth across the synapse, with endocannabinoids (i.e. natural marihuana) being the major retrograde neurotransmitter.

Since there didn’t seem to be anything in the synaptic cleft, neurotransmitters were thought to freely diffuse across it to bind to receptors on the other (postsynaptic) side, i.e. a free fly zone.

Fast forward to the present to a marvelous (and grueling to read because of the complexity of the subject not the way it’s written) review of just what is in the synaptic cleft [ Cell vol. 171 pp. 745 – 769 ’17 ] http://www.cell.com/cell/fulltext/S0092-8674(17)31246-1 (It is likely behind a paywall).  There are over 120 references, and rather than being just a catalogue, the single author Thomas Sudhof extensively discusses which experimental work is to be believed (not that Sudhof is saying the work is fraudulent, but that it can’t be used to extrapolate to the living human brain).  The review is a staggering piece of work for one individual.

The stuff in the synaptic cleft is so diverse, and so intimately involved with itself and the membranes on either side, that what is needed for comprehension is not a chemist but a sociologist.  Probably most of the molecules to be discussed are present in such small numbers that the law of mass action doesn’t apply, nor do binding constants which rely on large numbers of ligands and receptors. Not only that, but the binding constants haven’t been determined for many of the players.

Now for some anatomic detail and numbers.  It is remarkably hard to find just how far laterally the synaptic cleft extends.  Molecular Biology of the Cell ed. 5 p. 1149 has a fairly typical picture with a size marker and it looks to be about 2 microns (20,000 Angstroms, 2,000 nanoMeters) across. Treated as a disk of that diameter, that’s 314,159,265 square Angstroms (3.14 square microns).  So let’s assume each protein takes up a square 50 Angstroms on a side (2,500 square Angstroms).  That’s room for about 125,600 proteins on each side assuming extremely dense packing.  However the density of acetylcholine receptors at the neuromuscular junction is 8,700/square micron, a packing also thought to be extremely dense, which would give only about 27,300 such proteins in a similarly distributed CNS synapse. So the numbers are at least in the right ball park (meaning they’re within an order of magnitude, i.e. within a power of 10) of being correct.
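
For anyone who wants to redo the packing arithmetic, here is a minimal sketch. It assumes the cleft face is a flat disk 2 microns across, and all names are invented for illustration. The prose figures are rounded, so expect small differences.

```python
import math

SQ_ANGSTROM_PER_SQ_MICRON = 1e8   # 1 micron = 1e4 Angstroms

def disk_area_sq_angstrom(diameter_angstrom):
    """Area of a flat disk, in square Angstroms."""
    return math.pi * (diameter_angstrom / 2) ** 2

area_a2 = disk_area_sq_angstrom(20_000)            # 2 micron diameter
area_um2 = area_a2 / SQ_ANGSTROM_PER_SQ_MICRON

# Densest case: every protein occupies a 50 x 50 Angstrom square
dense_packing = area_a2 / (50 * 50)

# Neuromuscular junction density of acetylcholine receptors, 8,700 per square micron
nmj_like = 8_700 * area_um2

print(f"area: {area_um2:.2f} square microns")
print(f"dense packing: {dense_packing:,.0f} proteins")
print(f"NMJ-like density: {nmj_like:,.0f} proteins")
print(f"ratio: {dense_packing / nmj_like:.1f}")
```

The two estimates differ by a factor of about 4.6, which is what "same ball park" means here.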

What’s the point?

When you see how many different proteins and different varieties of the same protein reside in the cleft, the number of each individual element is likely to be small, meaning that you can’t use statistical mechanics but must use sociology instead.

The review focuses on the neurExins (I capitalize the E to help me remember that they are prEsynaptic).  Why?  Because they are the best studied of all the players.  What a piece of work they are.  Humans have 3 genes for them. One of the 3 contains 1,477 amino acids, spread over 1,112,187 basepairs (1.1 megaBases) along with 74 exons.  This means that only about 0.4 percent of the gene (1,477 codons of 3 bases each, or 4,431 bases) is actually coding for the amino acids making it up.  I think it takes energy for RNA polymerase II to stitch the ribonucleotides into the 1.1 megabase pre-mRNA, but I couldn’t (quickly) find out how much per ribonucleotide.  It seems quite wasteful of energy, unless there is some other function to the process which we haven’t figured out yet.
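
The coding fraction is a one-line calculation once you remember that each amino acid takes a codon of 3 bases. A small sketch follows (the function name is mine, and it ignores the stop codon and untranslated regions):

```python
def coding_fraction(amino_acids, gene_basepairs, bases_per_codon=3):
    """Fraction of a gene's length that directly encodes amino acids."""
    return amino_acids * bases_per_codon / gene_basepairs

frac = coding_fraction(1_477, 1_112_187)
print(f"{frac:.4%} of the gene codes for amino acids")
```

Dividing the amino acid count by the gene length without the factor of 3 gives a figure three times too small.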

Most of the molecule resides in the synaptic cleft.  There are 6 LNS domains with 3 interspersed EGFlike repeats, a cysteine loop domain, a transmembrane region and a cytoplasmic sequence of 55 amino acids. There are 6 sites for alternative splicing, and because there are two promoters for each of the 3 genes, there is a shorter form (beta neurexin) with less extracellular stuff than the long form (alpha-neurexin).  When all is said and done there are over 1,000 possible variants of the 3 genes.

Unlike olfactory neurons which only express one or two of the nearly 1,000 olfactory receptors, neurons express multiple isoforms of each, increasing the complexity.

The LNS regions of the neurexins are like immunoglobulins and fill a 60 x 60 x 60 Angstrom box.  Since the synaptic cleft is at most 400 Angstroms across, the alpha-neurexins (if extended) reach all the way across.

Here the neurexins bind to the neuroligins which are always postsynaptic (sorry, no mnemonic).  They are simpler in structure, but they are the product of 4 genes, and only about 40 isoforms (due to alternative splicing) are possible. Neuroligins 1, 3 and 4 are found at excitatory synapses, neuroligin 2 is found at inhibitory synapses.  The intracleft part of the neuroligins resembles an important enzyme (acetylcholinesterase) but is catalytically inactive.  This is where the neurexins bind.

This is complex enough, but Sudhof notes that the neurexins are hubs interacting with multiple classes of post-synaptic molecules, in addition to the neuroligins: dystroglycan, GABA[A] receptors, calsyntenins, latrophilins (of which there are 4).   There are at least 50 post-synaptic cell adhesion molecules: “Few are well understood, although many are described.”

The neurexins have 3 major sites where other things bind, and all sites may be occupied at once.  Here is a taste of the complexity involved (before I go on to larger issues).

The second LNS domain (LNS2) is found only in the alpha-neurexins, and binds to neurexophilin (of which there are 4) and dystroglycan.

The 6th LNS domain (LNS6) binds to neuroligins, LRRTMs, GABA[A] receptors, cerebellins and latrophilins (of which there are 4).

The juxtamembrane sequence of the neurexins binds to CA10, CA11 and C1ql.

The cerebellins (of which there are 4) bind to all the neurexins (of a particular splice variety) and interestingly to some postsynaptic glutamic acid receptors.  So there is a direct chain across the synapse from neurexin to cerebellin to ion channel (GluD1, GluD2).

There is far more to the review. But here is something I didn’t see there.  People have talked about proton wires: sites on proteins that allow protons to jump from one site to another, and move much faster than they would if they had to bump into everything in solution.  Remember that molecules are moving quite rapidly; water is moving at 590 meters a second at room temperature. Since the synaptic cleft is 40 nanoMeters (40 x 10^-9 meters) wide, it should take only 40 * 10^-9 meters / 590 meters/second = about 68 trillionths of a second (68 picoSeconds) to cross, assuming the synapse is a free fly zone. But it isn’t, as the review exhaustively shows.
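
The crossing time is just distance over speed. A minimal sketch, assuming straight-line flight at the quoted thermal speed with no collisions (which is exactly the free fly zone assumption, and the function name is made up for this example):

```python
def straight_line_transit_ps(distance_nm, speed_m_per_s):
    """Time in picoseconds to cover distance_nm at constant speed, no collisions."""
    seconds = distance_nm * 1e-9 / speed_m_per_s
    return seconds * 1e12

t_ps = straight_line_transit_ps(40, 590)
print(f"{t_ps:.1f} picoseconds to cross a 40 nm cleft at 590 m/s")
```

Real molecules in water collide constantly, so this is a lower bound rather than a realistic diffusion time.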

Is it possible that the various neurotransmitters at the synapse (glutamic acid, gamma amino butyric acid, etc) bind to the various proteins crossing the cleft to get to their target in the postsynaptic membrane (i.e. neurotransmitter wires)?  I didn’t see any mention of neurotransmitter binding to the various proteins in the review.  This may actually be an original idea.

I’d like to put more numbers on many of these things, but they are devilishly hard to find.  Both the neuroligins and neurexins are said to have stalks pushing them out from the membrane, but I can’t find how many amino acids they contain.  I can’t find how much energy it takes to copy the 1.1 megabase neurexin gene into mRNA (or even how much energy it takes to add one ribonucleotide to an existing mRNA chain).

Another point: proteins have a finite lifetime.  How are they replenished?  We know that there is some synaptic protein synthesis. Does the cell body send packages of mRNAs to the synapse to be translated there?  There are at least 50 different proteins mentioned in the review, and don’t forget the thousands of possible isoforms, each of which requires a separate mRNA.

Old Chinese saying: the mountains are high and the emperor is far away. Protein synthesis at the synaptic cleft is probably local.  How it is decided what gets made and when is an entirely different problem.

A large part of the review concerns mutations in all these proteins associated with neurologic disease (particularly autism).  This whole area has a long and checkered history.  A high degree of cynicism is needed before believing that any of these mutations are causative.  As a neurologist dealing with epilepsy I saw the whole idea of ion channel mutations causing epilepsy crash and burn — here’s a link — https://luysii.wordpress.com/2011/07/17/we’ve-found-the-mutation-causing-your-disease-not-so-fast-says-this-paper/

Once again, hats off to Dr. Sudhof for what must have been a tremendous amount of work.

Why drug discovery is hard #29 — a very old player doing a very new thing

We all know what RNA does don’t we?  It binds to other RNAs and to DNA.  Sure lots of new forms of RNA have been found: microRNAs, competitive endogenous RNA (ceRNA), long nonCoding (for protein) RNA (lncRNA), piwiRNAs, small interfering RNAs (siRNAs), . .. The list appears endless.  But the basic mechanism of action of RNA in the cell is binding to some other polynucleotide (RNA or DNA) and affecting its function.

Not so fast.  A new paper http://science.sciencemag.org/content/358/6366/1051 describes lncRNA-ACOD1, a cellular RNA induced by a variety of viruses.  lncRNA-ACOD1 binds to an enzyme, enhancing its catalytic efficiency.  Now that’s new.  Certainly RNAs and proteins bind to each other in the ribosome, and in RNase P, but in those cases the proteins serve to structure the RNA so it can carry out its catalytic function, not the other way around.

The enzyme bound is called GOT2 (Glutamic Oxaloacetic Transaminase 2).  Much interesting cellular biochemistry is discussed in the paper which I’ll skip, except to say that the virus uses the hyped up GOT2 to repurpose the cell’s metabolic machinery for its own evil ends.

lncRNA-ACOD1 has 3 exons and a polyAdenine tail.  There are two transcript variants containing  2,330 and 2,259 nucleotides.  There are only 100 copies/cell.  lncRNA-ACOD1 nucleotides #165 – #390 bind to amino acids #54 – #68 of GOT2.

So what are the other 2000 or so nucleotides of lncRNA-ACOD1 doing?   The phenomenon of RNA binding to protein is quite likely to be more widespread.  Both the GOT2 interacting motif and the interacting sequence of lncRNA-ACOD1 are well conserved across species of hosts and viruses.

Although viruses co-opt lncRNA-ACOD1, it is normally expressed in the heart as is GOT2 with no viral infection at all.  So we have likely stumbled onto an entirely new method of cellular metabolic control, AND a whole new set of players and interactions for drugs to act on (if they aren’t already doing this unknown to us).

This is series member #29 of why drug development is hard, most of which concentrated on the fact that we don’t know all the players.  lncRNA-ACOD1 is different — RNA is a player we’ve known for a very long time  but it appears to be playing a game entirely new to us.

It is also good to see cutting edge research like this coming out of China.  Hopefully it will stand up, but enough questionable stuff has come from them that every Chinese paper is under a cloud.

This is why I love reading the current literature.  You never know what you’re going to find.  It’s like opening presents.

A few Thanksgiving thank you’s

As CEO of a very large organization, it’s time to thank those unsung divisions that make it all possible.  Fellow CEOs should take note and act appropriately regardless of the year it’s been for them.

First: thanks to the guys in shipping and receiving.  Kinesin moves the stuff out and Dynein brings it back home.  Think of how far they have to go.  The head office sits in area 4 of the cerebral cortex and K & D have to travel about 3 feet down to the motorneurons in the first sacral segment of the spinal cord controlling the gastrocnemius and soleus, so the boss can press the pedal on his piano when he wants. Like all good truckers, they travel on the highway.  But instead of rolling they jump.  The highway is pretty lumpy being made of 13 rows of tubulin dimers.

Now chemists are very detail oriented and think in terms of Angstroms (10^-10 meters), about the size of a hydrogen atom. As CEO and typical of cell biologists, I have to think in terms of the big picture, so I think in terms of nanoMeters (10^-9 meters).  Each tubulin dimer is 8 nanoMeters long, and K & D essentially jump from one to the other in 8 nanoMeter steps.  Now the boss is shrinking as he gets older, but my brothers working for players in the NBA have to go more than a meter to contract the gastrocnemius and soleus (among other muscles) to help their bosses jump.  So split the difference and call the distance they have to go one Meter.  How many jumps do Kinesin and Dynein have to make to get there? Just 10^9/8, or 125,000,000. The boys also have to jump from one microtubule to another, as the longest microtubule in our division is at most 100 microns (.1 milliMeter).  So even in the best of cases they have to make at least 10,000 transfers between microtubules.  It’s a miracle they get the job done at all.

To put this in perspective, consider a tractor trailer (not a truck; the part with the motor is the tractor, and the part pulled is the trailer. The distinction can be important, just like the difference between rifle and gun as anyone who’s been through basic training knows quite well).  Say the trailer is 48 feet long, and let that be comparable to the 8 nanoMeters K and D have to jump. That’s 125,000,000 jumps of 48 feet or about 1,136,000 miles.  It’s amazing they get the job done.
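
The whole analogy hangs on the step length, so here is a small sketch that leaves it as a parameter. The function names are mine, and the 8 nanoMeter value in the example call is the commonly cited kinesin step along a microtubule, offered as an assumption to vary rather than as gospel.

```python
FEET_PER_MILE = 5280

def motor_jumps(distance_m, step_nm):
    """Number of steps a motor protein needs to cover distance_m."""
    return distance_m * 1e9 / step_nm

def trailer_miles(jumps, trailer_ft=48):
    """Distance covered if each jump were one trailer length."""
    return jumps * trailer_ft / FEET_PER_MILE

def microtubule_transfers(distance_m, microtubule_um=100):
    """Minimum hand-offs if no microtubule is longer than microtubule_um."""
    return distance_m * 1e6 / microtubule_um

jumps = motor_jumps(1.0, step_nm=8)   # assumed step length, change to taste
print(f"{jumps:,.0f} jumps")
print(f"{trailer_miles(jumps):,.0f} miles in trailer lengths")
print(f"{microtubule_transfers(1.0):,.0f} microtubule transfers")
```

Whatever step length you plug in, the jump count and the mileage scale inversely with it, while the number of microtubule transfers does not change.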

Second: Thanks to probably the smallest member of the team.  The electron.  Its brain has to be tiny, yet it has mastered quantum mechanics because it knows how to tunnel through a potential barrier.   In order to produce the fuel for K and D it has to tunnel some 20 Angstroms from the di-copper center (CuA) to heme a in cytochrome C oxidase (COX).  Is the electron conscious? Who knows?  I don’t tell it what to do.   Now COX is just a part of one of our larger divisions, the power plant (the mitochondrion).

Third: The power plant.  Amazing to think that it was once (a billion years or more ago) a free living bacterium.  Somehow back in the mists of time one of our predecessors captured it.  The power plant produces gas (ATP) for the motors to work.  It’s really rather remarkable when you think of it.   Instead of carrying a tank of ATP, kinesin and dynein literally swim in the stuff, picking it up from the surroundings as they move down the microtubule.  Amazingly the entire division doesn’t burn up, but just uses the ATP when and where needed.  No spontaneous combustion.

There are some other unsung divisions to talk about (I haven’t forgotten you ladies in the steno pool, and your incredible accuracy — 1 mistake per 100,000,000 letters [ Science vol. 328 pp. 636 – 639 ’10 ]).  But that’s for next time.

To think that our organization arose by chance, working by finding a slightly better solution to problems it faces, boggles this CEO’s mind (but that’s the current faith; so good to see such faith in an increasingly secular world).

Antibodies without antibodies

If you knew exactly how an important class of antibodies interacted with its target, could you design a (relatively) small molecule to act the same way?  These people did, and the work has very exciting implications for infectious disease [ Science vol. 358 pp. 450 – 451, 496 – 502 ’17 ].

The influenza virus is a very slippery target.  Its genome is made of RNA, and copying it is quite error prone, so that mutants are formed all the time.  That’s why the vaccines of yesteryear are useless today.   However there are things called broadly neutralizing antibodies which work against many strains of the virus.  They attack a vulnerable site on the hemagglutinin protein (HA) of the virus.  The site is in the stem of HA, and binding of the antibody there prevents the conformational change required for the virus to escape the endosome, a fact interesting in itself in that it implies that the antibody only works after the virus enters the cell, although the authors do not explicitly state this.

Study of one broadly neutralizing antibody showed that binding to the site was mediated by a single hypervariable loop.  So the authors worked with a cyclic peptide mimicking the loop.  This has several advantages, in particular the fact that the entropic work of forcing a floppy protein chain into the binding conformation is already done before the peptide meets its target.

The final cyclic peptide contained 11 amino acids, of which 5 weren’t natural. It neutralized pandemic H1 and avian H5 influenza A strains at nanoMolar concentration.

It’s important that crystal structures of the broadly neutralizing antibody binding to HA were available — this requires atomic level resolution.  I’m not sure cryoEM is there yet.

We don’t understand amyloid very well

I must admit I was feeling pretty snarky about our understanding of amyloid and Alzheimer’s after the structure of Abeta42 was published.  In particular the structure explained why the alanine 2 –> threonine 2 mutation was protective against Alzheimer’s disease while the alanine 2 –> valine 2 mutation increases the risk.  That’s all explained in the last post (https://luysii.wordpress.com/2017/10/12/abeta42-at-last/) but a copy will appear at the end.

In that post I breathlessly hoped for the structure of Abeta40 which is known to be less toxic to neurons.  Well it’s here and it shows how little we understand about what does and what doesn’t form amyloid.  The structure appears in a paper about the amyloid formed by another protein (FUS) to be described later: Cell 171, 615–627, October 19, 2017, figure 7 p. 624.

Now all Abeta40 lacks are the last 2 amino acids of Abeta42 — isoleucine at 41 and alanine at 42.  So solve the Schrodinger equation for it, and stack it up so it forms amyloid, or use your favorite molecular dynamics or other modeling tool.  Take a guess what it looks like.

Abeta42 is a dimer, Abeta40 is a trimer, even though the first 40 amino acids of both are identical.

It gets worse. FUS (FUsed in Sarcoma) is a 526 amino acid protein which binds to RNA and is mostly found in the nucleus.  Neurologists are interested in it because over 50 mutations in it have been found in amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD).   FUS contains a low complexity domain (LCD) of 214 amino acids, 80% of which are one of 4 amino acids (glycine, serine, glutamine and tyrosine).  At high protein concentrations this domain of FUS forms long unbundled fibrils with the characteristic crossBeta structure of amyloid.  Only 57/214 of the LCD amino acids are part of the structured core of the amyloid; the rest are disordered.

Even worse the amino acids forming the amyloid core (#39 -#95) are NOT predicted by a variety of computational methods predicting amyloid formation (Agrescan, FISH, FOLDamyloid, Metamyl, PASTA 2.0).  The percentages of gly, ser, gln and tyr in the core forming region are pretty much the same as in the whole protein.  The core forming region has no repeats longer than 4 amino acids.

The same figure 7 has the structure of the amyloid formed by alpha-synuclein, which accumulates in the Lewy bodies of Parkinson’s disease.  It just has one peptide per layer of amyloid.

When you really understand something you can predict things, not just describe them as they are revealed.

 

Abeta42 at last

It’s easy to see why cryoEM got the latest chemistry Nobel.  It is telling us so much.  Particularly fascinating to me as a retired neurologist is the structure of the Abeta42 fibril reported in last Friday’s Science (vol. 358 pp. 116 – 119 ’17).

Caveats first.  The materials were prepared using an aqueous solution at low pH containing an organic cosolvent — so how physiologic could the structure actually be?  It probably is physiologic as the neurotoxicity of the fibrils to neurons in culture was the same as fibrils grown at neutral pH.  This still isn’t the same as fibrils grown in the messy concentrated chemical soup known as the cytoplasm.  Tending to confirm their findings is the fact that NMR and Xray diffraction on the crystals produced the same result.

The fibrils were unbranched and microns long (implying at least 2,000 layers of the beta sheets to be described).  The beta sheets stack in parallel and in register giving the classic crossBeta sheet structure.  They were made of two protofilaments winding around each other.  Each protofilament contains all 42 amino acids of Abeta42 and all of them form a completely flat beta sheet structure.
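The "at least 2,000 layers" figure is simple arithmetic. Here is a minimal sketch of it, assuming the usual cross-beta spacing of about 4.8 Angstroms between stacked layers (a textbook value I'm supplying, not a number quoted from the paper) and a fibril exactly 1 micron long.

```python
# Rough count of beta-sheet layers stacked along an amyloid fibril.
# ASSUMPTION: about 4.8 Angstroms between stacked layers (typical cross-beta
# spacing; not a value taken from the paper discussed above).
ANGSTROMS_PER_MICRON = 10_000
LAYER_SPACING_ANGSTROMS = 4.8


def layers_in_fibril(length_microns: float) -> int:
    """Return the approximate number of stacked layers in a fibril."""
    length_angstroms = length_microns * ANGSTROMS_PER_MICRON
    return int(length_angstroms / LAYER_SPACING_ANGSTROMS)


print(layers_in_fibril(1.0))  # layers in a 1 micron fibril
```

A fibril several microns long simply multiplies the count, which is why 2,000 is a floor rather than an estimate.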

Feast your eyes on figure 2 p. 117.  In addition to showing the two beta sheets of the two protofilaments, it shows how they bind to each other.  Aspartic acid #1 of one sheet binds to lysine #28 of the other.  Otherwise the interface is quite hydrophobic.  Alanine2 of one sheet binds to alanine42 of the other, valine39 of one sheet binds to valine 39 of the other.  Most importantly isoLeucine 41 of one sheet binds to glycine38 of the other.

This is important since the difference between the less toxic Abeta40 and the toxic Abeta42 is two hydrophobic amino acids, isoleucine 41 and alanine 42.  This makes for a tighter, longer, more hydrophobic interface between the protofilaments stabilizing them.

That’s just a guess.  I can’t wait for work on Abeta40 to be reported at this resolution.

A few other points.  The beta sheet of each protomer is quite planar, but the planes of the two protomers are tilted by 10 degrees accounting for the helicity of the fibril. The fibril is a rhombus whose longest edge is about 70 Angstroms.

Even better the structure explains a mutation which is protective against Alzheimer’s.  This remains the strongest evidence (to me at least) that Abeta peptides are significantly involved in Alzheimer’s disease, therapeutic failures based on this idea notwithstanding.  The mutation is a change of alanine2 to threonine which can’t possibly snuggle up hydrophobically to isoleucine nearly as well as alanine did. This should significantly weaken the link between the two protofilaments and make fibril formation more difficult.

The Abeta structure of the paper also explains another mutation. This one increases the risk of Alzheimer’s disease (like many others which have been discovered).  It involves the same amino acid (alanine2) but this time it is changed to the more hydrophobic valine, probably resulting in a stronger hydrophobic interaction with isoLeucine41 (assuming that valine’s greater bulk doesn’t get in the way sterically).

Wonderful stuff to think and speculate about, now that we actually have some solid data to chew on.

Forgotten but not gone

Life is said to have originated in the RNA world.  We all know about the big 3 important RNAs for the cell, mRNA, ribosomal RNA and transfer RNA.  But just like the water, sewer, power and subway systems under Manhattan, there is another world down there in the cell which doesn’t much get talked about.  These are RNAs, whose primary (and possibly only) function is to interact with other RNAs.

Start with microRNAs (of which we have at least 1,500 as of 12/12).  Their function is to bind to messenger RNA (mRNA) and inhibit translation of the mRNA into protein.  The effects aren’t huge, but they are a more subtle control of protein expression, than the degree of transcription of the gene.

Then there are ceRNAs (competitive endogenous RNAs) which have a large number of binding sites for microRNAs — humans have a variety of them all with horrible acronyms — HULC, PTCSC3 etc. etc. They act as sponges for microRNAs keeping them bound and quiet.

Then there are circular RNAs.  They’d been missed until recently, because typical RNA sequencing methods isolate only RNAs with characteristic tails, and a circular RNA doesn’t have any.  One such (called CiRS7/CDR1) contains 70 binding sites for one particular microRNA (miR-7).  They are unlikely to be trivial.  They are derived from 15% of actively transcribed genes.  They ‘can be’ 10 times as numerous as linear RNAs (like mRNA and everything else), probably because they are hard to degrade < Science vol. 340 pp. 440 – 441 ’17 >. So some of them are certainly RNA sponges, but all of them?

The latest, and most interesting class are the nonCoding RNAs found in viruses. Some of them function to attack cellular microRNAs and help the virus survive. Herpesvirus saimiri, a gamma-herpes virus, establishes latency in the T lymphocytes of New World primates by expressing 7 small nuclear uracil-rich nonCoding RNAs (called HSURs).  They associate with some microRNAs, and rather than blocking their function act as chaperones < Nature vol. 550 pp. 275 – 279 ’17 >.  The HSURs also bind to some mRNAs, inhibiting their function.  They do this by helping miR-16 bind to its targets, which is why they are called chaperones.  So viral Sm-class RNAs may function as microRNA adaptors.

Do you think for one minute that the cell isn’t doing something like this?

I have a tendency to think of RNAs as always binding to other RNAs by classic Watson Crick base pairing — this is wrong as a look at any transfer RNA structure will show. https://en.wikipedia.org/wiki/Transfer_RNA.  Far more complicated structures may be involved, but we’ve barely started to look.

Then there are the pseudogenes, which may also have a function, which is to be transcribed and sop up microRNAs and other things — I’ve already written about this — https://luysii.wordpress.com/2010/07/14/junk-dna-that-isnt-and-why-chemistry-isnt-enough/.  Breast cancer cells think one (PTEN1) is important enough to stop it from being transcribed, even though it can’t be translated into protein.

Does she or doesn’t she? Only her geneticist knows for sure

Back in the day there was a famous ad for Clairol: Does she or doesn’t she? Only her hairdresser knows for sure.  Now it’s the geneticist who can sequence genes for Two Pore Channels in pigment forming cells (melanocytes) who really knows.

Except for redheads, skin and hair color is determined by how much eumelanin you have.  All human melanins are polymers of oxidation products of tyrosine (DOPA, DOPAquinone) and indole 5,6 quinone, so their chemical structure isn’t certain.  Melanin is made inside a specialized organelle of the melanocyte called (logically enough) the melanosome.

There is all sorts of interesting chemistry and physiology involved.  In particular a melanosome protein called Pmel17 adopts an amyloid-like structure (so not all amyloid is bad !) for the construction of melanin.  The crucial enzyme oxidizing tyrosine is tyrosinase, and its activity strongly depends on pH, being most active at pH 7 (neutral pH).

In the melanosome membrane is found TPC2, which helps control ion flow in and out of the melanosome.  Two mutations Methionine #484 –> Leucine (or M484L) and Glycine #734 –> Glutamic acid (G734E) are associated with a shift from brown to blond.  You have blond hair if your melanosomes make less melanin.  Both mutations result in an increase in TPC2 activity resulting in lower pH, lower tyrosinase activity and less melanin in the melanosome — voila — a blond.

So it doesn’t take a big (one amino acid in over 734) change in the huge TPC2 protein for the shift to occur.

The emperor has no clothes

As an old organic chemist, I’ve always been fascinated with the size of proteins (n functional groups in a protein of length n, not counting the amide bonds), and the myriad of shapes they can assume.  It seems nothing short of miraculous (to me at least) that the proteins making us up assume just a few shapes out of the nearly 3^n possible shapes (avoiding self intersection removes a few).

This has been ‘explained’ by the potential energy funnel, down which newly formed proteins slide to their final few destinations.  Now I took quantum mechanics 56+ years ago, and back then a lot of heavy lifting was required just to calculate the potential energy surface required to bring two hydrogen atoms together to form molecular hydrogen.

I’ve never seen a potential energy surface for a protein actually calculated, and I’m not sure molecular dynamics simulations do this (please correct me if I’m wrong).

So I was glad to see the following in a paper by

S. WALTER ENGLANDER, Ph.D.

Jacob Gershon-Cohen Professor of Medical Science
Professor of Biochemistry and Biophysics

at my alma mater Penn Med (the hell with the Perelman’s, Penn sold themselves out to the Perelman’s very cheaply).

“A critical feature of the funneled ELT (Energy Landscape Theory) model is that the many-pathway residue-level conformational search must be biased toward native-like interactions. Otherwise, as noted by Levinthal, an unguided random search would require a very long time. How this bias might be implemented in terms of real protein interactions has never been discovered. One simply asserts that natural evolution has made it so, formulates this view as a so-called principle of minimal frustration, and attributes it to the shape of the funneled energy landscape.”

 Proc. Natl. Acad. Sci. vol. 114 pp. 8253 – 8258 ’17.

So the potential energy funnel of energy landscape theory is not something you can calculate explicitly (like a gravitational or an electrical potential), but just a high-falutin’ description of what happens inside our cells, masquerading as an explanation.

So when does a description become an explanation?  Newton famously said Hypotheses non fingo (Latin for “I feign no hypotheses”) when discussing the action at a distance which his theory of gravity entailed.

Well it becomes an explanation when you can use the description to predict and define new phenomena — e.g. using Newton’s laws to send a projectile to Jupiter, using Einstein’s theory of gravitation to predict black holes and gravitational waves etc. etc.

In this sense Energy Landscape Theory is just words.  If it wasn’t you could predict the shape an arbitrary string of amino acids would assume (and you can’t).  Theory does work fairly well when folding algorithms are given a protein of known shape (but not published), but try them out on an arbitrary string — which I don’t think has been done.

But it gets worse.  ELT sweeps the problem of why a protein should have one (or a few) shapes under the rug, by assuming that they do.  I’m far from convinced that this is the case in general, which means that the proteins which make us up are quite special.

I’ll conclude with an earlier post on this subject, which basically says that an experiment to decide the issue, while possible in theory is physically impossible to fully perform.

A chemical Gedanken experiment

This post is mostly something I posted on the Skeptical Chymist 2 years ago.  Along with the previous post “Why should a protein have just one shape (or any shape for that matter)” both will be referred to in the next one –“Gentlemen start your motors”, concerning the improbability of the chemistry underlying our existence and whether it is reasonable to believe that it arose by chance.

In the early days of quantum mechanics Einstein and Bohr threw thought experiments (gedanken experiments) at each other like teenagers tossing cherry bombs.  None of the gedanken experiments were regarded as remotely possible back then, although thanks to Bell and Aspect, quantum nonlocality and entanglement now have a solid experimental basis.  To read more about this you can’t do much better than “The Age of Entanglement” by Louisa Gilder.

Frankly, I doubt that most strings of amino acids have a dominant shape (e.g., biological meaning), and even if they did, they couldn’t find it quickly enough (the Levinthal paradox).  For details see the previous post.

How would you prove me wrong? The same way you’d prove a pair of dice was loaded. Just make (using solid-phase protein synthesis a la Merrifield) a bunch of random strings of amino acids (each 41 amino acids long) and see how many have a dominant shape. Any sequence forming a crystal does have a dominant shape; if the sequence doesn’t crystallize, use NMR to look at it in solution. You can’t make all of them, because the earth doesn’t have enough mass to do so (see https://luysii.wordpress.com/2009/12/20/how-many-proteins-can-be-made-using-the-entire-earth-mass-to-do-so/). That’s why this is a gedanken experiment: it can’t possibly be performed in toto.

Even so, the experiment is over (and I’m wrong) if even 1% of the proteins you make turn out to have a dominant shape.

However, choosing a random string of amino acids is far from trivial. Some amino acids appear more frequently than others depending on the protein. Proteins are definitely not a random collection of amino acids. Consider collagen. In its various forms (there are over 20, coded for by at least 30 distinct genes) collagen accounts for 25% of body protein. Statistically, each of the 20 amino acids should account for 5% of the protein, yet one amino acid (glycine) accounts for 30% and proline another 15%. Even knowing this, the statistical chances of producing 300 copies in a row of glycine–any amino acid–any amino acid by a random distribution of the glycines are less than zilch. But one type of bovine collagen protein has over 300 such copies in its 1042 amino acids.
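Just how small is "less than zilch"? Here is a rough sketch, under assumptions that are mine and purely illustrative: residues are drawn independently, glycine turns up with the 30% collagen frequency quoted above, and we ask only that one particular set of 300 positions (every third residue) all be glycine. It ignores where the run could start, so treat it as an order of magnitude and no more.

```python
import math

# ASSUMPTIONS (illustrative only): residues are drawn independently, and
# glycine turns up with probability 0.30, the collagen frequency quoted above.
P_GLYCINE = 0.30
REPEATS = 300  # consecutive glycine-X-Y triplets


def log10_prob_all_glycine(repeats: int, p_gly: float) -> float:
    """log10 of the chance that `repeats` chosen positions are all glycine."""
    return repeats * math.log10(p_gly)


exponent = log10_prob_all_glycine(REPEATS, P_GLYCINE)
print(f"probability is about 10^{exponent:.0f}")
```

Working in logarithms avoids underflow, and the answer comes out around 10^-157 even when glycine is given its generous 30% share.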

One further example of the nonrandomness of proteins. If you were picking out a series of letters randomly hoping to form a word, you would not expect a series of 10 ‘a’s to show up. But we normally contain many such proteins, and for some reason too many copies of the repeated amino acid produce some of the neurological diseases I (ineffectually) battled as a physician. Normal people have 11 to 34 glutamines in a row in a huge (molecular mass 384 kiloDaltons, which is over 3000 amino acids) protein known as huntingtin. In those unfortunate individuals with Huntington’s chorea, the number of repeats expands to over 40. One of Max Perutz’s last papers [Proc. Natl. Acad. Sci. USA 99, 5591–5595 (2002)] tried to figure out why this was so harmful.

On to the actual experiment. Suppose you had made 1,000,000 distinct random sequence proteins containing 41 amino acids and none of them had a dominant shape. This proves/disproves nothing. 10^6 is fewer than the possibilities inherent in a string of 5 amino acids, and you’ve only explored 10^6/(20^41) of the possibilities.
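The two comparisons in that paragraph are easy to check. A minimal sketch, with variable names of my own choosing, using Python's exact integer arithmetic:

```python
# Compare a million tested sequences with the size of sequence space.
AMINO_ACIDS = 20
tested = 10**6

five_mers = AMINO_ACIDS**5         # all possible strings of 5 amino acids
forty_one_mers = AMINO_ACIDS**41   # all possible strings of 41 amino acids

print(tested < five_mers)          # is a million fewer than all the 5-mers?
fraction = tested / forty_one_mers
print(f"fraction of 41-mers explored: {fraction:.1e}")
```

A million sequences doesn't even cover the 3,200,000 possible 5-mers, and the explored fraction of 41-mer space is of order 10^-48.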

Would Karl Popper, philosopher of science, even allow the question of how commonly proteins have a dominant shape to be called scientific? Much of what I know about Popper comes from a fascinating book “Wittgenstein’s Poker” and it isn’t pleasant. Questions not resolvable by experiment fall outside Popper’s canon of questions scientific. The gedanken experiment described can resolve the question one way, but not the other. In this respect it’s like the halting problem in computer science (there is no general rule to tell if a program will terminate).

Would Ludwig Wittgenstein, uberphilosopher, think the question philosophical? Probably not. His major work “Tractatus Logico-Philosophicus” concludes with “What we cannot speak of we must pass over in silence”. While he’s the uberphilosopher he’s also the antiscientist. It’s exactly what we don’t know which leads to the juiciest speculation and most creative experiments in any field of science. That’s what I loved about organic chemistry years ago (and now). It is nearly always possible to design a molecule from scratch to test an idea. There was no reason to make [7]paracyclophane, other than to get up close and personal with the ring current.

If the probability or improbability of our existence, to which the gedanken experiment speaks, isn’t a philosophical question, what is?

Back then, this post produced the following excellent comment.

I’m not sure your assessment of what Popper would regard as science is accurate. Popper advocated “falsifiability”, i.e. that a statement cannot be proved true, only false. Non-scientific statements are those for which evidence that they are false cannot be found. You are in fact giving a perfect example of a situation where falsifiability is useful. If you tested, as you suggested, a million random proteins and many of them formed structures reliably, this would in fact disprove the hypothesis fairly conclusively (if only probabilistically). The fact that the test was passed by the first million proteins would be evidence that the theory was true (though obviously not concrete).

Also, it is relatively easy to choose what random proteins to make. Just use a random number generator (a pseudorandom generator would do too, probably). It doesn’t matter that they would be unlikely to produce a specific sequence generated in nature, as we are looking at specifically wanting to look at random sequences. The idea that 300 glycines is particularly unusual if protein generation is random is probably one which should be treated with a degree of caution. As the sequence was not specified as an unusual sequence beforehand, there are a large number of possible sequences that you could have seized on, and so care is needed.

This is only the most obvious experiment that could be carried out to test this idea, and I’m sure with advances, there is the distinct chance that more ingenious ways could be devised.

Additionally the mass restriction is not in fact terribly useful except as an illustration that there is a massively large number of proteins, as once you have made and tested a protein, you can in fact reuse its atoms to make another protein.

Finally, I haven’t read Wittgenstein, but that final quote does not really support your statement that he is “anti-science” or would be against the production of novel cyclophanes. Organic chemistry clearly lies in the realm of “what we can speak”, as we are in fact speaking about it.

Posted by: MCliffe

My response —

MCliffe — thank you for your very thoughtful comments on the post. It’s great to know that someone out there is reading them.

Popper and the logical positivists solved many philosophical problems by declaring them meaningless (which Popper later took to mean not falsifiable). Things got to such a point in the 50s that Bertrand Russell was moved to come up with the meaningful (to most) but non-falsifiable statement: In the event of a nuclear war we shall all be dead.

You are quite right that it is easy to make a random sequence of amino acids using a computer. It’s been shown again and again that our intuitive notion of randomness is usually incorrect. I chose collagen because it is the most common protein in our body, and because it is highly nonrandom. Huntingtin was used because I dealt with its effects as a Neurologist (and because there are 8 more diseases with too many identical amino acids in a row — all of which for some unfathomable reason produce neurologic disease — they are called triplet diseases because it takes 3 nucleotides of DNA to code for a single amino acid).

Even accepting 300 glycines in 1000 or so amino acids (collagen) and putting that frequency into the random generator and turning it on, we would not expect those 300 glycines to appear at position n+1, n+4, n+7, . . . , n+898 randomly.
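Anyone who wants to turn the random generator on can do so in a few lines. What follows is only a sketch under my own assumptions: residues are drawn independently, glycine is given probability 0.30, the other 19 amino acids share the rest equally, and the helper names are invented for the illustration. It then looks for the longest stretch in which every third residue is glycine.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes for the 20 amino acids


def random_protein(length: int, p_gly: float, rng: random.Random) -> str:
    """Draw residues independently, glycine with probability p_gly."""
    others = AMINO_ACIDS.replace("G", "")
    weights = [p_gly] + [(1 - p_gly) / len(others)] * len(others)
    return "".join(rng.choices("G" + others, weights=weights, k=length))


def longest_gly_xy_run(seq: str) -> int:
    """Longest run of positions spaced 3 apart that are all glycine."""
    best = 0
    for start in range(len(seq)):
        run = 0
        pos = start
        while pos < len(seq) and seq[pos] == "G":
            run += 1
            pos += 3
        best = max(best, run)
    return best


rng = random.Random(0)  # fixed seed so the run is repeatable
seq = random_protein(1042, 0.30, rng)  # same length as the bovine collagen above
print(longest_gly_xy_run(seq))
```

With these settings the longest run should be a handful of triplets, nowhere near the 300 that collagen manages.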

The idea of using the atoms over and over to escape the mass restriction is clever. Unfortunately it runs up against a time restriction. Let us suppose there is a super-industrious post-doc who can make a new protein every nanosecond (reusing the atoms). There are 60 * 60 * 24 * 365 = 31,536,000 ~ 10^7 seconds in a year and 10^10 years (more or less) since the big bang. This is 10^9 * 10^7 * 10^10 = 10^26 different proteins he could make since the dawn of time. But there are 20^41 = 2^41 * 10^41 proteins of length 41 amino acids. 2^41 = 2,199,023,255,552 ~ 10^12. So he has only tested 10^26 of 10^53 possible 41 amino acid proteins in all this time.
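For anyone who wants to check the post-doc's arithmetic without the rounding, here is a minimal sketch. The constant names are mine, and the rate and the age of the universe are the same round figures used above.

```python
import math

SECONDS_PER_YEAR = 60 * 60 * 24 * 365   # 31,536,000
PROTEINS_PER_SECOND = 10**9             # one new protein every nanosecond
YEARS_SINCE_BIG_BANG = 10**10           # the round figure used above

made = PROTEINS_PER_SECOND * SECONDS_PER_YEAR * YEARS_SINCE_BIG_BANG
possible = 20**41                       # all 41 amino acid sequences

print(f"made:     about 10^{math.log10(made):.1f}")
print(f"possible: about 10^{math.log10(possible):.1f}")
gap = math.log10(made) - math.log10(possible)
print(f"fraction tested: about 10^{gap:.1f}")
```

Keeping the unrounded factors nudges both exponents up a little, but the gap between what he could make and what is possible stays at roughly 27 orders of magnitude.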

This is what I was getting at by saying that the gedanken experiment was not a priori falsifiable: we lack the time, space and mass to run it to completion. As you note, it could well end quite early if I’m wrong. Suppose 10^9/10^53 of the proteins DO have a dominant shape; the postdoc will be very unlikely to find any of them.

I think your final point is well taken. My reading of “Wittgenstein’s Poker” is that what he was saying in his last sentence really was “What we cannot speak (with certainty) of we must pass over in silence”. We cannot speak of the outcome of this Gedanken experiment with any degree of certainty.

Once again Thanks