The emperor has no clothes

As an old organic chemist, I’ve always been fascinated with the size of proteins (n functional groups in a protein of length n — not counting the amide bonds), and the myriad shapes they can assume. It seems nothing short of miraculous (to me at least) that the proteins making us up assume just a few shapes out of the nearly 3^n possible shapes (avoiding self intersection removes a few).

This has been ‘explained’ by the potential energy funnel, down which newly formed proteins slide to their final few destinations.  Now I took quantum mechanics 56+ years ago, and back then a lot of heavy lifting was required just to calculate the potential energy surface required to bring two hydrogen atoms together to form molecular hydrogen.

I’ve never seen a potential energy surface for a protein actually calculated, and I’m not sure molecular dynamics simulations do this (please correct me if I’m wrong).

So I was glad to see the following in a paper by

S. WALTER ENGLANDER, Ph.D.

Jacob Gershon-Cohen Professor of Medical Science
Professor of Biochemistry and Biophysics

at my alma mater Penn Med (the hell with the Perelmans, Penn sold themselves out to the Perelmans very cheaply).

“A critical feature of the funneled ELT (Energy Landscape Theory) model is that the many-pathway residue-level conformational search must be biased toward native-like interactions. Otherwise, as noted by Levinthal, an unguided random search would require a very long time. How this bias might be implemented in terms of real protein interactions has never been discovered. One simply asserts that natural evolution has made it so, formulates this view as a so-called principle of minimal frustration, and attributes it to the shape of the funneled energy landscape.”

 Proc. Natl. Acad. Sci. vol. 114 pp. 8253 – 8258 ’17.

So the potential energy funnel of energy landscape theory is not something you can calculate explicitly (like a gravitational or an electrical potential), but just a high-falutin’ description of what happens inside our cells, masquerading as an explanation.

So when does a description become an explanation? Newton famously said Hypotheses non fingo (Latin for “I feign no hypotheses”) when discussing the action at a distance which his theory of gravity entailed.

Well it becomes an explanation when you can use the description to predict and define new phenomena — e.g. using Newton’s laws to send a projectile to Jupiter, using Einstein’s theory of gravitation to predict black holes and gravitational waves etc. etc.

In this sense Energy Landscape Theory is just words.  If it wasn’t you could predict the shape an arbitrary string of amino acids would assume (and you can’t).  Theory does work fairly well when folding algorithms are given a protein of known shape (but not published), but try them out on an arbitrary string — which I don’t think has been done.

But it gets worse.  ELT sweeps the problem of why a protein should have one (or a few) shapes under the rug, by assuming that they do.  I’m far from convinced that this is the case in general, which means that the proteins which make us up are quite special.

I’ll conclude with an earlier post on this subject, which basically says that an experiment to decide the issue, while possible in theory is physically impossible to fully perform.

A chemical Gedanken experiment

This post is mostly something I posted on the Skeptical Chymist 2 years ago.  Along with the previous post “Why should a protein have just one shape (or any shape for that matter)” both will be referred to in the next one –“Gentlemen start your motors”, concerning the improbability of the chemistry underlying our existence and whether it is reasonable to believe that it arose by chance.

In the early days of quantum mechanics Einstein and Bohr threw thought experiments (gedanken experiments) at each other like teenagers tossing cherry bombs.  None of the gedanken experiments were regarded as remotely possible back then, although thanks to Bell and Aspect, quantum nonlocality and entanglement now have a solid experimental basis.  To read more about this you can’t do much better than “The Age of Entanglement” by Louisa Gilder.

Frankly, I doubt that most strings of amino acids have a dominant shape (e.g., biological meaning), and even if they did, they couldn’t find it quickly enough (the Levinthal paradox). For details see the previous post.

How would you prove me wrong? The same way you’d prove a pair of dice was loaded. Just make (using solid-phase protein synthesis a la Merrifield) a bunch of random strings of amino acids (each 41 amino acids long) and see how many have a dominant shape. Any sequence forming a crystal does have a dominant shape; if the sequence doesn’t crystallize, use NMR to look at it in solution. You can’t make all of them, because the earth doesn’t have enough mass to do so (see https://luysii.wordpress.com/2009/12/20/how-many-proteins-can-be-made-using-the-entire-earth-mass-to-do-so/). That’s why this is a gedanken experiment — it can’t possibly be performed in toto.

Even so, the experiment is over (and I’m wrong) if even 1% of the proteins you make turn out to have a dominant shape.

However, choosing a random string of amino acids is far from trivial. Some amino acids appear more frequently than others depending on the protein. Proteins are definitely not a random collection of amino acids. Consider collagen. In its various forms (there are over 20, coded for by at least 30 distinct genes) collagen accounts for 25% of body protein. Statistically, each of the 20 amino acids should account for 5% of the protein, yet one amino acid (glycine) accounts for 30% and proline another 15%. Even knowing this, the statistical chances of producing 300 copies in a row of glycine–any amino acid–any amino acid by a random distribution of the glycines are less than zilch. But one type of bovine collagen protein has over 300 such copies in its 1042 amino acids.
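To put a rough number on “less than zilch”, here is a minimal Python sketch. It assumes every position is drawn independently with glycine at the 30% frequency quoted above, and asks only for the chance that glycine lands on 300 positions fixed in advance. The function name and the independence model are illustrative assumptions, not a claim about how collagen sequences actually arise.

```python
import math

def log10_prob_all_glycine(n_positions, glycine_freq):
    """log10 of the chance that every one of n_positions is glycine,
    assuming independent draws at a fixed frequency (illustrative model)."""
    return n_positions * math.log10(glycine_freq)

# 300 copies of glycine-X-Y put glycine at 300 fixed positions.
log10_p = log10_prob_all_glycine(300, 0.30)
print(f"probability is roughly 10^{log10_p:.0f}")
```

Allowing the run to start at any of several hundred positions in a 1042 amino acid chain raises this by only two or three powers of ten, which leaves the conclusion untouched.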

One further example of the nonrandomness of proteins. If you were picking out a series of letters randomly hoping to form a word, you would not expect a series of 10 ‘a’s to show up. But we normally contain many such proteins, and for some reason too many copies of the repeated amino acid produce some of the neurological diseases I (ineffectually) battled as a physician. Normal people have 11 to 34 glutamines in a row in a huge (molecular mass 384 kiloDaltons — that’s over 3000 amino acids) protein known as huntingtin. In those unfortunate individuals with Huntington’s chorea, the number of repeats expands to over 40. One of Max Perutz’s last papers [Proc. Natl. Acad. Sci. USA 99, 5591–5595 (2002)] tried to figure out why this was so harmful.

On to the actual experiment. Suppose you had made 1,000,000 distinct random sequence proteins containing 41 amino acids and none of them had a dominant shape. This proves/disproves nothing. 10^6 is fewer than the possibilities inherent in a string of 5 amino acids, and you’ve only explored 10^6/(20^41) of the possibilities.
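The two claims in that paragraph are easy to check with exact integer arithmetic. This is a small sketch; the constant and function names are made up for illustration.

```python
from fractions import Fraction

AMINO_ACIDS = 20      # size of the standard amino acid alphabet
TESTED = 10**6        # sequences made in the thought experiment

def sequence_space(length, alphabet=AMINO_ACIDS):
    """Number of distinct chains of the given length."""
    return alphabet ** length

# A million is fewer than the number of possible 5-mers.
print(TESTED < sequence_space(5), sequence_space(5))

# Fraction of all 41-mers covered by a million sequences.
fraction_explored = Fraction(TESTED, sequence_space(41))
print(f"{float(fraction_explored):.1e}")
```

Python integers are arbitrary precision, so 20^41 is computed exactly before the final conversion to a float for display.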

Would Karl Popper, philosopher of science, even allow the question of how commonly proteins have a dominant shape to be called scientific? Much of what I know about Popper comes from a fascinating book “Wittgenstein’s Poker” and it isn’t pleasant. Questions not resolvable by experiment fall outside Popper’s canon of questions scientific. The gedanken experiment described can resolve the question one way, but not the other. In this respect it’s like the halting problem in computer science (there is no general rule to tell if a program will terminate).

Would Ludwig Wittgenstein, uberphilosopher, think the question philosophical? Probably not. His major work “Tractatus Logico-Philosophicus” concludes with “What we cannot speak of we must pass over in silence”. While he’s the uberphilosopher he’s also the antiscientist. It’s exactly what we don’t know which leads to the juiciest speculation and most creative experiments in any field of science. That’s what I loved about organic chemistry years ago (and now). It is nearly always possible to design a molecule from scratch to test an idea. There was no reason to make [7]paracyclophane, other than to get up close and personal with the ring current.

If the probability or improbability of our existence, to which the gedanken experiment speaks, isn’t a philosophical question, what is?

Back then, this post produced the following excellent comment.

I’m not sure your assessment of what Popper would regard as science is accurate. Popper advocated “falsifiability”, i.e. that a statement cannot be proved true, only false. Non-scientific statements are those for which evidence that they are false cannot be found. You are in fact giving a perfect example of a situation where falsifiability is useful. If you tested, as you suggested, a million random proteins and many of them formed structures reliably, this would in fact disprove the hypothesis fairly conclusively (if only probabilistically). The fact that the test was passed by the first million proteins would be evidence that the theory was true (though obviously not concrete).

Also, it is relatively easy to choose what random proteins to make. Just use a random number generator (a pseudorandom generator would do too, probably). It doesn’t matter that they would be unlikely to produce a specific sequence generated in nature, as we specifically want to look at random sequences. The idea that 300 glycines is particularly unusual if protein generation is random is probably one which should be treated with a degree of caution. As the sequence was not specified as an unusual sequence beforehand, there are a large number of possible sequences that you could have seized on, and so care is needed.

This is only the most obvious experiment that could be carried out to test this idea, and I’m sure with advances, there is the distinct chance that more ingenious ways could be devised.

Additionally, the mass restriction is not in fact terribly useful except as an illustration that there is a massively large number of proteins, as once you have made and tested a protein, you can in fact reuse its atoms to make another protein.

Finally, I haven’t read Wittgenstein, but that final quote does not really support your statement that he is “anti-science” or would be against the production of novel cyclophanes. Organic chemistry clearly lies in the realm of “what we can speak”, as we are in fact speaking about it.

Posted by: MCliffe

My response —

MCliffe — thank you for your very thoughtful comments on the post. It’s great to know that someone out there is reading them.

Popper and the logical positivists solved many philosophical problems by declaring them meaningless (which Popper later took to mean not falsifiable). Things got to such a point in the 50s that Bertrand Russell was moved to come up with the meaningful (to most) but non-falsifiable statement — In the event of a nuclear war we shall all be dead.

You are quite right that it is easy to make a random sequence of amino acids using a computer. It’s been shown again and again that our intuitive notion of randomness is usually incorrect. I chose collagen because it is the most common protein in our body, and because it is highly nonrandom. Huntingtin was used because I dealt with its effects as a Neurologist (and because there are 8 more diseases with too many identical amino acids in a row — all of which for some unfathomable reason produce neurologic disease — they are called triplet diseases because it takes 3 nucleotides of DNA to code for a single amino acid).

Even accepting 300 glycines in 1000 or so amino acids (collagen) and putting that frequency into the random generator and turning it on, we would not expect those 300 glycines to appear at position n, n+3, n+6, . . . , n+897 randomly.

The idea of using the atoms over and over to escape the mass restriction is clever. Unfortunately it runs up against a time restriction. Let us suppose there is a super-industrious post-doc who can make a new protein every nanosecond (reusing the atoms). There are 60 * 60 * 24 * 365 = 31,536,000 ~ 10^7 seconds in a year and 10^10 years (more or less) since the big bang. This is 10^9 * 10^7 * 10^10 = 10^26 different proteins he could make since the dawn of time. But there are 20^41 = 2^41 * 10^41 proteins of length 41 amino acids. 2^41 = 2,199,023,255,552 ~ 10^12. So he has only tested 10^26 of 10^53 possible 41 amino acid proteins in all this time.
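For anyone who wants to rerun the post-doc arithmetic without rounding each factor to a power of ten, here is a short sketch. The one-protein-per-nanosecond rate and the 10^10 year age are the figures used above; the variable names are illustrative.

```python
import math

SECONDS_PER_YEAR = 60 * 60 * 24 * 365   # 31,536,000
PROTEINS_PER_SECOND = 10**9             # one new protein every nanosecond
YEARS_SINCE_BIG_BANG = 10**10           # "more or less", as in the text

made = PROTEINS_PER_SECOND * SECONDS_PER_YEAR * YEARS_SINCE_BIG_BANG
possible = 20**41                       # all chains of 41 amino acids

print(f"made:     about 10^{math.log10(made):.1f}")
print(f"possible: about 10^{math.log10(possible):.1f}")
print(f"fraction tested: about 10^{math.log10(made) - math.log10(possible):.1f}")
```

Keeping the exact factors shifts the exponents by a fraction of a unit each, so the gap stays at roughly 27 orders of magnitude.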

This is what I was getting at by saying that the gedanken experiment was not a priori falsifiable — we lack the time, space and mass to run it to completion. As you note, it could well end quite early if I’m wrong. Suppose 10^9/10^53 of the proteins DO have a dominant shape — the postdoc will be very unlikely to find any of them.

I think your final point is well taken. My reading of “Wittgenstein’s Poker” is that what he was saying in his last sentence really was “What we cannot speak (with certainty) of we must pass over in silence”. We cannot speak of the outcome of this Gedanken experiment with any degree of certainty.

Once again Thanks

What our DNA looks like inside a living cell

Time to rewrite the textbooks.  DNA in the living cell looks nothing like the pictures that have appeared in textbooks for years. Gone are the 30 nanoMeter fiber and higher order structures.

Here is the old consensus of how DNA in the nucleus is organized.

There are two different structural models of the 30 nanoMeter fiber (1) solenoid — diameter 33 nanoMeters with 6 nucleosomes every 11 nanoMeters along the axis (2) two start zigzag fiber — diameter 27 – 30 nanoMeters with 5 – 6 nucleosomes every 11 nanoMeters.
The 30 nanoMeter fiber is thought to assemble into helically folded 120 nanoMeter chromonema, 300 – 700 nanoMeter chromatids and mitotic chromosomes (1,400 nanoMeters). The chromonema structures (measured between 100 and 130 nanoMeters) are based on electron micrographic studies of permeabilized nuclei from which other components have been extracted with detergents and high salt to visualize chromatin — hardly physiologic.

Got all that?  Good, now forget it.  It’s wrong.

First off, forget nanoMeters.  Organic chemists think in Angstroms — the diameter of the smallest atom Hydrogen is almost exactly 1 Angstrom, making it the perfect organic chemical yardstick.  If you must think in nanoMeters, just divide the number of Angstroms by 10.

First, a few numbers to get started.
 The classic form of DNA is B-DNA (this is still correct). https://en.wikipedia.org/wiki/Nucleic_acid_double_helix.  Each nucleotide pair is 3.4 Angstroms above the next and there are 10.4 nucleotides per turn of the helix  (so 1 full turn of B DNA is 35.36 Angstroms).  The diameter of B-DNA is 19 Angstroms.

The nucleosome consists of 147 bases of DNA wrapped around a central mass made of 8 histone proteins. The histone octamer is made of two copies each of histones H2A, H2B, H3 and H4.  The core particle in its entirety is 100 Angstroms in diameter and 57 Angstroms along the axis of the disk and possesses nearly dyadic symmetry.  There are 1.65 turns of DNA around the histone octamer, and during the trip there are 14 contact points between histones and DNA.
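Since the post switches between Angstroms and nanoMeters, a small conversion sketch may help keep the numbers straight. It uses only figures quoted above; the constant and function names are illustrative.

```python
ANGSTROMS_PER_NM = 10.0
RISE_PER_BASE_PAIR = 3.4    # Angstroms between stacked nucleotide pairs
BASE_PAIRS_PER_TURN = 10.4  # B-DNA
NUCLEOSOME_DNA = 147        # bases wrapped around one histone octamer

def to_nanometers(angstroms):
    """Divide the number of Angstroms by 10."""
    return angstroms / ANGSTROMS_PER_NM

turn_length = RISE_PER_BASE_PAIR * BASE_PAIRS_PER_TURN  # Angstroms per helical turn
wrapped_length = NUCLEOSOME_DNA * RISE_PER_BASE_PAIR    # Angstroms of DNA per nucleosome

print(f"one turn of B-DNA: {turn_length:.2f} A = {to_nanometers(turn_length):.3f} nm")
print(f"DNA on one nucleosome: {wrapped_length:.1f} A")
print(f"core particle: {to_nanometers(100):.0f} nm across, {to_nanometers(57):.1f} nm thick")
```

So roughly 500 Angstroms of double helix is wound onto a disk only 100 Angstroms across.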

Now on to the actual paper [ Science vol. 357 pp. 354 – 355, 370, eaag0025 1 –> 13 ’17 ]. The movies contained within alone are worth a year’s subscription to Science.

To visualize DNA in the living cells the authors invented a technique called  Chromatin Electron Microscopy Tomography (ChromEMT).

DNA is transparent to electrons. They use a fluorescent DNA binding dye, DRAQ5 (Deep Red fluorescing AnthraQuinone Nr. 5). For a structure see — http://onlinelibrary.wiley.com/doi/10.1002/1097-0320(20000801)40:4%3C280::AID-CYTO4%3E3.0.CO;2-7/full. It has 3 probably aromatic rings fused together like anthracene, so it could easily intercalate between the bases of the double helix. Then there are OH groups and amines to bind to the backbone. The dye gets into cells easily. Most importantly, DRAQ5 produces reactive oxygen species when hit by the right kind of light. Somehow they get diaminobenzidine in the cells, which the reactive oxygen species polymerizes to polybenzimidazole.

We’re not done yet. The polymer is also transparent to electrons, but it can react with good old Osmium tetroxide (which is electron dense), permitting visualization of DNA on electron microscopy (at last).

The technique is the first that can be used in living cells. It shows that chromatin in the nucleus is mostly organized as a disordered polymer of 50 to 240 Angstroms diameter. This is consistent with beads on a string (with nucleosomes being the beads). They found little evidence for higher order structures (the 300 to 1,200 Angstrom fibers of classic textbook models — which are in fact based on in vitro visualization of non-native chromatin). The 30 nanoMeter chromatin fiber (300 Angstroms) is nowhere to be seen. However, they do find 300 Angstrom fibers using their new method but only in nuclei purified from hypotonically lysed chicken RBCs treated with MgCl2 (hardly physiologic).

They were able to make a movie of an electron micrograph in the nucleus using eight tilts of the stage. There is more DNA at the nuclear rim (as that’s where the heterochromatin mostly is), but you still see the little 5 – 24 nanoMeter circles (just more of them the closer you get to the nuclear membrane).

Another movie of a mitotic chromosome shows the same little circles (50 – 240 Angstroms) just packed together more closely. You just see a lot of them, but there is no obvious bunching of them into higher structures.

The technique (ChromEMT) is amazing in that it allows the ultrastructure of individual chromatin chains, megabase domains and mitotic chromosomes to be resolved and visualized as a continuum in serial slices. They found that the 5 – 12 and 12 – 24 nanoMeter chromatin diameters were the same regardless of how heavily the chromatin was compacted.

The paper is incredible and worth a year’s subscription to Science. It likely is behind a paywall.

It’s hard to get your mind around the amount of compaction involved in getting the meter of DNA of the human genome into a nucleus. Molecular Biology of the Cell 4th Edition p. 198 put it this way — Compacting the meter of DNA into a 6 micron nucleus is like compacting 24 miles of very fine thread into a tennis ball.

I actually wrote a series of posts, trying to put the amount of compaction into human scale. Here is the first post — follow the links at the end to the others.

The cell nucleus and its DNA on a human scale – I

The nucleus is a very crowded place, filled with DNA, proteins packing up DNA, proteins patching up DNA, proteins opening up DNA to transcribe it etc. Statements like this produce no physical intuition of the sizes of the various players (to me at least).  How do you go from the 1 Angstrom hydrogen atom, the 3.4 Angstrom thickness per nucleotide (base) of DNA, the roughly 20 Angstrom diameter of the DNA double helix, to any intuition of what it’s like inside a spherical nucleus with a diameter of 10 microns?

How many bases are in the human genome?  It depends on who you read — but 3 billion (3 * 10^9) is a lowball estimate — Wikipedia has 3.08, some sources have 3.4 billion.  3 billion is a nice round number.  How physically long is the genome?  Put the DNA into the form seen in most textbooks — e.g. the double helix.  Well, an Angstrom is one ten billionth (10^-10) of a meter, and multiplying it out we get

3 * 10^9 (bases/genome) * 3.4 * 10^-10 (meters/base) = 1 (meter).

The diameter of a typical nucleus is 10 microns (10 one millionths of a meter == 10 * 10^-6 = 10^-5 meter). So we’ve got to fit the textbook picture of our genome into something 1/100,000 as long. We’ll definitely have to bend it like Beckham.
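The genome length and the compaction factor can be rechecked in a few lines. This sketch takes the round 3 billion bases and the 10 micron nucleus from the text; the variable names are illustrative.

```python
ANGSTROM = 1e-10              # meters
BASES = 3e9                   # round-number genome size used in the text
RISE = 3.4 * ANGSTROM         # meters of helix per base
NUCLEUS_DIAMETER = 10e-6      # meters (10 microns)

genome_length = BASES * RISE                  # meters of stretched-out DNA
compaction = genome_length / NUCLEUS_DIAMETER # how many nucleus widths long it is

print(f"genome length: {genome_length:.2f} m")
print(f"length / nucleus diameter: {compaction:,.0f}")
```

The unrounded answer is 1.02 meters, so calling it 1 meter and a factor of 100,000 loses only 2%.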

As a chemist I think in Angstroms, as a biologist in microns and millimeters, but as an American I think in feet and inches.  To make this stuff comprehensible, think of driving from New York City to Seattle.  It’s 2840 miles or 14,995,200 feet (according to one source on the internet). Now we’re getting somewhere.  I know what a foot is, and I’ve driven most of those miles at one time or other.  Call it 15 million feet, and pack this length down by a factor of 100,000.  It’s 150 feet, half the size of a (US) football field.

Next, consider how thick DNA is relative to its length. 20 Angstroms is 20 * 10^-10 meters or 2 nanoMeters (2 * 10^-9 meters), so our DNA is 500 million times longer than it is thick. What is 1/500,000,000 of 15,000,000 feet? Well, it’s 3% of a foot which is .36 of an inch, very close to 3/8 of an inch. At least in my refrigerator that’s a pair of cooked linguini twisted around each other (the double helix in edible form). The twisting is pretty tight, a complete turn of the two strands every 35.36 angstroms, or about 1 complete turn every 1.8 thicknesses, more reminiscent of fusilli than linguini, but fusilli is too thick. Well, no analogy is perfect. If it were, it would be a description. One more thing before moving on.

How thinly should the linguini be sliced to split it apart into the constituent bases?  There are roughly 6 bases/thickness, and since the thickness is 3/8 of an inch, about 1/16 of an inch.  So relative to driving from NYC to Seattle, just throw a base out the window every 1/16th of an inch, and you’ll be up to 3 billion before you know it.
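The whole road-trip scale model can be reproduced from the figures already given. In this rough sketch the 15 million foot rounding and the factor of 100,000 come from the text, and the names are illustrative.

```python
FEET_PER_MILE = 5280
INCHES_PER_FOOT = 12

road_feet = 2840 * FEET_PER_MILE   # New York City to Seattle
model_feet = 15_000_000            # rounded length of the model "DNA"

# The nucleus is 1/100,000 the length of the genome.
sphere_diameter_feet = model_feet / 100_000

# Real DNA: 1 meter long, 20 Angstroms (2e-9 meter) thick.
length_to_thickness = 1.0 / 2e-9
thickness_inches = model_feet / length_to_thickness * INCHES_PER_FOOT

# One base every 3.4 Angstroms along a 20 Angstrom wide helix.
bases_per_thickness = 20 / 3.4
slice_inches = thickness_inches / bases_per_thickness

print(road_feet, sphere_diameter_feet)
print(f"linguini thickness: {thickness_inches:.2f} in; one base: {slice_inches:.3f} in")
```

The slice comes out a hair under 1/16 of an inch (0.0625), which is as close as a kitchen analogy needs to be.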

You’ve been so good following to this point that you get tickets for 50 yardline seats in the superdome. You’re sitting far enough back so that you’re 75 feet above the field, placing you right at the equator of our 150 foot sphere. The north and south poles of the sphere are over the 50 yard line, halfway between the two sides. You are about to watch the grounds crew pump 15,000,000 feet of linguini into the sphere. Will it burst? We know it won’t (or we wouldn’t exist). But how much of the sphere will the linguini take up?

The volume of any sphere is 4/3 * pi * radius^3. So the volume of our sphere of 10 microns diameter is 4/3 * 3.14 * 5 * 5 * 5 = 523 cubic microns. There are 10^18 cubic microns in a cubic meter. So our spherical nucleus has a volume of 523 * 10^-18 cubic meters. What is the volume of the DNA cylinder? Its radius is 10 Angstroms or 1 nanoMeter. So its volume is 1 meter (length of the stretched out DNA) * pi * 10^-9 * 10^-9 square meters = 3.14 * 10^-18 cubic meters (or 3.14 cubic microns, since a cubic micron is 10^-6 * 10^-6 * 10^-6 = 10^-18 cubic meters).
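Here is a quick check of the two volumes, working in microns throughout so the powers of ten stay manageable. The helper names are illustrative.

```python
import math

def sphere_volume(radius):
    return 4.0 / 3.0 * math.pi * radius**3

def cylinder_volume(radius, length):
    return math.pi * radius**2 * length

# All lengths in microns: 1 nanoMeter = 1e-3 micron, 1 meter = 1e6 microns.
nucleus = sphere_volume(5.0)        # nucleus 10 microns across
dna = cylinder_volume(1e-3, 1e6)    # radius 1 nanoMeter, length 1 meter

fraction = dna / nucleus
print(f"nucleus: {nucleus:.1f} cubic microns, DNA: {dna:.2f} cubic microns")
print(f"DNA fills {fraction:.1%} of the nucleus")
```

The pi cancels in the ratio, leaving exactly 3/500: naked double helix would take up 0.6% of the sphere.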

Even though it’s 15,000,000 feet long, the volume of the linguini is only 3.14/523 of the sphere.  Plenty of room for the grounds crew who begin reeling it in at 60 miles an hour.  Since they have 2840 miles of the stuff to reel in, we’ll have to come back in a few days to watch the show.  While we’re waiting, we might think of how anything can be accurately located in 2840 miles of linguini in a 150 foot sphere.

Here’s a link to the next post in the series

https://luysii.wordpress.com/2010/03/23/the-cell-and-its-nucleus-on-a-human-scale-ii/

A possible new player

Drug development is very hard because we don’t know all the players inside the cell. A recent paper describes an entirely new class of player — circular DNA derived from an ancient virus. The authoress is Laura Manuelidis, who would have been a med school classmate had I chosen to go to Yale med instead of Penn. She is the last scientist standing who doesn’t believe Prusiner’s prion hypothesis. Being female, she didn’t marry the boss’s daughter, so she married the boss instead: Elias Manuelidis, a Yale neuropathologist who would be 99 today had he not passed away at 72 in 1992.

The circular DNAs go by the name of SPHINX, an acronym for Slow Progressive Hidden INfections of X origin. They have no sequences in common with bacterial or eukaryotic DNA, but there is some homology to a virus infecting Acinetobacter, a wound pathogen common in soil and water.

How did she find them? By doggedly pursuing the idea that neurodegenerative diseases such as Creutzfeldt-Jakob Disease (CJD) and scrapie were due to an infectious agent triggering aggregation of the prion protein.

As she says:  “The cytoplasm of CJD and scrapie-infected cells, but not control cells, also contains virus-like particle arrays and because we were able to isolate these nuclease-protected particles with quantitative recovery of infectivity, but with little or no detectable PrP (Prion Protein), we began to analyze protected nucleic acids. Using Φ29 rolling circle amplification, several circular DNA sequences of <5 kb (kilobases) with ORFs (Open Reading Frames) were thereby discovered in brain and cultured neuronal cell lines. These circular DNA sequences were named SPHINX elements for their initial association with slow progressive hidden infections of X origin."

SPHINX itself codes for a 324 amino acid protein, which is found in human brain, concentrated in synaptic boutons.  Strangely, even though the DNAs are presumably viral derived, they contain intervening sequences which don't code for protein.

The use of rolling circle amplification is quite clever, as it will copy only circular DNA.

Stanley Prusiner is sure to weigh in.  Remarkably, Prusiner was at Penn Med when I was and was even in my med school fraternity (Nu Sigma Nu)  primarily a place to eat lunch and dinner.  I probably ate with him, but have no recollection of him whatsoever.

Circular DNAs outside chromosomes are called plasmids. Bacteria are full of them. The best known eukaryote containing plasmids is yeast. Perhaps we have them as well. Manuelidis may be the first person to look.

Should you take aspirin after you exercise?

I just got back from a beautiful four and a half mile walk around a reservoir behind my house. I always take 2 adult aspirin after things like this. A recent paper implies that perhaps I should not [ Proc. Natl. Acad. Sci. vol. 114 pp. 6675 – 6684 ’17 ]. Here’s why.

Muscle has a set of stem cells all its own. They are called satellite cells. After injury they proliferate and make new muscle. One of the triggers for this is a prostaglandin known as PGE2 — https://en.wikipedia.org/wiki/Prostaglandin_E2 — clearly a delightful structure for the organic chemist to make. It binds to a receptor on the satellite cell (called EP4R) following which all sorts of things happen, which will make sense to you if you know some cellular biochemistry. Activation of EP4R triggers activation of the cyclic AMP (cAMP) phosphoCREB pathway. This activates Nurr1, a transcription factor which causes cellular proliferation.

Why no aspirin? Because it inhibits cyclo-oxygenase which forms the 5 membered ring of PGE2.

I think you should still take aspirin afterwards, as the injury produced in the paper was pretty severe — muscle toxins, cold injury etc. etc. Probably the weekend warriors among you don’t damage your muscles that much.

A few further points about aspirin and the NSAIDs

Now aspirin is an NSAID (NonSteroidal AntiInflammatory Drug) — along with a zillion others (advil, anaprox, ansaid, clinoril, daypro, dolobid, feldene, indocin — etc. etc. a whole alphabet’s worth). It is rather different in that it has an acetyl group on the benzene ring. Could it be an acetylating agent for things like histones and transcription factors, producing far more widespread effects than those attributable to cyclo-oxygenase inhibition? I’ve looked at the structures of a few of them — some have CH2-COOH moieties in them, which might be metabolized to an acetyl group, but I doubt it. Naproxen (Anaprox, Naprosyn) does have an acetyl group — but the other 13 structures I looked at do not.

Another possible negative of aspirin after exercise is the fact that inhibition of platelet cyclo-oxygenase makes it harder for them to stick together and form clots (this is why it is used to prevent heart attack and stroke). So aspirin might result in more extensive micro-hemorrhages in muscle after exercise (if such things exist).

Gotterdamerung — The Twilight of the GWAS

Life may be like a well, but cellular biochemistry and gene function is like a mattress.  Push on it anywhere and everything changes, because it’s all hooked together.  That’s the only conclusion possible if a review of genome wide association studies (GWAS) is correct [ Cell vol. 169 pp. 1177 – 1186 ’17 ].

 It’s been a scandal for years that GWAS studies as they grow larger and larger are still missing large amounts of the heritability of known very heritable conditions (e.g. schizophrenia, height).  It’s been called the dark matter of the genome (e.g. we know it’s there, but we don’t know what it is).

If you’re a little shaky about how GWAS works have a look at https://luysii.wordpress.com/2014/08/24/tolstoy-rides-again-schizophrenia/ — it will come up again later in this post.

We do know that less than 10% of the SNPs found by GWAS lie in protein coding genes — this means either that they are randomly distributed, or that they are in regions controlling gene expression.  Arguing for randomness — the review states that the heritability contributed by each chromosome tends to be closely proportional to chromosome length.  Schizophrenia is known to be quite heritable, and monozygotic twins have a concordance rate of 40%.  Yet an amazing study (which is quoted but which I have not read) estimates that nearly 100% of all 1 megabase windows in the human genome contribute to schizophrenia heritability (Nature Genet. vol. 47 pp. 1385 – 1392 ’15). Given the 3.2 gigaBase size of our genome that’s 3,200 loci.

Another example is the GIANT study about the heritability of height. The study was based on 250,000 people and some 697 genome wide significant loci were found. In aggregate they explain a mere SIXTEEN PERCENT.

So what is going on?

It gets back to the link posted earlier. The title —  “Tolstoy rides again”  isn’t a joke.  It refers to the opening sentence of Anna Karenina — “Happy families are all alike; every unhappy family is unhappy in its own way”.  So there are many routes to schizophrenia (and they are spread all over the genome).

The authors of the review think that larger and larger GWAS studies (some are planned with over a million participants) are not going to help and are probably a waste of money. Whether the review is Gotterdamerung for GWAS isn’t clear, but the review is provocative. The review is new and it will be interesting to see the response by the GWAS people.

So what do they think is going on?  Namely that everything in organismal and cellular biochemistry, genetics and physiology is related to everything else.  Push on it in one place and like a box spring mattress, everything changes.  The SNPs found outside the DNA coding for proteins are probably changing the control of protein synthesis of all the genes.

The dark matter of the genome is ‘the plan’ which makes the difference between animate and inanimate matter.   For more on this please see — https://luysii.wordpress.com/2015/12/15/it-aint-the-bricks-its-the-plan-take-ii/

Fascinating and enjoyable to be alive at such a time in genetics, biochemistry and molecular biology.

Life at 250 Atmospheres pressure (1.8 tons/square inch)

Tube worms (actually a form of mollusc) live in the depths of the ocean floor where there is almost no light, and very little oxygen. Just as plants use light energy to remove electrons from water to form oxygen and fix carbon, passing the stolen electrons back to oxygen through intermediary metabolism, symbiotic bacteria living in the worms remove electrons from hydrogen sulfide (H2S) formed by the hydrothermal vents on the seafloor. How did the tube worms get this far down? By riding decaying wood down there [Proc. Natl. Acad. Sci. vol. 114 pp. E3652 – E3658 ’17 ]. This is the wooden-steps hypothesis [Distel DL, et al. (2000) Nature 403:725–726] which states that the large chemosynthetic mussels (ship worms) found at deep-sea hydrothermal vents descend from much smaller species associated with sunken wood and other organic deposits, and that the endosymbionts of these progenitors made use of hydrogen sulfide from biogenic sources (e.g., decaying wood) rather than from vent fluids.

At 2500 meters down the water pressure is roughly 3,700 pounds per square inch. One can only imagine the changes needed in the amino acid sequences of their proteins so they aren’t denatured or aggregated by such pressure.
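For anyone who wants to check the arithmetic, here is a minimal sketch of the hydrostatic calculation. The seawater density and standard gravity are my assumed textbook values (they are not from the paper), and the function name is just for illustration. It comes out at about 250 atmospheres and a bit under 3,700 psi, close to the figures quoted above.

```python
# Rough hydrostatic pressure at depth, converted to the units used in the post.
# Assumed constants (not from the source): typical seawater density, standard gravity.
RHO_SEAWATER = 1025.0     # kg/m^3; varies with salinity and temperature
G = 9.80665               # m/s^2, standard gravity
PA_PER_ATM = 101_325.0    # pascals per standard atmosphere
PA_PER_PSI = 6_894.757    # pascals per pound-force per square inch
LB_PER_SHORT_TON = 2000.0


def pressure_at_depth(depth_m: float) -> dict:
    """Absolute pressure at depth_m meters of seawater, in several units."""
    gauge_pa = RHO_SEAWATER * G * depth_m   # weight of the water column
    absolute_pa = gauge_pa + PA_PER_ATM     # add the atmosphere sitting on top
    psi = absolute_pa / PA_PER_PSI
    return {
        "atm": absolute_pa / PA_PER_ATM,
        "psi": psi,
        "tons_per_sq_in": psi / LB_PER_SHORT_TON,
    }


result = pressure_at_depth(2500)
print(f"{result['atm']:.0f} atm, {result['psi']:.0f} psi, "
      f"{result['tons_per_sq_in']:.2f} tons/square inch")
```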

The idea that life on planetary moons with subsurface oceans (Ganymede, Europa, Titan, Enceladus) could exist is no longer as fantastic as it initially seemed.

If it is found, the implications for our conception of our place in the natural world are enormous.

Why wasn’t this mentioned in Genesis or any known creation myth? Assume for the moment that there actually is a creator who made itself known to our ancestors. If it tried to give Abraham, Gautama Buddha, Mohammed, etc. knowledge of these things, it wouldn’t have been believed. Planets? Planets with moons? Please. A few miracles here and there would be all that would be needed.

Remember entropy?

Organic chemists have a far better intuitive feel for entropy than most chemists. Condensations such as the Diels-Alder reaction decrease it, as does ring closure. However, when you get to small ligands binding proteins, everything seems to be about enthalpy. Although binding energy is always talked about, mentally it is treated as enthalpy (H) rather than Gibbs free energy (F).

A recent fascinating editorial and paper [ Proc. Natl. Acad. Sci. vol. 114 pp. 4278 – 4280, 4424 – 4429 ’17 ] show how evolution has used entropy to determine when a protein (CzrA) binds to DNA and when it doesn’t. As usual, advances in technology permit us to see this (e.g. multidimensional heteronuclear nuclear magnetic resonance). This allows us to determine the motion of side chains (methyl groups), backbones, etc. When CzrA binds to DNA, methyl side chains on the protein move more, increasing entropy (deltaS), and as we all know the Gibbs free energy of reaction (deltaF) isn’t just enthalpy (deltaH) but deltaH – TdeltaS, so an increase in deltaS pushes deltaF lower, meaning the reaction proceeds in that direction.

Binding of zinc redistributes these side chain motions so that entropy decreases, and the protein moves off DNA. The authors call this dynamics-driven allostery. The fascinating thing is that this may happen without any conformational change of CzrA.
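To make the sign argument concrete, here is a small sketch of deltaF = deltaH – TdeltaS with the enthalpy held fixed and only the entropy term flipped. Every number in it is invented for illustration; none of them is a measurement on CzrA, and the variable names are mine.

```python
# Sign of deltaF = deltaH - T*deltaS when only the entropy term changes.
# All values are hypothetical and chosen only to show the sign flip.
T = 298.0  # kelvin, roughly room temperature


def delta_f(delta_h: float, delta_s: float, temperature: float = T) -> float:
    """Gibbs free energy change in kJ/mol (delta_h in kJ/mol, delta_s in kJ/(mol*K))."""
    return delta_h - temperature * delta_s


delta_h = 5.0  # kJ/mol, hypothetical and mildly unfavorable on its own

# Hypothetical case 1: side chain motion increases on DNA binding (deltaS > 0).
without_zinc = delta_f(delta_h, delta_s=+0.05)
# Hypothetical case 2: side chain motion is damped (deltaS < 0), same enthalpy.
with_zinc = delta_f(delta_h, delta_s=-0.05)

print(f"deltaS > 0: deltaF = {without_zinc:.1f} kJ/mol (binding favored)")
print(f"deltaS < 0: deltaF = {with_zinc:.1f} kJ/mol (binding disfavored)")
```

The point of holding deltaH constant is that nothing about the structure needs to change for the reaction to reverse direction; the entropy term alone can do it.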

I’m not sure that molecular dynamics simulations are good enough to pick this up. Fortunately newer NMR techniques can measure it. Just another complication for the hapless drug chemist thinking about protein ligand interactions.

The incredible combinatorial complexity of cellular biochemistry

K8, K14, K20, T92, P125, S129, S137, Y176, T195, K276, T305, T308, T312, P313, T315, T326, S378, T450, S473, S477, S479. No, this is not some game of cosmic bingo. They represent amino acid positions in Protein Kinase B (AKT).

In the 1 letter amino acid code K is lysine, T threonine, S serine, P proline, Y tyrosine.

Each of the 21 amino acids is either modified or not, and one of them can be modified in 3 ways. This gives 4 * 2^20 = 4,194,304 possible combinations of post-translational modifications. Will we study all of them? It’s pretty easy to substitute alanine for serine or threonine, making an unmodifiable position, or to substitute aspartic acid for threonine or serine, making a phosphorylation mimic which is pretty close to phosphoserine or phosphothreonine, creating even more possibilities for study.
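The count is just a product of the number of states at each position. A minimal sketch, assuming (as the arithmetic above does) one site with 4 states (unmodified plus 3 modifications) and 20 sites with 2 states each:

```python
from math import prod

# One site with 4 possible states, 20 sites that are simply modified or not.
states_per_site = [4] + [2] * 20

total_combinations = prod(states_per_site)
print(f"{total_combinations:,}")  # 4,194,304
```

Allowing more states at any site (say an alanine or aspartate substitution as an extra option) just multiplies in a larger factor for that position.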

Most of the serines, threonines, tyrosines listed are phosphorylated, but two of the threonines are N-acetyl glucosylated. The two prolines are hydroxylated in the ring. The lysines can be methylated, acetylated, ubiquitinated, sumoylated. I did take the trouble to count the number of serines in the complete amino acid sequence and there are 24, of which only 6 are phosphorylated, so the phosphorylation pattern is likely to be specific and selected for. Too lazy to do the same for lysine, threonine, tyrosine and proline. Here’s a link to the full sequence if you want to do it: http://www.uniprot.org/uniprot/P31749
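For the less lazy, counting residues takes a few lines once the sequence is pasted in. This is only a sketch: the string below is a made-up toy sequence, not AKT, and the function name is hypothetical. Substitute the real one-letter sequence copied from the UniProt entry.

```python
from collections import Counter


def count_residues(sequence: str, residues: str = "KTSYP") -> dict:
    """Count selected one-letter residues, ignoring whitespace and case."""
    cleaned = "".join(sequence.split()).upper()
    counts = Counter(cleaned)
    return {residue: counts[residue] for residue in residues}


# Made-up placeholder sequence (NOT the AKT sequence); paste the real one here.
toy_sequence = "MKTSSPYKLSTPKAS"

print(count_residues(toy_sequence))
```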

The phosphorylations at each serine/threonine/tyrosine are carried out by no more than one of the following 8 kinases (CK2, IKKepsilon, ACK1, TBK1, PDK1, GSK3alpha, mTORC2 and CDK2).

AKT contains some 481 amino acids, divided (by humans for the purposes of comprehension) into 4 regions: Pleckstrin Homology (#1 – #108), linker (#108 – #152), catalytic, i.e. kinase (#152 – #409), and regulatory (#409 – #481).

This is from an excellent review of the functions of AKT in Cell vol. 169 pp. 381 – 405 ’17. All of this takes up only the first two pages of the review, before the functionality of AKT is even discussed.
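To see where the 21 sites fall, here is a sketch that bins the positions listed earlier into the four regions, using the boundaries as given above. The quoted boundaries overlap at #108, #152 and #409; the code arbitrarily assigns a boundary residue to the earlier region (my assumption), which doesn't matter here because none of the listed sites sits on a boundary.

```python
# Region boundaries as quoted in the post (inclusive on both ends).
REGIONS = [
    ("Pleckstrin Homology", 1, 108),
    ("linker", 108, 152),
    ("catalytic", 152, 409),
    ("regulatory", 409, 481),
]

# The 21 modified positions listed earlier in the post.
SITES = [
    "K8", "K14", "K20", "T92", "P125", "S129", "S137", "Y176", "T195",
    "K276", "T305", "T308", "T312", "P313", "T315", "T326", "S378",
    "T450", "S473", "S477", "S479",
]


def region_of(position: int) -> str:
    """Return the first region containing position (boundary goes to the earlier region)."""
    for name, start, end in REGIONS:
        if start <= position <= end:
            return name
    raise ValueError(f"position {position} is outside the protein")


sites_by_region = {}
for site in SITES:
    sites_by_region.setdefault(region_of(int(site[1:])), []).append(site)

for name, sites in sites_by_region.items():
    print(f"{name}: {len(sites)} sites ({', '.join(sites)})")
```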

This raises the larger issue of the possibility of human minds comprehending cellular biochemistry.

This is just one protein, although a very important one. Do you think we’ll ever be able to conduct enough experiments to figure out what each modification (alone or in combination) does to the many functions of AKT (and there are many)?

Now design a drug to affect one of the actions of AKT (particularly since AKT is the cellular homolog of a viral oncogene). Quite a homework assignment.

Progress has been slow but not for want of trying

Progress in the sense of therapy for Alzheimer’s disease and Glioblastoma multiforme is essentially nonexistent, and we could use better therapy for Parkinsonism. This doesn’t mean that researchers have given up. Far from it. Three papers all in last week’s issue of PNAS came up with new understanding and possibly new therapeutic approaches for all three.

You’ll need some serious molecular biological and cell physiological chops to get through the following.

1. Glioblastoma multiforme: patients aren’t living much longer than they were when I started practice 45 years ago (about 2 years, although of course there are exceptions).

The human ZBTB family of genes consists of 49 members coding for transcription factors. BCL6 is also known as ZBTB27 and is a master regulator of lymph node germinal responses. To execute its transcriptional activity, BCL6 requires homodimerization and formation of a complex with a variety of cofactors including BCL6 corepressor (BCoR), nuclear receptor corepressor 1 (NCoR) and Silencing Mediator of Retinoic acid and Thyroid hormone receptor (SMRT). BCL6 inhibitors block the interaction between BCL6 and its friends, selectively killing BCL6 addicted cancer cells.

The present paper [ Proc. Natl. Acad. Sci. vol. 114 pp. 3981 – 3986 ’17 ] shows that BCL6 is required for glioblastoma cell viability. One transcriptional target of BCL6 is AXL, a tyrosine kinase. Depletion of AXL also decreases proliferation of glioblastoma cells in vitro and in vivo (in a mouse model of course).

So here are two new lines of attack on a very bad disease.

2. Alzheimer’s disease: the best we can do is slow it down, certainly not improve mental function and not keep mental function from getting worse. ErbB2 is a member of the Epidermal Growth Factor Receptor (EGFR) family. It is tightly associated with neuritic plaques in Alzheimer’s. Ras GTPase activation mediates EGF induced stimulation of gamma secretase to increase the nuclear function of the amyloid precursor protein (APP) intracellular domain (AICD). ErbB2 suppresses the autophagic destruction of AICD, physically dissociating Beclin1 from the VPS34/VPS15 complex independently of its kinase activity.

So the following paper [ Proc. Natl. Acad. Sci. vol. 114 pp. E3129 – E3138 ’17 ] used a compound downregulating ErbB2 function (CL-387,785) in mouse models of Alzheimer’s (which have notoriously NOT led to useful therapy). Levels of AICD declined along with beta amyloid, and the animals appeared smarter (but how smart can a mouse be?).

3. Parkinson’s disease: here we really thought we had a cure back in 1972 when L-DOPA was first released for use in the USA. Some patients looked so good that it was impossible to tell if they had the disease. Unfortunately, the basic problem (death of dopaminergic neurons) continued despite L-DOPA pills supplying what the neurons no longer could.

Nurr1 is a protein which causes the development of dopamine neurons in the embryo. Expression of Nurr1 continues throughout life. Nurr1 appears to be a constitutively active nuclear hormone receptor. Why? Because the place where ligands (such as thyroid hormone, steroid hormones) bind to the protein is closed. A few mutations in the Nurr1 gene have been associated with familial parkinsonism.

Nurr1 functions by forming a heterodimer with the Retinoid X Receptor alpha (RXRalpha), another nuclear hormone receptor, but one which does have an open binding pocket. A compound called BRF110 was shown by the following paper [ Proc. Natl. Acad. Sci. vol. 114 pp. 3795 – 3797, 3999 – 4004 ’17 ] to bind to the ligand pocket of RXRalpha, increasing its activity. The net effect is to enhance expression of dopamine neuron specific genes.

More to the point, MPP+ is a toxin pretty selective for dopamine neurons (it kills them). BRF110 helps survival against MPP+ (but only if given before toxin administration). This wouldn’t be so bad, because something is causing dopamine neurons to die (perhaps it’s a toxin), so BRF110 may fight the decline in dopamine neuron numbers, rather than treating the symptoms of dopamine deficiency.

So there you have it: 3 possible new approaches to therapy for 3 bad diseases, all in one week’s issue of PNAS. Not easy reading, perhaps, but this is where therapy is going to come from (hopefully soon).

An obvious idea we’ve all missed

In 3+ decades as a clinical neurologist I saw several hundred unfortunate people with primary brain tumors. Not one of them was made of proliferating neurons. Not a single one. Most were tumors derived from glial cells (gliomas, glioblastomas, astrocytomas, oligodendrogliomas) which make up half the cells in the brain. Some came from the coverings of the brain (meningiomas), or the ventricular lining (ependymomas).

A recent paper in Nature (vol. 543 pp. 681 – 686 ’17) decided that it might be worthwhile to figure out why some organs rarely if ever develop cancer (brain, heart, skeletal muscle). Obvious, isn’t it? But no one did it until now.

Most of these tissues are terminally differentiated (unlike skin, lung, breast and gut) and don’t undergo cellular division. This means that they don’t have to copy their DNA over and over to replenish old and dying cells, and so they are much less likely to develop mutations.

They also use oxidative phosphorylation (a mitochondrial function) rather than glycolysis to generate energy. So they looked for genes that were upregulated in terminally differentiated muscle (not brain) cells relative to proliferating muscle cell precursors. Not a complicated idea to test once you think of it (but you and I didn’t). They found 5 such genes, and tested them for their ability to suppress tumor growth. One of them (LACTB) decreased the growth rate of a variety of tumor cells in vitro and in vivo (i.e. when transplanted into immunodeficient animals). Amazingly, it seems to have no effect on normal cells.

To show how little we understand the goings on inside our cells, why don’t you try to guess what LACTB does, given your (and our) knowledge of cellular biochemistry and physiology.

LACTB changes mitochondrial lipid metabolism, by reducing the rate of decarboxylation of mitochondrial phosphatidyl serine — say what?

Even when you know what LACTB is doing you’d be hard pressed to figure out how this effect slows cancer cell growth (and possibly prevents it from occurring at all).

So given our knowledge we’d have never found LACTB and having found it we still don’t know how it works.