Category Archives: Molecular Biology Survival Guide

Kinetic traps and life

“It is well known that the thermodynamically stable state of proteins in a crowded environment is insoluble fibrils” [ Proc. Natl. Acad. Sci. vol. 119 pp. e2122078119 ’22 ].  However, even under ideal conditions the time scale for their formation is hours to days [ Nat. Rev. Mol. Cell Biol. 15, 384–396 (2014) ].  Hopefully it’s even longer (decades) for senile plaques (abeta), neurofibrils (tau) and Lewy bodies (alpha-synuclein) to form.  The fact that equilibrium takes such a long time to reach allows rapid synthesis and degradation of proteins to avoid their aggregation.  So we live because our proteins are trapped by kinetics in a metastable state short of equilibrium, i.e. a kinetic trap.

TDP43 and the anisosome

Neurologists have been interested in TDP43 (Tar Dna binding Protein of 43 kiloDaltons) for a long time. Mutants cause some cases of ALS (Amyotrophic Lateral Sclerosis, aka Lou Gehrig disease) and FTD (FrontoTemporal Dementia).  Some 50 different mutations in the protein have been found in cases of these two diseases.  Intracellular inclusions containing TDP43 are found in over 90% of sporadic ALS (no mutations) and 45% of FTD.

TDP43 contains 414 amino acids (as you might expect for a protein with a 43 kiloDalton mass).  There is an amino terminal ubiquitin-like fold, then two RNA Recognition Motifs (RRMs), followed by a glycine rich, low complexity, prion-like domain at the other (carboxy) end.  The disease causing mutations are found in the low complexity sequence.
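The "as you might expect" above can be checked with a rule of thumb. The sketch below assumes an average residue mass of about 110 Daltons, which is a textbook approximation and not a figure from the cited paper; the function name is made up for illustration.

```python
# Rule-of-thumb estimate of protein mass from chain length.
# ASSUMPTION: ~110 Da per amino acid residue (a common textbook average,
# not a number taken from the post or the cited papers).
AVERAGE_RESIDUE_MASS_DA = 110.0


def estimated_mass_kda(n_residues: int) -> float:
    """Approximate molecular mass in kiloDaltons for a chain of n_residues."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


tdp43_mass = estimated_mass_kda(414)
print(f"TDP43 (414 residues): roughly {tdp43_mass:.1f} kDa")
```

The estimate lands a little above 43 kiloDaltons, which is plausible for a protein with a glycine rich tail, since glycine is the lightest amino acid.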

A phase separated structure never seen before (the anisosome) involves mutant TDP43 [ Science vol. 371 pp. 585, abb4309 pp. 1 - 15 ’21 ].  It is a phase separated mass with liquid spherical shells and liquid cores.  The shells showed birefringence, evidence of a liquid crystal.  The cores show the HSP70 chaperone bound to TDP43 (which wasn’t binding RNA).

ATP is required to maintain the chaperone activity of HSP70. When ATP levels are reduced, the anisosome is converted into the protein aggregates seen in ALS and FTD.  So the anisosome is a protective mechanism. 

Biology is clearly leading chemistry around by the nose.  No chemist would ever have predicted something like this, or received a grant to mix all this stuff in a test tube not even thinking about stoichiometry and see what happened.  For more details on phase separation please see an old post — https://luysii.wordpress.com/2020/12/20/neuroscience-can-no-longer-ignore-phase-separation/

Here’s some stuff from that post to whet your appetite

Advances in cellular biology have largely come from chemistry.  Think DNA and protein structure, enzyme analysis.  However, cell biology is now beginning to return the favor and instruct chemistry by giving it new objects to study. Think phase transitions in the cell, liquid liquid phase separation, liquid droplets, and many other names (the field is in flux) as chemists begin to explore them.  Unlike most chemical objects, they are big, or they wouldn’t have been visible microscopically, so they contain many, many more molecules than chemists are used to dealing with.

These objects do not have any sort of definite stoichiometry and are made of RNA and the proteins which bind them (and sometimes DNA).  They go by any number of names (processing bodies, stress granules, nuclear speckles, Cajal bodies, promyelocytic leukemia bodies, germline P granules).  Recent work has shown that DNA may be compacted similarly using the linker histone [ PNAS vol. 115 pp. 11964 - 11969 ’18 ].

The objects are defined essentially by looking at them.  By golly they look like liquid drops, and they fuse and separate just like drops of water.  Once this is done they are analyzed chemically to see what’s in them.  I don’t think theory can predict them now, and they were never predicted a priori as far as I know.

No chemist in their right mind would have made them to study.  For one thing they contain tens to hundreds of different molecules.  Imagine trying to get a grant to see what would happen if you threw that many different RNAs and proteins together in varying concentrations.  Physicists have worked for years on phase transitions (but usually with a single molecule — think water).  So have chemists — think crystallization.

Proteins move in and out of these bodies in seconds.  Proteins found in them do have low complexity of amino acids (mostly made of only a few of the 20), and unlike enzymes, their sequences are intrinsically disordered, so forget the lock and key and induced fit concepts for enzymes.

Are they a new form of matter?  Is there any limit to how big they can be?  Are the pathologic precipitates of neurologic disease (neurofibrillary tangles, senile plaques, Lewy bodies) similar?  There certainly are plenty of distinct proteins in the senile plaque, but they don’t look like liquid droplets.

It’s a fascinating field to study.  Although made of organic molecules, there seems to be little for the organic chemist to say, since the interactions aren’t covalent.  Time for physical chemists and polymer chemists to step up to the plate.

 

Duchenne muscular dystrophy — a novel genetic treatment

Could the innumerable genetic defects underlying Duchenne muscular dystrophy all be treated the same way?  Possibly.  Paradoxically, the treatment involves actually making the gene  even worse.

Understanding how and why this might work involves a very deep dive into molecular biology.  You might start by looking at the series of five background articles I wrote — start at https://luysii.wordpress.com/2010/07/07/molecular-biology-survival-guide-for-chemists-i-dna-and-protein-coding-gene-structure/ and follow the links.

I have a personal interest in Duchenne muscular dystrophy because I ran such a clinic from ’72 to ’87, watching young boys and adolescents die from it.  The major advance during that time was NOT medical or anything I did, but lighter braces, so the boys could stay ambulatory longer.  Things have improved since then: survival has lengthened by a decade, so they now die in their late 20s.

So let’s start.  Duchenne muscular dystrophy is caused by a mutation in the gene coding for dystrophin, a large (3,685 amino acids) protein which ties the contractile apparatus of the muscle cell (actin and myosin) to the cell membrane. It isn’t the largest protein we have (that is titin, another muscle protein, with 34,350 amino acids), but the gene for dystrophin is the largest we have, weighing in at 2,220,233 nucleotides.  This is why Duchenne is one of the most common diseases due to a defect in a single gene: the gene is so large that lots of things can (and do) go wrong with it.

The gene comes in 79 pieces (exons) which account for under 1/200 of the nucleotides of the gene.  The rest must be spliced out and discarded.  Have a look at http://www.dmd.nl to see what can go wrong.  The commonest is deletion of parts of the gene (60 - 70% of cases), followed by duplication of other parts (10% of cases), with the rest being mutations that change one amino acid to another.
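The "under 1/200" figure follows from the two numbers quoted for dystrophin. This is a minimal sketch under one simplifying assumption, labeled in the code: only the nucleotides coding for amino acids are counted, ignoring the stop codon and any untranslated sequence carried in the exons.

```python
# Fraction of the dystrophin gene that actually codes for amino acids,
# using only the figures quoted in the post.
AMINO_ACIDS = 3_685             # length of the dystrophin protein
GENE_NUCLEOTIDES = 2_220_233    # length of the dystrophin gene
NUCLEOTIDES_PER_CODON = 3       # one codon (3 nucleotides) per amino acid

# ASSUMPTION: the stop codon and untranslated exon sequence are ignored.
coding_nucleotides = AMINO_ACIDS * NUCLEOTIDES_PER_CODON
coding_fraction = coding_nucleotides / GENE_NUCLEOTIDES

print(f"coding nucleotides: {coding_nucleotides:,}")
print(f"coding fraction:    {coding_fraction:.4%}")
print(f"under 1/200?        {coding_fraction < 1 / 200}")
```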

Duchenne isn’t like cystic fibrosis where some 600 different mutations in the causative CFTR gene were known by 2003 but with 90% of cases due to just one.  So any genetic treatment for that young boy sitting in front of you had better be personalized to his particular mutation.

Or should it?

Possibly not.  We’ll need to discuss 3 things first:

1. Nonsense Mediated Decay (NMD)

2. Nonsense Induced Transcriptional Compensation (NITC)

3. The mdx mouse model of Duchenne muscular dystrophy

Nonsense mediated decay.  Nonsense is a poor term, because the 3 nonsense codons (out of 64 possible) tell the ribosome to stop translating mRNA into protein and drop off the mRNA.  That isn’t nonsense.  I prefer stop codon, or termination codon.

An incredibly clever piece of business tells the ribosome (which is after all an inanimate object) when a stop codon occurs too early in the mRNA, with a bunch of codons after it still needed to make up the whole protein.

Let’s go back to dystrophin and its 79 exons, and the fact that 99.5% of the gene is made of introns which are spliced out.   Remember the mRNA starts at the 5′ end and ends at the 3′ end.  The ribosome reads and translates it from 5′ to 3′. When an intron is spliced out, a complex of several proteins is placed on the mRNA some 20 - 24 nucleotides 5′ to the splice site (this happens in the nucleus way before the mRNA gets near a ribosome in the cytoplasm).  The complex is called the Exon Junction Complex (EJC). The ribosome then happily munches along the mRNA from 5′ to 3′, knocking off the EJCs as it moves, until it hits a termination codon and drops off.

Over 95% of genes do not have introns after the termination codon.  What happens if one does? Well, then the stop is called a premature termination codon (PTC) and there is usually an EJC 3′ (downstream) to it.  If a termination codon is present at least 50 - 55 nucleotides 5′ (upstream) to an EJC then NMD occurs.

Whenever any termination codon is reached, release protein factors (eRF1, eRF3, SMG1) bind to the mRNA.  If there is an EJC around (which there shouldn’t be) the interaction between the two complexes triggers phosphorylation of one of the EJC proteins, triggering NMD.

So that’s how NMD happens, when there is a PTC.  Clever no?
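The rule described above can be written as a toy predictor. This is only a sketch of the rule as stated in this post: positions are nucleotides counted from the 5′ end of the spliced mRNA, the function names and the example coordinates are hypothetical, and real NMD has exceptions that this ignores.

```python
from typing import List

# ASSUMPTIONS (taken from the text, simplified):
#   - an EJC sits about 24 nucleotides 5' of each exon-exon junction
#   - a stop codon at least 50 nucleotides 5' of an EJC triggers NMD
EJC_OFFSET_NT = 24
NMD_THRESHOLD_NT = 50


def ejc_positions(junctions: List[int]) -> List[int]:
    """Position of the EJC deposited upstream of each exon-exon junction."""
    return [junction - EJC_OFFSET_NT for junction in junctions]


def triggers_nmd(stop_position: int, junctions: List[int]) -> bool:
    """True if some EJC lies far enough downstream of the stop codon.

    The ribosome knocks off every EJC it passes, so only EJCs that are 3'
    of the stop codon survive to signal that the stop came too early.
    """
    return any(
        ejc - stop_position >= NMD_THRESHOLD_NT
        for ejc in ejc_positions(junctions)
    )


# Hypothetical mRNA with exon-exon junctions at these positions.
example_junctions = [300, 600, 900]

print(triggers_nmd(950, example_junctions))  # stop in the last exon
print(triggers_nmd(400, example_junctions))  # early stop, EJCs remain downstream
print(triggers_nmd(850, example_junctions))  # stop too close to the last EJC
```

A normal stop codon in the last exon has no EJC downstream of it, so the check fails and the mRNA is left alone.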

Nonsense Induced Transcriptional Compensation (NITC).  I realize that this is a lot to throw at you, but a treatment for Duchenne is worth the effort (not to mention other genetic diseases in which the mechanism to be described also applies).

NITC is something I never heard about until two papers appeared in the 13 April Nature (vol. 568 pp. 179 - 180 (editorial), 193 - 197, 259 - 263).  Ever since we could knock out a gene by placing a PTC early in it (near the 5′ end) we’ve been surprised by some of the results, e.g. knocking out some genes thought to be crucial had little or no effect.  Other technologies which didn’t affect the gene, but which decreased the expression of the mRNA (such as RNA interference, aka Post-Transcriptional Gene Silencing, PTGS), did have big phenotypic effects.

This turns out to be due to NITC, which in turn is due to increased transcription of genes which are ancestrally related to the mutant gene.  Hard to believe.

Time to go back to NMD.  It doesn’t break mRNA down nucleotide by nucleotide, but fragments it.  These fragments get into the nucleus, and bind to complementary genomic sequences of the gene containing the PTC, and also to genes ancestrally related to the mutant gene (so they’ll have similar nucleotide sequences). Then epigenetics takes over, because the fragments recruit the COMPASS complex, which catalyzes the formation of H3K4Me3, a part of the histone code which helps turn on transcription of the gene.  The sequence similarity of ancestrally related genes allows them, and only them, to be turned on by NITC.  Even cleverer than the ribosome finding a PTC.

Something so incredible needs evidence.  Well, heterozygous zebrafish can be made to have one normal gene and one with a PTC. What do you think happens?  The normal gene is upregulated (i.e. more is made).  Pretty good.

Finally the mdx mouse.  I’ve been reading about it for years.  It has a PTC in exon 23 of the dystrophin gene, resulting in a protein only 27% as long as it should be.  All sorts of therapeutic maneuvers have been tried on it.  Now any drug development chemist will tell you that animal models are lousy, but they’re all we’ve got.

The remarkable thing about mdx mice is that they don’t get weak.  They do have muscle pathology.  All the verbiage above probably explains why.

So to treat ALL forms of Duchenne, put a premature termination codon (PTC) in exon #23 of the human gene. It should work as there are 4 dystrophin related proteins scattered around the genome: utrophin, dystrophin related protein 2 (DRP2), alpha dystrobrevin, and beta dystrobrevin.

There is an even better way to look for a place to put a PTC in the dystrophin gene.  Our genomes are filled with errors — for details see — https://luysii.wordpress.com/2018/05/01/how-badly-are-thy-genomes-oh-humanity-take-ii/.

There are lots of very normal people around with supposedly lethal mutations (including PTCs) in their genomes.  Probably scattered about various labs are at least 1,000,000 exome sequences in presumably normal people.  I’m not sure how much clinical information about them is available (other than that they are normal).  Hopefully their sex is.  Look at the dystrophin gene of normal males (females can be perfectly healthy carrying a mutant dystrophin gene as it is found on the X chromosome and they have 2) and see if PTCs are to be found.  You can’t have a better animal model than that.

At over 1,000 words this is the longest post I’ve written, and hopefully the most useful.

The bouillabaisse of the synaptic cleft

The synaptic cleft is so small (under 400 Angstroms, or 40 nanoMeters) that it can’t be seen with the light microscope (the smallest wavelength of visible light is 3,900 Angstroms, or 390 nanoMeters).  This led to a bruising battle between Cajal and Golgi just over a century ago over whether the brain was actually made of cells.  Even though Golgi’s work led to the delineation of single neurons, he thought the brain was a continuous network.  They both won the Nobel in 1906.

Semifast forward to the mid 60s when I was in medical school.  We finally had the electron microscope, so we could see synapses. They showed up as small CLEAR spaces (i.e. electrons passed through them easily, leaving them white) between neurons.  Neurotransmitters were being discovered at the same time and the synapse was to be the analogy to vacuum tubes, which could pass electricity in just one direction (yes, the transistor although invented hadn’t been used to make anything resembling a computer; the Intel 4004 wasn’t until the 70s).  Of course now we know that information flows back and forth across the synapse, with endocannabinoids (i.e. natural marijuana) being the major retrograde neurotransmitter.

Since there didn’t seem to be anything in the synaptic cleft, neurotransmitters were thought to freely diffuse across it to bind to receptors on the other (postsynaptic) side, i.e. a free fly zone.

Fast forward to the present to a marvelous (and grueling to read because of the complexity of the subject, not the way it’s written) review of just what is in the synaptic cleft [ Cell vol. 171 pp. 745 – 769 ’17 ] http://www.cell.com/cell/fulltext/S0092-8674(17)31246-1 (It is likely behind a paywall).  There are over 120 references, and rather than being just a catalogue, the single author Thomas Sudhof extensively discusses which experimental work is to be believed (not that Sudhof is saying the work is fraudulent, but that it can’t be used to extrapolate to the living human brain).  The review is a staggering piece of work for one individual.

The stuff in the synaptic cleft is so diverse, and so intimately involved with itself and the membranes on either side, that what is needed for comprehension is not a chemist but a sociologist.  Probably most of the molecules to be discussed are present in such small numbers that the law of mass action doesn’t apply, nor do binding constants which rely on large numbers of ligands and receptors. Not only that, but the binding constants haven’t been determined for many of the players.

Now for some anatomic detail and numbers.  It is remarkably hard to find just how far laterally the synaptic cleft extends.  Molecular Biology of the Cell ed. 5 p. 1149 has a fairly typical picture with a size marker and it looks to be about 2 microns across (20,000 Angstroms, 2,000 nanoMeters).  Treated as a circle, that’s 314,159,265 square Angstroms (3.14 square microns).  So let’s assume each protein takes up a square 50 Angstroms on a side (2,500 square Angstroms).  That’s room for about 125,700 proteins on each side assuming extremely dense packing.  However the density of acetyl choline receptors at the neuromuscular junction is 8,700/square micron, a packing also thought to be extremely dense, which would give only about 27,300 such proteins in a similarly distributed CNS synapse. So the numbers are at least in the right ball park (meaning they’re within an order of magnitude, i.e. within a power of 10) of being correct.
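The back-of-envelope numbers above can be rechecked in a few lines. The sketch assumes the cleft is a flat circular disc 2 microns in diameter (the assumption that yields the 3.14 square micron figure); the variable names are mine.

```python
import math

ANGSTROMS_PER_MICRON = 10_000

# ASSUMPTION: the cleft is a circular disc about 2 microns across.
diameter_angstroms = 20_000
radius_angstroms = diameter_angstroms / 2
area_sq_angstroms = math.pi * radius_angstroms ** 2
area_sq_microns = area_sq_angstroms / ANGSTROMS_PER_MICRON ** 2

# Each protein is given a 50 x 50 Angstrom footprint, as in the text.
protein_footprint_sq_angstroms = 50 * 50
max_proteins = area_sq_angstroms / protein_footprint_sq_angstroms

# Acetylcholine receptor density at the neuromuscular junction, from the text.
achr_density_per_sq_micron = 8_700
achr_like_count = achr_density_per_sq_micron * area_sq_microns

print(f"area: {area_sq_angstroms:,.0f} sq Angstroms = {area_sq_microns:.2f} sq microns")
print(f"densest packing at 2,500 sq Angstroms each: {max_proteins:,.0f} proteins")
print(f"at acetylcholine receptor density:         {achr_like_count:,.0f} proteins")
print(f"ratio of the two estimates: {max_proteins / achr_like_count:.1f}")
```

The two estimates differ by a factor of less than 5, comfortably inside an order of magnitude.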

What’s the point?

When you see how many different proteins and different varieties of the same protein reside in the cleft, the number of each individual element is likely to be small, meaning that you can’t use statistical mechanics but must use sociology instead.

The review focuses on the neurExins (I capitalize the E  to help me remember that they are prEsynaptic).  Why?  Because they are the best studied of all the players.  What a piece of work they are.  Humans have 3 genes for them. One of the 3 codes for 1,477 amino acids, spread over 1,112,187 basepairs (1.1 megaBases) along with 74 exons.  Since each amino acid takes 3 nucleotides, this means that only about 0.4% of the gene (4,431 nucleotides) is actually coding for the amino acids making it up.  I think it takes energy for RNA polymerase II to stitch the ribonucleotides into the 1.1 megabase pre-mRNA, but I couldn’t (quickly) find out how much per ribonucleotide.  It seems quite wasteful of energy, unless there is some other function to the process which we haven’t figured out yet.

Most of the molecule resides in the synaptic cleft.  There are 6 LNS domains with 3 interspersed EGF-like repeats, a cysteine loop domain, a transmembrane region and a cytoplasmic sequence of 55 amino acids. There are 6 sites for alternative splicing, and because there are two promoters for each of the 3 genes, there is a shorter form (beta-neurexin) with less extracellular stuff than the long form (alpha-neurexin).  When all is said and done there are over 1,000 possible variants of the 3 genes.

Unlike olfactory neurons which only express one or two of the nearly 1,000 olfactory receptors, neurons express multiple isoforms of each, increasing the complexity.

The LNS regions of the neurexins are like immunoglobulins and fill a 60 x 60 x 60 Angstrom box.  Since the synaptic cleft is at most 400 Angstroms across, the alpha-neurexins (if extended) reach all the way across.

Here the neurexins bind to the neuroligins, which are always postsynaptic (sorry, no mnemonic).  They are simpler in structure, but they are the product of 4 genes, and only about 40 isoforms (due to alternative splicing) are possible. Neuroligins 1, 3 and 4 are found at excitatory synapses, neuroligin 2 is found at inhibitory synapses.  The intracleft part of the neuroligins resembles an important enzyme (acetylcholinesterase) but is catalytically inactive.  This is where the neurexins bind.

This is complex enough, but Sudhof notes that the neurexins are hubs interacting with multiple classes of post-synaptic molecules in addition to the neuroligins: dystroglycan, GABA[A] receptors, calsyntenins, latrophilins (of which there are 4).   There are at least 50 post-synaptic cell adhesion molecules.  “Few are well understood, although many are described.”

The neurexins have 3 major sites where other things bind, and all sites may be occupied at once.  Just to give you a taste of the complexity involved (before I go on to larger issues):

The second LNS domain (LNS2) is found only in the alpha-neurexins, and binds to neurexophilin (of which there are 4) and dystroglycan.

The 6th LNS domain (LNS6) binds to neuroligins, LRRTMs, GABA[A] receptors, cerebellins and latrophilins (of which there are 4).

The juxtamembrane sequence of the neurexins binds to CA10, CA11 and C1ql.

The cerebellins (of which there are 4) bind to all the neurexins (of a particular splice variety) and interestingly to some postsynaptic glutamic acid receptors.  So there is a direct chain across the synapse from neurexin to cerebellin to ion channel (GluD1, GluD2).

There is far more to the review. But here is something I didn’t see there.  People have talked about proton wires, sites on proteins that allow protons to jump from one site to another, and move much faster than they would if they had to bump into everything in solution.  Remember that molecules are moving quite rapidly: water is moving at 590 meters a second at room temperature. Since the synaptic cleft is 40 nanoMeters (40 x 10^-9 meters) across, it should take only 40 x 10^-9 meters / 590 meters/second = about 68 trillionths of a second (68 picoSeconds) to cross, assuming the synapse is a free fly zone.  But it isn’t, as the review exhaustively shows.
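For anyone wanting to redo the crossing time, here is a minimal sketch. It assumes the 590 meters a second is the Maxwell-Boltzmann mean speed of a water molecule treated as an ideal gas at 25 C (my assumption about where the figure comes from), and that the molecule flies straight across without hitting anything, which is exactly the free fly zone picture the review rules out.

```python
import math

# ASSUMPTIONS: ideal-gas kinetic theory, 25 C, straight-line flight.
GAS_CONSTANT = 8.314462618      # J / (mol K)
TEMPERATURE_K = 298.15          # room temperature, assumed 25 C
MOLAR_MASS_WATER = 0.018015     # kg / mol

# Maxwell-Boltzmann mean speed: sqrt(8RT / (pi * M))
mean_speed = math.sqrt(
    8 * GAS_CONSTANT * TEMPERATURE_K / (math.pi * MOLAR_MASS_WATER)
)

cleft_width_m = 40e-9           # 40 nanoMeters
crossing_time_s = cleft_width_m / mean_speed

print(f"mean speed of water: {mean_speed:.0f} meters/second")
print(f"time to cross 40 nm: {crossing_time_s * 1e12:.0f} picoSeconds")
```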

Is it possible that the various neurotransmitters at the synapse (glutamic acid, gamma amino butyric acid, etc) bind to the various proteins crossing the cleft to get to their target in the postsynaptic membrane (i.e. neurotransmitter wires)?  I didn’t see any mention of neurotransmitter binding to the various proteins in the review.  This may actually be an original idea.

I’d like to put more numbers on many of these things, but they are devilishly hard to find.  Both the neuroligins and neurexins are said to have stalks pushing them out from the membrane, but I can’t find how many amino acids they contain.  I can’t find how much energy it takes to copy the 1.1 megabase neurexin gene into mRNA (or even how much energy it takes to add one ribonucleotide to an existing mRNA chain).

Another point: proteins have a finite lifetime.  How are they replenished?  We know that there is some synaptic protein synthesis.  Does the cell body send packages of mRNAs to the synapse to be translated there?  There are at least 50 different proteins mentioned in the review, and don’t forget the thousands of possible isoforms, each of which requires a separate mRNA.

Old Chinese saying: the mountains are high and the emperor is far away. Protein synthesis at the synaptic cleft is probably local.  How it is decided what gets made, and when, is an entirely different problem.

A large part of the review concerns mutations in all these proteins associated with neurologic disease (particularly autism).  This whole area has a long and checkered history.  A high degree of cynicism is needed before believing that any of these mutations are causative.  As a neurologist dealing with epilepsy I saw the whole idea of ion channel mutations causing epilepsy crash and burn — here’s a link — https://luysii.wordpress.com/2011/07/17/we’ve-found-the-mutation-causing-your-disease-not-so-fast-says-this-paper/

Once again, hats off to Dr. Sudhof for what must have been a tremendous amount of work.

Is a rational treatment for chronic fatigue syndrome at hand?

If an idea of mine is correct, it is possible that some patients with chronic fatigue syndrome (CFS) can be treated with specific medications based on the results of a few blood tests. This is precision medicine at its finest.  The data to test this idea has already been acquired, and nothing further needs to be done except to analyze it.

Although the initial impetus for the idea happened only 3 months ago, there have been enough twists and turns that the best explanation is a timeline.

First some background:

As a neurologist I saw a lot of people who were chronically tired and fatigued, because neurologists deal with muscle weakness and diseases like myasthenia gravis which are associated with fatigue.  Once I ruled out neuromuscular disease as a cause, I had nothing to offer then (nor did medicine).  Some of these patients were undoubtedly neurotic, but there was little question in my mind that many others had something wrong that medicine just hadn’t figured out yet — not that it hasn’t been trying.

Infections of almost any sort are associated with fatigue, most probably caused by components of the inflammatory response.  Anyone who’s gone through mononucleosis knows this.    The long search for an infectious cause of chronic fatigue syndrome (CFS) has had its ups and downs — particularly downs — see https://luysii.wordpress.com/2011/03/25/evil-scientists-create-virus-causing-chronic-fatigue-syndrome-in-lab/

At worst many people with these symptoms are written off as crazy; at best, diagnosed as depressed  and given antidepressants.  The fact that many of those given antidepressants feel better is far from conclusive, since most patients with chronic illnesses are somewhat depressed.

The 1 June 2017 Cell had a long and interesting review of cellular senescence by Norman Sharpless [ vol. 169 pp. 1000 – 1011 ].  Here is some background about the entity.  If you are familiar with senescent cell biology skip to the paragraph marked **** below

Cells die in a variety of ways.  Some are killed (by infections, heat, toxins).  This is called necrosis. Others voluntarily commit suicide (this is called apoptosis).   Sometimes a cell under stress undergoes cellular senescence, a state in which it doesn’t die, but doesn’t reproduce either.  Such cells have a variety of biochemical characteristics: they are resistant to apoptosis, they express molecules which prevent them from proliferating and, most importantly, they secrete a variety of proinflammatory molecules collectively called the Senescence Associated Secretory Phenotype (SASP).

At first the very existence of the senescent state was questioned, but exist it does.  What is it good for?  Theories abound, one being that mutation is one cause of stress, and stopping mutated cells from proliferating prevents cancer. However, senescent cells are found during fetal life; and they are almost certainly important in wound healing.  They are known to accumulate the older you get and some think they cause aging.

Many stresses induce cellular senescence of which mutation is but one.  The one of interest to us is chemotherapy for cancer, something obviously good as a cancer cell turned senescent has stopped proliferating.   If you know anyone who has undergone chemotherapy, you know that fatigue is almost invariable.

****

One biochemical characteristic of the senescent cell is increased levels of a protein called p16^INK4a, which helps stop cellular proliferation.  While p16^INK4a can easily be measured in tissue biopsies, tissue biopsies are inherently invasive. Fortunately, p16^INK4a can also be measured in circulating blood cells.

What caught my eye in the Cell paper was a reference to a paper about cancer [ Cancer Discov. vol. 7 pp. 165 – 176 ’17 ] by M. Demaria, in which the levels of p16^INK4a correlated with the degree of fatigue after chemotherapy.  The more p16^INK4a in the blood cells the greater the fatigue.

I may have been the only reader of both papers with clinical experience with chronic fatigue syndrome.  It is extremely difficult to objectively measure a subjective complaint such as fatigue.

As an example of the difficulty in correlating subjective complaints with objective findings, compare the nearly uniform complaint of difficulty thinking in depression with how such patients actually perform on cognitive tests: there is little if any correlation between complaints and actual performance.  Here’s a current reference: Scientific Reports 7, Article number: 3901(2017),  doi:10.1038/s41598-017-04353.

If the results of the Cancer paper could be replicated, p16^INK4a would be the first objective measure of a patient’s individual sense of fatigue.

So I wrote both authors, suggesting that the p16^INK4a test be run on a collection of chronic fatigue syndrome (CFS) patients. Both authors replied quickly, but thought the problem would be acquiring patients.  Demaria said that Sharpless had a lab all set up to do the test.

Then fate (in the form of Donald Trump) supervened.  A mere 9 days after the Cell issue appeared, Sharpless was nominated to be the head of the National Cancer Institute by President Trump.  This meant Dr. Sharpless had far bigger fish to fry, and he would have to sever all connection with his lab because of conflict of interest considerations.

I also contacted a patient organization for chronic fatigue syndrome without much success.  Their science advisor never responded.

There matters stood until 22 August when a paper and an editorial about it came out [ Proc. Natl. Acad. Sci. vol. 114 pp. 8914 – 8916, E7150 – E7158 ’17 ].  The paper represented a tremendous amount of data (and work).  The blood levels of 51 cytokines (measures of inflammation) and adipokines (hormones released by fat) were measured in 192 patients with CFS (which can only be defined by symptoms) and in 293 healthy controls matched for age and gender.

In this paper, levels of 17 of the 51 cytokines correlated with severity of CFS. This is strikingly similar to the way the p16^INK4a levels correlated with the degree of fatigue after chemotherapy.  So I looked up the individual elements of the SASP (which can be found in Annu Rev Pathol. 2010; 5: 99–118).  There are 74 of them. I wondered how many of the 51 cytokines measured in the PNAS paper were in the SASP.  This is trickier than it sounds as many cytokines have far more than one name.  The bottom line is that 20 SASPs are in the 51 cytokines measured in the paper.

If the fatigue of CFS is due to senescent cells and the SASPs  they release, then they should be over-represented in the 17 of the 51 cytokines correlating with symptom severity.  Well they are; 9 out of the 17 are SASP.  However although suggestive, this increase is not statistically significant (according to my consultants on Math Stack Exchange).
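One way to put a number on "suggestive but not significant" is a one-sided hypergeometric test on the counts above. This is my sketch of a standard approach, not necessarily the calculation the Math Stack Exchange consultants did; the function name is made up.

```python
from math import comb

TOTAL_MEASURED = 51     # cytokines measured in the PNAS paper
SASP_IN_PANEL = 20      # of those, members of the SASP
CORRELATED = 17         # cytokines correlating with CFS severity
SASP_CORRELATED = 9     # of those, members of the SASP


def hypergeom_upper_tail(population: int, successes: int, draws: int, observed: int) -> float:
    """P(X >= observed) when `draws` items are picked without replacement
    from `population` items, `successes` of which are of the kind of interest."""
    total_ways = comb(population, draws)
    most_possible = min(successes, draws)
    favorable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, most_possible + 1)
    )
    return favorable / total_ways


# SASPs expected among the 17 if membership were unrelated to severity.
expected_by_chance = CORRELATED * SASP_IN_PANEL / TOTAL_MEASURED
p_value = hypergeom_upper_tail(TOTAL_MEASURED, SASP_IN_PANEL, CORRELATED, SASP_CORRELATED)

print(f"SASPs expected by chance: {expected_by_chance:.1f}, observed: {SASP_CORRELATED}")
print(f"one-sided p value: {p_value:.3f}")
```

Chance alone would put 6 or 7 SASPs among the 17, so 9 is an excess, but the p value stays above the conventional 0.05 cutoff.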

After I wrote him about the new work, Dr. Sharpless noted that CFS is almost certainly a heterogeneous condition. As a clinician with decades of experience, I certainly did see some of the more larcenous members of our society who used any subjective diagnosis to be compensated, as well as a variety of individuals who just wanted to withdraw from society, for whatever reason. They are undoubtedly contaminating the sample in the paper. Dr. Sharpless thought the idea, while interesting, would be very difficult to test.

But it wouldn’t at all.  Not with the immense amount of data in the PNAS paper.

Here’s how. Take each of the 9 SASPs and see how their levels correlate with the other 16 (in each of the 192 CFS patients). If they correlate better with SASPs than with nonSASPs, then this would be evidence for senescent cells being the cause of some cases of CFS. In particular, patients with a high level of any of the 9 SASPs should be studied for such correlations.  Doing so should weed out some of the heterogeneity of the 192 patients in the sample.
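The analysis proposed above could look something like this. It is only a sketch: the cytokine names and patient values are made up toy data (the real per-patient levels are with the authors of the PNAS paper), and plain Pearson correlation is my choice, not something specified anywhere.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x, mean_y = mean(x), mean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def mean_correlations(levels, sasp_names):
    """levels maps cytokine name -> list of per-patient levels.

    Returns (mean r over SASP/SASP pairs, mean r over SASP/nonSASP pairs).
    """
    within, across = [], []
    for a, b in combinations(levels, 2):
        a_is_sasp, b_is_sasp = a in sasp_names, b in sasp_names
        if a_is_sasp and b_is_sasp:
            within.append(pearson(levels[a], levels[b]))
        elif a_is_sasp != b_is_sasp:
            across.append(pearson(levels[a], levels[b]))
    return mean(within), mean(across)


# HYPOTHETICAL toy data: 4 made-up cytokines measured in 5 made-up patients.
toy_levels = {
    "sasp_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "sasp_B": [1.1, 2.1, 2.9, 4.2, 4.8],
    "other_C": [3.0, 1.0, 4.0, 1.0, 5.0],
    "other_D": [2.0, 2.5, 1.0, 3.0, 1.5],
}

within_r, across_r = mean_correlations(toy_levels, {"sasp_A", "sasp_B"})
print(f"mean r, SASP vs SASP:    {within_r:.2f}")
print(f"mean r, SASP vs nonSASP: {across_r:.2f}")
```

If senescent cells drive some cases, the first number should be clearly higher than the second in the real data, particularly in the subgroup of patients with high SASP levels.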

This is why the idea is testable and, even better, falsifiable, making it a scientific hypothesis (a la Karl Popper).  The data to refute it is in the possession of the authors of the paper.

Suppose the idea turns out to be correct and that some patients with CFS are in fact that way because, for whatever reason, they have a lot of senescent cells releasing SASPs.

This would mean that it would be time to start trials of senolytic drugs, which destroy senescent cells, on the group with elevated SASPs. Fortunately, a few senolytics are currently in clinical use.  This would be precision medicine at its finest.

Being able to alleviate the symptoms of CFS would be worthwhile in itself, but SASP levels could also be run on all sorts of conditions associated with fatigue, most notably infection. This might lead to symptomatic treatment at least.  Having gone through mono in med school, I would have loved to have been able to take something to keep me from falling asleep all the time.

Are you as smart as the (inanimate) blind watchmaker?

Here’s a problem the cell has solved. Can you? Figure out a way to send a protein to two different membranes in the cell (the membrane enclosing it { aka the plasma membrane }, and the endoplasmic reticulum) in the proportions you wish.

The proteins must have exactly the same sequence and content of amino acids, ruling out alternative splicing of exons in the mRNA (if this is Greek to you have a look at the following post — https://luysii.wordpress.com/2012/01/09/molecular-biology-survival-guide-for-chemists-v-the-ribosome/ and the others collected under — https://luysii.wordpress.com/category/molecular-biology-survival-guide/).

The following article tells you how the cell does it. Recall that not all of the messenger RNA (mRNA) is translated into protein. The ribosome latches on to the 5′ end of the mRNA,  subsequently moving toward the 3′ end until it finds the initiator codon (AUG which codes for methionine). This means that there is a 5′ untranslated region (5′ UTR). It then continues moving 3′ ward stitching amino acids together.  Similarly after the ribosome reaches the last codon (one of 3 stop codons) there is a 3′ untranslated region (3′ UTR) of the mRNA. The 3′ UTR isn’t left alone but is cleaved and a polyAdenine tail added to it. The 3′ UTR is where most microRNAs bind controlling mRNA stability (hence the amount of protein produced from a given mRNA).

The trick used by the cell is described in [ Nature vol. 522 pp. 363 – 367 ’15 ]. The 3’UTR is alternatively processed producing a variety of short and long 3’UTRs. One such protein where this happens is CD47 — which is found on the surface of most cells where it stops the cell from being eaten by scavenger cells such as macrophages. The long 3′ UTR of CD47 allows efficient cell surface expression, while the short 3′ UTR localizes it to the endoplasmic reticulum.

How could this possibly work? Once the protein is translated by the ribosome, it leaves the ribosome and the mRNA doesn’t it? Not quite.

They say that the long 3′ UTR of CD47 acts as a scaffold to recruit a protein complex, which contains HuR (aka ELAVL1), an RNA binding protein, and SET, to the site of translation. This allows interaction of SET with the newly translated cytoplasmic domains of CD47, resulting in subsequent translocation of CD47 to the plasma membrane via activated RAC1.

The short 3′ UTR of CD47 doesn’t have the sequence binding HuR and SET, so CD47 doesn’t get to the plasma membrane, rather to the endoplasmic reticulum.

The mechanism may be quite general as HuR binds to thousands of mRNAs. The paper gives two more examples of proteins where this happens.

It’s also worth noting that all this exquisite control does NOT involve covalent bond formation and breakage (e.g. not what we consider classic chemical reactions). Instead it’s the dance of one large molecular object binding to another in other ways. The classic chemist isn’t smiling. The physical chemist is.

The Bach Fugue of the Genome

There are more things in heaven and earth, Horatio,
Than are dreamt of in your philosophy.
– Hamlet (1.5.167-8), Hamlet to Horatio

Just when you thought we’d figured out what genomes could do, the virusoid of rice yellow mottle virus performs a feat of dense coding I’d have thought impossible. The following work requires a fairly sophisticated understanding of molecular biology, for which the articles in “Molecular Biology Survival Guide for Chemists” might provide the background. Give it a shot. This is fascinating stuff. If the following seems incomprehensible, start with https://luysii.wordpress.com/2010/07/07/molecular-biology-survival-guide-for-chemists-i-dna-and-protein-coding-gene-structure/ and then follow the links forward.

Virusoids are single stranded circular RNAs which are dependent on a virus for replication. They are distinct from viroids because viroids need nothing else to replicate. Neither the virusoid nor the viroid was thought to code for protein (until now). They are usually found inside the protein shells of plant viruses.

[ Proc. Natl. Acad. Sci. vol. 111 pp. 14542 – 14547 ’14 ] Viroids and virusoids (viroid like satellite RNAs) are small (220 – 450 nucleotide) covalently closed circular RNAs. They are the smallest known replicating circular RNA pathogens. They replicate via a rolling circle mechanism to produce larger concatemers which are then processed into monomeric forms by a self-splicing hammerhead ribozyme, or by cellular enzymes.

The rice yellow mottle virus (RYMV) contains a virusoid which is a covalently closed circular RNA of a mere 220 nucleotides. A 16 kiloDalton basic protein is made from it. How can this be? Figure the average molecular mass of an amino acid at 100 Daltons, and 3 nucleotides per codon (one codon per amino acid). This means that 220 nucleotides can code for 73 amino acids at most (e.g. for a 7 – 8 kiloDalton protein).

So far the RYMV virusoid is the only RNA of viroids and virusoids which actually codes for a protein. The virusoid sequence contains an internal ribosome entry site (IRES) of the following form UGAUGA. Initiation starts at the AUG, and since 220 isn’t an integral multiple of 3 (the size of amino acid codons), translation continues around the circle in another reading frame until it gets to one of the UGAs (termination codons) in UGAUGA, either the first or the second. Termination codons can be ignored (leaky codons) to obtain larger read through proteins. So this virusoid is a circular RNA with no NONcoding sequences which codes for a protein in either 2 or 3 of the 3 possible reading frames. Notice that UGAUGA contains UGA in both of the alternate reading frames ! So it is likely that the same nucleotide is being read 2 or 3 ways. Amazing ! ! !
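
The arithmetic behind this is easy to check. The sketch below is only bookkeeping under the round numbers used above (220 nucleotides, 3 nucleotides per codon, about 100 Daltons per amino acid); the function names are mine, and it ignores where the stop codons actually fall in the real sequence.

```python
GENOME_NT = 220             # length of the circular virusoid RNA (from the text)
NT_PER_CODON = 3
DALTONS_PER_RESIDUE = 100   # rough average used in the text

def max_residues_one_lap(genome_nt=GENOME_NT):
    """Most amino acids that a single trip around the circle can encode."""
    return genome_nt // NT_PER_CODON

def frame_on_lap(lap, genome_nt=GENOME_NT):
    """Reading-frame offset (0, 1 or 2), relative to the start codon,
    after `lap` complete trips around the circle."""
    return (lap * genome_nt) % NT_PER_CODON

def laps_for_mass(mass_daltons, genome_nt=GENOME_NT):
    """Trips around the circle needed to encode a protein of the given mass."""
    residues = mass_daltons / DALTONS_PER_RESIDUE
    return residues * NT_PER_CODON / genome_nt

print(max_residues_one_lap())                # 73
print([frame_on_lap(k) for k in range(4)])   # [0, 1, 2, 0]
print(round(laps_for_mass(16_000), 2))       # 2.18
```

Because 220 leaves a remainder of 1 when divided by 3, each trip around the circle shifts the reading frame by one, and the original frame only returns after three trips (660 nucleotides). With these round numbers a 16 kiloDalton protein needs about 480 nucleotides, a bit more than two trips, so the ribosome has to read the same RNA in more than one frame.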

It isn’t clear what function the virusoid protein performs for the virus when the virus has infected a cell. Perhaps there isn’t one, and the only function of the protein is to help the virusoid continue existence inside the virus.

Talk about information density. The RYMV virusoid is the Bach Fugue of the genome. Bach sometimes inverts the fugue theme, and sometimes plays it backwards (a musical palindrome if you will).

It is unfortunate that more people don’t understand the details of molecular biology so they can appreciate mechanisms of this elegance. Whether you think understanding it is an esthetic experience, is up to you. I do. To me, this resembles the esthetic experience that mathematics offers.

A while back I wrote a post, wondering if the USA was acquiring brains from the MidEast upheavals, the way we did from Europe because of WWII. Here’s the link https://luysii.wordpress.com/2014/09/28/maryam-mirzakhani/.

Clearly Canada has done just that. Here are the authors of the PNAS paper above and their affiliations. Way to go Canada !

Mounir Georges AbouHaidar: Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada M5S 3B2
Srividhya Venkataraman: Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada M5S 3B2
Ashkan Golshani: Biology Department, Carleton University, Ottawa, ON, Canada K1S 5B6
Bolin Liu: Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada M5S 3B2
Tauqeer Ahmad: Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada M5S 3B2

A primer on prions

Actually Kurt Vonnegut came up with the basic idea behind prions in his 1963 novel “Cat’s Cradle”. Instead of proteins, it involved a form of water (Ice-9) which had never been seen before, but one which was solid at room temperature. Unfortunately, it also solidified all liquid water it came in contact with, effectively ending life on earth.

Now for some history.

The first Xray crystallographic structures of proteins were incredibly seductive intellectually, much as false color functional magnetic resonance (fMRI) images are today. It was hard not to think of them as the structure of the protein.

Nowadays we know that lots of proteins have at least one intrinsically disordered (trans. unstructured) segment of 30 amino acids or more. [ Nature vol. 411 pp. 151 – 153 ’11 ] says 40%, and also that 25% of all human proteins are likely to be disordered (translation; unstructured) from end to end — based on a bioinformatics program.

I’ve always been amazed that any protein has only a few shapes, purely on the basis of the chemistry — read this if you have the time — https://luysii.wordpress.com/2010/08/04/why-should-a-protein-have-just-one-shape-or-any-shape-for-that-matter/. Clearly the proteins making us up do have a relatively limited number of shapes (or we’d all be dead).

The possible universe of proteins from which our proteins are selected is enormously large. In fact the whole earth doesn’t have enough mass (even if it were made entirely of hydrogen, carbon, nitrogen, oxygen and sulfur) to make just one copy of the 20^100 possible proteins of length 100. For the calculation please see — https://luysii.wordpress.com/2009/12/20/how-many-proteins-can-be-made-using-the-entire-earth-mass-to-do-so/ — if you have the time.
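
Here is a back of the envelope version of that comparison, as a sketch only. The Earth's mass and the Dalton-to-kilogram conversion are standard approximate values; the 100 Daltons per amino acid is the same round number used elsewhere on this blog and is an assumption here, so the linked post may use slightly different figures.

```python
from math import log10

EARTH_MASS_KG = 5.97e24      # approximate mass of the Earth
DALTON_KG = 1.66e-27         # one Dalton in kilograms (approximate)
DALTONS_PER_RESIDUE = 100    # assumed average mass of an amino acid residue
LENGTH = 100                 # protein length in amino acids
ALPHABET = 20                # number of amino acids to choose from

sequences = ALPHABET ** LENGTH                          # exact integer, 20^100
protein_kg = LENGTH * DALTONS_PER_RESIDUE * DALTON_KG   # mass of one molecule
molecules_from_earth = EARTH_MASS_KG / protein_kg       # molecules the Earth's mass allows

print(f"log10(possible sequences)   = {log10(sequences):.1f}")
print(f"log10(molecules from Earth) = {log10(molecules_from_earth):.1f}")
```

The first number comes out around 130 and the second around 48, so the Earth falls short by roughly 80 orders of magnitude no matter how the round numbers are tweaked.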

So, even though it is a meaningful question philosophically, just how common proteins with a few shapes are in this universe, we’ll never be able to carry out the experiment. Popper would say it’s a scientifically meaningless question, because it can’t be experimentally decided. Bertrand Russell would not.

Again, if you have time, take a look at https://luysii.wordpress.com/2010/08/08/a-chemical-gedanken-experiment/

Which, at long last, brings us to prions.

They were first discovered in yeast, and were extremely hard to figure out as they represented something in the cytoplasm which contained no DNA and yet which was heritable. The first prion was discovered nearly 50 years ago. It was called [ PSI+ ] and it produced a lot of new proteins in yeast containing it (which is how its effects were measured). Mating [ PSI+ ] with [ psi- ] (e.g. yeast cells without [ PSI+ ]) converted the [ psi- ] to [ PSI+ ]. It couldn’t be mapped to any known genetic element. Also [ PSI+ ] was lost at a higher rate than would be expected for a DNA mutation. The first clue that [ PSI+ ] was a protein was that it was lost faster when yeast were grown in the presence of protein denaturants (such as guanidine).

It turned out that [ PSI+ ] was an aggregated form of the Sup35 protein, which basically functioned to suppress the ribosome from reading through the stop codon. If you need background on what was just said please see — https://luysii.wordpress.com/2010/07/07/molecular-biology-survival-guide-for-chemists-i-dna-and-protein-coding-gene-structure/ and the subsequent 4 posts. This is why [ PSI+ ] yeast produced longer proteins. Things began to get exciting when Sup35 was dissected so domains could be found which induced [ PSI+ ] formation. Amazingly these domains spontaneously formed visible fibers in vitro resembling amyloid in some respects (binding the dye Congo Red for one). Then they found that preformed fibers greatly accelerated fiber formation by unpolymerized Sup35 — beginning to sound a bit like Ice-9 doesn’t it. Yeasts have many other prions, but the best studied and most informative is the one formed from Sup35.

So that’s how prions were found (in yeast) and what they are — an aggregated form of a given protein in a slightly different shape, which can cause another molecule of the same protein to adopt the prion protein’s new shape. Amazingly, we have prions within us. But that’s the subject of the next post.

Molecular Biology Survival Guide for Chemists — V: The Ribosome

The ribosome is where the rubber meets the road (in the protein-centric view of the cell).  It is a monstrously large molecular machine 200 – 300 Angstroms in diameter.  Remember that the diameter of the double helix is only 20 Angstroms.   It takes messenger RNA (mRNA) and, using it as a code, translates the sequence of nucleotides into a sequence of amino acids (e.g. a protein).  Get a copy of the 16 December ’11 issue of Science, and stare at the cover for a while.  It’s a picture of the eukaryotic (yeast) ribosome in all its glory. The details are to be found in [ Science vol. 334 pp. 1524 – 1539 ’11 ].  If you have an issue hanging around, also look at pp. 1509 – 1510, as some ribosomal background is required before a post on that subject.

The article gives the structure of the Saccharomyces cerevisiae ribosome at 3 Angstroms resolution.  Quite a feat.  It comes in two parts, a large subunit which sediments at 60 Svedberg units, and a ‘small’ one at 40S.

The large subunit contains 3 RNA molecules and 46 proteins, the small one contains 1 RNA and 33 proteins.  Total molecular mass is around 2.5 megadaltons.  It’s maddening, but I can’t seem to find out just how many nucleotides our ribosomal RNAs (rRNAs) contain in toto.  It is well over 5,000 however.   So the number of atoms in the RNAs alone is over 200,000.  There must be many more atoms than that contained in the associated proteins, as the phosphates have a mass of 98, the ribose 115, the pyrimidines around 100.  So they don’t account for more than 40% of the total ribosomal mass.  If anyone can give me exact numbers, I’ll update this.

The actual catalysis is not accomplished by the 79 proteins, but by the RNAs themselves.  This is thought to be a living relic of an RNA world where life actually began.  The proteins are mostly found on the surface of the ribosome.

There are a gigantic number of things to say about the ribosome, but I’m just going to put in the facts needed so pure chemist types can read other posts. This post will be expanded as necessary when further background is needed.

Amino acids are linked together (the rate is only 2 – 6 per second) by the beast. This is OK as the average cell has over 10 million ribosomes (neurons probably have more).  The article above notes that most of the changes between the ribosome of bacteria and that of nucleated organisms (eukaryotes) make our ribosomes bigger.  The proteins are bigger, the rRNAs are longer.

The actual synthesis of proteins takes place deep in the center of the ribosome, where the two subunits come together.  How does the protein get out?  It is extruded (like sausage) through the exit tunnel, which is 100 Angstroms long in the E. coli ribosome, where its diameter varies between 10 and 20 Angstroms.  Since the alpha helix is 11 Angstroms wide, this means that little if any other secondary structures (beta turns, beta sheets) and no tertiary structure at all can form within it.  It’s probably longer (and possibly wider) in our ribosomes.

The tail of RNA polymerase II and the limits of chemical explanation

When I study math books, I’m always amazed at how much the reader is expected to internalize and retain.  A theorem proved 100 pages or so ago is referred to in the course of a proof without further ado.  The pure chemist reading this longest of posts, with minimal exposure to modern molecular biology, may feel the same way.  You’ll need all 4 articles of https://luysii.wordpress.com/category/molecular-biology-survival-guide/, and all 6 articles of https://luysii.wordpress.com/category/the-cell-nucleus-on-a-human-scale/ at your fingertips to get through this one.  The stuff is at my mental fingertips because I’ve been learning and thinking about it for decades.  Perhaps mathematicians are the same way, or perhaps they really are smarter than everyone else.

The article assumes you have a solid chemical background.  I find it somewhat sad that only a chemist with a decent molecular biological background can fully understand the elegance and beauty of what is to follow. I hope this post and the 10 above provide enough background for what is to follow.

Recall that eukaryotic RNA polymerase II (pol II) is really a complex of 12 distinct proteins in man with a total mass of 550 kiloDaltons.  The RPB1 subunit is the largest of the 12 and contains a truly fascinating carboxy terminal domain (CTD) — to be discussed in some detail later in this post.  The function of pol II is transcription of a protein coding gene into messenger RNA (mRNA). Pol II binds to DNA upstream of (5′ to) the DNA which actually codes for the amino acids making up the protein. Just binding there (this site is called the promoter) is far from enough for gene transcription to actually begin.  5 general transcription factors (pol II transcription factors B, D, E, F, H — aka TFIIB, etc.) are required.  All 5  general transcription factors are actually multiprotein complexes.  Then there is the mediator complex, a complex of more than 20 proteins which allows communication between transcriptional activators (enhancers) and repressors found elsewhere in the DNA.  So the whole gemish contains 60 proteins with a mass of 3,500,000 Daltons.  The heaviest atom commonly found in all this is sulfur (atomic mass 32), so this means at least 100,000 atoms are involved.  Have a look at Science vol. 288 pp. 632 – 633, 640 – 649 ’00 — it’s old but good and written by Kornberg fils who won his Nobel for this work.
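
The atom count is a simple lower bound: pretend every atom in the complex is as heavy as the heaviest one, and divide. A quick sketch (the function name is mine; the heaviest atom is taken as either phosphorus at about 31 Daltons or sulfur, present in cysteine and methionine, at about 32):

```python
COMPLEX_MASS_DA = 3_500_000   # mass of the whole assembly, from the text

def min_atoms(total_mass_da, heaviest_atom_da):
    """Lower bound on the number of atoms: assume every atom
    weighs as much as the heaviest one present."""
    return total_mass_da / heaviest_atom_da

for name, mass in (("phosphorus", 31), ("sulfur", 32)):
    print(name, round(min_atoms(COMPLEX_MASS_DA, mass)))
```

Either way the bound comes out a bit above 100,000. Since most of the atoms are actually hydrogen, carbon, nitrogen and oxygen, the true count is several times larger.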

I’ve mentioned some of the processing that goes on after the section of the DNA actually coding for amino acids is transcribed into RNA (splicing, the polyA tail, etc. etc.).  There is also some modification of the 5′ end of the RNA (called the cap), requiring a variety of binding proteins and enzymes to occur.

Just binding to the promoter, separating the two strands of DNA and starting to copy (transcribe) one of them into RNA is not enough.  This happens all the time, but after making  RNAs 5 – 10 nucleotides long, pol II pauses, releases the RNA just made and pops back to the promoter (which it really never left).  The other proteins of the 3.5 megaDalton initiation complex hold onto pol II keeping it there.

Here is where the carboxy terminal domain of the largest subunit of pol II comes in.  It is a fascinating structure, which can only be completely understood by the chemist.  It is made of 52 imperfect repeats of 7 amino acids.  Here is the consensus repeat (listed from the amino terminal end to the carboxy terminal end — as protein sequences are always presented).

Tyrosine Serine Proline Threonine Serine Proline Serine

What should strike the biochemically oriented chemist is that the 3 (out of 20) amino acids with hydroxyl groups account for 5/7 ths of the structure.  This means that all of them can be phosphorylated.  The two prolines are hardly dull, because they make it impossible for classic alpha helices to form — sometimes they are called helix breakers.  The OH groups mean that the heptad is quite hydrophilic.  Phosphorylation of any two OHs of the heptad means that the chain will be pretty much straight out due to charge charge repulsion.  The number of distinct phosphorylated states of even one heptad is 2^5 =32, that for the whole CTD is 32^52.
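
The counting can be spelled out in a few lines. This is a sketch under the simplifying assumption that all 52 repeats match the consensus exactly (they don't, the repeats are imperfect) and that each hydroxyl is independently either phosphorylated or not; the variable names are mine.

```python
HEPTAD = ["Tyr", "Ser", "Pro", "Thr", "Ser", "Pro", "Ser"]   # consensus repeat from the text
HYDROXYL = {"Tyr", "Ser", "Thr"}                             # amino acids with a phosphorylatable OH
REPEATS = 52

sites_per_heptad = sum(1 for aa in HEPTAD if aa in HYDROXYL)
states_per_heptad = 2 ** sites_per_heptad          # each site is phosphorylated or not
states_whole_ctd = states_per_heptad ** REPEATS    # exact integer, 32^52 = 2^260

print(sites_per_heptad, states_per_heptad)         # 5 32
print(len(str(states_whole_ctd)))                  # 79 (decimal digits)
```

A 79 digit number of possible states, which is in the same ballpark as common estimates of the number of atoms in the observable universe. The cell obviously uses only a tiny fraction of them.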

Chemists more familiar with biochemistry, know that phosphorylation and dephosphorylation of serine, threonine and tyrosine is extensively used by the cell to control protein/protein interactions.  That’s why our genome codes for 518 different protein kinases (which esterify hydroxyls by phosphate  despite the rather weird name) and 137 phosphatases.

So the phosphorylation state (how much, which ones) of the carboxy terminal domain determines which proteins bind to it.  Here is where the fun begins.

Just to give a glimpse of what is going on in our cells all the time, here are the gory details of formation of the cap at the 5′ end of mRNA.  You don’t have to read the details between the asterisks to follow the rest of the post.

***

   [ Proc. Natl. Acad. Sci. vol. 86 pp. 5795 – 5799 ’89 ] All cellular cytoplasmic mRNAs have a 7 methyl guanylate cap attached to their 5′ ends.  The cap structure is added early during the transcription of mRNA by RNA polymerase II in the nucleus (after the first 25 nucleotides of a given mRNA are formed).  
       Three enzymes are involved in mRNA cap formation 
   (1) an RNA triphosphatase which cleaves the 5′ triphosphate terminus of the primary transcript to a 5′ diphosphate terminated RNA 
   (2) a guanyltransferase, which caps the structure with GMP — forming a 5′ – 5′ linkage 
   (3) a methyl transferase which adds a methyl group to the nitrogen at position #7 of guanine (see the structure of 7 methyl guanosine). 
   (4) The cap structure can then be further methylated by a ribose 2′-O methyltransferase.
*** 

The 3 capping enzymes bind to the phosphorylated carboxy terminal domain of pol II, so they can grab the newly formed 5′ end of the mRNA as it emerges from a tunnel in pol II.  Not only that, but the enzymes bind to a specific pattern of phosphorylation of the tail (namely phosphorylation of serine #5 by a kinase called Cdk7).

         An intricate mechanism exists to stop transcription from proceeding too far, so the 5′ end of the emerging RNA is properly processed.  During the formation of the transcription initiation complex (or soon after initiation) DRB sensitivity inducing factor (DSIF) is recruited to the transcription complex (by binding to the CTD).  Additionally, after initiation of transcription, the negative elongation factor (NELF) is recruited through interaction with DSIF.  This results in the arrest of the transcription complex before it enters into productive elongation. DSIF/NELF mediated arrest is then relieved by means of phosphorylation of the carboxy terminal domain on serine #2 by positive transcription elongation factor b (P-TEFb) and the transcription complex resumes elongation.  This causes DSIF and NELF (both are proteins) to drop away from the CTD.

       Even so, pol II is still linked to the initiation complex at the promoter.  How does it get started again and move away from the promoter? The process is called promoter clearance or promoter escape.  Another phosphorylation of the CTD is involved — this time on serine #5 by a kinase called Cdk7, which is found in one of the general transcription factor complexes (TFIIH).     

       Eventually a whole bunch of proteins (called the super elongation complex) binds to the CTD allowing not just escape, but movement down DNA.  The complex includes the P-TEFb, ELL2, AFF4, AFF1, ENL and AF9 proteins.  So now pol II is chugging down DNA adding a new base every 50 milliSeconds or so.  A whole other group of kinases modifies the CTD so different proteins can bind to it after the terminal codon is reached and finish processing the mRNA.  I’m going to skip this as you have the general idea, but rest assured it is just as complicated as putting on the 5′ cap described above.

Now for the exquisite mechanisms described in Proc. Natl. Acad. Sci. vol. 108 pp. 14717 – 14718 ’11.  In the previous post — https://luysii.wordpress.com/2011/09/18/the-cell-and-its-nucleus-on-a-human-scale-vi-untwisting-the-linguini/ — I wondered how the large pol II enzyme transcribes DNA wound twice around the nucleosome (I really haven’t found an answer that satisfies me).  Work has shown that pol II slows down when it reaches a nucleosome (it incorporates fewer nucleotides into the growing mRNA per second).

“95% of human multiexon protein coding genes are alternatively spliced” [ Nature vol. 465 pp. 16 – 17 ’10 ]  So how is the decision made between two alternative exons by the splicing machinery?  It turns out that pol II is involved here as well.  There is no logical reason it has to be.  The whole mRNA could be formed by the polymerase and then it could move elsewhere in the nucleus to the splicing machinery.  But in this one well studied case, alternative splicing occurs as pol II is transcribing one particular gene (which is mutated in type I neurofibromatosis).

Now for a side trip to neurology.  There is an awful disease called paraneoplastic encephalomyelitis.  The brain is subject to an immune attack in some patients with cancer (and in some it can be the first symptom) with resultant dementia, convulsions, incoordination and death.  For years we wondered what the immune system was attacking.  Now we know it is any of three proteins (HuB, HuC, HuD) found only in the brain.  They bind to messenger RNA.  Why the immune system sometimes chooses them for attack and how cancer sometimes triggers this isn’t known for sure.  One of the theories is that the cancer cells produce something that immunologically looks like the Hu proteins, which the immune system regards as foreign.  Fortunately it is fairly rare, but I did see a few cases.

Also recall that the nucleosome is only the first stage of the 100,000 fold compaction of DNA required to fit it into the nucleus.  The higher order arrangement of nucleosomes has been the subject of decades of intense study which unfortunately hasn’t reached a conclusion, but there is no question that nucleosomes are close together in the nucleus, whether or not the 30 nanoMeter fiber (packing 6 or so nucleosomes per level of the fiber) actually exists.

So the 3 Hu’s are yet another set of proteins binding to the carboxy terminal domain (CTD) of the large subunit of pol II.  So what?  They interact with histone deacetylase 2 (HDAC2) which removes the acetyl group from the epsilon amino group of lysine, changing an amide to an amine — increasing the positive charge on the nitrogen.  This has the effect of compacting DNA as the protonated amine can then bind to the zillions of negatively charged phosphates of the DNA backbone.  Here’s another place where you simply must know chemistry to understand what’s going on.

So a protein bound to the CTD of pol II recruits another protein which chemically modifies another protein around which DNA is wrapped.  This has the remarkable effect of directly linking the epigenetic machinery to the transcription machinery.  Epigenetics had been thought of as determining which proteins were made in a given cell (e.g. an on/off effect) rather than how they were spliced.

How does this work? The theory is advanced that certain splicing signals are stronger than others. This means if the transcription machinery is slowed down (say by more chromosome compaction), it will have a chance to splice at the weaker splicing signal.

Things are even more complicated.  Back in the day, newsreels were shown before movies (rather than the hideous trailers of today). They sometimes amused American audiences by showing sped up films of crazed foreigners playing the sport of curling — see http://en.wikipedia.org/wiki/Curling.  A (very heavy) stone is essentially slid on ice toward a target.  In front of the stone are two guys sweeping furiously, to alter the surface of the ice, so the stone lands where they want it to.  With sped up film, they look like idiots.

The PNAS article proposes that something like that happens during transcription — preceding the pol II complex are enzymes called histone acetyl transferases (HATs), the yang to the yin of the HDAC. They acetylate the epsilon amino group of lysines on the histones making up the nucleosomes (making it harder for lysine to bind to the phosphates of DNA).  This presumably opens up compacted DNA letting pol II (which is pretty large itself at 5 x 5 x 7 nanoMeters) get through the chromatin, easing transcription. These are the sweepers of curling.  Then along comes pol II.  Near the end of its run along the gene, it recruits Hu proteins which recruit HDAC2 which closes up chromatin again.

Elegant yes?  Incredible, no?

Hopefully, a few readers have actually made it this far.  For questions, critiques, ambiguities, errors of fact, etc. etc., just post a comment.

Now for some philosophy. You can’t really understand any of this without knowing a fair amount of organic chemistry and some protein chemistry as well.   Chemistry explains how all this happens.  It is totally useless in explaining why.  As soon as you ask just what the CTD, the Hu proteins, HDACs, HATs, pol II or anything else in the cell are for, you are in the land of Aristotle, where everything had an innate purpose and function.  You have crossed the Cartesian divide between the physical and the world of ideas, a place where chemistry can no longer help you.

        Still, it is a magnificent thing to have the background to contemplate all this.  Even so,  I’m sure our knowledge is far from complete.  No one said it better than Pascal — “Man is but a reed, the most feeble thing in nature, but he is a thinking reed.”