Category Archives: Molecular Biology

Very sad

The failure of Lilly’s antibody against the aBeta protein is very sad on several levels. My year started out going to a memorial service for a college classmate, fellow doc and friend who died of Alzheimer’s disease. He had some 50 papers to his credit mostly involving clinical evaluation of drugs such as captopril. Even so it was an uplifting experience — here’s a link –

There is a large body of theory that says it should have worked. Derek Lowe’s blog “In the Pipeline” has much more — and the 80 or so comments on his post will expose you to many different points of view on Abeta — here’s the link.

It’s time to ‘let 100 flowers bloom’ in Alzheimer’s research, i.e. it’s time to look at some far out possibilities. We know that most will be wrong and that they will be crushed, as Mao did to all the flowers. Even so it’s worth doing.

So to buck up your spirits, here’s an old post (not a link) raising the possibility that Alzheimer’s might be a problem in physics rather than chemistry. If that isn’t enough another post follows that one on Lopid (Gemfibrozil).

Could Alzheimer’s disease be a problem in physics rather than chemistry?

Two seemingly unrelated recent papers could turn our attention away from chemistry and toward physics as the basic problem in Alzheimer’s disease. God knows we could use better therapy for Alzheimer’s disease than we have now. Any new way of looking at Alzheimer’s, no matter how bizarre, should be welcome. The approaches via the aBeta peptide, and the enzymes producing it, just haven’t worked, and they’ve really been tried, hard.

The first paper [ Proc. Natl. Acad. Sci. vol. 111 pp. 16124 – 16129 ’14 ] made surfaces with arbitrary degrees of roughness, using the microfabrication technology for making computer chips. We’re talking roughness that’s almost smooth — bumps ranging from 320 Angstroms to 800. Surfaces could be made quite regular (as in a diffraction grating) or irregular. Scanning electron microscopic pictures were given of the various degrees of roughness.

Then they plated cultured primitive neuronal cells (PC12 cells) on surfaces of varying degrees of roughness. The optimal roughness for PC12 to act more like neurons was an Rq of 320 Angstroms. Interestingly, this degree of roughness is identical to that found on healthy astrocytes (assuming that culturing them or getting them out of the brain doesn’t radically change them). Hippocampal neurons in contact with astrocytes of this degree of roughness also began extending neurites. It’s important to note that the roughness was made with something neurons and astrocytes never see: silica colloids of varying sizes and shapes.

Now is when it gets interesting. The plaques of Alzheimer’s disease have surface roughness of around 800 Angstroms. Roughness of the artificial surface of this degree was toxic to hippocampal neurons (lower degrees of roughness were not). Normal brain has a roughness with a median at 340 Angstroms.

So in some way neurons and astrocytes can sense the amount of roughness in surfaces they are in contact with. How do they do this? Chemically it comes down to Piezo1 ion channels, a story in themselves [ Science vol. 330 pp. 55 – 60 ’10 ]. These are membrane proteins with between 24 and 36 transmembrane segments. They then form tetramers with a huge molecular mass (1.2 megaDaltons) and 120 or more transmembrane segments. They are huge (2,100 – 4,700 amino acids). They can sense mechanical stress, and are used by endothelial cells to sense how fast blood is flowing (or not flowing) past them. Expression of these genes in mechanically insensitive cells makes them sensitive to mechanical stimuli.

The paper is somewhat ambiguous on whether expressing Piezo1 is a function of neuronal health or sickness. The last paragraph appears to have it both ways.

So as we leave paper #1, we note that neurons can sense the physical characteristics of their environment, even when it’s something as un-natural as a silica colloid. Inhibiting Piezo1 activity by a spider venom toxin (GsMTx4) destroys this ability. The right degree of roughness is healthy for neurons, the wrong degree kills them. Clearly the work should be repeated with other colloids of a different chemical composition.

The next paper [ Science vol. 342 pp. 301, 316 – 317, 373 – 377 ’13 ] talks about the plumbing system of the brain, which is far more active than I’d ever imagined. The glymphatic system is a network of microscopic fluid filled channels. Cerebrospinal fluid (CSF) bathes the brain. It flows into the substance of the brain (the parenchyma) along arteries, and the interstitial fluid (the fluid between the cellular elements) with which it exchanges flows out of the brain along the draining veins.

This work was able to measure the amount of flow through the glymphatics by injecting tracer into the CSF and/or the brain parenchyma. The important point about this is that during sleep these channels expand by 60%, and beta amyloid is cleared twice as quickly. Arousal of a sleeping mouse decreases the influx of tracer by 95%. So this amazing paper finally comes up with an explanation of why we spend 1/3 of our lives asleep: to flush toxins from the brain.

If you wish to read (a lot) more about this system — see an older post from when this paper first came out —

So what is the implication of these two papers for Alzheimer’s disease?

The surface roughness of the plaques (800 Angstroms roughness) may physically hurt neurons. The plaques are much larger or Alzheimer would never have seen them with the light microscopy at his disposal.

The size of the plaques themselves may gum up the brain’s plumbing system.

The tracer work should certainly be repeated with mouse models of Alzheimer’s, far removed from human pathology though they may be.

I find this extremely appealing because it gives us a new way of thinking about this terrible disorder. In addition it might explain why cognitive decline almost invariably accompanies aging, and why Alzheimer’s disease is a disorder of the elderly.

Next, assume this is true. What would be the therapy? Getting rid of the senile plaques in and of itself might be therapeutic. It is nearly impossible for me to imagine a way that this could be done without harming the surrounding brain.

Before we all get too excited it should be noted that the correlation between senile plaque burden and cognitive function is far from perfect. Some people have a lot of plaque (there are ways to detect them antemortem) and normal cognitive function. The work also leaves out the second pathologic change seen in Alzheimer’s disease, the neurofibrillary tangle which is intracellular, not extracellular. I suppose if it caused the parts of the cell containing them to swell, it too could gum up the plumbing.

As far as I can tell, putting the two papers together conceptually might even be original. Prasad Shastri, the author of the first paper, was very helpful discussing some points about his paper by Email, but had not heard of the second and is looking at it this weekend.

Also a trial of Lopid (Gemfibrozil) as something which might be beneficial is in progress — there is some interesting theory behind this. The trial has about another year to go. Here’s that post and happy hunting

Takes me right back to grad school

How many times in grad school did you or your friends come up with a good idea, only to see it appear in the literature a few months later by someone who’d been working on it for much longer? We’d console ourselves with the knowledge that at least we were thinking well and move on.

Exactly that happened to what I thought was an original idea in my last post, i.e. that Gemfibrozil (Lopid) might slow down (or even treat) Alzheimer’s disease. I considered the post the most significant one I’d ever written, and didn’t post anything else for a week or two, so anyone coming to the blog for any reason would see it first.

A commenter on the first post gave me a name to contact to try out the idea, but I’ve been unable to reach her. Derek Lowe was quite helpful in letting me link to the post, so presently the post has had over 200 hits. Today I wrote an Alzheimer’s researcher at Yale about it. He responded nearly immediately with a link to a clinical study in progress in Kentucky.

On Aug 3, 2015, at 3:04 PM, Christopher van Dyck wrote:

Dear Dr. xxxxx

Thanks for your email. I agree that this is a promising mechanism.
My colleague Greg Jicha at U.Kentucky is already working on this:

Our current efforts at Yale are on other mechanisms:

We can’t all test every mechanism, but hopefully we can collectively test the important ones.

-best regards,
Christopher H. van Dyck, MD
Professor of Psychiatry, Neurology, and Neurobiology
Director, Alzheimers Disease Research Unit

Am I unhappy about losing fame and glory being the first to think of it? Not in the slightest. Alzheimer’s is a terrible disease and it’s great to see the idea being tested.

Even more interestingly, a look at the website for the study shows that somehow they got to Gemfibrozil by a different mechanism: microRNAs rather than PPARalpha.

I plan to get in touch with Dr. Jicha to see how he found his way to Gemfibrozil. The study is only 1 year in duration, and hopefully is well enough powered to find an effect. These studies are incredibly expensive (and an excellent use of my taxes). I’ve never been involved in anything like this, but data mining existing HMO data simply has to be cheaper. How much cheaper I don’t know.

Here’s the previous post —

Could Gemfibrozil (Lopid) be used to slow down (or even treat) Alzheimer’s disease?

Is a treatment of Alzheimer’s disease at hand with a drug in clinical use for nearly 40 years? A paper in this week’s PNAS implies that it might (vol. 112 pp. 8445 – 8450 ’15 7 July ’15). First a lot more background than I usually provide, because some family members of the afflicted read everything they can get their hands on, and few of them have medical or biochemical training. The cognoscenti can skip past this to the text marked ***

One of the two pathologic hallmarks of Alzheimer’s disease is the senile plaque (the other is the neurofibrillary tangle). The major component of the plaque is a fragment of a protein called APP (Amyloid Precursor Protein). Normally it sits in the cellular membrane of nerve cells (neurons) with part sticking outside the cell and another part sticking inside. The protein as made by the cell contains anywhere from 563 to 770 amino acids linked together in a long chain. The fragment destined to make up the senile plaque (called the Abeta peptide) is much smaller (39 to 42 amino acids) and is found in the parts of APP embedded in the membrane and sticking outside the cell.

No protein lives forever in the cell, and APP is no exception. There are a variety of ways to chop it up, so its amino acids can be used for other things. One such chopper is called ADAM10 (aka Kuzbanian). ADAM10 breaks down APP in such a way that Abeta isn’t formed. The paper essentially found that Gemfibrozil (commercial name Lopid) increases the amount of ADAM10 around. If you take a mouse genetically modified so that it will get senile plaques and decrease ADAM10 you get a lot more plaques.

The authors didn’t artificially increase the amount of ADAM10 to see if the animals got fewer plaques (that’s probably their next paper).

So there you have it. Should your loved one get Gemfibrozil? It’s a very long shot and the drug has significant side effects. For just how long a shot and the chain of inferences why this is so look at the text marked @@@@


***

How does Gemfibrozil increase the amount of ADAM10 around? It binds to a protein called PPARalpha which is a type of nuclear hormone receptor. PPARalpha binds to another protein called RXR, and together they turn on the transcription of a variety of genes, most of which are related to lipid metabolism. One of the genes turned on is ADAM10, which really has never been mentioned in the context of lipid metabolism. In any event Gemfibrozil binds to PPARalpha which binds more effectively to RXR which binds more effectively to the promoter of the ADAM10 gene which makes more ADAM10 which chops up APP in such fashion that Abeta isn’t made.

How in the world the authors got to PPARalpha from ADAM10 is unknown — but I’ve written the following to the lead author just before writing this post.

Dr. Pahan;

Great paper. People have been focused on ADAM10 for years. It isn’t clear to me how you were led to PPARgamma from reading your paper. I’m not sure how many people are still on Gemfibrozil. Probably most of them have some form of vascular disease, which increases the risk of dementia of all sorts (including Alzheimer’s). Nonetheless large HMOs have prescription data which can be mined to see if the incidence of Alzheimer’s is less on Gemfibrozil than those taking other lipid lowering agents, or the population at large. One such example (involving another class of drugs) is JAMA Intern Med. 2015;175(3):401-407, where the prescriptions of 3,434 individuals 65 years or older in Group Health, an integrated health care delivery system in Seattle, Washington. I thought the conclusions were totally unwarranted, but it shows what can be done with data already out there. Did you look at other fibrates (such as Atromid)?

Update: 22 July ’15

I received the following back from the author

Dear Dr.

Wonderful suggestion. However, here, we have focused on the basic science part because the NIH supports basic science discovery. It is very difficult to compete for NIH R01 grants using data mining approach.

It is PPARα, but not PPARγ, that is involved in the regulation of ADAM10. We searched ADAM10 gene promoter and found a site where PPAR can bind. Then using knockout cells and ChIP assay, we confirmed the participation of PPARα, the protein that controls fatty acid metabolism in the liver, suggesting that plaque formation is controlled by a lipid-lowering protein. Therefore, many colleagues are sending kudos for this publication.

Thank you.

Kalipada Pahan, Ph.D.

The Floyd A. Davis, M.D., Endowed Chair of Neurology


Departments of Neurological Sciences, Biochemistry and Pharmacology

So there you have it. An idea worth pursuing according to Dr. Pahan, but one which he can’t (or won’t). So, dear reader, take it upon yourself (if you can) to mine the data on people given Gemfibrozil to see if their risk of Alzheimer’s is lower. I won’t stand in your way or compete with you as I’m a retired clinical neurologist with no academic affiliation. The data is certainly out there, just as it was for the JAMA Intern Med. 2015;175(3):401-407 study. Bon voyage.


@@@@

There are side effects, one of which is a severe muscle disease, and as a neurologist I saw someone so severely weakened by drugs of this class that they were on a respirator, being too weak to breathe (they recovered). The use of Gemfibrozil rests on the assumption that the senile plaque and Abeta peptide are causative of Alzheimer’s. A huge amount of money has been spent and lost on drugs (antibodies mostly) trying to get rid of the plaques. None have helped clinically. It is possible that the plaque is the last gasp of a neuron dying of something else (i.e. a tombstone rather than a smoking gun). It is also possible that the plaque is actually a way the neuron was defending itself against what was trying to kill it (i.e. the plaque as a pile of spent bullets).

A scary paper: Cancer by proxy

Can a good kid growing up in a bad neighborhood turn bad? Most think so. What about a genetically normal cell growing up in a bad neighborhood? Can it turn cancerous if its neighbors have a mutation? A recent paper [ Nature vol. 539 pp. 304 – 308 ’16b] demonstrates how this can happen.

A gene called PTPN11 is mutated in myelomonocytic leukemia (MML) in humans and mice. Expressing the mutant in blood cells causes leukemia in mice (nothing spectacular there).

However, expressing the mutant in marrow supporting cells (not blood cells or blood stem cells) for long enough gives MML in mice, which can be transplanted into normal mice, producing MML there.

Note that the blood stem cells don’t contain the mutant gene. One theory has it that mutant PTPN11 recruits monocytes, which then produce other stuff (CCL3 also known as MIP1alpha and interleukin1Beta), which then turns on blood stem cells to proliferate madly, causing leukemia. Giving a CCL3 receptor antagonist reverses the myeloproliferation (but it isn’t clear to me if it reverses the leukemia once established).

As far as we know the cells developing into MML don’t contain mutant PTPN11. So it’s cancer by proxy. Obviously some changes (mutations, epigenetic changes) have occurred in the leukemic cells, but at this point we don’t know what they are.

What is ICP27 trying to tell us? One of you could get a PhD if you figure it out !

It wouldn’t be the first time a viral protein led us to an important cellular mechanism. Consider what the polio virus taught us about the translation of mRNA into protein. It cleaves two components of eIF-4F (eukaryotic Initiation (of ribosome translation of mRNA into protein) Factor 4F), totally shutting down translation of mRNAs with a cap on their 5′ end (which is most of them). Poliovirus mRNAs don’t have these caps so their proteins continue to be made.

Well this brings us to ICP27 (Infected Cell Protein 27) a product of the Herpes Simplex virus. You can read all about it in [ Proc. Natl. Acad. Sci. vol. 113 pp. 12256 – 12261 ’16 ]. ICP27 is essential for herpes virus infection. This work shows that it inhibits intron splicing (but in under 1% of cellular genes) and also promotes the use of alternative 5′ splice sites.

It also induces the expression of pre-mRNAs prematurely cleaved and polyAdenylated from cryptic polyAdenylation signals located in intron 1 or intron 2 of an amazing 1% of all cellular genes. These prematurely cleaved and polyAdenylated mRNAs sometimes contain novel open reading frames (ORFs). They are typically intronless (they should be) and under 2 kiloBases long. They are expressed early during viral infection and efficiently exported to cytoplasm. The ICP27 targeted genes are GC rich (as are all Herpes simplex genes), and contain cytosine rich sequences near the 5′ splice site.

The paper also showed that optimization of splice site sequences, or mutation of nearby cytosines eliminated ICP27 mediated splicing inhibition. Introduction of cytosine rich sequences to an ICP27 INsensitive splicing reporter conferred susceptibility to ICP27.

How is this going to help you get a PhD? Ask yourself: what are cryptic polyAdenylation signals doing in the first two introns in so many genes? It seems obvious (to me) that as well as the virus the cell is using them for some purpose. It isn’t hard to mutate something to the signal for polyadenylation, AAUAAA. Interestingly cleavage doesn’t occur here, but 30 nucleotides or so downstream. The sequence occurs once every 4^6 = 4096 nucleotides on average (if they’re random). I’m not sure what the total length of introns #1 and #2 is for our 20,000 or so protein coding genes, but someone should be able to find out and see if 200 occurrences of this sequence is more than would be expected by chance.
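For anyone who wants to try that counting exercise, here is a minimal Python sketch of the chance expectation. It assumes the simplest possible model (all four nucleotides equally likely and independent, which real introns are not), and the total intron length is a made-up placeholder, since the real figure is exactly what is unknown above. The function names are mine, not from the paper.

```python
import math

MOTIF = "AAUAAA"  # polyadenylation signal named in the post

# Chance that the motif starts at one given position, assuming the four
# nucleotides are equally likely and independent (the "if they're random" case).
p_per_site = 0.25 ** len(MOTIF)  # 1 / 4**6 = 1 / 4096


def expected_hits(total_nt: int) -> float:
    """Expected number of motif occurrences in total_nt random nucleotides."""
    positions = max(total_nt - len(MOTIF) + 1, 0)
    return positions * p_per_site


def poisson_tail(observed: int, mean: float) -> float:
    """Approximate P(count >= observed) by treating the count as Poisson(mean)."""
    if mean <= 0:
        return 1.0 if observed <= 0 else 0.0
    # Terms are computed in log space so large means do not overflow.
    below = sum(
        math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))
        for k in range(observed)
    )
    return max(0.0, 1.0 - below)


# ASSUMPTION: placeholder length only, NOT a measured value for introns 1 and 2.
HYPOTHETICAL_INTRON_NT = 1_000_000
OBSERVED_SITES = 200  # the count the post asks about

mean = expected_hits(HYPOTHETICAL_INTRON_NT)
print(f"one site expected every {round(1 / p_per_site)} nt")
print(f"expected by chance in {HYPOTHETICAL_INTRON_NT:,} nt: {mean:.1f}")
print(f"P(at least {OBSERVED_SITES} by chance): {poisson_tail(OBSERVED_SITES, mean):.3g}")
```

Swap in the real summed length of the first two introns once someone has it. A serious test would also use the actual base composition of introns rather than equal frequencies.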

The plot thickens when the paper notes that “Over 200 genes are affected by ICP27. Over 30 (including PML, STING, TRAF6, PPP6C, MAP3K7, FBXw11, IFNAR2, NFKB1, RELA and CREBP) are related to the immune pathway.” Do you think the cell doesn’t use this pathway as well?

What about the existence of other viral (and cellular) proteins doing the same sort of thing (but on different introns perhaps)? What are those novel open reading frames in the alternatively spliced mRNAs doing?

Fascinating stuff. Time to get busy if you’re an enterprising grad student, or young faculty member.

The proteasome branches out

The surface of a protein is not at all like a ball of yarn, even though they are both one long string. This has profound implications for the immune system. Look at any solved protein structure. The backbone bobs and weaves taking water hating (hydrophobic) amino acids into the center of the protein, and putting water loving (hydrophilic) amino acids on the surface. So even though the peptide backbone is continuous, only discontinuous patches of it are displayed on the protein surface.

Which is a big problem for the immune system which wants to recognize the surface of the protein (which is all it first gets to see with an invading bug). Now we know that foreign proteins are ingested by the cell, chopped up by the proteasome, and fragments loaded on to immune molecules (class I Major Histocompatibility Complex antigens) and displayed on the cell surface so the immune system can learn what it looks like and react to it. The peptides aren’t very long — under 11 or so amino acids, but they are continuous.

What if the really distinct part of the protein surface (i.e. the immunogen) is made of two distinct patches from the backbone? A fascinating paper shows how the immune system might still recognize it. Chop the protein up into fragments by the proteasome, and then have the fragments from adjacent patches put back together. You know that any enzyme can be run in reverse, so if the proteasome can split peptide bonds apart it can also join them together.

This is exactly what was found in a recent paper — Science vol. 354 pp. 354 – 358 ’16. The small peptides (containing at most 11 amino acids) finding their way to the cell surface were analyzed in a technical tour de force. In aggregate they go by the fancy name of immunopeptidome. They found that the proteasome IS actually splicing peptide fragments together. This is called Proteasome Catalyzed Peptide Splicing (PCPS). The present work shows that it accounts for 1/3 of the class I immunopeptidome in terms of diversity and 1/4 in terms of abundance. One-third of self antigens are represented on the cell surface of the immune cell line they studied (GR-LCL the GR-lymphoblastoid cell line) ONLY by spliced peptides. The ordering of the spliced peptide was the same as the parent protein in only half. There was no preference for the length of the protein skipped by the splice.

The work has huge implications for immunology, not least autoimmune disease.

So today I wrote the author the following

Dr. Mishto

Terrific paper ! Do you have any evidence for the spliced peptides being spatially contiguous on the surface of the parent protein. Have you looked?

This makes a lot of sense, because the immune system should ‘want’ to recognize protein conformations as they exist in the living cell, rather than stretches of amino acid sequence in the parent protein. Also, with few exceptions the surface of a given protein in vivo is a collection of discontinuous peptide sequences of the parent protein. I’ve always wondered how the immune system did this, and perhaps your paper explains things.


and got this back almost immediately

Dear Luysii

Interesting idea. We shall have a look for few examples where the crystallography structure or the parental protein is disclosed already.



It doesn’t get any better than this. Tomorrow I will be exactly 78 years and 6 months old. It shows I can still think (on occasion).

Addendum 17 Nov ’16: It looks as though proteins are fed into the central cavity of the proteasome as a completely denatured single strand. See figure 5 of PNAS 113 pp. 12991 – 12996 ’16. The channel to get in appears quite narrow.

The butterfly effect in embryology

How the snake lost its legs. No, this isn’t a Just So story a la Rudyard Kipling, but a fascinating paper in Cell (vol. 167 pp. 598 – 600, 633 – 642 ’16 ). All it takes is a 17 nucleotide deletion in ZRS (Zone of polarizing activity Regulatory Sequence), an enhancer of gene expression involved in limb development. The enhancer is at least 1,300 nucleotides long (but I can’t find out just how long ZRS is). The deletion removes a binding site for a transcription factor (ETS) which turns on some limb development genes.

ZRS has long been known to be involved in limb development, and mutations distributed over 700 nucleotides are associated with a variety of human limb malformations. So the authors sequenced the enhancer in a variety of species (including many snakes) and found that only snakes had the deletion.

Then they put the snake ZRS into genetically engineered transgenic mice and found markedly shortened limbs. That was all it took. Reintroducing the missing 17 nucleotides into the transgenics restores normal limb development. Staggering what genetic technology is capable of.

Where does the butterfly effect come in? Because the enhancer is 1,000,000 nucleotides away from some of the genes it controls. If you were studying sequences around the genes it controls, you’d never find the deletion (until you’d run through a large number of grad students). Human biology (with limb malformations) told the authors where to look.

Straightened out, 1,000,000 nucleotides is 3,200,000 Angstroms, or 320 microns (32 times the size of the average 10 micron nucleus). Remarkable how it finds its target. You might be interested in a series of posts which try to imagine these goings on at human scale, blowing up the nucleus so it fits in a football stadium with our double stranded DNA blown up to the size of linguini with a total length of 2840 miles. Start here –
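The unit conversion above is easy to check for yourself. This small Python sketch assumes 3.2 Angstroms per nucleotide, which is the figure implied by the numbers in the post (textbooks usually quote about 3.4 Angstroms per base pair for B-DNA, so treat it as approximate). The constant and function names are mine.

```python
# ASSUMPTION: rise per nucleotide implied by the post's figures
# (3,200,000 Angstroms for 1,000,000 nucleotides).
ANGSTROMS_PER_NUCLEOTIDE = 3.2
ANGSTROMS_PER_MICRON = 10_000      # 1 micron = 10,000 Angstroms
NUCLEUS_DIAMETER_MICRONS = 10      # "average" nucleus size used in the post


def stretched_length_angstroms(nucleotides: int) -> float:
    """Length of a fully straightened stretch of DNA, in Angstroms."""
    return nucleotides * ANGSTROMS_PER_NUCLEOTIDE


def angstroms_to_microns(angstroms: float) -> float:
    """Convert a length in Angstroms to microns."""
    return angstroms / ANGSTROMS_PER_MICRON


length_a = stretched_length_angstroms(1_000_000)
length_um = angstroms_to_microns(length_a)
print(f"{length_a:,.0f} Angstroms = {length_um:.0f} microns")
print(f"{length_um / NUCLEUS_DIAMETER_MICRONS:.0f} nucleus diameters")
```

Changing `ANGSTROMS_PER_NUCLEOTIDE` to 3.4 lengthens everything by about 6%, which doesn’t change the point: the enhancer is many nuclear diameters away from its targets when the DNA is laid out straight.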

The world’s longest allosteric effect

I think there is some very interesting protein physical chemistry to be discovered/worked out based on a recent report [ Nature vol. 537 pp. 107 – 111 ’16 ]. It involves a long (2,200 Angstrom) coiled coil protein called EEA1 (Early Endosome Antigen 1). It contains 1,400 amino acids, 1,275 of which form a coiled coil.

If you are conversant with the alpha helix and how two of them form a coiled coil, jump to ****. Otherwise here is some background and links to pictures which should help.

The alpha helix is a type of protein secondary structure in which the protein backbone assumes the shape of a coiled spring. There are 3.64 amino acids per turn. A single turn is 5.4 Angstroms high and 11 Angstroms wide. The alpha helix is right handed. That is to say, that if you orient the chain so that your thumb points from the N terminal to C terminal amino acid, the chain will twist in the direction of the fingers of the right hand as it rises. For some reason I can’t provide a link to a very large number of images for you to hit. However, when I go to Google and type images of alpha helices you see them immediately; you’ll have to do the same to get there.

Coiled coils have two alpha helices winding around each other. This means that for secure interactions, the same types of amino acids must repeat again and again. A 7 residue periodicity (abcdefg)n in the distribution of nonpolar and charged amino acid residues is a feature characteristic of proteins which form alpha helices coiled about each other (coiled coil molecules). The 7 amino acids are lettered a – g from amino to carboxy. Positions a and d are usually hydrophobic amino acids (Leu, Ile, Val, Ala), positions e and g are usually polar or charged. The nonpolar a and d side chains associate by means of complementary knobs into holes packing. Each individual alpha helix is right handed, but the two helices wind around each other with a left handed turn. There are 3.64 amino acids per turn of an alpha helix, so for a regular repeating structure an amino acid should appear at the same position in space on the alpha helix (which forms a rigid rod). To see all the pictures you want — go to Google and type “Images of the Alpha Helix”.

To get the number of amino acids down so there are 3.5 per turn (so the structure can repeat exactly every 7 amino acids, i.e. after 2 alpha helical turns) left handed supercoiling of each helix occurs (it’s a chicken and the egg situation). The helices are at an angle of 18 degrees to each other, and every 3.5 amino acids still form a 5.4 Angstrom turn (when one helix is viewed in isolation), but due to the tilt, they take up 5.1 Angstroms along the axis of the coiled coil. This means that the same type of amino acid is found at positions 1, 8, 15, 22 etc. All intermediate filament proteins (keratin, neurofilaments, vimentin, etc.) contain a coiled coil structure. So to see all the pictures you could want, go to Google and type “Images of coiled coil proteins”.

So the 1,275 amino acids of EEA1 divided by 3.5 and multiplied by 5.1 give you a coiled coil of fairly enormous length for a protein (1,858 Angstroms) — average protein diameter (if there is such a thing) is under 50 Angstroms
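Here is that arithmetic as a short Python sketch, using only the figures given above (3.5 amino acids per turn, 5.1 Angstroms of coiled coil per turn, a 7 residue repeat). The function names are mine, for illustration only.

```python
RESIDUES_PER_TURN = 3.5       # amino acids per turn in the supercoiled helix
ANGSTROMS_PER_TURN = 5.1      # length taken up per turn after the 18 degree tilt
HEPTAD = 7                    # the (abcdefg) repeat, i.e. 2 turns of 3.5 residues


def coiled_coil_length_angstroms(residues: int) -> float:
    """Length of a coiled coil built from the given number of amino acids."""
    turns = residues / RESIDUES_PER_TURN
    return turns * ANGSTROMS_PER_TURN


def heptad_positions(count: int, start: int = 1) -> list[int]:
    """Residue numbers that land on the same heptad letter as `start`."""
    return [start + HEPTAD * k for k in range(count)]


print(f"{coiled_coil_length_angstroms(1275):.0f} Angstroms")  # EEA1's coiled coil
print(heptad_positions(4))  # same type of amino acid at each of these positions
```

The result lands a bit short of the roughly 2,000 to 2,200 Angstroms quoted for the whole protein, which makes sense since 125 of the 1,400 amino acids are not in the coiled coil.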

Functionally, EEA1 seems to be used as a tether with one end free and the other end hooked to a target membrane which wants to ‘catch’ the early endosome. The target membrane isn’t specified in the paper. Apparently EEA1 when not binding the endosome, is in a fully extended state, at around 2,000 Angstroms.

A protein called Rab5 is found on the early endosome membrane, and when EEA1 contacts it, the long coiled coil helix collapses, dragging the endosome toward the target membrane. This is entropy in action, there being far more configurations of a collapsed protein than a rigidly extended one. To feel entropy for yourself, just pull on a rubber band; entropic effects just like this one are what you feel pulling back.

The collapse of EEA1 is an allosteric effect and a very long one, although the authors note long range allosteric effects are “not uncommon among coiled coil proteins”.

EEA1 is more complicated than initially described. It contains amino acids which disrupt the 7 amino acid periodicity of the coiled coil (making it a jointed structure). The authors then made an EEA1 protein without the joints (so it was a perfect very long coiled coil). Binding of this protein to Rab5 on an endosome doesn’t result in collapse. So clearly normal EEA1 collapses at the ‘joints’.

The authors talk about some hypotheses as to how this happens in the Supplementary material (but I was unable to find it).

So here’s a good research project for an enterprising grad student: either find out why and how a protein with multiple joints should exist in a fully extended configuration, or figure out how binding of Rab5 at one end of EEA1 produces such profound allosteric changes through this long linear protein. Happy hunting and thinking.

I must say it’s a pleasure to get back to chemistry after writing about the neurologic and medical issues of the presidential candidates.

Addendum 29 September: I wrote the following to one of the authors (Dr. Grill), sending him the post above

Dr. Grill

Greatly enjoyed the paper.  I could never find the discussion of possible mechanism in the supplementary material.  You might enjoy the following post written about the paper

He replied as follows:

“Dear Luysii thank you very much for the kind words, and I really like your title!

With the supplementary discussion, besides the method part there is an additional supplement file on the Nature website that is easy to miss…I attach it here for you. We discuss this a bit more, but I must admit that this is not very satisfactory at the moment. We just don’t know how this works, and much of our efforts at the moment are dedicated to understand”

So for other readers of the original paper who also can’t find the supplement with the authors’ speculations as to what is going on, here is what he sent.

” A key question is how Rab5 can induce such a long-range global molecular transition in flexibility of EEA1. Indeed, long-range allosteric effects have been observed for other coiled-coil proteins. In the case of myosin, the presence of discontinuities in the coiled-coil heptads drive structural changes to flexibility. Other tethering factors may bend through large breaks in coiled-coil structure acting as joints, although it remains to be shown whether and how conformational changes are triggered by Rab binding, as shown for EEA1.

Furthermore, a dynamically flexible coiled-coil is mostly extended, provided its ends are free (ref. 60). However, when the ends of this coiled coil are tethered, bent, or when torsion is locally applied, compensatory structural changes are propagated and even amplified through the length of the structure. Our results suggest that a change in intrinsic static curvature may contribute but is not the major cause for the reduction in end-to-end distance. However, a more rigorous assessment would require visualizing the thermal fluctuations of the bound and unbound EEA1 very rapidly and in three dimensions.

Force generation due to entropic effects plays a key role in many processes in biology ranging from DNA cytoskeletal filaments to motor proteins. Switching a molecule from stiff to flexible could be an effective and general mechanism of many coiled-coil proteins for generating an attractive force, thereby pulling two objects together or allowing reactions otherwise hindered by polymer rigidity. Future experiments will test to what extent the entropic collapse is a general mechanism used not only by membrane tethers but also in other biological processes.”
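For readers who want to see why going from stiff to flexible shortens a filament, here is a minimal sketch using the textbook worm-like chain model, in which the mean squared end-to-end distance of a chain of contour length L and persistence length P is 2PL − 2P²(1 − e^(−L/P)). This is my own illustration, not the authors' analysis, and the three numbers below (a 200 nm contour length, 250 nm and 75 nm persistence lengths) are assumed round values chosen only to show the trend, not measurements from the paper.

```python
import math

def wlc_rms_end_to_end(contour_length, persistence_length):
    """Root-mean-square end-to-end distance of an ideal 3D worm-like chain."""
    L, P = contour_length, persistence_length
    mean_sq = 2 * P * L - 2 * P**2 * (1 - math.exp(-L / P))
    return math.sqrt(mean_sq)

# Illustrative values only (assumptions, not data from the paper)
CONTOUR_NM = 200.0      # hypothetical contour length of the coiled coil
STIFF_P_NM = 250.0      # hypothetical persistence length, stiff state
FLEXIBLE_P_NM = 75.0    # hypothetical persistence length, flexible state

stiff = wlc_rms_end_to_end(CONTOUR_NM, STIFF_P_NM)
flexible = wlc_rms_end_to_end(CONTOUR_NM, FLEXIBLE_P_NM)

print(f"stiff:    {stiff:.0f} nm end to end")
print(f"flexible: {flexible:.0f} nm end to end")
```

Nothing in the chain gets shorter chemically; the flexible state simply has far more bent configurations available than straight ones, so on average its ends sit closer together. That is the sense in which the collapse is entropic.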


The plural of anecdote IS data

Five years ago I wrote a post on the perils of implicating a gene as the cause of a disease because one or two people with the disease had a mutation there (see the bottom). That is now back in spades with a new report from the Exome Aggregation Consortium (ExAC) [ Nature vol. 536 pp. 249, 277 – 278, 285 – 291 ’16 ].

What they did was to aggregate sequence data from 60,704 people on the parts of their genomes coding for the amino acids making up proteins (the exome). The paper has 80+ authors. The data is publicly available and is planned to grow to 120,000 exomes and 20,000 whole genomes in the next year. Both are orders of magnitude larger than any individual exome study so far. So study enough anecdotes (small studies) and pretty soon you have real data.

The articles state that over a million people have now had either their exomes or their whole genomes sequenced ! ! !

The amount of variation in the human genome is simply incredible. Some 7,404,909 variants in the exome were described, of which 54% had never been seen before. These account for 1/8 of all the sites in all our exomes, implying that the exome comprises 60 megaBases of the 3200 megaBase human genome (1.8%). Most of the variants were single amino acid changes due to changes in a single nucleotide, but there were 317,381 insertions or deletions (95% shorter than 6 nucleotides).
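The arithmetic behind the exome size estimate is short enough to check directly. The sketch below uses only the figures quoted above, and it assumes (my simplification) that each variant sits at its own site, so that variants divided by the fraction of variant sites gives the total number of sites.

```python
variants = 7_404_909          # variants reported in the exome
fraction_of_sites = 1 / 8     # share of exome sites that carry a variant
genome_mb = 3200              # size of the human genome in megabases
indels = 317_381              # insertions or deletions among the variants

exome_sites = variants / fraction_of_sites
exome_mb = exome_sites / 1e6
exome_share = exome_mb / genome_mb
indel_share = indels / variants

print(f"exome size: about {exome_mb:.0f} megabases")
print(f"share of genome: {exome_share:.2%}")
print(f"indels as share of variants: {indel_share:.1%}")
```

So the 60 megaBase figure is a rounding up of roughly 59, and indels are only a few percent of everything found.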

99% of all variants had a frequency of under 1% (i.e. not found in more than 607 people), with half being found only once in the 60,704. 8% of the sites with variation contain more than one variant (consistent with what you’d expect of a Poisson distribution).

What is so remarkable is that the average participant has 54 variants previously classified as responsible for a genetic disorder. Not only that, 183/192 variants thought to cause a rare hereditary disease were found in many healthy people, implying that they were incidental findings (anecdotes) rather than causal. It shows you what happens when you have adequate data.

They are pretty sure that their work will stand, because the exomes were sequenced many times over (deeply sequenced in the lingo): more than 10x in over 80% of the cohort.
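Here is a rough sketch of the Poisson reasoning, using the 1/8 figure for the share of exome sites carrying at least one variant. The model is my assumption, not the paper's calculation: it treats every site as equally mutable, which real genomes are not, so only ballpark agreement should be expected.

```python
import math

cohort = 60_704
one_percent_of_cohort = int(cohort * 0.01)   # head count matching a 1% frequency

variant_site_fraction = 1 / 8                # P(site has at least one variant)

# Poisson rate per site chosen so that P(at least one variant) = 1/8
rate = -math.log(1 - variant_site_fraction)
p_at_least_one = 1 - math.exp(-rate)
p_exactly_one = rate * math.exp(-rate)

# Among sites that vary at all, the share expected to carry two or more variants
p_multi_given_variant = (p_at_least_one - p_exactly_one) / p_at_least_one

print(one_percent_of_cohort, "people is the 1% threshold")
print(f"expected multi-variant share: {p_multi_given_variant:.1%}")
```

Under these assumptions the expected share comes out between 6 and 7%, the same neighborhood as the 8% observed. A modest excess is what you might expect if some sites mutate more readily than others.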

I’d also written earlier about how full of errors our genomes are — see

A lot of the variants produced termination codons in the body of the exome, so a full-length protein couldn’t be produced from the gene (these are called truncation variants) — some 179,774 in the 7,404,909. Most occurred just once. Even so this means that most of the cohort had at least one or two. Even this rather negative knowledge was useful — since we have about 20,000 protein coding genes, they found 3,230 in which truncation variants NEVER occurred, implying that the protein is crucial to survival.


We’ve found the mutation causing your disease — not so fast, says this paper (posted 17 July 2011)

This post takes a while to get to the main points, but hang in there, the results are striking (and disturbing).

First: a bit of history. In the bad old days (any time over about 30 years ago) there was basically only one way to look for a disc in the spinal canal pressing on a nerve producing symptoms (usually pain, followed by numbness and weakness). It was the myelogram, where a spinal tap was done, an oily substance (containing iodine which Xrays don’t penetrate well) was injected into the spinal canal, and Xrays taken. The disc showed up as a defect in the column of dye (not really a dye as any chemist can see). This usually led to surgery if a disc was found, even if it was one or two spinal levels from where clinicians thought it should be based on their examination and other tests such as electromyography (EMG). This was usually put down to anatomic variability. Results were less than perfect.

Myelography was a rather stressful procedure, and I usually brought patients into the hospital the night before, got a cardiogram (to make sure their heart could take it, and that they hadn’t had a silent heart attack). Then the myelography itself, which wasn’t painful as the radiologist put the needle in under fluoroscopy so they could see exactly where to go. However many people got severe post-spinal headaches (invariably doctors’ wives), sometimes requiring a blood patch to plug the hole where the (large) needle used to inject the ‘dye’ went; it had to be large because the ‘dye’ was rather oily (viscous). The bottom line was that you didn’t subject a patient to a myelogram unless they were having a significant problem. Only very symptomatic people had the test, and usually when nonsurgical therapy had been tried and failed.

Fast forward to the MRI (Magnetic Resonance Imaging) era (nuclear magnetic resonance to the chemist, but radiologists were smart enough to get the word nuclear removed so patients would submit to the test). A painless technique, but stressful for some because of the close quarters in the MRI machine. You could look at the whole spinal canal, and see far more anatomic detail, because you actually see the disc (rather than its impression on a column of dye) and the surrounding bones, ligaments etc. etc.

What did we find? There were tons of people with discs where they shouldn’t be (e.g. herniated discs) who were having no problems at all. This led to a lot more careful assessment of patients, with far better correlation of anatomic defect and clinical symptoms.

What in the world does this all have to do with the genetics of disease? Patience; you’re about to find out.

There’s an interesting interview with Eric Lander (of Human Genome Project fame) in the current PNAS (p. 11319). He notes that in 1990 sequencing a single genome cost $3,000,000,000. He thinks that at some time in the next 5 years we’ll be able to do this for $1,000, a 3 million-fold improvement in cost. The genome has around 3,000,000,000 positions to sequence. As things stand now, it’s literally nothing to determine the sequence of a few million positions in DNA.

On to Cell vol. 145 pp. 1036 – 1048 ’11 which sequenced some 9,000,000 positions of DNA. This didn’t make a big splash (but its implications might). Just a single paper, buried in the middle of the 24 June ’11 Cell — it didn’t even rate an editorial. Now, as chemists, if you’re a bit shaky on what follows, all the background you need can be found in the series of articles found here –

As a neurologist, I treated a lot of patients with epilepsy (recurrent convulsions, recurrent seizures). 2% of children and 1% of adults have it (meaning that half of the kids with it will outgrow it, as did the wife of an old friend I saw this afternoon). Some forms of epilepsy run in families with strict inheritance (like sickle cell anemia or cystic fibrosis). 20 such forms have been tied down to single nucleotide polymorphisms (SNPs) in 20 different genes coding for protein (there are other kinds of genes; all is explained in the background material above). 17/20 of these SNPs are in a type of protein known as an ion channel. These channels are present in all our cells, but in neurons they are responsible for the maintenance of a membrane potential across the membrane, which has the ability to change abruptly, causing a nerve cell to fire an impulse. In a very simplistic way, one can regard a convulsion (epileptic seizure) as nerve cells gone wild, firing impulses without cease, until the exhausted neurons shut down and the seizure ends.

However, the known strictly hereditary forms of epilepsy account for at most 1 – 2% of all people with epilepsy. The 9,000,000 determinations of DNA sequence were performed on 237 ion channel genes, but just those parts of the genes actually coding for amino acids (these are the exons). They studied 152 people with nonhereditary epilepsy (also known as idiopathic epilepsy) and, most importantly, they looked at the same channels in 139 healthy normal people with no epilepsy at all.

Looking at the 17/237 ion channels known to cause strictly hereditary epilepsy they found that 96% of cases of nonhereditary (idiopathic) epilepsy had one or more missense mutations (an amino acid at a given position different than the one that should be there). Amazingly, 70% of normal people also had missense mutations in the 17. Looking at the broader picture of all 237 channels, they found 300 different mutations in the 139 normals, of which 23 were in the 17. Overall they found 989 SNPs in all the channels in the whole group, of which 415 were nonsynonymous.

Well what about mutational load? Suppose you have more than one mutation in the 17 genes. 77% of the cases with idiopathic epilepsy had 2 or more mutations in the 17, but so did 30% of the people without epilepsy at all.
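To see how little these findings would tell an individual patient, here is a back-of-the-envelope Bayes calculation. It is a sketch under loud assumptions of mine: the 1% adult prevalence quoted earlier is used as the prior, and the percentages from this one small study are treated as if they held for the whole population. The function name is my own.

```python
def probability_of_epilepsy(prevalence, rate_in_cases, rate_in_normals):
    """Bayes' rule: P(epilepsy | finding) from prevalence and the two rates."""
    with_disease = prevalence * rate_in_cases
    without_disease = (1 - prevalence) * rate_in_normals
    return with_disease / (with_disease + without_disease)

PREVALENCE = 0.01  # adult prevalence quoted in this post (assumed prior)

# One or more missense mutations in the 17 genes: 96% of cases, 70% of normals
any_missense = probability_of_epilepsy(PREVALENCE, 0.96, 0.70)

# Two or more mutations in the 17 genes: 77% of cases, 30% of normals
two_or_more = probability_of_epilepsy(PREVALENCE, 0.77, 0.30)

print(f"P(epilepsy | any missense mutation) = {any_missense:.1%}")
print(f"P(epilepsy | two or more mutations) = {two_or_more:.1%}")
```

On these assumptions, finding a missense mutation barely moves the odds from the 1% you started with, and even two or more mutations leave the probability in the low single digits.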

The relation between myelography and early genetic work on disease should be clear. Back then, a lot was taken as abnormal as only the severely afflicted could be studied, due to time, money and technological constraints. As the authors note “causality cannot be assigned to any particular variant”. Many potentially pathogenic genetic variants in known dominant channel genes are present in normals.

What was not clear to me from reading the paper is whether any of the previously described mutations in the 17 that are thought to be causative of strictly hereditary epilepsy were present in the 139 normals.

A very interesting point is how genetically diverse the human population actually is (and they only studied Caucasians and Hispanics, apparently no Blacks). No individual was free of SNPs. No two individuals (in the 139 + 152) had the same set of SNPs. Since they found 989 SNPs in the combined group, even in this small sample of proteins (237 of 20,000) this averages out to more than 3 per individual. Well, are there ‘good’ SNPs in the asymptomatic group, and ‘bad’ SNPs in the patients with idiopathic epilepsy? Not really, the majority of the SNPs were present in both groups.

I leave it to your imagination what this means for ‘personalized medicine’. We’re literally just beginning to find out what’s out there. This is the genetic analog of the asymptomatic disc. We may not know all we thought we knew about genetics and disease. Heisenberg must be smiling, wherever he is.

What reading the literature is like when things are barely understood

There is a very exciting paper to be described in a post to appear shortly. I ran a muscular dystrophy clinic for 15 years, and saw lots of Amyotrophic Lateral Sclerosis (ALS), even though, strictly speaking, it is not a muscular dystrophy. The Muscular Dystrophy Association was founded by parents of weak children, before we could actually separate motor neuron disease from myopathy. In retirement, I’ve kept up an interest in ALS (particularly since all I could do for patients as a doc was ... (drumroll) ... basically nothing).

The fact that a fair amount of even sporadic ALS has a problem with a protein called C9ORF72 was particularly fascinating. All this came out less than five years ago (October 2011). Everything is far from clearcut even now.

That being the case, it might be of interest to look at the notes I accumulated as scientists began to explore what was wrong with C9ORF72, how the protein normally does whatever it does (we still don’t know really) and how the mutated product of the gene causes trouble (there are 3 main theories).

What you’ll see in what follows is the heat of scientific battle (warts and all), where things are far from clear. Enjoy. This is basically what used to be called a core-dump (back in the day when computer memory was made of metallic cores). Things are far from cut and dried even now so it might be of interest to see the many angles of attack on the problem, the confusion, the conflicting theories, as things became a bit more clear. It’s the scientific enterprise in action against a very horrible disease (trust me).

I’ll try and clear up the typos. I’ll also try to put the notes on the papers in semi-chronological order, but I make no guarantees. The notes may be incomprehensible, as they include only what I didn’t know rather than all the background needed to understand what’s in them.

First a bit of background — FTD stands for FrontoTemporal Dementia.

The #9p21 chromosomal region is another locus for ALS/FTD. It contains something called C9orf72, which contains a GGGGCC hexanucleotide repeat in the intron between noncoding exons 1a and 1b. Normal alleles contain less than 24 repeats (range 2 – 23). Those with ALS + FTD contain over 30 (actually they think the repeat length is much higher: 700 to 1,600 ! ! !). ORF probably stands for open reading frame.

The expansion is present in 12% of familial FTD and 22.5% of familial ALS — making it the most common genetic abnormality in both conditions. More importantly it is found in 21% of sporadic ALS and 29% of FTD in the Finnish population. Later they say it is the most common genetic cause of sporadic ALS (but only in 4%).

There are 3 possible mechanisms of toxicity
1. The RNA transcribed from the repeat acts as an RNA sponge, binding all sorts of RNAs it shouldn’t
2. Repeat Associated Non-ATG translation (RAN translation) see later
3. Decreased expression of the mRNA for C9ORF72.

[ Science vol. 338 pp. 1282 – 1283 ’12 ] Now 40% of familial ALS, 21% of familial frontotemporal dementia, and 8% of sporadic ALS, 5% of sporadic frontotemporal dementia have expansions in C9orf72.

Not much is known about C9orf72 — it is conserved across species. It contains no previously known protein domains. The expansion leads to loss of one alternatively spliced C9ORF72 isoform (normally 3 isoforms are expressed), and to the formation of nuclear RNA foci (which appear to be composed mostly of the expansion). [ Neuron vol. 79 pp. 416 – 438 ’13 ] The function of C9ORF72 is unknown (8/13).

The current (12/12) thinking is that the repeats produce a glob of RNA which traps RNA binding proteins which have better things to do. The best analogy is myotonic dystrophy in which an expanded 3 nucleotide repeat sequesters muscleblind, an RNA binding protein involved in splicing.

The expansion is present in 46% of familial ALS in Finland and 21% of sporadic ALS there. But Finns are somewhat different genetically. The expansion is found in 1/3 of European ancestry familial ALS.

Interestingly some of the patients with FTD presented with nonfluent progressive aphasia.

[ Cell vol. 152 pp. 691 – 698 ’13, Neuron vol. 77 pp. 639 – 646 ’13 ] The protein aggregates of C9orf72 mutants contain TDP43 inclusions. But they also show additional p62 and ubiquilin positive pathology (with no TDP43 present). The abnormal proteins are due to translation of the expanded GGGGCC repeats (which should be nonCoding as they are in introns). This is an example of Repeat Associated Non-ATG translation (RAN). This was first shown for expanded CAG repeats, which can be translated in all 3 reading frames giving polyGlutamine, polyAlanine and polySerine. A minimum of 58 CAG repeats was required for translation.

This work looked for translation of GGGGCC in all 3 reading frames (poly glycine-proline, poly glycine-alanine, poly glycine-arginine). They found poly glycine-proline in the protein inclusions which were p62 positive and TDP43 negative. Similar inclusions weren’t present in other neurodegenerative diseases, known to have nucleotide inclusions.

[ Proc. Natl. Acad. Sci. vol. 110 pp 7533 – 7534, 7778 – 7783 ’13 ] The expanded C9orf72 repeat is enough to cause neurodegeneration (mammalian neurons, and D. melanogaster). They placed either 3 or 30 copies of GGGGCC into an epidermal growth factor vector between the start of transcription and the first ATG codon. The repeat can sequester the RNA binding protein Pur alpha (and other Pur family members). Interestingly, TDP43 didn’t bind to the repeat RNA, nor did hnRNP A2/B1 which binds to fragile X CGG repeat containing RNA. Overexpression of Pur alpha is able to abort the neurodegeneration in the mammalian neuronal cell line (Neuro-2a). So probably the excessive repeat number is acting as an RNA sponge.

Pur alpha is evolutionarily conserved. It controls the cell cycle and differentiation. It is also a component of the RNA transport granule. It interacts with Pur beta.

30 was as many repeats as they could manipulate experimentally — normals have 2 – 8 repeats, but patients with disease have from 100s to 1,000s of repeats, so the pathogenesis might be different.

[ Neuron vol. 80 pp. 257 – 258, 415 – 428 ’13 ] Expression of C9orf72’s mRNA in frontotemporal dementia/ALS (FTD/ALS) patients is reduced by 50%, and the expanded repeat and neighboring CpG islands are hypermethylated consistent with transcriptional silencing. Also the cytoplasmic aggregates staining positively for P62 appear to result from protein translation through the hexanucleotide repeat.

This work used induced pluripotent stem cells (iPSCs) derived from C9ALS/FTD patients. They show decreased C9orf72 mRNA, nuclear and cytoplasmic GGGGCC RNA foci, and expression of one RAN product (Gly Pro dipeptide). Neurons derived from the iPSCs also show enhanced sensitivity to glutamic acid excitotoxicity, and a transcriptional profile that ‘partially’ overlaps with transcriptional changes seen in iPSC neurons derived from mutant SOD1 ALS patients.

In addition, some 19 proteins were found which associate with the GGGGCC repeats in vitro. ADARB2 does this and participates in RNA editing.

ASOs (AntiSense Oligonucleotides ??) were used to suppress C9orf72 RNA expression. This led to reversal in many of the phenotypes of the iPSC neurons (suppression of glutamic acid toxicity, reduction in RNA foci formation). This implies that the GGGGCC repeats trigger toxicity through a gain of function mechanism. [ Proc. Natl. Acad. Sci. vol. 110 pp. E4530 – E4539 ’13 ] Nuclear RNA foci containing GGGGCC in patient cells (wbc’s, fibroblasts, glia, neurons) were seen in patients with repeat expansion. The foci weren’t present in sporadic ALS or ALS/FTD caused by other mutations (SOD1, TDP43, tau), Parkinsonism, or nonNeurological controls. Antisense oligonucleotides reduced the GGGGCC containing nuclear foci without altering overall C9orf72 RNA levels. siRNAs didn’t work.

The Rx was applied to living mice and it was well tolerated.

[ Proc. Natl. Acad. Sci. vol. 110 pp E4968 – E4977 ’13 ] C9orf72 antisense transcripts are elevated in the brains of those with the expansion. Repeat expansion GGCCCC RNAs accumulate in nuclear foci in the brain. Sense and antisense foci accumulate in the blood and are potential biomarkers. RAN translation occurs in BOTH sense and antisense expansion transcripts — so all 6 proteins described above are made. The proteins accumulate in cytoplasmic aggregates in affected brain regions (e.g. frontal and motor cortex, spinal cord neurons).

[ Nature vol. 507 pp. 175 – 177, 195 – 200 ’14 ] C9orf72 has repeated hexanucleotide units (GGGGCC). Two or more G quartets stacked on top of one another form a G-quadruplex. In the expanded repeats of C9orf72 in ALS and frontotemporal dementia, stable quadruplexes form in DNA as well as the RNA transcribed from it.

Sequences which can form G-quadruplexes are conserved during evolution, so they presumably are doing something useful. They are found in transcriptional start sites. This work shows that G-quadruplex assembly in DNA increases transcriptional pauses in the expanded repeat (unsurprising). Also the G-quadruplexes in C9orf72 DNA promote the formation of stable R-loops, triple stranded structures that assemble when a newly formed RNA transcript exiting RNA polymerase II invades the double helix and binds to one DNA strand, displacing the other. If the R-loops aren’t resolved, they can halt transcriptional elongation.

Not only that, but abortive GGGGCC containing RNAs accumulate in the spinal cord and motor cortex of patients with the expanded repeats. The RNAs are truncated in the GGGGCC region, and the amount is linearly proportional to the length of the hexanucleotide repeat. This explains how they could accumulate along with decreased level of full length C9orf72 mRNA (and presumably the protein made from it).

A ‘few dozen’ proteins binding the GGGGCC repeats have been found. One of them is nucleolin, involved in the formation of the ribosome within the nucleolus. It is mislocalized to RNA foci in neurons of the motor cortex of patients with C9orf72 related disease. The lack of mature ribosomes results in the buildup of untranslated mRNA in the cytoplasm.

[ Science vol. 345 pp. 1118 – 1119, 1139 – 1145, 1192 – 1194 ’14 ] Normally the number of GGGGCC repeats in C9orf72 ranges from 2 to 23, with hundreds or even thousands of copies in the disease range. Possibilities
1. Interference with C9orf72 expression, e.g. loss of function
2. Sponging up RNA binding proteins by the transcript
3. Repeat associated non-ATG translation (RAN translation) in all reading frames (sense and antisense).

A series of stop codons in both the sense and antisense RNAs was engineered every 12 repeats, stopping formation of the dipeptide repeat proteins. The new RNAs still formed the G-quadruplexes, and both RNAs formed RNA foci when expressed in cultured neurons.

Putting them into Drosophila showed that the pure repeats able to form dipeptides caused degeneration in the fly eye, while the interrupted constructs (producing RNA only) did not. The same was true when expressed in the nervous systems of adult flies. Blocking translation of the RNA partially suppressed the phenotype.

There are 5 possible dipeptide products of RAN of GGGGCC (GA, GP, PA, GR, PR — G == Glycine, P == Proline, A == Alanine, R = Arginine). Then RNAs using alternate codons for the dipeptides were used (so GGGGCC wasn’t present). Expressing Glycine Arginine (GR) or Proline Arginine (PR) was toxic, Glycine Alanine showing ‘some’ toxicity later in life.
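For anyone who wants to check where the dipeptides come from, here is a small sketch that reads the GGGGCC repeat in all three frames on the sense strand and on the antisense strand (its reverse complement, GGCCCC). The function names are my own, and the codon table holds only the six codons that can occur in these repeats, taken from the standard genetic code.

```python
# Standard genetic code, restricted to the codons found in GGGGCC / GGCCCC repeats
CODON_TABLE = {
    "GGG": "G", "GGC": "G",   # glycine
    "GCC": "A",               # alanine
    "CCG": "P", "CCC": "P",   # proline
    "CGG": "R",               # arginine
}

def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    pairs = {"G": "C", "C": "G", "A": "T", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))

def dipeptide_repeats(unit, copies=4):
    """Translate a tandem repeat in each of its 3 reading frames and
    return the repeating two-residue unit made in each frame."""
    seq = unit * copies
    products = []
    for frame in range(3):
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        peptide = "".join(CODON_TABLE[codon] for codon in codons)
        products.append(peptide[:2])
    return products

sense = dipeptide_repeats("GGGGCC")
antisense = dipeptide_repeats(reverse_complement("GGGGCC"))

# AP and PA name the same alternating polymer, so compare with letters sorted
distinct = {"".join(sorted(p)) for p in sense + antisense}

print("sense:", sense)
print("antisense:", antisense)
print("distinct polymers:", len(distinct))
```

Six reading frames, but Gly-Pro turns up on both strands, which is how the notes can speak of 6 proteins in one place and 5 dipeptide products in another.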

Some RNA binding proteins containing low complexity sequences (aka prion-like domains), namely FUS, EWSR1, TAF15 and hnRNPA2, form polymeric assemblies, which incorporate into hydrogels in vitro. The assemblies are similar to RNA granules. Many of the RNA binding proteins associating with hydrogels have serine arginine (SR) sequences. The SR domain proteins are regulated by phosphorylation on serine, also controlling the association with hydrogels. It is hypothesized that the GR and PR transcripts associate with hydrogels (or similar assemblies such as RNA granules), but are impervious to the regulatory action of the kinases (no serine to phosphorylate), so they might clog up the trafficking of SR domain containing RNA binding proteins moving in and out of the granules to transfer information throughout the cell.

[ Neuron vol. 84 pp. 1213 – 1225 ’14 ] Proline Arginine dipeptides are neurotoxic. They form aggregates in nucleoli in experimental systems. Nuclear aggregates were also found in postmortem spinal cord from C9ORF72 ALS and ALS/FTD patients. Intronic GGGGCC transcripts are also toxic. Repeat associated non-ATG translation (RAN translation) is thought to depend on RNA hairpin structures using GC pairing.

[ Cell vol. 158 pp. 967 ’14 (abstract of something to appear in Science) ] Peptides translated from GGGGCC expansions containing arginines (Gly Arg and Pro Arg) are harmful; the 3 other dipeptide repeats are harmless. The peptides bind to nucleoli and impede RNA biogenesis. Interestingly Ser-Arg repeat proteins (SR proteins) are important in RNA splicing. The GlyArg and ProArg repeat peptides alter splicing of the amino acid transporter EAAT2, similar to that seen in ALS. Interestingly, the peptides are readily taken up by cells in culture, translocating to the nucleus.

Also a small molecule has been developed which targets GGGGCC RNA expansions. It inhibits translation of the dipeptide repeat proteins from the expansions (see Science vol. 353 pp. 64 ****

GlyPro in CSF is a biomarker of ALS patients with the C9orf72 expansion.

The normal function of C9orf72 isn’t known. It is structurally related to DENN (Differentially Expressed in Normal and Neoplastic cells) proteins, which are GDP/GTP exchange factors for Rab GTPases.

At this point it isn’t known if the proteins generated by RAN are toxic. The protein inclusions are present in unaffected areas of the brain (lateral geniculate) as well as the vulnerable areas (cortex, hippocampus).

The initiation of RNA translation is thought to depend on RNA hairpin structures which use C:G complementary pairing. CAG (but not CAA) repeats undergo RAN translation. Protein aggregates occurred only in brain intestes despite the fact that C9orf72 is expressed all over the body (but expression is highest in brain).

It is possible that antisense RNA could be formed from the opposite strand (e.g. CCCCGG) giving poly pro-ala, poly pro-gly and poly pro-arg.

[ Science vol. 1106 – 1112 ’15 ] Just expressing 66 GGGGCC repeats without an ATG start codon using an AdenoAssociated Virus (AAV) vector in mice was enough to produce neurodegeneration with RNA foci, inclusions of poly GP, GA and GR and TDP43 pathology. There was cortical neuron and cerebellar Purkinje cell loss and gliosis.

[ Nature vol. 525 pp. 36 – 37, 56 – 61, 129 – 133 ’15 ] (GGGGCC)30 was expressed in the Drosophila eye. This leads to the rough eye trait and is easily scored, allowing you to look at the effect of other genes on it. Mutations activating RanGAP suppressed rough eyes. RanGAP binds to GGGGCC on the cytoplasmic face of the nuclear pore. Enhancing nuclear import or suppressing nuclear export of proteins also suppressed neurodegeneration. RanGAP physically interacts with the GGGGCC Hexanucleotide Repeat Expansion resulting in its mislocalization. The mislocalization is found in neurons derived from iPSCs from a patient with C9orf72 type ALS, and also in brain tissue from other patients with C9orf72 ALS.

Nuclear import is impaired due to HRE expression (fly and iPSC derived neurons). The defects can be ‘rescued’ by small molecules and antisense oligonucleotides targeting the HRE G-quadruplexes. This may actually be a way to Rx ALS ! ! ! !

Another paper crossed (GGGGCC)58 flies with missing chromosomal segments. They found a variety of nuclear import factors whose inactivation worsened rough eye.

Expression of constructs of (GGGGCC)8, 28 and 58 lacking an AUG start codon in Drosophila was done. The constructs could only produce Repeat Associated NonAUG translation products (e.g. dipeptides). The dipeptides disrupt nuclear import of fluorescent test substrates and of normal nuclear proteins (notably TDP43). In addition RNA export from the nucleus is also compromised. The deleterious effects could be modified by 18 genetic regions (found by large scale unbiased genetic screening). They coded for components of the nuclear pore complex, nuclear RNA export machinery and nuclear import.

Dipeptides produced from GGGGCC and GGGGCCn’s disrupt the nucleolus, so this may be an additional cause of repeat toxicity.

[ Neuron vol. 88 pp. 892 – 901 ’15 ] A mouse model containing the full human C9orf72 repeat, which was either normal (15 repeats) or expanded (100 – 1,000 repeats), was made using bacterial artificial chromosomes (BACs); these mice are called C9-BACexpanded. They show widespread RNA foci and RAN translated dipeptides. Nucleolin distribution was altered. However the mice showed normal behavior and there was no neurodegeneration. This is surprising.

[ Nature vol. 535 p. 327’16 (abstr. of Sci. Transl. Med ’16) ] Mice with mutations diminishing or eliminating the function of C9ORF72 (unknown as of 8/13) developed autoimmune disease.

[ Science vol. 351 pp. 1324 – 1329 ’16 ] Two independent mouse lines lacking the ortholog of C9orf72 (3110043021Rik) in all tissues developed normally and aged without any motor neuron disease. Instead they developed progressive splenomegaly and lymphadenopathy with accumulation of engorged macrophagelike cells. There was age related neuroInflammation similar to C9orf72 ALS but not sporadic ALS. There was no evidence of neurodegeneration however.

[ Neuron vol. 90 pp. 427 – 430, 531 – 534, 535 – 550 ’16 ] BAC transgenic mice using patient derived gene constructs expressing (some of? all of?) C9ORF72 are reported.

A germline knockout develops blood abnormalities (splenomegaly, lymphadenopathy and premature death). The data conflict on which of the 5 products of RAN (Repeat Associated NonATG) translation are the most toxic (GP, GA, GR, PA, PR).

In this study, mice with increased levels of repeats (up to 450) showed no evidence of motor neuron disease, and the brain was normal. They at least did have some trouble with cognition.

The second study put in the full C9 gene with 5′ and 3′ flanking sequences. 4 lines of transgenics with repeats ranging from 37 to 500 were characterized. These mice did have peripheral and central neurodegeneration, with motor deficits. There was a decrease in cortical neurons, Purkinje cells. This is the first time any transgenic has shown neurodegeneration. The deficits are reversible with antisense oligonucleotides. There was a disparity in disease expression between male and female mice.

RNA foci and DPR (DiPeptide Repeat) proteins don’t accumulate in the most affected brain regions.

[ Science vol. 353 pp. 647 – 648, 708 – 712 ’16 ] Spt4 is a highly conserved transcription elongation factor which regulates RNA polymerase II processivity (along with its binding partner Spt5). Spt4 is required to transcribe long trinucleotide repeats found in open reading frames, or in non protein coding regions of DNA templates (in S. cerevisiae). Mutations of Spt4 decrease synthesis of (and restored enzymatic activity to) expanded polyQ proteins (in yeast) without affecting genes lacking the excessive CAG repeats. It might also work in nonCAG repeats.

Targeting Spt4 (with antiSense oligonucleotides) reduces production of the C9orf72 expansion associated RNA and protein, and helps neurodegeneration in model systems. Repeat expansions are transcribed in both the sense and antisense directions. Yeast Spt4 (human homolog SUPT4H) is a small evolutionarily conserved zinc finger protein which forms a complex with Spt5, which then binds to RNA polymerase II regulating transcription elongation (pol II processivity).

DRB is an RNA polymerase II inhibitor. The complex of Spt4 and Spt5 homologs in man (SUPT4H, SUPT5H) is called DSIF (DRB Sensitivity Inducing Factor).

Depletion of Spt4 or its binding partner (Spt5) decreases the number of both sense and antisense repeat transcripts and RNA foci. One of the 6 RAN translation products (polyGlyPro) is substantially reduced by Spt4 depletion.

The study was in human c9ALS fibroblasts. However, side effects are certainly possible — in addition to decreasing the expression of C9ORF72, 95% depletion of SUPT4H1 altered (how?) the expression of another 300 genes. In mice deletion of both copies of SUPT4 is embryonic lethal, but deleting one produced no effects up to 18 months of age.

Time for drug chemists to go to the Multiplex

30 – 40% of all the drugs currently in clinical use are thought to target G Protein Coupled Receptors (GPCRs). Just how many GPCRs inhabit our genome isn’t clear. The latest estimate is 850 which is 4.2% of the 20,077 annotated protein genes we have. That being the case, it behooves drug chemists to know everything about them and how they work.
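The percentage is easy to check. Here is a minimal sketch in Python using only the two figures quoted above (850 GPCRs, 20,077 annotated protein genes); the variable and function names are made up for illustration.

```python
# Figures quoted in the paragraph above; the names are illustrative only.
gpcr_estimate = 850
annotated_protein_genes = 20_077


def percent_of(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


gpcr_percent = percent_of(gpcr_estimate, annotated_protein_genes)
print(f"GPCRs are {gpcr_percent:.1f}% of annotated protein genes")
```

Rounded to one decimal place this gives the 4.2% quoted above.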

A recent paper [ Cell vol. 166 pp. 907 – 919 ’16 ] shows that a lot of the old thinking about GPCRs is wrong. Binding of a ligand to a GPCR results in a conformational change in its 7 transmembrane segments, so that the parts inside the cell bind to a heterotrimer of proteins which bind (and hydrolyze) GTP. This is the G protein. So far so good. The trimer splits up into its 3 constituents, unimaginatively called alpha, beta and gamma, each of which can act as a messenger that a ligand from outside the cell has landed on a GPCR, binding to other proteins and causing all sorts of effects (e.g. it can act as a second messenger).

All good things must end, and termination of GPCR signaling was thought to involve phosphorylation of the intracellular segment of the GPCR, binding of another protein (betaArrestin), removal from the cell membrane (so it can no longer bind its extracellular ligand) and then either destruction or recycling back to the cell membrane. So the old paradigm was betaArrestin binding equals the end of signaling.

It was thought that betaArrestin and the G protein competed for binding to the same intracellular amino acids of the GPCR. Not so says this paper. For some GPCRs both can bind, and signaling can continue, even though the complex of GPCR, G protein and betaArrestin is now inside the cell in an endosome. The complex is called the Multiplex. The examples given are GPCRs for parathyroid hormone (PTH) and Thyroid Stimulating Hormone (TSH). Blurry pictures are given of the complex. GPCRs have been divided into several classes and GPCRs for TSH and PTH are class B GPCRs — which contain a long phosphorylatable tail in the cytoplasm. The G protein binds to these GPCRs by its core region, while betaArrestin binds to the tail. Signaling continues apace.

You are alive because the lipid bilayer of your plasma membrane is asymmetric

You are an organism with trillions of cells. A mosquito bit you, depositing millions of viruses in your tissues. The virus can reproduce only within one of your cells and it has exploited all sorts of protein-protein chemistry to get in. Antibodies (if you are fortunate enough to have them) can get rid of the extracellular critters. However, 500,000 have made it into the same number of your cells, and are merrily trying to reproduce.

How does the asymmetry of the lipid bilayer of your plasma membrane help you survive? If each virus infected cell killed itself before the virus reproduced, you’d survive. Although 500,000 is a large number, it is less than 1 millionth of your cell total.
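To see that losing 500,000 cells really is under one millionth of the total, here is a rough check in Python. The post only says 'trillions' of cells, so both totals below are assumptions chosen for illustration: 1 trillion as a conservative floor and 30 trillion as a rough round figure.

```python
infected_cells = 500_000
one_millionth = 1e-6

# Assumed totals; the text above only says "trillions" of cells.
assumed_totals = {
    "1 trillion (conservative floor)": 1e12,
    "30 trillion (assumed rough figure)": 3e13,
}

# Fraction of all cells that would be lost if every infected cell died.
fractions = {
    label: infected_cells / total for label, total in assumed_totals.items()
}

for label, fraction in fractions.items():
    verdict = "below" if fraction < one_millionth else "not below"
    print(f"{label}: {fraction:.1e} of all cells, {verdict} one millionth")
```

Even at the conservative floor the fraction comes out under one millionth, so sacrificing every infected cell costs the organism very little.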

Well, you do have intracellular defenses against viruses, called the innate immune system. One of them is a protein with the ugly name of gasdermin D. The activated innate immune system (in the form of inflammatory caspases) cleaves gasdermin. This breaks up the inhibition of the amino terminal part of gasdermin by the carboxy terminal part, giving a fragment which binds to one particular membrane component (phosphatidyl serine) which makes up 20% of the inner leaflet of the cell membrane. It then forms a large diameter (to a cell 140 Angstroms is quite large) pore in the cell membrane. No cell can survive this, so it dies, releasing cellular contents (probably some viral components but not fully formed ones). For details see [ Nature vol. 535 pp. 111 – 116, 153 – 158 ’16 ].

Wait a minute. The toxic gasdermin fragment is also released. So how come it doesn’t kill everything in sight? Because our cellular membranes keep phosphatidyl serine confined to the inner leaflet, normal cells don’t show it on their exterior, so they can be bathed in gasdermin with no ill effect. What is responsible for this asymmetry? Believe it or not, an ATP consuming enzyme called flippase (more about this later), which takes any phosphatidyl serine it finds on the outer leaflet and schleps it back inside the cell.

There are all sorts of elegant chemistry which explain just how gasdermin binds to phosphatidyl serine and none of the many other phospholipids found on the inner leaflet. There is more elegant chemistry explaining how flippase works (see later).

What chemistry cannot explain is why organisms would ‘want’ an asymmetric membrane. As soon as you get into the function of a particular compound in an organism, chemistry is powerless to tell you why. Nothing else can explain how a given molecule does what it does on the molecular level, but that is not enough for a satisfying explanation.

One further explanation before some hard core cellular biochemistry follows (after ***). Our cells are dying all the time. The lining of your gut is replaced every 5 days. Even the longest lasting element of your blood is gone after half a year, and most other elements are turned over at least once a month. When these cells die, they must be cleaned up, without undue fuss (such as inflammation). The cleaners are cells called macrophages. A dying cell releases chemical signals, actually called ‘eat me’, one of which is phosphatidyl serine found on the membrane fragments of a dead cell. The fact that flippases keep it on the inner leaflet means that macrophages won’t attack a normal cell.

Slick isn’t it?

***


Flippase is a MgATP-dependent aminophospholipid translocase. It localizes phosphatidylserine and phosphatidylethanolamine to the inner membrane leaflet by rapidly translocating them from the outer to the inner leaflet against an electrochemical gradient. The stoichiometry between amino phospholipid translocation and ATP hydrolysis is close to one (how will the cell have enough ATP to do anything else?). The flippase is inhibited by high calcium, by pseudosubstrates such as vanadate, acetylphosphate and para-nitrophenyl phosphate, and by SH reactive reagents such as N-ethylmaleimide and pyridyldithioethylamine (PDA), a specific inhibitor of phospholipid translocation.

[ Proc. Natl. Acad. Sci. vol. 109 pp. 1449 – 1454 ’12 ] P4-ATPases are a subfamily of P-type ATPases. They transport aminophospholipids from the exoplasmic to the cytoplasmic leaflet (and are known as flippases). Man has 14 P4-ATPases, expressed in various cell types. They are thought to be similar to the catalytic subunits of the Ca++ ATPase and the Na, K ATPase, consisting of cytoplasmic N, P and A domains and a membrane domain made of 10 transmembrane helices (M1 – M10).

[ Proc. Natl. Acad. Sci. vol. 111 pp. E1334 – E1343 ’14 ] The P4-ATPases are thought to resemble the classic P-type ATPase cation pumps: a transmembrane domain of 10 helices and 3 cytoplasmic domains (P for phosphorylation, N for nucleotide binding and A for actuator). ATP8A2 forms an intermediate phosphorylated on aspartic acid (E2P) and undergoes a catalytic cycle similar to the sodium pump (Na+, K+ ATPase). Dephosphorylation of E2P is activated by the transported substrates phosphatidyl serine (PS) and phosphatidyl ethanolamine (PE), similar to the K+ activation of dephosphorylation in the sodium pump.

PE and PS are 10x as large as the cations transported by the sodium pump. This is known as the giant substrate problem. This work shows that isoleucine #364 (mutated in patients with the ataxia, retardation and dysequilibrium syndrome, aka CAMRQ syndrome [ Eur. J. Hum. Genet. vol. 21 pp. 281 – 285 ’13 ]) forms a hydrophobic gate separating the entry and exit sites of PS. I364 likely directs the sequential formation and annihilation of water filled cavities (as shown by molecular dynamics simulations) allowing transport of the hydrophilic phospholipid head group, in a groove outlined by TMs 1, 2, 4 and 6, with the hydrocarbon chains following passively, still in the membrane lipid phase (and presumably outside the channel). This must disrupt the hell out of the protein as it passes. They call this the credit card model: only the interaction with part of the molecule is important, just as the magnetic stripe is the only important thing about the credit card.