Category Archives: Molecular Biology

Bad news on the AIDS front

Bad news for those hoping for an AIDS cure. As you know, the active virus (HIV1) has a genome made of RNA. However, thanks to an enzyme it possesses called reverse transcriptase (which has led to Nobel prizes), it copies itself into DNA and integrates into the genome of lymphocytes. There it sits presumably doing nothing, but it’s always capable of activating and producing more infectious virus.

We seem to have fought the virus to a draw, using a cocktail of drugs which attack different aspects — HAART (Highly Active Antiretroviral Therapy). Success is usually considered being unable to detect viral RNA in the blood (see later). However blood cells are short-lived. What about the longer living lymphocytes found in the lymph nodes and spleen?

That’s what was studied in a current paper [ Nature vol. 530 pp. 5` – 45 ’16 ] but in only 3 people. All had no detectable virus in the blood (under 48 copies/milliLiter — an incredibly tiny amount — see later). What they did was to biopsy lymph nodes in the groin on study entry and at 3 and 6 months.

Then they sequenced the genomes of the lymphocytes from the nodes, to study the HIV1 DNA integrated into the genome. They found that the genome changed with time. This is very bad. Why?

Because it implies that, even though you can’t detect the virus in the blood, the virus was not staying latent in the lymph nodes, but coming out of the lymphocytes and forming infectious virus which then mutated. Subsequently the mutated virus integrated into the genome of another lymphocyte. So even with what we consider excellent control, the virus is not purely latent. Drug resistance could arise from mutations (although they didn’t see it in this study).

Clearly, more people need to be studied this way (but serial biopsies? It will probably be done in prisoners, if such things are still done).

It’s worthwhile thinking about how incredibly selective and accurate our methods of analysis are. 48 copies of the viral RNA per milliLiter of blood is the lower limit of detection. Remember that water has a molecular weight of 18, so a liter of distilled water is 1000 grams / 18 grams = 55.5 Molar. A mole has 6 x 10^23 molecules. A milliLiter is 10^-3 liters. So 1 milliLiter of distilled water has 55 * 6 * 10^23 * 10^-3 == 3 * 10^22 molecules of water in it so the assay is finding 48 or more molecules of HIV1 RNA in the water haystack. Even figuring that the concentration of water in blood is 1/10 that of distilled water, this is still impressive.
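For anyone who wants to check the haystack arithmetic, here is a minimal sketch in Python. The constants are the round numbers used above (plus a slightly more precise Avogadro's number), and the variable names are mine.

```python
# Back-of-envelope check of the "needle in a water haystack" arithmetic.
AVOGADRO = 6.022e23          # molecules per mole
WATER_MOLAR_MASS = 18.0      # grams per mole
GRAMS_PER_LITER = 1000.0     # distilled water, roughly
DETECTION_LIMIT = 48         # HIV1 RNA copies per milliLiter (from the text)

molarity = GRAMS_PER_LITER / WATER_MOLAR_MASS      # about 55.5 moles per liter
water_per_ml = molarity * AVOGADRO * 1e-3          # water molecules in 1 milliLiter
water_per_rna = water_per_ml / DETECTION_LIMIT     # haystack size per needle

print(f"water molarity: {molarity:.1f} M")
print(f"water molecules per milliLiter: {water_per_ml:.2e}")
print(f"water molecules per detectable RNA copy: {water_per_rna:.1e}")
```

So at the detection limit there is roughly one viral RNA for every 7 x 10^20 water molecules, and even knocking that down tenfold for blood leaves a very large haystack.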

smORFs, dwORFs and now uORFs

A recent post described small Open Reading Frames (smORFs) and DWarf Open Reading Frames (DWORFs) — see the link at the bottom. Now it’s time for uORFs (upstream Open Reading Frames). Upstream of what you might ask? Well messenger RNA is grabbed by the ribosome at one end (called the 5′ end). The current thinking was that the ribosome marched along the mRNA in the 5′ to 3′ direction looking for the sequence Adenine Uracil Guanine (AUG) which codes for methionine. It then begins reading the mRNA 3 nucleotides at a time and tacking amino acids onto the methionine. This is called translating mRNA into protein. What about the 5′ end of the mRNA before the AUG is reached (perhaps hundreds of nucleotides later)? It isn’t translated, which is why it’s called the 5′ UTR (5′ UnTranslated Region). In bacteria it’s only a few nucleotides, but our 5′ UTRs can have thousands.

Two other terms of art are upstream and downstream. Since the ribosome flows from 5′ to 3′ on mRNA, any nucleotide 5′ to a given point is called upstream, and anything 3′ is called downstream. Logical terminology — what a pleasure.

So a uORF is an upstream Open Reading Frame. Upstream to what? Why to the AUG (the initiator codon). The assumption had always been that since there was no initiator AUG codon in this region, proteins couldn’t be made from the uORF. Wrong.

This is where [ Science vol. 351 p. 465 aad2867 – 1 –> 9 ’16 ] comes in. It turns out that the ribosome can translate some of these uORFs into protein, and the paper describes a clever technique (called 3T) they developed to find them. One of the problems in finding uORF proteins is that some are quite small, and are missed in the usual protein assays. One uORF from ATF4 contains only 3 amino acids, which is so small that mass spectrometry can’t see it.

The paper makes the amazing statement that nearly half of all mammalian mRNAs harbor uORFs in their 5′ UTRs, and many are initiated with nonAUG start codons. They may be a general mechanism to regulate downstream coding sequence expression. The paper gives two citations for this that I must have missed in my reading.

For instance Binding immunoglobulin Protein (BiP aka Heat Shock Protein family A member 5 – HSPA5 ) contains uORFs exclusively initiated by UUG and CUG start codons (not AUG).
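To make the idea concrete, here is a toy uORF finder in Python. It is only a sketch and certainly not the 3T technique from the paper. The function name, the example sequence and the choice of start codons (AUG plus the UUG and CUG mentioned above) are my own assumptions for illustration.

```python
# Toy uORF finder. The start-codon set and the example sequence are
# illustrative assumptions, not data from the paper.
STOP_CODONS = {"UAA", "UAG", "UGA"}
START_CODONS = {"AUG", "CUG", "UUG"}   # AUG plus the two nonAUG starts named above

def find_uorfs(mrna, main_start):
    """Return (start, end, codons) for each ORF beginning upstream of main_start.

    mrna is a 5' to 3' RNA string; main_start is the index of the main AUG.
    Each ORF runs from a start codon to the first in-frame stop codon.
    Candidates that never reach a stop codon are skipped.
    """
    uorfs = []
    # Only consider start codons lying entirely inside the 5' UTR.
    for i in range(0, main_start - 2):
        if mrna[i:i + 3] not in START_CODONS:
            continue
        codons = []
        for j in range(i, len(mrna) - 2, 3):
            codon = mrna[j:j + 3]
            if codon in STOP_CODONS:
                uorfs.append((i, j + 3, codons))
                break
            codons.append(codon)
    return uorfs

# Made-up mRNA: a short 5' UTR holding CUG-GCA-UAA, then the main AUG at index 13.
example = "GGCUGGCAUAAGGAUGGCUUGA"
print(find_uorfs(example, main_start=13))  # [(2, 11, ['CUG', 'GCA'])]
```

A real search would have to deal with overlapping frames, uORFs that run into the main coding sequence, and the fact that only some candidate start codons are actually used by the ribosome.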

What might the functions of uORFs actually be? The obvious one is that the proteins made from them might actually be doing something. What could a 3 amino acid protein possibly do? Lots. Consider thyrotropin releasing hormone which helps control your thyroid — it is pyroglutamic acid histidine proline. Then there is met-enkephalin which has 5 amino acids and is one of the endogenous opiate peptides your brain uses.

Another possibility is that just translating the uORF into protein controls the translation of the protein starting with the AUG codon. This isn’t so far fetched. A recent paper [ Nature vol. 529 pp. 551 – 554 ’16 ] gave a 3 dimensional structure for RNA polymerase II transcribing a DNA template into mRNA. The authoress (Carrie Bernecky) was kind enough to supply the dimensions of the complex when I wrote her: it is roughly 150 x 150 x 160 Angstroms. Remember you can consider the DNA double helix as a cylinder 20 Angstroms in diameter. Figuring 3 stacked nucleotides/10 Angstroms, a complex 150 Angstroms long is enough to obstruct 45 nucleotides of DNA upstream of the actual start site.
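The footprint estimate is simple enough to check in a couple of lines of Python. The function name is mine, and the 3 nucleotides per 10 Angstroms figure is the rough stacking estimate used above, not a measured value.

```python
def nucleotides_covered(length_angstroms, nucleotides_per_10_angstroms=3):
    """Rough count of stacked nucleotides spanned by an object of a given length."""
    return length_angstroms * nucleotides_per_10_angstroms // 10

print(nucleotides_covered(150))  # 45
print(nucleotides_covered(160))  # 48
```

Either way, a complex that size sits on a stretch of nucleic acid far longer than a 3 amino acid uORF.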

This is just another example of room at the bottom, where all sorts of small molecule metabolites, small RNAs, small DNAs are just being unearthed and their structure determined. For more on this please see the following link

SmORFs and DWORFs — has molecular biology lost its mind?

SmORFs and DWORFs — has molecular biology lost its mind?

There’s Plenty of Room at The Bottom is a famous talk given by Richard Feynman 56 years ago. He was talking about something not invented until decades later — nanotechnology. He didn’t know that the same advice now applies to molecular biology. The talk itself is well worth reading — here’s the link

Those not up to speed on molecular biology can find what they need at — Just follow the links (there are only 5) in the series.

lncRNA stands for long nonCoding RNA — nonCoding for protein that is. Long is taken to mean over 200 nucleotides. There is considerable debate concerning how many there are — but “most estimates place the number in the tens of thousands” [ Cell vol. 164 p. 69 ’16 ]. Whether they have any cellular function is also under debate. Could they be like the turnings from a lathe, produced by the various RNA polymerases we have (3 actually) simply transcribing the genome compulsively? I doubt this, because transcription takes energy and cells are a lot of things but wasteful isn’t one of them.

Where does Feynman come in? Because at least one lncRNA codes for a very small protein using a Small Open Reading Frame (smORF) to do so. The protein in question is called DWORF (for DWarf Open Reading Frame). It contains only 34 amino acids. Its function is definitely not trivial. It binds to something called SERCA, which is a large enzyme in the sarcoplasmic reticulum of muscle (muscle’s version of the endoplasmic reticulum) which allows muscle to relax after contracting. Muscle contraction occurs when calcium is released from the sarcoplasmic reticulum. SERCA takes the released calcium back into the sarcoplasmic reticulum, allowing muscle to relax and then contract again. So repetitive muscle contraction depends on the flow and ebb of calcium tides in the cell. Amazingly there are 3 other small proteins which also bind to SERCA modifying its function. Their names are phospholamban (no kidding), sarcolipin and myoregulin — also small proteins of 52, 31 and 46 amino acids.

So here is a lncRNA making an oxymoron of its name by actually coding for a protein. So DWORF is small, but so are its 3 exons, one of which is only 4 amino acids long. Imagine the gigantic spliceosome which has a mass over 1,300,000 Daltons, 10,574 amino acids making up 37 proteins, along with several catalytic RNAs, being that precise and operating on something that small.

So there’s a whole other world down there which we’ve just begun to investigate. It’s probably a vestige of the RNA world from which life is thought to have sprung.

Then there are the small molecules of intermediary metabolism. Undoubtedly some of them are used for control as well as metabolism. I’ll discuss this later, but the Human Metabolome DataBase (HMDB) has 42,000 entries and METLIN, a metabolic database has 240,000 entries.

Then there is competitive endogenous RNA –

Do you need chemistry to understand this? Yes and no. How the molecules do what they do is the province of chemistry. The description of their function doesn’t require chemistry at all. As David Hilbert said about axiomatizing geometry, you don’t need points, straight lines and planes. You could use tables, chairs and beer mugs. What is important are the relations between them. Ditto for the chemical entities making us up.

I wouldn’t like that. It’s neat to picture in my mind our various molecular machines, nuts and bolts doing what they do. It’s a much richer experience. Not having the background is being chemically blind. Not a good thing, but better than nothing.

The most interesting thing to an evolutionist is not that APOE4 increases the risk of Alzheimer’s disease

Neurologists were immensely excited by the discovery 25 years ago that the APOE4 variant of APOlipoprotein E increases the risk of Late Onset Alzheimer’s Disease (LOAD). 24,000 papers later (Google Scholar) we still don’t know how it does it. Should all this work have been done ? Of course ! !  Once we know the mechanism(s) by which APOE4 increases Alzheimer’s risk we’ll have new ideas to help us attack.

The APOE gene has 3 variants (alleles) APOE2, 3 and 4. The protein is average sized (299 amino acids). The 3 alleles differ at two positions (amino acids #112 and #158) where either cysteine or arginine can be found. The frequency of APOE4 is 14% in the adult white population, that of E3 is 78% and that of E2 is 8%.

Fascinating as this all is, it’s not what’s interesting from an evolutionary point of view.

[ Proc. Natl. Acad. Sci. vol. 113 pp. 17 – 18, 74 – 79 ’16 ] Postmenopausal longevity in females is not limited to humans. Humans, orcas and pilot whales are the only vertebrate species known to have prolonged postreproductive lifespans. Our fertility ends at about the same age that fertility ends in other female hominids (the great apes). However, apes rarely live into their 40s (even in captivity).

Unlike APOE4, APOE2 and APOE3 protect against late onset Alzheimer’s.

The fascinating point is that APOE2 and APOE3 aren’t found in the great apes. They are a human invention. Now LOAD occurs well past reproduction, so there should be no reason in terms of reproductive success for them to arise and be more common in human populations than the original APOE4.

Even more interesting is some work on another protein, CD33, found on immune cells and glia in the brain [ Neuron vol. 78 pp. 575 – 577, 631 – 643 ’13 ]. A minor allele (21% frequency in human populations) of CD33 (SNP rs3865444) protects against Alzheimer’s. The allele is associated with reductions in CD33 expression in microglia, and also with reduction in levels of insoluble Abeta42 in (Alzheimer’s) brain. The numbers of CD33+ microglia correlate with insoluble Abeta42 levels and amyloid plaque burden. So decreasing (or inhibiting) CD33 function might help Alzheimer patients.

Again the protective allele is only found in man. The great apes don’t have it, just the major (nonprotective) allele.

Again, there is no way that having the allele directly improves your reproductive success. By the time it is protecting you, you’re infertile.

What in the world is going on? Why did alleles protective against Alzheimer’s arise in two very different proteins in the course of human evolution?

“There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact.” — Mark Twain.

The reason these alleles probably arose gets us into an ancient battle in evolutionary theory — what is the actual unit of selection? It may be the group rather than the individual. Face it, human infants and children are helpless for longer than other primates, and need others to care for them, for at least 5 years. Who better than grandma and grandpa? So the fact that with granny around more children survive to reproduce constitutes group selection (I think).

As Theodosius Dobzhansky said “Nothing in Biology Makes Sense Except in the Light of Evolution”

Maybe there is something to it after all

Nearly 8 years ago I wrote a post (see below) about a rather fantastic paper, that said that in order to turn on gene transcription, the DNA had to be damaged first. This caused all sorts of repair enzymes to rush to the damaged site, opening up the chromatin there and allowing RNA polymerase II (which is large) to get to the DNA and transcription to proceed. I wrote the original author who was Italian, who said he was ill, but nothing further appeared about the idea (as far as I know). Remember what Carl Sagan said “Extraordinary claims require extraordinary evidence.”

Now Science (vol. 351 p. 147 ’16 ) has an abstract of an article in Nat. Commun. 6, 10191 (2015). “Curiously, DNA repair factors have been found associated with transcriptionally paused, inducible genes. Bunch et al. show that the activation of paused and inducible genes in human tissue culture cells triggers DNA breaks at the RNA polymerase pause site. The subsequent recruitment and signaling activity of DNA repair factors is critical for DNA repair, release of the RNA polymerase, and the transition to the transcription elongation phase of gene expression.”

Here’s the relevant portion of the post from 2/08. How about that ! ! ! DNA breaks are even more spectacular than 8 oxo-guanine

An incredible article appeared last month in the journal Science (see below for the abstract). If it can be verified and if it applies generally, our conception of just how genes coding for protein are turned on will be radically changed (yes, there are many other kinds of genes other than those coding for proteins). If DNA compaction, nucleosomes, histones, lysine methylation and demethylation, the histone code, nuclear hormone receptors (particularly the estrogen receptor), DNA glycosylase and topoisomerase aren’t old friends, have a look at the first comment on this post for the background you need (it’s back on the Skeptical Chymist). Don’t worry, there is plenty of chemistry to follow.

Some histone code modifications are reversible, particularly acetylation of the epsilon amino group of lysine. Enzymes acetylating histone lysines are called histone acetylases, those removing the acetyl group are called histone deacetylases (HDACs). However, lysine methylation was thought to be permanent until ’04 when several enzymes able to demethylate lysine were found. One such enzyme is called LSD1 (it has nothing to do with the hallucinogen). It removes the two methyl groups from lysine #9 of histone #3 (H3K9me2). If this modification is present on a nucleosome near a gene, the gene is silenced, so the methyls must be removed so the protein it codes for can be made.

The estrogen receptor + estrogen complex bound to the ERE (the estrogen response element – a 15 nucleotide DNA sequence) triggers H3K9me2 removal. The process of demethylation is oxidative (how else would you split a nitrogen to hydrocarbon bond?). Hydrogen peroxide is produced, a loose cannon which oxidizes the juicy electron-rich bases of DNA nearby, forming in particular 8 oxo-guanine, as guanine is the most easily oxidized DNA base. Since 21% of the DNA bases in our genome are guanine, H2O2 doesn’t have far to look. This calls in some fairly heavy artillery (DNA glycosylase to remove the 8 oxo-guanine, topoisomerase IIbeta to unwind the DNA so it can be repaired, the repair enzymes, etc, etc…). Naturally this opens up the compacted DNA structure around the gene allowing RNA polymerase II to do its work transcribing the estrogen responsive gene into mRNA (once the damage is repaired).

So according to this paper, estrogen turns on gene transcription by damaging DNA. This is fantastic (if true). There’s more. The estrogen receptor is but one member of a group of proteins called nuclear hormone receptors. The name comes from the fact that other hormones (progesterone, androgen, thyroid, glucocorticoids, mineralocorticoids) have their own proteins that turn on (or turn off) genes the same way. Subsequently it was found that some vitamin metabolites (vitamin D3, vitamin A) have similar receptors even though they aren’t hormones. The human genome contains 48 such proteins. Less than half of them have known ligands. Those with known ligands have their finger in just about every metabolic pie in the cell.

One final point. It has been estimated that 8-oxoguanine is formed 100,000 times each day in every cell. Perhaps its formation is physiologic rather than pathologic. Where does that leave antioxidant therapy, which has been touted to do everything but cure hemorrhoids? Well, one such trial was done on 29,000 Finnish men at high risk for lung cancer (they were smokers) [New England J. Med. vol. 330 pp. 1029-1035 (1994)] Alpha tocopherol (one antioxidant used in the study) didn’t decrease the incidence of lung cancer, and there was an 18% higher incidence of lung cancer among the men receiving beta carotene (another antioxidant). In medicine, theory is great but data trumps it every time.

Science vol. 319 pp. 202 – 206 ’08, B. Perillo et al.

Modifications at the N-terminal tails of nucleosomal histones are required for efficient transcription in vivo. We analyzed how H3 histone methylation and demethylation control expression of estrogen-responsive genes and show that a DNA-bound estrogen receptor directs transcription by participating in bending chromatin to contact the RNA polymerase II recruited to the promoter. This process is driven by receptor-targeted demethylation of H3 lysine 9 at both enhancer and promoter sites and is achieved by activation of resident LSD1 demethylase. Localized demethylation produces hydrogen peroxide, which modifies the surrounding DNA and recruits 8-oxoguanine–DNA glycosylase 1 and topoisomeraseIIβ, triggering chromatin and DNA conformational changes that are essential for estrogen-induced transcription. Our data show a strategy that uses controlled DNA damage and repair to guide productive transcription.

An uplifting way to start the New Year

This is not a scientific post. Going to a memorial service for an old friend hardly seems like an uplifting way to begin the new year. And yet it was. David and I had been friends since ’58 when we were in the same eating club. He also became an M. D. and unfortunately passed away of a slowly dementing illness, probably Alzheimer’s. As a neurologist I could do nothing for him. What little I did accomplish was discussing the scientific aspects with his wife, explaining the latest breakthroughs she’d read about (which never were). She was a rock, standing by him until the end. Having taken care of many such patients, and having an uncle die of it, I know just how hard this is.

What in the world could be uplifting about something like this? Seeing how David’s intelligence and personality have now marched on through 4 children and (at least) 4 grandchildren. So in a way he really isn’t gone. What was uncanny was seeing David’s eyes staring at me out of his oldest daughter. It really is remarkable, given what we think we know about genetics, and that half of our genetic material (one copy of each of our 20,000 or so protein coding genes) comes from each parent, that an offspring will resemble just one parent and not be an amalgam of both. Perhaps just a few genes determine what we look like.

The grandchildren I talked to ranged in age from about 8 to 17. All were smart and articulate. I tried to push them to use their obvious brains to go into research and perhaps prevent or treat what happened to their grandfather. The littlest one said that he was going to be a particle physicist.

I don’t remember talking religion with David or anyone else back in college. There were devout members of the club who would march in glowing after Sunday church, only to be treated by hungover club mates to a chorus of “Onward Christian Soldiers”. One classmate did become the Lutheran Bishop of Western New York, but he certainly didn’t push his religiosity. The most religious one I do remember became a physics professor at Berkeley.

Of course there were remembrances, that of his oldest daughter being the most interesting (to me). She is a religious Christian who clearly loved her father very much, even though he was a professed atheist, although with a strong sense of right and wrong. They used to argue about the existence or nonexistence of God. She and I agreed that he would never do anything that he thought was wrong, probably one of the reasons I liked him (remember the hungover reprobates of a few paragraphs ago). I suppose his daughter now has the last word, but such an argument really has no end.

It was pretty hard to be a doc back in the 60s and 70s watching good people suffer and die, and still conceive of a benevolent creator. “The Plague” by Camus with its hideous death scene of a child pretty much sums up the argument against one.

And yet, now that we know so much more molecular biology, cellular and organismal biochemistry and physiology, our existence seems totally miraculous. I at least have achieved a sense of peace about illness, suffering and death. These things seem natural. What is truly miraculous is that we are well and functional for so long.

You can take or leave the argument from design of Reverend Paley — here it is

“In crossing a heath, suppose I pitched my foot against a stone, and were asked how the stone came to be there; I might possibly answer, that, for anything I knew to the contrary, it had lain there forever: nor would it perhaps be very easy to show the absurdity of this answer. But suppose I had found a watch upon the ground, and it should be inquired how the watch happened to be in that place; I should hardly think of the answer I had before given, that for anything I knew, the watch might have always been there. … There must have existed, at some time, and at some place or other, an artificer or artificers, who formed [the watch] for the purpose which we find it actually to answer; who comprehended its construction, and designed its use. … Every indication of contrivance, every manifestation of design, which existed in the watch, exists in the works of nature; with the difference, on the side of nature, of being greater or more, and that in a degree which exceeds all computation.”

The more chemistry and biochemistry I know about what’s going on inside us, the harder I find it to accept that this arose by chance.

This does not make me an anti-evolutionist. One of the best arguments for evolution is the evidence for descent with modification, one of its major tenets. The fact that we can use one of our proteins to replace one in yeast using our present genetic technology is hard to explain any other way.

Actually to me now, the existence or nonexistence of a creator is irrelevant. The facts of how we are built are not something you need faith about. The awe about it all comes naturally the more we know and the more we find out.

The old year goes out with a bang

A huge amount of cellular genomics will have to be redone if the following paper is replicated. Remember “Extraordinary claims require extraordinary evidence.” Carl Sagan.

What’s all the shouting about? Normally when you think about messenger RNA (mRNA) as it exists in the cytoplasm after the initial transcript is significantly massaged in the nucleus, you think about the part that codes for amino acids. This ‘coding region’ is the part that is translated into amino acids by the ribosome. But mRNA is invariably larger having nucleotides at each end (3′ and 5′) which have other uses. These are called the 3′ Untranslated Region (3′ UTR) and 5′ Untranslated Region (5′ UTR).

So if you do single cell RNA sequencing (which we can do now) it shouldn’t matter what nucleotide sequence you search for (5′ UTR, 3′ UTR or the coding region) as all mRNA contains one of each.

Not so says this paper [ Neuron vol. 88 pp. 1149 – 1156 ’15 ].

Given the mRNA for a given protein in a single cell, using a probe for the 3’UTR and a probe for the coding sequence should give you the same abundance for both. That’s not what they found at all for single neurons from the brain. In some cases there was much more RNA coding for the 3’UTR than for the coding segment of a given mRNA for a protein. In others there was much less. Even more impressive is that the 3’UTR/(3’UTR + coding) ratio for a given protein varies between different parts of the brain. Obviously this ratio should be .5 given what we knew about mRNA in the past. The ratio has to be between 0 and 1.

Well they looked at a lot of proteins. They did find around 1,400 genes with a ratio of .5 (as expected), but they found 700 showing a ratio of .2 (lots more coding sequence than 3’UTR), and 1,100 showing a ratio of .8 (lots more 3’UTR than coding sequence). Overall plotting the ratio vs. number of genes with that ratio gives something looking like a bell curve (Gaussian distribution).
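Taking the definition 3’UTR/(3’UTR + coding) at face value, here is how the ratio behaves in a short Python sketch. The probe readings are made-up numbers in arbitrary units, picked only to land on the three ratios mentioned, and the function name is mine.

```python
def utr_fraction(utr_signal, coding_signal):
    """3'UTR / (3'UTR + coding), the ratio described in the text."""
    total = utr_signal + coding_signal
    if total == 0:
        raise ValueError("no signal from either probe")
    return utr_signal / total

# Hypothetical probe readings (arbitrary units), not data from the paper.
examples = {
    "equal abundance": (50, 50),
    "coding region dominates": (20, 80),
    "3'UTR dominates": (80, 20),
}
for label, (utr, coding) in examples.items():
    print(f"{label}: {utr_fraction(utr, coding):.1f}")
```

Anything below .5 means the coding probe sees more RNA than the 3’UTR probe, anything above .5 means the reverse, and the classical picture of one 3’UTR per coding region pins the ratio at exactly .5.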

It’s long been known that mRNA levels don’t exactly correlate with the levels of proteins made from them. The authors found that if there are lots of 3’UTRs around, relatively little protein is made from the gene.

A variety of brain atlases have published mRNA abundances for various regions of the brain. If they just used one probe (as they probably did) this is clearly not enough.

The 3’UTRs may be acting as ceRNAs (competitive endogenous RNAs). These have been known for years — I’ve included a post of 3 years ago on the subject (at the end).

So this work (if replicated) throws everything we thought we knew about mRNA into a cocked hat. It’s why I love science, there’s always something really new to think about. Happy New Year !!!

Chemiotics II
Lotsa stuff, basically scientific — molecular biology, organic chemistry, medicine (neurology), math — and music
Why drug discovery is so hard: reason #20 — competitive endogenous RNAs

The chemist will appreciate le Chatelier’s principle in action in what follows. We are far from knowing all the players controlling cellular behavior. So how in the world will we find drugs to change cellular behavior when we don’t know all the things affecting it? The latest previously unknown cellular players to enter the lists are competitive endogenous RNAs (ceRNAs). For details see Cell vol. 147 pp. 344 – 357, 382 – 395 ’11. The background the pure chemist needs for what follows can all be found in the category “Molecular Biology Survival Guide”.

Recall that microRNAs are short (20 something) polynucleotides which bind to the 3′ untranslated region (3′ UTR) of mRNA, and either (1) inhibit its translation into protein (2) cause its degradation. In each case, less of the corresponding protein is made. The microRNA and the appropriate sequence in the 3′ UTR of the mRNA form an RNA-RNA double helix (G on one strand binding to C on the other, etc.). Visualizing such helices is duck soup for a chemist.
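Since the pairing rule is just G with C and A with U on antiparallel strands, finding candidate binding sites is a string search. The Python below is a minimal sketch. The sequences are invented, the function names are mine, and using only microRNA nucleotides 2 through 8 (the 'seed') with perfect pairing is a common simplifying convention that I am assuming here; it is not something taken from the papers cited.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Watson-Crick reverse complement of an RNA string written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def find_binding_sites(utr, mirna, seed_start=1, seed_end=8):
    """Return 0-based positions in utr that pair perfectly with the microRNA seed.

    The seed is taken as microRNA nucleotides 2-8 (an assumed convention);
    wobble pairs and mismatches are ignored in this sketch.
    """
    seed = mirna[seed_start:seed_end]
    target = reverse_complement(seed)
    return [i for i in range(len(utr) - len(target) + 1)
            if utr[i:i + len(target)] == target]

# Invented sequences for illustration only.
mirna = "AUGCGUACGUUAGCCAUGGAUC"
utr = "CCAGUACGCAUUGGGUACGCAAA"
print(find_binding_sites(utr, mirna))  # [3, 14]
```

Real target prediction also weighs wobble pairs, flanking sequence and conservation, which is part of why the number of true targets per microRNA is still unclear.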

Molecular biology is full of such semantic cherry bombs as nonCoding DNA (which meant DNA which didn’t code for protein), a subset of Junk DNA. Another is the pseudogene — these are genes that look like they should code for protein, except that they don’t because of lack of an initiation codon or a premature termination codon. Except for these differences, they have the nucleotide sequence to code for a known protein. It is estimated that the human genome contains as many pseudogenes (20,000) as it contains true protein coding genes [ Genome Res. vol. 12 pp. 272 – 280 ’02 ]. We now know that well over half the genome is transcribed into mRNA, including the pseudogenes.

PTEN (you don’t want to know what it stands for) is a 403 amino acid protein which is one of the most commonly mutated proteins in human cancer. Our genome also contains a pseudogene for it (called PTENP). Interestingly deletion of PTENP (not PTEN) is found in some cancers. However PTENP deletion is associated with decreased amounts of the PTEN protein itself, something you don’t want as PTEN is a tumor suppressor. How PTEN accomplishes this appears to be fairly well known, but is irrelevant here.

Why should loss of PTENP decrease PTEN itself? The reason is because the mRNA made from PTENP, even though it has a premature termination codon, and can’t be made into protein, is just as long, so it also contains the 3’UTR of PTEN. This means PTENP is sopping up microRNAs which would otherwise decrease the level of PTEN. Think of PTENP mRNA as a sponge.

Subtle isn’t it? But there’s far more. At least PTENP mRNA closely resembles the PTEN mRNA. However other mRNAs coding for completely different proteins also have binding sites in their 3’UTR for the microRNA which binds to the 3’UTR of PTEN, resulting in its destruction. So transcription of a completely different gene (the example of ZEB2 is given) can control the abundance of another protein. Essentially its mRNA is acting as a sponge, sopping up the killer microRNA.
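The sponge idea can be put in numbers with a deliberately crude toy model in Python. Everything in it is an assumption made for illustration: every binding site is treated as equally attractive, each microRNA occupies exactly one site, and the copy numbers are invented.

```python
def fraction_of_pten_sites_bound(mirna_copies, pten_sites, sponge_sites):
    """Fraction of PTEN 3'UTR sites occupied by microRNA.

    Toy assumption: the microRNAs spread evenly over all available sites
    (PTEN plus sponge) until either microRNAs or sites run out.
    """
    total_sites = pten_sites + sponge_sites
    if total_sites == 0:
        return 0.0
    return min(1.0, mirna_copies / total_sites)

# Invented copy numbers: 100 microRNAs, 100 PTEN sites, with and without a sponge.
with_sponge = fraction_of_pten_sites_bound(100, pten_sites=100, sponge_sites=100)
without_sponge = fraction_of_pten_sites_bound(100, pten_sites=100, sponge_sites=0)
print(with_sponge, without_sponge)  # 0.5 1.0
```

Take the sponge away (delete PTENP, say) and in this cartoon every PTEN mRNA gets hit, which is the direction of the effect described above. Real binding is an equilibrium with different affinities per site, so the true numbers would be nothing like this tidy.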

It gets worse. Most microRNAs have binding sites on the mRNAs of many different proteins, and PTEN itself has a 3’UTR which binds to 10 different microRNAs.

So here is a completely unexpected mechanism of control of protein levels in the cell. The general term for this is competitive endogenous RNA (ceRNA). Two years ago the number of human microRNAs was thought to be around 1,000. Unlike protein coding genes, it’s far from obvious how to find them by looking at the sequence of our genome, so there may be quite a few more.

So most microRNAs bind the 3’UTR of more than one protein (the average number is unclear at this point), and most proteins have binding sites for microRNAs in their 3’UTR (again the average number is unclear). What a mess. What subtlety. What an opportunity for the regulation of cellular function. Who is going to be smart enough to figure out a drug which will change this in a way that we want? Absence of evidence of a regulatory mechanism is not evidence of its absence. A little humility is in order.

It ain’t the bricks it’s the plan — take II

A recent review in Neuron (vol. 88 pp. 681 – 677 ’15) gives a possible new explanation of how our brains came to be so different from apes (if not our behavior of late).

You’ve all heard that our proteins are only 2% different than the chimp, so we are 98% chimpanzee. The facts are correct, the interpretation wrong. We are far more than the protein ‘bricks’ that make us up, and two current papers in Cell [ vol. 163 pp. 24 – 26, 66 – 83 ’15 ] essentially prove this.

This is like saying Monticello and Independence Hall are just the same because they’re both made out of bricks. One could chemically identify Monticello bricks as coming from the Virginia piedmont, and Independence Hall bricks coming from the red clay of New Jersey, but the real difference between the buildings is the plan.

It’s not the proteins, but where and when and how much of them are made. The control for this (plan if you will) lies outside the genes for the proteins themselves, in the rest of the genome (remember only 2% of the genome codes for the amino acids making up our 20,000 or so protein genes). The control elements have as much right to be called genes, as the parts of the genome coding for amino acids. Granted, it’s easier to study genes coding for proteins, because we’ve identified them and know so much about them. It’s like the drunk looking for his keys under the lamppost because that’s where the light is.

All the molecular biology you need to understand what follows is in the following post —

The Neuron paper is detailed and fascinating to a neurologist, but toward the end it begins to fry far bigger fish.

Until about 10 years ago, molecular biology was incredibly protein-centric. Consider the following terms — nonsense codon, noncoding DNA, junk DNA. All are pejorative and arose from the view that all the genome does is code for protein. Nonsense codon means one of the 3 termination codons, which tells the ribosome to stop making protein. Noncoding DNA means not coding for protein (with the implication that DNA not coding for protein isn’t coding for anything).

Well, all that has changed. The ENCODE Consortium showed that well over half (and probably all) of our DNA is transcribed into RNA. This takes energy, and it is doubtful (to me at least) that organisms would waste this much energy if the products were not doing something useful.

I’ve discussed microRNAs elsewhere. They don’t code for protein either, but control how much of a given protein is made.

The Neuron paper concerns lncRNAs (long nonCoding RNAs). They don’t code for protein either and contain over 200 nucleotides. There are a lot of them (10,000 – 50,000 are known to be expressed in man). Amazingly 40% of them are expressed in the brain, and not just in adult life, but during embryonic development. Expression of some of them is restricted to specific brain areas. It is easier for an embryologist to tell what type a cell is during brain cortical development by looking at the lncRNAs it expresses than by the proteins it is making. The paper contains multiple examples of lncRNAs controlling when and where a protein is made in the brain.

lncRNAs can contain multiple domains, each of which has a different affinity for a particular RNA (such as the mRNA for a protein), or DNA, or protein. In the nucleus they influence the DNA binding sites of transcription factors, RNA polymerase II, and the polycomb repressor complex. The review goes on with many specific examples of lncRNA function: synaptic plasticity, neurite extension.

Getting back to proteins, the vast majority are nearly the same in all mammals (this is where the 2% Chimpanzee argument comes from). Here is where it gets interesting. Roughly 1/3 of lncRNAs found in man are primate specific. This includes hundreds of lncRNAs found only in man. The paper reports that hundreds of them show evidence of positive selection in humans.

So the paper provides yet another mechanism (with far more detail than I’ve been able to provide here) for why our brains are so much larger, and different in many ways, than that of our nearest evolutionary relative, the chimpanzee. This is the largest molecular biological difference found so far for the human brain as opposed to every other brain. Fascinating stuff. Stay tuned. I think this is a watershed paper.

A new form of matter?

Has cellular biology and biochemistry shown us a new form of matter? It’s certainly something I never studied in PChem back in the day. It goes by multiple names, and may be more than one thing.

Start with the nucleolus. It’s been known for years: a visible agglomeration of proteins and RNA in the nucleus, not bound by a membrane. Then there is the processing body (aka stress granule), also made of proteins and RNA (but different ones: transcription factors and mRNAs). Then there is the nuclear pore, made of ‘low complexity sequence’ tails of proteins surrounding the pore (mostly phenylalanine glycine repeats, aka FG repeats) thought to form a barrier to protein movement through the pore. Then there are RNA granules, said to occur by a phase transition to a hydrogel-like phase (whatever that is). Neurologists have long been interested in FUS/TLS, a protein which is mutated in some forms of Amyotrophic Lateral Sclerosis and dementia.

I do think that we’re at the blind men and the elephant stage trying to sort all this out (which, of course, makes it fascinating and a fit subject for scientific work; apologies to Wittgenstein: “What we cannot speak about we must pass over in silence”).

So in what follows you’ll find a lot of information about these matters, which does not have a neat and tidy explanation. This is what science looks like when it’s being done.

[ Cell vol. 149 pp. 753 – 767, 768 – 779 ’12 ] RNA granules don’t just occur in dendrites. They are found in (1) germ cell P granules of C. elegans embryos (2) polar granules of Drosophila embryos (3) stress granules appearing in cultured yeast and mammalian cells on nutrient deprivation or other forms of metabolic stress (4) neuronal granules transporting mRNAs to dendrites.

Unsurprisingly, the granule contains RNA binding proteins (with KH or RNA Recognition motif (RRM) domains). These domains allow proteins containing them to recognize 3’ untranslated regions of target mRNA in a sequence specific manner (really?).

This work shows that structures resembling RNA granules can be reversibly aggregated and disaggregated in a soluble cellfree system in response to a small molecule (a biotinylated isoxazole). The proteins in the granules contain low complexity sequences (LC sequences), which show little diversity in their amino acid composition (which is usually repetitive). One example is the leucine rich domain. LC sequences are all you need for aggregation by the isoxazole. The domains undergo a concentration dependent phase transition to a hydrogel-like state with no chemical present?? The hydrogels are made of uniformly polymerized amyloidlike fibers. The fibers form and dissolve and don’t cause trouble (unlike classic amyloid).

LC sequences are particularly enriched in RNA and DNA binding proteins. FUS (FUsed in Sarcoma) is an RNA binding protein containing an LC domain (Gly/Ser Tyr Gly/Ser repeats). Hydrogel droplets formed from the LC sequence of FUS can retain proteins containing either the FUS LC sequence or other LC sequences.

This work finds a potential use for LC sequences — they allow the movement of regulatory proteins into and out of organized subcellular domains, via reversible polymerization into dynamic amyloidlike fibers. It’s possible that something similar occurs in Cajal bodies, nuclear speckles and nuclear factories involved in RNA splicing.

[ Proc. Natl. Acad. Sci. vol. 99 pp. 13583 – 13588 ’02 ] Nuclear bodies range in size from 2000 Angstroms to several microns. None of them are bounded by a membrane. It is thought that the same processes leading to the formation of nuclear bodies (e.g. a phase transition) are responsible for similar bodies occurring in the cytoplasm, e.g. P bodies (Processing bodies) and stress granules.

Each type is identified immunologically by antibodies against its components (either signature proteins or ribonucleoproteins or even small nuclear RNAs). They include
1. The Cajal body (the coiled body)
2. The promyelocytic body (PML body, POD)
3. Splicing related bodies
a. SC35 speckles (interchromatin granule cluster)
4. The GEM body
5. The matrix associated deacetylase body
6. HAP body
7. nucleoli associated paraspeckles
8. Nucleoli themselves.

The integrity of a nuclear body can be disrupted after depletion of its normal components. PODs, for example, are disrupted in acute promyelocytic leukemia.

The Cajal body and GEM are colocalized, but otherwise there doesn’t seem to be much association among the different nuclear bodies.

[ Cell vol. 162 pp. 1066 – 1077 ’15 ] FUS forms liquid compartments at sites of DNA damage in the nucleus and in the cytoplasm on stress. With time, liquid droplets of FUS convert to an aggregated state, a conversion accelerated by mutations (in the prionlike domain) derived from patients.

Why is the compartment called liquidlike? FUS molecules rapidly rearrange within the compartment. The compartments formed by FUS are spherical. Two FUS compartments can fuse and relax into one sphere.

FUS compartments belong to a set of RNA protein compartments (P granules, nucleoli) which ‘probably’ form by liquid liquid demixing (phase separation) from cytoplasm.

The conversion from a liquid to a solidlike state is concentration dependent, and mutations blocking nuclear localization sequence (NLS) function produce increased concentrations in the cytoplasm with aggregation.

The prionlike domain of FUS is intrinsically disordered.

[ Neuron vol. 88 pp. 678 – 690 ’15 ] Mutations in a bunch of RNA binding proteins (TDP43, FUS, ataxin2, hnRNPA1, hnRNPB2) are associated with ALS/FTD (Amyotrophic Lateral Sclerosis/FrontoTemporal Dementia). Poorly soluble assemblies of the mutant RNA binding protein are found in the nucleus and cytoplasm in the patients.

The assemblies differ from amyloids in the following ways
1. They are soluble in urea
2. They have low beta sheet content
3. They have a mixed granular/fibrillar appearance on EM
4. They don’t bind dyes diagnostic for amyloid (e.g. thioflavin T)
5. When fluorescently labeled, they don’t show the reductions in in vivo fluorescent lifetimes typical of conventional amyloid.

This work shows that the LC domain (Low Complexity domain) of normal FUS undergoes phase transitions, reversibly shifting between dispersed liquid droplets and hydrogel-like phases (defined how?). FUS mutants limit the ability to shift between phases, instead increasing the propensity of FUS to condense into poorly soluble, stable (i.e. irreversible) fibrillar hydrogel-like assemblies (i.e. a new type of phase). Spontaneous occurrence of this might explain sporadic ALS/FTD with FUS pathology even when no mutations are present. These assemblies selectively entrap other ribonucleoproteins, impair local RNP granule function and decrease new protein synthesis in axon terminals of cultured neurons. The work was done in C. elegans.

“The biophysics of conversion from liquid droplet to reversible hydrogel is not yet clear”. The two differ only slightly in viscosity.

The FG repeats (phenylalanine, glycine repeats) of nucleoporins show structural characteristics typical of natively unfolded proteins (e.g. highly flexible proteins lacking ordered secondary structure). They can be quite long (200 – 700 amino acids in yeast). Protease sensitivity shows that most FG repeat containing nucleoporins are disordered in situ within the nuclear pore complexes of purified yeast nuclei. This makes it likely that they form a meshwork of random coils at the pore through which nuclear transport proceeds. Natively unfolded proteins show the following biochemical features

1. multiple domains allowing simultaneous interactions with multiple binding partners
2. nonrigid binding domains that can accommodate a variety of interacting partners
3. fast molecular association and dissociation rates.

Another model has FG domains interacting with each other in the pore to form a protein meshwork which acts as a separate hydrophobic phase. Transport complexes can partition into this phase because they can bind to the FG repeats. Proteins unable to bind to the FG repeats are excluded from the hydrophobic phase. Molecules below 30 – 40 kiloDaltons get through the water filled holes in the gel.

To get through the pore a midsize protein must recruit a large receptor to pass through a narrow channel. The receptors replace the FG – FG binding of the nups with each other by binding to themselves — they essentially dissolve into the gel.

An alternate view holds that FG repeats form a network of unlinked polymers whose thermally activated undulations create a zone of ‘entropic exclusion’. The entropic penalty in collapsing the chains allows a barrier to form. However by binding to the repeats, carriers can circumvent the exclusion — replacing one type of bond with another.

There are several models for the FG repeats in the nuclear pore. The most convincing (to me) is the ‘selective phase’ model — a sievelike meshwork is formed within the NPC via interactions between FG repeats. The size of the FG mesh determines the upper limits of the diffusion gate (e.g. — the molecules getting through without help — in this case under 30 kiloDaltons). The binding of nuclear transport receptors (NTRs) to the FG repeats is proposed to locally dissolve the FG-FG network, allowing passage of whatever is bound to the NTRs.

‘Sufficiently concentrated’ solutions of cohesive FG domains spontaneously form FG hydrogels (which exclude inert molecules over 50 Angstroms in diameter). Cargo NTR complexes migrate into such hydrogels ‘up to’ 20,000 times faster than the respective cargoes alone. The intragel diffusion rate of a typical importinBeta:cargo complex predicts a similar NPC passage time (10 milliSeconds) as was actually seen in living NPCs.

The FG repeat domain of the yeast nucleoporin Nsp1 forms a hydrogel-like structure in vitro which requires hydrophobic interactions between the aromatic rings of the phenylalanines. This work assembled FG hydrogels in vitro, and studied protein entry into them and diffusion through them using fluorescence microscopy. The influx of various nuclear transport receptors of the importin beta family into the Nsp1 FG hydrogel was 1000 times faster than the entry of a control protein. Access of a model cargo bound to importin beta was accelerated by over 20,000 fold (compared to free cargo). However, not every FG hydrogel shows selectivity. To achieve selective permeability the total FG concentration within the gel had to be raised above 50 milliMolar. This has led the authors to introduce the concept of the saturated hydrogel, in which all the FG repeats must extend completely and undergo a maximum number of interactions. It seems likely (to the authors of the editorial, not the authors of the paper) that newly made FG proteins would immediately curl up and form intramolecular FG bridges (rather than intermolecular ones). In vitro gel formation can only be induced from lyophilized proteins under extreme pH and salt. The authors suggest that nuclear transport receptors act as chaperones preventing intramolecular FG interactions after synthesis. Under more physiologic conditions, the FG domain of Nsp1 formed neither homo nor heterotypic interactions with other FG nucleoporins.

FG repeat domains (they contain a hydrophobic patch, usually FG, FxFG, or GLFG, surrounded by more hydrophilic spacers) account for 12 – 20% of the mass of a nuclear pore complex. Up to 50 FG repeat domains may occur in a single protein. FG repeats occur in various flavors — examples are FxFG repeats

So there you have it: quite a mess. Figure it out and get on the boat to Sweden.

Man’s best friend

I usually pay little attention to animal models of neurologic disease. After all, our brain is what separates us from animals (recent human behavior excepted). Neuromuscular disease is different because our peripheral nerves and muscles work the same way as those of animals. An astounding paper from Harvard and Brazil gives us an entirely new angle to treat muscular dystrophy, particularly the Duchenne form. I ran a muscular dystrophy clinic for 15 years in the 70s and 80s and haplessly watched young boys deteriorate and die from Duchenne. The major therapeutic advance during that time was (hold your breath) lighter weight braces, allowing the boys to stay out of wheelchairs a bit longer.

Some background for those who don’t know: the molecular defect in Duchenne was found in ’87. Interestingly Kunkel, one of the authors on the original paper [ Cell vol. 51 pp. 919 – 928 ’87 ], is an author on the present one [ Cell vol. 163 pp. 1204 – 1213 ’15 ]. Duchenne dystrophy affects only males, as the gene for the protein (dystrophin) is found on the X chromosome, so women with a normal X and a mutant X escape. To show how pathetic things were back then, we tried to find out if a sister of a patient was a carrier. How did we do it? By measuring an enzyme released by damaged muscle (CPK) on several occasions. Carriers often showed an elevation.

The mutated protein is called dystrophin. It hooks the contractile apparatus of a muscle cell to the membrane. Failure of this makes muscle cells more fragile when they contract, resulting in eventual loss. From a molecular biological point of view the protein is fascinating. The gene is one of the largest known, stretching over 2,220,233 positions (nucleotides) on the X chromosome and containing 79 exons. Figuring a transcription rate of 100 nucleotides a second, it takes 6 hours to make the messenger RNA (mRNA) for it. The protein has 3,685 amino acids and figuring a translation rate of 3 – 6 amino acids/second it takes 10 to 20 minutes for the ribosome to make it. Given that it takes only 3 nucleotides to code for an amino acid, the protein coding part of the gene takes up only 0.5% of the gene. Correctly splicing out the introns is a huge task, which we all perform well. The size and complexity of the gene explains why mutations are so common, making it the most common form of hereditary muscular dystrophy (most are).
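
The arithmetic in the paragraph above is easy to check. Here is a minimal Python sketch that uses only the figures quoted there (gene length, protein length, and the assumed transcription and translation rates). The constant and function names are my own, made up for illustration.

```python
# Back-of-the-envelope check of the dystrophin numbers quoted in the text.
# All inputs are the figures from the paragraph above; the names are
# hypothetical and exist only for this illustration.

GENE_LENGTH_NT = 2_220_233            # nucleotides spanned by the gene
PROTEIN_LENGTH_AA = 3_685             # amino acids in dystrophin
TRANSCRIPTION_RATE_NT_PER_S = 100     # assumed RNA polymerase speed
TRANSLATION_RATES_AA_PER_S = (3, 6)   # assumed ribosome speed range


def transcription_hours(gene_nt: int, rate_nt_per_s: float) -> float:
    """Hours needed to transcribe the whole gene at a constant rate."""
    return gene_nt / rate_nt_per_s / 3600


def translation_minutes(protein_aa: int, rate_aa_per_s: float) -> float:
    """Minutes needed for one ribosome to make the protein."""
    return protein_aa / rate_aa_per_s / 60


def coding_fraction_percent(protein_aa: int, gene_nt: int) -> float:
    """Percent of the gene that codes for amino acids (3 nt per codon)."""
    return 100 * (3 * protein_aa) / gene_nt


if __name__ == "__main__":
    hours = transcription_hours(GENE_LENGTH_NT, TRANSCRIPTION_RATE_NT_PER_S)
    print(f"transcription: {hours:.2f} hours")
    for rate in TRANSLATION_RATES_AA_PER_S:
        minutes = translation_minutes(PROTEIN_LENGTH_AA, rate)
        print(f"translation at {rate} aa/s: {minutes:.1f} minutes")
    percent = coding_fraction_percent(PROTEIN_LENGTH_AA, GENE_LENGTH_NT)
    print(f"coding fraction: {percent:.2f}% of the gene")
```

The translation estimate depends on which end of the 3 – 6 amino acids/second range you pick: about 10 minutes at the fast end, about 20 at the slow end. Either way, making the protein takes a small fraction of the roughly 6 hours needed to make the mRNA.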

There are currently all sorts of efforts underway to correct the mutation, particularly in a milder form called Becker dystrophy. Derek has covered them and they constitute a logical direct attack on the pathology.

What is so remarkable about the current Cell paper is that it gives us an entirely new and different way to attack Duchenne (and possibly all forms of muscular dystrophy). It involves a colony of dogs in Brazil. They have GRMD (Golden Retriever Muscular Dystrophy) with a mutation in one of the many splice sites in dystrophin (it has 79 exons in man) leading to a premature stop codon and no functional dystrophin in the dogs’ muscles. The animals weaken and become non ambulatory with a shortened lifespan. However, a few of the dogs in the colony seemed pretty normal. So they went to work. The obvious explanation was that the gene was in some way repaired so the animals had normal amounts of dystrophin. Not so. Even though ambulatory, the animals’ muscles had no dystrophin. So the whole genome was sequenced. What they found was that a mutation at a site upstream of the gene for a protein called Jagged1 led to increased transcription of the gene and increased levels of the protein.

Jagged1 is a protein ligand for the Notch system of receptors. The Notch system is important in muscle regeneration. The myoblasts of the animals had more proliferative capacity. The Notch system is far too complicated to go into here, but expect to see a lot more research money pumped into it.

What I find so fabulous about this paper, is that it gives us an entirely new way of thinking about Duchenne, totally unrelated to the genetic defect, which had been our focus up to now. It also rubs our noses in how little we understand about our molecular biology and cell physiology. If we really understood things, we’d have been focused on Notch years ago. Yet another reason drug discovery is so hard. We are trying to alter a system we only dimly understand.

