The old year goes out with a bang

A huge amount of cellular genomics will have to be redone if the following paper is replicated. Remember “Extraordinary claims require extraordinary evidence.” Carl Sagan.

What’s all the shouting about? Normally when you think about messenger RNA (mRNA) as it exists in the cytoplasm, after the initial transcript has been significantly massaged in the nucleus, you think about the part that codes for amino acids. This ‘coding region’ is the part that is translated into amino acids by the ribosome. But mRNA is invariably larger, having nucleotides at each end (3′ and 5′) which have other uses. These are called the 3′ Untranslated Region (3′ UTR) and 5′ Untranslated Region (5′ UTR).

So if you do single cell RNA sequencing (which we can do now) it shouldn’t matter what nucleotide sequence you search for (5′ UTR, 3′ UTR or the coding region) as all mRNA contains one of each.

Not so says this paper [ Neuron vol. 88 pp. 1149 – 1156 ’15 ].

Given the mRNA for a given protein in a single cell, using a probe for the 3’UTR and a probe for the coding sequence should give you the same abundance for both. That’s not what they found at all for single neurons from the brain. In some cases there was much more RNA containing the 3’UTR than RNA containing the coding segment of the mRNA for a given protein. In others there was much less. Even more impressive, the 3’UTR/(3’UTR + coding) ratio for a given protein varies between different parts of the brain. This ratio has to lie between 0 and 1, and given what we knew about mRNA in the past it should obviously be .5.
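For the concretely minded, here is a minimal sketch of how that ratio behaves. The function name `utr_ratio` and the probe signals are hypothetical illustrations, not the authors' pipeline and not data from the paper.

```python
def utr_ratio(utr_signal, coding_signal):
    """Return 3'UTR / (3'UTR + coding) for one gene in one cell or region."""
    if utr_signal < 0 or coding_signal < 0:
        raise ValueError("signals must be non-negative")
    total = utr_signal + coding_signal
    if total == 0:
        raise ValueError("need some signal to form the ratio")
    return utr_signal / total


# Hypothetical probe signals in arbitrary units (assumed, not measured).
examples = {
    "equal 3'UTR and coding": (100.0, 100.0),
    "3'UTR in excess": (400.0, 100.0),
    "coding in excess": (100.0, 400.0),
}

for label, (utr, coding) in examples.items():
    print(f"{label}: {utr_ratio(utr, coding):.2f}")
# equal 3'UTR and coding: 0.50
# 3'UTR in excess: 0.80
# coding in excess: 0.20
```

A ratio above .5 means the 3’UTR probe sees more RNA than the coding probe, and a ratio below .5 means the reverse.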

Well, they looked at a lot of proteins. They did find around 1,400 genes with a ratio of .5 (as expected), but they found 700 showing a ratio of .2 (lots more coding sequence than 3’UTR), and 1,100 showing a ratio of .8 (lots more 3’UTR than coding sequence). Overall, plotting the number of genes against the ratio gives something looking like a bell curve (Gaussian distribution).

It’s long been known that mRNA levels don’t exactly correlate with the levels of proteins made from them. The authors found that if there are lots of 3’UTRs around, relatively little protein is made from the gene.

A variety of brain atlases have published mRNA abundances for various regions of the brain. If they just used one probe (as they probably did) this is clearly not enough.

The 3’UTRs may be acting as ceRNAs (competitive endogenous RNAs). These have been known for years — I’ve included a post of 3 years ago on the subject (at the end).

So this work (if replicated) throws everything we thought we knew about mRNA into a cocked hat. It’s why I love science: there’s always something really new to think about. Happy New Year !!!

Chemiotics II
Lotsa stuff, basically scientific — molecular biology, organic chemistry, medicine (neurology), math — and music
Why drug discovery is so hard: reason #20 — competitive endogenous RNAs

The chemist will appreciate le Chatelier’s principle in action in what follows. We are far from knowing all the players controlling cellular behavior. So how in the world will we find drugs to change cellular behavior when we don’t know all the things affecting it? The latest previously unknown cellular players to enter the lists are competitive endogenous RNAs (ceRNAs). For details see Cell vol. 147 pp. 344 – 357, 382 – 395 ’11. The background the pure chemist needs for what follows can all be found in the category “Molecular Biology Survival Guide.”

Recall that microRNAs are short (20 something) polynucleotides which bind to the 3′ untranslated region (3′ UTR) of mRNA, and either (1) inhibit its translation into protein or (2) cause its degradation. In either case, less of the corresponding protein is made. The microRNA and the appropriate sequence in the 3′ UTR of the mRNA form an RNA-RNA double helix (G on one strand binding to C on the other, etc.). Visualizing such helices is duck soup for a chemist.
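For those who would rather see the base pairing than visualize it, here is a small sketch that looks for a stretch of a 3′ UTR able to pair with a microRNA. The sequences are made up, and the sketch assumes perfect Watson-Crick pairing along the whole stretch; real microRNA binding is usually only partly complementary, so treat this as a toy.

```python
# Watson-Crick partners in RNA (G-U wobble pairs are ignored in this sketch).
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Sequence that pairs with `rna` in an antiparallel double helix."""
    return "".join(PAIR[base] for base in reversed(rna))


def find_sites(mirna, utr):
    """Start positions in `utr` where `mirna` could pair perfectly."""
    target = reverse_complement(mirna)
    last_start = len(utr) - len(target)
    return [i for i in range(last_start + 1)
            if utr[i:i + len(target)] == target]


# Hypothetical sequences, written 5' to 3' (not real microRNA or UTR data).
mirna = "AGCUUAGC"
utr = "AAGCUAAGCUCC"

print(reverse_complement(mirna))  # GCUAAGCU
print(find_sites(mirna, utr))     # [2]
```

The site is found by searching the 3′ UTR for the reverse complement of the microRNA, because the two strands of the helix run in opposite directions.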

Molecular biology is full of such semantic cherry bombs as nonCoding DNA (which meant DNA which didn’t code for protein), a subset of Junk DNA. Another is the pseudogene: a gene that looks like it should code for protein, except that it doesn’t because it lacks an initiation codon or has a premature termination codon. Except for these differences, pseudogenes have the nucleotide sequence to code for a known protein. It is estimated that the human genome contains as many pseudogenes (20,000) as it contains true protein coding genes [ Genome Res. vol. 12 pp. 272 – 280 ’02 ]. We now know that well over half the genome is transcribed into mRNA, including the pseudogenes.

PTEN (you don’t want to know what it stands for) is a 403 amino acid protein which is one of the most commonly mutated proteins in human cancer. Our genome also contains a pseudogene for it (called PTENP). Interestingly, deletion of PTENP (not PTEN) is found in some cancers. However, PTENP deletion is associated with decreased amounts of the PTEN protein itself, something you don’t want as PTEN is a tumor suppressor. How PTEN suppresses tumors appears to be fairly well known, but is irrelevant here.

Why should loss of PTENP decrease PTEN itself? The reason is that the mRNA made from PTENP, even though it has a premature termination codon and can’t be made into protein, is just as long, so it also contains the 3’UTR of PTEN. This means PTENP mRNA is sopping up microRNAs which would otherwise decrease the level of PTEN. Think of PTENP mRNA as a sponge.

Subtle, isn’t it? But there’s far more. At least PTENP mRNA closely resembles the PTEN mRNA. However, other mRNAs coding for completely different proteins also have binding sites in their 3’UTR for the microRNA which binds to the 3’UTR of PTEN, resulting in its destruction. So transcription of a completely different gene (the example of ZEB2 is given) can control the abundance of another protein. Essentially its mRNA is acting as a sponge, sopping up the killer microRNA.
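The sponge idea can be put into a few lines of arithmetic. What follows is a toy titration model of my own devising, not anything from the Cell papers: it assumes one binding site per transcript, that the microRNA spreads evenly over every available site, and that a bound transcript is silenced. All the numbers are invented.

```python
def free_target_mrna(target, decoy, mirna):
    """Toy model: number of target transcripts left unbound by microRNA.

    Assumptions (all hypothetical): one site per transcript, microRNA
    spread evenly over target and decoy sites, bound means silenced.
    """
    sites = target + decoy
    if sites <= 0:
        return 0.0
    bound_fraction = min(1.0, mirna / sites)
    return target * (1.0 - bound_fraction)


# Invented copy numbers: 100 target mRNAs, 100 microRNA molecules.
with_sponge = free_target_mrna(target=100, decoy=100, mirna=100)
without_sponge = free_target_mrna(target=100, decoy=0, mirna=100)

print(with_sponge)     # 50.0
print(without_sponge)  # 0.0
```

In this cartoon, removing the decoy transcripts leaves the same microRNA pool concentrated on the target, so less target mRNA escapes. That is the direction of the effect described above for loss of PTENP.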

It gets worse. Most microRNAs have binding sites on the mRNAs of many different proteins, and PTEN itself has a 3’UTR which binds to 10 different microRNAs.

So here is a completely unexpected mechanism of control of protein levels in the cell. The general term for this is competitive endogenous RNA (ceRNA). Two years ago the number of human microRNAs was thought to be around 1,000. Unlike protein coding genes, it’s far from obvious how to find them by looking at the sequence of our genome, so there may be quite a few more.

So most microRNAs bind the 3’UTR of the mRNA for more than one protein (the average number is unclear at this point), and the mRNAs for most proteins have binding sites for microRNAs in their 3’UTR (again the average number is unclear). What a mess. What subtlety. What an opportunity for the regulation of cellular function. Who is going to be smart enough to figure out a drug which will change this in a way that we want? Absence of evidence of a regulatory mechanism is not evidence of its absence. A little humility is in order.

Comments

Bryan On December 29, 2015 at 8:08 pm

I’m fairly skeptical of the results. The techniques used to quantify RNA levels in that paper (RNA-seq, qPCR, FISH) all have (likely overlapping) detection biases that could cause some sequences to be detected at higher or lower frequencies than other regions of the same transcript. I’d need to read the paper more closely, but a quick ctrl+f does not show any hits for “false detection,” suggesting that the authors did not consider the false detection rate of their assays and whether they are just looking at the expected number of outliers from a noisy data set.
luysii On December 29, 2015 at 9:45 pm

Bryan, interesting. The fact that the distribution of the 3’UTR/(3’UTR + coding) ratio is nearly Gaussian implies noisy data. We’ll have to wait and see. In the authors’ favor is that some values of the ratio are consistently associated with minimal amounts of the protein product of the mRNA.
GCC On January 5, 2016 at 12:32 pm

It would have been nice to see some good old-fashioned Northern blots with 3’UTR probes to identify the different sized transcripts with and without the coding sequences. Obviously they couldn’t get single-cell resolution by Northern blot, but they could have at least verified that the 3’UTR-only transcripts were detectable in brain extracts.
