We all know what proteins do. They act as enzymes, structural elements of cells, membrane proteins where drugs bind etc. etc. The background the pure chemist needs for what follows can all be found in the category “Molecular Biology Survival Guide.”
We also know that the messenger RNA for any given protein contains a lot more information than that needed to code for the amino acids making up the protein. Forget the introns that are spliced out from the initial transcript. When the mature messenger RNA for a given protein leaves the nucleus for the cytoplasm, where the ribosome translates it into protein, it contains nucleotides at either end which the ribosome effectively ignores. These are called the untranslated regions (UTRs). The UTRs at the 3′ end of human mRNAs range in length between 60 and 4,000 nucleotides (average 800). It costs energy to store the information for the UTR in DNA, more energy to synthesize the nucleotides which make it up, even more to patch them together to form the UTR, more to package it and move it out of the nucleus etc. etc.
Why bother? Because the 3′ UTR of the mRNA contains a lot of information which tells the cell how much protein to make and how long the mRNA should hang around in the cell (among many other things). A Greek philosopher got here first: “Nature does nothing uselessly” (Aristotle).
Those familiar with competitive endogenous RNA (ceRNA) can skip what follows up to the ****
Recall that microRNAs are short polynucleotides (20 something nucleotides long) which bind to the 3′ untranslated region (3′ UTR) of mRNA, and either (1) inhibit its translation into protein or (2) cause its degradation. In either case, less of the corresponding protein is made. The microRNA and the appropriate sequence in the 3′ UTR of the mRNA form an RNA-RNA double helix (G on one strand binding to C on the other, etc.). Visualizing such helices is duck soup for a chemist.
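The base pairing described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: both sequences are made up, G-U wobble pairs are ignored, and the choice to match only nucleotides 2 through 8 of the microRNA (the so-called seed) is a common simplification rather than anything stated in this post. The function names are hypothetical.

```python
# Watson-Crick partners in RNA (G-U wobble pairs are ignored in this sketch).
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the strand that would pair antiparallel with `rna`."""
    return "".join(PAIR[base] for base in reversed(rna))


def find_sites(mirna: str, utr: str, start: int = 1, end: int = 8) -> list[int]:
    """Return 0-based positions in `utr` that pair perfectly with mirna[start:end].

    The default slice (nucleotides 2-8 of the microRNA) is an assumption
    of this sketch, not something taken from the text.
    """
    site = reverse_complement(mirna[start:end])
    hits = []
    pos = utr.find(site)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(site, pos + 1)
    return hits


# Made-up sequences, for illustration only (not real human RNAs).
MIRNA = "AUGGCUACGAUCCGAUUAGCAU"
UTR = "CCAAGUAGCCAUUUGGGUAGCCAAC"

print(find_sites(MIRNA, UTR))  # positions of the two candidate binding sites
```

Real target prediction also weighs things like site context and conservation, so treat this as a picture of the complementarity idea and nothing more.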
Molecular biology is full of such semantic cherry bombs as noncoding DNA (which meant DNA which didn’t code for protein), a subset of junk DNA. Another is the pseudogene. These are genes that look like they should code for protein, except that they don’t because of lack of an initiation codon or a premature termination codon. Except for these differences, they have the nucleotide sequence to code for a known protein. It is estimated that the human genome contains as many pseudogenes (20,000) as it contains true protein coding genes [ Genome Res. vol. 12 pp. 272 - 280 '02 ]. We now know that well over half the genome is transcribed into RNA, including the pseudogenes.
PTEN (you don’t want to know what it stands for) is a 403 amino acid protein which is one of the most commonly mutated proteins in human cancer. Our genome also contains a pseudogene for it (called PTENP). Interestingly, deletion of PTENP (not PTEN) is found in some cancers. However, PTENP deletion is associated with decreased amounts of the PTEN protein itself, something you don’t want as PTEN is a tumor suppressor. How PTEN suppresses tumors appears to be fairly well known, but is irrelevant here.
Why should loss of PTENP decrease PTEN itself? The reason is that the mRNA made from PTENP, even though it has a premature termination codon and can’t be made into protein, is just as long, so it also contains the 3′UTR of PTEN. This means PTENP mRNA is sopping up microRNAs which would otherwise decrease the level of PTEN. Think of PTENP mRNA as a sponge.
Subtle, isn’t it? But there’s far more. At least PTENP mRNA closely resembles the PTEN mRNA. However, other mRNAs, coding for completely different proteins, also have binding sites in their 3′UTR for the microRNA which binds to the 3′UTR of PTEN and causes the destruction of PTEN mRNA. So transcription of a completely different gene (the example of ZEB2 is given) can control the abundance of another protein. Essentially its mRNA is acting as a sponge, sopping up the killer microRNA.
It gets worse. Most microRNAs have binding sites on the mRNAs of many different proteins, and PTEN itself has a 3′UTR which binds to 10 different microRNAs.
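The sponge idea is easy to see in a toy titration model. The sketch below is a deliberately crude picture, not a model taken from the literature: it assumes one binding site per transcript, equal binding strength at every site, microRNA spread evenly over all available sites, and made-up copy numbers. The function name and parameters are hypothetical.

```python
def free_target(target: float, sponge: float, mirna: float) -> float:
    """Copies of target mRNA left unbound by microRNA.

    Toy assumptions: one binding site per transcript, every site binds
    equally tightly, and the microRNA pool spreads evenly over all sites.
    """
    total_sites = target + sponge
    if total_sites == 0:
        return 0.0
    # Fraction of all sites that end up occupied (cannot exceed 1).
    occupied_fraction = min(1.0, mirna / total_sites)
    return target * (1.0 - occupied_fraction)


# Made-up copy numbers: 100 target mRNAs (think PTEN) and 100 microRNAs.
with_sponge = free_target(target=100, sponge=100, mirna=100)
without_sponge = free_target(target=100, sponge=0, mirna=100)

print(with_sponge, without_sponge)
```

With the sponge present, half of the sites are occupied and half of the target mRNA escapes; delete the sponge and the same microRNA pool covers every copy of the target. That is the PTENP deletion story in miniature.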
So here is a completely unexpected mechanism of control of protein levels in the cell. The general term for this is competitive endogenous RNA (ceRNA). Two years ago the number of human microRNAs was thought to be around 1,000 (release 20 of miRBase in June ’13 gives the number at 2,555, and this is unlikely to be complete). Unlike protein coding genes, it’s far from obvious how to find them by looking at the sequence of our genome, so there may be quite a few more.
So most microRNAs bind the 3′UTR of the mRNA for more than one protein (the average number is unclear at this point), and the mRNAs for most proteins have binding sites for microRNAs in their 3′UTR (again the average number is unclear). What a mess. What subtlety. What an opportunity for the regulation of cellular function. Who is going to be smart enough to figure out a drug which will change this in a way that we want? Absence of evidence of a regulatory mechanism is not evidence of its absence. A little humility is in order.
If this wasn’t scary enough, consider the following cautionary tale: Nature vol. 505 pp. 212 – 217 ’14. HMGA2 is a protein we thought we understood for the most part. It is found in the nucleus, where it binds to DNA. While it doesn’t transcribe DNA into RNA, it helps to form a DNA-bound protein complex which promotes transcription of certain genes.
Well, that’s what the protein does. However, the mRNA for the protein uses its 3′ untranslated region (3′UTR) to sop up microRNAs of the let-7 family. The mRNA for HMGA2 is highly overexpressed in human cancer (notably the very common adenocarcinoma of the lung). You can mutate the mRNA for HMGA2 so it doesn’t produce the protein, just by putting a stop codon in it near the 5′ end. Throw the altered mRNA into a tissue culture of a lung adenocarcinoma cell line, and the cells become more proliferative and grow independently of being anchored to the tissue culture plate (i.e. anchorage independence, a biologic marker for cancer).
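The logic of that experiment can be sketched with a toy sequence: a single nucleotide change creates an early stop codon, the ribosome quits almost immediately, and yet everything downstream, including the microRNA binding site in the 3′UTR, is still sitting there in the message. The sequences and names below are invented for illustration and have nothing to do with the real HMGA2 mRNA.

```python
STOPS = {"UAA", "UAG", "UGA"}


def coding_codons(mrna: str) -> list[str]:
    """Codons read from the first AUG up to (not including) the first in-frame stop."""
    start = mrna.find("AUG")
    if start == -1:
        return []
    codons = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOPS:
            break
        codons.append(codon)
    return codons


# Made-up message: short 5' UTR, five sense codons, a stop, then a 3' UTR
# carrying a hypothetical microRNA binding site.
SITE = "GUAGCCA"
TAIL = "CC" + SITE + "UU"
NORMAL = "GG" + "AUG" + "GCU" + "CGA" + "UUC" + "AAG" + "UAA" + TAIL

# One C-to-U change turns the third codon (CGA) into a stop codon (UGA).
MUTANT = NORMAL[:8] + "U" + NORMAL[9:]

print(len(coding_codons(NORMAL)), len(coding_codons(MUTANT)))
print(SITE in NORMAL, SITE in MUTANT)
```

The mutant is read for only two codons instead of five, but the binding site in its tail is untouched, so it can still act as a sponge.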
So what? It means that it is possible that every mRNA for every protein we make is acting as a ceRNA. The authors conclude the paper with “Such dual-function ceRNA and protein activities necessitate a deeper exploration of the coding genome in biological systems.”
I’ll say. We’re just beginning to scratch the surface. The control mechanisms within the cell continue to amaze (me) by their elegance and subtlety. I doubt highly that we know them all. Yet more reasons that drug discovery is hard — we are mucking about with a system whose workings we only dimly understand.