Tag Archives: mRNA

Forgotten but not gone — take III

It’s pretty clear that life originated in the RNA world. Consumed by thinking of proteins, enzymes, DNA etc. we tend to forget that there is a lot of RNA out there doing things we didn’t suspect. Here are two more examples, one of which may explain why even genes coding for proteins are relatively free of codons translated into amino acids. The champ of course is dystrophin, discussed in the last post: https://luysii.wordpress.com/2019/05/05/duchenne-muscular-dystrophy-a-novel-genetic-treatment/. The gene is a monster with 2,220,233 nucleotides coding for just 3,685 amino acids, meaning that less than 1/200th of the gene is actually coding for protein. The work below should make us think about just what else the other 199/200ths of dystrophin might be doing.
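The 1/200th figure is easy to check. Here is a minimal sketch using only the two numbers quoted above and the fact that each amino acid takes one 3-nucleotide codon. The names are mine, and the stop codon is ignored.

```python
# Numbers quoted in the post for the dystrophin gene.
GENE_LENGTH_NT = 2_220_233   # nucleotides in the whole gene
PROTEIN_LENGTH_AA = 3_685    # amino acids in the protein
NT_PER_CODON = 3             # one codon = 3 nucleotides


def coding_fraction(gene_nt: int, protein_aa: int) -> float:
    """Fraction of the gene's nucleotides that end up as codons."""
    return protein_aa * NT_PER_CODON / gene_nt


fraction = coding_fraction(GENE_LENGTH_NT, PROTEIN_LENGTH_AA)
print(f"coding nucleotides: {PROTEIN_LENGTH_AA * NT_PER_CODON:,}")
print(f"coding fraction:    {fraction:.5f}  (1/200 = {1 / 200:.5f})")
```

The coding fraction comes out just under 1/200, as stated.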

Unsuspected use of RNA #1. [ Neuron vol. 102 pp. 507 – 509, 553 – 563 ’19 ] The Tumor protein p53 inducible nuclear protein 2 (Tp53inp2) gene codes for a low complexity protein of 222 amino acids, all in one exon. However the 3′ untranslated region (3′UTR) of the RNA for it is nearly 5 times longer (3,121 nucleotides) than the 666 nucleotides coding for amino acids. The protein is made from the mRNA in some cells, but not in sympathetic neurons, even though the mRNA for Tp53inp2 is the most abundant RNA in the axons of these neurons.

Why do animals lick their wounds?  Because their saliva contains nerve growth factor (NGF) among other things.  NGF is crucial for the growth of sympathetic neuron axons, and their very survival in embryonic life.  It is a protein, which binds to a receptor for it (TrkA) on the axon membrane.  The receptor/NGF complex is then internalized and transported back to the nucleus turning on the genes necessary for axon growth and cell survival.

Even though the mRNA for Tp53inp2 is NOT translated into protein in the axon, it is crucial for the internalization of TrkA/NGF.

People have studied proteins whose function it is to bind RNA for years. They are called RBPs (RNA Binding Proteins), and our genome has 750 of them. 200 RBPs are associated with genetic disease. This work turns everything on its head. Here is an RNA whose function it is to bind a protein (e.g. TrkA).

How many more mRNAs have nonCoding (for protein) parts with other functions?

Unsuspected use of RNA #2. Circular RNAs had been missed for years (although known since 1976). The classic sequencing methods isolate only RNAs with characteristic tails (such as polyAdenine). Circular RNAs don’t have any. They are formed by back splicing of the 3′ end of exon N to the 5′ end of exon N. Fortunately this is only 1% as efficient as the normal way.

So what? Circular RNAs are crucial in the innate immune response to microbial invaders. Double stranded DNA belongs inside the nucleus. When some organism brings it into the cytoplasm, it binds to Protein Kinase R (PKR), activating it so that it phosphorylates eukaryotic initiation factor 2 (eIF2), bringing protein synthesis to a screeching halt.

This means that the cell needs a mechanism to keep PKR quiet.  This is where circular RNAs come in   [ Cell vol. 177 pp. 797 – 799, 865 – 880 ’19 ].  If the nucleotides in the circle can reach across the circle and base pair with each other forming a duplex of any length, it will bind to PKR inhibiting it.  Most circular RNAs are expressed at only a handful of copies/cell, the cell containing just 10,000 of them.

The work found that overexpression of a single circular RNA able to form duplexes (dsRNA) inhibits PKR. Overexpression of linear RNA of the same sequence does not, nor does overexpression of circular RNA which can’t form dsRNA.

So when an invader with dsDNA or dsRNA gets into the cell, RNase L, a cytoplasmic endonuclease, is activated, cleaving circular RNA and uninhibiting PKR.

So it’s back to the drawing board for mRNA and those parts (introns, 3’UTRs) we didn’t think were doing anything.  Perhaps that’s why there are so many of them, and why they take up more room in mRNA and genes than the ones coding for amino acids.  Also it’s time to look at RNAs as protein binders and modifiers, rather than the other way around as we have been doing.

Here’s a link to an earlier member of the series — https://luysii.wordpress.com/2019/04/15/forgotten-but-not-gone-take-ii/xa

Another fail safe mechanism used by the cell — readthrough

Nothing is perfect in this world, not even the translation of mRNA into protein. The error rate is one amino acid misincorporated into a protein for every 10,000 or so done correctly, but these results are for one-celled organisms (E. coli, yeast). I can’t find a number for mammals, primates etc.

This means that occasionally one of the 3 codons which tell the ribosome to quit (stop codons) will be misread as an amino acid. This is called readthrough, and means that the ribosome will merrily march on producing a much larger protein than coded for by the mRNA until one of two things happens: 1. the ribosome reaches the end of the mRNA and stops, or 2. the mRNA contains another stop codon (there are 3). The probability of this is 3/64 per codon. If stop codons are randomly distributed (which they are most certainly not in the protein coding segment of an mRNA) the chance of 100 codons in a row not containing a stop codon is under 1% (0.822% to be exact). So any protein containing more than 100 amino acids is a statistical freak in this sense. Since the 3′ untranslated region (3′UTR) of an mRNA doesn’t code for protein, it should have stop codons randomly distributed (there being no selective pressure to keep them away).
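The 0.822% figure can be reproduced in a few lines. This is a sketch under the same assumption the paragraph makes (and rejects for real coding sequence): every codon is drawn independently and uniformly from the 64 possibilities. The function name is mine.

```python
STOP_CODONS = 3
ALL_CODONS = 4 ** 3  # 64 possible triplets of 4 letters


def prob_no_stop(n_codons: int) -> float:
    """Chance that n random codons in a row contain no stop codon."""
    p_not_stop = (ALL_CODONS - STOP_CODONS) / ALL_CODONS  # 61/64
    return p_not_stop ** n_codons


p100 = prob_no_stop(100)

# Mean number of random codons read up to and including the first stop
# (mean of a geometric distribution with success probability 3/64).
mean_codons_to_stop = ALL_CODONS / STOP_CODONS

print(f"P(no stop in 100 random codons) = {p100:.3%}")
print(f"mean codons until a stop        = {mean_codons_to_stop:.1f}")
```

On this random model a readthrough ribosome would hit a stop after about 21 codons on average.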

Enter Nature vol. 534 pp. 719 – 723 ’16 — if you attach a 3′ UTR section of an mRNA to a normal protein sequence (mimicking readthrough) you get much less protein. The authors think the 3’UTRs code for peptide sequences destabilizing the attached protein. They don’t know what this might be, so it’s terra incognita for researchers, and a worthwhile PhD project to figure it out. Another example of ‘coding’ by a presumably nonCoding sequence in the genome. It may also tell us something about protein structure.

Why you do and don’t need chemistry to understand why we have big brains

You need some serious molecular biological chops to understand why primates such as ourselves have large brains. For this you need organic chemistry. Or do you? Yes and no. Yes to understand how the players are built and how they interact. No because it can be explained without any chemistry at all. In fact, the mechanism is even clearer that way.

It’s an exercise in pure logic. David Hilbert, one of the major mathematicians at the dawn of the 20th century famously said about geometry — “One must be able to say at all times–instead of points, straight lines, and planes–tables, chairs, and beer mugs”. The relationships between the objects of geometry were far more crucial to him than the objects themselves. We’ll take the same tack here.

So instead of the nucleotides Uridine (U), Adenine (A), Guanine (G), Cytosine (C), we’re going to talk about lock and key and hook and eye.

We’re going to talk about long chains of these four items. The order is crucial. Two long chains of them can pair up only if there are segments on each where the locks on one pair with the keys on the other and the hooks with the eyes. How many possible combinations of the four are there on a chain of 20? Just 4^20 or 2^40 = 1,099,511,627,776. So to get two randomly chosen chains to pair up exactly is pretty unlikely, unless in some way you or the blind Watchmaker chose them to do so.
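The counting argument can be written down without any chemistry. In this sketch the pairing rule is applied position by position, which is a simplification of my own (real strands pair antiparallel), and all the names are illustrative.

```python
import random

ITEMS = ("lock", "key", "hook", "eye")
PARTNER = {"lock": "key", "key": "lock", "hook": "eye", "eye": "hook"}
CHAIN_LENGTH = 20

# Number of distinct chains of 20 items drawn from 4 kinds.
n_chains = len(ITEMS) ** CHAIN_LENGTH


def partner_chain(chain):
    """The one chain that pairs exactly with `chain`."""
    return tuple(PARTNER[item] for item in chain)


def pairs_exactly(chain_a, chain_b):
    """True if every item on chain_a faces its partner on chain_b."""
    return len(chain_a) == len(chain_b) and all(
        PARTNER[a] == b for a, b in zip(chain_a, chain_b)
    )


chain = tuple(random.choice(ITEMS) for _ in range(CHAIN_LENGTH))
print(f"possible chains of {CHAIN_LENGTH}: {n_chains:,}")
print("pairs with its designed partner:", pairs_exactly(chain, partner_chain(chain)))
```

Only one chain out of `n_chains` is the exact partner of a given chain, so a randomly chosen one pairs with probability 1 / 4^20.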

Now you need a Turing machine to take a long string of these 4 items and turn it into a protein. In the case of the crucial Notch protein the string of locks, keys, hooks and eyes contains at least 5,000 of them, and their order is important, just as the order of letters in a word is crucial for its meaning (consider united and untied).

The cell has tons of such Turing machines (called ribosomes) and lots of copies of strings coding for Notch (called Notch mRNAs).

The more Notch protein around in the developing brain, the more the proliferating precursors to neurons proliferate before differentiating into neurons, resulting in a bigger brain.

The Notch string doesn’t all code for protein. At one end is a stretch of locks, keys, hooks and eyes which binds other strings, which when bound cause the Notch string to be degraded, meaning less Notch and a smaller brain. The other strings are about 20 long and are called microRNAs.

So to get more Notch and a bigger brain, you need to decrease the number of microRNAs specifically binding to the Notch string. One particular microRNA (called miR-143-3p) has it in for the Notch string. So how did primates get rid of miR-143-3p? They have an insert (unique to them) in another string which contains 16 binding sites for miR-143-3p. So this string, called lncND, essentially acts as a sponge for miR-143-3p, meaning it can’t get to the Notch string, meaning that neuronal precursor cells proliferate more, and primate brains get bigger.

So can you forget organic chemistry if you want to understand why we have big brains? In the above sense you can. Your understanding won’t be particularly rich, but it will be at a level where chemical explanation is powerless.

No amount of understanding of polyribonucleotide double helices will tell you why a particular choice out of the 1,099,511,627,776 possible strings of 20 will be important. Literally we have moved from physicality to the realm of pure ideas, crossing the Cartesian dichotomy in the process.

Here’s a copy of the original post with lots of chemistry in it and all the references you need to get the molecular biological chops you’ll need.

Why our brains are large: the elegance of its molecular biology

Primates have much larger brains in proportion to their body size than other mammals. Here’s why. The mechanism is incredibly elegant. Unfortunately, you must put a sizable chunk of recent molecular biology under your belt before you can comprehend it. Anyone can listen to Mozart without knowing how to read or write music. Not so here.

I doubt that anyone can start from ground zero and climb all the way up, but here is all the background you need to comprehend what follows. Start here — https://luysii.wordpress.com/2010/07/07/molecular-biology-survival-guide-for-chemists-i-dna-and-protein-coding-gene-structure/
and follow the links (there are 5 more articles).

Also you should be conversant with competitive endogenous RNA (ceRNA) — here’s a link — https://luysii.wordpress.com/2014/01/20/why-drug-discovery-is-so-hard-reason-24-is-the-3-untranslated-region-of-every-protein-a-cerna/

Also you should understand what microRNAs are — we’re still discovering all the things they do — here’s the background you need — https://luysii.wordpress.com/2015/03/22/why-drug-discovery-is-so-hard-reason-26-were-discovering-new-players-all-the-time/weith.

Still game?

Now we must delve into the embryology of the brain, something few chemists or nonbiological type scientists have dealt with.

You’ve probably heard of the term ‘water on the brain’. This refers to enlargement of the ventricular system, a series of cavities in all our brains. In the fetus, nearly all our neurons are formed from cells called neuronal precursor cells (NPCs) lining the fetal ventricle. Once formed they migrate to their final positions.

Each NPC has two choices — Choice #1 –divide into two NPCs, or Choice #2 — divide into an NPC and a daughter cell which will divide no further, but which will mature, migrate and become an adult neuron. So to get a big brain make NPCs adopt choice #1.

This is essentially a choice between proliferation and maturation. It doesn’t take many doublings of a NPC to eventually make a lot of neurons. Naturally cancer biologists are very interested in the mechanism of this choice.
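To see why a few extra rounds of choice #1 matter so much, here is a toy count. It assumes, purely for illustration, that every NPC makes the same choice in each round: first some rounds of choice #1, then some rounds of choice #2. The function names and the round counts are hypothetical.

```python
def npcs_after(rounds_of_choice_1: int, starting_npcs: int = 1) -> int:
    """Choice #1 doubles the NPC pool each round."""
    return starting_npcs * 2 ** rounds_of_choice_1


def neurons_made(npcs: int, rounds_of_choice_2: int) -> int:
    """Choice #2 keeps the pool constant; each NPC yields one neuron per round."""
    return npcs * rounds_of_choice_2


for doublings in (10, 11, 12):
    pool = npcs_after(doublings)
    print(f"{doublings} doublings -> {pool:,} NPCs -> "
          f"{neurons_made(pool, 5):,} neurons after 5 rounds of choice #2")
```

Each additional round of choice #1 doubles the final number of neurons, whatever happens afterwards.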

Well to make a long story short, there is a protein called NOTCH — vitally important in embryology and in cancer biology which, when present, causes NPCs to make choice #1. So to make a big brain keep Notch around.

Well we know that some microRNAs bind to the mRNA for NOTCH which helps speed its degradation, meaning less NOTCH protein. One such microRNA is called miR-143-3p.

We also know that the brain contains a lncRNA called lncND (ND for Neural Development). The incredible elegance is that there is a primate specific insert in lncND which contains 16 (yes 16) binding sites for miR-143-3p. So lncND acts as a sponge for miR-143-3p, meaning it can’t bind to the mRNA for NOTCH, meaning that there is more NOTCH around. Is this elegant or what? Let’s hear it for the Blind Watchmaker, assuming you have the faith to believe in such things.
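The sponge idea can be put in numbers with a toy model. Only the figure of 16 sites per lncND comes from the paper as described above. The rest is my assumption: every binding site is equally attractive to miR-143-3p, the microRNA is the limiting component, and all copy numbers are made up.

```python
SITES_PER_LNCND = 16  # binding sites in the primate-specific insert


def fraction_on_notch(notch_mrnas: int, lncnd_copies: int,
                      sites_per_notch_mrna: int = 1) -> float:
    """Share of bound miR-143-3p sitting on NOTCH mRNA rather than on the sponge.

    Toy assumption: microRNAs are spread evenly over all available sites.
    """
    notch_sites = notch_mrnas * sites_per_notch_mrna
    sponge_sites = lncnd_copies * SITES_PER_LNCND
    total_sites = notch_sites + sponge_sites
    if total_sites == 0:
        raise ValueError("no binding sites at all")
    return notch_sites / total_sites


# Hypothetical copy numbers, chosen only to show the trend.
for copies in (0, 10, 100):
    share = fraction_on_notch(notch_mrnas=160, lncnd_copies=copies)
    print(f"{copies:>3} lncND copies -> {share:.0%} of miR-143-3p on NOTCH mRNA")
```

Because each lncND carries 16 sites, a modest number of copies is enough to pull most of the microRNA away from NOTCH mRNA in this model.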

Fortunately lncND is confined to the brain, otherwise we’d all be dead of cancer.

Should you want to read about this, here’s the reference [ Neuron vol. 90 pp. 1141 – 1143, 1255 – 1262 ’16 ] where there’s a lot more.

Historically, this was one of the criticisms of the Star Wars Missile Defense: the Russians wouldn’t send over a few missiles, they’d send hundreds which would act as sponges to our defense. Whether or not attempting to put Star Wars in place led to Russia’s demise is debatable, but a society where it was a crime to own a copying machine could never compete technically to produce such a thing.

Are you as smart as the (inanimate) blind watchmaker

Here’s a problem the cell has solved. Can you? Figure out a way to send a protein to two different membranes in the cell (the membrane enclosing it { aka the plasma membrane }, and the endoplasmic reticulum) in the proportions you wish.

The proteins must have exactly the same sequence and content of amino acids, ruling out alternative splicing of exons in the mRNA (if this is Greek to you have a look at the following post — https://luysii.wordpress.com/2012/01/09/molecular-biology-survival-guide-for-chemists-v-the-ribosome/ and the others collected under — https://luysii.wordpress.com/category/molecular-biology-survival-guide/).

The following article tells you how the cell does it. Recall that not all of the messenger RNA (mRNA) is translated into protein. The ribosome latches on to the 5′ end of the mRNA,  subsequently moving toward the 3′ end until it finds the initiator codon (AUG which codes for methionine). This means that there is a 5′ untranslated region (5′ UTR). It then continues moving 3′ ward stitching amino acids together.  Similarly after the ribosome reaches the last codon (one of 3 stop codons) there is a 3′ untranslated region (3′ UTR) of the mRNA. The 3′ UTR isn’t left alone but is cleaved and a polyAdenine tail added to it. The 3′ UTR is where most microRNAs bind controlling mRNA stability (hence the amount of protein produced from a given mRNA).
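The layout just described (5′ UTR, then the coding stretch from AUG to a stop codon, then the 3′ UTR) can be sketched as a small function. The simplifications are mine: the first AUG is always taken as the initiator, the polyadenine tail is not modeled, and the example sequence is made up.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}
START_CODON = "AUG"


def split_mrna(mrna: str):
    """Return (5' UTR, coding region including the stop codon, 3' UTR)."""
    start = mrna.find(START_CODON)  # scan 5' -> 3' for the first AUG
    if start == -1:
        raise ValueError("no AUG start codon found")
    # Read codon by codon, in frame with the AUG, until a stop codon.
    for pos in range(start, len(mrna) - 2, 3):
        if mrna[pos:pos + 3] in STOP_CODONS:
            end = pos + 3
            return mrna[:start], mrna[start:end], mrna[end:]
    raise ValueError("no in-frame stop codon found")


# Hypothetical toy mRNA: GGC | AUG GCU UAA | CCCAAAA
utr5, coding, utr3 = split_mrna("GGCAUGGCUUAACCCAAAA")
print("5' UTR:", utr5)
print("coding:", coding)
print("3' UTR:", utr3)
```

Everything returned as `utr3` is never read by the ribosome, which leaves it free to carry other signals.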

The trick used by the cell is described in [ Nature vol. 522 pp. 363 – 367 ’15 ]. The 3’UTR is alternatively processed producing a variety of short and long 3’UTRs. One such protein where this happens is CD47 — which is found on the surface of most cells where it stops the cell from being eaten by scavenger cells such as macrophages. The long 3′ UTR of CD47 allows efficient cell surface expression, while the short 3′ UTR localizes it to the endoplasmic reticulum.

How could this possibly work? Once the protein is translated by the ribosome, it leaves the ribosome and the mRNA doesn’t it? Not quite.

They say that the long 3′ UTR of CD47 acts as a scaffold to recruit a protein complex, which contains HuR (aka ELAVL1), an RNA binding protein, and SET, to the site of translation. This allows interaction of SET with the newly translated cytoplasmic domains of CD47, resulting in subsequent translocation of CD47 to the plasma membrane via activated RAC1.

The short 3′ UTR of CD47 doesn’t have the sequence binding HuR and SET, so CD47 doesn’t get to the plasma membrane, rather to the endoplasmic reticulum.

The mechanism may be quite general as HuR binds to thousands of mRNAs. The paper gives two more examples of proteins where this happens.

It’s also worth noting that all this exquisite control does NOT involve covalent bond formation and breakage (i.e. not what we consider classic chemical reactions). Instead it’s the dance of one large molecular object binding to another in other ways. The classic chemist isn’t smiling. The physical chemist is.

Why drug discovery is so hard: Reason #26 — We’re discovering new players all the time

Drug discovery is so very hard because we don’t understand the way cells and organisms work very well. We know some of the actors — DNA, proteins, lipids, enzymes but new ones are being discovered all the time (even among categories known for decades such as microRNAs).

Briefly microRNAs bind to messenger RNAs usually decreasing their stability so less protein is made from them (translated) by the ribosome. It’s more complicated than that (see later), but that’s not bad for a first pass.

Presently some 2,800 human microRNAs have been annotated. Many of them are promiscuous binding more than one type of mRNA. However the following paper more than doubled their number, finding some 3,707 new ones [ Proc. Natl. Acad. Sci. vol. 112 pp. E1106 – E1115 ’15 ]. How did they do it?

Simplicity itself. They just looked at samples of ‘short’ RNA sequences from 13 different tissue types. MicroRNAs are all under 30 nucleotides long (although their precursors are not). The reason that so few microRNAs have been found in the past 20 years is that cross-species conservation has been used as a criterion to discover them. The authors abandoned the criterion. How did they know that this stuff just wasn’t transcriptional chaff? Two enzymes (DROSHA, DICER) are involved in microRNA formation from larger precursors, and inhibiting them decreased the abundance of the ‘new’ RNAs, implying that they’d been processed by the enzymes rather than just being runoff from the transcriptional machinery. Further evidence is that half were found associated with a protein called Argonaute, which applies the microRNA to the mRNA. 92% of the microRNAs were found in 10 or more samples. An incredible 23 billion reads were sequenced to find them.

If that isn’t complex enough for you, consider that we now know that microRNAs bind mRNAs everywhere, not just in the 3′ untranslated region (3′ UTR) but also in introns and exons. MicroRNAs also bind pseudogenes, SINEs, circular RNAs, nonCoding RNAs. So it’s a giant salad bowl of various RNAs binding each other affecting their stability and other functions. This may be echoes of prehistoric life before DNA arrived on the scene.

It’s early times, and the authors estimate that we have some 25,000 microRNAs in our genome — more than the number of protein genes.

As always, the Category “Molecular Biology Survival Guide” found on the left should fill in any gaps you may have.

One rather frightening thought; If, as Dawkins said, we are just large organisms designed to allow DNA to reproduce itself, is all our DNA, proteins, lipids etc, just a large chemical apparatus to allow our RNA to reproduce itself? Perhaps the primitive RNA world from which we are all supposed to have arisen, never left.

Why drug discovery is so hard: Reason #24 — Is the 3′ untranslated region of every mRNA a ceRNA?

We all know what proteins do. They act as enzymes, structural elements of cells, membrane proteins where drugs bind etc. etc. The background the pure chemist needs for what follows can all be found in the category “Molecular Biology Survival Guide”.

We also know that the messenger RNA for any given protein contains a lot more information than that needed to code for the amino acids making up the protein. Forget the introns that are spliced out from the initial transcript. When the mature messenger RNA for a given protein leaves the nucleus for the cytoplasm, where the ribosome translates it into protein, it contains nucleotides at either end which the ribosome effectively ignores. These are called the untranslated regions (UTRs). The UTRs at the 3′ end of human mRNAs range in length between 60 and 4,000 nucleotides (average 800). It costs energy to store the information for the UTR in DNA, more energy to synthesize the nucleotides which make it up, even more to patch them together to form the UTR, more to package it and move it out of the nucleus etc. etc.

Why bother? Because the 3′ UTR of the mRNA contains a lot of information which tells the cell how much protein to make, how long the mRNA should hang around in the cell (among many other things). A Greek philosopher got here first — “Nature does nothing uselessly” – Aristotle

Those familiar with competitive endogenous RNA (ceRNA) can skip what follows up to the ****

Recall that microRNAs are short (20 something) polynucleotides which bind to the 3′ untranslated region (3′ UTR) of mRNA, and either (1) inhibit its translation into protein (2) cause its degradation. In each case, less of the corresponding protein is made. The microRNA and the appropriate sequence in the 3′ UTR of the mRNA form an RNA-RNA double helix (G on one strand binding to C on the other, etc.). Visualizing such helices is duck soup for a chemist.
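For those who prefer code to helices, finding where a microRNA could pair with a 3′ UTR amounts to looking for its reverse complement. This sketch demands perfect Watson-Crick pairing along the whole length, which is stricter than what real microRNAs need. Both sequences are made up for illustration.

```python
# Watson-Crick partners in RNA: G pairs with C, A pairs with U.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Sequence (written 5'->3') that pairs with `rna` in an antiparallel duplex."""
    return "".join(PAIR[base] for base in reversed(rna))


def binding_sites(micro_rna: str, utr: str) -> list:
    """Start positions in `utr` where `micro_rna` could pair perfectly."""
    target = reverse_complement(micro_rna)
    return [
        i for i in range(len(utr) - len(target) + 1)
        if utr[i:i + len(target)] == target
    ]


# Hypothetical 8-nucleotide 'microRNA' and a hypothetical stretch of 3' UTR.
micro_rna = "UGAGGUAG"
utr = "AAACUACCUCAGGG"
print("site the microRNA looks for:", reverse_complement(micro_rna))
print("found at positions:", binding_sites(micro_rna, utr))
```

A sponge RNA in this picture is simply another sequence for which `binding_sites` returns many hits.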

Molecular biology is full of such semantic cherry bombs as nonCoding DNA (which meant DNA which didn’t code for protein), a subset of Junk DNA. Another is the pseudogene: these are genes that look like they should code for protein, except that they don’t because of lack of an initiation codon or a premature termination codon. Except for these differences, they have the nucleotide sequence to code for a known protein. It is estimated that the human genome contains as many pseudogenes (20,000) as it contains true protein coding genes [ Genome Res. vol. 12 pp. 272 – 280 ’02 ]. We now know that well over half the genome is transcribed into mRNA, including the pseudogenes.

PTEN (you don’t want to know what it stands for) is a 403 amino acid protein which is one of the most commonly mutated proteins in human cancer. Our genome also contains a pseudogene for it (called PTENP). Interestingly deletion of PTENP (not PTEN) is found in some cancers. However PTENP deletion is associated with decreased amounts of the PTEN protein itself, something you don’t want as PTEN is a tumor suppressor. How PTEN accomplishes this appears to be fairly well known, but is irrelevant here.

Why should loss of PTENP decrease PTEN itself? The reason is because the mRNA made from PTENP, even though it has a premature termination codon, and can’t be made into protein, is just as long, so it also contains the 3′UTR of PTEN. This means PTENP is sopping up microRNAs which would otherwise decrease the level of PTEN. Think of PTENP mRNA as a sponge.

Subtle isn’t it? But there’s far more. At least PTENP mRNA closely resembles the PTEN mRNA. However other mRNAs coding for completely different proteins also have binding sites in their 3′UTR for the microRNA which binds to the 3′UTR of PTEN, resulting in its destruction. So transcription of a completely different gene (the example of ZEB2 is given) can control the abundance of another protein. Essentially its mRNA is acting as a sponge, sopping up the killer microRNA.

It gets worse. Most microRNAs have binding sites on the mRNAs of many different proteins, and PTEN itself has a 3′UTR which binds to 10 different microRNAs.

So here is a completely unexpected mechanism of control of protein levels in the cell. The general term for this is competitive endogenous RNA (ceRNA). Two years ago the number of human microRNAs was thought to be around 1,000 (release 2.0 of miRBase in June ’13 gives the number at 2,555 — this is unlikely to be complete). Unlike protein coding genes, it’s far from obvious how to find them by looking at the sequence of our genome, so there may be quite a few more.

So most microRNAs bind the 3′UTR of more than one protein (the average number is unclear at this point), and most proteins have binding sites for microRNAs in their 3′UTR (again the average number is unclear). What a mess. What subtlety. What an opportunity for the regulation of cellular function. Who is going to be smart enough to figure out a drug which will change this in a way that we want. Absence of evidence of a regulatory mechanism is not evidence of its absence. A little humility is in order.

*****

If this wasn’t scary enough, consider the following cautionary tale: Nature vol. 505 pp. 212 – 217 ’14. HMGA2 is a protein we thought we understood for the most part. It is found in the nucleus, where it binds to DNA. While it doesn’t transcribe DNA into RNA, it does bind to DNA helping to form a protein complex which binds to DNA which effectively helps promote transcription of certain genes.

Well that’s what the protein does. However the mRNA for the protein uses its 3′ untranslated region (3′UTR) to sop up microRNAs of the let-7 family. The mRNA for HMGA2 is highly overexpressed in human cancer (notably the very common adenocarcinoma of the lung). You can mutate the mRNA for HMGA2 so it doesn’t produce the protein, just by putting a stop codon in it near the 5′ end. Throw the altered mRNA into a tissue culture of a lung adenocarcinoma cell line, and the cells become more proliferative and grow independently of being anchored to the tissue culture plate (i.e. anchorage independence, a biologic marker for cancer).

So what? It means that it is possible that every mRNA for every protein we make is acting as a ceRNA. The authors conclude the paper with “Such dual-function ceRNA and protein activities necessitate a deeper exploration of the coding genome in biological systems.”

I’ll say. We’re just beginning to scratch the surface. The control mechanisms within the cell continue to amaze (me) by their elegance and subtlety. I doubt highly that we know them all. Yet more reasons that drug discovery is hard — we are mucking about with a system whose workings we only dimly understand.

The most interesting paper I’ve read in the past 5 years — finale

Recall from https://luysii.wordpress.com/2013/06/13/the-most-interesting-paper-ive-read-in-the-past-5-years-introduction-and-allegro/ that if you knew the ones and zeroes coding for the instruction your computer was currently working on you’d know exactly what it would do. Similarly, it has long been thought that, if you knew the sequence of the 4 letters of the genetic code (A, T, G, C) coding for a protein, you’d know exactly what would happen. The cellular machinery (the ribosome) producing output (a protein in this case) was thought to be an automaton similar to a computer blindly carrying out instructions. Assuming the machinery is intact, the cellular environment should have nothing to do with the protein produced. Not so. In what follows, I attempt to provide an abbreviated summary of the background you need to understand what goes wrong, and how, even here, environment rears its head.

If you find the following a bit terse, have a look at the https://luysii.wordpress.com/category/molecular-biology-survival-guide/ . In particular the earliest 3 articles (Roman numerals I, II and III) should be all you need.

We’ve learned that our DNA codes for lots of stuff that isn’t protein. In fact only 2% of it codes for the amino acids comprising our 20,000 proteins. Proteins are made of sequences of 20 different amino acids. Each amino acid is coded for by a sequence of 3 genetic code letters. However there are 64 possibilities for these sequences (4 * 4 * 4). 3 possibilities tell the machinery to quit (they don’t code for an amino acid). So some amino acids have as many as 6 codons (sequences of 3 letters) for them — e.g. Leucine (L) has 6 different codons (synonymous codons) for it while Methionine (M) has but 1. The other 18 amino acids fall somewhere between.
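The codon counts above can be checked against the standard genetic code. This sketch builds the table from the usual compact 64-letter layout (bases in the order T, C, A, G, third position varying fastest, with `*` marking a stop) and counts the codons per amino acid. The variable names are mine.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, in TCAG order ('*' = stop).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# product() yields TTT, TTC, TTA, TTG, TCT, ... matching the string above.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

codons_per_residue = Counter(CODON_TABLE.values())

print("codons in total:     ", len(CODON_TABLE))
print("stop codons:         ", codons_per_residue["*"])
print("codons for leucine:  ", codons_per_residue["L"])
print("codons for methionine:", codons_per_residue["M"])
```

Subtracting the 3 stop codons leaves 61 codons to be shared among the 20 amino acids.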

The cellular machine making proteins (the ribosome) uses the transcribed genetic code (mRNA) and a (relatively small) adapter, called transfer RNA (tRNA). There are 64 different tRNAs (61 for each codon specifying an amino acid and 3 telling the machine to stop). Each tRNA contains a sequence of 3 letters (the antiCodon) which exactly pairs with the codon sequence in the mRNA, the same way the letters (bases if you’re a chemist) in the two strands of DNA pair with each other. Hanging off the opposite end of each tRNA is the amino acid the antiCodon refers to. The ribosome basically stitches two amino acids from adjacent tRNAs together and then gets rid of one tRNA.

So which particular synonymous codon is found in the mRNA shouldn’t make any difference to the final product of the ribosome. That’s what the computer model of the cell tells us.

Since most cells are making protein all the time, there is lots of tRNA around. We need so much tRNA that instead of 64 genes (one for each tRNA) we have some 500 in our genome. So we have multiple different genes coding for each tRNA. I can’t find out how many of each we have (which would be very nice to know in what follows). The amount of tRNA of each of the 64 types is roughly proportional to the number of genes coding for it (the gene copy number) according to the papers cited below.

This brings us to codon usage. You have 6 different codons (synonymous codons) for leucine. Are they all used equally (when you look at every codon in the genome which codes for leucine)? They are not. Here are the percentages for the usages of the 6 distinct leucine codons in human DNA: 7, 7, 13, 13, 20, 40. For random use they should all be around 17 (100/6). The most frequently appearing codon occurs as often as the 4 least frequently used combined (7 + 7 + 13 + 13 = 40).
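The arithmetic on those six percentages is easy to verify. Here is a small Python sketch. The numbers are the ones quoted above, and I deliberately leave them unlabeled, since the text doesn't say which percentage belongs to which leucine codon.

```python
# Usage percentages for the 6 leucine codons, as quoted above.
leucine_usage_percent = [7, 7, 13, 13, 20, 40]

# What each codon would get if all six were used equally.
uniform_share = 100 / len(leucine_usage_percent)

# Combined share of the four least-used codons vs. the single most-used one.
four_rarest = sum(sorted(leucine_usage_percent)[:4])
most_common = max(leucine_usage_percent)

print(sum(leucine_usage_percent))  # 100
print(round(uniform_share, 1))     # 16.7
print(four_rarest, most_common)    # 40 40
```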

It turns out that the most used synonymous codons are the ones with the highest number of genes for the corresponding tRNA. Makes sense, as there should be more of that synonymous tRNA around (at least in most cases). This is called codon bias, but I can’t seem to find the actual numbers.

This brings us (at last) to the actual paper [ Nature vol. 495 pp. 111 – 115 ’13 ] and the accompanying editorial (ibid. pp. 57 – 58). The paper says “codon-usage bias has been observed in almost all genomes and is thought to result from selection for efficient and accurate translation (into protein) of highly expressed genes” — 3 references given. Essentially this says that the more tRNA around matching a particular codon, the faster the mRNA will find it (le Chatelier’s principle in action).

An analogy at this point might help. When I was a kid, I hung around a print shop. In addition to high speed printing, there was also a printing press, where individual characters were selected from boxes of characters, placed on a line (this is where the font term leading comes from), and baked into place using some scary smelling stuff. This was so the same constellation of characters could be used over and over. For details see http://en.wikipedia.org/wiki/Printing_press. You can regard the 6 different tRNAs for leucine as 6 different fonts for the letter L. To make things right, the correct font must be chosen (by the printer or the ribosome). Obviously if a rare font is used, the printer will have to fumble more in the L box to come up with the right one. This is exactly le Chatelier’s principle.

The papers concern a protein (FRQ) used in the circadian clock of a fungus — evolutionarily far from us to be sure, but hang in there. Paradoxically, the FRQ gene uses a lot of ‘rare’ synonymous codons. Given the technology we have presently, the authors were able to switch the ‘rare’ synonymous codons to the most common ones. As expected, the organism made a lot more FRQ using the modified gene.
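To make the codon swap concrete, here is a toy Python sketch of 'codon optimization': every codon is replaced by the most-used synonymous codon, so the DNA changes but the amino acid sequence does not. This is emphatically not the authors' procedure. The function names, the example sequence and the usage table are all hypothetical, and attaching the leucine percentages quoted earlier to specific codons is my assumption for illustration (the paper concerns a fungus, not human DNA).

```python
from itertools import product

# Standard genetic code ('*' = stop), bases enumerated in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# HYPOTHETICAL usage scores (higher = more common); leucine codons only.
TOY_USAGE = {"CTG": 40, "CTC": 20, "CTT": 13, "TTG": 13, "TTA": 7, "CTA": 7}


def codons(cds: str) -> list[str]:
    """Split a coding sequence into complete 3-letter codons."""
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def translate(cds: str) -> str:
    """Translate a DNA coding sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[c] for c in codons(cds))


def optimize(cds: str, usage: dict[str, float]) -> str:
    """Swap each codon for a more-used synonymous codon when one is known."""
    out = []
    for codon in codons(cds):
        synonyms = [c for c, aa in CODON_TABLE.items() if aa == CODON_TABLE[codon]]
        best = max(synonyms, key=lambda c: usage.get(c, 0))
        # Keep the original codon unless the table says another is more common.
        out.append(best if usage.get(best, 0) > usage.get(codon, 0) else codon)
    return "".join(out)


original = "ATGTTACTATGA"          # Met, Leu (rare), Leu (rare), stop
optimized = optimize(original, TOY_USAGE)

print(optimized)                                    # ATGCTGCTGTGA
print(translate(original) == translate(optimized))  # True
```

The DNA differs at the two leucine codons, yet both versions translate to the same amino acids, which is what makes the result described next so surprising.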

The fascinating point (to me at least) is that the protein, with exactly the same amino acids did not fulfill its function in the circadian clock. As expected there was more of the protein around (it was easier for the ribosome machinery to make).

Now I’ve always been amazed that the proteins making us up have just a few shapes, something I’d guess happens extremely rarely. For details see https://luysii.wordpress.com/2010/10/24/the-essential-strangeness-of-the-proteins-that-make-us-up/.

Well, as we know, proteins are just a linear string of amino acids, and they have to fold to their final shape. The protein made by codon optimization must not have had the proper shape. Why? For one thing, the protein is broken down faster. For another, it is less stable after freeze-thaw cycles. For yet another, it just didn’t work correctly in the cell.

What does this mean? Most likely it means that the protein made from codon optimized mRNA has a different shape. The organism must make it more slowly so that it folds into the correct shape. Recall that the amino acid chain is extruded one amino acid at a time from the ribosome, like sausage from a sausage making machine. As it’s extruded the chain (often with help from other proteins called chaperones) flops around and finds its final shape.

Why is this so fascinating (to me at least)? Because here, in the very uterus of biologic determinism, the environment (how much of each type of synonymous tRNA is around) rears its head. Forests have been felled for papers on the heredity vs. environment question. Just as American GIs wrote “Kilroy was here” everywhere they went in WWII, here’s the environment popping up where no one thought it would.

In addition the implications for protein function, if this is a widespread phenomenon, are simply staggering.