Tag Archives: microRNA

Forgotten but not gone — take II

The RNA world from whence we sprang strikes again, this time giving us a glimpse into its own internal dynamic.  18 months ago I wrote the following post, which will give you the background to follow the latest (found at the end, after the *****).

Life is said to have originated in the RNA world.  We all know about the big 3 important RNAs for the cell: mRNA, ribosomal RNA and transfer RNA.  But just like the water, sewer, power and subway systems under Manhattan, there is another world down there in the cell which doesn’t much get talked about.  These are RNAs whose primary (and possibly only) function is to interact with other RNAs.

Start with microRNAs (of which we have at least 1,500 as of 12/12).  Their function is to bind to messenger RNA (mRNA) and inhibit translation of the mRNA into protein.  The effects aren’t huge, but they are a more subtle control of protein expression than the degree of transcription of the gene.

Then there are ceRNAs (competitive endogenous RNAs) which have a large number of binding sites for microRNAs — humans have a variety of them all with horrible acronyms — HULC, PTCSC3 etc. etc. They act as sponges for microRNAs keeping them bound and quiet.

Then there are circular RNAs.  They’d been missed until recently, because typical RNA sequencing methods isolate only RNAs with characteristic tails, and a circular RNA doesn’t have any.  One such (called CiRS7/CDR1) contains 70 binding sites for one particular microRNA (miR-7).  They are unlikely to be trivial.  They are derived from 15% of actively transcribed genes.  They ‘can be’ 10 times as numerous as linear RNAs (like mRNA and everything else), probably because they are hard to degrade < Science vol. 340 pp. 440 – 441 ’13 >. So some of them are certainly RNA sponges, but all of them?

The latest, and most interesting, class is the nonCoding RNAs found in viruses. Some of them function to attack cellular microRNAs and help the virus survive. Herpesvirus saimiri, a gamma-herpes virus, establishes latency in the T lymphocytes of New World primates by expressing 7 small nuclear uracil-rich nonCoding RNAs (called HSURs).  They associate with some microRNAs, and rather than blocking their function act as chaperones < Nature vol. 550 pp. 275 – 279 ’17 >.  The HSURs also bind to some mRNAs, inhibiting their function; they do this by helping miR-16 bind to its targets, so they are chaperones.  So viral Sm-class RNAs may function as microRNA adaptors.

Do you think for one minute that the cell isn’t doing something like this?

I have a tendency to think of RNAs as always binding to other RNAs by classic Watson Crick base pairing — this is wrong as a look at any transfer RNA structure will show. https://en.wikipedia.org/wiki/Transfer_RNA.  Far more complicated structures may be involved, but we’ve barely started to look.

Then there are the pseudogenes, which may also have a function: to be transcribed and sop up microRNAs and other things. I’ve already written about this at https://luysii.wordpress.com/2010/07/14/junk-dna-that-isnt-and-why-chemistry-isnt-enough/.  Breast cancer cells think one (PTENP1) is important enough to stop it from being transcribed, even though it can’t be translated into protein.

*****

[ Proc. Natl. Acad. Sci. vol. 116 pp. 7455 – 7464 ’19 ] The work reports a fascinating example from that early world, in which one denizen (a circular RNA called cPWWP2A) binds to another (microRNA 579, aka miR-579), acting as a sponge and sopping it up so it can’t bind to the mRNAs for angiopoietin 1, occludin and SIRT1.

So what, you say?  Well, it may lead to a way to treat diabetic retinopathy. How did they find cPWWP2A?  They used the Shanghai Biotechnology Company Mouse Circular RNA microArray, which measures circular RNAs.  They found 400 or so that were upregulated in diabetic retinopathy and another 400 or so that were downregulated.  cPWWP2A was one of the 3 top upregulated circular RNAs in diabetic retinopathy.  cPWWP2A comes from (what else?) PWWP2A, a gene coding for a protein which specifically binds the histone protein H2A.Z.

Overexpression of cPWWP2A or inhibition of miR-579 improves retinal vascular dysfunction in experimental diabetes.

So here is all this stuff going on way down there in the RNA world, first interacting with other players in this world and eventually reaching up to the level we thought we knew about and controlling gene expression.  It’s sort of like DOS (Disk Operating System) still being important in Windows.

How much more stuff like this is to be discovered controlling gene expression in us is anyone’s guess.

Why drug discovery is hard #29 — a very old player doing a very new thing

We all know what RNA does, don’t we?  It binds to other RNAs and to DNA.  Sure, lots of new forms of RNA have been found: microRNAs, competitive endogenous RNA (ceRNA), long nonCoding (for protein) RNA (lncRNA), piwiRNAs, small interfering RNAs (siRNAs) ...  The list appears endless.  But the basic mechanism of action of RNA in the cell is binding to some other polynucleotide (RNA or DNA) and affecting its function.

Not so fast.  A new paper http://science.sciencemag.org/content/358/6366/1051 describes lncRNA-ACOD1, a cellular RNA induced by a variety of viruses.  lncRNA-ACOD1 binds to an enzyme, enhancing its catalytic efficiency.  Now that’s new.  Certainly RNAs and proteins bind to each other in the ribosome, and in RNAase P, but there the proteins serve to structure the RNA so it can carry out its catalytic function, not the other way around.

The enzyme bound is called GOT2 (Glutamic Oxaloacetic Transaminase 2).  Much interesting cellular biochemistry is discussed in the paper which I’ll skip, except to say that the virus uses the hyped up GOT2 to repurpose the cell’s metabolic machinery for its own evil ends.

lncRNA-ACOD1 has 3 exons and a polyAdenine tail.  There are two transcript variants containing  2,330 and 2,259 nucleotides.  There are only 100 copies/cell.  lncRNA-ACOD1 nucleotides #165 – #390 bind to amino acids #54 – #68 of GOT2.

So what are the other 2000 or so nucleotides of lncRNA-ACOD1 doing?   The phenomenon of RNA binding to protein is quite likely to be more widespread.  Both the GOT2 interacting motif and the interacting sequence of lncRNA-ACOD1 are well conserved across species of hosts and viruses.

Although viruses co-opt lncRNA-ACOD1, it is normally expressed in the heart as is GOT2 with no viral infection at all.  So we have likely stumbled onto an entirely new method of cellular metabolic control, AND a whole new set of players and interactions for drugs to act on (if they aren’t already doing this unknown to us).

This is series member #29 of why drug development is hard, most of which concentrated on the fact that we don’t know all the players.  lncRNA-ACOD1 is different — RNA is a player we’ve known for a very long time  but it appears to be playing a game entirely new to us.

It is also good to see cutting edge research like this coming out of China.  Hopefully it will stand up, but enough questionable stuff has come from them that every Chinese paper is under a cloud.

This is why I love reading the current literature.  You never know what you’re going to find.  It’s like opening presents.

Why you do and don’t need chemistry to understand why we have big brains

You need some serious molecular biological chops to understand why primates such as ourselves have large brains. For this you need organic chemistry. Or do you? Yes and no. Yes to understand how the players are built and how they interact. No because it can be explained without any chemistry at all. In fact, the mechanism is even clearer that way.

It’s an exercise in pure logic. David Hilbert, one of the major mathematicians at the dawn of the 20th century famously said about geometry — “One must be able to say at all times–instead of points, straight lines, and planes–tables, chairs, and beer mugs”. The relationships between the objects of geometry were far more crucial to him than the objects themselves. We’ll take the same tack here.

So instead of the nucleotides Uridine (U), Adenine (A), Guanine (G), Cytosine (C), we’re going to talk about lock and key and hook and eye.

We’re going to talk about long chains of these four items. The order is crucial. Two long chains of them can pair up only if there are segments on each where the locks on one pair with the keys on the other and the hooks with the eyes. How many possible combinations of the four are there on a chain of 20? Just 4^20 or 2^40 = 1,099,511,627,776. So to get two randomly chosen chains to pair up exactly is pretty unlikely, unless in some way you or the blind Watchmaker chose them to do so.
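The count is easy to check for yourself. Here is a minimal sketch in Python, assuming (my assumptions, for illustration only) that each of the four items is equally likely at every position and that pairing "exactly" means hitting the one specific partner chain out of all possible chains of 20. The function names are made up for this example.

```python
# Count the distinct chains of length 20 built from 4 items
# (lock, key, hook, eye), and the chance that one randomly chosen
# chain is the exact partner of a given chain.
ITEMS = 4
LENGTH = 20


def chain_count(items=ITEMS, length=LENGTH):
    """Number of distinct ordered chains of the given length."""
    return items ** length


def exact_partner_probability(items=ITEMS, length=LENGTH):
    """Chance a random chain is the single exact partner (uniform choice assumed)."""
    return 1 / chain_count(items, length)


if __name__ == "__main__":
    print(f"{chain_count():,} possible chains of {LENGTH}")
    print(f"chance of a random exact partner: {exact_partner_probability():.2e}")
```

The point of writing it this way is that the number depends only on the count of items and the length of the chain, not on what the items are made of.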

Now you need a Turing machine to take a long string of these 4 items and turn it into a protein. In the case of the crucial Notch protein the string of locks, keys, hooks and eyes contains at least 5,000 of them, and their order is important, just as the order of letters in a word is crucial for its meaning (consider united and untied).

The cell has tons of such Turing machines (called ribosomes) and lots of copies of strings coding for Notch (called Notch mRNAs).

The more Notch protein around in the developing brain, the more the proliferating precursors to neurons proliferate before differentiating into neurons, resulting in a bigger brain.

The Notch string doesn’t all code for protein. At one end is a stretch of locks, keys, hooks and eyes which binds other strings; when these are bound, the Notch string is degraded, meaning less Notch and a smaller brain. The other strings are about 20 long and are called microRNAs.

So to get more Notch and a bigger brain, you need to decrease the number of microRNAs specifically binding to the Notch string. One particular microRNA (called miR-143-3p) has it in for the Notch string. So how did primates get rid of miR-143-3p? They have an insert (unique to them) in another string which contains 16 binding sites for miR-143-3p. So this string, called lncND, essentially acts as a sponge for miR-143-3p, meaning it can’t get to the Notch string, meaning that neuronal precursor cells proliferate more, and primate brains get bigger.

So can you forget organic chemistry if you want to understand why we have big brains? In the above sense you can. Your understanding won’t be particularly rich, but it will be at a level where chemical explanation is powerless.

No amount of understanding of polyribonucleotide double helices will tell you why a particular choice out of the 1,099,511,627,776 possible strings of 20 will be important. Literally we have moved from physicality to the realm of pure ideas, crossing the Cartesian dichotomy in the process.

Here’s a copy of the original post with lots of chemistry in it and all the references you need to get the molecular biological chops you’ll need.

Why our brains are large: the elegance of its molecular biology

Primates have much larger brains in proportion to their body size than other mammals. Here’s why. The mechanism is incredibly elegant. Unfortunately, you must put a sizable chunk of recent molecular biology under your belt before you can comprehend it. Anyone can listen to Mozart without knowing how to read or write music. Not so here.

I doubt that anyone can start from ground zero and climb all the way up, but here is all the background you need to comprehend what follows. Start here — https://luysii.wordpress.com/2010/07/07/molecular-biology-survival-guide-for-chemists-i-dna-and-protein-coding-gene-structure/
and follow the links (there are 5 more articles).

Also you should be conversant with competitive endogenous RNA (ceRNA) — here’s a link — https://luysii.wordpress.com/2014/01/20/why-drug-discovery-is-so-hard-reason-24-is-the-3-untranslated-region-of-every-protein-a-cerna/

Also you should understand what microRNAs are — we’re still discovering all the things they do — here’s the background you need — https://luysii.wordpress.com/2015/03/22/why-drug-discovery-is-so-hard-reason-26-were-discovering-new-players-all-the-time/weith.

Still game?

Now we must delve into the embryology of the brain, something few chemists or nonbiological type scientists have dealt with.

You’ve probably heard of the term ‘water on the brain’. This refers to enlargement of the ventricular system, a series of cavities in all our brains. In the fetus, nearly all our neurons are formed from cells called neuronal precursor cells (NPCs) lining the fetal ventricle. Once formed they migrate to their final positions.

Each NPC has two choices — Choice #1 –divide into two NPCs, or Choice #2 — divide into an NPC and a daughter cell which will divide no further, but which will mature, migrate and become an adult neuron. So to get a big brain make NPCs adopt choice #1.

This is essentially a choice between proliferation and maturation. It doesn’t take many doublings of an NPC to eventually make a lot of neurons. Naturally cancer biologists are very interested in the mechanism of this choice.
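To see how fast the doublings add up, here is a toy calculation in Python. It is only a sketch under simplifying assumptions of my own: every NPC makes choice #1 for some number of rounds, then every NPC makes choice #2 for some further rounds, and no cells die. The function name and the round counts are hypothetical, not figures from the paper.

```python
def neurons_made(proliferative_rounds, neurogenic_rounds, start_npcs=1):
    """Toy count of neurons produced from a founding pool of NPCs.

    Choice #1 (NPC -> 2 NPCs) doubles the pool each round.
    Choice #2 (NPC -> NPC + neuron) leaves the pool unchanged
    and yields one neuron per NPC per round.
    """
    npcs = start_npcs
    for _ in range(proliferative_rounds):
        npcs *= 2                      # every NPC makes choice #1
    neurons = 0
    for _ in range(neurogenic_rounds):
        neurons += npcs                # every NPC makes choice #2
    return neurons


if __name__ == "__main__":
    for rounds in (10, 11, 12):
        print(rounds, "rounds of choice #1 ->", neurons_made(rounds, 5), "neurons")
```

In this toy model each extra round of choice #1 doubles the final neuron count, which is why a modest shift toward proliferation goes a long way.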

Well, to make a long story short, there is a protein called NOTCH, vitally important in embryology and in cancer biology, which, when present, causes NPCs to make choice #1. So to make a big brain, keep NOTCH around.

Well we know that some microRNAs bind to the mRNA for NOTCH which helps speed its degradation, meaning less NOTCH protein. One such microRNA is called miR-143-3p.

We also know that the brain contains a lncRNA called lncND (ND for Neural Development). The incredible elegance is that there is a primate specific insert in lncND which contains 16 (yes 16) binding sites for miR-143-3p. So lncND acts as a sponge for miR-143-3p, meaning it can’t bind to the mRNA for NOTCH, meaning that there is more NOTCH around. Is this elegant or what? Let’s hear it for the Blind Watchmaker, assuming you have the faith to believe in such things.

Fortunately lncND is confined to the brain, otherwise we’d all be dead of cancer.

Should you want to read about this, here’s the reference [ Neuron vol. 90 pp. 1141 – 1143, 1255 – 1262 ’16 ] where there’s a lot more.

Historically, this was one of the criticisms of the Star Wars Missile Defense: the Russians wouldn’t send over a few missiles, they’d send hundreds which would act as sponges to our defense. Whether or not attempting to put Star Wars in place led to Russia’s demise is debatable, but a society where it was a crime to own a copying machine could never compete technically to produce such a thing.

It ain’t the bricks it’s the plan — take II

A recent review in Neuron (vol. 88 pp. 861 – 877 ’15) gives a possible new explanation of how our brains came to be so different from apes (if not our behavior of late).

You’ve all heard that our proteins are only 2% different than the chimp, so we are 98% chimpanzee. The facts are correct, the interpretation wrong. We are far more than the protein ‘bricks’ that make us up, and two current papers in Cell [ vol. 163 pp. 24 – 26, 66 – 83 ’15 ] essentially prove this.

This is like saying Monticello and Independence Hall are just the same because they’re both made out of bricks. One could chemically identify Monticello bricks as coming from the Virginia piedmont, and Independence Hall bricks coming from the red clay of New Jersey, but the real difference between the buildings is the plan.

It’s not the proteins, but where and when and how much of them are made. The control for this (plan if you will) lies outside the genes for the proteins themselves, in the rest of the genome (remember only 2% of the genome codes for the amino acids making up our 20,000 or so protein genes). The control elements have as much right to be called genes, as the parts of the genome coding for amino acids. Granted, it’s easier to study genes coding for proteins, because we’ve identified them and know so much about them. It’s like the drunk looking for his keys under the lamppost because that’s where the light is.

All the molecular biology you need to understand what follows is in the following post — https://luysii.wordpress.com/2010/07/07/molecular-biology-survival-guide-for-chemists-i-dna-and-protein-coding-gene-structure.

The Neuron paper is detailed and fascinating to a neurologist, but toward the end it begins to fry far bigger fish.

Until about 10 years ago, molecular biology was incredibly protein-centric. Consider the following terms — nonsense codon, noncoding DNA, junk DNA. All are pejorative and arose from the view that all the genome does is code for protein. Nonsense codon means one of the 3 termination codons, which tells the ribosome to stop making protein. Noncoding DNA means not coding for protein (with the implication that DNA not coding for protein isn’t coding for anything).

Well all that has changed. The ENCODE Consortium showed that well over half (and probably all) our DNA is transcribed into RNA — for details see https://en.wikipedia.org/wiki/ENCODE. This takes energy, and it is doubtful (to me at least) that organisms would waste this much energy if the products were not doing something useful.

I’ve discussed microRNAs elsewhere — for details please see — https://luysii.wordpress.com/2010/07/14/junk-dna-that-isnt-and-why-chemistry-isnt-enough/. They don’t code for protein either, but control how much of a given protein is made.

The Neuron paper concerns lncRNAs (long nonCoding RNAs). They don’t code for protein either and contain over 200 nucleotides. There are a lot of them (10,000 – 50,000 are known to be expressed in man). Amazingly, 40% of them are expressed in the brain, and not just in adult life, but during embryonic development. Expression of some of them is restricted to specific brain areas. It is easier for an embryologist to tell what type a cell is during brain cortical development by looking at the lncRNAs expressed than by the proteins a given cell is making. The paper contains multiple examples of the lncRNAs controlling when and where a protein is made in the brain.

lncRNAs can contain multiple domains, each of which has a different affinity for a particular RNA (such as the mRNA for a protein), or DNA, or protein. In the nucleus they influence the DNA binding sites of transcription factors, RNA polymerase II and the polycomb repressor complex. The review goes on with many specific examples of lncRNA function, such as synaptic plasticity and neurite extension.

Getting back to proteins, the vast majority are nearly the same in all mammals (this is where the 2% Chimpanzee argument comes from). Here is where it gets interesting. Roughly 1/3 of lncRNAs found in man are primate specific. This includes hundreds of lncRNAs found only in man. The paper gives evidence that hundreds of them have undergone positive selection in humans.

So the paper provides yet another mechanism (with far more detail than I’ve been able to provide here) for why our brains are so much larger than, and different in many ways from, that of our nearest evolutionary relative, the chimpanzee. This is the largest molecular biological difference found so far for the human brain as opposed to every other brain. Fascinating stuff. Stay tuned. I think this is a watershed paper.

Of what use is an inactive enzyme?

Why should a cell take the trouble to make an enzyme protein with no enzymatic activity? It takes metabolic energy to store the information for a protein in DNA, transcribe the DNA into RNA and then translate the RNA into protein. Is this junk protein a la junk DNA? Not at all, and therein lies a tale.

All sorts of nasty bugs inveigle their way into cells, among them viruses (such as influenza) whose genome is made of RNA rather than DNA. Not only that, but in many viruses the genome is not single stranded (like mRNA) but double stranded, with two RNA strands base paired to each other (just like DNA, except for an extra oxygen on the ribose sugars in the backbone).

Nucleated cells don’t contain much double stranded RNA (dsRNA) outside the nucleus, so it almost always means trouble. An extremely elegant mechanism exists to find and respond to such RNA. Recall that double helix molecules can reach enormous lengths. The 3.2 billion base pairs of our genome, if stretched out, would be more than a yard.
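The "more than a yard" figure can be checked with a few lines of arithmetic. This sketch assumes the textbook rise of about 0.34 nanometers per base pair for B-form DNA (a number not given in this post) and one copy of the 3.2 billion base pairs laid end to end.

```python
# Back-of-the-envelope length of the genome stretched out end to end.
BASE_PAIRS = 3.2e9          # figure quoted in the text
RISE_NM_PER_BP = 0.34       # assumed: textbook rise per base pair, B-form DNA
METERS_PER_YARD = 0.9144    # exact by definition


def stretched_length_m(base_pairs=BASE_PAIRS, rise_nm=RISE_NM_PER_BP):
    """Length in meters of a double helix with the given number of base pairs."""
    return base_pairs * rise_nm * 1e-9


if __name__ == "__main__":
    meters = stretched_length_m()
    print(f"{meters:.2f} m, or {meters / METERS_PER_YARD:.2f} yards")
```

Under that assumption the length comes out a bit over one meter, which is indeed more than a yard.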

Well, we have at least 4 genes whose protein products bind dsRNA and then signal trouble. They all make a molecule called 2′ – 5′ oligoadenylic acid (2-5A) from ATP, so they are called OligoAdenylate Synthetases (OASs). The 2-5A, once made, wanders about the cell until it finds another enzyme called RNAase L. 2-5A binds to RNAase L, causing it to dimerize and become active. RNAase L then destroys all the RNA in the cell, killing it along with the invading virus. Pretty harsh, but it’s one way to stop the virus from spreading and killing more cells.

A recent paper http://www.pnas.org/content/112/13/3949.full concerns OAS3, which has 3 catalytic modules rather than just one like most enzymes. Even worse, 2 of the 3 catalytic modules can’t make 2-5A (but they still can bind dsRNA). OAS3 is a large protein (over 1,000 amino acids), so it has some length to it. The 3 catalytic modules are spread out along OAS3 with the active catalytic module at one end and one of the inactive modules at the other.

The modules at both ends bind dsRNA, but only the active module makes 2-5A when it does. Interestingly, the inactive module binds dsRNA much more strongly than the active one.

OK, you’ve got the picture — what possible use is this rather Byzantine set up?

See if you can figure it out.

It’s incredibly clever and elegant, and shows the danger of regarding anything within the cell as functionless (or junk). Teleology rides supreme in molecular and cellular biology.

Give up?

OAS3 essentially acts as a molecular ruler, making 2-5A only when long dsRNA (e.g. over 50 nucleotides long) binds to it. The inactive module gloms onto longish dsRNA, holding it tightly until Brownian motion brings it to the other end of OAS3, activating the catalytic module to make 2-5A. This is good, as the cell normally contains all sorts of shorter RNA duplexes (the binding of microRNAs to the 3′ end of mRNAs comes to mind, but these are much shorter: 22 nucleotides at most).

Why drug discovery is so hard: Reason #24 — Is the 3′ untranslated region of every mRNA a ceRNA?

We all know what proteins do. They act as enzymes, structural elements of cells, membrane proteins where drugs bind etc. etc. The background the pure chemist needs for what follows can all be found in the category “Molecular Biology Survival Guide”.

We also know that the messenger RNA for any given protein contains a lot more information than that needed to code for the amino acids making up the protein. Forget the introns that are spliced out from the initial transcript. When the mature messenger RNA for a given protein leaves the nucleus for the cytoplasm, where the ribosome translates it into protein, at either end it contains nucleotides which the ribosome effectively ignores. These are called the untranslated regions (UTRs). The UTRs at the 3′ end of human mRNAs range in length between 60 and 4,000 nucleotides (average 800). It costs energy to store the information for the UTR in DNA, more energy to synthesize the nucleotides which make it up, even more to patch them together to form the UTR, more to package it and move it out of the nucleus etc. etc.

Why bother? Because the 3′ UTR of the mRNA contains a lot of information which tells the cell how much protein to make, how long the mRNA should hang around in the cell (among many other things). A Greek philosopher got here first — “Nature does nothing uselessly” – Aristotle

Those familiar with competitive endogenous RNA (ceRNA) can skip what follows up to the ****

Recall that microRNAs are short (20 something) polynucleotides which bind to the 3′ untranslated region (3′ UTR) of mRNA, and either (1) inhibit its translation into protein or (2) cause its degradation. In each case, less of the corresponding protein is made. The microRNA and the appropriate sequence in the 3′ UTR of the mRNA form an RNA-RNA double helix (G on one strand binding to C on the other, etc.). Visualizing such helices is duck soup for a chemist.
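For readers who think better in code, the pairing rule can be sketched in a few lines of Python. This is only an illustration: the two sequences are invented, and the script uses a common simplification (my assumption, not something stated in this post) that a binding site is a stretch of the 3′ UTR that is the reverse complement of nucleotides 2 through 8 of the microRNA, with G pairing to C and A to U and no wobble pairs.

```python
# Find candidate microRNA binding sites in a 3' UTR by Watson-Crick pairing.
COMPLEMENT = str.maketrans("AUGC", "UACG")


def reverse_complement(rna):
    """Sequence that would base pair with `rna`, read 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


def seed_sites(mirna, utr):
    """0-based positions in `utr` that pair with microRNA nucleotides 2-8."""
    seed = mirna[1:8]                      # nucleotides 2-8 (1-based)
    target = reverse_complement(seed)
    positions = []
    i = utr.find(target)
    while i != -1:
        positions.append(i)
        i = utr.find(target, i + 1)        # allow overlapping hits
    return positions


if __name__ == "__main__":
    # Both sequences are hypothetical, made up for this example.
    mirna = "AUGCCGAUUAGCGGAUACCGUA"
    utr = "GGAAUCGGCACCUUAUCGGCAAA"
    print("seed target:", reverse_complement(mirna[1:8]))
    print("sites at:", seed_sites(mirna, utr))
```

A UTR with many such sites for the same microRNA is what makes a good sponge, which is where the rest of this post is heading.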

Molecular biology is full of such semantic cherry bombs as nonCoding DNA (which meant DNA which didn’t code for protein), a subset of Junk DNA. Another is the pseudogene: these are genes that look like they should code for protein, except that they don’t because of lack of an initiation codon or a premature termination codon. Except for these differences, they have the nucleotide sequence to code for a known protein. It is estimated that the human genome contains as many pseudogenes (20,000) as it contains true protein coding genes [ Genome Res. vol. 12 pp. 272 – 280 ’02 ]. We now know that well over half the genome is transcribed into RNA, including the pseudogenes.

PTEN (you don’t want to know what it stands for) is a 403 amino acid protein which is one of the most commonly mutated proteins in human cancer. Our genome also contains a pseudogene for it (called PTENP). Interestingly deletion of PTENP (not PTEN) is found in some cancers. However PTENP deletion is associated with decreased amounts of the PTEN protein itself, something you don’t want as PTEN is a tumor suppressor. How PTEN accomplishes this appears to be fairly well known, but is irrelevant here.

Why should loss of PTENP decrease PTEN itself? The reason is that the mRNA made from PTENP, even though it has a premature termination codon and can’t be made into protein, is just as long, so it also contains the 3′UTR of PTEN. This means PTENP is sopping up microRNAs which would otherwise decrease the level of PTEN. Think of PTENP mRNA as a sponge.

Subtle, isn’t it? But there’s far more. At least PTENP mRNA closely resembles the PTEN mRNA. However, other mRNAs coding for completely different proteins also have binding sites in their 3′UTR for the microRNA which binds to the 3′UTR of PTEN, resulting in its destruction. So transcription of a completely different gene (the example of ZEB2 is given) can control the abundance of another protein. Essentially its mRNA is acting as a sponge, sopping up the killer microRNA.

It gets worse. Most microRNAs have binding sites on the mRNAs of many different proteins, and PTEN itself has a 3′UTR which binds to 10 different microRNAs.

So here is a completely unexpected mechanism of control of protein levels in the cell. The general term for this is competitive endogenous RNA (ceRNA). Two years ago the number of human microRNAs was thought to be around 1,000 (release 20 of miRBase in June ’13 gives the number at 2,555, and this is unlikely to be complete). Unlike protein coding genes, it’s far from obvious how to find them by looking at the sequence of our genome, so there may be quite a few more.

So most microRNAs bind the 3′UTR of more than one protein (the average number is unclear at this point), and most proteins have binding sites for microRNAs in their 3′UTR (again the average number is unclear). What a mess. What subtlety. What an opportunity for the regulation of cellular function. Who is going to be smart enough to figure out a drug which will change this in a way that we want. Absence of evidence of a regulatory mechanism is not evidence of its absence. A little humility is in order.
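The sponge idea can be put into a toy model. This Python sketch is an illustration only, under assumptions of my own: every binding site has the same affinity, bound microRNAs spread over target and sponge sites in proportion to how many of each there are, and a microRNA binds whenever a free site exists. The function name and all the numbers are hypothetical.

```python
def mirnas_on_target(mirna_copies, target_sites, sponge_sites):
    """Toy estimate of how many microRNAs end up bound to the target mRNA.

    Assumes equal affinity at every site, so bound microRNAs are shared
    between target and sponge in proportion to their site counts.
    """
    total_sites = target_sites + sponge_sites
    if total_sites == 0:
        return 0.0
    bound = min(mirna_copies, total_sites)   # cannot bind more than there are sites
    return bound * target_sites / total_sites


if __name__ == "__main__":
    # Hypothetical numbers: 100 microRNAs, 50 sites on the target mRNAs.
    for sponge in (0, 50, 150):
        hits = mirnas_on_target(100, 50, sponge)
        print(f"sponge sites = {sponge:3d} -> microRNAs on target = {hits:.0f}")
```

Even this crude version shows the point of the post: changing the abundance of a completely different transcript changes how hard the microRNA hits the target, without touching the target's own gene.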

*****

If this wasn’t scary enough, consider the following cautionary tale: Nature vol. 505 pp. 212 – 217 ’14. HMGA2 is a protein we thought we understood for the most part. It is found in the nucleus, where it binds to DNA. While it doesn’t transcribe DNA into RNA, it does bind to DNA, helping to form a protein complex which effectively helps promote transcription of certain genes.

Well, that’s what the protein does. However, the mRNA for the protein uses its 3′ untranslated region (3′UTR) to sop up microRNAs of the let-7 family. The mRNA for HMGA2 is highly overexpressed in human cancer (notably the very common adenocarcinoma of the lung). You can mutate the mRNA for HMGA2 so it doesn’t produce the protein, just by putting a stop codon in it near the 5′ end. Throw the altered mRNA into a tissue culture of a lung adenocarcinoma cell line, and the cells become more proliferative and grow independently of being anchored to the tissue culture plate (i.e. anchorage independence, a biologic marker for cancer).

So what? It means that it is possible that every mRNA for every protein we make is acting as a ceRNA. The authors conclude the paper with “Such dual-function ceRNA and protein activities necessitate a deeper exploration of the coding genome in biological systems.”

I’ll say. We’re just beginning to scratch the surface. The control mechanisms within the cell continue to amaze (me) by their elegance and subtlety. I doubt highly that we know them all. Yet more reasons that drug discovery is hard — we are mucking about with a system whose workings we only dimly understand.