Tag Archives: competitive endogenous RNA

Forgotten but not gone — take II

The RNA world from whence we sprang strikes again, this time giving us a glimpse into its own internal dynamic.  18 months ago I wrote the following post, which will give you the background to follow the latest (found at the end, after the *****).

Life is said to have originated in the RNA world.  We all know about the big 3 important RNAs for the cell: mRNA, ribosomal RNA and transfer RNA.  But just like the water, sewer, power and subway systems under Manhattan, there is another world down there in the cell which doesn’t much get talked about.  These are RNAs whose primary (and possibly only) function is to interact with other RNAs.

Start with microRNAs (of which we have at least 1,500 as of 12/12).  Their function is to bind to messenger RNA (mRNA) and inhibit translation of the mRNA into protein.  The effects aren’t huge, but they are a more subtle control of protein expression, than the degree of transcription of the gene.

Then there are ceRNAs (competitive endogenous RNAs) which have a large number of binding sites for microRNAs — humans have a variety of them all with horrible acronyms — HULC, PTCSC3 etc. etc. They act as sponges for microRNAs keeping them bound and quiet.

Then there are circular RNAs.  They’d been missed until recently, because typical RNA sequencing methods isolate only RNAs with characteristic tails, and a circular RNA doesn’t have any.  One such (called CiRS7/CDR1) contains 70 binding sites for one particular microRNA (miR-7).  They are unlikely to be trivial.  They are derived from 15% of actively transcribed genes.  They ‘can be’ 10 times as numerous as linear RNAs (like mRNA and everything else), probably because they are hard to degrade < Science vol. 340 pp. 440 – 441 ’13 >. So some of them are certainly RNA sponges, but all of them?

The latest, and most interesting, class is the nonCoding RNAs found in viruses. Some of them function to attack cellular microRNAs and help the virus survive. Herpesvirus saimiri, a gamma-herpes virus, establishes latency in the T lymphocytes of New World primates by expressing 7 small nuclear uracil-rich nonCoding RNAs (called HSURs).  They associate with some microRNAs, and rather than blocking their function act as chaperones < Nature vol. 550 pp. 275 – 279 ’17 >.  The HSURs also bind to some mRNAs, inhibiting their function. They do this by helping miR-16 bind to its targets, so they are chaperones.  So viral Sm-class RNAs may function as microRNA adaptors.

Do you think for one minute that the cell isn’t doing something like this?

I have a tendency to think of RNAs as always binding to other RNAs by classic Watson Crick base pairing — this is wrong as a look at any transfer RNA structure will show. https://en.wikipedia.org/wiki/Transfer_RNA.  Far more complicated structures may be involved, but we’ve barely started to look.

Then there are the pseudogenes, which may also have a function, which is to be transcribed and sop up microRNAs and other things. I’ve already written about this: https://luysii.wordpress.com/2010/07/14/junk-dna-that-isnt-and-why-chemistry-isnt-enough/.  Breast cancer cells think one (PTENP1) is important enough to stop it from being transcribed, even though it can’t be translated into protein.

*****

[ Proc. Natl. Acad. Sci. vol. 116 pp. 7455 – 7464 ’19 ] The work reports a fascinating example from that early world, in which one denizen (a circular RNA called cPWWP2A) binds to another denizen of that world (microRNA 579, aka miR-579), acting as a sponge and sopping it up so it can’t bind to the mRNAs for angiopoietin 1, occludin and SIRT1.

So what, you say?  Well, it may lead to a way to treat diabetic retinopathy. How did they find cPWWP2A?  They used the Shanghai Biotechnology Company Mouse Circular RNA microArray, which measures circular RNAs.  They found 400 or so that were upregulated in diabetic retinopathy and another 400 or so that were downregulated.  cPWWP2A was one of the 3 top upregulated circular RNAs in diabetic retinopathy.  cPWWP2A comes from (what else?) PWWP2A, a gene coding for a protein which specifically binds the histone protein H2A.Z.

Overexpression of cPWWP2A or inhibition of miR-579 improves retinal vascular dysfunction in experimental diabetes.

So here is all this stuff going on way down there in the RNA world, first interacting with other players in this world and eventually reaching up to the level we thought we knew about and controlling gene expression.  It’s sort of like DOS (Disk Operating System) still being important in Windows.

How much more stuff like this is to be discovered controlling gene expression in us is anyone’s guess.

Forgotten but not gone

Life is said to have originated in the RNA world.  We all know about the big 3 important RNAs for the cell, mRNA, ribosomal RNA and transfer RNA.  But just like the water, sewer, power and subway systems under Manhattan, there is another world down there in the cell which doesn’t much get talked about.  These are RNAs, whose primary (and possibly only) function is to interact with other RNAs.

Start with microRNAs (of which we have at least 1,500 as of 12/12).  Their function is to bind to messenger RNA (mRNA) and inhibit translation of the mRNA into protein.  The effects aren’t huge, but they are a more subtle control of protein expression, than the degree of transcription of the gene.

Then there are ceRNAs (competitive endogenous RNAs) which have a large number of binding sites for microRNAs — humans have a variety of them all with horrible acronyms — HULC, PTCSC3 etc. etc. They act as sponges for microRNAs keeping them bound and quiet.

Then there are circular RNAs.  They’d been missed until recently, because typical RNA sequencing methods isolate only RNAs with characteristic tails, and a circular RNA doesn’t have any.  One such (called CiRS7/CDR1) contains 70 binding sites for one particular microRNA (miR-7).  They are unlikely to be trivial.  They are derived from 15% of actively transcribed genes.  They ‘can be’ 10 times as numerous as linear RNAs (like mRNA and everything else), probably because they are hard to degrade < Science vol. 340 pp. 440 – 441 ’13 >. So some of them are certainly RNA sponges, but all of them?

The latest, and most interesting, class is the nonCoding RNAs found in viruses. Some of them function to attack cellular microRNAs and help the virus survive. Herpesvirus saimiri, a gamma-herpes virus, establishes latency in the T lymphocytes of New World primates by expressing 7 small nuclear uracil-rich nonCoding RNAs (called HSURs).  They associate with some microRNAs, and rather than blocking their function act as chaperones < Nature vol. 550 pp. 275 – 279 ’17 >.  The HSURs also bind to some mRNAs, inhibiting their function. They do this by helping miR-16 bind to its targets, so they are chaperones.  So viral Sm-class RNAs may function as microRNA adaptors.

Do you think for one minute that the cell isn’t doing something like this?

I have a tendency to think of RNAs as always binding to other RNAs by classic Watson Crick base pairing — this is wrong as a look at any transfer RNA structure will show. https://en.wikipedia.org/wiki/Transfer_RNA.  Far more complicated structures may be involved, but we’ve barely started to look.

Then there are the pseudogenes, which may also have a function, which is to be transcribed and sop up microRNAs and other things. I’ve already written about this: https://luysii.wordpress.com/2010/07/14/junk-dna-that-isnt-and-why-chemistry-isnt-enough/.  Breast cancer cells think one (PTENP1) is important enough to stop it from being transcribed, even though it can’t be translated into protein.

SmORFs and DWORFs — has molecular biology lost its mind?

There’s Plenty of Room at The Bottom is a famous talk given by Richard Feynman 56 years ago. He was talking about something not invented until decades later — nanotechnology. He didn’t know that the same advice now applies to molecular biology. The talk itself is well worth reading — here’s the link http://www.zyvex.com/nanotech/feynman.html.

Those not up to speed on molecular biology can find what they need at — https://luysii.wordpress.com/2010/07/07/molecular-biology-survival-guide-for-chemists-i-dna-and-protein-coding-gene-structure/. Just follow the links (there are only 5) in the series.

lncRNA stands for long nonCoding RNA — nonCoding for protein that is. Long is taken to mean over 200 nucleotides. There is considerable debate concerning how many there are — but “most estimates place the number in the tens of thousands” [ Cell vol. 164 p. 69 ’16 ]. Whether they have any cellular function is also under debate. Could they be like the turnings from a lathe, produced by the various RNA polymerases we have (3 actually) simply transcribing the genome compulsively? I doubt this, because transcription takes energy and cells are a lot of things but wasteful isn’t one of them.

Where does Feynman come in? Because at least one lncRNA codes for a very small protein using a Small Open Reading Frame (SmORF) to do so. The protein in question is called DWORF (for DWarf Open Reading Frame). It contains only 34 amino acids. Its function is definitely not trivial. It binds to something called SERCA, which is a large enzyme in the sarcoplasmic reticulum of muscle which allows muscle to relax after contracting. Muscle contraction occurs when calcium is released from the sarcoplasmic reticulum (the endoplasmic reticulum of muscle).  SERCA takes the released calcium back into the sarcoplasmic reticulum, allowing muscle to relax and then contract again. So repetitive muscle contraction depends on the flow and ebb of calcium tides in the cell. Amazingly there are 3 other small proteins which also bind to SERCA, modifying its function. Their names are phospholamban (no kidding), sarcolipin and myoregulin, also small proteins of 52, 31 and 46 amino acids.

So here is a lncRNA making an oxymoron of its name by actually coding for a protein. So DWORF is small, but so are its 3 exons, one of which is only 4 amino acids long. Imagine the gigantic spliceosome which has a mass over 1,300,000 Daltons, 10,574 amino acids making up 37 proteins, along with several catalytic RNAs, being that precise and operating on something that small.

So there’s a whole other world down there which we’ve just begun to investigate. It’s probably a vestige of the RNA world from which life is thought to have sprung.

Then there are the small molecules of intermediary metabolism. Undoubtedly some of them are used for control as well as metabolism. I’ll discuss this later, but the Human Metabolome DataBase (HMDB) has 42,000 entries and METLIN, a metabolic database has 240,000 entries.

Then there is competitive endogenous RNA: https://luysii.wordpress.com/2012/01/29/why-drug-discovery-is-so-hard-reason-20-competitive-endogenous-rnas/

Do you need chemistry to understand this? Yes and no. How the molecules do what they do is the province of chemistry. The description of their function doesn’t require chemistry at all. As David Hilbert said about axiomatizing geometry, you don’t need points, straight lines and planes. You could use tables, chairs and beer mugs. What is important are the relations between them. Ditto for the chemical entities making us up.

I wouldn’t like that.  It’s neat to picture in my mind our various molecular machines, nuts and bolts doing what they do.  It’s a much richer experience.  Not having the background is being chemically blind.  Not a good thing, but better than nothing.

The old year goes out with a bang

A huge amount of cellular genomics will have to be redone if the following paper is replicated. Remember “Extraordinary claims require extraordinary evidence.” Carl Sagan.

What’s all the shouting about? Normally when you think about messenger RNA (mRNA) as it exists in the cytoplasm after the initial transcript is significantly massaged in the nucleus, you think about the part that codes for amino acids. This ‘coding region’ is the part that is translated into amino acids by the ribosome. But mRNA is invariably larger having nucleotides at each end (3′ and 5′) which have other uses. These are called the 3′ Untranslated Region (3′ UTR) and 5′ Untranslated Region (5′ UTR).

So if you do single cell RNA sequencing (which we can do now) it shouldn’t matter what nucleotide sequence you search for (5′ UTR, 3′ UTR or the coding region) as all mRNA contains one of each.

Not so says this paper [ Neuron vol. 88 pp. 1149 – 1156 ’15 ].

Given the mRNA for a given protein in a single cell, using a probe for the 3’UTR and a probe for the coding sequence should give you the same abundance for both. That’s not what they found at all for single neurons from the brain. In some cases there was much more RNA coding for the 3’UTR than for the coding segment of a given mRNA for a protein. In others there was much less. Even more impressive is that the 3’UTR/(3’UTR + coding) ratio for a given protein varies between different parts of the brain. Obviously this ratio should be .5 given what we knew about mRNA in the past. The ratio has to be between 0 and 1.

Well, they looked at a lot of proteins. They did find around 1,400 genes with a ratio of .5 (as expected), but they found 700 showing a ratio of .2 (by the definition above, lots more coding sequence than 3’UTR), and 1,100 showing a ratio of .8 (lots more 3’UTR than coding sequence). Overall, plotting the ratio vs. number of genes with that ratio gives something looking like a bell curve (Gaussian distribution).
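To make the arithmetic concrete, here is a minimal sketch of the ratio taken exactly as defined above, 3’UTR/(3’UTR + coding). The function name and the read counts are made up for illustration; they are not data from the paper.

```python
def utr_fraction(utr_signal: float, coding_signal: float) -> float:
    """Return 3'UTR / (3'UTR + coding) for one gene in one cell."""
    total = utr_signal + coding_signal
    if total == 0:
        raise ValueError("no signal from either probe")
    return utr_signal / total


# Hypothetical probe signals (3'UTR, coding), invented for illustration only.
examples = {
    "equal amounts": (100, 100),          # the classical expectation
    "4x more coding": (25, 100),
    "4x more 3'UTR": (400, 100),
}

for label, (utr, coding) in examples.items():
    print(f"{label}: ratio = {utr_fraction(utr, coding):.2f}")
```

With these invented numbers, equal signals give .5, four times more coding signal gives .2, and four times more 3’UTR signal gives .8.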

It’s long been known that mRNA levels don’t exactly correlate with the levels of proteins made from them. If there’s lots of 3’UTRs around the authors found that there was relatively little protein made from the gene.

A variety of brain atlases have published mRNA abundances for various regions of the brain. If they just used one probe (as they probably did) this is clearly not enough.

The 3’UTRs may be acting as ceRNAs (competitive endogenous RNAs). These have been known for years — I’ve included a post of 3 years ago on the subject (at the end).

So this work (if replicated) throws everything we thought we knew about mRNA into a cocked hat. It’s why I love science, there’s always something really new to think about. Happy New Year !!!

Why drug discovery is so hard: reason #20 — competitive endogenous RNAs

The chemist will appreciate le Chatelier’s principle in action in what follows. We are far from knowing all the players controlling cellular behavior. So how in the world will we find drugs to change cellular behavior when we don’t know all the things affecting it? The latest previously unknown cellular players to enter the lists are competitive endogenous RNAs (ceRNAs). For details see Cell vol. 147 pp. 344 – 357, 382 – 395 ’11. The background the pure chemist needs for what follows can all be found in the category “Molecular Biology Survival Guide.”

Recall that microRNAs are short (20 something) polynucleotides which bind to the 3′ untranslated region (3′ UTR) of mRNA, and either (1) inhibit its translation into protein (2) cause its degradation. In each case, less of the corresponding protein is made. The microRNA and the appropriate sequence in the 3′ UTR of the mRNA form an RNA-RNA double helix (G on one strand binding to C on the other, etc.). Visualizing such helices is duck soup for a chemist.
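For those who prefer code to helices, here is a minimal sketch of finding places in a 3′ UTR that could pair with a microRNA. Two assumptions are mine and not from the text above: the match uses only a short 7-nucleotide stretch near the 5′ end of the microRNA, and it requires perfect Watson-Crick pairing. Both sequences are invented toys, not real human sequences.

```python
# Watson-Crick partners for RNA: A pairs with U, G pairs with C.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Sequence (written 5'->3') that pairs perfectly with `rna`."""
    return rna.translate(COMPLEMENT)[::-1]


def find_sites(utr: str, mirna: str, start: int = 1, length: int = 7) -> list[int]:
    """0-based positions in `utr` that pair with mirna[start:start+length].

    `start` and `length` are illustrative assumptions, not values from the post.
    """
    target = reverse_complement(mirna[start:start + length])
    return [i for i in range(len(utr) - length + 1)
            if utr[i:i + length] == target]


# Toy sequences, invented for illustration.
toy_mirna = "AUGGCCUAAGCUUAGGCAUCGA"
toy_utr = "GGAAUAGGCCACCCUUUUAGGCCAAA"

print(find_sites(toy_utr, toy_mirna))
```

The toy UTR was built to contain two matching stretches, so the function reports two positions. An RNA with many such stretches for the same microRNA is what makes a good sponge.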

Molecular biology is full of such semantic cherry bombs as nonCoding DNA (which meant DNA which didn’t code for protein), a subset of Junk DNA. Another is the pseudogene. These are genes that look like they should code for protein, except that they don’t because of lack of an initiation codon or a premature termination codon. Except for these differences, they have the nucleotide sequence to code for a known protein. It is estimated that the human genome contains as many pseudogenes (20,000) as it contains true protein coding genes [ Genome Res. vol. 12 pp. 272 – 280 ’02 ]. We now know that well over half the genome is transcribed into mRNA, including the pseudogenes.

PTEN (you don’t want to know what it stands for) is a 403 amino acid protein which is one of the most commonly mutated proteins in human cancer. Our genome also contains a pseudogene for it (called PTENP). Interestingly deletion of PTENP (not PTEN) is found in some cancers. However PTENP deletion is associated with decreased amounts of the PTEN protein itself, something you don’t want as PTEN is a tumor suppressor. How PTEN accomplishes this appears to be fairly well known, but is irrelevant here.

Why should loss of PTENP decrease PTEN itself? The reason is because the mRNA made from PTENP, even though it has a premature termination codon, and can’t be made into protein, is just as long, so it also contains the 3’UTR of PTEN. This means PTENP is sopping up microRNAs which would otherwise decrease the level of PTEN. Think of PTENP mRNA as a sponge.

Subtle, isn’t it? But there’s far more. At least PTENP mRNA closely resembles the PTEN mRNA. However other mRNAs, coding for completely different proteins, also have binding sites in their 3’UTR for the microRNA which binds to the 3’UTR of PTEN, resulting in its destruction. So transcription of a completely different gene (the example of ZEB2 is given) can control the abundance of another protein. Essentially its mRNA is acting as a sponge, sopping up the killer microRNA.
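The sponge idea can be put into a toy calculation. This is a deliberately crude sketch under assumptions of my own, not a model from the Cell papers: every transcript has one binding site, all sites bind the microRNA equally well, the microRNA spreads evenly over all available sites, and a bound target mRNA is simply silenced. All the numbers are invented.

```python
def free_target_mrna(target: float, sponge: float, mirna: float) -> float:
    """Target mRNA copies left unbound by microRNA in this toy model.

    target: copies of the mRNA of interest (e.g. PTEN)
    sponge: copies of competing RNA sharing the same site (e.g. PTENP)
    mirna:  copies of the microRNA
    """
    total_sites = target + sponge
    if total_sites == 0:
        return 0.0
    # Assumption: microRNA distributes evenly, and cannot occupy more
    # sites than exist.
    occupied_fraction = min(1.0, mirna / total_sites)
    return target * (1.0 - occupied_fraction)


# Invented numbers: 100 target mRNAs, 80 microRNAs.
with_sponge = free_target_mrna(target=100, sponge=100, mirna=80)
without_sponge = free_target_mrna(target=100, sponge=0, mirna=80)

print(f"free target mRNA with sponge:    {with_sponge:.0f}")
print(f"free target mRNA without sponge: {without_sponge:.0f}")
```

In this toy setup, deleting the sponge drops the free target mRNA from about 60 copies to about 20, which is the direction of the effect described for loss of PTENP. Real cells have many microRNAs, many sites per transcript and unequal affinities, so treat this only as le Chatelier in cartoon form.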

It gets worse. Most microRNAs have binding sites on the mRNAs of many different proteins, and PTEN itself has a 3’UTR which binds to 10 different microRNAs.

So here is a completely unexpected mechanism of control of protein levels in the cell. The general term for this is competitive endogenous RNA (ceRNA). Two years ago the number of human microRNAs was thought to be around 1,000. Unlike protein coding genes, it’s far from obvious how to find them by looking at the sequence of our genome, so there may be quite a few more.

So most microRNAs bind the 3’UTR of more than one protein (the average number is unclear at this point), and most proteins have binding sites for microRNAs in their 3’UTR (again the average number is unclear). What a mess. What subtlety. What an opportunity for the regulation of cellular function. Who is going to be smart enough to figure out a drug which will change this in a way that we want? Absence of evidence of a regulatory mechanism is not evidence of its absence. A little humility is in order.

Les fleurs du PTEN

Les fleurs du Mal is a volume of poetry by Baudelaire about the beauty of evil and depravity. I have the same esthetic appreciation for the horrible things a mutant of PTEN does. It’s awful, but incredibly elegant chemically.

Back in the day med students used to be told ‘know syphilis and you’ll know medicine’ because of its varied clinical manifestations. PTEN is like that for cellular and molecular biology.

PTEN (Phosphatase and TENsin homolog) is a gene mutated in many forms of cancer. So it was regarded as a tumor suppressor, keeping our cells on the straight and narrow. Naturally cancer cells ‘try’ (note the anthropomorphism) to neutralize it. PI3K is a universal tumor driver, integrating growth factor signaling with downstream circuitries of cell proliferation, metabolism and survival.

Inositol is a 6 membered ring (all carbons) with one OH group attached to each carbon, which are numbered 1 through 6. PI3K puts phosphate on the 3 position, PTEN takes it off. Since this is how PI3K signaling begins, cells lacking PTEN grow faster and migrate aberrantly (e.g. spread).

Enter Proc. Natl. Acad. Sci. vol. 112 pp. 13976 – 13981 ’15, which carefully studied a PTEN mutant found in an unfortunate man with aggressive prostate cancer. It just changed one of the 403 amino acids (#126) from alanine to glycine. Not a big deal, you say, it’s just a change of CH3 (alanine) to H (glycine). #126 is near the active site of the enzyme. One might expect that the mutation inhibits PTEN’s phosphatase activity (i.e. its enzymatic activity). Not so: the mutation shifts the activity of the enzyme. Instead of removing phosphate from the 3 position of inositol, it removes the phosphate at the 5 position (leaving the 3 position alone). This shifts inositol phosphate levels in the cell, with hyperactivation of PI3K signaling (which requires inositol phospholipids containing phosphate at the 3 position).

What happens is that inositol phosphates fit into the mutant active site with the 5 position near the catalytic amino acid (cysteine). Essentially the 6 membered ring rotates the 3 position away from cysteine and puts the 5 position there instead. This changes PTEN from a tumor suppressor (anti-oncogene) to an oncogene.

To a chemist this is elegant and beautiful (apologies Baudelaire).

PTEN has taught us a huge amount about the control of protein levels, pseudogenes, competitive endogenous RNA (ceRNA). You can read all about this in https://luysii.wordpress.com/2014/01/20/why-drug-discovery-is-so-hard-reason-24-is-the-3-untranslated-region-of-every-protein-a-cerna/

That’s fairly grim, so here’s a link to one of the great comedians of years past — Jonathan Winters

http://biggeekdad.com/2013/04/jonathan-winters-stick/

It’s politically incorrect and sure to offend the humorless pompous prigs. Enjoy ! ! !

Why drug discovery is so hard: Reason #24 — Is the 3′ untranslated region of every mRNA a ceRNA?

We all know what proteins do. They act as enzymes, structural elements of cells, membrane proteins where drugs bind etc. etc. The background the pure chemist needs for what follows can all be found in the category “Molecular Biology Survival Guide.”

We also know that the messenger RNA for any given protein contains a lot more information than that needed to code for the amino acids making up the protein. Forget the introns that are spliced out from the initial transcript. When the mature messenger RNA for a given protein leaves the nucleus for the cytoplasm, where the ribosome translates it into protein, it contains nucleotides at either end which the ribosome effectively ignores. These are called the untranslated regions (UTRs). The UTRs at the 3′ end of human mRNAs range in length between 60 and 4,000 nucleotides (average 800). It costs energy to store the information for the UTR in DNA, more energy to synthesize the nucleotides which make it up, even more to patch them together to form the UTR, more to package it and move it out of the nucleus etc. etc.

Why bother? Because the 3′ UTR of the mRNA contains a lot of information which tells the cell how much protein to make, how long the mRNA should hang around in the cell (among many other things). A Greek philosopher got here first — “Nature does nothing uselessly” – Aristotle

Those familiar with competitive endogenous RNA (ceRNA) can skip what follows up to the ****

Recall that microRNAs are short (20 something) polynucleotides which bind to the 3′ untranslated region (3′ UTR) of mRNA, and either (1) inhibit its translation into protein (2) cause its degradation. In each case, less of the corresponding protein is made. The microRNA and the appropriate sequence in the 3′ UTR of the mRNA form an RNA-RNA double helix (G on one strand binding to C on the other, etc.). Visualizing such helices is duck soup for a chemist.

Molecular biology is full of such semantic cherry bombs as nonCoding DNA (which meant DNA which didn’t code for protein), a subset of Junk DNA. Another is the pseudogene. These are genes that look like they should code for protein, except that they don’t because of lack of an initiation codon or a premature termination codon. Except for these differences, they have the nucleotide sequence to code for a known protein. It is estimated that the human genome contains as many pseudogenes (20,000) as it contains true protein coding genes [ Genome Res. vol. 12 pp. 272 – 280 ’02 ]. We now know that well over half the genome is transcribed into mRNA, including the pseudogenes.

PTEN (you don’t want to know what it stands for) is a 403 amino acid protein which is one of the most commonly mutated proteins in human cancer. Our genome also contains a pseudogene for it (called PTENP). Interestingly deletion of PTENP (not PTEN) is found in some cancers. However PTENP deletion is associated with decreased amounts of the PTEN protein itself, something you don’t want as PTEN is a tumor suppressor. How PTEN accomplishes this appears to be fairly well known, but is irrelevant here.

Why should loss of PTENP decrease PTEN itself? The reason is because the mRNA made from PTENP, even though it has a premature termination codon, and can’t be made into protein, is just as long, so it also contains the 3′UTR of PTEN. This means PTENP is sopping up microRNAs which would otherwise decrease the level of PTEN. Think of PTENP mRNA as a sponge.

Subtle, isn’t it? But there’s far more. At least PTENP mRNA closely resembles the PTEN mRNA. However other mRNAs, coding for completely different proteins, also have binding sites in their 3′UTR for the microRNA which binds to the 3′UTR of PTEN, resulting in its destruction. So transcription of a completely different gene (the example of ZEB2 is given) can control the abundance of another protein. Essentially its mRNA is acting as a sponge, sopping up the killer microRNA.

It gets worse. Most microRNAs have binding sites on the mRNAs of many different proteins, and PTEN itself has a 3′UTR which binds to 10 different microRNAs.

So here is a completely unexpected mechanism of control of protein levels in the cell. The general term for this is competitive endogenous RNA (ceRNA). Two years ago the number of human microRNAs was thought to be around 1,000 (release 2.0 of miRBase in June ’13 gives the number at 2,555 — this is unlikely to be complete). Unlike protein coding genes, it’s far from obvious how to find them by looking at the sequence of our genome, so there may be quite a few more.

So most microRNAs bind the 3′UTR of more than one protein (the average number is unclear at this point), and most proteins have binding sites for microRNAs in their 3′UTR (again the average number is unclear). What a mess. What subtlety. What an opportunity for the regulation of cellular function. Who is going to be smart enough to figure out a drug which will change this in a way that we want? Absence of evidence of a regulatory mechanism is not evidence of its absence. A little humility is in order.

*****

If this wasn’t scary enough, consider the following cautionary tale: Nature vol. 505 pp. 212 – 217 ’14. HMGA2 is a protein we thought we understood for the most part. It is found in the nucleus, where it binds to DNA. While it doesn’t transcribe DNA into RNA, it does bind to DNA, helping to form a protein complex which effectively promotes transcription of certain genes.

Well that’s what the protein does. However the mRNA for the protein uses its 3′ untranslated region (3’UTR) to sop up microRNAs of the let-7 family. The mRNA for HMGA2 is highly overexpressed in human cancer (notably the very common adenocarcinoma of the lung). You can mutate the mRNA for HMGA2 so it doesn’t produce the protein, just by putting a stop codon in it near the 5′ end. Throw the altered mRNA into a tissue culture of a lung adenocarcinoma cell line, and the cells become more proliferative and grow independently of being anchored to the tissue culture plate (i.e. anchorage independence, a biologic marker for cancer).

So what? It means that it is possible that every mRNA for every protein we make is acting as a ceRNA. The authors conclude the paper with “Such dual-function ceRNA and protein activities necessitate a deeper exploration of the coding genome in biological systems.”

I’ll say. We’re just beginning to scratch the surface. The control mechanisms within the cell continue to amaze (me) by their elegance and subtlety. I doubt highly that we know them all. Yet more reasons that drug discovery is hard — we are mucking about with a system whose workings we only dimly understand.