Tag Archives: lncRNA

BMOR is a bad actor

RNA and proteins have long been known to interact, but classic molecular biology pretty much had proteins down as something that modified RNA function, not the other way around.   Not so for BMOR, a long nonCoding RNA (1,247 nucleotides) expressed in breast cancer cells metastatic to the brain.  BMOR binds IRF3 (Interferon Regulatory Factor 3) and inhibits its phosphorylation by TBK1.  Phosphorylated IRF3 normally moves to the nucleus, where it stimulates interferon expression, which then turns on hundreds of genes producing inflammation.  By blocking the phosphorylation, BMOR shuts this cascade down.  All this is described in Proc. Natl. Acad. Sci. vol. 119 (22) e2200230119 (May 26, 2022).
Not sure if it is behind a paywall.    Definitely worth a read, because knocking down BMOR in breast cancer cells prevents them from spreading to the brain (presumably because the cells use BMOR to turn off the brain’s immune response to them).  Even more interestingly, BMOR was substantially expressed only in breast cancer metastases to the brain, not in breast cancer metastases to other tissues.


Teleology as always raises its head.  What in the world is the normal function of BMOR?  It can’t be what it is doing in the animal model described here.  Why would a cell make something to help it kill the organism containing it?


Then of course, as is typical of all interesting research, larger questions are raised.  Are there other RNAs whose function is to modify protein function?  Remember that 75% of the genome is transcribed into RNA.  Most of this has been thought of as molecular chaff, like the turnings of a lathe.   Time to pick up the chaff from the factory floor and give it a look.

The RNA world strikes again (it never stopped)

Jpx is a long (over 200 nucleotides) nonCoding (for protein, that is) RNA (i.e. a lncRNA).  It is an example of the RNA world from which we (presumably) sprang. One of its functions is to control another RNA, and a fairly important one at that: Xist, which inactivates one of a woman’s two X chromosomes.  The jpx gene is just 10 kiloBases away from that of Xist. Jpx turns on the transcription of Xist, which then goes and coats the X chromosome from which it is transcribed, shutting off most of its genes.

One of the mechanisms by which Jpx turns on Xist production is binding a protein called CTCF.  CTCF sits on the promoter of the Xist gene until Jpx binds it, displacing CTCF from the promoter.

CTCF is a much better known actor.  Along with cohesin, it is thought to be responsible for forming chromosome loops and establishing TADs (topologically associated domains), which are basically chromosome loops of about a million nucleotides containing an average of 8 protein coding genes, which are coordinately expressed as a result.

That’s fairly impressive.  What happens when you knock out the jpx gene?  The authors of [ Cell vol. 184 pp. 6157 – 6173 ’21 ] did just this, and all Hell broke loose.  Jpx keeps CTCF from binding promoters, and without jpx, thousands of chromosome loops are replaced by others, with downregulation of some 700 protein coding genes.

Again, the RNA world is like some legacy software (think DOS) underlying the latest stuff (think Windows), forgotten but not gone.

The RNA world strikes again

Life is said to have originated in the RNA world.  We all know about the big 3 RNAs important to the cell: messenger RNA (mRNA), ribosomal RNA and transfer RNA.  But just like the water, sewer, power and subway systems under Manhattan, there is another world down there in the cell which is just beginning to come into focus.

I’ve written several posts about the RNA world in our cells (links at the end), but the latest is really staggering, in that RNA is helping to organize how our DNA lies in the nucleus.

As usual, the discoveries depended on new technologies, RD-SPRITE in this case (you don’t want to know what the acronym stands for; by the bye, have you noticed how many more acronyms are appearing in the papers you read?).  It is extremely complex, but the technique is said to be able to simultaneously map thousands of RNA and DNA molecules at high resolution relative to all other RNA and DNA molecules.  Details in Cell vol. 184 pp. 5775 – 5790 ’21.

The count of long nonCoding (for protein, that is) RNAs is now in the tens of thousands [ Science vol. 373 pp. 623 – 624 ’21 ]. They have all sorts of functions, but the present work shows that 93% of them stay in the nucleus close to the gene that encodes them.  There they bind proteins in precise nuclear territories (because the genes for lncRNAs are themselves located in equally precise territories).   This establishes functional compartments in the nucleus that regulate gene expression.

Interestingly, long nonCoding RNAs are transcribed at very low levels, which led people to dismiss them as chaff.  Staying put near their own genes and binding proteins there explains how so few molecules can do so much.

That’s pretty abstract.  Consider Xist, a large nonCoding RNA which inactivates one of the X chromosomes in females.  Just two Xist molecules are able to seed a multiprotein cloud around the Xist locus on the X.

Later to be described is Jpx, which is crucial in establishing TADs (topologically associated domains).

Here are some older posts on the RNA world

Forgotten but not gone

Forgotten but not gone — take II

Forgotten but not gone — take III

Maybe there really is junk DNA

Until about 20 years ago, molecular biology was incredibly protein-centric.  Consider the following terms — nonsense codon, noncoding DNA, junk DNA.  All are pejorative and arose from the view that all the genome does is code for protein.  Nonsense codon means one of the 3 termination codons, which tells the ribosome to stop making protein.  Noncoding DNA means not coding for protein (with the implication that DNA not coding for protein isn’t coding for anything).

The term junk DNA goes back to the 60s, a time of tremendous hubris as the grand biochemical plan of life was being discovered. People were not embarrassed to use the term ‘central dogma’: DNA makes RNA makes protein. It therefore came as a shock, once we had a better handle on the size of the genome, to discover that less than 2% of it coded for protein. Since much of the rest was made of repetitive sequences, it was called junk DNA.

I never bought it, thinking it very dangerous to dismiss as unimportant what you did not understand or could not measure. Probably this was influenced by my experience as an Air Force M.D. ’68 – ’70 during the Vietnam war.

But now comes a sure-to-be-contentious but well-reasoned paper arguing that junk DNA does exist, even though it is occasionally transcribed [ Cell vol. 183 pp. 1151 – 1161 ’20 ]. The paper discusses all RNAs in the cell other than ribosomal RNAs, small nucleolar RNAs (snoRNAs) and microRNAs.

They note that no enzyme is perfect, acting only on the substrate we think evolution optimized it for; they call this promiscuous behavior. So a transcription factor which binds to a particular promoter sequence will also bind to near-miss sequences. Moreover, such near misses are constantly being generated in our genome by random mutation. This is why they think the ENCODE project (ENCyclopedia Of DNA Elements) found that the entire genome is transcribed into RNA. The implication drawn by many is that all this transcription must be functional.
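To see why near misses matter, here is a minimal Python sketch (not from the paper) that counts how often a motif and its near misses turn up in random DNA. The 8-base motif, the 100,000-base random sequence and the seed are hypothetical choices for illustration, and only one strand is scanned.

```python
import random
from math import comb

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def count_sites(seq: str, motif: str, max_mismatches: int) -> int:
    """Count windows of seq (one strand only) within max_mismatches of motif."""
    k = len(motif)
    return sum(
        hamming(seq[i:i + k], motif) <= max_mismatches
        for i in range(len(seq) - k + 1)
    )

def expected_sites(seq_len: int, motif_len: int, max_mismatches: int) -> float:
    """Expected count in uniformly random DNA (each base 1/4), one strand."""
    # Sequences within m mismatches: choose m positions, 3 wrong bases at each.
    matching = sum(comb(motif_len, m) * 3**m for m in range(max_mismatches + 1))
    return (seq_len - motif_len + 1) * matching / 4**motif_len

if __name__ == "__main__":
    rng = random.Random(0)                 # fixed seed so runs are repeatable
    motif = "GGAATTCC"                     # hypothetical 8-base binding site
    genome = "".join(rng.choice("ACGT") for _ in range(100_000))
    for mm in range(3):
        observed = count_sites(genome, motif, mm)
        expected = expected_sites(len(genome), len(motif), mm)
        print(f"<= {mm} mismatches: observed {observed}, expected {expected:.1f}")
```

In random sequence of this length you'd expect about 1.5 exact sites, about 38 within one mismatch and about 423 within two, so a slightly sloppy factor has nearly 300 times as many places to land as a perfectionist one.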

However, many random pieces of DNA can activate transcription [ Genes Dev. vol. 30 pp. 1895 – 1907 ’16 ], producing what the authors call transcriptional noise.

There is evidence that the cell has evolved a way to stop some of this. U1 snRNP recognizes the 5′ splice site motif. It is present in nuclei at levels an order of magnitude higher than other spliceosomal subcomplexes, so it can monitor for RNAs which have a 5′ splice site motif but lack a 3′ splice site. These RNAs are subsequently destroyed, never making it out of the nucleus.
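The surveillance rule described above can be written as a toy classifier. This is a sketch under heavy simplification: the motifs below (GURAGU for the 5′ splice site, a run of at least 10 pyrimidines followed by AG for the 3′ splice site) are stand-ins chosen for illustration, real splice sites are far more variable, and U1 does not literally scan with a regular expression.

```python
import re

# Hypothetical, deliberately simplified motifs (real splice sites are degenerate).
FIVE_PRIME_SS = re.compile(r"GU[AG]AGU")                 # toy 5' splice site, GURAGU
THREE_PRIME_SS = re.compile(r"[CU]{10,}[ACGU]{0,3}AG")   # toy pyrimidine tract + AG

def u1_fate(rna: str) -> str:
    """Toy rule: a 5' splice site with no downstream 3' site marks the RNA for decay."""
    five = FIVE_PRIME_SS.search(rna)
    if five is None:
        return "ignored"      # nothing for U1 to recognize
    downstream = rna[five.end():]
    if THREE_PRIME_SS.search(downstream):
        return "spliced"      # a plausible intron: 5' site has a 3' partner
    return "degraded"         # 5' site without a partner: destroyed in the nucleus
```

The design choice is simply to look for a 3′ site only downstream of the 5′ site, since an intron runs from its 5′ end to its 3′ end.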

They think the primary function of lncRNAs is chromatin remodeling affecting gene expression; this is certainly true of XIST, which silences one of the two X chromosomes females carry.

There is a lot more very technical molecular biology and close reasoning in the paper, but this should be enough to whet your interest. It is well worth reading. Probably, like me, you’ll be mentally arguing with the authors as you read it, but that’s the sign of a good paper.

Now for a question which has always puzzled me. Consider the leprosy organism. It’s a mycobacterium (like the organism causing TB), but because it is essentially confined to man, and lives inside humans for most of its existence, it has jettisoned large parts of its genome: first by throwing about a quarter of it out (its genome is about a quarter smaller than that of TB, from which it is thought to have diverged 66 million years ago), and second by mutating many of its genes so protein can no longer be made from them. Why throw out all that DNA? The short answer is that it is metabolically expensive to produce and maintain DNA that you’re not using.

If you want a few numbers here they are:
Genome of M. TB 4,441,529 nucleotides
Genome of M. Leprae 3,268,203 nucleotides
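The arithmetic on those two genome sizes, as a few lines of Python (the numbers are the ones quoted above; nothing else is assumed):

```python
# Genome sizes as quoted in the post (nucleotides).
M_TB = 4_441_529
M_LEPRAE = 3_268_203

lost = M_TB - M_LEPRAE              # nucleotides M. leprae lacks relative to TB
fraction_smaller = lost / M_TB      # shrinkage relative to M. tuberculosis
tb_larger = M_TB / M_LEPRAE - 1     # the same gap, relative to M. leprae

print(f"{lost:,} nt; leprae {fraction_smaller:.1%} smaller; TB {tb_larger:.1%} larger")
```

By these figures the M. leprae genome is about 26% (roughly a quarter) smaller than that of M. tuberculosis; put the other way round, the TB genome is about 36% larger.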

Clearly microorganisms are under high selective pressure, and the paper says that humans are under almost none, but it seems to me that multicellular organisms would have found a way to get rid of DNA they don’t need.

It may well be that all this DNA and the RNA transcribed from it is evolutionary potting soil, waiting for some new environmental stress to put it to use.

Marshall McLuhan rides again

Marshall McLuhan famously said “the medium is the message”. Who knew he was talking about molecular biology?  But he was, if you think of the process of transcription of DNA into various forms of RNA as the medium and the products of transcription as the message.  That’s exactly what this paper [ Cell vol. 171 pp. 103 – 119 ’17 ] says.

T cells are a type of immune cell formed in the thymus.  One of the important transcription factors turning on the genes which make a T cell a T cell is called Bcl11b.  Early in T cell development, the Bcl11b gene is sequestered away near the nuclear membrane in highly compacted DNA. Remember that you must compress your 1 meter of DNA down by 100,000-fold to fit it in the nucleus, which is 1/100,000th of a meter (10 microns) across.
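A back-of-the-envelope check of that compaction, assuming a haploid human genome of 3.2 billion base pairs and 3.3 Angstroms per base pair of B-DNA (the figures used in the PVT1 discussion further down):

```python
HAPLOID_BP = 3_200_000_000      # base pairs in one copy of the genome (assumed)
RISE_PER_BP_M = 3.3e-10         # 3.3 Angstroms per base pair in B-DNA, in meters
NUCLEUS_DIAMETER_M = 10e-6      # a 10 micron nucleus

dna_length_m = HAPLOID_BP * RISE_PER_BP_M            # stretched-out length
compaction = dna_length_m / NUCLEUS_DIAMETER_M       # length vs nuclear diameter

print(f"{dna_length_m:.2f} m of DNA, {compaction:,.0f} times the nuclear diameter")
```

This gives about 1.06 meters of DNA, a bit over 100,000 times the width of a 10 micron nucleus, in line with the round numbers above.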

What turns it on?  Transcription of a nonCoding (for protein) RNA called ThymoD.  From my reading of the paper, ThymoD itself doesn’t do anything; just the opening up of compacted DNA near the nuclear membrane that results from transcribing ThymoD is enough to cause this part of the genome to move into the center of the nucleus, where the gene for Bcl11b can be transcribed into RNA.

There’s a lot more to the paper,  but that’s the message if you will.  It’s the act of transcription rather than what is being transcribed which is important.

Here’s more about McLuhan — https://en.wikipedia.org/wiki/Marshall_McLuhan

If some of the terms used here are unfamiliar — look at the following post and follow the links as far as you need to.  https://luysii.wordpress.com/2010/07/07/molecular-biology-survival-guide-for-chemists-i-dna-and-protein-coding-gene-structure/

Well, that was an old post.  Here’s another example [ Cell vol. 173 pp. 1318 – 1319, 1398 – 1412 ’18 ]. It concerns a gene called PVT1 (Plasmacytoma Variant Translocation 1), found 25 years ago.  It was the first gene coding for a long nonCoding (for protein) RNA (lncRNA) found as a recurrent breakpoint in Burkitt’s lymphoma, which sadly took a friend, Nick Cozzarelli (he edited PNAS for 10 years), far too young.

So PVT1 is involved in cancer.  The translocation turns on expression of the myc oncogene, something that has been studied out the gazoo and we’re still not sure of how it causes cancer. I’ve got 60,000 characters of notes on the damn thing, but as someone said 6 years ago “Whatever the latest trend in cancer biology — cell cycle, cell growth, apoptosis, metabolism, cancer stem cells, microRNAs, angiogenesis, inflammation — Myc is there regulating most of the key genes”

We do know that the lncRNA coded by PVT1 in some way stabilizes the myc protein [ Nature vol. 512 pp. 82 – 87 ’14 ].  However, when the experiments in the Cell paper knocked out the PVT1 lncRNA, myc expression was still turned on.

PVT1 resides 53 kiloBases away from myc on chromosome #8.  If the DNA were stretched out into the B-DNA form seen in all the textbooks, that would be about 1.75 times the diameter of the average nucleus (10 microns): since each base pair is 3.3 Angstroms thick, that’s 175,000 Angstroms = 17,500 nanoMeters = 17.5 microns.  You can get an idea of how compacted DNA is in the nucleus when you realize that there are 3,200,000,000/53,000 = 60,000 such segments in the genome, all packed into a sphere 10 microns in diameter.
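The unit conversions for the PVT1 to myc distance, worked explicitly in Python (1 nanoMeter = 10 Angstroms, 1 micron = 1,000 nanoMeters); the inputs are the figures quoted in the paragraph above.

```python
BP_BETWEEN = 53_000            # PVT1 to myc distance, base pairs
RISE_ANGSTROM = 3.3            # Angstroms per base pair in B-DNA
GENOME_BP = 3_200_000_000      # genome size used in the post
NUCLEUS_UM = 10.0              # nuclear diameter, microns

length_angstrom = BP_BETWEEN * RISE_ANGSTROM
length_nm = length_angstrom / 10        # 10 Angstroms per nanoMeter
length_um = length_nm / 1000            # 1,000 nanoMeters per micron
segments = GENOME_BP / BP_BETWEEN       # how many 53 kb stretches fit in the genome

print(f"{length_angstrom:,.0f} A = {length_nm:,.0f} nm = {length_um:.1f} um; "
      f"{length_um / NUCLEUS_UM:.2f} x nucleus; {segments:,.0f} segments")
```

Stretched out, the 53 kiloBases come to about 17.5 microns, around 1.75 times the width of a 10 micron nucleus, and the genome holds about 60,000 such segments.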

To cut to the chase, within the PVT1 gene there are at least 4 enhancers (use the link above to find what all the terms used here actually mean).  Briefly, enhancers are what promoters bind to in order to help turn on the transcription of genes in DNA into RNA (messenger and otherwise).  According to the paper, the promoter of PVT1 binds one or more of these enhancers, preventing the promoter of the myc oncogene from binding them.

Just how they know that there are 4 enhancers in PVT1 is a story in itself.  They cut various parts out of the PVT1 gene (which itself has 306,721 basepairs), placed each in front of a reporter gene, and saw whether transcription increased.
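The logic of that screen can be sketched in a few lines. Everything here except the gene length is a hypothetical illustration: the fixed tile size, the 2-fold cutoff and the single no-insert baseline are assumptions, since the post doesn't say how the fragments were chosen or how increased transcription was scored.

```python
PVT1_BP = 306_721   # gene length quoted in the post

def tile(gene_length: int, tile_bp: int) -> list[tuple[int, int]]:
    """Split [0, gene_length) into consecutive fragments of at most tile_bp."""
    return [(start, min(start + tile_bp, gene_length))
            for start in range(0, gene_length, tile_bp)]

def call_enhancers(reporter_signal: dict[tuple[int, int], float],
                   baseline: float, min_fold: float = 2.0) -> list[tuple[int, int]]:
    """Fragments whose reporter output beats the no-insert baseline by min_fold.

    reporter_signal maps each fragment to its (hypothetical) reporter reading.
    """
    return [frag for frag, signal in reporter_signal.items()
            if signal / baseline >= min_fold]
```

For example, tile(PVT1_BP, 2_000) splits the 306,721 basepairs into 154 fragments; call_enhancers would then be handed one reporter reading per fragment.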

The actual repressor of myc is the promoter of PVT1 according to the paper (it binds to the enhancers present in the gene body preventing the myc promoter from doing so).  Things may be a bit more complicated as the PVT1 gene also codes for a cluster of 7 microRNAs and what they do isn’t explained in the paper.

So it’s as if the sardonic sense of humor of ‘nature’, ‘evolution’, ‘God’, (call it what you will) has set molecular biologists off on a wild goose chase, looking at the structure of the gene product (the lncRNA) to determine the function of the gene, when actually it’s the promoter in front of the gene and the enhancers within which are performing the function.

The mechanism may be more widespread: silencing 36 lncRNA promoters by CRISPR techniques activated genes within a 1 megaBase window in 4 cases (possibly by the same mechanism as PVT1 and myc).

Where does McLuhan come in?  The Cell paper also notes that lncRNA gene promoters are more evolutionarily conserved than their gene bodies.  So once again the medium (promoter, enhancer) is the message, rather than what we thought the message was.


It ain’t the bricks it’s the plan — take II

A recent review in Neuron (vol. 88 pp. 681 – 677 ’15) gives a possible new explanation of how our brains came to be so different from apes (if not our behavior of late).

You’ve all heard that our proteins are only 2% different from the chimp’s, so we are 98% chimpanzee. The facts are correct, the interpretation wrong. We are far more than the protein ‘bricks’ that make us up, and two current papers in Cell [ vol. 163 pp. 24 – 26, 66 – 83 ’15 ] essentially prove this.

This is like saying Monticello and Independence Hall are just the same because they’re both made out of bricks. One could chemically identify Monticello bricks as coming from the Virginia piedmont, and Independence Hall bricks coming from the red clay of New Jersey, but the real difference between the buildings is the plan.

It’s not the proteins, but where and when and how much of them are made. The control for this (plan if you will) lies outside the genes for the proteins themselves, in the rest of the genome (remember only 2% of the genome codes for the amino acids making up our 20,000 or so protein genes). The control elements have as much right to be called genes, as the parts of the genome coding for amino acids. Granted, it’s easier to study genes coding for proteins, because we’ve identified them and know so much about them. It’s like the drunk looking for his keys under the lamppost because that’s where the light is.


All the molecular biology you need to understand what follows is in the following post — https://luysii.wordpress.com/2010/07/07/molecular-biology-survival-guide-for-chemists-i-dna-and-protein-coding-gene-structure.

The Neuron paper is detailed and fascinating to a neurologist, but toward the end it begins to fry far bigger fish.

Until about 10 years ago, molecular biology was incredibly protein-centric. Consider the following terms — nonsense codon, noncoding DNA, junk DNA. All are pejorative and arose from the view that all the genome does is code for protein. Nonsense codon means one of the 3 termination codons, which tells the ribosome to stop making protein. Noncoding DNA means not coding for protein (with the implication that DNA not coding for protein isn’t coding for anything).

Well all that has changed. The ENCODE Consortium showed that well over half (and probably all) our DNA is transcribed into RNA — for details see https://en.wikipedia.org/wiki/ENCODE. This takes energy, and it is doubtful (to me at least) that organisms would waste this much energy if the products were not doing something useful.

I’ve discussed microRNAs elsewhere — for details please see — https://luysii.wordpress.com/2010/07/14/junk-dna-that-isnt-and-why-chemistry-isnt-enough/. They don’t code for protein either, but control how much of a given protein is made.

The Neuron paper concerns lncRNAs (long nonCoding RNAs). They don’t code for protein either and contain over 200 nucleotides. There are a lot of them (10,000 – 50,000 are known to be expressed in man). Amazingly, 40% of them are expressed in the brain, and not just in adult life, but during embryonic development. Expression of some of them is restricted to specific brain areas. It is easier for an embryologist to tell what type a cell is during brain cortical development by looking at the lncRNAs it expresses than by looking at the proteins it is making. The paper contains multiple examples of lncRNAs controlling when and where a protein is made in the brain.

lncRNAs can contain multiple domains, each of which has a different affinity for a particular RNA (such as the mRNA for a protein), DNA, or protein. In the nucleus they influence the DNA binding sites of transcription factors, RNA polymerase II and the polycomb repressive complex. The review goes on with many specific examples of lncRNA function: synaptic plasticity, neurite extension.

Getting back to proteins, the vast majority are nearly the same in all mammals (this is where the 2% chimpanzee argument comes from). Here is where it gets interesting. Roughly 1/3 of lncRNAs found in man are primate specific. This includes hundreds of lncRNAs found only in man. The paper gives evidence that hundreds of them have undergone positive selection in humans.

So the paper provides yet another mechanism (with far more detail than I’ve been able to provide here) for why our brains are so much larger than, and different in many ways from, those of our nearest living relative, the chimpanzee. This is the largest molecular biological difference found so far between the human brain and every other brain. Fascinating stuff. Stay tuned. I think this is a watershed paper.

Why drug discovery is so hard: Reason #21 — RNA sequences won’t help you determine function

We are just beginning to understand all the things RNA does in the cell, despite its importance obvious to all for half a century (think messenger RNA which goes back that far).  This means that RNA is likely to be a target of useful drugs.  Posts #4, #11 and #20 concern some of the more newly discovered effects of RNA in the cell.

While we’re still discovering proteins with no obvious resemblance in their amino acid sequence to known proteins, most of them do resemble something we’ve seen before.  So if we see a kinase-like domain, or a group of 7 rather hydrophobic stretches, we have a leg up on what that protein is actually doing.
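A group of 7 hydrophobic stretches is the classic signature of a seven-transmembrane receptor, and hydropathy scanning is one standard way to spot it. Below is a minimal sketch of a Kyte-Doolittle sliding-window scan; the 19-residue window and 1.6 cutoff are the commonly quoted defaults for transmembrane segments, and the test protein mentioned afterward is synthetic.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_segments(protein: str, window: int = 19,
                         threshold: float = 1.6) -> list[tuple[int, int]]:
    """Merged (start, end) spans whose windowed mean hydropathy exceeds threshold."""
    protein = protein.upper()
    spans: list[tuple[int, int]] = []
    for i in range(len(protein) - window + 1):
        score = sum(KD[aa] for aa in protein[i:i + window]) / window
        if score > threshold:
            start, end = i, i + window
            if spans and start <= spans[-1][1]:
                spans[-1] = (spans[-1][0], end)   # overlaps previous span: extend it
            else:
                spans.append((start, end))
    return spans
```

A made-up sequence of seven 20-leucine blocks separated by 15-lysine spacers, "K" * 15 + ("L" * 20 + "K" * 15) * 7, comes back as seven segments; real proteins are noisier, so this is a first look, not a structure prediction.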

A similar attack (comparing sequences to those of RNAs of known function) should help us figure out what some of the RNAs in the cell not coding for protein are actually doing.  If you see a mistyke in this sentence, you still probably know what I meant (i.e. how that word is meant to function in the sentence).  That’s the hope underlying the technique, anyway.

Recent work in the zebrafish [ Cell vol. 147 pp. 1537 – 1550 ’11 ] shows that this isn’t very likely to work in the RNA world. For some background on large intervening nonCoding RNAs (lincRNAs, aka lncRNAs) see https://luysii.wordpress.com/2011/03/02/we-dont-know-all-the-players-which-is-why-finding-good-drugs-is-so-hard/.  The zebrafish has become a plaything of embryologists (because it is transparent, and because, like us, it is a vertebrate).

At any rate, the work found some 550 distinct lincRNAs in the zebrafish, but only 29 had detectable sequence similarity with lincRNAs in mammals (which are just as numerous).  Even though chromosomes have been scrambled many times over geologic time, many genes near each other in the zebrafish are near each other in humans as well (the term for this is synteny).  This means one can look at the DNA to see where the gene for a lincRNA lies in two organisms, and infer that the two lincRNAs are doing something similar physiologically if their genes lie at syntenic sites.
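Here is a minimal sketch of the synteny idea, assuming each chromosome is just an ordered list of gene names and that orthologs of the protein coding genes are already known. The gene names and ortholog table are hypothetical, and this is a simplified stand-in, not the method used in the Cell paper. Note that it never looks at the lincRNA sequences at all.

```python
from typing import Optional

def flanks(order: list[str], lincs: set[str],
           name: str) -> tuple[Optional[str], Optional[str]]:
    """Nearest protein coding genes to the left and right of a lincRNA gene."""
    i = order.index(name)
    left = next((g for g in reversed(order[:i]) if g not in lincs), None)
    right = next((g for g in order[i + 1:] if g not in lincs), None)
    return left, right

def syntenic_pairs(fish_order: list[str], fish_lincs: set[str],
                   human_order: list[str], human_lincs: set[str],
                   orthologs: dict[str, str]) -> list[tuple[str, str]]:
    """Pair fish and human lincRNAs whose coding-gene neighbors are orthologs."""
    pairs = []
    for f in fish_lincs:
        left, right = flanks(fish_order, fish_lincs, f)
        mapped = (orthologs.get(left), orthologs.get(right))
        if None in mapped:
            continue  # no ortholog anchor on one side, so synteny can't be called
        for h in human_lincs:
            # Accept the block in either orientation (it may have been inverted).
            if flanks(human_order, human_lincs, h) in (mapped, mapped[::-1]):
                pairs.append((f, h))
    return sorted(pairs)
```

For example, with fish genes fA, linc-f1, fB, human genes HB, linc-h1, HA and orthologs fA to HA and fB to HB, the two lincRNAs are paired even though the block is flipped in humans.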

So they did this and found some lincRNAs with almost no sequence similarity to each other whose genes lie at syntenic sites in man and zebrafish.  Next they used antisense reagents targeting the small regions of the lincRNAs conserved between us and fish, and produced developmental defects (in the fish).  Amazingly, despite very little sequence similarity, the human orthologs (determined by synteny) could prevent the embryological defects.

So in this case at least, and probably more generally, we’re not going to be able to look at the sequence of lincRNAs (or the many other types of non messenger RNAs present in the cell) and infer what they are doing.  This will make drug discovery in this area even harder.