Tag Archives: lncRNA

Marshall McLuhan rides again

Marshall McLuhan famously said “the medium is the message”. Who knew he was talking about molecular biology?  But he was, if you think of the process of transcription of DNA into various forms of RNA as the medium and the products of transcription as the message.  That’s exactly what this paper [ Cell vol. 171 pp. 103 – 119 ’17 ] says.

T cells are a type of immune cell formed in the thymus.  One of the important transcription factors which turns on expression of the genes which make a T cell a T cell is called Bcl11b.  Early in T cell development it is sequestered away near the nuclear membrane in highly compacted DNA. Remember that you must compress your 1 meter of DNA down 100,000-fold to have it fit in the nucleus, which is 1/100,000th of a meter (10 microns) across.

What turns it on?  Transcription of a nonCoding (for protein) RNA called ThymoD.  From my reading of the paper, ThymoD doesn’t do anything, but just the act of opening up compacted DNA near the nuclear membrane produced by transcribing ThymoD is enough to cause this part of the genome to move into the center of the nucleus where the gene for Bcl11b can be transcribed into RNA.

There’s a lot more to the paper,  but that’s the message if you will.  It’s the act of transcription rather than what is being transcribed which is important.

Here’s more about McLuhan — https://en.wikipedia.org/wiki/Marshall_McLuhan

If some of the terms used here are unfamiliar — look at the following post and follow the links as far as you need to.  https://luysii.wordpress.com/2010/07/07/molecular-biology-survival-guide-for-chemists-i-dna-and-protein-coding-gene-structure/

Well that was an old post.  Here’s another example [ Cell vol. 173 pp. 1318 – 1319, 1398 – 1412 ’18 ].  It concerns a gene called PVT1 (Plasmacytoma Variant Translocation 1) found 25 years ago.  It was the first gene coding for a long nonCoding (for proteins) RNA (lncRNA) found as a recurrent breakpoint in Burkitt’s lymphoma, which sadly took a friend (Nick Cozzarelli, who edited PNAS for 10 years) far too young.

So PVT1 is involved in cancer.  The translocation turns on expression of the myc oncogene, something that has been studied out the gazoo and we’re still not sure of how it causes cancer. I’ve got 60,000 characters of notes on the damn thing, but as someone said 6 years ago “Whatever the latest trend in cancer biology — cell cycle, cell growth, apoptosis, metabolism, cancer stem cells, microRNAs, angiogenesis, inflammation — Myc is there regulating most of the key genes”

We do know that the lncRNA coded by PVT1 in some way stabilizes the myc protein [ Nature vol. 512 pp. 82 – 87 ’14 ].  However the cell experiments knocked out the lncRNA of PVT1 and myc expression was still turned on.

PVT1 resides 53 kiloBases away from myc on chromosome #8.  That’s about 1.75 times the diameter of the average nucleus (10 microns) if the DNA is stretched out into the B-DNA form seen in all the textbooks.  Since each base is 3.3 Angstroms thick, that’s 175,000 Angstroms = 17,500 nanoMeters = 17.5 microns.  You can get an idea of how compacted DNA is in the nucleus when you realize that there are 3,200,000,000/53,000 = roughly 60,000 such segments in the genome, all packed into a sphere 10 microns in diameter.
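The unit conversions here are easy to slip on, so here is a minimal sketch that works them through (10 Angstroms to the nanometer, 1000 nanometers to the micron).  It uses only the figures quoted in this post; the variable and function names are my own and purely illustrative.

```python
# Figures quoted in the post; names are illustrative only.
BASES_PVT1_TO_MYC = 53_000          # 53 kilobases between PVT1 and myc
ANGSTROMS_PER_BASE = 3.3            # thickness of one base in B-DNA, as quoted
NUCLEUS_DIAMETER_MICRONS = 10.0     # average nucleus, as quoted
GENOME_BASES = 3_200_000_000        # size of the genome, as quoted


def stretched_length_microns(n_bases, angstroms_per_base=ANGSTROMS_PER_BASE):
    """Length of n_bases of fully extended B-DNA, in microns."""
    angstroms = n_bases * angstroms_per_base
    nanometers = angstroms / 10      # 10 Angstroms per nanometer
    return nanometers / 1000         # 1000 nanometers per micron


length_um = stretched_length_microns(BASES_PVT1_TO_MYC)
times_nuclear_diameter = length_um / NUCLEUS_DIAMETER_MICRONS
segments_in_genome = GENOME_BASES / BASES_PVT1_TO_MYC

print(f"{length_um:.1f} microns when stretched out")
print(f"{times_nuclear_diameter:.2f} times the nuclear diameter")
print(f"{segments_in_genome:,.0f} such segments in the genome")
```

The point of the exercise: a single 53 kiloBase stretch, laid out straight, is already longer than the nucleus is wide, and there are tens of thousands of such stretches to pack in.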

To cut to the chase, within the PVT1 gene there are at least 4 enhancers (use the link above to find what all the terms to be used actually mean).  Briefly, enhancers are stretches of DNA that promoters bind to, helping turn on the transcription of the genes in DNA into RNA (messenger and otherwise).  This means that the promoter of PVT1 binds one or more of the enhancers, preventing the promoter of the myc oncogene from binding.

Just how they know that there are 4 enhancers in PVT1 is a story in itself.  They cut various parts out of the PVT1 gene (which itself has 306,721 basepairs), place each one in front of a reporter gene, and see if transcription increases.
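The logic of such a reporter screen can be sketched in a few lines.  Everything here is an assumption made for illustration: the fragment names, the signal values, and the 2-fold cutoff are invented, and are not data or methods taken from the paper.

```python
# All numbers below are made up for illustration; they are NOT data from the paper.
reporter_signal = {
    "empty_vector": 100.0,   # reporter gene with no PVT1 fragment in front of it
    "fragment_1": 110.0,
    "fragment_2": 450.0,
    "fragment_3": 95.0,
    "fragment_4": 300.0,
}

FOLD_THRESHOLD = 2.0  # arbitrary cutoff chosen for this sketch


def candidate_enhancers(signal, baseline_key="empty_vector", threshold=FOLD_THRESHOLD):
    """Return {fragment: fold change over baseline} for fragments at or above threshold."""
    baseline = signal[baseline_key]
    hits = {}
    for name, value in signal.items():
        if name == baseline_key:
            continue
        fold = value / baseline
        if fold >= threshold:
            hits[name] = fold
    return hits


for name, fold in candidate_enhancers(reporter_signal).items():
    print(f"{name}: {fold:.1f}-fold over the empty vector")
```

A fragment that raises reporter transcription well above the empty-vector baseline is a candidate enhancer; one that leaves it unchanged is not.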

The actual repressor of myc is the promoter of PVT1 according to the paper (it binds to the enhancers present in the gene body preventing the myc promoter from doing so).  Things may be a bit more complicated as the PVT1 gene also codes for a cluster of 7 microRNAs and what they do isn’t explained in the paper.

So it’s as if the sardonic sense of humor of ‘nature’, ‘evolution’, ‘God’, (call it what you will) has set molecular biologists off on a wild goose chase, looking at the structure of the gene product (the lncRNA) to determine the function of the gene, when actually it’s the promoter in front of the gene and the enhancers within which are performing the function.

The mechanism may be more widespread, as 4/36 lncRNA promoters silenced by CRISPR techniques subsequently activated genes in a 1 megaBase window (possibly by the same mechanism as PVT1 and myc).

Where does McLuhan come in?  The Cell paper also notes that lncRNA gene promoters are more evolutionarily conserved than their gene bodies.  So the medium (promoter, enhancer) is the message once again (rather than what we thought the message was).

 

It ain’t the bricks it’s the plan — take II

A recent review in Neuron (vol. 88 pp. 861 – 877 ’15) gives a possible new explanation of how our brains came to be so different from those of apes (if not our behavior of late).

You’ve all heard that our proteins are only 2% different from those of the chimp, so we are 98% chimpanzee. The facts are correct, the interpretation wrong. We are far more than the protein ‘bricks’ that make us up, and two current papers in Cell [ vol. 163 pp. 24 – 26, 66 – 83 ’15 ] essentially prove this.

This is like saying Monticello and Independence Hall are just the same because they’re both made out of bricks. One could chemically identify Monticello bricks as coming from the Virginia piedmont, and Independence Hall bricks coming from the red clay of New Jersey, but the real difference between the buildings is the plan.

It’s not the proteins, but where and when and how much of them are made. The control for this (plan if you will) lies outside the genes for the proteins themselves, in the rest of the genome (remember only 2% of the genome codes for the amino acids making up our 20,000 or so protein genes). The control elements have as much right to be called genes, as the parts of the genome coding for amino acids. Granted, it’s easier to study genes coding for proteins, because we’ve identified them and know so much about them. It’s like the drunk looking for his keys under the lamppost because that’s where the light is.

All the molecular biology you need to understand what follows is in the following post — https://luysii.wordpress.com/2010/07/07/molecular-biology-survival-guide-for-chemists-i-dna-and-protein-coding-gene-structure.

The Neuron paper is detailed and fascinating to a neurologist, but toward the end it begins to fry far bigger fish.

Until about 10 years ago, molecular biology was incredibly protein-centric. Consider the following terms — nonsense codon, noncoding DNA, junk DNA. All are pejorative and arose from the view that all the genome does is code for protein. Nonsense codon means one of the 3 termination codons, which tells the ribosome to stop making protein. Noncoding DNA means not coding for protein (with the implication that DNA not coding for protein isn’t coding for anything).

Well all that has changed. The ENCODE Consortium showed that well over half (and probably all) our DNA is transcribed into RNA — for details see https://en.wikipedia.org/wiki/ENCODE. This takes energy, and it is doubtful (to me at least) that organisms would waste this much energy if the products were not doing something useful.

I’ve discussed microRNAs elsewhere — for details please see — https://luysii.wordpress.com/2010/07/14/junk-dna-that-isnt-and-why-chemistry-isnt-enough/. They don’t code for protein either, but control how much of a given protein is made.

The Neuron paper concerns lncRNAs (long nonCoding RNAs). They don’t code for protein either and contain over 200 nucleotides. There are a lot of them (10,000 – 50,000 are known to be expressed in man). Amazingly 40% of them are expressed in the brain, and not just in adult life, but during embryonic development. Expression of some of them is restricted to specific brain areas. It is easier for an embryologist to tell what type a cell is during brain cortical development by looking at the lncRNAs expressed than by the proteins a given cell is making. The paper contains multiple examples of the lncRNAs controlling when and where a protein is made in the brain.

lncRNAs can contain multiple domains, each of which has a different affinity for a particular RNA (such as the mRNA for a protein), or DNA, or protein. In the nucleus they influence the DNA binding sites of transcription factors, RNA polymerase II, and the polycomb repressor complex. The review goes on with many specific examples of lncRNA function, among them synaptic plasticity and neurite extension.

Getting back to proteins, the vast majority are nearly the same in all mammals (this is where the 2% Chimpanzee argument comes from). Here is where it gets interesting. Roughly 1/3 of lncRNAs found in man are primate specific. This includes hundreds of lncRNAs found only in man. The paper gives evidence that hundreds of them have undergone positive selection in humans.

So the paper provides yet another mechanism (with far more detail than I’ve been able to provide here) for why our brains are so much larger than, and in many ways different from, those of our nearest evolutionary relative, the chimpanzee. This is the largest molecular biological difference found so far for the human brain as opposed to every other brain. Fascinating stuff. Stay tuned. I think this is a watershed paper.

Why drug discovery is so hard: Reason #21 — RNA sequences won’t help you determine function

We are just beginning to understand all the things RNA does in the cell, even though its importance has been obvious to all for half a century (think messenger RNA, which goes back that far).  This means that RNA is likely to be a target of useful drugs.  Posts #4, #11 and #20 concern some of the more newly discovered effects of RNA in the cell.

While we’re still discovering proteins with no obvious resemblance  in their amino acid sequence to known proteins, most of them do have some resemblance we’ve seen before.  So if we see a kinase-like domain, or a group of 7 rather hydrophobic sequences, we have a leg up on what that protein is actually doing.

A similar attack (comparing sequences to RNAs of known function) should help us figure out what some of the RNAs in the cell not coding for protein are actually doing.  If you see a mistyke in this sentence, you still probably know what I meant (i.e. how that word is meant to function in the sentence).  That’s the hope underlying the technique anyway.

Recent work in the zebrafish [ Cell vol. 147 pp. 1537 – 1550 ’11 ] shows that this isn’t very likely in the RNA world. For some background on large intervening nonCoding RNAs (lincRNAs, aka lncRNAs) see https://luysii.wordpress.com/2011/03/02/we-dont-know-all-the-players-which-is-why-finding-good-drugs-is-so-hard/.  The zebrafish has become a plaything of embryologists (because it is transparent, and because, like other fish, it is a vertebrate).

At any rate the work found some 550 distinct lincRNAs in the zebrafish.  But only 29 had detectable sequence similarity with lincRNAs in mammals (which are just as numerous).  Even though chromosomes have been scrambled many times over geologic time, many genes near each other in the zebrafish are near each other in humans as well (the term for this is synteny).  This means one can look at DNA to see where the lincRNA is binding in two organisms, and infer that they’re doing something similar physiologically if they are binding to a syntenic site.
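Here is a minimal sketch of the synteny idea, assuming a lincRNA’s ‘site’ is identified by the protein-coding genes on either side of it.  The gene names, gene orders and ortholog table are all hypothetical, and this is not the method used in the paper, just an illustration of matching by neighborhood rather than by sequence.

```python
# Hypothetical gene orders along one chromosome in each species.
# Names starting with "linc" stand for lincRNA loci; the rest are protein-coding genes.
zebrafish_order = ["geneA", "linc_zf1", "geneB", "geneC", "linc_zf2", "geneD"]
human_order = ["GENEA", "linc_hs1", "GENEB", "GENEX", "linc_hs2", "GENED"]

# Hypothetical ortholog table: zebrafish protein-coding gene -> human gene.
orthologs = {"geneA": "GENEA", "geneB": "GENEB", "geneC": "GENEC", "geneD": "GENED"}


def is_linc(name):
    return name.lower().startswith("linc")


def flanking_genes(order):
    """Map each lincRNA to its nearest protein-coding neighbours (left, right)."""
    flanks = {}
    for i, name in enumerate(order):
        if not is_linc(name):
            continue
        left = next((g for g in reversed(order[:i]) if not is_linc(g)), None)
        right = next((g for g in order[i + 1:] if not is_linc(g)), None)
        flanks[name] = (left, right)
    return flanks


def syntenic_pairs(zf_order, hs_order, ortholog_table):
    """Pair lincRNAs whose flanking genes are orthologous, ignoring lincRNA sequence."""
    zf_flanks = flanking_genes(zf_order)
    hs_flanks = flanking_genes(hs_order)
    pairs = []
    for zf_name, (zf_left, zf_right) in zf_flanks.items():
        expected = {ortholog_table.get(zf_left), ortholog_table.get(zf_right)}
        if None in expected:
            continue  # a neighbour is missing or has no known ortholog
        for hs_name, (hs_left, hs_right) in hs_flanks.items():
            # Compare as sets so an inverted segment still counts as syntenic.
            if {hs_left, hs_right} == expected:
                pairs.append((zf_name, hs_name))
    return pairs


print(syntenic_pairs(zebrafish_order, human_order, orthologs))
```

Note that nothing in the matching step ever looks at the lincRNA sequences themselves, which is exactly why the approach can pair up lincRNAs with almost no sequence similarity.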

So they did this and found some lincRNAs with almost no sequence similarity to each other binding to identical syntenic sites in man and zebrafish.  Next they used antisense reagents targeting the small regions of the lincRNAs conserved between us and fish and produced developmental defects (in the fish).  Amazingly, despite very little sequence similarity, human orthologs (determined by synteny) could prevent the embryological defects.

So in this case at least, and probably more generally, we’re not going to be able to look at the sequence of lincRNAs (or the many other types of non messenger RNAs present in the cell) and infer what they are doing.  This will make drug discovery in this area even harder.