Maybe there really is junk DNA

Until about 20 years ago, molecular biology was incredibly protein-centric.  Consider the following terms — nonsense codon, noncoding DNA, junk DNA.  All are pejorative and arose from the view that all the genome does is code for protein.  Nonsense codon means one of the 3 termination codons, which tells the ribosome to stop making protein.  Noncoding DNA means not coding for protein (with the implication that DNA not coding for protein isn’t coding for anything).

The term Junk DNA goes back to the 60s, a time of tremendous hubris as the grand biochemical plan of life was being discovered. People were not embarrassed to use the term ‘central dogma’ which was DNA makes RNA makes protein. It therefore came as a shock once we had a better handle on the size of the genome to discover that less than 2% of it coded for protein. Since much of it was made of repetitive sequences it was called junk DNA.

I never bought it, thinking it very dangerous to dismiss as unimportant what you did not understand or could not measure. Probably this was influenced by my experience as an Air Force M.D. ’68 – ’70 during the Vietnam war.

But now comes a sure-to-be-contentious but well-reasoned paper arguing that junk DNA does exist, even though it is occasionally transcribed [ Cell vol. 183 pp. 1151 – 1161 ’20 ]. The paper discusses all RNAs in the cell that are not part of the ribosome and are not small nucleolar RNAs (snoRNAs) or microRNAs.

They note that no enzyme is perfect, acting only on the substrate we think evolution optimized it for; they call this promiscuous behavior. So a transcription factor which binds to a particular promoter sequence will also bind to near-miss sequences. Moreover such near misses are constantly being generated in our genome by random mutation. This, they think, is why ENCODE (ENCyclopedia Of DNA Elements) found that the entire genome is transcribed into RNA. The implication drawn by many is that this transcription must be functional.

However many random pieces of DNA can activate transcription [ Genes Dev. vol. 30 pp. 1895 – 1907 ’16 ] producing what the authors call transcriptional noise.

There is evidence that the cell has evolved a way to stop some of this. U1 snRNP recognizes the 5′ splice site motif. It is present in nuclei at a level an order of magnitude higher than other spliceosomal subcomplexes, so it monitors for RNAs which have a 5′ splice site motif but which lack the 3′ splice site. These RNAs are subsequently destroyed, never making it out of the nucleus.

They think the primary function of lncRNA is chromatin remodeling affecting gene expression — this is certainly true of XIST which silences one of the two X chromosomes females carry.

There is a lot more very technical molecular biology and close reasoning in the paper, but this should be enough to whet your interest. It is well worth reading. Probably, like me, you’ll be mentally arguing with the authors as you read it, but that’s the sign of a good paper.

Now for a question which has always puzzled me. Consider the leprosy organism. It’s a mycobacterium (like the organism causing TB), but because it is essentially confined to man, and lives inside humans for most of its existence, it has jettisoned large parts of its genome, first by throwing over a quarter of it out (by the numbers below, the genome is about 26% smaller than that of TB, from which it is thought to have diverged 66 million years ago), and second by mutation of many of its genes so protein can no longer be made from them. Why throw out all that DNA? The short answer is that it is metabolically expensive to produce and maintain DNA that you’re not using.

If you want a few numbers here they are:
Genome of M. TB 4,441,529 nucleotides
Genome of M. leprae 3,268,203 nucleotides
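
As a quick check on those figures, here is a minimal Python sketch that works out how much smaller the leprosy genome is than the TB genome. Only the two genome sizes come from the numbers above; the variable names are my own.

```python
# Genome sizes in nucleotides, taken from the figures quoted above.
m_tb_genome = 4_441_529
m_leprae_genome = 3_268_203

# Nucleotides lost relative to M. TB, and the fraction of the
# M. TB genome that this loss represents.
nucleotides_lost = m_tb_genome - m_leprae_genome
fraction_lost = nucleotides_lost / m_tb_genome

print(f"Nucleotides lost: {nucleotides_lost:,}")
print(f"Fraction of the M. TB genome lost: {fraction_lost:.1%}")
```

By these numbers the loss is a bit over a quarter of the genome, somewhat short of a full third.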

Clearly microorganisms are under high selective pressure, and the paper says that humans are under almost none, but it seems to me that multicellular organisms would have found a way to get rid of DNA they don’t need.

It may well be that all this DNA and the RNA transcribed from it is evolutionary potting soil, waiting for some new environmental stress to put it to use.

Marshall McLuhan rides again

Marshall McLuhan famously said “the medium is the message”. Who knew he was talking about molecular biology?  But he was, if you think of the process of transcription of DNA into various forms of RNA as the medium and the products of transcription as the message.  That’s exactly what this paper [ Cell vol. 171 pp. 103 – 119 ’17 ] says.

T cells are a type of immune cell formed in the thymus.  One of the important transcription factors which turns on expression of the genes which make a T cell a T cell is called Bcl11b.  Early in T cell development it is sequestered away near the nuclear membrane in highly compacted DNA. Remember that you must compress your 1 meter of DNA down by 100,000-fold to have it fit in the nucleus, which is 1/100,000th of a meter (10 microns) across.

What turns it on?  Transcription of a nonCoding (for protein) RNA called ThymoD.  From my reading of the paper, ThymoD itself doesn’t do anything, but just the act of opening up compacted DNA near the nuclear membrane, produced by transcribing ThymoD, is enough to cause this part of the genome to move into the center of the nucleus where the gene for Bcl11b can be transcribed into RNA.

There’s a lot more to the paper,  but that’s the message if you will.  It’s the act of transcription rather than what is being transcribed which is important.

Here’s more about McLuhan — https://en.wikipedia.org/wiki/Marshall_McLuhan

If some of the terms used here are unfamiliar — look at the following post and follow the links as far as you need to.  https://luysii.wordpress.com/2010/07/07/molecular-biology-survival-guide-for-chemists-i-dna-and-protein-coding-gene-structure/

Well that was an old post.  Here’s another example [ Cell vol. 173 pp. 1318 – 1319, 1398 – 1412 ’18 ].  It concerns a gene called PVT1 (Plasmacytoma Variant Translocation 1) found 25 years ago.  It was the first gene coding for a long nonCoding (for protein) RNA (lncRNA) found as a recurrent breakpoint in Burkitt’s lymphoma, which sadly took a friend (Nick Cozzarelli) far too young (he edited PNAS for 10 years).

So PVT1 is involved in cancer.  The translocation turns on expression of the myc oncogene, something that has been studied out the gazoo, and we’re still not sure of how it causes cancer. I’ve got 60,000 characters of notes on the damn thing, but as someone said 6 years ago “Whatever the latest trend in cancer biology — cell cycle, cell growth, apoptosis, metabolism, cancer stem cells, microRNAs, angiogenesis, inflammation — Myc is there regulating most of the key genes”.

We do know that the lncRNA coded by PVT1 in some way stabilizes the myc protein [ Nature vol. 512 pp. 82 – 87 ’14 ].  However the cell experiments knocked out the lncRNA of PVT1 and myc expression was still turned on.

PVT1 resides 53 kiloBases away from myc on chromosome #8.  Since each base is 3.3 Angstroms thick, that’s 175,000 Angstroms, or 17,500 nanoMeters, or 17.5 microns if the DNA is stretched out into the B-DNA form seen in all the textbooks.  That’s about 1.75 times the diameter of the average nucleus (10 microns).  You can get an idea of how compacted DNA is in the nucleus when you realize that there are 3,200,000,000/53,000 = 60,000 such segments in the genome, all packed into a sphere 10 microns in diameter.
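
The unit conversions here are easy to slip on, so here is a minimal Python sketch of the arithmetic. The input figures (53 kiloBases, 3.3 Angstroms per base, a 10 micron nucleus, a 3.2 billion base genome) are the ones used in the text; the variable names are illustrative only.

```python
# Figures quoted in the text; variable names are illustrative.
separation_bases = 53_000          # distance between PVT1 and myc
rise_per_base_angstrom = 3.3       # thickness of one base in B-DNA
nucleus_diameter_micron = 10.0     # average nuclear diameter
genome_bases = 3_200_000_000       # genome size used in the text

ANGSTROM_PER_NANOMETER = 10
NANOMETER_PER_MICRON = 1_000

# Length of the fully stretched-out segment in each unit.
length_angstrom = separation_bases * rise_per_base_angstrom
length_nanometer = length_angstrom / ANGSTROM_PER_NANOMETER
length_micron = length_nanometer / NANOMETER_PER_MICRON

# How the stretched segment compares with the nucleus, and how many
# segments of this size the whole genome contains.
ratio_to_nucleus = length_micron / nucleus_diameter_micron
segments_in_genome = genome_bases / separation_bases

print(f"{length_angstrom:,.0f} Angstroms")
print(f"{length_nanometer:,.0f} nanoMeters")
print(f"{length_micron:.1f} microns")
print(f"{ratio_to_nucleus:.2f} times the nuclear diameter")
print(f"{segments_in_genome:,.0f} such segments in the genome")
```

Worked through this way, the stretched-out segment comes to about 17.5 microns, longer than the nucleus is wide, which makes the point about compaction even more strongly.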

To cut to the chase, within the PVT1 gene there are at least 4 enhancers (use the link above to find what all the terms to be used actually mean).  Briefly, enhancers are what promoters bind to in order to help turn on the transcription of the genes in DNA into RNA (messenger and otherwise).  This means that the promoter of PVT1 binds one or more of the enhancers, preventing the promoter of the myc oncogene from binding them.

Just how they know that there are 4 enhancers in PVT1 is a story in itself.  They cut various parts of the PVT1 gene (which itself has 306,721 basepairs) out, place them in front of a reporter gene and see if transcription increases.

The actual repressor of myc is the promoter of PVT1 according to the paper (it binds to the enhancers present in the gene body preventing the myc promoter from doing so).  Things may be a bit more complicated as the PVT1 gene also codes for a cluster of 7 microRNAs and what they do isn’t explained in the paper.

So it’s as if the sardonic sense of humor of ‘nature’, ‘evolution’, ‘God’, (call it what you will) has set molecular biologists off on a wild goose chase, looking at the structure of the gene product (the lncRNA) to determine the function of the gene, when actually it’s the promoter in front of the gene and the enhancers within which are performing the function.

The mechanism may be more widespread, as 4/36 lncRNA promoters silenced by CRISPR techniques subsequently activated genes in a 1 megaBase window (possibly by the same mechanism as PVT1 and myc).

Where does McLuhan come in?  The Cell paper also notes that lncRNA gene promoters are more evolutionarily conserved than their gene bodies.  So the medium (promoter, enhancer) is the message once again (rather than what we thought the message was).


Why drug discovery is hard #29 — a very old player doing a very new thing

We all know what RNA does don’t we?  It binds to other RNAs and to DNA.  Sure lots of new forms of RNA have been found: microRNAs, competitive endogenous RNA (ceRNA), long nonCoding (for protein) RNA (lncRNA), piwiRNAs, small interfering RNAs (siRNAs) ...  The list appears endless.  But the basic mechanism of action of RNA in the cell is binding to some other polynucleotide (RNA or DNA) and affecting its function.

Not so fast.  A new paper http://science.sciencemag.org/content/358/6366/1051 describes lncRNA-ACOD1, a cellular RNA induced by a variety of viruses.  lncRNA-ACOD1 binds to an enzyme, enhancing its catalytic efficiency.  Now that’s new.  Certainly RNAs and proteins bind to each other in the ribosome, and in RNase P, but there the proteins serve to structure the RNA so it can carry out its catalytic function, not the other way around.

The enzyme bound is called GOT2 (Glutamic Oxaloacetic Transaminase 2).  Much interesting cellular biochemistry is discussed in the paper which I’ll skip, except to say that the virus uses the hyped up GOT2 to repurpose the cell’s metabolic machinery for its own evil ends.

lncRNA-ACOD1 has 3 exons and a polyAdenine tail.  There are two transcript variants containing  2,330 and 2,259 nucleotides.  There are only 100 copies/cell.  lncRNA-ACOD1 nucleotides #165 – #390 bind to amino acids #54 – #68 of GOT2.

So what are the other 2000 or so nucleotides of lncRNA-ACOD1 doing?   The phenomenon of RNA binding to protein is quite likely to be more widespread.  Both the GOT2 interacting motif and the interacting sequence of lncRNA-ACOD1 are well conserved across species of hosts and viruses.
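
The “other 2000 or so” figure can be checked with a minimal Python sketch. It assumes that the nucleotide numbers #165 – #390 are inclusive positions along the transcript (my reading, not something the text spells out); the variable names are my own.

```python
# Binding region on lncRNA-ACOD1, assumed to be inclusive positions.
binding_start, binding_end = 165, 390
binding_length = binding_end - binding_start + 1

# Lengths of the two transcript variants quoted in the text.
transcript_lengths = [2_330, 2_259]

# Nucleotides in each variant that lie outside the GOT2 binding region.
unaccounted = [length - binding_length for length in transcript_lengths]

for length, rest in zip(transcript_lengths, unaccounted):
    share_bound = binding_length / length
    print(f"{length:,} nt transcript: {binding_length} nt bind GOT2 "
          f"({share_bound:.1%}), {rest:,} nt unaccounted for")
```

Under that assumption only about a tenth of either transcript is involved in binding GOT2.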

Although viruses co-opt lncRNA-ACOD1, it is normally expressed in the heart as is GOT2 with no viral infection at all.  So we have likely stumbled onto an entirely new method of cellular metabolic control, AND a whole new set of players and interactions for drugs to act on (if they aren’t already doing this unknown to us).

This is series member #29 of why drug development is hard, most of which concentrated on the fact that we don’t know all the players.  lncRNA-ACOD1 is different — RNA is a player we’ve known for a very long time  but it appears to be playing a game entirely new to us.

It is also good to see cutting edge research like this coming out of China.  Hopefully it will stand up, but enough questionable stuff has come from them that every Chinese paper is under a cloud.

This is why I love reading the current literature.  You never know what you’re going to find.  It’s like opening presents.

It ain’t the bricks it’s the plan — take II

A recent review in Neuron (vol. 88 pp. 681 – 677 ’15) gives a possible new explanation of how our brains came to be so different from apes (if not our behavior of late).

You’ve all heard that our proteins are only 2% different than the chimp, so we are 98% chimpanzee. The facts are correct, the interpretation wrong. We are far more than the protein ‘bricks’ that make us up, and two current papers in Cell [ vol. 163 pp. 24 – 26, 66 – 83 ’15 ] essentially prove this.

This is like saying Monticello and Independence Hall are just the same because they’re both made out of bricks. One could chemically identify Monticello bricks as coming from the Virginia piedmont, and Independence Hall bricks coming from the red clay of New Jersey, but the real difference between the buildings is the plan.

It’s not the proteins, but where and when and how much of them are made. The control for this (plan if you will) lies outside the genes for the proteins themselves, in the rest of the genome (remember only 2% of the genome codes for the amino acids making up our 20,000 or so protein genes). The control elements have as much right to be called genes, as the parts of the genome coding for amino acids. Granted, it’s easier to study genes coding for proteins, because we’ve identified them and know so much about them. It’s like the drunk looking for his keys under the lamppost because that’s where the light is.

All the molecular biology you need to understand what follows is in the following post — https://luysii.wordpress.com/2010/07/07/molecular-biology-survival-guide-for-chemists-i-dna-and-protein-coding-gene-structure.

The Neuron paper is detailed and fascinating to a neurologist, but toward the end it begins to fry far bigger fish.

Until about 10 years ago, molecular biology was incredibly protein-centric. Consider the following terms — nonsense codon, noncoding DNA, junk DNA. All are pejorative and arose from the view that all the genome does is code for protein. Nonsense codon means one of the 3 termination codons, which tells the ribosome to stop making protein. Noncoding DNA means not coding for protein (with the implication that DNA not coding for protein isn’t coding for anything).

Well all that has changed. The ENCODE Consortium showed that well over half (and probably all) our DNA is transcribed into RNA — for details see https://en.wikipedia.org/wiki/ENCODE. This takes energy, and it is doubtful (to me at least) that organisms would waste this much energy if the products were not doing something useful.

I’ve discussed microRNAs elsewhere — for details please see — https://luysii.wordpress.com/2010/07/14/junk-dna-that-isnt-and-why-chemistry-isnt-enough/. They don’t code for protein either, but control how much of a given protein is made.

The Neuron paper concerns lncRNAs (long nonCoding RNAs). They don’t code for protein either and contain over 200 nucleotides. There are a lot of them (10,000 – 50,000 are known to be expressed in man). Amazingly 40% of them are expressed in the brain, and not just in adult life, but during embryonic development. Expression of some of them is restricted to specific brain areas. It is easier for an embryologist to tell what type a cell is during brain cortical development by looking at the lncRNAs expressed than by the proteins a given cell is making. The paper contains multiple examples of lncRNAs controlling when and where a protein is made in the brain.

lncRNAs can contain multiple domains, each of which has a different affinity for a particular RNA (such as the mRNA for a protein), or DNA, or protein. In the nucleus they influence the DNA binding sites of transcription factors, RNA polymerase II, and the polycomb repressor complex. The review goes on with many specific examples of lncRNA function: synaptic plasticity, neurite extension.

Getting back to proteins, the vast majority are nearly the same in all mammals (this is where the 2% chimpanzee argument comes from). Here is where it gets interesting. Roughly 1/3 of lncRNAs found in man are primate specific. This includes hundreds of lncRNAs found only in man. The paper gives evidence that hundreds of them have undergone positive selection in humans.

So the paper provides yet another mechanism (with far more detail than I’ve been able to provide here) for why our brains are so much larger, and different in many ways, than those of our nearest evolutionary relative, the chimpanzee. This is the largest molecular biological difference found so far for the human brain as opposed to every other brain. Fascinating stuff. Stay tuned. I think this is a watershed paper.