Until about 20 years ago, molecular biology was incredibly protein-centric. Consider the following terms: nonsense codon, noncoding DNA, junk DNA. All are pejorative, and all arose from the view that the genome's only job is to code for protein. A nonsense codon is one of the three termination codons, which tell the ribosome to stop making protein. Noncoding DNA means DNA not coding for protein (with the implication that DNA not coding for protein isn't coding for anything).
The term junk DNA goes back to the 60s, a time of tremendous hubris as the grand biochemical plan of life was being discovered. People were not embarrassed to use the term 'central dogma': DNA makes RNA makes protein. It therefore came as a shock, once we had a better handle on the size of the genome, to discover that less than 2% of it coded for protein. Since much of the rest was made of repetitive sequences, it was called junk DNA.
I never bought it, thinking it very dangerous to dismiss as unimportant what you did not understand or could not measure. Probably this was influenced by my experience as an Air Force M.D. ’68 – ’70 during the Vietnam war.
But now comes a sure-to-be-contentious but well-reasoned paper arguing that junk DNA does exist, even though it is occasionally transcribed [ Cell vol. 183 pp. 1151 – 1161 ’20 ]. The paper discusses all cellular RNAs other than those of the ribosome, the small nucleolar RNAs (snoRNAs), and the microRNAs.
They note that no enzyme is perfectly specific, acting only on the substrate we think evolution optimized it for; they call this promiscuous behavior. So a transcription factor which binds to a particular promoter sequence will also bind to near-miss sequences. Moreover, such near misses are constantly being generated in our genome by random mutation. This, they think, is why the ENCODE (ENCyclopedia Of DNA Elements) project found that the entire genome is transcribed into RNA. The implication many have drawn is that all this transcription must be functional.
However many random pieces of DNA can activate transcription [ Genes Dev. vol. 30 pp. 1895 – 1907 ’16 ] producing what the authors call transcriptional noise.
There is evidence that the cell has evolved a way to stop some of this. U1 snRNP recognizes the 5′ splice site motif. It is present in the nucleus at a concentration an order of magnitude higher than that of the other spliceosomal subcomplexes, so it can monitor for RNAs which carry a 5′ splice site motif but lack a 3′ splice site. These RNAs are subsequently destroyed, never making it out of the nucleus.
They think the primary function of long noncoding RNA (lncRNA) is chromatin remodeling affecting gene expression; this is certainly true of XIST, which silences one of the two X chromosomes females carry.
There is a lot more very technical molecular biology and close reasoning in the paper, but this should be enough to whet your interest. It is well worth reading. Probably, like me, you’ll be mentally arguing with the authors as you read it, but that’s the sign of a good paper.
Now for a question which has always puzzled me. Consider the leprosy organism. It’s a mycobacterium (like the organism causing TB), but because it is essentially confined to humans, living inside us for most of its existence, it has jettisoned large parts of its genome: first by throwing out roughly a quarter of it (the genome is about 26% smaller than that of TB, from which it is thought to have diverged 66 million years ago), and second by mutation of many of its genes, so that protein can no longer be made from them. Why throw out all that DNA? The short answer is that it is metabolically expensive to produce and maintain DNA that you’re not using.
If you want a few numbers, here they are:
Genome of M. tuberculosis: 4,441,529 nucleotides
Genome of M. leprae: 3,268,203 nucleotides
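Using just the two genome sizes above, a quick back-of-the-envelope check of the reduction (a minimal sketch; the figures are the ones quoted in this post):

```python
# Genome sizes quoted above, in nucleotides
mtb_genome = 4_441_529      # M. tuberculosis
mleprae_genome = 3_268_203  # M. leprae

# Fractional reduction of the M. leprae genome relative to M. tuberculosis
reduction = 1 - mleprae_genome / mtb_genome
print(f"M. leprae genome is {reduction:.1%} smaller")  # prints "M. leprae genome is 26.4% smaller"
```

So the size reduction is closer to a quarter of the genome than a third; the larger losses are in gene content, since many of the surviving genes are pseudogenes.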
Clearly microorganisms are under high selective pressure, and the paper says that humans are under almost none, but it seems to me that multicellular organisms would have found a way to get rid of DNA they don’t need.
It may well be that all this DNA and the RNA transcribed from it is evolutionary potting soil, waiting for some new environmental stress to put it to use.
Comments
Susumu Ohno coined the term “junk DNA” in 1972. The original paper is hard to get, but you can read it at http://www.junkdna.com/ohno.html (be careful though, it’s embedded in a page of weird DNA fractal theories). You can see Ohno specifically mentioned structural genes like centromeres and the long tandem repeats surrounding them, regulatory genes and regulatory sequences (promoters and operators), and also introduced the idea of a “spacer” sequence of untranscribed or untranslated DNA that keeps genes some certain distance apart. Remember also that the primary structure of tRNAs was reported in 1965, and there is plenty of work on ribosomal RNAs from around then as well: https://doi.org/10.1016/S0022-2836(60)80029-0. The Nobel Prize for Chemistry in 1989 was given for discovery of catalytic properties of RNA.
My point is not to be picky with dates; it is to say that no knowledgeable scientist has ever believed that “Until about 20 years ago…all the genome does is code for protein”. The reason we know so much about what is and isn’t junk is that we have been studying it for 50 years. More and more functional DNA sequences are discovered every year, but it doesn’t change the fact that a full ~44% of our genomes is defective DNA transposons and retrotransposons, and another ~9% is old broken viruses. Your point about junk DNA being “evolutionary potting soil” is very perceptive, but was also proposed by Ohno in the 1972 paper, saying that new genes can evolve here “sheltered from the relentless pressure of natural selection”. The experts have understood this from the beginning.
Handles: Thanks for commenting. I don’t know why comments don’t show up for 24 hours. Unfortunately I didn’t make myself clear. The term junk DNA arose long before 2000, as you note (as did the pejorative term nonsense codon), but the idea that the genome mostly just coded for protein was alive and well 20 years ago (and if not 20 years ago, certainly 30).