Tag Archives: nonCoding DNA

Maybe there really is junk DNA

Until about 20 years ago, molecular biology was incredibly protein-centric.  Consider the following terms — nonsense codon, noncoding DNA, junk DNA.  All are pejorative and arose from the view that all the genome does is code for protein.  Nonsense codon means one of the 3 termination codons, which tells the ribosome to stop making protein.  Noncoding DNA means not coding for protein (with the implication that DNA not coding for protein isn’t coding for anything).

The term Junk DNA goes back to the 60s, a time of tremendous hubris as the grand biochemical plan of life was being discovered. People were not embarrassed to use the term ‘central dogma’ which was DNA makes RNA makes protein. It therefore came as a shock once we had a better handle on the size of the genome to discover that less than 2% of it coded for protein. Since much of it was made of repetitive sequences it was called junk DNA.

I never bought it, thinking it very dangerous to dismiss as unimportant what you did not understand or could not measure. Probably this was influenced by my experience as an Air Force M.D. ’68 – ’70 during the Vietnam war.

But now comes a sure-to-be-contentious but well-reasoned paper arguing that junk DNA does exist, even though it is occasionally transcribed [ Cell vol. 183 pp. 1151 – 1161 ’20 ]. The paper discusses all the RNAs in the cell that are not part of the ribosome and are not small nucleolar RNAs (snoRNAs) or microRNAs.

They note that no enzyme is perfect, acting only on the substrate we think evolution optimized it for; they call this promiscuous behavior. So a transcription factor which binds to a particular promoter sequence will also bind to near-miss sequences. Moreover such near misses are constantly being generated in our genome by random mutation. This is why they think that ENCODE (ENCyclopedia Of DNA Elements) found that the entire genome is transcribed into RNA. The implication drawn by many is that this transcription must be functional.
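To make the near-miss idea concrete, here is a toy Python sketch (mine, not the paper's). The motif and the stretch of DNA are made up for illustration. It simply counts the positions where a binding motif matches exactly, and then the positions where it matches with at most one mismatch.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def count_sites(sequence: str, motif: str, max_mismatches: int) -> int:
    """Count windows of the sequence within max_mismatches of the motif."""
    k = len(motif)
    return sum(
        hamming(sequence[i:i + k], motif) <= max_mismatches
        for i in range(len(sequence) - k + 1)
    )


# Hypothetical motif and hypothetical DNA, chosen only for illustration.
MOTIF = "TATAAA"
SEQUENCE = "GGTATAAACCTATAGACCTTTAAAGG"

exact = count_sites(SEQUENCE, MOTIF, 0)
near = count_sites(SEQUENCE, MOTIF, 1)
print(f"exact sites: {exact}, sites allowing one mismatch: {near}")
```

In this made-up stretch there is one exact site and two more that are a single base off, so tolerating one mismatch triples the number of places a sloppy binder could land.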

However many random pieces of DNA can activate transcription [ Genes Dev. vol. 30 pp. 1895 – 1907 ’16 ] producing what the authors call transcriptional noise.

There is evidence that the cell has evolved a way to stop some of this. U1 snRNP recognizes the 5′ splice site motif. It is present in nuclei at levels an order of magnitude higher than other spliceosomal subcomplexes, so it can monitor for RNAs which have a 5′ splice site motif but which lack the 3′ splice site. These RNAs are subsequently destroyed, never making it out of the nucleus.

They think the primary function of lncRNA is chromatin remodeling affecting gene expression — this is certainly true of XIST which silences one of the two X chromosomes females carry.

There is a lot more very technical molecular biology and close reasoning in the paper, but this should be enough to whet your interest. It is well worth reading. Probably, like me, you’ll be mentally arguing with the authors as you read it, but that’s the sign of a good paper.

Now for a question which has always puzzled me. Consider the leprosy organism. It’s a mycobacterium (like the organism causing TB), but because it essentially is confined to man, and lives inside humans for most of its existence, it has jettisoned large parts of its genome, first by throwing about 1/3 of it out (the genome is 1/3 smaller than TB from which it is thought to have diverged 66 million years ago), and second by mutation of many of its genes so protein can no longer be made from them. Why throw out all that DNA? The short answer is that it is metabolically expensive to produce and maintain DNA that you’re not using.

If you want a few numbers here they are:
Genome of M. TB 4,441,529 nucleotides
Genome of M. Leprae 3,268,203 nucleotides
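
For anyone who wants to check the 'about 1/3' figure against these two numbers, here is a minimal Python sketch. It uses only the genome sizes quoted above; the variable names are mine.

```python
# Genome sizes in nucleotides, as quoted above.
M_TB_GENOME = 4_441_529
M_LEPRAE_GENOME = 3_268_203

# Nucleotides missing from M. leprae relative to M. TB.
lost = M_TB_GENOME - M_LEPRAE_GENOME

# Fraction of an M. TB-sized genome that has been shed.
fraction_lost = lost / M_TB_GENOME

print(f"{lost:,} nucleotides shed ({fraction_lost:.1%} of the M. TB genome)")
```

On these numbers the shrinkage works out to a bit over a quarter, so 'about 1/3' is a generous rounding.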

Clearly microorganisms are under high selective pressure, and the paper says that humans are under almost none, but it seems to me that multicellular organisms would have found a way to get rid of DNA they don’t need.

It may well be that all this DNA and the RNA transcribed from it is evolutionary potting soil, waiting for some new environmental stress to put it to use.

Molecular biology’s oxymoron

Dear reader. What does a gene do? It codes for something. What does a nonCoding gene do? It also codes for something, just RNA instead of protein. It’s molecular biology’s very own oxymoron, a throwback to the heroic protein-centric early days of molecular biology. The term has been enshrined by usage for so long that it’s impossible to get rid of. Nonetheless, the latest work found even more nonCoding genes than genes actually coding for protein.

An amusing article from Nature (vol. 558 pp. 354 – 355 ’18) has the current state of play. The latest estimate comes from GTEx, which sequenced 900 billion RNAs found in various human tissues, matched them to the sequence(s) of the human genome, and used computer algorithms to determine which of them were the products of genes coding for proteins and which were the products of genes coding for something else.

The report from GTEx (the Genotype-Tissue Expression project) found 21,306 protein-coding genes and 21,856 non-coding genes; amazingly, there are more nonCoding genes than protein-coding ones. This is many more genes than are found in the two most widely used human gene databases. The GENCODE gene set, maintained by the EBI, includes 19,901 protein-coding genes and 15,779 non-coding genes. RefSeq, a database run by the US National Center for Biotechnology Information (NCBI), lists 20,203 protein-coding genes and 17,871 non-coding genes.
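
To put the three sets of counts side by side, here is a small Python sketch. The numbers are the ones quoted above; the dictionary layout and the function name are my own.

```python
# (protein-coding, non-coding) gene counts as quoted above.
gene_counts = {
    "GTEx": (21_306, 21_856),
    "GENCODE": (19_901, 15_779),
    "RefSeq": (20_203, 17_871),
}


def noncoding_share(coding: int, noncoding: int) -> float:
    """Fraction of all listed genes that are non-coding."""
    return noncoding / (coding + noncoding)


shares = {name: noncoding_share(*counts) for name, counts in gene_counts.items()}

for name, share in shares.items():
    coding, noncoding = gene_counts[name]
    print(f"{name:8s} total {coding + noncoding:6,d}  non-coding {share:.1%}")
```

Only the GTEx tally puts nonCoding genes past the halfway mark, and then only just; in GENCODE and RefSeq the protein coders are still in the majority.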

Stay tuned.  The fat lady hasn’t sung.

It ain’t the bricks it’s the plan — take II

A recent review in Neuron (vol. 88 pp. 861 – 877 ’15) gives a possible new explanation of how our brains came to be so different from those of apes (if not our behavior of late).

You’ve all heard that our proteins are only 2% different than the chimp, so we are 98% chimpanzee. The facts are correct, the interpretation wrong. We are far more than the protein ‘bricks’ that make us up, and two current papers in Cell [ vol. 163 pp. 24 – 26, 66 – 83 ’15 ] essentially prove this.

This is like saying Monticello and Independence Hall are just the same because they’re both made out of bricks. One could chemically identify Monticello bricks as coming from the Virginia piedmont, and Independence Hall bricks coming from the red clay of New Jersey, but the real difference between the buildings is the plan.

It’s not the proteins, but where and when and how much of them are made. The control for this (plan if you will) lies outside the genes for the proteins themselves, in the rest of the genome (remember only 2% of the genome codes for the amino acids making up our 20,000 or so protein genes). The control elements have as much right to be called genes, as the parts of the genome coding for amino acids. Granted, it’s easier to study genes coding for proteins, because we’ve identified them and know so much about them. It’s like the drunk looking for his keys under the lamppost because that’s where the light is.

All the molecular biology you need to understand what follows is in the following post — https://luysii.wordpress.com/2010/07/07/molecular-biology-survival-guide-for-chemists-i-dna-and-protein-coding-gene-structure.

The Neuron paper is detailed and fascinating to a neurologist, but toward the end it begins to fry far bigger fish.

Until about 10 years ago, molecular biology was incredibly protein-centric. Consider the following terms — nonsense codon, noncoding DNA, junk DNA. All are pejorative and arose from the view that all the genome does is code for protein. Nonsense codon means one of the 3 termination codons, which tells the ribosome to stop making protein. Noncoding DNA means not coding for protein (with the implication that DNA not coding for protein isn’t coding for anything).

Well all that has changed. The ENCODE Consortium showed that well over half (and probably all) our DNA is transcribed into RNA — for details see https://en.wikipedia.org/wiki/ENCODE. This takes energy, and it is doubtful (to me at least) that organisms would waste this much energy if the products were not doing something useful.

I’ve discussed microRNAs elsewhere — for details please see — https://luysii.wordpress.com/2010/07/14/junk-dna-that-isnt-and-why-chemistry-isnt-enough/. They don’t code for protein either, but control how much of a given protein is made.

The Neuron paper concerns lncRNAs (long nonCoding RNAs). They don’t code for protein either and contain over 200 nucleotides. There are a lot of them (10,000 – 50,000 are known to be expressed in man). Amazingly 40% of them are expressed in the brain, and not just in adult life, but during embryonic development. Expression of some of them is restricted to specific brain areas. It is easier for an embryologist to tell what type a cell is during brain cortical development by looking at the lncRNAs it expresses than by looking at the proteins it is making. The paper contains multiple examples of lncRNAs controlling when and where a protein is made in the brain.

lncRNAs can contain multiple domains, each of which has a different affinity for a particular RNA (such as the mRNA for a protein), or DNA, or protein. In the nucleus they influence the DNA binding sites of transcription factors, RNA polymerase II, and the polycomb repressor complex. The review goes on with many specific examples of lncRNA function, among them synaptic plasticity and neurite extension.

Getting back to proteins, the vast majority are nearly the same in all mammals (this is where the 2% Chimpanzee argument comes from). Here is where it gets interesting. Roughly 1/3 of lncRNAs found in man are primate specific. This includes hundreds of lncRNAs found only in man. The paper gives evidence that hundreds of them have undergone positive selection in humans.

So the paper provides yet another mechanism (with far more detail than I’ve been able to provide here) for why our brains are so much larger than, and different in many ways from, those of our nearest evolutionary relative, the chimpanzee. This is the largest molecular biological difference found so far for the human brain as opposed to every other brain. Fascinating stuff. Stay tuned. I think this is a watershed paper.

None dare call it junk

There has been a huge amount of controversy about whether all the DNA we carry about has some purpose to carry out — or not. Could some of it be ‘junk’?

At most 2% of our DNA actually codes for the amino acids comprising our proteins. Some (particularly the ENCODE consortium) have used the criterion of transcription of the DNA into RNA (a process which takes energy) as a sign that well over 50% of our genome is NOT junk. Others regard this transcription as the unused turnings from a lathe.

All agree however, that bacteria use a good deal of their small genomes to code for protein. The following paper http://www.pnas.org/content/112/14/4251.full quotes a figure of 84 – 89%.

Consider the humble leprosy organism. It’s a mycobacterium (like the organism causing TB), but because it essentially is confined to man, and lives inside humans for most of its existence, it has jettisoned large parts of its genome, first by throwing about 1/3 of it out (the genome is 1/3 smaller than TB from which it is thought to have diverged 66 million years ago), and second by mutation of many of its genes so protein can no longer be made from them. Why throw out all that DNA? The short answer is that it is metabolically expensive to produce and maintain DNA that you’re not using.

If you want a few numbers here they are:
Genome of M. TB 4,441,529 nucleotides
Genome of M. Leprae 3,268,203 nucleotides
1,604 genes coding for protein
1,116 pseudoGenes (i.e. genes that look like they could code for proteins, but no longer can because of premature termination codons).
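
Taking the two gene counts above as they stand, a short Python sketch shows how much of the gene list has decayed into pseudoGenes. The variable names are mine.

```python
# Gene counts as quoted above.
PROTEIN_CODING_GENES = 1_604
PSEUDOGENES = 1_116

# Everything that is, or once was, a protein-coding gene.
total_gene_like = PROTEIN_CODING_GENES + PSEUDOGENES

# Share of that total which can no longer make protein.
pseudogene_share = PSEUDOGENES / total_gene_like

print(f"{PSEUDOGENES:,} of {total_gene_like:,} gene-like sequences "
      f"({pseudogene_share:.0%}) are pseudoGenes")
```

So roughly 4 in 10 of the recognizable genes are broken, on top of the DNA already thrown out.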

This brings us to the organism described in the paper, Trichodesmium erythraeum, a photosynthetic bacterium living in the ocean. When conditions are right it multiplies rapidly, causing a red ‘algal’ bloom (even though it isn’t an alga; it is a bacterium, and algae are eukaryotes). It’s probably how the Red Sea got its name.

The organism only uses 64% of its genome to code for its protein. The most interesting point is that 86% of the nonCoding (for protein anyway) DNA is transcribed into RNA.
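
The two percentages combine into a simple breakdown of the whole genome. Here is a minimal Python sketch using only the 64% and 86% figures just quoted; the variable names are mine, and it assumes the 86% applies to all of the DNA not coding for protein.

```python
# Figures quoted above for Trichodesmium erythraeum.
CODING_FRACTION = 0.64          # genome fraction coding for protein
TRANSCRIBED_OF_NONCODING = 0.86  # fraction of nonCoding DNA made into RNA

noncoding_fraction = 1 - CODING_FRACTION

# Genome fractions: nonCoding DNA that is transcribed, and that is silent.
transcribed_noncoding = noncoding_fraction * TRANSCRIBED_OF_NONCODING
silent_noncoding = noncoding_fraction - transcribed_noncoding

print(f"protein-coding:         {CODING_FRACTION:.0%}")
print(f"nonCoding, transcribed: {transcribed_noncoding:.0%}")
print(f"nonCoding, silent:      {silent_noncoding:.0%}")
```

On this reading nearly a third of the genome is transcribed without coding for protein, and only about 5% is neither coding nor transcribed.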

The authors wrestle with the question of what the nonCoding DNA is doing.

“Because it is thought that many bacteria are deletion-biased (47, 77), stable maintenance of these elements from laboratory isolates to the natural samples suggest that they may be required in some fashion for growth both in culture and in situ.”

Translation: The nonCoding DNA probably isn’t junk.

They give it another shot.

“Others have hypothesized that the conserved repeat structures observed in some bacteria could function as recombination-dependent “promoter banks” for adaptation to new conditions, thereby allowing relatively quick “rewiring” of metabolism in subpopulations”

Plausible, but why waste the energy transcribing the DNA into RNA if it isn’t doing anything for the organism doing the transcribing?

Never assume that what you can’t measure or don’t understand is unimportant.