The coding capacity of our genome continues to amaze. The redundancy of the genetic code has been put to yet another use. If you already know the background, skip the following two links and read on. Otherwise, everything needed to understand what follows is in them.
https://luysii.wordpress.com/2011/05/03/the-death-of-the-synonymous-codon/
https://luysii.wordpress.com/2011/05/09/the-death-of-the-synonymous-codon-ii/
There really was no way around it. If you want to code for 20 different amino acids with only four choices at each position, two positions (4^2 = 16) won’t do. You need three positions, which gives you 64 possibilities (61 after the three stop codons are taken into account) and the redundancy that comes with it. The previous links show how the redundant codons for some amino acids aren’t redundant at all but are used to code for the speed of translation, or for exonic splicing enhancers and inhibitors. Different codons for the same amino acid can produce wildly different effects while leaving the amino acid sequence of a given protein alone.
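For anyone who wants to check the arithmetic, here is a small sketch in Python. It builds the standard genetic code table (written in DNA letters, with '*' marking a stop codon) and counts codons and synonyms. The variable names are my own, chosen for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# All 4^3 = 64 triplets, generated in the same order as the AMINO string.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
code = dict(zip(codons, AMINO))

# Drop the three stop codons to get the sense codons.
sense = {codon: aa for codon, aa in code.items() if aa != "*"}

# Group synonymous codons by the amino acid they specify.
synonyms = {}
for codon, aa in sense.items():
    synonyms.setdefault(aa, []).append(codon)

print(4 ** 2, "two-letter words: too few for 20 amino acids")
print(len(codons), "codons,", len(sense), "of them sense codons")
print(len(synonyms), "amino acids; leucine has", len(synonyms["L"]), "codons")
```

Only methionine and tryptophan have a single codon each. Every other amino acid has at least two, and that is the room the cell has to carry a second message.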
The following recent work [ Science vol. 342 pp. 1325 – 1326, 1367 – 1372 ’13 ] showed that transcription factors bind within the protein-coding sequences of genes, not just the promoters and enhancers found outside them as we had thought.
The principle behind the DNAaseI protection assay is pretty simple. Any protein binding to DNA protects it against DNAaseI, which chops up the rest. Then clone and sequence what’s left to see where proteins have bound to DNA. These protected stretches are called footprints. They must have removed histones first, I imagine.
The work performed DNAaseI protection assays on a truly massive scale. They looked at 81 different cell types at nucleotide resolution. They found 11,000,000 footprints all together, about 1,000,000 per cell type (so most footprints must show up in more than one cell type, since 81 × 1,000,000 is far more than 11,000,000). In a given cell type 25,000 were completely localized within exons (the parts of the gene actually specifying amino acids). When all the codons of the genome are looked at as a group, 1/7 of them are found in a footprint in at least one of the cell types.
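To make the 1/7 figure concrete, here is a toy sketch of how one might count the codons that fall in a footprint. Everything in it is an assumption of mine: the function name, the coordinates (0-based, half-open, relative to the start of a coding sequence), and the rule that a codon counts if any of its three bases is covered. The numbers are made up and are not the paper's data.

```python
from fractions import Fraction

def codons_in_footprints(cds_length, footprints):
    """Return (codons touched by a footprint, total codons).

    footprints: list of (start, end) pairs, 0-based and half-open,
    relative to the first base of the coding sequence (an assumed convention).
    """
    covered = set()
    for start, end in footprints:
        # Clip each footprint to the coding sequence before recording it.
        covered.update(range(max(start, 0), min(end, cds_length)))
    n_codons = cds_length // 3
    hits = sum(
        1
        for i in range(n_codons)
        if any(pos in covered for pos in (3 * i, 3 * i + 1, 3 * i + 2))
    )
    return hits, n_codons

# Made-up example: a 42 base coding sequence (14 codons) with one 6 base footprint.
hits, total = codons_in_footprints(42, [(3, 9)])
print(hits, "of", total, "codons, fraction", Fraction(hits, total))
```

Pooling footprints from more cell types just means passing a longer list, which is why the fraction can only grow as more cell types are examined.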
The results wouldn’t have been that spectacular had they just looked at a few cell types. How do we know the binding sites contain transcription factors? Because the footprints match transcription factor recognition sequences.
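Matching a footprint to a recognition sequence can be sketched as a simple pattern search. The example below looks for WGATAR, the consensus site for GATA transcription factors (W is A or T, R is A or G), on both strands of a made-up sequence. This is only an illustration of the idea under my own assumptions; real analyses usually score position weight matrices rather than a bare consensus, and I am not claiming this is what the authors did.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}

def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_motif(seq, motif):
    """Return (position, strand) for every match of motif in seq.

    Positions are 0-based on the given strand and mark the leftmost base
    of the site. A lookahead is used so overlapping sites are all reported.
    """
    pattern = re.compile("(?=" + "".join(IUPAC[b] for b in motif) + ")")
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    hits += [
        (len(seq) - m.start() - len(motif), "-") for m in pattern.finditer(rc)
    ]
    return sorted(hits)

# Made-up footprint sequence containing one GATA site on the forward strand.
footprint = "CCAGATAAGG"
print(find_motif(footprint, "WGATAR"))
```

Searching both strands matters because a transcription factor can sit on either one, and the sequencing read only reports one of them.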
We know that sequences around splice sites are used to code for splicing enhancers and inhibitors. Interestingly, the splice sites are generally depleted of DNAaseI footprints. Remember that splicing occurs after the gene has been transcribed.
At this point it isn’t clear how binding of a transcription factor in a protein coding region influences gene expression.
Just like a work of art, there is more than one way that DNA can mean. Remarkable!