My cousin provides technical advice to a large mutual fund. He knows that whenever he goes away something big happens to the market. He’s been in Egypt now since Sunday and of course the market fell out of bed this week.
Similarly, I thought I’d wrapped things up prior to leaving for band camp Sunday when two juicy papers appeared, the most recent in last Friday’s Cell (6 August). They involve molecular biology (but with plenty of chemistry involved). I have no idea how much molecular biology the average chemist has under the belt, so here are two links which should provide all the background you need. If not, let me know about it.
The first [ Science vol. 329 pp. 284 – 285, 336 – 339 ’10 (16 July ’10) ] consisted of an editorial with the great title “Hiding in Plain Sight” and a paper. Each position in our DNA holds one of 4 nucleotides (bases): A, G, T, and C. A pair of nucleotides has 16 possibilities, not enough to code for the 20 amino acids making up our proteins. There are 64 possibilities for 3 nucleotides, certainly overkill, but that’s the way it is. A group of 3 contiguous nucleotides is called a codon, and all (except 3) code for an amino acid. Since there are only 20 amino acids, some of the remaining 61 must code for the same amino acid. The 3 that don’t code for an amino acid are called stop codons, because they tell the ribosome to stop making the protein.
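The counting in the paragraph above can be checked in a few lines. This is only a sketch: the names `BASES`, `STOP_CODONS` and `count_words` are mine, and the three stop codons listed are those of the standard genetic code, written in the DNA alphabet.

```python
from itertools import product

BASES = "AGTC"
# Stop codons of the standard genetic code, written as DNA (T rather than U).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def count_words(length):
    """Number of distinct nucleotide strings of the given length."""
    return len(BASES) ** length


# Enumerate every possible triplet, then set aside the stop codons.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
sense_codons = [c for c in codons if c not in STOP_CODONS]

print(count_words(2))     # pairs: too few for 20 amino acids
print(count_words(3))     # triplets: more than enough
print(len(sense_codons))  # codons left over to share among 20 amino acids
```

With 61 codons left for 20 amino acids, most amino acids necessarily have more than one codon.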
Assume a completely random sequence of A’s, G’s, T’s and C’s. How often will a stretch of it happen NOT to have a stop codon in it? Each codon has a 61 in 64 chance of not being a stop codon, so 30 nucleotides in a row (coding for 10 amino acids) will lack a stop codon only about 62% of the time. 60 nucleotides will lack a stop codon only 38% of the time, and 300 nucleotides (coding for a protein of 100 amino acids) will lack one just 0.8% of the time. Long stretches of nucleotides without stop codons are called open reading frames (ORFs). This is why protein prediction algorithms have a threshold of detection of 100 amino acids. I’m not clear how they deal with the interrupted (by introns) genes which we have.
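Here is the arithmetic behind those percentages, as a rough sketch. It assumes each base is independent and equally likely, that we read a single reading frame, and that there are 3 stop codons out of 64. The function name `prob_no_stop` is just something I made up for illustration.

```python
def prob_no_stop(n_codons, n_stop=3, n_total=64):
    """Chance that n_codons random codons in a row contain no stop codon.

    Assumes all codons are equally likely and independent of one another.
    """
    return ((n_total - n_stop) / n_total) ** n_codons


# The three cases from the text: 30, 60 and 300 nucleotides.
for n in (10, 20, 100):
    print(f"{3 * n} nucleotides ({n} codons): {prob_no_stop(n):.1%}")
```

The chance falls off geometrically with length, which is why a stop-free stretch of 100 codons in random sequence is rare enough to be taken seriously as a candidate gene.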
This brings us to polished-rice (pri), a gene involved in the embryology of that workhorse of genetics, the fruitfly (Drosophila). For some reason people who work with Drosophila love weird names, and mutations in pri cause a lack of bristles — which the scientists working with Drosophila call shavenbaby (this has to make funding agencies look twice at what they’re doing).
So what does pri consist of? The pri gene codes for a large nonCoding RNA (not coding for protein in the current lingo). Why nonCoding? Because it’s got lots of stop codons in it. Nonetheless pri codes for 5 small peptides ranging in size from 11 to 32 amino acids, which are highly conserved among insects. What do they do? They direct removal of part of the protein coded for by the shavenbaby gene. So even though pri makes a large nonCoding RNA, it definitely has an impact, and here’s a gene that makes small peptides. Not only that but it’s coding for chains of amino acids large enough to be called peptides, but not large enough to be called proteins (exactly where you locate the boundary is a matter of taste).
So how many functional peptides might be hidden among the messenger RNAs (mRNAs) we already know about? Some 40% of Drosophila mRNAs contain such short open reading frames (called uORFs, for upstream ORFs, because they sit ahead of the main coding sequence). The party is about to start with potentially a huge number of completely new players we knew nothing about until last month.
The second paper is even wilder [ Cell vol. 142 pp. 358 – 360, 409 – 419 ’10 ]. A protein called p53 is a central player in whether cells (particularly cancer cells) live or die. Mutations in its gene cause a familial cancer syndrome (Li Fraumeni syndrome), and the gene is mutated in 50% of all human cancers. Tens of thousands of papers have been written about p53 since its discovery, and we are still finding out new things that it does. p53 turns on some genes (causes them to be transcribed into mRNA) and turns off others (inhibits their transcription into mRNA). One of the genes p53 causes to be transcribed codes for lincRNA-21, whose name is a classic example of protein chauvinism. The lincRNA acronym stands for large intergenic noncoding RNA, and the 21 is there because it lies near the gene for another important target of p53, called p21. Intergenic of course means between genes coding for protein, and noncoding means not coding for protein.
p53 does a lot of different things depending on conditions. It can stop a cell from dividing if DNA is damaged (called cell cycle arrest). If the damage can’t be repaired, p53 can cause the cell to commit suicide (Google apoptosis for details). What happens if the DNA for lincRNA-21 is destroyed? The ability of p53 to cause apoptosis (but not cell cycle arrest) is lost. Clearly lincRNA-21 is an important player in a very important aspect of the life of any multicellular organism: preventing cancer. Is it a gene? Of course. Does it code for protein? No. Just how lincRNA-21 produces its effects remains to be worked out, but we do know that it interacts with a protein called hnRNPK (don’t ask what it stands for), helping it to inhibit gene expression (prevent transcription into RNA).
Is it an aberration? Well, it turns out that over 50% of our genome is transcribed into RNA (not just the 1.5% which codes for the amino acids of proteins). An early reference is Cell vol. 116 pp. 499 – 509 ’04.
So I’m far from convinced that we know all the kinds of genes there are, and we certainly don’t know all the things these two new classes of genes are doing in the cell. Stay tuned. Keep an open mind about what a gene actually is and can be.
Yesterday’s post on “In the Pipeline” concerned what to do if you’re leaving chemistry. I’d like to think that a whole bunch of juicy drug targets has just been discovered, which should keep those chemists left in the drug industry quite busy. First, of course, we have to figure out what they’re doing.