My wife is very smart, but she couldn’t follow what you are about to read because she has no conception of what the players look like (being an art history major etc. etc.). Chemists will have no such problem. Even so, there’s quite a bit of background to get under your belt, which is why I wrote the two previous posts “Molecular Biology survival guide for Chemists I” and MBSGFC II.
While the war on cancer is far from won, we know a good deal more about cancer than we did at the inception of the “War on Cancer” in 1971. At least 20 tumor suppressors, proteins which prevent us from having various forms of cancer, are now known and characterized. Mutations which abolish their function or which decrease the abundance of their protein product increase the risk of cancer.
One of the most commonly mutated genes in cancer is the tumor suppressor whose acronym is PTEN. It is an enzyme which removes phosphates from a variety of lipids found in cell membranes. As a neurologist, I became interested in it early on because it is mutated in 30% of brain tumors.
There is a second gene for PTEN in our genome called PTENP1. Recall that the first amino acid coded for in every gene is methionine (by the AUG initiation codon). PTENP1 has a mutation in this codon, so the protein never gets made, even though the rest of the gene (coding for another 402 amino acids) is quite normal and is transcribed into mRNA. So PTENP1 is a pseudogene. Our genome contains a fair number of pseudogenes for proteins. In fact, “Pseudogenes are almost as numerous as coding (for protein) genes, and represent a significant proportion of the transcriptome” [ Nature vol. 465 p. 1033 ’10 ]. The transcriptome is the collection of RNAs transcribed by RNA polymerases from DNA.
You can tell a lot about something by the name given to it. Pseudogenes were considered to be just another form of ‘junk’ DNA, stuff that sits in our DNA doing nothing useful for the cell. In fact, until recently, most of the genome was considered junk, since less than 5% of our 3.2 billion base pairs of DNA codes for the 20,000 or so proteins making us up.
Until about 10 years ago, molecular biology was incredibly protein-centric. Consider the following terms — nonsense codon, noncoding DNA, junk DNA. All are pejorative and arose from the view that all the genome does is code for protein. Nonsense codon means one of the 3 termination codons, which tells the ribosome to stop making protein. Noncoding DNA means not coding for protein (with the implication that DNA not coding for protein isn’t coding for anything).
Now here is where the chemistry comes in. Recall that microRNAs are short (20 something nucleotides) polynucleotides which bind to the 3′ untranslated region (3′ UTR) of mRNA, and either (1) inhibit its translation into protein or (2) cause its degradation. In either case, less of the corresponding protein is made. The microRNA and the appropriate sequence in the 3′ UTR of the mRNA form an RNA-RNA double helix (G on one strand binding to C on the other, etc.). Visualizing such helices is duck soup for a chemist.
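For readers who think better in code than in helices, here is a minimal sketch of the base pairing just described. The sequences are invented for illustration (they are not real microRNA or PTEN sequences), the function names are my own, and the sketch assumes perfect Watson-Crick pairing along the whole microRNA, which is a simplification: real microRNAs usually pair only partially with their targets.

```python
# Watson-Crick pairing rules for RNA (G:U wobble pairs are ignored here).
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna: str) -> str:
    """Return the strand (written 5'->3') that pairs antiparallel with `rna`."""
    return "".join(PAIR[base] for base in reversed(rna))

def find_sites(utr: str, mirna: str) -> list[int]:
    """Return 0-based positions in `utr` that are perfectly complementary to `mirna`."""
    target = reverse_complement(mirna)
    return [i for i in range(len(utr) - len(target) + 1)
            if utr.startswith(target, i)]

# Hypothetical sequences, chosen only to show the mechanics.
toy_mirna = "UAGCAGCACG"
toy_utr = "AAACGUGCUGCUAGGG"
print(reverse_complement(toy_mirna))
print(find_sites(toy_utr, toy_mirna))
```

The two strands of the helix run in opposite directions, which is why the microRNA is reversed before each base is swapped for its partner.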
Now with 403 amino acids, the coding region of the mRNA for PTEN and PTENP1 is at least 1209 nucleotides long (3 nucleotides per amino acid), and the 3′ UTR of the mRNA contains another hundred nucleotides or so. This means that there’s room for several different microRNAs to bind there, decreasing PTEN levels in the cell.
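The arithmetic can be spelled out in a few lines. This is only a back-of-the-envelope sketch: the 100 nucleotide UTR length is the rough figure given above, the 22 nucleotide microRNA length is my assumption for “20 something”, and the site count assumes binding sites sit end to end without overlapping.

```python
NUCLEOTIDES_PER_CODON = 3

def coding_length(amino_acids: int) -> int:
    """Nucleotides needed to code for the given number of amino acids (stop codon not counted)."""
    return amino_acids * NUCLEOTIDES_PER_CODON

def max_nonoverlapping_sites(utr_length: int, mirna_length: int) -> int:
    """Upper bound on microRNA sites that fit end to end in the UTR."""
    return utr_length // mirna_length

coding_nt = coding_length(403)                  # 403 * 3 = 1209
site_limit = max_nonoverlapping_sites(100, 22)  # assumed lengths, see above
print(coding_nt, site_limit)
```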
This is a general phenomenon. Most of our mRNAs have multiple sites in their 3′ UTR ready, willing and able to bind microRNAs. In addition, most microRNAs bind to more than one mRNA. Notice what this really means. 3′ UTR stands for 3′ untranslated region, meaning that it isn’t translated into protein, yet the 3′ UTR certainly isn’t junk, as it is intimately involved in the control of protein expression.
Some cancers (breast, colon) delete the gene for PTENP1. Why should the cancer bother if the PTENP1 gene doesn’t code for anything? Because the mRNA transcript of PTENP1 sops up microRNAs which would otherwise bind to PTEN mRNA, leading to its destruction. The mRNA transcript of the PTENP1 (junk) gene acts as a decoy for the microRNAs. So the PTENP1 gene isn’t junk at all, but actually helps increase the levels of the PTEN protein which protects us against cancer (which is why some cancers delete it). You can read all about it in Nature vol. 465 pp. 1016 – 1017, 1033 – 1038 ’10.
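The decoy idea can be made concrete with a toy calculation. This is a deliberately crude sketch of my own, not the analysis in the Nature papers: it assumes one binding site per transcript, equal affinity of the microRNA for PTEN mRNA and for the pseudogene transcript, and that every microRNA binds something if a site is free. The molecule counts are invented.

```python
def pten_mrna_spared(mirna: float, pten_mrna: float, decoy_rna: float) -> float:
    """Copies of PTEN mRNA left unbound when microRNAs split evenly across all sites."""
    total_sites = pten_mrna + decoy_rna
    if total_sites == 0:
        return 0.0
    bound_total = min(mirna, total_sites)           # can't bind more sites than exist
    bound_to_pten = bound_total * pten_mrna / total_sites
    return pten_mrna - bound_to_pten

# Hypothetical counts: 80 microRNAs, 100 PTEN mRNAs, with and without 100 decoys.
with_decoy = pten_mrna_spared(mirna=80, pten_mrna=100, decoy_rna=100)
decoy_deleted = pten_mrna_spared(mirna=80, pten_mrna=100, decoy_rna=0)
print(with_decoy, decoy_deleted)
```

Under these assumptions, removing the decoy leaves fewer PTEN mRNAs free of microRNA, which is the direction of effect described above. The real system has feedback on top of this, so the numbers mean nothing beyond illustrating the titration.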
So what is the master controller of PTEN levels? The short (and long) answer is that there isn’t one, just a bunch of feedback loops between levels of transcription of the microRNA genes and those of PTEN and PTENP1. We’re just getting into what controls the stability of microRNAs, and what controls their transcription (so there are almost certainly other levels of control).
Have a look at another post https://luysii.wordpress.com/2010/07/01/why-linearity-is-not-enough/ for why our understanding of anything involving multiple levels of feedback must remain incomplete. Chemistry is absolutely helpless to shed light on the way the control mechanisms interact with each other. It can explain each individual molecular interaction, but larger forces are at play here, and at a mathematical level you don’t even have to know that molecules are involved.
My cousin married a very smart PhD in electrical engineering 2 years ago. He wanted to get together and set up models of neurological and cellular function. I told him this was pointless, as we didn’t know all the players (the way Einstein didn’t know of two of the major forces of nature). It turns out that over 50% of most genomes studied (including ours) is transcribed into RNA (not necessarily mRNA coding for protein). Until recently it has been thought that this represented transcriptional chaff – like the turnings coming off a lathe – RNA polymerase will transcribe DNA, just as a CPU will try to execute any series of bits. I think it’s quite likely that the transcribed RNA isn’t chaff and that we’ve just found a whole new bunch of players.
Comments
Junk DNA is not all junk. But some of it may be junk, and there is nothing wrong with that. Plus, ‘noncoding’ is not the same as junk. An excellent article on this.
As for models, what you say about not capturing all the relevant players is quite true. But we need to keep in mind that models need not include all players since they are not really supposed to mirror ‘reality’. Models are simplified representations of reality and in most cases are supposed to do only one thing- work well enough for us to be able to use them. Consider force fields in molecular mechanics. They work well enough in many cases and are composed of classical ball-and-stick models which leave out a rather important player- the electrons! Yet we use them.
Wavefunction: Force fields are a good example (which helps me make the point I was trying for in the post). They may not properly describe the forces, but the (reasonable) assumption in molecular mechanics is that force fields are ALL you have to know. Forces control molecular mechanics. Getting better and better ones (say ones including electrons) will make things more accurate and realistic.
However, in the cell we’re talking purely about control, and by players I mean controlling elements (or at least influencing elements, since everything in the cell is so nonlinear because of feedback). Not knowing all the players involved in control is a far more serious deficiency than a poor force field. The only analogy I can think of would be something like the old Phlogiston controlling dynamics, something which you could not take into account because you were ignorant of it. Or like poor Einstein trying to come up with a unified field theory (force theory) in his last years, while being ignorant of the weak force and the strong nuclear force.
The post concerns an example of what was formerly considered junk (pseudogenes were, trust me) and what was formerly considered transcriptional chaff (RNA that wasn’t coding for protein, tRNA or rRNA) controlling (or helping control) whether or not you get cancer.
This discussion (plus the nonlinearity discussion) reminds me of a recent episode of RadioLab (a public radio show produced by WNYC that discusses somewhat scientific issues) on the limits of scientific understanding. They discuss a computer program that can take experimental data and distill equations describing that data (http://ccsl.mae.cornell.edu/sites/default/files/Science09_Schmidt.pdf). They apply this program to some data from a bacterium and come up with a set of two equations describing the dynamics of nutrient levels in the bacterium. And the equations that the program comes up with are absolutely correct. The problem? They have no idea how to derive these equations. (You can listen to the episode here: http://www.wnyc.org/shows/radiolab/episodes/2010/04/16/segments/149570.)
You are quite right about the myriad factors involved in control that we don’t understand. That’s where parametrization comes in (something which, mind you, I have myself criticized). By parametrization based on experimental data, you can possibly take these very complex feedbacks implicitly into account. I am not saying that we are ready for a model of cellular behavior, but only that we need not necessarily have an atomistic-type description of a system to get useful results out of it.