Tag Archives: The influence of environment on heredity

The most interesting paper I’ve read in the past 5 years — finale

Recall from https://luysii.wordpress.com/2013/06/13/the-most-interesting-paper-ive-read-in-the-past-5-years-introduction-and-allegro/ that if you knew the ones and zeroes coding for the instruction your computer was currently working on you’d know exactly what it would do. Similarly, it has long been thought that, if you knew the sequence of the 4 letters of the genetic code (A, T, G, C) coding for a protein, you’d know exactly what would happen. The cellular machinery (the ribosome) producing output (a protein in this case) was thought to be an automaton similar to a computer blindly carrying out instructions. Assuming the machinery is intact, the cellular environment should have nothing to do with the protein produced. Not so. In what follows, I attempt to provide an abbreviated summary of the background you need to understand what goes wrong, and how, even here, environment rears its head.

If you find the following a bit terse, have a look at https://luysii.wordpress.com/category/molecular-biology-survival-guide/ . In particular, the earliest 3 articles (Roman numerals I, II and III) should be all you need.

We’ve learned that our DNA codes for lots of stuff that isn’t protein. In fact only 2% of it codes for the amino acids comprising our 20,000 proteins. Proteins are made of sequences of 20 different amino acids. Each amino acid is coded for by a sequence of 3 genetic code letters. However there are 64 possibilities for these sequences (4 * 4 * 4). 3 possibilities tell the machinery to quit (they don’t code for an amino acid), which leaves 61 codons (sequences of 3 letters) for 20 amino acids. So some amino acids have as many as 6 codons: leucine (L), for example, has 6 different codons (synonymous codons), while methionine (M) has but 1 (as does tryptophan). The other amino acids fall somewhere in between.
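The counting above can be checked with a short sketch. It writes out the standard genetic code in the compact T, C, A, G ordering and tallies the codons per amino acid; the variable names are mine, chosen only for illustration.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, codons enumerated in TCAG order.
# '*' marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# All 4 * 4 * 4 three-letter sequences, first base varying slowest.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
codon_table = dict(zip(codons, AMINO_ACIDS))

# How many synonymous codons each amino acid (or stop signal) has.
codons_per_amino_acid = Counter(codon_table.values())

print("total codons:", len(codon_table))
print("stop codons:", codons_per_amino_acid["*"])
print("leucine codons:", codons_per_amino_acid["L"])
print("methionine codons:", codons_per_amino_acid["M"])
```

The same table can be queried for any other amino acid by its one-letter code.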

The cellular machine making proteins (the ribosome) uses the transcribed genetic code (mRNA) and a (relatively small) adapter, called transfer RNA (tRNA). In the simplified picture used here, there are 64 different tRNAs (one for each of the 61 codons specifying an amino acid and 3 telling the machine to stop; strictly speaking, stop codons are recognized by protein release factors rather than by tRNAs). Each tRNA contains a sequence of 3 letters (the antiCodon) which exactly pairs with the codon sequence in the mRNA, the same way the letters (bases if you’re a chemist) in the two strands of DNA pair with each other. Hanging off the opposite end of each tRNA is the amino acid the antiCodon refers to. The ribosome basically stitches two amino acids from adjacent tRNAs together and then gets rid of one tRNA.
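To make the codon/antiCodon pairing concrete, here is a minimal sketch assuming exact base pairing, as in the simplified description above. Real tRNAs tolerate some mismatch at the third codon position (wobble), which this sketch deliberately ignores. The function name is hypothetical.

```python
# Exact base pairing between RNA letters.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon: str) -> str:
    """Return the exactly complementary antiCodon, written 5'->3'.

    The two strands pair antiparallel, so the codon is read backwards
    while each base is swapped for its partner.
    """
    return "".join(RNA_COMPLEMENT[base] for base in reversed(codon))


# One of the six leucine codons and the single methionine codon, as mRNA.
for codon in ("CUG", "AUG"):
    print(codon, "pairs with antiCodon", anticodon_for(codon))
```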

So which particular synonymous codon is found in the mRNA shouldn’t make any difference to the final product of the ribosome. That’s what the computer model of the cell tells us.

Since most cells are making protein all the time, there is lots of tRNA around. We need so much tRNA that instead of 64 genes (one for each tRNA) we have some 500 in our genome. So we have multiple different genes coding for each tRNA. I can’t find out how many of each we have (which would be very nice to know in what follows). The amount of tRNA of each of the 64 types is roughly proportional to the number of genes coding for it (the gene copy number) according to the papers cited below.

This brings us to codon usage. You have 6 different codons (synonymous codons) for leucine. Are they all used equally (when you look at every codon in the genome which codes for leucine)? They are not. Here are the percentages for the usages of the 6 distinct leucine codons in human DNA: 7, 7, 13, 13, 20, 40. For random use they should all be around 17 (100 / 6). The most frequently appearing codon occurs as often as the 4 least frequently used ones combined.
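The arithmetic behind those two comparisons is quick to verify. This sketch uses only the six percentages quoted above; the variable names are mine.

```python
# Usage percentages of the 6 leucine codons in human DNA, as quoted in the text.
leucine_usage_percent = [7, 7, 13, 13, 20, 40]

# If all 6 synonymous codons were used equally, each would get this share.
uniform_share = 100 / len(leucine_usage_percent)

# Combined share of the 4 least used codons versus the single most used one.
four_rarest_total = sum(sorted(leucine_usage_percent)[:4])
most_common = max(leucine_usage_percent)

print("total:", sum(leucine_usage_percent))
print("uniform share per codon:", round(uniform_share, 1))
print("4 rarest combined:", four_rarest_total)
print("most common:", most_common)
```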

It turns out that the most used synonymous codons are the ones with the highest number of genes for the corresponding tRNA. Makes sense, as there should be more of that synonymous tRNA around (at least in most cases). This is called codon bias, but I can’t seem to find the actual numbers.

This brings us (at last) to the actual paper [ Nature vol. 495 pp. 111 – 115 ’13 ] and the accompanying editorial (ibid. pp. 57 – 58). The paper says “codon-usage bias has been observed in almost all genomes and is thought to result from selection for efficient and accurate translation (into protein) of highly expressed genes” — 3 references given. Essentially this says that the more tRNA around matching a particular codon, the faster the mRNA will find it (le Chatelier’s principle in action).

An analogy at this point might help. When I was a kid, I hung around a print shop. In addition to high speed printing, there was also a printing press, where individual characters were selected from boxes of characters, placed on a line (this is where the font term leading comes from), and baked into place using some scary smelling stuff. This was so the same constellation of characters could be used over and over. For details see http://en.wikipedia.org/wiki/Printing_press. You can regard the 6 different tRNAs for leucine as 6 different fonts for the letter L. To make things right, the correct font must be chosen (by the printer or the ribosome). Obviously if a rare font is used, the printer will have to fumble more in the L box to come up with the right one. This is exactly le Chatelier’s principle.

The papers concern a protein (FRQ) used in the circadian clock of a fungus — evolutionarily far from us to be sure, but hang in there. Paradoxically, the FRQ gene uses a lot of ‘rare’ synonymous codons. Given the technology we have presently, the authors were able to switch the ‘rare’ synonymous codons to the most common ones. As expected, the organism made a lot more FRQ using the modified gene.

The fascinating point (to me at least) is that the protein, with exactly the same amino acids, did not fulfill its function in the circadian clock. As expected there was more of the protein around (it was easier for the ribosome machinery to make).
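What "switching rare codons to common ones" means at the sequence level can be shown with a toy sketch. Everything specific here is an assumption made for illustration: the 15-letter gene is invented (it is not FRQ), and the table of "preferred" codons is hypothetical rather than measured usage for any organism. Only the codon-to-amino-acid assignments are real entries from the standard genetic code.

```python
# A few real entries from the standard genetic code (DNA letters).
CODON_TO_AMINO_ACID = {
    "ATG": "M",
    "TTA": "L", "CTT": "L", "CTG": "L",
    "TCT": "S", "AGC": "S",
    "GGT": "G", "GGC": "G",
}

# HYPOTHETICAL choice of the 'most common' codon per amino acid.
HYPOTHETICAL_PREFERRED_CODON = {"M": "ATG", "L": "CTG", "S": "AGC", "G": "GGC"}


def split_codons(dna: str) -> list[str]:
    """Cut a coding sequence into consecutive 3-letter codons."""
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate(dna: str) -> str:
    """Translate a coding sequence into one-letter amino acid codes."""
    return "".join(CODON_TO_AMINO_ACID[codon] for codon in split_codons(dna))


def codon_optimize(dna: str) -> str:
    """Replace every codon by the preferred synonymous codon."""
    return "".join(
        HYPOTHETICAL_PREFERRED_CODON[CODON_TO_AMINO_ACID[codon]]
        for codon in split_codons(dna)
    )


original_gene = "ATGTTATCTGGTCTT"  # invented toy sequence
optimized_gene = codon_optimize(original_gene)

print(original_gene, "->", translate(original_gene))
print(optimized_gene, "->", translate(optimized_gene))
```

The DNA changes but the amino acid string does not, which is exactly why the loss of function is so surprising. The sketch says nothing about folding, which is where the difference must arise.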

Now I’ve always been amazed that the proteins making us up have just a few shapes, something I’d guess happens extremely rarely. For details see https://luysii.wordpress.com/2010/10/24/the-essential-strangeness-of-the-proteins-that-make-us-up/.

Well, as we know, proteins are just a linear string of amino acids, and they have to fold to their final shape. The protein made by codon optimization must not have had the proper shape. Why? For one thing the protein is broken down faster. For another it is less stable after freeze thaw cycles. For yet another, it just didn’t work correctly in the cell.

What does this mean? Most likely it means that the protein made from codon optimized mRNA has a different shape. The organism must make it more slowly so that it folds into the correct shape. Recall that the amino acid chain is extruded one amino acid at a time from the ribosome, like sausage from a sausage making machine. As it’s extruded the chain (often with help from other proteins called chaperones) flops around and finds its final shape.

Why is this so fascinating (to me at least)? Because here, in the very uterus of biologic determinism, the environment (how much of each type of synonymous tRNA is around) rears its head. Forests have been felled for papers on the heredity vs. environment question. Just as American GIs wrote “Kilroy was here” everywhere they went in WWII, here’s the environment popping up where no one thought it would.

In addition the implications for protein function, if this is a widespread phenomenon, are simply staggering.

Have Tibetans illuminated a path to the dark matter (of the genome)?

I speak not of the Dalai Lama’s path to enlightenment (despite the title). Tall people tend to have tall kids. Eye color and hair color are also hereditary to some extent. Pitched battles have been fought over just how much of intelligence (assuming one can measure it) is heritable. Now that genome sequencing is approaching a price of $1,000/genome, people have started to look at variants in the genome to help them find the genetic contribution to various diseases, in the hopes of understanding and treating them better.

Frankly, it’s been pretty much of a bust. Height is something which is 80% heritable, yet the 20 leading candidate variants picked up by genome wide association studies (GWAS) account for 3% of the variance [ Nature vol. 461 pp. 458 – 459 ’09 ]. This has happened again and again, particularly with diseases. A candidate gene (or region of the genome), say for schizophrenia or autism, is described in one study, only to be shot down by the next. This is likely due to the fact that many different genetic defects can be associated with schizophrenia; there are a lot of ways the brain cannot work well. For details see https://luysii.wordpress.com/2010/04/25/tolstoy-was-right-about-hereditary-diseases-imagine-that/ or https://luysii.wordpress.com/2010/07/29/tolstoy-rides-again-autism-spectrum-disorder/ .
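To see how large the gap for height is, here is a back-of-the-envelope sketch. It assumes that both figures quoted above (80% and 3%) are fractions of the same total variance in height, which is my reading rather than something spelled out in the citation.

```python
# Figures quoted in the text, both taken as percent of total variance (assumption).
heritable_percent = 80            # variance in height attributed to heredity
explained_by_variants_percent = 3  # variance explained by the 20 leading variants

# Share of the heritable part that the known variants account for.
share_of_heritability_found = explained_by_variants_percent / heritable_percent * 100

# Heritable variance still unaccounted for.
missing_percent = heritable_percent - explained_by_variants_percent

print("share of heritability found (%):", round(share_of_heritability_found, 2))
print("heritable variance still missing (% of total):", missing_percent)
```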

Typically, even when an association of a disease with a genetic variant is found, the variant only increases the risk of the disorder by 2% or less. The bad thing is that when you lump all of the variants you’ve discovered together (for something like height) and add up the risk, you never account for over 50% of the heredity. It isn’t for want of looking, as by 2010 some 600 human GWAS studies had been published [ Neuron vol. 68 p. 182 ’10 ]. Yet lots of the studies have shown various diseases to have a degree of heritability (particularly schizophrenia). The fact that we’ve been unable to find the DNA variants causing the heritability was totally unexpected. Like the dark matter in galaxies, which we know is there by the way the stars spin around the galactic center, this missing heritability has been called the dark matter of the genome.

Which brings us to Proc. Natl. Acad. Sci. vol. 109 pp. 7391 – 7396 ’12.  It concerns an awful disease causing blindness in kids called Leber’s hereditary optic neuropathy.  The ’cause’ has been found. It is a change of 1 base from thymine to cytosine in the gene for a protein (NADH dehydrogenase subunit 1) causing a change at amino acid #30 from tyrosine to histidine.  The mutation is found in mitochondrial DNA not nuclear DNA, making it easier to find (it occurs at position 3394 of the 16,569 nucleotide mitochondrial DNA).
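A single T to C change turning tyrosine into histidine can be checked against the genetic code directly. The sketch below uses the standard code table; the mitochondrial code differs from it for a handful of codons, but not for the tyrosine and histidine codons involved here. It only enumerates which changes are possible and makes no claim about which tyrosine codon the gene actually uses.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
codon_table = dict(
    zip(("".join(t) for t in product(BASES, repeat=3)), AMINO_ACIDS)
)

# Try every single T -> C substitution in every tyrosine (Y) codon and keep
# the ones that yield histidine (H).
tyr_to_his = []
for codon, amino_acid in codon_table.items():
    if amino_acid != "Y":
        continue
    for index, base in enumerate(codon):
        if base != "T":
            continue
        mutant = codon[:index] + "C" + codon[index + 1:]
        if codon_table[mutant] == "H":
            tyr_to_his.append((codon, mutant, index + 1))

for codon, mutant, position in tyr_to_his:
    print(f"{codon} -> {mutant} (T->C at codon position {position})")
```

In both cases the change sits at the first letter of the codon, so one base substitution is enough to swap the amino acid.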

Mitochondria in animal cells, and chloroplasts in plant cells, are remnants of bacteria which moved inside cells as we know them today (rest in peace Lynn Margulis).

Some 25% of Tibetans have the 3394 T–>C mutation, but they see just fine. It appears to be an adaptation to altitude, because the same mutation is found in nonTibetans on the Indian subcontinent living above 1500 meters (about as high as Denver). However, if you have the same genetic change and live below this altitude, you get Leber’s.

This is a spectacular demonstration of the influence of environment on heredity.  Granted that the altitude you live at is a fairly impressive environmental change, but it’s at least possible that more subtle changes (temperature, humidity, air conditions etc. etc.) might also influence disease susceptibility to the same genetic variant.  This certainly is one possible explanation for the failure of GWAS to turn up much.  The authors make no mention of this in their paper, so these ideas may actually be (drumroll please) original.
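Why an environment-dependent variant could slip past a GWAS can be illustrated with a toy calculation. All the numbers below are invented for illustration and come from no study: a variant that raises disease risk thirtyfold in one environment and not at all in another. The pooled relative risk a study would see depends on how its carriers are spread across the two environments.

```python
# INVENTED illustrative risks, not data from any paper.
BASELINE_RISK = 0.01               # non-carriers, either environment
CARRIER_RISK_HARMFUL_ENV = 0.30    # carriers in the environment where the variant hurts
CARRIER_RISK_BENIGN_ENV = 0.01     # carriers in the environment where it is harmless


def pooled_relative_risk(fraction_in_harmful_env: float) -> float:
    """Relative risk of carriers vs non-carriers when environments are pooled.

    fraction_in_harmful_env is the share of carriers living where the
    variant causes disease.
    """
    carrier_risk = (
        fraction_in_harmful_env * CARRIER_RISK_HARMFUL_ENV
        + (1 - fraction_in_harmful_env) * CARRIER_RISK_BENIGN_ENV
    )
    return carrier_risk / BASELINE_RISK


for fraction in (1.0, 0.5, 0.02, 0.0):
    print(f"{fraction:>4.0%} of carriers exposed -> "
          f"relative risk {pooled_relative_risk(fraction):.2f}")
```

Under these made-up numbers, a study whose carriers mostly live in the benign environment sees only a modest association, even though the variant is devastating for the few who are exposed.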

If such environmental influences on the phenotypic expression of genetic changes are common, it might be yet another explanation for why drug discovery is so hard. Consider CETP (Cholesterol Ester Transfer Protein) and the very expensive failure of drugs inhibiting it. Torcetrapib was associated with increased deaths in a trial of 15,000 people for 18 – 20 months. Perhaps those dying somehow lived in a different environment. Perhaps others were actually helped by the drug.