Tag Archives: synonymous codon

The death of the synonymous codon – V

The coding capacity of our genome continues to amaze. The redundancy of the genetic code has been put to yet another use. Depending on how much you know, skip the following four links and read on. Otherwise all the background you need to understand the following is in them.

https://luysii.wordpress.com/2011/05/03/the-death-of-the-synonymous-codon/

https://luysii.wordpress.com/2011/05/09/the-death-of-the-synonymous-codon-ii/

https://luysii.wordpress.com/2014/01/05/the-death-of-the-synonymous-codon-iii/

https://luysii.wordpress.com/2014/04/03/the-death-of-the-synonymous-codon-iv/

There really is no way around the redundancy producing synonymous codons. If you want to code for 20 different amino acids with only four choices at each position, two positions (4^2) won’t do. You need three positions, which gives you 64 possibilities (61 after the three stop codons are taken into account) and the redundancy that comes along with it. The previous links show how the redundant codons for some amino acids aren’t redundant at all but used to code for the speed of translation, or for exonic splicing enhancers and inhibitors. Different codons for the same amino acid can produce wildly different effects leaving the amino acid sequence of a given protein alone.
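The codon arithmetic above is easy to check in a few lines of Python (a sketch; the stop-codon set is the standard one):

```python
from itertools import product

BASES = "ACGU"
STOPS = {"UAA", "UAG", "UGA"}

# Two positions give only 4^2 = 16 codons -- not enough for 20 amino acids.
doublets = ["".join(p) for p in product(BASES, repeat=2)]

# Three positions give 4^3 = 64 codons; removing the 3 stop codons leaves
# 61 sense codons for 20 amino acids, so redundancy is unavoidable.
triplets = ["".join(p) for p in product(BASES, repeat=3)]
sense = [c for c in triplets if c not in STOPS]

print(len(doublets), len(triplets), len(sense))  # 16 64 61
```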

The latest example — https://www.pnas.org/content/117/40/24936 Proc. Natl. Acad. Sci. vol. 117 pp. 24936 – 24946 ’20 — is even more impressive, as it implies that our genome may be coding for far more proteins than we thought.

The work concerns Mitochondrial DNA Polymerase Gamma (POLG), which is a hotspot for mutations (over 200 are known), 4 of which cause fairly rare neurologic diseases.

Normally translation of mRNA into protein begins at something called an initiator codon (AUG), which codes for methionine. However, in the case of POLG, a CUG triplet (not AUG) located in the 5′ leader of POLG messenger RNA (mRNA) initiates translation almost as efficiently (∼60 to 70%) as an AUG in optimal context. This CUG directs translation of a conserved 260-triplet-long overlapping open reading frame (ORF) called POLGARF (POLG Alternative Reading Frame — surely they could have come up with something more euphonious).

Not only that, but the reading frame is shifted down one (−1), meaning that the protein looks nothing like POLG, with a completely different amino acid composition. “We failed to find any significant similarity between POLGARF and other known or predicted proteins or any similarity with known structural motifs. It seems likely that POLGARF is an intrinsically disordered protein (IDP) with a remarkably high isoelectric point (pI =12.05 for a human protein).” They have no idea what POLGARF does.
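To see why a one-nucleotide frameshift scrambles the protein, here is a sketch using the standard genetic code; the short sequence is made up for illustration, not from POLG:

```python
from itertools import product

# Standard genetic code, built from the canonical TCAG ordering.
BASES = "TCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AA)}

def translate(seq, offset=0):
    """Translate a DNA string starting at `offset` (0, 1, or 2)."""
    return "".join(CODON_TABLE[seq[i:i + 3]]
                   for i in range(offset, len(seq) - 2, 3))

# Hypothetical toy sequence: shifting the frame by one nucleotide yields a
# protein with a completely different amino acid composition.
seq = "ATGGCACCATTGGAA"
print(translate(seq, 0))  # MAPLE
print(translate(seq, 1))  # WHHW
```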

Yet mammals make the protein. It gets more and more interesting, because the CUG triplet is part of something called a MIR (Mammalian-wide Interspersed Repeat) which (based on comparative genomics across a lot of different animals) entered the POLG gene 135 million years ago.

Using the teleological reasoning typical of biology, POLGARF must be doing something useful, or it would have been mutated away long ago.

The authors note that other mutations (even from one synonymous codon to another — hence the title of this post) could cause other diseases due to changes in POLGARF amino acid coding. So while different synonymous codons might code for the same amino acid in POLG, they probably code for something wildly different in POLGARF.

So the same segment of the genome is coding for two different proteins.

Is this a freak of nature? Hardly. We have an estimated 368,000 mammalian interspersed repeats in our genome — https://en.wikipedia.org/wiki/Mammalian-wide_interspersed_repeat.

Could they be driving expression of other proteins that we hadn’t dreamed of? Algorithms looking for protein coding genes probably all look for AUG codons and then look for open reading frames following them.
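That guess about gene-finding algorithms can be illustrated with a toy scanner. Everything here (the sequence, the minimum ORF length, scanning only one strand) is a simplifying assumption, but it shows how an AUG-only scan would miss a CUG-initiated ORF like POLGARF’s:

```python
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, starts=("ATG",), min_codons=2):
    """Naive single-strand ORF scan: (start, end) of every in-frame
    start-codon..stop-codon stretch, in all three reading frames."""
    orfs = []
    for frame in range(3):
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in starts:
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOPS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3))
                        break
    return orfs

seq = "CTGAAAGGGTAA"  # hypothetical: a CTG-initiated ORF, no ATG anywhere
print(find_orfs(seq))                         # [] -- ATG-only scan misses it
print(find_orfs(seq, starts=("ATG", "CTG")))  # [(0, 12)]
```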

As usual Shakespeare got there first “There are more things in heaven and earth, Horatio, Than are dreamt of in your philosophy.”

Certainly the paper of the year for intellectual interest and speculation.

Philip Anderson, 1923 – 2020, R. I. P.

Phil Anderson probably never heard of Ludwig Mies van der Rohe, he of the Bauhaus and his famous dictum ‘less is more’, so he probably wasn’t riffing on it when he wrote “More Is Different” in August of 1972 [ Science vol. 177 pp. 393 – 396 ’72 ] — https://science.sciencemag.org/content/sci/177/4047/393.full.pdf.

I was just finishing residency and found it a very unusual paper for Science Magazine.  His Nobel was 5 years away, but Anderson was of sufficient stature that Science published it.  The article was a nonphilosophical attack on reductionism with lots of hard examples from solid state physics. It is definitely worth reading, if the link will let you.  The philosophic repercussions are still with us.

He notes that most scientists are reductionists.  He puts it this way: “The workings of our minds and bodies and of all the matter animate and inanimate of which we have any detailed knowledge, are assumed to be controlled by the same set of fundamental laws, which except under extreme conditions we feel we know pretty well.”

So many body physics/solid state physics obeys the laws of particle physics, chemistry obeys the laws of many body physics, molecular biology obeys the laws of chemistry, and onward and upward to psychology and the social sciences.

What he attacks is what appears to be a logical corollary of this, namely that understanding the fundamental laws allows you to derive from them the structure of the universe in which we live (including ourselves).   Chemistry really doesn’t predict molecular biology, and cellular molecular biology doesn’t really predict the existence of multicellular organisms.  This is because new phenomena arise at each level of increasing complexity, for which laws (e.g. regularities) appear which can’t be explained by reducing them to the next more fundamental level below.

The last 48 years of molecular biology and biophysics have shown us a lot of new phenomena, and reductionism has explained each of them after the fact, even though none were really predictable in advance.  So they are a triumph of reductionism, and yet —

As soon as you get into biology you become impaled on the horns of the Cartesian dualism of flesh vs. spirit.  As soon as you ask what something is ‘for’ you realize that reductionism can’t help.  As an example I’ll repost an old one in which reductionism tells you exactly how something happens, but is absolutely silent on what that something is ‘for’

The limits of chemical reductionism

“Everything in chemistry turns blue or explodes”, so sayeth a philosophy major roommate years ago.  Chemists are used to being crapped on, because it starts so early and never lets up.  However, knowing a lot of organic chemistry and molecular biology allows you to see very clearly one answer to a serious philosophical question — when and where does scientific reductionism fail?

Early on, physicists said that quantum mechanics explains all of chemistry.  Well it does explain why atoms have orbitals, and it does give a few hints as to the nature of the chemical bond between simple atoms, but no one can solve the equations exactly for systems of chemical interest.  Approximate the solution, yes, but this is hardly a pure reduction of chemistry to physics.  So we’ve failed to reduce chemistry to physics because the equations of quantum mechanics are so hard to solve, but this is hardly a failure of reductionism.

The last post “The death of the synonymous codon – II” — https://luysii.wordpress.com/2011/05/09/the-death-of-the-synonymous-codon-ii/ –puts you exactly at the nidus of the failure of chemical reductionism to bag the biggest prey of all, an understanding of the living cell and with it of life itself.  We know the chemistry of nucleotides, Watson-Crick base pairing, and enzyme kinetics quite well.  We understand why less transfer RNA for a particular codon would mean slower protein synthesis.  Chemists understand what a protein conformation is, although we can’t predict it 100% of the time from the amino acid sequence.  So we do understand exactly why the same amino acid sequence using different codons would result in slower synthesis of gamma actin than beta actin, and why the slower synthesis would allow a more leisurely exploration of conformational space allowing gamma actin to find a conformation which would be modified by linking it to another protein (ubiquitin) leading to its destruction.  Not bad.  Not bad at all.

Now ask yourself, why the cell would want to have less gamma actin around than beta actin.  There is no conceivable explanation for this in terms of chemistry.  A better understanding of protein structure won’t give it to you.  Certainly, beta and gamma actin differ slightly in amino acid sequence (4/375) so their structure won’t be exactly the same.  Studying this till the cows come home won’t answer the question, as it’s on an entirely different level than chemistry.

Cellular and organismal molecular biology is full of questions like that, but gamma and beta actin are the closest chemists have come to explaining the disparity in the abundance of two closely related proteins on a purely chemical basis.

So there you have it.  Physicality has gone as far as it can go in explaining the mechanism of the effect, but has nothing to say whatsoever about why the effect is present.  It’s the Cartesian dualism between physicality and the realm of ideas, and you’ve just seen the junction between the two live and in color, happening right now in just about every cell inside you.  So the effect is not some trivial toy model someone made up.

Whether philosophers have the intellectual cojones to master all this chemistry and molecular biology is unclear.  Probably no one has tried (please correct me if I’m wrong).  They are certainly capable of mounting intellectual effort — they write book after book about Gödel’s proof and the mathematical logic behind it. My guess is that they are attracted to such things because logic and math are so definitive, general and nonparticular.

Chemistry and molecular biology aren’t general this way.  We study a very arbitrary collection of molecules, which must simply be learned and dealt with. Amino acids are of one chirality. The alpha helix turns one way and not the other.  Our bodies use 20 particular amino acids not any of the zillions of possible amino acids chemists can make.  This sort of thing may turn off the philosophical mind which has a taste for the abstract and general (at least my roommates majoring in it were this way).

If you’re interested in how far reductionism can take us, have a look at http://wavefunction.fieldofscience.com/2011/04/dirac-bernstein-weinberg-and.html

Were my two philosopher roommates still alive, they might come up with something like “That’s how it works in practice, but how does it work in theory?”

The death of the synonymous codon – III

The coding capacity of our genome continues to amaze. The redundancy of the genetic code has been put to yet another use. Depending on how much you know, skip the following two links and read on. Otherwise all the background to understand the following is in them.

https://luysii.wordpress.com/2011/05/03/the-death-of-the-synonymous-codon/

https://luysii.wordpress.com/2011/05/09/the-death-of-the-synonymous-codon-ii/

There really was no way around it. If you want to code for 20 different amino acids with only four choices at each position, two positions (4^2) won’t do. You need three positions, which gives you 64 possibilities (61 after the three stop codons are taken into account) and the redundancy that comes with it. The previous links show how the redundant codons for some amino acids aren’t redundant at all but used to code for the speed of translation, or for exonic splicing enhancers and inhibitors. Different codons for the same amino acid can produce wildly different effects leaving the amino acid sequence of a given protein alone.

The following recent work [ Science vol. 342 pp. 1325 – 1326, 1367 – 1372 ’13 ] showed that transcription factors bind to the coding sequences of proteins, not just to the promoters and enhancers found outside them, as we had thought.

The principle behind the DNAaseI protection assay is pretty simple. Any protein binding to DNA protects it against DNAaseI, which chops up the unprotected DNA. Then clone and sequence what’s left to see where proteins have bound to DNA. These protected stretches are called footprints. They must have removed histones first, I imagine.
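As a sketch of the idea (not the authors’ actual pipeline), a footprint caller can be as simple as finding runs of positions with few cuts; the cut counts, threshold, and minimum length below are all made up for illustration:

```python
def footprints(cuts, threshold=2, min_len=5):
    """Toy footprint caller: runs of >= min_len positions whose DNAase I
    cut count stays below `threshold` (bound protein blocks cutting)."""
    regions, start = [], None
    for i, c in enumerate(cuts + [threshold]):  # sentinel closes a final run
        if c < threshold and start is None:
            start = i
        elif c >= threshold and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    return regions

# Hypothetical per-nucleotide cut counts: the low-count stretch in the
# middle is the footprint left by a bound protein.
print(footprints([5, 6, 4, 0, 1, 0, 0, 1, 0, 7, 5, 6]))  # [(3, 9)]
```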

The work performed DNAaseI protection assays on a truly massive scale. They looked at 81 different cell types at nucleotide resolution. They found 11,000,000 footprints altogether, about 1,000,000 per cell type. In a given cell type 25,000 were completely localized within exons (the parts of the gene actually specifying amino acids). When all the codons of the genome are looked at as a group, 1/7 of them are found in a footprint in one of the cell types.

The results wouldn’t have been that spectacular had they just looked at a few cell types. How do we know the binding sites contain transcription factors? Because the footprints match transcription factor recognition sequences.

We know that sequences around splice sites are used to code for splicing enhancers and inhibitors. Interestingly, the splice sites are generally depleted of DNAaseI footprints. Remember that splicing occurs after the gene has been transcribed.

At this point it isn’t clear how binding of a transcription factor in a protein coding region influences gene expression.

Just like a work of art, there is more than one way that DNA can mean. Remarkable!

The most interesting paper I’ve read in the past 5 years — finale

Recall from https://luysii.wordpress.com/2013/06/13/the-most-interesting-paper-ive-read-in-the-past-5-years-introduction-and-allegro/ that if you knew the ones and zeroes coding for the instruction your computer was currently working on you’d know exactly what it would do. Similarly, it has long been thought that, if you knew the sequence of the 4 letters of the genetic code (A, T, G, C) coding for a protein, you’d know exactly what would happen. The cellular machinery (the ribosome) producing output (a protein in this case) was thought to be an automaton similar to a computer blindly carrying out instructions. Assuming the machinery is intact, the cellular environment should have nothing to do with the protein produced. Not so. In what follows, I attempt to provide an abbreviated summary of the background you need to understand what goes wrong, and how, even here, environment rears its head.

If you find the following a bit terse, have a look at https://luysii.wordpress.com/category/molecular-biology-survival-guide/ . In particular, the earliest 3 articles (Roman numerals I, II and III) should be all you need.

We’ve learned that our DNA codes for lots of stuff that isn’t protein. In fact only 2% of it codes for the amino acids comprising our 20,000 proteins. Proteins are made of sequences of 20 different amino acids. Each amino acid is coded for by a sequence of 3 genetic code letters. However there are 64 possibilities for these sequences (4 * 4 * 4). 3 possibilities tell the machinery to quit (they don’t code for an amino acid). So some amino acids have as many as 6 codons (sequences of 3 letters) for them — e.g. Leucine (L) has 6 different codons (synonymous codons) for it while Methionine (M) has but 1. The other 18 amino acids fall somewhere between.
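A quick way to see this redundancy for yourself, using the standard genetic code (a sketch, not from the papers discussed below):

```python
from itertools import product
from collections import Counter

# Standard genetic code in the canonical TCAG ordering; '*' marks stops.
BASES = "TCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AA)}

# How many synonymous codons does each amino acid have?
degeneracy = Counter(a for a in TABLE.values() if a != "*")
print(degeneracy["L"], degeneracy["M"])  # 6 1 -- leucine 6, methionine 1
print(sum(degeneracy.values()))          # 61 sense codons in all
```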

The cellular machine making proteins (the ribosome) uses the transcribed genetic code (mRNA) and a (relatively small) adapter, called transfer RNA (tRNA). There is a tRNA for each of the 61 codons specifying an amino acid (the 3 stop codons are recognized by protein release factors rather than by tRNAs). Each tRNA contains a sequence of 3 letters (the antiCodon) which exactly pairs with the codon sequence in the mRNA, the same way the letters (bases if you’re a chemist) in the two strands of DNA pair with each other. Hanging off the opposite end of each tRNA is the amino acid the antiCodon refers to. The ribosome basically stitches two amino acids from adjacent tRNAs together and then gets rid of one tRNA.
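The codon–antiCodon pairing is just reverse complementation, sketched below (wobble pairing at the third position is deliberately ignored):

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon(codon):
    """tRNA anticodon (read 5'->3') for an mRNA codon: the reverse
    complement, base-paired position by position."""
    return "".join(COMPLEMENT[b] for b in reversed(codon))

print(anticodon("AUG"))  # CAU -- the methionine tRNA's anticodon
print(anticodon("UUC"))  # GAA -- pairs with this phenylalanine codon
```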

So which particular synonymous codon is found in the mRNA shouldn’t make any difference to the final product of the ribosome. That’s what the computer model of the cell tells us.

Since most cells are making protein all the time, there is lots of tRNA around. We need so much tRNA that instead of one gene per tRNA we have some 500 in our genome. So we have multiple different genes coding for each tRNA. I can’t find out how many of each we have (which would be very nice to know in what follows). The amount of each type of tRNA is roughly proportional to the number of genes coding for it (the gene copy number), according to the papers cited below.

This brings us to codon usage. You have 6 different codons (synonymous codons) for leucine. Are they all used equally (when you look at every codon in the genome which codes for leucine)? They are not. Here are the percentages for the usages of the 6 distinct leucine codons in human DNA: 7, 7, 13, 13, 20, 40. For random use they should all be around 16. The most frequently appearing codon occurs as often as the 4 least frequently used combined.
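Percentages like these come from counting synonymous codons across coding sequences. Here is a minimal sketch for leucine, run on a made-up toy sequence (real usage tables tally millions of codons):

```python
from collections import Counter

LEU_CODONS = {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"}

def leucine_usage(cds):
    """Fraction of each leucine codon among all leucine codons in an
    in-frame coding sequence (length assumed to be a multiple of 3)."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    leu = Counter(c for c in codons if c in LEU_CODONS)
    total = sum(leu.values())
    return {c: n / total for c, n in leu.items()}

# Hypothetical toy coding sequence containing four leucine codons.
print(leucine_usage("ATGCTGCTGTTAACTCTTTAA"))
```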

It turns out that the most used synonymous codons are the ones with the highest number of genes for the corresponding tRNA. Makes sense, as there should be more of that synonymous tRNA around (at least in most cases). This is called codon bias, but I can’t seem to find the actual numbers.

This brings us (at last) to the actual paper [ Nature vol. 495 pp. 111 – 115 ’13 ] and the accompanying editorial (ibid. pp. 57 – 58). The paper says “codon-usage bias has been observed in almost all genomes and is thought to result from selection for efficient and accurate translation (into protein) of highly expressed genes” — 3 references given. Essentially this says that the more tRNA around matching a particular codon, the faster the mRNA will find it (le Chatelier’s principle in action).

An analogy at this point might help. When I was a kid, I hung around a print shop. In addition to high speed printing, there was also a printing press, where individual characters were selected from boxes of characters and placed on a line, with strips of lead between the lines (which is where the typographic term ‘leading’ comes from), and baked into place using some scary smelling stuff. This was so the same constellation of characters could be used over and over. For details see http://en.wikipedia.org/wiki/Printing_press. You can regard the 6 different tRNAs for leucine as 6 different fonts for the letter L. To make things right, the correct font must be chosen (by the printer or the ribosome). Obviously if a rare font is used, the printer will have to fumble more in the L box to come up with the right one. This is exactly le Chatelier’s principle.

The papers concern a protein (FRQ) used in the circadian clock of a fungus — evolutionarily far from us to be sure, but hang in there. Paradoxically, the FRQ gene uses a lot of ‘rare’ synonymous codons. Given the technology we have presently, the authors were able to switch the ‘rare’ synonymous codons to the most common ones. As expected, the organism made a lot more FRQ using the modified gene.
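What the authors did experimentally can be caricatured in code: swap each rare synonymous codon for the commonest one, leaving the amino acid sequence untouched. The sketch below handles only leucine (CTG really is the most used human leucine codon, per the percentages above); the input sequence is hypothetical, not from FRQ:

```python
# Toy 'codon optimization' for one amino acid: every leucine codon is
# replaced by CTG, the most used human leucine codon (~40% of usage).
LEU_CODONS = ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"]
SWAP = {c: "CTG" for c in LEU_CODONS}

def optimize_leucines(cds):
    """Recode an in-frame CDS; the encoded protein sequence is unchanged."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return "".join(SWAP.get(c, c) for c in codons)

print(optimize_leucines("ATGTTACTTAAA"))  # ATGCTGCTGAAA
```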

The fascinating point (to me at least) is that the protein, with exactly the same amino acids, did not fulfill its function in the circadian clock. As expected there was more of the protein around (it was easier for the ribosome machinery to make).

Now I’ve always been amazed that the proteins making us up have just a few shapes, something I’d guess happens extremely rarely. For details see https://luysii.wordpress.com/2010/10/24/the-essential-strangeness-of-the-proteins-that-make-us-up/.

Well, as we know, proteins are just a linear string of amino acids, and they have to fold to their final shape. The protein made by codon optimization must not have had the proper shape. Why? For one thing the protein is broken down faster. For another it is less stable after freeze thaw cycles. For yet another, it just didn’t work correctly in the cell.

What does this mean? Most likely it means that the protein made from codon optimized mRNA has a different shape. The organism must make it more slowly so that it folds into the correct shape. Recall that the amino acid chain is extruded one by one from the ribosome, like sausage from a sausage making machine. As it’s extruded, the chain (often with help from other proteins called chaperones) flops around and finds its final shape.

Why is this so fascinating (to me at least)? Because here, in the very uterus of biologic determinism, the environment (how much of each type of synonymous tRNA is around) rears its head. Forests have been felled for papers on the heredity vs. environment question. Just as American GIs wrote “Kilroy was here” everywhere they went in WWII, here’s the environment popping up where no one thought it would.

In addition, the implications for protein function, if this is a widespread phenomenon, are simply staggering.