Tag Archives: initiator codon

The death of the synonymous codon – V

The coding capacity of our genome continues to amaze. The redundancy of the genetic code has been put to yet another use. Depending on how much you know, you can skip the following four links and read on; otherwise, all the background you need to understand what follows is in them.





There really is no way around the redundancy that produces synonymous codons. If you want to code for 20 different amino acids with only four choices at each position, two positions (4^2 = 16) won’t do. You need three positions, which gives you 64 possibilities (61 after the three stop codons are taken into account) and the redundancy that comes along with them. The previous links show how the redundant codons for some amino acids aren’t redundant at all but are used to code for the speed of translation, or for exonic splicing enhancers and inhibitors. Different codons for the same amino acid can produce wildly different effects while leaving the amino acid sequence of a given protein alone.
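The counting argument can be checked directly. Below is a minimal Python sketch that assumes the standard genetic code (NCBI translation table 1, written here in UCAG order). Names such as BASES and CODE are illustrative only.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order
# (first base varies slowest, third base fastest); '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

print(4 ** 2, "two-letter words: too few for 20 amino acids")
print(4 ** 3, "three-letter words")

# Drop the three stop codons to leave the sense codons
sense_codons = [codon for codon, aa in CODE.items() if aa != "*"]
print(len(sense_codons), "sense codons")

# How many synonymous codons each amino acid has
synonyms = Counter(aa for aa in CODE.values() if aa != "*")
print("Leu:", synonyms["L"], "Met:", synonyms["M"], "Trp:", synonyms["W"])
```

Leucine, serine and arginine each get six codons. Methionine (AUG) and tryptophan (UGG) get only one each, so they are the only two amino acids without synonymous codons.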

The latest example [ Proc. Natl. Acad. Sci. vol. 117 pp. 24936 – 24946 ’20 ] (https://www.pnas.org/content/117/40/24936) is even more impressive, as it implies that our genome may be coding for way more proteins than we thought.

The work concerns Mitochondrial DNA Polymerase Gamma (POLG), which is a hotspot for mutations (over 200 are known), 4 of which cause fairly rare neurologic diseases.

Normally, translation of mRNA into protein begins with something called an initiator codon (AUG), which codes for methionine. However, in the case of POLG, a CUG triplet (not AUG) located in the 5′ leader of POLG messenger RNA (mRNA) initiates translation almost as efficiently (∼60 to 70%) as an AUG in optimal context. This CUG directs translation of a conserved 260-triplet-long overlapping open reading frame (ORF) called POLGARF (POLG Alternative Reading Frame; surely they could have come up with something more euphonious).

Not only that, but the reading frame is shifted by one nucleotide (-1), meaning that the protein looks nothing like POLG and has a completely different amino acid composition. “We failed to find any significant similarity between POLGARF and other known or predicted proteins or any similarity with known structural motifs. It seems likely that POLGARF is an intrinsically disordered protein (IDP) with a remarkably high isoelectric point (pI = 12.05 for a human protein).” They have no idea what POLGARF does.
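This minimal Python sketch shows what a pI of 12 implies. It estimates an isoelectric point by finding the pH at which the Henderson-Hasselbalch net charge crosses zero. The pKa values are one approximate textbook set, which is an assumption: published tools use slightly different values, so their pI estimates differ. POLGARF's sequence isn't given here, so both peptides are made up for illustration.

```python
# Approximate pKa values (one textbook set; other tools use slightly different numbers)
PKA_POSITIVE = {"Nterm": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"Cterm": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(seq, ph):
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    pos_groups = ["Nterm"] + [aa for aa in seq if aa in PKA_POSITIVE]
    neg_groups = ["Cterm"] + [aa for aa in seq if aa in PKA_NEGATIVE]
    positive = sum(1 / (1 + 10 ** (ph - PKA_POSITIVE[g])) for g in pos_groups)
    negative = sum(1 / (1 + 10 ** (PKA_NEGATIVE[g] - ph)) for g in neg_groups)
    return positive - negative


def isoelectric_point(seq, tol=1e-4):
    """Bisect for the pH at which the net charge is zero (charge falls as pH rises)."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positive: the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2


basic_toy = "MRRPRSRRGRR"  # hypothetical arginine-rich peptide
acidic_toy = "MDEEDDE"     # hypothetical acidic peptide
print(round(isoelectric_point(basic_toy), 2), round(isoelectric_point(acidic_toy), 2))
```

Roughly speaking, a pI above 12 requires a chain packed with arginine (and lysine) and nearly free of aspartate and glutamate.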

Yet mammals make the protein. It gets more and more interesting, because the CUG triplet is part of something called a MIR (Mammalian-wide Interspersed Repeat), which, based on comparative genomics across many different animals, entered the POLG gene 135 million years ago.

By the teleological reasoning typical of biology, POLGARF must be doing something useful, or it would have been mutated away long ago.

The authors note that other mutations (even from one synonymous codon to another, hence the title of this post) could cause other diseases by changing the amino acid sequence of POLGARF. So while different synonymous codons might code for the same amino acid in POLG, they probably code for something wildly different in POLGARF.

So the same segment of the genome is coding for two different proteins.
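Here is a toy illustration of why a synonymous change can matter. This minimal Python sketch reads a made-up 15-nucleotide sequence (not POLG) in two frames. Starting one nucleotide earlier (-1) lands in the same frame as starting 2 nucleotides later, so offset 2 stands in for the -1 frame here. A third-position change from CCU to CCC leaves the proline in frame 0 untouched but turns a cysteine into an arginine in the shifted frame.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code (NCBI table 1) in UCAG order; '*' = stop
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(rna, offset):
    """Translate the complete codons starting at `offset` (0, 1 or 2); stops shown as '*'."""
    return "".join(CODE[rna[i:i + 3]] for i in range(offset, len(rna) - 2, 3))


original = "AUGCCUGCAGAUCGU"  # hypothetical sequence
mutated = "AUGCCCGCAGAUCGU"   # CCU -> CCC: both codons mean proline

for name, rna in (("original", original), ("mutated", mutated)):
    print(name, "frame 0:", translate(rna, 0), " shifted frame:", translate(rna, 2))
```

Third codon positions, where most synonymous changes occur, are the first positions of the codons in the shifted frame, which is why such changes rarely stay silent there.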

Is this a freak of nature? Hardly. There are an estimated 368,000 mammalian-wide interspersed repeats in our genome (https://en.wikipedia.org/wiki/Mammalian-wide_interspersed_repeat).

Could they be initiating translation of other proteins that we hadn’t dreamed of? Algorithms looking for protein-coding genes probably all look for AUG codons and then for open reading frames following them.
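To make the point concrete, here is a toy ORF scanner in Python. It is a minimal sketch and does not reproduce any particular gene-prediction program. The only difference between the two calls is the set of codons accepted as starts. The sequence is made up and contains no AUG at all.

```python
STOPS = {"UAA", "UAG", "UGA"}


def find_orfs(rna, starts=frozenset({"AUG"}), min_codons=1):
    """Return (start, end) for every start codon followed by an in-frame stop.

    `end` is the index just past the stop codon; min_codons counts the codons
    from the start codon up to (not including) the stop.
    """
    orfs = []
    for i in range(len(rna) - 2):
        if rna[i:i + 3] not in starts:
            continue
        for j in range(i + 3, len(rna) - 2, 3):
            if rna[j:j + 3] in STOPS:
                if (j - i) // 3 >= min_codons:
                    orfs.append((i, j + 3))
                break
    return orfs


toy = "GGCUGAAACCCGGGUAAGG"  # hypothetical: CUG ... UAA, no AUG anywhere
print(find_orfs(toy))                         # AUG-only scanner finds nothing
print(find_orfs(toy, starts={"AUG", "CUG"}))  # admitting CUG reveals an ORF
```

Real pipelines typically also impose a minimum ORF length, which would hide a short ORF like this one for a second reason. The min_codons parameter exposes that setting.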

As usual, Shakespeare got there first: “There are more things in heaven and earth, Horatio, Than are dreamt of in your philosophy.”

Certainly the paper of the year for intellectual interest and speculation.

smORFs, dwORFs and now uORFs

A recent post described small Open Reading Frames (smORFs) and DWarf Open Reading Frames (DWORFs); see the link at the bottom. Now it’s time for uORFs (upstream Open Reading Frames). Upstream of what, you might ask? Well, messenger RNA is grabbed by the ribosome at one end (called the 5′ end). The standard picture was that the ribosome marched along the mRNA in the 5′ to 3′ direction looking for the sequence Adenine Uridine Guanine (AUG), which codes for methionine. It then begins reading the mRNA 3 nucleotides at a time, tacking amino acids onto the methionine. This is called translating mRNA into protein. What about the 5′ end of the mRNA before the AUG is reached (perhaps hundreds of nucleotides later)? It isn’t translated, which is why it’s called the 5′ UTR (5′ UnTranslated Region). In bacteria it’s only a few nucleotides, but our 5′ UTRs can have thousands (https://en.wikipedia.org/wiki/Five_prime_untranslated_region).
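The scanning picture described above fits in a few lines. This minimal Python sketch assumes the simplest possible model: start at the first AUG and read triplets until a stop. It ignores everything that complicates the real process, such as the sequence context around the AUG, non-AUG starts and leaky scanning. The function name and the mRNA are made up.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code (NCBI table 1) in UCAG order; '*' = stop
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def scan_and_translate(mrna):
    """Simplified scanning model: return (5' UTR, protein) for an mRNA string."""
    start = mrna.find("AUG")  # first AUG met while moving 5' -> 3'
    if start == -1:
        return mrna, ""  # no AUG: under this model nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODE[mrna[i:i + 3]]
        if aa == "*":
            break  # stop codon ends translation
        protein.append(aa)
    return mrna[:start], "".join(protein)


utr5, protein = scan_and_translate("GGCACUAUGGCUUGGUAAAC")
print("5' UTR:", utr5, "protein:", protein)
```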

Two other terms of art are upstream and downstream. Since the ribosome flows from 5′ to 3′ on mRNA, any nucleotide 5′ to a given point is called upstream, and anything 3′ is called downstream. Logical terminology — what a pleasure.

So a uORF is an upstream Open Reading Frame. Upstream of what? Why, of the AUG (the initiator codon). The assumption had always been that since there was no initiator AUG codon in this region, proteins couldn’t be made from uORFs. Wrong.

This is where [ Science vol. 351 p. 465 aad2867 – 1 –> 9 ’16 ] comes in. It turns out that the ribosome can translate some of these uORFs into protein, and the paper describes a clever technique (called 3T) that the authors developed to find them. One of the problems in finding uORF proteins is that some are quite small and are missed by the usual protein assays. One uORF from ATF4 encodes only 3 amino acids, which is so small that mass spectrometry can’t see it.

The paper makes the amazing statement that “nearly half of all mammalian mRNAs harbor uORFs in the 5′ UTRs, and many are initiated with nonAUG start codons. They may be a general mechanism to regulate downstream coding sequence expression,” and gives two citations that I must have missed in my reading.

For instance, Binding immunoglobulin Protein (BiP, aka Heat Shock Protein family A member 5, HSPA5) contains uORFs initiated exclusively by UUG and CUG start codons (not AUG).
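This minimal Python sketch finds uORFs in a 5′ UTR when near-cognate codons are allowed as starts. The start set (AUG plus CUG, UUG and GUG), the helper name and the toy sequence are all assumptions for illustration. None of it reproduces the 3T method or BiP's real sequence. Each hit is labeled by where it stops relative to the main AUG, because a uORF that runs past the main start behaves differently from one that ends before it.

```python
STOPS = {"UAA", "UAG", "UGA"}
NEAR_COGNATE_STARTS = ("AUG", "CUG", "UUG", "GUG")  # assumed start set for this sketch


def find_uorfs(mrna, main_start, starts=NEAR_COGNATE_STARTS):
    """Return (start, end, kind) for each start codon upstream of main_start.

    end is the index just past the first in-frame stop (None if there is none).
    """
    hits = []
    for i in range(main_start):
        if mrna[i:i + 3] not in starts:
            continue
        end = None
        for j in range(i + 3, len(mrna) - 2, 3):
            if mrna[j:j + 3] in STOPS:
                end = j + 3
                break
        if end is not None and end <= main_start:
            kind = "contained"           # ends before the main start codon
        elif (main_start - i) % 3 == 0:
            kind = "in-frame extension"  # no stop before the main AUG: extends the main protein
        else:
            kind = "overlapping"         # out of frame, runs past the main start codon
        hits.append((i, end, kind))
    return hits


toy = "CUGAAAUAGCUUGCAUGGCCUAAGCUGA"  # hypothetical mRNA
main_start = toy.find("AUG")          # the main (first) AUG
for hit in find_uorfs(toy, main_start):
    print(hit)
```

If starts are restricted to AUG alone, this toy has no uORFs at all. That mirrors the BiP case above, where every uORF begins with UUG or CUG.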

What might the functions of uORFs actually be? The obvious one is that the proteins made from them might actually be doing something. What could a 3 amino acid protein possibly do? Lots. Consider thyrotropin releasing hormone, which helps control your thyroid: it is pyroglutamic acid, histidine and proline (the proline amidated). Then there is met-enkephalin, which has 5 amino acids and is one of the endogenous opiate peptides your brain uses.

Another possibility is that just translating the uORF into protein controls the translation of the protein starting with the AUG codon. This isn’t so far-fetched. A recent paper [ Nature vol. 529 pp. 551 – 554 ’16 ] gave a three-dimensional structure for RNA polymerase II transcribing a DNA template into mRNA. The author (Carrie Bernecky) was kind enough to supply the dimensions of the complex when I wrote her: it is roughly 150 x 150 x 160 Angstroms. Remember you can consider the DNA double helix as a cylinder 20 Angstroms in diameter. Figuring 3 stacked nucleotides per 10 Angstroms, one 150 Angstrom side of the complex is enough to obstruct 45 nucleotides of DNA upstream of the actual start site.
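The footprint arithmetic can be spelled out in a few lines of Python. This minimal sketch uses the post's rule of thumb of 3 stacked nucleotides per 10 Angstroms, which is close to the textbook B-DNA rise of about 3.4 Angstroms per base pair. It also assumes that one face of the complex lies along the helix.

```python
NT_PER_10_ANGSTROM = 3  # rule of thumb from the post: ~3 stacked nucleotides per 10 Angstroms


def nucleotides_spanned(length_angstrom):
    """Number of stacked nucleotides along a stretch of helix of the given length."""
    return length_angstrom * NT_PER_10_ANGSTROM / 10


# Dimensions of the RNA polymerase II complex quoted in the post, in Angstroms
for side in (150, 150, 160):
    print(side, "Angstroms covers about", nucleotides_spanned(side), "nucleotides")
```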

This is just another example of room at the bottom, where all sorts of small-molecule metabolites, small RNAs and small DNAs are just being unearthed and their structures determined. For more on this please see the following link.


How ‘simple’ can a protein be and still have a significant biological effect

Words only have meaning in the context of the much larger collection of words we call language. So it is with proteins. Their only ‘meaning’ is the biologic effects they produce in the much larger collection of proteins, lipids, sugars, metabolites, cells and tissues of an organism.

So how ‘simple’ can a protein be and still produce a meaningful effect? As Bill Clinton would say, that depends on what you mean by simple. Well, one way a protein can be simple is by having only a few amino acids. Met-enkephalin, an endogenous opiate, contains only 5 amino acids. Now many wouldn’t consider met-enkephalin a protein, calling it a polypeptide instead. But the boundary between polypeptide and protein is as fluid and ill-defined as the one between a few grains of sand and a pile of it.

Another way to define simple is by having most of the protein made up of just a few of the 20 amino acids. Collagen is a good example. Nearly half of it is glycine and proline (plus a modified proline called hydroxyproline), leaving the other 18 amino acids to make up the rest. Collagen is big despite being simple: a single molecule has a mass of 285 kiloDaltons.

This brings us to [ Proc. Natl. Acad. Sci. vol 112 pp. E4717 – E4727 ’15 ]. They constructed a protein/polypeptide of 26 amino acids, 25 of which are either leucine or isoleucine. The 26th amino acid is methionine, which is found at the very amino-terminal end of newly made proteins (remember, the initiator codon AUG codes for methionine).
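Both senses of ‘simple’ in the last two paragraphs come down to amino acid composition, which is easy to compute. This is a minimal Python sketch. The post doesn't give the actual 26-residue sequence, so the arrangement of L and I below is hypothetical. The collagen-like stretch is hypothetical too. Real collagen is built from Gly-X-Y repeats, which is why glycine sits at every third position.

```python
from collections import Counter


def composition(seq):
    """Fraction of the chain contributed by each amino acid (one-letter codes), most common first."""
    return {aa: n / len(seq) for aa, n in Counter(seq).most_common()}


def fraction_of(seq, residues):
    """Combined fraction of the chain made up by the given residues."""
    return sum(seq.count(aa) for aa in residues) / len(seq)


toy_26mer = "M" + "LLII" * 6 + "L"    # hypothetical: Met + 25 Leu/Ile (not the published sequence)
toy_collagen = "GPPGPAGPRGEPGAPGPQ"  # hypothetical Gly-X-Y stretch

print(len(toy_26mer), round(fraction_of(toy_26mer, "LI"), 3))
print(round(fraction_of(toy_collagen, "G"), 3))
print(composition(toy_26mer))
```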

What does it do? It causes tumors. How so? It binds to the transmembrane domain of the beta variant of the receptor for Platelet-Derived Growth Factor (PDGFRbeta). When turned on, the receptor causes cells to proliferate.

What is the smallest known oncoprotein? It is the E5 protein of Bovine PapillomaVirus (BPV), which is essentially a free-standing transmembrane domain (which also binds to PDGFRbeta). It has only 44 amino acids.

Well, we have 26 letters plus a space. I leave it to you to choose 3 of them, use one of them once and the other two 25 times (with as many spaces as you want), and construct a meaningful sequence from them (in any language using the English alphabet).

Just back from an Adult Chamber Music Festival (aka Band Camp for Adults). More about that in a future post.

Are you as smart as the (inanimate) blind watchmaker

Here’s a problem the cell has solved. Can you? Figure out a way to send a protein to two different membranes in the cell (the membrane enclosing it, aka the plasma membrane, and the endoplasmic reticulum) in the proportions you wish.

The proteins must have exactly the same sequence and content of amino acids, ruling out alternative splicing of exons in the mRNA (if this is Greek to you, have a look at the following post — https://luysii.wordpress.com/2012/01/09/molecular-biology-survival-guide-for-chemists-v-the-ribosome/ and the others collected under — https://luysii.wordpress.com/category/molecular-biology-survival-guide/).

The following article tells you how the cell does it. Recall that not all of the messenger RNA (mRNA) is translated into protein. The ribosome latches on to the 5′ end of the mRNA, subsequently moving toward the 3′ end until it finds the initiator codon (AUG, which codes for methionine). This means that there is a 5′ untranslated region (5′ UTR). The ribosome then continues moving 3′ward, stitching amino acids together. Similarly, after the last codon (one of 3 stop codons) there is a 3′ untranslated region (3′ UTR) of the mRNA. The 3′ UTR isn’t left alone: it is cleaved, and a polyadenine (polyA) tail is added to it. The 3′ UTR is where most microRNAs bind, controlling mRNA stability (and hence the amount of protein produced from a given mRNA).
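The layout just described (5′ UTR, coding sequence, 3′ UTR) can be sketched in Python under the same simplified scanning assumptions. The first AUG starts the coding sequence, and the first in-frame stop ends it. The function name and the toy mRNA are illustrative only.

```python
STOPS = {"UAA", "UAG", "UGA"}


def split_mrna(mrna):
    """Split an mRNA into (5' UTR, coding sequence, 3' UTR) under a simple scanning model."""
    start = mrna.find("AUG")
    if start == -1:
        raise ValueError("no AUG found")
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOPS:
            end = i + 3  # the stop codon is counted as part of the coding sequence
            return mrna[:start], mrna[start:end], mrna[end:]
    raise ValueError("no in-frame stop codon after the AUG")


utr5, cds, utr3 = split_mrna("GGCACUAUGGCUUGGUAAACGCU")  # hypothetical mRNA
print("5' UTR:", utr5, "| CDS:", cds, "| 3' UTR:", utr3)
```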

The trick used by the cell is described in [ Nature vol. 522 pp. 363 – 367 ’15 ]. The 3′ UTR is alternatively processed, producing a variety of short and long 3′ UTRs. One protein where this happens is CD47, which is found on the surface of most cells, where it stops the cell from being eaten by scavenger cells such as macrophages. The long 3′ UTR of CD47 allows efficient cell surface expression, while the short 3′ UTR localizes it to the endoplasmic reticulum.

How could this possibly work? Once the protein is translated by the ribosome, it leaves the ribosome and the mRNA doesn’t it? Not quite.

They say that the long 3′ UTR of CD47 acts as a scaffold to recruit a protein complex containing the RNA-binding protein HuR (aka ELAVL1) and SET to the site of translation. This allows SET to interact with the newly translated cytoplasmic domains of CD47, resulting in subsequent translocation of CD47 to the plasma membrane via activated RAC1.

The short 3′ UTR of CD47 lacks the sequence that binds HuR (and with it SET), so CD47 goes not to the plasma membrane but to the endoplasmic reticulum.
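The logic of the last few paragraphs can be caricatured in a few lines of Python. Everything here is a stand-in. The 3′ UTR is made up and the cleavage positions are arbitrary. The HuR-binding sequence is represented by the AU-rich pentamer AUUUA, a classic AU-rich element motif; the actual CD47 sites aren't given in the post. The sketch only shows the decision. Cleaving early loses the binding site, and cleaving late keeps it.

```python
HUR_SITE = "AUUUA"  # stand-in for the HuR-binding sequence (assumption)


def processed_3utr(utr3, cleavage_site, tail_length=10):
    """Cut the 3' UTR at cleavage_site and add a poly(A) tail (tail length arbitrary)."""
    return utr3[:cleavage_site] + "A" * tail_length


def predicted_destination(utr3, cleavage_site):
    """Toy rule from the post: HuR site kept -> HuR/SET recruited -> plasma membrane."""
    # Check the UTR before the tail is added, so the poly(A) can't complete a motif.
    kept = utr3[:cleavage_site]
    return "plasma membrane" if HUR_SITE in kept else "endoplasmic reticulum"


toy_utr3 = "GCUAGCCGAUCUAUUUAUCGGGCAUCGUA"  # hypothetical 3' UTR
proximal, distal = 10, len(toy_utr3)         # short vs long isoform (arbitrary positions)
for name, site in (("short", proximal), ("long", distal)):
    print(name, processed_3utr(toy_utr3, site), predicted_destination(toy_utr3, site))
```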

The mechanism may be quite general as HuR binds to thousands of mRNAs. The paper gives two more examples of proteins where this happens.

It’s also worth noting that all this exquisite control does NOT involve covalent bond formation and breakage (i.e., not what we consider classic chemical reactions). Instead, it’s the dance of one large molecular object binding to another in other ways. The classic chemist isn’t smiling. The physical chemist is.