Tag Archives: Termination Codon

A synonymous codon that isn’t

Molecular biology is simply too elegant and beautiful to be left to the molecular biologists.  So I’m going to present the intriguing result of a recent paper as I would take notes on it for myself, and then unpack it explaining the various terms contained as I go along.

If you’re really adventurous, start reading a series of 5 posts I wrote starting with https://luysii.wordpress.com/2010/07/07/molecular-biology-survival-guide-for-chemists-i-dna-and-protein-coding-gene-structure/ and follow the links.

It should explain everything in the paper below.

The paper itself is Nature vol. 603 pp. 335 – 342 ’22 (https://www.nature.com/articles/s41586-022-04451-4.pdf).

The unvarnished result:  Just mutating glutamine to lysine at position 61 of the KRAS oncogene (Q61K) isn’t enough to make KRAS resistant to an anticancer drug that attacks it (Osimertinib).  One of the synonymous codons for glycine at position 60 must be switched to another.

OK:  let’s unpack this starting with synonymous codon.

The DNA making up our genome is a string of elements (nucleotides also known as bases) strung together.  Similarly, our proteins are strings of elements (amino acids).  The order is crucial; just as it is with the 26 letters making up words. Consider the two words united and untied.

Bases come in 4 varieties (A, T, G and C).  Amino acids come in twenty varieties, of which three are glycine (G), glutamine (Q) and lysine (K). The one letter abbreviations don’t make much sense but that’s the way it is.

Since the order of both bases and amino acids is important, it’s clear that A T and T A are different. 2 bases can only code for 16 amino acids (4 x 4).  Go up to 3 bases and you can code for 64 (4 x 4 x 4), which is overkill for 20 amino acids.   A sequence of 3 bases is called a codon. All 64 codons code for an amino acid (except for three of them, about which much more later).  This means that there must be several codons coding for the same amino acid: these are the synonymous codons.

The number of codons for a given amino acid ranges from 1 (methionine M) to 6 (Leucine L).  Here are the 4 synonymous codons for glycine — GGA, GGC, GGG and GGT.  Note how similar they are.
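The counting above is easy to check for yourself. Here is a minimal Python sketch that builds the standard genetic code (written with DNA letters, `*` marking the three stop codons) and groups the codons by the amino acid they code for. The names `CODON_TABLE` and `synonyms` are just labels chosen for this illustration.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, in the order
# TTT, TTC, TTA, TTG, TCT, ... ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group codons by the amino acid they code for: these are the synonymous codons.
synonyms = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    synonyms[aa].append(codon)

print(len(BASES) ** 2, len(BASES) ** 3)  # 16 64
print(sorted(synonyms["G"]))             # ['GGA', 'GGC', 'GGG', 'GGT']
print(len(synonyms["M"]), len(synonyms["L"]), len(synonyms["*"]))  # 1 6 3
```

Methionine (M) has one codon, leucine (L) has six, and three codons code for no amino acid at all.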

Now the human genome has 3,200,000,000 bases strung together divided into 46 pieces (the chromosomes).  If placed end to end (Dorothy Parker fashion) they would be 3 feet 3 inches (1 meter) long.  All this is in a cell so small it is invisible to the naked eye.   If this is too much to get your head around, you might enjoy the following series of 6 posts — start here and follow the links https://luysii.wordpress.com/2010/03/22/the-cell-nucleus-and-its-dna-on-a-human-scale-i/
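The one meter figure is simple arithmetic. The sketch below assumes the textbook spacing of 0.34 nanometers between adjacent bases in the double helix; that number is my assumption and is not quoted in the post.

```python
BASES_IN_GENOME = 3_200_000_000   # figure from the text
RISE_PER_BASE_NM = 0.34           # assumption: standard spacing between bases in B-form DNA

length_m = BASES_IN_GENOME * RISE_PER_BASE_NM * 1e-9  # nanometers -> meters
print(f"{length_m:.2f} m")  # 1.09 m
```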

Any 3 bases linked together code for an amino acid, but there are many different ways to ‘read’ the genome. Among the many proteins our genome codes for are the transcription factors (1,639 of them as of 2018) which bind to stretches of 10 or more bases, to activate certain genes.   That’s 4^10 possibilities (over a million) allowing a unique binding site for the 1,639.  So transcription factors read the genome in groups of 10 or so not 3.
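A quick check of the arithmetic, using only the numbers in the paragraph above (a 10 base site and 1,639 transcription factors):

```python
TRANSCRIPTION_FACTORS = 1_639   # count quoted in the text (as of 2018)
SITE_LENGTH = 10                # bases per binding site, low end of "10 or more"

possible_sites = 4 ** SITE_LENGTH
print(possible_sites)                           # 1048576
print(possible_sites // TRANSCRIPTION_FACTORS)  # 639 distinct sequences per factor
```

So even at the minimum length there are hundreds of possible sequences for every transcription factor.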

There is yet another way to read the genome, and this has to do with the fact that the genes coding for proteins are much longer (have more bases) than 3 times the number of amino acids they code for.  The classic example is dystrophin, the gene mutated in Duchenne muscular dystrophy.  It codes for a monster protein with 3,685 amino acids, so it needs 3,685 * 3 = 11,055 bases in a row to code for them at 3 bases/amino acid.  The dystrophin gene, however, stretches for 2,220,223 bases.  So the protein coding parts of the gene (the exons) come in 79 different pieces separated by parts that don’t code for amino acids (the introns).
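To see just how little of the dystrophin gene actually codes for protein, here is the arithmetic as a short sketch. It uses only the two numbers quoted above and ignores the stop codon and any untranslated parts of the mRNA.

```python
AMINO_ACIDS = 3_685        # dystrophin protein length, from the text
GENE_LENGTH = 2_220_223    # dystrophin gene length in bases, from the text

coding_bases = AMINO_ACIDS * 3
coding_fraction = coding_bases / GENE_LENGTH

print(coding_bases)              # 11055
print(f"{coding_fraction:.2%}")  # 0.50%
```

About one base in 200 codes for an amino acid; the rest is intron.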

I’m skipping a lot here, but the introns must be spliced out of a newly formed copy of the gene (the mRNA).  This is done by yet another machine that reads the sequence, the spliceosome.  The spliceosome is a huge molecular machine containing 5 RNAs (called small nuclear RNAs, aka snRNAs), along with 50 or so proteins, with a total molecular mass of around 2,500,000 Daltons (a carbon atom is 12 Daltons).  Most genes have introns and exons, and most proteins exist in multiple forms due to alternative splicing.  The spliceosome reads the mRNA in 6 – 8 base chunks looking for sites (splicing sites) to bind and begin splicing out introns. Yet another way to ‘read’ a sequence of bases.   Exon sequences which promote or repress the use of alternative splicing sites are known (these are called ESEs = exonic splicing enhancers, and ESSs = exonic splicing silencers).

And now, at very long last, we get to the four synonymous codons of glycine which aren’t functionally synonymous at all.  This isn’t trivial: they determine the base sequence a mutated gene must have to produce cancer.

Here’s the unvarnished result once again — Just mutating glutamine to lysine at position 61 of the KRAS oncogene (Q61K) isn’t enough to make KRAS resistant to an anticancer drug that attacks it (Osimertinib).  One of the synonymous codons for glycine at position 60 must be switched to another.

What is KRAS?  A protein which gets its name from a virus causing cancer in rats.  Kirsten RAt Sarcoma virus.  KRAS, when active, relays signals from outside the cell to the nucleus to make the cell proliferate.  The protein exists in active and inactive forms.  Humans have KRAS, and 3 similar proteins.  Mutations causing  members of the protein family to remain in constantly active form are found in 1/3 of all cancers.  In the case of KRAS some activating mutations occur at positions 60 and 61 of the 189 amino acid protein.  That’s all it takes.

The codon for glutamine at position 61 in KRAS is CAA.  To change it to the codon for lysine requires a change of just one base e.g. from CAA (glutamine) to AAA (lysine) and now you have  a KRAS which is always active producing cancer.
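You can convince yourself that CAA to AAA is the only one-base route from this glutamine codon to lysine. The sketch below tries every single-base change to CAA and keeps the ones that code for lysine (K). The helper `single_base_neighbors` is a name made up for this illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def single_base_neighbors(codon):
    """Yield every codon that differs from `codon` at exactly one position."""
    for i, original in enumerate(codon):
        for base in BASES:
            if base != original:
                yield codon[:i] + base + codon[i + 1:]

wild_type = "CAA"  # codon 61 of KRAS, per the text
to_lysine = [c for c in single_base_neighbors(wild_type) if CODON_TABLE[c] == "K"]

print(CODON_TABLE[wild_type], to_lysine)  # Q ['AAA']
```

The other lysine codon, AAG, would need two bases changed at once.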

Recall that glycine has 4 codons (GGA, GGC, GGG and GGT).  The one found in unmutated KRAS is GGT.  This codon is never found in the KRAS Q61K mutant seen in tumors.  Why?  Because GGTAAA forms a splice site, which the splicing machinery uses to splice the mRNA differently.  The resulting mRNA contains one of the 3 codons mentioned above that don’t code for an amino acid.  They are called termination codons or stop codons, and tell the machinery making protein from mRNA (the ribosome) to quit.   This means that the full mutated KRAS with its 189 amino acids is never made.  So tumor producing KRAS has GGGAAA or GGAAAA or GGCAAA at positions 60 and 61 and never GGTAAA.
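Here is the same logic as a tiny sketch. It simply glues each glycine codon to the mutant lysine codon AAA and asks whether the six bases spell GGTAAA, the sequence the post describes as acting as a splice site. Real splice site recognition is far more involved than an exact string match, so treat this as a cartoon.

```python
GLYCINE_CODONS = ["GGA", "GGC", "GGG", "GGT"]
LYSINE_CODON = "AAA"      # the Q61K codon described in the text
SPLICE_MOTIF = "GGTAAA"   # the six bases said above to act as a splice site

for gly in GLYCINE_CODONS:
    pair = gly + LYSINE_CODON
    verdict = "creates the splice site" if SPLICE_MOTIF in pair else "no splice site"
    print(pair, verdict)

# Glycine codons that can sit next to AAA without creating the splice site.
tolerated = [g for g in GLYCINE_CODONS if SPLICE_MOTIF not in g + LYSINE_CODON]
print(tolerated)  # ['GGA', 'GGC', 'GGG']
```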

So the 4 synonymous glycine codons have very nonsynonymous effects.  Now you know.  Elegant, isn’t it?

 

 

Duchenne muscular dystrophy — a novel genetic treatment

Could the innumerable genetic defects underlying Duchenne muscular dystrophy all be treated the same way?  Possibly.  Paradoxically, the treatment involves actually making the gene  even worse.

Understanding how and why this might work involves a very deep dive into molecular biology.  You might start by looking at the series of five background articles I wrote — start at https://luysii.wordpress.com/2010/07/07/molecular-biology-survival-guide-for-chemists-i-dna-and-protein-coding-gene-structure/ and follow the links.

I have a personal interest in Duchenne muscular dystrophy because I ran a clinic for it from ’72 to ’87, watching young boys and adolescents die from it.  The major advance during that time was NOT medical or anything I did, but lighter braces, so the boys could stay ambulatory longer.  Things have improved: survival has increased by a decade, so they now die in their late 20s.

So let’s start.  Duchenne muscular dystrophy is caused by a mutation in the gene coding for dystrophin, a large (3,685 amino acids) protein which ties the contractile apparatus of the muscle cell (actin and myosin) to the cell membrane. Although it isn’t the largest protein we have (titin, another muscle protein with 34,350 amino acids, is), the gene for dystrophin is the largest we have, weighing in at 2,220,233 nucleotides.  This is why Duchenne is one of the most common diseases due to a defect in a single gene: the gene is so large that lots of things can (and do) go wrong with it.

The gene comes in 79 pieces (exons) which account for under 1/200 of the nucleotides of the gene.  The rest must be spliced out and discarded.  Have a look at http://www.dmd.nl to see what can go wrong: the commonest is deletion of parts of the gene (60 – 70% of cases), followed by duplication of other parts (10% of cases), with the rest being mutations that change one amino acid to another.

Duchenne isn’t like cystic fibrosis where some 600 different mutations in the causative CFTR gene were known by 2003 but with 90% of cases due to just one.  So any genetic treatment for that young boy sitting in front of you had better be personalized to his particular mutation.

Or should it?

Possibly not.  We’ll need to discuss 3 things first:

1. Nonsense Mediated Decay (NMD)

2. Nonsense Induced Transcriptional Compensation (NITC)

3. The mdx mouse model of Duchenne muscular dystrophy

Nonsense mediated decay.  Nonsense is a poor term, because the 3 nonsense codons (out of 64 possible) tell the ribosome to stop translating mRNA into protein and drop off the mRNA.  That isn’t nonsense.  I prefer stop codon, or termination codon.

An incredibly clever piece of business tells the ribosome (which is after all an inanimate object) when a stop codon occurs too early in the mRNA, when there are still a bunch of codons afterwards needed to make up the whole protein.

Let’s go back to dystrophin and its 79 exons, and the fact that 99.5% of the gene is made of introns which are spliced out.   Remember the mRNA starts at the 5′ end and ends at the 3′ end.  The ribosome reads and translates it from 5′ to 3′. When an intron is spliced out, a complex of several proteins is placed on the mRNA some 20 – 24 nucleotides 5′ to the splice site (this happens in the nucleus way before the mRNA gets near a ribosome in the cytoplasm).  The complex is called the Exon Junction Complex (EJC). The ribosome then happily munches along the mRNA from 5′ to 3′ knocking off the EJCs as it moves, until it hits a termination codon and drops off.

Over 95% of genes do not have introns after the termination codon.  What happens if one does? Well, then the termination codon is called a premature termination codon (PTC) and there is usually an EJC 3′ (downstream) to it.  If a termination codon is present 50 – 55 nucleotides or more 5′ (upstream) to an EJC then NMD occurs.

Whenever any termination codon is reached, release protein factors (eRF1, eRF3, SMG1) bind to the mRNA.  If there is an EJC around (which there shouldn’t be) the interaction between the two complexes triggers phosphorylation of one of the EJC proteins, triggering NMD.

So that’s how NMD happens, when there is a PTC.  Clever, no?
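The distance rule can be written down as a toy function. This is only a sketch of the rule as stated above, not of the real cellular machinery. The mRNA coordinates are invented for the example, and I’ve taken 50 nucleotides (the low end of the 50 – 55 range) as the cutoff.

```python
NMD_THRESHOLD = 50  # assumption: low end of the 50 - 55 nucleotide range in the text

def triggers_nmd(stop_pos, ejc_positions, threshold=NMD_THRESHOLD):
    """Return True if some EJC sits at least `threshold` nucleotides 3' of the stop codon."""
    return any(ejc - stop_pos >= threshold for ejc in ejc_positions)

# Hypothetical spliced mRNA: positions count nucleotides from the 5' end.
ejc_positions = [276, 576, 876]
normal_stop = 1_000    # stop codon after the last EJC: ribosome has cleared them all
premature_stop = 400   # stop codon with EJCs still sitting downstream

print(triggers_nmd(normal_stop, ejc_positions))     # False
print(triggers_nmd(premature_stop, ejc_positions))  # True
```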

Nonsense Induced Transcriptional Compensation (NITC).  I realize that this is a lot to throw at you, but a treatment for Duchenne is worth the effort (not to mention other genetic diseases in which the mechanism to be described also applies).

NITC is something I never heard about until two papers appeared in the 13 April Nature (vol. 568 pp. 179 – 180 (editorial), 193 – 197, 259 – 263).  Ever since we could knock out a gene by placing a PTC early (near the 5′ end) in it, we’ve been surprised by some of the results, e.g. knocking out some genes thought to be crucial had little or no effect.  Other technologies which didn’t affect the gene, but which decreased the expression of the mRNA (such as RNA interference, aka Post-Transcriptional Gene Silencing, PTGS) did have big phenotypic effects.

This turns out to be due to NITC: increased transcription of genes which are ancestrally related to the mutant gene.  Hard to believe.

Time to go back to NMD.  It doesn’t break mRNA down nucleotide by nucleotide, but fragments it.  These fragments get into the nucleus, and bind to complementary genomic sequences of the gene containing the PTC, and also to genes ancestrally related to the mutant gene (so they’ll have similar nucleotide sequences). Then epigenetics takes over because the fragments recruit the COMPASS complex which catalyzes the formation of H3K4Me3 which is part of the histone code which helps turn on transcription of the gene.  The sequence similarity of ancestrally related genes, allows them and only them to be turned on by NITC.  Even cleverer than finding a PTC by the ribosome.

Something so incredible needs evidence.  Well, heterozygous zebrafish can be made to have one normal gene and one with a PTC. What do you think happens?  The normal gene is upregulated (i.e. more is made).  Pretty good.

Finally the mdx mouse.  I’ve been reading about it for years.  It has a PTC in exon 23 of the dystrophin gene, resulting in a protein only 27% as long as it should be.  All sorts of therapeutic maneuvers have been tried on it.  Now any drug development chemist will tell you that animal models are lousy, but they’re all we’ve got.

The remarkable thing about mdx mice is that they don’t get weak.  They do have muscle pathology.  All the verbiage above probably explains why.

So to treat ALL forms of Duchenne, put a premature termination codon (PTC) in exon #23 of the human gene. It should work, as there are 4 dystrophin related proteins scattered around the genome: utrophin, dystrophin related protein 2 (DRP2), alpha dystrobrevin, and beta dystrobrevin.

There is an even better way to look for a place to put a PTC in the dystrophin gene.  Our genomes are filled with errors — for details see — https://luysii.wordpress.com/2018/05/01/how-badly-are-thy-genomes-oh-humanity-take-ii/.

There are lots of very normal people around with supposedly lethal mutations (including PTCs) in their genomes.  Probably scattered about various labs are at least 1,000,000 exome sequences in presumably normal people.  I’m not sure how much clinical information about them is available (other than that they are normal).  Hopefully their sex is.  Look at the dystrophin gene of normal males (females can be perfectly healthy carrying a mutant dystrophin gene as it is found on the X chromosome and they have 2) and see if PTCs are to be found.  You can’t have a better animal model than that.

At over 1,000 words this is the longest post I’ve written, and hopefully the most useful.

Is that mutation significant?

Face it, our genomes are a real mess. A study of just the parts of the genome coding for amino acids (2% at most) in about 2,500 people found an average of 205 variants which change the amino acid coded for IN EACH PERSON. Each person also had an average of 3 termination codons in the 15,000+ protein coding sequences they studied. So they are wandering around with 3 abnormally short proteins. You can read more about it in this old post: https://luysii.wordpress.com/2012/07/31/how-badly-are-thy-genomes-oh-humanity/

Here’s the problem — these people were healthy. Obviously, not a problem for them, but a big problem for physicians attempting to do genetic counseling. For how it affected epilepsy counseling see — https://luysii.wordpress.com/2011/07/17/weve-found-the-mutation-causing-your-disease-not-so-fast-says-this-paper/.

This brings us to Lynch syndrome (aka Hereditary NonPolyposis Colorectal Cancer, HNPCC). It is a familial cancer syndrome, and we now know what the problem is: mutations in any of four genes involved in a type of DNA mutation repair (there are many). The genes are called MSH2, MSH6, MLH1 and PMS2 (acronyms whose full names you don’t need to know) and the type of repair is called MisMatch Repair (MMR).

This isn’t academic at all. Suppose your aunt comes down with colon cancer and you get tested for mutations in one of the four, and a mutation is found. You’re fine now. The question before the house is — should you have your colon out? Colonoscopy won’t help because this kind of colon cancer doesn’t arise from polyps (which is what colonoscopy is looking for).

The problem is that the 4 genes are ‘peppered’ with missense variants (which change the amino acid coded for). They are called VUS (Variants of Unknown Significance). The following paper [ Proc. Natl. Acad. Sci. vol. 113 pp. 3918 – 3920, 4128 – 4133 ’16 ] used a clever way to test a VUS for significance. This would have been impossible 5 years ago. What they did was use CRISPR to introduce the variant into the appropriate gene in mouse Embryonic Stem cells. Then they tested the manipulated stem cells for defects in MisMatch Repair. They tested 59 (yes fifty-nine) such VUSs and found that about 1/3 (19) produced MMR defects.

Fascinating time to be alive and reading about all this stuff.

The Bach Fugue of the Genome

There are more things in heaven and earth, Horatio,
Than are dreamt of in your philosophy.
– Hamlet (1.5.167-8), Hamlet to Horatio

Just when you thought we’d figured out what genomes could do, the virusoid of rice yellow mottle virus performs a feat of dense coding I’d have thought impossible. The following work requires a fairly sophisticated understanding of molecular biology, for which the articles in “Molecular Biology Survival Guide for Chemists” might provide the background. Give it a shot. This is fascinating stuff. If the following seems incomprehensible, start with https://luysii.wordpress.com/2010/07/07/molecular-biology-survival-guide-for-chemists-i-dna-and-protein-coding-gene-structure/ and then follow the links forward.

Virusoids are single stranded circular RNAs which are dependent on a virus for replication. They are distinct from viroids because viroids need nothing else to replicate. Neither the virusoid nor the viroid was thought to code for protein (until now). They are usually found inside the protein shells of plant viruses.

[ Proc. Natl. Acad. Sci. vol. 111 pp. 14542 – 14547 ’14 ] Viroids and virusoids (viroid like satellite RNAs) are small (220 – 450 nucleotide) covalently closed circular RNAs. They are the smallest known replicating circular RNA pathogens. They replicate via a rolling circle mechanism to produce larger concatemers which are then processed into monomeric forms by a self-splicing hammerhead ribozyme, or by cellular enzymes.

The rice yellow mottle virus (RYMV) contains a virusoid which is a covalently closed circular RNA of a mere 220 nucleotides. A 16 kiloDalton basic protein is made from it. How can this be? Figure the average molecular mass of an amino acid at 100 Daltons, and 3 nucleotides per amino acid. This means that 220 nucleotides can code for 73 amino acids at most (e.g. for a 7 – 8 kiloDalton protein).
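Here is that back-of-the-envelope calculation as a sketch, using only the round numbers in the paragraph above. The last line asks how many times the ribosome would have to go around the circle to make a 16 kiloDalton protein.

```python
CIRCLE_LENGTH = 220        # nucleotides in the virusoid, from the text
AVG_AA_MASS_DA = 100       # rough average mass per amino acid, from the text
OBSERVED_MASS_DA = 16_000  # the 16 kiloDalton protein

max_aa_one_lap = CIRCLE_LENGTH // 3                 # 73 amino acids
max_mass_one_lap = max_aa_one_lap * AVG_AA_MASS_DA  # 7300 Daltons

aa_needed = OBSERVED_MASS_DA // AVG_AA_MASS_DA      # 160 amino acids
laps_needed = aa_needed * 3 / CIRCLE_LENGTH

print(max_aa_one_lap, max_mass_one_lap)  # 73 7300
print(f"{laps_needed:.1f} laps")         # 2.2 laps
```

One trip around the circle isn’t nearly enough; it takes a bit more than two.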

So far the RYMV virusoid is the only RNA of viroids and virusoids which actually codes for a protein. The virusoid sequence contains an internal ribosome entry site (IRES) of the following form UGAUGA. Initiation starts at the AUG, and since 220 isn’t an integral multiple of 3 (the size of amino acid codons), translation continues around the circle in another reading frame until it gets to one of the UGAs (termination codons) in UGAUGA (either the first three bases or the last three). Termination codons can be ignored (leaky codons) to obtain larger read through proteins. So this virusoid is a circular RNA with no NONcoding sequences which codes for a protein in either 2 or 3 of the 3 possible reading frames. Notice that UGAUGA contains UGA in both of the alternate reading frames ! So it is likely that the same nucleotide is being read 2 or 3 ways. Amazing ! ! !

It isn’t clear what function the virusoid protein performs for the virus when the virus has infected a cell. Perhaps there isn’t one, and the only function of the protein is to help the virusoid continue existence inside the virus.
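The frame shifting is easier to see by simulating it. The sketch below is NOT the real virusoid sequence: it is a made-up 220 nucleotide circle consisting of UGAUGA followed by filler C’s (chosen only because they never form a stop codon). It starts translating at the AUG, wraps around the circle, and stops at the first termination codon that lands in frame.

```python
CIRCLE_LENGTH = 220
MOTIF = "UGAUGA"
STOP_CODONS = {"UAA", "UAG", "UGA"}

# Synthetic stand-in for the virusoid: the motif plus filler, NOT the real sequence.
circle = MOTIF + "C" * (CIRCLE_LENGTH - len(MOTIF))

def codon_at(pos):
    """Return the three bases starting at position pos, wrapping around the circle."""
    return "".join(circle[(pos + k) % CIRCLE_LENGTH] for k in range(3))

start = circle.index("AUG")  # the AUG buried inside UGAUGA
pos, codons_read = start, 0
while codon_at(pos) not in STOP_CODONS:
    codons_read += 1
    pos += 3

laps = (pos - start) / CIRCLE_LENGTH
print(start, codons_read, pos % CIRCLE_LENGTH, round(laps, 2))  # 2 146 0 1.99
```

Because 220 leaves a remainder of 1 when divided by 3, each lap shifts the reading frame by one base. On this toy circle the first UGA only comes into frame after nearly two full laps, giving 146 amino acids. At roughly 100 Daltons apiece that is in the neighborhood of the 16 kiloDalton protein, though the real sequence may of course behave differently.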

Talk about information density. The RYMV virusoid is the Bach Fugue of the genome. Bach sometimes inverts the fugue theme, and sometimes plays it backwards (a musical palindrome if you will).

It is unfortunate that more people don’t understand the details of molecular biology so they can appreciate mechanisms of this elegance. Whether you think understanding it is an esthetic experience, is up to you. I do. To me, this resembles the esthetic experience that mathematics offers.

A while back I wrote a post, wondering if the USA was acquiring brains from the MidEast upheavals, the way we did from Europe because of WWII. Here’s the link https://luysii.wordpress.com/2014/09/28/maryam-mirzakhani/.

Clearly Canada has done just that. Here are the authors of the PNAS paper above and their affiliations. Way to go Canada !

Mounir Georges AbouHaidar
Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada M5S 3B2
Srividhya Venkataraman
Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada M5S 3B2
Ashkan Golshani
Biology Department, Carleton University, Ottawa, ON, Canada K1S 5B6
Bolin Liu
Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada M5S 3B2
Tauqeer Ahmad
Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada M5S 3B2