A synonymous codon that isn’t

Molecular biology is simply too elegant and beautiful to be left to the molecular biologists.  So I’m going to present the intriguing result of a recent paper as I would take notes on it for myself, and then unpack it explaining the various terms contained as I go along.

If you’re really adventurous, start reading a series of 5 posts I wrote starting with https://luysii.wordpress.com/2010/07/07/molecular-biology-survival-guide-for-chemists-i-dna-and-protein-coding-gene-structure/ and follow the links.

It should explain everything in the paper below.

The paper itself is Nature vol. 602 pp. 335 – 342 ’22 — https://www.nature.com/articles/s41586-022-04451-4.pdf.

The unvarnished result:  Just mutating glutamine to lysine at position 61 of the KRAS oncogene (Q61K) isn’t enough to make a tumor resistant to the anticancer drug osimertinib (which attacks EGFR, a protein that signals through KRAS).  One of the synonymous codons for glycine at position 60 must also be switched to another.

OK:  let’s unpack this starting with synonymous codon.

The DNA making up our genome is a string of elements (nucleotides also known as bases) strung together.  Similarly, our proteins are strings of elements (amino acids).  The order is crucial; just as it is with the 26 letters making up words. Consider the two words united and untied.

Bases come in 4 varieties (A, T, G and C).  Amino acids come in twenty varieties, of which three are glycine (G), glutamine (Q) and lysine (K).  The one-letter abbreviations don’t make much sense, but that’s the way it is.

Since the order of both bases and amino acids is important, it’s clear that A T and T A are different.  2 bases can only code for 16 amino acids (4 x 4), which isn’t enough for 20.  Go up to 3 bases and you can code for 64 (4 x 4 x 4), which is overkill.  A sequence of 3 bases is called a codon.  All 64 codons code for an amino acid (except for three of them, about which much more later).  This means that there must be several codons coding for the same amino acid.  These are the synonymous codons.
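For anyone who likes to check the counting, here is a minimal Python sketch that enumerates every ordered string of 1, 2 and 3 bases.  The function name count_words is just something I made up for the illustration.

```python
from itertools import product

BASES = "ATGC"

def count_words(length):
    """Count the distinct ordered sequences of `length` bases by listing them all."""
    return sum(1 for _ in product(BASES, repeat=length))

for n in (1, 2, 3):
    # 1 base -> 4 words, 2 bases -> 16, 3 bases -> 64
    print(n, count_words(n))
```

Two-base words fall short of the 20 amino acids, while three-base words leave 44 to spare, which is where the synonymous codons come from.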

The number of codons for a given amino acid ranges from 1 (methionine M) to 6 (Leucine L).  Here are the 4 synonymous codons for glycine — GGA, GGC, GGG and GGT.  Note how similar they are.
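The grouping of codons into synonymous sets can be sketched in a few lines of Python.  This assumes the standard genetic code, written as the usual 64-letter string with the bases ordered T, C, A, G; the variable names are my own, and "*" marks the three codons that don't code for an amino acid.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; '*' = no amino acid
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group codons by the amino acid they code for
synonyms = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    synonyms[aa].append(codon)

print(sorted(synonyms["G"]))   # glycine: ['GGA', 'GGC', 'GGG', 'GGT']
print(synonyms["M"])           # methionine: ['ATG']
print(len(synonyms["L"]))      # leucine: 6
```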

Now the human genome has 3,200,000,000 bases strung together, divided into 23 pieces (the chromosomes); most of our cells carry two copies of the genome, for 46 chromosomes in all.  If one copy were placed end to end (Dorothy Parker fashion) it would be 3 feet 3 inches (1 meter) long.  All this is in a cell so small it is invisible to the naked eye.  If this is too much to get your head around, you might enjoy the following series of 6 posts.  Start here and follow the links: https://luysii.wordpress.com/2010/03/22/the-cell-nucleus-and-its-dna-on-a-human-scale-i/
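The 1 meter figure is easy to check on the back of an envelope.  The sketch below assumes the textbook spacing of about 0.34 nanometers per base along the double helix, a number that isn't in this post.

```python
BASES_IN_GENOME = 3_200_000_000
RISE_PER_BASE_NM = 0.34  # assumption: ~0.34 nm per base along the double helix

# Convert nanometers to meters
length_m = BASES_IN_GENOME * RISE_PER_BASE_NM * 1e-9
print(f"{length_m:.2f} m")  # 1.09 m
```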

Any 3 bases linked together code for an amino acid, but there are many different ways to ‘read’ the genome. Among the many proteins our genome codes for are the transcription factors (1,639 of them as of 2018) which bind to stretches of 10 or more bases, to activate certain genes.   That’s 4^10 possibilities (over a million) allowing a unique binding site for the 1,639.  So transcription factors read the genome in groups of 10 or so not 3.
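To make the two ways of reading concrete, here is a small Python sketch.  The 12-base sequence is made up for the illustration, and both function names are my own.  The first function reads non-overlapping triplets the way codons are read; the second slides a window of any size along the sequence, the way a site-seeking protein would encounter it.

```python
def codons_in_frame(seq):
    """Non-overlapping triplets, starting from the first base."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]

def sliding_windows(seq, k):
    """Every overlapping stretch of k consecutive bases."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

seq = "GGTCAAGAGGAG"  # hypothetical 12-base example

print(codons_in_frame(seq))           # ['GGT', 'CAA', 'GAG', 'GAG']
print(len(sliding_windows(seq, 10)))  # 3 ten-base windows in 12 bases
print(4 ** 10)                        # 1048576 possible ten-base sites
```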

There is yet another way to read the genome, and this has to do with the fact that the genes coding for proteins are much longer (have more bases) than 3 times the number of amino acids they code for.  The classic example is dystrophin, a gene mutated in Duchenne muscular dystrophy.  It’s a monster protein with 3,685 amino acids, so it needs 3,685 * 3 = 11,055 bases in a row to code for them at 3 bases per amino acid.  The dystrophin gene, however, stretches for 2,220,223 bases.  So the protein coding parts of the gene (the exons) come in 79 different pieces separated by parts that don’t code for amino acids (the introns).
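The dystrophin numbers above make the point by themselves, but the fraction of the gene that actually codes for protein is worth spelling out.  A quick Python check, using only the figures quoted in this post:

```python
AMINO_ACIDS = 3_685
GENE_LENGTH = 2_220_223  # bases spanned by the dystrophin gene

coding_bases = AMINO_ACIDS * 3
coding_fraction = coding_bases / GENE_LENGTH

print(coding_bases)              # 11055
print(f"{coding_fraction:.2%}")  # 0.50%
```

So roughly half a percent of the gene codes for amino acids; the rest has to be removed before the protein can be made.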

I’m skipping a lot here, but the introns must be spliced out of a copy of the gene (the mRNA).  Here the sequence is read by yet another machine (the spliceosome), which removes introns from newly formed mRNA.  The spliceosome is a huge molecular machine containing 5 RNAs (called small nuclear RNAs, aka snRNAs), along with 50 or so proteins, with a total molecular mass of around 2,500,000 Daltons (a carbon atom is 12 Daltons).  Most genes have introns and exons, and most of their proteins exist in multiple forms due to alternative splicing.  The spliceosome reads the mRNA in 6 – 8 base chunks looking for sites (splice sites) to bind and begin splicing out introns.  Yet another way to ‘read’ a sequence of bases.  Exon sequences which promote or repress the use of alternative splice sites are known (these are called ESEs = exonic splicing enhancers, and ESSs = exonic splicing silencers).

And now, at very long last, we get to the four synonymous codons of glycine which aren’t functionally synonymous at all.  This isn’t trivial: they determine the base sequence a mutated gene must have to produce cancer.

Here’s the unvarnished result once again:  Just mutating glutamine to lysine at position 61 of the KRAS oncogene (Q61K) isn’t enough to make a tumor resistant to the anticancer drug osimertinib (which attacks EGFR, a protein that signals through KRAS).  One of the synonymous codons for glycine at position 60 must also be switched to another.

What is KRAS?  A protein which gets its name from a virus causing cancer in rats.  Kirsten RAt Sarcoma virus.  KRAS, when active, relays signals from outside the cell to the nucleus to make the cell proliferate.  The protein exists in active and inactive forms.  Humans have KRAS, and 3 similar proteins.  Mutations causing  members of the protein family to remain in constantly active form are found in 1/3 of all cancers.  In the case of KRAS some activating mutations occur at positions 60 and 61 of the 189 amino acid protein.  That’s all it takes.

The codon for glutamine at position 61 in KRAS is CAA.  Changing it to a codon for lysine requires a change of just one base, from CAA (glutamine) to AAA (lysine), and now you have a KRAS which is always active, producing cancer.
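Here is a Python sketch that lists every single-base change to CAA and what each one codes for.  It assumes the standard genetic code, written as the usual 64-letter string with bases in T, C, A, G order; point_mutants is a name I made up.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; '*' = no amino acid
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def point_mutants(codon):
    """All codons that differ from `codon` at exactly one position."""
    return [
        codon[:i] + base + codon[i + 1:]
        for i in range(3)
        for base in BASES
        if base != codon[i]
    ]

mutants = {m: CODON_TABLE[m] for m in point_mutants("CAA")}
to_lysine = [m for m, aa in mutants.items() if aa == "K"]

print(mutants)
print(to_lysine)  # ['AAA']
```

Of the nine possible single-base changes, only one gives lysine, which is why the mutated codon at position 61 is AAA.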

Recall that glycine has 4 codons (GGA, GGC, GGG and GGT).  The one found in unmutated KRAS is GGT.  This codon is never found in the KRAS Q61K mutant seen in tumors.  Why?  Because GGTAAA forms a splice site, and when the splicing machine uses it, the mRNA is cut in the wrong place.  The misspliced mRNA contains one of the 3 codons mentioned above that don’t code for an amino acid.  They are called termination codons or stop codons, and they tell the machinery making protein from mRNA (the ribosome) to quit.  This means that the full mutated KRAS with its 189 amino acids is never made.  So tumor-producing KRAS has GGGAAA or GGAAAA or GGCAAA at positions 60 and 61 and never GGTAAA.
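A toy Python check shows why only GGT causes trouble.  It is a deliberately crude sketch: I'm assuming the four-base string GTAA as a stand-in for a splice-site-like sequence (introns nearly always begin with GT), whereas real splice site recognition depends on much more than four bases.  The function name is my own.

```python
GLYCINE_CODONS = ["GGA", "GGC", "GGG", "GGT"]
GLUTAMINE_CODON = "CAA"  # unmutated position 61
LYSINE_CODON = "AAA"     # Q61K mutant
DONOR_LIKE_MOTIF = "GTAA"  # assumption: toy stand-in for a splice-site-like sequence

def creates_donor_like_site(codon60, codon61):
    """True if the six bases spanning positions 60 and 61 contain the toy motif."""
    return DONOR_LIKE_MOTIF in (codon60 + codon61)

# Unmutated KRAS: GGTCAA
print("GGT" + GLUTAMINE_CODON, creates_donor_like_site("GGT", GLUTAMINE_CODON))

# Q61K with each of the four glycine codons at position 60
for gly in GLYCINE_CODONS:
    print(gly + LYSINE_CODON, creates_donor_like_site(gly, LYSINE_CODON))
```

Only GGTAAA comes back True.  The unmutated GGTCAA and the three Q61K versions with a switched glycine codon do not contain the motif.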

So the 4 synonymous glycine codons have very nonsynonymous effects.  Now you know.  Elegant, isn’t it?