Tag Archives: Spliceosome

A synonymous codon that isn’t

Molecular biology is simply too elegant and beautiful to be left to the molecular biologists.  So I’m going to present the intriguing result of a recent paper as I would take notes on it for myself, and then unpack it explaining the various terms contained as I go along.

If you’re really adventurous, start reading a series of 5 posts I wrote starting with https://luysii.wordpress.com/2010/07/07/molecular-biology-survival-guide-for-chemists-i-dna-and-protein-coding-gene-structure/ and follow the links.

It should explain everything in the paper below.

The paper itself is Nature vol. 602 pp. 335 – 342 ’22 — https://www.nature.com/articles/s41586-022-04451-4.pdf.

The unvarnished result:  Just mutating glutamine to lysine at position 61 of the KRAS oncogene (Q61K) isn’t enough to make a cancer cell resistant to an anticancer drug (osimertinib, which attacks EGFR, a protein upstream of KRAS).  One of the synonymous codons for glycine at position 60 must be switched to another.

OK:  let’s unpack this starting with synonymous codon.

The DNA making up our genome is a string of elements (nucleotides also known as bases) strung together.  Similarly, our proteins are strings of elements (amino acids).  The order is crucial; just as it is with the 26 letters making up words. Consider the two words united and untied.

Bases come in 4 varieties (A, T, G and C).  Amino acids come in twenty varieties, of which three are glycine (G), glutamine (Q) and lysine (K); the one letter abbreviations don’t make much sense but that’s the way it is.

Since the order of both bases and amino acids is important, it’s clear that A T and T A are different.  2 bases give only 4 x 4 = 16 combinations, so they can only code for 16 amino acids, too few for the twenty we have.  Go up to 3 bases and you can code for 4 x 4 x 4 = 64 amino acids, which is overkill.  A sequence of 3 bases is called a codon.  All 64 codons code for an amino acid (except for three of them about which much more later).  This means that there must be several codons coding for the same amino acid; these are the synonymous codons.

The number of codons for a given amino acid ranges from 1 (methionine M) to 6 (Leucine L).  Here are the 4 synonymous codons for glycine — GGA, GGC, GGG and GGT.  Note how similar they are.
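For readers who like to check counts by machine, here is a minimal sketch of the codon arithmetic. It assumes the standard genetic code written in the usual compact form (bases in T, C, A, G order, one letter per codon, '*' for the three codons that don't code for an amino acid); the names `CODON_TABLE` and `synonymous_codons` are made up for this illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, codons enumerated in TCAG order.
# '*' marks the three codons that do not code for an amino acid.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 three-base codons to its one-letter amino acid.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(amino_acid):
    """Return every codon that codes for the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


print(len(BASES) ** 2, len(BASES) ** 3)   # 2-base and 3-base combinations
print(synonymous_codons("G"))             # the glycine codons
print(len(synonymous_codons("M")), len(synonymous_codons("L")))
```

Asking for `synonymous_codons("*")` lists the three codons that don't code for an amino acid.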

Now the human genome has 3,200,000,000 bases strung together, divided into 23 pieces (the chromosomes; nearly every cell carries two copies of each, 46 in all).  If placed end to end (Dorothy Parker fashion) one set would be 3 feet 3 inches (1 meter) long.  All this is in a cell so small it is invisible to the naked eye.   If this is too much to get your head around, you might enjoy the following series of 6 posts: start here and follow the links https://luysii.wordpress.com/2010/03/22/the-cell-nucleus-and-its-dna-on-a-human-scale-i/

Any 3 bases linked together code for an amino acid, but there are many different ways to ‘read’ the genome. Among the many proteins our genome codes for are the transcription factors (1,639 of them as of 2018) which bind to stretches of 10 or more bases, to activate certain genes.   That’s 4^10 possibilities (over a million) allowing a unique binding site for the 1,639.  So transcription factors read the genome in groups of 10 or so not 3.

There is yet another way to read the genome, and this has to do with the fact that the genes coding for proteins are much longer (have more bases) than 3 times the number of amino acids they code for.  The classic example is dystrophin, the protein whose gene is mutated in Duchenne muscular dystrophy.  It’s a monster protein with 3,685 amino acids, so it needs 3,685 * 3 = 11,055 bases in a row to code for them at 3 bases/amino acid.  The dystrophin gene, however, stretches for 2,220,223  bases.  So the protein coding parts of the gene (the exons) come in 79 different pieces separated by parts that don’t code for amino acids (the introns).
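To put numbers on how little of the dystrophin gene actually codes for protein, here is a back-of-the-envelope sketch using only the figures quoted above. The variable names are mine, and the intron count simply assumes one intron between each pair of neighboring exons.

```python
AMINO_ACIDS_IN_DYSTROPHIN = 3_685
GENE_LENGTH_BASES = 2_220_223   # gene length as quoted in the text
EXON_COUNT = 79

# Three bases per amino acid.
coding_bases = AMINO_ACIDS_IN_DYSTROPHIN * 3

# Fraction of the gene that ends up specifying amino acids.
coding_fraction = coding_bases / GENE_LENGTH_BASES

# Assumption: introns sit between consecutive exons, so there is one fewer.
intron_count = EXON_COUNT - 1

print(coding_bases)
print(f"{coding_fraction:.2%}")
print(intron_count)
```

The coding fraction comes out at roughly half a percent: nearly all of the gene is intron.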

I’m skipping a lot here, but the introns must be spliced out of a copy of the gene (mRNA).  Here the copy is read by yet another machine (the spliceosome), which removes introns from newly formed copies of the gene.  The spliceosome is a huge molecular machine containing 5 RNAs (called small nuclear RNAs, aka snRNAs), along with 50 or so proteins, with a total molecular mass of around 2,500,000 Daltons (a carbon atom is 12 Daltons).  Most protein-coding genes have introns and exons, and most of them give rise to multiple forms of their protein due to alternative splicing.  The spliceosome reads the mRNA in 6 – 8 base chunks looking for sites (splicing sites) to bind and begin splicing out introns. Yet another way to ‘read’ a sequence of bases.   Exon sequences which promote or repress the use of alternative splicing sites are known (these are called ESEs = exonic splicing enhancers, and ESSs = exonic splicing silencers).

And now, at very long last, we get to the four synonymous codons of glycine which aren’t functionally synonymous at all.  This isn’t trivial: they determine the base sequence a mutated gene must have to produce cancer.

Here’s the unvarnished result once again: Just mutating glutamine to lysine at position 61 of the KRAS oncogene (Q61K) isn’t enough to make a cancer cell resistant to an anticancer drug (osimertinib, which attacks EGFR, a protein upstream of KRAS).  One of the synonymous codons for glycine at position 60 must be switched to another.

What is KRAS?  A protein which gets its name from a virus causing cancer in rats.  Kirsten RAt Sarcoma virus.  KRAS, when active, relays signals from outside the cell to the nucleus to make the cell proliferate.  The protein exists in active and inactive forms.  Humans have KRAS, and 3 similar proteins.  Mutations causing  members of the protein family to remain in constantly active form are found in 1/3 of all cancers.  In the case of KRAS some activating mutations occur at positions 60 and 61 of the 189 amino acid protein.  That’s all it takes.

The codon for glutamine at position 61 in KRAS is CAA.  To change it to the codon for lysine requires a change of just one base e.g. from CAA (glutamine) to AAA (lysine) and now you have  a KRAS which is always active producing cancer.

Recall that glycine has 4 codons (GGA, GGC, GGG and GGT).  The one found in unmutated KRAS is GGT.  This codon is never found in the KRAS Q61K mutant seen in tumors.  Why?  Because GGTAAA forms a splice site which the splicing machine uses to cut out a different set of introns, splicing into an exon that contains one of the 3 codons mentioned above not coding for an amino acid.  They are called termination codons or stop codons, and tell the machinery making protein from mRNA (the ribosome) to quit.   This means that the full mutated KRAS with its 189 amino acids is never made.  So tumor producing KRAS has GGGAAA or GGAAAA or GGCAAA at positions 60 and 61 and never GGTAAA.
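The logic of the previous paragraph fits in a few lines. This is only a toy sketch: it treats the six bases GGTAAA as a literal string to be matched, whereas a real spliceosome recognizes splice sites far less rigidly. The function name `creates_cryptic_site` is hypothetical.

```python
GLYCINE_CODONS = ["GGA", "GGC", "GGG", "GGT"]
LYSINE_CODON = "AAA"      # codon 61 after the Q61K change (CAA -> AAA)
CRYPTIC_SITE = "GGTAAA"   # the six bases the text says are read as a splice site


def creates_cryptic_site(codon_60, codon_61=LYSINE_CODON):
    """True if codons 60 and 61 together spell out the splice site."""
    return codon_60 + codon_61 == CRYPTIC_SITE


# Glycine codons at position 60 that are compatible with a working Q61K KRAS.
tolerated = [c for c in GLYCINE_CODONS if not creates_cryptic_site(c)]
print(tolerated)

# Unmutated KRAS (GGT followed by CAA for glutamine) does not spell the site.
print(creates_cryptic_site("GGT", "CAA"))
```

All four codons still say 'glycine' to the ribosome; only one of them, next to AAA, also says 'cut here' to the spliceosome.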

So the 4 synonymous glycine codons have very nonsynonymous effects.  Now you know.  Elegant, isn’t it?



A moonlighting quorum sensing molecule

Bacteria talk to each other using quorum sensing molecules. Although the first one was found 50 years ago, the field really opened up with the work of Bonnie Bassler at Princeton in the 90s. These are small molecules which bacteria secrete, so that when there are a lot of bacteria around, the concentration of quorum sensors rises, allowing them to get into bacteria (by the law of mass action), changing gene expression for a variety of things, particularly virulence and biofilm formation. They have also been used by bacteria to compete with those of a different species.  There was a lot of hope that we could control some nasty bugs (such as Pseudomonas) by messing about with their quorum sensors, but it hasn’t panned out.

The real surprise came in a paper [ Proc. Natl. Acad. Sci. vol. 118 e2012529118  ’21 ] showing that Pseudomonas uses one of its quorum sensing molecules (C12 < N-(3-oxo-dodecanoyl) homoserine lactone >) inside the eukaryotic cells it attacks.

What it does once inside is to attack cellular organelles I’d (and probably you’d) never heard of, called vaults.  They’ve been known since ’86, and given their size (12.9 megaDaltons) I’m surprised I’d never heard of them.  The likely reason is that no one knows what their function is.

A vault is made of just 3 proteins

1. MVP — Major Vault Protein, mass 100 kiloDaltons 96 copies/vault

2. VPARP — Vault poly ADP ribose polymerase 190 kiloDaltons

3. Telomerase associated protein 290 kiloDaltons.

Human vaults contain 4 different RNAs (called, naturally enough vault RNAs < vtRNAs >).  They are 88 – 100 nucleotides long.  

Vaults look like hand grenades and are 670 Angstroms long and 400 Angstroms in maximum diameter.

[ Cell vol. 176 pp. 1054 – 1067 ’19 ] says that there can be 10,000 to 100,000 vaults/cell.  So why haven’t I seen them?

One of the vtRNAs binds to a protein involved in autophagy inhibiting it. This is an example of an RNA binding to a protein altering its function, something unusual until you think of the ribosome or the spliceosome. Starvation decreases the number of vaults inducing autophagy.

Once pseudomonas C12 gets into a cell it binds to the Major Vault Protein, causing its translocation into lipid rafts, the net effect being attenuation of the p38 protein kinase pathway to attenuate programmed cell death (apoptosis).  

So C12 keeps the cell alive when normally it would die.  A lot of recent work has shown that bacteria infiltrate cancers.  Do they do something similar to cancer cells to keep them alive?

It really makes you humble (or should) to realize how many separate parts of cellular and molecular biology you must understand to even hope to understand how cells (and bacteria) go about their business. 

Think of how many terms were introduced to understand what the humble quorum sensor C12 is up to. 

Microexons, great new druggable targets

Some very serious new players in cellular and tissue molecular biology have just been found. They are very juicy druggable targets, not that targeting them will be easy. If you don’t know what introns, exons and alternate splicing are, it’s time to learn. Go to https://luysii.wordpress.com/2010/07/07/molecular-biology-survival-guide-for-chemists-i-dna-and-protein-coding-gene-structure/ read and follow the links forward. It should be all you need to comprehend the following.

The work came out at the tail end of 2014 [ Cell vol. 159 pp. 1488 – 1489, 1511 – 1523 ’14 ]. Microexons are defined as exons containing 50 nucleotides or less (the paper says 3 – 27 nucleotides). They have been overlooked, partially because their short length makes them computationally difficult to find. Also few bothered to look for them as they were thought to be unfavorable for splicing because they were too short to contain exonic splicing enhancers. They are so short that it was thought that the splicing machinery (which is huge) couldn’t physically assemble at both the 3′ and 5′ splice sites. So much for theory, they’re out there.

What is a cell and tissue differentially regulated alternative splicing event? It’s the way a given mRNA can be spliced together one way in tissue/cell #1 and another in tissue/cell #2 producing different proteins in each. Exons subject to tissue specific alternative splicing are significantly UNDERrepresented in well folded domains in proteins. Instead they are found in regions of protein disorder more frequently than one would expect by chance. Typically these regions are on the protein surface. The paper found that the microexons code for short amino acid motifs which typically interact with other proteins and ligands. 3 – 27 nucleotides lets you only code for 1 – 9 amino acids.

One well known example of a short interaction motif is RGD (for Arginine Glycine Aspartic acid in the single letter amino acid code). The sequence is recognized by a family of cell surface proteins (the integrins) with at least 26 known members. These 3 amino acids, displayed on an extracellular protein, are all an integrin needs to bind it; the integrins’ partners include a variety of extracellular molecules (collagen, fibrin, glycosaminoglycans, proteoglycans). So a 3 amino acid sequence on the surface of a protein can do quite a bit.

Among a set of analyzed neural specific exons (i.e. they were only spliced that way in the brain) found in known disordered regions of the parent protein, 1/3 promoted or disrupted interactions with partner proteins. So regulated exon splicing might specify tissue and cell type specific protein interaction networks (Translation: they might explain why tissues look different even when they express the same genes). The authors regard microExon inclusion/exclusion as protein surface microsurgery.

The paper has found HUNDREDS of evolutionarily highly conserved microexons from RNA-Seq data sets (http://en.wikipedia.org/wiki/RNA-Seq) in various species. Many of them impact neurogenesis and brain function. Regulation of microExons changes significantly during neuronal differentiation. Although microexons represent only 1% of the alternate splice sites seen, they constitute ‘up to’ 1/3 of all evolutionarily conserved neural-regulated alternative splicing between man and mouse.

The inclusion in the final transcript of most identified neural microExons is regulated by a brain specific factor nSR100 (neural specific SR related protein of 100 kiloDaltons)/SRRM4 which binds to intronic enhancer UGC motifs close to the 3′ splice sites, resulting in their inclusion. They are ‘enhanced’ by tissue specific RBFox proteins. nSR100 is reduced in Autism Spectrum Disorder (really? all? some?). nSR100 is strongly coexpressed in the developing human brain in a gene network module M2 which is enriched for rare de novo ASD associated mutations.

MicroExons are enriched for lengths which are multiples of 3 nucleotides (implying strong selection pressure to preserve reading frames). The microExons are also enriched in charged amino acids. Most microExons show high inclusion at late stages of neuronal differentiation in genes associated with axon and synapse function. A neural specific microExon in Protrudin/Zfyve27 increases its interaction with Vesicle Associated membrane protein associated Protein (VAP) and promotes neurite outgrowth. A 6 nucleotide neural microExon in Apbb1/Fe65 promotes an interaction with Kat5/Tip60. Apbb1 is an adaptor protein functioning in neurite outgrowth.
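Why a length that is a multiple of 3 matters can be seen with a toy example. The sketch below is illustrative only: the exon sequences are invented, the table is the standard genetic code in compact form, and `translate` is a hypothetical helper that reads codons from the first base and stops at a stop codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Read seq three bases at a time from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Invented flanking exons and two invented microexons.
upstream = "ATGGCTGAA"        # codes for M A E
downstream = "CTGAAAGGTTGG"   # codes for L K G W
micro_6nt = "CGTGAT"          # 6 bases: adds R D and keeps the frame
micro_4nt = "CGTG"            # 4 bases: shifts the frame of everything after it

print(translate(upstream + downstream))
print(translate(upstream + micro_6nt + downstream))
print(translate(upstream + micro_4nt + downstream))
```

With the 6 base microexon the downstream amino acids survive untouched and two new ones appear in the middle (surface microsurgery); with the 4 base one everything after the insertion is read in the wrong frame.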

So inclusion/exclusion of microExons can alter the interactions of proteins involved in neurogenesis. Misregulation of neural specific microexons has been found in autism spectrum disorder (what hasn’t? Pardon the cynicism).

Protein interaction domains haven’t been studied to nearly the extent they need to be, and we know far less about them than we should. All the large molecular machines of the cell (ribosome, mediator, spliceosome, mitochondrial respiratory chain) involve large numbers of proteins interacting with each other not by the covalent bonds beloved by organic chemists, but by much weaker forces (van der Waals, charge attraction, hydrophobic entropic forces etc. etc.).

Designing drugs to interfere with (or promote) such interactions will be tricky, yet they should have profound effects on cellular and organismal physiology. Off target effects are almost certain to occur (particularly since we know so little about the partners of a given motif). Showing how potentially useful such a drug can be, a small molecule inhibitor of the interaction of the AIDS virus capsid protein with two cellular proteins (CPSF6, TNPO3) it must interact with to get into the nucleus has been developed. (Unfortunately I’ve lost the reference)

My cousin married a high school dropout a few years ago. Not to worry — he dropped out of high school to go to college, and has a PhD in Electrical Engineering from Berkeley and has worked at Bell labs. He was very interested in combining his math and modeling skills with my knowledge of neurology to make some models of CNS function. I demurred, as I thought we knew too little about the brain to come up with models (which I generally distrust anyway). The basic problem was that I felt we didn’t know all the players in the brain and how they fit together.

MicroExons show this in spades.

I sincerely hope it works, but I’m very doubtful

A fascinating series of papers offers hope (in the form of a small molecule) for the truly horrible Werdnig Hoffmann disease which basically kills infants by destroying neurons in their spinal cord. For why this is especially poignant for me, see the end of the post.

First some background:

Our genes occur in pieces. Dystrophin is the protein mutated in the commonest form of muscular dystrophy. The gene for it is 2,220,233 nucleotides long but the dystrophin contains ‘only’ 3685 amino acids, not the 740,000 or so amino acids the gene could specify (2,220,233 / 3). What happens? The whole gene is transcribed into an RNA of this enormous length, then 78 distinct segments of RNA (called introns) are removed by a gigantic multimegadalton machine called the spliceosome, and the 79 segments actually coding for amino acids (these are the exons) are linked together and the RNA sent on its way.

All this was unknown in the 70s and early 80s when I was running a muscular dystrophy clinic and taking care of these kids. Looking back, it’s miraculous that more of us don’t have muscular dystrophy; there is so much that can go wrong with a gene this size, let alone transcribing and correctly splicing it to produce a functional protein.

One final complication — alternate splicing. The spliceosome removes introns and splices the exons together. But sometimes exons are skipped or one of several exons is used at a particular point in a protein. So one gene can make more than one protein. The record holder is something called the Dscam gene in the fruitfly which can make over 38,000 different proteins by alternate splicing.
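For the computationally inclined, splicing and exon skipping can be mimicked on a toy string. This is a sketch under loud assumptions: the sequence is invented, exons are written in capitals and introns in lower case purely for readability, and the exon boundaries are handed to the function rather than discovered the way a spliceosome must find them. The function name `splice` is mine.

```python
def splice(pre_mrna, exons, skip=()):
    """Join exon segments of pre_mrna, leaving out the introns between them.

    exons: list of (start, end) positions, end not included.
    skip:  1-based numbers of exons to leave out (exon skipping).
    """
    return "".join(
        pre_mrna[start:end]
        for number, (start, end) in enumerate(exons, start=1)
        if number not in skip
    )


# Toy transcript: 3 exons (capitals) separated by 2 introns (lower case).
pre_mrna = "AAAA" + "gtccag" + "CCCC" + "gtttag" + "GGGG"
exons = [(0, 4), (10, 14), (20, 24)]

print(splice(pre_mrna, exons))            # all exons joined
print(splice(pre_mrna, exons, skip={2}))  # exon 2 skipped
```

Same transcript, two different messages, hence two different proteins from one gene.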

There is nothing worse than watching an infant waste away and die. That’s what Werdnig Hoffmann disease is like, and I saw one or two cases during my years at the clinic. It is also called infantile spinal muscular atrophy. We all have two genes for the same crucial protein (called unimaginatively SMN). Kids who have the disease have mutations in one of the two genes (called SMN1). Why isn’t the other gene protective? It codes for the same sequence of amino acids (but using different synonymous codons). What goes wrong?

[ Proc. Natl. Acad. Sci. vol. 97 pp. 9618 – 9623 ’00 ] Why is SMN2 (the centromeric copy (i.e. the copy closest to the middle of the chromosome) which is normal in most patients) not protective? It has a single translationally silent nucleotide difference from SMN1 in exon 7 (i.e. the difference doesn’t change the amino acid coded for). This disrupts an exonic splicing enhancer and causes exon 7 skipping leading to abundant production of a shorter isoform (SMN2delta7). Thus even though both genes code for the same protein, only SMN1 actually makes the full protein.

Intellectually fascinating but ghastly to watch.

This brings us to the current papers [ Science vol. 345 pp. 624 – 625, 688 – 693 ’14 ].

More background. The molecular machine which removes the introns is called the spliceosome. It’s huge, containing 5 RNAs (called small nuclear RNAs, aka snRNAs), along with 50 or so proteins, with a total molecular mass of around 2,500,000 Daltons. Think about it, chemists. Design 50 proteins and 5 RNAs with probably 200,000+ atoms so they all come together forming a machine to operate on other monster molecules, such as the mRNA for Dystrophin alluded to earlier. Hard for me to believe this arose by chance, but current opinion has it that way.

Splicing out introns is a tricky process which is still being worked on. Mistakes are easy to make, and different tissues will splice the same pre-mRNA in different ways. All this happens in the nucleus before the mRNA is shipped outside where the ribosome can get at it.

The papers describe a small molecule which acts on the spliceosome to increase the inclusion of SMN2 exon 7. It does appear to work in patient cells and mouse models of the disease, even reversing weakness.

Why am I skeptical? Because just about every protein we make is spliced (except histones), and any molecule altering the splicing machinery seems almost certain to produce effects on many genes, not just SMN2. If it really works, these guys should get a Nobel.

Why does the paper grip me so? I watched the beautiful infant daughter of a cop and a nurse die of it 30 – 40 years ago. Even with all the degrees, all the training I was no better for the baby than my immigrant grandmother dispensing emotional chicken soup from her dry goods store (she only had a 4th grade education). Fortunately, the couple took the 25% risk of another child with WH and produced a healthy infant a few years later.

A second reason: a beautiful baby granddaughter came into our world 24 hours ago.

Poets and religious types may intuit how miraculous our existence is, but the study of molecular biology proves it (to me at least).