Category Archives: Aargh ! Big pharma sheds chemists. Why?

Why drug development is hard #31: retroviruses at the synapse

What if I told you that a very important neuronal synaptic protein Arc (Arg3.1) is acting like a virus, sending copies of itself (and its messenger RNA) across the synapse?  Would a team of shrinks, who’ve never examined me, tell you that I was crazy and unfit to blog?  Well, there is very good evidence that exactly this occurs in one situation and probably many more [ Cell vol. 172 pp. 8 – 10, 262 – 274, 275 – 288 ’18 ]; see http://www.cell.com/cell/fulltext/S0092-8674(17)31509-X.

Arc stands for Activity Regulated Cytoskeleton associated protein.  Its messenger RNA (mRNA) is transcribed from the gene in response to neuronal activity.  More importantly, the mRNA for Arc is rapidly distributed to active synapses through the cell body and dendrites, where it is translated into protein. It is locally and rapidly stimulated during the induction of long term depression and plays a critical role in removing a class of glutamic acid receptors (AMPA receptors) from the synapse.  To whet the interest of drug developers, Arc regulates the activity dependent cleavage of the Amyloid Precursor Protein (APP) and beta amyloid production by its interaction with presenilin.

Several posts could easily be filled with what Arc does, but that’s not what is so amazing about these papers.  Parts of the Arc protein arose from one of the many transcriptionally dead retroviruses found in our genome.  Our species literally wouldn’t exist without other retroviral gifts.  For instance syncytin1 is a protein expressed at high levels in the placenta.  It is produced from the envelope gene of an endogenous retrovirus (HERV-W) which has undergone inactivating mutations in its other major genes (gag and pol).  Mutant mice in which the gene has been knocked out die in utero due to failure of placenta formation.

Part of the arc gene arose from the Gag gene (Group specific antigen gene) of a retrovirus.  Recall most viruses have proteins coating their genetic material when they’re on the move (e. g. a capsid).  In the case of retroviruses, the genetic material is RNA rather than DNA.  Well the gag elements of the Arc protein form a capsid containing the mRNA for Arc (just like a virus).  In some way or other the capsid containing mRNA gets outside the neuron at the nerve muscle junction and gets into muscle.  The evidence is good that this happens, but in a system somewhat removed from us — the fruitfly (Drosophila).  Fruitfly neuromuscular junctions lacking this mechanism are weaker.

Well that’s pretty far from us.  However one of the papers (275 – 288) showed that the Arc protein and its mRNA were found in extracellular vesicles released from mouse neurons cultured from their cerebral cortex.  Could viral-like particles be crossing the synapses in our brains (which are already pretty chockfull of stuff; see https://luysii.wordpress.com/2017/11/15/the-bouillabaisse-of-the-synaptic-cleft/)?  It’s very early times (in fact the Cell issue came out 3 days ago) but people are sure to look.  There are at least 100 Gag derived genes in the human genome (Campillos, M., Doerks, T., Shah, P.K., and Bork, P. (2006). Computational characterization of multiple Gag-like human proteins. Trends Genet. 22, 585–589.).

Remarkable.  Remember CRISPR was hiding in plain sight for half a century.  We have a lot to learn.  No wonder drugs have unexpected side effects.

Why drug development is hard #30 — more new interactions we had no idea existed

We’re full of proteins which bind RNA wrangling it into a desired conformation.  The ribosome (whose enzymatic business end is pure RNA) has a mere 80 proteins doing this.  Its mass is 4,300,000 times that of a hydrogen atom.  However the idea that RNA could return the favor was pretty much unheard of until [ Science vol. 358 pp. 993 – 994, 1051 – 1055 ’17 — http://science.sciencemag.org/content/358/6366/1051 ].

As is often the case, viruses and the RNA world continue to instruct us.  In order to survive, some viruses induce cells to express a long (2,200+ nucleotides) nonCoding (for protein that is) RNA called lncRNA-ACOD1.  It binds to a protein enzyme (called GOT2, for Glutamic acid OxaloAcetic Transaminase 2) increasing its catalytic efficiency.  This shifts cellular metabolism around making it more favorable for virus proliferation, as GOT2 is found in mitochondria, where it is used to replenish tricarboxylic acid cycle intermediates, i.e. making more energy available to the virus.

lncRNA-ACOD1 is induced by a variety of viruses, most importantly influenza virus in man, and vaccinia, herpes simplex 1, vesicular stomatitis virus in mice.  Exactly how viruses induce it isn’t clear, but the transcription factor NFkappaB is involved.

Viruses continue to teach us.  The amino acids of GOT2 (#15 – #68) and the interacting sequence of nucleotides in lncRNA-ACOD1 (#165 – #390) are well conserved across species.  This might be a primordial mechanism from the RNA world (forgotten but not gone) to boost ATP production to cope with metabolic stress.  The RNA/protein binding site is close (4.2 Angstroms) to the substrate binding site.

The fun is just starting as several other lncRNAs are induced by viruses.  You can only imagine what they will tell us.  Another set of drug targets perhaps, or worse, the cause of peculiar side effects from drugs already in use.

Why drug discovery is hard #29 — a very old player doing a very new thing

We all know what RNA does, don’t we?  It binds to other RNAs and to DNA.  Sure lots of new forms of RNA have been found: microRNAs, competitive endogenous RNA (ceRNA), long nonCoding (for protein) RNA (lncRNA), piwiRNAs, small interfering RNAs (siRNAs) ... The list appears endless.  But the basic mechanism of action of RNA in the cell is binding to some other polynucleotide (RNA or DNA) and affecting its function.

Not so fast.  A new paper http://science.sciencemag.org/content/358/6366/1051 describes lncRNA-ACOD1, a cellular RNA induced by a variety of viruses.  lncRNA-ACOD1 binds to an enzyme enhancing its catalytic efficiency.  Now that’s new.  Certainly RNAs and proteins bind to each other in the ribosome, and in RNase P, but here the proteins serve to structure the RNA so it can carry out its catalytic function, not the other way around.

The enzyme bound is called GOT2 (Glutamic Oxaloacetic Transaminase 2).  Much interesting cellular biochemistry is discussed in the paper which I’ll skip, except to say that the virus uses the hyped up GOT2 to repurpose the cell’s metabolic machinery for its own evil ends.

lncRNA-ACOD1 has 3 exons and a polyAdenine tail.  There are two transcript variants containing  2,330 and 2,259 nucleotides.  There are only 100 copies/cell.  lncRNA-ACOD1 nucleotides #165 – #390 bind to amino acids #54 – #68 of GOT2.

So what are the other 2000 or so nucleotides of lncRNA-ACOD1 doing?   The phenomenon of RNA binding to protein is quite likely to be more widespread.  Both the GOT2 interacting motif and the interacting sequence of lncRNA-ACOD1 are well conserved across species of hosts and viruses.

Although viruses co-opt lncRNA-ACOD1, it is normally expressed in the heart as is GOT2 with no viral infection at all.  So we have likely stumbled onto an entirely new method of cellular metabolic control, AND a whole new set of players and interactions for drugs to act on (if they aren’t already doing this unknown to us).

This is series member #29 of why drug development is hard, most of which concentrated on the fact that we don’t know all the players.  lncRNA-ACOD1 is different — RNA is a player we’ve known for a very long time  but it appears to be playing a game entirely new to us.

It is also good to see cutting edge research like this coming out of China.  Hopefully it will stand up, but enough questionable stuff has come from them that every Chinese paper is under a cloud.

This is why I love reading the current literature.  You never know what you’re going to find.  It’s like opening presents.

A possible new player

Drug development is very hard because we don’t know all the players inside the cell. A recent paper describes an entirely new class of player — circular DNA derived from an ancient virus.  The authoress is Laura Manuelidis, who would have been a med school classmate had I chosen to go to Yale med instead of Penn.   She is the last scientist standing who doesn’t believe Prusiner’s prion hypothesis.  She didn’t marry the boss’s daughter being female, so she married the boss instead;  Elias Manuelidis a Yale neuropathologist who would be 99 today had he not passed away at 72 in 1992.

The circular DNAs go by the name of SPHINX, an acronym for Slow Progressive Hidden INfections of X origin.  They have no sequences in common with bacterial or eukaryotic DNA, but there is some homology to a virus infecting Acinetobacter, a wound pathogen common in soil and water.

How did she find them?  By doggedly pursuing the idea that neurodegenerative diseases such as Creutzfeldt-Jakob Disease (CJD) and scrapie were due to an infectious agent triggering aggregation of the prion protein.

As she says:  “The cytoplasm of CJD and scrapie-infected cells, but not control cells, also contains virus-like particle arrays and because we were able to isolate these nuclease-protected particles with quantitative recovery of infectivity, but with little or no detectable PrP (Prion Protein), we began to analyze protected nucleic acids. Using Φ29 rolling circle amplification, several circular DNA sequences of <5 kb (kilobases) with ORFs (Open Reading Frames) were thereby discovered in brain and cultured neuronal cell lines. These circular DNA sequences were named SPHINX elements for their initial association with slow progressive hidden infections of X origin."

SPHINX itself codes for a 324 amino acid protein, which is found in human brain, concentrated in synaptic boutons.  Strangely, even though the DNAs are presumably viral derived, they contain intervening sequences which don't code for protein.

The use of rolling circle amplification is quite clever, as it will copy only circular DNA.

Stanley Prusiner is sure to weigh in.  Remarkably, Prusiner was at Penn Med when I was and was even in my med school fraternity (Nu Sigma Nu)  primarily a place to eat lunch and dinner.  I probably ate with him, but have no recollection of him whatsoever.

Circular DNAs outside chromosomes are called plasmids. Bacteria are full of them. The best known eukaryote containing plasmids is yeast. Perhaps we have them as well. Manuelidis may be the first person to look.

What is docosahexaenoic acid and why should you care?

Why should drug chemists care about docosahexaenoic acid? It’s a fairly trivial organic structure as these things go: a 22 carbon straight chain carboxylic acid with 6 double bonds (https://en.wikipedia.org/wiki/Docosahexaenoic_acid). However the structure is decidedly non-random (see later).

Docosahexaenoic acid turns out to be crucial for the function of the blood brain barrier (BBB), something that makes it very difficult to get drugs into the brain. Years of work have shown that the only drugs able to get through the BBB are small lipid soluble molecules of mass under 400 Daltons with fewer than 9 hydrogen bonds. Certainly not a large group of drugs. The more we know about the BBB, the more likely we’ll be able to figure out something to circumvent it.

The BBB was known to exist more than 100 years ago. Ehrlich found that dyes injected into the circulation were rapidly taken up by all organs except the brain. His student E. Goldmann found that dye injected into the CSF stained the brain but not other organs.

The barrier has at least two components: (1) a very tight seal between the cells lining brain blood vessels (i.e. the endothelium; see the end of the post) and (2) very low transfer across the endothelial cell from the vessel lumen. The latter is called transcytosis and involves formation of small vesicles at the lumenal surface of the endothelium, migration across the endothelial cell with release of vesicle content on the other side.

In general there are two mechanisms of transcytosis — clathrin coated pits, and caveolae. Brain endothelium shows very low rates of transcytosis. There aren’t any coated pits (no explanation I can find) and the rate of caveolar transcytosis is very low.

Docosahexaenoic acid is the reason for the low rate of caveolar transcytosis. Here is why.

[ Nature vol. 509 pp. 432 – 433, 503 – 506, 507 – 511 ’14 Neuron vol. 82 pp. 728 – 730 ’14 ] An orphan transporter, MFSD2a (Major Facilitator Superfamily Domain containing 2a) is selectively expressed in the BBB endothelium. It is REQUIRED for formation and maintenance of BBB integrity. Animals lacking MFSD2a show uninhibited bulk transcytosis across the endothelium. The animals show no obvious defects in the junctions between the endothelial cells. Pericytes (cells in the brain layer after the endothelium) are important in keeping the levels of MFSD2a at normal levels as animals lacking them show the same defects in the BBB as those lacking MFSD2a. Even though knockouts don’t have much of a BBB, they have normal patterning of vascular networks.

MFSD2a is the major transporter of docosahexaenoic acid (DHA), an omega3 fatty acid (more later). DHA isn’t made in the brain and must be transported into it. Knockouts have reduced levels of DHA in the brain accompanied by neuronal loss in the hippocampus and cerebellum and microcephaly. Human cases due to mutation are now known (11/15). Transport of DHA and other fatty acids into the brain across the BBB is sodium dependent and occurs only in the form of esters with lysophosphatidylcholines (LPCs), not as free fatty acids. The phospho-zwitterionic headgroup of LPC is essential for transport. MFSD2a ‘prefers’ long chain fatty acids (oleic, palmitic), failing to transport fatty acids with chain lengths under 14.

So MFSD2a inhibits transcytosis at the same time it promotes fatty acid transport into the brain. Major Facilitator Superfamily (MFS) proteins use the electrochemical potential of the cell to transport substrates. The best known MFSs are the glucose transporters (GLUT1 – 4).

So the blood brain barrier is due in part to the lipid transport activity of MFSD2a, which gives BBB endothelium a different lipid composition (with lots of docosahexaenoic acid) than others, inhibiting caveolar transport. Increased DHA levels are associated with membrane cholesterol depletion, as well as displacement of caveolin1 (the major protein involved in this form of transcytosis) from caveolae.

It is likely that MFSD2A acts as a lipid flippase, transporting phospholipids, including DHA containing species from the outer to the inner plasma membrane leaflet (where caveolin1 binds).

What is so hot about docosahexaenoic acid? 22 carbons all in a row, a carboxyl group and 6 double bonds. We’re not talking fused ring systems, alkaloids, bizarre functional groups etc. etc.

Half the answer is that the double bonds are NOT randomly arranged. The 6 occur all in a row (but with methylene groups between them). This tells the chemist that they are not conjugated, hence the chain is probably not straight. Think how unlikely the arrangement is considering the ways 6 double bonds and 9 methylenes COULD be arranged in a chain (2^15 if each of the 15 positions could be either; 5,005 with exactly 6 double bonds). Answer: 5 ways, depending on where the arrangement starts relative to the end of the chain.
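
The counting argument can be checked by brute force. What follows is only a rough sketch: it treats the chain as 15 slots (6 double bonds and 9 saturated carbons, the tally used in this post), and the function names are mine. Note that 2^15 = 32,768 counts every way of labelling 15 slots as one or the other; insisting on exactly 6 double bonds leaves 5,005 arrangements, of which only 5 have the methylene-interrupted pattern.

```python
from itertools import combinations

N_DOUBLE = 6      # double bonds in DHA
N_SATURATED = 9   # saturated carbons after the carboxyl group (the post's tally)
N_SLOTS = N_DOUBLE + N_SATURATED


def is_methylene_interrupted(positions):
    """True if each double bond is followed by exactly one saturated slot
    before the next double bond (the 'skipped' pattern)."""
    return all(b - a == 2 for a, b in zip(positions, positions[1:]))


def count_arrangements():
    """Return (all placements of the double bonds, placements with the skipped pattern)."""
    total = 0
    skipped = 0
    for positions in combinations(range(N_SLOTS), N_DOUBLE):
        total += 1
        if is_methylene_interrupted(positions):
            skipped += 1
    return total, skipped


total, skipped = count_arrangements()
print(total, skipped)  # 5005 5
```

So under this simplified model the observed pattern is 5 out of 5,005 possibilities, about one in a thousand.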

The other half is that all the double bonds are cis, making it very unlikely that the 21 carbon chain can straighten out and cross the membrane. Lots of DHA means a very disordered membrane, which may be impossible for caveolin1 (and clathrin) to bind to.

So even though it’s years and years since I left organic chemistry, it still lets me enjoy the biochemical esthetics of the blood brain barrier.

The tight junctions between endothelial cells are primarily responsible for barrier function. These tight junctions are found only in the capillaries and postcapillary venules of the brain. Endothelial cells of the brain have few pinocytotic vesicles and fenestrae. [ Neuron vol. 71 p. 408 ’11 ] The brain vasculature has the thinnest endothelial cells, with the tightest junctions and a higher degree of pericyte coverage (‘up to’ 30%). [ Neuron vol. 78 pp. 214 – 232 ’13 ] The tight junctions are made from occludin, claudins and junctional adhesion molecules, and are closer to the lumen than the adherens junctions (which also link endothelial cells to each other) made by the cadherins (E, P and N). (ibid p. 219) TLR2/6 specific stimuli.

Time for drug chemists to go to the Multiplex

30 – 40% of all the drugs currently in clinical use are thought to target G Protein Coupled Receptors (GPCRs). Just how many GPCRs inhabit our genome isn’t clear. The latest estimate is 850 which is 4.2% of the 20,077 annotated protein genes we have. That being the case, it behooves drug chemists to know everything about them and how they work.

A recent paper [ Cell vol. 166 pp. 907 – 919 ’16 ] shows that a lot of the old thinking about GPCRs is wrong. Binding of a ligand to a GPCR results in a conformational change in its 7 transmembrane segments, so that the parts inside the cell bind to a heterotrimer of proteins which bind (and hydrolyze) GTP; this is the G protein. So far so good. The trimer splits up into its 3 constituents, unimaginatively called alpha, beta and gamma, each of which can act as a messenger that a ligand from outside the cell has landed on a GPCR, binding to other proteins causing all sorts of effects (e.g. can act as a second messenger).

All good things must end, and termination of GPCR signaling was thought to involve phosphorylation of the intracellular segment of the GPCR, binding of another protein (betaArrestin), removal from the cell membrane (so it can no longer bind its extracellular ligand) and then either destruction or recycling back to the cell membrane. So the old paradigm was betaArrestin binding equals the end of signaling.

It was thought that betaArrestin and the G protein competed for binding to the same intracellular amino acids of the GPCR. Not so says this paper. For some GPCRs both can bind, and signaling can continue, even though the complex of GPCR, G protein and betaArrestin is now inside the cell in an endosome. The complex is called the Multiplex. The examples given are GPCRs for parathyroid hormone (PTH) and Thyroid Stimulating Hormone (TSH). Blurry pictures are given of the complex. GPCRs have been divided into several classes and GPCRs for TSH and PTH are class B GPCRs — which contain a long phosphorylatable tail in the cytoplasm. The G protein binds to these GPCRs by its core region, while betaArrestin binds to the tail. Signaling continues apace.

Man’s best friend

I usually pay little attention to animal models of neurologic disease. After all, our brain is what separates us from animals (recent human behavior excepted). Neuromuscular disease is different because our peripheral nerves and muscles work the same way as animals. An astounding paper from Harvard and Brazil, gives us an entirely new angle to treat muscular dystrophy, particularly the Duchenne form. I ran a muscular dystrophy clinic for 15 years in the 70s and 80s and haplessly watched young boys deteriorate and die from Duchenne. The major therapeutic advance during that time was — hold your breath — lighter weight braces, allowing the boys to stay out of wheelchairs a bit longer.

Some background for those who don’t know: the molecular defect in Duchenne was found in ’87. Interestingly Kunkel, one of the authors on the original paper [ Cell vol. 51 pp. 919 – 928 ’87 ], is an author on the present one [ Cell vol. 163 pp. 1204 – 1213 ’15 ]. Duchenne dystrophy affects only males, as the gene for the protein (dystrophin) is found on the X chromosome, so women with a normal X and a mutant X escape. To show how pathetic things were back then, we tried to find out if a sister of a patient was a carrier. How did we do it? By measuring an enzyme released by damaged muscle (CPK) on several occasions. Carriers often showed an elevation.

The mutated protein is called dystrophin. It hooks the contractile apparatus of a muscle cell to the membrane. Failure of this makes muscle cells more fragile when they contract resulting in eventual loss. From a molecular biological point of view the protein is fascinating. The gene is one of the largest known, stretching over 2,220,233 positions (nucleotides) on the X chromosome and containing 79 exons. Figuring a transcription rate of 100 nucleotides a second, it takes 6 hours to make the messenger RNA (mRNA) for it. The protein has 3,685 amino acids and figuring a translation rate of 3 – 6 amino acids/second it takes 10 – 20 minutes for the ribosome to make it. Given that it takes only 3 nucleotides to code for an amino acid, the protein coding part of the gene takes up only 0.5% of the gene. Correctly splicing out the introns is a huge task, which we all perform well. This size and complexity of the gene explains why mutations are so common, making it the most common form of hereditary muscular dystrophy (most are).
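
For anyone who wants to check the back-of-the-envelope numbers, here is a small sketch. The gene and protein lengths are the ones quoted above; the transcription and translation rates are the assumed round figures from the text, not measured values, and the variable names are mine.

```python
GENE_LENGTH_NT = 2_220_233    # nucleotides spanned by the dystrophin gene
PROTEIN_LENGTH_AA = 3_685     # amino acids in dystrophin
TRANSCRIPTION_RATE = 100      # nucleotides per second (assumed round figure)
TRANSLATION_RATES = (3, 6)    # amino acids per second (assumed range)

# Time for RNA polymerase to traverse the whole gene, in hours
transcription_hours = GENE_LENGTH_NT / TRANSCRIPTION_RATE / 3600

# Time for a ribosome to make the protein, in minutes, at each assumed rate
translation_minutes = [PROTEIN_LENGTH_AA / rate / 60 for rate in TRANSLATION_RATES]

# Three nucleotides per amino acid gives the size of the protein coding part
coding_nt = PROTEIN_LENGTH_AA * 3
coding_fraction = coding_nt / GENE_LENGTH_NT

print(f"transcription: {transcription_hours:.1f} hours")
print(f"translation: {min(translation_minutes):.0f} to {max(translation_minutes):.0f} minutes")
print(f"coding part: {coding_nt} nucleotides, {coding_fraction:.2%} of the gene")
```

The mRNA takes a bit over 6 hours to transcribe, the protein 10 to 20 minutes to translate, and the coding sequence is about half a percent of the gene.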

There are currently all sorts of efforts underway to correct the mutation, particularly in a milder form called Becker dystrophy. Derek has covered them and they constitute a logical direct attack on the pathology.

What is so remarkable about the current Cell paper is that it gives us an entirely new and different way to attack Duchenne (and possibly all forms of muscular dystrophy). It involves a colony of dogs in Brazil. They have GRMD (Golden Retriever Muscular Dystrophy) with a mutation in one of the many splice sites in dystrophin (it has 79 exons in man) leading to a premature stop codon and no functional dystrophin in the dogs’ muscles. The animals weaken and become non ambulatory with a shortened lifespan. However, a few of the dogs in the colony seemed pretty normal. So they went to work. The obvious explanation was that the gene was in some way repaired so the animals had normal amounts of dystrophin. Not so, even though ambulatory, the animals’ muscles had no dystrophin. So the whole genome was sequenced. What they found was that a mutation at an upstream site of a protein called Jagged1 led to increased transcription of the gene and increased levels of the protein.

Jagged1 is a protein ligand for the Notch system of receptors. The Notch system is important in muscle regeneration. The myoblasts of the animals had more proliferative capacity. The Notch system is far too complicated to go into here — https://en.wikipedia.org/wiki/Notch_signaling_pathway, but expect to see a lot more research money pumped into it.

What I find so fabulous about this paper, is that it gives us an entirely new way of thinking about Duchenne, totally unrelated to the genetic defect, which had been our focus up to now. It also rubs our noses in how little we understand about our molecular biology and cell physiology. If we really understood things, we’d have been focused on Notch years ago. Yet another reason drug discovery is so hard. We are trying to alter a system we only dimly understand.

Why drug discovery is so hard (particularly in the brain): Reason #28: The brain processes its introns very differently

Useful drug discovery for neurologic and psychiatric disease is nearly at a standstill. It isn’t for want of trying by basic researchers and big and small pharma. A recent excellent review [ Neuron vol. 87 pp. 14 – 27 ’15 ] helps explain why. In short, the brain processes its protein coding genes rather differently.

This post assumes you know what introns, exons and alternate splicing are. For pretty much all the needed background see the following.

First: https://luysii.wordpress.com/2010/07/07/molecular-biology-survival-guide-for-chemists-i-dna-and-protein-coding-gene-structure/
Second:https://luysii.wordpress.com/2010/07/11/molecular-biology-survival-guide-for-chemists-ii-what-dna-is-transcribed-into/

When splicing first came out I started making a list of proteins which were alternatively spliced. It is now safe to assume that any gene containing introns (95% of all protein coding genes [ Proc. Natl. Acad. Sci. vol. 112 pp. 17985 – 17990 ’08 ]) results in several protein products due to alternative splicing. The products produced vary from tissue to tissue, probably because most tissues express different splicing regulators.

Here are a few. A2BP1 (aka Rbfox1, aka FOX1) is a brain specific RNA splicing factor found only in postmitotic terminally differentiated neurons. It is deleted in 10% of glioblastomas. Another is nSR100 (neural Specific Related protein of 100 kiloDaltons) — see later.

To show how crucial alternative splicing is for the very existence of the brain, consider this. The neuronal splicing regulator PTBP2 is barely expressed in most tissues. It is upregulated in neurons. Both PTBP1 and PTBP2 are repressors of neural alternative splicing (but some genes are actually enhanced). In a given region of the brain either PTBP1 or PTBP2 is expressed (but not both). PTBP1 promotes skipping of a neural specific exon (exon #10) in PTBP2 transcripts. This exposes a premature termination codon in PTBP2 leading to nonsense mediated decay (NMD). PTBP1 is expressed in most nonNeural tissues and neural precursor cells, but is silenced in developing neurons by the microRNA miR-124. The mRNA for PTBP2 contains an alternative exon which triggers nonsense mediated decay (NMD) when skipped. Inclusion of the exon requires positive transacting factors such as nSR100 in neurons. Repression is mediated by PTBP1 in undifferentiated cells. microRNAs (which ones?) downregulate PTBP1 during neuronal differentiation, relieving the negative regulation of PTBP2. Depletion of PTBP1 in fibroblasts is enough for PTBP2 induction and neuronal transdifferentiation.

It gets more complicated still. PTBP1 inhibits splicing of introns at the 3′ end of some genes involved in presynaptic function. This results in nuclear retention and turnover via components of the nuclear RNA surveillance machinery. As PTBP1 is downregulated during neuronal differentiation, the target introns are spliced out and the mature mRNAs are found.

Now we get to microExons, something unknown until 2014. For more details see — https://luysii.wordpress.com/2015/01/04/microexons-great-new-drugable-targets/.
Briefly, microexons are defined as exons containing 50 nucleotides or less (the paper says 3 – 27 nucleotides). They have been overlooked, partially because their short length makes them computationally difficult to find. Also few bothered to look for them as they were thought to be unfavorable for splicing because they were too short to contain exonic splicing enhancers. They are so short that it was thought that the splicing machinery (which is huge) couldn’t physically assemble at both the 3′ and 5′ splice sites. So much for theory, they’re out there.

The inclusion in the final transcript of most identified neural microExons is regulated by a brain specific factor nSR100 (neural specific SR related protein of 100 kiloDaltons)/SRRM4 which binds to intronic enhancer UGC motifs close to the 3′ splice sites, resulting in their inclusion. They are ‘enhanced’ by tissue specific RBFox proteins. nSR100 is said to be reduced in Autism Spectrum Disorder (really? all? some?). nSR100 is strongly coexpressed in the developing human brain in a gene network module M2 which is enriched for rare de novo ASD associated mutations.

MicroExons are enriched for lengths which are multiples of 3 nucleotides. Recall that every 3 nucleotides in mRNA codes for an amino acid. This implies strong selection pressure was used to preserve reading frames, as lengths of 3n+1 and 3n+2 produce a frameshift. The microExons are enriched in charged amino acids. Most microExons show high inclusion at late stages of neuronal differentiation in genes associated with axon formation and synapse function. A neural specific microExon in Protrudin/Zfyve27 increases its interaction with Vesicle Associated Membrane Protein associated Protein (VAP) and promotes neurite outgrowth.
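
The reading frame point is easy to see with a toy example. This is a sketch only: the transcript and the inserted sequences below are made up for illustration and come from no real gene, and the helper names are mine. Inserting 6 nucleotides leaves every downstream codon intact, while inserting 5 scrambles all of them.

```python
def codons(seq):
    """Split a sequence into complete triplets, dropping any trailing remainder."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def insert_exon(transcript, exon, position):
    """Return the transcript with the exon spliced in at the given position."""
    return transcript[:position] + exon + transcript[position:]


# Hypothetical coding sequence: AUG GCU GAA UUU CCG
TRANSCRIPT = "AUGGCUGAAUUUCCG"

in_frame = insert_exon(TRANSCRIPT, "AGAGAA", 6)  # 6 nucleotides, a multiple of 3
shifted = insert_exon(TRANSCRIPT, "AGAGA", 6)    # 5 nucleotides, 3n+2

print(codons(TRANSCRIPT))  # ['AUG', 'GCU', 'GAA', 'UUU', 'CCG']
print(codons(in_frame))    # ['AUG', 'GCU', 'AGA', 'GAA', 'GAA', 'UUU', 'CCG']
print(codons(shifted))     # ['AUG', 'GCU', 'AGA', 'GAG', 'AAU', 'UUC']
```

In the first case two codons are added and the rest of the message reads as before; in the second, everything after the insert is read in the wrong frame.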

[ Proc. Natl. Acad. Sci. vol. 112 pp. 3445 – 3450 ’15 ] Deep mRNA sequencing of mouse cerebral cortex expanded the list of alternative splicing events TENfold and showed that 72% of multiexon genes express multiple splice variants. Among the newly discovered alternatively spliced exons are 1,104 exons involved in nonsense mediated decay (NMD). They are enriched in RNA binding proteins including splicing factors. Another set of alternatively spliced NMD exons is found in genes coding for chromatin regulators. Conservation of NMD exons is found in lower vertebrates, but those involving chromatin regulators are found later into the mammalian lineage. So the transcriptome in the brain is even more complicated.

A bit more about the actual effects on protein structure of alternate splicing. The sites chosen for this aren’t random. Cell and tissue differentially regulated alternative splicing events are significantly UNDERrepresented in functionally defined folded domains in proteins; they are enriched in regions of protein disorder that typically are surface accessible and embed short linear interaction motifs (with other proteins and ligands). Among a set of analyzed neural specific exons enriched in disordered regions, 1/3 promoted or disrupted interactions with partner proteins. So regulated exon splicing might specify tissue and cell type specific protein interaction networks. They regard their inclusion/exclusion as protein surface microsurgery.

How much can a little microexon do to protein function? Here’s an example of a 6 nucleotide microexon (two amino acids). Insertion of the microExon in the nuclear adaptor protein Apbb1 enhances its interaction with Kat5/Tip60, a histone acetyltransferase. The microExon adds Arginine and Glutamic acid to a phosphotyrosine binding domain (PTB domain) which binds Kat5. This enhances binding.

Had enough? The complexity is staggering and I haven’t even talked about recursive splicing — that’s for another post, but here’s a reference if you can’t wait — [ Nature vol. 521 pp. 300 – 301, 371 – 375, 376 – 379 ’15 ]. Pity the drug chemist figuring out which alternatively spliced form of a brain protein to attack (particularly if it hasn’t been studied for microExons).

Why drug discovery is so hard: Reason #27 Moonlighting effects.

Well, we all know what heat shock proteins (Hsps) do: they bind to proteins which have lost their shape due to heat (or other stressors), cuddle them, hydrolyze ATP and nurse them back to health. But what if some of them do other things? The phenomenon is called moonlighting.

The case of Hsp70 is instructive. Some background first. The Hsp70 chaperone transiently associates with its substrates in a manner controlled by its ATPase cycle. ATP binding to the amino terminal nucleotide binding domain (NBD) induces a conformational change in the carboxy terminal substrate binding domain (SBD) which results in low affinity for substrate. Hydrolysis of ATP converts the Hsp70 to the ADP state, which binds substrates with higher affinity. Exchange of ADP for ATP releases substrate completing the cycle. The hydrolysis of ATP is stimulated by J-domain containing cochaperones. Exchange of ADP for ATP is promoted by a separate set of cochaperones, the nucleotide exchange factors.  Back and forth Hsp70 and the damaged protein go through the cycle until the protein is nursed back to normal or, failing this, is destroyed.
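
The cycle just described can be written down as a two-state toy model. This is a cartoon under stated assumptions: the class and method names are mine, affinity is reduced to "low" or "high", and the cochaperones appear only as comments. It is meant to show the order of events, not any real kinetics.

```python
from enum import Enum


class Nucleotide(Enum):
    ATP = "ATP"
    ADP = "ADP"


class Hsp70Cycle:
    """Toy model of the Hsp70 ATPase cycle: the bound nucleotide sets substrate affinity."""

    def __init__(self):
        self.nucleotide = Nucleotide.ATP
        self.substrate_bound = False

    @property
    def affinity(self):
        # ATP state: low affinity for substrate; ADP state: high affinity
        return "low" if self.nucleotide is Nucleotide.ATP else "high"

    def bind_substrate(self):
        self.substrate_bound = True

    def hydrolyze_atp(self):
        # Stimulated by J-domain cochaperones; clamps down on the substrate
        self.nucleotide = Nucleotide.ADP

    def exchange_nucleotide(self):
        # ADP is swapped for ATP, which releases the substrate and resets the cycle
        self.nucleotide = Nucleotide.ATP
        self.substrate_bound = False


chaperone = Hsp70Cycle()
chaperone.bind_substrate()
chaperone.hydrolyze_atp()
print(chaperone.nucleotide.value, chaperone.affinity, chaperone.substrate_bound)
chaperone.exchange_nucleotide()
print(chaperone.nucleotide.value, chaperone.affinity, chaperone.substrate_bound)
```

Repeating the three calls in a loop corresponds to the back and forth described above.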

The Hsp70 family acts early in protein synthesis by binding to a small stretch of hydrophobic amino acids on a protein’s surface. Aided by a set of smaller Hsp40 proteins (also known as J proteins), a hsp70 monomer binds to its target protein and then hydrolyzes ATP to ADP, undergoing a conformational change that causes the hsp70 to clamp down very tightly on the target. After the hsp40 dissociates (see below), the dissociation of the hsp70 protein is induced by the rapid rebinding of ATP after ADP release. Repeated cycles of hsp protein binding and release help the target protein to refold.

Enter [ Proc. Natl. Acad. Sci. vol. 112 pp. E3327 – E3336 ’15 ]. This work shows Hsp70 is methylated on arginine #469 by Coactivator Associated aRginine Methyltransferase 1/Protein aRginine MethylTransferase 4 (CARM1/PRMT4) and demethylated by JuMonJi Domain containing 6 (JMJD6), hideous acronyms shortening even more hideous names. Methylated Hsp70 then functions in transcription as a ‘regulator’ of Retinoic Acid Receptor beta 2 (RARbeta2) transcriptional activity. Methylation at R469 mediates the interaction between Hsp70 and TFIIH (Transcription Factor IIH).

The regulation of gene transcription is an entirely novel and unsuspected function for a heat shock protein. A classic example of moonlighting.

Drug chemists and pharmacologists are always concerned about off-target effects. For an interesting example please see https://luysii.wordpress.com/2011/02/02/medicinal-chemists-do-you-know-where-your-drug-is-and-what-it-is-doing/.  Off-target effects occur when their drug hits something else in the cell producing an unexpected (and usually untoward) effect.

If you are unaware that your target of choice is doing a little something else on the side (i.e. moonlighting) you can get an off-target effect even when you hit your desired target. It’s a tough business. How many more moonlighters are out there that we don’t know about?

Hsp70 is a good example. Here are two more — no background provided, so you’re on your own — except to point out that glucocorticoids are a widely used class of drug.

[ Proc. Natl. Acad. Sci. vol. 112 pp. E1540 – 1549 ’15 ] Amazingly, the glucocorticoid receptor (GR) plays a role in mRNA degradation by acting as an RNA binding protein. When loaded onto the 5′ UnTranslated Region (5′ UTR) of a target mRNA, the GR recruits UPF1 (Up-Frameshift 1) through Proline-rich Nuclear Receptor Coregulatory protein 2 (PNRC2) in a ligand (of itself?) dependent manner to cause rapid mRNA degradation. They call this GMD (Glucocorticoid receptor Mediated Decay). GMD, Staufen Mediated mRNA Decay (SMD) and Nonsense Mediated mRNA Decay (NMD) all share UPF1 and PNRC2.

[ Science vol. 323 pp. 723 – 724, 793 – 797 ’09 ] Stat3 proteins represent the canonical mediators of signals elicited by cytokines binding to type I cytokine receptors. However, GRIM19 (Gene associated with Retinoid Interferon Mortality 19), a mitochondrial protein, interacts with Stat3 and inhibits its transcriptional activity (where?). This work shows that Stat3 associates with GRIM19 containing complexes I and II (components of the electron transport chain) in mouse liver and muscle mitochondria. Levels of Stat3 in mitochondria are 10% of cytosolic levels.

Cells lacking Stat3 show decreased activity of mitochondrial complexes I and II. Effects on complex I and II don’t require Stat3’s DNA binding domain, the dimerization motif, or the tyrosine phosphorylation site controlling Stat3 nuclear localization and transcriptional activity, so this is a ‘moonlighting’ role for Stat3 having nothing to do with gene transcription. The serine phosphorylation site on Stat3 is important. So Stat3 is required to maintain normal mitochondrial function.

How little we know

Well, it’s basic biochem 101 that enzymes only allow equilibrium to be reached faster (by lowering activation energy); they never change it. Even so, it came as a shock to the authors of [ Proc. Natl. Acad. Sci. vol. 112 pp. 6601 – 6606 ’15 ] when Cytosolic Nonspecific DiPeptidase 2 (CNDP2), a proteolytic enzyme, was found to tack the carboxyl group of lactic acid onto the amino group of a variety of amino acids, essentially running the proteolytic reaction in reverse. Why? Because intracellular levels of lactic acid and amino acids are in the high microMolar to milliMolar range. It’s Le Chatelier’s principle in action.
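To spell out the mass-action reasoning, here is the bookkeeping in symbols. The notation is mine, not the paper’s; the water concentration is folded into the constant (the usual convention for dilute aqueous solution), and no numerical value of the equilibrium constant is implied.

```latex
\[
\text{lactate} + \text{amino acid} \;\rightleftharpoons\; \text{N-lactoyl amino acid} + \mathrm{H_2O}
\]
\[
K_{eq} = \frac{[\text{N-lactoyl amino acid}]_{eq}}{[\text{lactate}]_{eq}\,[\text{amino acid}]_{eq}}
\quad\Longrightarrow\quad
[\text{N-lactoyl amino acid}]_{eq} = K_{eq}\,[\text{lactate}]_{eq}\,[\text{amino acid}]_{eq}
\]
```

The enzyme leaves the equilibrium constant untouched and only shortens the time needed to get there. So even if the constant strongly favors hydrolysis, raising the concentrations of lactate and amino acid raises the equilibrium amount of the N-lactoyl product in proportion to their product.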

The compounds are called N-Lactoyl amino acids. No one had ever seen them before. They are part of the ‘metabolome’ — small molecules found in our bodies. God knows what they do. The paper was really shocking to me for another reason, because I had no idea how many members the metabolome has.

How large is the metabolome? Make a guess.

Well, METLIN (https://metlin.scripps.edu/index.php) has 240,000, and the Human Metabolome DataBase (http://www.hmdb.ca/metabolites?c=hmdb_id&d=up&page=1676) has 42,000. I doubt that we know what they are all doing. Undoubtedly some of them are binding to proteins producing physiologic effects. Drug chemists may be mimicking some of them unknowingly, producing untoward and unexpected side effects.

What’s even more shocking to me is the following statement from the paper: state of the art untargeted metabolomics studies still report ‘up to’ 40% unidentified, but potentially important, metabolites which can be detected reproducibly. The unknown metabolites are only rarely characterized because of the extensive work required for de novo structure determination.

So we really don’t know everything that’s out there in our bodies, and even if we did, we wouldn’t know what it is all doing. Drug discovery is hard because we only dimly understand the system we are trying to manipulate. Until I read this paper, I had no idea just how dim that understanding is.