Category Archives: Aargh ! Big pharma sheds chemists. Why?

A possible new player

Drug development is very hard because we don’t know all the players inside the cell. A recent paper describes an entirely new class of player: circular DNA derived from an ancient virus. The authoress is Laura Manuelidis, who would have been a med school classmate had I chosen to go to Yale med instead of Penn. She is the last scientist standing who doesn’t believe Prusiner’s prion hypothesis. Being female, she didn’t marry the boss’s daughter; she married the boss instead: Elias Manuelidis, a Yale neuropathologist who would be 99 today had he not died in 1992 at 72.

The circular DNAs go by the name of SPHINX, an acronym for Slow Progressive Hidden INfections of X origin. They have no sequences in common with bacterial or eukaryotic DNA, but there is some homology to a virus infecting Acinetobacter, a wound pathogen common in soil and water.

How did she find them? By doggedly pursuing the idea that neurodegenerative diseases such as Creutzfeldt-Jakob Disease (CJD) and scrapie were due to an infectious agent triggering aggregation of the prion protein.

As she says:  “The cytoplasm of CJD and scrapie-infected cells, but not control cells, also contains virus-like particle arrays and because we were able to isolate these nuclease-protected particles with quantitative recovery of infectivity, but with little or no detectable PrP (Prion Protein), we began to analyze protected nucleic acids. Using Φ29 rolling circle amplification, several circular DNA sequences of <5 kb (kilobases) with ORFs (Open Reading Frames) were thereby discovered in brain and cultured neuronal cell lines. These circular DNA sequences were named SPHINX elements for their initial association with slow progressive hidden infections of X origin."

SPHINX itself codes for a 324 amino acid protein, which is found in human brain, concentrated in synaptic boutons.  Strangely, even though the DNAs are presumably viral derived, they contain intervening sequences which don't code for protein.

The use of rolling circle amplification is quite clever, as it will copy only circular DNA.
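A toy Python model of rolling circle copying makes the circular-only point concrete. It is an idealized sketch, not a model of real Φ29 chemistry. It assumes a single primer at position 0 and ignores strand orientation, primer design and any limited copying of linear DNA that real protocols may show. The names (`Template`, `rolling_circle_product`) are hypothetical.

```python
from dataclasses import dataclass

# Base-pairing rule used to "copy" a template (strand orientation ignored in this toy).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


@dataclass
class Template:
    seq: str
    circular: bool


def rolling_circle_product(t: Template, max_len: int) -> str:
    """Idealized strand-displacing synthesis starting from one primer at position 0."""
    if not t.circular:
        # On a linear template the polymerase runs off the end: one copy, no amplification.
        return t.seq.translate(COMPLEMENT)
    n = len(t.seq)
    # On a circle there is no end to run off, so the polymerase keeps going round,
    # displacing the strand it just made and producing a long tandem repeat (concatemer).
    return "".join(t.seq[i % n] for i in range(max_len)).translate(COMPLEMENT)


circle = Template("ACGTTG", circular=True)
line = Template("ACGTTG", circular=False)
print(rolling_circle_product(circle, 18))  # TGCAACTGCAACTGCAAC
print(rolling_circle_product(line, 18))    # TGCAAC
```

In this model the yield is max_len / len(seq) copies for a circle and a single copy for a linear piece. That difference is what lets the method pick out circular DNA.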

Stanley Prusiner is sure to weigh in. Remarkably, Prusiner was at Penn Med when I was and was even in my med school fraternity (Nu Sigma Nu), primarily a place to eat lunch and dinner. I probably ate with him, but have no recollection of him whatsoever.

Circular DNAs outside chromosomes are called plasmids. Bacteria are full of them. The best known eukaryote containing plasmids is yeast. Perhaps we have them as well. Manuelidis may be the first person to look.


What is docosahexaenoic acid and why should you care?

Why should drug chemists care about docosahexaenoic acid? It’s a fairly trivial organic structure as these things go: a 22 carbon straight chain carboxylic acid with 6 double bonds (https://en.wikipedia.org/wiki/Docosahexaenoic_acid). However, the structure is decidedly non-random (see later).

Docosahexaenoic acid turns out to be crucial for the function of the blood brain barrier (BBB), the structure that makes it so difficult to get drugs into the brain. Years of work have shown that the only drugs able to get through the BBB are small lipid soluble molecules with a mass under 400 Daltons and fewer than 9 hydrogen bonds. Certainly not a large group of drugs. The more we know about the BBB, the more likely we’ll be able to figure out a way to circumvent it.
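The rule of thumb above is easy to state as a filter. This is a sketch only. The thresholds are the ones quoted in the text and are exposed as parameters. The hydrogen bond count is assumed to be computed elsewhere. The compounds are hypothetical placeholders, not real drugs. Nothing else about a molecule is considered, so passing the filter says only that the molecule fits the rule.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mass_da: float      # molecular mass in Daltons (not kiloDaltons)
    h_bonds: int        # hydrogen bonds the molecule forms; assumed precomputed elsewhere
    lipid_soluble: bool


def passes_bbb_rule(c: Compound, max_mass_da: float = 400.0, max_h_bonds: int = 8) -> bool:
    """Rule of thumb from the text: lipid soluble, under 400 Da, fewer than 9 hydrogen bonds."""
    return c.lipid_soluble and c.mass_da < max_mass_da and c.h_bonds <= max_h_bonds


candidates = [
    Compound("hypothetical_small_lipophilic", 310.0, 4, True),
    Compound("hypothetical_polar", 310.0, 12, True),
    Compound("hypothetical_large", 650.0, 5, True),
]
for c in candidates:
    print(c.name, passes_bbb_rule(c))  # True, then False, then False
```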

The BBB was known to exist more than 100 years ago. Ehrlich found that dyes injected into the circulation were rapidly taken up by all organs except the brain. His student E. Goldmann found that dye injected into the CSF stained the brain but not other organs.

The barrier has at least two components: (1) a very tight seal between the cells lining brain blood vessels (i.e. the endothelium; see the end of the post), and (2) very low transfer across the endothelial cell from the vessel lumen. The latter process is called transcytosis. It involves formation of small vesicles at the lumenal surface of the endothelium and their migration across the endothelial cell, with release of vesicle contents on the other side.

In general there are two mechanisms of transcytosis — clathrin coated pits, and caveolae. Brain endothelium shows very low rates of transcytosis. There aren’t any coated pits (no explanation I can find) and the rate of caveolar transcytosis is very low.

Docosahexaenoic acid is the reason for the low rate of caveolar transcytosis. Here is why.

[ Nature vol. 509 pp. 432 – 433, 503 – 506, 507 – 511 ’14; Neuron vol. 82 pp. 728 – 730 ’14 ] An orphan transporter, MFSD2a (Major Facilitator Superfamily Domain containing 2a), is selectively expressed in the BBB endothelium. It is REQUIRED for formation and maintenance of BBB integrity. Animals lacking MFSD2a show uninhibited bulk transcytosis across the endothelium. The animals show no obvious defects in the junctions between the endothelial cells. Pericytes (the cells lying just outside the endothelium, on the brain side) are important in maintaining normal levels of MFSD2a, as animals lacking them show the same BBB defects as those lacking MFSD2a. Even though knockouts don’t have much of a BBB, they have normal patterning of vascular networks.

MFSD2a is the major transporter of docosahexaenoic acid (DHA), an omega-3 fatty acid (more later). DHA isn’t made in the brain and must be transported into it. Knockouts have reduced levels of DHA in the brain, accompanied by neuronal loss in the hippocampus and cerebellum, and microcephaly. Human cases due to mutation are now known (11/15). Transport of DHA and other fatty acids into the brain across the BBB is sodium dependent and occurs only in the form of esters with lysophosphatidylcholines (LPCs), not as free fatty acids. The phospho-zwitterionic headgroup of LPC is essential for transport. MFSD2a ‘prefers’ long chain fatty acids (oleic, palmitic), failing to transport fatty acids with chain lengths under 14.

So MFSD2a inhibits transcytosis at the same time it promotes fatty acid transport into the brain. Major Facilitator Superfamily (MFS) proteins use the electrochemical potential of the cell to transport substrates. The best known MFSs are the glucose transporters (GLUT1 – 4).

So the blood brain barrier is due in part to the lipid transport activity of MFSD2a, which gives BBB endothelium a different lipid composition (with lots of docosahexaenoic acid) than other endothelia, inhibiting caveolar transport. Increased DHA levels are associated with membrane cholesterol depletion, as well as displacement of caveolin1 (the major protein involved in this form of transcytosis) from caveolae.

It is likely that MFSD2A acts as a lipid flippase, transporting phospholipids, including DHA containing species from the outer to the inner plasma membrane leaflet (where caveolin1 binds).

What is so hot about docosahexaenoic acid? It’s 22 carbons all in a row, a carboxyl group and 6 double bonds. We’re not talking fused ring systems, alkaloids, bizarre functional groups etc. etc.

Half the answer is that the double bonds are NOT randomly arranged. All 6 occur in a row, each separated from the next by a single methylene group. This tells the chemist that they are not conjugated, hence the chain is probably not straight. Think how unlikely this arrangement is. There are 15!/(6! 9!) = 5,005 ways that 6 double bonds and 9 methylenes COULD be arranged in a chain of 15 units. Only 5 of them put all 6 double bonds in a row separated by single methylenes, depending on where the run starts relative to the end of the chain.
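The counting argument can be checked by brute force. This sketch adopts the text's simplification: the chain is modeled as 15 units, each either a double bond (treated as one unit) or a methylene, and exactly 6 of them are double bonds.

```python
from __future__ import annotations

from itertools import combinations
from math import comb

UNITS = 15          # 6 double bonds + 9 methylene units, as the text counts them
DOUBLE_BONDS = 6


def methylene_interrupted(positions: tuple[int, ...]) -> bool:
    """True if each double bond is separated from the next by exactly one CH2 unit."""
    return all(b - a == 2 for a, b in zip(positions, positions[1:]))


# Every way of choosing which 6 of the 15 positions hold a double bond.
arrangements = list(combinations(range(UNITS), DOUBLE_BONDS))
dha_like = [p for p in arrangements if methylene_interrupted(p)]

print(len(arrangements), comb(UNITS, DOUBLE_BONDS))  # 5005 5005
print(dha_like)  # the runs start at positions 0 through 4
print(f"{len(dha_like) / len(arrangements):.4%}")   # 0.0999%
```

About one arrangement in a thousand has the DHA pattern. That is the sense in which the structure is decidedly non-random.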

The other half is that all the double bonds are cis, making it very unlikely that the 21 carbon tail (everything but the carboxyl carbon) can straighten out and cross the membrane. Lots of DHA means a very disordered membrane, which caveolin1 (and clathrin) may be unable to bind to.

So even though it’s years and years since I left organic chemistry, it still lets me enjoy the biochemical esthetics of the blood brain barrier.

The tight junctions between endothelial cells are primarily responsible for barrier function. These tight junctions are found only in the capillaries and postcapillary venules of the brain. Endothelial cells of the brain have few pinocytotic vesicles and fenestrae. [ Neuron vol. 71 p. 408 ’11 ] The brain vasculature has the thinnest endothelial cells, with the tightest junctions and a higher degree of pericyte coverage (‘up to’ 30%). [ Neuron vol. 78 pp. 214 – 232 ’13 ] The tight junctions are made from occludin, claudins and junctional adhesion molecules. They are closer to the lumen than the adherens junctions (which also link endothelial cells to each other) made by the cadherins (E, P and N). (ibid p. 219)

Time for drug chemists to go to the Multiplex

30 – 40% of all the drugs currently in clinical use are thought to target G Protein Coupled Receptors (GPCRs). Just how many GPCRs inhabit our genome isn’t clear. The latest estimate is 850 which is 4.2% of the 20,077 annotated protein genes we have. That being the case, it behooves drug chemists to know everything about them and how they work.

A recent paper [ Cell vol. 166 pp. 907 – 919 ’16 ] shows that a lot of the old thinking about GPCRs is wrong. Binding of a ligand to a GPCR results in a conformational change in its 7 transmembrane segments. The parts inside the cell then bind to a heterotrimeric protein whose alpha subunit binds (and hydrolyzes) GTP. This is the G protein. So far so good. The trimer then splits into its alpha subunit and a beta/gamma pair (the 3 constituents are unimaginatively called alpha, beta and gamma). Each part can act as a messenger that a ligand from outside the cell has landed on a GPCR, binding to other proteins and causing all sorts of effects (e.g. generating second messengers).

All good things must end, and termination of GPCR signaling was thought to involve phosphorylation of the intracellular segment of the GPCR, binding of another protein (betaArrestin), removal from the cell membrane (so it can no longer bind its extracellular ligand) and then either destruction or recycling back to the cell membrane. So the old paradigm was betaArrestin binding equals the end of signaling.

It was thought that betaArrestin and the G protein competed for binding to the same intracellular amino acids of the GPCR. Not so says this paper. For some GPCRs both can bind, and signaling can continue, even though the complex of GPCR, G protein and betaArrestin is now inside the cell in an endosome. The complex is called the Multiplex. The examples given are GPCRs for parathyroid hormone (PTH) and Thyroid Stimulating Hormone (TSH). Blurry pictures are given of the complex. GPCRs have been divided into several classes and GPCRs for TSH and PTH are class B GPCRs — which contain a long phosphorylatable tail in the cytoplasm. The G protein binds to these GPCRs by its core region, while betaArrestin binds to the tail. Signaling continues apace.
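The difference between the two paradigms comes down to one question: once betaArrestin is on the receptor, does G protein signaling go on? This crude sketch reduces the biology to a boolean. The `Receptor` class is an assumption for illustration, as is the rule that receptors without the long tail still follow the old competition model. Neither comes from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Paradigm(Enum):
    OLD = "competition"      # G protein and betaArrestin fight for the same site
    MULTIPLEX = "multiplex"  # G protein on the receptor core, betaArrestin on the tail


@dataclass
class Receptor:
    ligand: str
    long_phospho_tail: bool  # the class B feature described in the text


def signals_after_arrestin_binds(r: Receptor, paradigm: Paradigm) -> bool:
    """Does G protein signaling continue once betaArrestin is bound (receptor internalized)?"""
    if paradigm is Paradigm.OLD:
        # Shared binding site: arrestin displaces the G protein, so signaling ends.
        return False
    # Separate sites: arrestin holds the phosphorylated tail while the G protein stays on
    # the core, so signaling continues from the endosome. Receptors lacking the long tail
    # are assumed here to still follow the competition rule.
    return r.long_phospho_tail


pth = Receptor("PTH", long_phospho_tail=True)
short_tail = Receptor("hypothetical short-tailed receptor", long_phospho_tail=False)
for p in Paradigm:
    print(p.name, signals_after_arrestin_binds(pth, p), signals_after_arrestin_binds(short_tail, p))
```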

Man’s best friend

I usually pay little attention to animal models of neurologic disease. After all, our brain is what separates us from animals (recent human behavior excepted). Neuromuscular disease is different because our peripheral nerves and muscles work the same way as those of animals. An astounding paper from Harvard and Brazil gives us an entirely new angle to treat muscular dystrophy, particularly the Duchenne form. I ran a muscular dystrophy clinic for 15 years in the 70s and 80s and haplessly watched young boys deteriorate and die from Duchenne. The major therapeutic advance during that time was (hold your breath) lighter weight braces, allowing the boys to stay out of wheelchairs a bit longer.

Some background for those who don’t know: the molecular defect in Duchenne was found in ’87. Interestingly, Kunkel, one of the authors on the original paper [ Cell vol. 51 pp. 919 – 928 ’87 ], is an author on the present one [ Cell vol. 163 pp. 1204 – 1213 ’15 ]. Duchenne dystrophy affects only males, as the gene for the protein (dystrophin) is found on the X chromosome, so women with a normal X and a mutant X escape. To show how pathetic things were back then, consider how we tried to find out if a sister of a patient was a carrier. How did we do it? By measuring an enzyme released by damaged muscle (CPK) on several occasions. Carriers often showed an elevation.

The mutated protein is called dystrophin. It hooks the contractile apparatus of a muscle cell to the membrane. Without it, muscle cells become more fragile when they contract and are eventually lost. From a molecular biological point of view the protein is fascinating. The gene is one of the largest known, stretching over 2,220,233 positions (nucleotides) on the X chromosome and containing 79 exons. Figuring a transcription rate of 100 nucleotides a second, it takes about 6 hours to make the messenger RNA (mRNA) for it. The protein has 3,685 amino acids. Figuring a translation rate of 3 – 6 amino acids/second, it takes 10 to 20 minutes for the ribosome to make it. Given that it takes only 3 nucleotides to code for an amino acid, the protein coding part of the gene takes up only 0.5% of the gene. Correctly splicing out the introns is a huge task, which we all perform well. The size and complexity of the gene explain why mutations are so common. This makes Duchenne the most common form of hereditary muscular dystrophy (and most muscular dystrophies are hereditary).
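The back-of-the-envelope numbers in the paragraph above can be reproduced in a few lines. This sketch uses only the figures quoted in the text and ignores the stop codon in the coding fraction.

```python
# Figures quoted in the text for human dystrophin.
GENE_LENGTH_NT = 2_220_233
PROTEIN_LENGTH_AA = 3_685
TRANSCRIPTION_NT_PER_S = 100
TRANSLATION_AA_PER_S = (3, 6)   # slow and fast ribosome speeds
NT_PER_CODON = 3

transcription_hours = GENE_LENGTH_NT / TRANSCRIPTION_NT_PER_S / 3600
translation_minutes = [PROTEIN_LENGTH_AA / rate / 60 for rate in TRANSLATION_AA_PER_S]
coding_fraction = PROTEIN_LENGTH_AA * NT_PER_CODON / GENE_LENGTH_NT  # stop codon ignored

print(f"transcription: {transcription_hours:.1f} hours")  # 6.2 hours
print(f"translation: {min(translation_minutes):.0f} to {max(translation_minutes):.0f} minutes")  # 10 to 20
print(f"coding fraction: {coding_fraction:.2%}")  # 0.50%
```

The 6 amino acids/second end of the range gives roughly 10 minutes; the 3 amino acids/second end gives roughly 20.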

There are currently all sorts of efforts underway to correct the mutation, particularly in a milder form called Becker dystrophy. Derek has covered them and they constitute a logical direct attack on the pathology.

What is so remarkable about the current Cell paper is that it gives us an entirely new and different way to attack Duchenne (and possibly all forms of muscular dystrophy). It involves a colony of dogs in Brazil. They have GRMD (Golden Retriever Muscular Dystrophy). This is caused by a mutation in one of the many splice sites in dystrophin (it has 79 exons in man), leading to a premature stop codon and no functional dystrophin in the dogs’ muscles. The animals weaken and become non ambulatory, with a shortened lifespan. However, a few of the dogs in the colony seemed pretty normal. So they went to work. The obvious explanation was that the gene had in some way been repaired, so the animals had normal amounts of dystrophin. Not so: even though the animals were ambulatory, their muscles had no dystrophin. So the whole genome was sequenced. They found a mutation upstream of the gene for a protein called Jagged1, which led to increased transcription of the gene and increased levels of the protein.

Jagged1 is a protein ligand for the Notch system of receptors. The Notch system is important in muscle regeneration. The myoblasts of the animals had more proliferative capacity. The Notch system is far too complicated to go into here — https://en.wikipedia.org/wiki/Notch_signaling_pathway, but expect to see a lot more research money pumped into it.

What I find so fabulous about this paper is that it gives us an entirely new way of thinking about Duchenne, totally unrelated to the genetic defect, which had been our focus up to now. It also rubs our noses in how little we understand about our molecular biology and cell physiology. If we really understood things, we’d have been focused on Notch years ago. Yet another reason drug discovery is so hard. We are trying to alter a system we only dimly understand.

Why drug discovery is so hard (particularly in the brain): Reason #28: The brain processes its introns very differently

Useful drug discovery for neurologic and psychiatric disease is nearly at a standstill. It isn’t for want of trying by basic researchers and big and small pharma. A recent excellent review [ Neuron vol. 87 pp. 14 – 27 ’15 ] helps explain why. In short, the brain processes its protein coding genes rather differently.

This post assumes you know what introns, exons and alternate splicing are. For pretty much all the needed background see the following.

First: https://luysii.wordpress.com/2010/07/07/molecular-biology-survival-guide-for-chemists-i-dna-and-protein-coding-gene-structure/
Second: https://luysii.wordpress.com/2010/07/11/molecular-biology-survival-guide-for-chemists-ii-what-dna-is-transcribed-into/

When splicing first came out I started making a list of proteins which were alternatively spliced. It is now safe to assume that any gene containing introns (95% of all protein coding genes [ Proc. Natl. Acad. Sci. vol. 112 pp. 17985 – 17990 ’08 ]) results in several protein products due to alternative splicing. The products produced vary from tissue to tissue, probably because most tissues express different splicing regulators.

Here are a few. A2BP1 (aka Rbfox1, aka FOX1) is a brain specific RNA splicing factor found only in postmitotic terminally differentiated neurons. It is deleted in 10% of glioblastomas. Another is nSR100 (neural specific SR related protein of 100 kiloDaltons; see later).

To show how crucial alternative splicing is for the very existence of the brain, consider this. The neuronal splicing regulator PTBP2 is barely expressed in most tissues. It is upregulated in neurons. Both PTBP1 and PTBP2 are repressors of neural alternative splicing (but some genes are actually enhanced). In a given region of the brain either PTBP1 or PTBP2 is expressed (but not both). PTBP1 promotes skipping of a neural specific exon (exon #10) in PTBP2 transcripts. This exposes a premature termination codon in PTBP2, leading to nonsense mediated decay (NMD). PTBP1 is expressed in most nonNeural tissues and neural precursor cells, but is silenced in developing neurons by the microRNA miR-124. The mRNA for PTBP2 contains an alternative exon which triggers nonsense mediated decay (NMD) when skipped. Inclusion of the exon requires positive transacting factors such as nSR100 in neurons. Repression is mediated by PTBP1 in undifferentiated cells. microRNAs (which ones?) downregulate PTBP1 during neuronal differentiation, relieving the negative regulation of PTBP2. Depletion of PTBP1 in fibroblasts is enough for PTBP2 induction and neuronal transdifferentiation.
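The logic of this switch, which keeps PTBP1 and PTBP2 mutually exclusive, can be caricatured as a tiny boolean network. This is a sketch only. Real regulation is graded, and treating miR-124 and nSR100 as on/off inputs is a simplifying assumption. The function name is hypothetical.

```python
from __future__ import annotations


def ptbp_switch(mir124: bool, nsr100: bool) -> dict[str, bool]:
    """Boolean caricature of the PTBP1/PTBP2 circuit described above."""
    # miR-124 (expressed in developing neurons) silences PTBP1.
    ptbp1 = not mir124
    # PTBP1 forces skipping of PTBP2 exon 10; inclusion also needs a positive factor such as nSR100.
    exon10_included = (not ptbp1) and nsr100
    # Skipping exon 10 exposes a premature stop codon, so the PTBP2 mRNA goes to NMD.
    ptbp2 = exon10_included
    return {"PTBP1": ptbp1, "PTBP2 exon 10 included": exon10_included, "PTBP2": ptbp2}


print(ptbp_switch(mir124=False, nsr100=False))  # non-neural cell or neural precursor
print(ptbp_switch(mir124=True, nsr100=True))    # differentiating neuron
```

Whatever the inputs, this model never produces a cell with both proteins. That matches the either/or expression described above.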

It gets more complicated still. PTBP1 inhibits splicing of introns at the 3′ end of some genes involved in presynaptic function. This results in nuclear retention and turnover of the transcripts via components of the nuclear RNA surveillance machinery. As PTBP1 is downregulated during neuronal differentiation, the target introns are spliced out and mature mRNAs appear.

Now we get to microExons, something unknown until 2014. For more details see — https://luysii.wordpress.com/2015/01/04/microexons-great-new-drugable-targets/.
Briefly, microexons are defined as exons containing 50 nucleotides or less (the paper says 3 – 27 nucleotides). They have been overlooked, partially because their short length makes them computationally difficult to find. Also few bothered to look for them as they were thought to be unfavorable for splicing because they were too short to contain exonic splicing enhancers. They are so short that it was thought that the splicing machinery (which is huge) couldn’t physically assemble at both the 3′ and 5′ splice sites. So much for theory, they’re out there.

The inclusion in the final transcript of most identified neural microExons is regulated by a brain specific factor, nSR100 (neural specific SR related protein of 100 kiloDaltons)/SRRM4. It binds to intronic enhancer UGC motifs close to the 3′ splice sites, resulting in inclusion of the microExons. They are ‘enhanced’ by tissue specific RBFox proteins. nSR100 is said to be reduced in Autism Spectrum Disorder (really? all? some?). nSR100 is strongly coexpressed in the developing human brain in a gene network module M2, which is enriched for rare de novo ASD associated mutations.

MicroExons are enriched for lengths which are multiples of 3 nucleotides. Recall that every 3 nucleotides in mRNA code for an amino acid. This implies strong selection pressure to preserve reading frames, as lengths of 3n+1 and 3n+2 produce a frameshift. The microExons are enriched in charged amino acids. Most microExons show high inclusion at late stages of neuronal differentiation, in genes associated with axon formation and synapse function. A neural specific microExon in Protrudin/Zfyve27 increases its interaction with VAP (Vesicle Associated membrane protein Associated Protein) and promotes neurite outgrowth.
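Why lengths of 3n matter can be shown by splicing a short insert into a made-up open reading frame and reading the codons downstream of it. The sequences are hypothetical. The 6 nt insert CGTGAA happens to encode Arg-Glu. The function names are illustrative.

```python
from __future__ import annotations


def codons(seq: str) -> list[str]:
    """Split an mRNA-like string into complete triplets, dropping any leftover bases."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def downstream_frame_intact(upstream: str, micro_exon: str, downstream: str) -> bool:
    """True if inserting micro_exon leaves the downstream codons read in their original frame."""
    assert len(upstream) % 3 == 0, "upstream is assumed to end on a codon boundary"
    spliced = codons(upstream + micro_exon + downstream)
    original = codons(downstream)
    return spliced[-len(original):] == original


UPSTREAM = "ATGGCT"       # hypothetical: Met-Ala
DOWNSTREAM = "GAAAAATGG"  # hypothetical: Glu-Lys-Trp
for micro in ("CGT", "CGTA", "CG", "CGTGAA"):
    print(len(micro), "nt:", downstream_frame_intact(UPSTREAM, micro, DOWNSTREAM))
# 3 and 6 nt inserts keep the frame; 4 (3n+1) and 2 (3n+2) shift it.
```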

[ Proc. Natl. Acad. Sci. vol. 112 pp. 3445 – 3450 ’15 ] Deep mRNA sequencing of mouse cerebral cortex expanded the list of alternative splicing events TENfold and showed that 72% of multiexon genes express multiple splice variants. Among the newly discovered alternatively spliced exons are 1,104 exons involved in nonsense mediated decay (NMD). They are enriched in genes for RNA binding proteins, including splicing factors. Another set of alternatively spliced NMD exons is found in genes coding for chromatin regulators. NMD exons are conserved in lower vertebrates, but those involving chromatin regulators appear only later, in the mammalian lineage. So the transcriptome in the brain is even more complicated.

A bit more about the actual effects of alternative splicing on protein structure. The sites chosen for this aren’t random. Alternative splicing events that are differentially regulated between cells and tissues are significantly UNDERrepresented in functionally defined folded domains of proteins. Instead, they are enriched in regions of protein disorder, which typically are surface accessible and embed short linear motifs for interaction with other proteins and ligands. Among a set of analyzed neural specific exons enriched in disordered regions, 1/3 promoted or disrupted interactions with partner proteins. So regulated exon splicing might specify tissue and cell type specific protein interaction networks. The authors regard inclusion/exclusion of these exons as protein surface microsurgery.

How much can a little microexon do to protein function? Here’s an example of a 6 nucleotide microexon (two amino acids). Insertion of the microExon in the nuclear adaptor protein Apbb1 enhances its interaction with Kat5/Tip60, a histone acetyltransferase. The microExon adds Arginine and Glutamic acid to a phosphotyrosine binding domain (PTB domain) which binds Kat5. This enhances binding.

Had enough? The complexity is staggering and I haven’t even talked about recursive splicing — that’s for another post, but here’s a reference if you can’t wait — [ Nature vol. 521 pp. 300 – 301, 371 – 375, 376 – 379 ’15 ]. Pity the drug chemist figuring out which alternatively spliced form of a brain protein to attack (particularly if it hasn’t been studied for microExons).

Why drug discovery is so hard: Reason #27 Moonlighting effects.

Well, we all know what heat shock proteins (Hsps) do: they bind to proteins which have lost their shape due to heat (or other stressors), cuddle them, hydrolyze ATP and nurse them back to health. But what if some of them do other things? The phenomenon is called moonlighting.

The case of Hsp70 is instructive. Some background first. The Hsp70 chaperone transiently associates with its substrates in a manner controlled by its ATPase cycle. ATP binding to the amino terminal nucleotide binding domain (NBD) induces a conformational change in the carboxy terminal substrate binding domain (SBD), which results in low affinity for substrate. Hydrolysis of ATP converts the Hsp70 to the ADP state, which binds substrates with higher affinity. Exchange of ADP for ATP releases substrate, completing the cycle. The hydrolysis of ATP is stimulated by J-domain containing cochaperones. The exchange of ADP for ATP is promoted by a separate group of cochaperones, the nucleotide exchange factors. Back and forth Hsp70 and the damaged protein go through the cycle until the protein is nursed back to normal or, failing this, is destroyed.

The Hsp70 family acts early in protein synthesis by binding to a small stretch of hydrophobic amino acids on a protein’s surface. Aided by a set of smaller Hsp40 proteins (also known as J proteins), an Hsp70 monomer binds to its target protein and then hydrolyzes ATP to ADP. This causes a conformational change that makes the Hsp70 clamp down very tightly on the target. After the Hsp40 dissociates, the rapid rebinding of ATP after ADP release causes the Hsp70 to dissociate. Repeated cycles of Hsp70 binding and release help the target protein to refold.
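One turn of the cycle described in the last two paragraphs can be written as a small state machine. This sketch makes simplifying assumptions. Affinities are qualitative labels. Basal (unstimulated) hydrolysis is ignored. The class and method names are illustrative, not from any library.

```python
from enum import Enum


class Nucleotide(Enum):
    ATP = "ATP"
    ADP = "ADP"


# Qualitative substrate affinity of each nucleotide state, as described in the text.
AFFINITY = {Nucleotide.ATP: "low", Nucleotide.ADP: "high"}


class Hsp70:
    def __init__(self) -> None:
        self.state = Nucleotide.ATP
        self.substrate = None

    def bind(self, substrate: str) -> None:
        # ATP state: the substrate associates loosely (low affinity).
        self.substrate = substrate

    def hydrolyze(self, j_protein_present: bool) -> None:
        # J-domain cochaperones (Hsp40s) stimulate ATP hydrolysis; basal hydrolysis is ignored here.
        if self.state is Nucleotide.ATP and j_protein_present and self.substrate is not None:
            self.state = Nucleotide.ADP  # clamp down: high affinity

    def exchange_nucleotide(self):
        # A nucleotide exchange factor swaps ADP for ATP; affinity drops and the substrate is released.
        if self.state is not Nucleotide.ADP:
            return None
        self.state = Nucleotide.ATP
        released, self.substrate = self.substrate, None
        return released


chaperone = Hsp70()
chaperone.bind("misfolded protein")
chaperone.hydrolyze(j_protein_present=True)
print(chaperone.state.name, AFFINITY[chaperone.state])  # ADP high
print(chaperone.exchange_nucleotide())                  # misfolded protein
print(chaperone.state.name, AFFINITY[chaperone.state])  # ATP low
```

Repeating bind, hydrolyze and exchange corresponds to the "repeated cycles of binding and release" in the text.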

Enter [ Proc. Natl. Acad. Sci. vol. 112 pp. E3327 – E3336 ’15 ]. This work shows that Hsp70 is methylated on arginine #469 by Coactivator Associated aRginine Methyltransferase 1/Protein aRginine MethylTransferase 4 (CARM1/PRMT4) and demethylated by JuMonJi Domain containing 6 (JMJD6). These are hideous acronyms shortening even more hideous names. Methylated Hsp70 then functions in transcription as a ‘regulator’ of Retinoic Acid Receptor beta 2 (RARbeta2) transcriptional activity. Hsp70 methylated at R469 mediates the interaction between Hsp70 and TFIIH (Transcription Factor IIH).

The regulation of gene transcription is an entirely novel and unsuspected function for a heat shock protein. A classic example of moonlighting.

Drug chemists and pharmacologists are always concerned about off-target effects. For an interesting example please see https://luysii.wordpress.com/2011/02/02/medicinal-chemists-do-you-know-where-your-drug-is-and-what-it-is-doing/. Off-target effects occur when a drug hits something else in the cell, producing an unexpected (and usually untoward) effect.

If you are unaware that your target of choice is doing a little something else on the side (i.e. moonlighting), you can get an off target effect even when you hit your desired target. It’s a tough business. How many more moonlighters are out there that we don’t know about?

Hsp70 is a good example. Here are two more — no background provided, so you’re on your own — except to point out that glucocorticoids are a widely used class of drug.

[ Proc. Natl. Acad. Sci. vol. 112 pp. E1540 – E1549 ’15 ] Amazingly, the glucocorticoid receptor (GR) plays a role in mRNA degradation by acting as an RNA binding protein. When loaded onto the 5′ UnTranslated Region (5′ UTR) of a target mRNA, the GR recruits UPF1 through Proline-rich Nuclear Receptor Coregulatory protein 2 (PNRC2). It does this in a ligand (of itself?) dependent manner to cause rapid mRNA degradation. They call this GMD (Glucocorticoid receptor Mediated Decay). GMD shares UPF1 (Up-Frameshift 1) and PNRC2 with Staufen Mediated mRNA Decay (SMD) and Nonsense Mediated mRNA Decay (NMD).

[ Science vol. 323 pp. 723 – 724, 793 – 797 ’09 ] Stat3 proteins represent the canonical mediators of signals elicited by cytokines binding to type I cytokine receptors. However, GRIM19 (Gene associated with Retinoid Interferon Mortality 19), a mitochondrial protein, interacts with Stat3 and inhibits its transcriptional activity (where?). This work shows that Stat3 associates with GRIM19 containing complexes I and II (components of the electron transport chain) in mouse liver and muscle mitochondria. Levels of Stat3 in mitochondria are 10% of cytosolic levels.

Cells lacking Stat3 show decreased activity of mitochondrial complexes I and II. Effects on complexes I and II don’t require Stat3’s DNA binding domain, the dimerization motif, or the tyrosine phosphorylation site controlling Stat3 nuclear localization and transcriptional activity. So this is a ‘moonlighting’ role for Stat3 having nothing to do with gene transcription. The serine phosphorylation site on Stat3 is important. So Stat3 is required to maintain normal mitochondrial function.

How little we know

Well it’s basic biochem 101, but enzymes only allow equilibrium to be reached faster (by lowering activation energy), they never change it. This came as a shock to the authors of [ Proc. Natl. Acad. Sci. vol. 112 pp. 6601 – 6606 ’15 ] when Cytosolic Nonspecific DiPeptidase 2 (CNDP2), a proteolytic enzyme, was found to tack the carboxyl group of lactic acid onto the amino group of a variety of amino acids, essentially running the proteolytic reaction in reverse. Why? Because intracellular levels of lactic acid and amino acids are in the high microMolar to milliMolar range. It’s Le Chatelier’s principle in action.
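Le Chatelier’s point can be put in numbers with ΔG = RT ln(Q/K). In this sketch K_EQ is made up: it is chosen to favor hydrolysis strongly and is not a measured value for CNDP2 or any real reaction. Water activity is taken as 1. The enzyme never appears in the calculation, which is the point: it speeds things up, but concentrations decide the direction.

```python
import math

R = 8.314   # gas constant, J/(mol K)
T = 310.0   # body temperature, K


def delta_g_condensation(k_eq: float, lactate_m: float, amino_acid_m: float, product_m: float) -> float:
    """Free energy (J/mol) of lactate + amino acid -> N-lactoyl amino acid + water.

    Water activity is taken as 1. Negative means the condensation (reverse proteolysis) runs forward.
    """
    q = product_m / (lactate_m * amino_acid_m)   # reaction quotient, 1/M
    return R * T * math.log(q / k_eq)


K_EQ = 0.01  # 1/M, hypothetical equilibrium constant that strongly favors hydrolysis

# Same K_EQ in both cases; only the concentrations differ (product held at 1 nM).
dg_high = delta_g_condensation(K_EQ, 2e-3, 1e-3, 1e-9)   # milliMolar substrates
dg_low = delta_g_condensation(K_EQ, 1e-5, 1e-5, 1e-9)    # 10 microMolar substrates
print(f"{dg_high / 1000:.1f} kJ/mol, {dg_low / 1000:.1f} kJ/mol")  # -7.7 kJ/mol, 17.8 kJ/mol
```

At milliMolar substrate levels, synthesis of the N-lactoyl amino acid is downhill even under an equilibrium constant that favors hydrolysis.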

The compounds are called N-Lactoyl amino acids. No one had ever seen them before. They are part of the ‘metabolome’ — small molecules found in our bodies. God knows what they do. The paper was really shocking to me for another reason, because I had no idea how many members the metabolome has.

How large is the metabolome? Make a guess.

Well, METLIN (https://metlin.scripps.edu/index.php) has 240,000, and the Human Metabolome DataBase (http://www.hmdb.ca/metabolites?c=hmdb_id&d=up&page=1676) has 42,000. I doubt that we know what they are all doing. Undoubtedly some of them are binding to proteins producing physiologic effects. Drug chemists may be mimicking some of them unknowingly, producing untoward and unexpected side effects.

What’s even more shocking to me is the following statement from the paper: state of the art untargeted metabolomics studies still report ‘up to’ 40% unidentified, but potentially important, metabolites which can be detected reproducibly. The unknown metabolites are only rarely characterized because of the extensive work required for de novo structure determination.

So we really don’t know everything that’s out there in our bodies, and even if we did, we don’t know what they are doing. Drug discovery is hard because we only dimly understand the system we are trying to manipulate. Until I read this paper, I had no idea just how dim this is.

Why drug discovery is so hard: Reason #26 — We’re discovering new players all the time

Drug discovery is so very hard because we don’t understand the way cells and organisms work very well. We know some of the actors — DNA, proteins, lipids, enzymes but new ones are being discovered all the time (even among categories known for decades such as microRNAs).

Briefly microRNAs bind to messenger RNAs usually decreasing their stability so less protein is made from them (translated) by the ribosome. It’s more complicated than that (see later), but that’s not bad for a first pass.

Presently some 2,800 human microRNAs have been annotated. Many of them are promiscuous binding more than one type of mRNA. However the following paper more than doubled their number, finding some 3,707 new ones [ Proc. Natl. Acad. Sci. vol. 112 pp. E1106 – E1115 ’15 ]. How did they do it?

Simplicity itself. They just looked at samples of ‘short’ RNA sequences from 13 different tissue types. MicroRNAs are all under 30 nucleotides long (although their precursors are not). So few microRNAs have been found in the past 20 years because cross-species conservation has been used as a criterion to discover them. The authors abandoned the criterion. How did they know that this stuff wasn’t just transcriptional chaff? Two enzymes (DROSHA, DICER) are involved in microRNA formation from larger precursors. Inhibiting them decreased the abundance of the ‘new’ RNAs, implying that they’d been processed by the enzymes rather than just being runoff from the transcriptional machinery. Further evidence is that about half were found associated with a protein called Argonaute, which applies the microRNA to the mRNA. 92% of the microRNAs were found in 10 or more samples. An incredible 23 billion sequence reads were generated to find them.
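The chain of evidence the authors used can be sketched as a filter over small RNA records. This is illustrative only and is not the paper’s pipeline. The record fields, thresholds and sequences are hypothetical. They mirror the criteria in the text: under 30 nucleotides, seen in 10 or more samples, dependent on DROSHA/DICER, and optionally Argonaute bound. Deliberately, there is no conservation test.

```python
from dataclasses import dataclass


@dataclass
class SmallRNA:
    seq: str
    samples_detected: int          # number of samples in which the sequence was seen
    drosha_dicer_dependent: bool   # abundance drops when DROSHA/DICER are inhibited
    argonaute_bound: bool


def candidate_microrna(r: SmallRNA, max_len: int = 29, min_samples: int = 10,
                       require_argonaute: bool = False) -> bool:
    """Evidence-based filter in the spirit of the text; no cross-species conservation check."""
    return (len(r.seq) <= max_len
            and r.samples_detected >= min_samples
            and r.drosha_dicer_dependent
            and (r.argonaute_bound or not require_argonaute))


reads = [
    SmallRNA("ACGU" * 5 + "AC", 12, True, True),    # hypothetical 22 nt, processed, widespread
    SmallRNA("ACGU" * 5 + "AC", 12, False, False),  # same size but not DROSHA/DICER dependent: chaff
    SmallRNA("ACGU" * 10, 15, True, True),          # 40 nt: too long to be a mature microRNA
]
print([candidate_microrna(r) for r in reads])  # [True, False, False]
```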

If that isn’t complex enough for you, consider that we now know that microRNAs bind mRNAs everywhere, not just in the 3′ untranslated region (3′ UTR) but in introns and exons as well. MicroRNAs also bind pseudogenes, SINEs, circular RNAs and nonCoding RNAs. So it’s a giant salad bowl of various RNAs binding each other, affecting their stability and other functions. These may be echoes of prehistoric life before DNA arrived on the scene.

It’s early times, and the authors estimate that we have some 25,000 microRNAs in our genome — more than the number of protein genes.

As always, the Category “Molecular Biology Survival Guide” found on the left should fill in any gaps you may have.

One rather frightening thought: if, as Dawkins said, we are just large organisms designed to allow DNA to reproduce itself, are all our DNA, proteins, lipids etc. just a large chemical apparatus to allow our RNA to reproduce itself? Perhaps the primitive RNA world from which we are all supposed to have arisen never left.

Off to China

No posts until March. Off to meet our new granddaughter. Will be email and Internet free until then.

To fill up the empty hours until I’m back, drug chemists should study the physical chemistry of protein/protein interactions, since that’s where most cellular work is done (and where new drugs should be useful). The interactions are multiple, transient and noncovalent.

An interesting paper made all 160,000 possible variants of 4 amino acids at the interface between two bacterial proteins [ Science vol. 347 pp. 673 – 677 ’15 ]. For bacterial histidine kinases, mutating just 3 or 4 interfacial amino acids to match those in another kinase is enough to reprogram their specificity. The key amino acids are Ala284, Val285, Ser288 and Thr289. The results were rather surprising: many different combinations at these four positions still worked, and the effect of a substitution at one position depended strongly on what sat at the others.
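Where does 160,000 come from? Each of the 4 positions can hold any of the 20 standard amino acids, so there are 20 × 20 × 20 × 20 = 160,000 combinations. The short Python sketch below enumerates them and counts how many differ from the wild type (A, V, S, T at positions 284, 285, 288, 289) at 0, 1, 2, 3 or 4 positions. It assumes only the 20 standard residues. How the library was actually built is not described here.

```python
from itertools import product
from math import comb

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids (one-letter codes)
WILD_TYPE = "AVST"                    # Ala284, Val285, Ser288, Thr289

# Every combination of 20 residues at 4 positions: 20**4 of them.
variants = ["".join(p) for p in product(AMINO_ACIDS, repeat=len(WILD_TYPE))]
print(len(variants))  # 160000


def n_changes(variant: str) -> int:
    """Number of positions at which a variant differs from the wild type."""
    return sum(a != b for a, b in zip(variant, WILD_TYPE))


# Group the library by how many of the four positions were changed.
counts = {k: 0 for k in range(len(WILD_TYPE) + 1)}
for v in variants:
    counts[n_changes(v)] += 1
print(counts)  # {0: 1, 1: 76, 2: 2166, 3: 27436, 4: 130321}

# The same numbers follow from C(4, k) * 19**k (choose which k positions change,
# then 19 non-wild-type residues at each).
assert all(counts[k] == comb(4, k) * 19**k for k in counts)
```

Over 80% of the library (130,321 of 160,000) differs from the wild type at all four positions. Looking only at single or double mutants would therefore miss most of the sequence space, including many of the combinations that still work.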

Enjoy

The butterfly effect in cancer

Fans of chaos theory know all about the butterfly effect, in which a tiny change in air currents produced by a butterfly’s wings in Brazil leads to a typhoon in Java. Could such a thing happen in cell biology? [ Proc. Natl. Acad. Sci. vol. 112 pp. 1131 – 1136 ’15 ] comes close.

The Cancer Genome Project has spent a ton of money looking at all the mutations of all our protein-coding genes that occur in various types of cancer. It was criticized because we already knew that cancer is effectively a hypermutable state, so it would just prove the obvious. Well, it did, but it also showed us just what a formidable problem cancer actually is.

For instance, [ Nature vol. 489 pp. 519 – 525 ’12 ] is a report from the Cancer Genome Atlas on 178 cases of squamous cell cancer of the lung. Each tumor had a mean of 360 exonic mutations, 165 genomic rearrangements and 323 copy number alterations. The technical details in the rest of the paragraph can be safely ignored, but the point is that no consistent pattern of mutation was found (except for p53, which is mutated in over 50% of all types of cancer, something we knew long before the Cancer Genome Atlas). Recurrent mutations were found in 11 genes. p53 was mutated in nearly all tumors. Previously unreported loss-of-function mutations were seen in the class I major histocompatibility gene HLA-A. Several pathways were altered relatively consistently (NFE2L2 and KEAP1 in 34%, squamous differentiation genes in 44%, PI3K genes in 47%, and CDKN2A and RB1 in 72%). EGFR and KRAS mutations are rare in squamous cell cancer of the lung (but quite common in adenocarcinoma). Alterations in FGFR are quite common in squamous cell carcinomas.

This sort of thing (which has been found in all the many types of tumors studied by the Cancer Genome Atlas) led to a degree of hopelessness about finding the holy grail: a single ‘driver mutation’ that leads to cancer with its attendant genomic instability.

All is not lost however.

MCF-10A is an immortalized epithelial cell line derived from human breast tissue. It is capable of continuous growth, but it is far from normal: it has (1) an abnormal complement of chromosomes, (2) threefold amplification of the MYC oncogene, and (3) deletion of a known tumor suppressor. It does lack some mutations found in breast cancer. For instance, the gene for human epidermal growth factor receptor 2 (ERBB2) is not amplified. The cell line doesn’t express the estrogen and progesterone receptors, making it similar to triple negative breast cancer.

A single amino acid mutation (arginine for histidine at position 1047) in the catalytic subunit of a very important lipid kinase, PI3K (p110alpha, encoded by the PIK3CA gene), was put into the MCF-10A cell line (which they call MCF-1A-H1047R). The mutation was chosen because it is one of the most frequently encountered cancer-specific mutations known. Exome sequencing showed that this was the only change in the protein-coding sequences. However, the control sequences outside the exons weren’t studied, a classic case of the protein-centric style of molecular biology.
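The “H1047R” in the cell line’s name is standard shorthand. It gives the original residue (H, histidine), its position in the protein (1047) and the replacement residue (R, arginine). The formal HGVS nomenclature writes the same change as p.His1047Arg. The Python sketch below unpacks that shorthand and applies it to a sequence. It handles only simple single-residue substitutions, not the full HGVS syntax. The function names and the toy peptide in the usage check are hypothetical.

```python
import re

# One-letter to three-letter codes for the 20 standard amino acids.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

# Original residue, 1-based position, new residue, e.g. "H1047R".
PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(code: str) -> tuple[str, int, str]:
    """Split a shorthand like 'H1047R' into (original residue, position, new residue)."""
    m = PATTERN.match(code)
    if m is None or m.group(1) not in THREE_LETTER or m.group(3) not in THREE_LETTER:
        raise ValueError(f"not a simple substitution: {code!r}")
    return m.group(1), int(m.group(2)), m.group(3)


def apply_substitution(protein: str, code: str) -> str:
    """Return a copy of a one-letter protein sequence with the substitution applied."""
    old, pos, new = parse_substitution(code)
    if not 1 <= pos <= len(protein):
        raise ValueError(f"position {pos} is outside a protein of length {len(protein)}")
    if protein[pos - 1] != old:
        raise ValueError(f"expected {old} at position {pos}, found {protein[pos - 1]}")
    return protein[: pos - 1] + new + protein[pos:]


old, pos, new = parse_substitution("H1047R")
print(f"{THREE_LETTER[old]}{pos} -> {THREE_LETTER[new]}")  # His1047 -> Arg
```

The check that the original residue really sits at the stated position is deliberate. An off-by-one error or the wrong isoform is a common way for a mutation to land on the wrong amino acid.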

In the (admittedly not completely normal) cell line, the mutation produced a cellular reorganization that far exceeds the known signaling activities of PI3K. The proteins expressed were similar to the protein and RNA signatures of basal breast cancer. The phosphoproteins of MCF-1A-H1047R are extremely different from those of the parent line. Inhibitors of the kinase induce only a partial reversion to the normal phenotype.

They plan to study the epigenome. This is significant because, according to the paper, breast cancers carry tons of mutations changing amino acids in proteins (4,000 per tumor). In my opinion they should do whole genome sequencing of MCF-1A-H1047R as well.

The mutant becomes fully transformed when a second mutation (of KRAS, an oncogene) is put in. The doubly mutant cells form tumors in nude mice. Recall that nude mice (another rodent beloved of experimental biologists; see the previous post on the naked mole rat) have a very limited immune system. They lack a thymus and hence mature T cells, which allows grafts of human cells to take root and proliferate.

How close the initial cell line is to normal is another matter. Work on a similar cell line (the 3T3 fibroblast) has been criticized because that cell is so close to neoplastic. At least the mutant MCF-1A-H1047R cells aren’t truly neoplastic, as they won’t produce tumors in nude mice. However, mutating just one more gene (KRAS) turns MCF-1A-H1047R malignant when transplanted.

The paper is also useful for showing how little we really understand about cause and effect in the cell. PI3K has been intensively studied for years because it is one of the major players telling cells to grow in size rather than divide. And yet “the mutation produced a cellular reorganization that far exceeds the known signaling activities of PI3K.”