Category Archives: Aargh ! Big pharma sheds chemists. Why?

Why drug discovery is so hard: Reason #26 — We’re discovering new players all the time

Drug discovery is so very hard because we don’t understand the way cells and organisms work very well. We know some of the actors — DNA, proteins, lipids, enzymes but new ones are being discovered all the time (even among categories known for decades such as microRNAs).

Briefly, microRNAs bind to messenger RNAs, usually decreasing their stability, so less protein is made from them (translated) by the ribosome. It’s more complicated than that (see later), but that’s not bad for a first pass.

Presently some 2,800 human microRNAs have been annotated. Many of them are promiscuous, binding more than one type of mRNA. However, the following paper more than doubled their number, finding some 3,707 new ones [ Proc. Natl. Acad. Sci. vol. 112 pp. E1106 – E1115 ’15 ]. How did they do it?

Simplicity itself. They just looked at samples of ‘short’ RNA sequences from 13 different tissue types. MicroRNAs are all under 30 nucleotides long (although their precursors are not). The reason that so few microRNAs have been found in the past 20 years is that cross-species conservation has been used as a criterion to discover them. The authors abandoned the criterion. How did they know that this stuff wasn’t just transcriptional chaff? Two enzymes (DROSHA, DICER) are involved in microRNA formation from larger precursors, and inhibiting them decreased the abundance of the ‘new’ RNAs, implying that they’d been processed by the enzymes rather than just being runoff from the transcriptional machinery. Further evidence is that half of them were found associated with a protein called Argonaute, which applies the microRNA to the mRNA. 92% of the microRNAs were found in 10 or more samples. An incredible 23 billion sequenced reads were performed to find them.

If that isn’t complex enough for you, consider that we now know that microRNAs bind mRNAs everywhere, not just in the 3′ untranslated region (3′ UTR) — introns, exons. MicroRNAs also bind pseudogenes, SINEs, circular RNAs, nonCoding RNAs. So it’s a giant salad bowl of various RNAs binding each other, affecting their stability and other functions. This may be echoes of prehistoric life before DNA arrived on the scene.

It’s early times, and the authors estimate that we have some 25,000 microRNAs in our genome — more than the number of protein genes.

As always, the Category “Molecular Biology Survival Guide” found on the left should fill in any gaps you may have.

One rather frightening thought: if, as Dawkins said, we are just large organisms designed to allow DNA to reproduce itself, is all our DNA, proteins, lipids etc. just a large chemical apparatus to allow our RNA to reproduce itself? Perhaps the primitive RNA world from which we are all supposed to have arisen never left.

Off to China

No posts until March. Off to meet our new Granddaughter. Will be Email and Internet free until then.

To fill up the empty hours until I’m back, drug chemists should study the physical chemistry of protein/protein interaction, since that’s where most cellular work is done (and where new drugs should be useful). The interactions are multiple, transient and nonequivalent (the WordPress processor substituted this for nonCovalent).

An interesting paper made all 160,000 possible variants of 4 amino acids at the interface between two bacterial proteins [ Science vol. 347 pp. 673 – 677 ’15 ]. For bacterial histidine kinases, mutating just 3 or 4 interfacial amino acids to match those in another kinase is enough to reprogram their specificity. The key amino acids are Ala284, Val285, Ser288, Thr289. The results were rather surprising.
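
Where does 160,000 come from? It is just 20 possible amino acids at each of 4 positions, or 20 to the 4th power. Here is a minimal Python sketch of the count, assuming the standard 20 amino acids; the variable names are mine, not the paper's.

```python
from itertools import product

# One-letter codes for the 20 standard amino acids (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# The four interface positions named in the post.
POSITIONS = ("Ala284", "Val285", "Ser288", "Thr289")

# 20 choices at each of 4 positions.
n_variants = len(AMINO_ACIDS) ** len(POSITIONS)

# Enumerate lazily instead of holding every variant in memory.
variants = product(AMINO_ACIDS, repeat=len(POSITIONS))
first_variant = "".join(next(variants))

print(n_variants, first_variant)
```

Add a fifth position and the library grows to 3.2 million, which is why such exhaustive experiments stick to a handful of residues.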


The butterfly effect in cancer

Fans of Chaos know all about the butterfly, where a tiny change in air current produced by a butterfly’s wings in Brazil leads to a typhoon in Java. Could such a thing happen in cell biology? [ Proc. Natl. Acad. Sci. vol. 112 pp. 1131 – 1136 ’15 ] comes close.

The Cancer Genome Project has spent a ton of money looking at all the mutations of all our protein coding genes which occur in various types of cancers. It was criticized as we already knew that cancer is effectively a hypermutable state, and that it would just prove the obvious. Well it did, but it also showed us just what a formidable problem cancer actually is.

For instance [ Nature vol. 489 pp. 519 – 525 ’12 ] is a report from the Cancer Genome Atlas of 178 cases of squamous cell cancer of the lung. There are a mean of 360 exonic mutations, 165 genomic rearrangements, and 323 copy number alterations per tumor. The technical details in the rest of the paragraph can be safely ignored but the point is that no consistent pattern of mutation was found (except for p53 which is mutated in over 50% of all types of cancer, which we knew long before the Cancer Genome Atlas). Recurrent mutations were found in 11 genes. p53 was mutated in nearly all. Previously unreported loss of function mutations were seen in the class I major histocompatibility gene (HLA-A). Several pathways were altered relatively consistently (NFE2L2, KEAP1 in 34%, squamous differentiation genes in 44%, PI3K genes in 47% and CDKN2A and RB1 in 72%). EGFR and kRAS mutations are rare in squamous cell cancer of the lung (but quite common in adenocarcinoma). Alterations in FGFR are quite common in squamous cell carcinomas.

This sort of thing (which has been found in all the many types of tumors studied by the Cancer Genome Atlas) led to a degree of hopelessness in looking for the holy grail of a single ‘driver mutation’ which leads to cancer with its attendant genomic instability.

All is not lost however.

MCF-10A is an immortalized epithelial cell line derived from human breast tissue. It is capable of continuous growth, but is far from normal: (1) an abnormal complement of chromosomes, (2) threefold amplification of the MYC oncogene, and (3) deletion of a known tumor suppressor. It does lack some mutations found in breast cancer. For instance, the Epidermal Growth Factor Receptor 2 (ERBB2) is not amplified. The cell line doesn’t express the estrogen and progesterone receptors — making it similar to triple negative breast cancer.

A single amino acid mutation (Arginine for Histidine at amino acid #1047) in the catalytic subunit of a very important protein kinase (p110alpha of the PIK3CA gene) was put into the MCF-10A cell line (which they call MCF-10A-H1047R). The mutation was chosen because it is one of the most frequently encountered cancer specific mutations known. Exome sequencing of the entire genome showed that this was the only change — but the control sequences outside the exons weren’t studied, a classic case of the protein centric style of molecular biology.

In the (admittedly not completely normal) cell line, the mutation produced a cellular reorganization that far exceeds the known signaling activities of PI3K. The proteins expressed were similar to the protein and RNA signatures of basal breast cancer. The changes far exceeded the known effects of PIK3CA signaling. The phosphoproteins of MCF-10A-H1047R are extremely different. Inhibitors of the kinase induce only a partial reversion to the normal phenotype.

They plan to study the epigenome. This is significant as breast cancers are said in the paper to have tons of mutations changing amino acids in proteins (4,000 per tumor). In my opinion they should do whole genome sequencing of MCF-10A-H1047R as well.

The mutant becomes fully transformed when a second mutation (of KRAS, an oncogene) is put in. This allows the cells to form tumors in nude mice. Recall that the nude mouse (another rodent beloved of experimental biologists — see the previous post on the Naked Mole Rat) has a very limited immune system, allowing grafts of human cells to take root and proliferate.

How close the initial cell line is to normal is another matter. Work on a similar cell line (the 3T3 fibroblast) has been criticized because that cell is so close to neoplastic. At least the mutant MCF-10A-H1047R cells aren’t truly neoplastic as they won’t produce tumors in nude mice. However, mutating just one more gene (KRAS) turns MCF-10A-H1047R malignant when transplanted.

The paper is also useful for showing how little we really understand about cause and effect in the cell. PI3K has been intensively studied for years because it is one of the major players telling cells to grow in size rather than divide. And yet “the mutation produced a cellular reorganization that far exceeds the known signaling activities of PI3K.”

Time for drug chemists to hit the cell biology books

The (undeservedly) obscure Naked Mole Rat should be of interest to drug chemists for two reasons: (1) it lives 8 times as long as its fellow rodent the lab mouse and (2) it never gets cancer (despite being under observation for the past 40 years). So untangling the mechanisms behind this should tell us about aging and cancer, particularly since cancer accounts for over 50% of the mortality in lab rodents. They age healthy. Until the last few years of their long lives, they show minimal morphological and physical changes of aging.

This post will concern a possible way Naked Mole Rats escape cancer. I’ve attempted to provide a molecular biological background for chemists about DNA, RNA, gene transcription etc. See and follow the links. There is very little in these posts about cell physiology and biology. I suggest having a look at “Molecular Biology of The Cell” and “Cancer” by Robert Weinberg. Get the latest editions, as things are moving rapidly.

The following paper tried to find out why Naked Mole Rats don’t get cancer [ Proc. Natl. Acad. Sci. vol. 112 pp. 1053 – 1058 ’15 ]. In tissue culture, naked mole rat fibroblasts show hypersensitivity to contact inhibition (aka early contact inhibition aka ECI). I.e., they stop dividing or die when they get too close to each other. The signal triggering ECI comes from hyaluronan (which has a very high molecular weight) outside the cell. Removal of high MW hyaluronan abrogates ECI and makes naked mole rat cells susceptible to malignant transformation.

ECI is associated with an increase in expression of p16^INK4a, a tumor suppressor (here is where the cell biology comes in). Cells losing expression no longer show ECI. Deletion and/or silencing of INK4a/b is found in human cancers as well. The genomic locus containing p16^INK4a is small (under 50 kiloBases), but it codes for 3 different tumor suppressors (p16^INK4a, p15^INK4b and p14^ARF). The 3 proteins coordinate a signaling network depending on the activities of the retinoblastoma protein (RB) and p53 (more cell molecular biology).

In the naked mole rat, the INK4a/b locus codes for an additional product which consists of p15^INK4b exon #1 joined to p16^INK4a exons #2 and #3, due to alternative splicing. They call this pALT^INK4a/b. It is present in cultured cells from naked mole rat tissues, but is absent in human and mouse cells. pALT^INK4a/b expression is induced during early contact inhibition and by a variety of stresses such as ultraviolet light, gamma radiation, loss of substrate attachment and expression of oncogenes. When over expressed in human cells, pALT^INK4a/b has more ability to induce cell cycle arrest than either p16^INK4a or p15^INK4b. So pALT^INK4a/b might explain the increased resistance to tumors.

There’s also a lot of work concerning why they live so long, but that’s for another post.

As if the job shortage for organic/medicinal chemists wasn’t bad enough

Will synthetic organic chemists be replaced by a machine? Today’s (7 August ’14) Nature (vol. 512 pp. 20 – 22) describes RoboChemist. As usual the job destruction is the fruit of the species being destroyed. Nothing new here — “The Capitalists will sell us the rope with which we will hang them.” — Lenin. “I would consider it entirely feasible to build a synthesis machine which could make any one of a billion defined small molecules on demand” says one organic chemist.

The design of the machine is already being studied, but with a rather paltry grant (1.2 million dollars). Even worse, for the thinking chemist, the choice of reactants and reactions to build the desired molecule will be made by the machine (given a knowledge base, and the algorithms that experienced chemists use, assuming they can be captured by a set of rules). E. J. Corey tried to do this automatically years ago with a program called LHASA (Logic and Heuristics Applied to Synthetic Analysis), but it never took off. Corey formalized what chemists had been doing all along — see

Another attempt along these lines is Chematica, which recently has had some success. A problem with using the chemical literature is that only the conditions for a successful reaction are published. A synthetic program needs to know what doesn’t work as much as it needs to know what does. This is an important problem in the medical/drug literature where only studies showing a positive effect are published. There’s a great chapter in “How Not to Be Wrong” concerning the “International Journal of Haruspicy” which publishes only statistically significant results for predicting the future by reading sheep entrails. They publish a lot of stuff because some 400 Haruspicists in different labs are busy performing multiple experiments, 5% of which reach statistical significance. Previously drug companies had to publish only successful clinical trials. Now they’ll be going into a database regardless of outcome.
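
To see why the Haruspicy journal never runs short of papers, here is a minimal simulation sketch in Python. It assumes entrails predict nothing (the null hypothesis is always true), so every p-value is just a uniform random number. The 400 labs and the 5% threshold come from the paragraph above; the 10 experiments per lab is my own assumption, since “multiple” is all we are told.

```python
import random

ALPHA = 0.05               # significance threshold from the post
N_LABS = 400               # haruspicists, from the post
EXPERIMENTS_PER_LAB = 10   # assumption: the post only says "multiple"


def simulate_null_literature(seed=0):
    """Count 'significant' results when there is no real effect at all.

    Under a true null hypothesis a p-value is uniform on [0, 1),
    so each experiment is modeled as one uniform random draw.
    """
    rng = random.Random(seed)
    total = N_LABS * EXPERIMENTS_PER_LAB
    publishable = sum(1 for _ in range(total) if rng.random() < ALPHA)
    return publishable, total


publishable, total = simulate_null_literature()
print(f"{publishable} of {total} null experiments reach p < {ALPHA}")
```

Roughly one experiment in twenty clears the bar by chance alone, and if only those get written up, the journal looks full of discoveries.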

Automated machinery for making polynucleotides and polypeptides already exists, but here the reactions are limited. Still, the problem of getting the same reaction to work over and over with different molecules of the same class (amino acids, nucleotides) has been solved.

The last sentence is the most chilling “And with a large workforce of graduate students to draw on, academic labs often have little incentive to automate.” Academics — the last Feudal system left standing.

However, telephone operators faced the same fate years ago, due to automatic switching machinery. Given the explosion of telephone volume 50 years ago, there came a point where every woman in the USA would have worked for the phone company to handle the volume.

A similar moment of terror occurred in my field (clinical neurology) years ago with the invention of computerized axial tomography (CAT scans). All our diagnostic and examination skills (based on detecting slight deviations from normal function) would be out the window, when the CAT scan showed what was structurally wrong with the brain. Diagnosis was possible because abnormalities in structure invariably occurred earlier than abnormalities in function. Didn’t happen. We’d get calls – we found this thing on the CAT scan. What does it mean?

Even this wonderful machine which can make any molecule you wish, will not tell you what cellular entity to attack, what the target does, and how attacking it will produce a therapeutically useful result.

Getting cytoplasm out of a single cell without killing it

It’s easy to see what cells are doing metabolically. Just take a million or so, grind them up and measure what you want. If this sounds crude to you, you’re right. We’ve learned a lot this way, but wouldn’t it be nice to take a single cell and get a sample of its cytoplasm (or its nucleus) without killing it? A technique described in the 29 July PNAS (vol. pp. 10966 – 10971 ’14) does just that. It’s hardly physiologic, as cells are grown on a layer of polycarbonate containing magnetically active carbon nanoTubes covered in L-tyrosine polymers. The nanotubes are large enough to capture anything smaller than an organelle (1,000 Angstrom, 100 nanoMeter diameter, 15,000 Angstroms long). Turn on a magnetic field underneath the polycarbonate, and they puncture the overlying cell and are filled with cytoplasm. Reverse the magnetic field and they come out, carrying the metabolites with them. Amazingly, there was no significant impact on cell viability or proliferation. Hardly physiologic but far better than what we’ve had.

It’s a long way from drug development, but wouldn’t it be nice to place your drug candidate inside a cell and watch what it’s doing?

Here’s a drug target for schizophrenia and other psychiatric diseases

All agree that any drug getting schizophrenics back to normal would be a blockbuster. The more we study its genetics and biochemistry the harder the task becomes. Here’s one target — neuregulin1, one variant of which is strongly associated with schizophrenia (in Iceland).

Now that we know that neuregulin1 is a potential target, why should discovering a drug to treat schizophrenia be so hard? The gene stretches over 1.2 megaBases and the protein contains some 640 amino acids. Cells make some 30 different isoforms by alternative splicing of the gene. Since the gene is so large one would expect to find a lot of single nucleotide polymorphisms (SNPs) in the gene. Here’s some SNP background.

Our genome has 3.2 gigaBases of DNA. With sequencing being what it is, each position has a standard nucleotide (one of A, T, G, or C). If 5% of the population have any one of the other 3 at this position you have a SNP. By 2004 some 7 MILLION SNPs had been found and mapped to the human genome.

Well it’s 10 years later, and a mere 23,094 SNPs have been found in the neuregulin gene, of which 40 have been associated with schizophrenia. Unfortunately most of them aren’t in regions of the gene which code for amino acids (which is to be expected as 640 * 3 = 1920 nucleotides are all you need for coding out of the 1,200,000 nucleotides making up the gene). These SNPs probably alter the amount of the protein expressed but as of now very little is known (even whether they increase or decrease neuregulin1 protein levels).
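
The arithmetic is worth spelling out. Here is a small Python sketch using only the numbers quoted above. The one assumption (mine, and surely too crude) is that SNPs are sprinkled evenly along the gene; it also ignores the stop codon.

```python
PROTEIN_LENGTH_AA = 640       # amino acids, from the post
GENE_LENGTH_NT = 1_200_000    # 1.2 megaBases, from the post
TOTAL_SNPS = 23_094           # SNPs found in the gene, from the post

# Three nucleotides per codon, one codon per amino acid.
coding_nt = PROTEIN_LENGTH_AA * 3
coding_fraction = coding_nt / GENE_LENGTH_NT

# Assumption: SNPs are spread uniformly along the gene.
expected_coding_snps = TOTAL_SNPS * coding_fraction

print(f"coding nucleotides: {coding_nt}")
print(f"coding fraction of gene: {coding_fraction:.2%}")
print(f"SNPs expected in coding sequence: {expected_coding_snps:.0f}")
```

So well under 1% of the gene codes for amino acids, and on an even spread only a few dozen of the 23,094 SNPs would be expected to land there.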

An excellent review of Neuregulin1 and schizophrenia is available [ Neuron vol. 83 pp. 27 – 49 ’14 ]. You’ll need a fairly substantial background in neuroanatomy, neuroembryology, molecular biology and neurophysiology to understand all of it. Included is some fascinating (but probably incomprehensible to the medicinal chemist) material on the different neurophysiologic abnormalities associated with different SNPs in the gene.

Here are a few of the high points (or depressing points for drug discovery) of the review. Neuregulin1 is a member of a 6 gene family, all fairly similar and most expressed in the brain. All of them have multiple splicing isoforms, so drug selectivity between them will be tricky. Also SNPs associated with increased risk of schizophrenia have been found in family members numbers 2, 3 and 6 as well, so neuregulin1 may not be the actual target you want to hit.

It gets worse. The neuregulins bind to a family of receptors (the ERBBs) having 4 members. Tending to confirm the utility of the neuregulins as a drug target is the fact that SNPs in the ERBBs are also associated with schizophrenia. So which isoform of which neuregulin binding to which isoform of which ERBB is the real target? Knowledge isn’t always power.

A large part of the paper is concerned with the function of the neuregulins in embryonic development of the brain, leading to the rather depressing thought that the schizophrenic never had a chance, having an abnormal brain to begin with. A drug to reverse such problems seems only a hope.

The neuregulin/ERBB system is only one of many genes which have been linked to schizophrenia. So it looks like a post of 4 years ago on Schizophrenia is largely correct —

Happy hunting. It’s a horrible disease and well worth the effort. We’re just beginning to find out how complex it really is. Hopefully we’ll luck out, as we did with the phenothiazines, the first useful antipsychotics.

Further (physical) chemical elegance

If the chemical name phosphatidyl serine (PS) draws a blank, read the verbatim copy of a previous post under the *** to find out why it is so important to our existence. It is an ‘eat me’ signal when there is lots of it around, telling professional scavenger cells to engulf the cell showing lots of PS on its surface.

Life, as usual, is more complicated. There are a variety of proteins exposed on cell surfaces which bind to phosphatidyl serine. Not only that, but exposing just a little PS on the surface of a cell can trigger a protective immune response. Immune cells binding to just a little PS on the surface of another cell proliferate rather than eat the cell expressing the PS. This brings us to Proc. Natl. Acad. Sci. vol. 111 pp. 5526 – 5531 ’14 that explains how a given PS receptor (called TIM4) acts differently depending on how much PS is present.

Some PS receptors such as Annexin V have essentially an all or none response to PS: if they bind at all, they trigger a response in the cell carrying them. Not so for TIM4, which only reacts if there is a lot of PS around, leaving cells which express less PS alone. This allows these cells to function in the protective immune response.

So how does TIM4 do this? See if you can think of a mechanism before reading the rest.

In addition to the PS binding pocket TIM4 has 4 peripheral basic residues in separate places. The basic residues are positively charged at physiologic pH and bind to the negatively charged phosphate group of phosphatidyl serine or to the carboxylate anion of phosphatidyl serine. The paper doesn’t explain how these basic residues don’t bind to the other phospholipids of the cell surface (such as phosphatidyl choline or sphingomyelin). It is conceivable that the basic side chains (arginine, lysine etc.) are so set up that they only bind to carboxylate anions and not phosphate anions (but this is a stretch). That would at least give them specificity for phosphatidyl serine as opposed to the other phospholipids present in both leaflets of the cell membrane. In any event TIM4 will be triggered only if these groups also bind PS, leaving cells which show relatively little PS alone. Clever no?

For the cognoscenti, the Hill coefficient of TIM4 is 2 while that of Annexin V is 8 (describing more than explaining the all or none character of Annexin V binding).
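
For those who aren’t cognoscenti, the Hill coefficient measures how switch-like a binding curve is. Here is a minimal Python sketch of the textbook Hill equation, using the two coefficients quoted above; the function names and the half-maximal constant are mine, and nothing here is taken from the paper’s own fitting.

```python
def hill_response(ligand, k_half, n):
    """Fractional response from the Hill equation.

    ligand : ligand concentration (arbitrary units)
    k_half : concentration giving half-maximal response
    n      : Hill coefficient
    """
    return ligand**n / (k_half**n + ligand**n)


def fold_change_10_to_90(n):
    """Fold increase in ligand needed to move from 10% to 90% response.

    Setting the Hill equation to 0.1 and 0.9 gives (L/K)**n = 1/9 and 9,
    so the ratio of the two concentrations is 81**(1/n).
    """
    return 81 ** (1.0 / n)


for name, n in (("TIM4", 2), ("Annexin V", 8)):
    print(f"{name}: n = {n}, 10% to 90% takes a "
          f"{fold_change_10_to_90(n):.2f}-fold rise in ligand")
```

With n = 2 it takes a 9-fold rise in ligand to go from 10% to 90% response; with n = 8 it takes under 2-fold, which is what all or none looks like on a graph.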

Flippase. Eat me signals. Dragging their tails behind them. Have cellular biologists and structural biochemists gone over to the dark side? It’s all quite innocuous as the old nursery rhyme will show

Little Bo Peep has lost her sheep
and doesn’t know where to find them
Leave them alone, and they’ll come home
wagging their tails behind them.

First, some cellular biochemistry. The lipid bilayer encasing all our cells is made of two leaflets, inner and outer. The composition of the two is different (unlike the soap bubble). On the inside we find phosphatidylethanolamine (PE), phosphatidylserine (PS). The outer leaflet contains phosphatidylcholine (PC) and sphingomyelin (SM) and almost no PE or PS. This is clearly a low entropy situation compared to having all 4 randomly dispersed between the 2 leaflets.

What is the possible use of this (notice how teleology invariably creeps into cellular biology)? Chemistry is powerless to explain such things. Much as I love chemistry, such truths must be faced.

It takes energy to maintain this peculiar distribution. The enzyme moving PE and PS back inside the cell is the flippase. It requires energy in the form of ATP to operate. When a cell is dying ATP drops, and entropy takes its course moving PE and PS to the cell surface. Specialized cells (macrophages) exist to scoop up the dying or dead cells, without causing inflammation. They recognize PE and PS by a variety of receptors and munch up cells exposing them on the surface. So PE and PS are eat me signals which appear when there isn’t enough ATP around for flippase to use to haul PE and PS back inside. Clever no?

Now for some juicy chemistry (assuming that you consider transport of a molecule across a lipid bilayer actual chemistry — no covalent bonds to the transferred molecule are formed or removed, although they are to the transporter). Well it certainly is physical chemistry isn’t it?

Here are the structures of PE, PS, PC, SM

There are a few things to notice. Like just about every lipid found in our membranes, they are amphipathic — they have a very lipid soluble part (look at the long hydrocarbon chains hanging below them) and a very water soluble part — the head groups containing the phosphate.

This brings us to [ Proc. Natl. Acad. Sci. vol. 111 pp. E1334 – E1343 ’14 ] which describes ATP8A2 (aka the flippase). Interestingly, the protein, with at least 10 alpha helices spanning the membrane and 3 cytoplasmic domains, closely resembles the classic sodium pump beloved of neurophysiologists everywhere, which pumps sodium ions out of neurons and pumps potassium ions inside, producing the equally beloved membrane potential of neurons.

Look at those structures again. While there are charges on PE, PS (on the phosphate group), these molecules are far larger than the sodium or the potassium ion (easily by a factor of 10). This has long been recognized and is called the ‘giant substrate problem’.

The paper solved the structure of ATP8A2 and used molecular dynamics simulations to try to understand how it works. What they found is that transmembrane alpha helices 1, 2, 4 and 6 (out of 10) form a water filled cavity, which dissolves the negatively charged phosphate of the head group. What happens to those long hydrocarbon tails? They are left outside the helices in the lipid core of the membrane. It is the charged head groups that are dragged through by the flippase, with the tails wagging along behind them, just like little Bo Peep.

There’s a lot more great chemistry in the paper, particularly how Isoleucine #364 directs the sequential formation and annihilation of the water filled cavities between alpha helices 1, 2, 4 and 6, and how a particular aspartic acid is phosphorylated (by ATP, explaining why the enzyme no longer works in energetically dying cells) changing conformation of all 10 transmembrane helices, so that only one half of the channel is open at a time (either to the inside or the outside).

Go read and enjoy. It’s sad that people who don’t know organic chemistry are cut off from appreciating such elegance. There is more to esthetics than esthetics.

Why drug discovery is so hard: Reason #25 — What if your drug target is really a pointer to the real target?

Any drug safely producing weight loss would be a big (or small) pharma blockbuster. Those finding it should get on the boat to Sweden. Finding a target to attack is the problem. Here’s one way to look. Take lots of fat people, lots of thin people and see what in their genomes differentiates them (assuming anything does). Actually, what was done was to look at type II (non-insulin dependent) diabetics, the vast majority of them overweight, and controls. The first study involved the genomes of nearly 5,000 diabetics and controls. How did they interrogate the genomes? At the time of the work it was impossible to completely sequence this many genomes.

It’s time to speak of SNPs (single nucleotide polymorphisms). Our genome has 3.2 gigaBases of DNA. With sequencing being what it is, each position has a standard nucleotide (one of A, T, G, or C). If 5% of the population have one of the other 3 at this position you have a SNP. Already 10 years ago, some 7 MILLION SNPs had been found and mapped to the human genome.

The first study found some SNPs associated with obesity in the diabetics. This tells where to look for the gene. A second study with nearly 9,000 diabetics and controls, replicated the first.

Then the monster study, with 39,000 people [ Science vol. 316 pp. 889 – 894 ’07 ] found FTO (FaT mass and Obesity associated gene) on chromosome #16. The 16% of Caucasian adults with two copies of the variant SNP in FTO were 1.67 times more likely to be obese. An intense flurry of work showed that the gene coded for an oxidase, using iron and 2 oxo-glutaric acid (alphaKG for you old timers). The enzyme removes methyl groups from the amino group at position #6 of adenine and the 3 position of thymine. Before this time, no one really paid much attention to them. Subsequently we’ve found 6 methyl adenine in a mere 7,676 mRNAs. Just what it does when it’s there, and why the cell wants to remove it is currently being worked out.

Clearly FTO is a great target for an obesity drug. Of course they knocked the gene out in the mouse. The animals were normal at birth, but at 6 weeks weighed 30 – 40% less than normal mice. FTO as a drug target looked even better after this.

It was somewhat surprising that the SNP was in an intron in the gene. This meant that even in the obese the protein product of the FTO gene was the same as in the skinny. Presumably this could mean more FTO, less FTO or a different splice variant. If some of this molecular biology is above your pay grade, the background you need is in 5 posts starting with

It was somewhat surprising that FTO levels were the same in people with and without the fat SNP. That left splice variants as a possibility.

The denouement came this week [ Nature vol. 507 pp. 309 – 310, 371 – 375 ’14 ]. The intron containing the SNP in FTO produces obesity by controlling another gene called IRX3 which is a mere 500,000 nucleotides away. The intron of FTO binds to the promoter of IRX3 turning the gene on resulting in more IRX3. Mice lacking a functional copy of IRX3 have a 25 – 30% lower body mass. As any C programmer would say, FTO is the pointer not the data.
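
For chemists who have never met a pointer, here is a toy sketch of the analogy in Python (which has object references rather than C pointers, but the idea carries over). The class names, the boost factor and the expression numbers are all invented for illustration; the only things taken from the story above are that the regulatory element sits inside FTO, acts on IRX3, and leaves FTO itself unchanged.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    expression: float = 1.0   # arbitrary units, invented for the sketch


@dataclass
class IntronicElement:
    host_gene: Gene           # gene whose intron contains the element
    target: Gene              # gene whose promoter the element contacts
    risk_variant: bool = False

    def apply(self, boost=1.5):
        """Raise the TARGET's expression if the risk variant is present.

        The boost factor is a made-up number; the host gene is untouched.
        """
        if self.risk_variant:
            self.target.expression *= boost


fto = Gene("FTO")
irx3 = Gene("IRX3")
element = IntronicElement(host_gene=fto, target=irx3, risk_variant=True)
element.apply()

print(fto.name, fto.expression)     # where the SNP lives
print(irx3.name, irx3.expression)   # where the effect lands
```

The SNP’s address is in FTO, but following the reference takes you to IRX3, which is where the change actually shows up.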

I don’t know if big or small pharma was at work finding inhibitors or enhancers of FTO function, but this paper should have brought them to a screeching halt. The FTO/IRX3 story just shows how many pitfalls there are to finding new drugs, and why the search has shown relatively little success recently. We are trying to alter the function of an incredibly complex system, whose workings we only dimly understand.

Why drug discovery is so hard: Reason #24 — Is the 3′ untranslated region of every mRNA a ceRNA?

We all know what proteins do. They act as enzymes, structural elements of cells, membrane proteins where drugs bind etc. etc. The background the pure chemist needs for what follows can all be found in the category “Molecular Biology Survival Guide.”

We also know that the messenger RNA for any given protein contains a lot more information than that needed to code for the amino acids making up the protein. Forget the introns that are spliced out from the initial transcript. When the mature messenger RNA for a given protein leaves the nucleus for the cytoplasm, where the ribosome translates it into protein, it contains nucleotides at either end which the ribosome effectively ignores. These are called the untranslated regions (UTRs). The UTRs at the 3′ end of human mRNAs range in length between 60 and 4,000 nucleotides (average 800). It costs energy to store the information for the UTR in DNA, more energy to synthesize the nucleotides which make it up, even more to patch them together to form the UTR, more to package it and move it out of the nucleus etc. etc.

Why bother? Because the 3′ UTR of the mRNA contains a lot of information which tells the cell how much protein to make, how long the mRNA should hang around in the cell (among many other things). A Greek philosopher got here first — “Nature does nothing uselessly” – Aristotle

Those familiar with competitive endogenous RNA (ceRNA) can skip what follows up to the ****

Recall that microRNAs are short (20 something) polynucleotides which bind to the 3′ untranslated region (3′ UTR) of mRNA, and either (1) inhibit its translation into protein (2) cause its degradation. In each case, less of the corresponding protein is made. The microRNA and the appropriate sequence in the 3′ UTR of the mRNA form an RNA-RNA double helix (G on one strand binding to C on the other, etc.). Visualizing such helices is duck soup for a chemist.
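
If visualizing the helix is duck soup, finding where one could form is not much harder. Here is a minimal Python sketch that looks for stretches of a 3′ UTR able to pair with a piece of a microRNA. The two strands of a double helix run antiparallel, so the site to look for is the reverse complement. Both sequences below are made up for illustration, and only strict Watson-Crick pairs are counted (no G-U wobble), which is my simplification.

```python
# Watson-Crick pairing rules for RNA.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the strand that would pair antiparallel with `rna`."""
    return "".join(PAIR[base] for base in reversed(rna))


def find_binding_sites(utr, mirna_segment):
    """Return 0-based start positions in `utr` that pair perfectly
    with `mirna_segment` (both written 5' to 3')."""
    site = reverse_complement(mirna_segment)
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)
    return positions


# Hypothetical sequences, not taken from any real mRNA.
mirna_segment = "GAGGUAG"
site = reverse_complement(mirna_segment)
utr = "AAA" + site + "AAAGGG" + site + "UU"

print(site)
print(find_binding_sites(utr, mirna_segment))
```

The same few lines run over a different mRNA’s 3′ UTR would tell you whether it carries a site for the same microRNA, which is the whole point of what follows.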

Molecular biology is full of such semantic cherry bombs as nonCoding DNA (which meant DNA which didn’t code for protein), a subset of Junk DNA. Another is the pseudogene. These are genes that look like they should code for protein, except that they don’t, because they lack an initiation codon or contain a premature termination codon. Except for these differences, they have the nucleotide sequence to code for a known protein. It is estimated that the human genome contains as many pseudogenes (20,000) as it contains true protein coding genes [ Genome Res. vol. 12 pp. 272 – 280 ’02 ]. We now know that well over half the genome is transcribed into mRNA, including the pseudogenes.
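The two defects just mentioned are easy to state as a check on a sequence. What follows is a rough sketch, not how pseudogenes are actually annotated: it assumes you already know how many codons the parent protein has, it reads the sequence in a single frame from the first nucleotide, and the function name and the toy sequences are invented for illustration.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def coding_defect(rna, expected_codons):
    """Describe why `rna` cannot make the full-length protein, or return None.

    `expected_codons` is the number of amino acid codons in the parent
    protein (start codon included, final stop codon excluded). This is an
    assumed input; the reading frame is taken to start at position 0.
    """
    if not rna.startswith("AUG"):
        return "no initiation codon"
    for i in range(0, 3 * expected_codons, 3):
        if rna[i:i + 3] in STOP_CODONS:
            return f"premature stop at codon {i // 3 + 1}"
    return None


# Made-up 4-codon 'gene' and two damaged copies of it.
print(coding_defect("AUGGCUGGAUUUUAA", 4))  # None
print(coding_defect("AUGGCUUGAUUUUAA", 4))  # premature stop at codon 3
print(coding_defect("CUGGCUGGAUUUUAA", 4))  # no initiation codon
```

Note that in both damaged copies everything downstream of the defect is untouched, so whatever sits in the 3′ UTR is still there.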

PTEN (you don’t want to know what it stands for) is a 403 amino acid protein which is one of the most commonly mutated proteins in human cancer. Our genome also contains a pseudogene for it (called PTENP). Interestingly deletion of PTENP (not PTEN) is found in some cancers. However PTENP deletion is associated with decreased amounts of the PTEN protein itself, something you don’t want as PTEN is a tumor suppressor. How PTEN accomplishes this appears to be fairly well known, but is irrelevant here.

Why should loss of PTENP decrease PTEN itself? The reason is that the mRNA made from PTENP, even though it has a premature termination codon and can’t be made into protein, is just as long, so it also contains the 3′UTR of PTEN. This means PTENP mRNA is sopping up microRNAs which would otherwise decrease the level of PTEN. Think of PTENP mRNA as a sponge.

Subtle, isn’t it? But there’s far more. At least PTENP mRNA closely resembles the PTEN mRNA. However, other mRNAs coding for completely different proteins also have binding sites in their 3′UTR for the microRNA which binds to the 3′UTR of PTEN, resulting in its destruction. So transcription of a completely different gene (the example of ZEB2 is given) can control the abundance of another protein. Essentially its mRNA is acting as a sponge, sopping up the killer microRNA.
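The sponge argument is just arithmetic, and a toy calculation makes it concrete. The sketch below assumes the crudest possible model (my assumption, not anything from the papers): a fixed pool of one microRNA binds tightly, one molecule per transcript, and spreads itself over every transcript carrying its site in proportion to abundance. The function name and all the numbers are arbitrary.

```python
def unrepressed_target(target, decoy, mirna):
    """Return how many target mRNAs escape the microRNA in a toy titration model.

    target: copies of the mRNA we care about (say, the one for PTEN)
    decoy:  copies of any other transcript with the same binding site
            (a pseudogene transcript, or an unrelated mRNA)
    mirna:  copies of the microRNA

    Assumes tight 1:1 binding shared in proportion to abundance.
    """
    total = target + decoy
    if total == 0:
        return 0.0
    bound_fraction = min(1.0, mirna / total)
    return target * (1.0 - bound_fraction)


# Arbitrary copy numbers, for illustration only.
print(unrepressed_target(target=100, decoy=100, mirna=100))  # 50.0
print(unrepressed_target(target=100, decoy=0, mirna=100))    # 0.0
```

With the sponge present, half the target mRNA escapes. Delete the sponge and none does, even though nothing about the target gene or the microRNA has changed.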

It gets worse. Most microRNAs have binding sites on the mRNAs of many different proteins, and PTEN itself has a 3′UTR which binds to 10 different microRNAs.

So here is a completely unexpected mechanism of control of protein levels in the cell. The general term for this is competitive endogenous RNA (ceRNA). Two years ago the number of human microRNAs was thought to be around 1,000 (release 20 of miRBase in June ’13 gives the number at 2,555, and this is unlikely to be complete). Unlike protein coding genes, it’s far from obvious how to find them by looking at the sequence of our genome, so there may be quite a few more.

So most microRNAs bind the 3′UTR of the mRNA for more than one protein (the average number is unclear at this point), and the mRNAs for most proteins have binding sites for microRNAs in their 3′UTR (again the average number is unclear). What a mess. What subtlety. What an opportunity for the regulation of cellular function. Who is going to be smart enough to figure out a drug which will change this in a way that we want? Absence of evidence of a regulatory mechanism is not evidence of its absence. A little humility is in order.
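To see how quickly this many-to-many arrangement tangles, here is a small sketch that takes a table of which microRNA binds which transcripts and lists every pair of transcripts that could compete for at least one microRNA. The table and all the names in it are hypothetical, as is the function name; real tables have thousands of rows.

```python
from collections import defaultdict
from itertools import combinations


def shared_mirna_pairs(targets_by_mirna):
    """Map each pair of transcripts to the set of microRNAs they share.

    `targets_by_mirna` maps a microRNA name to the set of transcripts
    carrying a binding site for it. Any pair that shares a microRNA is a
    candidate ceRNA pair in the sponge sense.
    """
    shared = defaultdict(set)
    for mirna, transcripts in targets_by_mirna.items():
        for a, b in combinations(sorted(transcripts), 2):
            shared[(a, b)].add(mirna)
    return dict(shared)


# Hypothetical binding table, invented for illustration.
table = {
    "miR-X": {"geneA", "geneB"},
    "miR-Y": {"geneA", "geneB", "geneC"},
    "miR-Z": {"geneC"},
}
for pair, mirnas in sorted(shared_mirna_pairs(table).items()):
    print(pair, sorted(mirnas))
```

Three microRNAs and three transcripts already give three candidate pairs. The number of possible pairs grows roughly as the square of the number of transcripts.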


****

If this weren’t scary enough, consider the following cautionary tale [ Nature vol. 505 pp. 212 – 217 ’14 ]. HMGA2 is a protein we thought we understood for the most part. It is found in the nucleus, where it binds to DNA. While it doesn’t itself transcribe DNA into RNA, it helps assemble a protein complex on the DNA which promotes transcription of certain genes.

Well, that’s what the protein does. However the mRNA for the protein uses its 3′ untranslated region (3′ UTR) to sop up microRNAs of the let-7 family. The mRNA for HMGA2 is highly overexpressed in human cancer (notably the very common adenocarcinoma of the lung). You can mutate the mRNA for HMGA2 so it doesn’t produce the protein, just by putting a stop codon in it near the 5′ end. Throw the altered mRNA into a tissue culture of a lung adenocarcinoma cell line, and the cells become more proliferative and grow independently of being anchored to the tissue culture plate (i.e. anchorage independence, a biologic marker for cancer).

So what? It means that it is possible that every mRNA for every protein we make is acting as a ceRNA. The authors conclude the paper with “Such dual-function ceRNA and protein activities necessitate a deeper exploration of the coding genome in biological systems.”

I’ll say. We’re just beginning to scratch the surface. The control mechanisms within the cell continue to amaze (me) by their elegance and subtlety. I doubt highly that we know them all. Yet more reasons that drug discovery is hard — we are mucking about with a system whose workings we only dimly understand.

