Category Archives: Aargh ! Big pharma sheds chemists. Why?

As if the job shortage for organic/medicinal chemists wasn’t bad enough

Will synthetic organic chemists be replaced by a machine? Today’s (7 August ’14) Nature (vol. 512 pp. 20 – 22) describes RoboChemist. As usual the job destruction is the fruit of the species being destroyed. Nothing new here — “The Capitalists will sell us the rope with which we will hang them.” — Lenin. “I would consider it entirely feasible to build a synthesis machine which could make any one of a billion defined small molecules on demand” says one organic chemist.

The design of the machine is already being studied, but with a rather paltry grant (1.2 million dollars). Even worse, for the thinking chemist, the choice of reactants and reactions to build the desired molecule will be made by the machine (given a knowledge base, and the algorithms that experienced chemists use, assuming they can be captured by a set of rules). E. J. Corey tried to do this automatically years ago with a program called LHASA (Logic and Heuristics Applied to Synthetic Analysis), but it never took off. Corey formalized what chemists had been doing all along — see

Another attempt along these lines is Chematica, which recently has had some success. A problem with using the chemical literature is that only the conditions for a successful reaction are published. A synthetic program needs to know what doesn’t work as much as it needs to know what does. This is an important problem in the medical/drug literature where only studies showing a positive effect are published. There’s a great chapter in “How Not to Be Wrong” concerning the “International Journal of Haruspicy” which publishes only statistically significant results for predicting the future by reading sheep entrails. They publish a lot of stuff because some 400 Haruspicists in different labs are busy performing multiple experiments, 5% of which reach statistical significance. Previously drug companies had to publish only successful clinical trials. Now all trials will go into a database regardless of outcome.
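For the numerically inclined, here is a minimal sketch of the haruspicy arithmetic. The 400 labs and the 5% threshold come from the paragraph above; one experiment per lab and a true effect of zero are my assumptions for illustration.

```python
# Toy publication-bias arithmetic: even when entrails predict nothing,
# a 5% significance threshold passes about 5% of experiments by chance.
LABS = 400                 # haruspicists, from the text
ALPHA = 0.05               # significance threshold, from the text
EXPERIMENTS_PER_LAB = 1    # assumption, for illustration only


def expected_false_positives(labs, per_lab, alpha):
    """Expected number of 'significant' results when no real effect exists."""
    return labs * per_lab * alpha


def prob_at_least_one(labs, per_lab, alpha):
    """Chance that at least one experiment reaches significance by luck."""
    return 1 - (1 - alpha) ** (labs * per_lab)


if __name__ == "__main__":
    print(expected_false_positives(LABS, EXPERIMENTS_PER_LAB, ALPHA))
    print(prob_at_least_one(LABS, EXPERIMENTS_PER_LAB, ALPHA))
```

Under these assumptions the journal gets about 20 publishable papers per round, all of them noise.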

Automated machinery for making polynucleotides and polypeptides already exists, but here the reactions are limited. Still, the problem of getting the same reaction to work over and over with different molecules of the same class (amino acids, nucleotides) has been solved.

The last sentence is the most chilling “And with a large workforce of graduate students to draw on, academic labs often have little incentive to automate.” Academics — the last Feudal system left standing.

However, telephone operators faced the same fate years ago, due to automatic switching machinery. Given the explosion of telephone volume 50 years ago, there came a point where every woman in the USA would have worked for the phone company to handle the volume.

A similar moment of terror occurred in my field (clinical neurology) years ago with the invention of computerized axial tomography (CAT scans). All our diagnostic and examination skills (based on detecting slight deviations from normal function) would be out the window, when the CAT scan showed what was structurally wrong with the brain. Diagnosis was possible because abnormalities in structure invariably occurred earlier than abnormalities in function. Didn’t happen. We’d get calls – we found this thing on the CAT scan. What does it mean?

Even this wonderful machine which can make any molecule you wish, will not tell you what cellular entity to attack, what the target does, and how attacking it will produce a therapeutically useful result.

Getting cytoplasm out of a single cell without killing it

It’s easy to see what cells are doing metabolically. Just take a million or so, grind them up and measure what you want. If this sounds crude to you, you’re right. We’ve learned a lot this way, but wouldn’t it be nice to take a single cell and get a sample of its cytoplasm (or its nucleus) without killing it? A technique described in the 29 July PNAS (vol. pp. 10966 – 10971 ’14) does just that. It’s hardly physiologic, as cells are grown on a layer of polycarbonate containing magnetically active carbon nanotubes covered in L-tyrosine polymers. The nanotubes are large enough to capture anything smaller than an organelle (1,000 Angstrom, 100 nanometer diameter, 15,000 Angstroms long). Turn on a magnetic field underneath the polycarbonate, and they puncture the overlying cell and are filled with cytoplasm. Reverse the magnetic field and they come out, carrying the metabolites with them. Amazingly, there was no significant impact on cell viability or proliferation. Hardly physiologic but far better than what we’ve had.

It’s a long way from drug development, but wouldn’t it be nice to place your drug candidate inside a cell and watch what it’s doing?

Here’s a drug target for schizophrenia and other psychiatric diseases

All agree that any drug getting schizophrenics back to normal would be a blockbuster. The more we study its genetics and biochemistry the harder the task becomes. Here’s one target — neuregulin1, one variant of which is strongly associated with schizophrenia (in Iceland).

Now that we know that neuregulin1 is a potential target, why should discovering a drug to treat schizophrenia be so hard? The gene stretches over 1.2 megaBases and the protein contains some 640 amino acids. Cells make some 30 different isoforms by alternative splicing of the gene. Since the gene is so large one would expect to find a lot of single nucleotide polymorphisms (SNPs) in the gene. Here’s some SNP background.

Our genome has 3.2 gigaBases of DNA. With sequencing being what it is, each position has a standard nucleotide (one of A, T, G, or C). If 5% of the population have any one of the other 3 at this position you have a SNP. By 2004 some 7 MILLION SNPs had been found and mapped to the human genome.

Well it’s 10 years later, and a mere 23,094 SNPs have been found in the neuregulin gene, of which 40 have been associated with schizophrenia. Unfortunately most of them aren’t in regions of the gene which code for amino acids (which is to be expected as 640 * 3 = 1920 nucleotides are all you need for coding out of the 1,200,000 nucleotides making up the gene). These SNPs probably alter the amount of the protein expressed but as of now very little is known (even whether they increase or decrease neuregulin1 protein levels).
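The coding arithmetic above can be checked in a few lines. This is only a sketch: the gene length, protein length and SNP count come from the text, while the idea that SNPs are spread uniformly along the gene is my simplifying assumption.

```python
# How little of the neuregulin1 gene actually codes for amino acids.
AMINO_ACIDS = 640            # protein length, from the text
GENE_LENGTH_NT = 1_200_000   # gene length in nucleotides, from the text
SNPS_FOUND = 23_094          # SNPs found in the gene, from the text

coding_nt = AMINO_ACIDS * 3                  # 3 nucleotides per amino acid
coding_fraction = coding_nt / GENE_LENGTH_NT

# Assumption: SNPs fall uniformly along the gene (real SNPs do not).
expected_coding_snps = SNPS_FOUND * coding_fraction

if __name__ == "__main__":
    print(coding_nt)
    print(f"{coding_fraction:.2%}")
    print(round(expected_coding_snps))
```

So under the uniform assumption fewer than 40 of the 23,094 SNPs would be expected to land in coding sequence at all.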

An excellent review of Neuregulin1 and schizophrenia is available [ Neuron vol. 83 pp. 27 – 49 ’14 ]. You’ll need a fairly substantial background in neuroanatomy, neuroembryology, molecular biology and neurophysiology to understand all of it. Included is some fascinating (but probably incomprehensible to the medicinal chemist) material on the different neurophysiologic abnormalities associated with different SNPs in the gene.

Here are a few of the high points (or depressing points for drug discovery) of the review. Neuregulin1 is a member of a 6 gene family, all fairly similar and most expressed in the brain. All of them have multiple splicing isoforms, so drug selectivity between them will be tricky. Also SNPs associated with increased risk of schizophrenia have been found in family members 2, 3 and 6 as well, so neuregulin1 may not be the actual target you want to hit.

It gets worse. The neuregulins bind to a family of receptors (the ERBBs) having 4 members. Tending to confirm the utility of the neuregulins as a drug target is the fact that SNPs in the ERBBs are also associated with schizophrenia. So which isoform of which neuregulin binding to which isoform of which ERBB is the real target? Knowledge isn’t always power.

A large part of the paper is concerned with the function of the neuregulins in embryonic development of the brain, leading to the rather depressing thought that the schizophrenic never had a chance, having an abnormal brain to begin with. A drug to reverse such problems seems only a hope.

The neuregulin/ERBB system is only one of many genes which have been linked to schizophrenia. So it looks like a post of 4 years ago on Schizophrenia is largely correct —

Happy hunting. It’s a horrible disease and well worth the effort. We’re just beginning to find out how complex it really is. Hopefully we’ll luck out, as we did with the phenothiazines, the first useful antipsychotics.

Further (physical) chemical elegance

If the chemical name phosphatidyl serine (PS) draws a blank, read the verbatim copy of a previous post under the *** to find out why it is so important to our existence. It is an ‘eat me’ signal when there is lots of it around, telling professional scavenger cells to engulf the cell showing lots of PS on its surface.

Life, as usual, is more complicated. There are a variety of proteins exposed on cell surfaces which bind to phosphoserine. Not only that, but exposing just a little PS on the surface of a cell can trigger a protective immune response. Immune cells binding to just a little PS on the surface of another cell proliferate rather than eat the cell expressing the PS. This brings us to Proc. Natl. Acad. Sci. vol. 111 pp 5526 – 5531 ’14 that explains how a given PS receptor (called TIM4) acts differently depending how much PS is present.

Some PS receptors such as Annexin V have essentially an all or none response to PS, if they bind at all, they trigger a response in the cell carrying them. Not so for TIM4 which only reacts if there is a lot of PS around, leaving cells which express less PS alone. This allows these cells to function in the protective immune response.

So how does TIM4 do this? See if you can think of a mechanism before reading the rest.

In addition to the PS binding pocket TIM4 has 4 peripheral basic residues in separate places. The basic residues are positively charged at physiologic pH and bind to the negatively charged phosphate group of phosphatidyl serine or to the carboxylate anion of phosphatidyl serine. The paper doesn’t explain how these basic residues don’t bind to the other phospholipids of the cell surface (such as phosphatidyl choline or sphingomyelin). It is conceivable that the basic side chains (arginine, lysine etc.) are so set up that they only bind to carboxylate anions and not phosphate anions (but this is a stretch). That would at least give them specificity for phosphatidyl serine as opposed to the other phospholipids present in both leaflets of the cell membrane. In any event TIM4 will be triggered only if these groups also bind PS, leaving cells which show relatively little PS alone. Clever no?

For the cognoscenti, the Hill coefficient of TIM4 is 2 while that of Annexin V is 8 (describing more than explaining the all or none character of Annexin V binding).
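For those who want to see what a Hill coefficient of 2 versus 8 looks like, here is a small sketch using the textbook Hill equation, occupancy = L^n / (K^n + L^n). The half-saturation constant K = 1 and the ligand levels are arbitrary units chosen by me for illustration; only the two coefficients come from the paper.

```python
# Textbook Hill equation: fraction of maximal response at ligand level L.
def hill(ligand, k_half, n):
    """Fractional response for Hill coefficient n and half-saturation k_half."""
    return ligand ** n / (k_half ** n + ligand ** n)


K_HALF = 1.0   # assumption: arbitrary units, same for both curves

if __name__ == "__main__":
    for ligand in (0.5, 1.0, 2.0):
        n2 = hill(ligand, K_HALF, 2)   # coefficient quoted for TIM4
        n8 = hill(ligand, K_HALF, 8)   # coefficient quoted for Annexin V
        print(f"L={ligand}: n=2 -> {n2:.3f}, n=8 -> {n8:.3f}")
```

The higher the coefficient, the closer the curve comes to a switch: with n = 8 the response goes from almost nothing at half of K to almost everything at twice K, while with n = 2 it climbs gradually.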

Flippase. Eat me signals. Dragging their tails behind them. Have cellular biologists and structural biochemists gone over to the dark side? It’s all quite innocuous as the old nursery rhyme will show

Little Bo Peep has lost her sheep
and doesn’t know where to find them
Leave them alone, and they’ll come home
wagging their tails behind them.

First, some cellular biochemistry. The lipid bilayer encasing all our cells is made of two leaflets, inner and outer. The composition of the two is different (unlike the soap bubble). On the inside we find phosphatidylethanolamine (PE), phosphatidylserine (PS). The outer leaflet contains phosphatidylcholine (PC) and sphingomyelin (SM) and almost no PE or PS. This is clearly a low entropy situation compared to having all 4 randomly dispersed between the 2 leaflets.

What is the possible use of this (notice how teleology invariably creeps into cellular biology)? Chemistry is powerless to explain such things. Much as I love chemistry, such truths must be faced.

It takes energy to maintain this peculiar distribution. The enzyme moving PE and PS back inside the cell is the flippase. It requires energy in the form of ATP to operate. When a cell is dying ATP drops, and entropy takes its course moving PE and PS to the cell surface. Specialized cells (macrophages) exist to scoop up the dying or dead cells, without causing inflammation. They recognize PE and PS by a variety of receptors and munch up cells exposing them on the surface. So PE and PS are eat me signals which appear when there isn’t enough ATP around for flippase to use to haul PE and PS back inside. Clever no?

Now for some juicy chemistry (assuming that you consider transport of a molecule across a lipid bilayer actual chemistry — no covalent bonds to the transferred molecule are formed or removed, although they are to the transporter). Well it certainly is physical chemistry isn’t it?

Here are the structures of PE, PS, PC, SM

There are a few things to notice. Like just about every lipid found in our membranes, they are amphipathic — they have a very lipid soluble part (look at the long hydrocarbon chains hanging below them) and a very water soluble part — the head groups containing the phosphate.

This brings us to [ Proc. Natl. Acad. Sci. vol. 111 pp. E1334 – E1343 ’14 ] which describes ATP8A2 (aka the flippase). Interestingly, the protein, with at least 10 alpha helices spanning the membrane and 3 cytoplasmic domains, closely resembles the classic sodium pump beloved of neurophysiologists everywhere, which pumps sodium ions out of neurons and pumps potassium ions inside, producing the equally beloved membrane potential of neurons.

Look at those structures again. While there are charges on PE, PS (on the phosphate group), these molecules are far larger than the sodium or the potassium ion (easily by a factor of 10). This has long been recognized and is called the ‘giant substrate problem’.

The paper solved the structure of ATP8A2 and used molecular dynamics simulations to try to understand how it works. What they found is that transmembrane alpha helices 1, 2, 4 and 6 (out of 10) form a water filled cavity, which dissolves the negatively charged phosphate of the head group. What happens to those long hydrocarbon tails? They are left outside the helices in the lipid core of the membrane. It is the charged head groups that are dragged through by the flippase, with the tails wagging along behind them, just like little Bo Peep.

There’s a lot more great chemistry in the paper, particularly how Isoleucine #364 directs the sequential formation and annihilation of the water filled cavities between alpha helices 1, 2, 4 and 6, and how a particular aspartic acid is phosphorylated (by ATP, explaining why the enzyme no longer works in energetically dying cells) changing conformation of all 10 transmembrane helices, so that only one half of the channel is open at a time (either to the inside or the outside).

Go read and enjoy. It’s sad that people who don’t know organic chemistry are cut off from appreciating such elegance. There is more to esthetics than esthetics.

Why drug discovery is so hard: Reason #25 — What if your drug target is really a pointer to the real target?

Any drug safely producing weight loss would be a big (or small) pharma blockbuster. Those finding it should get on the boat to Sweden. Finding a target to attack is the problem. Here’s one way to look. Take lots of fat people, lots of thin people and see what in their genomes differentiates them (assuming anything does). Actually what was done was to look at type II diabetics (non-insulin dependent) the vast majority overweight and controls. The first study involved the genomes of nearly 5,000 diabetics and controls. How did they interrogate the genomes? At the time of the work it was impossible to completely sequence this many genomes.

It’s time to speak of SNPs (single nucleotide polymorphisms). Our genome has 3.2 gigaBases of DNA. With sequencing being what it is, each position has a standard nucleotide (one of A, T, G, or C). If 5% of the population have one of the other 3 at this position you have a SNP. Already 10 years ago, some 7 MILLION SNPs had been found and mapped to the human genome.

The first study found some SNPs associated with obesity in the diabetics. This tells where to look for the gene. A second study with nearly 9,000 diabetics and controls, replicated the first.

Then the monster study, with 39,000 people [ Science vol. 316 pp. 889 – 894 ’07 ] found FTO (FaT mass and Obesity associated gene) on chromosome #16. The 16% of Caucasian adults with two copies of the variant SNP in FTO were 1.67 times more likely to be obese. An intense flurry of work showed that the gene coded for an oxidase, using iron and 2 oxo-glutaric acid (alphaKG for you old timers). The enzyme removes methyl groups from the amino group at position #6 of adenine and the 3 position of thymine. Before this time, no one really paid much attention to them. Subsequently we’ve found 6 methyl adenine in a mere 7,676 mRNAs. Just what it does when it’s there, and why the cell wants to remove it is currently being worked out.

Clearly FTO is a great target for an obesity drug. Of course they knocked the gene out in the mouse. The animals were normal at birth, but at 6 weeks weighed 30 – 40% less than normal mice. FTO as a drug target looked even better after this.

It was somewhat surprising that the SNP was in an intron in the gene. This meant that even in the obese the protein product of the FTO gene was the same as in the skinny. Presumably this could mean more FTO, less FTO or a different splice variant. If some of this molecular biology is above your pay grade, the background you need is in 5 posts starting with

It was somewhat surprising that FTO levels were the same in people with and without the fat SNP. That left splice variants as a possibility.

The denouement came this week [ Nature vol. 507 pp. 309 – 310, 371 – 375 ’14 ]. The intron containing the SNP in FTO produces obesity by controlling another gene called IRX3 which is a mere 500,000 nucleotides away. The intron of FTO binds to the promoter of IRX3 turning the gene on resulting in more IRX3. Mice lacking a functional copy of IRX3 have a 25 – 30% lower body mass. As any C programmer would say, FTO is the pointer not the data.

I don’t know if big or small pharma was at work finding inhibitors or enhancers of FTO function, but this paper should have brought them to a screeching halt. The FTO/IRX3 story just shows how many pitfalls there are to finding new drugs, and why the search has shown relatively little success recently. We are trying to alter the function of an incredibly complex system, whose workings we only dimly understand.

Why drug discovery is so hard: Reason #24 — Is the 3′ untranslated region of every mRNA a ceRNA?

We all know what proteins do. They act as enzymes, structural elements of cells, membrane proteins where drugs bind etc. etc. The background the pure chemist needs for what follows can all be found in the category “Molecular Biology Survival Guide”.

We also know that the messenger RNA for any given protein contains a lot more information than that needed to code for the amino acids making up the protein. Forget the introns that are spliced out from the initial transcript. When the mature messenger RNA for a given protein leaves the nucleus for the cytoplasm, where the ribosome translates it into protein, it contains nucleotides at either end which the ribosome effectively ignores. These are called the untranslated regions (UTRs). The UTRs at the 3′ end of human mRNAs range in length between 60 and 4,000 nucleotides (average 800). It costs energy to store the information for the UTR in DNA, more energy to synthesize the nucleotides which make it up, even more to patch them together to form the UTR, more to package it and move it out of the nucleus etc. etc.

Why bother? Because the 3′ UTR of the mRNA contains a lot of information which tells the cell how much protein to make, how long the mRNA should hang around in the cell (among many other things). A Greek philosopher got here first — “Nature does nothing uselessly” – Aristotle

Those familiar with competitive endogenous RNA (ceRNA) can skip what follows up to the ****

Recall that microRNAs are short (20 something) polynucleotides which bind to the 3′ untranslated region (3′ UTR) of mRNA, and either (1) inhibit its translation into protein (2) cause its degradation. In each case, less of the corresponding protein is made. The microRNA and the appropriate sequence in the 3′ UTR of the mRNA form an RNA-RNA double helix (G on one strand binding to C on the other, etc.). Visualizing such helices is duck soup for a chemist.

Molecular biology is full of such semantic cherry bombs as nonCoding DNA (which meant DNA which didn’t code for protein), a subset of Junk DNA. Another is the pseudogene — these are genes that look like they should code for protein, except that they don’t because of lack of an initiation codon or a premature termination codon. Except for these differences, they have the nucleotide sequence to code for a known protein. It is estimated that the human genome contains as many pseudogenes (20,000) as it contains true protein coding genes [ Genome Res. vol. 12 pp. 272 – 280 ’02 ]. We now know that well over half the genome is transcribed into mRNA, including the pseudogenes.

PTEN (you don’t want to know what it stands for) is a 403 amino acid protein which is one of the most commonly mutated proteins in human cancer. Our genome also contains a pseudogene for it (called PTENP). Interestingly deletion of PTENP (not PTEN) is found in some cancers. However PTENP deletion is associated with decreased amounts of the PTEN protein itself, something you don’t want as PTEN is a tumor suppressor. How PTEN accomplishes this appears to be fairly well known, but is irrelevant here.

Why should loss of PTENP decrease PTEN itself? The reason is because the mRNA made from PTENP, even though it has a premature termination codon, and can’t be made into protein, is just as long, so it also contains the 3′UTR of PTEN. This means PTENP is sopping up microRNAs which would otherwise decrease the level of PTEN. Think of PTENP mRNA as a sponge.

Subtle isn’t it? But there’s far more. At least PTENP mRNA closely resembles the PTEN mRNA. However other mRNAs coding for completely different proteins also have binding sites in their 3′UTR for the microRNA which binds to the 3′UTR of PTEN, resulting in its destruction. So transcription of a completely different gene (the example of ZEB2 is given) can control the abundance of another protein. Essentially its mRNA is acting as a sponge, sopping up the killer microRNA.

It gets worse. Most microRNAs have binding sites on the mRNAs of many different proteins, and PTEN itself has a 3′UTR which binds to 10 different microRNAs.

So here is a completely unexpected mechanism of control of protein levels in the cell. The general term for this is competitive endogenous RNA (ceRNA). Two years ago the number of human microRNAs was thought to be around 1,000 (release 2.0 of miRBase in June ’13 gives the number at 2,555 — this is unlikely to be complete). Unlike protein coding genes, it’s far from obvious how to find them by looking at the sequence of our genome, so there may be quite a few more.

So most microRNAs bind the 3′UTR of more than one protein (the average number is unclear at this point), and most proteins have binding sites for microRNAs in their 3′UTR (again the average number is unclear). What a mess. What subtlety. What an opportunity for the regulation of cellular function. Who is going to be smart enough to figure out a drug which will change this in a way that we want. Absence of evidence of a regulatory mechanism is not evidence of its absence. A little humility is in order.


If this wasn’t scary enough, consider the following cautionary tale — Nature vol. 505 pp. 212 – 217 ’14. HMGA2 is a protein we thought we understood for the most part. It is found in the nucleus, where it binds to DNA. While it doesn’t transcribe DNA into RNA, it does bind to DNA, helping to form a protein complex which promotes transcription of certain genes.

Well that’s what the protein does. However the mRNA for the protein uses its 3′ untranslated region (3′UTR) to sop up microRNAs of the let-7 family. The mRNA for HMGA2 is highly overexpressed in human cancer (notably the very common adenocarcinoma of the lung). You can mutate the mRNA for HMGA2 so it doesn’t produce the protein, just by putting a stop codon in it near the 5′ end. Throw the altered mRNA into a tissue culture of a lung adenocarcinoma cell line, and the cells become more proliferative and grow independently of being anchored to the tissue culture plate (i.e. anchorage independence, a biologic marker for cancer).

So what? It means that it is possible that every mRNA for every protein we make is acting as a ceRNA. The authors conclude the paper with “Such dual-function ceRNA and protein activities necessitate a deeper exploration of the coding genome in biological systems.”

I’ll say. We’re just beginning to scratch the surface. The control mechanisms within the cell continue to amaze (me) by their elegance and subtlety. I doubt highly that we know them all. Yet more reasons that drug discovery is hard — we are mucking about with a system whose workings we only dimly understand.

Ligand binding is an inherent property of proteins — another reason for drug side effects. Reason #23 — Why drug discovery is so hard

Proteins bind ligands with exquisite specificity. Is this due to natural selection, or is the binding of small molecules an inherent property of proteins? If you consider an alpha helix as a rod 11 Angstroms wide with 3.5 Angstroms of height for every turn, you’ll see that it’s impossible to pack such items into a spherical structure without creating 3 dimensional spaces of some sort. Even when you line seven of them up parallel to each other there is space between them. In fact such a structure is one of the favorite targets of the medicinal chemist (the 7 transmembrane helix G protein coupled receptor), with a space in the center of the bundle for ligand binding.

A paper in the current (4 June ’13) issue of PNAS (vol. 110 pp. 9344 – 9349) looks at the question in an unusual way. Certainly spaces exist in naturally occurring proteins (e.g. proteins which have been shaped by natural selection). They found that the spaces in them (which they call pockets) fall into about 400 groups.

Then they looked at a library of proteins designed with no other goal in mind, than the formation of a structure which was 1. stable and 2. compact. They found the same 400 pockets. So the spaces are what the late Stephen Jay Gould called a spandrel, something which exists as an accidental byproduct due to the existence of something else.

In the discussion of the paper the authors state “we conclude that ligand-binding promiscuity is likely an inherent feature resulting from the geometric and physical–chemical properties of proteins.”

What does this mean for the medicinal chemist? No matter how selective the drug (ligand) is for the protein it’s designed to hit, the 20,000 or so proteins making us up are likely to have other places for it to bind. This makes the design of drugs without side effects nearly impossible.

Why Drug Discovery Is So Hard – Reason #22b — Drugs aren’t always doing the things we think they are

One of the things the AIDS virus does to make ‘curing’ AIDS so difficult is hiding. It integrates a DNA copy of its RNA genome into the genome of immune cells (and God knows what else) where it just sits quietly. Activation of the immune cell to fight infection often leads to emergence and production of more virus. One promising mode of therapy is preventing the DNA copy from entering our genome in the first place. The AIDS virus (aka HIV1) produces a protein called Integrase which does that. This has led to the development of integrase inhibitors.

[ Proc. Natl. Acad. Sci. vol. 110 pp. 8327 – 8328, 8690 – 8695 ’13 ] The HIV1 integrase is targeted to sites in chromatin by the host protein LEDGF (Lens Epithelium Derived Growth Factor, aka p75). This work shows that the integrase inhibitors blocking the interaction of LEDGF/p75 (a transcriptional coactivator) with the integrase cause something else — they cause AIDS viruses under construction within the cell to assemble into a noninfectious structure. This happens long after integration and expression of viral RNA and protein. It is thought that the integrase inhibitors inappropriately stabilize integrase dimers in the viral assembly process.

Who knew? They weren’t designed to do that.

For two more examples along these lines please see

Why even great drugs have serious side effects in some patients

Finding good drugs is hard enough, but even great ones are often laid low by unexpected side effects.  This has to do with the tremendous genetic variation in people, about which, more later.  But first a true story from the past.

Neurologists treat epilepsy.  There was a period of 17 years when I was in practice when not a single new  drug against epilepsy (anticonvulsant) was introduced in the USA.  Each new drug would seem to be the answer for a small group of patients that nothing had helped before.

Felbamate (Felbatol) was one such anticonvulsant.  It helped people that nothing else touched. In the year after introduction some 150,000 people were taking it.   I had several very happy patients using Felbatol in the 90s.   1 year later the bomb dropped.  Ten cases of total bone marrow failure (aplastic anemia) had developed in patients taking the drug, a lethal complication.  Every neurologist (and probably every physician) got an urgent letter from the FDA.

Normally, unless there is an allergic reaction, anticonvulsants are never stopped suddenly.  They are tapered over a week or two.  Why?  Basically all anticonvulsants are sedating.  People adapt to this, and it’s like driving a car with one foot on the brake.  Remove the brake and the car shoots forward.  So neurologists all over the country brought patients into the hospital as the drug was immediately stopped.  We were quite worried that the previously uncontrolled seizures would flare.

I had one such patient.  Her family was quite worried about the possible side effects of suddenly stopping Felbamate.  I managed to control myself (hopefully) as I told them there was no side effect worse than death.  As risky as it is, there are still about 12,000 people taking the drug (after being carefully told about the risks) according to Wikipedia.  That’s how good a drug it is.

Why wasn’t this terrible complication picked up in the phase I, II, III studies of Felbamate?  10 cases in 150,000 people is 1/15,000, and no drug study for epilepsy was that large back then.  The incidence of epilepsy in adults is probably around 1%, meaning that some 1,500,000 people would have to be screened to find those 15,000.  So effectively there was no way to find such a rare complication before the drug was released.
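Here is a rough sketch of why a pre-release trial had almost no chance of seeing this.  The 10 cases, 150,000 patients and 1% figure are from the text; the trial size of 1,000 patients and the treatment of each patient as an independent draw are my assumptions, for illustration only.

```python
# Chance that a trial of a given size sees at least one rare adverse event.
CASES = 10              # aplastic anemia cases, from the text
PATIENTS = 150_000      # people taking Felbamate, from the text
EPILEPSY_ONE_IN = 100   # roughly 1% of adults, from the text

rate = CASES / PATIENTS                       # about 1 in 15,000
patients_for_one_case = PATIENTS // CASES     # trial size expecting one case
people_to_screen = patients_for_one_case * EPILEPSY_ONE_IN


def prob_at_least_one_case(trial_size, event_rate):
    """P(at least one event), assuming patients are independent draws."""
    return 1 - (1 - event_rate) ** trial_size


if __name__ == "__main__":
    print(patients_for_one_case, people_to_screen)
    # Assumption: a hypothetical 1,000-patient trial.
    print(round(prob_at_least_one_case(1_000, rate), 3))
    print(round(prob_at_least_one_case(15_000, rate), 3))
```

Note that even a 15,000 patient trial would have missed the complication more than a third of the time under these assumptions.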

A paper last month in Science (vol. 337 pp. 100 – 104 ’12) showed why this sort of thing is almost certain to happen again and again.

DNA sequencing is getting faster and cheaper all the time, so large numbers of people can have parts of their genomes sequenced.  A recent post discussed a paper that  sequenced roughly three quarters of the genes coding for proteins in some 2,439 people — e.g. 15,585 protein coding genes.

The Science paper was more circumspect.  They sequenced ‘only’ 202 genes coding for proteins in 14,002 people.  These genes were chosen quite carefully out of the 20,000 or so protein coding genes we have.  The 202 genes were known drug targets — say the neurotransmitter uptake proteins targeted by SSRIs and tricyclic antidepressants, the dopamine receptors targeted by antipsychotics.  So were the 14,002 people chosen to have their genes sequenced.  There were two ‘normal’ populations samples with 1,322 and 2,059 people each, and 12 populations chosen from people with particular diseases.  Most of these were European (12,514/14,002).

The findings essentially explain why we’ll always have rare side effects.  The total amount of DNA sequenced in each individual was 864,000 positions.  They found ‘rare’ variants (i.e. found in fewer than 1/200 people) quite commonly.  In fact in the group as a whole such rare variants occurred once every 21 positions in the Europeans.  90% of the rare variants had never been seen before, even in these 202 proteins of great biologic and medical interest.  The variants are the single nucleotide variants (SNVs).  Here’s a recap of just what a SNV is (for more detail see the link given above).

**** Recall that each nucleotide is one of four possibilities (A, T, G, C), and that each 3 nucleotides therefore has 4^3 = 64 possibilities.  61 of the 64 combinations code for amino acids; since we have only 20 amino acids, this gives the famed genetic code a certain redundancy.  The other 3 combinations code for no amino acid (usually) and tell the machinery making proteins to stop.  Although crucial to our existence, these are called nonsense codons.

The genetic code is therefore 3-fold degenerate (on average).  However, some amino acids are coded for by just 1 combination of 3 nucleotides while others are coded for by as many as 6.  So some single nucleotide variants (SNVs) leave the amino acid coded for the same (these are the synonymous SNVs), while others change the amino acid (nonsynonymous SNVs), and possibly protein function.  *****
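The counting in the recap can be checked with a short Python sketch.  It uses the standard nuclear genetic code written as DNA codons; the function name classify_snv and the two example variants are made up here for illustration and are not taken from the Science paper.

```python
from collections import Counter
from itertools import product

# Standard genetic code, DNA alphabet, codons ordered TTT, TTC, TTA, TTG, TCT, ...
# One-letter amino acid codes; '*' marks a stop (nonsense) codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = [c for c, aa in CODON_TABLE.items() if aa == "*"]
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
coding_codons = sum(codons_per_amino_acid.values())
average_degeneracy = coding_codons / len(codons_per_amino_acid)

print(len(CODON_TABLE), "codons,", coding_codons, "coding,", len(stop_codons), "stop")
print("amino acids:", len(codons_per_amino_acid))
print("average codons per amino acid:", round(average_degeneracy, 2))
print("fewest and most codons for one amino acid:",
      min(codons_per_amino_acid.values()), max(codons_per_amino_acid.values()))

def classify_snv(codon, position, new_base):
    """Label a single-base change in a codon (hypothetical helper, position is 0-based)."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    return "synonymous" if before == after else "nonsynonymous"

# Illustrative variants only.
print(classify_snv("TAT", 2, "C"))  # TAT -> TAC, tyrosine either way
print(classify_snv("TAT", 0, "C"))  # TAT -> CAT, tyrosine becomes histidine
```

The sketch lumps a change that creates a stop codon in with the nonsynonymous variants; a fuller version would report those separately.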

Certainly, not all of these variants will cause trouble, and our genomes are incredibly fault tolerant, as most of us carry very impaired genes for at least 35 of the proteins (e.g. they are truncated, so not a full protein is made).  Some almost certainly will cause unexpected reactions or side effects from a given drug.  There are so many SNVs out there.

Have Tibetans illuminated a path to the dark matter (of the genome)?

I speak not of the Dalai Lama’s path to enlightenment (despite the title).  Tall people tend to have tall kids.  Eye color and hair color are also hereditary to some extent.  Pitched battles have been fought over just how much of intelligence (assuming one can measure it) is heritable.  Now that genome sequencing is approaching a price of $1,000/genome, people have started to look at variants in the genome to help them find the genetic contribution to various diseases, in the hopes of understanding and treating them better.

Frankly, it’s been pretty much of a bust.  Height is something which is 80% heritable, yet the 20 leading candidate variants picked up by genome wide association studies (GWAS) account for 3% of the variance [ Nature vol. 461 pp. 458 – 459 ’09 ].  This has happened again and again particularly with diseases.  A candidate gene (or region of the genome), say for schizophrenia, or autism,  is described in one study, only to be shot down by the next.   This is likely due to the fact that many different genetic defects can be associated with schizophrenia — there are a lot of ways the brain cannot work well.  For details — see or see

Typically, even when an association of a disease with a genetic variant is found, the variant only increases the risk of the disorder by 2% or less.  The bad thing is that when you lump all of the variants you’ve discovered together (for something like height) and add up the risk, you never account for over 50% of the heredity.  It isn’t for want of looking, as by 2010 some 600 human GWAS studies had been published [ Neuron vol. 68 p. 182 ’10 ].  Yet lots of the studies have shown various diseases to have a degree of heritability (particularly schizophrenia).  The fact that we’ve been unable to find the DNA variants causing the heritability was totally unexpected.  Like the dark matter in galaxies, which we know is there by the way the stars spin around the galactic center, this missing heritability has been called the dark matter of the genome.

Which brings us to Proc. Natl. Acad. Sci. vol. 109 pp. 7391 – 7396 ’12.  It concerns an awful disease causing blindness in kids called Leber’s hereditary optic neuropathy.  The ’cause’ has been found. It is a change of 1 base from thymine to cytosine in the gene for a protein (NADH dehydrogenase subunit 1) causing a change at amino acid #30 from tyrosine to histidine.  The mutation is found in mitochondrial DNA not nuclear DNA, making it easier to find (it occurs at position 3394 of the 16,569 nucleotide mitochondrial DNA).

Mitochondria in animal cells, and chloroplasts in plant cells, are remnants of bacteria which moved inside cells as we know them today (rest in peace Lynn Margulis).

Some 25% of Tibetans have the 3394 T–>C mutation, but they see just fine.  It appears to be an adaptation to altitude, because the same mutation is found in non-Tibetans on the Indian subcontinent living above 1500 meters (about as high as Denver).  However, if you have the same genetic change and live below this altitude, you get Leber’s.

This is a spectacular demonstration of the influence of environment on heredity.  Granted that the altitude you live at is a fairly impressive environmental change, but it’s at least possible that more subtle changes (temperature, humidity, air conditions etc. etc.) might also influence disease susceptibility to the same genetic variant.  This certainly is one possible explanation for the failure of GWAS to turn up much.  The authors make no mention of this in their paper, so these ideas may actually be (drumroll please) original.

If such environmental influences on the phenotypic expression of genetic changes are common, it might be yet another explanation for why drug discovery is so hard.  Consider CETP (Cholesterol Ester Transfer Protein) and the very expensive failure of drugs inhibiting it.  Torcetrapib was associated with increased deaths in a trial of 15,000 people for 18 – 20 months.  Perhaps those dying somehow lived in a different environment.  Perhaps others were actually helped by the drug.

