The next big drug target

So many of the molecular machines used in the cell are composed of many different proteins held together by noncovalent interactions. The Mediator complex contains 25 – 30 proteins with a mass of 1.6 megaDaltons, RNA polymerase II contains 12 subunits, the general transcription factors contain 25 proteins, and our ribosome, with a mass of 4.3 megaDaltons, contains 47 proteins in the large subunit and 33 in the small. The list goes on and on — proteasome, nucleosome, post-synaptic density.

The typical protein/protein interface has an area of 1,000 – 2,000 square Angstroms, equivalent to a circle with a diameter of about 36 to 50 Angstroms [ Proc. Natl. Acad. Sci. vol. 101 pp. 16437 – 16441 ’04 ]. Think of the largest classical organic molecule you’ve ever made (not a polymer like a protein, polynucleotide, or polysaccharide). It isn’t anywhere close to this.
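A quick check of the circle arithmetic, as a minimal Python sketch (the helper name `equivalent_diameter` is just illustrative): treating the interface as a flat disc of area A, its diameter is d = sqrt(4A/π).

```python
import math

def equivalent_diameter(area_A2: float) -> float:
    """Diameter (Angstroms) of a circle whose area (square Angstroms) is area_A2."""
    # A = pi * d**2 / 4  =>  d = sqrt(4 * A / pi)
    return math.sqrt(4.0 * area_A2 / math.pi)

for area in (1000, 2000):
    print(f"{area} square Angstroms -> {equivalent_diameter(area):.1f} Angstroms across")
```

This gives roughly 35.7 and 50.5 Angstroms. Real interfaces are neither flat nor circular, so the numbers only convey scale.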

Yet I’m convinced that drugs targeting these complexes will be useful. Classical organic chemistry will be useless in designing them. We’ll have to forget our beloved SN1, SN2, nonclassical carbonium ions etc. etc. We need some new sort of physical organic chemistry, one concerned not with reaction mechanism, but with van der Waals and electrostatic interactions. At least stereochemistry will still be important.

The problem is much harder than designing enzyme inhibitors, or their allosteric modifiers, because the target is so large.

What follows are some notes on the protein protein interface I’ve taken over the years to get you started thinking. Good luck. Don’t expect any neat answers. There is a lot of contention concerning the nature of the binding occurring at the interface.

Many of the references aren’t particularly new.  In my reading, I don’t try for the latest reference, but the newest idea that I’m unfamiliar with.  I think they pretty much cover the territory as it stands now.

[ Proc. Natl. Acad. Sci. vol. 108 pp. 603 – 608 ’11 ] A very interesting article argues that worms and humans have about the same number of proteins (20,000) because if they had more, nonspecific protein protein interactions would cause disease. The achievable energy gap favoring specific over nonspecific binding decreases with protein number in a power law fashion (in their model). The optimization of binding interfaces favors networks in which a few proteins have many partners and most proteins have just a few — this is consistent with a scale free network topology.

[ Proc. Natl. Acad. Sci. vol. 101 pp. 16437 – 16441 ’04 ] The hot spot theory of protein protein interactions says that the binding energy between two proteins is governed in large part by just a few critical residues at the binding interface. In a typical interface of 1000 – 2000 square Angstroms, only 5% of the residues from each protein contribute more than 2 kiloCalories/mole to the binding interaction. (This is controversial — see later)

[ Proc. Natl. Acad. Sci. vol. 99 pp. 14116 – 14121 ’02 ] Specific replacement of amino acids in the interface by alanine (alanine scanning or alanine mutagenesis), followed by measuring the effect on the interaction, has led to the idea that only a small set of ‘hot spot’ residues at the interface contribute to the binding free energy. A hot spot has been defined as a residue that, when mutated to alanine, leads to a significant drop in the binding constant (typically 10 fold or more; at room temperature a 10 fold drop corresponds to about 1.4 kiloCalories/mole, and the 2 kiloCalories/mole threshold quoted above to roughly a 30 fold drop). This was well worked out for human growth hormone (HGH) and its receptor. Subsequently ‘many’ other studies have suggested that the presence of a few hot spots may be a general characteristic of most protein/protein interfaces.
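To put numbers on the hot spot threshold, here is a small sketch converting a fold drop in binding constant into lost binding free energy with ΔΔG = RT ln(fold), and back. It assumes 25 C and uses the gas constant in kcal/(mol·K); the helper names are illustrative, not from the papers.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)

def ddg_from_fold(fold_drop: float, temp_K: float = 298.15) -> float:
    """Binding free energy lost (kcal/mol) when the binding constant falls fold_drop-fold."""
    return R_KCAL * temp_K * math.log(fold_drop)

def fold_from_ddg(ddg_kcal: float, temp_K: float = 298.15) -> float:
    """Inverse: fold drop in binding constant caused by losing ddg_kcal of binding energy."""
    return math.exp(ddg_kcal / (R_KCAL * temp_K))

print(f"10-fold drop    -> {ddg_from_fold(10):.2f} kcal/mol")
print(f"2 kcal/mol lost -> {fold_from_ddg(2.0):.0f}-fold drop")
```

So a 10 fold drop is about 1.4 kcal/mol, and a 2 kcal/mol cutoff corresponds to roughly a 30 fold drop.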

However there is extreme variation in the size, shape, amino acid character and solvent content of protein/protein interfaces. It is not obvious from looking at structural contacts which residues are important for binding. Usually they are found at the center of the interface, but sometimes the key residues lie on the periphery. Peripheral residues serve as an O-ring to exclude solvent from the center. A lowered effective dielectric constant in a ‘drier’ environment strengthens electrostatic and hydrogen bonding interactions. An interaction in the periphery that is deleted by alanine mutagenesis can be replaced by one with a water molecule, and hence causes less loss in stability (this calls the whole concept of alanine scanning into question).

Interestingly, there is no general correlation between ‘surface accessibility’ and the contribution of a residue to the binding energy.

Polar residues (Arg, Gln, His, Asp, and Asn) are conserved in interfaces. This implies that they are hot spots — implies? Don’t they know? Haven’t they tested? However, many interaction hot spots involve hydrophobic or large aromatic residues (also hydrophobic). It is unclear whether buried polar interactions are energetically net stabilizing or merely facilitating specificity (how would you tell?).

Some residues without significant contacts in the interface apparently contribute substantially to the free energy of binding when assayed by alanine scanning mutagenesis, because of destabilization of the unbound protein.

This is a report of a free energy function (using packing interactions, hydrogen bonds and an implicit solvation model) which predicts 79% of all interface hot spots. They think that a description of polar interactions with Coulomb electrostatics with a linear distance dependent dielectric constant. ??? The latter ignores the orientation dependence of the hydrogen bond. Also the assumption that acidic or basic residues largely buried in the interface are charged may be wrong. The enthalpic gains of ionization are offset by the cost of desolvating polar groups, and the loss in side chain conformational entropy.

[ Proc. Natl. Acad. Sci. vol. 101 pp. 16437 – 16441 ’04 ] It is of interest to find out if hot spot theory applies to transient protein protein interactions (such as those involved in enzyme catalysis). This work looked for them in the process of protein substrate recognition by the Cdc25 phosphatases (which dephosphorylate the cyclin dependent kinases). Crystal structures of the catalytic domains of Cdc25A and Cdc25B have shown a shallow active site with no obvious features for mediating substrate recognition. This suggests a broad protein interface rather than a lock and key interaction, a view supported by the activity of the Cdc25 phosphatases toward Cdk/cyclin protein substrates, which is 6 orders of magnitude greater than toward peptide substrates containing the same primary sequence. The shallow active sites also correlate with the lack of potent specific inhibitors of the Cdc25 phosphatases, despite an extensive search. This work finds hot spot residues in the catalytic domain (not the catalytic site) of Cdc25B located 20 – 30 Angstroms away from the active site. They are involved in recognition of substrate. The residues are conserved across eukaryotes.

[ Proc. Natl. Acad. Sci. vol. 101 pp. 11287 – 11292 ’04 ] One can study the effects of mutating a single amino acid on two separate rates (the on rate and the off rate), whose ratio is the equilibrium constant. Mutations changing the on rate concern the specificity of protein protein interaction. Mutations only changing the off rate do not affect the transition state of protein binding (don’t see why not). Mutations in bovine pancreatic trypsin inhibitor (BPTI) at positions #15 and #17 differentially affect on and off rates. K15A decreases the on rate 200 fold and increases the off rate 1000 fold. R17A doesn’t change the on rate but also increases the off rate 1000 fold.
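Because the dissociation constant is Kd = koff/kon, the two BPTI mutants can be compared on a single scale. A minimal sketch, assuming 25 C and reading the reported changes as fold changes relative to wild type (the helper name `kd_fold_change` is hypothetical):

```python
import math

RT_KCAL = 1.987204e-3 * 298.15  # RT in kcal/mol at 25 C

def kd_fold_change(kon_fold: float, koff_fold: float) -> float:
    """Fold change in Kd = koff/kon, given mutant/wild-type fold changes in each rate."""
    return koff_fold / kon_fold

mutants = {
    "K15A": (1 / 200, 1000),  # on rate 200x slower, off rate 1000x faster
    "R17A": (1.0, 1000),      # on rate unchanged, off rate 1000x faster
}
for name, (kon_f, koff_f) in mutants.items():
    fold = kd_fold_change(kon_f, koff_f)
    print(f"{name}: Kd up {fold:,.0f}-fold, ddG about {RT_KCAL * math.log(fold):.1f} kcal/mol")
```

On these numbers K15A weakens binding about 200,000 fold (roughly 7 kcal/mol) and R17A about 1,000 fold (roughly 4 kcal/mol), even though R17A leaves the on rate alone.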

The concept of anchor residue arose in the study of peptide binding to class I MHC molecules (Major Histocompatibility Complex). In this system the carboxy terminal side chain of the peptide gets buried in pocket F of the MHC binding groove. Sometimes one also finds a second anchor residue, and even a third, buried at other positions.

The authors attempt to apply the anchor residue concept to protein protein interactions. They studied 39 different protein/protein complexes. They found them, and in some way conclude that these anchor residues are already in the ‘bound’ conformation in the free partner. The anchors interact with structurally constrained pockets matching the anchor residues. The presence of nativelike anchor side chains provides a readily attainable geometrical fit that jams the two interacting surfaces, allowing for the recognition and stabilization of a near-native intermediate. Subsequently an induced fit process occurs on the periphery of the binding pocket.

The analysis of ANY (really?) protein/protein complex at the atomic length scale shows that the interface, rather than being smooth and flat, includes side chains deeply protruding into well defined cavities on the other protein. In all complexes studied, the anchor is the side chain whose burial after complex formation yields the largest decrease in solvent accessible surface area (SASA). If this decrease is over 100 square Angstroms, then only one anchoring interaction is present. When the anchor buries less SASA than that, one anchor isn’t enough.

In all cases tested (39) latch side chains are found in conformations conducive to a relatively straightforward clamping of the anchored intermediate into a high affinity complex.

[ Proc. Natl. Acad. Sci. vol. 102 pp. 57 – 62 ’05 ] An analysis of the protein interface between a beta-lactamase and its inhibitor, using multiple mutant analysis and X-ray crystallography, shows that the interface can be divided into clusters (by means of cluster analysis). Replacing an entire module of 5 interface residues with alanine (in one cluster) created a large cavity in the interface with no effect on the detailed structure of the remaining interface. They obtained similar results when they did this with another of the 5 clusters.

Mutating a single amino acid at a time has been done in the past, but the effects of single mutations aren’t additive (i.e. they aren’t linear — no surprise). The sum of the loss in free energy of all of the single mutations within a cluster exceeds by 4 fold the loss in free energy generated when all of the residues of the cluster are mutated simultaneously. The energetic effect of many single mutations is larger than their net contribution because of a penalty paid for leaving the rest of the cluster behind.

“Binding seems to be a result of higher organization of the binding sites, and not just of surface complementarity.”

[ Proc. Natl. Acad. Sci. vol. 103 pp. 311 – 316 ’06 ] Two different ‘interactomes’ both show the same power law distribution of node sizes. However, when the two major S. cerevisiae protein/protein interaction experiments are compared with each other, only 150 of the THOUSANDS of interactions found in each experiment are the same. A similar lack of agreement has been found for independent yeast two-hybrid (Y2H) experiments in Drosophila.

This work says that desolvation of the interface is a major physical factor in protein/protein interactions. This model reproduces the scale free nature of the topology. The number of interactions made by a protein is correlated with the fraction of hydrophobic residues on its surface.

[ Proc. Natl. Acad. Sci. vol. 108 pp. 13528 – 13533 ’11 ] The drugs they are looking for disrupt specific protein protein interactions (PPIs). They use computational solvent mapping, which explores the protein surface using a variety of small probe molecules, along with a conformer generator to account for side chain flexibility. They studied unliganded proteins known to participate in PPIs. The surface cavities available at protein protein interfaces which can bind a small molecule inhibitor are rather different from those seen in traditional drug targets. The traditional targets have one or two disproportionately large pockets with an average volume of 260 cubic Angstroms — these account for the binding site for the endogenous ligand in over 90% of proteins. The average volume of pockets at protein protein interfaces is only 54 cubic Angstroms, the same as for all protein surface pockets. The interface contains 6 such small pockets (on average).
The binding sites of proteins generally include smaller regions called hot spots which are major contributors to the binding free energy. The results of experimental fragment screens confirm that the hot spots of proteins are characterized by their ability to bind a variety of small molecules, and that the number of different probe molecules observed to bind to a particular site predicts the importance of the site and predicts overall druggability.
This work shows that the druggable sites in PPIs have concave topology and both hydrophobic and polar functionality. So the hot spots bind organic molecules having some polar groups decorating largely hydrophobic scaffolds. So druggable sites have a ‘general tendency’ to bind organic compounds with a variety of structures. Conformational flexibility at the binding site (by side chains?) allows the hot spots to expand to accommodate a ligand of druglike dimensions. This involves low energy side chain motions within 6 Angstroms of a hot spot.
So druggable sites at a PPI aren’t just sites complementary to particular organic functionality; they have a general tendency to bind a variety of different organic structures.
The most important finding is that the druggable sites are detectable from the structure of the unliganded protein, even when substantial conformational adaptation is needed for optimal ligand binding.

[ Science vol. 347 pp. 673 – 677 ’15 ] This work maps the sequence space of 4 key amino acids in the E. coli histidine kinase PhoQ that drive recognition of its substrate (PhoP). For histidine kinases, mutating just 3 or 4 interfacial amino acids to match those in another kinase is enough to reprogram them. The key residues are Ala284, Val285, Ser288 and Thr289.

All 20^4 = 160,000 variants of PhoQ at these positions were made, of which 1,659 (counting the wild type) were functional, implying significant degeneracy of the interface. Of the functional variants, 16 were single mutants, 100 double, 544 triple and 998 quadruple mutants (1,658 in all, plus the wild type). There was an enrichment of hydrophobic and small polar residues at each position. Most bulky and charged residues appeared at low frequencies. Some substitutions were permissible individually, but not in combination. The combinations ACLV, TISV and SILS, each involving residues found individually in functional mutants at high frequency, were quite impaired in competition against wild-type PhoQ, so the effects of individual substitutions are context dependent (epistatic). Of the 100 functional double mutants, only 23 represent cases where both single mutants are functional. There are double mutants where neither single mutant is functional. 79 of the 1,658 functional variants can’t be reached from the wild-type combination (AVST) without passing through a nonfunctional intermediate. They talk about the Hamming distance between mutants (the number of positions at which two variants differ).
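The library arithmetic, and the idea of reaching a variant only through functional intermediates, can be sketched in Python. The counts follow from choosing which positions differ from wild type (19 alternatives each). The `toy_functional` set below is a made-up example, NOT data from the paper, and the breadth-first search over single substitutions is simply one straightforward way to test reachability, not necessarily the authors' method.

```python
from collections import deque
from math import comb

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
WILD_TYPE = "AVST"                    # PhoQ residues 284, 285, 288, 289

# Library size, and how many variants lie at each Hamming distance from wild type
TOTAL = len(AMINO_ACIDS) ** len(WILD_TYPE)
BY_DISTANCE = {d: comb(4, d) * 19 ** d for d in range(5)}
assert sum(BY_DISTANCE.values()) == TOTAL  # 1 + 76 + 2166 + 27436 + 130321

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def reachable(functional: set, start: str = WILD_TYPE) -> set:
    """Variants reachable from start by single substitutions that never leave the functional set."""
    allowed = functional | {start}
    seen = {start}
    queue = deque([start])
    while queue:
        seq = queue.popleft()
        for i in range(len(seq)):
            for aa in AMINO_ACIDS:
                nxt = seq[:i] + aa + seq[i + 1:]
                if nxt in allowed and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen

# Hypothetical toy set of functional variants (NOT the published data)
toy_functional = {"AVSV", "ALSV", "GLSL"}
hits = reachable(toy_functional)
for variant in sorted(toy_functional):
    print(variant, "distance", hamming(WILD_TYPE, variant), "reachable:", variant in hits)
```

In the toy set, GLSL is functional but unreachable: both single steps toward it from ALSV (GLSV or ALSL) leave the functional set. Against the library totals, the reported functional counts are 16 of 76 possible single mutants, 100 of 2,166 doubles, 544 of 27,436 triples and 998 of 130,321 quadruples.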

Finally, some blue sky stuff, implying that (as usual) Nature got there first.

[ Science vol. 341 pp. 1116 – 1120 ’13 ] Small Open Reading Frames (smORFs) code for peptides of under 100 amino acids. This work has shown that peptides as short as 11 amino acids are translated and provide essential functions during insect development. This work shows two peptides of 28 and 29 amino acids regulating calcium transport in the Drosophila heart. The peptides are found in man.
They think that smORFs can’t be dismissed as irrelevant, and that their function should be looked for.
[ Science vol. 1356 – 1358 ’15 ] The Drosophila polished-rice (Pri) sORF peptides (11 – 32 amino acids) trigger proteasome mediated processing, converting the Shavenbaby transcription repressor into a shorter activator.
They think that sORF/smORF peptides mimic protein binding interfaces and control protein interactions that way.


Every class in grad school seemed to begin with a discussion of units. Eventually, Don Voet got fed up and said he preferred the hand stone fortnight system and was going to stick to it. However, even though we all love quantum mechanics dearly for predicting chemical reactivity and spectra, it tells us almost nothing about the events going on in our cells. It’s a crowded environment with objects large and small bumping into one another frequently and at high speeds. At room temperature, a molecule of nitrogen is moving at 500+ meters a second, or over 1100 miles an hour. The water in our cells is moving even faster. Since molecules at the same temperature have the same average kinetic energy, speed goes as 1/sqrt(mass), so water (molecular weight 18) moves sqrt(28/18), or about 1.25, times faster than nitrogen (molecular weight 28). It’s way too slow for relativity however.
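The speeds come from equating average kinetic energy to thermal energy, (1/2)mv^2 = (3/2)kT per molecule, so the root-mean-square speed is sqrt(3RT/M). A small sketch at 300 K, assuming rounded molar masses of 28 and 18 grams/mole:

```python
import math

R = 8.314462618       # gas constant, J/(mol*K)
MPS_TO_MPH = 2.23694  # 1 m/s is about 2.237 miles per hour

def rms_speed(molar_mass_g: float, temp_K: float = 300.0) -> float:
    """Root-mean-square speed (m/s): (1/2) m v^2 = (3/2) kT per molecule, so v = sqrt(3RT/M)."""
    molar_mass_kg = molar_mass_g / 1000.0
    return math.sqrt(3 * R * temp_K / molar_mass_kg)

v_n2 = rms_speed(28.0)   # nitrogen, N2
v_h2o = rms_speed(18.0)  # water, H2O
print(f"N2:  {v_n2:.0f} m/s ({v_n2 * MPS_TO_MPH:.0f} mph)")
print(f"H2O: {v_h2o:.0f} m/s")
print(f"ratio {v_h2o / v_n2:.3f} vs sqrt(28/18) = {math.sqrt(28 / 18):.3f}")
```

This gives about 517 m/s (about 1,160 mph) for nitrogen and about 645 m/s for water, a ratio of about 1.25.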

So it’s back to classical mechanics to understand cellular events at a physical level, something that will be increasingly important in drug design (but that’s for another post).

The average thermal energy of a molecule at room temperature is kT.

What’s k? It’s the Boltzmann constant. What’s that? It’s the gas constant divided by Avogadro’s number.

I’m assuming that all good chemists know that Avogadro’s number is the number of molecules in a Mole = 6.02 x 10^23

What does the Gas constant have to do with energy?

It’s back to PChem 101 — The ideal gas law is PV = nRT

P = Pressure
V = Volume
n = number of moles
R = Gas constant
T = Temperature

Pressure is Force / Area

Force is Mass * Acceleration
Acceleration is Distance/ (Time * Time)
Area is Distance * Distance
Volume is Distance * Distance * Distance

So PV == [ Force/Area ] * Volume
== { [ Mass * Distance / (Time * Time) ] / ( Distance * Distance ) } * ( Distance * Distance * Distance )
== Mass * (Distance/Time) * ( Distance/Time )
== Mass * Velocity * Velocity == mv^2

So PV has the dimensions of (kinetic) energy
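The same bookkeeping can be done mechanically by tracking each quantity as exponents of (mass, length, time). A minimal sketch; the tuple representation and the helper names `mul` and `div` are just one illustrative choice:

```python
# Each dimension is a tuple of exponents: (Mass, Length, Time)
MASS = (1, 0, 0)
LENGTH = (0, 1, 0)
TIME = (0, 0, 1)

def mul(*dims):
    """Multiply quantities by adding their exponents."""
    return tuple(sum(exps) for exps in zip(*dims))

def div(a, b):
    """Divide quantities by subtracting exponents."""
    return tuple(x - y for x, y in zip(a, b))

acceleration = div(LENGTH, mul(TIME, TIME))  # Distance / (Time * Time)
force = mul(MASS, acceleration)              # Mass * Acceleration
area = mul(LENGTH, LENGTH)
volume = mul(LENGTH, LENGTH, LENGTH)
pressure = div(force, area)                  # Force / Area
velocity = div(LENGTH, TIME)

pv = mul(pressure, volume)
kinetic_energy = mul(MASS, velocity, velocity)  # m v^2
print(pv, kinetic_energy, pv == kinetic_energy)
```

Both come out as (1, 2, -2), i.e. Mass * Length^2 / Time^2.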

The Gas Constant (R) is PV/nT (just PV/T for one mole), so it has the dimensions of energy/temperature

Now for some actual units (vs. dimensions, although things are much clearer when you think in terms of dimensions)

Force is measured in Newtons which is the force which will accelerate a 1 kiloGram object by 1 meter/second^2

Temperature is measured in Kelvin from absolute zero. A kelvin is the same size as a degree Celsius (1.8 degrees Fahrenheit)

Room temperature where most of us live is about 27 Centigrade or very close to 300 Kelvin.

So the Boltzmann constant (k) is basically energy/temperature for a single molecule, which is really what you want to think about when you think about physical processes in the cell.

At room temperature kT works out to 4.1 x 10^-21 Joules.

What’s a Joule? It’s the energy a force of one Newton produces when it moves an object one meter (or you can look at it as the kinetic energy a one kilogram object has after a force of one Newton has accelerated it over a distance of one meter).

So a Joule is one Newton * meter

Well 10^-21 is 10^-12 times 10^-9. So what?

This means that at room temperature the average molecule has a thermal energy of 4.1 picoNewton – nanoMeters.
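The chain from gas constant to picoNewton-nanoMeters fits in a few lines of Python. The values of R and Avogadro's number below are the SI values; T = 300 K is the room temperature used above.

```python
R = 8.314462618       # gas constant, J/(mol*K)
N_A = 6.02214076e23   # Avogadro's number, 1/mol
k = R / N_A           # Boltzmann constant, J/K

T = 300.0             # room temperature, K
kT_joules = k * T
# 1 J = 1 N*m = (1e12 pN) * (1e9 nm) = 1e21 pN*nm
kT_pN_nm = kT_joules * 1e21

print(f"k  = {k:.4e} J/K")
print(f"kT = {kT_joules:.2e} J = {kT_pN_nm:.2f} pN*nm")
```

The conversion factor 1e21 is exactly the 10^-12 times 10^-9 split described above, read in reverse.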

PicoNewtons just happens to be in the range of the force exerted by our molecular motors ( kinesin, dynein, DNA polymerases ) and nanoMeters the range of distances over which they exert forces (act).

Not a coincidence.

Since there are organisms which live at temperatures 20% higher, it would be interesting to know if their motors exert 20% more force. Does anyone out there know?

More interesting even than that are the organisms living at the mid-ocean ridges where, because of the extremely high pressures, the water coming from the vents is a lot hotter. What about their motors?

How ‘simple’ can a protein be and still have a significant biological effect

Words only have meaning in the context of the much larger collection of words we call language. So it is with proteins. Their only ‘meaning’ is the biologic effects they produce in the much larger collection of proteins, lipids, sugars, metabolites, cells and tissues of an organism.

So how ‘simple’ can a protein be and still produce a meaningful effect? As Bill Clinton would say, that depends on what you mean by simple. Well, one way a protein can be simple is by having only a few amino acids. Met-enkephalin, an endogenous opiate, contains only 5 amino acids. Now many wouldn’t consider met-enkephalin a protein, calling it a polypeptide instead. But the boundary between polypeptide and protein is as fluid and ill-defined as the one between a few grains of sand and a pile of it.

Another way to define simple, is by having most of the protein made up by just a few of the 20 amino acids. Collagen is a good example. Nearly half of it is glycine and proline (and a modified proline called hydroxyProline), leaving the other 18 amino acids to make up the rest. Collagen is big despite being simple — a single molecule has a mass of 285 kiloDaltons.

This brings us to [ Proc. Natl. Acad. Sci. vol. 112 pp. E4717 – E4727 ’15 ]. They constructed a protein/polypeptide of 26 amino acids of which 25 are either leucine or isoleucine. The 26th amino acid is methionine, which is found at the very amino terminal end of all newly made proteins (remember that AUG, the initiator codon, codes for methionine).
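One way to see how restricted this sequence is: with methionine fixed at the start and only leucine or isoleucine allowed at the other 25 positions, there are 2^25 possible sequences, versus 20^25 if any of the 20 standard amino acids were allowed. A trivial Python check:

```python
POSITIONS = 25  # residues after the initiator methionine

leu_ile_only = 2 ** POSITIONS   # each position is Leu or Ile
any_residue = 20 ** POSITIONS   # each position is any of the 20 standard amino acids

print(f"Leu/Ile only: {leu_ile_only:,}")
print(f"Any residue:  {any_residue:.3e}")
print(f"Fraction:     {leu_ile_only / any_residue:.1e}")
```

Since 20 = 2 * 10, the leucine/isoleucine sequences are exactly one in 10^25 of all possible 25 residue tails.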

What does it do? It causes tumors. How so? It binds to the transmembrane domain of the beta form of the receptor for Platelet Derived Growth Factor (PDGFRbeta). The receptor, when turned on, causes cells to proliferate.

What is the smallest known oncoprotein? It is the E5 protein of Bovine PapillomaVirus (BPV), which is essentially a free standing transmembrane domain (which also binds to PDGFRbeta). It has only 44 amino acids.

Well, we have 26 letters + a space. I leave it to you to choose 3 of them, use one of them once and the other two a combined 25 times, with as many spaces as you want, and construct a meaningful sequence from them (in any language using the English alphabet).

Just back from an Adult Chamber Music Festival (aka Band Camp for Adults).  More about that in a future post

Kuru continues to inform

Neurologists of my generation were fascinated with Kuru, a disease of the (formerly) obscure Fore tribe of New Guinea. Who would have thought they would tell us a good deal about protein structure and dynamics?

It is a fascinating story, including a Nobelist pedophile (Carleton Gajdusek) and another (future) Nobelist who I probably ate lunch with when we were both medical students in the same Medical Fraternity, but I don’t remember.

Kuru is a horrible neurodegeneration starting with incoordination, followed by dementia and death in a vegetative state in 4 months to 2 years. For the cognoscenti — the pathology is neuronal loss, astrocytosis, microglial proliferation, loss of myelinated fibers and the kuru plaque.

It is estimated that it killed 3,000 members of the 30,000 member tribe. The mode of transmission turned out to be ritual cannibalism (flesh of the dead was eaten by the living before burial). Once that stopped the disease disappeared.

It is a prion disease, i.e. a disease due to a protein (called PrP) we all have, but in an abnormal conformation (called PrPSc). Like Vonnegut’s Ice-9, PrPSc causes normal PrP to assume its conformation, causing it to aggregate and form an insoluble mess. We still don’t know the structure of PrPSc (because it’s an insoluble mess). Even now, “the detailed structure of PrPSc remains unresolved” but ‘it seems to be’ very similar to amyloid [ Nature vol. 512 pp. 32 – 34 ’14 ]. Not only that, but we don’t know what PrP actually does, and mice with no PrP at all are normal [ Nature vol. 365 p. 386 ’93 ]. For much more on prions please see

Prusiner’s idea that prion diseases were due to a protein, with no DNA or RNA involved, met with incredible resistance for several reasons. This was the era of DNA makes RNA makes protein, and Prusiner was asking us to believe that a protein could essentially reproduce without any DNA or RNA. This was also the era in which X-ray crystallography was showing us ‘the’ structure of proteins, and it was hard to accept that there could be more than one.

There are several other prion diseases of humans (all horrible), such as Creutzfeldt-Jakob disease and fatal familial insomnia, and others in animals, such as mad cow disease. All involve the same protein, PrP.

One can take brain homogenates from an infected animal, inoculate them into a normal animal and watch progressive formation of insoluble PrPSc aggregates and neurodegeneration. A huge research effort has gone into purifying these homogenates, so the possibility of any DNA or RNA causing the problem is very low. There still is one holdout — Laura Manuelidis, who would have been a classmate had I gone to Yale Med instead of Penn.

Enter [ Nature vol. 522 pp. 423 – 424, 478 – 481 ’15 ], which continues the study of the genetic makeup of the Fore tribe. In an excellent example of natural selection in action, a new variant of PrP appeared in the tribe. At amino acid #127, valine is substituted for glycine (G127V is how this sort of thing is notated). Don’t be confused if you’re somewhat conversant with the literature: we all have a polymorphism at amino acid #129 of the protein, which can be either methionine or valine. It is thought that people who are heterozygous at 129 (methionine on one copy of the gene and valine on the other) were somewhat protected against prion disease (presumably it affects the binding between identical prion proteins required for the conformational change to PrPSc).

What’s the big deal? Well, this work shows that mice with one copy of V127 are protected against kuru prions. They are also protected against classical Creutzfeldt-Jakob disease prions (though, it seems, not against variant CJD prions). Mice with two copies of V127 are completely protected against all the human prion strains tested. So something about V/V at #127 prevents the conformation change to PrPSc. We don’t know what it is, as the normal structure of the variant hasn’t been determined as yet.

This is quite exciting, and work is certain to go on to find short peptide sequences mimicking the conformation around #127 to see if they’ll also work against prion diseases.

This won’t be a huge advance for the population at large, as prion diseases, as classically known, are quite rare. Creutzfeldt-Jakob disease strikes about 1 person in a million each year.

There are far bigger fish to fry however. There is some evidence that the neurofibrillary tangles (tau protein) of Alzheimer’s disease and the Lewy bodies (alpha-Synuclein) of Parkinsonism spread cell to cell by a ‘prionlike’ mechanism [ Nature vol. 485 pp. 651 – 655 ’12, Neuron vol. 73 pp. 1204 – 1215 ’12 ]. Could this sort of thing be blocked by a small amino acid change in one of them (or, better, by a small druglike peptide)?

Stay tuned.

The uses of disorder

There was a lot of shock and awe about a report showing how seemingly minor changes in an aliphatic group on benzene led to markedly different conformations in its protein target (lysozyme from bacteriophage T4).

Our noses are being rubbed in just how floppy proteins are, in contrast to the first glimpses of protein structure obtained by Xray crystallography. Back then we knew so little about proteins, that seeing all the atoms laid out in alpha helices and beta sheets was incredibly compelling. We talked about the structure of a protein rather than a structure. Even back then, with hemoglobin (one of the first solved proteins) it was obvious that proteins had to have more than one structure. The porphyrin ring in heme that oxygen binds to is buried deep in hemoglobin, and the initial structure had to move in some way to allow oxygen to find its way in (because the initial structure showed no obvious channel for oxygen). So hemoglobin had to breathe.

We now know that many proteins have intrinsically disordered segments. Amazingly, the most recent estimate I could find in my notes (or in Wikipedia) is this — It is estimated that over 30% of eukaryotic proteins have stretches of over 30 amino acids that are intrinsically disordered [ J. Mol. Biol. vol. 337 pp. 635 – 645 ’04 ]. Does anyone out there know of more recent data?

We’re a lot smarter now — here’s a comment on Derek’s post — “I have always thought crystal structures of proteins/enzymes are more a guide than actually useful. You are crystallizing a protein first-proteins don’t pack like that in vivo. Then you are settling on the conformation that freezes out- is this the lowest energy form? Then you are ignoring hte fact that these are highly dynamic structures that are constantly moving, sliding, shaking, adjusting. Then if you put a ligand in there you get the lowest energy form-which is what it would look like after reaction and before ligand dissociation- this is quite different from what it can look like at other stages of the reaction.”

Here is an interesting example of the uses of protein disorder going on right now in just about every neuron in your body. Most neurons have long processes, far too long for diffusion to move a needed protein to their ends. For that purpose we have microtubules (aka neurotubules in neurons) stretching the length of the processes, onto which two types of motors attach (dyneins, which move things toward the minus end of the microtubule, and kinesins, which move things toward the plus end).

The microtubule is built from a heterodimer of two proteins (alpha and beta tubulin). Each contains about 450 amino acids and forms a globule 40 Angstroms (4 nanoMeters) in diameter. The heterodimers pack end to end to form a protofilament. 13 protofilaments line up side by side to form the microtubule, a hollow structure about 250 Angstroms in diameter. In cells microtubules are 1 to 10 microns long, but in nerve processes they can be ‘up to’ 100 microns in length. Even at 1 micron (1,000 nanoMeters), with the 4 nanoMeter globules stacked end to end, that’s 13 * 250 tubulin molecules (13 * 125 = 1,625 heterodimers) in a microtubule.
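The counting can be sketched in Python. It assumes, as the text implies, that each tubulin globule occupies about 4 nanoMeters along a protofilament (so a heterodimer spans about 8 nanoMeters) and that there are 13 protofilaments; the function name `tubulin_count` is illustrative.

```python
def tubulin_count(length_nm: float, protofilaments: int = 13,
                  dimer_length_nm: float = 8.0) -> tuple:
    """Approximate (heterodimers, tubulin monomers) in a microtubule of the given length.

    Assumes each alpha or beta globule spans about 4 nm along a protofilament,
    so one alpha/beta heterodimer spans about 8 nm.
    """
    dimers_per_protofilament = length_nm / dimer_length_nm
    dimers = protofilaments * dimers_per_protofilament
    return dimers, 2 * dimers

for microns in (1, 10, 100):
    dimers, monomers = tubulin_count(microns * 1000)
    print(f"{microns:>3} um: ~{dimers:,.0f} heterodimers ({monomers:,.0f} tubulin monomers)")
```

A 1 micron microtubule comes out at about 1,625 heterodimers, and a 100 micron one in a nerve process at about 162,500.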

Any protein structure this important has a lot of modifications imposed on it to alter structure and function. Examples include phosphorylation and the addition of glutamic acid chains (polyglutamylation). The carboxy terminal tails of alpha and beta tubulin are flexible and stick out from the tubulin rod (which is why they aren’t seen on X-ray crystallography). The carboxy terminal tail is the site of post-translational glutamylation. The enzyme polyglutamylating the carboxy terminal tail of beta tubulin is TTLL7 (you don’t want to know what the acronym stands for). It binds to the alpha/beta tubulin heterodimer by an intrinsically disordered region of its own (becoming structured in the process), then binds to the intrinsically disordered carboxy terminal tails, structuring and modifying them. It’s basically a mating dance. There is a precedent for this — see

So disordered regions of proteins, although structureless, are far from functionless.

Not physical organic chemistry but organic physical chemistry

This post is about physical chemistry with organic characteristics in the sense that capitalism in China is called socialism with Chinese characteristics. A lot of cell biology is also involved.

I remember the first time I heard about Irving Langmuir and the two dimensional gas he created. It even followed a modified perfect gas law (PA = nRT where A is area). He did this by making a monolayer of long chain fatty acids on water, with the carboxyl groups binding to the water, and the hydrocarbon chains sticking up into the air. I thought this was incredibly neat. It was the first example of organic physical chemistry. He published his work in 1917 and won the Nobel in Chemistry for it in 1932.
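Per molecule, the two dimensional gas law PA = nRT becomes Π a = kT, where Π is the surface pressure and a is the area per molecule (divide both sides by Avogadro's number). A minimal sketch, assuming fully ideal behavior, which real fatty acid films approach only when the molecules are far apart; the areas in the loop are hypothetical examples.

```python
K_B = 1.380649e-23  # Boltzmann constant, J/K

def surface_pressure_mN_per_m(area_per_molecule_A2: float, temp_K: float = 300.0) -> float:
    """Ideal 2D gas per molecule: Pi * a = kT, so Pi = kT / a.

    a is the area per molecule in square Angstroms; the result is in mN/m.
    """
    area_m2 = area_per_molecule_A2 * 1e-20  # 1 square Angstrom = 1e-20 m^2
    pi_N_per_m = K_B * temp_K / area_m2
    return pi_N_per_m * 1e3                 # N/m -> mN/m

for a in (100, 500, 1000):  # hypothetical areas per molecule
    print(f"{a:>5} square Angstroms/molecule -> {surface_pressure_mN_per_m(a):.2f} mN/m")
```

At 100 square Angstroms per molecule and 300 K this gives about 4 milliNewtons per meter, which is again just kT (4.1 pN*nm) spread over the area each molecule occupies.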

Fast forward to our understanding of the membrane encasing our cells (the technical term is plasma membrane, to distinguish it from the myriad other membranes inside our cells). To a first approximation it’s just two Langmuir films back to back, with the hydrocarbon chains of the lipids dissolving in each other and the hydrophilic parts of the membrane lipids binding to the water on either side. This is why it’s called a lipid bilayer.

Most of the signals going into our cells must pass through the plasma membrane, using proteins spanning it. As a neurologist I spent a lot of time throwing drugs at them; examples include every known receptor for neurotransmitters, their reuptake proteins (think the dopamine transporter), and ion channels. The list goes on and on and includes the more than 800 G protein coupled receptors (GPCRs), each with 7 transmembrane segments, encoded in our genome [ Proc. Natl. Acad. Sci. vol. 111 pp. 1825 – 1830 ’14 ].

Glypiated proteins (you heard right), also known as PIGtailed proteins (you heard that right too), don’t follow this pattern. They are proteins anchored in the outer leaflet of the plasma membrane lipid bilayer by covalently linked phosphatidyl inositol. The structure shows why this works: inositol is a sugar, hence crawling with hydroxyl groups, while the phosphatidic acid part has two long hydrocarbon chains which can embed in the outer leaflet. As of 2009 we were known to have 150 of them (probably more now). Examples of PIGtailed proteins include alkaline phosphatase, Thy-1 antigen, acetyl cholinesterase, lipoprotein lipase, and decay accelerating factor. Most of them are enzymes working on stuff outside the cell, so they don’t need to signal.

Enter the lipid raft. [ Cell vol. 161 pp. 433 – 434, 581 – 594 ’15 ] It’s been 18 years since rafts were first proposed, and their existence is still controversial (with zillions of papers saying they exist and more zillions saying they don’t). What are they? Definitions vary (particularly about how large they are). Here’s what Molecular Biology of the Cell 4th edition p. 589 had to say about them: Rafts are small (700 Angstroms in diameter). Rafts are rich in sphingolipids, glycolipids and cholesterol. Because their hydrocarbon chains are longer and straighter than those of most membrane lipids, rafts are thicker than other parts of the bilayer. This allows them to better accommodate ‘certain’ membrane proteins, which accumulate there. [ Proc. Natl. Acad. Sci. vol. 100 pp. 8055 – 8057 ’03 ] These include glycosylphosphatidylinositol anchored proteins (glypiated proteins), cholesterol linked and palmitoylated proteins such as Hedgehog, Src family kinases and the alpha subunits of G proteins, cytokine receptors and integrins.

Biochemical analysis shows that rafts consist of cholesterol and sphingolipids in the exoplasmic leaflet (outer layer of the plasma membrane) of the lipid bilayer and cholesterol and phospholipids with saturated fatty acids in the endoplasmic leaflet (layer facing the cytoplasm). The raft is less fluid than surrounding areas of the membrane. So if they in fact exist, rafts contain a lot of important cellular players.

The Cell paper introduced synthetic fluorescent glypiated proteins into the outer plasma membrane leaflet of Chinese hamster ovary cells and was able to demonstrate nanoClustering on scales under 1,000 Angstroms (way too small to see with visible light, accounting for a lot of the controversy concerning their existence).
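A rough check on why clusters under 1,000 Angstroms are invisible to an ordinary light microscope uses the Abbe diffraction limit, d = λ / (2 NA). This is a minimal sketch; the wavelength (green light, about 5,000 Angstroms) and the numerical aperture (1.4, a good oil immersion objective) are illustrative assumptions, not parameters from the Cell paper.

```python
def abbe_limit_angstroms(wavelength_A, numerical_aperture):
    """Smallest resolvable separation d = wavelength / (2 * NA), in Angstroms."""
    return wavelength_A / (2.0 * numerical_aperture)

d = abbe_limit_angstroms(5000.0, 1.4)  # assumed green light and NA 1.4 objective
cluster_size_A = 1000.0                # upper bound on nanocluster size quoted in the text
print(f"resolution limit ~{d:.0f} A; cluster resolvable: {cluster_size_A > d}")
# prints: resolution limit ~1786 A; cluster resolvable: False
```

Even under these favorable assumptions the limit is nearly twice the cluster size, which is why indirect spectroscopic evidence is needed.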

How can the authors make such a statement? The evidence was a decrease in fluorescence anisotropy due to Förster resonance energy transfer effects. Förster energy transfer is interesting in that it doesn’t involve molecule #1 losing energy by emitting a photon which is then absorbed by molecule #2. Instead the transfer is nonradiative: excited molecule #1 induces a dipole in molecule #2 (an electrostatic near field coupling akin to a van der Waals effect). Obviously, to do this, the molecules must be fairly close, and the transfer rate falls off as the inverse 6th power of the distance between the two molecules.

In Förster (or fluorescence) resonance energy transfer (FRET), one fluorophore (the donor) transfers its excited state energy to a different fluorophore (the acceptor), which emits fluorescence of a different color. For more details see the Wikipedia article on Förster resonance energy transfer; it’s interesting stuff. Again an example of physical chemistry with organic characteristics (and pretty good evidence for the existence of lipid rafts to boot).
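The steep inverse 6th power dependence is easiest to appreciate numerically. The standard Förster expression for transfer efficiency is E = 1 / (1 + (r/R0)^6), where R0 is the donor/acceptor distance at which half the energy is transferred. This is a minimal sketch; the R0 of 50 Angstroms is an assumed, typical order of magnitude for organic dye pairs, not a value from the Cell paper.

```python
def fret_efficiency(r, r0):
    """Forster transfer efficiency at donor-acceptor distance r (same units as r0).

    The transfer rate scales as (r0 / r)**6, giving E = 1 / (1 + (r / r0)**6).
    """
    return 1.0 / (1.0 + (r / r0) ** 6)

R0 = 50.0  # Angstroms; assumed typical value, for illustration only
for r in (25.0, 50.0, 75.0, 100.0):
    print(f"r = {r:5.1f} A  E = {fret_efficiency(r, R0):.3f}")
```

At r = R0 the efficiency is exactly 0.5; at twice that distance it drops to 1/65, about 1.5%, which is why FRET works as a molecular ruler only over tens of Angstroms.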

Now it gets even more interesting. Nanoclustering depends on the length of the acyl chain forming the GPI anchor (at least 18 carbons must be present). Nanoclustering diminishes on cholesterol depletion, in actin depleted cell blebs, and in mutant cell lines deficient in the inner leaflet lipid phosphatidylserine (PS), which has two long chain fatty acids hanging off the glycerol. So it looks as if the saturated acyl chains of the glypiated proteins of the outer leaflet interdigitate with those of PS in the inner leaflet. The effect is also enhanced on expression of proteins specifically linking PS to the actin cytoskeleton of the cortex. Binding of PS to the cortical actin cytoskeleton determines where and when the clusters will be stabilized. The coupling can work both ways: if something immobilizes and stabilizes the glypiated proteins extracellularly, then PS lipids can form correlated patches.

This might be a mechanism for information transfer across the plasma membrane (and across other membranes to boot). This could also serve as a way to couple many outer leaflet membrane lipids such as gangliosides and other sphingolipids with events internal to the cell. Cholesterol can stabilize the local liquid ordered domain over a length scale that is larger than the size of the immobilized cluster. A variety of membrane associated proteins inside the cell (spectrin, talin, caldesmon) are able to bind actin. “The formation of the contractile actin clusters then determine when and where the domains may be stabilized, bringing the generation of membrane domains in live cells under control of the actomyosin signaling network.”

So just like the integrins which can signal from outside the cell to inside and from inside the cell to outside, glypiated proteins and the actin cytoskeleton may form a two way network for signaling. No one should have to tell you how important the actomyosin cytoskeleton is in just about everything the cell does. Truly fascinating stuff. Stay tuned.

Should pregnant women smoke pot?

Well, maybe this is why College Board scores have declined so much in recent decades that they’ve been normed upwards. Given sequential MRI studies of brain changes throughout adolescence (with more to come), we know that it is a time of synapse elimination (this will be the subject of another post). We also know that endocannabinoids, the stuff in the brain that marihuana is mimicking, are retrograde messengers there, setting synaptic tone for information transmission between neurons.

But there’s something far scarier in a paper that just came out [ Proc. Natl. Acad. Sci. vol. 112 pp. 3415 – 3420 ’15 ]. Hedgehog is a protein so named because fruit fly (Drosophila) embryos lacking it are covered with a dense lawn of bristle-like denticles, making them look like hedgehogs. This gives you a clue that Hedgehog signaling is crucial in embryonic development. A huge amount is known about it, with more being discovered all the time; for far more details than I can provide see

Unsurprisingly, embryonic development of the brain involves Hedgehog, e.g. [ Neuron vol. 39 pp. 937 – 950 ’03 ] Sonic hedgehog (Shh) signaling is essential for the establishment of the ventral pattern along the whole neuraxis (including the telencephalon). It plays a mitogenic role in the expansion of granule cell precursors during CNS development. This work shows that absence of Shh decreases the number of neural progenitors in the postnatal subventricular zone and hippocampus. Similarly, conditional inactivation of Smoothened results in the formation of fewer neurospheres from progenitors in the subventricular zone. Stimulation of the Hedgehog pathway in the mature brain results in elevated proliferation of telencephalic progenitors. It’s a lot of unfamiliar jargon, but you get the idea.

Of interest is the fact that the protein is extensively covalently modified by lipids (cholesterol at the carboxy terminal end and palmitic acid at the amino terminal end). Hedgehog binds to its receptor (Patched), which otherwise keeps the signal transducing membrane protein Smoothened turned off. With lipids so central to the pathway, it stands to reason that other lipids might interfere with it. The PNAS work shows this is exactly the case (in Drosophila at least). One or more lipids present in Drosophila lipoprotein particles are needed in vivo to keep Hedgehog signaling turned off in wing discs (when Hedgehog ligand isn’t around). The lipids destabilize Smoothened. This work identifies endocannabinoids as the inhibitory lipids in extracts of human very low density lipoprotein (VLDL).

It certainly is a valid reason for women not to smoke pot while pregnant. The other problem with the endocannabinoids and exocannabinoids (e.g. delta 9 tetrahydrocannabinol), is that they are so lipid soluble they stick around for a long time — see

It is amusing to see regulatory agencies wrestling with ‘medical marihuana’ when it never would have gotten through the FDA given the few solid studies we have in man.

When the active form of a protein is intrinsically disordered

Back in the day, biochemists talked about the shape of a protein, influenced by the spectacular pictures produced by Xray crystallography. Now, of course, we know that a protein has multiple conformations in the cell. I still find it miraculous that the proteins making us up have only relatively few. For details see —

We now also know that many proteins contain segments which are intrinsically disordered (i.e. have no single shape). The pendulum has swung the other way, with “estimations that contiguous regions longer than 50 amino acids ‘may be present’ in ‘up to’ 50% of proteins coded in eukaryotic genomes” [ Proc. Natl. Acad. Sci. vol. 102 pp. 17002 – 17007 ’05 ].
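Counting contiguous disordered regions longer than 50 amino acids is a simple run length scan once each residue has been scored for disorder. This is a minimal sketch, assuming per residue disorder scores from some prediction program (hypothetical here) with an assumed cutoff of 0.5; the function name, cutoff and toy scores are illustrative only.

```python
def long_disordered_regions(scores, cutoff=0.5, min_length=51):
    """Return (start, end) pairs (0-based, end exclusive) of contiguous runs in
    which every residue's disorder score is >= cutoff and the run is at least
    min_length residues long ("longer than 50" gives 51 by default)."""
    regions = []
    start = None
    for i, s in enumerate(scores):
        if s >= cutoff:
            if start is None:
                start = i  # a disordered run begins here
        elif start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None  # the run has ended
    # close a run that extends all the way to the carboxy terminus
    if start is not None and len(scores) - start >= min_length:
        regions.append((start, len(scores)))
    return regions

# Toy scores: 20 ordered, 60 disordered, 30 ordered, then 40 disordered residues
toy = [0.1] * 20 + [0.9] * 60 + [0.2] * 30 + [0.8] * 40
print(long_disordered_regions(toy))  # [(20, 80)]
```

The 40 residue disordered stretch at the end is not reported, since it falls short of the 50 residue threshold.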

[ Science vol. 325 pp. 1635 – 1636 ’09 ] Compared to ordered regions, disordered regions of proteins have evolved rapidly, contain many short linear motifs that mediate protein/protein interactions, and have numerous phosphorylation sites. Disordered regions are enriched in serine and threonine residues, while ordered sequences are enriched in tyrosines; this highlights functional differences in the types of phosphorylation. Interestingly, tyrosines have been lost during evolution.

What are unstructured protein segments good for? One theory is that the disordered segment can adopt different conformations to bind to different partners — this is the moonlighting effect. Then there is the fly casting mechanism — by being disordered (hence extended rather than compact) such proteins can flail about and find partners more easily.

Given what we know about enzyme function (and by inference protein function), it is logical to assume that the structured form of a protein which can be unstructured is the functional form.

Not so according to this recent example [ Nature vol. 519 pp. 106 – 109 ’15 ]. 4EBP2 is a protein involved in the control of protein synthesis. It binds to another protein also involved in synthesis (eIF4E) to suppress a form of translation of mRNA into protein (cap dependent translation if you must know). 4EBP2 is intrinsically disordered. When it binds to its target it undergoes a disorder to ordered transition. However eIF4E binding only occurs from the intrinsically disordered form.

Control of 4EBP2 activity is due, in part, to phosphorylation on multiple sites. This induces folding of amino acids #18 – #62 into a 4 stranded beta domain which sequesters the canonical YXXXXLphi motif with which 4EBP2 binds eIF4E (Y stands for tyrosine, X for any amino acid, L for leucine and phi for any bulky hydrophobic amino acid). So here the inactive (i.e. non-binding) form of a protein is the structured rather than the unstructured form. The unstructured form of 4EBP2 is therefore the physiologically active form of the protein.

The chemical ingenuity of the Lake Ontario midge

Well, we’re freezing our butts off here in sunny New England, so it’s time to discourse upon the chemical ingenuity of antifreeze proteins. They’ve long been known, with most found in fish living in arctic waters. A very unusual structure is found in a 79 amino acid protein from an insect living near Lake Ontario. Most of the protein is made of tandem repeats 10 amino acids long. Here is the repeat.

X X Cys X Gly X Tyr Cys X Gly ; X = any amino acid.
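The repeat translates directly into a regular expression, with each X becoming a single character wildcard and the named residues written as one letter codes (C for cysteine, G for glycine, Y for tyrosine). This is a minimal sketch; the sequence below is made up from the pattern for illustration and is not the actual midge antifreeze protein sequence.

```python
import re

# X X Cys X Gly X Tyr Cys X Gly, with X = any amino acid (one letter codes)
REPEAT = re.compile(r"..C.G.YC.G")

def find_repeats(seq):
    """Return 0-based start positions of non-overlapping matches of the repeat."""
    return [m.start() for m in REPEAT.finditer(seq)]

# Hypothetical sequence: three back-to-back repeats with arbitrary X residues
toy_seq = "ATCSGTYCNG" + "KECTGAYCQG" + "SVCDGRYCTG"
print(find_repeats(toy_seq))  # [0, 10, 20]
```

Matches spaced exactly 10 residues apart are the signature of the tandem arrangement described above.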

Can you as a computational chemistry expert figure out what it forms?

The 10 amino acids form a complete circle, with the peptide backbone looking nothing like an alpha helix, a beta sheet or anything else I’ve seen. It just sort of wanders around for 360 degrees. In cross section the ‘circle’ resembles the Greek letter theta, with a disulfide bond between the two cysteines forming a crossbar inside the circle. This puts all 7 tyrosines from the 7 repeats in a row along one side of the solenoid, where they form the presumed ice binding site. The solenoid is reinforced by intrachain hydrogen bonds and side chain salt bridges. You can read about it and see some pictures in [ Proc. Natl. Acad. Sci. vol. 112 pp. 737 – 742 ’15 ].

The chemical ingenuity of some of these proteins is remarkable. None of them (except one) appear to have been figured out before their structures were determined.

[ Proc. Natl. Acad. Sci. vol. 108 pp. 7281 – 7282 ’11 ] Even now, the structural differences between the surface of ice nuclei and liquid water are poorly characterized (we don’t even know how many hydrogen bonds are involved), yet antifreeze proteins somehow recognize the ice surface. Some 12 different structural motifs have been found in antifreeze proteins. Three are given: a small globular protein (sea pout), an alpha helix (winter flounder), and a stack of left handed polyproline II helices (snow flea). The present work gives a fourth example, a right handed parallel beta helix from the bacterium Marinomonas primoryensis. It is a 34 kiloDalton calcium bound domain with an extensive array of icelike surface waters that are anchored via hydrogen bonds directly to the protein backbone and adjacent side chains. The bound waters make an excellent 3 dimensional match to the primary prism and basal planes of ice.

Probably the most counterintuitive antifreeze protein is the following. It stands a lot of what we thought we knew about protein structure on its head.

[ Science vol. 343 pp. 743 – 744, 795 – 798 ’14 ] Almost all globular proteins reported to date have a dry protein core (i.e. water free). An antifreeze protein called Maxi from the winter flounder (Pseudopleuronectes americanus) has been found with a water filled core. It is a 3 kiloDalton alanine rich 4 helix bundle 145 Angstroms long. The periodicity of the alpha helices is 11 amino acids. A single turn of an alpha helix is 5.4 Angstroms high and 11 Angstroms wide. Because each helical turn here is about 3.7 residues (vs. 3.6 in the classic alpha helix), 11 amino acids make almost exactly 3 turns, which fairly neatly comes out to 16 Angstroms in length. The ice binding residues are threonine at position i, alanine at position i + 4 and alanine at position i + 8 (putting them along one face of the helix). The protein is a dimer of monomers, each containing two helices. The core consists of 400 (yes 400!) highly organized water molecules. The water is interleaved as a roughly two molecule thick layer between both intra and intermonomer helix interfaces, extending to the ice binding surfaces. Maxi must bind ice nuclei and inhibit their growth. The water molecules inside the bundle form pentagons!!! Amazingly, this was predicted 50 years ago by Scheraga. The 5 membered water rings form cages around individual amino acid side chains, a semi-clathrate structure rather than an icelike one. Most of the carbonyls are involved in hydrogen bonding interactions with water, helping to keep the protein soluble. The protein denatures at low temperatures (16 C).
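The arithmetic behind the 11 residue periodicity can be checked with a helical wheel calculation. With 3.6 residues per turn, residue i + 11 lands 20 degrees away from residue i; with 11/3 (about 3.67) residues per turn, 11 residues make exactly 3 turns and residue i + 11 returns to the same face. This is a minimal sketch; the function names are illustrative, and holding the 5.4 Angstrom rise per turn quoted in the text fixed while changing the residues per turn is a simplifying assumption.

```python
from fractions import Fraction

def angular_offset_deg(k, residues_per_turn):
    """Angle (degrees) of residue i + k around the helix axis relative to
    residue i, folded into the range [-180, 180)."""
    angle = k * 360 / residues_per_turn
    return (angle + 180) % 360 - 180

def rise_A(k, residues_per_turn, pitch_A=5.4):
    """Axial length (Angstroms) spanned by k residues, assuming a fixed pitch."""
    return k * pitch_A / residues_per_turn

# Exact fractions avoid floating point round-off in the angle arithmetic
for n in (Fraction(18, 5), Fraction(11, 3)):  # 3.6 and about 3.67 residues per turn
    offsets = [round(float(angular_offset_deg(k, n)), 1) for k in (4, 8, 11)]
    print(f"{float(n):.2f} residues/turn: i+4, i+8, i+11 at {offsets} degrees; "
          f"11 residues span {rise_A(11, n):.1f} A")
```

With 3.6 residues per turn the 20 degree error would accumulate with every 11 residue repeat along a 145 Angstrom bundle, spreading the ice binding residues around the helix; at 11/3 the threonine and alanines of each repeat stay on one face, and 11 residues span about 16 Angstroms.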

Ordered water can be found in most high resolution Xray crystallography protein structures, but it is usually found between the protein molecules. Maxi retains the very structure of water.

Removal of water has been proposed as a potential rate limiting step in protein folding. Maxi folds to the point where water not in direct contact with the protein chain is removed from its core. It then arrests further folding to retain a beautifully ordered core of water interleaved between the protein helices.

Amazing! No one would ever have predicted something like Maxi (except Scheraga).

An interesting way to study the hydrophobic effect between protein surfaces

Protein interaction domains haven’t been studied to nearly the extent they need to be, and we know far less about them than we should. All the large molecular machines of the cell (ribosome, mediator, spliceosome, mitochondrial respiratory chain) involve large numbers of proteins interacting with each other not by the covalent bonds beloved by organic chemists, but by much weaker forces (van der Waals, charge attraction, hydrophobic entropic forces etc. etc.).

Designing drugs to interfere with (or promote) such interactions will be tricky, yet they should have profound effects on cellular and organismal physiology. Off target effects are almost certain to occur (particularly since we know so little about the partners of a given motif). Showing how useful such a drug could be, a small molecule has been developed that inhibits the interaction of the AIDS virus (HIV) capsid protein with two cellular proteins (CPSF6, TNPO3) that the capsid must interact with to get into the nucleus. (Unfortunately I’ve lost the reference.) For more about the host of new protein interaction domains (and potential druggable targets) just discovered please see

Hydrophobic ‘forces’ are certain to be important in protein protein interactions. A very interesting paper figured out a way to measure them using atomic force microscopy (AFM). [ Nature vol. 517 pp. 277 – 279, 347 – 350 ’15 ]. This is particularly interesting to me because entropy has nothing to do with the force as measured. I’ve always assumed that the hydrophobic force was entropic, similar to the force exerted by rubber when you stretch it. It’s what pushes hydrophobic side chains into the interior of proteins (i.e. water doesn’t have to decrease its entropy by organizing itself to solvate hydrophobic side chains). Not so in this case.

The authors prepared self-assembled monolayers using dodecyl thiol (CH3(CH2)10CH2SH) bound to gold. Every now and then an amino group or a guanidino group was placed at the other end of the thiol. This allowed them to produce a mixture of hydrophobic groups (60%) and ionic species (R-NH3+ ammonium or guanidinium ions) within nanoMeters of the hydrophobic regions. The amine and the guanidino groups were the same distance from the gold surface as the hydrocarbon ends. A gold coated AFM tip carrying the same hydrophobic C(12) moiety was then used to measure the adhesive force between the tip and the surface in aqueous solution.

This is important because it is a measurement not a theoretical calculation (apologies Ashutosh). This is particularly useful since water is so complex that we don’t have a good understanding (potential function) for it.

Methanol was added (which eliminated most of the hydrophobic interactions). Sensitivity to methanol was taken as a signature of the hydrophobic component of the force. The pH could be manipulated so that R-NH2 could be charged to R-NH3+; ditto for converting guanidinium to the uncharged species.

So guess what effect the amino and guanidino groups had on the hydrophobic interaction. I was rather surprised.

The strength of hydrophobic interactions between the mixed monolayers and the tip doubled when neutral amino groups found within nanoMeters of hydrophobic regions were charged to form R-NH3+ ions by lowering the pH. A similarly placed guanidinium ion eliminated the hydrophobic interactions at all pHs. So the effects of the two side chains (NH2 for lysine, guanidinium for arginine) are opposite.
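Why the amine can be switched by pH while guanidinium acts at all pHs follows from the Henderson–Hasselbalch equation: the fraction of a base in its protonated (charged) form is 1 / (1 + 10^(pH − pKa)). This is a minimal sketch; the pKa values (10.5 for a lysine-like amine, 13.5 for an arginine-like guanidinium) are assumed textbook solution values, and the real values on a monolayer surface may well be shifted.

```python
def fraction_protonated(pH, pKa):
    """Henderson-Hasselbalch: fraction of a base present in its protonated
    (cationic) form, [BH+] / ([BH+] + [B]) = 1 / (1 + 10**(pH - pKa))."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))

# Assumed textbook solution pKa values; surface pKa values can be shifted.
PKA_AMINE = 10.5        # primary amine, R-NH3+ / R-NH2 (lysine-like)
PKA_GUANIDINIUM = 13.5  # guanidinium (arginine-like)

for pH in (3.0, 7.0, 10.5, 12.0):
    print(f"pH {pH:4.1f}: amine {fraction_protonated(pH, PKA_AMINE):.3f}, "
          f"guanidinium {fraction_protonated(pH, PKA_GUANIDINIUM):.3f}")
```

Under these assumptions the amine swings from fully charged to mostly neutral over a couple of pH units around its pKa, while guanidinium stays more than 95% charged up to pH 12, consistent with its effect persisting at all pHs.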

They note that the ammonium ion is well hydrated, but guanidinium is hydrated only around the edges of its plane (where its N-H groups point) and not above or below it. This gives guanidinium amphipathic behavior, which is why it can act as a denaturant (did you know this? I didn’t).

I’m sure that the effect of negative ions (e.g. carboxyl groups) and every other conceivable side chain will be studied in the future.

Thus hydrophobicity is not an intrinsic property of any given nonPolar domain. It can be changed by functional groups within 10 Angstroms. So placing a charged group near a hydrophobic domain should allow tuning of the hydrophobic driving force. I’d be amazed if this isn’t found to be the case evolutionarily.

They also studied some weird looking stuff resembling proteins: beta peptides (i.e. the amino and carboxyl groups are on adjacent carbons rather than on the same one as with alpha amino acids) with weird side chains, which are known to adopt an amphipathic helical conformation. The nonpolar residues were trans-2-aminocyclohexanecarboxylic acid (ACHC), and the cationic residues were beta3 homolysine. Why didn’t they use something more natural? The peptide forms an ACHC rich nonPolar square domain 10 Angstroms on a side, with a polar patch on the other side of the helix.

So it’s a fascinating piece of work with large implications for the design of drugs attacking protein protein interfaces.


