Category Archives: Chemistry (relatively pure)

smORFs, dwORFs and now uORFs

A recent post described small Open Reading Frames (smORFs) and DWarf Open Reading Frames (DWORFs); see the link at the bottom. Now it’s time for uORFs (upstream Open Reading Frames). Upstream of what, you might ask? Well, messenger RNA is grabbed by the ribosome at one end (called the 5′ end). The thinking has been that the ribosome marches along the mRNA in the 5′ to 3′ direction looking for the sequence Adenine Uracil Guanine (AUG), which codes for methionine. It then begins reading the mRNA 3 nucleotides at a time and tacking amino acids onto the methionine. This is called translating mRNA into protein. What about the 5′ end of the mRNA before the AUG is reached (perhaps hundreds of nucleotides later)? It isn’t translated, which is why it’s called the 5′ UTR (5′ UnTranslated Region). In bacteria it’s only a few nucleotides, but our 5′ UTRs can have thousands: https://en.wikipedia.org/wiki/Five_prime_untranslated_region.

Two other terms of art are upstream and downstream. Since the ribosome flows from 5′ to 3′ on mRNA, any nucleotide 5′ to a given point is called upstream, and anything 3′ is called downstream. Logical terminology — what a pleasure.

So a uORF is an upstream Open Reading Frame. Upstream to what? Why, to the AUG (the initiator codon). The assumption had always been that since there was no initiator AUG codon in this region, proteins couldn’t be made from the uORF. Wrong.

This is where [ Science vol. 351 p. 465 aad2867 – 1 –> 9 ’16 ] comes in. It turns out that the ribosome can translate some of these uORFs into protein, and the paper describes a clever technique (called 3T) the authors developed to find them. One of the problems in finding uORF proteins is that some are quite small, and are missed in the usual protein assays. One uORF from ATF4 codes for only 3 amino acids, which is so small that mass spectrometry can’t see it.

The paper makes the amazing statement that nearly half of all mammalian mRNAs harbor uORFs in their 5′ UTRs, and many are initiated with nonAUG start codons. They may be a general mechanism to regulate downstream coding sequence expression. The paper gives two citations for this that I must have missed in my reading.

For instance Binding immunoglobulin Protein (BiP aka Heat Shock Protein family A member 5 – HSPA5 ) contains uORFs exclusively initiated by UUG and CUG start codons (not AUG).
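To make the idea concrete, here is a minimal sketch of how one might scan a 5′ UTR for candidate uORFs. It is a toy, not the 3T technique from the paper: it simply looks for AUG, CUG or UUG followed in the same reading frame by a stop codon. The function name `find_uorfs` and the sequence `toy_utr` are invented for illustration.

```python
# Toy scan of a 5' UTR for candidate upstream open reading frames (uORFs).
START_CODONS = {"AUG", "CUG", "UUG"}   # AUG plus the non-AUG starts named in the post
STOP_CODONS = {"UAA", "UAG", "UGA"}    # the three standard stop codons


def find_uorfs(utr):
    """Return (start_index, codon_list) for every start codon in the UTR
    that is followed, in the same reading frame, by a stop codon."""
    utr = utr.upper().replace("T", "U")  # accept DNA-style input as well
    hits = []
    for i in range(len(utr) - 2):
        if utr[i:i + 3] not in START_CODONS:
            continue
        codons = []
        for j in range(i, len(utr) - 2, 3):
            codon = utr[j:j + 3]
            codons.append(codon)
            if codon in STOP_CODONS:
                hits.append((i, codons))
                break
    return hits


# Hypothetical 5' UTR, invented for illustration only.
toy_utr = "GGACUGGCCUAAGGAUGGCUGCUUGAGG"
for start, codons in find_uorfs(toy_utr):
    # The stop codon is not translated, hence len(codons) - 1 amino acids.
    print(start, "-".join(codons), "->", len(codons) - 1, "amino acids")
```

A real search would also have to weigh sequence context around the start codon, which this sketch ignores entirely.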

What might the functions of uORFs actually be? The obvious one is that the proteins made from them might actually be doing something. What could a 3 amino acid protein possibly do? Lots. Consider thyrotropin releasing hormone, which helps control your thyroid: it is pyroglutamic acid histidine proline. Then there is met-enkephalin, which has 5 amino acids and is one of the endogenous opiate peptides your brain uses.

Another possibility is that just translating the uORF into protein controls the translation of the protein starting with the AUG codon. This isn’t so far fetched. A recent paper [ Nature vol. 529 pp. 551 – 554 ’16 ] gave a 3 dimensional structure for RNA polymerase II transcribing a DNA template into mRNA. The authoress (Carrie Bernecky) was kind enough to supply the dimensions of the complex when I wrote her. Remember you can consider the DNA double helix as a cylinder 20 Angstroms in diameter. The polymerase complex is roughly 150 x 150 x 160 Angstroms. Figuring 3 stacked nucleotides/10 Angstroms, this is enough to obstruct 45 nucleotides of DNA upstream of the actual start site. A ribosome working on a uORF is a similarly bulky machine, so it could plausibly get in the way of anything else trying to use the mRNA nearby.
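The footprint arithmetic is simple enough to check in a couple of lines. This is only a sketch of the back-of-envelope estimate, using the 150 Angstrom dimension and the 3 nucleotides per 10 Angstroms figure quoted above; the variable names are mine.

```python
# Rough footprint estimate using the numbers quoted in the post.
complex_length_angstroms = 150        # shortest quoted dimension of the complex
nucleotides_per_angstrom = 3 / 10     # the post's figure: 3 stacked nucleotides per 10 Angstroms

footprint = complex_length_angstroms * nucleotides_per_angstrom
print(round(footprint), "nucleotides covered")
```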

This is just another example of room at the bottom, where all sorts of small molecule metabolites, small RNAs and small DNAs are just being unearthed and their structures determined. For more on this please see the following link:

SmORFs and DWORFs — has molecular biology lost its mind?

The road to the Boltzmann constant

If you’re going to think seriously about cellular biophysics, you’ve really got to clearly understand the Boltzmann constant and what it means.

The road to it starts well outside the cell, in the perfect gas law, part of Chem 101. This seems rather paradoxical. Cells (particularly neurons) do use gases (carbon monoxide, hydrogen sulfide, nitric oxide, and of course oxygen and CO2) as they function, but they are far from all gas.

Get out your colored pencils with separate colors for pressure, energy, work, force, area, acceleration, volume. All of them are combinations of just 3 things: mass, distance and time, for which you don’t need a colored pencil.

The perfect gas law is Pressure * Volume = n R Temperature — the units don’t matter at this point. R is the gas constant, and n is the number of moles (chem 101 again).

Pressure = Force / Area
Force = Mass * Acceleration
Acceleration = distance / (time * time )
Area = Distance * distance

Volume = Distance * distance * distance

So Pressure * Volume = { Mass * distance / (time * time * distance * distance ) } * { distance * distance * distance }

= mass * distance * distance / ( time * time )

This looks suspiciously like kinetic energy (because it is )

Since work is defined as force * distance == mass * acceleration * distance

This also comes out to mass * distance * distance / ( time * time )

So Pressure * Volume has the units of work or kinetic energy

Back to the perfect gas law P * V = n * R * T

It’s time to bring in the units we actually use to  measure energy and work.

Energy and work are measured in Joules. Temperature is measured in degrees above absolute zero (e.g. degrees Kelvin). 300 Kelvin (about 27 Celsius or 81 Fahrenheit) is reasonably close to body temperature (about 310 Kelvin).

Assume we have one mole of gas. Then the gas constant (R) is just PV/T or Joules/degree kelvin == energy/degree kelvin.

Statistical mechanics thinks about molecules not moles (6.022 * 10^23 molecules).

So the Boltzmann constant is just the gas constant R, which equals 8.31441 Joules / (mole * degree Kelvin), divided by the number of molecules in a mole. It is called k and equals about 1.38 * 10^-23 Joules / (molecule * degree Kelvin). It’s basically the energy each molecule possesses divided by the current temperature.

Biophysicists are far more interested in how much energy a given molecule has at body temperature. To find this multiply k by T (which is why you see kT all over the place).

At 300 Kelvin

kT is
4.1 picoNewton * nanoMeters (work)
26 milliElectron volts
.6 kiloCalories/mole
4.1 * 10^-21 joules/molecule (energy)
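If you’d rather not take the conversions on faith, here is a small sketch that derives k from R and Avogadro’s number and then expresses kT in each unit. The constants are the standard SI values (not taken from the post), and the function name `kT_in_units` is just something I made up.

```python
# Constants (standard SI values, not taken from the post).
R = 8.314462618            # gas constant, J / (mol * K)
N_A = 6.02214076e23        # Avogadro's number, molecules / mol
EV = 1.602176634e-19       # joules per electron volt
CAL = 4.184                # joules per (thermochemical) calorie

k = R / N_A                # Boltzmann constant, J / (molecule * K)


def kT_in_units(T):
    """Return kT at temperature T (Kelvin) in several common units."""
    kT = k * T                                   # joules per molecule
    return {
        "J/molecule": kT,
        "pN*nm": kT / 1e-21,                     # 1 pN * 1 nm = 1e-21 J
        "meV": kT / EV * 1e3,
        "kcal/mol": kT * N_A / CAL / 1e3,
    }


for unit, value in kT_in_units(300).items():
    print(f"{unit:>10}: {value:.3g}")
```

Calling `kT_in_units(310)` instead gives the body temperature figures, which are about 3% larger.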

Now we’re ready to start thinking about the molecular world.

I should do it, but hopefully someone out there can use this information to find how fast a sodium ion is moving around in our cells. Perhaps I’ll do this in a future post if no one does — it’s probably out there on the net.
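Here is one hedged way to take a first crack at it. Equipartition says the average kinetic energy of translation is (3/2) kT, so (1/2) m <v^2> = (3/2) kT gives a root-mean-square speed. The sketch assumes a bare sodium ion (no hydration shell) at 310 Kelvin, and it gives the instantaneous thermal speed between collisions, not how fast the ion actually gets anywhere by diffusion. The function name `rms_thermal_speed` is my own.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant, J / K
AMU = 1.66053907e-27        # kilograms per atomic mass unit
NA_MASS_AMU = 22.99         # atomic mass of sodium (bare ion, no hydration shell)


def rms_thermal_speed(mass_amu, T=310.0):
    """Root-mean-square speed (m/s) from (1/2) m <v^2> = (3/2) k T."""
    mass_kg = mass_amu * AMU
    return math.sqrt(3 * K_B * T / mass_kg)


print(f"{rms_thermal_speed(NA_MASS_AMU):.0f} m/s at 310 K")
```

Under these assumptions the answer comes out at several hundred meters per second. In the crowded cell the ion changes direction almost immediately, so this is a speed, not a rate of travel.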

Smoke, mirrors and statistical mechanics

This will be the year I look at PChem and biophysics. What comes first? Why thermodynamics of course, and chemists always think of molecules not steam engines, so statistical mechanics comes before thermodynamics.

The assumptions behind statistical mechanics are really so bizarre that it’s a miracle that it works at all, but work it does.

Macrostates are things you can measure — temperature, pressure, volume.

Microstates give rise to macrostates, but you can’t measure them. However even though you can’t measure them, you can distinguish different ones and count them. Then you assume that each microstate is equally probable, even though you have no way in hell of experimentally measuring even one of them, and probability is what you find after repeated measurements (none of which you can make).
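A toy system makes the counting concrete. The sketch below assumes 4 particles that can each be ‘up’ or ‘down’ (a made-up example, not anything from a textbook I’m quoting), lists every microstate, and groups them by the one thing you could measure, the total number up.

```python
from collections import Counter
from itertools import product

N = 4  # number of two-state particles in this toy system (hypothetical)

# Every microstate is one specific assignment of up (1) / down (0) to each particle.
microstates = list(product((0, 1), repeat=N))

# The macrostate is what we could measure: here, the total number of "up" particles.
macrostate_counts = Counter(sum(state) for state in microstates)

# Postulate: each microstate is equally probable, so a macrostate's probability
# is just its share of the microstates.
for n_up in sorted(macrostate_counts):
    count = macrostate_counts[n_up]
    print(n_up, count, count / len(microstates))
```

The middle macrostate (2 up) is backed by the most microstates, so it is the most probable. That is the whole trick, scaled down from 10^23 particles to 4.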

Amazing.

A new form of matter?

Has cellular biology and biochemistry shown us a new form of matter? It’s certainly something I never studied in PChem back in the day. It goes by multiple names, and may be more than one thing.

Start with the nucleolus. It’s been known for years: a visible agglomeration of proteins and RNA in the nucleus, not bound by a membrane. Then there is the processing body (aka stress granule), also made of proteins and RNA (but different ones: transcription factors and mRNAs). Then there is the nuclear pore, made of ‘low complexity sequence’ tails of proteins surrounding the pore (mostly phenylalanine glycine repeats, aka FG repeats) thought to form a barrier to protein movement through the pore. Then there are RNA granules, said to occur by a phase transition to a hydrogel-like phase (whatever that is). Neurologists have long been interested in FUS/TLS, a protein which is mutated in some forms of Amyotrophic Lateral Sclerosis and dementia.

I do think that we’re at the blind men and the elephant stage trying to sort all this out (which, of course, makes it fascinating and a fit subject for scientific work; apologies Wittgenstein: “What we cannot speak about we must pass over in silence”).

So in what follows you’ll find a lot of information about these matters, which does not have a neat and tidy explanation. This is what science looks like when it’s being done.

[ Cell vol. 149 pp. 753 – 767, 768 – 779 ’12 ] RNA granules don’t just occur in dendrites. They are found in (1) germ cell P granules of C. elegans embryos (2) polar granules of Drosophila embryos (3) stress granules appearing in cultured yeast and mammalian cells on nutrient deprivation or other forms of metabolic stress (4) neuronal granules transporting mRNAs to dendrites.

Unsurprisingly, the granule contains RNA binding proteins (with KH or RNA Recognition motif (RRM) domains). These domains allow proteins containing them to recognize 3’ untranslated regions of target mRNA in a sequence specific manner (really?).

This work shows that structures resembling RNA granules can be reversibly aggregated and disaggregated in a soluble cell-free system in response to a small molecule (a biotinylated isoxazole). The proteins in the granules contain low complexity sequences (LC sequences), which show little diversity in their amino acid composition (which is usually repetitive). One example is the leucine rich domain. LC sequences are all you need for aggregation by the isoxazole. The domains undergo a concentration dependent phase transition to a hydrogel-like state with no chemical present?? The hydrogels are made of uniformly polymerized amyloidlike fibers. The fibers form and dissolve and don’t cause trouble (unlike classic amyloid).

LC sequences are particularly enriched in RNA and DNA binding proteins. FUS (FUsed in Sarcoma) is an RNA binding protein containing an LC domain (Gly/Ser Tyr Gly/Ser repeats). Hydrogel droplets formed from the LC sequence of FUS can retain proteins containing either the FUS LC sequence or other LC sequences.

This work finds a potential use for LC sequences — they allow the movement of regulatory proteins into and out of organized subcellular domains, via reversible polymerization into dynamic amyloidlike fibers. It’s possible that something similar occurs in Cajal bodies, nuclear speckles and nuclear factories involved in RNA splicing.

[ Proc. Natl. Acad. Sci. vol. 99 pp. 13583 – 13588 ’02 ] Nuclear bodies range in size from 2000 Angstroms to several microns. None of them are bounded by a membrane. It is thought that the same processes leading to the formation of nuclear bodies (e.g. a phase transition) are responsible for similar bodies occurring in the cytoplasm, e.g. P bodies (Processing bodies) and stress granules.

Each type is identified immunologically by antibodies against its components (either signature proteins or ribonucleoproteins or even small nuclear RNAs). They include
1. The Cajal body (the coiled body)
2. The promyelocytic body (PML body, POD)
3. Splicing related bodies
a. SC35 speckles (interchromatin granule cluster)
4. The GEM body
5. The matrix associated deacetylase body
6. HAP body
7. nucleoli associated paraspeckles
8. Nucleoli themselves.

The integrity of a nuclear body can be disrupted after depletion of its normal components — PODs are disrupted in acute PML.

The Cajal body and GEM are colocalized, but otherwise there doesn’t seem to be much association among the different nuclear bodies.

[ Cell vol. 162 pp. 1066 – 1077 ’15 ] FUS forms liquid compartments at sites of DNA damage in the nucleus and in the cytoplasm on stress. With time, liquid droplets of FUS convert to an aggregated state, a conversion accelerated by mutations (in the prionlike domain) derived from patients.

Why is the compartment called liquidlike? FUS molecules rapidly rearrange within the compartment. The compartments formed by FUS are spherical. Two FUS compartments can fuse and relax into one sphere.

FUS compartments belong to a set of RNA protein compartments (P granules, nucleoli) which ‘probably’ form by liquid liquid demixing (phase separation) from cytoplasm.

The conversion from a liquid to a solidlike state is concentration dependent, and mutations blocking nuclear localization sequence (NLS) function produce increased concentrations in the cytoplasm with aggregation.

The prionlike domain of FUS is intrinsically disordered.

[ Neuron vol. 88 pp. 678 – 690 ’15 ] Mutations in a bunch of RNA binding proteins (TDP43, FUS, ataxin2, hnRNPA1, hnRNPB2) are associated with ALS/FTD (Amyotrophic Lateral Sclerosis/FrontoTemporal Dementia). Poorly soluble assemblies of the mutant RNA binding protein are found in the nucleus and cytoplasm in the patients.

The assemblies differ from amyloids in the following ways
1. They are soluble in urea
2. They have low beta sheet content
3. They have a mixed granular/fibrillar appearance on EM
4. They don’t bind dyes diagnostic for amyloid (e.g. thioflavin T)
5. When fluorescently labeled, they don’t show the reductions in in vivo fluorescent lifetimes typical of conventional amyloid.

This work shows that the LC domain (Low Complexity domain) of normal FUS undergoes phase transitions, reversibly shifting between dispersed liquid droplets and hydrogel-like phases (defined how?). FUS mutants limit the ability to shift between phases, instead increasing the propensity of FUS to condense into poorly soluble stable (e.g. irreversible) fibrillar hydrogel-like assemblies (e.g. a new type of phase). Spontaneous occurrence of this might explain sporadic ALS/FTD with FUS pathology even when no mutations are present. These assemblies selectively entrap other ribonucleoproteins, impair local RNP granule function and decrease new protein synthesis in axon terminals of cultured neurons. The work was done in C. elegans.

“The biophysics of conversion from liquid droplet to reversible hydrogel is not yet clear”. The two differ only slightly in viscosity.

The FG repeats (phenylalanine, glycine repeats) of nucleoporins show structural characteristics typical of natively unfolded proteins (e.g. highly flexible proteins lacking ordered secondary structure). They can be quite long (200 – 700 amino acids in yeast). Protease sensitivity shows that most FG repeat containing nucleoporins are disordered in situ within the nuclear pore complexes of purified yeast nuclei. This makes it likely that they form a meshwork of random coils at the pore through which nuclear transport proceeds. Natively unfolded proteins show the following biochemical features

1. multiple domains allowing simultaneous interactions with multiple binding partners
2. nonrigid binding domains that can accommodate a variety of interacting partners
3. fast molecular association and dissociation rates.

Another model has FG domains interacting with each other in the pore to form a protein meshwork which acts as a separate hydrophobic phase. Transport complexes can partition into this phase because they can bind to the FG repeats. Proteins unable to bind to the FG repeats are excluded from the hydrophobic phase. Molecules below 30 – 40 kiloDaltons get through the water filled holes in the gel.

To get through the pore a midsize protein must recruit a large receptor to pass through a narrow channel. The receptors replace the FG – FG binding of the nups with each other by binding to themselves — they essentially dissolve into the gel.

An alternate view holds that FG repeats form a network of unlinked polymers whose thermally activated undulations create a zone of ‘entropic exclusion’. The entropic penalty in collapsing the chains allows a barrier to form. However by binding to the repeats, carriers can circumvent the exclusion — replacing one type of bond with another.

There are several models for the FG repeats in the nuclear pore. The most convincing (to me) is the ‘selective phase’ model — a sievelike meshwork is formed within the NPC via interactions between FG repeats. The size of the FG mesh determines the upper limits of the diffusion gate (e.g. — the molecules getting through without help — in this case under 30 kiloDaltons). The binding of nuclear transport receptors (NTRs) to the FG repeats is proposed to locally dissolve the FG-FG network, allowing passage of whatever is bound to the NTRs.

‘Sufficiently concentrated’ solutions of cohesive FG domains spontaneously form FG hydrogels (which exclude inert molecules over 50 Angstroms in diameter). Cargo NTR complexes migrate into such hydrogels ‘up to’ 20,000 times faster than the respective cargoes alone. The intragel diffusion rate of a typical importinBeta:cargo complex predicts a similar NPC passage time (10 milliSeconds) as was actually seen in living NPCs.

The FG repeat domain of the yeast nucleoporin Nsp1 forms a hydrogel-like structure in vitro which requires hydrophobic interactions between the aromatic rings of the phenylalanines. This work assembled FG hydrogels in vitro, and studied protein entry into them and diffusion through them using fluorescence microscopy. The influx of various nuclear transport receptors of the importin beta family into the Nsp1 FG hydrogel was 1000 times faster than the entry of a control protein. Access of a model cargo bound to importin beta was accelerated by over 20,000 fold (compared to free cargo). However, not every FG hydrogel shows selectivity. To achieve selective permeability the total FG concentration within the gel had to be raised above 50 milliMolar. This has led the authors to introduce the concept of the saturated hydrogel, in which all the FG repeats must extend completely and undergo a maximum number of interactions. It seems likely (to the authors of the editorial not the authors of the paper) that newly made FG proteins would immediately curl up and form intramolecular FG bridges (rather than intermolecular ones). In vitro gel formation can only be induced from lyophilized proteins under extreme pH and salt. The authors suggest that nuclear transport receptors act as chaperones preventing intramolecular FG interactions after synthesis. Under more physiologic conditions, the FG domain of Nsp1 formed neither homo nor heterotypic interactions with other FG nucleoporins.

FG repeat domains (they contain a hydrophobic patch, usually FG, FxFG, or GLFG, surrounded by more hydrophilic spacers) account for 12 – 20% of the mass of a nuclear pore complex. Up to 50 FG repeat domains may occur in a single protein. FG repeats occur in various flavors; examples are FxFG repeats

So there you have it: quite a mess. Figure it out and get on the boat to Sweden.

Les fleurs du PTEN

Les fleurs du Mal is a volume of poetry by Baudelaire about the beauty of evil and depravity. I have the same esthetic appreciation for the horrible things a mutant of PTEN does. It’s awful, but incredibly elegant chemically.

Back in the day med students used to be told ‘know syphilis and you’ll know medicine’ because of its varied clinical manifestations. PTEN is like that for cellular and molecular biology.

PTEN (Phosphatase and TENsin homolog) is a gene mutated in many forms of cancer. So it was regarded as a tumor suppressor, keeping our cells on the straight and narrow. Naturally cancer cells ‘try’ (note the anthropomorphism) to neutralize it. PI3K is a universal tumor driver, integrating growth factor signaling with downstream circuitries of cell proliferation, metabolism and survival.

Inositol is a 6 membered ring (all carbons) with one OH group attached to each carbon, which are numbered 1 through 6. PI3K puts phosphate on the 3 position, PTEN takes it off. Since this is how PI3K signaling begins, cells lacking PTEN grow faster and migrate aberrantly (e.g. spread).

Enter Proc. Natl. Acad. Sci. vol. 112 pp. 13976 – 13981 ’15 which carefully studied a PTEN mutant found in an unfortunate man with aggressive prostate cancer. It just changed one of the 403 amino acids (#126) from alanine to glycine. Not a big deal you say, it’s just a change of CH3 (alanine) to H (glycine). #126 is near the active site of the enzyme. One might expect that the mutation inhibits PTEN’s phosphatase activity (e.g. its enzymatic activity). Not so: the mutation shifts the activity of the enzyme. Instead of removing phosphate from the 3 position of inositol, the mutant removes the phosphate at the 5 position (leaving the 3 position alone). This shifts inositol phosphate levels in the cell with hyperactivation of PI3K signaling (which requires inositol phospholipids containing phosphate at the 3 position).

What happens is that inositol phosphates fit into the mutant active site with the 5 position near the catalytic amino acid (cysteine). Essentially the 6 membered ring rotates the 3 position away from cysteine and puts the 5 position there instead. This changes PTEN from a tumor suppressor (anti-oncogene) to an oncogene.

To a chemist this is elegant and beautiful (apologies Baudelaire).

PTEN has taught us a huge amount about the control of protein levels, pseudogenes, competitive endogenous RNA (ceRNA). You can read all about this in https://luysii.wordpress.com/2014/01/20/why-drug-discovery-is-so-hard-reason-24-is-the-3-untranslated-region-of-every-protein-a-cerna/

That’s fairly grim, so here’s a link to one of the great comedians of years past — Jonathan Winters

http://biggeekdad.com/2013/04/jonathan-winters-stick/

It’s politically incorrect and sure to offend the humorless pompous prigs. Enjoy ! ! !

A new kid on the Alzheimer’s block

There’s a new kid on the Alzheimer’s block, and it may explain why the huge sums thrown at beta-secretase inhibitors by big pharma have been such an abject failure. First, a lot of technical background.

The APP (for amyloid precursor protein) contains anywhere from 563 to 770 amino acids in 5 distinct transcripts made by alternate splicing of the single gene. The 3 main forms contain 695, 751 and 770 amino acids. The 695 amino acid form is found only in brain and peripheral nerve where it predominates, while the transcripts containing 751 and 770 amino acids are found everywhere but predominate in other tissues. The A4 peptides (Abeta peptides) which are the major components of the Alzheimer senile plaque are derived from the carboxy terminal end of APP (beginning at amino acid #597) and contain only 39 – 43 amino acids. About 1/3 of the 39 – 43 amino acid amyloid beta peptide (A beta peptide) is found within the transmembrane segment of APP, the other two thirds being found just outside the membrane. So to get A beta peptides the APP must be cut (more than once) at its carboxy terminal end.

For Abetaxx (xx between 39 and 43) to be formed, cleavage must occur outside the membrane in which APP is embedded by beta secretase. This produces a soluble extracellular fragment, with the rest embedded in the membrane (this is called C99). Then gamma secretase (another enzyme) cleaves C99 within the membrane forming the Abeta peptides, which constitute much of the senile plaque of Alzheimer’s disease.

Alpha secretase (yet another enzyme) also cleaves the APP in its carboxy terminal extramembranous part, but does so closer to the membrane, so that part of the protein which would form the Abeta peptide is removed.

R. Schekman personal communication (2012): The Abeta peptide appears to be cleaved by gamma secretase from the fragment generated by beta secretase. However, this happens well inside the cell in the last station of the Golgi apparatus. Then Abeta is swept out of the cell by the secretory pathway. So all this happens INSIDE the cell, rather than at the neuron’s extracellular membrane (which is what I thought).

Remarkably it is very difficult (for me at least) to find out just at what amino acids of the amyloid precursor protein(s) the 3 secretases (alpha, beta, gamma) cleave.

[ Nature vol. 526 pp. 443 – 447 ’15 ] describes a totally new kid on the block, which (if replicated) should make us rethink everything we thought we knew about the amyloid precursor protein and the Abeta peptide. Another set of carboxy terminal fragments (CTFs) called CTFneta is formed from the amyloid precursor protein (APP). Formation is mediated (in part) by MT5-MMP, a matrix metalloprotease. (In grad school neta is how we pronounced the Greek letter eta, which looks like a script N). The authors call the enzymatic activity forming them neta-secretase (clearly not all the enzymes which do this have been identified at this point). At least the authors tell you where the neta secretases cleave APP695 (between amino acids #504 – #505). This is amino terminal to the beta and alpha sites (which are at higher amino acid numbers) and to the gamma site (which is at a higher number still). Alpha and beta secretase then work on CTFneta to produce shorter peptides, called Aneta-alpha and Aneta-beta.
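The numbers given so far are enough to work out the sizes of a few of these fragments. The sketch below assumes the #597 start of Abeta quoted earlier is in APP695 numbering and that beta secretase cuts immediately before it; both are my assumptions, and the names are invented. Since the alpha site isn’t given here, Aneta-alpha is left out.

```python
APP695_LENGTH = 695        # residues in the brain isoform
ABETA_START = 597          # first residue of Abeta, as quoted earlier in the post
NETA_CUT_AFTER = 504       # neta-secretase cleaves between residues 504 and 505


def fragment_length(first, last):
    """Number of residues in a fragment spanning first..last inclusive."""
    return last - first + 1


# Assumption: beta secretase cuts immediately before ABETA_START (APP695 numbering).
c99 = fragment_length(ABETA_START, APP695_LENGTH)
ctf_neta = fragment_length(NETA_CUT_AFTER + 1, APP695_LENGTH)
aneta_beta = fragment_length(NETA_CUT_AFTER + 1, ABETA_START - 1)

print("C99:", c99, "CTFneta:", ctf_neta, "Aneta-beta:", aneta_beta)
```

That the membrane-bound stub comes out at 99 residues, matching the name C99, is some reassurance that the numbering assumption is right.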

This isn’t idle chatter as Aneta-alpha, and Aneta-beta are found in the dystrophic neurites in an Alzheimer mouse model (human work is sure to follow). Inhibition of beta secretase activity results in accumulation of CTFneta and Aneta-alpha.

Aneta-alpha itself lowers long term potentiation (LTP) in hippocampal slices (LTP is considered by most to be the best molecular and physiological model we have of learning). As judged by intracellular calcium levels, hippocampal neuronal activity is also inhibited by Aneta-alpha.

What’s fascinating about all this is that the work possibly explains why the huge amount of money big pharma has spent on beta secretase inhibitors has been such a failure.

Maybe chemistry just isn’t that important in wiring the brain

Even the strongest chemical ego may not survive a current paper which states that the details of ligand receptor binding just aren’t that important in wiring the fetal brain.

The paper starts noting that there isn’t enough information in our 3.2 gigaBase genome to specify each and every synapse. Each cubic milliMeter of cerebral cortex is stated to contain a billion of them [ Cell vol. 163 pp. 277 – 280 ’15 ].

If you have enough receptors and ligands and use them combinatorially, you actually can specify quite a few synapses. We have 70 different protocadherin gene products found on the neuronal surface. They can bind to each other and themselves. The fruitfly has the dscam genes which guide axons to their proper position. Because of alternative splicing some 38,016 dscam isoforms are possible.
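Where does a number like 38,016 come from? It’s just multiplication. The sketch below uses the usual textbook counts of mutually exclusive alternative exons in the four variable clusters of Drosophila Dscam1 (12, 48, 33 and 2); those counts are not from the paper discussed here, and the equal-likelihood assumption in the last line is purely for illustration.

```python
from math import prod

# Alternative exons per cluster in Drosophila Dscam1 (textbook values, not from the post).
dscam_exon_choices = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}

# One exon is picked from each cluster, so the choices multiply.
isoforms = prod(dscam_exon_choices.values())
print(isoforms)

# Chance that two independently chosen isoforms match, assuming (unrealistically)
# that every isoform is equally likely.
print(1 / isoforms)
```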

It’s not too hard to think of these different proteins on the neuronal surface as barcodes, specifying which neuron will bind to which.

Not so, says [ Cell vol. 163 pp. 285 – 291 ’15 ]. What is important is that there are a lot of them, and that a neuron expressing one of them is unlikely to bump into another neuron carrying the same one. Neurons ‘like’ to form synapses, and will even form synapses with themselves (one process synapsing on another) if nothing else is around. These self synapses are called autapses. How likely is this? Well under each square millimeter of cortex in man there are some 100,000 neurons, and each neuron has multiple dendrites and axons. Self synapse formation is a real problem.

The paper says that the structure of all these protocadherins, dscams and similar surface molecules is irrelevant to the program they are carrying out: not synapsing on yourself. If a process bumps into another in the packed cortex with the same surface molecule, the ‘homophilic’ binding prevents self-synapse formation. So the chemical diversity is just the instantiation of the ‘don’t synapse with yourself’ rule. Just what this diversity is chemically is less important than the fact that there is a lot of it.

This is another example of “It’s not the bricks, it’s the plan” in another guise — https://luysii.wordpress.com/2015/09/27/it-aint-the-bricks-its-the-plan/

The next big drug target – II

In a post a week ago I argued that the next big drug target was the protein protein interface. The PNAS of 6 Oct had a paper indirectly confirming just that [ Proc. Natl. Acad. Sci. vol. 112 pp. E5486 – E5495 ’15 ]. What they did was fairly simple (intellectually) but a lot of work. They just analyzed the PanCancer compendium of somatic mutations from 4,742 tumors relative to all known 3 dimensional structures of human proteins in the Protein Data Bank. They looked for clustering of the mutations on the protein surface (or interior). As you all know, although proteins are a linear string of amino acids, they fold up like a hair ball, so widely separated amino acids in the sequence may be right next to each other in the 3 dimensional structure of the protein.

What’s so confirmatory of the previous post was that they found enrichment of mutations in the interfaces between a variety of oncoproteins and other proteins (including tumor suppressors). Most of the significant interfaces carried mutations in both interaction partners. Overall, they found 50 different proteins with clustering of mutations and/or enrichment of mutations at interaction interfaces. Here are the names of a few of the culprits for the cognoscenti: FBXW7-CCNE1, HRAS-RASA1, CUL4B-CAND1, OGT-HCFC1, PPP2R1A-PPP2R5C/PPP2R2A, DICER1-Mg2+, MAX-DNA, SRSF2-RNA, and others. The paper contains much more detail than this and discusses the significance of the protein pairs shown above. One example should suffice.

FBXW7-CCNE1. Cyclin E1 (CCNE1) is a critical cell cycle protein, which at abnormally high levels promotes premature cell division, genomic instability, and tumorigenesis. FBXW7 (F-box/WD repeat-containing protein 7) is a substrate recognition component of an E3 ubiquitin-protein ligase complex, mediating the ubiquitination and subsequent proteasomal degradation of CCNE1 and other cancer proteins like MYC and JUN. We found that all six recurrently mutated residues (found in at least three samples from our mutation dataset) of FBXW7 clustered together at the WD40 propeller domain of the protein product. Four of them, R465, R479, R505, and R689, interacted directly with the substrate CCNE1 through hydrogen bonds (Fig. 5A). Changes in these residues could perturb the interaction, causing insufficient ubiquitination/degradation of CCNE1 in tumor samples (as has been previously shown in model systems).

****
Here’s the post of a week ago

The next big drug target

So many of the molecular machines used in the cell are composed of many different proteins held together by nonCovalent interactions. The Mediator complex contains 25 – 30 proteins with a mass of 1.6 megaDaltons, RNA polymerase contains 12 subunits, the general transcription factors contain 25 proteins, our ribosome with a mass of 4.3 megaDaltons contains 47 in the large subunit and 33 in the small. The list goes on and on: proteasome, nucleosome, post-synaptic density.

The typical protein/protein interface has an area of 1,000 – 2,000 square Angstroms, equivalent to circles of diameter between about 36 and 50 Angstroms [ Proc. Natl. Acad. Sci. vol. 101 pp. 16437 – 16441 ’04 ]. Think of the largest classical organic molecule you’ve ever made (not any polymer like a protein, polynucleotide, or polysaccharide). It isn’t anywhere close to this.
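For anyone who wants to check the circle arithmetic, the diameter of a circle of area A is 2 * sqrt(A / pi). A quick sketch (the function name is mine):

```python
import math


def circle_diameter(area):
    """Diameter of a circle with the given area (same length units, squared)."""
    return 2 * math.sqrt(area / math.pi)


for area in (1000, 2000):      # square Angstroms, the range quoted above
    print(area, round(circle_diameter(area), 1))
```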

Yet I’m convinced that drugs targeting these complexes will be useful. Classical organic chemistry will be useless in designing them. We’ll have to forget our beloved SN1, SN2, nonclassical carbonium ions etc. etc. We need some new sort of physical organic chemistry, one not concerned with reaction mechanism, but with van der Waals interactions, electrostatic interactions. At least stereochemistry will still be important.

The problem is much harder than designing enzyme inhibitors, or their allosteric modifiers, because the target is so large.

What follows are some notes on the protein protein interface I’ve taken over the years to get you started thinking. Good luck. Don’t expect any neat answers. There is a lot of contention concerning the nature of the binding occurring at the interface.

Many of the references aren’t particularly new. In my reading, I don’t try for the latest reference, but the newest idea that I’m unfamiliar with. I think they pretty much cover the territory as it stands now.

[ Proc. Natl. Acad. Sci. vol. 108 pp. 603 – 608 ’11 ] A very interesting article argues that worms and humans have about the same number of proteins (20,000) because if they had more, nonspecific protein protein interactions would cause disease. The achievable energy gap favoring specific over nonspecific binding decreases with protein number in a power law fashion (in their model). The optimization of binding interfaces favors networks in which a few proteins have many partners and most proteins have just a few — this is consistent with a scale free network topology.

[ Proc. Natl. Acad. Sci. vol. 101 pp. 16437 – 16441 ’04 ] The hot spot theory of protein protein interactions says that the binding energy between two proteins is governed in large part by just a few critical residues at the binding interface. In a typical interface of 1000 – 2000 square Angstroms, only 5% of the residues from each protein contribute more than 2 kiloCalories/mole to the binding interaction. (This is controversial — see later)

[ Proc. Natl. Acad. Sci. vol. 99 pp. 14116 – 14121 ’02 ] Specific replacement of amino acids in the interface by alanine (alanine scanning or alanine mutagenesis) and measuring the effect on the interaction has led to the idea that only a small set of ‘hot spot’ residues at the interface contribute to the binding free energy. A hot spot has been defined as a residue that when mutated to alanine leads to a significant drop in the binding constant (typically 10 fold or higher; at room temperature a 10 fold drop is RT ln 10, or about 1.4 kiloCalories/mole). This was well worked out for human growth hormone (HGH) and its receptor. Subsequently ‘many’ other studies have suggested that the presence of a few hot spots may be a general characteristic of most protein/protein interfaces.
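The conversion between a fold change in binding constant and a free energy change is the standard relation ΔΔG = RT ln(fold). Here is a small sketch, assuming a temperature of 298.15 K; the function names are mine, for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def ddg_from_fold(fold, temp_k=298.15):
    """Free energy change (kcal/mol) for a given fold change in binding constant."""
    return R_KCAL * temp_k * math.log(fold)

def fold_from_ddg(ddg_kcal, temp_k=298.15):
    """Fold change in binding constant for a given free energy change (kcal/mol)."""
    return math.exp(ddg_kcal / (R_KCAL * temp_k))

print(f"10 fold change   -> {ddg_from_fold(10):.2f} kcal/mol")
print(f"2 kcal/mol change -> {fold_from_ddg(2.0):.0f} fold")
```

So a 2 kiloCalorie/mole hot spot corresponds to roughly a 30 fold change in binding constant, not 10 fold.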

However there is extreme variation in the size, shape, amino acid character and solvent content of the protein/protein interface. It is not obvious from looking at structural contacts which residues are important for binding. Usually they are found at the center of the interface but sometimes the key residues can lie on the periphery. Peripheral residues serve as an O-ring to exclude solvent from the center. A lowered effective dielectric constant in a ‘dryer’ environment strengthens electrostatic and hydrogen bonding interactions. An interaction deleted by alanine mutagenesis in the periphery can be replaced by a water molecule in the periphery and hence cause less loss in stability (this calls the whole concept of alanine scanning into question).

Interestingly, there is no general correlation between ‘surface accessibility’ and the contribution of a residue to the binding energy.

Polar residues (Arg, Gln, His, Asp, and Asn) are conserved in interfaces. This implies that they are hot spots — implies ? don’t they know? haven’t they tested? However, many interaction hot spots involve hydrophobic or large aromatic residues (also hydrophobic). It is unclear whether buried polar interactions are energetically net stabilizing or merely facilitating specificity (how would you tell?).

Some residues without significant contacts in the interface apparently contribute substantially to the free energy of binding when assayed by alanine scanning mutagenesis, because of destabilization of the unbound protein.

This is a report of a free energy function (using packing interactions, hydrogen bonds and an implicit solvation model) which predicts 79% of all interface hot spots. They think that a description of polar interactions with Coulomb electrostatics with a linear distance dependent dielectric constant. ??? The latter ignores the orientation dependence of the hydrogen bond. Also the assumption that acidic or basic residues largely buried in the interface are charged may be wrong. The enthalpic gains of ionization are offset by the cost of desolvating polar groups, and the loss in side chain conformational entropy.

[ Proc. Natl. Acad. Sci. vol. 101 pp. 16437 – 16441 ’04 ] It is of interest to find out if hot spot theory applies to transient protein protein interactions (such as those involved in enzyme catalysis). This work looked for them in the process of protein substrate recognition for the Cdc25 phosphatase (which dephosphorylates the cyclin dependent kinases). Crystal structures of the catalytic domains of Cdc25A and Cdc25B have shown a shallow active site with no obvious features for mediating substrate recognition. This suggests a broad protein interface rather than a lock and key interaction. This is confirmed by the activity of the Cdc25 phosphatases toward Cdk/cyclin protein substrates, which is 6 orders of magnitude greater than that toward peptidic substrates containing the same primary sequence. The shallow active site also correlates with the lack of potent specific inhibitors of the Cdc25 phosphatases, despite extensive search. This work finds hot spot residues in the catalytic domain (not the catalytic site) of Cdc25B located 20 – 30 Angstroms away from the active site. They are involved in recognition of substrate. The residues are conserved across eukaryotes.

[ Proc. Natl. Acad. Sci. vol. 101 pp. 11287 – 11292 ’04 ] One can study the effects of mutating a single amino acid on two separate rates (the on rate and the off rate), the ratio of which is the equilibrium constant. Mutations changing the on rate concern the specificity of protein protein interaction. Mutations only changing the off rate do not affect the transition state of protein binding (don’t see why not). Mutations in bovine pancreatic trypsin inhibitor (BPTI) have been found at positions #15 and #17 which differentially affect on and off rates. K15A decreases the on rate 200 fold and increases the off rate 1000 fold. R17A doesn’t change the on rate but also increases the off rate 1000 fold.
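Since the equilibrium constant is a ratio of the two rates, the quoted rate changes combine multiplicatively. The sketch below assumes the dissociation constant convention Kd = k_off / k_on (the notes don't say which convention the paper uses) and takes the fold changes straight from the text.

```python
def kd_fold_change(on_rate_factor, off_rate_factor):
    """Fold change in Kd = k_off / k_on, given the factor by which each rate changes."""
    return off_rate_factor / on_rate_factor

# K15A: on rate down 200 fold, off rate up 1000 fold
k15a = kd_fold_change(on_rate_factor=1 / 200, off_rate_factor=1000)
# R17A: on rate unchanged, off rate up 1000 fold
r17a = kd_fold_change(on_rate_factor=1.0, off_rate_factor=1000)

print(f"K15A weakens binding about {k15a:,.0f} fold")
print(f"R17A weakens binding about {r17a:,.0f} fold")
```

Both mutants lose the same factor on the off rate; the extra 200 fold for K15A comes entirely from the on rate.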

The concept of anchor residue arose in the study of peptide binding to class I MHC molecules (Major HistoCompatibility complex). In this system the carboxy terminal side chain of the peptide gets buried in pocket F of the MHC binding groove. Sometimes, one also finds a second anchor residue and even a third one buried at other positions.

The authors attempt to apply the anchor residue concept to protein protein interactions. They studied 39 different protein/protein complexes. They found them, and in some way conclude that these anchor residues are already in the ‘bound’ conformation in the free partner. The anchors interact with structurally constrained pockets matching the anchor residues. The presence of nativelike anchor side chains provides a readily attainable geometrical fit that jams the two interacting surfaces, allowing for the recognition and stabilization of a near-native intermediate. Subsequently an induced fit process occurs on the periphery of the binding pocket.

The analysis of ANY (really?) protein/protein complex at the atomic length scale shows that the interface, rather than being smooth and flat, includes side chains deeply protruding into well defined cavities on the other protein. In all complexes studied, the anchor is the side chain whose burial after complex formation yields the largest possible decrease in solvent accessible surface area (SASA). If SASA is over 100 square Angstroms, then only one anchoring interaction is present. For lesser SASA amino acids one anchor isn’t enough.

In all cases tested (39) latch side chains are found in conformations conducive to a relatively straightforward clamping of the anchored intermediate into a high affinity complex.

[ Proc. Natl. Acad. Sci. vol. 102 pp. 57 – 62 ’05 ] An analysis of the protein interface between a beta-lactamase and its inhibitor shows that the interface can be divided into clusters (by means of cluster analysis) using multiple mutant analysis and xray crystallography. Replacing an entire module of 5 interface residues with alanine (in one cluster) created a large cavity in the interface with no effect on the detailed structure of the remaining interface. They obtained similar results when they did this with another of the 5 clusters.

Mutating a single amino acid at a time has been done in the past, but the results of single mutations aren’t additive (i.e. they aren’t linear, no surprise). The sum of the loss in free energy of all of the single mutations within a cluster exceeds by 4 fold the loss in free energy generated when all of the residues of the cluster are mutated simultaneously. The energetic effect of many single mutations is larger than their net contribution due to a penalty paid by leaving the rest of the cluster behind.

“Binding seems to be a result of higher organization of the binding sites, and not just of surface complementarity.”

[ Proc. Natl. Acad. Sci. vol. 103 pp. 311 – 316 ’06 ] Two different ‘interactomes’ both show the same power law distribution of node sizes. However, when the two major S. cerevisiae protein/protein interaction experiments are compared with each other, only 150 of the THOUSANDS of interactions in each experiment are the same. A similar lack of agreement has been found for independent Y2H experiments in Drosophila.

This work says that desolvation of the interface is a major physical factor in protein/protein interactions. This model reproduces the scale free nature of the topology. The number of interactions made by a protein is correlated with the fraction of hydrophobic residues on its surface.

[ Proc. Natl. Acad. Sci. vol. 108 pp. 13528 – 13533 ’11 ] The drugs they are looking for disrupt specific protein protein interactions (PPIs). They use computational solvent mapping, which explores the protein surface using a variety of small probe molecules, along with a conformer generator to account for side chain flexibility. They studied unliganded proteins known to participate in PPIs. The surface cavities available at protein protein interfaces which can bind a small molecule inhibitor are rather different from those seen in traditional drug targets. The traditional targets have one or two disproportionately large pockets with an average volume of 260 cubic Angstroms; these account for the binding site for the endogenous ligand in over 90% of proteins. The average volume of pockets at protein protein interfaces is only 54 cubic Angstroms, the same as for all protein surface pockets. The interface contains 6 such small pockets (on average).
The binding sites of proteins generally include smaller regions called hotspots which are major contributors to the binding free energy. The results of experimental fragment screens confirm that the hotspots of proteins are characterized by their ability to bind a variety of small molecules and that the number of different probe molecules observed to bind to a particular site predicts the importance of the site and predicts overall druggability.
This work shows that the druggable sites in PPIs have concave topology and both hydrophobic and polar functionality. So the hotspots bind organic molecules having some polar groups decorating largely hydrophobic scaffolds. So druggable sites have a ‘general tendency’ to bind organic compounds with a variety of structures. Conformational flexibility at the binding site (by side chains?) allows the hotspots to expand to accommodate a ligand of druglike dimensions. This involves low energy side chain motions within 6 Angstroms of a hot spot.
So druggable sites at a PPI aren’t just sites complementary to particular organic functionality, but they have a general tendency to bind a variety of different organic structures.
The most important finding is that the druggable sites are detectable from the structure of the unliganded protein, even when substantial conformational adaptation is needed for optimal ligand binding.

[ Science vol. 347 pp. 673 – 677 ’15 ] Mapping the sequence space of 4 key amino acids in the E. coli protein kinase PhoQ which drive the recognition of its substrate (PhoP). For histidine kinases, mutating just 3 or 4 interfacial amino acids to match those in another kinase is enough to reprogram them. The key residues are Ala284, Val285, Ser288, Thr289.

All 20^4 = 160,000 variants of PhoQ at these positions were made, of which 1,659 were functional (implying significant degeneracy of the interface). There were 16 single mutants, 100 double, 544 triple and 998 quadruple mutants which were functional (1,658 in all, plus the wild type). There was an enrichment of hydrophobic and small polar residues at each position. Most bulky and charged residues appeared at low frequencies. Some substitutions were permissible individually, but not in combination. The combinations ACLV, TISV and SILS, each involving residues found individually in functional mutants at high frequency, were quite impaired in competition against wildtype PhoQ, so the effects of individual substitutions are context dependent (epistatic). Of the 100 functional double mutants, only 23 represent cases where both single mutants are functional. There are double mutants where neither single mutant is functional. 79/1,658 functional variants can’t be reached from the wild-type combination (AVST) without passing through a nonfunctional intermediate. They talk about the Hamming distance between mutants.
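To put the functional counts in context, one can enumerate the full sequence space and bin every variant by its Hamming distance (number of differing positions) from the wild-type AVST. This is only a counting sketch; the functional numbers are copied from the notes above, and nothing here models the actual experiment.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
WILD_TYPE = "AVST"                    # Ala284, Val285, Ser288, Thr289

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

# Count every 4-residue variant by its distance from the wild type
counts = Counter(hamming(v, WILD_TYPE) for v in product(AMINO_ACIDS, repeat=4))

# Functional mutants at each distance, as quoted in the text
functional = {1: 16, 2: 100, 3: 544, 4: 998}

print("total variants:", sum(counts.values()))
for d in sorted(functional):
    frac = functional[d] / counts[d]
    print(f"distance {d}: {functional[d]} of {counts[d]} functional ({frac:.1%})")
```

The fraction of functional variants drops steeply with each additional substitution, even though quadruple mutants are the largest group in absolute numbers.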

Finally some blue sky stuff — implying that (as usual) Nature got there first

[ Science vol. 341 pp. 1116 – 1120 ’13 ] Small Open Reading Frames (smORFs) code for peptides of under 100 amino acids. This work has shown that peptides as short as 11 amino acids are translated and provide essential functions during insect development. This work shows two peptides of 28 and 29 amino acids regulating calcium transport in the Drosophila heart. The peptides are found in man.
They think that smORFs can’t be dismissed as irrelevant, and function should be looked for.
[ Science vol. 1356 – 1358 ’15 ] The Drosophila polished-rice (Pri) sORF peptides (11 – 32 amino acids) trigger proteasome mediated processing converting the Shavenbaby transcription repressor into a shorter activator.
They think that sORFs/smORFs mimic protein binding interfaces and control protein interactions that way.

Numerology

Every class in grad school seemed to begin with a discussion of units. Eventually, Don Voet got fed up and said he preferred the hand stone fortnight system and was going to stick to it. However, even though we all love quantum mechanics dearly for predicting chemical reactivity and spectra, it tells us almost nothing about the events going on in our cells. It’s a crowded environment with objects large and small bumping into one another frequently and at high speeds. At room temperature, a molecule of nitrogen is moving at 500+ meters a second or over 1100 miles an hour. The water in our cells is moving even faster (the square root of 28/18, or about 1.25, times faster to be exact, since speed at a given temperature goes as the inverse square root of mass). It’s way too slow for relativity however.
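Molecules at the same temperature share the same average translational kinetic energy, so speed goes as the inverse square root of mass. A rough check of the speeds, assuming the ideal gas root-mean-square formula v = sqrt(3RT/M) at 300 K (applying it to water in a cell is a simplification; it gives only the thermal speed between collisions):

```python
import math

R = 8.314462618          # gas constant, J/(mol*K)
MPH_PER_M_PER_S = 2.23694

def rms_speed(molar_mass_g, temp_k=300.0):
    """Root-mean-square thermal speed in m/s for a molar mass given in g/mol."""
    return math.sqrt(3 * R * temp_k / (molar_mass_g / 1000.0))

n2 = rms_speed(28.0)   # nitrogen, N2
h2o = rms_speed(18.0)  # water

print(f"N2:  {n2:.0f} m/s ({n2 * MPH_PER_M_PER_S:.0f} mph)")
print(f"H2O: {h2o:.0f} m/s, {h2o / n2:.2f} times faster than N2")
```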

So it’s back to classical mechanics to understand cellular events at a physical level, something that will be increasingly important in drug design (but that’s for another post).

The average thermal energy of a molecule at room temperature is kT.

What’s k? It’s the Boltzmann constant. What’s that? It’s the gas constant divided by Avogadro’s number.

I’m assuming that all good chemists know that Avogadro’s number is the number of molecules in a Mole = 6.02 x 10^23

What does the Gas constant have to do with energy?

It’s back to PChem 101 — The ideal gas law is PV = nRT

P = Pressure
V = Volume
n = number of moles
R = Gas constant
T = Temperature

Pressure is Force / Area

Force is Mass * Acceleration
Acceleration is Distance/ (Time * Time)
Area is Distance * Distance
Volume is Distance * Distance * Distance

So PV == [ Force/Area ] * Volume
== { [ Mass * ( Distance / (Time * Time) ) ] / ( Distance * Distance ) } * ( Distance * Distance * Distance )
== Mass * (Distance/Time) * ( Distance/Time )
== Mass * Velocity * Velocity == mv^2

So PV has the dimensions of (kinetic) energy
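The bookkeeping above can be done mechanically. Here is a small sketch that represents each quantity by its exponents of mass, length and time, then checks that pressure times volume matches mass times velocity squared. The `Dim` class is a hypothetical helper of my own, not a library type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dim:
    """Exponents of mass, length and time for a physical quantity."""
    mass: int = 0
    length: int = 0
    time: int = 0

    def __mul__(self, other: "Dim") -> "Dim":
        # Multiplying quantities adds their exponents.
        return Dim(self.mass + other.mass,
                   self.length + other.length,
                   self.time + other.time)

    def __truediv__(self, other: "Dim") -> "Dim":
        # Dividing quantities subtracts their exponents.
        return Dim(self.mass - other.mass,
                   self.length - other.length,
                   self.time - other.time)


MASS, DISTANCE, TIME = Dim(mass=1), Dim(length=1), Dim(time=1)

acceleration = DISTANCE / (TIME * TIME)
force = MASS * acceleration
area = DISTANCE * DISTANCE
volume = DISTANCE * DISTANCE * DISTANCE
pressure = force / area
velocity = DISTANCE / TIME

pv = pressure * volume
kinetic_energy = MASS * velocity * velocity

print("PV            :", pv)
print("m * v * v     :", kinetic_energy)
print("same dimension:", pv == kinetic_energy)
```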

The Gas Constant (R) is PV/nT ( == PV/T for one mole ) so it has the dimensions of energy/temperature per mole

Now for some actual units (vs. dimensions, although things are much clearer when you think in terms of dimensions)

Force is measured in Newtons which is the force which will accelerate a 1 kiloGram object by 1 meter/second^2

Temperature is measured in Kelvin from absolute zero. A degree Kelvin is the same size as 1 degree Celsius (1.8 degrees Fahrenheit)

Room temperature where most of us live is about 27 Centigrade or very close to 300 Kelvin.

So the Boltzmann constant (k) is basically energy/temperature per single molecule, which is really what you want to think about when you think about physical processes in the cell.

At room temperature kT works out to 4.1 x 10^-21 Joules.
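For anyone who wants to check the arithmetic, here is a short sketch that gets k by dividing the gas constant by Avogadro's number and then multiplies by 300 Kelvin. The numerical values of R and Avogadro's number are the standard SI ones, filled in by me rather than taken from the text above.

```python
R = 8.314462618      # gas constant, J/(mol*K)
N_A = 6.02214076e23  # Avogadro's number, molecules per mole
T_ROOM = 300.0       # room temperature, Kelvin

k = R / N_A          # Boltzmann constant, J/K per molecule
kT = k * T_ROOM      # thermal energy scale, Joules

print(f"k  = {k:.4e} J/K")
print(f"kT = {kT:.2e} J at {T_ROOM:.0f} K")
```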

What’s a Joule? It’s the energy a force of one Newton produces when it moves an object one meter (or you can look at it as the kinetic energy one kilogram has after a force of one Newton has accelerated it over one Meter’s distance).

So a Joule is one Newton * meter

Well 10^-21 is 10^-12 times 10^-9. So what?

This means that at room temperature the average molecule has a thermal energy of 4.1 picoNewton – nanoMeters.

PicoNewtons just happen to be in the range of the force exerted by our molecular motors ( kinesin, dynein, DNA polymerases ) and nanoMeters the range of distances over which they exert forces (act).

Not a coincidence.

Since there are organisms which live at temperatures 20% higher, it would be interesting to know if their motors exert 20% more force. Does anyone out there know?
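What can be said for certain is that kT itself goes up by 20%, since it is linear in temperature. Here is a minimal sketch expressing kT in picoNewton-nanoMeters at 300 Kelvin and at a temperature 20% higher. The function name `kT_in_pN_nm` is mine, and the code says nothing about what the motors actually do.

```python
K_BOLTZMANN = 1.380649e-23  # Boltzmann constant, J/K
JOULE_TO_PN_NM = 1e21       # 1 J = 1 N*m = 1e12 pN * 1e9 nm


def kT_in_pN_nm(temperature_k: float) -> float:
    """Thermal energy scale kT expressed in picoNewton-nanoMeters."""
    return K_BOLTZMANN * temperature_k * JOULE_TO_PN_NM


room = kT_in_pN_nm(300.0)
hot = kT_in_pN_nm(300.0 * 1.2)  # 20% higher absolute temperature (360 K)

print(f"kT at 300 K: {room:.2f} pN*nm")
print(f"kT at 360 K: {hot:.2f} pN*nm")
print(f"ratio      : {hot / room:.2f}")
```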

More interesting even than that are the organisms living at the mid-Ocean ridges where, because of the extremely high pressures, the water coming from the vents is a lot hotter. What about their motors?
