Tag Archives: Protein protein interface

MicroExons

MicroExons have been known for a long time.  They are hard to find using the usual software tools because they are short, coding for just 1 (!) to 9 amino acids (3 – 27 nucleotides).  The neurologist is interested in them, because they are enriched in neurons.

 

The evolutionist is interested in them because they are found on the surfaces of proteins, so that adding their amino acids potentially modifies protein protein interactions (example later) since these are largely determined at the protein surface.  The typical protein interface is said to be a surface of 1,000 – 2,000 square Angstroms, so putting a few amino acids in the surface can change things radically.  Not only that, but alanine scanning of the interfaces shows that only a small set of ‘hot spot’ amino acids contribute to the free energy of binding at the protein/protein interface.  (A hot spot is operationally defined as an amino acid that, when mutated to alanine, leads to a greater than 10 fold drop in the binding constant.)  Let’s hear it for the blind watchmaker for figuring out a way to accelerate evolution by moderating protein protein interactions directly.

It is interesting  that the vast majority of microexons contain multiples of 3 nucleotides (this prevents them from producing a frameshift in the mRNA in which they are found).  This implies that natural selection is at work on them.
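The frameshift argument can be checked on a toy sequence. What follows is a minimal sketch, not anything from the cited work: the sequences and the helper names (`codons`, `downstream_frame_preserved`) are made up for illustration. It splices a short insert into a message and asks whether the codons downstream still read the same.

```python
def codons(seq: str) -> list[str]:
    """Split a nucleotide string into complete codons, dropping leftover bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

# Hypothetical toy sequences, not taken from any real gene.
UPSTREAM = "ATGGCT"        # two codons before the insertion point
DOWNSTREAM = "GAAAAGTTC"   # three codons after it: GAA AAG TTC

def downstream_frame_preserved(microexon: str) -> bool:
    """True if the downstream codons read the same after the insert is spliced in."""
    spliced = UPSTREAM + microexon + DOWNSTREAM
    n = len(codons(DOWNSTREAM))
    return codons(spliced)[-n:] == codons(DOWNSTREAM)

print(downstream_frame_preserved("CGTGAT"))  # 6 nucleotides, a multiple of 3
print(downstream_frame_preserved("CGTGA"))   # 5 nucleotides, not a multiple of 3
```

The 6 nucleotide insert leaves the downstream codons intact, while the 5 nucleotide insert shifts every codon after it.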

Most microExons show high inclusion rates at late stages of neuronal differentiation in genes associated with axon formation and synapse function.  One example: a neural specific microExon in Protrudin increases its interaction with Vesicle Associated Membrane Protein (VAMP) to promote neurite outgrowth.

A protein called SRRM4 controls the inclusion of most neuronal microexons known so far.  Of all known tissue types the human retina has the largest program of tissue enriched microExons.  Some of these microexons are found only in photoreceptor cells.  Ectopic expression of SRRM4 is enough to drive the inclusion of most retinal microExons in nonphotoreceptor cells.

For lots of current references on microExons, particularly those in the retina, please see Proc. Natl. Acad. Sci. vol. 119 e2117090119 ’22

I haven’t been posting for a while because of a computer disaster.  All of the notes I’ve taken on the literature and elsewhere were on an old iMac running HyperCard (which just crashed). As my son was told at USC, there are two types of disc drives.  Those that have crashed, and those that haven’t crashed (yet).  I’ve got another one, but I’m busy programming to transfer the data and metadata to Mathematica.  There is plenty to post about, which should be forthcoming.

The next big drug target – II

In a post a week ago I argued that the next big drug target was the protein protein interface. The PNAS of 6 Oct had a paper indirectly confirming just that [ Proc. Natl. Acad. Sci. vol. 112 pp. E5486 – E5495 ’15 ] What they did was fairly simple (intellectually) but a lot of work. They just analyzed the PanCancer compendium of somatic mutations from 4,742 tumors relative to all known 3 dimensional structures of human proteins in the Protein Data Bank. They looked for clustering of the mutations — on the protein surface (or interior). As you all know, although proteins are a linear string of amino acids, they fold up like a hair ball, so widely separated amino acids in the sequence may be right next to each other in the 3 dimensional structure of the protein.
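The point about residues far apart in the sequence sitting next to each other in the folded protein is easy to make concrete. The sketch below is only an illustration under stated assumptions: the coordinates are invented, the residue numbers are arbitrary, and the 8 Angstrom cutoff and minimum sequence separation of 10 are my own choices, not the paper's method.

```python
import math
from itertools import combinations

def sequence_distant_contacts(coords, min_seq_sep=10, cutoff=8.0):
    """Return residue pairs far apart in sequence but close in 3D space.

    coords maps residue number -> (x, y, z) in Angstroms.
    """
    pairs = []
    for (i, a), (j, b) in combinations(sorted(coords.items()), 2):
        if j - i >= min_seq_sep and math.dist(a, b) <= cutoff:
            pairs.append((i, j))
    return pairs

# Invented coordinates for four residues of a hypothetical protein.
TOY = {
    10: (0.0, 0.0, 0.0),
    50: (3.0, 4.0, 0.0),
    120: (6.0, 0.0, 0.0),
    300: (40.0, 0.0, 0.0),
}

print(sequence_distant_contacts(TOY))
```

Residues 10, 50 and 120 form a spatial cluster despite being spread over more than 100 positions of sequence, while residue 300 is nowhere near them.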

What’s so confirmatory of the previous post was that they found enrichment of mutations in the interfaces between a variety of oncoproteins and other proteins (including tumor suppressors). Most of the significant interfaces carried mutations in both interaction partners. Overall, they found 50 different proteins with clustering of mutations and/or enrichment of mutations at interaction interfaces. Here are the names of a few of the culprits for the cognoscenti: FBXW7-CCNE1, HRAS-RASA1, CUL4B-CAND1, OGT-HCFC1, PPP2R1A-PPP2R5C/PPP2R2A, DICER1-Mg2+, MAX-DNA, SRSF2-RNA, and others. The paper contains much more detail than this and discusses the significance of the protein pairs shown above. One example should suffice.

FBXW7-CCNE1. Cyclin E1 (CCNE1) is a critical cell cycle protein, which at abnormally high levels promotes premature cell division, genomic instability, and tumorigenesis. FBXW7 (F-box/WD repeat-containing protein 7) is a substrate recognition component of an E3 ubiquitin-protein ligase complex, mediating the ubiquitination and subsequent proteasomal degradation of CCNE1 and other cancer proteins like MYC and JUN. We found that all six recurrently mutated residues (found in at least three samples from our mutation dataset) of FBXW7 clustered together at the WD40 propeller domain of the protein product. Four of them, R465, R479, R505, and R689, interacted directly with the substrate CCNE1 through hydrogen bonds (Fig. 5A). Changes in these residues could perturb the interaction, causing insufficient ubiquitination/degradation of CCNE1 in tumor samples (as has been previously shown in model systems).

****
Here’s the post of a week ago

The next big drug target

So many of the molecular machines used in the cell are composed of many different proteins held together by nonCovalent interactions. The Mediator complex contains 25 – 30 proteins with a mass of 1.6 megaDaltons, RNA polymerase contains 12 subunits, the general transcription factors contain 25 proteins, our ribosome with a mass of 4.3 megaDaltons contains 47 proteins in the large subunit and 33 in the small. The list goes on and on: proteasome, nucleosome, post-synaptic density.

The typical protein/protein interface has an area of 1,000 – 2,000 square Angstroms, or circles of diameter between roughly 36 and 50 Angstroms. [ Proc. Natl. Acad. Sci. vol. 101 pp. 16437 – 16441 ’04 ]. Think of the largest classical organic molecule you’ve ever made (not any polymer like a protein, polynucleotide, or polysaccharide). It isn’t anywhere close to this.
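For anyone who wants to check the circle arithmetic, here is a small sketch. It assumes the interface is a flat circle, which is only a way of getting a feel for the size; the function name is mine.

```python
import math

def equivalent_circle_diameter(area_sq_angstroms: float) -> float:
    """Diameter of a circle with the given area, from A = pi * (d / 2) ** 2."""
    return 2.0 * math.sqrt(area_sq_angstroms / math.pi)

for area in (1000.0, 2000.0):
    d = equivalent_circle_diameter(area)
    print(f"{area:.0f} square Angstroms -> diameter of about {d:.1f} Angstroms")
```

This gives about 35.7 Angstroms for 1,000 square Angstroms and about 50.5 Angstroms for 2,000.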

Yet I’m convinced that drugs targeting these complexes will be useful. Classical organic chemistry will be useless in designing them. We’ll have to forget our beloved SN1, SN2, nonclassical carbonium ions etc. etc. We need some new sort of physical organic chemistry, one not concerned with reaction mechanism, but with van der Waals and electrostatic interactions. At least stereochemistry will still be important.

The problem is much harder than designing enzyme inhibitors, or their allosteric modifiers, because the target is so large.

What follows are some notes on the protein protein interface I’ve taken over the years to get you started thinking. Good luck. Don’t expect any neat answers. There is a lot of contention concerning the nature of the binding occurring at the interface.

Many of the references aren’t particularly new. In my reading, I don’t try for the latest reference, but the newest idea that I’m unfamiliar with. I think they pretty much cover the territory as it stands now.

[ Proc. Natl. Acad. Sci. vol. 108 pp. 603 – 608 ’11 ] A very interesting article argues that worms and humans have about the same number of proteins (20,000) because if they had more, nonspecific protein protein interactions would cause disease. The achievable energy gap favoring specific over nonspecific binding decreases with protein number in a power law fashion (in their model). The optimization of binding interfaces favors networks in which a few proteins have many partners and most proteins have just a few — this is consistent with a scale free network topology.

[ Proc. Natl. Acad. Sci. vol. 101 pp. 16437 – 16441 ’04 ] The hot spot theory of protein protein interactions says that the binding energy between two proteins is governed in large part by just a few critical residues at the binding interface. In a typical interface of 1000 – 2000 square Angstroms, only 5% of the residues from each protein contribute more than 2 kiloCalories/mole to the binding interaction. (This is controversial — see later)

[ Proc. Natl. Acad. Sci. vol. 99 pp. 14116 – 14121 ’02 ] Specific replacement of amino acids in the interface by alanine (alanine scanning or alanine mutagenesis) and measuring the effect on the interaction has led to the idea that only a small set of ‘hot spot’ residues at the interface contribute to the binding free energy. A hot spot has been defined as a residue that when mutated to alanine leads to a significant drop in the binding constant (typically 10 fold or higher; since the free energy change is RT ln(fold), a 10 fold drop is about 1.4 kiloCalories/mole at room temperature, and 2 kiloCalories/mole corresponds to about 30 fold). This was well worked out for human growth hormone (HGH) and its receptor. Subsequently ‘many’ other studies have suggested that the presence of a few hot spots may be a general characteristic of most protein/protein interfaces.
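The conversion between a fold change in the binding constant and kiloCalories/mole is the standard relation ΔΔG = RT ln(fold). A minimal sketch follows, assuming a temperature of 298.15 K (my choice, the notes don't give one); the function names are mine.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal / (mol K)
T_KELVIN = 298.15   # assumed room temperature

def ddg_from_fold(fold: float) -> float:
    """Free energy change (kcal/mol) for a given fold change in the binding constant."""
    return R_KCAL * T_KELVIN * math.log(fold)

def fold_from_ddg(ddg_kcal: float) -> float:
    """Fold change in the binding constant for a given free energy change (kcal/mol)."""
    return math.exp(ddg_kcal / (R_KCAL * T_KELVIN))

print(f"10 fold    -> {ddg_from_fold(10):.2f} kcal/mol")
print(f"2 kcal/mol -> {fold_from_ddg(2.0):.0f} fold")
```

So the 10 fold alanine scanning definition and the 2 kiloCalories/mole threshold quoted above are not the same cutoff: 10 fold is about 1.4 kiloCalories/mole, and 2 kiloCalories/mole is close to 30 fold.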

However there is extreme variation in the size, shape, amino acid character and solvent content of the protein/protein interface. It is not obvious from looking at structural contacts which residues are important for binding. Usually they are found at the center of the interface but sometimes the key residues can lie on the periphery. Peripheral residues serve as an O-ring to exclude solvent from the center. A lowered effective dielectric constant in a ‘dryer’ environment strengthens electrostatic and hydrogen bonding interactions. An interaction deleted by alanine mutagenesis in the periphery can be replaced by a water molecule in the periphery and hence cause less loss in stability (this calls the whole concept of alanine scanning into question).

Interestingly, there is no general correlation between ‘surface accessibility’ and the contribution of a residue to the binding energy.

Polar residues (Arg, Gln, His, Asp, and Asn) are conserved in interfaces. This implies that they are hot spots — implies ? don’t they know? haven’t they tested? However, many interaction hot spots involve hydrophobic or large aromatic residues (also hydrophobic). It is unclear whether buried polar interactions are energetically net stabilizing or merely facilitating specificity (how would you tell?).

Some residues without significant contacts in the interface apparently contribute substantially to the free energy of binding when assayed by alanine scanning mutagenesis, because of destabilization of the unbound protein.

This is a report of a free energy function (using packing interactions, hydrogen bonds and an implicit solvation model) which predicts 79% of all interface hot spots. They think that a description of polar interactions with Coulomb electrostatics with a linear distance dependent dielectric constant. ??? The latter ignores the orientation dependence of the hydrogen bond. Also the assumption that acidic or basic residues largely buried in the interface are charged may be wrong. The enthalpic gains of ionization are offset by the cost of desolvating polar groups, and the loss in side chain conformational entropy.

[ Proc. Natl. Acad. Sci. vol. 101 pp. 16437 – 16441 ’04 ] It is of interest to find out if hot spot theory applies to transient protein protein interactions (such as those involved in enzyme catalysis). This work looked for them in the process of protein substrate recognition for the Cdc25 phosphatase (which dephosphorylates the cyclin dependent kinases). Crystal structures of the catalytic domains of Cdc25A and Cdc25B have shown a shallow active site with no obvious features for mediating substrate recognition. This suggests a broad protein interface rather than a lock and key interaction. This is confirmed by the activity of the Cdc25 phosphatases toward Cdk/cyclin protein substrates, which is 6 orders of magnitude greater than that toward peptidic substrates containing the same primary sequence. The shallow active site also correlates with the lack of potent specific inhibitors of the Cdc25 phosphatases, despite extensive search. This work finds hot spot residues in the catalytic domain (not the catalytic site) of Cdc25B located 20 – 30 Angstroms away from the active site. They are involved in recognition of substrate. The residues are conserved across eukaryotes.

[ Proc. Natl. Acad. Sci. vol. 101 pp. 11287 – 11292 ’04 ] One can study the effects of mutating a single amino acid on two separate rates (the on rate and the off rate), the ratio of which is the equilibrium constant. Mutations changing the on rate concern the specificity of protein protein interaction. Mutations only changing the off rate do not affect the transition state of protein binding (don’t see why not). Mutations in bovine pancreatic trypsin inhibitor (BPTI) have been found at positions #15 and #17 which differentially affect on and off rates. K15A decreases the on rate 200 fold and increases the off rate 1000 fold. R17A doesn’t change the on rate but also increases the off rate 1000 fold.
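Since the equilibrium constant is a ratio of the two rates, the fold changes quoted for the BPTI mutants combine by simple multiplication. The sketch below assumes the dissociation constant convention, Kd = off rate / on rate; the function name is mine and exact fractions are used only to keep the arithmetic clean.

```python
from fractions import Fraction

def kd_fold_change(on_rate_factor: Fraction, off_rate_factor: Fraction) -> Fraction:
    """Fold change in Kd = k_off / k_on, given multiplicative changes in each rate."""
    return off_rate_factor / on_rate_factor

# K15A: on rate down 200 fold, off rate up 1000 fold (figures from the notes above).
k15a = kd_fold_change(Fraction(1, 200), Fraction(1000))
# R17A: on rate unchanged, off rate up 1000 fold.
r17a = kd_fold_change(Fraction(1), Fraction(1000))

print(f"K15A weakens binding {int(k15a):,} fold")
print(f"R17A weakens binding {int(r17a):,} fold")
```

On these figures K15A weakens binding 200,000 fold overall and R17A 1,000 fold, even though their off rates change by the same amount.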

The concept of anchor residue arose in the study of peptide binding to class I MHC molecules (Major HistoCompatibility complex). In this system the carboxy terminal side chain of the peptide gets buried in pocket F of the MHC binding groove. Sometimes, one also finds a second anchor residue and even a third one buried at other positions.

The authors attempt to apply the anchor residue concept to protein protein interactions. They studied 39 different protein/protein complexes. They found them, and in some way conclude that these anchor residues are already in the ‘bound’ conformation in the free partner. The anchors interact with structurally constrained pockets matching the anchor residues. The presence of nativelike anchor side chains provides a readily attainable geometrical fit that jams the two interacting surfaces, allowing for the recognition and stabilization of a near-native intermediate. Subsequently an induced fit process occurs on the periphery of the binding pocket.

The analysis of ANY (really?) protein/protein complex at the atomic length scale shows that the interface, rather than being smooth and flat, includes side chains deeply protruding into well defined cavities on the other protein. In all complexes studied, the anchor is the side chain whose burial after complex formation yields the largest possible decrease in solvent accessible surface area (SASA). If the SASA buried is over 100 square Angstroms, then only one anchoring interaction is present. For lesser SASA amino acids one anchor isn’t enough.
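The anchor definition is just "take the side chain that loses the most solvent accessible surface on binding". Here is a minimal sketch of that selection. The residue labels and SASA values are invented, and in practice the per residue SASA of the free and bound states would come from a structure analysis program, which is assumed rather than shown.

```python
def find_anchor(sasa_free: dict, sasa_bound: dict):
    """Return (residue, buried SASA) for the side chain losing the most SASA on binding."""
    buried = {res: sasa_free[res] - sasa_bound[res] for res in sasa_free}
    anchor = max(buried, key=buried.get)
    return anchor, buried[anchor]

# Invented per-residue SASA values in square Angstroms.
SASA_FREE = {"res_A": 150.0, "res_B": 90.0, "res_C": 40.0}
SASA_BOUND = {"res_A": 20.0, "res_B": 50.0, "res_C": 35.0}

anchor, buried_area = find_anchor(SASA_FREE, SASA_BOUND)
single_anchor_expected = buried_area > 100.0  # threshold quoted in the notes above

print(anchor, buried_area, single_anchor_expected)
```

With these made-up numbers res_A buries 130 square Angstroms, which is over the 100 square Angstrom figure, so a single anchor would be expected.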

In all cases tested (39) latch side chains are found in conformations conducive to a relatively straightforward clamping of the anchored intermediate into a high affinity complex.

[ Proc. Natl. Acad. Sci. vol. 102 pp. 57 – 62 ’05 ] An analysis of the protein interface between a beta-lactamase and its inhibitor shows that the interface can be divided into clusters (by means of cluster analysis) using multiple mutant analysis and xray crystallography. Replacing an entire module of 5 interface residues with alanine (in one cluster) created a large cavity in the interface with no effect on the detailed structure of the remaining interface. They obtained similar results when they did this with another of the 5 clusters.

Mutating a single amino acid at a time has been done in the past, but the results of single mutations aren’t additive (i.e. they aren’t linear, no surprise). The sum of the loss in free energy of all of the single mutations within a cluster exceeds by 4 fold the loss in free energy generated when all of the residues of the cluster are mutated simultaneously. The energetic effect of many single mutations is larger than their net contribution due to a penalty paid by leaving the rest of the cluster behind.

“Binding seems to be a result of higher organization of the binding sites, and not just of surface complementarity.”

[ Proc. Natl. Acad. Sci. vol. 103 pp. 311 – 316 ’06 ] Two different ‘interactomes’ both show the same power law distribution of node sizes. However, when the two major S. cerevisiae protein/protein interaction experiments are compared with each other, only 150 of the THOUSANDS of interactions in each experiment are the same. A similar lack of agreement has been found for independent Y2H experiments in Drosophila.

This work says that desolvation of the interface is a major physical factor in protein/protein interactions. This model reproduces the scale free nature of the topology. The number of interactions made by a protein is correlated with the fraction of hydrophobic residues on its surface.

[ Proc. Natl. Acad. Sci. vol. 108 pp. 13528 – 13533 ’11 ] The drugs they are looking for disrupt specific protein protein interactions (PPIs). They use computational solvent mapping, which explores the protein surface using a variety of small probe molecules, along with a conformer generator to account for side chain flexibility. They studied unliganded proteins known to participate in PPIs. The surface cavities available at protein protein interfaces which can bind a small molecule inhibitor are rather different than those seen in traditional drug targets. The traditional targets have one or two disproportionately large pockets with an average volume of 260 cubic Angstroms; these account for the binding site for the endogenous ligand in over 90% of proteins. The average volume of pockets at protein protein interfaces is only 54 cubic Angstroms, the same as for all protein surface pockets. The interface contains 6 such small pockets (on average).

The binding sites of proteins generally include smaller regions called hotspots which are major contributors to the binding free energy. The results of experimental fragment screens confirm that the hot spots of proteins are characterized by their ability to bind a variety of small molecules and that the number of different probe molecules observed to bind to a particular site predicts the importance of the site and predicts overall druggability.

This work shows that the druggable sites in PPIs have concave topology and both hydrophobic and polar functionality. So the hotspots bind organic molecules having some polar groups decorating largely hydrophobic scaffolds. So druggable sites have a ‘general tendency’ to bind organic compounds with a variety of structures. Conformational flexibility at the binding site (by side chains?) allows the hotspots to expand to accommodate a ligand of druglike dimensions. This involves low energy side chain motions within 6 Angstroms of a hot spot.

So druggable sites at a PPI aren’t just sites complementary to particular organic functionality, but they have a general tendency to bind a variety of different organic structures.

The most important finding is that the druggable sites are detectable from the structure of the unliganded protein, even when substantial conformational adaptation is needed for optimal ligand binding.

[ Science vol. 347 pp. 673 – 677 ’15 ] This work maps the sequence space of 4 key amino acids in the E. coli protein kinase PhoQ which drive the recognition of its substrate (PhoP). For histidine kinases mutating just 3 or 4 interfacial amino acids to match those in another kinase is enough to reprogram them. The key residues are Ala284, Val285, Ser288, Thr289.

All 20^4 = 160,000 variants of PhoQ at these positions were made, of which 1,659 were functional (implying significant degeneracy of the interface). There were 16 single mutants, 100 double, 544 triple and 998 quadruple mutants which were functional. There was an enrichment of hydrophobic and small polar residues at each position. Most bulky and charged residues appeared at low frequencies. Some substitutions were permissible individually, but not in combination. The combinations ACLV, TISV and SILS, each involving residues found individually in functional mutants at high frequency, were quite impaired in competition against wildtype PhoQ, so the effects of individual substitutions are context dependent (epistatic). Of the 100 functional double mutants, only 23 represent cases where both single mutants are functional. There are double mutants where neither single mutant is functional. 79/1,658 functional variants can’t be reached from the wild-type combination (AVST) without passing through a nonfunctional intermediate. They talk about the Hamming distance between mutants.
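The counting, the Hamming distance, and the idea of a variant being unreachable without a nonfunctional intermediate can all be sketched in a few lines. This is an illustration under stated assumptions, not the paper's analysis: one step is taken to be one amino acid substitution, and the small functional set is invented (only the wild-type AVST comes from the notes above).

```python
from collections import deque
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length variants differ."""
    return sum(x != y for x, y in zip(a, b))

def reachable(start: str, functional: set) -> set:
    """Breadth-first search over functional variants, one substitution per step."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for pos in range(len(current)):
            for aa in AMINO_ACIDS:
                neighbor = current[:pos] + aa + current[pos + 1:]
                if neighbor in functional and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
    return seen

# Size of the full sequence space at 4 positions.
n_variants = sum(1 for _ in product(AMINO_ACIDS, repeat=4))

# Invented toy set of "functional" variants; only AVST (wild type) is from the notes.
TOY_FUNCTIONAL = {"AVST", "AIST", "TIST", "GLSA"}
connected = reachable("AVST", TOY_FUNCTIONAL)

print(n_variants, hamming("AVST", "GLSA"), sorted(TOY_FUNCTIONAL - connected))
```

In the toy set GLSA is functional but sits 3 substitutions from wild type with no functional neighbors, so it plays the role of one of the 79 variants that can't be reached without passing through a nonfunctional intermediate. The counts in the notes are also self consistent: 16 + 100 + 544 + 998 = 1,658 mutants, plus the wild type makes 1,659.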

Finally some blue sky stuff — implying that (as usual) Nature got there first

[ Science vol. 341 pp. 1116 – 1120 ’13 ] Small Open Reading Frames (smORFs) code for peptides of under 100 amino acids. This work has shown that peptides as short as 11 amino acids are translated and provide essential functions during insect development. This work shows two peptides of 28 and 29 amino acids regulating calcium transport in the Drosophila heart. The peptides are found in man.
They think that smORFs can’t be dismissed as irrelevant, and that function should be looked for.

[ Science vol. 1356 – 1358 ’15 ] The Drosophila polished-rice (Pri) sORF peptides (11 – 32 amino acids) trigger proteasome mediated processing converting the Shavenbaby transcription repressor into a shorter activator.
They think that sORFs/smORFs mimic protein binding interfaces and control protein interactions that way.

The next big drug target

So many of the molecular machines used in the cell are composed of many different proteins held together by nonCovalent interactions. The Mediator complex contains 25 – 30 proteins with a mass of 1.6 megaDaltons, RNA polymerase contains 12 subunits, the general transcription factors contain 25 proteins, our ribosome with a mass of 4.3 megaDaltons contains 47 in the large subunit and 33 in the small. The list goes on and on — proteasome,nucleosome, post-synaptic density.

The typical protein/protein interface has an area of 1,000 – 2000 square Angstroms — or circles of diameter between 34 and 50 Angstroms. [ Proc. Natl. Acad. Sci. vol. 101 pp. 16437 – 16441 ’04 ]. Think of the largest classical organic molecule you’ve ever made (not any polymer like a protein, polynucleotide, or polysaccharide). It isn’t anywhere close to this.

Yet I’m convinced that drugs targeting these complexes, will be useful. Classical organic chemistry will be useless in designing them. We’ll have to forget our beloved SN1, SN2, nonclassical carbonium ions etc. etc. We need some new sort of physical organic chemistry, one not concerned with reaction mechanism, but with van der Waals interactions, electrostatic interactions. At least stereochemistry will still be important.

The problem is much harder than designing enzyme inhibitors, or their allosteric modifiers, because the target is so large.

What follows are some notes on the protein protein interface I’ve taken over the years to get you started thinking. Good luck. Don’t expect any neat answers. There is a lot of contention concerning the nature of the binding occurring at the interface.

Many of the references aren’t particularly new.  In my reading, I don’t try for the latest reference, but the newest idea that I’m unfamiliar with.  I think they pretty much cover the territory as it stands now.

[ Proc. Natl. Acad. Sci. vol. 108 pp. 603 – 608 ’11 ] A very interesting article argues that worms and humans have about the same number of proteins (20,000) because if they had more, nonspecific protein protein interactions would cause disease. The achievable energy gap favoring specific over nonspecific binding decreases with protein number in a power law fashion (in their model). The optimization of binding interfaces favors networks in which a few proteins have many partners and most proteins have just a few — this is consistent with a scale free network topology.

[ Proc. Natl. Acad. Sci. vol. 101 pp. 16437 – 16441 ’04 ] The hot spot theory of protein protein interactions says that the binding energy between two proteins is governed in large part by just a few critical residues at the binding interface. In a typical interface of 1000 – 2000 square Angstroms, only 5% of the residues from each protein contribute more than 2 kiloCalories/mole to the binding interaction. (This is controversial — see later)

[ Proc. Natl. Acad. Sci. vol. 99 pp. 14116 – 14121 ’02 ] Specific replacement of amino acids in the interface by alanine (alanine scanning or alanine mutagenesis) and measuring the effect on the interaction has led to the idea that only a small set of ‘hot spot’ residues at the inferface contribute to the binding free energy. A hot spot has been defined as a residue that when mutated to alanine leads to a significant drop in the binding constant (typically 10 fold or higher — should know how many kiloCalories this is — I think 2 or 3 ). This was well worked out for human growth hormone (HGH) and its receptor. Subsequently ‘many’ other studies have suggested that the presence of a few hot spots may be a general characteristic of most protein/protein interfaces.

However there is extreme variation in the size, shape, amino acid character and solvent content of the protein/protein interface. It is not obvious from looking at structural contacts which residues are important for binding. Usually they are found at the center of the interface but sometimes the key residues can lie on the periphery. Peripheral residues serve as an O-ring to exclude solvent from the center. A lowered effective dielectric constant in a ‘dryer’ environment strengthens electrostatic and hydrogen bonding interactions. An interaction deleted by alanine mutagenesis in the periphery can be replaced by a water molecule in the periphery and hence cause less loss in stability (this calls the whole concept of alanine scanning into question).

Interestingly, there is no general correlation between ‘surface accessibility’ and the contribution of a residue to the binding energy.

Polar residues (Arg, Gln, His, Asp, and Asn) are conserved in interfaces. This implies that they are hot spots — implies ? don’t they know? haven’t they tested? However, many interaction hot spots involve hydrophobic or large aromatic residues (also hydrophobic). It is unclear whether buried polar interactions are energetically net stabilizing or merely facilitating specificity (how would you tell?).

Some residues without significant contacts in the interface apparently contribute substantially to the free energy of binding when assayed by alanine scanning mutagenesis, because of destabilization of the unbound protein.

This a report of a free energy function (using packing interactions, hydrogen bonds and an implicit solvation model) which predicts 79% of all interface hot spots. They think that a description of polar interactions with Coulomb electrostatics with a linear distance dependent dielectric constant. ??? The latter ignores the orientation dependence of the hydrogen bond. Also the assumption that acidic or basic residues largely buried in the interface are charged may be wrong. The enthalpic gains of ionization are offset by the cost of desolvating polar groups, and the loss in side chain conformational entropy.

[ Proc. Natl. Acad. Sci. vol. 101 pp. 16437 – 16441 ’04 ] It is of interest to find out if hot spot theory applies to transient protein protein interactions (such as those involved in enzyme catalysis). This work looked for them in the process of protein substrate recognition for the Cdc25 phosphatase (which dephosphorylates the cyclin dependent kinases). Crystal structures of the catalytic domains of Cdc25A and Cdc25B have shown a shallow active site with no obvious features for mediating substrate recognition. This suggests a broad protein interface rather than lock and key interaction. This is confirmed by the activity of the Cdc25 phosphatases toward Cdk/cyclin protein substrates which is 6 orders of magnitude greater than that of peptidic substrates containing the same primary sequence — this suggests a broad protein interface rather than a lock and key interaction. The shallow active sites also correlates with the lack of potent speicific inhibitors of the Cdc25 phosphatases, despite extensive search. This work finds hot spot residues in the catalytic domain (not the catalytic site) of Cdc25B located 20 – 30 Angstroms away from the active site. They are involved in recognition of substrate. The residues are conserved across eukaryotes.

[ Proc. Natl. Acad. Sci. vol. 101 pp. 11287 – 11292 ’04 ] One can study the effects of mutating a single amino acid on two separate rates (the on rate and the off rate) the ratio of which is the equilibrium constant. Mutations changing the on rate, concern the specificity of protein protein interaction. Mutations only changing the off rate do not affect the transition state of protein binding (don’t see why not). Mutations in bovine pancreatic trypsin inhibitor (BPTI) have been found at positions #15 and #17 which differentially affect on and off rates. K15A decreases by 200 fold in the on rate and by a 1000 fold increase in the off rate. But R17A doesn’t change the on rate but also increases the off rate by 1000 fold.

The concept of anchor residue arose in the study of peptide binding to class I MHC molecules (Major HistoCompatibility complex) In this system the carboxy terminal side chain of the peptide gets buried in pocket F of the MHC binding groove. Sometimes, one also finds a second anchor residue and even a third one buried at other positions.

The authors attempt to apply the anchor residue concept to protein protein interactions. They studied 39 different protein/protein complexes. They found them, and in some way conclude that these anchor residues are already in the ‘bound’ conformation in the free partner. The anchors interact with structurally constrained pockets matching the anchor residues. The presence of nativelike anchor side chains provides a readily attainable geometrical fit that jams the two interacting surfaces, allowing for the recognition and stabilization of a near-native intermediate. Subsequently an induced fit process occurs on the periphery of the binding pocket.

The analysis of ANY (really?) protein/protein complex at the atomic length scale shows that the interface, rather than being smooth and flat, includes side chains deeply protruding into well defined cavities on the other protein. In all complexes studied, the anchor is the side chain whose burial after complex formation yields the largest possible decrease in solvent accessible surface area (SASA). If SASA is over 100 square Angstroms, than only one anchoring interaction is present. For lesser SASA amino acids one anchor isn’t enough.

In all cases tested (39) latch side chains are found in conformations conducive to a relatively straightforward clamping of the anchored intermediate into a high affinity complex.

[ Proc. Natl. Acad. Sci. vol. 102 pp. 57 – 62 ’05 ] An analysis of the protein interface between a beta-lactamase and its inhibitor, shows that the interface can be divided into clusters (by means of cluster anlaysis) using multiple mutant analysis and xray crystallography. Replacing an entire module of 5 interface residues with alanine (in one cluster) created a large cavity in the interface with no effect on the detailed structure of the remaining interface. They obtained similar results when they did this with another of the 5 clusters.

Mutating a single amino acid at a time has been done in the past, but the results of single mutations aren’t additive (e.g. they aren’t linear — no surprise). The sum of the loss in free energy of all of the single mutations within a cluster exceeds by 4 fold the loss in free energy generated when all of the residues of the cluster are mutated simultaneously. The energetic effect of many single mutations is larger than their net contribution due to a penalty paid by leaving the rest of the cluster behind.

“Binding seems to be a result of higher organization of the binding sites, and not just of surface complementrity.”

[ Proc. Natl. Acad. Sci. vol. 103 pp. 311 – 316 ’06 ] Two different ‘interactomes’ both show the same power law distribution of node sizes. However, when the two major S. cerevisiae protein/protein interaction experiments are compared with each other, only 150 of the THOUSANDS of interactions of each experiment are the same. A similar lack of agreement has been found for independent Y2H experiments in Drosophila.

This work says that desolvation of the interface is a major physical factor in protein/protein interactions. This model reproduces the scale free nature of the topology. The number of interactions made by a protein is correlated with the fraction of hydrophobic residues on its surface.

      [ Proc. Natl. Acad. Sci. vol. 108 pp. 13528 – 13533  ’11 ] The drugs they are looking for disrupt specific protein protein interactions (PPIs).   They use computational solvent mapping, which explores the protein surface using a variety of small probe molecules, along with a conformer generator to account for side chain flexibility.  They studied unliganded proteins known to participate in PPIs.  The surface cavities available at protein protein interfaces which can bind a small molecule inhibitor are rather different from those seen in traditional drug targets.  The traditional targets have one or two disproportionately large pockets with an average volume of 260 cubic Angstroms; these account for the binding site for the endogenous ligand in over 90% of proteins.  The average volume of pockets at protein protein interfaces is only 54 cubic Angstroms, the same as for all protein surface pockets.  The interface contains 6 such small pockets (on average). 
      The binding sites of proteins generally include smaller regions called hotspots which are major contributors to the binding free energy.  The results of experimental fragment screens confirm that the hot spots of proteins are characterized by their ability to bind a variety of small molecules and that the number of different probe molecules observed to bind to a particular site predicts the importance of the site and predicts overall druggability.  
      This work shows that the druggable sites in PPIs have concave topology and both hydrophobic and polar functionality.  So the hotspots bind organic molecules having some polar groups decorating largely hydrophobic scaffolds. So druggable sites have a ‘general tendency’ to bind organic compounds with a variety of structures.  Conformational flexibility at the binding site (by side chains?) allows the hotspots to expand to accommodate a ligand of druglike dimensions.  This involves low energy side chain motions within 6 Angstroms of a hot spot.
      So druggable sites at a PPI aren’t just sites complementary to particular organic functionality, but they have a general tendency to bind a variety of different organic structures.  
      The most important finding is that the druggable sites are detectable from the structure of the unliganded protein, even when substantial conformational adaptation is needed for optimal ligand binding.

[ Science vol. 347 pp. 673 – 677 ’15 ] This work maps the sequence space of 4 key amino acids in the E. coli protein kinase PhoQ which drive the recognition of its substrate (PhoP). For histidine kinases, mutating just 3 or 4 interfacial amino acids to match those in another kinase is enough to reprogram them. The key residues are Ala284, Val285, Ser288, Thr289.

All 20^4 = 160,000 variants of PhoQ at these positions were made, of which 1,659 were functional (implying significant degeneracy of the interface). There were 16 single mutants, 100 double, 544 triple and 998 quadruple mutants which were functional. There was an enrichment of hydrophobic and small polar residues at each position. Most bulky and charged residues appeared at low frequencies. Some substitutions were permissible individually, but not in combination. The combinations ACLV, TISV, SILS, each involving residues found individually in functional mutants at high frequency, were quite impaired in competition against wildtype PhoQ, so the effects of individual substitutions are context dependent (epistatic). Of the 100 functional double mutants, only 23 represent cases where both single mutants are functional. There are double mutants where neither single mutant is functional. 79/1,658 functional variants can’t be reached from the wild-type combination (AVST) without passing through a nonfunctional intermediate. They talk about the Hamming distance between mutants.
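The numbers above can be checked with a little counting, and the idea of "reachable without a nonfunctional intermediate" is just a graph search over single substitutions. The sketch below is mine, not the paper's code. The functional counts per mutant class come from the text; the small `toy_functional` set is invented purely to show the search, since the real list of functional variants isn't given here. Note that 16 + 100 + 544 + 998 = 1,658, so the 1,659 figure presumably counts wild type as well.

```python
from collections import deque
from math import comb

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
WILD_TYPE = "AVST"  # Ala284, Val285, Ser288, Thr289

def hamming(a, b):
    """Number of positions at which two equal-length variants differ."""
    return sum(x != y for x, y in zip(a, b))

def variants_at_distance(k, length=4, alphabet=20):
    """How many variants sit exactly k substitutions away from a given sequence."""
    return comb(length, k) * (alphabet - 1) ** k

# Functional counts per mutant class, as reported in the text.
FUNCTIONAL_COUNTS = {1: 16, 2: 100, 3: 544, 4: 998}

for k, n_functional in FUNCTIONAL_COUNTS.items():
    total = variants_at_distance(k)
    print(f"{k} substitutions: {n_functional}/{total} functional "
          f"({100 * n_functional / total:.2f}%)")

def reachable(functional, start):
    """Breadth-first search: variants reachable from start by single
    substitutions, stepping only on functional variants."""
    seen = {start}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        for pos in range(len(cur)):
            for aa in AMINO_ACIDS:
                nxt = cur[:pos] + aa + cur[pos + 1:]
                if nxt in functional and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen

# Invented toy set of 'functional' variants, for illustration only.
toy_functional = {"AVST", "AVSV", "TVSV", "SLAT"}
toy_reachable = reachable(toy_functional, WILD_TYPE)
toy_stranded = toy_functional - toy_reachable
print(sorted(toy_reachable), sorted(toy_stranded))
```

In the toy set, SLAT is functional but stranded: every path to it from AVST passes through a variant not in the set. That is the situation the 79 unreachable variants are in.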

Finally some blue sky stuff — implying that (as usual) Nature got there first

       [ Science vol. 341 pp. 1116 – 1120 ’13 ] Small Open Reading Frames (smORFs) code for peptides of under 100 amino acids.  This work has shown that peptides as short as 11 amino acids are translated and provide essential functions during insect development.  This work shows two peptides of 28 and 29 amino acids regulating calcium transport in the Drosophila heart.  The peptides are found in man.  
      They think that smORFs can’t be dismissed as irrelevant, and that function should be looked for. 
       [ Science vol. 1356 – 1358 ’15 ] The Drosophila polished-rice (Pri) sORF peptides (11 – 32 amino acids) trigger proteasome mediated processing, converting the Shavenbaby transcription repressor into a shorter activator.
       They think that sORF/smORFs mimic protein binding interfaces and control protein interactions that way.