So many of the molecular machines used in the cell are composed of many different proteins held together by noncovalent interactions. The Mediator complex contains 25 – 30 proteins with a mass of 1.6 megaDaltons, RNA polymerase contains 12 subunits, the general transcription factors contain 25 proteins, and our ribosome, with a mass of 4.3 megaDaltons, contains 47 proteins in the large subunit and 33 in the small. The list goes on and on: proteasome, nucleosome, post-synaptic density.
The typical protein/protein interface has an area of 1,000 – 2,000 square Angstroms, equivalent to circles of diameter between about 36 and 50 Angstroms [ Proc. Natl. Acad. Sci. vol. 101 pp. 16437 – 16441 ’04 ]. Think of the largest classical organic molecule you’ve ever made (not any polymer like a protein, polynucleotide, or polysaccharide). It isn’t anywhere close to this.
Yet I’m convinced that drugs targeting these complexes will be useful. Classical organic chemistry will be useless in designing them. We’ll have to forget our beloved SN1, SN2, nonclassical carbonium ions, etc. We need some new sort of physical organic chemistry, one not concerned with reaction mechanism, but with van der Waals and electrostatic interactions. At least stereochemistry will still be important.
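To check the circle comparison, here is a minimal sketch that converts an interface area into the diameter of a circle of equal area. The function name is my own, chosen only for illustration.

```python
import math


def equivalent_circle_diameter(area_sq_angstrom: float) -> float:
    """Diameter (Angstroms) of a circle whose area is given in square Angstroms."""
    # area = pi * (d / 2) ** 2, so d = 2 * sqrt(area / pi)
    return 2.0 * math.sqrt(area_sq_angstrom / math.pi)


if __name__ == "__main__":
    for area in (1000.0, 2000.0):
        diameter = equivalent_circle_diameter(area)
        print(f"{area:.0f} square Angstroms -> circle about {diameter:.1f} Angstroms across")
```

The two ends of the range come out at roughly 36 and 50 Angstroms.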
The problem is much harder than designing enzyme inhibitors, or their allosteric modifiers, because the target is so large.
What follows are some notes on the protein protein interface I’ve taken over the years to get you started thinking. Good luck. Don’t expect any neat answers. There is a lot of contention concerning the nature of the binding occurring at the interface.
Many of the references aren’t particularly new. In my reading, I don’t try for the latest reference, but the newest idea that I’m unfamiliar with. I think they pretty much cover the territory as it stands now.
[ Proc. Natl. Acad. Sci. vol. 108 pp. 603 – 608 ’11 ] A very interesting article argues that worms and humans have about the same number of proteins (20,000) because if they had more, nonspecific protein protein interactions would cause disease. The achievable energy gap favoring specific over nonspecific binding decreases with protein number in a power law fashion (in their model). The optimization of binding interfaces favors networks in which a few proteins have many partners and most proteins have just a few — this is consistent with a scale free network topology.
[ Proc. Natl. Acad. Sci. vol. 101 pp. 16437 – 16441 ’04 ] The hot spot theory of protein protein interactions says that the binding energy between two proteins is governed in large part by just a few critical residues at the binding interface. In a typical interface of 1000 – 2000 square Angstroms, only 5% of the residues from each protein contribute more than 2 kiloCalories/mole to the binding interaction. (This is controversial — see later)
[ Proc. Natl. Acad. Sci. vol. 99 pp. 14116 – 14121 ’02 ] Specific replacement of amino acids in the interface by alanine (alanine scanning or alanine mutagenesis) and measuring the effect on the interaction has led to the idea that only a small set of ‘hot spot’ residues at the interface contribute to the binding free energy. A hot spot has been defined as a residue that when mutated to alanine leads to a significant drop in the binding constant (typically 10 fold or higher, which works out to about 1.4 kiloCalories/mole at room temperature, since the free energy change is RT ln 10). This was well worked out for human growth hormone (HGH) and its receptor. Subsequently ‘many’ other studies have suggested that the presence of a few hot spots may be a general characteristic of most protein/protein interfaces.
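The conversion between a fold change in binding constant and a free energy change is just RT ln(fold change). A minimal sketch, assuming room temperature (298.15 K) and using function names of my own invention:

```python
import math

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kiloCalories/(mole * Kelvin)


def ddg_from_fold_change(fold_change: float, temperature_k: float = 298.15) -> float:
    """Free energy change (kiloCalories/mole) for a given fold change in binding constant."""
    return R_KCAL_PER_MOL_K * temperature_k * math.log(fold_change)


def fold_change_from_ddg(ddg_kcal: float, temperature_k: float = 298.15) -> float:
    """Fold change in binding constant for a given free energy change (kiloCalories/mole)."""
    return math.exp(ddg_kcal / (R_KCAL_PER_MOL_K * temperature_k))


if __name__ == "__main__":
    print(f"10 fold drop      -> {ddg_from_fold_change(10.0):.2f} kiloCalories/mole")
    print(f"2 kiloCalories/mole -> {fold_change_from_ddg(2.0):.0f} fold drop")
```

So a 10 fold drop is about 1.4 kiloCalories/mole, and the 2 kiloCalories/mole threshold used elsewhere in these notes corresponds to a drop of nearly 30 fold.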
However there is extreme variation in the size, shape, amino acid character and solvent content of the protein/protein interface. It is not obvious from looking at structural contacts which residues are important for binding. Usually they are found at the center of the interface but sometimes the key residues can lie on the periphery. Peripheral residues serve as an O-ring to exclude solvent from the center. A lowered effective dielectric constant in a ‘dryer’ environment strengthens electrostatic and hydrogen bonding interactions. An interaction deleted by alanine mutagenesis in the periphery can be replaced by a water molecule in the periphery and hence cause less loss in stability (this calls the whole concept of alanine scanning into question).
Interestingly, there is no general correlation between ‘surface accessibility’ and the contribution of a residue to the binding energy.
Polar residues (Arg, Gln, His, Asp, and Asn) are conserved in interfaces. This implies that they are hot spots — implies ? don’t they know? haven’t they tested? However, many interaction hot spots involve hydrophobic or large aromatic residues (also hydrophobic). It is unclear whether buried polar interactions are energetically net stabilizing or merely facilitating specificity (how would you tell?).
Some residues without significant contacts in the interface apparently contribute substantially to the free energy of binding when assayed by alanine scanning mutagenesis, because of destabilization of the unbound protein.
This is a report of a free energy function (using packing interactions, hydrogen bonds and an implicit solvation model) which predicts 79% of all interface hot spots. As far as I can tell, they think that a description of polar interactions using Coulomb electrostatics with a linear distance dependent dielectric constant is inadequate, because the latter ignores the orientation dependence of the hydrogen bond. Also the assumption that acidic or basic residues largely buried in the interface are charged may be wrong. The enthalpic gains of ionization are offset by the cost of desolvating polar groups, and the loss in side chain conformational entropy.
[ Proc. Natl. Acad. Sci. vol. 101 pp. 16437 – 16441 ’04 ] It is of interest to find out if hot spot theory applies to transient protein protein interactions (such as those involved in enzyme catalysis). This work looked for hot spots in protein substrate recognition by the Cdc25 phosphatase (which dephosphorylates the cyclin dependent kinases). Crystal structures of the catalytic domains of Cdc25A and Cdc25B have shown a shallow active site with no obvious features for mediating substrate recognition. This suggests a broad protein interface rather than a lock and key interaction. This is confirmed by the activity of the Cdc25 phosphatases toward Cdk/cyclin protein substrates, which is 6 orders of magnitude greater than that toward peptidic substrates containing the same primary sequence. The shallow active site also correlates with the lack of potent specific inhibitors of the Cdc25 phosphatases, despite extensive search. This work finds hot spot residues in the catalytic domain (not the catalytic site) of Cdc25B located 20 – 30 Angstroms away from the active site. They are involved in recognition of substrate. The residues are conserved across eukaryotes.
[ Proc. Natl. Acad. Sci. vol. 101 pp. 11287 – 11292 ’04 ] One can study the effects of mutating a single amino acid on two separate rates (the on rate and the off rate), the ratio of which is the equilibrium constant. Mutations changing the on rate concern the specificity of protein protein interaction. Mutations only changing the off rate do not affect the transition state of protein binding (don’t see why not). Mutations in bovine pancreatic trypsin inhibitor (BPTI) have been found at positions #15 and #17 which differentially affect on and off rates. K15A decreases the on rate 200 fold and increases the off rate 1000 fold. But R17A doesn’t change the on rate, although it also increases the off rate 1000 fold.
The concept of anchor residue arose in the study of peptide binding to class I MHC molecules (Major HistoCompatibility complex). In this system the carboxy terminal side chain of the peptide gets buried in pocket F of the MHC binding groove. Sometimes, one also finds a second anchor residue and even a third one buried at other positions.
The authors attempt to apply the anchor residue concept to protein protein interactions. They studied 39 different protein/protein complexes. They found anchor residues in them, and in some way conclude that these anchor residues are already in the ‘bound’ conformation in the free partner. The anchors interact with structurally constrained pockets matching the anchor residues. The presence of nativelike anchor side chains provides a readily attainable geometrical fit that jams the two interacting surfaces, allowing for the recognition and stabilization of a near-native intermediate. Subsequently an induced fit process occurs on the periphery of the binding pocket.
The analysis of ANY (really?) protein/protein complex at the atomic length scale shows that the interface, rather than being smooth and flat, includes side chains deeply protruding into well defined cavities on the other protein. In all complexes studied, the anchor is the side chain whose burial after complex formation yields the largest possible decrease in solvent accessible surface area (SASA). If the SASA buried is over 100 square Angstroms, then only one anchoring interaction is present. For amino acids burying less SASA, one anchor isn’t enough.
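To see what the two BPTI mutations do to overall affinity, here is a minimal sketch. It assumes the equilibrium constant in question is the dissociation constant, Kd = off rate / on rate (my reading, not stated in the notes), and room temperature; the function names are mine.

```python
import math

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kiloCalories/(mole * Kelvin)


def kd_fold_change(on_rate_fold_decrease: float, off_rate_fold_increase: float) -> float:
    """Fold increase in Kd, assuming Kd = k_off / k_on.

    A slower on rate and a faster off rate both weaken binding,
    so the two fold changes multiply.
    """
    return on_rate_fold_decrease * off_rate_fold_increase


def ddg_kcal(fold_change: float, temperature_k: float = 298.15) -> float:
    """Free energy cost (kiloCalories/mole) of a fold increase in Kd."""
    return R_KCAL_PER_MOL_K * temperature_k * math.log(fold_change)


# Fold changes as given in the text: (on rate decrease, off rate increase)
MUTANTS = {
    "K15A": (200.0, 1000.0),
    "R17A": (1.0, 1000.0),  # on rate unchanged
}

if __name__ == "__main__":
    for name, (on_dec, off_inc) in MUTANTS.items():
        fold = kd_fold_change(on_dec, off_inc)
        print(f"{name}: Kd weaker by {fold:,.0f} fold, about {ddg_kcal(fold):.1f} kiloCalories/mole")
```

Under that assumption K15A weakens binding 200,000 fold, while R17A weakens it 1000 fold, entirely through the off rate.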
In all cases tested (39) latch side chains are found in conformations conducive to a relatively straightforward clamping of the anchored intermediate into a high affinity complex.
[ Proc. Natl. Acad. Sci. vol. 102 pp. 57 – 62 ’05 ] An analysis of the protein interface between a beta-lactamase and its inhibitor, using multiple mutant analysis and x-ray crystallography, shows that the interface can be divided into clusters (by means of cluster analysis). Replacing an entire module of 5 interface residues with alanine (in one cluster) created a large cavity in the interface with no effect on the detailed structure of the remaining interface. They obtained similar results when they did this with another of the 5 clusters.
Mutating a single amino acid at a time has been done in the past, but the results of single mutations aren’t additive (i.e. they aren’t linear, which is no surprise). The sum of the loss in free energy of all of the single mutations within a cluster exceeds by 4 fold the loss in free energy generated when all of the residues of the cluster are mutated simultaneously. The energetic effect of many single mutations is larger than their net contribution due to a penalty paid by leaving the rest of the cluster behind.
“Binding seems to be a result of higher organization of the binding sites, and not just of surface complementarity.”
[ Proc. Natl. Acad. Sci. vol. 103 pp. 311 – 316 ’06 ] Two different ‘interactomes’ both show the same power law distribution of node sizes. However, when the two major S. cerevisiae protein/protein interaction experiments are compared with each other, only 150 of the THOUSANDS of interactions in each experiment are the same. A similar lack of agreement has been found for independent Y2H experiments in Drosophila.
This work says that desolvation of the interface is a major physical factor in protein/protein interactions. This model reproduces the scale free nature of the topology. The number of interactions made by a protein is correlated with the fraction of hydrophobic residues on its surface.
[ Science vol. 347 pp. 673 – 677 ’15 ] Mapping the sequence space of 4 key amino acids in the E. coli protein kinase PhoQ which drive the recognition of its substrate (PhoP). For histidine kinases, mutating just 3 or 4 interfacial amino acids to match those in another kinase is enough to reprogram them. The key residues are Ala284, Val285, Ser288, Thr289.
All 20^4 = 160,000 variants of PhoQ at these positions were made, of which 1,659 were functional (implying significant degeneracy of the interface). There were 16 single mutants, 100 double, 544 triple and 998 quadruple mutants which were functional. There was an enrichment of hydrophobic and small polar residues at each position. Most bulky and charged residues appeared at low frequencies. Some substitutions were permissible individually, but not in combination. The combinations ACLV, TISV and SILS, each involving residues found individually in functional mutants at high frequency, were quite impaired in competition against wildtype PhoQ, so the effects of individual substitutions are context dependent (epistatic). Of the 100 functional double mutants, only 23 represent cases where both single mutants are functional. There are double mutants where neither single mutant is functional. 79/1,658 functional variants can’t be reached from the wild-type combination (AVST) without passing through a nonfunctional intermediate. They talk about the Hamming distance between mutants.
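The arithmetic here is easy to check, and the Hamming distance is just the number of positions at which two variants differ. A minimal sketch follows; the function names are mine. It assumes the four-letter combinations are written in the same position order as the wild type AVST (284, 285, 288, 289), which the notes don’t state explicitly.

```python
from math import comb

ALPHABET_SIZE = 20  # the 20 standard amino acids
POSITIONS = 4       # residues 284, 285, 288, 289
WILD_TYPE = "AVST"

# Functional mutants by number of substitutions, as reported in the text
FUNCTIONAL = {1: 16, 2: 100, 3: 544, 4: 998}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def variants_at_distance(d: int) -> int:
    """How many variants differ from the wild type at exactly d positions."""
    return comb(POSITIONS, d) * (ALPHABET_SIZE - 1) ** d


if __name__ == "__main__":
    total = sum(variants_at_distance(d) for d in range(POSITIONS + 1))
    print(f"total variants: {total:,}")
    print(f"functional mutants (wild type excluded): {sum(FUNCTIONAL.values()):,}")
    for d, n in FUNCTIONAL.items():
        possible = variants_at_distance(d)
        print(f"  {d} substitution(s): {n} of {possible:,} functional ({n / possible:.1%})")
    for variant in ("ACLV", "TISV", "SILS"):
        print(f"  {variant} is {hamming(WILD_TYPE, variant)} substitutions from {WILD_TYPE}")
```

The 16 + 100 + 544 + 998 mutants sum to 1,658, which explains the two totals in the text: 1,659 counts the wild type and 1,658 does not. The fraction of functional variants also falls steadily as the number of substitutions goes up.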
Finally some blue sky stuff — implying that (as usual) Nature got there first