The humble snow flea teaches us some protein chemistry

Who would have thought that the humble snow flea (which we used to cross-country ski over in Montana) would teach us a great deal about protein chemistry, overturning some beloved shibboleths in the process?

The flea contains an antifreeze protein, which stops ice crystals from forming inside its cells in the cold environment in which it lives. The protein contains 81 amino acids, is 45% glycine and contains six type II polyProline helices, each 8 amino acids long (https://en.wikipedia.org/wiki/Polyproline_helix). None of the six polyProline helices contains proline, despite the name, but each contains from 2 to 6 glycines. Also to be noted are (1) the absence of a hydrophobic core, (2) the absence of alpha helices, (3) the absence of beta turns and (4) the low sequence complexity of the protein.

Nonetheless it quickly folds into a stable structure, meaning that a hydrophobic core, alpha helices and beta turns (points 1, 2 and 3) are not necessary for a stable protein structure. Point (4) means that low sequence complexity in a protein sequence does not invariably imply an intrinsically disordered protein.
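For readers who want to put a number on "low sequence complexity", here is a minimal sketch of one common proxy: the Shannon entropy of the amino acid composition, alongside a simple residue fraction like the 45% glycine quoted above. The two sequences are made-up toy strings, not the snow flea protein, and the function names are mine.

```python
from collections import Counter
from math import log2


def residue_fraction(seq: str, residue: str) -> float:
    """Fraction of positions in seq occupied by one residue (one-letter code)."""
    return seq.count(residue) / len(seq)


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits per residue) of the amino acid composition.

    The maximum is log2(20), about 4.32 bits, when all 20 amino acids
    are equally common; lower values mean lower compositional complexity.
    """
    n = len(seq)
    counts = Counter(seq)
    return -sum((c / n) * log2(c / n) for c in counts.values())


# Hypothetical toy sequences for illustration only (NOT the snow flea protein).
glycine_rich = "GAGGSGGAGGTGGAGGPGGA"
diverse = "ACDEFGHIKLMNPQRSTVWY"

if __name__ == "__main__":
    print("glycine fraction:", residue_fraction(glycine_rich, "G"))
    print("entropy, glycine rich:", round(shannon_entropy(glycine_rich), 2))
    print("entropy, diverse:", round(shannon_entropy(diverse), 2))
```

Composition entropy ignores the order of the residues, so it is only a rough guide. The point of the snow flea protein is that a low score on a measure like this does not by itself mean the protein is disordered.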

You can read all about it in Proc. Natl. Acad. Sci. vol. 114 pp. 2241 – 2446 ’17.

Time for some humility about what we thought we knew about proteins, protein folding and protein structural stability.

The uses of disorder

There was a lot of shock and awe about a report showing how seemingly minor changes in an aliphatic group on benzene led to markedly different conformations in its protein target (lysozyme from bacteriophage T4) http://pipeline.corante.com/archives/2015/06/18/tiny_and_not_so_tiny_changes.php.

Our noses are being rubbed in just how floppy proteins are, in contrast to the first glimpses of protein structure obtained by Xray crystallography. Back then we knew so little about proteins that seeing all the atoms laid out in alpha helices and beta sheets was incredibly compelling. We talked about the structure of a protein rather than a structure. Even back then, with hemoglobin (one of the first solved proteins), it was obvious that proteins had to have more than one structure. The porphyrin ring in heme that oxygen binds to is buried deep in hemoglobin, and the initial structure showed no obvious channel for oxygen, so it had to move in some way to let oxygen find its way in. So hemoglobin had to breathe.

We now know that many proteins have intrinsically disordered segments. Amazingly, the most recent estimate I could find in my notes (or in Wikipedia) is this: over 30% of eukaryotic proteins are estimated to have stretches of over 30 amino acids that are intrinsically disordered [ J. Mol. Biol. vol. 337 pp. 635 – 645 ’04 ]. Does anyone out there know of more recent data?

We’re a lot smarter now. Here’s a comment on Derek’s post: “I have always thought crystal structures of proteins/enzymes are more a guide than actually useful. You are crystallizing a protein first-proteins don’t pack like that in vivo. Then you are settling on the conformation that freezes out- is this the lowest energy form? Then you are ignoring the fact that these are highly dynamic structures that are constantly moving, sliding, shaking, adjusting. Then if you put a ligand in there you get the lowest energy form-which is what it would look like after reaction and before ligand dissociation- this is quite different from what it can look like at other stages of the reaction.”

Here is an interesting example of the uses of protein disorder going on right now in just about every neuron in your body. Most neurons have long processes, far too long for diffusion to move a needed protein to their ends. For that purpose we have microtubules (aka neurotubules in neurons) stretching the length of the processes, onto which two types of motors attach (dyneins, which move things to the negative end of the microtubule, and kinesins, which move things to the positive end).

The microtubule is built from a heterodimer of two proteins (alpha and beta tubulin). Each contains about 450 amino acids and forms a globule 40 Angstroms (4 nanoMeters) in diameter. The heterodimers pack end to end to form a protofilament. 13 protofilaments line up side by side to form the microtubule, a hollow structure about 250 Angstroms in diameter. In cells microtubules are 1 to 10 microns long, but in nerve processes they can be ‘up to’ 100 microns in length. Even at 1 micron (1,000 nanoMeters) that’s 250 globules per protofilament, so 13 * 250 tubulin monomers (13 * 125 heterodimers) in a microtubule.
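The count is easy to check. Here is a minimal sketch of the arithmetic, assuming (my assumption, following from the 4 nanoMeter globule size) that an alpha/beta heterodimer occupies 8 nanoMeters along a protofilament. Dividing 1,000 nanoMeters by the 4 nanoMeter globule gives 250 monomers per protofilament; dividing by 8 gives 125 heterodimers. The constant and function names are mine.

```python
MONOMER_DIAMETER_NM = 4.0  # from the text: each tubulin globule is about 40 Angstroms
PROTOFILAMENTS = 13        # from the text: 13 protofilaments per microtubule


def heterodimers_in_microtubule(length_nm: float) -> int:
    """Rough count of alpha/beta heterodimers in a microtubule of given length.

    Assumption: the two globules of a heterodimer stack end to end, so one
    heterodimer occupies 2 * 4 nm = 8 nm along the protofilament.
    """
    dimer_length_nm = 2 * MONOMER_DIAMETER_NM
    per_protofilament = int(length_nm // dimer_length_nm)
    return PROTOFILAMENTS * per_protofilament


if __name__ == "__main__":
    for microns in (1, 10, 100):
        n = heterodimers_in_microtubule(microns * 1_000)
        print(f"{microns} micron(s): {n} heterodimers, {2 * n} tubulin monomers")
```

For 1 micron this gives 13 * 125 = 1,625 heterodimers, which is 13 * 250 = 3,250 individual tubulin globules.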

Any protein structure this important has a lot of modifications imposed on it to alter structure and function. Examples include phosphorylation and the addition of glutamic acid chains (polyglutamylation). The carboxy terminal tails of alpha and beta tubulin are flexible and stick out from the tubulin rod (which is why they aren’t seen on Xray crystallography). The carboxy terminal tail is the site of post-translational glutamylation. The enzyme polyglutamylating the carboxy terminal tail of beta tubulin is TTLL7 (you don’t want to know what the acronym stands for). It binds to the alpha/beta tubulin heterodimer by an intrinsically disordered region of its own (becoming structured in the process), then it binds to the intrinsically disordered carboxy terminal tails, structuring them and modifying them. It’s basically a mating dance. There is a precedent for this: see https://luysii.wordpress.com/2013/12/29/the-mating-dance-of-a-promiscuous-protein/

So disordered regions of proteins, although structureless, are far from functionless.

When the active form of a protein is intrinsically disordered

Back in the day, biochemists talked about the shape of a protein, influenced by the spectacular pictures produced by Xray crystallography. Now, of course, we know that a protein has multiple conformations in the cell. I still find it miraculous that the proteins making us up have only relatively few. For details see — https://luysii.wordpress.com/2010/08/04/why-should-a-protein-have-just-one-shape-or-any-shape-for-that-matter/.

Presently, we also know that many proteins contain segments which are intrinsically disordered (i.e. no single shape). The pendulum has swung the other way, with estimates that contiguous disordered regions longer than 50 amino acids ‘may be present’ in ‘up to’ 50% of proteins coded in eukaryotic genomes [ Proc. Natl. Acad. Sci. vol. 102 pp. 17002 – 17007 ’05 ].

[ Science vol. 325 pp. 1635 – 1636 ’09 ] Compared to ordered regions, disordered regions of proteins have evolved rapidly, contain many short linear motifs that mediate protein/protein interactions, and have more phosphorylation sites. Disordered regions are enriched in serine and threonine residues, while ordered sequences are enriched in tyrosines; this highlights functional differences in the types of phosphorylation. Interestingly, tyrosines have been lost during evolution.

What are unstructured protein segments good for? One theory is that the disordered segment can adopt different conformations to bind to different partners — this is the moonlighting effect. Then there is the fly casting mechanism — by being disordered (hence extended rather than compact) such proteins can flail about and find partners more easily.

Given what we know about enzyme function (and by inference protein function), it is logical to assume that the structured form of a protein which can be unstructured is the functional form.

Not so, according to this recent example [ Nature vol. 519 pp. 106 – 109 ’15 ]. 4EBP2 is a protein involved in the control of protein synthesis. It binds to another protein also involved in synthesis (eIF4E) to suppress a form of translation of mRNA into protein (cap-dependent translation, if you must know). 4EBP2 is intrinsically disordered. When it binds to its target it undergoes a disorder-to-order transition. However, eIF4E binding only occurs from the intrinsically disordered form.

Control of 4EBP2 activity is due, in part, to phosphorylation on multiple sites. This induces folding of amino acids #18 – #62 into a 4 stranded beta domain which sequesters the canonical YXXXXLphi motif with which 4EBP2 binds eIF4E (Y stands for tyrosine, X for any amino acid, L for leucine and phi for any bulky hydrophobic amino acid). So here we have the inactive (i.e. nonbinding) form of a protein being the structured rather than the unstructured form. The unstructured form of 4EBP2 is therefore the physiologically active form of the protein.
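A motif written this way is really just a pattern, and can be searched for like one. Here is a minimal sketch using a regular expression. Two things are my assumptions: the set of residues counted as "bulky hydrophobic" (phi), and the number of X positions, which is left as a parameter and defaults to four, since the motif is usually written YXXXXLphi. The test peptide is made up for illustration and is not claimed to be any real protein sequence.

```python
import re

# Assumption: which one-letter codes count as "bulky hydrophobic" (phi).
# Other authors may draw this set differently.
PHI = "LMFIVW"


def eif4e_motif_pattern(n_spacer: int = 4) -> re.Pattern:
    """Regex for Y, then n_spacer residues of any kind, then L, then phi."""
    return re.compile(rf"Y[A-Z]{{{n_spacer}}}L[{PHI}]")


def find_motifs(seq: str, n_spacer: int = 4) -> list[tuple[int, str]]:
    """Return (0-based start index, matched text) for each non-overlapping hit."""
    pattern = eif4e_motif_pattern(n_spacer)
    return [(m.start(), m.group()) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    toy_peptide = "AAYDRKFLMAA"  # hypothetical peptide for illustration only
    print(find_motifs(toy_peptide))              # four X positions
    print(find_motifs(toy_peptide, n_spacer=3))  # three X positions
```

Note what the pattern cannot tell you. Finding the motif in the sequence says nothing about whether it is exposed (disordered form) or sequestered in the beta domain (phosphorylated, folded form), which is the whole point of the 4EBP2 story.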