Tag Archives: Intrinsically disordered proteins

Bye bye stoichiometry

I’m republishing this old post from 2018, to refresh my memory (and yours) about liquid liquid phase separation before writing a new post on one of the most interesting papers I’ve read in recent years.  The field has exploded since this was written.

Until recently, developments in physics basically followed earlier work by mathematicians.  Think relativity following Riemannian geometry by 40 years.  However, in the past few decades, physicists have developed mathematical concepts before the mathematicians: think mirror symmetry, which came out of string theory (https://en.wikipedia.org/wiki/Mirror_symmetry_(string_theory)). You may skip the following paragraph, but here is what it meant to mathematics, from a description of a 400+ page book by Amherst College’s own David A. Cox:

Mirror symmetry began when theoretical physicists made some astonishing predictions about rational curves on quintic hypersurfaces in four-dimensional projective space. Understanding the mathematics behind these predictions has been a substantial challenge. This book is the first completely comprehensive monograph on mirror symmetry, covering the original observations by the physicists through the most recent progress made to date. Subjects discussed include toric varieties, Hodge theory, Kahler geometry, moduli of stable maps, Calabi-Yau manifolds, quantum cohomology, Gromov-Witten invariants, and the mirror theorem. This title features: numerous examples worked out in detail; an appendix on mathematical physics; an exposition of the algebraic theory of Gromov-Witten invariants and quantum cohomology; and, a proof of the mirror theorem for the quintic threefold.

Similarly, advances in cellular biology have come from chemistry.  Think DNA and protein structure, enzyme analysis.  However, cell biology is now beginning to return the favor and instruct chemistry by giving it new objects to study. Think phase transitions in the cell, liquid liquid phase separation, liquid droplets, and many other names (the field is in flux) as chemists begin to explore them.  Unlike most chemical objects, they are big, or they wouldn’t have been visible microscopically, so they contain many, many more molecules than chemists are used to dealing with.

These objects do not have any sort of definite stoichiometry and are made of RNA and the proteins which bind them (and sometimes DNA).  They go by any number of names (processing bodies, stress granules, nuclear speckles, Cajal bodies, promyelocytic leukemia bodies, germline P granules).  Recent work has shown that DNA may be compacted similarly using the linker histone [ PNAS vol.  115 pp.11964 – 11969 ’18 ].

The objects are defined essentially by looking at them.  By golly they look like liquid drops, and they fuse and separate just like drops of water.  Once this is done they are analyzed chemically to see what’s in them.  I don’t think theory can predict them now, and they were never predicted a priori as far as I know.

No chemist in their right mind would have made them to study.  For one thing they contain tens to hundreds of different molecules.  Imagine trying to get a grant to see what would happen if you threw that many different RNAs and proteins together in varying concentrations.  Physicists have worked for years on phase transitions (but usually with a single molecule — think water).  So have chemists — think crystallization.

Proteins move in and out of these bodies in seconds.  Proteins found in them have low amino acid complexity (mostly made of only a few of the 20), and unlike enzymes, their sequences are intrinsically disordered, so forget the lock-and-key and induced-fit concepts used for enzymes.

Are they a new form of matter?  Is there any limit to how big they can be?  Are the pathologic precipitates of neurologic disease (neurofibrillary tangles, senile plaques, Lewy bodies) similar?  There certainly are plenty of distinct proteins in the senile plaque, but they don’t look like liquid droplets.

It’s a fascinating field to study.  Although made of organic molecules, there seems to be little for the organic chemist to say, since the interactions aren’t covalent.  Time for physical chemists and polymer chemists to step up to the plate.

The uses of disorder in the cell

We know that many proteins have disordered segments, and an older (2004) estimate says that over 30% of all eukaryotic proteins have disordered stretches of more than 30 amino acids.  Here is another example where the disordered conformation(s) of a protein is the form used by the cell.

Histone H1 (aka the linker histone) binds to DNA between nucleosomes.  It is thought to be important in the 10,000-fold or so compaction of the 3 meters or so of DNA each cell has, so that it fits into a 10 micron nucleus.  Histone H1 has a disordered carboxy terminal tail of 100 amino acids.  Unsurprisingly it is strongly positively charged (so it binds to the negatively charged phosphates holding DNA together).

H1 was studied in an interesting paper [ Proc. Natl. Acad. Sci. vol. 115 pp. 11964 – 11969 ’18 ].  The tail was added to a short (36 basepairs) double stranded segment of DNA, under various stoichiometries and ionic compositions.  They found regions where the complex formed liquid droplets microns in size.

We know DNA is compacted, and people have looked for years, without success, for the 30 nanoMeter fiber of DNA bound to nucleosomes.  It is possible that the compaction of DNA is due to phase separation (which is basically unstructured) rather than the rather specific structures proposed.  H1 may be acting as a liquid-like glue.  Fascinating.

In other work H1 was complexed with another protein, prothymosin alpha, itself an intrinsically disordered protein, which actually serves as a histone H1 chaperone.  Prothymosin is polyAnionic, so it binds to polyCationic H1.  What is fascinating is that the binding is quite tight (picoMolar) and yet even when so tightly bound H1 remains disordered, something to confound drug chemists who are always looking for specific binding conformations.
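To get a feel for what a picoMolar dissociation constant means energetically, the standard relation ΔG° = RT ln(Kd) can be evaluated directly. This is only a rough sketch: it assumes a 1 M standard state and 25 °C, neither of which comes from the work described, and the function name is made up for illustration.

```python
import math

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kcal/(mol*K)

def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy (kcal/mol) from a dissociation constant.

    Assumes a 1 M standard state; more negative means tighter binding.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL_PER_MOL_K * temperature_k * math.log(kd_molar)

# Compare three typical affinity regimes.
for label, kd in [("microMolar", 1e-6), ("nanoMolar", 1e-9), ("picoMolar", 1e-12)]:
    print(f"{label:>10}: {binding_free_energy(kd):6.1f} kcal/mol")
```

Under these assumptions each factor of 1,000 in Kd is worth roughly 4 kcal/mol, so picoMolar binding is on the order of 16 kcal/mol, all of it achieved here without a fixed binding conformation.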

The paper also describes Psi DNA, which is formed in solutions of cationic polymers. Here DNA condenses into a compact solvent excluded state.  It is an ordered assembly of B-DNA arranged in parallel twisted helical segments with a well-defined spacing.  It produces an anomalously large scattering signal in circular dichroism spectra.

Here is an older post in which the functional form of a protein is the unstructured one.

When the active form of a protein is intrinsically disordered

Back in the day, biochemists talked about the shape of a protein, influenced by the spectacular pictures produced by Xray crystallography. Now, of course, we know that a protein has multiple conformations in the cell. I still find it miraculous that the proteins making us up have only relatively few. For details see — https://luysii.wordpress.com/2010/08/04/why-should-a-protein-have-just-one-shape-or-any-shape-for-that-matter/.

Presently, we also know that many proteins contain segments which are intrinsically disordered (i.e. no single shape). The pendulum has swung the other way: “estimations that contiguous regions longer than 50 amino acids ‘may be present’ in ‘up to’ 50% of proteins coded in eukaryotic genomes” [ Proc. Natl. Acad. Sci. vol. 102 pp. 17002 – 17007 ’05 ].

[ Science vol. 325 pp. 1635 – 1636 ’09 ] Compared to ordered regions, disordered regions of proteins have evolved rapidly, contain many short linear motifs that mediate protein/protein interactions, and have numerous phosphorylation sites compared to ordered regions. Disordered regions are enriched in serine and threonine residues, while ordered sequences are enriched in tyrosines — this highlights functional differences in the types of phosphorylation. Interestingly tyrosines have been lost during evolution.

What are unstructured protein segments good for? One theory is that the disordered segment can adopt different conformations to bind to different partners — this is the moonlighting effect. Then there is the fly casting mechanism — by being disordered (hence extended rather than compact) such proteins can flail about and find partners more easily.

Given what we know about enzyme function (and by inference protein function), it is logical to assume that the structured form of a protein which can be unstructured is the functional form.

Not so according to this recent example [ Nature vol. 519 pp. 106 – 109 ’15 ]. 4EBP2 is a protein involved in the control of protein synthesis. It binds to another protein also involved in synthesis (eIF4E) to suppress a form of translation of mRNA into protein (cap dependent translation if you must know). 4EBP2 is intrinsically disordered. When it binds to its target it undergoes a disorder-to-order transition. However, eIF4E binding only occurs from the intrinsically disordered form.

Control of 4EBP2 activity is due, in part, to phosphorylation on multiple sites. This induces folding of amino acids #18 – #62 into a 4 stranded beta domain which sequesters the canonical YXXXXLphi motif with which 4EBP2 binds eIF4E (Y stands for tyrosine, X for any amino acid, L for leucine and phi for any bulky hydrophobic amino acid). So here we have an inactive (i.e. nonbinding) form of a protein being the structured rather than the unstructured form. The unstructured form of 4EBP2 is therefore the physiologically active form of the protein.
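A motif written in this kind of shorthand can be turned into a regular expression mechanically. The sketch below is illustrative only: the choice of which residues count as "phi" (here L, I, M, F, V, W) is an assumption, the helper names are hypothetical, and the test peptide is made up rather than taken from 4EBP2. The number of X positions is simply read off the string passed in.

```python
import re

# Assumption: "phi" (bulky hydrophobic) is taken to mean L, I, M, F, V or W.
PHI_CLASS = "[LIMFVW]"
ANY_RESIDUE = "[A-Z]"

def motif_to_regex(motif: str) -> re.Pattern:
    """Translate shorthand such as 'YXXXXLphi' into a compiled regex."""
    parts = []
    for ch in motif.replace("phi", "#"):  # '#' is a one-character stand-in for phi
        if ch == "X":
            parts.append(ANY_RESIDUE)
        elif ch == "#":
            parts.append(PHI_CLASS)
        else:
            parts.append(re.escape(ch))   # literal one-letter amino acid code
    return re.compile("".join(parts))

def find_motif(sequence: str, motif: str) -> list:
    """Return (start_index, matched_text) for each non-overlapping match."""
    pattern = motif_to_regex(motif)
    return [(m.start(), m.group()) for m in pattern.finditer(sequence.upper())]

# Made-up peptide, used only to exercise the function.
print(find_motif("AAYAAAALFAA", "YXXXXLphi"))
```

Finding the motif in a sequence says nothing about whether it is accessible; the point of the 4EBP2 story is that phosphorylation-induced folding hides it.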

Homework assignment for the protein chemist

As an ace protein chemist you are asked to design two proteins, both intrinsically disordered, which form a tight complex with a picoMolar dissociation constant.  To make the problem ‘easier’ there is no need for specific amino acid interactions between the proteins.  To make the problem harder, even in the tight complex formed, the two proteins remain intrinsically disordered.

Hint: ‘nature’, ‘evolution’, ‘God’, whatever you choose to call it, has solved the problem.

Answer in a few days.

The humble snow flea teaches us some protein chemistry

Who would have thought that the humble snow flea (that we used to cross country ski over in Montana) would teach us a great deal about protein chemistry, overturning some beloved shibboleths in the process?

The flea contains an antifreeze protein, which stops ice crystals from forming inside the cells of the flea in the cold environment in which it lives. The protein contains 81 amino acids, is 45% glycine and contains six  type II polyProline helices each 8 amino acids long (https://en.wikipedia.org/wiki/Polyproline_helix). None of the 6 polyProline helices contain proline despite the name, but all contain from 2 to 6 glycines. Also to be noted is (1) the absence of a hydrophobic core (2) the absence of alpha helices (3) the absence of beta turns (4) the protein has low sequence complexity.

Nonetheless it quickly folds into a stable structure, meaning that (1), (2), and (3) are not necessary for a stable protein structure. (4) means that low sequence complexity in a protein sequence does not invariably imply an intrinsically disordered protein.
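"Low sequence complexity" can be put on a rough numerical footing. One common proxy is the Shannon entropy of the amino acid composition; this is an assumption chosen for illustration and not necessarily the measure used in the paper. The function names and toy sequences below are made up.

```python
import math
from collections import Counter

def composition_entropy(sequence: str) -> float:
    """Shannon entropy (bits) of residue composition.

    0 bits for a homopolymer; log2(20), about 4.32 bits, if all 20
    amino acids are used equally.
    """
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return abs(entropy)  # avoid printing -0.0 for a homopolymer

def residue_fraction(sequence: str, residue: str) -> float:
    """Fraction of positions occupied by one residue type."""
    sequence = sequence.upper()
    return sequence.count(residue.upper()) / len(sequence) if sequence else 0.0

# Toy sequences, not real proteins.
glycine_rich = "GGAGGTGGSGGAGGPGGNGG"
mixed = "ACDEFGHIKLMNPQRSTVWY"
for name, seq in [("glycine-rich", glycine_rich), ("mixed", mixed)]:
    print(name, round(composition_entropy(seq), 2), residue_fraction(seq, "G"))
```

A composition-only score like this ignores residue order, which is part of the lesson of the snow flea protein: a low score does not by itself tell you whether the chain folds.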

You can read all about it in Proc. Natl. Acad. Sci. vol. 114 pp. 2241 – 2246 ’17.

Time for some humility in what we thought we knew about proteins, protein folding, protein structural stability.

The uses of disorder

There was a lot of shock and awe about a report showing how seemingly minor changes in an aliphatic group on benzene led to markedly different conformations in its protein target (lysozyme from bacteriophage T4) http://pipeline.corante.com/archives/2015/06/18/tiny_and_not_so_tiny_changes.php.

Our noses are being rubbed in just how floppy proteins are, in contrast to the first glimpses of protein structure obtained by Xray crystallography. Back then we knew so little about proteins, that seeing all the atoms laid out in alpha helices and beta sheets was incredibly compelling. We talked about the structure of a protein rather than a structure. Even back then, with hemoglobin (one of the first solved proteins) it was obvious that proteins had to have more than one structure. The porphyrin ring in heme that oxygen binds to is buried deep in hemoglobin, and the initial structure had to move in some way to allow oxygen to find its way in (because the initial structure showed no obvious channel for oxygen). So hemoglobin had to breathe.

We now know that many proteins have intrinsically disordered segments. Amazingly, the most recent estimate I could find in my notes (or in Wikipedia) is this — It is estimated that over 30% of eukaryotic proteins have stretches of over 30 amino acids that are intrinsically disordered [ J. Mol. Biol. vol. 337 pp. 635 – 645 ’04 ]. Does anyone out there know of more recent data?

We’re a lot smarter now. Here’s a comment on Derek’s post: “I have always thought crystal structures of proteins/enzymes are more a guide than actually useful. You are crystallizing a protein first-proteins don’t pack like that in vivo. Then you are settling on the conformation that freezes out- is this the lowest energy form? Then you are ignoring the fact that these are highly dynamic structures that are constantly moving, sliding, shaking, adjusting. Then if you put a ligand in there you get the lowest energy form-which is what it would look like after reaction and before ligand dissociation- this is quite different from what it can look like at other stages of the reaction.”

Here is an interesting example of the uses of protein disorder going on right now in just about every neuron in your body. Most neurons have long processes, far too long for diffusion to move a needed protein to their ends. For that purpose we have microtubules (aka neurotubules in neurons) stretching the length of the processes, onto which two types of motors attach (dyneins, which move things to the negative (minus) end of the microtubule, and kinesins, which move things to the positive (plus) end).

The microtubule is built from a heterodimer of two proteins (alpha and beta tubulin). Each contains about 450 amino acids and forms a globule 40 Angstroms (4 nanoMeters) in diameter. The heterodimers pack end to end to form a protofilament. 13 protofilaments line up side by side to form the microtubule, a hollow structure about 250 Angstroms in diameter. In cells microtubules are 1 to 10 microns long, but in nerve processes they can be ‘up to’ 100 microns in length. Even at 1 micron (1,000 nanoMeters) that’s 13 * 250 tubulin globules, or 13 * 125 = 1,625 heterodimers, in a microtubule.
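The count can be checked with a few lines of arithmetic. This sketch assumes the 4 nanoMeter globules stack end to end along each protofilament with no gaps or overlap, and counts each heterodimer as two globules (one alpha, one beta); the function name is made up.

```python
GLOBULE_DIAMETER_NM = 4.0   # 40 Angstroms per alpha or beta tubulin globule
PROTOFILAMENTS = 13         # protofilaments per microtubule

def tubulin_counts(length_nm: float) -> dict:
    """Estimate tubulin globules and heterodimers in a microtubule of given length."""
    globules_per_protofilament = length_nm / GLOBULE_DIAMETER_NM
    heterodimers_per_protofilament = globules_per_protofilament / 2  # alpha + beta
    return {
        "globules": PROTOFILAMENTS * globules_per_protofilament,
        "heterodimers": PROTOFILAMENTS * heterodimers_per_protofilament,
    }

# 1 micron, 10 microns, and a 100 micron microtubule in a nerve process.
for length_um in (1, 10, 100):
    counts = tubulin_counts(length_um * 1000)
    print(f"{length_um:>3} micron: {counts['heterodimers']:,.0f} heterodimers")
```

On these assumptions a 100 micron microtubule holds 162,500 heterodimers, each with its own pair of flexible carboxy terminal tails.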

Any protein structure this important has a lot of modifications imposed on it to alter structure and function. Examples include phosphorylation and the addition of glutamic acid chains (polyglutamylation). The carboxy terminal tails of alpha and beta tubulin are flexible and stick out from the tubulin rod (which is why they aren’t seen on Xray crystallography). The carboxy terminal tail is the site of post-translational glutamylation. The enzyme polyglutamylating the carboxy terminal tail of beta tubulin is TTLL7 (you don’t want to know what the acronym stands for). It binds to the alpha/beta tubulin heterodimer by an intrinsically disordered region of its own (becoming structured in the process), then it binds to the intrinsically disordered carboxy terminal tails, structuring them and modifying them. It’s basically a mating dance. There is a precedent for this; see https://luysii.wordpress.com/2013/12/29/the-mating-dance-of-a-promiscuous-protein/

So disordered regions of proteins, although structureless, are far from functionless.
