Category Archives: Chemistry (relatively pure)

Ring currents ride again

One of the most impressive pieces of evidence (to me at least) that we really understand what electrons are doing in organic molecules is the existence of ring currents. Recall that the pi electrons in benzene are delocalized above and below the planar ring determined by the 6 carbon atoms.

How do we know this? When a magnetic field is applied the electrons in the ring cloud circulate to oppose the field. So what? Well if you can place a C – H bond above the ring, the induced current will shield it. Such molecules are known, and the new edition of Clayden (p. 278) shows the NMR spectrum of [ 7 ] paracyclophane, which is benzene with a chain of 7 CH2’s linking the 1 and 4 positions, so that the hydrogens of the 4th CH2 are directly over the ring (7 CH2’s aren’t long enough for it to be anywhere else). Similarly, [ 18 ] Annulene has 6 hydrogens inside the aromatic ring, and these hydrogens are even more shielded. Interestingly, building larger and larger annulenes has shown that aromaticity decreases with increasing size, vanishing for systems with more than 30 pi electrons (diameter 13 Angstroms), probably because planarity of the carbons becomes less and less possible, breaking up the cloud.

This brings us to Nature vol. 541 pp. 200 – 203 ’17 which describes a remarkable molecule with 6 porphyrins in a ring hooked together by diyne linkers. The diameter of the circle is 24 Angstroms. Benzene and [ 18 ] Annulene have all the carbons in a plane, but the picture of the molecule given in the paper does not. Each of the porphyrins is planar of course, but each plane is tangent to the circle of porphyrins.

Also discussed is the fact that ‘anti-aromatic’ ring currents exist, which circulate to enhance rather than diminish the imposed magnetic field. The molecule can be switched between the aromatic and anti-aromatic states by changing its oxidation level. When it has 78 electrons, ( 19 * 4 ) + 2, in the ring (with a charge of + 6) it is aromatic. When it has 80 electrons, 20 * 4, with a + 4 charge it is anti-aromatic: further confirmation of the Huckel rule (as if it was needed).
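For anyone who wants to check the electron counting, here is a minimal sketch of the Huckel 4n + 2 bookkeeping. The function name is my own invention, and it only looks at the electron count; it assumes the ring is cyclic and conjugated enough for the rule to apply at all.

```python
def huckel_class(pi_electrons: int) -> str:
    """Classify a cyclic conjugated ring purely by its pi-electron count.

    Assumption: the ring is conjugated all the way round, so that
    electron counting is meaningful in the first place.
    """
    if pi_electrons <= 0 or pi_electrons % 2 != 0:
        return "not covered by the simple rule"
    # 4n + 2 electrons -> aromatic; 4n electrons -> anti-aromatic
    return "aromatic" if pi_electrons % 4 == 2 else "anti-aromatic"


# Benzene, [18]annulene, and the two oxidation states of the porphyrin ring
for count in (6, 18, 78, 80):
    print(count, huckel_class(count))
```

78 is 4 * 19 + 2 and 80 is 4 * 20, which is why removing or adding two electrons flips the sign of the ring current.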

On a historical note reference #27 is to a paper of Marty Gouterman in 1961, who was teaching grad students in chemistry in the spring of 1961. He was an excellent teacher. Here he is at the University of Washington —

Memories are made of this ?

Back in the day when information was fed into computers on punch cards, the data was the holes in the paper not the paper itself. A far out (but similar) theory of how memories are stored in the brain just got a lot more support [ Neuron vol. 93 pp. 6 -8, 132 – 146 ’17 ].

The theory says that memories are stored in the proteins and sugar polymers surrounding neurons rather than the neurons themselves. These go by the name of extracellular matrix, and memories are the holes drilled in it which allow synapses to form.

Here’s some stuff I wrote about the idea when I first ran across it two years ago.


An article in Science (vol. 343 pp. 670 – 675 ’14) on some fairly obscure neurophysiology at the end throws out (almost as an afterthought) an interesting idea of just how chemically and where memories are stored in the brain. I find the idea plausible and extremely surprising.

You won’t find the background material to understand everything that follows in this blog. Hopefully you already know some of it. The subject is simply too vast, but plug away. Here are a few theories (seriously flawed, in my opinion) from the past half century of how and where memory is stored in the brain.

#1 Reverberating circuits. The early computers had memories made of something called delay lines, where the same impulse would constantly ricochet around a circuit. The idea was used to explain memory as neuron #1 exciting neuron #2 which excited neuron … which excited neuron #n which excited #1 again. Plausible in that the nerve impulse is basically electrical. Very implausible, because you can practically shut the whole brain down using general anesthesia without erasing memory. However, RAM in the computers of the 70s used the localized buildup of charge to store bits and bytes. Since charge would leak away from where it was stored, it had to be refreshed constantly (e.g. at least 12 times a second) or it would be lost. Yet another reason data should always be frequently backed up.

#2 CaMKII — more plausible. There’s lots of it in brain (2% of all proteins in an area of the brain called the hippocampus — an area known to be important in memory). It’s an enzyme which can add phosphate groups to other proteins. To first start doing so calcium levels inside the neuron must rise. The enzyme is complicated, being comprised of 12 identical subunits. Interestingly, CaMKII can add phosphates to itself (phosphorylate itself) — 2 or 3 for each of the 12 subunits. Once a few phosphates have been added, the enzyme no longer needs calcium to phosphorylate itself, so it becomes essentially a molecular switch existing in two states. One problem is that there are other enzymes which remove the phosphate, and reset the switch (actually there must be). Also proteins are inevitably broken down and new ones made, so it’s hard to see the switch persisting for a lifetime (or even a day).

#3 Synaptic membrane proteins. This is where electrical nerve impulses begin. Synapses contain lots of different proteins in their membranes. They can be chemically modified to make the neuron more or less likely to fire to a given stimulus. Recent work has shown that their number and composition can be changed by experience. The problem is that after a while the synaptic membrane has begun to resemble Grand Central Station — lots of proteins coming and going, but always a number present. It’s hard (for me) to see how memory can be maintained for long periods with such flux continually occurring.

This brings us to the Science paper. We know that about 80% of the neurons in the brain are excitatory, in that when excitatory neuron #1 talks to neuron #2, neuron #2 is more likely to fire an impulse. The remaining 20% are inhibitory. Obviously both are important. While there are lots of other neurotransmitters and neuromodulators in the brain (with probably even more we don’t know about; who would have put carbon monoxide on the list 20 years ago?), the major inhibitory neurotransmitter of our brains is something called GABA. At least in adult brains this is true, but in the developing brain it’s excitatory.

So the authors of the paper worked on why this should be. GABA opens channels in the brain to the chloride ion. When it flows into a neuron, the neuron is less likely to fire (in the adult). This work shows that this effect depends on the negative ions (proteins mostly) inside the cell and outside the cell (the extracellular matrix). It’s the balance of the two sets of ions on either side of the largely impermeable neuronal membrane that determines whether GABA is excitatory or inhibitory (chloride flows in either event), and just how excitatory or inhibitory it is. The response is graded.

For the chemists: the negative ions outside the neurons are sulfated proteoglycans. These are much more stable than the proteins inside the neuron or on its membranes. Even better, it has been shown that the concentration of chloride varies locally throughout the neuron. The big negative ions (e.g. proteins) inside the neuron move about but slowly, and their concentration varies from point to point.

Here’s what the authors say (in passing) “the variance in extracellular sulfated proteoglycans composes a potential locus of analog information storage” — translation — that’s where memories might be hiding. Fascinating stuff. A lot of work needs to be done on how fast the extracellular matrix in the brain turns over, and what are the local variations in the concentration of its components, and whether sulfate is added or removed from them and if so by what and how quickly.


So how does the new work support this idea? It involves a structure that I’ve never talked about, the lysosome. It’s basically a bag of at least 40 digestive and synthetic enzymes inside the cell, which chop up anything brought to it (e.g. bacteria). Mutations in the enzymes cause all sorts of (fortunately rare) neurologic diseases: mucopolysaccharidoses, lipid storage diseases (Gaucher’s, Farber’s); the list goes on and on.

So I’ve always thought of the structure as a Pandora’s box best kept closed. I always thought of lysosomes as confined to the cell body, but they’re also found in dendrites according to this paper. Even more interesting, a rather unphysiologic treatment of neurons in culture (depolarization by high potassium) causes the lysosomes to migrate to the neuronal membrane and release their contents outside. One enzyme released is cathepsin B, a proteolytic enzyme which chops up the TIMP1 outside the cell. So what? TIMP1 is an endogenous inhibitor of Matrix MetalloProteinases (MMPs), which break down the extracellular matrix. Chop up the inhibitor and the MMPs are free to go to work on the matrix.

Are neurons ever depolarized by natural events? Just by synaptic transmission, action potentials and spontaneously. So here we have a way that neuronal activity can cause holes in the extracellular matrix, the holes in the punch cards if you will.

Speculation? Of course. But that’s the fun of reading this stuff. As Mark Twain said, “There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact.”

Tidings of great joy

One of the hardest things I had to do as a doc was watch an infant girl waste away and die of infantile spinal muscular atrophy (Werdnig Hoffmann disease) over the course of a year. Something I never thought would happen (a useful treatment) may be at hand. The actual papers are not available yet, but two placebo controlled trials with a significant number of patients (84, 121) in each were stopped early because trial monitors (not in any way involved with the patients) found the treated group was doing much, much better than the placebo. A news report of the trials is available [ Science vol. 354 pp. 1359 – 1360 ’16 (16 December) ].

The drug, a modified RNA molecule, (details not given) binds to another RNA which codes for the missing protein. In what follows a heavy dose of molecular biology will be administered to the reader. Hang in there, this is incredibly rational therapy based on serious molecular biological knowledge. Although daunting, other therapies of this sort for other neurologic diseases (Huntington’s Chorea, FrontoTemporal Dementia) are currently under study.

If you want to start at ground zero, I’ve written a series which should tell you enough to get started. Start here —
and follow the links to the next two.

Here we go if you don’t want to plow through all three

Our genes occur in pieces. Dystrophin is the protein mutated in the commonest form of muscular dystrophy. The gene for it is 2,220,233 nucleotides long but the dystrophin contains ‘only’ 3685 amino acids, not the 740,000+ amino acids the gene could specify. What happens? The whole gene is transcribed into an RNA of this enormous length, then 78 distinct segments of RNA (called introns) are removed by a gigantic multimegadalton machine called the spliceosome, and the 79 segments actually coding for amino acids (these are the exons) are linked together and the RNA sent on its way.
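The arithmetic is worth doing once to see just how little of the gene ends up in the protein. This is only a back of the envelope sketch using the numbers quoted above; it ignores the stop codon and the untranslated ends of the mRNA, and the variable names are mine.

```python
GENE_LENGTH_NT = 2_220_233   # nucleotides in the dystrophin gene (from the text)
PROTEIN_LENGTH_AA = 3_685    # amino acids in dystrophin (from the text)
INTRONS = 78
EXONS = 79

# Three nucleotides per codon: what the gene could specify if it were all coding
max_amino_acids = GENE_LENGTH_NT // 3

# Nucleotides actually needed to code for the protein
coding_nt = PROTEIN_LENGTH_AA * 3
coding_fraction = coding_nt / GENE_LENGTH_NT

print(f"Could code for {max_amino_acids:,} amino acids")
print(f"Actually codes for {PROTEIN_LENGTH_AA:,} ({coding_nt:,} nucleotides)")
print(f"Coding fraction of the gene: {coding_fraction:.2%}")
print(f"{EXONS} exons separated by {INTRONS} introns")
```

Roughly half a percent of the gene codes for protein; the rest is spliced out.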

All this was unknown in the 70s and early 80s when I was running a muscular dystrophy clinic and taking care of these kids. Looking back, it’s miraculous that more of us don’t have muscular dystrophy; there is so much that can go wrong with a gene this size, let alone transcribing and correctly splicing it to produce a functional protein.

One final complication — alternate splicing. The spliceosome removes introns and splices the exons together. But sometimes exons are skipped or one of several exons is used at a particular point in a protein. So one gene can make more than one protein. The record holder is something called the Dscam gene in the fruitfly which can make over 38,000 different proteins by alternate splicing.

There is nothing worse than watching an infant waste away and die. That’s what Werdnig Hoffmann disease is like, and I saw one or two cases during my years at the clinic. It is also called infantile spinal muscular atrophy. We all have two genes for the same crucial protein (called unimaginatively SMN). Kids who have the disease have mutations in one of the two genes (called SMN1). Why isn’t the other gene protective? It codes for the same sequence of amino acids (but using different synonymous codons). What goes wrong?

[ Proc. Natl. Acad. Sci. vol. 97 pp. 9618 – 9623 ’00 ] Why is SMN2 (the centromeric copy (e.g. the copy closest to the middle of the chromosome) which is normal in most patients) not protective? It has a single translationally silent nucleotide difference from SMN1 in exon 7 (e.g. the difference doesn’t change amino acid coded for). This disrupts an exonic splicing enhancer and causes exon 7 skipping leading to abundant production of a shorter isoform (SMN2delta7). Thus even though both genes code for the same protein, only SMN1 actually makes the full protein.

More background. The molecular machine which removes the introns is called the spliceosome. It’s huge, containing 5 RNAs (called small nuclear RNAs, aka snRNAs), along with 50 or so proteins, with a total molecular mass of around 2,500,000 Daltons. Think about it chemists. Design 50 proteins and 5 RNAs with probably 200,000+ atoms so they all come together forming a machine to operate on other monster molecules, such as the mRNA for Dystrophin alluded to earlier. Hard for me to believe this arose by chance, but current opinion has it that way.

Splicing out introns is a tricky process which is still being worked on. Mistakes are easy to make, and different tissues will splice the same pre-mRNA in different ways. All this happens in the nucleus before the mRNA is shipped outside where the ribosome can get at it.

The papers [ Science vol. 345 pp. 624 – 625, 688 – 693 ’14 ] describe a small molecule which acts on the spliceosome to increase the inclusion of SMN2 exon 7. It does appear to work in patient cells and mouse models of the disease, even reversing weakness.

I was extremely skeptical when I read the papers two years ago. Why? Because just about every protein we make is spliced (except histones), and any molecule altering the splicing machinery seems almost certain to produce effects on many genes, not just SMN2. If it really works, these guys should get a Nobel.

Well, I shouldn’t have been so skeptical. I can’t say much more about the chemistry of the drug (nusinersen) until the papers come out.

Fortunately, the couple (a cop and a nurse) took the 25% risk of another child with the same thing and produced a healthy infant a few years later.

A new way to study protein dynamics

“Fields of 1,000,000 Volts/centiMeter are dangerously large from a laboratory point of view”. True enough, but that’s only about TEN TIMES the potential difference/distance ratio found across the plasma membrane of all our cells. Here’s why, after a bit of background.

We wouldn’t exist without the membranes enclosing our cells, which are largely hydrocarbon. Chemists know that fatty acids have one end (the carboxyl group) which dissolves in water while the rest is pure hydrocarbon. The classic is stearic acid: 18 carbons in a straight chain with a carboxyl group at one end. 3 molecules of stearic acid are esterified to glycerol in beef tallow (forming a triglyceride). The pioneers hydrolyzed it to make soap. Saturated fatty acids of 18 carbons or more are solid at body temperature (soap certainly is), but cellular membranes are fairly fluid, and proteins embedded in them move around pretty quickly. Why? Because most fatty acids over 16 carbons found in biologic membranes have double bonds in them. Guess whether they are cis or trans. Hint: the isomer used packs less well into crystals. You’ve got it, all the double bonds found in oleic (18 carbons, 1 double bond) and arachidonic (20 carbons, 4 double bonds) acids are cis, which keeps membranes fluid. The cis double bond essentially puts a 60 degree kink in the hydrocarbon chain, making it much more difficult to pack in a liquid crystal type structure with all the hydrocarbon chains stretched out. Then there’s cholesterol, which makes up 1/5 or so of membranes by weight. It also breaks up the tendency of fatty acid hydrocarbon chains to align with each other because it doesn’t pack with them very well. So cholesterol is another fluidizer of membranes.

How thick is the cellular membrane? If you figure the hydrocarbon chains of a saturated fatty acid stretched out as far as they can go, you get 1.54 Angstroms * cosine (30 degrees) = 1.33 Angstroms/carbon, times 16 = 21 Angstroms. Now double that because cellular membranes are lipid bilayers, meaning that they are made of two layers of hydrocarbons facing each other, with the hydrophilic ends (carboxyls, phosphate groups) pointing outward. So we’re up to 42 Angstroms of thickness for the hydrocarbon part of the membrane. Add another 10 Angstroms or so on each side for the hydrophilic ends (which include things like serine, choline etc. etc.) and you’re up to about 60 Angstroms thickness for the membrane (which is usually cited as 70 Angstroms; I don’t know why).

The electric field across our membranes is huge. The potential difference across our cell membranes is 70 milliVolts (70 x 10^-3 volts). 70 Angstroms is 7 nanoMeters (7 x 10^-9 meters). Divide 70 x 10^-3 volts by 7 x 10^-9 meters and you get a field of 10,000,000 Volts/meter, which is 100,000 Volts/centiMeter.
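Since it is easy to slip a power of ten between meters and centimeters, here is a minimal sketch that redoes both estimates: the thickness of the hydrocarbon core and the field across the membrane. The numbers are the ones used in the text (1.54 Angstrom C-C bonds, 16 carbons per chain, 70 milliVolts, 70 Angstroms); the variable names are mine.

```python
import math

# --- Thickness of the hydrocarbon core of the bilayer ---
CC_BOND_ANGSTROM = 1.54
# Each bond in a zigzag chain advances the chain by bond length * cos(30 degrees)
rise_per_carbon = CC_BOND_ANGSTROM * math.cos(math.radians(30))
one_leaflet = 16 * rise_per_carbon        # one layer of fully extended chains
hydrocarbon_core = 2 * one_leaflet        # bilayer: two layers tail to tail

# --- Electric field across the membrane ---
POTENTIAL_V = 70e-3                       # 70 milliVolts
THICKNESS_M = 70e-10                      # 70 Angstroms, expressed in meters
field_v_per_m = POTENTIAL_V / THICKNESS_M
field_v_per_cm = field_v_per_m / 100      # 100 centimeters per meter

print(f"Rise per carbon:  {rise_per_carbon:.2f} Angstroms")
print(f"Hydrocarbon core: {hydrocarbon_core:.1f} Angstroms")
print(f"Membrane field:   {field_v_per_m:.1e} V/m = {field_v_per_cm:.1e} V/cm")
```

The field comes out at about 10^7 Volts/meter, which is 10^5 Volts/centiMeter, so it is the unit (meters versus centimeters) that decides how the membrane field compares with a laboratory field quoted in Volts/centiMeter.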

So our membrane proteins live and function quite nicely in this intense electric field. Which brings us to [ Nature vol. 540 pp. 400 – 405 ’16 ] which zaps protein crystals with electric fields of comparable intensity, and then does Xray crystallography at various intervals to watch how the protein backbone and side chains move. The technique is called Electric Field stimulated Xray crystallography (EF-X). Unlike in solution, where proteins are all in slightly different conformations, every protein in the crystal starts from the same conformation, so the starting line is the same, as is the finish line.

The electric pulse durations range from 50 – 500 nanoSeconds (50 – 500 * 10^-9 seconds). The Xray pulse for doing Xray crystallography lasts all of 100 picoSeconds (100 * 10^-12 seconds). By timing the delay between the electric pulse and the Xray pulse you watch the protein move in time in response to the electric pulse. Hardly physiologic, but it seems likely that protein motions will follow the path of least resistance, which should tell us which conformations are closest in energy to the energy minimum found in proteins. The pulses are collected 50, 100, 200 nanoSeconds after pulse onset. The crystals tolerated ‘hundreds’ of 100 – 500 nanoSecond megaVolt electric field pulses. But even 50 nanoSeconds is pretty long where protein dynamics are concerned, as bond vibrations are as fast as a few femtoSeconds (10^-15 seconds). An electric field of this strength exerts a force of 10^

The technology enabling this is fantastic, but it is quite similar in concept to what the late Nobelist Ahmed Zewail was doing. Of course his work was even faster, looking at chemical reactions at the femtoSecond level of time (10^-15 seconds). So as the year draws to a close, it’s nice to see his ideas live on, even if he didn’t.


Pickings have been slim lately, but here’s a great paper and a puzzle for you chemists out there. Most chemists (and biologists) know what a lipid bilayer is. It’s basically a soap bubble, with water loving (hydrophilic) groups on the outside of both sides of the bilayer, and hydrocarbon chains within. If the hydrocarbon chains are all stretched out, the distance between carbons 1 and 3 is 2.66 Angstroms, so an 18 carbon fatty acid (stearic acid) should be 8 * 2.66 + 1.33 Angstroms long (22.6 Angstroms). Double this for the bilayer and you have a thickness of 45 Angstroms. It’s probably less because carbon chains aren’t extended, partially because of entropy and largely because of cholesterol which breaks up any chance of such order (which may be an important function for it). Sitting on either side of the lipid bilayer are phosphates esterified to one of the 3 hydroxyls of glycerol, with fatty acids of at least 16 – 18 carbons esterified to the other two. Hanging off the phosphates are a variety of things, but mostly serine and choline, forming phosphatidyl serine (PS) and phosphatidyl choline (PC). Here’s a picture:

Scramblases are enzymes which move phospholipids from one side of the lipid bilayer to the other, essentially randomizing their composition. They undo the action of other enzymes (called flippases believe it or not) which make the lipid composition of the two leaflets of the lipid bilayer rather different. This isn’t trivial, and is behind an elegant mechanism to show scavenger cells that a cell is dead. Flippases work to put phosphatidyl serine (PS) on the side of the lipid bilayer (the leaflet) facing the cytoplasm. This, of course, takes energy, and when a cell lacks energy, entropy takes its course and PS appears on the outer leaflet, telling scavenger cells (phagocytes) to eat (phagocytose) the cell.

So how does an enzyme drag phosphatidyl choline (PC) or phosphatidyl serine (PS) across the lipid bilayer, scrambling the compositional asymmetry? Can you figure out a mechanism for a membrane protein to do this without looking at Proc. Natl. Acad. Sci. vol. 113 pp. 14049 – 14054 ’16? Chemists think they’re smart, and if you can design a protein to do this you’re smarter than I am because I’ve wondered (ineffectually) for a long time how this was done.

The authors describe the structure of a fungal scramblase. It functions as a dimer with each subunit containing a hydrophilic groove containing polar and charged amino acid side chains facing the dimer interface. The protein itself does something unusual — it twists the sheet of the membrane, and decreases the thickness of the membrane from 29 to 18 Angstroms (remember the maximum possible thickness of the lipid bilayer was 45 Angstroms, but isn’t that thick for the reasons given above).

Phosphatidyl choline is a zwitterion (e.g. it contains both negative and positive charges although overall electrically neutral). The charges are separated in space forming a dipole. On the cytoplasmic side of the bilayer the scramblase has some amino acid side chains also forming a dipole, and right near the channel formed by the two hydrophilic grooves of the dimer. So it attracts the head group of PC (phosphate plus choline) as one dipole does to another which is then further attracted to the hydrophilic groove entering it — its hydrocarbon tail remains in the lipid part of the membrane. Then another PC joins the fun, pushing PC #1 farther into the groove, so that a chain of PCs fills the groove, wagging their lipid tails behind them (a la Little Bo Peep).

Clever no?

All is not perfect as the model doesn’t explain how phosphatidyl serine (which isn’t a zwitterion) moves across, but it’s an incredible start.

A scary paper: Cancer by proxy

Can a good kid growing up in a bad neighborhood turn bad? Most think so. What about a genetically normal cell growing up in a bad neighborhood? Can it turn cancerous if its neighbors have a mutation ? A recent paper [ Nature vol. 539 pp.304 – 308 ’16b] demonstrates how this can happen.

A gene called PTPN11 is mutated in myelomonocytic leukemia (MML) in humans and mice. Expressing the mutant in blood cells causes leukemia in mice (nothing spectacular there).

However, expressing the mutant in marrow supporting cells (not blood cells or blood stem cells) for long enough gives MML in mice, which can be transplanted into normal mice producing MML there.

Note that the blood stem cells don’t contain the mutant gene. One theory has it that mutant PTPN11 recruits monocytes, which then produce other stuff (CCL3 also known as MIP1alpha and interleukin1Beta), which then turns on blood stem cells to proliferate madly causing leukemia. Giving a CCL3 receptor antagonist reverses the myeloproliferation (but it isn’t clear to me if it reverses the leukemia once established)

As far as we know the cells developing into MML don’t contain mutant PTPN11. So it’s cancer by proxy. Obviously some changes (mutations, epigenetic changes) have occurred in the leukemic cells, but at this point we don’t know what they are.

What is ICP27 trying to tell us? One of you could get a PhD if you figure it out !

It wouldn’t be the first time a viral protein led us to an important cellular mechanism. Consider what the polio virus taught us about the translation of mRNA into protein. It cleaves two components of eIF-4F (eukaryotic Initiation Factor 4F, needed for the ribosome to begin translating mRNA into protein), totally shutting down translation of mRNAs with a cap on their 5′ end (which is most of them). Poliovirus mRNAs don’t have these caps so their proteins continue to be made.

Well this brings us to ICP27 (Infected Cell Protein 27) a product of the Herpes Simplex virus. You can read all about it in [ Proc. Natl. Acad. Sci. vol. 113 pp. 12256 – 12261 ’16 ]. ICP27 is essential for herpes virus infection. This work shows that it inhibits intron splicing (but in under 1% of cellular genes) and also promotes the use of alternative 5′ splice sites.

It also induces the expression of pre-mRNAs prematurely cleaved and polyAdenylated from cryptic polyAdenylation signals located in intron 1 or intron 2 of an amazing 1% of all cellular genes. These prematurely cleaved and polyAdenylated mRNAs sometimes contain novel open reading frames (ORFs). They are typically intronless (they should be) and under 2 kiloBases long. They are expressed early during viral infection and efficiently exported to cytoplasm. The ICP27 targeted genes are GC rich (as are all Herpes simplex genes), and contain cytosine rich sequences near the 5′ splice site.

The paper also showed that optimization of splice site sequences, or mutation of nearby cytosines eliminated ICP27 mediated splicing inhibition. Introduction of cytosine rich sequences to an ICP27 INsensitive splicing reporter conferred susceptibility to ICP27.

How is this going to help you get a PhD? Ask yourself: what are cryptic polyAdenylation signals doing in the first two introns of so many genes? It seems obvious (to me) that the cell, as well as the virus, is using them for some purpose. It isn’t hard to mutate something to the signal for polyadenylation, AAUAAA. Interestingly cleavage doesn’t occur here, but 30 nucleotides or so downstream. The sequence occurs on average once every 4^6 = 4096 nucleotides (if they’re random). I’m not sure what the total length of introns #1 and #2 is for our 20,000 or so protein coding genes, but someone should be able to find out and see if 200 occurrences of this sequence is more than would be expected by chance.
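Here is a minimal sketch of the chance calculation, for whoever takes up the challenge. It assumes the four nucleotides are equally likely and independent at every position (real introns are not like this), and the total intron length fed to it below is a made-up placeholder, not a measured number.

```python
def expected_hits(total_length_nt: int, motif: str = "AAUAAA") -> float:
    """Expected number of times `motif` appears in random sequence.

    Assumes equal, independent base frequencies (probability 1/4 each),
    and counts every possible start position on one strand.
    """
    per_position_probability = 0.25 ** len(motif)        # 1/4096 for a hexamer
    start_positions = max(total_length_nt - len(motif) + 1, 0)
    return start_positions * per_position_probability


# HYPOTHETICAL total length of introns 1 and 2 over all genes, for illustration only
hypothetical_total_nt = 10_000_000
print(f"Expected by chance: {expected_hits(hypothetical_total_nt):.0f} occurrences")
```

Plug in the real total length and compare the answer with the number of signals actually used. The more interesting comparison is probably not how many AAUAAAs exist, but why these particular ones respond to ICP27.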

The plot thickens when the paper notes that “Over 200 genes are affected by ICP27. Over 30 (including PML, STING, TRAF6, PPP6C, MAP3K7, FBXw11, IFNAR2, NFKB1, RELA and CREBP) are related to the immune pathway”. Do you think the cell doesn’t use this pathway as well?

What about the existence of other viral (and cellular) proteins doing the same sort of thing (but on different introns perhaps)? What are those novel open reading frames in the alternatively spliced mRNAs doing?

Fascinating stuff. Time to get busy if you’re an enterprising grad student, or young faculty member.

The proteasome branches out

The surface of a protein is not at all like a ball of yarn, even though they are both one long string. This has profound implications for the immune system. Look at any solved protein structure. The backbone bobs and weaves taking water hating (hydrophobic) amino acids into the center of the protein, and putting water loving (hydrophilic) amino acids on the surface. So even though the peptide backbone is continuous, only discontinuous patches of it are displayed on the protein surface.

Which is a big problem for the immune system which wants to recognize the surface of the protein (which is all it first gets to see with an invading bug). Now we know that foreign proteins are ingested by the cell, chopped up by the proteasome, and fragments loaded on to immune molecules (class I Major Histocompatibility Complex antigens) and displayed on the cell surface so the immune system can learn what it looks like and react to it. The peptides aren’t very long — under 11 or so amino acids, but they are continuous.

What if the really distinct part of the protein surface (e.g. the immunogen)  is made of two distinct patches from the backbone? A fascinating paper shows how the immune system might still recognize it. Chop the protein up into fragments by the proteasome, and then have the fragments from adjacent patches put back together. You know that any enzyme can be run in reverse, so if the proteasome can split peptide bonds apart it can also join them together.

This is exactly what was found in a recent paper — Science vol. 354 pp. 354 – 358 ’16. The small peptides (containing at most 11 amino acids) finding their way to the cell surface were analyzed in a technical tour de force. In aggregate they go by the fancy name of immunopeptidome. They found that the proteasome IS actually splicing peptide fragments together. This is called Proteasome Catalyzed Peptide Splicing (PCPS). The present work shows that it accounts for 1/3 of the class I immunopeptidome in terms of diversity and 1/4 in terms of abundance. One-third of self antigens are represented on the cell surface of the immune cell line they studied (GR-LCL the GR-lymphoblastoid cell line) ONLY by spliced peptides. The ordering of the spliced peptide was the same as the parent protein in only half. There was no preference for the length of the protein skipped by the splice.

The work has huge implications for immunology, not least autoimmune disease.

So today I wrote the author the following

Dr. Mishto

Terrific paper ! Do you have any evidence for the spliced peptides being spatially contiguous on the surface of the parent protein? Have you looked?

This makes a lot of sense, because the immune system should ‘want’ to recognize protein conformations as they exist in the living cell, rather than stretches of amino acid sequence in the parent protein. Also, with few exceptions the surface of a given protein in vivo is a collection of discontinuous peptide sequences of the parent protein. I’ve always wondered how the immune system did this, and perhaps your paper explains things.


and got this back almost immediately

Dear Luysii

Interesting idea. We shall have a look for few examples where the crystallography structure or the parental protein is disclosed already.



It doesn’t get any better than this. Tomorrow I will be exactly 78 years and 6 months old. It shows I can still think (on occasion).

Addendum 17 Nov ’16: It looks as though proteins are fed into the central cavity of the proteasome as a completely denatured single strand. See figure 5 of PNAS 113 pp 12991 – 12996 ’16. The channel to get in appears quite narrow.

The world’s longest allosteric effect

I think there is some very interesting protein physical chemistry to be discovered/worked out based on a recent report [ Nature vol. 537 pp. 107 – 111 ’16 ]. It involves a long (2,200 Angstrom) coiled coil protein called EEA1 (Early Endosome Antigen 1). It contains 1,400 amino acids 1,275 of which form a coiled coil.

If you are conversant with the alpha helix and how two of them form a coiled coil, jump to ****. Otherwise here is some background and links to pictures which should help.

The alpha helix is a type of protein secondary structure in which the protein backbone assumes the shape of a coiled spring. There are 3.64 amino acids per turn. A single turn is 5.4 Angstroms high and 11 Angstroms wide. The alpha helix is right handed. That is to say, that if you orient the chain so that your thumb points from the N terminal to C terminal amino acid, the chain will twist in the direction of the fingers of the right hand as it rises. For some reason I can’t provide a link to the very large number of images available. However, when I go to Google and type “images of alpha helices” I see them immediately, so you’ll have to do the same to get there.

Coiled coils have two alpha helices winding around each other. This means that for secure interactions, the same types of amino acids must repeat again and again. A 7 residue periodicity (abcdefg)n in the distribution of nonpolar and charged amino acid residues is a characteristic feature of proteins which form alpha helices coiled about each other (coiled coil molecules). The 7 amino acids are lettered a – g from amino to carboxy. Positions a and d are usually hydrophobic amino acids (Leu, Ile, Val, Ala); positions e and g are usually polar or charged. The nonpolar a and d side chains associate by means of complementary knobs-into-holes packing. Each individual alpha helix is right handed, but the two helices wind around each other with a left handed turn. For a regular repeating structure, the same type of amino acid should appear at the same position in space on the alpha helix (which forms a rigid rod) every 7 residues, but with 3.64 amino acids per turn, 7 residues don’t make a whole number of turns. To see all the pictures you want, go to Google and type “Images of the Alpha Helix”.

To get the number of amino acids down to 3.5 per turn (so the structure can repeat exactly every 7 amino acids, i.e. after 2 alpha helical turns), left handed supercoiling of each helix occurs (it’s a chicken and egg situation). The helices are at an angle of 18 degrees to each other, and every 3.5 amino acids still form a 5.4 Angstrom turn (when one helix is viewed in isolation), but due to the tilt they take up 5.1 Angstroms along the axis of the coiled coil. This means that the same type of amino acid is found at positions 1, 8, 15, 22 etc. All intermediate filament proteins (keratin, neurofilaments, vimentin, etc.) contain a coiled coil structure. So to see all the pictures you could want, go to Google and type “Images of coiled coil proteins”.
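For those who like to see the bookkeeping, here is a minimal Python sketch of the heptad repeat described above. The function names are my own invention for illustration, and the sketch assumes a perfectly regular coiled coil that starts at position a (real proteins, as we’ll see, have breaks in the pattern).

```python
HEPTAD = "abcdefg"  # the 7 positions of the repeat, lettered amino to carboxy


def heptad_register(n_residues, start="a"):
    """Return the heptad letter for residues 1..n_residues.

    Assumes an unbroken repeat beginning at the letter `start`.
    """
    offset = HEPTAD.index(start)
    return [HEPTAD[(i + offset) % 7] for i in range(n_residues)]


def positions_of(letter, n_residues, start="a"):
    """1-based residue numbers that land on a given heptad letter."""
    register = heptad_register(n_residues, start)
    return [i + 1 for i, l in enumerate(register) if l == letter]


# The hydrophobic 'a' and 'd' positions in the first 28 residues
print(positions_of("a", 28))  # [1, 8, 15, 22]
print(positions_of("d", 28))  # [4, 11, 18, 25]
```

The a positions come out at 1, 8, 15, 22, which is the repeat given in the text; the d positions fall 3 residues later each time.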

So the 1,275 amino acids of EEA1 divided by 3.5 and multiplied by 5.1 give you a coiled coil of fairly enormous length for a protein (1,858 Angstroms); average protein diameter (if there is such a thing) is under 50 Angstroms.
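The arithmetic is simple enough to check in a few lines of Python. This is just a sketch using the numbers quoted in this post (3.5 residues per turn, 5.1 Angstroms of rise per turn along the coiled coil axis); the function name is made up for illustration.

```python
RESIDUES_PER_TURN = 3.5  # supercoiled helix, as given in the text
RISE_PER_TURN = 5.1      # Angstroms per turn along the coiled coil axis


def coiled_coil_length(n_residues,
                       residues_per_turn=RESIDUES_PER_TURN,
                       rise_per_turn=RISE_PER_TURN):
    """Length in Angstroms of an ideal, fully extended coiled coil."""
    return n_residues / residues_per_turn * rise_per_turn


length = coiled_coil_length(1275)
print(round(length))  # 1858
```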

Functionally, EEA1 seems to be used as a tether with one end free and the other end hooked to a target membrane which wants to ‘catch’ the early endosome. The target membrane isn’t specified in the paper. Apparently EEA1, when not binding the endosome, is in a fully extended state, at around 2,000 Angstroms.

A protein called Rab5 is found on the early endosome membrane, and when EEA1 contacts it, the long coiled coil helix collapses, dragging the endosome toward the target membrane. This is entropy in action, there being far more configurations of a collapsed protein than a rigidly extended one. To feel entropy for yourself, just pull on a rubber band; entropic effects just like this one are what you feel pulling back.
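To put a number on ‘far more configurations’, here is a toy model in Python. It is emphatically not the authors’ model of EEA1: I’m assuming a one dimensional chain of equal segments, each of which can point either forward or backward, and simply counting how many arrangements give a particular end-to-end distance.

```python
from math import comb


def n_configurations(n_segments, end_to_end):
    """Count 1D chains of n unit segments (each +1 or -1) with a given
    net end-to-end displacement. A toy model, assumed for illustration."""
    # The displacement must be reachable and have the same parity as n.
    if abs(end_to_end) > n_segments or (n_segments + end_to_end) % 2:
        return 0
    forward_steps = (n_segments + end_to_end) // 2
    return comb(n_segments, forward_steps)


print(n_configurations(10, 10))  # 1   (fully extended)
print(n_configurations(10, 0))   # 252 (ends brought together)
```

Even with only 10 segments there is exactly one way to be fully extended and 252 ways to bring the ends together. The imbalance grows explosively with chain length, which is why a flexible chain pulls its ends together.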

The collapse of EEA1  is an allosteric effect and a very long one, although the authors note long range allosteric effects are “not uncommon among coiled coil proteins”.

EEA1 is more complicated than initially described. It contains amino acids which disrupt the 7 amino acid periodicity of the coiled coil (making it a jointed structure). The authors then made an EEA1 protein without the joints (so it was a perfect very long coiled coil). Binding of this protein to Rab5 on an endosome doesn’t result in collapse. So clearly normal EEA1 collapses at the ‘joints’.

The authors talk about some hypotheses as to how this happens in the Supplementary material (but I was unable to find them).

So here’s a good research project for an enterprising grad student: either find out why and how a protein with multiple joints should exist in a fully extended configuration, or figure out how binding of Rab5 at one end of EEA1 produces such profound allosteric changes through this long linear protein. Happy hunting and thinking.

I must say it’s a pleasure to get back to chemistry after writing about the neurologic and medical issues of the presidential candidates.

Addendum 29 September: I wrote the following to one of the authors (Dr. Grill), sending him the post above

Dr. Grill

Greatly enjoyed the paper.  I could never find the discussion of possible mechanism in the supplementary material.  You might enjoy the following post written about the paper

He replied as follows:

“Dear Luysii thank you very much for the kind words, and I really like your title!

With the supplementary discussion, besides the method part there is an additional supplement file on the Nature website that is easy to miss…I attach it here for you. We discuss this a bit more, but I must admit that this is not very satisfactory at the moment. We just don’t know how this works, and much of our efforts at the moment are dedicated to understand”
So for other readers of the original paper who also can’t find the supplement with the authors’ speculations as to what is going on, here is what he sent.

” A key question is how Rab5 can induce such a long-range global molecular transition in flexibility of EEA1. Indeed, long-range allosteric effects have been observed for other coiled-coil proteins. In the case of myosin, the presence of discontinuities in the coiled-coil heptads drive structural changes to flexibility. Other tethering factors may bend through large breaks in coiled-coil structure acting as joints, although it remains to be shown whether and how conformational changes are triggered by Rab binding, as shown for EEA1.

Furthermore, a dynamically flexible coiled-coil is mostly extended, provided its ends are free (ref. 60). However, when the ends of this coiled coil are tethered, bent, or when torsion is locally applied, compensatory structural changes are propagated and even amplified through the length of the structure. Our results suggest that a change in intrinsic static curvature may contribute but is not the major cause for the reduction in end-to-end distance. However, a more rigorous assessment would require visualizing the thermal fluctuations of the bound and unbound EEA1 very rapidly and in three dimensions.

Force generation due to entropic effects plays a key role in many processes in biology ranging from DNA cytoskeletal filaments to motor proteins. Switching a molecule from stiff to flexible could be an effective and general mechanism of many coiled-coil proteins for generating an attractive force, thereby pulling two objects together or allowing reactions otherwise hindered by polymer rigidity. Future experiments will test to what extent the entropic collapse is a general mechanism used not only by membrane tethers but also in other biological processes.”


Baudelaire comes to Chemistry

Could an evil molecule be beautiful? In Les Fleurs du Mal, a collection of poems, Baudelaire argued that there was a certain beauty in evil. Well, if there ever was an evil molecule, it’s the Abeta42 peptide, the main component of the senile plaque of Alzheimer’s disease, a molecule whose effects I spent my entire professional career as a neurologist ineffectually fighting. And yet, in a recent paper on the way it forms the fibrils constituting the plaque I found the structure compellingly beautiful.

The papers are Proc. Natl. Acad. Sci. vol. 113 pp. 9398 – 9400, E4976 – E4984 ’16. People have been working on the structure of the amyloid fibril of Alzheimer’s for decades, consistently stymied by its insolubility. The authors solved it not by X-ray crystallography, not by cryoEM, but by solid state NMR. They basically looked at the distance constraints between pairs of isotopically labeled atoms, and built their model that way. Actually they built a bouquet of models using computer aided energy minimization of the peptide backbone. Another independent study produced nearly the same set.

The root mean square deviation of backbone atoms of the 10 lowest energy models of the bouquets in the two studies was small (0.89 and 0.71 Angstroms). Even better, the model bouquets of the two papers resemble each other.
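For readers who haven’t met it, the root mean square deviation is easy to compute once two models have been superimposed. Here is a minimal Python sketch; the coordinates are made up purely for illustration (they are not from either paper), and the code assumes the two models are already optimally aligned, which real structural software does as a separate step.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length lists of
    (x, y, z) atom positions. Assumes the models are already superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical toy 'backbones': three atoms, the second model shifted by
# 1 Angstrom along y.
model_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_2 = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]

print(rmsd(model_1, model_2))  # 1.0
```

An RMSD under 1 Angstrom across a bouquet of models means the backbones differ, on average, by less than the length of a typical covalent bond.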

There are two chains of Abeta42, EACH shaped like a double horseshoe (similar to the letter S). The two S’s meet around a twofold axis. The interface between the two S’s is formed by two noncontiguous areas on each monomer (#15 – #17) and (#34 – #37).

The hydrophilic amino terminal residues (#1 – #14) are poorly ordered, but amino acids #15 – #42 are arranged into 4 short beta strands (I only see 3 obvious ones) that stack up and down the fibril into parallel, in-register beta sheets. Each stack of double horseshoes forms a thread and the two threads twist around each other to form a two stranded protofilament.

Glycines allow sharp turns at the corners of the horseshoes. Hydrogen bonds between amides link the two layers of the fibrils. Asparagine side chains form ladders of hydrogen bonds up and down the fibrils. Water isn’t present between the layers because the beta sheets are so close together (counterintuitively this increases the entropy, because water molecules don’t have to align themselves just so to solvate the side chains).

Each of the horseshoes is stabilized by hydrophobic interactions among the hydrophobic side chains buried in the core. Charged residues are solvent exposed. The interface between the two horseshoes is hydrophobic.

Many of the familial mutations are on the outer edges of the double S structure: K16N, A21G, D23N, E22A, E22K, E22G, E22Q.

The surface hydrophobic patch formed by V40 and A42 may explain the greater rate of secondary nucleation by Abeta42 vs. Abeta40.

The cryoEM structures we have of Abeta42 are different, showing the phenomenon of amyloid polymorphism.

The PNAS paper used recombinant Abeta and prepared homogeneous fibrils by repeated seeding of dissolved Abeta42 with preformed fibrils. The other study used chemically synthesized Abeta and got fibrils without seeding. Details of pH, peptide concentration and salt concentration differed, and yet the results are the same, making both structures more secure.

The new structure doesn’t immediately suggest the toxic mechanism of Abeta.

To indulge in a bit of teleology: the structure is so beautiful and so intricately designed that the Abeta42 peptide has probably been evolutionarily optimized to perform an (as yet unknown) function in our bodies. Animals lacking Abeta42’s parent (the amyloid precursor protein) don’t form neuromuscular synapses correctly, but they are viable.