Category Archives: Chemistry (relatively pure)

A beautiful molecule

OK class.  Pop quiz.  Draw the structure of C304 H264 (not a misprint).  To help you out there are 6 sharp NMR peaks (roughly the same size) between 7.8 and 8.5, a doublet at 8.6  and a large peak at 1.45.  This obviously tells you the molecule is quite symmetric.

Give up?   You can read about its elegant synthesis and structure in  Science vol. 363 pp. 151 – 155 ’19.  I doubt that Woodward ever thought of a molecule like this, but do not doubt that he would have loved the elegance of its synthesis, using techniques and reagents not available to him years ago.

Give up?  Does the fact that the molecule contains 40 benzene moieties help?  How about the 16 tButyl groups?

Does the fact that the dimensions of the molecule are 17.4 x 16.4 x 16.4 Angstroms help?  How about the fact that the molecule contains a central space which can comfortably fit a buckyball inside?
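For anyone who wants to check the clues against the formula, here is a minimal Python sketch of the atom bookkeeping.  It assumes (the post doesn't say so) that every carbon sits either in one of the 40 benzene rings or in one of the 16 tert-butyl groups; the atomic weights are rounded standard values.

```python
# Atom bookkeeping for the quiz molecule, C304H264.
# Assumption (not stated in the post): every carbon belongs either to one of
# the 40 benzene rings or to one of the 16 tert-butyl groups.

BENZENE_RINGS = 40
TBUTYL_GROUPS = 16
FORMULA = {"C": 304, "H": 264}

ring_carbons = BENZENE_RINGS * 6          # 6 carbons per benzene ring
tbutyl_carbons = TBUTYL_GROUPS * 4        # C(CH3)3 has 4 carbons
tbutyl_hydrogens = TBUTYL_GROUPS * 9      # C(CH3)3 has 9 hydrogens

carbons_accounted = ring_carbons + tbutyl_carbons
aromatic_hydrogens = FORMULA["H"] - tbutyl_hydrogens

# Standard atomic weights (g/mol), rounded.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008}
molar_mass = sum(ATOMIC_WEIGHT[el] * n for el, n in FORMULA.items())

print(carbons_accounted)   # carbons from rings plus tert-butyl groups
print(aromatic_hydrogens)  # hydrogens left over for the aromatic rings
print(round(molar_mass))   # approximate molar mass in g/mol
```

Under that assumption the rings and tert-butyl groups account for all 304 carbons, and the hydrogens not on tert-butyl groups work out to 3 per benzene ring on average.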

I’ll fill in some details tomorrow for those without access to the paper.  I’d love to see the NMR of a cyclophane derivative (which the authors haven’t made yet).

Bravo to all concerned in the paper.


Happy New Year

Off to see the grandkids.  Next post next year.  All the best to you and yours.

If you liked or were interested in and have nothing better to do over the holidays, I suggest reading Cell vol. 175 pp.1842 – 1855 ’18 in which phase changes are described at enhancers and possibly a universal phenomenon at the start of mRNA transcription by pol II.

Bye bye stoichiometry

Until recently, developments in physics basically followed earlier work by mathematicians.  Think relativity following Riemannian geometry by 40 years.  However, in the past few decades physicists have developed mathematical concepts before the mathematicians (think mirror symmetry, which came out of string theory).  You may skip the following paragraph, but here is what it meant to mathematics, from a description of a 400+ page book by Amherst College’s own David A. Cox:

Mirror symmetry began when theoretical physicists made some astonishing predictions about rational curves on quintic hypersurfaces in four-dimensional projective space. Understanding the mathematics behind these predictions has been a substantial challenge. This book is the first completely comprehensive monograph on mirror symmetry, covering the original observations by the physicists through the most recent progress made to date. Subjects discussed include toric varieties, Hodge theory, Kahler geometry, moduli of stable maps, Calabi-Yau manifolds, quantum cohomology, Gromov-Witten invariants, and the mirror theorem. This title features: numerous examples worked out in detail; an appendix on mathematical physics; an exposition of the algebraic theory of Gromov-Witten invariants and quantum cohomology; and, a proof of the mirror theorem for the quintic threefold.

Similarly, advances in cellular biology have come from chemistry.  Think DNA and protein structure, enzyme analysis.  However, cell biology is now beginning to return the favor and instruct chemistry by giving it new objects to study. Think phase transitions in the cell, liquid-liquid phase separation, liquid droplets, and many other names (the field is in flux) as chemists begin to explore them.  Unlike most chemical objects, they are big, or they wouldn’t have been visible microscopically, so they contain many, many more molecules than chemists are used to dealing with.

These objects do not have any sort of definite stoichiometry and are made of RNA and the proteins which bind them (and sometimes DNA).  They go by any number of names (processing bodies, stress granules, nuclear speckles, Cajal bodies, promyelocytic leukemia bodies, germline P granules).  Recent work has shown that DNA may be compacted similarly using the linker histone [ PNAS vol.  115 pp.11964 – 11969 ’18 ].

The objects are defined essentially by looking at them.  By golly they look like liquid drops, and they fuse and separate just like drops of water.  Once this is done they are analyzed chemically to see what’s in them.  I don’t think theory can predict them now, and they were never predicted a priori as far as I know.

No chemist in their right mind would have made them to study.  For one thing they contain tens to hundreds of different molecules.  Imagine trying to get a grant to see what would happen if you threw that many different RNAs and proteins together in varying concentrations.  Physicists have worked for years on phase transitions (but usually with a single molecule; think water).  So have chemists (think crystallization).

Proteins move in and out of these bodies in seconds.  Proteins found in them do have low complexity of amino acids (mostly made of only a few of the 20), and unlike enzymes, their sequences are intrinsically disordered, so forget the key and lock and induced fit concepts for enzymes.

Are they a new form of matter?  Is there any limit to how big they can be?  Are the pathologic precipitates of neurologic disease (neurofibrillary tangles, senile plaques, Lewy bodies) similar?  There certainly are plenty of distinct proteins in the senile plaque, but they don’t look like liquid droplets.

It’s a fascinating field to study.  Although made of organic molecules, there seems to be little for the organic chemist to say, since the interactions aren’t covalent.  Time for physical chemists and polymer chemists to step up to the plate.

The uses of disorder in the cell

We know that many proteins have disordered segments, and an older (2004) estimate says that over 30% of all eukaryotic proteins have disordered stretches of more than 30 amino acids.  Here is another example where the disordered conformation(s) of a protein is the form used by the cell.

Histone H1 (aka the linker histone) binds to DNA between nucleosomes.  It is thought to be important in the 10,000-fold or so compaction of the 3 meters or so of DNA each cell has so it fits into a 10 micron nucleus.  Histone H1 has a disordered carboxy terminal tail of 100 amino acids.  Unsurprisingly it is strongly positively charged (so it binds to the negatively charged phosphates holding DNA together).

H1 was studied in an interesting paper [ Proc. Natl. Acad. Sci. vol. 115 pp. 11964 – 11969 ’18 ].  The tail was added to a short (36 basepair) double stranded segment of DNA, under various stoichiometries and ionic compositions.  They found regions where the complex formed liquid droplets microns in size.

We know DNA is compacted, and people have looked for the 30 nanoMeter fiber of DNA bound to nucleosomes for years without success.  It is possible that the compaction of DNA is due to phase separation (which is basically unstructured) rather than the rather specific structures proposed.  H1 may be acting as a liquid-like glue.  Fascinating.

In other work H1 was complexed with another protein (Prothymosin alpha), another intrinsically disordered protein which actually serves as a histone H1 chaperone.  Prothymosin is polyAnionic, so it binds to polyCationic H1.  What is fascinating is that the binding is quite tight (picoMolar) and yet even when so tightly bound H1 remains disordered, something to confound drug chemists who are always looking for specific binding conformations.

The paper also describes Psi DNA, which is formed in solutions of cationic polymers. Here DNA condenses into a compact solvent excluded state.  It is an ordered assembly of B-DNA arranged in parallel twisted helical segments with a well defined spacing.  It produces an anomalously large scattering signal in circular dichroism spectra.

Here is an older post in which the functional form of a protein is the unstructured one

When the active form of a protein is intrinsically disordered

Back in the day, biochemists talked about the shape of a protein, influenced by the spectacular pictures produced by Xray crystallography. Now, of course, we know that a protein has multiple conformations in the cell. I still find it miraculous that the proteins making us up have only relatively few. For details see —

Presently, we also know that many proteins contain segments which are intrinsically disordered (i.e. no single shape). The pendulum has swung the other way, with “estimations that contiguous regions longer than 50 amino acids ‘may be present’ in ‘up to’ 50% of proteins coded in eukaryotic genomes” [ Proc. Natl. Acad. Sci. vol. 102 pp. 17002 – 17007 ’05 ]

[ Science vol. 325 pp. 1635 – 1636 ’09 ] Compared to ordered regions, disordered regions of proteins have evolved rapidly, contain many short linear motifs that mediate protein/protein interactions, and have numerous phosphorylation sites compared to ordered regions. Disordered regions are enriched in serine and threonine residues, while ordered sequences are enriched in tyrosines — this highlights functional differences in the types of phosphorylation. Interestingly tyrosines have been lost during evolution.

What are unstructured protein segments good for? One theory is that the disordered segment can adopt different conformations to bind to different partners — this is the moonlighting effect. Then there is the fly casting mechanism — by being disordered (hence extended rather than compact) such proteins can flail about and find partners more easily.

Given what we know about enzyme function (and by inference protein function), it is logical to assume that the structured form of a protein which can be unstructured is the functional form.

Not so according to this recent example [ Nature vol. 519 pp. 106 – 109 ’15 ]. 4EBP2 is a protein involved in the control of protein synthesis. It binds to another protein also involved in synthesis (eIF4E) to suppress a form of translation of mRNA into protein (cap dependent translation if you must know). 4EBP2 is intrinsically disordered. When it binds to its target it undergoes a disorder to order transition. However eIF4E binding only occurs from the intrinsically disordered form.

Control of 4EBP2 activity is due, in part, to phosphorylation on multiple sites. This induces folding of amino acids #18 – #62 into a 4 stranded beta domain which sequesters the canonical YXXXLphi motif with which 4EBP2 binds eIF4E (Y stands for tyrosine, X for any amino acid, L for leucine and phi for any bulky hydrophobic amino acid). So here we have an inactive (i.e. nonbinding) form of a protein being the structured rather than the unstructured form. The unstructured form of 4EBP2 is therefore the physiologically active form of the protein.
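A motif like this is easy to hunt for with a regular expression.  The following is only a sketch: the set of residues counted as "bulky hydrophobic" is my assumption (the text doesn't list them), the number of X positions follows the motif as written above and is left as a parameter, and the peptide is made up rather than taken from the real 4EBP2 sequence.

```python
import re

# Assumption: "bulky hydrophobic" (phi) is taken here as L, I, M, F, V or W.
# The post does not list the residues, so this set is illustrative only.
PHI = "LIMFVW"

def motif_pattern(n_any=3):
    """Regex for Y, then n_any arbitrary residues, then L, then phi."""
    return re.compile("Y" + "." * n_any + "L" + "[" + PHI + "]")

def find_motifs(sequence, n_any=3):
    """Return (start_index, matched_text) for each non-overlapping hit."""
    return [(m.start(), m.group()) for m in motif_pattern(n_any).finditer(sequence)]

# A made-up peptide, not the real 4EBP2 sequence.
toy = "MSGGAYDRKLMSTTP"
print(find_motifs(toy))
```

Changing `n_any` lets the same function search for variants of the motif with a different number of spacer residues.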

Cell biological porn

Could there actually be cell biological porn?  Yes indeedy, and hopefully the following is not behind a paywall. — [ Cell vol. 175 pp. 1430 – 1442 ’18 ]

For why I find the pictures (and videos) in the article sexy, we have to go back to the bad old days of 1962 when I entered medical school and saw my first electron micrograph.  Possessed of an immense ego and a newly minted masters of chemistry, I thought I could look at the pictures and figure out what was going on chemically to produce what was seen, namely Robertson’s unit membrane.  We know what’s going on in cell membranes now, but here’s what I had to deal with back then.

Membranes fixed with osmium tetroxide revealed a characteristic tri-laminar appearance consisting of two parallel outer dark (osmiophilic) layers and a central light (osmiophobic) layer.

The osmiophilic layers typically measured 20-25 Å (2.0-2.5 nm) in thickness and the osmiophobic layers measured 25-35 Å (2.5-3.5 nm), yielding a total thickness of 65-85 Å (6.5-8.5 nm). This value compared favorably with the thickness predicted on the basis of chemical studies.

According to Robertson, the unit membrane consisted of a bimolecular lipid leaflet sandwiched between outer and inner layers of protein organized in the pleated sheet configuration. Such an arrangement was presumed to be basically the same in all cell membranes.

Well that was the state of the art back then.  I figured I could do better, particularly since I’d used osmium tetroxide as a chemist to convert olefins to vic-diols.  Little did I know that the osmium was being used because of its high atomic weight (76 protons and over 100 neutrons) making it relatively impenetrable to the electrons of the electron microscope.

But then I looked at what was done to prepare tissue for electron microscopy: fix with glutaraldehyde, then osmium.  Dehydrate the (dead) tissue, and embed it in a monomeric resin which polymerizes to form a solid block of plastic, then cut the block into very thin sections, place one on a copper grid covered with carbon, pump the air out so the electrons can get through, and take a picture (prayer optional).

As soon as I read this, any hope of chemical analysis disappeared.  It also taught me that it was a very large leap to assume the electron micrographs reflected what was going on in living tissue.

Which is why the above paper is so spectacular.  It uses two types of living cells (COS-7 a fibroblast like cell line from kidney and U2OS, an osteosarcoma cell line). The technique (Grazing Incidence Structured Illumination Microscopy — GI-SIM) is incredibly complicated (but well described in the paper).  It allows you to image events near the part of the cell resting on the microscope stage at 970 Angstrom resolution at rates of ‘up to’ 266 frames/second over thousands of time points.  Recall that the lowest wavelength of visible light is 3,800 Angstroms.

Various dyes are used to differentially stain microtubules, the membranes of the endoplasmic reticulum (ER), late endosomes (LEs), mitochondria and lysosomes.  To my amazement the pictures look like the electron micrographs of yore.

You can watch mitochondria touching the ER and then splitting, ER tubules growing and shrinking and being pulled along LEs riding on microtubules etc. etc. The pictures show the same cell over a period of 4 minutes.

Then to make a neurologist’s day complete, they watch dendritic spines form and unform in cultured hippocampal neurons.

So look at the paper if you can.  You don’t even have to read it; the pictures are explanatory.

An extraordinarily impressive work, considering where we’ve been.

Lactose intolerance and the proteins of the synaptic cleft

What does lactose intolerance have to do with the zillions of proteins happily infesting the synaptic cleft?  Only someone whose mind was warped into very abstract thinking by rooming with philosophy majors in college would see a connection.

The synaptic cleft is of immense theoretical interest to neuroscientists, drug chemists and pharmacologists, and of great practical interest to people affected by neurologic and psychiatric disease either in themselves or someone they know (e.g. just about everyone).

Almost exactly a year ago I wrote a post about a great paper on the proteins of the synaptic cleft by Thomas Sudhof.  You may read the post after the *****

Well Dr. Sudhof is back with another huge review of just how synapses are formed [ Neuron vol. 100 pp. 276 – 293 ’18 ], which covers very similar ground.

It is clear that he’s depressed by the state of the field.  Here are a few quotes

“I believe that we may need to pay more attention to technical details than customary because the pressures on investigators have increased the tendency to publish preliminary results, especially results obtained with new methods whose limitations are not yet clear.”

Translation: a lot of the stuff coming out is junk.

“Given the abundance of papers reporting non-validated protein interactions that cannot possibly be all correct, it seems that confidence in a possible protein-protein interaction requires either isolation of a stable complex or biophysical measurements of interactions using recombinant purified proteins.”

Translation:  Oy vey !

“Pre- or postsynaptic specializations are surprisingly easy to induce by diverse signals. This was first shown in pioneering studies demonstrating that polylysine beads induce formation of presynaptic nerve terminals in cultured neurons and in brain in vivo.” Obviously this means that you have to be very careful when you claim that a given protein or two causes a synapse to form, which researchers have not been.

Translation not needed.

Then on to the meat of the review.  “An impressive number of candidate synaptic Cell Adhesion Molecules (CAMs) has been described (9 classes are given each with multiple members). For some of these CAMs, compelling data demonstrate their presence in synapses and suggest a functional role in synapses. Others, however, are less well documented. If one looks at the results in total, the overall impression is puzzlement: how do so many CAMs contribute to shaping a synapse?”

Then from pp. 281 – 286 he goes into the various CAMs, showing the extent and variety of proteins found in the synaptic cleft.  Which ones are necessary and what are they doing?  Can they all be important?  There must be some redundancy as knockout of some doesn’t do much.

Here is where lactose tolerance/intolerance comes in to offer succor to the harried investigator.

Bluntly, they must be doing something, and something important,  or they wouldn’t be there.

People with lactose intolerance have nothing wrong with the gene for the enzyme which breaks down lactose.  Babies have no problem with breast milk.  The enzyme (lactase)  produced from the gene is quite normal in all of us.  However 10,000 years ago and earlier, cattle were not domesticated, so there was no dietary reason for a human weaned from the breast to make the enzyme.  Something turned off lactase production; from my reading, it’s not clear what.   The control region (lactase enhancer) for the lactase gene is 14,000 nucleotides upstream from the gene itself.  After domestication of cattle, a mutation arose changing cytosine to thymine in the enhancer, so that people could digest milk their entire lives.  The farthest back the mutation has been found is 6,500 years. 3 other mutations are known, which keep the lactase gene expressed past weaning.  They arose independently.  All 4 spread in the population, because back then our ancestors were in a semi-starved state most of the time, and carriers had better nutrition.

How does this offer succor to Dr. Sudhof?  Simply this, here is a mechanism to turn off production of an enzyme our ancestors didn’t need past weaning.  Don’t you think this would be the case for all the proteins found in and around the synapse?  They must be doing something or they wouldn’t be there.  I realize that this is teleology writ large, but evolutionary adaptations make you think this way.


The bouillabaisse of the synaptic cleft

The synaptic cleft is so small (under 400 Angstroms, i.e. 40 nanoMeters) that it can’t be seen with the light microscope (the smallest wavelength of visible light is 3,900 Angstroms, i.e. 390 nanoMeters).  This led to a bruising battle between Cajal and Golgi just over a century ago over whether the brain was actually made of cells.  Even though Golgi’s work led to the delineation of single neurons he thought the brain was a continuous network.  They both won the Nobel in 1906.

Semifast forward to the mid 60s when I was in medical school.  We finally had the electron microscope, so we could see synapses. They showed up as small CLEAR spaces (i.e. electrons passed through them easily, leaving them white) between neurons.  Neurotransmitters were being discovered at the same time and the synapse was to be the analogy to vacuum tubes, which could pass electricity in just one direction (yes, the transistor, although invented, hadn’t been used to make anything resembling a computer; the Intel 4004 wasn’t until the 70s).  Of course now we know that information flows back and forth across the synapse, with endocannabinoids (e.g. natural marihuana) being the major retrograde neurotransmitter.

Since there didn’t seem to be anything in the synaptic cleft, neurotransmitters were thought to freely diffuse across it to bind to receptors on the other (postsynaptic) side, i.e. a free fly zone.

Fast forward to the present to a marvelous (and grueling to read because of the complexity of the subject not the way it’s written) review of just what is in the synaptic cleft [ Cell vol. 171 pp. 745 – 769 ’17 ] (It is likely behind a paywall).  There are over 120 references, and rather than being just a catalogue, the single author Thomas Sudhof extensively discusses which experimental work is to be believed (not that Sudhof is saying the work is fraudulent, but that it can’t be used to extrapolate to the living human brain).  The review is a staggering piece of work for one individual.

The stuff in the synaptic cleft is so diverse, and so intimately involved with itself and the membranes on either side that what is needed for comprehension is not a chemist but a sociologist.  Probably most of the molecules to be discussed are present in such small numbers that the law of mass action doesn’t apply, nor do binding constants which rely on large numbers of ligands and receptors. Not only that, but the binding constants haven’t been determined for many of the players.

Now for some anatomic detail and numbers.  It is remarkably hard to find just how far laterally the synaptic cleft extends.  Molecular Biology of the Cell ed. 5 p. 1149 has a fairly typical picture with a size marker and it looks to be about 2 microns across (20,000 Angstroms, 2,000 nanoMeters).  Treated as a disc of that diameter, that’s 314,159,265 square Angstroms (3.14 square microns).  So let’s assume each protein takes up a square 50 Angstroms on a side (2,500 square Angstroms).  That’s room for about 125,600 proteins on each side assuming extremely dense packing.  However the density of acetyl choline receptors at the neuromuscular junction is 8,700/square micron, a packing also thought to be extremely dense, which would give only about 27,300 such proteins in a similarly distributed CNS synapse. So the numbers are at least in the right ball park (meaning they’re within an order of magnitude, i.e. within a power of 10) of being correct.
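Here is the same back-of-the-envelope arithmetic as a short Python sketch, for anyone who wants to change the inputs.  It assumes the cleft is a flat circular disc 2 microns across and that each protein occupies a 50 x 50 Angstrom square; both are simplifications, and the receptor estimate shifts a little depending on whether the area is rounded to 3 or kept at 3.14 square microns.

```python
import math

ANGSTROM_PER_MICRON = 10_000

diameter_um = 2.0                      # lateral extent read off the textbook picture
radius_A = diameter_um / 2 * ANGSTROM_PER_MICRON

area_A2 = math.pi * radius_A ** 2      # disc area in square Angstroms (assumed circular)
area_um2 = area_A2 / ANGSTROM_PER_MICRON ** 2

protein_side_A = 50                    # assumed footprint: 50 x 50 Angstrom square
max_proteins = area_A2 / protein_side_A ** 2

achr_density_per_um2 = 8_700           # neuromuscular junction figure from the text
achr_like_count = achr_density_per_um2 * area_um2

ratio = max_proteins / achr_like_count

print(f"area: {area_A2:,.0f} square Angstroms = {area_um2:.2f} square microns")
print(f"dense square packing: {max_proteins:,.0f} proteins")
print(f"at receptor density:  {achr_like_count:,.0f} proteins")
print(f"ratio of the two estimates: {ratio:.1f}")
```

The two estimates differ by less than a factor of 10, which is all the ball park argument needs.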

What’s the point?

When you see how many different proteins and different varieties of the same protein reside in the cleft, the number of each individual element is likely to be small, meaning that you can’t use statistical mechanics but must use sociology instead.

The review focuses on the neurExins (I capitalize the E  to help me remember that they are prEsynaptic).  Why?  Because they are the best studied of all the players.  What a piece of work they are.  Humans have 3 genes for them. One of the 3 contains 1,477 amino acids, spread over 1,112,187 basepairs (1.1 megaBases) and 74 exons.  This means that only about 4/10 of a percent of the gene (1,477 codons x 3 nucleotides = 4,431 basepairs) is actually coding for the amino acids making it up.  I think it takes energy for RNA polymerase II to stitch the ribonucleotides into the 1.1 megabase pre-mRNA, but I couldn’t (quickly) find out how much per ribonucleotide.  It seems quite wasteful of energy, unless there is some other function to the process which we haven’t figured out yet.
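The coding fraction is worth working out explicitly, because it is easy to drop the factor of three between amino acids and nucleotides.  A minimal Python sketch using only the numbers quoted above (the stop codon and untranslated regions are ignored):

```python
amino_acids = 1_477
gene_length_bp = 1_112_187

coding_bp = amino_acids * 3                      # three nucleotides per codon
coding_fraction = coding_bp / gene_length_bp     # fraction of the gene that codes
naive_fraction = amino_acids / gene_length_bp    # amino acids per basepair, no factor of 3

print(f"coding nucleotides: {coding_bp:,}")
print(f"coding fraction: {coding_fraction:.2%}")
print(f"amino acids per basepair: {naive_fraction:.2%}")
```

Either way you count, well under 1% of the gene ends up in the protein.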

Most of the molecule resides in the synaptic cleft.  There are 6 LNS domains with 3 interspersed EGFlike repeats, a cysteine loop domain, a transmembrane region and a cytoplasmic sequence of 55 amino acids. There are 6 sites for alternative splicing, and because there are two promoters for each of the 3 genes, there is a shorter form (beta neurexin) with less extracellular stuff than the long form (alpha-neurexin).  When all is said and done there are over 1,000 possible variants of the 3 genes.

Unlike olfactory neurons which only express one or two of the nearly 1,000 olfactory receptors, neurons express multiple isoforms of each, increasing the complexity.

The LNS regions of the neurexins are like immunoglobulins and fill a 60 x 60 x 60 Angstrom box.  Since the synaptic cleft is at most 400 Angstroms wide, the alpha-neurexins (if extended) reach all the way across.

Here the neurexins bind to the neuroligins which are always postsynaptic (sorry, no mnemonic).  They are simpler in structure, but they are the product of 4 genes, and only about 40 isoforms (due to alternative splicing) are possible. Neuroligins 1, 3 and 4 are found at excitatory synapses, neuroligin 2 is found at inhibitory synapses.  The intracleft part of the neuroligins resembles an important enzyme (acetylcholinesterase) but is catalytically inactive.  This is where the neurexins bind.

This is complex enough, but Sudhof notes that the neurexins are hubs interacting with multiple classes of post-synaptic molecules in addition to the neuroligins: dystroglycan, GABA[A] receptors, calsyntenins, latrophilins (of which there are 4).   There are at least 50 post-synaptic cell adhesion molecules: “Few are well understood, although many are described.”

The neurexins have 3 major sites where other things bind, and all sites may be occupied at once.  Just to give you a taste of the complexity involved (before I go on to larger issues):

The second LNS domain (LNS2) is found only in the alpha-neurexins, and binds to neuroexophilin (of which there are 4) and dystroglycan.

The 6th LNS domain (LNS6) binds to neuroligins, LRRTMs, GABA[A] receptors, cerebellins and latrophilins (of which there are 4).

The juxtamembrane sequence of the neurexins binds to CA10, CA11 and C1ql.

The cerebellins (of which there are 4) bind to all the neurexins (of a particular splice variety) and interestingly to some postsynaptic glutamic acid receptors.  So there is a direct chain across the synapse from neurexin to cerebellin to ion channel (GLuD1, GLuD2).

There is far more to the review. But here is something I didn’t see there.  People have talked about proton wires: sites on proteins that allow protons to jump from one site to another, and move much faster than they would if they had to bump into everything in solution.  Remember that molecules are moving quite rapidly; water is moving at 590 meters a second at room temperature. Since the synaptic cleft is 40 nanoMeters (40 x 10^-9 meters) wide, it should take only 40 x 10^-9 meters / 590 meters/second = about 68 trillionths of a second (68 picoSeconds) to cross, assuming the synapse is a free fly zone.  But it isn’t, as the review exhaustively shows.
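The crossing time is just distance over speed, and can be checked in a few lines of Python.  This is a sketch of the idealized free-flight case only: a molecule moving in a straight line at the quoted speed of water, with no collisions, which is exactly what the real cleft does not allow.

```python
cleft_width_m = 40e-9        # 40 nanometers
water_speed_m_per_s = 590    # molecular speed of water quoted in the text

transit_s = cleft_width_m / water_speed_m_per_s   # straight-line crossing time
transit_ps = transit_s * 1e12                     # seconds to picoseconds

print(f"{transit_ps:.0f} picoseconds")
```

The answer lands in the tens of picoseconds, so anything that slows or channels the transmitter in a crowded cleft matters a great deal by comparison.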

Is it possible that the various neurotransmitters at the synapse (glutamic acid, gamma amino butyric acid, etc) bind to the various proteins crossing the cleft to get to their target in the postsynaptic membrane (i.e. neurotransmitter wires)?  I didn’t see any mention of neurotransmitter binding to the various proteins in the review.  This may actually be an original idea.

I’d like to put more numbers on many of these things, but they are devilishly hard to find.  Both the neuroligins and neurexins are said to have stalks pushing them out from the membrane, but I can’t find how many amino acids they contain.  I can’t find how much energy it takes to copy the 1.1 megabase neurexin gene into mRNA (or even how much energy it takes to add one ribonucleotide to an existing mRNA chain).

Another point: proteins have a finite lifetime.  How are they replenished?  We know that there is some synaptic protein synthesis.  Does the cell body send packages of mRNAs to the synapse to be translated there?  There are at least 50 different proteins mentioned in the review, and don’t forget the thousands of possible isoforms, each of which requires a separate mRNA.

Old Chinese saying: the mountains are high and the emperor is far away. Protein synthesis at the synaptic cleft is probably local.  Just what gets made and when is an entirely different problem.

A large part of the review concerns mutations in all these proteins associated with neurologic disease (particularly autism).  This whole area has a long and checkered history.  A high degree of cynicism is needed before believing that any of these mutations are causative.  As a neurologist dealing with epilepsy I saw the whole idea of ion channel mutations causing epilepsy crash and burn — here’s a link —’ve-found-the-mutation-causing-your-disease-not-so-fast-says-this-paper/

Once again, hats off to Dr. Sudhof for what must have been a tremendous amount of work.

Thomas Gold lives !

Thomas Gold was a scientific jack of all trades being involved in physics, cosmology and geochemistry, the latter of interest to us here.  He thought petroleum and other hydrocarbons were actually produced by micro-organisms below the surface of the earth, providing us with a replenishable supply (how ecological !)  Here’s part of a Wiki article about him:  Hydrocarbons are not biology reworked by geology (as the traditional view would hold), but rather geology reworked by biology.

Why bring him up now?  Because [ Proc. Natl. Acad. Sci. vol. 115 pp. 10702 – 10707 ’18 ] showed that 600 meters below the surface (where light and molecular oxygen never go) Cyanobacteria were found.  They use the electrons stripped from hydrogen (which is said to be produced in the subsurface by several (unspecified) abiotic mechanisms) as an energy source.  The electrons have to go somewhere, and they postulate that the electron acceptors are iron or manganese oxides. Wherever microorganisms have been found in deep continental settings hydrogen concentration decreases.

Basically life acts as the middleman, taking an energy cut from the flow of electrons from reductant to oxidant.

Seriously, life may have actually arisen in such situations.

Triplets and TADs

Neurologists have long been interested in triplet diseases.  The triplet is made of a string of 3 nucleotides.  Example: cytosine adenosine guanosine, or CAG, which accounts for a lot of them.  We have lots of places in our genome where such repeats normally occur, with the triplets repeated up to 42 times.  However in diseases like Huntington’s chorea the repeats get to be as many as 250 CAGs in a row.  You normally are quite fine as long as you have under 36 of them, and no one has fewer than 6 at this particular location.
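Counting the repeats in a stretch of sequence is a simple pattern-matching job.  Here is a minimal Python sketch; the function names and the toy sequence are made up for illustration, and the only threshold used is the "under 36" figure from the paragraph above (real clinical classification is more graded than this).

```python
import re

def longest_cag_run(dna):
    """Length, in repeats, of the longest uninterrupted CAG tract."""
    runs = re.findall(r"(?:CAG)+", dna.upper())
    return max((len(run) // 3 for run in runs), default=0)

def classify(repeats, normal_below=36):
    """Threshold from the text: under 36 repeats is fine."""
    return "normal range" if repeats < normal_below else "expanded"

# Made-up sequence: a 20-repeat tract, a short spacer, then a 3-repeat tract.
toy = "ATG" + "CAG" * 20 + "CCGCCA" + "CAG" * 3 + "TAA"
n = longest_cag_run(toy)
print(n, classify(n))
```

Only uninterrupted tracts are counted here; a single base change inside a tract splits it into two shorter runs.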

Subsequently, expansions of 4, 5, and 6 nucleotide repeats have also been shown to cause disease, bringing the total of repeat expansion diseases to over 40.  Why more than half of them should affect the nervous system entirely or for the most part is a mystery.  Needless to say there are plenty of theories.

This leads to three questions: (1) there are repeats all over the genome, so why do only 40 or so of them expand? (2) since we all have repeats in front of the genes where they cause disease, why don’t we all have the diseases? (3) why does the number of repeats expand with each succeeding generation (the phenomenon is called anticipation)?  I saw one such example where a father brought his son to my muscular dystrophy clinic.  The boy had moderately severe myotonic dystrophy.  When I shook the father’s hand, it was clear that he had mild myotonia, which had in no way impaired his life (he was a successful banker).

A recent paper in Cell may help answer the first question and has a hint about the second [ Cell vol. 175 pp. 38 – 40, 224 – 238 ’18 ].  21 of 27 disease associated short tandem repeats (daSTRs) localize to something called a topologically associated domain (TAD) or subdomain (subTAD) boundary. These domains are defined as contiguous intervals in the genome in which every pair of loci has an elevated interaction frequency compared to loci outside the domain.  TADs and subTADs are measured using chromosome conformation capture assays (acronyms for them include 3C, CCC, 4C, 5C, Hi-C).
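To make the definition of a domain concrete, here is a toy Python sketch that tests it on a small contact matrix.  Everything in it is an assumption for illustration: the matrix counts are made up, and the strict test (every inside pair must beat every inside-outside pair) is my literal reading of the definition, not the statistical method that real TAD-calling software or the paper uses.

```python
def is_domain(contacts, start, end):
    """Check the definition on a symmetric contact matrix.

    Assumed reading: every pair of loci inside [start, end) must interact
    more often than any inside locus does with any outside locus.
    """
    inside = range(start, end)
    outside = [k for k in range(len(contacts)) if k < start or k >= end]
    within = [contacts[i][j] for i in inside for j in inside if i < j]
    across = [contacts[i][k] for i in inside for k in outside]
    if not within or not across:
        return False
    return min(within) > max(across)

# Toy 5-locus matrix (made-up counts): loci 0-2 interact strongly with each other.
toy = [
    [0, 9, 8, 1, 1],
    [9, 0, 9, 2, 1],
    [8, 9, 0, 2, 1],
    [1, 2, 2, 0, 7],
    [1, 1, 1, 7, 0],
]
print(is_domain(toy, 0, 3))   # loci 0-2 as a candidate domain
print(is_domain(toy, 1, 4))   # loci 1-3 straddle the boundary
```

In this toy matrix the boundary between the two domains falls between loci 2 and 3, which is the sort of position where the paper finds most of the disease associated repeats.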

Briefly they are performed as follows.  Intact nuclei are isolated from live cell cultures.  These are subjected to paraformaldehyde crosslinking to fix segments of the genome in close physical proximity. The crosslinked genomic DNA is digested with a restriction endonuclease, and the products amplified by PCR using primers in all possible combinations.  Then, having a complete genome sequence in hand, you see what regions of the genome got close enough together to show up in the assay.

This may help explain question one, and the paper gives some speculation about question two — we don’t all have these diseases, because unlike the unfortunates with them, we don’t have problems in our genes for DNA replication, repair and recombination.  There is some evidence for this;  studies in model organisms with these mutations do have short tandem repeat instability.

Unfortunately the paper doesn’t discuss anticipation, because no clinicians appear to be among the authors, even though they’re from Penn which 50+ years ago was very strong in clinical neurology.

None of this work discusses the fascinating questions of how the expanded repeats cause disease, or why so many of them affect the nervous system.

The Kavanaugh Ford confrontation will be to this decade what the Patty Hearst kidnapping was to a previous one  —  Since I suffered 4 episodes of physical (not sexual) abuse as a kid, and dealt with this extensively as a neurologist, I’m trying to decide whether to write about it.  Emotions are high and there are a lot of nuts out there on the net. There is even a reasonable possibility that both Ford and Kavanaugh are right and not lying.

The chemical ingenuity of the AIDS virus

Pop quiz:  You are a virus with under 10,000 nucleotides in your genome.  To make the capsid enclosing your genome, you need to make 250 hexamers of a particular protein.  How do you do it?


Give up?


You grab a cellular metabolite with a mass under 1,000 Daltons to bind the 6 monomers together.  The metabolite occurs at fairly substantial concentrations (for a metabolite) of 10 – 40 microMolar.

What is the metabolite?

Give up?


It has nearly perfect 6 fold symmetry.


Still give up?

[ Nature vol. 560 pp. 509 – 512 ’18 ] says that it’s inositol hexakisphosphate (IP6)  — nomenclature explained at the end.

Although IP6 looks like a sugar (with 6 CHOH groups forming a 6 membered ring), it is not a typical one because it is not an acetal (no oxygen in the ring).  All 6 hydroxyls of IP6 are phosphorylated.  They bind to two lysines on a short (21 amino acids) alpha helix found in the protein (Gag which has 500 amino acids).  That’s how IP6 binds the 6 Gag proteins together. The paper has great pictures.

It is likely that IP6 is used by other cellular proteins to form hexamers (but the paper doesn’t discuss this).

IP6 is quite symmetric, and 5 of the 6 phosphorylated hydroxyls can be equatorial, so this is likely the energetically favored conformation, given the bulk (and mass) of the phosphate group.

I think that the AIDS virus definitely has more chemical smarts than we do.  Humility is definitely in order.

Nomenclature note:  We’re all used to ATP (Adenosine TriPhosphate) and ADP (Adenosine DiPhosphate) — here all 3 or 2 phosphates form a chain.  Each of the 6 hydroxyls of inositol can be singly phosphorylated, leading to inositol bis, tris, tetrakis, pentakis, hexakis phosphates.  Phosphate chains can form on them as well, so IP7 and IP8 are known (heptakis?, Octakis??)

When the dissociation constant doesn’t tell you what you want to know

Drug chemists spend a lot of time getting their drugs to bind tightly to their chosen target.  Kd’s (dissociation constants) are measured with care –  But Kd’s are only  a marker for the biologic effects that are the real reason for the drug.  That’s why it was shocking to find that Kd’s don’t seem to matter in a very important and very well studied system.

It’s not the small molecule ligand protein receptor most drug chemists deal with, it’s the goings on at the immunologic synapse between antigen presenting cell and T lymphocyte (a much larger ligand target interface — 1,000 – 2,000 Angstroms^2 — than the usual site of drug/protein binding).   A peptide fragment lies down in a groove on the Major Histocompatibility Complex (pMHC) where it is presented to the T lymphoCyte Receptor (TCR) — another protein complex.  The hope is that an immune response to the parent protein of the peptide fragment will occur.


However, the Kd’s (affinities) of strong peptide agonist ligands (i.e. those producing an immune response) and weak ones (i.e. those producing not much) are similar and at times overlapping.  High affinity yet nonstimulatory interactions occur with high frequency in the human T cell repertoire [ Cell vol. 174 pp. 672 – 687 ’18 ].  The authors determined the structure of both weak and strong ligands bound to the TCR.  One particular TCR had virtually the same structure when bound to strong and weak agonist ligands. When studied in two dimensional membranes, the dwell time of ligand with receptor didn’t distinguish strong from weak antigens (surprising).

In general the affinities of pMHC for the TCR are quite low: the Kd's are not in the nanoMolar range beloved by drug chemists (and found in antigen/antibody binding), but 1000 times weaker, in the microMolar range.  So [ Proc. Natl. Acad. Sci. vol. 115 pp. E7369 – E7378 ’18 ] cleverly added an extra few amino acids, which they call molecular velcro, to boost the affinity x 10 (actually this decreases Kd tenfold).
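Since Kd and affinity run in opposite directions, a small worked example may help. This is a textbook sketch, assuming simple 1:1 binding with free ligand in excess, and the concentrations are round illustrative numbers, not values from either paper. The function names are mine.

```python
import math

R = 8.314  # gas constant, J/(mol*K)

def fraction_bound(ligand_molar: float, kd_molar: float) -> float:
    """Fraction of receptor occupied for simple 1:1 binding: [L] / ([L] + Kd)."""
    return ligand_molar / (ligand_molar + kd_molar)

def binding_free_energy_kj(kd_molar: float, temp_k: float = 310.0) -> float:
    """Standard free energy of binding, RT ln(Kd), in kJ/mol (1 M standard state)."""
    return R * temp_k * math.log(kd_molar) / 1000.0

ligand = 1e-6  # 1 microMolar free ligand, an arbitrary illustrative choice
for label, kd in [("nanoMolar binder", 1e-9),
                  ("microMolar binder", 1e-6),
                  ("microMolar binder with a tenfold boost", 1e-7)]:
    print(label, round(fraction_bound(ligand, kd), 3),
          round(binding_free_energy_kj(kd), 1))
```

A smaller Kd means tighter binding, so a tenfold gain in affinity is a tenfold drop in Kd, worth about RT ln 10 (roughly 6 kJ/mol at body temperature) of binding free energy.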

One rationale for the weak binding is that it facilitates scanning by the TCR of  the pMHC  repertoire allowing the TCR to choose the best.  So they added the velcro, expecting the repertoire to be less diverse (since the binding was tighter).  It was just the same. Again the Kd didn’t seem to matter.

Even more interesting, the first paper noted that productive TCR/pMHC bonds had catch bonds, i.e. bonds which get stronger the more you pull on them. The authors were actually able to measure the phenomenon. Catch bonds have been shown to exist in a variety of systems (white cells sticking to blood vessel lining, bacterial adhesion), but their actual mechanism is still under debate.  The great thing about this paper (p. 682) is that molecular dynamics simulation showed the conformational changes which occurred during catch bond formation in one case.  They even have videos.  Impressive.

This sort of thing is totally foreign to all solution chemistry, as there is no way to pull on a bond in solution.  Optical tweezers allow you to pull and stretch molecules (if you can attach them to large styrofoam balls).