Category Archives: Chemistry (relatively pure)

The emperor has no clothes

As an old organic chemist, I’ve always been fascinated by the size of proteins (n functional groups in a protein of length n, not counting the amide bonds) and by the myriad of shapes they can assume. It seems nothing short of miraculous (to me at least) that the proteins making us up assume just a few shapes out of the nearly 3^n possible ones (avoiding self-intersection removes a few).

This has been ‘explained’ by the potential energy funnel, down which newly formed proteins slide to their final few destinations.  Now I took quantum mechanics 56+ years ago, and back then a lot of heavy lifting was required just to calculate the potential energy surface required to bring two hydrogen atoms together to form molecular hydrogen.

I’ve never seen a potential energy surface for a protein actually calculated, and I’m not sure molecular dynamics simulations do this (please correct me if I’m wrong).

So I was glad to see the following in a paper by the Jacob Gershon-Cohen Professor of Medical Science and Professor of Biochemistry and Biophysics at my alma mater Penn Med (the hell with the Perelmans; Penn sold itself out to the Perelmans very cheaply).

“A critical feature of the funneled ELT (Energy Landscape Theory) model is that the many-pathway residue-level conformational search must be biased toward native-like interactions. Otherwise, as noted by Levinthal, an unguided random search would require a very long time. How this bias might be implemented in terms of real protein interactions has never been discovered. One simply asserts that natural evolution has made it so, formulates this view as a so-called principle of minimal frustration, and attributes it to the shape of the funneled energy landscape.”

 Proc. Natl. Acad. Sci. vol. 114 pp. 8253 – 8258 ’17.
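Levinthal’s point can be made concrete with the 3^n count from the opening paragraph. This Python sketch checks every conformation one after another at an assumed rate of 10^13 conformations per second (a deliberately generous figure chosen for illustration, not something from the post or the paper).

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def levinthal_years(n_residues, states_per_residue=3, samples_per_second=1e13):
    """Years needed to visit every backbone conformation one after another.

    states_per_residue=3 follows the post's 3^n estimate; samples_per_second
    is an assumed, deliberately generous sampling rate.
    """
    conformations = states_per_residue ** n_residues   # exact integer count
    return conformations / samples_per_second / SECONDS_PER_YEAR

for n in (41, 100, 300):
    years = levinthal_years(n)
    print(f"n = {n:3d}: about 10^{math.log10(years):.0f} years")
```

Even a short 41-residue chain needs weeks at this rate, and a 100-residue protein needs vastly longer than the age of the universe, which is the “very long time” in the quote.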

So the potential energy funnel of energy landscape theory is not something you can calculate explicitly (like a gravitational or an electrical potential), but just a high-falutin’ description of what happens inside our cells, masquerading as an explanation.

So when does a description become an explanation? Newton famously said “Hypotheses non fingo” (Latin for “I feign no hypotheses”) when discussing the action at a distance which his theory of gravity entailed.

Well, it becomes an explanation when you can use the description to predict and define new phenomena, e.g. using Newton’s laws to send a projectile to Jupiter, or using Einstein’s theory of gravitation to predict black holes and gravitational waves, etc., etc.

In this sense Energy Landscape Theory is just words. If it weren’t, you could predict the shape an arbitrary string of amino acids would assume (and you can’t). The theory does work fairly well when folding algorithms are given a protein whose shape is known but not yet published, but try them out on an arbitrary string, which I don’t think has been done.

But it gets worse.  ELT sweeps the problem of why a protein should have one (or a few) shapes under the rug, by assuming that they do.  I’m far from convinced that this is the case in general, which means that the proteins which make us up are quite special.

I’ll conclude with an earlier post on this subject, which basically says that an experiment to decide the issue, while possible in theory, is physically impossible to perform fully.

A chemical Gedanken experiment

This post is mostly something I posted on the Skeptical Chymist 2 years ago. Along with the previous post, “Why should a protein have just one shape (or any shape for that matter)”, it will be referred to in the next one, “Gentlemen start your motors”, concerning the improbability of the chemistry underlying our existence and whether it is reasonable to believe that it arose by chance.

In the early days of quantum mechanics Einstein and Bohr threw thought experiments (gedanken experiments) at each other like teenagers tossing cherry bombs.  None of the gedanken experiments were regarded as remotely possible back then, although thanks to Bell and Aspect, quantum nonlocality and entanglement now have a solid experimental basis.  To read more about this you can’t do much better than “The Age of Entanglement” by Louisa Gilder.

Frankly, I doubt that most strings of amino acids have a dominant shape (i.e., a biological meaning), and even if they did, they couldn’t find it quickly enough (the Levinthal paradox). For details see the previous post.

How would you prove me wrong? The same way you’d prove a pair of dice was loaded. Just make (using solid-phase peptide synthesis à la Merrifield) a bunch of random strings of amino acids (each 41 amino acids long) and see how many have a dominant shape. Any sequence that forms a crystal does have a dominant shape; if the sequence doesn’t crystallize, use NMR to look at it in solution. You can’t make all of them, because the earth doesn’t have enough mass to do so. That’s why this is a gedanken experiment: it can’t possibly be performed in toto.
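A back-of-the-envelope check of the mass claim, as a Python sketch. It assumes one molecule of each distinct 41-mer and an average residue mass of 110 Daltons (a commonly used round figure, not from the post), and it ignores the water added at the chain ends.

```python
AVOGADRO = 6.02214076e23     # molecules per mole
EARTH_KG = 5.97e24           # approximate mass of the Earth
RESIDUE_DA = 110             # assumed average amino acid residue mass, g/mol

n_sequences = 20 ** 41                       # every distinct 41-mer, exact integer
molecule_g = 41 * RESIDUE_DA / AVOGADRO      # mass of a single 41-residue molecule
total_kg = n_sequences * molecule_g / 1000   # one molecule of each sequence
earths = total_kg / EARTH_KG

print(f"{total_kg:.1e} kg, about {earths:,.0f} Earth masses")
```

Under these assumptions the total comes out on the order of 10^30 kg, hundreds of thousands of times the mass of the Earth.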

Even so, the experiment is over (and I’m wrong) if even 1% of the proteins you make turn out to have a dominant shape.

However, choosing a random string of amino acids is far from trivial. Some amino acids appear more frequently than others depending on the protein. Proteins are definitely not a random collection of amino acids. Consider collagen. In its various forms (there are over 20, coded for by at least 30 distinct genes) collagen accounts for 25% of body protein. If all 20 amino acids were used equally, each would account for 5% of the protein, yet one amino acid (glycine) accounts for 30% and proline for another 15%. Even knowing this, the statistical chance of randomly distributed glycines producing 300 copies in a row of glycine–any amino acid–any amino acid is less than zilch. But one type of bovine collagen protein has over 300 such copies in its 1042 amino acids.

One further example of the nonrandomness of proteins. If you were picking letters at random hoping to form a word, you would not expect a series of 10 ‘a’s to show up. But we normally contain many proteins with such runs of a single amino acid, and for some reason too many copies of the repeated amino acid produce some of the neurological diseases I (ineffectually) battled as a physician. Normal people have 11 to 34 glutamines in a row in a huge protein known as huntingtin (molecular mass 384 kiloDaltons, which is over 3000 amino acids). In those unfortunate individuals with Huntington’s chorea, the number of repeats expands to over 40. One of Max Perutz’s last papers [Proc. Natl. Acad. Sci. USA 99, 5591–5595 (2002)] tried to figure out why this was so harmful.

On to the actual experiment. Suppose you had made 1,000,000 distinct random sequence proteins containing 41 amino acids and none of them had a dominant shape. This proves/disproves nothing. 10^6 is fewer than the possibilities inherent in a string of 5 amino acids, and you’ve only explored 10^6/(20^41) of the possibilities.
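The two comparisons in this paragraph, checked in Python with exact integer arithmetic and nothing beyond the post’s own numbers:

```python
PROTEINS_MADE = 10 ** 6
five_mers = 20 ** 5             # possible strings of 5 amino acids
all_41mers = 20 ** 41           # possible strings of 41 amino acids
fraction_explored = PROTEINS_MADE / all_41mers

print(five_mers, five_mers > PROTEINS_MADE)
print(f"fraction of 41-mers explored: {fraction_explored:.1e}")
```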

Would Karl Popper, philosopher of science, even allow the question of how commonly proteins have a dominant shape to be called scientific? Much of what I know about Popper comes from a fascinating book, “Wittgenstein’s Poker”, and it isn’t pleasant. Questions not resolvable by experiment fall outside Popper’s canon of scientific questions. The gedanken experiment described can resolve the question one way, but not the other. In this respect it’s like the halting problem in computer science (there is no general procedure to decide whether an arbitrary program will terminate).

Would Ludwig Wittgenstein, uberphilosopher, think the question philosophical? Probably not. His major work “Tractatus Logico-Philosophicus” concludes with “What we cannot speak of we must pass over in silence”. While he’s the uberphilosopher he’s also the antiscientist. It’s exactly what we don’t know which leads to the juiciest speculation and most creative experiments in any field of science. That’s what I loved about organic chemistry years ago (and now). It is nearly always possible to design a molecule from scratch to test an idea. There was no reason to make [7]paracyclophane, other than to get up close and personal with the ring current.

If the probability or improbability of our existence, to which the gedanken experiment speaks, isn’t a philosophical question, what is?

Back then, this post produced the following excellent comment.

I’m not sure your assessment of what Popper would regard as science is accurate. Popper advocated “falsifiability”, i.e. that a statement cannot be proved true, only false. Non-scientific statements are those for which evidence that they are false cannot be found. You are in fact giving a perfect example of a situation where falsifiability is useful. If you tested, as you suggested, a million random proteins and many of them formed structures reliably, this would in fact disprove the hypothesis fairly conclusively (if only probabilistically). The fact that the test was passed by the first million proteins would be evidence that the theory was true (though obviously not concrete).

Also, it is relatively easy to choose what random proteins to make. Just use a random number generator (a pseudorandom generator would do too, probably). It doesn’t matter that they would be unlikely to produce a specific sequence generated in nature, as we specifically want to look at random sequences. The idea that 300 glycines is particularly unusual if protein generation is random is probably one which should be treated with a degree of caution. As the sequence was not specified as an unusual sequence beforehand, there are a large number of possible sequences that you could have seized on, and so care is needed.

This is only the most obvious experiment that could be carried out to test this idea, and I’m sure with advances, there is the distinct chance that more ingenious ways could be devised.

Additionally, the mass restriction is not in fact terribly useful except as an illustration that there is a massively large number of proteins, as once you have made and tested a protein, you can in fact reuse its atoms to make another protein.

Finally, I haven’t read Wittgenstein, but that final quote does not really support your statement that he is “anti-science” or would be against the production of novel cyclophanes. Organic chemistry clearly lies in the realm of “what we can speak”, as we are in fact speaking about it.

Posted by: MCliffe
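MCliffe’s suggestion of a pseudorandom generator is easy to make concrete. Here is a minimal Python sketch, assuming the 20 standard one-letter amino acid codes and an arbitrary seed (both my choices, nothing from the original exchange). The optional weights are how you would feed in a skewed composition, like collagen’s glycine excess, instead of 5% for each amino acid.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # one-letter codes of the 20 standard amino acids

def random_protein(rng, length=41, weights=None):
    """Draw `length` residues independently.

    weights=None makes every amino acid equally likely; pass 20 weights
    (in AMINO_ACIDS order) to bias the draw, e.g. toward glycine.
    """
    return "".join(rng.choices(AMINO_ACIDS, weights=weights, k=length))

rng = random.Random(2017)    # seeded pseudorandom generator, so a batch can be regenerated
batch = [random_protein(rng) for _ in range(5)]
for seq in batch:
    print(seq)
```

Seeding matters for an experiment like this: the same seed reproduces the same list of sequences to synthesize.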

My response —

MCliffe — thank you for your very thoughtful comments on the post. It’s great to know that someone out there is reading them.

Popper and the logical positivists solved many philosophical problems by declaring them meaningless (which Popper later took to mean not falsifiable). Things got to such a point in the 50s that Bertrand Russell was moved to come up with the meaningful (to most) but non-falsifiable statement: In the event of a nuclear war we shall all be dead.

You are quite right that it is easy to make a random sequence of amino acids using a computer. It’s been shown again and again that our intuitive notion of randomness is usually incorrect. I chose collagen because it is the most common protein in our body, and because it is highly nonrandom. Huntingtin was used because I dealt with its effects as a neurologist (and because there are 8 more diseases with too many identical amino acids in a row, all of which for some unfathomable reason produce neurologic disease; they are called triplet diseases because it takes 3 nucleotides of DNA to code for a single amino acid).

Even accepting 300 glycines in 1000 or so amino acids (collagen), putting that frequency into the random generator and turning it on, we would not expect those 300 glycines to appear at positions n, n+3, n+6, . . . , n+897 randomly.
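To put a number on “would not expect”, here is a rough Python sketch. It assumes each position is an independent draw with glycine frequency 300/1042 (the figures above) and asks how likely it is that glycine lands on every third position for 300 triplets. Summing over every possible start position (a union bound, my addition) only gives an upper bound.

```python
import math

P_GLY = 300 / 1042     # glycine frequency fed to the random generator
TRIPLETS = 300         # Gly at positions n, n+3, ..., n+897
LENGTH = 1042          # residues in the bovine collagen chain

# log10 of the chance that one given stretch has glycine at all 300 spots
log10_one_window = TRIPLETS * math.log10(P_GLY)
# number of places such a stretch could start in a 1042-residue chain
starts = LENGTH - 3 * (TRIPLETS - 1)
# union bound over all start positions: an upper bound on the overall chance
log10_upper = log10_one_window + math.log10(starts)

print(f"one window: 10^{log10_one_window:.0f}, any window: at most 10^{log10_upper:.0f}")
```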

The idea of using the atoms over and over to escape the mass restriction is clever. Unfortunately it runs up against a time restriction. Let us suppose there is a super-industrious post-doc who can make a new protein every nanosecond (reusing the atoms). There are 60 * 60 * 24 * 365 = 31,536,000 ~ 3 * 10^7 seconds in a year and 10^10 years (more or less) since the big bang. This is 10^9 * 3 * 10^7 * 10^10 = 3 * 10^26 (call it 10^26) different proteins he could make since the dawn of time. But there are 20^41 = 2^41 * 10^41 proteins of length 41 amino acids, and 2^41 = 2,199,023,255,552, about 2 * 10^12, so that is roughly 2 * 10^53. So he has only tested 10^26 of 10^53 possible 41 amino acid proteins in all this time.
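The post-doc arithmetic is easy to check by machine. This Python sketch uses the reply’s own assumptions (one protein per nanosecond, 10^10 years); with the exact 31,536,000 seconds per year the total comes out near 3 * 10^26, which doesn’t change the conclusion.

```python
import math

PER_SECOND = 1e9                      # one new protein every nanosecond
SECONDS_PER_YEAR = 60 * 60 * 24 * 365
YEARS = 1e10                          # "more or less" since the big bang

made = PER_SECOND * SECONDS_PER_YEAR * YEARS
possible = 20 ** 41                   # exact integer count of 41-mers

print(f"made     ~ 10^{math.log10(made):.1f}")
print(f"possible ~ 10^{math.log10(possible):.1f}")
print(f"fraction ~ {made / possible:.1e}")
```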

This is what I was getting at by saying that the gedanken experiment was not a priori falsifiable: we lack the time, space and mass to run it to completion. As you note, it could well end quite early if I’m wrong. But suppose 10^9/10^53 of the proteins DO have a dominant shape; the postdoc will be very unlikely to find any of them.
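How unlikely is “very unlikely”? A short Python sketch, assuming each protein tested independently has the same chance p of a dominant shape, so the chance of at least one hit in N tries is 1 - (1 - p)^N. With p = 10^9/10^53 and N = 10^26 the expected number of hits is only about 10^-18, while the 1% scenario from the original post ends almost immediately.

```python
import math

def chance_of_any_hit(fraction_with_shape, proteins_tested):
    """P(at least one success) = 1 - (1 - p)^N, computed stably for tiny p."""
    return -math.expm1(proteins_tested * math.log1p(-fraction_with_shape))

rare = chance_of_any_hit(1e9 / 1e53, 1e26)   # the post-doc's entire career
common = chance_of_any_hit(0.01, 1e6)        # 1% of a million random proteins

print(f"rare case: {rare:.1e}, 1% case: {common:.6f}")
```

`log1p` and `expm1` are used because 1 - 10^-44 is exactly 1.0 in floating point, so the naive formula would return zero.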

I think your final point is well taken. My reading of “Wittgenstein’s Poker” is that what he was saying in his last sentence really was “What we cannot speak (with certainty) of we must pass over in silence”. We cannot speak of the outcome of this Gedanken experiment with any degree of certainty.

Once again, thanks

What our DNA looks like inside a living cell

Time to rewrite the textbooks.  DNA in the living cell looks nothing like the pictures that have appeared in textbooks for years. Gone are the 30 nanoMeter fiber and higher order structures.

Here is the old consensus of how DNA in the nucleus is organized.

There are two different structural models of the 30 nanoMeter fiber: (1) the solenoid, diameter 33 nanoMeters with 6 nucleosomes every 11 nanoMeters along the axis, and (2) the two-start zigzag fiber, diameter 27 – 30 nanoMeters with 5 – 6 nucleosomes every 11 nanoMeters.
The 30 nanoMeter fiber is thought to assemble into helically folded 120 nanoMeter chromonema, 300 – 700 nanoMeter chromatids and mitotic chromosomes (1,400 nanoMeters). The chromonema structures (measured between 100 and 130 nanoMeters) are based on electron micrographic studies of permeabilized nuclei from which other components have been extracted with detergents and high salt to visualize chromatin, which is hardly physiologic.

Got all that?  Good, now forget it.  It’s wrong.

First off, forget nanoMeters. Organic chemists think in Angstroms: the diameter of the smallest atom, hydrogen, is almost exactly 1 Angstrom, making it the perfect organic chemical yardstick. If you must think in nanoMeters, just divide the number of Angstroms by 10.

First, a few numbers to get started.
The classic form of DNA is B-DNA (this is still correct). Each base pair is 3.4 Angstroms above the next and there are 10.4 base pairs per turn of the helix (so 1 full turn of B-DNA is 35.36 Angstroms). The diameter of B-DNA is 19 Angstroms.

The nucleosome consists of 147 base pairs of DNA wrapped around a central mass made of 8 histone proteins. The histone octamer is made of two copies each of histones H2A, H2B, H3 and H4. The core particle in its entirety is 100 Angstroms in diameter and 57 Angstroms along the axis of the disk and possesses nearly dyadic symmetry. There are 1.65 turns of DNA around the histone octamer, and during the trip there are 14 contact points between histones and DNA.
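The B-DNA and nucleosome figures fit together arithmetically. Here is a small Python check that uses only the numbers in the last two paragraphs.

```python
RISE_A = 3.4          # Angstroms per base pair
BP_PER_TURN = 10.4    # base pairs per helical turn of B-DNA
NUCLEOSOME_BP = 147   # base pairs wrapped around the histone octamer

turn_length_a = RISE_A * BP_PER_TURN          # length of one helical turn
wrapped_length_a = RISE_A * NUCLEOSOME_BP     # DNA wrapped around a 100 Angstrom core
helical_turns = NUCLEOSOME_BP / BP_PER_TURN   # helical turns in the wrapped DNA

print(f"{turn_length_a:.2f} A per turn, {wrapped_length_a:.0f} A wrapped, {helical_turns:.1f} turns")
```

About 500 Angstroms of double helix goes around the 100 Angstrom core, and it makes a little over 14 helical turns on the way, which happens to sit close to the 14 histone-DNA contact points, roughly one per helical turn.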

Now on to the actual paper [ Science vol. 357 pp. 354 – 355, 370, eaag0025 1 –> 13 ‘ 17 ]  The movies contained within alone are worth a year’s subscription to Science.

To visualize DNA in the living cells the authors invented a technique called  Chromatin Electron Microscopy Tomography (ChromEMT).

DNA is transparent to electrons. They use a fluorescent DNA-binding dye, DRAQ5 (Deep Red fluorescing AnthraQuinone Nr. 5). It has 3 rings fused together like anthracene (as an anthraquinone, the outer two are aromatic), so it could easily intercalate between the bases of the double helix. Then there are OH groups and amines to bind to the backbone. The dye gets into cells easily. Most importantly, DRAQ5 produces reactive oxygen species when hit by the right kind of light. Somehow they get diaminobenzidine into the cells, which the reactive oxygen species polymerize to polybenzimidazole.

We’re not done yet. The polymer is also transparent to electrons, but it can react with good old osmium tetroxide (which is electron dense), permitting visualization of DNA on electron microscopy (at last).

The technique is the first that can be used in living cells. It shows that most chromatin in the nucleus is organized as a disordered polymer 50 to 240 Angstroms in diameter. This is consistent with beads on a string (with nucleosomes being the beads). They found little evidence for higher order structures (the 300 to 1,200 Angstrom fibers of classic textbook models, which are in fact based on in vitro visualization of non-native chromatin). The 30 nanoMeter chromatin fiber (300 Angstroms) is nowhere to be seen. However, they do find 300 Angstrom fibers using their new method, but only in nuclei purified from hypotonically lysed chicken RBCs treated with MgCl2 (hardly physiologic).

They were able to make a movie of an electron micrograph of the nucleus using eight tilts of the stage. There is more DNA at the nuclear rim (as that’s where most of the heterochromatin is), but you still see the little 5 – 24 nanoMeter circles (just more of them the closer you get to the nuclear membrane).
Another movie, of a mitotic chromosome, shows the same little circles (50 – 240 Angstroms), just packed together more closely. You see a lot of them, but there is no obvious bunching of them into higher structures.
The technique (ChromEMT) is amazing in that it allows the ultrastructure of individual chromatin chains, megabase domains and mitotic chromosomes to be resolved and visualized as a continuum in serial slices. They found that the 5 – 12 and 12 – 24 nanoMeter chromatin diameters were the same regardless of how heavily the chromatin was compacted.
The paper is incredible and worth a year’s subscription to Science. It is likely behind a paywall.
It’s hard to get your mind around the amount of compaction involved in getting the meter of DNA of the human genome into a nucleus.  Molecular Biology of the Cell 4th Edition p. 198 put it this way —  Compacting the meter of DNA into a 6 micron nucleus is like compacting 24 miles of very fine thread into a tennis ball.
I actually wrote a series of posts, trying to put the amount of compaction into human scale.  Here is the first post — follow the links at the end to the others.

The cell nucleus and its DNA on a human scale – I

The nucleus is a very crowded place, filled with DNA, proteins packing up DNA, proteins patching up DNA, proteins opening up DNA to transcribe it etc. Statements like this produce no physical intuition of the sizes of the various players (to me at least).  How do you go from the 1 Angstrom hydrogen atom, the 3.4 Angstrom thickness per nucleotide (base) of DNA, the roughly 20 Angstrom diameter of the DNA double helix, to any intuition of what it’s like inside a spherical nucleus with a diameter of 10 microns?

How many bases are in the human genome? It depends on who you read, but 3 billion (3 * 10^9) is a lowball estimate; Wikipedia has 3.08 billion, some sources have 3.4 billion. 3 billion is a nice round number. How physically long is the genome? Put the DNA into the form seen in most textbooks, i.e. the double helix. Well, an Angstrom is one ten billionth (10^-10) of a meter, and multiplying it out we get

3 * 10^9 (bases/genome) * 3.4 * 10^-10 (meters/base) = 1.02 (meters), call it 1 meter.

The diameter of a typical nucleus is 10 microns (10 one millionths of a meter = 10 * 10^-6 = 10^-5 meter). So we’ve got to fit the textbook picture of our genome into something 100,000 times smaller. We’ll definitely have to bend it like Beckham.

As a chemist I think in Angstroms, as a biologist in microns and millimeters, but as an American I think in feet and inches.  To make this stuff comprehensible, think of driving from New York City to Seattle.  It’s 2840 miles or 14,995,200 feet (according to one source on the internet). Now we’re getting somewhere.  I know what a foot is, and I’ve driven most of those miles at one time or other.  Call it 15 million feet, and pack this length down by a factor of 100,000.  It’s 150 feet, half the size of a (US) football field.
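The scaling so far, as a few lines of Python using the post’s round numbers (3 billion bases, 3.4 Angstroms per base, a 10 micron nucleus, 2840 miles from New York City to Seattle):

```python
BASES = 3e9              # bases in the genome, the post's round number
RISE_M = 3.4e-10         # 3.4 Angstroms per base, in meters
NUCLEUS_M = 10e-6        # 10 micron nuclear diameter
FEET_PER_MILE = 5280

genome_m = BASES * RISE_M              # stretched-out double helix
compaction = genome_m / NUCLEUS_M      # how many times longer the DNA is than the nucleus is wide
drive_ft = 2840 * FEET_PER_MILE        # New York City to Seattle
sphere_ft = 15e6 / 1e5                 # the post's rounded drive, shrunk 100,000-fold

print(f"genome: {genome_m:.2f} m, compaction: about {compaction:,.0f}x")
print(f"drive: {drive_ft:,} ft, nucleus on this scale: {sphere_ft:.0f} ft across")
```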

Next, consider how thick DNA is relative to its length. 20 Angstroms is 20 * 10^-10 meters or 2 nanoMeters (2 * 10^-9 meters), so our DNA is 500 million times longer than it is thick. What is 1/500,000,000 of 15,000,000 feet? Well, it’s 3% of a foot, which is 0.36 of an inch, very close to 3/8 of an inch. At least in my refrigerator that’s a pair of cooked linguini twisted around each other (the double helix in edible form). The twisting is pretty tight, a complete turn of the two strands every 35.36 Angstroms, or about 1 complete turn every 1.8 thicknesses, more reminiscent of fusilli than linguini, but fusilli is too thick. Well, no analogy is perfect. If it were, it would be a description. One more thing before moving on.

How thinly should the linguini be sliced to split it apart into the constituent bases?  There are roughly 6 bases/thickness, and since the thickness is 3/8 of an inch, about 1/16 of an inch.  So relative to driving from NYC to Seattle, just throw a base out the window every 1/16th of an inch, and you’ll be up to 3 billion before you know it.
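The linguini dimensions follow from the same scale factor. A Python sketch, assuming the 20 Angstrom diameter, 3.4 Angstrom rise and 15,000,000 foot length used in the post:

```python
GENOME_FT = 15e6                    # the drive, rounded as in the post
LENGTH_OVER_THICKNESS = 1.0 / 2e-9  # 1 meter of DNA, 20 Angstroms (2 nm) thick
RISE_A = 3.4                        # Angstroms per base
DIAMETER_A = 20.0                   # Angstroms
TURN_A = 35.36                      # Angstroms per full helical turn

thickness_in = GENOME_FT / LENGTH_OVER_THICKNESS * 12   # linguini thickness, inches
bases_per_thickness = DIAMETER_A / RISE_A               # roughly 6
slice_in = thickness_in / bases_per_thickness           # one base per slice
turn_in_thicknesses = TURN_A / DIAMETER_A               # helical turn spacing in diameters

print(f"thickness {thickness_in:.2f} in (3/8 = {3/8:.3f})")
print(f"slice {slice_in:.3f} in (1/16 = {1/16:.4f})")
print(f"one turn every {turn_in_thicknesses:.2f} thicknesses")
```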

You’ve been so good following to this point that you get tickets for 50 yard line seats in the Superdome. You’re sitting far enough back that you’re 75 feet above the field, placing you right at the equator of our 150 foot sphere. The north and south poles of the sphere are over the 50 yard line, halfway between the two sidelines. You are about to watch the grounds crew pump 15,000,000 feet of linguini into the sphere. Will it burst? We know it won’t (or we wouldn’t exist). But how much of the sphere will the linguini take up?

The volume of any sphere is 4/3 * pi * radius^3. So the volume of our sphere of 10 microns diameter is 4/3 * 3.14 * 5 * 5 * 5 = 523 cubic microns. There are 10^18 cubic microns in a cubic meter, so our spherical nucleus has a volume of 523 * 10^-18 cubic meters. What is the volume of the DNA cylinder? Its radius is 10 Angstroms or 1 nanoMeter. So its volume is 1 meter (length of the stretched out DNA) * pi * 10^-9 meters * 10^-9 meters = 3.14 * 10^-18 cubic meters (or 3.14 cubic microns, since a cubic micron is 10^-6 * 10^-6 * 10^-6 = 10^-18 cubic meters).

Even though it’s 15,000,000 feet long, the volume of the linguini is only 3.14/523 (about 0.6%) of the sphere. Plenty of room for the grounds crew who begin reeling it in at 60 miles an hour. Since they have 2840 miles of the stuff to reel in, we’ll have to come back in a few days to watch the show. While we’re waiting, we might think of how anything can be accurately located in 2840 miles of linguini in a 150 foot sphere.
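The volume comparison and the reeling time, as a Python sketch using the post’s 10 micron nucleus, 20 Angstrom DNA diameter and rounded 1 meter genome. Pi cancels in the ratio, so the fraction is exactly 10^-18 / (4/3 * 1.25 * 10^-16) = 0.006.

```python
import math

NUCLEUS_RADIUS_M = 5e-6    # 10 micron diameter
DNA_RADIUS_M = 1e-9        # 20 Angstrom diameter
DNA_LENGTH_M = 1.0         # stretched-out genome, rounded

nucleus_volume = 4 / 3 * math.pi * NUCLEUS_RADIUS_M ** 3
dna_volume = math.pi * DNA_RADIUS_M ** 2 * DNA_LENGTH_M
fraction = dna_volume / nucleus_volume
reel_hours = 2840 / 60     # miles of linguini at 60 miles an hour

print(f"nucleus {nucleus_volume * 1e18:.0f} cubic microns, DNA {dna_volume * 1e18:.2f} cubic microns")
print(f"DNA fills {fraction:.1%} of the nucleus; reeling takes {reel_hours:.0f} hours")
```

At 60 miles an hour the grounds crew needs a bit over 47 hours, so about two days.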

Here’s a link to the next paper in the series

Should you take aspirin after you exercise?

I just got back from a beautiful four and a half mile walk around a reservoir behind my house. I always take 2 adult aspirin after things like this. A recent paper implies that perhaps I should not [ Proc. Natl. Acad. Sci. vol. 114 pp. 6675 – 6684 ’17 ]. Here’s why.

Muscle has a set of stem cells all its own. They are called satellite cells. After injury they proliferate and make new muscle. One of the triggers for this is a prostaglandin known as PGE2, clearly a delightful structure for the organic chemist to make. It binds to a receptor on the satellite cell (called EP4R), following which all sorts of things happen, which will make sense to you if you know some cellular biochemistry. Activation of EP4R triggers activation of the cyclic AMP (cAMP) phosphoCREB pathway. This activates Nurr1, a transcription factor which causes cellular proliferation.

Why no aspirin? Because it inhibits cyclo-oxygenase, which forms the 5 membered ring of PGE2.

I think you should still take aspirin afterwards, as the injury produced in the paper was pretty severe (muscle toxins, cold injury etc. etc.). Probably the weekend warriors among you don’t damage your muscles that much.

A few further points about aspirin and the NSAIDs

Now aspirin is an NSAID (NonSteroid AntiInflammatory Drug), along with a zillion others (Advil, Anaprox, Ansaid, Clinoril, Daypro, Dolobid, Feldene, Indocin, etc. etc., a whole alphabet’s worth). It is rather different in that it carries an acetyl group (esterified to the phenol oxygen on the benzene ring). Could it be an acetylating agent for things like histones and transcription factors, producing far more widespread effects than those attributable to cyclo-oxygenase inhibition? I’ve looked at the structures of a few of the others. Some have CH2-COOH moieties in them, which might be metabolized to an acetyl group (I doubt it). Naproxen (Anaprox, Naprosyn) is a naphthaleneacetic acid with a CH(CH3)-COOH side chain rather than a true acetyl group, and the other 13 structures I looked at do not have one either.

Another possible negative of aspirin after exercise is that inhibition of platelet cyclo-oxygenase makes it harder for platelets to stick together and form clots (this is why it is used to prevent heart attack and stroke). So aspirin might result in more extensive micro-hemorrhages in muscle after exercise (if such things exist).

Remember entropy? — Take II

Organic chemists have a far better intuitive feel for entropy than most chemists. Condensations such as the Diels Alder reaction decrease it, as does ring closure. However, when you get to small ligands binding proteins, everything seems to be about enthalpy. Although binding energy is always talked about, mentally it appears to be enthalpy (H) rather than Gibbs free energy (F).

A recent fascinating editorial and paper [ Proc. Natl. Acad. Sci. vol. 114 pp. 4278 – 4280, 4424 – 4429 ’17 ] shows how evolution has used entropy to determine when a protein (CzrA) binds to DNA and when it doesn’t. As usual, advances in technology permit us to see this (e.g. multidimensional heteronuclear nuclear magnetic resonance). This allows us to determine the motion of side chains (methyl groups), backbones etc. etc. When CzrA binds to DNA, methyl side chains on the protein move more, increasing entropy (deltaS), and as we all know the Gibbs free energy of reaction (deltaF) isn’t just enthalpy (deltaH) but deltaH – TdeltaS, so an increase in deltaS pushes deltaF lower, meaning the reaction proceeds in that direction.
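A toy version of the sign argument, in Python. The enthalpy and entropy values are hypothetical, chosen only to show that with deltaH held fixed a larger deltaS gives a more negative deltaF; F here is the Gibbs free energy, as in the post.

```python
def delta_f(delta_h, temperature, delta_s):
    """Free energy change: deltaF = deltaH - T * deltaS."""
    return delta_h - temperature * delta_s

T = 298.15     # kelvin
DH = -20.0     # kJ/mol, hypothetical binding enthalpy

for ds in (0.0, 0.02):    # kJ/(mol K), hypothetical entropy changes on binding
    print(f"deltaS = {ds}: deltaF = {delta_f(DH, T, ds):.2f} kJ/mol")
```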

Binding of zinc redistributes these side chain motions so that entropy decreases, and the protein moves off DNA. The authors call this dynamics-driven allostery. The fascinating thing is that this may happen without any conformational change of CzrA.

I’m not sure that molecular dynamics simulations are good enough to pick this up. Fortunately newer NMR techniques can measure it. Just another complication for the hapless drug chemist thinking about protein ligand interactions.

A recent paper [ Proc. Natl. Acad. Sci. vol. 114 pp. 6563 – 6568 ’17 ] went into more detail about measuring side chain motions as a surrogate for conformational entropy. It can now be measured by NMR. They define complete restriction of the methyl group symmetry axis as 1 and complete disorder as 0, and state that ‘a variety of models’ imply that this value is LINEARLY related to conformational entropy, making it an ‘entropy meter’. They state that measurement of fast internal side chain motion is largely restricted to the methyl group; this makes me worry that other side chains (which they can’t measure) are moving as well and contributing to entropy.
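A minimal Python sketch of the ‘entropy meter’ idea as described: average the methyl order parameters (1 = fully restricted, 0 = fully disordered) and map the change linearly onto conformational entropy. The order parameter values and the slope below are made up for illustration; the real calibration comes from the paper’s models, which are not reproduced here.

```python
from statistics import mean

def entropy_change(o2_free, o2_bound, slope):
    """Linear 'entropy meter' sketch: dS = slope * change in mean order parameter.

    slope is a calibration constant; the value used below is hypothetical.
    """
    return slope * (mean(o2_bound) - mean(o2_free))

free = [0.45, 0.62, 0.30, 0.81]     # made-up order parameters, not from the paper
bound = [0.55, 0.70, 0.41, 0.85]    # the same methyls, more ordered after binding
SLOPE = -1.0                        # hypothetical; negative because more order means less entropy

print(entropy_change(free, bound, SLOPE))   # arbitrary units
```

Note the worry in the text applies directly here: any side chain without a methyl probe simply never enters the average.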

The authors studied some 28 protein/ligand systems, and found that the contribution of conformational entropy to ligand binding can be favorable, negligible or unfavorable.

What is bothersome to the authors (and to me) is that there were no obvious structural correlates between the degree of conformational entropy and protein structure. So it’s something you measure, not something you predict, making life even more difficult for the computational chemist studying protein ligand interactions.

Entangled points

The terms limit point, cluster point and accumulation point don’t really match the concept that point set topology is trying to capture.

As usual, the motivation for any topological concept (including this one) lies in the real numbers.

1 is a limit point of the open interval (0, 1) of real numbers. Any open interval containing 1 also contains elements of (0, 1). 1 is entangled with the set (0, 1) given the usual topology of the real line.

What is the usual topology of the real line (i.e., how are its open sets defined)? Its open sets are the open intervals and their arbitrary (including infinite) unions and finite intersections.

In this topology no open set can separate 1 from the set (0, 1), i.e., they are entangled.

So call 1 an entangled point. This way of thinking allows you to think of open sets as separators of points from sets.

Hausdorff thought this way when he described the separation axioms (Trennungsaxiome), which spell out which points and sets open sets can and cannot separate.

The most useful collections of open sets satisfy Trennungsaxiom #2, giving a Hausdorff topological space: there are enough open sets that any two distinct points lie in two disjoint open sets, one containing each point.

Thinking of limit points as entangled points gives you a more coherent way to think of continuous functions between topological spaces: they never separate a set from any of its entangled points when they map them from the domain to the target space. At least to me, this is a far more satisfactory way to think about continuity than the usual definition (and it is actually equivalent to it): the inverse image of an open set in the target space is an open set in the domain.

Clarity of thought and ease of implementation are two very different things. It is much easier to prove/disprove that a function is continuous using the usual definition than using the preservation of entangled points.
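The equivalence can be checked exhaustively on a tiny finite space, where the open sets can be listed outright (the real line can’t be enumerated, so this is only a toy). In this Python sketch the function names are my own. `entangled` counts points of the set itself as entangled (so it computes the closure), which is what the continuity statement needs, and `is_hausdorff` tests Trennungsaxiom #2 on the same examples.

```python
from itertools import chain, combinations, product

def subsets(points):
    """All subsets of a finite set, as frozensets."""
    pts = list(points)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(pts, r) for r in range(len(pts) + 1))]

def entangled(subset, opens, space):
    """Points no open set can separate from `subset`: every open set containing x meets it."""
    return frozenset(x for x in space if all(u & subset for u in opens if x in u))

def is_hausdorff(space, opens):
    """Trennungsaxiom 2: distinct points lie in disjoint open sets."""
    return all(any(x in u and y in v and not (u & v) for u in opens for v in opens)
               for x in space for y in space if x != y)

def continuous_by_preimage(f, space_x, opens_x, opens_y):
    """Usual definition: the preimage of every open set is open."""
    return all(frozenset(x for x in space_x if f[x] in v) in opens_x for v in opens_y)

def continuous_by_entanglement(f, space_x, opens_x, space_y, opens_y):
    """f never separates a set from any of its entangled points."""
    for a in subsets(space_x):
        image_closure = entangled(frozenset(f[x] for x in a), opens_y, space_y)
        if any(f[x] not in image_closure for x in entangled(a, opens_x, space_x)):
            return False
    return True

# A 3-point space whose open sets are nested: {}, {0}, {0,1}, {0,1,2}.
SPACE = frozenset({0, 1, 2})
OPENS = {frozenset(), frozenset({0}), frozenset({0, 1}), SPACE}

print(sorted(entangled(frozenset({0}), OPENS, SPACE)))   # every point is entangled with {0}
print(is_hausdorff(SPACE, OPENS), is_hausdorff(SPACE, set(subsets(SPACE))))

results = []
for values in product(sorted(SPACE), repeat=len(SPACE)):   # all 27 maps SPACE -> SPACE
    f = dict(zip(sorted(SPACE), values))
    results.append((continuous_by_preimage(f, SPACE, OPENS, OPENS),
                    continuous_by_entanglement(f, SPACE, OPENS, SPACE, OPENS)))
print(all(a == b for a, b in results), sum(a for a, _ in results))
```

Both definitions pick out the same maps, but the preimage test only loops over the open sets of the target, while the entanglement test loops over every subset of the domain, which is the ease-of-implementation gap in miniature.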

Organic chemistry could certainly use some better nomenclature. Why not call an SN1 reaction (Substitution Nucleophilic 1) SN-pancake, as the 4 carbons left after the bond is broken form a plane? Even better, SN2 should be called SN-umbrella, as it is exactly like an umbrella turning inside out in the wind.

What is docosahexaenoic acid and why should you care?

Why should drug chemists care about docosahexaenoic acid? It’s a fairly trivial organic structure as these things go: a 22 carbon straight chain carboxylic acid with 6 double bonds. However, the structure is decidedly non-random (see later).
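The formula follows from the description: a saturated straight chain C22 acid is C22H44O2, and each double bond removes 2 hydrogens. This Python sketch also generates the known double bond positions of DHA (all-cis 4,7,10,13,16,19, counting from the carboxyl carbon), which fall on every third carbon. Whether that regular spacing is the non-randomness the post has in mind is my guess, not the author’s statement.

```python
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}   # standard atomic weights, g/mol

def fatty_acid_formula(carbons, double_bonds):
    """Unbranched carboxylic acid: CnH(2n)O2, minus 2 H for each C=C."""
    return {"C": carbons, "H": 2 * carbons - 2 * double_bonds, "O": 2}

def molar_mass(formula):
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())

dha = fatty_acid_formula(22, 6)
positions = list(range(4, 22, 3))    # first C=C at C4, then every third carbon
omega = 22 - positions[-1]           # position of the last C=C counted from the methyl end

print(dha, round(molar_mass(dha), 1), positions, f"omega-{omega}")
```

The last double bond sits 3 carbons from the methyl end, which is what makes DHA an omega-3 fatty acid, and the whole molecule weighs in at about 328 Daltons.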

Docosahexaenoic acid turns out to be crucial for the function of the blood brain barrier (BBB), something that makes it very difficult to get drugs into the brain. Years of work have shown that the only drugs able to get through the BBB are small lipid soluble molecules of mass under 400 Daltons with fewer than 9 hydrogen bonds. Certainly not a large group of drugs. The more we know about the BBB, the more likely we’ll be able to figure out something to circumvent it.

The BBB was known to exist more than 100 years ago. Ehrlich found that dyes injected into the circulation were rapidly taken up by all organs except the brain. His student E. Goldmann found that dye injected into the CSF stained the brain but not other organs.

The barrier has at least two components: (1) a very tight seal between the cells lining brain blood vessels (i.e., the endothelium), see the end of the post, and (2) very low transfer across the endothelial cell from the vessel lumen. Transfer of this kind is called transcytosis and involves formation of small vesicles at the lumenal surface of the endothelium, their migration across the endothelial cell, and release of vesicle content on the other side.

In general there are two mechanisms of transcytosis — clathrin coated pits, and caveolae. Brain endothelium shows very low rates of transcytosis. There aren’t any coated pits (no explanation I can find) and the rate of caveolar transcytosis is very low.

Docosahexaenoic acid is the reason for the low rate of caveolar transcytosis. Here is why.

[ Nature vol. 509 pp. 432 – 433, 503 – 506, 507 – 511 ’14; Neuron vol. 82 pp. 728 – 730 ’14 ] An orphan transporter, MFSD2a (Major Facilitator Superfamily Domain containing 2a), is selectively expressed in BBB endothelium. It is REQUIRED for formation and maintenance of BBB integrity. Animals lacking MFSD2a show uninhibited bulk transcytosis across the endothelium, yet no obvious defects in the junctions between the endothelial cells. Pericytes (cells in the layer just outside the endothelium) are important in keeping MFSD2a at normal levels, as animals lacking them show the same BBB defects as those lacking MFSD2a. Even though knockouts don’t have much of a BBB, their vascular networks are patterned normally.

MFSD2a is the major transporter of docosahexaenoic acid (DHA), an omega-3 fatty acid (more later). DHA isn’t made in the brain and must be transported into it. Knockouts have reduced brain DHA levels, accompanied by neuronal loss in the hippocampus and cerebellum, and microcephaly. Human cases due to mutation are now known (11/15). DHA and other fatty acids cross the BBB only as lysophosphatidylcholine (LPC) esters, not as free fatty acids, and the transport is sodium dependent. The phospho-zwitterionic headgroup of LPC is essential for transport. MFSD2a ‘prefers’ long-chain fatty acids (oleic, palmitic), failing to transport fatty acids with chains shorter than 14 carbons.

So MFSD2a inhibits transcytosis at the same time as it promotes fatty acid transport into the brain. Major Facilitator Superfamily (MFS) proteins move substrates across membranes, many of them by harnessing an electrochemical gradient (for MFSD2a, the sodium gradient). The best known MFS members are the glucose transporters (GLUT1 – 4).

So the blood brain barrier is due in part to the lipid transport activity of MFSD2a, which gives BBB endothelium a different lipid composition (with lots of docosahexaenoic acid) than other endothelia, inhibiting caveolar transport. Increased DHA levels are associated with membrane cholesterol depletion, as well as displacement of caveolin1 (the major protein involved in this form of transcytosis) from caveolae.

It is likely that MFSD2a acts as a lipid flippase, moving phospholipids, including DHA-containing species, from the outer to the inner leaflet of the plasma membrane (where caveolin1 binds).

What is so hot about docosahexaenoic acid? It’s just 22 carbons all in a row, a carboxyl group and 6 double bonds. We’re not talking fused ring systems, alkaloids, bizarre functional groups etc. etc.

Half the answer is that the double bonds are NOT randomly arranged. All 6 occur in a row, but with a single methylene group between each pair. This tells the chemist that they are not conjugated, hence the chain is probably not straight. Think how unlikely the arrangement is, considering the number of ways 6 double bonds and 9 methylenes COULD be arranged in a chain: 15!/(6! 9!) = 5,005. Answer: only 5 of them have this pattern, depending on where the run starts relative to the end of the chain.
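The count is easy to check by brute force. A minimal sketch, assuming the same simplification as the text: the chain after the carboxyl carbon (C2 to C22) is treated as 15 units, 6 CH=CH units ('D') and 9 CH2 units ('M', with the terminal CH3 counted as one of them), read from C2 toward the methyl end. Because exactly 6 of the 15 units must be double bonds, the total is the binomial coefficient C(15, 6) rather than 2^15.

```python
from itertools import combinations
from math import comb

N_UNITS = 15   # 6 CH=CH units + 9 CH2 units (C2..C22; C1 is the carboxyl carbon)
N_DOUBLE = 6

def all_chains():
    """Every way to place 6 'D' units among 15 positions; the rest are 'M'."""
    for positions in combinations(range(N_UNITS), N_DOUBLE):
        chain = ["M"] * N_UNITS
        for p in positions:
            chain[p] = "D"
        yield "".join(chain)

# All six double bonds in one run, each pair separated by exactly one CH2.
SKIPPED_RUN = "DM" * (N_DOUBLE - 1) + "D"   # 'DMDMDMDMDMD'

chains = list(all_chains())
# Every chain has exactly 6 D's, so containing the run means all of them are in it.
methylene_interrupted = [c for c in chains if SKIPPED_RUN in c]

# DHA: C2, C3 saturated; double bonds at C4, C7, C10, C13, C16, C19; C21, C22 saturated.
DHA = "MM" + SKIPPED_RUN + "MM"

print(len(chains), comb(N_UNITS, N_DOUBLE))  # 5005 5005
print(len(methylene_interrupted))            # 5
print(DHA in methylene_interrupted)          # True
```

The 11-unit run can start at any of 15 − 11 + 1 = 5 positions, so only 5 of the 5,005 arrangements (1 in 1,001) have the DHA pattern.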

The other half is that all the double bonds are cis, making it very unlikely that the 21-carbon chain can straighten out and cross the membrane. Lots of DHA means a very disordered membrane, which caveolin1 (and clathrin) may be unable to bind.

So even though it’s been years and years since I left organic chemistry, it still lets me enjoy the biochemical esthetics of the blood brain barrier.

The tight junctions between endothelial cells are primarily responsible for barrier function. These tight junctions are found only in the capillaries and postcapillary venules of the brain. Endothelial cells of the brain have few pinocytotic vesicles and fenestrae. [ Neuron vol. 71 p. 408 ’11 ] The brain vasculature has the thinnest endothelial cells, with the tightest junctions and a higher degree of pericyte coverage (‘up to’ 30%). [ Neuron vol. 78 pp. 214 – 232 ’13 ] The tight junctions are made from occludin, claudins and junctional adhesion molecules, and lie closer to the lumen than the adherens junctions (which also link endothelial cells to each other), which are made by the cadherins (E, P and N). (ibid p. 219)

Remember entropy?

Organic chemists have a far better intuitive feel for entropy than most chemists. Condensations such as the Diels-Alder reaction decrease it, as does ring closure. However, when you get to small ligands binding proteins, everything seems to be about enthalpy. Although binding energy is always talked about, mentally it appears to be enthalpy (H) rather than Gibbs free energy (G).

A recent fascinating editorial and paper [ Proc. Natl. Acad. Sci. vol. 114 pp. 4278 – 4280, 4424 – 4429 ’17 ] show how evolution has used entropy to determine when a protein (CzrA) binds to DNA and when it doesn’t. As usual, advances in technology (e.g. multidimensional heteronuclear nuclear magnetic resonance) let us see this, because they allow us to measure the motion of side chains (methyl groups), backbones etc. etc. When CzrA binds to DNA, methyl side chains on the protein move more, increasing entropy (ΔS). As we all know, the Gibbs free energy of reaction (ΔG) isn’t just enthalpy (ΔH) but ΔH − TΔS, so an increase in ΔS pushes ΔG lower, meaning the reaction proceeds in that direction.

Zinc binding redistributes these side-chain motions so that entropy decreases, and the protein moves off DNA. The authors call this dynamics-driven allostery. The fascinating thing is that this may happen without any conformational change in CzrA.
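The bookkeeping is just ΔG = ΔH − TΔS. Here is a minimal sketch with made-up numbers (the ΔH and ΔS values are hypothetical, not CzrA data) showing how a change in entropy alone, with enthalpy held fixed, shifts ΔG and hence the dissociation constant, using ΔG_bind = RT ln K_d with a 1 M standard state:

```python
import math

R = 8.314  # gas constant, J/(mol*K)

def delta_g(delta_h, delta_s, temp_k):
    """Gibbs free energy change (J/mol): dG = dH - T*dS."""
    return delta_h - temp_k * delta_s

def kd_molar(dg_bind, temp_k):
    """Dissociation constant (M) from the binding free energy: dG_bind = RT ln Kd."""
    return math.exp(dg_bind / (R * temp_k))

T = 298.15        # K
DH = -30_000.0    # hypothetical binding enthalpy (J/mol), held fixed throughout

# Same enthalpy, three hypothetical entropy changes on binding.
for label, ds in [("side chains stiffen", -20.0), ("no entropy change", 0.0), ("side chains loosen", 20.0)]:
    dg = delta_g(DH, ds, T)
    print(f"{label:20s} dS = {ds:+5.1f} J/(mol K)   dG = {dg / 1000:6.2f} kJ/mol   Kd = {kd_molar(dg, T):.1e} M")

# With dH fixed, the fold change in Kd depends only on the entropy difference: exp(ddS / R).
fold_tighter = kd_molar(delta_g(DH, 0.0, T), T) / kd_molar(delta_g(DH, 20.0, T), T)
print(f"+20 J/(mol K) of entropy tightens binding about {fold_tighter:.1f}-fold")
```

Because the fold change is exp(ΔΔS/R), about R ln 10 ≈ 19 J/(mol·K) of extra entropy is worth a 10-fold change in K_d at any temperature, so modest changes in side-chain motion can matter a lot.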

I’m not sure that molecular dynamics simulations are good enough to pick this up. Fortunately newer NMR techniques can measure it. Just another complication for the hapless drug chemist thinking about protein ligand interactions.

The incredible combinatorial complexity of cellular biochemistry

K8, K14, K20, T92, P125, S129, S137, Y176, T195, K276, T305, T308, T312, P313, T315, T326, S378, T450, S473, S477, S479. No, this is not some game of cosmic bingo. They represent amino acid positions in Protein Kinase B (AKT).

In the 1-letter amino acid code, K is lysine, T threonine, S serine, P proline and Y tyrosine.

All 21 positions can be modified (or not), one of them in 3 different ways. This gives 4 * 2^20 = 4,194,304 possible combinations of post-translational modifications. Will we study all of them? It’s pretty easy to substitute alanine for serine or threonine, making an unmodifiable position, or to substitute aspartic acid for serine or threonine, making a phosphorylation mimic pretty close to phosphoserine or phosphothreonine, which creates even more possibilities for study.
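The count is just a product, over sites, of the number of states each site can be in. A minimal sketch under the paragraph’s assumptions; which site gets the 4 states is an arbitrary placeholder (K8 here, and the total does not depend on the choice), and the lysine variant is a hypothetical alternative assumption, not a count from the review:

```python
from math import prod

# The 21 modifiable AKT positions listed in the post.
SITES = ["K8", "K14", "K20", "T92", "P125", "S129", "S137", "Y176", "T195", "K276",
         "T305", "T308", "T312", "P313", "T315", "T326", "S378", "T450", "S473",
         "S477", "S479"]

def n_patterns(states):
    """Distinct modification patterns = product of the number of states at each site."""
    return prod(states.values())

# The post's assumption: 2 states (unmodified/modified) per site,
# except one site that can carry any of 3 modifications (4 states).
post_model = {s: 2 for s in SITES}
post_model["K8"] = 4

# Hypothetical variant: each lysine is unmodified or carries any one of the four
# lysine modifications listed in the post (5 states), ignoring multiple methylation.
lysine_model = {s: (5 if s.startswith("K") else 2) for s in SITES}

print(n_patterns(post_model))    # 4 * 2**20 = 4194304
print(n_patterns(lysine_model))  # 5**4 * 2**17 = 81920000
```

Any other assumption about states per site plugs in the same way, and the numbers only go up.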

Most of the serines, threonines and tyrosines listed are phosphorylated, but two of the threonines carry O-linked N-acetylglucosamine (O-GlcNAc). The two prolines are hydroxylated in the ring. The lysines can be methylated, acetylated, ubiquitinated or sumoylated. I did take the trouble to count the serines in the complete amino acid sequence: there are 24, of which only 6 are phosphorylated, so the phosphorylation pattern is likely to be specific and selected for. I was too lazy to do the same for lysine, threonine, tyrosine and proline. Here’s a link to the full sequence if you want to do it.
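For anyone less lazy, the census is a few lines of code once the full sequence is pasted in. A minimal sketch; the function name is hypothetical, and the peptide below is a made-up toy, not the AKT sequence, so substitute the real sequence and the 21 sites listed above:

```python
from collections import Counter

def residue_census(sequence, modified_sites):
    """For each residue type in modified_sites, return (total in sequence, number listed as modified).

    sequence: one-letter amino acid string; residue 1 is sequence[0].
    modified_sites: labels like "S473" (residue letter + 1-based position).
    Raises ValueError if a label does not match the residue at that position.
    """
    totals = Counter(sequence)
    modified = Counter()
    for label in modified_sites:
        aa, pos = label[0], int(label[1:])
        if sequence[pos - 1] != aa:
            raise ValueError(f"{label}: residue {pos} is {sequence[pos - 1]}, not {aa}")
        modified[aa] += 1
    return {aa: (totals[aa], modified[aa]) for aa in sorted(modified)}

# Toy 12-residue peptide, for illustration only (NOT the AKT sequence).
toy = "MSKTSPYSKTSL"
print(residue_census(toy, ["S2", "K3", "T4", "Y7"]))
# {'K': (2, 1), 'S': (4, 1), 'T': (2, 1), 'Y': (1, 1)}
```

The position check catches numbering slips (e.g. a sequence that starts at a different residue), which matter when comparing against published site lists.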

The phosphorylations at each serine/threonine/tyrosine are carried out by no more than one of the following 8 kinases: CK2, IKKepsilon, ACK1, TBK1, PDK1, GSK3alpha, mTORC2 and CDK2.

AKT contains some 481 amino acids, divided (by humans, for the purposes of comprehension) into 4 regions: pleckstrin homology (#1 – #108), linker (#108 – #152), catalytic, i.e. kinase (#152 – #409), and regulatory (#409 – #481).
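Combining the region boundaries with the list of 21 modified positions shows where the modifications cluster. A minimal sketch; the text’s ranges share their endpoints (#108, #152, #409), so here each shared endpoint is assigned to the earlier region, an arbitrary choice that happens not to matter because none of the 21 sites sits on a boundary:

```python
# Region boundaries from the text (inclusive); shared endpoints go to the earlier region.
REGIONS = [
    ("pleckstrin homology", 1, 108),
    ("linker", 109, 152),
    ("catalytic (kinase)", 153, 409),
    ("regulatory", 410, 481),
]

SITES = ["K8", "K14", "K20", "T92", "P125", "S129", "S137", "Y176", "T195", "K276",
         "T305", "T308", "T312", "P313", "T315", "T326", "S378", "T450", "S473",
         "S477", "S479"]

def region_of(position):
    """Name of the region containing a 1-based residue position."""
    for name, start, end in REGIONS:
        if start <= position <= end:
            return name
    raise ValueError(f"position {position} is outside 1-481")

by_region = {}
for site in SITES:
    by_region.setdefault(region_of(int(site[1:])), []).append(site)

for name, _, _ in REGIONS:
    print(f"{name:20s} {by_region.get(name, [])}")
```

Ten of the 21 sites fall in the kinase domain, with four each in the pleckstrin homology and regulatory regions and three in the linker.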

This is from an excellent review of the functions of AKT in Cell vol. 169 pp. 381 – 405 ’17. All of the above occupies the first two pages of the review, before the functions of AKT are even discussed.

This raises the larger issue of the possibility of human minds comprehending cellular biochemistry.

This is just one protein, although a very important one. Do you think we’ll ever be able to conduct enough experiments to figure out what each modification (alone or in combination) does to the many functions of AKT (and there are many)?

Now design a drug to affect one of the actions of AKT (particularly since AKT is the cellular homolog of a viral oncogene). Quite a homework assignment.

Is Martin Burke the anti-Christ for synthetic organic chemistry?

Will a machine put synthetic organic chemists out of business? Is its proponent and inventor, Martin Burke, the anti-Christ? 2 years ago he thought he’d need 5,000 building blocks to make 282,487 natural products. Now he’s down to 1,400, along with 20 years and 1 billion dollars [ Science vol. 356 pp. 231 – 232 ’17 ].

Back in the day we studied the zillions of terpene natural products built from various machinations of just the isoprene (isopentenyl) unit. Does he really need another 1,399?

The synthesis is a modification of the Suzuki coupling, in which R-B(OH)2 and R’-X are coupled by palladium to form R-R’. It uses MIDA (HOOC CH2 NCH3 CH2 COOH, N-MethylIminodiAcetic acid), which wraps itself around the boron and shuts down its reactivity until the MIDA is removed, so building blocks can be added one at a time.
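Why the MIDA cap matters is easiest to see as a little state machine. This is a toy model, not chemistry software: the class and function names are hypothetical, and real iterative coupling involves conditions, yields and stereochemistry that are ignored here. Each middle building block carries a halide end and a MIDA-masked boron; only an unmasked boron can couple, so the chain grows by one block per deprotect-and-couple cycle instead of polymerizing.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Block:
    """Hypothetical building block with an optional halide end and an optional boron end."""
    name: str
    halide: bool   # True if the block has an R'-X end
    boron: str     # "B(MIDA)" (masked), "B(OH)2" (free) or "" (no boron)

@dataclass(frozen=True)
class Chain:
    parts: tuple   # names of the blocks joined so far, in order
    boron: str     # state of the boron at the growing end

def deprotect(chain: Chain) -> Chain:
    """Strip MIDA from the growing end (in the lab, mild aqueous base), exposing B(OH)2."""
    if chain.boron == "B(MIDA)":
        return Chain(chain.parts, "B(OH)2")
    return chain

def couple(chain: Chain, block: Block) -> Chain:
    """Suzuki step: R-B(OH)2 + X-R' -> R-R'. The block's own boron is still
    MIDA-masked, so it rides along and becomes the new growing end."""
    if chain.boron != "B(OH)2":
        raise ValueError(f"growing end is {chain.boron or 'boron-free'}, so it cannot couple")
    if not block.halide:
        raise ValueError(f"{block.name} has no halide end")
    return Chain(chain.parts + (block.name,), block.boron)

def iterative_coupling(first: Block, rest: list) -> Chain:
    """Repeat deprotect-then-couple, adding one building block per cycle."""
    chain = Chain((first.name,), first.boron)
    for block in rest:
        chain = couple(deprotect(chain), block)
    return chain

# Hypothetical blocks: A has only a masked boron, B and C are X-R-B(MIDA), D is a halide cap.
A = Block("A", halide=False, boron="B(MIDA)")
B = Block("B", halide=True, boron="B(MIDA)")
C = Block("C", halide=True, boron="B(MIDA)")
D = Block("D", halide=True, boron="")

product = iterative_coupling(A, [B, C, D])
print(product)   # Chain(parts=('A', 'B', 'C', 'D'), boron='')

# Skipping deprotection: the masked boron refuses to couple.
try:
    couple(Chain(("A",), "B(MIDA)"), B)
except ValueError as err:
    print("blocked:", err)
```

Without the mask, a block carrying both a halide and a free boronic acid could couple with copies of itself. The cap is what turns the process into a controlled, one-block-at-a-time loop that a machine can run.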

In 2008 Burke found that MIDA boronates stick to silica when methanol and ether are both present, and then drop off when tetrahydrofuran (THF) is present. This allows catch and release. For purification they can run the compounds through a silica containing vial.

In 2015 some 200 building blocks with the halogen and a MIDA-capped boronic acid were available commercially.

Burke hooked up with a computer scientist to look at the structures of the 282,487 natural products and break them down into fragments needing only carbon-carbon bond formation to join them, a fascinating problem in graph theory.
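The core operation can be pictured as cutting edges of the molecular graph and collecting the pieces that fall apart, i.e. its connected components. A minimal sketch, assuming a hypothetical disconnection rule (every C–C single bond is a candidate coupling site) and hydrogen-suppressed graphs; the actual rules used in Burke’s analysis are not described in the post.

```python
def c_c_single_bonds(bonds, elements):
    """Hypothetical disconnection rule: every C-C single bond is a candidate cut."""
    return [(a, b) for a, b, order in bonds
            if order == 1 and elements[a] == "C" and elements[b] == "C"]

def fragments(n_atoms, bonds, cut):
    """Connected components (sorted atom-index tuples) after deleting the bonds in `cut`.

    bonds: (atom_a, atom_b, bond_order) triples; cut: (atom_a, atom_b) pairs.
    """
    removed = {frozenset(pair) for pair in cut}
    neighbors = {i: [] for i in range(n_atoms)}
    for a, b, _order in bonds:
        if frozenset((a, b)) not in removed:
            neighbors[a].append(b)
            neighbors[b].append(a)
    seen, components = set(), []
    for start in range(n_atoms):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:                      # iterative depth-first search
            atom = stack.pop()
            comp.append(atom)
            for nxt in neighbors[atom]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(tuple(sorted(comp)))
    return components

# 1,3,5-hexatriene with hydrogens omitted: C0=C1-C2=C3-C4=C5
elements = ["C"] * 6
bonds = [(0, 1, 2), (1, 2, 1), (2, 3, 2), (3, 4, 1), (4, 5, 2)]
cuts = c_c_single_bonds(bonds, elements)
print(cuts)                        # [(1, 2), (3, 4)]
print(fragments(6, bonds, cuts))   # [(0, 1), (2, 3), (4, 5)]
```

Finding fragments is the easy part; the hard graph-theory problem is choosing which bonds to cut so that the same fragments recur across many natural products, keeping the number of building blocks down.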

Derek did a post on this a few years ago. Hopefully he’ll do another.

Because they aren’t there

George Mallory tried 3 times to be the first to climb Everest, dying on his last attempt. When asked why he was so obsessed, he achieved immortality by saying “because it’s there”. Chemists have spent 60 years trying to synthesize carbon nanobelts “because they aren’t there”.

Well, a group of Japanese chemists finally did it [ Science vol. 356 pp. 172 – 175 ’17 ]. It’s not quite the ultimate belt, because the 6-membered rings are staggered as they are in phenacenes. There are 6 three-ringed phenacenes in the structure, and the diameter of the ring is 8.324 Angstroms. There is no question that they got the compound, as they crystallized it and have bond lengths for all the bonds.

If you look at the paper, this is a zigzag structure rather than a linear polyanthracene. The bond lengths show that every other ring has symmetric bond lengths midway between single and double bonds (i.e. it’s aromatic), while the rings in between clearly are not.

It would be interesting to measure the chemical shifts of C-H bonds sitting over the center of the ring, if they could make a paracyclophane-type molecule bridging the diameter with a (CH2)n chain.

As long as we’re on the subject, what about putting a twist in the ring and making a Möbius belt? Möbius molecules are known, and there is a very nice review of them with a lot of pictures.

The authors think that their work has potential applications: “our synthesis of carbon nanobelt 1 could ultimately lead to the programmable synthesis of single-chirality, uniform-diameter CNTs (30–32) and open a field of nanobelt science and technology”. I think they were just having fun, as chemists are wont to do.