Category Archives: Chemistry (relatively pure)

Consensus isn’t what it used to be.

Technology marches on. All 2^20 = 1,048,576 variants of the 5 nucleotides on either side of a consensus sequence for transcription factor binding (10 flanking positions, so 4^10 = 2^20 sequences) were (1) synthesized and (2) had their dissociation constants (Kd’s) measured. This was done for the consensus sequences of two yeast transcription factors (Pho4 and Cbf1). [ Proc. Natl. Acad. Sci. vol. 115 pp. E3692 – E3702 ’18 ]. The technique is called BET-seq (Binding Energy Topography by sequencing).

What do you think they found?

A ‘large fraction’ of the flanking mutations changed overall binding energies by as much as consensus site mutations. The numbers aren’t huge (only 2.6 kiloCalories/mole). However at 298 Kelvin (25 Centigrade, 77 Fahrenheit), where RT is about 0.6 kiloCalories/mole, every 1.36 kiloCalories/mole is worth a factor of 10 in the equilibrium constant. So binding can vary nearly 100-fold (about 80-fold) even in this range.
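For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes the standard relation between a binding free energy difference and the ratio of dissociation constants, ratio = exp(ddG / RT); the function name `fold_change` is mine, not from the paper.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal/(mol*K)
T = 298.0           # Kelvin (25 Centigrade)

RT = R_KCAL * T                       # roughly 0.59 kcal/mol
kcal_per_decade = RT * math.log(10)   # free energy worth a factor of 10 in Kd


def fold_change(ddg_kcal: float) -> float:
    """Ratio of two dissociation constants whose binding free energies
    differ by ddg_kcal (kcal/mol) at temperature T."""
    return math.exp(ddg_kcal / RT)


print(f"RT = {RT:.3f} kcal/mol")
print(f"a factor of 10 costs {kcal_per_decade:.2f} kcal/mol")
print(f"2.6 kcal/mol is a {fold_change(2.6):.0f}-fold change in Kd")
```

So 2.6 kiloCalories/mole is a bit under two factors of 10.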

The work may explain some ChIP data in which some strips of DNA are occupied despite the lack of a consensus site, with other regions containing consensus sites remaining unoccupied.  The authors make the interesting point that submaximal binding sites might be preferred to maximal ones because they’d be easier for the cell to control (notice the anthropomorphism of endowing the cell with consciousness, or natural selection with consciousness).  It is very easy to slide into teleological thinking in these matters.  Whether or not you like it is a matter of philosophical and/or theological taste.

Pity the poor computational chemist, trying to figure out binding energy to such accuracy with huge molecules like transcription factors and long segments of DNA.

It is also interesting to think what “Molar” means with these monsters. How much does a mole of hemoglobin weigh? 64 kiloGrams more or less. It simply can’t be put into 1000 milliliters of water (which weighs 1 kiloGram). A liter of water contains 1000/18 = 55.6 moles of water, so a 1 Molar solution would leave fewer than 56 water molecules for each hemoglobin molecule. Solubilizing 1 molecule of hemoglobin would certainly use more than that. Reality must intrude, but we blithely talk about concentration this way. Does anyone out there know what the maximum achievable concentration of hemoglobin actually is?
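A rough sketch of the numbers, for what it's worth. The molar masses are the round figures used above. The 340 grams/liter figure is my assumption (roughly the hemoglobin concentration inside a normal red cell), put in only to show what a realistic upper range looks like in Molar terms; it is not an answer to the solubility question.

```python
HB_MOLAR_MASS_G = 64_000.0    # g/mol, round figure for hemoglobin
WATER_MOLAR_MASS_G = 18.0     # g/mol, round figure for water
LITER_OF_WATER_G = 1000.0

# Moles of water in one liter of pure water
water_molarity = LITER_OF_WATER_G / WATER_MOLAR_MASS_G


def molarity(grams_per_liter: float, molar_mass_g: float) -> float:
    """Convert a mass concentration (g/L) to a molar concentration (mol/L)."""
    return grams_per_liter / molar_mass_g


# ASSUMPTION: about 340 g/L, a typical hemoglobin concentration in red cells
ASSUMED_RED_CELL_HB_G_PER_L = 340.0
hb_molar = molarity(ASSUMED_RED_CELL_HB_G_PER_L, HB_MOLAR_MASS_G)

# Upper bound on waters per hemoglobin (ignores the volume the protein takes up)
waters_per_hb = water_molarity / hb_molar

print(f"water: {water_molarity:.1f} M")
print(f"hemoglobin at 340 g/L: {hb_molar * 1000:.1f} milliMolar")
print(f"at most {waters_per_hb:.0f} water molecules per hemoglobin")
```

Even in the red cell, which is about as crowded with hemoglobin as anything gets, the concentration under this assumption is a few milliMolar, nowhere near 1 Molar.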

A research idea yours for the taking

Why would the gene for a protein contain a part which could form amyloid (the major component of the senile plaque of Alzheimer’s disease) and another part to prevent its formation? Therein lies a research idea, requiring no grant money, and free for you to pursue since I’ll be 80 this month and have no academic affiliation.

Bri2 (aka Integral TransMembrane protein 2B — ITM2B) is such a protein.  It is described in [ Proc. Natl. Acad. Sci. vol. 115 pp. E2752 – E2761 ’18 ] http://www.pnas.org/content/pnas/115/12/E2752.full.pdf.

As a former neurologist I was interested in the paper because two different mutations in the stop codon for Bri2 cause 2 familial forms of Alzheimer’s disease: Familial British Dementia (FBD) and Familial Danish Dementia (FDD). So the mutated protein is longer at the carboxy terminal end. And it is the extra amino acids which form the amyloid.

Lots of our proteins form amyloid when mutated; mutations in transthyretin, for example, cause familial amyloidotic polyneuropathy. Amylin (Islet Amyloid Polypeptide, IAPP) is one of the most proficient amyloid formers. Yet amylin is a protein found in the beta cell of the pancreas, the cell which releases insulin (actually in the same secretory granule containing insulin).

This is where Bri2 is thought to come in. It is also found in the pancreas.   Bri2 contains a 100 amino acid motif called BRICHOS  in its 266 amino acids which acts as a chaperone to prevent IAPP from forming amyloid (as it does in the pancreas of 90% of type II diabetics).

Even more interesting is the fact that the BRICHOS domain is found in 300 human genes, grouped into 12 distinct protein families.

Do these proteins also have segments which can form amyloid? Are they like the amyloid in Bri2, in segments of the gene which can only be expressed if a stop codon is read through? Nothing in the cell is perfect and how often readthrough occurs at stop codons isn’t known completely, but work is being done (Nucleic Acids Res. 2014 Aug 18; 42(14): 8928–8938).

I find it remarkable that the cause and the cure of a disease are found in the same protein.

Here’s the research proposal for you.  Look at the other 300 human genes containing the BRICHOS motif (itself just a beta sheet with alpha helices on either side) and see how many have sequences which can form amyloid.  There should be programs which predict the likelihood of an amino acid sequence forming amyloid.

It’s very hard to avoid teleology when thinking about cellular biochemistry and physiology. It’s back to Aristotle where everything has a purpose and a design. Clearly BRICHOS is being used for something or evolution/nature/natural selection/the creator would have long ago gotten rid of it. Things that aren’t used tend to disappear in evolutionary time; witness the blind fish living in caves in Mexico that have essentially lost their eyes. The BRICHOS domain clearly hasn’t disappeared, being present in over 1% of our proteins.

Suppose that many of the BRICHOS containing proteins have potential amyloid segments.  That would imply (to me at least) that the amyloid isn’t just junk that causes disease, but something with a cellular function. Finding out just what the function is would occupy several research groups for a long time.   This is also where you come in.  It may not pan out, but pathbreaking research is always a gamble when it isn’t stamp collecting.

 

Amyloid again, again . . .

Big pharma has spent (and lost) several fortunes trying to attack the amyloid deposits of Alzheimer’s. But like my late med school classmate’s book “Why God Won’t Go Away” (https://www.amazon.com/Why-God-Wont-Go-Away/dp/034544034X), amyloid won’t go away either. It’s a bit oblique but some 300 of our proteins contain a 100 amino acid stretch called BRICHOS. Why? Because it acts as a chaperone protein preventing proteins with a tendency to form amyloid from aggregating into fibrils. The amino acids form a beta sheet surrounded front and back by a single alpha helix.

[ Proc. Natl. Acad. Sci. vol. 115 pp. E2752 – E2761 ’18 ] discusses Bri2 (aka Integral Transmembrane protein 2B, ITM2B), a 266 amino acid type II transmembrane protein. Bri2 contains a carboxy terminal domain, Bri23, released by proteolytic processing between amino acids #243 and #244 by furin-like proteases. Different missense mutations at the stop codon of Bri2 cause extended carboxy terminal peptides called Abri or Adan to be released by the proteases. Abri produces Familial British Dementia (FBD) and Adan produces Familial Danish Dementia (FDD). Both are associated with amyloid deposition in blood vessels, and amyloid plaques throughout the brain along with neurofibrillary tangles.

What is fascinating (to me) is that the cause and cure are both present in the same molecule: Bri2 also contains a BRICHOS domain. This implies (to me) that the segment capable of forming amyloid is possibly being used by the cell in some other fashion.

Bri2 is found in the beta cell of the pancreas (which produces insulin). The beta cell also produces Islet Amyloid PolyPeptide (IAPP, aka amylin), one of the most potent amyloid forming proteins known. Nonetheless the pancreas makes tons of it and, like insulin, it is secreted by the beta cell in response to elevated blood glucose. The present work shows that Bri2 is what keeps IAPP from forming amyloid. The BRICHOS segment (amino acids #130 – #231) is released from Bri2 by ADAM10 (you don’t want to know what the acronym stands for).

How many of the 300 or so human proteins containing the BRICHOS domain also have amyloid forming segments? If they do, this implies that the amyloid forming segments are doing something physiologically useful.

 

 

Stephen Hawking R. I. P.

Stephen Hawking, brilliant mathematician and physicist, has died. Forget all that. He did something for my patients with motor neuron disease that I, as a neurologist, could not do. He gave them hope.

What has chemistry done for them?  Quite a bit, but there’s so much left.

Chemistry, when successful, just becomes part of the wallpaper and ignored. All genome sequencing depends on what some chemist did.

One spectacular example of what would be impossible without chemistry is Infantile Spinal Muscular Atrophy (Werdnig Hoffmann disease). For the actual molecular biology behind it, please see https://luysii.wordpress.com/2016/12/25/tidings-of-great-joy/. Knowing the cause has led to not one but two specific therapies: an antisense oligonucleotide and a virus which infects neurons and actually changes the gene.

So knowing what the cause of a disease is should lead to a treatment, shouldn’t it?  Hold that thought.  Sometimes one form of motor neuron disease (amyotrophic lateral sclerosis or ALS) can be hereditary.  Find out what is being inherited to find how ALS is caused.

Well, the first protein in which a mutation is associated with familial ALS (FALS) was found exactly 25 years ago.  It is called superoxide dismutase (SOD1).  Over 150 mutations have been found in the protein associated with FALS, and yet despite literally thousands of papers on the subject we don’t know if the mutations cause a loss of function, a gain of function (and if so what that function is), an increased tendency to fold incorrectly, and on and on and on.  It’s a fascinating puzzle for the protein chemist and over the years my notes on the papers I’ve read about SOD1 have ballooned to some 25,000 words.

If you’re tired of working on SOD1, try a few of the other proteins in which mutations have been associated with FALS — Alsin, TAF15, Ubiquilin, Optineurin, TBK1 etc. etc.  The list is long.

Now it’s biology’s turn. Motor neurons go from the spinal cord (mostly) and brain to produce muscle contraction. Why should only this tiny (but crucial) minority of cells be affected? The nerve fibers leave the spinal cord and travel to muscle in nerves which contain sensory nerve fibers making the same long trip, yet somehow these sensory fibers are spared.

More than that, why should these mutations affect only these neurons, and that often after decades? Also why should great athletes (Lou Gehrig, Ezzard Charles, etc. etc.) get the disease?

One closing point. Hawking shows why, in any disease, median survival (the time by which 50% of those afflicted have died) is a much more meaningful statistic than average duration of survival. Although he gave my patients great hope, they all died within a few years even as he mightily extended average survival.
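A small sketch of the point about medians and averages. The survival times below are entirely hypothetical numbers I made up for illustration (nine patients surviving a few years and one very long survivor); they are not data from any study.

```python
from statistics import mean, median

# HYPOTHETICAL survival times in years after diagnosis: nine patients who
# live a few years, plus one very long survivor.
typical_patients = [1.5, 2, 2, 2.5, 3, 3, 3.5, 4, 5]
with_long_survivor = typical_patients + [55]

for label, data in [("without outlier", typical_patients),
                    ("with outlier", with_long_survivor)]:
    print(f"{label}: median = {median(data):.2f} years, "
          f"mean = {mean(data):.2f} years")
```

One long survivor nearly triples the average while leaving the median where it was, which is why the median says more about what a typical patient can expect.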

 

Homework assignment and answer

A few days ago I gave the following homework assignment for the ace protein chemist, and promised an answer.

Here’s the assignment

Homework assignment for the protein chemist

As an ace protein chemist you are asked to design two proteins, both intrinsically disordered which form a tight complex with a picoMolar dissociation constant.  To make the problem ‘easier’ there is no need for specific amino acid interactions between the proteins.  To make the problem harder, even in the tight complex formed, the two proteins remain intrinsically disordered.

Hint: ‘nature’, ‘evolution’, ‘God’, whatever you choose to call it, has solved the problem.

Here’s the answer

The structure of an unstructured protein

Protein structure without structure.  No I haven’t fallen under the spell of a Zen master.  As Bill Clinton would say, it depends on what you mean by structure.

If you mean a segment of the protein chain which doesn’t settle down into one structure, you are talking about intrinsically disordered proteins. It is estimated that 40% of all human proteins contain at least one intrinsically disordered segment of 30 amino acids or more ( Nature vol. 471 pp. 151 – 153 ’11 ).   The same paper ‘estimates’ that 25% of all human proteins are likely to be disordered from beginning to end.

Frankly, I’ve always been amazed that any protein settles down into one shape — for details please see — https://luysii.wordpress.com/2010/08/04/why-should-a-protein-have-just-one-shape-or-any-shape-for-that-matter/. But that’s ‘old news’ as another Clinton would say.

Two fascinating papers in the current Nature (vol. 555 pp. 37 – 38, 61 – 66 ’18 1 March) describe the interaction of two very unstructured proteins. One is prothymosin-alpha with 111 amino acids and a net charge of -44. The other is Histone H1 with at least 189 amino acids and a net charge of +53. With such a charge imbalance it’s unlikely that either can coalesce into a compact single form. So they are both intrinsically disordered proteins.

However the two proteins bind to each other quite tightly (dissociation constant is in the picoMolar range). Even when they form a complex, a variety of techniques (NMR, single molecule fluorescence techniques, computation) show that neither settles down into a single form and that both remain unstructured.

So where’s the structure? It isn’t in the amino acid sequence. It isn’t the conformations adopted in space. The structure is in the net charge. Many intrinsically disordered proteins have levels of net charge similar to those of prothymosin alpha and histone H1. In the human proteome alone, several hundred proteins that are predicted to be intrinsically disordered contain contiguous stretches of at least 50 residues with a fractional net charge similar to that of H1 or proThymosin alpha (Bioinformatics 21, 3433–3434 2005). Hopefully there’s something newer.
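Fractional net charge is easy enough to compute from a sequence. Here is a minimal sketch, assuming the usual crude bookkeeping at physiologic pH: lysine and arginine count +1, aspartate and glutamate count -1, and everything else (including histidine) counts 0. The function names, the 0.25 threshold and the test sequence are my own hypothetical choices, not anything from the papers.

```python
# Crude per-residue charges at physiologic pH (histidine ignored by assumption)
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(seq: str) -> int:
    """Sum of residue charges for a one-letter amino acid sequence."""
    return sum(CHARGE.get(aa, 0) for aa in seq.upper())


def fractional_net_charge(seq: str) -> float:
    """Net charge divided by sequence length."""
    return net_charge(seq) / len(seq)


def highly_charged_windows(seq: str, window: int = 50, threshold: float = 0.25):
    """Start positions of windows whose |fractional net charge| >= threshold."""
    return [i for i in range(len(seq) - window + 1)
            if abs(fractional_net_charge(seq[i:i + window])) >= threshold]


# The figures quoted above, expressed as fractions
print(f"prothymosin alpha: {-44 / 111:.2f}")   # net charge / length
print(f"histone H1:        {53 / 189:.2f}")

# HYPOTHETICAL toy sequence, just to exercise the functions
toy = "GS" * 10 + "EEDDE" * 12 + "GS" * 10
print(net_charge(toy), round(fractional_net_charge(toy), 2))
print(len(highly_charged_windows(toy)), "windows of 50 pass the threshold")
```

A scan like this over predicted disordered regions is presumably close in spirit to what the bioinformatics reference did, though I haven't checked their exact criteria.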

The amino-acid sequences of disordered regions in proteins evolve rapidly, yet (Proc. Natl. Acad. Sci vol. 114 pp. E1450–E1459 2017) showed that the net charge is conserved despite a high degree of sequence diversity. This should be a current enough reference.

Why in the world would the cell have something like this?  Most readers probably know what histones are.  If so, stop and think how the binding of the two proteins could be used by the cell before reading what the authors say about it.

“The interaction mechanism of proThymosin alpha and Histone H1 probably aids their biological function.  proThymosin alpha assists with the assembly and disassembly of chromatin, the material in which DNA is packaged with histone proteins (such as H1) in cells. To perform its function, proThymosin alpha must recognize its histone substrates rapidly and with sufficient affinity to compete with the high affinity of histone–DNA interactions (a similar high positive charge high negative charge interaction). The high binding affinity of Pro-Tα for H1 and the association rate of the two proteins imply that the dissociation of proThymosin alpha–H1 complexes is slow enough to allow functional outcomes, but fast enough not to slow down biological turnover.”

Why don’t they form a coacervate, a bunch of molecules held together by hydrophobic forces? Why don’t they show liquid-liquid phase separations? The authors speculate that it might be due to the complementarity of the two proteins in terms of effective length and opposite net charge. Also they don’t have the hydrophobic and aromatic side chains and cation pi interactions which are said to favor phase separation mediated by proteins.

Addendum 20 March’18 — from my comment and a response on Derek’s blog

Luysii:

Another way to look at these very charge imbalanced proteins, is that they are being strongly (and positively) selected for. They are incredibly improbable on a purely statistical basis. Prothymosin alpha has 111 amino acids of which 44 are negatively charged. There are 20 amino acids of which only 2 (glutamic acid and aspartic acid) have negative charges at physiologic pH — cysteine and tyrosine can form anions but under much more basic conditions. So, assuming a random assortment of amino acids, the idea that 10% of the amino acids could fight for space with 90% of the rest and win around 40% of the time in 111 battles is extremely improbable. You’d have to use Stirling’s approximation for factorials to figure out exactly how improbable this is. Any takers?

Reply
DCRogers says:
March 20, 2018 at 2:35 pm
CDF(N=111, X=44, p=0.1) = 1.87 * 10^-16
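No need for Stirling these days. Here is a minimal sketch of the calculation, assuming (as in my comment) that each of the 111 positions is independently acidic with probability 0.1. The quantity computed is the upper tail, the chance of 44 or more acidic residues, which I take to be what the figure in the reply refers to; the function name is mine.

```python
import math


def binomial_upper_tail(n: int, k_min: int, p: float) -> float:
    """P(X >= k_min) for X ~ Binomial(n, p), summed term by term."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))


# 111 residues, at least 44 acidic, each acidic with probability 0.1
tail = binomial_upper_tail(111, 44, 0.1)
print(f"P(at least 44 acidic residues out of 111) = {tail:.3g}")
```

The independence assumption is of course a crude one (amino acid frequencies aren't uniform and neighboring residues aren't independent), but the order of magnitude makes the point.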

Carbynes ! ! !

An article on carbynes brought back memories of the Spring of 1961 when I convinced Woodward to let me work on an original idea about carbenes for my PhD thesis.  Back then you had to pass 8 cumulative exams (given monthly) before you could start such work.  It took me 9.

At the time, carbenes were a rather speculative idea, but it seemed to me that they could be generated by photolysis of a diazocarbonyl compound. I thought they might be involved in the Wolff rearrangement.

One of the joys of organic chemistry back then (and hopefully now) is that if you have an idea, just build a molecule to test it.

So here’s the idea the great man bought.

1. Condense acrylic acid with cyclopentadiene by a Diels Alder reaction.  Because of steric effects the acid will point below the ring system

2. Form the acyl chloride

3. React with diazoMethane to form the diazocarbonyl — there will be no change in the orientation of the carbonyl relative to the ring system

4. Photolyze. If a carbene is formed, it would be in perfect position to form a cyclopropane on the other side of the ring system, pretty much proving its existence.

Malheureusement, having the worst lab technique in the world and being very frightened by what I’d heard about diazoMethane, I couldn’t get the idea to work.

However the idea was good, and a friend who kept on in chemistry becoming a department head told me that I was right.

Which brings us to the current article [ Nature vol. 554 pp. 36 – 38, 86 – 91 ’18 ] http://www.nature.com/magazine-assets/d41586-018-01308-7/d41586-018-01308-7.pdf.

A carbyne is basically R – C where the carbon has 3 electrons not forming covalent bonds (two are paired). As you might imagine, carbynes are quite reactive. However both articles talk about a carbyne equivalent which is R – C = N2, which IMHO is not a carbyne at all. It is intriguing that it would be if the N2 were photolyzed off a la 1961, but that isn’t what happens in the paper. It remains as the intermediate performs all sorts of interesting chemistry, forming an Aryl – C (R) = N2 moiety etc. etc.

One interesting aside is that carbynes were one of the first molecules found in interstellar space.

Can anyone out there enlighten me as to why R – C = N2 is a carbyne equivalent? Neither paper provides an explanation.

 

 

Why drug development is hard #31: retroviruses at the synapse

What if I told you that a very important neuronal synaptic protein Arc (Arg3.1) is acting like a virus, sending copies of itself (and its messenger RNA) across the synapse? Would a team of shrinks, who’ve never examined me, tell you that I was crazy and unfit to blog? Well there is very good evidence that exactly this occurs in one situation and probably many more [ Cell vol. 172 pp. 8 – 10, 262 – 274, 275 – 288 ’18 ] http://www.cell.com/cell/fulltext/S0092-8674(17)31509-X.

Arc stands for Activity Regulated Cytoskeleton associated protein. Its messenger RNA (mRNA) is transcribed from the gene in response to neuronal activity. More importantly, the mRNA for Arc is rapidly distributed to active synapses through the cell body and dendrites, where it is translated into protein. It is locally and rapidly stimulated during the induction of long term depression and plays a critical role in removing a class of glutamic acid receptors (AMPA receptors) from the synapse. To whet the interest of drug developers, Arc regulates the activity dependent cleavage of the Amyloid Precursor Protein (APP) and beta amyloid production by its interaction with presenilin.

Several posts could easily be filled with what Arc does, but that’s not what is so amazing about these papers. Parts of the Arc protein arose from one of the many transcriptionally dead retroviruses found in our genome. Our species literally wouldn’t exist without other retroviral gifts. For instance syncytin1 is a protein expressed at high levels in the placenta. It is produced from the envelope gene of an endogenous retrovirus (HERV-W) which has undergone inactivating mutations in its other major genes (gag and pol). Mutant mice in which the gene has been knocked out die in utero due to failure of placenta formation.

Part of the arc gene arose from the Gag gene (Group specific antigen gene) of a retrovirus.  Recall most viruses have proteins coating their genetic material when they’re on the move (e. g. a capsid).  In the case of retroviruses, the genetic material is RNA rather than DNA.  Well the gag elements of the Arc protein form a capsid containing the mRNA for Arc (just like a virus).  In some way or other the capsid containing mRNA gets outside the neuron at the nerve muscle junction and gets into muscle.  The evidence is good that this happens, but in a system somewhat removed from us — the fruitfly (Drosophila).  Fruitfly neuromuscular junctions lacking this mechanism are weaker.

Well that’s pretty far from us. However one of the papers (275 – 288) showed that the Arc protein and its mRNA were found in extracellular vesicles released from mouse neurons cultured from their cerebral cortex. Could viral-like particles be crossing the synapses in our brains (which are already pretty chockfull of stuff; see https://luysii.wordpress.com/2017/11/15/the-bouillabaisse-of-the-synaptic-cleft/)? It’s very early times (in fact the Cell issue came out 3 days ago) but people are sure to look. There are at least 100 Gag derived genes in the human genome (Campillos, M., Doerks, T., Shah, P.K., and Bork, P. (2006). Computational characterization of multiple Gag-like human proteins. Trends Genet. 22, 585–589.).

Remarkable.  Remember CRISPR was hiding in plain sight for half a century.  We have a lot to learn.  No wonder drugs have unexpected side effects.

Why drug development is hard #30 — more new interactions we had no idea existed

We’re full of proteins which bind RNA wrangling it into a desired conformation.  The ribosome (whose enzymatic business end is pure RNA) has a mere 80 proteins doing this.  Its mass is 4,300,000 times that of a hydrogen atom.  However the idea that RNA could return the favor was pretty much unheard of until [ Science vol. 358 pp. 993 – 994, 1051 – 1055 ’17 — http://science.sciencemag.org/content/358/6366/1051 ].

As is often the case, viruses and the RNA world continue to instruct us. In order to survive, some viruses induce cells to express a long (2,200+ nucleotides) nonCoding (for protein that is) RNA called lncRNA-ACOD1. It binds to a protein enzyme (called GOT2, for Glutamic acid OxaloAcetic Transaminase 2) increasing its catalytic efficiency. This shifts cellular metabolism around making it more favorable for virus proliferation, as GOT2 is found in mitochondria being used to replenish tricarboxylic acid cycle intermediates, i.e. making more energy available to the virus.

lncRNA-ACOD1 is induced by a variety of viruses, most importantly influenza virus in man, and vaccinia, herpes simplex 1, vesicular stomatitis virus in mice.  Exactly how viruses induce it isn’t clear, but the transcription factor NFkappaB is involved.

Viruses continue to teach us. The amino acids of GOT2 (#15 – #68) and the interacting sequence of nucleotides in lncRNA-ACOD1 (#165 – #390) are well conserved across species. This might be a primordial mechanism from the RNA world (forgotten but not gone) to boost ATP production to cope with metabolic stress. The RNA/protein binding site is close (4.2 Angstroms) to the substrate binding site.

The fun is just starting as several other lncRNAs are induced by viruses.  You can only imagine what they will tell us.  Another set of drug targets perhaps, or worse, the cause of peculiar side effects from drugs already in use.