Category Archives: Chemistry (relatively pure)

Has the great white whale of oncology finally been harpooned?

The ras oncogene is the great white whale of oncology. Mutations in 20 – 40% of cancer turn its activity on so that nothing can turn it off, resulting in cellular proliferation. People have been trying to turn mutated ras off for years with no success.

A current paper [ Cell vol. 165 pp. 643 – 655 ’16 ] describes a new and different way to attack it. Once  ras is turned on (either naturally or by mutation) many other proteins must bind to it, to produce their effects — they are called RAS effectors, among which are the uneuphoniously named RAF, RalGDS and PI3K. They bind to activated ras by the cleverly named Ras Binding Domain (RBD) which has 78 amino acids.

The paper describes rigosertib, a not that complicated molecule to the chemist, which inhibits the binding (by resembling the site on ras that the RBD binds to). It is a styryl benzyl sulfone and you can see the structure here —

What’s good about it? Well it is in phase III trials for a fairly uncommon form of cancer (myelodysplastic syndrome). That means it isn’t horribly toxic or it wouldn’t have made it out of phase I.

Given the mechanism described, it is possible that Rigosertib will be useful in 20 – 40% of all cancer. Can you say blockbuster drug?

Do you have a speculative bent? Buy the company testing the drug and owning the patent — Onconova Therapeutics. It’s quite cheap — trading at $0.40 (yes 40 cents!). It once traded as high as $30.00 — symbol ONTX. I don’t own any (yet), but for the price of a movie with a beer and some wings afterwards you could be the proud owner of 100 shares. If Rigosertib works, the stock will certainly increase more than a hundredfold.

Enough kidding around. This is serious business. In what follows you will find some hardcore molecular biology and cellular physiology showing just what we’re up against. Some of the following is quite old, and probably out of date (like yours truly), but it does give you the broad outlines of what is involved.

The pathway from Ras to the nucleus

The components of the pathway had been found in isolation (primarily because mutations in them were associated with malignancy). Ras was discovered as an oncogene in various sarcoma viruses. Mutations in ras found in tumors left it in a ‘turned on’ state, but just how ras (and everything else) fit into the chain of binding of a growth factor (such as platelet derived growth factor, epidermal growth factor, insulin, etc. etc.) to its receptor on the cell surface to alterations in gene expression wasn’t clear. It is certain to become more complicated, because anything as important as cellular proliferation is very likely to have a wide variety of control mechanisms superimposed on it. Although all sorts of protein kinases are involved in the pathway it is important to remember that ras is NOT a protein kinase.

1. The first step is binding of a growth factor to its receptor on the cell surface. The receptor is usually a tyrosine kinase. Binding of the factor to the receptor causes ‘activation’ of the receptor. Activation usually means increasing the enzymatic activity of the receptor in the tyrosine kinase reaction. The increase in activity is usually brought about by dimerization of the receptor (so it phosphorylates itself on tyrosine).

2. Most activated growth factor receptors phosphorylate themselves (as well as other proteins) on tyrosine. A variety of other proteins have domains known as SH2 (for src homology 2) which bind to phosphorylated tyrosine.

3. A protein called grb2 binds via its SH2 domain to a phosphorylated tyrosine on the receptor. Grb2 binds to the polyproline domain of another protein called sos1 via its SH3 domain. At this point, the uninitiated must find the proceedings pretty hokey, but the pathway is so general (and fundamental) that proteins from yeast may be substituted into the human pathway and still have it work.

4. At last we get to ras. This protein is ‘active’ when it binds GTP, and inactive when it binds GDP. Ras is a GTPase (it can hydrolyze GTP to GDP). Most mutations which make ras an oncogene decrease the GTPase activity of ras, leaving it in a permanently ‘turned on’ state. It is important for the neurologist to know that the gene which is defective in type I neurofibromatosis normally codes for a protein which activates the GTPase activity of ras, turning ras off. Deficiencies (in ras inactivation) lead to a variety of unusual tumors familiar to neurologists.

Once ras has hydrolyzed GTP to GDP, the GDP remains bound to ras, inactivating it. This is where sos1 comes in. It catalyzes the exchange of GDP for GTP on ras, thus activating ras.

5. What does activated ras do? It activates Raf-1, silly. Raf-1 is another oncogene. How does activated ras activate Raf-1? Ras appears to activate raf by causing raf to bind to the cell membrane (this doesn’t happen in vitro as there is no membrane). Once ras has done its job of localizing raf to the plasma membrane, it is no longer required. How membrane localization activates raf is less than crystal clear. [ Proc. Natl. Acad. Sci. vol. 93 pp. 6924 – 6928 ’96 ] There is increasing evidence that Ras may mediate its actions by stimulating multiple downstream targets of which Raf-1 is only one.

6. Raf-1 is a protein kinase. Protein kinases work by adding phosphate groups to serine, threonine or tyrosine. In general protein kinases fall into two classes: those phosphorylating on serine or threonine and those phosphorylating on tyrosine. Biochemistry has a well documented series of examples of enzymes being activated (or inhibited) by phosphorylation. The best worked out is the pathway from the binding of epinephrine to its cell surface receptor to glycogen breakdown. There is a whole sequence of one enzyme phosphorylating another which then phosphorylates a third. Something similar goes on between Raf-1 and a collection of protein kinases called MAPKs (mitogen activated protein kinases). These were discovered as kinases activated when mitogens bound to their extracellular receptors. There may be a kinase lurking about which activates Raf (it isn’t Ras which has no kinase activity). Removal of phosphate from Raf (by phosphatases) inactivates it.

7. Raf-1 activates members of the MAPK family by phosphorylating them. There may be several kinases in a row phosphorylating each other. [ Science vol. 262 pp. 1065 – 1067 ’93 ] There are at least three kinase reactions at present at this point. It isn’t known if some can be sidestepped. Raf-1 activates mitogen activated protein kinase kinase (MAPK-K) by phosphorylation (it is called MEK in the ras pathway). MAPK-K activates mitogen activated protein kinase (MAPK) by phosphorylation. Thus Raf-1 is actually mitogen activated protein kinase kinase kinase (sort of like the character in Catch-22 named Major Major Major Major). (1/06 — I think that Raf-1 is now called BRAF)

8. The final step in the pathway is activation of transcription factors (which turn genes off or on) by MAP kinases by (what else) phosphorylation. Thus the pathway from cell surface is complete.

A new wrinkle in an old reaction

Just when you thought we knew everything there was to know about the Diels Alder reaction, cometh Nature vol. 532 pp. 484 – 488 ’16 in which triple bonds are used in both the diene and the dienophile. Naturally they are all put in the same molecule so they can’t get away from each other. I can’t draw the structure in this post, but it’s worth a look, particularly since a benzyne intermediate is formed in one, and an even more bizarre (and labile) intermediate (a diradical with the unpaired electrons, each on an atom, separated by two more carbons) is formed in the other. It’s sort of chemical bonsai. Enjoy

State functions, state equations, graphs of them and reversibility

Thermodynamic States are all considered to be continuous variables (the fact that Internal Energy (U) is a state variable is half of the first law).

A continuous function of state_function_1 in terms of state_function_2, . . . . state_function_n produces a graph which is an n – 1 dimensional surface in n dimensional space. If this seems rather abstract, we’ll get concrete shortly. Consider the classic calculus 101 function y = x^2. Write it like this

f : R^1 –> R^1
f : x |–> x^2

This does seem a bit stuffy, but the clarity it provides is useful, as you’ll see. R^1 is the set of real numbers. The first line tells you that f goes from the real numbers to the real numbers. The second line gives you what f does to a point in the domain. What about the graph of f? It is the parabola, which lives in the x – y plane, a 2 dimensional space. The graph of f is just a curved line with dimension 1, living in a space one dimension higher (i.e. dimension 2).

Different state functions apply to different physical systems at which point they are called equations of state, with every point on their graph representing a collection of state variables at which the system is at equilibrium (e.g. not changing with time)

The simplest state function comes from the ideal gas law PV = nRT, which was promulgated in 1834 by Clapeyron. For one mole of gas (n = 1) you may regard it as

T : R^2 –> R^1
T : (P, V) |–> P*V/R == T

This is Temperature (state_function_1) in terms of P (state_function_2) and V (state_function_3). What is its graph? Something 2 dimensional living in 3 dimensional space, i.e. a surface.
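To make the surface concrete, here is a minimal Python sketch of T as a function of P and V for an ideal gas. The function names, the tolerance and the choice of SI units (pascals, cubic meters) are my own assumptions for illustration; only PV = nRT comes from the text.

```python
# Gas constant in joules per (mole * kelvin); an assumed SI value.
R = 8.314

def temperature(p, v, n=1.0):
    """Temperature (K) of n moles of ideal gas at pressure p (Pa), volume v (m^3)."""
    return p * v / (n * R)

def on_surface(p, v, t, n=1.0, rel_tol=1e-9):
    """True if the point (p, v, t) lies on the equilibrium surface PV = nRT."""
    return abs(p * v - n * R * t) <= rel_tol * abs(n * R * t)

# One mole held at 300 K in 0.0249 m^3 (about 24.9 liters).
volume = 0.0249
pressure = R * 300.0 / volume

print(temperature(pressure, volume))        # height of the surface at this (P, V)
print(on_surface(pressure, volume, 300.0))  # a point on the surface
print(on_surface(pressure, volume, 350.0))  # same P and V, wrong T: off the surface
```

Every (P, V) pair gives exactly one T, which is all that 'a 2 dimensional surface in 3 dimensional space' means here.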

If you’ve studied PChem, you’ve probably met the Carnot cycle. Here’s a link. It is represented by a bunch of curved lines in the PV plane, but each line in the diagram really represents a line on the 3 dimensional graph of T. You can think of this like a topographic map of a mountain, but not quite. The top and bottom lines represent constant temperature (altitude) but the (semi)vertical lines are paths up and down the mountain. Just looking at the flat PV diagram is pretty misleading.

Any combination of P, V, T not satisfying PV = RT is not on the surface, and is not in equilibrium. You won’t see any of them on the diagram of the PV plane, which is why it’s so misleading.

P, V and T will change so they approach the surface (either by minimizing internal energy or maximizing entropy or a combination of both). These are the driving forces of Dill’s book, Molecular Driving Forces.

The definition of surface given above is quite general and applies to more complicated situations — which is why I went to the trouble to go through it. For instance, in some systems Internal Energy (U) is a function of 3 variables Entropy (S), Volume (V) and the number of molecules (N). This is a 3 dimensional surface living in 4 dimensions. It’s just as much of a surface as that for T in terms of P and V, but I can’t visualize it (perhaps you can). Note also that when you go to higher magnification N is not a continuous variable, any more than concentration is.

Any point on the surface can be reached reversibly from any other — what does reversibility actually mean?

Berry Physical Chemistry 2nd Ed 2000 p. 377. Reversibility of changes in equilibrium means 3 things.

1. The change occurs almost infinitesimally slowly (a very large class of real processes have work and heat values very close to reversible processes)

2. Changes remain infinitesimally close to equilibrium (e.g. they stay on the surface). At equilibrium, thermodynamic variables still fluctuate. If movement on the surface is slow enough that the thermodynamic variables are within 1 standard deviation of the average values of the thermodynamic state variables, no observation can show that the state of the system has changed.

3. Intensive variables corresponding to work being done (e.g. pressure, surface tension, voltage) are continuous across the boundary of the system on which work is being done.

Objects off the surface aren’t in equilibrium and maximization of entropy or minimization of internal energy drive them toward the surface. This implies that the surface is an attractor. Now that chaos is well known, are there thermodynamic attractors? I’ve written Dill to ask about this.

Hopefully this will be helpful to some of you. Putting it together was to me. As always, the best way to learn something is trying to explain it to someone else.

Sn2 — It’s a gas

Sn2 reactions are a lot more complicated than as taught in orgo 101 (at least in the gas phase). The classic mechanism is very easy to teach to students, it’s just an umbrella turning inside out in the wind. A current article in Science (vol. 352 pp. 32 – 33 1 April ’16) shows how complicated things can be when the reaction is carried out in the gas phase. Mechanisms illustrated include rebound stripping, frontside attack, ion-dipole complex, roundabout, hydrogen bond complex, frontside complex and double inversion.

Why study Sn2 in the gas phase? One reason is to sharpen computational and theoretical methods to be able to predict reaction rates (in gas phase reactions). I was surprised on looking up Rice-Ramsperger-Kassel-Marcus theory to find out how old it was. Back in the 60’s it was taught to us without any names attached. One assumes that before and after reaction the ion molecule complexes are trapped in potential wells. It is assumed that vibrational energies in the complex are quickly distributed to ‘equilibrium’ in the complexes so that detailed computation of rates can be carried out.

Is this of any use to the chemist actually reacting molecules in solution? Other than by sharpening computational tools, I don’t see how it can be given the present state of the art.

Gas phase kineticists are starting to try, but they’ve got a very long way to go. “Stepwise addition of solvent molecules to the bare reactant anion offers a bottom up approach to learn more about the transition of chemical reactions from the gas to liquid phase. To investigate the role of solvation in Sn2 reactions Otto et al. have performed crossed molecular beam studies of the microsolvated” Sn2 reaction (e.g. the approaching anion solvated with all of one or two waters). The results show that “the dynamics differ dramatically from the unsolvated anion.”

That’s not a bug — that’s a feature

Back in the early days of computers you could own (aka personal computers) it wasn’t point and click, but hunt and peck, where commands in the early operating systems (DOS, etc.) had to be typed onto the command line using a keyboard. The interfaces were far from intuitive, to say the least, and the unexpected was always expected. When things went south software designers quickly learned to say “That’s not a bug, that’s a feature!”

Essentially the same thing has happened to the latest and greatest tool in genetic engineering, the CRISPR system. It’s fascinating that it has been hiding in plain sight for FOUR decades. In med school in the mid-60s the basic book about heredity and DNA was “Sexuality and the Genetics of Bacteria” (1961) by Francois Jacob. No one had any idea that DNA would be sequenced. Viruses were studied (called bacteriophages back then).

No one had any idea that bacteria could defend themselves against viruses, but defend they do by their CRISPR system. It’s only been known for a decade; earlier papers on the subject by 3 different authors (Mojica, Gilles Vergnaud, Alexander Bolotin) were rejected before eventual publication.

Briefly, when a bacterium is infected by a virus, it makes a copy of fragments of the virus’s DNA, and pastes them into its own genome. On subsequent invasions, it uses the DNA copy to make RNA, which along with a complex enzyme binds to the genome of the new invader, and destroys it.

It turns out that a PAM (Protospacer Adjacent Motif) is crucial for the whole system to work. The bacterium’s stored copy doesn’t have such a sequence next to it, so the system searches for it in the invader. The PAM isn’t large (just 3 nucleotides in a row) and the system looks for it in invading viral DNA double helices.

But where does it look? On the side of the double helix with the least information — the minor groove

Look at the following

It shows classic Watson Crick base pairing — the major groove is a lot bigger (taking up 210 degrees, hardly a groove) with more chemical information than the minor groove. So binding to the major groove is likely to be far more accurate (as well as easier because it’s a larger space).

So why does E. coli do this? Because different viruses contain different PAM sequences. [ Nature vol. 530 pp. 499 – 503 ’16 ] This is the crystal structure of the E. coli Cascade complex (the business end of CRISPR) bound to a foreign double stranded DNA target. The 5′ ATG PAM is recognized in duplex form, from the minor groove side, by 3 structural features in the Cse1 subunit of Cascade. The promiscuity inherent to minor groove DNA recognition explains how a single Cascade complex can respond to several distinct PAM sequences — this is a feature not a bug.

Types of variables you need to know to understand thermodynamics

I’ve been through the first 200 pages of Dill’s Book “Molecular Driving Forces (2003)” which is all about thermodynamics and statistical mechanics, things that must be understood to have any hope of understanding cellular biophysics. There are a lot of variables to consider (with multiple names for some) and they fall into 7 non mutually exclusive types.

Here they are with a few notes about them

1. Thermodynamic State Variables: These are the classics — Entropy (S), Internal Energy (U), Helmholtz Free Energy (F), Gibbs Free Energy (G), Enthalpy (H).
All are continuous functions of their Natural Variables (see next) so they can be differentiated. Their differentials are exact.

2. Natural variable of a thermodynamic state variable — these are defined as continuous variables which when an extremum (maximum, minimum) of the state variable using them is found, the state function won’t change with time (e.g. is at equilibrium). Here they are for the 5 state functions. T is Temperature, V is Volume, N is number of molecules, S and U are what you think, and p is pressure

State Name              State Function   Natural Variables
Helmholtz Free Energy   F                T, V, N
Entropy                 S                U, V, N
Internal Energy         U                S, V, N
Gibbs Free Energy       G                T, p, N
Enthalpy                H                S, p, N

Note that U and S are both state variables and natural variables of each other. Note also (for maximum confusion) that Helmholtz free energy is not H but F, and that H is Enthalpy not Helmholtz Free energy

3. Extensive variable – can only be characterized by how much there is of it. This includes all 5 thermodynamic state variables (F, S, U, G, H) along with V volume, and N number of molecules. Extensive variables are also known as degrees of freedom.

4. Intensive variable — temperature, pressure, and ratios of State and Natural variables (actually the derivative of a state variable with respect to a natural variable; temperature is actually defined this way: partial U / partial S)

5. Control variables — these are under the experimenter’s control, and are usually kept constant. They are also known as constraints, and most are intensive (volume isn’t). Examples: constant temperature, constant volume, constant pressure

6. Conjugate variables. Here we need the total differential of a state variable (which exists for all) in terms of its natural variables to understand what is going on.

Since U is a continuous function of each of S, V, and N

we have

dU = (partial U / partial S) dS + (partial U / partial V) dV + (partial U / partial N) dN

= T dS – p dV + mu dN ; mu is the chemical potential

So T is conjugate to S, p is conjugate to V, and mu is conjugate to N ; note that each pair of conjugates has one intensive variable (T, p, mu) and one extensive one ( S, V, N). Clearly the derivatives ( T, p, mu) are intensive.

7. None of the above — work(w) and heat (q)

Thermodynamics can be difficult to master unless these are clear. Another reason is that what you really want is to maximize (S) or minimize (U, H, F, G) state variables — the problem is you have no way to directly measure the two crucial ones you really want (U, S) and have to infer what they are from various derivatives and control variables. You can measure changes in S and U  between two temperatures by using heat capacities. That’s just like spectroscopy, where all you measure is the difference between energy levels, not the energy levels themselves. But it is the minimum values of U, G, H, F and maximum values of S which determine what you want to know.

There’s more to come about Dill’s book. I’ve found a few mistakes and have corresponded with him about various things that seem ambiguous (to me at least). As mentioned earlier, in grad school 44 years ago, I audited a statistical mechanics course taught by E. Bright Wilson himself. I never had the temerity to utter a word to him. How things have changed for the better, to be able to Email an author and get a response. He’s been extremely helpful and patient.

When knowledge isn’t power

Here is a genetic disease, where we’ve known exactly what’s wrong with the causative gene for 23 years, over 10,000 papers have been written (a Google search comes up with about 418,000 results in 0.45 seconds), but we don’t know how the mutation causes the problems it does or have a clue how to treat the disease. So much for finding the cause of a genetic disease leading to therapy. Imagine how much harder cancer is.

I speak of Huntington’s chorea, and the causative gene huntingtin. It’s a terrible neurologic disease characterized by progressive movement disorders, dementia and incapacitation over a decade or two. Woody Guthrie had it; fortunately Arlo escaped. Like many people with the disorder Woody was quite fertile, having 8 children.

It being a neurologic disorder, I’ve read a lot about it, and my jottings about my readings over the past few decades have consumed 83,635 characters (aren’t computers wonderful?). I’ve had a fair amount of experience with it, as an Indian agent in Montana had it, and produced many progeny with his women, leading to a good deal of devastation in one tribe.

Neuron vol. 89 pp. 910 – 926 ’16 is an excellent recent review (but not one for the fainthearted). Several mysteries are immediately apparent.

First, huntingtin is expressed in nearly every neuron, but only a few die. It is expressed outside the brain in lung, ovary and testes, but they work just fine.

Second, huntingtin interacts with over 350 different proteins. Figuring which are the important ones has provided steady employment.

Third it exists in many forms, so many that there aren’t enough scientists living to test them all. This is because huntingtin is subject to a variety of chemical modifications (phosphorylation, ubiquitination, acetylation, palmitoylation, sumoylation) at FORTY-EIGHT different sites (listed in the article). So this gives 2^48 possible modified forms of the protein (either modification being present or absent). 2^48 = 281,474,976,710,656 if you’re interested.
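A quick way to check that count, and to get a feel for its size, is a couple of lines of Python. The one-form-per-second testing pace below is purely a hypothetical of mine for scale, not anything from the review.

```python
SITES = 48            # modification sites listed in the review
forms = 2 ** SITES    # each site is either modified or not
print(f"{forms:,}")   # 281,474,976,710,656

# Hypothetical pace, purely for scale: one modified form tested every second.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
years = forms / SECONDS_PER_YEAR
print(f"about {years:,.0f} years at one form per second")
```

The count treats the 48 sites as independent on/off switches, which is the same simplification made in the text.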

In addition to the modifications, the protein is huge — some 3,144 amino acids occurring in 67 exons forming two mRNAs of 10,366 and 13,711 nucleotides.

Fourth, the protein can also be chopped up by at least 5 different enzymes at 6 different sites, and some fragments are biologically active (toxic in tissue culture).

Naturally, the region with the mutation (near the amino terminal end) of the protein has been studied most intensively.

Huntingtin has its fingers in many physiologic pies — the reference is excellent in this area — these include vesicular trafficking, cell division, cilia formation, endocytosis, autophagy, gene transcription. Abnormalities of which one cause the neurologic disease?

The mutant form forms protein aggregates. Like Alzheimer’s disease senile plaque or the Lewy body of Parkinson’s disease, we don’t know if the aggregates are toxic or protective.

Fifth: Despite all its known functions we don’t know if the mutation produces a loss of some vital function of Huntingtin, or a new and toxic function.

Even worse, compared to cancer, Huntington’s chorea is ‘simple’ because we know the cause.

The chemical ingenuity of the cell

If you know a bit of molecular biology, you know that messenger RNA (mRNA) has a tail of consecutive adenines added at its 3′ end. If you don’t know that much all the background you need can be found in — just follow the links.

The adenines are not coded in the genome. Why? I’ve always thought of it as something preventing the mRNA from being broken down before the ribosome translates it into protein. Gradually the adenines are nibbled off by cytoplasmic nucleases. The literature seems to agree — from my notes on various sources

Most mRNAs in mammalian cells are quite stable and have a half life measured in hours, but others turn over within 10 to 30 minutes. The 5′ cap structure in mRNA prevents attack by 5′ exonucleases and the polyadenine (polyA) tail prohibits the action of 3′ exonucleases. The absence of a polyA tail is associated with rapid degradation of mRNA. Histone mRNAs lack a polyA tail but have near their 3′ terminus a sequence which can form a stem loop structure; this appears to confer resistance to exonucleolytic attack.

polyA — the polyAdenine tail found on most mRNAs must be removed before mRNA degradation can occur. Anything longer than 10 adenines in a row seems to protect mRNA. The polyA tail is homogenous in length in most species ( 70 – 90 in yeast, 220 – 250 nucleotides in mammalian cells). PolyA shortening can be separated into two phases, the first being the shortening of the tail down to 12 – 25 residues, and the second terminal deadenylation being the removal of some or all of them.

Molecular Biology of the Cell 4th Edition p. 449 — Once a critical threshold of tail shortening has been reached (about 30 As) the 5′ cap is removed (decapping) and the RNA is rapidly degraded. The proteins that carry out tail shortening compete directly with the machinery that catalyzes translation; therefore any factors increasing translation initiation efficiency increase mRNA stability. Many RNAs carry in the 3′ UTR sequences binding sites for specific proteins that increase or decrease the rate of polyA shortening.

But why polyAdenine? Why not polyCytosine or PolyGuanine or polyUridine? Here’s where the chemical ingenuity comes in. Of the 64 possible codons only 3 tell the ribosome to stop. These are called variously termination codons, stop codons, and (idiotically) nonsense codons — they aren’t nonsense at all, and are functionally vital for the following reason. Stop codons cause the ribosome to separate into two parts releasing the mRNA and the protein. Suppose a given mRNA doesn’t have a stop codon? Then the ribosome and the mRNA remain stuck together, and future protein synthesis by that particular ribosome becomes impossible. Not good.

This is probably why the codons for stop are so similar UAA, UAG and UGA — mutating a G to an A gives another one, and mutating either A in UAA to a G gives another stop codon. So the coding chosen for stop codons is somewhat resistant to mutation, because mRNAs without stop codons are disastrous for reasons shown above.
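The claim about the stop codons being one mutation away from each other is easy to check by brute force. Here is a small Python sketch (the function names are mine) that lists, for each stop codon, which of its 9 single-nucleotide neighbors are also stop codons.

```python
STOPS = {"UAA", "UAG", "UGA"}
BASES = "ACGU"

def single_base_neighbors(codon):
    """All 9 codons that differ from `codon` at exactly one position."""
    return [codon[:i] + b + codon[i + 1:]
            for i in range(3)
            for b in BASES
            if b != codon[i]]

def stop_neighbors(codon):
    """Single-nucleotide neighbors of `codon` that are themselves stop codons."""
    return sorted(n for n in single_base_neighbors(codon) if n in STOPS)

for stop in sorted(STOPS):
    print(stop, "->", stop_neighbors(stop))
```

UAA sits in the middle: it has two stop neighbors (UAG and UGA), while UAG and UGA each have one (UAA). Every other point mutation in a stop codon produces a codon for an amino acid.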

Well, randomness happens and suppose that the termination codon has been mutated to a codon for an amino acid. These are called nonStop RNAs which code for nonStop proteins. So the poor ribosome then translates the mRNA right to its 3′ end. Well what does AAA translate into? Lysine. Lysine is quite basic and quickly becomes protonated on its epsilon amino group (even within the confines of the ribosome). The exit tunnel for the ribosome is strongly negatively charged, and so coulomb interaction grinds things to a halt. What other basic amino acids are there? There’s arginine, and perhaps histidine, but none of the codons for them is CCC or GGG or UUU.
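To see why only a polyA tail does the trick, here is a short Python sketch translating each of the four possible homopolymer tails. The four codon assignments are from the standard genetic code; the function name and the 30 nucleotide tail length are my own choices for illustration.

```python
# Standard genetic code entries for the four homopolymer codons.
HOMOPOLYMER_CODONS = {"AAA": "Lys", "CCC": "Pro", "GGG": "Gly", "UUU": "Phe"}

# Amino acids with basic side chains (histidine only weakly so).
BASIC = {"Lys", "Arg", "His"}

def translate_tail(base, length):
    """Translate a homopolymer tail of `length` nucleotides, codon by codon."""
    residue = HOMOPOLYMER_CODONS[base * 3]
    return [residue] * (length // 3)

for base in "ACGU":
    peptide = translate_tail(base, 30)
    kind = "basic" if peptide[0] in BASIC else "not basic"
    print(f"poly{base}: {len(peptide)} x {peptide[0]} ({kind})")
```

Only polyA gives a run of positively charged residues; polyC, polyG and polyU would give proline, glycine and phenylalanine, none of which would stick in a negatively charged tunnel.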

The Ribosomal Quality Control system (RQC) then springs into action. I didn’t realize this until reading the following paper this year. Did you? Amazing cleverness on the part of the cell.

[ Nature vol. 531 pp. 191 – 195 ’16 ] Translation of an mRNA lacking a stop codon (nonStop mRNA) in eukaryotes results in a polyLysine protein (AAA codes for lysine). The positively charged lysines cause stalling in the negatively charged ribosomal exit tunnel. The Ribosomal Quality Control complex (RQC complex) recognizes nonStop proteins and mediates their ubiquitination and proteasomal degradation.

The eukaryotic RQC comprises Listerin (Ltn1) an E3 ubiquitin ligase, Rqc1, Rqc2 and the AAA+ protein CDC48. On dissociation of the stalled ribosome, Rqc2 binds to the peptidyl tRNA of the 60S subunit and recruits Ltn1 which curves around the 60S ribosome, positioning its ligase domain near the nascent chain exit. Rqc2 is a nucleotide binding protein that recruits tRNA^Ala and tRNA^Thr to the 60S peptidyl tRNA complex. This results in the addition of a Carboxy terminal Ala/Thr sequence (a CAT tail) to the stalled nascent chain.

Mutation of Listerin causes neurodegeneration in mice.

Threading the ribosomal needle

What do you do when you try to thread a needle? You straighten out the thread. This is exactly what a newly discovered RNA modification (1 methyl adenosine) is doing. If you look at adenine pairing with thymine in the following link, the hydrogen sitting between the adenine and thymine is replaced with a much bulkier methyl group in 1 methyl adenosine. Watson-Crick base pairing is impossible.

Not much 1 methyl adenosine is found in a given mRNA (usually one or less). The authors note that it is usually found near a transcription start site (and in a highly structured region — based on the PARS score — whatever that is). In particular it is found at alternative initiation sites in the second or third exon of a gene. Unsurprisingly, when it is present more protein is expressed from the mRNA.

The work is described in Nature vol. 530 pp. 422 – 423, 441 – 446 ’16. The authors wonder how many mRNA modifications are out there waiting to be discovered. Let’s hope they look. Other mRNA modifications are known (pseudouridine, 6 methyl adenine and 5 methyl cytosine). The modification is dynamic, the amount changing with cellular conditions. This isn’t a flash in the pan as 1/3 of the same sites are methylated in mouse mRNA.

Numerology – I

It’s time to put some numbers on the formulas of statistical mechanics to bring home just how fantastic the goings on inside our cells actually are.

To start — we live at temperatures of 300 Kelvin (27 Centigrade, 80 Fahrenheit). If you’ve studied statistical mechanics you know that the average kinetic energy of a molecule is 3/2 k * T — where k is the Boltzmann constant and T is the temperature in Kelvin. The Boltzmann constant is the gas constant R divided by Avogadro’s number. R is to be found in the perfect gas law familiar from elementary PChem or physics — PV = nRT, where P is Pressure, V is volume, and n is the number of moles.

If you’re a bit foggy on this look at where you’ll find an explanation of why the dimensional units of R are energy divided by temperature times the number of moles.

This is all very nice but how fast are things moving at room temperature? We need to choose some units and stick to them. We’ve got Kelvin already. We can get from k (the Boltzmann constant) to R (the gas constant) easily by multiplying k by Avogadro’s number.

So now we have kinetic energy per mole (not molecule) is 3/2 R * T

You now need a choice of units for expressing the gas constant. The first part of every course in grad school was consumed with units. Don Voet used to say he preferred the hand stone fortnight system, but that isn’t used much anymore. We’ll use the MKS (Meter KiloGram Second) system. This gives kinetic energy in Joules.

A Joule has the units of mass times velocity squared, i.e. kilogram (meter/second)^2. Since kinetic energy is 1/2 * mass * velocity^2, a mass of 2 kiloGrams moving at a velocity of 1 meter/second has a kinetic energy of 1 Joule.

Now we’re getting somewhere. The next step is to get molar mass in kiloGrams. Chemists use the Dalton, where a hydrogen atom has a mass of about 1 Dalton, so 1 mole of hydrogen atoms has a mass of 1 gram (not kilogram).

Kinetic energy = 1/2 * mass * velocity^2 (in units of kilogram * (meter/second)^2) == 3/2 R*T

So velocity (in meters/second) = Sqrt ( 3 * R * T / molar mass in kilograms).

To keep things simple I’m going to assume that we’re dealing with hydrogen atoms — so its molar mass is 1 gram (10^-3 kiloGrams)

Putting it all together — the velocity of a hydrogen atom at 300 Kelvin is Sqrt ( 3 * 8.314 * 300 / 10^-3 ) == 2,735 meters/second

Pretty fast. To convert this to kilometers per hour multiply by 3600 and divide by 1000 == 9,846 Kilometers/hour

In Miles per hour this is 9846 * 0.6214 (miles/kilometer) = 6,118 miles per hour.

Recall the number 2735. All you have to do to find out how fast ANY molecular species is moving at room temperature is divide this by the square root of the molecule’s mass (in Daltons not kiloGrams). So that of water is 2735/ sqrt (18) = 644 meters/second.
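If you’d rather let the computer do the arithmetic, here is a minimal Python sketch of the velocity formula above. The function name is mine, and it assumes (as the text does) R = 8.314 Joules per mole per Kelvin and a default temperature of 300 Kelvin.

```python
import math

R = 8.314  # gas constant in joules per (mole * kelvin)

def rms_speed(mass_daltons, temperature_k=300.0):
    """Speed in meters/second from 1/2 * M * v^2 = 3/2 * R * T."""
    molar_mass_kg = mass_daltons * 1e-3  # 1 Dalton <-> 1 gram per mole
    return math.sqrt(3 * R * temperature_k / molar_mass_kg)

for name, mass in [("hydrogen atom", 1), ("water", 18), ("1 megaDalton complex", 1e6)]:
    print(f"{name}: {rms_speed(mass):.1f} meters/second")
```

Strictly speaking this is the root mean square speed, since the 3/2 R*T is an average kinetic energy; individual molecules are spread around it.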

I never could be sure that some of the energy of a molecule wasn’t sucked up in vibrations and conformation change. Multiple attempts at understanding the equipartition of energy theorem didn’t help. Finally one of the authors of one of 3 biophysics books I’m reading said that “the speed just depends on mass. That’s the translational part. Other degrees of Freedom (like vibrations) can absorb potential energy. But it doesn’t affect velocity.”

The velocity formula works even for something as large as RNA polymerase II (500 kilodaltons). To make things really easy let’s work with a molecular complex of mass 1,000,000 daltons (1 megaDalton) — there are plenty of such protein complexes of this size (and more) in the cell. A 1 megaDalton mass has a velocity of 2.7 meters a second.

Cells are small. The 3 polymerases transcribing DNA into RNA have masses in the megaDalton range. So how long should it take them to traverse a nucleus 10 microns (10^-5 meters) in diameter? Going at 2.7 meters/second, one will traverse 270,000 nuclei in a second or one every 4 microSeconds.
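Here is the same crossing-time arithmetic as a few lines of Python, using the numbers from the text (2.7 meters/second, a 10 micron nucleus). The variable names are mine, and the straight-line path is an assumption for illustration.

```python
speed = 2.7               # meters/second for a 1 megaDalton complex at 300 Kelvin
nucleus_diameter = 10e-6  # 10 microns, in meters

# Assumes an unobstructed straight-line path across the nucleus.
crossing_time = nucleus_diameter / speed
crossings_per_second = speed / nucleus_diameter

print(f"{crossing_time * 1e6:.1f} microSeconds per crossing")
print(f"{crossings_per_second:,.0f} crossings per second")
```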

Clearly I’ve left something out — nothing in the cell moves in a straight line. It is very crowded, so that even though things are moving very quickly their trajectory isn’t straight (although the numbers I’ve given are correct for the total length of the trajectory when straightened out). I’ll be writing about diffusion constants etc. etc. in the future, but here’s one more numerological example.

Consider pure water. How many moles of water are in a liter (1 kilogram) of water? 1000/18 = 55.5 moles. How many molecules is that?

55.5 * 6.023 * 10^23. How big is water? I found a source that water can be considered a squashed sphere of maximum diameter 2.82 Angstroms. Now Angstroms are something chemists deal with — the hydrogen atom is about 1 Angstrom in diameter, and the carbon carbon single bond is 1.54 Angstroms.

So what is the volume of a water molecule? It’s (4/3) * pi * (2.82/2)^3 == 11.7 cubic Angstroms.

What is the volume of a liter in cubic Angstroms? An Angstrom is 10^-10 meters and a liter is a cube .1 meter on a side — so there are 10^27 cubic Angstroms in a liter. How many cubic Angstroms do the 55.5 moles of water in a liter take up?

11.7 * 55.5 * 6.023 * 10^23 == 3.9 * 10^26 cubic Angstroms — 40% of the volume of a liter. So a water molecule is likely to hit another one in 2.5 * 2.82 Angstroms or in about 7 Angstroms. How long will that take? It’s moving at 6.44 * 10^2 meters/second and 7 Angstroms is a distance of 7 * 10^-10 meters, so it’s likely to meet another water in (roughly) 10^-12 seconds (1 picoSecond).
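The water numbers can be checked the same way. This Python sketch follows the text: a sphere of diameter 2.82 Angstroms, 18 grams per mole, and a travel distance of about 7 Angstroms before a collision. That 7 Angstrom figure is the rough estimate from the paragraph above, not something the code derives, and the variable names are mine.

```python
import math

AVOGADRO = 6.022e23
ANGSTROMS_PER_METER = 1e10

moles_per_liter = 1000 / 18  # grams in a liter / grams per mole
molecule_volume = (4 / 3) * math.pi * (2.82 / 2) ** 3  # cubic Angstroms
liter_volume = (0.1 * ANGSTROMS_PER_METER) ** 3        # cubic Angstroms

occupied = moles_per_liter * AVOGADRO * molecule_volume
fraction = occupied / liter_volume
print(f"{fraction:.0%} of the liter is occupied by water molecules")

# Rough estimate from the text: about 7 Angstroms between collisions.
distance_m = 7 / ANGSTROMS_PER_METER
speed = 644  # meters/second for water at 300 Kelvin
time_to_collision = distance_m / speed
print(f"{time_to_collision:.1e} seconds to the next collision")
```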

There’s all sorts of hell breaking loose with the water inside our cells. That’s enough for now.

