
It ain’t the bricks, it’s the plan

Nothing better shows the utility (and the futility) of chemistry in biology than using it to explain the difference between man and chimpanzee. You’ve all heard that our proteins are only 2% different from the chimp’s, so we are 98% chimpanzee. The facts are correct, the interpretation wrong. We are far more than the protein ‘bricks’ that make us up, and two current papers in Cell [ vol. 163 pp. 24 – 26, 66 – 83 ’15 ] essentially prove this.

This is like saying Monticello and Independence Hall are just the same because they’re both made out of bricks. One could chemically identify Monticello bricks as coming from the Virginia piedmont, and Independence Hall bricks coming from the red clay of New Jersey, but the real difference between the buildings is the plan.

It’s not the proteins, but where and when and how much of them are made. The control for this (plan if you will) lies outside the genes for the proteins themselves, in the rest of the genome (remember only 2% of the genome codes for the amino acids making up our 20,000 or so protein genes). The control elements have as much right to be called genes, as the parts of the genome coding for amino acids. Granted, it’s easier to study genes coding for proteins, because we’ve identified them and know so much about them. It’s like the drunk looking for his keys under the lamppost because that’s where the light is.


All the molecular biology you need to understand what follows is in the following post —

Briefly an enhancer is a stretch of DNA distinct from the DNA coding for a given protein, to which a variety of other proteins called transcription factors bind. The enhancer DNA and associated transcription factors, then loops over to the protein coding gene and ‘turns it on’ — e.g. causes a huge (megaDalton) enzyme called pol II to make an RNA copy of the gene (called mRNA) which is then translated into protein by another huge megaDalton machine called the ribosome. Complicated no? Well, it’s happening inside you right now.

The faces of chimps and people are quite different (but not so much so that they look frighteningly human). The Cell paper studied the cells which in embryos go to make up the bones and soft tissues of the face, called Cranial Neural Crest Cells (CNCCs). How did they get them? Not from Planned Parenthood; rather they made iPSCs (induced Pluripotent Stem Cells) and differentiated them into CNCCs. Not only that, but they studied both human and chimp CNCCs. So you must at least wonder how close to biologic reality this system actually is.

It’s rather technical, but they had several ways of seeing if a given enhancer was active or not. By active I mean engaged in turning on a given protein coding gene so more of that protein is made. For the cognoscenti, these methods included (1) p300 binding, (2) chromatin accessibility, (3) the H3K4me1/H3K4me3 ratio, and (4) H3K27Ac.

The genome is big — some 3,200,000,000 positions (nucleotides) linearly arranged along our chromosomes. Enhancers range in size from 50 to 1,500 nucleotides, and the study found a mere 14,500 enhancers in the CNCCs. More interestingly, 13% of them were activated differentially in man and chimp CNCCs. This is probably why we look different from chimps. So although the proteins are the same, the timing of their synthesis is different.
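A back-of-the-envelope calculation (my own arithmetic, not the paper’s) shows how little of the genome these enhancers occupy. The enhancer count and size range are from the study; the 3.2 billion nucleotide genome size is the standard figure.

```python
# Back-of-envelope: what fraction of the genome do the CNCC enhancers cover?
# Numbers from the post: ~14,500 enhancers, each 50 - 1,500 nucleotides;
# the human genome is ~3.2 billion nucleotides.
GENOME_SIZE = 3_200_000_000
N_ENHANCERS = 14_500
MIN_LEN, MAX_LEN = 50, 1_500

low = N_ENHANCERS * MIN_LEN / GENOME_SIZE    # if every enhancer is minimal
high = N_ENHANCERS * MAX_LEN / GENOME_SIZE   # if every enhancer is maximal

print(f"Enhancer coverage: {low:.4%} to {high:.4%} of the genome")
```

Even at the generous end, the ‘plan’ studied here occupies well under 1% of the genome.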

At long last, molecular biology is beginning to study the plan rather than the bricks.

Chemistry has a great role in this and will continue to do so. For instance, enhancers can be sequenced to see how different enhancer DNA is between man and chimp. The answer is not much (again 2 or so nucleotides per hundred nucleotides of enhancer). The authors did find one new enhancer motif, not seen previously, called the coordinator motif. But it was present in both man and chimp. Chemistry can and should explain why changing so few nucleotides changes the proteins binding to a given enhancer sequence, and it will be important in designing proteins to take advantage of these changes.

So why is chemistry futile? Because as soon as you ask what an enhancer or a protein is for, you’ve left physicality entirely and entered the realm of ideas. Asking what something is for is an entirely different question than how something actually does what it is for.  The latter question  is answerable by chemistry and physics. The first question is unanswerable by them.  The Cartesian dualism of flesh and spirit is alive and well.

It’s interesting to see how quickly questions in biology lead to teleology.

How ‘simple’ can a protein be and still have a significant biological effect?

Words only have meaning in the context of the much larger collection of words we call language. So it is with proteins. Their only ‘meaning’ is the biologic effects they produce in the much larger collection of proteins, lipids, sugars, metabolites, cells and tissues of an organism.

So how ‘simple’ can a protein be and still produce a meaningful effect? As Bill Clinton would say, that depends on what you mean by simple. Well, one way a protein can be simple is by having only a few amino acids. Met-enkephalin, an endogenous opiate, contains only 5 amino acids. Now many wouldn’t consider met-enkephalin a protein, calling it a polypeptide instead. But the boundary between polypeptide and protein is as fluid and ill-defined as that between a few grains of sand and a pile of it.

Another way to define simple, is by having most of the protein made up by just a few of the 20 amino acids. Collagen is a good example. Nearly half of it is glycine and proline (and a modified proline called hydroxyProline), leaving the other 18 amino acids to make up the rest. Collagen is big despite being simple — a single molecule has a mass of 285 kiloDaltons.

This brings us to [ Proc. Natl. Acad. Sci. vol 112 pp. E4717 – E4727 ’15 ]. They constructed a protein/polypeptide of 26 amino acids, of which 25 are either leucine or isoleucine. The 26th amino acid is methionine (which is found at the very amino terminal end of all proteins — remember AUG, the initiator codon, codes for methionine).

What does it do? It causes tumors. How so? It binds to the transmembrane domain of the beta variant for the receptor for Platelet Derived Growth factor (PDGFRbeta). The receptor when turned on causes cells to proliferate.

What is the smallest known oncoprotein? It is the E5 protein of Bovine PapillomaVirus (BPV), which is essentially a free-standing transmembrane domain (and which also binds to PDGFRbeta). It has only 44 amino acids.

Well we have 26 letters + a space. I leave it to you to choose 3 of them, use one of them once, the other two 25 times, with as many spaces as you want and construct a meaningful sequence from them (in any language using the English alphabet).
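For the curious, here’s a quick count (my own arithmetic, not from the post) of how many orderings the puzzle allows, ignoring spaces. With one letter used once and two letters used 25 times each, the number of distinct arrangements of the 51 letters is the multinomial coefficient 51!/(25!·25!·1!).

```python
from math import factorial

# The puzzle: pick 3 letters; use one of them once and the other two 25 times
# each. Ignoring spaces, how many distinct orderings of those 51 letters exist?
# Answer: the multinomial coefficient 51! / (25! * 25! * 1!).
orderings = factorial(51) // (factorial(25) * factorial(25) * factorial(1))
print(f"{orderings:.3e} distinct orderings")
```

Several quadrillion orderings, and essentially none of them meaningful — which is the point of the analogy to the 26-residue oncoprotein.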

Just back from an Adult Chamber Music Festival (aka Band Camp for Adults).  More about that in a future post

Is natural selection disprovable?

One of the linchpins of evolutionary theory is that natural selection works by increased reproductive success of the ‘fittest’. Granted, this is Panglossian in its tautology — of course the fittest is what survives, so of course it has greater reproductive success.

So decreased reproductive success couldn’t be the result of natural selection could it? A recent paper says that is exactly what has happened, and in humans to boot, not in some obscure slime mold or the like.

The work comes from in vitro fertilization, which the paper says is responsible for 2 – 3% of all children born in developed countries — seems high. Maternal genomes can be sequenced and the likelihood of successful conception correlated with a variety of variants. It was found that there is a strong association between a change in just one nucleotide (i.e. a single nucleotide polymorphism, or SNP) and reproductive success. The deleterious polymorphism (rs2305957) decreases reproductive success. This is based on 15,388 embryos from 2,998 mothers sampled at the day-5 blastocyst stage.

What is remarkable is that the polymorphism isn’t present in Neanderthals (from which modern humans diverged between 100,000 and 400,000 years ago). It is in an area of the genome which has the characteristics of a ‘selective sweep’. Here’s the skinny

A selective sweep is the reduction or elimination of variation among the nucleotides in neighbouring DNA of a mutation as the result of recent and strong positive natural selection.

A selective sweep can occur when a new mutation occurs that increases the fitness of the carrier relative to other members of the population. Natural selection will favour individuals that have a higher fitness and with time the newly mutated variant (allele) will increase in frequency relative to other alleles. As its prevalence increases, neutral and nearly neutral genetic variation linked to the new mutation will also become more prevalent. This phenomenon is called genetic hitchhiking. A strong selective sweep results in a region of the genome where the positively selected haplotype (the mutated allele and its neighbours) is essentially the only one that exists in the population, resulting in a large reduction of the total genetic variation in that chromosome region.

So here we have something that needs some serious explaining — something decreasing fecundity which is somehow ‘fitter’ (by the definition of fitness) because it spread in the human population. The authors gamely do their Panglossian best, explaining that “the mitotic-error phenotype (which causes decreased fecundity) may be maintained by conferring both a deleterious effect on maternal fecundity and a possible beneficial effect of obscured paternity via a reduction in the probability of successful pregnancy per intercourse. This hypothesis is based on the fact that humans possess a suite of traits (such as concealed ovulation and constant receptivity) that obscure paternity and may have evolved to increase paternal investment in offspring.”

Nice try fellas, but this sort of thing is a body blow to the idea of natural selection as increased reproductive success.

There is a way out, however: it is possible that what is being selected for is something controlled by DNA near rs2305957 so useful that it spread in our genome DESPITE decreased fecundity.

Don’t get me wrong, I’m not a creationist. The previous post described some of the best evidence we have in man for another pillar of evolutionary theory — descent with modification. Here duplication of a single gene since humans diverged from chimps causes a massive expansion of the gray matter of the brain (cerebral cortex).


Addendum 13 April

I thought the following comment was so interesting that it belongs in the main body of the text


Mutations don’t need to confer fitness in order to spread through the population. These days natural selection is considered a fairly minor part of evolution. Most changes become fixed as the result of random drift, and fitness is usually irrelevant. “Nearly neutral theory” explains how deleterious mutations can spread through a population, even without piggybacking on a beneficial mutation; no need for Panglossian adaptive hypotheses.

Here’s my reply

Well, the authors of the paper didn’t take this line, but came up with a rather contorted argument to show why decreased fecundity might be a selective advantage, rather than just saying it was random drift. They also note genomic evidence for a ‘selective sweep’ — decreased genomic heterogeneity around the SNP.

Why we understand randomness as imperfectly as we do

The cognoscenti think the average individual is pretty dumb when it comes to probability and randomness. Not so, says a fascinating recent paper [ Proc. Natl. Acad. Sci. vol. 112 pp. 3788 – 3792 ’15 ]. The average Joe (this may mean you), when asked to draw a random series of fifty or so heads and tails, never puts in enough runs of heads or runs of tails. This leads to the gambler’s fallacy: that if an honest coin gives a run of, say, 5 heads, the next result is more likely to be tails.

There is a surprising amount of structure lurking within purely random sequences such as the toss of a fair coin where the probability of heads is exactly 50%. Even with a series with 50% heads, the waiting time for two heads (HH) or two tails (TT) to appear is significantly longer than for an alternation (HT or TH). On average 6 tosses will be required for HH or TT to appear while only an average of 4 are needed for HT or TH.
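These waiting times are easy to check by simulation (a sketch of mine, not from the paper):

```python
import random

def tosses_until(pattern, rng):
    """Flip a fair coin until `pattern` (e.g. 'HH') first appears;
    return the number of tosses needed."""
    seq = ""
    while pattern not in seq:
        seq += rng.choice("HT")
    return len(seq)

rng = random.Random(42)   # fixed seed so the run is reproducible
trials = 20_000
mean_hh = sum(tosses_until("HH", rng) for _ in range(trials)) / trials
mean_ht = sum(tosses_until("HT", rng) for _ in range(trials)) / trials
print(f"mean wait for HH: {mean_hh:.2f}")   # theory says 6
print(f"mean wait for HT: {mean_ht:.2f}")   # theory says 4
```

Twenty thousand trials lands the averages close to the theoretical 6 and 4 tosses.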

This is why Joe SixPack never puts in enough runs of Hs or Ts.

Why should the wait be longer for HH or TT, even when 50% of the time you get an H or a T? The mean time between occurrences of HH (or TT) is the same as for HT (or TH). The variance is different, because occurrences of HH and TT are bunched in time, while those of HT and TH are spread evenly.

It gets worse for longer repetitions — they can build on each other. HHH contains two instances of HH, while alternations do not. Repetitions bunch together, as noted earlier. We are very good at perceiving waiting times, and this is probably why we think repetitions are less likely and soon to break up.

The paper goes a lot farther, constructing a neural model based on the way our brains integrate information over time when processing sequences of events. It takes into consideration our perceptions of mean time AND waiting times. We average the two. This produces the best-fitting bias gain parameter for an existing Bayesian model of randomness.

See, you’re not as dumb as they thought you were.

Another reason for our behavior comes from neuropsychology and physiological psychology. We have ways to watch the electrical activity of your brain and find out when you perceive something as different. It’s called mismatch negativity. It’s a brain potential (called P300) peaking 0.1 – 0.25 seconds after a deviant tone or syllable.

Play 5 middle C’s in a row followed by a D, then C’s again. The potential doesn’t occur after any of the C’s, just after the D. This has been applied to the study of infant perception long before they can speak.

It has shown us that Asian and Western newborn infants both hear ‘r’ and ‘l’ quite well (showing mismatch negativity to a sudden ‘r’ or ‘l’ in a sequence of other sounds). If the Asian infant never hears people speaking words with r and l in them for 6 months, it loses mismatch negativity to them (and clinical perception of them). So our brains are literally ‘tuned’ to understand the language we hear.

So we are more likely to notice the T after a run of H’s, or an H after a run of T’s. We are also likely to notice just how long it has been since it last occurred.

This is part of a more general phenomenon — the ability of our brains to pick up and focus on changes in stimuli. Exactly the same phenomenon explains why we see edges of objects so well — at least here we have a solid physiologic explanation: surround inhibition. It happens in the complicated circuitry of the retina, before the brain is involved.

Philosophers should note that this destroys the concept of the pure (i.e. uninterpreted) sensory percept — information is being processed within our eyes before it ever gets to the brain.

Update 31 Mar — I wrote the following to the lead author

” Dr. Sun:

Fascinating paper. I greatly enjoyed it.

You might be interested in a post from my blog (particularly the last few paragraphs). I didn’t read your paper carefully enough to see if you mention mismatch negativity, P300 and surround inhibition. If not, you should find this quite interesting.


And received the following back in an hour or two

“Hi, Luysii- Thanks for your interest in our paper. I read your post, and find it very interesting, and your interpretation of our findings is very accurate. I completely agree with you making connections to the phenomenon of change detection and surround inhibition. We did not spell it out in the paper, but in the supplementary material, you may find some relevant references. For example, the inhibitory competition between HH and HT detectors is a key factor for the unsupervised pattern association we found in the neural model.”


Nice ! ! !

How formal tensor mathematics and the postulates of quantum mechanics give rise to entanglement

Tensors continue to amaze. I never thought I’d get a simple mathematical explanation of entanglement, but here it is. Explanation is probably too strong a word, because it relies on the postulates of quantum mechanics, which are extremely simple but which lead to extremely bizarre consequences (such as entanglement). As Feynman famously said, ‘no one understands quantum mechanics’. Despite that, it has never made a prediction not confirmed by experiment, so the theory is correct even if we don’t understand ‘how it can be like that’. 100 years of correct predictions are not to be sneezed at.

If you’re a bit foggy on just what entanglement is — have a look at Even better; read the book by Zeilinger referred to in the link (if you have the time).

Actually you don’t even need all the postulates of quantum mechanics (as given in the book “Quantum Computation and Quantum Information” by Nielsen and Chuang). No differential equations. No Schrodinger equation. No operators. No eigenvalues. What could be nicer for those thirsting for knowledge? Such a deal ! ! ! Just 2 postulates and a little formal mathematics.

Postulate #1 “Associated to any isolated physical system, is a complex vector space with inner product (that is a Hilbert space) known as the state space of the system. The system is completely described by its state vector which is a unit vector in the system’s state space”. If this is unsatisfying, see the explication of this on p. 80 of Nielsen and Chuang (where the postulate appears).

Because the linear algebra underlying quantum mechanics seemed to be largely ignored in the course I audited, I wrote a series of posts called Linear Algebra Survival Guide for Quantum Mechanics. The first should be all you need, but there are several more.

Even though I wrote a post on tensors, showing how they were a way of describing an object independently of the coordinates used to describe it, I didn’t even discuss another aspect of tensors — multilinearity — which is crucial here. The post itself can be viewed at

Start by thinking of a simple tensor as a vector in a vector space. The tensor product is just a way of combining vectors in vector spaces to get another (and larger) vector space. So the tensor product isn’t a product in the sense that multiplication of two objects (real numbers, complex numbers, square matrices) produces another object of the exactly same kind.

So mathematicians use a special symbol for the tensor product — a circle with an x inside. I’m going to use something similar ‘®’ because I can’t figure out how to produce the actual symbol. So let V and W be the quantum mechanical state spaces of two systems.

Their tensor product is just V ® W. Mathematicians can define things any way they want. A crucial aspect of the tensor product is that it is multilinear. So if v and v’ are elements of V, then v + v’ is also an element of V (because two vectors in a given vector space can always be added). Similarly w + w’ is an element of W if w and w’ are. Adding to the confusion when trying to learn this stuff is the fact that all vectors are themselves tensors.

Multilinearity of the tensor product is what you’d think

(v + v’) ® (w + w’) = v ® (w + w’ ) + v’ ® (w + w’)

= v ® w + v ® w’ + v’ ® w + v’ ® w’

You get all 4 tensor products in this case.
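The expansion above can be checked numerically. NumPy’s `kron` computes the tensor (Kronecker) product of coefficient vectors — a quick sketch of mine, not from the book:

```python
import numpy as np

# Two state spaces V and W, here both C^2 (real entries for simplicity),
# with arbitrary vectors v, v' in V and w, w' in W.
v, vp = np.array([1.0, 2.0]), np.array([3.0, -1.0])
w, wp = np.array([0.5, 4.0]), np.array([2.0, 1.0])

# Multilinearity: (v + v') (x) (w + w') expands into all four simple products.
lhs = np.kron(v + vp, w + wp)
rhs = np.kron(v, w) + np.kron(v, wp) + np.kron(vp, w) + np.kron(vp, wp)
print(np.allclose(lhs, rhs))  # True
```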

This brings us to Postulate #2 (actually #4 in the book, on p. 94 — we don’t need the other two — I told you this was fairly simple)

Postulate #2 “The state space of a composite physical system is the tensor product of the state spaces of the component physical systems.”

Where does entanglement come in? Patience, we’re nearly done. One now must distinguish simple and non-simple tensors. Each of the 4 tensor products in the sum on the last line is simple, being the tensor product of two vectors.

What about v ® w’ + v’ ® w ?? It isn’t simple, because there is no way to get it as simple_tensor1 ® simple_tensor2, so it’s called a compound tensor. (v + v’) ® (w + w’) is a simple tensor because v + v’ is just another single element of V (call it v”) and w + w’ is just another single element of W (call it w”).

So for the simple tensor (v + v’) ® (w + w’), the elements of the two state spaces can be understood as though V has state v” and W has state w”.

v ® w’ + v’ ® w can’t be understood this way. The full system can’t be understood by considering V and W in isolation, i.e. the two subsystems V and W are ENTANGLED.
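There is even a mechanical test for this (my sketch, not from Nielsen and Chuang): reshape the four amplitudes of a two-qubit state into a 2×2 matrix; the state is a simple (product) tensor exactly when that matrix has rank 1.

```python
import numpy as np

def is_entangled(state):
    """state: 4-vector of amplitudes for a two-qubit system, ordered
    (00, 01, 10, 11). A product (simple) tensor reshapes to a rank-1
    2x2 matrix; rank 2 means the state is entangled."""
    return np.linalg.matrix_rank(state.reshape(2, 2)) > 1

# v (x) w -- a simple tensor, not entangled
v = np.array([1.0, 2.0]); w = np.array([3.0, 1.0])
print(is_entangled(np.kron(v, w)))                   # False

# v (x) w' + v' (x) w with v=(1,0), v'=(0,1), w=(0,1), w'=(1,0):
# amplitudes (0, 1, 1, 0) -- cannot be factored, so entangled
print(is_entangled(np.array([0.0, 1.0, 1.0, 0.0])))  # True
```

(Readers who know quantum information will recognize this as the Schmidt rank.)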

Yup, that’s all there is to entanglement (mathematically at least). The paradoxes of entanglement, including Einstein’s ‘spooky action at a distance’, are left for you to explore — again Zeilinger’s book is a great source.

But how can it be like that you ask? Feynman said not to start thinking these thoughts, and if he didn’t know you expect a retired neurologist to tell you? Please.


Anyone wanting to understand the language of general relativity must eventually tackle tensors. The following is what I wished I’d known about them before I started studying them on my own.

First, mathematicians and physicists describe tensors so differently that it’s hard to even see that they’re talking about the same thing (one math book of mine says exactly that). Also, mathematicians basically dump on the physicists’ way of doing tensors.

My first experience with tensors was years ago when auditing a graduate abstract algebra course. The instructor prefaced his first lecture by saying that tensors were the hardest thing in mathematics. Unfortunately right at that time my father became ill and I had to leave the area.

I’ll write a bit more about the mathematical approach at the end.

The physicist’s way of looking at tensors actually is a philosophical position. It basically says that there is something out there; two people viewing that something from different perspectives are seeing the same thing, and how they numerically describe it, while important, is irrelevant to the thing itself (ding an sich if you want to get fancy). What a tensor tries to capture is how one view of the object can be transformed into another without losing the object in the process.

This is a bit more subtle than using different measuring scales (Fahrenheit vs. centigrade). That salt shaker sitting there looks a bit different to everyone present at the table. Relative to themselves they’d all use different numbers to describe its location, height and width. Depending on distance it would subtend different visual angles. But it’s out there and has but one height, and no one around the table would disagree.

You’re tall and see it from above, while your child sees it at eye level. You measure the distances from your eye to its top and to its bottom, subtract them and get the height. So does your child. You get the same number.

The two of you have actually used two distinct vectors in two different coordinate systems. To transform your view into that of your child’s you have to transform your coordinate system (whose origin is your eye) to the child’s. The distance numbers to the shaker from the eye are the coordinates of the shaker in each system.

So the position of the bottom of the shaker actually has two parts (e.g. the vector describing it)
1. The coordinate system of the viewer
2. The distances measured by each (the components or the coefficients of the vector).

To shift from your view of the salt shaker to that of your child’s you must change both the coordinate system and the distances measured in each. This is what tensors are all about. So the vector from the top to the bottom of the salt shaker is what you want to keep constant. To do this the coordinate system and the components must change in opposite ways. This is where the terms covariant and contravariant and all the indices come in.

What is taken as the basic change is that of the coordinate system (the basis vectors if you know what they are). In the case of the vector to the salt shaker the components transform the opposite way (as they must to keep the height of the salt shaker the same). That’s why they are called contravariant.

The use of the term contravariant vector is terribly confusing, because every vector has two parts (the coefficients and the basis) which transform oppositely. There are mathematical objects whose components (coefficients) transform the same way as the original basis vectors — these are called covariant (the most familiar is the metric, a bilinear symmetric function which takes two vectors and produces a real number). Remember it’s the way the coefficients of the mathematical object transform which determines whether they are covariant or contravariant. To make things a bit easier to remember, contRavariant coefficients have their indices above the letter (R for roof), while covariant coefficients have their indices below the letter. The basis vectors (when written in) always have the opposite position of their indices.

Another trap — the usual notation for a vector skips the basis vectors entirely, so the most familiar example (x, y, z) or (x^1, x^2, x^3) is really
x^1 * e_1 + x^2 * e_2 + x^3 * e_3, where e_1 is (1,0,0), etc.
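A numerical sketch (mine, not from any relativity text) of the point: if the basis vectors change by a matrix A, the components must change by A’s inverse for the vector itself to stay put.

```python
import numpy as np

# A vector's components transform oppositely ("contra") to the basis vectors,
# so the vector itself -- the thing "out there" -- is unchanged.
e = np.eye(3)                      # old basis e_1, e_2, e_3 (as columns)
x = np.array([2.0, -1.0, 4.0])     # components in the old basis

A = np.array([[1.0, 1.0, 0.0],     # change of basis: the new basis vectors
              [0.0, 2.0, 1.0],     # are the columns of A (in old coordinates)
              [1.0, 0.0, 3.0]])
f = A @ e                          # new basis vectors (columns of f)
x_new = np.linalg.inv(A) @ x       # components transform with the INVERSE

# Same geometric vector either way:
print(np.allclose(e @ x, f @ x_new))  # True
```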

So the crucial thing about tensors is the way they transform from one coordinate system to another.

There is a far more abstract way to define tensors — by the way multilinear maps on products of vector spaces factor through them (the universal property). I don’t think you need it for relativity (I hope not). If you want to see a very concrete approach to this admittedly abstract business — I recommend “Differential Geometry of Manifolds” by Stephen Lovett pp. 381 – 383.

An even more abstract definition of tensors (seen in the graduate math course) is to define them on modules, not vector spaces. Modules are just vector spaces whose scalars are rings, rather than fields like the real or the complex numbers. The difference is that, unlike in fields, nonzero elements of a ring need not have multiplicative inverses.

I hope this is helpful to some of you.

The incredible information economy of frameshifting

Her fox and dog ate our pet rat

H erf oxa ndd oga teo urp etr at

He rfo xan ddo gat eou rpe tra t

The last two lines make no sense at all, but (neglecting the spaces) they have identical letter sequences.

Here are similar sequences of nucleotides making up the genetic code as transcribed into RNA




Again, in our genome there are no spaces between the triplets. But all the triplets you see are meaningful in the sense that they each code for one of the twenty amino acids (except for TAA, which says stop). ATG codes for methionine (the purists will note that all the T’s should be U’s). I’m too lazy to look the rest up, but the ribosome doesn’t care, and will happily translate all 3 sequences into the sequential amino acids of a protein.

Both sets of sequences have undergone (reading) frame shifts.
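A few lines of code make the frameshift concrete (an illustrative sketch; the sequence and the abbreviated codon table are mine, not the post’s):

```python
# Read one RNA string in all three frames; a frame shift yields an entirely
# different run of codons, hence a different protein. The sequence and the
# tiny codon table below are illustrative only.
CODON = {"AUG": "Met", "GCU": "Ala", "UUU": "Phe", "CGA": "Arg",
         "AAA": "Lys", "UGA": "STOP"}

def codons(rna, frame):
    """Split `rna` into triplets starting at offset `frame` (0, 1, or 2)."""
    return [rna[i:i + 3] for i in range(frame, len(rna) - 2, 3)]

rna = "AUGGCUUUUCGAAAAUGA"
for frame in range(3):
    triplets = codons(rna, frame)
    amino = [CODON.get(t, "???") for t in triplets]   # ??? = not in our table
    print(f"frame {frame}: {triplets} -> {amino}")
```

Frame 0 reads Met-Ala-Phe-Arg-Lys-STOP; shifting by one or two nucleotides produces completely different triplets.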

A previous post marveled about how something too small even to be called a virus coded for a protein whose amino acids were read in two different frames.

Frameshifting is used by viruses to get more mileage out of their genomes. Why? There is only so much DNA you can pack into the protein coat (capsids) of a virus.

[ Proc. Natl. Acad. Sci. vol. 111 pp. 14675 – 14680 ’14 ] Usually DNA density in cell nuclei or bacteria is 5 – 10% of volume. However, in viral capsids it is 55% of volume. The pressure inside the viral capsid can reach ten atmospheres. Ejection is therefore rapid (60,000 basepairs/second).

The AIDS virus (HIV1) relies on frame shifting of its genome to produce viable virus. The genes for two important proteins (gag and pol) have 240 nucleotides (80 amino acids) in common. Frameshifting allows the 240 nucleotides to be read by the cell’s ribosomes in two different frames (not at once). Granted, there are 61 three-nucleotide combinations coding for only 20 amino acids, so some redundancy is built in, but the 80 amino acids coded by the two frames are usually quite different.

That the gag and pol proteins function at all is miraculous.

The phenomenon is turning out to be more widespread. [ Proc. Natl. Acad. Sci. vol. 111 pp. E4342 – E4349 ’14 ] KSHV (Kaposi’s Sarcoma HerpesVirus) causes (what else?) Kaposi’s sarcoma, a tumor quite rare until people with AIDS started developing it (due to their lousy immune system being unable to contend with the virus). Open reading frame 73 (ORF73) codes for a major latency associated nuclear antigen 1 (LANA1). It has 3 domains: a basic amino terminal region, an acidic central repeat region (divisible into CR1, CR2 and CR3), and another basic carboxy terminal region. LANA1 is involved in maintaining KSHV episomes, regulation of viral latency, and transcriptional regulation of viral and cellular genes.

LANA1 is made of multiple higher and lower molecular weight isoforms — e.g. a LANA ladder band pattern seen in immunoblotting.

This work shows that LANA1 (and also Epstein Barr Nuclear Antigen 1) undergo highly efficient +1 and -2 programmed frameshifting, to generate previously undescribed alternative reading frame proteins in their repeat regions. Programmed frameshifting to generate multiple proteins from one RNA sequence can increase coding capacity, without increasing the size of the viral capsid.

The presence of similar repeat sequences in human genes (such as huntingtin — the defective gene in Huntington’s chorea) implies that we should look for frameshifting translation in ourselves as well as in viruses. In the case of mutant huntingtin, frameshifting in the abnormally expanded CAG tracts produces proteins containing polyAlanine or polySerineArginine tracts.

Well G, A, T and C are the 1’s and 0’s of the way genetic information is stored in our genomic computer. It really isn’t surprising that the genome can be read in alternate frames. In the old days, serially transmitted bytes were framed by start and stop bits (with a parity bit to catch errors) so the 1’s and 0’s were read in the correct frame. There is nothing like that in our genome (except for the 3 stop codons).

What is truly surprising is that reading in an alternate frame produces ‘meaningful’ proteins. This gets us into philosophical waters. Clearly

Erf oxa ndd oga teo urp etr at

Rfo xan ddo gat eou rpe tra t

aren’t meaningful to us. Yet gag and pol are quite meaningful (even life-and-death meaningful) to the AIDS virus. So meaningful, in the biologic sense, means able to function in the larger context of the cell. That really is the case for linguistic meaning. You have to know a lot about the world (and speak English) for the word cat to be meaningful to you. So meaning can never be defined by the word itself. Probably the same is true for concepts as well, but I’ll leave that to the philosophers, or any who choose to comment on this.

The Bach Fugue of the Genome

There are more things in heaven and earth, Horatio,
Than are dreamt of in your philosophy.
– Hamlet (1.5.167-8), Hamlet to Horatio

Just when you thought we’d figured out what genomes could do, the virusoid of rice yellow mottle virus performs a feat of dense coding I’d have thought impossible. The following work requires a fairly sophisticated understanding of molecular biology, for which the articles in “Molecular Biology Survival Guide for Chemists” might provide the background. Give it a shot. This is fascinating stuff. If the following seems incomprehensible, start with – and then follow the links forward.

Virusoids are single stranded circular RNAs which are dependent on a virus for replication. They are distinct from viroids because viroids need nothing else to replicate. Neither the virusoid nor the viroid was thought to code for protein (until now). They are usually found inside the protein shells of plant viruses.

[ Proc. Natl. Acad. Sci. vol. 111 pp. 14542 – 14547 ’14 ] Viroids and virusoids (viroid-like satellite RNAs) are small (220 – 450 nucleotide) covalently closed circular RNAs. They are the smallest known replicating circular RNA pathogens. They replicate via a rolling circle mechanism to produce larger concatemers, which are then processed into monomeric forms by a self-cleaving hammerhead ribozyme, or by cellular enzymes.
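A toy sketch of the rolling circle mechanism, using a made-up 12-nucleotide ‘circle’ (the real sequences are of course different, and the cutting is done by a ribozyme or cellular enzymes, not string slicing):

```python
# Rolling-circle replication, crudely: copy the circle several times end to
# end (the concatemer), then cut the concatemer back into unit-length monomers.
monomer = "AUGGCACGUUGA"   # made-up 12-nucleotide 'circle', not a real sequence
concatemer = monomer * 4   # the polymerase goes around the circle 4 times

# Processing back to monomeric form (standing in for the hammerhead ribozyme):
monomers = [concatemer[i:i + len(monomer)]
            for i in range(0, len(concatemer), len(monomer))]
print(len(monomers), monomers[0] == monomer)  # 4 True -- identical unit-length copies
```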

The rice yellow mottle virus (RYMV) contains a virusoid which is a covalently closed circular RNA of a mere 220 nucleotides. A 16 kiloDalton basic protein is made from it. How can this be? Figure the average molecular mass of an amino acid at 100 Daltons, and 3 nucleotides per amino acid codon. This means that 220 nucleotides can code for 73 amino acids at most (e.g. for a 7 – 8 kiloDalton protein).
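The arithmetic, spelled out (using the rough 100 Dalton average residue mass from the text):

```python
genome_nt = 220          # length of the virusoid in nucleotides
nt_per_codon = 3         # one amino acid per 3-nucleotide codon
avg_residue_mass = 100   # Daltons, the rough average used above

max_residues = genome_nt // nt_per_codon               # at most 73 amino acids
max_mass_kda = max_residues * avg_residue_mass / 1000  # ~7.3 kiloDaltons

print(max_residues, max_mass_kda)  # 73 7.3 -- yet a 16 kiloDalton protein is made
```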

So far the RYMV virusoid is the only one of the viroids and virusoids which actually codes for a protein. The virusoid sequence contains an internal ribosome entry site (IRES) of the form UGAUGA. Initiation starts at the AUG, and since 220 isn’t an integral multiple of 3 (the size of amino acid codons), translation continues in another reading frame until it reaches one of the two UGAs (termination codons) within UGAUGA. Termination codons can be ignored (leaky codons) to obtain larger readthrough proteins. So this virusoid is a circular RNA with no NONcoding sequences which codes for a protein in either 2 or 3 of the 3 possible reading frames. Notice that UGAUGA contains UGA in both of the reading frames alternate to the one starting at its AUG ! So it is likely that the same nucleotide is being read 2 or 3 ways. Amazing ! ! !
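The arithmetic behind the multi-frame reading can be sketched in Python (the start position and the ~110 Dalton residue mass are illustrative assumptions, not the actual RYMV coordinates):

```python
import math

circle_len = 220                 # nucleotides in the virusoid circle
print(circle_len % 3)            # 1 -- each complete lap shifts the frame by one

# Reading frame at the start of each successive lap (taking the AUG at
# position 0 for illustration):
frames = [(lap * circle_len) % 3 for lap in range(3)]
print(frames)                    # [0, 1, 2] -- three laps visit all three frames

# A 16 kiloDalton protein is ~145 residues (at ~110 Daltons each), i.e. ~435
# coding nucleotides, which takes about two laps around the 220-nucleotide circle:
residues = 16000 / 110
laps = math.ceil(residues * 3 / circle_len)
print(round(residues), laps)     # 145 2
```

Because 220 mod 3 is 1, a ribosome that keeps going simply re-enters the same nucleotides one frame over on each lap, which is how every nucleotide can be coding.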

It isn’t clear what function the virusoid protein performs for the virus once the virus has infected a cell. Perhaps there is none, and the only function of the protein is to help the virusoid continue its existence inside the virus.

Talk about information density. The RYMV virusoid is the Bach Fugue of the genome. Bach sometimes inverts the fugue theme, and sometimes plays it backwards (a musical palindrome if you will).

It is unfortunate that more people don’t understand the details of molecular biology, so they could appreciate mechanisms of this elegance. Whether you think understanding it is an esthetic experience is up to you. I do. To me, this resembles the esthetic experience that mathematics offers.

A while back I wrote a post wondering whether the USA was acquiring brains from the MidEast upheavals, the way we did from Europe because of WWII. Here’s the link

Clearly Canada has done just that. Here are the authors of the PNAS paper above and their affiliations. Way to go Canada !

Mounir Georges AbouHaidar (Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada M5S 3B2)
Srividhya Venkataraman (Department of Cell and Systems Biology, University of Toronto)
Ashkan Golshani (Biology Department, Carleton University, Ottawa, ON, Canada K1S 5B6)
Bolin Liu (Department of Cell and Systems Biology, University of Toronto)
Tauqeer Ahmad (Department of Cell and Systems Biology, University of Toronto)

A Troublesome Inheritance – IV — Chapter 3

Chapter III of “A Troublesome Inheritance” contains a lot of very solid molecular genetics, and a lot of unfounded speculation. I can see why the book has driven some otherwise rational people bonkers. Just because Wade knows what he’s talking about in one field doesn’t mean he’s competent in another.

Several examples: p. 41 “Nonetheless, it is reasonable to assume that if traits like skin color have evolved in a population, the same may be true of its social behavior.” Consider yes, assume no.

p. 42 “The society of living chimps can thus with reasonable accuracy stand as a surrogate for the joint ancestor” (of humans and chimps, thought to have lived about 7 megaYears ago) “and hence describe the baseline from which human social behavior evolved.” I doubt this.

The chapter contains many just so stories about the evolution of chimp and human societies (post hoc ergo propter hoc). Plausible, but not testable.

Then follows some very solid stuff about the effects of the hormone oxytocin (which causes lactation in nursing women) on human social interaction. Then some speculation on the ways natural selection could work on the oxytocin system to make people more or less trusting. He lists several potential mechanisms for this (1) changes in the amount of oxytocin made (2) increasing the number of protein receptors for oxytocin (3) making each receptor bind oxytocin more tightly. This shows that Wade has solid molecular biological (and biological) chops.

He quotes a Dutch psychologist on his results with oxytocin and sociality — unfortunately, there have been too many scandals involving Dutch psychologists and sociologists to believe what he says until it’s replicated (Google Diederik Stapel, Don Poldermans, Jens Forster, Markus Denzler if you don’t believe me). It’s sad that this probably honest individual is tarred with that brush, but he is.

p. 59 — He notes that the idea that human behavior is solely the result of social conditions with no genetic influence is appealing to Marxists, who hoped to make humanity behave better by designing better social conditions. Certainly, much of the vitriol heaped on the book has come from the left. A communist uncle would always say ‘it’s the system’ to which my father would reply ‘people will corrupt any system’.

p. 61 — the effect on survival of mutations producing lactose tolerance is noted — people herding cattle and drinking milk survive better if their gene to digest lactose (the main sugar in milk) isn’t turned off after childhood. If your society doesn’t herd animals, there is no reason for anyone to digest milk after weaning from the breast. The mutations aren’t in the enzyme digesting lactose, but in the DNA that turns on expression of the gene for the enzyme (e.g. the promoter). Interestingly, 3 separate mutations doing this have been found in African herders, each different from the one that arose in the Funnel Beaker Culture of Scandinavia 6,000 years ago. This is a classic example of natural selection producing the same phenotypic effect by separate mutations.

There is a much bigger biological fish to be fried here, which Wade doesn’t discuss. It takes energy to make any protein, and there is no reason to make a protein to help you digest milk if you aren’t nursing, and one very good reason not to: it wastes metabolic energy, something in short supply for humans as they lived until about 15,000 years ago. So humans evolved a way not to make the protein in adult life. The genetic change is in the DNA controlling protein production, not the protein itself.

You may have heard it said that we are 98% Chimpanzee. This is true in the sense that our 20,000 or so proteins are that similar to the chimp. That’s far from the whole story. This is like saying Monticello and Independence Hall are just the same because they’re both made out of bricks. One could chemically identify Monticello bricks as coming from the Virginia piedmont, and Independence Hall bricks coming from the red clay of New Jersey, but the real difference between the buildings is the plan.

It’s not the proteins, but where and when and how much of them are made. The control for this (plan if you will) lies outside the genes for the proteins themselves, in the rest of the genome. The control elements have as much right to be called genes, as the parts of the genome coding for amino acids. Granted, it’s easier to study genes coding for proteins, because we’ve identified them and know so much about them. It’s like the drunk looking for his keys under the lamppost because that’s where the light is.

p. 62 — There follows a description of the changes of human society from hunter gathering, to agrarian life, to the rise of city states. Whether adaptation to different social organizations produced genetic changes, or the genetic changes permitted the social adaptations, isn’t clear. Wade says “change in social behavior has most probably been molded by evolution, though the underlying genetic changes have yet to be identified.” This assumes a lot, e.g. that genetic changes are involved. I’m far from sure, but the idea is not far fetched. Stating that genetic changes have never shaped, and will never shape, society is without any scientific basis, and just as fanciful as many of Wade’s statements in this chapter. It’s an open question, which is really all Wade is saying.

In defense of Wade’s idea, think about animal breeding, as Darwin did extensively. The Origin of Species (worth a read if you haven’t already) is full of interchanges with all sorts of breeders (pigeons, cattle). The best example we have presently is the breeds of dogs. They have very different personalities, and have been bred for them: sheepdogs, mastiffs, etc. Have a look at [ Science vol. 306 p. 2172 ’04, Proc. Natl. Acad. Sci. vol. 101 pp. 18058 – 18063 ’04 ], where the DNA of a variety of dog breeds was studied to determine which changes determined the way they look. The length of a breed’s snout correlated directly with the number of repeats in a particular protein (Runx-2). The paper is a decade old and I’m sure that they’re starting to look at behavior.

More to the point about selection for behavioral characteristics, consider the domestication of the modern dog from the wolf. Contrast the dog with the chimp (which hasn’t been bred).

[ Science vol. 298 pp. 1634 – 1636 ’02 ] Chimps are terrible at picking up human cues as to where food is hidden. Cues would be something as obvious as looking at the container, pointing at the container, or even touching it. Even those who eventually perform well take dozens of trials or more to learn it. When tested in more difficult tests requiring them to show flexible use of social cues, they don’t.

This paper shows that puppies (raised with no contact with humans) do much better at reading humans than chimps. However, wolf cubs do not do better than the chimps. Even more impressively, wolf cubs raised by humans don’t show the same skills. This implies that during the process of domestication, dogs have been selected for a set of social cognitive abilities that allow them to communicate with humans in unique ways. Dogs and wolves do not perform differently in a nonsocial memory task, ruling out the possibility that dogs outperform wolves in all human guided tasks.

All in all, a fascinating book with lots to think about, argue with, propose counterarguments to, and propose other arguments in support of (as I’ve just done). Definitely a book for those who like to think, whether you agree with it all or not.

Old dog does new(ly discovered) tricks

One of the evolutionarily oldest enzyme classes is the aaRSs (amino acyl tRNA synthetases). Every cell has them, including bacteria. Life as we know it wouldn’t exist without them. Briefly, they load tRNAs with the appropriate amino acid. If this is Greek to you, look at the first 3 articles in

Amino acyl tRNA synthetases are enzymes of exquisite specificity, having to correctly match up 20 amino acids to some 61 different types of tRNAs. Mistakes in the selection of the correct amino acid occur once every 10,000 to 100,000 selections, and in the selection of the correct tRNA once every 1,000,000. The lower tRNA error rate is due to the fact that tRNAs are much larger than amino acids, so more contacts between enzyme and tRNA are possible.
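A rough consequence of those error rates (taking the worse end, 1 in 10,000, and a 1,000-residue protein as an example; the synthetase for alanine is about that size):

```python
# Chance that a 1,000-residue protein contains at least one wrongly selected
# amino acid, given a per-residue error rate of 1 in 10,000.
error_rate = 1e-4        # the worse end of the quoted 1/10,000 - 1/100,000 range
protein_length = 1000    # residues

p_at_least_one = 1 - (1 - error_rate) ** protein_length
print(f"{p_at_least_one:.1%}")   # about 9.5%
```

So even at these astonishing accuracies, nearly a tenth of very large proteins carry at least one mischarged residue, which is part of why the specificity matters so much.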

As the tree of life ascended from bacteria over billions of years, 13 new protein domains which have no obvious association with aminoacylation have been added to aaRS genes. More importantly, the additions have been maintained over the course of evolution (with no change in the primary function of the synthetase). Some of the new domains are appended to each of several synthetases, while others are specific to a single synthetase. The fact that they’ve been retained implies they are doing something that natural selection wants (teleology inevitably raises its ugly head in any serious discussion of molecular biology or cellular physiology — it’s impossible to avoid).

[ Science vol. 345 pp. 328 – 332 ’14 ] looked at what mRNAs some 37 different aaRS genes were transcribed into. Six different human tissues were studied this way. Amazingly, 79% of the 66 in-frame splice variants removed or disrupted the aaRS catalytic domain. The aaRS for histidine had 8 in-frame splice variants, all of which removed the catalytic domain. 60/70 variants losing the catalytic domain (they call these catalytic nulls) retained at least one of the 13 domains added in higher eukaryotes. Some of the transcripts were tissue specific (e.g. present in some of the 6 tissues but not all).

Recent work has shown roles for specific aaRSs in a variety of pathways — blood vessel formation, inflammation, immune response, apoptosis, tumor formation, p53 signaling. The process of producing a completely different function for a molecule is called exaptation, to contrast it with adaptation.

Up to now, when a given protein was found to have enzymatic activity, the book on what that protein did was closed (with the exception of the small GTPases). End of story. Yet here we have cells spending the metabolic energy to make an enzymatically dead protein (aaRSs are big — the one for alanine has nearly 1,000 amino acids). Teleology screams — what is it used for? It must be used for something! This is exactly where chemistry is silent. It can explain the incredible selectivity and sensitivity of the enzyme, but not what it is ‘for’. We have run into the Cartesian dualism between flesh and spirit.

Could this sort of thing be the tip of the iceberg? We know that splice variants of many proteins are common. Could other enzymes whose function was essentially settled once substrates were found, be doing the same thing? We may have only 20,000 or so protein coding genes, but 40,000, 60,000, . . . or more protein products of them, each with a different biological function.

So aaRSs are very old molecular biological dogs who’ve been doing new tricks all along. We just weren’t smart enough to see them (till now).

Novels may have only 7 basic plots, but molecular biology continues to surprise and enthrall.

