Category Archives: Philosophical issues raised

SmORFs and DWORFs — has molecular biology lost its mind?

“There’s Plenty of Room at the Bottom” is a famous talk given by Richard Feynman 56 years ago. He was talking about something not invented until decades later: nanotechnology. He couldn’t have known that the same advice would one day apply to molecular biology. The talk itself is well worth reading; here’s the link: http://www.zyvex.com/nanotech/feynman.html.

Those not up to speed on molecular biology can find what they need at https://luysii.wordpress.com/2010/07/07/molecular-biology-survival-guide-for-chemists-i-dna-and-protein-coding-gene-structure/. Just follow the links (there are only 5) in the series.

lncRNA stands for long nonCoding RNA — nonCoding for protein that is. Long is taken to mean over 200 nucleotides. There is considerable debate concerning how many there are — but “most estimates place the number in the tens of thousands” [ Cell vol. 164 p. 69 ’16 ]. Whether they have any cellular function is also under debate. Could they be like the turnings from a lathe, produced by the various RNA polymerases we have (3 actually) simply transcribing the genome compulsively? I doubt this, because transcription takes energy and cells are a lot of things but wasteful isn’t one of them.

Where does Feynman come in? At least one lncRNA codes for a very small protein, using a small open reading frame (smORF) to do so. The protein in question is called DWORF (for DWarf Open Reading Frame). It contains only 34 amino acids, yet its function is definitely not trivial. It binds to SERCA, a large enzyme in the sarcoplasmic reticulum of muscle that allows muscle to relax after contracting. Muscle contraction occurs when calcium is released from the sarcoplasmic reticulum (the muscle version of the endoplasmic reticulum). SERCA pumps the released calcium back into the sarcoplasmic reticulum, allowing muscle to relax. So repetitive muscle contraction depends on the ebb and flow of calcium tides in the cell. Amazingly, there are 3 other small proteins that also bind to SERCA and modify its function: phospholamban (no kidding), sarcolipin and myoregulin, of 52, 31 and 46 amino acids respectively.

So here is a lncRNA making an oxymoron of its name by actually coding for a protein. DWORF is small, but so are its 3 exons, one of which codes for only 4 amino acids. Imagine the gigantic spliceosome, with a mass of over 1,300,000 Daltons (10,574 amino acids making up 37 proteins, along with several catalytic RNAs), being that precise while operating on something that small.

So there’s a whole other world down there which we’ve just begun to investigate. It’s probably a vestige of the RNA world from which life is thought to have sprung.

Then there are the small molecules of intermediary metabolism. Undoubtedly some of them are used for control as well as metabolism. I’ll discuss this later, but the Human Metabolome Database (HMDB) has 42,000 entries and METLIN, a metabolite database, has 240,000 entries.

Then there is competitive endogenous RNA: https://luysii.wordpress.com/2012/01/29/why-drug-discovery-is-so-hard-reason-20-competitive-endogenous-rnas/

Do you need chemistry to understand this? Yes and no. How the molecules do what they do is the province of chemistry. The description of their function doesn’t require chemistry at all. As David Hilbert said about axiomatizing geometry, you don’t need points, straight lines and planes; you could use tables, chairs and beer mugs. What is important are the relations between them. Ditto for the chemical entities making us up.

I wouldn’t like that. It’s neat to picture in my mind our various molecular machines, nuts and bolts doing what they do. It’s a much richer experience. Not having the background is being chemically blind. Not a good thing, but better than nothing.

An uplifting way to start the New Year

This is not a scientific post. Going to a memorial service for an old friend hardly seems like an uplifting way to begin the new year. And yet it was. David and I had been friends since ’58, when we were in the same eating club. He also became an M.D., and unfortunately died of a slowly dementing illness, probably Alzheimer’s. As a neurologist I could do nothing for him. What little I did accomplish was discussing the scientific aspects with his wife, explaining the latest breakthroughs she’d read about (which never were). She was a rock, standing by him until the end. Having taken care of many such patients, and having had an uncle die of it, I know just how hard this is.

What in the world could be uplifting about something like this? Seeing how David’s intelligence and personality have now marched on through 4 children and (at least) 4 grandchildren. So in a way he really isn’t gone. What was uncanny was seeing David’s eyes staring at me out of his oldest daughter. Given what we think we know about genetics (each parent contributes one copy of each of our 20,000 or so protein coding genes), it really is remarkable that an offspring can resemble just one parent rather than being an amalgam of both. Perhaps just a few genes determine what we look like.

The grandchildren I talked to ranged in age from about 8 to 17. All were smart and articulate. I tried to push them to use their obvious brains to go into research and perhaps prevent or treat what happened to their grandfather. The littlest one said that he was going to be a particle physicist.

I don’t remember talking religion with David or anyone else back in college. There were devout members of the club who would march in glowing after Sunday church, only to be treated by hungover club mates to a chorus of “Onward Christian Soldiers”. One classmate did become the Lutheran Bishop of Western New York, but he certainly didn’t push his religiosity. The most religious one I do remember became a physics professor at Berkeley.

Of course there were remembrances, that of his oldest daughter being the most interesting (to me). She is a religious Christian who clearly loved her father very much, even though he was a professed atheist (albeit one with a strong sense of right and wrong). They used to argue about the existence or nonexistence of God. She and I agreed that he would never do anything that he thought was wrong, probably one of the reasons I liked him (remember the hungover reprobates of a few paragraphs ago). I suppose his daughter now has the last word, but such an argument really has no end.

It was pretty hard to be a doc back in the 60s and 70s watching good people suffer and die, and still conceive of a benevolent creator. “The Plague” by Camus with its hideous death scene of a child pretty much sums up the argument against one.

And yet, now that we know so much more molecular biology, cellular and organismal biochemistry and physiology, our existence seems totally miraculous. I at least have achieved a sense of peace about illness, suffering and death. These things seem natural. What is truly miraculous is that we are well and functional for so long.

You can take or leave the argument from design of Reverend Paley — here it is

“In crossing a heath, suppose I pitched my foot against a stone, and were asked how the stone came to be there; I might possibly answer, that, for anything I knew to the contrary, it had lain there forever: nor would it perhaps be very easy to show the absurdity of this answer. But suppose I had found a watch upon the ground, and it should be inquired how the watch happened to be in that place; I should hardly think of the answer I had before given, that for anything I knew, the watch might have always been there. … There must have existed, at some time, and at some place or other, an artificer or artificers, who formed [the watch] for the purpose which we find it actually to answer; who comprehended its construction, and designed its use. … Every indication of contrivance, every manifestation of design, which existed in the watch, exists in the works of nature; with the difference, on the side of nature, of being greater or more, and that in a degree which exceeds all computation.”

The more chemistry and biochemistry I know about what’s going on inside us, the harder I find it to accept that this arose by chance.

This does not make me an anti-evolutionist. One of the best arguments for evolution is the evidence for descent with modification, one of its major tenets. The fact that, with our present genetic technology, we can use one of our proteins to replace one in yeast is hard to explain any other way.

Actually, to me now, the existence or nonexistence of a creator is irrelevant. The facts of how we are built are not something you need faith about. The awe about it all comes naturally the more we know and the more we find out.

It ain’t the bricks, it’s the plan

Nothing better shows the utility (and the futility) of chemistry in biology than using it to explain the difference between man and chimpanzee. You’ve all heard that our proteins are only 2% different than the chimp, so we are 98% chimpanzee. The facts are correct, the interpretation wrong. We are far more than the protein ‘bricks’ that make us up, and two current papers in Cell [ vol. 163 pp. 24 – 26, 66 – 83 ’15 ] essentially prove this.

This is like saying Monticello and Independence Hall are just the same because they’re both made out of bricks. One could chemically identify Monticello bricks as coming from the Virginia piedmont, and Independence Hall bricks coming from the red clay of New Jersey, but the real difference between the buildings is the plan.

It’s not the proteins, but where and when and how much of them are made. The control for this (plan if you will) lies outside the genes for the proteins themselves, in the rest of the genome (remember that only 2% of the genome codes for the amino acids making up our 20,000 or so protein genes). The control elements have as much right to be called genes as the parts of the genome coding for amino acids. Granted, it’s easier to study genes coding for proteins, because we’ve identified them and know so much about them. It’s like the drunk looking for his keys under the lamppost because that’s where the light is.

All the molecular biology you need to understand what follows is in the following post — https://luysii.wordpress.com/2010/07/07/molecular-biology-survival-guide-for-chemists-i-dna-and-protein-coding-gene-structure/

Briefly, an enhancer is a stretch of DNA, distinct from the DNA coding for a given protein, to which a variety of other proteins called transcription factors bind. The enhancer DNA and associated transcription factors then loop over to the protein coding gene and ‘turn it on’, i.e. cause a huge (megaDalton) enzyme called pol II to make an RNA copy of the gene (called mRNA), which is then translated into protein by another huge megaDalton machine called the ribosome. Complicated, no? Well, it’s happening inside you right now.

The faces of chimps and people are quite different (though not so different that chimps don’t look eerily human). The Cell paper studied Cranial Neural Crest Cells (CNCCs), the cells which in embryos go on to make up the bones and soft tissues of the face. How did they get them? Not from Planned Parenthood; rather, they made iPSCs (induced Pluripotent Stem Cells, see https://en.wikipedia.org/wiki/Induced_pluripotent_stem_cell) differentiate into CNCCs. Not only that, but they studied both human and chimp CNCCs. Still, you must at least wonder how close to biologic reality this system actually is.

It’s rather technical, but they had several ways of seeing if a given enhancer was active or not. By active I mean engaged in turning on a given protein coding gene so more of that protein is made. For the cognoscenti, these methods included (1) p300 binding, (2) chromatin accessibility, (3) the H3K4me1/H3K4me3 ratio, and (4) H3K27ac.

The genome is big: some 3,200,000,000 positions (nucleotides) linearly arranged along our chromosomes. Enhancers range in size from 50 to 1,500 nucleotides, and the study found a mere 14,500 enhancers in the CNCCs. More interestingly, 13% of them were activated differentially in man and chimp CNCCs. This is probably why we look different from chimps. So although the proteins are the same, the timing of their synthesis is different.
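To put those numbers side by side, here is a quick back-of-the-envelope calculation in Python. The constants are the figures quoted above; the coverage bounds are my own rough arithmetic, assuming (unrealistically) that every enhancer sits at one end of the 50 to 1,500 nucleotide range and that none of them overlap.

```python
# Figures quoted in the paragraph above.
GENOME_NT = 3_200_000_000      # ~3.2 billion nucleotides
N_ENHANCERS = 14_500           # enhancers found in the CNCCs
MIN_LEN, MAX_LEN = 50, 1_500   # enhancer size range, in nucleotides
FRACTION_DIFFERENTIAL = 0.13   # share activated differently in man vs chimp

# Roughly how many enhancers behave differently in the two species.
n_differential = round(N_ENHANCERS * FRACTION_DIFFERENTIAL)

# Assumed bounds: all enhancers at the minimum (or maximum) size, no overlaps.
min_cover = N_ENHANCERS * MIN_LEN
max_cover = N_ENHANCERS * MAX_LEN

print(f"differentially active enhancers: about {n_differential}")
print(f"share of genome covered: {100 * min_cover / GENOME_NT:.3f}% "
      f"to {100 * max_cover / GENOME_NT:.2f}%")
```

Even at the generous upper bound, the face-building enhancers occupy well under 1% of the genome, and the differentially active ones number in the hundreds to low thousands.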

At long last, molecular biology is beginning to study the plan rather than the bricks.

Chemistry has a great role in this and will continue to do so. For instance, enhancers can be sequenced to see how different enhancer DNA is between man and chimp. The answer is not much (again, 2 or so nucleotides per hundred nucleotides of enhancer). The authors did find one new enhancer motif, not seen previously, called the coordinator motif. But it was present in both man and chimp. Chemistry can and should explain why changing so few nucleotides changes the proteins binding to a given enhancer sequence, and it will be important in designing proteins to take advantage of these changes.
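Here is what “2 or so nucleotides per hundred” means operationally: line up the two sequences and count the positions that differ. This is a minimal sketch that assumes the sequences are already aligned with no insertions or deletions (real comparisons need an alignment step first); the 100-nucleotide “enhancers” are made up for illustration and are not taken from the paper.

```python
def percent_divergence(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two DNA sequences differ.

    Assumes the sequences are already aligned, gap-free, and of equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * mismatches / len(seq_a)


# Hypothetical 100-nucleotide "human" enhancer and a "chimp" version
# carrying two substitutions (G->T at position 10, A->C at position 60).
human = "ACGT" * 25
chimp = human[:10] + "T" + human[11:60] + "C" + human[61:]
print(percent_divergence(human, chimp))   # 2.0
```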

So why is chemistry futile? Because as soon as you ask what an enhancer or a protein is for, you’ve left physicality entirely and entered the realm of ideas. Asking what something is for is an entirely different question from asking how it actually does what it is for. The latter question is answerable by chemistry and physics; the former is unanswerable by them. The Cartesian dualism of flesh and spirit is alive and well.

It’s interesting to see how quickly questions in biology lead to teleology.

How ‘simple’ can a protein be and still have a significant biological effect

Words only have meaning in the context of the much larger collection of words we call language. So it is with proteins. Their only ‘meaning’ is the biologic effects they produce in the much larger collection of proteins, lipids, sugars, metabolites, cells and tissues of an organism.

So how ‘simple’ can a protein be and still produce a meaningful effect? As Bill Clinton would say, that depends on what you mean by simple. One way a protein can be simple is by having only a few amino acids. Met-enkephalin, an endogenous opiate, contains only 5 amino acids. Now many wouldn’t consider met-enkephalin a protein, calling it a polypeptide instead. But the boundary between polypeptide and protein is as fluid and ill-defined as the one between a few grains of sand and a pile of them.

Another way to define simple is by having most of the protein made up of just a few of the 20 amino acids. Collagen is a good example. Nearly half of it is glycine and proline (and a modified proline called hydroxyproline), leaving the other 18 amino acids to make up the rest. Collagen is big despite being simple: a single molecule has a mass of 285 kiloDaltons.

This brings us to [ Proc. Natl. Acad. Sci. vol 112 pp. E4717 – E4727 ’15 ]. They constructed a protein/polypeptide of 26 amino acids, 25 of which are either leucine or isoleucine. The 26th amino acid is methionine, which is found at the very amino terminal end of all newly made proteins (remember that AUG, the codon for methionine, is the initiator codon).
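Compositional simplicity is easy to quantify: tally what fraction of the sequence each amino acid accounts for. The sketch below uses standard one-letter codes (M, L, I). The sequence is a hypothetical stand-in with the same makeup as the PNAS construct (one methionine followed by 25 leucines or isoleucines); it is not the published sequence, and the function name is mine.

```python
from collections import Counter


def composition(seq: str) -> dict:
    """Fraction of a protein sequence (one-letter codes) made up by each amino acid."""
    counts = Counter(seq.upper())
    return {aa: n / len(seq) for aa, n in counts.most_common()}


# Hypothetical 26-residue sequence: initiator Met, then 25 Leu/Ile.
# (Illustrative only; not the actual sequence reported in the paper.)
peptide = "M" + "LI" * 12 + "L"
comp = composition(peptide)

print(len(peptide), "residues,", len(comp), "distinct amino acids")
print("Leu + Ile fraction:", round(comp["L"] + comp["I"], 3))
```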

What does it do? It causes tumors. How so? It binds to the transmembrane domain of the beta variant of the receptor for Platelet Derived Growth Factor (PDGFRbeta). When turned on, the receptor causes cells to proliferate.

What is the smallest known oncoprotein? It is the E5 protein of Bovine PapillomaVirus (BPV), which is essentially a free-standing transmembrane domain (and which also binds to PDGFRbeta). It has only 44 amino acids.

Well we have 26 letters + a space. I leave it to you to choose 3 of them, use one of them once, the other two 25 times, with as many spaces as you want and construct a meaningful sequence from them (in any language using the English alphabet).

Just back from an Adult Chamber Music Festival (aka Band Camp for Adults). More about that in a future post.

Is natural selection disprovable?

One of the linchpins of evolutionary theory is that natural selection works by increased reproductive success of the ‘fittest’. Granted, this is Panglossian in its tautology: of course the fittest is what survives, so of course it has greater reproductive success.

So decreased reproductive success couldn’t be the result of natural selection could it? A recent paper http://www.sciencemag.org/content/348/6231/180.full.pdf says that is exactly what has happened, and in humans to boot, not in some obscure slime mold or the like.

The work comes from in vitro fertilization, which the paper says is responsible for 2-3% of all children born in developed countries (that seems high). Maternal genomes can be sequenced and the likelihood of successful conception correlated with a variety of variants. There turned out to be a strong association between a change in just one nucleotide (i.e. a single nucleotide polymorphism, or SNP) and reproductive success. The deleterious polymorphism (rs2305957) decreases reproductive success. This is based on 15,388 embryos, sampled at the day-5 blastocyst stage, from 2,998 mothers.

What is remarkable is that the polymorphism isn’t present in Neanderthals (from whom modern humans diverged between 100,000 and 400,000 years ago). It is in an area of the genome which has the characteristics of a ‘selective sweep’. Here’s the skinny:

A selective sweep is the reduction or elimination of variation among the nucleotides in neighbouring DNA of a mutation as the result of recent and strong positive natural selection.

A selective sweep can occur when a new mutation occurs that increases the fitness of the carrier relative to other members of the population. Natural selection will favour individuals that have a higher fitness and with time the newly mutated variant (allele) will increase in frequency relative to other alleles. As its prevalence increases, neutral and nearly neutral genetic variation linked to the new mutation will also become more prevalent. This phenomenon is called genetic hitchhiking. A strong selective sweep results in a region of the genome where the positively selected haplotype (the mutated allele and its neighbours) is essentially the only one that exists in the population, resulting in a large reduction of the total genetic variation in that chromosome region.
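To see how a favoured allele takes over a population, here is a minimal simulation of the textbook Wright-Fisher model (my choice of model for illustration; the paper does not use this code, and the parameter values are hypothetical). Each generation, selection raises the allele’s expected frequency from p to p(1 + s)/(1 + sp), and then the next generation is drawn at random. Neutral variants tightly linked to the allele would simply ride along on the same trajectory, which is the hitchhiking described above. Setting s = 0 leaves pure random drift.

```python
import random
from typing import List, Optional


def wright_fisher(p0: float, n_copies: int, s: float,
                  max_gen: int = 10_000, seed: Optional[int] = None) -> List[float]:
    """Allele-frequency trajectory in an idealized Wright-Fisher population.

    n_copies: gene copies per generation (2N for N diploid individuals).
    s: selective advantage of the allele; s = 0 means pure random drift.
    Stops once the allele is lost (0.0) or fixed (1.0), or after max_gen generations.
    """
    rng = random.Random(seed)
    p = p0
    trajectory = [p]
    for _ in range(max_gen):
        if p in (0.0, 1.0):          # lost or fixed: nothing more can change
            break
        # Selection: carriers are (1 + s) times as likely to be copied.
        p_sel = p * (1 + s) / (1 + s * p)
        # Drift: the next generation is a random sample of n_copies gene copies.
        count = sum(rng.random() < p_sel for _ in range(n_copies))
        p = count / n_copies
        trajectory.append(p)
    return trajectory


sweep = wright_fisher(p0=0.05, n_copies=2000, s=0.1, seed=1)
drift = wright_fisher(p0=0.5, n_copies=40, s=0.0, seed=2)
print("with selection:", len(sweep) - 1, "generations, final frequency", sweep[-1])
print("pure drift:", len(drift) - 1, "generations, final frequency", drift[-1])
```

Starting from 100 copies out of 2,000, a 10% advantage almost always carries the allele to fixation within a few hundred generations; with s = 0 a small population still ends up at 0 or 1, purely by chance.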

So here we have something that needs some serious explaining: something that decreases fecundity yet is somehow ‘fitter’ (by the definition of fitness), because it spread in the human population. The authors gamely do their Panglossian best, explaining that “the mitotic-error phenotype [which causes decreased fecundity] may be maintained by conferring both a deleterious effect on maternal fecundity and a possible beneficial effect of obscured paternity via a reduction in the probability of successful pregnancy per intercourse. This hypothesis is based on the fact that humans possess a suite of traits (such as concealed ovulation and constant receptivity) that obscure paternity and may have evolved to increase paternal investment in offspring.”

Nice try fellas, but this sort of thing is a body blow to the idea of natural selection as increased reproductive success.

There is a way out, however: it is possible that what is being selected for is something controlled near rs2305957, so useful that it spread in our genome DESPITE decreased fecundity.

Don’t get me wrong, I’m not a creationist. The previous post https://luysii.wordpress.com/2015/04/07/one-reason-our-brain-is-3-times-that-of-a-chimpanzee/ described some of the best evidence we have in man for another pillar of evolutionary theory — descent with modification. Here duplication of a single gene since humans diverged from chimps causes a massive expansion of the gray matter of the brain (cerebral cortex).

Fascinating

Addendum 13 April

I thought the following comment was so interesting that it belongs in the main body of the text:

Handles:

Mutations don’t need to confer fitness in order to spread through the population. These days natural selection is considered a fairly minor part of evolution. Most changes become fixed as the result of random drift, and fitness is usually irrelevant. “Nearly neutral theory” explains how deleterious mutations can spread through a population, even without piggybacking on a beneficial mutation; no need for panglossian adaptive hypotheses.

Here’s my reply:

Well, the authors of the paper didn’t take this line, but came up with a rather contorted argument to show why decreased fecundity might be a selective advantage, rather than just saying it was random drift. They also note genomic evidence for a ‘selective sweep’ — decreased genomic heterogeneity around the SNP.

Why we imperfectly understand randomness the way we do.

The cognoscenti think the average individual is pretty dumb when it comes to probability and randomness. Not so, says a fascinating recent paper [ Proc. Natl. Acad. Sci. vol. 112 pp. 3788 – 3792 ’15 ] http://www.pnas.org/content/112/12/3788.abstract. The average Joe (this may mean you), when asked to write down a random series of fifty or so heads and tails, never puts in enough runs of heads or runs of tails. This leads to the gambler’s fallacy: the belief that if an honest coin gives a run of, say, 5 heads, the next result is more likely to be tails.

There is a surprising amount of structure lurking within purely random sequences such as the tosses of a fair coin, where the probability of heads is exactly 50%. Even so, the waiting time for a particular repetition (HH, or equally TT) to appear is significantly longer than for a particular alternation (HT, or equally TH). On average 6 tosses are required for HH to appear, while only 4 are needed for HT.

This is why Joe SixPack never puts in enough runs of Hs or Ts.
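The 6-versus-4 figures are easy to check. For a fair coin there is a standard overlap rule: the expected wait for a pattern is the sum of 2^k over every k for which the pattern’s first k symbols match its last k symbols. HH overlaps itself at k = 1 and k = 2 (2 + 4 = 6); HT only at k = 2 (4). Below is a minimal Python sketch of that rule, plus a brute-force simulation as a cross-check; the function names are mine.

```python
import random


def expected_wait(pattern: str) -> int:
    """Expected number of fair-coin tosses until `pattern` first appears.

    Overlap rule: add 2**k for every k such that the first k symbols of the
    pattern equal its last k symbols (k = len(pattern) always counts).
    """
    n = len(pattern)
    return sum(2 ** k for k in range(1, n + 1) if pattern[:k] == pattern[-k:])


def simulated_wait(pattern: str, trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        recent, tosses = "", 0
        while not recent.endswith(pattern):
            # keep only the last len(pattern) tosses
            recent = (recent + rng.choice("HT"))[-len(pattern):]
            tosses += 1
        total += tosses
    return total / trials


for pat in ("HH", "TT", "HT", "TH", "HHH", "HTH"):
    print(pat, expected_wait(pat), round(simulated_wait(pat), 2))
```

Longer repetitions fare worse still: by the same rule HHH needs 2 + 4 + 8 = 14 tosses on average, versus 2 + 8 = 10 for HTH.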

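Why should the wait be longer for HH or TT, even though you get an H or a T 50% of the time? In a long sequence HH turns up just as often as HT (on average once every 4 tosses), so the mean time between occurrences is the same for HH and TT as for HT and TH. What differs is the variance: occurrences of HH and TT are bunched together in time, while occurrences of HT and TH are spread evenly. Bunched occurrences leave longer gaps between the bunches, so the wait for the first one is longer.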

It gets worse for longer repetitions, since they can build on each other: HHH contains two instances of HH, while alternations do not. Repetitions bunch together, as noted earlier. We are very good at perceiving waiting times, and this is probably why we think repetitions are less likely and soon to break up.
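The bunching argument can be checked the same way: generate a long run of fair tosses, find every (possibly overlapping) occurrence of HH and of HT, and look at the gaps between successive occurrences. This is a minimal sketch; the sample size and seed are arbitrary choices of mine. If the reasoning above is right, both mean gaps should come out near 4, while the HH gaps should be far more variable (a short calculation gives a variance of 20 for HH versus 4 for HT).

```python
import random
from statistics import mean, pvariance


def gaps_between(pattern: str, tosses: str) -> list:
    """Distances between successive (possibly overlapping) occurrences of pattern."""
    n = len(pattern)
    ends = [i for i in range(n, len(tosses) + 1) if tosses[i - n:i] == pattern]
    return [later - earlier for earlier, later in zip(ends, ends[1:])]


rng = random.Random(42)
tosses = "".join(rng.choice("HT") for _ in range(200_000))

for pat in ("HH", "HT"):
    g = gaps_between(pat, tosses)
    print(pat, "mean gap", round(mean(g), 2), "variance", round(pvariance(g), 1))
```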

The paper goes a lot further, constructing a neural model based on the way our brains integrate information over time when processing sequences of events. It takes into consideration our perceptions of mean time AND waiting times. We average the two. This produces the best fitting bias gain parameter for an existing Bayesian model of randomness.

See, you’re not as dumb as they thought you were.

Another reason for our behavior comes from neuropsychology and physiological psychology. We have ways to watch the electrical activity of your brain and find out when you perceive something as different. One is called mismatch negativity (see http://en.wikipedia.org/wiki/Mismatch_negativity for more detail). It is a negative brain potential peaking 0.1-0.25 seconds after a deviant tone or syllable; a later positive potential, the P300, is also evoked by such deviant stimuli.

Play 5 middle Cs in a row followed by a D, then Cs again. The potential occurs only after the D, not after any of the Cs. This has been applied to the study of infant perception long before infants can speak.

It has shown us that Asian and Western newborn infants both hear ‘r’ and ‘l’ quite well (showing mismatch negativity to a sudden ‘r’ or ‘l’ in a sequence of other sounds). If an Asian infant never hears people speaking words with r and l in them for 6 months, it loses mismatch negativity to them (and clinical perception of them). So our brains are literally ‘tuned’ to understand the language we hear.

So we are more likely to notice the T after a run of H’s, or an H after a run of T’s. We are also likely to notice just how long it has been since it last occurred.

This is part of a more general phenomenon — the ability of our brains to pick up and focus on changes in stimuli. Exactly the same phenomenon explains why we see edges of objects so well — at least here we have a solid physiologic explanation — surround inhibition (for details see — http://en.wikipedia.org/wiki/Lateral_inhibition). It happens in the complicated circuitry of the retina, before the brain is involved.
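Surround inhibition can be caricatured in one dimension: each cell’s output is its own input minus a fraction of its neighbours’ inputs. This is a toy model of my own, not the actual retinal circuitry, and the inhibition weight of 0.5 per neighbour is an assumption chosen so that uniform illumination cancels exactly. Fed a step from dim to bright light, it responds only at the edge.

```python
def lateral_inhibition(signal, inhibition=0.5):
    """Toy 1-D center-surround response.

    Each cell is excited by its own input and inhibited by `inhibition` times
    the input of each immediate neighbour. At the two ends the missing
    neighbour is treated as equal to the cell itself.
    """
    out = []
    last = len(signal) - 1
    for i, x in enumerate(signal):
        left = signal[i - 1] if i > 0 else x
        right = signal[i + 1] if i < last else x
        out.append(x - inhibition * (left + right))
    return out


# A step from dim (1) to bright (3) light.
print(lateral_inhibition([1, 1, 1, 3, 3, 3]))   # [0.0, 0.0, -1.0, 1.0, 0.0, 0.0]
```

Uniform regions produce no output at all; the only signal is a dip on the dark side of the edge and a peak on the bright side, roughly the exaggerated contrast we perceive at edges.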

Philosophers should note that this destroys the concept of the pure (i.e. uninterpreted) sensory percept: information is being processed within our eyes before it ever gets to the brain.

Update 31 Mar — I wrote the following to the lead author

“Dr. Sun:

Fascinating paper. I greatly enjoyed it.

You might be interested in a post from my blog (particularly the last few paragraphs). I didn’t read your paper carefully enough to see if you mention mismatch negativity, P300 and surround inhibition. If not, you should find this quite interesting.

Luysii”

And received the following back in an hour or two:

“Hi, Luysii- Thanks for your interest in our paper. I read your post, and find it very interesting, and your interpretation of our findings is very accurate. I completely agree with you making connections to the phenomenon of change detection and surround inhibition. We did not spell it out in the paper, but in the supplementary material, you may find some relevant references. For example, the inhibitory competition between HH and HT detectors is a key factor for the unsupervised pattern association we found in the neural model.

Yanlong”

Nice ! ! !

How formal tensor mathematics and the postulates of quantum mechanics give rise to entanglement

Tensors continue to amaze. I never thought I’d get a simple mathematical explanation of entanglement, but here it is. Explanation is probably too strong a word, because it relies on the postulates of quantum mechanics, which are extremely simple but lead to extremely bizarre consequences (such as entanglement). As Feynman famously said, ‘no one understands quantum mechanics’. Despite that, it’s never made a prediction not confirmed by experiment, so the theory is correct even if we don’t understand ‘how it can be like that’. 100 years of correct predictions are not to be sneezed at.

If you’re a bit foggy on just what entanglement is, have a look at https://luysii.wordpress.com/2010/12/13/bells-inequality-entanglement-and-the-demise-of-local-reality-i/. Even better, read the book by Zeilinger referred to in the link (if you have the time).

Actually, you don’t even need all the postulates of quantum mechanics (as given in the book “Quantum Computation and Quantum Information” by Nielsen and Chuang). No differential equations. No Schrodinger equation. No operators. No eigenvalues. What could be nicer for those thirsting for knowledge? Such a deal ! ! ! Just 2 postulates and a little formal mathematics.

Postulate #1 “Associated to any isolated physical system is a complex vector space with inner product (that is, a Hilbert space) known as the state space of the system. The system is completely described by its state vector, which is a unit vector in the system’s state space”. If this is unsatisfying, see the explication on p. 80 of Nielsen and Chuang (where the postulate appears).
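To make Postulate #1 concrete: a state is a list of complex numbers, the inner product conjugates the components of the first vector, and a state vector must have length 1. Below is a minimal sketch in plain Python (no quantum library assumed); the particular qubit state is just an example of mine.

```python
import math


def inner(u, v):
    """Inner product <u|v> on C^n, conjugating the components of the first vector."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def normalize(v):
    """Rescale a nonzero vector to unit length, as Postulate #1 requires of a state vector."""
    length = math.sqrt(inner(v, v).real)
    return [a / length for a in v]


# Example qubit state proportional to |0> + i|1>, before normalization.
raw = [1 + 0j, 1j]
psi = normalize(raw)
print(inner(raw, raw).real, abs(inner(psi, psi)))   # squared lengths before and after
```

The raw vector has squared length 2, so it is not a legitimate state until it is divided by the square root of 2.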

Because the linear algebra underlying quantum mechanics seemed to be largely ignored in the course I audited, I wrote a series of posts called Linear Algebra Survival Guide for Quantum Mechanics. The first should be all you need. https://luysii.wordpress.com/2010/01/04/linear-algebra-survival-guide-for-quantum-mechanics-i/ but there are several more.

Even though I wrote a post on tensors, showing how they were a way of describing an object independently of the coordinates used to describe it, I didn’t even discuss another aspect of tensors, multilinearity, which is crucial here. The post itself can be viewed at https://luysii.wordpress.com/2014/12/08/tensors/

Start by thinking of a simple tensor as a vector in a vector space. The tensor product is just a way of combining vectors in vector spaces to get another (and larger) vector space. So the tensor product isn’t a product in the sense that multiplication of two objects (real numbers, complex numbers, square matrices) produces another object of exactly the same kind.

So mathematicians use a special symbol for the tensor product — a circle with an x inside. I’m going to use something similar ‘®’ because I can’t figure out how to produce the actual symbol. So let V and W be the quantum mechanical state spaces of two systems.

Their tensor product is just V ® W. Mathematicians can define things any way they want. A crucial aspect of the tensor product is that it is multilinear. If v and v’ are elements of V, then v + v’ is also an element of V (because two vectors in a given vector space can always be added). Similarly, w + w’ is an element of W if w and w’ are. Adding to the confusion in trying to learn this stuff is the fact that all vectors are themselves tensors.

Multilinearity of the tensor product is what you’d think:

(v + v’) ® (w + w’) = v ® (w + w’) + v’ ® (w + w’)

= v ® w + v ® w’ + v’ ® w + v’ ® w’

You get all 4 tensor products in this case.
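Multilinearity is easy to verify numerically if you represent vectors by their coordinates. For coordinate vectors the tensor product is the Kronecker product, the list of all products v_i * w_j, so two 2-dimensional spaces give a 4-dimensional one. In this minimal sketch v2 and w2 stand in for v’ and w’, and the particular vectors are arbitrary choices of mine.

```python
def tensor(v, w):
    """Tensor product of coordinate vectors (the Kronecker product): all v_i * w_j."""
    return [a * b for a in v for b in w]


def add(x, y):
    """Componentwise sum of two vectors of the same length."""
    return [a + b for a, b in zip(x, y)]


# Arbitrary example vectors; v2 and w2 play the roles of v' and w'.
v, v2 = [1, 2], [3, -1j]          # elements of V = C^2
w, w2 = [0, 1], [2 + 1j, 5]       # elements of W = C^2

lhs = tensor(add(v, v2), add(w, w2))                  # (v + v') ® (w + w')
rhs = add(add(tensor(v, w), tensor(v, w2)),           # v ® w + v ® w'
          add(tensor(v2, w), tensor(v2, w2)))         # + v' ® w + v' ® w'
print(len(lhs), lhs == rhs)   # 4 True
```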

This brings us to Postulate #2 (actually Postulate #4 in the book, on p. 94; we don’t need the other two. I told you this was fairly simple).

Postulate #2 “The state space of a composite physical system is the tensor product of the state spaces of the component physical systems.”

http://planetmath.org/simpletensor

Where does entanglement come in? Patience, we’re nearly done. One now must distinguish simple and non-simple tensors. Each of the 4 tensor products in the sum on the last line is simple, being the tensor product of two vectors.

What about v ® w’ + v’ ® w? Provided v and v’ are linearly independent (and so are w and w’), it isn’t simple, because there is no way to write it as a single product x ® y with x in V and y in W. So it’s called a compound tensor. (v + v’) ® (w + w’), by contrast, is a simple tensor because v + v’ is just another single element of V (call it v”) and w + w’ is just another single element of W (call it w”).

So the composite state (v + v’) ® (w + w’) can be understood as V being in state v” and W being in state w”.

v ® w’ + v’ ® w can’t be understood this way. The full system can’t be understood by considering V and W in isolation, i.e. the two subsystems V and W are ENTANGLED.
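When V and W are both 2-dimensional (two qubits, say) there is a quick test for simplicity: write the 4 components of the composite vector as a 2 x 2 matrix c[i][j]. A nonzero vector is a simple tensor x ® y exactly when that matrix has rank 1, i.e. zero determinant; otherwise it is entangled. (In higher dimensions the same idea uses the rank of the coefficient matrix, known as the Schmidt rank.) A minimal sketch, taking v, v’ and w, w’ to be the two basis vectors of each space so that each pair is linearly independent:

```python
def tensor(v, w):
    """Tensor (Kronecker) product of two coordinate vectors."""
    return [a * b for a in v for b in w]


def is_simple(state, tol=1e-12):
    """Is a nonzero vector in C^2 ® C^2 a simple tensor x ® y?

    Its 4 components, arranged as the 2 x 2 matrix [[c00, c01], [c10, c11]],
    have rank 1 exactly when the vector is simple, i.e. when the determinant
    vanishes. Otherwise the two subsystems are entangled.
    """
    c00, c01, c10, c11 = state
    return abs(c00 * c11 - c01 * c10) < tol


v, v_prime = [1, 0], [0, 1]   # basis vectors of V (linearly independent)
w, w_prime = [1, 0], [0, 1]   # basis vectors of W

# (v + v') ® (w + w')
simple_state = tensor([a + b for a, b in zip(v, v_prime)],
                      [a + b for a, b in zip(w, w_prime)])
# v ® w' + v' ® w
compound_state = [a + b for a, b in zip(tensor(v, w_prime), tensor(v_prime, w))]

print(is_simple(simple_state), is_simple(compound_state))   # True False
```

The compound state is an (unnormalized) entangled state; normalizing it would not change the answer, since rescaling cannot make a nonzero determinant vanish.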

Yup, that’s all there is to entanglement (mathematically at least). The paradoxes of entanglement, including Einstein’s ‘spooky action at a distance’, are left for you to explore; again, Zeilinger’s book is a great source.

But how can it be like that, you ask? Feynman said not to start thinking these thoughts, and if he didn’t know, do you expect a retired neurologist to tell you? Please.

Tensors

Anyone wanting to understand the language of general relativity must eventually tackle tensors. The following is what I wished I’d known about them before I started studying them on my own.

First, mathematicians and physicists describe tensors so differently, that it’s hard to even see that they’re talking about the same thing (one math book of mine says exactly that). Also mathematicians basically dump on the physicists’ way of doing tensors.

My first experience with tensors was years ago when auditing a graduate abstract algebra course. The instructor prefaced his first lecture by saying that tensors were the hardest thing in mathematics. Unfortunately right at that time my father became ill and I had to leave the area.

I’ll write a bit more about the mathematical approach at the end.

The physicist’s way of looking at tensors is actually a philosophical position. It basically says that there is something out there, that two people viewing it from different perspectives are seeing the same thing, and that how they describe it numerically, while important, is irrelevant to the thing itself (ding an sich if you want to get fancy). What a tensor tries to capture is how one view of the object can be transformed into another without losing the object in the process.

This is a bit more subtle than using different measuring scales (Fahrenheit vs. centigrade). That salt shaker sitting there looks a bit different to everyone at the table. Relative to themselves, they’d all use different numbers to describe its location, height and width. Depending on distance, it would subtend different visual angles. But it’s out there and has but one height, and no one around the table would disagree.

You’re tall and see it from above, while your child sees it at eye level. You measure the distances from your eye to its top and to its bottom, subtract them and get the height. So does your child. You get the same number.

The two of you have actually used two distinct vectors in two different coordinate systems. To transform your view into your child’s, you have to transform your coordinate system (whose origin is your eye) into the child’s. The distances from each eye to the shaker are the coordinates of the shaker in each system.

So the position of the bottom of the shaker (i.e. the vector describing it) actually has two parts:
1. The coordinate system of the viewer
2. The distances measured by each viewer (the components, or coefficients, of the vector).

To shift from your view of the salt shaker to your child’s, you must change both the coordinate system and the distances measured in each. This is what tensors are all about. The vector from the top to the bottom of the salt shaker is what you want to keep constant. To do this, the coordinate system and the components must change in opposite ways. This is where the terms covariant and contravariant, and all the indices, come in.

What is taken as the basic change is that of the coordinate system (the basis vectors if you know what they are). In the case of the vector to the salt shaker the components transform the opposite way (as they must to keep the height of the salt shaker the same). That’s why they are called contravariant.
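To see the "opposite ways" concretely, here is a minimal sketch in two dimensions. It assumes a new basis chosen purely for illustration, e'_1 = 2 e_1 and e'_2 = e_1 + e_2; the matrix A and the helper names are hypothetical. The basis vectors change by A, the components change by the inverse of A, and the vector they build, x^1 e_1 + x^2 e_2, comes out the same. Python's Fraction keeps the arithmetic exact.

```python
from fractions import Fraction as F

def new_basis_from_matrix(A):
    """Column j of A holds the old-basis coordinates of the new basis vector e'_j."""
    return [[A[0][j], A[1][j]] for j in range(2)]

def inverse_2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def apply(M, x):
    """Matrix times column vector, 2x2 case."""
    return [M[0][0] * x[0] + M[0][1] * x[1], M[1][0] * x[0] + M[1][1] * x[1]]

def combine(components, basis):
    """x^1 * e_1 + x^2 * e_2, with each e_j written in the old coordinates."""
    return [sum(c * e[i] for c, e in zip(components, basis)) for i in range(2)]

old_basis = [[F(1), F(0)], [F(0), F(1)]]
A = [[F(2), F(1)], [F(0), F(1)]]   # hypothetical change of basis
new_basis = new_basis_from_matrix(A)  # e'_1 = 2 e_1, e'_2 = e_1 + e_2

x_old = [F(3), F(4)]                  # components in the old basis
x_new = apply(inverse_2x2(A), x_old)  # components change by the INVERSE of A

print(x_new)
print(combine(x_old, old_basis), combine(x_new, new_basis))
```

The new components come out as (−1/2, 4), yet both combinations rebuild the same vector (3, 4): the basis went one way, the components the other, and the salt shaker did not move.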

The use of the term contravariant vector is terribly confusing, because every vector has two parts (the coefficients and the basis) which transform oppositely. There are mathematical objects whose components (coefficients) transform the same way as the original basis vectors — these are called covariant (the most familiar is the metric, a bilinear symmetric function which takes two vectors and produces a real number). Remember it’s the way the coefficients of the mathematical object transform which determines whether they are covariant or contravariant. To make things a bit easier to remember, contRavariant coefficients have their indices above the letter (R for roof), while covariant coefficients have their indices below the letter. The basis vectors (when written in) always have the opposite position of their indices.
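The metric shows the covariant side. The sketch below continues the same hypothetical 2-D change of basis (restated so it runs on its own) and assumes the old basis is orthonormal, so its metric is the identity. The metric’s coefficients transform with the basis matrix A, g’ = A^T g A (in index form g’_ij = A^k_i A^l_j g_kl), while the components transform with A’s inverse. The squared length g_ij x^i x^j, a plain real number, is the same in both systems. Function names are illustrative.

```python
from fractions import Fraction as F

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def quadratic_form(g, x):
    """g_ij x^i x^j, summed over both indices (lower on g, upper on x)."""
    return sum(g[i][j] * x[i] * x[j] for i in range(len(x)) for j in range(len(x)))

A = [[F(2), F(1)], [F(0), F(1)]]          # columns: new basis vectors (hypothetical)
A_inv = [[F(1, 2), F(-1, 2)], [F(0), F(1)]]

g_old = [[F(1), F(0)], [F(0), F(1)]]      # assumed orthonormal old basis
g_new = matmul(matmul(transpose(A), g_old), A)  # covariant: moves WITH the basis

x_old = [F(3), F(4)]
x_new = [sum(A_inv[i][j] * x_old[j] for j in range(2)) for i in range(2)]  # contravariant

print(g_new)
print(quadratic_form(g_old, x_old), quadratic_form(g_new, x_new))
```

The metric coefficients change to [[4, 2], [2, 2]] and the components to (−1/2, 4), but the squared length of the vector is 25 either way.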

Another trap: the usual notation for a vector skips the basis vectors entirely, so the most familiar example, (x, y, z) or (x^1, x^2, x^3), is really
x^1 * e_1 + x^2 * e_2 + x^3 * e_3, where e_1 is (1,0,0), etc.

So the crucial thing about tensors is the way they transform from one coordinate system to another.

There is a far more abstract way to define tensors: the tensor product of vector spaces is the space through which every multilinear map on their product factors uniquely (its universal property). I don’t think you need it for relativity (I hope not). If you want to see a very concrete approach to this admittedly abstract business, I recommend “Differential Geometry of Manifolds” by Stephen Lovett pp. 381 – 383.
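In finite dimensions the "factoring" can be seen with a few numbers. A minimal sketch, assuming a hypothetical bilinear map B(v, w) = sum of M_ij v^i w^j on R^2 × R^3 (the matrix M and the function names are made up for illustration): the same numbers M_ij define a LINEAR map L on 2 × 3 coefficient arrays, and B(v, w) = L(v ⊗ w). The sketch only checks that equality; it does not prove uniqueness.

```python
def tensor(v, w):
    """Coefficients of v ⊗ w: entry [i][j] = v[i] * w[j]."""
    return [[vi * wj for wj in w] for vi in v]

def bilinear(M, v, w):
    """A bilinear map B(v, w) = sum_ij M[i][j] * v[i] * w[j]."""
    return sum(M[i][j] * v[i] * w[j] for i in range(len(v)) for j in range(len(w)))

def linear_on_tensors(M, T):
    """The induced linear map L(T) = sum_ij M[i][j] * T[i][j] on coefficient arrays."""
    return sum(M[i][j] * T[i][j] for i in range(len(T)) for j in range(len(T[0])))

M = [[1, 2, 0], [0, -1, 3]]  # hypothetical bilinear map from R^2 x R^3 to R
v, w = [2, 5], [1, 0, -4]

print(bilinear(M, v, w), linear_on_tensors(M, tensor(v, w)))
```

B is bilinear in the pair (v, w), while L is simply linear in a single array; passing through v ⊗ w is what turns one into the other.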

An even more abstract definition of tensors (seen in the graduate math course) defines them on modules, not vector spaces. Modules are just vector spaces whose scalars come from a ring rather than a field like the real or the complex numbers. The difference is that, unlike in a field, the nonzero elements of a ring need not have inverses.

I hope this is helpful to some of you.

The incredible information economy of frameshifting

Her fox and dog ate our pet rat

H erf oxa ndd oga teo urp etr at

He rfo xan ddo gat eou rpe tra t

The last two lines make no sense at all, but (neglecting the spaces) they have identical letter sequences.
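The regrouping is mechanical enough to automate. A minimal sketch (the function name is hypothetical): strip the spaces, then cut the letters into triplets starting at offset 0, 1 or 2, keeping any leading leftover letters as a fragment. This reproduces the three lines above.

```python
def read_frame(text, offset):
    """Drop the spaces, then regroup the letters into triplets starting at
    `offset`; letters before the first full triplet are kept as a fragment."""
    letters = text.replace(" ", "")
    chunks = [letters[:offset]] if offset else []
    chunks += [letters[i:i + 3] for i in range(offset, len(letters), 3)]
    return " ".join(chunks)

sentence = "Her fox and dog ate our pet rat"
for offset in range(3):
    print(read_frame(sentence, offset))
```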

Here is a similar string of nucleotides of the genetic code, as transcribed into RNA, read in three frames:

ATG CAT TAG CCG TAA GCC GTA GGA

TGC ATT AGC CGT AAG CCG TAG GA.

GCA TTA GCC GTA AGC CGT AGG A..

Again, in our genome there are no spaces between the triplets. But all the triplets you see are meaningful in the sense that each codes for one of the twenty amino acids, except for TAA and TAG, which say stop. ATG codes for methionine (the purists will note that all the T’s should be U). I’m too lazy to look the rest up, but the ribosome doesn’t care, and will happily translate any of the 3 frames into the sequential amino acids of a protein, at least until it reaches a stop codon.
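Since the ribosome does the lookup, we can too. A minimal sketch, assuming the standard genetic code (NCBI translation table 1) and writing T for U as the text does; the helper names are hypothetical. It reads the 24-letter sequence of the first line in each of the three frames, prints one-letter amino acid codes, and marks stop codons with *.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one-letter amino acid codes, "*" = stop.
# Codons are listed first base slowest, third base fastest, each in T, C, A, G order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(seq, offset):
    """Translate one reading frame; an incomplete trailing codon is ignored."""
    codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
    return "".join(CODE[c] for c in codons)

dna = "ATG CAT TAG CCG TAA GCC GTA GGA".replace(" ", "")
for offset in range(3):
    print(offset, translate(dna, offset))

print(sum(aa != "*" for aa in CODE.values()), "sense codons")
```

A * marks where a ribosome would normally release the chain (leaky stop codons aside). Building the dictionary from the conventional TCAG ordering of the codon table keeps the data to a single 64-letter string, and counting its non-stop entries gives the 61 sense codons.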

Both sets of sequences have undergone (reading) frame shifts.

A previous post https://luysii.wordpress.com/2014/10/13/the-bach-fugue-of-the-genome/ marveled at how something too small even to be called a virus coded for a protein whose amino acids were read in two different frames.

Frameshifting is used by viruses to get more mileage out of their genomes. Why? There is only so much DNA you can pack into the protein coat (capsid) of a virus.

[ Proc. Natl. Acad. Sci. vol. 111 pp. 14675 – 14680 ’14 ] Usually DNA density in cell nuclei or bacteria is 5 – 10% of volume. However, in viral capsids it is 55% of volume. The pressure inside the viral capsid can reach ten atmospheres. Ejection is therefore rapid (60,000 basepairs/second).

The AIDS virus (HIV1) relies on frameshifting of its genome to produce viable virus. The genes for two important proteins (gag and pol) have 240 nucleotides (80 amino acids) in common. Frameshifting allows those 240 nucleotides to be read by the cell’s ribosomes in two different frames (not at once). Granted, there are 61 three-nucleotide combinations to code for only 20 amino acids, so some redundancy is built in, but the 80 amino acids coded by the two frames are usually quite different.

That the gag and pol proteins function at all is miraculous.

The phenomenon is turning out to be more widespread. [ Proc. Natl. Acad. Sci. vol. 111 pp. E4342 – E4349 ’14 ] KSHV (Kaposi’s Sarcoma HerpesVirus) causes (what else?) Kaposi’s sarcoma, a tumor that was quite rare until people with AIDS started developing it (their lousy immune systems being unable to contend with the virus). Open reading frame 73 (ORF73) codes for the major latency-associated nuclear antigen 1 (LANA1). It has 3 domains: a basic amino-terminal region, an acidic central repeat region (divisible into CR1, CR2 and CR3) and another basic carboxy-terminal region. LANA1 is involved in maintaining KSHV episomes, regulating viral latency, and transcriptional regulation of viral and cellular genes.

LANA1 exists as multiple higher and lower molecular weight isoforms, seen as a ‘LANA ladder’ band pattern on immunoblots.

This work shows that LANA1 (and also Epstein-Barr nuclear antigen 1) undergoes highly efficient +1 and -2 programmed frameshifting, generating previously undescribed alternative reading frame proteins in their repeat regions. Programmed frameshifting to generate multiple proteins from one RNA sequence can increase coding capacity without increasing the size of the viral capsid.

The presence of similar repeat sequences in human genes (such as huntingtin, the defective gene in Huntington’s chorea) suggests that we should look for frameshifted translation in ourselves as well as in viruses. In the case of mutant huntingtin, frameshifting in the abnormally expanded CAG tracts produces proteins containing polyAlanine or polySerineArginine tracts.
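A minimal sketch of where such tracts come from, assuming an idealized, uninterrupted CAG repeat (8 copies, chosen arbitrarily). The function name is hypothetical, and the three-entry dictionary holds only the standard-code assignments needed here. Starting one letter later turns CAG into AGC (serine); starting two letters later gives GCA (alanine).

```python
# Only the three codons needed here (standard genetic code).
CODONS = {"CAG": "Gln", "AGC": "Ser", "GCA": "Ala"}

def repeat_frames(unit="CAG", copies=8):
    """Read an uninterrupted repeat of `unit` in all three frames,
    keeping only complete codons."""
    seq = unit * copies
    frames = []
    for offset in range(3):
        codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
        frames.append("-".join(CODONS[c] for c in codons))
    return frames

for frame in repeat_frames():
    print(frame)
```

This models only a pure repeat read in fixed frames; it says nothing about how the ribosome actually slips, and real tracts may be interrupted.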

Well, G, A, T and C are the 1’s and 0’s of the way genetic information is stored in our genomic computer. It really isn’t surprising that the genome can be read in alternate frames. In the old days, bytes of text sent over serial lines were framed by start and stop bits (often with a parity bit as an error check), so the 1’s and 0’s were read in the correct frame. There is nothing like that in our genome (except for the 3 stop codons).

What is truly surprising is that reading in an alternate frame produces ‘meaningful’ proteins. This gets us into philosophical waters. Clearly

Erf oxa ndd oga teo urp etr at

Rfo xan ddo gat eou rpe tra t

aren’t meaningful to us. Yet gag and pol are quite meaningful (even life-and-death meaningful) to the AIDS virus. So ‘meaningful’ in the biologic sense means able to function in the larger context of the cell. The same is true of linguistic meaning. You have to know a lot about the world (and speak English) for the word cat to be meaningful to you. So meaning can never be defined by the word itself. Probably the same is true for concepts as well, but I’ll leave that to the philosophers, or to any who choose to comment on this.

The Bach Fugue of the Genome

There are more things in heaven and earth, Horatio,
Than are dreamt of in your philosophy.
– Hamlet (1.5.167-8), Hamlet to Horatio

Just when you thought we’d figured out what genomes could do, the virusoid of rice yellow mottle virus performs a feat of dense coding I’d have thought impossible. The following work requires a fairly sophisticated understanding of molecular biology, for which the articles in “Molecular Biology Survival Guide for Chemists” might provide the background. Give it a shot. This is fascinating stuff. If the following seems incomprehensible, start with https://luysii.wordpress.com/2010/07/07/molecular-biology-survival-guide-for-chemists-i-dna-and-protein-coding-gene-structure/ and then follow the links forward.

Virusoids are single-stranded circular RNAs which depend on a helper virus for replication. They are distinct from viroids, which need no helper virus to replicate. Neither virusoids nor viroids were thought to code for protein (until now). Virusoids are usually found inside the protein shells of plant viruses.

[ Proc. Natl. Acad. Sci. vol. 111 pp. 14542 – 14547 ’14 ] Viroids and virusoids (viroid-like satellite RNAs) are small (220 – 450 nucleotide) covalently closed circular RNAs. They are the smallest known replicating circular RNA pathogens. They replicate via a rolling circle mechanism to produce larger concatemers, which are then processed into monomeric forms by a self-cleaving hammerhead ribozyme or by cellular enzymes.

The rice yellow mottle virus (RYMV) contains a virusoid which is a covalently closed circular RNA of a mere 220 nucleotides. A 16 kiloDalton basic protein is made from it. How can this be? Figure the average molecular mass of an amino acid at 100 Daltons, and 3 nucleotides per codon (one codon per amino acid). This means that 220 nucleotides, read once, can code for 73 amino acids at most (i.e. a 7 – 8 kiloDalton protein).
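The arithmetic, using only the text’s own numbers (100 daltons per amino acid, 3 nucleotides per codon, a 220-nucleotide circle, a 16 kDa protein); the variable names are illustrative.

```python
CIRCLE_NT = 220        # virusoid length (from the text)
NT_PER_CODON = 3
DA_PER_RESIDUE = 100   # the text's rough average mass per amino acid
PROTEIN_DA = 16_000    # observed protein size (from the text)

# One pass around the circle in a single frame.
max_residues_one_pass = CIRCLE_NT // NT_PER_CODON
max_da_one_pass = max_residues_one_pass * DA_PER_RESIDUE

# What a 16 kDa protein would need.
residues_needed = PROTEIN_DA // DA_PER_RESIDUE
nt_needed = residues_needed * NT_PER_CODON
laps = nt_needed / CIRCLE_NT

print(max_residues_one_pass, max_da_one_pass)
print(residues_needed, nt_needed, round(laps, 2))
```

Roughly 160 residues means about 480 nucleotides, a bit more than two trips around a 220-nucleotide circle, so the ribosome has to keep going around.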

So far the RYMV virusoid is the only viroid or virusoid RNA known to code for a protein. The virusoid sequence contains an internal ribosome entry site (IRES) of the following form: UGAUGA. Initiation starts at the AUG, and since 220 isn’t an integral multiple of 3 (the size of a codon), translation continues in another reading frame on each trip around the circle until it reaches one of the two UGAs (termination codons) in UGAUGA, either the first or the second. Termination codons can be ignored (leaky codons) to obtain larger read-through proteins. So this virusoid is a circular RNA with no NONcoding sequences which codes for a protein in either 2 or 3 of the 3 possible reading frames. Notice that in UGAUGA the start codon AUG overlaps both stop codons! So it is likely that the same nucleotide is being read 2 or 3 ways. Amazing!!!
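Why a circle of 220 nucleotides forces a change of frame: a minimal sketch, assuming a ribosome simply keeps reading around the circle from some start position. The start value 2 and the function names are hypothetical, and nothing here uses the real RYMV sequence or the actual locations of its stop codons. Because 220 = 3 × 73 + 1, each lap shifts the codon phase by one, so three successive laps visit all three frames; a circle whose length is a multiple of 3 (219, for contrast) never changes frame.

```python
def frame_on_each_lap(circle_len, start, laps):
    """Codon phase (circle position mod 3) on each successive lap when a
    ribosome keeps reading a circular RNA of length `circle_len` from `start`."""
    return [(start - lap * circle_len) % 3 for lap in range(laps)]

def frame_by_walking(circle_len, start, laps):
    """Same answer by stepping codon by codon through the unrolled sequence."""
    phases = {}
    pos = start  # unrolled coordinate of the current codon's first nucleotide
    while pos < laps * circle_len:
        lap = pos // circle_len
        phases.setdefault(lap, (pos % circle_len) % 3)  # first codon on this lap
        pos += 3
    return [phases[lap] for lap in range(laps)]

print(frame_on_each_lap(220, 2, 4), frame_by_walking(220, 2, 4))
print(frame_on_each_lap(219, 2, 4), frame_by_walking(219, 2, 4))
```

The closed-form version and the step-by-step walk agree: for 220 the phase cycles through all three frames and returns to the starting frame on the fourth lap, while for 219 it stays put.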

It isn’t clear what function the virusoid protein performs for the virus when the virus has infected a cell. Perhaps it has none, and the only function of the protein is to help the virusoid continue existing inside the virus.

Talk about information density. The RYMV virusoid is the Bach Fugue of the genome. Bach sometimes inverts the fugue theme, and sometimes plays it backwards (a musical palindrome if you will).

It is unfortunate that more people don’t understand the details of molecular biology well enough to appreciate mechanisms of this elegance. Whether you think understanding it is an esthetic experience is up to you. I do. To me, this resembles the esthetic experience that mathematics offers.

A while back I wrote a post, wondering if the USA was acquiring brains from the MidEast upheavals, the way we did from Europe because of WWII. Here’s the link https://luysii.wordpress.com/2014/09/28/maryam-mirzakhani/.

Clearly Canada has done just that. Here are the authors of the PNAS paper above and their affiliations. Way to go Canada !

Mounir Georges AbouHaidar
Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada M5S 3B2
Srividhya Venkataraman
Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada M5S 3B2
Ashkan Golshani
Biology Department, Carleton University, Ottawa, ON, Canada K1S 5B6
Bolin Liu
Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada M5S 3B2
Tauqeer Ahmad
Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada M5S 3B2
