Tag Archives: HIV1

Is there anything in the cell that has just one function — more moonlighting — this time mRNA

“Able was I ere I saw Elba,” said Napoleon. It’s called a palindrome, and can be read in either direction. So can DNA, which brings me to antisense transcription of DNA, particularly in two famous retroviruses: the AIDS virus (HIV1) and HTLV-1.

Proc. Natl. Acad. Sci. vol. 118 e2014783118 ’21  shows that mRNA can moonlight to do other things than code for protein.  Here’s a direct quote to set the stage.

“Retroviruses share a similar genome structure. The integrated retroviral genome, called the provirus, has two identical long terminal repeats (LTR) located at its 5′ and 3′ ends, respectively. The 5′ LTR acts as the promoter of almost all retroviral genes and thus is indispensable for viral transcription and replication. However, selective methylation of the 5′ LTR and the subsequent viral latency have been observed in HIV-1 and HTLV-1. In contrast, the 3′ LTR of HIV-1 and HTLV-1 remains nonmethylated, and recent findings have shown that novel retroviral genes are transcribed from the 3′ LTR in an antisense direction”.

The 3′ LTR of the AIDS virus enables antisense transcription for the unimaginatively named ASP (AntiSense Protein).  So the mRNA for ASP is transcribed in the nucleus.  But it doesn’t get out as well as it might, because its 3′ end isn’t polyadenylated.  So it sticks around in the nucleus and binds to DNA, turning off transcription of the regular HIV1 genome, i.e. helping to maintain viral latency (and preventing a true cure of HIV1 in any individual).

This is unprecedented.  Here is an mRNA with a completely different function (i.e. regulating gene expression).  This is classic moonlighting as something else and the authors call the mRNA for ASP a bifunctional mRNA.

The other retrovirus, HTLV-1, also has an antisense transcript making a protein called HBZ (you don’t want to know what it stands for). Unlike ASP, HBZ turns on a variety of genes.

I’ve been fascinated by moonlighting molecules, probably because they show the depths of our ignorance of the biochemical machinations inside the cell.  Even when you think you’ve got the function of a molecule tied down, it goes off and does something else. 

 Here are some links to other posts on the subject.  To get to them just click on the titles

Moonlighting molecules

More moonlighting

A moonlighting quorum sensing molecule

Why the news about SARS-CoV-2 mutations is actually good

How can the latest news about mutations in the pandemic virus be good?   Simply this: there are so many known ones that it’s almost certain that nearly every possible mutation has been formed out there, and since not much has changed about the lethality of the virus, none of them are that bad.  Not only that, but the ones that haven’t occurred must be lethal to the virus and will give us new ideas about how to attack it.

Here is a link to an article (vol. 585 pp. 174 – 177 ’20) in the current (10 September) Nature: https://media.nature.com/original/magazine-assets/d41586-020-02544-6/d41586-020-02544-6.pdf.  Hopefully it is not behind a paywall. It’s definitely worth a read.

You’ve probably heard about the D614G mutation in the spike protein of the virus.  It came out of nowhere and has taken over worldwide, even in areas where different forms of the viral genome were previously established.  D is the one letter abbreviation for aspartic acid, one of the twenty amino acids, and G stands for another one, glycine.  This immediately makes bells ring for the chemist, because glycine is the smallest amino acid, having a single hydrogen atom for its side chain, while the side chain of aspartic acid (CH2COOH) contains 2 carbons, 3 hydrogens and 2 oxygens.  So there’s a lot more room for the protein where aspartic acid used to be.

Whether or not the mutation made the virus more infectious still isn’t known.  It appears to be more infectious in studies using pseudoviruses.  Not everyone has a high level containment facility, so people work with the AIDS virus (HIV1) which doesn’t need one and simply change one of its proteins to the spike protein of the pandemic virus (yes we have the technology to do that).  Then they infect cells with the pseudovirus.  Translating this to whole organisms (us) with the real virus requires a leap of faith.  It’s a long leap, but pseudoviruses are the best thing we have at present.

Here are three quotes from the article:

“More than 90,000 isolates have been sequenced and made public (see http://www.gisaid.org). ”

“Two SARS-CoV-2 viruses collected from anywhere in the world differ by an average of just 10 RNA letters out of 29,903,”

“Researchers have catalogued more than 12,000 mutations in SARS-CoV-2 genomes. ”

How many mutations are possible in the viral genome?  Just 29,903 times 3 (89,709), because at each position the element normally there can change to only 3 others. The viral genome is made of RNA, a linear chain of only 29,903 nucleotides, and each nucleotide can be uracil (U), adenine (A), guanine (G) or cytosine (C).  That’s it.  Proteins can have 20 different amino acids at each position.

So 13% of all possible mutations have been found in the virus, out of only 90,000 completely sequenced genomes. There are now 28,000,000 cases out there, so with roughly 300 times more virus out there to sequence, it’s almost certain that nearly all the other 78,000 or so possible mutations have already occurred somewhere in the world.
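The arithmetic is easy to check. Here is a minimal sketch in Python using only the figures quoted from the Nature article. The variable names are mine, and it counts only single-nucleotide substitutions (insertions and deletions are ignored, as in the text).

```python
# Figures quoted above from the Nature article (September 2020).
GENOME_LENGTH = 29_903   # nucleotides in the SARS-CoV-2 genome
CATALOGUED = 12_000      # distinct mutations catalogued so far
SEQUENCED = 90_000       # isolates sequenced and made public
CASES = 28_000_000       # reported cases at the time of writing

# Each position can change to any of the 3 other nucleotides.
possible = GENOME_LENGTH * 3
fraction_seen = CATALOGUED / possible
not_yet_seen = possible - CATALOGUED
cases_per_genome = CASES / SEQUENCED

print(f"possible single-nucleotide changes: {possible:,}")
print(f"fraction already catalogued: {fraction_seen:.1%}")
print(f"not yet catalogued: {not_yet_seen:,}")
print(f"cases per sequenced genome: {cases_per_genome:.0f}")
```

The last figure is only a rough guide to how much unsequenced virus is out there, since reported cases undercount infections.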

How can this be good news?  Because if any of them were truly horrible, we’d know about it.  It would have taken over just the way the D614G mutation did.

But there’s even more to be gleaned from this work.  Hopefully http://www.gisaid.org is continuing to accumulate more and more sequences from all over the world.  Suppose certain mutations don’t show up.   This means they are fatal to an infectious virus.  Since we know exactly what proteins the virus is making and what stretch of the genome makes each one, this should suggest  clear lines of attack into the virus.

A bombshell that wasn’t

Yesterday, a friend sent me the following

“Chinese Coronavirus Is a Man Made Virus According to Luc Montagnier the Man Who Discovered HIV

Contrary to the narrative that is being pushed by the mainstream that the COVID 19 virus was the result of a natural mutation and that it was transmitted to humans from bats via pangolins, Dr Luc Montagnier the man who discovered the HIV virus back in 1983 disagrees and is saying that the virus was man made.”

Pretty impressive isn’t it?  Montagnier says that in the 30,000 nucleotide sequence of the new coronavirus SARS-CoV-2 he found sequences of the AIDS virus (HIV1).  Worse, the biolab in Wuhan was working both on HIV1 and coronaviruses.  It seems remote that a human could have been simultaneously infected with both, but these things happen all the time in the lab, intentionally or not.

It really wouldn’t take much to prove Montagnier’s point.  Matching 20 straight nucleotides from HIV1 to the Wuhan coronavirus is duck soup now that we have the sequences of both.  HIV1 has a genome with around 10,000 nucleotides, and the Wuhan coronavirus has a genome of around 30,000.  Recall that each nucleotide can be one of 4 things: A, U, G, C.  In the genome the nucleotides are ordered, and differences in the order mean different things — consider the two words united and untied.

Suppose Montagnier found a 20 nucleotide sequence from HIV1 in the new coronavirus genome. How many possibilities are there for such a sequence?  Well, for a 2 nucleotide sequence there are 4 x 4 = 4^2 = 16, for a 3 nucleotide sequence 4 x 4 x 4 = 4^3 = 64.  So for 20 nucleotides there are 4^20 = 1,099,511,627,776 different possibilities.  In the HIV1 genome there are about 10,000 – 20 such sequences, and in the coronavirus genome there are about 30,000 – 20, so there are roughly 10,000 times 30,000 ways for a 20 nucleotide sequence to match up between the two genomes.  That’s 300,000,000 ways for a match to occur by chance, out of about 1.1 trillion possibilities: odds of roughly .03%, comfortably less than .1%.  If you’re unsatisfied with those odds then make the match longer.  25 nucleotides should satisfy the most skeptical.
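For anyone who wants to play with the numbers, here is a rough sketch of the same estimate in Python. The function name is mine, and it assumes each nucleotide is independent and equally likely to be A, U, G or C, which real genomes only approximate.

```python
def expected_chance_matches(len_a, len_b, k):
    """Expected number of exact k-nucleotide matches between two random genomes.

    Assumes independent, equally likely nucleotides (a simplification).
    """
    windows_a = len_a - k + 1      # k-long windows in genome A
    windows_b = len_b - k + 1      # k-long windows in genome B
    pairs = windows_a * windows_b  # ways to pair a window from A with one from B
    return pairs / 4 ** k          # each pair matches with probability 4^-k

for k in (20, 25):
    expected = expected_chance_matches(10_000, 30_000, k)
    print(f"{k} nucleotides: about {expected:.2e} chance matches expected")
```

Going from 20 to 25 nucleotides divides the expected number of chance matches by about 4^5 = 1,024, which is why a slightly longer match is so much more convincing.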

But there’s a rub — as Carl Sagan has said  “Extraordinary claims require extraordinary evidence.”  Apparently Montagnier hasn’t published the sequence of HIV1 he claims to have found in the coronavirus.   If anyone knows what it is please write a comment.

Then there’s the fact that Montagnier appears to have gone off his rocker. In 2009 he published a paper (in a journal he apparently built) which concludes that “diluted DNA from pathogenic bacterial and viral species is able to emit specific radio waves” and that “these radio waves [are] associated with ‘nanostructures’ in the solution that might be able to recreate the pathogen”.

Sad.  Just as one of the greatest chemists of the 20th century will be remembered for his crackpot ideas about vitamin C (Linus Pauling), Montagnier may be remembered for this.

On second thought, there is no reason to need Montagnier and his putative sequence at all. The sequences of both genomes are known.  Matching any 20 nucleotide sequence from HIV1 against each of the roughly 30,000 – 20 such sequences in the Wuhan coronavirus is a problem right out of Programming 101.  It’s a matter of a few loops, if thens and go to’s.  If you’re ambitious you could start with smaller sequences, say 5 – 10 nucleotides, find a match, move to the next largest size and repeat until you find the largest contiguous sequence of nucleotides in HIV1 to be found in the coronavirus.
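Here is one way the search could look in Python: a sketch, not a polished tool. The function names are mine, and the two sequences at the bottom are invented stand-ins; you would paste in the real genomes from a public database.

```python
def kmers(seq, k):
    """Set of every k-long window in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def longest_shared_run(a, b, start_k=5):
    """Longest stretch of nucleotides found in both a and b.

    Grows the window one nucleotide at a time and stops when the two
    sequences no longer share any window of that size.
    Returns "" if they share nothing of length start_k or more.
    """
    best = ""
    k = start_k
    while k <= min(len(a), len(b)):
        shared = kmers(a, k) & kmers(b, k)
        if not shared:
            break
        best = min(shared)   # several may tie; pick one deterministically
        k += 1
    return best

# Toy sequences invented for illustration (NOT real viral genomes).
hiv_like = "AUGGGUGCGAGAGCGUCAGUAUUAAGCGGG"
cov_like = "CCUUAGAGCGUCAGUAACGAUUAAGCAA"
print(longest_shared_run(hiv_like, cov_like))
```

Stopping at the first size with no shared window is safe, because any longer shared run would contain a shared run of every shorter length. For genomes of 10,000 and 30,000 nucleotides the sets stay small enough that this runs in well under a minute.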

You can read about the Wuhan lab in an article from Nature in 2017 — https://www.nature.com/news/inside-the-chinese-lab-poised-to-study-world-s-most-dangerous-pathogens-1.21487


hed oga tet hec atw hoa tet her atw hob ith erp aw

Say what?  It’s a simple sentence made of 3 letter words, frameshifted by one:

the dog ate the cat who ate the rat who bit her paw
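The frameshift is easy to reproduce. A small sketch in Python (the function name is mine): strip the spaces, skip the first letter, and regroup in threes.

```python
def read_in_frame(text, offset=0, size=3):
    """Regroup the letters of text into size-letter 'codons', starting offset letters in."""
    letters = text.replace(" ", "")[offset:]
    return " ".join(letters[i:i + size] for i in range(0, len(letters), size))

sentence = "the dog ate the cat who ate the rat who bit her paw"
print(read_in_frame(sentence, 0))  # the original reading frame
print(read_in_frame(sentence, 1))  # shifted by one letter
```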

Codons are read as groups of three nucleotides, and frameshifting has always been thought to totally destroy the meaning of a protein, as an entirely different protein is made.

Not so says PNAS vol. 117 pp. 5907 – 5912 ’20. Normally a frameshifted protein has only 7% sequence identity with the original.  This is about what one would expect given that there are 20 amino acids, and chance coincidence would argue for 5%.  But there are more ways for proteins to be similar rather than identical.  One can classify our amino acids in several ways, charged vs. uncharged, aromatic vs. nonaromatic, hydrophilic vs. hydrophobic etc. etc.

The authors looked at 2,900 human proteins, then they frameshifted the original by +1 and compared the hydrophobicity profiles of the two.  Amazingly there was a correlation of .7 between the two, despite sequence identity of 7%.  Similarly frameshifting didn’t disturb the chance of intrinsic disorder.  So frameshifting is embedded in the structure of the universal genetic code, and may have actually contributed to its shaping.  Frameshifting could be an evolutionary mechanism of generating proteins with similar attributes (hydrophobicity, intrinsic order vs. disorder, etc.) but with vastly different sequences.  Then evolution, aka natural selection aka deus ex machina aka God could muck about with the ready made protein and find something new for it to do.   A remarkable concept.

The gag-pol precursor p180 of the AIDS virus is derived from the gag-pol mRNA by translation involving ribosomal frameshifting within the gag-pol overlap region.  The overlap is 241 nucleotides with pol in the -1 phase with respect to gag (that’s an amazing 80 amino acids).  I was amazed at the efficiency of coding of two different proteins (one an enzyme and one structural), but perhaps they aren’t that different in terms of hydrophobicity (or something else).

I’d love to see the hydropathy profile of the overlap of the two proteins, but I don’t know how to get it.
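For what it’s worth, here is a rough sketch of how one might compute such profiles in Python. It is not the method of the PNAS paper: the Kyte-Doolittle scale, the 7 residue window, scoring stop codons as 0, and all the function names are my assumptions.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Kyte-Doolittle hydropathy scale (my choice; other scales exist).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}

def translate(rna, offset=0):
    """Translate starting offset nucleotides in; '*' marks a stop codon."""
    dna = rna.upper().replace("U", "T")
    return "".join(CODON_TABLE[dna[i:i + 3]]
                   for i in range(offset, len(dna) - 2, 3))

def hydropathy_profile(protein, window=7):
    """Sliding-window mean hydropathy; stop codons score 0 (an assumption)."""
    scores = [KD.get(aa, 0.0) for aa in protein]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (fails if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def frame_correlation(rna, offset_a=0, offset_b=1, window=7):
    """Correlate the hydropathy profiles of two reading frames of one sequence."""
    pa = hydropathy_profile(translate(rna, offset_a), window)
    pb = hydropathy_profile(translate(rna, offset_b), window)
    n = min(len(pa), len(pb))
    return pearson(pa[:n], pb[:n])
```

To try it on the gag-pol overlap you would paste in the 241 nucleotides from a reference HIV1 sequence and call `frame_correlation` with the two offsets that correspond to gag and pol. Which offsets those are depends on where you cut the region, so check that each translation matches the known protein first.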

When is the AIDS virus really dead?

When should we regard an AIDS virus lurking in the genome of a white blood cell as dead (or at least harmless)?  Such proviruses are called defective, and are commonly formed, because the process of reverse transcription (of RNA into DNA) is quite error prone.

Most would say an HIV1 provirus in the genome is dead if it can’t reproduce and get outside the cell carrying it.  Not so fast says Proc. Natl. Acad. Sci. vol. 117 pp. 3704 – 3710 ’20.  They show that such defective proviruses can be transcribed into RNA and these RNAs can produce proteins (when translated).

There is some evidence for this as the Nef protein of HIV1 can be detected in cells and plasma even when HAART (Highly Active Anti Retroviral Therapy) has knocked plasma viremia down to a level of under   50 copies/milliLiter.

How could this cause trouble ? Easy.  This would be chronically stimulating the immune system and in effect wearing it out.

This is very new stuff, and the fate of white cells containing replication incompetent proviruses which are still producing proteins isn’t known (but I’m sure this isn’t far off).

The chemical ingenuity of the AIDS virus

Pop quiz:  You are a virus with under 10,000 nucleotides in your genome.  To make the capsid enclosing your genome, you need to make 250 hexamers of a particular protein.  How do you do it?


Give up?


You grab a cellular metabolite with a mass under 1,000 Daltons to bind the 6 monomers together.  The metabolite occurs at fairly substantial concentrations (for a metabolite) of 10 – 40 microMolar.

What is the metabolite?

Give up?


It has nearly perfect 6 fold symmetry.


Still give up?

[ Nature vol. 560 pp. 509 – 512 ’18 ]  https://www.nature.com/articles/s41586-018-0396-4 says that it’s inositol hexakisphosphate (IP6)  — nomenclature explained at the end. http://www.refinebiochem.com/pages/InositolHexaphosphate.html

Although IP6 looks like a sugar (with 6 CHOH groups forming a 6 membered ring), it is not a typical one because it is not an acetal (no oxygen in the ring).  All 6 hydroxyls of IP6 are phosphorylated.  They bind to two lysines on a short (21 amino acids) alpha helix found in the protein (Gag which has 500 amino acids).  That’s how IP6 binds the 6 Gag proteins together. The paper has great pictures.

It is likely that IP6 is used by other cellular proteins to form hexamers (but the paper doesn’t discuss this).

IP6 is quite symmetric, and 5 of the 6 phosphorylated hydroxyls can be equatorial, so this is likely the energetically favored conformation, given the bulk (and mass) of the phosphate group.

I think that the AIDS virus definitely has more chemical smarts than we do.  Humility is definitely in order.

Nomenclature note:  We’re all used to ATP (Adenosine TriPhosphate) and ADP (Adenosine DiPhosphate) — here all 3 or 2 phosphates form a chain.  Each of the 6 hydroxyls of inositol can be singly phosphorylated, leading to inositol bis, tris, tetrakis, pentakis, hexakis phosphates.  Phosphate chains can form on them as well, so IP7 and IP8 are known (heptakis?, Octakis??)

Bad news on the AIDS front

Bad news for those hoping for an AIDS cure. As you know, the active virus (HIV1) has a genome made of RNA. However, thanks to an enzyme it possesses called reverse transcriptase (which has led to Nobel prizes), it copies itself into DNA and integrates into the genome of lymphocytes. There it sits presumably doing nothing, but it’s always capable of activating and producing more infectious virus.

We seem to have fought the virus to a draw, using a cocktail of drugs which attack different aspects of it: HAART (Highly Active Antiretroviral Therapy). Success is usually considered being unable to detect viral RNA in the blood (see later). However blood cells are short-lived. What about the longer living lymphocytes found in the lymph nodes and spleen?

That’s what was studied in a current paper [ Nature vol. 530 pp. 5` – 45 ’16 ] but in only 3 people. All had no detectable virus in the blood (under 48 copies/milliLiter — an incredibly tiny amount — see later). What they did was to biopsy lymph nodes in the groin on study entry and at 3 and 6 months.

Then they sequenced the genomes of the lymphocytes from the nodes, to study the HIV1 DNA integrated into the genome. They found that the genome changed with time. This is very bad. Why?

Because it implies that, even though you can’t detect the virus in the blood, the virus was not staying latent in the lymph nodes, but coming out of the lymphocytes and forming infectious virus which then mutated. Subsequently the mutated virus integrated into the genome of another lymphocyte. So even with what we consider excellent control, the virus is not purely latent. Drug resistance could arise from mutations (although they didn’t see it in this study).

Clearly, more people need to be studied this way (but serial biopsies? It will probably be done in prisoners, if such things are still done).

It’s worthwhile thinking about how incredibly selective and accurate our methods of analysis are. 48 copies of the viral RNA per milliLiter of blood is the lower limit of detection. Remember that water has a molecular weight of 18, so a liter of distilled water is 1000 grams / 18 grams = 55.5 Molar. A mole has 6 x 10^23 molecules. A milliLiter is 10^-3 liters. So 1 milliLiter of distilled water has 55 * 6 * 10^23 * 10^-3 = 3 * 10^22 molecules of water in it. The assay is finding 48 or more molecules of HIV1 RNA in that water haystack. Even figuring that the concentration of water in blood is 1/10 that of distilled water, this is still impressive.
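The haystack calculation, spelled out as a short Python sketch (the variable names are mine, and the constants are rounded the same way as in the text):

```python
AVOGADRO = 6.022e23        # molecules per mole
WATER_MOLAR_MASS = 18.0    # grams per mole, rounded
GRAMS_PER_LITER = 1000.0   # a liter of water weighs about 1000 grams
DETECTION_LIMIT = 48       # HIV1 RNA copies per milliLiter

molarity = GRAMS_PER_LITER / WATER_MOLAR_MASS   # moles of water per liter
molecules_per_ml = molarity * AVOGADRO * 1e-3   # 1 milliLiter = 10^-3 liter
waters_per_rna = molecules_per_ml / DETECTION_LIMIT

print(f"water is about {molarity:.1f} Molar")
print(f"about {molecules_per_ml:.1e} water molecules per milliLiter")
print(f"about {waters_per_rna:.1e} water molecules per detectable RNA copy")
```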

Why Drug Discovery Is So Hard – Reason #22b — Drugs aren’t always doing the things we think they are

One of the things the AIDS virus does to make ‘curing’ AIDS so difficult is hiding. It integrates a DNA copy of its RNA genome into the genome of immune cells (and God knows what else) where it just sits quietly. Activation of the immune cell to fight infection often leads to emergence and production of more virus. One promising mode of therapy is preventing the DNA copy from entering our genome in the first place. The AIDS virus (aka HIV1) produces a protein called Integrase which does that. This has led to the development of integrase inhibitors.

[ Proc. Natl. Acad. Sci. vol. 110 pp. 8327 – 8328, 8690 – 8695 ’13 ] The HIV1 integrase is targeted to sites in chromatin by the host protein LEDGF (Lens Epithelium Derived Growth Factor, aka p75). This work shows that the integrase inhibitors blocking the interaction of LEDGF/p75 (a transcriptional coactivator) with the integrase do something else as well: they cause AIDS viruses under construction within the cell to assemble into a noninfectious structure. This happens long after integration and expression of viral RNA and protein. It is thought that the integrase inhibitors inappropriately stabilize integrase dimers in the viral assembly process.

Who knew? They weren’t designed to do that.

For two more examples along these lines please see



We wouldn’t exist if retroviruses weren’t moving around in our genome.

Time for some of the excellent molecular biology I’ve put off writing about while I plow through the new Clayden.  I reached the halfway point today (p. 590), exactly 2 months and 2 weeks after it arrived.  The chemist might need some brushing up on DNA and messenger RNA before pushing on.  Pretty much all the background needed is found in https://luysii.wordpress.com/2010/07/07/molecular-biology-survival-guide-for-chemists-i-dna-and-protein-coding-gene-structure/ and https://luysii.wordpress.com/2010/07/11/molecular-biology-survival-guide-for-chemists-ii-what-dna-is-transcribed-into/.

Everyone has heard of the AIDS virus.  It has so far been impossible to cure because it hides in our DNA doing next to nothing.  Tickle it in a variety of unknown ways, and its DNA is transcribed into messenger RNA (mRNA), the virus is assembled and goes on to wreak havoc with our immune system.  How does the AIDS virus get into our DNA in the first place?  Its genome is made of RNA, not DNA.  It has an enzyme (reverse transcriptase) which transcribes its RNA into DNA, and another enzyme (the integrase, which is actually a complex of proteins) which patches the DNA copy (called cDNA) into our genome.  That’s why we can’t get rid of it.  That’s also why it’s called a retrovirus (because of retrograde transcription of its RNA into cDNA).

Well, sorry to say, but at least 10% of our DNA is made of retrovirus remnants.  The vast majority of them have been crippled by mutation so their reverse transcriptases  don’t work any more, or there is something wrong with their integrase, etc. etc.  Some of them do make RNA copies of themselves however, but the copies are mutated enough that infectious virus doesn’t form.  But the RNA copies can be reverse transcribed  into cDNA and reinserted back into our DNA, and in a new site to boot.  This is why they are called retrotransposons.

The whole bunch of retroviruses, retrotransposons, and other repetitive elements of DNA have been called ‘junk’ by eminent authority.  Another epithet for them is the selfish gene — which exists only to reproduce itself.  Humans are said to be machines for reproducing human DNA.

Enter [ Cell vol. 150 pp. 7 – 9, 29 – 38 ’12 ].  Now it’s time for some very human biology.  The fetus represents an immunologically different graft to the mother.  Half its antigens are tolerated because they are maternal, the paternal half are not likely to be.  Allogeneic means a transplant from a different member of the same species, so the fetus is regarded as semiallogeneic.

So why doesn’t our immune system attack the placenta surrounding the fetus, which expresses the paternal proteins?  There’s probably a lot more to it, but immune cells of a class called regulatory T cells (Tregs) shut down the immune response wherever they are found, and the placenta has lots of them.

Different cells express different proteins, and Tregs are no exception. A transcription factor is something that binds to the DNA in front of a gene, turning on transcription of the gene, ultimately increasing production of the protein the gene codes for. Specificity is obtained by the transcription factor binding to particular sequences of DNA, which are found only in front of a subset of genes.

The transcription factor which turns on genes necessary to turn an immune cell into a Treg is called Foxp3.  Foxp3 is a protein, and to have lots of it around the gene for it must be turned on so its mRNA can be made.  Guess what?  This means that other transcription factors must bind in front of the Foxp3 gene.

Here’s Jonathan Swift on the subject:
“So nat’ralists observe, a flea
Hath smaller fleas that on him prey,
And these have smaller fleas that bite ’em,
And so proceed ad infinitum.”

An important protein like Foxp3 is highly controlled.  There are 3 distinct regions in front of the gene where other transcription factors and repressors of transcription bind.  They are called conserved nonCoding sequences (CNSs), an oxymoron, because they are clearly coding for something quite important. The 3 sequences are called CNS1, CNS2 and CNS3.    Technology has progressed to the point where we can remove just about any DNA sequence from the mouse genome we wish (the resultant mice are called knockout mice).

Anyway, if you knock out CNS1 the mice resorb semiallogeneic fetuses (where the father and the mother aren’t genetically related), but not syngeneic fetuses (where the genomes of the father and the mother are pretty much the same due to inbreeding).  It’s possible to trace Foxp3 far back in evolution.  Only animals with placentas (eutherians) have CNS1 in addition to CNS2 and CNS3. Marsupials, which don’t have placentas, just have CNS2 and CNS3.

So where do retrotransposons come in?  The structure of CNS1 shows that it is a retrotransposon which moved in front of the Foxp3 gene.  It mutated enough for a new and different set of transcription factors to bind to it and turn on Foxp3 expression in the placenta allowing survival of the fetus.  Some Junk DNA indeed !