
Why the news about SARS-CoV-2 mutations is actually good

How can the latest news about mutations in the pandemic virus be good? Simply this: there are so many known ones that it’s almost certain that nearly every possible mutation has already arisen out there, and since not much has changed about the lethality of the virus, none of them are that bad. Not only that, but the ones that haven’t occurred must be lethal to the virus and will give us new ideas about how to attack it.

Here is a link to an article (vol. 585 pp. 174 – 177 ’20) in the current 10 September Nature: https://media.nature.com/original/magazine-assets/d41586-020-02544-6/d41586-020-02544-6.pdf.  Hopefully it’s not behind a paywall. It’s definitely worth a read.

You’ve probably heard about the D614G mutation in the spike protein of the virus. It came out of nowhere and has taken over worldwide, even in areas where different forms of the viral genome were previously established. D is the one-letter abbreviation for aspartic acid, one of the twenty amino acids, and G stands for another one, glycine. This immediately makes bells ring for the chemist, because glycine is the smallest amino acid, having a single hydrogen atom for its side chain, while the side chain of aspartic acid (a CH2-COOH group) contains 2 carbons, 3 hydrogens and 2 oxygens. So there’s a lot more room in the protein where aspartic acid used to be.

Whether or not the mutation made the virus more infectious still isn’t known. It appears to be more infectious in studies using pseudoviruses. Not everyone has a high-level containment facility, so people work with the AIDS virus (HIV-1), which doesn’t need one, and simply change one of its proteins to the spike protein of the pandemic virus (yes, we have the technology to do that). Then they infect cells with the pseudovirus. Translating this to whole organisms (us) with the real virus requires a leap of faith. It’s a long leap, but pseudoviruses are the best thing we have at present.

Here are three quotes from the article:

“More than 90,000 isolates have been sequenced and made public (see http://www.gisaid.org). ”

“Two SARS-CoV-2 viruses collected from anywhere in the world differ by an average of just 10 RNA letters out of 29,903,”

“Researchers have catalogued more than 12,000 mutations in SARS-CoV-2 genomes. ”

How many mutations are possible in the viral genome? Just 29,903 times 3, or 89,709, because at each position the element normally there can change to only 3 others. The viral genome is made of RNA, a linear chain of only 29,903 nucleotides, and each nucleotide can be uracil (U), adenine (A), guanine (G) or cytosine (C). That’s it. Proteins, by contrast, can have 20 different amino acids at each position.

So 13% of all possible mutations have been found in the virus, out of only 90,000 completely sequenced genomes. There are now 28,000,000 cases out there, so with roughly 300 times more virus out there to sequence, it’s almost certain that nearly all of the other 78,000 or so possible mutations have already occurred somewhere in the world.
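The arithmetic in the last two paragraphs can be checked with a few lines of Python. This is only a sketch: it counts single-letter substitutions (as the post does) and ignores insertions and deletions, and the constant and function names are made up for the illustration. The figures themselves are the ones quoted above.

```python
# Figures quoted in the post (from the Nature article and the case count cited)
GENOME_LENGTH = 29_903     # RNA letters in the SARS-CoV-2 genome
CATALOGUED = 12_000        # distinct mutations catalogued so far
SEQUENCED = 90_000         # isolates sequenced and made public
CASES = 28_000_000         # approximate number of cases cited in the post

# Assumption: only point substitutions count, so each of A, C, G, U
# can change to exactly 3 other letters.
ALTERNATIVES_PER_SITE = 3


def substitution_summary():
    """Return the counts and ratios behind the post's back-of-envelope argument."""
    possible = GENOME_LENGTH * ALTERNATIVES_PER_SITE
    return {
        "possible": possible,
        "fraction_seen": CATALOGUED / possible,
        "not_yet_seen": possible - CATALOGUED,
        "cases_per_sequenced_genome": CASES / SEQUENCED,
    }


if __name__ == "__main__":
    for name, value in substitution_summary().items():
        print(f"{name}: {value:,.3f}" if isinstance(value, float) else f"{name}: {value:,}")
```

The ratio of cases to sequenced genomes is the number that drives the "almost certain" claim: the more unsequenced virus there is, the more likely the uncatalogued substitutions have already arisen somewhere.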

How can this be good news? Because if any of them were truly horrible, we’d know about it. It would have taken over just the way the D614G mutation did.

But there’s even more to be gleaned from this work.  Hopefully http://www.gisaid.org is continuing to accumulate more and more sequences from all over the world.  Suppose certain mutations don’t show up.   This means they are fatal to an infectious virus.  Since we know exactly what proteins the virus is making and what stretch of the genome makes each one, this should suggest  clear lines of attack into the virus.

A very UNtheoretical approach to cancer diagnosis

We have tons of different antibodies in our blood. Without even taking mutation into account, we have 65 heavy chain genes, 27 diversity segments, and 6 joining regions for them (making 10,530 possibilities). Then there are 40 genes for the kappa light chains and 30 for the lambda light chains, for over 1,200 * 10,530 combinations. That’s without the mutations we know occur to increase antibody affinity. So the number of antibodies probably ramming around in our blood is over a billion (I doubt that anyone has counted them, just as no one has ever counted the neurons in our brain). Antibodies can bind to anything, including sugars and fats, but we think of them as mostly binding to protein fragments.
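For readers who want to see the multiplication spelled out, here is a minimal sketch. The segment counts are the ones given in the paragraph above; the light-chain factor of 1,200 is taken from the post as given rather than derived, and the function name is invented for the example.

```python
# Gene segment counts as stated in the post
HEAVY_V = 65      # heavy chain genes
HEAVY_D = 27      # diversity segments
HEAVY_J = 6       # joining regions

# The post's light-chain multiplier, used here as given (not derived)
LIGHT_CHAIN_FACTOR = 1_200


def germline_combinations():
    """Return (heavy chain combinations, heavy x light combinations), before any mutation."""
    heavy = HEAVY_V * HEAVY_D * HEAVY_J
    return heavy, heavy * LIGHT_CHAIN_FACTOR


if __name__ == "__main__":
    heavy, total = germline_combinations()
    print(f"heavy chain combinations: {heavy:,}")
    print(f"with light chains:        {total:,}")
```

On these numbers the unmutated repertoire is in the tens of millions, so the step from there to "over a billion" rests on the affinity-increasing mutations the paragraph mentions.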

We also know that cancer is characterized by mutations, particularly in the genes coding for proteins. Many of these mutations have never been seen by the immune system, so they act as neoantigens. So what [ Proc. Natl. Acad. Sci. vol. 111 pp. E3072 – E3080 ’14 ] did was make a chip containing 10,000 peptides, and see which of them were bound by antibodies in the blood.

The peptides were 20 amino acids long, with 17 randomly chosen amino acids, and a common 3 amino acid linker to the chip. While 10,000 seems like a lot of peptides, it is a tiny fraction (actually about 10^-18) of the 20^17 = 2^17 * 10^17 = 1.3 * 10^22 possible 17 amino acid peptides.
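The size of that peptide space, and how little of it the chip samples, is easy to verify. The sketch below assumes all 20 standard amino acids are allowed at each of the 17 random positions; the names are hypothetical.

```python
ALPHABET_SIZE = 20         # standard amino acids, assumed equally available
RANDOM_POSITIONS = 17      # randomly chosen residues per peptide
PEPTIDES_ON_CHIP = 10_000  # peptides on the array described in the post


def sequence_space(alphabet=ALPHABET_SIZE, length=RANDOM_POSITIONS):
    """Number of distinct peptides of the given length over the alphabet."""
    return alphabet ** length


def chip_coverage(n_peptides=PEPTIDES_ON_CHIP):
    """Fraction of the sequence space represented on the chip."""
    return n_peptides / sequence_space()


if __name__ == "__main__":
    print(f"possible 17-mers: {sequence_space():.2e}")
    print(f"chip coverage:    {chip_coverage():.1e}")
```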

The blood was first diluted 500x so blood proteins other than antibodies don’t bind significantly to the arrays. The assay is disease agnostic. The pattern of binding of a given person’s blood to the chip is called an immunosignature.

What did they measure? 20 samples from each of five cancer cohorts collected from multiple geographic sites and 20 noncancer samples. A reference immunosignature was generated. Then 120 blinded samples from the same diseases gave 95% classification accuracy. To investigate the breadth of the approach and test sensitivity, the immunosignatures of 75% of over 1,500 historical samples (some over 10 years old) comprising 14 different diseases were used as training, then the other 25% were read blind with an accuracy of over 98%. That is not too impressive; they need to get another 1,500 samples. Once you’ve trained on 75% of the sample space, you’d pretty much expect the other 25% to look the same.

The immunosignature of a given individual consists of an overlay of the patterns from the binding signals of many of the most prominent circulating antibodies. Some are present in everyone, some are unique.

A 2002 reference (Molecular Biology of the Cell 4th Edition) states that there are 10^9 antibodies circulating in the blood. How can you pick up a signature on 10K peptides from this? Presumably neoantigens from cancer cells elicit higher affinity antibodies than self-antigens. High affinity monoclonals can be diluted hundreds of times without diminishing the signal.

The next version of the immunosignature peptide microarray under development contains over 300,000 peptides.

The implication is that each cancer and each disease produces either different antigens and/or different B cell responses to common antigens.

Since the peptides are random, you can’t align the peptides in the signature to the natural proteomic space to find out what the antibody is reacting to.

It’s a completely atheoretical approach to diagnosis, but intriguing. I’m amazed that such a small sample of protein space can produce a significant binding pattern diagnostic of anything.

It’s worth considering just what a random peptide of 17 amino acids actually is. How would you make one up? Would you choose randomly, giving all 20 amino acids equal weight, or would you weight the probability of a choice by the percentage of that amino acid in the proteome of the tissue you are interested in? Do we have such numbers? My guess is that proline, glycine and alanine would be the most common amino acids. There is so much collagen around, and these 3 make up a high percentage of the amino acids in the various collagens we have (over 15 at least).
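The two options can be made concrete with a short sketch. It is not the method used in the paper, which this post doesn't describe. The weights below are invented purely for illustration (glycine, proline and alanine are arbitrarily made three times as likely as the rest) and are not measured proteome frequencies. The function and variable names are hypothetical, and the 3 amino acid linker is left out.

```python
import random

# One-letter codes for the 20 standard amino acids
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_peptide(length=17, weights=None, rng=None):
    """Draw a peptide; weights=None gives every amino acid equal probability."""
    if rng is None:
        rng = random.Random()
    return "".join(rng.choices(AMINO_ACIDS, weights=weights, k=length))


# Made-up weights for illustration only: G, P and A three times as likely
# as the others. Real values would have to come from measured composition data.
ILLUSTRATIVE_WEIGHTS = [3.0 if aa in "GPA" else 1.0 for aa in AMINO_ACIDS]

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    print("uniform: ", random_peptide(rng=rng))
    print("weighted:", random_peptide(weights=ILLUSTRATIVE_WEIGHTS, rng=rng))
```

Swapping in real composition data would only mean replacing the weight list, so the uniform and weighted choices differ in one argument.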