
The pandemic virus as evolution professor

Like it or not, the pandemic virus (SARS-CoV-2) is giving us all lessons in evolution and natural selection. The latest is one of the clearest examples of natural selection you are likely to see. It is very clear cut, but to leave almost no one behind, I’m going to put in a lot of background material which will bore the cognoscenti. They can skip all this and go straight to the meat of the issue after the *****.

The genetic code is read in groups of 3.  Imagine a language in which all words must be 3 letters long. 

The dog ate the fat cat who bit the toe off one mad rat.   Call this the reading frame, in which all the words make sense to you.

Any combination of 3 letters means something to the machinery inside the cell responsible for reading the code, so deleting the f in fat 

gives us 

The dog ate the atc atw hob itt het oeo ffo nem adr at.   So this is a shift of 1 from the reading frame.  While it may not make sense to you, it makes sense to the cellular machinery. 
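For readers who like to tinker, here is a minimal Python sketch of the same idea. It treats the sentence as one unbroken string of letters (spaces and punctuation dropped), which is roughly how the cell sees a gene, and rereads it in 3-letter words after a deletion. The helper names `letters_only`, `read_in_threes` and `delete` are made up for this illustration.

```python
# Hypothetical helpers for illustration: the reader sees one unbroken string
# of letters and reads it three at a time from a fixed starting point.
SENTENCE = "The dog ate the fat cat who bit the toe off one mad rat."


def letters_only(sentence: str) -> str:
    """Drop spaces and punctuation, leaving one unbroken string of letters."""
    return "".join(ch for ch in sentence if ch.isalpha())


def read_in_threes(letters: str) -> str:
    """Chop the letters into 3-letter words, starting from the first letter."""
    return " ".join(letters[i:i + 3] for i in range(0, len(letters), 3))


def delete(letters: str, start: int, count: int) -> str:
    """Remove `count` consecutive letters beginning at index `start`."""
    return letters[:start] + letters[start + count:]


letters = letters_only(SENTENCE)
fat = letters.index("fat")  # index of the f in "fat"
print(read_in_threes(letters))                  # The dog ate the fat cat who bit the toe off one mad rat
print(read_in_threes(delete(letters, fat, 1)))  # The dog ate the atc atw hob itt het oeo ffo nem adr at
```

In the same sketch, `delete(letters, 29, 2)` (the e of toe and the o of off) and `delete(letters, 28, 3)` (the o and e of toe plus the o of off) reproduce the 2- and 3-letter examples that follow.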

Now let’s delete 2 letters in a row (the e of toe and the o of off):

The dog ate the fat cat who bit the tof fon ema dra t.  

Not much sense after the deletion, is there? Or at least a completely different message. This is a shift of 2 from the reading frame.

Now delete 3 letters in a row (the o and e of toe plus the o of off) from the original:

The dog ate the fat cat who bit the toe off one mad rat.  

This gives 

The dog ate the fat cat who bit the tff one mad rat.  

This has a funny-looking word (tff) but leaves the rest of the 3-letter words intact (one mad rat). This is called an in-frame deletion. It basically lops out a single 3-letter word.

Lopping out 4, 5, 6, ... letters just gives you one of the 3 patterns shown above (a frameshift of 1, a frameshift of 2, or no frameshift at all), nothing new.
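A quick check of that claim, again as a hypothetical Python sketch built on the same sentence: delete 1 to 12 letters starting at the d of dog and look at the last three words the reader ends up with. The starting point (index 3) is an arbitrary choice; what matters is the number of letters removed.

```python
# Sketch: does the end of the sentence survive a deletion of `count` letters?
SENTENCE = "The dog ate the fat cat who bit the toe off one mad rat."
LETTERS = "".join(ch for ch in SENTENCE if ch.isalpha())


def read_in_threes(letters: str) -> list[str]:
    """Chop a string of letters into consecutive 3-letter words."""
    return [letters[i:i + 3] for i in range(0, len(letters), 3)]


def tail_after_deletion(count: int, start: int = 3) -> list[str]:
    """Delete `count` letters beginning at `start` (index 3 is the d of "dog")
    and return the last three words the reader ends up with."""
    shortened = LETTERS[:start] + LETTERS[start + count:]
    return read_in_threes(shortened)[-3:]


for count in range(1, 13):
    tail = " ".join(tail_after_deletion(count))
    print(f"{count:2d} letters deleted (shift {count % 3}): ... {tail}")
```

Only the rows where the count is a multiple of 3 end in "one mad rat"; every other row ends in a fragment, because the remainder after dividing by 3 is all that decides the frame.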

*****

Now the business end of the pandemic virus is the spike protein, and that is where the mutations everyone is worried about occur. The spike protein binds to another protein (ACE2) on the surface of human cells, and then the virus enters, causing havoc. All the vaccines we have are directed against the spike protein.

The spike protein is big: 1,273 amino acids, each one spelled out by a 3-letter word.

Mutations occur randomly. We now have something called GISAID (Global Initiative on Sharing All Influenza Data), which holds well over 100,000 genome sequences of the virus.

Other things being equal, we should see as many 1, 4 (3 + 1), 7 (2 × 3 + 1), 10, ... letter deletions as 2, 5 (3 + 2), 8 (2 × 3 + 2), 11, ... letter deletions, and as many of those as 3, 6, 9, 12, ... letter deletions.

The set 1, 4, 7, 10, ... represents a shift of 1 from the original reading frame, the set 2, 5, 8, 11, ... represents a frameshift of 2, and the set 3, 6, 9, ... represents deletions producing no frameshift at all.

Since thousands upon thousands of experiments show that mutations occur randomly, 1/3 of all deletion mutations should produce a frameshift of 1, 1/3 should produce a frameshift of 2, and 1/3 should produce no frameshift at all.
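To see where the thirds come from, here is a small simulation. It is a sketch under an explicit assumption: deletion lengths are drawn uniformly at random from 1 to 30 letters. That range is invented for illustration (it is not taken from GISAID), but any upper limit that is a multiple of 3 gives equal thirds in expectation. Each simulated length is sorted by its remainder after dividing by 3.

```python
import random
from collections import Counter


def frame_shift_shares(n_deletions: int, max_len: int, seed: int = 0) -> dict[int, float]:
    """Simulate deletions whose lengths are uniform on 1..max_len (an assumption)
    and return the share in each class: 0 = in frame, 1 or 2 = that frameshift."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    counts = Counter(rng.randint(1, max_len) % 3 for _ in range(n_deletions))
    return {shift: counts[shift] / n_deletions for shift in (0, 1, 2)}


shares = frame_shift_shares(100_000, max_len=30)
for shift, share in shares.items():
    label = "no frameshift" if shift == 0 else f"frameshift of {shift}"
    print(f"{label:>15}: {share:.3f}")
```

Each share comes out close to 0.333, which is the baseline the real data below should be compared against.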

Well, the authors of Science vol. 371 pp. 1139–1142 ’21 looked at 146,795 viral sequences and found 1,108 deletions in the gene for the spike protein.

They did not find each of the 3 types of deletions occurring to the same extent (1/3 of the time). Among all deletions, 93% were in frame.
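How lopsided is 93%? Here is a rough back-of-the-envelope check in Python, assuming (as above) that each deletion has a 1/3 chance of being in frame. The count of about 1,030 in-frame deletions is reconstructed from the rounded 93% figure rather than taken from the paper's own tables, and the z-score uses the normal approximation to the binomial, which is crude this far out in the tail but gives a feel for the size of the gap.

```python
import math

deletions = 1108          # spike-gene deletions reported in the Science paper
in_frame_fraction = 0.93  # rounded figure quoted in the text
p_null = 1 / 3            # chance a deletion is in frame if lengths are random

observed = round(deletions * in_frame_fraction)    # about 1,030 in-frame deletions
expected = deletions * p_null                      # about 369 if nothing were selecting
sd = math.sqrt(deletions * p_null * (1 - p_null))  # binomial standard deviation
z = (observed - expected) / sd                     # how many SDs above chance

print(f"observed in frame:  {observed}")
print(f"expected by chance: {expected:.0f}")
print(f"excess: about {z:.0f} standard deviations")
```

Anything beyond 3 or 4 standard deviations is already considered very unlikely by chance; this comes out at roughly 42.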

Why? Because out-of-frame deletions change everything that comes after them.

Recall

The dog ate the atc atw hob itt het oeo ffo nem adr at.  

This means that a functional spike protein won’t be formed, the virus won’t infect our cells, and it certainly won’t be found in GISAID.

Ladies and Gentlemen you have just witnessed natural selection in action. 

Actually, it’s even more complicated, and even more impressive, than that. The in-frame deletions occurred in one of four regions, which happen to be where antibodies to the spike protein bind. Removing part of a binding site can help the virus escape those antibodies, so the out-of-frame deletions were selected against, and the in-frame deletions were selected for.

The blind watchmaker in action.

Here is another way to see how improbable it is that chance alone would pick one of 3 equally probable possibilities 93% of the time. Imagine throwing a single die 100 times and getting one of two numbers (say 3 and 6) 93 times. You would know the die is loaded. In the case of genome deletions, the load is natural selection.
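The dice odds can be worked out exactly. This Python sketch sums binomial probabilities using exact fractions for a fair die, where a 3 or a 6 turns up with probability 1/3 on each throw. `prob_at_least` is a helper written for this illustration, and the thresholds printed are just a few sample values.

```python
from fractions import Fraction
from math import comb


def prob_at_least(hits: int, throws: int, p: Fraction) -> Fraction:
    """Exact chance of at least `hits` successes in `throws` independent tries,
    each succeeding with probability `p` (the upper tail of a binomial)."""
    return sum(
        (comb(throws, k) * p**k * (1 - p) ** (throws - k) for k in range(hits, throws + 1)),
        Fraction(0),
    )


fair = Fraction(1, 3)  # a fair die shows 3 or 6 on one throw in three
for hits in (40, 93, 97):
    chance = prob_at_least(hits, 100, fair)
    print(f"at least {hits} of 100 throws land on 3 or 6: {float(chance):.1e}")
```

Getting 40 or more is unremarkable, but 93 or more is far less likely than 1 in 10^30, which is why you would conclude the die is loaded.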