
What can dogs tell us about cancer, and (wait for it) sexually transmitted disease

What can 546 dogs tell us about cancer and sexually transmitted diseases (STDs)?  An enormous amount! [ Science vol 365 pp. 440 – 441, 464 3aau9923 1 –> 7 ’19 ].  You may have heard about the transmissible tumor that has reduced the Tasmanian devil population by 80% since it appeared in ’96.  The animals bite each other, transmitting the tumor.  Only 10 – 100 cells are transferred, but death occurs within a year.  The cells survive because Tasmanian devils have low genetic diversity.

The work concerns a much older transmissible tumor, Canine Transmissible Venereal Tumor (CTVT), which appeared in Asia an estimated 6,000 years ago and began dispersing worldwide 2,000 years ago.   Unlike the Tasmanian devil tumor, CTVT is usually cleared by the immune system.

The Science paper has 80+ authors from all over the world, who sequenced the protein coding part of the dog genome (the exome) to a depth of more than 100-fold.   The exome contains 43.6 megabases.   The tumor is transmitted by sex, and the authors note that this mode of transmission nearly requires a rather indolent clinical course, as the animal must survive long enough to transmit the tumor again.  This fits with syphilis, AIDS, and gonorrhea.  Contrast this with anthrax, cholera, and plague, which spread differently and kill much faster.

So what does CTVT tell us about cancer?   Quite a bit.  First some background.  The Cancer Genome Atlas (TCGA) was criticized as being a boondoggle, but it at least gave us an idea of how many mutations are present in various cancers: around 100 in colon and breast cancers.

Viewed across all dogs, the CTVT genome is riddled with somatic mutations (as compared to the genome of the dog carrying the tumor): 148,030 single nucleotide variants (3.4 per 1,000 bases of exome!) and 12,177 insertions/deletions.  Of the 20,000 dog genes, only 2,000 didn’t contain a mutation.   This implies that most genes in the mammalian genome aren’t needed by the cancer cells.  The CTVTs also show no signs of the high rates of chromosomal instability seen in human tumors.
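The density figure is easy to check from the numbers quoted in this post.  What follows is only a back-of-the-envelope sketch; the variable and function names are made up for illustration, and the inputs are the post's figures (43.6 megabases of exome, 148,030 SNVs, 2,000 of 20,000 genes unmutated), not values re-derived from the paper.

```python
# Figures as quoted in the post (not re-derived from the paper).
EXOME_BASES = 43.6e6      # size of the dog exome in bases
SNVS = 148_030            # somatic single nucleotide variants across all tumors
INDELS = 12_177           # somatic insertions/deletions
TOTAL_GENES = 20_000
UNMUTATED_GENES = 2_000


def per_kilobase(count, bases):
    """Number of events per 1,000 bases of sequence."""
    return count / bases * 1000


snv_density = per_kilobase(SNVS, EXOME_BASES)
indel_density = per_kilobase(INDELS, EXOME_BASES)
mutated_fraction = (TOTAL_GENES - UNMUTATED_GENES) / TOTAL_GENES

print(f"SNVs per 1,000 bases:   {snv_density:.1f}")
print(f"Indels per 1,000 bases: {indel_density:.2f}")
print(f"Genes with a mutation:  {mutated_fraction:.0%}")
```

The SNV density rounds to the 3.4 per 1,000 quoted above, and the gene counts mean roughly 9 genes in 10 carry a mutation in at least one tumor.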

The work provides evidence that cancer isn’t inherently progressive.  This gives hope that some relatively indolent human cancers (say cancer of the prostate) can be controlled.  This calls for ‘adaptive therapy’  — something that limits tumor  growth rather than trying to kill every cancer cell with curative therapy which, if it fails, essentially selects for more aggressive cancer cells.

Some 14,412 genes have 1 mutation changing the amino acid sequence (nonSynonymous) and 5,704 have protein truncating mutations.  The ratio of synonymous to nonSynonymous mutations is about 3, implying that the mutations which have arisen haven’t been selected for.  After all, the triplet code has 64 possibilities for 20 amino acids and 1 stop signal, so the average amino acid has about 3 codons for it.  This is called neutral genetic drift.

They also found 5 mutated genes present in all 541 tumors.  These are the driver mutations; 3 are well known: MYC, PTEN, and retinoblastoma 1.

Tons to think about here.  I’ll be away for a few weeks traveling and playing music, but this work should keep you busy thinking about its implications.



How Badly are Thy Genomes, Oh Humanity — take II

With apologies to Numbers 24:5 (“How goodly are thy tents, Oh Jacob”), a recent paper shows how shockingly error-ridden our genomes actually are [ Science vol. 360 pp. 327 – 331  ’18 ].  I’d written about this in 2012 (see the end), but technology has marched on.  Back then only the parts of the genome coding for protein (the exome) were sequenced.  The present work did whole genome sequencing (WGS) to a mean coverage of 40+ (i.e. they also sequenced the other 98 percent of the genome).

The authors were studying families in which one or more children had autism spectrum disorder (ASD) to find genome abnormalities which might have caused the ASD.  They were looking for structural variants (SVs), by which they mean “biallelic deletion, tandem duplications, inversions, four classes of complex SV, and four families of mobile element insertions”.

Why?  Because studying proteins alone doesn’t tell you how they are controlled.  That’s in the DNA surrounding them.  Structural variants are more likely to affect control elements than the proteins themselves.

Showing how technology has marched on, they determined the whole genomes of 9,274 subjects from 2,600 families affected by ASD.

The absolutely mindboggling point in the article is the following direct quote “An average of 3746 SVs were detected per individual”.  That’s simply incredible (assuming the above isn’t a misprint).

Here’s the older post

How Badly Are Thy Genomes, Oh Humanity

With apologies to Numbers 24:5 (“How goodly are thy tents, Oh Jacob”), a recent paper shows how shockingly error-ridden our genomes actually are [ Science vol. 337 pp. 64 – 69 ’12 ].  The authors sequenced roughly three quarters of the genes coding for proteins (i.e. 15,585 protein coding genes) in some 2,439 people.  This left 98% of the genome untouched, primarily because we really don’t know what it does or how it does it, despite the fact that it controls when, where and how much of each protein is made.  So they basically looked at the bricks from which we are built (the proteins) and not the plans (the 98%).

The subjects came from two groups: 1,351 Europeans and 1,088 Africans (the latter because genetic diversity is far higher among Africans, as that’s where humanity arose and where mutations have had the longest time to accumulate).

The news is not very good. First, some background.

Recall that each nucleotide is one of four possibilities (A, T, G, C), and that each group of 3 nucleotides therefore has 4^3 = 64 possibilities.  61 of the 64 combinations code for amino acids which, since we have only 20, gives a certain redundancy to the famed genetic code.   The other 3 combinations code for no amino acid (usually) and tell the machinery making proteins to stop.  Although crucial to our existence, these are called nonsense codons.

The genetic code is therefore 3-fold degenerate (on average).  However, some amino acids are coded for by just 1 combination of 3 nucleotides while others are coded for by as many as 6.  So some single nucleotide variants (SNVs) leave the amino acid coded for the same (these are the synonymous SNVs), while others change the amino acid (nonSynonymous SNVs), and possibly protein function.
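The counts in the last two paragraphs can be tallied directly from the standard genetic code.  Here is a minimal sketch, written with DNA letters and the usual one-letter amino acid abbreviations (`*` marks a stop); the variable names are just for illustration.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # TTT ... TGG
    "LLLLPPPPHHQQRRRR"   # CTT ... CGG
    "IIIMTTTTNNKKSSRR"   # ATT ... AGG
    "VVVVAAAADDEEGGGG"   # GTT ... GGG
)

# Map each of the 4^3 = 64 triplets to its amino acid (or '*' for stop).
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

codons_per_residue = Counter(CODON_TABLE.values())
stop_codons = codons_per_residue.pop("*")          # nonsense codons
sense_codons = sum(codons_per_residue.values())    # codons for amino acids
mean_degeneracy = sense_codons / len(codons_per_residue)

single = sorted(aa for aa, n in codons_per_residue.items() if n == 1)
six = sorted(aa for aa, n in codons_per_residue.items() if n == 6)

print(f"{len(CODON_TABLE)} codons: {sense_codons} sense, {stop_codons} stop")
print(f"mean codons per amino acid: {mean_degeneracy:.2f}")
print(f"one codon only: {single}; six codons: {six}")
```

Methionine (M) and tryptophan (W) are the amino acids with a single codon; leucine (L), arginine (R) and serine (S) are the ones with six.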

Ask someone with sickle cell anemia how much trouble just one nonSynonymous SNV can cause.  It’s only 1 amino acid out of 147.  Even worse, ask someone with cystic fibrosis, where just one of 1,480 amino acids is missing.

Here’s the bad news.  In the population as a whole, they found 500,000 single nucleotide variants (SNVs).  If you’re still not sure what is meant by this, the 5 articles in https://luysii.wordpress.com/category/molecular-biology-survival-guide/ should be all the background you need.

More than 400,000 of the variants were previously unknown.  Also more than 400,000 of them were found either in Africans or Europeans but not both.  If you divide 500,000 by 2,439 you get 205 variants per person.  However, SNVs are far more common than that, and each individual contains an average of 14,000.

Well, how many of the 500,000 or so SNVs they found are nonSynonymous?  One would think about 1/3 statistically.  However, they found that more than half (292,125/500,000, nearly 60%) were nonSynonymous.

It gets worse: 6,165 of the nonSynonymous variants are nonSense codons.  This means that the protein coded for by such a gene terminates prematurely, and it can terminate anywhere.  On average one would expect that half of these nonsense codons result in a protein of less than half the normal length.   This would very likely obliterate whatever function the protein had.

Obviously, they couldn’t test all 500,000 SNVs to see how they affected protein function (and we really only have a decent idea of what half our 20,000 or so proteins are doing).  They had to guess.  They came up with a figure of 2 – 4% of the 14,000 SNVs being functionally significant.  That’s 280 – 560 significant mutations per individual.
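For anyone who wants to check the arithmetic in the last few paragraphs, here is a small sketch.  The names are invented for illustration and the inputs are the rounded figures quoted in this post (the 500,000 total is approximate), so the outputs are only as good as those.

```python
# Figures as quoted in the post; the 500,000 total is a rounded number.
TOTAL_SNVS = 500_000         # distinct SNVs found across all subjects
SUBJECTS = 2_439
NONSYNONYMOUS = 292_125
NONSENSE = 6_165
SNVS_PER_PERSON = 14_000     # average carried by one individual
FUNCTIONAL_RANGE = (0.02, 0.04)

# Naive division: pretends every variant is found in exactly one person.
naive_per_person = TOTAL_SNVS / SUBJECTS

# Share of all SNVs that change an amino acid.
nonsyn_fraction = NONSYNONYMOUS / TOTAL_SNVS

# Share of the nonSynonymous SNVs that create a stop codon.
nonsense_fraction = NONSENSE / NONSYNONYMOUS

# 2 - 4% of an individual's SNVs guessed to be functionally significant.
functional = tuple(round(f * SNVS_PER_PERSON) for f in FUNCTIONAL_RANGE)

print(f"naive SNVs per person: {naive_per_person:.0f}")
print(f"nonSynonymous share:   {nonsyn_fraction:.1%}")
print(f"nonsense share:        {nonsense_fraction:.1%}")
print(f"functional per person: {functional[0]} to {functional[1]}")
```

The naive figure comes out far below the 14,000 per individual because the same variant is typically carried by many of the subjects, so dividing the number of distinct variants by the number of people undercounts what any one person carries.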

Clearly, despite the horrible examples of cystic fibrosis and sickle cell anemia above, most of these can’t be doing very much, because these were normal people being studied.

There are all sorts of implications of this work.  One is the subject of a future post: how hard this diversity makes drug discovery.  Another reiterates the Tolstoy theme mentioned earlier about the genetic defects causing schizophrenia and autism: “Happy families are all alike; every unhappy family is unhappy in its own way”.  Thus beginneth Anna Karenina.

For details please see https://luysii.wordpress.com/2010/04/25/tolstoy-was-right-about-hereditary-diseases-imagine-that/  and  https://luysii.wordpress.com/2010/07/29/tolstoy-rides-again-autism-spectrum-disorder/

A third is that the 1000-fold expansion of the human population has largely kept natural selection from eliminating these variants.  I’ll leave it to the geneticists to figure out what this means for the eventual survival of the species, as these mutants continue to accumulate.

The paper is fascinating, and sure to change our conception of what a ‘normal’ genome actually is.  Nonetheless, all they did was follow Yogi Berra’s dictum — “You can observe a lot by watching.”   It certainly wasn’t creative or ingenious in any sense.  Sometimes grunt work like this wins the day.