
How Badly are Thy Genomes, Oh Humanity — take II

With apologies to Numbers 24:5 (“How goodly are thy tents, Oh Jacob”), a recent paper shows how shockingly error-ridden our genomes actually are [ Science vol. 360 pp. 327 – 331 ’18 ]. I’d written about this in 2012 (see the end), but technology has marched on. Back then only the parts of the genome coding for protein (the exome) were sequenced. The present work used whole genome sequencing (WGS) to a mean coverage of 40+ (each base was read 40 or more times on average), i.e. they also sequenced the other 98 percent of the genome.
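To make “mean coverage” concrete, here is a minimal Python sketch. Mean coverage is simply the total number of bases sequenced divided by the genome size. The genome size (3.1 billion bases) and read length (150 bases) are illustrative assumptions, not figures from the paper. The zero-coverage estimate uses the idealized Poisson (Lander-Waterman) model, which real sequencing biases violate.

```python
import math

# Illustrative assumptions (not from the paper): a ~3.1e9 bp human genome
# and 150 bp sequencing reads.
GENOME_BP = 3.1e9
READ_BP = 150


def mean_coverage(n_reads: float, read_bp: float, genome_bp: float) -> float:
    """Average number of reads covering each base: bases sequenced / genome size."""
    return n_reads * read_bp / genome_bp


def reads_needed(coverage: float, read_bp: float, genome_bp: float) -> float:
    """Invert mean_coverage: how many reads give the requested mean coverage."""
    return coverage * genome_bp / read_bp


def fraction_uncovered(coverage: float) -> float:
    """Idealized Poisson (Lander-Waterman) estimate of bases hit by zero reads."""
    return math.exp(-coverage)


if __name__ == "__main__":
    n = reads_needed(40, READ_BP, GENOME_BP)
    print(f"reads for 40x: {n:.3g}")  # about 8.27e8 reads
    print(f"check: {mean_coverage(n, READ_BP, GENOME_BP):.1f}x")
    print(f"bases missed at 40x (ideal model): {fraction_uncovered(40):.1e}")
```

Under these assumptions 40x takes roughly 830 million reads, and the idealized chance of missing a base entirely is vanishingly small; in practice it is the hard-to-sequence regions that go missing.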

The authors studied families in which one or more children had autism spectrum disorder (ASD), hoping to find genome abnormalities that might have caused it. Specifically, they looked for structural variants (SVs), by which they mean “biallelic deletion, tandem duplications, inversions, four classes of complex SV, and four families of mobile element insertions”.

Why?  Because studying proteins alone doesn’t tell you how they are controlled.  That’s in the DNA surrounding them.  Structural variants are more likely to affect control elements than the proteins themselves.

Showing how far technology has marched on, they sequenced the whole genomes of 9,274 subjects from 2,600 families affected by ASD.

The absolutely mind-boggling point in the article is this direct quote: “An average of 3746 SVs were detected per individual”. That’s simply incredible (assuming it isn’t a misprint).

Here’s the older post

How Badly Are Thy Genomes, Oh Humanity

With apologies to Numbers 24:5 (“How goodly are thy tents, Oh Jacob”), a recent paper shows how shockingly error-ridden our genomes actually are [ Science vol. 337 pp. 64 – 69 ’12 ]. The authors sequenced roughly three quarters of the protein-coding genes (15,585 of them) in some 2,439 people. This left 98% of the genome untouched, primarily because we really don’t know what it does or how it does it, despite the fact that it controls when, where and how much of each protein is made. So they basically looked at the bricks from which we are built (the proteins) and not the plans (the 98%).

The subjects came from two groups: 1,351 Europeans and 1,088 Africans. Africans were included because genetic diversity is far higher among them, as Africa is where humanity arose and where mutations have had the longest time to accumulate.

The news is not very good. First, some background.

Recall that each nucleotide is one of four possibilities (A, T, G, C), so each group of 3 nucleotides (a codon) has 4^3 = 64 possibilities. 61 of the 64 combinations code for amino acids; since there are only 20 amino acids, the famed genetic code has a certain redundancy. The other 3 combinations code for no amino acid (usually) and tell the machinery making proteins to stop. Although crucial to our existence, these are called nonsense codons.
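The counting above can be checked with a short Python sketch. It builds the standard genetic code from the compact 64-letter string used for NCBI translation table 1 (bases ordered T, C, A, G, first base varying slowest); the variable names are my own.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1). Codons are listed in the
# order TTT, TTC, TTA, TTG, TCT, ... ; '*' marks a stop (nonsense) codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

counts = Counter(CODE.values())   # codons per amino acid (plus '*')
n_stop = counts.pop("*")          # the 3 nonsense codons
n_sense = sum(counts.values())    # codons that specify an amino acid

print(len(CODE), n_sense, n_stop, len(counts))            # 64 61 3 20
print(round(n_sense / len(counts), 2))                    # 3.05 codons per amino acid
print(sorted(aa for aa, n in counts.items() if n == 1))   # ['M', 'W']
print(sorted(aa for aa, n in counts.items() if n == 6))   # ['L', 'R', 'S']
```

The two singly coded amino acids are methionine and tryptophan; leucine, serine and arginine each have 6 codons.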

The genetic code is therefore about 3-fold degenerate on average (61 codons for 20 amino acids). However, some amino acids are coded for by just 1 combination of 3 nucleotides while others are coded for by as many as 6. So some single nucleotide variants (SNVs) leave the amino acid unchanged (these are the synonymous SNVs), while others change it (nonSynonymous SNVs), possibly altering protein function.
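Here is a sketch of how a single nucleotide change gets classified, again in Python and assuming the standard genetic code. The helper `classify_snv` is hypothetical, written for illustration; it splits nonSynonymous changes into missense (a different amino acid) and nonsense (a stop). The sickle cell mutation, GAG to GTG in codon 6 of beta-globin (glutamic acid to valine), makes a convenient test case.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_snv(codon: str, position: int, new_base: str) -> str:
    """Classify a single-nucleotide change at 0-based `position` of a sense codon."""
    codon = codon.upper()
    old_aa = CODE[codon]
    if old_aa == "*":
        raise ValueError("expected a sense (amino-acid-coding) codon")
    mutant = codon[:position] + new_base.upper() + codon[position + 1:]
    if mutant == codon:
        raise ValueError("new_base must differ from the original base")
    new_aa = CODE[mutant]
    if new_aa == old_aa:
        return "synonymous"
    if new_aa == "*":
        return "nonsense"   # nonSynonymous: premature stop
    return "missense"       # nonSynonymous: a different amino acid


print(classify_snv("GAG", 1, "T"))  # missense (GTG = Val), the sickle cell change
print(classify_snv("GAG", 2, "A"))  # synonymous (GAA is also Glu)
print(classify_snv("GAG", 0, "T"))  # nonsense (TAG is a stop codon)
```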

Ask someone with sickle cell anemia how much trouble just one nonSynonymous SNV can cause: it changes only 1 amino acid out of 147. Even worse, ask someone with cystic fibrosis, where in the most common form just one of 1,480 amino acids is missing.

Here’s the bad news.  In the population as a whole, they found 500,000 single nucleotide variants (SNVs).  If you’re still not sure what is meant by this, the 5 articles in https://luysii.wordpress.com/category/molecular-biology-survival-guide/ should be all the background you need.

More than 400,000 of the variants were previously unknown. Also, more than 400,000 of them were found in either Africans or Europeans but not both. If you divide 500,000 by 2,439 you get 205 variants per person. However, the 500,000 counts each distinct variant only once, and many variants are carried by many people, so the real per-person figure is far higher: each individual carries an average of 14,000.
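Why 205 and 14,000 don’t conflict can be seen with a toy example in Python. The data below are entirely hypothetical: three people and five distinct variants, two of which everyone shares.

```python
# Toy, hypothetical data: which variant IDs each of three people carries.
carriers = {
    "person_1": {"v1", "v2", "v3"},
    "person_2": {"v1", "v2", "v4"},
    "person_3": {"v1", "v2", "v5"},
}

distinct = set().union(*carriers.values())          # each variant counted once
naive_per_person = len(distinct) / len(carriers)    # "divide the total by the people"
actual_per_person = sum(len(v) for v in carriers.values()) / len(carriers)

print(len(distinct), round(naive_per_person, 2), actual_per_person)  # 5 1.67 3.0
```

Shared variants are counted once in the total but once per carrier in each genome, which is why dividing the total by the number of people undercounts.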

Well, how many of the 500,000 or so SNVs they found are nonSynonymous? One would think about 1/3 statistically. However, they found that more than half (292,125/500,000, nearly 60%) were nonSynonymous.

It gets worse: 6,165 of the nonSynonymous variants create nonsense (stop) codons. This means that the protein coded for by such a gene terminates prematurely, and the premature stop can fall anywhere along it. On average one would expect half of these nonsense codons to result in a protein less than half the normal length. This would very likely obliterate whatever function the protein had.
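The “half of them, less than half the length” estimate follows if a premature stop is equally likely to land at any codon. That is a simplifying assumption (real nonsense variants aren’t uniformly distributed). A minimal Python sketch, with a hypothetical helper name:

```python
def fraction_shorter_than_half(protein_len: int) -> float:
    """Fraction of possible premature stop positions that leave < half the protein.

    Assumes (a simplification) that the stop is equally likely to replace any
    of the protein_len codons; a stop at codon k leaves k - 1 residues.
    """
    truncated_lengths = range(protein_len)  # k - 1 for k = 1..protein_len
    short = sum(1 for n in truncated_lengths if n < protein_len / 2)
    return short / protein_len


print(fraction_shorter_than_half(1480))            # 0.5 (CFTR-sized protein)
print(round(fraction_shorter_than_half(147), 3))   # 0.503 (beta-globin-sized)
```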

Obviously, they couldn’t test all 500,000 SNVs to see how they affected protein function (and we really only have a decent idea of what half our 20,000 or so proteins are doing). They had to guess. They came up with a figure of 2 – 4% of each person’s 14,000 SNVs being functionally significant. That’s 280 – 560 significant mutations per individual.
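A quick Python check of the arithmetic in the last few paragraphs, using only the numbers quoted from the paper:

```python
# Numbers quoted in the text (Science vol. 337 '12, as summarized above).
distinct_snvs = 500_000
nonsynonymous = 292_125
snvs_per_person = 14_000

print(f"{nonsynonymous / distinct_snvs:.0%} nonsynonymous")  # 58% nonsynonymous

# 2 - 4% of each person's SNVs judged functionally significant
low, high = round(snvs_per_person * 0.02), round(snvs_per_person * 0.04)
print(low, high)  # 280 560
```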

Clearly, despite the horrible examples of cystic fibrosis and sickle cell anemia above, most of these can’t be doing very much, because these were normal people being studied.

There are all sorts of implications of this work. One is the subject of a future post: how hard this diversity makes drug discovery. Another reiterates the Tolstoy theme I’ve mentioned in earlier posts about the genetic defects causing schizophrenia and autism: “Happy families are all alike; every unhappy family is unhappy in its own way.” Thus beginneth Anna Karenina.

For details please see https://luysii.wordpress.com/2010/04/25/tolstoy-was-right-about-hereditary-diseases-imagine-that/  and  https://luysii.wordpress.com/2010/07/29/tolstoy-rides-again-autism-spectrum-disorder/

A third is that the 1000-fold expansion of the human population has largely obviated the natural selection that would otherwise eliminate these variants. I’ll leave it to the geneticists to figure out what this means for the eventual survival of the species as these variants continue to accumulate.

The paper is fascinating, and sure to change our conception of what a ‘normal’ genome actually is.  Nonetheless, all they did was follow Yogi Berra’s dictum — “You can observe a lot by watching.”   It certainly wasn’t creative or ingenious in any sense.  Sometimes grunt work like this wins the day.

Microexons, great new drugable targets

Some very serious new players in cellular and tissue molecular biology have just been found. They are very juicy drugable targets, not that targeting them will be easy. If you don’t know what introns, exons and alternative splicing are, it’s time to learn. Go to https://luysii.wordpress.com/2010/07/07/molecular-biology-survival-guide-for-chemists-i-dna-and-protein-coding-gene-structure/, read it, and follow the links forward. It should be all you need to comprehend the following.

The work came out at the tail end of 2014 [ Cell vol. 159 pp. 1488 – 1489, 1511 – 1523 ’14 ]. Microexons have been defined as exons of 50 nucleotides or fewer; the paper uses 3 – 27 nucleotides. They have been overlooked, partly because their short length makes them computationally difficult to find. Also, few bothered to look for them, as they were thought to be unfavorable for splicing: too short to contain exonic splicing enhancers, and so short that the splicing machinery (which is huge) supposedly couldn’t physically assemble at both the 3′ and 5′ splice sites. So much for theory: they’re out there.

What is a cell- and tissue-specific alternative splicing event? It’s the way a given pre-mRNA can be spliced together one way in tissue/cell #1 and another way in tissue/cell #2, producing a different protein in each. Exons subject to tissue-specific alternative splicing are significantly UNDERrepresented in well-folded protein domains. Instead they are found in regions of protein disorder more often than one would expect by chance. Typically these regions are on the protein surface. The paper found that the microexons code for short amino acid motifs which typically interact with other proteins and ligands. After all, 3 – 27 nucleotides can code for only 1 – 9 amino acids.

One well-known example of a short interaction motif is RGD (Arginine Glycine Aspartic acid in the single-letter amino acid code). It is recognized by a family of cell surface proteins (the integrins) with at least 26 known members. These 3 amino acids, displayed on an extracellular protein, are all that is needed for integrins to bind it, and integrins attach cells to a variety of extracellular molecules: collagen, fibrin, glycosaminoglycans, proteoglycans. So a 3 amino acid sequence on the surface of a protein can do quite a bit.
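Searching a protein sequence for a short motif like RGD takes one regular expression. This Python sketch uses a made-up toy sequence and a hypothetical helper name; finding the three letters says nothing about whether they sit on the surface or are actually bound by an integrin.

```python
import re


def find_motif(protein: str, motif: str = "RGD") -> list:
    """Return 1-based start positions of every occurrence of `motif` (overlaps allowed)."""
    pattern = re.compile(f"(?={re.escape(motif)})")  # lookahead also catches overlapping hits
    return [m.start() + 1 for m in pattern.finditer(protein.upper())]


# Hypothetical toy sequence, not a real protein.
toy = "MKTAYRGDSLLRGDQ"
print(find_motif(toy))  # [6, 12]
```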

Among a set of analyzed neural-specific exons (i.e. they were spliced that way only in the brain) found in known disordered regions of the parent protein, 1/3 promoted or disrupted interactions with partner proteins. So regulated exon splicing might specify tissue- and cell-type-specific protein interaction networks (translation: they might explain why tissues look different even when they express the same genes). The authors regard microExon inclusion/exclusion as protein surface microsurgery.

The paper found HUNDREDS of evolutionarily highly conserved microexons in RNA-Seq data sets (http://en.wikipedia.org/wiki/RNA-Seq) from various species. Many of them impact neurogenesis and brain function. Regulation of microExons changes significantly during neuronal differentiation. Although microexons account for only 1% of the alternative splicing events seen, they constitute ‘up to’ 1/3 of all evolutionarily conserved neural-regulated alternative splicing between man and mouse.

Inclusion of most identified neural microExons in the final transcript is regulated by a brain-specific factor, nSR100 (neural-specific SR-related protein of 100 kiloDaltons, also called SRRM4), which binds to intronic enhancer UGC motifs close to the 3′ splice sites, resulting in their inclusion. Their inclusion is ‘enhanced’ by tissue-specific RBFox proteins. nSR100 is reduced in Autism Spectrum Disorder (really? all? some?). nSR100 is strongly coexpressed in the developing human brain with a gene network module (M2) which is enriched for rare de novo ASD-associated mutations.

MicroExons are enriched for lengths which are multiples of 3 nucleotides (implying strong selection pressure to preserve reading frames). They are also enriched in charged amino acids. Most microExons show high inclusion at late stages of neuronal differentiation in genes associated with axon and synapse function. A neural-specific microExon in Protrudin/Zfyve27 increases its interaction with VAP (vesicle-associated membrane protein-associated protein) and promotes neurite outgrowth. A 6-nucleotide neural microExon in Apbb1/Fe65 promotes an interaction with Kat5/Tip60. Apbb1 is an adaptor protein functioning in neurite outgrowth.
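Why lengths that are multiples of 3 matter can be shown with a toy transcript. In this Python sketch (hypothetical sequences and helper names), inserting a 6-nucleotide microexon adds 2 amino acids and leaves the downstream codons intact, while a 7-nucleotide one scrambles every codon after it. The same arithmetic gives the 1 – 9 amino acids for microexons of 3 – 27 nucleotides.

```python
def exon_effect(length_nt: int) -> str:
    """Describe what including/skipping an exon of length_nt does to the reading frame."""
    n_codons, leftover = divmod(length_nt, 3)
    if leftover == 0:
        return f"adds/removes {n_codons} amino acid(s); downstream frame preserved"
    return f"shifts the downstream reading frame by {leftover} nucleotide(s)"


def codons(seq: str) -> list:
    """Split a coding sequence into complete triplets, reading from the first base."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


for n in (3, 6, 27, 7):
    print(n, exon_effect(n))

# Hypothetical toy transcript: upstream exon + microexon + downstream exon.
up, micro6, micro7, down = "ATGGCT", "GAAAAG", "GAAAAGC", "TGGTTTTAA"
print(codons(up + down))           # ['ATG', 'GCT', 'TGG', 'TTT', 'TAA']
print(codons(up + micro6 + down))  # downstream codons unchanged
print(codons(up + micro7 + down))  # downstream codons scrambled
```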

So inclusion/exclusion of microExons can alter the interactions of proteins involved in neurogenesis. Misregulation of neural specific microexons has been found in autism spectrum disorder (what hasn’t? Pardon the cynicism).

Protein interaction domains haven’t been studied to nearly the extent they need to be, and we know far less about them than we should. All the large molecular machines of the cell (ribosome, mediator, spliceosome, mitochondrial respiratory chain) involve large numbers of proteins interacting with each other not by the covalent bonds beloved by organic chemists, but by much weaker forces (van der Waals, charge attraction, hydrophobic entropic forces, etc., etc.).

Designing drugs to interfere with (or promote) such interactions will be tricky, yet they should have profound effects on cellular and organismal physiology. Off-target effects are almost certain to occur (particularly since we know so little about the partners of a given motif). Showing how useful such a drug could be, a small molecule has been developed that inhibits the interaction of the AIDS virus capsid protein with two cellular proteins (CPSF6, TNPO3) it must interact with to get into the nucleus. (Unfortunately I’ve lost the reference.)

My cousin married a high school dropout a few years ago. Not to worry: he dropped out of high school to go to college, has a PhD in Electrical Engineering from Berkeley, and has worked at Bell Labs. He was very interested in combining his math and modeling skills with my knowledge of neurology to make some models of CNS function. I demurred, as I thought we knew too little about the brain to come up with models (which I generally distrust anyway). The basic problem was that I felt we didn’t know all the players in the brain and how they fit together.

MicroExons show this in spades.

Here’s just how poor the research on Autism Spectrum Disorder has been

A stroke 50 years ago would be a stroke today. Cancer is still cancer. Not so with Autism, Autism Spectrum Disorder (ASD) and Asperger’s syndrome. The Diagnostic and Statistical Manual of the American Psychiatric Association (DSM to you) has gone through seven versions (DSM-I, DSM-II, DSM-III, DSM-III-R, DSM-IV, DSM-IV-TR and DSM-V). The terms are defined differently in each (or not defined at all). For the gory details see WalterKaufmannAC2012Symposium.pdf.

Here’s a summary.

DSM-I (1952) and DSM-II (1968)
No term Autism or Pervasive Developmental Disorder. Closest term: Schizophrenic Reaction (Childhood Type)
DSM-III (1980)
Pervasive Developmental Disorders (PDD):
Childhood Onset PDD, Infantile Autism, Atypical Autism
DSM-III-R (1987)
Pervasive Developmental Disorders (PDD):
PDD-NOS, Autistic Disorder
DSM-IV (1994)
Pervasive Developmental Disorders (PDD):
PDD-NOS, Autistic Disorder, Asperger Disorder, Childhood Disintegrative Disorder, Rett syndrome
DSM-IV-TR (2000)
Same diagnoses, text correction for PDD-NOS

Needless to say, DSM-V (out since May 2013) has it differently.

The following paper [ Proc. Natl. Acad. Sci. vol. 111 pp. 1981 – 1986 ’14 ] should have neuroscientists researching Autism Spectrum Disorder (ASD) as defined by DSM-IV hanging their heads in shame.

The ASD category is obviously a wastebasket. Here is what it includes in DSM-IV-TR:

Autistic disorder (classic autism)
Asperger’s disorder (Asperger syndrome)
Pervasive developmental disorder not otherwise specified (PDD-NOS)
Rett’s disorder (Rett syndrome)
Childhood disintegrative disorder (CDD)

Just about any kid not doing well cognitively fits into the third category (PDD-NOS). Hopefully DSM-V makes things better, but it’s too early to tell. The link has the details and a lot of the reasoning behind yet another change.

All the work cited in the PNAS paper concerns ASD as diagnosed by DSM-IV-TR. DSM-V is too new to have papers out using its criteria.

If you take 100 kids developing normally, do various types of magnetic resonance imaging (MRI) on their brains, and compare them with 100 kids not doing well, you are certain to find more structural abnormalities of the brain in the second group, regardless of how they were diagnosed, or what they were diagnosed with.

Dozens of papers were written on MRIs in ASD kids, and 40% had fewer than 15 subjects. The most replicated finding (up to the PNAS paper) was poor connections between various parts of the brain, manifest as abnormalities of the white matter on diffusion tensor imaging. Exactly what they were measuring, and why the measurement requires a tensor and not a vector, is really quite interesting and can teach you some math. I’ll save this for the end.

Only 2 of the dozens of papers controlled for data quality. MRI has been around long enough that most researchers know that to get a decent study the subject has to keep quite still, something very difficult for kids, and even harder for autistic kids.

When head motion was controlled for in this large study (52 ASD kids and 73 normals to start), all the abnormalities disappeared save one, out of 18 different white matter tracts examined. They had to throw out the scans of 12 of the 52 autistic kids (23%) but only 2 of the 73 normals (under 3%) because of motion, showing how suspect the previous data really was. The head motion was producing the abnormalities.

Is this terrible research or what (PNAS paper excepted)?

Perhaps the new criteria in DSM-V will result in a more homogeneous group.

What does diffusion tensor imaging actually measure? Imagine the nerve fibers (axons) in the deep white matter of the brain as a bundle of wires, most of them going in the same direction in any small volume. Assume they are bathed in water. Now add some colored water at one end, and see how quickly the color diffuses in various directions in the bundle. The color will diffuse faster along the bundle than in the two directions perpendicular to it. This is what the diffusion tensor measures: diffusion of tissue water in a variety of directions. If the bundle is loose, or disorganized, or some wires are missing, then the diffusion in the various directions won’t differ as much; this is called reduced anisotropy. Take out the wires and the diffusion is the same in all directions (no anisotropy at all). This was the finding in ASD: less white matter anisotropy on diffusion tensor imaging, implying that something is wrong with the white matter.
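The usual single number for anisotropy is fractional anisotropy (FA), computed from the three principal diffusivities (eigenvalues) of the tensor. I’m assuming FA here because it is the standard summary; which exact measures the PNAS paper used isn’t spelled out above. The diffusivities in this Python sketch are illustrative, not measured.

```python
import math


def fractional_anisotropy(l1: float, l2: float, l3: float) -> float:
    """Standard FA from the three principal diffusivities (tensor eigenvalues).

    0 means diffusion is equal in all directions; values near 1 mean diffusion
    is almost entirely along one axis.
    """
    num = math.sqrt((l1 - l2) ** 2 + (l2 - l3) ** 2 + (l3 - l1) ** 2)
    den = math.sqrt(l1 ** 2 + l2 ** 2 + l3 ** 2)
    return math.sqrt(0.5) * num / den


# Illustrative diffusivities (units of 1e-3 mm^2/s), not measured values.
print(fractional_anisotropy(1.0, 1.0, 1.0))            # 0.0: no wires, isotropic
print(round(fractional_anisotropy(1.7, 0.3, 0.3), 2))  # 0.8: tight bundle
print(round(fractional_anisotropy(1.2, 0.6, 0.6), 2))  # looser bundle, lower FA
```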

Why wouldn’t an overall vector summing up the diffusion in the major direction be enough? One can add vectors together, after all. Because you’d lose all the information about anisotropy; the tensor preserves it. It’s why tensors are used to describe the stress in a given material. Slick. Now you understand (something) about tensors. However, it should be noted that vectors are tensors too (of rank 1). There’s a lot more to it (particularly indices).
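Here is the vector-versus-tensor point as a small Python sketch. A 3x3 diffusion tensor D gives the diffusion along any unit direction g as g^T D g. The two hypothetical tensors below share the same major direction and the same diffusivity along it, so a single vector would describe them identically; only the tensor shows that the “loose” one diffuses almost as fast sideways.

```python
import math


def apparent_diffusion(D, g):
    """Diffusion along direction g (normalized here) for a 3x3 tensor D: g^T D g."""
    norm = math.sqrt(sum(x * x for x in g))
    u = [x / norm for x in g]
    return sum(u[i] * D[i][j] * u[j] for i in range(3) for j in range(3))


# Two hypothetical tensors whose largest diffusivity points along x with the
# same magnitude, so a single "major direction" vector would be identical.
tight = [[1.7, 0, 0], [0, 0.3, 0], [0, 0, 0.3]]
loose = [[1.7, 0, 0], [0, 1.2, 0], [0, 0, 1.2]]

for name, D in (("tight", tight), ("loose", loose)):
    along = apparent_diffusion(D, (1, 0, 0))
    across = apparent_diffusion(D, (0, 1, 0))
    diagonal = apparent_diffusion(D, (1, 1, 0))
    print(name, along, across, round(diagonal, 2))
```

Both give 1.7 along the bundle, but 0.3 versus 1.2 across it: exactly the anisotropy information a single vector throws away.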