Tag Archives: Huntington’s chorea

Triplets and TADs

Neurologists have long been interested in triplet diseases (https://en.wikipedia.org/wiki/Trinucleotide_repeat_disorder). The triplet is a string of 3 nucleotides. One example, cytosine adenine guanine or CAG, accounts for a lot of them. We have lots of places in our genome where such repeats normally occur, with the triplets repeated up to 42 times. However, in diseases like Huntington’s chorea the repeats get to be as many as 250 CAGs in a row. You are normally quite fine as long as you have fewer than 36 of them, and no one has fewer than 6 at this particular location.
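Counting repeats is easy to sketch. The snippet below finds the longest uninterrupted run of CAG triplets in a DNA string and compares it with the under-36 threshold quoted above. It is a toy that assumes a clean, already sequenced stretch of DNA; the alleles and function names are hypothetical.

```python
import re

# Threshold quoted in the post: fewer than 36 CAGs at this locus is normally fine.
# Clinical practice uses finer categories; this is only an illustration.
THRESHOLD = 36

def longest_cag_run(seq: str) -> int:
    """Length, in triplets, of the longest uninterrupted run of CAG in seq."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)

def classify(repeats: int) -> str:
    """Label a repeat count relative to the quoted threshold."""
    return "at or above threshold" if repeats >= THRESHOLD else "below threshold"

# Hypothetical alleles: flanking bases around 20 or 45 CAG triplets.
for allele in ("GC" + "CAG" * 20 + "CCG", "GC" + "CAG" * 45 + "CCG"):
    n = longest_cag_run(allele)
    print(n, classify(n))
```

Because CAG cannot overlap itself, a simple non-overlapping regex search is enough to find each run.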

Subsequently, expansions of 4, 5 and 6 nucleotide repeats have also been shown to cause disease, bringing the total of repeat expansion diseases to over 40. Why more than half of them should affect the nervous system entirely or mostly is a mystery. Needless to say, there are plenty of theories.

This leads to three questions: (1) there are repeats all over the genome, so why do only 40 or so of them expand? (2) since we all have repeats in or near the genes where they cause disease, why don’t we all have the diseases? (3) why does the number of repeats expand with each succeeding generation (the phenomenon called anticipation)? I saw one such example when a father brought his son to my muscular dystrophy clinic. The boy had moderately severe myotonic dystrophy. When I shook the father’s hand, it was clear that he had mild myotonia, which had in no way impaired his life (he was a successful banker).

A recent paper in Cell may help answer the first question and has a hint about the second [ Cell vol. 175 pp. 38 – 40, 224 – 238 ’18 ]. Of 27 disease-associated short tandem repeats (daSTRs), 21 localize to the boundary of something called a topologically associated domain (TAD) or subdomain (subTAD). TADs are defined as contiguous intervals in the genome in which every pair of loci has an elevated interaction frequency compared to loci outside the domain. TADs and subTADs are measured using chromosome conformation capture assays (acronyms for them include 3C or CCC, 4C, 5C and Hi-C).
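The definition of a TAD can be turned into a crude test on a contact matrix (a table of how often each pair of genomic bins was captured together). The sketch below, on a hypothetical 6-bin toy matrix, checks whether every pair of bins inside a proposed interval interacts more often than those bins interact, on average, with bins outside it. Real TAD callers are more elaborate (for one thing they must allow for contact frequency falling off with distance along the chromosome), so treat this only as an illustration of the definition; `is_domain` is a made-up name.

```python
from itertools import combinations

def is_domain(matrix, start, end):
    """True if every pair of bins in [start, end) has a higher contact count
    than the average contact between those bins and bins outside the interval."""
    inside = range(start, end)
    outside = [k for k in range(len(matrix)) if not start <= k < end]
    across = [matrix[i][k] for i in inside for k in outside]
    mean_across = sum(across) / len(across) if across else 0.0
    return all(matrix[i][j] > mean_across for i, j in combinations(inside, 2))

# Hypothetical toy data: bins 0-2 form one block, bins 3-5 another.
block = [0, 0, 0, 1, 1, 1]
toy = [[10 if block[i] == block[j] else 1 for j in range(6)] for i in range(6)]

print(is_domain(toy, 0, 3))  # True: the interval matches a block
print(is_domain(toy, 1, 4))  # False: the interval straddles a boundary
```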

Briefly, they are performed as follows. Intact nuclei are isolated from live cell cultures. These are subjected to paraformaldehyde crosslinking to fix segments of the genome that are in close physical proximity. The crosslinked genomic DNA is digested with a restriction endonuclease, and the cut ends are ligated, so that fragments held next to each other by the crosslinks get joined together. The crosslinks are then reversed and the ligation products amplified by PCR using primers in all possible combinations. Then, having a complete genome sequence in hand, you see which regions of the genome got close enough together to show up in the assay.
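Each ligation junction, once read out, amounts to a pair of genomic positions that were crosslinked together (Hi-C reads the junctions by high-throughput sequencing rather than with chosen PCR primers). Counting those pairs in fixed-size bins gives the contact matrix from which TADs are called. A minimal sketch, assuming the junction positions are already mapped onto a single hypothetical 3,000 bp stretch of genome and using 1,000 bp bins:

```python
def contact_matrix(junctions, genome_length, bin_size):
    """Count ligation junctions (pos_a, pos_b) into a symmetric bins x bins matrix."""
    n_bins = (genome_length + bin_size - 1) // bin_size  # round up
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in junctions:
        a, b = pos_a // bin_size, pos_b // bin_size
        matrix[a][b] += 1
        if a != b:  # contacts are unordered, so mirror off-diagonal counts
            matrix[b][a] += 1
    return matrix

# Hypothetical mapped junctions (positions in bp).
junctions = [(120, 980), (150, 940), (2100, 2900), (130, 2500)]
for row in contact_matrix(junctions, genome_length=3000, bin_size=1000):
    print(row)
# [2, 0, 1]
# [0, 0, 0]
# [1, 0, 1]
```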

This may help explain question one, and the paper offers some speculation about question two: we don’t all have these diseases because, unlike the unfortunates who do, we don’t have problems in our genes for DNA replication, repair and recombination. There is some evidence for this; model organisms with mutations in these genes do show short tandem repeat instability.

Unfortunately the paper doesn’t discuss anticipation, because no clinicians appear to be among the authors, even though they’re from Penn, which 50+ years ago was very strong in clinical neurology.

None of this work discusses the fascinating questions of how the expanded repeats cause disease, or why so many of them affect the nervous system.

The Kavanaugh Ford confrontation will be to this decade what the Patty Hearst kidnapping was to a previous one  — https://en.wikipedia.org/wiki/Patty_Hearst.  Since I suffered 4 episodes of physical (not sexual) abuse as a kid, and dealt with this extensively as a neurologist, I’m trying to decide whether to write about it.  Emotions are high and there are a lot of nuts out there on the net. There is even a reasonable possibility that both Ford and Kavanaugh are right and not lying.


18 at one blow said the molecular biologist

With apologies to the Brothers Grimm, molecular biologists may have found a way to treat 18 genetic diseases at one blow [ Cell vol. 170 pp. 899 – 912 ’17 ]. They use adeno-associated virus (AAV) packaging a modified enzyme and a guide RNA to remove repeat expansions from RNA. The paper gives a list of the 18, all but one of which are neurologic. They include such horrors as Huntington’s chorea, the most common form of familial ALS, 3 forms of spinocerebellar ataxia and 6 forms of spinocerebellar atrophy.

They use Cas9 from Streptococcus pyogenes, part of the CRISPR system (https://en.wikipedia.org/wiki/CRISPR) that bacteria use to defend themselves against viruses, together with a single guide RNA. Even more interestingly, Cas9 is normally an enzyme that cuts nucleic acid (DNA, in its natural role), but the Cas9 they used is catalytically dead. They think that just binding to the aggregated RNA containing the repeats is enough to break up the aggregate. This is the way some antisense oligonucleotides are thought to work.
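How one guide RNA can find every copy of a repeat comes down to base pairing. The sketch below builds a hypothetical 20-nucleotide spacer (20 nt is the usual spacer length for S. pyogenes Cas9 guides) that pairs with a CUG repeat RNA of the kind found in myotonic dystrophy type 1. It is not the guide used in the paper, and the helper names are made up for illustration.

```python
# Watson-Crick pairing for RNA bases.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement_rna(rna: str) -> str:
    """Sequence (written 5'->3') that base-pairs with rna."""
    return "".join(PAIR[base] for base in reversed(rna.upper()))

def repeat_spacer(unit: str, length: int = 20) -> str:
    """Hypothetical helper: a spacer of `length` nt complementary to a run of `unit`."""
    target = (unit * (length // len(unit) + 1))[:length]
    return reverse_complement_rna(target)

print(reverse_complement_rna("CUGCUGCUG"))  # CAGCAGCAG
print(repeat_spacer("CUG"))                 # AGCAGCAGCAGCAGCAGCAG
```

Because the target itself repeats, the same spacer can pair at many positions along an expanded tract.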

The problem of getting a bacterial enzyme into a human cell is avoided here by using a virus (AAV) to deliver it. The treatment got rid of RNA aggregates in patients’ cells from 4 of the diseases (including two myotonic dystrophies and the familial ALS).

It is almost too fantastic to be true.

Why almost all of these repeat expansion diseases affect the nervous system is anyone’s guess. As you can imagine, theories abound. So all we have to do is figure out how to get the therapy into the brain (hardly a small task).

When knowledge isn’t power

Here is a genetic disease where we’ve known exactly what’s wrong with the causative gene for 23 years and over 10,000 papers have been written (a Google search comes up with about 418,000 results in 0.45 seconds), yet we don’t know how the mutation causes the problems it does, or have a clue how to treat the disease. So much for finding the cause of a genetic disease leading to therapy. Imagine how much harder cancer is.

I speak of Huntington’s chorea, and the causative gene huntingtin. It’s a terrible neurologic disease characterized by progressive movement disorders, dementia and incapacitation over a decade or two. Woody Guthrie had it; fortunately Arlo escaped. Like many people with the disorder, Woody was quite fertile, having 8 children.

It being a neurologic disorder, I’ve read a lot about it, and my jottings about my reading over the past few decades have consumed 83,635 characters (aren’t computers wonderful?). I’ve had a fair amount of experience with it, as an Indian agent in Montana had it and produced many progeny with his women, leading to a good deal of devastation in one tribe.

Neuron vol. 89 pp. 910 – 926 ’16 is an excellent recent review (but not one for the fainthearted). Several mysteries are immediately apparent.

First, huntingtin is expressed in nearly every neuron, but only a few die. It is also expressed outside the brain, in lung, ovary and testes, but they work just fine.

Second, huntingtin interacts with over 350 different proteins. Figuring out which are the important ones has provided steady employment.

Third, it exists in many forms, so many that there aren’t enough scientists living to test them all. This is because huntingtin is subject to a variety of chemical modifications (phosphorylation, ubiquitination, acetylation, palmitoylation, sumoylation) at FORTY-EIGHT different sites (listed in the article). Counting each site as either modified or unmodified, this gives 2^48 possible modified forms of the protein. 2^48 = 281,474,976,710,656 if you’re interested.
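The arithmetic is easy to check. Assuming, as above, that each of the 48 sites is independently either modified or unmodified, the count is 2 to the 48th power. If a site could carry more than one kind of modification the count only grows, which the hypothetical `states_per_site` parameter lets you explore.

```python
def modified_forms(sites: int, states_per_site: int = 2) -> int:
    """Number of distinct forms if every site independently takes one of
    states_per_site states (2 = unmodified or modified)."""
    return states_per_site ** sites

print(f"{modified_forms(48):,}")  # 281,474,976,710,656
```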

In addition to the modifications, the protein is huge: some 3,144 amino acids encoded by 67 exons, forming two mRNAs of 10,366 and 13,711 nucleotides.

Fourth, the protein can also be chopped up by at least 5 different enzymes at 6 different sites, and some of the fragments are biologically active (toxic in tissue culture).

Naturally, the region of the protein containing the mutation (near the amino terminal end) has been studied most intensively.

Huntingtin has its fingers in many physiologic pies (the reference is excellent in this area), including vesicular trafficking, cell division, cilia formation, endocytosis, autophagy and gene transcription. Which of these, when disrupted, causes the neurologic disease?

Mutant huntingtin forms protein aggregates. As with the senile plaques of Alzheimer’s disease or the Lewy bodies of Parkinson’s disease, we don’t know if the aggregates are toxic or protective.

Fifth, despite all its known functions, we don’t know if the mutation produces a loss of some vital function of huntingtin, or a new and toxic function.

Worse still, compared to cancer, Huntington’s chorea is the ‘simple’ case, because we know the cause.


Too late to start an enormous post on huntingtin, the protein mutated in Huntington’s chorea. Coming soon. Sorry. But consider this: if Trump wins the presidency, it will be a remarkable demonstration of the lack of power of the press. When have the following ever agreed on an issue: the New York Times, Wall Street Journal, National Review, The Nation, Weekly Standard, etc. etc.? They all hate Trump, editorially and in their selection of articles and phraseology. I’ve never seen anything like it.

Remarkable times. There is tremendous dissatisfaction with the way things are going. Bernie taps into it as well.

The incredible information economy of frameshifting

Her fox and dog ate our pet rat

H erf oxa ndd oga teo urp etr at

He rfo xan ddo gat eou rpe tra t

The last two lines make no sense at all, but (neglecting the spaces) they have identical letter sequences.
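The three lines above are one string of letters chopped into triplets starting at the first, second and third letter. A short sketch that regenerates them, removing the spaces first exactly as in the comparison (`reading_frames` is an illustrative name):

```python
def reading_frames(text: str) -> list[str]:
    """Return the text's letters split into triplets in frames 0, 1 and 2.
    Letters before the start of a frame are kept as a leading fragment."""
    letters = "".join(text.split())
    frames = []
    for offset in range(3):
        lead = [letters[:offset]] if offset else []
        triplets = [letters[i:i + 3] for i in range(offset, len(letters), 3)]
        frames.append(" ".join(lead + triplets))
    return frames

for frame in reading_frames("Her fox and dog ate our pet rat"):
    print(frame)
# Her fox and dog ate our pet rat
# H erf oxa ndd oga teo urp etr at
# He rfo xan ddo gat eou rpe tra t
```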

Here are similar sequences of nucleotides making up the genetic code as transcribed into RNA




Again, in our genome there are no spaces between the triplets. But all the triplets you see are meaningful, in the sense that each codes for one of the twenty amino acids (except for TAA, which says stop). ATG codes for methionine (the purists will note that in RNA all the T’s should be U’s). I’m too lazy to look the rest up, but the ribosome doesn’t care, and will happily translate all 3 sequences into the sequential amino acids of a protein.
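The ribosome’s lookup can be sketched with the standard genetic code. The table below is built from the usual compact encoding of the standard code (NCBI translation table 1), with codons ordered TTT, TTC, TTA, TTG, TCT and so on. The sequence is hypothetical. Stop codons are printed as '*' and translation continues past them so the three frames can be compared side by side, whereas a real ribosome would halt.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1): one letter per codon, codons in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(seq: str, frame: int = 0) -> str:
    """Translate one reading frame; RNA (U) or DNA (T) input, stops shown as '*'."""
    dna = seq.upper().replace("U", "T")
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)

seq = "ATGCAGTAA"  # hypothetical example
for frame in range(3):
    print(frame, translate(seq, frame))
# 0 MQ*
# 1 CS
# 2 AV
print(sum(aa != "*" for aa in CODON_TABLE.values()))  # 61
```

The last line counts the codons that specify an amino acid: 61 of the 64, the other 3 being stops.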

Both sets of sequences have undergone (reading) frame shifts.

A previous post (https://luysii.wordpress.com/2014/10/13/the-bach-fugue-of-the-genome/) marveled at how something too small even to be called a virus coded for proteins by reading the same nucleotides in two different frames.

Frameshifting is used by viruses to get more mileage out of their genomes. Why? There is only so much DNA you can pack into the protein coat (capsid) of a virus.

[ Proc. Natl. Acad. Sci. vol. 111 pp. 14675 – 14680 ’14 ] Usually DNA density in cell nuclei or bacteria is 5 – 10% of volume. However, in viral capsids it is 55% of volume. The pressure inside the viral capsid can reach ten atmospheres. Ejection is therefore rapid (60,000 basepairs/second).

The AIDS virus (HIV-1) relies on frameshifting of its genome to produce viable virus. The genes for two important proteins (gag and pol) have 240 nucleotides (80 amino acids) in common. Frameshifting allows these 240 nucleotides to be read by the cell’s ribosomes in two different frames (not at once). Granted, there are 61 three-nucleotide combinations (codons) coding for only 20 amino acids, so some redundancy is built in, but the 80 amino acids coded by the two frames are usually quite different.
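Here is a cartoon of programmed frameshifting: the ribosome reads triplets from the start, then at one point slips back a nucleotide and carries on in the new frame. In HIV-1 the real signal is a slippery stretch (UUUUUUA) followed by a folded RNA structure, and only a small fraction of ribosomes shift; none of that is modeled here. The sequence and the `shift_after` parameter are hypothetical, and other shifts such as +1 or -2 can be passed as `shift`.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate_with_shift(seq, shift_after=None, shift=-1):
    """Translate from the first nucleotide until a stop codon.
    If shift_after is given, the reading position moves by `shift`
    nucleotides right after that many codons have been read."""
    protein, pos, codons_read = [], 0, 0
    while pos + 3 <= len(seq):
        aa = CODON_TABLE[seq[pos:pos + 3]]
        if aa == "*":  # stop codon: release the protein
            break
        protein.append(aa)
        pos += 3
        codons_read += 1
        if codons_read == shift_after:
            pos += shift  # the frameshift
    return "".join(protein)

seq = "ATGATTTTTTAGGGAC"  # hypothetical; frame 0 hits TAG after 3 codons
print(translate_with_shift(seq))                 # MIF
print(translate_with_shift(seq, shift_after=3))  # MIFLG
```

The shifted ribosome never sees the frame-0 stop codon and keeps adding amino acids, which is roughly how the longer Gag-Pol protein gets made.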

That the gag and pol proteins function at all is miraculous.

The phenomenon is turning out to be more widespread. [ Proc. Natl. Acad. Sci. vol. 111 pp. E4342 – E4349 ’14 ] KSHV (Kaposi’s Sarcoma HerpesVirus) causes (what else?) Kaposi’s sarcoma, a tumor quite rare until people with AIDS started developing it (due to their lousy immune system being unable to contend with the virus). Open reading frame 73 (ORF73) codes for the major latency-associated nuclear antigen 1 (LANA1). It has 3 domains: a basic amino terminal region, an acidic central repeat region (divisible into CR1, CR2 and CR3) and another basic carboxy terminal region. LANA1 is involved in maintaining KSHV episomes, regulation of viral latency, and transcriptional regulation of viral and cellular genes.

LANA1 exists as multiple higher and lower molecular weight isoforms, seen as a ‘LANA ladder’ band pattern on immunoblots.

This work shows that LANA1 (and also Epstein-Barr nuclear antigen 1, EBNA1) undergoes highly efficient +1 and -2 programmed frameshifting, generating previously undescribed alternative reading frame proteins in their repeat regions. Programmed frameshifting to generate multiple proteins from one RNA sequence can increase coding capacity without increasing the size of the viral capsid.

The presence of similar repeat sequences in human genes (such as huntingtin, the defective gene in Huntington’s chorea) implies that we should look for frameshifted translation in ourselves as well as in viruses. In the case of mutant huntingtin, frameshifting in the abnormally expanded CAG tracts produces proteins containing polyAlanine or polySerineArginine tracts.
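Reading a pure CAG tract in each of its three frames gives only three codons: CAG (glutamine, the normal polyglutamine tract), AGC (serine) and GCA (alanine), per the standard genetic code. A minimal check, assuming a hypothetical tract with no interruptions or flanking sequence:

```python
# The only three codons that occur when a pure CAG tract is read in any frame
# (standard genetic code).
CODONS = {"CAG": "Q", "AGC": "S", "GCA": "A"}

def tract_frames(repeats: int) -> dict[int, str]:
    """Translate a hypothetical (CAG)n tract in frames 0, 1 and 2."""
    tract = "CAG" * repeats
    return {
        frame: "".join(CODONS[tract[i:i + 3]]
                       for i in range(frame, len(tract) - 2, 3))
        for frame in range(3)
    }

print(tract_frames(4))  # {0: 'QQQQ', 1: 'SSS', 2: 'AAA'}
```

Moving forward one nucleotide (frame 1) gives polyserine; moving forward two, which is the same as slipping back one, gives polyalanine.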

Well, G, A, T and C are the 1’s and 0’s of the way genetic information is stored in our genomic computer. It really isn’t surprising that the genome can be read in alternate frames. In the old days, bytes of text sent over serial lines were wrapped in start and stop bits so the receiver could tell where each character began, with a parity bit added to catch errors. There is nothing like that in our genome (except for the 3 stop codons).
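The contrast with serial communication can be made concrete. The sketch below wraps each byte in a start bit, 8 data bits, an even-parity bit and a stop bit (the common 8E1 layout), then decodes the stream. Read from the right starting bit, the text comes back; start one bit late and the framing and parity checks reject it at once. A string of nucleotides carries no such markers, so a ribosome reading in the wrong frame gets no warning. The function names are illustrative, not from any library.

```python
def frame_byte(value: int) -> list[int]:
    """Wrap one byte as start bit (0), 8 data bits LSB first, even parity bit, stop bit (1)."""
    data = [(value >> i) & 1 for i in range(8)]
    parity = sum(data) % 2  # makes the total number of 1s even
    return [0] + data + [parity] + [1]

def decode(bits: list[int], offset: int = 0):
    """Read 11-bit frames starting at offset; return the bytes, or None on any error."""
    out = []
    for start in range(offset, len(bits) - 10, 11):
        frame = bits[start:start + 11]
        data, parity = frame[1:9], frame[9]
        if frame[0] != 0 or frame[10] != 1 or sum(data) % 2 != parity:
            return None  # wrong frame (or corrupted bits)
        out.append(sum(bit << i for i, bit in enumerate(data)))
    return bytes(out)

stream = [bit for ch in b"Hi" for bit in frame_byte(ch)]
print(decode(stream))     # b'Hi'
print(decode(stream, 1))  # None
```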

What is truly surprising is that reading in an alternate frame produces ‘meaningful’ proteins. This gets us into philosophical waters. Clearly

Erf oxa ndd oga teo urp etr at

Rfo xan ddo gat eou rpe tra t

aren’t meaningful to us. Yet gag and pol are quite meaningful (even life-and-death meaningful) to the AIDS virus. So ‘meaningful’ in the biologic sense means able to function in the larger context of the cell. The same is true of linguistic meaning. You have to know a lot about the world (and speak English) for the word cat to be meaningful to you. So meaning can never be defined by the word itself. Probably the same is true for concepts as well, but I’ll leave that to the philosophers, or any who choose to comment on this.