
When knowledge isn’t power

Here is a genetic disease where we’ve known exactly what’s wrong with the causative gene for 23 years, and over 10,000 papers have been written (a Google search turns up about 418,000 results in 0.45 seconds), yet we don’t know how the mutation causes the problems it does or have a clue how to treat the disease. So much for finding the cause of a genetic disease leading to therapy. Imagine how much harder cancer is.

I speak of Huntington’s chorea, and the causative gene huntingtin. It’s a terrible neurologic disease characterized by progressive movement disorders, dementia and incapacitation over a decade or two. Woody Guthrie had it; fortunately Arlo escaped. Like many people with the disorder, Woody was quite fertile, having 8 children.

It being a neurologic disorder, I’ve read a lot about it, and my jottings about my readings over the past few decades have consumed 83,635 characters (aren’t computers wonderful?). I’ve also had a fair amount of experience with it, because an Indian agent in Montana had it and fathered many children with his women, leading to a good deal of devastation in one tribe.

Neuron vol. 89 pp. 910 – 926 ’16 is an excellent recent review (but not one for the fainthearted). Several mysteries are immediately apparent.

First, huntingtin is expressed in nearly every neuron, but only a few die. It is also expressed outside the brain, in lung, ovary and testes, yet these organs work just fine.

Second, huntingtin interacts with over 350 different proteins. Figuring out which are the important ones has provided steady employment.

Third, it exists in many forms, so many that there aren’t enough scientists living to test them all. This is because huntingtin is subject to a variety of chemical modifications (phosphorylation, ubiquitination, acetylation, palmitoylation, sumoylation) at FORTY-EIGHT different sites (listed in the article). Counting each site as simply modified or unmodified gives 2^48 possible modified forms of the protein. 2^48 = 281,474,976,710,656 if you’re interested.
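For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes, as the count above does, that every site varies independently of the others; the helper name `modified_forms` is mine, not the review’s. A site that could carry more than one kind of modification would simply contribute a factor larger than 2.

```python
import math


def modified_forms(states_per_site: list[int]) -> int:
    """Count distinct modified forms, assuming every site varies independently.

    Each entry is the number of states one site can be in; the total is their product.
    """
    return math.prod(states_per_site)


# The simplification used above: 48 sites, each either modified or unmodified.
binary_count = modified_forms([2] * 48)
print(f"{binary_count:,}")  # 281,474,976,710,656
```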

In addition to the modifications, the protein is huge: some 3,144 amino acids, encoded by 67 exons, with two mRNAs of 10,366 and 13,711 nucleotides.

Fourth, the protein can also be chopped up by at least 5 different enzymes at 6 different sites, and some of the fragments are biologically active (toxic in tissue culture).

Naturally, the region of the protein containing the mutation (near the amino terminal end) has been studied most intensively.

Huntingtin has its fingers in many physiologic pies (the reference is excellent in this area), including vesicular trafficking, cell division, cilia formation, endocytosis, autophagy and gene transcription. Which of them, when abnormal, causes the neurologic disease is unknown.

The mutant protein forms aggregates. As with the senile plaques of Alzheimer’s disease or the Lewy bodies of Parkinson’s disease, we don’t know whether the aggregates are toxic or protective.

Fifth, despite all its known functions, we don’t know whether the mutation produces a loss of some vital function of huntingtin or a new and toxic function.

Even worse, compared to cancer, Huntington’s chorea is ‘simple’ because we know the cause.

The incredible information economy of frameshifting

Her fox and dog ate our pet rat

H erf oxa ndd oga teo urp etr at

He rfo xan ddo gat eou rpe tra t

The last two lines make no sense at all, but (neglecting the spaces) they have identical letter sequences.
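The shifted lines can be produced mechanically. Here is a small Python sketch (the function name `read_in_frame` is just for illustration): strip the spaces, keep the first `offset` letters as a leftover fragment, then cut the rest into triplets.

```python
def read_in_frame(sentence: str, offset: int) -> str:
    """Regroup the letters of `sentence` into triplets starting at `offset` (0, 1 or 2)."""
    letters = sentence.replace(" ", "")
    # Fragment before the first full triplet (empty when offset is 0).
    leading = [letters[:offset]] if offset else []
    triplets = [letters[i:i + 3] for i in range(offset, len(letters), 3)]
    return " ".join(leading + triplets)


for offset in range(3):
    print(read_in_frame("Her fox and dog ate our pet rat", offset))
```

Offsets 1 and 2 reproduce the two shifted lines exactly; offset 0 gives back the original sentence.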

Here are similar sequences of nucleotides making up the genetic code as transcribed into RNA

ATG CAT TAG CCG TAA GCC GTA GGA

TGC ATT AGC CGT AAG CCG TAG GA.

GCA TTA GCC GTA AGC CGT AGG A..

Again, in our genome there are no spaces between the triplets. All the triplets you see are meaningful, in the sense that each codes for one of the twenty amino acids (except TAA and TAG, which say stop). ATG codes for methionine (purists will note that in RNA all the T’s should be U’s). I’m too lazy to look the rest up, but the ribosome doesn’t care: apart from halting at a stop codon, it will happily translate any of the 3 frames into the sequential amino acids of a protein.

Both sets of sequences have undergone (reading) frame shifts.
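To see what a ribosome would make of each frame, here is a minimal Python sketch using the standard genetic code (NCBI translation table 1), written with T rather than U to match the sequences above. The names `CODON_TABLE` and `translate_frame` are illustrative; a stop codon is shown as `*`, and leftover bases at the end are ignored.

```python
from itertools import product

# Standard genetic code. Codons are enumerated with bases in T, C, A, G order,
# third position varying fastest, which is the order of this 64-letter string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_frame(seq: str, offset: int) -> str:
    """Translate every complete codon starting at `offset`; '*' marks a stop codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(offset, len(seq) - 2, 3))


seq = "ATGCATTAGCCGTAAGCCGTAGGA"
for offset in range(3):
    print(offset, translate_frame(seq, offset))
```

This prints MH*P*AVG, CISRKP* and ALAVSRR (one-letter amino acid codes) for the three frames. A real ribosome would stop at the first `*`; the sketch keeps going only so the frames can be compared side by side.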

A previous post https://luysii.wordpress.com/2014/10/13/the-bach-fugue-of-the-genome/ marveled at how something too small even to be called a virus coded for a protein whose amino acids were read in two different frames.

Frameshifting is used by viruses to get more mileage out of their genomes. Why? There is only so much DNA you can pack into the protein coat (capsid) of a virus.

[ Proc. Natl. Acad. Sci. vol. 111 pp. 14675 – 14680 ’14 ] Usually DNA density in cell nuclei or bacteria is 5 – 10% of volume. However, in viral capsids it is 55% of volume. The pressure inside the viral capsid can reach ten atmospheres. Ejection is therefore rapid (60,000 basepairs/second).

The AIDS virus (HIV-1) relies on frameshifting of its genome to produce viable virus. The genes for two important proteins (gag and pol) have 240 nucleotides (80 amino acids) in common. Frameshifting allows the cell’s ribosomes to read these 240 nucleotides in two different frames (not simultaneously). Granted, there are 61 three-nucleotide combinations (codons) coding for only 20 amino acids, so some redundancy is built in, but the 80 amino acids coded by the two frames are usually quite different.

That the gag and pol proteins function at all is miraculous.
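Here is a toy illustration of how a -1 frameshift (the kind HIV-1 uses at the gag/pol junction) lets one stretch of sequence give a longer protein. It is a minimal Python sketch on a made-up 20-nucleotide sequence, not the real HIV-1 sequence: the reader takes codons in the original frame, then at a chosen point slips back one nucleotide and carries on in the new frame. The sequence, the `shift_after` parameter and the function name are all hypothetical.

```python
from itertools import product
from typing import Optional

# Standard genetic code, T/C/A/G order with the third base varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_with_slip(seq: str, shift_after: Optional[int] = None) -> str:
    """Translate from position 0 until a stop codon (hypothetical toy model).

    If `shift_after` is given, the reader slips back one nucleotide (a -1
    frameshift) after that many codons and continues in the new frame.
    """
    protein = []
    pos = 0
    while pos + 3 <= len(seq):
        if shift_after is not None and len(protein) == shift_after:
            pos -= 1  # the -1 slip; it happens only once
            shift_after = None
            continue
        amino_acid = CODON_TABLE[seq[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
        pos += 3
    return "".join(protein)


toy_seq = "ATGGCATTTTAGGGCATTAA"  # made-up sequence, not from HIV-1
print(translate_with_slip(toy_seq))                 # MAF (stops at TAG)
print(translate_with_slip(toy_seq, shift_after=3))  # MAFLGH
```

Without the slip the original frame hits TAG and the product ends as MAF; with the slip the same nucleotides read on as TTA GGG CAT and the product becomes MAFLGH.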

The phenomenon is turning out to be more widespread. [ Proc. Natl. Acad. Sci. vol. 111 pp. E4342 – E4349 ’14 ] KSHV (Kaposi’s Sarcoma HerpesVirus) causes (what else?) Kaposi’s sarcoma, a tumor quite rare until people with AIDS started developing it (their lousy immune systems being unable to contend with the virus). Open reading frame 73 (ORF73) codes for the major latency-associated nuclear antigen 1 (LANA1). It has 3 domains: a basic amino terminal region, an acidic central repeat region (divisible into CR1, CR2 and CR3), and another basic carboxy terminal region. LANA1 is involved in maintaining KSHV episomes, regulating viral latency, and transcriptional regulation of viral and cellular genes.

LANA1 exists as multiple higher and lower molecular weight isoforms, seen as a ‘LANA ladder’ of bands on immunoblots.

This work shows that translation of LANA1 (and also of Epstein-Barr nuclear antigen 1) undergoes highly efficient +1 and -2 programmed frameshifting, generating previously undescribed alternative reading frame proteins from their repeat regions. Programmed frameshifting to generate multiple proteins from one RNA sequence can increase coding capacity without increasing the size of the viral capsid.

The presence of similar repeat sequences in human genes (such as huntingtin, the defective gene in Huntington’s chorea) implies that we should look for frameshifted translation in ourselves as well as in viruses. In the case of mutant huntingtin, frameshifting in the abnormally expanded CAG tracts produces proteins containing polyalanine or polyserine-arginine tracts.
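Why is an expanded CAG tract a natural place for this? A run of a single triplet reads as a run in every frame. The minimal Python sketch below assumes an idealized, uninterrupted CAG run (the length of 12 repeats is arbitrary, and real tracts and the reported frameshift products can be messier) and reads it in all three frames with the standard genetic code; the names are illustrative.

```python
from itertools import product

# Standard genetic code, T/C/A/G order with the third base varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_frame(seq: str, offset: int) -> str:
    """Translate every complete codon starting at `offset`; '*' marks a stop codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(offset, len(seq) - 2, 3))


cag_tract = "CAG" * 12  # idealized tract; real ones may contain interruptions
for offset in range(3):
    print(offset, translate_frame(cag_tract, offset))
```

Frame 0 gives polyglutamine (CAG), the +1 frame polyserine (AGC repeats) and the +2 frame polyalanine (GCA repeats). Since there are only three frames, a -2 shift lands in the same frame as +1, and a -1 shift in the same frame as +2.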

Well, G, A, T and C are the 1’s and 0’s of the way genetic information is stored in our genomic computer. It really isn’t surprising that the genome can be read in alternate frames. In the old days, textual information sent as bytes carried extra bits (start, stop and parity bits) to make sure the 1’s and 0’s were read correctly and in the correct frame. There is nothing like that in our genome (except for the 3 stop codons).

What is truly surprising is that reading in an alternate frame produces ‘meaningful’ proteins. This gets us into philosophical waters. Clearly

Erf oxa ndd oga teo urp etr at

Rfo xan ddo gat eou rpe tra t

aren’t meaningful to us. Yet gag and pol are quite meaningful (even life-and-death meaningful) to the AIDS virus. So ‘meaningful’ in the biologic sense means able to function in the larger context of the cell. The same is really true of linguistic meaning. You have to know a lot about the world (and speak English) for the word cat to be meaningful to you. So meaning can never be defined by the word itself. Probably the same is true for concepts as well, but I’ll leave that to the philosophers, or to any who choose to comment on this.