
Triplets and TADs

Neurologists have long been interested in triplet diseases (https://en.wikipedia.org/wiki/Trinucleotide_repeat_disorder).  The triplet is a string of 3 nucleotides, for example cytosine adenosine guanosine or CAG, which accounts for a lot of them.  We have lots of places in our genome where such repeats normally occur, with the triplets repeated up to 42 times.  However in diseases like Huntington’s chorea the repeats get to be as many as 250 CAGs in a row.  You normally are quite fine as long as you have under 36 of them, and no one has fewer than 6 at this particular location.
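To make the counting concrete, here is a minimal sketch that finds the longest uninterrupted run of CAG triplets in a DNA string and compares it with the "under 36" figure quoted above. The function names and the toy sequence are made up for illustration; this is not how clinical repeat sizing is actually done.

```python
import re

def longest_cag_run(seq: str) -> int:
    """Return the largest number of consecutive CAG triplets found in seq."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)

def in_normal_range(n_repeats: int, threshold: int = 36) -> bool:
    """True when the repeat count is under the threshold quoted in the text."""
    return n_repeats < threshold

# Hypothetical toy sequence: 20 CAGs, an interruption (CAA), then one more CAG.
toy = "GCC" + "CAG" * 20 + "CAACAG" + "CCGCCA"
n = longest_cag_run(toy)
print(n, in_normal_range(n))
```

The interruption by CAA ends the first run, so only the uninterrupted stretch is counted.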

Subsequently, expansions of 4, 5, and 6 nucleotide repeats have also been shown to cause disease, bringing the total of repeat expansion diseases to over 40.  Why more than half of them should affect the nervous system entirely or for the most part is a mystery.  Needless to say there are plenty of theories.

This leads to three questions: (1) There are repeats all over the genome, so why do only 40 or so of them expand? (2) Since we all have repeats in front of the genes where they cause disease, why don’t we all have the diseases? (3) Why does the number of repeats expand with each succeeding generation (the phenomenon is called anticipation)?  I saw one such example when a father brought his son to my muscular dystrophy clinic.  The boy had moderately severe myotonic dystrophy.  When I shook the father’s hand, it was clear that he had mild myotonia, which had in no way impaired his life (he was a successful banker).

A recent paper in Cell may help answer the first question and has a hint about the second [ Cell vol. 175 pp. 38 – 40, 224 – 238 ’18 ].  21 of 27 disease associated short tandem repeats (daSTRs) localize to something called a topologically associated domain (TAD) or subdomain (subTAD) boundary. These domains are defined as contiguous intervals in the genome in which every pair of loci has an elevated interaction frequency compared to loci outside the domain.  TADs and subTADs are measured using chromosome conformation capture assays (acronyms for them include 3C, CCC, 4C, 5C, Hi-C).
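The definition of a domain can be illustrated with a toy contact matrix. The sketch below is an assumption-laden cartoon: the bin counts, the contact values (10 inside a domain, 1 outside), and the function names are all invented, and real Hi-C or 5C data are far noisier.

```python
def toy_contacts(domain_sizes, inside=10.0, outside=1.0):
    """Build a square contact matrix with high values inside each domain."""
    labels = [d for d, size in enumerate(domain_sizes) for _ in range(size)]
    return [[inside if a == b else outside for b in labels] for a in labels]

def mean_contact(matrix, rows, cols):
    """Average contact frequency over pairs (i, j), skipping the diagonal."""
    vals = [matrix[i][j] for i in rows for j in cols if i != j]
    return sum(vals) / len(vals)

# Two hypothetical domains of 4 genomic bins each (bins 0-3 and bins 4-7).
contacts = toy_contacts([4, 4])
within = mean_contact(contacts, range(0, 4), range(0, 4))
across = mean_contact(contacts, range(0, 4), range(4, 8))
print(within, across)
```

Pairs of bins inside the first interval interact more often than pairs that span the two intervals, which is what the definition asks for.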

Briefly they are performed as follows.  Intact nuclei are isolated from live cell cultures.  These are subjected to paraformaldehyde crosslinking to fix segments of the genome in close physical proximity. The crosslinked genomic DNA is digested with a restriction endonuclease, the cut ends held near each other by the crosslinks are ligated together, and the products are amplified by PCR using primers in all possible combinations.  Then, having a complete genome sequence in hand, you see what regions of the genome got close enough together to show up in the assay.
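The digestion step can be mimicked in a few lines. This is only a sketch: the text does not say which restriction enzyme is used, so HindIII (recognition site AAGCTT, cutting after the first A) is an assumption chosen purely for illustration, and the sequence is invented.

```python
def digest(seq, site="AAGCTT", cut_offset=1):
    """Cut seq at every occurrence of site, cut_offset bases into the site.

    The defaults model HindIII (A^AGCTT); the enzyme choice is an assumption.
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments

# Hypothetical stretch of DNA containing two HindIII sites.
dna = "GGGAAGCTTCCCAAGCTTTT"
pieces = digest(dna)
print(pieces)
```

In the real assay it is the fragments held together by crosslinks that matter; this sketch only shows how a sequence is chopped at the recognition sites.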

This may help explain question one, and the paper gives some speculation about question two: we don’t all have these diseases because, unlike the unfortunates with them, we don’t have problems in our genes for DNA replication, repair and recombination.  There is some evidence for this;  model organisms with such mutations do show short tandem repeat instability.

Unfortunately the paper doesn’t discuss anticipation, because no clinicians appear to be among the authors, even though they’re from Penn which 50+ years ago was very strong in clinical neurology.

None of this work discusses the fascinating questions of how the expanded repeats cause disease, or why so many of them affect the nervous system.

The Kavanaugh Ford confrontation will be to this decade what the Patty Hearst kidnapping was to a previous one  — https://en.wikipedia.org/wiki/Patty_Hearst.  Since I suffered 4 episodes of physical (not sexual) abuse as a kid, and dealt with this extensively as a neurologist, I’m trying to decide whether to write about it.  Emotions are high and there are a lot of nuts out there on the net. There is even a reasonable possibility that both Ford and Kavanaugh are right and not lying.

Activating a proto-oncogene without mutating it

Many proto-oncogenes have to be mutated to cause cancer. Not so the TAL1 and LMO2 genes. They drive blood formation, and are aberrantly activated (i.e. more protein is made from them) in T cell Acute Lymphoblastic Leukemia (TALL). [ Science vol. 351 pp. 1298- 1299, 1454 – 1458 ’16 ] activated them experimentally using the CRISPR technique, and therein hangs a tale.

Addendum 11 April: LMO2 is well known to gene therapists, as early work (2002) using retroviruses inserted randomly in the genome to cure SCID (Severe Combined Immunodeficiency) resulted in TALL in 4 kids.  The problem was that the vector integrated in multiple sites all over the genome, and one such random site turned on expression of LMO2.

I’ve written a series of six posts trying to imagine the incredible mass of DNA in a 10 micron nucleus on a human scale — we take it for granted, but it’s far from obvious how this is accomplished — here’s the link to the first — https://luysii.wordpress.com/2010/03/22/the-cell-nucleus-and-its-dna-on-a-human-scale-i/. — just follow the links to the rest.

[ Cell vol. 153 pp. 1187 – 1189, 1281 – 1295 ’13 ] Hi-C and 5C (Carbon Copy Chromosome Conformation Capture) allow determination of chromatin organization and long range chromatin interactions in an unbiased genome wide manner at the megaBase scale. Topologically associated domains (TADs) are the way the genome in the nucleus is organized into megabase to submegaBase sized interacting domains. TADs are conserved between species and are invariant across cell types. [ Cell vol. 156 p. 19 ’14 ] They average 700 – 800 kiloBases and are said to contain 5 – 10 protein coding genes and a few hundred enhancers. The expression of genes within a TAD is ‘somewhat correlated’. Some TADs have active genes, while others have repressed genes. Genomic interactions are strong within a domain, but are sharply depleted on crossing the boundary between two TADs.
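The sharp depletion of contacts across a boundary is what lets boundaries be located from the data. Below is a rough sketch of one common idea, an insulation-style score: slide along the genome and add up contacts between the bins just upstream and just downstream of each position, then take the position where that sum is lowest. This is not necessarily the method used in the cited papers, and the matrix, values, and function names are invented.

```python
def toy_contacts(domain_sizes, inside=10.0, outside=1.0):
    """Build a square contact matrix with high values inside each domain."""
    labels = [d for d, size in enumerate(domain_sizes) for _ in range(size)]
    return [[inside if a == b else outside for b in labels] for a in labels]

def cross_contacts(matrix, b, window):
    """Sum contacts between the window bins left of b and the window bins from b on."""
    left = range(b - window, b)
    right = range(b, b + window)
    return sum(matrix[i][j] for i in left for j in right)

def find_boundary(matrix, window=2):
    """Return the bin index where contacts across the position are lowest."""
    n = len(matrix)
    candidates = range(window, n - window + 1)
    return min(candidates, key=lambda b: cross_contacts(matrix, b, window))

# Hypothetical matrix: one domain covering bins 0-2, another covering bins 3-7.
matrix = toy_contacts([3, 5])
boundary = find_boundary(matrix)
print(boundary)
```

With real data one would look for local minima along the whole chromosome rather than a single global minimum.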

Well TADs have to be separated from each other. The current thinking is that the boundaries are formed by sites in the DNA which bind the CTCF protein, and possibly cohesin proteins as well. CTCF is a large protein (although maddeningly I can’t seem to find out how many amino acids it has) with a molecular mass of 80 kiloDaltons. Its DNA binding is quite specific as it contains 11 zinc fingers (each of which can specifically bind a 3 nucleotide stretch of DNA). In addition to binding to DNA it can bind to itself, forming a perfect way to form loops of DNA.

All the Science paper did was to delete a few CTCF binding sites using the CRISPR technique around the two oncogenes and bang — expression increased. Why?  Because the insulation between the TAD containing the genes and adjacent TADs was broken, allowing control of the genes by enhancers in the new and larger TAD that had been previously sequestered in an adjacent TAD.  The deletions were thousands of basepairs away from the coding sequence of the genes themselves.  All very nice, but it’s fairly artificial.
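The logic of the experiment can be captured in a toy model where boundaries chop the genome into TADs and an enhancer can only reach a gene in the same TAD. Everything here is hypothetical: the positions (in kilobases), the single enhancer and gene, and the all-or-nothing insulation rule are simplifications for illustration only.

```python
from bisect import bisect_right

def tad_index(position, boundaries):
    """Return which TAD a position falls in, given boundary positions."""
    return bisect_right(sorted(boundaries), position)

def same_tad(a, b, boundaries):
    """True when positions a and b are not separated by any boundary."""
    return tad_index(a, boundaries) == tad_index(b, boundaries)

# Hypothetical positions in kilobases.
boundaries = [100, 200, 300]   # CTCF boundary sites
gene, enhancer = 150, 230      # gene in one TAD, enhancer in the next

before = same_tad(gene, enhancer, boundaries)

# Delete the CTCF site at 200 kb: the two neighboring TADs merge.
after_deletion = [b for b in boundaries if b != 200]
after = same_tad(gene, enhancer, after_deletion)

print(before, after)
```

Before the deletion the enhancer is sequestered in the adjacent TAD; afterward it shares a larger TAD with the gene, even though the gene itself was never touched.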

However the paper notes that across a large pan-cancer cohort, there was a 2 fold enrichment for boundary CTCF site mutations.