Tag Archives: TADs

The RNA world strikes again

Life is said to have originated in the RNA world.  We all know about the big 3 RNAs important to the cell: mRNA, ribosomal RNA and transfer RNA.  But just like the water, sewer, power and subway systems under Manhattan, there is another world down there in the cell, and it is just beginning to come into focus.

I’ve written several posts about the RNA world in our cells (links at the end), but the latest is really staggering: RNA is helping to organize how our DNA lies in the nucleus.

As usual, the discoveries depended on new technologies, RD-SPRITE in this case (you don’t want to know what the acronym stands for; by the bye, have you noticed how many more acronyms are appearing in the papers you read?).  It is extremely complex, but the technique is said to be able to map thousands of RNA and DNA molecules simultaneously, at high resolution, relative to all the other RNA and DNA molecules.  Details in Cell vol. 184 pp. 5775 – 5790 ’21.

The count of long nonCoding (for protein, that is) RNAs (lncRNAs) is now in the tens of thousands [ Science vol. 373 pp. 623 – 624 ’21 ]. They have all sorts of functions, but the present work shows that 93% of them stay close to the gene that transcribes them in the nucleus.  There they bind proteins in precise territories of the nucleus (precise because the genes for the lncRNAs themselves sit in equally precise nuclear territories).  This establishes functional compartments in the nucleus that regulate gene expression.

Interestingly, long nonCoding RNAs are transcribed at very low levels, which led people to dismiss them as chaff.  Their binding of proteins explains how so few molecules can do so much.

That’s pretty abstract.  Consider Xist, a large nonCoding RNA which inactivates one of the two X chromosomes in females.  Just two Xist molecules are able to seed a multiprotein cloud around the Xist locus on the X.

Jpx, to be described later, is crucial in establishing TADs (topologically associated domains).

Here are some older posts on the RNA world

Forgotten but not gone

Forgotten but not gone — take II

Forgotten but not gone — take III

Triplets and TADs

Neurologists have long been interested in triplet diseases (https://en.wikipedia.org/wiki/Trinucleotide_repeat_disorder).  The triplet is a string of 3 nucleotides.  One example, cytosine adenine guanine or CAG, accounts for a lot of them.  We have lots of places in our genome where such repeats normally occur, with the triplet repeated up to 42 times.  However, in diseases like Huntington’s chorea the repeats get to be as many as 250 CAGs in a row.  You are normally quite fine as long as you have fewer than 36 of them, and no one has fewer than 6 at this particular location.
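To make the counting concrete, here is a minimal Python sketch that finds the longest uninterrupted CAG run in a DNA string and compares it with the 36-repeat figure above.  The function names and toy sequences are hypothetical, and the single cutoff is a simplification: real clinical interpretation of repeat length is more graded than "normal" versus "expanded".

```python
import re

# Assumption taken from the text: at the Huntington's locus, fewer than 36 CAGs is normal.
EXPANDED_THRESHOLD = 36


def longest_cag_run(seq: str) -> int:
    """Length, in CAG units, of the longest uninterrupted CAG run in seq."""
    # (?:CAG)+ matches maximal back-to-back copies of CAG.  CAG cannot overlap
    # itself, so the non-overlapping matches returned by findall cover every run.
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify(seq: str) -> str:
    """Crude two-way call based only on the longest CAG run (hypothetical helper)."""
    n = longest_cag_run(seq)
    return "expanded" if n >= EXPANDED_THRESHOLD else "normal range"


normal = "ATG" + "CAG" * 20 + "CCG"
expanded = "ATG" + "CAG" * 40 + "CCG"
print(longest_cag_run(normal), classify(normal))      # 20 normal range
print(longest_cag_run(expanded), classify(expanded))  # 40 expanded
```

Counting the longest run, rather than all CAGs in the string, matters because scattered CAGs elsewhere in a sequence are not a repeat expansion.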

Subsequently, expansions of 4, 5, and 6 nucleotide repeats have also been shown to cause disease, bringing the total number of repeat expansion diseases to over 40.  Why more than half of them should affect the nervous system, entirely or for the most part, is a mystery.  Needless to say, there are plenty of theories.

This leads to three questions: (1) there are repeats all over the genome, so why do only 40 or so of them expand? (2) since we all have repeats in front of the genes where they cause disease, why don’t we all have the diseases? (3) why does the number of repeats expand with each succeeding generation, a phenomenon called anticipation?  I saw one such example when a father brought his son to my muscular dystrophy clinic.  The boy had moderately severe myotonic dystrophy.  When I shook the father’s hand, it was clear that he had mild myotonia, which had in no way impaired his life (he was a successful banker).

A recent paper in Cell may help answer the first question and has a hint about the second [ Cell vol. 175 pp. 38 – 40, 224 – 238 ’18 ].  Of 27 disease-associated short tandem repeats (daSTRs), 21 localize to the boundary of something called a topologically associated domain (TAD) or subdomain (subTAD).  TADs are defined as contiguous intervals in the genome in which every pair of loci has an elevated interaction frequency compared to loci outside the domain.  TADs and subTADs are measured using chromosome conformation capture assays (acronyms for these include 3C, CCC, 4C, 5C and Hi-C).
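The definition above (loci inside a domain contact each other more than they contact loci outside it) suggests a simple way to find boundaries in a binned contact matrix: slide a small window along the diagonal, average the contacts that cross each candidate position, and look for dips.  The Python sketch below does this in the spirit of the commonly used "insulation score".  It is an illustration on a made-up matrix, not the method used in the Cell paper, and the window size and helper names are assumptions.

```python
def toy_matrix(domain_sizes, inside=10, outside=1):
    """Hypothetical contact matrix: high counts within a domain, low counts between domains."""
    labels = [d for d, size in enumerate(domain_sizes) for _ in range(size)]
    n = len(labels)
    return [[inside if labels[i] == labels[j] else outside for j in range(n)]
            for i in range(n)]


def crossing_scores(contacts, window):
    """For each position p (the gap between bin p-1 and bin p), the mean contact
    count between the `window` bins just before p and the `window` bins just after it."""
    n = len(contacts)
    scores = {}
    for p in range(window, n - window + 1):
        cross = [contacts[a][b]
                 for a in range(p - window, p)
                 for b in range(p, p + window)]
        scores[p] = sum(cross) / len(cross)
    return scores


def call_boundaries(contacts, window=2):
    """Positions whose crossing score is a strict local minimum (a dip in contacts)."""
    scores = crossing_scores(contacts, window)
    ps = sorted(scores)
    return [p for prev, p, nxt in zip(ps, ps[1:], ps[2:])
            if scores[p] < scores[prev] and scores[p] < scores[nxt]]


# Two domains of 4 bins each: the boundary falls between bin 3 and bin 4.
print(call_boundaries(toy_matrix([4, 4])))      # [4]
print(call_boundaries(toy_matrix([3, 5, 4])))   # [3, 8]
```

Because only strict local minima are reported, a domain narrower than the window can be missed; the window size trades sensitivity against noise.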

Briefly, they are performed as follows.  Intact nuclei are isolated from live cell cultures.  These are crosslinked with paraformaldehyde to fix segments of the genome that lie in close physical proximity.  The crosslinked genomic DNA is digested with a restriction endonuclease, and the cut ends are religated, so that fragments which were physically close get joined to each other.  The ligation products are then amplified by PCR using primers in all possible combinations.  Then, having a complete genome sequence in hand, you see which regions of the genome got close enough together to show up in the assay.
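In the genome-wide version of the assay (Hi-C), the ligation junctions are sequenced rather than probed with a fixed set of primers, and each read pair is mapped back to the reference genome.  Tallying the mapped pairs into fixed-size bins gives the contact matrix from which TADs are called.  A minimal Python sketch for a single chromosome, assuming the reads have already been mapped to 0-based (position1, position2) coordinates; the bin size and read pairs are invented for illustration.

```python
def contact_matrix(read_pairs, chrom_length, bin_size):
    """Count mapped ligation-junction read pairs into a symmetric bin-by-bin matrix.
    Positions are assumed to be 0-based and less than chrom_length."""
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in read_pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:  # a contact between two different bins is recorded in both halves
            matrix[j][i] += 1
    return matrix


# Hypothetical pairs on a 3,000 bp chromosome, 1,000 bp bins.
pairs = [(100, 2500), (150, 180), (2600, 2700)]
for row in contact_matrix(pairs, chrom_length=3000, bin_size=1000):
    print(row)
# [1, 0, 1]
# [0, 0, 0]
# [1, 0, 1]
```

Real pipelines add filtering and normalization on top of this, but the binned count matrix is the object in which domains show up as blocks along the diagonal.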

This may help answer question one, and the paper offers some speculation about question two: we don’t all have these diseases because, unlike the unfortunates who do, we don’t have problems in our genes for DNA replication, repair and recombination.  There is some evidence for this; model organisms carrying mutations in these genes do show short tandem repeat instability.

Unfortunately the paper doesn’t discuss anticipation, as no clinicians appear to be among the authors, even though they’re from Penn, which 50+ years ago was very strong in clinical neurology.

None of this work discusses the fascinating questions of how the expanded repeats cause disease, or why so many of them affect the nervous system.

The Kavanaugh Ford confrontation will be to this decade what the Patty Hearst kidnapping was to a previous one  — https://en.wikipedia.org/wiki/Patty_Hearst.  Since I suffered 4 episodes of physical (not sexual) abuse as a kid, and dealt with this extensively as a neurologist, I’m trying to decide whether to write about it.  Emotions are high and there are a lot of nuts out there on the net. There is even a reasonable possibility that both Ford and Kavanaugh are right and not lying.

Activating a proto-oncogene without mutating it

Many proto-oncogenes have to be mutated to cause cancer. Not so the TAL1 and LMO2 genes. They drive blood formation, and are aberrantly activated (i.e., more protein is made from them) in T cell Acute Lymphoblastic Leukemia (TALL). A recent paper [ Science vol. 351 pp. 1298 – 1299, 1454 – 1458 ’16 ] activated them experimentally using the CRISPR technique, and therein hangs a tale.

Addendum 11 April: LMO2 is well known to gene therapists.  Early work (2002) used retroviruses, which insert randomly into the genome, to cure SCID (Severe Combined Immunodeficiency), and it resulted in TALL in 4 kids.  The problem was that the vector integrated at multiple sites all over the genome, and one such random site turned on expression of LMO2.

I’ve written a series of six posts trying to imagine the incredible mass of DNA in a 10 micron nucleus on a human scale.  We take it for granted, but it’s far from obvious how this is accomplished.  Here’s the link to the first (just follow the links to the rest): https://luysii.wordpress.com/2010/03/22/the-cell-nucleus-and-its-dna-on-a-human-scale-i/

[ Cell vol. 153 pp. 1187 – 1189, 1281 – 1295 ’13 ] Hi-C and 5C (Carbon Copy Chromosome Conformation Capture) allow determination of chromatin organization and long range chromatin interactions in an unbiased, genome-wide manner at the megaBase scale. Topologically associated domains (TADs) are the way the genome in the nucleus is organized into megaBase to submegaBase sized interacting domains. TADs are conserved between species and are invariant across cell types. [ Cell vol. 156 p. 19 ’14 ] They average 700 – 800 kiloBases and are said to contain 5 – 10 protein coding genes and a few hundred enhancers. The expression of genes within a TAD is ‘somewhat correlated’. Some TADs have active genes, while others have repressed genes. Genomic interactions are strong within a domain, but are sharply depleted on crossing the boundary between two TADs.
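A back-of-the-envelope check on these numbers, assuming (unrealistically) that TADs tile the whole haploid human genome of roughly 3.1 billion basepairs with no gaps, and taking 750 kiloBases as the average size:

```latex
\[
N_{\text{TAD}} \approx \frac{3.1 \times 10^{9}\ \text{bp}}{7.5 \times 10^{5}\ \text{bp}} \approx 4 \times 10^{3},
\qquad
N_{\text{genes}} \approx \left(4 \times 10^{3}\right) \times (5 \text{ to } 10) \approx 2 \times 10^{4} \text{ to } 4 \times 10^{4}
\]
```

The low end is in the same ballpark as the roughly 20,000 protein coding genes usually quoted for humans, so the figures hang together.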

Well, TADs have to be separated from each other. The current thinking is that the boundaries are formed by sites in the DNA which bind the CTCF protein, and possibly cohesin proteins as well. CTCF is a large protein (although maddeningly I can’t seem to find out how many amino acids it has) with a molecular mass of 80 kiloDaltons. Its DNA binding is quite specific, as it contains 11 zinc fingers (each of which can specifically bind a 3 nucleotide stretch of DNA). In addition to binding DNA, it can bind to itself, which makes it a perfect way to form loops of DNA.

All the Science paper did was delete a few CTCF binding sites around the two oncogenes using the CRISPR technique, and bang, expression increased.  Why?  Because the insulation between the TAD containing the genes and the adjacent TADs was broken, bringing the genes under the control of enhancers in the new, larger TAD, enhancers which had previously been sequestered in an adjacent TAD.  The deletions were thousands of basepairs away from the coding sequences of the genes themselves.  All very nice, but it’s fairly artificial.
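The logic of the experiment can be captured in a deliberately all-or-nothing toy model: treat each CTCF boundary as a wall, let an enhancer act on a gene only if no wall separates them, and see what deleting one wall does.  This Python sketch is an illustration under those assumptions, not the paper's analysis; real insulation is partial, and the coordinates and helper names below are hypothetical.

```python
import bisect


def domain_of(position, boundaries):
    """Index of the domain containing position; boundaries must be sorted."""
    return bisect.bisect_right(boundaries, position)


def enhancer_reaches_gene(gene_pos, enhancer_pos, boundaries):
    """All-or-nothing model: an enhancer acts only on genes in its own domain."""
    return domain_of(gene_pos, boundaries) == domain_of(enhancer_pos, boundaries)


def delete_boundary(boundaries, site):
    """Return a new boundary list with one CTCF site removed (the CRISPR deletion)."""
    return [b for b in boundaries if b != site]


# Hypothetical coordinates (bp): gene in one TAD, enhancer in the next one over.
boundaries = [100_000, 900_000, 1_700_000]
gene, enhancer = 500_000, 1_200_000

print(enhancer_reaches_gene(gene, enhancer, boundaries))  # False
merged = delete_boundary(boundaries, 900_000)
print(enhancer_reaches_gene(gene, enhancer, merged))      # True
```

Note that the deleted site need not be anywhere near the gene itself, which fits the observation that the deletions were thousands of basepairs from the coding sequence.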

However, the paper notes that across a large pan-cancer cohort there was a 2-fold enrichment for mutations in boundary CTCF sites.