Has the great white whale of oncology finally been harpooned?

The ras oncogene is the great white whale of oncology. Mutations in 20 – 40% of cancers turn its activity on so that nothing can turn it off, resulting in cellular proliferation. People have been trying to turn mutated ras off for years with no success.

A current paper [ Cell vol. 165 pp. 643 – 655 ’16 ] describes a new and different way to attack it. Once ras is turned on (either naturally or by mutation) many other proteins must bind to it to produce their effects — they are called RAS effectors, among which are the uneuphoniously named RAF, RalGDS and PI3K. They bind to activated ras via the cleverly named Ras Binding Domain (RBD) which has 78 amino acids.

The paper describes rigosertib, a not that complicated molecule to the chemist, which inhibits the binding (by resembling the site on ras that the RBD binds to). It is a styryl benzyl sulfone and you can see the structure here — https://en.wikipedia.org/wiki/Rigosertib.

What’s good about it? Well it is in phase III trials for a fairly uncommon form of cancer (myelodysplastic syndrome). That means it isn’t horribly toxic or it wouldn’t have made it out of phase I.

Given the mechanism described, it is possible that Rigosertib will be useful in 20 – 40% of all cancer. Can you say blockbuster drug?

Do you have a speculative bent? Buy the company testing the drug and owning the patent — Onconova Therapeutics. It’s quite cheap — trading at $.40 (yes 40 cents !). It once traded as high as $30.00 — symbol ONTX. I don’t own any (yet), but for the price of a movie with a beer and some wings afterwards you could be the proud owner of 100 shares. If Rigosertib works, the stock will certainly increase more than a hundredfold.

Enough kidding around. This is serious business. In what follows you will find some hardcore molecular biology and cellular physiology showing just what we’re up against. Some of the following is quite old, and probably out of date (like yours truly), but it does give you the broad outlines of what is involved.

The pathway from Ras to the nucleus

The components of the pathway had been found in isolation (primarily because mutations in them were associated with malignancy). Ras was discovered as an oncogene in various sarcoma viruses. Mutations in ras found in tumors left it in a ‘turned on’ state, but just how ras (and everything else) fit into the chain of binding of a growth factor (such as platelet derived growth factor, epidermal growth factor, insulin, etc. etc.) to its receptor on the cell surface to alterations in gene expression wasn’t clear. It is certain to become more complicated, because anything as important as cellular proliferation is very likely to have a wide variety of control mechanisms superimposed on it. Although all sorts of protein kinases are involved in the pathway it is important to remember that ras is NOT a protein kinase.

1. The first step is binding of a growth factor to its receptor on the cell surface. The receptor is usually a tyrosine kinase. Binding of the factor to the receptor causes ‘activation’ of the receptor, which usually means an increase in its tyrosine kinase activity. The increase in activity is usually brought about by dimerization of the receptor (so it phosphorylates itself on tyrosine).

2. Most activated growth factor receptors phosphorylate themselves (as well as other proteins) on tyrosine. A variety of other proteins have domains known as SH2 (for src homology 2) which bind to phosphorylated tyrosine.

3. A protein called grb2 binds via its SH2 domain to a phosphorylated tyrosine on the receptor. Grb2 binds to the polyproline domain of another protein called sos1 via its SH3 domain. At this point, the uninitiated must find the proceedings pretty hokey, but the pathway is so general (and fundamental) that proteins from yeast may be substituted into the human pathway and still have it work.

4. At last we get to ras. This protein is ‘active’ when it binds GTP, and inactive when it binds GDP. Ras is a GTPase (it can hydrolyze GTP to GDP). Most mutations which make ras an oncogene decrease the GTPase activity of RAS leaving it in a permanently ‘turned on’ state. It is important for the neurologist to know that the gene which is defective in type I neurofibromatosis normally codes for a protein (neurofibromin) which activates the GTPase activity of ras, turning ras off. Deficiencies in this protein (and hence in ras inactivation) lead to a variety of unusual tumors familiar to neurologists.

Once RAS has hydrolyzed GTP to GDP, the GDP remains bound to RAS, inactivating it. This is where sos1 comes in. It catalyzes the exchange of GDP for GTP on ras, thus activating ras.

5. What does activated ras do? It activates Raf-1 silly. Raf-1 is another oncogene. How does activated ras activate Raf-1 ?  Ras appears to activate raf by causing raf to bind to the cell membrane (this doesn’t happen in vitro as there is no membrane). Once ras has done its job of localizing raf to the plasma membrane, it is no longer required. How membrane localization activates raf is less than crystal clear. [ Proc. Natl. Acad. Sci. vol. 93 pp. 6924 – 6928 ’96 ] There is increasing evidence that Ras may mediate its actions by stimulating multiple downstream targets of which Raf-1 is only one.

6. Raf-1 is a protein kinase. Protein kinases work by adding phosphate groups to serine, threonine or tyrosine. In general protein kinases fall into two classes: those phosphorylating on serine or threonine and those phosphorylating on tyrosine. Biochemistry has a well documented series of examples of enzymes being activated (or inhibited) by phosphorylation. The best worked out is the pathway from the binding of epinephrine to its cell surface receptor to glycogen breakdown. There is a whole sequence of one enzyme phosphorylating another which then phosphorylates a third. Something similar goes on between Raf-1 and a collection of protein kinases called MAPKs (mitogen activated protein kinases). These were discovered as kinases activated when mitogens bound to their extracellular receptors. There may be a kinase lurking about which activates Raf (it isn’t Ras which has no kinase activity). Removal of phosphate from Raf (by phosphatases) inactivates it.

7. Raf-1 activates members of the MAPK family by phosphorylating them. There may be several kinases in a row phosphorylating each other. [ Science vol. 262 pp. 1065 – 1067 ’93 ] At least three kinase reactions are known at this point in the pathway. It isn’t known if some can be sidestepped. Raf-1 activates mitogen activated protein kinase kinase (MAPK-K) by phosphorylation (it is called MEK in the ras pathway). MAPK-K activates mitogen activated protein kinase (MAPK) by phosphorylation. Thus Raf-1 is actually mitogen activated protein kinase kinase kinase (sort of like the character in Catch-22 named Major Major Major). (Raf-1 is now usually called CRAF or c-Raf; BRAF is a separate member of the same Raf family.)

8. The final step in the pathway is activation of transcription factors (which turn genes off or on) by MAP kinases by (what else) phosphorylation. Thus the pathway from cell surface to nucleus is complete.
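The eight steps above boil down to a switch (ras) sitting on top of a relay (the kinases). Here is a toy sketch of that logic in Python. It is only a cartoon: the class and function names are made up for illustration, the receptor, grb2 and sos1 are collapsed into a single call, and nothing here models real kinetics.

```python
from dataclasses import dataclass

# Steps 5 to 8 of the text: each member switches on the next one.
CASCADE = ["Raf-1 (MAPK-K-K)", "MEK (MAPK-K)", "MAPK", "transcription factors"]


@dataclass
class Ras:
    nucleotide: str = "GDP"     # GDP-bound ras is off, GTP-bound ras is on
    gtpase_works: bool = True   # False is a stand-in for an oncogenic mutant

    @property
    def active(self) -> bool:
        return self.nucleotide == "GTP"

    def exchange(self) -> None:
        """sos1 swaps the bound GDP for GTP (step 4), turning ras on."""
        self.nucleotide = "GTP"

    def hydrolyze(self) -> None:
        """Ras's own GTPase turns it off, unless a mutation has crippled it."""
        if self.gtpase_works:
            self.nucleotide = "GDP"


def downstream_targets(ras: Ras) -> list:
    """Cascade members that are switched on for the current state of ras."""
    return list(CASCADE) if ras.active else []


normal, mutant = Ras(), Ras(gtpase_works=False)
for ras in (normal, mutant):
    ras.exchange()    # growth factor signal arrives via receptor, grb2 and sos1
    ras.hydrolyze()   # later on, the signal is gone

print("normal ras:", downstream_targets(normal))
print("mutant ras:", downstream_targets(mutant))
```

The normal protein ends up off with nothing downstream activated, while the mutant stays GTP-bound and keeps the whole relay switched on, which is the point of step 4.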

Is that mutation significant?

Face it, our genomes are a real mess. A study of just the parts of the genome coding for amino acids (2% at most) in about 2,500 people found an average of 205 variants which change the amino acid coded for IN EACH PERSON. Each person also had an average of 3 termination codons in the 15,000+ protein coding sequences they studied. So they are wandering around with 3 abnormally short proteins. You can read more about it in this old post –https://luysii.wordpress.com/2012/07/31/how-badly-are-thy-genomes-oh-humanity/

Here’s the problem — these people were healthy. Obviously, not a problem for them, but a big problem for physicians attempting to do genetic counseling. For how it affected epilepsy counseling see — https://luysii.wordpress.com/2011/07/17/weve-found-the-mutation-causing-your-disease-not-so-fast-says-this-paper/.

This brings us to Lynch syndrome (aka Hereditary NonPolyposis Colorectal Cancer — HNPCC). It is a familial cancer syndrome, and we now know what the problem is — mutations in any of four genes involved in a type of DNA mutation repair (there are many). The genes are called MSH2, MSH6, MLH1 and PMS2 (all acronyms whose full names you don’t need to know) and the type of repair is called MisMatch Repair (MMR).

This isn’t academic at all. Suppose your aunt comes down with colon cancer and you get tested for mutations in one of the four, and a mutation is found. You’re fine now. The question before the house is — should you have your colon out? Colonoscopy won’t help because this kind of colon cancer doesn’t arise from polyps (which is what colonoscopy is looking for).

The problem is that the 4 genes are ‘peppered’ with missense variants (which change the amino acid coded for). They are called VUS (Variants of Unknown Significance). The following paper [ Proc. Natl. Acad. Sci. vol. 113 pp. 3918 – 3920, 4128 – 4133 ’16 ] used a clever way to test a VUS for significance. This would have been impossible 5 years ago. What they did was use CRISPR to introduce the variant into the appropriate gene in mouse Embryonic Stem cells. Then they tested the manipulated stem cells for defects in MisMatch Repair. They tested 59 (yes fifty-nine) such VUSs and found that about 1/3 (19) produced MMR defects.

Fascinating time to be alive and reading about all this stuff.

Activating a proto-oncogene without mutating it

Many proto-oncogenes have to be mutated to cause cancer. Not so the TAL1 and LMO2 genes. They drive blood formation, and are aberrantly activated (i.e. more protein is made from them) in T cell Acute Lymphoblastic Leukemia (TALL). [ Science vol. 351 pp. 1298 – 1299, 1454 – 1458 ’16 ] activated them experimentally using the CRISPR technique, and therein hangs a tale.

Addendum 11 April — LMO2 is well known to gene therapists as early work (2002) using retroviruses inserted randomly in the genome to cure SCID (Severe Combined Immunodeficiency) resulted in TALL in 4 kids. The problem was that the vector integrated in multiple sites all over the genome and one such random site turned on expression of LMO2.

I’ve written a series of six posts trying to imagine the incredible mass of DNA in a 10 micron nucleus on a human scale — we take it for granted, but it’s far from obvious how this is accomplished — here’s the link to the first — https://luysii.wordpress.com/2010/03/22/the-cell-nucleus-and-its-dna-on-a-human-scale-i/. — just follow the links to the rest.

[ Cell vol. 153 pp. 1187 – 1189, 1281 – 1295 ’13 ] Hi-C and 5C (Carbon Copy Chromosome Conformation Capture) allow determination of chromatin organization and long range chromatin interactions in an unbiased genome wide manner at the megaBase scale. Topologically associated domains (TADs) are the way the genome in the nucleus is organized into megabase to submegaBase sized interacting domains. TADs are conserved between species and are invariant across cell types. [ Cell vol. 156 p. 19 ’14 ] They average 700 – 800 kiloBases and are said to contain 5 – 10 protein coding genes and a few hundred enhancers. The expression of genes within a TAD is ‘somewhat correlated’. Some TADs have active genes, while others have repressed genes. Genomic interactions are strong within a domain, but are sharply depleted on crossing the boundary between two TADs.

Well TADs have to be separated from each other. The current thinking is that the boundaries are formed by sites in the DNA which bind the CTCF protein, and possibly cohesin proteins as well. CTCF is a large protein (although maddeningly I can’t seem to find out how many amino acids it has) with a molecular mass of 80 kiloDaltons. Its DNA binding is quite specific as it contains 11 zinc fingers (each of which can specifically bind a 3 nucleotide stretch of DNA). In addition to binding to DNA it can bind to itself, forming a perfect way to form loops of DNA.
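The two numbers in that paragraph invite a little arithmetic. The sketch below estimates the length of CTCF from its mass and the longest stretch of DNA its fingers could read. The figure of about 110 Daltons per amino acid is a common rule of thumb and is my assumption, not something from the papers cited, so treat the result as a ballpark only.

```python
# Rule-of-thumb average mass of one amino acid residue (an assumption).
AVG_RESIDUE_MASS_DA = 110

ctcf_mass_da = 80_000          # 80 kiloDaltons, from the text
zinc_fingers = 11              # from the text
nucleotides_per_finger = 3     # from the text

# Rough length of the protein in amino acids.
estimated_residues = round(ctcf_mass_da / AVG_RESIDUE_MASS_DA)

# Upper bound on the DNA read if every finger were engaged at once.
max_site_length = zinc_fingers * nucleotides_per_finger

print(f"estimated length: about {estimated_residues} amino acids")
print(f"longest possible recognition site: {max_site_length} nucleotides")
```

So a ballpark of some 700 amino acids, and a recognition site of at most 33 nucleotides if all 11 fingers were in contact with the DNA at the same time.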

All the Science paper did was to delete a few CTCF binding sites using the CRISPR technique around the two oncogenes and bang — expression increased. Why?  Because the insulation between the TAD containing the genes and adjacent TADs was broken, allowing control of the genes by enhancers in the new and larger TAD that had been previously sequestered in an adjacent TAD.  The deletions were thousands of basepairs away from the coding sequence of the genes themselves.  All very nice, but it’s fairly artificial.

However the paper notes that across a large pan-cancer cohort, there was a 2 fold enrichment for boundary CTCF site mutations.

That’s not a bug — that’s a feature

Back in the early days of computers you could own (aka personal computers) it wasn’t point and click, but hunt and peck, where commands in the early operating systems (DOS, etc.) had to be typed onto the command line using a keyboard. The interfaces were far from intuitive, to say the least, and the unexpected was always expected. When things went south software designers quickly learned to say “That’s not a bug, that’s a feature!”

Essentially the same thing has happened to the latest and greatest tool in genetic engineering, the CRISPR system. It’s fascinating that it has been hiding in plain sight for FOUR decades. In med school in the mid-60s the basic book about heredity and DNA was “Sexuality and the Genetics of Bacteria” (1961) by Francois Jacob. No one had any idea that DNA would be sequenced. Viruses of bacteria (called bacteriophages back then) were studied.

No one had any idea that bacteria could defend themselves against viruses, but defend they do by their CRISPR system. It’s only been known for a decade. Earlier papers on the subject by 3 different authors (Mojica, Gilles Vergnaud, Alexander Bolotin) were rejected before eventual publication.

Briefly, when a bacterium is infected by a virus, it makes a copy of fragments of the viral DNA, and pastes them into its own genome. On subsequent invasions, it uses the DNA copy to make RNA, which along with a complex enzyme binds to the genome of the new invader, and destroys it.

It turns out that a PAM (Protospacer Adjacent Motif) is crucial for the whole system to work. The copy stored in the bacterial DNA doesn’t have such a sequence next to it, so the system doesn’t attack the bacterium’s own genome. The PAM isn’t large (just 3 nucleotides in a row) and the system looks for it in invading viral DNA double helices.

But where does it look? On the side of the double helix with the least information — the minor groove.

Look at the following http://pharmafactz.com/wp/wp-content/uploads/2014/11/watson-crick-base-pairing.jpg

It shows classic Watson Crick base pairing — the major groove is a lot bigger (taking up 210 degrees, hardly a groove) and has more chemical information than the minor groove. So binding to the major groove is likely to be far more accurate (as well as easier because it’s a larger space).

So why does E. coli do this? Because different viruses contain different PAM sequences. [ Nature vol. 530 pp. 499 – 503 ’16 ] This is the crystal structure of the E. coli Cascade complex (the business end of CRISPR) bound to a foreign double stranded DNA target. The 5′ ATG PAM is recognized in duplex form, from the minor groove side, by 3 structural features in the Cse1 subunit of Cascade. The promiscuity inherent to minor groove DNA recognition explains how a single Cascade complex can respond to several distinct PAM sequences — this is a feature not a bug.
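To get a feel for how often a 3 nucleotide motif turns up, here is a minimal sketch that scans both strands of a piece of DNA for a PAM. It is a crude stand-in: Cascade recognizes the PAM structurally and tolerates several variants, while this code only does exact string matching. The function names and the example sequence are made up for illustration.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_pam_sites(dna: str, pam: str = "ATG") -> list:
    """List (strand, position) for every exact PAM match on either strand.

    Positions on the "-" strand are counted along the reverse complement.
    """
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        start = seq.find(pam)
        while start != -1:
            hits.append((strand, start))
            start = seq.find(pam, start + 1)
    return hits


example = "CCATGAACATTT"   # invented sequence, not from any real virus
print(find_pam_sites(example))
```

Even this 12 nucleotide toy sequence contains the motif three times once both strands are counted, which gives some idea of why a short PAM alone can't be the whole story and has to be combined with matching the RNA guide.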

When knowledge isn’t power

Here is a genetic disease, where we’ve known exactly what’s wrong with the causative gene for 23 years, and over 10,000 papers have been written (a Google search comes up with about 418,000 results in 0.45 seconds), but we don’t know how the mutation causes the problems it does or have a clue how to treat the disease. So much for finding the cause of a genetic disease leading to therapy. Imagine how much harder cancer is.

I speak of Huntington’s chorea, and the causative gene huntingtin. It’s a terrible neurologic disease characterized by progressive movement disorders, dementia and incapacitation over a decade or two. Woody Guthrie had it; fortunately Arlo escaped. Like many people with the disorder Woody was quite fertile, having 8 children.

It being a neurologic disorder, I’ve read a lot about it, and my jottings about my readings over the past few decades have consumed 83,635 characters (aren’t computers wonderful?). I’ve had a fair amount of experience with it, as an Indian agent in Montana had it, and produced many progeny with his women, leading to a good deal of devastation in one tribe.

Neuron vol. 89 pp. 910 – 926 ’16 is an excellent recent review (but not one for the fainthearted). Several mysteries are immediately apparent.

First, huntingtin is expressed in nearly every neuron, but only a few die. It is expressed outside the brain in lung, ovary and testes, but they work just fine.

Second, huntingtin interacts with over 350 different proteins. Figuring which are the important ones has provided steady employment.

Third, it exists in many forms, so many that there aren’t enough scientists living to test them all. This is because huntingtin is subject to a variety of chemical modifications (phosphorylation, ubiquitination, acetylation, palmitoylation, sumoylation) at FORTY-EIGHT different sites (listed in the article). So this gives 2^48 possible modified forms of the protein (each modification being either present or absent). 2^48 = 281,474,976,710,656 if you’re interested.
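The count is easy to check, and it's worth seeing how hopeless brute force would be. The sketch below takes the text's simplification that each of the 48 sites is independently either modified or not. The testing rate of one form per second is a number I made up purely to put the total in perspective.

```python
SITES = 48                           # modification sites, from the text
SECONDS_PER_YEAR = 365 * 24 * 3600

# Each site is either modified or not, so the count doubles with every site.
forms = 2 ** SITES

# Hypothetical: one lab testing one form every second, around the clock.
years_at_one_per_second = forms / SECONDS_PER_YEAR

print(f"{forms:,} possible forms")
print(f"about {years_at_one_per_second:,.0f} years at one form per second")
```

That works out to several million years of nonstop work, so "not enough scientists living" is, if anything, an understatement.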

In addition to the modifications, the protein is huge — some 3,144 amino acids occurring in 67 exons forming two mRNAs of 10,366 and 13,711 nucleotides.

Fourth, the protein can also be chopped up by at least 5 different enzymes at 6 different sites, and some fragments are biologically active (toxic in tissue culture).

Naturally, the region with the mutation (near the amino terminal end) of the protein has been studied most intensively.

Huntingtin has its fingers in many physiologic pies — the reference is excellent in this area — these include vesicular trafficking, cell division, cilia formation, endocytosis, autophagy, gene transcription. Abnormalities of which one cause the neurologic disease?

The mutant form forms protein aggregates. As with the senile plaque of Alzheimer’s disease or the Lewy body of Parkinson’s disease, we don’t know if the aggregates are toxic or protective.

Fifth: Despite all its known functions we don’t know if the mutation produces a loss of some vital function of Huntingtin, or a new and toxic function.

Even worse, compared to cancer, Huntington’s chorea is ‘simple’ because we know the cause.

The chemical ingenuity of the cell

If you know a bit of molecular biology, you know that messenger RNA (mRNA) has a tail of consecutive adenines added at its 3′ end. If you don’t know that much all the background you need can be found in https://luysii.wordpress.com/2010/07/07/molecular-biology-survival-guide-for-chemists-i-dna-and-protein-coding-gene-structure/ — just follow the links.

The adenines are not coded in the genome. Why? I’ve always thought of it as something preventing the mRNA from being broken down before the ribosome translates it into protein. Gradually the adenines are nibbled off by cytoplasmic nucleases. The literature seems to agree — from my notes on various sources

Most mRNAs in mammalian cells are quite stable and have a half life measured in hours, but others turn over within 10 to 30 minutes. The 5′ cap structure in mRNA prevents attack by 5′ exonucleases and the polyadenine (polyA) tail prohibits the action of 3′ exonucleases. The absence of a polyA tail is associated with rapid degradation of mRNA. Histone mRNAs lack a polyA tail but have near their 3′ terminus a sequence which can form a stem loop structure; this appears to confer resistance to exonucleolytic attack.

polyA — the polyAdenine tail found on most mRNAs must be removed before mRNA degradation can occur. Anything longer than 10 adenines in a row seems to protect mRNA. The polyA tail is homogenous in length in most species ( 70 – 90 in yeast, 220 – 250 nucleotides in mammalian cells). PolyA shortening can be separated into two phases, the first being the shortening of the tail down to 12 – 25 residues, and the second terminal deadenylation being the removal of some or all of them.

Molecular Biology of the Cell 4th Edition p. 449 — Once a critical threshold of tail shortening has been reached (about 30 As) the 5′ cap is removed (decapping) and the RNA is rapidly degraded. The proteins that carry out tail shortening compete directly with the machinery that catalyzes translation; therefore any factors increasing translation initiation efficiency increase mRNA stability. Many RNAs carry in their 3′ UTR binding sites for specific proteins that increase or decrease the rate of polyA shortening.

But why polyAdenine? Why not polyCytosine or PolyGuanine or polyUridine? Here’s where the chemical ingenuity comes in. Of the 64 possible codons only 3 tell the ribosome to stop. These are variously called termination codons, stop codons, and (idiotically) nonsense codons — they aren’t nonsense at all, and are functionally vital for the following reason. Stop codons cause the ribosome to separate into two parts releasing the mRNA and the protein. Suppose a given mRNA doesn’t have a stop codon? Then the ribosome and the mRNA remain stuck together, and future protein synthesis by that particular ribosome becomes impossible. Not good.

This is probably why the codons for stop are so similar (UAA, UAG and UGA) — mutating a G to an A in UAG or UGA gives UAA, and mutating either A in UAA to a G gives another stop codon. So the coding chosen for stop codons is somewhat resistant to mutation, because mRNAs without stop codons are disastrous for reasons shown above.

Well, randomness happens and suppose that the termination codon has been mutated to a codon for an amino acid. These are called nonStop RNAs which code for nonStop proteins. So the poor ribosome then translates the mRNA right to its 3′ end. Well what does AAA translate into — lysine. Lysine is quite basic and quickly becomes protonated on its epsilon amino group (even within the confines of the ribosome). The exit tunnel for the ribosome is strongly negatively charged, and so coulomb interaction grinds things to a halt. What other basic amino acids are there? There’s arginine, and perhaps histidine, but no codon for them is CCC or GGG or UUU.
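Both claims (stop codons mutating into each other, and only polyA giving a basic amino acid) can be checked against the standard genetic code. Here is a small sketch that does so. The compact 64 letter string is the standard code in U, C, A, G order with * for stop; the variable and function names are mine.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def stop_preserving_mutations(codon: str) -> list:
    """Single nucleotide changes of a codon that still give a stop codon."""
    hits = []
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == "*":
                    hits.append(mutant)
    return hits


stops = [codon for codon, aa in CODON_TABLE.items() if aa == "*"]
for codon in stops:
    print(codon, "->", stop_preserving_mutations(codon))

# What a tail made of a single repeated nucleotide would be translated into.
for base in "ACGU":
    print(f"poly{base}:", CODON_TABLE[base * 3])
```

Of the four possible homopolymer tails only polyA codes for a basic amino acid (K, lysine). PolyC gives proline, polyG glycine and polyU phenylalanine, none of which would be grabbed by a negatively charged exit tunnel.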

Then the Ribosomal Quality Control system (RQC) springs into action. I didn’t realize this until reading the following paper this year. Did you? Amazing cleverness on the part of the cell.

[ Nature vol. 531 pp. 191 – 195 ’16 ] Translation of an mRNA lacking a stop codon (nonStop mRNA) in eukaryotes results in a polyLysine protein (AAA codes for lysine). The positively charged lysines cause stalling in the negatively charged ribosomal exit tunnel. The Ribosomal Quality Control complex (RQC complex) recognizes nonStop proteins and mediates their ubiquitination and proteasomal degradation.

The eukaryotic RQC comprises Listerin (Ltn1) an E3 ubiquitin ligase, Rqc1, Rqc2 and the AAA+ protein CDC48. On dissociation of the stalled ribosome, Rqc2 binds to the peptidyl tRNA of the 60S subunit and recruits Ltn1 which curves around the 60S ribosome, positioning its ligase domain near the nascent chain exit. Rqc2 is a nucleotide binding protein that recruits tRNA^Ala and tRNA^Thr to the 60S peptidyl tRNA complex. This results in the addition of a Carboxy terminal Ala/Thr sequence (a CAT tail) to the stalled nascent chain.

Mutation of Listerin causes neurodegeneration in mice.

Threading the ribosomal needle

What do you do when you try to thread a needle? You straighten out the thread. This is exactly what a newly discovered RNA modification (1 methyl adenosine) is doing. If you look at the picture of adenine pairing with thymine in the following link, the hydrogen sitting between the adenine and thymine is replaced with a much bulkier methyl group in 1 methyl adenosine. Watson-Crick base pairing is impossible.

http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/B/BasePairing.html

Not much 1 methyl adenosine is found in a given mRNA (usually one or less). The authors note that it is usually found near a translation start site (and in a highly structured region — based on the PARS score — whatever that is). In particular it is found at alternative initiation sites in the second or third exon of a gene. Unsurprisingly, when it is present more protein is expressed from the mRNA.

The work is described in Nature vol. 530 pp. 422 – 423, 441 – 446 ’16. The authors wonder how many mRNA modifications are out there waiting to be discovered. Let’s hope they look. Other mRNA modifications are known (pseudouridine, 6 methyl adenine and 5 methyl cytosine). The modification is dynamic, the amount changing with cellular conditions. This isn’t a flash in the pan as 1/3 of the same sites are methylated in mouse mRNA.

Nicastrin the gatekeeper of gamma secretase

Once a year some hapless trucker from out of town gets stuck trying to drive under a nearby railroad bridge with a low clearance. This is exactly the function of nicastrin in the gamma secretase complex which produces the main component of the senile plaque, the Abeta peptide.

Gamma secretase is a 4 protein complex which functions as an enzyme which can cut the transmembrane segment of proteins embedded in the cell membrane. This was not understood for years, as cutting a protein here means hydrolyzing the amide bond of the protein (i.e. adding water), and there is precious little water in the cell membrane which is nearly all lipid.

Big pharma has been attacking gamma secretase for years, as inhibiting it should stop production of the Abeta peptide and (hopefully) help Alzheimer’s disease. However the paper to be discussed [ Proc. Natl. Acad. Sci. vol. 113 pp. E509 – E518 ’16 ] notes that gamma secretase processes ‘scores’ of cell membrane proteins, so blanket inhibition might be dangerous.

The idea that Nicastrin is the gatekeeper for gamma secretase is at least a decade old [ Cell vol. 122 pp. 318 – 320 ’05 ], but back then people were looking for specific binding of nicastrin to gamma secretase targets.

The new paper provides a much simpler explanation. It won’t let any transmembrane segment of a protein near the active site of gamma secretase unless the extracellular part is lopped off. The answer is simple mechanics. Nicastrin is large (709 amino acids) but with just one transmembrane domain. Most of it is extracellular forming a blob extending out 25 Angstroms from the membrane, directly over the substrate binding pocket of gamma secretase. Only substrates with small portions outside the membrane (ectodomains) can pass through it. It’s the railroad bridge mentioned above. Take a look at the picture — https://en.wikipedia.org/wiki/Nicastrin

This is why a preliminary cleavage of the Amyloid Precursor Protein (APP) is required for gamma secretase to work.

So all you had to do was write down the wavefunction for Nicastrin (all 709 amino acids) and solve it (assuming you could even write it down) and you’d have the same answer — NOT. Only the totally macroscopic world explanation (railroad bridge) is of any use. What keeps proteins from moving through each other? Van der Waals forces. What helps explain them? The Pauli exclusion principle, as pure quantum mechanics as it gets.

Bad news on the AIDS front

Bad news for those hoping for an AIDS cure. As you know, the active virus (HIV1) has a genome made of RNA. However, thanks to an enzyme it possesses called reverse transcriptase (which has led to Nobel prizes), it copies itself into DNA and integrates into the genome of lymphocytes. There it sits presumably doing nothing, but it’s always capable of activating and producing more infectious virus.

We seem to have fought the virus to a draw, using a cocktail of drugs which attack different aspects — HAART (Highly Active Antiretroviral Therapy). Success is usually considered being unable to detect viral RNA in the blood (see later). However blood cells are short-lived. What about the longer living lymphocytes found in the lymph nodes and spleen?

That’s what was studied in a current paper [ Nature vol. 530 pp. 5` – 45 ’16 ] but in only 3 people. All had no detectable virus in the blood (under 48 copies/milliLiter — an incredibly tiny amount — see later). What they did was to biopsy lymph nodes in the groin on study entry and at 3 and 6 months.

Then they sequenced the genomes of the lymphocytes from the nodes, to study the HIV1 DNA integrated into the genome. They found that the genome changed with time. This is very bad. Why?

Because it implies that, even though you can’t detect the virus in the blood, the virus was not staying latent in the lymph nodes, but coming out of the lymphocytes and forming infectious virus which then mutated. Subsequently the mutated virus integrated into the genome of another lymphocyte. So even with what we consider excellent control, the virus is not purely latent. Drug resistance could arise from mutations (although they didn’t see it in this study).

Clearly, more people need to be studied this way (but serial biopsies? It will probably be done in prisoners, if such things are still done).

It’s worthwhile thinking about how incredibly selective and accurate our methods of analysis are. 48 copies of the viral RNA per milliLiter of blood is the lower limit of detection. Remember that water has a molecular weight of 18, so a liter of distilled water is 1000 grams / 18 grams per mole = 55.5 Molar. A mole has 6 x 10^23 molecules. A milliLiter is 10^-3 liters. So 1 milliLiter of distilled water has 55 * 6 * 10^23 * 10^-3 = roughly 3 * 10^22 molecules of water in it, and the assay is finding 48 or more molecules of HIV1 RNA in the water haystack. Even figuring that the concentration of water in blood is 1/10 that of distilled water, this is still impressive.
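Here is the same haystack arithmetic as a few lines of Python, using a slightly more precise Avogadro's number than the 6 x 10^23 above. The variable names are mine, and the 1/10 factor for blood is just the deliberately pessimistic guess from the paragraph above, not a measured value.

```python
AVOGADRO = 6.022e23            # molecules per mole
WATER_MOLAR_MASS = 18.0        # grams per mole
GRAMS_PER_LITER = 1000.0       # distilled water
DETECTION_LIMIT = 48           # copies of viral RNA per milliLiter

molarity = GRAMS_PER_LITER / WATER_MOLAR_MASS      # moles of water per liter
molecules_per_ml = molarity * AVOGADRO * 1e-3      # a milliLiter is 10^-3 liters

# Water molecules per detectable RNA molecule, in pure water and in "blood".
haystack_pure = molecules_per_ml / DETECTION_LIMIT
haystack_blood = haystack_pure / 10                # pessimistic 1/10 guess

print(f"water is {molarity:.1f} Molar")
print(f"{molecules_per_ml:.2e} water molecules per milliLiter")
print(f"one RNA per {haystack_pure:.1e} waters (pure water)")
print(f"one RNA per {haystack_blood:.1e} waters (1/10 guess for blood)")
```

Either way the assay is picking out one molecule of interest among something like 10^19 to 10^20 bystanders.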

smORFs, dwORFs and now uORFs

A recent post described small Open Reading Frames (smORFs) and DWarf Open Reading Frames (dwORFs) — see the link at the bottom. Now it’s time for uORFs (upstream Open Reading Frames). Upstream of what you might ask? Well messenger RNA is grabbed by the ribosome at one end (called the 5′ end). The current thinking was that the ribosome marched along the mRNA from the 5′ to the 3′ direction looking for the sequence Adenine Uridine Guanine (AUG) which codes for methionine. It then begins reading the mRNA 3 nucleotides at a time and tacking amino acids onto the methionine. This is called translating mRNA into protein. What about the 5′ end of the mRNA before the AUG is reached (perhaps hundreds of nucleotides later) — it isn’t translated which is why it’s called the 5′ UTR (5′ UnTranslated Region). In bacteria it’s only a few nucleotides, but our 5′ UTRs can have thousands — https://en.wikipedia.org/wiki/Five_prime_untranslated_region.

Two other terms of art are upstream and downstream. Since the ribosome flows from 5′ to 3′ on mRNA, any nucleotide 5′ to a given point is called upstream, and anything 3′ is called downstream. Logical terminology — what a pleasure.

So a uORF is an upstream Open Reading Frame. Upstream to what? Why to the AUG (the initiator codon). The assumption had always been that since there was no initiator AUG codon in this region, proteins couldn’t be made from the uORF. Wrong.

This is where [ Science vol. 351 p. 465 aad2867 – 1 –> 9 ’16 ] comes in. It turns out that the ribosome can translate some of these uORFs into protein, and the paper describes a clever technique (called 3T) they developed to find them. One of the problems in finding uORF proteins is that some are quite small, and are missed in the usual protein assays. One uORF from ATF4 contains only 3 amino acids which is so small that mass spectrometry can’t see it.

The paper makes the amazing statement that nearly half of all mammalian mRNAs harbor uORFs in the 5′ UTRs, and many are initiated with nonAUG start codons. They may be a general mechanism to regulate downstream coding sequence expression, and the paper gives two citations that I must have missed in my reading.

For instance Binding immunoglobulin Protein (BiP aka Heat Shock Protein family A member 5 – HSPA5 ) contains uORFs exclusively initiated by UUG and CUG start codons (not AUG).
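To make the idea concrete, here is a minimal sketch of what scanning a 5′ UTR for candidate uORFs could look like. It is not the paper's 3T technique (which is experimental), just a string search. It reports any AUG, CUG or UUG that is followed by an in-frame stop codon inside the UTR. The function name and the example sequence are invented for illustration.

```python
STOPS = {"UAA", "UAG", "UGA"}


def find_uorfs(utr: str, starts=("AUG", "CUG", "UUG")) -> list:
    """List (position, start codon, codons before the stop) for each candidate.

    A candidate is a start codon followed by an in-frame stop codon that
    lies within the UTR itself. Overlapping candidates are all reported.
    """
    found = []
    for i in range(len(utr) - 2):
        if utr[i:i + 3] in starts:
            # Walk forward one codon at a time looking for a stop.
            for j in range(i + 3, len(utr) - 2, 3):
                if utr[j:j + 3] in STOPS:
                    found.append((i, utr[i:i + 3], (j - i) // 3))
                    break
    return found


example_utr = "GGCUGGCAUAAGGAUGCCCUGAGG"   # made-up 24 nucleotide 5' UTR
for position, start, length in find_uorfs(example_utr):
    print(f"uORF at {position}: starts with {start}, {length} codons long")
```

In this made-up UTR the scan finds one uORF starting with CUG and one with AUG, each 2 codons long. The CUG near the end has no in-frame stop before the UTR runs out, so it isn't reported.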

What might the functions of uORFs actually be? The obvious one is that the proteins made from them might actually be doing something. What could a 3 amino acid protein possibly do? Lots. Consider thyrotropin releasing hormone which helps control your thyroid — it is pyroglutamic acid histidine proline. Then there is met-enkephalin which has 5 amino acids and is one of the endogenous opiate peptides your brain uses.

Another possibility is that just translating the uORF into protein controls the translation of the protein starting with the AUG codon. This isn’t so far fetched. A recent paper [ Nature vol. 529 pp. 551 – 554 ’16 ] gave a 3 dimensional structure for RNA polymerase II transcribing a DNA template into mRNA. The author (Carrie Bernecky) was kind enough to supply the dimensions of the complex when I wrote her: it is roughly 150 x 150 x 160 Angstroms. Remember you can consider the DNA double helix as a cylinder 20 Angstroms in diameter. Figuring 3 stacked nucleotides/10 Angstroms, the complex is big enough to obstruct 45 nucleotides of DNA upstream of the actual start site.
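The footprint estimate is a one-liner, but writing it out shows which dimension of the complex was used. This sketch assumes, as the text does, 3 stacked nucleotides per 10 Angstroms. Which side of the complex actually lies along the nucleic acid is my assumption, so both the 150 and 160 Angstrom sides are shown.

```python
NUCLEOTIDES_PER_ANGSTROM = 3 / 10        # 3 stacked nucleotides per 10 Angstroms
complex_dimensions = (150, 150, 160)     # Angstroms, from the text


def footprint(length_angstroms: float) -> float:
    """Nucleotides covered by something this long lying along the strand."""
    return length_angstroms * NUCLEOTIDES_PER_ANGSTROM


shortest = footprint(min(complex_dimensions))
longest = footprint(max(complex_dimensions))
print(f"footprint: {shortest:.0f} to {longest:.0f} nucleotides")
```

So anything from 45 to 48 nucleotides is covered, depending on how the complex sits. A ribosome is larger still, so one busy on a uORF could plausibly get in the way of initiation at a nearby AUG.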

This is just another example of room at the bottom, where all sorts of small molecule metabolites, small RNAs, small DNAs are just being unearthed and their structure determined. For more on this please see the following link

https://luysii.wordpress.com/2016/01/25/smorfs-and-dworfs-has-molecular-biology-lost-its-mind/
