Category Archives: Molecular Biology

Little Bo Peep meets cellular biology and biochemistry.

Flippase. Eat me signals. Dragging their tails behind them. Have cellular biologists and structural biochemists gone over to the dark side? It’s all quite innocuous, as the old nursery rhyme will show.

Little Bo Peep has lost her sheep
and doesn’t know where to find them
Leave them alone, and they’ll come home
wagging their tails behind them.

First, some cellular biochemistry. The lipid bilayer encasing all our cells is made of two leaflets, inner and outer. The composition of the two is different (unlike the soap bubble). On the inside we find phosphatidylethanolamine (PE) and phosphatidylserine (PS). The outer leaflet contains phosphatidylcholine (PC) and sphingomyelin (SM) and almost no PE or PS. This is clearly a low-entropy situation compared to having all 4 randomly dispersed between the 2 leaflets.
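How low is "low entropy"? Here is a back-of-the-envelope sketch, assuming ideal, non-interacting lipids and two leaflets of equal capacity (simplifications, not measurements). A lipid free to sit in either leaflet gains about k ln 2 of configurational entropy compared with one confined to a single leaflet. The function name is illustrative.

```python
import math

R = 8.314  # gas constant, J/(mol*K)

def leaflet_entropy_per_molecule(n: int) -> float:
    """Configurational entropy, in units of k_B per molecule, of n identical
    lipids split evenly between two leaflets, relative to all n confined to
    one leaflet (which can be arranged in only one way).
    Assumes ideal, non-interacting lipids and equal leaflet capacity."""
    # ln C(n, n/2), computed with log-gamma to avoid enormous integers
    ln_ways = math.lgamma(n + 1) - 2 * math.lgamma(n / 2 + 1)
    return ln_ways / n

s = leaflet_entropy_per_molecule(1_000_000)
print(f"S per molecule: {s:.4f} k_B (ln 2 = {math.log(2):.4f})")

# T*S per mole at body temperature (310 K): the free energy cost, per mole of
# lipid, of keeping a species on one side instead of randomly distributed
t_ds_kj = R * 310 * s / 1000
print(f"T*S at 310 K: {t_ds_kj:.2f} kJ/mol")
```

The penalty per molecule is small, a few percent of the roughly 50 kJ/mol released by ATP hydrolysis in cells; the expense comes from having to keep undoing the leakage.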

What is the possible use of this (notice how teleology invariably creeps into cellular biology)? Chemistry is powerless to explain such things. Much as I love chemistry, such truths must be faced.

It takes energy to maintain this peculiar distribution. The enzyme moving PE and PS back inside the cell is the flippase, and it requires energy in the form of ATP to operate. When a cell is dying, ATP levels drop, and entropy takes its course, moving PE and PS to the cell surface. Specialized cells (macrophages) exist to scoop up dying or dead cells without causing inflammation. They recognize PE and PS through a variety of receptors and munch up cells exposing them on the surface. So PE and PS are eat me signals, which appear when there isn’t enough ATP around for the flippase to haul PE and PS back inside. Clever, no?
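A toy two-state model makes "entropy takes its course" concrete. The assumptions are mine, not the paper's: PS flips passively between leaflets at the same rate in both directions, and the flippase moves it from the outer to the inner leaflet at a rate proportional to ATP availability. All rate values are hypothetical.

```python
def outer_leaflet_fraction(k_passive: float, k_flippase: float, atp: float) -> float:
    """Steady-state fraction of PS on the outer leaflet (toy model).

    The outer fraction o obeys
        do/dt = k_passive*(1 - o) - (k_passive + k_flippase*atp)*o
    and setting do/dt = 0 gives
        o = k_passive / (2*k_passive + k_flippase*atp).

    k_passive  -- spontaneous flip-flop rate, equal in both directions (hypothetical)
    k_flippase -- maximal flippase-driven inward rate (hypothetical)
    atp        -- ATP availability, 0.0 (dead cell) to 1.0 (healthy cell)
    """
    return k_passive / (2 * k_passive + k_flippase * atp)

for atp in (1.0, 0.5, 0.1, 0.0):
    o = outer_leaflet_fraction(k_passive=1.0, k_flippase=100.0, atp=atp)
    print(f"ATP {atp:.1f}: {o:.1%} of PS exposed on the surface")
```

With no ATP the model relaxes to 50:50, the random (maximum entropy) distribution, which is exactly the eat me signal.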

Now for some juicy chemistry (assuming that you consider transport of a molecule across a lipid bilayer actual chemistry: no covalent bonds to the transferred molecule are formed or broken, although they are to the transporter). Well, it certainly is physical chemistry, isn’t it?

Here are the structures of PE, PS, PC and SM: http://science.csumb.edu/~hkibak/241_web/img/png/Common_Phospholipids.png

There are a few things to notice. Like just about every lipid found in our membranes, they are amphipathic: they have a very lipid-soluble part (look at the long hydrocarbon chains hanging below them) and a very water-soluble part, the head groups containing the phosphate.

This brings us to [ Proc. Natl. Acad. Sci. vol. 111 pp. E1334 - E1343 '14 ], which describes ATP8A2 (aka the flippase). Interestingly, the protein, with at least 10 alpha helices spanning the membrane and 3 cytoplasmic domains, closely resembles the classic sodium pump beloved of neurophysiologists everywhere, which pumps sodium ions out of neurons and potassium ions in, producing the equally beloved membrane potential of neurons.

Look at those structures again. While PE and PS carry charges (on the phosphate group), these molecules are far larger than the sodium or potassium ion (easily by a factor of 10). This has long been recognized and is called the ‘giant substrate problem’.
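To put a number on "far larger", here is a sketch comparing molar masses, using mass as a rough proxy for size (an assumption; mass is not diameter). The post names only head groups, so the acyl chains are assumptions too: DOPE (two 18:1 chains) stands in for PE and POPS (16:0/18:1, free acid) for PS.

```python
import re

# Standard atomic weights (g/mol), rounded
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999,
                 "P": 30.974, "Na": 22.990, "K": 39.098}

def molar_mass(formula: str) -> float:
    """Molar mass of a simple formula such as 'C41H78NO8P' (no parentheses)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHT[element] * (int(count) if count else 1)
    return total

# Representative lipids; the acyl chains are assumed, not from the post
lipids = {
    "DOPE (PE, two 18:1 chains)": "C41H78NO8P",
    "POPS (PS, 16:0/18:1, acid form)": "C40H76NO10P",
}
for name, formula in lipids.items():
    m = molar_mass(formula)
    print(f"{name}: {m:.1f} g/mol, {m / ATOMIC_WEIGHT['Na']:.0f}x Na+, "
          f"{m / ATOMIC_WEIGHT['K']:.0f}x K+")
```

By mass the lipids come out roughly 20 to 30 times heavier than the ions, comfortably past the factor of 10.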

The paper solved the structure of ATP8A2 and used molecular dynamics simulations to try to understand how it works. What they found is that transmembrane alpha helices 1, 2, 4 and 6 (out of 10) form a water-filled cavity which solvates the negatively charged phosphate of the head group. What happens to those long hydrocarbon tails? They are left outside the helices, in the lipid core of the membrane. It is the charged head groups that are dragged through by the flippase, with the tails wagging along behind them, just like Little Bo Peep’s sheep.

There’s a lot more great chemistry in the paper, particularly how isoleucine 364 directs the sequential formation and annihilation of the water-filled cavities between alpha helices 1, 2, 4 and 6, and how phosphorylation of a particular aspartic acid (by ATP, explaining why the enzyme no longer works in energetically dying cells) changes the conformation of all 10 transmembrane helices, so that only one half of the channel is open at a time (either to the inside or to the outside).
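The "only one half open at a time" idea can be sketched as a small state machine. This follows the general textbook Post-Albers cycle for P-type ATPases (the family that includes both the sodium pump and the flippases). The state names, and where the lipid is picked up and released, are assumptions from that general scheme, not details taken from the paper.

```python
# Simplified Post-Albers cycle (assumed general scheme, not the paper's model).
# Each state records which side of the membrane the substrate site faces.
CYCLE = [
    ("E1", "cytoplasmic"),   # open to the inside; needs ATP to move on
    ("E1P", "occluded"),     # aspartate phosphorylated by ATP
    ("E2P", "exoplasmic"),   # open to the outside; picks up a PE or PS head group
    ("E2", "occluded"),      # dephosphorylated; head group held inside the protein
]

def run_pump(steps: int, atp_available: bool) -> int:
    """Step through the cycle and return how many lipids were flipped inward."""
    state, flipped = 0, 0
    for _ in range(steps):
        name, _side = CYCLE[state]
        if name == "E1" and not atp_available:
            continue  # no ATP: the aspartate can't be phosphorylated, so the pump stalls
        if name == "E2":
            flipped += 1  # E2 -> E1 releases the lipid to the inner leaflet
        state = (state + 1) % len(CYCLE)
    return flipped

print(run_pump(8, atp_available=True), run_pump(8, atp_available=False))
```

Because each state faces at most one side, there is never an open channel straight through the membrane, and without ATP the cycle never leaves E1.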

Go read and enjoy. It’s sad that people who don’t know organic chemistry are cut off from appreciating such elegance. There is more to esthetics than esthetics.

The death of the synonymous codon – IV

The coding capacity of our genome continues to amaze. The redundancy of the genetic code has been put to yet another use. If you already know the background, skip the following three links and read on; otherwise, everything you need to understand what follows is in them.

http://luysii.wordpress.com/2011/05/03/the-death-of-the-synonymous-codon/

http://luysii.wordpress.com/2011/05/09/the-death-of-the-synonymous-codon-ii/

http://luysii.wordpress.com/2014/01/05/the-death-of-the-synonymous-codon-iii/

There really was no way around it. If you want to code for 20 different amino acids with only four choices at each position, two positions (4^2 = 16) won’t do. You need three positions, which give you 64 possibilities (61 after the three stop codons are taken into account) and the redundancy that comes with them. The previous posts show how the redundant codons for some amino acids aren’t redundant at all but are used to set the speed of translation, or to form exonic splicing enhancers and inhibitors. Different codons for the same amino acid can produce wildly different effects while leaving the amino acid sequence of a given protein alone.
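The counting argument is easy to check against the standard genetic code (NCBI translation table 1, written in the usual compact TCAG order). A minimal sketch:

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

print(4 ** 2, "two-letter words: too few for 20 amino acids")
print(4 ** 3, "three-letter words")

# '*' marks stop codons; everything else is a sense codon
degeneracy = Counter(aa for aa in CODE.values() if aa != "*")
print("stop codons:", sum(1 for aa in CODE.values() if aa == "*"))
print("sense codons:", sum(degeneracy.values()), "for", len(degeneracy), "amino acids")
print("codons per amino acid:", dict(sorted(degeneracy.items())))
```

The redundancy is lopsided: leucine, serine and arginine each have 6 codons while methionine and tryptophan have just 1, so the room for synonymous "hidden messages" varies from amino acid to amino acid.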

If anything will figure out a way to use synonymous codons for its own ends, it’s cancer. [ Cell vol. 156 pp. 1129 - 1131, 1324 - 1335 '14 ] analyzed protein-coding genes in cancer. Not just a few cases, either: they looked at the exomes (the parts of the genome coding for exons) of a mere 3,851 cancers. In addition, they did whole-genome sequencing of 400 cases of 19 different tumor types.

There are genes which suppress cancer (and which cancer often knocks out, such as the retinoblastoma gene or the ubiquitous p53), and genes which promote it when mutated (oncogenes such as ras). They found a 1.3-fold enrichment of synonymous mutations in oncogenes (where they would tend to activate them) relative to tumor suppressors. Synonymous mutations accounted for 20–40% of the somatic mutations found in cancer exomes.

Unfortunately, synonymous mutations have been used to estimate the background mutation frequency in evolutionary analysis, on the theory that they are neutral (i.e. because they don’t change protein structure, they are assumed not to change how the gene for the protein functions). Wrong. Wrong. They can change how much of the protein is made, where it is made, or which exons are included in the final product.
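Here is what the "neutral" classification looks like in practice: a minimal sketch that labels a codon change only by what it does to the amino acid. It is blind to translation speed, splicing enhancers and everything else above, which is exactly the assumption the post calls wrong. The function name is illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def classify(ref: str, alt: str) -> str:
    """Label a codon change by its effect on the protein sequence only."""
    a, b = CODE[ref], CODE[alt]
    if a == b:
        return "synonymous"
    if b == "*":
        return "nonsense"  # a sense codon turned into a stop codon
    return "missense"

# Every possible single-base change in every sense codon
changes = [
    classify(codon, codon[:i] + base + codon[i + 1:])
    for codon, aa in CODE.items() if aa != "*"
    for i in range(3)
    for base in BASES if base != codon[i]
]
frac = changes.count("synonymous") / len(changes)
print(f"{len(changes)} possible changes, {frac:.0%} synonymous")
```

Roughly a quarter of all possible point changes come out "synonymous", so treating them all as silent discards a lot of potential signal.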

Why drug discovery is so hard: Reason #25 — What if your drug target is really a pointer to the real target?

Any drug safely producing weight loss would be a big (or small) pharma blockbuster, and those finding it should get on the boat to Sweden. Finding a target to attack is the problem. Here’s one way to look: take lots of fat people and lots of thin people, and see what in their genomes differentiates them (assuming anything does). What was actually done was to compare type II (non-insulin-dependent) diabetics, the vast majority of them overweight, with controls. The first study involved the genomes of nearly 5,000 diabetics and controls. How did they interrogate the genomes? At the time of the work it was impossible to completely sequence this many genomes.

It’s time to speak of SNPs (single nucleotide polymorphisms). Our genome has 3.2 gigaBases of DNA. With sequencing being what it is, each position has a standard nucleotide (one of A, T, G or C). If 5% of the population has one of the other 3 at that position, you have a SNP. Already 10 years ago, some 7 MILLION SNPs had been found and mapped to the human genome.

The first study found some SNPs associated with obesity in the diabetics. This tells you where to look for the gene. A second study, with nearly 9,000 diabetics and controls, replicated the first.

Then the monster study, with 39,000 people [ Science vol. 316 pp. 889 - 894 '07 ], found FTO (FaT mass and Obesity associated gene) on chromosome 16. The 16% of Caucasian adults with two copies of the variant SNP in FTO were 1.67 times more likely to be obese. An intense flurry of work showed that the gene codes for an oxidase using iron and 2-oxoglutaric acid (alpha-ketoglutarate, or alphaKG, for you old timers). The enzyme removes methyl groups from the amino group at position 6 of adenine and from the 3 position of thymine. Before this, no one really paid much attention to these methylated bases. Subsequently N6-methyladenine has been found in a mere 7,676 mRNAs. Just what it does when it’s there, and why the cell wants to remove it, is currently being worked out.
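The 16% figure says more than it seems to. Assuming Hardy-Weinberg equilibrium (random mating, no selection; an assumption, not something the study reports here), the fraction of people with two copies is q squared, where q is the variant allele frequency. A sketch, with an illustrative function name:

```python
import math

def hardy_weinberg_from_homozygotes(hom_freq: float) -> dict:
    """Infer genotype frequencies from the fraction of people carrying two
    copies of a variant, assuming Hardy-Weinberg equilibrium."""
    q = math.sqrt(hom_freq)  # variant allele frequency
    p = 1 - q                # other allele
    return {"allele_freq": q, "two_copies": q * q,
            "one_copy": 2 * p * q, "no_copies": p * p}

g = hardy_weinberg_from_homozygotes(0.16)  # the 16% figure quoted above
for genotype, freq in g.items():
    print(f"{genotype}: {freq:.2f}")
```

Under that assumption the variant allele frequency is about 0.4, and nearly half of Caucasian adults carry one copy.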

Clearly FTO is a great target for an obesity drug. Of course they knocked the gene out in the mouse. The animals were normal at birth, but at 6 weeks weighed 30 – 40% less than normal mice. FTO as a drug target looked even better after this.

It was somewhat surprising that the SNP was in an intron in the gene. This meant that even in the obese the protein product of the FTO gene was the same as in the skinny. Presumably this could mean more FTO, less FTO or a different splice variant. If some of this molecular biology is above your pay grade, the background you need is in 5 posts starting with https://luysii.wordpress.com/2010/07/07/molecular-biology-survival-guide-for-chemists-i-dna-and-protein-coding-gene-structure/.

More surprising still, FTO levels were the same in people with and without the fat SNP. That left splice variants as a possibility.

The denouement came this week [ Nature vol. 507 pp. 309 - 310, 371 - 375 '14 ]. The intron containing the SNP in FTO produces obesity by controlling another gene, IRX3, which lies a mere 500,000 nucleotides away. The FTO intron contacts the IRX3 promoter (the DNA loops around to bring them together), turning the gene on and producing more IRX3. Mice lacking a functional copy of IRX3 have a 25–30% lower body mass. As any C programmer would say, FTO is the pointer, not the data.

I don’t know if big or small pharma was at work finding inhibitors or enhancers of FTO function, but this paper should have brought them to a screeching halt. The FTO/IRX3 story just shows how many pitfalls there are to finding new drugs, and why the search has shown relatively little success recently. We are trying to alter the function of an incredibly complex system, whose workings we only dimly understand.

Was I the last to find out?

Quick! Can you form a hydrogen bond from an sp3-hybridized carbon to an oxygen atom?

I didn’t think so, but you can, and this in spite of my reading about proteins for over half a century. [ Proc. Natl. Acad. Sci. vol. 111 pp. E888 - E895 '14 ] describes such bonds forming between the transmembrane segments of membrane proteins (estimated to be 30% of all our proteins), along with lots of references backing up the statements which follow.

Whether or not they contribute to the stability of membrane proteins isn’t known. Consider the alpha carbon of an amino acid. It is adjacent to the carbonyl group of an amide (electron hungry, but less so than a ketone carbonyl because of resonance) and to the nitrogen atom of an amide (more electronegative than carbon, and probably more electron hungry because it loses part of its lone pair to resonance). Both pull electron density away from the alpha carbon, polarizing its C–H bond and leaving the hydrogen slightly positive, which is just what a hydrogen bond donor needs.

They are usually found between the alpha carbon of a glycine on one helix and a carbonyl oxygen on an adjacent transmembrane helix. Glycine zippers (e.g. the G X X X G motif) have long been known in transmembrane helices. Since glycine is the smallest amino acid, having glycines on the same side of the helix was thought to be a way to pack adjacent helices together.
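Why do two glycines four residues apart end up "on the same side"? An ideal alpha helix has 3.6 residues per turn, so each residue is rotated 100 degrees from the one before. A small helical-wheel sketch (ideal geometry assumed; real helices deviate somewhat):

```python
DEG_PER_RESIDUE = 100.0  # ideal alpha helix: 3.6 residues per 360-degree turn

def angular_separation(i: int, j: int) -> float:
    """Smallest angle (degrees) between residues i and j, viewed down the helix axis."""
    diff = ((j - i) * DEG_PER_RESIDUE) % 360.0
    return min(diff, 360.0 - diff)

for offset in range(1, 8):
    print(f"i and i+{offset}: {angular_separation(0, offset):5.1f} degrees apart")
```

Residues i and i+4 are only 40 degrees apart, so the two glycines of G X X X G form a flat patch on one face of the helix, exactly where a neighboring helix can snuggle up.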

What would you consider good evidence for such a bond? Spectroscopy of model compounds with deuterium in place of the alpha hydrogen would be one way (it’s been done). The best evidence would be a distance between the hydrogen and the carbonyl oxygen shorter than the sum of their van der Waals radii, and this has been found as well.
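The distance criterion is simple to state in code. This sketch uses Bondi van der Waals radii (H 1.20 Å, O 1.52 Å) and hypothetical coordinates; real analyses also look at the C–H···O angle, which this check ignores.

```python
import math

# Bondi van der Waals radii, Angstroms
VDW_RADIUS = {"H": 1.20, "O": 1.52}

def is_close_contact(h_xyz, o_xyz) -> bool:
    """True if an H...O distance is shorter than the sum of the van der Waals
    radii, the usual geometric criterion for a possible hydrogen bond."""
    return math.dist(h_xyz, o_xyz) < VDW_RADIUS["H"] + VDW_RADIUS["O"]

# Hypothetical coordinates: a glycine alpha hydrogen and two candidate carbonyl oxygens
h_alpha = (0.0, 0.0, 0.0)
print(is_close_contact(h_alpha, (2.4, 0.0, 0.0)))  # 2.4 A apart -> True
print(is_close_contact(h_alpha, (2.0, 2.0, 0.0)))  # about 2.83 A apart -> False
```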

Humbling!!

What junk DNA is doing

I’ve never bought the idea that the 98% of our 3.2 gigaBase genome not coding for protein is junk. Consider the humble leprosy organism. It’s a mycobacterium (like the organism causing TB), but because it is essentially confined to man and lives inside humans for most of its existence, it has jettisoned large parts of its genome: first by throwing about 1/3 of it out (the genome is 1/3 smaller than that of TB, from which it is thought to have diverged 66 million years ago), and second by mutating many of its genes so that protein can no longer be made from them. Why throw out all that DNA? The short answer is that it is metabolically expensive to produce and maintain DNA that you’re not using.

Which brings us to Cell vol. 156 pp. 907 – 919 ’14. At least half of our genome is made of repetitive elements. We have some 520,000 (imperfect) copies of LINE1 elements, each up to 6,000 nucleotides long. There are 1,400,000 (imperfect) copies of Alu, each around 300 nucleotides long. This stuff has been called junk for decades. However, it has become apparent that over 50% of our entire genome is transcribed into RNA, which is also metabolically expensive.
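A quick arithmetic check on those copy numbers, using only the figures in the paragraph above:

```python
GENOME_BP = 3.2e9  # haploid human genome, from the text

alu_bp = 1_400_000 * 300        # copies x typical length
line1_max_bp = 520_000 * 6_000  # upper bound: every copy full length

print(f"Alu: {alu_bp / GENOME_BP:.0%} of the genome")
print(f"LINE1, if every copy were full length: {line1_max_bp / GENOME_BP:.0%} of the genome")
```

Alu alone comes to about 13% of the genome. The LINE1 upper bound would swallow nearly the whole genome, which can't be right: most LINE1 copies are truncated at their 5′ end and are far shorter than 6,000 nucleotides, which is what the "up to" is hiding.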

Addendum 17 Mar: Just the cost of making a single nucleotide from scratch to hook into mRNA is 50 ATP molecules (according to an estimate I read). It also takes energy for the polymerase to hook two nucleotides together — but I can’t find out what it is (anyone know?). It’s hard to avoid teleology when thinking about biology — but why should a cell expend all this metabolic energy to copy half or more of its genome into RNA, if it weren’t getting something useful back?
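Taking the 50 ATP estimate at face value, the bill for transcription adds up quickly. This sketch assumes one RNA copy of one strand of half the genome, all nucleotides made from scratch (cells recycle many nucleotides, so this is an upper-end figure), and leaves out the unknown cost of polymerization itself.

```python
GENOME_NT = 3.2e9           # haploid genome, nucleotides (from the text)
FRACTION_TRANSCRIBED = 0.5  # "half or more" of the genome
ATP_PER_NT = 50             # de novo synthesis estimate quoted above

# Polymerization cost deliberately omitted (the open question above)
atp = GENOME_NT * FRACTION_TRANSCRIBED * ATP_PER_NT
print(f"{atp:.1e} ATP to make one RNA copy of half the genome from scratch")
```

That is about 8 x 10^10 ATP per copy, per cell, before any of it is used for anything.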

Why hasn’t evolution gotten rid of this stuff, as the leprosy organism did? Probably because it’s doing several important things we don’t understand. Here’s one of them. The Cell paper did something clever and obvious (now that someone else has thought of it). C0T-1 DNA is placental DNA, predominantly 50–300 nucleotides in size and very enriched in repetitive DNA sequences. It is used to block nonspecific hybridization in microarray screening for mRNA coding for protein. The authors used C0T-1 DNA as a probe on whole cells to find RNA transcribed from these repetitive elements and, more importantly, to find where in the cell it was located.

Guess what they found? RNA from repetitive DNA is associated big time with active interphase (i.e. not undergoing mitosis) chromatin (aka euchromatin). So RNA transcribed from Alu and LINE1 is a structural component of our chromosomes. Since the 3.2 gigaBases of our genome, if stretched out, would be 1 METER long, a lot of our DNA occurs in very compact structures (heterochromatin), which are thought to be transcriptionally inactive. What happens when you use RNAase (an enzyme breaking down RNA) to remove the repeat RNA? The chromosomes condense to heterochromatin. So the junk may be keeping our chromosomes in an ‘open’ state, a fairly significant function.
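The 1 METER figure is easy to check. B-DNA rises about 0.34 nm per base pair, and most of our cells carry two copies of the genome:

```python
RISE_PER_BP_NM = 0.34  # B-DNA: about 0.34 nm per base pair
HAPLOID_BP = 3.2e9

haploid_m = HAPLOID_BP * RISE_PER_BP_NM * 1e-9  # nm -> m
diploid_m = 2 * haploid_m                        # two copies per typical cell
print(f"haploid genome: {haploid_m:.2f} m; diploid cell: {diploid_m:.2f} m")
```

One copy of the genome comes to about 1.1 meters, so the DNA in a typical diploid cell is about 2 meters long, all packed into a nucleus a few microns across.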

This is the exact opposite of XIST, a 17,000-nucleotide RNA transcribed from the X chromosome, which keeps one of the two X’s each female possesses inactive by coating it, much as the ecRNAs do.

The authors conclude with “we are far from understanding genome expression and regulation.” Amen.

If some of this is a bit above your molecular biological pay grade — please see a series of articles “Molecular Biology Survival Guide for Chemists” — here’s a link to the first one — https://luysii.wordpress.com/2010/07/07/molecular-biology-survival-guide-for-chemists-i-dna-and-protein-coding-gene-structure/. There are 4 more.

Short and Sweet

Yamanaka strikes again. Citrulline is deiminated arginine: the C=N-H (the imine) of arginine’s guanidino group is replaced by a carbonyl, C=O. An enzyme called PAD4 does the job. Why is it important? Because one of its targets is histone H1, which links nucleosomes together. Recall that the total length of DNA in each and every one of our cells is about 2 METERS. By wrapping the double helix around nucleosomes, the DNA is shortened by one order of magnitude.

So what? Well, at physiologic pH the imine picks up another proton, making it positively charged, so it binds to the negatively charged phosphate backbone of DNA. Removing the imine removes the charge, so the linker histone doesn’t bind the double helix as tightly.
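How completely is that arginine charged? The Henderson-Hasselbalch equation answers it. The pKa values below are approximate textbook side-chain values (arginine about 12.5, lysine about 10.5, histidine about 6). Citrulline’s ureido group has no comparable basicity, so it is neutral at physiologic pH.

```python
def fraction_protonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of a base in its protonated (charged) form."""
    return 1.0 / (1.0 + 10 ** (ph - pka))

PH = 7.4  # typical physiologic pH
# Approximate textbook side-chain pKa values
for name, pka in (("arginine", 12.5), ("lysine", 10.5), ("histidine", 6.0)):
    print(f"{name} (pKa {pka}): {fraction_protonated(pka, PH):.5f} protonated at pH {PH}")
```

Arginine is essentially always charged; converting it to citrulline switches the charge off entirely rather than just turning it down.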

Duck soup for the chemist, but apparently no one had thought to look at this before.

This opens up the DNA (aka chromatin decondensation) for transcription. Why is Yamanaka involved? Because PAD4 is induced during the reprogramming of cells into induced pluripotent stem cells (iPSCs), activating the expression of key stem cell genes. Inhibiting PAD4 lowers the percentage of pluripotent stem cells, reducing reprogramming efficiency. The paper is Nature vol. 507 pp. 104 – 108 ’14.

While this may be nice for forming iPSCs, it should be noted that PAD4 is upregulated in a variety of tumors.

Curiouser and curiouser

The Curious Wavefunction blog alluded to the first example of a protein which stands everything we thought we knew about proteins on its head. At the end of this post you’ll find another, equally counterintuitive, example.

We all know that proteins fold into a relatively dry core where hydrocarbon side chains and other hydrophobic elements hide out. This was one of Walter Kauzmann’s many contributions to chemistry and biology. He also wrote one of the first books on quantum chemistry, as did his PhD advisor Henry Eyring at Princeton (I was lucky enough to take PChem from him). According to Kauzmann, the driving force for the formation of globular proteins was pretty much entropic, with hydrocarbon side chains solvating each other so that water wouldn’t have to form an elaborate (hence structured) cage around them.

Which brings us to the wonderfully named fish Pseudopleuronectes americanus, which lives in frigid waters. To keep ice crystals from forming in their bodies, fish of icy waters have evolved antifreeze proteins. It is a fascinating example of evolution solving the same problem in different ways: by 1996 at least 4 different types of antifreeze proteins were known [ PNAS vol. 93 pp. 6835 - 6840 '96 ].

The new protein is a 32 kiloDalton alanine-rich helix bundle 145 Angstroms long. Amazingly, the helices surround a core of 400 water molecules (surround as in the water is on the inside of the protein, not the outside). The water molecules inside the protein are arranged as pentagons (not hexagons as they would be in ice), so they form a clathrate. The pentagonal arrangement of water was predicted on theoretical grounds 50 years ago by Scheraga ( J. Biol. Chem. vol. ?? pp. 2506 – 2508 1962 ).

The protein has an amino acid periodicity of 11 residues, which nicely comes out to 3 turns of the alpha helix. There is a threonine at position i, an alanine at position i + 4 and an alanine at position i + 8. All of these bind water; not surprising for threonine, but alanine’s side chain is a hydrocarbon. The evolving fish clearly didn’t listen to protein chemists. In addition, most of the carbonyl groups of the protein backbone are hydrogen bonded to water.
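The "11 residues is 3 turns" arithmetic, and where the Thr, Ala, Ala of each repeat point, can be checked with ideal helix geometry (100 degrees per residue; the real helix may differ slightly):

```python
DEG_PER_RESIDUE = 100.0  # ideal alpha helix (3.6 residues per turn)

turns = 11 * DEG_PER_RESIDUE / 360
print(f"11 residues span {turns:.2f} turns of an ideal helix")

# Where the repeat's Thr (i), Ala (i+4) and Ala (i+8) point, relative to the Thr
for label, offset in (("Thr i", 0), ("Ala i+4", 4), ("Ala i+8", 8)):
    angle = (offset * DEG_PER_RESIDUE) % 360
    print(f"{label}: {angle:.0f} degrees around the helix")

# A helix repeating exactly every 11 residues in 3 turns has 11/3 residues per turn
print(f"exact 11/3 helix: {360 * 3 / 11:.1f} degrees per residue")
```

All three residues fall within about 80 degrees of each other, one face of the helix, and a helix with 3.67 rather than 3.6 residues per turn would line the repeats up exactly.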

Not to be outdone, a freeze-tolerant beetle (Upis ceramboides; don’t you love these names?) has an antifreeze molecule made mostly of sugar and lipid.

Well even if we don’t know what we thought we knew about proteins, at least we understand biologic membranes and the proteins that go through them. Don’t we?

Apparently not. [ Proc. Natl. Acad. Sci. vol. 111 pp. 2425 - 2430 '14 ] studied the alpha-hemolysin of staphylococci. We know that the membrane of our cells is made of a double layer of molecules, each with a charged head which binds water and a long (16+ carbon) hydrocarbon tail. The hydrocarbon core is about 30 Angstroms across, and the lipid head groups on either side of the membrane are about 40 Angstroms apart.

We also know how proteins fit into the membrane — one model is the G Protein Coupled Receptor (GPCR) for which we have at least 800 human genes, and which is the target for 30% of all drugs approved by the FDA [ Science vol. 335 pp. 1106 - 1110 '12 ]. These all have 7 alpha helices arranged like a stack of logs extending across the membrane. The amino acids here are usually hydrophobic. Another model is the beta barrel — used mostly by bacteria — these have beta strands arranged across the membrane (like the staves of a barrel — get it). I’m not sure what the record is for the number of strands, but one from the gonococcus has 16 of them. They surround a large pore.

Back to the alpha-hemolysin of staphylococci. It’s designed to kill its target by forming a hole in the membrane, and 7 of them get together to do so. However, instead of running back and forth across the 30 Angstroms of the anhydrous part of the membrane, the seven subunits put their heads together to form the hole (like skydivers holding hands), with their hydrocarbon-like parts sticking out into the membrane and the water-filled hole in the center. How do they know? They studied truncated mutants of the hemolysin, which weren’t long enough to span the 30 Angstroms across the membrane, and they still formed holes. An entirely new (to me) protein arrangement.
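How long does a segment have to be to cross that 30 Angstrom core? A sketch using standard rises per residue (1.5 Å for an alpha helix, roughly 3.3 Å for an extended beta strand), assuming the segment runs straight across; tilted segments need more residues.

```python
import math

CORE_THICKNESS_A = 30.0  # hydrocarbon core, from the text
RISE_PER_RESIDUE_A = {
    "alpha helix": 1.5,  # Angstroms along the helix axis per residue
    "beta strand": 3.3,  # approximate; extended strands rise ~3.2-3.5 A per residue
}

# Minimum residues for a segment perpendicular to the membrane
needed = {s: math.ceil(CORE_THICKNESS_A / rise) for s, rise in RISE_PER_RESIDUE_A.items()}
for structure, n in needed.items():
    print(f"{structure}: at least {n} residues to cross {CORE_THICKNESS_A:.0f} A")
```

This is the yardstick against which "not long enough to span the membrane" is measured for the truncated mutants.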

Two very scary papers about cancer

What if, even after you’ve killed every cancer cell in the body, there are still non-malignant cells left that are halfway there? That’s the conclusion of two very scary papers published in the past week [ Nature vol. 506 pp. 300 - 301, 328 - 333 '14, Proc. Natl. Acad. Sci. vol. 111 pp. 2548 - 2553 '14 ]. Both involve acute myelogenous leukemia (AML).

Blood cancers are easy to study: blood is easy to sample, and even the marrow is (relatively) easy to come by. The marrow contains stem cells which can form all the cellular elements of blood (red cells, all types of white cells and platelets). They are called hematopoietic stem cells (HSCs), and just one of them is enough to completely repopulate the radiation-destroyed marrow of an experimental animal.

Even a person suffering from AML has functionally normal HSCs in their marrow (otherwise they’d be dead). What these papers show is that these cells contain some, but not all, of the mutations found in the leukemic cells (in genes named DNMT3A, IDH1, IDH2, ASXL1, IKZF1; they don’t roll trippingly off the tongue, do they?). They are called preleukemic cells, and the papers show that conventional therapy for AML does NOT kill them. Essentially these cells are accidents waiting to happen.

The PNAS paper calls these genes ‘landscaping genes’, a term which may be original. I love the term, it’s extremely descriptive and short. These are genes involved in global chromatin changes — we’re talking epigenetics here — proteins causing changes in DNA and the proteins that bind to it, which don’t actually change the order of bases in the DNA.

Hopefully this doesn’t apply to other forms of cancer, but I have a sinking feeling that it does. So getting rid of every cancer cell in the body may not be enough. Frightening.

Everything not expressly forbidden biochemically is happening somewhere

A fairly oblique introduction (from an earlier post)

Sherlock Holmes and the Green Fluorescent Protein

Gregory (Scotland Yard): “Is there any other point to which you would wish to draw my attention?”
Holmes: “To the curious incident of the dog in the night-time.”
Gregory: “The dog did nothing in the night-time.”
Holmes: “That was the curious incident.”

The chromophore of green fluorescent protein (GFP) is para-hydroxybenzylidene imidazolinone. It is formed by cyclization of the sequential tripeptide serine 65, tyrosine 66, glycine 67. It sits in the center of a beta barrel formed by the 238 amino acids of GFP.

What is so curious about this?

Simply put, why don’t things like this happen all the time? Perhaps nothing quite this fancy, but on a more plebeian level consider this: of the twenty amino acids, 2 are carboxylic acids, 2 are amides, 1 is an amine, 3 are alcohols and 1 is a thiol. One might expect esters, amides, thioesters and sulfides to be formed deep inside proteins. Why deep inside? Because on the surface of the protein, water is present at 55 molar to hydrolyze them purely by the law of mass action (releasing about 10 kJ per Avogadro’s number of bonds in the process). Some water is present in the X-ray crystallographic structures of proteins, but nothing this concentrated.
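What does "about 10 kJ per mole" mean at equilibrium? A sketch using delta G = -RT ln K. It treats hydrolysis of a cross-link whose two halves stay attached to the same protein as a one-molecule equilibrium, with water’s activity taken as 1 (by the usual biochemical convention its 55 M concentration is folded into the standard free energy). Both are simplifying assumptions.

```python
import math

R = 8.314            # gas constant, J/(mol*K)
T = 298.15           # K
DELTA_G = -10_000.0  # J/mol, roughly the free energy of hydrolysis quoted above

K = math.exp(-DELTA_G / (R * T))     # equilibrium constant, hydrolyzed : intact
fraction_hydrolyzed = K / (1 + K)
print(f"K = {K:.0f}; {fraction_hydrolyzed:.1%} hydrolyzed at equilibrium")
```

About 98% hydrolyzed at equilibrium. This says nothing about how fast it happens; burying the bond where water can’t reach it is a kinetic defense, not a thermodynamic one.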

The presence of 55 M water bathing the protein surface leads to an even more curious incident, namely why proteins exist at all, given that amide hydrolysis is exothermic (as well as entropically favorable). Perhaps this is why proteins contain so many alpha helices and beta sheets: as well as functioning as structural elements, they may also serve to hide the amides from water by hydrogen bonding them to each other. Along this line, could this be why the hydrophilic side chains of proteins (arginine, lysine, the acids and the amides) are rather bulky? Perhaps they also function to sterically shield the adjacent amides. After all, why should lysine have 4 CH2 groups separating the primary amino group from the alpha carbon? Ditto for the 3 CH2 groups separating arginine’s guanidino group, and the 2 CH2 groups of glutamic acid.

We now have an example before us of an ester between threonine and glutamic acid within the same protein. For details see Proc. Natl. Acad. Sci. vol. 111 pp. 1229 – 1230, 1367 – 1372 ’14. It is put to use to stabilize long, thin proteins subject to mechanical stress. All sorts of bacteria have little hairs (pili) allowing them to attach to our cells. The first examples were found in some nasty characters (Streptococcus pyogenes, Clostridium perfringens), possibly just because they are under intense study, the infections they cause being even nastier. Interestingly, the ester is buried deep in the protein where water can’t get at it so easily. This type of link on external proteins turns out to be fairly common in Gram-positive organisms.

So everything not biochemically forbidden is probably happening somewhere.

Why drug discovery is so hard: Reason #24 — Is the 3′ untranslated region of every mRNA a ceRNA?

We all know what proteins do. They act as enzymes, structural elements of cells, membrane proteins where drugs bind, etc. etc. The background the pure chemist needs for what follows can all be found in the category “Molecular Biology Survival Guide.”

We also know that the messenger RNA for any given protein contains a lot more information than that needed to code for the amino acids making up the protein. Forget the introns that are spliced out of the initial transcript. When the mature messenger RNA for a given protein leaves the nucleus for the cytoplasm, where the ribosome translates it into protein, it contains nucleotides at either end which the ribosome effectively ignores. These are called the untranslated regions (UTRs). The UTRs at the 3′ end of human mRNAs range in length between 60 and 4,000 nucleotides (average 800). It costs energy to store the information for the UTR in DNA, more energy to synthesize the nucleotides which make it up, even more to patch them together to form the UTR, more to package it and move it out of the nucleus, etc. etc.
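To make the layout concrete, here is a sketch that splits an mRNA into 5′ UTR, coding sequence and 3′ UTR. It assumes translation starts at the first AUG (a simplification of ribosome scanning; real start-site choice also depends on sequence context) and runs to the first in-frame stop codon. The transcript is a hypothetical toy, far shorter than a real one.

```python
def split_mrna(mrna: str):
    """Split an mRNA into (5' UTR, coding sequence, 3' UTR).

    Simplifying assumption: translation starts at the first AUG and runs
    to the first in-frame stop codon (included in the coding sequence).
    """
    stops = {"UAA", "UAG", "UGA"}
    start = mrna.find("AUG")
    if start == -1:
        raise ValueError("no AUG start codon")
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in stops:
            end = i + 3
            return mrna[:start], mrna[start:end], mrna[end:]
    raise ValueError("no in-frame stop codon")

# Hypothetical toy transcript (real 3' UTRs average ~800 nucleotides)
utr5, cds, utr3 = split_mrna("GGCACUAUGGCUUCAGGAUAAGCUUCUACCUCAAAAAAA")
print(utr5, cds, utr3, sep=" | ")
```

Everything after the stop codon is the 3′ UTR, the part the ribosome ignores and the microRNAs read.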

Why bother? Because the 3′ UTR of the mRNA contains a lot of information which tells the cell how much protein to make, how long the mRNA should hang around in the cell (among many other things). A Greek philosopher got here first — “Nature does nothing uselessly” – Aristotle

Those familiar with competitive endogenous RNA (ceRNA) can skip what follows up to the *****

Recall that microRNAs are short (20-something nucleotides) polynucleotides which bind to the 3′ untranslated region (3′ UTR) of mRNA and either (1) inhibit its translation into protein or (2) cause its degradation. In each case, less of the corresponding protein is made. The microRNA and the appropriate sequence in the 3′ UTR of the mRNA form an RNA-RNA double helix (G on one strand binding to C on the other, etc.). Visualizing such helices is duck soup for a chemist.
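The base pairing can be turned into a simple site finder. A common simplified rule looks for a perfect Watson-Crick match to the microRNA "seed" (nucleotides 2 to 8); real target prediction also weighs G:U pairs, site context and more. The microRNA is mature let-7a; the 3′ UTR fragment is hypothetical.

```python
def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    pair = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pair[b] for b in reversed(seq))

def seed_sites(mirna: str, utr: str) -> list:
    """0-based positions in a 3' UTR that pair perfectly with the microRNA
    seed (nucleotides 2-8), a common simplified rule for target sites."""
    site = reverse_complement_rna(mirna[1:8])
    return [i for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site]

LET7A = "UGAGGUAGUAGGUUGUAUAGUU"       # mature let-7a, 5' to 3'
utr = "GCUUCUACCUCAAAGGCUACCUCAA"      # hypothetical 3' UTR fragment
print(reverse_complement_rna(LET7A[1:8]), seed_sites(LET7A, utr))
```

Any mRNA carrying these short sites can soak up the microRNA, whatever protein it codes for.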

Molecular biology is full of such semantic cherry bombs as nonCoding DNA (meaning DNA which doesn’t code for protein), much of it long written off as junk DNA. Another is the pseudogene: a gene that looks like it should code for protein but doesn’t, because it lacks an initiation codon or has a premature termination codon. Apart from these differences, pseudogenes have the nucleotide sequence to code for a known protein. It is estimated that the human genome contains as many pseudogenes (20,000) as it contains true protein-coding genes [ Genome Res. vol. 12 pp. 272 - 280 '02 ]. We now know that well over half the genome is transcribed into RNA, including the pseudogenes.

PTEN (you don’t want to know what it stands for) is a 403-amino-acid protein which is one of the most commonly mutated proteins in human cancer. Our genome also contains a pseudogene for it (called PTENP). Interestingly, deletion of PTENP (not PTEN) is found in some cancers. Remarkably, PTENP deletion is associated with decreased amounts of the PTEN protein itself, something you don’t want, as PTEN is a tumor suppressor. How PTEN suppresses tumors appears to be fairly well known, but is irrelevant here.

Why should loss of PTENP decrease PTEN itself? Because the mRNA made from PTENP, even though it has a premature termination codon and can’t be made into protein, is just as long, so it also contains a 3′UTR very similar to that of PTEN. This means PTENP mRNA is sopping up microRNAs which would otherwise decrease the level of PTEN. Think of PTENP mRNA as a sponge.
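A toy model shows why a sponge matters. Assume (purely for illustration) that a limiting pool of microRNA is shared among all available binding sites in proportion to their number, and that every site is equally attractive. All numbers are hypothetical.

```python
def mirna_on_target(mirna_total: float, target_sites: float, decoy_sites: float) -> float:
    """Toy ceRNA model: a limiting pool of microRNA is shared among all
    binding sites in proportion to their number. Returns the microRNA bound
    to the target (PTEN) mRNA. Assumes all microRNA is bound and all sites
    are equally attractive, both simplifications."""
    return mirna_total * target_sites / (target_sites + decoy_sites)

# Hypothetical numbers: 100 microRNA molecules, 200 sites on PTEN mRNA,
# 200 more on PTENP mRNA when the pseudogene is present
with_ptenp = mirna_on_target(100, target_sites=200, decoy_sites=200)
without_ptenp = mirna_on_target(100, target_sites=200, decoy_sites=0)
print(f"microRNA on PTEN mRNA: {with_ptenp:.0f} with PTENP, {without_ptenp:.0f} without")
```

Deleting the decoy doubles the microRNA hitting PTEN in this toy model, matching the direction (not necessarily the size) of the effect described above.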

Subtle, isn’t it? But there’s far more. At least PTENP mRNA closely resembles the PTEN mRNA. However, other mRNAs coding for completely different proteins also have binding sites in their 3′UTR for the microRNA which binds to the 3′UTR of PTEN and causes its destruction. So transcription of a completely different gene (the example given is ZEB2) can control the abundance of another protein. Essentially its mRNA is acting as a sponge, sopping up the killer microRNA.

It gets worse. Most microRNAs have binding sites on the mRNAs of many different proteins, and PTEN itself has a 3′UTR which binds to 10 different microRNAs.

So here is a completely unexpected mechanism of control of protein levels in the cell. The general term for this is competitive endogenous RNA (ceRNA). Two years ago the number of human microRNAs was thought to be around 1,000 (release 20 of miRBase, in June ’13, gives the number as 2,555, and even this is unlikely to be complete). Unlike protein-coding genes, microRNA genes are far from obvious to find by looking at the sequence of our genome, so there may be quite a few more.

So most microRNAs bind the 3′UTR of more than one protein (the average number is unclear at this point), and most proteins have binding sites for microRNAs in their 3′UTR (again the average number is unclear). What a mess. What subtlety. What an opportunity for the regulation of cellular function. Who is going to be smart enough to figure out a drug which will change this in a way that we want? Absence of evidence of a regulatory mechanism is not evidence of its absence. A little humility is in order.

*****

If this wasn’t scary enough, consider the following cautionary tale: Nature vol. 505 pp. 212 – 217 ’14. HMGA2 is a protein we thought we understood for the most part. It is found in the nucleus, where it binds to DNA. While it doesn’t transcribe DNA into RNA itself, it helps form protein complexes on DNA which promote the transcription of certain genes.

Well, that’s what the protein does. However, the mRNA for the protein uses its 3′ untranslated region (3′UTR) to sop up microRNAs of the let-7 family. HMGA2 mRNA is highly overexpressed in human cancer (notably the very common adenocarcinoma of the lung). You can mutate the mRNA for HMGA2 so it doesn’t produce the protein, just by putting a stop codon in it near the 5′ end. Throw the altered mRNA into a tissue culture of a lung adenocarcinoma cell line, and the cells become more proliferative and grow without being anchored to the tissue culture plate (i.e. anchorage independence, a biologic marker for cancer).

So what? It means that every mRNA for every protein we make may be acting as a ceRNA. The authors conclude the paper with “Such dual-function ceRNA and protein activities necessitate a deeper exploration of the coding genome in biological systems.”

I’ll say. We’re just beginning to scratch the surface. The control mechanisms within the cell continue to amaze (me) by their elegance and subtlety. I doubt highly that we know them all. Yet more reasons that drug discovery is hard — we are mucking about with a system whose workings we only dimly understand.
