
smORFs, dwORFs and now uORFs

A recent post described small Open Reading Frames (smORFs) and DWarf Open Reading Frames (DWORFs); see the link at the bottom. Now it’s time for uORFs (upstream Open Reading Frames). Upstream of what, you might ask? Well, messenger RNA is grabbed by the ribosome at one end (called the 5′ end). The prevailing thinking was that the ribosome marches along the mRNA in the 5′ to 3′ direction looking for the sequence Adenine Uracil Guanine (AUG), which codes for methionine. It then begins reading the mRNA 3 nucleotides at a time and tacking amino acids onto the methionine. This is called translating mRNA into protein. What about the stretch at the 5′ end of the mRNA before the AUG is reached (perhaps hundreds of nucleotides later)? It isn’t translated, which is why it’s called the 5′ UTR (5′ UnTranslated Region). In bacteria it’s only a few nucleotides, but our 5′ UTRs can have thousands (see https://en.wikipedia.org/wiki/Five_prime_untranslated_region).
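For readers who think better in code, here is a minimal Python sketch of the simple scanning picture just described: walk 5′ to 3′ until the first AUG, then read three nucleotides at a time until a stop codon. The function name and the toy sequence are made up for illustration, and the sketch ignores everything a real ribosome cares about besides the AUG itself (sequence context around the start codon, for one).

```python
# Toy model of the scanning picture: find the first AUG, then read codons.
# The stop codons (UAA, UAG, UGA) are standard; the rest is illustrative.
STOP_CODONS = {"UAA", "UAG", "UGA"}

def scan_and_read(mrna):
    """Return (five_prime_utr, codons) for the first AUG met scanning 5' to 3'."""
    start = mrna.find("AUG")
    if start == -1:
        # No AUG anywhere: in this toy model the whole RNA is untranslated.
        return mrna, []
    codons = []
    # Step through the message 3 nucleotides at a time, starting at the AUG.
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return mrna[:start], codons

# Hypothetical 20-nucleotide message with a 6-nucleotide 5' UTR.
utr, codons = scan_and_read("GGCACCAUGGCUUUUUAAGG")
print(utr)     # GGCACC
print(codons)  # ['AUG', 'GCU', 'UUU']
```

Everything to the left of the AUG comes back as the 5′ UTR, which is exactly the region the rest of this post is about.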

Two other terms of art are upstream and downstream. Since the ribosome flows from 5′ to 3′ on mRNA, any nucleotide 5′ to a given point is called upstream, and anything 3′ is called downstream. Logical terminology — what a pleasure.

So a uORF is an upstream Open Reading Frame. Upstream to what? Why, to the AUG (the initiator codon). The assumption had always been that since there was no initiator AUG codon in this region, proteins couldn’t be made from it. Wrong.

This is where [ Science vol. 351 p. 465 aad2867 – 1 –> 9 ’16 ] comes in. It turns out that the ribosome can translate some of these uORFs into protein, and the paper describes a clever technique (called 3T) the authors developed to find them. One of the problems in finding uORF proteins is that some are quite small and are missed in the usual protein assays. One uORF from ATF4 codes for only 3 amino acids, which is so small that mass spectrometry can’t see it.

The paper makes the amazing statement that nearly half of all mammalian mRNAs harbor uORFs in their 5′ UTRs, and many are initiated with non-AUG start codons. They may be a general mechanism to regulate downstream coding sequence expression. The paper gives two citations for this that I must have missed in my reading.

For instance, the mRNA for Binding immunoglobulin Protein (BiP, aka Heat Shock Protein family A member 5, HSPA5) contains uORFs exclusively initiated by UUG and CUG start codons (not AUG).
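To make the idea of a non-AUG uORF concrete, here is a rough Python sketch that lists every open reading frame inside a 5′ UTR that begins with AUG, CUG or UUG and reaches a stop codon before the UTR ends. This is not the paper's 3T technique, which is an experimental method; it is just a naive string search. The function name and the toy sequence are hypothetical, and the sketch assumes that only uORFs ending inside the UTR are of interest.

```python
# Naive search for candidate uORFs in a 5' UTR.
# Start codons default to AUG plus the two non-AUG starts mentioned above.
STOP_CODONS = {"UAA", "UAG", "UGA"}

def find_uorfs(utr, start_codons=("AUG", "CUG", "UUG")):
    """List (start_index, start_codon, codons) for ORFs that start and stop inside utr."""
    uorfs = []
    for i in range(len(utr) - 2):
        if utr[i:i + 3] not in start_codons:
            continue
        codons = []
        # Read in frame from the candidate start until a stop codon or the end of the UTR.
        for j in range(i, len(utr) - 2, 3):
            codon = utr[j:j + 3]
            if codon in STOP_CODONS:
                uorfs.append((i, utr[i:i + 3], codons))
                break
            codons.append(codon)
        # Frames that run off the end of the UTR without a stop are skipped.
    return uorfs

# Hypothetical 13-nucleotide UTR with one CUG-initiated uORF of 2 codons.
print(find_uorfs("GGCUGGCAUAAGG"))
# [(2, 'CUG', ['CUG', 'GCA'])]
print(find_uorfs("GGCUGGCAUAAGG", start_codons=("AUG",)))
# []
```

Searching the same toy UTR for AUG alone finds nothing, which is the old assumption in miniature.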

What might the functions of uORFs actually be? The obvious one is that the proteins made from them might actually be doing something. What could a 3 amino acid protein possibly do? Lots. Consider thyrotropin releasing hormone, which helps control your thyroid: it is pyroglutamic acid, histidine, proline. Then there is met-enkephalin, which has 5 amino acids and is one of the endogenous opiate peptides your brain uses.

Another possibility is that just translating the uORF into protein controls the translation of the protein starting with the AUG codon. This isn’t so far fetched. A recent paper [ Nature vol. 529 pp. 551 – 554 ’16 ] gave a 3 dimensional structure for RNA polymerase II transcribing a DNA template into mRNA. The authoress (Carrie Bernecky) was kind enough to supply the dimensions of the complex when I wrote her. It is roughly 150 x 150 x 160 Angstroms. For scale, remember you can consider the DNA double helix as a cylinder 20 Angstroms in diameter. Figuring 3 stacked nucleotides per 10 Angstroms, the complex is big enough to obstruct 45 nucleotides of DNA upstream of the actual start site. A ribosome translating a uORF is likewise a bulky machine sitting on the mRNA upstream of the main AUG.
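The 45 nucleotide figure is simple arithmetic on the numbers above: 150 Angstroms of complex at 3 nucleotides per 10 Angstroms. A tiny Python sketch, with a made-up function name, spells it out:

```python
# Back-of-the-envelope footprint: how many stacked nucleotides fit under
# an object of a given length, at 3 nucleotides per 10 Angstroms.
def nucleotides_covered(length_angstroms, nt_per_10_angstroms=3.0):
    return length_angstroms * nt_per_10_angstroms / 10.0

print(nucleotides_covered(150))  # 45.0, the figure quoted above
print(nucleotides_covered(160))  # 48.0, using the longest dimension instead
```

Which dimension of the complex actually lies along the DNA is not something the numbers alone settle, so treat the result as an order-of-magnitude estimate.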

This is just another example of room at the bottom, where all sorts of small molecule metabolites, small RNAs and small DNAs are just being unearthed and their structures determined. For more on this, please see the following link:

https://luysii.wordpress.com/2016/01/25/smorfs-and-dworfs-has-molecular-biology-lost-its-mind/

SmORFs and DWORFs — has molecular biology lost its mind?

There’s Plenty of Room at The Bottom is a famous talk given by Richard Feynman 56 years ago. He was talking about something not invented until decades later — nanotechnology. He didn’t know that the same advice now applies to molecular biology. The talk itself is well worth reading — here’s the link http://www.zyvex.com/nanotech/feynman.html.

Those not up to speed on molecular biology can find what they need at — https://luysii.wordpress.com/2010/07/07/molecular-biology-survival-guide-for-chemists-i-dna-and-protein-coding-gene-structure/. Just follow the links (there are only 5) in the series.

lncRNA stands for long nonCoding RNA — nonCoding for protein that is. Long is taken to mean over 200 nucleotides. There is considerable debate concerning how many there are — but “most estimates place the number in the tens of thousands” [ Cell vol. 164 p. 69 ’16 ]. Whether they have any cellular function is also under debate. Could they be like the turnings from a lathe, produced by the various RNA polymerases we have (3 actually) simply transcribing the genome compulsively? I doubt this, because transcription takes energy and cells are a lot of things but wasteful isn’t one of them.

Where does Feynman come in? Because at least one lncRNA codes for a very small protein using a small Open Reading Frame (smORF) to do so. The protein in question is called DWORF (for DWarf Open Reading Frame). It contains only 34 amino acids. Its function is definitely not trivial. It binds to something called SERCA, a large enzyme in the sarcoplasmic reticulum of muscle which allows muscle to relax after contracting. Muscle contraction occurs when calcium is released from the sarcoplasmic reticulum (the endoplasmic reticulum of muscle). SERCA pumps the released calcium back into the sarcoplasmic reticulum, allowing muscle to relax and then contract again. So repetitive muscle contraction depends on the flow and ebb of calcium tides in the cell. Amazingly, there are 3 other small proteins which also bind to SERCA, modifying its function. Their names are phospholamban (no kidding), sarcolipin and myoregulin, also small proteins, of 52, 31 and 46 amino acids respectively.

So here is a lncRNA making an oxymoron of its name by actually coding for a protein. DWORF is small, but so are its 3 exons, one of which codes for only 4 amino acids. Imagine the gigantic spliceosome, which has a mass over 1,300,000 Daltons and 10,574 amino acids making up 37 proteins, along with several catalytic RNAs, being that precise and operating on something that small.

So there’s a whole other world down there which we’ve just begun to investigate. It’s probably a vestige of the RNA world from which life is thought to have sprung.

Then there are the small molecules of intermediary metabolism. Undoubtedly some of them are used for control as well as metabolism. I’ll discuss this later, but the Human Metabolome DataBase (HMDB) has 42,000 entries and METLIN, a metabolic database has 240,000 entries.

Then there is competitive endogenous RNA: https://luysii.wordpress.com/2012/01/29/why-drug-discovery-is-so-hard-reason-20-competitive-endogenous-rnas/

Do you need chemistry to understand this? Yes and no. How the molecules do what they do is the province of chemistry. The description of their function doesn’t require chemistry at all. As David Hilbert said about axiomatizing geometry, you don’t need points, straight lines and planes. You could use tables, chairs and beer mugs. What is important are the relations between them. Ditto for the chemical entities making us up.

I wouldn’t like that. It’s neat to picture in my mind our various molecular machines, nuts and bolts doing what they do. It’s a much richer experience. Not having the background is being chemically blind. Not a good thing, but better than nothing.