Tag Archives: Metabolome

SmORFs and DWORFs — has molecular biology lost its mind?

There’s Plenty of Room at The Bottom is a famous talk given by Richard Feynman 56 years ago. He was talking about something not invented until decades later — nanotechnology. He didn’t know that the same advice now applies to molecular biology. The talk itself is well worth reading — here’s the link http://www.zyvex.com/nanotech/feynman.html.

Those not up to speed on molecular biology can find what they need at — https://luysii.wordpress.com/2010/07/07/molecular-biology-survival-guide-for-chemists-i-dna-and-protein-coding-gene-structure/. Just follow the links (there are only 5) in the series.

lncRNA stands for long nonCoding RNA — nonCoding for protein that is. Long is taken to mean over 200 nucleotides. There is considerable debate concerning how many there are — but “most estimates place the number in the tens of thousands” [ Cell vol. 164 p. 69 ’16 ]. Whether they have any cellular function is also under debate. Could they be like the turnings from a lathe, produced by the various RNA polymerases we have (3 actually) simply transcribing the genome compulsively? I doubt this, because transcription takes energy and cells are a lot of things but wasteful isn’t one of them.

Where does Feynman come in? Because at least one lncRNA codes for a very small protein, using a Small Open Reading Frame (SmORF) to do so. The protein in question is called DWORF (for DWarf Open Reading Frame). It contains only 34 amino acids. Its function is definitely not trivial. It binds to something called SERCA, a large enzyme in the sarcoplasmic reticulum (the endoplasmic reticulum of muscle) which allows muscle to relax after contracting. Muscle contraction occurs when calcium is released from the sarcoplasmic reticulum. SERCA pumps the released calcium back in, allowing muscle to relax and then contract again. So repetitive muscle contraction depends on the flow and ebb of calcium tides in the cell. Amazingly there are 3 other small proteins which also bind to SERCA, modifying its function. Their names are phospholamban (no kidding), sarcolipin and myoregulin, also small proteins of 52, 31 and 46 amino acids respectively.
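For those who think better in code, here is a minimal sketch of what hunting for a small open reading frame amounts to: scan the three forward reading frames for an ATG, read codon by codon until a stop, and keep anything short. The function name, the length cutoffs and the toy sequence are all my own inventions for illustration. The sequence is NOT the real DWORF transcript, and real smORF discovery leans on ribosome profiling and conservation rather than a bare scan like this.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_small_orfs(seq, min_codons=10, max_codons=100):
    """Return (start, end, codon_count) for every ATG-to-stop ORF in the
    three forward frames whose length, not counting the stop codon,
    falls within [min_codons, max_codons]. Cutoffs are arbitrary assumptions."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA letters
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if min_codons <= n_codons <= max_codons:
                    hits.append((start, i + 3, n_codons))
                start = None  # resume looking for the next ATG
    return hits

# Made-up transcript: a start codon, 33 alanine codons, then a stop,
# giving a 34-codon ORF (the same length as DWORF, by construction).
toy_transcript = "CC" + "ATG" + "GCT" * 33 + "TAA" + "GG"
for start, end, n_codons in find_small_orfs(toy_transcript):
    print(f"ORF at {start}-{end}: {n_codons} codons")
```

A 34 amino acid protein needs only about a hundred nucleotides of coding sequence, which is why something like this can hide inside an RNA annotated as noncoding.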

So here is a lncRNA making an oxymoron of its name by actually coding for a protein. DWORF is small, but so are its 3 exons, one of which codes for only 4 amino acids. Imagine the gigantic spliceosome, which has a mass over 1,300,000 Daltons (10,574 amino acids making up 37 proteins, along with several catalytic RNAs), being that precise while operating on something that small.
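To get a feel for the disparity in scale, here is a back-of-the-envelope calculation. It assumes an average mass of about 110 Daltons per amino acid residue, which is a common rule of thumb and not a figure from the paper. The constant and function names are mine.

```python
AVG_RESIDUE_MASS_DA = 110  # rule-of-thumb average residue mass (an assumption)

def approx_protein_mass_da(n_residues):
    """Rough protein mass in Daltons from the residue count alone."""
    return n_residues * AVG_RESIDUE_MASS_DA

spliceosome_protein_da = approx_protein_mass_da(10_574)  # the 37 proteins only
dworf_da = approx_protein_mass_da(34)

print(f"Spliceosome proteins: ~{spliceosome_protein_da:,} Da")
print(f"DWORF: ~{dworf_da:,} Da")
print(f"Ratio: ~{spliceosome_protein_da // dworf_da}x")
```

On this estimate the protein part of the spliceosome comes to a bit under 1.2 million Daltons, which leaves room for the RNAs in the quoted total of over 1,300,000, and the machine outweighs its product by a factor of a few hundred.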

So there’s a whole other world down there which we’ve just begun to investigate. It’s probably a vestige of the RNA world from which life is thought to have sprung.

Then there are the small molecules of intermediary metabolism. Undoubtedly some of them are used for control as well as metabolism. I’ll discuss this later, but the Human Metabolome DataBase (HMDB) has 42,000 entries and METLIN, a metabolite database, has 240,000 entries.

Then there is competitive endogenous RNA: https://luysii.wordpress.com/2012/01/29/why-drug-discovery-is-so-hard-reason-20-competitive-endogenous-rnas/

Do you need chemistry to understand this? Yes and no. How the molecules do what they do is the province of chemistry. The description of their function doesn’t require chemistry at all. As David Hilbert said about axiomatizing geometry, you don’t need points, straight lines and planes. You could use tables, chairs and beer mugs. What is important are the relations between them. Ditto for the chemical entities making us up.

I wouldn’t like that. It’s neat to picture in my mind our various molecular machines, nuts and bolts doing what they do. It’s a much richer experience. Not having the background is being chemically blind. Not a good thing, but better than nothing.

How little we know

Well it’s basic biochem 101, but enzymes only allow equilibrium to be reached faster (by lowering activation energy), they never change it. This came as a shock to the authors of [ Proc. Natl. Acad. Sci. vol. 112 pp. 6601 – 6606 ’15 ] when Cytosolic Nonspecific DiPeptidase 2 (CNDP2), a proteolytic enzyme, was found to tack the carboxyl group of lactic acid onto the amino group of a variety of amino acids, essentially running the proteolytic reaction in reverse. Why? Because intracellular levels of lactic acid and amino acids are in the high microMolar to milliMolar range. It’s Le Chatelier’s principle in action.
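Here is a small mass-action sketch of why high substrate levels can push a 'proteolytic' enzyme backward. It treats the reaction as Lac-AA + H2O <=> lactate + amino acid, with a hydrolysis equilibrium constant K_hyd = [lactate][amino acid]/[Lac-AA]. The value of K_hyd below is purely hypothetical, picked only to show the scaling. It is not from the paper, and the function name is mine. The sketch also assumes lactate and the amino acid are in such excess that making product doesn't deplete them.

```python
def equilibrium_product_conc(lactate_M, amino_acid_M, k_hydrolysis_M):
    """Equilibrium [Lac-AA] in mol/L for Lac-AA + H2O <=> lactate + amino acid,
    where K_hyd = [lactate][amino acid] / [Lac-AA] and water activity is 1."""
    return lactate_M * amino_acid_M / k_hydrolysis_M

K_HYD = 1.0e3       # HYPOTHETICAL constant strongly favoring hydrolysis
AMINO_ACID_M = 1e-3  # 1 millimolar amino acid (illustrative)

for lactate_M in (1e-4, 1e-3, 1e-2):  # high micromolar to millimolar lactate
    product_M = equilibrium_product_conc(lactate_M, AMINO_ACID_M, K_HYD)
    print(f"lactate {lactate_M:.0e} M -> N-lactoyl amino acid {product_M:.1e} M")
```

The enzyme appears nowhere in the calculation, which is the point: it only sets how fast equilibrium is reached. Even with a constant lopsidedly favoring hydrolysis, the equilibrium level of product is not zero, and it rises in direct proportion to each substrate.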

The compounds are called N-Lactoyl amino acids. No one had ever seen them before. They are part of the ‘metabolome’ — small molecules found in our bodies. God knows what they do. The paper was really shocking to me for another reason, because I had no idea how many members the metabolome has.

How large is the metabolome? Make a guess.

Well, METLIN (https://metlin.scripps.edu/index.php) has 240,000, and the Human Metabolome DataBase (http://www.hmdb.ca/metabolites?c=hmdb_id&d=up&page=1676) has 42,000. I doubt that we know what they are all doing. Undoubtedly some of them are binding to proteins, producing physiologic effects. Drug chemists may be mimicking some of them unknowingly, producing untoward and unexpected side effects.

What’s even more shocking to me is the following statement from the paper: state of the art untargeted metabolomics studies still report ‘up to’ 40% unidentified, but potentially important, metabolites which can be detected reproducibly. The unknown metabolites are only rarely characterized because of the extensive work required for de novo structure determination.

So we really don’t know everything that’s out there in our bodies, and even if we did, we wouldn’t know what it is all doing. Drug discovery is hard because we only dimly understand the system we are trying to manipulate. Until I read this paper, I had no idea just how dim that understanding is.