Nothing could be simpler than the distinction between the initial product of genes that code for proteins (mRNA) and genes that don’t (long non-coding RNAs, aka lncRNA or lincRNA). Not anymore, according to an exceedingly clever and well-thought-out piece of work.
[ Cell vol. 168 pp. 753 – 755, 843 – 855 ’17 ] We know that ultraviolet light damages DNA primarily by forming pyrimidine dimers. Naturally transcription of DNA won’t be as accurate, so the cell has ways to shut it down. Ultraviolet exposure results in an unusual type of restriction of transcription along with slower elongation, with the result that only the promoter proximal 20 – 25 kiloBases of a protein coding gene are efficiently transcribed into mRNA.
In addition, after ultraviolet damage there is a global switch in pre-mRNA processing resulting in a preference for the production of transcripts containing alternative last exons not normally included in the dominant mRNA isoform. Some 84 genes are processed this way.
ASCC3 is the strongest regulator of transcription following UV damage, acting to repress it. It is a DEAD/DEAH box DNA helicase component. The ASCC3 protein interacts with RNA polymerase II (Pol II) and becomes highly ubiquitinated and phosphorylated on UV irradiation. It isn’t required to establish transcriptional repression, just to maintain it. Disruption of the UV-specific form (i.e. the short isoform containing the alternative last exon) has the opposite effect, impairing transcriptional recovery after UV damage.
The restriction of efficient transcription to the promoter-proximal 20 – 25 kiloBases explains why the human genes remaining expressed (or actually induced) after UV irradiation are invariably ‘very short’ (whatever that means).
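The length argument can be made concrete with a rough sketch. It assumes, as a deliberate simplification, that the promoter-proximal window is a hard cutoff (the paper only says transcription is "efficient" within 20 – 25 kiloBases), and the function name and gene lengths are made up for illustration.

```python
def transcribed_fraction(gene_length_kb: float, window_kb: float = 25.0) -> float:
    """Fraction of a gene lying inside the promoter-proximal window.

    Simplifying assumption: everything within window_kb of the promoter is
    transcribed and nothing beyond it is. Real elongation is not a hard cutoff.
    """
    if gene_length_kb <= 0:
        raise ValueError("gene length must be positive")
    return min(1.0, window_kb / gene_length_kb)


# Hypothetical gene lengths in kilobases (toy values, not from the paper).
toy_genes = {"short_gene": 5.0, "medium_gene": 25.0, "long_gene": 250.0}

for name, length_kb in toy_genes.items():
    fraction = transcribed_fraction(length_kb)
    print(f"{name}: {length_kb:g} kb, fraction inside window = {fraction:.2f}")
```

Under this toy model only genes no longer than the window can be transcribed end to end, which is the sense in which the surviving genes have to be short.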
The short and long isoforms constitute an autonomous regulatory module, and are related functionally, so the effect of deleting one can at least be partially compensated for by deleting the other.
The 3,100 nucleotide long ‘short’ isoform codes for a protein, but the protein itself doesn’t have the effect of the short form mRNA (see if you can figure out, without reading further, how the authors proved this). The mRNA produced from the short isoform is found almost exclusively in the nucleus. The authors put in a stop codon immediately downstream of the start codon. This ablated protein production but not transcription into the appropriate mRNA, and it still rescued the transcriptional recovery phenotype. So the function of the short isoform is mediated by a noncoding RNA encoded in the ASCC3 protein-coding gene. The short ASCC3 isoform has an open reading frame of 333 nucleotides, but functionally it is a lncRNA (of 3.5 kiloBases).
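For readers who like to see the logic of the stop codon trick spelled out, here is a minimal sketch. The sequence is invented (it is not the ASCC3 short isoform), the helper names are hypothetical, and how the authors actually engineered their construct is not described here. The point is only that a stop codon placed right after the start codon kills the protein while leaving the RNA essentially intact.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(transcript: str) -> str:
    """Translate from the first ATG up to the first in-frame stop codon."""
    start = transcript.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(transcript) - 2, 3):
        amino_acid = CODON_TABLE[transcript[i:i + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


def insert_stop_after_start(transcript: str, stop: str = "TAA") -> str:
    """Return the transcript with a stop codon placed right after the first ATG."""
    start = transcript.find("ATG")
    if start == -1:
        raise ValueError("no start codon found")
    return transcript[:start + 3] + stop + transcript[start + 3:]


# Made-up toy transcript: 5' leader, ATG, four codons, stop, 3' tail.
wild_type = "GGC" + "ATG" + "GCTGAACTGAAA" + "TGA" + "CCGT"
mutant = insert_stop_after_start(wild_type)

print(translate(wild_type))         # MAELK
print(translate(mutant))            # M (translation stops immediately)
print(len(wild_type), len(mutant))  # the RNA differs by only three nucleotides
```

If the mutant RNA still does the job, as it did in the paper, the protein can’t be what is doing it.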
So protein-coding genes can produce functional lncRNAs. How many of them actually do this is unknown. When you knock down a gene, how much of the effect is due to less protein and how much is due to the (putative) lncRNA that the gene might also produce? That’s why it’s back to the drawing board for knockout mice (or even mRNA knockdown using shRNA etc.).
The current definition of lncRNA is absence of protein coding potential in a gene.
Why have the same gene code for two different things? There may be a regulatory advantage: controlling the function of the protein. lncRNAs have the unique ability to act in close spatial proximity to their transcription loci.
Stay tuned. It’s just fascinating what we still don’t know.