Towards a deeper annotation of human lncRNAs

Abstract

A substantial fraction of the human transcriptome consists of long noncoding RNAs (lncRNAs), yet the available catalogs of known lncRNAs are far from complete. Moreover, functional studies of these RNAs are hampered by several factors, such as their tissue-specific expression and functional heterogeneity, so that only about 1% of them are well characterized. Here, we describe a set of 41,400 novel lncRNAs discovered using RNA-Seq data from 1,463 samples encompassing diverse tissues and cell lines. We used publicly available transcriptomic and genomic data to characterize them in terms of tissue specificity, cellular abundance, polyA status, cellular localization, evolutionary conservation and transcript stability, which allowed us to speculate on their possible biological roles. We also pinpointed 24 novel lncRNAs as candidate breast cancer biomarkers. These results bring us closer to a comprehensive annotation of human lncRNAs, though substantial further work is needed to validate the predictions and fully decipher their biology. This article is part of a Special Issue entitled: ncRNA in control of gene expression, edited by Kotb Abdelmohsen.

Publication
Biochim Biophys Acta Gene Regul Mech