29 April 2016
Elena Rivas
Department of Molecular and Cellular Biology
Harvard University
Pairwise covariations observed in RNA alignments provide a powerful means of deducing evolutionarily conserved RNA secondary structures. In turn, a conserved secondary structure provides positive evidence for RNA function. Long noncoding RNAs (lncRNAs) are controversial, and many may be transcriptional noise or unrecognized short protein-coding genes. Even for well-studied functional lncRNAs like Xist, how they function as RNAs remains largely unknown. Secondary structures have been proposed for three lncRNAs – HOTAIR, steroid receptor RNA activator (SRA) noncoding RNA (ncSRA), and Xist-RepA – and these structures are said to be evolutionarily conserved, based on covariation analysis.
Asking whether a lncRNA has an evolutionarily conserved structure is a different problem from consensus structure prediction. In structure prediction, one tries to find the structure most compatible with the data, assuming that a structure is present. To test whether the data support the existence of a conserved secondary structure, it is necessary to test the statistical significance of observed pairwise covariations against a null hypothesis of no conserved structure. There is a history of mathematical work on the statistical significance of RNA pairwise covariation data, but those methods are not readily available in software tools, and they were apparently not used for the published HOTAIR, SRA, or Xist analyses.
We present a method for calculating the significance of base pair covariation in an RNA alignment under a null hypothesis that accounts for confounding covariations that arise from phylogenetic correlation rather than structural constraint. Our statistical method finds significantly covarying base pairs in several RNAs identified in recent screens for structural RNAs, including autoregulatory ribosomal protein mRNA leaders in γ-proteobacteria, noncoding RNAs in α-proteobacteria, a noncoding RNA in the ciliate Oxytricha, and a proposed control region for alternative splicing of Drosophila Dscam1. No significantly covarying base pairs are found in the proposed secondary structures for HOTAIR, SRA, or Xist, nor for any alternative structure of these RNAs.
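To illustrate what testing a covariation score against a null hypothesis means in practice, the following is a minimal sketch and not the method presented here. It assumes mutual information as the covariation measure and a naive null built by shuffling one alignment column; both choices, and the function names `mutual_information` and `permutation_pvalue`, are illustrative assumptions. Unlike the null hypothesis described above, a column shuffle ignores phylogenetic correlation, so it can overstate significance on real alignments.

```python
import math
import random
from collections import Counter


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns.

    col_i and col_j are equal-length sequences of residues, one per
    aligned sequence. Illustrative covariation measure only.
    """
    n = len(col_i)
    pair_counts = Counter(zip(col_i, col_j))
    counts_i = Counter(col_i)
    counts_j = Counter(col_j)
    mi = 0.0
    for (a, b), c in pair_counts.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * math.log2(p_ab * n * n / (counts_i[a] * counts_j[b]))
    return mi


def permutation_pvalue(col_i, col_j, n_shuffles=1000, seed=0):
    """Empirical p-value for the observed score under a shuffle null.

    ASSUMPTION: the null is generated by permuting col_j, which breaks
    any pairing between the columns but also discards phylogenetic
    signal. This is a simplification for illustration.
    """
    rng = random.Random(seed)
    observed = mutual_information(col_i, col_j)
    shuffled = list(col_j)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point round-off.
        if mutual_information(col_i, shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_shuffles + 1)


# Toy columns: every G pairs with C, C with G, A with U, U with A.
paired_i = "GGGGCCCCAAAAUUUU"
paired_j = "CCCCGGGGUUUUAAAA"
# Toy column that is statistically independent of paired_i.
unrelated_j = "ACGUACGUACGUACGU"

score_paired, p_paired = permutation_pvalue(paired_i, paired_j)
score_unrelated, p_unrelated = permutation_pvalue(paired_i, unrelated_j)
print(score_paired, p_paired)
print(score_unrelated, p_unrelated)
```

In this toy input the perfectly compensating pair scores far above its shuffled versions, while the independent pair does not. On a real alignment, closely related sequences can produce high scores without any structural constraint, which is the confound a phylogeny-aware null is meant to control.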
R-scape (RNA Structural Covariation Above Phylogenetic Expectation), available both as a web server and as source code, is a quantitative tool for evaluating statistical support for evolutionarily conserved RNA structures, and thus for helping to identify functional RNAs.