The ABC of RNA

By Staff Writers
Thursday, 13 March, 2008

A team of bioinformaticians at the University of Montreal has uncovered a structural alphabet that can be used to infer the 3D structure of RNA from sequence data.

They reported their findings in the March 6 edition of Nature.

The folding of a single-stranded RNA molecule is determined by the interactions between its constituent nucleotides.

However, the classical approach to RNA modelling suffers from an important limitation: it takes into account only the canonical Watson-Crick interactions A:U and G:C, those in which the nucleotides face each other.

The non-canonical Hoogsteen and sugar interactions, those in which the nucleotides are side by side or on top of each other, are not taken into account by conventional modelling algorithms. The result can be incomplete or erroneous models that mislead researchers.
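
To make the limitation concrete, here is a minimal Python sketch of a filter that accepts only the two canonical pairs named above. It is an illustration only, not code from any real modelling tool; the function name and the example pairs are hypothetical.

```python
# Canonical Watson-Crick pairs as named in the article: A:U and G:C.
# frozenset makes the check order-independent (A:U is the same as U:A).
CANONICAL = {frozenset("AU"), frozenset("GC")}


def is_canonical(a: str, b: str) -> bool:
    """Return True if nucleotides a and b form an A:U or G:C pair."""
    return frozenset((a.upper(), b.upper())) in CANONICAL


# Hypothetical candidate pairs; G:A and U:U stand in for non-canonical contacts.
pairs = [("A", "U"), ("G", "C"), ("G", "A"), ("U", "U")]

# A model restricted to canonical pairs silently drops the last two.
kept = [p for p in pairs if is_canonical(*p)]
```

Any interaction that falls outside the two-entry set is simply discarded, which is how a model built this way can end up incomplete.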

The attempt to remedy this problem led Francois Major, principal investigator at the Institute for Research in Immunology and Cancer and professor in the Department of Computer Science and Operations Research, and Marc Parisien, a graduate student in his laboratory, to propose a different approach to modelling RNA structure.

Their idea: assemble the structure in silico starting from motifs that combine all the possible interactions between a nucleotide and its neighbours.

The researchers implemented a first algorithm, MC-Fold, that systematically assigns the different motifs to each segment of the sequence and selects the most probable pair based on its frequency in known structures.
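
The idea of ranking candidates by how often they occur in known structures can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not MC-Fold's actual scoring: the motif labels, the observation list and both function names are invented for the example.

```python
from collections import Counter


def motif_probabilities(observed):
    """Turn a list of motif labels seen in known structures into relative frequencies."""
    counts = Counter(observed)
    total = sum(counts.values())
    return {motif: count / total for motif, count in counts.items()}


def most_probable(candidates, probs):
    """Pick the candidate with the highest observed frequency (unseen motifs score 0)."""
    return max(candidates, key=lambda motif: probs.get(motif, 0.0))


# Hypothetical motif labels standing in for entries from a structure database.
observed = ["m1", "m1", "m2", "m3", "m1", "m2"]
probs = motif_probabilities(observed)

# Among the motifs that could fit a given segment, keep the most frequent one.
best = most_probable(["m2", "m3"], probs)
```

Here a candidate never seen in the reference set scores zero, so the selection always favours motifs with support in known structures.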

A second algorithm, MC-Sym, then assembles the set of selected motifs, taking into account the constraints that are found in known structures.

"We introduced a new first-order object to represent nucleotide relationships, the nucleotide cyclic motif (NCM)," Major said.

"We reasoned that using NCMs could allow us to arrive at better models of the 3D structure of RNA molecules.

"Compared to the thermodynamic approach, our algorithms make less false positives and negatives and predict structures that are closer to the empirical data in the case of sequences for which it is available.

"The improvement is due to the fact that NCMs incorporate more base-pairing context-dependent information."

Major and Parisien have shown that these tools can be used to study the biology of RNA viruses such as HIV. They have also used the MC-Fold:MC-Sym pipeline to identify microRNAs, which are notoriously difficult to identify based on sequence alone.

The MC-Fold and MC-Sym RNA modelling tools are available from Major's website.