Junk no more - RNAs get networking
Friday, 26 June, 2009
This feature appeared in the May/June 2009 issue of Australian Life Scientist. To subscribe to the magazine, go here.
To a non-biologist, the world of RNAs is a desperately complex and intricate world, full of strange abilities, new discoveries and horrifyingly complex acronyms. Even to a molecular biologist this world is a maze: we all learned early enough the central dogma of biology – DNA codes for messenger RNA which codes for proteins – but that leaves us with the 98 per cent of the genome that has been deemed to be junk.
Then microRNAs came along, single-stranded RNAs of 21-23 nucleotides in length that were shown to regulate gene expression, and now a vast quantity of non-coding RNAs is being discovered that has shaken the central dogma to its core.
The University of Queensland’s Professor John Mattick has long been arguing that molecular biologists have got it wrong for decades, and that the vast majority of genetic information is actually embedded in non-coding RNAs (ncRNAs).
While there are clearly master regulators of the genome, beyond this is a whole layer of regulatory RNAs, Mattick says. “These regulatory RNAs are subtly setting the scene for the trajectories of differentiation and development,” he says.
“Effectively, much if not most of the genome is encoding this very sophisticated RNA regulatory network in the background. Rather than being oases of protein-coding sequences in a desert of junk, the genome is really islands of protein-coding sequences in a sea of regulation that is mainly transacted by RNAs.”
And there is more and more evidence to support that view. In a recent series of papers published in Nature Genetics, an international consortium of researchers using a systems biology approach has uncovered new information on the regulatory networks underlying gene expression.
By deep sequencing one cell line and measuring transcription start site (TSS) usage over the course of the cell’s growth arrest and differentiation, they have observed many promoter sequences that were previously unidentified.
The results suggest that multiple and overlapping transcription factors are necessary for gene expression, and that “rather than a fixed hierarchy with one or a very few master regulators at the top, the picture that emerges is that of a recurrent network in which multiple transcription factors mutually coordinate their activity to implement the differentiation”.
That was the conclusion of one of the papers, written by the FANTOM Consortium, a long-running project coordinated by the RIKEN Omics Science Centre in Yokohama and involving many researchers throughout the world. (For more on the project, see page 20.)
Working on the project were Mattick and several of his colleagues from the University of Queensland’s Institute for Molecular Bioscience, including Associate Professor Sean Grimmond and Dr Geoff Faulkner from the expression genomics laboratory, PhD student Ryan Taft and a host of other Australian and international researchers.
In another paper in the series, led by Faulkner, Grimmond and Professor Piero Carninci from RIKEN, the repetitive elements known as retrotransposons, which comprise 30 to 50 per cent of the genome and were long thought mainly to be genetic leftovers, were found to have some functional role.
The team found over 200,000 transcription start sites within repeat elements across the mouse and human genomes, and observed that they are generally tissue specific. They also found that these repetitive elements frequently function as alternative promoters or express ncRNAs, and that retrotransposon transcription has a key influence upon the transcriptional output of the mammalian genome.
And in the third Nature Genetics paper, led by Taft and Mattick, the team have identified a new kind of RNA, tiny sequences of 18 nucleotides in length that map to within –60 to +120 nt of transcription start sites in humans, chickens and Drosophila.
These transcription initiation RNAs (tiRNAs), as the researchers have named them, show specific size and sequence characteristics, are mainly found near highly expressed transcripts and sites of RNA polymerase II binding, and are very likely to be biologically meaningful.
“Basically, I think we have misunderstood the structure of genetic and genomic programming for the past 50 years because of the assumption that most genetic information is translated by protein,” Mattick says.
“These and previous papers of ours, and others coming out soon on non-coding RNAs in neural cells and T cells are providing more evidence to support this alternative view.
“The evidence is now coming thick and fast. I do think that the dam is about to burst and that people will have to step back and reassess their understanding of the evolution and genetic programming of complex organisms.”
Tiny RNAs
The discovery of tiRNAs was not unexpected, as several papers over the last few years have reported an association between small RNAs and transcription start sites. While those papers never showed that they are actually transcription start sites, the RNAs were obviously derived from those sequences, Mattick says.
“What we’ve done is to identify and characterise them,” he says. “We’ve shown that they have a very specific size – 18 nt – and they have a very specific position which averages about 20 nt downstream of the starter site.
“They are conserved in both size and position relative to transcription starter sites in chicken, human and flies, which indicates they are in all animals.”
In the paper, the researchers write that previous deep-sequencing studies have tended to disregard low abundance, non-annotated small RNAs like these as spurious, or degradation products.
A number of observations suggest, however, that they are biologically meaningful. The majority – 74 per cent – of human tiRNAs map to canonical RefGene promoters, they possess a terminal 5’ phosphate, thus selecting against degradation products, and their 5’ ends show peak density close to transcription start sites, indicating that they are processed.
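To make the positional criterion concrete, here is a minimal sketch of how a small RNA read might be checked against the window described in this article (18 nt long, with its 5’ end between –60 and +120 nt of a transcription start site). The function names, the single-coordinate representation of reads and the strict length filter are assumptions made for illustration; this is not the pipeline the researchers used.

```python
# Illustrative only: coordinates, names and the strict length filter are assumptions.
WINDOW = (-60, 120)  # nt relative to the TSS, as quoted in the article


def offset_from_tss(five_prime_end, tss, strand):
    """Signed distance of a read's 5' end from the TSS (positive = downstream)."""
    if strand == "+":
        return five_prime_end - tss
    if strand == "-":
        # On the minus strand, downstream means lower genomic coordinates.
        return tss - five_prime_end
    raise ValueError("strand must be '+' or '-'")


def is_tirna_candidate(five_prime_end, length, tss, strand, expected_length=18):
    """True if the read has the expected length and falls inside the window."""
    offset = offset_from_tss(five_prime_end, tss, strand)
    return length == expected_length and WINDOW[0] <= offset <= WINDOW[1]


# A hypothetical 18 nt read starting 20 nt downstream of a plus-strand TSS.
print(is_tirna_candidate(1020, 18, 1000, "+"))
```

A real analysis would also need to account for strand-aware alignment, multiple nearby start sites and the abundance of each read, none of which is modelled here.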
While the actual function of the tiRNAs is not certain, Taft and Mattick believe they are associated with chromatin modification. tiRNAs are G+C rich, unlike miRNAs, and the vast majority overlap an annotated CpG island.
“We have speculated whether they are either signatures of transcription or they may be signatures of poised transcription, where something has started and then stopped,” Mattick says.
“It’s well known that some classes of promoters, or many of them, have poised transcripts – there are two initiation sites. So it could be a signal of that, but our favoured idea – which we have new data for – is that they are associated with particular forms of chromatin modification.
“There is an interplay between these small RNAs, the promoters including CpG islands, and chromatin modification. They exist, they are evolutionarily conserved in size and position and they clearly have some important functional role in either transcription itself or in the epigenetic phenomena that surround active transcription start sites. It may take some time to work it out.”
Massive RNAs
In the meantime, work is continuing on RNAs at the other end of the spectrum, the long non-coding RNAs that are thousands of bases long. Probably the best known is XIST, which regulates X-inactivation in females.
The existence of XIST has been known for many years, but it has since been claimed as a large intervening (or intergenic) non-coding RNA (lincRNA). This subset of ncRNAs was named by Dr John Rinn, a researcher at Harvard University who is also a member of the Broad Institute.
In 2007, Rinn discovered HOTAIR, a lincRNA that is situated near the HOXC cluster but works to regulate the HOXD cluster, found on a different chromosome. Since then, fewer than a dozen lincRNAs have been identified, but in a paper in Nature in February, Rinn and his colleagues, including the Broad’s Eric Lander, reported the discovery of over a thousand large, multi-exonic RNAs across four mouse cell types.
Remarkably, these lincRNAs seem to have a role in a wide variety of biological processes, from embryonic stem cell pluripotency to cell proliferation. And interestingly, most of these lincRNAs are found very near genes encoding transcription factors and other protein factors related to transcription.
“We had noticed it by eye, but we did a genome-wide analysis and found that lincRNAs do end up sitting next to transcription factors way more often than not,” Rinn says.
The researchers believe that many lincRNAs are involved in transcriptional control, some by guiding chromatin remodelling proteins to target loci, and some by working as a kind of buddy to transcription factors, helping them turn off some genes when others are turned on.
Rinn believes that one way these lincRNAs are working is by binding to polycomb proteins, which remodel chromatin so that transcription factors cannot bind to promoter sequences. This seems to be what HOTAIR is doing – it binds to the polycomb repressive complex 2 (PRC2) and is required for trimethylation of the HOXD locus.
“The big question is how does the genome that is in every cell in the body use the exact same cellular machinery to produce completely different outputs?” Rinn says.
“There has to be something that is maternally inherited in cell division that tells a new, naked genome so to speak how to re-put on its clothing. We know there is a histone code that tells you what the cell is, but we need to figure out how those marks get there in the first place.
“One of my suspicions is that RNA is binding the polycomb and creating a myriad of different flavours of polycomb that know where to go based on this guidance. We now have a good indication that there will be a partial mechanism using lincRNAs but I doubt that it is the only mechanism.
“John Mattick has shown that they are doing lots of different things – he had a recent paper showing they were involved in nuclear speckle formation and are likely doing something in the brain, and others have shown that they are important for transport. It is definitely clear that they are doing other things, but one principle seems to be this guidance, which John Mattick predicted way back in 2001.”
Guilt by association
Rinn and his colleagues have achieved two feats in their paper: identifying new lincRNAs by developing chromatin-state maps to discover transcriptional units in between known protein-coding loci, and developing a new way of assigning putative function for these lincRNAs based on the protein-coding genes they are hanging out with.
Using chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-Seq), the researchers mapped genes which showed a distinctive marker along the length of the transcribed region, which they have called the K4-K36 domain.
Surveying four different mouse cell types, they found over 1600 of these markers and showed that the majority had clear evidence of RNA transcription. Most resemble known lincRNAs and do not encode proteins. Most also showed a significant cluster of CAGE tags.
Then they began to assign putative function to the RNAs based on a method they call ‘guilt by association’. “It’s CSI lincRNAs,” Rinn says. “We hunt down the possible function of lincRNAs depending on what protein-coding gene they are running around with.”
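The idea can be sketched in a few lines of code. The example below assumes that "association" means co-expression across cell types, measured with a Pearson correlation, and that function is suggested by whichever annotated gene set contains the most co-expressed genes. The article does not spell out the team's actual statistics, so the 0.9 threshold, the names and the toy numbers are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def guilt_by_association(linc_profile, gene_profiles, gene_sets, threshold=0.9):
    """Rank gene sets by the fraction of members co-expressed with the lincRNA."""
    coexpressed = {
        gene for gene, profile in gene_profiles.items()
        if pearson(linc_profile, profile) >= threshold
    }
    scores = {
        name: len(coexpressed & members) / len(members)
        for name, members in gene_sets.items() if members
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy expression values across four cell types (made up for illustration).
linc = [1, 2, 8, 9]
genes = {
    "gene_a": [2, 3, 9, 10],
    "gene_b": [1, 3, 7, 9],
    "gene_c": [9, 8, 2, 1],
    "gene_d": [5, 5, 5, 5],
}
sets = {"set_A": {"gene_a", "gene_b"}, "set_B": {"gene_c", "gene_d"}}
print(guilt_by_association(linc, genes, sets))
```

In this toy case the lincRNA tracks both members of set_A and neither member of set_B, so set_A would be the putative functional neighbourhood. A real analysis would need many more samples and a proper significance test.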
For example, they focused on three large clusters of lincRNAs associated with p53-mediated DNA damage repair. “These lincRNAs only turn on when p53 is turned on,” he says. “We think they may be acting in that pathway to help p53 do its job.
“This is called the TV and radio conundrum: if you are sitting there listening to the radio and your favourite television program comes on, you not only have to use the remote control like a transcription factor to turn on the TV, but you also need to turn off the radio so you don’t have interference or noise.
“We think lincRNAs may be these long sought after repression mechanisms in the transcription pathway where it turns on the genes it wants to turn on, like the television, but it also turns on a bunch of RNAs that go turn off the radio. We have a lot of evidence for that. Are they associating with polycombs to shut off specific interfering noise programs?”
The next step is to use RNAi for proof of function studies. The Broad Institute is involved in The RNAi Consortium (TRC), which is making libraries of short hairpin RNAs (shRNAs) for stable knockdown, and the team is planning to use stem cells as a model system due to their rich phenotypic output.
“The RNAi studies will probably be published in the next year and there will be one soon on polycomb-guided repression by lincRNAs,” Rinn says.
“Then we also find that they bind to other chromatin modelling complexes other than polycombs, things that sort of partition the genome. And shortly after that we will have a couple on what we call anti-factors – that non-coding RNAs can serve as repressors in transcriptional pathways.”
Evolutionary conservation
One interesting aspect of Rinn’s team’s recent paper was the emphasis it put on the high level of evolutionary conservation of the lincRNAs identified.
It is still the prevailing orthodoxy that only with signs of clear evolutionary conservation can long ncRNAs be considered biologically functional.
John Rinn certainly doesn’t agree with this, and nor does John Mattick. Rinn says it is wrong to believe that because lincRNAs are conserved they are functional, and therefore all of the other stuff is not.
“That is not true,” he says. “Just because they are conserved, that doesn’t mean everything else is junk. We clearly know that non-conserved RNAs are functional and we know conserved ones are functional. Evolution is a dynamic process: the non-conserved RNAs are in rapidly evolving regions of the genome and wouldn’t have signatures of conservation, which is what John Mattick and others have demonstrated.”
Mattick agrees that just because some of the 30,000 or so previously discovered long ncRNAs are not highly conserved, that doesn’t mean they can be put back in the junk heap. “We ourselves looked at over 1300 in the brain and found that most were expressed in very precise patterns, suggesting that there are tens of thousands that have specific expression patterns and have specific functions. John Rinn’s work is great and is providing even more evidence for the ubiquity and function of long non-coding RNAs.”
Last year, Mattick wrote an article for a German magazine that very clearly spells out his view on the wrong-headedness of the junk DNA hypothesis. It is this junk, he argues, that may very well hold the key to human complexity, variation and cognition.
Mattick says that the protein-centric view of genetic programming appears increasingly primitive, and that much, if not most of the human genome is functional and largely devoted to a sophisticated RNA-based regulatory system that controls differentiation and development.
“Given that humans share the same proteome, and that this is held largely in common with other mammals and other vertebrates, the clear conclusion is that the majority of the differences between individuals and between species is likely to be embedded in this RNA-based control architecture.”