Biology's central dogma goes out with the junk

By Graeme O'Neill
Tuesday, 14 September, 2004

John Mattick says it's now clear that biology's central dogma is wrong -- at least as it applies to higher organisms.

Prof Mattick, director of the Institute for Molecular Biosciences at the University of Queensland, has told an international bioscience conference at Surfers Paradise that the dogma that genes code for proteins only holds true for bacteria.

A large number of genes in the mammalian genome code for a multitude of small RNA molecules that function as a 'hidden layer' of instructions for building complex cells and organisms from a shared constructor set of proteins.

In a tour-de-force presentation to CSIRO's Horizons in Livestock Science conference on RNA-induced gene silencing, Mattick laid out the latest evidence for his once controversial proposition that the non-coding or 'junk' DNA that dominates the genomes of eukaryotes contains elaborate but precise instructions for assembling and operating complex life forms.

Mattick said the emerging understanding of the RNA-nome, or 'R-nome', is transforming genomics. It was now clear that non-coding RNA -- not the protein-coding exons of genes -- is the main source of variation between species, and between individuals.

"Most genetic variation in mammals occurs in non-coding regions of the genome," he said. "Variation was thought to be a product of random mutation in exons, but only 0.3 per cent of the 3 million polymorphisms between individual humans occur in protein-coding sequences.

"And only 1 per cent of the protein-coding sequences in genes differ between humans and mice."

Puppet-master RNA

The discovery in 2001 that the human genome contains only 30,000-odd genes, not several hundred thousand as originally believed, had come as an enormous surprise to most geneticists, just as the discovery of introns in eukaryotic genes had come as a surprise in 1977.

But those 30,000 genes specify at least a quarter of a million different proteins, because the introns permit multiple alternative splicings for messenger RNAs, so each gene potentially encodes multiple proteins.
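As a back-of-envelope check on those figures, the sketch below divides the protein count by the gene count to get an average number of protein variants per gene. Both inputs are the rounded numbers quoted above, and "at least a quarter of a million" is a lower bound, so the result is only a rough floor on the average; the constant and function names are illustrative, not from the talk.

```python
# Rounded figures quoted in the article (assumed exact here for illustration).
GENES = 30_000       # approximate number of human protein-coding genes
PROTEINS = 250_000   # "at least a quarter of a million" distinct proteins


def mean_variants_per_gene(proteins: int, genes: int) -> float:
    """Average number of distinct proteins specified per gene."""
    return proteins / genes


if __name__ == "__main__":
    average = mean_variants_per_gene(PROTEINS, GENES)
    print(f"Average of at least {average:.1f} protein variants per gene")
```

This is an average only: some genes are not alternatively spliced at all, while others yield many more variants.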

This suggested that animals share a relatively stable core proteome, which is multi-tasking. The alternative spliced forms of proteins are employed in different contexts, and at different times of development -- under the supervision of the R-nome.

Mattick told the conference it was highly significant that the number of protein-coding genes in eukaryotes does not scale up with an organism's complexity -- but the R-nome does.

Vertebrate genomes, which range from around 25,000 genes (puffer fish) to just over 30,000 genes (mammals), are not substantially larger than that of the fruit fly Drosophila, with 19,000 genes, and are actually smaller than the genomes of some plants, like wheat and maize.

The number of non-protein-coding genes does scale with complexity, and reaches a maximum in vertebrates. In humans, non-protein-coding sequences comprise up to 98.8 per cent of the total genome.

Wrong assumptions

The assumption that interactions between regulatory factors and environmental cues provide sufficient information to control the cellular trajectories of differentiation and development is clearly wrong, Mattick said.

"The problem is not to generate complexity, but to control the complex trajectories of development and differentiation reproducibly, and that requires enormous amounts of information," he said.

A newborn human consists of 10^14 positionally distinct cells, all with precise architectures and differentiated functions. "The question is how you go from a single fertilised cell to 10^14 cells in a baby," Mattick said.
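To give a sense of the scale involved, the sketch below works out the minimum number of rounds of cell division needed to get from one fertilised cell to 10^14 (ten to the fourteenth power) cells. It rests on an idealised assumption that is not from the talk: every cell divides in every round and none die. Real development is far less uniform, so this is a lower bound, and the function name is purely illustrative.

```python
TARGET_CELLS = 10**14  # cell count quoted for a newborn human


def min_doubling_rounds(target_cells: int) -> int:
    """Smallest n such that 2**n >= target_cells, starting from one cell.

    Assumes idealised synchronous doubling with no cell death.
    """
    if target_cells < 1:
        raise ValueError("target_cells must be at least 1")
    # For a positive integer t, (t - 1).bit_length() equals ceil(log2(t)).
    return (target_cells - 1).bit_length()


if __name__ == "__main__":
    rounds = min_doubling_rounds(TARGET_CELLS)
    print(f"At least {rounds} rounds of doubling are needed")
```

The number of divisions is small. The difficulty Mattick describes lies in specifying what each of the resulting cells becomes, and where.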

Most of the information required to achieve this remarkable feat lies outside the protein-coding components -- in the R-nome. The RNA instructions function as a precise, digital control system for proteins, which are 'analogue' in nature.

Mattick said recent analyses of the concentration of RNA molecules in cells indicated that as much as 98 per cent of the transcriptional output of the human genome was from non-coding regions -- not from protein-coding genes.

"There are enormous numbers of non-coding RNA genes in the mammalian genome that are only now beginning to be recognised," he said. "They appear to account for between 50 and 75 per cent of all transcripts. Either the human genome is replete with useless transcription, or these non-protein-coding RNAs are fulfilling some unexpected function."

The limited number of interactions between regulatory proteins and environmental signals provided insufficient state information for the programming of differentiation and development.

Protein signalling merely provided contextual clues to guide and to tune the RNA-directed, endogenously programmed pathways, by providing positional information and correcting random errors.

Hidden layer

Mattick likened this 'hidden layer' of genetic activity to the vast amount of hidden activity in the brain's neural networks required to coordinate a simple physical act, like clicking the fingers.

The complexity of this hidden control layer scaled in a non-linear manner -- every new gene added to a network required a new regulator, plus a higher-order regulator to coordinate its activity with that of existing genes, or gene networks.

The regulatory 'overhead' had to grow much faster than the number of genes if the system was not to become disconnected. Mattick said the current estimate was that the number of control elements increased as the 1.98 power -- in effect, the square -- of the number of protein-coding elements.
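A minimal sketch of what that scaling estimate implies is given below. The exponent of 1.98 is the figure Mattick quoted. The constant of proportionality was not given, so the code compares only ratios: how much the control layer grows when the number of protein-coding elements grows by a given factor. The function name is illustrative.

```python
EXPONENT = 1.98  # scaling exponent quoted by Mattick


def regulatory_growth(gene_ratio: float, exponent: float = EXPONENT) -> float:
    """Factor by which control elements grow when the number of
    protein-coding elements is multiplied by gene_ratio.

    Assumes a pure power law: controls proportional to genes ** exponent.
    """
    if gene_ratio <= 0:
        raise ValueError("gene_ratio must be positive")
    return gene_ratio ** exponent


if __name__ == "__main__":
    for ratio in (2, 10):
        growth = regulatory_growth(ratio)
        print(f"{ratio}x the genes -> about {growth:.1f}x the control elements")
```

Under this assumption, doubling the number of protein-coding elements requires just under four times as many control elements, which is why the regulatory overhead comes to dominate as genomes grow.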

After some 3 billion years of prokaryote evolution, the first complex eukaryotic cells had appeared around 1.5 billion years ago. Another billion years had passed before eukaryote genomes reached a threshold of complexity sufficient to spawn the explosive diversification of archetypal, multicellular organisms that occurred in the so-called 'Cambrian explosion', 520 million years ago.

Mattick said the mammalian R-nome probably represented the limit of control-system complexity. Up to 98 per cent of the human genome was devoted to non-coding RNA sequences to organise and regulate the activity of only 30,000 genes.

Given that the genome and R-nome resided in both strands of the double helix, Mattick said it was astonishing that the entire design and operating instructions for a life form as complex as a human were encoded in only 6 billion nucleotides -- a masterpiece of compact 'design'.

Rethinking the code

It was likely that many genes had evolved only to express RNA signals, not proteins, and that the majority of the mammalian genome was devoted to the control of developmental programming, according to Mattick.

"Biology's central dogma says that DNA codes for messenger RNA which codes for proteins," said Mattick. "Information flows from gene to RNA to protein. Genes are generally regarded as synonymous with proteins."

The complexity of prokaryotes -- bacteria -- was limited not by evolution, but by the simplicity of their genetic operating systems. But for eukaryotes, most transactions occurred internally. Mattick argues that a hidden layer of RNA-based genetic programming regulates the expression of genes.

"The central dogma is true for prokaryotes, whose genomes consist of 96 per cent of wall-to-wall protein-coding sequences, flanked by regulatory elements. But it is not necessarily correct for eukaryotes," said Mattick.

In eukaryotic genomes, genetic information is expressed both as proteins and as RNAs. A host of micro-RNAs, cleaved from larger RNA transcripts from introns, transmit information in parallel with the protein-coding sequences, and are almost certainly essential for the intricate networking of gene activity.

"The majority of the regulatory transactions during development in higher organisms is conveyed by RNAs, not proteins, although the two classes of regulatory controls work in concert," Mattick said.

Look out for a major feature on RNAi in the October-November issue of Australian Life Scientist
