What happened to the ‘junk’ in my DNA?

Friday, 07 September, 2012


All my life I was assured that most of my DNA is ‘junk’ that serves no function, and that more than 99% of it is shared with chimpanzees and bonobos. It has long been known that the gene regions coding for proteins make up only 1% of the human genome, and many have wondered what the other 99% does, if anything.

Scientists have now begun to discover the answer: about 80% of the genome is biochemically active and is likely involved in regulating the expression of nearby genes.

The ENCODE consortium (Encyclopedia of DNA Elements), which includes hundreds of scientists from several dozen labs around the world, has used genetic sequencing data from 140 types of cells to identify thousands of DNA regions that help fine-tune genes’ activity and influence which genes are expressed in different kinds of cells.

Just as the sequencing of the human genome helped scientists learn how mutations in protein-coding genes can lead to disease, the new map of noncoding regions should provide some answers on how mutations in regulatory elements lead to diseases such as lupus and diabetes, says Manolis Kellis, an associate professor of computer science at MIT and an associate member of the Broad Institute. Kellis is an author of a paper describing the findings in the 5 September online edition of Nature.

“Humans are 99.9% identical to each other, and you only have one difference in every 300 to 1000 nucleotides,” Kellis says. “What ENCODE allows you to do is provide an annotation of what each nucleotide of the genome does, so that when it’s mutated, we can make some predictions about the consequences of the mutation.”

Mapping noncoding DNA

ENCODE was established in 2003 to extend our understanding of the human genome beyond protein-coding genes. One way to do that is by studying the chemical modifications of individual stretches of DNA, which control when genetic regions will be active. These modifications vary by cell type and can affect either the DNA directly or the histone proteins that DNA wraps around.

To map these modifications, known collectively as the epigenome, the research groups had to collect many different kinds of data from different cell types. Some labs measured DNA or histone modifications, while others gauged the accessibility of different stretches of DNA by cutting it into fragments with enzymes.

Kellis and his group were among the computational scientists leading the effort to analyse and integrate the huge amount of data generated by different labs. “Given that we were getting more than 1000 data sets, we had to figure out ways to automatically calibrate experiments,” says Anshul Kundaje, a research scientist in MIT’s Computational Biology Group. “We developed an almost purely automated system that did all of this.”

The ENCODE researchers found that 80% of the genome experiences some kind of biochemical event, such as binding to proteins that regulate how often a neighbouring gene is utilised. They also discovered that the same regulatory region can play different roles, depending on what type of cell it’s acting in.

The findings should have a major impact on scientists’ understanding of human biology and how genomic variations can cause disease, says Ben Raphael, an associate professor of computer science at Brown University.

“The most exciting part is now we’re getting a whole genome annotation of functional elements,” says Raphael, who was not part of the research team. “Every time you want to understand what a particular piece of the genome is doing, you can use the data from this project.”

Human variation

The researchers also studied the conservation of nucleotides - the A, T, C and G ‘letters’ of DNA - in the newly identified regulatory regions. Nucleotides are conserved if they remain the same over long evolutionary periods, which can be measured by analysing the variability between species, or among individuals within a species.

A recent paper by Kellis and colleagues showed that 5% of noncoding DNA is conserved across mammals. In one of the ENCODE companion papers appearing online 5 September in Science, Kellis and MIT postdoc Lucas Ward show that an additional 4% is conserved within the human lineage, suggesting that those elements control recently evolved traits, some of which are unique to humans.

When the researchers looked at the functions of genes near newly evolved regulatory regions, they found many genes that encode regulators that activate other genes. “Genes involved in the nerve growth pathway and colour vision, both of which have been hypothesised to be recent innovations in the primate lineage, are enriched in human-constrained elements in non-conserved regions,” Ward says.

The researchers found that the most highly conserved nucleotides were also the ones most likely to be associated with disease when mutated. They also showed that variants associated with autoimmune diseases such as lupus and rheumatoid arthritis are located in regions active only in immune cells, while variants linked to metabolic diseases are in regions active only in liver cells.

In their next phase, the ENCODE researchers hope to determine just how those variations lead to human disease.
