Feature: Hidden in the genome

By Fiona Wylie
Wednesday, 22 December, 2010


This feature appeared in the November/December 2010 issue of Australian Life Scientist.

What makes you you, and not someone else, are the millions of single-nucleotide polymorphisms (SNPs) residing in your genome. Together, these account for the majority of the genetic differences between any two people. And these variations affect not only an individual’s physical traits, but also their susceptibility to disease.

A genome-wide association study (GWAS) seeks to characterise this variation across the genomes of a huge number of subjects and controls to identify associations of specific SNPs with a given disease. For geneticists, SNPs act as markers to locate genes in DNA sequences, thus implicating the ‘tagged’ genes in disease.

Since the first GWAS results were published in 2005, many risk and protective factors have been identified for dozens of common human conditions, including asthma, cancer, diabetes, heart disease and mental illness.

As new genetic information drives ever more sophisticated GWAS products onto the market and the trickle of data becomes a tsunami, some scientists and funding agencies are starting to ask whether pouring more megadollars into these approaches will really deliver extra ‘bang for their buck’.

Many of the GWAS findings are novel, with the ‘associated’ SNPs not previously implicated in disease. These results are especially exciting in light of the past difficulties replicating genetic findings for many diseases using linkage and candidate gene studies.

Such discoveries are informing researchers about basic aspects of disease pathogenesis as well as revealing new avenues to detect, treat and prevent disease. Such variations can also affect an individual’s response to therapy. Thus, GWASes have been touted as one of the first steps on the golden path to personalised medicine.

In practical terms, a GWAS essentially compares all or most of the DNA from individuals across two study groups: people with and without a disease or trait. The extracted DNA is analysed on gene arrays or ‘chips’ that can read millions of sequences simultaneously.
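To make the case-versus-control comparison concrete, here is a minimal sketch of how a single SNP might be tested. It is an illustration only, not the method of any study mentioned here: the function name `allelic_association` and the allele counts are hypothetical, and a real GWAS repeats a test like this for every marker on the chip and then corrects for the huge number of comparisons.

```python
import math


def allelic_association(case_minor, case_major, control_minor, control_major):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts."""
    table = [[case_minor, case_major], [control_minor, control_major]]
    total = case_minor + case_major + control_minor + control_major
    row_sums = [sum(row) for row in table]
    col_sums = [case_minor + control_minor, case_major + control_major]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if the allele were unrelated to disease.
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Odds of carrying the minor allele in cases relative to controls.
    odds_ratio = (case_minor * control_major) / (case_major * control_minor)
    return chi2, p_value, odds_ratio


# Hypothetical counts: 1,000 cases and 1,000 controls, so 2,000 alleles per group.
chi2, p_value, odds_ratio = allelic_association(600, 1400, 500, 1500)
print(f"chi2={chi2:.2f}, p={p_value:.2e}, OR={odds_ratio:.2f}")
```

An odds ratio above 1 would suggest the minor allele is more common in people with the disease; whether that counts as an association depends on how small the p-value is after allowing for all the other SNPs tested.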

This technology is advancing almost daily, with one of the leading genetics technology companies, Illumina, set to release its Rolls-Royce bead-chip array, the Omni5. This will analyse around five million markers per sample. The chip results are analysed using bioinformatic tools to identify SNPs in the sample that mark blocks of DNA variations, or haplotypes. GWASes are generally non-hypothesis-driven, which means that they are like a very high-tech genomic fishing expedition, with SNPs as the bait and disease-associated variations the catch.

Complex problems, complex solutions

The groundwork for GWASes was laid in the 1980s, with the explosion of molecular biology tools and techniques, especially in the realms of DNA sequencing. The literature began featuring comparisons of whole genomes, with candidate-gene and linkage studies starting to implicate disease-associated genes with large effects.

However, many findings were difficult to replicate, the statistical power of the studies was generally limited, and the methods were nowhere near ready to handle a whole genome.

In 1996, after much similar discussion in the field, Eric Lander at MIT proposed the common disease-common variant hypothesis. It argues that most of the sequence differences or genetic variation between any two people are due to common SNPs (varying on at least five per cent of chromosomes in a population), and that these common variants must confer at least some of the genetic risk for common diseases.

He further proposed characterising and cataloguing these variations to study their association with disease in large samples. SNPs became the variation of choice in this common disease-common variant GWAS strategy, which essentially assumed that many different common SNPs have small effects on each disease, and that some could be found if enough people were tested for enough SNPs.

The three major advances in human genetics that really enabled these major studies to become a practical reality about five years ago were: the publication of the human genome in 2003; the International HapMap Project to identify and catalogue genetic similarities and differences among humans, launched in 2002; and the ongoing improvement of arrays to reliably, quickly and affordably screen huge numbers of SNPs simultaneously. And thus a new and ever-expanding phase of genetics research was born.

The first GWAS publication in 2005 identified an association between age-related macular degeneration (ARMD) and a variation in the gene for complement factor H (doi:10.1126/science.1109557). This association was unexpected from previous research in ARMD, and suggested ARMD was an inflammatory process.

In the five years since this landmark paper, GWASes have rapidly grown in scale and complexity, with some studies now looking at over a million genetic markers in cohorts approaching hundreds of thousands of individuals.

According to Jennifer Stone, Product Manager of Genotyping Applications at Illumina in San Diego, the field has boomed since those first few studies. “In 2009 alone, more than 200 GWAS papers were published and more than 3500 associations have been reported so far.

“This has given us a large number of significant findings with respect to many different conditions and phenotypes – and most associations identified had virtually nothing known previously about their genetic architecture.” She adds that new products fuelled by the release of the 1000 Genomes Project data next year will send this number even higher.

Many international GWAS consortia and partnerships, including private-public couplings, have been formed over the last five years to tackle human disease on several fronts. These combined efforts bring together heavyweights in the genetics field, often from many countries, and, importantly, some serious money to access large sample populations and the technology needed.

Such collaborations often involve well-respected organisations such as the NIH and large pharmaceutical companies like Pfizer. In one example of such major GWAS efforts, the Wellcome Trust Case Control Consortium, formed in 2007, has uncovered many new disease genes for coronary heart disease, diabetes, rheumatoid arthritis, Crohn’s disease, bipolar disorder and hypertension.

Examples of the success of current GWAS approaches are not hard to find, even if one scans only the last few months of literature. In August, a screen of over 100,000 European individuals identified 95 variants linked to cholesterol levels in the blood, 59 of which pointed to previously unknown genes. This was a high-power and high-sensitivity study that clearly showed meaningful and novel genetic variation with respect to disease.

The October issue of Nature Genetics published five separate GWAS papers describing genetic variants associated with increased risk of the skin disease psoriasis. One of the papers, published by the Wellcome Trust Case Control Consortium 2 (WTCCC2), reports for the first time the interaction of two specific genetic regions in humans that earlier methods had not shown to interact.

They also found eight regions of the human genome that, until now, had not been linked with psoriasis, and seven of these regions harbour genes with recognised immune functions.

In another example, an international consortium including researchers from the Icelandic company deCODE genetics reported a rare risk variant linked to glaucoma (Nature Genetics, September 2010, doi:10.1038/ng.661) that carries a higher risk in Chinese populations than in those of European ancestry.

This finding could impact directly on screening tests for this common eye disease, particularly in China, “where this latest SNP alone can define a small fraction of the population that should be very carefully screened.”

Hidden in the genome

A crucial issue with GWASes is whether they are statistically powerful enough to reveal the multitude of minute changes in the genome that can influence or even cause human disease.

This is asked partly because most of the SNP variations found by early GWASes are associated with only a minute increase in disease risk, and have relatively tiny predictive value. It’s not uncommon to see a few variants that have a large effect, while the majority have very small effects.

However, it may be that many small changes, even in some genes that have a very small effect, could account for the variability in disease susceptibilities, especially those for complex diseases.

It’s corralling these tiny effects into something significant – something not lost in the noise of the rest of the genome – that is proving a gargantuan statistical challenge. It’s like listening for a few conspiratorial whispers in the crowd at a rock concert.

The latest thinking is that larger sample sizes than first estimated are needed for GWASes to reach the power they promise. Instead of the few thousand that were first sampled for GWASes, current teams are now aiming to sample tens to hundreds of thousands of affected individuals and controls, depending on genotypic relative risk and allele frequency.
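A rough calculation shows why the numbers climb so quickly. The sketch below uses a textbook two-proportion approximation, not the method of any team mentioned here, and rests on several assumptions: equal numbers of cases and controls, a simple allele-count test, an allelic odds ratio standing in for genotypic relative risk, and a significance threshold of 5 × 10⁻⁸, a common convention for genome-wide studies that is not taken from this article. The function name `cases_needed` is hypothetical.

```python
import math
from statistics import NormalDist


def cases_needed(control_freq, odds_ratio, alpha=5e-8, power=0.8):
    """Approximate number of cases (with as many controls) for an allelic test."""
    # Risk-allele frequency in cases implied by the allelic odds ratio.
    case_freq = odds_ratio * control_freq / (1 - control_freq + odds_ratio * control_freq)

    # Normal quantiles for the two-sided significance level and the target power.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)

    variance = case_freq * (1 - case_freq) + control_freq * (1 - control_freq)
    alleles_per_group = (z_alpha + z_power) ** 2 * variance / (case_freq - control_freq) ** 2

    # Each person carries two copies of the SNP, so halve the allele count.
    return math.ceil(alleles_per_group / 2)


# Hypothetical scenarios: (risk-allele frequency in controls, allelic odds ratio).
for freq, ratio in [(0.30, 1.2), (0.30, 1.1), (0.05, 1.2)]:
    print(f"freq={freq:.2f}, OR={ratio}: about {cases_needed(freq, ratio):,} cases")
```

Under these assumptions, both a weaker effect and a rarer allele push the required sample size up sharply, which is the dependence on relative risk and allele frequency described above.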

Some, such as Duke University’s Professor David Goldstein, are vocally sceptical of this approach. Goldstein, writing in The New England Journal of Medicine last year, argued there’s a limit to how much we can learn about common risk variants when it comes to some common diseases or traits.

“If common variants are responsible for most genetic components of Type 2 diabetes, height, and similar traits, then genetics will provide relatively little guidance about the biology of these conditions, because most genes are ‘height genes’ or ‘Type 2 diabetes genes’,” he wrote.

Others are more optimistic about the prospects of GWASes in revealing risk variants, especially given the new generation of tools just around the corner. The ‘bigger is better’ line was echoed recently at an Illumina GWAS workshop in Brisbane by Professor Matt Brown, from the University of Queensland, and Dr Peter Visscher, from the Queensland Institute of Medical Research, based on their own respective research data, with Brown declaring that “sample size is king, even for common variants.”

Another challenge for GWASes is solving the ‘case of the missing heritability’. Heritability describes the proportion of inter-individual differences in a trait that is the result of genetic factors; the ‘missing’ portion is the share that the variants found so far do not account for.

Despite some GWAS results explaining a substantial fraction of the genetic risk for a given disease (such as in the paragon case of age-related macular degeneration), for other complex diseases, such as schizophrenia, the sum of individual effects discovered so far is much less than the total estimated heritability. The challenge for the field is therefore to identify the variants that confer that outstanding risk, i.e. balancing the books.
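The book-balancing can be sketched with a standard quantitative-genetics approximation: for a trait measured in standard-deviation units, an additive SNP with allele frequency p and effect size β explains roughly 2p(1 − p)β² of the trait variance. Everything in the example below is hypothetical, including the list of hits and the heritability figure, and it assumes the SNPs act independently. Disease risk would need a more involved treatment that is not shown here.

```python
def variance_explained(allele_freq, effect_size):
    """Share of trait variance explained by one additive SNP: 2p(1-p) * beta^2."""
    return 2 * allele_freq * (1 - allele_freq) * effect_size ** 2


# Hypothetical GWAS hits: (allele frequency, effect in trait standard deviations).
hits = [(0.30, 0.05), (0.45, 0.04), (0.10, 0.08), (0.25, 0.03)]

# Hypothetical total heritability, as might be estimated from family or twin studies.
estimated_heritability = 0.80

explained = sum(variance_explained(p, beta) for p, beta in hits)
missing = estimated_heritability - explained
print(f"explained by hits: {explained:.4f}")
print(f"still missing:     {missing:.4f}")
```

With small effect sizes like these, a handful of hits accounts for only a sliver of the estimated heritability, which is the gap the field is trying to close.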

It is now widely accepted that common SNPs are unlikely to explain all of the genetic risk for disease, even for common disorders, and that multiple rare SNPs (frequency < 1%) along with hundreds or thousands of common small-effect variants play a bigger role.

Most rare SNP associations will be missed by current GWAS methods, but it is expected that the 1000 Genomes Project will discover most SNPs with 1–5% frequencies, and thus form the basis for a further round of ever-more informative and penetrating GWASes to identify these less common SNPs and perhaps common SNPs with small effect sizes.

Another major challenge of GWASes is meaningful selection of the study and comparison populations such that systematic biases are not introduced. Depending on the phenotype, it might be important to match the control group for variables such as age, environmental influences and ancestry, and epidemiological data are often critical for defining appropriate comparison groups.

GWAS researchers must also consider the known and possible differences among populations with diverse geographic ancestries. Recent large-scale studies comprising different population groups from around the world are bearing this out, such as finding differences in various allelic and SNP patterns between individuals of European and Asian descent.

Case-control differences in ancestry could therefore also confound GWAS results, although this can often be corrected statistically, and sophisticated methods have been devised specifically for the statistical analysis and imputation of GWAS data.

Samples are now often limited to a single ancestry, such as European or Asian, because of these possible differences in SNP frequencies across populations and because some associations are most meaningfully analysed in separate, homogeneous samples.

The next great leap forward

Despite the limitations yet to be overcome, it is clear to all that the GWAS era is changing the face of human genetics. According to recent NIH data, more than 400 regions of the genome have now been reproducibly associated with around 70 common diseases or complex traits.

The challenge now is to discern the meaning of all this data in terms of gene function and disease biology, and to make the most of new technologies for the benefit of human health.

For instance, Illumina’s Jennifer Stone sees next-generation sequencing (and whatever comes after that) becoming a complementary rather than alternative approach to whole-genome analysis, particularly as it all continues to become cheaper and more efficient.

She predicts that technology for both approaches under development “will be extremely powerful in providing a comprehensive coverage of genetic variation and its associations with disease, particularly once the complete 1000 Genome picture appears.”

It is now obvious that GWASes will require as many as 20,000 to 30,000 case subjects, and a similar number of comparison subjects, to obtain highly robust and meaningful findings in many cases, particularly in studying complex diseases such as psychiatric and cardiac disease, where hundreds of common and/or rare associated variations might have small effects.

This raises important questions regarding resource allocation, and national funding bodies such as the NHMRC in Australia and the NIH in the USA will face tough decisions in the near future, especially given the amount of money already invested and the range of research interests to balance.

To this end, all ears in the genetics world will be tuned in to the upcoming American Society of Human Genetics meeting in early November, where data will be reported from testing of Illumina’s new Omni5 array. This is particularly so in Australia, as grant-writing season approaches and arguments need to be formulated to justify further studies. So, for now, watch this space.
