Feature: Proteomics’ open book

By Graeme O'Neill
Monday, 10 September, 2012

In the beginning, there was the Human Genome Project. For all its epochal significance, it was little more than that: a beginning. It delivered the first DNA sequence of a human being, roughly 3 billion base pairs long, along with a near-complete catalogue of 20,000-odd genes, and a map showing where each gene is located on our 22 pairs of autosomes and pigeon-paired sex chromosomes.

The modest total of genes deflated any immodest notion that natural selection might have endowed a species with a supersized brain with a supersized genome to match. Instead, Homo sapiens turned out to be in the same genomic little league as nematodes and fruit flies.

But the disconnect between the final tally of 20,000-odd protein-coding genes and even the most conservative estimate of the size of the human proteome hinted at the much greater task ahead: cataloguing and determining the functions of at least 20,000 proteins, and at least 200,000 variants created by alternative RNA splicing, which joins a gene's exons in different combinations. And even that would be just the first step beyond the beginning.

Professor Ian Smith, Pro Vice-Chancellor (Research and Research Infrastructure) and Professor of Proteomics at Monash University, is Director of the Monash Biomedical Proteomics Facility and head of the Peptide Biology Laboratory. His research focuses on proteases involved in the generation and metabolism of peptide regulators of cardiovascular function, and the development of protease inhibitors as research tools and potential therapeutics.

Ian Smith, Pro Vice-Chancellor (Research and Research Infrastructure) and Professor of Proteomics at Monash University, says the human proteome also extends to an unknown number of permutations on the unadorned peptide sequences of each primary protein and its splice variants.

According to Smith, it has been difficult to get a handle on the number of human proteins and variants, with estimates of the number of post-translationally modified proteins, their splice variants and peptide progeny ranging from 1.5 to 2.5 million.

Post-translational extras include glycosylation, phosphorylation, acetylation and lipidation, and there are further permutations on each of these variants. For example, glycosylation may be N-linked or O-linked, and proteins may be phosphorylated at serine, threonine or tyrosine residues. Proteins may also be structurally altered by disulfide bonding, or undergo enzymatic cleavage into peptide fragments.

The challenge is to bring order to the multitude of molecules: to aggregate, catalogue and organise information held in dozens of databases around the world, many of which have evolved with idiosyncratic structures and inconsistent data formats.

The Human Proteome Organisation (HuPO) made a start two years ago by launching the Human Proteome Project (HPP). HuPO decided that, to provide complete coverage of the proteome, one arm of the project would employ a gene-based, chromosome-centric strategy.


Trans-Tasman consortium

As with the Human Genome Project, the 24 human chromosomes have been divided up and parcelled out to participating nations. Or, in Australia’s case, to an Australian-New Zealand consortium. The trans-Tasman consortium put its hand up for chromosome 7, and Smith and his colleagues undertook to design a data-integration and software-analysis system for the whole project, with chromosome 7 as a model.

The Monash team received a $200,000 one-year grant from the Australian National Data Service (ANDS) to design, develop, and deploy an open-access Web interface – i.e. a proteome browser – to make the huge volume of information easily comprehensible, searchable and reusable to researchers worldwide, and to enable the map of the human proteome to be completed.

The ANDS is a collaboration between Monash University, the Australian National University and the CSIRO, and is funded by the Australian Department of Industry, Innovation, Science, Research and Tertiary Education (DIISRTE).

Smith says securing funding for the browser project was always going to be a challenge, because biomedical researchers prefer disease- and hypothesis-driven research, and were unlikely to be enthusiastic about a proposal for a chromosome-centric data integration and analysis tool.

Work on the proteome browser began in February, and the Monash team ran a workshop in Sydney in March, which identified the proteomics community’s vision for the browser.

“I like to emphasise that while it’s led by Monash, we’ve had significant input from Macquarie University and the rest of the Australian proteomics community,” Smith says.

“We’ve had great support from groups in New Zealand, the US and South Korea, and the fact that it’s supported by ANDS is terrific. There’s a tremendous amount of goodwill in the research community, and we’re confident we can make it work.”

His Monash University colleague, Robert Goode, has been the main driver for the project, which is receiving strong support from Professor Bill Hancock, of Northeastern University in Boston. Hancock, a former Adelaide University PhD, is co-chair of the trans-National Institutes of Health Alliance of Glycobiologists for Detection of Cancer and Cancer Risk, and editor-in-chief of the Journal of Proteome Research.

The Chromosome 7 project is largely being driven by Professor Mark Baker, Chair of Proteomics at Macquarie University, whose team has been working to develop a comprehensive catalogue of the membrane proteome, to underpin research on the mechanisms involved in the development of ovarian, colon, breast and prostate cancers.

Smith’s team will present a working version of the browser, which will integrate a number of proteomics data sources, to a meeting of the International HuPO in Boston in September. Smith expects the browser to go live in November or December.

Phase 2 of the project, scheduled for completion in February next year, will integrate more data sources, provide advanced filter capabilities, and offer reporting and data-export functions.


Other species

While its primary role is to underpin the Human Proteome Project, the software is being designed to be species- and chromosome-independent, to allow comparisons of human and animal data. “And there’s no reason why it can’t be easily adapted to other species, including lower vertebrates, invertebrates, plants and single-celled organisms,” Smith says.

Smith describes the current design as “very experimental”, but says it will provide maximum flexibility in format and scope. “Everyone’s favourite database could potentially be included – nobody will be left out.

“It will be freely available, owned by the international proteomics community, with no constraints on access. It’s important to get the information out there to the biological and biomedical research communities.”

The browser will present every protein in a traffic-light format. A green, amber or red button indicates the quality of the information available on the protein: green for good-quality data, amber for reasonable quality, and red for poor quality. The absence of any colour (black) indicates that no data is available.
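For readers who think in code, the traffic-light scheme can be sketched in a few lines of Python. This is only an illustration of the mapping described here, not the browser's actual implementation: the names `Evidence`, `COLOURS` and `traffic_light` are hypothetical, and the article does not say how the browser decides whether data counts as good, reasonable or poor, so the sketch takes that rating as given.

```python
from enum import Enum
from typing import Optional


class Evidence(Enum):
    """Hypothetical quality ratings for the data held on a protein."""
    GOOD = "good"
    REASONABLE = "reasonable"
    POOR = "poor"


# Colour scheme as described in the article.
COLOURS = {
    Evidence.GOOD: "green",
    Evidence.REASONABLE: "amber",
    Evidence.POOR: "red",
}


def traffic_light(evidence: Optional[Evidence]) -> str:
    """Return the button colour for a protein; black means no data at all."""
    if evidence is None:
        return "black"
    return COLOURS[evidence]


if __name__ == "__main__":
    for rating in (Evidence.GOOD, Evidence.REASONABLE, Evidence.POOR, None):
        print(rating, "->", traffic_light(rating))
```

Treating "no data" as a separate case, rather than as a fourth rating, keeps the distinction between a poorly characterised protein and one that has not been observed at all.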

“The user will be able to click on the name of the protein – say, Protein Z on Chromosome 7 – and step down through the database to get information on, for example, the tissues in which it has been detected and its function in those tissues, whether it is up-regulated or down-regulated in cancer, and in the future possibly even details of its epigenetic status in various tissue types, both healthy and cancerous.

“The browser’s particular value is that it will allow researchers to drill down through tissues, see which protein is being expressed at a particular point in the cell cycle, determine which molecular form or splice variants are present, and determine their phosphorylation status.”

The interface presents a series of clickable filters, allowing an investigator to select a chromosome, determine whether the protein for a particular gene has been discovered and characterised, check for post-translational modifications, note any associations with genetic or metabolic disease, compare sequence variations and find information on alternative splicings or peptides produced by protein cleavage.
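As a rough illustration of how such stacked filters might behave, the Python sketch below narrows a list of protein records one criterion at a time. Everything in it is an assumption made for the example: the `ProteinRecord` fields, the `filter_records` function and the placeholder entries are hypothetical and are not drawn from the browser or from any real database.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProteinRecord:
    """Hypothetical record; field names are illustrative only."""
    gene: str
    chromosome: str
    characterised: bool = False
    modifications: List[str] = field(default_factory=list)
    disease_links: List[str] = field(default_factory=list)


def filter_records(
    records: List[ProteinRecord],
    chromosome: Optional[str] = None,
    characterised: Optional[bool] = None,
    modification: Optional[str] = None,
) -> List[ProteinRecord]:
    """Keep records matching every filter that is set; None means 'any'."""
    kept = []
    for record in records:
        if chromosome is not None and record.chromosome != chromosome:
            continue
        if characterised is not None and record.characterised != characterised:
            continue
        if modification is not None and modification not in record.modifications:
            continue
        kept.append(record)
    return kept


# Placeholder entries, not real genes or real annotations.
EXAMPLE_RECORDS = [
    ProteinRecord("GENE_A", "7", True, ["phosphorylation"]),
    ProteinRecord("GENE_B", "7", False),
    ProteinRecord("GENE_C", "12", True, ["glycosylation"]),
]

if __name__ == "__main__":
    hits = filter_records(EXAMPLE_RECORDS, chromosome="7", characterised=True)
    print([r.gene for r in hits])
```

Each filter is optional, so a user can start broad, with a whole chromosome, and tighten the selection step by step.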

According to Smith, organising information on alternative splicings has been one of the major challenges in designing the browser. “The beauty of the browser approach is that we can easily add extra layers of information as more and more data, or different types of data, become available.

“Devising a common format for data entry has been challenging, but the advantage of the browser is that, to my knowledge, there’s nothing else around that rivals it for sophistication and comprehensiveness.

“The advantage of being first with a project like this is that, if the Americans want to do it one way, and the Europeans another, Australia is seen as an honest broker, putting us in a good position to resolve any conflicts.”
