The gene or the protein?
Friday, 06 February, 2009
If Dr John Bergeron hadn’t become a cell biologist he would surely have made his name as an advocate. One of the founders of the Human Proteome Organisation (HUPO), a recent past president and chair of its series of initiatives, Bergeron has long been a champion of the power of proteomics, using its techniques to draw out the inner workings of the cell’s organelles and to uncover the secrets of the secretory pathway.
A professor of cell biology at McGill University in Montreal, headquarters of HUPO, Bergeron is looking forward to coming out to Australia, which he calls the original home of proteomics and where the term was first defined.
And when he gets here, he knows full well that he will have a serious debate on his hands.
The idea of a human proteome project has been tossed around for several years, and while the direction it will take has not yet been decided, it looks increasingly likely that it will happen.
The very complexity of the human proteome is considered the project’s largest stumbling block, but Bergeron and his colleagues believe they have hit on the most logical way forward.
Informed by their work with CellMap, which aims to identify the location of a cell’s entire set of proteins (see Profile, below), they believe that a gene-centric approach – in which the mapping of the human proteome takes as its starting point the human genome – is the way to go.
Many would disagree, and some do so strongly. In an editorial in a recent issue of the Journal of Proteome Research, proteomics pioneer Denis Hochstrasser of the University of Geneva argued that a protein-centric approach was the only way to capture the enormous complexity of the human proteome.
“Although Swiss-Prot/UniProtKB [the central repository of protein sequence and function information] now lists most, if not all, of the proteins predicted from the human genome, the protein products of the identified gene sequences need to be expressed or synthesised to unravel the products’ structure, associations, and functions,” Hochstrasser wrote.
“Protein half-lives, concentration levels, and modifications are intimately linked to cell metabolism and are probably less directly linked to gene expression level.”
Australian proteomics researcher Professor Nicki Packer argues that looking only at the gene level will not tell you “what is actually being expressed at any one time in any one cell in any one disease”.
Post-translational modifications are of particular importance, she says. “I’m pretty much firmly in the court that you have to look at the product of the expressed gene.”
Bergeron doesn’t disagree; he just wants to do it in a different way. “We don’t disagree at all but we want to put it on a robust foundation,” he says. “It’s not a difference of opinion – we agree and we’re coming up with a strategy to make sure that this gets realised.”
Gene-centric approach
The father of the gene-centric approach, Bergeron says, is his colleague Dr Tommy Nilsson, who has just joined McGill from the University of Gothenburg in Sweden.
Nilsson says a gene-centric approach is the logical next step from the completion of the human genome.
“From that we know exactly what proteins are going to be expressed and therefore it is basically a linking to the genome,” Nilsson says. “One point that is very important is that rather than having an unknown entity we have a known possible entity.
“The gene-centric way of looking at things, and also from an engineering perspective, is how do you deal with the almost infinite number of permutations there can be in the human body? Proteins being expressed at different levels, different isoforms, splice forms, post-translational modifications – it is just humungous. And if you go and look at a particular disease tissue you are going to be lost in details.
“What is better is to collapse all of that complexity down to 20,300 … that gives you a handle on the complexity. That doesn’t mean that you are ignoring the complexity, but you build up a database which then becomes powerful.
“You have 20,300 different proteins, each protein has one name or identifier and underneath that particular protein you can have the isoforms or splice forms annotated as much as you want. The gene-centric approach is just a way to deal with the complexities.”
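In code terms, the organisation Nilsson describes might look something like the minimal sketch below. Everything in it – the field names, the identifier scheme – is an illustrative assumption, not a HUPO specification.

```python
from dataclasses import dataclass, field

@dataclass
class Isoform:
    """One splice form or variant, annotated beneath its parent gene."""
    accession: str
    description: str

@dataclass
class GeneEntry:
    """One entry per protein-coding gene: one name, one identifier."""
    gene_id: str                          # hypothetical identifier scheme
    representative_protein: str           # one canonical protein accession
    isoforms: list[Isoform] = field(default_factory=list)

# The entire proteome collapses to ~20,300 such entries, however many
# isoforms and splice forms are annotated underneath each one.
proteome: dict[str, GeneEntry] = {
    "GENE_X": GeneEntry("GENE_X", "PROT_X_CANONICAL",
                        [Isoform("PROT_X_ISO2", "splice variant 2")]),
}
```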
Bergeron himself likes to use the analogy of a house built on stilts. The human genome contains about 20,300 protein-coding genes – picture 20 stilts, each representing roughly 1000 genes. At the moment, for 5000 of those genes there is no evidence of a protein product, and for another 8000 genes the evidence is only weak, leaving solid protein evidence for roughly 7300 genes.
“So now you have a house,” he says, “which is our foundation of knowledge of the human genome, where instead of 20 stilts the house is resting on only seven. Seven out of 20 – that house is going to collapse. That is the knowledge foundation that we have right now.
“The gene-centric human proteome fills in those 20 stilts and makes them the solid foundation upon which the house of knowledge is built. And we have a failsafe way of filling in that knowledge, such that you can reliably assign the data that you are accruing from mass spectrometry to a single representative protein from each protein-coding gene.
“This gets rid of the lack of reproducibility and the false sense of reliability that you think you have when you are comparing disease with normal.”
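A hedged sketch of the collapsing step Bergeron describes – assigning mass spectrometry identifications to a single representative protein per gene – might look like the following. The accession-to-gene table is hypothetical; in a real pipeline it would come from a curated resource, not be hard-coded.

```python
# Hypothetical accession -> gene lookup (illustrative values only).
ACCESSION_TO_GENE = {
    "P00001-1": "GENE_A",
    "P00001-2": "GENE_A",   # splice variant of the same gene
    "Q00002":   "GENE_B",
}

def collapse_to_genes(identifications: list[str]) -> dict[str, int]:
    """Count identifications per gene rather than per accession, so
    that disease-versus-normal comparisons stay stable across labs."""
    counts: dict[str, int] = {}
    for accession in identifications:
        gene = ACCESSION_TO_GENE.get(accession)
        if gene is not None:
            counts[gene] = counts.get(gene, 0) + 1
    return counts

print(collapse_to_genes(["P00001-1", "P00001-2", "Q00002"]))
# {'GENE_A': 2, 'GENE_B': 1}
```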
Test samples
Bergeron says serious discussion about a human proteome project began following a Nature editorial in 2005, which said that one of the responsibilities of HUPO was to ensure that standards were in place to deal with the major problem of current proteomics research: the lack of reproducibility, reliability and robustness in experiments.
Soon after that editorial, a HUPO meeting agreed to devise a test samples project to bring order and standardisation to the reporting of proteomics experiments; the results are expected to be published very shortly.
While details are scant because the paper is under embargo, Bergeron’s colleague Dr Alex Bell was closely involved in the study, which involved 27 labs from around the world. They were asked to complete two reasonably difficult tasks and the results were assessed.
Essentially, the vast majority of the labs failed to complete the tasks. This is not as big a disaster as it sounds: the researchers found the raw data was good but the methodology was rather suspect.
“We asked them to look back at some of their data so they could find the correct answers,” Bell says. “They weren’t reporting efficiently or with descriptive names, and for those people who had serious problems we did send them another sample and then it was not a problem.
“In effect, when we had quality control, everyone managed to get to the same endpoint and collect sufficient data.”
The team believes that part of the problem lies in the disparate kinds of proteomics databases used throughout the world. One aspect of the test samples project is to come up with a quality control mechanism that might serve as a basis for the human proteome project.
So if we face major problems with reproducibility and robustness in experiments, are we actually ready, technologically speaking, for a human proteome project? Yes and no, says Nilsson.
The mass spectrometry technology being used now is far in advance of what is required for the project in terms of accuracy and capacity, he says.
“It is not as if we are in 1985 when we thought about sequencing the human genome, when we could barely crawl when that project was first discussed,” he says.
“We are in a much better position than that in terms of the technology. In one run you can get about 1000 different proteins. These platforms are very standardised and it’s not like there is only one lab in the world that has it – most labs have a mass spectrometer.
“Technology-wise, we are in a much better position than people were ever in with the human genome project. I say that without any hesitation.
“But the no comes particularly from our inability to match the high quality data that comes out of the mass spectrometer with the current databases. That really is where everything falls apart.
“And that’s the same with transcriptomics – the databases today are absolutely awful, they are totally incompatible with proteomics. You have multiple entries for the same protein, multiple names for the same protein, multiple errors and it just goes on.
“The hope and the wish and the intention in this project is also that the databases reduce the complexity as far as possible.
“For example, creating a gene-centric database, that’s the wish. It’s a big no for today but this is an easy fix. In a year or so, it will be fixed.”
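As a toy illustration of the “easy fix” Nilsson has in mind – the entries below are invented – re-keying a redundant protein table by gene collapses the duplicate names and accessions in a single pass:

```python
# Invented examples of the redundancy Nilsson describes: several
# accessions and names for what is ultimately one gene product.
raw_entries = [
    {"accession": "A0001", "name": "kinase X",         "gene": "KINX"},
    {"accession": "A0002", "name": "protein kinase X", "gene": "KINX"},
    {"accession": "B0001", "name": "transporter Y",    "gene": "TRAY"},
]

# Re-key by gene: duplicates fold into one record per gene.
gene_centric: dict[str, list[str]] = {}
for entry in raw_entries:
    gene_centric.setdefault(entry["gene"], []).append(entry["accession"])

print(len(raw_entries), "entries ->", len(gene_centric), "genes")
# 3 entries -> 2 genes
```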
Towards consensus?
Every January, hordes of suffering scientists decamp with their chilblains from Canada’s winter freeze to attend a conference in Barbados, where McGill runs the Bellairs Research Institute.
It was here in 2007 that the major discussions on how to go about putting a human proteome project together began in earnest. Discussions continued throughout the year, and at the 2008 Barbados conference a white paper was commissioned, a key feature of which was the consideration of a gene-centric proteome.
Bergeron will report on the outcomes of the 2009 conference when he attends the Lorne Proteomics Symposium this weekend. A major theme of the Lorne symposium is the human proteome project, and in particular how the Australian and New Zealand proteomics world can participate.
Gaining consensus on the approach is going to be difficult, to say the least. Bergeron personally hopes that the scientific foundation will be announced at the HUPO World Congress in Toronto in September, and that the project will be underway by the time of the 2010 congress in Sydney.
“It would be very satisfying to have in Australia, where the whole field began, the actual launching of the human proteome project,” he says. “And it will be a when, not an if.”
Profile
John Bergeron and his team are perhaps best known for the CellMap strategy, using proteomics to characterise all of the proteins in the mammalian cell.
In December 2006, Bergeron and his team reported a quantitative analysis of the secretory pathway – the steps the cell uses to move proteins out of the cell – and found 1400 proteins involved in this pathway alone, 345 of them previously uncharacterised.
Along with the sampling issue, one major problem facing proteomics worldwide is dynamic range – the ratio between the smallest and largest possible values of a quantity.
The abundance of proteins in cells can span several orders of magnitude, and some proteins are beyond the dynamic range of even the best instruments available today, he says.
“What we do is go into the cell and pick out each and every part which would otherwise be refractory to proteomics, because they would be buried in the noise of low abundance as a consequence of this enormous difference in range of concentration.
“This is another strategy that we use in order to map the proteome in that we go after each and every compartment of the cell in all of the different cell types of the body and use that data to map the human proteome.
“In the actual sample that is being looked at in the mass spectrometer, these proteins are present in high concentrations because we’ve plucked them out of the cell and gathered them together in a homogeneous way, but if you were to look at them in the original soup of the cell, they would be at such a low concentration you wouldn’t be able to pick them up.
“That’s our little specialty – we focus on overcoming the dynamic range problem in proteomics by concentrating that part of the cell and using that as a little universe to characterise. And then, after we pluck out all of the different parts of the cell, we put it together and map it as a quantitative whole.
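To put rough numbers on the problem (the figures here are illustrative, not Bergeron’s data): if the most abundant protein in a whole-cell sample is present at ten million copies and the scarcest at ten, the instrument has to span six orders of magnitude; enriching a compartment a thousand-fold shrinks that gap dramatically.

```python
import math

# Illustrative copy numbers per cell, not measured values.
most_abundant = 10_000_000
least_abundant = 10

span = math.log10(most_abundant / least_abundant)
print(f"whole cell: {span:.0f} orders of magnitude")                  # 6

# Isolating an organelle and concentrating its proteins ~1000-fold
# lifts the rare proteins out of the noise.
enrichment = 1000
span_enriched = math.log10(most_abundant / (least_abundant * enrichment))
print(f"after enrichment: {span_enriched:.0f} orders of magnitude")   # 3
```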
“What we’ve learned in the particular proteomics methodology that we use is that the mass spectrometer works like a microscope – we call it the protein microscope – and by using clustering tools on the distribution of proteins in various parts of the cell, we can see proteins that co-cluster with each other.
“The function of an unknown protein is always found to be dictated by the proteins it co-clusters with, and all of the proteins that come together in a molecular machine can be quantified this way. We can get the stoichiometric abundance of all of the proteins automatically by using this strategy.”
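A toy version of the co-clustering idea, assuming only that each protein has an abundance profile across subcellular fractions; the data, protein names and clustering choices here are invented for illustration, not taken from the CellMap pipeline.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Invented abundance profiles across four subcellular fractions.
proteins = ["known_golgi_1", "known_golgi_2", "unknown_A", "known_er_1"]
profiles = np.array([
    [0.90, 0.10, 0.00, 0.00],   # peaks in the Golgi-rich fraction
    [0.80, 0.20, 0.00, 0.00],
    [0.85, 0.15, 0.00, 0.00],   # unknown_A tracks the Golgi proteins
    [0.00, 0.10, 0.80, 0.10],   # peaks in the ER-rich fraction
])

# Proteins that co-fractionate end up in the same cluster.
tree = linkage(profiles, method="average", metric="correlation")
labels = fcluster(tree, t=2, criterion="maxclust")
for name, label in zip(proteins, labels):
    print(f"{name}: cluster {label}")
# unknown_A lands with the known Golgi proteins, hinting at both
# its location and, by association, its function.
```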