Combining clinical, laboratory and metabolic records with genomic data

Wednesday, 16 July, 2008

Cardiovascular disease (CVD) is the biggest killer among chronic diseases, claiming 17 million lives globally every year. Yet doctors can only attribute 50% of cardiovascular diseases to known risk factors.

Many of the risk factors for CVDs, like smoking, gender and low-density lipoprotein (LDL) cholesterol, are well known. But CVD is a complex condition, with very wide variations among the population. By gaining an increased understanding of the risk parameters, doctors would be in a better position to identify those more at risk of CVD.

The EU-funded project Multi-Knowledge has developed an IT platform that can combine clinical, laboratory and metabolic information with high-throughput genomic data about the same individual.

Other studies have combined genomic data with clinical data, but the Multi-Knowledge team did that exercise on a much larger scale, says Dr Zohar Yakhini, the project’s technical coordinator.

“People normally collect clinical data that they are interested in, while we have just collected data from a cohort that consists of mostly nominally healthy people,” Yakhini says. “The clinical data that we are working with is much richer than we have seen in the past; that is a novelty of the project.”

By combining information in this way, doctors and researchers will be able to analyse the correlation between the gene expression profile in the blood of an individual and their risk of developing CVD.

“We find pre-clinical conditions to be associated to certain genomic signatures in this population,” he says. “This means that disease processes that start early might already manifest themselves in the genomic data.”

For example, doctors could use the information to identify blood gene expression signatures that link smoking and CVD in some patients, and cholesterol and CVD in others.

If achieved, this level of understanding would provide a powerful tool in preventative medical care. Simple lifestyle changes could reduce an individual patient’s risk of developing a cardiovascular disease to almost zero.

How it works

The Multi-Knowledge system works by combining heterogeneous data into a single computing platform. It sounds simple, but the task was complex.

Gene and protein expression analysis is a very data-intensive application. Gene expression occurs when information encoded in DNA is converted into structures that are present and operating in a cell, in this case messenger RNA (mRNA) molecules.

Expression profiling of those molecules, using microarrays, produces very large quantities of information that can be difficult for non-specialists to analyse.

The Multi-Knowledge team developed a data collection system and the protocols to combine the data with additional information collected in a distributed set-up. In this respect the project will have an important influence on emerging standards for heterogeneous medical data collection.

The data, which can come from many sources and take many forms, is stored in a repository that houses both raw information and analysis results. All participants in the study can access the data and use the analysis tools that are part of the system. These tools also allow users to connect the repository with external databases and annotations.
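The idea of keeping several kinds of record about the same individual in one repository can be illustrated with a minimal sketch. The article does not describe the platform's actual data model, so everything here is an assumption made for illustration: the `SubjectRecord` class, the `combine_sources` function and the field names (`subject_id`, `gene`, `level`) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SubjectRecord:
    """All data held about one individual (hypothetical structure)."""
    subject_id: str
    clinical: dict = field(default_factory=dict)    # e.g. smoking status
    laboratory: dict = field(default_factory=dict)  # e.g. LDL cholesterol
    expression: dict = field(default_factory=dict)  # gene name -> level


def combine_sources(clinical_rows, lab_rows, expression_rows):
    """Merge rows from three separate sources, keyed by subject_id.

    clinical_rows and lab_rows are lists of dicts that each contain a
    "subject_id" key plus arbitrary measurements. expression_rows are
    dicts with "subject_id", "gene" and "level" keys.
    """
    records = {}

    def record_for(subject_id):
        # Create the record the first time a subject is seen.
        return records.setdefault(subject_id, SubjectRecord(subject_id))

    for row in clinical_rows:
        values = {k: v for k, v in row.items() if k != "subject_id"}
        record_for(row["subject_id"]).clinical.update(values)

    for row in lab_rows:
        values = {k: v for k, v in row.items() if k != "subject_id"}
        record_for(row["subject_id"]).laboratory.update(values)

    for row in expression_rows:
        record_for(row["subject_id"]).expression[row["gene"]] = row["level"]

    return records
```

A subject who appears in only one source still gets a record, with the other sections left empty, so that gaps in the data stay visible rather than being silently dropped.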

After all this work in the background, the information is presented to users via a portal. The platform is an elegant approach to combining various forms of medical data, but its relevance to the clinic depends on how useful it proves in practice.

Testing times

So the Multi-Knowledge team tested the new platform by using it to study CVD risk factors in a nominally healthy sample. One well-known risk factor is smoking, so the team examined its effect in 50 apparently healthy young adults, looking for over-expression of genes that provoke an inflammatory response.

The system passed with flying colours. There was a clear over-expression of genes that provoke an inflammatory response in those participants who smoked. The team also identified distinct inflammatory gene expression signatures for smoking and for other risk factors, such as high LDL cholesterol.
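To make "over-expression" concrete, one simple way to compare two groups is the log2 fold change of mean expression levels: a value of 1 means the gene is expressed at twice the level in the exposed group. The sketch below is only an illustration of that idea and is not the project's method, which the article does not describe. The function names, the input layout and the threshold are assumptions, and a real analysis would also need statistical testing and correction for the many genes compared.

```python
from math import log2
from statistics import mean


def log2_fold_change(exposed_levels, unexposed_levels):
    """Log2 ratio of mean expression in the exposed vs unexposed group."""
    return log2(mean(exposed_levels) / mean(unexposed_levels))


def over_expressed_genes(expression, smokers, threshold=1.0):
    """Return genes whose log2 fold change in smokers meets the threshold.

    expression: dict mapping gene name -> {subject_id: positive level}
    smokers:    set of subject ids who smoke (hypothetical input layout)
    threshold:  minimum log2 fold change; 1.0 means at least doubled
    """
    result = {}
    for gene, levels in expression.items():
        exposed = [v for sid, v in levels.items() if sid in smokers]
        unexposed = [v for sid, v in levels.items() if sid not in smokers]
        if not exposed or not unexposed:
            continue  # cannot compare without both groups
        change = log2_fold_change(exposed, unexposed)
        if change >= threshold:
            result[gene] = change
    return result
```

Called with a mapping of gene names to per-subject levels and a set of smoker ids, the function returns only those genes whose mean level in smokers is at least double that in non-smokers under the default threshold.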

The project, which received funding from the EU's Sixth Framework Programme for research, finished at the end of June 2008. Team members continue to work on various elements of the system.

“Not all elements of the system are integrated together, so we are working on that,” says Yakhini. “Other elements of the system have already been deployed in several sites, like the data analysis tool.”

Ultimately, the developments will influence the various new approaches to medical information taken by the consortium partners.

“From a commercial perspective, we are not necessarily going to develop a full commercial system,” says Yakhini. “It will likely be the case that respective partners will take pieces of this work and incorporate it into other products that they already have. From an industry standpoint this is also a favourable result.

“But I think many of the consortium partners will continue to work together after the lifetime of the project, finding more pieces in the puzzle that will ultimately yield effective diagnostic tools combining clinical and genomic analysis.”

And then the final impact will be lives saved through better prevention of cardiovascular disease.

This article was sourced from ICT Results.
