Biotech software: from DIY to off the shelf

By Pete Young
Wednesday, 26 June, 2002

Until recently, creating specialised software tools to decipher the complexities of genes and proteins has been a do-it-yourself project for the bio-research community.

Researchers shouldered the job because bringing traditional software engineers up to speed on the bioscience environment is time-consuming and challenging.

What's more, researchers are by definition feeling their way forward, so providing software engineers with the precise specifications of the tools they are looking for can be difficult.

"Researchers doing experiments in the discovery process try high-powered statistics tools and data visualisation tools in unique ways that sometimes work and sometimes don't," says Proteome Systems software development manager Phil Doggett.

"You don't know exactly what you want before you start off. You are experimenting with the software side in the same way you are on the analysis side so it is not possible to bring in a software engineer and describe exactly what you want."

What biotech workers really want are software tools that permit far higher levels of user control and flexibility than those built into the fire-and-forget software packages sold into the business sector.

The most valuable computational platforms are those which allow "things to be put together in as many different ways as [researchers] can think of," says Doggett. "That is probably why Linux and UNIX platforms predominate in biotech rather than other heavily commoditised software environments."

Proteome Systems has developed novel analysis algorithms. It has also refined and expanded software originally written by researchers working at the laboratory benchtop, integrating it for commercial release.

The company places a "context-aware" layer of software on top of general-purpose data storage and management products such as IBM's DB2.

"Our software is aware of the proteomics environment so it knows which data can be moved between which experiments and which analyses can be applied against which bits of data," says Doggett.
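The idea of a layer that knows which analyses fit which data can be illustrated with a small rule table. The following is only a minimal sketch of that general idea, not Proteome Systems' design: the data kinds, the analysis names and the functions `can_apply` and `run` are all hypothetical.

```python
# Hypothetical rule table: which analyses are valid for which kind of data.
# None of these names come from Proteome Systems' software.
ALLOWED_ANALYSES = {
    "gel_image": {"spot_detection"},
    "mass_spectrum": {"peptide_mass_fingerprint"},
    "protein_sequence": {"database_pattern_search"},
}


def can_apply(analysis: str, data_kind: str) -> bool:
    """Return True if the analysis is registered as valid for this kind of data."""
    return analysis in ALLOWED_ANALYSES.get(data_kind, set())


def run(analysis: str, data_kind: str) -> str:
    """Refuse to run an analysis on data it was not registered for."""
    if not can_apply(analysis, data_kind):
        raise ValueError(f"{analysis} cannot be applied to {data_kind}")
    return f"running {analysis} on {data_kind}"
```

In a sketch like this, the rules sit in one table above the storage layer, so a researcher can add a new combination without touching the database itself.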

A software engineer who has worked in the biotech arena for 10 years, Doggett doesn't see "a horde of programmers coming into the bioinformatics field the way they did the internet industry. It really does require specialised biotech knowledge."

Bioinformatics software covers a vast range from visualisation tools to protein characterisation and molecular modelling to mining the genome and proteome databases.

Managing the data flow across the boundaries between those zones with workflow software that provides the electronic equivalent of the paper lab book is an area companies like Proteome Systems are stepping into.

"We seem to be unique in that we are dealing with the data flow across the boundaries between image analysis, protein identification and database pattern searching," says Doggett.

Cost conscious

The cost of commercial packages is an issue for many cash-strapped research facilities whose operating budgets include large components of public funding.

In specific areas, such as protein-to-protein interactions, "you might want to look at some of the commercial packages," says Derek van Dyk, a project leader with the Australian Proteome Analysis Facility (APAF).

But no single commercial package covers the wide range of tools that the APAF uses, and purchasing them individually would be prohibitively expensive, he says.

The fallback position is to use the menu of researcher-developed toolsets available on the large public-domain gene databases such as GenBank and Swissprot.

However, their one-size-fits-all design can make them slow and cumbersome tools for many research applications.

"It would be nice to have quicker, more powerful in-house tools than the public domain ones we use," says van Dyk. "A lot of people are writing their own software but I think good bioinformaticians are few and far between."

One bioinformatician who sees a dangerous gulf developing between bio-scientists and IT professionals is Assoc Prof Matt Bellgard, director of the Centre for Bioinformatics and Biological Computing (CBBC).

The CBBC is associated with Murdoch University in Perth. Its nine-person staff, which includes both biotech and infotech specialists, is working in areas such as new algorithms for handling gene sequencing research data.

Bellgard says more attention needs to be given to quality control aspects of the software which bio-researchers are relying on.

As an example, he cites the process by which researchers query public databases in search of genomic sequencing information to match against their own results.

Data in the public annotated sequence databases can change over time, due to input errors or new interpretations. The database search tool itself may simultaneously be changing due to version upgrades. Because no tools supply automated audit trails of the process, such changes are invisible to researchers attempting to replicate their results at a later stage. The result is confusion and proliferation of yet more errors, Bellgard argues.

He sees a pressing need for scientists who must reproduce their results to have audit trails of their analysis and search processes, down to the version numbers of the software packages they use. End users need control over that part of the analysis process and should not have to rely on others to supply it, he says.
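What such an audit trail might record can be shown with a short sketch. It is an illustration only and does not describe MAS or any CBBC tool: the field names and the functions `audit_record`, `append_to_log` and `differences` are hypothetical, and it assumes the researcher can find out the tool version and database release at search time.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(tool, tool_version, database, database_release, parameters, query):
    """Build one audit entry describing exactly how a search was run."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": tool,
        "tool_version": tool_version,
        "database": database,
        "database_release": database_release,
        "parameters": parameters,
        # Store a fingerprint of the query so the log stays small.
        "query_sha256": hashlib.sha256(query.encode("utf-8")).hexdigest(),
    }


def append_to_log(record, path):
    """Append the entry as one JSON line; earlier entries are never rewritten."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record, sort_keys=True) + "\n")


def differences(old, new):
    """List the fields, other than the timestamp, that changed between two runs."""
    return sorted(k for k in old if k != "timestamp" and old[k] != new.get(k))
```

When a repeated search returns different matches, comparing the two entries with `differences` would show whether the database release, the tool version or the search parameters had changed in the meantime.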

The CBBC is part way toward creating a number of automated tracking packages for gene sequence searches, one of which is MAS (Multi-BLAST Assembly System): "It tracks and stores the process so that in a month's time, when a researcher does the analysis again and sees different matches, he can have more confidence in the quality of the result."

Automating the steps in the query process and building more quality assurance into the search software is the type of assistance which non-IT end users will need more in the future, Bellgard says. He believes it should form part of ongoing efforts to give bio-researchers maximum control over the parameters of the software packages on which they depend.
