Computing grid spreads number-crunching across four states

By David Braue
Monday, 23 June, 2003

Data-intensive research fields such as physics and the life sciences will soon benefit from grid computing, a new method for analysing massive amounts of data that was demonstrated in Australia for the first time at the recent ICCS 2003 (International Conference on Computational Science) in Melbourne.

Grid technology promises to dramatically change the analysis of data by providing fast connections between powerful but physically dispersed computers. Such links allow scientists to distribute data across more than one computer at a time, leading to faster and more readily actionable results than are possible if all the data must be collected and processed at one location.

Researchers from the universities of Sydney, Melbourne and Adelaide, and the Australian National University in Canberra, showed the operational grid, called the Australian Belle Data Grid, to attendees at the PRAGMA (Pacific Rim Applications and Grid Middleware Assembly) ICCS sub-conference.

Built with the assistance of IBM, the grid consisted of standard Linux-based desktop PCs connected via the Internet. A 10 terabyte data set, sourced from early observations from the LHC (Large Hadron Collider) project underway at CERN in Switzerland, provided the source data for the linked computers.

In the demonstration, the grid-linked systems continually co-ordinated their efforts, pushing the data between connected nodes and, to minimise bandwidth consumption and speed up analysis, returning only the results to the user.
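As a rough illustration of that pattern, the Python sketch below uses local worker processes as stand-ins for the geographically dispersed grid nodes: each worker analyses a chunk of data it already holds and sends back only a compact summary. All names in the sketch are illustrative and are not part of the Belle grid software.

from multiprocessing import Pool

# Each "node" (here a local worker process) already holds its own chunk of
# the data set; only the chunk identifier and the summary cross the network.
LOCAL_CHUNKS = ["chunk_00", "chunk_01", "chunk_02", "chunk_03"]

def analyse_local_chunk(chunk_id: str) -> dict:
    """Run the analysis where the data lives; return only a small summary."""
    # Placeholder computation: a real physics analysis would scan gigabytes
    # of event data held on the node's local disk.
    events_scanned = 1_000_000
    candidates_found = 42
    return {"chunk": chunk_id, "events": events_scanned,
            "candidates": candidates_found}

if __name__ == "__main__":
    with Pool(processes=len(LOCAL_CHUNKS)) as pool:
        # Only the compact summaries, not the raw data, come back to the user.
        summaries = pool.map(analyse_local_chunk, LOCAL_CHUNKS)
    total = sum(s["candidates"] for s in summaries)
    print(f"Merged result from {len(summaries)} nodes: {total} candidates")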

The grid was managed using the locally developed Gridbus software and the Globus Toolkit (www.globus.org), a grid computing enabler under continuous development by researchers at Argonne National Laboratory, the University of Chicago, the University of Southern California, and Northern Illinois University’s High Performance Computing Laboratory.

Grid computing is a significant improvement over current methods: in the ATLAS Collaboration, one part of the $US8 billion LHC project, for example, more than 400 researchers in 50 countries face the challenge of sharing massive amounts of data.

Since shipping complete copies of that much data to every institution is practically impossible, a computing grid allows it to be distributed and analysed far more efficiently.

“What we’re after is better management of data within large collaborations of people,” says Lyle Winton, a research fellow within the University of Melbourne School of Physics.

“Normally, if you’re part of a large collaboration and people from different institutions want to use those resources, they need to know where to log onto machines, how to use them, and where the data is stored on each one. [In grid computing] the analysis processing moves to where the data is, rather than moving the data to where the user is. We’re hoping that [data management] will be taken away from physicists so they don’t have to worry about where the data is and where it’s coming from.”

Since grids focus on the data being analysed rather than the underlying computer technology, the connected nodes do not have to run the same operating system or applications.
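One way to picture that independence is a platform-neutral job description that any node can interpret and map onto its own local binaries and storage. The sketch below is purely hypothetical: the field names are illustrative and do not reflect the actual syntax used by the Globus Toolkit or Gridbus.

import json

# Hypothetical, platform-neutral description of one analysis job; every field
# name here is illustrative, not actual Globus Toolkit or Gridbus syntax.
job = {
    "executable": "analyse_events",            # logical name, resolved per node
    "arguments": ["--dataset", "belle_skim", "--cut", "pt>1.5"],
    "input_data": "lfn://belle/skim/part-07",  # logical file name, not a physical path
    "output": "results/summary-07.json",       # only this summary travels back
    "requirements": {"min_memory_mb": 512},
}

# Any node, whether a Linux desktop or a supercomputer, can parse the same
# description and map the logical names onto its own environment.
print(json.dumps(job, indent=2))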

This characteristic could soon pave the way for massive grids composed of many different kinds of systems, including the powerful supercomputers recently purchased by Melbourne University, the Victorian Partnership for Advanced Computing and the South Australian Partnership for Advanced Computing.

Although last week’s demonstration was simply a proof of concept, the grid computing approach is gaining currency within biotechnology circles as researchers face the ever more difficult challenge of managing increasing volumes of data.

Grid computing could substantially reduce the time needed for tasks such as protein modelling, drug discovery, in silico DNA sequencing and other computationally intensive work.

In the long term, the technology’s advocates envision grid-based computing power being sold commercially as a form of computing on tap, allowing businesses to pay for access to massive number-crunching capability to rapidly produce periodic reports, scientific analyses and other information that requires far too much compute power for any one organisation to justify purchasing for itself.
