Science and the supercomputer
Thursday, 31 January, 2008
IBM's Blue Gene/L may have retained its position as the fastest supercomputer on the planet, but SGI's new Altix ICE blade system has made a big impact on the Top 500 list, debuting at number three.
While SGI only launched its new Altix ICE system in the middle of last year, it has already caused a stir in the world of supercomputing. The system purchased by the US state of New Mexico for its Computing Applications Centre is destined for collaborative use by government, academia and private industry to further scientific and engineering research.
The ICE machine was listed as the world's third fastest by the Top 500 project - organised by Hans Meuer of Germany's University of Mannheim, Jack Dongarra of the University of Tennessee, and Erich Strohmaier and Horst Simon of the Lawrence Berkeley National Laboratory in California - in its most recent update last November.
IBM's Blue Gene/L at the Lawrence Livermore National Laboratory, also in California, retained its dominant position on the list, as it has done for several years. In this part of the world, the Blue Gene/L system bought last year by New Zealand's University of Canterbury and dubbed the Blue Fern is still the highest ranked supercomputer in the ANZ region, sitting pretty at number 176.
Australia's lone entry on the list is the SGI Altix 3700 system installed at the Australian Partnership for Advanced Computing (APAC) facility at the Australian National University in September 2005.
While the APAC partnership - involving the Queensland Cyber Infrastructure Foundation, the ac3 group in NSW, TPAC in Tasmania, iVEC in WA, SAPAC in South Australia, VPAC in Victoria, the CSIRO and ANU - is currently "transitioning to new arrangements" under the National Collaborative Research Infrastructure Strategy (NCRIS), it has been responsible for fostering interstate collaboration not always seen in this country.
Bill Trestrail, SGI's vice president for the Asia Pacific, says the supercomputing community in Australia is starting to see the fruits of a more integrated approach to research crossing state boundaries. He points to the example of the collaboration between the Howard Florey Institute, Flinders University and the University of Queensland, which is analysing MRI data to understand the correlations between brain structures and disease.
Another is the successful joint bid for funding by Queensland University of Technology and Central Queensland University to build a hybrid supercomputing, cluster and storage infrastructure last year. This project will serve a wide cross-section of research areas, including bioengineering, computational modelling, visualisation, chemistry, bioinformatics and engineering.
Most of these systems run programs like BLAST, for comparative genomic analysis; ClustalW, for multiple sequence alignment of DNA or proteins; and HMMER, a software suite for making and using Hidden Markov Models (HMMs), statistical models used for prediction.
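As a rough illustration of how these tools are typically strung together, the Python sketch below simply shells out to each program in turn. The file names, database choice and options shown are assumptions for illustration only, not a description of any particular site's pipeline; the tools themselves (BLAST+, ClustalW and HMMER) must already be installed.

    # Illustrative pipeline step: run each tool against made-up input files.
    import subprocess

    # BLAST: compare query sequences against a protein database
    subprocess.run(["blastp", "-query", "queries.fasta", "-db", "nr",
                    "-out", "blast_hits.txt"], check=True)

    # ClustalW: multiple sequence alignment of the same input
    subprocess.run(["clustalw2", "-INFILE=queries.fasta", "-ALIGN"], check=True)

    # HMMER: search a profile hidden Markov model against a sequence database
    subprocess.run(["hmmsearch", "--tblout", "hmm_hits.tbl",
                    "model.hmm", "sequences.fasta"], check=True)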
One of APAC's key programs has been the construction of a grid-based approach to solving biological problems using separate high performance computing systems. For example, the iVEC and SAPAC facilities are working together on data from two genomes: the rice genome and Stagonospora nodorum, a fungus that causes leaf blotch in wheat.
"Essentially APAC wants to put in a grid infrastructure to manage applications that can run across the different types of computers we have across the country and to know what data is out there, annotate the data and manage the data," Trestrail says.
"What they are trying to put in place are the management systems to support these large datasets. Across the country there is a major push through to coordinate the management of the data assets, where IP allows it. There's a lot of work going on to try to provide an integrated science infrastructure which is not only about devices - such as microarrays and the like - but the data itself, which is the outputs of the algorithms, the inputs of the algorithms and the algorithms themselves."
SGI's Altix technology is an example of what is called global shared memory, or GSM, computing. James Lowey, director of high performance biocomputing at the US Translational Genomics Research Institute (TGen) in Arizona, says GSM machines allow organisations using massive datasets to run multiple threads simultaneously.
TGen has just purchased a new Altix 4700 64-bit system, which boasts over half a terabyte of shared memory. It will allow the institute's researchers to search across multiple datasets without having to break up problems into smaller parts, as is often the case for older technology like Beowulf clusters.
"From a computer geek side of things, having the SGI system allows us to greatly increase what is called IPC, or interprocess communications, which is one of the shortfalls of the traditional Beowulf architecture," Lowey says.
"There's a considerable amount of latency in passing information between nodes. So when you break a problem into 64 different chunks it might need chunk one to talk to chunk 64, and the amount of time it takes to do that is prohibitive. With this SGI machine, instead of having to worry about those processes being taken far apart, we have them actually processing simultaneously on the same memory space."
TGen's senior scientific programmer, Dr Waibhav Tembe, says there are two prime examples of how high performance computing is allowing the life sciences to venture into brave new territory.
"Right now there is tremendous interest in studying the genomic variations known as SNPs (single nucleotide polymorphisms) - not independently but as a combination," Tembe says. "How do different SNPs in combination act in disease or non-disease conditions? This requires heavy computation, and there are programs specially written for such combinatorial genomic analysis. The demand for data structures and all of the input/output elements, can't be met with the conventional 32-bit machines.
"This is a classic case where 64-bit machines, such as the one we have from SGI, has helped us immensely in that we can actually analyse all 23 chromosomes simultaneously. The work that would take a couple of months can be finished in just three or four days.
"The other example is on comparing genomic sequences. There are a number of large databases now that store DNA and protein sequences. Many projects at TGen require homology searches, comparing say a few thousand proteins' sequences. Before we had to split the jobs up, but we can now load them up just directly into memory and perform all the computation. It now takes very little time."
While the new machine may have over half a terabyte of RAM, that might yet prove too little, Lowey says. "[Tembe] here has already managed to run it out of memory. You can build it big but it only took him three months to find a problem that was bigger."