Comment: Cloud empowers scientists to accelerate innovation
Thursday, 02 February, 2012
Contributed by Shane Owenby – Managing Director, Asia Pacific, Amazon Web Services
It’s the year 2000 and a research group at a biotechnology company is discussing an edgy and ambitious whole-genome RNA sequencing project. But a project of this magnitude will produce terabytes of data and demand vast computing power, applied in an unpredictable number of permutations and combinations.
It’s clear the technology firepower needed to get this project off the ground is beyond the group’s budgetary means. Progress stalls. Scientific thresholds remain just that – thresholds.
The scenario in 2011 is very different. IT procurement, costs and lengthy technical reviews once cast long shadows over research groups, but cloud computing has virtually removed resource constraints as a barrier to progress.
Using cloud computing, rather than buying, owning and maintaining their own data centres or servers, organisations purchase infrastructure resources such as computing power and storage services from third-party cloud providers on an as-needed basis.
Database, messaging infrastructure and content distribution services can also live in the cloud. The provider manages and maintains the entire infrastructure in a secure environment and users interact with their resources via the internet. Capacity can grow or shrink instantly and users pay for what they actually use.
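To make that concrete: assuming an AWS account and the boto3 Python library (the image ID and instance type below are placeholders for illustration, not recommendations), acquiring and releasing a compute node on demand is a handful of API calls rather than a procurement cycle.

```python
import boto3

# Assumes AWS credentials are already configured on the local machine.
ec2 = boto3.resource("ec2", region_name="us-east-1")

# Launch a single on-demand instance; the image ID and instance type
# here are placeholders chosen purely for illustration.
instance = ec2.create_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
)[0]
instance.wait_until_running()
print("Analysis node running:", instance.id)

# ... run the analysis workload ...

# Terminate the instance when the job finishes, so billing stops.
instance.terminate()
instance.wait_until_terminated()
```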
With a cloud computing strategy in hand, the biotechnology company’s research group, stymied in 2000, could likely fund its whole project out of a discretionary budget. It can access on-demand technology infrastructure, including scalable storage, elastic compute power and dynamic analysis platforms, with no upfront cost or negotiations.
And so they begin by sending samples to a sequencing service provider, which ships the results to a secure cloud environment. The storage needed for the results is available on demand with a pay-as-you-go pricing model, meaning the researchers pay nothing until the first byte is written and nothing after the final file is removed.
The collaborators get straight to work performing large-scale, distributed computations. Sharing results becomes as easy as sending an email. Their research program, at last, can run free of cost and infrastructure constraints. The thresholds can now be crossed.
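As a minimal sketch of that storage-and-sharing step, assuming the boto3 library and a bucket the group controls (the bucket and file names here are invented for illustration): the results are uploaded to object storage, and a time-limited link is generated that can simply be pasted into an email to a collaborator.

```python
import boto3

s3 = boto3.client("s3")
bucket = "example-genomics-results"  # hypothetical bucket name

# Upload a batch of sequencing results; storage is billed only for
# the bytes actually stored.
s3.upload_file("sample_001.fastq.gz", bucket, "run42/sample_001.fastq.gz")

# Create a time-limited download link that can be shared by email.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": bucket, "Key": "run42/sample_001.fastq.gz"},
    ExpiresIn=7 * 24 * 3600,  # link valid for one week
)
print(url)
```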
The availability of cloud computing has clearly changed the way many organisations acquire IT, but for scientists the changes run far deeper. Researchers in industry and academia are using ever-increasing amounts of computing power for molecular simulation, virtual screening and DNA and protein sequence analysis, generating vast quantities of data.
In the past, organisations had only two options: spend big on expensive, purpose-built cluster resources and data management systems with high associated management costs; or use shared infrastructure, often at supercomputing centres, and be forced to wait for access, often for weeks or even months.
Both scenarios channelled the precious resources of time and money away from the real task at hand: research. Additionally, the progression and scale of scientific research and development, and the demands it places on resources, are often unpredictable.
In six months, a project’s technology requirements may have changed three or four times (or more). The affordability, scalability and accessibility of cloud computing are invaluable.
Collaboration through the cloud
The 1000 Genomes Project is the largest study of genetic differences between people to date. The project offers a comprehensive resource on human genetic variation and involves participants from Europe, North America, South America and Asia. To further innovation, all participants are able to share data and analysis in real time through the cloud. This means scientists with less advanced computers and infrastructure have the same access to the raw data as those with supercomputer technology.
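For instance, the project’s raw data has been published as a public dataset in Amazon S3 (the bucket name 1000genomes below reflects that public dataset; the rest of the snippet is an illustrative sketch). With anonymous, unsigned requests, any researcher with an internet connection can browse the same files, with no account or local cluster required.

```python
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Unsigned (anonymous) access: no AWS account or credentials needed
# to read a public dataset.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

# List a few objects from the public 1000 Genomes bucket.
response = s3.list_objects_v2(Bucket="1000genomes", MaxKeys=5)
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```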
Another life sciences organisation taking advantage of cloud computing is the European Bioinformatics Institute (EBI). One of the largest projects currently underway at the EBI is the genome browser, Ensembl.
Ensembl is a central tool in bioinformatics research worldwide, but for a globally distributed team, latency can become an issue. By moving the service to the cloud, the EBI has reduced the latency its US collaborators experience when accessing Ensembl. This is making the large amounts of information hosted in the genome databases more readily available to researchers around the world.
The cloud provides an affordable model for global collaboration and it is this sharing of information and collaborative working models that enhance the pace of scientific progress. More time is being spent making discoveries and less on accessing information.
Scalable possibilities
The benefits of the on-demand nature and scalability of cloud computing are currently being enjoyed by Cambridge-based Eagle Genomics, a bioinformatics services and software company specialising in genome content management.
Eagle Genomics stores and analyses large quantities of genomic data for its customers. Recent projects have included biomarker discovery, microarray probe mapping and genome assembly from next-generation sequencing data.
At the heart of Eagle’s analysis projects lies an adapted version of the eHive workflow management system. Eagle’s modifications enable eHive to scale automatically, spinning resources up and down in response to demand.
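The eHive code itself is not reproduced here, but the underlying idea can be sketched as a simple control loop: compare the amount of queued work with the number of running workers, then start or stop cloud instances to close the gap. Everything in the sketch below (the job queue, the worker limit, the scaling actions) is hypothetical and stands in for real launch and terminate calls.

```python
import time

MAX_WORKERS = 20  # hypothetical cap on concurrent cloud workers

def pending_jobs():
    """Stand-in for querying the workflow database for unclaimed jobs."""
    return 12  # pretend 12 analysis jobs are waiting

running_workers = 0
while True:
    demand = pending_jobs()
    if demand > running_workers and running_workers < MAX_WORKERS:
        running_workers += 1  # a real system would launch an instance here
        print("scaling up, workers now:", running_workers)
    elif demand == 0 and running_workers > 0:
        running_workers -= 1  # shut idle instances down so billing stops
        print("scaling down, workers now:", running_workers)
    else:
        break  # demand is met; in practice the loop would keep polling
    time.sleep(1)
```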
Such automatic scaling is something Eagle could only do cost-effectively with its technology infrastructure in the cloud, avoiding the expense of purchasing and maintaining high-performance computing (HPC) hardware in-house and the waste of underutilised resources.
The life sciences’ adoption of cloud technology is increasing rapidly. Economics, a desire to foster more collaboration and the need for faster innovation cycles are leading the life science industry to a new world where scientists have instant access to vast, scalable resources.
In the next few years, third-generation sequencing, massive metagenomics sequencing projects and the increased availability of molecular diagnostics will produce unprecedented amounts of data at relatively low cost. Cloud computing will play a crucial role in providing the technology infrastructure that will drive the data-driven future of life science.
The cloud – A growing part of the future
IT consolidation is on the rise. Driven by a need to optimise expenses and gain efficiencies, the biomedical and pharmaceutical industry is consolidating IT to focus on core expertise and reduce capital expenditure.
This includes IT infrastructure, which most do not see as a competitive advantage. As organisations grow and work is distributed to scientists across the globe, technology infrastructure running in the cloud will improve efficiency and utilisation in tandem with growth.
Agility is becoming necessary. It can take organisations months to procure, provision and make dedicated hardware resources available to users. That can feel like years in the fast-moving scientific world and seriously hamper innovation.
IT managers and CIOs have discovered that with the cloud’s ability to rapidly provision resources, scientists can do their job with minimal resource contention. Organisations get to say ‘yes’ to more projects.
New methods lead to new collaborations. Scientific projects require collaboration, increasingly so as scientists start investigating biology at a systems level and collaborating with experts in specialised research functions.
This has led to more distributed partnerships, both public-private and collaborations between academic institutions and companies. The availability of shared data spaces with easy access to on-demand computing resources is creating an ecosystem for data sharing and analysis that is encouraging a healthy trend in scientific collaboration.
Scientific practices are evolving. From its early days, cloud computing has enabled new business models. Many start-ups have flourished because access to cloud services empowered them to create innovative solutions that take advantage of massively distributed architectures without having to invest the capital to build resources.
Life sciences are following a similar trend with more start-ups emerging to provide analysis and data support roles. Instrument and service providers are also leveraging the cloud to distribute data and provide on-demand access to computing pipelines.
Computing paradigms have shifted. Large-scale modelling and simulation and data analysis challenge existing infrastructure and workflow methodologies. Data-intensive workloads require parallel frameworks that are ideally built on top of commodity hardware.
Frameworks such as Hadoop are becoming part of the solution to difficult computing challenges and are now tuned to run successfully in the cloud. The availability of dynamic cluster computing resources in the cloud has multiplied the capabilities researchers can access to solve scientific problems on a vast scale.
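As a small, assumed illustration of that data-parallel style (not any specific published pipeline), a Hadoop Streaming-style job can be written as two short Python stages that read from standard input; here they count 8-base k-mers across a set of sequence reads.

```python
#!/usr/bin/env python
# Minimal Hadoop Streaming-style job: count 8-base k-mers in sequence lines.
# Local test: cat reads.txt | python kmer_count.py map | sort | python kmer_count.py reduce
import sys

K = 8  # k-mer length, an illustrative choice

def mapper():
    # Emit each k-mer in each read with a count of 1.
    for line in sys.stdin:
        seq = line.strip().upper()
        for i in range(len(seq) - K + 1):
            print("%s\t1" % seq[i:i + K])

def reducer():
    # Input arrives sorted by key, so counts for a k-mer are contiguous.
    current, count = None, 0
    for line in sys.stdin:
        kmer, value = line.rstrip("\n").split("\t")
        if kmer == current:
            count += int(value)
        else:
            if current is not None:
                print("%s\t%d" % (current, count))
            current, count = kmer, int(value)
    if current is not None:
        print("%s\t%d" % (current, count))

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```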