Resistance to computation is futile


By Andrew Lonie*
Wednesday, 22 May, 2013



Associate Professor Andrew Lonie outlines the importance of bioinformatics to today’s research labs and showcases one researcher who talks about how it has changed the way she works.

Everyone will have heard of the ‘junk DNA is not junk after all’ story published in Nature in September 2012. Known as ENCODE, this worldwide effort to profile all the different elements of the human genome has created even more opportunities for people with computer programming and mathematical skills.

It will also increase the level of frustration felt by people in large research projects when progress depends on opening the high-end computation ‘door’. Demand for computational modelling, image analysis and bioinformatics means those with these skills will be in business for quite a while.

Those of us working at the coalface of bioinformatics and computational imaging often find ourselves needing to manage the expectations of our collaborators. Understanding the data generated by the new technologies requires complex and often experimental analysis techniques combined with computational grunt. This means we need to talk the language of computers to researchers who sometimes only speak biology and convey the realities of analysing massive sets of error-prone data using approaches that are still developing.

Occasionally we’ll find that the data doesn’t exactly match the outcome that a collaborator is after, which is why we like to get in early to help with experimental planning, if we can. One thing that is certain is that bioinformatics and computational biology are part of the research, not a post-experiment add-on, and the best analyses require lots of interaction between us all. And as a researcher, you really need to understand what is being done to your data.

So what is the best way to proceed? At the VLSCI Life Sciences Computation Centre we have found the right way is to collaborate with groups who already understand what the work involves and are prepared to partner with us to achieve the best results. Their data is of higher quality and they appreciate the nature of our work; they also understand that subscribing to a pool of experts who will share their knowledge and networks to build the researchers' own skill sets produces quality research and clinical outcomes in this relatively new field.

Those productive researchers are the ones who have taken it upon themselves to learn these new skills. One such researcher is Dr Victoria Perreau, a senior researcher in the Centre for Neuroscience Research at the University of Melbourne. She works on projects related to multiple sclerosis with Professor Trevor Kilpatrick, Division Head of MS at the Florey Institute of Neuroscience and Mental Health. She recently reflected on her own experience (see below).

With the ENCODE announcement we can see the future and it is not slowing down. Those who are not already thinking this way have a steep learning curve ahead of them. There is no avoiding getting some computer knowledge in the new age of biology.

*Associate Professor Andrew Lonie is a computer scientist and bioinformatician, and is head of the Victorian Life Sciences Computation Initiative (VLSCI) Life Sciences Computation Centre (LSCC) at the University of Melbourne. His background is in genetics, molecular biology, information systems and computer science and he uses all of these disciplines daily, analysing and visualising very complex datasets generated by high-throughput genomic technologies.

Victoria’s story

I was originally trained as a molecular biologist and have developed expertise in microarray and pathway analysis over the last eight to 10 years. I managed to avoid doing any command-line analysis by sticking to software packages that let me do comprehensive analysis without any programming knowledge. I just didn't have the time to retrain.

However, over the last two to three years I realised that the advent of next-generation sequencing would have many repercussions, including eventually making microarray technology obsolete. It would also force analysts to use high-performance computing, because the average dataset could no longer be handled by a desktop computer running off-the-shelf software.

Therefore, it became clear that if I wanted to continue a career path in research in front-line technologies applied to neuroscience, particularly expression analysis and genomics, I would need to learn enough to be able to send jobs to high-end computers using the Unix command line and also have an understanding of programming.

I am a firm believer that there is a lot of expression and genomic data out on the web that can be utilised for hypothesis testing and development in an inexpensive and efficient way. I have already identified some published RNA-seq data that I wish to analyse from a novel perspective, to identify different splice variants. But first I need to build up the required skill set to do it efficiently and with best practice.

Towards the end of 2011, I made contact with Andrew Lonie at the Victorian Life Sciences Computation Initiative (VLSCI) and attended a one-day RNA-seq workshop at the Life Science Computation Centre (LSCC). This gave me the confidence that I could achieve my goals, a basic understanding of the processes involved and start-up access to the VLSCI.

In semester one this year I took an accredited undergraduate subject, ‘Introduction to computing’. In addition to basic principles of managing computational data, I learned how to write programs in the Python programming language. I am now doing some other online courses. I also attended some introductory Unix workshops run by VLSCI and have been helped to troubleshoot some problems in my work by Dr Bernard Pope at VLSCI.

I am also teaching myself how to use the University of California Santa Cruz (UCSC) Table Browser and the Galaxy project website to retrieve genomic data, and to filter and manage this data for use in my analyses. I am already writing my own short Python programs to analyse ChIP-seq data on transcription factor binding sites for a colleague, producing graphs that answer questions about his data he could not otherwise ask.
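To give a flavour of the kind of short Python program described here, the sketch below tallies transcription factor binding-site peaks per chromosome from ChIP-seq peak calls in BED format. It is a minimal, hypothetical illustration; the function name and sample data are assumptions for demonstration, not the actual analysis code.

```python
from collections import Counter

def count_sites_per_chrom(bed_lines):
    """Count binding-site peaks per chromosome from BED-format lines.

    BED is tab-separated: chromosome, start, end, then optional fields.
    """
    counts = Counter()
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and common BED header/comment lines
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom = line.split("\t")[0]
        counts[chrom] += 1
    return dict(counts)

# Illustrative peak calls (made-up coordinates)
peaks = """chr1\t100\t250\tpeak1
chr1\t900\t1100\tpeak2
chr2\t400\t600\tpeak3
""".splitlines()

print(count_sites_per_chrom(peaks))  # {'chr1': 2, 'chr2': 1}
```

In practice the same counts could feed directly into a plotting library to produce the kind of summary graphs mentioned above.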

It is very clear to me now that, as a biologist, I have to be prepared to put in the hard work to simply learn (and practise) the basics to improve my capacity to communicate with the computer scientists who are trying to help me.

Dr Victoria Perreau is a bioinformatician and Group Leader of the Bioinformatics and Gene Expression Analysis Group, Centre for Neuroscience Research, Department of Anatomy and Neuroscience at the University of Melbourne. She has a PhD and postdoctoral training in molecular biology, and many years of experience at the wet bench investigating RNA expression in many aspects of health and disease in the CNS.

Image caption: One of the IBM Blue Gene/Q supercomputers installed at the VLSCI facility in Parkville, Melbourne. (Image: VLSCI)
