Time to confront genomics data problem, say scientists

13 July 2015
Appeared in BioNews 810

Scientists have warned that the world of genomics is headed for a data bottleneck.

The team of maths and computer specialists found that the data created by genomic studies will soon overtake that generated by social media giants such as YouTube and Twitter. Even astronomy, a high-tech field hungry for processing power, does not currently generate as much data as genomics, they report in PLoS Biology.

YouTube, the current leader in the field of data generation, has 100 petabytes (100 million gigabytes) of video uploaded to its servers every year - over a thousand times what the average home computer could store. By comparison, genomics is currently generating 25 petabytes a year but the rate at which the data is produced is doubling every seven months, mostly due to the refinement and falling costs of sequencing techniques.
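The figures above imply a simple growth calculation, sketched here in Python. It is only a rough illustration under two simplifying assumptions that do not come from the researchers: that the seven-month doubling time stays constant, and that YouTube's upload rate stays flat at 100 petabytes a year. The function names are hypothetical.

```python
import math

# Figures quoted in the article.
GENOMICS_PB_PER_YEAR = 25.0   # current genomics output, petabytes per year
YOUTUBE_PB_PER_YEAR = 100.0   # current YouTube uploads, petabytes per year
DOUBLING_MONTHS = 7.0         # genomics output doubles every seven months


def rate_after(months, start=GENOMICS_PB_PER_YEAR, doubling=DOUBLING_MONTHS):
    """Projected output (PB/year) after `months`, assuming steady doubling."""
    return start * 2 ** (months / doubling)


def months_to_reach(target, start=GENOMICS_PB_PER_YEAR, doubling=DOUBLING_MONTHS):
    """Months until output reaches `target` PB/year, assuming steady doubling."""
    return doubling * math.log2(target / start)


if __name__ == "__main__":
    # Assumption: YouTube's rate is held constant, which is a simplification.
    months = months_to_reach(YOUTUBE_PB_PER_YEAR)
    print(f"Genomics matches YouTube's current rate after {months:.1f} months")
    print(f"Projected genomics rate at that point: {rate_after(months):.0f} PB/year")
```

Going from 25 to 100 petabytes a year takes two doublings, so on these assumptions genomics would match YouTube's current rate in about 14 months. In practice YouTube's output is also growing, so the real crossover would come later.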

'As genome-sequencing technologies improve and costs drop, we are expecting an explosion of genome sequencing that will cause a huge flood of data,' said Professor Gene Robinson, director of the Carl R Woese Institute for Genomic Biology at the University of Illinois.

By 2025 it is estimated that up to two billion people will have had their genomes sequenced, meaning the volume of genomic data could reach the exabyte scale, or billions of gigabytes. The problem posed by this influx is not just how to store the data but also how to acquire, distribute and analyse it. The researchers say that all four of these challenges must be tackled if we are to solve the 'genomics data problem'.

Professor Robinson said, 'Genomics will soon pose some of the most severe computational challenges that we have ever experienced.

'If genomics is to realise the promise of having a transformative positive impact on medicine, agriculture, energy production and our understanding of life itself, there must be dramatic innovations in computing. Now is the time to start.'

According to an editorial appearing this week in Nature, perhaps one such innovation could be a more collaborative use of cloud storage.

An international group of prominent researchers, headed by Dr Lincoln Stein, put out a call to the community to collectively fund a cloud computing network that would take the strain from private networks of individual institutions. The group argues that the challenge of accessing large datasets is blocking scientists' progress, particularly when it comes to building on or replicating previous work.

They propose that funding bodies should pay for large genomic datasets to be stored and accessed in the cloud, so that researchers can save time and money by not having to download the data or process it on local computers.

'We have now reached a stage where these data sets are too large to move around - cloud computing offers us the flexibility to hold the data in one virtual location and unleash the world's researchers on it all together,' said co-author Dr Peter Campbell, head of cancer genomics at the Wellcome Trust Sanger Institute.

Big Data: Astronomical or Genomical?
PLOS Biology |  7 July 2015
Big Data researchers call for support for more accessible and more effective storage of data in the cloud
Ontario Institute for Cancer Research |  9 July 2015
Biggest beast in big data forest? One field's astonishing growth is 'genomical!'
Eurekalert (press release) |  7 July 2015
Data analysis: Create a cloud commons
Nature |  9 July 2015
DNA storage concerns for computers
Yahoo News (PA) |  7 July 2015
Genome researchers raise alarm over big data
Nature News |  7 July 2015
Sequencing the genome creates so much data we don’t know what to do with it
Washington Post |  7 July 2015