One-fifth of genomics papers flawed due to Microsoft Excel

5 September 2016
Appeared in BioNews 867

Around one-fifth of scientific papers involving genomic data contain errors caused by the default settings in Microsoft Excel, according to a study.

Under its default settings, the spreadsheet software Excel is known to convert some gene symbols into dates and numbers. For example, MARCH1, the symbol for Membrane-Associated Ring Finger, is automatically converted to the date 01-03-2016, and SEPT2 is changed to 'September 2'.
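
Which symbols are at risk can be approximated with a simple pattern check. The Python sketch below is a heuristic under stated assumptions, not Excel's actual parsing logic: the month list, the 1 to 31 day range and the function names are illustrative choices, and real behaviour varies with the program, version and locale. The number check targets identifiers such as 2310009E13, which a spreadsheet may read as a number in scientific notation.

```python
import re

# Month prefixes a spreadsheet's date parser may recognise (an assumed list for
# illustration; real behaviour depends on the program, version and locale).
MONTH_PREFIXES = ["MARCH", "SEPT", "JAN", "FEB", "MAR", "APR", "MAY", "JUN",
                  "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

# Month-like prefix immediately followed by a one- or two-digit number,
# e.g. MARCH1, SEPT2, DEC1.
_DATE_LIKE = re.compile(r"^(" + "|".join(MONTH_PREFIXES) + r")(\d{1,2})$", re.IGNORECASE)

# Digits, the letter E, then digits, e.g. 2310009E13, which can be read
# as a number in scientific notation.
_NUMBER_LIKE = re.compile(r"^\d+E\d+$", re.IGNORECASE)


def at_risk_of_date_conversion(symbol: str) -> bool:
    """True if the symbol looks like month + day-of-month and may become a date."""
    match = _DATE_LIKE.match(symbol.strip())
    return match is not None and 1 <= int(match.group(2)) <= 31


def at_risk_of_number_conversion(symbol: str) -> bool:
    """True if the symbol looks like a number written in scientific notation."""
    return _NUMBER_LIKE.match(symbol.strip()) is not None


for s in ["MARCH1", "SEPT2", "TP53", "2310009E13"]:
    print(s, at_risk_of_date_conversion(s), at_risk_of_number_conversion(s))
```

A pattern check like this only flags candidates; it cannot say how a particular spreadsheet will treat a given cell.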

After scanning 3597 scientific papers published between 2005 and 2015 in 18 genomics journals, researchers at the Baker IDI Heart and Diabetes Institute in Melbourne, Australia, identified 704 papers whose supplementary data sheets contained gene name errors caused by Excel conversions.
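
A reviewer or author could screen a supplementary gene column for values that already look converted. The article does not describe the researchers' actual screening method, so this Python sketch is only an assumed approach: the regular expressions (targeting day-month dates such as 1-Mar, numeric dates, and scientific notation) and the helper names are hypothetical. The last helper reproduces the headline figure: 704 of 3597 papers is about 19.6 percent, or roughly one-fifth.

```python
import re

# Assumed, non-exhaustive shapes of gene symbols after spreadsheet conversion:
#   "1-Mar", "2-Sep"            day-month dates
#   "01-03-2016", "2016/09/02"  numeric dates
#   "2.31E+19"                  identifiers turned into scientific notation
_CONVERTED_PATTERNS = [
    re.compile(r"^\d{1,2}-(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)$", re.IGNORECASE),
    re.compile(r"^\d{1,4}[-/]\d{1,2}[-/]\d{1,4}$"),
    re.compile(r"^\d(\.\d+)?E\+\d+$", re.IGNORECASE),
]


def flag_converted_cells(gene_column):
    """Return (row_index, value) pairs whose value looks like a converted gene symbol."""
    flagged = []
    for i, value in enumerate(gene_column):
        text = str(value).strip()
        if any(p.match(text) for p in _CONVERTED_PATTERNS):
            flagged.append((i, text))
    return flagged


def percent_affected(n_affected, n_total):
    """Percentage of screened papers with at least one flagged cell."""
    return 100.0 * n_affected / n_total


print(flag_converted_cells(["TP53", "1-Mar", "BRCA1", "2-Sep", "2.31E+19"]))
print(round(percent_affected(704, 3597), 1))
```

Matching on shape rather than on a list of known symbols keeps the check simple, at the cost of possible false positives in columns that legitimately hold dates.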

Professor Assam El-Osta, lead author of the paper, explained: 'The errors were found specifically on the supplemental data sheets of academic studies'. He added that supplementary pages contain 'important supporting data, rich with information', and that resolving these errors could be 'time-consuming'.

Notably, leading journals such as Nucleic Acids Research, Nature Genetics and Genome Biology (the journal in which the study was published) had the highest proportion of affected papers, at more than 20 percent. In contrast, fewer than 10 percent of papers in journals including Molecular Biology and Evolution, Bioinformatics, DNA Research, and Genome Biology and Evolution were affected.

Gene renaming errors were first reported a decade ago, but the problem has not been resolved. The authors report that such errors have increased at an annual rate of 15 percent over the past five years, a trend that calls into question the thoroughness of peer review.
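
To put the reported growth rate in perspective, the arithmetic below assumes the 15 percent annual increase compounds from year to year; the article does not say exactly how the rate was calculated. Under that assumption, five years of growth roughly doubles the number of errors.

```python
def growth_factor(annual_rate: float, years: int) -> float:
    """Overall multiplier after compounding annual_rate for the given number of years."""
    return (1.0 + annual_rate) ** years


# 15 percent a year over five years, assuming compound growth
print(round(growth_factor(0.15, 5), 2))  # → 2.01
```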

Speaking to BBC News, Dr Ewan Birney, director of the European Bioinformatics Institute, said: 'What frustrates me is researchers are relying on Excel spreadsheets for clinical trials', adding that the gene renaming issue has been known to the scientific community since 2004. He recommended that the program be considered only for 'lightweight scientific analysis'.

The study reports that gene renaming errors also affected other spreadsheet programs, including LibreOffice Calc and Apache OpenOffice Calc, but apparently not Google Sheets. The researchers conclude that, for now, such errors can be avoided if reviewers, editors and authors remain vigilant.

A spokesperson for Microsoft Excel told BBC News: 'Excel offers a wide range of options, which customers with specific needs can use to change the way their data is represented.'

An alarming number of scientific papers contain Excel errors
The Washington Post | 26 August 2016
Didn't Quite Mean 1-Mar
GenomeWeb | 25 August 2016
Gene name errors are widespread in the scientific literature
Genome Biology | 23 August 2016
Microsoft Excel blamed for gene study errors
BBC News | 25 August 2016
Years of genomics research is riddled with errors thanks to a bunch of botched Excel spreadsheets
Quartz | 28 August 2016