The problem of gene names being silently converted by Excel in scientific publications is even greater than previously assumed, according to a team led by Mandhri Abeysooriya of Deakin University in Australia. Almost one in three scientific publications with an Excel list of genes in the supplement contained such errors; an earlier estimate put the figure at about 20 percent. Although the problem has been known for years, there has been no improvement, the researchers warn. Only a year ago, the Human Genome Organization committee responsible for naming human genes changed dozens of names to remedy the situation.
Mark Ziemann, who is also involved in the current study, had already drawn attention to the problem five years ago. The issue is that Microsoft’s spreadsheet program Excel automatically and silently converts certain alphanumeric gene names into dates. After Microsoft did not react and no other solution emerged, the HUGO Gene Nomenclature Committee (HGNC) officially renamed several dozen genes last year. Since then, the MARCH1 gene has been called MARCHF1 (“Membrane associated ring-CH-type finger 1”), and SEPT1 has become SEPTIN1 (“Septin 1”). In an English-language Excel spreadsheet, the old names turned into “1-Mar” and “1-Sep”. In German versions, the behavior can be reproduced with “MÄRZ1”.
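The kind of name that is at risk can be illustrated with a short Python sketch. It is only an approximation: the list of month-like prefixes and the pattern are assumptions made for illustration, not Excel's actual conversion rules, and the function names are hypothetical.

```python
import re

# Month-like prefixes that an English-language spreadsheet may read as a month.
# Assumption for illustration only; Excel's real rules are not reproduced here.
MONTH_LIKE = (
    "JAN", "FEB", "MAR", "MARCH", "APR", "MAY", "JUN",
    "JUL", "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC",
)

# A month-like prefix, an optional hyphen, then a one- or two-digit number.
_PATTERN = re.compile(
    r"^(?:%s)-?(\d{1,2})$" % "|".join(MONTH_LIKE), re.IGNORECASE
)


def looks_like_date(symbol: str) -> bool:
    """Return True if a gene symbol resembles a 'month + day' date."""
    match = _PATTERN.match(symbol.strip())
    return bool(match) and 1 <= int(match.group(1)) <= 31


def flag_risky(symbols):
    """Return the symbols from a list that resemble dates."""
    return [s for s in symbols if looks_like_date(s)]


risky = flag_risky(["MARCH1", "MARCHF1", "SEPT1", "SEPTIN1", "TP53"])
print(risky)
```

In this sketch the old names MARCH1 and SEPT1 are flagged, while the renamed symbols MARCHF1 and SEPTIN1 are not, because the letters between the month-like prefix and the number break the date pattern.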
To quantify whether awareness of the problem has reduced the number of errors, Abeysooriya and her colleagues, including Ziemann, analyzed more than 11,000 scientific publications on genetics topics with Excel attachments. These appeared in scientific journals between 2016 and 2020, they explain. According to the analysis, almost one in three contained such errors; the earlier analysis in 2016 had found an error rate of around 20 percent. The team acknowledges that the renaming should have reduced the problem in the meantime. It will not make the problem go away, however, in part because the renaming covered only genes of humans, mice and rats. Genes from other animals could still trigger such conversions. In addition, possible problems in Excel tables in other languages were not addressed.
The research team does not absolve the software maker of responsibility, but it does not expect a reaction from Microsoft either. Instead, it gives researchers recommendations for possible countermeasures. Excel is not intended for this kind of work anyway; scripted analyses in Python or R, for example, would be more suitable. That means learning a programming language, but the effort pays off in the long term. If a spreadsheet really must be used, the team recommends LibreOffice, as the problem does not occur there. And those who really cannot do without Excel have to be particularly careful when importing the data.
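What a scripted approach might look like can be sketched in a few lines of Python. This is a minimal example under stated assumptions, not the procedure used by the research team: the sample data, the column names "gene" and "score", and the function names are hypothetical. It relies on the standard library's csv module, which returns every field as a string and therefore does not reinterpret gene symbols, and it also flags entries that look as if they were already turned into dates before export.

```python
import csv
import io
import re

# Entries such as "1-Mar" or "1-Sep" suggest that a symbol was already
# converted to a date before the table was exported.
DATE_ARTIFACT = re.compile(
    r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$",
    re.IGNORECASE,
)


def read_gene_table(handle):
    """Read a tab-separated table; every field stays a plain string."""
    return list(csv.DictReader(handle, delimiter="\t"))


def find_date_artifacts(rows, column="gene"):
    """Return the entries in one column that look like converted dates."""
    return [row[column] for row in rows if DATE_ARTIFACT.match(row[column])]


# Hypothetical sample data standing in for a supplementary gene list.
SAMPLE = "gene\tscore\nMARCHF1\t0.8\nSEPT1\t1.2\n1-Mar\t0.5\n"

rows = read_gene_table(io.StringIO(SAMPLE))
symbols = [row["gene"] for row in rows]
artifacts = find_date_artifacts(rows)
print(symbols)
print(artifacts)
```

Here SEPT1 is read back unchanged, while the entry "1-Mar" is reported as a likely conversion artifact that would need to be checked against the original data.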