I love making spreadsheets. I like lining up little columns of numbers and writing formulas to do things to them. It’s halfway between coding and note-taking. I have sheets for accounts (obviously) but also for projects, holidays, and hobbies. There’s one for the contents of my loft. My New Year’s resolutions? They’re in a spreadsheet. Often, when I start thinking about something, I automatically open a sheet and structure my thoughts into rows and columns. If all you have is a spreadsheet, everything looks like a cell (to misquote Abraham Maslow).

Use Excel for any length of time and you become familiar with its foibles. Type in a phone number and, if you’re unlucky, it’ll turn it into something like 8E+09. In the best-case scenario, you’ll just lose the leading 0. Sometimes numbers get turned into dates. Sometimes dates get turned into numbers. I’ve got used to seeing #N/A.

These things are annoying, but you get used to them. If you’re a geneticist, though, problems like these plague your whole field. Typing most gene names into Excel isn’t a problem. “Myosin regulatory light chain interacting protein” (shortened to MYLIP) is fine, but type in “Membrane-associated ring-CH-type finger 1” (shortened to MARCH1) and Excel recognizes it as a date and “helpfully” converts it to March 1, 2020.

This tickles me. It’s the sort of weird edge case I find amusing. When the first Excel software engineer wrote the feature to scan text and convert certain values to dates, who would have thought that one day it would mess up scientific research documents? I also have a sense of relief that I’m not the only one who has to battle Excel. But this gene-name mangling is more than an amusing quirk; it is a surprisingly big issue. “A programmatic scan of leading genomics journals reveals that approximately one-fifth of papers with supplementary Excel gene lists contain erroneous gene name conversions,” scientists wrote in a study four years ago. Indeed, they have been writing about the issues Excel causes them since 2004. This delightfully quirky oddity has been messing up genomics journals for the better part of two decades.
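To give a feel for what a programmatic check for this problem might look like, here is a minimal Python sketch that flags gene symbols a spreadsheet could mistake for dates. It is not the method the study used, and the month list and pattern are my own assumptions rather than Excel’s actual parsing rules; the names `MONTHS`, `looks_like_date`, and `flag_risky_symbols` are made up for illustration.

```python
import re

# Month names and abbreviations that a spreadsheet may read as the start of a date.
# Assumption for illustration only: Excel's real date parsing is more involved.
MONTHS = (
    "JAN", "FEB", "MAR", "MARCH", "APR", "APRIL", "MAY", "JUN", "JUNE",
    "JUL", "JULY", "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC",
)

# A month token, an optional hyphen or space, then a one- or two-digit number.
DATE_LIKE = re.compile(
    r"^(?:" + "|".join(MONTHS) + r")[-\s]?\d{1,2}$",
    re.IGNORECASE,
)


def looks_like_date(symbol: str) -> bool:
    """Return True if the gene symbol has a month-plus-number shape."""
    return DATE_LIKE.match(symbol.strip()) is not None


def flag_risky_symbols(symbols):
    """Return the symbols that look date-like, in their original order."""
    return [s for s in symbols if looks_like_date(s)]


if __name__ == "__main__":
    print(flag_risky_symbols(["MYLIP", "MARCH1", "SEPT1", "MARCHF1", "SEPTIN1"]))
```

Running a gene list through something like this before pasting it into a sheet would at least tell you which entries to format as text first.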

That was until a few weeks ago, when the HUGO Gene Nomenclature Committee (HGNC) decided to rename the problematic genes so that they no longer get converted into dates in Excel. MARCH1 becomes MARCHF1, SEPT1 becomes SEPTIN1, and so on. Put another way: geneticists got so annoyed at Excel messing up their data that they changed the official scientific names to make them more Excel-friendly.
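If you have an old gene list lying around, updating it could be as simple as a lookup table. The sketch below is an assumption-laden illustration: it contains only the two renames mentioned above, and `RENAMED` and `update_symbols` are hypothetical names, not part of any HGNC tool. A real list would need the committee’s full set of changes.

```python
# Old symbol -> new symbol. Only the two renames named in this post are listed;
# the complete mapping would have to come from HGNC itself.
RENAMED = {
    "MARCH1": "MARCHF1",
    "SEPT1": "SEPTIN1",
}


def update_symbols(symbols):
    """Replace retired symbols with their new names; leave the rest untouched."""
    return [RENAMED.get(s, s) for s in symbols]


if __name__ == "__main__":
    print(update_symbols(["MARCH1", "MYLIP", "SEPT1"]))
```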

Excel Kept Messing Up the Names of Genes, So Scientists Renamed Them