Missing Data

If you open an Excel file and scroll to the bottom (tip: Ctrl + Down Arrow), you’ll find that the sheet ends at row 1,048,576, which is 2^20. Not a single row more.

This limitation was blamed by Public Health England for the loss of 15,841 positive COVID test results, as reported by The Guardian¹. Given the estimate that each non-complex case has 3 close contacts, this means at least 47,000 potentially infectious people were not informed, not required to self-isolate, and possibly never traced. The figure could rise beyond 50,000, as it was reported² that a minority of the 15,841 missing positive cases were complex, i.e. from environments such as hospitals, prisons, and homeless shelters, where a case has 7 close contacts on average.
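The arithmetic behind these estimates can be checked with a short sketch. The case count and the contacts-per-case figures come from the reports cited above; the function and variable names are illustrative, and the number of complex cases is not given in the reports, so the code only works out how many would be needed to pass 50,000.

```python
# Figures quoted in the article.
MISSING_CASES = 15_841
CONTACTS_NON_COMPLEX = 3   # close contacts per non-complex case
CONTACTS_COMPLEX = 7       # close contacts per complex case


def estimated_contacts(complex_cases: int) -> int:
    """Total close contacts if `complex_cases` of the missing cases were complex."""
    non_complex_cases = MISSING_CASES - complex_cases
    return (non_complex_cases * CONTACTS_NON_COMPLEX
            + complex_cases * CONTACTS_COMPLEX)


# Lower bound: treat every missing case as non-complex.
lower_bound = estimated_contacts(0)

# Each complex case adds 7 - 3 = 4 contacts on top of the lower bound.
# Smallest number of complex cases that pushes the total beyond 50,000:
extra_per_complex = CONTACTS_COMPLEX - CONTACTS_NON_COMPLEX
min_complex = (50_000 - lower_bound) // extra_per_complex + 1

print(f"Lower bound: {lower_bound:,}")
print(f"Complex cases needed to exceed 50,000: {min_complex:,} "
      f"({min_complex / MISSING_CASES:.1%} of missing cases)")
```

Under these assumptions the lower bound is 47,523 contacts, and roughly 4% of the missing cases being complex would be enough to take the total past 50,000.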

How It Happened

Public Health England (PHE) collects results from public and private labs. As reported by The Guardian, one lab sent its daily test report to PHE as a CSV file. PHE then consolidated that lab's results in Excel, alongside the results received from other labs. From there, the organization reported the official figures and followed up on tracing and informing individual positive cases.
Yet it seems the data for the lab in question already ran to more than 1 million rows in PHE’s spreadsheet. Adding yet another daily batch pushed it past the limit, so some results were truncated, and hence never reported or acted upon.
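The failure described above is a silent truncation: rows past the sheet limit simply vanish. As a rough illustration only (PHE's actual pipeline is not public, and the function names here are hypothetical), a guard that checks the remaining capacity before appending a batch could look like this:

```python
EXCEL_MAX_ROWS = 1_048_576  # row limit of a single Excel worksheet


def split_batch(rows_already_used: int, batch: list) -> tuple[list, list]:
    """Split a batch into the rows that still fit in the sheet and the rest."""
    free_rows = max(EXCEL_MAX_ROWS - rows_already_used, 0)
    return batch[:free_rows], batch[free_rows:]


def append_batch(rows_already_used: int, batch: list) -> int:
    """Return the new row count, or fail loudly instead of dropping rows."""
    kept, dropped = split_batch(rows_already_used, batch)
    if dropped:
        raise OverflowError(
            f"{len(dropped)} of {len(batch)} rows would not fit in the sheet"
        )
    return rows_already_used + len(kept)


# Example: a sheet two rows short of the limit receives a three-row batch.
kept, dropped = split_batch(EXCEL_MAX_ROWS - 2, ["r1", "r2", "r3"])
print(len(kept), "rows fit;", len(dropped), "row would be lost")
```

The point of the sketch is the design choice rather than the code itself: raising an error when a batch does not fit makes the overflow visible on the day it happens, instead of being discovered later as missing results.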

