Disclaimer

The content of this article is intended for an academic audience. I make no guarantee, nor do I accept any liability whatsoever. You as the reader proceed entirely at your own risk in exploring or otherwise making use of the material contained herein. It is your responsibility to abide by all applicable privacy laws for your district, state, province, country and/or jurisdiction, specifically in regard to any and all material mentioned or referred to in this article. Consider yourself warned.

Introduction

In this article I will discuss a publicly available Electronic Health Record (EHR) dataset. Ever the optimist, my hope is that one day de-identified EHRs will be widely available. Researchers equipped with this information should be in a better position to properly test their healthcare applications. One potential application is training a non-linear programming model to help predict patient outcomes given certain treatments and procedures. Another is improving interoperability and information exchange both within and among healthcare facilities. A third is benchmarking various backends. Last but not least, the use of Big Data should help inform operational and managerial improvements, including better care and service delivery models. The result: more and more people receiving optimum healthcare!

The landscape of publicly available EHRs at the time of this writing is, to put it bluntly, sparse. I mean, really sparse. Many websites appear helpful at first glance, to the point of offering a plethora of reports and various other research tools. Unfortunately, most of the underlying data used to generate these products is highly obfuscated. What you see is comparable to the tips of so many icebergs! Getting to the source data is next to impossible unless you have privileged access as a member of a partnering institution or government agency.

Despite some initial setbacks, I did not give up on my journey to find a public dataset of EHRs. My criteria for this gathering exercise consisted of the following:

  • Privacy. The data must be properly de-identified to protect the privacy of the patients and their families. Apart from the obvious choice of doing nothing, simulated data is a decent alternative to help alleviate privacy concerns. See this link for simulated EHRs. Unfortunately, simulated data may not provide enough real-world fidelity, relevance, noisiness or any of scores of other qualities, depending on the researcher’s intended application. A third option is de-identified patient records, whereby a healthcare facility or group has agreed to release records after they have passed the privacy ‘sniff test’. In these instances, it is prudent for the researcher to have such a decision in hand to accompany their work. This can include an approved record of discussion, reference number or other form of auditable evidence of the data’s de-identification and releasability from a privacy standpoint. See the above disclaimer. You would be wise to tread very cautiously, as the penalties in your country for breaching the privacy of patient health records can be severe.

#healthcare #data-science #postgresql #openehr #electronic-health-record

Exploration into Open Electronic Health Records