The Data Scientist’s Guide To Reproducible Research

Reproducibility is an essential element of the scientific method that is often overlooked. However minor it might seem at a given time, it is one of the most important factors affecting research in any applied science. Data science makes this especially clear, because overly generalized data can lead to misleading conclusions and false theories that do more harm than good.

It doesn’t matter how ground-breaking your discovery might be; you could discover that wearing polyester directly correlates with developing terminal illness, or even find the cure for cancer or the COVID-19 vaccine.

The fact of the matter is that if you design an experiment that is inherently not reproducible, you are destroying the credibility of the observations obtained from that experiment. Fortunately, with the tremendous technology that scientists (and even more so, data scientists) have at their fingertips, making your research reproducible and known is easier than it has ever been.

Notebook Etiquette

Jupyter Notebook is most likely one of the greatest pieces of software contributing to the ease of reproducible computing. For those who are not already using it, Jupyter Notebook lets you connect from your web browser to a server that runs a kernel and executes code one cell at a time. Not only is this incredibly convenient for large groups of scientific peers who work on projects together, it also makes a valiant stride towards presenting thoughts and ideas in controlled bursts of code for other scientists to review.
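As a minimal sketch of what a reproducibility-minded first cell might look like, the snippet below records the interpreter details and draws random numbers from a seeded generator, so that a reviewer rerunning the notebook gets the same values. The names `SEED`, `describe_environment`, and `draw_sample` are hypothetical and chosen only for illustration; the sketch assumes nothing beyond the Python standard library.

```python
import platform
import random
import sys

SEED = 42  # hypothetical value; any fixed integer works


def describe_environment():
    """Return the interpreter details a reader needs to rerun the notebook."""
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": sys.platform,
    }


def draw_sample(seed, n=5):
    """Draw n pseudo-random numbers from a generator seeded with `seed`."""
    rng = random.Random(seed)  # a local generator avoids hidden global state
    return [rng.random() for _ in range(n)]


environment = describe_environment()
first_run = draw_sample(SEED)
second_run = draw_sample(SEED)

print(environment)
print(first_run == second_run)  # the same seed yields the same sample
```

Using a local `random.Random` instance rather than the module-level functions is one possible design choice: it keeps the result independent of the order in which other cells were executed.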

Let’s not forget that another incredibly important step in the scientific process is peer review. Humans are by nature prone to bias. Where there is data, there is unfortunately also the potential to skew that data to conform to biases a scientist may have, whether or not the scientist is aware of it. Let’s be honest, we’ve all performed a test we were super excited about and had to do an about-face and…

Accept the null.

I can certainly understand why accepting the null can be discouraging. While it is true that accepting the null (strictly speaking, failing to reject it) means that your scientific idea did not pass your test, the result still pushes science forward, because now we know that on at least one occasion the data did not support the idea. So while it can be hard not to skew research, whichever direction your research goes, the result is certainly something to be proud of.
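To make the idea concrete, here is a rough sketch of a two-sided permutation test, one of several ways such a test could be run and not necessarily the one you would use in practice. The function name `permutation_p_value`, the 0.05 threshold, and the two small groups of numbers are all made up for illustration; they are not real measurements.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null, group labels are interchangeable, so reshuffle them.
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up illustrative numbers, not real data.
control = [5.1, 4.9, 5.3, 5.0, 5.2]
treatment = [5.0, 5.2, 4.8, 5.1, 5.3]

ALPHA = 0.05  # assumed significance threshold
p_value = permutation_p_value(control, treatment)
if p_value < ALPHA:
    print(f"p = {p_value:.3f}: reject the null")
else:
    print(f"p = {p_value:.3f}: fail to reject the null")
```

Because the seed is fixed, anyone who reruns the cell on the same data should see the same p-value, which makes the unexciting result just as checkable as an exciting one.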

towardsdatascience.com