Data is Always Dirty


Data is Always Dirty. Its ability to tell a story is only as good as your ability to clean it.

One of the best lessons a Statistics 101 class can teach you is that your story is only as good as your data. The more complex your problem, the more time you will likely spend cleaning your data before you do anything else interesting with it. Some projects may require as much as 90% of your time to be spent on cleaning data, and if you don't enjoy this work, or at least accept it, the job can become agonizingly painful. Know thy data, then clean thy data, then and only then analyze thy data!
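To make the "know, clean, analyze" order concrete, here is a minimal Python sketch that uses only the standard library. The records, the field names (`id`, `visit_date`, `value`) and the cleaning rules are hypothetical and chosen for illustration. They are not the data or rules from the project described here. Real cleaning decisions, such as whether to drop or repair a row, depend on the project.

```python
from collections import Counter
from datetime import date

# Hypothetical raw records. The field names and the messiness below are
# illustrative assumptions, not real project data.
RAW_RECORDS = [
    {"id": "A1", "visit_date": "2020-01-15", "value": "12"},
    {"id": "A1", "visit_date": "2020-01-15", "value": "12"},   # exact duplicate
    {"id": "B2", "visit_date": "2020/02/03", "value": " 7 "},  # odd date format, padded value
    {"id": "C3", "visit_date": "", "value": "5"},              # missing date
    {"id": "D4", "visit_date": "2020-04-20", "value": "n/a"},  # non-numeric value
    {"id": "E5", "visit_date": "2020-05-02", "value": "9"},
]


def profile(records):
    """Step 1, know thy data: count the problems before changing anything."""
    keys = [tuple(sorted(r.items())) for r in records]
    return {
        "rows": len(records),
        "exact_duplicates": len(keys) - len(set(keys)),
        "missing_dates": sum(1 for r in records if not r["visit_date"].strip()),
        "non_numeric_values": sum(
            1 for r in records if not r["value"].strip().isdigit()
        ),
    }


def parse_date(text):
    """Accept YYYY-MM-DD or YYYY/MM/DD; return None for anything unparseable."""
    text = text.strip().replace("/", "-")
    try:
        return date.fromisoformat(text)
    except ValueError:
        return None


def clean(records):
    """Step 2, clean thy data: dedupe, normalize, and drop unusable rows."""
    seen = set()
    cleaned = []
    for r in records:
        key = tuple(sorted(r.items()))
        if key in seen:
            continue  # skip exact duplicates
        seen.add(key)
        parsed = parse_date(r["visit_date"])
        value = r["value"].strip()
        if parsed is None or not value.isdigit():
            continue  # drop rows we cannot repair (a per-project judgment call)
        cleaned.append({"id": r["id"], "visit_date": parsed, "value": int(value)})
    return cleaned


def quarterly_totals(records):
    """Step 3, analyze thy data: sum values per calendar quarter."""
    totals = Counter()
    for r in records:
        d = r["visit_date"]
        totals[f"{d.year}-Q{(d.month - 1) // 3 + 1}"] += r["value"]
    return dict(totals)


if __name__ == "__main__":
    print(profile(RAW_RECORDS))
    print(quarterly_totals(clean(RAW_RECORDS)))
```

Profiling first tells you, for example, that one row in six is a duplicate before any totals are computed. Silently dropping unrepairable rows is only one option. In practice you might instead flag them and send them back to the data owner.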

With the growth of big data, the sky is the limit for how dirty data can be, and those who embrace this fact are well suited to handle any type of data that crosses their path. On one public health project I worked on, one of my responsibilities was to update a program that generated quarterly reports on a specific public health metric. The data for the report came from multiple stakeholders. One stakeholder in particular was sitting on a data treasure trove, but their department was not well equipped to handle these kinds of projects, and the data was dirty. Really dirty.

Not surprisingly, compiling the reports for this public health metric regularly ran into dilemmas that stemmed directly from the data. People in data-driven fields such as data science have multifaceted jobs, but a simple interpretation breaks their role into two primary objectives: the first is to organize and analyze data to tell a coherent story (the reporting aspect), and the second is to understand the techniques for handling data (the methods and programming aspect). The problem is that these two can be at odds with each other, depending on the goals of the project or initiative.

This project lends itself perfectly to interview questions such as “Name a difficult project you worked on that caused you frustration, and describe how you handled each aspect of it.” For me, the goal coming in was to adjust something that had already been written, so other people on the team were more familiar with the project. To account for this, I focused first on the technical issues that could be addressed and tried to layer in input from each team member according to the aspects of the project they excelled at or were particularly interested in. The idea was that no single person (myself included) could lose sight of the overall objective or of their specific tasks, because everyone had an important role. This worked well, until it didn't. People who already have a stake in a project because they have worked on it bring their inherent biases to the table from the start. This revealed itself when we hit the next (inevitable) roadblock, right after a few seemingly quick wins had boosted morale.
