Your Data Processing is Your Data Science Brand

What could be a personal brand for a Data Scientist? Is it the projects they have done? The personality they show on social media? Or the company they work for? In my opinion, any of these could become your brand, depending on how you present it. However, I want to share my opinion on another aspect of a Data Scientist's personal brand: Data Processing.

What exactly is Data Processing? It is the stage where we take the raw data we have collected or acquired and transform it into “cleaner” data that is ready to use for any purpose (analysis, visualization, description, etc.). Essentially, Data Processing is the pillar that supports any decent result or product you get out of your dataset.

How could Data Processing become your personal brand? In this article, I will show you in detail why Data Processing is your own personal brand as a Data Scientist. Let’s get into it.

Data Processing is all about asking the right question

You have already collected or acquired the dataset in some way; what is next? How you process the data ultimately depends on what kind of question you want to answer, and that requires you to compose the right question.

It might seem like a simple task. It is only asking a question; what is so hard about it? Nothing could go wrong, right? Yet the same question can have a different meaning depending on the context, and composing a good question requires both business and technical understanding.

Imagine that your company has a fraud problem and asks your data science team to do something about it by developing a machine learning model to detect potential fraud. Sounds easy? You only need to pull the data from a database, process the data, develop the model, and BOOM! You have the machine learning model. Then the business user asks: does the model consider case “XYZ”? Is the fraud being predicted of type “ABC”? Have you already considered those cases?

This is a problem that happens often: what you develop or process does not fit the problem it is supposed to solve, or it misses some important detail. Without the right information, the question you ask changes, and it leads you to the wrong data processing.
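
To make this concrete, here is a minimal sketch of how two different questions turn the same raw data into two different processed datasets. Everything in it is an assumption for illustration only: the `transactions` records, the field names (`account`, `amount`, `is_fraud`), and the two helper functions are hypothetical, not taken from a real project.

```python
from collections import defaultdict

# Hypothetical raw transactions; every field name and value is made up.
transactions = [
    {"account": "A", "amount": 20.0, "is_fraud": False},
    {"account": "A", "amount": 950.0, "is_fraud": True},
    {"account": "B", "amount": 35.0, "is_fraud": False},
    {"account": "B", "amount": 40.0, "is_fraud": False},
]


def per_transaction_view(rows):
    """Question 1: 'Is this single transaction fraudulent?'

    Keeps one row per transaction.
    """
    return [{"amount": r["amount"], "label": r["is_fraud"]} for r in rows]


def per_account_view(rows):
    """Question 2: 'Has this account been used for fraud?'

    Aggregates the transactions into one row per account.
    """
    grouped = defaultdict(list)
    for r in rows:
        grouped[r["account"]].append(r)
    return [
        {
            "account": account,
            "n_transactions": len(items),
            "max_amount": max(i["amount"] for i in items),
            "label": any(i["is_fraud"] for i in items),
        }
        for account, items in sorted(grouped.items())
    ]


transaction_rows = per_transaction_view(transactions)
account_rows = per_account_view(transactions)
print(len(transaction_rows), "transaction-level rows")
print(len(account_rows), "account-level rows")
```

Neither view is wrong by itself. Which one is right depends on the question the business user actually has, and that is why the question has to be settled before the processing.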

What is unique to each person is how they address the “right question” problem. It is often a matter of discussing with the business users what they want, but sometimes the users do not even know what they want until they get the product. In this case, you need to use your creativity to compose the right question based on the information you have and the research you do. The data processing depends on all of this activity, so the creative process is your brand and something you should show.

