If you google “data science use cases”, you will find hundreds of lists, each item starting with a buzzword such as fraud detection, recommendation system, or some fancier term. A short paragraph follows, attempting to explain it in 200 words, barely enough to string this buzzword together with other buzzwords such as AI, data science, machine learning, and deep learning, all spiced up with superlatives. Anyway, data science (or AI, or machine learning, or deep learning) should make things better, or else, what’s the point?
Looking at those lists, a data scientist (DS) trying to make sense of what actually needs to be done could easily be puzzled. They are inspiration at best, not recipes, and they contain no know-how. As (good) DS, we should be able to identify a potential use case, find and master the tools to solve it, and deliver impact together with our business colleagues. Let’s start with the identification; here is how you can do it well.
There are at least two prerequisites for a data science use case: one is data (unsurprisingly), and the other is a business decision that leads to an action (referred to below as a BDA). Data science provides tools to examine the mechanisms underlying the data, so that they can be leveraged to make better business decisions. Schematically, we can design an exercise to find the best data science use cases in four steps:
Step 1: understand the context
Click here to see the full deck
This process could take a while, since it’s not always easy to know what is going on in other parts of the company. However, it is crucial to have a good understanding of the mechanisms and rationale that drive a BDA; otherwise we risk missing the real problem to solve and ending up with a perfect but useless model. So, search for documentation and presentations and, most importantly, talk to people.
Step 2: narrow down the focus
It’s time to evaluate and select the best BDAs for your data science project. Try to answer the following questions for each of them:
For the ideal BDAs, you should be able to answer “yes” to all these questions. These are the ones that can benefit the most from data science, and that’s why we should focus on them.
Step 3: high-level data evaluation
Data quality is crucial to the feasibility of a data science use case. An early evaluation can help us prioritise and choose the right method and technology. To get a high-level understanding of the data quality, we can consider the following aspects:
Click here to see the full deck
In the best scenario, the data source is reliable and is used as a main input to the decision making. The data size can point us to the right tools (e.g. a distributed system should be considered when the data is large) and often to the right methods (e.g. for a small data set, simple algorithms such as linear regression or traditional statistical methods are preferred, to avoid overfitting).
Step 4: wrap up and conclude
By comparing the results from steps 2 and 3, we should be able to select, or even rank, the BDAs along two dimensions, impact and feasibility:
Click here to see the full deck
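To make the ranking concrete, here is a minimal sketch in Python. It assumes that each BDA has been given a 1 to 5 score for impact (from the step 2 questions) and for feasibility (from the step 3 data evaluation). The scale, the threshold, the product used for ordering, and the example BDAs are all hypothetical choices made for illustration, not something prescribed by the process itself.

```python
from dataclasses import dataclass


@dataclass
class BDA:
    name: str
    impact: int       # hypothetical 1-5 score derived from the step 2 questions
    feasibility: int  # hypothetical 1-5 score derived from the step 3 data evaluation


def rank_bdas(bdas, min_score=3):
    """Keep the BDAs that clear the threshold on both dimensions, best first."""
    shortlisted = [
        b for b in bdas
        if b.impact >= min_score and b.feasibility >= min_score
    ]
    # Ordering by the product rewards BDAs that do well on both dimensions.
    return sorted(shortlisted, key=lambda b: b.impact * b.feasibility, reverse=True)


# Made-up BDAs, for illustration only.
candidates = [
    BDA("weekly replenishment order", impact=5, feasibility=4),
    BDA("promotion budget allocation", impact=4, feasibility=2),
    BDA("customer call-back priority", impact=3, feasibility=5),
]

ranked = rank_bdas(candidates)
for b in ranked:
    print(b.name, b.impact * b.feasibility)
```

In this toy example the promotion budget BDA drops out because its data is not good enough yet, however attractive the impact looks. Any other way of combining the two dimensions (a weighted sum, a simple 2x2 grid) would do just as well; what matters is that both are looked at together.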
Use cases can then be defined by grouping together closely linked BDAs with overlapping data requirements and key stakeholders. Finally, to find out where to start and where to finish, we again look at the impact and feasibility of each use case, but now with an additional dimension: dependency. Needless to say, the use cases on which others depend should be started early. And now, you have a solid roadmap for your data science use case(s)!
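If you want to see how the dependency dimension changes the order, the sketch below is one possible way to do it, using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The use case names, their scores and the `depends_on` mapping are invented for the example, and letting a prerequisite inherit the best score among the use cases it unlocks is just one reasonable heuristic, assumed here for illustration.

```python
from graphlib import TopologicalSorter


def build_roadmap(scores, depends_on):
    """Order use cases so that prerequisites come first, then by priority.

    scores:     use case -> combined impact/feasibility score (hypothetical)
    depends_on: use case -> list of use cases it needs first; every name
                must also appear in `scores`
    """
    sorter = TopologicalSorter({uc: depends_on.get(uc, ()) for uc in scores})
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies

    # Invert the mapping: which use cases directly depend on each one?
    dependents = {uc: [] for uc in scores}
    for uc, prerequisites in depends_on.items():
        for p in prerequisites:
            dependents[p].append(uc)

    def priority(uc):
        # A prerequisite inherits the best score among everything it unlocks.
        return max([scores[uc]] + [priority(d) for d in dependents[uc]])

    roadmap, ready = [], []
    while sorter.is_active():
        ready.extend(sorter.get_ready())      # use cases whose prerequisites are done
        ready.sort(key=priority, reverse=True)
        chosen = ready.pop(0)
        roadmap.append(chosen)
        sorter.done(chosen)                   # may unlock further use cases
    return roadmap


# Made-up use cases and scores, for illustration only.
scores = {
    "customer data foundation": 6,
    "churn prevention": 20,
    "demand forecasting": 15,
    "pricing optimisation": 12,
}
depends_on = {
    "churn prevention": ["customer data foundation"],
    "pricing optimisation": ["demand forecasting"],
}

roadmap = build_roadmap(scores, depends_on)
print(roadmap)
```

Here the customer data foundation has the lowest score of its own, yet it comes first on the roadmap because the highest-scoring use case cannot start without it.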