My Framework For Helping Startups Build And Deploy Data Science

I help startups go from “product” to “product+machine learning”.

This is my framework for achieving that, including advice, caveats and examples at each stage.

While every company, problem and dataset is different, there’s always a lot in common.

This framework revolves around building a proof-of-concept ASAP, then incrementally improving it. This reflects my experience in ML: you don’t know if something will work until you try it.

I mostly work on natural language processing, but this framework applies equally to images and numeric data.

Start with a problem or data

Companies with ML-potential fall into two buckets:

1. Start with a problem (to solve with data)
2. Start with data (to extract value from)

Anecdotally, tech companies fall into #1 and non-tech companies fall into #2.

Starting with a problem

You have a problem that ML may be able to solve.

Example: A startup wants to recommend which vegetables can be grown, given geography and environmental conditions. They don’t have data related to this space.

The first step is brainstorming what data is required.

Data can then be acquired via strategic partnerships, web scraping or open data sets.

Starting with data

You own data (and likely a functioning business) and want to derive additional value from that data.

Example: A uniform manufacturer owns granular movement data about each of its sales reps.

The first step is brainstorming potential use-cases for the data.

In this example, it could be detecting which salespeople are the least efficient in navigating their territory, so they can be proactively encouraged to improve.

Starting with data (and an existing business) is great because marginal improvements are valuable. Starting with a problem only makes sense if solving it is aspirational and game-changing.

Data exploration

Understanding your data is important, but often gets too much attention in data science projects.

Data exploration helps find gaps between what you think data looks like, and what it actually looks like. This is not the place to show off your data visualization skills.

Investigate data types, distributions of values, and whether anything is missing or dirty.
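Here is a minimal sketch of what that check can look like, using only the Python standard library. The sample records, the column names and the list of "missing" markers are all made up for illustration, and the rule that a column counts as numeric when at least half of its values parse as numbers is an arbitrary assumption. In practice a dataframe library would give you the same summary in fewer lines.

```python
from collections import Counter
from statistics import mean, median

# Assumed placeholder strings that should count as "missing".
MISSING_MARKERS = {"", "na", "n/a", "null", "none", "nan"}


def is_missing(value):
    """Treat None and common placeholder strings as missing."""
    return value is None or str(value).strip().lower() in MISSING_MARKERS


def to_number(value):
    """Return the value as a float, or None if it does not parse."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def profile(rows):
    """Summarize every column found in a list of dict records."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if not is_missing(v)]
        numbers = [n for n in map(to_number, present) if n is not None]
        summary = {
            "missing": len(values) - len(present),
            "types": Counter(type(v).__name__ for v in present),
            "non_numeric": len(present) - len(numbers),
        }
        if numbers and len(numbers) >= len(present) / 2:
            # Mostly numeric column: describe the distribution.
            summary.update(
                min=min(numbers),
                max=max(numbers),
                mean=mean(numbers),
                median=median(numbers),
            )
        else:
            # Mostly text column: show the most frequent normalized values.
            normalized = Counter(str(v).strip().lower() for v in present)
            summary["top_values"] = normalized.most_common(3)
        report[col] = summary
    return report


# Hypothetical sales-rep records with typical problems baked in:
# a placeholder string, a None, a typo ("12o") and inconsistent names.
rows = [
    {"rep": "Ana", "km_driven": "120.5", "visits": 8},
    {"rep": "ben", "km_driven": "N/A", "visits": 6},
    {"rep": "Ana ", "km_driven": "98", "visits": None},
    {"rep": "Cy", "km_driven": "12o", "visits": 7},
]

report = profile(rows)
for column, summary in report.items():
    print(column, summary)
```

The useful part is the gap it exposes: a column you assumed was numeric turns out to be stored as strings with a typo in it, and a name column turns out to need normalizing before you can group by it.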

List out popular solutions

Whether you plan on coding a solution from scratch, or using an API on AWS, make a list of potential algorithms, libraries and APIs.

Example: Your product detects whether an image contains a poisonous mushroom.

List out how this might be solved. Options include fastai, Keras, scikit-learn, Amazon Rekognition and some other niche classification providers.

This is high level and not intended to be exhaustive. Simply find the popular options you can run with.
