Or how I learned to stop worrying and love the journey.


Once there were two sons of two entrepreneurs. They both decided to follow their fathers’ examples and build businesses out of ideas and relationships. One became a salesman by selling products; the other became a salesman by learning data science. I don’t know which was wiser, but I know which was surprised when he realized he was a salesman. The harder he looked for meaning in data, the more important it became to speak clearly and simply.


I was asked a question on Quora:

When did you have to analyse data and give recommendations based on your result?

Someone wants to know about the workflow when doing data work. No? So I tell her my version. I don’t know if my version is a better version than all the others, but I know it is a good version. It works for me. I learn from data. That’s the point, isn’t it? I sometimes convince people to move forward. That’s my job, to move people forward for good reason. Making a recommendation is easier than making a difference with a recommendation.

How do I do it? I provide at least one data answer every day, though that pace slows down when I’m doing deeper work. Since I’m currently working at a SaaS startup, my answers lead us to better relationships with customers and better use of our time and resources. A lot of the work is exploratory analysis: figuring out a generally accurate but imprecise answer.

Proportions

It’s important to know that a lot of my day leads up to the big moments. Most of it is setting up the problem, and sometimes I learn enough to leave a problem alone before I do anything fancy.

About 60–70% of my time is spent on data engineering: finding, cataloging, securing, indexing, or storing data. I work with streams, spreadsheets, and databases.

Around 20–30% of my time is spent on data analysis: exploring, visualizing, and summarizing data. Often I create baseline models to get the general shape of the data, the relationships between factors, and an idea of where the trouble is: incomplete data, outliers, or other problems.
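Here is a rough idea of what that first look can be, as a minimal sketch rather than my actual tooling. It assumes a single numeric column with `None` for missing records; the `summarize` helper and the `logins` numbers are made up for illustration, and the 1.5 × IQR cutoff is just a common rule of thumb.

```python
from statistics import mean, quantiles


def summarize(values):
    """Quick first look at one numeric column: gaps, outliers, and a mean baseline."""
    # Separate observed values from missing ones (None marks a gap).
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)

    # Quartiles of the observed values, then the usual 1.5 * IQR fences.
    q1, _, q3 = quantiles(present, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    outliers = [v for v in present if v < low or v > high]

    # The simplest possible baseline model: predict the mean every time.
    return {"missing": missing, "outliers": outliers, "baseline": mean(present)}


# Hypothetical weekly logins per customer; None marks a missing record.
logins = [12, 15, 11, None, 14, 13, 90, 12, None, 16]
report = summarize(logins)
print(report)
```

In this made-up sample, one extreme value drags the mean well above what most customers look like. That is the kind of thing a quick, imprecise pass is meant to surface before any modeling starts.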

The rest, between 0 and 20% of my time, is spent on machine learning. Often baseline models are good enough to make a decision. If I do some machine learning, a lot of it is supervised learning, especially working with categorical data. I rarely process natural language or images. I even more rarely build reinforcement learning models. That’s the mix on this project. Other projects call for other types of work.

The important thing is the proportion of the work. Set the expectation that you’re working with data, and you won’t be disappointed. Set the purpose to learn from data, and you’ll often be satisfied with accurate but imprecise answers. You have to learn to be satisfied with imprecise answers, because that’s the pace of life. People are waiting to use what you have. If they’re not, either you don’t really know anything at all, or you haven’t built a working relationship with decision makers yet.

Examples

A lot of companies break this work up differently.

Uber has tried pairing data engineers with machine learning experts. More commonly, they use machine learning experts like consultants, temporarily lending them to a project. The team monitors the machine learning models over time so they can get the experts back when it’s time to upgrade their work.

Google uses pragmatism to engineer great products first (PDF), depending on machine learning only when required. They depend heavily on strong data pipelines and containerized services. They have been organizing around containers since before they were cool.

Microsoft seems to be winning at data work by building strong development stacks, the way they’ve always done. Microsoft knows how to build a consistent and complete environment better than any group I’ve seen during my decades in technology.

I work in startups, so I am typically more of a generalist than you’d see in larger organizations. I’ll get involved in a project and usually only handle some tools or models before it’s time to move on. Small organizations are tactical, with a sometimes brutal frugality.


This Is How You Build a Data Workflow