I love participating in code challenges. There are tons of ’em online, each with its own strengths and weaknesses. Most of them focus on data structures and judge solutions on Big O for speed and memory. However, my main occupation is data science and AI, not software engineering.

Sure, there’s Kaggle, and there are data and problems to be found everywhere, but none of that provides the quick “1-question challenge” format I love so much about the software engineering challenges.

Anyway, I decided to make something myself.

Data playground

So here’s the idea: a 1-button dataset generator that comes with a question, e.g. what is the top predictor in X for a given y?

I slapped some code together, which you can find here: data-playground. Below is a piece of code that captures the idea:

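import numpy as np

def generate_data(size=100, n_vars=5):
    # start from an empty table: n_vars - 1 feature columns, with y as the last column
    data = np.zeros((size, n_vars))

    # insights (add_insights is assumed here to also return the answer)
    data, answer = add_insights(data)

    # return as X, y
    X = data[:, :-1]
    y = data[:, -1]

    return X, y, answer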
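
The snippet above leans on an add_insights helper that isn't shown. As a rough idea of what such a helper could do, here is a minimal sketch. It is not the actual data-playground implementation: the body, the seed parameter, and the choice to hand back the answer alongside the data are all assumptions made for illustration.

```python
import numpy as np

def add_insights(data, seed=None):
    """Hypothetical stand-in: fill the features with noise and tie y to one of them."""
    rng = np.random.default_rng(seed)
    size, n_vars = data.shape
    n_features = n_vars - 1  # the last column is reserved for y

    data = data.copy()
    # every feature starts out as independent standard-normal noise
    data[:, :n_features] = rng.normal(size=(size, n_features))

    # pick one feature at random to be the "insight" (the answer to the challenge)
    answer = int(rng.integers(n_features))

    # y follows the chosen feature closely, plus a little noise
    data[:, -1] = 3.0 * data[:, answer] + rng.normal(scale=0.1, size=size)
    return data, answer
```

With something like this in place, the answer is the column index of the feature that y was built from, so a guess can be checked with a simple comparison.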

Test run

Let’s see if we can solve a challenge we made for ourselves. Let’s answer the question: Which feature in X is the strongest predictor for y?

To get data:

import data_playground
X, y, answer = data_playground.generate_data()

Let’s have a look at the data we got, using the pandas boxplot method:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(X)
df['y'] = y
fig = plt.figure(figsize=(16,10)) # show large in Jupyter lab
df.boxplot()
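
A boxplot shows how each column is distributed, but not how it relates to y. One simple way to actually take a stab at the question is to rank the features by their absolute Pearson correlation with y. The sketch below is just one possible approach, not part of data-playground. The strongest_predictor name is made up for this example, and the demo data is generated on the spot so the snippet runs on its own. Keep in mind that correlation only picks up linear relationships.

```python
import numpy as np
import pandas as pd

def strongest_predictor(X, y):
    """Return the column index of the feature with the largest |Pearson correlation| to y."""
    df = pd.DataFrame(X)
    # correlate every feature column with y and ignore the sign
    corr = df.corrwith(pd.Series(y)).abs()
    return int(corr.idxmax())

# stand-in data for the demo: y is mostly driven by feature 2
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))
y_demo = 2.0 * X_demo[:, 2] + rng.normal(scale=0.1, size=100)

guess = strongest_predictor(X_demo, y_demo)
print("strongest predictor:", guess)
```

On a real challenge you would pass in the X and y from generate_data and compare the guess with the returned answer.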
