I love taking part in coding challenges. There are tons of ’em online, each with its own strengths and weaknesses. Most of them focus on data structures and judge solutions by Big O for speed and memory. My main occupation, however, is data science and AI, not software engineering.
Sure, there’s Kaggle, and data and problems can be found everywhere, but none of that offers the quick “1-question challenge” format I love so much about the software engineering challenges.
Anyway, I decided to make something myself.
So here’s the idea: a 1-button dataset generator that comes with a question, e.g. “What is the top predictor in X for the given y?”
I slapped some code together, which you can find here: data-playground. Below is a piece of code that captures the idea:
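import numpy as np

def generate_data(size=100, n_vars=5):
    # start from an empty array; the last column becomes y
    data = np.zeros((size, n_vars))
    # insights (add_insights is defined elsewhere and not shown here;
    # it is assumed to return the modified data plus the answer)
    data, answer = add_insights(data)
    # return as X, y
    X = data[:, :-1]
    y = data[:, -1]
    return X, y, answer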
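The snippet above does not show what add_insights does. As a rough illustration only, here is one way such a helper could work: fill the array with noise, pick one feature at random, and make y depend mostly on it. The name add_insights_sketch, the seed parameter, and the 3.0 / 0.5 signal and noise levels are all assumptions made for this sketch, not the code from the repo.

```python
import numpy as np

def add_insights_sketch(data, seed=None):
    """Hypothetical stand-in for add_insights: returns (data, answer)."""
    rng = np.random.default_rng(seed)
    size, n_vars = data.shape
    n_features = n_vars - 1  # the last column is reserved for y

    # Fill every column with independent standard-normal noise.
    data = rng.normal(size=(size, n_vars))

    # Pick the feature that will drive y; its index is the "answer".
    answer = int(rng.integers(n_features))

    # y = strong signal from the chosen feature + a bit of noise.
    data[:, -1] = 3.0 * data[:, answer] + rng.normal(scale=0.5, size=size)
    return data, answer

data, answer = add_insights_sketch(np.zeros((100, 5)), seed=0)
X, y = data[:, :-1], data[:, -1]
```

Returning the answer together with the data is what lets the generator hand back both the puzzle and its solution in one call.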
Let’s see if we can solve a challenge we made for ourselves. Let’s answer the question: Which feature in X is the strongest predictor for y?
To get data:
import data_playground
X, y, answer = data_playground.generate_data()
Let’s have a look at the data we got, using the pandas boxplot function:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(X)
df['y'] = y
fig = plt.figure(figsize=(16,10)) # show large in Jupyter lab
df.boxplot()
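A boxplot shows the spread of each column, but it does not tell us which feature predicts y best. One simple way to get at that, sketched here under the assumption that "strongest predictor" means "largest absolute Pearson correlation with y", is to rank the columns by correlation. The helper name strongest_predictor and the synthetic demo data are made up for this example.

```python
import numpy as np

def strongest_predictor(X, y):
    """Return (index, correlations): the column of X with the largest
    absolute Pearson correlation with y, plus all correlations."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Correlation of each feature column with y.
    corrs = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return int(np.argmax(np.abs(corrs))), corrs

# Demo on synthetic data where column 2 drives y by construction.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 4))
y_demo = 2.0 * X_demo[:, 2] + rng.normal(scale=0.5, size=200)

best, corrs = strongest_predictor(X_demo, y_demo)
print("strongest predictor:", best)
print("correlations:", np.round(corrs, 2))
```

With the X and y from generate_data, you would call strongest_predictor(X, y) and compare the result with answer. Correlation only captures linear relationships and breaks down for constant columns, so treat it as a first guess rather than the final word.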