This story demonstrates the implementation of a “gradient boosted tree regression” model using python & spark machine learning. The dataset used is “bike rental info” from 2011–2012 in the capital bike share system. Our goal is to predict the count of bike rentals.

1. Load the data

The data in store is a CSV file. We are to create a spark data frame containing the bike data set. We cache this data so that we read it only once from the disk.

#load the dataset & cache
df = spark.read.csv("/databricks-datasets/bikeSharing/data-001/hour.csv", header="true", inferSchema="true")df.cache()

df.cache()
#view the imported dataset
display(df)

#machine-learning #python #regression #data-science #spark

1. Load the data

towardsdatascience.com

Gradient Boosted Tree Regression | Python