And how to implement it in your forecasting model using Gradient Boosting regression.
According to Wikipedia, feature engineering refers to the process of using domain knowledge to extract features from raw data via data mining techniques. These features can then be used to improve the performance of machine learning algorithms.
Feature engineering does not necessarily have to be fancy. One simple, yet prevalent, use case of feature engineering is in time-series data. Feature engineering matters in this realm because raw time-series data usually contains only a single column representing the time attribute, namely a date-time (or timestamp).
Regarding this date-time data, feature engineering can be seen as extracting useful information from it as standalone (distinct) features. For example, from the date-time value "2020-07-01 10:21:05", we might want to extract the following features:
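As a minimal sketch of what such an extraction can look like, the snippet below pulls a typical set of standalone features out of the example timestamp using pandas (the exact feature set is an assumption for illustration):

```python
import pandas as pd

# Parse the example timestamp from the text
ts = pd.Timestamp("2020-07-01 10:21:05")

# Extract standalone (distinct) date-time features
features = {
    "year": ts.year,
    "month": ts.month,
    "day": ts.day,
    "hour": ts.hour,
    "minute": ts.minute,
    "day_of_week": ts.dayofweek,          # Monday = 0 ... Sunday = 6
    "is_weekend": int(ts.dayofweek >= 5), # simple binary flag
}
print(features)
# → {'year': 2020, 'month': 7, 'day': 1, 'hour': 10, 'minute': 21,
#    'day_of_week': 2, 'is_weekend': 0}
```

On a full dataset the same components are available vectorized via the `.dt` accessor on a datetime column (e.g. `df["date"].dt.hour`).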
Extracting such kinds of features from date-time data is precisely the objective of the current article. Afterwards, we will incorporate our engineered features as predictors of a gradient boosting regression model. Specifically, we will forecast metro interstate traffic volume.
This article will cover the following:
A step-by-step guide to extracting the features below from a date-time column.
How to incorporate those features in a Gradient Boosting regression model to forecast metro interstate traffic volume.
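To preview the second point, here is a minimal sketch of feeding engineered date-time features into a Gradient Boosting regressor. The hourly series below is synthetic stand-in data with a traffic-like daily cycle, not the actual metro interstate dataset; the feature set and model settings are illustrative assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in target: an hourly series with a daily cycle plus noise
rng = pd.date_range("2020-01-01", periods=24 * 60, freq="h")
y = (3000
     + 1500 * np.sin(2 * np.pi * rng.hour / 24)
     + np.random.default_rng(0).normal(0, 100, len(rng)))

# Engineered date-time features serve as the predictors
X = pd.DataFrame({
    "hour": rng.hour,
    "day_of_week": rng.dayofweek,
    "month": rng.month,
})

# Time-ordered split: train on the past, forecast the most recent week
split = len(X) - 24 * 7
model = GradientBoostingRegressor(random_state=0)
model.fit(X[:split], y[:split])
preds = model.predict(X[split:])
print(preds.shape)  # → (168,)
```

Note the split preserves time order rather than shuffling, which matters for honest forecast evaluation.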