1596690600

*We are back! This post is a continuation of the series “Predicting Interest Rate with Classification Models”. I will do my best to make each article self-contained, so you won’t need the previous ones to make the most of it.*

In the first article of the series, we applied a Logistic Regression model to predict up movements of the Fed Fund Effective Rate. For that, we used Quandl to retrieve data from Commodity Indices, Merrill Lynch, and US Federal Reserve.

Data | Image by Author

The variables used throughout the series are:

- **RICIA**: the Euronext Rogers International Agriculture Commodity Index
- **RICIM**: the Euronext Rogers International Metals Commodity Index
- **RICIE**: the Euronext Rogers International Energy Commodity Index
- **EMHYY**: the Emerging Markets High Yield Corporate Bond Index Yield
- **AAAEY**: the US AAA-rated Bond Index (yield)
- **USEY**: the US Corporate Bond Index Yield

All of them are daily values ranging from 2005–01–03 to 2020–07–01.

Now let’s move to the intuition of the models that we will use!

#computer-science #random-forest #ai #artificial-intelligence #machine-learning

1623223443

Predictive modeling in data science is used to answer the question “What is going to happen in the future, based on known past behaviors?” Modeling is an essential part of data science, and it is mainly divided into predictive and preventive modeling. Predictive modeling, also known as predictive analytics, is the process of using data and statistical algorithms to predict outcomes with data models. Anything from sports outcomes and television ratings to technological advances and corporate economies can be predicted using these models.

- **Classification Model:** The simplest of all predictive analytics models. It puts data into categories based on historical data. Classification models are best at answering “yes or no” types of questions.
- **Clustering Model:** This model groups data points into separate groups based on similar behavior.
- **Forecast Model:** One of the most widely used predictive analytics models. It deals with metric value prediction, and it can be applied wherever historical numerical data is available.
- **Outliers Model:** This model, as the name suggests, is oriented around exceptional data entries within a dataset. It can identify exceptional figures either by themselves or in concurrence with other numbers and categories.
- **Time Series Model:** This predictive model consists of a series of data points captured using time as the input. It uses data from previous years to develop a numerical metric and predicts the next three to six weeks of data using that metric.

#big data #data science #predictive analytics #predictive analysis #predictive modeling #predictive models

1597549020

This is the final article of the series “Predicting Interest Rate with Classification Models”. Here are the links if you didn’t read the First or the Second articles of the series, where I explain the challenge I faced when I started at M2X Investments. As I mentioned before, I will try my best to make this article understandable *per se*. I will skip the explanation of assumptions regarding the data for “article-length” reasons. Nevertheless, you can check them in previous posts of the series. Let’s do it!

In previous articles, I applied a couple of classification models to the problem of predicting up movements of the Fed Fund Effective Rate. In short, it is a binary classification problem where 1 represents an up movement and 0 a neutral or negative movement. The models applied were Logistic Regression, Naive Bayes, and Random Forest. Random Forest yielded the best results so far, without hyperparameter optimization, with an F1-score of 0.76.
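For reference, the F1-score is the harmonic mean of precision and recall, and scikit-learn computes it directly. The labels below are made-up illustrative values, not the series’ actual data:

```python
from sklearn.metrics import f1_score

# Hypothetical true labels and predictions (1 = up move, 0 = neutral/negative)
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

# F1 = 2 * TP / (2 * TP + FP + FN), the harmonic mean of precision and recall
print(f1_score(y_true, y_pred))  # 0.8 here (precision = recall = 0.8)
```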

#machine-learning #support-vector-machine #interest-rates #ai #catboost

1596627660

A couple of years ago, I started working for a quant company called M2X Investments, and my first challenge was to create a model that could predict the interest rate movement.

After a couple of days working solely to clean and prepare the data, I took the following approach: build a **simple model** and then **reverse engineer** it to make it better (optimizing and selecting features). Then, if the results weren’t good, I would **change the model** and **repeat the process**, and so forth.

Therefore, the objective of this series of posts is to apply different classification models to predict the upward movement of the interest rate, provide a brief intuition of each model (there are a lot of posts that cover the models’ mathematics and concepts), and compare their results. By giving more attention to the upward movements, we simplify the problem.

*Note: from here on, the data set I will use is fictitious and for educational purposes only.*

The data set used in this post is from Quandl, specifically from Commodity Indices, Merrill Lynch, and the US Federal Reserve. The idea was to use **agriculture, metals, and energy indices**, along with **corporate bond yield rates**, to classify the up movements of the **Federal Funds Effective Rate**.

**A brief introduction to Logistic Regression**

Logistic Regression is a binary classification method. It is a type of Generalized Linear Model that predicts the probability of occurrence of a binary or categorical variable using a logit function. It relies on a function called the _sigmoid_, which maps the input to a value between 0 and 1.
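A minimal sketch of that mapping (the coefficients and feature values below are arbitrary, for illustration only, not fitted to any data):

```python
import numpy as np

def sigmoid(z):
    """Map any real number to the (0, 1) interval."""
    return 1.0 / (1.0 + np.exp(-z))

# Logistic regression: a linear combination of features passed through the sigmoid
beta = np.array([0.5, -1.2])  # illustrative coefficients
x = np.array([1.0, 0.3])      # illustrative feature vector
p_up = sigmoid(beta @ x)      # predicted probability of an up movement
print(p_up)                   # a value strictly between 0 and 1
```

Classifying is then just thresholding: predict an up movement when `p_up` exceeds, say, 0.5.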

#machine-learning #ai #predictions #computer-science #artificial-intelligence

1623906928

Model Stacking is a way to improve model predictions by combining the outputs of multiple models and running them through another machine learning model called a meta-learner. It is a popular strategy used to win Kaggle competitions, but despite its usefulness it is rarely talked about in data science articles, which I hope to change.

Essentially, a stacked model works by running the output of multiple models through a “meta-learner” (usually a linear regressor/classifier, but it can be another model, such as a decision tree). The meta-learner attempts to minimize the weaknesses and maximize the strengths of every individual model. The result is usually a very robust model that generalizes well on unseen data.

The architecture for a stacked model can be illustrated by the image below:
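As a sketch of that architecture, scikit-learn’s `StackingClassifier` implements this pattern directly; the base models and synthetic data below are arbitrary choices for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic binary classification data standing in for a real dataset
X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Base models whose (cross-validated) predictions feed the meta-learner
base_models = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
    ("nb", GaussianNB()),
]

# The meta-learner (here a linear classifier) combines the base outputs
stack = StackingClassifier(estimators=base_models,
                           final_estimator=LogisticRegression())
stack.fit(X_train, y_train)
print(stack.score(X_test, y_test))
```

Note that scikit-learn trains the meta-learner on out-of-fold predictions of the base models, which guards against the meta-learner simply memorizing base-model overfitting.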

#tensorflow #neural-networks #model-stacking #machine-learning

1598792160

If you are reading this, then you probably tried to predict who will survive the Titanic shipwreck. This Kaggle competition is a canonical example of machine learning, and a rite of passage for any aspiring data scientist. What if instead of predicting *who* will survive, you only had to predict *how many* will survive? Or, what if you had to predict the *average age* of survivors, or the *sum of fare* that the survivors paid?

There are many applications where classification predictions need to be aggregated. For example, a customer churn model may generate probabilities that a customer will churn but the business may be interested in *how many* customers are predicted to churn, or *how much revenue* will be lost. Similarly, a model may give a probability that a flight will be delayed but we may want to know how many flights will be delayed, or how many passengers are affected. Hong (2013) lists a number of other examples from actuarial assessment to warranty claims.

Most binary classification algorithms estimate probabilities that an example belongs to the positive class. If we treat these probabilities as known values (rather than estimates), then the number of positive cases is a random variable with a Poisson Binomial probability distribution. (If the probabilities were all the same, the distribution would be Binomial.) Similarly, the sum of two-valued random variables, where one value is zero and the other is some other number (e.g., age or revenue), follows a Generalized Poisson Binomial distribution. Under these assumptions we can report mean values as well as prediction intervals. In summary, if we had the true classification probabilities, then we could construct the probability distributions of any aggregate outcome (number of survivors, age, revenue, etc.).
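Under those assumptions, the mean and variance of the predicted count follow directly from the individual probabilities. A sketch with made-up survival probabilities (not actual model output):

```python
import math

# Hypothetical per-passenger survival probabilities from a classifier
p = [0.9, 0.8, 0.3, 0.6, 0.1]

# Poisson Binomial: the count is a sum of independent Bernoulli(p_i) variables
mean_count = sum(p)                           # expected number of positives
var_count = sum(pi * (1 - pi) for pi in p)    # variance of that count
sd = math.sqrt(var_count)

# A rough 95% prediction interval via a normal approximation
lo, hi = mean_count - 1.96 * sd, mean_count + 1.96 * sd
print(mean_count, (lo, hi))  # mean is 2.7 for these probabilities
```

The same logic extends to sums of values (fares, ages) by weighting each probability with the corresponding passenger’s value.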

Of course, the classification probabilities we obtain from machine learning models are just estimates. Therefore, treating the probabilities as known values may not be appropriate. (Essentially, we would be ignoring the sampling error in estimating these probabilities.) However, if we are interested only in the aggregate characteristics of survivors, perhaps we should focus on estimating parameters that describe the probability distributions of these aggregate characteristics. In other words, we should recognize that we have a *numerical prediction* problem rather than a *classification* problem.

In this note I compare two approaches to getting aggregate characteristics of Titanic survivors. The first is to *classify and then aggregate*: I estimate three popular classification models and then aggregate the resulting probabilities to get aggregate characteristics of survivors. The second is to fit *a regression model* estimating how aggregate characteristics of a group of passengers affect the share that survives. I evaluate each approach using many random splits of test and train data. The conclusion is that many classification models do poorly when the classification probabilities are aggregated.

#machine-learning #titanic #classification #aggregation #predictions