Introduction

Numerai Tournament

Numerai is a crowdsourced hedge fund: it trades on stock price predictions submitted by a large, anonymous crowd of participants. Numerai holds tournaments in which participants compete on forecasting performance. Participants build predictive models on an encrypted dataset provided by Numerai and use them to create submissions. They are ranked and paid according to their predictive performance, and poor performers can have part of their stake burned. Numerai’s backers include Howard Morgan (co-founder of Renaissance Technologies), Paul Tudor Jones, Union Square Ventures, and other prominent VCs and individuals with significant hedge fund experience. The Numerai dataset is overseen by an advisor specializing in financial machine learning. Total prize money paid out to participants to date exceeds $34 million, which suggests the project is progressing well.

Image courtesy of Numerai

About the author

The author invests in the Japanese stock market using a market-neutral strategy. A market-neutral strategy combines long and short positions (buying some stocks and selling others) and aims for an absolute return that does not depend on the market’s overall price movements; instead, it predicts the relative rise and fall of stock prices within the universe (the set of stocks eligible for investment). Building on traditional quantitative methods and statistics, the author developed a predictive model using machine learning. The results have been good, with a return of around 40%.
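To make the long/short idea concrete, here is a minimal Python sketch, not the author’s actual method. It assumes we already have one predicted score per stock; the function name `market_neutral_weights` and the demean-and-scale rule are illustrative assumptions. Above-average stocks are held long and below-average stocks short, so the weights sum to zero, and a move that lifts every stock by the same amount leaves the portfolio return unchanged.

```python
import numpy as np

def market_neutral_weights(scores):
    """Hypothetical helper: map predicted scores to dollar-neutral long/short weights.

    Above-average scores get positive (long) weights, below-average scores get
    negative (short) weights. Weights sum to 0 (zero net exposure) and their
    absolute values sum to 1 (unit gross exposure).
    """
    scores = np.asarray(scores, dtype=float)
    centered = scores - scores.mean()      # keep only the relative view
    gross = np.abs(centered).sum()
    if gross == 0:                         # identical scores: no relative view
        return np.zeros_like(centered)
    return centered / gross

# Illustrative scores and next-period returns for a four-stock universe.
weights = market_neutral_weights([0.9, 0.2, 0.5, 0.4])
stock_returns = np.array([0.03, -0.01, 0.00, 0.01])

print(weights)                             # long stock 1, short stocks 2 and 4
print(weights @ stock_returns)             # return from relative performance only
print(weights @ (stock_returns + 0.05))    # same value (up to rounding): a uniform market move cancels out
```

Because the weights sum to zero, any return component shared by all stocks drops out of the portfolio return; only the relative ordering of the stocks matters.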

Purpose of this article

In this article, I share the insights I gained while building my model. I first explain the concept of the traditional quantitative approach and then discuss how to blend it with machine learning to build a modern predictive model.

Notes

Numerai’s dataset is encrypted, and the author has no inside knowledge of it. This article reflects only the author’s own investing and modeling experience.

Traditional quantitative approach

The study of predicting stock returns has a long history. Let’s start by explaining what the traditional quantitative method is and where it came from.

BARRA’s risk model

The prototype of today’s quantitative investing is arguably the risk model proposed by Barr Rosenberg [1]. Views on this differ, but for the history of Wall Street in this area, Peter Bernstein’s book “Capital Ideas” [2] is essential reading.

In the 1960s, building on Markowitz’s covariance-based portfolio model, Rosenberg devised a method to explain the risk of individual companies with a variety of factors. He also found that these risk factors were related to excess stock returns (risk premia). In 1975, Rosenberg founded a consulting firm, Barr Rosenberg Associates, Inc., which became known to asset managers around the world as BARRA.

Today the BARRA model is the best-known risk model, and MSCI provides it commercially; Axioma is another well-known risk model. There are several BARRA models; the BARRA Global Equity Model (GEM) is a risk model covering stocks in the major equity markets around the world [3]. It decomposes each stock’s return into country factors, industry factors, risk factors, and a stock-specific (individual) component.

This decomposition can be written as a multiple regression model. In it, R_n is the excess return (over the risk-free interest rate) of stock n, x is the exposure of stock n to each factor (indexed by k, j, and i), f is the corresponding factor return, and e_n is the specific return of stock n. The key concept here is the factor return.
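Written out in standard notation, the regression has roughly the following form. This is a sketch reconstructed from the definitions above, not copied from the GEM document; in particular, assigning k to country factors, j to industry factors, and i to risk factors is an assumption.

```latex
% k: country factors, j: industry factors, i: risk (style) factors (assumed indexing)
R_n = \sum_{k} x_{nk} f_k + \sum_{j} x_{nj} f_j + \sum_{i} x_{ni} f_i + e_n
```

Each term is an exposure times a factor return, so a stock with larger exposure to a factor receives more of that factor’s return; whatever the factors do not explain is left in e_n.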

Factor returns

For simplicity, I explain with a single-factor model rather than a multi-factor model, using the structure of the Numerai dataset as a concrete example. The factor return is the regression coefficient f obtained by regressing r on x across the stocks within a single era, where r is the vector of targets in eraX and x is the vector of featureA values in eraX.
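A minimal Python sketch of this per-era regression, assuming a pandas DataFrame laid out like the Numerai data with hypothetical columns `era`, `featureA`, and `target`. It assumes ordinary least squares with an intercept, r = a + f·x + e, so the factor return for each era is the slope f = Σ(x - x̄)(r - r̄) / Σ(x - x̄)². The toy numbers are made up for illustration.

```python
import numpy as np
import pandas as pd

def factor_returns_by_era(df, feature="featureA", target="target", era="era"):
    """Cross-sectional OLS slope of target on feature, one regression per era.

    Assumes an intercept in each regression, so both variables are demeaned
    within the era before computing cov(x, r) / var(x).
    """
    by_era = df.groupby(era)
    x_c = df[feature] - by_era[feature].transform("mean")
    r_c = df[target] - by_era[target].transform("mean")
    cov = (x_c * r_c).groupby(df[era]).sum()
    var = (x_c * x_c).groupby(df[era]).sum()
    return cov / var  # pandas Series: one factor return per era

# Toy data: featureA pays off in era1 and is penalized in era2.
df = pd.DataFrame({
    "era": ["era1"] * 3 + ["era2"] * 3,
    "featureA": [0.0, 0.5, 1.0, 0.0, 0.5, 1.0],
    "target": [0.25, 0.50, 0.75, 0.75, 0.50, 0.25],
})
f = factor_returns_by_era(df)
print(f)
```

Here f is +0.5 in era1 and -0.5 in era2. The factor-driven part of a stock’s return in an era is its exposure x times f, which is why a larger exposure means a larger gain (or loss) from the factor.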

The factor return measures how much return can be expected from betting on that risk factor across the universe. Factor exposure measures how strongly a stock is exposed to that factor; the greater the exposure, the more the stock benefits from the factor return. Note that this regression is cross-sectional: it is fitted within a single period (eraX). In practice, we estimate it period by period (e.g., monthly), cumulate the resulting factor returns over time, and study their behavior.
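The time-series step can be sketched in Python as follows, assuming we already have one factor return per period. Two choices here are assumptions rather than a standard recipe: returns are cumulated by simple summation instead of compounding, and a t-statistic and hit rate serve as rough measures of how consistently the factor pays off. The monthly numbers are hypothetical.

```python
import numpy as np
import pandas as pd

def summarize_factor(factor_returns):
    """Cumulate per-period factor returns and report simple consistency statistics."""
    fr = pd.Series(factor_returns, dtype=float)
    mean = fr.mean()
    std = fr.std(ddof=1)
    return {
        "cumulative": fr.cumsum(),                 # the curve one would plot over time
        "mean": mean,
        "t_stat": mean / std * np.sqrt(len(fr)),   # mean relative to its standard error
        "hit_rate": float((fr > 0).mean()),        # share of periods with a positive return
    }

# Hypothetical monthly factor returns for one factor.
monthly = pd.Series([0.01, 0.02, -0.005, 0.015],
                    index=["2020-01", "2020-02", "2020-03", "2020-04"])
summary = summarize_factor(monthly)
print(summary["cumulative"])
print(summary["mean"], summary["t_stat"], summary["hit_rate"])
```

A cumulative curve that climbs steadily, together with a large positive t-statistic, is the kind of factor one could bet on; a steadily falling curve suggests betting against it.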

Below is an example of factor returns from the BARRA GEM document. If a factor’s cumulative return rises steadily to the right, simply betting on that factor yields a stable return. If it falls steadily, you can instead bet against the factor (swap the long and short sides). As of 2020, however, few factor returns trend clearly in one direction. One should therefore construct a portfolio diversified across a variety of factors, taking each stock’s factor exposures into account.
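One way to act on this, sketched in Python under explicit assumptions: bet in the direction of each factor’s average historical return, flip the exposures of factors we bet against, and blend several factors with equal weights into one score per stock. The sign rule, the equal weighting, and the feature names `featureA` to `featureC` are hypothetical simplifications; a real process would also weigh how reliable each trend is and how correlated the factors are.

```python
import numpy as np
import pandas as pd

def factor_directions(factor_returns):
    """+1 to bet on a factor with a positive average return, -1 to bet against it (0 if flat)."""
    return np.sign(factor_returns.mean())

def composite_score(exposures, directions):
    """Equal-weight blend of sign-adjusted factor exposures, one score per stock."""
    selected = exposures[directions.index]
    centered = selected - selected.mean()        # exposure relative to the universe average
    return (centered * directions).mean(axis=1)  # flip factors we bet against, then average

# Hypothetical per-era factor returns (rows: eras, columns: factors).
factor_rets = pd.DataFrame(
    {"featureA": [0.01, 0.02, 0.00],
     "featureB": [-0.01, -0.02, 0.01],
     "featureC": [0.005, 0.0, 0.01]},
    index=["era1", "era2", "era3"],
)
# Hypothetical current exposures of three stocks to the same factors.
exposures = pd.DataFrame(
    {"featureA": [1.0, 0.0, 0.5],
     "featureB": [0.0, 1.0, 0.5],
     "featureC": [1.0, 0.0, 0.5]},
    index=["stock1", "stock2", "stock3"],
)

directions = factor_directions(factor_rets)   # featureB is bet against
scores = composite_score(exposures, directions)
print(directions)
print(scores)
```

Stock 1 is exposed to the two favored factors and not to featureB, so it scores highest; spreading the score across several factors means no single factor’s reversal dominates the result.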
