Following the steps from my previous articles, I preprocessed the text data and transformed the “cleaned” data into a sparse matrix. Please follow the links to check out more details.

Now I am at the last step of conducting news sentiment analysis on WTI crude oil futures prices. In this article, I will discuss the use of logistic regression and some interesting results I found in my project. I give some background on this project here.

Define and Construct the Target Value

As discussed briefly in my previous articles, conducting sentiment analysis means solving a classification problem (usually binary) with machine learning models and text data. Classification is a supervised machine learning problem, which requires both features and target values to train the model. In a binary classification problem, the target values are usually positive sentiment and negative sentiment. How they are defined and assigned depends on the context of your research question.

Take my project as an example. Its purpose is to predict the change in crude oil futures prices from recently released news articles. I define positive news as articles that predict a price increase, and negative news as articles that predict a price decrease. Since I have already collected and transformed the text data and will use it as the features, I now need to assign the target values for my dataset.

The target value of my project is the direction of the price change associated with each news article. I collected high-frequency trading data from Bloomberg for the WTI crude oil futures close price, which is updated every five minutes.
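To make the idea of a price-direction target concrete, here is a minimal sketch of how each article could be matched to the five-minute close prices. It is an illustration under my own assumptions, not the exact procedure used in the project: the function name `label_news`, the five-minute `horizon`, the rule that a flat price counts as negative, and the toy prices are all hypothetical.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta


def label_news(news_times, bar_times, bar_closes, horizon=timedelta(minutes=5)):
    """Return one label per article: 1 (price rose), 0 (price fell or was flat),
    or None when the price history does not cover the article.

    bar_times must be sorted in ascending order and aligned with bar_closes.
    """
    labels = []
    for t in news_times:
        # Last bar at or before the article's release time.
        i = bisect_right(bar_times, t) - 1
        # First bar at or after release time + horizon (assumed horizon).
        j = bisect_left(bar_times, t + horizon)
        if i < 0 or j >= len(bar_times):
            labels.append(None)  # article falls outside the price history
            continue
        # Assumption: an unchanged price is treated as the negative class.
        labels.append(1 if bar_closes[j] > bar_closes[i] else 0)
    return labels


# Toy data: made-up prices, for illustration only.
start = datetime(2020, 1, 2, 9, 0)
bar_times = [start + timedelta(minutes=5 * k) for k in range(4)]  # 9:00 to 9:15
bar_closes = [61.10, 61.25, 61.05, 61.05]
news_times = [
    datetime(2020, 1, 2, 9, 0),   # compares the 9:00 close with the 9:05 close
    datetime(2020, 1, 2, 9, 6),   # compares the 9:05 close with the 9:15 close
    datetime(2020, 1, 2, 9, 14),  # no bar exists at or after 9:19
]

labels = label_news(news_times, bar_times, bar_closes)
print(labels)
```

Articles labeled None would have to be dropped before training, since a supervised model needs a target value for every row of the feature matrix.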


A Step-by-Step Tutorial for Conducting Sentiment Analysis