This article was originally published at:
https://www.blog.duomly.com/python-course-with-building-a-fintech-investment-ai-lesson-2-pandas-and-getting-financial-data
Welcome to the second lesson of the awesome Python course for AI!
In the last lesson, we built the first database migrations and made the project setup.
Here is the URL:
Python course with building a fintech investment AI – Lesson 1: Start the projectToday, we will build super exciting stuff and will focus on the financial database that we will use to train our AI later.
We will work with pandas, some financial plugins, and will do a bit of refactoring.
As well, I will teach you how to create CLI commands in Python.
I cannot wait to teach you all of those powerful skills, so let's start ASAP!
And if you prefer video, here is the youtube version:
In the last episode, we created DB migrations and database models with some types.
Today we will change them a bit to have the possibility of using prices with many numbers after the decimal.
To do that, we need just change types in variable "date" to "db.Date()”.
Next, we need to change all Integers to "db.Numeric()”.
Take a look at the example below.
id = db.Column(db.Integer, primary_key=True) company = db.Column(db.String()) date = db.Column(db.Date()) openPrice = db.Column(db.Numeric()) highPrice = db.Column(db.Numeric()) lowPrice = db.Column(db.Numeric()) closePrice = db.Column(db.Numeric()) volume = db.Column(db.Numeric())
Now, we should save our changes and update the database with the new changes.
I would suggest initing DB from scratch.
As the first step, you need to delete the "migrations" directory (if you created it in the previous lesson).
Next, open terminal and type:
flask db init
Next:
flask db migrate
Next:
flask db migrate
Great!
We can focus on the next features now, and go into the prices module.
Let's start by creating the directory named "prices".
Create that directory in the root of the folder.
Next, go into the created directory and create the file named "prices.py".
Before we go into the proper development, we should remember about installing all of the necessary dependencies that we will use for building AI.
First, we need to create yfinance, that will help us to connect with the financial API, and download all necessary data about the prices.
Open the terminal and type:
pip install yfinance
Next, we need to install pandas:
pip install yfinance
The last one that we need to install is pandas datareader that we will use to handling our financial data:
pip install pandas_datareader
So, we have prepared all the necessary stuff that we needed for development.
Now, we can go into the prices.py file, and import all of these dependencies.
Let's take a look at the example below:
from datetime import date, timedelta import yfinance as yf import pandas as pd from pandas_datareader import data as pdr from app import app, db from models.prices import PriceModel
Next, we need to override yfinance with pandas.
That line of code should be below imports inside the prices.py file.
yf.pdr_override()
We could do that by typing/commenting on the functions as we had done in golang.
But it'd be a shame if we weren't using the user-friendly features of the flask.
We can define CLI commands very, very easily.
It's enough to use the "app" decorator and define the command that should fire the logic that is below the command definition.
Let's take a look:
@app.cli.command("get-prices")
Below the CLI command, we need to specify a function that we will want to fire.
In this case, we need a function named "getPrices".
We will use that to start getting prices.
@app.cli.command("get-prices") def getPrices():
In the next step, we should specify company symbols that we would like to get pricing for.
We should create a list and put there all the NASDAQ symbols as strings.
I've used the heaviest 50 of the NASDAQ, but you can specify whatever companies that you would like to get data for.
stockList = ['MSFT', 'AAPL', 'AMZN', 'FB', 'GOOGL', 'GOOG', 'GOOGL', 'INTC', 'NVDA', 'CSCO', 'ADBE', 'NFLX', 'PEP', 'PYP', 'CMCSA', 'TSLA', 'COST', 'GOOGL', 'AMGN', 'TMUS', 'AVGO', 'CHTR', 'TXN', 'GILD', 'QCOM', 'SBUX', 'INTU', 'MDLZ', 'VRTX', 'FISV', 'BKNG', 'ISRG', 'REGN', 'ADP', 'AMD', 'ATVI', 'CSX', 'BIIB', 'ILMN', 'MU', 'AMAT', 'JD', 'ADSK', 'MELI', 'ADI', 'LRCX', 'MNST', 'WBA', 'EXC', 'KHC', 'LULU', 'EA']
In this step, we should take all of the NASDAQ symbols and put them inside the for loop that will iterate through them.
For every iteration of the stock, we should call the function named "getPrice" and pass "stock" as an argument.
for stock in stockList: getPrice(stock)
To have the possibility of using it, we need to create the function "getPrice".
Setup "stock" as a param name in the function.
def getPrice(stock):
Now, we can start focusing on the core functionality that we are interested in today, fetching data.
As a first step, we need to specify the date range that we are interested in.
I will set up today as the last day, and day that was 31 days ago, that will be my time-range for the stock prices.
today = date.today() monthAgo = today - timedelta(days=31)
In the next step, we can start getting data!
Look at the example below. We use pandas_datareader.
Next, I've set up stock, that I want to get, the start date, and the end date that is today.
data = pdr.get_data_yahoo(stock, start=monthAgo, end=today)
To have a better life and easier manipulation of data, we need to reformat that a bit.
So we will be able to iterate through that data much more comfortable.
df = pd.DataFrame(data)
Now is the moment where we can iterate through all of the prices for the xx company.
We use "df.iterrows()" method here because all of the rows are from pandas.
Next, we call "addPrice", and pass necessary data as arguments.
We pass the whole "row" that we will split later, next we pass index (it's data here), but formatted.
And we pass "stock" to know what stock is that price for.
for index, row in df.iterrows(): addPrice(row, index.strftime("%Y-%m-%d"), stock)
To make using the "addPrice" possible, we need to define that function.
Use "price", "date", and "stock", as params.
def addPrice(price, date, stock):
We are close to the final.
In this step, we should set up the data that we've got from financial API.
We need to fit that to our PriceModel, that we've created in the Python course Lesson1.
new_price = PriceModel(date=date, company=stock, openPrice=price['Open'], highPrice=price['High'], lowPrice=price['Low'], closePrice=price['Close'], volume=price[‚Volume'])
If our data is ready, we have nothing else to do than just create a new record inside the database.
Don’t forget about „db.session.commit()”, that will update our database.
db.session.add(new_price) db.session.commit()
Congratulations!
The prices module is ready now. You can import it inside the app.py.
Add this line of code below the "PriceModel" import.
from prices.prices import getPrices
You can test your application now.
Before you start it, make sure the flask app variable is configured properly (we did it in Lesson 1).
If yes, just open a terminal in the project and type:
flask get-prices
Congratulations!
Now your project has hundreds of financial records.
Those are ready to start working with Artificial Intelligence, which we will build.
What is more important, you've learned where to look for the stock prices, how to select them, and how to save them into the database.
Code repository for the Python course Lesson 2 is here:
https://github.com/Duomly/python-ai-investment-fintech/tree/Python-AI-course-Lesson-2
In the next lesson, we will focus on building the first AI logic for our algorithm.
Keep learning, and I'm super excited I have a chance to teach you how to build powerful investment AI that can predict stock prices!
Thanks for reading,
Radek from Duomly
#python #web-development #machine-learning #database #sql