Processing data for Machine Learning with TensorFlow

Turn your dataset into TensorFlow input, step by step, for beginners

Training on your own data with TensorFlow can be confusing, especially when errors complain that a shape or dtype is not right. These are my notes on organizing a tf.data.Dataset in a simple way for movie review classification.

In this article, I’m going to work with the Large Movie Review Dataset and train a keras.models.Sequential model, which is a plain stack of layers.

My steps:

1. Load Dataset

2. Create tf.data.Dataset for input

3. Create TextVectorization layer (including tokenization and padding)

4. Create Bag of Words

5. Create the model

6. Fit and train model
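To make steps 3 and 4 more concrete before touching TensorFlow, here is a minimal plain-Python sketch of what tokenization and a bag of words amount to. It is not the TextVectorization layer, only an illustration of the idea; the helper names (tokenize, build_vocab, bag_of_words), the regular expression, and the two sample reviews are made up for this example.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase the text and keep runs of letters, digits, and apostrophes
    return re.findall(r"[a-z0-9']+", text.lower())

def build_vocab(texts, max_tokens=1000):
    # Keep the most frequent tokens; a token's index is its position in the list
    counts = Counter(tok for t in texts for tok in tokenize(t))
    return [tok for tok, _ in counts.most_common(max_tokens)]

def bag_of_words(text, vocab):
    # Count how often each vocabulary token occurs; unknown tokens are ignored
    index = {tok: i for i, tok in enumerate(vocab)}
    vector = [0] * len(vocab)
    for tok in tokenize(text):
        if tok in index:
            vector[index[tok]] += 1
    return vector

# Hypothetical sample reviews, only for illustration
reviews = ["A great movie, great cast", "A boring movie"]
vocab = build_vocab(reviews)
vectors = [bag_of_words(r, vocab) for r in reviews]
```

Every review becomes a vector of the same length (the vocabulary size), whatever its word count, which is why a bag of words sidesteps many of the shape errors mentioned above.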

Load dataset

First, download the archive and check which files it contains. After extraction, os.walk(files) lists every directory together with the files inside it:

import tensorflow as tf
from tensorflow import keras
from pathlib import Path
import numpy as np
import os

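# Download the archive (cached after the first run) and extract it
path = 'http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz'
FILENAME = 'aclImdb_v1.tar.gz'
filepath = keras.utils.get_file(FILENAME, path, extract=True)

# Assumes the archive was extracted next to the downloaded file
files = Path(filepath).parent/'aclImdb'

# Print every directory together with the names of the files it contains
for name, subdirs, filenames in os.walk(files):
  print(name, filenames)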
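The listing shows that the reviews are stored as one text file each, under train/pos, train/neg, test/pos and test/neg. A minimal sketch of reading one split into Python lists could look like this; load_reviews is a hypothetical helper, and the choice of 1 for positive and 0 for negative is an assumption made for this example.

```python
from pathlib import Path

def load_reviews(root, split="train"):
    """Return (texts, labels) for one split; label 1 = positive, 0 = negative."""
    texts, labels = [], []
    # enumerate gives neg -> 0 and pos -> 1
    for label, sentiment in enumerate(["neg", "pos"]):
        folder = Path(root) / split / sentiment
        # Sort the files so the order is the same on every run
        for review_file in sorted(folder.glob("*.txt")):
            texts.append(review_file.read_text(encoding="utf-8"))
            labels.append(label)
    return texts, labels
```

With the files path from above, train_texts, train_labels = load_reviews(files, "train") would give two parallel lists that can later be handed to tf.data.Dataset.from_tensor_slices.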
