While completing Deep Learning for Developers, a highly informative AICamp online class taught by Tyler Elliot Bettilyon (TEB), I became interested in creating a more structured way for machine-learning model builders, such as myself as a student, to understand and evaluate various models and observe their performance when applied to new datasets. Since this class focused on TensorFlow (TF), I started to investigate TF components for building a toolset to make this type of model evaluation more efficient. In doing so, I learned about two components, TensorFlow Datasets (TFDS) and TensorBoard (TB), that can be quite helpful, and this blog post discusses their application to this task. See the References section for links to AICamp, TEB and other useful resources.

Objective

While the term ‘pipeline’ may have several meanings in a data science context, I use it here to mean a modeling pipeline: a set of programmatic components that automatically completes end-to-end modeling, from loading data, to applying a predetermined model, to logging performance results. The goal is to set up a number of modeling tests and to run the pipeline automatically for each test. Once the models are trained, each test result can be easily compared to the others. In summary, the objective is to establish an efficient, organized and methodical mechanism for model testing.

Figure 1: The logical flow of the modeling pipeline

This approach is depicted in Figure 1. The pipeline consists of three steps:

  1. Data: Loading and processing a dataset.
  2. Analysis: Building predefined models and applying them to this dataset.
  3. Results: Capturing key metrics for each dataset-model test for methodical comparison later.
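
To make the three steps concrete, here is a minimal, framework-agnostic sketch of how such a pipeline might be organized in Python. It is not the code from this post: the names `TestCase`, `run_pipeline` and the three stub functions are hypothetical and exist only for illustration. In a TensorFlow setting, the data step would typically wrap `tfds.load`, and the results step would typically pass a `tf.keras.callbacks.TensorBoard` callback to `model.fit`, with one log directory per test.

```python
from dataclasses import dataclass
from itertools import product
from typing import Any, Callable, Dict, Iterable


@dataclass(frozen=True)
class TestCase:
    """One dataset-model combination to run through the pipeline (hypothetical type)."""
    dataset_name: str
    model_name: str

    @property
    def run_id(self) -> str:
        # A unique label per test, usable as a per-run log sub-directory.
        return f"{self.dataset_name}__{self.model_name}"


def run_pipeline(
    tests: Iterable[TestCase],
    load_data: Callable[[str], Any],
    build_model: Callable[[str, Any], Any],
    train_and_evaluate: Callable[[Any, Any, str], Dict[str, Any]],
) -> Dict[str, Dict[str, Any]]:
    """Run every test through the Data, Analysis and Results steps."""
    results = {}
    for test in tests:
        data = load_data(test.dataset_name)            # Step 1: Data
        model = build_model(test.model_name, data)     # Step 2: Analysis
        results[test.run_id] = train_and_evaluate(     # Step 3: Results
            model, data, test.run_id
        )
    return results


# Stub stages so the sketch runs without TensorFlow installed.
# Assumption: real versions would load a TFDS dataset, build a Keras model,
# and train it while logging metrics for TensorBoard.
def fake_load(dataset_name: str) -> Dict[str, Any]:
    return {"name": dataset_name, "num_classes": 10}


def fake_build(model_name: str, data: Dict[str, Any]) -> str:
    return f"{model_name}-for-{data['num_classes']}-classes"


def fake_train_and_evaluate(model: str, data: Dict[str, Any], run_id: str) -> Dict[str, Any]:
    return {"model": model, "log_dir": f"logs/{run_id}"}


# Every dataset is paired with every model; the model names are placeholders.
tests = [
    TestCase(dataset, model)
    for dataset, model in product(["mnist", "cifar10"], ["dense", "cnn"])
]
results = run_pipeline(tests, fake_load, fake_build, fake_train_and_evaluate)

for run_id, outcome in results.items():
    print(run_id, outcome["log_dir"])
```

Keeping the three stages as separate callables means a new dataset or model only adds an entry to the test list, and giving each test its own log directory is one way to keep runs distinguishable when they are compared afterwards.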

Any analyst who has studied or even dabbled in deep-learning neural networks has probably experienced the seemingly boundless array of modeling choices. Many layer types, each with a multitude of configuration options, can be interconnected, and once stacked, the model can be trained using multiple optimization routines and numerous hyperparameters. Then there is the question of data: it may be desirable to apply promising models to new datasets to observe their performance on unseen data or to gain a foundation for further model iterations.

For this application, I worked exclusively with image-classification data and models. TFDS includes audio, image, object-detection, structured, summarization, text, translation and video data, and deep-learning models can be constructed specifically for these problems. While the out-of-the-box code presented here will require some modifications and testing to be applied to other types of datasets, its foundational framework should still be helpful.

A TensorFlow Modeling Pipeline using TensorFlow Datasets and TensorBoard