_Disclaimer: Before you read this, I would strongly recommend reading TensorFlow Data for Dummies._

TensorFlow is widely known for its scalability and efficiency. With TensorFlow 2.x automating graph construction, creating Machine Learning pipelines has never been easier.

But what if I told you there was a simpler approach? This approach is widely used at Google to develop their own sophisticated Machine Learning models.

Before we jump into our first hands-on tutorial, here is a quick primer on the essential concepts you will need to thoroughly understand how Apache Beam works.

What are PCollections? (Apache Beam Primer)

There are three main components in Beam: Pipeline, PCollection, and PTransform.

• Pipeline encapsulates your entire data processing workflow from start to finish.

• PCollection is a distributed dataset abstraction that Beam uses to transfer data between PTransforms.

• PTransform is a processing step that operates on input data (an input PCollection) and produces output data (an output PCollection). Usually, the first and last PTransforms represent a way to input/output data, which can be bounded (batch processing) or unbounded (streaming processing).

To simplify things, we can think of the Pipeline as a DAG (directed acyclic graph) representing your whole workflow, with PTransforms as the nodes (which transform the data) and PCollections as the edges of this graph.
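To make this concrete, here is a minimal sketch (not from the original article) that wires the three components together with the `apache_beam` package. The element values and step labels are purely illustrative:

```python
import apache_beam as beam

# Pipeline: the whole DAG, built and run by the context manager
with beam.Pipeline() as pipeline:
    (
        pipeline
        # First PTransform: a bounded input producing a PCollection
        | "CreateInput" >> beam.Create([1, 2, 3, 4])
        # Middle PTransform: consumes one PCollection, emits another
        | "Square" >> beam.Map(lambda x: x * x)
        # Last PTransform: an output step (here, simply printing)
        | "PrintOutput" >> beam.Map(print)
    )
```

Each `|` step is a PTransform, and the data flowing between steps is a PCollection; exiting the `with` block executes the whole Pipeline DAG on the default local runner.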

#tensorflow #data-science #artificial-intelligence #machine-learning #tensorflow-extended
