The full end-to-end example that TensorFlow Extended provides by running `tfx template copy taxi $target-dir`
produces 17 files scattered across 5 directories. If you are looking for a smaller, simpler, self-contained example that actually runs on the cloud rather than only on your laptop, this is it. The required cloud services setup is covered here as well.
We are going to generate statistics and a schema for the Chicago taxi trips CSV dataset, which you can find under the `data` directory after running the `tfx template copy taxi` command.
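In TFX terms that means wiring `CsvExampleGen`, `StatisticsGen`, and `SchemaGen` into a small pipeline. Here is a minimal sketch, assuming the TFX 1.x Python API; the paths are placeholders for wherever you copied the data and want the outputs written:

```python
from tfx import v1 as tfx

# Placeholder paths -- adjust to your environment.
DATA_ROOT = "data"                    # directory containing the taxi CSV
PIPELINE_ROOT = "pipeline_output"     # where artifacts get written
METADATA_PATH = "metadata/metadata.db"

# Ingest the CSV, compute per-feature statistics, then infer a schema.
example_gen = tfx.components.CsvExampleGen(input_base=DATA_ROOT)
statistics_gen = tfx.components.StatisticsGen(
    examples=example_gen.outputs["examples"])
schema_gen = tfx.components.SchemaGen(
    statistics=statistics_gen.outputs["statistics"],
    infer_feature_shape=True)

pipeline = tfx.dsl.Pipeline(
    pipeline_name="taxi_stats_and_schema",
    pipeline_root=PIPELINE_ROOT,
    components=[example_gen, statistics_gen, schema_gen],
    metadata_connection_config=(
        tfx.orchestration.metadata.sqlite_metadata_connection_config(
            METADATA_PATH)))

tfx.orchestration.LocalDagRunner().run(pipeline)
```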
Generated artifacts, such as the data statistics or the schema, can be viewed from a Jupyter notebook, either by connecting to the ML Metadata store or simply by downloading them from plain file/binary storage.
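To inspect the results from a notebook, you can open the same ML Metadata store the pipeline wrote to and resolve artifact URIs from there. A minimal sketch, assuming the SQLite-backed store from the pipeline above; the exact file names inside each artifact directory vary across TFX versions:

```python
import os

import tensorflow_data_validation as tfdv
from ml_metadata.metadata_store import metadata_store
from ml_metadata.proto import metadata_store_pb2

# Connect to the SQLite-backed ML Metadata store the pipeline wrote to.
config = metadata_store_pb2.ConnectionConfig()
config.sqlite.filename_uri = "metadata/metadata.db"
config.sqlite.connection_mode = (
    metadata_store_pb2.SqliteMetadataSourceConfig.READWRITE_OPENCREATE)
store = metadata_store.MetadataStore(config)

# Every component output is registered as a typed artifact whose `uri`
# points at the files on disk (or in a bucket). Take the latest of each.
stats_artifact = store.get_artifacts_by_type("ExampleStatistics")[-1]
schema_artifact = store.get_artifacts_by_type("Schema")[-1]

stats = tfdv.load_stats_binary(
    os.path.join(stats_artifact.uri, "Split-train", "FeatureStats.pb"))
schema = tfdv.load_schema_text(
    os.path.join(schema_artifact.uri, "schema.pbtxt"))

# Both render as rich widgets inside a Jupyter notebook.
tfdv.visualize_statistics(stats)
tfdv.display_schema(schema)
```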
Full code sample at the bottom of the article.
The whole pipeline can run on your local machine (or on different cloud providers, or your own Spark cluster). The same example also scales to bigger datasets. If you wish to understand how this happens transparently, look into Apache Beam and its runners, which TFX uses under the hood for distributed data processing.
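As a concrete illustration, with the Beam-based orchestrator the exact same pipeline object runs on Beam's local DirectRunner until you pass a different runner flag (a sketch; the import path has shifted between TFX releases):

```python
from tfx.orchestration.beam.beam_dag_runner import BeamDagRunner

# Reusing the `pipeline` object defined in the sketch above. With no extra
# Beam flags this executes on the local DirectRunner; passing
# "--runner=DataflowRunner" or "--runner=SparkRunner" via beam_pipeline_args
# (see below) moves the same code to Google Cloud Dataflow or a Spark cluster.
BeamDagRunner().run(pipeline)
```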
It’s a good naming practice to use _/temp_ or _/tmp_ for temporary files and _/staging_ or _/binaries_ for the staging directory.
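Putting that convention together with a Dataflow run, the Beam flags might look like the following (project, region, and bucket names are hypothetical); the list is handed to the pipeline via the `beam_pipeline_args` argument of `tfx.dsl.Pipeline`:

```python
# Standard Apache Beam / Dataflow flags; note the /tmp and /staging suffixes
# following the naming convention above. All names here are placeholders.
beam_pipeline_args = [
    "--runner=DataflowRunner",
    "--project=my-gcp-project",
    "--region=europe-west1",
    "--temp_location=gs://my-bucket/tmp",
    "--staging_location=gs://my-bucket/staging",
]
```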