Creating a data pipeline is one thing; bringing it into production is another. This is especially true for a modern data pipeline that combines multiple services for advanced analytics, such as transforming unstructured data into structured data, training ML models, and embedding OCR. Integrating multiple services can be complicated, and deployment to production has to be controlled. This blog walks through an example project.
The code from the project can be found here; the steps of the modern data pipeline are depicted below.
Figure 1. High-level dataflow (image by author)
The architecture of the project is discussed in the next chapter, followed by a tutorial on how to deploy and run the project.