The system I work on is about to undergo a significant tech-refresh. The goal is to remove the current bottlenecks that prevent it from ingesting more data. The remedy requires a product that receives inputs from external sources, processes them, and disseminates the outcomes to their destinations.

Apache Nifi implements the flow-based programming (FBP) paradigm; it is composed of black-box processes that exchange data across predefined connections (an excerpt from Wikipedia).
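To make the paradigm concrete, here is a minimal Python sketch of flow-based programming. It is not Nifi code: it only illustrates two black-box processes that exchange data over predefined connections, modeled here as queues. The names `uppercase`, `collect`, and `run_flow` are hypothetical and exist only for this illustration.

```python
from queue import Queue
from typing import List

_DONE = object()  # sentinel that marks the end of the stream


def uppercase(inbox: Queue, outbox: Queue) -> None:
    """Black-box process: reads packets, upper-cases them, passes them on."""
    while (packet := inbox.get()) is not _DONE:
        outbox.put(packet.upper())
    outbox.put(_DONE)  # propagate end-of-stream to the next process


def collect(inbox: Queue, sink: List[str]) -> None:
    """Black-box process: drains its input connection into a list."""
    while (packet := inbox.get()) is not _DONE:
        sink.append(packet)


def run_flow(packets: List[str]) -> List[str]:
    # The connections are defined up front, independently of the processes.
    source_to_upper: Queue = Queue()
    upper_to_sink: Queue = Queue()

    for packet in packets:
        source_to_upper.put(packet)
    source_to_upper.put(_DONE)

    # Each process only knows its own input and output connections.
    uppercase(source_to_upper, upper_to_sink)
    result: List[str] = []
    collect(upper_to_sink, result)
    return result
```

The processes never call each other directly; rewiring the flow means changing which queues are passed in, not the processes themselves.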

In short, Apache NiFi is a tool to process and distribute data. Its intuitive UI supports routing definitions, a variety of connectors (in/out), and many built-in processors. Together, these features make it a suitable candidate platform for our use case.

In view of our system’s future needs, we decided to evaluate Nifi thoroughly. The starting point is setting up an environment.

In this article, I’ll describe how to set up a Nifi environment using Docker images and run a simple predefined template; building a Nifi flow from scratch will be covered in another article. The three main parts of the article are:

  • Reviewing Apache Nifi concepts and building-blocks
  • Setting up Nifi Flow and Nifi Registry (based on Docker images)
  • Loading a template and running it

Ready? Let’s start with the foundations.

Nifi Components and Concepts

Nifi is based on the following hierarchy:

  • Process Group
  • A collection of processors and their connections. A process group is the smallest unit that can be saved in version control (Nifi Registry). A process group can have input and output ports that allow connecting Process Groups. This way, a data flow can be composed of more than one Process Group.
  • Processor
  • A processing unit that (mostly) has an input and an output linked to another processor by a connection. Each processor is a black box that executes a single operation; for example, processors can change the content or the attributes of the FlowFile (see below).
  • FlowFile
  • This is the logical set of data with two parts (content and attributes), which passes between the Nifi Processors. The FlowFile object is immutable, but its contents and attributes can change during the processing.
  • Connection
  • A Connection is a queue that routes FlowFiles between processors. The routing logic is based on conditions related to the processor’s result; a connection is associated with one or more result types. A connection’s conditions are the relationships between processors, which can be static or dynamic. Static relationships are fixed (for example, Success, Failure, Match, or Unmatch), while dynamic relationships are defined by the user and based on attributes of the FlowFile; the last section in this article exemplifies this feature with the RouteOnAttribute processor.
  • Port
  • The entry and exit points of a Process Group. Each Process Group can have one or more input or output ports, distinguished by their names.
  • Funnel
  • Combines the data from several connections into a single connection.
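The relationship between FlowFiles, processors, and connections can be sketched in a few lines of Python. This is a conceptual model only, not the Nifi API: the `FlowFile` class, `route_on_attribute`, and `run` are hypothetical names, and routing to the first matching rule is a simplification (the real RouteOnAttribute processor's behavior depends on how it is configured).

```python
from collections import deque
from dataclasses import dataclass, field, replace
from typing import Callable, Deque, Dict, Iterable, Mapping

Rule = Callable[[Mapping[str, str]], bool]


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class FlowFile:
    content: bytes
    attributes: Mapping[str, str] = field(default_factory=dict)

    def with_attribute(self, key: str, value: str) -> "FlowFile":
        # A "change" yields a new FlowFile; the original stays untouched.
        return replace(self, attributes={**self.attributes, key: value})


def route_on_attribute(flowfile: FlowFile, rules: Dict[str, Rule]) -> str:
    """Processor: returns the name of the relationship to route to."""
    for relationship, predicate in rules.items():
        if predicate(flowfile.attributes):
            return relationship  # dynamic, user-defined relationship
    return "unmatched"  # fallback when no rule applies


def run(flowfiles: Iterable[FlowFile],
        rules: Dict[str, Rule]) -> Dict[str, Deque[FlowFile]]:
    # One connection (a queue) per relationship.
    connections: Dict[str, Deque[FlowFile]] = {
        name: deque() for name in (*rules, "unmatched")
    }
    for flowfile in flowfiles:
        connections[route_on_attribute(flowfile, rules)].append(flowfile)
    return connections
```

For example, with `rules = {"is_csv": lambda attrs: attrs.get("type") == "csv"}`, a FlowFile whose `type` attribute is `csv` ends up in the `is_csv` queue, and every other FlowFile ends up in `unmatched`.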

The Nifi flow below depicts these components:

Process Group and Nifi elements

After reviewing the Nifi data flow components, let’s see how to set up an environment.


Setting Apache Nifi on Docker Containers