When we talk about data processing or big data ETL, the first thing that comes to mind is Hadoop (or Spark). Today, I would like to show another way of building a data processing application: using an actor system. I will be using classic Akka actors.
Note: This article will not explain what an actor system is or introduce actor systems in Akka. I will write another essay introducing Akka actors; in the meantime, you can read the documentation on the Akka site.
Why use the Actor System?
Modern distributed systems with demanding requirements run into challenges that traditional object-oriented programming (OOP) models do not fully solve.
One shortcoming of OOP in distributed systems concerns encapsulation. In a regular Java thread model, encapsulation does not protect an object's internal state when several threads call its methods at the same time. Encapsulation only holds in a single-threaded model. To keep the state safe across threads, you need synchronization, for example synchronized blocks. Other shortcomings are described on the Akka site.
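To make the encapsulation point concrete, here is a minimal sketch in plain Scala (no Akka involved). The names EncapsulationDemo, UnsafeCounter, SafeCounter, and hammer are made up for this illustration. Both counters hide their state behind a private field, but only the synchronized one is guaranteed to keep a correct total when several threads increment it.

```scala
object EncapsulationDemo {
  // State is private, but increment() is a read-modify-write that is not atomic.
  final class UnsafeCounter {
    private var count = 0
    def increment(): Unit = count += 1
    def value: Int = count
  }

  // Same class, but every access to the state goes through the object's lock.
  final class SafeCounter {
    private var count = 0
    def increment(): Unit = this.synchronized { count += 1 }
    def value: Int = this.synchronized { count }
  }

  // Starts `threads` threads that each call `body` `times` times, then waits for all of them.
  def hammer(threads: Int, times: Int)(body: () => Unit): Unit = {
    val workers = (1 to threads).map { _ =>
      new Thread(() => (1 to times).foreach(_ => body()))
    }
    workers.foreach(_.start())
    workers.foreach(_.join())
  }

  def main(args: Array[String]): Unit = {
    val unsafe = new UnsafeCounter
    val safe = new SafeCounter
    hammer(8, 100000)(() => unsafe.increment())
    hammer(8, 100000)(() => safe.increment())
    println(s"unsafe = ${unsafe.value}") // may be lower than 800000 because updates can be lost
    println(s"safe   = ${safe.value}")
  }
}
```

An actor avoids this locking by processing one message at a time, so its private state is only ever touched by one thread at once.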
Goal
I will create an application that reads data from a CSV log file and uses a master-worker pattern to process each line.
The goal is to count the occurrences of each HTTP status code in the log file and write the result to the out.txt file.
Actor Hierarchy
Since there are multiple components involved in reading the data from the source file, splitting the work, and aggregating the results, let’s examine each actor’s role in this application.
There are four kinds of actors: Supervisor Actor, Ingestion Actor, Master Actor, and Worker Actor.
Supervisor Actor spawns, supervises, and stops the Ingestion Actor. It handles any failure that occurs in the Ingestion Actor.
Ingestion Actor reads the log file from the source and initializes the Master Actor. It splits the log file into lines and sends each line to the Master Actor. Finally, it receives the aggregated value from the Master Actor and writes that value to the out.txt file.
Master Actor creates the Worker Actors and delegates tasks to them. It also aggregates the results and sends them back to the Ingestion Actor.
Worker Actor does all the heavy lifting of processing each line.
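One way to picture how these actors talk to each other is to write down the messages they might exchange. The sketch below is plain Scala with no Akka dependency, and every name in it (Protocol, StartIngestion, Line, StatusParsed, EndOfFile, Aggregate, record) is hypothetical, chosen only for illustration; the actual messages used in the article's code may differ.

```scala
object Protocol {
  // Supervisor -> Ingestion: start reading the given file.
  final case class StartIngestion(path: String)
  // Ingestion -> Master -> Worker: one raw line of the log.
  final case class Line(text: String)
  // Worker -> Master: the status code found on a line, or None if the line was unusable.
  final case class StatusParsed(status: Option[Int])
  // Ingestion -> Master: no more lines will follow.
  case object EndOfFile
  // Master -> Ingestion: the final count per status code.
  final case class Aggregate(counts: Map[Int, Int])

  // How a Master could fold one worker reply into its running totals.
  def record(counts: Map[Int, Int], reply: StatusParsed): Map[Int, Int] =
    reply.status match {
      case Some(code) => counts.updated(code, counts.getOrElse(code, 0) + 1)
      case None       => counts
    }
}
```

Keeping the aggregation step as a pure function like record means it can be checked without starting an actor system; the Master would only need to hold the current map and apply the function to each reply.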
The data we will read is a log file in which each line contains IP, Time, URL, and Status:
10.128.2.1,[29/Nov/2017:06:58:55,GET /login.php HTTP/1.1,200
However, the data is not always in the expected order, so we need to filter it before processing.
In this article, I will walk through the code for the Supervisor Actor and the Ingestion Actor. The Master Actor and Worker Actor will be covered in part 2.
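As a rough sketch of that filtering step, the plain Scala below parses one line into its four fields and drops anything that does not fit. It assumes that a usable line has exactly four comma-separated fields and a three-digit status in the last position; that rule, and the names LogParser, LogEntry, parse, and countStatuses, are assumptions made for this example rather than the article's actual code.

```scala
object LogParser {
  final case class LogEntry(ip: String, time: String, url: String, status: Int)

  // Returns Some(entry) for a well-formed line and None for anything else.
  def parse(line: String): Option[LogEntry] =
    line.split(",", -1).map(_.trim) match {
      case Array(ip, time, url, status) if status.matches("\\d{3}") =>
        // The sample data prefixes the timestamp with "[", so strip it.
        Some(LogEntry(ip, time.stripPrefix("["), url, status.toInt))
      case _ => None
    }

  // Counts how often each status code occurs, ignoring lines that fail to parse.
  def countStatuses(lines: Iterable[String]): Map[Int, Int] =
    lines.flatMap(parse).groupBy(_.status).map { case (code, entries) => code -> entries.size }
}
```

For example, calling LogParser.parse on the sample line above yields an entry with status 200, while a header row or a line with missing fields yields None and is left out of the count.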
Let’s go forth and conquer.

