Hands-on tutorial on the basics Streams API and KSQL

Event-driven architectures comprise complex business processes interconnected with streams of events. These are often online service uses cases and also backend processes such as billing, fulfillment or fraud detection, which may need to be decoupled from the frontend where users click buttons and expect things to happen.

Event-driven architectures comprise complex business processes interconnected with streams of events. These are often online service uses cases and also backend processes such as billing, fulfillment or fraud detection, which may need to be decoupled from the frontend where users click buttons and expect things to happen.

The event-driven model provides many benefits: It decouples dependencies between services, provides some level of pluggability to the architecture, enables services to evolve independently, etc.

In his book Designing Event-Driven Systems, Ben Stopford explains how event-driven architectures can be used to build business-critical systems. He describes the value of turning databases “inside out” and treating event streams as a “source of truth.” Among other things, the book shows you how to apply patterns, including event collaboration, event sourcing and CQRS for building microservices and event-oriented architectures.

Such systems typically use Apache Kafka® as the foundation. Kafka is like a central dataplane that holds shared events and keeps services in sync. Its distributed cluster technology provides availability, resiliency and performance properties that strengthen the architecture, leaving the programmer to simply write and deploy client applications that will run load balanced and be highly available.

If you are ready to move from reading about these fundamental concepts to more hands-on learning, Confluent offers several resources:

This two-part blog series will help you develop and validate real-time streaming applications. With part 1, we introduce a new resource:

And in the second part, we validate those streaming applications. For now, let’s talk about this new tutorial for developers.

Tutorial for developers

This free, self-paced tutorial is a great introduction for developers who are just getting started with stream processing. You will learn the basics of the Kafka Streams API, which is far richer than a Kafka producer or Kafka consumer, and common patterns to design and build event-driven applications.

The tutorial is based on a small microservices ecosystem, showcasing an order management workflow, such as one you might find in retail and online shopping. It is built using Kafka Streams, whereby business events that describe the order management workflow propagate through this ecosystem. The blog post Building a Microservices Ecosystem with Kafka Streams and KSQL outlines the approach used.

In this example, the system centers on an Orders Service which exposes a REST interface to POST and GET Orders. Posting an Order creates an event in Kafka that is recorded in the topic orders. This is picked up by three different validation engines (Fraud Service, Inventory Service and Order Details Service), which validate the order in parallel, emitting a PASS or FAIL based on whether each validation succeeds.

The result of each validation is pushed through a separate topic, Order Validations, so that we retain the single writer status of the Orders Service —> Orders Topic (Ben Stopford’s book discusses several options for managing consistency in event collaboration). The results of the various validation checks are aggregated in the Validation Aggregator Service, which then moves the order to a Validated or Failed state, based on the combined result.

To allow users to GET any order, the Orders Service creates a queryable materialized view (embedded inside the Orders Service), using a state store in each instance of the service, so that any Order can be requested historically. Note also that the Orders Service can be scaled out over a number of nodes, in which case GET requests must be routed to the correct node to get a certain key. This is handled automatically using the interactive queries functionality in Kafka Streams.

The Orders Service also includes a blocking HTTP GET so that clients can read their own writes. In this way, we bridge the synchronous, blocking paradigm of a RESTful interface with the asynchronous, non-blocking processing performed server side.

There is a simple service that sends emails, and another that collates orders and makes them available in a search index using Elasticsearch.

Finally, Confluent KSQL is running with persistent queries to enrich streams and to also check for fraudulent behavior.

Here is a diagram of the microservices and the related Kafka topics:

To use the tutorial, first you have to properly set up your environment. You can use a local Confluent Platform install or Docker.

Then run the full end-to-end working solution, which requires no code development, to see a customer-representative deployment of a streaming application. This provides context for each of the exercises in which you will develop pieces of the microservices.

After you have successfully run the full solution, go through the individual exercises in the tutorial to better understand the basic principles of streaming applications. For each exercise, the tutorial provides a stub file for which you have to complete the code. By working on these exercises, you will learn the patterns for writing solid streaming applications and gain experience with using the Kafka Streams API. Complete the code (there are hints if you need them!), compile and run the provided tests to ensure it works!

The tutorial walks through the following exercises:

Exercise 1: Persist events

In this exercise, you will persist events into Kafka by producing records that represent customer orders. An event is simply a thing that happened or occurred. An event in a business is some fact that occurred, such as a sale, an invoice, a trade, a customer experience, etc., and it is the source of truth. In event-oriented architectures, events are first-class citizens that constantly push data into applications. Client applications can then react to these streams of events in real time and decide what to do next.

Exercise 2: Event-driven applications

In this exercise, you will let the order event itself trigger a service. In such an event-driven design, an event stream is the inter-service communication that leads to less coupling and queries, enables services to cross deployment boundaries and avoids synchronous execution. In contrast, service-based architectures are often designed to be request driven, in which services send commands to other services to tell them what to do, await a response or send queries to get the resulting state.

A visual summary of commands, events and queries

Exercise 3: Enriching streams with joins

In this exercise, you will write a service that enriches the streaming order information by joining it with streaming payment information and data from a customer database. Many stream processing applications in practice are coded as streaming joins. For example, applications backing an online shop might need to access multiple updating database tables (e.g., sales prices, inventory, customer information) in order to enrich a new data record (e.g., customer transaction) with context information. In these scenarios, you may need to perform table lookups at very large scale and with a low processing latency.

A stateful streaming service that joins two streams at runtime

A popular pattern is to make the information in the databases available in Kafka through so-called change data capture (CDC), together with Kafka’s Connect API to pull in the data from the database (read more in Robin Moffatt’s blog post No More Silos: How to Integrate Your Databases with Apache Kafka and CDC). Once the data is in Kafka, client applications can perform very fast and efficient joins of such tables and streams, rather than requiring the application to make a query to a remote database over the network for each record.

Exercise 4: Filtering and branching

Kafka can capture a lot of information related to an event into a single Kafka topic. Client applications can then manipulate that data based on some user-defined criteria to create new streams of data that they can act on. In this exercise, you will define one set of criteria to filter records in a stream based on some criteria. Then you will define define another set of criteria to branch records into two different streams.

Exercise 5: Stateful operations

In this exercise, you will create a session window to define five-minute windows for processing. You can combine current record values with previous record values using aggregations. They are stateful operations because they maintain data during processing. Oftentimes, these are combined with windowing capabilities in order to run computations in real time over a window of time. Additionally, you will use a stateful operation to collapse duplicate records in a stream.

Exercise 6: State stores

In this exercise, you will create a state store which is a disk-resident hash table held inside the API for the client application. The state store can be used within stream processing applications to store and query data, an important capability when implementing stateful operations. It can be used to remember recently received input records, to track rolling aggregates, to de-duplicate input records, etc.

State stores in Kafka Streams can be used to create use-case-specific views right inside the service

A state store is also backed by a Kafka topic and comes with all the Kafka guarantees. Consequently, other applications can also interactively query another application’s state store. Querying state stores is always read-only to guarantee that the underlying state stores will never be mutated out of band (i.e., you cannot add new entries).

Exercise 7: Enrichment with KSQL

Confluent KSQL is the streaming SQL engine that enables real-time data processing against Apache Kafka. It provides an easy-to-use, yet powerful interactive SQL interface for stream processing on Kafka, without requiring you to write code in a programming language such as Java or Python.

KSQL is scalable, elastic, fault tolerant and able to support a wide range of streaming operations, including data filtering, transformations, aggregations, joins, windowing and sessionization. In this exercise, you will create one persistent query that enriches the orders stream with customer information. You will create another persistent query that detects fraudulent behavior by counting the number of orders in a given window.

Interested in more?

The new tutorial Introduction to Streaming Application Development is a great introduction for developers to learn the basics of the Kafka Streams API, and apply them to a retail microservices example with an event-driven architecture. With each of these exercises, you can dive in and run the end-to-end automated demo.

We hope you will stay with us for part 2 of this blog series, which will help you be successful in validating your streaming applications and cover unit testing, integration testing, Avro and schema compatibility testing, Confluent Cloud™ tools and multi-datacenter testing.

Validating that a solution works is just as important as implementing one. It provides assurance that the application is working as designed, can handle unexpected events and can evolve without breaking existing functionality.

Introduction to Java Stream API

Introduction to Java Stream API

Java Stream API is there since Java 8. It is used to express computation on data is a short and elegant way. In the following post I will introduce you to the most common methods to give you an idea what you can achieve with the Java Stream API.

Java Stream API is there since Java 8. It is used to express computation on data is a short and elegant way. In the following post I will introduce you to the most common methods to give you an idea what you can achieve with the Java Stream API.

Streams vs Collections

Before jumping right into the code I want to explain the difference between Streams and Collections. It is clear that both have properties in common, they can both be iterated for example. You can iterate through Collection external with e.g. for each loops. Instead you don't loop explicit through streams. You express your computations in a functional way and the Stream API deals for iterations internally. Also the Stream API is lazy, its elements are computed or fetched via network by demand. Collections are a in memory data store which means that every element in your Collection has to be computed and stored in your RAM before you can access it. But this does not mean that you can't use streams for already computed data. It rather makes Streams more flexible than Collections in specific situations. You also can create Streams out of Collection based data structures as I will do in the following examples.


In the following I will show you some examples of the usage of Streams and I will also compare my solution to the classical programming approach.


If you want to filter a list, you can just use filter.

Let's assume we have a list of words.

List<String> words = Arrays.asList("Abra", "Kadrabra", "Aloha");

Now we want to filter all words which starts with a capital A and print them.

        .filter(word -> word.startsWith("A"))

We actually use two stream methods here. First we transform the list to a stream and filter and words which starts with a capital A. On the resulting stream we are printing out every word with System.out.println.

You probably have seen similar lambda expressions like the on in the filter before, but maye you have not seen many which I used in the forEach method. System.out::println is just syntactic sugar for following lambda expression: x -> System.out.println(x).

We can also write the code by not using streams by simply using a for loop:

for (String word : words) {
        if (word.startsWith("A")) {

This solution is in my opinion less elegant and can be written more expressive as I showed in the upper code snippet.

We can even improve our Stream example by creating a method for the startsWith condition in a external method.



private static boolean startsWithA(String word) {
    return word.startsWith("A");


With the Stream API you also never have to write code like this again

List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);

int sum = 0;
for (Integer number : numbers) {
    sum += number;

Instead you can write


which does exactly the same as the upper code snippet.


map projects every element of a stream into another form. As I used mapToInt to project the element to Int's, I can use map to project every element of a stream in to a new element.

        .map(number -> number * number)

A traditional approach of writing this code would be following snippet:

for (Integer number : numbers) {
    int squaredNumber = number * number;


Next lets assume we have following data source:

List<List<Integer>> matrix = Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3, 4), Arrays.asList(5, 6));

Our task is it now to sum up all element in the matrix.

To flat the data structure we can use flatMap. It concatenates streams and generates a single stream as a result. So to compute the sum, we can use first flatMap to concatenate multiple streams into one and then use mapToIntand sum as shown in the upper example of sum.


flatMap allows use to abstain from for-loops in for-loops and write elegant code in one line.

Your colleagues will thank you if they pull your code.


Last but not least you can transform a stream into a traditional data structure by using collect.

List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6);

List<Integer> evenNumbers = numbers.stream().filter(x -> x % 2 == 0).collect(Collectors.toList());


I have uploaded all examples to my Github

Java 8 introduced Java Streams which allows us to express data processing queries in a short functional and elegant way. There are many more operations to explore, this post should just gave you an introduction to the Stream API. I hope you are motivated now to use the Stream API next time when you are using Java.

Develop this one fundamental skill if you want to become a successful developer

Throughout my career, a multitude of people have asked me&nbsp;<em>what does it take to become a successful developer?</em>

Throughout my career, a multitude of people have asked me what does it take to become a successful developer?

It’s a common question newbies and those looking to switch careers often ask — mostly because they see the potential paycheck. There is also a Hollywood level of coolness attached to working with computers nowadays. Being a programmer or developer is akin to being a doctor or lawyer. There is job security.

But a lot of people who try to enter the profession don’t make it. So what is it that separates those who make it and those who don’t? 

Read full article here

How To Manage an SQL Database

How To Manage an SQL Database

SQL databases come installed with all the commands you need to add, modify, delete, and query your data. This cheat sheet-style guide provides a quick reference to some of the most commonly-used SQL commands.

SQL databases come installed with all the commands you need to add, modify, delete, and query your data. This cheat sheet-style guide provides a quick reference to some of the most commonly-used SQL commands.

An SQL Cheat Sheet


SQL databases come installed with all the commands you need to add, modify, delete, and query your data. This cheat sheet-style guide provides a quick reference to some of the most commonly-used SQL commands.

How to Use This Guide:

  • This guide is in cheat sheet format with self-contained command-line snippets
  • Jump to any section that is relevant to the task you are trying to complete
  • When you see <span class="highlight">highlighted text</span> in this guide’s commands, keep in mind that this text should refer to the columns, tables, and data in your own database.
  • Throughout this guide, the example data values given are all wrapped in apostrophes ('). In SQL, it is necessary to wrap any data values that consist of strings in apostrophes. This isn’t required for numeric data, but it also won’t cause any issues if you do include apostrophes.

Please note that, while SQL is recognized as a standard, most SQL database programs have their own proprietary extensions. This guide uses MySQL as the example relational database management system (RDBMS), but the commands given will work with other relational database programs, including PostgreSQL, MariaDB, and SQLite. Where there are significant differences between RDBMSs, we have included the alternative commands.

Opening up the Database Prompt (using Socket/Trust Authentication)

By default on Ubuntu 18.04, the root MySQL user can authenticate without a password using the following command:

sudo mysql

To open up a PostgreSQL prompt, use the following command. This example will log you in as the postgres user, which is the included superuser role, but you can replace that with any already-created role:

sudo -u postgres psql

Opening up the Database Prompt (using Password Authentication)

If your root MySQL user is set to authenticate with a password, you can do so with the following command:

mysql -u root -p

If you’ve already set up a non-root user account for your database, you can also use this method to log in as that user:

mysql -u user -p

The above command will prompt you for your password after you run it. If you’d like to supply your password as part of the command, immediately follow the -p option with your password, with no space between them:

mysql -u root -ppassword

Creating a Database

The following command creates a database with default settings.

CREATE DATABASE database_name;

If you want your database to use a character set and collation different than the defaults, you can specify those using this syntax:

CREATE DATABASE database_name CHARACTER SET character_set COLLATE collation;

Listing Databases

To see what databases exist in your MySQL or MariaDB installation, run the following command:


In PostgreSQL, you can see what databases have been created with the following command:


Deleting a Database

To delete a database, including any tables and data held within it, run a command that follows this structure:


Creating a User

To create a user profile for your database without specifying any privileges for it, run the following command:

CREATE USER username IDENTIFIED BY 'password';

PostgreSQL uses a similar, but slightly different, syntax:


If you want to create a new user and grant them privileges in one command, you can do so by issuing a GRANT statement. The following command creates a new user and grants them full privileges to every database and table in the RDBMS:

GRANT ALL PRIVILEGES ON *.* TO 'username'@'localhost' IDENTIFIED BY 'password';

Note the PRIVILEGES keyword in this previous GRANT statement. In most RDBMSs, this keyword is optional, and this statement can be equivalently written as:

GRANT ALL ON *.* TO 'username'@'localhost' IDENTIFIED BY 'password';

Be aware, though, that the PRIVILEGES keyword is required for granting privileges like this when Strict SQL mode is turned on.

Deleting a User

Use the following syntax to delete a database user profile:


Note that this command will not by default delete any tables created by the deleted user, and attempts to access such tables may result in errors.

Selecting a Database

Before you can create a table, you first have to tell the RDBMS the database in which you’d like to create it. In MySQL and MariaDB, do so with the following syntax:

USE database;

In PostgreSQL, you must use the following command to select your desired database:

\connect database

Creating a Table

The following command structure creates a new table with the name <span class="highlight">table</span>, and includes two columns, each with their own specific data type:

CREATE TABLE table ( column_1 column_1_data_type, column_2 column_2_data_taype );

Deleting a Table

To delete a table entirely, including all its data, run the following:


Inserting Data into a Table

Use the following syntax to populate a table with one row of data:

INSERT INTO table ( column_A, column_B, column_C ) VALUES ( 'data_A', 'data_B', 'data_C' );

You can also populate a table with multiple rows of data using a single command, like this:

INSERT INTO table ( column_A, column_B, column_C ) VALUES ( 'data_1A', 'data_1B', 'data_1C' ),  ( 'data_2A', 'data_2B', 'data_2C' ), ( 'data_3A', 'data_3B', 'data_3C' );

Deleting Data from a Table

To delete a row of data from a table, use the following command structure. Note that <span class="highlight">value</span> should be the value held in the specified <span class="highlight">column</span> in the row that you want to delete:

DELETE FROM table WHERE column='value';

Note: If you do not include a WHERE clause in a DELETE statement, as in the following example, it will delete all the data held in a table, but not the columns or the table itself:


Changing Data in a Table

Use the following syntax to update the data held in a given row. Note that the WHERE clause at the end of the command tells SQL which row to update. <span class="highlight">value</span> is the value held in <span class="highlight">column_A</span> that aligns with the row you want to change.

Note: If you fail to include a WHERE clause in an UPDATE statement, the command will replace the data held in every row of the table.

UPDATE table SET column_1 = value_1, column_2 = value_2 WHERE column_A=value;

Inserting a Column

The following command syntax will add a new column to a table:

ALTER TABLE table ADD COLUMN column data_type;

Deleting a Column

A command following this structure will delete a column from a table:


Performing Basic Queries

To view all the data from a single column in a table, use the following syntax:

SELECT column FROM table;

To query multiple columns from the same table, separate the column names with a comma:

SELECT column_1, column_2 FROM table;

You can also query every column in a table by replacing the names of the columns with an asterisk (*). In SQL, asterisks act as placeholders to represent “all”:

SELECT * FROM table;

Using WHERE Clauses

You can narrow down the results of a query by appending the SELECT statement with a WHERE clause, like this:

SELECT column FROM table WHERE conditions_that_apply;

For example, you can query all the data from a single row with a syntax like the following. Note that <span class="highlight">value</span> should be a value held in both the specified <span class="highlight">column</span> and the row you want to query:

SELECT * FROM table WHERE column = value;

Working with Comparison Operators

A comparison operator in a WHERE clause defines how the specified column should be compared against the value. Here are some common SQL comparison operators:

Operator What it does = tests for equality != tests for inequality < tests for less-than > tests for greater-than <= tests for less-than or equal-to >= tests for greater-than or equal-to BETWEEN tests whether a value lies within a given range IN tests whether a row’s value is contained in a set of specified values EXISTS tests whether rows exist, given the specified conditions LIKE tests whether a value matches a specified string IS NULL tests for NULL values IS NOT NULL tests for all values other than NULL Working with Wildcards

SQL allows the use of wildcard characters. These are useful if you’re trying to find a specific entry in a table, but aren’t sure of what that entry is exactly.

Asterisks (*) are placeholders that represent “all,” this will query every column in a table:

SELECT * FROM table;

Percentage signs (%) represent zero or more unknown characters.

SELECT * FROM table WHERE column LIKE val%;

Underscores (_) are used to represent a single unknown character:

SELECT * FROM table WHERE column LIKE v_lue;

Counting Entries in a Column

The COUNT function is used to find the number of entries in a given column. The following syntax will return the total number of values held in <span class="highlight">column</span>:

SELECT COUNT(column) FROM table;

You can narrow down the results of a COUNT function by appending a WHERE clause, like this:

SELECT COUNT(column) FROM table WHERE column=value;

Finding the Average Value in a Column

The AVG function is used to find the average (in this case, the mean) amongst values held in a specific column. Note that the AVG function will only work with columns holding numeric values; when used on a column holding string values, it may return an error or 0:

SELECT AVG(column) FROM table;

Finding the Sum of Values in a Column

The SUM function is used to find the sum total of all the numeric values held in a column:

SELECT SUM(column) FROM table;

As with the AVG function, if you run the SUM function on a column holding string values it may return an error or just 0, depending on your RDBMS.

Finding the Largest Value in a Column

To find the largest numeric value in a column or the last value alphabetically, use the MAX function:

SELECT MAX(column) FROM table;

Finding the Smallest Value in a Column

To find the smallest numeric value in a column or the first value alphabetically, use the MIN function:

SELECT MIN(column) FROM table;

Sorting Results with ORDER BY Clauses

An ORDER BY clause is used to sort query results. The following query syntax returns the values from <span class="highlight">column_1</span> and <span class="highlight">column_2</span> and sorts the results by the values held in <span class="highlight">column_1</span> in ascending order or, for string values, in alphabetical order:

SELECT column_1, column_2 FROM table ORDER BY column_1;

To perform the same action, but order the results in descending or reverse alphabetical order, append the query with DESC:

SELECT column_1, column_2 FROM table ORDER BY column_1 DESC;

Sorting Results with GROUP BY Clauses

The GROUP BY clause is similar to the ORDER BY clause, but it is used to sort the results of a query that includes an aggregate function such as COUNT, MAX, MIN, or SUM. On their own, the aggregate functions described in the previous section will only return a single value. However, you can view the results of an aggregate function performed on every matching value in a column by including a GROUP BY clause.

The following syntax will count the number of matching values in <span class="highlight">column_2</span> and group them in ascending or alphabetical order:

SELECT COUNT(column_1), column_2 FROM table GROUP BY column_2;

To perform the same action, but group the results in descending or reverse alphabetical order, append the query with DESC:

SELECT COUNT(column_1), column_2 FROM table GROUP BY column_2 DESC;

Querying Multiple Tables with JOIN Clauses

JOIN clauses are used to create result sets that combine rows from two or more tables. A JOIN clause will only work if the two tables each have a column with an identical name and data type, as in this example:

SELECT table_1.column_1, table_2.column_2 FROM table_1 JOIN table_2 ON table_1.common_column=table_2.common_column;

This is an example of an INNER JOIN clause. An INNER JOIN will return all the records that have matching values in both tables, but won’t show any records that don’t have matching values.

It’s possible to return all the records from one of two tables, including values that do not have a corresponding match in the other table, by using an outer JOIN clause. Outer JOIN clauses are written as either LEFT JOIN or RIGHT JOIN.

A LEFT JOIN clause returns all the records from the “left” table and only the matching records from the “right” table. In the context of outer JOIN clauses, the left table is the one referenced in the FROM clause, and the right table is any other table referenced after the JOIN statement. The following will show every record from <span class="highlight">table_1</span> and only the matching values from <span class="highlight">table_2</span>. Any values that do not have a match in <span class="highlight">table_2</span> will appear as NULL in the result set:

SELECT table_1.column_1, table_2.column_2 FROM table_1 LEFT JOIN table_2 ON table_1.common_column=table_2.common_column;

A RIGHT JOIN clause functions the same as a LEFT JOIN, but it prints the all the results from the right table, and only the matching values from the left:

SELECT table_1.column_1, table_2.column_2 FROM table_1 RIGHT JOIN table_2 ON table_1.common_column=table_2.common_column;

Combining Multiple SELECT Statements with UNION Clauses

A UNION operator is useful for combining the results of two (or more) SELECT statements into a single result set:

SELECT column_1 FROM table UNION SELECT column_2 FROM table;

Additionally, the UNION clause can combine two (or more) SELECT statements querying different tables into the same result set:

SELECT column FROM table_1 UNION SELECT column FROM table_2;


This guide covers some of the more common commands in SQL used to manage databases, users, and tables, and query the contents held in those tables. There are, however, many combinations of clauses and operators that all produce unique result sets. If you’re looking for a more comprehensive guide to working with SQL, we encourage you to check out Oracle’s Database SQL Reference.

Additionally, if there are common SQL commands you’d like to see in this guide, please ask or make suggestions in the comments below.

Learn More

MySQL Databases With Python Tutorial

SQL vs NoSQL or MySQL vs MongoDB

Building Web App using ASP.NET Web API Angular 7 and SQL Server

Learn NoSQL Databases from Scratch - Complete MongoDB Bootcamp 2019

MongoDB with Python Crash Course - Tutorial for Beginners

An Introduction to Queries in PostgreSQL

The Complete SQL Bootcamp

The Complete Oracle SQL Certification Course

SQL for Newbs: Data Analysis for Beginners

The Ultimate MySQL Bootcamp: Go from SQL Beginner to Expert