Kafka Connect 101: Deployment

Kafka Connect runs in its own process, separate from the Kafka brokers; this process is known as a worker. Typically, workers are deployed on separate hosts from the Apache Kafka® brokers. Watch this video to learn how work is performed in Kafka Connect and about the different deployment modes available for Kafka Connect workers.
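A worker can run in standalone mode (a single process, with connector configs passed on the command line) or distributed mode (several workers form a cluster and connectors are submitted over REST). Here is a minimal sketch using the scripts that ship with the Apache Kafka distribution; the property file names and the connector config are illustrative assumptions:

```shell
# Standalone mode: one worker process, connector configs given as files.
bin/connect-standalone.sh config/connect-standalone.properties \
  config/my-source-connector.properties

# Distributed mode: start one worker per host with the same group.id;
# connectors are then created through the worker's REST API (port 8083 by default).
bin/connect-distributed.sh config/connect-distributed.properties

# Example: submit a connector to a distributed worker (hypothetical config).
curl -X POST -H "Content-Type: application/json" \
  --data '{"name":"my-source","config":{"connector.class":"FileStreamSource","file":"/tmp/in.txt","topic":"demo"}}' \
  http://localhost:8083/connectors
```

In distributed mode, the workers share configuration and offsets through internal Kafka topics, so tasks fail over automatically if a worker dies.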



PostgreSQL Connection Pooling: Part 4 – PgBouncer vs. Pgpool-II

In our previous posts in this series, we spoke at length about using PgBouncer and Pgpool-II, the connection pool architecture, and the pros and cons of leveraging one for your PostgreSQL deployment. In our final post, we will put them head-to-head in a detailed feature comparison and compare the results of PgBouncer vs. Pgpool-II performance for your PostgreSQL hosting!

The bottom line – Pgpool-II is a great tool if you need load-balancing and high availability. Connection pooling is almost a bonus you get alongside. PgBouncer does only one thing, but does it really well. If the objective is to limit the number of connections and reduce resource consumption, PgBouncer wins hands down.

It is also perfectly fine to use both PgBouncer and Pgpool-II in a chain – you can have a PgBouncer to provide connection pooling, which talks to a Pgpool-II instance that provides high availability and load balancing. This gives you the best of both worlds!
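For example, a minimal chained setup might point PgBouncer at the Pgpool-II listener instead of at PostgreSQL directly. The hostnames, ports, and paths below are assumptions for illustration, not from the article:

```shell
# Sketch: applications -> PgBouncer (pooling) -> Pgpool-II (HA/load balancing) -> PostgreSQL.
cat > /etc/pgbouncer/pgbouncer.ini <<'EOF'
[databases]
; point PgBouncer at the Pgpool-II listener, not at PostgreSQL itself
appdb = host=pgpool-host port=9999 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
pool_mode   = transaction
auth_type   = md5
auth_file   = /etc/pgbouncer/userlist.txt
EOF

# Applications then connect to PgBouncer on port 6432:
# psql "host=pgbouncer-host port=6432 dbname=appdb user=app"
```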

Using PgBouncer with Pgpool-II - Connection Pooling Diagram



Performance Testing

While PgBouncer may seem to be the better option in theory, theory can often be misleading. So, we pitted the two connection poolers head-to-head, using the standard pgbench tool, to see which one provides better transactions per second throughput through a benchmark test. For good measure, we ran the same tests without a connection pooler too.

Testing Conditions

All of the PostgreSQL benchmark tests were run under the following conditions:

  1. Initialized pgbench using a scale factor of 100.
  2. Disabled auto-vacuuming on the PostgreSQL instance to prevent interference.
  3. No other workload was running at the time.
  4. Used the default pgbench script to run the tests.
  5. Used default settings for both PgBouncer and Pgpool-II, except max_children. All PostgreSQL limits were also set to their defaults.
  6. All tests ran as a single thread, on a single-CPU, 2-core machine, for a duration of 5 minutes.
  7. Forced pgbench to create a new connection for each transaction using the -C option. This emulates modern web application workloads and is the whole reason to use a pooler!
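The conditions above can be sketched as pgbench invocations. The host, user, database name, and client count are placeholders (the article runs the test across a range of client counts):

```shell
# 1. Initialize pgbench with a scale factor of 100.
pgbench -i -s 100 -h pg-host -U postgres benchdb

# 4-7. Run the default script for 5 minutes (-T 300) on a single thread (-j 1),
#      forcing a new connection per transaction (-C), here with 20 clients.
pgbench -C -j 1 -T 300 -c 20 -h pg-host -U postgres benchdb
```

To test a pooler rather than the server directly, the same run is pointed at the PgBouncer or Pgpool-II port instead of the PostgreSQL port.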

We ran each iteration for 5 minutes to ensure any noise averaged out. Here is how the middleware was installed:

  • For PgBouncer, we installed it on the same box as the PostgreSQL server(s). This is the configuration we use in our managed PostgreSQL clusters. Since PgBouncer is a very light-weight process, installing it on the box has no impact on overall performance.
  • For Pgpool-II, we tested both when the Pgpool-II instance was installed on the same machine as PostgreSQL (on box column), and when it was installed on a different machine (off box column). As expected, the performance is much better when Pgpool-II is off the box as it doesn’t have to compete with the PostgreSQL server for resources.

Throughput Benchmark

Here are the transactions per second (TPS) results for each scenario across a range of client counts:

#database #developer #performance #postgresql #connection control #connection pooler #connection pooler performance #connection queue #high availability #load balancing #number of connections #performance testing #pgbench #pgbouncer #pgbouncer and pgpool-ii #pgbouncer vs pgpool #pgpool-ii #pooling modes #postgresql connection pooling #postgresql limits #resource consumption #throughput benchmark #transactions per second #without pooling

Ruth Nabimanya


Tips About Kafka Connect On Heroku You Can't Afford To Miss


With ever-increasing demands from other business units, IT departments have to be constantly looking for service improvements and cost-saving opportunities. This article showcases several concrete use-cases for companies that are investigating or already using Kafka, in particular, Kafka Connect.

Kafka Connect is an enterprise-grade solution for integrating a plethora of applications, ranging from traditional databases to business applications like Salesforce and SAP. Possible integration scenarios range from continuously streaming events and data between applications to large-scale, configurable batch jobs that can be used to replace manual data transfers.

#kafka-connect #kafka #heroku #database #database-architecture #apache-kafka #tutorial #cluster

akshay L


Kafka Spark Streaming | Kafka Tutorial

In this Kafka Spark streaming tutorial you will learn what Apache Kafka is, the architecture of Apache Kafka and how to set up a Kafka cluster, what Spark is and its features, the components of Spark, plus a hands-on demo of integrating Spark Streaming with Apache Kafka and integrating Spark Flume with Apache Kafka.

#kafka-spark-streaming #kafka-tutorial #kafka-training #kafka-course #intellipaat

Salma Mateos


Miscellaneous ways of Installation of Kafka on Ubuntu 18.04

I have kept this blog as short as possible, covering two commonly used ways to run Kafka producer-consumer processes on a single-node, single-broker architecture, with the minimal features required to keep it simple. The prerequisites are:


  1. Ubuntu 18.04 server and a non-root user with sudo privileges.
  2. At least 4 GB of RAM is required on the server. Installing with less may cause the Kafka service to fail, with the Java Virtual Machine (JVM) throwing an "Out of Memory" exception during startup. Even when using Docker, make sure the host machine has more than 4 GB of RAM (8 GB advisable), as Kafka will consume a large part of it.
  3. OpenJDK 8 or 11 should be installed on the server. Kafka is written in Java, so it requires a JVM; however, to use the Confluent Open Source binaries you need Java 8.

**Remember**: while using sudo one must remember the root password.
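One common way to satisfy these prerequisites and start a single broker looks like this. The Kafka version, download URL, and paths are assumptions for illustration:

```shell
# Install a JVM (Java 8 here, for Confluent Open Source compatibility).
sudo apt-get update
sudo apt-get install -y openjdk-8-jdk

# Download and unpack an Apache Kafka release.
curl -O https://archive.apache.org/dist/kafka/2.4.1/kafka_2.12-2.4.1.tgz
tar -xzf kafka_2.12-2.4.1.tgz && cd kafka_2.12-2.4.1

# Start ZooKeeper, then the single Kafka broker.
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
bin/kafka-server-start.sh -daemon config/server.properties
```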

#kafka #kafka-installation #data-streaming #big-data-streaming #kafka-connect #ubuntu 18.04

Loma Baumbach


Choosing Right Partition Count & Replication Factor (Apache Kafka)

A Small Introduction to Kafka!

So before we learn about Kafka, let's look at how companies start. At first, it's super simple: you have a source system and a target system, and you need to exchange data between them. That looks quite simple, right? Then, after a while, you have many source systems and many target systems, and they all have to exchange data with one another, and things become really complicated. The problems with such an architecture are:

  • Each integration comes with difficulties around:

    ◦ Protocol — how the data is transported (TCP, HTTP, REST, FTP)

    ◦ Data format — how the data is parsed (Binary, CSV, JSON, Avro)

    ◦ Data schema & evolution — how the data is shaped and may change

  • Additionally, each time you integrate a source system with a target system, there will be an increased load from the connections


So how do we solve this? Well, this is where Apache Kafka comes in. Apache Kafka allows you to decouple your data streams and your systems: your source systems publish their data to Apache Kafka, while your target systems source their data straight from Apache Kafka. This decoupling is what is so good about Apache Kafka, and what it enables is really nice. And what can that data be? Any data stream: website events, pricing data, financial transactions, user interactions, and many more.
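Concretely, each data stream becomes a Kafka topic, and the partition count and replication factor this article is about are chosen when the topic is created. A minimal sketch; the topic name, counts, and bootstrap address are placeholder assumptions:

```shell
# Create a topic with an explicit partition count and replication factor.
# More partitions -> more consumer parallelism; replication factor -> fault tolerance.
bin/kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic website-events \
  --partitions 6 \
  --replication-factor 3
```

Note that the replication factor can never exceed the number of brokers in the cluster.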

#kafka-streams #devops #kafka #kafka-connect #best-practices