Building a Service-oriented Architecture with Rails and Kafka

<em>This blog post is adapted from a talk given by Stella Cotton at RailsConf 2018 titled "</em><a href="https://www.youtube.com/watch?v=Rzl4O1oaVy8" target="_blank"><em>So You’ve Got Yourself a Kafka</em></a><em>."</em>


In recent years, designing software as a collection of services, rather than a single, monolithic codebase, has become a popular way to build applications. In this post, we'll learn the basics of Kafka and how its event-driven process can be used to power your Rails services. We’ll also talk about practical considerations and operational challenges that your event-driven Rails services might face around monitoring and scaling.

What is Kafka?

Suppose you want to know more information about how your users are engaged on your platform: the pages they visit, the buttons they click, and so on. A sufficiently popular app could produce billions of events, and sending such a high volume of data to an analytics service could be challenging, to say the least.

Enter Kafka, an integral piece for web applications that require real-time data flow. Kafka provides fault-tolerant communication between producers, which generate events, and consumers, which read those events. There can be multiple producers and consumers in any single app. In Kafka, every event is persisted for a configured length of time, so multiple consumers can read the same event over and over. A Kafka cluster is composed of several brokers, which is just a fancy name for any instance running Kafka.

One of the key performance characteristics of Kafka is that it can process an extremely high throughput of events. Traditional enterprise queuing systems, like AMQP, have the event infrastructure itself keep track of the events that each consumer has processed. As your number of consumers scales up, that infrastructure suffers under a greater load, since it needs to track more and more state. And even establishing an agreement with a consumer is not trivial: should a broker mark a message as "done" once it's sent over the network? What happens if a consumer goes down and needs the broker to re-send an event?

Kafka brokers, on the other hand, don't track their consumers at all. The consumer service itself is in charge of telling Kafka where it is in the event processing stream, and what it wants from Kafka. A consumer can start in the middle, having provided Kafka the offset of a specific event to read, or it can start at the very beginning or even the very end. A consumer reads event data in constant time, O(1): as more events arrive, the amount of time it takes to look up a given event doesn't change.
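To make the offset model concrete, here's a toy partition in plain Ruby (no Kafka involved): the log is an append-only array, the consumer is just an integer offset that it manages itself, and reading any offset is a constant-time array index:

```ruby
# Toy model of a partition: an append-only log addressed by offset.
class ToyPartition
  def initialize
    @log = []
  end

  # Append an event; returns the offset it was written at.
  def append(event)
    @log << event
    @log.size - 1
  end

  # Constant-time lookup by offset, no matter how long the log grows.
  def read(offset)
    @log[offset]
  end

  def earliest_offset
    0
  end

  def latest_offset
    @log.size
  end
end

partition = ToyPartition.new
%w[created charged shipped].each { |e| partition.append(e) }

# The consumer, not the broker, decides where to read from:
first  = partition.read(partition.earliest_offset) # start at the beginning
middle = partition.read(1)                         # or jump to any offset
```

Real Kafka partitions are disk-backed and replicated, but the consumer-owned offset is exactly this simple.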

Kafka also has a scalable and fault-tolerant profile. It runs as a cluster on one or more servers that can be scaled out horizontally by adding more machines. The data itself is written to disk and then replicated across multiple brokers. For a concrete number around what scalable looks like, companies such as Netflix, LinkedIn, and Microsoft all send over a trillion messages per day through their Kafka clusters!

Setting Kafka up in Rails

Heroku provides a Kafka cluster add-on that can be used in any environment. For Ruby apps, we recommend using the ruby-kafka gem for real-world use cases. A bare minimum implementation only requires you to provide hostnames for your brokers:

# config/initializers/kafka_producer.rb
require "kafka"

# Configure the Kafka client with the broker hosts and the Rails logger.
$kafka = Kafka.new(["kafka1:9092", "kafka2:9092"], logger: Rails.logger)

# Set up an asynchronous producer that delivers its buffered messages
# every ten seconds.
$kafka_producer = $kafka.async_producer(
  delivery_interval: 10,
)

# Make sure to shut down the producer when exiting.
at_exit { $kafka_producer.shutdown }

After setting up the Rails initializer, you can start using the gem to send event payloads. Because the producer is asynchronous, events are delivered outside of the web request's execution thread, like this:

class OrdersController < ApplicationController
  def create
    @order = Order.create!(order_params) # order_params: your strong-parameters helper

    $kafka_producer.produce(@order.to_json, topic: "user_event", partition_key: @order.user_id)
  end
end

We'll talk more about Kafka's serialization formats below, but in this scenario, we're using good old JSON. The topic keyword argument refers to the log where Kafka is going to write the event. Topics themselves are divided into partitions, which allow you to "split" the data in a particular topic across multiple brokers for scalability and reliability. It's a good idea to have two or more partitions per topic, so that if one broker goes down, your events can still be written and consumed. Kafka guarantees that events are delivered in order within a partition, but not across a whole topic. If the order of events matters, passing in a partition_key (here, the user's ID) ensures that all events for a given user go to the same partition.
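To see why a partition key pins related events to one partition, here's a sketch of the default partitioning strategy. To the best of my knowledge, ruby-kafka hashes the key with CRC32 modulo the partition count, but treat the exact hash function as an implementation detail:

```ruby
require "zlib"

# Roughly how a default partitioner maps a key to a partition: hash the
# key, then take it modulo the number of partitions in the topic.
def partition_for(partition_key, partition_count)
  Zlib.crc32(partition_key.to_s) % partition_count
end

# The same user ID always hashes to the same partition...
p1 = partition_for(42, 8)
p2 = partition_for(42, 8)
# ...so all of that user's events stay ordered relative to each other.
```

Without a partition key, events are spread across partitions round-robin, so two events for the same user could land on different partitions and be consumed out of order.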

Kafka for your services

Some of the properties that make Kafka valuable for event pipeline systems also make it an interesting, fault-tolerant replacement for RPC between services. Let's use an example of an e-commerce application to illustrate what this means in practice:

def create_order
  create_order_record
  charge_credit_card      # call to Payments Service
  send_confirmation_email # call to Email Service
end

Let's assume that when a user places an order, this create_order method is going to be executed. It'll create an order record, charge the user's credit card, and send out a confirmation email. Those last two steps have been extracted out into services.

One challenge with this setup is that the upstream service is responsible for monitoring the downstream availability. If the email system is having a really bad day, the upstream service is responsible for knowing whether that email service is available. And if it isn't available, it also needs to be in charge of retrying any failing requests. How might Kafka's event stream help in this situation? Let's take a look:

In this event-oriented world, the upstream service can write an event to Kafka indicating that an order was created. Because Kafka has an "at least once" guarantee, the event is going to be written to Kafka at least once, and will be available for a downstream consumer to read. If the email service is down, the event is still persisted for it to consume. When the downstream email service comes back online, it can continue to process the events it missed in sequence.

Another challenge with an RPC-oriented architecture is that, in increasingly complex systems, integrating a new downstream service means also changing an upstream service. Suppose you'd like to integrate a new service that kicks off a fulfillment process when an order is created. In an RPC world, the upstream service would need to add a new API call out to your new fulfillment service. But in an event-oriented world, you would add a new consumer inside the fulfillment service that consumes the order once the event is created inside of Kafka.
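This fan-out can be sketched with a toy in-memory log standing in for a Kafka topic (plain Ruby, not the real ruby-kafka API): each consumer reads independently from its own offset, and adding the fulfillment consumer requires no change to the producer:

```ruby
# Toy topic: one producer, many independent consumers, each tracking its
# own offset into the same shared log.
class ToyTopic
  def initialize
    @log = []
  end

  def publish(event)
    @log << event
  end

  # Each consumer passes in (and advances) its own offset.
  def read_from(offset)
    @log[offset..] || []
  end
end

topic = ToyTopic.new
topic.publish({ event: "order_created", order_id: 98_765 })

# The email service and the new fulfillment service consume independently;
# the upstream producer never changed.
email_events       = topic.read_from(0)
fulfillment_events = topic.read_from(0)
```

Because consumption is pull-based, the producer has no idea how many downstream services exist, which is exactly what makes adding one cheap.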

Incorporating events in a service-oriented architecture

In a blog post titled "What do you mean by 'Event-Driven'?", Martin Fowler discusses the confusion surrounding "event-driven applications." When developers discuss these systems, they can be talking about incredibly different kinds of applications. In an effort to bring a shared understanding to what an event-driven system is, he's started defining a few architectural patterns.

Let's take a quick look at what these patterns are! If you'd like to learn more about them, check out his keynote at GOTO Chicago 2017 that covers these in-depth.

Event Notification

The first pattern Fowler talks about is called Event Notification. In this scenario, one service simply notifies the downstream services that an event happened with the bare minimum of information:

{
  "event": "order_created",
  "published_at": "2016-03-15T16:35:04Z"
}

If a downstream service needs more information about what happened, it will need to make a network call back upstream to retrieve it.
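A minimal sketch of such a handler, with a hypothetical `fetch_order` callable standing in for the network call back upstream:

```ruby
require "json"

# A bare Event Notification handler: the payload names the event but
# carries no order data, so the consumer calls back upstream for details.
# `fetch_order` is a hypothetical stand-in for that upstream API call.
def handle_notification(payload, fetch_order:)
  event = JSON.parse(payload)
  return nil unless event["event"] == "order_created"

  fetch_order.call
end

notification = { event: "order_created", published_at: "2016-03-15T16:35:04Z" }.to_json

upstream_calls = 0
fetcher = lambda do
  upstream_calls += 1
  { "order_id" => 98_765 }
end

order = handle_notification(notification, fetch_order: fetcher)
```

The upside is a tiny, stable payload; the downside is that every interested consumer generates a fresh request against the upstream service.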

Event-Carried State Transfer

The second pattern is called Event-Carried State Transfer. In this design, the upstream service augments the event with additional information, so that a downstream consumer can keep a local copy of that data and not have to make a network call to retrieve it from the upstream service:

{
  "event": "order_created",
  "order": {
    "order_id": 98765,
    "size": "medium",
    "color": "blue"
  },
  "published_at": "2016-03-15T16:35:04Z"
}
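Because the payload carries the order's state, a consumer can maintain a local copy and answer later questions without a network call; a minimal sketch:

```ruby
require "json"

# Event-Carried State Transfer on the consumer side: the event itself
# contains the order data, so we cache it locally and never call upstream.
local_orders = {}

payload = {
  event: "order_created",
  order: { order_id: 98_765, size: "medium", color: "blue" },
  published_at: "2016-03-15T16:35:04Z"
}.to_json

event = JSON.parse(payload)
order = event["order"]
local_orders[order["order_id"]] = order

# Later reads are served entirely from the local copy.
color = local_orders[98_765]["color"]
```

The trade-off is eventual consistency: the local copy is only as fresh as the last event the consumer processed.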

Event-Sourced

A third designation from Fowler is an Event-Sourced architecture. This implementation suggests that not only is each piece of communication between your services kicked off by an event, but that by storing a representation of an event, you could drop all your databases and still completely rebuild the state of your application by replaying that event stream. In other words, each payload encapsulates the exact state of your system at any moment.
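The replay idea can be sketched as a fold over the event stream. The toy events and order states below are made up, but the mechanic is the point: state is never stored directly, it is always recomputed from events:

```ruby
# Event sourcing in miniature: rebuild application state by folding over
# the full event stream, in order.
def rebuild_state(events)
  events.each_with_object({}) do |event, orders|
    case event[:type]
    when "order_created"
      orders[event[:order_id]] = { total: event[:total], status: :open }
    when "order_canceled"
      orders[event[:order_id]][:status] = :canceled
    end
  end
end

events = [
  { type: "order_created",  order_id: 1, total: 25 },
  { type: "order_created",  order_id: 2, total: 40 },
  { type: "order_canceled", order_id: 1 }
]

# "Drop the database" and replay: the same events always yield the same state.
state = rebuild_state(events)
```

Replaying is deterministic only as long as the fold itself doesn't reach out to external services, which is exactly the challenge described next.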

An enormous challenge to this approach is that, over time, code changes. A future API call to a downstream service might return a different set of data than was previously available, which makes it difficult to recalculate the state at that moment.

Command Query Responsibility Segregation

The final pattern mentioned is Command Query Responsibility Segregation, or CQRS. The idea here is that the actions you might need to perform on a record (creating, reading, updating) are split out into separate domains. That means one service is responsible for writing and another is responsible for reading. In event-oriented architectures, you'll often see event systems nestled in the diagrams at the place where commands are actually written.

The writer service reads off of the event stream, processes commands, and stores the results in a write database. Any queries happen against a read-only database. Separating read and write logic into two different services adds complexity, but it allows you to optimize performance for those systems separately.
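A toy sketch of the split, with plain hashes standing in for the write and read databases and a one-line stand-in for the projection that keeps them in sync:

```ruby
# Toy CQRS: commands go through one object against the write store,
# queries go through another against the read store.
class OrderWriter
  def initialize(write_store)
    @write_store = write_store
  end

  # Commands mutate the write store only.
  def create_order(id, attrs)
    @write_store[id] = attrs
  end
end

class OrderReader
  def initialize(read_store)
    @read_store = read_store
  end

  # Queries touch the read store only (synced out of band).
  def find(id)
    @read_store[id]
  end
end

write_store = {}
read_store  = {}

OrderWriter.new(write_store).create_order(1, color: "blue")
# Stand-in for the projection that copies writes into the read store;
# in a real system this would be driven by the event stream.
read_store[1] = write_store[1]

found = OrderReader.new(read_store).find(1)
```

Splitting the stores this way lets you index and scale each side for its own access pattern, at the cost of the read side lagging slightly behind writes.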

Practical considerations

Let's talk about a few practical considerations that you might run into while integrating Kafka into your service-oriented application.

The first thing to consider is slow consumers. In an event-driven system, your services need to be able to process events as quickly as the upstream service produces them. Otherwise, they will slowly drift behind, without any indication that there's a problem, because there won't be any timeouts or call failures. One place where you can spot trouble is the socket connection with the Kafka brokers: if a service isn't processing events fast enough, that connection can time out, and reestablishing it carries an additional cost, since creating those sockets is expensive.

If a consumer is slow, how do you speed it up? For Kafka, you can increase the number of consumers in your consumer group so that you can process more events in parallel. You'll want at least two consumer processes running per service, so that if one goes down, its partitions can be reassigned. Essentially, you can parallelize work across as many consumers as you have topic partitions. (As with any scaling issue, you can't just add consumers forever; eventually you're going to hit scaling limits on shared resources, like databases.)
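The partition-to-consumer relationship can be pictured with a tiny round-robin sketch (plain Ruby, not Kafka's actual rebalancing protocol): partitions are spread over the live consumers, and when one dies its partitions simply move to the survivors:

```ruby
# Toy assignment of topic partitions across a consumer group, round-robin.
def assign(partitions, consumers)
  assignment = Hash.new { |h, k| h[k] = [] }
  partitions.each_with_index do |partition, i|
    assignment[consumers[i % consumers.size]] << partition
  end
  assignment
end

before = assign((0..3).to_a, %w[consumer-a consumer-b])
# consumer-b goes down; on "rebalance" its partitions land on consumer-a.
after  = assign((0..3).to_a, %w[consumer-a])
```

This also shows the ceiling: with four partitions, a fifth consumer in the group would sit idle, which is why partition count bounds your parallelism.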

It's also extremely valuable to have metrics and alerts around how far behind you are from when an event was added to the queue. ruby-kafka is instrumented with ActiveSupport notifications, and it also includes StatsD and Datadog reporters out of the box. You can use these to report on how far your consumers are lagging behind. The ruby-kafka gem even provides a list of recommended metrics to monitor!
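The core metric here is consumer lag: nothing more than the gap between the newest offset in a partition and the offset your consumer has committed. A trivial sketch, with a made-up alert threshold:

```ruby
# Consumer lag = newest offset in the partition minus the consumer's
# committed offset. A lag that keeps growing means the consumer can't
# keep up with the producer.
def consumer_lag(latest_offset, committed_offset)
  latest_offset - committed_offset
end

lag    = consumer_lag(10_500, 10_420)
behind = lag > 50 # hypothetical alert threshold for this sketch
```

Alerting on the trend (lag growing over time) matters more than any single absolute number.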

Another aspect of building systems with Kafka is designing your consumers for failure. Kafka guarantees that each event is delivered at least once; messages are never silently dropped, but they can be delivered more than once. So you need to design consumers to expect duplicated events. One way to do that is to always rely on an UPSERT to add new records to your database: if a record already exists with the same attributes, the call is essentially a no-op. Alternatively, you can include a unique identifier in each event, and simply skip events that have been seen before.
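A minimal sketch of the unique-identifier approach, using an in-memory set of seen event IDs (a real consumer would persist this alongside its data, or lean on an UPSERT instead):

```ruby
require "set"

# Idempotent consumer: track seen event IDs and treat a redelivered
# event as a no-op.
def handle(event, processed, orders)
  # Set#add? returns nil when the ID was already present.
  return :skipped unless processed.add?(event["event_id"])

  orders[event["order_id"]] = event["status"]
  :processed
end

processed = Set.new
orders    = {}
event     = { "event_id" => "abc-123", "order_id" => 1, "status" => "created" }

first  = handle(event, processed, orders)
second = handle(event, processed, orders) # duplicate delivery is a no-op
```

Either technique turns "at least once" delivery into "effectively once" processing from the application's point of view.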

Payload formats

One surprising aspect of Kafka is its very permissive attitude towards data. You can send it arbitrary bytes and it will simply pass them along to consumers without any verification. This makes it extremely flexible, because you don't need to adopt a specific format. But what happens when an upstream service decides to change an event that it produces? If you just change that event payload, there's a really good chance that one of your downstream consumers will break.

Before you begin adopting an event-driven architecture, choose a data format and evaluate how it can help you register schemas and evolve them over time. It's much easier to think about validation and evolution of schemas before you actually implement them.

One format to use is, of course, JSON, the format of the web. It's human-readable and supported in basically every programming language. But there are a few downsides. For one, JSON payloads can be really large: they consist of key-value pairs, which are flexible, but the keys are duplicated across every event. There's no built-in documentation inside a payload, so, given a value, you might not know what it means. Schema evolution is also a challenge, since there's no built-in support for aliasing one key to another if you need to rename a field.

Confluent, the company founded by Apache Kafka's original creators, recommends Avro as a data serialization system. The data is sent in binary, so it's not human-readable, but the upside is much more robust schema support. A full Avro object includes its schema and its data. Avro comes with support for simple types, like integers, and complex ones, like dates. It also embeds documentation into the schema, which helps you understand what a field means in your system. And it provides built-in tools that help you evolve your schemas in backwards-compatible ways over time.
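Real Avro schema resolution is richer than this, but the core idea of reader-side defaults can be sketched in a few lines of plain Ruby (the fields and defaults below are made up for illustration):

```ruby
require "json"

# Toy version of the reader-schema idea Avro formalizes: the reader
# supplies defaults for fields an older writer didn't emit, so old and
# new payloads decode uniformly.
READER_DEFAULTS = { "size" => "medium", "gift_wrap" => false }.freeze

def decode(payload, defaults)
  defaults.merge(JSON.parse(payload))
end

# An old producer that predates the gift_wrap field, and a new one:
old_event = decode('{"order_id": 1, "size": "large"}', READER_DEFAULTS)
new_event = decode('{"order_id": 2, "size": "small", "gift_wrap": true}', READER_DEFAULTS)
```

Avro bakes this into the schema itself (along with aliases for renames), which is what makes backwards-compatible evolution tractable at scale.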

avro-builder is a gem created by Salsify that offers a very Ruby-ish DSL to help you create your schemas. If you're curious to learn more about Avro, the Salsify engineering blog has a really great writeup on Avro and avro-builder!

More information

If you're curious to learn more about how we run our hosted Kafka products, or how we use Kafka internally at Heroku, here are two talks by other Heroku engineers that you can watch!

Jeff Chao's talk at DataEngConf SF '17 was titled "Beyond 50,000 Partitions: How Heroku Operates and Pushes the Limits of Kafka at Scale," while Pavel Pravosud spoke at Dreamforce '16 about "Dogfooding Kafka: How We Built Heroku's Real-Time Platform Event Stream." Enjoy!


By Stella Cotton


Ruby vs Ruby on Rails web framework

Rails is a development tool that gives web developers a framework, providing structure for all the code they write. The Rails framework helps developers build websites and applications. Ruby is a programming language designed to be more powerful than Perl and more object-oriented than Python, with a strong emphasis on programmer productivity.

Ruby on Rails vs PHP

Originally published at https://www.engineyard.com

There’s more than one way to build a web application. No matter what type of application you are trying to create, your programmers have their preferred approach and their preferred code languages to accomplish the task. In the world of web applications, most program developers have to decide between Ruby on Rails versus PHP.

Ruby on Rails consists of Ruby, which is a scripting language, and Rails, which is a web development framework; PHP is a server-side scripting language. Both programming languages have been around since the mid-1990s, but PHP rules the web, while Ruby on Rails is more popular for business web applications. Understanding the pros and cons of Ruby on Rails versus PHP is important when deciding how to create your business-critical applications.

Ruby on Rails Versus PHP at First Glance

Both Ruby on Rails and PHP are open source, so there are no licensing fees. However, because PHP is used to run most of today’s web systems, there are more PHP programmers than Ruby developers, which means there is a larger pool of PHP experts and a larger open source library to draw from.

Part of the reason PHP is more popular with web developers is that it is easier to learn. PHP is also an object-oriented programming language, which makes it easier to be creative and tackle tougher software challenges.

Once web developers master PHP, many of them choose to add Ruby on Rails to their expertise because of the advantages and power that Ruby on Rails offers for business application development. Ruby and Rails were created together to deliver web solutions, and the primary difference between PHP and Ruby on Rails is that Rails requires you to understand the full stack, including the web server, application server, and database engine.

Since both Ruby and PHP are open source, the support of the programming communities is an important differentiator. PHP has more deployments so it has a larger developer community, but the Ruby on Rails community is very skilled and enthusiastic and they want to share, so there is a growing library of ready-to-use Ruby gems.

Differences in Deployment

When it comes to deployment, PHP is very easy to implement. You simply transfer files to the web server via FTP and that’s it. With PHP, you don’t need to worry about the web stack. Most hosting services use a combination of open source for the stack, including Linux, Apache, MySQL, and PHP (LAMP), so once the files are loaded, they just run. That’s the advantage of server-side software.

Ruby on Rails is more complex to deploy because you have to know the full stack. That means knowing the details of the web server (e.g., Apache or Nginx), as well as the database. You have to go through more steps, such as precompiling assets to make sure all the right files are there. This is the price of being able to design and deploy more complex applications.

Where Ruby on Rails really shines is in the software development process itself. Since Ruby is a thoroughly object-oriented language, everything is an object, including classes and modules, and Rails provides an integrated test framework. PHP is not always object-oriented, so coding can be laborious and time-consuming. Applications can be built and tested in Ruby on Rails much faster than in PHP, so even if there is some debugging involved, Ruby on Rails dramatically reduces the time to deployment.

As noted above, PHP applications are relatively simple to deploy since there is no stack to worry about, and they are relatively inexpensive to host. Hosting Ruby on Rails applications is another story. Not all hosting providers will support Ruby on Rails, and those that do usually charge additional à la carte fees because Ruby applications require more services.

The Business Case for Ruby on Rails versus PHP

While it’s clear that Ruby is a more difficult programming language to master, in many ways, it is a more robust language that is better suited for creating business applications. PHP was created specifically for the web, but Ruby on Rails offers much more.

For one thing, Ruby on Rails applications tend to be cleaner and more compact. Because PHP is so simple, it lends itself to sloppy coding that can be impossible to maintain. Ruby has the advantage of being more elegant and concise, and the documentation for Ruby applications tends to be generated with the code so anyone can make revisions or upgrades.

Most importantly, Ruby on Rails lends itself to agile software practices and rapid application development (RAD). Rails is a mature framework that allows programmers to create maintainable software, and it has integrated testing tools that shorten the developer cycle. When you consider the cost of talented programmers (and you know that time is money), reducing development time can mean substantial savings.

Depending on your business development needs, you may be leaning toward PHP or Ruby on Rails. Each has its strengths and weaknesses, but Ruby on Rails continues to gain popularity for business-critical and e-commerce applications because of its versatility, scalability, and upgradability. In the end, you have to consider which language will deliver a cleaner, more stable application that can evolve and grow with your business.
