Running Apache Kafka Efficiently on the Cloud ft. Adithya Chandra

https://cnfl.io/podcast-episode-160 | Focused on optimizing Apache Kafka® performance with maximized efficiency, Confluent’s Product Infrastructure team has been actively exploring opportunities for scaling out Kafka clusters. They are able to run Kafka workloads with half the typical memory usage while saving infrastructure costs, which they have tested and now safely rolled out across Confluent Cloud.

After spending seven years at Amazon Web Services (AWS) working on search services and Amazon Aurora as a software engineer, Adithya Chandra decided to apply his expertise in cluster management, load balancing, elasticity, and performance of search and storage clusters to the Confluent team.

Last year, Confluent shipped Tiered Storage, which moves eligible data from a Kafka broker to remote storage. As most of the data moves to remote storage, we can upgrade to better storage volumes backed by solid-state drives (SSDs). Compared to hard disk drives (HDDs), SSDs offer higher throughput and fast random IO, but they cost more per provisioned gigabyte. Given these strengths, Confluent started investigating whether it was possible to run Kafka with less RAM, which is much more expensive per gigabyte than SSD storage. Cloud instance types with the same CPU but half the memory were 20% cheaper.

In this episode, Adithya covers how to run Kafka more efficiently on Confluent Cloud and dives into the following:
► Memory allocation on an instance running Kafka
► What is a JVM heap? Why should it be sized? How much is enough? What are the downsides of a small heap?
► Memory usage of Datadog, Kubernetes, and other processes, and allocating memory correctly
► What is the ideal page cache size? What is a page cache used for? Are there any parameters that can be tuned? How does Kafka use the page cache?
► Testing via the simulation of a variety of workloads using Trogdor
► High-throughput, high-connection, and high-partition tests and their results
► Available cloud hardware and finding the best fit, including choosing the number of instance types, migrating from one instance to another, and using nodepools to migrate brokers safely, one by one
► What do you do when your preferred hardware is not available? Can you run hybrid Kafka clusters if the preferred instance is not widely available?
► Building infrastructure that allows you to perform testing easily and that can support newer hardware faster (ARM processors, SSDs, etc.)
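
As a rough illustration of the memory-allocation questions in the list above, the sketch below splits an instance's RAM into JVM heap, a reservation for other processes (monitoring agents, Kubernetes components), and whatever remains for the page cache. All names and numbers here (`MemoryBudget`, `plan_memory`, the 6 GiB heap, the 2 GiB overhead, the instance sizes) are hypothetical and are not figures from the episode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryBudget:
    total_gib: float
    heap_gib: float
    other_gib: float

    @property
    def page_cache_gib(self) -> float:
        # Whatever the JVM heap and other processes do not claim is
        # left for the operating system to use as page cache.
        return self.total_gib - self.heap_gib - self.other_gib


def plan_memory(total_gib: float, heap_gib: float, other_gib: float) -> MemoryBudget:
    """Build a budget, rejecting plans that leave no room for page cache."""
    budget = MemoryBudget(total_gib, heap_gib, other_gib)
    if budget.page_cache_gib <= 0:
        raise ValueError("heap and other processes exceed instance memory")
    return budget


# Hypothetical instance sizes: same heap and overhead, half the RAM.
large = plan_memory(total_gib=64, heap_gib=6, other_gib=2)
small = plan_memory(total_gib=32, heap_gib=6, other_gib=2)

print(f"64 GiB instance: {large.page_cache_gib:.0f} GiB page cache")
print(f"32 GiB instance: {small.page_cache_gib:.0f} GiB page cache")
```

Under these assumed numbers, halving the RAM while keeping the heap and overhead fixed shrinks the page cache from 56 GiB to 24 GiB, which is more than half. That is why page cache sizing has to be tested rather than assumed.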

EPISODE LINKS
► Join the Confluent Community: https://www.confluent.io/community/ask-the-community/?utm_source=youtube&utm_medium=podcast&utm_campaign=tm.devx_ch.sa-running-apache-kafka-effciently-on-cloud_content.apache-kafka
► Learn more at Confluent Developer: https://developer.confluent.io/confluent-cloud-demo/?utm_source=youtube&utm_medium=podcast&utm_campaign=tm.devx_ch.sa-running-apache-kafka-effciently-on-cloud_content.apache-kafka
► Kafka streaming in 10 minutes on Confluent Cloud: https://www.confluent.io/online-talks/confluent-cloud-demo/?utm_source=youtube&utm_medium=podcast&utm_campaign=tm.devx_ch.sa-running-apache-kafka-effciently-on-cloud_content.apache-kafka
► Use 60PDCAST to get $60 of free Confluent Cloud: https://www.confluent.io/confluent-cloud/tryfree?utm_source=youtube&utm_medium=podcast&utm_campaign=tm.devx_ch.sa-running-apache-kafka-effciently-on-cloud_content.apache-kafka
► Promo code details: https://www.confluent.io/confluent-cloud-promo-disclaimer/?utm_source=youtube&utm_medium=podcast&utm_campaign=tm.devx_ch.sa-running-apache-kafka-effciently-on-cloud_content.apache-kafka

#kafka #cloud

Adaline Kulas

1594162500

Multi-cloud Spending: 8 Tips To Lower Cost

A multi-cloud approach means using two or more cloud platforms to meet the various business requirements of an enterprise. A multi-cloud IT environment incorporates clouds from multiple vendors and removes the dependence on a single public cloud service provider. Enterprises can thus choose specific services from multiple public clouds and reap the benefits of each.

Given its affordability and agility, most enterprises now opt for a multi-cloud approach. A 2018 survey on the public cloud services market points out that 81% of the respondents use services from two or more providers. The cloud computing services market has accordingly reported strong growth in recent times. According to IDC, the worldwide public cloud services market is set to reach $500 billion in the next four years.

By choosing multi-cloud solutions strategically, enterprises can optimize the benefits of cloud computing and aim for some key competitive advantages. They can avoid the lengthy and cumbersome processes involved in buying, installing, and testing high-priced systems. IaaS and PaaS solutions have become a windfall for enterprise budgets because they do not incur huge up-front capital expenditure.

However, cost optimization remains a challenge in a multi-cloud environment, and many enterprises end up overpaying, with or without realizing it. The tips below help ensure that money is spent wisely on cloud computing services.

  • Deactivate underused or unattached resources

Most organizations get simple things wrong, and these mistakes turn out to be the root cause of needless spending and wasted resources. The first step to cost optimization in your cloud strategy is to identify underutilized resources that you have been paying for.

Enterprises often continue to pay for resources that were purchased earlier but are no longer useful. Identifying such unused and unattached resources and deactivating them on a regular basis brings you one step closer to cost optimization. If needed, you can deploy automated cloud management tools, which provide the analytics needed to optimize cloud spending and cut costs on an ongoing basis.

  • Figure out idle instances

Another key cost optimization strategy is to identify idle computing instances and consolidate them into fewer instances. An idle computing instance may run at a CPU utilization level of only 1-5%, yet the service provider may bill you for 100% of that instance.

Every enterprise has non-production instances that take up unnecessary storage space and lead to overpaying. Regularly re-evaluating your resource allocations and removing unnecessary storage can save significant money. Resource allocation is not only a matter of CPU and memory; it is also linked to storage, network, and various other factors.
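
A minimal sketch of how idle instances might be flagged, assuming you already export average CPU utilization per instance from your monitoring tool. The function name `find_idle_instances`, the 5% threshold (taken from the 1-5% range mentioned above), and the sample data are all illustrative assumptions, not part of any provider's API.

```python
from statistics import mean

# Hypothetical threshold: the text describes idle instances as running
# at roughly 1-5% CPU, so flag anything averaging at or below 5%.
IDLE_CPU_PERCENT = 5.0


def find_idle_instances(cpu_samples: dict[str, list[float]],
                        threshold: float = IDLE_CPU_PERCENT) -> list[str]:
    """Return names of instances whose average CPU is at or below threshold."""
    idle = []
    for name, samples in cpu_samples.items():
        # Instances with no samples are skipped rather than guessed at.
        if samples and mean(samples) <= threshold:
            idle.append(name)
    return sorted(idle)


# Made-up utilization samples (percent CPU) for illustration only.
samples = {
    "web-1": [42.0, 55.5, 61.0],
    "staging-1": [1.5, 2.0, 3.5],
    "batch-old": [0.5, 4.0, 4.5],
}

idle = find_idle_instances(samples)
print(idle)
```

The flagged instances are candidates for consolidation or shutdown, not a verdict; a real review would also consider storage, network, and the other factors mentioned above.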

  • Deploy monitoring mechanisms

The key to efficient cost reduction in cloud computing lies in proactive monitoring. A comprehensive view of cloud usage helps enterprises monitor and minimize unnecessary spending. You can use various mechanisms for monitoring computing demand.

For instance, you can use a heatmap to visualize the highs and lows in computing demand. The heatmap indicates when instances can be started and stopped, which in turn reduces costs. You can also deploy automated tools that schedule instances to start and stop. By following a heatmap, you can judge whether it is safe to shut down servers on holidays or weekends.
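
The heatmap idea can be sketched as a weekday-by-hour grid of average utilization. The example below is one possible way to do it, not a specific tool's method: the helper names `build_heatmap` and `quiet_days`, the 5% threshold, and the sample readings are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def build_heatmap(samples):
    """Average CPU percent per (weekday, hour) cell. Monday is weekday 0."""
    totals = defaultdict(lambda: [0.0, 0])
    for timestamp, cpu in samples:
        cell = totals[(timestamp.weekday(), timestamp.hour)]
        cell[0] += cpu
        cell[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def quiet_days(heatmap, threshold=5.0):
    """Weekdays on which every observed hour stays at or below threshold."""
    by_day = defaultdict(list)
    for (weekday, _hour), avg in heatmap.items():
        by_day[weekday].append(avg)
    return sorted(day for day, values in by_day.items() if max(values) <= threshold)


# Made-up readings: (timestamp, CPU percent).
samples = [
    (datetime(2020, 7, 6, 10), 70.0),   # Monday, busy
    (datetime(2020, 7, 6, 22), 4.0),    # Monday, quiet hour
    (datetime(2020, 7, 11, 10), 2.0),   # Saturday
    (datetime(2020, 7, 11, 15), 3.0),   # Saturday
    (datetime(2020, 7, 12, 10), 1.0),   # Sunday
]

heatmap = build_heatmap(samples)
print(quiet_days(heatmap))  # weekday numbers that look safe to schedule off
```

With these sample readings only Saturday and Sunday (weekdays 5 and 6) qualify, because Monday has one busy hour even though its late evening is quiet. A scheduler could use the per-hour cells instead of whole days for finer-grained start and stop times.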

#cloud computing services #all #hybrid cloud #cloud #multi-cloud strategy #cloud spend #multi-cloud spending #multi cloud adoption #why multi cloud #multi cloud trends #multi cloud companies #multi cloud research #multi cloud market

Adaline Kulas

1594166040

What are the benefits of cloud migration? Reasons you should migrate

Moving applications, databases, and other business elements from a local server to a cloud server is called cloud migration. This article covers migration techniques, requirements, and the benefits of cloud migration.

In simple terms, moving from a local server to a public cloud server is called cloud migration. Gartner reports 17.5% revenue growth tied to cloud migration and also provides a forecast for 2022.

#cloud computing services #cloud migration #all #cloud #cloud migration strategy #enterprise cloud migration strategy #business benefits of cloud migration #key benefits of cloud migration #benefits of cloud migration #types of cloud migration

Shawn Durgan

1621674949

Comparison of Apache Kafka Products and Cloud Services

A comparison of open-source Apache Kafka and vendor offerings, including Confluent, Cloudera, Red Hat, and Amazon MSK. Let’s see how big Kafka really is.

Apache Kafka has become the de facto standard for event streaming. The open-source community is huge. Various vendors have added Kafka and related tooling to their offerings or provide a Kafka cloud service. This blog post uses a car analogy, from the motor engine to the self-driving car, to explore the different Kafka offerings available on the market. I also cover a few other vehicles, meaning (partly) Kafka-compatible technologies. The goal is not a feature-by-feature comparison (that would be outdated the day after publication). Instead, the intention is to educate about the different deployment models, product strategies, and trade-offs of the available options.

What car would you choose

Disclaimer: I work for Confluent. However, the post is not about comparing features but about explaining the concepts behind the alternatives. I talk to enterprises across the globe every week. I can assure you that many people I talk to are unaware of, or misled about, what you will read in the following sections. Hence, I hope that the following helps you make the right decision: you can choose to run open-source Apache Kafka, one of the various commercial Kafka offerings, or even a combination of both.

Apache Kafka Components and Use Cases

The goal is not to introduce Kafka here. The minimum you should know is that Kafka is NOT just a messaging layer for data ingestion into a data lake. This is just a fraction of today’s usages.

Kafka is an open-source framework under the Apache 2.0 license. It combines messaging, storage, processing, and integration of high volumes of data at scale, in real time and with fault tolerance. That’s what makes Kafka unique compared to other MQ, ETL, ESB, and API platforms.

Kafka is deployed in production for various use cases across industries. This includes analytical and mission-critical workloads. Different deployments require different SLAs. You should always ask yourself what happens if the Kafka infrastructure is in trouble. What are your RTO (Recovery Time Objective) and RPO (Recovery Point Objective)? Or in other words: How much data is okay to lose? How much downtime is acceptable? Keep these questions in mind when you start comparing the options for your Kafka projects!

#aws #big-data #cloud #kafka #apache-kafka

Google Cloud: Caching Cloud Storage content with Cloud CDN

In this lab, we will configure Cloud Content Delivery Network (Cloud CDN) for a Cloud Storage bucket and verify caching of an image. Cloud CDN uses Google’s globally distributed edge points of presence to cache HTTP(S) load-balanced content close to our users. Caching content at the edges of Google’s network provides faster delivery of content to our users while reducing serving costs.

For an up-to-date list of Google’s Cloud CDN cache sites, see https://cloud.google.com/cdn/docs/locations.

Task 1. Create and populate a Cloud Storage bucket

Cloud CDN content can originate from different types of backends:

  • Compute Engine virtual machine (VM) instance groups
  • Zonal network endpoint groups (NEGs)
  • Internet network endpoint groups (NEGs), for endpoints that are outside of Google Cloud (also known as custom origins)
  • Google Cloud Storage buckets

In this lab, we will configure a Cloud Storage bucket as the backend.

#google-cloud #google-cloud-platform #cloud #cloud storage #cloud cdn