Architecture and Components of Apache YARN

YARN is an open-source Apache project whose name stands for “Yet Another Resource Negotiator”. It is Hadoop's cluster manager, responsible for allocating resources (such as CPU, memory, disk, and network) and for scheduling and monitoring jobs across the Hadoop cluster. Earlier versions of Hadoop supported only MapReduce workloads on the cluster; the introduction of YARN made it possible to run other big data frameworks, such as Spark, Flink, and Samza, on a Hadoop cluster as well. YARN supports a wide variety of workloads, such as stream processing, batch processing, graph processing, and iterative processing.


Christa Stehr


Angular Architecture Components and Features

Angular is one of the most popular frameworks for developing desktop and mobile client applications. Angular applications are written in HTML and TypeScript, and Angular can be used for cross-platform mobile development via Ionic. Angular implements both core and optional functionality as TypeScript libraries that you import into your application. You should have working knowledge of HTML, CSS, and JavaScript before working with Angular. In this Angular tutorial by DataFlair, we will learn about Angular architecture and its components.

There are three basic building blocks in Angular: components, modules, and routing. An Angular app is a combination of NgModules, since modules are the building blocks of Angular. Components define views, which are sets of screen elements that you can change using data and program logic. Routing is the functionality that links multiple components together.


Architecture of Angular

The Building blocks of Angular Architecture as depicted in the image are:

Architecture of angular

  • Module
  • Template
  • Component
  • Metadata
  • Data Binding
  • Services
  • Directives
  • Dependency Injection

Let us learn each of these Angular Architecture Components in detail now:

1. Module

Angular is a modular platform, and an application may contain one or more Angular modules (NgModules) depending on its needs. The one module that is always present is the root module, conventionally named “AppModule”.

Flow of angular application

NgModule is a decorator function that takes a single metadata object and handles the compilation part of the application. An NgModule works together with other modules: it communicates with them for bootstrapping and operates in a parent-child relationship with them so that the application executes properly.

Here are the properties of NgModule:

Angular NgModule Elaborated

#angular tutorials #angular architecture #angular architecture components #angular architecture working

Serverless Vs Microservices Architecture - A Deep Dive

Companies need to think long-term before even starting a software development project. Long-term needs are addressed at the level of architecture: business owners want to ensure agility, scalability, and performance.

The top contenders for scalable solutions are serverless and microservices. Both architectures prioritize security but approach it in their own ways. Let’s take a look at how businesses can benefit from the adoption of serverless architecture vs microservices, examine their differences, advantages, and use cases.

#serverless #microservices #architecture #software-architecture #serverless-architecture #microservice-architecture #serverless-vs-microservices #hackernoon-top-story


Sheldon Grant


Apache Pulsar Architecture and Benefits

Introduction to Apache Pulsar

Apache Pulsar is a multi-tenant, high-performance server-to-server messaging system. It was developed at Yahoo and open-sourced in late 2016, then entered incubation at the Apache Software Foundation (ASF), where it has since become a top-level project. Pulsar follows the pub-sub pattern, with producers and consumers (also called subscribers). The topic is the core of the pub-sub model: producers publish messages on a given Pulsar topic, and consumers subscribe to a topic to receive messages from it and send an acknowledgement.

Once a subscription has been created, Pulsar retains all of its messages. A message is deleted only after a consumer acknowledges that it has been processed.

Apache Pulsar Topics

Topics: named channels for transmitting messages from producers to consumers. Topic names are well-defined URLs.

Namespaces: A namespace is a logical grouping within a tenant. A tenant can create multiple namespaces via the admin API. A namespace allows an application to create and manage a hierarchy of topics, and any number of topics can be created under a namespace.
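Topic names follow a URL-like form, {persistent|non-persistent}://tenant/namespace/topic, which ties together the tenant, namespace, and topic described above. The sketch below splits such a name into its parts using plain Python only; it does not use the Pulsar client, and `TopicName` and `parse_topic_name` are hypothetical names chosen for illustration.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str      # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    topic: str


def parse_topic_name(name: str) -> TopicName:
    """Split a fully qualified topic name into its four parts."""
    domain, sep, rest = name.partition("://")
    if not sep or domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unknown or missing domain in {name!r}")
    # Split at most twice so the topic part may itself contain slashes.
    parts = rest.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic in {name!r}")
    tenant, namespace, topic = parts
    return TopicName(domain, tenant, namespace, topic)
```

For example, `parse_topic_name("persistent://my-tenant/my-namespace/my-topic")` yields a tuple whose `tenant` field is `"my-tenant"`, while a bare name such as `"my-topic"` is rejected by this sketch.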

Apache Pulsar Subscription Modes

A subscription is a named configuration rule that determines how messages are delivered to consumers. Three subscription modes in Apache Pulsar are described here:


Apache Pulsar Subscription Mode Exclusive

In Exclusive mode, only a single consumer is allowed to attach to the subscription. If more than one consumer attempts to subscribe to a topic using the same subscription, the later consumer receives an error. Exclusive is the default subscription mode.


Apache Pulsar Subscription Failover

In failover mode, multiple consumers can attach to the same subscription. The consumers are sorted lexically by name, and the first consumer is the master consumer, which receives all the messages. When the master consumer disconnects, the next consumer in line receives the messages.


Apache Pulsar Subscription Mode Shared

In shared, or round-robin, mode, multiple consumers can attach to the same subscription. Messages are delivered to the consumers in a round-robin manner, and any given message is delivered to only one consumer. When a consumer disconnects, the messages that were sent to it but not acknowledged are rescheduled to other consumers. Limitations of shared mode:

  • Message ordering is not guaranteed.
  • You can’t use cumulative acknowledgement with shared mode.
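The three subscription modes can be compared side by side with a small simulation. This is a toy model in plain Python, not the Pulsar client API; the `Subscription` class and its methods are hypothetical, and acknowledgement and redelivery are left out to keep the sketch short.

```python
class Subscription:
    """Toy model of one subscription and its attached consumers."""

    def __init__(self, mode):
        if mode not in ("exclusive", "failover", "shared"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.consumers = []
        self._next = 0  # round-robin cursor, used only in shared mode

    def attach(self, consumer):
        # Exclusive mode rejects a second consumer on the same subscription.
        if self.mode == "exclusive" and self.consumers:
            raise RuntimeError("subscription already has a consumer")
        self.consumers.append(consumer)

    def detach(self, consumer):
        self.consumers.remove(consumer)

    def dispatch(self, message):
        """Return the name of the consumer that receives `message`."""
        if not self.consumers:
            raise RuntimeError("no consumer attached")
        if self.mode == "shared":
            # Each message goes to exactly one consumer, in rotation.
            target = self.consumers[self._next % len(self.consumers)]
            self._next += 1
            return target
        if self.mode == "failover":
            # The master is the first consumer in lexical order of names.
            return min(self.consumers)
        return self.consumers[0]
```

In this model, detaching the master of a failover subscription makes the next name in lexical order the new master, which mirrors the takeover behaviour described above.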


Routing Modes

The routing mode determines which partition of a topic a message is published to. There are three routing modes. Routing is necessary when publishing to partitioned topics.

Round Robin Partition 

If no key is provided, the producer publishes messages across all available partitions in round-robin fashion to achieve maximum throughput. Round-robin is not applied per individual message; instead, the partition changes at the boundary of the batching delay, which ensures effective batching. If a key is specified on the message, the partitioned producer hashes the key and assigns the message to a particular partition. This is the default mode.

Single Partition

If no key is provided, the producer randomly picks a single partition and publishes all messages to that partition. If a key is specified on the message, the partitioned producer hashes the key and assigns the message to a particular partition.

Custom Partition

Users can create a custom routing mode by using the Java client and implementing the MessageRouter interface. The custom router is called for each message to choose its partition.
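The routing rules above can be sketched as two small router classes. This is an illustrative model in plain Python, not the Java MessageRouter interface: the class names and the `choose` method are hypothetical, CRC32 stands in for whatever hash a real client uses, and round-robin switches partition on every message rather than at the batching boundary.

```python
import random
import zlib


def key_partition(key, num_partitions):
    # Stand-in hash (CRC32); the hash a real client uses may differ.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class RoundRobinRouter:
    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._counter = 0

    def choose(self, key=None):
        if key is not None:
            # Keyed messages always land on the partition their key hashes to.
            return key_partition(key, self.num_partitions)
        partition = self._counter % self.num_partitions
        self._counter += 1
        return partition


class SinglePartitionRouter:
    def __init__(self, num_partitions, rng=None):
        self.num_partitions = num_partitions
        rng = rng or random.Random()
        # Pick one partition at random and keep it for all unkeyed messages.
        self._fixed = rng.randrange(num_partitions)

    def choose(self, key=None):
        if key is not None:
            return key_partition(key, self.num_partitions)
        return self._fixed
```

A custom routing mode would, in this model, be any other object with a `choose` method that returns a partition index for each message.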

Apache Pulsar Architecture

A Pulsar cluster consists of the following parts:

  • One or more brokers, which handle and load-balance incoming messages from producers, dispatch messages to consumers, communicate with the Pulsar configuration store to handle various coordination tasks, and store messages in BookKeeper instances.
  • A BookKeeper cluster consisting of one or more bookies, which handles persistent storage of messages.
  • A ZooKeeper cluster, which serves as the configuration store and handles coordination tasks that involve multiple clusters.


The broker is a stateless component that runs an HTTP server and a dispatcher. The HTTP server exposes a REST API for both administrative tasks and topic lookup for producers and consumers. The dispatcher is an asynchronous TCP server that uses a custom binary protocol for all data transfers.


A Pulsar instance consists of one or more Pulsar clusters. Each cluster consists of one or more brokers, a ZooKeeper quorum used for cluster-level configuration and coordination, and an ensemble of bookies used for persistent storage of messages.

Metadata store

Pulsar uses Apache ZooKeeper for metadata storage, cluster configuration, and coordination.

Persistent storage

Pulsar guarantees message delivery: if a message reaches a Pulsar broker successfully, it will be delivered to its intended target.

Pulsar Clients

Pulsar has client APIs for Java, Go, Python, and C++. The client API encapsulates and optimizes Pulsar’s client-broker communication protocol and exposes a simple, intuitive API for applications. The current official Pulsar client libraries support transparent reconnection and connection failover to brokers, queuing of messages until they are acknowledged by the broker, and heuristics such as connection retries with backoff.

Client setup phase

When an application wants to create a producer or consumer, the Pulsar client library initiates a setup phase that is composed of two steps:

  1. The client attempts to determine the owner of the topic by sending an HTTP lookup request to a broker. The request can reach any active broker, which consults the cached ZooKeeper metadata to tell the client which broker is serving the topic or, if no broker is serving it, assigns the topic to the least loaded broker.
  2. Once the client library has the broker address, it creates a TCP connection (or reuses an existing connection from the pool) and authenticates it. Within this connection, the client and broker exchange binary commands from the custom protocol. At this point, the client sends a command to create a consumer or producer to the broker, which complies after validating the authorization policy.
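The two setup steps can be sketched as a lookup followed by a pooled connection. The model below is plain Python with hypothetical `Cluster` and `Client` classes; it skips HTTP, the binary protocol, authentication, and authorization, and measures broker load simply as the number of topics a broker serves, which is an assumption made for the example.

```python
class Cluster:
    """Toy topic-ownership table standing in for broker metadata."""

    def __init__(self, brokers):
        self.load = {b: 0 for b in brokers}   # broker -> topics served
        self.owner = {}                       # topic -> owning broker

    def lookup(self, topic):
        """Step 1: return the owning broker, assigning one if needed."""
        if topic not in self.owner:
            # Unowned topics go to the least loaded broker
            # (the first such broker when loads are tied).
            broker = min(self.load, key=self.load.get)
            self.owner[topic] = broker
            self.load[broker] += 1
        return self.owner[topic]


class Client:
    def __init__(self, cluster):
        self.cluster = cluster
        self.connections = {}   # broker -> pooled connection record

    def create_producer(self, topic):
        broker = self.cluster.lookup(topic)   # step 1: topic lookup
        # Step 2: reuse a pooled connection to that broker or open a new one.
        conn = self.connections.setdefault(
            broker, {"broker": broker, "producers": []}
        )
        conn["producers"].append(topic)
        return broker
```

Creating producers for two new topics on a two-broker cluster spreads them across both brokers in this model, and a second producer on an already owned topic reuses the existing connection.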


Apache Pulsar’s geo-replication enables messages produced in one geolocation to be consumed in another. In the diagram above, whenever producers P1, P2, and P3 publish a message to topic T1 on clusters A, B, and C respectively, the messages are immediately replicated across the clusters. Once replicated, consumers C1 and C2 can consume the messages from their respective clusters. Without geo-replication, consumers C1 and C2 would not be able to consume messages published by producer P3.


Pulsar was created from the ground up as a multi-tenant system. To support multi-tenancy, Pulsar has the concept of tenants. Tenants can be spread across clusters, and each can have its own authentication and authorization scheme applied to it. Tenants are also the administrative unit at which storage, message TTL, and isolation policies can be managed.


To each tenant in a particular Pulsar instance you can assign:

  • An authorization scheme.
  • The set of clusters to which the tenant’s configuration applies.


Authentication and Authorization

Pulsar supports authentication mechanisms that can be configured at the broker to identify clients, and it supports authorization to determine a client's access rights on topics and tenants.

Tiered Storage

Pulsar’s architecture allows topic backlogs to grow very large, which becomes expensive over time. One way to alleviate this cost is Tiered Storage, which moves older messages in the backlog from BookKeeper to cheaper storage. Clients can still access the older backlog.
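The idea of tiered storage can be illustrated with a backlog that keeps only its newest messages in a hot tier. This is a simplified sketch in plain Python: `TopicBacklog` and `hot_limit` are hypothetical names, the two lists merely stand in for BookKeeper and cheaper storage, and moving one message at a time is a simplification chosen for brevity.

```python
class TopicBacklog:
    def __init__(self, hot_limit):
        self.hot_limit = hot_limit   # max messages kept in the hot tier
        self.hot = []                # stands in for BookKeeper
        self.cold = []               # stands in for cheaper storage

    def append(self, message):
        self.hot.append(message)
        # Move the oldest messages out once the hot tier exceeds its limit.
        while len(self.hot) > self.hot_limit:
            self.cold.append(self.hot.pop(0))

    def read_all(self):
        """Readers see one continuous backlog, oldest message first."""
        return self.cold + self.hot
```

After five appends with `hot_limit=2`, the three oldest messages sit in the cold tier, yet `read_all()` still returns all five in order, which is the property that lets clients keep reading older backlog.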

Schema Registry

Type safety is paramount in communication between producers and consumers. There are two basic approaches to type safety in messaging:

Client-side approach

In this approach message producers and consumers are responsible for not only serializing and deserializing messages (which consist of raw bytes) but also “knowing” which types are being transmitted via which topics. 

Server-side approach

In this approach, producers and consumers inform the system which data types can be transmitted via the topic. The messaging system then enforces type safety and ensures that producers and consumers remain in sync.

How do schemas work?

Pulsar schemas are applied and enforced at the topic level. Producers and consumers upload schemas to Pulsar. A Pulsar schema consists of:

  • Name: the topic to which the schema is applied.
  • Payload: binary representation of the schema.
  • User-defined properties as a string/string map

It supports the following schema formats:

  • JSON
  • Protobuf
  • Avro
  • String (used for UTF-8-encoded strings)

If no schema is defined, producers and consumers handle raw bytes.
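Topic-level enforcement can be sketched with a small in-memory registry that stores the three schema parts listed above. This is a toy model, not Pulsar's schema registry API: `SchemaRegistry` is a hypothetical class, and rejecting every differing schema is the strictest possible policy, assumed here only to keep the example short.

```python
class SchemaRegistry:
    """Toy topic-level schema check."""

    def __init__(self):
        self._schemas = {}   # topic name -> (payload bytes, properties)

    def upload(self, topic, payload, properties=None):
        existing = self._schemas.get(topic)
        if existing is not None and existing[0] != payload:
            # A client with a different schema is refused on this topic.
            raise ValueError(f"schema mismatch on topic {topic!r}")
        if existing is None:
            self._schemas[topic] = (payload, dict(properties or {}))

    def schema_for(self, topic):
        # None means no schema: producers and consumers handle raw bytes.
        return self._schemas.get(topic)
```

A producer and a consumer that upload the same payload for a topic both succeed, while a client that uploads a different payload for that topic is rejected.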

What are the Pros and Cons?

The pros and cons of Apache Pulsar are described below:

Pros

  • Feature-rich: persistent and non-persistent topics.
  • Multi-tenancy.
  • More flexible client API, including CompletableFutures and a fluent interface.
  • Java client components are thread-safe: a consumer can acknowledge messages from different threads.

Cons

  • Java clients have little to no Javadoc to date.
  • Community base is small.
  • A reader can’t read the last message in a topic directly; it needs to skim through all the messages.
  • Higher operational complexity: ZooKeeper, broker nodes, and BookKeeper, all clustered.

Apache Pulsar Multi-Layered Architecture

Pulsar multilayered Architecture

Difference between Apache Kafka and Apache Pulsar

S.No. | Kafka | Apache Pulsar
1 | More mature, with higher-level APIs. | Incorporates improved design ideas from Kafka along with Kafka's existing capabilities.
2 | Built on top of Kafka Streams. | Unified messaging model and API: streaming via exclusive and failover subscriptions, queuing via shared subscriptions.
3 | Producer-topic-consumer group-consumer | Producer-topic-subscription-consumer
4 | Restricts fluidity and flexibility. | Provides fluidity and flexibility.
5 | Messages are deleted based on retention. If a consumer doesn’t read messages before the retention period ends, it loses data. | Messages are deleted only after all subscriptions have consumed them, so there is no data loss even if the consumers of a subscription are down for a long time. Messages can also be kept for a configured retention period even after all subscriptions have consumed them.
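The Pulsar deletion rule in the last row can be sketched as follows. This is a toy model in plain Python with a hypothetical `Topic` class; it ignores the optional retention period and shows only that a message survives until every subscription has acknowledged it.

```python
class Topic:
    def __init__(self, subscriptions):
        self.subscriptions = set(subscriptions)
        self.messages = {}    # message id -> payload
        self.pending = {}     # message id -> subscriptions yet to ack
        self._next_id = 0

    def publish(self, payload):
        msg_id = self._next_id
        self._next_id += 1
        self.messages[msg_id] = payload
        self.pending[msg_id] = set(self.subscriptions)
        return msg_id

    def ack(self, subscription, msg_id):
        waiting = self.pending[msg_id]
        waiting.discard(subscription)
        if not waiting:
            # Every subscription has acknowledged: the message can go.
            del self.messages[msg_id]
            del self.pending[msg_id]
```

With two subscriptions, a message remains stored after the first acknowledgement and is removed only after the second.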

Drawbacks of Kafka

  1. High Latency
  2. Poor Scalability
  3. Difficulty supporting global architecture (fulfilled by pulsar with the help of geo-replication)
  4. High OpEx (operational expenditure)

How Apache Pulsar is better than Kafka

  1. Pulsar has shown notable improvements in both latency and throughput when compared with Kafka. Pulsar is approximately 2.5 times faster and has 40% less lag than Kafka.
  2. Kafka, in many scenarios, does not cope well with thousands of topics and partitions, even if the data volume is not massive. Pulsar, by contrast, is designed to serve hundreds of thousands of topics in a deployed cluster.
  3. Kafka brokers store data and logs in dedicated files and directories, which creates trouble at scaling time (files are loaded to disk periodically). In contrast, scaling is much easier with Pulsar, because its brokers are stateless and it uses bookies to store data.
  4. Kafka brokers are designed to work together in a single region of the network, so working with a multi-datacentre architecture is not easy. Pulsar, by contrast, offers geo-replication, with which users can easily replicate their data synchronously or asynchronously among any number of clusters.
  5. Multi-tenancy is a feature of great use, as it provides separately defined tenants that are specific to the needs of a particular client or organization. In layman's terms, it is like describing a set of properties so that each property satisfies the needs of the specific group of clients or consumers using it.

Even though it looks like Kafka lags behind Pulsar, Kafka Improvement Proposals (KIPs) cover almost all of these drawbacks in their discussions, and users can hope to see the changes in upcoming versions of Kafka.

Kafka to Pulsar: Users can easily migrate from Kafka to Pulsar, as Pulsar natively supports working directly with Kafka data through the connectors provided, or one can import Kafka application data into Pulsar quite easily.

Pulsar SQL uses Presto to query old messages that are kept in the backlog (Apache BookKeeper).


Apache Pulsar is a powerful stream-processing platform that has learned from previously existing systems. It has a layered architecture complemented by a number of out-of-the-box features such as multi-tenancy, zero rebalancing downtime, geo-replication, proxy, durability, and TLS-based authentication/authorization. Compared to other platforms, Pulsar offers a broader set of built-in capabilities.


#kafka #apache #architecture #benefits 

Apache Hbase Security with Kerberos and Architecture

Apache HBase is a column-oriented NoSQL database. It may look similar to a relational database, but it stores data in a column-oriented fashion. It is an open-source, distributed, multi-dimensional database written in Java. HBase provides BigTable-like capabilities and runs on top of HDFS (Hadoop Distributed File System). When you need fast, random access to data, HBase is a good choice, as it provides high throughput and low latency on read/write operations. Apache HBase stores keys and values, where each key points to a value that can be a byte array or a string. Large data sets can thus be stored in HBase, and the stored data can be shared.

#insights #apache #architecture