Bongani Ngema


NewsFeed: A Localized News Reader Android App Powered By NewsAPI

About News Feed

News Feed is a news app powered by NewsAPI that shows the latest news, categorized based on the user's location.

  • News is loaded based on the user's locality.
  • Clean and Simple Material UI.
  • Supports dark theme.

The app architecture is MVVM.

Libraries used


Open the project in Android Studio

git clone
  • Create and place your NewsAPI key in .
  • Wait for the project to finish building, and happy coding.


News Feed can be downloaded from our releases section.

Download Details:

Author: KevinGitonga
Source Code: 
License: Apache-2.0 license

#kotlin #news #mvvm #clean #architecture 

Lawrence Lesch


The Simple But Very Powerful & Incredibly Fast State Management


The most straightforward, extensible, and incredibly fast state management library, based on the React state hook.


Hookstate is a modern alternative to Redux, Mobx, Recoil, etc. It is simple to learn, easy to use, extensible, very flexible, and capable of addressing all the state management needs of large, scalable applications. It has impressive performance and predictable behavior.
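Hookstate's core idea, a state handle that can be read, written, and subscribed to from anywhere, can be sketched in plain TypeScript. The snippet below is an illustrative mimic, not Hookstate's actual implementation; see the docs for the real `hookstate`/`useHookstate` API:

```typescript
// Minimal mimic of a global-state handle: just get/set/subscribe,
// which is the core idea behind a hook-based global store.
type Listener<T> = (value: T) => void;

function createState<T>(initial: T) {
  let value = initial;
  const listeners = new Set<Listener<T>>();
  return {
    get: () => value,
    set: (next: T | ((prev: T) => T)) => {
      value = typeof next === "function" ? (next as (prev: T) => T)(value) : next;
      listeners.forEach((l) => l(value)); // notify every subscriber
    },
    subscribe: (l: Listener<T>) => {
      listeners.add(l);
      return () => listeners.delete(l); // returns an unsubscribe handle
    },
  };
}

// Usage: a shared counter observed by a "component"
const counter = createState(0);
const seen: number[] = [];
const unsubscribe = counter.subscribe((v) => seen.push(v));
counter.set((p) => p + 1); // seen: [1]
counter.set(5);            // seen: [1, 5]
unsubscribe();
```

In the real library, `useHookstate` additionally re-renders the subscribing React component when the observed segment of state changes.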

Any questions? Just ask by raising a GitHub ticket.

Why Hookstate

Migrating to version 4

Documentation / Code samples / Demo applications

Demo application

Development tools

Plugins / Extensions

API reference

Hookstate developers workflow

This is a monorepo, which combines the Hookstate core package, extensions, docs, and demo applications. pnpm is used as the node_modules manager and nx as the script launcher. Each package defines its own rules for how to build, test, etc.

From the repository root directory:

npm install -g pnpm - install the pnpm tool

pnpm install - install node_modules for all packages

pnpm nx <script> <package> - run a script for a package, building its dependencies first if required, for example:

  • pnpm nx build core - run the build script for the core package
  • pnpm nx start todolist - run the start script for the todolist package, building all of its dependencies first

Download Details:

Author: Avkonst
Source Code: 
License: MIT license

#typescript #react #plugin #architecture 

Bongani Ngema


Libbra: A Currency Tracker App Demonstration


Libbra is a sample app that allows you to track currency exchanges. This app presents a modern approach to Android application development using Kotlin and the latest tech stack.

This project is a hiring task by Revolut. The goal of the project is to demonstrate best practices, provide a set of guidelines, and present modern Android application architecture that is modular, scalable, maintainable and testable. This application may look simple, but it has all of these small details that will set the rock-solid foundation of the larger app suitable for bigger teams and long application lifecycle management.

Libbra Preview


Environment setup

First off, you need the latest Android Studio 3.6.0 (or newer) to be able to build the app.

Moreover, to sign your app for release, please refer to to find required fields.

# Signing Config

Code style

To maintain the style and quality of the code, the static analysis tools below are used. All of them are properly configured, and you can find their configuration in the project root directory as .{toolName}.

Tool      Config file            Check command            Fix command
detekt    default-detekt-config  ./gradlew detekt         -
ktlint    -                      ./gradlew ktlintCheck    ./gradlew ktlintFormat
spotless  /spotless              ./gradlew spotlessCheck  ./gradlew spotlessApply
lint      /.lint                 ./gradlew lint           -

All these tools are integrated into a pre-commit git hook to ensure that all static analysis and tests pass before you can commit your changes. To skip them for a specific commit, add this option to your git command:

git commit --no-verify

The pre-commit git hooks run exactly the same checks as the GitHub Actions and are defined in this script. This step ensures that all commits comply with the established rules. However, continuous integration will ultimately validate that the changes are correct.


The app supports different screen sizes, and the content has been adapted to fit mobile devices and tablets. To do that, a flexible layout has been created using one or more of the following concepts:

In terms of design, the recommendations of the Android Material Design guide, a comprehensive guide for visual, motion, and interaction design across platforms and devices, have been followed. This grants the project a great user experience (UX) and user interface (UI). For more info about UX best practices, visit the link.

Moreover, support for dark theme has been implemented, with the following benefits:

  • Can reduce power usage by a significant amount (depending on the device’s screen technology).
  • Improves visibility for users with low vision and those who are sensitive to bright light.
  • Makes it easier for anyone to use a device in a low-light environment.


The architecture of the application is based on, applies, and strictly complies with each of the following five points:


Modules are collections of source files and build settings that allow you to divide a project into discrete units of functionality. In this case, apart from dividing by functionality/responsibility, the following dependency exists between them:

The above graph shows the app modularisation:

  • :app depends on :rules.
  • :rules depends on nothing.

App module

The :app module is the application module, which is needed to create the app bundle. It is also responsible for initiating the dependency graph and other project-global libraries, differentiating especially between different app environments.

Rules modules

The :rules module basically contains lint checks for the entire project.

Architecture components

Ideally, ViewModels shouldn’t know anything about Android. This improves testability, leak safety and modularity. ViewModels have different scopes than activities or fragments. While a ViewModel is alive and running, an activity can be in any of its lifecycle states. Activities and fragments can be destroyed and created again while the ViewModel is unaware.

Passing a reference of the View (activity or fragment) to the ViewModel is a serious risk. Let's assume the ViewModel requests data from the network and the data comes back some time later. At that moment, the View reference might be destroyed, or it might be an old activity that is no longer visible, generating a memory leak and, possibly, a crash.

The communication between the different layers follows the above diagram using the reactive paradigm: observing changes on components without the need for callbacks, avoiding leaks and the edge cases related to them.
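The pattern can be sketched in plain Kotlin: the ViewModel never references the view; instead it exposes an observable holder that views attach to and detach from according to their own lifecycle. This is a simplified stand-in for illustration, not the Jetpack LiveData/ViewModel classes themselves:

```kotlin
// Simplified stand-in for LiveData: the ViewModel holds no reference
// to any view; views observe it and can stop observing at any time.
class ObservableValue<T>(initial: T) {
    private val observers = mutableListOf<(T) -> Unit>()
    var value: T = initial
        set(new) {
            field = new
            observers.forEach { it(new) } // notify attached views
        }
    fun observe(observer: (T) -> Unit): () -> Unit {
        observers += observer
        observer(value)                  // emit the current value on subscribe
        return { observers -= observer } // returns an "unsubscribe" handle
    }
}

// Hypothetical ViewModel: it only updates its own state.
class UserViewModel {
    val userName = ObservableValue("loading...")
    fun onDataArrived(name: String) {
        // Data may arrive while no view is attached; nothing leaks.
        userName.value = name
    }
}

fun main() {
    val vm = UserViewModel()
    val rendered = mutableListOf<String>()
    val stop = vm.userName.observe { rendered += it } // "view" attaches
    vm.onDataArrived("Alice")
    stop()                   // "view" destroyed: observer detached
    vm.onDataArrived("Bob")  // no crash, no leak, nothing rendered
    println(rendered)        // [loading..., Alice]
}
```

The design point is the direction of the references: the view knows the ViewModel, never the other way around.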


This project takes advantage of many popular libraries, plugins and tools of the Android ecosystem. Most of the libraries are in their stable versions, unless there is a good reason to use a non-stable dependency.


  • Jetpack:
    • Android KTX - provides concise, idiomatic Kotlin for Jetpack and Android platform APIs.
    • AndroidX - major improvement to the original Android Support Library, which is no longer maintained.
    • Data Binding - allows you to bind UI components in your layouts to data sources in your app using a declarative format rather than programmatically.
    • ViewBinding - allows you to more easily write code that interacts with views.
    • Lifecycle - perform actions in response to a change in the lifecycle status of another component, such as activities and fragments.
    • LiveData - lifecycle-aware, meaning it respects the lifecycle of other app components, such as activities, fragments, or services.
    • Navigation - helps you implement navigation, from simple button clicks to more complex patterns, such as app bars and the navigation drawer.
    • ViewModel - designed to store and manage UI-related data in a lifecycle conscious way. The ViewModel class allows data to survive configuration changes such as screen rotations.
  • Coroutines - managing background threads with simplified code and reducing needs for callbacks.
  • Dagger2 - dependency injector, a replacement for all the FactoryFactory classes.
  • Retrofit - type-safe HTTP client.
  • Coil - image loading library for Android backed by Kotlin Coroutines.
  • Kotlinx Serialization - consists of a compiler plugin, that generates visitor code for serializable classes, runtime library with core serialization API and JSON format, and support libraries with ProtoBuf, CBOR and properties formats.
  • Timber - a logger with a small, extensible API which provides utility on top of Android's normal Log class.
  • and more...

Test dependencies

  • Orchestrator - allows you to run each of your app's tests within its own invocation of Instrumentation.
  • Espresso - to write concise, beautiful, and reliable Android UI tests.
  • JUnit - a simple framework to write repeatable tests. It is an instance of the xUnit architecture for unit testing frameworks.
  • JUnit5 - a Gradle plugin that allows for the execution of JUnit 5 tests in Android environments using Android Gradle Plugin 3.5.0 or later.
  • Mockk - provides DSL to mock behavior. Built from zero to fit Kotlin language.
  • AndroidX - the androidx test library provides an extensive framework for testing Android apps.
  • and more...


  • Ktlint - a plugin that creates convenient tasks in your Gradle project to run ktlint checks or auto-format the code.
  • Detekt - a static code analysis tool for the Kotlin programming language.
  • Spotless - a code formatter that can do more than just find formatting errors.
  • Versions - makes it easy to determine which dependencies have updates.
  • JUnit5 - a Gradle plugin that allows for the execution of JUnit5 tests in Android environments using Android Gradle Plugin 3.5.0 or later.
  • and more...

Download Details:

Author: Nuhkoca
Source Code: 
License: Apache-2.0 license

#kotlin #android #architecture #components 

Bongani Ngema


iiCnma: A Playground android App, Showcasing The Latest Technologies


A playground android app, showcasing the latest technologies and architecture patterns using the Movie Database APIs.


iicnma-home-detail.gif iicnma-search.gif iicnma-favorites.gif


  • Kotlin Coroutines, Flow, StateFlow
  • Hilt
  • Paging3
  • Navigation Component
  • LiveData
  • ViewModel
  • Room
  • Retrofit
  • OkHttp3
  • Glide
  • jUnit
  • Mockk
  • Coroutine Test


A custom architecture inspired by Google's MVVM and the Clean architecture.

This architecture allows the app to be offline-first. Data is fetched from the network only when it doesn't exist in the local database, and it is then persisted there. The local database is the single source of truth of the app; after its data changes, it notifies the other layers using coroutine flows.
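That offline-first rule can be sketched in plain Kotlin. The maps below are in-memory stand-ins for Room and the network client, and all names (`MovieRepository`, `getMovie`) are illustrative, not the app's actual classes:

```kotlin
// Offline-first sketch: serve from the local database when possible,
// otherwise fetch from the network, persist, then serve.
// localDb stands in for Room; remoteApi stands in for Retrofit.
class MovieRepository(
    private val localDb: MutableMap<Int, String> = mutableMapOf(),
    private val remoteApi: Map<Int, String> = mapOf(1 to "The Matrix")
) {
    var networkCalls = 0
        private set

    fun getMovie(id: Int): String? {
        localDb[id]?.let { return it }        // local DB is the source of truth
        networkCalls++                        // cache miss: go to the network
        val fetched = remoteApi[id] ?: return null
        localDb[id] = fetched                 // persist for next time
        return fetched
    }
}

fun main() {
    val repo = MovieRepository()
    println(repo.getMovie(1))  // The Matrix (fetched once, then cached)
    println(repo.getMovie(1))  // The Matrix (served from the local DB)
    println(repo.networkCalls) // 1
}
```

In the real app the same flow is asynchronous: Room exposes a `Flow`, and collectors are notified whenever the persisted data changes.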


Clone the repository, get an API key from the Movie Database, and put it in the file as below:


Download Details:

Author: ImnIrdst
Source Code: 
License: MIT license

#kotlin #android #mvvm #clean #architecture 

Bongani Ngema


Alkaa: Open-source App to Manage Your Tasks Quickly and Easily

Alkaa 2.0

Alkaa (begin, start in Finnish) is a to-do application project to study the latest components, architecture and tools for Android development. The project has evolved a lot since the beginning and is available on Google Play! :heart:

The current version of Alkaa was also completely migrated to Jetpack Compose!

📦 Download

Get it on Google Play

📚 Android tech stack

One of the main goals of Alkaa is to use all the latest libraries and tools available.

🧑🏻‍💻 Android development

For more dependencies used in the project, please access the Dependency File.

If you want to check the previous version of Alkaa, please take a look at the last V1 release

🧪 Quality

🏛 Architecture

Alkaa's architecture is strongly based on the Hexagonal Architecture by Alistair Cockburn. The application also relies heavily on modularization for better separation of concerns and encapsulation.

Let's take a look at each major module of the application:

  • app - The Application module. It contains all the initialization logic for the Android environment and starts the Jetpack Navigation Compose Graph.
  • features - The module/folder containing all the features (visual or not) from the application
  • domain - The modules containing the most important part of the application: the business logic. These modules depend only on themselves, and all interaction they do is via dependency inversion.
  • data - The module containing the data (local, remote, light etc) from the app.
  • libraries - The module with useful small libraries for the project, such as design system, navigation, test etc.

This type of architecture protects the most important modules in the app. To achieve this, all dependencies point toward the center, and the modules are organized in such a way that the closer a module is to the center, the more important it is.
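A module layout like this is declared in the Gradle settings file. The sketch below uses the module names listed above; Alkaa's actual module list and nesting may differ:

```kotlin
// settings.gradle.kts (illustrative; real module paths may differ)
include(":app")                      // application shell
include(":features:task")            // a feature module
include(":domain")                   // business logic, center of the graph
include(":data:local")               // data sources
include(":libraries:designsystem")   // shared small libraries
include(":libraries:navigation")
```

Each module's own build script then declares which inner modules it depends on, which is what keeps the dependency arrows pointing toward the center.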

To better represent the idea behind the modules, here is an architecture image representing the flow of dependency:

Alkaa Architecture

Download Details:

Author: igorescodro
Source Code: 
License: Apache-2.0 license

#kotlin #android #clean #architecture 

Hunter Krajcik


Download Microsoft Azure Architecture Icons

Azure architecture icons help us build custom architecture diagrams for our designs and solutions for customers.

Follow the steps below to download the official Microsoft Azure Architecture icons.

Step 1

Click on the link Azure icons – Azure Architecture Center | Microsoft Learn.

Step 2

On the Azure architecture icons page, select the "I agree to the above terms" check box and click Download SVG icons.

How To Download Microsoft Azure Architecture Icons

Step 3

All the Azure icons will be downloaded as a zip file to the Downloads folder. Unzip the folder and have a look at all the icons used in Microsoft Azure products in the Icons folder.


Note: Check the article for icon updates and download the latest icons whenever required.


Hope you have successfully downloaded Microsoft Azure Architecture icons.

Like and share your valuable feedback on this blog.

Original article source at:

#azure #architecture #icons 

Rupert Beatty


ApplicationCoordinator: Coordinators Essential Tutorial


A lot of developers need to change the navigation flow frequently, because it depends on business tasks, and they spend a huge amount of time re-writing code. In this approach, I demonstrate our implementation of Coordinators: the creation of a protocol-oriented, testable architecture written in pure Swift without downcasts, which also avoids violating the S.O.L.I.D. principles.

The example provides a very basic structure with 6 controllers and 5 coordinators, with mock data and logic.

I used a protocol for coordinators in this example:

protocol Coordinator: class {
    func start()
    func start(with option: DeepLinkOption?)
}

All flow controllers have protocols (we need to configure blocks and handle callbacks in coordinators):

protocol ItemsListView: BaseView {
    var authNeed: (() -> ())? { get set }
    var onItemSelect: ((ItemList) -> ())? { get set }
    var onCreateButtonTap: (() -> ())? { get set }
}

In this example, I use factories for creating coordinators and controllers (we can mock them in tests).

protocol CoordinatorFactory {
    func makeItemCoordinator(navController: UINavigationController?) -> Coordinator
    func makeItemCoordinator() -> Coordinator
    func makeItemCreationCoordinatorBox(navController: UINavigationController?) ->
        (configurator: Coordinator & ItemCreateCoordinatorOutput,
        toPresent: Presentable?)
}

The base coordinator stores dependencies of child coordinators:

class BaseCoordinator: Coordinator {
    var childCoordinators: [Coordinator] = []

    func start() { }
    func start(with option: DeepLinkOption?) { }

    // add only unique object
    func addDependency(_ coordinator: Coordinator) {
        for element in childCoordinators {
            if element === coordinator { return }
        }
        childCoordinators.append(coordinator)
    }

    func removeDependency(_ coordinator: Coordinator?) {
        guard
            childCoordinators.isEmpty == false,
            let coordinator = coordinator
            else { return }

        for (index, element) in childCoordinators.enumerated() {
            if element === coordinator {
                childCoordinators.remove(at: index)
                break
            }
        }
    }
}

The AppDelegate stores a lazy reference to the Application Coordinator:

var rootController: UINavigationController {
    return self.window!.rootViewController as! UINavigationController
}

private lazy var applicationCoordinator: Coordinator = self.makeCoordinator()

func application(_ application: UIApplication,
                 didFinishLaunchingWithOptions launchOptions: [UIApplicationLaunchOptionsKey: Any]?) -> Bool {
    let notification = launchOptions?[.remoteNotification] as? [String: AnyObject]
    let deepLink = DeepLinkOption.build(with: notification)
    applicationCoordinator.start(with: deepLink)
    return true
}

private func makeCoordinator() -> Coordinator {
    return ApplicationCoordinator(
        router: RouterImp(rootController: self.rootController),
        coordinatorFactory: CoordinatorFactoryImp()
    )
}
Based on the post about Application Coordinators and Application Controller pattern description

Coordinators Essential tutorial. Part I

Coordinators Essential tutorial. Part II

Download Details:

Author: AndreyPanov
Source Code: 
License: MIT license

#swift #ios #architecture 

Hunter Krajcik


A dependency management library inspired by SwiftUI's "environment."


A dependency management library inspired by SwiftUI's “environment.”

Learn More

This library was motivated and designed over the course of many episodes on Point-Free, a video series exploring functional programming and the Swift language, hosted by Brandon Williams and Stephen Celis.

video poster image 


Dependencies are the types and functions in your application that need to interact with outside systems that you do not control. Classic examples of this are API clients that make network requests to servers, but seemingly innocuous things such as UUID and Date initializers, file access, user defaults, and even clocks and timers can all be thought of as dependencies.

You can get really far in application development without ever thinking about dependency management (or, as some like to call it, "dependency injection"), but eventually uncontrolled dependencies can cause many problems in your code base and development cycle:

Uncontrolled dependencies make it difficult to write fast, deterministic tests because you are susceptible to the vagaries of the outside world, such as file systems, network connectivity, internet speed, server uptime, and more.

Many dependencies do not work well in SwiftUI previews, such as location managers and speech recognizers, and some do not work even in simulators, such as motion managers, and more. This prevents you from being able to easily iterate on the design of features if you make use of those frameworks.

Dependencies that interact with 3rd party, non-Apple libraries (such as Firebase, web socket libraries, network libraries, etc.) tend to be heavyweight and take a long time to compile. This can slow down your development cycle.

For these reasons, and a lot more, it is highly encouraged for you to take control of your dependencies rather than letting them control you.

But, controlling a dependency is only the beginning. Once you have controlled your dependencies, you are faced with a whole set of new problems:

How can you propagate dependencies throughout your entire application in a way that is more ergonomic than explicitly passing them around everywhere, but safer than having a global dependency?

How can you override dependencies for just one portion of your application? This can be handy for overriding dependencies for tests and SwiftUI previews, as well as specific user flows such as onboarding experiences.

How can you be sure you overrode all dependencies a feature uses in tests? It would be incorrect for a test to mock out some dependencies but leave others as interacting with the outside world.

This library addresses all of the points above, and much, much more. Explore all of the tools this library comes with by checking out the documentation, and reading these articles:

Getting started

Quick start: Learn the basics of getting started with the library before diving deep into all of its features.

What are dependencies?: Learn what dependencies are, how they complicate your code, and why you want to control them.


Using dependencies: Learn how to use the dependencies that are registered with the library.

Registering dependencies: Learn how to register your own dependencies with the library so that they immediately become available from any part of your code base.

Live, preview, and test dependencies: Learn how to provide different implementations of your dependencies for use in the live application, as well as in Xcode previews, and even in tests.


Designing dependencies: Learn techniques on designing your dependencies so that they are most flexible for injecting into features and overriding for tests.

Overriding dependencies: Learn how dependencies can be changed at runtime so that certain parts of your application can use different dependencies.

Dependency lifetimes: Learn about the lifetimes of dependencies, how to prolong the lifetime of a dependency, and how dependencies are inherited.

Single entry point systems: Learn about "single entry point" systems, and why they are best suited for this dependencies library, although it is possible to use the library with non-single entry point systems.


  • Concurrency support: Learn about the concurrency tools that come with the library that make writing tests and implementing dependencies easy.
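As a small taste of the API, here is a sketch based on the library's documented `@Dependency` property wrapper and `withDependencies` function; `ItemFactory` and `makeID` are hypothetical names invented for this example:

```swift
import Dependencies
import Foundation

// A hypothetical feature type that pulls its UUID generator from the
// dependency system instead of calling UUID() directly.
struct ItemFactory {
  @Dependency(\.uuid) var uuid

  func makeID() -> UUID {
    uuid()
  }
}

// In tests or previews, the dependency can be overridden for a scope,
// making the factory's output deterministic instead of random.
let factory = withDependencies {
  $0.uuid = .incrementing
} operation: {
  ItemFactory()
}
```

Because the override is scoped rather than global, other parts of the application continue to use the live UUID generator.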


We rebuilt Apple's Scrumdinger demo application using modern best practices for SwiftUI development, including using this library to control dependencies on file system access, timers and speech recognition APIs. That demo can be found in our SwiftUINavigation library.


The latest documentation for the Dependencies APIs is available here.


You can add Dependencies to an Xcode project by adding it to your project as a package.

If you want to use Dependencies in a SwiftPM project, it's as simple as adding it to your Package.swift:

dependencies: [
  .package(url: "", from: "0.1.0")
]

And then adding the product to any target that needs access to the library:

.product(name: "Dependencies", package: "swift-dependencies"),


This library controls a number of dependencies out of the box, but is also open to extension. The following projects all build on top of Dependencies:


There are many other dependency injection libraries in the Swift community. Each has its own set of priorities and trade-offs that differ from Dependencies. Here are a few well-known examples:

Download Details:

Author: Pointfreeco
Source Code: 
License: MIT license

#swift #dependencies #architecture 

Rupert Beatty


The Universal System Operator and Architecture for RxSwift


The simplest architecture for RxSwift

    typealias Feedback<State, Event> = (Observable<State>) -> Observable<Event>

    public static func system<State, Event>(
        initialState: State,
        reduce: @escaping (State, Event) -> State,
        feedback: Feedback<State, Event>...
    ) -> Observable<State>



  • If it did happen -> Event
  • If it should happen -> Request
  • To fulfill Request -> Feedback loop


  • System behavior is first declaratively specified and effects begin after subscribe is called => Compile time proof there are no "unhandled states"

Debugging is easier

  • A lot of the logic is just normal pure functions that can be debugged using the Xcode debugger, or just by printing the commands.

Can be applied on any level

  • Entire system
  • application (state is stored inside a database, CoreData, Firebase, Realm)
  • view controller (state is stored inside system operator)
  • inside feedback loop (another system operator inside feedback loop)

Works awesome with dependency injection


  • Reducer is a pure function, just call it and assert results
  • In case effects are being tested -> TestScheduler

Can model circular dependencies

Completely separates business logic from effects (Rx).

  • Business logic can be transpiled between platforms (ShiftJS, C++, J2ObjC)


Simple UI Feedback loop

Complete example

    Observable.system(
        initialState: 0,
        reduce: { (state, event) -> State in
            switch event {
            case .increment:
                return state + 1
            case .decrement:
                return state - 1
            }
        },
        scheduler: MainScheduler.instance,
        feedback:
            // UI is user feedback
            bind(self) { me, state -> Bindings<Event> in
                let subscriptions = [
                    state.map(String.init).bind(to: me.label.rx.text)
                ]

                let events = [
                    me.plus.rx.tap.map { Event.increment },
                    me.minus.rx.tap.map { Event.decrement }
                ]

                return Bindings(
                    subscriptions: subscriptions,
                    events: events
                )
            }
    )

Play Catch

Simple automatic feedback loop.

Complete example

    Observable.system(
        initialState: State.humanHasIt,
        reduce: { (state: State, event: Event) -> State in
            switch event {
            case .throwToMachine:
                return .machineHasIt
            case .throwToHuman:
                return .humanHasIt
            }
        },
        scheduler: MainScheduler.instance,
        feedback:
            // UI is human feedback
            bindUI,
            // NoUI, machine feedback
            react(request: { $0.machinePitching }, effects: { (_) -> Observable<Event> in
                return Observable<Int>
                    .timer(1.0, scheduler: MainScheduler.instance)
                    .map { _ in Event.throwToHuman }
            })
    )


Complete example

    Driver.system(
        initialState: State.empty,
        reduce: State.reduce,
        feedback:
            // UI, user feedback
            bindUI,
            // NoUI, automatic feedback
            react(request: { $0.loadNextPage }, effects: { resource in
                return URLSession.shared.loadRepositories(resource: resource)
                    .asSignal(onErrorJustReturn: .failure(.offline))
            })
    )

Run RxFeedback.xcodeproj > Example to find out more.



CocoaPods is a dependency manager for Cocoa projects. You can install it with the following command:

$ gem install cocoapods

To integrate RxFeedback into your Xcode project using CocoaPods, specify it in your Podfile:

pod 'RxFeedback', '~> 3.0'

Then, run the following command:

$ pod install


Carthage is a decentralized dependency manager that builds your dependencies and provides you with binary frameworks.

You can install Carthage with Homebrew using the following command:

$ brew update
$ brew install carthage

To integrate RxFeedback into your Xcode project using Carthage, specify it in your Cartfile:

github "NoTests/RxFeedback" ~> 3.0

Run carthage update to build the framework and drag the built RxFeedback.framework into your Xcode project. As RxFeedback depends on RxSwift and RxCocoa you need to drag the RxSwift.framework and RxCocoa.framework into your Xcode project as well.

Swift Package Manager

The Swift Package Manager is a tool for automating the distribution of Swift code and is integrated into the swift compiler.

Once you have your Swift package set up, adding RxFeedback as a dependency is as easy as adding it to the dependencies value of your Package.swift.

dependencies: [
    .package(url: "", majorVersion: 1)
]

Difference from other architectures

  • Elm - pretty close, feedback loops for effects instead of Cmd, which effects to perform are encoded into state and queried by feedback loops
  • Redux - kind of like this, but feedback loops instead of middleware
  • Redux-Observable - observables observe state vs. being inside middleware between view and state
  • Cycle.js - no simple explanation :), ask @andrestaltz
  • MVVM - separates state from effects and doesn't require a view

Download Details:

Author: NoTests
Source Code: 
License: MIT license

#swift #architecture #feedback #loop


Learn MarkLogic Server Architecture


Data is the new oil, and hence managing data is of utmost importance for any enterprise. With the huge amount of data now generated, and the need to provide superior performance over it, NoSQL databases are now ruling the tech industry. Among the numerous NoSQL databases on the market, this emerging one is catching the attention of numerous techies and businesses. MarkLogic will definitely have a very prosperous future.

What is MarkLogic Server?

In a single sentence, MarkLogic is an enterprise NoSQL multi-model database management system. Let's now break down that sentence to get a clearer picture.

  • Enterprise – MarkLogic provides enterprise features like security, ACID transactions, and real-time full-text search.
  • NoSQL – MarkLogic is obviously a NoSQL database at its core, hence we can expect the flexibility and scalability of a NoSQL DB.
  • Multi-model – We can save all data, no matter what shape or form it is in.
  • Database – MarkLogic stores the data.
  • Management System – MarkLogic doesn't just dump the data; it helps to govern it.

Marklogic Server Architecture

MarkLogic is basically a clustered database that has multiple nodes running. The following is the layered structure inside one node.

Let’s understand the different layers in detail in the bottom-up approach.

Data Layer

At the bottom, we have the data layer, and at the bottom of that is the storage system for storing the data. It is multi-model, so there are different kinds of storage. It can store compressed text like JSON and XML, and the structure of those documents is understood at this level. There is binary storage for images and videos, and semantic storage for semantic triples and semantic relationships.

Next, we have an extensive set of indexes, led by the main full-text index. There are also other specialized indexes like geospatial, scalar, semantic, relational, etc., as well as a security index at this level. All data access in MarkLogic is mediated through the security index, as MarkLogic provides security at the most fundamental level of data access.

Caches – Provide efficient access to data storage and data on disk.
Journal – The data in MarkLogic, i.e. the compressed data and the indexes, is written in batches. First a journal entry is made, then the data is committed to disk. So in case of a disaster before a batch is committed, we still have the committed journal record: we can start up, get back to a known good state, and maintain consistency.

Transaction Controller – Handles all of the above, mediating transactions across the cluster. It follows the ACID properties, so even very complicated transactions make it to all the nodes in the cluster together, or not at all.

Query Layer

Broadcaster – At the base of the query layer is the broadcaster, which federates queries across the cluster and to multiple threads within this node in the cluster.
Aggregator – Consolidates those partial results into a complete result set.

Caches – Used to cache queries that are executed frequently.

Evaluator – There are multiple evaluators in MarkLogic, the two main ones being JavaScript (for JSON) and XQuery (for XML), as well as other specialized evaluators for more specific data formats, like SQL for relational data. Supporting all these evaluators is an extensive library of functions that helps make them even more capable.


The interface to these is HTTP REST endpoints. There is an extensive collection of endpoints to facilitate document search, CRUD operations, administration, and more. New endpoints can also be defined as business requirements dictate for the required data services.


If we are working in Java or Node.js, there are client APIs that provide access to the same set of services. We can also take our own endpoint specifications and compile them so that developers can access them in an idiomatic way. From any other language, such as Python or a shell script, we can simply call the REST endpoints over HTTP in the normal way.
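For instance, MarkLogic exposes search through its /v1/search REST endpoint. A small Python sketch that builds such a request URL follows; the host, port, and query string are illustrative assumptions:

```python
from urllib.parse import urlencode

# MarkLogic exposes search through the /v1/search REST endpoint.
# Host, port, and query values below are illustrative assumptions.
def build_search_url(host, port, query, fmt="json"):
    params = urlencode({"q": query, "format": fmt})
    return f"http://{host}:{port}/v1/search?{params}"

url = build_search_url("localhost", 8000, "architecture")
# The request itself would typically use digest authentication, e.g. via
# urllib.request's HTTPDigestAuthHandler or a third-party HTTP client.
print(url)  # http://localhost:8000/v1/search?q=architecture&format=json
```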


How does it fit into the bigger picture? MarkLogic is a distributed database with multiple nodes in a cluster; the description above covers just one node. Each node can act as just the data layer, just the query layer and interface, or a combination. This can be deployed on-premises or in the cloud, where we can use one of the managed services. For example, with the query service we get an elastic pool of nodes focused on the query layer that scales with our workload; with the Data Hub service we get a full-stack application dedicated to helping us integrate data. That's the MarkLogic server and how it fits into our world.

Original article source at:

#server #architecture 

Learn Marklogic Server Architecture
Desmond  Gerber

Desmond Gerber


Learn Data Catalog Architecture for Enterprise Data Assets

Introduction to Data Catalog Architecture

"By 2019, data and analytics organizations that provide agile, curated internal and external datasets for a range of content authors will realize twice the business benefits of those that do not," according to one industry study. Yet organizations continue to underestimate the importance of metadata management and cataloging. Given that data unification and collaboration are becoming increasingly important success factors for businesses, it's worth revisiting the data catalog and its advantages for the entire enterprise, as it will soon become a pillar of any data-driven strategy. A data catalog is a comprehensive inventory of all data assets in an organization, intended to help data professionals rapidly locate the most suitable data for any analytical or business purpose.

What is Data Catalog?

A data catalog is an inventory of all the data an organization has: a library where data is indexed, organized, and stored. Most data catalogs capture data sources, data usage information, and data lineage, explaining where the data came from and how it evolved into its current state. Organizations can use a data catalog to centralize information, classify what they have, and organize data based on its content and source. A data catalog's goal is to help you understand your data and learn what you didn't know before.
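As a toy illustration of the idea, a catalog can be modeled as an index of entries describing each dataset's source, tags, and lineage. All dataset names and fields below are hypothetical:

```python
# A toy in-memory data catalog: each entry records source, tags, and lineage.
catalog = {
    "sales_2023": {
        "source": "warehouse.sales",
        "tags": ["sales", "finance"],
        "lineage": ["raw_orders"],   # upstream datasets it was derived from
    },
    "raw_orders": {
        "source": "crm.orders",
        "tags": ["sales", "raw"],
        "lineage": [],
    },
}

def search(tag):
    """Return the names of all datasets carrying the given tag."""
    return sorted(name for name, entry in catalog.items() if tag in entry["tags"])

print(search("sales"))  # ['raw_orders', 'sales_2023']
```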

Some Important points

As a result, ensure you don't leave any data out of the catalog. Your Big Data activities can also include a data cataloging service. 

Make it a part of your daily routine rather than a separate task. Align the data plan with the catalog.

Set accessibility rules to avoid unauthorized data access.

Why is Data Catalog Important?

Listed below are the reasons Why Data Catalog is important:

Dataset Searching

A data catalog offers robust search capabilities: scanning by facets, keywords, and business terms. Non-technical users appreciate the ability to search using natural language, and the ability to rank search results by relevance and frequency of use is a further advantage.

Dataset Evaluation

The ability to assess a dataset's suitability for an analysis use case without having to download or procure the data first is critical. Important evaluation features include previewing a dataset, seeing all related metadata, viewing user ratings, reading user reviews and curator annotations, and viewing data quality information.

Data Access

The journey from search to evaluation to data access should be a smooth one, with the catalog understanding access protocols and either granting access directly or collaborating with access technologies. Data access functions include safeguards for confidentiality, privacy, and compliance around sensitive data.

How does Data Catalog work?

Today's data production must scale to accommodate massive data volumes and high-performance computing. To adapt to data, technology, and consumer needs, it must be versatile and resilient. It must ensure that essential data information is readily available for customers to access and comprehend. It must be able to handle all data speeds, from streaming to batch ETL (Extract, Transform, and Load). It should be able to handle all forms of data, from relational to unstructured and semi-structured. It must allow all data users access to data while still protecting confidential data, and none of this is possible without metadata.

Source Data

Connecting to the necessary data sources. Sources include data from within the company and from outside. Relationally structured, semi-structured, multi-structured, and unstructured data are all included.

Ingest Data

Bringing data into the analytics process. Batch and real-time ingestion methods are available, ranging from batch ETL to data stream processing. Scalability and elasticity are critical for adapting to changes in data volume and speed.

Refine Data

Data lakes, data warehouses, and master data/reference data hubs are all examples of shareable data stores. The data refinery is in charge of data cleansing, integration, aggregation, and other forms of data transformation.

Access data

Access to data is provided in various ways, including query, data virtualization, APIs, and data services, for both people and the applications and algorithms that use it.

Analyze Data

Turning data into information and insights includes basic reporting to data science, artificial intelligence, and machine learning.

Consume Data

Data consumption is the point at which data and people become inextricably linked. Data consumption aims to get from data and observations to decisions, behavior, and effects.

Key Ingredients for a Successful Data Catalog

Not all data catalogs are created equal. It's critical to filter players based on key capabilities when selecting a data catalog. As a result, many data catalogs, including Talend Data Catalog, rely on critical components that ensure your data strategy's effectiveness. Let's take a look at some of the essential features:

Connectors and easy-to-use curation tools to build your single place of trust

Having many connectors enhances the data catalog's ability to map the physical datasets in your environment, regardless of their origin or source. Powerful capabilities let you extract metadata from business intelligence software, data integration tools, SQL queries, enterprise apps like Salesforce or SAP, or data modeling tools, allowing you to onboard people to verify and certify your datasets for broader use.

Automation to gain speed and agility

Data stewards won't have to waste time manually linking data sources thanks to improved automation. They'll then concentrate on what matters most: fixing data quality problems and curating them for the whole company's good. Of course, you'll need the support of stewards to complement automation – to enrich and curate datasets over time.

Powerful search to quickly explore large datasets.

As the primary component of a catalog, the search should be multifaceted, allowing you to combine various criteria in an advanced search. Search parameters include name, size, time, owner, and format.

Lineage to conduct root-cause analysis

Lineage allows you to link a dashboard to the data it displays. Understanding the relationships between various forms and data sources relies heavily on lineage and relationship exploration. So, if your dashboard shows erroneous data, a steward can use the lineage to determine where the issue originates.
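The root-cause workflow described above amounts to walking the lineage graph upstream. A minimal Python sketch, with invented dataset names:

```python
# Toy lineage graph: each dataset maps to its direct upstream sources.
# Walking the graph upstream supports root-cause analysis: if the dashboard
# shows bad numbers, the steward inspects every dataset that feeds it.
lineage = {
    "dashboard_kpis": ["sales_agg"],
    "sales_agg": ["raw_orders", "raw_refunds"],
    "raw_orders": [],
    "raw_refunds": [],
}

def upstream(dataset):
    """Return every dataset that feeds (directly or indirectly) into `dataset`."""
    seen = set()
    stack = list(lineage.get(dataset, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(lineage.get(node, []))
    return sorted(seen)

print(upstream("dashboard_kpis"))  # ['raw_orders', 'raw_refunds', 'sales_agg']
```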

Glossary to add business context to your data

The ability to federate people around the data is essential for governance. To do so, they must share a common understanding of terms, definitions, and how those relate to the data; that is where the glossary helps. If you search for PII in a data catalog, for example, you can find every data source containing it. This is especially useful in the context of GDPR (General Data Protection Regulation), where you need to take stock of all the data you hold.

Profiling to avoid polluting your data lake

When linking multiple data sources, data profiling is essential for determining your data quality in terms of completeness, accuracy, timeliness, and consistency. It saves time and enables you to spot inaccuracies quickly, allowing you to warn stewards before the data lake is polluted.
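As a sketch, the completeness dimension mentioned above can be measured as the share of non-missing values per column. The rows and column names below are invented:

```python
# Toy data profiling: measure completeness (share of non-missing values)
# per column before the data enters the lake.
rows = [
    {"id": 1, "email": "a@example.com", "country": "DE"},
    {"id": 2, "email": None,            "country": "FR"},
    {"id": 3, "email": "c@example.com", "country": None},
]

def completeness(rows, column):
    """Fraction of rows where `column` has a non-missing value."""
    filled = sum(1 for r in rows if r.get(column) is not None)
    return filled / len(rows)

print(round(completeness(rows, "email"), 2))  # 0.67
```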

Benefits of Data Catalog

The whole company gains when data professionals can help themselves to the data they need without IT intervention, without relying on experts or colleagues for guidance, without being limited to just the assets they are familiar with, and without having to worry about governance enforcement.

Improved context for data

Analysts can find comprehensive explanations of data, including input from other data citizens, and understand how data is important to the company.

Increased operational efficiency

A data catalog establishes an efficient division of labor between users and IT—data people can access and interpret data more quickly. At the same time, IT workers can concentrate on higher-priority tasks.

Reduced risk

Analysts may be more confident that they're dealing with data they've been granted permission to use for a specific reason and that they're following business and data privacy regulations. They can also quickly scan annotations and metadata for null fields or incorrect values that might skew the results.

Greater success with data management initiatives

The more difficult it is for data analysts to identify, access, understand, and trust data, the less likely BI and big data projects are to succeed.

Better and faster data analysis

Data professionals can respond rapidly to problems, opportunities, and challenges with analysis and answers based on all of the company's most appropriate, contextual data.

A data catalog will also assist the company in achieving particular technological and business goals. A data catalog can help discover new opportunities for cross-selling, up-selling, targeted promotions, and more by supplying analysts with a holistic view of their customers.

Role of the Data Catalog

Metadata is a thread that connects all other building materials, including ways for ingestion to be aware of sources, refinement to be connected to ingestion, and so on. Every component of the architecture contributes to the development and use of metadata.

Data Acquisition

  • Sourcing and ingestion are the points at which the data catalog is continuously updated with metadata records for all data within the analytics ecosystem.
  • The intelligent data catalog includes AI / ML capabilities for retrieving and extracting metadata, reducing the manual effort required to capture metadata, and improving the level of metadata completeness.

Data Modification

Collects information on data flow across data pipelines and all data flow changes. This involves all data pipelines that send data to data lakes, warehouses, and processing pipelines.

This metadata, which is derived from data perception, offers lineage information critical for accurate data and a helpful tool for tracking and troubleshooting issues.

Data Availability and Access

  • Analysts rely heavily on the data catalog to collect the data they need, interpret and evaluate it, and know how to navigate it. Metadata also connects data access and governance, ensuring access restrictions are implemented.
  • Data valuation processes benefit from collecting metadata regarding access rates, and learning who accesses data the most frequently aids data professionals.

Consuming Data

This allows for collecting metadata on who uses what data, what kinds of use cases are used, and what effect the data has on the enterprise. Data processing and data-driven cultures are built on a deep understanding of data users and their data dependencies.

Everyone dealing with data should know the amount of knowledge available on data policy, preparation, and management.

Managing Data Governance, Administration, and Infrastructure Management

It is founded on an understanding of the data, its processing systems, its uses, and its consumers.

Data collection systems are integrated and data processing workflows are supported, as this information is managed as metadata in the data catalog.


Data-driven organizations are a goal for many businesses. They want more accurate, quicker analytics without losing security. That is why data processing is becoming increasingly necessary and challenging. A data catalog makes stored data easy to manage and meets various demands. It's challenging to manage data in the era of big data, data lakes, and self-service; a data catalog assists in meeting those challenges. Active data curation is a vital digital data processing method and a key component of data catalog performance.


Original article source at:

#data #asset #architecture 

Learn Data Catalog Architecture for Enterprise Data Assets
Gordon  Matlala



Data Mesh Architecture and its Benefits

Introduction to Data Mesh

Data mesh builds a layer of connectivity that takes away the complexities of connecting, managing, and supporting data access. It is a way to stitch together data held across multiple data silos, combining data distributed across different locations and organizations. It provides data that is highly available, easily discoverable, and secure. It is beneficial in an organization where teams generate data for many data-driven use cases and access patterns.

We can use it, for example, when we need to connect cloud applications to sensitive data that lives in a customer's cloud environment, or when we need to create virtual data catalogs drawn from various data sources that can't be centralized. It is also used when we create virtual data warehouses or data lakes for analytics and ML training without consolidating data into a single repository.

What is Anthos Service Mesh?

It is a fully managed service mesh for complex microservices architectures: a suite of tools that monitors and manages a reliable service mesh on-premises or on Google Cloud. It is powered by Istio, a highly configurable and powerful open-source service mesh platform with tools and features that enable industry best practices. It defines and manages configuration centrally, at a higher level, and is deployed as a uniform layer across the full infrastructure. Service developers and operators can use a rich feature set without making a single change to the application code.

Anthos Service Mesh relies on Google Kubernetes Engine (GKE) and GKE On-Prem observability features. Microservices architectures provide many benefits, but they also bring challenges such as added complexity and fragmentation across different workloads. Anthos Service Mesh unburdens operations and development teams by simplifying service delivery across the board, from traffic management and mesh telemetry to securing communications between services.

What are the features of Anthos Service Mesh?

Here are some of the features of Anthos Service Mesh

Deep visibility built-in [beta]

Anthos Service Mesh is integrated with Cloud Logging, Cloud Monitoring, and Cloud Trace, which provides many benefits, such as monitoring SLOs at a per-service level and setting targets for latency and availability.

Easy authentication, encryption

Anthos Service Mesh makes authentication and encryption easy. Transport authentication through mTLS (mutual Transport Layer Security) has never been more effortless: it secures service-to-service as well as end-user-to-service communications with a one-click mTLS installation or an incremental rollout.

Flexible authorization

It provides flexible authorization: we only need to specify the permissions and then grant access to them at the level we choose, from namespace down to individual users.

Fine-grained traffic controls

Anthos Service Mesh opens up many traffic management features, as it decouples traffic flow from infrastructure scaling and includes dynamic request routing for A/B testing, canary deployments, and gradual rollouts, all outside of your application code.

Failure recovery out of the box

It provides many critical failure-recovery features out of the box that can be configured dynamically at runtime.

What is Azure Service Fabric Mesh?

Azure Service Fabric Mesh lets developers deploy microservices applications without managing virtual machines, storage, or networking. Applications hosted on Service Fabric Mesh run and scale without you worrying about the infrastructure powering them. Service Fabric Mesh consists of clusters of many machines, and all of these cluster operations are hidden from the developer.

You only need to upload the code and specify the resources you need, your availability requirements, and resource limits. It automatically allocates the infrastructure and handles infrastructure failures, ensuring your applications are highly available. You take care of the health and responsiveness of the application, not the infrastructure. Azure Service Fabric has three public offerings: the Service Fabric Azure Cluster service, Service Fabric Standalone, and the Azure Service Fabric Mesh service.

What is AWS App Mesh?

AWS App Mesh helps you run services by providing consistent visibility and network traffic controls for services built across multiple types of compute infrastructure. App Mesh removes the need to update application code in order to change how monitoring data is collected or how traffic is routed between services. It configures each service to export monitoring data and implements consistent communications control logic across your application. When a failure occurs or code changes must be deployed, this makes it easy to pinpoint the precise location of errors quickly and automatically reroute network traffic.

What are the advantages of AWS App Mesh?

Following are the advantages of AWS App Mesh.

End-to-end Visibility

App Mesh captures metrics, logs, and traces from all of your applications. We can combine and export this data to Amazon CloudWatch, AWS X-Ray, and community tools for monitoring, helping to quickly identify and isolate issues with any service and optimize your entire application.

Ensure High Availability

App Mesh gives you controls to configure how traffic flows between your services. Easily implement custom traffic routing rules to ensure that every service is highly available during deployments, after failures, and as your application scales.

Streamline Operations

App Mesh configures and deploys a proxy that manages all communications traffic to and from your services. This removes the requirement to configure communication protocols for every service, write custom code, or implement libraries to control the application.

Enhance Any Application

Users can use App Mesh with services running on any compute service, such as AWS Fargate, Amazon EKS, Amazon ECS, and Amazon EC2. App Mesh can also monitor and control communications for monoliths running on EC2, and for teams running containerized applications, orchestration systems, or services across VPCs, all with no code changes.

Hybrid Deployments

To configure a service mesh for applications deployed on-premises, we can use AWS App Mesh on AWS Outposts. AWS Outposts is a fully managed service that extends AWS infrastructure, AWS services, APIs, and tools to virtually any connected site. With AWS App Mesh on Outposts, you can provide consistent communication control logic for services across AWS Outposts and the AWS cloud to simplify hybrid application networking.

Data Mesh vs Data Lake

Given below are the differences between Data Mesh and Data Lake.

  • A data lake is a storage repository that holds a vast amount of raw data in its native format. Where a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data.
  • The advantage of the data lake is that it is a centralized, singular, schema-less data store holding raw (as-is) data as well as massaged data.
  • It offers a mechanism for fast ingestion of data with appropriate latency.
  • It helps map data across various sources and gives users visibility and security.
  • It provides a catalog to find and retrieve data.
  • It has the costing model of a centralized service.
  • It offers the ability to manage security, permissions, and data masking.
  • The main difference between a data mesh and a data lake is ownership: a data lake centralizes ownership of the raw data, so domain teams often treat their data as a byproduct they don't own, whereas a data mesh decentralizes ownership to the domain teams.

How is Data Mesh different from Data Fabric?

  • Data Fabric integrates data management across cloud and on-premises environments to accelerate digital transformation. It helps deliver consistent, integrated hybrid cloud data services for data visibility and insights, data access and control, and data protection and security.
  • The difference is that a data fabric allows clear access and sharing of data across distributed computing systems by means of a single, secured, and governed data management framework.
  • A data mesh, by contrast, follows a metadata-driven approach and is a distributed data architecture supported by machine learning capabilities. It is a tailor-made distributed ecosystem with reusable data services, a centralized governance policy, and dynamic data pipelines.

What are the benefits of Data Mesh?

  1. It provides agility: each node works independently, is containerized, and can be deployed as soon as any changes are ready.
  2. It provides scalability: new nodes can be constructed and deployed to the mesh whenever new data arises, and many portals and teams can access the same node, allowing the organization to scale the data mesh.
  3. It can be used under various circumstances, like connecting cloud applications to sensitive data that lives in a customer's on-premises or cloud environment, creating virtual data catalogs from various data sources, or creating virtual data warehouses or data lakes for analytics and machine learning training without consolidating data into a single repository.


A data mesh allows the organization to escape the analytical and consumptive confines of monolithic data architectures and connects siloed data to enable machine learning and automated analytics at scale. It allows the company to be data-driven, give up data lakes and data warehouses, and replace them with the power of data access, control, and connectivity.

Original article source at:

#data #architecture 

Data Mesh Architecture and its Benefits
Sheldon  Grant



Apache Pulsar Architecture and Benefits

Introduction to Apache Pulsar

Apache Pulsar is a multi-tenant, high-performance server-to-server messaging system originally developed by Yahoo. It was open-sourced in late 2016 and is now incubating under the Apache Software Foundation (ASF). Pulsar works on the pub-sub pattern, with producers and consumers (also called subscribers). The topic is the core of the pub-sub model: producers publish their messages to a given Pulsar topic, and consumers subscribe to a topic to receive messages from it and send an acknowledgement.

Once a message is published to a subscription, Pulsar retains it until a consumer acknowledges it; only after a consumer's acknowledgement has been processed does the message get deleted.

Apache Pulsar Topics

Topics are well-defined named channels for transmitting messages from producers to consumers. Topic names are well-defined URLs.

Namespaces: a namespace is a logical nomenclature within a tenant. A tenant can create multiple namespaces via the admin API. A namespace allows an application to create and manage a hierarchy of topics, and any number of topics can be created under a namespace.

Apache Pulsar Subscription Modes

A subscription is a named configuration rule that determines how messages are delivered to consumers. There are three subscription modes in Apache Pulsar.


Apache Pulsar Subscription Mode Exclusive

In exclusive mode, only a single consumer is allowed to attach to the subscription. If more than one consumer attempts to subscribe to a topic using the same subscription, the later consumer receives an error. Exclusive is the default subscription mode.


Apache Pulsar Subscription Failover

In failover mode, multiple consumers can attach to the same subscription. The consumers are sorted lexically by name, and the first consumer is the master consumer, which receives all the messages. When the master consumer disconnects, the next consumer in line receives the messages.


Apache Pulsar Subscription Mode Shared

In shared, or round-robin, mode, multiple consumers can attach to the same subscription, and each message is delivered to only one of them in a round-robin manner. When a consumer disconnects, all the messages that were sent to it but not acknowledged are re-scheduled for delivery to the remaining consumers. Limitations of shared mode:

  • Message ordering is not guaranteed.
  • You can’t use cumulative acknowledgement with shared mode.
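The round-robin behavior of shared mode can be illustrated with a small Python model. This sketches the dispatch pattern only; it is not the Pulsar client API:

```python
from itertools import cycle

# Toy illustration of shared-mode dispatch: messages are spread across the
# attached consumers in round-robin fashion, so no single consumer sees
# every message (which is why ordering is not guaranteed).
consumers = ["consumer-1", "consumer-2", "consumer-3"]
assignment = {}

for message, consumer in zip(range(6), cycle(consumers)):
    assignment.setdefault(consumer, []).append(message)

print(assignment)
# {'consumer-1': [0, 3], 'consumer-2': [1, 4], 'consumer-3': [2, 5]}
```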


Routing Modes

The routing mode determines which partition of a topic a message will be published to. There are three types of routing modes. Routing is relevant when publishing to partitioned topics.

Round Robin Partition 

If no key is provided, the producer publishes messages across all available partitions in a round-robin fashion to achieve maximum throughput. Round-robin routing is not done per individual message but per batch (bounded by the batching delay), which ensures effective batching. If a key is specified on a message, the partitioned producer hashes the key and assigns the message to the corresponding partition. This is the default mode.

Single Partition

If no key is provided, the producer randomly picks a single partition and publishes all messages to that partition. If a key is specified, the partitioned producer hashes the key and assigns the message to the corresponding partition.

Custom Partition

A user can create a custom routing mode by using the Java client and implementing the MessageRouter interface. The custom router is invoked to choose the partition for each message.
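Key-based routing can be sketched in Python as follows. The hash function here is a deliberately simple stand-in, not the hash Pulsar's clients actually use:

```python
# Toy sketch of key-based routing: hash the message key to pick a partition,
# mirroring what a partitioned producer does for keyed messages.
def route(key, num_partitions):
    # A stable hash keeps all messages with the same key on one partition,
    # which preserves per-key ordering.
    return sum(key.encode()) % num_partitions

# The same key always lands on the same partition.
print(route("order-42", 4) == route("order-42", 4))  # True
```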

Apache Pulsar Architecture

A Pulsar cluster consists of different parts. There may be one or more brokers, which handle and load-balance incoming messages from producers, dispatch messages to consumers, communicate with the Pulsar configuration store to handle various coordination tasks, and store messages in BookKeeper instances.

  • A BookKeeper cluster consisting of one or more bookies handles persistent storage of messages.
  • A ZooKeeper cluster called the configuration store handles coordination tasks that involve multiple clusters.


The broker is a stateless component that runs an HTTP server and a dispatcher. The HTTP server exposes a REST API for both administrative tasks and topic lookup for producers and consumers. The dispatcher is an async TCP server over a custom binary protocol used for all data transfers.


A Pulsar instance usually consists of one or more Pulsar clusters. A cluster consists of one or more brokers, a ZooKeeper quorum used for cluster-level configuration and coordination, and an ensemble of bookies used for persistent storage of messages.

Metadata store

Pulsar uses Apache ZooKeeper for metadata storage, cluster configuration, and coordination.

Persistent storage

Pulsar guarantees message delivery: if a message reaches a Pulsar broker successfully, it will be delivered to its intended target.

Pulsar Clients

Pulsar has client APIs for Java, Go, Python, and C++. The client API encapsulates and optimizes Pulsar's client-broker communication protocol while exposing a simple and intuitive API for applications. The current official Pulsar client libraries support transparent reconnection and connection failover to brokers, queuing of messages until acknowledged by the broker, and heuristics such as connection retries with backoff.

Client setup phase

When an application wants to create a producer or consumer, the Pulsar client library initiates a setup phase composed of two steps:

  1. The client attempts to determine the owner of the topic by sending an HTTP lookup request to a broker. The request can reach any active broker, which, by consulting the cached ZooKeeper metadata, replies with the broker serving the topic, or assigns the topic to the least-loaded broker if nobody is serving it.
  2. Once the client library has the broker address, it creates a TCP connection (or reuses an existing connection from the pool) and authenticates it. Within this connection, the broker and the client exchange binary commands from the custom protocol. At this point, the client sends a command to create a consumer or producer, which the broker carries out after validating the authorization policy.
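Step 1 can be modeled with a small Python sketch: a broker either returns the topic's cached owner or assigns the topic to the least-loaded broker. Broker names, loads, and the topic URLs are hypothetical:

```python
# Toy model of the topic-lookup step described above.
owners = {"persistent://tenant/ns/topic-a": "broker-1"}   # cached ownership
broker_load = {"broker-1": 12, "broker-2": 3, "broker-3": 7}

def lookup(topic):
    if topic in owners:
        return owners[topic]                        # topic already served
    owner = min(broker_load, key=broker_load.get)   # pick least-loaded broker
    owners[topic] = owner                           # record the assignment
    return owner

print(lookup("persistent://tenant/ns/topic-a"))  # broker-1
print(lookup("persistent://tenant/ns/topic-b"))  # broker-2
```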


Apache Pulsar's geo-replication enables messages to be produced in one geo-location and consumed in another. In the diagram above, whenever producers P1, P2, and P3 publish messages to the topic T1 on clusters A, B, and C respectively, those messages are instantly replicated across the clusters. Once replicated, consumers C1 and C2 can consume the messages from their respective clusters. Without geo-replication, consumers C1 and C2 would not be able to consume the messages published by producer P3.


Pulsar was created from the ground up as a multi-tenant system; Apache Pulsar supports multi-tenancy. Tenants can be spread across clusters, and each can have its own authentication and authorization scheme applied to it. Tenants are also the administrative unit at which storage, message TTL, and isolation policies are managed.


To each tenant in a particular Pulsar instance you can assign:

  • An authorization scheme.
  • The set of clusters to which the tenant's configuration applies.


Authentication and Authorization

Pulsar supports an authentication mechanism that can be configured at the broker, and it also supports authorization to identify clients and their access rights on topics and tenants.

Tiered Storage

Pulsar's architecture allows topic backlogs to grow very large, which makes retaining a rich history expensive over time. One way to alleviate this cost is tiered storage: older messages in the backlog can be moved from BookKeeper to cheaper storage, while clients can still access the older backlog.

Schema Registry

Type safety is paramount in communication between producers and consumers. For type safety in messaging, Pulsar adopts two basic approaches:

Client-side approach

In this approach, message producers and consumers are responsible not only for serializing and deserializing messages (which consist of raw bytes) but also for “knowing” which types are transmitted via which topics.

Server-side approach

In this approach, producers and consumers inform the system which data types can be transmitted via the topic. With this approach, the messaging system enforces type safety and ensures that producers and consumers remain in sync.

How do schemas work?

A Pulsar schema is applied and enforced at the topic level. Producers and consumers upload schemas to Pulsar when requested. A Pulsar schema consists of:

  • Name: the topic to which the schema is applied.
  • Payload: a binary representation of the schema.
  • Properties: a user-defined string/string map.

It supports the following schema formats:

  • JSON
  • Protobuf
  • Avro
  • String (used for UTF-8-encoded strings)

If no schema is defined, producers and consumers handle raw bytes.
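A toy server-side check in the spirit of the section above: the topic owns a schema (name, payload, format, properties), and a client's declared format must match it, while topics without a schema fall back to raw bytes. The registry functions here are invented for illustration and are not Pulsar's actual API.

```python
topic_schemas = {}   # topic -> schema dict held by the "server"

def upload_schema(topic, fmt, payload, properties=None):
    topic_schemas[topic] = {
        "name": topic,                    # name: the topic the schema applies to
        "format": fmt,                    # e.g. "json", "avro", "string"
        "payload": payload,               # binary representation of the schema
        "properties": properties or {},   # user-defined string/string map
    }

def check_compatible(topic, fmt):
    """Server-side enforcement: reject mismatched producer/consumer types."""
    schema = topic_schemas.get(topic)
    if schema is None:
        return True                       # no schema: raw bytes are allowed
    return schema["format"] == fmt

upload_schema("orders", "json", b'{"type": "object"}')
print(check_compatible("orders", "json"))     # True
print(check_compatible("orders", "avro"))     # False
print(check_compatible("raw-topic", "avro"))  # True (no schema registered)
```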

What are the Pros and Cons?

The pros and cons of Apache Pulsar are described below:

Pros:

  • Feature-rich: persistent and non-persistent topics
  • Multi-tenancy
  • More flexible client API, including CompletableFutures and a fluent interface
  • Java client components are thread-safe: a consumer can acknowledge messages from different threads

Cons:

  • The community base is small.
  • A reader can’t read the last message in the topic directly [it needs to skim through all the messages].
  • Higher operational complexity: ZooKeeper + broker nodes + BookKeeper, all clustered.
  • The Java clients have, to date, little to no Javadoc.

Apache Pulsar Multi-Layered Architecture


Difference between Apache Kafka and Apache Pulsar

| S.No. | Kafka | Apache Pulsar |
|---|---|---|
| 1 | More mature, with higher-level APIs. | Incorporates improved design elements on top of Kafka's existing capabilities. |
| 2 | Streaming is built on top of Kafka Streams. | Unified messaging model and API: streaming via exclusive/failover subscriptions, queuing via shared subscriptions. |
| 3 | Producer-topic-consumer group-consumer. | Producer-topic-subscription-consumer. |
| 4 | Restricts fluidity and flexibility. | Provides fluidity and flexibility. |
| 5 | Messages are deleted based on retention; if a consumer doesn't read messages before the retention period, it loses data. | Messages are deleted only after all subscriptions have consumed them, so no data is lost even if a subscription's consumers are down for a long time; messages can also be kept for a configured retention period even after all subscriptions have consumed them. |
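The deletion rules in row 5 can be contrasted with a toy model: a Kafka-style log drops messages past a retention window regardless of readers, while a Pulsar-style topic deletes a message only once every subscription has acknowledged it. This is purely illustrative.

```python
def kafka_retain(log, now, retention):
    """Keep only (timestamp, msg) entries still inside the retention window."""
    return [(ts, m) for ts, m in log if now - ts <= retention]

def pulsar_retain(msgs, acked_by):
    """Keep a message until every subscription has acknowledged it."""
    subs = list(acked_by)
    return [m for m in msgs if not all(m in acked_by[s] for s in subs)]

log = [(0, "a"), (5, "b"), (9, "c")]
print(kafka_retain(log, now=10, retention=4))   # [(9, 'c')] -> 'a' and 'b' are lost

msgs = ["a", "b", "c"]
acks = {"sub1": {"a", "b"}, "sub2": {"a"}}      # sub2's consumer is lagging
print(pulsar_retain(msgs, acks))                # ['b', 'c'] kept, no data loss
```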

Drawbacks of Kafka

  1. High Latency
  2. Poor Scalability
  3. Difficulty supporting global architecture (fulfilled by pulsar with the help of geo-replication)
  4. High OpEx (operation expenditure)

How Apache Pulsar is better than Kafka

  1. Pulsar has shown notable improvements in both latency and throughput when compared with Kafka: Pulsar is approximately 2.5 times faster and has about 40% lower latency than Kafka.
  2. Kafka, in many scenarios, has shown that it doesn’t cope well with thousands of topics and partitions, even if the data is not massive. Fortunately, Pulsar is designed to serve hundreds of thousands of topics in a single deployed cluster.
  3. Kafka stores data and logs in dedicated files and directories on each broker, which creates trouble at the time of scaling (files are loaded to disk periodically). In contrast, scaling is effortless in Pulsar because its brokers are stateless; Pulsar uses bookies to store data.
  4. Kafka brokers are designed to work together within a single region of the network, so it is not easy to work with a multi-datacenter architecture. Pulsar, on the other hand, offers geo-replication, with which users can easily replicate their data synchronously or asynchronously among any number of clusters.
  5. Multi-tenancy is a feature that can be of great use, as it provides different kinds of defined tenants specific to the needs of a particular client or organization. In layman’s terms, it is like defining a set of properties so that each property satisfies the needs of a specific group of clients/consumers.

Even though it looks like Kafka lags behind Pulsar, KIPs (Kafka Improvement Proposals) cover almost all of these drawbacks in their discussions, and users can hope to see the changes in upcoming versions of Kafka.

Kafka to Pulsar: users can easily migrate from Kafka to Pulsar, as Pulsar natively supports working directly with Kafka data through the connectors provided, or Kafka application data can be imported into Pulsar quite easily.

Pulsar SQL  uses Presto to query over the old messages that are kept in backlog (Apache BookKeeper).


Apache Pulsar is a powerful stream-processing platform that has learned from previously existing systems. Its layered architecture is complemented by a number of great out-of-the-box features, like multi-tenancy, zero-rebalancing downtime, geo-replication, proxying, durability, and TLS-based authentication/authorization. Compared to other platforms, Pulsar can give you the ultimate tools with more capabilities.

Original article source at:

#kafka #apache #architecture #benefits 

Apache Pulsar Architecture and Benefits
Rupert Beatty


Learn Overview Of Microservices and Service-Oriented Architecture

What is Service-Oriented Architecture?

  • Service-Oriented Architecture (SOA) is a software architectural style that structures an application by breaking it down into multiple components called services.
  • Each service represents a functional business domain.
  • In SOA applications, each service is independent and provides its own business purposes but can communicate with others across various platforms and languages.
  • SOA components are loosely coupled and use a central Enterprise Service Bus (ESB) to communicate.

What is a microservice?

  • On the other hand, a microservice is an architectural style that focuses on maintaining several independent services that work collectively to create an application.
  • Each individual service within a microservices architecture uses internal APIs to communicate.


  • Although SOA and microservices seem similar, they are still two different architecture types. Microservices are like a more fine-grained evolution of SOA.
  • One of their main differences is scope. Microservices are suited to smaller, modern web services.
  • Each service within a microservices architecture generally has one specific purpose, whereas components in SOA have more complex business purposes and functionality and are often implemented as subsystems.
  • SOA is therefore suited to larger enterprise application environments.
  • Another significant difference is how both architectures communicate. Every service in SOA communicates through an ESB. If this ESB fails, it compromises functionality across all services.
  • On the other hand, services within a microservice are entirely independent. If one fails, the rest of the services remain functional. Overall, Microservices are more error tolerant.
  • Today SOA applications are uncommon as it's an older architecture that may not be suitable for modern cloud-based applications. 
  • However, microservices were developed for the cloud-native movement, and most developers prefer the versatility of service independence they offer.
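The failure-mode difference described above can be sketched in a few lines: SOA services all route through one ESB, so an ESB outage stops everything, while microservices call each other directly and fail independently. Service names are invented for illustration.

```python
def call_via_esb(esb_up, service):
    """SOA style: every call crosses the central ESB, a single point of failure."""
    return f"{service}: ok" if esb_up else f"{service}: unreachable"

def call_direct(services_up, service):
    """Microservice style: services are independent; one outage stays local."""
    return f"{service}: ok" if services_up.get(service) else f"{service}: down"

# SOA: the ESB is down, so *every* service is unreachable.
print([call_via_esb(False, s) for s in ("billing", "search")])
# ['billing: unreachable', 'search: unreachable']

# Microservices: billing has failed, but search remains functional.
up = {"billing": False, "search": True}
print([call_direct(up, s) for s in ("billing", "search")])
# ['billing: down', 'search: ok']
```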

Original article source at:

#microservices #service #architecture 

Nigel Uys


Learn, Build and Deploy Microservices


Microservice Architecture:

From my previous blog, you must have got a basic understanding of Microservice Architecture. But being a professional with certified expertise in microservices requires more than just the basics. In this blog, you will get into the depth of the architectural concepts and implement them using an UBER case study.

In this blog, you will learn about the following:

  • Definition Of Microservice Architecture
  • Key Concepts Of Microservice Architecture
  • Pros And Cons Of Microservice Architecture
  • UBER – Case Study

You can refer to the What is Microservices, to understand the fundamentals and benefits of Microservices.

It will only be fair if I give you the definition of Microservices.

Definition Of Microservices

As such, there is no proper definition of Microservices aka Microservice Architecture, but you can say that it is a framework which consists of small, individually deployable services performing different operations.

Microservices focus on a single business domain that can be implemented as fully independent deployable services and implement them on different technology stacks.


Figure 1:  Difference Between Monolithic and Microservice Architecture – Microservice Architecture

Refer to the diagram above to understand the difference between monolithic and microservice architecture. For a better understanding of differences between both the architectures, you can refer to my previous blog What Is Microservices

To make you understand better, let me tell you some key concepts of microservice architecture.

Key Concepts Of  Microservice Architecture

Before you start building your own applications using microservices, you need to be clear about the scope and functionalities of your application.

Following are some guidelines to be followed while designing microservices.

Guidelines While Designing Microservices

  • As a developer, when you decide to build an application separate the domains and be clear with the functionalities.
  • Each microservice you design shall concentrate only on one service of the application.
  • Ensure that you have designed the application in such a way that each service is individually deployable.
  • Make sure that the communication between microservices is done via a stateless server.
  • Each service can be further refactored into smaller services, having their own microservices.

Now, that you have read through the basic guidelines while designing microservices, let’s understand the architecture of microservices. 

How Does Microservice Architecture Work?

A typical Microservice Architecture (MSA) should consist of the following components:

  1. Clients
  2. Identity Providers
  3. API Gateway
  4. Messaging Formats
  5. Databases
  6. Static Content
  7.  Management
  8. Service Discovery

Refer to the diagram below.


Figure 2:  Architecture Of Microservices – Microservice Architecture

I know the architecture looks a bit complex, but let me simplify it for you.

1. Clients

The architecture starts with different types of clients, from different devices trying to perform various management capabilities such as search, build, configure etc.

2. Identity Providers

Requests from the clients are first passed to the identity providers, which authenticate them. The authenticated requests are then communicated to the internal services via a well-defined API Gateway.

3. API Gateway

Since clients don’t call the services directly, API Gateway acts as an entry point for the clients to forward requests to appropriate microservices.

The advantages of using an API gateway include:

  • All the services can be updated without the clients knowing.
  • Services can also use messaging protocols that are not web-friendly.
  • The API Gateway can perform cross-cutting functions such as providing security, load balancing etc.
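The gateway role just described can be sketched minimally: clients hit one entry point, which matches the request path against a routing table and forwards it to the right microservice. The routes and service names are invented for illustration.

```python
# Routing table: path prefix -> owning microservice (hypothetical names).
ROUTES = {
    "/passengers": "passenger-service",
    "/drivers":    "driver-service",
    "/trips":      "trip-service",
}

def gateway(path):
    """Single entry point: match the path and forward to a microservice."""
    for prefix, service in ROUTES.items():
        if path.startswith(prefix):
            # Cross-cutting concerns (auth, load balancing, logging) go here,
            # so individual services can change without clients noticing.
            return f"forwarded to {service}"
    return "404: no matching service"

print(gateway("/trips/42"))   # forwarded to trip-service
print(gateway("/unknown"))    # 404: no matching service
```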

After receiving the requests of clients, the internal architecture consists of microservices which communicate with each other through messages to handle client requests.

4. Messaging Formats

There are two types of messages through which they communicate:

  • Synchronous Messages: in situations where clients wait for the response from a service, microservices usually use REST (Representational State Transfer), as it relies on the stateless, client-server HTTP protocol. REST suits a distributed environment in which each and every functionality is represented by a resource to carry out operations.
  • Asynchronous Messages: In the situation where clients do not wait for the responses from a service, Microservices usually tend to use protocols such as AMQP, STOMP, MQTT. These protocols are used in this type of communication since the nature of messages is defined and these messages have to be interoperable between implementations.
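The two styles above can be contrasted with a toy illustration: a synchronous call blocks until the service returns, while an asynchronous publish just enqueues the message for a worker to handle later. The queue stands in for an AMQP/MQTT broker; the service and order names are illustrative.

```python
from queue import Queue

def billing_service(order):
    """A stand-in microservice that produces a result for an order."""
    return f"invoice for {order}"

# Synchronous (REST-style): the caller waits for the response.
response = billing_service("order-1")

# Asynchronous (AMQP/MQTT-style): fire and forget via a message queue.
broker = Queue()
broker.put("order-2")                     # the client returns immediately

processed = []
while not broker.empty():                 # a worker drains the queue later
    processed.append(billing_service(broker.get()))

print(response)    # invoice for order-1
print(processed)   # ['invoice for order-2']
```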

The next question that may come to your mind is how do the applications using Microservices handle their data?

5. Data Handling

Well, each microservice owns a private database to capture its data and implement the respective business functionality. Also, the databases of microservices are updated through their service API only. Refer to the diagram below:

Figure 3:  Representation Of Microservices Handling Data – Microservice Architecture

The services provided by Microservices are carried forward to any remote service which supports inter-process communication for different technology stacks.
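The database-per-service rule above can be sketched as follows: each microservice owns a private store, and other services may read or change that data only through the owning service's API, never by touching the store directly. The class and method names are invented for illustration.

```python
class BillingService:
    """Owns its data; the API below is the only way in or out."""

    def __init__(self):
        self._db = {}                 # private database: service API only

    def create_invoice(self, trip_id, amount):
        self._db[trip_id] = amount    # the only write path

    def get_invoice(self, trip_id):
        return self._db.get(trip_id)  # the only read path

billing = BillingService()
billing.create_invoice("trip-42", 18.50)
print(billing.get_invoice("trip-42"))   # 18.5
```

Because every access goes through the API, the service can swap its storage technology without any other service noticing, which is one reason each microservice is free to pick its own database.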

6. Static Content

After the Microservices communicate within themselves, they deploy the static content to a cloud-based storage service that can deliver them directly to the clients via Content Delivery Networks (CDNs).

Apart from the above components, there are some other components that appear in a typical Microservice Architecture:

7. Management

This component is responsible for balancing the services on nodes and identifying failures.

8. Service Discovery

Acts as a guide to Microservices to find the route of communication between them as it maintains a list of services on which nodes are located.
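The registry described above can be sketched minimally: it maps service names to the nodes they currently run on, so callers look up a route instead of hard-coding addresses. The entries and addresses are invented for illustration.

```python
import random

# Service registry: service name -> nodes it is currently running on.
registry = {
    "trip-service":      ["10.0.0.5:8080", "10.0.0.6:8080"],
    "passenger-service": ["10.0.0.7:8080"],
}

def discover(service):
    """Return a node address for the service (naive client-side balancing)."""
    nodes = registry.get(service, [])
    if not nodes:
        raise LookupError(f"no nodes registered for {service}")
    return random.choice(nodes)

print(discover("passenger-service"))   # 10.0.0.7:8080
```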

Now, let’s look into the pros and cons of this architecture to gain a better understanding of when to use this architecture.

Pros and Cons of Microservice Architecture

Refer to the table below.


| Pros Of Microservice Architecture | Cons Of Microservice Architecture |
|---|---|
| Freedom to use different technologies | Increases troubleshooting challenges |
| Each microservice focuses on a single business capability | Increases delay due to remote calls |
| Supports individually deployable units | Increased effort for configuration and other operations |
| Allows frequent software releases | Difficult to maintain transaction safety |
| Ensures security of each service | Tough to track data across various service boundaries |
| Multiple services are developed and deployed in parallel | Difficult to move code between services |

Let us understand more about Microservices by comparing UBER’s previous architecture to the present one.


UBER’s Previous Architecture

Like many startups, UBER began its journey with a monolithic architecture built for a single offering in a single city. Having one codebase seemed clean at the time and solved UBER’s core business problems. However, as UBER started expanding worldwide, it faced various problems with respect to scalability and continuous integration.


  Figure 4:  Monolithic Architecture Of UBER – Microservice Architecture

The above diagram depicts UBER’s previous architecture.

  • A REST API is present with which the passenger and driver connect.
  • Three different adapters are used with API within them, to perform actions such as billing, payments, sending emails/messages that we see when we book a cab.
  • A MySQL database to store all their data.

So, if you notice here all the features such as passenger management, billing, notification features, payments, trip management and driver management were composed within a single framework.

Problem Statement

While UBER was expanding worldwide, this kind of framework introduced various challenges. The following are some of the prominent ones:

  • All the features had to be re-built, deployed and tested again and again to update a single feature.
  • Fixing bugs became extremely difficult in a single repository as developers had to change the code again and again.
  • Scaling the features simultaneously with the introduction of new features worldwide was quite tough to be handled together.


To avoid such problems, UBER decided to change its architecture and follow other hyper-growth companies like Amazon, Netflix, Twitter, and many others. Thus, UBER decided to break its monolithic architecture into multiple codebases to form a microservice architecture.

Refer to the diagram below to look at UBER’s microservice architecture.


Figure 5:  Microservice Architecture Of UBER – Microservice Architecture

  • The major change that we observe here is the introduction of API Gateway through which all the drivers and passengers are connected. From the API Gateway, all the internal points are connected such as passenger management, driver management, trip management and others.
  • The units are individual separate deployable units performing separate functionalities.
    • For Example: If you want to change anything in the billing Microservices, then you just have to deploy only billing Microservices and don’t have to deploy the others.
  • All the features were now scaled individually i.e. The interdependency between each and every feature was removed.
    • For example, we all know that the number of people searching for cabs is comparatively greater than the number of people actually booking a cab and making payments. From this we can infer that the number of processes working on the passenger-management microservice is greater than the number of processes working on payments.

In this way, UBER benefited by shifting its architecture from monolithic to Microservices.

I hope you have enjoyed reading this post on Microservice Architecture. I will be coming up with more blogs, which will contain hands-on as well.

If you wish to learn Microservices and build your own applications, then check out our Microservices Architecture Training which comes with instructor-led live training and real-life project experience. This training will help you understand Microservices in depth and help you achieve mastery over the subject.

Got a question for us? Please mention it in the comments section of ” Microservice Architecture” and I will get back to you.

Original article source at:

#microservices #architecture 
