Tia Gottlieb


Leveraging Webhooks for Real-time Data Warehousing

Introduction to Webhooks, an event-driven alternative to Polling


For years, the trend has been towards service-oriented architectures: adaptive applications built from self-contained services, nowadays often implemented as microservices. These independent services communicate through a wide range of API technologies over lightweight protocols. This architectural approach is a fundamental part of today’s cloud and serverless computing, where servers, including databases, tend to disappear from view. They are still used, of course, but the idea is that developers no longer need to be aware of them. For data warehousing, this means that APIs have also started to replace databases as the access point for retrieving and integrating data.



Most cloud-based applications expose REST APIs these days, allowing other systems to retrieve or manipulate data. However, as you might know from experience, repeatedly requesting data over REST APIs has several negative implications, above all when you are dealing with huge amounts of data and/or want to retrieve only changed data, preferably in (near) real time. The most ubiquitous way to accomplish this is polling: obtaining updates by constantly sending requests to an API without knowing the server’s state or whether anything has changed in the first place. The API provider Zapier did a very interesting study across 30 million poll requests made through their services and found that 98.5% of polls are wasted. Apart from this inefficiency, polling may also degrade the overall performance of the polled application. Not to mention the mechanisms required to compare state between requests to find changes, to work within request rate limits, or to detect deleted records. The good news is that it doesn’t have to be like this.
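
To make the inefficiency concrete, here is a minimal, self-contained sketch of a polling loop in Python. The `fetch_resource` function is a stand-in for an HTTP GET against a REST endpoint, and the 100-request budget and hash-based state comparison are illustrative assumptions, not a real client:

```python
import hashlib

# Hypothetical stand-in for a REST endpoint; in reality this would be
# an HTTP GET. Here the data changes only once, at "request" 50.
def fetch_resource(tick: int) -> str:
    return '{"orders": 42}' if tick < 50 else '{"orders": 43}'

def poll(n_requests: int = 100) -> tuple[int, int]:
    """Poll n_requests times, comparing response hashes to find changes."""
    last_hash = None
    changes = wasted = 0
    for tick in range(n_requests):
        body = fetch_resource(tick)
        digest = hashlib.sha256(body.encode()).hexdigest()
        if digest == last_hash:
            wasted += 1        # nothing new -- this request was wasted
        else:
            changes += 1       # state comparison detected a change
            last_hash = digest
    return changes, wasted

changes, wasted = poll()
print(f"{changes} changes detected, {wasted} of 100 polls wasted")
```

Even in this toy setup the client must keep state (the last hash) just to notice that nothing happened, and the vast majority of requests come back unchanged, much like the numbers in Zapier’s study.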



Instead of polling, you can subscribe and listen to retrieve event-triggered changes in real-time, just like push notifications. How does that sound?
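
Subscribing typically means registering a callback URL with the provider once, through its API. The sketch below builds such a subscription payload, modeled loosely on GitHub’s “create a repository webhook” endpoint (POST /repos/{owner}/{repo}/hooks); the callback URL and event list are placeholders:

```python
import json

def build_subscription(callback_url: str, events: list[str]) -> dict:
    """Payload for a GitHub-style webhook subscription request."""
    return {
        "name": "web",         # GitHub expects "web" for repository webhooks
        "active": True,
        "events": events,      # e.g. ["push", "pull_request"]
        "config": {"url": callback_url, "content_type": "json"},
    }

payload = build_subscription("https://example.com/hooks/github", ["push"])
print(json.dumps(payload, indent=2))
```

After this one-time registration, the provider takes over: every matching event is pushed to the callback URL, and the client never has to ask again.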

There are many real-time web technologies around, such as Webhooks, WebSockets, server-sent events, long polling, Comet, etc. They are the backbone of almost all modern web applications nowadays. For event notifications in particular, however, Webhooks have seen the widest adoption. For instance, GitHub moved all their services over to Webhooks, which enable their APIs to push streams of events via HTTP POST requests to a configured callback URL (the webhook). There is no need to constantly poll anymore, which is why Webhooks are also often referred to as “reverse APIs”. Most modern Webhooks essentially boil down to listening for any changes to data and then automatically sending them to another HTTP endpoint. Such event-driven APIs are therefore a perfect fit for data warehousing.
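
On the receiving side, a webhook endpoint is just an HTTP handler that accepts the provider’s POST requests and hands each payload to the loading pipeline. The following sketch uses only Python’s standard library; the payload fields (`action`, `data`) and the in-memory staging list are illustrative placeholders for a real warehouse loader:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Toy "warehouse" staging area; a real pipeline would write to a
# database or cloud storage instead of an in-memory list.
WAREHOUSE = []

def handle_event(payload: dict) -> None:
    """Load one change event into the warehouse staging area."""
    WAREHOUSE.append({"action": payload.get("action"),
                      "data": payload.get("data")})

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The provider pushes each event as an HTTP POST to this callback URL.
        length = int(self.headers.get("Content-Length", 0))
        handle_event(json.loads(self.rfile.read(length)))
        self.send_response(204)  # acknowledge quickly, no response body
        self.end_headers()

# To run the receiver:
#   HTTPServer(("", 8000), WebhookHandler).serve_forever()
```

In production you would acknowledge immediately and do the heavy transformation and loading asynchronously, so that a slow warehouse write never causes the provider to time out and retry.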

#big-data #web-development #data-analysis #data-science #programming

Ian Robinson


4 Real-Time Data Analytics Predictions for 2021

Data management, analytics, data science, and real-time systems will converge this year, enabling new automated and self-learning solutions for real-time business operations.

The global pandemic of 2020 has upended social behaviors and business operations. Working from home is the new normal for many, and technology has accelerated and opened new lines of business. Retail and travel have been hit hard, and tech-savvy companies are reinventing e-commerce and in-store channels to survive and thrive. In biotech, pharma, and healthcare, analytics command centers have become the center of operations, much like network operation centers in transport and logistics during pre-COVID times.

While data management and analytics have been critical to strategy and growth over the last decade, COVID-19 has propelled these functions into the center of business operations. Data science and analytics have become a focal point for business leaders to make critical decisions like how to adapt business in this new order of supply and demand and forecast what lies ahead.

In the next year, I anticipate a convergence of data, analytics, integration, and DevOps to create an environment for rapid development of AI-infused applications that address business challenges and opportunities. We will see a proliferation of API-led microservices developer environments for real-time data integration; the emergence of data hubs as a bridge between at-rest and in-motion data assets; and event-enabled analytics with deeper collaboration between data scientists, DevOps, and ModelOps developers. From this, an ML engineer persona will emerge.

#analytics #artificial intelligence technologies #big data #big data analysis tools #from our experts #machine learning #real-time decisions #real-time analytics #real-time data #real-time data analytics

iOS App Dev


Your Data Architecture: Simple Best Practices for Your Data Strategy

If you accumulate data on which you base your decision-making as an organization, you should probably think about your data architecture and possible best practices.

If you accumulate data on which you base your decision-making as an organization, you most probably need to think about your data architecture and consider possible best practices. Gaining a competitive edge, remaining customer-centric to the greatest extent possible, and streamlining processes to get on-the-button outcomes can all be traced back to an organization’s capacity to build a future-ready data architecture.

In what follows, we offer a short overview of the overarching capabilities of data architecture. These include user-centricity, elasticity, robustness, and the capacity to ensure the seamless flow of data at all times. Added to these are automation enablement, plus security and data governance considerations. These points form our checklist for what we perceive to be an anticipatory analytics ecosystem.

#big data #data science #big data analytics #data analysis #data architecture #data transformation #data platform #data strategy #cloud data platform #data acquisition

iOS App Dev


Apache Hudi: How Uber Gets Data a Ride to its Destination

Apache Hudi provides tools to ingest data into HDFS or cloud storage, and is designed to get data into the hands of users and analysts quickly.

At a busy, data-intensive enterprise such as Uber, the volumes of real-time data that need to move through its systems on a minute-by-minute basis reach epic proportions. This calls for a data lake extraordinaire, in which data can immediately be extracted and leveraged across a range of functions, from back-end business applications to front-end mobile apps. Uber depends on up-to-the-minute bookings and alerts as part of its appeal to customers, so its reliance on real-time data streaming platforms is off the charts. It has turned to Apache Hudi, an emerging platform that brings stream processing to big data, providing fresh data while being an order of magnitude more efficient than traditional batch processing.

I recently had the opportunity to moderate a webcast about Apache Hudi with Nishith Agarwal and Sivabalan Narayanan, both engineers with Uber. Both Agarwal and Narayanan are active members of the Hudi programming committee.

The Hudi data lake project was originally developed at Uber in 2016, open-sourced in 2017, and submitted to the Apache Incubator in January 2019. Apache Hudi data lake technology enables stream processing on top of Apache Hadoop compatible cloud stores and distributed file systems. The solution provides tools to ingest data onto HDFS or cloud storage, as well as an incremental approach to resource-intensive ETL, Hive, or Spark jobs. It is designed to get data into the hands of users and analysts much more quickly.

#analytics #big data #big data platforms #data management #expert systems #from our experts #real-time decisions #real-time applications #real-time data

Gerhard Brink


Testing Tools and Considerations for Real-Time Applications - RTInsights

Robust testing means that your real-time application is more stable and reliable than ever before.

When building a real-time application, any engineer would agree that testing the application is half the battle. Creating tests that fully cover every scenario is challenging and time-consuming.

Applications are becoming faster and easier to build in new low-code environments. As the application creation process is revolutionized, the test creation process must be quick to follow; otherwise, the quality of applications will begin to suffer.

Rethinking Test Inputs and Outputs

Low-Code Test Development

Event Mocking

Capture Events and Replay

A New Era of Testing

#application performance #big data #big data analysis tools #big data architectures #big data platforms #real-time decisions #events #low code #real-time data

Gerhard Brink


Getting Started With Data Lakes

Frameworks for Efficient Enterprise Analytics

The opportunities big data offers also come with very real challenges that many organizations are facing today. Often, it’s finding the most cost-effective, scalable way to store and process boundless volumes of data in multiple formats that come from a growing number of sources. Then organizations need the analytical capabilities and flexibility to turn this data into insights that can meet their specific business objectives.

This Refcard dives into how a data lake helps tackle these challenges at both ends — from its enhanced architecture that’s designed for efficient data ingestion, storage, and management to its advanced analytics functionality and performance flexibility. You’ll also explore key benefits and common use cases.


As technology continues to evolve with new data sources, such as IoT sensors and social media churning out large volumes of data, there has never been a better time to discuss the possibilities and challenges of managing such data for varying analytical insights. In this Refcard, we dig deep into how data lakes solve the problem of storing and processing enormous amounts of data. While doing so, we also explore the benefits of data lakes, their use cases, and how they differ from data warehouses (DWHs).

This is a preview of the Getting Started With Data Lakes Refcard. To read the entire Refcard, please download the PDF from the link above.

#big data #data analytics #data analysis #business analytics #data warehouse #data storage #data lake #data lake architecture #data lake governance #data lake management