Noah Rowe


What Is the Next Stop For Big Data?

Because of their different emphases, traditional databases fall into two categories: transaction-focused OLTP systems and analytics-focused OLAP systems. As the Internet developed, data volumes grew exponentially, and single-machine databases could no longer meet business needs. The pressure is especially acute in analytics, where a single query may need to process a large part of the data or even all of it. This contributed to the big data revolution that began with Hadoop over the past decade or so and addressed the demand for massive-scale data analytics. At the same time, several distributed database products emerged to cope with the growth of data in OLTP scenarios.

[Figure: distributed database vs. big data system]

To analyze the data in an OLTP system, the standard practice is to synchronize it to an OLAP system at regular intervals (e.g., every day). This architecture ensures that analytical queries do not affect online transactions. However, periodic synchronization means that analytics results are not based on the latest data, and this delay costs us the opportunity to make more timely business decisions. To solve this problem, the HTAP architecture has emerged in recent years. It allows us to analyze the data in the OLTP database directly, which ensures the timeliness of analytics. Analytics is no longer a capability unique to traditional OLAP systems or big data systems. A natural question follows: since HTAP can do analytics, will it replace big data systems? What is the next stop for big data?
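To make the staleness cost of periodic synchronization concrete, here is a minimal sketch in Python. It assumes syncs run on a fixed schedule and complete instantly, which is a simplification. The function names and the dates in the usage note are illustrative and do not come from any particular product.

```python
from datetime import datetime, timedelta


def last_sync_before(query_time, first_sync, sync_interval):
    """Return the most recent scheduled sync at or before query_time.

    Assumes syncs run at first_sync, first_sync + interval, and so on,
    and that each sync completes instantly (a simplification).
    """
    if query_time < first_sync:
        raise ValueError("no sync has happened yet")
    completed = (query_time - first_sync) // sync_interval  # whole intervals elapsed
    return first_sync + completed * sync_interval


def staleness(query_time, first_sync, sync_interval):
    """How old the newest data visible to the OLAP system is at query_time."""
    return query_time - last_sync_before(query_time, first_sync, sync_interval)
```

With a daily sync that started at midnight on 2024-01-01, calling `staleness(datetime(2024, 1, 3, 18, 30), datetime(2024, 1, 1), timedelta(days=1))` shows that a query issued at 18:30 only sees data as of that day's midnight, so the result lags the transaction system by 18.5 hours.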

Background Introduction

To answer this question, we will take the recommendation system as an example to analyze the typical scenarios of big data systems.

When a shopping application shows you exactly what you want to buy, or a short-video application plays your favorite music, a recommendation system is working its magic. The core goal of an advanced recommendation system is to make personalized recommendations based on users' real-time behavior. Each interaction between a user and the system optimizes the next experience in real time. To support such a system, the big data technology stack has evolved into a very complex and fragmented one.

The following figure shows a big data technology stack that supports a real-time recommendation system.

To provide high-quality, real-time personalized recommendations, the recommendation system relies heavily on real-time features and continuously updated models.

Real-time features can be divided into two categories:

  • The system collects massive volumes of user behavior events (such as browsing and clicking) and transaction records (such as payment records synchronized from the OLTP database). The volume of this data is very large (traffic may reach tens of millions or even hundreds of millions of events per second), and most of it does not come from the transaction system. For convenient later use, the data is imported into the system (a in the figure) and, at the same time, joined with various dimension tables to derive a series of important features (1 in the figure), which are pushed to the recommendation system in real time to optimize the user experience. This real-time dimension table join requires low-latency, high-throughput point lookups to keep up with newly generated data.
  • The system also uses sliding windows and similar methods to compute features across various dimensions and time granularities (such as the number of clicks in the past 5 minutes, the number of views in the past 7 days, and the sales of a particular commodity in the past 30 days). Depending on the granularity of the sliding window, these aggregations may be computed through stream processing or batch processing.
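The two kinds of real-time features described above can be sketched in a few lines of Python. This is a toy, in-memory illustration and not how a production stream processor works: the dimension table `ITEM_DIM`, the event fields, and the class `SlidingWindowCounter` are all hypothetical names, and the counter assumes timestamps arrive in non-decreasing order for each key.

```python
from collections import defaultdict, deque

# Hypothetical dimension table: item_id -> descriptive attributes.
ITEM_DIM = {
    "sku-1": {"category": "shoes"},
    "sku-2": {"category": "books"},
}


def enrich(event, dim_table):
    """Join one event with the dimension table via a point lookup on item_id."""
    attrs = dim_table.get(event["item_id"], {})
    return {**event, **attrs}


class SlidingWindowCounter:
    """Count events per key over the trailing window_seconds.

    Assumes timestamps for a given key arrive in non-decreasing order.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._timestamps = defaultdict(deque)

    def add(self, key, ts):
        self._timestamps[key].append(ts)

    def count(self, key, now):
        q = self._timestamps[key]
        cutoff = now - self.window_seconds
        # Evict timestamps that have fallen out of the window.
        while q and q[0] <= cutoff:
            q.popleft()
        return len(q)


# Toy click stream; ts is in seconds.
events = [
    {"item_id": "sku-1", "ts": 0},
    {"item_id": "sku-1", "ts": 100},
    {"item_id": "sku-2", "ts": 150},
    {"item_id": "sku-1", "ts": 310},
]

clicks_5m = SlidingWindowCounter(window_seconds=300)
enriched = []
for event in events:
    row = enrich(event, ITEM_DIM)          # feature type 1: dimension join
    enriched.append(row)
    clicks_5m.add(row["item_id"], row["ts"])  # feature type 2: windowed count
```

Calling `clicks_5m.count("sku-1", now=320)` then gives the number of clicks on that item in the trailing 5 minutes. At real traffic volumes the dictionary lookup would be replaced by a low-latency point lookup against a serving store, and long windows such as 30 days would more likely be computed in batch.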

This data is also used to generate real-time and offline machine learning samples, and the trained models are continuously deployed to the recommendation system after validation.

What is described above is the core of an advanced recommendation system, but it is only the tip of the iceberg. A complete set of supporting systems for real-time model monitoring, validation, analytics, and tuning is also needed. These include a real-time dashboard for viewing A/B test results (3) and interactive analytics (4) for BI and for refining and tuning the model. In addition, operations staff use various complex queries to gain insight into how the business is progressing and to carry out targeted marketing through customer targeting and product recommendations.

This example shows a very complex but typical big data scenario: from real-time data import (a) to pre-aggregation (b), and from data serving (1) and continuous aggregation (3) to interactive queries (4) and batch processing (2). Such complex scenarios place very diverse requirements on big data systems. We have seen two new trends in the practice of building these systems.

Real-time: Businesses need to gain insight quickly from data that has just been collected. Written data needs to be visible within seconds or even less than a second. The lengthy offline ETL process is becoming intolerable. At the same time, the collected data is much larger than the data synchronized from the OLTP system; event logs such as user browsing and clicking can be several orders of magnitude larger. The system needs to provide low-latency queries while ingesting data at extremely high throughput.

Hybrid serving and analytics: Traditional OLAP systems often play a relatively static role in the business. We gain business insight (such as pre-calculated views and models) by analyzing data, and then provide online data services through another system based on that knowledge. Serving and analytics are separate processes here. In contrast, the ideal business decision-making process is an online process of continuous optimization. Serving generates a large amount of new data, and we need to run complex analytics on it. The insight produced by the analytics is fed back into serving in real time to create greater commercial value. Serving and analytics form a closed loop.

Existing solutions address the need for converged real-time serving and analytics by combining a series of products. For example, Apache Flink performs real-time pre-aggregation, the aggregated data is stored in a product such as Apache Druid that provides multi-dimensional analytics, and data services are provided through a product such as Apache HBase. This siloed ("chimney") development model inevitably creates isolated data islands and causes unnecessary data duplication. The complex data synchronization between products also makes data consistency and security a challenge. This complexity makes it hard for application development to respond quickly to new requirements, slows business iteration, and adds substantial overhead to development, operations, and maintenance.

[Figure: big data system]

We believe that real-time service/analytics integration should be implemented through a unified Hybrid Serving/Analytical Processing (HSAP) system.

With such a system, application developers no longer need to deal with multiple products or learn and work around the problems and limitations of each one. This can greatly simplify the business architecture and improve development and operations efficiency. A unified system also avoids unnecessary data duplication and thus saves costs. At the same time, this architecture can bring second-level or even sub-second freshness to the system, making business decisions more timely and allowing data to deliver greater commercial value.

Although a distributed HTAP system is capable of real-time analytics, it cannot solve the problems of big data.

First of all, the data synchronized from the transaction system is only a small part of the data that a real-time recommendation system needs to process. Most of the remaining data comes from non-transactional sources such as logs (users often have dozens or even hundreds of browsing actions before each purchase), and most analytics are run on this non-transactional data. The HTAP system does not hold this data, so it cannot analyze it.

Can this non-transactional data be written into the HTAP system for analytics? Let's compare how data is written in HTAP and HSAP systems. The cornerstone and key advantage of an HTAP system is its support for fine-grained distributed transactions. Transactional data is typically written into an HTAP system as many small distributed transactions. However, data from logs and similar sources does not have the semantics of fine-grained distributed transactions. Importing this non-transactional data into an HTAP system would inevitably incur unnecessary overhead.
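The difference between the two write patterns can be illustrated with a small, single-node sketch. It uses Python's built-in `sqlite3` module purely as a stand-in: SQLite is neither an HTAP nor an HSAP system, and the sketch does not model distributed transaction cost. It only counts how many commits each pattern issues for the same log data. The table `click_log` and all function names are hypothetical.

```python
import sqlite3


def make_log_table():
    """Create an in-memory database with a hypothetical click-log table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE click_log (user_id INTEGER, item_id TEXT)")
    conn.commit()
    return conn


def write_one_transaction_per_event(conn, events):
    """OLTP-style writes: every event is committed as its own small transaction."""
    commits = 0
    for event in events:
        conn.execute(
            "INSERT INTO click_log (user_id, item_id) VALUES (?, ?)", event
        )
        conn.commit()
        commits += 1
    return commits


def write_in_batches(conn, events, batch_size=1000):
    """Log-style ingestion: append events in large batches, one commit per batch."""
    commits = 0
    for start in range(0, len(events), batch_size):
        batch = events[start:start + batch_size]
        conn.executemany(
            "INSERT INTO click_log (user_id, item_id) VALUES (?, ?)", batch
        )
        conn.commit()
        commits += 1
    return commits


def row_count(conn):
    return conn.execute("SELECT COUNT(*) FROM click_log").fetchone()[0]


# 2,500 synthetic click events as (user_id, item_id) tuples.
events = [(i % 50, f"sku-{i % 7}") for i in range(2500)]
```

Both functions store the same rows, but the first pays transaction overhead once per event while the second pays it once per batch. In a distributed system, where each commit may involve coordination across nodes, that per-event cost is the unnecessary overhead the paragraph above refers to.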

