Intro

This is a continuation of the previous part, where there was described a problem statement related to the analysis of IoT data on the example of fitness tracking activities. There were also described the reasoning behind the storage type selection and recommended to have 2 types of storages:

  1. Big data storage to store the Time-Series kind of data
  2. Relational analytics database giving the maximum flexibility for the data analytics. The more details are here.

The purpose of the story is to describe the recommended data model for the second type of storage — relational analytics storage. The logical data model for the data warehouse part.

Modelling Techniques

Fortunately, the Data warehousing concept is already on the market for decades so it is well matured and many thought leaders contributed to best practices and patterns creation around solution building and data modeling.

Let’s use Ralph’s Kimball methodology to approach the modeling and a couple of patterns for particular design decisions.

From the methodology standpoint we going to do the following steps:

  1. Select a list of business processes describing our fitness tracking case
  2. For each process we define:
  • Granularity — the level of details on which the data being stored
  • List of dimensions describing the context of the business process, answering the questions who, what, when, where, why, and how
  • List of measures — quantitative numeric values measuring the fact according to the grain

We gather these details in a **Bus matrix, **which isthe first high-level version of the model described by the business terms and is well suitable and recommended for sharing and discussing with all kinds of stakeholders.

The particular design patterns used to create a logical data model will be:

  • Star Schema as the main model layout, which gives the maximum flexibility and efficiency. It is also well scalable, which is important for the IoT use case.
  • **Periodic Snapshot Fact table **— a fact table where each row aggregates measurements occurring over a certain predefined period. The grain is the period, not the individual transaction
  • **Accumulating Snapshot Fact table **— a fact table where each rowaggregates the measurement events occurring at predictable steps between the beginning and the end of a process.

#ralph-kimball #data-analytics #data-warehouse #data-modeling #data-science

IOT Data Analytics: Data Model
4.65 GEEK