Everyone working in or around data analytics has surely heard of the buzzword spreading like wildfire: Data Mesh.

What is Data Mesh?

Data Mesh is one of the hottest topics in data analytics at the moment. It has found its way onto the lists of trends predicted to disrupt and dominate the data market in 2021.

But what exactly is Data Mesh, and why are more and more companies looking to implement the latest trend in the data industry? Below, we’ll explore what Data Mesh means and whether you, too, should mesh it up (title reference to Barr Moses’s article What is a Data Mesh — and How Not to Mesh it Up).

In the age of data as a first-class citizen, every enterprise strives to be data-driven, pouring hefty investments into data platforms and enablers. However, the traditional data warehouse or data lake, with its limited real-time streaming capabilities, is no match for ever-growing data demands.

The need for democratisation and scalability in data pipelines exposes the faults in legacy systems and conflicting business priorities. Fortunately, there is a new enterprise data architecture on the rise that gives a new lease of life to bulky, fragile data pipelines. Data Mesh introduces a way of seeing data not as a by-product but as a set of decentralised, self-contained data products.

Software development was the first discipline to transition from monolithic applications to microservice architectures. We are now seeing the data industry follow suit, moving away from massive data teams and centralised, monolithic data lakes and databases towards an architecture that treats data domains and data products as first-class citizens.

This paradigm shift in data architecture means data teams have to provide sharing, publishing, discoverability and, crucially, interoperability of all data assets within the mesh. More importantly, this pivot drives teams to prioritise the outcomes and products they deliver to the business, instead of obsessing over the underlying technology stack.
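
To make that concrete, here is a minimal, hypothetical sketch of the kind of metadata a self-describing data product might carry to support discoverability and interoperability. The field names (domain, output_port, schema_ref, and so on) are illustrative assumptions, not part of any standard.

```python
from dataclasses import dataclass, field

# Hypothetical descriptor for a self-contained data product.
# Field names are illustrative assumptions, not an established standard.
@dataclass
class DataProduct:
    name: str          # discoverable identifier, e.g. "orders.daily-summary"
    domain: str        # the business domain that owns the product
    owner: str         # data product owner accountable for quality
    output_port: str   # where consumers read the data (topic, table, API)
    schema_ref: str    # pointer to a published, versioned schema
    sla: str           # freshness/availability promise made to consumers
    tags: list[str] = field(default_factory=list)  # aids catalog discovery

# A domain team publishes its data as a product, not as a by-product:
orders_summary = DataProduct(
    name="orders.daily-summary",
    domain="orders",
    owner="orders-team@example.com",
    output_port="kafka://orders.daily-summary.v1",
    schema_ref="schema-registry://orders.daily-summary-value/versions/3",
    sla="updated daily by 06:00 UTC",
    tags=["orders", "finance", "daily"],
)
```

The point of such a descriptor is that a consumer in another domain can find, understand, and trust the data without ever talking to the producing team's engineers.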

Zhamak Dehghani, a ThoughtWorks consultant and the original architect of Data Mesh, defines the concept as a data platform architecture that embraces the ubiquity of data in the enterprise by leveraging a domain-oriented, self-serve design. However, Zhamak emphasises that her support for a domain-oriented approach doesn’t mean she is “advocating for a fragmented, siloed domain-oriented data often hidden in the bowels of operational systems; siloed domain data that is hard to discover, make sense of and consume”, nor is she “advocating for multiple fragmented data warehouses that are the results of years of accumulated tech debt.” She argues, however, that the response to these accidental silos of unreachable data is not to create a centralised data platform with a centralised team that owns and curates the data from all domains, because that approach doesn’t scale.

Zhamak Dehghani instead finds that a paradigm shift is necessary to solve these architectural failure modes. This shift sits at the intersection of techniques instrumental in building modern distributed architecture at scale: techniques that the tech industry at large has adopted at an accelerated rate and that have created successful outcomes. Dehghani’s idea of the next enterprise data platform architecture exists in the convergence of Distributed Domain-Driven Architecture, Self-serve Platform Design, and Product Thinking with Data.

The promise and premise of Data Mesh

Although Data Mesh is getting a lot of attention, the fundamental ideas are not new. Many forward-thinking organisations have already implemented them. Daniel Tidström, Partner & Management Consultant at Data Edge, has been working with parts of the approach for quite some time.

According to Daniel, Data Mesh becomes crucial when a company scales quickly.

“With the proliferation of data sources and data consumers, having one central team to manage and own data ingestion, data transformation and serving data to all potential stakeholders will inevitably lead to scaling issues,” states Daniel. “Given the increasing importance of data in our organisations, designing for scalable teams and scalable platforms is really crucial. This is a recognised problem in other areas of software engineering so I can’t see why data must still live inside a monolith.”

The alternative would be to scale out the team by hiring more data engineers, but everyone in the industry knows that good data engineers are hard to find. So it makes perfect sense to opt for a distributed data architecture at scale.

In companies where domain-driven development and microservice architectures are already in place, it also makes sense to consider moving the ownership of data into the domains, Daniel explains.

Daniel Tidström is currently working with a client that has implemented a Kafka infrastructure binding all domains together. To work with the data, they need to be able to manage service-level agreements, know what is published, understand what each schema looks like, and track how the schemas evolve.
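
As a minimal sketch of what that kind of schema management can look like in practice, here is how a domain team might inspect a published schema and register an evolved version using the Schema Registry client from the confluent-kafka Python package. The registry URL, subject name, and Avro schema are assumptions for illustration, not details from Daniel's project.

```python
from confluent_kafka.schema_registry import Schema, SchemaRegistryClient

# Hypothetical registry URL and subject name; adjust for your environment.
client = SchemaRegistryClient({"url": "http://localhost:8081"})
subject = "orders.daily-summary-value"

# Know what is published: fetch the latest registered schema for the subject.
latest = client.get_latest_version(subject)
print(f"{subject} v{latest.version}: {latest.schema.schema_str}")

# Evolve the schema: register a new version that adds an optional field,
# a backward-compatible change under the registry's default settings.
evolved = Schema(
    """
    {
      "type": "record",
      "name": "DailySummary",
      "fields": [
        {"name": "order_count", "type": "long"},
        {"name": "total_amount", "type": "double"},
        {"name": "currency", "type": ["null", "string"], "default": null}
      ]
    }
    """,
    schema_type="AVRO",
)
schema_id = client.register_schema(subject, evolved)
print(f"Registered new schema version with id {schema_id}")
```

Keeping schemas versioned and compatibility-checked in a registry is what lets consumers in other domains rely on a data product without coordinating every change with its producers.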

Everything Daniel and his team are doing points in the Data Mesh direction, even if they don’t necessarily call it Data Mesh. What is crucial is that the data product owners and the domains treat data as a first-class citizen and deliver it as a product.
