Delta Sharing is an open protocol for securely sharing data across organizations in real time, completely independent of the platform on which the data resides.

Databricks made several announcements at this week’s Data + AI Summit. Top among them was the launch of a new open-source project called Delta Sharing, the world’s first open protocol for securely sharing data across organizations in real time, completely independent of the platform on which the data resides. Delta Sharing is included within the open-source Delta Lake project and supported by Databricks and a broad set of data providers, including NASDAQ, ICE, S&P, Precisely, Factset, Foursquare, SafeGraph, and software vendors like AWS, Microsoft, Google Cloud, and Tableau.

The solution takes aim at a common industry problem. Data sharing has become critical to the digital economy as enterprises seek to exchange data easily and securely with their customers, partners, and suppliers; a retailer, for example, may share timely inventory data with each of the brands it carries. However, data sharing solutions have historically been tied to a single vendor or commercial product, tethering data access to proprietary systems and limiting collaboration between organizations that use different platforms.

In a call with RTInsights, Joel Minnick, Vice President, Marketing at Databricks, explained the rationale behind Delta Sharing. “What you’ve got now is a proliferation of a bunch of silos of data sharing networks that let folks share some of the data with some of the people some of the time. And it’s been this way since the ’80s, yet we still see new entrants into the market all the time, standing up new proprietary data sharing networks.”

He continued: “Our heritage, our roots are always in open source. This feels like a problem that could be solved in a really effective way if we approached it from an open point of view.”

He noted that Delta Sharing solves two problems. First, it is a fully open, secure protocol for sharing data, so it removes proprietary lock-in. Second, many of the data-sharing networks and tools available today were built for sharing structured data: that is what they govern, and most of the time they expose only a SQL interface.

Minnick noted that the data customers want to share these days is increasingly unstructured. For example, businesses frequently want to share images, videos, dashboards, and machine learning models.

Delta Sharing is built from the outset to support data science and to provide governance for unstructured data, and it is accessible not only through SQL but also through Python. As a result, it can meet the needs of data engineers, data analysts, and data scientists.
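To give a rough sense of what Python access can look like, here is a minimal sketch that assumes the open-source `delta-sharing` Python connector, which the article itself does not describe. The profile file name and the share, schema, and table names are hypothetical placeholders, not anything announced by Databricks.

```python
def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile>#<share>.<schema>.<table>' address the connector expects."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(profile_path: str, share: str, schema: str, table: str):
    """Read a shared table into a pandas DataFrame (needs network access and a valid profile)."""
    # Imported lazily so table_url() works without the connector installed.
    # Assumption: installed with `pip install delta-sharing`.
    import delta_sharing

    return delta_sharing.load_as_pandas(table_url(profile_path, share, schema, table))


if __name__ == "__main__":
    # Hypothetical names, echoing the retailer example from the article.
    print(table_url("config.share", "retail_share", "inventory", "daily_stock"))
```

In this sketch a data recipient would receive a profile file from the provider and pass its path to `load_shared_table`; only the address-building helper runs without credentials.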

These points were emphasized at the announcement. “The top challenge for data providers today is making their data easily and broadly consumable. Managing dozens of different data delivery solutions to reach all user platforms is untenable. An open, interoperable standard for real-time data sharing will dramatically improve the experience for both data providers and data users,” said Matei Zaharia, Chief Technologist and Co-Founder of Databricks. “Delta Sharing will standardize how data is securely exchanged between enterprises regardless of which storage or computing platform they use, and we are thrilled to make this innovation open source.”

The bottom line is that Delta Sharing extends the applicability of the lakehouse architecture that organizations are rapidly adopting today, as it enables an open, simple, collaborative approach to data and AI within and now between organizations.

