Data Catalogs Are Dead; Long Live Data Discovery

As companies increasingly leverage data to power digital products, drive decision making, and fuel innovation, understanding the health and reliability of these most critical assets is fundamental. For decades, organizations have relied on data catalogs to power data governance. But is that enough?
Debashis Saha, VP of Engineering at AppZen, formerly at eBay and Intuit, and Barr Moses, CEO and Co-founder of Monte Carlo, discuss why data catalogs aren’t meeting the needs of the modern data stack, and how a new approach — data discovery — is needed to better facilitate metadata management and data reliability.

#data #metadata #data-catalog #data-science #data-engineering

iOS App Dev

1620466520

Your Data Architecture: Simple Best Practices for Your Data Strategy

If you accumulate data on which you base your decision-making as an organization, you should probably think about your data architecture and possible best practices.

If you accumulate data on which you base your decision-making as an organization, you most probably need to think about your data architecture and consider possible best practices. Gaining a competitive edge, remaining customer-centric to the greatest extent possible, and streamlining processes to get precise outcomes can all be traced back to an organization’s capacity to build a future-ready data architecture.

In what follows, we offer a short overview of the overarching capabilities of data architecture. These include user-centricity, elasticity, robustness, and the capacity to ensure the seamless flow of data at all times. Added to these are automation enablement, plus security and data governance considerations. These points form our checklist for what we perceive to be an anticipatory analytics ecosystem.

#big data #data science #big data analytics #data analysis #data architecture #data transformation #data platform #data strategy #cloud data platform #data acquisition

Gerhard Brink

1620629020

Getting Started With Data Lakes

Frameworks for Efficient Enterprise Analytics

The opportunities big data offers also come with very real challenges that many organizations are facing today. Often, it’s finding the most cost-effective, scalable way to store and process boundless volumes of data in multiple formats that come from a growing number of sources. Then organizations need the analytical capabilities and flexibility to turn this data into insights that can meet their specific business objectives.

This Refcard dives into how a data lake helps tackle these challenges at both ends — from its enhanced architecture that’s designed for efficient data ingestion, storage, and management to its advanced analytics functionality and performance flexibility. You’ll also explore key benefits and common use cases.

Introduction

As technology continues to evolve with new data sources, such as IoT sensors and social media churning out large volumes of data, there has never been a better time to discuss the possibilities and challenges of managing such data for varying analytical insights. In this Refcard, we dig deep into how data lakes solve the problem of storing and processing enormous amounts of data. While doing so, we also explore the benefits of data lakes, their use cases, and how they differ from data warehouses (DWHs).


This is a preview of the Getting Started With Data Lakes Refcard. To read the entire Refcard, please download the PDF from the link above.

#big data #data analytics #data analysis #business analytics #data warehouse #data storage #data lake #data lake architecture #data lake governance #data lake management

Data Catalog 3.0: Modern Metadata for the Modern Data Stack

2020 brought a lot of new words into our everyday vocabulary: think coronavirus, defund, and malarkey. But in the data world, another phrase has been making the rounds… the modern data stack.

The data world has recently converged around the best set of tools for dealing with massive amounts of data, aka the “modern data stack”. This includes setting up data infrastructure on best-of-breed tools like Snowflake for data warehousing, Databricks for data lakes, and Fivetran for data ingestion.

The good? The modern data stack is super fast, easy to scale up in seconds, and requires little overhead. The bad? It’s still a noob in terms of bringing governance, trust and context to data.

That’s where metadata comes in.

So what should modern metadata look like in today’s modern data stack? How can basic data catalogs evolve into a powerful vehicle for data democratization and governance? Why does metadata management need a paradigm shift to keep up with today’s needs?

In the past year, I’ve spoken to over 350 data leaders to understand their fundamental challenges with existing metadata management solutions and construct a vision for modern metadata management. I like to call this approach “Data Catalog 3.0”.

Why does the modern data stack need “modern” metadata management more than ever?

A few years ago, data would primarily be consumed by the IT team in an organization. Today, however, data teams are more diverse than ever: data engineers, analysts, analytics engineers, data scientists, product managers, business analysts, citizen data scientists, and more. Each of these people has their own favorite and equally diverse data tools, everything from SQL, Looker, and Jupyter to Python, Tableau, dbt, and R.

This diversity is both a strength and a struggle. All of these people have different ways of approaching a problem, different tools, skill sets, tech stacks, and ways of working… essentially, they each have a unique “data DNA”.

The result is often chaos within collaboration. Frustrated questions like “What does this column name actually mean?” and “Why are the sales numbers on the dashboard wrong again?” bring speedy teams to a crawl when they need to use data.

These questions aren’t anything new. After all, Gartner has published its Magic Quadrant for Metadata Management Solutions for over 5 years now.

But there’s still no good solution. Most data catalogs are little more than band-aid solutions from the Hadoop era, rather than keeping in step with the innovation and advances behind today’s modern data stack.

The past and future of metadata management

Just like data, how we think about and work with metadata has steadily evolved over the past three decades. It can be broadly broken down into three stages of evolution: Data Catalog 1.0, Data Catalog 2.0, and Data Catalog 3.0.

#data-catalog #metadata #data-platforms #data #data catalog

Cyrus Kreiger

1618039260

How Has COVID-19 Impacted Data Science?

The COVID-19 pandemic disrupted supply chains and brought economies around the world to a standstill. In turn, businesses need access to accurate, timely data more than ever before. As a result, the demand for data analytics is skyrocketing as businesses try to navigate an uncertain future. However, the sudden surge in demand comes with its own set of challenges.

Here is how the COVID-19 pandemic is affecting the data industry and how enterprises can prepare for the data challenges to come in 2021 and beyond.

#big data #data #data analysis #data security #data integration #etl #data warehouse #data breach #elt