This is an introduction to the young and fast-growing field of data mining (also known as knowledge discovery from data, or KDD for short). It focuses on fundamental data mining concepts and techniques for discovering interesting patterns from data in various applications.

The world we see today has automated data collection tools, database systems, the World Wide Web, and a computerized society. Together these have caused an explosive growth in data, from terabytes to petabytes.

We are drowning in an ocean of data but starving for knowledge.

Our new age has given us data of huge volume, velocity, and variety. Cheaper technology, mobile computing, social networking, and cloud computing have together set off this data storm.

These are the reasons conventional methods fall short and why we need novel methods such as data mining to process data in this new era.

What is Data Mining (DM)?

Data mining is an iterative and interactive process of discovering novel, valid, useful, and understandable patterns and models from massive data sources.

Figure: Breaking down the definition of data mining.
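To make "discovering patterns" a little more concrete, here is a minimal sketch of one simple kind of pattern: pairs of items that frequently occur together in transactions (the idea behind frequent itemset mining). The helper name `frequent_pairs`, the toy transactions, and the `min_support` threshold are hypothetical illustrations, not part of the original text. The brute-force counting below only suits small data; algorithms such as Apriori or FP-growth exist to scale the same idea to massive data sources.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs found in at least `min_support` fraction of transactions.

    Hypothetical helper for illustration: it counts every pair by brute force.
    """
    counts = Counter()
    for items in transactions:
        # Deduplicate and sort so each pair is counted once per transaction
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    n = len(transactions)
    # Keep only pairs whose support (fraction of transactions) meets the threshold
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Hypothetical toy transactions, e.g. shopping baskets
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "bread"],
    ["butter", "milk"],
]

print(frequent_pairs(transactions, min_support=0.5))
# {('bread', 'milk'): 0.5, ('butter', 'milk'): 0.5}
```

The support threshold acts as a crude filter for "interesting" patterns: pairs that appear in fewer than half of the baskets are discarded.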


What is Knowledge Discovery (KD)?

The overall process of generating knowledge from massive databases is called KD. It is a broader, more complex process than DM: DM is one step of KD, the step that identifies patterns in the data.

Let us break down the process of KD.

Step 1. Learning the Application Domain

We should have prior knowledge of the application domain in which we intend to discover knowledge. In practice, such domain knowledge helps us generate better insights from the data.

Step 2. Data Cleaning

Once we have obtained the data from warehouses, we need to remove noise and inconsistent data. This step can take up to 60% of the effort in the knowledge discovery process.
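A minimal sketch of this step using only the Python standard library. The record schema (`id`, `age`, `country`), the alias table `COUNTRY_ALIASES`, the helper name `clean`, and the plausible age range are all hypothetical assumptions for illustration; real cleaning rules depend on the domain knowledge from Step 1.

```python
# Hypothetical mapping that reconciles inconsistent spellings of the same value
COUNTRY_ALIASES = {
    "us": "US", "usa": "US", "united states": "US",
    "uk": "UK", "united kingdom": "UK",
}


def clean(records, min_age=0, max_age=120):
    """Drop noisy, incomplete, and duplicate records; normalize inconsistent labels.

    Assumes each record is a dict with 'id', 'age', and 'country' keys
    (a hypothetical schema chosen for this sketch).
    """
    cleaned, seen = [], set()
    for rec in records:
        age, country = rec.get("age"), rec.get("country")
        # Incomplete record: a required field is missing
        if age is None or country is None:
            continue
        # Noise: values outside a plausible range are treated as errors
        if not (min_age <= age <= max_age):
            continue
        # Inconsistency: map alias spellings to one canonical code
        country = COUNTRY_ALIASES.get(country.strip().lower(), country.strip())
        row = (rec["id"], age, country)
        # Duplicates: keep only the first copy of each identical row
        if row in seen:
            continue
        seen.add(row)
        cleaned.append({"id": rec["id"], "age": age, "country": country})
    return cleaned


raw = [
    {"id": 1, "age": 34, "country": "USA"},
    {"id": 2, "age": -5, "country": "UK"},              # noisy age
    {"id": 3, "age": 41, "country": None},              # missing value
    {"id": 4, "age": 29, "country": "united kingdom"},  # inconsistent spelling
    {"id": 1, "age": 34, "country": "us"},              # duplicate after normalization
]
print(clean(raw))
# [{'id': 1, 'age': 34, 'country': 'US'}, {'id': 4, 'age': 29, 'country': 'UK'}]
```

Here inconsistent labels are reconciled rather than dropped, which keeps more of the data; simply discarding such records is the cruder alternative.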
