Data Science and Data Mining are easy to confuse. This article clarifies what each term means and how the two differ.

Let's begin with their formal definitions and a brief history of each.

Data Mining is the process of discovering patterns in large data sets using methods at the intersection of machine learning, statistics, and database systems. Data mining is an interdisciplinary subfield of computer science and statistics whose overall goal is to extract information (with intelligent methods) from a data set and transform that information into a comprehensible structure for further use.
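To make "discovering patterns" concrete, here is a minimal sketch of one classic pattern-discovery task: finding items that frequently occur together. The shopping baskets, the `frequent_pairs` helper, and the `min_support` threshold are all hypothetical and chosen only for illustration; real data mining tools use far more scalable algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each set is one shopping basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"eggs", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]

def frequent_pairs(baskets, min_support):
    """Return item pairs that appear together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort the items so each pair has a single canonical ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    # Keep only the pairs that meet the support threshold.
    return {pair: n for pair, n in counts.items() if n >= min_support}

patterns = frequent_pairs(transactions, min_support=3)
print(patterns)
```

The raw baskets are the data set, and the surviving pairs are the "comprehensible structure" that can be handed on for further use.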

1989: The term “Knowledge Discovery in Databases” (KDD) was coined by Gregory Piatetsky-Shapiro. Around the same time, he co-founded the first workshop, also named KDD.

1990s: The term “data mining” appeared in the database community. Retail companies and the financial community began using data mining to analyze data and recognize trends in order to grow their customer base and to predict fluctuations in interest rates, stock prices, and customer demand.

Data Science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data. Data science is related to data mining, deep learning, and big data.

In 1962, John Tukey described a field he called “data analysis,” which resembles modern data science. Later, attendees at a 1992 statistics symposium at the University of Montpellier II acknowledged the emergence of a new discipline focused on data of various origins and forms, combining established concepts and principles of statistics and data analysis with computing.

The term “data science” has been traced back to 1974, when Peter Naur proposed it as an alternative name for computer science. In 1996, the International Federation of Classification Societies became the first conference to specifically feature data science as a topic. However, the definition was still in flux. In 1997, C.F. Jeff Wu suggested that statistics should be renamed data science. He reasoned that a new name would help statistics shed inaccurate stereotypes, such as being synonymous with accounting, or limited to describing data. In 1998, Chikio Hayashi argued for data science as a new, interdisciplinary concept, with three aspects: data design, collection, and analysis.

Key differences

  1. Probably the biggest difference between data science and data mining lies in their scope. Data science is a broad field that covers capturing data, analyzing it, and deriving insights from it. Data mining, on the other hand, is mainly about finding useful information in a dataset and using that information to uncover hidden patterns.
  2. Another major difference is that data science is a multidisciplinary field that draws on statistics, social sciences, data visualization, natural language processing, data mining, and more, while data mining is a subset of data science.
  3. The role of a data science professional can be seen, to some extent, as a combination of AI researcher, deep learning engineer, machine learning engineer, and data analyst. Such a person may be able to work as a data engineer as well. By contrast, a data mining professional doesn’t necessarily have to be able to perform all of these roles.
  4. Another notable difference between data science and data mining lies in the type of data used by these professionals. Usually, data science deals with every type of data whether structured, semi-structured, or unstructured. On the other hand, data mining mostly deals with structured data.
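
The structured versus unstructured distinction in the last point can be sketched in a few lines. This is only an illustration under assumed data: the `orders` records, the `reviews` texts, and both helper functions are hypothetical. Structured records can be aggregated directly by field name, while free text first has to be turned into countable features.

```python
import re
from collections import Counter

# Structured data (hypothetical): fixed fields, ready to query as-is.
orders = [
    {"customer": "a", "amount": 20.0},
    {"customer": "b", "amount": 35.0},
    {"customer": "a", "amount": 15.0},
]

def total_per_customer(rows):
    """Sum the amount field per customer."""
    totals = {}
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals

# Unstructured data (hypothetical): free text with no predefined fields.
reviews = [
    "Great product, fast delivery!",
    "Delivery was slow but the product is great.",
]

def word_counts(texts):
    """Lowercase each text, split it into words, and count them."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts

structured_result = total_per_customer(orders)
text_features = word_counts(reviews)
print(structured_result)
print(text_features.most_common(3))
```

The extra preprocessing step for the reviews is the kind of work that tends to fall under the broader data science umbrella before any mining of patterns can begin.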

#data-mining #database #data-science #data analysis
