Exploratory data analysis (EDA) is an approach to analyzing a data set to find its patterns and visual insights before proceeding to modeling. EDA takes a lot of time, but that time can be reduced by using auto-visualization tools such as Pandas-Profiling, Sweetviz, Autoviz, or D-Tale.

EDA involves many steps, including statistical tests and visualization of the data with different kinds of plots. Some of these steps are discussed below:

  • Data Quality Check: Can be done using pandas library functions and attributes such as describe(), info(), and dtypes. It is used to find the number of features, their data types, duplicate values, missing values, etc.
  • Statistical Test: Statistical measures such as Pearson correlation, Spearman correlation, and Kendall's tau are computed to measure the correlation between features. They can be implemented in Python using the stats library (for example, scipy.stats).
  • Quantitative Test: Quantitative summaries are used to find the spread of numerical features and the counts of categorical features. They can be implemented in Python using functions of the pandas library.
  • Visualization: Feature visualization is essential to understanding the data. Graphical techniques such as bar plots and pie charts are used for categorical features, whereas scatter plots and histograms are used for numerical features.
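
To give a sense of how much code the manual route takes, here is a minimal sketch of the first three steps using pandas alone. The data set, its column names, and its values are made up for illustration and are not from the article. Correlations are computed with DataFrame.corr rather than the stats library, simply to keep the example to a single dependency.

```python
import pandas as pd

# Made-up data set, used only for illustration.
df = pd.DataFrame({
    "age": [23, 35, 35, 41, 52, 29],
    "income": [31000.0, 52000.0, 52000.0, None, 88000.0, 40000.0],
    "city": ["Pune", "Delhi", "Delhi", "Pune", "Mumbai", "Delhi"],
})

# --- Data quality check ---
n_rows, n_features = df.shape               # number of rows and features
column_types = df.dtypes                    # dtypes is an attribute, not a method
missing = df.isna().sum()                   # missing values per column
n_duplicates = int(df.duplicated().sum())   # fully duplicated rows

# --- Statistical test: correlation between numerical features ---
numeric = df.select_dtypes("number")
pearson = numeric.corr(method="pearson")
spearman = numeric.corr(method="spearman")
# method="kendall" is also accepted, but pandas relies on SciPy for it.

# --- Quantitative test: spread and counts ---
spread = numeric.describe()                 # count, mean, std, min, quartiles, max
city_counts = df["city"].value_counts()     # count of each category

print(f"{n_rows} rows, {n_features} features, {n_duplicates} duplicate row(s)")
print(missing)
print(pearson)
print(spread)
print(city_counts)
```

Even this small check needs a dozen statements, and it does not yet draw a single plot.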

Performing the tasks above normally takes several lines of code. This is where auto-visualization libraries come into play: they can perform all of these tasks with just one line of code. This article discusses the following auto-visualization tools:

  • Pandas-Profiling
  • Sweetviz
  • Autoviz
  • D-Tale
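
As a rough preview, the sketch below wraps the typical one-line call for each of the four tools in a small function. The wrapper names, the REPORTERS dictionary, and the output file names are hypothetical and chosen only for this example. The library calls are the commonly documented entry points, but they may differ between versions (for instance, newer releases of Pandas-Profiling are distributed as ydata-profiling), so treat this as an assumption to check against the version you install. Each import sits inside its function, so only the tool you actually call needs to be installed.

```python
def report_pandas_profiling(df, path="profile_report.html"):
    from pandas_profiling import ProfileReport   # pip install pandas-profiling
    ProfileReport(df).to_file(path)              # the one-line report


def report_sweetviz(df, path="sweetviz_report.html"):
    import sweetviz as sv                        # pip install sweetviz
    sv.analyze(df).show_html(path)               # the one-line report


def report_autoviz(df):
    from autoviz.AutoViz_Class import AutoViz_Class   # pip install autoviz
    # An empty filename plus dfte=df tells AutoViz to use the DataFrame.
    AutoViz_Class().AutoViz("", dfte=df)         # the one-line report


def report_dtale(df):
    import dtale                                 # pip install dtale
    return dtale.show(df)                        # the one-line interactive view


# Hypothetical lookup table so a tool can be picked by name.
REPORTERS = {
    "Pandas-Profiling": report_pandas_profiling,
    "Sweetviz": report_sweetviz,
    "Autoviz": report_autoviz,
    "D-Tale": report_dtale,
}

# Example usage, assuming df is a pandas DataFrame and Sweetviz is installed:
# REPORTERS["Sweetviz"](df)
```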


4 Libraries that can perform EDA in one line of Python code