When working with tabular data in data science, we usually reach for the pandas library in Python. It is widely used for data exploration, analysis, munging, and manipulation, which are the primary steps in understanding the data and getting it ready for a model to fit. The main disadvantage of pandas is that it becomes time-consuming when there’s a large amount of data (big data).
Datatable aims to overcome this limitation of pandas and speed up EDA (exploratory data analysis). It was built by H2O.ai, one of the leading AI/ML companies in the world. Datatable is quite similar to pandas and to R’s data.table library, it is well documented, and it works with Python 3.6+.
In this article, I’ll walk through using the datatable library on a large dataset.