Managing large datasets on Kaggle without fearing the out-of-memory error
Datatable is a Python package for manipulating large dataframes. It was created to provide big data support and enable high performance. This toolkit resembles pandas very closely but is more focused on speed. It supports out-of-memory datasets and multi-threaded data processing, and it has a flexible API. In the past, we have written a couple of articles that explain in detail how to use datatable for reading, processing, and writing tabular datasets at incredible speed.
These two articles compare datatable’s performance with that of the pandas library on certain parameters. They also explain how to use datatable for data wrangling and munging, and how its performance compares to other libraries in the same space.
This article, however, is aimed mainly at people who are interested in using datatable on the Kaggle platform. Of late, many Kaggle competitions come with datasets that are simply impossible to read in with pandas alone. We shall see how we can use datatable to read those large datasets efficiently and then convert them into other formats seamlessly.
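As a rough illustration of that read-then-convert workflow, here is a minimal sketch. It assumes the `datatable` and `pandas` packages are installed. The file names, the tiny sample CSV, and the helper names `read_and_convert` and `write_sample_csv` are hypothetical and exist only so the sketch runs anywhere. The choice of datatable's binary `.jay` format as the intermediate format is also an assumption made for the example.

```python
import csv
import tempfile
from pathlib import Path


def write_sample_csv(path, n_rows=1000):
    """Write a small hypothetical CSV standing in for a large competition file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id", "value"])
        for i in range(n_rows):
            writer.writerow([i, i * 0.5])


def read_and_convert(csv_path, jay_path):
    """Read a CSV with datatable, save it as .jay, and return a pandas DataFrame."""
    # Imported here so this module still loads where datatable is not installed.
    import datatable as dt

    frame = dt.fread(str(csv_path))  # datatable's multi-threaded file reader
    frame.to_jay(str(jay_path))      # save in datatable's binary Jay format
    return frame.to_pandas()         # hand the data over to pandas


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = Path(tmp) / "train.csv"  # hypothetical file name
        jay_path = Path(tmp) / "train.jay"  # hypothetical file name
        write_sample_csv(csv_path)
        df = read_and_convert(csv_path, jay_path)
        print(df.shape)
```

On a real competition dataset, the `.jay` file can be reopened with `dt.fread` in later sessions, which avoids parsing the original CSV again.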