We all love CSVs. They are our number one choice for storing tables and importing databases. But there is so much data out there that it does not always arrive in the format we want. What happens when the data comes in a format you don’t know how to deal with? You can’t just say “Put it away, I want my good ol’ CSV”. As a data scientist, you must learn how to read different file formats into your environment. In this article, we are going to explore how to import 6 different file types in Python, namely Excel, MATLAB, SAS, Stata, Pickle, and HDF5.


How do you deal with foreign file types in Python in general?

If you have to work with a file type other than CSV or JSON, the rule of thumb is to convert it to a type you are comfortable with. For tabular data, [pandas](https://pandas.pydata.org/docs/) is the most widely used package in Python. Its DataFrames cover most everyday data needs, and you will find them easy to work with.
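To make the rule of thumb concrete, here is a minimal sketch of a loader that picks a pandas reader from the file extension and returns a DataFrame. The `READERS` table and the `load_table` helper are hypothetical names made up for illustration, and the extension-to-reader mapping is an assumption about how your files are named; the `pd.read_*` functions themselves are part of pandas. MATLAB files are left out because pandas has no reader for them.

```python
from pathlib import Path

import pandas as pd

# Hypothetical lookup table: file extension -> pandas reader function.
# Some readers need an extra package installed (noted in the comments).
READERS = {
    ".csv": pd.read_csv,
    ".json": pd.read_json,
    ".xlsx": pd.read_excel,    # needs an Excel engine such as openpyxl
    ".sas7bdat": pd.read_sas,
    ".dta": pd.read_stata,
    ".pkl": pd.read_pickle,
    ".h5": pd.read_hdf,        # needs PyTables
}


def load_table(path, **kwargs):
    """Read a file into a DataFrame, choosing the reader by extension.

    Extra keyword arguments are passed straight to the pandas reader.
    """
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"No reader registered for {suffix!r} files")
    return READERS[suffix](path, **kwargs)
```

Once the data is in a DataFrame, converting it is one more call. For instance, `load_table("survey.dta").to_csv("survey.csv", index=False)` would turn a Stata file into a CSV (the file name here is made up).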

For files that are not native to the **Python** ecosystem, there are often packages that let you import them into your **Python** programs. They usually come with good documentation to guide you through the installation process.
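Before reading such a file, it can help to check whether the supporting package is installed at all. The sketch below uses only the standard library to do that. The package names in `OPTIONAL_PACKAGES` are an assumption about which helpers you might use (openpyxl for Excel, SciPy for MATLAB, h5py for HDF5), and `missing_packages` is a hypothetical helper, not part of any library.

```python
from importlib.util import find_spec

# Assumed helper packages for some non-native formats; adjust to your setup.
OPTIONAL_PACKAGES = {
    "Excel (.xlsx)": "openpyxl",
    "MATLAB (.mat)": "scipy",
    "HDF5 (.h5)": "h5py",
}


def missing_packages(packages=OPTIONAL_PACKAGES):
    """Return {file type: package name} for packages that cannot be imported."""
    return {
        file_type: name
        for file_type, name in packages.items()
        if find_spec(name) is None  # None means the top-level module was not found
    }


if __name__ == "__main__":
    for file_type, name in missing_packages().items():
        print(f"{file_type}: install '{name}' before reading this format")
```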

Let’s start exploring!

Note: the sample files used in the code snippets come from these sources:

  1. Excel sample, Microsoft.
  2. MATLAB sample, DataCamp servers.
  3. SAS sample, ftp://ftp.sas.com
  4. Stata sample, Principles of Econometrics
  5. HDF5 sample, Kaggle

It is also good practice to install the packages used in the examples in a separate [conda](https://www.anaconda.com/products/individual) environment.

