In this quick article, you will learn how to use the data sets from UCI that come as **.data** files.

Where can data be found?

Kaggle.com is a great choice for finding data to use in your data science projects. The site is filled with interesting data sets, notebooks from other scientists, and tutorials. All the data sets I have encountered on **Kaggle** have been **.csv** files, which is very convenient when working with pandas.

You might wonder (at least I did) if **Kaggle** is the only place where data can be found.

Hint:

It is not!

You will also find awesome data sets on the UCI Machine Learning Repository. An example of an interesting data set is the Breast Cancer Wisconsin (Original) Data Set.

I recently wanted to use this exact data set to practice my classification skills. However, I quickly ran into some trouble (or so I thought). The data I had downloaded was contained in a **.data** file…


Screenshot from Windows Explorer showing the name and file extension of the data set.

How do you work with that?

I certainly didn’t know.

As I had only ever worked with **.csv** files (I am a relatively new data scientist), all I knew how to do was use the pandas **read_csv()** function to import my data sets into a DataFrame.
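For reference, that familiar pattern looks roughly like the sketch below. The CSV content and the column names `id` and `score` are made up for illustration; an in-memory string stands in for a downloaded file.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a downloaded .csv file.
csv_text = "id,score\n1,0.5\n2,0.8\n"

# read_csv accepts a file path or any file-like object;
# by default the first line is used as the column names.
df = pd.read_csv(io.StringIO(csv_text))
print(df.columns.tolist())
```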

To download the data, first click on the Data Folder, which will take you to a second page (lower half of the following picture). There, click on the file you want to download.


How to download the data from UCI

The **.data** file can be opened with **Microsoft Excel** or Notepad.

I tried doing the latter:
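The same kind of peek can also be taken from Python. The sketch below is only an illustration under stated assumptions: it writes a tiny stand-in file (the rows are invented, not values from the UCI data set) and prints its first lines, much like opening it in Notepad. If the lines turn out to be comma-separated values with no header row, which is an assumption here and worth checking on your own file, **read_csv()** should be able to parse the file despite the unfamiliar extension.

```python
import os
import tempfile
from itertools import islice

import pandas as pd

# Made-up stand-in rows; NOT values from the UCI data set.
sample_text = "1,5,1,2\n2,3,1,4\n3,8,7,4\n"

# Write the stand-in to a temporary file with a .data extension.
with tempfile.NamedTemporaryFile("w", suffix=".data", delete=False) as tmp:
    tmp.write(sample_text)
    data_path = tmp.name

# Peek at the first two lines as plain text.
with open(data_path, "r") as handle:
    first_lines = [line.rstrip("\n") for line in islice(handle, 2)]
print(first_lines)

# Assumption: the peek shows comma-separated values and no header row,
# so header=None tells pandas not to treat the first row as column names.
df = pd.read_csv(data_path, header=None)
print(df.shape)

# Clean up the temporary stand-in file.
os.remove(data_path)
```

With `header=None`, pandas numbers the columns 0, 1, 2, … so meaningful names would still have to be supplied separately, for example through the `names` parameter.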
