In this quick article, you will learn how to use the data sets from UCI that come with the **.data** file type.
Kaggle.com is a great choice for finding data to use in your data science projects. The site is filled with interesting data sets, notebooks from other scientists and tutorials. All the data sets I have encountered on **Kaggle** have been **.csv** files, which is very convenient when working with pandas.
You might wonder (at least I did) if **Kaggle** is the only place where data can be found.
Hint:
It is not!
You will also find awesome data sets on the UCI Machine Learning Repository. An example of an interesting data set is the Breast Cancer Wisconsin (Original) Data Set.
I recently wanted to use this exact data set to practice my classification skills. However, I quickly ran into some trouble (or so I thought). The data I had downloaded was contained in a **.data** file…
Screenshot from Windows Explorer showing the name and file extension of the data set.
How do you work with that?
I certainly didn’t know.
As I had only ever worked with **.csv** files (I am a relatively new data scientist), all I knew how to do was use the pandas **read_csv()** function to import my data sets into a DataFrame.
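For reference, this is roughly what that usual **.csv** workflow looks like. It is only a minimal sketch: the column names and values are made up for illustration, and an in-memory string stands in for a real file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content (made-up columns and values), used in place of a real file.
csv_text = "id,thickness,label\n1,5,2\n2,3,4\n"

# read_csv accepts a file path or any file-like object; the first row becomes the header.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())
```

With a real file you would pass its path instead, for example `pd.read_csv("my_file.csv")`, where `my_file.csv` is a placeholder name.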
To download the data, first click on the Data Folder, which will take you to a second page (lower half of the following picture). There, click on the file you want to download.
How to download the data from UCI
The **.data** file can be opened with **Microsoft Excel** or Notepad.
I tried doing the latter:
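If you would rather stay in Python, you can take a similar look inside the file by printing its first few lines. This is a minimal sketch, assuming the file is plain text. The helper name `peek` and the file `sample.data` are hypothetical, and the rows written to it are made up so the example runs on its own; they are not taken from the UCI file.

```python
import os
import tempfile


def peek(path, n=5):
    """Return the first n lines of a text file, without trailing newlines."""
    lines = []
    with open(path, "r", encoding="utf-8") as handle:
        for i, line in enumerate(handle):
            if i >= n:
                break
            lines.append(line.rstrip("\n"))
    return lines


# Stand-in file with made-up rows; replace sample_path with your downloaded file.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.data")
with open(sample_path, "w", encoding="utf-8") as handle:
    handle.write("1,5,1,2\n2,3,1,2\n3,8,10,4\n")

first_lines = peek(sample_path, n=2)
for line in first_lines:
    print(line)
```

Seeing the raw lines tells you whether there is a header row and which character separates the values, which is what you need to know before loading the file into pandas.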