On Hacker Noon, I will be sharing some of my best-performing machine learning articles. This list of datasets for regression and linear regression tasks has been upvoted many times on Reddit and reshared dozens of times on various social media platforms. I hope Hacker Noon data scientists find it useful as well!
Every data scientist will likely have to run a linear regression or build a predictive model at some point in their studies or career. For those of you looking to learn more about the topic or complete some sample assignments, this article introduces open linear regression datasets you can download today. Some of the datasets on this list also include sample regression tasks for you to complete with the data.
This dataset includes data taken from cancer.gov about deaths due to cancer in the United States. Along with the dataset, the author includes a full walkthrough on how they sourced and prepared the data, their exploratory analysis, model selection, diagnostics, and interpretation.
2. CDC Data: Nutrition, Physical Activity, Obesity
From the Behavioral Risk Factor Surveillance System at the CDC, this dataset includes information about physical activity, weight, and average adult diet.
3. Fish Market Dataset
Built for multiple linear regression and multivariate analysis, the Fish Market Dataset contains information about common fish species in market sales. The dataset includes the fish species, weight, length, height, and width.
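To make the idea of multiple linear regression concrete, here is a minimal sketch that predicts weight from length, height, and width using ordinary least squares. Everything in it is an assumption made for illustration: the measurements are synthetic and generated from made-up coefficients, the names (`FEATURES`, `TRUE_COEFFS`, `fit_ols`) are hypothetical, and none of the numbers come from the Fish Market Dataset itself. The species column is left out because a categorical feature would first need to be encoded. The code is kept dependency-free so it runs anywhere; in practice you would likely reach for a numerical library instead.

```python
# Synthetic example only: these values are NOT rows from the Fish Market Dataset.
# Hypothetical "true" model: weight = -100 + 10*length + 20*height + 30*width
TRUE_COEFFS = [-100.0, 10.0, 20.0, 30.0]  # intercept, length, height, width

FEATURES = [  # made-up (length_cm, height_cm, width_cm) measurements
    (20.0, 6.0, 3.0),
    (25.0, 8.0, 4.0),
    (30.0, 9.0, 5.0),
    (35.0, 12.0, 5.0),
    (40.0, 13.0, 7.0),
    (28.0, 10.0, 4.0),
]


def predict(coeffs, features):
    """Return intercept + sum(coefficient * feature)."""
    return coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], features))


# Noise-free targets generated from the hypothetical coefficients above.
WEIGHTS = [predict(TRUE_COEFFS, row) for row in FEATURES]


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; system is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(features, targets):
    """Fit ordinary least squares via the normal equations (X^T X) b = X^T y."""
    design = [[1.0, *row] for row in features]  # leading 1.0 is the intercept term
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


coeffs = fit_ols(FEATURES, WEIGHTS)
print([round(c, 3) for c in coeffs])
```

Because the synthetic targets contain no noise, the fitted coefficients should land very close to the ones used to generate them. With real measurements the fit will be approximate, and checking the residuals becomes the interesting part.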