Data Repositories for almost Every Type of Data Science Project

A comprehensive list of data repositories for every type of problem.

Given the nature of my job, I work on new projects every week, each solving a different problem. My work requires me to parse through many different kinds of datasets to design and develop instructions for Data Science aspirants.

This post lists a few useful datasets and data repositories, categorized by class of problem and by industry.

Data Repositories on the web:

  • Google Dataset Search — a search engine for researchers to locate online data.
  • datasetlist — offers a list of the biggest machine learning datasets from across the web.
  • UCI — one of the oldest repositories, with data classified by type of problem, attribute type, data type, field of study, and so on.
  • fastai-datasets — datasets for image classification, NLP, and image localization.
  • NLP-datasets — Alphabetical list of free/public domain datasets with text data for use in Natural Language Processing
  • Bifrost — for visual datasets classified by task, application, class, label, and format.

Image Datasets

  • ImageNet — an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of images.
  • CT Medical Images — designed to allow different methods to be tested for examining the trends in CT image data associated with using contrast and patient age. The data consists of a tiny subset of images from The Cancer Imaging Archive.
  • Flickr-faces — Flickr-Faces-HQ (FFHQ) is a high-quality image dataset of human faces, originally created as a benchmark for generative adversarial networks (GAN).
  • objectnet — A new kind of vision dataset borrowing the idea of controls from other areas of science.
  • CelebFaces — the large-scale CelebFaces Attributes (CelebA) dataset of face images annotated with attributes.
  • Animal Faces-HQ dataset (AFHQ) — a dataset of animal faces, consisting of 15,000 high-quality images at 512×512 resolution.
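To make the "organized according to the WordNet hierarchy" idea from the ImageNet entry concrete, here is a minimal sketch of a hierarchy in which every node carries images. The `Synset` class, the node names, and the file names are all made up for illustration; this is not ImageNet's actual API or file layout, and treating a node's images as including those of its descendants is an assumption of the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A hypothetical node in a WordNet-style noun hierarchy."""
    name: str
    images: list = field(default_factory=list)    # images attached directly to this node
    children: list = field(default_factory=list)  # more specific concepts

    def all_images(self):
        """Return this node's images plus those of all its descendants."""
        found = list(self.images)
        for child in self.children:
            found.extend(child.all_images())
        return found


# Toy hierarchy with invented file names (assumption, not real data).
husky = Synset("husky", images=["husky_001.jpg", "husky_002.jpg"])
beagle = Synset("beagle", images=["beagle_001.jpg"])
dog = Synset("dog", children=[husky, beagle])
cat = Synset("cat", images=["cat_001.jpg"])
animal = Synset("animal", children=[dog, cat])

dog_images = dog.all_images()
animal_images = animal.all_images()
```

With a structure like this, asking for "dog" images gathers everything filed under the more specific breeds, which is roughly why a hierarchy is convenient when a project needs coarser or finer labels than the ones a dataset ships with.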

NLP Datasets

https://medium.com/@ODSC/20-open-datasets-for-natural-language-processing-538fbfaf8e38

  • nlp-datasets — Alphabetical list of free/public domain datasets with text data for use in Natural Language Processing (NLP).
  • 1 trillion n-grams — from the Linguistic Data Consortium. This data is expected to be useful for statistical language modeling, e.g., for machine translation or speech recognition, as well as for other uses.
  • litbank — LitBank is an annotated dataset of 100 works of English-language fiction to support tasks in natural language processing and the computational humanities.
  • BookCorpus — scripts for reproducing the BookCorpus dataset yourself.
  • rasa-nlu-training-data — Crowd-sourced training data for the development and testing of Rasa NLU models.
  • Google Books Ngram — an online search engine that charts the frequencies of any set of search strings using a yearly count of n-grams found in sources printed between 1500 and 2019 in Google’s text corpora in English, Chinese, French, German, Hebrew, Italian, Russian, or Spanish.
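Several of the entries above revolve around n-gram counts, so a small example may help. The sketch below computes the yearly relative frequency of a phrase over a toy corpus, which is the kind of quantity an n-gram viewer charts. The function names and the three-sentence corpus are invented for illustration; this is not how Google or the LDC build their data, and real corpora need proper tokenization rather than a whitespace split.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return every n-gram (as a tuple) in a sequence of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def yearly_ngram_frequency(docs, phrase):
    """Relative frequency of `phrase` per year.

    docs: iterable of (year, text) pairs (a hypothetical toy corpus).
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    hits = defaultdict(int)    # occurrences of the phrase per year
    totals = defaultdict(int)  # total number of n-grams per year
    for year, text in docs:
        grams = ngrams(text.lower().split(), n)
        totals[year] += len(grams)
        hits[year] += Counter(grams)[target]
    # Skip years that contain no n-grams of this length at all.
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}


# Invented sentences, for illustration only.
toy_corpus = [
    (1900, "the steam engine changed the world"),
    (1900, "the steam engine was loud"),
    (2000, "the search engine changed the world"),
]
freq = yearly_ngram_frequency(toy_corpus, "steam engine")
```

Here `freq` maps each year to the share of that year's bigrams that match the phrase; plotting those values against the year gives a miniature version of an n-gram frequency chart.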
