In this article, I will show how to retrieve close to one million public text or PDF documents. Some of these documents are raw text, some are cleaned text, and some include categorical labels. I will also introduce **KILT**, a benchmark framework for natural language models.
Thousands of PDF, Word, and Text Documents to Download for your NLP Project. Source: Unsplash
The following are non-exhaustive lists of lists of NLP datasets: