What’s interesting about NLP is that you rarely see it mixed with time series analysis. So, in this post, we look at how topics change over time. We used the dataset produced by Ben Hamner on the Neural Information Processing Systems (NIPS) conference, available here. To paraphrase him, NIPS is ‘one of the top machine learning conferences in the world. It covers topics ranging from deep learning and computer vision to cognitive science and reinforcement learning’. Using only scikit-learn, how can we trace the evolution of the subjects covered?

The dataset is very clean, and we focused solely on the paper titles. The set starts in 1989 and finishes in 2017, the most recent year with complete records. We picked 2010 and 2017 to compare the subjects. All the code is available in our Kaggle kernel.
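As a minimal sketch of the loading step, assuming the dataset’s papers.csv file with ‘year’ and ‘title’ columns (adjust the names if your copy differs):

```python
import pandas as pd

# Load the NIPS papers dataset (assumed file and column names).
papers = pd.read_csv("papers.csv")

# Keep only the titles for the two years we want to compare.
titles_2010 = papers.loc[papers["year"] == 2010, "title"].dropna()
titles_2017 = papers.loc[papers["year"] == 2017, "title"].dropna()

print(len(titles_2010), "titles in 2010,", len(titles_2017), "titles in 2017")
```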

First conclusion: there’s an explosion of papers! 631 in 2017 compared to 221 in 2010. Secondly, a simple bag-of-words count shows that some of the keywords of 2017 were not even present in 2010, such as ‘deep’ (as in ‘deep learning’).
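A rough sketch of that bag-of-words comparison with scikit-learn’s CountVectorizer; the top_words helper and its name are our own illustration, and we use the built-in English stop list here (the domain-specific words are handled in the next snippet):

```python
from sklearn.feature_extraction.text import CountVectorizer

def top_words(titles, n=10):
    """Return the n most frequent words across a collection of titles."""
    vectorizer = CountVectorizer(stop_words="english")
    counts = vectorizer.fit_transform(titles)
    totals = counts.sum(axis=0).A1          # total occurrences of each word
    vocab = vectorizer.get_feature_names_out()
    return sorted(zip(vocab, totals), key=lambda x: -x[1])[:n]

print(top_words(titles_2010))
print(top_words(titles_2017))
```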

At this point, we should specify that we removed not only the usual stop words but also their data-science equivalents, such as ‘data’, ‘large’, ‘learning’ and ‘models’, which either occur extremely frequently or bring little information.
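One way to wire that in, shown as a sketch rather than the post’s exact code: extend scikit-learn’s built-in English stop list with the domain-specific words listed above and pass the combined list to the vectorizer.

```python
from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS

# Domain-specific stop words from the list above.
extra_stop_words = {"data", "large", "learning", "models"}
all_stop_words = list(ENGLISH_STOP_WORDS.union(extra_stop_words))

# Vectorizer that drops both the usual and the data-science stop words.
vectorizer = CountVectorizer(stop_words=all_stop_words)
```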

#nips #nlp #machine-learning #lda #deep-learning
