Semantic search is an approach to information retrieval that matches on the meaning of sentences rather than on conventional keyword overlap. Many text embeddings can be used for this purpose, but scaling them up to build low-latency APIs that fetch results from a huge collection of data is seldom discussed. In this article, I will discuss how we can implement a minimal semantic search engine using SOTA sentence embeddings (Sentence Transformers) and FAISS.
Sentence Transformers is a framework, or set of models, that gives dense vector representations of sentences or paragraphs. These models are transformer networks (BERT, RoBERTa, etc.) that are fine-tuned specifically for the task of semantic textual similarity (STS), as BERT doesn't perform well out of the box on these tasks. Given below is the performance of different models on the STS benchmark.
Image source: Sentence transformers
We can see that the Sentence transformer models outperform the other models by a large margin.
But if you look at the leaderboards on Papers with Code and GLUE, you will see many models scoring above 90. So why do we need Sentence Transformers?
Well, in those models, semantic textual similarity is treated as a regression task. This means that whenever we need to calculate the similarity score between two sentences, we need to pass them together into the model, which outputs a numerical score for the pair. While this works well for benchmarking, it scales badly for a real-life use case, and here are the reasons.
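One way to see the scaling problem is simply to count model forward passes. The sketch below is plain arithmetic, not a benchmark: the function names are made up for illustration, and it assumes that a pair-scoring model (often called a cross-encoder) needs one forward pass per sentence pair, while an embedding model needs one forward pass per sentence, after which comparisons are cheap vector operations.

```python
from math import comb


def pair_model_passes_all_pairs(num_sentences: int) -> int:
    """Forward passes a pair-scoring model needs to score every unordered pair."""
    return comb(num_sentences, 2)  # n * (n - 1) / 2


def embedding_model_passes_all_pairs(num_sentences: int) -> int:
    """Forward passes an embedding model needs: each sentence is encoded once."""
    return num_sentences


def pair_model_passes_per_query(num_documents: int) -> int:
    """Each new query must be paired with every document in the collection."""
    return num_documents


def embedding_model_passes_per_query() -> int:
    """Document vectors are precomputed, so only the query is encoded."""
    return 1


if __name__ == "__main__":
    n = 10_000  # illustrative collection size
    print(f"pair model, all pairs:      {pair_model_passes_all_pairs(n):,}")
    print(f"embedding model, all pairs: {embedding_model_passes_all_pairs(n):,}")
    print(f"pair model, one query:      {pair_model_passes_per_query(n):,}")
    print(f"embedding model, one query: {embedding_model_passes_per_query():,}")
```

For 10,000 sentences, scoring every pair with a pair-scoring model takes 49,995,000 forward passes, whereas the embedding approach takes 10,000. At query time the gap is 10,000 passes versus one, which is what makes precomputed sentence embeddings a better fit for a low-latency search API.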