Quora is a platform where questions are asked, answered, followed, and edited by its community of users. This empowers people to learn from each other and to better understand the world. About 100 million people visit Quora every month, so it’s no surprise that many people ask similarly worded questions. It is not good for Quora to ask its followers to answer the same question again and again. It would be better to have a system that can detect when a new question is similar to one that has already been answered.

So our problem statement is to predict whether a pair of questions is a duplicate or not. We will use various machine learning techniques to come up with a solution. This blog is not a complete code walkthrough, but I will explain the approaches I used to solve the problem. You can have a look at the code in my GitHub repository.

Some business constraints

  • The cost of misclassification can be very high, i.e., if a user asks a particular question and we show the answer to a different question, the user experience suffers and the business is affected. This is the most important constraint.
  • We want the probability that a pair of questions is a duplicate, so that any threshold can be chosen and changed depending on the use case.
  • We don’t have any latency requirements.
  • Interpretability is partially important, i.e., users do not need to know why a pair of questions was flagged as a duplicate, but it is better if we can explain it ourselves.
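
To make the threshold constraint concrete, here is a minimal sketch of how predicted probabilities could be turned into duplicate/not-duplicate decisions. The function name `flag_duplicates` and the probability values are hypothetical and only for illustration; they are not taken from the actual model or repository.

```python
from typing import List


def flag_duplicates(probabilities: List[float], threshold: float = 0.5) -> List[bool]:
    """Turn predicted duplicate probabilities into yes/no decisions."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    # A pair is flagged as duplicate only if its probability reaches the threshold.
    return [p >= threshold for p in probabilities]


# Made-up model outputs for four question pairs (illustrative only).
example_probs = [0.95, 0.62, 0.40, 0.08]

# A lenient threshold flags more pairs as duplicates.
lenient = flag_duplicates(example_probs, threshold=0.5)

# A strict threshold flags fewer pairs, which lowers the risk of showing
# a user the answer to a different question.
strict = flag_duplicates(example_probs, threshold=0.9)

print(lenient)
print(strict)
```

Because wrongly marking two different questions as duplicates is the costly mistake here, a stricter threshold is probably the safer default, at the price of missing some true duplicates.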


Finding Duplicate Quora Questions Using Machine Learning