You are given a few datasets from a client who wants to match entities between multiple disparate datasets. Your target variable are the entities, and your features could be the entity’s address, Country, Zipcode, Vendor ID, etc.

All of these features definitely help you with your search for the same entities, but the only problem is that the client gave you a dataset with a lot of missing values for each of those useful columns that could significantly aid you in your search.

#python #data-science #matching

Fuzzy Name Matching Datasets
2.55 GEEK