Life is full of trade-offs: choosing one path usually means compromising somewhere else. Machine learning models are no different. In the case of k-Nearest Neighbors, there is a long-standing problem with a huge impact on any classifier that relies on pairwise distances: the “Curse of Dimensionality”. By the end of this article you will be able to build your own k-Nearest Neighbors model and observe what happens as the dimensionality of the data set grows. Let’s dig in!
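Before building anything, it helps to see the problem concretely. The short sketch below (my own illustration, not part of the article’s code) samples random points in a unit hypercube and measures how the gap between a query point’s nearest and farthest neighbor shrinks, relative to the distances themselves, as the dimension grows:

import numpy as np

rng = np.random.default_rng(0)

## As the dimension d grows, the gap between the nearest and farthest
## neighbor of a query point shrinks relative to the distances themselves,
## which is the curse of dimensionality for distance-based classifiers.
for d in [2, 10, 100, 1000]:
    points = rng.random((500, d))   # 500 points in the unit hypercube
    query = rng.random(d)
    dists = np.linalg.norm(points - query, axis=1)
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:5d}  relative distance contrast: {contrast:.3f}")

As the printed contrast approaches zero, “nearest” and “farthest” become nearly indistinguishable, which is exactly what hurts k-NN in high dimensions.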
Creating a k-Nearest Neighbor model:
Before we get our hands dirty with the technical part, we need to lay the groundwork for our analysis: the libraries.
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import numpy as np
import seaborn as sns
from scipy import stats

sns.set_style("white")

## for 3d plots
from ipywidgets import interact, fixed
from mpl_toolkits import mplot3d

## progress bars for long-running loops
from tqdm import tqdm

## scikit-learn: the k-NN classifier and model-selection helpers
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import validation_curve, train_test_split

## a fixed palette so classes keep the same colors across plots
plot_colors = np.array(sns.color_palette().as_hex())
Thanks to these built-in machine learning packages, our job becomes quite easy.
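As a quick sanity check that the pieces fit together, here is a minimal sketch of fitting a KNeighborsClassifier; the blob data set and the choice of k = 5 here are illustrative assumptions, not the article’s actual data:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

## illustrative two-class toy data, not the article's data set
X, y = make_blobs(n_samples=200, centers=2, n_features=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

## k = 5 is an arbitrary starting point; tuning k comes later
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print(f"test accuracy: {knn.score(X_test, y_test):.3f}")

The same fit/score pattern carries through the rest of the analysis as we vary the dimensionality of the data.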