### Bridging together sets of GPS coordinates without breaking your Python interpreter


Engineering features from latitude and longitude data can seem like a messy task that may tempt novices into creating their own *apply function* (or even worse: an enormous **for loop**). However, these types of *brute force* approaches are potential pitfalls that will unravel quickly when the size of the dataset increases.

For example, imagine you have a single dataset of *n* items. Explicitly comparing each of these *n* items against the *n-1* others takes *n(n-1)* distance calculations, which grows roughly as *n²*. This means that *with each doubling of rows in your dataset, the time it takes to find all nearest neighbors increases by a factor of about 4!*
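To make the quadratic cost concrete, here is a minimal sketch of the brute-force approach with a counter for the number of distance calculations. The helper names (`haversine_km`, `brute_force_nearest`) and the Earth radius constant are assumptions introduced for illustration; they are not part of scikit-learn or of this guide's later code.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def brute_force_nearest(coords):
    """Return (index of the nearest other row for each row, comparisons made).

    coords is an (n, 2) array of [latitude, longitude] in degrees.
    """
    n = len(coords)
    nearest = np.empty(n, dtype=int)
    comparisons = 0
    for i in range(n):
        best_j, best_d = -1, np.inf
        for j in range(n):
            if i == j:
                continue  # skip comparing a point with itself
            comparisons += 1
            d = haversine_km(*coords[i], *coords[j])
            if d < best_d:
                best_j, best_d = j, d
        nearest[i] = best_j
    return nearest, comparisons
```

With 4 rows this performs 4 × 3 = 12 distance calculations; with 8 rows it performs 8 × 7 = 56, which is already more than four times as many.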

Fortunately, you do not need to calculate the distance between every pair of points. *scikit-learn* ships a few tree-based data structures that find neighbors efficiently by partitioning the points so that most distance calculations can be skipped.

They can be found **within the neighbors module** and this guide will show you how to use two of these incredible classes to tackle this problem with ease.
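As a quick preview, the sketch below builds a `BallTree` with the haversine metric and asks for each point's nearest other point. The coordinates are hypothetical placeholders rather than real station data, and the Earth radius constant is an assumption used only to convert the result to kilometers.

```python
import numpy as np
from sklearn.neighbors import BallTree

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

# Hypothetical [latitude, longitude] pairs in degrees, for illustration only.
coords_deg = np.array([
    [0.0, 0.0],
    [0.0, 1.0],
    [10.0, 10.0],
    [10.0, 10.5],
])

# The haversine metric expects [latitude, longitude] in radians.
coords_rad = np.radians(coords_deg)
tree = BallTree(coords_rad, metric="haversine")

# k=2 because each point's closest match in its own tree is itself.
dist_rad, idx = tree.query(coords_rad, k=2)
nearest_idx = idx[:, 1]                      # index of the nearest other point
nearest_km = dist_rad[:, 1] * EARTH_RADIUS_KM  # radians -> kilometers
```

Note that the haversine metric is offered by `BallTree`; as far as I know `KDTree` does not support it, so for raw latitude and longitude data `BallTree` is usually the one to reach for.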

## Getting started

To begin we load the libraries.

```
import numpy as np
from sklearn.neighbors import BallTree, KDTree
## This guide uses Pandas for increased clarity, but these processes
## can be done just as easily using only scikit-learn and NumPy.
import pandas as pd
```

Then we’ll make two sample DataFrames based on weather station locations that are publicly available from the **National Oceanic and Atmospheric Administration**.
