Recently, I was working on a project where I was trying to build a model that could predict housing prices in King County, Washington — the area that surrounds Seattle. After looking at the features, I wanted a way to determine the houses’ worth based on location.

The dataset included latitude and longitude and it was easy to google them to take a look at the houses, their neighborhoods, their distance from the water, etc. But with over 17000 observations, that was a fool’s task. I had to find an easier way.

I had used Geographic Information Systems (GIS) only once before but not in Python. So I did what I do best: I googled, and ran into this amazing package called GeoPandas. I am going to let the GeoPandas team sum up what they do because they can say much better than I can.

GeoPandas is an open source project to make working with geospatial data in python easier. GeoPandas extends the datatypes used by pandas to allow spatial operations on geometric types. Geometric operations are performed by shapely. GeoPandas further depends on fiona for file access and descartes and matplotlib for plotting. — Description from GeoPandas Website (2020)

This blew my mind, and what I wanted was really just the most basic of the features. I am going to show you how to run this code and do what I did — plotting accurate points on a map.

You are going to need several packages and some files in addition to the basic pandas and _matplotlib. _They include:

  • geopandas — the package that makes all of this possible
  • shapely — package for manipulation and analysis of planar geometric objects
  • descartes — provides a nicer integration of Shapely geometry objects with Matplotlib. It’s not needed every time but I import it just to be safe
  • Any .shp file — this is going to be the backdrop of the plot. Mine is going to have King County, but you should be able to find one from any city’s data department. Don’t delete any files from the .zip file it comes in. Something always breaks.

More information about shapefiles can be found here, but the long and short of it is that these aren’t normal images. They are a vector data storage format that has information linking to locations — coordinates and the rest.

First I imported the basic packages that I needed and then the new packages:

#geopandas #data-science #visualization #python #spatial-analysis

Using GeoPandas for Spatial Visualization
1.80 GEEK