Visualizing Seattle pet density by combining several shape files and data sources, using Geopandas, Shapely, Pyplot, and Folium
So you’re interested in data that has a geospatial component? Cool! Me too!
When I started my first project rooted in geospatial data, I did not realize the level of challenge I was taking on — not quite biting off more than I could chew, but definitely a mouthful for where I was at the time.
photo credit: Chris Martin on Flickr
My quick google search lead me, and probably you, straight to
This plus another 600+ lines of package conflicts… including NumPy.
I had a lot of package conflicts with the
geopandas installation originally, including
numpy and pretty much everything I had in my project environment. I then followed their directions and installed
geopandas first thing in a new environment on the
conda-forge channel. Worked like a charm!
In today’s example, I am doing some research for my next move: where can I find the highest density of pets in Seattle? I want to ogle all the pets as I walk, skip, and yog around my future neighborhood. For this, I downloaded files from the City of Seattle’s Open Data on pet licenses and shape data on the zip codes, municipal boundaries, and shoreline (so we can see the outline of Seattle we know and love).
Photo Credit: Joshua T. Beck [CC BY-SA 4.0] on [wikimedia commons]
When I downloaded and unzipped my first shapefile, I only moved the
.shp file into my directory, thinking, “maybe these other files work with different programs.” But no.
Even though whatever package you’re using to read shape files will likely only require a single
.shp file path, they need the whole file set in that same directory in order to work:
.shp— actual shape file — this is the geometry of your shapes
.shx— shape index format — helps to quickly move through the shapes
.dbf— this has other information about the shapes (again, it’s required, even if you don’t need it for your purpose. It’s frequently helpful with reference information about each shape, like the name.)
Sometimes there are other files too. Check out this article for more info. Now, instead of worrying too much about which I want or need, I just unzip the set in my target directory. Then you know you’ll have whatever good stuff your data source provided.
Also note: all the files must have the same filename, with the only difference being the extension/suffix. So if you like to change file names to remove spaces, capital letters, etc (just a recovering perfectionist thing?), make sure that you change the names for ALL of the files that came in the package, and watch out for typos.
When you’re thinking about a shapefile, always add that trailing s — shapefiles.
In the search for max pet density, it was just a couple lines of code to get me started with the shapes in GeoDataFrames:
import geopandas as gpd # load zip code shape data zip_codes_gdf = gpd.read_file('data/shapes/Zip_Codes.shp') # load seattle shape cities = gpd.read_file('data/shapes/Municipal_Boundaries.shp') seattle = cities.loc[cities['CITYNAME'] == 'Seattle'] seattle = seattle.reset_index() seattle_shp = seattle.loc[0, 'geometry'] # load seattle waterfront shape waterfront = gpd.read_file('data/shapes/ City_of_Seattle_Shoreline.shp')
Note here that
seattle_shp is a
shapely Polygon and calling it in a Jupyter notebook displays an image of the shape. Alternatively,
waterfront is a GeoDataFrame — so you can visualize it with
waterfront.plot(). Keeping track of object types will save you headaches and time! Or at least is a good place to start when something isn’t working as you expected.
First off, I noticed that the zip code file on Seattle Open Data actually has the shape files for zip codes in several Washington counties. I needed to whittle down the zip codes to only those in Seattle. I quickly realized that the zip code boundaries in the Seattle area do not align with the city boundaries.
geopandas’s wrapping of all those other packages, including
shapely, comes in handy (even if it makes it a struggle to install).
In order to combine data from different sources and get exactly the data you want, I have found the shapely functions to be extremely useful. Conveniently, they work great on
GeoDataFrames, computing on each line and with a single line of code (just call
geo_df.**shapely_function**()). Some of the
[shapely](https://shapely.readthedocs.io/en/stable/manual.html) functions are described in the
[geopandas](http://geopandas.org/) docs, but
geopandas does not include documentation on all the shapely functions. I was pleasantly surprised when the additional functions described in the shapely docs, but not in
geopandas worked as well, especially
object.**within**(_other)_ — Returns
True if the object’s boundary and interior intersect only with the interior of the other (not its boundary or exterior). This applies to all types and is the inverse of
So I tried it out. Here,
zip_codes_gdf.within(seattle_shp) returns a series of
False, one for each record in the
GeoDataFrame. This series, used with
.loc creates a new
GeoDataFrame containing only the records (zip codes and their shapes) that are wholly within Seattle.
seattle_zips_within = zip_codes_gdf.loc[zip_codes_gdf.within(seattle_shp)]
Next, I plotted my result, along with the Seattle outline and waterfront.
fig, ax = plt.subplots(figsize=(10,10)) seattle.plot(ax=ax, color='white', edgecolor='black', linewidth=2) waterfront.plot(ax=ax, color='blue') seattle_zips_within.plot(ax=ax, cmap='Paired', alpha=.8)
Zip codes that cross into other cities beyond Seattle were excluded from this selection process using the
within() function. This function would be great if the zip codes lined up with the city boundaries. Unfortunately, this is not the case, so I continued my search for a better zip code selection function.
object.**intersects**(_other_) — Returns
True if the boundary or interior of the object intersect in any way with those of the other. In other words, geometric objects intersect if they have any boundary or interior point in common.*
seattle_zips_intersects = zip_codes_gdf.loc[zip_codes_gdf.intersects(seattle_shp)] fig, ax = plt.subplots(figsize=(10,10)) waterfront.plot(ax=ax, color='blue') seattle_zips_intersects.plot(ax=ax, cmap='Paired', alpha=.8) seattle.plot(ax=ax, color='#00000000', edgecolor='black', linewidth=2)
Now we’re on the right track, with all of Seattle represented. Now I want to take out all that area outside the city limits. If I calculate the pet density using the number of pets registered in Seattle city limits and the whole (Seattle and non-Seattle) land area for these trans-border zip codes, this will under-represent the pet density in these Seattle neighborhoods.
Photo Credit: George Hodan [CC0 Public Domain] from PublicDomainPictures
So I looked for another
object.**intersection**(_other_) — Returns a representation of the intersection of this object with the other geometric object.*
In other words, it returns a shape of the area where the two objects overlap.
seattle_zips_intersection = zip_codes_gdf.intersection(seattle_shp) fig, ax = plt.subplots(figsize=(10,10)) waterfront.plot(ax=ax, color='black') seattle_zips_intersection.plot(ax=ax, cmap='Paired', alpha=.8, edgecolor='gray') seattle.plot(ax=ax, color='#00000000', edgecolor='black', linewidth=2)
I’m not yet sure what’s going on with all the pink in this map, but this is starting to look good! With the zip codes trimmed to the area within Seattle, as far as mapping goes, I’m ready to get rid of the Seattle outline marking the center of Hood Canal, or wherever Seattle technically ends, in favor of the waterway and zip code outlines.
With the map shapes prepped, I’m ready to put that pet data to use!
First, I added the new intersection geometries determined in the last step to my
GeoDataFrame. Note: Geopandas always plots the “geometry” column by default to plot your shape or complete shape-based calculations like those used above. You can use
geo_object.**set_geometry**(_‘column_name')_ to switch which column
geopandas will use as the geometry of interest. In this case, I decided to start afresh with a new, paired down
GeoDataFrame instead — just the Seattle zip codes with their intersection geometry and area.
# add intersection geometry to my dataframe zip_codes_gdf['intersection'] = seattle_zips_intersection # create dataframe with the seattle in-city geometries seattle_zip_geom = zip_codes_gdf.loc[:, ['ZIP', 'intersection']] # name the geometry column 'geometry' for easy geopandas plotting seattle_zip_geom.columns = ['ZIP', 'geometry'] # calculate the in-seattle area for each zipcode and add as a column seattle_zip_geom['area'] = seattle_zip_geom.area # drop zip codes that have no area in seattle seattle_zip_geom = seattle_zip_geom.loc[seattle_zip_geom['area'] > 0]
To this, I added the pet counts and pet density, using just the intersection area to calculate the density.
# create dataframe with Seattle zips and their respective pet counts seattle_pets_zip = seattle_zip_geom.merge(pet_counts, on='ZIP') # add pet_density column seattle_pets_zip['pet_density'] = (seattle_pets_zip['pets'] / seattle_pets_zip['area'])
To color the map based on the column
'pet_density', set the column to the column name in the call to plot. This kind of map, colored based on a data value, is called a choropleth (vocab I wish I knew before… googling for help works better with more precise key words…)
fig, ax = plt.subplots(figsize=(10,10)) waterfront.plot(ax=ax, color='black') seattle_pets_zip.plot(column='pet_density', ax=ax, edgecolor='gray')
Ugh! All the zip codes look the same!
Then, I noticed that two zip codes (98146, 98168) had a much larger pet density, even though they weren’t visible at my map’s scale.
To investigate these locations more closely, I mapped them on the city streets using
import folium import geopandas as gpd def create_geo_map(geo_df, index, location, zoom=17, tiles='Stamen Terrain'): """creates a folium map with the geometry highlighted Args: geo_df: a geopandas dataframe (must have a column 'geometry' index: the index of the row you want mapped location: [lat, long] list with the center location for the map zoom: start zoom level. Default: 17 tiles: Folium tiles / design. Default: Stamen Terrain Returns: folium map with geometry highlighted """ # create geodataframe for zip code zip_code = gpd.GeoDataFrame(geo_df.iloc[index]).T # get the geojson for the zipcode gjson = zip_code['geometry'].to_json() # create folium map m = folium.Map(location=location, tiles=tiles, zoom_start=zoom) # add geojson to the map folium.GeoJson(gjson, name='geojson').add_to(m) return m
Looking at these two zip codes:
The first one is a single block in Downtown Seattle, next to the library, that does contain a high rise condo complex. I’m not sure why this block has its own zip code, but oh well. I can believe that there are 35 pets registered here.
The second one is more of a mystery. This tiny section of the much larger zip code is an industrial area near the Boeing Field Airport. There is a Supervalue, but no visible housing on Google Maps satellite view (yes, I attempted to stalk these 35 pets— feel free to check it out yourself. For now, my working assumption is these are emotional support animals for the Supervalue workers.)
Example emotional support animal. Photo Credit: Kevin Honold, used with permission
Upon further investigation, zip code 98168 has some areas in unincorporated King County, where the addresses might be labeled “Seattle”, but are not technically within Seattle city limits. I wonder if some non-Seattle resident pet owners in this unincorporated area registered their pets with the city, not realizing that they are technically not city residents. You would think that the city wouldn’t take registrations for non-city addresses… it remains a mystery.
Anyways, I chose to drop these two zip codes so that the larger zip codes’ pet densities are easier to distinguish. I also added in the legend, with the assist of the
# adding a legend using code from http://geopandas.org/mapping.html from mpl_toolkits.axes_grid1 import make_axes_locatable fig, ax = plt.subplots(figsize=(10,10)) divider = make_axes_locatable(ax) cax = divider.append_axes("right", size="5%", pad=0.1) waterfront.plot(ax=ax, color='black') seattle_pets_zip_dropped.plot(column='pet_density', ax=ax, legend=True, cax=cax);
So now I have a good idea of the most and least desirable neighborhoods for pet watching! I do have some lingering questions: What if we normalize for population density? What type of pets are we talking about — would this map look different for cats/dogs/goats/pigs (yes goats and pigs are registered with the city!)? But these questions will have to wait for another day…coming soon to a data science blog near you.
Are these your neighbors? Photo Credit: Kevin Honold, used with permission
Parting tip: use
plt.savefig('fig_name.png') to save your image before displaying the figure to avoid saving a blank canvas. For example, all in one jupyter notebook cell:
fig, ax = plt.subplots(figsize=(10,10)) waterfront.plot(ax=ax, color='blue') plt.savefig('images/seattle_zips_intersects.png');
You’ll still get the image displayed and saved.
You can see the full code on my GitHub repo here.
In the programming world, Data types play an important role. Each Variable is stored in different data types and responsible for various functions. Python had two different objects, and They are mutable and immutable objects.
Magic Methods are the special methods which gives us the ability to access built in syntactical features such as ‘<’, ‘>’, ‘==’, ‘+’ etc.. You must have worked with such methods without knowing them to be as magic methods. Magic methods can be identified with their names which start with __ and ends with __ like __init__, __call__, __str__ etc. These methods are also called Dunder Methods, because of their name starting and ending with Double Underscore (Dunder).
Python is an interpreted, high-level, powerful general-purpose programming language. You may ask, Python’s a snake right? and Why is this programming language named after it?
Are you looking for experienced, reliable, and qualified Python developers? If yes, you have reached the right place. At **[HourlyDeveloper.io](https://hourlydeveloper.io/ "HourlyDeveloper.io")**, our full-stack Python development services...
Python any() function returns True if any element of an iterable is True otherwise any() function returns False. The syntax is any().