Visualizing Seattle pet density by combining several shape files and data sources, using Geopandas, Shapely, Pyplot, and Folium
So you’re interested in data that has a geospatial component? Cool! Me too!
When I started my first project rooted in geospatial data, I did not realize the level of challenge I was taking on — not quite biting off more than I could chew, but definitely a mouthful for where I was at the time.
photo credit: Chris Martin on Flickr
My quick google search lead me, and probably you, straight to
This plus another 600+ lines of package conflicts… including NumPy.
I had a lot of package conflicts with the
geopandas installation originally, including
numpy and pretty much everything I had in my project environment. I then followed their directions and installed
geopandas first thing in a new environment on the
conda-forge channel. Worked like a charm!
In today’s example, I am doing some research for my next move: where can I find the highest density of pets in Seattle? I want to ogle all the pets as I walk, skip, and yog around my future neighborhood. For this, I downloaded files from the City of Seattle’s Open Data on pet licenses and shape data on the zip codes, municipal boundaries, and shoreline (so we can see the outline of Seattle we know and love).
Take away #1: A shapefile is not just one file
Photo Credit: Joshua T. Beck [CC BY-SA 4.0] on [wikimedia commons]
When I downloaded and unzipped my first shapefile, I only moved the
.shp file into my directory, thinking, “maybe these other files work with different programs.” But no.
Even though whatever package you’re using to read shape files will likely only require a single
.shp file path, they need the whole file set in that same directory in order to work:
.shp— actual shape file — this is the geometry of your shapes
.shx— shape index format — helps to quickly move through the shapes
.dbf— this has other information about the shapes (again, it’s required, even if you don’t need it for your purpose. It’s frequently helpful with reference information about each shape, like the name.)
Sometimes there are other files too. Check out this article for more info. Now, instead of worrying too much about which I want or need, I just unzip the set in my target directory. Then you know you’ll have whatever good stuff your data source provided.
Also note: all the files must have the same filename, with the only difference being the extension/suffix. So if you like to change file names to remove spaces, capital letters, etc (just a recovering perfectionist thing?), make sure that you change the names for ALL of the files that came in the package, and watch out for typos.
When you’re thinking about a shapefile, always add that trailing s — shapefiles.
In the search for max pet density, it was just a couple lines of code to get me started with the shapes in GeoDataFrames:
import geopandas as gpd # load zip code shape data zip_codes_gdf = gpd.read_file('data/shapes/Zip_Codes.shp') # load seattle shape cities = gpd.read_file('data/shapes/Municipal_Boundaries.shp') seattle = cities.loc[cities['CITYNAME'] == 'Seattle'] seattle = seattle.reset_index() seattle_shp = seattle.loc[0, 'geometry'] # load seattle waterfront shape waterfront = gpd.read_file('data/shapes/ City_of_Seattle_Shoreline.shp')
Note here that
seattle_shp is a
shapely Polygon and calling it in a Jupyter notebook displays an image of the shape. Alternatively,
waterfront is a GeoDataFrame — so you can visualize it with
waterfront.plot(). Keeping track of object types will save you headaches and time! Or at least is a good place to start when something isn’t working as you expected.
First off, I noticed that the zip code file on Seattle Open Data actually has the shape files for zip codes in several Washington counties. I needed to whittle down the zip codes to only those in Seattle. I quickly realized that the zip code boundaries in the Seattle area do not align with the city boundaries.
geopandas’s wrapping of all those other packages, including
shapely, comes in handy (even if it makes it a struggle to install).
In order to combine data from different sources and get exactly the data you want, I have found the shapely functions to be extremely useful. Conveniently, they work great on
GeoDataFrames, computing on each line and with a single line of code (just call
geo_df.**shapely_function**()). Some of the
[shapely](https://shapely.readthedocs.io/en/stable/manual.html) functions are described in the
[geopandas](http://geopandas.org/) docs, but
geopandas does not include documentation on all the shapely functions. I was pleasantly surprised when the additional functions described in the shapely docs, but not in
geopandas worked as well, especially
object.**within**(_other)_ — Returns
True if the object’s boundary and interior intersect only with the interior of the other (not its boundary or exterior). This applies to all types and is the inverse of
So I tried it out. Here,
zip_codes_gdf.within(seattle_shp) returns a series of
False, one for each record in the
GeoDataFrame. This series, used with
.loc creates a new
GeoDataFrame containing only the records (zip codes and their shapes) that are wholly within Seattle.
seattle_zips_within = zip_codes_gdf.loc[zip_codes_gdf.within(seattle_shp)]
Next, I plotted my result, along with the Seattle outline and waterfront.
fig, ax = plt.subplots(figsize=(10,10)) seattle.plot(ax=ax, color='white', edgecolor='black', linewidth=2) waterfront.plot(ax=ax, color='blue') seattle_zips_within.plot(ax=ax, cmap='Paired', alpha=.8)
Zip codes that cross into other cities beyond Seattle were excluded from this selection process using the
within() function. This function would be great if the zip codes lined up with the city boundaries. Unfortunately, this is not the case, so I continued my search for a better zip code selection function.
object.**intersects**(_other_) — Returns
True if the boundary or interior of the object intersect in any way with those of the other. In other words, geometric objects intersect if they have any boundary or interior point in common.*
seattle_zips_intersects = zip_codes_gdf.loc[zip_codes_gdf.intersects(seattle_shp)] fig, ax = plt.subplots(figsize=(10,10)) waterfront.plot(ax=ax, color='blue') seattle_zips_intersects.plot(ax=ax, cmap='Paired', alpha=.8) seattle.plot(ax=ax, color='#00000000', edgecolor='black', linewidth=2)
Now we’re on the right track, with all of Seattle represented. Now I want to take out all that area outside the city limits. If I calculate the pet density using the number of pets registered in Seattle city limits and the whole (Seattle and non-Seattle) land area for these trans-border zip codes, this will under-represent the pet density in these Seattle neighborhoods.
Photo Credit: George Hodan [CC0 Public Domain] from PublicDomainPictures
So I looked for another
object.**intersection**(_other_) — Returns a representation of the intersection of this object with the other geometric object.*
In other words, it returns a shape of the area where the two objects overlap.
seattle_zips_intersection = zip_codes_gdf.intersection(seattle_shp) fig, ax = plt.subplots(figsize=(10,10)) waterfront.plot(ax=ax, color='black') seattle_zips_intersection.plot(ax=ax, cmap='Paired', alpha=.8, edgecolor='gray') seattle.plot(ax=ax, color='#00000000', edgecolor='black', linewidth=2)
I’m not yet sure what’s going on with all the pink in this map, but this is starting to look good! With the zip codes trimmed to the area within Seattle, as far as mapping goes, I’m ready to get rid of the Seattle outline marking the center of Hood Canal, or wherever Seattle technically ends, in favor of the waterway and zip code outlines.
With the map shapes prepped, I’m ready to put that pet data to use!Take away #3: Geopandas.plot() uses pretty much all the argument’s you know and love from matplotlib.pyplot
First, I added the new intersection geometries determined in the last step to my
GeoDataFrame. Note: Geopandas always plots the “geometry” column by default to plot your shape or complete shape-based calculations like those used above. You can use
geo_object.**set_geometry**(_‘column_name')_ to switch which column
geopandas will use as the geometry of interest. In this case, I decided to start afresh with a new, paired down
GeoDataFrame instead — just the Seattle zip codes with their intersection geometry and area.
# add intersection geometry to my dataframe zip_codes_gdf['intersection'] = seattle_zips_intersection # create dataframe with the seattle in-city geometries seattle_zip_geom = zip_codes_gdf.loc[:, ['ZIP', 'intersection']] # name the geometry column 'geometry' for easy geopandas plotting seattle_zip_geom.columns = ['ZIP', 'geometry'] # calculate the in-seattle area for each zipcode and add as a column seattle_zip_geom['area'] = seattle_zip_geom.area # drop zip codes that have no area in seattle seattle_zip_geom = seattle_zip_geom.loc[seattle_zip_geom['area'] > 0]
To this, I added the pet counts and pet density, using just the intersection area to calculate the density.
# create dataframe with Seattle zips and their respective pet counts seattle_pets_zip = seattle_zip_geom.merge(pet_counts, on='ZIP') # add pet_density column seattle_pets_zip['pet_density'] = (seattle_pets_zip['pets'] / seattle_pets_zip['area'])
To color the map based on the column
'pet_density', set the column to the column name in the call to plot. This kind of map, colored based on a data value, is called a choropleth (vocab I wish I knew before… googling for help works better with more precise key words…)
fig, ax = plt.subplots(figsize=(10,10)) waterfront.plot(ax=ax, color='black') seattle_pets_zip.plot(column='pet_density', ax=ax, edgecolor='gray')
Ugh! All the zip codes look the same!
Then, I noticed that two zip codes (98146, 98168) had a much larger pet density, even though they weren’t visible at my map’s scale.
To investigate these locations more closely, I mapped them on the city streets using
import folium import geopandas as gpd def create_geo_map(geo_df, index, location, zoom=17, tiles='Stamen Terrain'): """creates a folium map with the geometry highlighted Args: geo_df: a geopandas dataframe (must have a column 'geometry' index: the index of the row you want mapped location: [lat, long] list with the center location for the map zoom: start zoom level. Default: 17 tiles: Folium tiles / design. Default: Stamen Terrain Returns: folium map with geometry highlighted """ # create geodataframe for zip code zip_code = gpd.GeoDataFrame(geo_df.iloc[index]).T # get the geojson for the zipcode gjson = zip_code['geometry'].to_json() # create folium map m = folium.Map(location=location, tiles=tiles, zoom_start=zoom) # add geojson to the map folium.GeoJson(gjson, name='geojson').add_to(m) return m
Looking at these two zip codes:
The first one is a single block in Downtown Seattle, next to the library, that does contain a high rise condo complex. I’m not sure why this block has its own zip code, but oh well. I can believe that there are 35 pets registered here.
The second one is more of a mystery. This tiny section of the much larger zip code is an industrial area near the Boeing Field Airport. There is a Supervalue, but no visible housing on Google Maps satellite view (yes, I attempted to stalk these 35 pets— feel free to check it out yourself. For now, my working assumption is these are emotional support animals for the Supervalue workers.)
Example emotional support animal. Photo Credit: Kevin Honold, used with permission
Upon further investigation, zip code 98168 has some areas in unincorporated King County, where the addresses might be labeled “Seattle”, but are not technically within Seattle city limits. I wonder if some non-Seattle resident pet owners in this unincorporated area registered their pets with the city, not realizing that they are technically not city residents. You would think that the city wouldn’t take registrations for non-city addresses… it remains a mystery.
Anyways, I chose to drop these two zip codes so that the larger zip codes’ pet densities are easier to distinguish. I also added in the legend, with the assist of the
# adding a legend using code from http://geopandas.org/mapping.html from mpl_toolkits.axes_grid1 import make_axes_locatable fig, ax = plt.subplots(figsize=(10,10)) divider = make_axes_locatable(ax) cax = divider.append_axes("right", size="5%", pad=0.1) waterfront.plot(ax=ax, color='black') seattle_pets_zip_dropped.plot(column='pet_density', ax=ax, legend=True, cax=cax);
So now I have a good idea of the most and least desirable neighborhoods for pet watching! I do have some lingering questions: What if we normalize for population density? What type of pets are we talking about — would this map look different for cats/dogs/goats/pigs (yes goats and pigs are registered with the city!)? But these questions will have to wait for another day…coming soon to a data science blog near you.
Are these your neighbors? Photo Credit: Kevin Honold, used with permission
Parting tip: use
plt.savefig('fig_name.png') to save your image before displaying the figure to avoid saving a blank canvas. For example, all in one jupyter notebook cell:
fig, ax = plt.subplots(figsize=(10,10)) waterfront.plot(ax=ax, color='blue') plt.savefig('images/seattle_zips_intersects.png');
You’ll still get the image displayed and saved.
You can see the full code on my GitHub repo here.
After analyzing clients and market requirements, TopDevelopers has come up with the list of the best Python service providers. These top-rated Python developers are widely appreciated for their professionalism in handling diverse projects. When...
After analyzing clients and market requirements, TopDevelopers has come up with the list of the best Python service providers. These top-rated Python developers are widely appreciated for their professionalism in handling diverse projects. When you look for the developer in hurry you may forget to take note of review and ratings of the company's aspects, but we at TopDevelopers have done a clear analysis of these top reviewed Python development companies listed here and have picked the best ones for you.
List of Best Python Web Development Companies & Expert Python Programmers.
Python GUI Programming Projects using Tkinter and Python 3
Learn Hands-On Python Programming By Creating Projects, GUIs and Graphics
Python is a dynamic modern object -oriented programming language
It is easy to learn and can be used to do a lot of things both big and small
Python is what is referred to as a high level language
Python is used in the industry for things like embedded software, web development, desktop applications, and even mobile apps!
SQL-Lite allows your applications to become even more powerful by storing, retrieving, and filtering through large data sets easily
If you want to learn to code, Python GUIs are the best way to start!
I designed this programming course to be easily understood by absolute beginners and young people. We start with basic Python programming concepts. Reinforce the same by developing Project and GUIs.
The Python coding language integrates well with other platforms – and runs on virtually all modern devices. If you’re new to coding, you can easily learn the basics in this fast and powerful coding environment. If you have experience with other computer languages, you’ll find Python simple and straightforward. This OSI-approved open-source language allows free use and distribution – even commercial distribution.
When and how do I start a career as a Python programmer?
In an independent third party survey, it has been revealed that the Python programming language is currently the most popular language for data scientists worldwide. This claim is substantiated by the Institute of Electrical and Electronic Engineers, which tracks programming languages by popularity. According to them, Python is the second most popular programming language this year for development on the web after Java.
Python Job Profiles
The median total pay for Python jobs in California, United States is $74,410, for a professional with one year of experience
Below are graphs depicting average Python salary by city
The first chart depicts average salary for a Python professional with one year of experience and the second chart depicts the average salaries by years of experience
Who Uses Python?
This course gives you a solid set of skills in one of today’s top programming languages. Today’s biggest companies (and smartest startups) use Python, including Google, Facebook, Instagram, Amazon, IBM, and NASA. Python is increasingly being used for scientific computations and data analysis
Take this course today and learn the skills you need to rub shoulders with today’s tech industry giants. Have fun, create and control intriguing and interactive Python GUIs, and enjoy a bright future! Best of Luck
Who is the target audience?
Anyone who wants to learn to code
For Complete Programming Beginners
For People New to Python
This course was designed for students with little to no programming experience
People interested in building Projects
Anyone looking to start with Python GUI development
Access to a computer
Download Python (FREE)
Should have an interest in programming
Interest in learning Python programming
Install Python 3.6 on your computer
What will you learn
Build Python Graphical User Interfaces(GUI) with Tkinter
Be able to use the in-built Python modules for their own projects
Use programming fundamentals to build a calculator
Use advanced Python concepts to code
Build Your GUI in Python programming
Use programming fundamentals to build a Project
Signup Login & Registration Programs
Job Interview Preparation Questions
& Much More
Guide to Python Programming Language
The course will lead you from beginning level to advance in Python Programming Language. You do not need any prior knowledge on Python or any programming language or even programming to join the course and become an expert on the topic.
The course is begin continuously developing by adding lectures regularly.
Please see the Promo and free sample video to get to know more.
Hope you will enjoy it.
An Enthusiast Mind
Basic Knowledge To Use Computer
What will you learn
Will Be Expert On Python Programming Language
Build Application On Python Programming Language