This article introduces public data sources available at the Federal, State, County, and local levels that can enhance a typical data analysis assignment.

I. Introduction

Government agencies in the US at every level (Federal, State, County, and local) collect vast quantities of data and make these data available to the public. The challenge is to merge data from these separate sources to generate meaningful information. In this brief note, we discuss how publicly available data can be used to enhance a typical data analytics project.

For the analysis shown below, we use the R software platform, though the same analysis could be done on a number of other software platforms or in other programming languages. We recognize that a number of commercial and open-source tools are much more powerful for specific analyses (especially for processing, manipulating, analyzing, and displaying geographical data), but for the purposes of this article we use only one software platform.

We will review both the commonly used data and the less commonly used data. We start with a general map and see what can be added to the map to create something that is, hopefully, more than the sum of its parts. We will use the State of Maryland and Montgomery County, MD, data for all the analyses discussed in this article.

II. Commonly Used Data

2.1 State and County Maps

The US Census Bureau provides cartographic boundary files at different geographic levels (national, county, congressional districts, divisions, metropolitan areas, urban areas, zip code tabulation areas, etc.) in shapefile and KML formats [1]. These files are available for different years and, in some cases, at different levels of accuracy, and are part of the Census Bureau’s MAF/TIGER geographic database. Figure 1 shows an example: a map of the county boundaries for Maryland, created from one of the Census shapefiles.
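As a rough illustration of how such a map might be produced in R, the sketch below reads a county-level boundary shapefile and keeps only the Maryland counties. It is a minimal sketch under stated assumptions, not necessarily the code behind Figure 1. It assumes the sf package is installed, that a national county boundary file has already been downloaded and unzipped locally (the file name cb_2018_us_county_500k.shp is only a placeholder), and that the attribute table carries a STATEFP column holding the two-digit state FIPS code. The helper filter_state is a hypothetical name introduced here for illustration.

```r
# Keep only the rows (counties) that belong to one state.
# Works on a plain data.frame or an sf object, because both
# support logical row indexing.
# Assumption: the table has a STATEFP column with the state FIPS code.
filter_state <- function(counties, state_fips = "24") {
  counties[counties$STATEFP == state_fips, ]
}

# Placeholder path to an unzipped county boundary shapefile (assumption).
shp_path <- "cb_2018_us_county_500k.shp"

# Only attempt the map when the sf package and the file are both available.
if (requireNamespace("sf", quietly = TRUE) && file.exists(shp_path)) {
  us_counties <- sf::st_read(shp_path, quiet = TRUE)

  # 24 is the state FIPS code for Maryland.
  md_counties <- filter_state(us_counties, "24")

  # Draw the county outlines only, then add a title.
  plot(sf::st_geometry(md_counties))
  title("Maryland county boundaries")
}
```

Filtering on the state FIPS code rather than a state name avoids spelling or abbreviation mismatches, and the same code can be used later as a join key when other data sets are merged onto the map.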


Leveraging Public Data to Enhance Your Analysis