Narrowing the Data

I recently completed a project to predict communities at risk of food insecurity in America. You can find information about that project, including code and a PowerPoint presentation, here. For that project, I used government datasets from the USDA called SNAP QC data. These are massive datasets, with over 40k records and 800+ features, accompanied by a technical document explaining the features. In this post, I walk through a technical analysis of how I narrowed that dataset.

What are QC datasets?

The SNAP program from the USDA uses “QC” (quality control) data, meaning the records have been hand-picked in some way for inclusion in the final set. Unfortunately, the hand-picking standards change every year, though never drastically. Because government data for these programs is so massive, statisticians try to implement features that represent external influences on program participants. In the case of SNAP, massively incomplete applications are excluded, so the datasets are more representative of those who actually receive the benefit than of everyone who applied. The datasets are also weighted, with weights determined by economic influences on states. For example, if a state declares an emergency, the weights of its participants are lowered to reduce their outlier effect on the dataset as a whole.
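In practice, those weights mean any summary statistic should be weighted rather than naive. Here is a minimal sketch in pandas; the column names (`FSBEN` for the monthly benefit, `HWGT` for the household weight) are illustrative, since the real names vary by year and are listed in the technical documentation:

```python
import pandas as pd

# Illustrative mini-frame standing in for a yearly SNAP QC file.
records = pd.DataFrame({
    "STATE": ["NM", "NM", "NE"],
    "FSBEN": [250.0, 410.0, 180.0],  # monthly benefit (hypothetical column name)
    "HWGT":  [1.2, 0.8, 1.0],        # QC weight (hypothetical column name)
})

def weighted_mean_by(df, group_col, value_col, weight_col):
    """Weighted mean per group, so down-weighted records (e.g. from a
    state emergency declaration) contribute proportionally less."""
    tmp = df.assign(_wx=df[value_col] * df[weight_col])
    sums = tmp.groupby(group_col)[["_wx", weight_col]].sum()
    return sums["_wx"] / sums[weight_col]

by_state = weighted_mean_by(records, "STATE", "FSBEN", "HWGT")
print(by_state)
```

The unweighted New Mexico mean here would be 330, while the weighted mean is 314, because the down-weighted 410-dollar record counts for less.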

Narrowing the data: GIS

A spatial analysis of 2007–2008 data, done during an ESRI Spatial Data Science MOOC, showed counties that were outliers compared to their neighbors. First, I built a 2D model that visually displays each county’s change from one year to the next, using ESRI’s “Time-Series Analysis 2D” tool. Next, I used their “Emerging Hot Spot Analysis” tool to display counties that were statistically significant outliers from their surrounding neighbors over 2007–2008.

It does this by “using the Conceptualization of Spatial Relationships values you provide to calculate the Getis-Ord Gi* statistic (Hot Spot Analysis) for each bin. Once the space-time hot spot analysis completes, each bin in the input NetCDF cube has an associated z-score, p-value, and hot spot bin classification added to it. Next, these hot and cold spot trends are evaluated using the Mann-Kendall trend test. With the resultant trend z-score and p-value for each location with data, and with the hot spot z-score and p-value for each bin, the Emerging Hot Spot Analysis tool categorizes each study area location.” — taken from the ArcGIS Pro Documentation on the tool.
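The second stage of that pipeline, the Mann-Kendall test, is just a non-parametric trend test on each location’s time series of Gi* z-scores. A simplified sketch of the statistic (no tie correction, and not ESRI’s actual implementation) looks like this:

```python
import math

def mann_kendall_z(series):
    """Simplified Mann-Kendall trend z-score (assumes no tied values)."""
    n = len(series)
    # S counts concordant minus discordant pairs ordered in time.
    s = sum(
        (series[j] > series[i]) - (series[j] < series[i])
        for i in range(n) for j in range(i + 1, n)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18.0  # variance of S, no ties
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0

# A steadily rising series of hot-spot z-scores yields a positive trend z.
print(mann_kendall_z([0.5, 0.9, 1.4, 1.8, 2.3]))
```

A location whose Gi* scores trend upward with a significant Mann-Kendall z (e.g. beyond ±1.96 at the 95% level) gets classified as an intensifying or emerging hot spot.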

The result highlighted a hot spot of increased usage in San Juan County, New Mexico, and an emerging cold spot in Cherry County, Nebraska, where participation decreased. Since the SNAP QC data came in at the state level, with no way to filter by county, I chose to compare Nebraska against New Mexico to highlight the extreme cases. I also wanted a 10-year gap analysis to reflect the impact of the 2008 crash on at-risk, food-insecure communities. So I ended up with four datasets: New Mexico and Nebraska for 2007, and New Mexico and Nebraska for 2017.
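Carving those four datasets out of the yearly files is a simple state filter. A sketch with pandas, using in-memory stand-ins for the yearly QC files (the real files identify states by Census FIPS code, 35 for New Mexico and 31 for Nebraska, under a state column whose name varies by year):

```python
import pandas as pd

# Hypothetical mini-frames standing in for the 2007 and 2017 QC files.
qc_2007 = pd.DataFrame({"STATE": [35, 31, 6, 35], "FSBEN": [250, 180, 300, 410]})
qc_2017 = pd.DataFrame({"STATE": [31, 35, 48], "FSBEN": [210, 275, 320]})

def state_subset(df, fips):
    """Keep only one state's records from a yearly QC file."""
    return df[df["STATE"] == fips].reset_index(drop=True)

# FIPS 35 = New Mexico, 31 = Nebraska.
datasets = {
    (year, fips): state_subset(frame, fips)
    for year, frame in [(2007, qc_2007), (2017, qc_2017)]
    for fips in (35, 31)
}
print({key: len(frame) for key, frame in datasets.items()})
```

With the real files you would replace the stand-in frames with `pd.read_csv` (or the SAS/Stata reader, depending on the year’s distribution format) over each yearly download.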
