Building a Database from Federal School Test Data

In this tutorial, we'll build a database from federal school test data by organizing the National Center for Education Statistics’ (NCES) data in a relational database.


As parents of two young children, my wife and I have researched the quality of schools in several US cities. This research led us to a few popular websites that present a variety of metrics on the quality of individual schools. As a data professional, I wondered where these sites obtain their data. The two most popular of them use data provided by the US Department of Education’s National Center for Education Statistics (NCES); one also uses quite a bit of data from state education departments, which is more detailed than the federal data.

In this series of articles, I will share my process for downloading and restructuring a number of these datasets. First, I will use Python to restructure the data and load it into a MySQL database; then I will carry out some ad hoc research projects using some of Python’s popular data science libraries.

The main goal of this first article is to cover how I successfully used Pandas to load 10 years of school-level test score results obtained from NCES into a MySQL database. By loading this data into a relational database format, I aim to make a wide array of analysis tasks more efficient as I take a deeper dive into the data in future articles. Hopefully, this will be helpful to those who are interested in exploring Pandas’ SQL capabilities.
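
To give a sense of what "using Pandas to load data into a database" looks like, here is a minimal sketch, not the actual loading code from this article. The table name, column names, and sample rows are hypothetical (the real NCES files are laid out differently), and an in-memory SQLite database stands in for MySQL so the example runs without a server.

```python
import sqlite3

import pandas as pd

# Hypothetical sample rows; the real NCES files have different columns.
scores = pd.DataFrame(
    {
        "school_id": ["A001", "A002", "A003"],
        "school_year": [2015, 2015, 2016],
        "pct_proficient": [61.5, 48.0, 72.3],
    }
)

# Assumption: in-memory SQLite is used here in place of MySQL.
conn = sqlite3.connect(":memory:")

# Write the DataFrame to a table, replacing the table if it already exists.
scores.to_sql("test_scores", conn, if_exists="replace", index=False)

# Read a subset back with plain SQL to confirm the round trip.
loaded = pd.read_sql_query(
    "SELECT school_id, pct_proficient "
    "FROM test_scores "
    "WHERE school_year = 2015 "
    "ORDER BY school_id",
    conn,
)
conn.close()
```

Against a real MySQL server, the same `to_sql` and `read_sql_query` calls would typically receive a SQLAlchemy engine instead of the `sqlite3` connection; the rest of the pattern stays the same.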

