Archiving and Logging Your Use of Public Data

Dealing with the impermanence of public data sets

One worry that I always have when downloading data sets off the internet is their impermanence. Links die, data changes, ashes to ashes, dust to dust.

That’s why I’ve been introducing the Wayback Machine into my workflow. But even then, it’s tough to stay consistent about whether I’m downloading data from an archived website or a live one, and it’s tough to reconstruct what I did in the past.

What I’ve done through my work with the Survey of Consumer Finances (SCF) is implement a system that simultaneously archives and logs the data I use. Below is a summary of what I’ve done, but if you just want to see the code, scroll to the bottom of this post for a gist of the functions I’ve implemented for the SCF.

Building Off of WaybackPy

The big thing I wanted to accomplish with this project was to make sure that I was using recent Wayback archives as much as possible when downloading data. Otherwise, for any data that did not yet have an archive, I wanted to be sure it got archived on Wayback for future use.
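To make that goal concrete, here is a minimal sketch of the decision in plain Python. It is not the code from my gist and it does not call WaybackPy at all: plan_download and its return keys are hypothetical names made up for illustration, the age limit in days borrows the archive_age_limit name used later in this post, and falling back to the live URL when no recent archive exists is an assumption.

```python
from typing import Optional


def plan_download(live_url: str,
                  archive_url: Optional[str],
                  archive_age_days: Optional[int],
                  archive_age_limit: int) -> dict:
    """Decide where to download from and whether to request a new archive.

    Hypothetical helper: archive_url and archive_age_days are None when
    no Wayback archive of live_url is known.
    """
    has_recent_archive = (
        archive_url is not None
        and archive_age_days is not None
        and archive_age_days <= archive_age_limit
    )
    if has_recent_archive:
        # A recent enough archive exists, so prefer it over the live site.
        return {"download_from": archive_url, "save_new_archive": False}
    # No archive, or only a stale one: use the live site (an assumption)
    # and flag that a fresh archive should be requested.
    return {"download_from": live_url, "save_new_archive": True}
```

The returned dictionary can also be written to a log as-is, which is one way to keep a record of whether a given download came from an archive or from the live site.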

The best package I found for this was WaybackPy, although I needed to make a few changes to make it work for my purposes.

First, I needed to implement attributes that would allow me to see the age of the latest archive. This way I could check whether a new archive would be needed, given a provided archive_age_limit.

By way of illustration, if you call len() on a WaybackPy Url object created with www.google.com as its argument, you’d get 0, the number of days since the last archive.

import waybackpy

url = "https://www.google.com/"
waybackpy_url_obj = waybackpy.Url(url)

# len() of the Url object is the number of days since the latest archive
print(len(waybackpy_url_obj))
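The age check itself can be sketched without any network access. The snippet below is a rough illustration, not the change I made to WaybackPy: archive_age_in_days and needs_new_archive are hypothetical helper names, and it assumes you already have the latest archive's 14-digit Wayback timestamp (YYYYMMDDhhmmss), which it treats as UTC.

```python
from datetime import datetime, timezone
from typing import Optional


def archive_age_in_days(wayback_timestamp: str,
                        now: Optional[datetime] = None) -> int:
    """Whole days elapsed since a Wayback timestamp (YYYYMMDDhhmmss)."""
    archived_at = datetime.strptime(wayback_timestamp, "%Y%m%d%H%M%S")
    archived_at = archived_at.replace(tzinfo=timezone.utc)  # assumed UTC
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - archived_at).days


def needs_new_archive(age_days: int, archive_age_limit: int) -> bool:
    """True when the latest archive is older than the allowed limit."""
    return age_days > archive_age_limit


# Example with a fixed "now" so the result does not depend on today's date.
fixed_now = datetime(2021, 1, 31, 12, 0, 0, tzinfo=timezone.utc)
age = archive_age_in_days("20210101120000", now=fixed_now)
print(age, needs_new_archive(age, archive_age_limit=7))
```

Passing now explicitly keeps the function easy to test. In real use you would leave it out and let it default to the current time.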
