Archiving and Logging Your Use of Public Data
Dealing with the impermanence of public data sets
One worry that I always have when downloading data sets off the internet is their impermanence. Links die, data changes, ashes to ashes, dust to dust.
That’s why I’ve been introducing the Wayback Machine into my workflow. But even then, it’s tough to stay consistent about whether I’m downloading data from an archived website or a live one, and it’s tough to reconstruct what I did in the past.
What I’ve done through my work with the Survey of Consumer Finances (SCF) is implement a system of simultaneously archiving and logging the data that I use. Below is a summary of what I’ve done but if you just want to see the code, scroll to the bottom of this post for a gist of the functions I’ve implemented with respect to the SCF.
The main thing I wanted to accomplish with this project was to make sure that I was using recent Wayback archives as much as possible when downloading data. For any data that did not yet have an archive, I wanted to be sure it got archived on Wayback for future use.
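To make that rule concrete, here is a minimal sketch of the decision. It is not the code from the gist. The names `DownloadPlan` and `plan_download`, and the 30-day `max_age_days` threshold, are hypothetical choices made for illustration. It also assumes that a stale or missing archive means falling back to the live page and flagging the URL for a fresh snapshot.

```python
from typing import NamedTuple, Optional


class DownloadPlan(NamedTuple):
    url: str                # where to download from
    source: str             # "wayback" or "live"
    save_new_archive: bool  # whether a new Wayback snapshot should be requested


def plan_download(live_url: str,
                  archive_url: Optional[str],
                  archive_age_days: Optional[int],
                  max_age_days: int = 30) -> DownloadPlan:
    """Prefer a recent archive; otherwise use the live page and flag it for archiving.

    The 30-day default is an assumed threshold, not a value from the post.
    """
    has_archive = archive_url is not None and archive_age_days is not None
    if has_archive and archive_age_days <= max_age_days:
        return DownloadPlan(archive_url, "wayback", False)
    # No archive at all, or the latest one is too old.
    return DownloadPlan(live_url, "live", True)
```

Keeping the decision in a pure function like this separates it from the network calls, which makes it easy to record the returned plan in a log alongside the download.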
The best package I found for this was WaybackPy, although I needed to make a few changes to make it work for my purposes.
First, I needed to implement attributes that would allow me to see the age of the latest archive. This way I could check to see if a new archive would be needed given a provided
By way of illustration, if you were to get the len() of a WaybackPy Url object, passing in www.google.com as an argument, you’d get 0, the number of days since the last archive.
    import waybackpy

    url = "https://www.google.com/"
    waybackpy_url_obj = waybackpy.Url(url)
    print(len(waybackpy_url_obj))
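For intuition about where a "days since the last archive" number can come from, here is a small standalone sketch. It does not show how WaybackPy computes the value internally. It assumes only that a snapshot is identified by the Wayback Machine's 14-digit UTC timestamp (YYYYMMDDhhmmss), and the function name `days_since_archive` is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def days_since_archive(timestamp: str, now: Optional[datetime] = None) -> int:
    """Whole days between a 14-digit Wayback timestamp (UTC) and `now`."""
    archived_at = datetime.strptime(timestamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - archived_at).days


# A snapshot from 1 Jan 2020 checked on 11 Jan 2020 is 10 days old.
print(days_since_archive("20200101000000", datetime(2020, 1, 11, tzinfo=timezone.utc)))  # 10
```

A snapshot taken earlier the same day yields 0, which matches the result described for a frequently archived page like www.google.com.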