Three Often Overlooked Sources of Data for your Next Passion Project

One of the key skills a data scientist should have is being able to wrangle data from a variety of sources. In this blog, I will discuss three unorthodox data types and how you can get started working with them.

We live in the age of information, a time when there is more data at our fingertips than at any other point in history, and it's growing.

IDC predicts the world’s data will grow to 175 zettabytes in 2025 … If you attempted to download 175 zettabytes at the average current internet connection speed, it would take you 1.8 billion years to download.
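
As a rough sanity check of that figure, here is the arithmetic in Python. The quote doesn't say which connection speed it assumes, so the 25 Mbit/s below is an assumption, as are treating a zettabyte as 10^21 bytes and a year as 365.25 days.

```python
# Sanity check of the "1.8 billion years" figure.
# Assumptions (not stated in the quote): 1 ZB = 10**21 bytes,
# a 25 Mbit/s connection, and a 365.25-day year.
ZETTABYTE_BYTES = 10**21
total_bits = 175 * ZETTABYTE_BYTES * 8          # 175 ZB expressed in bits
speed_bits_per_s = 25e6                          # assumed 25 Mbit/s connection
seconds_per_year = 365.25 * 24 * 3600            # seconds in an average year

download_years = total_bits / speed_bits_per_s / seconds_per_year
print(f"{download_years / 1e9:.2f} billion years")
```

Under these assumptions the result is about 1.77 billion years, which is consistent with the quoted 1.8 billion.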

That is a lot of data. So why are people still using the same airline CSVs or soccer player statistics? This is a trap I myself have fallen into on occasion, and I think it happens mainly for two reasons: laziness and familiarity. Laziness, because we know how easy it is to use pre-cleaned CSVs. Familiarity, because we already know how to access the data and can get straight to work on our projects instead of messing around unpacking and importing less familiar data types.

In this blog, I am going to deviate from this status quo and explore three alternative types of data to work with, and how to get started with them.

WAV File Data (Easy)

JPEG File Data (Medium)

APK Code Data (Advanced)

(EASY) Unlocking the Data of WAV Sound Files

Many people new to the field of data science and signal processing assume that opening and analyzing sound data is a complicated and advanced process. Although some of the theory behind signal processing can get a bit advanced, actually opening and working with sound files is astonishingly easy, thanks to SciPy’s scipy.io.wavfile module.

The Waveform Audio File Format (WAV, or .wav) is a common format for storing audio bitstreams on PCs. If you work in signal processing or in audio and sound engineering, you will regularly encounter .wav files and need to perform various transformations on them.
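
To see what a WAV file actually describes about its bitstream, Python's standard-library `wave` module can read the header fields (channel count, bytes per sample, sampling rate, number of frames) without SciPy. This is a minimal sketch: it first writes a hypothetical `tone.wav` containing half a second of stereo silence, so it runs without the song used below.

```python
import wave

# Write a tiny stereo 16-bit WAV (0.5 s of silence) so the example is self-contained.
# "tone.wav" is a hypothetical file name used only for this illustration.
with wave.open("tone.wav", "wb") as w:
    w.setnchannels(2)        # stereo: left + right
    w.setsampwidth(2)        # 2 bytes per sample = 16-bit PCM
    w.setframerate(44100)    # 44.1 kHz sampling rate
    w.writeframes(b"\x00\x00" * 2 * 22050)  # 22050 frames x 2 channels of zero samples

# Read back the header information that describes the bitstream.
with wave.open("tone.wav", "rb") as r:
    channels = r.getnchannels()
    sample_width = r.getsampwidth()
    rate = r.getframerate()
    n_frames = r.getnframes()

print(channels, sample_width, rate, n_frames, n_frames / rate)
```

The same four numbers (channels, sample width, rate, frame count) are what determine the shape and dtype of the array SciPy hands back.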

Working with .wav files is easy and quick. Just pick a sound file you are interested in taking a closer look at. I have chosen to use Benjamin Tissot’s Actionable, a royalty-free rock song. Let’s import scipy.io.wavfile and open up the song:

import numpy as np ## for data transformation
import matplotlib.pyplot as plt ## for visualizing the data
import scipy.io.wavfile as wavfile ## for opening the data

Fs, aud = wavfile.read('Actionable-BenjaminTissot.wav') ## returns (sampling rate, sample array)
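
If you don't have the song handy, you can still follow along with a generated test signal. The sketch below writes a hypothetical two-second stereo file (440 Hz on the left, 660 Hz on the right) with `scipy.io.wavfile.write` and reads it back with `wavfile.read`, which returns the sampling rate and a NumPy array of shape (samples, channels). The file name and tone frequencies are illustrative assumptions.

```python
import numpy as np
from scipy.io import wavfile

fs_out = 44100                              # sampling rate in Hz
t = np.arange(2 * fs_out) / fs_out          # 2 seconds of sample times
left = 0.5 * np.sin(2 * np.pi * 440 * t)    # 440 Hz tone in the left channel
right = 0.5 * np.sin(2 * np.pi * 660 * t)   # 660 Hz tone in the right channel

# Stack into (samples, channels) and convert to 16-bit integer PCM.
stereo = np.column_stack([left, right])
pcm = (stereo * 32767).astype(np.int16)
wavfile.write("test_tone.wav", fs_out, pcm)  # hypothetical file name

Fs, aud = wavfile.read("test_tone.wav")
print(Fs, aud.shape, aud.dtype)
```

Swapping `"test_tone.wav"` for the song's file name gives the same kind of output as the original call: an integer sampling rate and a two-column array for a stereo file.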

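In this case, “Fs” is the sampling rate, the number of audio samples recorded per second, and “aud” is the audio itself: an array of sample amplitudes that track the sound pressure of the recording, usually stored as raw integers (for example 16-bit PCM) rather than in physical units. Most modern sound files have two audio channels, a left and a right, for stereo sound. Let’s focus on just one channel and see what we can learn about the audio file: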

aud = aud[:,0] #Pick just the left channel
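
Note that `aud[:,0]` assumes a stereo file: for a mono WAV, `wavfile.read` returns a one-dimensional array and that indexing raises an `IndexError`. A small helper (the name `to_mono` and its `mode` parameter are hypothetical) can handle both cases, and can optionally average the channels instead of keeping only the left one.

```python
import numpy as np

def to_mono(samples, mode="left"):
    """Return a 1-D signal from a mono or multi-channel sample array.

    mode="left" keeps channel 0; mode="mean" averages all channels.
    """
    if samples.ndim == 1:          # already mono: nothing to select
        return samples
    if mode == "left":
        return samples[:, 0]
    if mode == "mean":
        # Cast to float explicitly so the averaged result has a predictable dtype.
        return samples.astype(np.float64).mean(axis=1)
    raise ValueError(f"unknown mode: {mode!r}")

stereo = np.array([[100, 300], [-200, 0]], dtype=np.int16)
print(to_mono(stereo))               # left channel only
print(to_mono(stereo, mode="mean"))  # average of both channels
print(to_mono(stereo[:, 1]))         # mono input passes through unchanged
```

Keeping one channel preserves the original integer samples; averaging is a common alternative when the left and right channels differ noticeably.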

print("Sample rate: " + str(Fs))
print("Duration: " + str(aud.shape[0] / Fs) + " seconds")  # duration = total samples / sampling rate
# Output:
# Sample rate: 44100
# Duration: 122.80163265306122 seconds

As we can see, the sampling rate is 44,100 Hz (44.1 kHz), the CD-audio standard and one of the most common rates for high-quality audio files. Dividing the total number of samples by the sampling rate gives the length of the song: about 122.8 seconds, or just over two minutes. Let's visualize the first 1.5 seconds of this audio file:
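
The duration calculation is just a unit conversion between sample counts and seconds, and the same conversion in reverse turns a time into a slice index, which is what `int(Fs*1.5)` does in the next step. The helper names below are hypothetical, and the sample count of 5,415,552 is back-calculated from the printed duration (122.80163265306122 s × 44,100 Hz) rather than read from the file.

```python
def samples_to_seconds(n_samples, fs):
    """Duration in seconds of n_samples at sampling rate fs (Hz)."""
    return n_samples / fs

def seconds_to_samples(seconds, fs):
    """Number of whole samples covering `seconds` at sampling rate fs (Hz)."""
    return int(seconds * fs)

fs = 44100
n_samples = 5415552   # back-calculated from the duration printed above
duration = samples_to_seconds(n_samples, fs)
minutes, secs = divmod(duration, 60)
print(f"{duration:.2f} s = {int(minutes)} min {secs:.1f} s")
print(seconds_to_samples(1.5, fs))   # samples in the first 1.5 seconds
```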

first = aud[:int(Fs*1.5)] #Snip just the first 1.5 seconds
plt.plot(first) #Draw the waveform sample by sample
plt.ylabel("Amplitude (raw sample value)")
plt.xlabel("Sample Count")
plt.title("Original Audio (1.5s)")
plt.show()
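
The plot's x-axis is in samples; dividing the sample indices by the sampling rate gives seconds instead. And because the values are raw integers rather than calibrated pressures, a common way to summarise level is relative to digital full scale (dBFS). The sketch below assumes 16-bit PCM (full scale 32768) and uses a synthetic 440 Hz tone as a stand-in for `first`, so it runs on its own.

```python
import numpy as np

Fs = 44100
# Time axis in seconds; with real data this could be used as plt.plot(t, first).
t = np.arange(int(Fs * 1.5)) / Fs
# Synthetic stand-in for `first`: a 440 Hz tone at half of full scale, as int16 PCM.
first = (0.5 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

# Normalise int16 PCM to floats in [-1, 1) relative to full scale.
x = first.astype(np.float64) / 32768.0

peak_dbfs = 20 * np.log10(np.max(np.abs(x)))        # loudest single sample
rms_dbfs = 20 * np.log10(np.sqrt(np.mean(x ** 2)))  # average signal power
print(f"duration: {t[-1] + 1 / Fs:.2f} s, peak: {peak_dbfs:.1f} dBFS, RMS: {rms_dbfs:.1f} dBFS")
```

For a sine wave the RMS level sits about 3 dB below the peak; a completely silent segment would give `-inf`, since the logarithm of zero is undefined.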
