The Metis Data Science Bootcamp is now past the midway point, and I have one more project under my belt. This third project follows my exploratory data analysis of the New York MTA subway dataset and my use of linear regression to model user scores for anime on MyAnimeList from scraped data.

What is the aim of this third project? Simply put, to use supervised classification to determine whether a given fishing vessel is fishing based on features provided by the Global Fishing Watch (GFW) dataset.
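
To make the task concrete: each row of the GFW data is a vessel position report, and the target is a binary fishing / not-fishing label. The sketch below shows one way to turn raw rows into labelled examples. It assumes columns named `speed`, `distance_from_shore` and `is_fishing`, with `is_fishing` coded as -1 for unlabelled points and as a score between 0 and 1 otherwise (fractional where labellers disagreed). Those column names, that coding and the 0.5 threshold are assumptions for illustration, not a description of the exact pipeline used in this project.

```python
import csv
import io
from typing import Iterable, List, Tuple

# Hypothetical feature columns; the real GFW files may name or scale these differently.
FEATURES = ["speed", "distance_from_shore"]


def to_examples(rows: Iterable[dict], threshold: float = 0.5) -> Tuple[List[List[float]], List[int]]:
    """Convert raw rows into (feature vectors, binary labels).

    Assumption: is_fishing == -1 marks an unlabelled point, and labelled
    points carry a score in [0, 1] that is thresholded into 0/1.
    """
    X, y = [], []
    for row in rows:
        score = float(row["is_fishing"])
        if score < 0:  # skip unlabelled points: supervised learning needs a target
            continue
        X.append([float(row[name]) for name in FEATURES])
        y.append(1 if score >= threshold else 0)  # scores at the threshold count as fishing
    return X, y


# Invented sample rows for illustration only.
sample = io.StringIO(
    "mmsi,speed,distance_from_shore,is_fishing\n"
    "1,3.1,12000,1\n"
    "1,10.5,15000,0\n"
    "2,2.8,9000,-1\n"
    "2,3.0,9500,0.6667\n"
)
X, y = to_examples(csv.DictReader(sample))
print(X)  # [[3.1, 12000.0], [10.5, 15000.0], [3.0, 9500.0]]
print(y)  # [1, 0, 1]
```

Dropping unlabelled rows, rather than guessing a label for them, keeps the training target honest; the unlabelled rows could still be scored by a trained model afterwards.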

TL;DR:

  • Retrieved and engineered 8 features from ~160,000 entries in the GFW dataset, focusing on trawlers
  • **Duplicate vessel IDs** meant that I had to leave time-series analysis for future work
  • Random Forest outperformed Logistic Regression, Gaussian Naive Bayes and K Nearest Neighbours, and performed very well on both the test dataset and out-of-sample trawler data
  • Next steps would be to disentangle similar vessel IDs; extend the modelling to other gear types; use models to classify a vessel’s gear type as well as whether it is a fishing vessel at all; and investigate more sophisticated models
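
Comparing classifiers comes down to scoring each one on held-out data. The metric is not specified here, so the sketch below assumes precision, recall and F1 on the "fishing" class, which are more informative than raw accuracy when the classes are imbalanced. It pairs them with a from-scratch k-nearest-neighbours classifier (one of the four models named above) using only the standard library (Python 3.8+ for `math.dist`). In practice a library such as scikit-learn would be the usual choice, and the toy data is invented for illustration.

```python
import math
from collections import Counter
from typing import Sequence


def knn_predict(train_X: Sequence[Sequence[float]], train_y: Sequence[int],
                query: Sequence[float], k: int = 3) -> int:
    """Predict a label by majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1):
    """Compute precision, recall and F1 for the positive ('fishing') class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented toy data: [speed in knots, distance from shore in km]; 1 = fishing.
train_X = [[2.5, 20.0], [3.0, 25.0], [3.5, 22.0], [11.0, 5.0], [12.0, 8.0], [10.5, 3.0]]
train_y = [1, 1, 1, 0, 0, 0]
test_X = [[2.8, 21.0], [11.5, 6.0], [4.0, 24.0], [9.0, 4.0]]
test_y = [1, 0, 1, 1]

preds = [knn_predict(train_X, train_y, q, k=3) for q in test_X]
print(preds)  # [1, 0, 1, 0]
print(precision_recall_f1(test_y, preds))
```

Because k-nearest neighbours is distance-based, features on very different scales (for example, distance from shore in metres versus speed in knots) should be standardised first; otherwise the largest-valued feature dominates every vote. Tree ensembles such as Random Forest do not need this step.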

I. Background

According to the Food and Agriculture Organization of the United Nations, roughly one third of global fisheries were overfished as of 2015. It goes on to explain why this is such a bad thing:

Overfishing not only reduces food production but also impairs the functioning of ecosystems and reduces biodiversity, with negative repercussions for economies and societies.

To keep track of overfishing, one first needs to know which vessels are fishing at any given time. This is where GFW’s data comes into play.

II. Retrieving the data

GFW’s goal is to advance “ocean sustainability and stewardship through increasing transparency.” Part of this transparency includes making relevant data publicly available, and so I was able to download the csv files for fishing vessels with various types of gear. [1]

[1] As of 20 May 2020, the csv files are no longer available on GFW’s website; instead, GFW’s GitHub has a set of csv and NumPy files that present the data in several different ways. Any further investigation will likely make use of this data.

List of csv files downloaded from GFW’s website, along with the number of records in each

As the table above shows, there are a **lot** of records! For this investigation, I chose to focus on trawlers: they were the second-largest gear type, and at 4.4 million rows the data was still manageable on my system.
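
Counting records across several large csv files does not require loading them into memory. A minimal sketch using only the standard library streams each file row by row; the file names and tiny generated files in the usage example are hypothetical placeholders, not the actual GFW downloads.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, Iterable


def count_records(paths: Iterable[Path]) -> Dict[str, int]:
    """Return the number of data rows (excluding the header) in each csv file."""
    counts = {}
    for path in paths:
        with open(path, newline="") as handle:
            reader = csv.reader(handle)
            next(reader, None)  # skip the header row, if present
            counts[path.name] = sum(1 for _ in reader)  # streams rows; never holds the file in memory
    return counts


# Hypothetical per-gear-type files, generated here so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    demo = {"trawlers.csv": 3, "purse_seines.csv": 1}
    paths = []
    for name, n_rows in demo.items():
        path = Path(tmp) / name
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["mmsi", "timestamp", "is_fishing"])
            writer.writerows([[i, 0, -1] for i in range(n_rows)])
        paths.append(path)
    counts = count_records(paths)
    # List gear types from most to fewest records.
    for name, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: {n:,}")
```

Using `csv.reader` rather than counting raw lines means quoted fields containing newlines are still counted as a single record.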

#overfishing #bootcamp #metis #data-science #data analysis

Is a trawler fishing? Modelling the Global Fishing Watch dataset