**Follow along as I explore Germany’s largest travel forum, Vielfliegertreff.** As an aspiring data scientist, building interesting portfolio projects is key to showcasing your skills. When I learned coding and data science as a business student through online courses, I disliked that the datasets were either made up of fake data or had already been solved many times, like Boston House Prices or the Titanic dataset on Kaggle.

In this blog post, I want to show you how I develop interesting data science project ideas and implement them step by step, using my exploration of Germany’s biggest frequent flyer forum, Vielfliegertreff, as the example. If you are short on time, feel free to skip to the TL;DR in the conclusion.

Step 1: Choose a relevant topic you are passionate about

As a first step, I think about a potential project that fulfills the following three requirements to make it the most interesting and enjoyable:

  • Solving my own problem or burning question
  • Connected to some recent event to be relevant or especially interesting
  • Has not been solved or covered before

As these ideas are still quite abstract, let me give you a rundown of how my three projects fulfilled the requirements:

As a beginner, do not strive for perfection. Choose something you are genuinely curious about and write down all the questions you want to explore within your topic.

Step 2: Start scraping together your own dataset

If you followed my third requirement, no dataset will be publicly available and you will have to scrape the data together yourself. Having scraped a couple of websites, I use three major frameworks for different scenarios:

For Vielfliegertreff, I used Scrapy as the framework for the following reasons:

  • There were **no JavaScript**-enabled elements hiding data.
  • The website structure was complex: I had to go from each forum subject to all of its threads, and from each thread to all of its post pages. With Scrapy you can implement this kind of logic in an organized way by yielding requests that lead to new callback functions.
  • There were quite a lot of posts, so crawling the entire forum takes some time. Scrapy scrapes websites asynchronously, which makes it very fast.


How to Create an Authentic Data Science Project for your Portfolio