**Follow along as I explore Germany’s largest travel forum, Vielfliegertreff.** As an aspiring data scientist, building interesting portfolio projects is key to showcasing your skills. When I learned coding and data science as a business student through online courses, I disliked that the datasets were either made up of fake data or had been solved many times before, like Boston House Prices or the Titanic dataset on Kaggle.
In this blog post, I want to show you how I develop interesting data science project ideas and implement them step by step, using my exploration of Germany’s biggest frequent flyer forum, Vielfliegertreff, as the example. If you are short on time, feel free to skip to the TL;DR in the conclusion.
As a first step, I think about a potential project that fulfills the following three requirements to make it the most interesting and enjoyable:
As these ideas are still quite abstract, let me give you a rundown of how my three projects fulfilled the requirements:
As a beginner, do not strive for perfection. Instead, choose something you are genuinely curious about and write down all the questions you want to explore in your topic.
If you followed my third requirement, no dataset will be publicly available and you will have to scrape the data together yourself. Having scraped a couple of websites, I use three major frameworks for different scenarios:
For Vielfliegertreff, I used Scrapy as the framework for the following reasons:
- There were **no JavaScript-enabled elements** hiding data.
- The website structure was complex: the crawler has to go from each forum subject to all of its threads, and from each thread to all of its post pages. With Scrapy you can implement this kind of logic in an organized way, by yielding requests that lead to new callback functions.
- There were quite a lot of posts, so crawling the entire forum takes some time. Scrapy sends requests asynchronously, which lets you scrape websites at a very high speed.