Scraping Medium Publications – A Python Tutorial for Beginners

A while ago I was trying to do some analysis on a Medium publication for a personal project. However, data acquisition was a problem, because scraping only the publication’s home page does not ensure you get all the data you want.

That’s when I found out that each publication has its own archive. You just have to type “/archive” after the publication URL. You can even specify a year, month, and day to find all the stories published on that date. Something like this:

https://publicationURL/archive/year/month/day
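
As a minimal sketch of how this pattern could be used, the snippet below builds one archive URL per day of a given year. The function name `archive_urls` is made up for illustration, `https://publicationURL` is the same placeholder as above, and the zero-padded month and day are an assumption about how the archive expects dates to be written.

```python
from datetime import date, timedelta


def archive_urls(base_url, year):
    """Yield one archive URL for every calendar day of `year`."""
    day = date(year, 1, 1)
    while day.year == year:
        # Assumption: month and day are written with two digits (01..12, 01..31).
        yield f"{base_url.rstrip('/')}/archive/{day.year}/{day.month:02d}/{day.day:02d}"
        day += timedelta(days=1)


# Placeholder base URL; replace it with the real publication URL.
urls = list(archive_urls("https://publicationURL", 2019))
print(len(urls))   # number of days in the year
print(urls[0])     # first archive page of the year
```

Looping over the calendar this way keeps leap years and month lengths out of your own code, since `datetime` handles them.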

And suddenly the problem was solved. A very simple scraper would do the job. In this tutorial, we’ll see how to code a simple but powerful web scraper that works on any Medium publication. The same approach can also be used to scrape data from many other websites.

Since we’re scraping a Medium publication, there is no better example than The Startup. According to them, The Startup is the largest active Medium publication, with over 700k followers, so it should be a great source of data. In this article, you’ll see how to scrape all the articles they published in 2019 and how this data can be useful.

Web Scraping

Web scraping is the process of collecting data from websites using automated scripts. It consists of three main steps: fetching the page, parsing the HTML, and extracting the information you need.
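
The three steps can be sketched with nothing but Python’s standard library. This is only an illustration under a few assumptions: the names `fetch`, `HeadingCollector`, and `extract_headings` are invented here, the `<h3>` tag is an arbitrary target, and the sample HTML is made up so the example runs without a network connection. In practice, third-party libraries such as Requests and Beautiful Soup are often used for the same job.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch(url):
    """Step 1: download the page and return its HTML as text."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class HeadingCollector(HTMLParser):
    """Step 2: walk the HTML and remember the text inside <h3> tags."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._inside = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h3":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.headings[-1] += data


def extract_headings(html):
    """Step 3: return the cleaned-up heading texts."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    return [heading.strip() for heading in parser.headings]


# Made-up HTML standing in for the result of fetch(url).
sample = "<div><h3>First story</h3><p>intro</p><h3>Second <em>story</em></h3></div>"
print(extract_headings(sample))
```

Keeping `fetch` separate from the parsing code means the extraction logic can be tested on saved HTML before any real request is sent.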

The third step is the one that can be a little tricky at first. It basically consists of finding the parts of the HTML that contain the information you want. You can find them by opening the page you want to scrape and pressing the F12 key on your keyboard. Then you can select an element of the page to inspect. You can see this in the image below.

Then all you need to do is use the tags and classes in the HTML to tell the scraper where to find the information. You need to do this for every part of the page you want to scrape. You can see it better in the code.
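
To make the idea of “tags and classes” concrete, here is a small sketch that maps (tag, class) pairs to the fields we want. Everything specific in it is hypothetical: the class names `story`, `story-title`, and `story-claps`, the `StoryParser` class, and the sample HTML are invented for illustration, so the real values have to be read from the page with the F12 inspector. The sketch also assumes a target element does not contain another element with the same tag.

```python
from html.parser import HTMLParser

# Hypothetical (tag, class) pairs; replace them with what the inspector shows.
FIELDS = {
    ("h3", "story-title"): "title",
    ("span", "story-claps"): "claps",
}


class StoryParser(HTMLParser):
    """Collect the text of every element whose tag and class match FIELDS."""

    def __init__(self):
        super().__init__()
        self.stories = []   # one dict per story card
        self._open = None   # (tag, field) currently being read, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "story" in classes:
            self.stories.append({})   # a new story card starts here
            return
        for (wanted_tag, wanted_class), field in FIELDS.items():
            if tag == wanted_tag and wanted_class in classes and self.stories:
                self._open = (tag, field)
                self.stories[-1][field] = ""

    def handle_endtag(self, tag):
        if self._open and tag == self._open[0]:
            self._open = None

    def handle_data(self, data):
        if self._open:
            self.stories[-1][self._open[1]] += data


# Made-up HTML that mimics two story cards on an archive page.
sample = """
<div class="story">
  <h3 class="story-title">How to scrape</h3>
  <span class="story-claps">120</span>
</div>
<div class="story featured">
  <h3 class="story-title">Another post</h3>
  <span class="story-claps">45</span>
</div>
"""

parser = StoryParser()
parser.feed(sample)
stories = parser.stories
print(stories)
```

Keeping the tag and class pairs in one dictionary means that a change in the page layout only requires updating `FIELDS`, not the parsing logic.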
