In this web scraping tutorial we’re going to be using regular expressions to parse HTML. This is a more advanced tutorial so you can check out my video on regular expressions before going through this. We’re going to be parsing out the IMDb website, which is an Internet movie database, and I’m going to be using a website called www.regex101.com to test regular expressions against strings to make sure we’re matching them correctly. Because this is an advanced tutorial, I’ll be posting each portion of code and explaining how it works as we walk through it. Directly below is the full source code, but skip down further and I’ll walk through each portion of the code.
Download this video’s files here: