Web Scraping Tutorial: Using Python to Find Cheap Flights

Do you love data science and traveling? Read on to learn how to combine the two and use Python to find cheap flights! A tutorial on how to create a web scraping program that will search for and find cheap airline flight prices, and then send these prices to your ...

Introduction

In this tutorial, I will show you how to use Python to automatically search a website like Expedia every hour for flights on a particular route, and email you the best fare it finds each time.

The end result is this nice email:

We will work as follows:

  1. Connect Python to our web browser and access the website (Expedia in our example here).
  2. Choose the ticket type based on our preference (round trip, one way, etc.).
  3. Select the departure country.
  4. Select the arrival country (if round trip).
  5. Select departure and return dates.
  6. Compile all available flights in a structured format (for those who love to do some exploratory data analysis!).
  7. Connect to your email.
  8. Send the best rate for the current hour.

Let’s get started!

Importing Libraries

Let’s go ahead and import our libraries:

Selenium (for accessing websites and automation testing):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

Pandas (we will mainly use Pandas just for structuring our data):

import pandas as pd

Time and datetime (for adding delays and getting the current time; we will see why later):

import time
import datetime

We need these for connecting to our email account and sending our message:

import smtplib
from email.mime.multipart import MIMEMultipart

Note: I will not go too deeply into web scraping with Selenium here. If you want a more detailed tutorial, check my previous tutorials on scraping with Selenium and on web scraping in general (Part 1 and Part 2).

Let’s Get Coding

Connect to the Web Browser

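from selenium.webdriver.chrome.service import Service

# Selenium 4 takes the chromedriver path through a Service object
# (the executable_path argument has been removed)
browser = webdriver.Chrome(service=Service('/chromedriver'))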

This opens an empty browser window with a banner telling you that it is being controlled by automated test software.

Choose Ticket

Next, I will quickly go to Expedia to check the interface and the options available to choose from.

I right-click the ticket type options (round trip, one way, etc.) and choose Inspect to see the tags related to them.

As the inspector shows, the round-trip option is a ‘label’ tag with ‘id = flight-type-roundtrip-label-hp-flight’.

Accordingly, I use these tags and ids to store XPaths for the three different ticket types as follows:

#Setting ticket types paths
return_ticket = "//label[@id='flight-type-roundtrip-label-hp-flight']"
one_way_ticket = "//label[@id='flight-type-one-way-label-hp-flight']"
multi_ticket = "//label[@id='flight-type-multi-dest-label-hp-flight']"
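
The three XPaths above differ only in the label id, so they could also be generated by a small helper. This is a minimal sketch: `label_xpath`, `TICKET_LABEL_IDS` and `TICKET_XPATHS` are hypothetical names, and the ids are the Expedia ones found above, which may well have changed since this was written.

```python
def label_xpath(element_id: str) -> str:
    """Build an XPath that matches a <label> element by its id attribute."""
    return f"//label[@id='{element_id}']"


# Label ids found with right click + Inspect (assumed; the site may have changed them)
TICKET_LABEL_IDS = {
    "return": "flight-type-roundtrip-label-hp-flight",
    "one_way": "flight-type-one-way-label-hp-flight",
    "multi": "flight-type-multi-dest-label-hp-flight",
}

# Produces the same strings as return_ticket, one_way_ticket and multi_ticket
TICKET_XPATHS = {name: label_xpath(label_id) for name, label_id in TICKET_LABEL_IDS.items()}

print(TICKET_XPATHS["return"])
```

`ticket_chooser(TICKET_XPATHS['return'])` would then behave like `ticket_chooser(return_ticket)`, and supporting another option only means adding its id to the dictionary.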

Then I define a function to choose a ticket type:

def ticket_chooser(ticket):
    try:
        ticket_type = browser.find_element(By.XPATH, ticket)
        ticket_type.click()
    except Exception:
        # Skip silently if the option isn't on the page (e.g. the layout changed)
        pass

I will use this same pattern for the rest of the code: look up an element's tag and id (or other attributes), then define a function that makes the choice on the web page.

Choose Departure and Arrival Countries

Below I define a function to choose the departure country.

def dep_country_chooser(dep_country):
    fly_from = browser.find_element(By.XPATH, "//input[@id='flight-origin-hp-flight']")
    time.sleep(1)
    fly_from.clear()
    time.sleep(1.5)
    fly_from.send_keys('  ' + dep_country)
    time.sleep(1.5)
    # Pick the first suggestion in the autocomplete drop-down
    first_item = browser.find_element(By.XPATH, "//a[@id='aria-option-0']")
    time.sleep(1.5)
    first_item.click()

The function follows this logic:

  1. Find the departure input field by its id.
  2. Clear whatever is already written in it.
  3. Type in the departure country.
  4. Click the first item in the autocomplete drop-down.

Note that I use time.sleep between steps to give the page’s elements a chance to load or update. Without it, the script sometimes runs faster than the page loads and tries to access elements that haven’t appeared yet, which breaks the code.
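
Fixed sleeps are a trade-off: too short and the script breaks, too long and every run wastes time. An alternative is to poll until a condition holds. Selenium offers this as `WebDriverWait`; the sketch below shows the same polling idea in plain Python so it runs without a browser. `wait_until` and the fake element lookup are hypothetical names used only for illustration.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    Exceptions raised by `condition` (e.g. element not found yet) are
    treated as "not ready" and retried.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except Exception:
            pass  # not ready yet; try again after a short pause
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Demo: a fake "element" that only appears on the third check
attempts = {"count": 0}


def fake_find_element():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LookupError("element not loaded yet")
    return "element"


print(wait_until(fake_find_element, timeout=5, poll_interval=0.01))  # element
```

With Selenium itself, the equivalent would be something like `WebDriverWait(browser, 10).until(EC.element_to_be_clickable((By.XPATH, xpath)))`, where `WebDriverWait` comes from `selenium.webdriver.support.ui` and `EC` is `selenium.webdriver.support.expected_conditions`.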

Let’s do the same for the arrival country.

def arrival_country_chooser(arrival_country):
    fly_to = browser.find_element(By.XPATH, "//input[@id='flight-destination-hp-flight']")
    time.sleep(1)
    fly_to.clear()
    time.sleep(1.5)
    fly_to.send_keys('  ' + arrival_country)
    time.sleep(1.5)
    # Pick the first suggestion in the autocomplete drop-down
    first_item = browser.find_element(By.XPATH, "//a[@id='aria-option-0']")
    time.sleep(1.5)
    first_item.click()

Choosing the Departure and Return Dates

Departure date:

def dep_date_chooser(month, day, year):
    dep_date_button = browser.find_element(By.XPATH, "//input[@id='flight-departing-hp-flight']")
    dep_date_button.clear()
    # Type the date in MM/DD/YYYY format
    dep_date_button.send_keys(month + '/' + day + '/' + year)

Very straightforward:

  1. Find the departure date input field by its id.
  2. Clear it.
  3. Type in the date as MM/DD/YYYY.
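
Because the date functions take month, day and year as separate zero-padded strings, it is easy to pass '4' instead of '04' by mistake. A small sketch that derives those strings from a `datetime.date` instead; `format_date_field` and `date_parts` are hypothetical helper names, and MM/DD/YYYY is the format typed into the field above.

```python
import datetime


def format_date_field(d: datetime.date) -> str:
    """Format a date as MM/DD/YYYY with zero-padded month and day."""
    return d.strftime("%m/%d/%Y")


def date_parts(d: datetime.date) -> tuple:
    """Split a date into the (month, day, year) strings the chooser functions take."""
    return d.strftime("%m"), d.strftime("%d"), d.strftime("%Y")


departure = datetime.date(2019, 4, 1)
print(format_date_field(departure))  # 04/01/2019
print(date_parts(departure))         # ('04', '01', '2019')
```

`dep_date_chooser(*date_parts(departure))` would then be equivalent to `dep_date_chooser('04', '01', '2019')`.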

Return date:

def return_date_chooser(month, day, year):
    return_date_button = browser.find_element(By.XPATH, "//input[@id='flight-returning-hp-flight']")

    # .clear() doesn't stick on this field, so press backspace 11 times instead
    for _ in range(11):
        return_date_button.send_keys(Keys.BACKSPACE)
    return_date_button.send_keys(month + '/' + day + '/' + year)

For the return date, clearing the existing text with .clear() didn’t work for some reason (probably because the page autofills this field and doesn’t let .clear() override it).

I worked around this with Keys.BACKSPACE, which simply sends a backspace keypress to the field. The for loop presses backspace 11 times, enough to delete every character of a date in MM/DD/YYYY format (10 characters).
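
Hard-coding 11 backspaces works for a 10-character date, but the count could instead be derived from whatever the field currently contains (Selenium can read an input's current text with `element.get_attribute('value')`). A sketch with a hypothetical `backspaces_for` helper; it relies on Selenium's `Keys.BACKSPACE` being the single Unicode character `'\ue003'`, so a string of them can be built with ordinary string repetition.

```python
# Selenium's Keys.BACKSPACE is this Unicode private-use character
BACKSPACE = "\ue003"


def backspaces_for(current_text: str, extra: int = 1, backspace: str = BACKSPACE) -> str:
    """Return one backspace per character in current_text, plus `extra` as a margin."""
    return backspace * (len(current_text) + extra)


keys_to_send = backspaces_for("05/02/2019")
print(len(keys_to_send))  # 11, the same count as the loop above
```

In the scraper this could become something like `field.send_keys(backspaces_for(field.get_attribute('value')))`, sending all the keypresses in one call.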

Getting the Results

Define the function that will click the search button.

def search():
    search_button = browser.find_element(By.XPATH, "//button[@class='btn-primary btn-action gcw-submit']")
    search_button.click()
    time.sleep(15)  # give the results page time to load
    print('Results ready!')

Here it is better to use a long delay of 15 seconds or so to make sure all results are loaded before we proceed to the next steps.

The resulting webpage is as follows (with the fields I am interested in marked):

Compiling the Data

We will use this sequence to compile our data:

  1. Find all the elements for each field (departure time, arrival time, airline, price, duration, stops and layovers) using their data-test-id or class attributes.
  2. Extract the text of each element into a list.
  3. Fill the DataFrame row by row from these lists, storing the price in a column named after the current date and time.

Below is the code:

df = pd.DataFrame()
def compile_data():
    global df
    global dep_times_list
    global arr_times_list
    global airlines_list
    global price_list
    global durations_list
    global stops_list
    global layovers_list

    #departure times
    dep_times = browser.find_elements(By.XPATH, "//span[@data-test-id='departure-time']")
    dep_times_list = [value.text for value in dep_times]

    #arrival times
    arr_times = browser.find_elements(By.XPATH, "//span[@data-test-id='arrival-time']")
    arr_times_list = [value.text for value in arr_times]

    #airline name
    airlines = browser.find_elements(By.XPATH, "//span[@data-test-id='airline-name']")
    airlines_list = [value.text for value in airlines]

    #prices (text like '$1,234'; keep only the part after the dollar sign)
    prices = browser.find_elements(By.XPATH, "//span[@data-test-id='listing-price-dollars']")
    price_list = [value.text.split('$')[1] for value in prices]

    #durations
    durations = browser.find_elements(By.XPATH, "//span[@data-test-id='duration']")
    durations_list = [value.text for value in durations]

    #stops
    stops = browser.find_elements(By.XPATH, "//span[@class='number-stops']")
    stops_list = [value.text for value in stops]

    #layovers
    layovers = browser.find_elements(By.XPATH, "//span[@data-test-id='layover-airport-stops']")
    layovers_list = [value.text for value in layovers]

    now = datetime.datetime.now()
    current_date = (str(now.year) + '-' + str(now.month) + '-' + str(now.day))
    current_time = (str(now.hour) + ':' + str(now.minute))
    current_price = 'price' + '(' + current_date + '---' + current_time + ')'
    for i in range(len(dep_times_list)):
        try:
            df.loc[i, 'departure_time'] = dep_times_list[i]
        except Exception as e:
            pass
        try:
            df.loc[i, 'arrival_time'] = arr_times_list[i]
        except Exception as e:
            pass
        try:
            df.loc[i, 'airline'] = airlines_list[i]
        except Exception as e:
            pass
        try:
            df.loc[i, 'duration'] = durations_list[i]
        except Exception as e:
            pass
        try:
            df.loc[i, 'stops'] = stops_list[i]
        except Exception as e:
            pass
        try:
            df.loc[i, 'layovers'] = layovers_list[i]
        except Exception as e:
            pass
        try:
            df.loc[i, str(current_price)] = price_list[i]
        except Exception as e:
            pass

    print('Excel Sheet Created!')
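
The try/except blocks above keep compile_data from crashing when the scraped lists have different lengths (a nonstop flight has no layover element, for example, so the layover list can be shorter). A more compact sketch of the same idea uses `itertools.zip_longest`, which pads the shorter lists instead of raising an error. `build_rows` and `COLUMNS` are hypothetical names and the sample values are made up. Like the original loop, this pads at the end of a list, so if the flight missing a value is not the last one, later values can shift onto the wrong row.

```python
from itertools import zip_longest

COLUMNS = ["departure_time", "arrival_time", "airline", "duration", "stops", "layovers", "price"]


def build_rows(*field_lists, fill=None):
    """Pair the scraped lists up into one dict per flight.

    Lists shorter than the others are padded with `fill`
    instead of raising IndexError.
    """
    return [dict(zip(COLUMNS, values)) for values in zip_longest(*field_lists, fillvalue=fill)]


# Made-up sample values standing in for the scraped text
rows = build_rows(
    ["8:05am", "1:30pm"],        # departure times
    ["6:40pm", "11:55pm"],       # arrival times
    ["Airline A", "Airline B"],  # airlines
    ["17h 35m", "12h 25m"],      # durations
    ["1 stop", "Nonstop"],       # stops
    ["FRA"],                     # layovers: only the first flight has one
    ["612", "745"],              # prices
)
print(rows[1]["layovers"])  # None
```

`pd.DataFrame(rows)` would then build the whole table in one call.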

One thing worth mentioning is that I give the price column a new name every time the code runs, using this snippet:

now = datetime.datetime.now()
current_date = (str(now.year) + '-' + str(now.month) + '-' + str(now.day))
current_time = (str(now.hour) + ':' + str(now.minute))
current_price = 'price' + '(' + current_date + '---' + current_time + ')'

This way, each run’s price column header records the date and time of that run, so I can later see how the price changes over time if I want to.
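
The snippet above builds the header by string concatenation, so single-digit values are not zero-padded (9:05 becomes '9:5'). `datetime.strftime` can produce the same label format with padding; a sketch, where `price_column_name` is a hypothetical name.

```python
import datetime


def price_column_name(now: datetime.datetime) -> str:
    """Build a header like 'price(2019-03-05---09:05)' for the current run."""
    return "price(" + now.strftime("%Y-%m-%d---%H:%M") + ")"


print(price_column_name(datetime.datetime(2019, 3, 5, 9, 5)))
# price(2019-03-05---09:05)
```

Padded labels also sort chronologically as plain strings, which unpadded ones do not (for example, '9:5' sorts after '10:5'). In the tutorial's code this would be `current_price = price_column_name(datetime.datetime.now())`.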

Setting Up Our Email Functions

In this part I will set up three functions:

  • One to connect to my email.
  • One to create the message.
  • A final one to actually send it.

First, I also need to store my email login credentials in two variables as follows:

#email credentials (placeholders: replace with your own)
username = 'your_email@outlook.com'
password = 'XXXXXXXXXXX'
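
Hard-coding a password in the script makes it easy to leak, for example by sharing or committing the file. One common alternative is reading credentials from environment variables. A minimal sketch; the function name `load_credentials` and the variable names `FLIGHT_ALERT_USER` and `FLIGHT_ALERT_PASSWORD` are hypothetical.

```python
import os


def load_credentials(environ=os.environ,
                     user_var="FLIGHT_ALERT_USER", password_var="FLIGHT_ALERT_PASSWORD"):
    """Read the email login from environment variables instead of the source code."""
    try:
        return environ[user_var], environ[password_var]
    except KeyError as missing:
        raise RuntimeError(f"Please set the environment variable {missing}") from None


# Example, after setting the variables in your shell:
# username, password = load_credentials()
```

The `environ` parameter defaults to the real environment; it is only there so the function can be tried with a plain dictionary.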

Connect

def connect_mail(username, password):
    global server
    server = smtplib.SMTP('smtp.outlook.com', 587)
    server.ehlo()
    server.starttls()
    server.login(username, password)

Create the Message

#Create message template for email
def create_msg():
    global msg
    msg = '\nCurrent Cheapest flight:\n\nDeparture time: {}\nArrival time: {}\nAirline: {}\nFlight duration: {}\nNo. of stops: {}\nPrice: {}\n'.format(cheapest_dep_time,
                       cheapest_arrival_time,
                       cheapest_airline,
                       cheapest_duration,
                       cheapest_stops,
                       cheapest_price)

Here I create the message using ‘{}’ placeholders for the values passed in on each run.

The variables used here (cheapest_arrival_time, cheapest_airline, etc.) are defined later, when we run all our functions; they hold the values for each particular run.
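
Relying on six global variables makes create_msg depend on the order in which things run. An alternative sketch passes the flight in explicitly as a dictionary; `create_msg_from` is a hypothetical name, the keys are the DataFrame column names used in compile_data, and the sample values are made up.

```python
def create_msg_from(flight: dict, price_key: str = "price") -> str:
    """Fill the email template from one flight record."""
    return (
        "\nCurrent Cheapest flight:\n\n"
        f"Departure time: {flight.get('departure_time')}\n"
        f"Arrival time: {flight.get('arrival_time')}\n"
        f"Airline: {flight.get('airline')}\n"
        f"Flight duration: {flight.get('duration')}\n"
        f"No. of stops: {flight.get('stops')}\n"
        f"Price: {flight.get(price_key)}\n"
    )


sample = {
    "departure_time": "8:05am", "arrival_time": "6:40pm", "airline": "Airline A",
    "duration": "17h 35m", "stops": "1 stop", "price": "612",
}
print(create_msg_from(sample))
```

With pandas, `df.iloc[0].to_dict()` would give such a dictionary for the first row, with the run's timestamped column name passed as `price_key`.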

Send the Message

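from email.mime.text import MIMEText

def send_email(msg):
    global message
    message = MIMEMultipart()
    message['Subject'] = 'Current Best flight'
    message['From'] = username
    message['To'] = 'recipient@example.com'  # placeholder: where to send the alert
    # Attach the text body; otherwise the headers above are never used
    message.attach(MIMEText(msg, 'plain'))

    server.sendmail(message['From'], message['To'], message.as_string())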
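
Python's standard library also has the newer `email.message.EmailMessage` class, which holds the headers and a plain-text body in one object, and `smtplib.SMTP.send_message` can send it directly. A sketch that only builds the message, so it can be inspected without connecting to a server; `build_email` is a hypothetical name and the addresses are placeholders.

```python
from email.message import EmailMessage


def build_email(body: str, sender: str, recipient: str,
                subject: str = "Current Best flight") -> EmailMessage:
    """Build a plain-text email with subject, sender and recipient headers."""
    message = EmailMessage()
    message["Subject"] = subject
    message["From"] = sender
    message["To"] = recipient
    message.set_content(body)
    return message


email_msg = build_email("Price: 612", "sender@example.com", "recipient@example.com")
print(email_msg["Subject"])  # Current Best flight
# With a connected server (see connect_mail above): server.send_message(email_msg)
```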


Let’s Run Our Code!

Now we will finally run our functions. We will use the below logic.

The data scraping part:

  1. Open Expedia and switch to the Flights tab.
  2. Choose the ticket type (round trip in our example).
  3. Select the departure and arrival locations (Cairo and New York here).
  4. Select the departure and return dates.
  5. Run the search and compile the results into our DataFrame.
  6. Save the values from the first row of the DataFrame for the email.

The email part:

  1. Create the message from those values.
  2. Connect to our email account.
  3. Send the message.

Finally, we save our DataFrame to an Excel sheet and sleep for 3600 seconds (1 hour).

This loop runs 8 times at one-hour intervals, so it keeps going for about 8 hours. You can tweak the timing to your preference.
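
Because the loop sleeps for a full 3600 seconds after each run, the time spent scraping (the fixed sleeps alone add about half a minute per run) comes on top, so each run starts a little later than the last. A sketch that subtracts the elapsed time instead; `run_every` and its injectable `sleep` and `clock` parameters are hypothetical, and exist so the loop can be tested with a fake clock instead of real waiting.

```python
import time


def run_every(job, interval_seconds, runs, sleep=time.sleep, clock=time.monotonic):
    """Call job(i) `runs` times, starting each run `interval_seconds` after the previous one.

    The time the job itself takes is subtracted from the pause, so start
    times don't drift. No pause is taken after the final run.
    """
    for i in range(runs):
        started = clock()
        job(i)
        if i < runs - 1:
            elapsed = clock() - started
            sleep(max(0.0, interval_seconds - elapsed))


# Demo with a fake clock: each "scrape" takes 30 seconds
fake_now = [0.0]
pauses = []


def fake_clock():
    return fake_now[0]


def fake_sleep(seconds):
    pauses.append(seconds)
    fake_now[0] += seconds


def fake_job(i):
    fake_now[0] += 30


run_every(fake_job, 3600, 3, sleep=fake_sleep, clock=fake_clock)
print(pauses)  # [3570.0, 3570.0]
```

In the tutorial's code, the body of the `for i in range(8):` loop would become the `job` function, called as `run_every(job, 3600, 8)`.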

for i in range(8):
    link = 'https://www.expedia.com/'
    browser.get(link)
    time.sleep(5)

    #choose flights only
    flights_only = browser.find_element(By.XPATH, "//button[@id='tab-flight-tab-hp']")
    flights_only.click()

    ticket_chooser(return_ticket)

    dep_country_chooser('Cairo')

    arrival_country_chooser('New york')

    dep_date_chooser('04', '01', '2019')

    return_date_chooser('05', '02', '2019')

    search()

    compile_data()

    #save values for email (first listing; the newest price column is the last one)
    current_values = df.iloc[0]

    cheapest_dep_time = current_values.iloc[0]
    cheapest_arrival_time = current_values.iloc[1]
    cheapest_airline = current_values.iloc[2]
    cheapest_duration = current_values.iloc[3]
    cheapest_stops = current_values.iloc[4]
    cheapest_price = current_values.iloc[-1]

    print('run {} completed!'.format(i))

    create_msg()
    connect_mail(username, password)
    send_email(msg)
    server.quit()  # close the SMTP connection before the next run
    print('Email sent!')

    df.to_excel('flights.xlsx')

    time.sleep(3600)
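
The loop reports the first row of the DataFrame, which is only the cheapest flight if the results page is sorted by price. To pick the lowest price explicitly, you could compare the prices as numbers. A sketch over plain dictionaries with made-up values; `cheapest_flight` is a hypothetical helper, and rows without a usable price (missing, or not plain digits after removing '$' and ',') are skipped.

```python
def cheapest_flight(flights, price_key="price"):
    """Return the flight with the lowest numeric price, or None if none has a price."""
    def as_number(flight):
        text = str(flight.get(price_key) or "").replace("$", "").replace(",", "").strip()
        return int(text) if text.isdigit() else None

    priced = [(as_number(f), f) for f in flights]
    priced = [(p, f) for p, f in priced if p is not None]
    if not priced:
        return None
    return min(priced, key=lambda pair: pair[0])[1]


flights = [
    {"airline": "Airline A", "price": "1,234"},
    {"airline": "Airline B", "price": "987"},
    {"airline": "Airline C", "price": None},
]
print(cheapest_flight(flights)["airline"])  # Airline B
```

Comparing the raw strings would give the wrong answer here, since "987" > "1,234" as text. With pandas, `df.to_dict('records')` turns the DataFrame into such a list of dictionaries, with the current run's timestamped column name passed as `price_key`.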

Now I will be getting this email every hour for the next 8 hours:

I also have this neat Excel sheet with all the flights and it will keep updating each hour with a new column for the current price:

Now you can take this further in many other directions.

If you have ideas, don’t hesitate to share them!

That’s it! I hope you found it useful.
