In this tutorial, I will show you how to use Python to automatically check a website like Expedia every hour for flights on a route you choose, and email you the best fare it finds.
The end result is this nice email:
We will work as follows:
Let’s get started!
Let’s go ahead and import our libraries:
Selenium (for accessing websites and automation testing):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
Pandas (we will mainly use Pandas to structure our data):
import pandas as pd
Time and datetime (for adding delays and getting the current time; we will see why later):
import time
import datetime
We need those for connecting to our email and sending our message:
import smtplib
from email.mime.multipart import MIMEMultipart
Note: I will not go too deeply into web scraping using selenium, but if you want a more detailed tutorial for scraping in general check my previous tutorials for scraping using Selenium and web scraping in general Part 1 and Part 2.
browser = webdriver.Chrome(executable_path='/chromedriver')
This will open an empty browser telling you that this browser is being controlled by automated test software like so:
Next, I will quickly go to Expedia to check the interface and the options available to choose from.
I right-click and Inspect the ticket type element (roundtrip, one way, etc.) to see the tags related to it.
As we can see below it has a ‘label’ tag with ‘id = flight-type-roundtrip-label-hp-flight’.
Accordingly, I will use those tags and ids to build XPaths for the three different ticket types as follows:
#Setting ticket types paths
return_ticket = "//label[@id='flight-type-roundtrip-label-hp-flight']"
one_way_ticket = "//label[@id='flight-type-one-way-label-hp-flight']"
multi_ticket = "//label[@id='flight-type-multi-dest-label-hp-flight']"
Then I define a function to choose a ticket type:
def ticket_chooser(ticket):
    try:
        ticket_type = browser.find_element_by_xpath(ticket)
        ticket_type.click()
    except Exception as e:
        pass
The above sequence is the same sequence I will use for the rest of the code (look for tags and ids or other attributes and define a function to make the choice on the web page).
Below I define a function to choose the departure country.
def dep_country_chooser(dep_country):
    fly_from = browser.find_element_by_xpath("//input[@id='flight-origin-hp-flight']")
    time.sleep(1)
    fly_from.clear()
    time.sleep(1.5)
    fly_from.send_keys(' ' + dep_country)
    time.sleep(1.5)
    first_item = browser.find_element_by_xpath("//a[@id='aria-option-0']")
    time.sleep(1.5)
    first_item.click()
I follow this logic: locate the origin field, clear it, type the departure country, then click the first suggestion in the autocomplete drop-down.
Note that I am using time.sleep between steps to give the page's elements a chance to update/load. Without time.sleep, our script sometimes acts faster than the page loads and tries to access elements that haven't loaded yet, causing the code to break.
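If the fixed sleeps ever prove flaky, Selenium's explicit waits are a more robust alternative: they poll until a condition is met instead of sleeping a fixed amount of time. This is not what this tutorial's code uses; it is just a sketch:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait up to 10 seconds for the first autocomplete option to become clickable
wait = WebDriverWait(browser, 10)
first_item = wait.until(EC.element_to_be_clickable((By.XPATH, "//a[@id='aria-option-0']")))
first_item.click()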
Let’s do the same for the arrival country.
def arrival_country_chooser(arrival_country):
    fly_to = browser.find_element_by_xpath("//input[@id='flight-destination-hp-flight']")
    time.sleep(1)
    fly_to.clear()
    time.sleep(1.5)
    fly_to.send_keys(' ' + arrival_country)
    time.sleep(1.5)
    first_item = browser.find_element_by_xpath("//a[@id='aria-option-0']")
    time.sleep(1.5)
    first_item.click()
Departure date:
def dep_date_chooser(month, day, year):
    dep_date_button = browser.find_element_by_xpath("//input[@id='flight-departing-hp-flight']")
    dep_date_button.clear()
    dep_date_button.send_keys(month + '/' + day + '/' + year)
Very straightforward:
Return date:
def return_date_chooser(month, day, year):
    return_date_button = browser.find_element_by_xpath("//input[@id='flight-returning-hp-flight']")
    for i in range(11):
        return_date_button.send_keys(Keys.BACKSPACE)
    return_date_button.send_keys(month + '/' + day + '/' + year)
For the return date, clearing whatever was written didn't work for some reason (probably because the page treats the field as autofill, which prevents me from overriding it with .clear()).
The way I worked around this is by using Keys.BACKSPACE, which simply tells Python to press Backspace (to delete whatever is written in the date field). I put it in a for loop that presses Backspace 11 times, enough to delete all the characters of the date in the field.
Define the function that will click the search button.
def search():
    search = browser.find_element_by_xpath("//button[@class='btn-primary btn-action gcw-submit']")
    search.click()
    time.sleep(15)
    print('Results ready!')
Here it is better to use a long delay of 15 seconds or so to make sure all results are loaded before we proceed to the next steps.
The resulting webpage is as follows (with the fields I am interested in marked):
We will use this sequence to compile our data:
Below is the code:
df = pd.DataFrame()

def compile_data():
    global df
    global dep_times_list
    global arr_times_list
    global airlines_list
    global price_list
    global durations_list
    global stops_list
    global layovers_list
    #departure times
    dep_times = browser.find_elements_by_xpath("//span[@data-test-id='departure-time']")
    dep_times_list = [value.text for value in dep_times]
    #arrival times
    arr_times = browser.find_elements_by_xpath("//span[@data-test-id='arrival-time']")
    arr_times_list = [value.text for value in arr_times]
    #airline name
    airlines = browser.find_elements_by_xpath("//span[@data-test-id='airline-name']")
    airlines_list = [value.text for value in airlines]
    #prices (keep the amount after the '$' sign)
    prices = browser.find_elements_by_xpath("//span[@data-test-id='listing-price-dollars']")
    price_list = [value.text.split('$')[1] for value in prices]
    #durations
    durations = browser.find_elements_by_xpath("//span[@data-test-id='duration']")
    durations_list = [value.text for value in durations]
    #stops
    stops = browser.find_elements_by_xpath("//span[@class='number-stops']")
    stops_list = [value.text for value in stops]
    #layovers
    layovers = browser.find_elements_by_xpath("//span[@data-test-id='layover-airport-stops']")
    layovers_list = [value.text for value in layovers]
    now = datetime.datetime.now()
    current_date = (str(now.year) + '-' + str(now.month) + '-' + str(now.day))
    current_time = (str(now.hour) + ':' + str(now.minute))
    current_price = 'price' + '(' + current_date + '---' + current_time + ')'
    for i in range(len(dep_times_list)):
        try:
            df.loc[i, 'departure_time'] = dep_times_list[i]
        except Exception as e:
            pass
        try:
            df.loc[i, 'arrival_time'] = arr_times_list[i]
        except Exception as e:
            pass
        try:
            df.loc[i, 'airline'] = airlines_list[i]
        except Exception as e:
            pass
        try:
            df.loc[i, 'duration'] = durations_list[i]
        except Exception as e:
            pass
        try:
            df.loc[i, 'stops'] = stops_list[i]
        except Exception as e:
            pass
        try:
            df.loc[i, 'layovers'] = layovers_list[i]
        except Exception as e:
            pass
        try:
            df.loc[i, str(current_price)] = price_list[i]
        except Exception as e:
            pass
    print('Excel Sheet Created!')
One thing worth mentioning is that for the price column I am renaming it every time the code runs using this snippet of code:
now = datetime.datetime.now()
current_date = (str(now.year) + '-' + str(now.month) + '-' + str(now.day))
current_time = (str(now.hour) + ':' + str(now.minute))
current_price = 'price' + '(' + current_date + '---' + current_time + ')'
This is because I want the column header to record the time of that particular run, so that I can later see how the price changes over time if I want to.
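As a side note, the same header could be built with strftime, which also zero-pads the month, day, hour, and minute (unlike the manual concatenation above):

# equivalent, with zero-padded numbers, e.g. 'price(2019-04-01---09:05)'
current_price = datetime.datetime.now().strftime('price(%Y-%m-%d---%H:%M)')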
In this part I will set up three functions: one to connect to my email, one to create the message, and one to send it.
First, I need to store my email login credentials in two variables as follows:
#email credentials
username = 'myemail@hotmail.com'
password = 'XXXXXXXXXXX'
def connect_mail(username, password):
    global server
    server = smtplib.SMTP('smtp.outlook.com', 587)
    server.ehlo()
    server.starttls()
    server.login(username, password)
#Create message template for email
def create_msg():
    global msg
    msg = '\nCurrent Cheapest flight:\n\nDeparture time: {}\nArrival time: {}\nAirline: {}\nFlight duration: {}\nNo. of stops: {}\nPrice: {}\n'.format(
        cheapest_dep_time,
        cheapest_arrival_time,
        cheapest_airline,
        cheapest_duration,
        cheapest_stops,
        cheapest_price)
Here I create the message using placeholders ‘{}’ for the values to be passed in during each run.
Also, the variables used here like cheapest_arrival_time, cheapest_airline, etc. will be defined later when we start running all our functions to hold the values for each particular run.
def send_email(msg):
    global message
    message = MIMEMultipart()
    message['Subject'] = 'Current Best flight'
    message['From'] = 'myemail@hotmail.com'
    message['To'] = 'myotheremail@hotmail.com'
    server.sendmail('myemail@hotmail.com', 'myotheremail@hotmail.com', msg)
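One caveat: as written, the MIMEMultipart headers (Subject/From/To) are never actually sent, because server.sendmail is given the raw msg string rather than the MIME message. A sketch of one way to attach the body and send the full message instead:

from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def send_email(msg):
    message = MIMEMultipart()
    message['Subject'] = 'Current Best flight'
    message['From'] = 'myemail@hotmail.com'
    message['To'] = 'myotheremail@hotmail.com'
    message.attach(MIMEText(msg, 'plain'))  # attach the body to the MIME message
    # send the full MIME message so the headers above are included
    server.sendmail('myemail@hotmail.com', 'myotheremail@hotmail.com', message.as_string())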
Now we will finally run our functions. We will use the below logic.
The data scraping part:
The email part:
Finally, we save our DataFrame to an Excel sheet and sleep for 3600 seconds (1 hour).
This loop will run 8 times in one-hour intervals, thus it will run for 8 hours. You can tweak the timing to your preference.
for i in range(8):
    link = 'https://www.expedia.com/'
    browser.get(link)
    time.sleep(5)
    #choose flights only
    flights_only = browser.find_element_by_xpath("//button[@id='tab-flight-tab-hp']")
    flights_only.click()
    ticket_chooser(return_ticket)
    dep_country_chooser('Cairo')
    arrival_country_chooser('New York')
    dep_date_chooser('04', '01', '2019')
    return_date_chooser('05', '02', '2019')
    search()
    compile_data()
    #save values for email
    current_values = df.iloc[0]
    cheapest_dep_time = current_values[0]
    cheapest_arrival_time = current_values[1]
    cheapest_airline = current_values[2]
    cheapest_duration = current_values[3]
    cheapest_stops = current_values[4]
    cheapest_price = current_values[-1]
    print('run {} completed!'.format(i))
    create_msg()
    connect_mail(username, password)
    send_email(msg)
    print('Email sent!')
    df.to_excel('flights.xlsx')
    time.sleep(3600)
Now I will be getting this email every hour for the next 8 hours:
I also have this neat Excel sheet with all the flights and it will keep updating each hour with a new column for the current price:
Now you can take this further by applying many other ideas, such as:
If you have other ideas, don't hesitate to share!
That’s it! I hope you found it useful.
#python
The Beautiful Soup module is used for web scraping in Python. Learn how to use the Beautiful Soup and Requests modules in this tutorial. After watching, you will be able to start scraping the web on your own.
📺 The video in this post was made by freeCodeCamp.org
Original video: https://www.youtube.com/watch?v=87Gx3U0BDlo&list=PLWKjhJtqVAbnqBxcdjVGgT3uVR10bzTEB&index=12
Thanks for visiting and watching! Please don’t forget to leave a like, comment and share!
#web scraping #python #beautiful soup #beautiful soup tutorial #web scraping in python #beautiful soup tutorial - web scraping in python
Perl script converts PDF files to Gerber format
Pdf2Gerb generates Gerber 274X photoplotting and Excellon drill files from PDFs of a PCB. Up to three PDFs are used: the top copper layer, the bottom copper layer (for 2-sided PCBs), and an optional silk screen layer. The PDFs can be created directly from any PDF drawing software, or a PDF print driver can be used to capture the Print output if the drawing software does not directly support output to PDF.
The general workflow is as follows:
Please note that Pdf2Gerb does NOT perform DRC (Design Rule Checks), as these will vary according to individual PCB manufacturer conventions and capabilities. Also note that Pdf2Gerb is not perfect, so the output files must always be checked before submitting them. As of version 1.6, Pdf2Gerb supports most PCB elements, such as round and square pads, round holes, traces, SMD pads, ground planes, no-fill areas, and panelization. However, because it interprets the graphical output of a Print function, there are limitations in what it can recognize (or there may be bugs).
See docs/Pdf2Gerb.pdf for install/setup, config, usage, and other info.
#Pdf2Gerb config settings:
#Put this file in same folder/directory as pdf2gerb.pl itself (global settings),
#or copy to another folder/directory with PDFs if you want PCB-specific settings.
#There is only one user of this file, so we don't need a custom package or namespace.
#NOTE: all constants defined in here will be added to main namespace.
#package pdf2gerb_cfg;
use strict; #trap undef vars (easier debug)
use warnings; #other useful info (easier debug)
##############################################################################################
#configurable settings:
#change values here instead of in main pdf2gerb.pl file
use constant WANT_COLORS => ($^O !~ m/Win/); #ANSI colors no worky on Windows? this must be set < first DebugPrint() call
#just a little warning; set realistic expectations:
#DebugPrint("${\(CYAN)}Pdf2Gerb.pl ${\(VERSION)}, $^O O/S\n${\(YELLOW)}${\(BOLD)}${\(ITALIC)}This is EXPERIMENTAL software. \nGerber files MAY CONTAIN ERRORS. Please CHECK them before fabrication!${\(RESET)}", 0); #if WANT_DEBUG
use constant METRIC => FALSE; #set to TRUE for metric units (only affect final numbers in output files, not internal arithmetic)
use constant APERTURE_LIMIT => 0; #34; #max #apertures to use; generate warnings if too many apertures are used (0 to not check)
use constant DRILL_FMT => '2.4'; #'2.3'; #'2.4' is the default for PCB fab; change to '2.3' for CNC
use constant WANT_DEBUG => 0; #10; #level of debug wanted; higher == more, lower == less, 0 == none
use constant GERBER_DEBUG => 0; #level of debug to include in Gerber file; DON'T USE FOR FABRICATION
use constant WANT_STREAMS => FALSE; #TRUE; #save decompressed streams to files (for debug)
use constant WANT_ALLINPUT => FALSE; #TRUE; #save entire input stream (for debug ONLY)
#DebugPrint(sprintf("${\(CYAN)}DEBUG: stdout %d, gerber %d, want streams? %d, all input? %d, O/S: $^O, Perl: $]${\(RESET)}\n", WANT_DEBUG, GERBER_DEBUG, WANT_STREAMS, WANT_ALLINPUT), 1);
#DebugPrint(sprintf("max int = %d, min int = %d\n", MAXINT, MININT), 1);
#define standard trace and pad sizes to reduce scaling or PDF rendering errors:
#This avoids weird aperture settings and replaces them with more standardized values.
#(I'm not sure how photoplotters handle strange sizes).
#Fewer choices here gives more accurate mapping in the final Gerber files.
#units are in inches
use constant TOOL_SIZES => #add more as desired
(
#round or square pads (> 0) and drills (< 0):
.010, -.001, #tiny pads for SMD; dummy drill size (too small for practical use, but needed so StandardTool will use this entry)
.031, -.014, #used for vias
.041, -.020, #smallest non-filled plated hole
.051, -.025,
.056, -.029, #useful for IC pins
.070, -.033,
.075, -.040, #heavier leads
# .090, -.043, #NOTE: 600 dpi is not high enough resolution to reliably distinguish between .043" and .046", so choose 1 of the 2 here
.100, -.046,
.115, -.052,
.130, -.061,
.140, -.067,
.150, -.079,
.175, -.088,
.190, -.093,
.200, -.100,
.220, -.110,
.160, -.125, #useful for mounting holes
#some additional pad sizes without holes (repeat a previous hole size if you just want the pad size):
.090, -.040, #want a .090 pad option, but use dummy hole size
.065, -.040, #.065 x .065 rect pad
.035, -.040, #.035 x .065 rect pad
#traces:
.001, #too thin for real traces; use only for board outlines
.006, #minimum real trace width; mainly used for text
.008, #mainly used for mid-sized text, not traces
.010, #minimum recommended trace width for low-current signals
.012,
.015, #moderate low-voltage current
.020, #heavier trace for power, ground (even if a lighter one is adequate)
.025,
.030, #heavy-current traces; be careful with these ones!
.040,
.050,
.060,
.080,
.100,
.120,
);
#Areas larger than the values below will be filled with parallel lines:
#This cuts down on the number of aperture sizes used.
#Set to 0 to always use an aperture or drill, regardless of size.
use constant { MAX_APERTURE => max((TOOL_SIZES)) + .004, MAX_DRILL => -min((TOOL_SIZES)) + .004 }; #max aperture and drill sizes (plus a little tolerance)
#DebugPrint(sprintf("using %d standard tool sizes: %s, max aper %.3f, max drill %.3f\n", scalar((TOOL_SIZES)), join(", ", (TOOL_SIZES)), MAX_APERTURE, MAX_DRILL), 1);
#NOTE: Compare the PDF to the original CAD file to check the accuracy of the PDF rendering and parsing!
#for example, the CAD software I used generated the following circles for holes:
#CAD hole size: parsed PDF diameter: error:
# .014 .016 +.002
# .020 .02267 +.00267
# .025 .026 +.001
# .029 .03167 +.00267
# .033 .036 +.003
# .040 .04267 +.00267
#This was usually ~ .002" - .003" too big compared to the hole as displayed in the CAD software.
#To compensate for PDF rendering errors (either during CAD Print function or PDF parsing logic), adjust the values below as needed.
#units are pixels; for example, a value of 2.4 at 600 dpi = .004 inch, 2 at 600 dpi = .0033"
use constant
{
HOLE_ADJUST => -0.004 * 600, #-2.6, #holes seemed to be slightly oversized (by .002" - .004"), so shrink them a little
RNDPAD_ADJUST => -0.003 * 600, #-2, #-2.4, #round pads seemed to be slightly oversized, so shrink them a little
SQRPAD_ADJUST => +0.001 * 600, #+.5, #square pads are sometimes too small by .00067, so bump them up a little
RECTPAD_ADJUST => 0, #(pixels) rectangular pads seem to be okay? (not tested much)
TRACE_ADJUST => 0, #(pixels) traces seemed to be okay?
REDUCE_TOLERANCE => .001, #(inches) allow this much variation when reducing circles and rects
};
#Also, my CAD's Print function or the PDF print driver I used was a little off for circles, so define some additional adjustment values here:
#Values are added to X/Y coordinates; units are pixels; for example, a value of 1 at 600 dpi would be ~= .002 inch
use constant
{
CIRCLE_ADJUST_MINX => 0,
CIRCLE_ADJUST_MINY => -0.001 * 600, #-1, #circles were a little too high, so nudge them a little lower
CIRCLE_ADJUST_MAXX => +0.001 * 600, #+1, #circles were a little too far to the left, so nudge them a little to the right
CIRCLE_ADJUST_MAXY => 0,
SUBST_CIRCLE_CLIPRECT => FALSE, #generate circle and substitute for clip rects (to compensate for the way some CAD software draws circles)
WANT_CLIPRECT => TRUE, #FALSE, #AI doesn't need clip rect at all? should be on normally?
RECT_COMPLETION => FALSE, #TRUE, #fill in 4th side of rect when 3 sides found
};
#allow .012 clearance around pads for solder mask:
#This value effectively adjusts pad sizes in the TOOL_SIZES list above (only for solder mask layers).
use constant SOLDER_MARGIN => +.012; #units are inches
#line join/cap styles:
use constant
{
CAP_NONE => 0, #butt (none); line is exact length
CAP_ROUND => 1, #round cap/join; line overhangs by a semi-circle at either end
CAP_SQUARE => 2, #square cap/join; line overhangs by a half square on either end
CAP_OVERRIDE => FALSE, #cap style overrides drawing logic
};
#number of elements in each shape type:
use constant
{
RECT_SHAPELEN => 6, #x0, y0, x1, y1, count, "rect" (start, end corners)
LINE_SHAPELEN => 6, #x0, y0, x1, y1, count, "line" (line seg)
CURVE_SHAPELEN => 10, #xstart, ystart, x0, y0, x1, y1, xend, yend, count, "curve" (bezier 2 points)
CIRCLE_SHAPELEN => 5, #x, y, 5, count, "circle" (center + radius)
};
#const my %SHAPELEN =
#Readonly my %SHAPELEN =>
our %SHAPELEN =
(
rect => RECT_SHAPELEN,
line => LINE_SHAPELEN,
curve => CURVE_SHAPELEN,
circle => CIRCLE_SHAPELEN,
);
#panelization:
#This will repeat the entire body the number of times indicated along the X or Y axes (files grow accordingly).
#Display elements that overhang PCB boundary can be squashed or left as-is (typically text or other silk screen markings).
#Set "overhangs" TRUE to allow overhangs, FALSE to truncate them.
#xpad and ypad allow margins to be added around outer edge of panelized PCB.
use constant PANELIZE => {'x' => 1, 'y' => 1, 'xpad' => 0, 'ypad' => 0, 'overhangs' => TRUE}; #number of times to repeat in X and Y directions
# Set this to 1 if you need TurboCAD support.
#$turboCAD = FALSE; #is this still needed as an option?
#CIRCAD pad generation uses an appropriate aperture, then moves it (stroke) "a little" - we use this to find pads and distinguish them from PCB holes.
use constant PAD_STROKE => 0.3; #0.0005 * 600; #units are pixels
#convert very short traces to pads or holes:
use constant TRACE_MINLEN => .001; #units are inches
#use constant ALWAYS_XY => TRUE; #FALSE; #force XY even if X or Y doesn't change; NOTE: needs to be TRUE for all pads to show in FlatCAM and ViewPlot
use constant REMOVE_POLARITY => FALSE; #TRUE; #set to remove subtractive (negative) polarity; NOTE: must be FALSE for ground planes
#PDF uses "points", each point = 1/72 inch
#combined with a PDF scale factor of .12, this gives 600 dpi resolution (72 / .12 = 600 dpi)
use constant INCHES_PER_POINT => 1/72; #0.0138888889; #multiply point-size by this to get inches
# The precision used when computing a bezier curve. Higher numbers are more precise but slower (and generate larger files).
#$bezierPrecision = 100;
use constant BEZIER_PRECISION => 36; #100; #use const; reduced for faster rendering (mainly used for silk screen and thermal pads)
# Ground planes and silk screen or larger copper rectangles or circles are filled line-by-line using this resolution.
use constant FILL_WIDTH => .01; #fill at most 0.01 inch at a time
# The max number of characters to read into memory
use constant MAX_BYTES => 10 * M; #bumped up to 10 MB, use const
use constant DUP_DRILL1 => TRUE; #FALSE; #kludge: ViewPlot doesn't load drill files that are too small so duplicate first tool
my $runtime = time(); #Time::HiRes::gettimeofday(); #measure my execution time
print STDERR "Loaded config settings from '${\(__FILE__)}'.\n";
1; #last value must be truthful to indicate successful load
#############################################################################################
#junk/experiment:
#use Package::Constants;
#use Exporter qw(import); #https://perldoc.perl.org/Exporter.html
#my $caller = "pdf2gerb::";
#sub cfg
#{
# my $proto = shift;
# my $class = ref($proto) || $proto;
# my $settings =
# {
# $WANT_DEBUG => 990, #10; #level of debug wanted; higher == more, lower == less, 0 == none
# };
# bless($settings, $class);
# return $settings;
#}
#use constant HELLO => "hi there2"; #"main::HELLO" => "hi there";
#use constant GOODBYE => 14; #"main::GOODBYE" => 12;
#print STDERR "read cfg file\n";
#our @EXPORT_OK = Package::Constants->list(__PACKAGE__); #https://www.perlmonks.org/?node_id=1072691; NOTE: "_OK" skips short/common names
#print STDERR scalar(@EXPORT_OK) . " consts exported:\n";
#foreach(@EXPORT_OK) { print STDERR "$_\n"; }
#my $val = main::thing("xyz");
#print STDERR "caller gave me $val\n";
#foreach my $arg (@ARGV) { print STDERR "arg $arg\n"; }
Author: swannman
Source Code: https://github.com/swannman/pdf2gerb
License: GPL-3.0 license
In this article, we will know what is face recognition and how is different from face detection. We will go briefly over the theory of face recognition and then jump on to the coding section. At the end of this article, you will be able to make a face recognition program for recognizing faces in images as well as on a live webcam feed.
In computer vision, one essential problem we are trying to figure out is how to automatically detect objects in an image without human intervention. Face detection can be thought of as such a problem, where we detect human faces in an image. There may be slight differences between human faces, but overall it is safe to say that certain features are associated with all human faces. There are various face detection algorithms; the Viola-Jones Algorithm is one of the oldest methods still in use today, and it is the one we will use later in the article. You can read up on the Viola-Jones Algorithm after completing this article, as I'll link it at the end.
Face detection is usually the first step towards many face-related technologies, such as face recognition or verification, but it also has very useful applications of its own. Probably the most familiar is photography: when you take a photo of your friends, the face detection algorithm built into your digital camera detects where the faces are and adjusts the focus accordingly.
Now that we are successful in making algorithms that can detect faces, can we also recognise whose faces they are?
Face recognition is a method of identifying or verifying the identity of an individual using their face. There are various algorithms that can do face recognition but their accuracy might vary. Here I am going to describe how we do face recognition using deep learning.
So now let us understand how we recognise faces using deep learning. We make use of face embedding in which each face is converted into a vector and this technique is called deep metric learning. Let me further divide this process into three simple steps for easy understanding:
Face Detection: The very first task we perform is detecting faces in the image or video stream. Now that we know the exact location/coordinates of face, we extract this face for further processing ahead.
Feature Extraction: Now that we have cropped the face out of the image, we extract features from it. Here we are going to use face embeddings to extract the features of the face. A neural network takes an image of the person's face as input and outputs a vector which represents the most important features of that face. In machine learning, this vector is called an embedding, and thus we call this vector a face embedding. Now how does this help in recognizing the faces of different people?
While training the neural network, the network learns to output similar vectors for faces that look similar. For example, if I have multiple images of my face taken at different times, some of my features might change, but not by much. So in this case the vectors associated with the faces are similar, or in short, they are very close in the vector space. Take a look at the diagram below for a rough idea:
After training, the network learns to output vectors that are closer to each other (similar) for faces of the same person. The vectors above now transform into:
We are not going to train such a network here as it takes a significant amount of data and computation power to train such networks. We will use a pre-trained network trained by Davis King on a dataset of ~3 million images. The network outputs a vector of 128 numbers which represent the most important features of a face.
Now that we know how this network works, let us see how we use this network on our own data. We pass all the images in our data to this pre-trained network to get the respective embeddings and save these embeddings in a file for the next step.
Comparing faces: Now that we have face embeddings for every face in our data saved in a file, the next step is to recognise a new image that is not in our data. The first step is to compute the face embedding for the new image using the same network we used above, and then compare this embedding with the rest of the embeddings we have. We recognise the face if the generated embedding is close or similar to any other embedding, as shown below:
So we passed two images, one of Vladimir Putin and the other of George W. Bush. In our example above, we did not save the embeddings for Putin, but we did save the embeddings of Bush. Thus when we compared the two new embeddings with the existing ones, the vector for Bush is close to Bush's other face embeddings, whereas the face embeddings of Putin are not close to any other embedding, and thus the program cannot recognise him.
In the field of Artificial Intelligence, Computer Vision is one of the most interesting and challenging tasks. Computer Vision acts like a bridge between computer software and the visual world around us, allowing software to understand and learn from its visual surroundings. For example: determining a fruit based on its color, shape and size. This task can be very easy for the human brain, but in the Computer Vision pipeline we first gather the data, then perform data-processing activities, and then train the model to distinguish between fruits based on their size, shape and color.
Currently, various packages are available for machine learning, deep learning and computer vision tasks, and OpenCV is by far the most widely used library for computer vision. It is an open-source library, supported by several programming languages such as R and Python, and it runs on most platforms, including Windows, Linux and macOS.
To know more about how face recognition works on opencv, check out the free course on face recognition in opencv.
Advantages of OpenCV:
Installation:
Here we will be focusing on installing OpenCV for python only. We can install OpenCV using pip or conda(for anaconda environment).
Using pip, the installation process of openCV can be done by using the following command in the command prompt.
pip install opencv-python
If you are using anaconda environment, either you can execute the above code in anaconda prompt or you can execute the following code in anaconda prompt.
conda install -c conda-forge opencv
In this section, we shall implement face recognition using OpenCV and Python. First, let us see the libraries we will need and how to install them:
OpenCV is an image and video processing library and is used for image and video analysis, like facial detection, license plate reading, photo editing, advanced robotic vision, optical character recognition, and a whole lot more.
The dlib library, maintained by Davis King, contains our implementation of “deep metric learning” which is used to construct our face embeddings used for the actual recognition process.
The face_recognition library, created by Adam Geitgey, wraps around dlib's facial recognition functionality. It is super easy to work with, and we will be using it in our code. Remember to install the dlib library before you install face_recognition.
To install OpenCV, type in command prompt
pip install opencv-python
I have tried various ways to install dlib on Windows but the easiest of all of them is via Anaconda. First, install Anaconda (here is a guide to install it) and then use this command in your command prompt:
conda install -c conda-forge dlib
Next to install face_recognition, type in command prompt
pip install face_recognition
Now that we have all the dependencies installed, let us start coding. We will create three files: the first will take our dataset and extract a face embedding for each face using dlib, saving these embeddings to a file.
The second will compare new faces with the saved embeddings to recognise faces in images, and the third will do the same on a live webcam feed.
First, you need to get a dataset, or even create one of your own. Just make sure to arrange all the images in folders, with each folder containing images of just one person.
Next, save the dataset in a folder in the same directory where you are going to create the script.
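A minimal sketch of this first script follows. It does what the article describes — detect each face, compute its 128-d dlib embedding via face_recognition, and pickle everything to a file named "face_enc" — but the folder name 'dataset' and the choice of the HOG detector are assumptions, not the article's exact code:

import os
import pickle
import cv2
import face_recognition

dataset_dir = 'dataset'  # one sub-folder per person, as described above
known_encodings = []
known_names = []

for person in os.listdir(dataset_dir):
    person_dir = os.path.join(dataset_dir, person)
    for file_name in os.listdir(person_dir):
        image = cv2.imread(os.path.join(person_dir, file_name))
        if image is None:  # skip non-image files
            continue
        # face_recognition expects RGB; OpenCV loads BGR
        rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        # detect faces, then compute a 128-d embedding for each
        boxes = face_recognition.face_locations(rgb, model='hog')
        for encoding in face_recognition.face_encodings(rgb, boxes):
            known_encodings.append(encoding)
            known_names.append(person)

# save the embeddings so the recognition scripts can reuse them
with open('face_enc', 'wb') as f:
    f.write(pickle.dumps({'encodings': known_encodings, 'names': known_names}))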
Now that we have stored the embedding in a file named “face_enc”, we can use them to recognise faces in images or live video stream.
Next, let us recognise faces on a live webcam feed.
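A minimal sketch of the webcam script, using a haar cascade for detection as the article mentions; the box-format conversion, voting logic, and drawing details are assumptions:

import pickle
import cv2
import face_recognition

# load the known embeddings saved earlier in 'face_enc'
data = pickle.loads(open('face_enc', 'rb').read())

# haar cascade bundled with OpenCV for fast face detection
cascade_path = cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
face_cascade = cv2.CascadeClassifier(cascade_path)

video = cv2.VideoCapture(0)
while True:
    ret, frame = video.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    # convert (x, y, w, h) boxes to face_recognition's (top, right, bottom, left)
    boxes = [(y, x + w, y + h, x) for (x, y, w, h) in faces]
    encodings = face_recognition.face_encodings(rgb, boxes)
    for (top, right, bottom, left), encoding in zip(boxes, encodings):
        matches = face_recognition.compare_faces(data['encodings'], encoding)
        name = 'Unknown'
        if True in matches:
            # vote: the known name with the most matching embeddings wins
            counts = {}
            for i, matched in enumerate(matches):
                if matched:
                    counts[data['names'][i]] = counts.get(data['names'][i], 0) + 1
            name = max(counts, key=counts.get)
        cv2.rectangle(frame, (left, top), (right, bottom), (0, 255, 0), 2)
        cv2.putText(frame, name, (left, top - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0, 255, 0), 2)
    cv2.imshow('Frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
video.release()
cv2.destroyAllWindows()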
https://www.youtube.com/watch?v=fLnGdkZxRkg
Although in the example above we used a haar cascade to detect faces, you can also use face_recognition.face_locations to detect a face, as we did in the previous script.
The script for detecting and recognising faces in images is very similar to what you saw above. Try it yourself, and if you get stuck, take a look at the code below:
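A sketch along the same lines for a still image; here face_recognition.face_locations does the detection, and the file name 'test.jpg' is a placeholder:

import pickle
import cv2
import face_recognition

data = pickle.loads(open('face_enc', 'rb').read())

image = cv2.imread('test.jpg')  # placeholder test image
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# let face_recognition locate the faces this time
boxes = face_recognition.face_locations(rgb, model='hog')
encodings = face_recognition.face_encodings(rgb, boxes)

for (top, right, bottom, left), encoding in zip(boxes, encodings):
    matches = face_recognition.compare_faces(data['encodings'], encoding)
    name = 'Unknown'
    if True in matches:
        # vote: the known name with the most matching embeddings wins
        counts = {}
        for i, matched in enumerate(matches):
            if matched:
                counts[data['names'][i]] = counts.get(data['names'][i], 0) + 1
        name = max(counts, key=counts.get)
    cv2.rectangle(image, (left, top), (right, bottom), (0, 255, 0), 2)
    cv2.putText(image, name, (left, top - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0, 255, 0), 2)

cv2.imshow('Image', image)
cv2.waitKey(0)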
This brings us to the end of this article where we learned about face recognition.
You can also upskill with Great Learning’s PGP Artificial Intelligence and Machine Learning Course. The course offers mentorship from industry leaders, and you will also have the opportunity to work on real-time industry-relevant projects.
Original article source at: https://www.mygreatlearning.com
The internet is an absolutely massive source of data. Unfortunately, the vast majority of it isn't available in conveniently organized CSV files for download and analysis. If you want to capture data from many websites, you'll need to try web scraping.
Don’t worry if you’re still a total beginner — in this tutorial we’re going to cover how to do web scraping with Python from scratch, starting with some answers to frequently-asked questions about web scraping.
If you’re already familiar with the concept, feel free to scroll past these and jump right into the tutorial!
Some websites offer data sets that are downloadable in CSV format, or accessible via an Application Programming Interface (API). But many websites with useful data don’t offer these convenient options.
Consider, for example, the National Weather Service’s website. It contains up-to-date weather forecasts for every location in the US, but that weather data isn’t accessible as a CSV or via API. It has to be viewed on the NWS site.
If we wanted to analyze this data, or download it for use in some other app, we wouldn’t want to painstakingly copy-paste everything. Web scraping is a technique that lets us use programming to do the heavy lifting. We’ll write some code that looks at the NWS site, grabs just the data we want to work with, and outputs it in the format we need.
In this tutorial, we’ll show you how to perform web scraping using Python 3 and the Beautiful Soup library. We’ll be scraping weather forecasts from the National Weather Service, and then analyzing them using the Pandas library.
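As a preview of what that looks like, here is a minimal sketch using Requests and Beautiful Soup; the NWS URL and the element ids/classes reflect the page structure at the time of writing and may change:

import requests
from bs4 import BeautifulSoup

# download the San Francisco extended forecast page
page = requests.get('https://forecast.weather.gov/MapClick.php?lat=37.7772&lon=-122.4168')
soup = BeautifulSoup(page.content, 'html.parser')

# the forecast lives in a container with id="seven-day-forecast"
seven_day = soup.find(id='seven-day-forecast')
forecast_items = seven_day.find_all(class_='tombstone-container')

for item in forecast_items:
    period = item.find(class_='period-name').get_text()
    short_desc = item.find(class_='short-desc').get_text()
    print(period, '-', short_desc)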
#data science tutorials #beautiful soup #beautifulsoup #intermediate #learn python #pandas #python #tutorial #tutorials #web scraping
When scraping a website with Python, it's common to use the urllib or the Requests libraries to send GET requests to the server in order to receive its information.
However, you’ll eventually need to send some information to the website yourself before receiving the data you want, maybe because it’s necessary to perform a log-in or to interact somehow with the page.
To execute such interactions, Selenium is a frequently used tool. However, it also comes with some downsides, as it's a bit slow and can also be quite unstable sometimes. The alternative is to send a POST request containing the information the website needs using the Requests library.
In fact, when compared to Requests, Selenium becomes a very slow approach, since it does the entire work of actually opening your browser to navigate through the websites you'll collect data from. Of course, depending on the problem, you'll eventually need to use it, but for some other situations a POST request may be your best option, which makes it an important tool for your web scraping toolbox.
In this article, we'll see a brief introduction to the POST method and how it can be implemented to improve your web scraping routines.
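To make this concrete, here is a minimal sketch of a POST-based log-in with Requests; the URL and form field names are placeholders for whatever the target site actually uses:

import requests

login_url = 'https://example.com/login'  # placeholder URL
payload = {'username': 'my_user', 'password': 'my_password'}  # placeholder form fields

# a Session keeps cookies, so the log-in carries over to later requests
with requests.Session() as session:
    response = session.post(login_url, data=payload)
    response.raise_for_status()
    # now fetch a page that requires authentication
    protected = session.get('https://example.com/protected-page')
    print(protected.text)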
#python #web-scraping #requests #web-scraping-with-python #data-science #data-collection #python-tutorials #data-scraping