
Let Us Get Started with Continuous Delivery in 5 Minutes using GitLab

Continuous Integration and Continuous Deployment are the foundations of modern DevOps, and the continuous delivery model is a key aspect of any software development effort. Note the terminology: Continuous Delivery is the practice of automating the delivery of changes to an environment such as dev or staging; after code review, the changes may then be promoted to production. It is called Continuous Deployment when changes are released to the production environment without manual intervention. So it is usually better for an organization to start with a Continuous Delivery approach. Once some form of the CI/CD model has been mastered and the culture and processes are in place, the pipeline can be moved to a Continuous Deployment approach.

In one of my previous posts, we had a small walkthrough of easily setting up a continuous integration model within 5 minutes. With a little understanding of the key concepts behind it, you can set up continuous integration for nearly any application’s codebase. If you are using GitLab for CI/CD, then all you need to understand are these key concepts.

  • GitLab uses runners (agents) to run your jobs.
  • These runners can be shared runners, group runners, or specific runners.
  • You can use GitLab's own runners, which come with a quota of free pipeline minutes, or set up your own runner, in which case you need to install, configure, and manage it based on your needs.
  • Runners come with different types of executors. You can choose an executor based on your use case, such as shell, docker, docker-machine, or kubernetes. The executor defines the medium in which the runner runs your jobs. If you choose a shell executor, all dependencies for your project are installed and managed on the same server where the runner is installed. If a docker executor is chosen (you need to define a base image when setting up the runner), then all jobs run inside containers created from that base image. Refer to the GitLab documentation on executors.

In short, runners run the jobs for you based on the configuration you define, and they report the status and progress of their jobs back to the GitLab application.
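To make these concepts concrete, here is a minimal, illustrative .gitlab-ci.yml. The stage names, job names, image, and scripts are placeholders of my own and not part of the original walkthrough; a runner picks up each job and executes its script section using whichever executor it is configured with.

stages:
  - test
  - deploy

run_tests:
  stage: test
  image: python:3.9        # only used when the runner's executor is docker-based
  script:
    - pip install -r requirements.txt
    - pytest

deploy_staging:
  stage: deploy
  script:
    - ./deploy.sh staging   # hypothetical deployment script
  only:
    - main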

While in continuous delivery, this runner would be responsible for some

#deployment #git #automation #devops


Everything You Need to Know About Instagram Bot with Python

How to build an Instagram bot using Python

Instagram is the fastest-growing social network, with 1 billion monthly users. It also has the highest engagement rate. To gain followers on Instagram, you'd have to upload engaging content, follow users, like posts, comment on user posts, and a whole lot more. This can be time-consuming and daunting. But there is hope: you can automate all of these tasks. In this course, we're going to build an Instagram bot using Python to automate tasks on Instagram.

What you’ll learn:

  • Instagram Automation
  • Build a Bot with Python

Increase your Instagram followers with a simple Python bot

I got around 500 real followers in 4 days!

Growing an audience is an expensive and painful task. And if you’d like to build an audience that’s relevant to you, and shares common interests, that’s even more difficult. I always saw Instagram as a great way to promote my photos, but I never had more than 380 followers… Every once in a while, I decide to start posting my photos on Instagram again, and I manage to keep posting regularly for a while, but it never lasts more than a couple of months, and I don’t have many followers to keep me motivated and engaged.

The objective of this project is to build a bigger audience and as a plus, maybe drive some traffic to my website where I sell my photos!

A year ago, on my last Instagram run, I got one of those apps that lets you track who unfollowed you. I was curious because on a few occasions my number of followers dropped for no apparent reason. After some research, I realized how some users basically crawl for followers. They comment, like, and follow people in search of a follow back, only to unfollow them again a few days later.

I can’t say it was a surprise to me that there were bots on Instagram… It just made me want to build one myself!

And that is why we’re here, so let’s get to it! I came up with a simple bot in Python while I was messing around with Selenium and trying to figure out a project to use it on. Simply put, Selenium is like a browser you can interact with very easily in Python.

Ideally, increasing my Instagram audience will keep me motivated to post regularly. As an extra, I included my website in my profile bio, where people can buy some photos. I think it is a bit of a stretch, but who knows?! My sales are basically zero so far, so it should be easy to track that conversion!

Just what the world needed! Another Instagram bot…

After giving this project some thought, my objective was to increase my audience with relevant people. I want to get followers that actually want to follow me and see more of my work. It’s very easy to come across weird content in the most used hashtags, so I’ve planned this bot to look up specific hashtags and interact with the photos there. This way, I can be very specific about what kind of interests I want my audience to have. For instance, I really like long exposures, so I can target people who use that hashtag and build an audience around this kind of content. Simple and efficient!

My gallery is a mix of different subjects and styles, from street photography to aerial photography, and some travel photos too. Since it’s my hometown, I also have lots of Lisbon images there. These will be the main topics I’ll use in the hashtags I want to target.

This is not a “get 1000 followers in 24 hours” kind of bot!

So what kind of numbers are we talking about?

I ran the bot a few times in a few different hashtags like “travelblogger”, “travelgram”, “lisbon”, “dronephotography”. In the course of three days I went from 380 to 800 followers. Lots of likes, comments and even some organic growth (people that followed me but were not followed by the bot).

To be clear, I’m not using this bot intensively, as Instagram will stop responding if you run it too fast. It needs to have some sleep commands in between the actions, because after some comments and follows in a short period of time, Instagram stops responding and the bot crashes.

You will be logged into your account, so I’m almost sure that Instagram can know you’re doing something weird if you speed up the process. And most importantly, after doing this for a dozen hashtags, it just gets harder to find new users in the same hashtags. You will need to give it a few days to refresh the user base there.

But I don’t want to follow so many people in the process…

The most efficient way to get followers on Instagram (apart from posting great photos!) is to follow people. And this bot worked really well for me because I don’t care if I follow 2000 people to get 400 followers.

The bot saves a list with all the users that were followed while it was running, so someday I may actually do something with this list. For instance, I can visit each user profile, evaluate how many followers or posts they have, and decide if I want to keep following them. Or I can get the first picture in their gallery and check its date to see if they are active users.
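As a small sketch of that idea, here is how you could load one of the saved runs back into Python and iterate over the followed users. It assumes the timestamped CSV format the bot shown later produces, where the usernames end up in a column named '0'; the filename is just an example run.

import pandas as pd

# Load one of the bot's timestamped follow logs (filename is just an example run)
followed = pd.read_csv('20181203-224633_users_followed_list.csv', delimiter=',')
for username in followed['0']:
    # e.g. open each profile later and decide whether to keep following them
    print('https://www.instagram.com/' + username + '/')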

If we remove the follow action from the bot, I can assure you the growth rate will suffer, as people are less inclined to follow based on a single like or comment.

Why will you share your code?!

That’s the debate I had with myself. Even though I truly believe in giving back to the community (I still learn a lot from it too!), there are several paid platforms that do more or less the same as this project. Some are shady, some are used by celebrities. The possibility of starting a similar platform myself is not off the table yet, so why make the code available?

With that in mind, I decided to add an extra level of difficulty to the process, so I was going to post the code below as an image. I wrote “was” because, in the meantime, I realized the image I was getting is low quality, which in turn made me reconsider and post the gist. I’m that nice! The idea behind the image was that if you really wanted to use it, you would have to type the code yourself. And that was my way of limiting the use of this tool to people that actually go through the whole process to create it and maybe even improve it.

I learn a lot more when I type the code myself, instead of copy/pasting scripts. I hope you feel the same way!

The script isn’t as sophisticated as it could be, and I know there’s lots of room to improve it. But hey… it works! I have other projects I want to add to my portfolio, so my time to develop it further is rather limited. Nevertheless, I will try to update this article if I dig deeper.

This is the last subtitle!

You’ll need Python (I’m using Python 3.7), Selenium, a browser (in my case I’ll be using Chrome) and… obviously, an Instagram account! A quick overview of what the bot will do:

  • Open a browser and login with your credentials
  • For every hashtag in the hashtag list, it will open the page and click the first picture to open it
  • It will then like, follow, comment, and move to the next picture, in a 200-iteration loop (the number can be adjusted)
  • Save a list of all the users you followed using the bot

If you reached this paragraph, thank you! You totally deserve to collect your reward! If you find this useful for your profile/brand in any way, do share your experience below :)

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from time import sleep, strftime
from random import randint
import pandas as pd

chromedriver_path = 'C:/Users/User/Downloads/chromedriver_win32/chromedriver.exe' # Change this to your own chromedriver path!
webdriver = webdriver.Chrome(executable_path=chromedriver_path)
sleep(2)
webdriver.get('https://www.instagram.com/accounts/login/?source=auth_switcher')
sleep(3)

username = webdriver.find_element_by_name('username')
username.send_keys('your_username')
password = webdriver.find_element_by_name('password')
password.send_keys('your_password')

button_login = webdriver.find_element_by_css_selector('#react-root > section > main > div > article > div > div:nth-child(1) > div > form > div:nth-child(3) > button')
button_login.click()
sleep(3)

notnow = webdriver.find_element_by_css_selector('body > div:nth-child(13) > div > div > div > div.mt3GC > button.aOOlW.HoLwm')
notnow.click() #comment these last 2 lines out, if you don't get a pop up asking about notifications

In order to use Chrome with Selenium, you need to install ChromeDriver. It’s a fairly simple process and I had no issues with it. Simply install it and replace the path above. Once you do that, our webdriver variable will be our Chrome tab.

In cell number 3 you should replace the strings with your own username and the respective password, so the bot can type them into the fields displayed. You might have already noticed that when running cell number 2, Chrome opened a new tab. After the password, I define the login button as an object, and in the following line, I click it.

Once you’re in inspect mode, find the bit of HTML code that corresponds to what you want to map. Right-click it and hover over Copy. You will see that you have several options for how you want it to be copied. I used a mix of XPath and CSS selectors throughout the code (visible in the find_element_ methods). It took me a while to get all the references to run smoothly. At points, the CSS or XPath references would fail, but as I adjusted the sleep times, everything started running smoothly.

In this case, I selected “Copy selector” and pasted it inside a find_element_ method (cell number 3). It returns the first result it finds. If it were find_elements_, all matching elements would be retrieved and you could specify which one to use.

Once you get that done, time for the loop. You can add more hashtags in the hashtag_list. If you run it for the first time, you still don’t have a file with the users you followed, so you can simply create prev_user_list as an empty list.

Once you run it once, it will save a csv file with a timestamp with the users it followed. That file will serve as the prev_user_list on your second run. Simple and easy to keep track of what the bot does.

Update with the latest timestamp on the following runs and you get yourself a series of csv backlogs for every run of the bot.


The code is really simple. If you have some basic notions of Python you can probably pick it up quickly. I’m no Python ninja and I was able to build it, so I guess that if you read this far, you are good to go!

hashtag_list = ['travelblog', 'travelblogger', 'traveler']

# prev_user_list = [] - if it's the first time you run it, use this line and comment the two below
prev_user_list = pd.read_csv('20181203-224633_users_followed_list.csv', delimiter=',').iloc[:,1:2] # useful to build a user log
prev_user_list = list(prev_user_list['0'])

new_followed = []
tag = -1
followed = 0
likes = 0
comments = 0

for hashtag in hashtag_list:
    tag += 1
    webdriver.get('https://www.instagram.com/explore/tags/'+ hashtag_list[tag] + '/')
    sleep(5)
    first_thumbnail = webdriver.find_element_by_xpath('//*[@id="react-root"]/section/main/article/div[1]/div/div/div[1]/div[1]/a/div')
    
    first_thumbnail.click()
    sleep(randint(1,2))    
    try:        
        for x in range(1,200):
            username = webdriver.find_element_by_xpath('/html/body/div[3]/div/div[2]/div/article/header/div[2]/div[1]/div[1]/h2/a').text
            
            if username not in prev_user_list:
                # If we already follow, do not unfollow
                if webdriver.find_element_by_xpath('/html/body/div[3]/div/div[2]/div/article/header/div[2]/div[1]/div[2]/button').text == 'Follow':
                    
                    webdriver.find_element_by_xpath('/html/body/div[3]/div/div[2]/div/article/header/div[2]/div[1]/div[2]/button').click()
                    
                    new_followed.append(username)
                    followed += 1

                    # Liking the picture
                    button_like = webdriver.find_element_by_xpath('/html/body/div[3]/div/div[2]/div/article/div[2]/section[1]/span[1]/button/span')
                    
                    button_like.click()
                    likes += 1
                    sleep(randint(18,25))

                    # Comments and tracker
                    comm_prob = randint(1,10)
                    print('{}_{}: {}'.format(hashtag, x,comm_prob))
                    if comm_prob > 7:
                        comments += 1
                        webdriver.find_element_by_xpath('/html/body/div[3]/div/div[2]/div/article/div[2]/section[1]/span[2]/button/span').click()
                        comment_box = webdriver.find_element_by_xpath('/html/body/div[3]/div/div[2]/div/article/div[2]/section[3]/div/form/textarea')

                        # comm_prob is always greater than 7 here, so one of three comments is posted
                        if comm_prob == 8:
                            comment_box.send_keys('Nice work :)')
                            sleep(1)
                        elif comm_prob == 9:
                            comment_box.send_keys('Nice gallery!!')
                            sleep(1)
                        elif comm_prob == 10:
                            comment_box.send_keys('So cool! :)')
                            sleep(1)
                        # Enter to post comment
                        comment_box.send_keys(Keys.ENTER)
                        sleep(randint(22,28))

                # Next picture
                webdriver.find_element_by_link_text('Next').click()
                sleep(randint(25,29))
            else:
                webdriver.find_element_by_link_text('Next').click()
                sleep(randint(20,26))
    # if a hashtag stops refreshing photos (it can happen sometimes), continue to the next one
    except:
        continue

for n in range(0,len(new_followed)):
    prev_user_list.append(new_followed[n])
    
updated_user_df = pd.DataFrame(prev_user_list)
updated_user_df.to_csv('{}_users_followed_list.csv'.format(strftime("%Y%m%d-%H%M%S")))
print('Liked {} photos.'.format(likes))
print('Commented {} photos.'.format(comments))
print('Followed {} new people.'.format(followed))


The print statement inside the loop is how I keep a tracker that tells me which iteration the bot is on at any given time. It prints the hashtag it’s working on, the number of the iteration, and the random number generated for the comment action. I decided not to post comments on every post, so I added three different comments and a random number between 1 and 10 that decides whether to comment at all and, if so, which of the three to use. When the loop ends, we append the new_followed users to the previous users “database” and save the new file with the timestamp. You should also get a small report.


And that’s it!

After a few hours without checking the phone, these were the numbers I was getting. I definitely did not expect it to do so well! In the roughly 4 days since I started testing it, I got around 500 new followers, which means I doubled my audience in a matter of days. I’m curious to see how many of these new followers I will lose in the coming days, and whether the growth is sustainable. I also got a lot more likes on my latest photos, but I guess that’s even more expected than the follow backs.


It would be nice to get this bot running on a server, but I have other projects I want to explore, and configuring a server is not one of them! Feel free to leave a comment below, and I’ll do my best to answer your questions.

I’m actually curious to see how long I will keep posting regularly! If you feel like this article was helpful for you, consider thanking me by buying one of my photos.




How to Make an Instagram Bot With Python and InstaPy


What do SocialCaptain, Kicksta, Instavast, and many other companies have in common? They all help you reach a greater audience, gain more followers, and get more likes on Instagram while you hardly lift a finger. They do it all through automation, and people pay them a good deal of money for it. But you can do the same thing—for free—using InstaPy!

In this tutorial, you’ll learn how to build a bot with Python and InstaPy, which automates your Instagram activities so that you gain more followers and likes with minimal manual input. Along the way, you’ll learn about browser automation with Selenium and the Page Object Pattern, which together serve as the basis for InstaPy.

In this tutorial, you’ll learn:

  • How Instagram bots work
  • How to automate a browser with Selenium
  • How to use the Page Object Pattern for better readability and testability
  • How to build an Instagram bot with InstaPy

You’ll begin by learning how Instagram bots work before you build one.

Table of Contents

  • How Instagram Bots Work
  • How to Automate a Browser
  • How to Use the Page Object Pattern
  • How to Build an Instagram Bot With InstaPy
    • Essential Features
    • Additional Features in InstaPy
  • Conclusion

Important: Make sure you check Instagram’s Terms of Use before implementing any kind of automation or scraping techniques.

How Instagram Bots Work

How can an automation script gain you more followers and likes? Before answering this question, think about how an actual person gains more followers and likes.

They do it by being consistently active on the platform. They post often, follow other people, and like and leave comments on other people’s posts. Bots work exactly the same way: They follow, like, and comment on a consistent basis according to the criteria you set.

The better the criteria you set, the better your results will be. You want to make sure you’re targeting the right groups because the people your bot interacts with on Instagram will be more likely to interact with your content.

For example, if you’re selling women’s clothing on Instagram, then you can instruct your bot to like, comment on, and follow mostly women or profiles whose posts include hashtags such as #beauty, #fashion, or #clothes. This makes it more likely that your target audience will notice your profile, follow you back, and start interacting with your posts.

How does it work on the technical side, though? You can’t use the Instagram Developer API since it is fairly limited for this purpose. Enter browser automation. It works in the following way:

  1. You serve it your credentials.
  2. You set the criteria for who to follow, what comments to leave, and which type of posts to like.
  3. Your bot opens a browser, types in https://instagram.com on the address bar, logs in with your credentials, and starts doing the things you instructed it to do.

Next, you’ll build the initial version of your Instagram bot, which will automatically log in to your profile. Note that you won’t use InstaPy just yet.

How to Automate a Browser

For this version of your Instagram bot, you’ll be using Selenium, which is the tool that InstaPy uses under the hood.

First, install Selenium. During installation, make sure you also install the Firefox WebDriver since the latest version of InstaPy dropped support for Chrome. This also means that you need the Firefox browser installed on your computer.
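For reference, Selenium itself installs with pip, while geckodriver (the Firefox WebDriver executable) is downloaded separately from its releases page and placed somewhere on your PATH. Assuming a Unix-like shell:

$ python3 -m pip install selenium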

Now, create a Python file and write the following code in it:

from time import sleep
from selenium import webdriver

browser = webdriver.Firefox()

browser.get('https://www.instagram.com/')

sleep(5)

browser.close()

Run the code and you’ll see that a Firefox browser opens and directs you to the Instagram login page. Here’s a line-by-line breakdown of the code:

  • Lines 1 and 2 import sleep and webdriver.
  • Line 4 initializes the Firefox driver and sets it to browser.
  • Line 6 types https://www.instagram.com/ on the address bar and hits Enter.
  • Line 8 waits for five seconds so you can see the result. Otherwise, it would close the browser instantly.
  • Line 10 closes the browser.

This is the Selenium version of Hello, World. Now you’re ready to add the code that logs in to your Instagram profile. But first, think about how you would log in to your profile manually. You would do the following:

  1. Go to https://www.instagram.com/.
  2. Click the login link.
  3. Enter your credentials.
  4. Hit the login button.

The first step is already done by the code above. Now change it so that it clicks on the login link on the Instagram home page:

from time import sleep
from selenium import webdriver

browser = webdriver.Firefox()
browser.implicitly_wait(5)

browser.get('https://www.instagram.com/')

login_link = browser.find_element_by_xpath("//a[text()='Log in']")
login_link.click()

sleep(5)

browser.close()

Note the highlighted lines:

  • Line 5 sets five seconds of waiting time. If Selenium can’t find an element, then it waits for five seconds to allow everything to load and tries again.
  • Line 9 finds the element <a> whose text is equal to Log in. It does this using XPath, but there are a few other locator methods you could use, as sketched right after this list.
  • Line 10 clicks on the found element <a> for the login link.
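For illustration, here are a couple of roughly equivalent locators for that same link. The link-text locators match on the visible text just like the XPath above, while the CSS selector relies on an assumed href, so treat it as a sketch rather than a guaranteed selector:

# Alternative ways to locate the "Log in" link
login_link = browser.find_element_by_link_text("Log in")
login_link = browser.find_element_by_partial_link_text("Log in")
login_link = browser.find_element_by_css_selector("a[href*='login']")  # assumes the href contains "login"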

Run the script and you’ll see it in action. It will open the browser, go to Instagram, and click on the login link to go to the login page.

On the login page, there are three important elements:

  1. The username input
  2. The password input
  3. The login button

Next, change the script so that it finds those elements, enters your credentials, and clicks on the login button:

from time import sleep
from selenium import webdriver

browser = webdriver.Firefox()
browser.implicitly_wait(5)

browser.get('https://www.instagram.com/')

login_link = browser.find_element_by_xpath("//a[text()='Log in']")
login_link.click()

sleep(2)

username_input = browser.find_element_by_css_selector("input[name='username']")
password_input = browser.find_element_by_css_selector("input[name='password']")

username_input.send_keys("<your username>")
password_input.send_keys("<your password>")

login_button = browser.find_element_by_xpath("//button[@type='submit']")
login_button.click()

sleep(5)

browser.close()

Here’s a breakdown of the changes:

  1. Line 12 sleeps for two seconds to allow the page to load.
  2. Lines 14 and 15 find username and password inputs by CSS. You could use any other method that you prefer.
  3. Lines 17 and 18 type your username and password in their respective inputs. Don’t forget to fill in <your username> and <your password>!
  4. Line 20 finds the login button by XPath.
  5. Line 21 clicks on the login button.

Run the script and you’ll be automatically logged in to your Instagram profile.

You’re off to a good start with your Instagram bot. If you were to continue writing this script, then the rest would look very similar. You would find the posts that you like by scrolling down your feed, find the like button by CSS, click on it, find the comments section, leave a comment, and continue.

The good news is that all of those steps can be handled by InstaPy. But before you jump into using InstaPy, there is one other thing that you should know about to better understand how InstaPy works: the Page Object Pattern.

How to Use the Page Object Pattern

Now that you’ve written the login code, how would you write a test for it? It would look something like the following:

def test_login_page(browser):
    browser.get('https://www.instagram.com/accounts/login/')
    username_input = browser.find_element_by_css_selector("input[name='username']")
    password_input = browser.find_element_by_css_selector("input[name='password']")
    username_input.send_keys("<your username>")
    password_input.send_keys("<your password>")
    login_button = browser.find_element_by_xpath("//button[@type='submit']")
    login_button.click()

    errors = browser.find_elements_by_css_selector('#error_message')
    assert len(errors) == 0

Can you see what’s wrong with this code? It doesn’t follow the DRY principle. That is, the code is duplicated in both the application and the test code.

Duplicating code is especially bad in this context because Selenium code is dependent on UI elements, and UI elements tend to change. When they do change, you want to update your code in one place. That’s where the Page Object Pattern comes in.

With this pattern, you create page object classes for the most important pages or fragments that provide interfaces that are straightforward to program to and that hide the underlying widgetry in the window. With this in mind, you can rewrite the code above and create a HomePage class and a LoginPage class:

from time import sleep

class LoginPage:
    def __init__(self, browser):
        self.browser = browser

    def login(self, username, password):
        username_input = self.browser.find_element_by_css_selector("input[name='username']")
        password_input = self.browser.find_element_by_css_selector("input[name='password']")
        username_input.send_keys(username)
        password_input.send_keys(password)
        login_button = self.browser.find_element_by_xpath("//button[@type='submit']")
        login_button.click()
        sleep(5)

class HomePage:
    def __init__(self, browser):
        self.browser = browser
        self.browser.get('https://www.instagram.com/')

    def go_to_login_page(self):
        self.browser.find_element_by_xpath("//a[text()='Log in']").click()
        sleep(2)
        return LoginPage(self.browser)

The code is the same except that the home page and the login page are represented as classes. The classes encapsulate the mechanics required to find and manipulate the data in the UI. That is, there are methods and accessors that allow the software to do anything a human can.

One other thing to note is that when you navigate to another page using a page object, it returns a page object for the new page. Note the returned value of go_to_login_page(). If you had another class called FeedPage, then login() of the LoginPage class would return an instance of that: return FeedPage().
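As a minimal sketch of that idea (FeedPage here is hypothetical and not part of the code above), the navigation method simply hands back the page object for wherever the click leads:

from time import sleep

class FeedPage:
    def __init__(self, browser):
        self.browser = browser
        # methods for interacting with the feed would live here

class LoginPage:
    def __init__(self, browser):
        self.browser = browser

    def login(self, username, password):
        # ...find the inputs, type the credentials, and click the button as before...
        sleep(5)
        return FeedPage(self.browser)  # hand the caller the next page object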

Here’s how you can put the Page Object Pattern to use:

from selenium import webdriver

browser = webdriver.Firefox()
browser.implicitly_wait(5)

home_page = HomePage(browser)
login_page = home_page.go_to_login_page()
login_page.login("<your username>", "<your password>")

browser.close()

It looks much better, and the test above can now be rewritten to look like this:

def test_login_page(browser):
    home_page = HomePage(browser)
    login_page = home_page.go_to_login_page()
    login_page.login("<your username>", "<your password>")

    errors = browser.find_elements_by_css_selector('#error_message')
    assert len(errors) == 0

With these changes, you won’t have to touch your tests if something changes in the UI.

For more information on the Page Object Pattern, refer to the official documentation and to Martin Fowler’s article.

Now that you’re familiar with both Selenium and the Page Object Pattern, you’ll feel right at home with InstaPy. You’ll build a basic bot with it next.

Note: Both Selenium and the Page Object Pattern are widely used for other websites, not just for Instagram.

How to Build an Instagram Bot With InstaPy

In this section, you’ll use InstaPy to build an Instagram bot that will automatically like, follow, and comment on different posts. First, you’ll need to install InstaPy:

$ python3 -m pip install instapy

This will install instapy in your system.

Essential Features

Now you can rewrite the code above with InstaPy so that you can compare the two options. First, create another Python file and put the following code in it:

from instapy import InstaPy

InstaPy(username="<your_username>", password="<your_password>").login()

Replace the username and password with yours, run the script, and voilà! With just one line of code, you achieved the same result.

Even though your results are the same, you can see that the behavior isn’t exactly the same. In addition to simply logging in to your profile, InstaPy does some other things, such as checking your internet connection and the status of the Instagram servers. This can be observed directly on the browser or in the logs:

INFO [2019-12-17 22:03:19] [username]  -- Connection Checklist [1/3] (Internet Connection Status)
INFO [2019-12-17 22:03:20] [username]  - Internet Connection Status: ok
INFO [2019-12-17 22:03:20] [username]  - Current IP is "17.283.46.379" and it's from "Germany/DE"
INFO [2019-12-17 22:03:20] [username]  -- Connection Checklist [2/3] (Instagram Server Status)
INFO [2019-12-17 22:03:26] [username]  - Instagram WebSite Status: Currently Up

Pretty good for one line of code, isn’t it? Now it’s time to make the script do more interesting things than just logging in.

For the purpose of this example, assume that your profile is all about cars, and that your bot is intended to interact with the profiles of people who are also interested in cars.

First, you can like some posts that are tagged #bmw or #mercedes using like_by_tags():

from instapy import InstaPy


session = InstaPy(username="<your_username>", password="<your_password>")

session.login()

session.like_by_tags(["bmw", "mercedes"], amount=5)

Here, you gave the method a list of tags to like and the number of posts to like for each given tag. In this case, you instructed it to like ten posts, five for each of the two tags. But take a look at what happens after you run the script:

INFO [2019-12-17 22:15:58] [username]  Tag [1/2]
INFO [2019-12-17 22:15:58] [username]  --> b'bmw'
INFO [2019-12-17 22:16:07] [username]  desired amount: 14  |  top posts [disabled]: 9  |  possible posts: 43726739
INFO [2019-12-17 22:16:13] [username]  Like# [1/14]
INFO [2019-12-17 22:16:13] [username]  https://www.instagram.com/p/B6MCcGcC3tU/
INFO [2019-12-17 22:16:15] [username]  Image from: b'mattyproduction'
INFO [2019-12-17 22:16:15] [username]  Link: b'https://www.instagram.com/p/B6MCcGcC3tU/'
INFO [2019-12-17 22:16:15] [username]  Description: b'Mal etwas anderes \xf0\x9f\x91\x80\xe2\x98\xba\xef\xb8\x8f Bald ist das komplette Video auf YouTube zu finden (n\xc3\xa4here Infos werden folgen). Vielen Dank an @patrick_jwki @thehuthlife  und @christic_  f\xc3\xbcr das bereitstellen der Autos \xf0\x9f\x94\xa5\xf0\x9f\x98\x8d#carporn#cars#tuning#bagged#bmw#m2#m2competition#focusrs#ford#mk3#e92#m3#panasonic#cinematic#gh5s#dji#roninm#adobe#videography#music#bimmer#fordperformance#night#shooting#'
INFO [2019-12-17 22:16:15] [username]  Location: b'K\xc3\xb6ln, Germany'
INFO [2019-12-17 22:16:51] [username]  --> Image Liked!
INFO [2019-12-17 22:16:56] [username]  --> Not commented
INFO [2019-12-17 22:16:57] [username]  --> Not following
INFO [2019-12-17 22:16:58] [username]  Like# [2/14]
INFO [2019-12-17 22:16:58] [username]  https://www.instagram.com/p/B6MDK1wJ-Kb/
INFO [2019-12-17 22:17:01] [username]  Image from: b'davs0'
INFO [2019-12-17 22:17:01] [username]  Link: b'https://www.instagram.com/p/B6MDK1wJ-Kb/'
INFO [2019-12-17 22:17:01] [username]  Description: b'Someone said cloud? \xf0\x9f\xa4\x94\xf0\x9f\xa4\xad\xf0\x9f\x98\x88 \xe2\x80\xa2\n\xe2\x80\xa2\n\xe2\x80\xa2\n\xe2\x80\xa2\n#bmw #bmwrepost #bmwm4 #bmwm4gts #f82 #bmwmrepost #bmwmsport #bmwmperformance #bmwmpower #bmwm4cs #austinyellow #davs0 #mpower_official #bmw_world_ua #bimmerworld #bmwfans #bmwfamily #bimmers #bmwpost #ultimatedrivingmachine #bmwgang #m3f80 #m5f90 #m4f82 #bmwmafia #bmwcrew #bmwlifestyle'
INFO [2019-12-17 22:17:34] [username]  --> Image Liked!
INFO [2019-12-17 22:17:37] [username]  --> Not commented
INFO [2019-12-17 22:17:38] [username]  --> Not following

By default, InstaPy will like the first nine top posts in addition to your amount value. In this case, that brings the total number of likes per tag to fourteen (nine top posts plus the five you specified in amount).

Also note that InstaPy logs every action it takes. As you can see above, it mentions which post it liked as well as its link, description, location, and whether the bot commented on the post or followed the author.

You may have noticed that there are delays after almost every action. That’s by design. It prevents your profile from getting banned on Instagram.

Now, you probably don’t want your bot liking inappropriate posts. To prevent that from happening, you can use set_dont_like():

from instapy import InstaPy

session = InstaPy(username="<your_username>", password="<your_password>")
session.login()
session.like_by_tags(["bmw", "mercedes"], amount=5)
session.set_dont_like(["naked", "nsfw"])

With this change, posts that have the words naked or nsfw in their descriptions won’t be liked. You can flag any other words that you want your bot to avoid.

Next, you can tell the bot to not only like the posts but also to follow some of the authors of those posts. You can do that with set_do_follow():

from instapy import InstaPy

session = InstaPy(username="<your_username>", password="<your_password>")
session.login()
session.like_by_tags(["bmw", "mercedes"], amount=5)
session.set_dont_like(["naked", "nsfw"])
session.set_do_follow(True, percentage=50)

If you run the script now, then the bot will follow fifty percent of the users whose posts it liked. As usual, every action will be logged.

You can also leave some comments on the posts. There are two things that you need to do. First, enable commenting with set_do_comment():

from instapy import InstaPy

session = InstaPy(username="<your_username>", password="<your_password>")
session.login()
session.like_by_tags(["bmw", "mercedes"], amount=5)
session.set_dont_like(["naked", "nsfw"])
session.set_do_follow(True, percentage=50)
session.set_do_comment(True, percentage=50)

Next, tell the bot what comments to leave with set_comments():

from instapy import InstaPy

session = InstaPy(username="<your_username>", password="<your_password>")
session.login()
session.like_by_tags(["bmw", "mercedes"], amount=5)
session.set_dont_like(["naked", "nsfw"])
session.set_do_follow(True, percentage=50)
session.set_do_comment(True, percentage=50)
session.set_comments(["Nice!", "Sweet!", "Beautiful :heart_eyes:"])

Run the script and the bot will leave one of those three comments on half the posts that it interacts with.

Now that you’re done with the basic settings, it’s a good idea to end the session with end():

from instapy import InstaPy

session = InstaPy(username="<your_username>", password="<your_password>")
session.login()
session.like_by_tags(["bmw", "mercedes"], amount=5)
session.set_dont_like(["naked", "nsfw"])
session.set_do_follow(True, percentage=50)
session.set_do_comment(True, percentage=50)
session.set_comments(["Nice!", "Sweet!", "Beautiful :heart_eyes:"])
session.end()

This will close the browser, save the logs, and prepare a report that you can see in the console output.

Additional Features in InstaPy

InstaPy is a sizable project that has a lot of thoroughly documented features. The good news is that if you’re feeling comfortable with the features you used above, then the rest should feel pretty similar. This section will outline some of the more useful features of InstaPy.

Quota Supervisor

You can’t scrape Instagram all day, every day. The service will quickly notice that you’re running a bot and will ban some of its actions. That’s why it’s a good idea to set quotas on some of your bot’s actions. Take the following for example:

session.set_quota_supervisor(enabled=True, peak_comments_daily=240, peak_comments_hourly=21)

The bot will keep commenting until it reaches its hourly and daily limits. It will resume commenting after the quota period has passed.

Headless Browser

This feature allows you to run your bot without the GUI of the browser. This is super useful if you want to deploy your bot to a server where you may not have or need the graphical interface. It’s also less CPU intensive, so it improves performance. You can use it like so:

session = InstaPy(username='test', password='test', headless_browser=True)

Note that you set this flag when you initialize the InstaPy object.

Using AI to Analyze Posts

Earlier you saw how to ignore posts that contain inappropriate words in their descriptions. What if the description is good but the image itself is inappropriate? You can integrate your InstaPy bot with ClarifAI, which offers image and video recognition services:

session.set_use_clarifai(enabled=True, api_key='<your_api_key>')
session.clarifai_check_img_for(['nsfw'])

Now your bot won’t like or comment on any image that ClarifAI considers NSFW. You get 5,000 free API calls per month.

Relationship Bounds

It’s often a waste of time to interact with posts by people who have a lot of followers. In such cases, it’s a good idea to set some relationship bounds so that your bot doesn’t waste your precious computing resources:

session.set_relationship_bounds(enabled=True, max_followers=8500)

With this, your bot won’t interact with posts by users who have more than 8,500 followers.

For many more features and configurations in InstaPy, check out the documentation.

Conclusion

InstaPy allows you to automate your Instagram activities with minimal fuss and effort. It’s a very flexible tool with a lot of useful features.

In this tutorial, you learned:

  • How Instagram bots work
  • How to automate a browser with Selenium
  • How to use the Page Object Pattern to make your code more maintainable and testable
  • How to use InstaPy to build a basic Instagram bot

Read the InstaPy documentation and experiment with your bot a little bit. Soon you’ll start getting new followers and likes with a minimal amount of effort. I gained a few new followers myself while writing this tutorial.


Automating Instagram API with Python


Gain active followers - Algorithm

Maybe some of you do not agree that using the follow-for-follow method is a good way to grow your IG page, but after a lot of research I found a proper way to use it.

I have used this strategy for a while, and both my page visits and my followers started growing.

The majority of people fail because they randomly target followers and, as a result, those users never come back to their page. So, the key is to find people who share the same interests as you.

If you have a programming page, go and search for IG pages with a big programming community. Once you find one, don’t send follow requests to that page’s followers, because some of them are not active and some may even be fake accounts. Instead, to gain active followers, go to the page’s last post and find the people who liked it.

Unofficial Instagram API

In order to query data from Instagram I am going to use the very cool, yet unofficial, Instagram API written by Pasha Lev.

Note: Before you test it, make sure you have verified your phone number in your IG account.

The program works pretty well so far, but in case of any problems I have to put a disclaimer statement here:

Disclaimer: This post is published for educational purposes only, as well as to give general information about the Instagram API. I am not responsible for any actions, and you are acting at your own risk.

Let’s start by installing the API and then logging in with it.

pip install InstagramApi

from InstagramAPI import InstagramAPI

api = InstagramAPI("username", "password")
api.login()

Once you run the program you will see “Login success!” in your console.

Get users from liked list

We are going to search for a username (your target page), then get the most recent post from this user and the users who liked it. Unfortunately, I can’t find a solution for paginating the likers, so right now it gets roughly the last 500 users.

users_list = []

def get_likes_list(username):
    api.login()
    api.searchUsername(username)
    result = api.LastJson
    username_id = result['user']['pk'] # Get user ID
    user_posts = api.getUserFeed(username_id) # Get user feed
    result = api.LastJson
    media_id = result['items'][0]['id'] # Get most recent post
    api.getMediaLikers(media_id) # Get users who liked
    users = api.LastJson['users']
    for user in users: # Push users to list
        users_list.append({'pk':user['pk'], 'username':user['username']})

Follow Users

Once we get the users list, it is time to follow these users.

IMPORTANT NOTE: set the sleep times as long as you can to avoid automation detection.

from time import sleep

following_users = []

def follow_users(users_list):
    api.login()
    api.getSelfUsersFollowing() # Get users which you are following
    result = api.LastJson
    for user in result['users']:
        following_users.append(user['pk'])
    for user in users_list:
        if not user['pk'] in following_users: # if new user is not in your following users                   
            print('Following @' + user['username'])
            api.follow(user['pk'])
            # after first test set this really long to avoid from suspension
            sleep(20)
        else:
            print('Already following @' + user['username'])
            sleep(10)

Unfollow Users

This function looks at the users you are following and checks whether each of them follows you back. If a user is not following you back, you unfollow them as well.

follower_users = []

def unfollow_users():
    api.login()
    api.getSelfUserFollowers() # Get your followers
    result = api.LastJson
    for user in result['users']:
        follower_users.append({'pk':user['pk'], 'username':user['username']})

    api.getSelfUsersFollowing() # Get users which you are following
    result = api.LastJson
    for user in result['users']:
        following_users.append({'pk':user['pk'],'username':user['username']})
    for user in following_users:
        if not user['pk'] in [u['pk'] for u in follower_users]: # if the user does not follow you back
            print('Unfollowing @' + user['username'])
            api.unfollow(user['pk'])
            # set this really long to avoid from suspension
            sleep(20) 

Full Code with extra functions

Here is the full code of this automation

import pprint
from time import sleep
from InstagramAPI import InstagramAPI
import pandas as pd

users_list = []
following_users = []
follower_users = []

class InstaBot:

    def __init__(self):
        self.api = InstagramAPI("your_username", "your_password")

    def get_likes_list(self,username):
        api = self.api
        api.login()
        api.searchUsername(username) #Gets most recent post from user
        result = api.LastJson
        username_id = result['user']['pk']
        user_posts = api.getUserFeed(username_id)
        result = api.LastJson
        media_id = result['items'][0]['id']

        api.getMediaLikers(media_id)
        users = api.LastJson['users']
        for user in users:
            users_list.append({'pk':user['pk'], 'username':user['username']})
        bot.follow_users(users_list)

    def follow_users(self,users_list):
        api = self.api
        api.login()
        api.getSelfUsersFollowing()
        result = api.LastJson
        for user in result['users']:
            following_users.append(user['pk'])
        for user in users_list:
            if not user['pk'] in following_users:
                print('Following @' + user['username'])
                api.follow(user['pk'])
                # set this really long to avoid from suspension
                sleep(20)
            else:
                print('Already following @' + user['username'])
                sleep(10)

    def unfollow_users(self):
        api = self.api
        api.login()
        api.getSelfUserFollowers()
        result = api.LastJson
        for user in result['users']:
            follower_users.append({'pk':user['pk'], 'username':user['username']})

        api.getSelfUsersFollowing()
        result = api.LastJson
        for user in result['users']:
            following_users.append({'pk':user['pk'],'username':user['username']})

        for user in following_users:
            if not user['pk'] in [user['pk'] for user in follower_users]:
                print('Unfollowing @' + user['username'])
                api.unfollow(user['pk'])
                # set this really long to avoid from suspension
                sleep(20) 

bot =  InstaBot()
# To follow users run the function below
# change the username ('instagram') to your target username
bot.get_likes_list('instagram')

# To unfollow users uncomment and run the function below
# bot.unfollow_users()

When you run it, the console will print lines like “Following @username” or “Already following @username” for each user in the list.

Some extra functions to play with the API:

def get_my_profile_details():
    api.login() 
    api.getSelfUsernameInfo()
    result = api.LastJson
    username = result['user']['username']
    full_name = result['user']['full_name']
    profile_pic_url = result['user']['profile_pic_url']
    followers = result['user']['follower_count']
    following = result['user']['following_count']
    media_count = result['user']['media_count']
    df_profile = pd.DataFrame(
        {'username':username,
        'full name': full_name,
        'profile picture URL':profile_pic_url,
        'followers':followers,
        'following':following,
        'media count': media_count,
        }, index=[0])
    df_profile.to_csv('profile.csv', sep='\t', encoding='utf-8')

def get_my_feed():
    image_urls = []
    api.login()
    api.getSelfUserFeed()
    result = api.LastJson
    # formatted_json_str = pprint.pformat(result)
    # print(formatted_json_str)
    if 'items' in result.keys():
        for item in result['items'][0:5]:
            if 'image_versions2' in item.keys():
                image_url = item['image_versions2']['candidates'][1]['url']
                image_urls.append(image_url)

    df_feed = pd.DataFrame({
                'image URL':image_urls
            })
    df_feed.to_csv('feed.csv', sep='\t', encoding='utf-8')


Building an Instagram Bot with Python and Selenium to Gain More Followers


Let’s build an Instagram bot to gain more followers! — I know, I know. That doesn’t sound very ethical, does it? But it’s all justified for educational purposes.

Coding is a super power — we can all agree. That’s why I’ll leave it up to you to not abuse this power. And I trust you’re here to learn how it works. Otherwise, you’d be on GitHub cloning one of the countless Instagram bots there, right?

You’re convinced? — Alright, now let’s go back to unethical practices.

The Plan

So here’s the deal: we want to build a bot in Python and Selenium that goes to the hashtags we specify, likes random posts, then follows the posters. If it does that enough, we get follow backs. Simple as that.

Here’s a pretty twisted detail though: we want to keep track of the users we follow so the bot can unfollow them after the number of days we specify.

Setup

So first things first, I want to use a database to keep track of the usernames and the dates they were added. You might as well save/load from/to a file, but we want this to be ready for more features in case we feel inspired in the future.

So make sure you create a database (I named mine instabot — but you can name it anything you like) and create a table called followed_users within the database with two fields (username, date_added). A minimal sketch of creating this table from Python is shown below.
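Here is that sketch using mysql-connector. It assumes a local MySQL server with the root user, no password, and the instabot database already created; the column types are my own assumptions, so adjust them to your needs.

import mysql.connector

# Assumes a local MySQL server and that the "instabot" database already exists;
# adjust host/user/passwd to match your own setup.
conn = mysql.connector.connect(host="localhost", user="root", passwd="", database="instabot")
cursor = conn.cursor()
cursor.execute(
    "CREATE TABLE IF NOT EXISTS followed_users ("
    "username VARCHAR(255) NOT NULL, "
    "date_added DATE NOT NULL)"
)
conn.commit()
conn.close()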

Remember the installation path. You’ll need it.

You’ll also need the following python packages:

  • selenium
  • mysql-connector

Getting down to it

Alright, so the first thing we’ll be doing is creating settings.json: simply a .json file that will hold all of our settings so we don’t have to dive into the code every time we want to change something.

Settings

settings.json:

{
  "db": {
    "host": "localhost",
    "user": "root",
    "pass": "",
    "database": "instabot"
  },
  "instagram": {
    "user": "",
    "pass": ""
  },
  "config": {
    "days_to_unfollow": 1,
    "likes_over": 150,
    "check_followers_every": 3600,
    "hashtags": []
  }
}

As you can see, under “db”, we specify the database information. As I mentioned, I used “instabot”, but feel free to use whatever name you want.

You’ll also need to fill in your Instagram info under “instagram” so the bot can log in to your account.

“config” is for our bot’s settings. Here’s what the fields mean:

days_to_unfollow: number of days before unfollowing users

likes_over: ignore posts if the number of likes is above this number

check_followers_every: number of seconds before checking if it’s time to unfollow any of the users

hashtags: a list of strings with the hashtag names the bot should be active on
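For example, a filled-in “config” section might look like this (the values and hashtags are purely illustrative):

"config": {
  "days_to_unfollow": 2,
  "likes_over": 150,
  "check_followers_every": 3600,
  "hashtags": ["programming", "python", "webdevelopment"]
}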

Constants

Now, we want to take these settings and have them inside our code as constants.

Create Constants.py:

import json
INST_USER= INST_PASS= USER= PASS= HOST= DATABASE= POST_COMMENTS= ''
LIKES_LIMIT= DAYS_TO_UNFOLLOW= CHECK_FOLLOWERS_EVERY= 0
HASHTAGS= []

def init():
    global INST_USER, INST_PASS, USER, PASS, HOST, DATABASE, LIKES_LIMIT, DAYS_TO_UNFOLLOW, CHECK_FOLLOWERS_EVERY, HASHTAGS
    # read file
    data = None
    with open('settings.json', 'r') as myfile:
        data = myfile.read()
    obj = json.loads(data)
    INST_USER = obj['instagram']['user']
    INST_PASS = obj['instagram']['pass']
    USER = obj['db']['user']
    HOST = obj['db']['host']
    PASS = obj['db']['pass']
    DATABASE = obj['db']['database']
    LIKES_LIMIT = obj['config']['likes_over']
    CHECK_FOLLOWERS_EVERY = obj['config']['check_followers_every']
    HASHTAGS = obj['config']['hashtags']
    DAYS_TO_UNFOLLOW = obj['config']['days_to_unfollow']

The init() function we created reads the data from settings.json and feeds it into the constants we declared.

Engine

Alright, time for some architecture. Our bot will mainly operate from a python script with an init and update methods. Create BotEngine.py:

import Constants


def init(webdriver):
    return


def update(webdriver):
    return

We’ll be back later to put the logic here, but for now, we need an entry point.

Entry Point

Create our entry point, InstaBot.py:

from selenium import webdriver
import BotEngine

chromedriver_path = 'YOUR CHROMEDRIVER PATH' 
webdriver = webdriver.Chrome(executable_path=chromedriver_path)

BotEngine.init(webdriver)
BotEngine.update(webdriver)

webdriver.close()

chromedriver_path = ‘YOUR CHROMEDRIVER PATH’ webdriver = webdriver.Chrome(executable_path=chromedriver_path)

BotEngine.init(webdriver)
BotEngine.update(webdriver)

webdriver.close()

Of course, you’ll need to swap “YOUR CHROMEDRIVER PATH” with your actual ChromeDriver path.

Time Helper

We need to create a helper script that will help us calculate the elapsed days since a certain date (so we know if we should unfollow a user).

Create TimeHelper.py:

import datetime


def days_since_date(n):
    diff = datetime.datetime.now().date() - n
    return diff.days

Database

Create DBHandler.py. It’ll contain a class that handles connecting to the Database for us.

import mysql.connector
import Constants
class DBHandler:
    def __init__(self):
        DBHandler.HOST = Constants.HOST
        DBHandler.USER = Constants.USER
        DBHandler.DBNAME = Constants.DATABASE
        DBHandler.PASSWORD = Constants.PASS
    HOST = Constants.HOST
    USER = Constants.USER
    DBNAME = Constants.DATABASE
    PASSWORD = Constants.PASS
    @staticmethod
    def get_mydb():
        if DBHandler.DBNAME == '':
            Constants.init()
        db = DBHandler()
        mydb = db.connect()
        return mydb

    def connect(self):
        mydb = mysql.connector.connect(
            host=DBHandler.HOST,
            user=DBHandler.USER,
            passwd=DBHandler.PASSWORD,
            database = DBHandler.DBNAME
        )
        return mydb

As you can see, we’re using the constants we defined.

The class contains a static method get_mydb() that returns a database connection we can use.

Now, let’s define a DB user script that contains the DB operations we need to perform on the user.

Create DBUsers.py:

import datetime, TimeHelper
from DBHandler import *
import Constants

#delete user by username
def delete_user(username):
    mydb = DBHandler.get_mydb()
    cursor = mydb.cursor()
    sql = "DELETE FROM followed_users WHERE username = '{0}'".format(username)
    cursor.execute(sql)
    mydb.commit()


#add new username
def add_user(username):
    mydb = DBHandler.get_mydb()
    cursor = mydb.cursor()
    now = datetime.datetime.now().date()
    cursor.execute("INSERT INTO followed_users(username, date_added) VALUES(%s,%s)",(username, now))
    mydb.commit()


#check if any user qualifies to be unfollowed
def check_unfollow_list():
    mydb = DBHandler.get_mydb()
    cursor = mydb.cursor()
    cursor.execute("SELECT * FROM followed_users")
    results = cursor.fetchall()
    users_to_unfollow = []
    for r in results:
        d = TimeHelper.days_since_date(r[1])
        if d > Constants.DAYS_TO_UNFOLLOW:
            users_to_unfollow.append(r[0])
    return users_to_unfollow


#get all followed users
def get_followed_users():
    users = []
    mydb = DBHandler.get_mydb()
    cursor = mydb.cursor()
    cursor.execute("SELECT * FROM followed_users")
    results = cursor.fetchall()
    for r in results:
        users.append(r[0])

    return users

Account Agent

Alright, we’re about to start our bot. We’re creating a script called AccountAgent.py that will contain the agent behavior.

Import some modules, some of which we’ll need later, and write a login function that will make use of our webdriver.

Notice that we have to keep calling the sleep function between actions. If we send too many requests quickly, the Instagram servers will be alarmed and will deny any requests you send.

from time import sleep
import datetime
import DBUsers, Constants
import traceback
import random

def login(webdriver):
    #Open the instagram login page
    webdriver.get('https://www.instagram.com/accounts/login/?source=auth_switcher')
    #sleep for 3 seconds to prevent issues with the server
    sleep(3)
    #Find username and password fields and set their input using our constants
    username = webdriver.find_element_by_name('username')
    username.send_keys(Constants.INST_USER)
    password = webdriver.find_element_by_name('password')
    password.send_keys(Constants.INST_PASS)
    #Get the login button
    try:
        button_login = webdriver.find_element_by_xpath(
            '//*[@id="react-root"]/section/main/div/article/div/div[1]/div/form/div[4]/button')
    except:
        button_login = webdriver.find_element_by_xpath(
            '//*[@id="react-root"]/section/main/div/article/div/div[1]/div/form/div[6]/button/div')
    #sleep again
    sleep(2)
    #click login
    button_login.click()
    sleep(3)
    #In case you get a popup after logging in, press not now.
    #If not, then just return
    try:
        notnow = webdriver.find_element_by_css_selector(
            'body > div.RnEpo.Yx5HN > div > div > div.mt3GC > button.aOOlW.HoLwm')
        notnow.click()
    except:
        return

Also note how we’re getting elements with their xpath. To do so, right click on the element, click “Inspect”, then right click on the element again inside the inspector, and choose Copy->Copy XPath.

Another important thing to be aware of is that element hierarchies change with the page's layout when you resize or stretch the window. That's why we're checking two different xpaths for the login button.
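If you end up handling more than a couple of layout variations, a small helper keeps that retry logic in one place. This is just a sketch; the helper name and the idea of passing a list of XPaths are not part of the original bot:

from selenium.common.exceptions import NoSuchElementException

def find_first_by_xpath(webdriver, xpaths):
    #try each XPath in order and return the first element that matches
    for xpath in xpaths:
        try:
            return webdriver.find_element_by_xpath(xpath)
        except NoSuchElementException:
            continue
    #none of the XPaths matched the current layout
    raise NoSuchElementException("No matching element for any of the given XPaths")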

Now go back to BotEngine.py, we’re ready to login.

Add more imports that we’ll need later and fill in the init function

import AccountAgent, DBUsers
import Constants
import datetime


def init(webdriver):
    Constants.init()
    AccountAgent.login(webdriver)


def update(webdriver):
    return

If you run our entry script now (InstaBot.py) you’ll see the bot logging in.

Perfect, now let’s add a method that will allow us to follow people to AccountAgent.py:

def follow_people(webdriver):
    #all the followed user
    prev_user_list = DBUsers.get_followed_users()
    #a list to store newly followed users
    new_followed = []
    #counters
    followed = 0
    likes = 0
    #Iterate through all the hashtags from the constants
    for hashtag in Constants.HASHTAGS:
        #Visit the hashtag
        webdriver.get('https://www.instagram.com/explore/tags/' + hashtag+ '/')
        sleep(5)

        #Get the first post thumbnail and click on it
        first_thumbnail = webdriver.find_element_by_xpath(
            '//*[@id="react-root"]/section/main/article/div[1]/div/div/div[1]/div[1]/a/div')

        first_thumbnail.click()
        sleep(random.randint(1,3))

        try:
            #iterate over the first 200 posts in the hashtag
            for x in range(1,200):
                t_start = datetime.datetime.now()
                #Get the poster's username
                username = webdriver.find_element_by_xpath('/html/body/div[3]/div[2]/div/article/header/div[2]/div[1]/div[1]/h2/a').text
                likes_over_limit = False
                try:
                    #get number of likes on the post and compare it to the maximum number of likes to ignore post
                    post_likes = int(webdriver.find_element_by_xpath(
                        '/html/body/div[3]/div[2]/div/article/div[2]/section[2]/div/div/button/span').text)
                    if post_likes > Constants.LIKES_LIMIT:
                        print("likes over {0}".format(Constants.LIKES_LIMIT))
                        likes_over_limit = True


                    print("Detected: {0}".format(username))
                    #If username isn't stored in the database and the likes are in the acceptable range
                    if username not in prev_user_list and not likes_over_limit:
                        #Don't press the button if the text doesn't say follow
                        if webdriver.find_element_by_xpath('/html/body/div[3]/div[2]/div/article/header/div[2]/div[1]/div[2]/button').text == 'Follow':
                            #Use DBUsers to add the new user to the database
                            DBUsers.add_user(username)
                            #Click follow
                            webdriver.find_element_by_xpath('/html/body/div[3]/div[2]/div/article/header/div[2]/div[1]/div[2]/button').click()
                            followed += 1
                            print("Followed: {0}, #{1}".format(username, followed))
                            new_followed.append(username)


                        # Liking the picture
                        button_like = webdriver.find_element_by_xpath(
                            '/html/body/div[3]/div[2]/div/article/div[2]/section[1]/span[1]/button')

                        button_like.click()
                        likes += 1
                        print("Liked {0}'s post, #{1}".format(username, likes))
                        sleep(random.randint(5, 18))


                    # Next picture
                    webdriver.find_element_by_link_text('Next').click()
                    sleep(random.randint(20, 30))
                    
                except:
                    traceback.print_exc()
                    continue
                t_end = datetime.datetime.now()

                #calculate elapsed time
                t_elapsed = t_end - t_start
                print("This post took {0} seconds".format(t_elapsed.total_seconds()))


        except:
            traceback.print_exc()
            continue

        #add new list to old list
        for n in range(0, len(new_followed)):
            prev_user_list.append(new_followed[n])
        print('Liked {} photos.'.format(likes))
        print('Followed {} new people.'.format(followed))

It's pretty long, but here are the general steps of the algorithm:

For every hashtag in the hashtag constant list:

  • Visit the hashtag link
  • Open the first thumbnail
  • Now, execute the following code 200 times (first 200 posts in the hashtag)
  • Get poster’s username, check if not already following, follow, like the post, then click next
  • If already following just click next quickly

Now we might as well implement the unfollow method. The engine will feed us the usernames to unfollow in a list:

def unfollow_people(webdriver, people):
    #if only one user, append in a list
    if not isinstance(people, (list,)):
        p = people
        people = []
        people.append(p)

    for user in people:
        try:
            webdriver.get('https://www.instagram.com/' + user + '/')
            sleep(5)
            unfollow_xpath = '//*[@id="react-root"]/section/main/div/header/section/div[1]/div[1]/span/span[1]/button'

            unfollow_confirm_xpath = '/html/body/div[3]/div/div/div[3]/button[1]'

            if webdriver.find_element_by_xpath(unfollow_xpath).text == "Following":
                sleep(random.randint(4, 15))
                webdriver.find_element_by_xpath(unfollow_xpath).click()
                sleep(2)
                webdriver.find_element_by_xpath(unfollow_confirm_xpath).click()
                sleep(4)
            DBUsers.delete_user(user)

        except Exception:
            traceback.print_exc()
            continue

Now we can finally go back and finish the bot by implementing the rest of BotEngine.py:

import AccountAgent, DBUsers
import Constants
import datetime


def init(webdriver):
    Constants.init()
    AccountAgent.login(webdriver)


def update(webdriver):
    #Get start of time to calculate elapsed time later
    start = datetime.datetime.now()
    #Before the loop, check if should unfollow anyone
    _check_follow_list(webdriver)
    while True:
        #Start following operation
        AccountAgent.follow_people(webdriver)
        #Get the time at the end
        end = datetime.datetime.now()
        #How much time has passed?
        elapsed = end - start
        #If greater than our constant to check on
        #followers, check on followers
        if elapsed.total_seconds() >= Constants.CHECK_FOLLOWERS_EVERY:
            #reset the start variable to now
            start = datetime.datetime.now()
            #check on followers
            _check_follow_list(webdriver)


def _check_follow_list(webdriver):
    print("Checking for users to unfollow")
    #get the unfollow list
    users = DBUsers.check_unfollow_list()
    #if there's anyone in the list, start unfollowing operation
    if len(users) > 0:
        AccountAgent.unfollow_people(webdriver, users)

Conclusion

And that’s it — now you have yourself a fully functional Instagram bot built with Python and Selenium. There are many possibilities for you to explore now, so make sure you’re using this newly gained skill to solve real life problems!

You can get the source code for the whole project from this GitHub repository.


Building a simple Instagram bot with Python tutorial

Here we build a simple bot in Python that beginner to intermediate coders can follow.

Here’s the code on GitHub
https://github.com/aj-4/ig-followers


Build A (Full-Featured) Instagram Bot With Python

Source Code: https://github.com/jg-fisher/instagram-bot 


How to Get Instagram Followers/Likes Using Python

In this video I show you how to program your own Instagram Bot using Python and Selenium.

https://www.youtube.com/watch?v=BGU2X5lrz9M 

Code Link:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
import random
import sys


def print_same_line(text):
    sys.stdout.write('\r')
    sys.stdout.flush()
    sys.stdout.write(text)
    sys.stdout.flush()


class InstagramBot:

    def __init__(self, username, password):
        self.username = username
        self.password = password
        self.driver = webdriver.Chrome()

    def closeBrowser(self):
        self.driver.close()

    def login(self):
        driver = self.driver
        driver.get("https://www.instagram.com/")
        time.sleep(2)
        login_button = driver.find_element_by_xpath("//a[@href='/accounts/login/?source=auth_switcher']")
        login_button.click()
        time.sleep(2)
        user_name_elem = driver.find_element_by_xpath("//input[@name='username']")
        user_name_elem.clear()
        user_name_elem.send_keys(self.username)
        passworword_elem = driver.find_element_by_xpath("//input[@name='password']")
        passworword_elem.clear()
        passworword_elem.send_keys(self.password)
        passworword_elem.send_keys(Keys.RETURN)
        time.sleep(2)


    def like_photo(self, hashtag):
        driver = self.driver
        driver.get("https://www.instagram.com/explore/tags/" + hashtag + "/")
        time.sleep(2)

        # gathering photos
        pic_hrefs = []
        for i in range(1, 7):
            try:
                driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
                time.sleep(2)
                # get tags
                hrefs_in_view = driver.find_elements_by_tag_name('a')
                # finding relevant hrefs
                hrefs_in_view = [elem.get_attribute('href') for elem in hrefs_in_view
                                 if '.com/p/' in elem.get_attribute('href')]
                # building list of unique photos
                [pic_hrefs.append(href) for href in hrefs_in_view if href not in pic_hrefs]
                # print("Check: pic href length " + str(len(pic_hrefs)))
            except Exception:
                continue

        # Liking photos
        unique_photos = len(pic_hrefs)
        for pic_href in pic_hrefs:
            driver.get(pic_href)
            time.sleep(2)
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            try:
                time.sleep(random.randint(2, 4))
                like_button = lambda: driver.find_element_by_xpath('//span[@aria-label="Like"]').click()
                like_button()
                for second in reversed(range(0, random.randint(18, 28))):
                    print_same_line("#" + hashtag + ': unique photos left: ' + str(unique_photos)
                                    + " | Sleeping " + str(second))
                    time.sleep(1)
            except Exception as e:
                time.sleep(2)
            unique_photos -= 1

if __name__ == "__main__":

    username = "USERNAME"
    password = "PASSWORD"

    ig = InstagramBot(username, password)
    ig.login()

    hashtags = ['amazing', 'beautiful', 'adventure', 'photography', 'nofilter',
                'newyork', 'artsy', 'alumni', 'lion', 'best', 'fun', 'happy',
                'art', 'funny', 'me', 'followme', 'follow', 'cinematography', 'cinema',
                'love', 'instagood', 'instagood', 'followme', 'fashion', 'sun', 'scruffy',
                'street', 'canon', 'beauty', 'studio', 'pretty', 'vintage', 'fierce']

    while True:
        try:
            # Choose a random tag from the list of tags
            tag = random.choice(hashtags)
            ig.like_photo(tag)
        except Exception:
            ig.closeBrowser()
            time.sleep(60)
            ig = InstagramBot(username, password)
            ig.login()

Build An INSTAGRAM Bot With Python That Gets You Followers


Instagram Automation Using Python


How to Create an Instagram Bot | Get More Followers


Building a simple Instagram Influencer Bot with Python tutorial

#python #chatbot #web-development

Lina  Biyinzika

Lina Biyinzika

1678051620

A Practical Guide of Unsupervised Learning Algorithms

In this machine learning tutorial, you'll get a practical guide to unsupervised learning algorithms. Machine learning is a fast-growing technology that allows computers to learn from the past and predict the future. It uses numerous algorithms for building mathematical models and predicting future trends. Machine learning (ML) has widespread applications in industry, including speech recognition, image recognition, churn prediction, email filtering, chatbot development, recommender systems, and much more.

Machine learning (ML) can be classified into three main categories; supervised, unsupervised, and reinforcement learning. In supervised learning, the model is trained on labeled data. While in unsupervised learning, unlabeled data is provided to the model to predict the outcomes. Reinforcement learning is feedback learning in which the agent collects a reward for each correct action and gets a penalty for a wrong decision. The goal of the learning agent is to get maximum reward points and deduce the error.

What is Unsupervised Learning?

In unsupervised learning, the model learns from unlabeled data without proper supervision.

Unsupervised learning uses machine learning techniques to cluster unlabeled data based on similarities and differences. The unsupervised algorithms discover hidden patterns in data without human supervision. Unsupervised learning aims to arrange the raw data into new features or groups together with similar patterns of data.

For instance, to predict the churn rate, we provide unlabeled data to our model for prediction. There is no information given that customers have churned or not. The model will analyze the data and find hidden patterns to categorize into two clusters: churned and non-churned customers.

Unsupervised Learning Approaches

Unsupervised algorithms can be used for three tasks—clustering, dimensionality reduction, and association. Below, we will highlight some commonly used clustering and association algorithms.

Clustering Techniques

Clustering, or cluster analysis, is a popular data mining technique for unsupervised learning. The clustering approach works to group non-labeled data based on similarities and differences. Unlike supervised learning, clustering algorithms discover natural groupings in data. 

A good clustering method produces high-quality clusters with high intra-class similarity (similar data within a cluster) and low inter-class similarity (data in one cluster is dissimilar to data in other clusters). 

It can be defined as the grouping of data points into various clusters containing similar data points. The same objects remain in the group that has fewer similarities with other groups. Here, we will discuss two popular clustering techniques: K-Means clustering and DBScan Clustering.

K-Means Clustering

K-Means is the simplest unsupervised technique used to solve clustering problems. It groups the unlabeled data into various clusters. The K value defines the number of clusters: you tell the system how many clusters to create.

K-Means is a centroid-based algorithm in which each cluster is associated with the centroid. The goal is to minimize the sum of the distances between the data points and their corresponding clusters.

It is an iterative approach that breaks down the unlabeled data into different clusters so that each data point belongs to a group with similar characteristics.

K-means clustering performs two tasks:

  1. Iteratively computing the best position for each of the K cluster centers (centroids).
  2. Assigning each data point to its closest centroid; the points assigned to a particular centroid form a cluster.

The sketch below illustrates these two steps with plain NumPy.
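This is only a toy illustration: the data, the choice of K=2, the fixed starting centroids, and the number of iterations are all assumptions made for the example; the hands-on section below uses scikit-learn instead.

import numpy as np

#toy 2D data: two obvious groups of points
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.2, 0.8],
              [8.0, 8.0], [9.0, 9.5], [8.5, 8.2]])

#K = 2: pick two distinct points as the starting centroids
centroids = X[[0, 3]].copy()

for _ in range(10):
    #task 2: assign every point to its closest centroid
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    #task 1: move each centroid to the mean of the points assigned to it
    centroids = np.array([X[labels == k].mean(axis=0) for k in range(2)])

print(labels)     #cluster index for every point
print(centroids)  #final cluster centers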

 

An illustration of K-means clustering. Image source

DBScan Clustering

“DBScan” stands for “Density-based spatial clustering of applications with noise.” There are three main words in DBscan: density, clustering, and noise. Therefore, this algorithm uses the notion of density-based clustering to form clusters and detect the noise.

Clusters are usually dense regions that are separated by lower density regions. Unlike the k-means algorithm, which works only on well-separated clusters, DBscan has a wider scope and can create clusters within the cluster. It discovers clusters of various shapes and sizes from a large set of data, which consists of noise and outliers.

There are two parameters in the DBScan algorithm:

minPts: The threshold, or the minimum number of points grouped together for a region considered as a dense region.

eps(ε): The distance measure used to locate the points in the neighborhood. 
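For completeness, here is a short sketch of DBScan with scikit-learn. The toy data and the parameter values are my own choices, since the hands-on section below only implements K-Means, Apriori, and PCA:

import numpy as np
from sklearn.cluster import DBSCAN

#two dense blobs plus one far-away point that should end up as noise
X = np.array([[1.0, 1.0], [1.1, 1.2], [0.9, 1.0],
              [8.0, 8.0], [8.1, 8.2], [7.9, 8.0],
              [50.0, 50.0]])

db = DBSCAN(eps=0.5, min_samples=2).fit(X)
print(db.labels_)   #noise points are labelled -1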

dbscan-clustering

 

An illustration of density-based clustering. Image Source 

Association Rule Mining

An association rule mining is a popular data mining technique. It finds interesting correlations in large numbers of data items. This rule shows how frequently items occur in a transaction.

Market Basket Analysis is a typical example of an association rule mining that finds relationships between items in the grocery store. It enables retailers to identify and analyze the associations between items that people frequently buy.

Important terminology used in association rules:

Support: It tells us about the combination of items bought frequently or frequently bought items.

Confidence: It tells us how often the items A and B occur together, given the number of times A occurs.

Lift: The lift indicates the strength of a rule compared to the random co-occurrence of A and B. For instance, for A -> B, a lift value of 5 means that buying A makes buying B five times more likely than it would be by chance.
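As a tiny worked example with made-up numbers, all three measures can be computed directly from transaction counts:

total_transactions = 1000
transactions_with_A = 100
transactions_with_B = 100          #B appears in 10% of all transactions
transactions_with_A_and_B = 40

support = transactions_with_A_and_B / total_transactions       #0.04
confidence = transactions_with_A_and_B / transactions_with_A   #0.40
lift = confidence / (transactions_with_B / total_transactions) #0.40 / 0.10 = 4.0

A lift above 1 means the two items occur together more often than chance alone would suggest.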

The Apriori algorithm is a well-known association rule based technique.

Apriori algorithm 

The Apriori algorithm was proposed by R. Agarwal and R. Srikant in 1994 to find the frequent items in the dataset. The algorithm’s name is based on the fact that it uses prior knowledge of frequently occurring things. 

The Apriori algorithm finds frequently occurring items with minimum support. 

It consists of two steps:

  • Generation of candidate itemsets.
  • Removing items that are infrequent and don’t fulfill the criteria of minimum support.

Practical Implementation of Unsupervised Algorithms 

In this tutorial, you will learn about the implementation of various unsupervised algorithms in Python. Scikit-learn is a powerful Python library widely used for various unsupervised learning tasks. It is an open-source library that provides numerous robust algorithms, which include classification, dimensionality reduction, clustering techniques, and association rules.

Let’s begin!

1. K-Means algorithm 

Now let’s dive deep into the implementation of the K-Means algorithm in Python. We’ll break down each code snippet so that you can understand it easily.

Import libraries

First of all, we will import the required libraries and get access to the functions.

#Let's import the required libraries
import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns

Loading the dataset 

The dataset is taken from the Kaggle website. You can easily download it from the given link. To load the dataset, we use the pd.read_csv() function. head() returns the first five rows of the dataset.

my_data = pd.read_csv('Customers_Mall.csv')
my_data.head()

The dataset contains five columns: customer ID, gender, age, annual income in (K$), and spending score from 1-100. 

Data Preprocessing 

The info() function is used to get quick information about the dataset. It shows the number of entries, columns, total non-null values, memory usage, and datatypes. 

my_data.info()

dataset-description

 

To check the missing values in the dataset, we use isnull().sum(), which returns the total number of null values.

 

#Check missing values 
my_data.isnull().sum()

dataset-null-values

 

The box plot or whisker plot is used to detect outliers in the dataset. It also shows a statistical five number summary, which includes the minimum, first quartile, median (2nd quartile), third quartile, and maximum.

my_data.boxplot(figsize=(8,4))

Using Box Plot, we’ve detected an outlier in the annual income column. Now we will try to remove it before training our model. 

#let's remove outlier from data
med =61
my_data["Annual Income (k$)"] = np.where(my_data["Annual Income (k$)"] >
 120,med,my_data['Annual Income (k$)'])


The outlier in the annual income column has been removed. To confirm, we use the box plot again.

my_data.boxplot(figsize=(8,5))

Data Visualization

A histogram is used to illustrate the important features of the distribution of data. The hist() function is used to show the distribution of data in each numerical column.

my_data.hist(figsize=(6,6)) 

The correlation heatmap is used to find the potential relationships between variables in the data and to display the strength of those relationships. To display the heatmap, we have used the seaborn plotting library.

plt.figure(figsize=(10,6))
sns.heatmap(my_data.corr(), annot=True, cmap='icefire').set_title('seaborn')
plt.show()

Choosing the Best K Value

The iloc() function is used to select a particular cell of the data. It enables us to select a value that belongs to a specific row or column. Here, we’ve chosen the annual income and spending score columns.

 

X_val = my_data.iloc[:, 3:].values
X_val

 

# Loading Kmeans Library

from sklearn.cluster import KMeans

Now we will select the best value for K using the Elbow’s method. It is used to determine the optimal number of clusters in K-means clustering.

my_val = []

for i in range(1,11):

    kmeans = KMeans(n_clusters = i, init='k-means++', random_state = 123)

    kmeans.fit(X_val)

    my_val.append(kmeans.inertia_)

The sklearn.cluster.KMeans() is used to choose the number of clusters along with the initialization of other parameters. To display the result, just call the variable.

my_val

#Visualization of clusters using elbow's method
plt.plot(range(1,11), my_val)
plt.xlabel('The No of clusters')
plt.ylabel('Outcome')
plt.title('The Elbow Method')
plt.show()

In the Elbow Method, when the graph looks like an arm, the elbow of the arm gives the best value of K. In this case, we've taken K=3, which is the optimal value for K.

kmeans = KMeans(n_clusters = 3, init='k-means++')
kmeans.fit(X_val)

#To show centroids of clusters
kmeans.cluster_centers_

#Prediction of K-Means clustering
y_kmeans = kmeans.fit_predict(X_val)
y_kmeans

fit-predict-function-kmeans

Splitting the dataset into three clusters

The scatter graph is used to plot the classification results of our dataset into three clusters.

plt.scatter(X_val[y_kmeans == 0,0], X_val[y_kmeans == 0,1], c='red',s=100)

plt.scatter(X_val[y_kmeans == 1,0], X_val[y_kmeans == 1,1], c='green',s=100)

plt.scatter(X_val[y_kmeans == 2,0], X_val[y_kmeans == 2,1], c='orange',s=100)

plt.scatter(kmeans.cluster_centers_[:,0], kmeans.cluster_centers_[:,1], s=300, c='brown')

plt.title('K-Means Unsupervised Learning')

plt.show()


2. Apriori Algorithm

To implement the apriori algorithm, we will utilize “The Bread Basket” dataset. The dataset is available on Kaggle and you can download it from the link. This algorithm suggests products based on the user’s purchase history. Walmart has greatly utilized the algorithm to recommend relevant items to its users.

Let’s implement the Apriori algorithm in Python. 

Import libraries 

To implement the algorithm, we need to import some important libraries.

import pandas as pd

import matplotlib.pyplot as plt

import numpy as np

import seaborn as sns

Loading the dataset 

The dataset contains five columns and 20507 entries. The date_time column is a prominent one, and we can extract many vital insights from it.

my_data = pd.read_csv("bread basket.csv")
my_data.head()

Data Preprocessing 

Convert the date_time column into an appropriate format.

my_data['date_time'] = pd.to_datetime(my_data['date_time'])

#Total No of unique customers

my_data['Transaction'].nunique()

unique-customers-apriori

Now we want to derive new columns from date_time to extract meaningful information from the data.

#Let's extract date

my_data['date'] = my_data['date_time'].dt.date

#Let's extract time

my_data['time'] = my_data['date_time'].dt.time

#Extract month and replacing it with String

my_data['month'] = my_data['date_time'].dt.month

my_data['month'] = my_data['month'].replace((1,2,3,4,5,6,7,8,9,10,11,12), 

                                          ('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug',

                                          'Sep','Oct','Nov','Dec'))

#Extract hour

my_data['hour'] = my_data['date_time'].dt.hour

# Replacing hours with text

hr_num = (1,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23)

hr_obj = ('1-2','7-8','8-9','9-10','10-11','11-12','12-13','13-14','14-15',
          '15-16','16-17','17-18','18-19','19-20','20-21','21-22','22-23','23-24')

my_data['hour'] = my_data['hour'].replace(hr_num, hr_obj)

# Extracting weekday and replacing it with String

my_data['weekday'] = my_data['date_time'].dt.weekday

my_data['weekday'] = my_data['weekday'].replace((0,1,2,3,4,5,6),
                                          ('Mon','Tues','Wed','Thur','Fri','Sat','Sun'))

#Now drop date_time column

my_data.drop('date_time', axis = 1, inplace = True)

After extracting the date, time, month, hour, and weekday columns, we dropped the date_time column.

Now to display, we simply use the head() function to see the changes in the dataset.

my_data.head()

dataset-apriori

# cleaning the item column

my_data['Item'] = my_data['Item'].str.strip()

my_data['Item'] = my_data['Item'].str.lower()

my_data.head()

clean-dataset

Data Visualization 

To display the top 10 items purchased by customers, we used a barplot() of the seaborn library. 

plt.figure(figsize=(10,5))

sns.barplot(x=my_data.Item.value_counts().head(10).index, y=my_data.Item.value_counts().head(10).values,palette='RdYlGn')

plt.xlabel('No of Items', size = 17)

plt.xticks(rotation=45)

plt.ylabel('Total Items', size = 18)

plt.title('Top 10 Items purchased', color = 'blue', size = 23)

plt.show()


From the graph, coffee is the top item purchased by the customers, followed by bread.

Now, to display the number of orders received each month, the groupby() function is used along with barplot() to visually show the results.

mon_Tran = my_data.groupby('month')['Transaction'].count().reset_index()
mon_Tran.loc[:,"mon_order"] = [4,8,12,2,1,7,6,3,5,11,10,9]
mon_Tran.sort_values("mon_order", inplace=True)
plt.figure(figsize=(12,5))
sns.barplot(data = mon_Tran, x = "month", y = "Transaction")
plt.xlabel('Months', size = 14)
plt.ylabel('Monthly Orders', size = 14)
plt.title('No of orders received each month', color = 'blue', size = 18)
plt.show()

To show the number of orders received each day, we applied groupby() to the weekday column.

wk_Tran = my_data.groupby('weekday')['Transaction'].count().reset_index()

wk_Tran.loc[:,"wk_ord"] = [4,0,5,6,3,1,2]

wk_Tran.sort_values("wk_ord",inplace=True)

plt.figure(figsize=(11,4))

sns.barplot(data = wk_Tran, x = "weekday", y = "Transaction",palette='RdYlGn')

plt.xlabel('Week Day', size = 14)

plt.ylabel('Per day orders', size = 14)

plt.title('Orders received per day', color = 'blue', size = 18)

plt.show()


 Implementation of the Apriori Algorithm 

We import the mlxtend library to implement the association rules and count the number of items.

from mlxtend.frequent_patterns import association_rules, apriori

tran_str= my_data.groupby(['Transaction', 'Item'])['Item'].count().reset_index(name ='Count')

tran_str.head(8)

association-rule-mixtend

Now we'll make an m×n matrix where m = transactions and n = items, and each row represents whether an item was in the transaction or not.

Mar_baskt = tran_str.pivot_table(index='Transaction', columns='Item', values='Count', aggfunc='sum').fillna(0)

Mar_baskt.head()

market-basket

We want to make a function that returns 0 and 1. 0 means that the item wasn’t present in the transaction, while 1 means the item exists.

def encode(val):

    if val<=0:

        return 0

    if val>=1:

        return 1

#Let's apply the function to the dataset

Basket=Mar_baskt.applymap(encode)

Basket.head()

basket-head

#using the apriori algorithm with min_support 0.01, which means 1%
freq_items = apriori(Basket, min_support = 0.01, use_colnames = True)
freq_items.head()

frequent-items-apriori

We use the association_rules() function to generate association rules from the frequent itemsets.

App_rule = association_rules(freq_items, metric = "lift", min_threshold = 1)
App_rule.sort_values('confidence', ascending = False, inplace = True)
App_rule.head()

From the above implementation, the most strongly associated items are coffee and toast, with a lift value of 1.47 and a confidence value of 0.70. 

3. Principal Component Analysis 

Principal component analysis (PCA) is one of the most widely used unsupervised learning techniques. It can be used for various tasks, including dimensionality reduction, information compression, exploratory data analysis, and data de-noising.

Let’s use the PCA algorithm!

First we import the required libraries to implement this algorithm.

import numpy as np 

import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

%matplotlib inline

from sklearn.decomposition import PCA

from sklearn.datasets import load_digits

Loading the Dataset 

To implement the PCA algorithm, we use Scikit-learn's load_digits dataset, which can easily be loaded using the command below. The dataset contains image data with 1797 entries and 64 columns.

 

#Load the dataset

my_data= load_digits()

#Creating features

X_value = my_data.data

#Creating target

y_value = my_data.target

#Let's check the shape of X_value

X_value.shape

 

dataset-X-shape

 

 

#Each image is 8x8 pixels, therefore 64 pixels in total
my_data.images[10]

#Let's display the image
plt.gray()
plt.matshow(my_data.images[34])
plt.show()

image-pixels

Now let’s project data from 64 columns to 16 to show how 16 dimensions classify the data.

X_val = my_data.data 

y_val = my_data.target

my_pca = PCA(16)  

X_projection = my_pca.fit_transform(X_val)

print(X_val.shape)

print(X_projection.shape)

projection-shape

Using a colormap, we visualize that with only ten dimensions we can classify the data points. Now we'll select the optimal number of dimensions (principal components) to which the data can be reduced.

plt.scatter(X_projection[:, 0], X_projection[:, 1], c=y_val, edgecolor='white',

            cmap=plt.cm.get_cmap("gist_heat",12))

plt.colorbar();

x-projection

pca=PCA().fit(X_val)

plt.plot(np.cumsum(pca.explained_variance_ratio_))

plt.xlabel('Principal components')

plt.ylabel('Explained variance')

Based on the graph, only 12 components are required to explain more than 80% of the variance, which is still better than computing all 64 features. Thus, we've reduced the data from 64 dimensions to 12 to avoid the curse of dimensionality.



#Let's visualize how it looks like

Unsupervised_pca = PCA(12)  

X_pro = Unsupervised_pca.fit_transform(X_val)

print("New Data Shape is =>",X_pro.shape)

#Let's Create a scatter plot

plt.scatter(X_pro[:, 0], X_pro[:, 1], c=y_val, edgecolor='white',

            cmap=plt.cm.get_cmap("nipy_spectral",10))

plt.colorbar();


principal-component-analysis

Wrapping Up 


In this machine learning tutorial, we've implemented the K-Means, Apriori, and PCA algorithms. These are some of the most widely used algorithms; they have numerous industrial applications and solve many real-world problems. For instance, K-means clustering is used in astronomy to study stellar and galaxy spectra, solar polarization spectra, and X-ray spectra. And Apriori is used by retail stores to optimize their product inventory. 

Dreaming of becoming a data scientist or data analyst even without a university and a college degree? Do you need the knowledge of data science and analysis for promotions in your current role?

Are you interested in securing your dream job in data science and analysis and looking for a way to get started? We can help you. With over 10 years of experience in data science and data analysis, we will teach you the rubrics, guiding you with one-on-one lessons from the fundamentals until you become a pro.

Our courses are affordable and easy to understand with numerous exercises and assignments you can learn from. At the completion of our courses, you’ll be readily equipped with technical and practical skills to take on any data science and data analysis role in companies, collaborate effectively among teams and help businesses meet and exceed their objectives by extracting actionable insights from data.

Original article sourced at: https://thedatascientist.com

#machine-learning 

Chloe  Butler

Chloe Butler

1667425440

Pdf2gerb: Perl Script Converts PDF Files to Gerber format

pdf2gerb

Perl script converts PDF files to Gerber format

Pdf2Gerb generates Gerber 274X photoplotting and Excellon drill files from PDFs of a PCB. Up to three PDFs are used: the top copper layer, the bottom copper layer (for 2-sided PCBs), and an optional silk screen layer. The PDFs can be created directly from any PDF drawing software, or a PDF print driver can be used to capture the Print output if the drawing software does not directly support output to PDF.

The general workflow is as follows:

  1. Design the PCB using your favorite CAD or drawing software.
  2. Print the top and bottom copper and top silk screen layers to a PDF file.
  3. Run Pdf2Gerb on the PDFs to create Gerber and Excellon files.
  4. Use a Gerber viewer to double-check the output against the original PCB design.
  5. Make adjustments as needed.
  6. Submit the files to a PCB manufacturer.

Please note that Pdf2Gerb does NOT perform DRC (Design Rule Checks), as these will vary according to individual PCB manufacturer conventions and capabilities. Also note that Pdf2Gerb is not perfect, so the output files must always be checked before submitting them. As of version 1.6, Pdf2Gerb supports most PCB elements, such as round and square pads, round holes, traces, SMD pads, ground planes, no-fill areas, and panelization. However, because it interprets the graphical output of a Print function, there are limitations in what it can recognize (or there may be bugs).

See docs/Pdf2Gerb.pdf for install/setup, config, usage, and other info.


pdf2gerb_cfg.pm

#Pdf2Gerb config settings:
#Put this file in same folder/directory as pdf2gerb.pl itself (global settings),
#or copy to another folder/directory with PDFs if you want PCB-specific settings.
#There is only one user of this file, so we don't need a custom package or namespace.
#NOTE: all constants defined in here will be added to main namespace.
#package pdf2gerb_cfg;

use strict; #trap undef vars (easier debug)
use warnings; #other useful info (easier debug)


##############################################################################################
#configurable settings:
#change values here instead of in main pfg2gerb.pl file

use constant WANT_COLORS => ($^O !~ m/Win/); #ANSI colors no worky on Windows? this must be set < first DebugPrint() call

#just a little warning; set realistic expectations:
#DebugPrint("${\(CYAN)}Pdf2Gerb.pl ${\(VERSION)}, $^O O/S\n${\(YELLOW)}${\(BOLD)}${\(ITALIC)}This is EXPERIMENTAL software.  \nGerber files MAY CONTAIN ERRORS.  Please CHECK them before fabrication!${\(RESET)}", 0); #if WANT_DEBUG

use constant METRIC => FALSE; #set to TRUE for metric units (only affect final numbers in output files, not internal arithmetic)
use constant APERTURE_LIMIT => 0; #34; #max #apertures to use; generate warnings if too many apertures are used (0 to not check)
use constant DRILL_FMT => '2.4'; #'2.3'; #'2.4' is the default for PCB fab; change to '2.3' for CNC

use constant WANT_DEBUG => 0; #10; #level of debug wanted; higher == more, lower == less, 0 == none
use constant GERBER_DEBUG => 0; #level of debug to include in Gerber file; DON'T USE FOR FABRICATION
use constant WANT_STREAMS => FALSE; #TRUE; #save decompressed streams to files (for debug)
use constant WANT_ALLINPUT => FALSE; #TRUE; #save entire input stream (for debug ONLY)

#DebugPrint(sprintf("${\(CYAN)}DEBUG: stdout %d, gerber %d, want streams? %d, all input? %d, O/S: $^O, Perl: $]${\(RESET)}\n", WANT_DEBUG, GERBER_DEBUG, WANT_STREAMS, WANT_ALLINPUT), 1);
#DebugPrint(sprintf("max int = %d, min int = %d\n", MAXINT, MININT), 1); 

#define standard trace and pad sizes to reduce scaling or PDF rendering errors:
#This avoids weird aperture settings and replaces them with more standardized values.
#(I'm not sure how photoplotters handle strange sizes).
#Fewer choices here gives more accurate mapping in the final Gerber files.
#units are in inches
use constant TOOL_SIZES => #add more as desired
(
#round or square pads (> 0) and drills (< 0):
    .010, -.001,  #tiny pads for SMD; dummy drill size (too small for practical use, but needed so StandardTool will use this entry)
    .031, -.014,  #used for vias
    .041, -.020,  #smallest non-filled plated hole
    .051, -.025,
    .056, -.029,  #useful for IC pins
    .070, -.033,
    .075, -.040,  #heavier leads
#    .090, -.043,  #NOTE: 600 dpi is not high enough resolution to reliably distinguish between .043" and .046", so choose 1 of the 2 here
    .100, -.046,
    .115, -.052,
    .130, -.061,
    .140, -.067,
    .150, -.079,
    .175, -.088,
    .190, -.093,
    .200, -.100,
    .220, -.110,
    .160, -.125,  #useful for mounting holes
#some additional pad sizes without holes (repeat a previous hole size if you just want the pad size):
    .090, -.040,  #want a .090 pad option, but use dummy hole size
    .065, -.040, #.065 x .065 rect pad
    .035, -.040, #.035 x .065 rect pad
#traces:
    .001,  #too thin for real traces; use only for board outlines
    .006,  #minimum real trace width; mainly used for text
    .008,  #mainly used for mid-sized text, not traces
    .010,  #minimum recommended trace width for low-current signals
    .012,
    .015,  #moderate low-voltage current
    .020,  #heavier trace for power, ground (even if a lighter one is adequate)
    .025,
    .030,  #heavy-current traces; be careful with these ones!
    .040,
    .050,
    .060,
    .080,
    .100,
    .120,
);
#Areas larger than the values below will be filled with parallel lines:
#This cuts down on the number of aperture sizes used.
#Set to 0 to always use an aperture or drill, regardless of size.
use constant { MAX_APERTURE => max((TOOL_SIZES)) + .004, MAX_DRILL => -min((TOOL_SIZES)) + .004 }; #max aperture and drill sizes (plus a little tolerance)
#DebugPrint(sprintf("using %d standard tool sizes: %s, max aper %.3f, max drill %.3f\n", scalar((TOOL_SIZES)), join(", ", (TOOL_SIZES)), MAX_APERTURE, MAX_DRILL), 1);

#NOTE: Compare the PDF to the original CAD file to check the accuracy of the PDF rendering and parsing!
#for example, the CAD software I used generated the following circles for holes:
#CAD hole size:   parsed PDF diameter:      error:
#  .014                .016                +.002
#  .020                .02267              +.00267
#  .025                .026                +.001
#  .029                .03167              +.00267
#  .033                .036                +.003
#  .040                .04267              +.00267
#This was usually ~ .002" - .003" too big compared to the hole as displayed in the CAD software.
#To compensate for PDF rendering errors (either during CAD Print function or PDF parsing logic), adjust the values below as needed.
#units are pixels; for example, a value of 2.4 at 600 dpi = .0004 inch, 2 at 600 dpi = .0033"
use constant
{
    HOLE_ADJUST => -0.004 * 600, #-2.6, #holes seemed to be slightly oversized (by .002" - .004"), so shrink them a little
    RNDPAD_ADJUST => -0.003 * 600, #-2, #-2.4, #round pads seemed to be slightly oversized, so shrink them a little
    SQRPAD_ADJUST => +0.001 * 600, #+.5, #square pads are sometimes too small by .00067, so bump them up a little
    RECTPAD_ADJUST => 0, #(pixels) rectangular pads seem to be okay? (not tested much)
    TRACE_ADJUST => 0, #(pixels) traces seemed to be okay?
    REDUCE_TOLERANCE => .001, #(inches) allow this much variation when reducing circles and rects
};

#Also, my CAD's Print function or the PDF print driver I used was a little off for circles, so define some additional adjustment values here:
#Values are added to X/Y coordinates; units are pixels; for example, a value of 1 at 600 dpi would be ~= .002 inch
use constant
{
    CIRCLE_ADJUST_MINX => 0,
    CIRCLE_ADJUST_MINY => -0.001 * 600, #-1, #circles were a little too high, so nudge them a little lower
    CIRCLE_ADJUST_MAXX => +0.001 * 600, #+1, #circles were a little too far to the left, so nudge them a little to the right
    CIRCLE_ADJUST_MAXY => 0,
    SUBST_CIRCLE_CLIPRECT => FALSE, #generate circle and substitute for clip rects (to compensate for the way some CAD software draws circles)
    WANT_CLIPRECT => TRUE, #FALSE, #AI doesn't need clip rect at all? should be on normally?
    RECT_COMPLETION => FALSE, #TRUE, #fill in 4th side of rect when 3 sides found
};

#allow .012 clearance around pads for solder mask:
#This value effectively adjusts pad sizes in the TOOL_SIZES list above (only for solder mask layers).
use constant SOLDER_MARGIN => +.012; #units are inches

#line join/cap styles:
use constant
{
    CAP_NONE => 0, #butt (none); line is exact length
    CAP_ROUND => 1, #round cap/join; line overhangs by a semi-circle at either end
    CAP_SQUARE => 2, #square cap/join; line overhangs by a half square on either end
    CAP_OVERRIDE => FALSE, #cap style overrides drawing logic
};
    
#number of elements in each shape type:
use constant
{
    RECT_SHAPELEN => 6, #x0, y0, x1, y1, count, "rect" (start, end corners)
    LINE_SHAPELEN => 6, #x0, y0, x1, y1, count, "line" (line seg)
    CURVE_SHAPELEN => 10, #xstart, ystart, x0, y0, x1, y1, xend, yend, count, "curve" (bezier 2 points)
    CIRCLE_SHAPELEN => 5, #x, y, 5, count, "circle" (center + radius)
};
#const my %SHAPELEN =
#Readonly my %SHAPELEN =>
our %SHAPELEN =
(
    rect => RECT_SHAPELEN,
    line => LINE_SHAPELEN,
    curve => CURVE_SHAPELEN,
    circle => CIRCLE_SHAPELEN,
);

#panelization:
#This will repeat the entire body the number of times indicated along the X or Y axes (files grow accordingly).
#Display elements that overhang PCB boundary can be squashed or left as-is (typically text or other silk screen markings).
#Set "overhangs" TRUE to allow overhangs, FALSE to truncate them.
#xpad and ypad allow margins to be added around outer edge of panelized PCB.
use constant PANELIZE => {'x' => 1, 'y' => 1, 'xpad' => 0, 'ypad' => 0, 'overhangs' => TRUE}; #number of times to repeat in X and Y directions

# Set this to 1 if you need TurboCAD support.
#$turboCAD = FALSE; #is this still needed as an option?

#CIRCAD pad generation uses an appropriate aperture, then moves it (stroke) "a little" - we use this to find pads and distinguish them from PCB holes. 
use constant PAD_STROKE => 0.3; #0.0005 * 600; #units are pixels
#convert very short traces to pads or holes:
use constant TRACE_MINLEN => .001; #units are inches
#use constant ALWAYS_XY => TRUE; #FALSE; #force XY even if X or Y doesn't change; NOTE: needs to be TRUE for all pads to show in FlatCAM and ViewPlot
use constant REMOVE_POLARITY => FALSE; #TRUE; #set to remove subtractive (negative) polarity; NOTE: must be FALSE for ground planes

#PDF uses "points", each point = 1/72 inch
#combined with a PDF scale factor of .12, this gives 600 dpi resolution (1/72 * .12 = 600 dpi)
use constant INCHES_PER_POINT => 1/72; #0.0138888889; #multiply point-size by this to get inches

# The precision used when computing a bezier curve. Higher numbers are more precise but slower (and generate larger files).
#$bezierPrecision = 100;
use constant BEZIER_PRECISION => 36; #100; #use const; reduced for faster rendering (mainly used for silk screen and thermal pads)

# Ground planes and silk screen or larger copper rectangles or circles are filled line-by-line using this resolution.
use constant FILL_WIDTH => .01; #fill at most 0.01 inch at a time

# The max number of characters to read into memory
use constant MAX_BYTES => 10 * M; #bumped up to 10 MB, use const

use constant DUP_DRILL1 => TRUE; #FALSE; #kludge: ViewPlot doesn't load drill files that are too small so duplicate first tool

my $runtime = time(); #Time::HiRes::gettimeofday(); #measure my execution time

print STDERR "Loaded config settings from '${\(__FILE__)}'.\n";
1; #last value must be truthful to indicate successful load


#############################################################################################
#junk/experiment:

#use Package::Constants;
#use Exporter qw(import); #https://perldoc.perl.org/Exporter.html

#my $caller = "pdf2gerb::";

#sub cfg
#{
#    my $proto = shift;
#    my $class = ref($proto) || $proto;
#    my $settings =
#    {
#        $WANT_DEBUG => 990, #10; #level of debug wanted; higher == more, lower == less, 0 == none
#    };
#    bless($settings, $class);
#    return $settings;
#}

#use constant HELLO => "hi there2"; #"main::HELLO" => "hi there";
#use constant GOODBYE => 14; #"main::GOODBYE" => 12;

#print STDERR "read cfg file\n";

#our @EXPORT_OK = Package::Constants->list(__PACKAGE__); #https://www.perlmonks.org/?node_id=1072691; NOTE: "_OK" skips short/common names

#print STDERR scalar(@EXPORT_OK) . " consts exported:\n";
#foreach(@EXPORT_OK) { print STDERR "$_\n"; }
#my $val = main::thing("xyz");
#print STDERR "caller gave me $val\n";
#foreach my $arg (@ARGV) { print STDERR "arg $arg\n"; }

Download Details:

Author: swannman
Source Code: https://github.com/swannman/pdf2gerb

License: GPL-3.0 license

#perl 

Shubham Ankit

Shubham Ankit

1657081614

How to Automate Excel with Python | Python Excel Tutorial (OpenPyXL)

How to Automate Excel with Python

In this article, We will show how we can use python to automate Excel . A useful Python library is Openpyxl which we will learn to do Excel Automation

What is OPENPYXL

Openpyxl is a Python library that is used to read from an Excel file or write to an Excel file. Data scientists use Openpyxl for data analysis, data copying, data mining, drawing charts, styling sheets, adding formulas, and more.

Workbook: A spreadsheet is represented as a workbook in openpyxl. A workbook consists of one or more sheets.

Sheet: A sheet is a single page composed of cells for organizing data.

Cell: The intersection of a row and a column is called a cell. Usually represented by A1, B5, etc.

Row: A row is a horizontal line represented by a number (1,2, etc.).

Column: A column is a vertical line represented by a capital letter (A, B, etc.).

Openpyxl can be installed using the pip command and it is recommended to install it in a virtual environment.

pip install openpyxl

CREATE A NEW WORKBOOK

We start by creating a new spreadsheet, which is called a workbook in Openpyxl. We import the workbook module from Openpyxl and use the function Workbook() which creates a new workbook.

from openpyxl import Workbook
#creates a new workbook
wb = Workbook()
#Gets the first active worksheet
ws = wb.active
#creating new worksheets by using the create_sheet method

ws1 = wb.create_sheet("sheet1", 0) #inserts at first position
ws2 = wb.create_sheet("sheet2") #inserts at last position
ws3 = wb.create_sheet("sheet3", -1) #inserts at penultimate position

#Renaming the sheet
ws.title = "Example"

#save the workbook
wb.save(filename = "example.xlsx")

READING DATA FROM WORKBOOK

We load the file using the function load_Workbook() which takes the filename as an argument. The file must be saved in the same working directory.

#loading a workbook
wb = openpyxl.load_workbook("example.xlsx")

 

GETTING SHEETS FROM THE LOADED WORKBOOK

 

#getting sheet names
wb.sheetnames
result = ['sheet1', 'Sheet', 'sheet3', 'sheet2']

#getting a particular sheet
sheet1 = wb["sheet2"]

#getting sheet title
sheet1.title
result = 'sheet2'

#Getting the active sheet
sheetactive = wb.active
result = 'sheet1'

 

ACCESSING CELLS AND CELL VALUES

 

#get a cell from the sheet
sheet1["A1"]
# <Cell 'Sheet1'.A1>

#get the cell value
ws["A1"].value
# 'Segment'

#accessing cell using row and column and assigning a value
d = ws.cell(row = 4, column = 2, value = 10)
d.value
# 10

 

ITERATING THROUGH ROWS AND COLUMNS

 

#looping through each row and column
for x in range(1, 5):
    for y in range(1, 5):
        print(x, y, ws.cell(row = x, column = y).value)

#getting the highest row number
ws.max_row
# 701

#getting the highest column number
ws.max_column
# 19

There are two functions for iterating through rows and columns:

iter_rows() => returns the rows
iter_cols() => returns the columns

The arguments {min_row = 4, max_row = 5, min_col = 2, max_col = 5} can be used to set the boundaries for any iteration.

Example:

#iterating rows
for row in ws.iter_rows(min_row = 2, max_col = 3, max_row = 3):
    for cell in row:
        print(cell)

# <Cell 'Sheet1'.A2>
# <Cell 'Sheet1'.B2>
# <Cell 'Sheet1'.C2>
# <Cell 'Sheet1'.A3>
# <Cell 'Sheet1'.B3>
# <Cell 'Sheet1'.C3>

#iterating columns
for col in ws.iter_cols(min_row = 2, max_col = 3, max_row = 3):
    for cell in col:
        print(cell)

# <Cell 'Sheet1'.A2>
# <Cell 'Sheet1'.A3>
# <Cell 'Sheet1'.B2>
# <Cell 'Sheet1'.B3>
# <Cell 'Sheet1'.C2>
# <Cell 'Sheet1'.C3>

To get all the rows of the worksheet we use the method worksheet.rows and to get all the columns of the worksheet we use the method worksheet.columns. Similarly, to iterate only through the values we use the method worksheet.values.


Example:

for row in ws.values:
    for value in row:
        print(value)
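As a small supplementary sketch (continuing with the same ws worksheet), worksheet.rows and worksheet.columns yield tuples of Cell objects rather than plain values:

#each row is a tuple of Cell objects, e.g. (<Cell 'Sheet'.A1>, <Cell 'Sheet'.B1>, ...)
for row in ws.rows:
    print(row)

#each column is likewise a tuple of Cell objects, from top to bottom
for col in ws.columns:
    print(col)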

 

WRITING DATA TO AN EXCEL FILE

Writing to a workbook can be done in many ways such as adding a formula, adding charts, images, updating cell values, inserting rows and columns, etc… We will discuss each of these with an example.

 

CREATING AND SAVING A NEW WORKBOOK

 

#creates a new workbook
wb = openpyxl.Workbook()

#saving the workbook
wb.save("new.xlsx")

 

ADDING AND REMOVING SHEETS

 

#creating a new sheet
ws1 = wb.create_sheet(title = "sheet 2")

#creating a new sheet at index 0
ws2 = wb.create_sheet(index = 0, title = "sheet 0")

#checking the sheet names
wb.sheetnames['sheet 0', 'Sheet', 'sheet 2']

#deleting a sheet
del wb['sheet 0']

#checking sheetnames
wb.sheetnames['Sheet', 'sheet 2']

 

ADDING CELL VALUES

 

#checking the sheet value
ws['B2'].value
null

#adding value to cell
ws['B2'] = 367

#checking value
ws['B2'].value
367

 

ADDING FORMULAS

 

We often require formulas to be included in our Excel datasheet. We can easily add formulas using the Openpyxl module just like you add values to a cell.
 

For example:

import openpyxl
from openpyxl import Workbook

wb = openpyxl.load_workbook("new1.xlsx")
ws = wb['Sheet']

ws['A9'] = '=SUM(A2:A8)'

wb.save("new2.xlsx")

The above program will add the formula (=SUM(A2:A8)) in cell A9. The result will be as below.

image

 

MERGE/UNMERGE CELLS

Two or more cells can be merged to a rectangular area using the method merge_cells(), and similarly, they can be unmerged using the method unmerge_cells().

For example:
Merge cells

#merge cells B2 to C9
ws.merge_cells('B2:C9')
ws['B2'] = "Merged cells"

Adding the above code to the previous example will merge cells as below.

image

UNMERGE CELLS

 

#unmerge cells B2 to C9
ws.unmerge_cells('B2:C9')

The above code will unmerge cells from B2 to C9.
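Inserting and deleting rows and columns was mentioned above but isn't shown in the snippets, so here is a quick sketch using openpyxl's built-in methods; the row and column indices are arbitrary examples:

#insert two blank rows starting at row 2
ws.insert_rows(2, amount=2)

#delete one column starting at column 3 (column C)
ws.delete_cols(3)

#the complementary methods work the same way
ws.insert_cols(5)
ws.delete_rows(10)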

INSERTING AN IMAGE

To insert an image we import the image function from the module openpyxl.drawing.image. We then load our image and add it to the cell as shown in the below example.

Example:

import openpyxl
from openpyxl import Workbook
from openpyxl.drawing.image import Image

wb = openpyxl.load_workbook("new1.xlsx")
ws = wb['Sheet']
#loading the image(should be in same folder)
img = Image('logo.png')
ws['A1'] = "Adding image"
#adjusting size
img.height = 130
img.width = 200
#adding img to cell A3

ws.add_image(img, 'A3')

wb.save("new2.xlsx")

Result:

image

CREATING CHARTS

Charts are essential to show a visualization of data. We can create charts from Excel data using the Openpyxl module chart. Different forms of charts such as line charts, bar charts, 3D line charts, etc., can be created. We need to create a reference that contains the data to be used for the chart, which is nothing but a selection of cells (rows and columns). I am using sample data to create a 3D bar chart in the below example:

Example

import openpyxl
from openpyxl import Workbook
from openpyxl.chart import BarChart3D, Reference, series

wb = openpyxl.load_workbook("example.xlsx")
ws = wb.active

values = Reference(ws, min_col = 3, min_row = 2, max_col = 3, max_row = 40)
chart = BarChart3D()
chart.add_data(values)
ws.add_chart(chart, "E3")
wb.save("MyChart.xlsx")

Result
image


How to Automate Excel with Python with Video Tutorial

Welcome to another video! In this video, We will cover how we can use python to automate Excel. I'll be going over everything from creating workbooks to accessing individual cells and stylizing cells. There is a ton of things that you can do with Excel but I'll just be covering the core/base things in OpenPyXl.

⭐️ Timestamps ⭐️
00:00 | Introduction
02:14 | Installing openpyxl
03:19 | Testing Installation
04:25 | Loading an Existing Workbook
06:46 | Accessing Worksheets
07:37 | Accessing Cell Values
08:58 | Saving Workbooks
09:52 | Creating, Listing and Changing Sheets
11:50 | Creating a New Workbook
12:39 | Adding/Appending Rows
14:26 | Accessing Multiple Cells
20:46 | Merging Cells
22:27 | Inserting and Deleting Rows
23:35 | Inserting and Deleting Columns
24:48 | Copying and Moving Cells
26:06 | Practical Example, Formulas & Cell Styling

📄 Resources 📄
OpenPyXL Docs: https://openpyxl.readthedocs.io/en/stable/ 
Code Written in This Tutorial: https://github.com/techwithtim/ExcelPythonTutorial 
Subscribe: https://www.youtube.com/c/TechWithTim/featured 

#python 

Face Recognition with OpenCV and Python

Introduction

What is face recognition? Or what is recognition? When you look at an apple fruit, your mind immediately tells you that this is an apple fruit. This process, your mind telling you that this is an apple fruit is recognition in simple words. So what is face recognition then? I am sure you have guessed it right. When you look at your friend walking down the street or a picture of him, you recognize that he is your friend Paulo. Interestingly when you look at your friend or a picture of him you look at his face first before looking at anything else. Ever wondered why you do that? This is so that you can recognize him by looking at his face. Well, this is you doing face recognition.

But the real question is: how does face recognition work? It is quite simple and intuitive. Take a real-life example: when you meet someone for the first time in your life, you don't recognize him, right? While he talks or shakes hands with you, you look at his face, eyes, nose, mouth, color and overall look. This is your mind learning or training for the face recognition of that person by gathering face data. Then he tells you that his name is Paulo. At this point your mind knows that the face data it just learned belongs to Paulo. Now your mind is trained and ready to do face recognition on Paulo's face. Next time when you see Paulo or his face in a picture you will immediately recognize him. This is how face recognition works. The more you meet Paulo, the more data your mind will collect about Paulo, especially his face, and the better you will become at recognizing him.

Now the next question is how to code face recognition with OpenCV. After all, this is the only reason why you are reading this article, right? OK then. You might say that our mind can do these things easily, but actually coding them into a computer is difficult? Don't worry, it is not. Thanks to OpenCV, coding face recognition is easier than it sounds. The coding steps for face recognition are the same as the ones we discussed in the real-life example above.

  • Training Data Gathering: Gather face data (face images in this case) of the persons you want to recognize
  • Training of Recognizer: Feed that face data (and respective names of each face) to the face recognizer so that it can learn.
  • Recognition: Feed new faces of the persons and see if the face recognizer you just trained recognizes them.

OpenCV comes equipped with built-in face recognizers; all you have to do is feed them the face data. It's that simple, and this is how it will look once we are done coding it.

[Image: visualization of the final recognition result]

OpenCV Face Recognizers

OpenCV has three built-in face recognizers and, thanks to OpenCV's clean coding, you can use any of them by just changing a single line of code. Below are the names of those face recognizers and their OpenCV calls.

  1. EigenFaces Face Recognizer - cv2.face.createEigenFaceRecognizer()
  2. FisherFaces Face Recognizer - cv2.face.createFisherFaceRecognizer()
  3. Local Binary Patterns Histograms (LBPH) Face Recognizer - cv2.face.createLBPHFaceRecognizer()

We have got three face recognizers, but do you know which one to use and when? Or which one is better? I guess not. So why not go through a brief summary of each, what do you say? I am assuming you said yes :) So let's dive into the theory of each.

EigenFaces Face Recognizer

This algorithm considers the fact that not all parts of a face are equally important or equally useful. When you look at someone, you recognize him/her by distinct features like the eyes, nose, cheeks and forehead, and how they vary with respect to each other. So you are actually focusing on the areas of maximum change (mathematically speaking, this change is variance) of the face. For example, from eyes to nose there is a significant change, and the same is the case from nose to mouth. When you look at multiple faces, you compare them by looking at these parts of the faces because these parts are the most useful and important components of a face. Important because they catch the maximum change among faces, the change that helps you differentiate one face from the other. This is exactly how the EigenFaces face recognizer works.

The EigenFaces face recognizer looks at all the training images of all the persons as a whole and tries to extract the components which are important and useful (the components that catch the maximum variance/change) and discards the rest. This way it not only extracts the important components from the training data but also saves memory by discarding the less important ones. The important components it extracts are called principal components. Below is an image showing the principal components extracted from a list of faces.

[Image: principal components (eigenfaces) - source: eigenfaces_opencv]

You can see that principal components actually represent faces and these faces are called eigen faces and hence the name of the algorithm.

So this is how the EigenFaces face recognizer trains itself (by extracting principal components). Remember, it also keeps a record of which principal component belongs to which person. One thing to note in the above image is that the Eigenfaces algorithm also considers illumination as an important component.

Later, during recognition, when you feed a new image to the algorithm, it repeats the same process on that image. It extracts the principal components from the new image, compares them with the components it stored during training, finds the best match, and returns the person label associated with that best-match component.
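
If you want to see the principal-component idea in isolation, here is a small NumPy sketch. The data is randomly generated stand-in data (not real faces), since the point is only to illustrate the math behind eigenfaces, not OpenCV's recognizer itself:

import numpy as np

# stand-in for 100 flattened 64x64 grayscale face images, one image per row
faces = np.random.rand(100, 64 * 64)

# center the data on the mean face, then find directions of maximum variance via SVD
mean_face = faces.mean(axis=0)
centered = faces - mean_face
_, _, components = np.linalg.svd(centered, full_matrices=False)

# the top rows of 'components' are the eigenfaces; describe a face by its first 20 weights
eigenfaces = components[:20]
weights = eigenfaces @ (faces[0] - mean_face)

During recognition you would compute the same weights for a new face and compare them against the weights stored for the training faces.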

Easy peasy, right? Next one is easier than this one.

FisherFaces Face Recognizer

This algorithm is an improved version of the EigenFaces face recognizer. The Eigenfaces face recognizer looks at all the training faces of all the persons at once and finds principal components from all of them combined. By capturing principal components from all of them combined, you are not focusing on the features that discriminate one person from another but on the features that represent all the persons in the training data as a whole.

This approach has drawbacks. For example, images with sharp changes (like lighting changes, which are not a useful feature at all) may dominate the rest of the images, and you may end up with features that come from an external source like light and are not useful for discrimination at all. In the end, your principal components will represent light changes and not the actual face features.

The Fisherfaces algorithm, instead of extracting features that represent all the faces of all the persons, extracts features that discriminate one person from the others. This way the features of one person do not dominate over the others, and you are left with the features that actually separate one person from another.

Below is an image of features extracted using Fisherfaces algorithm.

[Image: Fisher faces - source: eigenfaces_opencv]

You can see that the extracted features actually represent faces; these faces are called fisher faces, and hence the name of the algorithm.

One thing to note here is that, even with the Fisherfaces algorithm, if multiple persons have images with sharp changes due to external sources like light, those changes will dominate over other features and affect recognition accuracy.

Getting bored with this theory? Don't worry, only one face recognizer is left and then we will dive deep into the coding part.

Local Binary Patterns Histograms (LBPH) Face Recognizer

I wrote a detailed explanation of Local Binary Patterns Histograms in my previous article on face detection using local binary patterns histograms. So here I will just give a brief overview of how it works.

We know that Eigenfaces and Fisherfaces are both affected by light and in real life we can't guarantee perfect light conditions. LBPH face recognizer is an improvement to overcome this drawback.

The idea is not to look at the image as a whole but instead to find its local features. The LBPH algorithm tries to find the local structure of an image, and it does that by comparing each pixel with its neighboring pixels.

Take a 3x3 window and move it over the whole image; at each move (each local part of the image), compare the pixel at the center with its neighbor pixels. The neighbors with an intensity value less than or equal to the center pixel are denoted by 1 and the others by 0. Then you read these 0/1 values under the 3x3 window in a clockwise order and you get a binary pattern like 11100011, and this pattern is local to some area of the image. You do this over the whole image and you end up with a list of local binary patterns.

[Image: LBP labeling]

Now you get why this algorithm has Local Binary Patterns in its name: because you get a list of local binary patterns. Now you may be wondering, what about the histogram part of LBPH? Well, after you get a list of local binary patterns, you convert each binary pattern into a decimal number (as shown in the above image) and then you make a histogram of all of those values. A sample histogram looks like this.

[Image: sample LBP histogram]

I guess this answers the question about the histogram part. So in the end you will have one histogram for each face image in the training data set. That means that if there were 100 images in the training data set, LBPH would extract 100 histograms after training and store them for later recognition. Remember, the algorithm also keeps track of which histogram belongs to which person.
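
Below is a rough NumPy sketch of the 3x3 labeling and histogram steps described above. It follows the thresholding convention used in this article (neighbors less than or equal to the center get a 1) and skips the per-region grid that the full LBPH recognizer uses, so it is only an illustration of the idea:

import numpy as np

def lbp_codes(gray):
    # compute a basic 3x3 LBP code for every interior pixel of a grayscale image
    h, w = gray.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    # clockwise neighbor offsets, starting from the top-left corner of the window
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            center = gray[i, j]
            code = 0
            for bit, (di, dj) in enumerate(offsets):
                if gray[i + di, j + dj] <= center:
                    code |= 1 << (7 - bit)
            codes[i - 1, j - 1] = code
    return codes

# stand-in for a cropped grayscale face; the histogram of its codes describes that face
gray = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
hist, _ = np.histogram(lbp_codes(gray), bins=256, range=(0, 256))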

Later, during recognition, when you feed a new image to the recognizer, it generates a histogram for that new image, compares it with the histograms it already has, finds the best-match histogram, and returns the person label associated with that best-match histogram.

Below is a list of faces and their respective local binary patterns images. You can see that the LBP images are not affected by changes in light conditions.

[Image: faces and their corresponding LBP images]

The theory part is over and now comes the coding part! Ready to dive into coding? Let's get into it then.

Coding Face Recognition with OpenCV

The Face Recognition process in this tutorial is divided into three steps.

  1. Prepare training data: In this step we will read training images for each person/subject along with their labels, detect faces from each image and assign each detected face an integer label of the person it belongs to.
  2. Train Face Recognizer: In this step we will train OpenCV's LBPH face recognizer by feeding it the data we prepared in step 1.
  3. Testing: In this step we will pass some test images to face recognizer and see if it predicts them correctly.

[There should be a visualization diagram for above steps here]

To detect faces, I will use the code from my previous article on face detection. So if you have not read it, I encourage you to do so to understand how face detection works and its Python coding.

Import Required Modules

Before starting the actual coding we need to import the required modules for coding. So let's import them first.

  • cv2: is OpenCV module for Python which we will use for face detection and face recognition.
  • os: We will use this Python module to read our training directories and file names.
  • numpy: We will use this module to convert Python lists to numpy arrays as OpenCV face recognizers accept numpy arrays.
#import OpenCV module
import cv2
#import os module for reading training data directories and paths
import os
#import numpy to convert python lists to numpy arrays as 
#it is needed by OpenCV face recognizers
import numpy as np

#matplotlib for display our images
import matplotlib.pyplot as plt
%matplotlib inline 

Training Data

The more images used in training the better. Normally a lot of images are used for training a face recognizer so that it can learn different looks of the same person, for example with glasses, without glasses, laughing, sad, happy, crying, with beard, without beard etc. To keep our tutorial simple we are going to use only 12 images for each person.

So our training data consists of a total of 2 persons with 12 images each. All training data is inside the training-data folder. The training-data folder contains one folder for each person, and each folder is named with the format sLabel (e.g. s1, s2), where Label is the integer label assigned to that person. For example, the folder named s1 contains images for person 1. The directory structure tree for the training data is as follows:

training-data
|-------------- s1
|               |-- 1.jpg
|               |-- ...
|               |-- 12.jpg
|-------------- s2
|               |-- 1.jpg
|               |-- ...
|               |-- 12.jpg

The test-data folder contains images that we will use to test our face recognizer after it has been successfully trained.

As the OpenCV face recognizer accepts labels as integers, we need to define a mapping between integer labels and persons' actual names, so below I am defining a mapping of each person's integer label to their respective name.

Note: As we have not assigned label 0 to any person, the mapping for label 0 is empty.

#there is no label 0 in our training data so subject name for index/label 0 is empty
subjects = ["", "Tom Cruise", "Shahrukh Khan"]

Prepare training data

You may be wondering why data preparation, right? Well, the OpenCV face recognizer accepts data in a specific format. It accepts two vectors: one vector holds the faces of all the persons and the second holds the integer label for each face, so that when processing a face the face recognizer knows which person that particular face belongs to.

For example, if we had 2 persons and 2 images for each person.

PERSON-1    PERSON-2   

img1        img1         
img2        img2

Then the prepare data step will produce the following face and label vectors.

FACES                        LABELS

person1_img1_face              1
person1_img2_face              1
person2_img1_face              2
person2_img2_face              2

The data preparation step can be further divided into the following sub-steps.

  1. Read all the folder names of subjects/persons provided in training data folder. So for example, in this tutorial we have folder names: s1, s2.
  2. For each subject, extract label number. Do you remember that our folders have a special naming convention? Folder names follow the format sLabel where Label is an integer representing the label we have assigned to that subject. So for example, folder name s1 means that the subject has label 1, s2 means subject label is 2 and so on. The label extracted in this step is assigned to each face detected in the next step.
  3. Read all the images of the subject, detect face from each image.
  4. Add each face to faces vector with corresponding subject label (extracted in above step) added to labels vector.

[There should be a visualization for above steps here]

As mentioned above, to detect faces I am going to use the code from my previous article on face detection, so if you have not read it, I encourage you to do so to understand how face detection works and its coding. Below is the same code.

#function to detect face using OpenCV
def detect_face(img):
    #convert the test image to gray image as opencv face detector expects gray images
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    
    #load OpenCV face detector, I am using LBP which is fast
    #there is also a more accurate but slow Haar classifier
    face_cascade = cv2.CascadeClassifier('opencv-files/lbpcascade_frontalface.xml')

    #let's detect multiscale (some images may be closer to camera than others) images
    #result is a list of faces
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=5);
    
    #if no faces are detected then return original img
    if (len(faces) == 0):
        return None, None
    
    #under the assumption that there will be only one face,
    #extract the face area
    (x, y, w, h) = faces[0]
    
    #return only the face part of the image
    return gray[y:y+h, x:x+w], faces[0]

I am using OpenCV's LBP face detector. On line 4, I convert the image to grayscale because most operations in OpenCV are performed on grayscale images; then on line 8 I load the LBP face detector using the cv2.CascadeClassifier class. After that, on line 12, I use the cv2.CascadeClassifier class's detectMultiScale method to detect all the faces in the image. On line 20, from the detected faces I pick only the first one, because in one image there will be only one face (under the assumption that there will be only one prominent face). The faces returned by the detectMultiScale method are actually rectangles (x, y, width, height) and not actual face images, so we have to extract the face image area from the main image. So on line 23 I extract the face area from the gray image and return both the face image area and the face rectangle.
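
As a quick sanity check, you can run this detector on a single training image before wiring it into the data preparation step (the path below is just one image from the folder layout shown earlier):

#quick check of the detector on one image
img = cv2.imread("training-data/s1/1.jpg")
face, rect = detect_face(img)
print("face found" if face is not None else "no face found", rect)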

Now you have got a face detector and you know the 4 steps to prepare the data, so are you ready to code the prepare data step? Yes? So let's do it.

#this function will read all persons' training images, detect face from each image
#and will return two lists of exactly same size, one list 
# of faces and another list of labels for each face
def prepare_training_data(data_folder_path):
    
    #------STEP-1--------
    #get the directories (one directory for each subject) in data folder
    dirs = os.listdir(data_folder_path)
    
    #list to hold all subject faces
    faces = []
    #list to hold labels for all subjects
    labels = []
    
    #let's go through each directory and read images within it
    for dir_name in dirs:
        
        #our subject directories start with letter 's' so
        #ignore any non-relevant directories if any
        if not dir_name.startswith("s"):
            continue;
            
        #------STEP-2--------
        #extract label number of subject from dir_name
        #format of dir name = slabel
        #, so removing letter 's' from dir_name will give us label
        label = int(dir_name.replace("s", ""))
        
        #build path of directory containing images for the current subject
        #sample subject_dir_path = "training-data/s1"
        subject_dir_path = data_folder_path + "/" + dir_name
        
        #get the images names that are inside the given subject directory
        subject_images_names = os.listdir(subject_dir_path)
        
        #------STEP-3--------
        #go through each image name, read image, 
        #detect face and add face to list of faces
        for image_name in subject_images_names:
            
            #ignore system files like .DS_Store
            if image_name.startswith("."):
                continue;
            
            #build image path
            #sample image path = training-data/s1/1.jpg
            image_path = subject_dir_path + "/" + image_name

            #read image
            image = cv2.imread(image_path)
            
            #display an image window to show the image 
            cv2.imshow("Training on image...", image)
            cv2.waitKey(100)
            
            #detect face
            face, rect = detect_face(image)
            
            #------STEP-4--------
            #for the purpose of this tutorial
            #we will ignore faces that are not detected
            if face is not None:
                #add face to list of faces
                faces.append(face)
                #add label for this face
                labels.append(label)
            
    cv2.destroyAllWindows()
    cv2.waitKey(1)
    cv2.destroyAllWindows()
    
    return faces, labels

I have defined a function that takes the path, where training subjects' folders are stored, as parameter. This function follows the same 4 prepare data substeps mentioned above.

(step-1) On line 8 I am using the os.listdir method to read the names of all folders stored on the path passed to the function as a parameter. On lines 10-13 I am defining the labels and faces vectors.

(step-2) After that I traverse through all the subjects' folder names, and from each subject's folder name, on line 27, I extract the label information. As folder names follow the sLabel naming convention, removing the letter s from the folder name gives us the label assigned to that subject.

(step-3) On line 34, I read all the image names of the current subject being traversed, and on lines 39-66 I traverse those images one by one. On lines 53-54 I am using OpenCV's imshow(window_title, image) along with OpenCV's waitKey(interval) method to display the current image being traversed. The waitKey(interval) method pauses the code flow for the given interval (milliseconds); I am using it with a 100ms interval so that we can view the image window for 100ms. On line 57, I detect the face in the current image being traversed.

(step-4) On lines 62-66, I add the detected face and label to their respective vectors.

But a function can't do anything unless we call it on some data that it has to prepare, right? Don't worry, I have got data of two beautiful and famous celebrities. I am sure you will recognize them!

[Image: training data samples of the two subjects]

Let's call this function on images of these beautiful celebrities to prepare data for training of our Face Recognizer. Below is a simple code to do that.

#let's first prepare our training data
#data will be in two lists of same size
#one list will contain all the faces
#and other list will contain respective labels for each face
print("Preparing data...")
faces, labels = prepare_training_data("training-data")
print("Data prepared")

#print total faces and labels
print("Total faces: ", len(faces))
print("Total labels: ", len(labels))
Preparing data...
Data prepared
Total faces:  23
Total labels:  23

This was probably the boring part, right? Don't worry, the fun stuff is coming up next. It's time to train our own face recognizer so that, once trained, it can recognize new faces of the persons it was trained on. Ready? OK, then let's train our face recognizer.

Train Face Recognizer

As we know, OpenCV comes equipped with three face recognizers.

  1. EigenFace Recognizer: This can be created with cv2.face.createEigenFaceRecognizer()
  2. FisherFace Recognizer: This can be created with cv2.face.createFisherFaceRecognizer()
  3. Local Binary Patterns Histograms (LBPH) Recognizer: This can be created with cv2.face.createLBPHFaceRecognizer()

I am going to use the LBPH face recognizer, but you can use any face recognizer of your choice. No matter which of OpenCV's face recognizers you use, the code will remain the same. You just have to change one line, the face recognizer initialization line given below.

#create our LBPH face recognizer 
face_recognizer = cv2.face.createLBPHFaceRecognizer()

#or use EigenFaceRecognizer by replacing above line with 
#face_recognizer = cv2.face.createEigenFaceRecognizer()

#or use FisherFaceRecognizer by replacing above line with 
#face_recognizer = cv2.face.createFisherFaceRecognizer()
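
One note: the factory names above come from older OpenCV 3.x contrib builds. If the initialization line throws an AttributeError on a newer opencv-contrib-python release, the same recognizers should be available under the *_create naming instead (this is the newer API, not the one used in the rest of this article):

#newer OpenCV (3.3+/4.x) exposes the same recognizers with *_create factory names
face_recognizer = cv2.face.LBPHFaceRecognizer_create()

#or
#face_recognizer = cv2.face.EigenFaceRecognizer_create()
#face_recognizer = cv2.face.FisherFaceRecognizer_create()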

Now that we have initialized our face recognizer and we also have prepared our training data, it's time to train the face recognizer. We will do that by calling the train(faces-vector, labels-vector) method of face recognizer.

#train our face recognizer of our training faces
face_recognizer.train(faces, np.array(labels))

Did you notice that instead of passing labels vector directly to face recognizer I am first converting it to numpy array? This is because OpenCV expects labels vector to be a numpy array.

Still not satisfied? Want to see some action? Next step is the real action, I promise!

Prediction

Now comes my favorite part, the prediction part. This is where we actually get to see if our algorithm is recognizing our trained subjects' faces or not. We will take two test images of our celebrities, detect faces in each of them, and then pass those faces to our trained face recognizer to see if it recognizes them.

Below are some utility functions that we will use for drawing a bounding box (rectangle) around the face and putting the celebrity's name near the face bounding box.

#function to draw rectangle on image 
#according to given (x, y) coordinates and 
#given width and height
def draw_rectangle(img, rect):
    (x, y, w, h) = rect
    cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 2)
    
#function to draw text on given image starting from
#passed (x, y) coordinates. 
def draw_text(img, text, x, y):
    cv2.putText(img, text, (x, y), cv2.FONT_HERSHEY_PLAIN, 1.5, (0, 255, 0), 2)

First function draw_rectangle draws a rectangle on image based on passed rectangle coordinates. It uses OpenCV's built in function cv2.rectangle(img, topLeftPoint, bottomRightPoint, rgbColor, lineWidth) to draw rectangle. We will use it to draw a rectangle around the face detected in test image.

Second function draw_text uses OpenCV's built in function cv2.putText(img, text, startPoint, font, fontSize, rgbColor, lineWidth) to draw text on image.

Now that we have the drawing functions, we just need to call the face recognizer's predict(face) method to test our face recognizer on test images. Following function does the prediction for us.

#this function recognizes the person in image passed
#and draws a rectangle around detected face with name of the 
#subject
def predict(test_img):
    #make a copy of the image as we don't want to change the original image
    img = test_img.copy()
    #detect face from the image
    face, rect = detect_face(img)

    #predict the image using our face recognizer (predict returns a label and a confidence value)
    label, confidence = face_recognizer.predict(face)
    #get name of respective label returned by face recognizer
    label_text = subjects[label]
    
    #draw a rectangle around face detected
    draw_rectangle(img, rect)
    #draw name of predicted person
    draw_text(img, label_text, rect[0], rect[1]-5)
    
    return img
  • line-6 make a copy of the test image as we don't want to draw on the original
  • line-8 detect face from the test image
  • line-11 recognize the face by calling the face recognizer's predict(face) method. This method returns a label (and a confidence value)
  • line-13 get the name associated with the label
  • line-16 draw a rectangle around the detected face
  • line-18 draw the name of the predicted subject above the face rectangle

Now that we have the prediction function well defined, next step is to actually call this function on our test images and display those test images to see if our face recognizer correctly recognized them. So let's do it. This is what we have been waiting for.

print("Predicting images...")

#load test images
test_img1 = cv2.imread("test-data/test1.jpg")
test_img2 = cv2.imread("test-data/test2.jpg")

#perform a prediction
predicted_img1 = predict(test_img1)
predicted_img2 = predict(test_img2)
print("Prediction complete")

#create a figure of 2 plots (one for each test image)
f, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))

#display test image1 result
ax1.imshow(cv2.cvtColor(predicted_img1, cv2.COLOR_BGR2RGB))

#display test image2 result
ax2.imshow(cv2.cvtColor(predicted_img2, cv2.COLOR_BGR2RGB))

#display both images
cv2.imshow("Tom cruise test", predicted_img1)
cv2.imshow("Shahrukh Khan test", predicted_img2)
cv2.waitKey(0)
cv2.destroyAllWindows()
cv2.waitKey(1)
cv2.destroyAllWindows()
Predicting images...
Prediction complete

Woohoo! Isn't it beautiful? Indeed, it is!

End Notes

Face Recognition is a fascinating idea to work on and OpenCV has made it extremely simple and easy for us to code it. It just takes a few lines of code to have a fully working face recognition application and we can switch between all three face recognizers with a single line of code change. It's that simple.

Although EigenFaces, FisherFaces and LBPH face recognizers are good, there are even better ways to perform face recognition, like using Histograms of Oriented Gradients (HOGs) and neural networks. So the more advanced face recognition algorithms are nowadays implemented using a combination of OpenCV and machine learning. I have plans to write some articles on those more advanced methods as well, so stay tuned!

Download Details:
Author: informramiz
Source Code: https://github.com/informramiz/opencv-face-recognition-python
License: MIT License

#opencv  #python #facerecognition