How to add an if condition in re.sub in Python

I am using the following code to replace the strings in words with words[0] in the given sentences.

import re
sentences = ['industrial text minings', 'i love advanced data minings and text mining']

words = ["data mining", "advanced data mining", "data minings", "text mining"]

start_terms = sorted(words, key=lambda x: len(x), reverse=True)
start_re = "|".join(re.escape(item) for item in start_terms)

results = []

for sentence in sentences:
    for terms in words:
        if terms in sentence:
            result = re.sub(start_re, words[0], sentence)
            results.append(result)
            break

print(results)

My expected output is as follows:

['industrial text minings', 'i love data mining and data mining']

However, what I am getting is:

['industrial data minings', 'i love data mining and data mining']

In the first sentence, "text minings" is not in words. However, the sentence contains "text mining", which is in the words list, so the condition "text mining" in "industrial text minings" becomes True. After the replacement, "text mining" becomes "data mining" and the trailing 's' stays where it was, giving "data minings". I want to avoid such situations.

Therefore, I am wondering whether there is a way to use an if condition in re.sub that checks whether the next character is a space: if it is, do the replacement; otherwise, do not.

I am also happy with other solutions that could resolve my issue.
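
One possible approach is a lookaround in the pattern itself rather than an if statement. The sketch below is only one way to do it: it wraps the alternation in (?<!\S) and (?!\S), which assumes that a term should count as a match only when it is bounded by whitespace or by the start or end of the string. The name pattern is introduced here for illustration.

```python
import re

sentences = ['industrial text minings', 'i love advanced data minings and text mining']
words = ["data mining", "advanced data mining", "data minings", "text mining"]

# Try the longest terms first so longer phrases win over their substrings.
terms = sorted(words, key=len, reverse=True)
alternation = "|".join(re.escape(term) for term in terms)

# (?<!\S): no non-space character directly before the match.
# (?!\S):  no non-space character directly after the match.
pattern = re.compile(r"(?<!\S)(?:" + alternation + r")(?!\S)")

results = [pattern.sub(words[0], sentence) for sentence in sentences]
print(results)
# ['industrial text minings', 'i love advanced data mining and data mining']
```

With these inputs the first sentence is left unchanged. In the second, "data minings" and "text mining" are replaced, but "advanced data minings" is not in words, so "advanced" stays. If punctuation should also be allowed to end a term, \b word boundaries may be a better fit than the whitespace lookarounds.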

An Introduction to Regex in Python

What is Regex?

Regex stands for Regular Expression and is essentially an easy way to define a pattern of characters. Regex is mostly used in pattern identification, text mining or input validation.

Regex puts a lot of people off because it looks like gibberish at first glance; the people who know how to use it, on the other hand, can’t seem to stop! It’s a very powerful tool that is worth learning if you don’t know it already.

Introduction to Regex

The first thing you need to know about regex is that you can match a specific character or word.

Let’s assume that we want to know whether a specific string contains the letter ‘a’ or the word ‘lot’. If that is the case, we can use the following Python code:

import re
str = "Learning regex can be a lot of fun"
lst = re.findall('a', str)
lst2 = re.findall('lot', str)
print(lst)
print(lst2)

which returns a list with three matches and a list with one:

['a', 'a', 'a']
['lot']

Keeping our setup the same, imagine that you want to find every occurrence of any of the three letters a, b or c. You can use a character set, written with square brackets, either by listing the letters or by giving a range:

lst = re.findall('[abc]', str)
lst2 = re.findall('[a-c]', str)
print(lst)
print(lst2)

returning:

['a', 'c', 'a', 'b', 'a']
['a', 'c', 'a', 'b', 'a']

The Regex Cheat Sheet

Every time I am about to write a complicated regular expression, my first port of call is the following list, by Dr Chuck Severance:

Python Regular Expression Quick Guide

^        Matches the beginning of a line
$        Matches the end of the line
.        Matches any character
\s       Matches whitespace
\S       Matches any non-whitespace character
*        Repeats a character zero or more times
*?       Repeats a character zero or more times (non-greedy)
+        Repeats a character one or more times
+?       Repeats a character one or more times (non-greedy)
[aeiou]  Matches a single character in the listed set
[^XYZ]   Matches a single character not in the listed set
[a-z0-9] The set of characters can include a range
(        Indicates where string extraction is to start
)        Indicates where string extraction is to end

Using the above cheat sheet as a guide, you can come up with pretty much any pattern. Let’s take a closer look at some more complicated search patterns.
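
To make a few of the cheat sheet entries concrete, here is a small sketch comparing greedy and non-greedy repetition and showing what parentheses extract. The sample string and the variable names (html, greedy, lazy, names) are made up for illustration.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ grabs as much as it can, so the whole string comes back as one match.
greedy = re.findall(r"<.+>", html)

# Non-greedy: .+? stops at the first closing bracket it can.
lazy = re.findall(r"<.+?>", html)

# Parentheses: only the part inside the group is extracted.
names = re.findall(r"<([a-z]+)>", html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
print(names)   # ['b', 'i']
```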

Stepping it Up

Imagine that you are building some sort of validation on an input field where the user can input any number followed by the letter d, m or y.

Your regex pattern would look something like this:

^[0-9]+[dmy]$

Decomposing the above: ^ anchors the match at the beginning of the string, and [0-9] matches a single digit. The + sign means there must be at least one digit, though there can be more. The digits must then be followed by d, m or y, which has to be the last character because of $.

Testing the above in python:

import re
str = '1d'
str2 = '200y'
str3 = 'y200'
lst = re.findall('^[0-9]+[dmy]$', str)
lst2 = re.findall('^[0-9]+[dmy]$', str2)
lst3 = re.findall('^[0-9]+[dmy]$', str3)
print(lst)
print(lst2)
print(lst3)

Returning:

['1d']
['200y']
[]
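
For the input-field scenario, the same pattern can be wrapped in a small validator. This is a minimal sketch, assuming the whole field must match; the names DURATION and is_valid_duration are hypothetical. It uses re.fullmatch instead of ^ and $, because $ also matches just before a trailing newline.

```python
import re

# Digits followed by exactly one of d, m or y.
DURATION = re.compile(r"[0-9]+[dmy]")

def is_valid_duration(value: str) -> bool:
    """Return True if the whole value is one or more digits followed by d, m or y."""
    return DURATION.fullmatch(value) is not None

print(is_valid_duration("1d"))    # True
print(is_valid_duration("200y"))  # True
print(is_valid_duration("y200"))  # False
print(is_valid_duration("1d\n"))  # False: the trailing newline is rejected
```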

Escaping Special Characters

When it comes to regular expressions, certain characters are special. For instance, dot, star and dollar sign are all used for matching purposes. So what happens if you want to match those characters?

In that case, we can escape them with a backslash.

import re
str = 'Sentences have dots. How do we escape them?'
lst = re.findall('.', str)
lst1 = re.findall(r'\.', str)
print(lst)
print(lst1)

The example above uses a plain dot and an escaped dot (backslash dot), and returns two lists. The plain dot matches every character, while the escaped dot matches only the literal dot.

['S', 'e', 'n', 't', 'e', 'n', 'c', 'e', 's', ' ', 'h', 'a', 'v', 'e', ' ', 'd', 'o', 't', 's', '.', ' ', 'H', 'o', 'w', ' ', 'd', 'o', ' ', 'w', 'e', ' ', 'e', 's', 'c', 'a', 'p', 'e', ' ', 't', 'h', 'e', 'm', '?']
['.']
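
When the text to match comes from a variable, escaping every special character by hand gets error-prone. The standard library's re.escape does it for you. The sketch below uses made-up sample data (price, text) to show the difference.

```python
import re

price = "$4.99"
text = "Was $4.99, now $4x99 and 4.99"

# Unescaped, $ is an end-of-line anchor and . matches any character,
# so the pattern cannot match the literal price.
unescaped = re.findall(price, text)

# re.escape puts a backslash in front of the special characters.
escaped = re.findall(re.escape(price), text)

print(unescaped)  # []
print(escaped)    # ['$4.99']
```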

Matching exact number of characters

Imagine that you want to match a date. You know that the format will be DD/MM/YYYY. Sometimes there will be two digits for the day or the month, sometimes just one, but always four for the year.

import re
str = 'The date is 22/10/2018'
str1 = 'The date is 3/1/2019'
lst = re.findall(r'[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}', str)
lst1 = re.findall(r'[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}', str1)
print(lst)
print(lst1)

Which gives the following results:

['22/10/2018']
['3/1/2019']

Extracting the matched pattern

There are times when knowing that a pattern matches is not enough. You want to be able to extract information from the match.

For instance, imagine that you are scanning a large data set looking for email addresses. Using what we have learnt so far, you could search for a pattern like this:

  • Could start with a letter, number, dot or underscore
  • Then followed by at least another letter, or number
  • Which could be followed by a dot or an underscore
  • Then there’s a @
  • Then follow the same logic again as before the @
  • Finally look for a dot followed by at least a letter

^[a-zA-Z0-9\.\_]*[a-zA-Z0-9]+[\.\_]*\@[a-zA-Z0-9\.\_]*[a-zA-Z0-9]+[\.\_]*\.[a-zA-Z]+

From the above match, you only want to extract the domain name, i.e. everything after the @. All you have to do is add parentheses around what you’re after:

import re
str = 'user@gmail.com'  # placeholder address: the original was hidden by the page's email protection
lst = re.findall(r'^[a-zA-Z0-9\.\_]*[a-zA-Z0-9]+[\.\_]*\@([a-zA-Z0-9\.\_]*[a-zA-Z0-9]+[\.\_]*\.[a-zA-Z]+)', str)
print(lst)

Returning:

['gmail.com']
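
When more than one part of the match is needed, named groups can make the extraction easier to read. The sketch below is not the pattern from above but a deliberately simplified one, and the names EMAIL, local and domain, as well as the sample address, are hypothetical.

```python
import re

# Simplified illustration only: letters, digits, dots and underscores
# on both sides of the @, ending in a dot plus letters.
EMAIL = re.compile(r"^(?P<local>[a-zA-Z0-9._]+)@(?P<domain>[a-zA-Z0-9._]+\.[a-zA-Z]+)$")

match = EMAIL.search("jane.doe@example.com")
if match:
    print(match.group("local"))   # jane.doe
    print(match.group("domain"))  # example.com
```

re.search returns a match object (or None), so each group can be read by name instead of relying on the position in the list that re.findall returns.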

In Summary

In summary, you can use regex to match patterns in strings, and it can be applied in a number of different ways. Python includes a regex module called re, which gives you access to all of this. Should you find yourself on a Unix machine, however, you can use regular expressions with grep, awk or sed. On Windows, tools like Cygwin give you access to these commands.

Thanks for reading ❤

Python GUI Programming Projects using Tkinter and Python 3

Description
Learn Hands-On Python Programming By Creating Projects, GUIs and Graphics

  • Python is a dynamic, modern, object-oriented programming language
  • It is easy to learn and can be used to do a lot of things, both big and small
  • Python is what is referred to as a high-level language
  • Python is used in the industry for things like embedded software, web development, desktop applications, and even mobile apps!
  • SQLite allows your applications to become even more powerful by storing, retrieving, and filtering through large data sets easily
  • If you want to learn to code, Python GUIs are the best way to start!

I designed this programming course to be easily understood by absolute beginners and young people. We start with basic Python programming concepts and reinforce them by developing projects and GUIs.

Why Python?

The Python coding language integrates well with other platforms – and runs on virtually all modern devices. If you’re new to coding, you can easily learn the basics in this fast and powerful coding environment. If you have experience with other computer languages, you’ll find Python simple and straightforward. This OSI-approved open-source language allows free use and distribution – even commercial distribution.

When and how do I start a career as a Python programmer?

An independent third-party survey found that the Python programming language is currently the most popular language among data scientists worldwide. This claim is supported by the Institute of Electrical and Electronics Engineers, which tracks programming languages by popularity. According to them, Python is the second most popular programming language this year for web development, after Java.

Python Job Profiles

  • Software Engineer
  • Research Analyst
  • Data Analyst
  • Data Scientist
  • Software Developer

Python Salary

The median total pay for Python jobs in California, United States is $74,410 for a professional with one year of experience. Below are graphs depicting average Python salary by city. The first chart depicts the average salary for a Python professional with one year of experience, and the second chart depicts average salaries by years of experience.

Who Uses Python?

This course gives you a solid set of skills in one of today’s top programming languages. Today’s biggest companies (and smartest startups) use Python, including Google, Facebook, Instagram, Amazon, IBM, and NASA. Python is increasingly being used for scientific computation and data analysis.

Take this course today and learn the skills you need to rub shoulders with today’s tech industry giants. Have fun, create and control intriguing and interactive Python GUIs, and enjoy a bright future! Best of luck!

Who is the target audience?

  • Anyone who wants to learn to code
  • Complete programming beginners
  • People new to Python
  • Students with little to no programming experience
  • People interested in building projects
  • Anyone looking to start with Python GUI development

Basic knowledge

  • Access to a computer
  • Download Python (free)
  • An interest in programming
  • An interest in learning Python programming
  • Install Python 3.6 on your computer

What will you learn

  • Build Python Graphical User Interfaces (GUIs) with Tkinter
  • Use the built-in Python modules for your own projects
  • Use programming fundamentals to build a calculator
  • Use advanced Python concepts to code
  • Build your GUI in Python
  • Use programming fundamentals to build a project
  • Sign-up, login and registration programs
  • Quizzes
  • Assignments
  • Job interview preparation questions
  • And much more

Guide to Python Programming Language

Description

The course will lead you from beginner to advanced level in the Python programming language. You do not need any prior knowledge of Python, of any other programming language, or of programming in general to join the course and become an expert on the topic.

The course is being developed continuously, with new lectures added regularly.

Please see the promo and free sample video to learn more.

Hope you will enjoy it.

Basic knowledge

  • An enthusiastic mind
  • A computer
  • Basic knowledge of how to use a computer
  • An internet connection

What will you learn

  • Become an expert in the Python programming language
  • Build applications in Python