Capturing dates of a particular format from raw data

I'm trying to capture dates of the form -

20 Apr 2009

20 April 2009

20 Apr. 2009

20 April, 2009

...from raw text in a pandas dataframe. I want to get rid of the rest of the text apart from the dates.

I've been partially successful in my attempt:

df['some_column'] = df['some_column'].str.replace(r'(.*?)(\d{1,2}[ ](?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?,?[ ]\d{4})(.*?)\n\1', lambda x: x.groups()[1])

But for some cases I'm getting the preceding/succeeding text as well. Any inputs would be appreciated.
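One possible direction, sketched with the standard re module rather than pandas: search for the date alone instead of trying to match and discard the surrounding text. The sample strings and the helper name extract_date are made up for illustration; the pattern is the date part of the attempt above.

```python
import re

# Day, month name (abbreviated or full), optional "." and ",", then a 4-digit year.
DATE_PATTERN = re.compile(
    r"\d{1,2} (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?,? \d{4}"
)


def extract_date(text):
    """Return the first matching date in text, or None if there is none."""
    match = DATE_PATTERN.search(text)
    return match.group(0) if match else None


# Hypothetical sample rows, including one with an embedded newline.
samples = [
    "Seen on 20 Apr 2009 at the clinic",
    "Admitted 20 April 2009.\nFollow-up pending",
    "Note dated 20 Apr. 2009, signed",
    "Report: 20 April, 2009 (final)",
    "no date here",
]
dates = [extract_date(s) for s in samples]
print(dates)
```

In pandas, the same pattern wrapped in a single capture group could probably be passed to Series.str.extract, which keeps only the captured text and yields NaN for rows without a match.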

How to Use Regex in Java

Introduction

In this article, we will learn about Java Regex, and how to use Regex with examples in Java. Java regex is also known as Java Regular Expression.

What is Regex in Java?

Java Regex, or regular expressions, is an API for defining string patterns that can be used to search, edit, and manipulate text. For pattern matching with regular expressions, Java offers the java.util.regex package.

In other words, a regular expression is a special sequence of characters that helps us match or find other strings using a special syntax held in a pattern that is used to search, edit, or manipulate text and data.

java.util.regex package

This package provides three classes (Pattern, Matcher, and PatternSyntaxException) and a single interface (MatchResult). The Pattern and Matcher classes are the ones most commonly used.

A complete program using the regex package is listed below.

import java.util.regex.Pattern;  
  
public class RegexPackageExample {  
    public static void main(String args[]) {  
        System.out.println(Pattern.matches(".y", "toy"));  
        System.out.println(Pattern.matches("s..", "sam"));  
        System.out.println(Pattern.matches(".a", "mia"));  
    }  
} 

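The above program prints false, true, and false. Pattern.matches() succeeds only when the whole input matches the pattern, so only "sam" (three characters against s..) matches.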

PatternSyntaxException class

PatternSyntaxException is an unchecked exception that indicates a syntax error in a regular-expression pattern.

A complete program showing the PatternSyntaxException class is listed below.

import java.util.regex.Pattern;  
  
public class PatternSyntaxExceptionExample {  
    public static void main(String... args) {  
        String regex = "["; // invalid regex  
        Pattern pattern = Pattern.compile(regex);  
    }   
} 

Note

The above program uses an invalid regex, so instead of normal output, running it throws a PatternSyntaxException: Unclosed character class near index 0.

java.util.regex.Matcher

A Matcher object is the engine that interprets the pattern and performs match operations against an input string. Like the Pattern class, Matcher defines no public constructors. You obtain a Matcher object by calling the matcher() method on a Pattern object.

Methods of Matcher class

public boolean matches()

The matches() method attempts to match the entire input string against the pattern. It returns true if the whole string matches and false otherwise. It takes no arguments and does not throw any exception.

Syntax

public boolean matches();

The complete program of java.util.regex.Matcher.matches() method is listed below.

import java.util.regex.*;  
public class MatchesMethodExample {  
    public static void main(String[] args) {  
        boolean result;  
        // Get the string value to be checked  
        String value1 = "CsharpCorner";  
  
        // Create a pattern from regex  
        Pattern pattern = Pattern.compile(value1);  
  
        // Get the String value to be matched  
        String value2 = "CsharpC";  
  
        // Create a matcher for the input String  
        Matcher matcher = pattern.matcher(value2);  
  
        // Get the current matcher state  
        System.out.println("result : " + matcher.matches());  
    }  
} 

The above program prints "result : false", because the pattern CsharpCorner does not match the shorter input CsharpC.

public int start() Method

The start() method returns the start index of the previous match, that is, the index of the first character matched. It takes no arguments. It throws IllegalStateException if no match has been attempted yet or if the previous match operation failed.

Syntax

public int start();

The complete program of java.util.regex.Matcher.start() method is listed below.

import java.util.regex.*;  
  
public class StartMethodExample {  
    public static void main(String[] args) {  
  
        // Get the string value to be checked  
        String value1  = "CsharpCorner";  
  
        // Create a pattern from regex  
        Pattern pattern = Pattern.compile(value1);  
  
        // Get the String value to be matched  
        String value2 = "Csharp";  
        // Create a matcher for the input String  
        Matcher matcher = pattern.matcher(value2);  
  
        // Get the current matcher state  
        MatchResult result = matcher.toMatchResult();  
        System.out.println("Current Matcher: " + result);  
  
        while (matcher.find()) {  
            // Get the first index of match result  
            System.out.println(matcher.start());  
        }  
    }  
}

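After the Current Matcher line, the above program prints nothing further: the pattern CsharpCorner is longer than the input Csharp, so find() never succeeds and the loop that prints start() does not run.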

public boolean find() Method

The find() method attempts to find the next subsequence of the input sequence that matches the pattern. It returns true if such a subsequence is found and false otherwise. It takes no arguments and does not throw any exception.

Syntax

public boolean find()

The complete program of java.util.regex.Matcher.find() method is listed below.

import java.util.regex.*;  
public class FindMethodExample {  
    public static void main(String args[]) {  
        // Get the regex to be checked  
        String value = "CsharpCorner";  
        String value1 = "Java Programming";  
  
        // Create a string from regex  
        Pattern pattern = Pattern.compile(value);  
        Pattern pattern1 = Pattern.compile(value1);  
  
        // Get the String for matching  
        String matchString = "CsharpCorner";  
        String matchString1 ="Java";  
  
        // Create a matcher for each String  
        Matcher match = pattern.matcher(matchString);  
        Matcher match1 = pattern1.matcher(matchString1);  
        //find() method  
        System.out.println(match.find());  
        System.out.println(match1.find());  
  
    }  
} 

The above program prints true and then false: CsharpCorner is found in the first input, while the second input, Java, contains no match.

public boolean find(int start) Method

The find(int start) method resets the matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the given index. It returns a boolean value. It takes one argument, the index at which to start the search. It throws IndexOutOfBoundsException if the argument is less than zero or greater than the length of the input sequence.

Syntax

public boolean find(int start);

The complete program of java.util.regex.Matcher.find(int start) method is listed below.

import java.util.regex.*;  
  
public class FindMethodExample2 {  
    public static void main(String args[]) {  
        // Get the regex to be checked  
        String value = "CsharpCorner";  
        String value1 = "Java Programming";  
  
        // Create a string from regex  
        Pattern pattern = Pattern.compile(value);  
        Pattern pattern1 = Pattern.compile(value1);  
  
        // Get the String for matching  
        String matchString = "CsharpCorner";  
        String matchString1 = "Java";  
  
        // Create a matcher for each String  
        Matcher match = pattern.matcher(matchString);  
        Matcher match1 = pattern1.matcher(matchString1);  
        //find() method  
        System.out.println(match.find(3));  
        System.out.println(match1.find(6));  
  
    }  
} 

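The above program prints false for find(3), since no match starts at or after index 3, and then throws IndexOutOfBoundsException for find(6), because 6 is greater than the length of the input Java.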

public int end() Method

The end() method returns the offset after the last character matched. It takes no arguments. It throws IllegalStateException if no match has been attempted yet or if the previous match operation failed.

Syntax

public int end()

The complete program example of java.util.regex.Matcher.end() is listed below.

import java.util.regex.*;  
public class EndMethodExample {  
    public static void main(String[] args) {  
        Pattern p = Pattern.compile("Hello C#Corner");  
        Matcher m = p.matcher("Hello C#Corner");  
        if (m.matches())  
            System.out.println("Both are matching till " + m.end() + " character");  
        else  
            // end() would throw IllegalStateException after a failed match  
            System.out.println("Both are not matching");  
    }  
} 

The above program prints "Both are matching till 14 character", since the input is 14 characters long and matches in full.

java.util.regex.Pattern

A Pattern object is a compiled representation of a regular expression. The Pattern class provides no public constructors. To create a pattern, you invoke one of its public static compile() methods, which return a Pattern object. These methods accept a regular expression as the first argument.

The compiled pattern defines what the regex engine looks for when it is matched against input.

Methods of Pattern class

static Pattern compile(String regex)

The compile() method compiles the given regular expression into a Pattern object. It takes the regex string as its argument and returns the compiled Pattern. If the expression's syntax is invalid, it throws PatternSyntaxException.

Syntax

static Pattern compile(String regex)

A complete program using the java.util.regex.Pattern.compile() method is listed below.

import java.util.regex.Matcher;  
import java.util.regex.Pattern;  
  
public class CompileMethodExample {  
  
    public static void main(String args[]) {  
  
        // Compile the regex into a pattern  
        Pattern p = Pattern.compile(".o");  
          
        // Create a matcher for the input string  
        Matcher m = p.matcher("to");  
        boolean m1 = m.matches();  
        System.out.println(m1);  
    }  
}  

The above program prints true, because the pattern .o matches the whole input "to".

public static boolean matches(String regex, CharSequence input)

The matches() method checks whether the given input matches the given regular expression. It returns true if the entire input matches the regex and false otherwise. If the regex syntax is invalid, it throws PatternSyntaxException.

This method takes two arguments.

  • regex - The regular expression to match against.
  • input - The character sequence (for example, a String) to be checked against the regex.

The complete program of the Pattern.matches(regex, input) method is listed below.

import java.util.regex.Pattern;  
  
public class PatternClassMatchesMethod {  
    public static void main(String args[]) {  
        System.out.println(Pattern.matches("[bad]", "abcd"));  
        System.out.println(Pattern.matches("[as]", "a"));  
        System.out.println(Pattern.matches("[ass]", "asssna"));  
    }  
} 

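The above program prints false, true, and false. A character class such as [bad] matches exactly one character, so only the single-character input "a" matches.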

Summary

In this article, we learned about Java Regular Expressions (regex) in the Java programming language and the various methods of the regex classes.

Thanks for reading and keep visiting!

An Introduction to Regex in Python

What is Regex?

Regex stands for Regular Expression and is essentially an easy way to define a pattern of characters. Regex is mostly used in pattern identification, text mining, or input validation.

Regex puts a lot of people off, because it looks like gibberish on first look; as for the people who know how to use it, they can’t seem to stop! It’s a very powerful tool that is worth learning about if you don’t already know.

Introduction to Regex

The first thing you need to know about regex is that you can match a specific character or word.

Let’s assume that we want to know whether a specific string contains the letter ‘a’ or the word ‘lot’. If that is the case, we can use the following Python code:

import re
str = "Learning regex can be a lot of fun"
lst = re.findall('a', str)
lst2 = re.findall('lot', str)
print(lst)
print(lst2)

which will return a list with 3 matches and a list with 1:

['a', 'a', 'a']
['lot']

Keeping our setup the same, imagine that you want to search for any of the 3 letters a, b or c. You can use a character set, written with square brackets:

lst = re.findall('[abc]', str)
lst2 = re.findall('[a-c]', str)
print(lst)
print(lst2)

returning:

['a', 'c', 'a', 'b', 'a']
['a', 'c', 'a', 'b', 'a']

The Regex Cheat Sheet

Every time I am about to write a complicated regular expression, my first port of call is the following list, by Dr Chuck Severance:

Python Regular Expression Quick Guide

^        Matches the beginning of a line
$        Matches the end of the line
.        Matches any character
\s       Matches whitespace
\S       Matches any non-whitespace character
*        Repeats a character zero or more times
*?       Repeats a character zero or more times 
         (non-greedy)
+        Repeats a character one or more times
+?       Repeats a character one or more times 
         (non-greedy)
[aeiou]  Matches a single character in the listed set
[^XYZ]   Matches a single character not in the listed set
[a-z0-9] The set of characters can include a range
(        Indicates where string extraction is to start
)        Indicates where string extraction is to end

Using the above cheat sheet as a guide, you can come up with pretty much any pattern. Let’s take a closer look at some more complicated search patterns.
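Before moving on, here is a small sketch of three cheat-sheet entries that are easy to confuse: greedy repetition, non-greedy repetition, and parentheses for extraction. The sample line and its addresses are made up for illustration.

```python
import re

text = "From: <alice@example.org> To: <bob@example.org>"

# Greedy: .+ runs as far as it can, up to the last '>' in the line.
greedy = re.findall(r"<.+>", text)

# Non-greedy: .+? stops at the first '>' that lets the match succeed.
lazy = re.findall(r"<.+?>", text)

# Parentheses: only the part inside ( ) is returned by findall.
inside = re.findall(r"<(\S+?)>", text)

print(greedy)
print(lazy)
print(inside)
```

The greedy version returns one long match spanning both addresses, while the non-greedy versions return the two addresses separately.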

Stepping it Up

Imagine that you are building some sort of validation on an input field where the user can input any number followed by the letter d, m or y.

Your regex pattern would look something like this:

^[0-9]+[dmy]$

Decomposing the above: ^ signifies the beginning of the match, followed by a digit from 0 to 9. The + sign means there must be at least one digit, though there can be more. The string then needs to end with d, m or y, because of the $.

Testing the above in python:

import re
str = '1d'
str2 = '200y'
str3 = 'y200'
lst = re.findall('^[0-9]+[dmy]$', str)
lst2 = re.findall('^[0-9]+[dmy]$', str2)
lst3 = re.findall('^[0-9]+[dmy]$', str3)
print(lst)
print(lst2)
print(lst3)

Returning:

['1d']
['200y']
[]
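For input validation, an alternative to anchoring with ^ and $ is re.fullmatch, which succeeds only when the whole string matches the pattern. The sketch below wraps it in a helper; the function name is_valid_period is made up for illustration.

```python
import re


def is_valid_period(value):
    """Return True if value is one or more digits followed by d, m or y."""
    # fullmatch requires the entire string to match, so no ^ or $ is needed.
    return re.fullmatch(r"[0-9]+[dmy]", value) is not None


results = {v: is_valid_period(v) for v in ["1d", "200y", "y200", "12x", ""]}
print(results)
```

Returning a boolean is often more convenient for validation than checking whether findall produced an empty list.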

Escaping Special Characters

When it comes to regular expressions, certain characters are special. For instance, the dot, star and dollar sign all have special meanings in a pattern. So what happens if you want to match those characters literally?

In that case, we can use the backslash.

import re
str = 'Sentences have dots. How do we escape them?'
lst = re.findall('.', str)
lst1 = re.findall(r'\.', str)
print(lst)
print(lst1)

The above example is using dot, and backslash dot. As you would expect, it returns two results. The first one matches all characters, while the second one, only the dot.

['S', 'e', 'n', 't', 'e', 'n', 'c', 'e', 's', ' ', 'h', 'a', 'v', 'e', ' ', 'd', 'o', 't', 's', '.', ' ', 'H', 'o', 'w', ' ', 'd', 'o', ' ', 'w', 'e', ' ', 'e', 's', 'c', 'a', 'p', 'e', ' ', 't', 'h', 'e', 'm', '?']
['.']
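When the text to match comes from a variable, escaping each special character by hand is error-prone. The re module provides re.escape for this; the sketch below uses a made-up price string to show the difference.

```python
import re

text = "Was $9.99 (approx.) now $7.50"
literal = "$9.99 (approx.)"

# Unescaped, '$' means end of line, so nothing can match.
unescaped = re.findall("$9.99", text)

# re.escape backslash-escapes the special characters in the literal.
escaped = re.findall(re.escape(literal), text)

print(unescaped)
print(escaped)
```

The first search finds nothing, while the escaped one finds the literal text exactly once.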

Matching exact number of characters

Imagine that you want to match a date. You know what the format will be: DD/MM/YYYY. Sometimes there will be two digits for the day or month, sometimes just one, but always four for the year.

import re
str = 'The date is 22/10/2018'
str1 = 'The date is 3/1/2019'
lst = re.findall(r'[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}', str)
lst1 = re.findall(r'[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}', str1)
print(lst)
print(lst1)

Which gives the following results:

['22/10/2018']
['3/1/2019']

Extracting the matched pattern

There are times when knowing that a pattern matches is not enough. You want to be able to extract information from the match.

For instance, imagine that you are scanning a large data set looking for email addresses. If you use what we learnt about, you could search for a pattern of:

  • Could start with a letter, number, dot or underscore
  • Then followed by at least another letter, or number
  • Which could be followed by a dot or an underscore
  • Then there’s a @
  • Then follow the same logic again as before the @
  • Finally look for a dot followed by at least a letter

^[a-zA-Z0-9\.\_]*[a-zA-Z0-9]+[\.\_]*\@[a-zA-Z0-9\.\_]*[a-zA-Z0-9]+[\.\_]*\.[a-zA-Z]+

From the above match, you only want to extract the domain name, i.e. everything after the @. All you have to do is add brackets around what you’re after:

import re
str = 'user@gmail.com'  # placeholder address for illustration
lst = re.findall(r'^[a-zA-Z0-9\.\_]*[a-zA-Z0-9]+[\.\_]*\@([a-zA-Z0-9\.\_]*[a-zA-Z0-9]+[\.\_]*\.[a-zA-Z]+)', str)
print(lst)

Returning:

['gmail.com']
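When a pattern contains more than one pair of brackets, re.findall returns a tuple per match, one item per group. The sketch below uses a deliberately simplified email pattern (not the one above) and made-up addresses to pull out both the local part and the domain.

```python
import re

text = "Contact: jane.doe@example.com, sales@example.org"

# Two groups: the part before the @ and the domain after it.
# Simplified pattern for illustration; it is not a full email validator.
pairs = re.findall(r"([\w.]+)@([\w.]+\.[a-zA-Z]+)", text)

for local, domain in pairs:
    print(local, domain)
```

Each tuple can be unpacked directly in a loop, as shown.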

In Summary

In summary, you can use regex to match strings of data and it can be used in a number of different ways. Python includes a regex package called re, which will allow you to use this. Should you find yourself on a Unix machine however, you can use regular expression along with grep, awk or sed. On Windows should you want to access all these commands, you can use tools like Cygwin.

How to create a regex that accepts specific characters?

I have this regex:

^[[email protected]#$%&'*+-/=?^`{|}~!(),:;<>[-\]]{8,}$

I need a regex to accept a minimum word length of 8, letters(uppercase & lowercase), numbers and these characters:

!#$%&'*+-/=?^_`{|}~"(),:;<>@[]

It works when I tested it here.

This is how I used it in Java Android.

public static final String regex = "^[[email protected]#$%&'*+-/=?^`{|}~!(),:;<>[-\\]]{8,}$";

This is the error that I received.

java.util.regex.PatternSyntaxException: Missing closing bracket in character class near index 49
    ^[[email protected]#$%&'*+-/=?^`{|}~!(),:;<>[-\]]{8,}$