The same word can be written in different ways within a text. This is most common with proper nouns, which are sometimes written entirely in lowercase instead of starting with an uppercase letter.
$ grep "[Jj]ayant"
Both versions of the word are matched, regardless of the case of the first letter.
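As a minimal sketch of the bracket expression in action, the snippet below builds a small sample file and runs the same pattern against it. The file contents and variable names are made up for illustration.

```shell
# Hypothetical sample text; the contents are illustrative only.
sample=$(mktemp)
printf '%s\n' \
  'Jayant wrote the report.' \
  'Please forward it to jayant.' \
  'Nobody else was copied.' > "$sample"

# [Jj] matches exactly one character: either "J" or "j".
matches=$(grep "[Jj]ayant" "$sample")
count=$(grep -c "[Jj]ayant" "$sample")

printf '%s\n' "$matches"
printf 'matching lines: %s\n' "$count"   # matching lines: 2

rm -f "$sample"
```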
Another interesting case is the word ‘IoT’. A word like this might occur several times in a text with different capitalization (IoT, iot, IOT). To match all of these variations irrespective of case, use:
$ grep "[iI][oO][tT]"
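When case should be ignored for the whole pattern, grep's -i option is a shorter alternative to spelling out every letter pair. The sketch below compares the two on made-up sample text. It also uses -w, which is available in GNU and BSD grep (treat its availability elsewhere as an assumption), to show that the bracket pattern matches inside longer words as well.

```shell
# Hypothetical sample text; the contents are illustrative only.
sample=$(mktemp)
printf '%s\n' \
  'IoT devices are everywhere.' \
  'The iot market is growing.' \
  'IOT security matters.' \
  'A patriot spoke.' > "$sample"

# Explicit character pairs: also matches the "iot" inside "patriot".
bracket_count=$(grep -c "[iI][oO][tT]" "$sample")

# -i ignores case, -w only accepts matches that form a whole word.
word_count=$(grep -c -i -w "iot" "$sample")

printf 'bracket pattern: %s lines\n' "$bracket_count"   # 4 lines
printf 'with -i and -w:  %s lines\n' "$word_count"      # 3 lines

rm -f "$sample"
```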
Regular expressions can be used to extract mobile numbers from a text.
The format of the mobile number has to be known beforehand. For example, a regular expression designed to match mobile numbers won’t work for home telephone numbers.
In this example, mobile numbers in the following format are matched: 91-1234567890 (i.e. two digits, a separator, then ten digits).
$ grep "[[:digit:]]\{2\}[ -]\?[[:digit:]]\{10\}"
Only numbers in the above-mentioned format are matched. Because the separator is written as [ -]\?, a space or no separator at all is accepted in place of the hyphen.
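To pull out just the numbers rather than whole lines, the same pattern can be combined with -o, which prints only the matched text (supported by GNU and BSD grep). This is a sketch on made-up sample text. It uses -E so that the braces and the ? need no backslashes.

```shell
# Hypothetical sample text; the numbers are illustrative only.
sample=$(mktemp)
printf '%s\n' \
  'Call 91-1234567890 today.' \
  'Landline: 080-2345678' \
  'Alternate: 91 9876543210' > "$sample"

# Two digits, an optional space or hyphen, then ten digits.
numbers=$(grep -oE "[[:digit:]]{2}[ -]?[[:digit:]]{10}" "$sample")

# Prints the two mobile numbers; the landline is too short to match.
printf '%s\n' "$numbers"

rm -f "$sample"
```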
Extracting email addresses from a text is very useful and can be achieved using grep.
An email address has a particular format. The part before the ‘@’ is the username that identifies the mailbox. After the ‘@’ comes a domain such as gmail.com or yahoo.in.
The regular expression can be designed with this structure in mind.
$ grep -E "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}"
[Figure: input file for the email example, and the output of the grep command on input.txt]
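As a sketch of how the email pattern behaves, the snippet below runs it with -o so that only the addresses themselves are printed. The sample text and addresses are invented for illustration and use reserved example domains.

```shell
# Hypothetical sample text; the addresses are illustrative only.
sample=$(mktemp)
printf '%s\n' \
  'Contact alice@example.com or bob.smith@mail.example.org.' \
  'No address on this line.' \
  'Broken: carol@localhost' > "$sample"

# username, "@", domain labels, a dot, then a 2 to 6 letter suffix.
emails=$(grep -oE "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}" "$sample")

# carol@localhost is skipped because its domain contains no dot.
printf '%s\n' "$emails"

rm -f "$sample"
```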
A URL also has a particular format, so a regex can be built to check whether a URL is well formed.
In this example, a URL must start with http, https or ftp followed by ‘://’. Then comes the domain name, which can end with ‘.com’, ‘.in’, ‘.org’ etc.
$ grep -E "^(http|https|ftp)://([a-zA-Z0-9.-]+\.[a-zA-Z]{2,4})"
[Figure: input text in domain.txt, and the output of grep on domain.txt]
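The sketch below checks a few made-up lines against a URL pattern of this shape. It writes the separator as a plain :// and places the hyphen last in the bracket expression, because a backslash inside a POSIX bracket expression is a literal character rather than an escape. The sample URLs are invented for illustration.

```shell
# Hypothetical sample lines; the URLs are illustrative only.
sample=$(mktemp)
printf '%s\n' \
  'https://example.com/index.html' \
  'ftp://files.my-site.org' \
  'see http://example.com' \
  'http:/broken.com' \
  'mailto:someone@example.com' > "$sample"

# ^ anchors the scheme to the start of the line, so the third line is rejected.
valid_urls=$(grep -E "^(http|https|ftp)://([a-zA-Z0-9.-]+\.[a-zA-Z]{2,4})" "$sample")

# Prints the first two lines only.
printf '%s\n' "$valid_urls"

rm -f "$sample"
```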
The -E option used in this example and the previous one selects extended regular expressions (ERE) instead of the default basic regular expressions (BRE). In ERE, metacharacters such as ?, +, {, |, ( and ) do not need to be escaped with a backslash, which makes writing a complex regex less tiresome.
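The difference can be seen by writing the same repetition in both syntaxes. This is a small sketch on made-up input: in BRE the braces need backslashes to act as a repetition count, and unescaped braces are matched literally.

```shell
# Hypothetical sample lines; the contents are illustrative only.
sample=$(mktemp)
printf '%s\n' 'abbc' 'abc' 'ab{2}c' > "$sample"

# BRE: \{2\} means "exactly two of the preceding item"; matches "abbc".
bre_repeat=$(grep "ab\{2\}c" "$sample")

# ERE: the same repetition without backslashes; matches "abbc".
ere_repeat=$(grep -E "ab{2}c" "$sample")

# BRE with unescaped braces: they are ordinary characters; matches "ab{2}c".
bre_literal=$(grep "ab{2}c" "$sample")

printf 'BRE escaped:   %s\n' "$bre_repeat"
printf 'ERE unescaped: %s\n' "$ere_repeat"
printf 'BRE unescaped: %s\n' "$bre_literal"

rm -f "$sample"
```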