How to stop fearing the RegExp object and learn to love it

Originally published by Fernando Doglio at https://blog.bitsrc.io

Regular expressions are often feared by new developers. They see the strange syntax and opt to avoid them, adding extra logic to solve their needs instead of trying to understand the logic behind them.

Don’t get me wrong, I did this myself when I was starting out. Dealing with one language’s syntax is enough, so the simple idea of having to learn some strange extra syntax in order to use these regular expressions was just not my cup of tea.

The main thing that made me change my mind, and helped me decide to try and learn how to read and write them, was understanding what kind of use cases there were for them. In this article, I want to do the same for you, so let’s get started.

But first, a quick intro to Regular Expressions in JavaScript

I like to describe Regular Expressions as “Strings on steroids” (feel free to quote me on that one) and that is because of how much more you can do with them compared to the good ol’ string objects.

While your normal strings let you do things like concatenation, length calculation or even, with ES6, templating, regular expressions allow you to find patterns, do fuzzy matching, and even perform selective replacement on top of our trusted friend: the string.

I know what you’re thinking though: what about that horrible syntax?! And I’m right there with you, I’ve been using them for years now and every time I need to do something other than your basic pattern matching, I need to go online to check the correct way to do it.

That being said, how else would you have implemented it? They literally added too many features to the string entity to have them all be part of the object’s API (and not to mention Regular Expressions are part of non-object oriented languages as well, so what do you do then?).

Let me break down the basic syntax to make sure we’re all on the same page, and you’ll see how things start to make sense.

The anatomy of a Regular Expression

Just as a final disclaimer, let me confirm that I’ll be using the JavaScript flavor of Regular Expressions. If you are trying to adapt the following examples into another language make sure you check out the proper syntax since there might be minor changes.

In JavaScript, a Regular Expression can be defined in one of two ways:

1. Using the RegExp object, which is a global object available to you everywhere without having to add or require (I’m looking at you Node.js devs) anything extra.

let regExp = new RegExp('a|b');

2. Using the literal notation, which is to define it surrounded by a pair of “/”

let regExp = /a|b/;

Both versions return the same thing. I personally prefer the second one, because it doesn’t require an extra direct instantiation. The first one, though, comes in very handy if you’re trying to create the regular expression from a string (i.e., you might have a string where you define the actual expression based on different conditions). So make sure you remember both.
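
As a minimal sketch of that last point, the snippet below builds an expression from a list of words at run time, something the literal notation cannot do. The names escapeForRegExp and buildWordMatcher are made up for this illustration. The escaping step is an assumption about what you usually want: without it, any character with a special meaning in a pattern (such as the dot) would be interpreted rather than matched literally.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a pattern
function escapeForRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build an "any of these words" expression from an array
function buildWordMatcher(words) {
  const source = words.map(escapeForRegExp).join('|');
  return new RegExp(source);
}

const matcher = buildWordMatcher(['node.js', 'deno']);

console.log(matcher.source);                 // node\.js|deno
console.log(matcher.test('I use node.js'));  // true
console.log(matcher.test('I use nodexjs'));  // false, the dot was escaped
```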

Modifiers or Flags

No matter what you call them, they add extra meaning to your Regular Expressions. I’ll cover six of them here. Some you’ll be using all the time, and others maybe once or twice in your life, so let’s quickly mention them:

  • g : Performs a global search. In other words, instead of returning once the first match is found, it’ll return all matches found on the string.
  • i : Case-insensitive search. This one is pretty straight forward (and helpful), since it will ignore the case during match, otherwise words such as “Hello” and “HELLO” won’t be considered a match.
  • m : Multi-line search. With this flag, ^ and $ match at the start and end of every line in the string (wherever there is a line-breaking character), instead of only at the start and end of the whole string.
  • s : Allows . to match newline characters. Normally the dot character matches any single character, except the newline.
  • u : "unicode"; treat a pattern as a sequence of unicode code points.
  • y : Performs a "sticky" search that matches only at the position stored in the expression’s lastIndex property, instead of scanning ahead from it. This comes in handy if you’re consuming a string one match at a time, because each successful match moves lastIndex to the end of that match and the next attempt continues from there.

These flags are added to the regular expression at the end of it, like so:

//If you're using the RegExp object

let re = new RegExp('[Hh]ello', 'gm');

//If you’re going with the literal syntax

let re = /[Hh]ello/gm;
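
To make the flags less abstract, here is a small sketch that runs the same kind of search with and without some of them. The sample string is invented for the example, and the results in the comments assume an engine that supports the s flag (it was added in ES2018).

```javascript
const text = 'Hello world\nhello again';

// Without g, match() stops at the first hit; with g it returns every hit
console.log(text.match(/hello/i)[0]);   // Hello
console.log(text.match(/hello/gi));     // [ 'Hello', 'hello' ]

// Without m, ^ only matches at the very start of the string
console.log(text.match(/^hello/g));     // null
console.log(text.match(/^hello/gm));    // [ 'hello' ]

// Without s, the dot refuses to cross the line break
console.log(/world.hello/.test(text));  // false
console.log(/world.hello/s.test(text)); // true

// With y, the match must begin exactly at lastIndex
const sticky = /hello/y;
sticky.lastIndex = 12;                  // index where the second line starts
console.log(sticky.test(text));         // true
console.log(sticky.lastIndex);          // 17
```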

That’s about it for my custom intro to Regular Expressions. If you want more details about how they work, check out the documentation, but first, stick around and look at the following practical examples so you have something concrete in mind when you read the docs.

Regular Expression Use Cases

The following four use cases are meant to show you how useful Regular Expressions are, not only in your code’s logic but also in your editor: most IDEs support them for searching and replacing text in your code.

Password pattern matching

Have you ever seen one of those messages when trying to create an account on your favorite site, saying: “Your password must have at least 8 characters, at least an uppercase letter, a lowercase letter, a number and probably a symbol so you make sure you’ll never remember it in the future”?

OK, maybe that last part is mine, but you get the point: they describe a pattern you need to follow in order to provide a valid password. You can of course, use simple JavaScript code to validate that, but why would you if you can write a single line that describes the entire pattern?

You can use the following Regular Expression for that:

/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*\W).{8,}$/g

Here’s a quick snippet for you to test:

let re = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*\W).{8,}$/g

let passwords = ["Fernando", "f3rn4", "F3rnand0!", "fernando123!"]

passwords.forEach( p => {
  let matches = p.match(re)
  if(!matches) console.log(p, "INVALID PASSWORD")
  else console.log(p, "is a valid password!")
})

/*
Fernando INVALID PASSWORD
f3rn4 INVALID PASSWORD
F3rnand0! is a valid password!
fernando123! INVALID PASSWORD
*/

Essentially, we’re using something called “positive lookaheads”. These are sections of the expression that the engine checks for ahead of the current position without consuming any characters, which is why the order in which the required characters appear doesn’t matter. Everything inside the (?=…) is the condition that we care about.

  • (?=.*[a-z]) essentially means that somewhere ahead there must be a lowercase letter, preceded by any number of characters (that is what the .* allows).
  • (?=.*[A-Z]) just like the previous one, but instead of lowercase, it requires an uppercase letter.
  • (?=.*\d) requires a digit (a number) somewhere in the string.
  • (?=.*\W) requires a symbol, or more precisely any character that is not a letter, a digit or an underscore.
  • .{8,} makes sure the length of the match is at least 8 characters (any character thanks to the dot there).
  • ^ and $ make sure the match starts at the beginning of the string (thanks to the caret at the start of the expression) and ends at the end of the string (thanks to the dollar sign). Essentially, only whole-string matches are allowed. Partial matches aren’t considered.

If all the above conditions are met, then the match is returned, otherwise it won’t be a valid password.
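
One practical downside of the single expression is that it only answers yes or no. As a rough sketch (the rule names and the checkPassword function are made up for this example, they are not part of any library), each condition can also be tested with its own small expression, so you can tell the user which rule failed:

```javascript
// Each rule is one condition of the password pattern, tested on its own
const rules = [
  { name: 'lowercase letter', re: /[a-z]/ },
  { name: 'uppercase letter', re: /[A-Z]/ },
  { name: 'digit', re: /\d/ },
  { name: 'symbol', re: /\W/ },
  { name: 'at least 8 characters', re: /^.{8,}$/ },
];

// Hypothetical helper: returns the names of the rules a password breaks
function checkPassword(password) {
  return rules.filter(rule => !rule.re.test(password)).map(rule => rule.name);
}

console.log(checkPassword('F3rnand0!'));    // []
console.log(checkPassword('fernando123!')); // [ 'uppercase letter' ]
console.log(checkPassword('f3rn4'));
// [ 'uppercase letter', 'symbol', 'at least 8 characters' ]
```

None of these expressions use the g flag, so calling test() repeatedly on them is safe.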

Email Format Checker

I’ve had to implement this one, probably close to a million times back when I was doing Web Development. How many times have you seen the message “Invalid Email format” in your sign-up form? Nowadays the input element of type “email” already performs this validation.

That being said, if you’re working on a back-end validation or, for some reason, don’t have access to this field, Regular Expressions can help you validate this format in a single line of code, instead of having several different IF statements.

Here is the magic Regular Expression to check the format of an email address:

/^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/

I know, that’s a lot, but if you look closely, you can identify all three parts of the expected address format in there:

First, we check if the username is valid. This simply checks that only valid characters are being used and that at least one of them was added (that’s what the “+” at the end means):

^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+

Then, we’re checking for the @ character and the host name:

@[a-zA-Z0-9-]+

Again, nothing fancy, the host name needs to be alphanumeric and have at least one character.

The last, optional part, takes care of checking the TLD (Top Level Domain), or basically the domain name extension:

(?:\.[a-zA-Z0-9-]+)*$/

And you can tell this part is optional, because of the * at the end. That means 0 or more instances of that group (the group is delimited by the parentheses) are allowed (so .com would match, but also .co.uk ). The ?: at the start of the group only tells the engine not to capture it.

Here is a quick snippet showing the expression at work:

let emailRE = /^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/

let emails = ["fernando", "fernadno@", "fernando@test", "fernando@test.com", "valid_email123@host2.com", "a@1.com"]

emails.forEach( p => {
  let matches = p.match(emailRE)
  if(!matches) console.log(p, "INVALID EMAIL")
  else console.log(p, "is a valid email!")
})

/*
fernando INVALID EMAIL
fernadno@ INVALID EMAIL
fernando@test is a valid email!
fernando@test.com is a valid email!
valid_email123@host2.com is a valid email!
a@1.com is a valid email!
*/
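
If you need this check in more than one place, one option is to wrap the email expression from this section in a small function. This is only a sketch: the hasEmailFormat name and the EMAIL_FORMAT constant are made up for the example. The expression is deliberately written without the g flag, because test() on a global expression remembers its lastIndex between calls and can give surprising results when the same expression is reused.

```javascript
// The email pattern from this section, without the g flag
const EMAIL_FORMAT = /^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/;

// Hypothetical helper name, not part of any library
function hasEmailFormat(value) {
  return typeof value === 'string' && EMAIL_FORMAT.test(value);
}

console.log(hasEmailFormat('fernando@test.com')); // true
console.log(hasEmailFormat('fernadno@'));         // false
console.log(hasEmailFormat(42));                  // false
```

Keep in mind that this only checks the format. It says nothing about whether the address actually exists.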

Smart Character Replacement

Enough with the pattern validation, let’s do some string modifications, shall we?

This is another area where Regular Expressions shine by allowing you to do some very intricate character replacement. For this particular example, I’m going to show you how to turn camel case notation (you know, the one where youWriteEverythingLikeThis) into normal notation. It’s a quick example, but should be enough to show you what you can do with capturing groups.

Now, before looking at the code, think about it for a second: how would you go about doing this without a Regular Expression? You would probably require some sort of list of capitalized letters and run a replace routine for each and every one of them. There are probably other ways, but that one’s the easiest I can think of.

Here is the Regular Expression alternative:

let camelRE = /([A-Z])/g

let phrase = "thisIsACamelCaseString"

console.log(phrase.replace(camelRE, " $1"))

/*
this Is A Camel Case String
*/

Yeap, that is it! The capturing group (the parentheses and everything inside them) saves the matching part and you can reference it with “$1”. If you had more than one group, you would increment that number ($2, $3 and so on). The point here is that the expression matches a single uppercase character at a time, every occurrence of it anywhere in the string (thanks to the trailing g flag there), and you replace each one (thanks to the replace method call) with itself prefixed by a blank space.
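
To illustrate the numbered references with more than one group, here is a small sketch that reorders a date. The date string and the variable names are invented for the example. The replace method also accepts a function instead of a replacement string, in which case the captured groups arrive as arguments, which is shown in the second half.

```javascript
const dateRE = /(\d{4})-(\d{2})-(\d{2})/;
const isoDate = '2019-03-25';

// $1 is the year, $2 the month and $3 the day
const reordered = isoDate.replace(dateRE, '$3/$2/$1');
console.log(reordered);   // 25/03/2019

// With a function, the first argument is the whole match,
// followed by one argument per capturing group
const snakeCase = 'thisIsACamelCaseString'.replace(
  /([A-Z])/g,
  (match, letter) => '_' + letter.toLowerCase()
);
console.log(snakeCase);   // this_is_a_camel_case_string
```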

Let me show you now a more complex case of string replacement.

Old School Function to Arrow Function

This one is interesting, because you can write some code for it for fun, or in a more realistic scenario, you might be doing this using your IDE’s Search & Replace feature!

Considering that arrow functions are relatively new there is still a lot of legacy code that is not using them and you might be inclined to want to switch, but modifying every function manually can take forever, so instead, you can use a Regular Expression.

And to make things clear, I want to turn this:

function sayHello(first_name, last_name){
  console.log("Hello there ", first_name, last_name)
}

Into this:

const sayHello = (first_name, last_name) => {
  console.log("Hello there ", first_name, last_name)
}

So essentially, we need to capture the function’s name, its parameters list and its content, and then restructure it so we remove the function word and create the new constant. In other words, we need three capturing groups, and here they are:

function (.+)(\(.+\))(\{.+\})

Then it’s just a matter of calling the replace method. Again, you can probably use your favorite IDE for this, but here is a quick Node.js script to play with:

const fs = require("fs")

const regExp = /function (.+)(\(.+\))(\{.+\})/gms

fs.readFile("./test2.js", (err, cnt) => {
  // Stop early if the file could not be read
  if (err) throw err
  console.log(cnt.toString().replace(regExp, "const $1 = $2 => $3"))
})

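The above code will output our desired arrow function. Keep in mind that the quantifiers in the expression are greedy, so it is meant for simple input such as the example above, and more involved files may need a more specific expression. The other thing to consider is the flags I used. Because the function body spans several lines, we need the dot character to match new line characters as well, which is what the s flag does. The m flag only changes how ^ and $ behave, so it isn’t strictly required here, and the g flag makes the replacement apply to every match instead of only the first one.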
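
If you would rather try the replacement without creating a file, the sketch below applies it to a string held in memory. It assumes the three-group expression described above, with the parentheses and braces escaped, and the variable names are made up for the example. Because the quantifiers are greedy, this is only meant for input that contains a single simple function.

```javascript
// Sample input: one old-school function as a string
const legacySource = [
  'function sayHello(first_name, last_name){',
  '  console.log("Hello there ", first_name, last_name)',
  '}',
].join('\n');

// Group 1: the name, group 2: the parameter list, group 3: the body.
// The s flag lets the dot match the line breaks inside the body.
const functionRE = /function (.+)(\(.+\))(\{.+\})/s;

const converted = legacySource.replace(functionRE, 'const $1 = $2 => $3');
console.log(converted);

// Without the s flag the body cannot span lines, so nothing matches
const withoutDotAll = /function (.+)(\(.+\))(\{.+\})/;
console.log(withoutDotAll.test(legacySource));   // false
```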

And with that being said, this concludes the list of practical use cases I wanted to show you.

Conclusion

Hopefully by now, with the above examples, you’ve seen the power that Regular Expressions can bring to the table and that, even though they’re not pretty to look at, they’re not that hard to understand either.

So if you haven’t already, give them a shot and try to add this new tool to your development tool set.

Leave a comment below if you’re not new to Regular Expressions and tell us how you’re using them!

