Regular expressions (regex for short) have so far been a much-hated & underrated topic in Modern C++. Yet, used correctly, regex can spare you from writing many lines of code. If you have spent a fair amount of time in the industry without knowing regex, you are missing out on 20–30% productivity. In that case, I highly recommend that you learn regex, as it is a one-time investment (something similar to the "learn once, write anywhere" philosophy).

Note: This article was originally published on my blog. If you are interested in receiving my latest articles, please sign up to my newsletter.

Initially, I had decided to cover regex in general in this article as well. But that doesn’t make sense, as there are already people and tutorials out there that teach regex better than I do. Still, I have left a small section to address Motivation & Learning Regex. For the rest of the article, I will focus on the functionality C++ provides for working with regex. And if you already know regex, you can use the mind-map above as a refresher.

Pointer: The C++ standard library offers several different “flavours” of regex syntax, but the default flavour (the one you should always use & I am demonstrating here) was borrowed wholesale from the standard for ECMAScript.
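
To make the point about flavours concrete, here is a minimal sketch. The helper names (all_digits_ecmascript, all_digits_posix, is_hello) are made up for illustration. It assumes only the standard flags std::regex::ECMAScript, std::regex::extended and std::regex::icase.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: default flavour (ECMAScript), where \d is available.
bool all_digits_ecmascript(const std::string& text) {
    static const std::regex r(R"(\d+)");  // no flag given, so ECMAScript is used
    return std::regex_match(text, r);
}

// Hypothetical helper: the same check in the POSIX extended flavour,
// written with a character class instead of \d.
bool all_digits_posix(const std::string& text) {
    static const std::regex r("[[:digit:]]+", std::regex::extended);
    return std::regex_match(text, r);
}

// Hypothetical helper: a flavour flag can be combined with options such as icase.
bool is_hello(const std::string& text) {
    static const std::regex r("hello", std::regex::ECMAScript | std::regex::icase);
    return std::regex_match(text, r);
}
```

Both digit helpers should agree on plain ASCII input, so unless you have a specific reason to pick another flavour, the default is the least surprising choice.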

Motivation

  • I know it is a pathetic and somewhat confusing tool-set. Consider, for example, the regex pattern below, which extracts a time in 24-hour format, i.e. HH:MM.
\b([01]?[0-9]|2[0-3]):([0-5]\d)\b
  • I mean! Who wants to work with this cryptic text?
  • And whatever is running through your mind is 100% reasonable. In fact, I have procrastinated on learning regex twice for the same reason. But, believe me, all the ugly-looking things are not that bad.
  • The way I am describing here won’t take more than 2–3 hours to learn regex, and intuitively at that. And after learning it, you will see the return on investment compound over time.
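
To show that the cryptic pattern above pays off quickly, here is a rough sketch of how it could be used from C++. The function name find_time and its return type are my own choices for illustration, not something from the standard library.

```cpp
#include <optional>
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: returns {"HH", "MM"} for the first 24-hour time
// found in `text`, or nullopt when there is none.
std::optional<std::pair<std::string, std::string>> find_time(const std::string& text) {
    static const std::regex r(R"(\b([01]?[0-9]|2[0-3]):([0-5]\d)\b)");
    std::smatch m;
    if (!std::regex_search(text, m, r)) {
        return std::nullopt;
    }
    // m[1] is the hour group, m[2] is the minute group
    return std::make_pair(m[1].str(), m[2].str());
}
```

Writing the same hour and minute range checks by hand would take noticeably more code than this one pattern.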

Learning Regex

  • Do not google too much trying to analyse which tutorial is best. In fact, don’t waste time on such analysis, because there is no point in doing so. At this point in time (well, if you don’t know regex), what really matters is “Getting Started” rather than “What Is Best!”.
  • Just go to https://regexone.com without overthinking it, and complete all the lessons. Trust me here, I have explored many articles, courses (<= this one is free, BTW) & books. But this is the best among them all for getting started without losing motivation.
  • And after that, if you still have an appetite for more problems & exercises, consider the links below:
  1. Exercises on regextutorials.com
  2. Practice problems on regex by HackerRank

std::regex & std::regex_error Example

#include <cassert>
#include <cstdlib>
#include <regex>

int main() {
    try {
        // A lone trailing backslash is an incomplete escape sequence
        static const auto r = std::regex(R"(\)");
    } catch (const std::regex_error &e) {
        // The text of e.what() differs between standard libraries,
        // so only the portable error code is checked here
        assert(e.code() == std::regex_constants::error_escape);
    }
    return EXIT_SUCCESS;
}
  • You see! I am using a raw string literal. You can also use a normal string, but in that case you have to use a double backslash for each escape sequence.
  • The current implementation of std::regex is slow (as it needs regex interpretation & data structure creation at runtime), bloated, and unavoidably requires heap allocation (it is not allocator-aware). So, beware if you are using std::regex in a loop (see C++ Weekly – Ep 74 – std::regex optimize by Jason Turner). Also, the only member function that I think could be of use is std::regex::mark_count(), which returns the number of capture groups.
  • Moreover, if you are combining multiple strings to create a regex pattern at run time, you may need exception handling, i.e. std::regex_error, to validate its correctness.
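
As a sketch of the last two points, a pattern assembled at run time could be validated like this. The helper name try_compile is hypothetical. It simply wraps the std::regex constructor and turns std::regex_error into an empty std::optional.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper: compile a pattern built at run time.
// Returns nullopt instead of throwing when the pattern is malformed.
std::optional<std::regex> try_compile(const std::string& pattern) {
    try {
        return std::regex(pattern);
    } catch (const std::regex_error&) {
        return std::nullopt;
    }
}
```

With this in place, try_compile(R"((\w+):(\w+))") should yield a regex whose mark_count() is 2, while an unbalanced pattern such as "(unclosed" should yield an empty optional. Since construction is the expensive part, it is worth compiling once and reusing the result rather than calling such a helper inside a loop.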

std::regex_search Example

#include <cassert>
#include <cstdlib>
#include <regex>
#include <string>
using namespace std;

int main() {
    const string input = "ABC:1->   PQR:2;;;   XYZ:3<<<"s;
    const regex r(R"((\w+):(\w+);)");
    smatch m;
    if (regex_search(input, m, r)) {
        assert(m.size() == 3);
        assert(m[0].str() == "PQR:2;");                // Entire match
        assert(m[1].str() == "PQR");                   // Substring that matches 1st group
        assert(m[2].str() == "2");                     // Substring that matches 2nd group
        assert(m.prefix().str() == "ABC:1->   ");      // All before 1st character match
        assert(m.suffix().str() == ";;   XYZ:3<<<");   // All after last character match
        // for (string &&str : m) { // Alternatively. You can also do
        //     cout << str << endl;
        // }
    }
    return EXIT_SUCCESS;
}
  • smatch is the specialization of std::match_results for std::string; it stores the information about the match to be retrieved.
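
Note that std::regex_search stops at the first match. One possible way to walk through every match is to resume the search from the end of the previous one, using the iterator overload. The sketch below does that. The function name find_all_pairs is hypothetical, and it assumes the pattern can never match an empty string (otherwise the loop would not advance).

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: collect every "key:value" pair found in `input`.
std::vector<std::pair<std::string, std::string>> find_all_pairs(const std::string& input) {
    static const std::regex r(R"((\w+):(\w+))");
    std::vector<std::pair<std::string, std::string>> out;
    std::smatch m;
    auto first = input.cbegin();
    // Each iteration searches the part of the input not yet consumed
    while (std::regex_search(first, input.cend(), m, r)) {
        out.emplace_back(m[1].str(), m[2].str());
        first = m[0].second;  // continue right after the whole match
    }
    return out;
}
```

For the input used in the example above, this should collect the three pairs ABC/1, PQR/2 and XYZ/3.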

std::regex_match Example

  • A short & sweet example that you will find in almost every regex book is email validation. And that is where our std::regex_match function fits perfectly.
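#include <cassert>
#include <cstdlib>
#include <regex>
#include <string_view>
using namespace std;

bool is_valid_email_id(string_view str) {
    static const regex r(R"(\w+@\w+\.(?:com|in))");
    // Use the iterator overload: string_view::data() need not be null-terminated
    return regex_match(str.begin(), str.end(), r);
}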
int main() {
    assert(is_valid_email_id("vishalchovatiya@ymail.com") == true);
    assert(is_valid_email_id("@abc.com") == false);
    return EXIT_SUCCESS;
}
  • I know this is not a foolproof email validator regex pattern. But that is not my intention either.
  • Rather, you should wonder why I have used std::regex_match and not std::regex_search! The rationale is simple: std::regex_match matches the whole input sequence.
  • Also, the noticeable thing is the static regex object, which avoids constructing (“compiling/interpreting”) a new regex object every time the function is entered.
  • The irony of the above tiny code snippet is that it produces around 30k lines of assembly, and that too with the -O3 flag. And that is ridiculous. But don’t worry, this has already been brought to the attention of the ISO C++ community, and soon we may get some updates. Meanwhile, we do have other alternatives (mentioned at the end of this article).
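
To see the difference between the two functions side by side, here is a small sketch that applies the same pattern with both. The helper names contains_number and is_number are made up for this illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if a run of digits appears anywhere in `s`.
bool contains_number(const std::string& s) {
    static const std::regex r(R"(\d+)");
    return std::regex_search(s, r);  // a match anywhere is enough
}

// Hypothetical helper: true only if `s` consists entirely of digits.
bool is_number(const std::string& s) {
    static const std::regex r(R"(\d+)");
    return std::regex_match(s, r);   // the whole input must match
}
```

For "order 66", contains_number is true while is_number is false. For "66", both are true. This is why regex_match is the natural fit for validation-style checks.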

