Regular expressions (or regex in short) is a much-hated & underrated topic so far with Modern C++. But at the same time, correct use of regex can spare you writing many lines of code. If you have spent quite enough time in the industry. And not knowing regex then you are missing out on 20–30% productivity. In that case, I highly recommend you to learn regex, as it is one-time investment(something similar to learn once, write anywhere philosophy).
/!: This article has been originally published on my blog. If you are interested in receiving my latest articles, please sign up to my newsletter.
Initially, In this article, I have decided to include regex-in-general also. But it doesn’t make sense, as there is already people/tutorial out there who does better than me in teaching regex. But still, I left a small section to address Motivation & Learning Regex. For the rest of the article, I will be focusing on functionality provided by C++ to work with regex. And if you are already aware of regex, you can use the above mind-map as a refresher.
Pointer: The C++ standard library offers several different “flavours” of regex syntax, but the default flavour (the one you should always use & I am demonstrating here) was borrowed wholesale from the standard for ECMAScript.
\b([01]?[0-9]|2[0-3]):([0-5]\d)\b
int main() {
try {
static const auto r = std::regex(R"(\)"); // Escape sequence error
} catch (const std::regex_error &e) {
assert(strcmp(e.what(), "Unexpected end of regex when escaping.") == 0);
assert(e.code() == std::regex_constants::error_escape);
}
return EXIT_SUCCESS;
}
std::regex
is slow(as it needs regex interpretation & data structure creation at runtime), bloated and unavoidably require heap allocation(not allocator-aware). So, beware if you are using**_std::regex_**
in a loop(see C++ Weekly – Ep 74 – std::regex optimize by Jason Turner). Also, there is only a single member function that I think could be of use is std::regex::mark_count() which returns a number of capture groups.std::regex_error
to validate its correctness.int main() {
const string input = "ABC:1-> PQR:2;;; XYZ:3<<<"s;
const regex r(R"((\w+):(\w+);)");
smatch m;
if (regex_search(input, m, r)) {
assert(m.size() == 3);
assert(m[0].str() == "PQR:2;"); // Entire match
assert(m[1].str() == "PQR"); // Substring that matches 1st group
assert(m[2].str() == "2"); // Substring that matches 2nd group
assert(m.prefix().str() == "ABC:1-> "); // All before 1st character match
assert(m.suffix().str() == ";; XYZ:3<<<"); // All after last character match
// for (string &&str : m) { // Alternatively. You can also do
// cout << str << endl;
// }
}
return EXIT_SUCCESS;
}
smatch
is the specializations of std::match_results that stores the information about matches to be retrieved.std::regex_match
function fits perfectly.bool is_valid_email_id(string_view str) {
static const regex r(R"(\w+@\w+\.(?:com|in))");
return regex_match(str.data(), r);
}
int main() {
assert(is_valid_email_id("vishalchovatiya@ymail.com") == true);
assert(is_valid_email_id("@abc.com") == false);
return EXIT_SUCCESS;
}
std::regex_match
! not std::regex_search
! The rationale is simple **_std::regex_match_**
matches the whole input sequence.-O3
flag. And that is ridiculous. But don’t worry this is already been brought to the ISO C++ community. And soon we may get some updates. Meanwhile, we do have other alternatives (mentioned at the end of this article).#coding #cpp #regular-expressions #programming #expression