1662543060
Swifty regular expressions
This is a wrapper for NSRegularExpression that makes it more convenient and type-safe to use regular expressions in Swift.
Add the following to Package.swift:
.package(url: "https://github.com/sindresorhus/Regex", from: "0.1.0")
First, import the package:
import Regex
Check if it matches:
Regex(#"\d+"#).isMatched(by: "123")
//=> true
Get first match:
Regex(#"\d+"#).firstMatch(in: "123-456")?.value
//=> "123"
Get all matches:
Regex(#"\d+"#).allMatches(in: "123-456").map(\.value)
//=> ["123", "456"]
Replacing first match:
"123🦄456".replacingFirstMatch(of: #"\d+"#, with: "")
//=> "🦄456"
Replacing all matches:
"123🦄456".replacingAllMatches(of: #"\d+"#, with: "")
//=> "🦄"
Named capture groups:
let regex = Regex(#"\d+(?<word>[a-z]+)\d+"#)
regex.firstMatch(in: "123unicorn456")?.group(named: "word")?.value
//=> "unicorn"
Pattern matching:
switch "foo123" {
case Regex(#"^foo\d+$"#):
    print("Match!")
default:
    break
}
switch Regex(#"^foo\d+$"#) {
case "foo123":
    print("Match!")
default:
    break
}
Multiline and comments:
let regex = Regex(
    #"""
    ^
    [a-z]+  # Match the word
    \d+     # Match the number
    $
    """#,
    options: .allowCommentsAndWhitespace
)
regex.isMatched(by: "foo123")
//=> true
Why are pattern strings wrapped in #?
Those are raw strings and they make it possible to, for example, use \d without having to escape the backslash.
Author: sindresorhus
Source code: https://github.com/sindresorhus/Regex
License: MIT license
#swift
1623050167
Everyone loves Mad Libs! And everyone loves Python. This article shows you how to have fun with both and learn some programming skills along the way.
Take 40% off Tiny Python Projects by entering fccclark into the discount code box at checkout at manning.com.
When I was a wee lad, we used to play at Mad Libs for hours and hours. This was before computers, mind you, before televisions or radio or even paper! No, scratch that, we had paper. Anyway, the point is we only had Mad Libs to play, and we loved it! And now you must play!
We’ll write a program called mad.py which reads a file given as a positional argument and finds all the placeholders noted in angle brackets, like <verb> or <adjective>. For each placeholder, we’ll prompt the user for the part of speech being requested, like “Give me a verb” and “Give me an adjective.” (Notice that you’ll need to use the correct article.) Each value from the user replaces the placeholder in the text, so if the user says “drive” for “verb,” then <verb> in the text is replaced with drive. When all the placeholders have been replaced with inputs from the user, print out the new text.
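The core of the program described above can be sketched in a few lines of Python. This is only a minimal sketch, not the book's solution: the function name fill_placeholders and its answers parameter (which lets the function run without keyboard input) are assumptions made for illustration, and reading the file from a positional argument is left out.

```python
import re


def fill_placeholders(text, answers=None):
    """Replace each <placeholder> in text, working left to right.

    answers is a hypothetical hook for non-interactive use: an iterable of
    replacement strings. When it is None, the user is prompted with input().
    """
    supplied = iter(answers) if answers is not None else None

    def ask(match):
        part = match.group(1)  # text between the angle brackets, e.g. "verb"
        if supplied is not None:
            return next(supplied)
        # Pick "a" or "an" so the prompt reads correctly
        article = "an" if part[0].lower() in "aeiou" else "a"
        return input(f"Give me {article} {part}: ")

    # <([^<>]+)> matches one placeholder and captures its name
    return re.sub(r"<([^<>]+)>", ask, text)


print(fill_placeholders("I like to <verb> my <adjective> car.", ["drive", "old"]))
# prints: I like to drive my old car.
```

Passing a function as the replacement to re.sub() means each placeholder is handled in the order it appears in the text, which is also the order in which a player expects to be asked.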
#python #regular-expressions #python-programming #python3 #mad libs: using regular expressions #using regular expressions
1601055000
Regular expressions are a powerful search-and-replace technique that you have probably used without even knowing it. Be it your text editor’s “Find and Replace” feature, validation of an HTTP request body using a third-party npm module, or your terminal’s ability to return a list of files based on some pattern, all of them use regular expressions in one way or another. It is not a concept that programmers absolutely must learn, but knowing it lets you reduce the complexity of your code in some cases.
_In this tutorial we will learn the key concepts as well as some use cases of regular expressions in JavaScript._
There are two ways of writing regular expressions in JavaScript. One is by creating a **literal** and the other is by using the **RegExp** constructor.
//Literal
const literalRegex = /cat/ig
//RegExp
const constructedRegex = new RegExp('cat', 'ig')
While both types of expressions will return the same output when tested on a particular string, the benefit of using the RegExp constructor is that it is evaluated at runtime, hence allowing the use of JavaScript variables for dynamic regular expressions. Moreover, as seen in this benchmark test, the RegExp constructor performs better than the literal regular expression in pattern matching.
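The idea of building a pattern at runtime from a variable is not specific to JavaScript. As a rough analogue in Python (an assumption of this sketch, not the author's code; the variable names are hypothetical), the same thing can be done with re.compile():

```python
import re

# Hypothetical search term that is only known at runtime
search_term = "cat"

# Build the pattern from the variable; re.escape() guards against
# metacharacters in the term, and re.IGNORECASE mirrors the "i" flag
pattern = re.compile(re.escape(search_term), re.IGNORECASE)

# findall() returns every match, similar to matching with the "g" flag
matches = pattern.findall("Cat, concatenate, CATALOG")
print(matches)
# prints: ['Cat', 'cat', 'CAT']
```

Escaping the variable first is a deliberate choice here: without it, a term such as "c.t" would be treated as a pattern rather than as literal text.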
The syntax in either type of expression consists of two parts:
#regular-expressions #javascript #programming #js #regex #express
1599572340
We live in a data-centric age. Data has been described as the new oil. But just like oil, data isn’t always useful in its raw form. One form of data that is particularly hard to use in its raw form is unstructured data.
A lot of data is unstructured data. Unstructured data doesn’t fit nicely into a format for analysis, like an Excel spreadsheet or a data frame. Text data is a common type of unstructured data and this makes it difficult to work with. Enter regular expressions, or regex for short. They may look a little intimidating at first, but once you get started, using them will be a picnic!
More comfortable with python? Try my tutorial for using regex with python instead:
Regular expressions are the data scientist’s most formidable weapon against unstructured text
stringr Library
We’ll use the stringr library. The stringr library is built on top of a C library, so all of its functions are very fast.
To install and load the stringr library in R, use the following commands:
## Install stringr
install.packages("stringr")
## Load stringr
library(stringr)
See how easy that is? To make things even easier, most function names in the stringr package start with str_. Let’s take a look at a couple of the functions we have available to us in this package:
str_extract_all(string, pattern): This function returns a list with a vector containing all instances of pattern in string.
str_replace_all(string, pattern, replacement): This function returns string with instances of pattern in string replaced with replacement.
You may have already used these functions. They have pretty straightforward applications without adding regex. Think back to the times before social distancing and imagine a nice picnic in the park, like the image above. Here’s an example string with what everyone is bringing to the picnic. We can use it to demonstrate the basic usage of the regex functions:
basicString <- "Drew has 3 watermelons, Alex has 4 hamburgers, Karina has 12 tamales, and Anna has 6 soft pretzels"
If I want to pull every instance of one person’s name from this string, I would simply pass the name and basicString to str_extract_all():
basicExtractAll <- str_extract_all(basicString, "Drew")
print(basicExtractAll)
The result will be a list with all occurrences of the pattern. Using this example, basicExtractAll will have the following list with 1 vector as output:
[[1]]
[1] "Drew"
Now let’s imagine that Alex left his 4 hamburgers unattended at the picnic and they were stolen by Shawn. str_replace_all can replace any instances of Alex with Shawn:
basicReplaceAll <- str_replace_all(basicString, "Alex", "Shawn")
print(basicReplaceAll)
The resulting string will show that Shawn now has 4 hamburgers. What a lucky guy 🍔.
"Drew has 3 watermelons, Shawn has 4 hamburgers, Karina has 12 tamales, and Anna has 6 soft pretzels"
The examples so far are pretty basic. There is a time and place for them, but what if we want to know how many total food items there are at the picnic? Who are all the people with items? What if we need this data in a data frame for further analysis? This is where you will start to see the benefits of regex.
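For readers who are more comfortable with Python, the picnic example and the questions raised above can be sketched with the standard re module. This is an illustrative analogue rather than the article's R code: re.findall() and re.sub() stand in for str_extract_all() and str_replace_all(), and the two patterns for quantities and names are assumptions that happen to fit this particular string.

```python
import re

basic_string = (
    "Drew has 3 watermelons, Alex has 4 hamburgers, "
    "Karina has 12 tamales, and Anna has 6 soft pretzels"
)

# Rough counterparts of str_extract_all() and str_replace_all()
print(re.findall("Drew", basic_string))  # prints: ['Drew']
replaced = re.sub("Alex", "Shawn", basic_string)

# Total number of food items: every run of digits, converted and summed
quantities = [int(n) for n in re.findall(r"\d+", basic_string)]
total_items = sum(quantities)

# People with items: a capitalized word directly followed by " has"
people = re.findall(r"[A-Z][a-z]+(?= has)", basic_string)

print(total_items)  # prints: 25
print(people)       # prints: ['Drew', 'Alex', 'Karina', 'Anna']
```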
#regex #regular-expressions #r #text-processing #unstructured-data #express
1592357333
Regular expressions, or regex, put a lot of people off just because of how they look at first glance. But once you master them, they open up a whole new level of string manipulation, and the best part is that they can be used with almost every programming language as well as with Linux commands. Regex can be used to find any kind of pattern that you can think of within a text, and once you find that text you can do pretty much whatever you want with it. That alone gives you an idea of how powerful and useful regex is.
What is Regex?
If you are reading this post, then you most probably already know what a regex is. If you don’t, here is a quick and easy definition:
Regex stands for Regular Expression and is essentially an easy way to define a pattern of characters. The most common use of regex is in pattern identification, text mining, or input validation.
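As a small illustration of the input validation use case, here is a minimal sketch. The username rule and the function name are hypothetical, chosen only to show the shape of such a check:

```python
import re

# Hypothetical rule: 3 to 16 lowercase letters, digits or underscores
USERNAME_PATTERN = re.compile(r"[a-z0-9_]{3,16}")


def is_valid_username(candidate):
    # fullmatch() succeeds only if the whole string fits the pattern
    return USERNAME_PATTERN.fullmatch(candidate) is not None


print(is_valid_username("regex_fan42"))  # prints: True
print(is_valid_username("no spaces!"))   # prints: False
```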
#regular-expressions #python #regex #python-regex #pattern-finding
1597046145
To fully utilize the power of shell scripting (and programming), one needs to master Regular Expressions. Certain commands and utilities commonly used in scripts, such as grep, expr, sed and awk, use REs.
In this article we are going to talk about Regular Expressions.
Regular Expressions are sets of characters and/or metacharacters that match (or specify) patterns. The main uses for Regular Expressions (REs) are text searches and string manipulation. An RE matches a single character or a set of characters — a string or a part of a string.
Those characters having an interpretation above and beyond their literal meaning are called metacharacters.
Regex Pattern:
Generally you define a Regex pattern by enclosing that pattern (without any additional quotes) within two forward slashes. For example, _/\w/_ and _/[aeiou]/_.
Case Sensitivity:
Note that regex engines are case sensitive by default, unless you tell the regex engine to ignore the differences in case.
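How you tell the engine to ignore case depends on the tool. As one example, here is a short sketch using Python's re module (chosen for illustration; the article itself is about shell tools, and the sample text is made up):

```python
import re

text = "Regex engines are CASE sensitive"

# By default the lowercase pattern does not match the uppercase word
print(re.search(r"case", text))                         # prints: None

# Either a flag argument or the inline (?i) flag turns off case sensitivity
print(re.search(r"case", text, re.IGNORECASE).group())  # prints: CASE
print(re.search(r"(?i)case", text).group())             # prints: CASE
```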
Regex uses:
When you scan a string (which may be multi-line) with a regex pattern, you can get the following information:
Using anchors such as \A and \Z, rather than a matching substring, we can match the whole of the given string as a unit.
Inside a pattern, all characters except (, ), [, ], {, }, |, \, ?, *, +, ., ^, and $ match themselves. If you want to match one of the special characters literally in a pattern, precede it with a backslash.
Note: Even _/_ cannot be used inside a pattern as it is; you can escape it by preceding it with a backslash.
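The effect of escaping a metacharacter can be seen in a short sketch. It uses Python's re module for illustration, where patterns are ordinary strings rather than being enclosed in forward slashes; the sample text is made up:

```python
import re

price = "Total: $5.00 (approx.)"

# Unescaped, "$" is an end-of-string anchor and "." matches any character,
# so this pattern cannot match the literal text
print(re.search(r"$5.00", price))             # prints: None

# Preceding each metacharacter with a backslash matches it literally
print(re.search(r"\$5\.00", price).group())   # prints: $5.00

# re.escape() adds the backslashes for you
escaped = re.escape("(approx.)")
print(escaped)                                # prints: \(approx\.\)
print(re.search(escaped, price).group())      # prints: (approx.)
```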
Most regular expression flavors treat the brace { as a literal character, unless it is part of a repetition operator like {1,3}. So you generally do not need to escape it with a backslash, though you can do so if you want. An exception to this rule is the java.util.regex package, which requires all literal braces to be escaped.
Escaping a Metacharacter:
The _\_ (backslash) is used to escape special characters and to give special meaning to some normal characters. For example, _\1_ is a backreference to the first capture group, _\d_ means a digit character, and _\D_ means a non-digit character. It is also used to specify non-printable characters such as _\n_ (LF), _\r_ (CR), and _\t_ (tab).
Note: You can also escape backslash with backslash.
Escaping a single meta-character with a backslash works in all regular expression flavors.
All other characters should not be escaped with a backslash. That is because the backslash is also a special character. The backslash in combination with a literal character can create a regex token with a special meaning. For example, \d will match a single digit from 0 to 9.
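The tokens mentioned in this section can be tried out quickly. The following is a small sketch using Python's re module (an illustrative choice; the sample strings are made up):

```python
import re

# \d matches a digit, \D matches any non-digit character
print(re.findall(r"\d", "a1b22"))      # prints: ['1', '2', '2']
print(re.findall(r"\D", "a1b22"))      # prints: ['a', 'b']

# \1 refers back to whatever the first capture group matched;
# here it finds a word that is immediately repeated
repeated = re.search(r"\b(\w+) \1\b", "this is is a test")
print(repeated.group(1))               # prints: is

# \t inside the pattern matches a tab character
print(re.split(r"\t", "name\tvalue"))  # prints: ['name', 'value']
```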
As a programmer, you may be surprised that characters like the single quote and double quote are not special characters.
Special characters and programming languages:
In your source code, you have to keep in mind which characters get special treatment inside strings by your programming language. That is because those characters will be processed by the compiler, before the regex library sees the string.
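Python is a convenient place to see this effect, so here is a minimal sketch in that language (chosen for illustration). The same backslash sequence means one thing to the string literal and another to the regex engine:

```python
import re

# In a normal string literal, "\b" is processed by Python first and becomes
# a backspace character, so the regex engine never sees a word boundary
print(re.findall("\bcat\b", "cat concat"))    # prints: []

# Doubling the backslash, or using a raw string, passes \b through intact
print(re.findall("\\bcat\\b", "cat concat"))  # prints: ['cat']
print(re.findall(r"\bcat\b", "cat concat"))   # prints: ['cat']

# Both spellings produce the same pattern text
print("\\d+" == r"\d+")                       # prints: True
```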
Non-Printable Characters:
You can use special character sequences to put non-printable characters in your regular expression.
Use **\t** to match a tab character (ASCII 0x09), **\r** for carriage return (0x0D) and **\n** for line feed (0x0A).
Other non-printable characters are \a (bell, 0x07), \e (escape, 0x1B), \f (form feed, 0x0C) and \v (vertical tab, 0x0B).
Windows text files use **\r\n** to terminate lines, while UNIX (Linux and Mac OS X) text files use **\n** (LF), and older versions of Mac OS use \r (CR).
You can also specify a character by its hexadecimal code. In the Latin-1 character set, the copyright symbol is character 0xA9. So to search for the copyright symbol, you can use \xA9.
If your regex engine supports Unicode, use \uFFFF rather than \xFF to insert a Unicode character. The euro currency sign occupies code point 0x20AC. If you cannot type it on your keyboard, you can insert it into a regular expression with \u20AC.
Basic vs. Extended Regular Expressions:
Refer: http://www.gnu.org/software/grep/manual/html_node/Basic-vs-Extended.html
In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).
Portable scripts should avoid { in **grep -E** patterns and should use [{] to match a literal {. Some implementations support \{ as a meta-character.
Knowing how the regex engine works will enable you to craft better regexes more easily.
The regex-directed engines are more powerful:
There are two kinds of regular expression engines: text-directed engines and regex-directed engines.
Certain very useful features, such as lazy quantifiers and backreferences, can only be implemented in regex-directed engines. No surprise that this kind of engine is more popular.
Notable tools that use text-directed engines are awk, egrep, flex, lex, MySQL and Procmail. For awk and egrep, there are a few versions of these tools that use a regex-directed engine.
You can easily find out whether the regex flavor you intend to use has a text-directed or regex-directed engine. If backreferences and/or lazy quantifiers are available, you can be certain the engine is regex-directed. You can do the test by applying the regex /regex|regex not/ to the string regex not. If the resulting match is only regex, the engine is regex-directed. If the result is regex not, then it is text-directed. The reason behind this is that the regex-directed engine is eager.
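The test described above takes two lines to run. Here it is applied to Python's re module, used purely as an example of an engine that supports backreferences and lazy quantifiers:

```python
import re

# The first alternative succeeds, so the engine stops without trying
# the longer second alternative
match = re.search(r"regex|regex not", "regex not")
print(match.group())  # prints: regex
```

The result is only regex, which by the reasoning above marks the engine as regex-directed.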
The Regex-Directed Engine Always Returns the Leftmost Match:
This is a very important point to understand: a regex-directed engine will always return the leftmost match, even if a “better” match could be found later. When applying a regex to a string, the engine will start at the first character of the string. It will try all possible permutations of the regular expression at the first character. Only if all possibilities have been tried and found to fail, will the engine continue with the second character in the text. Again, it will try all possible permutations of the regex, in exactly the same order. The result is that the regex-directed engine will return the leftmost match.
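The leftmost-match behaviour can be observed directly. The following is a small sketch with Python's re module (an illustrative choice, with a made-up sample string), in which a longer match exists later in the text but is not the one returned:

```python
import re

text = "a1 b22 c333"

# The search stops at the first position where the pattern can match,
# even though longer runs of digits appear later in the text
first = re.search(r"\d+", text)
print(first.group(), first.start())  # prints: 1 1

# A "better" (longer) match has to be asked for explicitly
longest = max(re.findall(r"\d+", text), key=len)
print(longest)                       # prints: 333
```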
#regex #patterns #programming #linux #regular-expressions