Regular Expressions in JavaScript for Beginners: A Comprehensive Guide

 

What is a regular expression?

Regular expressions come from the mathematical study of formal languages. A regular expression is a string that combines normal characters with special metacharacters or metasequences to describe a search pattern. Normal characters match themselves in the target string, while metacharacters and metasequences represent the location, quantity or type of characters. Regular expressions work by pattern matching: finding the part of a text that the pattern describes. The underlying code that does all this work is the regular expression engine.

Regular Expression Character Classes

JavaScript regular expressions support several types of characters and character classes, which can be used to match different types of characters, like distinguishing between letters, digits and special characters. Here are a few examples:

  • . - any character except a line terminator (such as a newline).
  • [abc] - any of a, b or c.
  • [^abc] - any character that is not a, b or c.
  • [a-g] - characters between and including a and g.
  • \w - any letter, digit or underscore. Equivalent to [a-zA-Z0-9_]
  • \d - any digit from 0 to 9. Equivalent to [0-9]
  • \s - any whitespace character (space, tab, newline, etc.).
  • \W - any character that is not a word character.
  • \D - any character that is not a digit.
  • \S - any character that is not whitespace.
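As a rough illustration of the classes listed above, the sketch below runs each one through test(). The object name classChecks and the sample strings are made up for this example.

```javascript
// Each test() call checks whether the pattern matches anywhere in the string.
const classChecks = {
  dot: /a.c/.test("abc"),            // true: "." matches "b"
  dotNewline: /a.c/.test("a\nc"),    // false: "." does not match a newline
  set: /[abc]/.test("xbz"),          // true: "b" is in the set
  negatedSet: /[^abc]/.test("abc"),  // false: every character is a, b or c
  range: /[a-g]/.test("h"),          // false: "h" is outside a-g
  word: /\w/.test("_"),              // true: underscore counts as a word character
  digit: /\d/.test("seven"),         // false: no digit in the string
  space: /\s/.test("a b"),           // true: the space is whitespace
};
console.log(classChecks);
```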

Regular Expression Anchors

Anchors match positions rather than characters, such as the beginning or end of a string, line or word.

  • ^ - start of the string (or of each line when the m flag is set).
  • $ - end of the string (or of each line when the m flag is set).
  • \b - word boundary.
  • \B - not a word boundary.
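A small sketch of how these anchors behave in practice. The variable names and the sample words are chosen only for illustration.

```javascript
const startsWithCat = /^cat/;     // "cat" at the very start of the string
const endsWithCat = /cat$/;       // "cat" at the very end of the string
const wholeWordCat = /\bcat\b/;   // "cat" as a separate word
const insideWordCat = /\Bcat\B/;  // "cat" surrounded by other word characters

console.log(startsWithCat.test("catalog"));      // true
console.log(endsWithCat.test("bobcat"));         // true
console.log(wholeWordCat.test("a cat sat"));     // true
console.log(wholeWordCat.test("concatenate"));   // false
console.log(insideWordCat.test("concatenate"));  // true
```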

Regular Expression Groups and Lookahead/Lookbehind

Groups let you treat part of a pattern as a unit, while lookahead and lookbehind let you add conditions on what comes after or before the match without including it in the result.

  • (abc) - capture abc as a group.
  • \1 - backreference to group number 1 inside the pattern. In the replacement string of the replace function, JavaScript uses $1 instead to insert the text matched by group number 1.
  • (?:abc) - non-capturing group: groups abc as part of the pattern but does not store what it matched as a numbered group.
  • (?=abc) - positive lookahead: succeeds only if abc follows the current position, but does not include abc in the match.
  • (?!abc) - negative lookahead: succeeds only if abc does not follow the current position; nothing is added to the match.
  • (?<=abc) - positive lookbehind: succeeds only if abc comes right before the current position, but does not include abc in the match.
  • (?<!abc) - negative lookbehind: succeeds only if abc does not come right before the current position.
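The group and lookahead constructs above could be tried out as in the following sketch. The sample strings and variable names are invented for this illustration.

```javascript
// Capturing groups: exec() returns the full match followed by each group.
const dateMatch = /(\d{4})-(\d{2})/.exec("released 2023-05");
console.log(dateMatch[0], dateMatch[1], dateMatch[2]); // "2023-05" "2023" "05"

// $1 and $2 in the replacement string refer to the captured groups.
const swapped = "2023-05".replace(/(\d{4})-(\d{2})/, "$2/$1");
console.log(swapped); // "05/2023"

// \1 inside the pattern matches the same text that group 1 captured.
const hasDoubleLetter = /(\w)\1/.test("letter");
console.log(hasDoubleLetter); // true ("tt")

// A non-capturing group applies the quantifier to "ab" but adds no group entry.
const nonCapturing = /(?:ab)+c/.exec("ababc");
console.log(nonCapturing.length); // 1

// Positive lookahead: digits only if " USD" follows; " USD" is not part of the match.
const price = "100 USD".match(/\d+(?= USD)/)[0];
console.log(price); // "100"

// Negative lookahead: a run of digits that is not followed by "%" (or another digit).
const notPercent = "50% 75px".match(/\d+(?!\d|%)/)[0];
console.log(notPercent); // "75"
```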

Regular Expression Quantifiers

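  • a* - 0 or more occurrences of the character a.
  • b+ - 1 or more occurrences of the character b.
  • c? - 0 or 1 occurrence of the character c.
  • a{3} - exactly 3 occurrences: matches "aaa".
  • a{3,} - 3 or more occurrences: matches "aaa", "aaaa", "aaaaa", ...
  • a{2,3} - 2 to 3 occurrences: matches "aa" or "aaa".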
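To see the quantifiers at work, here is a minimal sketch. The patterns are anchored with ^ and $ so that each test applies to the whole string; the variable names are only illustrative.

```javascript
const exactlyThree = /^a{3}$/;
const threeOrMore = /^a{3,}$/;
const twoOrThree = /^a{2,3}$/;

console.log(exactlyThree.test("aaa"));   // true
console.log(exactlyThree.test("aaaa"));  // false
console.log(threeOrMore.test("aaaaa"));  // true
console.log(twoOrThree.test("a"));       // false
console.log(twoOrThree.test("aaa"));     // true

// *, + and ? in context
console.log(/^ba*$/.test("b"));          // true  (zero "a" is allowed)
console.log(/^ab+$/.test("a"));          // false (at least one "b" is required)
console.log(/^colou?r$/.test("color"));  // true  (the "u" is optional)
```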

This was a brief introduction to regular expressions. If you want to know more, you can visit Regular Expression.

Regular Expression in JavaScript

The syntax of regular expressions in JavaScript is similar to the original formulation from Bell Labs, with some reinterpretation adopted from the Perl programming language. Regular expressions can be hard to read or understand, especially if you weren't the one who wrote them. That can be dangerous: if, for example, you write a search pattern to strip HTML tags before inserting data into the database and that regular expression is flawed, your website could be vulnerable to XSS (Cross-Site Scripting). I could give plenty of examples, but I hope you get the idea.

You need a fairly complete understanding of regular expressions to read them correctly. They tend to be extremely cryptic: easy to use in their simplest form, but quick to become confusing. Despite their obvious drawbacks, regular expressions are widely used in JavaScript.

Regular expressions in JavaScript are objects. They are used with two methods of the RegExp class, exec() and test(), and with several methods of the String class:

  • exec() - RegExp method that executes a search for a match in a specified string and returns a result array, or null.
  • test() - RegExp method that executes a search for a match between a regular expression and a specified string. Returns true or false.
  • match() - String method that retrieves the result of matching a string against a regular expression.
  • matchAll() - String method that returns an iterator of all results matching a string against a regular expression, including capturing groups. The regular expression must have the g flag.
  • replace() - String method that returns a new string with one, some, or all matches of a pattern replaced by a replacement. The pattern can be a string or a RegExp, and the replacement can be a string or a function called for each match. If the pattern is a string, only the first occurrence will be replaced. The original string is left unchanged.
  • replaceAll() - String method that returns a new string with all matches of a pattern replaced by a replacement. The pattern can be a string or a RegExp (which must have the g flag), and the replacement can be a string or a function to be called for each match. The original string is left unchanged.
  • search() - String method that executes a search for a match between a regular expression and this String object. Returns the index of the first match, or -1 if there is none.
  • split() - String method that takes a pattern and divides a String into an ordered list of substrings by searching for the pattern, puts these substrings into an array, and returns the array.
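The String methods in this list that are not demonstrated elsewhere in the article could be exercised as in the sketch below. The sample string and variable names are assumptions made for this example, and replaceAll() needs a reasonably recent JavaScript engine.

```javascript
const sentence = "one1two22three333";

// matchAll() requires the g flag and returns an iterator of match arrays.
const numbers = [...sentence.matchAll(/\d+/g)].map((m) => m[0]);
console.log(numbers); // ["1", "22", "333"]

// replaceAll() with a RegExp also requires the g flag.
const dashed = sentence.replaceAll(/\d+/g, "-");
console.log(dashed); // "one-two-three-"

// search() returns the index of the first match, or -1 when nothing matches.
const firstDigitIndex = sentence.search(/\d/);
console.log(firstDigitIndex); // 3

// split() cuts the string at every match of the pattern.
const words = sentence.split(/\d+/);
console.log(words); // ["one", "two", "three", ""]
```

Note the empty string at the end of the split() result: the input ends with a match, so there is an empty piece after it.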

Example of using regular expressions in JavaScript

JavaScript provides a built-in object called RegExp, which is used to represent regular expressions. You can create a new regular expression by either using the RegExp constructor or by using a regular expression literal. A regular expression literal is a pattern enclosed between two forward slashes (/) and can be used to create a RegExp object. For example:

// Using the RegExp constructor
let re1 = new RegExp("abc");

// Using a regular expression literal
let re2 = /abc/;

Both re1 and re2 are equivalent and match the string "abc".
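One practical difference between the two forms is worth a quick sketch: the constructor takes the pattern as a string, so backslashes must be doubled, and it is handy when part of the pattern is only known at runtime. The variable names and the sample word below are hypothetical.

```javascript
// The same pattern and flags, written both ways.
const fromLiteral = /\d+/g;
const fromConstructor = new RegExp("\\d+", "g"); // note the doubled backslash

console.log(fromLiteral.source === fromConstructor.source); // true
console.log(fromLiteral.flags === fromConstructor.flags);   // true

// Building a pattern from a value known only at runtime.
const word = "cat"; // hypothetical user input, assumed to contain no special characters
const dynamic = new RegExp("\\b" + word + "\\b", "i");
console.log(dynamic.test("The Cat sat")); // true
```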

Once you have created a regular expression, you can use it to match patterns in a string. The most common method for this is the .test() method, which returns a boolean indicating whether the regex matches the string or not. For example:

let re = /abc/;
console.log(re.test("abcdef")); // true
console.log(re.test("abxdef")); // false

Another common method is the .exec() method, which returns an array containing the matched text and any capture groups, or null if there is no match. For example:

let re = /a(b)c/;
console.log(re.exec("abcdef")); // ["abc", "b"]
console.log(re.exec("abxdef")); // null

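You can also use the .match() method, which returns an array of all matches when the regular expression has the g flag (or the first match with its capture groups when it does not), and the .search() method to get the index of the first match, or -1 if there is none.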
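A short sketch of both methods, using a made-up sample string:

```javascript
const text = "cat bat rat";

// Without the g flag: the first match, its capture groups and its position.
const first = text.match(/(\w)at/);
console.log(first[0], first[1], first.index); // "cat" "c" 0

// With the g flag: every match, but no capture groups.
const all = text.match(/\wat/g);
console.log(all); // ["cat", "bat", "rat"]

// No match at all gives null, so check before using the result.
console.log(text.match(/dog/)); // null

// search() gives the index of the first match, or -1.
console.log(text.search(/bat/)); // 4
console.log(text.search(/dog/)); // -1
```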

Lastly, the .replace() method can be used to replace matches of a regex in a string. With the g flag all occurrences are replaced; without it, only the first one. For example:

let re = /a(b)c/g;
console.log("abcdefabc".replace(re, "x")); // "xdefx"
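
To make the role of the g flag visible, and to show the function form of the replacement, here is a small sketch built on the same sample string. The variable names are chosen for this illustration only.

```javascript
const source = "abcdefabc";

// Without the g flag only the first match is replaced.
const firstOnly = source.replace(/abc/, "x");
console.log(firstOnly); // "xdefabc"

// With the g flag every match is replaced.
const everyMatch = source.replace(/abc/g, "x");
console.log(everyMatch); // "xdefx"

// The replacement can also be a function; it receives the full match
// followed by the capture groups, and its return value is inserted.
const upperGroups = source.replace(/a(b)c/g, (match, group1) => group1.toUpperCase());
console.log(upperGroups); // "BdefB"
```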

Let's use what we have learned so far in a final example that matches all IPv4 addresses in an input string:

function matchIPv4s(txt)
{
	const re = /\b(?:(?:2(?:[0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9])\.){3}(?:2(?:[0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9])\b/g; // matches any IPv4 address in the given text
	return txt.match(re); // returns an array of the matched IP addresses, or null if there are none
}


const result = matchIPv4s("Lorem ipsum dolor sit amet, consectetur adipisicing elit, \
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, \
127.0.0.1 quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. \
Duis aute irure dolor in reprehenderit 255.255.255.0 in voluptate velit esse cillum dolore eu \
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, 8.8.4.4 sunt in culpa qui \
officia deserunt mollit anim id est laborum.");
console.log(result); // will print => Array(3) [ "127.0.0.1", "255.255.255.0", "8.8.4.4" ]

As you can see, the regular expression in the third line of the function is a little hard to understand. It searches for four octets separated by dots and also checks that each octet is in the valid IPv4 range (0 to 255).

  • /g - the global flag: look for matches in the whole string, not just the first match.
  • \b - a word boundary: the position between a word character (letter, digit or underscore) and a non-word character, or the start or end of the string. In "ip 127.0.0.1 ok" there is a boundary right before "127" and right after the final "1".
  • (?:) - a non-capturing group: it groups part of the pattern without capturing what it matches.
  • [0-1]?[0-9]?[0-9] - the octet is one to three digits with a value from 0 to 199, optionally with leading zeros, such as 1, 01 or 001.
  • 2(?:[0-4][0-9]|5[0-5]) - the octet starts with 2 and its other two digits are either from 00 to 49 or from 50 to 55, which covers 200 to 255.
  • The same octet pattern is used for all four octets: the first three are each followed by a dot (\.) and repeated with {3}, and the fourth has no trailing dot.
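
The pattern above searches inside a longer text. If the goal were instead to check that a whole string is exactly one IPv4 address, one possible variation is to reuse the same octet pattern with the ^ and $ anchors. The sketch below builds it from a string so the octet is written only once; the names octet and strictIPv4 are hypothetical and not part of the original function.

```javascript
// The octet pattern from the article, written once as a string.
const octet = "(?:2(?:[0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9])";

// ^ and $ force the four octets to span the entire string.
const strictIPv4 = new RegExp("^(?:" + octet + "\\.){3}" + octet + "$");

console.log(strictIPv4.test("192.168.1.1")); // true
console.log(strictIPv4.test("256.1.1.1"));   // false (256 is out of range)
console.log(strictIPv4.test("1.2.3"));       // false (only three octets)
console.log(strictIPv4.test("1.2.3.4.5"));   // false (extra octet)
```

With \b instead of the anchors, a string such as "1.2.3.4.5" would still yield the match "1.2.3.4", which is fine for extraction but not for validation.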

Conclusion

I hope this tutorial has provided a useful introduction to regular expressions in JavaScript for beginners. While there is much more to learn about regular expressions, this should be enough to get you started on your journey. Remember to keep practicing and experimenting. Thank you.
