Regular Expressions: An Overview

A regular expression, or regex for short, is a pattern that describes a set of strings. In its most general form, it defines a set without listing out every possible element of the set. Regular expressions provide a flexible means of identifying text of interest, including words, special characters, and character patterns. In programming, we often use regular expressions to validate data or to split and manipulate text around delimiters.
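As a minimal sketch of those two uses, the Python snippet below validates a field and splits a line on delimiters using the standard re module. The helper names is_all_digits and split_fields, and the choice of commas and semicolons as delimiters, are assumptions made for illustration only.

```python
import re

def is_all_digits(text):
    """Validation: True only when the whole string is one or more digits."""
    return re.fullmatch(r"[0-9]+", text) is not None

def split_fields(line):
    """Manipulation: split a line wherever a comma or semicolon appears."""
    return re.split(r"[,;]", line)

print(is_all_digits("2024"))            # True
print(is_all_digits("20a4"))            # False
print(split_fields("red,green;blue"))   # ['red', 'green', 'blue']
```

The pattern syntax used here (character classes and the + quantifier) is described in the sections that follow.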

Special Characters

Regular expressions give special meaning to the following characters: [\^$.|?*+(). If you want to match any of these characters literally, you have to escape it with the backslash character (\). For example, if you wanted your pattern to match any . in your string, you would use the pattern \.
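To illustrate the difference escaping makes, here is a short Python sketch using the standard re module. The sample strings are made up for the example. The last lines use re.escape, which adds the backslashes to a literal string for you.

```python
import re

text = "a.b axb"

# Unescaped, the dot is special and matches any character.
any_char = re.findall(r"a.b", text)
print(any_char)      # ['a.b', 'axb']

# Escaped with a backslash, the dot matches only a literal period.
literal_dot = re.findall(r"a\.b", text)
print(literal_dot)   # ['a.b']

# re.escape builds a pattern that matches the given string literally.
literal = "1+1=2"
escaped = re.escape(literal)
print(re.fullmatch(escaped, literal) is not None)   # True
```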

  • [
    The [ character denotes the start of a Character Class. A Character Class matches a single character out of all the possibilities listed before the close of the Character Class with ]. Full details on a Character Class can be found within the Basic Syntax section.
    Example: [abc] => matches the characters a or b or c

  • \
    The backslash character is used to escape a special character and suppress its special meaning. It is also used in combination with other characters to represent tabs (\t), newlines (\n), and other characters.
  • ^
    The caret matches at the beginning of the string
    Example: ^start would match a string that begins with start
  • $
    The dollar sign matches at the end of the string
    Example: end$ would match a string that terminates with end
  • .
    The dot is used to represent any single character
    Example: a.b would match any string of the form a, any one character, b, e.g. aab, acb, arb, a.b
  • |
    The pipe character is used to represent the logical or operator
    Example: a|b would match a or b
  • ?
    The question mark makes the preceding character optional in the pattern.
    Example: ab? would match a or ab
  • *
    The star character is used to make the preceding character repeat zero or more times.
    Example: ab* would match a, ab, abb, abbb, etc.
  • +
    The plus character is used to make the preceding character repeat one or more times.
    Example: ab+ would match ab, abb, abbb, etc.
  • ()
    The parentheses are used to group operations.
    Example: a(b|c) would match ab and ac
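The examples in the list above can be checked with a short Python sketch using the standard re module. The helper matches and the candidate strings are hypothetical, chosen only to exercise each special character. Note that the dollar sign is written after the text it anchors (end$).

```python
import re

def matches(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

print(matches(r"ab?", ["a", "ab", "abb"]))         # ['a', 'ab']
print(matches(r"ab*", ["a", "ab", "abbb", "b"]))   # ['a', 'ab', 'abbb']
print(matches(r"ab+", ["a", "ab", "abb"]))         # ['ab', 'abb']
print(matches(r"a(b|c)", ["ab", "ac", "ad"]))      # ['ab', 'ac']
print(matches(r"a.b", ["aab", "a.b", "ab"]))       # ['aab', 'a.b']

# Anchors are easiest to see with re.search, which scans the whole string.
print(bool(re.search(r"^start", "start here")))     # True
print(bool(re.search(r"^start", "a fresh start")))  # False
print(bool(re.search(r"end$", "the end")))          # True
print(bool(re.search(r"end$", "endless")))          # False
```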

Character Class

A Character Class matches a single character out of the set of characters listed between the brackets. To make it easier to understand, it acts as a grouping of or statements, where [abc] is the same as (a|b|c), with the added benefit of supporting special characters that create additional patterns without having to list out the whole set. It is also important to note that only the characters -]^\ are special inside a Character Class; the other special characters listed above do not need to be escaped when used inside one.


  • -
    A hyphen character is used to denote a character range, except if it is placed immediately after the opening bracket, where it is treated as a literal hyphen.
    Example: [a-z] would match all lowercase letters
  • ^
    A caret character immediately after the opening bracket is used to negate the Character Class so that it means the opposite.
    Example: [^abc] would match any character except a, b, and c
  • \
    The backslash character is used to escape any special characters within the Character Class
    Example: [\^\]] would match the characters ^ and ]
  • ]
    The right bracket character is used to denote the close of a Character Class
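The Character Class rules above can be tried out with the following Python sketch, again using the standard re module. The sample inputs are assumptions picked for illustration; each class is applied with re.findall, which returns every single-character match in order.

```python
import re

letters = ["a", "b", "c", "d"]

# [abc] and (a|b|c) accept the same single characters.
by_class = [c for c in letters if re.fullmatch(r"[abc]", c)]
by_group = [c for c in letters if re.fullmatch(r"(a|b|c)", c)]
print(by_class == by_group)                 # True

# A hyphen between two characters denotes a range.
print(re.findall(r"[a-z]", "aB3c"))         # ['a', 'c']

# A caret right after the opening bracket negates the class.
print(re.findall(r"[^abc]", "abcd1"))       # ['d', '1']

# Backslashes escape the characters that are special inside a class.
print(re.findall(r"[\^\]]", "x^y]z"))       # ['^', ']']

# Other special characters are literal inside a class, no escaping needed.
print(re.findall(r"[.+]", "1.5+2"))         # ['.', '+']
```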

Summary

Using grouping and character classes, you can create powerful expressions that perform tasks such as data validation and search and replace. For further information, you can review the Regular Expression page on Wikipedia or the site regular-expressions.info
