Regular Expressions in PHP
Supported REs
• The most important set of regex functions starts with preg_. These functions are a PHP wrapper around the PCRE (Perl-Compatible Regular Expressions) library.
• The oldest set of regex functions is the ereg family. These functions implement POSIX Extended Regular Expressions (POSIX: Portable Operating System Interface [for Unix]).
PCRE library (Perl-Compatible Regular Expressions)
• When using the PCRE functions, the pattern must be enclosed by delimiters. A delimiter can be any non-alphanumeric, non-backslash, non-whitespace character.
• Commonly used delimiters are forward slashes (/), hash signs (#) and tildes (~).
• The following are all examples of valid delimited patterns:
    • /foo bar/
    • #^[^0-9]$#
    • +php+
    • %[a-zA-Z0-9_-]%
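As a minimal sketch, the four delimited patterns above can be passed to preg_match() unchanged. The subject strings are made up for illustration and are not from the slides.

```php
<?php
// Pattern => sample subject (the subjects are assumptions for illustration).
$cases = [
    '/foo bar/'       => 'a foo bar b',  // slash delimiters
    '#^[^0-9]$#'      => 'x',            // hash delimiters: exactly one non-digit
    '+php+'           => 'I like php',   // plus signs as delimiters
    '%[a-zA-Z0-9_-]%' => '!!!',          // percent delimiters: no word character here
];

$results = [];
foreach ($cases as $pattern => $subject) {
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    $results[$pattern] = preg_match($pattern, $subject);
}

var_dump($results);
```

The first three subjects match and return 1. The last returns 0, because "!!!" contains none of the characters in the class.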
Delimiters
• If the delimiter needs to be matched inside the pattern, it must be escaped using a backslash.
• If the delimiter appears often inside the pattern, it is a good idea to choose another delimiter in order to increase readability.
• Slash delimiters, with each literal slash escaped: /http:\/\//
• Hash delimiters, with no escaping needed: #http://#
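The two patterns above can be compared directly. This sketch assumes a made-up URL as the subject. It also uses preg_quote(), whose optional second argument escapes the chosen delimiter along with the regex meta-characters.

```php
<?php
$url = 'http://example.com/';  // sample subject (assumed)

// Slash delimiters: every literal slash in the pattern needs a backslash.
$withSlashes = preg_match('/http:\/\//', $url);

// Hash delimiters: the same pattern, with the slashes left readable.
$withHashes = preg_match('#http://#', $url);

// preg_quote() escapes meta-characters in a literal string; passing '/'
// as the second argument makes it escape the slash delimiter as well.
$quoted = preg_quote('http://', '/');
$withQuoted = preg_match('/' . $quoted . '/', $url);

var_dump($withSlashes, $withHashes, $withQuoted);
```

All three calls match the same text. Only the readability of the pattern changes.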
Meta-characters
• The power of regular expressions comes from the ability to include alternatives and repetitions in the pattern.
• These are encoded in the pattern by the use of meta-characters. Outside square brackets, the meta-characters include:
    • \ : general escape character with several uses
    • ^ : assert start of subject (or line, in multiline mode)
    • $ : assert end of subject (or line, in multiline mode)
    • . : match any character except newline (by default)
    • [ : start character class definition
    • ] : end character class definition
    • | : start of alternative branch
    • ( : start subpattern
    • ) : end subpattern
    • ? : extends the meaning of (, also 0 or 1 quantifier, also makes greedy quantifiers lazy
    • * : 0 or more quantifier
    • + : 1 or more quantifier
    • { : start min/max quantifier
    • } : end min/max quantifier
• The part of a pattern that is in square brackets is called a "character class". In a character class the only meta-characters are:
    • \ : general escape character
    • ^ : negate the class, but only if it is the first character
    • - : indicates character range
    • ] : terminates the character class
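A short sketch of several of these meta-characters in use. The patterns and subjects are illustrative assumptions. The s modifier shown at the end is the PCRE option that lets the dot match a newline as well.

```php
<?php
// ^ and $ anchor the match; ( ) groups; | separates alternatives.
$exact   = preg_match('/^(cat|dog)$/', 'dog');     // 1
$partial = preg_match('/^(cat|dog)$/', 'hotdog');  // 0

// ? makes the "u" optional; {2,4} allows two to four digits.
$year  = preg_match('/^colou?r [0-9]{2,4}$/', 'colour 2024');  // 1
$short = preg_match('/^colou?r [0-9]{2,4}$/', 'color 7');      // 0

// + is greedy by default; +? is the lazy form.
preg_match('/<.+>/', '<b>bold</b>', $greedy);   // $greedy[0] is '<b>bold</b>'
preg_match('/<.+?>/', '<b>bold</b>', $lazy);    // $lazy[0] is '<b>'

// Inside a class: a leading ^ negates, and a trailing - is a literal hyphen.
$noDigits = preg_match('/^[^0-9]+$/', 'a1c');      // 0
$hyphen   = preg_match('/^[a-z-]+$/', 'foo-bar');  // 1

// . skips newlines by default; the s modifier changes that.
$dotDefault = preg_match('/a.b/', "a\nb");   // 0
$dotAll     = preg_match('/a.b/s', "a\nb");  // 1
```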
Escape Sequences
• The backslash character has several uses.
• Firstly, if it is followed by a non-alphanumeric character, it takes away any special meaning that character may have.
• This use of backslash as an escape character applies both inside and outside character classes.
• For example, to match a "*" character, write "\*" in the pattern.
• A second use of backslash provides a way of encoding non-printing characters in patterns in a visible manner:
    • \a : alarm, that is, the BEL character (hex 07)
    • \f : formfeed (hex 0C)
    • \n : newline (hex 0A)
    • \t : tab (hex 09)
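Both uses of the backslash can be sketched as follows. The subjects are made-up samples. The patterns are single-quoted so that PHP passes the backslash through to PCRE untouched.

```php
<?php
// Escaped: \* is a literal asterisk, so the whole "2*3" is matched.
preg_match('/2\*3/', '2*3', $escaped);    // $escaped[0] is '2*3'

// Unescaped: * is a quantifier ("zero or more 2s, then 3"), so only "3" matches.
preg_match('/2*3/', '2*3', $unescaped);   // $unescaped[0] is '3'

// The escape also works inside a character class: [\]] matches a literal "]".
$bracket = preg_match('/[\]]/', 'a]b');   // 1

// \t and \n in the pattern stand for the tab and newline characters.
$columns = preg_split('/\t/', "x\ty\tz");               // ['x', 'y', 'z']
$twoLine = preg_match('/one\ntwo/', "one\ntwo");        // 1
```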
PCRE Functions
• preg_filter: perform a regular expression search and replace
• preg_grep: return array entries that match the pattern
• preg_last_error: return the error code of the last PCRE regex execution
• preg_match_all: perform a global regular expression match
• preg_match: perform a regular expression match
• preg_quote: quote regular expression characters
• preg_replace_callback: perform a regular expression search and replace using a callback
• preg_replace: perform a regular expression search and replace
• preg_split: split string by a regular expression
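A minimal tour of most of these functions, run on one made-up sample string. The variable names are illustrative only.

```php
<?php
$text = 'apple 12, banana 7, cherry 30';  // sample input (assumed)

// preg_match(): first match only. $m[0] is the whole match, $m[1..] the groups.
preg_match('/([a-z]+) ([0-9]+)/', $text, $m);       // ['apple 12', 'apple', '12']

// preg_match_all(): every match; the return value is the number of matches.
$count = preg_match_all('/[0-9]+/', $text, $all);   // 3; $all[0] is ['12', '7', '30']

// preg_replace(): replace every match with a fixed string.
$masked = preg_replace('/[0-9]+/', '#', $text);     // 'apple #, banana #, cherry #'

// preg_replace_callback(): compute each replacement in a function.
$doubled = preg_replace_callback(
    '/[0-9]+/',
    function (array $hit): string {
        return (string) ((int) $hit[0] * 2);
    },
    $text
);                                                  // 'apple 24, banana 14, cherry 60'

// preg_split(): split on a pattern instead of a fixed string.
$items = preg_split('/, /', $text);                 // ['apple 12', 'banana 7', 'cherry 30']

// preg_grep(): keep matching array entries; the original keys are preserved.
$twoDigit = preg_grep('/[0-9]{2}/', $items);        // [0 => 'apple 12', 2 => 'cherry 30']

// preg_quote(): make a literal string safe to embed in a pattern.
$literal = preg_match('/' . preg_quote('1+1=2', '/') . '/', 'so 1+1=2 holds');  // 1

// preg_last_error(): PREG_NO_ERROR when the last call ran cleanly.
$clean = preg_last_error() === PREG_NO_ERROR;
```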
POSIX Regex Functions
• ereg_replace: replace regular expression
• ereg: regular expression match
• eregi_replace: replace regular expression, case insensitive
• eregi: case-insensitive regular expression match
• split: split string into array by regular expression
• spliti: split string into array by regular expression, case insensitive
• sql_regcase: make regular expression for case-insensitive match
ereg
• (PHP 4, PHP 5)
• ereg: regular expression match
• Syntax: int ereg ( string $pattern , string $string [, array &$regs ] )
• Searches string for matches to the regular expression given in pattern, in a case-sensitive way.
• Parameters:
    • pattern: case-sensitive regular expression.
    • string: the input string.
    • regs: if matches are found for parenthesized substrings of pattern and the function is called with the third argument regs, the matches are stored in the elements of the array regs.
• $regs[1] contains the substring that starts at the first left parenthesis, $regs[2] the substring starting at the second, and so on. $regs[0] contains a copy of the complete string matched.
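Example
• This code snippet takes a date in ISO format (YYYY-MM-DD) and prints it in DD.MM.YYYY format. It runs only on PHP versions earlier than 7.0, where ereg() is still available. The $date assignment is a sample input added so the snippet runs on its own.

<?php
$date = '2009-06-30'; // sample input (assumed)
if (ereg("([0-9]{4})-([0-9]{1,2})-([0-9]{1,2})", $date, $regs)) {
    // $regs[1] = year, $regs[2] = month, $regs[3] = day
    echo "$regs[3].$regs[2].$regs[1]";
} else {
    echo "Invalid date format: $date";
}
?>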
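For current PHP versions, the same conversion can be sketched with preg_match(). The function name isoToDotted is a hypothetical helper, not part of the slides or of PHP. The pattern is the original one with slash delimiters added.

```php
<?php
// Hypothetical helper: converts YYYY-MM-DD to DD.MM.YYYY using PCRE.
function isoToDotted(string $date): string
{
    // Same pattern as the ereg() example, now wrapped in / delimiters.
    // Like the original, it is not anchored, so it matches anywhere in $date.
    if (preg_match('/([0-9]{4})-([0-9]{1,2})-([0-9]{1,2})/', $date, $regs)) {
        // $regs[1] = year, $regs[2] = month, $regs[3] = day
        return "$regs[3].$regs[2].$regs[1]";
    }
    return "Invalid date format: $date";
}

echo isoToDotted('2009-06-30'), "\n";  // prints 30.06.2009
echo isoToDotted('June 30'), "\n";     // prints Invalid date format: June 30
```

The capture array has the same layout as with ereg(): index 0 holds the whole match and indexes 1 onward hold the parenthesized groups.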
eregi
• (PHP 4, PHP 5)
• eregi: case-insensitive regular expression match
• Syntax: int eregi ( string $pattern , string $string [, array &$regs ] )
• This function is identical to ereg() except that it ignores case distinction when matching alphabetic characters.
• Example (like ereg(), it runs only on PHP versions earlier than 7.0):

<?php
$string = 'XYZ';
if (eregi('z', $string)) {
    echo "'$string' contains a 'z' or 'Z'!";
}
?>
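A PCRE counterpart of the eregi() example, as a minimal sketch. Case-insensitivity comes from the i modifier placed after the closing delimiter.

```php
<?php
$string = 'XYZ';

// With the i modifier, "z" in the pattern matches "Z" in the subject.
$insensitive = preg_match('/z/i', $string);  // 1

// Without it, the match is case-sensitive and fails.
$sensitive = preg_match('/z/', $string);     // 0

if ($insensitive) {
    echo "'$string' contains a 'z' or 'Z'!\n";
}
```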
Differences from POSIX regex
• As of PHP 5.3.0, the POSIX Regex extension is deprecated; it was removed in PHP 7.0.0.
• There are a number of differences between POSIX regex and PCRE regex:
    • The PCRE functions require that the pattern is enclosed by delimiters.
    • Unlike POSIX, the PCRE extension does not have dedicated functions for case-insensitive matching. Instead, this is supported using the i pattern modifier, as in /pattern/i.
    • The POSIX functions find the longest of the leftmost matches, but PCRE stops at the first valid match.
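The last difference can be seen in how PCRE handles alternation. The sketch below shows only the PCRE side, since the POSIX functions are not available in current PHP. Under leftmost-longest rules, a POSIX engine would be expected to report "oneself" for both patterns.

```php
<?php
$subject = 'oneselfsufficient';

// PCRE tries alternatives left to right and stops at the first that works,
// so the shorter "one" wins even though "oneself" would also match here.
preg_match('/one|oneself/', $subject, $first);    // $first[0] is 'one'

// Putting the longer alternative first gives the longer match.
preg_match('/oneself|one/', $subject, $second);   // $second[0] is 'oneself'

echo $first[0], "\n", $second[0], "\n";
```

When porting an ereg() pattern that relies on the longest match, ordering the alternatives from longest to shortest is one way to keep the old behavior.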