Regular Expressions for PHP
Adding magic to your programming.
Geoffrey Dunn (geoff@warmage.com)
What are Regular Expressions
• Regular expressions are a syntax for matching text.
• They date back to mathematical notation developed in the 1950s.
• They became embedded in Unix systems through tools like ed and grep.
What are RE
• Perl in particular promoted the use of very complex regular expressions.
• They are now available in nearly all popular programming languages.
• They allow much more complex matching than strpos().
Why use RE
• You can use RE to enforce rules on formats like phone numbers, email addresses or URLs.
• You can use them to find key data within logs, configuration files or web pages.
Why use RE
• They can quickly make replacements that would otherwise be complex, like finding all email addresses in a page and rewriting them as address [AT] site [dot] com.
• You can make your code really hard to understand.
Syntax basics
• In PHP the entire regular expression sits between two delimiters, by convention forward slashes (/).
• abc - most characters match themselves. This looks for the exact character sequence a, b and then c.
• . - a period matches any single character (except a newline, unless the s modifier is set).
• [abc] - square brackets match any one of the characters inside. Here: a, b or c.
Syntax basics
• ? - marks the previous item as optional, so a? means there might be an a.
• (abc)* - parentheses group patterns and the asterisk marks zero or more of the previous item, here the group. So this would match an empty string or abcabcabcabc.
• \.+ - the backslash is an all-purpose escape character, so \. is a literal period. The + marks one or more of the previous item. So this would match ......
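The basics above can be tried out directly with preg_match(). This is a minimal sketch, assuming PHP 7 or later; matches() is a hypothetical helper written for this example, not a PHP built-in. The ^ and $ used in some patterns anchor the match to the start and end of the subject.

```php
<?php
// Hypothetical helper (not a PHP built-in): true when $pattern matches $subject.
function matches(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

var_dump(matches('/abc/', 'xxabcxx'));     // true: literal a, b, c in sequence
var_dump(matches('/a.c/', 'a-c'));         // true: . stands in for the hyphen
var_dump(matches('/a.c/', "a\nc"));        // false: . skips newlines by default
var_dump(matches('/a.c/s', "a\nc"));       // true: the s modifier lets . match a newline
var_dump(matches('/[abc]x/', 'bx'));       // true: b is one of a, b, c
var_dump(matches('/^ba?d$/', 'bd'));       // true: the a is optional
var_dump(matches('/^(abc)*$/', ''));       // true: zero repetitions of the group
var_dump(matches('/^(abc)*$/', 'abcabc')); // true: two repetitions of the group
var_dump(matches('/^\.+$/', '......'));    // true: one or more literal periods
```

The patterns are written in single quotes so that PHP passes the backslashes through to the regular expression engine unchanged.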
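More syntax tricks
• [0-4] - match any single digit from 0 to 4.
• [^0-4] - match any single character that is not a digit from 0 to 4.
• \sword\s - match word where there is a white space character before and after.
• \bword\b - \b marks a word boundary: a position with a word character (letter, digit or underscore) on one side and a non-word character, or the start or end of the string, on the other. It consumes no characters.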
More syntax tricks
• \d{3,12} - \d matches any digit ([0-9]) while the braces give the minimum and maximum count of the previous item. In this case 3 to 12 digits.
• [a-z]{8,} - at least 8 lowercase letters in a row.
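A short sketch of how these tricks behave, in particular the difference between \s and \b. The helper found() is hypothetical and exists only for this example; the subjects are made up.

```php
<?php
// Hypothetical helper: true when $pattern matches somewhere in $subject.
function found(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

var_dump(found('/[0-4]/', '7 3 9'));          // true: the 3 is in the range
var_dump(found('/^[^0-4]+$/', '56789'));      // true: no character is a digit 0-4
var_dump(found('/\sword\s/', 'a word.'));     // false: a period follows, not white space
var_dump(found('/\bword\b/', 'a word.'));     // true: the period still ends the word
var_dump(found('/\bword\b/', 'swordfish'));   // false: letters on both sides
var_dump(found('/^\d{3,12}$/', '12'));        // false: fewer than 3 digits
var_dump(found('/^[a-z]{8,}$/', 'password')); // true: exactly 8 lowercase letters
```

Because \b matches a position and not a character, it is usually the safer choice when the word may sit next to punctuation or at the very start or end of the text.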
Matching Text
• Simple check: preg_match('/^[a-z0-9]+@([a-z0-9]+\.)*[a-z0-9]+$/i', $email_address) === 1
• Finding: preg_match('/\bcolou?r:\s+([a-zA-Z]+)\b/', $text, $matches); echo $matches[1];
• Find all: preg_match_all('/<([^>]+)>/', $html, $tags); echo $tags[1][1]; // contents of the second tag
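How the results arrays are laid out is easiest to see with sample data. The sketch below uses made-up values for $text and $html; the variable names $m, $count and $sets are only for illustration.

```php
<?php
$text = 'Background colour: blue, border color:   red';

// preg_match stops at the first match.
// $m[0] is the whole match, $m[1] the first parenthesised group.
if (preg_match('/\bcolou?r:\s+([a-zA-Z]+)\b/', $text, $m) === 1) {
    echo $m[1], "\n"; // blue
}

// preg_match_all collects every match. With the default PREG_PATTERN_ORDER,
// $tags[0] lists the full matches and $tags[1] lists the first group.
$html = '<p>Hello <b>world</b></p>';
$count = preg_match_all('/<([^>]+)>/', $html, $tags);
echo $count, "\n";                 // 4
echo implode(',', $tags[1]), "\n"; // p,b,/b,/p

// PREG_SET_ORDER groups the results per match instead.
preg_match_all('/<([^>]+)>/', $html, $sets, PREG_SET_ORDER);
echo $sets[1][1], "\n";            // b
```

Since the pattern has a single group, only indexes 0 and 1 exist on the first level of $tags in the default ordering.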
Matching Lines
• This is mostly for looking through files but works on any array of text.
• $new_lines = preg_grep('/Jan[a-z]*[\s\/\-](20)?07/', $old_lines);
• To get the lines that do not match, pass PREG_GREP_INVERT as the third parameter rather than complicating your regular expression into something like /^[^\/]|(\/[^p])|(\/p[^r]) etc...
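A small sketch of both directions of preg_grep(), using invented log lines in place of a real file. Note that the returned arrays keep the keys of the input array.

```php
<?php
$old_lines = [
    'Jan 07 backup ok',
    'January/2007 rotate',
    'Feb 07 backup ok',
    'Jan-2008 backup ok',
];

$pattern = '/Jan[a-z]*[\s\/\-](20)?07/';

// Lines that match, then lines that do not.
$matching = preg_grep($pattern, $old_lines);
$others   = preg_grep($pattern, $old_lines, PREG_GREP_INVERT);

print_r($matching); // keys 0 and 1
print_r($others);   // keys 2 and 3
```

If sequential keys are needed afterwards, array_values() will renumber the result.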
Replacing text
preg_replace('/\b([\w.+-]+)@([a-zA-Z\d-]+)\.([a-zA-Z\d.-]+)\b/', '$1 [AT] $2 [dot] $3', $post);
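When every period in the domain should be rewritten, not only the first, one option is preg_replace_callback(). This is a sketch under stated assumptions: obfuscate_emails() is a hypothetical helper, and its pattern is deliberately loose and is not a full address validator.

```php
<?php
// Hypothetical helper: rewrites anything that looks like an email address.
function obfuscate_emails(string $text): string
{
    $result = preg_replace_callback(
        '/\b([\w.+-]+)@([a-zA-Z\d-]+(?:\.[a-zA-Z\d-]+)+)\b/',
        function (array $m): string {
            // $m[1] is the part before the @, $m[2] the domain.
            return $m[1] . ' [AT] ' . str_replace('.', ' [dot] ', $m[2]);
        },
        $text
    );

    // preg_replace_callback returns null on error; fall back to the input.
    return $result ?? $text;
}

echo obfuscate_emails('Mail address@site.com or joe@mail.example.org today.'), "\n";
// Mail address [AT] site [dot] com or joe [AT] mail [dot] example [dot] org today.
```

The callback receives the same array of groups that preg_match() would fill, so any PHP logic can be used to build the replacement.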
Splitting text
• $date_parts = preg_split('/[-.,\/\\\\\s]+/', $date_string);
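A sketch of preg_split() on a few made-up date strings, assuming the separators wanted are hyphen, period, comma, slash, backslash and white space. The PREG_SPLIT_NO_EMPTY flag is an addition here; it drops empty pieces when a separator sits at the very start or end.

```php
<?php
// In a single-quoted PHP string, \\\\ reaches PCRE as \\ (a literal backslash)
// and \s reaches it unchanged (white space).
$separator = '/[-.,\/\\\\\s]+/';

$samples = ['2007-01-15', '15/01/2007', '15. 1, 2007', '15\\01\\2007'];

foreach ($samples as $date_string) {
    $date_parts = preg_split($separator, $date_string, -1, PREG_SPLIT_NO_EMPTY);
    echo implode('|', $date_parts), "\n";
}
// 2007|01|15
// 15|01|2007
// 15|1|2007
// 15|01|2007
```

The + after the character class makes a run of separators such as ". " count as a single split point.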
Tips
• Comment what your regular expression is doing.
• Test your regular expression for speed. Some can cause a noticeable slowdown.
• There are plenty of simple uses like /Width: (\d+)/
• Watch out for greedy expressions. E.g. /(<(.+)>)/ will not pull out “b” and “/b” from “<b>test</b>” but will instead pull “b>test</b”. An easy way to change this behaviour is to make the quantifier lazy: /(<(.+?)>)/
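Two of these tips can be shown in a few lines: the greedy versus lazy difference, and commenting a pattern from the inside with the x modifier. The sample strings and the variable names are made up for this sketch.

```php
<?php
$html = '<b>test</b>';

// Greedy: .+ runs on to the last > in the subject.
preg_match_all('/<(.+)>/', $html, $greedy);
print_r($greedy[1]); // one entry: b>test</b

// Lazy: .+? stops at the first > it can.
preg_match_all('/<(.+?)>/', $html, $lazy);
print_r($lazy[1]);   // two entries: b and /b

// The x modifier ignores white space in the pattern and allows # comments,
// so a literal space has to be written as \s or escaped.
$width = '/
    Width:\s*   # the label, then optional white space
    (\d+)       # capture the number itself
/x';
preg_match($width, 'Width: 640', $m);
echo $m[1], "\n"; // 640
```

A negated class such as /<([^>]+)>/ gives the same result as the lazy version here and avoids the backtracking that .+ can cause on long input.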
References
• http://en.wikipedia.org/wiki/Regular_expressions
• http://php.net/manual/en/ref.pcre.php
Thank you