Regular Expressions: Theory and Practice • Jeff Schoolcraft • MDCFUG 12/13/2005
Who am I? • Jeff Schoolcraft • Senior Architect / Operations Manager at RGII Technologies • Speaker at user groups • President, WinProTeam Vienna Usergroup (.NET) • TDD Evangelist • Tool guy
What can you expect? • “The gist” in 60 seconds or less. • Theory • Practical Usage • Best Practices • Hands On • A sermon • Q & A
The Gist • Regular Expressions (regex) describe patterns in strings and are often used for data validation, searching and text transformations.
Theory • Basics: A regular expression, often called a pattern, is an expression that describes a set of strings without actually listing its elements. • Say what? The set of strings {“dog”, “bog”, “fog”} can be described by this regular expression (regex): [bdf]og
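To make the [bdf]og example concrete, here is a minimal Python sketch. The candidate word list is made up for illustration and is not part of the original slides.

```python
import re

# [bdf] is a character class: it matches exactly one of b, d, or f.
pattern = re.compile(r"[bdf]og")

# Hypothetical test words; only the first three are in the set {"dog", "bog", "fog"}.
candidates = ["dog", "bog", "fog", "log", "dogs"]

# fullmatch() requires the entire string to match, so "dogs" is rejected.
matches = [word for word in candidates if pattern.fullmatch(word)]
print(matches)  # ['dog', 'bog', 'fog']
```

Using fullmatch() rather than search() keeps the regex tied to the set of strings it describes; search() would also accept "dogs" because it contains "dog".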
Formal Language Theory • A regular language is any language whose complete set of strings can be described by a regular expression.
Formal Language Theory (cont’d) • Regular expressions consist of constants and operators that denote sets of strings and operations over these sets, respectively. Given a finite alphabet Σ, the following constants are defined: • (empty set) ∅ denoting the set ∅ • (empty string) ε denoting the set {ε} • (literal character) a in Σ denoting the set {a} • and the following operations: • (concatenation) RS denoting the set { αβ | α in R and β in S }. For example, {"ab", "c"}{"d", "ef"} = {"abd", "abef", "cd", "cef"}. • (alternation) R|S denoting the set union of R and S. • (Kleene star) R* denoting the smallest superset of R that contains ε and is closed under string concatenation. This is the set of all strings that can be made by concatenating zero or more strings in R. For example, {"ab", "c"}* = {ε, "ab", "c", "abab", "abc", "cab", "cc", "ababab", ... }. • Source: http://en.wikipedia.org/wiki/Regular_expression
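The three operations can be tried out directly on small finite sets. The sketch below is only an illustration: the function names are invented here, and because R* is infinite whenever R contains a non-empty string, kleene_star() takes an assumed max_parts cutoff and returns a finite approximation.

```python
from itertools import product

def concat(r, s):
    """RS: every string in R followed by every string in S."""
    return {a + b for a, b in product(r, s)}

def alternate(r, s):
    """R|S: the set union of R and S."""
    return r | s

def kleene_star(r, max_parts):
    """Finite approximation of R*: concatenations of 0..max_parts strings from R."""
    result = {""}  # the empty string ε is always in R*
    layer = {""}   # strings built from exactly k parts, starting with k = 0
    for _ in range(max_parts):
        layer = concat(layer, r)
        result |= layer
    return result

R = {"ab", "c"}
S = {"d", "ef"}

print(sorted(concat(R, S)))     # ['abd', 'abef', 'cd', 'cef']
print(sorted(alternate(R, S)))  # ['ab', 'c', 'd', 'ef']
print(kleene_star(R, 2) == {"", "ab", "c", "abab", "abc", "cab", "cc"})  # True
```

With max_parts set to 2 the result reproduces the first seven elements of the {"ab", "c"}* example above; raising the cutoff adds longer strings such as "ababab".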
DEMO • All binary numbers • All binary numbers that start and end in 1 • All binary numbers that contain 00, then any other bits, followed by 111 and any other bits.
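The slides do not record the patterns used in the demo, so the following is one possible set of answers, written as a Python sketch. The constant names are hypothetical, and the second pattern assumes that the single-bit string "1" counts as both starting and ending in 1.

```python
import re

# All binary numbers: one or more bits.
ALL_BINARY = re.compile(r"[01]+")

# Starts and ends in 1. The optional group lets the lone string "1" match (an assumption).
STARTS_AND_ENDS_WITH_1 = re.compile(r"1([01]*1)?")

# Contains 00, then any bits, then 111, then any bits.
HAS_00_THEN_111 = re.compile(r"[01]*00[01]*111[01]*")

def is_match(pattern, text):
    """True only when the whole string matches the pattern."""
    return pattern.fullmatch(text) is not None

print(is_match(ALL_BINARY, "010011"))           # True
print(is_match(STARTS_AND_ENDS_WITH_1, "1001")) # True
print(is_match(STARTS_AND_ENDS_WITH_1, "10"))   # False
print(is_match(HAS_00_THEN_111, "1001110"))     # True
print(is_match(HAS_00_THEN_111, "11100"))       # False, the 111 comes before the 00
```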
Practical Usage • In order of popularity: • Searching (much nicer than the * or % wildcards) • String manipulation (parsing, replacement) • Validation (input validation, database check constraints)
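Each of these uses maps onto a standard call in most regex libraries. The sketch below uses Python's re module; the log line, the field names, and the ZIP code rule are all invented for illustration.

```python
import re

# Hypothetical input, not taken from the slides.
log_line = "2005-12-13 ERROR disk /dev/sda1 is 97% full"

# Searching: find the first percentage anywhere in the line.
found = re.search(r"\d+%", log_line)
print(found.group())  # 97%

# Parsing: named groups pull the line apart into fields.
parsed = re.match(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<level>[A-Z]+) (?P<message>.*)",
    log_line,
)
print(parsed.group("level"))  # ERROR

# Replacement: mask the device path.
masked = re.sub(r"/dev/\w+", "<device>", log_line)
print(masked)  # 2005-12-13 ERROR disk <device> is 97% full

# Validation: accept only a 5-digit ZIP code with an optional 4-digit suffix.
def is_zip_code(text):
    return re.fullmatch(r"\d{5}(-\d{4})?", text) is not None

print(is_zip_code("20001-1234"))  # True
```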
Best Practices • The most important thing to remember: regular expression quantifiers are greedy by default. • Make the most explicit match possible. • Just because some implementations allow the lazy quantifier *?, don’t fall back on it.
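A small Python sketch of the difference between a greedy match, a lazy match, and an explicit match. The HTML-like sample string is made up for illustration.

```python
import re

# Hypothetical sample text.
text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* grabs as much as it can, so the match runs from the first < to the last >.
greedy = re.findall(r"<.*>", text)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']

# Lazy: *? stops at the first > it can, but only where the implementation supports it.
lazy = re.findall(r"<.*?>", text)
print(lazy)  # ['<b>', '</b>', '<i>', '</i>']

# Explicit: say exactly what may appear inside a tag (anything except < or >).
explicit = re.findall(r"<[^<>]*>", text)
print(explicit)  # ['<b>', '</b>', '<i>', '</i>']
```

The explicit pattern gives the same result as the lazy one here, but it states the intent directly and does not depend on lazy quantifier support.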
A Sermon • Some people develop a religious fascination with new tools & technologies (design patterns, regex, whatever). • Use the tools that make the most sense for your problem/solution.
Email Validation with REGEX? • Are you kidding me? • See email.regex • Use a multi-tiered approach: a regex to test the format, then some code to test the validity of the email address.
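One way the multi-tiered idea could look in Python is sketched below. This is not the content of email.regex, which is not reproduced in the slides. The function names are hypothetical, the format regex is deliberately loose, and the second tier only applies commonly cited length limits and hostname label rules. Confirming that an address really exists usually takes a further step, such as sending a confirmation message, which is outside this sketch.

```python
import re

# Tier 1 (assumed): a loose format test, something@something.something with no spaces.
EMAIL_FORMAT = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def looks_like_email(address):
    """Format check only; says nothing about whether the address exists."""
    return EMAIL_FORMAT.fullmatch(address) is not None

def passes_code_checks(address):
    """Tier 2 (assumed): structural checks that are clumsy to express in a regex."""
    local, _, domain = address.rpartition("@")
    # Commonly cited limits: 64 characters for the local part, 254 overall.
    if len(local) > 64 or len(address) > 254:
        return False
    # Each domain label must be non-empty, at most 63 characters,
    # and must not begin or end with a hyphen.
    return all(
        label
        and len(label) <= 63
        and not label.startswith("-")
        and not label.endswith("-")
        for label in domain.split(".")
    )

def validate_email(address):
    return looks_like_email(address) and passes_code_checks(address)

print(validate_email("jeff@thequeue.net"))  # True
print(validate_email("a@b..com"))           # False, empty domain label
```

Keeping the regex loose and moving the finer rules into code avoids the very large patterns that a regex-only approach tends to produce.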
Further Resources • Mastering Regular Expressions: http://www.oreilly.com/catalog/regex/ • A website (there are thousands more on Google): http://www.regular-expressions.info/ • Me: http://thequeue.net/blog/ • http://regexadvice.com/blogs/jschoolcraft/ • jeff@thequeue.net
Tools • The Regulator (http://regex.osherove.com/) • Expresso (http://www.ultrapico.com/Expresso.htm)