Regular Expressions Adapted from Javascript Regular Expressions by Bob Molnar Indiana University/IUPUI Streaming Media Laboratory http://wally.cs.iupui.edu/n341_05/
Goals By the end of this unit you should … • Understand what regular expressions are • Be able to use regular expressions to match text against a particular string pattern • Be able to use special regular expression characters to match multiple search terms against strings
What is a Regular Expression? • A regular expression is a pattern of characters. • We use regular expressions to search for matches on particular text. • In JavaScript, we can use regular expressions by creating instances of the regular expression object, RegExp.
The Regular Expression Constructor • We can declare a regular expression by using a constructor. General form: var regExpName = new RegExp("pattern", "flags"); • Example: var searchTermRE = new RegExp("s", "gi"); (search for the letter “s” globally, ignoring case)
The Regular Expression Literal • Declaring with a regular expression literal: var searchTermRE = /X1X4/gi; • When declaring a regular expression literal, do NOT include quotation marks; delimit the expression with a pair of forward slashes. • By convention in this unit, variables that hold regular expressions end with the suffix “RE.” Flags come after the second forward slash.
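The two declaration styles can be compared side by side. This is a minimal sketch; the sample text and variable names are made up for illustration.

```javascript
// Hypothetical sample text, chosen only for illustration.
const sampleText = "Sally sells seashells";

// Constructor form: pattern and flags are passed as strings.
const constructorRE = new RegExp("s", "gi");

// Literal form: no quotation marks, flags follow the closing slash.
const literalRE = /s/gi;

// Both objects carry the same pattern and the same flags.
console.log(constructorRE.source, constructorRE.flags); // s gi
console.log(literalRE.source, literalRE.flags);         // s gi

// With the g and i flags, match() collects every "s", upper or lower case.
const constructorMatches = sampleText.match(constructorRE);
const literalMatches = sampleText.match(literalRE);
console.log(constructorMatches.length, literalMatches.length); // 6 6
```

The constructor form is mainly useful when the pattern is only known at run time, for example when it is held in a variable.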
Literal Characters • Much of the time, we use regular expressions to match specific patterns, like the word “java”: var firstRE = /java/; (would match inside “java”, “javascript”, “javabeans” and “myjava”, but not “myJava”, because matching is case-sensitive unless the ignore case flag is set) • Matching character for character is termed matching literals.
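A short sketch of literal matching, using the RegExp test() method to check a few candidate words. The word list is an assumption added for illustration; note that literal matching is case-sensitive by default.

```javascript
// A literal pattern: the characters j, a, v, a in that order.
const javaRE = /java/;

// Hypothetical candidate words for illustration.
const candidates = ["java", "javascript", "javabeans", "myJava", "myjava"];

// test() returns true when the pattern is found anywhere in the string.
const literalResults = candidates.map((word) => javaRE.test(word));
console.log(literalResults); // true, true, true, false, true

// "myJava" fails only because of the capital J; the i flag removes that limit.
console.log(/java/i.test("myJava")); // true
```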
Non-Printing Literal Characters • We also consider some non-printing characters as literals: • \t (tab character) • \n (newline character) • \0 (NUL character – null value)
Metacharacters • Sometimes, we want to search not for specific patterns, but for parts of patterns. • Consider searching for all lines that end with the letter “s”. To do so, we’ll need to use metacharacters: var firstRE = /s$/; (matches an “s” only at the end of the string, or at the end of each line in multiline mode)
What are Metacharacters? • Metacharacters are characters that carry a special meaning in a pattern instead of matching themselves literally. • We often use symbols as metacharacters to indicate a special circumstance, such as a position in the text or a repetition. • Some of these symbols include: $ . ^ * ?
Metacharacters as Literals • What if we want to search for a literal symbol that is also used as a metacharacter? To search for a symbol as a literal and not as a metacharacter, we put a \ (backslash) in front of it to turn “off” the metacharacter property. • $ used as a metacharacter: var firstRE = /s$/; • $ used as a literal character: var firstRE = /\$/;
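The difference between $ as a metacharacter and \$ as a literal can be sketched as follows. The sample text and variable names are assumptions for illustration.

```javascript
// Hypothetical sample text containing a literal dollar sign.
const priceText = "Total: $5 for apples";

// $ as a metacharacter: an "s" at the very end of the string.
const endsWithS = /s$/;
console.log(endsWithS.test(priceText)); // true ("apples" ends the string)

// \$ as a literal: the dollar sign character itself.
const dollarSign = /\$/;
console.log(priceText.search(dollarSign)); // 7

// In the constructor form the pattern is a string, so the backslash
// itself must be escaped to reach the regular expression.
const dollarFromString = new RegExp("\\$");
console.log(dollarFromString.test(priceText)); // true
```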
Flags • When searching, flags can help refine or expand a search. • Flags modify a particular search to fit certain criteria. • There are three common flags: the global flag (g), the ignore case flag (i) and the multiline mode flag (m).
The Global flag • In a regular expression without flags, JavaScript will return only the first instance of a search term: var mySearchRE = /X1X4/; (returns only the first instance of “X1X4”) • To modify the search to include all instances of “X1X4”, we would use the global flag: var mySearchRE = /X1X4/g; (returns all instances of “X1X4”)
The Ignore Case flag • In a regular expression without flags, JavaScript only returns an exact match: var mySearchRE = /X1X4/; (returns only an instance of “X1X4”, but not “x1x4” or “x1X4”, etc.) • To modify the search to include instances of “X1X4”, regardless of case, we would use the ignore case flag: var mySearchRE = /X1X4/i; (returns an instance of “X1X4”, “x1x4”, “x1X4”, etc.)
The Multiline flag • A single string may include newline characters. • The multiline flag lets ^ and $ match at the beginning or end of each line, not just the beginning or end of the whole string. To turn it on: var mySearchRE = /^X1X4/m;
Combining Flags • We can also combine flags to expand our search: var mySearchRE = /X1X4/gi; (returns all instances of “x1x4”, “x1X4”, “X1x4” and “X1X4”)
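The effect of the g and i flags, alone and combined, can be sketched with String.match(). The sample text is an assumption made up for illustration.

```javascript
// Hypothetical sample text containing the code in several capitalizations.
const codeText = "X1X4 x1x4 X1X4 x1X4";

// No flags: only the first exact-case match is reported.
const firstExact = codeText.match(/X1X4/);
console.log(firstExact[0], firstExact.index); // X1X4 0

// g: every exact-case match.
const allExact = codeText.match(/X1X4/g);
console.log(allExact.length); // 2

// i: case is ignored, so a lowercase pattern still finds "X1X4",
// but only the first match is reported.
const firstAnyCase = codeText.match(/x1x4/i);
console.log(firstAnyCase[0]); // X1X4

// gi: every match, regardless of case.
const allAnyCase = codeText.match(/X1X4/gi);
console.log(allAnyCase.length); // 4
```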
Searching for Matches Only at the Beginning of a Line • Consider the following two-line string: Line 1: Jimmy the Scot scooted his scooter through the Park. Line 2: The park guard watched Jimmy do this. • The code: var mySearchRE = /^Jimmy/gm; (would only return “Jimmy” from the first line; the “Jimmy” in the second line is not at the start of a line) • The ^ metacharacter says “look only for matches at the beginning of the string, or of each line in multiline mode.”
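Searching for Matches Only at the End of a Line • Consider the following two-line string: Line 1: Jimmy the Scot scooted his scooter through the Park. Line 2: The park guard watched Jimmy do this. • The code: var mySearchRE = /his\.$/gm; (would only return “his.” from the end of the second line, where it is part of “this.”; the “his” in the first line is not at the end of a line) • Because both lines end with a period, the period is part of the pattern and is escaped with a backslash; /his$/ alone would find no match in this string. • The $ metacharacter says “look only for matches at the end of the string, or of each line in multiline mode.”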
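The anchors and the multiline flag can be tried out on the sample string. This sketch assumes the two lines are separated by a single newline character (\n); the variable names are made up for illustration.

```javascript
// The sample string, assuming a "\n" between the two lines.
const story =
  "Jimmy the Scot scooted his scooter through the Park.\n" +
  "The park guard watched Jimmy do this.";

// ^ with g and m: "Jimmy" only where it starts a line.
const startMatches = story.match(/^Jimmy/gm);
console.log(startMatches); // one match, from the first line

// With m, ^ also matches at the start of the second line ...
const lineStartThe = story.match(/^The/gm);
console.log(lineStartThe); // one match: "The"

// ... without m, ^ only matches at the start of the whole string.
const withoutMultiline = story.match(/^The/g);
console.log(withoutMultiline); // null

// $ with g and m: both lines end with a period, so it is part of the pattern.
const endMatches = story.match(/his\.$/gm);
console.log(endMatches); // one match: "his." (the tail of "this.")

// Without the period, no line ends in "his".
const plainEnd = story.match(/his$/gm);
console.log(plainEnd); // null
```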
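Using Boundaries • Consider the following two-line string: Line 1: Jimmy the Scot scooted his scooter through the Park. Line 2: The park guard watched Jimmy do this. • To search for all instances of the word “the”, we could use the whitespace metacharacter (\s): var mySearchRE = /\sthe\s/gim; (Each match includes the surrounding whitespace. A “the” at the very start or end of the string, or one next to punctuation, is missed because it has no whitespace on that side. Note that \s also matches a newline, so a “The” that follows a line break is still found.)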
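Using Boundaries • Consider the following two-line string: Line 1: Jimmy the Scot scooted his scooter through the Park. Line 2: The park guard watched Jimmy do this. • Instead of using a whitespace character, we can use the word boundary (\b). The boundary metacharacter matches the position between a word character (letter, digit or underscore) and a non-word character, or the start or end of the string. A \b at the beginning of a pattern keeps it from matching the tail of a longer word; a \b at the end keeps it from matching the beginning of a longer word: var mySearchRE = /\bthe\b/gim; (finds “the”, “the” and “The”, but would not match inside a word such as “then” or “bathe”)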
Using Boundaries (continued) • Our string: Line 1: Jimmy the Scot scooted his scooter through the Park. Line 2: The park guard watched Jimmy do this. • Code: var mySearchRE = /\bt/gim; • Matches every “t” (or “T”) that begins a word. A “t” in the middle or at the end of a word is ignored.
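The difference between \s and \b can be sketched on the sample string. This assumes the two lines are separated by a single newline character; the variable names and the short “The end” string are made up for illustration.

```javascript
// The sample string, assuming a "\n" between the two lines.
const boundaryText =
  "Jimmy the Scot scooted his scooter through the Park.\n" +
  "The park guard watched Jimmy do this.";

// \s consumes the surrounding whitespace, so it is part of each match.
const spacedThe = boundaryText.match(/\sthe\s/gi);
console.log(spacedThe.length, JSON.stringify(spacedThe[0])); // 3 " the "

// \b matches a position, not a character, so only the word is returned.
const wordThe = boundaryText.match(/\bthe\b/gi);
console.log(wordThe); // the, the, The

// At the very start of a string there is no whitespace before the word.
const startThe = "The end";
console.log(startThe.match(/\sthe\s/gi)); // null
console.log(startThe.match(/\bthe\b/gi)[0]); // The

// \bt matches only the letter that begins a word ...
const leadingT = boundaryText.match(/\bt/gi);
console.log(leadingT.length); // 5

// ... adding \w* extends each match to the whole word.
const tWords = boundaryText.match(/\bt\w*/gi);
console.log(tWords); // the, through, the, The, this
```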
Searching for Multiple Patterns at the Same Time • Consider the following string: lop, mop, bop, sop, pop, gop, top, fop • To search for all instances that end with “op”, we would use the wildcard character (.), which matches any single character except a line break. The global flag is still needed to get every match; without it, only the first match (“lop”) is returned: var mySearchRE = /.op/g; (returns all the words)
Searching for Multiple Patterns at the Same Time • Consider the following string: lop, mop, bop, sop, pop, gop, top, fop • To search only for the instances that match “bop”, “lop” or “pop”, we would list the allowed characters in brackets ([]); any other character in that position is excluded: var mySearchRE = /[blp]op/g;
Searching for Multiple Patterns at the Same Time • Consider the following string: lop, mop, bop, sop, pop, gop, top, fop • We can also use ranges of letters in the brackets: var mySearchRE = /[a-m]op/g; (returns “bop”, “fop”, “gop”, “lop” and “mop”, but ignores all other words ending with “op”)
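The wildcard, bracket and range patterns can be compared on the sample string. This is a minimal sketch with made-up variable names; matches are returned in the order they appear in the string.

```javascript
// The sample string from the slides.
const words = "lop, mop, bop, sop, pop, gop, top, fop";

// Wildcard without g: only the first match is reported.
const firstOp = words.match(/.op/);
console.log(firstOp[0]); // lop

// Wildcard with g: every three-character run ending in "op".
const allOp = words.match(/.op/g);
console.log(allOp.length); // 8

// Brackets: only b, l or p may come before "op".
const chosenOp = words.match(/[blp]op/g);
console.log(chosenOp); // lop, bop, pop

// Range: any letter from a through m may come before "op".
const rangedOp = words.match(/[a-m]op/g);
console.log(rangedOp); // lop, mop, bop, gop, fop
```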
Excluding Patterns • Consider the following string: lop, mop, bop, sop, pop, gop, top, fop • To search for all instances that end with “op” except those that begin with “b”, “l” or “p”, we use the not metacharacter (^): var mySearchRE = /[^blp]op/g; (returns all the words except “bop”, “lop” and “pop”) • Inside brackets, the ^ symbol means “not” and DOES NOT mean the beginning of a line!
Excluding Patterns • Our string: lop, mop, bop, sop, pop, gop, top, fop • We can also use ranges of letters in the brackets: var mySearchRE = /[^a-m]op/g; (returns all words except “bop”, “fop”, “gop”, “lop” and “mop”, that is, “sop”, “pop” and “top”)
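A short sketch of negated brackets on the sample string, including the two different meanings of ^. The variable names and the two short test strings are assumptions for illustration.

```javascript
// The sample string from the slides.
const opWords = "lop, mop, bop, sop, pop, gop, top, fop";

// [^blp]: any character except b, l or p may come before "op".
const notBlp = opWords.match(/[^blp]op/g);
console.log(notBlp); // mop, sop, gop, top, fop

// [^a-m]: any character outside the range a through m.
const notAtoM = opWords.match(/[^a-m]op/g);
console.log(notAtoM); // sop, pop, top

// Outside brackets ^ still means "start of string"; inside it means "not".
const startsWithAllowed = /^[^blp]op/;
console.log(startsWithAllowed.test("mop, lop")); // true
console.log(startsWithAllowed.test("lop, mop")); // false
```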
Other Metacharacters: ?, * and + • Each of these applies to the character just before it. • To match zero or one occurrence: var mySearchRE = /b?onk/; (matches “bonk” or “onk”) • To match zero or more occurrences: var mySearchRE = /b*onk/; (matches “onk”, “bonk”, “bbonk” and so on) • To match one or more occurrences: var mySearchRE = /b+onk/; (matches “bonk” or “bbonk”, but not “onk”)
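The three quantifiers can be compared with a small sketch. The ^ and $ anchors are added here (they are not in the slide patterns) so that the whole sample word must match; the helper name and sample list are made up for illustration.

```javascript
// Hypothetical sample words for illustration.
const onkSamples = ["onk", "bonk", "bbonk"];

// Hypothetical helper: the sample words that a pattern matches.
function matchingOnk(pattern) {
  return onkSamples.filter((word) => pattern.test(word));
}

// ? : zero or one "b" before "onk".
const zeroOrOne = matchingOnk(/^b?onk$/);
console.log(zeroOrOne); // onk, bonk

// * : zero or more "b" before "onk".
const zeroOrMore = matchingOnk(/^b*onk$/);
console.log(zeroOrMore); // onk, bonk, bbonk

// + : one or more "b" before "onk".
const oneOrMore = matchingOnk(/^b+onk$/);
console.log(oneOrMore); // bonk, bbonk

// Without anchors, /b?onk/ still finds "bonk" inside "bbonk".
console.log(/b?onk/.test("bbonk")); // true
```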
Other Metacharacters: { } • Braces also apply to the character just before them. • To match a specific number of occurrences: var mySearchRE = /go{2}p/; (matches “goop”, but not “gop” or “gooop”) • To match between n and m occurrences: var mySearchRE = /go{1,3}p/; (matches “gop”, “goop” or “gooop” only)
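A minimal sketch of brace quantifiers, assuming the intent is to repeat the letter “o”, so the braces follow the “o”. The ^ and $ anchors, the helper name and the sample list are additions for illustration.

```javascript
// Hypothetical sample words for illustration.
const goopSamples = ["gp", "gop", "goop", "gooop", "goooop"];

// Hypothetical helper: the sample words that a pattern matches.
function matchingGoop(pattern) {
  return goopSamples.filter((word) => pattern.test(word));
}

// {2}: exactly two "o" characters between "g" and "p".
const exactlyTwo = matchingGoop(/^go{2}p$/);
console.log(exactlyTwo); // goop

// {1,3}: between one and three "o" characters.
const oneToThree = matchingGoop(/^go{1,3}p$/);
console.log(oneToThree); // gop, goop, gooop

// The braces bind to the character directly before them:
// g{2}op repeats the "g", not the "o".
console.log(/g{2}op/.test("ggop")); // true
console.log(/g{2}op/.test("goop")); // false
```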
String.search() Method • The String.search() method gives us the character position (index number) where the first match starts, or -1 if there is no match. • String.search() does not perform global searches and will ignore the “g” flag!
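A short sketch of String.search(), using a made-up sample string. Index numbers start at 0.

```javascript
// Hypothetical sample text for illustration.
const searchText = "She sells seashells";

// Index of the first lowercase "s" (the capital "S" at index 0 does not count).
const firstIndex = searchText.search(/s/);
console.log(firstIndex); // 4

// The g flag makes no difference to search().
const globalIndex = searchText.search(/s/g);
console.log(globalIndex); // 4

// With the i flag, the capital "S" at the start is found first.
const anyCaseIndex = searchText.search(/s/i);
console.log(anyCaseIndex); // 0

// No match gives -1.
const missingIndex = searchText.search(/z/);
console.log(missingIndex); // -1
```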
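String.match() Method • With the “g” flag, the String.match() method returns an array containing all of the matches from a string. Without the “g” flag, it returns only the first match, along with details such as its index. If nothing matches, it returns null. • Unlike the String.search() method, the String.match() method does perform global searches.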
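A short sketch of String.match() with and without the g flag, using a made-up sample string. Because match() returns null when nothing is found, the last lines show one common way to guard against that.

```javascript
// Hypothetical sample text for illustration.
const matchText = "She sells seashells";

// With g: an array of every lowercase "s".
const allS = matchText.match(/s/g);
console.log(allS.length); // 5

// Without g: only the first match, plus details such as its index.
const firstS = matchText.match(/s/);
console.log(firstS[0], firstS.index); // s 4

// No match: null rather than an empty array.
const noMatch = matchText.match(/z/g);
console.log(noMatch); // null

// Falling back to an empty array makes counting safe.
const zCount = (matchText.match(/z/g) || []).length;
console.log(zCount); // 0
```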
Summary • We can use a regular expression to search for a pattern of characters. • We can create a JavaScript regular expression by using the RegExp constructor or by creating a regular expression literal.
Summary • We can use the String.search() method to find the first occurrence of a regular expression. • We can use the String.match() method with the “g” flag to return an array of all occurrences of a regular expression.