Advanced Find and Replace with Regular Expressions Robert Kiffe, Senior Customer Support Engineer
Agenda • Review: Global Find and Replace • Introduction to Regular Expressions • Challenge #1 • Solution • Advanced Regular Expressions • Challenge #2 • Solution • Hands On • Q & A
Global Find and Replace • Location: Content > Find and Replace • Administrators Only (User Level 10) • Searches a single site • Adjust ‘Scope’ to limit searchable content • Literal Text or Regex patterns
Global Find and Replace • Find • Simple search with results list • Preview Replace • Safe multi-step process • Perform ‘sample’ find/replace and display results list • Select pages from results to perform the actual find/replace operation • (Optional) Publish selected results
Regular Expressions • Regular Expression • A pattern that ‘describes’ a certain amount of text • The concept arose in the 1950s when the American mathematician Stephen Cole Kleene formalized the description of a regular language. (Thanks Wikipedia) • Now used in almost every major programming language
Literal Characters • Literal Text Matches • Most characters match exactly themselves • Case Sensitive • Example: • Text: Robert does not like to be called robert. • Pattern: Robert • Matches: “Robert” only, not the lowercase “robert”
Special Characters • Symbol characters that have special purpose (explained later) • Full List: \ ^ $ . | ? * + ( ) [ { • To match as literal characters, you must ‘escape’ them by adding “\” in front • Example: • Text: Rob does not like to be called Robert? • Pattern: Robert\? • Matches: “Robert?” (including the question mark)
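The escaping rule above can be tried outside the Find and Replace tool. The following is a minimal sketch using Python's standard `re` module, which is an assumption made for illustration only; the slides do not say which regex engine the tool uses. The variable names are hypothetical.

```python
import re

text = "Rob does not like to be called Robert?"

# Unescaped: "?" acts as a quantifier that makes the final "t" optional,
# so the literal question mark is never part of the match.
unescaped = re.findall(r"Robert?", text)

# Escaped: "\?" matches a literal question mark.
escaped = re.findall(r"Robert\?", text)

# re.escape() adds the backslashes for you when the search text is literal.
auto_pattern = re.escape("Robert?")

print(unescaped)     # ['Robert']
print(escaped)       # ['Robert?']
print(auto_pattern)  # Robert\?
```

A helper like `re.escape` is handy when the text to find comes from a user or a file and should never be interpreted as a pattern.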
Special Character: Period • ‘Wildcard’ Character • Matches any character except newline. • Example: • Text: Robert does not like to be called oberth, Bobert, or Goobert. • Pattern: .obert • Matches: any single character followed by “obert”, such as “Robert”, “Bobert”, and the “oobert” in “Goobert”
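As a rough illustration of the wildcard, here is a short sketch in Python's `re` module (an assumption; the tool's own engine may differ in details). The sample sentence is shortened from the slide, and the `re.DOTALL` flag shown at the end is a Python feature, not something the slides mention.

```python
import re

text = "Robert, Bobert, or Goobert."

# "." stands in for exactly one character, so each match is six characters long.
wildcard_matches = re.findall(r".obert", text)

# By default "." does not match a newline...
no_newline = re.findall(r".obert", "\nobert")

# ...unless the engine is told otherwise (re.DOTALL in Python).
with_dotall = re.findall(r".obert", "\nobert", flags=re.DOTALL)

print(wildcard_matches)  # ['Robert', 'Bobert', 'oobert']
print(no_newline)        # []
print(with_dotall)       # ['\nobert']
```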
Special Characters: Quantifiers • Symbol characters that define how many of the previous character(s) to match • ? (0 or 1) • * (0 or More) • + (1 or More) • Use Curly Brackets to indicate an exact number or range • {3} (Exactly 3) • {3,} (3 or More) • {3,5} (3, 4, or 5) • Only modifies the previous character (or group)
Special Characters: Quantifiers • Quantifiers: Example • ? : 0 or 1 • Text: Robert does not like to be called Roberta. • Pattern: Roberta? • Matches: both “Robert” and “Roberta”, because the final “a” is optional
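The quantifier rules can be checked with a few lines of Python (`re` module, used here as an assumed stand-in for the tool's engine). The test strings other than the Roberta sentence are made up for illustration.

```python
import re

text = "Robert does not like to be called Roberta."

# "?" applies only to the character right before it (the "a").
optional_a = re.findall(r"Roberta?", text)

# Curly brackets give a range: two or three "o" characters in a row.
counted = re.findall(r"o{2,3}", "o oo ooo oooo")

# With parentheses, the quantifier repeats the whole group "ab"...
group_repeat = re.fullmatch(r"(ab)+", "ababab")

# ...without them, "+" repeats only the "b".
char_repeat = re.fullmatch(r"ab+", "ababab")

print(optional_a)                 # ['Robert', 'Roberta']
print(counted)                    # ['oo', 'ooo', 'ooo']
print(group_repeat is not None)   # True
print(char_repeat is not None)    # False
```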
Special Characters: Parentheses • Capture Groups • Encapsulate a character sequence using parentheses: “(…)” • Add a quantifier to affect the whole group • Replace • In the ‘replace field’, refer to your groups using the “dollar sign” and then the group number: $# • Count the opening parenthesis characters, “(” , from left to right to determine the correct #
Special Characters: Parentheses • Capture Group: Example • Text: I like https://school.edu but not https://www.school.edu. • FIND: https://www\.(school\.edu) • REPLACE: https://$1 • Result: I like https://school.edu but not https://school.edu.
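The same find/replace can be sketched in Python. One difference to keep in mind: Python's `re.sub` writes group references as `\1` rather than the `$1` used in the tool's replace field. The nested-group pattern at the end is a made-up example to show how groups are numbered.

```python
import re

text = "I like https://school.edu but not https://www.school.edu."

# Group 1 captures "school.edu"; the replacement re-uses it without "www.".
# Python spells the group reference "\1" where the slides use "$1".
result = re.sub(r"https://www\.(school\.edu)", r"https://\1", text)

# Group numbers follow the order of the opening parentheses, left to right.
nested = re.search(r"((a)(b))", "ab").groups()

print(result)  # I like https://school.edu but not https://school.edu.
print(nested)  # ('ab', 'a', 'b')
```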
Challenge #1 • Find All Links to a Particular Domain • Problem is that it can have many formats: • Root-relative “/” • /about/contact.html • Absolute (either protocol) • http://www.gallena.com/about/contact.html • https://www.gallena.com/about/contact.html • No Subdomain • http://gallena.com/about/contact.html • Examples: • <a href="/about/"> • <a href="http://www.gallena.com/about/">
Challenge #1: Tips • Use a quantifier (e.g. ‘?’) to make a part of the URL optional • a? • Combine a quantifier with parentheses to make a substring of the URL optional • (abc)?
Challenge #1: Solution Steps to Build the Regex Pattern: • href="https?://www\.gallena\.com/ (HTTP or HTTPS protocol) • href="https?://(www\.)?gallena\.com/ (+Subdomain optional) • href="(https?://(www\.)?gallena\.com)?/ (+Root-relative) • Example Matches: • <a href="http://www.gallena.com/about/">About</a> • <a href="http://gallena.com/records/index.html">Records</a> • <a href="/academics/index.html">Academics</a> • <a href="https://www.gallena.com/portal/">Portal Login</a>
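To sanity-check the final pattern, the sketch below runs it against the four example links plus one link to a different domain. It uses Python's `re` module as an assumed test bench; the `example.com` link and the variable names are hypothetical additions.

```python
import re

# Final pattern from the solution steps above.
pattern = re.compile(r'href="(https?://(www\.)?gallena\.com)?/')

links = [
    '<a href="http://www.gallena.com/about/">About</a>',
    '<a href="http://gallena.com/records/index.html">Records</a>',
    '<a href="/academics/index.html">Academics</a>',
    '<a href="https://www.gallena.com/portal/">Portal Login</a>',
    '<a href="http://www.example.com/">Other Site</a>',  # should NOT match
]

# Keep only the links the pattern finds.
matched = [link for link in links if pattern.search(link)]

for link in matched:
    print(link)
```

Only the first four links are printed; the link to the other domain fails because neither the optional domain group nor the bare “/” can match right after `href="`.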
Special Characters: Square Brackets • Character Sets • Characters encased inside square brackets define all possible matches for a single text character: [abc] • A quantifier placed directly after the set will affect the whole character set • Placing a “-” between characters indicates a ‘range’ • Placing a “^” as the first item in the set creates a ‘negative pattern’ • Quantifier characters become literal matches: ? + * { } • Period character becomes literal match: .
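Character Sets: Examples • Text: Robert does not like to be called robert. • Pattern: [Rr]obert • Matches: “Robert” and “robert” • Text: Robert does not like to be called Richard. • Pattern: [A-Z][a-z]+ • Matches: “Robert” and “Richard” • Text: Robert does not like to be called Roberta. • Pattern: [^A-Z .]+ • Matches: every run of characters that contains no capital letter, space, or period: “obert”, “does”, “not”, “like”, “to”, “be”, “called”, “oberta”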
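The three character-set examples can be reproduced with Python's `re` module (assumed here purely as a convenient way to test patterns):

```python
import re

# Either "R" or "r" in the first position.
either_case = re.findall(r"[Rr]obert", "Robert does not like to be called robert.")

# One capital letter followed by one or more lowercase letters.
capitalized = re.findall(r"[A-Z][a-z]+", "Robert does not like to be called Richard.")

# Negated set: runs of characters that are NOT a capital, a space, or a period.
# Inside the brackets the period is a literal character, not a wildcard.
negated = re.findall(r"[^A-Z .]+", "Robert does not like to be called Roberta.")

print(either_case)  # ['Robert', 'robert']
print(capitalized)  # ['Robert', 'Richard']
print(negated)      # ['obert', 'does', 'not', 'like', 'to', 'be', 'called', 'oberta']
```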
Shorthand Character Classes • Certain characters can reference a range of characters when ‘escaped’ by a backslash (\) • Common Examples: • \d matches all digit characters: [0-9] • \w matches all ‘word’ characters: [A-Za-z0-9_] • \s matches all ‘space’ characters (including line breaks) • Using the capital letter will ‘invert’ the match • \S matches all non-space characters: [^\s]
Character Classes: Example • Text: Jenny’s number is 867-5309. • Pattern: \d{3}-\d{4} • Matches: “867-5309”
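A short sketch of the shorthand classes in Python's `re` module follows. Note one assumption specific to Python: `\d` and `\w` also match non-ASCII digits and letters by default, so the `re.ASCII` flag is passed to get the `[0-9]` and `[A-Za-z0-9_]` behaviour described on the slide. The sample strings other than the phone number are made up.

```python
import re

# Three digits, a hyphen, four digits.
phone = re.findall(r"\d{3}-\d{4}", "Jenny's number is 867-5309.", flags=re.ASCII)

# "\w" includes the underscore but not the hyphen.
words = re.findall(r"\w+", "a_b c-d", flags=re.ASCII)

# "\s" matches spaces and line breaks; "\S" is its inverse.
pieces = re.split(r"\s+", "one two\nthree")
non_space = re.findall(r"\S+", "one two\nthree")

print(phone)      # ['867-5309']
print(words)      # ['a_b', 'c', 'd']
print(pieces)     # ['one', 'two', 'three']
print(non_space)  # ['one', 'two', 'three']
```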
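Greedy Matches • Quantifiers are ‘greedy’ by default: they match as much text as possible, so a careless (or purposeful) pattern could match beyond an expected result • Apply an extra coating of “?” after the initial quantifier to make it ‘lazy’, so the pattern stops at the first successful match • Text: Robert likes dogs! Robert likes cats! • Greedy Pattern: Robert likes .*! • Matches: the entire text as one match, up to the last “!” • Lazy Pattern: Robert likes .*?! • Matches: “Robert likes dogs!” and “Robert likes cats!” as two separate matches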
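The difference between the greedy and lazy patterns is easy to see when both are run on the same text. This sketch uses Python's `re` module as an assumed test bench:

```python
import re

text = "Robert likes dogs! Robert likes cats!"

# Greedy: ".*" grabs as much as it can, so the match runs to the LAST "!".
greedy = re.findall(r"Robert likes .*!", text)

# Lazy: ".*?" grabs as little as it can, so each match stops at the FIRST "!".
lazy = re.findall(r"Robert likes .*?!", text)

print(greedy)  # ['Robert likes dogs! Robert likes cats!']
print(lazy)    # ['Robert likes dogs!', 'Robert likes cats!']
```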
Challenge #2 • Set External Links to Create a New Window • Need to add the attribute target="_blank" • Links will start with “http” or “https” • Examples: • <a href="http://www.omniupdate.com/">OmniUpdate</a> • <a href="https://petitions.whitehouse.gov/">Petitions</a> • Desired Result: • <a href="http://www.omniupdate.com/" target="_blank">OmniUpdate</a> • <a href="https://petitions.whitehouse.gov/" target="_blank">Petitions</a>
Challenge #2: Tips • Remember lessons learned from Challenge #1 • (abc)? • Remember syntax requirements of HTML (or XML) • HTML/XML have special characters that can only be used in certain places • Use a “Not” to match any character not in the set • [^abc] • Use capture groups to re-place content as needed • (abc) -> $1
Challenge #2: Solution Steps to Build the Regex Pattern FIND: • <a href="http://www\.omniupdate\.com/">OmniUpdate</a> (Starting Pattern) • <a\s*href="http://www\.omniupdate\.com/"\s*> (Account for whitespace) • <a\s*href="https?://[^"]+"\s*> (Match any absolute URL) • (<a\s*href="https?://[^"]+"\s*)> (Capture Group) REPLACE: • $1 target="_blank"> (Use capture group, then end anchor tag) • Example Match/Replace, for the link <a href="http://www.omniupdate.com/about/">About</a>: • <a href="http://www.omniupdate.com/about/"> (Full Match) • <a href="http://www.omniupdate.com/about/" (Capture, $1) • <a href="http://www.omniupdate.com/about/" target="_blank">About</a> (Replace)
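The final find/replace pair can be tried out as below. This is a sketch in Python's `re` module, not the tool itself, so the replacement is written with `\1` where the tool's replace field uses `$1`. The `add_target` helper and the sample links are hypothetical.

```python
import re

# Final FIND pattern from the solution steps above.
FIND = re.compile(r'(<a\s*href="https?://[^"]+"\s*)>')
# REPLACE string; "\1" is Python's spelling of "$1".
REPLACE = r'\1 target="_blank">'


def add_target(html: str) -> str:
    """Add target="_blank" to anchors whose href is an absolute http(s) URL."""
    return FIND.sub(REPLACE, html)


external = '<a href="http://www.omniupdate.com/about/">About</a>'
internal = '<a href="/about/">About</a>'

updated = add_target(external)
print(updated)               # <a href="http://www.omniupdate.com/about/" target="_blank">About</a>
print(add_target(internal))  # <a href="/about/">About</a>  (unchanged)
print(add_target(updated))   # same as `updated`: already-converted links no longer match
```

Because the pattern requires the closing “>” directly after the href attribute, running the replacement a second time leaves converted links alone. The same restriction means anchors with other attributes after the href would not be matched by this pattern.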
Thank you. Robert Kiffe Sr. Customer Support Engineer OmniUpdate 805-484-9400 ext 223 rkiffe@omniupdate.com outc18.com/surveys