Regular Expressions A Gentle Introduction Rani Nelken *Chinese examples and reference prepared by Chen Shih-Pei
Motivation • Powerful language for specifying text patterns • Search, substitute, split, etc. • jEdit support
Running example "Exuberance is Beauty." --William Blake 魏良弼,字師說,新建人。嘉靖二年進士。授松陽知縣,召拜刑科給事中。 --明史列傳
Simple match "Exuberance is Beauty." --William Blake be 魏良弼,字師說,新建人。嘉靖二年進士。授松陽知縣,召拜刑科給事中。 --明史列傳 字
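The simple match above can be tried out in code. This is a minimal sketch using Python's `re` module, which is an assumption here (the slides use jEdit's Perl-like search); the variable names are illustrative only.

```python
import re

english = '"Exuberance is Beauty." --William Blake'
chinese = "魏良弼,字師說,新建人。嘉靖二年進士。授松陽知縣,召拜刑科給事中。 --明史列傳"

# A plain pattern matches itself, literally and case-sensitively.
m_en = re.search("be", english)
m_zh = re.search("字", chinese)

print(m_en.group(), m_en.span())   # be (4, 6): inside "Exuberance", not "Beauty"
print(m_zh.group(), m_zh.start())  # 字 4
```

Because matching is case-sensitive by default, the lowercase pattern `be` does not match the `Be` in "Beauty".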
Period "Exuberance is Beauty." --William Blake . 魏良弼,字師說,新建人。嘉靖二年進士。授松陽知縣,召拜刑科給事中。 --明史列傳 . => Period matches any single character (by default, any character except a line break)
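To see what the period matches, here is a small sketch, again assuming Python's `re` module rather than jEdit. The two-line string `"a\nb"` is a made-up input added only to show the line-break behaviour.

```python
import re

english = '"Exuberance is Beauty." --William Blake'

# "." matches letters, spaces, and punctuation alike, one character at a time.
chars = re.findall(".", english)
print(len(chars) == len(english))  # True

# A Chinese character counts as a single character too.
zh_chars = re.findall(".", "魏良弼")
print(zh_chars)  # ['魏', '良', '弼']

# By default "." skips line breaks; re.DOTALL makes it match them as well.
default_dot = re.findall(".", "a\nb")
dotall_dot = re.findall(".", "a\nb", re.DOTALL)
print(default_dot)  # ['a', 'b']
print(dotall_dot)   # ['a', '\n', 'b']
```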
Repeating Things "Exuberance is Beauty." --William Blake .* .+ * means repeat zero or more times + means repeat one or more times
Repeating Things 魏良弼,字師說,新建人。嘉靖二年進士。授松陽知縣,召拜刑科給事中。--明史列傳 .* .+ * means repeat zero or more times + means repeat one or more times
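The difference between `*` and `+` only shows up when there is nothing to match. A short sketch, assuming Python's `re` module; the empty string is an extra input added for illustration.

```python
import re

english = '"Exuberance is Beauty." --William Blake'
chinese = "魏良弼,字師說,新建人。嘉靖二年進士。授松陽知縣,召拜刑科給事中。--明史列傳"

# On a non-empty line, both ".+" and ".*" run to the end of the line.
whole_en = re.search(".+", english).group()
whole_zh = re.search(".*", chinese).group()
print(whole_en == english, whole_zh == chinese)  # True True

# On an empty string, ".*" still succeeds (zero repetitions), ".+" does not.
empty_star = re.search(".*", "")
empty_plus = re.search(".+", "")
print(repr(empty_star.group()))  # ''
print(empty_plus)                # None
```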
Repeating Things "Exuberance is Beauty." --William Blake l+ "Exuberance is Beauty." --William Blake b+
Repeating Things 魏良弼,字師說,新建人。嘉靖二年進士。授松陽知縣,召拜刑科給事中。--明史列傳 -+ 魏良弼,字師說,新建人。嘉靖二年進士。授松陽知縣,召拜刑科給事中。--明史列傳 ,+
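When `+` follows a single character, it repeats only that character. The sketch below lists every match for the four patterns above, assuming Python's `re.findall` as a stand-in for the editor's search.

```python
import re

english = '"Exuberance is Beauty." --William Blake'
chinese = "魏良弼,字師說,新建人。嘉靖二年進士。授松陽知縣,召拜刑科給事中。--明史列傳"

# Each run of the repeated character is returned as one match.
l_runs = re.findall("l+", english)
b_runs = re.findall("b+", english)
dash_runs = re.findall("-+", chinese)
comma_runs = re.findall(",+", chinese)

print(l_runs)      # ['ll', 'l']   ("William", "Blake")
print(b_runs)      # ['b']         (only the lowercase b in "Exuberance")
print(dash_runs)   # ['--']
print(comma_runs)  # [',', ',', ',']
```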
Repeating Things "Exuberance is Beauty." --William Blake ut+ "Exuberance is Beauty." --William Blake ut*
Repeating Things 魏良弼,字師說,新建人。嘉靖二年進士。授松陽知縣,召拜刑科給事中。--明史列傳 知縣+ 魏良弼,字師說,新建人。嘉靖二年進士。授松陽知縣,召拜刑科給事中。--明史列傳 知州*
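In a longer pattern, the quantifier applies only to the character right before it. A sketch of the four patterns above, assuming Python's `re` module:

```python
import re

english = '"Exuberance is Beauty." --William Blake'
chinese = "魏良弼,字師說,新建人。嘉靖二年進士。授松陽知縣,召拜刑科給事中。--明史列傳"

# "ut+" needs at least one t after u; "ut*" also accepts a bare u.
ut_plus = re.findall("ut+", english)
ut_star = re.findall("ut*", english)
print(ut_plus)  # ['ut']
print(ut_star)  # ['u', 'ut']   (the u in "Exuberance", the ut in "Beauty")

# Same idea in Chinese: 州 is optional under "*", so 知 alone still matches.
xian_plus = re.findall("知縣+", chinese)
zhou_star = re.findall("知州*", chinese)
print(xian_plus)  # ['知縣']
print(zhou_star)  # ['知']
```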
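Counting Repetitions "Exuberance is Beauty." --William Blake ut? "Exuberance is Beauty." --William Blake l{2} l{2,3} ? means repeat zero or one time; {n} means repeat exactly n times; {n,m} means repeat between n and m times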
Counting Repetitions 魏良弼,字師說,新建人。嘉靖二年進士。授松陽知縣,召拜刑科給事中。--明史列傳 知州? 魏良弼,字師說,新建人。嘉靖二年進士。授松陽知縣,召拜刑科給事中。--明史列傳 -{2} -{2,3}
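The counting quantifiers can be checked the same way. This sketch assumes Python's `re` module; the string `"---"` is a made-up input, added because the running examples contain no run long enough to tell `{2}` and `{2,3}` apart.

```python
import re

english = '"Exuberance is Beauty." --William Blake'
chinese = "魏良弼,字師說,新建人。嘉靖二年進士。授松陽知縣,召拜刑科給事中。--明史列傳"

# "?" makes the preceding character optional (zero or one time).
ut_opt = re.findall("ut?", english)
zhou_opt = re.findall("知州?", chinese)
print(ut_opt)    # ['u', 'ut']
print(zhou_opt)  # ['知']

# "{2}" means exactly two; "{2,3}" means two or three.
ll_exact = re.findall("l{2}", english)
dash_exact = re.findall("-{2}", chinese)
print(ll_exact)    # ['ll']
print(dash_exact)  # ['--']

# On three dashes the two patterns differ.
three_exact = re.findall("-{2}", "---")
three_range = re.findall("-{2,3}", "---")
print(three_exact)  # ['--']
print(three_range)  # ['---']
```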
Character classes "Exuberance is Beauty." --William Blake \w (word character) "Exuberance is Beauty." --William Blake \W (non-word character) "Exuberance is Beauty." --William Blake \s (whitespace) "Exuberance is Beauty." --William Blake \S (non-whitespace)
Character classes 魏良弼,字師說,新建人。嘉靖二年進士。 \w 魏良弼,字師說,新建人。嘉靖二年進士。 \W 魏良弼,字師說,新建人。嘉靖二年進士。 \s (no match) 魏良弼,字師說,新建人。嘉靖二年進士。 \S *Note: \w is less well defined for Chinese text; whether it matches Chinese characters depends on the regex engine and its Unicode settings
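The shorthand classes can be explored with a short sketch. It assumes Python 3's `re` module, where `\w` is Unicode-aware for text strings by default; other engines, possibly including the one behind jEdit, may behave differently, which is the point of the note about `\w` and Chinese characters. The `re.ASCII` flag is used here to imitate an ASCII-only engine.

```python
import re

english = '"Exuberance is Beauty." --William Blake'
chinese = "魏良弼,字師說,新建人。嘉靖二年進士。"

# \w = word character, \W = anything else; \s = whitespace, \S = anything else.
words_en = re.findall(r"\w+", english)
spaces_en = re.findall(r"\s", english)
print(words_en)        # ['Exuberance', 'is', 'Beauty', 'William', 'Blake']
print(len(spaces_en))  # 4

# In Python 3, \w also matches Chinese characters but not Chinese punctuation.
words_zh = re.findall(r"\w+", chinese)
spaces_zh = re.findall(r"\s", chinese)
print(words_zh)   # ['魏良弼', '字師說', '新建人', '嘉靖二年進士']
print(spaces_zh)  # []

# With ASCII-only classes, \w no longer matches any Chinese character.
words_zh_ascii = re.findall(r"\w+", chinese, flags=re.ASCII)
print(words_zh_ascii)  # []
```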
Character classes "Exuberance is 123." \d [0-9] "Exuberance is 123." \d+ [1-3]+ "Exuberance is 123." [x-y] : character range, matching any single character from x to y
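A quick sketch of the digit class and bracketed ranges, assuming Python's `re` module. The range `[a-c]` is an extra pattern added for illustration.

```python
import re

text = '"Exuberance is 123."'

# \d and [0-9] both match one digit at a time.
digits_d = re.findall(r"\d", text)
digits_range = re.findall(r"[0-9]", text)
print(digits_d)      # ['1', '2', '3']
print(digits_range)  # ['1', '2', '3']

# Adding "+" groups consecutive digits into one match.
number_d = re.findall(r"\d+", text)
number_range = re.findall(r"[1-3]+", text)
print(number_d)      # ['123']
print(number_range)  # ['123']

# A range works for letters too: any single character from a to c.
letters = re.findall(r"[a-c]", text)
print(letters)  # ['b', 'a', 'c']
```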
Character classes Define your own character classes: 魏良弼,字師說,新建人。嘉靖二年進士。 [一二三四五六七八九十] For consecutive Chinese numbers: 正德十二年進士。 [一二三四五六七八九十]+
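The custom class for Chinese numerals can be sketched as follows, assuming Python's `re` module. The constant name `NUMERALS` is illustrative only.

```python
import re

NUMERALS = "[一二三四五六七八九十]"  # one Chinese numeral character

text1 = "魏良弼,字師說,新建人。嘉靖二年進士。"
text2 = "正德十二年進士。"

# Without "+", each numeral character is a separate match.
single_1 = re.findall(NUMERALS, text1)
single_2 = re.findall(NUMERALS, text2)
print(single_1)  # ['二']
print(single_2)  # ['十', '二']

# With "+", consecutive numerals are grouped into one match.
grouped_2 = re.findall(NUMERALS + "+", text2)
print(grouped_2)  # ['十二']
```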
Greedy and Ungreedy Matching "Exuberance is Beauty." --William Blake .+b "Exuberance is Beauty." --William Blake .+B "Exuberance is Beauty." --William Blake .+?B
Greedy and Ungreedy Matching 宣宗孝恭皇后孫氏,鄒平人。幼有美色。父忠,永城縣主簿也。誠孝皇后母彭城伯夫人,故永城人,時時入禁中 ,.+人 宣宗孝恭皇后孫氏,鄒平人。幼有美色。父忠,永城縣主簿也。誠孝皇后母彭城伯夫人,故永城人,時時入禁中 ,.+?人
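The greedy and ungreedy patterns above can be compared side by side. A minimal sketch, assuming Python's `re` module; `re.findall` is used on the Chinese text to show that the ungreedy pattern finds several short matches where the greedy one finds a single long match.

```python
import re

english = '"Exuberance is Beauty." --William Blake'
chinese = "宣宗孝恭皇后孫氏,鄒平人。幼有美色。父忠,永城縣主簿也。誠孝皇后母彭城伯夫人,故永城人,時時入禁中"

# Greedy ".+" grabs as much as it can, then backs up to the LAST b / B.
greedy_b = re.search(".+b", english).group()
greedy_B = re.search(".+B", english).group()
print(greedy_b)  # "Exub
print(greedy_B)  # "Exuberance is Beauty." --William B

# Ungreedy ".+?" stops at the FIRST B it can reach.
lazy_B = re.search(".+?B", english).group()
print(lazy_B)  # "Exuberance is B

# Chinese: greedy runs from the first comma to the last 人.
greedy_zh = re.findall(",.+人", chinese)
lazy_zh = re.findall(",.+?人", chinese)
print(len(greedy_zh))  # 1
print(lazy_zh[0])      # ,鄒平人
print(len(lazy_zh))    # 3
```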
Notes on writing your own regex • Regex syntax comes in multiple styles: the one we’re using is Perl-like • Be careful and precise about what you write: a loose pattern may match more than you intend
Greedy quantifiers Greedy matching (the default) matches the longest possible string of characters that fits the pattern
Reluctant quantifiers Reluctant matching, written by adding ? after a quantifier (as in .+?), forces the expression to find the shortest rather than the longest match
Logical operators • Back references