Learn how regular expressions (RegEx) can be used in search, text processing, and programming languages like Perl and Java. Understand the structure, literal matching, character classes, negation, start/end of line, repeated matches, word selection, and more. Explore examples and references.
Regular Expressions ashishFA@cse
Where can I use RegEx? • Powerful in: search, search and replace, text processing • Text editors: vi, EditPlus • Programming languages: Perl, Java • Command-line tools: grep, awk
Before RegEx • Wildcard • *.txt • My_report*.doc Here the * indicates any number of any characters.
! Regular expressions (RegEx) tend to be easier to write than they are to read
What is RegEx • A regular expression: a pattern that describes or matches a set of strings • Matched text: the chunk of text that matches the regular expression • Example: ca[trn] matches car, can, and cat • EditPlus is used throughout this presentation as the tool to demonstrate regular expressions
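As a minimal sketch of the ca[trn] example, the snippet below uses Python's `re` module rather than EditPlus (an assumption made for illustration; the word list is made up).

```python
import re

# Pattern from the slide: "ca" followed by exactly one of t, r, or n.
pattern = re.compile(r"ca[trn]")

words = ["car", "can", "cat", "cab", "cap"]
# fullmatch requires the whole word to match the pattern.
matched = [w for w in words if pattern.fullmatch(w)]
print(matched)  # ['car', 'can', 'cat']
```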
Structure of RegEx • Made up of normal characters and metacharacters • Metacharacters have a special function: $ ^ . \ [ ] \( \) + ? • $ means end of line • ^ means start of line
Literal match • RegEx: cat will match the word cat • It will also match inside words like concatenation, delicate, located, modification • This is sometimes not desired • Solution?
Matching • Match the space before and after “cat” • “ cat ” • Still a problem: this misses “cat” at the start or end of a line, or next to punctuation
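The literal-match problem and the limits of the space-padded fix can be sketched as follows. This assumes Python's `re` module for illustration; the sample words and sentence are hypothetical.

```python
import re

words = ["cat", "concatenation", "delicate", "located", "dog"]
# The literal pattern "cat" matches anywhere inside a string.
hits = [w for w in words if re.search(r"cat", w)]
print(hits)  # ['cat', 'concatenation', 'delicate', 'located']

sentence = "My cat sat. I like my cat."
# Requiring a space on both sides finds only the first "cat";
# the second one is followed by a period, so " cat " misses it.
spaced = re.findall(r" cat ", sentence)
print(spaced)  # [' cat ']
```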
Character class • Want to search for ‘in’ or ‘on’ • The RegEx [io]n will match both in and on • [ ] : used to specify a set of characters to select from • [a-h] : indicates the set of all characters from a to h • [4-9A-T] : ranges can be combined (a digit from 4 to 9 or a capital letter from A to T)
Character class • It can also contain individual characters, such as [acV5y0] • [0-9] : ? • [0-9][0-9] : ? • 18[0-9][0-9] : ?
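One way to explore the character-class patterns above is to run them on small inputs. The sketch below assumes Python's `re` module, and the sample strings are invented for illustration.

```python
import re

# [io]n: "i" or "o", followed by "n".
in_on = re.findall(r"[io]n", "in on an un")
print(in_on)  # ['in', 'on']

# [0-9]: any single digit.
digits = re.findall(r"[0-9]", "a1b22")
print(digits)  # ['1', '2', '2']

# [0-9][0-9]: two digits in a row (matches do not overlap).
pairs = re.findall(r"[0-9][0-9]", "7 42 123")
print(pairs)  # ['42', '12']

# 18[0-9][0-9]: a four-digit number starting with 18, e.g. a year in the 1800s.
years = re.findall(r"18[0-9][0-9]", "Born 1879, published 1905, died 1955")
print(years)  # ['1879']
```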
Example • Set of vowels • [aeiou] • Set of consonants • [bcdfghjklmnpqrstvwxyz] • Consider matching words which start with two vowels and end with a consonant • [aeiou][aeiou][bcdfghjklmnpqrstvwxyz] ? • “ [aeiou][aeiou][bcdfghjklmnpqrstvwxyz] ”
Negation • The absence of a character or set of characters can be expressed with the ^ symbol as the first character inside [ ] • [^ab^8] : any character other than a, b, ^, or 8 (only the leading ^ negates; a later ^ is a literal character) • [^c-p] : any character other than c..p • “[^t]ion ” : selects words ending with ion where the character before ion is not t
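A short sketch of negated classes, assuming Python's `re` module (the test strings are made up). It shows that only a leading ^ inside the brackets negates the set.

```python
import re

# Leading ^ negates; the second ^ is just another excluded character.
not_ab_caret_8 = re.findall(r"[^ab^8]", "ab^8cd9")
print(not_ab_caret_8)  # ['c', 'd', '9']

# Any character outside the range c..p.
outside_c_to_p = re.findall(r"[^c-p]", "apple")
print(outside_c_to_p)  # ['a']

# "ion" preceded by any character except t.
ion = re.findall(r"[^t]ion", "nation region mission onion")
print(ion)  # ['gion', 'sion', 'nion']
```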
Start/End of line • ^ : indicates start of line • $ : indicates end of line • Example: search for lines starting with I • Use RegEx : “^I ” • Search for lines ending with is • Use RegEx : “ is$”
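The anchors can be tried line by line, as in this sketch. It assumes Python's `re` module and a hypothetical four-line text, and it leaves out the spaces shown inside the quotes on the slide.

```python
import re

text = "I like regex\nThis is\nIt is what it is\nis it"
lines = text.splitlines()

# ^I : the line must begin with "I".
starts_with_I = [ln for ln in lines if re.search(r"^I", ln)]
print(starts_with_I)  # ['I like regex', 'It is what it is']

# is$ : the line must end with "is".
ends_with_is = [ln for ln in lines if re.search(r"is$", ln)]
print(ends_with_is)  # ['This is', 'It is what it is']
```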
Match • . : matches any single character • e.e : matches any three-character string whose first and last characters are e • Try “ e.e ” • If you want only a letter in the middle, change the query to • e[a-z]e
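To see the difference between e.e and e[a-z]e, consider this small sketch (Python's `re` module and the sample text are assumptions for illustration).

```python
import re

text = "eye eve e-e e e here"

# "." accepts any character in the middle, including "-" and a space.
any_middle = re.findall(r"e.e", text)
print(any_middle)  # ['eye', 'eve', 'e-e', 'e e', 'ere']

# "[a-z]" accepts only a lowercase letter in the middle.
letter_middle = re.findall(r"e[a-z]e", text)
print(letter_middle)  # ['eye', 'eve', 'ere']
```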
Repeated match • * : matches the previous character or character class zero or more times • be* : matches b followed by zero or more ‘e’ (b, be, bee, ...) • + : similar to * • The only difference is that it requires one or more repetitions, so be+ does not match b alone
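The difference between * and + shows up on the shortest input. A minimal sketch, assuming Python's `re` module:

```python
import re

samples = ["b", "be", "bee", "beee"]

# be* : "b" followed by zero or more "e", so "b" alone qualifies.
star = [s for s in samples if re.fullmatch(r"be*", s)]
print(star)  # ['b', 'be', 'bee', 'beee']

# be+ : "b" followed by one or more "e", so "b" alone does not qualify.
plus = [s for s in samples if re.fullmatch(r"be+", s)]
print(plus)  # ['be', 'bee', 'beee']
```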
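Selecting a number • Single digit : [0-9] • A number is a digit repeated: (digit)repeat • [0-9]* repeats the digit zero or more times, so it also matches the empty string; [0-9]+ requires at least one digit • $[0-9]* : ? (an unescaped $ means end of line) • \$[0-9]* : a literal dollar sign followed by digits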
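The number patterns behave as follows in Python's `re` module (used here as an assumption for illustration; other engines may treat an unescaped $ slightly differently). The sample text is made up.

```python
import re

text = "Total: $45 for 3 items"

# One or more digits: each run of digits is one number.
numbers = re.findall(r"[0-9]+", text)
print(numbers)  # ['45', '3']

# [0-9]* also accepts zero digits, i.e. the empty string.
matches_empty = re.fullmatch(r"[0-9]*", "") is not None
print(matches_empty)  # True

# Escaped \$ is a literal dollar sign followed by digits.
prices = re.findall(r"\$[0-9]*", text)
print(prices)  # ['$45']

# Unescaped $ is the end-of-line anchor, so only an empty match at the end is found.
unescaped = re.findall(r"$[0-9]*", text)
print(unescaped)  # ['']
```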
Selecting a word • A word is composed of letters • A word is : [a-z]* • A word in all capital letters : ?? • A word starting with a capital letter : [ ][ ]*
Alternate match • | : symbol is used to specify alternate match • Search: (above)|(below)
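A minimal sketch of the alternate match, assuming Python's `re` module and an invented sentence. `finditer` with `group(0)` is used so that the whole match is returned regardless of the parentheses.

```python
import re

text = "See the figure above and the table below."

# (above)|(below): match either alternative.
found = [m.group(0) for m in re.finditer(r"(above)|(below)", text)]
print(found)  # ['above', 'below']
```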
Search • Day words • [a-z]*day • “[a-z]+day ” • “[A-Z][a-z]+day ”
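The three day-word patterns can be compared on one input. This sketch assumes Python's `re` module, uses a hypothetical word list, and drops the spaces shown inside the quotes on the slide.

```python
import re

text = "Monday sunday today holiday day"

# Zero or more lowercase letters, then "day": also matches "day" alone.
star = re.findall(r"[a-z]*day", text)
print(star)  # ['onday', 'sunday', 'today', 'holiday', 'day']

# At least one lowercase letter before "day".
plus = re.findall(r"[a-z]+day", text)
print(plus)  # ['onday', 'sunday', 'today', 'holiday']

# A capital letter, then at least one lowercase letter, then "day".
capitalized = re.findall(r"[A-Z][a-z]+day", text)
print(capitalized)  # ['Monday']
```

Note that the lowercase-only patterns match "onday" inside "Monday", because the capital M is outside [a-z].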
Escaping special meaning • How to match (, ) or * • To match a character that is used as a metacharacter, add ‘\’ before it as an escape character • For example, to match ( write \( and to match a period . write \.
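Escaping can be sketched like this, assuming Python's `re` module (in which an unescaped parenthesis is a grouping metacharacter). `re.escape` is a standard-library helper that adds the backslashes automatically; the sample strings are made up.

```python
import re

# Unescaped "." matches every character; escaped "\." matches only a period.
any_char = re.findall(r".", "a.b")
literal_dot = re.findall(r"\.", "a.b")
print(any_char)     # ['a', '.', 'b']
print(literal_dot)  # ['.']

# Escaped parentheses and asterisk match the literal characters.
parens = re.findall(r"\(|\)", "f(x)")
star = re.findall(r"\*", "2*3")
print(parens)  # ['(', ')']
print(star)    # ['*']

# re.escape builds an escaped pattern from a literal string.
escaped = re.escape("f(x)*2")
found_literal = re.search(escaped, "y = f(x)*2") is not None
print(found_literal)  # True
```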
Search patterns • has, have, had • not, n’t • ((have)|(had)|(has)) • (( )|(n't)|( not ))* • ((have)|(had)|(has))(( )|(n't)|( not ))*
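The combined pattern can be tried on a few sentences. The sketch below assumes Python's `re` module, whose engine tries alternatives from left to right, so the single-space alternative wins before " not " is considered. The `reordered` variant is a hypothetical adjustment, not part of the slides, and the sentences are invented.

```python
import re

# Pattern exactly as on the slide.
slide = re.compile(r"((have)|(had)|(has))(( )|(n't)|( not ))*")
# Hypothetical variant: same alternatives, with " not " tried before " ".
reordered = re.compile(r"((have)|(had)|(has))(( not )|(n't)|( ))*")

sentences = ["I have not seen it", "She hasn't left", "They had fun"]

slide_hits = [slide.search(s).group(0) for s in sentences]
reordered_hits = [reordered.search(s).group(0) for s in sentences]
print(slide_hits)      # ['have ', "hasn't ", 'had ']
print(reordered_hits)  # ['have not ', "hasn't ", 'had ']
```

Engines that pick the longest alternative instead of the first one may match "have not " with the original ordering as well.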
References • EditPlus help pages • http://gnosis.cx/publish/programming/regular_expressions.html • O'Reilly: Mastering Regular Expressions • Google “regular expression tutorial”