Mastering Regular Expressions for Effective Text Processing

Learn the fundamentals of regular expressions with grep command usage, various operators, extended expressions, exercises, and references for efficient string processing in Linux.

Presentation Transcript


  1. NCNU Linux User Group 2010 <Regular Expression> 王惟綸(Wei-Lun Wang) 2010/07/07

  2. Outline • What’s a Regular Expression? • The Purpose • What’s grep? • Various Operators • Extended Regular Expressions • Exercises • References

  3. What’s a Regular Expression? • A regular expression is a pattern that describes a set of strings. • Examples: X[2-7] = {X2, X3, X4, X5, X6, X7}; T[ae]ste? = {Taste, Tast, Teste, Test} (? is an extended-syntax operator, covered in slide 17)
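
One way to see which strings a pattern describes is to give grep a list of candidates and keep only whole-line matches with -x. A minimal sketch, assuming a POSIX shell and grep; the candidate strings are made up:

```shell
#!/bin/sh
# Hypothetical candidate strings, one per line
candidates='X1
X2
X5
X7
X8
Y3'

# -x keeps only lines that the pattern matches in full, so the output is
# exactly the members of the set X[2-7] found among the candidates.
result=$(printf '%s\n' "$candidates" | grep -x 'X[2-7]')
printf '%s\n' "$result"
```

This prints X2, X5, and X7; X1, X8, and Y3 fall outside the set.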

  4. The Purpose • Regular expressions are used to process strings. With the aid of special characters, they let users search for, replace, and delete text easily. • T[ae]ste? = {Taste, Tast, Teste, Test}: these four strings, Taste, Tast, Teste, and Test, can all be found by searching for the single pattern “T[ae]ste?”.
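
Searching real text with that single pattern might look like the sketch below (GNU or POSIX grep assumed; the sample text is hypothetical). Because ? is an extended-syntax operator, the pattern is passed to grep -E. Without -x, grep reports every line containing a match anywhere, so “Tested” is found as well:

```shell
#!/bin/sh
# Hypothetical text to search
text='Taste this.
Run the Test again.
nothing here
Tested twice'

# -E enables the "?" operator; no -x, so a match anywhere in a line counts
result=$(printf '%s\n' "$text" | grep -E 'T[ae]ste?')
printf '%s\n' "$result"
```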

  5. What’s grep? • Short for “global regular expression print” • The grep command searches its input for the given pattern and writes each matching line to standard output. • -i : ignore the distinction between upper and lower case • -v : invert the match, printing only the lines that do not match
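
A short sketch of the two options, using made-up input lines and any POSIX grep:

```shell
#!/bin/sh
# Hypothetical input lines
text='Linux User Group
linux kernel
FreeBSD ports'

# -i: ignore case, so "Linux" and "linux" both match
ignore_case=$(printf '%s\n' "$text" | grep -i 'linux')

# -v: print the lines that do NOT match. Without -i the match is
# case-sensitive, so "Linux User Group" counts as non-matching here.
inverted=$(printf '%s\n' "$text" | grep -v 'linux')

printf 'With -i:\n%s\n' "$ignore_case"
printf 'With -v:\n%s\n' "$inverted"
```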

  6. alias & unalias
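
The contents of this slide were not preserved in the transcript. As an illustration only, not the slide's own example, one common use of alias with grep is turning on match highlighting; --color=auto is a GNU grep option and may be missing from other grep implementations:

```shell
#!/bin/sh
# Make "grep" run "grep --color=auto" (GNU grep option).
# Aliases are mainly useful in interactive shells.
alias grep='grep --color=auto'

# With just a name, alias prints the current definition
alias grep

# unalias removes the definition again
unalias grep
```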

  7. Various Operators • [ ] matches any one of the characters listed inside. • [ - ] matches any one character within the given range of character codes. • [^ ] matches any one character that is not in the list or range. • ^ matches the empty string at the beginning of a line. • $ matches the empty string at the end of a line. • . matches any single character. • * matches the preceding item zero or more times.

  8. 1. [ ] matches any one of the characters listed inside. th[ei] = {the, thi}
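
A sketch with hypothetical candidate words; -x keeps only lines matched in full:

```shell
#!/bin/sh
# Hypothetical candidate words
words='the
thi
tho
th'

# [ei] stands for exactly one character, either "e" or "i"
result=$(printf '%s\n' "$words" | grep -x 'th[ei]')
printf '%s\n' "$result"
```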

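  9. 2. [ - ] matches any one character within a range of character codes. Which characters a range covers depends on the locale's character order: LANG=C: 0 1 2 3 4 ... A B C D ... Z a b c d ... z; LANG=zh_TW.Big5: 0 1 2 3 4 ... a A b B c C d D ... z Z. In the second order, a range such as [a-z] may also take in uppercase letters.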
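
Setting LC_ALL=C for a single grep command pins the C ordering, so a range behaves predictably. A sketch with made-up one-character lines:

```shell
#!/bin/sh
# Hypothetical one-character lines
chars='a
m
z
A
M
Z
5'

# LC_ALL=C applies the C locale to this grep only: [a-z] then covers just
# the 26 lowercase ASCII letters, and [0-9] just the ASCII digits.
lower=$(printf '%s\n' "$chars" | LC_ALL=C grep -x '[a-z]')
digits=$(printf '%s\n' "$chars" | LC_ALL=C grep -x '[0-9]')
printf '%s\n' "$lower" "$digits"
```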

  11. 2. [ - ] matches any one character within a range of character codes.

  12. 3. [^ ] matches any one character that is not in the list or range.
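
A sketch with hypothetical words: the pattern [^g]oo finds “oo” only when the character before it is not “g”.

```shell
#!/bin/sh
# Hypothetical words
words='food
good
fool
goose'

# [^g] matches one character that is anything except "g",
# so an "oo" preceded by "g" does not count
result=$(printf '%s\n' "$words" | grep '[^g]oo')
printf '%s\n' "$result"
```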

  13. 4. ^ Matches the empty string at the beginning of a line.
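
In this sketch (sample lines made up), ^the matches lines that begin with “the”, including “then”, but not a line that merely contains “the” later on:

```shell
#!/bin/sh
# Hypothetical sample lines
lines='the first line
  indented line
then another
Other text'

# ^ anchors the pattern to the start of the line
result=$(printf '%s\n' "$lines" | grep '^the')
printf '%s\n' "$result"
```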

  14. 5. $ Matches the empty string at the end of a line.
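
A sketch that finds lines ending in a period. Because . is itself an operator, the dot is escaped as \. to match it literally (sample lines are hypothetical):

```shell
#!/bin/sh
# Hypothetical sample lines
lines='End with a period.
No period here
Question?
Done.'

# \. is a literal dot; $ anchors the match to the end of the line
result=$(printf '%s\n' "$lines" | grep '\.$')
printf '%s\n' "$result"
```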

  15. 6. . Matches any single character.
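
A sketch with made-up words: g.d requires exactly one character, of any kind, between “g” and “d”.

```shell
#!/bin/sh
# Hypothetical words
words='gd
god
good
g-d
goood'

# "." stands for exactly one character; -x requires a whole-line match
result=$(printf '%s\n' "$words" | grep -x 'g.d')
printf '%s\n' "$result"
```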

  16. 7. * The preceding item will be matched zero or more times. go* = {g, go, goo, gooo, …} goo* = {go, goo, gooo, …}
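
A sketch contrasting go*d and goo*d on hypothetical words. It also shows a common surprise: since * allows zero repetitions and grep matches anywhere in a line, go* alone matches every line that contains a “g”.

```shell
#!/bin/sh
# Hypothetical words
words='gd
god
good
goooood
gaad'

# o* allows zero or more "o"s between "g" and "d"
zero_or_more=$(printf '%s\n' "$words" | grep -x 'go*d')
# goo*d needs at least one "o"
one_or_more=$(printf '%s\n' "$words" | grep -x 'goo*d')
# Without -x, 'go*' is satisfied by any single "g"; -c counts matching lines
any_g=$(printf '%s\n' "$words" | grep -c 'go*')

printf '%s\n' "$zero_or_more" "$one_or_more" "$any_g"
```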

  17. Extended Regular Expressions • In GNU grep's basic regular expressions, the metacharacters "?", "+", "{", "|", "(", and ")" lose their special meaning; use the backslashed versions "\?", "\+", "\{", "\|", "\(", and "\)" instead. • Alternatively, run grep -E (or egrep) instead of grep, so that the unescaped forms are special. • + The preceding item will be matched one or more times. • ? The preceding item will be matched zero or one time. • | matches either the expression before it or the expression after it. • ( ) groups a subexpression. • {N} The preceding item is matched exactly N times. • {N,} The preceding item is matched N or more times. • {N,M} The preceding item is matched at least N times, but not more than M times.
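
A minimal sketch of the difference, assuming GNU grep (the \+ form is a GNU extension); the two sample lines are made up:

```shell
#!/bin/sh
# Hypothetical input: a word, and a string containing a literal "+"
lines='good
go+d'

# Basic syntax (plain grep): "+" is an ordinary character
basic=$(printf '%s\n' "$lines" | grep 'go+d')

# Extended syntax (grep -E): "+" means one or more of the preceding item
extended=$(printf '%s\n' "$lines" | grep -E 'go+d')

# GNU grep: the backslashed "\+" gives the operator in basic syntax
gnu_basic=$(printf '%s\n' "$lines" | grep 'go\+d')

printf '%s\n' "$basic" "$extended" "$gnu_basic"
```

Plain grep finds only the literal line go+d, while grep -E (and GNU grep's go\+d) finds good.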

  18. 1. + The preceding item will be matched one or more times. goo+ = {goo, gooo, goooo, …}
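
A sketch with hypothetical words, using grep -E so that + is an operator and -x so that whole lines must match:

```shell
#!/bin/sh
# Hypothetical words
words='go
goo
gooo
gooooo
ga'

# goo+ = "go" followed by one or more further "o"s
result=$(printf '%s\n' "$words" | grep -Ex 'goo+')
printf '%s\n' "$result"
```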

  19. 2. ? The preceding item will be matched zero or one time. goog? = {goog, goo}
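
A sketch with made-up words; g? makes the final “g” optional:

```shell
#!/bin/sh
# Hypothetical words
words='goo
goog
googg
gog'

# -E enables "?", -x requires the whole line to match
result=$(printf '%s\n' "$words" | grep -Ex 'goog?')
printf '%s\n' "$result"
```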

  20. 3. | matches either the expression before it or the expression after it. goo|fav = {goo, fav}
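
A sketch with hypothetical words, first with -x (the whole line must be one alternative) and then without it (either alternative may appear anywhere in the line):

```shell
#!/bin/sh
# Hypothetical words
words='goo
fav
goofav
foo'

# Whole-line match: exactly "goo" or exactly "fav"
whole=$(printf '%s\n' "$words" | grep -Ex 'goo|fav')
# Substring match: any line containing "goo" or "fav"
anywhere=$(printf '%s\n' "$words" | grep -E 'goo|fav')

printf '%s\n' "$whole" "$anywhere"
```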

  21. 4. ( ) groups a subexpression. f(oo|ee)d = {food, feed}
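
A sketch with made-up words; the parentheses make | choose between “oo” and “ee” only, instead of splitting the whole pattern in two:

```shell
#!/bin/sh
# Hypothetical words
words='food
feed
fed
foed'

# f, then either "oo" or "ee", then d
result=$(printf '%s\n' "$words" | grep -Ex 'f(oo|ee)d')
printf '%s\n' "$result"
```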

  22. 5. {N} The preceding item is matched exactly N times. go\{2\} = {goo}; go\{5\} = {gooooo} (backslashed basic-grep form; with grep -E, write go{2} and go{5})
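
A sketch showing the basic (backslashed) and extended forms side by side on hypothetical words; both select the same line:

```shell
#!/bin/sh
# Hypothetical words
words='go
goo
gooo
gooooo'

# Basic syntax needs \{ \}; grep -E takes plain { }
basic=$(printf '%s\n' "$words" | grep -x 'go\{2\}')
extended=$(printf '%s\n' "$words" | grep -Ex 'go{2}')
# Exactly five "o"s after "g"
five=$(printf '%s\n' "$words" | grep -x 'go\{5\}')

printf '%s\n' "$basic" "$extended" "$five"
```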

  23. 6. {N,} The preceding item is matched N or more times.
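
A sketch with made-up words, using the basic-syntax form go\{2,\} (grep -E 'go{2,}' is the extended equivalent):

```shell
#!/bin/sh
# Hypothetical words
words='go
goo
gooo
gooooooo'

# "g" followed by two or more "o"s, with no upper limit
result=$(printf '%s\n' "$words" | grep -x 'go\{2,\}')
printf '%s\n' "$result"
```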

  24. 7. {N,M} The preceding item is matched at least N times, but not more than M times. go\{2,5\}g = {goog, gooog, goooog, gooooog}
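
A sketch checking the slide's set on hypothetical words; lines with one or six o's fall outside the range:

```shell
#!/bin/sh
# Hypothetical words: 1, 2, 5, and 6 "o"s between the two "g"s
words='gog
goog
gooooog
goooooog'

# Between two and five "o"s, then "g"
result=$(printf '%s\n' "$words" | grep -x 'go\{2,5\}g')
printf '%s\n' "$result"
```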

  25. Exercises • What does grep -n '^[^A-z] ' mean? • How do you find empty lines? • How do you find the literal string “[LUG2010]”? • Under /etc, find every file whose contents contain the symbol “*”, along with the matching lines.
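
One possible set of answers to exercises 2 to 4, offered as a sketch rather than the presenter's intended solutions (the sample text is made up; exercise 1 is left to the reader). Note that an unescaped [LUG2010] would be a bracket expression matching any single L, U, G, 2, 0, or 1:

```shell
#!/bin/sh
# Hypothetical file content, including one empty line
text='NCNU [LUG2010] meeting

LUG2010 without brackets'

# Exercise 2: ^$ matches a line with nothing between its start and end;
# -n prefixes each match with its line number
empty=$(printf '%s\n' "$text" | grep -n '^$')

# Exercise 3: escaping "[" makes it literal (a lone "]" is already ordinary);
# grep -F '[LUG2010]' would also work
literal=$(printf '%s\n' "$text" | grep '\[LUG2010]')

printf '%s\n' "$empty" "$literal"

# Exercise 4 (not run here): -r searches recursively, -n adds line numbers,
# and \* is a literal asterisk; 2>/dev/null hides errors for unreadable files
#   grep -rn '\*' /etc 2>/dev/null
```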

  26. References • http://linux.vbird.org/linux_basic/0330regularex.php • http://tldp.org/LDP/Bash-Beginners-Guide/html/chap_04.html • http://en.wikipedia.org/wiki/Regular_expression • http://www.regular-expressions.info/posix.html
