
CS 311 – Lecture 06 Outline


amato

Presentation Transcript


  1. CS 311 – Lecture 06 Outline
     • Regular Expressions
     • Worksheet 1 practice
     CS 311 - Operating Systems I

  2. What is a Regexp?
     • A regular expression (regex) is a pattern that describes a set of strings.
     • It is used to describe search patterns.
     • Useful for
       • extracting information from log files
       • generating reports

  3. Defining regex
     • Regex patterns are built from literal characters and meta-characters.
     • Examples of literal characters: letters, digits, and symbols that have no special meaning.
     • Examples of meta-characters:
       • [] - defines a character set or range
       • () - groups patterns and captures them for backreferencing
       • \ - escapes the next character
       • ^ - matches the beginning of a string (or negates the set when it is the first character inside [])
       • $ - matches the end of a string
       • | - OR (alternation)
       • ? - matches 0 or 1 occurrence of the preceding element
       • * - matches 0 or more occurrences of the preceding element
       • + - matches 1 or more occurrences of the preceding element
       • . - matches any single character (except newline "\n")
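The meta-characters above can be tried out with a short script. This is a minimal sketch using Python's standard re module; the slides do not name a regex flavor, so the choice of Python and all sample strings are assumptions made for illustration.

```python
import re

# [] defines a character set: any one lowercase vowel.
vowels = re.findall(r"[aeiou]", "regex")

# ^ and $ anchor the pattern to the start and end of the string.
anchored = re.search(r"^cat$", "cat") is not None
not_anchored = re.search(r"^cat$", "concat") is not None

# ^ as the first character inside [] negates the set.
non_digits = re.findall(r"[^0-9]", "a1b2")

# | is alternation (OR).
pets = re.findall(r"cat|dog", "cat, dog, bird")

# ?, * and + quantify the element that precedes them.
optional_u = re.findall(r"colou?r", "color colour")
zero_or_more = re.fullmatch(r"ab*", "a") is not None   # b may be absent
one_or_more = re.fullmatch(r"ab+", "a") is not None    # at least one b needed

# . matches any single character except a newline (by default).
dot = re.findall(r"a.c", "abc a\nc a-c")

# \ escapes a meta-character so that it is matched literally.
literal_dot = re.findall(r"1\.5", "1.5 and 1x5")

print(vowels, non_digits, pets, optional_u, dot, literal_dot)
```

Note that ?, * and + never stand alone: each one applies to the character, set, or group immediately before it.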

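  4. Regex Rules
     • Quantifiers are greedy by default: starting from the leftmost position where a match is possible, the engine returns the longest match it can.
     • Most meta-characters, such as +, *, ., ?, | and $, are ordinary characters inside []. In Perl-style engines \ still acts as an escape inside [], and ^, - and ] can keep a special meaning depending on their position.
     • All characters inside \Q..\E are interpreted as literal characters (supported by Perl, PCRE and Java, but not by every regex flavor).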
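A minimal sketch of these rules in Python's re module (an assumed flavor; the sample strings are made up). Python's re does not support \Q..\E, so re.escape() is used here as the closest counterpart, which is a substitution and not the construct the slide describes.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* extends as far as it can while still allowing a match,
# so the match runs from the first '<' to the last '>'.
greedy = re.search(r"<.*>", text).group()

# Lazy variant, for contrast: .*? stops at the first closing '>'.
lazy = re.search(r"<.*?>", text).group()

# Inside [], characters such as + * . ? | $ are ordinary characters.
symbols = re.findall(r"[+*.?|$]", "a+b*c.d?e|f$")

# Stand-in for \Q..\E: re.escape() backslash-escapes every
# meta-character so the whole string is matched literally.
needle = "1+1=2?"
found = re.search(re.escape(needle), "Is 1+1=2? Yes.") is not None

print(greedy, lazy, symbols, found)
```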

  5. More meta-characters
     • \A and \Z - match the start and end of the whole string, not of each line, so newlines (\n) inside the string do not affect them. In some flavors \Z also matches just before a trailing newline.
     • \b..\b - word boundaries, used to perform a whole-word search
     • \d - same as [0-9] (digits)
     • \w - same as [0-9a-zA-Z_] (word characters: alphanumerics and underscore)
     • {m,n} - repetition (the preceding element may occur m to n times)
       Example: [0-9]{2,3} matches a run of two or three digits, such as 45 or 100. [0-9]{2} matches exactly two digits.
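The following sketch exercises each of these meta-characters in Python's re module (an assumed flavor; the sample strings are illustrative). The re.ASCII flag is passed so that \d and \w are limited to the ASCII ranges listed above, since Python otherwise also accepts Unicode digits and letters.

```python
import re

text = "first line\nsecond line"

# With re.MULTILINE, ^ matches at the start of every line ...
line_starts = re.findall(r"^\w+", text, re.MULTILINE)
# ... while \A still matches only at the start of the whole string.
string_start = re.findall(r"\A\w+", text, re.MULTILINE)
# In Python, \Z matches only at the very end of the string.
string_end = re.findall(r"\w+\Z", text, re.MULTILINE)

# \b restricts the match to whole words.
whole_word = re.findall(r"\bcat\b", "cat concat cats cat.")

# \d and \w, limited to ASCII by the re.ASCII flag.
digits = re.findall(r"\d", "a1b22", re.ASCII)
words = re.findall(r"\w+", "snake_case, x9!", re.ASCII)

# {m,n} repetition; \b keeps longer numbers from matching in part.
two_or_three = re.findall(r"\b[0-9]{2,3}\b", "7 45 100 2024")
exactly_two = re.findall(r"\b[0-9]{2}\b", "7 45 100 2024")

print(line_starts, string_start, string_end)
print(whole_word, digits, words, two_or_three, exactly_two)
```

Without the \b anchors, [0-9]{2,3} would also match the first three digits of 2024, because a regex finds its pattern anywhere inside the text unless it is anchored.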

  6. Backreferencing
     • Patterns within () are captured and stored by the regex engine.
     • The stored results can be accessed using \1, \2, \3, etc.
     • Examples:
       • \d([:-])\d\1\d matches 1-2-3 or 4:9:8, but does not match 4-8:9.
       • (\w)([:-])\1\2\w matches A-A-y or 0:0:g, but does not match A-B:C.
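Both backreference patterns can be checked directly. This is a small sketch in Python's re module (an assumed flavor); the variable names are hypothetical, while the patterns and test strings come from the slide.

```python
import re

# \1 must repeat whatever separator group 1 captured.
same_separator = re.compile(r"\d([:-])\d\1\d")
results = {
    s: bool(same_separator.fullmatch(s))
    for s in ["1-2-3", "4:9:8", "4-8:9"]
}

# Two groups: \1 repeats the first character, \2 repeats the separator.
repeated = re.compile(r"(\w)([:-])\1\2\w")
match = repeated.fullmatch("A-A-y")
captured = match.groups()                 # text stored for groups 1 and 2
rejected = repeated.fullmatch("A-B:C")    # no match: B != A and : != -

print(results, captured, rejected)
```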

  7. Example
     • String: yabadabadoo
     • Patterns
       • y.*ba - matches yabadaba
       • a.*da - matches abada
       • ^b - does not match anything
       • ^.*ba - matches yabadaba (^.ba, without the *, would not match, because the second character of the string is a, not b)
       • [abd] - matches each a, b or d in turn: a, b, a, d, a, b, a, d
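The example can be reproduced with Python's re module (an assumed flavor; the slide does not say which engine it uses). The sketch also runs both `^.ba` and `^.*ba`: in Python the former finds no match in this string, while the latter returns yabadaba.

```python
import re

s = "yabadabadoo"

# Greedy .* runs up to the last "ba" in the string.
m1 = re.search(r"y.*ba", s).group()

# Leftmost match starts at the first "a"; .* runs up to the last "da".
m2 = re.search(r"a.*da", s).group()

# The string starts with "y", so ^b cannot match.
m3 = re.search(r"^b", s)

# ^.ba needs "b" as the second character, but it is "a".
m4 = re.search(r"^.ba", s)

# With the quantifier, ^.*ba matches up to the last "ba".
m5 = re.search(r"^.*ba", s).group()

# A character set matches one character at a time.
m6 = re.findall(r"[abd]", s)

print(m1, m2, m3, m4, m5, m6)
```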

  8. Let's solve the worksheet!
