Lecture 5 Regular Expressions

Lecture 5 Regular Expressions. CSCI – 1900 Mathematics for Computer Science Fall 2014 Bill Pine. Lecture Introduction. Reading Rosen - Section 13.4 ( pages 879 - 880 ) Review of Strings Regular Expressions. Review of Strings.

Presentation Transcript


  1. Lecture 5 Regular Expressions CSCI – 1900 Mathematics for Computer Science Fall 2014 Bill Pine

  2. Lecture Introduction • Reading • Rosen - Section 13.4 (pages 879 - 880) • Review of Strings • Regular Expressions CSCI 1900

  3. Review of Strings • Recall: a string is a sequence of letters or symbols written without commas • Example: the sequence of characters W, a, k, e, (space), u, p is represented by the string “Wake up” • Another example: a, b, a, b, a, b, a, … is a sequence, and “abababa…” is the corresponding string • The set corresponding to this sequence is {a, b}
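
The two examples on this slide can be checked with a short Python sketch. The variable names are chosen only for illustration; Python's built-in `str` and `set` types stand in for the strings and sets of the lecture.

```python
# The sequence W, a, k, e, (space), u, p written without commas is a string.
sequence = ["W", "a", "k", "e", " ", "u", "p"]
wake_up = "".join(sequence)
print(wake_up)  # Wake up

# The set corresponding to a sequence keeps each distinct symbol once.
abab = "abababa"
corresponding_set = set(abab)
print(sorted(corresponding_set))  # ['a', 'b']
```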

  4. Strings and Regular Expressions • Given a set A, A* is the set of all finite sequences of elements of A • Example: • A = alphabet = {a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z} • A* = words (the finite sequences from A, written without commas) • A* contains all possible words, even those that are unpronounceable or make no sense, such as “prsartkc” • The empty sequence or empty string is represented with Λ
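
As a rough illustration of A*, the following Python sketch lists the finite strings over a small alphabet, shortest first. The function name `strings_up_to` and the two-letter alphabet are assumptions made for the example. A* itself is infinite, so the sketch stops at a chosen maximum length.

```python
from itertools import product

def strings_up_to(alphabet, max_len):
    """Return all strings over `alphabet` of length 0..max_len, shortest first."""
    result = []
    for n in range(max_len + 1):
        # product(..., repeat=n) yields every length-n sequence of symbols;
        # for n == 0 it yields one empty sequence, i.e. the empty string.
        for seq in product(sorted(alphabet), repeat=n):
            result.append("".join(seq))
    return result

words = strings_up_to({"a", "b"}, 2)
print(words)  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

The empty string appears first because it is the only sequence of length zero.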

  5. Catenation • Two strings may be joined into a single string • Assume w1 = s1s2s3s4…sn and w2 = t1t2t3t4…tk • The catenation of w1 with w2 is the sequence s1s2s3s4…snt1t2t3t4…tk • Notation: the catenation of w1 with w2 is written as w1·w2 or w1w2 • Example • w1 = Bat, w2 = woman • w1w2 = Batwoman

  6. Catenation (cont) • In many computer languages, an operator such as + or the pipe symbol (written || in SQL) denotes catenation • Catenation is also called concatenation
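
In Python, for instance, the `+` operator catenates strings. A minimal sketch using the Batwoman example from the previous slide:

```python
w1 = "Bat"
w2 = "woman"

# Catenation: the symbols of w1 followed by the symbols of w2.
catenated = w1 + w2
print(catenated)  # Batwoman

# The length of a catenation is the sum of the two lengths (n + k).
print(len(catenated) == len(w1) + len(w2))  # True
```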

  7. Some Properties of Catenation • If w1, w2 are elements of A*, then w1w2 is an element of A* • wΛ = w and Λw = w, where Λ is the null string • A subset B of A* has its own set B*, which contains sentences made up from the words of B • For example: B = {Kirk, Spock, Flies, Runs, Well, Ship} is a subset of A*, where A = Latin alphabet. The string “KirkRunsWell” is an element of B*
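
These properties can be sketched in Python. The helper `in_star` is a hypothetical name introduced here, not part of the lecture. It tests whether a string is a catenation of zero or more words of B, with the Python empty string `""` playing the role of the null string.

```python
def in_star(s, words):
    """Return True if `s` is a catenation of zero or more elements of `words`."""
    # reachable[i] is True when the prefix s[:i] is a catenation of words.
    reachable = [False] * (len(s) + 1)
    reachable[0] = True  # the empty prefix is the catenation of zero words
    for i in range(len(s)):
        if not reachable[i]:
            continue
        for w in words:
            # Skip empty words; extend the prefix by any word that fits at i.
            if w and s.startswith(w, i):
                reachable[i + len(w)] = True
    return reachable[len(s)]

B = {"Kirk", "Spock", "Flies", "Runs", "Well", "Ship"}

# The null string is an identity for catenation.
null = ""
print("Kirk" + null == "Kirk" and null + "Kirk" == "Kirk")  # True

print(in_star("KirkRunsWell", B))  # True
print(in_star("KirkRun", B))       # False
```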

  8. Regular Expressions • The following is from http://etext.lib.virginia.edu/helpsheets/regex.html: "Regular expressions trace back to the work of an American mathematician by the name of Stephen Kleene (one of the most influential figures in the development of theoretical computer science) who developed regular expressions as a notation for describing what he called 'the algebra of regular sets.' His work eventually found its way into some early efforts with computational search algorithms, and from there to some of the earliest text-manipulation tools on the Unix platform (including ed and grep). In the context of computer searches, the '*' is formally known as a 'Kleene star.'"

  9. Regular Expressions (cont) • A regular expression on a set A is a recursive formula for a sequence • A regular expression consists of • The elements of A, • And the symbols ( , ) , ∨ , * , Λ • These symbols have the following interpretations • ( and ) are grouping symbols • ∨ is the OR symbol • * means zero or more catenations • Λ is the null string
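
For comparison only, Python's `re` module uses a related but richer syntax. In the sketch below, `|` plays the role of the OR symbol and `*` means zero or more catenations. Treating this as equivalent to the lecture's notation is an assumption of the example, and the pattern `(ab|c)*` is made up for illustration.

```python
import re

# Zero or more catenations of "ab" or "c".
pattern = re.compile(r"(ab|c)*")

for candidate in ["", "ab", "abc", "cab", "ba"]:
    # fullmatch succeeds only if the whole string fits the pattern.
    print(repr(candidate), bool(pattern.fullmatch(candidate)))
```

Only "ba" fails to match, because it cannot be split into pieces that are each "ab" or "c". The empty string matches as zero catenations.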

  10. Regular Expressions (cont) • An expression is regular if it can be constructed according to the following five rules • The symbol Λ is a regular expression (RE1) • If x ∈ A, the symbol x is a regular expression (RE2) • If α and β are regular expressions, then the expression αβ is regular (RE3) • If α and β are regular expressions, then the expression (α∨β) is regular (RE4) • If α is a regular expression, then the expression (α)* is regular (RE5)
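
One way to picture the five rules is as a recursive data type with one constructor per rule. This is a minimal sketch, not the lecture's own code: the class names `Empty`, `Sym`, `Cat`, `Or`, `Star` and the function `show` are assumptions, and the sketch writes Λ for the null string and ∨ for OR.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Empty:          # RE1: the null-string symbol
    pass

@dataclass(frozen=True)
class Sym:            # RE2: a single element x of A
    ch: str

@dataclass(frozen=True)
class Cat:            # RE3: catenation of two regular expressions
    left: object
    right: object

@dataclass(frozen=True)
class Or:             # RE4: OR of two regular expressions
    left: object
    right: object

@dataclass(frozen=True)
class Star:           # RE5: zero or more catenations
    inner: object

def show(e):
    """Write an expression as a string, adding parentheses as RE4 and RE5 do."""
    if isinstance(e, Empty):
        return "Λ"
    if isinstance(e, Sym):
        return e.ch
    if isinstance(e, Cat):
        return show(e.left) + show(e.right)
    if isinstance(e, Or):
        return "(" + show(e.left) + "∨" + show(e.right) + ")"
    if isinstance(e, Star):
        return "(" + show(e.inner) + ")*"
    raise TypeError(f"not a regular expression: {e!r}")

# Built with RE2 three times, then RE4, RE5, and RE3.
expr = Cat(Sym("a"), Star(Or(Sym("b"), Sym("c"))))
print(show(expr))  # a((b∨c))*
```

The doubled parentheses in the output come from applying RE4 and then RE5 literally, since each rule adds its own pair.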

  11. Regular Expressions (cont) • A regular expression over A corresponds to a subset of A* • This is called a regular subset of A*, or just a regular set • These subsets are built by rules that correspond to the five rules above
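
The slide does not spell out the corresponding rules. The sketch below assumes the standard correspondence: the null string gives the set containing only the null string, a symbol x gives {x}, catenation gives all catenations of a string from the first set with a string from the second, OR gives the union, and star gives zero or more catenations. Expressions are encoded as nested tuples, and both the encoding and the name `language` are hypothetical choices for illustration. Because regular sets can be infinite, only strings up to `max_len` are produced.

```python
def language(expr, max_len):
    """Strings of length <= max_len in the regular set described by `expr`."""
    kind = expr[0]
    if kind == "empty":                      # RE1: the set containing only ""
        return {""}
    if kind == "sym":                        # RE2: the set {x}
        return {expr[1]} if max_len >= 1 else set()
    if kind == "cat":                        # RE3: every catenation st
        left = language(expr[1], max_len)
        right = language(expr[2], max_len)
        return {s + t for s in left for t in right if len(s + t) <= max_len}
    if kind == "or":                         # RE4: union of the two sets
        return language(expr[1], max_len) | language(expr[2], max_len)
    if kind == "star":                       # RE5: zero or more catenations
        base = language(expr[1], max_len)
        result = {""}
        frontier = {""}
        while frontier:
            # Extend the newest strings by one more element of the base set.
            frontier = {s + t for s in frontier for t in base
                        if len(s + t) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown expression kind: {kind!r}")

# a(b OR c)* over A = {a, b, c}
expr = ("cat", ("sym", "a"), ("star", ("or", ("sym", "b"), ("sym", "c"))))
print(sorted(language(expr, 2)))  # ['a', 'ab', 'ac']
```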

  12. Key Concepts Summary • Review of Strings • Regular Expressions
