Lecture 5 Regular Expressions. CSCI – 1900 Mathematics for Computer Science Fall 2014 Bill Pine. Lecture Introduction. Reading Rosen - Section 13.4 ( pages 879 - 880 ) Review of Strings Regular Expressions. Review of Strings.
Lecture 5 Regular Expressions
CSCI – 1900 Mathematics for Computer Science
Fall 2014
Bill Pine
Lecture Introduction
• Reading
  • Rosen - Section 13.4 (pages 879 - 880)
• Review of Strings
• Regular Expressions
CSCI 1900
Review of Strings
• Recall: a string is a sequence of letters or symbols written without commas
• Example:
  • The sequence of characters W, a, k, e, (space), u, p
  • is represented by the string “Wake up”
• Another example:
  • a, b, a, b, a, b, a, … is a sequence, and “abababa…” is the corresponding string
  • The corresponding set of symbols is {a, b}
Strings and Regular Expressions
• Given a set A, A* is the set of all finite sequences of elements of A
• Example:
  • A = alphabet = {a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z}
  • A* = words (the finite sequences from A, written without commas)
  • A* contains all possible words, even those that are unpronounceable or make no sense, such as “prsartkc”
• The empty sequence, or empty string, is represented by Λ
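A* is infinite, so it cannot be listed in full, but its words up to a chosen length can be. The following is a minimal sketch of that idea; the function name `words_up_to` and the length bound are assumptions made for illustration, not part of the lecture.

```python
from itertools import product


def words_up_to(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        # product(..., repeat=0) yields one empty tuple, i.e. the empty string
        for letters in product(sorted(alphabet), repeat=n):
            yield "".join(letters)


A = {"a", "b"}
print(list(words_up_to(A, 2)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

The first word produced is the empty string, which is an element of A* for every set A.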
Catenation
• Two strings may be joined into a single string
• Assume w₁ = s₁s₂s₃s₄…sₙ and w₂ = t₁t₂t₃t₄…tₖ
• The catenation of w₁ with w₂ is the sequence s₁s₂s₃s₄…sₙt₁t₂t₃t₄…tₖ
• Notation: the catenation of w₁ with w₂ is written as w₁·w₂ or w₁w₂
• Example:
  • w₁ = Bat, w₂ = woman
  • w₁w₂ = Batwoman
Catenation (cont)
• In many computer languages, catenation is denoted by the + symbol or by the pipe symbol (doubled, as ||, in SQL)
• Catenation is sometimes referred to as concatenation
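As a small illustration, Python is one language that writes catenation with +. The helper name `catenate` below is an assumption introduced for this sketch.

```python
def catenate(w1, w2):
    """Return the catenation of w1 with w2: the symbols of w1, then those of w2."""
    return w1 + w2


w1 = "Bat"
w2 = "woman"
print(catenate(w1, w2))                             # Batwoman
print(len(catenate(w1, w2)) == len(w1) + len(w2))   # True
```

The length of the result is n + k, matching the definition of catenation for strings of lengths n and k.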
Some Properties of Catenation
• If w₁, w₂ are elements of A*, then w₁w₂ is an element of A*
• w·Λ = w and Λ·w = w, where Λ is the null string
• A subset B of A* has its own set B*, which contains sentences made up from the words of B
• For example:
  • B = {Kirk, Spock, Flies, Runs, Well, Ship} is a subset of A*, where A = Latin alphabet
  • The string “KirkRunsWell” is an element of B*
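These properties can be checked mechanically. The sketch below tests the null-string identity and decides membership in B* by trying to split a string into words of B; the function name `in_star` and the table-filling approach are assumptions chosen for illustration.

```python
EMPTY = ""  # plays the role of the null string


def in_star(word, B):
    """Return True if `word` is a catenation of zero or more elements of B."""
    # reachable[i] is True when word[:i] can be split into words from B
    reachable = [False] * (len(word) + 1)
    reachable[0] = True  # the empty string is always in B*
    for i in range(len(word)):
        if reachable[i]:
            for b in B:
                if b and word.startswith(b, i):
                    reachable[i + len(b)] = True
    return reachable[len(word)]


B = {"Kirk", "Spock", "Flies", "Runs", "Well", "Ship"}
w = "KirkRunsWell"

assert w + EMPTY == w and EMPTY + w == w  # the null string is an identity
print(in_star("KirkRunsWell", B))  # True
print(in_star("KirkRun", B))       # False
```

"KirkRun" is rejected because "Run" is not a word of B, even though every letter comes from A.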
Regular Expressions
• The following is from http://etext.lib.virginia.edu/helpsheets/regex.html:
  "Regular expressions trace back to the work of an American mathematician by the name of Stephen Kleene (one of the most influential figures in the development of theoretical computer science) who developed regular expressions as a notation for describing what he called 'the algebra of regular sets.' His work eventually found its way into some early efforts with computational search algorithms, and from there to some of the earliest text-manipulation tools on the Unix platform (including ed and grep). In the context of computer searches, the '*' is formally known as a 'Kleene star.'"
Regular Expressions (cont)
• A regular expression on a set A is a recursive formula for a sequence
• A regular expression consists of
  • The elements of A,
  • And the symbols ( , ) , ∨ , * , Λ
• These symbols have the following interpretations
  • ( and ) are grouping symbols
  • ∨ is the OR symbol
  • * means zero or more catenations
  • Λ is the null string
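Most programming tools use a slightly different spelling of the same symbols. As a hedged sketch, Python's standard `re` module writes OR as | and keeps ( ), and * with the meanings above; the expression a(b ∨ c)* is an example chosen here, not one from the lecture.

```python
import re

# The slide-notation expression a(b ∨ c)* written in Python's syntax
pattern = re.compile(r"a(b|c)*")

for word in ["a", "abcb", "ba", ""]:
    # fullmatch succeeds only if the whole word fits the pattern
    print(repr(word), bool(pattern.fullmatch(word)))
# 'a' True
# 'abcb' True
# 'ba' False
# '' False
```

The word "a" matches because * allows zero catenations of (b ∨ c).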
Regular Expressions (cont)
• An expression is regular if it can be constructed according to the following five rules
  • The symbol Λ is a regular expression (RE1)
  • If x ∈ A, the symbol x is a regular expression (RE2)
  • If α and β are regular expressions, then the expression αβ is regular (RE3)
  • If α and β are regular expressions, then the expression (α ∨ β) is regular (RE4)
  • If α is a regular expression, then the expression (α)* is regular (RE5)
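Because the five rules are recursive, a recursive function can check them. The sketch below assumes a hypothetical nested-tuple encoding of expressions; the tags "lambda", "sym", "cat", "or", and "star" are illustrative names, not notation from the lecture.

```python
def is_regular(expr, A):
    """Return True if `expr` can be built from rules RE1-RE5 over the set A."""
    if not isinstance(expr, tuple) or not expr:
        return False
    tag = expr[0]
    if tag == "lambda":               # RE1: the null-string symbol
        return len(expr) == 1
    if tag == "sym":                  # RE2: a single element x of A
        return len(expr) == 2 and expr[1] in A
    if tag in ("cat", "or"):          # RE3: catenation, RE4: OR
        return (len(expr) == 3
                and is_regular(expr[1], A)
                and is_regular(expr[2], A))
    if tag == "star":                 # RE5: zero or more catenations
        return len(expr) == 2 and is_regular(expr[1], A)
    return False


def show(expr):
    """Render the nested-tuple form in the slide's notation."""
    tag = expr[0]
    if tag == "lambda":
        return "Λ"
    if tag == "sym":
        return expr[1]
    if tag == "cat":
        return show(expr[1]) + show(expr[2])
    if tag == "or":
        return "(" + show(expr[1]) + " ∨ " + show(expr[2]) + ")"
    return "(" + show(expr[1]) + ")*"


A = {"a", "b", "c"}
expr = ("cat", ("sym", "a"), ("star", ("or", ("sym", "b"), ("sym", "c"))))
print(show(expr))                     # a((b ∨ c))*
print(is_regular(expr, A))            # True
print(is_regular(("sym", "d"), A))    # False
```

The doubled parentheses in the output come from applying RE4 and RE5 literally; in practice the redundant pair is usually dropped.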
Regular Expressions (cont)
• A regular expression over A corresponds to a subset of A*
• This subset is called a regular subset of A*, or just a regular set
• Each regular set is built by rules that correspond to the previous five rules
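One way to see the correspondence is to compute the regular set directly from an expression. The sketch below uses the standard correspondence (Λ gives the set containing only the empty string, a symbol x gives {x}, catenation joins every pair, OR takes the union, and * takes all finite catenations) and truncates at a maximum length so the result stays finite. The nested-tuple encoding with tags "lambda", "sym", "cat", "or", and "star", and the name `regular_set`, are assumptions for illustration.

```python
def regular_set(expr, max_len):
    """Return the strings of length <= max_len in the set described by `expr`."""
    tag = expr[0]
    if tag == "lambda":
        return {""}
    if tag == "sym":
        return {expr[1]} if len(expr[1]) <= max_len else set()
    if tag == "or":
        return regular_set(expr[1], max_len) | regular_set(expr[2], max_len)
    if tag == "cat":
        left = regular_set(expr[1], max_len)
        right = regular_set(expr[2], max_len)
        return {u + v for u in left for v in right if len(u + v) <= max_len}
    if tag == "star":
        base = regular_set(expr[1], max_len)
        result = {""}       # zero catenations
        frontier = {""}
        while frontier:     # extend only the strings found in the last round
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown tag: {tag!r}")


# The expression a(b ∨ c)* in the assumed encoding
expr = ("cat", ("sym", "a"), ("star", ("or", ("sym", "b"), ("sym", "c"))))
print(sorted(regular_set(expr, 2), key=lambda w: (len(w), w)))
# ['a', 'ab', 'ac']
```

Raising `max_len` reveals more of the set; the full regular set for this expression is infinite.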
Key Concepts Summary
• Review of Strings
• Regular Expressions