Lecture 3 Graph Representation for Regular Expressions


digraph (directed graph)
• A digraph is a pair of sets (V, E) such that each element of E is an ordered pair of elements of V.
• A path is an alternating sequence of vertices and edges, starting and ending with a vertex, in which each edge goes from the vertex before it to the vertex after it, so that all edges point in the same direction.

string-labeled digraph
• A string-labeled digraph is a digraph in which each edge is labeled by a string.
• In a string-labeled digraph, every path is associated with the string obtained by concatenating, in order, the labels of the edges on the path.
• This string is called the label of the path.

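G(r)
• For each regular expression r, we can construct a digraph G(r), with an initial vertex and a final vertex, whose edges are labeled by symbols and ε, as follows.
• If r = Φ, then G(r) is as shown in the figure below.
• If r ≠ Φ, then G(r) is built by the rules shown in the figure below.
[Figure: construction of G(r) for the cases r = Φ and r ≠ Φ; the rules use ε-edges, and Φ* appears as a label]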

Theorem 1
• G(r) has the property that a string x belongs to r if and only if x is the label of a path from the initial vertex to the final vertex.
• The proof is by induction on the structure of r.
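Theorem 1 can be checked on examples with a short sketch. The one below builds a graph in the spirit of G(r) using a Thompson-style construction; this is an assumption, and the lecture's own rules may differ in detail (for instance in how many ε-edges each operator introduces). The tuple encoding of r and the names build, eps_closure, and is_path_label are hypothetical. To decide whether x labels a path from s to f, it tracks the set of vertices reachable by paths whose label is the prefix of x read so far.

```python
from itertools import count

# Hypothetical encoding of regular expressions as nested tuples:
#   ("empty",)     Φ          ("eps",)       ε          ("sym", a)   a symbol a
#   ("+", r1, r2)  r1 + r2    (".", r1, r2)  r1 · r2    ("*", r1)    r1*
EPS = ""  # ε-edges carry the empty-string label


def build(r, edges, fresh):
    """Append a Thompson-style graph for r to `edges`; return (initial, final)."""
    op = r[0]
    if op == ".":
        # Concatenation: an ε-edge from the final vertex of r1's graph to the initial vertex of r2's.
        s1, f1 = build(r[1], edges, fresh)
        s2, f2 = build(r[2], edges, fresh)
        edges.append((f1, s2, EPS))
        return s1, f2
    s, f = next(fresh), next(fresh)
    if op == "empty":
        pass  # no edges, so no path from s to f
    elif op == "eps":
        edges.append((s, f, EPS))
    elif op == "sym":
        edges.append((s, f, r[1]))
    elif op == "+":
        for sub in (r[1], r[2]):
            s1, f1 = build(sub, edges, fresh)
            edges += [(s, s1, EPS), (f1, f, EPS)]
    elif op == "*":
        s1, f1 = build(r[1], edges, fresh)
        # Enter r1, loop back to repeat it, leave it, or skip it entirely (the empty string).
        edges += [(s, s1, EPS), (f1, s1, EPS), (f1, f, EPS), (s, f, EPS)]
    else:
        raise ValueError(f"unknown operator {op!r}")
    return s, f


def eps_closure(vertices, edges):
    """All vertices reachable from `vertices` using only ε-edges."""
    seen, stack = set(vertices), list(vertices)
    while stack:
        u = stack.pop()
        for a, b, label in edges:
            if a == u and label == EPS and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def is_path_label(edges, s, f, x):
    """True iff x is the label of some path from s to f (symbol labels are single characters)."""
    current = eps_closure({s}, edges)
    for ch in x:
        step = {b for a, b, label in edges if a in current and label == ch}
        current = eps_closure(step, edges)
    return f in current


# r = (0+1)*·0·0, the strings over {0, 1} that end in 00
r = (".", (".", ("*", ("+", ("sym", "0"), ("sym", "1"))), ("sym", "0")), ("sym", "0"))
edges = []
s, f = build(r, edges, count())
for x in ["00", "100", "010", ""]:
    print(repr(x), is_path_label(edges, s, f, x))
```

For this r the loop prints True for "00" and "100" and False for "010" and the empty string. Because ε-edges are followed without consuming input, every step takes an ε-closure.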

Graph Representation
• A graph representation of a regular expression r is a string-labeled digraph with an initial vertex s and a final vertex f such that a string x belongs to r if and only if x is the label of some path from s to f.

Corollary 2
• For any regular expression r, there exists a string-labeled digraph with two special vertices, an initial vertex s and a final vertex f, such that a string x belongs to r if and only if x is the label of some path from s to f.

Puzzle: If a regular expression r contains u "+"s, v "·"s, and w "*"s, how many ε-edges does G(r) contain?
Question: How can the number of ε-edges be reduced?

Theorem 3
• An ε-edge (u, v) in G(r) that is the unique out-edge of a nonfinal vertex u, or the unique in-edge of a noninitial vertex v, can be shrunk to a single vertex: u and v are merged and the edge is removed. (If u or v is the initial vertex or the final vertex, so is the resulting vertex.)
• Remark: Shrinking should be done one edge at a time.
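A minimal sketch of Theorem 3, assuming a single initial vertex and a single final vertex and edges stored as (tail, head, label) triples with ε as the empty string; the function names are hypothetical. Following the remark, it shrinks one ε-edge at a time and re-tests the condition on the updated graph before choosing the next edge, since a merge changes in- and out-degrees.

```python
EPS = ""  # ε-edges carry the empty-string label


def shrinkable(edges, edge, initial, final):
    """Theorem 3's condition for the edge (u, v)."""
    u, v, label = edge
    if label != EPS or u == v:
        return False  # only ε-edges between distinct vertices are considered
    unique_out = sum(1 for a, _, _ in edges if a == u) == 1
    unique_in = sum(1 for _, b, _ in edges if b == v) == 1
    return (unique_out and u != final) or (unique_in and v != initial)


def shrink_epsilon_edges(edges, initial, final):
    """Shrink qualifying ε-edges one at a time; return (edges, initial, final)."""
    edges = list(edges)
    while True:
        edge = next((e for e in edges if shrinkable(edges, e, initial, final)), None)
        if edge is None:
            return edges, initial, final
        u, v, _ = edge
        edges.remove(edge)
        # Merge v into u: every edge touching v now touches u instead.
        edges = [(u if a == v else a, u if b == v else b, label) for a, b, label in edges]
        # If v was the initial or final vertex, the merged vertex u takes over that role.
        if initial == v:
            initial = u
        if final == v:
            final = u


# s --ε--> a --0--> b --ε--> f
print(shrink_epsilon_edges([("s", "a", EPS), ("a", "b", "0"), ("b", "f", EPS)], "s", "f"))
```

Here both ε-edges qualify (each is the unique out-edge of a nonfinal vertex), so the result is the single edge s --0--> b with b as the new final vertex. Each shrink deletes one edge, so the loop always terminates.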

Lecture 4 Deterministic Finite Automata (DFA)

[Figure: a DFA, consisting of a tape, a tape head, and a finite control]

The tape is divided into finitely many cells. Each cell contains a symbol from an alphabet Σ.

• The head scans one cell of the tape at a time and can read the symbol in that cell. In each move, the head moves one cell to the right.

The finite control has finitely many states, which form a set Q. In each move, the state changes according to the transition function δ: Q × Σ → Q.

• δ(q, a) = p means that if the finite control is in state q and the head reads symbol a, then the next state is p and the head moves one cell to the right.

There are some special states: an initial state s and a set F of final states.
• Initially, the DFA is in the initial state s and the head scans the leftmost cell. The tape holds the input string.

• When the head moves off the right end of the tape, the DFA stops. An input string x is accepted by the DFA if the DFA stops in a final state.
• Otherwise, x is rejected.

The DFA can be represented by M = (Q, Σ, δ, s, F), where Σ is the alphabet of input symbols.
• The set of all strings accepted by a DFA M is denoted by L(M). We also say that the language L(M) is accepted by M.
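The tuple M = (Q, Σ, δ, s, F) translates directly into a small simulator. This is a sketch with hypothetical names (DFA, run, accepts), storing δ as a dictionary keyed by (state, symbol); the example machine, which accepts strings with an even number of 1s, is illustrative and not from the notes.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    """M = (Q, Σ, δ, s, F); δ is a dict mapping (state, symbol) to the next state."""
    states: set
    alphabet: set
    delta: dict
    start: str
    finals: set

    def run(self, x: str) -> str:
        """Scan x left to right, one move per symbol; return the state when the head leaves the tape."""
        q = self.start
        for a in x:
            if a not in self.alphabet:
                raise ValueError(f"symbol {a!r} is not in the alphabet")
            q = self.delta[(q, a)]
        return q

    def accepts(self, x: str) -> bool:
        """x is in L(M) iff the DFA stops in a final state."""
        return self.run(x) in self.finals


# Illustrative example: strings over {0, 1} with an even number of 1s.
M = DFA(
    states={"even", "odd"},
    alphabet={"0", "1"},
    delta={("even", "0"): "even", ("even", "1"): "odd",
           ("odd", "0"): "odd", ("odd", "1"): "even"},
    start="even",
    finals={"even"},
)
print([x for x in ["", "1", "11", "101", "0110"] if M.accepts(x)])
```

Because δ is total, run never gets stuck; the only failure is a symbol outside Σ, which the sketch reports as a ValueError.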

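The transition diagram of a DFA is an alternative way to represent the DFA.
• For M = (Q, Σ, δ, s, F), the transition diagram of M is a symbol-labeled digraph G = (V, E) satisfying the following:
  V = Q, where s is marked as the initial vertex and each f ∈ F is marked as a final vertex;
  E = { (q, p) with label a | δ(q, a) = p }, i.e., there is an edge from q to p labeled a whenever δ(q, a) = p.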
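Building the diagram from δ is mechanical: one labeled edge per entry of the transition table. The sketch below uses the δ of the lecture's example machine over Q = {s, p, q} and Σ = {0, 1}, assuming F = {q} (the only choice consistent with L(M) = (0+1)*00(0+1)*); the function name transition_diagram is hypothetical, and edges are stored as (q, a, p) triples.

```python
# δ of the example DFA M (states s, p, q; symbols 0, 1)
delta = {("s", "0"): "p", ("s", "1"): "s",
         ("p", "0"): "q", ("p", "1"): "s",
         ("q", "0"): "q", ("q", "1"): "q"}


def transition_diagram(states, alphabet, delta, start, finals):
    """Return (V, E, initial, finals) with E = {(q, a, p) : δ(q, a) = p}."""
    V = set(states)
    E = {(q, a, delta[(q, a)]) for q in states for a in alphabet}
    return V, E, start, set(finals)


V, E, initial, final_vertices = transition_diagram({"s", "p", "q"}, {"0", "1"}, delta, "s", {"q"})
for q, a, p in sorted(E):
    print(f"{q} --{a}--> {p}")
```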

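For M = (Q, Σ, δ, s, F) with Q = {s, p, q}, Σ = {0, 1}, F = {q}, and δ given by the table

δ | 0 | 1
s | p | s
p | q | s
q | q | q

we have L(M) = (0+1)*00(0+1)*, the set of strings that contain 00.
[Figure: transition diagram of M with vertices s, p, q]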
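The claim L(M) = (0+1)*00(0+1)* can be spot-checked by brute force, comparing the DFA against Python's re module (with (0+1)*00(0+1)* written as the pattern [01]*00[01]*) on every binary string of length at most 10. This is evidence on finitely many inputs, not a proof; the helper name accepts is hypothetical.

```python
import re
from itertools import product

# δ and F of the example DFA M
delta = {("s", "0"): "p", ("s", "1"): "s",
         ("p", "0"): "q", ("p", "1"): "s",
         ("q", "0"): "q", ("q", "1"): "q"}
F = {"q"}


def accepts(x):
    """Run M from the initial state s and test whether it stops in F."""
    state = "s"
    for a in x:
        state = delta[(state, a)]
    return state in F


pattern = re.compile(r"[01]*00[01]*")  # (0+1)*00(0+1)* in re syntax

mismatches = [
    x
    for n in range(11)
    for x in ("".join(t) for t in product("01", repeat=n))
    if accepts(x) != bool(pattern.fullmatch(x))
]
print(len(mismatches))  # 0
```

Informally, state s means "no 00 yet and the last symbol was not 0", p means "no 00 yet and the last symbol was 0", and q means "00 has occurred", which is why the two agree.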

The transition diagram of the DFA M has the following properties:
• For every vertex q and every symbol a, there is exactly one edge with label a leaving q.
• For each string x, there is exactly one path starting from the initial state s that is associated with x.
• A string x is accepted by M if and only if this path ends at a final state.
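The three properties suggest reading acceptance directly off the diagram: starting at s, follow for each symbol the only out-edge carrying that symbol. A sketch, reusing the example M's δ with F = {q} and storing edges as (q, a, p) triples; the function name path_from_start is hypothetical, and it raises an error if the "exactly one edge" property fails.

```python
# Edges of the transition diagram of the example DFA M, with F = {q}
delta = {("s", "0"): "p", ("s", "1"): "s",
         ("p", "0"): "q", ("p", "1"): "s",
         ("q", "0"): "q", ("q", "1"): "q"}
E = [(q, a, p) for (q, a), p in delta.items()]
F = {"q"}


def path_from_start(E, start, x):
    """Return the vertices of the unique path from `start` whose label is x."""
    path = [start]
    for a in x:
        nxt = [p for (q, b, p) in E if q == path[-1] and b == a]
        if len(nxt) != 1:
            raise ValueError(f"vertex {path[-1]!r} has {len(nxt)} edges labeled {a!r}")
        path.append(nxt[0])
    return path


path = path_from_start(E, "s", "1001")
print(path, path[-1] in F)
```

For x = 1001 the path visits s, s, p, q, q and ends at the final vertex q, so 1001 is accepted.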
