CYK Algorithm & CFL Reachability • By Lohit Krishnan, Chetas Mahajan
Outline • CYK Algorithm • Background • Problem statement • Intuition • Terminology • Formal description and example
Background • Named after Cocke, Younger, and Kasami. • Some notable properties: • It shows that deciding whether s ∈ L(G) is in P for any CFG G in Chomsky normal form (CNF). • It uses dynamic programming (a “table-filling” algorithm) to solve this decision problem.
Problem Statement • Given the CFG G: S -> AB | BC; A -> BA | a; B -> CC | b; C -> AB | a • Let L be the language generated by G. • Is the string “baaba” a member of L? • How many substrings of “baaba” are members of L? • How many distinct substrings of “baaba” are members of L? • How many non-empty substrings of “baaba” are not members of L? • How many substrings of “baaba” are generated only by B?
Problem Statement • Given a context-free grammar G and a string w • G = (V, Σ, P, S), where • V is a finite set of variables • Σ (the alphabet) is a finite set of terminal symbols • P is a finite set of rules • S is the start symbol (a distinguished element of V) • V and Σ are assumed to be disjoint • G generates the strings of the language L(G) • Is w ∈ L(G)? (Membership problem)
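As a minimal sketch, the tuple G = (V, Σ, P, S) can be written down as plain Python data. The names `V`, `SIGMA`, `P`, `START` and the helper `is_cnf` are illustrative assumptions, not part of the slides; the rules are those of the example grammar from the first problem statement. The check ignores the optional rule S -> ε that some CNF definitions allow.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding of G = (V, Sigma, P, S) for the example grammar.
V: Set[str] = {"S", "A", "B", "C"}
SIGMA: Set[str] = {"a", "b"}
P: Dict[str, List[Tuple[str, ...]]] = {
    "S": [("A", "B"), ("B", "C")],
    "A": [("B", "A"), ("a",)],
    "B": [("C", "C"), ("b",)],
    "C": [("A", "B"), ("a",)],
}
START = "S"


def is_cnf(variables: Set[str], alphabet: Set[str],
           rules: Dict[str, List[Tuple[str, ...]]]) -> bool:
    """True if every rule is X -> YZ (two variables) or X -> a (one terminal)."""
    for head, bodies in rules.items():
        if head not in variables:
            return False
        for body in bodies:
            two_vars = len(body) == 2 and all(s in variables for s in body)
            one_term = len(body) == 1 and body[0] in alphabet
            if not (two_vars or one_term):
                return False
    return True
```

CYK as presented here expects a grammar for which `is_cnf` returns `True`.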
Terminology • Let n be the length of the string w. • Partition the string using n+1 lines: one before each character and one after the last. • Number the lines from 0 to n. • Now define: • xij as the substring of w that lies between lines i and j (here i < j). • Tij as the set of non-terminals that generate xij.
Terminology • Grammar: S -> AB | BC; A -> BA | a; B -> CC | b; C -> AB | a • String to be checked: “baaba”. • Lines: 0 b 1 a 2 a 3 b 4 a 5 • x13 = aa • x35 = ba • x05 = baaba • T23 = set of non-terminals generating x23 (i.e. “a”) = { A, C } • Build a table T of the sets Tij for 0 ≤ i ≤ n-1, 1 ≤ j ≤ n, i < j.
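With lines numbered 0 to n, xij corresponds exactly to a Python slice. A small sketch (the helper name `x` and the list `cells` are illustrative):

```python
w = "baaba"
n = len(w)


def x(i: int, j: int) -> str:
    """Substring of w between line i and line j (0 <= i < j <= n)."""
    assert 0 <= i < j <= n
    return w[i:j]


# Index pairs (i, j) of the table T: one cell per non-empty substring.
cells = [(i, j) for i in range(n) for j in range(i + 1, n + 1)]
```

For n = 5 this gives n(n+1)/2 = 15 cells.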
Intuition of the algorithm • The sets Tij are the subproblems of the dynamic program. • We need to decide whether the start symbol S belongs to T0n. • Formulation of the DP: • Base case (j = i+1): Tij = { X | X -> xij is a rule } • T(T1 T2) = { X | X -> t1 t2 is a rule, t1 ∈ T1 and t2 ∈ T2 } • For j > i+1: $T_{ij} = \bigcup_{k=i+1}^{j-1} T(T_{ik} T_{kj})$
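The recurrence can be sketched directly in Python. This is a minimal illustration for the example grammar only; the lookup tables `BINARY` and `UNARY` and the function names are assumptions made for this sketch, not notation from the slides.

```python
from typing import Dict, Set, Tuple

# Example grammar, indexed by rule body (assumed encoding).
BINARY: Dict[Tuple[str, str], Set[str]] = {
    ("A", "B"): {"S", "C"},   # S -> AB, C -> AB
    ("B", "C"): {"S"},        # S -> BC
    ("B", "A"): {"A"},        # A -> BA
    ("C", "C"): {"B"},        # B -> CC
}
UNARY: Dict[str, Set[str]] = {"a": {"A", "C"}, "b": {"B"}}


def combine(t1: Set[str], t2: Set[str]) -> Set[str]:
    """T(T1 T2): heads X of rules X -> t1 t2 with t1 in T1 and t2 in T2."""
    out: Set[str] = set()
    for y in t1:
        for z in t2:
            out |= BINARY.get((y, z), set())
    return out


def cyk_table(w: str) -> Dict[Tuple[int, int], Set[str]]:
    """Fill T[(i, j)] for all 0 <= i < j <= len(w), shortest substrings first."""
    n = len(w)
    T: Dict[Tuple[int, int], Set[str]] = {}
    for i in range(n):                      # base case: single characters
        T[(i, i + 1)] = set(UNARY.get(w[i], set()))
    for length in range(2, n + 1):          # substrings of growing length
        for i in range(n - length + 1):
            j = i + length
            cell: Set[str] = set()
            for k in range(i + 1, j):       # every split point between i and j
                cell |= combine(T[(i, k)], T[(k, j)])
            T[(i, j)] = cell
    return T


def member(w: str, start: str = "S") -> bool:
    """Membership test for non-empty w: is the start symbol in T[0, n]?"""
    return len(w) > 0 and start in cyk_table(w)[(0, len(w))]
```

The three nested loops over length, i and k give the familiar cubic running time in n for a fixed grammar.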
[Figure: step-by-step filling of the table T for “baaba”, lines numbered 0 to 5, using the grammar S -> AB | BC; A -> BA | a; B -> CC | b; C -> AB | a]
Answers • Is the string “baaba” a member of L? • Yes (S ∈ T05). • How many substrings of “baaba” are members of L? • 5 (x02, x24, x35, x15, x05) • How many distinct substrings of “baaba” are members of L? • 4 (ba, ab, aaba, baaba) • How many non-empty substrings of “baaba” are not members of L? • 15 - 5 = 10 • How many substrings of “baaba” are generated only by B? • 5 (x01, x13, x34, x14, x25)
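These answers can be read off the filled table. The sketch below recomputes the table with a compact CYK routine so that it stands alone; the names `BINARY`, `UNARY`, `cyk_table`, `in_L`, `distinct_in_L`, `not_in_L` and `only_B` are illustrative assumptions.

```python
from typing import Dict, Set, Tuple

# Example grammar, indexed by rule body (assumed encoding).
BINARY = {("A", "B"): {"S", "C"}, ("B", "C"): {"S"},
          ("B", "A"): {"A"}, ("C", "C"): {"B"}}
UNARY = {"a": {"A", "C"}, "b": {"B"}}


def cyk_table(w: str) -> Dict[Tuple[int, int], Set[str]]:
    """T[(i, j)] = set of non-terminals generating w[i:j]."""
    n = len(w)
    T = {(i, i + 1): set(UNARY.get(w[i], set())) for i in range(n)}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            T[(i, j)] = {x
                         for k in range(i + 1, j)
                         for y in T[(i, k)]
                         for z in T[(k, j)]
                         for x in BINARY.get((y, z), set())}
    return T


w = "baaba"
T = cyk_table(w)

# Substrings (as index pairs) that belong to L: start symbol S is in the cell.
in_L = [(i, j) for (i, j), cell in T.items() if "S" in cell]
# The same substrings as text, with duplicates merged.
distinct_in_L = {w[i:j] for (i, j) in in_L}
# Every cell of T is one non-empty substring, so the rest are not in L.
not_in_L = len(T) - len(in_L)
# Cells whose only generating non-terminal is B.
only_B = [(i, j) for (i, j), cell in T.items() if cell == {"B"}]
```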
Outline • CFL reachability • Motivation • Problem definition • Variants of CFL Reachability problem • Relation with other Problems • Algorithm • Example
Motivation • “Program Analysis via Graph Reachability” by Thomas Reps
Motivation • Program analysis requires extracting information from a program without actually running it. • Classical data-flow analysis maintains a set of “dataflow facts” at each program point. • Program analysis problems can be cast as graph reachability problems (GRP). • GRP is a special case of the CFL reachability problem.
Problem Definition • Let L be a context-free language over alphabet ∑, and let G be a graph whose edges are labeled with members of ∑. • Each path in G defines a word in ∑*, namely the word obtained by concatenating, in order, the labels of the edges on the path. • A path in G is an L-path if its word is a member of L.
Variants of CFL Reachability Problem • The all-pairs L-path problem. • The single-source L-path problem. • The single-target L-path problem. • The single-source/single-target L-path problem. • Other variants: the multi-source L-path problem, the multi-target L-path problem, and the multi-source/multi-target L-path problem.
Example • Let L be the language of strings of matched parentheses and square brackets, with zero or more e’s interspersed. • In the graph shown on the slide there is only one L-path; its word is [(e[])eee[e]]
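To make the definition concrete, here is a small sketch that builds the word of a path and tests it against this bracket language. The edge representation `(source, label, target)` and the names `path_word` and `in_L` are assumptions for illustration; the checker also assumes that the empty word and words consisting only of e's belong to L, which the slide does not state explicitly.

```python
from typing import List, Tuple

Edge = Tuple[int, str, int]  # (source node, label, target node), assumed encoding


def path_word(path: List[Edge]) -> str:
    """Concatenate, in order, the labels of the edges on the path."""
    return "".join(label for (_, label, _) in path)


def in_L(word: str) -> bool:
    """Matched ( ) and [ ] with any number of e's in between (stack check)."""
    closing = {")": "(", "]": "["}
    stack: List[str] = []
    for ch in word:
        if ch in "([":
            stack.append(ch)
        elif ch in closing:
            if not stack or stack.pop() != closing[ch]:
                return False
        elif ch != "e":
            return False          # symbol outside the alphabet
    return not stack              # every opener must have been closed


def is_L_path(path: List[Edge]) -> bool:
    """A path is an L-path if its word is a member of L."""
    return in_L(path_word(path))
```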
Relation with other problems • Ordinary graph reachability problem • Label every edge with e and take L = e*. • CFL recognition problem • “Given a string w and a context-free language L, is w ∈ L?” • Create a linear graph s → ... → t that has |w| edges, and label the i-th edge with the i-th letter of w. • There is an L-path from s to t iff w ∈ L.
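Both reductions are short enough to sketch. The edge encoding `(source, label, target)` and the function names below are illustrative assumptions. With every label set to e and L = e*, every path is an L-path, so the question reduces to plain reachability, shown here as a breadth-first search.

```python
from collections import deque
from typing import List, Tuple

Edge = Tuple[int, str, int]  # (source node, label, target node), assumed encoding


def linear_graph(w: str) -> List[Edge]:
    """Nodes 0..|w|; the i-th edge carries the i-th letter of w (s = 0, t = |w|)."""
    return [(i, ch, i + 1) for i, ch in enumerate(w)]


def relabel_all_e(edges: List[Edge]) -> List[Edge]:
    """Ordinary reachability as CFL reachability: every label becomes 'e'."""
    return [(u, "e", v) for (u, _, v) in edges]


def reachable(edges: List[Edge], s: int, t: int) -> bool:
    """Plain BFS; with all labels 'e' and L = e*, this decides L-path existence."""
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for (a, _, b) in edges:
            if a == u and b not in seen:
                seen.add(b)
                queue.append(b)
    return False
```

For the linear graph, the only path from s to t spells w itself, which is why an L-path from s to t exists exactly when w ∈ L.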
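Algorithm • Normalize the grammar so that the right-hand side of each production has at most two symbols (either terminals or nonterminals). • Repeatedly add edges to the graph as dictated by the productions (see figure), until no new edge can be added. • Here A ∈ N and B, C ∈ (N ∪ T), where N is the set of nonterminals and T the set of terminals. • The solution is read off the edges labelled with the start symbol of the grammar: there is an L-path from u to v iff there is an edge from u to v labelled with the start symbol.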
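A minimal worklist sketch of this edge-adding closure follows. It assumes the usual three rule shapes for a normalized grammar: A -> ε adds a self-loop labelled A at every node, A -> B turns a B-edge into an A-edge, and A -> B C joins a B-edge with an adjacent C-edge into an A-edge. The function name `cfl_reach` and the data encoding are illustrative, and the inner scans over all known edges are deliberately naive rather than the indexed version an efficient implementation would use.

```python
from collections import deque
from typing import Iterable, List, Set, Tuple

Edge = Tuple[int, str, int]               # (source, label, target), assumed encoding
Production = Tuple[str, Tuple[str, ...]]  # (head, body) with len(body) <= 2


def cfl_reach(edges: Iterable[Edge], productions: List[Production]) -> Set[Edge]:
    """Return the input edges plus every edge derivable from the productions."""
    result: Set[Edge] = set()
    work: deque = deque()

    def add(edge: Edge) -> None:
        if edge not in result:
            result.add(edge)
            work.append(edge)

    edges = list(edges)
    nodes = {u for (u, _, _) in edges} | {v for (_, _, v) in edges}
    for e in edges:
        add(e)
    for head, body in productions:        # A -> epsilon: self-loops everywhere
        if len(body) == 0:
            for v in nodes:
                add((v, head, v))

    while work:
        u, x, v = work.popleft()
        for head, body in productions:
            if body == (x,):              # A -> B
                add((u, head, v))
            elif len(body) == 2:          # A -> B C
                b, c = body
                if b == x:                # current edge plays the role of B
                    for (v2, y, t) in list(result):
                        if v2 == v and y == c:
                            add((u, head, t))
                if c == x:                # current edge plays the role of C
                    for (s, y, u2) in list(result):
                        if u2 == u and y == b:
                            add((s, head, v))
    return result
```

As a usage sketch, running it on the linear graph for “baaba” with the CYK example grammar yields S-edges exactly for the node pairs whose substrings are in L, matching the CYK table.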
Example • Grammar: S -> AB | BC; A -> BA | a; B -> CC | b; C -> AB | a • Graph G: edges labelled b, a, a, b (figure) • All-pairs L-path problem.