Syntax-aware Editors for Visual Languages

Syntax-aware Editors for Visual Languages by Mehmud Abliz March 22, 2006

Outline • Introduction • Motivation • Recognizing visual subsentences with LR-based parsers • Subsentence parser for VL • Symbol Prompting • Empirical evaluation

Introduction • Researchers exploited the VL grammar to model the syntax of visual notations • Two Types of editors: freehand editors vs. syntax-directed editors • Freehand editors (suitable for experienced users) • Allow incomplete and incorrect sketches • Insert visual symbols in any order • Syntax-directed editors (suitable for novice users) • Maintain an internal semantic model of the diagram under editing, check consistency at every step • Editing actions leading to inconsistent states are rejected

Introduction cont. • Syntax-aware editors combine the positive aspects of both approaches above • Syntax-aware editors • Do not prevent users from passing through syntactically incorrect states while editing • Inform the user when objects are syntactically incorrect • How can we implement this? • Support editing of visual sentences in freehand style • Underlying parsers incrementally analyze the sentence while it is entered, and provide feedback by highlighting correct and incorrect subsentences • We need a parsing technique that supports this

Introduction cont. • The parsing technique • Non-deterministic incremental subsentence parsing • Combine two such parsers to construct a bidirectional subsentence parser that can start parsing from an arbitrary input symbol

Motivation • Users often have only superficial knowledge of a visual language • This is a problem for visual languages whose symbols have syntax and semantics that depend heavily on the surrounding context • See the example on the next slide

Recognizing visual subsentences with LR-based parsers • This part includes • Modeling visual sentences with XPG • Incremental subsentence parsing for VL

Modeling visual sentence with XPG • Visual symbol (vsymbol for short): a graphical object characterized by a physical appearance and a type. • Physical appearance: described through size, color, shape, etc. • Type: a set of syntactic or semantic attributes. Their values depend on the “position” of the vsymbol in the sentence • Visual token (or vtoken): an instance of a vsymbol • Visual alphabet: a set of vsymbol types • Visual sentence (vsentence for short) on a visual alphabet S: a set of vsymbols {x1, x2, …, xn} whose types are in S and whose syntactic attributes are completely instantiated.
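The definitions above can be sketched as a small data model. This is only an illustration, not the representation used by the authors: the class names, the attribute names "1" and "2" (standing for attaching regions), and the attribute values are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VSymbolType:
    """A vsymbol type: a name plus the names of its syntactic attributes."""
    name: str
    attributes: tuple


@dataclass
class VToken:
    """An instance of a vsymbol: appearance plus attribute values."""
    vtype: VSymbolType
    appearance: dict = field(default_factory=dict)  # size, color, shape, ...
    values: dict = field(default_factory=dict)      # attribute name -> value

    def is_instantiated(self):
        # Every syntactic attribute of the type must have a value.
        return all(a in self.values for a in self.vtype.attributes)


def is_vsentence(tokens, alphabet):
    """True if all types are in the alphabet and all attributes are set."""
    return all(t.vtype in alphabet and t.is_instantiated() for t in tokens)


# Hypothetical types and values, chosen only for illustration.
PROCESS = VSymbolType("PROCESS", ("1",))
FLOW = VSymbolType("FLOW", ("1", "2"))
alphabet = {PROCESS, FLOW}

p = VToken(PROCESS, {"shape": "circle"}, {"1": "a"})
f = VToken(FLOW, {"shape": "arrow"}, {"1": "a", "2": "b"})
dangling = VToken(FLOW, {"shape": "arrow"}, {"1": "a"})  # attribute "2" unset

print(is_vsentence([p, f], alphabet))
print(is_vsentence([p, dangling], alphabet))
```

The second call fails the check because one syntactic attribute of the FLOW token has no value, so the set of tokens is not yet a complete vsentence.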

Modeling visual sentence with XPG Data Flow Diagram

Modeling visual sentence with XPG Attribute-based representation of the DFD

Modeling visual sentence with XPG • An XPG is the pair (G, PE), where PE is a Positional Evaluator, and G is a particular type of context-free string attributed grammar (N, T ∪ POS, S, P) where: • N is a finite non-empty set of non-terminal vsymbols • T is a finite non-empty set of terminal vsymbols, with N ∩ T = ∅ • POS is a finite set of binary relation identifiers, with POS ∩ N = ∅ and POS ∩ T = ∅ • S ∈ N denotes the starting vsymbol • P is a finite non-empty set of productions of the following format: A → x1 R1 x2 R2 … xm-1 Rm-1 xm, Δ, Γ where A is a non-terminal vsymbol, x1 R1 x2 R2 … xm-1 Rm-1 xm is a linear representation with respect to POS, each xi is a vsymbol in N ∪ T, and each Rj is a sequence of relation identifiers partitioned into two subsequences

Modeling visual sentence with XPG ( <RELH1J1, …, RELHkJk>, <RELHk+1Jk+1, …, RELHnJn> ) with 1 ≤ k ≤ n. • Each RELHiJi relates syntactic attributes of xj+1 with syntactic attributes of xj-hi, with 0 ≤ hi ≤ j. • The relation identifiers in the first subsequence of an Rj are called driver relations, whereas the ones in the second subsequence are called tester relations. • Driver relations are used during syntax analysis to determine the next vsymbol to be scanned, whereas tester relations are used to check whether the last scanned vsymbol is properly related to previously scanned vsymbols.

Modeling visual sentence with XPG • Δ is a set of rules used to synthesize the values of the syntactic attributes of A from those of x1, x2, …, xm. • Γ is a set of triples (Nj, Condj, Δj), j = 1, …, t and t ≥ 0, used to dynamically insert new vsymbols in the input visual sentence during the parsing process. In particular, • Nj is a terminal vsymbol to be inserted in the input visual sentence; • Condj is a pre-condition to be verified in order to insert Nj; • Δj is the rule used to compute the values of the syntactic attributes of Nj from those of x1, x2, …, xm.

Modeling visual sentence with XPG • Given the two pairs (x, k) and (y, j), where x ∈ N ∪ T, y ∈ T, k is a syntactic attribute of x, and j is a syntactic attribute of y, we say that (y, j) is reachable from (x, k) iff one of the following situations occurs: • 1. x = y; • 2. there exists a production x → x1 R1 x2 … xi … Rm-1 xm, Δ, Γ in P such that attribute k of x is synthesized from attribute h of xi by means of Δ, and (y, j) is reachable from (xi, h). • If (y, j) is reachable from (x, k), we also say that y is reachable from x. As an example, given the productions • A → B r1 C r2 D, Δ: (A1 = C2) • C → c, Δ: (C2 = c3) • we have that (c, 3) is reachable from (A, 1).
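The reachability definition can be sketched as a recursive search over the synthesis rules Δ. The encoding below is an assumption made for illustration: each production is a triple (left-hand side, right-hand side vsymbols, Δ), with Δ a dictionary mapping an attribute of the left-hand side to the (vsymbol, attribute) pair it is synthesized from. The slide states the base case simply as x = y; this sketch additionally requires the attributes to match, which is also an assumption.

```python
def reachable(src, dst, productions, terminals, seen=None):
    """Return True if the pair dst = (y, j) is reachable from src = (x, k)."""
    seen = set() if seen is None else seen
    if dst[0] not in terminals:
        return False          # y must be a terminal vsymbol
    if src == dst:
        return True           # base case (attributes must match: assumption)
    if src in seen:
        return False          # already explored, avoids looping on recursion
    seen.add(src)
    x, k = src
    for lhs, rhs, delta in productions:
        if lhs != x or k not in delta:
            continue
        xi, h = delta[k]      # attribute k of x is synthesized from xi.h
        if xi in rhs and reachable((xi, h), dst, productions, terminals, seen):
            return True
    return False


# The two productions from the slide: A -> B r1 C r2 D with A1 = C2,
# and C -> c with C2 = c3.
productions = [
    ("A", ["B", "C", "D"], {1: ("C", 2)}),
    ("C", ["c"], {2: ("c", 3)}),
]
terminals = {"c"}

print(reachable(("A", 1), ("c", 3), productions, terminals))
print(reachable(("A", 1), ("c", 2), productions, terminals))
```

The first query follows A1 = C2 and then C2 = c3, matching the example on the slide; the second fails because no synthesis rule leads to attribute 2 of c.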

Modeling visual sentence with XPG • Let DFD be the name of the grammar. The set of non-terminals is • N = {DataFD, Node}, • where each vsymbol has one attaching region as syntactic attribute, and DataFD is the starting vsymbol of DFD, i.e., S = DataFD. • The set of terminals is given by • T = {PROCESS, STORE, ENTITY, FLOW, PLACEHOLD}.

Modeling visual sentence with XPG The terminals for the grammar DFD

Modeling visual sentence with XPG • The set of relations is given by • POS = {LINKi,j, any}, • where the relation identifier any denotes a relation that is always satisfied between any pair of vsymbols, • whereas the relation identifier LINKi,j is defined as: “a vsymbol x is in relation with a vsymbol y iff attaching point i of x is connected to attaching point j of y”, and will be denoted as i_j to simplify the notation. Moreover, we use the overlined form of h_k to describe the absence of a connection between two attaching areas h and k.
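As a rough illustration of how a positional evaluator might test these relations, the sketch below stores the connections of a diagram as unordered pairs of (vsymbol, attaching point). The function names, the token identifiers p1 and f1, and the connection encoding are assumptions made for this example and are not taken from the slides.

```python
def link(i, j, connections):
    """Build the LINK relation i_j over a given set of connections."""
    def relation(x, y):
        # x is related to y iff attaching point i of x meets point j of y.
        return frozenset({(x, i), (y, j)}) in connections
    return relation


def any_relation(x, y):
    """The relation 'any' is satisfied by every pair of vsymbols."""
    return True


def absent(relation):
    """Negated form of a relation: holds when the connection is missing."""
    return lambda x, y: not relation(x, y)


# Hypothetical diagram: point 1 of process p1 touches point 1 of flow f1.
connections = {frozenset({("p1", 1), ("f1", 1)})}

print(link(1, 1, connections)("p1", "f1"))
print(link(1, 2, connections)("p1", "f1"))
print(absent(link(1, 2, connections))("p1", "f1"))
print(any_relation("p1", "f1"))
```

Returning closures keeps each relation identifier a plain two-argument predicate, so driver and tester relations could be evaluated uniformly.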


Incremental subsentence parsing for VL • Parsers based on XPGs are an extension of LR parsers, named XpLR parsers. • Disadvantages of common XpLR parsers: • scanning the input in a non-sequential way increases the occurrence of parsing conflicts; • the language grammar must be unambiguous and conform to the limitations of the particular table-generation algorithm, which, in many cases, is quite restrictive and requires significant “grammar-hacking”; • an XpLR parser does not provide any feedback while the user composes a sentence.

Incremental subsentence parsing for VL • To give immediate feedback to the user, a visual interactive environment requires fast parsing methods. To this end, we introduce an incremental and generalized version of the XpLR parser for recognizing visual subsentences, named X-Parser, which is based on generalized LR (GLR) parsing. • GLR parsing is a technique for parsing arbitrary context-free grammars that uses conventional LR table construction methods.

Incremental subsentence parsing for VL Components of X-parser:

Incremental subsentence parsing for VL Incremental parsing of a data flow diagram

Incremental subsentence parsing for VL • Problem with the above parsing: • The parsing of a DFD with an X-parser always starts by looking for a PROCESS vsymbol since, in the grammar, it is the first vsymbol reachable from the starting non-terminal DataFD. This limitation prevents the parser from recognizing portions of correct sentences, and consequently prevents the editor from assisting the user in composing the sentence. • To solve it, the parsing algorithm has to be modified


Incremental subsentence parsing for VL • The idea is: use two parsers that proceed in parallel, scanning the input sentence in opposite directions from an arbitrary starting vsymbol. • To this end, the algorithm creates the XpLR parsing tables for the original XPG grammar G = ((N, T ∪ POS, S, P), PE) and for its reverse version rev(G) = ((N, T ∪ inv(POS), S, P'), PE).


Incremental subsentence parsing for VL • A sentence w is recognized by the bidirectional parser if there exist a backward parser B and a forward parser F such that: • (1) each vsymbol of w is visited by only one of the two parsers, except the starting one that is visited by both, and, • (2) if s1 = xw1 and s2 = xw2 are the subsentences recognized by B and F, respectively, then w = inv(w1)xw2. Notice that x corresponds to the vsymbol from which the parsing starts.
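The two conditions can be checked mechanically once each parser reports the vsymbols it has scanned. The sketch below assumes that each subsentence is given as a list of token identifiers in scanning order and that inv(w1) is simply w1 in reverse order; both are simplifying assumptions, and the function name and token identifiers are hypothetical.

```python
def combine(backward, forward):
    """Merge s1 = x w1 (backward parser) and s2 = x w2 (forward parser).

    Returns inv(w1) x w2 as a list, or None if the two scans are not
    compatible with conditions (1) and (2).
    """
    if not backward or not forward or backward[0] != forward[0]:
        return None                      # both must start from the same x
    x, w1, w2 = forward[0], backward[1:], forward[1:]
    if set(w1) & set(w2) or x in w1 or x in w2:
        return None                      # a vsymbol was visited twice
    return list(reversed(w1)) + [x] + w2


# Hypothetical scan starting from flow f1: the backward parser reaches
# process p1, the forward parser reaches store s1.
print(combine(["f1", "p1"], ["f1", "s1"]))
print(combine(["f1", "p1"], ["f1", "p1"]))
```

The second call returns None because p1 would be visited by both parsers, which violates condition (1).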

Symbol Prompting • The approach requires • modifying the parsing table construction algorithms in order to extract from the XPG grammar further static information about the relations among the grammar vsymbols; and • modifying the parsing algorithm in order to associate dynamic parsing context information with each analyzed vsymbol. • By joining the dynamic and static information related to an edited vsymbol, the parser is able to determine all the possible related vsymbols.

Symbol Prompting • Definition of JSET: • For a production p of the form A → x1 R1 x2 R2 … xm-1 Rm-1 xm, Δ, Γ with m ≥ 1 • 1. a couple (M.h, N.k) is a link of p if there exists a relation in p that relates the grammar vsymbols M and N in p through the syntactic attributes h and k, respectively. • 2. a couple (A.h, N.k) is a tie-point of p, denoted by (A.h = N.k), if the syntactic attribute h of A is synthesized from the syntactic attribute k of N, i.e., the value of A.h depends on N.k. • JSET(p) denotes the set of links and tie-points associated with a production p.

Symbol Prompting • JSETs contain both the relations defined in the productions explicitly and the relations obtained by applying the synthesis rules Δ. • The link in production 5 is (Node'.1, FLOW.1); the links in production 2 are (DataFD'.1, FLOW.1) and (FLOW.2, Node.1). The tie-points in production 2 are (DataFD.1 = DataFD'.1) and (PLACEHOLD.1 = Node.1).
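A JSET can be pictured as the union of two tagged collections. In the sketch below, the relations and synthesis rules of production 2 are hand-encoded from the links and tie-points listed above, since the full production is not shown here; the tuple layout and the function name jset are assumptions made for the example.

```python
def jset(relations, synthesis):
    """Collect the links and tie-points of one production.

    relations: (M, h, N, k) tuples, one per relation between M.h and N.k
    synthesis: (A, h, N, k) tuples, one per rule stating that A.h = N.k
    """
    links = {("link", f"{m}.{h}", f"{n}.{k}") for m, h, n, k in relations}
    ties = {("tie", f"{a}.{h}", f"{n}.{k}") for a, h, n, k in synthesis}
    return links | ties


# Hand-encoded from the links and tie-points listed for production 2.
relations_p2 = [("DataFD'", 1, "FLOW", 1), ("FLOW", 2, "Node", 1)]
synthesis_p2 = [("DataFD", 1, "DataFD'", 1), ("PLACEHOLD", 1, "Node", 1)]

result = jset(relations_p2, synthesis_p2)
for entry in sorted(result):
    print(entry)
```

Tagging each entry as "link" or "tie" keeps the two kinds of couples distinct even when they mention the same pair of attributes.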

Symbol Prompting • The notion of JSET for a production can be extended to the collection of XpLR item sets. • An XpLR item of an extended positional grammar is • a production without the Δ and Γ rules, and with a dot at some position of the right-hand side. However, a dot can never be placed between a relation identifier and the terminal or non-terminal vsymbol to its right.

Symbol Prompting • A → X R1 Y R2 Z, Δ, Γ leads to the following four XpLR(0) items: • [A → . X R1 Y R2 Z], [A → X . R1 Y R2 Z], [A → X R1 Y . R2 Z], [A → X R1 Y R2 Z .] • An item indicates how much of a production has already been examined during the parsing process and what is yet to come. • For instance, the item [DataFD → DataFD' . <<1_1>, <2_1>> FLOW] means that the non-terminal DataFD' has already been seen and a FLOW in relation <<1_1>, <2_1>> with DataFD' is expected next.
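The rule on dot placement can be made concrete with a short generator. This is a sketch under the assumption that a production is given as a list of vsymbols plus the list of relation sequences between them; the function name xplr_items is hypothetical.

```python
def xplr_items(lhs, symbols, relations):
    """List the XpLR(0) items of lhs -> x1 R1 x2 ... Rm-1 xm."""
    assert len(relations) == len(symbols) - 1
    rhs = [symbols[0]]
    for rel, sym in zip(relations, symbols[1:]):
        rhs += [rel, sym]
    # The dot may sit before x1 or right after any vsymbol, but never
    # between a relation identifier and the vsymbol to its right.
    positions = [0] + list(range(1, len(rhs) + 1, 2))
    items = []
    for pos in positions:
        parts = rhs[:pos] + ["."] + rhs[pos:]
        items.append(f"[{lhs} → {' '.join(parts)}]")
    return items


for item in xplr_items("A", ["X", "Y", "Z"], ["R1", "R2"]):
    print(item)
```

For the production on the slide this prints the four items listed above; positions such as "X R1 . Y" are skipped by construction.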

Symbol Prompting • By keeping track of the relations among the grammar vsymbols during the construction of the XpLR item set collection, we are able to prompt the insertion of correct vsymbols during the editing of a visual sentence. • At each step of the editing process we combine information about the current states (sets of items) of the parsers with the relations of the JSETs associated with those sets of items, which yields the symbols that have been recognized up to that point and those that can be related to them.

Symbol Prompting • Two strategies of symbol prompting can be supported. • The first one, targeted at inexperienced users of the visual language being considered, exploits the entries in the next column of the parsing table to suggest the possible vsymbols that can be inserted into the vsentence together with their relations. • The second strategy, targeted at grammar designers and users with deeper knowledge of the visual language, exploits the information in the RelationSets to suggest the vsymbols that can be related to a particular vsymbol in the edited vsentence.


Symbol Prompting • An entry next[k] for a state sk contains the pair (Rdriver, x), which drives the parser in selecting the next vsymbol (derivable from x) by using the sequence of driver relations Rdriver. • next[k] = (Rdriver, x) • We use an array p storing the state of each parser. Given the index j of a parser, the algorithm stores in s the state of parser j. It then accesses the next column and analyzes its entries (Rdriver, x), determining all the vsymbols reachable from such pairs. In particular, the set next_set stores all the vsymbols able to satisfy all the relations in Rdriver.

Symbol Prompting • This algorithm has to be executed for both the forward and backward parsers, and the set of vsymbols to be prompted is given by the union of the returned sets.
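A minimal sketch of the first prompting strategy follows. It is not the algorithm from the slides: the state numbers, the contents of the next tables, and the terminals_from map (the terminals reachable from each grammar vsymbol) are hypothetical, and the sketch only pairs each candidate terminal with its driver relations instead of evaluating those relations against the diagram.

```python
def prompt(parser_states, next_table, terminals_from):
    """Collect candidate (terminal, Rdriver) pairs for one direction."""
    next_set = set()
    for s in parser_states:                      # s = p[j] for each parser j
        for r_driver, x in next_table.get(s, []):
            for terminal in terminals_from[x]:   # vsymbols reachable from x
                next_set.add((terminal, r_driver))
    return next_set


def prompt_bidirectional(fwd_states, fwd_next, bwd_states, bwd_next,
                         terminals_from):
    """Union of the suggestions of the forward and backward parsers."""
    return (prompt(fwd_states, fwd_next, terminals_from)
            | prompt(bwd_states, bwd_next, terminals_from))


# Hypothetical tables, invented only to exercise the functions.
terminals_from = {"FLOW": {"FLOW"}, "Node": {"PROCESS", "STORE", "ENTITY"}}
fwd_next = {3: [(("1_1",), "FLOW")]}
bwd_next = {7: [(("2_1",), "Node")]}

suggestions = prompt_bidirectional([3], fwd_next, [7], bwd_next,
                                   terminals_from)
for terminal, r_driver in sorted(suggestions):
    print(terminal, r_driver)
```

Keeping the driver relations next to each suggested terminal lets an editor show not only which vsymbol may be inserted but also how it should be attached.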


Empirical evaluation

QUESTIONS?
