Technical University of Valencia, Computer Science Department
SOFSEM’07 (22/01/2007)
A Program Slicing Based Method to Filter XML/DTD documents
Josep F. Silva Galiana
Contents
• Motivation
• Program Slicing
• XML
  • DTD
  • XSLT
• Slicing XML Documents
• Example
• Implementation
• Conclusions & Future Work
Program Slicing
• Definition: Program transformation to extract the program statements that (potentially) affect the values computed at some point of interest.
• Origin: Originally introduced by Weiser.
• Example:
  (1) read(n);
  (2) i := 1;
  (3) sum := 0;
  (4) product := 1;
  (5) while (i <= n) do begin
  (6)   sum := sum + i;
  (7)   product := product * i;
  (8)   i := i + 1;
      end;
  (9) write(sum);
  (10) write(product);
Slicing Criterion = (10, product)
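The slice for the criterion (10, product) can be computed by following dependences backwards from statement 10. The sketch below is only an illustration, not the method used in the talk: the `PROGRAM` table and the `backward_slice` function are hypothetical names, and the analysis is deliberately simplified (it treats every definition of a variable as reaching every use, which is a safe over-approximation and happens to be exact for this example).

```python
# Hypothetical encoding of the ten-statement example. For each statement:
# (variable it defines or None, variables it uses, controlling statement or None).
PROGRAM = {
    1: ("n", set(), None),                 # read(n)
    2: ("i", set(), None),                 # i := 1
    3: ("sum", set(), None),               # sum := 0
    4: ("product", set(), None),           # product := 1
    5: (None, {"i", "n"}, None),           # while (i <= n)
    6: ("sum", {"sum", "i"}, 5),           # sum := sum + i
    7: ("product", {"product", "i"}, 5),   # product := product * i
    8: ("i", {"i"}, 5),                    # i := i + 1
    9: (None, {"sum"}, None),              # write(sum)
    10: (None, {"product"}, None),         # write(product)
}


def backward_slice(program, line, var):
    """Return the statements that may affect `var` at statement `line`."""
    in_slice = {line}
    worklist = [(line, {var})]
    while worklist:
        stmt, used = worklist.pop()
        control = program[stmt][2]
        # Data dependences: every statement defining a variable used here.
        deps = {s for s, (defined, _, _) in program.items() if defined in used}
        # Control dependence: the loop header guarding this statement.
        if control is not None:
            deps.add(control)
        for dep in deps - in_slice:
            in_slice.add(dep)
            worklist.append((dep, program[dep][1]))
    return sorted(in_slice)


print(backward_slice(PROGRAM, 10, "product"))  # [1, 2, 4, 5, 7, 8, 10]
```

Statements 3, 6 and 9, which only concern `sum`, are left out of the slice.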
Program Slicing
• Applications:
  • Debugging
  • Code understanding
  • Specialization
  • etc.
• All these applications are based on Program Dependence Graphs (PDGs), which capture the structure and behaviour of programs.
What would happen if Program Slicing were applied to a data structure? Would it be interesting?
XML
XML (eXtensible Markup Language)
• Origin: XML was developed by an XML Working Group formed under the auspices of the World Wide Web Consortium (W3C) in 1996.
• Structure: Documents are trees composed of elements, which can contain attributes.
[Figure: Example of XML document]
XML
DTD (Document Type Definition)
• Objective: The purpose of a DTD is to define the legal building blocks of an XML document. It defines the document structure with a list of legal elements.
• Structure: Documents are graphs composed of elements.
[Figure: Example of DTD document]
XML (eXtensible Markup Language)

<PersonalInfo>
  <Contact>
    <Status> Professor </Status>
    <Name> Ryan </Name>
    <Surname> Gibson </Surname>
  </Contact>
  <Teaching>
    <Subject>
      <Name> Logic </Name>
      <Sched> Mon/Wed 16-18 </Sched>
      <Course> 4-Mathematics </Course>
    </Subject>
    <Subject>
      <Name> Algebra </Name>
      <Sched> Mon/Tur 11-13 </Sched>
      <Course> 3-Mathematics </Course>
    </Subject>
    …
  </Teaching>
  <Research>
    <Project name="SysLog" year="2003-2004" budget="16000€" />
    ...
  </Research>
</PersonalInfo>

DTD (Document Type Definition)

<!ELEMENT PersonalInfo (Contact, Teaching, Research)>
<!ELEMENT Contact (Status, Name, Surname)>
<!ELEMENT Status ANY>
<!ELEMENT Name ANY>
<!ELEMENT Surname ANY>
<!ELEMENT Teaching (Subject+)>
<!ELEMENT Subject (Name, Sched, Course)>
<!ELEMENT Sched ANY>
<!ELEMENT Course ANY>
<!ELEMENT Research (Project*)>
<!ELEMENT Project ANY>
<!ATTLIST Project
  name CDATA #REQUIRED
  year CDATA #REQUIRED
  budget CDATA #IMPLIED
>
XML
XSLT (eXtensible Stylesheet Language Transformations)
• Objective: XSLT is a language for transforming XML.
• Structure: An XSLT stylesheet specifies the presentation of a class of XML documents by describing how an instance of the class is transformed into an XML document that uses a formatting vocabulary, such as (X)HTML or XSL-FO.
• XSLT is a programming language.
[Figure: Example of XSLT document (Source Code)]
[Figure: Example of XSLT document (Result)]
Slicing XML Documents
• We see XML documents and DTDs as trees.
[Figure: the PersonalInfo XML document drawn as a tree, with root PersonalInfo and children Contact, Teaching and Research]
Slicing XML Documents
• The slicing criterion is composed of a set of nodes in the tree.
• For each node in the slicing criterion, we extract from the tree all those nodes that are on the path from the root to the node.
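The path-from-the-root rule can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function name `backward_slice`, the encoding of nodes as tuples of child indices, and the choice to keep text only for criterion nodes are all assumptions of this sketch. It uses Python's standard `xml.etree.ElementTree` on a shortened copy of the PersonalInfo document.

```python
import xml.etree.ElementTree as ET

DOC = """<PersonalInfo>
  <Contact><Status>Professor</Status><Name>Ryan</Name><Surname>Gibson</Surname></Contact>
  <Teaching>
    <Subject><Name>Logic</Name><Sched>Mon/Wed 16-18</Sched><Course>4-Mathematics</Course></Subject>
    <Subject><Name>Algebra</Name><Sched>Mon/Tur 11-13</Sched><Course>3-Mathematics</Course></Subject>
  </Teaching>
  <Research><Project name="SysLog" year="2003-2004" budget="16000"/></Research>
</PersonalInfo>"""


def backward_slice(root, criterion):
    """Keep only the nodes on a path from the root to a criterion node.

    A node is identified by its position: a tuple of child indices,
    with () standing for the root.
    """
    # Every prefix of a criterion position is an ancestor (or the node itself).
    keep = {pos[:i] for pos in criterion for i in range(len(pos) + 1)}

    def copy(node, pos):
        out = ET.Element(node.tag, dict(node.attrib))
        if pos in criterion:
            # Assumption of this sketch: text is kept only for criterion nodes.
            out.text = node.text
        for i, child in enumerate(node):
            if pos + (i,) in keep:
                out.append(copy(child, pos + (i,)))
        return out

    return copy(root, ())


# Criterion: the Name of the first Subject (Teaching is child 1 of the root).
sliced = backward_slice(ET.fromstring(DOC), {(1, 0, 0)})
print(ET.tostring(sliced, encoding="unicode"))
```

With the criterion above, only PersonalInfo, Teaching, the first Subject and its Name survive; Contact, Research and the second Subject are filtered out.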
Slicing XML Documents
• DTD backward slicing criterion.
[Figure: the DTD tree (PersonalInfo; Contact, Teaching, Research; Status, Name, Surname; Subject; Name, Sched, Course; Project; Name, Year, Budget) and the DTD text, shown for Web Page (Original) and Web Page (Slice)]
Slicing XML Documents
• XML backward slicing criterion.
[Figure: the XML tree of the PersonalInfo document and its text, shown for Web Page (Original) and Web Page (Slice)]
Slicing XML Documents
• We distinguish between DTD and XML slicing criteria.
• XML slicing criteria are more fine-grained than DTD slicing criteria.
• We distinguish between forward and backward slices (or a combination).
[Figure: Web Page (Original) and Web Page (Slice); options XML / DTD and Forward / Backward]
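The difference between the slice directions can be illustrated on position sets. The slides only spell out the backward case (the path from the root), so this sketch assumes that a forward slice keeps the criterion node together with its descendants, and that the combination is the union of both. The tree encoding and the names `positions`, `backward`, `forward` and `prune` are hypothetical.

```python
# A tree is (label, [children]); a position is a tuple of child indices.
TREE = ("PersonalInfo", [
    ("Contact", [("Status", []), ("Name", []), ("Surname", [])]),
    ("Teaching", [("Subject", [("Name", []), ("Sched", []), ("Course", [])])]),
    ("Research", [("Project", [])]),
])


def positions(tree, pos=()):
    """Yield the position of every node, root first."""
    yield pos
    for i, child in enumerate(tree[1]):
        yield from positions(child, pos + (i,))


def backward(criterion):
    """Criterion nodes plus all their ancestors (path from the root)."""
    return {p[:i] for p in criterion for i in range(len(p) + 1)}


def forward(tree, criterion):
    """Criterion nodes plus all their descendants (assumed definition)."""
    return {p for p in positions(tree) for c in criterion if p[:len(c)] == c}


def prune(tree, keep, pos=()):
    """Rebuild the tree, dropping every child whose position is not kept."""
    label, children = tree
    return (label, [prune(child, keep, pos + (i,))
                    for i, child in enumerate(children)
                    if pos + (i,) in keep])


criterion = {(1, 0)}  # the Subject node under Teaching
combined = backward(criterion) | forward(TREE, criterion)
print(prune(TREE, combined))
```

For the Subject criterion, the backward part contributes PersonalInfo and Teaching, and the forward part contributes Name, Sched and Course.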
Slicing XML Documents
• XML forward slicing criterion.
[Figure: the XML tree of the PersonalInfo document and its text, shown for Web Page (Original) and Web Page (Slice)]
Slicing XML Documents
• XML backward-forward slicing criterion.
[Figure: the XML tree of the PersonalInfo document and its text, shown for Web Page (Original) and Web Page (Slice)]
Slicing XML Documents
• What happens with DTDs? Slices are well-formed, but are they valid?
• For each XML slice we produce a DTD slice, and vice versa.
• We guarantee that XML slices are valid with respect to DTD slices.
[Figure: the Slicer takes a DTD document, an XML document and a Slicing Criterion, and produces a DTD Slice document and an XML Slice document]
Slicing XML Documents • A simple slicing algorithm
Slicing XML Documents
• In the case of a DTD criterion composed of a set of positions $C = \{p_1, \dots, p_n\} \subseteq Pos(D)$, the algorithm would be the same, except that the first loop would be:
  For each $v_1.v_2.(\dots).v_n \in C$ do
    $V' := V' \cup \{v_1,\ v_1.v_2,\ \dots,\ v_1.v_2.(\dots).v_n\}$;
    $W' := W' \cup \{v_1|_i.v_2|_j.(\dots).v_n|_k\}$
  where $v_1.v_2.(\dots).v_n \in V'$ and $v_1|_i.v_2|_j.(\dots).v_n|_k \in X$
• Both algorithms produce valid XML and DTD slices with respect to the slicing criterion.
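One way to read the modified loop is sketched below; it is an interpretation, not the algorithm from the slides, whose full listing is not reproduced here. A DTD position is taken to be a path of element types starting at the root, V' collects every prefix of the criterion paths, and W' collects every XML node whose path of element types belongs to V'. The name `dtd_slice` and the tuple encoding of trees are assumptions of this sketch.

```python
# An XML tree is (element type, [children]); XML positions are tuples of
# child indices, DTD positions are tuples of element type names.
XML = ("PersonalInfo", [
    ("Contact", [("Status", []), ("Name", []), ("Surname", [])]),
    ("Teaching", [
        ("Subject", [("Name", []), ("Sched", []), ("Course", [])]),
        ("Subject", [("Name", []), ("Sched", []), ("Course", [])]),
    ]),
    ("Research", [("Project", [])]),
])


def dtd_slice(xml, criterion):
    """Return (V', W') for a DTD criterion given as paths of element types."""
    # V': each criterion path together with all of its non-empty prefixes.
    v = {path[:i] for path in criterion for i in range(1, len(path) + 1)}
    # W': positions of the XML nodes whose type path is in V'.
    w = set()

    def walk(node, types, pos):
        label, children = node
        types = types + (label,)
        if types in v:
            w.add(pos)
            for i, child in enumerate(children):
                walk(child, types, pos + (i,))

    walk(xml, (), ())
    return v, w


v, w = dtd_slice(XML, {("PersonalInfo", "Teaching", "Subject", "Name")})
print(sorted(w))
```

Under this reading, a DTD criterion on Subject names selects the Name of every Subject in the document, which matches the remark that XML criteria are more fine-grained than DTD criteria.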
Slicing XML Documents
The following theorem states the correctness of the technique:
Theorem
Let D be a well-formed DTD and X a well-formed XML document valid with respect to D. Given a slice D’ of D and a slice X’ of X computed with an XML slicing criterion C, and given a slice D’’ of D and a slice X’’ of X computed with a DTD slicing criterion C’, then
  a) D’ is well-formed and X’ is valid with respect to D’
  b) D’’ is well-formed and X’’ is valid with respect to D’’
If all the elements in C are of one of the types in C’, then
  c) D’ = D’’
  d) X’ is a subtree of X’’
Implementation
We have implemented a prototype in Haskell. Haskell gives us a formal basis with many advantages for the manipulation of XML documents.
- The HaXml library. It allows us to automatically translate XML or HTML documents into a Haskell representation. In particular, we use the following data structures, which can represent any XML/HTML document:
  data Element = Elem Name [Attribute] [Content]
  type Attribute = (Name, Value)
  data Content = CElem Element | CText String
Implementation
From XML slices to Webpage slices
[Figure: XML (Data) and XSLT (Presentation) are combined to produce the WebPage]
Implementation
XSLT Implementation Guidelines
XSLT documents must generate the information and the presentation elements under the same conditions (i.e., the former is generated if and only if the latter is generated). Both the XML data and the presentation labels are generated together.
This does not impose any restriction on the power of XSLT, since the same webpages can be generated. On the contrary, this way of programming forces the programmer to build transformations that can be easily reused and maintained, because the information and the presentation data that depend on the same condition are put together.
Implementation
The implementation, some examples and other material are publicly available at: www.dsic.upv.es/~jsilva/xml
Conclusions
• We proposed the application of program slicing techniques to XML data structures
• We defined an algorithm to slice XML and DTD documents
  • XML and DTD slices that are well-formed and valid
  • Previous slicers can be used with a modest implementation effort
• Slicing Web Pages
  • The slicer can use XSLT in order to slice webpages
  • We proposed some guidelines to generate XSLT files
• Future Work
  • Migration to XML Schema
  • New implementation based on XQuery