eXVisXML, an emblematic tool for document analysis
Daniela da Cruz, Pedro Rangel Henriques
Departamento de Informática, Universidade do Minho
Universidade de Aveiro
Context
Motivation
XML Document Visualization
• Visualization technology is recognized as very fruitful in program comprehension (PC) and software engineering (SE).
• Software visualization (SV) features let us capture a great amount of information faster.
• Graphical representations have a positive impact on the learning process.
XML Document Visualization
• Retrieving information from plain documents efficiently is not an easy task.
• Machine manipulation:
  • XSL and other production systems can easily extract information and transform it.
• Human manipulation:
  • It is not as easy as desirable.
  • The annotation is complex or the document is too big.
XML Document Visualization
• Many tools have appeared to aid the visualization of XML documents:
  • XML Schema Designer (Microsoft)
  • XPath Analyzer (Altova)
  • …
Although these tools offer syntax highlighting and easy manipulation (collapse/expand), their view is hierarchical and textual.
Traditional XML Document Visualization
Our Proposal for XML Document Visualization
In this context, we want a visualization that makes the comprehension process easier. However, we should take care with graphical or iconic representations, since they depend on the problem domain.
Inspired by Alma, the eXVisXML interface for the visual inspection of XML documents is divided into 3 main parts:
Our Proposal for XML Document Visualization
• One window that displays the source document;
• One window that exhibits the textual hierarchy;
• One window that shows the tree associated with the source document (graphical).
Our Proposal for XML Document Visualization
XML Document Slicing
• The slicing concept was introduced by Weiser in 1979.
• It is applied to a program with respect to a slicing criterion (a pair composed of a line number and a set of variables).
• The objective is to find the statements that may affect those variables.
• This technique can also be applied to XML documents. How?
XML Document Slicing
• XML document + slicing criterion (an XPath expression can be regarded as a simplified slicing criterion).
• A document slice is a new XML document composed of those elements that are strictly necessary to maintain the tree structure.
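The slice definition above can be sketched in a few lines. This is only an illustration under stated assumptions, not the eXVisXML implementation: the criterion is restricted to the limited XPath subset of Python's `xml.etree.ElementTree`, the slice keeps each matched element with its subtree plus all of its ancestors, and `slice_xml` and the miniature document are hypothetical.

```python
import xml.etree.ElementTree as ET


def slice_xml(root, criterion):
    """Copy `root`, keeping only matched elements, their subtrees, and their ancestors."""
    parent = {child: node for node in root.iter() for child in node}
    keep = set()
    for match in root.findall(criterion):
        keep.update(match.iter())      # the match and everything below it
        node = match
        while node in parent:          # every ancestor up to the root
            node = parent[node]
            keep.add(node)
    if not keep:
        return None                    # the criterion selected nothing

    def copy(node):
        out = ET.Element(node.tag, dict(node.attrib))
        out.text, out.tail = node.text, node.tail
        for child in node:
            if child in keep:
                out.append(copy(child))
        return out

    return copy(root)


# Hypothetical miniature document, not the real screenplay markup.
play = ET.fromstring(
    "<PLAY><TITLE>Romeo and Juliet</TITLE>"
    "<SPEECH><SPEAKER>GREGORY</SPEAKER><LINE>first</LINE></SPEECH>"
    "<SPEECH><SPEAKER>SAMPSON</SPEAKER><LINE>second</LINE></SPEECH></PLAY>"
)
gregory = slice_xml(play, ".//SPEECH[SPEAKER='GREGORY']")
print(ET.tostring(gregory, encoding="unicode"))
```

Keeping the ancestors is what preserves the tree structure: the slice is still rooted at `PLAY`, but the title and the other speech are gone.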
XML Document Slicing
Josep Silva proves, in Slicing XML documents, that slicing techniques applied to XML and DTD documents produce valid XML and DTD slices with respect to the slicing criterion.
XML Document Slicing
• Given the whole XML document of the Romeo and Juliet screenplay and
• the slicing criterion Greg, the result is:
XML Document Slicing
XML Document Metrics
• Effective management of any process requires quantification, measurement, and modeling.
• Software metrics provide a quantitative basis for the development and validation of models of the software development process.
• Metrics can be used to improve software productivity and quality.
XML Document Metrics
In the field of XML, quality assessment is also relevant, because the approach that engineers or end-users follow to design the annotation schema, or even to mark up existing texts, is often improvised and naive.
Concepts like well-formedness or validity are not sufficient to appraise XML documents. So, a set of metrics was defined to form the basis of the quality measurement of an XML document.
XML Document Metrics
• Size
• Structure Complexity
• Structure Depth
• Fan-in / Fan-out
• Instability
• Tree Impurity
• Attributes per Element
• Non-used Components
• Text Length
XML Document Metrics
Successor Graph
Given a DTD, each component (element or attribute) that appears in the declaration of an element is an immediate successor of that element. For each one, we draw an arrow (directed edge) from the element to the component.
Example:
<!ELEMENT Item (FileName, Artist?)>
<!ELEMENT FileName (#PCDATA)>
<!ELEMENT Artist (#PCDATA)>
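A rough sketch of how the successor graph (SG) for the example could be built. It assumes a simplified DTD: only `<!ELEMENT ...>` declarations are read with a regular expression, attributes from `<!ATTLIST ...>` are ignored, and an element literally named `PCDATA`, `EMPTY` or `ANY` would be dropped by mistake. The function name `successor_graph` is hypothetical.

```python
import re

NAME = r"[\w.:-]+"                       # loose approximation of an XML name
KEYWORDS = {"PCDATA", "EMPTY", "ANY"}    # content-model keywords, not components


def successor_graph(dtd_text):
    """Map each declared element to the set of its immediate successors."""
    graph = {}
    for name, model in re.findall(rf"<!ELEMENT\s+({NAME})\s+([^>]*)>", dtd_text):
        successors = set(re.findall(NAME, model)) - KEYWORDS
        graph.setdefault(name, set()).update(successors)
    return graph


dtd = """
<!ELEMENT Item (FileName, Artist?)>
<!ELEMENT FileName (#PCDATA)>
<!ELEMENT Artist (#PCDATA)>
"""
sg = successor_graph(dtd)
size = len(sg)                                          # number of nodes
edges = sum(len(successors) for successors in sg.values())
print(sg, size, edges)
```

For the three declarations above the graph has one arrow from `Item` to `FileName` and one from `Item` to `Artist`.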
Successor Graph (Romeo and Juliet screenplay)
XML Document Metrics
Size
Given a DTD, its size (i.e. the value of this metric) is the total number of nodes in the SG (the number of DTD components).
XML Document Metrics
Structure Complexity
Here e is the number of edges in the SG, n is the number of nodes in the SG, and n_idref is the number of IDREF attributes.
XML Document Metrics
Structure Depth
According to Meike Klettke, in Metrics for XML document collections, an SG with a depth much higher than 7 is complex and reveals a bad DTD design.
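One possible way to compute the depth of an SG, shown as a sketch. The slide does not define depth precisely, so this assumes it is the number of nodes on the longest cycle-free path from the root element; recursive DTDs produce cycles, so nodes already on the current path are skipped. The graph literal and the name `structure_depth` are illustrative.

```python
def structure_depth(graph, root):
    """Number of nodes on the longest cycle-free path that starts at `root`."""
    def walk(node, on_path):
        deeper = [
            walk(child, on_path | {child})
            for child in graph.get(node, ())
            if child not in on_path          # do not follow a cycle
        ]
        return 1 + max(deeper, default=0)

    return walk(root, {root})


# Successor graph of the Item/FileName/Artist example, written by hand.
sg = {"Item": {"FileName", "Artist"}, "FileName": set(), "Artist": set()}
print(structure_depth(sg, "Item"))

# A recursive declaration (a -> b -> a) still terminates.
recursive = {"a": {"b"}, "b": {"a"}}
print(structure_depth(recursive, "a"))
```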
XML Document Metrics
Fan-in / Fan-out
For the graph as a whole, the average and maximum values of these parameters help to spot unusual nodes, which can then be inspected to detect the anomaly and fix the problem.
Elements with a high Fan-in/Fan-out value are more complex than elements with a lower value.
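As an illustration, fan-in and fan-out can be read directly off the SG as in-degree and out-degree. The sketch below assumes the graph is stored as a dictionary from each element to the set of its successors; `fan_in_out` is a hypothetical helper name.

```python
def fan_in_out(graph):
    """Return (fan_in, fan_out): incoming and outgoing edge counts per node."""
    fan_in = {node: 0 for node in graph}
    fan_out = {node: len(successors) for node, successors in graph.items()}
    for successors in graph.values():
        for child in successors:
            fan_in[child] = fan_in.get(child, 0) + 1
            fan_out.setdefault(child, 0)     # undeclared components have no successors
    return fan_in, fan_out


sg = {"Item": {"FileName", "Artist"}, "FileName": set(), "Artist": set()}
fan_in, fan_out = fan_in_out(sg)

average_fan_out = sum(fan_out.values()) / len(fan_out)
busiest = max(fan_out, key=fan_out.get)      # candidate node to inspect first
print(fan_in, fan_out, average_fan_out, busiest)
```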
XML Document Metrics
Instability
A node with low instability is less dependent on other nodes, while many nodes depend on it.
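The formula is not preserved on the slide. The sketch below assumes the usual software-engineering definition, instability = fan-out / (fan-in + fan-out), which agrees with the statement above: few outgoing dependencies and many incoming ones give a value near 0. Treating an isolated node as 0.0 is also an assumption.

```python
def instability(fan_in, fan_out):
    """Assumed definition: fan_out / (fan_in + fan_out), in the range 0.0 to 1.0."""
    total = fan_in + fan_out
    if total == 0:
        return 0.0            # isolated node: assumption, treated as stable
    return fan_out / total


# A node that three others depend on, and that itself uses one component.
print(instability(fan_in=3, fan_out=1))
# A root element that nothing refers to but that uses two components.
print(instability(fan_in=0, fan_out=2))
```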
XML Document Metrics
Tree Impurity
A tree impurity of 0% means that the graph is a tree, and a tree impurity of 100% means that it is a fully connected graph.
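The slide does not show the formula. One common definition that has exactly these two endpoints is Fenton's tree impurity, 2(e - n + 1) / ((n - 1)(n - 2)), for a connected graph with n nodes and e undirected edges; the sketch below assumes that definition and returns a percentage.

```python
def tree_impurity(n_nodes, n_edges):
    """Assumed (Fenton-style) tree impurity of a connected graph, as a percentage."""
    if n_nodes < 3:
        return 0.0            # with fewer than 3 nodes there are no extra edges to add
    extra_edges = n_edges - (n_nodes - 1)                # edges beyond a spanning tree
    max_extra = (n_nodes - 1) * (n_nodes - 2) / 2        # extra edges in a complete graph
    return 100.0 * extra_edges / max_extra


print(tree_impurity(4, 3))    # a tree on 4 nodes
print(tree_impurity(4, 6))    # the complete graph on 4 nodes
print(tree_impurity(4, 4))    # a tree plus one extra edge
```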
XML Document Metrics
Attributes per Element
The AttrsEle(DTD) metric gives the average number of attributes defined per element in the DTD.
The AttrsEle(XML) metric, applied directly to the XML document, gives the average number of attributes actually used per element present in the XML document.
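A minimal sketch of the document-side metric, assuming AttrsEle(XML) is simply the total number of attribute occurrences divided by the number of element occurrences. The function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def attrs_per_element_xml(root):
    """Average number of attributes carried by the elements of the document."""
    elements = list(root.iter())                 # root and all descendants
    return sum(len(e.attrib) for e in elements) / len(elements)


doc = ET.fromstring('<Item id="1" lang="pt"><FileName/><Artist country="PT"/></Item>')
print(attrs_per_element_xml(doc))
```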
XML Document Metrics
Non-used Components
If Attr(DTD) represents the set of attributes defined in the DTD, and Attr(XML) represents the set of actual attributes (the attributes used in the XML document instance), then NonAttr(XML) is the set of non-used attributes.
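Read this way, NonAttr(XML) is the set difference between Attr(DTD) and Attr(XML). The sketch below assumes attributes are identified by (element, attribute) pairs and that Attr(DTD) is already available as a set; the `declared` set here is written by hand for illustration, not parsed from a real DTD.

```python
import xml.etree.ElementTree as ET


def used_attributes(root):
    """Attr(XML): the (element, attribute) pairs that occur in the document."""
    return {(e.tag, name) for e in root.iter() for name in e.attrib}


# Hypothetical Attr(DTD) for the Item example.
declared = {("Item", "id"), ("Item", "lang"), ("Artist", "country")}

doc = ET.fromstring('<Item id="1"><FileName>a.mp3</FileName><Artist>X</Artist></Item>')
non_used = declared - used_attributes(doc)       # NonAttr(XML)
print(sorted(non_used))
```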
XML Document Metrics
Text Length
Here length(PCDATA) is the total length of the document's text (the sum of the lengths of all text fragments, i.e. text associated with element tags, or untagged text), and nPCDATA is the number of text fragments (the number of PCDATA leaves that appear in the XML document tree).
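The two quantities can be collected in one pass over the tree. This sketch assumes the metric is the average fragment length, length(PCDATA) / nPCDATA (the formula itself is not on the slide), and that whitespace-only fragments are ignored; `text_fragments` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def text_fragments(root):
    """All non-blank text fragments: text inside an element and text after its end tag."""
    fragments = []
    for element in root.iter():
        for piece in (element.text, element.tail):
            if piece and piece.strip():          # skip None and whitespace-only text
                fragments.append(piece)
    return fragments


doc = ET.fromstring("<a>hello<b>hi</b>!!</a>")
fragments = text_fragments(doc)
length_pcdata = sum(len(piece) for piece in fragments)   # length(PCDATA)
n_pcdata = len(fragments)                                # nPCDATA
average_length = length_pcdata / n_pcdata if n_pcdata else 0.0
print(fragments, length_pcdata, n_pcdata, average_length)
```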
Metric Results (Romeo and Juliet screenplay)
Conclusion