CS4221 Presentation • Presentation Group P08
P08 – XML Semi Structure Extractor • Project: XML Semi Structure Extractor • Project Members: • Tran Duy Thien, A0096031M • Nguyen Thi Mai Huong, A0075106M • Truong Hoang Phuoc, A0074527B • Daniyar Kosmukhanbetov, A0075100Y
XML • While HTML is for presentation, XML is for data. • Semi-structured • User-defined tags • HTML-like tree structure of tags • Created & consumed by applications • XML vs. RDBMS • More flexible structure • Schema is easier to change
DTD & XML Schema • Methods to capture the semi-structure of XML • DTD is part of the original XML specification. • XML Schema (XSD) provides a more detailed and powerful way to capture the structure of XML, • but is often criticized for its complexity.
Web-based application and its 3-tier architecture • Web-based application: client-server application • 3-tier architecture • Presentation Layer • Business Layer • Service Layer
Presentation Layer • UI content is built from Facelets (XHTML), CSS and JS. • Configuration items • web.xml: configures the application settings and contexts • faces-config.xml: configuration specific to the Facelets platform (JSF) • persistence.xml: database configuration • Data is entered by the user and captured in various JSF components.
Business Layer • Maps data transferred from the UI level to programming-level items. • Handles business logic, processing data passed from the UI level. • Handles file upload and file download. • Consists of managed beans and their components. • Invokes services from the Service Layer to perform operations. • Handles exceptions and passes their messages back to the UI.
Service Layer • Also known as the Data Layer. • Provides services to the Business Layer. • Handles data processing, writing to and reading from backend data storage. • Contains logic to process data in its low-level form. • Exceptions are thrown to the Business Layer.
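The split between the Business Layer and the Service Layer can be sketched in plain Java. This is a minimal illustration, not the project's code: `StorageException`, `SchemaStorageService` and `UploadBean` are hypothetical names, the backend store is an in-memory map, and the JSF managed-bean annotations are left out so the sketch compiles on its own.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical checked exception raised by the service layer. */
class StorageException extends Exception {
    StorageException(String message) {
        super(message);
    }
}

/** Service layer: reads from and writes to the backend store (an in-memory map here). */
class SchemaStorageService {
    private final Map<String, String> store = new HashMap<>();

    void save(String fileName, String content) throws StorageException {
        if (content == null || content.isEmpty()) {
            // Low-level problems are reported upward, not handled here.
            throw new StorageException("Empty file: " + fileName);
        }
        store.put(fileName, content);
    }
}

/** Business layer: stands in for a managed bean that the UI would bind to. */
class UploadBean {
    private final SchemaStorageService service;
    private String message = "";

    UploadBean(SchemaStorageService service) {
        this.service = service;
    }

    void upload(String fileName, String content) {
        try {
            service.save(fileName, content);
            message = "Uploaded " + fileName;
        } catch (StorageException ex) {
            // Turn the service-layer exception into a message for the UI.
            message = "Upload failed: " + ex.getMessage();
        }
    }

    String getMessage() {
        return message;
    }
}
```

The bean never touches the store directly; it only calls the service and translates failures into text the presentation layer can show.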
Storing XML • The XML file is uploaded to the server. • XML data and structure are broken down and stored in a DOM-based data structure. • Front-end JavaScript ensures that the document is of XML type. • Back-end logic enforces that XML documents are well-formed and valid.
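A back-end well-formedness check of the kind described on this slide might look like the sketch below, which uses the JDK's own DOM parser. The class and method names (`UploadCheck`, `parseUpload`, `isWellFormed`) are assumptions made for illustration, and the sketch covers well-formedness only, not validation against a DTD or schema.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class UploadCheck {
    /** Parses uploaded bytes into a DOM tree; throws SAXException if they are not well-formed. */
    static Document parseUpload(byte[] content)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        return builder.parse(new ByteArrayInputStream(content));
    }

    /** Returns true when the bytes parse as well-formed XML. */
    static boolean isWellFormed(byte[] content) {
        try {
            parseUpload(content);
            return true;
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            return false;
        }
    }
}
```

A caller would run `isWellFormed` on the uploaded bytes before storing anything, and keep the `Document` returned by `parseUpload` as the DOM-based structure.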
Analyze and write DTD • Main data structures: TreeMap and Stack. • Parsing is done with XMLReader. • The writing process follows the tree structure to navigate the XML tag elements. • Each element is written, followed by its list of attributes. • The schema is stored on the server and a download link is displayed to the user.
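One way the TreeMap, stack and XMLReader could fit together is sketched below. It is a loose approximation under several assumptions rather than the project's algorithm: the class name `DtdSketch` is hypothetical, an `ArrayDeque` plays the role of the stack, every child is declared as optional and repeatable, every attribute as `CDATA #IMPLIED`, and declarations are written in alphabetical order rather than by walking the tree.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class DtdSketch extends DefaultHandler {
    private final Deque<String> open = new ArrayDeque<>();              // currently open elements
    private final Map<String, Set<String>> children = new TreeMap<>();  // element -> child names
    private final Map<String, Set<String>> attributes = new TreeMap<>(); // element -> attribute names
    private final Set<String> hasText = new TreeSet<>();                // elements with character data

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        children.computeIfAbsent(qName, k -> new TreeSet<>());
        Set<String> names = attributes.computeIfAbsent(qName, k -> new TreeSet<>());
        for (int i = 0; i < atts.getLength(); i++) {
            names.add(atts.getQName(i));
        }
        if (!open.isEmpty()) {
            children.get(open.peek()).add(qName); // record this element under its parent
        }
        open.push(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.pop();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!open.isEmpty() && !new String(ch, start, length).trim().isEmpty()) {
            hasText.add(open.peek());
        }
    }

    /** Writes each element declaration followed by its attribute list. */
    String toDtd() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Set<String>> e : children.entrySet()) {
            String name = e.getKey();
            List<String> parts = new ArrayList<>();
            if (hasText.contains(name)) {
                parts.add("#PCDATA");
            }
            parts.addAll(e.getValue());
            String model;
            if (parts.isEmpty()) {
                model = "EMPTY";
            } else if (e.getValue().isEmpty()) {
                model = "(#PCDATA)";
            } else {
                model = "(" + String.join(" | ", parts) + ")*";
            }
            out.append("<!ELEMENT ").append(name).append(' ').append(model).append(">\n");
            for (String attr : attributes.get(name)) {
                out.append("<!ATTLIST ").append(name).append(' ').append(attr)
                   .append(" CDATA #IMPLIED>\n");
            }
        }
        return out.toString();
    }

    /** Parses the XML text with a SAX XMLReader and returns the inferred DTD. */
    static String inferDtd(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            DtdSketch handler = new DtdSketch();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.toDtd();
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("could not parse XML", ex);
        }
    }
}
```

The stack tracks the path from the root to the current element, so each start tag can be attributed to its parent, while the sorted maps accumulate what has been seen per element name across the whole document.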
Analyze and write XSD • The document tree is built using XOM (XML Object Model). • Elements are processed recursively, using the document tree. • Attempts to capture ID/IDREF relations under the “foreignrelation” attribute.
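The recursive walk over the document tree can be illustrated as follows. To keep the sketch free of external dependencies it uses the JDK's `org.w3c.dom` API in place of XOM, and `XsdSketch` is a hypothetical name. It makes strong simplifying assumptions: only the first occurrence of each child name is inspected, every child is optional and repeatable, all leaf values and attributes are typed `xs:string`, text inside elements that also carry attributes or children is not modeled, and the ID/IDREF handling from the slide is not reproduced.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class XsdSketch {
    private static final String OPTIONAL_MANY = " minOccurs=\"0\" maxOccurs=\"unbounded\"";

    /** Builds the document tree, then emits one nested xs:element declaration per element. */
    static String toXsd(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            StringBuilder out = new StringBuilder(
                    "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">\n");
            declare(root, "  ", "", out);
            return out.append("</xs:schema>\n").toString();
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("could not parse XML", ex);
        }
    }

    private static void declare(Element el, String pad, String occurs, StringBuilder out) {
        // Keep the first child element seen for each distinct tag name.
        Map<String, Element> children = new LinkedHashMap<>();
        for (Node n = el.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                children.putIfAbsent(n.getNodeName(), (Element) n);
            }
        }
        NamedNodeMap attrs = el.getAttributes();
        String open = pad + "<xs:element name=\"" + el.getTagName() + "\"" + occurs;
        if (children.isEmpty() && attrs.getLength() == 0) {
            out.append(open).append(" type=\"xs:string\"/>\n"); // leaf element
            return;
        }
        out.append(open).append(">\n").append(pad).append("  <xs:complexType>\n");
        if (!children.isEmpty()) {
            out.append(pad).append("    <xs:sequence>\n");
            for (Element child : children.values()) {
                declare(child, pad + "      ", OPTIONAL_MANY, out); // recurse into the subtree
            }
            out.append(pad).append("    </xs:sequence>\n");
        }
        for (int i = 0; i < attrs.getLength(); i++) {
            out.append(pad).append("    <xs:attribute name=\"")
               .append(attrs.item(i).getNodeName()).append("\" type=\"xs:string\"/>\n");
        }
        out.append(pad).append("  </xs:complexType>\n").append(pad).append("</xs:element>\n");
    }
}
```

Calling `XsdSketch.toXsd` on a small document yields a nested schema whose shape mirrors the input tree, which is the main point of the recursive approach.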
Explicit vs. Implicit Relationship • The application can capture foreign key relationships via ID and IDREF(s), and even marks these relationships. • Implicit relationships are much more difficult to discover. • They require understanding the semantics and role of each entity. • Matrix of relationships between entities. • Due to time constraints, this is left as a future improvement.