Semi-Structured Data and XML

Presentation Transcript


  1. Semi-Structured Data and XML Jacob (Jack) Gryn - Presented November 28, 2002

  2. Agenda • Semi-Structured Data • XML

  3. Semi-Structured Data: an Introduction • What is structured data? • What is non-structured data? • What is semi-structured data? • How is semi-structured data represented? • What can we do with semi-structured data?

  4. What is Structured Data? • Strongly typed variables/attributes (e.g. int, float, string[20]) • Every attribute in a relation is defined for all records • Data is represented in some organized fashion

  5. An Example of Structured Data: A relational database can be considered structured data.

  6. What is Non-Structured Data? • Data that has no type definitions • Data is not organized according to any pattern • No concept of variables or attributes

  7. An Example of Non-Structured Data: “Bob was born sometime in August of 1949. He has a reasonable salary of 52000. Someone else was born on the 12th of a different month, his name is Bill. By the way, Bob was born on the 13th of August.” As you can see, it would be almost impossible for a computer to parse such data automatically.

  8. Then what is Semi-Structured Data? Anything in between structured and non-structured data!

  9. Then what is Semi-Structured Data? • Everything in between structured and non-structured data • Variables are loosely typed • x=1 is valid, so is x="hello" • A record does not need to have all attributes defined • e.g. In a database of cars, if we don’t know the engine type, we can choose not to define the field for that particular record, whereas in a structured database the attribute would be defined but set to NULL. • An attribute of a record could be another record • It does not necessarily have to differentiate between an identifier and a value
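
The properties listed on this slide can be illustrated with plain Python dictionaries. This is only a sketch: the car records, field names, and the `engine_type` helper are made up for illustration and do not come from the presentation.

```python
# Hypothetical car records; all field names and values are illustrative.
cars = [
    # An attribute ("engine") whose value is itself another record.
    {"model": "Sedan A", "year": 1998, "engine": {"type": "V6", "litres": 3.0}},
    # No "engine" field at all, and "year" holds a string instead of an int.
    {"model": "Coupe B", "year": "unknown"},
]


def engine_type(car):
    """Return the engine type, or None when the record does not define one."""
    return car.get("engine", {}).get("type")


engine_types = [engine_type(car) for car in cars]
print(engine_types)
```

In a relational table the second record would still carry an engine column holding NULL; here the field is simply absent, so the reader of the data has to cope with its absence.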

  10. So how is semi-structured data represented? Semi-structured data can be represented as a tree.

  11. So how is semi-structured data represented? Semi-structured data can be represented in the form of indented text:

      Bob
        Birthday
          1949
          August
          13
        Salary
          $52,000
      Bill
        Birthday
          1967
          April
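
The tree and indented-text views are two renderings of the same nested structure. The sketch below assumes one plausible nesting of the Bob/Bill data (the year, month, and day placed as children of Birthday); the `people` variable and the `to_indented` helper are hypothetical names.

```python
# Assumed nesting of the example data: each person has named groups of values.
people = {
    "Bob": {"Birthday": ["1949", "August", "13"], "Salary": ["$52,000"]},
    "Bill": {"Birthday": ["1967", "April"]},  # no day, no salary recorded
}


def to_indented(node, depth=0):
    """Render a nested dict/list tree as lines indented two spaces per level."""
    lines = []
    if isinstance(node, dict):
        for key, child in node.items():
            lines.append("  " * depth + key)
            lines.extend(to_indented(child, depth + 1))
    else:
        # A list holds leaf values, which carry the actual data.
        for value in node:
            lines.append("  " * depth + value)
    return lines


indented_lines = to_indented(people)
print("\n".join(indented_lines))
```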

  12. So how is semi-structured data represented? Semi-structured data can be represented as a markup language (e.g. HTML, XML, LISP, AceDB, Tsimmis):

      <employee id="3">
        <name>Bob</name>
        <extension>5513</extension>
        <department>Sales</department>
        <salary>45000</salary>
      </employee>
      <employee id="1">
        <name>Ed</name>
        <extension>6766</extension>
        <office>312</office>
        <department>Executive</department>
        <salary>Confidential</salary>
      </employee>
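
A minimal sketch of reading this markup with Python's standard `xml.etree.ElementTree` module. One assumption is added: the two employee elements are wrapped in an `<employees>` root, because an XML document needs a single root element. The `records` list is an illustrative name.

```python
import xml.etree.ElementTree as ET

# The slide's data, wrapped in an assumed <employees> root element.
XML = """
<employees>
  <employee id="3">
    <name>Bob</name>
    <extension>5513</extension>
    <department>Sales</department>
    <salary>45000</salary>
  </employee>
  <employee id="1">
    <name>Ed</name>
    <extension>6766</extension>
    <office>312</office>
    <department>Executive</department>
    <salary>Confidential</salary>
  </employee>
</employees>
"""

root = ET.fromstring(XML.strip())

# Turn each <employee> into a dict holding only the fields it actually defines.
records = []
for emp in root.findall("employee"):
    record = {"id": emp.get("id")}
    for field in emp:
        record[field.tag] = field.text
    records.append(record)

for record in records:
    print(record)
```

Bob's record has no `office` key, and Ed's `salary` is the string `Confidential` rather than a number, which is what makes the data semi-structured.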

  13. Overview • Semi-structured data is not necessarily created with the intention of being processed. • e.g. Web pages are not necessarily intended to be queried by a language like SQL; a web designer who does not take this into consideration may not make the data easy for a machine to process.

  14. What can we do with Semi-Structured Data? • Since there is some structure, it can be scanned and parsed • Once the data is parsed, we can query it using specialized query languages such as UnQL, GEXT and Lorel • We can “clean it up” to be placed into a structured relational database
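
The "clean it up" step could look roughly like the following sketch, which parses the employee markup used elsewhere in the deck and loads it into an in-memory SQLite table. The table layout, the `<employees>` wrapper element, and the `to_int` helper are assumptions made for illustration; missing or non-numeric values are stored as NULL.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Employee data from the deck, wrapped in an assumed <employees> root.
XML = """<employees>
  <employee id="3"><name>Bob</name><extension>5513</extension>
    <department>Sales</department><salary>45000</salary></employee>
  <employee id="1"><name>Ed</name><extension>6766</extension><office>312</office>
    <department>Executive</department><salary>Confidential</salary></employee>
</employees>"""


def to_int(text):
    """Return text as an int, or None if it is missing or not numeric."""
    try:
        return int(text)
    except (TypeError, ValueError):
        return None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, extension INTEGER,"
    " office INTEGER, department TEXT, salary INTEGER)"
)
for emp in ET.fromstring(XML).findall("employee"):
    conn.execute(
        "INSERT INTO employee VALUES (?, ?, ?, ?, ?, ?)",
        (
            int(emp.get("id")),
            emp.findtext("name"),
            to_int(emp.findtext("extension")),
            to_int(emp.findtext("office")),      # absent for Bob, becomes NULL
            emp.findtext("department"),
            to_int(emp.findtext("salary")),      # "Confidential" becomes NULL
        ),
    )

rows = conn.execute("SELECT name, office, salary FROM employee ORDER BY id").fetchall()
print(rows)
```

Every row now has every column, so the loose typing and the missing fields of the source have been traded for NULLs, as in a structured database.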

  15. XML: an Introduction • What is XML? • What does it offer to creators of DBs? • How can XML be used as a DB? • Representations of XML • Other features of XML • Disadvantages to XML

  16. Summary / Key Points of Semi-Structured Data • In between structured and non-structured data • Loosely typed attributes • Not all attributes need to be defined for every record • Can be parsed and queried

  17. What is XML? • XML stands for eXtensible Markup Language • Based on tags similar to HTML • Actually, XHTML is a form of XML • Used to define markup languages

  18. What does XML offer to database designers? • Readable by humans using Unicode or ASCII text • Easy for computers to parse • Can easily be used as ‘back-end’ for web sites

  19. How can XML be used as a database? Consider the following employee data, written in XML:

      <employee id="3">
        <name>Bob</name>
        <extension>5513</extension>
        <department>Sales</department>
        <salary>45000</salary>
      </employee>
      <employee id="1">
        <name>Ed</name>
        <extension>6766</extension>
        <office>312</office>
        <department>Executive</department>
        <salary>Confidential</salary>
      </employee>

      Notice that this is semi-structured data, since not all the fields are filled in and because they are loosely typed.

  20. In XML, there are few restrictions on how data can be laid out • The tag names can represent either attribute names or data itself • Tag names can be defined as anything the creator wishes

  21. But there are still a few restrictions • Every tag that is opened must be closed. • <name>Bob</name> • A separate closing tag is not needed for an empty element • <myelement/> • If one tag is opened inside another tag, it must be closed before the outer tag is closed. • Wrong: <employee><name>Bob</employee></name> • Right: <employee><name>Bob</name></employee> • Tags are case sensitive
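
These restrictions are what an XML parser enforces as well-formedness. As a rough sketch, Python's standard `xml.etree.ElementTree` raises `ParseError` on input that breaks them; the `is_well_formed` helper and the `results` dictionary are illustrative names, not part of any library.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = [
    "<name>Bob</name>",                       # opened and closed
    "<myelement/>",                           # empty element, self-closing
    "<name>Bob",                              # never closed
    "<employee><name>Bob</employee></name>",  # closed in the wrong order
    "<employee><name>Bob</name></employee>",  # properly nested
    "<Name>Bob</name>",                       # case mismatch between tags
]

results = {sample: is_well_formed(sample) for sample in samples}
for sample, ok in results.items():
    print(("ok   " if ok else "error"), sample)
```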

  22. How can XML be represented? • As a tree structure • As text/markup tags

  23. How can XML be represented? As a tree structure: Take our previous example: • Leaf nodes generally, but not necessarily, store the data • Recent web browsers will show this structure
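
A small sketch of walking the parsed tree and printing it as an outline, using Bob's record from the running example. The `outline` helper is a hypothetical name; it shows that the values sit in the leaf elements while the `employee` node carries only an attribute.

```python
import xml.etree.ElementTree as ET


def outline(element, depth=0):
    """Return the element and its descendants as indented outline lines."""
    label = element.tag
    if element.attrib:
        label += " " + " ".join(f'{k}="{v}"' for k, v in element.attrib.items())
    text = (element.text or "").strip()
    if text:
        label += ": " + text  # data stored at this node
    lines = ["  " * depth + label]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(
    '<employee id="3"><name>Bob</name><extension>5513</extension>'
    "<department>Sales</department><salary>45000</salary></employee>"
)
tree_lines = outline(root)
print("\n".join(tree_lines))
```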

  24. How can XML be represented? As a text/markup language: Take our previous example:

      <employee id="3">
        <name>Bob</name>
        <extension>5513</extension>
        <department>Sales</department>
        <salary>45000</salary>
      </employee>
      <employee id="1">
        <name>Ed</name>
        <extension>6766</extension>
        <office>312</office>
        <department>Executive</department>
        <salary>Confidential</salary>
      </employee>

  25. Other features of XML • It is easy to parse • It can be queried like a database • It can be used with XSL Templates to easily generate web pages from data • It can be used with a DTD (Document Type Definition) to run as a fully structured database
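
As one illustration of querying XML like a database, the sketch below uses the limited XPath subset supported by Python's standard `xml.etree.ElementTree`. The `<employees>` wrapper element is an assumption. DTD validation is not shown, since ElementTree does not validate documents against a DTD.

```python
import xml.etree.ElementTree as ET

# Employee data from the running example, wrapped in an assumed root element.
root = ET.fromstring(
    "<employees>"
    '<employee id="3"><name>Bob</name><extension>5513</extension>'
    "<department>Sales</department><salary>45000</salary></employee>"
    '<employee id="1"><name>Ed</name><extension>6766</extension><office>312</office>'
    "<department>Executive</department><salary>Confidential</salary></employee>"
    "</employees>"
)

# Roughly: SELECT name FROM employee WHERE department = 'Sales'
sales_names = [n.text for n in root.findall("./employee[department='Sales']/name")]

# Roughly: SELECT office FROM employee WHERE id = 1
ed_office = root.find("./employee[@id='1']").findtext("office")

print(sales_names, ed_office)
```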

  26. Disadvantages to XML • Difficult to create indexes on • Difficult to optimize queries • Requires additional disk space (text format; redundant data in tags) • No single standard of how data should be stored in XML
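
The disk-space point can be made concrete by serializing the same records as CSV and as XML and comparing the sizes. This is only a rough sketch under stated assumptions: CSV stands in for a compact tabular format, the `<employees>` root is added for well-formedness, and the `office` field is left out so that both records share the same columns.

```python
import csv
import io
import xml.etree.ElementTree as ET

rows = [
    {"id": "3", "name": "Bob", "extension": "5513",
     "department": "Sales", "salary": "45000"},
    {"id": "1", "name": "Ed", "extension": "6766",
     "department": "Executive", "salary": "Confidential"},
]
fields = list(rows[0])

# CSV: field names appear once, in the header row.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fields)
writer.writeheader()
writer.writerows(rows)
csv_size = len(buffer.getvalue().encode("utf-8"))

# XML: every value is wrapped in an opening and a closing tag.
root = ET.Element("employees")
for row in rows:
    emp = ET.SubElement(root, "employee", id=row["id"])
    for key in fields[1:]:
        ET.SubElement(emp, key).text = row[key]
xml_size = len(ET.tostring(root, encoding="utf-8"))

print(f"CSV: {csv_size} bytes, XML: {xml_size} bytes")
```

The gap grows with the number of records, because the tag names are repeated for every value.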

  27. Summary / Key Points of XML • Data stored using text-based markup language • Can also be represented in tree format • Can store structured and semi-structured data • Easy to parse and query, but inefficient

  28. Where to Get More Information • Search the web, you’ll find something!
