
Models and languages for semistructured data

Explore data models and query languages for semistructured data, bridging documents and databases. Learn about relational databases, object databases, XML, and embedded query languages. Dive into types, assumptions, SSD expressions, path expressions, label patterns, and label variables for efficient data handling. Engage in exercises and examples to deepen your understanding of semistructured data modeling and querying.

eldridgec

Presentation Transcript


  1. Models and languages for semistructured data: Bridging documents and databases

  2. Lectures 1. Introduction to data models 2. Query languages for relational databases 3. Models and query languages for object databases 4. Models and query languages for semistructured data, XML 5. Embedded query languages 6. Guest lecture on Object Role Modelling

  3. Why do we like types? • Types facilitate understanding • Types enable compact representations • Types enable query optimisation • Types facilitate consistency enforcement

  4. Background assumptions for typed data • Data stable over time • Organisational body to control data • Exercise: Give an example of a context where these assumptions do not hold

  5. Semistructured data: Semistructured data is schemaless and self-describing. The data and the description of the data are integrated.

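  6. An example: {name: {first: “John”, last: “Smith”}, tel: 112233, email: “john@123.edu”} [Figure: the same data drawn as a tree whose root has edges name, tel and email; name leads to a node with edges first (“John”) and last (“Smith”), tel leads to 112233, and email leads to “john@123.edu”.]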

  7. Another example: {person: &o1{name: “Eva”, age: 40, child: &o2}, person: &o2{name: “Abel”, age: 20}} An object identifier, such as &o1, placed before a structure binds the identifier to the identity of that structure. The identifier can then be used to refer to the structure. [Figure: a graph whose root has two person edges, to &o1 (name “Eva”, age 40) and to &o2 (name “Abel”, age 20), with a child edge from &o1 to &o2.]

  8. Terminology: The following is an ssd-expression: &o1{name: “Eva”, age: 40, child: &o2} [Slide callouts: object identifier (&o1), label (e.g. name), value (e.g. “Eva”).]
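
A minimal Python sketch of the ssd-expression from slide 7, to make the terminology concrete. The representation is an assumption of this sketch, not part of the lecture: a structure is a list of (label, value) pairs, and the hypothetical names Ref, objects, deref and children are introduced only for illustration.

```python
# An ssd-expression modelled as a list of (label, value) pairs. A list is used
# instead of a dict because a label such as "person" may occur more than once.
# Object identifiers live in a separate table (an assumption of this sketch).

class Ref:
    """A reference to an object identifier such as &o2."""
    def __init__(self, oid):
        self.oid = oid

objects = {}
objects["&o2"] = [("name", "Abel"), ("age", 20)]
objects["&o1"] = [("name", "Eva"), ("age", 40), ("child", Ref("&o2"))]

# {person: &o1{...}, person: &o2{...}}
db = [("person", Ref("&o1")), ("person", Ref("&o2"))]

def deref(value):
    """Follow a reference to the structure it identifies; pass other values through."""
    return objects[value.oid] if isinstance(value, Ref) else value

def children(value, label):
    """Values reachable from `value` over edges carrying `label`."""
    value = deref(value)
    if not isinstance(value, list):
        return []  # atomic values have no outgoing edges
    return [deref(v) for (l, v) in value if l == label]

eva = children(db, "person")[0]
child_names = [n for c in children(eva, "child") for n in children(c, "name")]
print(child_names)
```

Following the child edge from &o1 lands on the structure bound to &o2, which is how a reference lets two paths share one object.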

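  9. A database [Figure: a graph rooted at db with an edge biblio. Below biblio: a paper node n1 (author “Crick”, author “Wallace”, title “The spiral DNA”, date 1956), a book node n2 (author “Darwin”, title “Origin”, date 1848), a book node n3 (author “Marx”, title “Kapital”, date 1860), and further entries (…….).]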

  10. Path expressions: A path expression is a sequence of labels: l1.l2…ln. A path expression results in a set of nodes. Path properties are specified by regular expressions on two levels: on the alphabet of labels and on the alphabet of characters that make up labels.

  11. A path expression: biblio.book.author [Figure: the database graph from slide 9.]

  12. A path expression: biblio.(book | paper).author [Figure: the database graph from slide 9.]

  13. Examples of path expressions • biblio.book.author - authors of books • biblio.paper.author - authors of papers • biblio.(book | paper).author - authors of books or papers • biblio._.author - authors of anything • biblio._*.author - nodes at the ends of paths starting with biblio, ending with author, and having an arbitrary sequence of labels in between
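
The path expressions listed above can be sketched as a small evaluator in Python. This is an illustration under stated assumptions, not the lecture's implementation: the database is the acyclic tree from slide 9 written as nested (label, value) pairs, alternation is written with "|", and the function names step, descendants_or_self and evaluate are hypothetical. The result is returned as a list rather than a set so that the order is easy to inspect.

```python
# The database from slide 9 as nested (label, value) pairs.
DB = [("biblio", [
    ("paper", [("author", "Crick"), ("author", "Wallace"),
               ("title", "The spiral DNA"), ("date", 1956)]),
    ("book", [("author", "Darwin"), ("title", "Origin"), ("date", 1848)]),
    ("book", [("author", "Marx"), ("title", "Kapital"), ("date", 1860)]),
])]

def step(nodes, label_test):
    """Follow every edge whose label passes label_test."""
    out = []
    for node in nodes:
        if isinstance(node, list):  # atomic values have no edges
            out.extend(v for (l, v) in node if label_test(l))
    return out

def descendants_or_self(nodes):
    """All nodes reachable over zero or more edges (the `_*` step)."""
    seen, todo = [], list(nodes)
    while todo:
        node = todo.pop(0)
        seen.append(node)
        if isinstance(node, list):
            todo.extend(v for (_, v) in node)
    return seen

def evaluate(root, path):
    """Evaluate a dot-separated path expression starting at root."""
    nodes = [root]
    for part in path.split("."):
        if part == "_*":
            nodes = descendants_or_self(nodes)
        elif part == "_":
            nodes = step(nodes, lambda l: True)
        else:
            # "(book|paper)" becomes ["book", "paper"]; a plain label is a
            # one-element list of alternatives.
            alts = [a.strip() for a in part.strip("()").split("|")]
            nodes = step(nodes, lambda l, alts=alts: l in alts)
    return nodes

print(evaluate(DB, "biblio.book.author"))
print(evaluate(DB, "biblio.(book|paper).author"))
print(evaluate(DB, "biblio._*.author"))
```

The sketch assumes a tree; on a graph with cycles, descendants_or_self would need to track visited nodes to terminate.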

  14. Example of a label pattern • ((b|B)ook|(a|A)uthor)(s)? - book, Book, author, Author, books, Books, authors, Authors
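
The label pattern above operates on the characters of a single label, so it can be checked with an ordinary regular-expression library. The sketch below uses Python's re module and reads the separator between alternatives as the usual alternation bar; the extra labels "bookss", "title" and "AUTHOR" are made-up negative examples.

```python
import re

# The label pattern from the slide, written as a Python regular expression.
LABEL_PATTERN = re.compile(r"((b|B)ook|(a|A)uthor)(s)?")

labels = ["book", "Book", "author", "Author",
          "books", "Books", "authors", "Authors",
          "bookss", "title", "AUTHOR"]  # the last three should not match

# fullmatch requires the whole label to match, not just a prefix.
matching = [label for label in labels if LABEL_PATTERN.fullmatch(label)]
print(matching)
```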

  15. An exercise: biblio._*.author.(“[s|S]ection”) Which of the following paths match the path expression above? 1. Biblio.author.Section 2. Biblio.cat.rat.hat.author.section 3. Biblio.author 4. Biblio.cat.author.section.Section

  16. A simple query Select author: X from biblio.book.author X Result: {author: “Darwin”, author: “Marx”}

  17. A query with a condition select row: X from biblio._ X where “Crick” in X.author Result: {row: {author: “Crick”, author: “Wallace”, date: 1956, title: “The spiral DNA”}, …}
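
One way to read the query above is as a filter over the nodes bound to X. The Python sketch below assumes the database from slide 9 written as nested (label, value) pairs; the helper children is hypothetical, and the "in" of the query is modelled as membership in the list of author values.

```python
# The children of biblio from slide 9 as (label, value) pairs.
BIBLIO = [
    ("paper", [("author", "Crick"), ("author", "Wallace"),
               ("title", "The spiral DNA"), ("date", 1956)]),
    ("book", [("author", "Darwin"), ("title", "Origin"), ("date", 1848)]),
    ("book", [("author", "Marx"), ("title", "Kapital"), ("date", 1860)]),
]

def children(node, label=None):
    """Values below node; label=None plays the role of the wildcard `_`."""
    if not isinstance(node, list):
        return []
    return [v for (l, v) in node if label is None or l == label]

# select row: X from biblio._ X where "Crick" in X.author
result = [("row", x) for x in children(BIBLIO)
          if "Crick" in children(x, "author")]
print(result)
```

Only the paper node survives the condition, so the result holds a single row containing the whole paper structure.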

  18. Two exercises select row: {title: Y, date: Z} from biblio.paper X, X.title Y, X.date Z select row: {author: Y, date: Z} from biblio.book X, X.author Y, X.date Z

  19. A database: select row: {title: Y, date: Z} from biblio.paper X, X.title Y, X.date Z [Figure: the database graph from slide 9.]

  20. A database [Figure: the database graph from slide 9, repeated.]

  21. Nested queries select row: (select author: Y from X.author Y) from biblio.book X

  22. Three exercises • Which authors have written a book or a paper in 1992? • Which authors have written a book together with Jones? • Which authors have written both a book and a paper?

  23. Expressing relations r1 r2 a b c b d e 1 2 3 1 1 3 3 2 2 3 4 2 4 3 1 2 3 1 { r1: { row: {a: 1, b:2, c:2}, row: {a: 1, b:2, c:2}, row: {a: 1, b:2, c:2} }, r2: { row: {b: 1, d:2, e:2}, row: {b: 1, d:2, e:2}, row: {b: 1, d:2, e:2} } }

  24. Expressing relational joins: select a: A, d: D from r1.row X, r2.row Y, X.a A, X.b B, Y.b B’, Y.d D where B = B’
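
The join query above binds one variable per path and keeps the bindings where the two b values agree. A Python sketch of the same evaluation follows. The rows are made up for illustration and are not the rows on slide 23, and each result binding is modelled as its own small structure, which is an assumption of this sketch.

```python
# Hypothetical relations r1(a, b, c) and r2(b, d, e) as an ssd-style structure.
db = [
    ("r1", [("row", [("a", 1), ("b", 2), ("c", 3)]),
            ("row", [("a", 4), ("b", 5), ("c", 6)])]),
    ("r2", [("row", [("b", 2), ("d", 7), ("e", 8)]),
            ("row", [("b", 2), ("d", 9), ("e", 0)]),
            ("row", [("b", 3), ("d", 1), ("e", 1)])]),
]

def children(node, label):
    """Values reachable from node over edges carrying label."""
    return [v for (l, v) in node if l == label]

# Each `for` clause corresponds to one binding in the from-part of the query;
# b2 plays the role of B' in the query.
result = [
    [("a", a), ("d", d)]
    for r1 in children(db, "r1") for x in children(r1, "row")
    for r2 in children(db, "r2") for y in children(r2, "row")
    for a in children(x, "a") for b in children(x, "b")
    for b2 in children(y, "b") for d in children(y, "d")
    if b == b2
]
print(result)
```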

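  25. Label variables: select L: X from biblio._*.L X where matches(“.*Shakespeare.*”, X) Here L is a label variable. [Figure: a database graph rooted at db with an edge biblio. Below biblio: a book node n2 (title “Macbeth”, date 1622, author “Shakespeare”), a book node n3 (title “Best of Shakespeare”, date 1992, author “Smith”), and further entries (…….).]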

  26. Label variables select L: X from biblio._*.L X where matches(“.*Shakespeare.*”, X) {author: “Shakespeare”, title: “Best of Shakespeare”}

  27. Turning labels into data: select publ: {type: L, author: A} from biblio.L X, X.author A Result: {publ: {type: “paper”, author: “Crick”}, publ: {type: “paper”, author: “Wallace”}, publ: {type: “book”, author: “Darwin”}} [Figure: the database graph with paper n1 (authors “Crick” and “Wallace”, title “The spiral DNA”, date 1956) and book n2 (author “Darwin”, title “Origin”, date 1848).]
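
In this query the label variable L is bound to each edge label below biblio and then reused as an ordinary value. A Python sketch under the same assumptions as before (nested (label, value) pairs, only the paper n1 and the book n2 from this slide):

```python
# The two publications shown on this slide.
BIBLIO = [
    ("paper", [("author", "Crick"), ("author", "Wallace"),
               ("title", "The spiral DNA"), ("date", 1956)]),
    ("book", [("author", "Darwin"), ("title", "Origin"), ("date", 1848)]),
]

# select publ: {type: L, author: A} from biblio.L X, X.author A
# `label` is the binding of L; it ends up as data in the result.
result = [
    ("publ", [("type", label), ("author", value)])
    for (label, x) in BIBLIO
    for (l, value) in x
    if l == "author"
]
print(result)
```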

  28. An exercise • List all publications in 1992, their types, and titles.

  29. Basic XML syntax: XML is a textual representation of data. An element is a piece of text bounded by tags: <name> John </name> Here <name> is the start-tag, John is the content, and </name> is the end-tag. An empty element <name> </name> can be abbreviated as <name/>.

  30. Basic XML syntax Elements may contain subelements <person> <name> John </name> <tel> 112233 </tel> <email> john@123.edu </email> </person>

  31. XML attributes An attribute is defined by a name-value pair within a tag <price currency = “dollar”> 500 </price> <length unit = “cm”> 25 </length>

  32. XML attributes and elements <product> <name> widget </name> <price> 10 </price> </product> <product price = “10”> <name> widget </name> </product> <product name = “widget” price = “10”/>

  33. XML and ssd-expressions <person> <name> John </name> <tel> 112233 </tel> <email> john@123.edu </email> </person> {person: {name: “John”, tel: 112233, email: “john@123.edu”}}
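
The correspondence between the XML element and the ssd-expression above can be sketched with Python's standard xml.etree.ElementTree module. The function to_ssd is hypothetical, and the sketch makes simplifying assumptions: attributes and mixed content are ignored, and text made only of digits is treated as an integer.

```python
import xml.etree.ElementTree as ET

def to_ssd(element):
    """Convert an element to nested (label, value) pairs, or to an atom."""
    kids = list(element)
    if not kids:
        text = (element.text or "").strip()
        return int(text) if text.isdigit() else text
    return [(kid.tag, to_ssd(kid)) for kid in kids]

xml_text = ("<person> <name> John </name> <tel> 112233 </tel> "
            "<email> john@123.edu </email> </person>")
root = ET.fromstring(xml_text)

# The root tag becomes the outermost label, as in {person: {...}}.
ssd = [(root.tag, to_ssd(root))]
print(ssd)
```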

  34. XML references <person id = “p1”> <name> John </name> <tel> 112233 </tel> </person> <person id = “p2”> <name> Peter </name> <tel> 998877 </tel> <boss idref = “p1”/> </person> [Slide callouts: id is an element identifier attribute; idref is a reference attribute.]

  35. Document Type Definitions <!DOCTYPE db [ <!ELEMENT db (person*)> <!ELEMENT person (name, age, email)> <!ELEMENT name (#PCDATA)> <!ELEMENT age (#PCDATA)> <!ELEMENT email (#PCDATA)> ]>

  36. An exercise on DTDs as schemas <db> <r1> <a> a1 </a> <b> b1 </b> </r1> <r1> <a> a2 </a> <b> b2 </b> </r1> <r2> <c> a1 </c> <d> b1 </d> </r2> <r2> <c> c2 </c> <d> d2 </d> </r2> <r3> <a> a1 </a> <c> b1 </c> </r3> </db> Write down a DTD for the data above!

  37. Attributes in DTDs <product> <name language = “Swedish” department = “music”> trumpet </name> <price currency = “dollar”> 500 </price> <length unit = “cm”> 25 </length> </product> <!ATTLIST name language CDATA #REQUIRED department CDATA #IMPLIED> <!ATTLIST price currency CDATA #REQUIRED> <!ATTLIST length unit CDATA #REQUIRED>

  38. Reference attributes in DTDs <!DOCTYPE people [ <!ELEMENT people (person*)> <!ELEMENT person (name)> <!ELEMENT name (#PCDATA)> <!ATTLIST person id ID #REQUIRED boss IDREF #REQUIRED friends IDREFS #IMPLIED> ]>
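
The attribute declarations above impose three kinds of constraints: id values must be present and unique, boss must be present and name one existing id, and friends, when given, must be a space-separated list of existing ids. Python's xml.etree.ElementTree does not validate against a DTD, so the sketch below checks these constraints by hand. The function check_references and the sample people (anna, bertil, cecilia, david) are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

def check_references(people_xml):
    """Return a list of violations of the ID / IDREF / IDREFS declarations."""
    persons = ET.fromstring(people_xml).findall("person")
    ids = [p.get("id") for p in persons if p.get("id") is not None]
    problems = []
    for p in persons:
        pid = p.get("id")
        if pid is None:
            problems.append("person without the required id")
        elif ids.count(pid) > 1:
            problems.append(f"{pid}: id is not unique")
        boss = p.get("boss")
        if boss is None:
            problems.append(f"{pid}: required boss is missing")
        elif boss not in ids:
            problems.append(f"{pid}: boss does not refer to an existing id")
        # IDREFS is a whitespace-separated list of references.
        for friend in (p.get("friends") or "").split():
            if friend not in ids:
                problems.append(f"{pid}: friend {friend} does not refer to an existing id")
    return problems

# Hypothetical sample documents.
GOOD = """<people>
  <person id="anna" boss="bertil" friends="bertil"><name>Anna</name></person>
  <person id="bertil" boss="bertil"><name>Bertil</name></person>
</people>"""

BAD = """<people>
  <person id="cecilia" friends="anna david"><name>Cecilia</name></person>
</people>"""

print(check_references(GOOD))
print(check_references(BAD))
```

Note that nothing in the declarations says what kind of element a reference must point to; the check only asks whether the id exists.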

  39. An exercise <people> <person id = “sven” boss = “olle”> <name> Sven Svensson </name> </person> <person id = “olle” friends = “nils eva”> <name> Olle Olsson </name> </person> <person id = “pelle” boss = “nils eva”> <name> Per Persson </name> </person> </people> Does this XML element conform to the previous DTD?

  40. Limitations of DTDs as schemas • DTDs impose order • No base types • The types of IDREFs cannot be constrained

  41. XSL - extensible stylesheet language <bib> <book> <title> t1 </title> <author> a1 </author> <author> a2 </author> </book> <paper> <title> t2 </title> <author> a3 </author> <author> a4 </author> </paper> <book> <title> t3 </title> <author> a5 </author> <author> a6 </author> </book> </bib>

  42. Template rules and XSL patterns: <xsl:template> <xsl:apply-templates/> </xsl:template> <xsl:template match = “bib/*/title”> <result> <xsl:value-of/> </result> </xsl:template> Result: <result> t1 </result> <result> t2 </result> <result> t3 </result> [Slide callouts: each xsl:template element is a template rule; the value of its match attribute is an XSL pattern.]
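
The effect of the pattern bib/*/title can be imitated in Python with the limited XPath support of xml.etree.ElementTree. This is not an XSL processor; it only selects the title elements two levels below the bib root of the document from slide 41 and wraps their text in result tags, as the second template rule does.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""
<bib>
  <book> <title> t1 </title> <author> a1 </author> <author> a2 </author> </book>
  <paper> <title> t2 </title> <author> a3 </author> <author> a4 </author> </paper>
  <book> <title> t3 </title> <author> a5 </author> <author> a6 </author> </book>
</bib>
""")

# Relative to the bib root, "*/title" selects every title element that is a
# child of any child element, mirroring the pattern bib/*/title.
results = [f"<result> {title.text.strip()} </result>"
           for title in bib.findall("*/title")]
print("\n".join(results))
```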

  43. Two exercises select row: {title: Y, date: Z} from biblio.paper X, X.title Y, X.date Z {row: {title: “The spiral DNA”, date: 1956}, {title: “Origin”, date: 1848}, {title: “Kapital”, date: 1860}} select row: {author: Y, date: Z} from biblio.book X, X.author Y, X.date Z

  44. Which authors have written a book or a paper in 1992? select author: X from biblio.(book | paper) Y, Y.author X where Y.date = 1992

  45. Which authors have written a book together with Jones? select author: X from biblio.book Y, Y.author X where “Jones” in Y.author

  46. Which authors have written both a book and a paper? select author: A from biblio.book B, biblio.paper P, B.author A where B.author = P.author select author: A1 from biblio.book B, biblio.paper P, B.author A1, P.author A2 where A1 = A2
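
The second formulation above compares individual author values, one from a book and one from a paper. A Python sketch of that evaluation follows. The publications and author names are made up for illustration, and duplicates are removed at the end because the same author can be found through several book and paper pairs.

```python
# Hypothetical publications below biblio.
biblio = [
    ("book", [("author", "Jones"), ("author", "Smith"), ("title", "B1")]),
    ("paper", [("author", "Smith"), ("title", "P1")]),
    ("paper", [("author", "Lee"), ("author", "Smith"), ("title", "P2")]),
]

def children(node, label):
    """Values reachable from node over edges carrying label."""
    return [v for (l, v) in node if l == label]

# select author: A1 from biblio.book B, biblio.paper P,
#        B.author A1, P.author A2 where A1 = A2
matches = [
    a1
    for b in children(biblio, "book")
    for p in children(biblio, "paper")
    for a1 in children(b, "author")
    for a2 in children(p, "author")
    if a1 == a2
]

# dict.fromkeys removes duplicates while keeping first-seen order.
result = [("author", a) for a in dict.fromkeys(matches)]
print(result)
```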

  47. List all publications in 1992, their types, and titles. select publ: {type: L, title: T} from biblio.L X, X.title T where X.date = 1992

  48. <!DOCTYPE db [ <!ELEMENT db (r1*, r2*, r3*)> <!ELEMENT r1 (a, b)> <!ELEMENT r2 (c, d)> <!ELEMENT r3 (a, c)> <!ELEMENT a (#PCDATA)> <!ELEMENT b (#PCDATA)> <!ELEMENT c (#PCDATA)> <!ELEMENT d (#PCDATA)> ]> <db> <r1> <a> a1 </a> <b> b1 </b> </r1> <r1> <a> a2 </a> <b> b2 </b> </r1> <r2> <c> a1 </c> <d> b1 </d> </r2> <r2> <c> c2 </c> <d> d2 </d> </r2> <r3> <a> a1 </a> <c> b1 </c> </r3> </db>
