XML and the Semi-Structured Data Model
Motivation
• We have seen that relational databases are very convenient to query. However:
  • A great deal of data is not stored in relational databases.
  • Perhaps the most widely accessed database is the web, and it certainly isn’t a relational database.
Querying the Web
• The web can be queried using a search engine. However, we can’t ask questions like:
  • What is the weather in Zanzibar today?
  • What is the lowest price for which a Jaguar is sold on the web?
• Problems:
  • There are no facilities for asking complex questions, such as questions that aggregate data
  • Words have overloaded meanings (Jaguar)
Understanding the Web
• In order to query the web, we must be able to understand it.
• Two computer science approaches:
  • Artificial Intelligence Approach
  • Database Approach
Artificial Intelligence Approach
“The web is unstructured and we must deal with it”
• Use machine learning techniques to understand the web.
• Example: To understand the word “Jaguar”, check whether it appears on a page with the words car or automobile, or instead with the words jungle and Africa
• Problem: Such techniques tend to be inexact and make a large percentage of mistakes
Database Approach
“The web is unstructured and we will structure it”
• Sometimes very difficult problems can be solved easily by enforcing a standard
• Encourage the use of XML as a standard for data exchange on the web
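Example XML Document

<?xml version="1.0"?>
<transaction>
  <account>89-344</account>
  <buy shares = "100">
    <ticker exch = "NASDAQ">WEBM</ticker>
  </buy>
  <sell shares = "30">
    <ticker exch = "NYSE">GE</ticker>
  </sell>
</transaction>

Parts labeled on the slide: opening tag, closing tag, element, attribute name, attribute value. For example, <account> is an opening tag, </account> is a closing tag, and the two together with their content form an element. In <buy shares = "100">, shares is an attribute name and 100 is its attribute value.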
XML Representation of a Table

<?xml version="1.0"?>
<ROWSET>
  <ROW num = "1">
    <ENAME>KING</ENAME>
    <SAL>5000</SAL>
  </ROW>
  <ROW num = "2">
    <ENAME>SCOTT</ENAME>
    <SAL>3000</SAL>
  </ROW>
</ROWSET>
Very Unstructured XML

<?xml version="1.0"?>
<DamageReport>
  The insured’s <Vehicle Make = "Volks">Beetle</Vehicle> broke through
  the guard rail and plummeted into the ravine. The cause was determined
  to be <Cause>faulty brakes</Cause>. Amazingly there were no casualties.
</DamageReport>
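
Text and elements can be interleaved freely, as in the damage report above. The following is a minimal sketch, using Python's standard xml.etree.ElementTree module, of how a program sees such mixed content. The report text is shortened and the helper name mixed_content is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Shortened version of the damage report; the wording is abbreviated for the example.
doc = (
    '<DamageReport>The insured\'s <Vehicle Make="Volks">Beetle</Vehicle> '
    'broke through the guard rail. The cause was <Cause>faulty brakes</Cause>. '
    'No casualties.</DamageReport>'
)

root = ET.fromstring(doc)

def mixed_content(elem):
    """Return text runs and child element names in document order."""
    parts = []
    if elem.text:
        parts.append(("text", elem.text))
    for child in elem:
        parts.append(("element", child.tag))
        # ElementTree stores text that follows a child in the child's .tail
        if child.tail:
            parts.append(("text", child.tail))
    return parts

parts = mixed_content(root)
for kind, value in parts:
    print(kind, repr(value))
```

The report element contains three runs of plain text separated by two child elements, which is exactly what makes the document "very unstructured" compared with the table representation.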
XML Vs. HTML
• XML and HTML are siblings: both are derived from SGML.
• HTML has a fixed set of tag and attribute names, each associated with a specific meaning
• XML allows any tag and attribute names; these have no predefined meaning
• HTML is used to specify visual presentation
• XML is used to describe the meaning of the data
Rule 1 – XML Declaration
• An XML document should begin with an XML declaration. A simple declaration is:

<?xml version="1.0"?>

• Other things can be specified in the declaration, such as the character encoding.
Rule 2 – Document Element
• Use exactly one top-level document element.

Legal:

<?xml version="1.0"?>
<Question> This is legal </Question>

Illegal (two top-level elements):

<?xml version="1.0"?>
<Question> Is this legal? </Question>
<Answer> No. </Answer>
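Rule 3 – Match Opening and Closing Tags
• Every opening tag must have a matching closing tag, and elements must be properly nested.
• XML is case sensitive, so the names in the opening and closing tags must match exactly.
• The following examples are both illegal:

<Question> Is this legal? </QUESTION>

<Question> <B> Is this legal? </Question> </B>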
Rule 4 – Comments
• Comments are delimited by <!-- and -->. Comments can’t appear as attribute values or within a tag.

Legal:

<!-- This is a legal comment -->
<Question> Why is this illegal <!-- This is a legal comment --> </Question>

Illegal (comment inside a tag):

<Question <!-- This is illegal -->> Why is this illegal </Question>
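Rule 5 – Element Names
• Element and attribute names must begin with a letter or an underscore. The remaining characters may be letters, digits, hyphens, underscores, or periods. Names cannot contain spaces.
• Legal names:
  <_legal>
  <This-is-OK>
• Illegal names:
  <2-Part-Question>  (starts with a digit)
  <Two Part Question>  (contains spaces)
  <Question 4You = "Yes">  (attribute name starts with a digit)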
Rule 6 – Attribute Values
• Attribute values:
  • go in opening tags
  • should be enclosed by matching quotes (' or ")
  • should contain only text, not tags

Legal example:

<Question Poster = "Yitzchak">Do you like XML?</Question>
<Answer Poster = 'Yaakov'>I do.</Answer>
Rule 6 – Continued

Illegal examples:

Mismatched quotes:
<Question Poster = "Yitzchak'>Do you like XML?</Question>

Attribute in a closing tag:
<Question>Do you like XML?</Question Poster = "Yitzchak">

Tags inside an attribute value:
<Question Poster = "<first>Yitzchak</first>">Do you like XML?</Question>
Rule 7 – Empty Elements
• Empty elements are elements that do not contain text or nested elements. They can be written in a compact syntax:

<Person First = "Shmuel" Last = "Levy"></Person>

is the same as

<Person First = "Shmuel" Last = "Levy" />
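
The rules above can be checked mechanically, because any XML parser rejects a document that breaks them. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample labels are made up for this illustration, and the sample documents are taken from the rule slides.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    # Rule 2: exactly one document element
    "one document element": "<Question>This is legal</Question>",
    "two top-level elements": "<Question>Is this legal?</Question><Answer>No.</Answer>",
    # Rule 3: tags must match (case sensitive) and nest properly
    "case mismatch": "<Question>Is this legal?</QUESTION>",
    "bad nesting": "<Question><B>Is this legal?</Question></B>",
    # Rule 5: names cannot start with a digit
    "name starts with digit": "<2-Part-Question/>",
    # Rule 6: attribute values need matching quotes
    "mismatched quotes": "<Question Poster=\"Yitzchak'>Do you like XML?</Question>",
    # Rule 7: compact syntax for empty elements
    "empty element": '<Person First="Shmuel" Last="Levy"/>',
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'rejected'}")
```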
An Example

<?xml version="1.0"?>
<transaction>
  <account>89-344</account>
  <buy shares = "100">
    <ticker exch = "NASDAQ">WEBM</ticker>
  </buy>
  <sell shares = "30">
    <ticker exch = "NYSE">GE</ticker>
  </sell>
</transaction>
Corresponding Tree

transaction
  account
    89-344
  buy
    shares
      100
    ticker
      exch
        NASDAQ
      WEBM
  sell
    shares
      30
    ticker
      exch
        NYSE
      GE
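
The tree view can be reproduced programmatically. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the function name outline and the "@name = value" notation for attributes are choices made for this illustration, not part of XML.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<transaction>
  <account>89-344</account>
  <buy shares="100"><ticker exch="NASDAQ">WEBM</ticker></buy>
  <sell shares="30"><ticker exch="NYSE">GE</ticker></sell>
</transaction>"""

def outline(elem, depth=0):
    """Return the element, its attributes, text, and children as indented lines."""
    lines = ["  " * depth + elem.tag]
    for name, value in elem.attrib.items():
        lines.append("  " * (depth + 1) + "@" + name + " = " + value)
    text = (elem.text or "").strip()
    if text:
        lines.append("  " * (depth + 1) + '"' + text + '"')
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(ET.fromstring(doc))
print("\n".join(tree_lines))
```

Each element becomes a node, and its attributes, text, and nested elements become its children, one indentation level deeper.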
Using XML
• Querying XML: There are query languages that query XML and return XML. Examples: XQuery, XPath, SQL4X
• Displaying XML: An XML document can have an associated style sheet that specifies how the document should be displayed or translated to HTML. Examples: CSS, XSL
Namespaces
• Namespaces are used to attach an accepted meaning to a set of tags.
• Syntax for defining a namespace:

<SomeElement xmlns:prefixname="namespaceURL">

• The namespace will be recognized within the SomeElement element.
Example Namespace

<irs:Form id="1040" xmlns:irs="http://www.irs.gov">
  <irs:Name>Tina Wells</irs:Name>
  <PhoneNumber>03-5655666</PhoneNumber>
</irs:Form>

• In order for the namespace to be recognized in all elements, the declaration should be in the document element
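
A namespace-aware parser keeps track of which namespace each tag belongs to. The following is a minimal sketch using Python's standard xml.etree.ElementTree module, which reports qualified names in the form {namespace}localname. The document is the IRS form example, closed with a matching </irs:Form> tag.

```python
import xml.etree.ElementTree as ET

doc = """<irs:Form id="1040" xmlns:irs="http://www.irs.gov">
  <irs:Name>Tina Wells</irs:Name>
  <PhoneNumber>03-5655666</PhoneNumber>
</irs:Form>"""

root = ET.fromstring(doc)

# Prefixed tags are expanded to {namespace}localname; unprefixed tags are left as-is.
tags = [root.tag] + [child.tag for child in root]
print(tags)

# Lookups take a prefix-to-namespace mapping; the prefix used here is local to this code.
ns = {"irs": "http://www.irs.gov"}
name = root.find("irs:Name", ns).text
print(name)
```

Only Form and Name are in the IRS namespace. PhoneNumber has no prefix and no default namespace applies, so it belongs to no namespace.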
What are XSQL Pages?
• XSQL pages are XML documents that have SQL queries embedded in them.
• When a user requests to view an XSQL page, the web server:
  • Dynamically evaluates the embedded queries
  • Translates the query results into XML
  • Inserts the results in the proper places in the document
  • Transforms the result to HTML if a stylesheet is given
A Simple Example

<?xml version="1.0"?>
<xsql:query connection="scott" xmlns:xsql="urn:oracle-xsql">
  SELECT sname FROM Sailors
</xsql:query>

• You should specify the connection and the namespace on the document element
Page Seen in Browser

<?xml version="1.0"?>
<ROWSET>
  <ROW num = "1">
    <SNAME>Rusty</SNAME>
  </ROW>
  <ROW num = "2">
    <SNAME>Justin</SNAME>
  </ROW>
</ROWSET>

• A ROWSET element encloses the query result
• Each ROW element encloses one row of the result
• Each column value in a row is enclosed in a tag named after the column
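
The mapping from a query result to ROWSET and ROW elements can be sketched in a few lines. This is an illustration only, not the XSQL processor: it uses Python's sqlite3 module as a stand-in for the Oracle connection, and the function name rows_to_rowset and the sample sid values are made up.

```python
import sqlite3
import xml.etree.ElementTree as ET

def rows_to_rowset(cursor):
    """Wrap each result row in a ROW element and each column value in a tag named after the column."""
    rowset = ET.Element("ROWSET")
    columns = [description[0].upper() for description in cursor.description]
    for num, row in enumerate(cursor, start=1):
        row_elem = ET.SubElement(rowset, "ROW", num=str(num))
        for column, value in zip(columns, row):
            ET.SubElement(row_elem, column).text = str(value)
    return rowset

# In-memory stand-in for the Sailors table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?)", [(13, "Rusty"), (14, "Justin")])

cursor = conn.execute("SELECT sname FROM Sailors ORDER BY sid")
xml_text = ET.tostring(rows_to_rowset(cursor), encoding="unicode")
print(xml_text)
```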
Another Example

<?xml version="1.0"?>
<RESULTS connection="scott" xmlns:xsql="urn:oracle-xsql">
  Here is something interesting:
  <xsql:query>
    SELECT sname, age + rating as ra
    FROM Sailors
    WHERE sid = 13
  </xsql:query>
</RESULTS>
Resulting Document

<?xml version="1.0"?>
<RESULTS>
  Here is something interesting:
  <ROWSET>
    <ROW num = "1">
      <SNAME>Rusty</SNAME>
      <RA>55</RA>
    </ROW>
  </ROWSET>
</RESULTS>
Using Parameters
• Your page can use parameters. The value of a parameter param is the first of the following that is available, checked in this order:
  • The value of the URL parameter param, if supplied
  • The value of the HTTP session object param, if supplied
  • The value of the closest ancestor’s attribute named param, if present
  • An empty string
Example with Parameters

<?xml version="1.0"?>
<xsql:query connection="scott" xmlns:xsql="urn:oracle-xsql" sname="Joe">
  SELECT * FROM Sailors
  WHERE sname = '{@sname}'
</xsql:query>
Evaluating the Query
• Suppose the XSQL document is at: http://cs.huji.ac.il/~db/query1.xsql
• Then requesting the URL http://cs.huji.ac.il/~db/query1.xsql?sname=Jim will return all the details of Jim.
• Requesting http://cs.huji.ac.il/~db/query1.xsql will return all the details of Joe (the default value)
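
The lookup order for parameters, and the way {@param} references are replaced in the query text, can be sketched as follows. This is a rough model written for illustration, not the XSQL implementation: the function names resolve_param and expand are hypothetical, and ancestor attributes are modeled as a list of dictionaries ordered from the query element outward.

```python
import re
from urllib.parse import urlparse, parse_qs

def resolve_param(name, url_params, session, ancestor_attrs):
    """Return the first available value: URL parameter, session object, ancestor attribute, else ''."""
    if name in url_params:
        return url_params[name]
    if name in session:
        return session[name]
    for attrs in ancestor_attrs:  # closest element first
        if name in attrs:
            return attrs[name]
    return ""

def expand(template, url, session, ancestor_attrs):
    """Replace every {@name} reference in the query text with its resolved value."""
    query = parse_qs(urlparse(url).query)
    url_params = {key: values[0] for key, values in query.items()}
    return re.sub(
        r"\{@([\w-]+)\}",
        lambda match: resolve_param(match.group(1), url_params, session, ancestor_attrs),
        template,
    )

template = "SELECT * FROM Sailors WHERE sname = '{@sname}'"
attrs = [{"connection": "scott", "sname": "Joe"}]  # attributes of the xsql:query element

with_jim = expand(template, "http://cs.huji.ac.il/~db/query1.xsql?sname=Jim", {}, attrs)
with_default = expand(template, "http://cs.huji.ac.il/~db/query1.xsql", {}, attrs)
print(with_jim)
print(with_default)
```

Because the value is pasted into the query text as-is, the result is only as trustworthy as the parameter source.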
A Strange Example

<?xml version="1.0"?>
<xsql:query connection="scott" xmlns:xsql="urn:oracle-xsql"
            select="*" where="1=1" order="1">
  SELECT {@select}
  FROM {@from}
  WHERE {@where}
  ORDER BY {@order}
</xsql:query>
Customizing Results
• The query tag can have different attributes that customize the query results. Here are some of the important options:
  • max-rows: The maximum number of rows returned
  • skip-rows: The number of rows to skip before returning rows
  • rowset-element: The name of the rowset element
  • row-element: The name of the row element
Customizing Results

<?xml version="1.0"?>
<xsql:query connection="scott" xmlns:xsql="urn:oracle-xsql"
            skip="0" max-rows="2" skip-rows="{@skip}">
  SELECT * FROM Program ORDER BY url
</xsql:query>

• By calling the same page with different values for skip, we can page through the programs two at a time
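
The combined effect of skip-rows and max-rows is a sliding window over the ordered result. The following is a minimal sketch of that windowing in Python; the function name page and the sample URLs are made up for this illustration.

```python
from itertools import islice

def page(rows, skip_rows=0, max_rows=None):
    """Skip the first skip_rows rows, then return at most max_rows rows."""
    stop = None if max_rows is None else skip_rows + max_rows
    return list(islice(rows, skip_rows, stop))

# Hypothetical contents of the Program table, already ordered by url.
programs = ["a.example/p1", "b.example/p2", "c.example/p3", "d.example/p4", "e.example/p5"]

first_page = page(programs, skip_rows=0, max_rows=2)
second_page = page(programs, skip_rows=2, max_rows=2)
last_page = page(programs, skip_rows=4, max_rows=2)
print(first_page, second_page, last_page)
```

Requesting the page with skip=0, skip=2, and skip=4 in turn walks through the whole table, provided the ORDER BY clause keeps the row order stable between requests.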
Notes
• An XSQL document can have many queries.
• The queries can appear within arbitrary XML tags.
• We can produce XML that has a more nested structure using the CURSOR function...
Reminder: Subqueries in the SELECT Clause
• Subqueries in the SELECT clause must return a single value. What do we do if we want, for each boat, all the sailors who reserved that boat?
• We want each bid to be associated with a table of Sailors data!
Using the CURSOR Function

<?xml version="1.0"?>
<xsql:query connection="scott" xmlns:xsql="urn:oracle-xsql">
  SELECT bid,
         CURSOR(SELECT S.sid, S.sname
                FROM Sailors S, Reserves R
                WHERE S.sid = R.sid AND R.bid = B.bid) AS Reservers
  FROM Boats B
</xsql:query>
Note that the alias of the CURSOR subquery (Reservers) is used to name the inner elements, in place of inner ROWSET and ROW tags.

<?xml version="1.0"?>
<ROWSET>
  <ROW num = "1">
    <BID>113</BID>
    <RESERVERS>
      <RESERVERS_ROW num = "1">
        <SID> 13 </SID>
        <SNAME> Joe </SNAME>
      </RESERVERS_ROW>
      <RESERVERS_ROW num = "2">
        ....
      </RESERVERS_ROW>
    </RESERVERS>
  </ROW>
</ROWSET>
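
The nesting produced by CURSOR can be imitated by running the inner query once per outer row. The following is a sketch only: it uses Python's sqlite3 module as a stand-in for Oracle (SQLite has no CURSOR function), and the table contents are invented for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors (sid INTEGER, sname TEXT);
CREATE TABLE Boats (bid INTEGER);
CREATE TABLE Reserves (sid INTEGER, bid INTEGER);
INSERT INTO Sailors VALUES (13, 'Joe'), (14, 'Ann');
INSERT INTO Boats VALUES (113), (114);
INSERT INTO Reserves VALUES (13, 113), (14, 113);
""")

rowset = ET.Element("ROWSET")
boats = conn.execute("SELECT bid FROM Boats ORDER BY bid").fetchall()
for num, (bid,) in enumerate(boats, start=1):
    row = ET.SubElement(rowset, "ROW", num=str(num))
    ET.SubElement(row, "BID").text = str(bid)
    # The alias of the inner query names the nested element and its rows.
    reservers = ET.SubElement(row, "RESERVERS")
    inner = conn.execute(
        "SELECT S.sid, S.sname FROM Sailors S, Reserves R "
        "WHERE S.sid = R.sid AND R.bid = ? ORDER BY S.sid",
        (bid,),
    )
    for inner_num, (sid, sname) in enumerate(inner, start=1):
        inner_row = ET.SubElement(reservers, "RESERVERS_ROW", num=str(inner_num))
        ET.SubElement(inner_row, "SID").text = str(sid)
        ET.SubElement(inner_row, "SNAME").text = sname

print(ET.tostring(rowset, encoding="unicode"))
```

Each boat gets its own table of sailors, which is exactly what a scalar subquery in the SELECT clause cannot return.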
Setting Page Level Parameters
• The following statement defines a parameter pname. The value of pname is the value in the first column of the first row of the query result.
• The parameter pname will be recognized throughout the page.

<xsql:set-page-param name="pname">
  SELECT Statement
</xsql:set-page-param>
Example

<?xml version="1.0"?>
<page connection="scott" xmlns:xsql="urn:oracle-xsql">
  <xsql:set-page-param name="num-stories">
    SELECT headings_num
    FROM user_prefs
    WHERE userid={@user}
  </xsql:set-page-param>
  <xsql:query max-rows="{@num-stories}">
    SELECT title, url
    FROM latest_news
  </xsql:query>
</page>
Another Way to Define a Page Level Parameter
• Page level parameters can also be set with the statement:

<xsql:set-page-param name="pname" value="val"/>

• For example:

<xsql:set-page-param name="num-stories" value="10"/>
Additional Options
• The set-page-param element can have the following attributes:
  • only-if-unset: If the value is “yes”, the parameter is set only if it does not already have a value
  • ignore-empty-value: If the value is “yes”, the parameter is set only if the new value is not an empty string
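
The effect of the two options can be sketched as plain conditional logic. This is an illustrative model only: the function name set_page_param and the dictionary of page parameters are hypothetical, and it assumes that "has no value" means the parameter is either absent or an empty string.

```python
def set_page_param(params, name, value, only_if_unset=False, ignore_empty_value=False):
    """Set params[name] = value unless one of the two options says to skip the assignment."""
    # Assumption: a parameter that is missing or empty counts as "unset".
    already_set = params.get(name, "") != ""
    if only_if_unset and already_set:
        return params
    if ignore_empty_value and value == "":
        return params
    params[name] = value
    return params

params = {}
set_page_param(params, "num-stories", "10")
set_page_param(params, "num-stories", "25", only_if_unset=True)      # skipped: already has a value
set_page_param(params, "num-stories", "", ignore_empty_value=True)   # skipped: new value is empty
print(params)
```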
Setting Cookie Values
• The following statement defines a parameter pname. The value of pname is the value in the first column of the first row of the query result.
• The parameter pname will be recognized until the cookie expires.

<xsql:set-cookie name="pname">
  SELECT Statement
</xsql:set-cookie>
Additional Attributes for Set-Cookie
• The set-cookie element can have the following attributes:
  • max-age: The number of seconds before the cookie expires (by default, the cookie expires when the user exits the current browser instance)
  • only-if-unset
  • ignore-empty-value
Example

<?xml version="1.0"?>
<page connection="scott" xmlns:xsql="urn:oracle-xsql">
  <xsql:set-cookie name="siteuser" max-age="31536000"
                   only-if-unset="yes" ignore-empty-value="yes">
    SELECT username
    FROM site_users
    WHERE username='{@username}' and password='{@password}'
  </xsql:set-cookie>
  <!-- Other Actions Here -->
</page>