Access to the whole document, able to reconstruct the original. Declarative mappings. Persistent XML via your favorite programming language. Full-Fidelity Flexible Object-Oriented XML Access. James F. Terwilliger, Philip A. Bernstein, Sergey Melnik. LRX: LINQ over Relations and XML.
LRX: LINQ over Relations and XML
• Language-Integrated Query (LINQ): object-oriented access to stored data
• Client side: native programming language, object-based queries and updates, static type checking
• Store side: SQL, XQuery
• Classes map to tables, but classes to XML: ?
• ORMs do not handle XML!
Problem
• Object query (LINQ):
    from o in DB.JobCandidates
    where o.Resume.Skills.Contains("production")
    select o.Resume.Name.Name_Last;
• Corresponding store-side query (SQL with embedded XQuery):
    WITH XMLNAMESPACES ('http://.../adventure-works/Resume' AS r)
    SELECT [Extent1].[Resume].value(N'/*[1]/r:Name/r:Name.Last', N'nvarchar(max)') AS [C1]
    FROM [HumanResources].[JobCandidate] AS [Extent1]
    WHERE [Extent1].[Resume].exist(N'/*[1]/r:Skills[contains(., "production")]') = CAST(1 AS bit)
• Currently common: use an O-R mapper, bring the XML data to the client as a string, and process it on the client
• Alternative: load the data into XML-like objects and run XPath through an API
Inspiration: O-R Mapping in Entity Framework (Melnik et al. 2007)
• Client side (objects): classes
• Store side (relations): tables
• Mapping specified at schema level: Q1 = Q1’, Q2 = Q2’, Q3 = Q3’, … (select-project only)
• Mapping compiled to views:
  • Query view VQ, used for object queries (LINQ)
  • Update view VU and merge view VM, used for object updates
• Preserve fidelity of the source data
EF Example
• Client side (classes):
    Person: id, name, title
• Store side (relations):
    Person1(id integer PRIMARY KEY, name varchar(50))
    Person2(id integer PRIMARY KEY, title varchar(50), details varchar(2000))
• Mapping:
    π_{id, name}(Person) = π_{id, name}(Person1)
    π_{id, title}(Person) = π_{id, title}(Person2)
• Compiled view:
    Person = π_{id, name, title}(Person1 ⋈ Person2)
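The compiled view on this slide can be checked on toy data. The following is a minimal Python sketch, not Entity Framework code: the sample rows, `project`, and `query_view` are assumptions introduced for illustration, and the join is taken to be on `id`.

```python
# Hypothetical sample rows; the slide gives only the table schemas.
person1 = [{"id": 1, "name": "Sue"}, {"id": 2, "name": "Bob"}]
person2 = [
    {"id": 1, "title": "Dr.", "details": "..."},
    {"id": 2, "title": "Mr.", "details": "..."},
]


def project(rows, cols):
    """Relational projection with set semantics: keep only `cols`."""
    return {tuple(row[c] for c in cols) for row in rows}


def query_view(p1, p2):
    """Person = projection on (id, name, title) of Person1 joined with Person2 on id."""
    titles = {row["id"]: row["title"] for row in p2}
    return [
        {"id": row["id"], "name": row["name"], "title": titles[row["id"]]}
        for row in p1
        if row["id"] in titles
    ]


person = query_view(person1, person2)

# Both mapping fragments hold for the reconstructed Person extent.
print(project(person, ["id", "name"]) == project(person1, ["id", "name"]))
print(project(person, ["id", "title"]) == project(person2, ["id", "title"]))
```

The two checks mirror the two mapping equations: projecting the reconstructed objects back onto each fragment should give exactly what the corresponding table stores.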
Extending EF for XML: Design Requirements
• Map classes to XML using a mechanism similar to the one that maps classes to tables
• Schema-level mapping language
• Compile into query and update procedures
• In-place updates to maintain fidelity of source
• BONUS: Full-fidelity object representation
Challenges and Related Work
• Express O-X mappings declaratively
  • Some existing tools are canonical (not flexible)
  • E.g., LINQ-to-XSD
• Translate mappings into bidirectional procedures
  • Some existing tools are unidirectional
  • E.g., Clio
• Translate client queries and updates into server analogs
  • Some existing tools are state-based
  • E.g., Lenses, Bidirectional XQuery
Outline
• Introduction
• Mappings
• Mapping compilation and query translation
• Full fidelity and update translation
• Performance
• Conclusion
Running Example
• Store side (XML Schema):
    type Contact:
      Address?
      sequence (1..5)
        phone_type
        number*
      Address*
      xsd:choice
        Name
          @Prefix
          First_Name
          Last_Name
        BusinessInfo
          Business_Name
• Example document:
    <contact>
      <Address> … </Address>
      <phone_type>Home</phone_type>
      <number>555-5123</number>
      <phone_type>Cell</phone_type>
      <phone_type>Work</phone_type>
      <number>555-5234</number>
      <number>555-5345</number>
      <Address> … </Address>
      <Address> … </Address>
      <Name Prefix="Ms.">
        <First_Name>Sue</First_Name>
        <Last_Name>Wall</Last_Name>
      </Name>
    </contact>
Component Designator (CD)
• Store side (XML Schema): type Contact, as in the Running Example
• CD expressions address individual schema components:
    /type::ns:Contact/model::sequence/schemaElement::ns:Address[1]   (the leading Address?)
    /type::ns:Contact/model::sequence/model::sequence[1]   (the nested sequence (1..5))
    /type::ns:Contact/model::sequence/schemaElement::ns:Address[2]   (the trailing Address*)
    schemaElement::ns:Name/type::0/model::sequence[1]/schemaElement::First_Name[1]
• Shorthand for the last expression: Name/First_Name
Mappings and Flexibility: Intuition
• Store side (XML Schema): type Contact, as in the Running Example
• Client side (objects):
    Contact: Address, Phone1 … Phone5, ContactType (P/B), PersonName, BusinessInfo
    PhoneInfo: type, Numbers
    PersonInfo: Prefix, First_Name, Last_Name
• Mapping labels on the diagram:
    Address: Address[1], 1
    Phone1: model::sequence, 1
    Phone5: model::sequence, 5
    PersonName: ContactType = P
Alternative Representation
• Store side (XML Schema): type Contact, as in the Running Example
• Client side (objects):
    Contact: Address, Phone[5]
    Person: Prefix, First_Name, Last_Name
• Mapping labels on the diagram:
    Phone[5]: model::sequence, all
    Person: EXISTS
    Prefix: Name/@Prefix, 1
    Last_Name: Name/Last_Name, 1
Mappings
• A type mapping:
  • Associates one client-side class with one XML type
  • Assigns to each class member a CD expression
• “Mapping fragment”:
  • Essentially maps to a schema element
  • Might include a position reference if mapping into a list
• Allows conditions on either side:
  • Client side can have equality conditions on values
  • XML side can have equality conditions on values, tag names, or existence of elements
Compiling CD Expressions: UPA
• UPA: Unique Particle Attribution, the XML Schema constraint that each element of a valid document is attributable to exactly one particle of the content model
• Store side (XML Schema) and example document: as in the Running Example
Compiling CD Expressions
• Store side (XML Schema): type Contact, as in the Running Example
• A CD expression compiles to a query that retrieves all elements matching the designated schema element
• /type::ns:T/model::sequence/schemaElement::ns:Address[1]
    compiles to /Address[. << ../phone_type[1]]
• Name/First_Name
    compiles to Name/First_Name
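To make the compiled query `/Address[. << ../phone_type[1]]` concrete, here is a minimal Python sketch that emulates it with the standard library. `xml.etree.ElementTree` does not support the XQuery node-order operator `<<`, so document order is emulated with child positions; the function name and the sample document are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def leading_addresses(contact):
    """Emulate /Address[. << ../phone_type[1]]: Address children that
    precede the first phone_type child in document order."""
    children = list(contact)
    first_phone = next(
        (i for i, child in enumerate(children) if child.tag == "phone_type"),
        None,
    )
    if first_phone is None:
        # With no phone_type the XQuery comparison yields no matches.
        return []
    return [c for c in children[:first_phone] if c.tag == "Address"]


doc = ET.fromstring(
    "<contact>"
    "<Address>A</Address>"
    "<phone_type>Home</phone_type><number>555-5123</number>"
    "<Address>B</Address><Address>C</Address>"
    "</contact>"
)
print([a.text for a in leading_addresses(doc)])
```

Only the Address that belongs to the optional leading particle is returned; the two trailing Address elements belong to a different particle and are skipped.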
Queries and Query Translation: Intuition
• Object query:
    from p in ObjectContext.People
    from e in p.resume.employers
    where e.address.city.Contains("Port")
    select new {pname = p.name, ename = e.name}
• Translated query:
    from p in ObjectContext.People
    from e in SEQUENCE(p.resume, "/resume/employers")
    where TEST(e, "/resume/address/city[contains(., 'Port')]")
    select new {pname = p.name, ename = VALUE(e, "/name", string)}
Queries and Query Translation: Basics
• Member access: Foo.bar1.bar2.bar3
  • The prefix Foo.bar1.bar2 has type T and translates to PH(Q)
  • Q’: compiled query for the CD expression of T.bar3
  • The full expression translates to PH’(Q/Q’)
• PH and PH’ in {VALUE, QUERY, SEQUENCE, TEST}:
  • VALUE: Run query, cast result as primitive type
  • QUERY: Run query
  • SEQUENCE: Run query, iterate over results
  • TEST: Run query, return boolean indicating if result is non-empty
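The four placeholder functions can be sketched client-side in Python to show what each one returns. This is an illustrative approximation under stated assumptions, not the LRX implementation: in LRX the placeholders are translated to XQuery on the server, whereas here they run over `xml.etree.ElementTree`, which supports only a small XPath subset (element paths, no attribute selection). The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET


def QUERY(node, path):
    """Run the query and return the matching elements."""
    return node.findall(path)


def SEQUENCE(node, path):
    """Run the query and iterate over the results."""
    yield from node.findall(path)


def TEST(node, path):
    """Run the query; True if the result is non-empty."""
    return len(node.findall(path)) > 0


def VALUE(node, path, cast):
    """Run the query and cast the first result's text to a primitive type."""
    hits = node.findall(path)
    return cast(hits[0].text) if hits else None


contact = ET.fromstring(
    "<contact>"
    "<phone_type>Home</phone_type>"
    "<number>555-5123</number><number>555-5234</number>"
    '<Name Prefix="Ms."><First_Name>Sue</First_Name>'
    "<Last_Name>Wall</Last_Name></Name>"
    "</contact>"
)

print(VALUE(contact, "./Name/First_Name", str))
print(TEST(contact, "./Name"), TEST(contact, "./BusinessInfo"))
print([n.text for n in SEQUENCE(contact, "./number")])
```

Composition follows the slide: if a prefix translates to a placeholder over path Q and the next member compiles to Q’, the result is a placeholder over the concatenated path, for example `VALUE(contact, "./Name" + "/First_Name", str)`.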
Type Translation
• Store side (XML Schema): type Contact, as in the Running Example
• Client side (objects):
    Contact: Address, Phone[5]
    Person: Prefix, First_Name, Last_Name (mapped with EXISTS)
• Type condition on the client:
    where obj is Person
  translates to an existence test on the store:
    where TEST(obj, "./Name")
Full Fidelity
• Client class:
    class Contact { … AddressType Address; … }
• Stored document:
    <contact>
      <!-- added by Tom -->
      <Address source="corporate"> … </Address>
      …
      <!-- Need to review addresses -->
    </contact>
• The comments and the source attribute are not part of the schema for the document
Delta Representation
• Stored document:
    <contact>
      <!-- added by Tom -->
      <Address source="corporate"> … </Address>
      …
      <!-- Need to review addresses -->
    </contact>
• Object without delta:
    contactObject
      Address = new Address(…)
      Phone[1] = …
• Object with delta:
    contactObject
      BeforeEnd: "Need to review addresses" (Comment)
      Address = new Address(…)
        Before: "added by Tom" (Comment)
        Start: source="corporate" (Attribute)
      Phone[1] = …
• Each mappable location (anchor) is a key into the delta
• Unmapped data becomes associated with an anchor with a relative position reference
• Anchors stored in document order
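A delta of this shape can be sketched in Python as an ordered dictionary keyed by anchor. This is a hedged illustration only: the function `build_delta`, the anchor naming scheme (`Address[1]`), and the tuple layout are assumptions, and only comments and unmapped attributes of direct children are handled.

```python
import xml.etree.ElementTree as ET


def build_delta(xml_text, mapped_attrs=frozenset()):
    """Collect unmapped comments and attributes, keyed by anchor.

    Returns {anchor: [(position, kind, value), ...]}; dict insertion
    order keeps the anchors in document order.
    """
    # insert_comments=True keeps comment nodes in the tree (Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    delta, pending, counts = {}, [], {}
    for child in root:
        if child.tag is ET.Comment:
            pending.append(child.text.strip())
            continue
        counts[child.tag] = counts.get(child.tag, 0) + 1
        anchor = f"{child.tag}[{counts[child.tag]}]"
        # Comments seen since the previous element sit before this anchor.
        entries = [("Before", "Comment", text) for text in pending]
        pending = []
        for name, value in child.attrib.items():
            if (child.tag, name) not in mapped_attrs:
                entries.append(("Start", "Attribute", f'{name}="{value}"'))
        if entries:
            delta[anchor] = entries
    if pending:
        # Trailing comments attach to the end of the parent element.
        delta[root.tag] = [("BeforeEnd", "Comment", text) for text in pending]
    return delta


doc = (
    "<contact><!-- added by Tom -->"
    '<Address source="corporate">1 Main St</Address>'
    "<phone_type>Home</phone_type>"
    "<!-- Need to review addresses --></contact>"
)
for anchor, entries in build_delta(doc).items():
    print(anchor, entries)
```

Elements with nothing unmapped around them (here, `phone_type`) get no delta entry, so the structure stays small when documents follow the schema closely.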
Updates, AKA: What Does Full Fidelity Get Us?
• Re-serialization is always an option
  • Repackage the entire XML document and overwrite
• In-place updates may be substantially faster
  • Oracle, SQL Server, DB2 support in-place updates
• In-place updates based on XPath/XQuery
  • Insert new node, replace existing node, delete node
• Inserts are relative to an existing node in the tree
  • After, before, as first, as last
Relative Location
• Store side (XML Schema): type Contact, as in the Running Example
• Client side (objects):
    Contact: Address, Phone[5]
    Person: Prefix, First_Name, Last_Name
• Example updates:
    First_Name = Bob
    Phone[1] = new PhoneType(…)
• After? Before? As first?
  • Correct location depends on pre-existing data
  • Schema is insufficient
  • Use delta representation to determine correct placement
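One way to picture the placement decision is a lookup over the anchors that currently exist in the document. The following Python sketch is an assumption about how such a rule could look, not the algorithm used by LRX: the anchor names, the `ANCHOR_ORDER` list, and the fallback order (nearest preceding anchor, then nearest following anchor, then "as first") are all hypothetical.

```python
# Hypothetical anchor order for type Contact, following the schema layout.
ANCHOR_ORDER = ["Address[1]", "Phone[1]", "Phone[2]", "Address[2]", "Name"]


def placement(new_anchor, present, order=ANCHOR_ORDER, parent="contact"):
    """Choose an insert position for new_anchor relative to existing data.

    `present` is the set of anchors that currently exist in the document.
    Returns (relation, target), with relation in
    {"after", "before", "as first"}.
    """
    idx = order.index(new_anchor)
    # Prefer inserting right after the nearest existing predecessor.
    for anchor in reversed(order[:idx]):
        if anchor in present:
            return ("after", anchor)
    # Otherwise insert before the nearest existing successor.
    for anchor in order[idx + 1:]:
        if anchor in present:
            return ("before", anchor)
    # Nothing to anchor on: the new node becomes the parent's first child.
    return ("as first", parent)


print(placement("Phone[1]", {"Address[1]", "Name"}))
print(placement("Phone[1]", {"Name"}))
print(placement("Phone[1]", set()))
```

The three calls give three different answers for the same schema and the same update, which is the point of the slide: the schema alone does not determine the position, the pre-existing data does.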
Performance
• XMark queries over partially shredded data (4 GB)
  • Q1: Simple paths
  • Q5: Aggregation and filtering
  • Q9: Joins
  • QN: Variant of query Q6, descendant axis
    • Q6 needed to be rewritten because the interesting part of Q6 had been shredded
• LRX versus bringing data to the client first
  • Currently, the only other option is manual XQuery
Example: Q5
• With LRX:
    var c = (from o in db.ClosedAuctions
             where o.closed_auction.price >= 40
             select o.auto_id).Count();
• Without LRX (data is pulled to the client, then filtered and counted on the client):
    var q = from o in db.ClosedAuctions
            select o.closed_auction;
    int i = 0;
    foreach (var o in q)
        if ((decimal)o.Element("price") >= 40) i++;
Takeaway: Benefit from Pushing Operations to Server
• [Bar chart: results in seconds per 100 runs]
• Blue bars are with LRX, green bars are without
• Tried with cold (C) and hot (H) page buffers
Conclusion and Future Directions
• Query optimization
  • Pushing operations to either relational or XML
• Keyrefs as object pointers
• Queries/updates directly to delta
• LRX versus Lorax
Attention, my VLDB attendees!
Our system is LRX, it speaks for the trees
The XML trees overlooked by the tools
That follow the object-relational rules
Of course, one can always resort to XQuery
But FLWOR’s the deed that makes optimists dreary
We leave all relational portions pristine
But add new components for XML seen

LRX takes fragments on schema expressed
And compiles them to queries whose structures suggest
How to draw the right data from trees intertwined
And pack into objects of custom design
But what of the stuff ‘twixt the elements fall?
The comments, the whitespace, the order of xsd:all?
Our LRX tucks all of that data away
In a structure called “delta”, an indexed array

We draw from the keys in the delta in case
We must locate the space to do updates in place
When queries or updates on clients arrive
Native XQuery does LRX contrive
Inspection of query performance has shown
That LRX is faster than client alone
This is how we make objects of stored XML
My talk is now done, so I bid you farewell
Thanks!
• Object query:
    var q = from c in db.ClosedAuctions
            where c.closed_auction.price > 20
            from t in c.closed_auction.annotation.description.text.Descendants("emph")
            select t;
• Translated query:
    WITH XMLNAMESPACES('http://.../Auction' AS a)
    SELECT T
    FROM (SELECT T
          FROM ClosedAuctions C,
               SEQUENCE(C.closed_auction, 'a:auction/a:annotation/a:description/a:text//emph') AS T)
    WHERE VALUE(C.closed_auction, 'a:auction/a:price', int) > 20
Escape Hatch: LINQ-to-XML
• Example:
    from c in db.ClosedAuctions
    where c.closed_auction.price > 20
    from t in c.closed_auction.annotation.description.text.Descendants("emph")
    select t;
  • The text object is of type XElement
  • Descendants("emph") is pushed to the server as the corresponding XPath axis
• When to use: xsd:anyType, mixed content, or a preference for the XPath model
• Map to class XElement
  • XPath-like interface to XML-like data
  • Each method invocation is translated into XPath on the server
Queries and Query Translation: Conditions
• where e.foo == "bar"
  • If the conditions cover a client-side mapping fragment condition, translate into the corresponding store-side XML conditions
• where e.foo.Contains("bar")
  • If the method has an XML analog, translate into the analogous XQuery function
• where e.foo is barType
  • Find the fragment for barType, then translate into type or element conditions on XML
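The three rules can be sketched as a small translator from client conditions to XPath predicate strings. This Python sketch is illustrative only: the tuple encoding of conditions, the `MEMBER_PATHS` and `TYPE_TESTS` tables, and the generated paths are assumptions, and string values are assumed to contain no double quotes.

```python
# Hypothetical mapping data: member -> store path, type -> existence test.
MEMBER_PATHS = {"foo": "./foo"}
TYPE_TESTS = {"barType": "./bar"}


def translate(cond, member_paths=MEMBER_PATHS, type_tests=TYPE_TESTS):
    """Translate a client-side condition into an XPath test expression.

    cond is one of:
      ("eq", member, value)        e.g. e.foo == "bar"
      ("contains", member, value)  e.g. e.foo.Contains("bar")
      ("is", type_name)            e.g. e.foo is barType
    """
    kind = cond[0]
    if kind == "eq":
        _, member, value = cond
        return f'{member_paths[member]}[. = "{value}"]'
    if kind == "contains":
        _, member, value = cond
        # String.Contains has an XQuery analog: fn:contains.
        return f'{member_paths[member]}[contains(., "{value}")]'
    if kind == "is":
        _, type_name = cond
        # A type condition becomes the element condition of its fragment.
        return type_tests[type_name]
    raise ValueError(f"no store-side analog for condition kind {kind!r}")


print(translate(("eq", "foo", "bar")))
print(translate(("contains", "foo", "bar")))
print(translate(("is", "barType")))
```

Each returned string is the kind of path that would be passed to a TEST placeholder; a condition with no store-side analog raises an error rather than being silently evaluated on the client.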
• Client-side object space: relation-mapped classes and XML-mapped classes
• Object queries and updates via LINQ
• O-X mappings:
  • Translate XML-mapped references to placeholder (PH) functions
  • Shred XML into objects according to object type and mappings
• O-R mappings:
  • Package queries and updates into abstract trees, then transform by applying mappings
  • Build objects from query results
• Providers (DB2 Provider, Ora Provider, SS Provider):
  • Translation to vendor-specific SQL syntax
  • PH functions translated to XQuery
• Stores: DB2, Oracle, SQL Server