XML (with a bias towards query language issues)

A boring research topic? A new frontier? A means to keep standards people busy?
Prepared by S. Abiteboul and J. Widom

XML
• Rapidly adopted by industry
• A format for exchange of small/medium pieces of data?
  • Yes
  • But when archived, grows to large volumes
• Also a data model?
  • Yes, for all kinds of data
  • From unstructured documents to collections of structured data
• Warning: this is not the relational world!
  • Permissive typing, full-text search, …
The database community should be very involved, perhaps very concerned

Some XML Issues
• Storage of XML
  • Native vs. XML-relational
  • Lesson from OODB: it’s not only a technical issue but a business one
  • Situation is different: more $ involved
  • Efficient representation, compression
• Key issue: interface
  • DOM, SAX, query language, DB-like API, …
• Revisiting old topics
  • Database design
  • Integrity constraints
  • Concurrency control
  • Access control
  • Etc.
All topics are under active investigation, sometimes reinventing the wheel
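To make the interface question concrete, here is a minimal sketch in Python (standard library only) that extracts the same data through three of the interface styles listed above: a DOM tree, SAX events, and a small path query. The sample document and all function and class names are hypothetical, invented for illustration; the slides do not prescribe any of them.

```python
import xml.dom.minidom
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
DOC = (
    "<library>"
    "<book><title>Title A</title></book>"
    "<book><title>Title B</title></book>"
    "</library>"
)


def titles_dom(text):
    """DOM style: materialize the whole tree in memory, then navigate it."""
    tree = xml.dom.minidom.parseString(text)
    return [node.firstChild.data for node in tree.getElementsByTagName("title")]


class TitleHandler(xml.sax.ContentHandler):
    """SAX style: react to parse events; no tree is ever built."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


def titles_sax(text):
    handler = TitleHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.titles


def titles_query(text):
    """Query-language style: state the path, let the library do the traversal."""
    root = ET.fromstring(text)
    return [t.text for t in root.findall("./book/title")]
```

All three functions return the same list of titles. The difference lies in what the caller must manage: a full in-memory tree (DOM), explicit parsing state (SAX), or only a declarative path (query).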

Universal Query Language for XML
• Problems with XQuery
  • Focus on complex queries; really need simple filters + IR-style search
  • Too complex, too ambitious, lack of underlying paradigm, too much politics!
  • Too broad spectrum of applications? From documents to data
• Undermining XQuery with something better?
  • Personal viewpoint: Small core OQL-like + functional plug-ins
  • Too late? We need a standard now!
• What about updates, standing queries, deltas, constraints, …?
  • This direction has been mostly deactivated by XQuery
Scientific: Is XQuery good or bad from a scientific viewpoint?
Politics: Should we push for XQuery?
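As an illustration of what "simple filters + IR-style search" could look like, the following Python sketch combines a structural path filter with a naive keyword-count ranking. It is an assumption about the intent of that bullet, not a proposal from the slides; the sample data, the scoring function, and the names `keyword_score` and `search` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample collection, invented for this sketch.
DOC = """<articles>
  <article lang="en"><title>XML storage</title><body>Native XML storage versus relational storage.</body></article>
  <article lang="en"><title>Typing</title><body>Tree automata and XML typing.</body></article>
  <article lang="fr"><title>Stockage</title><body>Stockage XML natif.</body></article>
</articles>"""


def keyword_score(element, keywords):
    """Naive IR-style score: total occurrences of the keywords in the element's text."""
    text = " ".join(element.itertext()).lower()
    tokens = re.findall(r"\w+", text)
    return sum(tokens.count(k.lower()) for k in keywords)


def search(root, path, keywords):
    """Apply a simple path filter, then rank the surviving elements by keyword score."""
    hits = []
    for element in root.findall(path):
        score = keyword_score(element, keywords)
        if score > 0:
            hits.append((score, element.findtext("title")))
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


root = ET.fromstring(DOC)
results = search(root, "./article[@lang='en']", ["storage", "xml"])
```

Here the path expression plays the role of the simple filter and the score plays the role of the IR component. A real system would use proper term weighting and an inverted index rather than counting tokens on the fly.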

Back to Basics: Query Optimization
• For subsets of the language
• The tree structure is definitely a new ball game
  • New index structures
  • New cost models
  • New everything
• Depends on storage
  • Relational, native, others
• Revisit old problems
  • Distributed query processing
  • View maintenance
All topics are under active investigation (but more effort on distribution wouldn’t hurt)
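One simple example of an index structure suited to tree-shaped data is a path index, which maps each root-to-node label path to the elements reachable by it. The Python sketch below is only an illustration under that assumption; the slides do not name a specific index, and `build_path_index` and the sample document are hypothetical.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
DOC = (
    "<db>"
    "<person><name>A</name></person>"
    "<person><name>B</name><phone>1</phone></person>"
    "</db>"
)


def build_path_index(root):
    """Map every label path (e.g. '/db/person/name') to the elements found on it."""
    index = defaultdict(list)

    def visit(element, prefix):
        path = prefix + "/" + element.tag
        index[path].append(element)
        for child in element:
            visit(child, path)

    visit(root, "")
    return index


index = build_path_index(ET.fromstring(DOC))
names = [element.text for element in index["/db/person/name"]]
```

With such an index, a fixed path query becomes a single dictionary lookup instead of a tree traversal. The price is that the index must be maintained under updates, which ties into the view maintenance problem listed above.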

Back to Basics: Foundations
• Lots of work on semistructured data
  • First-order logic and relational languages: strong
  • OQL/functional languages: reasonable
  • Full-text search: messy
• Significant issue: typing
  • Much more complex than in relational world
  • Not settled (XML Schema, tree automata, …)
  • Query type-checking, type inferencing, update consistency
• Very active area
  • People from database theory, functional programming, automata theory, …
All topics already active, not simple, require more work

The Real Frontier (The World is Changing)
OLD data management
• Closed world
• Client/server
• Distributed databases
• Query/answer
• Active databases
• QBE interfaces
NEW data management
• Openness
• P2P applications
• Web-scale data
• Subscription queries
• Queries over streams
• ADB + Web services
• New interfaces
Research should focus on the new issues rather than on traditional processing of single-site XML data
Beyond XML: semantic Web

Discussion
