XML (with a bias towards query language issues)
A boring research topic? A new frontier? A means to keep standards people busy?
Prepared by S. Abiteboul and J. Widom

XML
• Rapidly adopted by industry
• A format for exchange of small/medium pieces of data?
  • Yes
  • But when archived, grows to large volumes
• Also a data model?
  • Yes, for all kinds of data
  • From unstructured documents to collections of structured data
• Warning: this is not the relational world!
  • Permissive typing, full-text search, …
The database community should be very involved, perhaps very concerned

Some XML Issues
• Storage of XML
  • Native vs. XML-relational
  • Lesson from OODB: it’s not only a technical issue but a business one
  • Situation is different: more $ involved
  • Efficient representation, compression
• Key issue: interface
  • DOM, SAX, query language, DB-like API, …
• Revisiting old topics
  • Database design
  • Integrity constraints
  • Concurrency control
  • Access control
  • Etc.
All topics are under active investigation, sometimes reinventing the wheel
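As a rough illustration of the interface question on this slide, the sketch below reads the same small document three ways using only Python's standard library: as a DOM tree (`xml.dom.minidom`), as a SAX event stream (`xml.sax`), and through a path expression (`xml.etree.ElementTree`, standing in here for a query language). The sample document, the element names, and the function names are assumptions made up for the example; nothing here comes from the original slides.

```python
import xml.sax
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical sample document, invented for this illustration.
DOC = (
    "<bib>"
    "<book year='1999'><title>First Title</title></book>"
    "<book year='2001'><title>Second Title</title></book>"
    "</bib>"
)


def titles_dom(text):
    """DOM style: build the whole tree in memory, then navigate it."""
    dom = minidom.parseString(text)
    return [node.firstChild.data for node in dom.getElementsByTagName("title")]


class TitleHandler(xml.sax.ContentHandler):
    """SAX style: react to parse events; no tree is ever materialized."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


def titles_sax(text):
    handler = TitleHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.titles


def titles_path(text):
    """Query style: state what is wanted as a path, not how to find it."""
    root = ET.fromstring(text)
    return [elem.text for elem in root.findall("./book/title")]
```

All three functions return the same list of titles for `DOC`. The difference lies in what the caller has to manage: a full in-memory tree for DOM, explicit parser state for SAX, and only a declarative path for the query style.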

Universal Query Language for XML
• Problems with XQuery
  • Focus on complex queries; really need simple filters + IR-style search
  • Too complex, too ambitious, lack of underlying paradigm, too much politics!
  • Too broad spectrum of applications? From documents to data
• Undermining XQuery with something better?
  • Personal viewpoint: Small core OQL-like + functional plug-ins
  • Too late? We need a standard now!
• What about updates, standing queries, deltas, constraints, …?
This direction has been mostly deactivated by XQuery
Scientific: Is XQuery good or bad from a scientific viewpoint?
Politics: Should we push for XQuery?
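To make "simple filters + IR-style search" concrete, here is a minimal sketch of what such a facility might look like: select elements by tag name, then keep those whose text content mentions every keyword. The function name `keyword_filter`, its parameters, and the sample data are hypothetical. Plain case-insensitive substring matching stands in for real IR machinery (no tokenization, stemming, or ranking).

```python
import xml.etree.ElementTree as ET


def keyword_filter(xml_text, tag, keywords):
    """Return elements named `tag` whose text contains every keyword.

    Matching is case-insensitive substring matching, a deliberate
    simplification of IR-style search (assumption for this sketch).
    """
    root = ET.fromstring(xml_text)
    wanted = [k.lower() for k in keywords]
    hits = []
    for elem in root.iter(tag):
        # itertext() yields the text of the element and all its descendants.
        text = " ".join(elem.itertext()).lower()
        if all(k in text for k in wanted):
            hits.append(elem)
    return hits


# Hypothetical sample data, invented for this illustration.
SAMPLE = (
    "<notes>"
    "<note id='1'>Query optimization for <b>tree</b> data</note>"
    "<note id='2'>Storage and compression</note>"
    "<note id='3'>Tree automata and typing of query results</note>"
    "</notes>"
)

matching_ids = [n.get("id") for n in keyword_filter(SAMPLE, "note", ["tree", "query"])]
```

A filter of this kind covers a large share of everyday requests without any of the machinery of a full query language, which is the trade-off the slide raises.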

Back to Basics: Query Optimization
• For subsets of the language
• The tree structure is definitely a new ball game
  • New index structures
  • New cost models
  • New everything
• Depends on storage
  • Relational, native, others
• Revisit old problems
  • Distributed query processing
  • View maintenance
All topics are under active investigation (but more effort on distribution wouldn’t hurt)
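One way to picture a "new index structure" for tree data is a label-path index, which maps every root-to-node path of tag names to the elements reachable by it. The sketch below is only an assumed, simplified illustration of that idea; the function name `build_path_index` and the sample document are hypothetical, and the slides do not prescribe this or any other specific index.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_path_index(root):
    """Map each label path (e.g. '/bib/book/title') to its elements.

    Elements are recorded in document order using an explicit stack,
    so deep documents do not hit Python's recursion limit.
    """
    index = defaultdict(list)
    stack = [(root, "/" + root.tag)]
    while stack:
        elem, path = stack.pop()
        index[path].append(elem)
        # Push children in reverse so the first child is visited first.
        for child in reversed(list(elem)):
            stack.append((child, path + "/" + child.tag))
    return index


# Hypothetical sample document, invented for this illustration.
SAMPLE = (
    "<bib>"
    "<book><title>A</title></book>"
    "<article><title>B</title></article>"
    "<book><title>C</title></book>"
    "</bib>"
)

index = build_path_index(ET.fromstring(SAMPLE))
book_titles = [e.text for e in index["/bib/book/title"]]
```

With such an index, a simple path query becomes a dictionary lookup instead of a tree traversal. The cost is extra space and maintenance work on updates, which is where new cost models come in.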

Back to Basics: Foundations
• Lots of work on semistructured data
  • First-order logic and relational languages: strong
  • OQL/functional languages: reasonable
  • Full-text search: messy
• Significant issue: typing
  • Much more complex than in relational world
  • Not settled (XML Schema, tree automata, …)
  • Query type-checking, type inferencing, update consistency
• Very active area
  • People from database theory, functional programming, automata theory, …
All topics already active, not simple, require more work

The Real Frontier (The World is Changing)
OLD data management
• Closed world
• Client/server
• Distributed databases
• Query/answer
• Active databases
• QBE interfaces
NEW data management
• Openness
• P2P applications
• Web-scale data
• Subscription queries
• Queries over streams
• ADB + Web services
• New interfaces
Research should focus on the new issues rather than on traditional processing of single-site XML data
Beyond XML: semantic Web