XML Web: Reinventing the mistakes of the past

Why it is a big deal: Bringing the Web to programs
• The web has touched many facets of our lives
• The web has not yet affected the way we program, because little of the web is machine-understandable
• This is beginning to change, e.g., Amazon’s Web Service
• This will change the way we look at search
• Most content on the web comes from databases
• Why go through the cycle of structured data to HTML to guessing the structure and the user’s query?
• Huge opportunity, but some key problems are being overlooked

A key lesson from the web
• The current development path of the data web is analogous to pre-web hypertext systems and to RDBMS today
• Lots of islands of data
• More money is spent on systems integration than on databases today
• Lesson from the Web:
  • There is only one web!
  • Integration cannot be an afterthought
  • It has to be built into the core architecture

One vs many XML Webs
• What we are getting now: islands of XML from disparate web services, e.g., about Tori Amos
• It is up to the consumer to put these chunks together

The core of the problem: We get a mess like this
[Figure: graph of data fragments from Geo Almanac, Weather Channel, CDNow, and People Magazine. Node labels: USA, North Carolina, City, USNC0491, NTNC, Newton,_NorthCar, 62 F, 328723677, 0,9855,109071,00, Under The Pink, Crucify, Atlantic, EMI, Musician, Music Album, “8/22/63”. Edge labels: Located in, instanceof, temperature, birthplace, Author, publisher, Date Of Birth.]

What we would like to see …
• Create a schematically coherent data web from disparate chunks
• Should function like DNS, i.e., the consuming program should be able to pretend the whole thing is sitting on a nearby server
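
One way to picture the DNS analogy is a local facade that hides which remote service actually holds a fact. The sketch below is only an illustration under assumptions: `LocalView`, `weather_source`, and `music_source` are hypothetical names, the two sources are in-memory stand-ins for remote web services, and the facts they hold are illustrative values borrowed from the figure labels.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A source answers (resource, property) lookups, or returns None if it has no answer.
Source = Callable[[str, str], Optional[str]]


class LocalView:
    """Hypothetical facade: callers query it as if all data were local."""

    def __init__(self, sources: List[Source]) -> None:
        self._sources = sources
        self._cache: Dict[Tuple[str, str], str] = {}
        self.remote_calls = 0  # counts how often a source was consulted

    def lookup(self, resource: str, prop: str) -> Optional[str]:
        key = (resource, prop)
        if key in self._cache:  # answered before: no remote traffic
            return self._cache[key]
        for source in self._sources:  # otherwise ask each source in turn
            self.remote_calls += 1
            value = source(resource, prop)
            if value is not None:
                self._cache[key] = value
                return value
        return None


# Two in-memory stand-ins for remote services (assumed, illustrative data).
def weather_source(resource: str, prop: str) -> Optional[str]:
    return {("Newton, NC", "temperature"): "62 F"}.get((resource, prop))


def music_source(resource: str, prop: str) -> Optional[str]:
    return {("Under The Pink", "publisher"): "Atlantic"}.get((resource, prop))


view = LocalView([weather_source, music_source])
print(view.lookup("Under The Pink", "publisher"))
print(view.lookup("Under The Pink", "publisher"))  # served from the cache
```

The consuming program only ever calls `view.lookup`; which service answered, and whether the answer came from the cache, stays hidden, much as a DNS client does not care which server resolved a name.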

Some key problems that need to be solved to realize this vision
• Plumbing/Protocols
• Scalable query languages
• Integration, or the problem of names
• Caching
• Trust
• Bootstrapping Knowledge Bases
• Applications
• Focus on many-to-many data exchange, not just point-to-point, enterprise-centric exchanges

Query Languages
• Functional interfaces vs query interfaces
  • Functional interface => SOAP
  • Query interface => ?
• General, expressive languages like SQL and XML Query are inappropriate as a public query interface … too expensive, too unpredictable to expose to everyone
• We need the equivalent of DNS’s GetHostByName
• Simple, but works remarkably well
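
To make the contrast with SQL concrete, a query interface in the spirit of GetHostByName could be a single lookup keyed by a resource and a property, with a cost that is easy to predict. This is a minimal sketch, not a proposed standard: `get_data` and `DATA_WEB` are hypothetical names, and the stored facts are illustrative values based on the figure labels.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory data web: (resource, property) -> value.
# Entries are assumed for illustration only.
DATA_WEB: Dict[Tuple[str, str], str] = {
    ("Newton, NC", "temperature"): "62 F",
    ("Newton, NC", "located in"): "North Carolina",
    ("North Carolina", "located in"): "USA",
}


def get_data(resource: str, prop: str) -> Optional[str]:
    """Return one property of one resource, or None if it is unknown.

    There are no joins, filters, or aggregates: every call is a single
    keyed lookup, so a public server can bound the cost of each request.
    """
    return DATA_WEB.get((resource, prop))


print(get_data("Newton, NC", "temperature"))
```

Richer questions would have to be composed by the client from several such calls, which pushes the expensive and unpredictable part of query evaluation away from the public server.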

The Name Problem
• Names are crucial in information exchange
• Two parties cannot exchange information about an object without agreeing on how they are going to refer to it
• The problem: too many names to keep track of!
• No URN for <Newton, NC> or <Tori Amos>
• Different sites have different names for the same thing!
• URN efforts to date have largely been failures
• Traditional approach: name-mapping tables
• Potential solution: semantic negotiation
• Bootstrapping from some shared vocabulary to more
• Names have network effects: one of the few things of which there can be only one
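
The traditional name-mapping approach can be sketched as a table from (site, local name) pairs to one shared name. Everything here is an assumption made for illustration: the site labels `site_a` to `site_c` are hypothetical, and the claim that the three identifiers from the figure all denote Newton, NC is assumed rather than taken from the slides.

```python
from typing import Dict, Optional, Tuple

# Hypothetical mapping table: (site, site-local name) -> shared name.
# Site labels and the pairing of identifiers with sites are assumed.
NAME_MAP: Dict[Tuple[str, str], str] = {
    ("site_a", "USNC0491"): "Newton, NC",
    ("site_b", "NTNC"): "Newton, NC",
    ("site_c", "Newton,_NorthCar"): "Newton, NC",
}


def canonical_name(site: str, local_name: str) -> Optional[str]:
    """Translate a site-local name into the shared name, if one is known."""
    return NAME_MAP.get((site, local_name))


def same_object(a: Tuple[str, str], b: Tuple[str, str]) -> bool:
    """True only if both site-local names map to the same shared name."""
    name_a = canonical_name(*a)
    name_b = canonical_name(*b)
    return name_a is not None and name_a == name_b


print(same_object(("site_a", "USNC0491"), ("site_b", "NTNC")))
```

The weakness the slide points at is visible in the table itself: someone has to add and maintain a row for every name on every site.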

Trust
• How do machines know whose data to trust?
• Centralized clearinghouse model: a la Yahoo!
• Decentralized Web of Trust model: a la Google/Epinions
• Too much focus (by UDDI, etc.) on the former, too little on the latter
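
As a rough illustration of the decentralized model, trust in an unknown data source can be derived from chains of direct trust statements. The sketch below uses a simple rule chosen for this example (multiply the weights along a path, keep the best path, limit the number of hops); it is not the algorithm used by Google or Epinions, and the names and weights in `TRUST` are hypothetical.

```python
from typing import Dict

# Hypothetical direct trust statements: truster -> {trusted party: weight in [0, 1]}.
TRUST: Dict[str, Dict[str, float]] = {
    "me": {"alice": 0.9, "bob": 0.5},
    "alice": {"lab_x": 0.8},
    "bob": {"lab_x": 0.4, "lab_y": 0.9},
}


def derived_trust(source: str, target: str, max_hops: int = 3) -> float:
    """Best product of edge weights over paths with at most max_hops edges.

    Returns 0.0 when no such path exists. The hop limit also guarantees
    termination if the trust graph contains cycles.
    """
    if source == target:
        return 1.0
    if max_hops == 0:
        return 0.0
    best = 0.0
    for neighbor, weight in TRUST.get(source, {}).items():
        best = max(best, weight * derived_trust(neighbor, target, max_hops - 1))
    return best


print(derived_trust("me", "lab_x"))
print(derived_trust("me", "lab_y"))
```

No central clearinghouse appears anywhere: each party only states whom it trusts directly, and a consuming program derives the rest.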

Potential application: Internet Wet Lab
• In many sciences, more data will be produced in the next 2 years than exists today
• Increasingly, research consists of writing programs that mine this data
• Data is isolated as islands in different labs
• Data from one lab is not easily available to programs in another lab
• Imagine a single virtual net-wide “database” containing all this experimental data
• Example: clinical trial data
