XML Web: Reinventing the mistakes of the past

rhys

Presentation Transcript


  1. XML Web: Reinventing the mistakes of the past

  2. Why it is a big deal: Bringing the Web to Programs
     • The web has touched many facets of our lives
     • The web has not yet affected the way we program, because little of the web is machine-understandable
     • This is beginning to change, e.g., Amazon's Web Services
     • This will change the way we look at search
     • Most content on the web comes from databases
     • Why go through the cycle of structured data to HTML and then to guessing the structure and the user's query?
     • Huge opportunity, but some key problems are being overlooked

  3. A key lesson from the web
     • The current development path of the data web is analogous to pre-web hypertext systems and to today's RDBMSs
       • Lots of islands of data
       • More money is spent today on systems integration than on databases
     • Lesson from the Web:
       • There is only one web!
       • Integration cannot be an afterthought
       • It has to be built into the core architecture

  4. One vs. many XML webs
     • What we are getting now: islands of XML from disparate web services (e.g., about Tori Amos)
     • It is up to the consumer to put these chunks together

  5. The core of the problem: we get a mess like this
     [Figure: a tangle of unlinked data fragments about Tori Amos and Newton, North Carolina, drawn from Geo Almanac, the Weather Channel, CDNow and People Magazine. Each source uses its own identifiers (USNC0491, NTNC, Newton,_NorthCar, 328723677, 0,9855,109071,00) and its own relations (located in, instanceof, temperature, birthplace, publisher, Date Of Birth). Other labels: USA, North Carolina, City, Musician, Author, Music Album, Under The Pink, Crucify, Atlantic, EMI, "8/22/63", 62 F.]
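
To make the mess concrete, here is a minimal Python sketch of a consumer trying to combine two XML chunks. Both XML documents and their identifiers (`newton-nc`, `NWT_NC_US`) are hypothetical; the point is only that when two services describe the same town under different local names, a naive join has nothing to join on.

```python
import xml.etree.ElementTree as ET

# Hypothetical chunks from two independent services; both describe Newton, NC.
WEATHER_XML = '<forecast><city id="newton-nc"><temperature unit="F">62</temperature></city></forecast>'
ALMANAC_XML = '<places><place key="NWT_NC_US"><state>North Carolina</state></place></places>'


def identifiers(xml_text: str, tag: str, attr: str) -> set:
    """Collect the value of `attr` on every `tag` element in the chunk."""
    root = ET.fromstring(xml_text)
    return {el.get(attr) for el in root.iter(tag)}


weather_ids = identifiers(WEATHER_XML, "city", "id")
almanac_ids = identifiers(ALMANAC_XML, "place", "key")

# A naive join on identifiers finds no overlap, even though both chunks
# are about the same place.
print(weather_ids & almanac_ids)  # set()
```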

  6. What we would like to see
     • A schematically coherent data web created from disparate chunks
     • It should function like DNS, i.e., a consuming program should be able to pretend the whole thing is sitting on a nearby server
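
A rough sketch of the DNS-like behaviour this slide asks for: a client-side resolver that sends each lookup to whichever source is registered for the name's namespace and caches answers (including misses) for a fixed TTL, so repeated lookups are served locally. The `namespace:local-name` convention, the `DataWebResolver` class and the 300-second TTL are assumptions for illustration, not part of any existing protocol.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# A source answers (name, property) with a value, or None if it has no answer.
Source = Callable[[str, str], Optional[str]]


class DataWebResolver:
    """Hypothetical DNS-style resolver for (name, property) lookups."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self.ttl = ttl_seconds
        self.sources: Dict[str, Source] = {}  # namespace -> authoritative source
        # (name, property) -> (time fetched, value); None values are cached too,
        # like DNS negative caching.
        self.cache: Dict[Tuple[str, str], Tuple[float, Optional[str]]] = {}

    def register(self, namespace: str, source: Source) -> None:
        self.sources[namespace] = source

    def lookup(self, name: str, prop: str) -> Optional[str]:
        key = (name, prop)
        now = time.monotonic()
        cached = self.cache.get(key)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]  # answered locally, no trip to the source
        namespace = name.split(":", 1)[0]  # "geo:Newton_NC" -> "geo"
        source = self.sources.get(namespace)
        value = source(name, prop) if source is not None else None
        self.cache[key] = (now, value)
        return value


# Usage: the consumer sees one lookup() call regardless of where the data lives.
weather = {("geo:Newton_NC", "temperature"): "62 F"}
resolver = DataWebResolver()
resolver.register("geo", lambda name, prop: weather.get((name, prop)))
print(resolver.lookup("geo:Newton_NC", "temperature"))  # 62 F
```

Caching misses as well as hits mirrors DNS negative caching; it keeps a popular but unanswerable query from repeatedly hitting the remote source.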

  7. Some key problems that need to be solved to realize this vision
     • Plumbing/protocols
     • Scalable query languages
     • Integration, or the problem of names
     • Caching
     • Trust
     • Bootstrapping knowledge bases
     • Applications
     • Focus on many-to-many data exchange, not just point-to-point, enterprise-centric exchanges

  8. Query languages
     • Functional interfaces vs. query interfaces
       • Functional interface => SOAP
       • Query interface => ?
     • General, expressive languages like SQL and XML Query are inappropriate as a public query interface: too expensive and too unpredictable to expose to everyone
     • We need the equivalent of DNS's GetHostByName
       • Simple, but works remarkably well
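
One way to read the GetHostByName analogy, sketched in Python: the public interface accepts only an exact `(name, property)` lookup, so every request costs a bounded amount of work, unlike an arbitrary SQL or XML Query endpoint. The `LookupService` class, its `get_property` method and the 256-character limit are hypothetical.

```python
from typing import Dict, Optional


class LookupService:
    """Hypothetical narrow public query interface: exact-match lookups only.

    Each request does at most two dictionary lookups on size-capped keys, so
    its cost does not depend on how the client phrases the query.
    """

    MAX_KEY_LEN = 256  # hypothetical cap on request size

    def __init__(self, facts: Dict[str, Dict[str, str]]) -> None:
        self._facts = facts  # name -> {property -> value}

    def get_property(self, name: str, prop: str) -> Optional[str]:
        if len(name) > self.MAX_KEY_LEN or len(prop) > self.MAX_KEY_LEN:
            raise ValueError("request too large")
        return self._facts.get(name, {}).get(prop)


svc = LookupService({"Tori_Amos": {"birthplace": "Newton, NC"}})
print(svc.get_property("Tori_Amos", "birthplace"))  # Newton, NC
print(svc.get_property("Tori_Amos", "spouse"))      # None
```

The trade-off is expressiveness: joins and aggregates move to the client, in exchange for a server whose per-request load is predictable enough to expose to everyone.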

  9. The name problem
     • Names are crucial in information exchange
     • Two parties cannot exchange information about an object without agreeing on how they will refer to it
     • The problem: too many names to keep track of!
       • No URN for <Newton, NC> or <Tori Amos>
       • Different sites have different names for the same thing!
       • URN efforts to date have largely failed
     • Traditional approach: name-mapping tables
     • Potential solution: semantic negotiation
       • Bootstrapping from some shared vocabulary to more
     • Names have network effects: they are one of the few things of which there can be only one
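
A minimal sketch of the traditional name-mapping-table approach: each (site, local name) pair is mapped by hand to a canonical identifier, and records are merged under those identifiers. The site names, local names and `place:`/`person:` identifiers are invented for illustration. The sketch also shows the weakness the slide points to: the table needs an entry for every name at every site, and any record not yet in the table falls through unmerged.

```python
from typing import Dict, List, Optional, Tuple

# Hand-maintained table: (site, local name) -> canonical identifier.
# All entries are hypothetical.
NAME_MAP: Dict[Tuple[str, str], str] = {
    ("weather_site", "newton-nc"): "place:Newton_NC",
    ("almanac_site", "NWT_NC_US"): "place:Newton_NC",
    ("music_site", "tori-amos"): "person:Tori_Amos",
}

Record = Tuple[str, str, str, str]  # (site, local name, property, value)


def canonical(site: str, local_name: str) -> Optional[str]:
    return NAME_MAP.get((site, local_name))


def merge(records: List[Record]) -> Tuple[Dict[str, Dict[str, str]], List[Record]]:
    """Group properties under canonical names; return unmappable records separately."""
    merged: Dict[str, Dict[str, str]] = {}
    unmapped: List[Record] = []
    for site, local, prop, value in records:
        cid = canonical(site, local)
        if cid is None:
            unmapped.append((site, local, prop, value))
        else:
            merged.setdefault(cid, {})[prop] = value
    return merged, unmapped


merged, unmapped = merge([
    ("weather_site", "newton-nc", "temperature", "62 F"),
    ("almanac_site", "NWT_NC_US", "state", "North Carolina"),
    ("magazine_site", "amos-tori", "date_of_birth", "8/22/63"),  # not in the table
])
print(merged["place:Newton_NC"])  # {'temperature': '62 F', 'state': 'North Carolina'}
print(len(unmapped))              # 1
```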

  10. Trust
      • How do machines know whose data to trust?
      • Centralized clearing-house model: à la Yahoo!
      • Decentralized web-of-trust model: à la Google/Epinions
      • Too much focus (by UDDI, etc.) on the former, too little on the latter
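
The slide names Google and Epinions only as examples of a decentralized web of trust; it does not say how trust should be computed. As one illustration, here is a PageRank-style propagation in which each party passes a damped share of its current trust, split evenly, to the parties it endorses. This is a swapped-in, well-known technique rather than the presenter's proposal, and the function name, the endorsement graph and the parameters (damping 0.85, 50 iterations) are assumptions.

```python
from typing import Dict, List


def propagate_trust(endorsements: Dict[str, List[str]], damping: float = 0.85,
                    iterations: int = 50) -> Dict[str, float]:
    """PageRank-style trust scores computed by power iteration.

    Each party passes `damping` of its score, split evenly, to the parties it
    endorses; a party that endorses nobody spreads that share over everyone.
    Scores sum to 1 (up to rounding).
    """
    nodes = sorted(set(endorsements) | {t for ts in endorsements.values() for t in ts})
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = endorsements.get(v, [])
            receivers = targets if targets else nodes
            share = damping * score[v] / len(receivers)
            for t in receivers:
                nxt[t] += share
        score = nxt
    return score


# Hypothetical endorsement graph: who vouches for whose data.
scores = propagate_trust({
    "lab_a": ["lab_c"],
    "lab_b": ["lab_c"],
    "lab_c": ["lab_a"],
})
print(max(scores, key=scores.get))  # lab_c
```

No central registry is consulted: a party's trust comes entirely from who endorses it and how trusted those endorsers are, which is the property that distinguishes the web-of-trust model from the clearing-house model.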

  11. Potential application: Internet wet lab
      • In many sciences, more data will be produced in the next 2 years than exists today
      • Increasingly, research consists of writing programs that mine this data
      • Data is isolated in islands in different labs
        • Data from one lab is not easily available to programs in another lab
      • Imagine a single virtual, net-wide "database" containing all this experimental data
        • Example: clinical trial data
