190 likes | 422 Views
The Niagara Project. “I have avoided networking like the plague. I am terrified of getting [a connection] because it’s like drinking from Niagara Falls.” - Arthur C. Clarke. Who is working on Niagara?. Professors: DeWitt and Naughton @ UW, Maier @ OGI Students: Lots of them!
E N D
The Niagara Project “I have avoided networking like the plague. I am terrified of getting [a connection] because it’s like drinking from Niagara Falls.” - Arthur C. Clarke
Who is working on Niagara? • Professors: DeWitt and Naughton @ UW, Maier @ OGI • Students: Lots of them! • See http://www.cs.wisc.edu/niagara
Goals of the Niagara Project: • In broadest terms, to: • improve the precision of Internet searching • allow queries over the whole Internet (the “FROM *” clause) • work over streams as well as static files • monitor the Internet for changes • Not finished yet...
Current status: • Completed three java prototypes: • A “text-in-context” XML search engine. • An XML-QL query engine. • An XML-QL trigger engine. • Doing the same thing (again) in C++, maybe with Quilt as query language. • Finding (solving?) interesting research problems along the way...
Text-in-Context XML SE • Rather than ask: What are all the documents that contain the string “Montreal”? We can ask: What are all the documents that contain ship departure information for a ship whose name is “Montreal”?
How it works: • Locate documents by crawling the web or using explicit input from user. • Build local index on these docs that supports fast evaluation of Search Engine Query Language (SEQL) queries. • Return URL’s of documents that satisfy SEQL queries. • Two uses: stand alone, or part of XML-QL
XML-QL Query Engine • Evaluates queries expressed in XML-QL. • Result is XML • Different from Search Engine: Instead of asking: Find all files with ship departure events where the ship’s name is “Montreal”? We can ask What is a list of departure dates for ships named “Montreal”?
Ex: Fragment of XML file... <department> <deptname> Electrical Engineering </deptname> <faculty> <name> <lastname>Robertson</lastname> <firstname>Pedro</firstname> </name> <phone>6988086</phone> <email>Robertson.Pedro@foo.edu</email> <office>660</office> </faculty> </department>
XML-QL Query... WHERE <department> <deptname>"Electrical Engineering"</> <faculty> <name> <lastname> $v2 </> <firstname> $v3 </> </> </> content_as $v4 CONSTRUCT <fname> $v4 </>
Important Question • Which documents should be consulted to answer an XML-QL query? We support three approaches: • explicitly listed documents (“in foo.xml”) • documents conforming to DTD (“conforms to some_dtd.xml”) • documents that satisfy search engine predicates extracted from query
Example of third approach: • Given the previous XML-QL query finding first and last names of EE faculty members, the system will extract this Search Engine query: department CONTAINS (deptname IS "Electrical Engineering" AND faculty CONTAINS name CONTAINS (lastname AND firstname))
Control Flow for Typical Query • So full flow of typical XML-QL query: • user submits XML-QL query • system extracts SEQL query from XML-QL, passes it to search engine • search engine evaluates SEQL query, returns list of URLs to XML-QL query engine • XML-QL engine fetches documents from URL list, evaluates query • Answer returned to the user.
XML-QL Trigger Engine • Goal: • allow users to define “triggers” on XML files using XML-QL predicates. • Scale to huge numbers of triggers by exploiting commonality among sets of triggers.
Some research topics... • Semantics and impl. of queries over streams? • Use RDBMS for anything at all? • How smart should the search engine be? • Can you use caching anywhere? • Query optimization: plan space, stats? • How should you index (cached?) XML? • What do you do with queryable sources? • How do you handle huge #s of triggers? • Performance, performance, performance.
A Petabyte in your Pocket David DeWitt, Dave Maier @ OGI, Jeff Naughton
Title of NSF ITR Project • What does it mean? • Goal is to have, available from a PDA, your evolving and customized view of all the on-line digital data that exists anywhere. • Goal is not to develop holographic memory technology or DNA-based storage units.
What the PetDB is: • An example of what can be done with new software infrastructure termed “Net Data Managers” (NDMs.) • NDMs: • focus on data movement as well as storage • store and query data of arbitrary types without a schema having been defined • execute queries and triggers over tens of thousands of information sites
Connection with Niagara... • Niagara is a very early prototype of a simple NDM. • Project goal: • continue developing Niagara and working on research problems that arise • prototype a simple NDM application using Niagara to see if we are on the right track
For more information... • Web site http://www.cs.wisc.edu/niagara • Talk to me or any other Niagara project member...