The Niagara Project

“I have avoided networking like the plague. I am terrified of getting [a connection] because it’s like drinking from Niagara Falls.” - Arthur C. Clarke

Who is working on Niagara?
• Professors: DeWitt and Naughton @ UW, Maier @ OGI
• Students: Lots of them!
• See http://www.cs.wisc.edu/niagara

Goals of the Niagara Project:
• In broadest terms, to:
  • improve the precision of Internet searching
  • allow queries over the whole Internet (the “FROM *” clause)
  • work over streams as well as static files
  • monitor the Internet for changes
• Not finished yet...

Current status:
• Completed three Java prototypes:
  • A “text-in-context” XML search engine.
  • An XML-QL query engine.
  • An XML-QL trigger engine.
• Doing the same thing (again) in C++, maybe with Quilt as the query language.
• Finding (solving?) interesting research problems along the way...

Text-in-Context XML Search Engine
• Rather than asking: What are all the documents that contain the string “Montreal”?
• We can ask: What are all the documents that contain ship departure information for a ship whose name is “Montreal”?

How it works:
• Locate documents by crawling the web or from explicit user input.
• Build a local index on these documents that supports fast evaluation of Search Engine Query Language (SEQL) queries.
• Return the URLs of documents that satisfy SEQL queries.
• Two uses: stand-alone, or as part of the XML-QL query engine.
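The indexing idea can be illustrated with a toy in-memory index. This is a minimal sketch under our own assumptions and does not describe Niagara's actual index. Each word is recorded with the path of element tags that encloses it. A query can then ask for “Montreal” inside a ship's name rather than anywhere in the file. The class `ContextIndex`, the example.com URLs, and the `ship`/`name` tags are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


class ContextIndex:
    """Toy text-in-context index: word -> {(url, tag path enclosing the word)}."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_document(self, url, xml_text):
        self._index_element(url, ET.fromstring(xml_text), ())

    def _index_element(self, url, elem, parent_path):
        path = parent_path + (elem.tag,)
        # Words directly inside this element are indexed with its full path.
        for word in (elem.text or "").split():
            self.postings[word.lower()].add((url, path))
        for child in elem:
            self._index_element(url, child, path)
            # Text after a child's closing tag still belongs to this element.
            for word in (child.tail or "").split():
                self.postings[word.lower()].add((url, path))

    def search(self, word, context=()):
        """Return sorted URLs where `word` occurs inside the element path `context`.

        `context` must match the innermost tags of the word's enclosing path,
        e.g. ("ship", "name") means "inside a <name> that is a child of a <ship>".
        An empty context matches the word anywhere, like a plain keyword search.
        """
        context = tuple(context)
        return sorted({
            url
            for url, path in self.postings.get(word.lower(), ())
            if not context or path[-len(context):] == context
        })


index = ContextIndex()
index.add_document(
    "http://example.com/a.xml",
    "<departures><ship><name>Montreal</name><date>1999-06-01</date></ship></departures>",
)
index.add_document(
    "http://example.com/b.xml",
    "<hotels><hotel><city>Montreal</city></hotel></hotels>",
)
print(index.search("Montreal"))                    # both URLs
print(index.search("Montreal", ("ship", "name")))  # only a.xml
```

A plain keyword search returns both files. The context-qualified search returns only the document where “Montreal” is a ship's name.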

XML-QL Query Engine
• Evaluates queries expressed in XML-QL.
• The result is XML.
• Different from the search engine: instead of asking “Find all files with ship departure events where the ship’s name is ‘Montreal’”, we can ask “What is a list of departure dates for ships named ‘Montreal’?”

Ex: Fragment of XML file...
<department>
  <deptname> Electrical Engineering </deptname>
  <faculty>
    <name>
      <lastname>Robertson</lastname>
      <firstname>Pedro</firstname>
    </name>
    <phone>6988086</phone>
    <email>Robertson.Pedro@foo.edu</email>
    <office>660</office>
  </faculty>
</department>

XML-QL Query...
WHERE <department>
        <deptname>"Electrical Engineering"</>
        <faculty>
          <name>
            <lastname> $v2 </>
            <firstname> $v3 </>
          </>
        </> content_as $v4
CONSTRUCT <fname> $v4 </>
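To make the pattern-matching semantics concrete, the WHERE clause can be hand-translated into ordinary tree navigation. This is a minimal Python sketch using only `xml.etree.ElementTree`, not the Niagara engine. It assumes that `content_as $v4` binds the content of the matched `<faculty>` element. The functions `match_where` and `construct` and the wrapper element `<result>` are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

DOC = """
<department>
  <deptname> Electrical Engineering </deptname>
  <faculty>
    <name>
      <lastname>Robertson</lastname>
      <firstname>Pedro</firstname>
    </name>
    <phone>6988086</phone>
    <email>Robertson.Pedro@foo.edu</email>
    <office>660</office>
  </faculty>
</department>
"""


def match_where(root):
    """Yield one dict of variable bindings per match of the WHERE pattern."""
    # iter() also visits root itself, so a top-level <department> matches.
    for dept in root.iter("department"):
        # Compare the element text, ignoring surrounding whitespace.
        if (dept.findtext("deptname") or "").strip() != "Electrical Engineering":
            continue
        for faculty in dept.findall("faculty"):
            for name in faculty.findall("name"):
                for last in name.findall("lastname"):
                    for first in name.findall("firstname"):
                        yield {
                            "$v2": (last.text or "").strip(),
                            "$v3": (first.text or "").strip(),
                            # Assumed reading of content_as: the <faculty> content.
                            "$v4": faculty,
                        }


def construct(bindings):
    """Build one <fname> element per binding, filled with the $v4 content."""
    result = ET.Element("result")
    for b in bindings:
        fname = ET.SubElement(result, "fname")
        for child in b["$v4"]:
            fname.append(copy.deepcopy(child))
    return result


bindings = list(match_where(ET.fromstring(DOC)))
print([(b["$v2"], b["$v3"]) for b in bindings])  # [('Robertson', 'Pedro')]
out = construct(bindings)
print(ET.tostring(out, encoding="unicode"))
```

Each nested loop corresponds to one level of the pattern. One binding is produced for every way the pattern can be matched, which is how an XML-QL query can return many results from a single document.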

Important Question
• Which documents should be consulted to answer an XML-QL query? We support three approaches:
  • explicitly listed documents (“in foo.xml”)
  • documents conforming to a DTD (“conforms to some_dtd.xml”)
  • documents that satisfy search engine predicates extracted from the query

Example of the third approach:
• Given the previous XML-QL query, which finds the first and last names of EE faculty members, the system extracts this search engine query:
  department CONTAINS (deptname IS "Electrical Engineering" AND faculty CONTAINS name CONTAINS (lastname AND firstname))
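The extraction can be sketched as a recursive walk over the WHERE pattern. This is a minimal Python sketch under our own assumptions. The pattern is supplied as nested `(tag, body)` tuples rather than parsed from XML-QL text. The translation rules are inferred from this single example and are not taken from the Niagara code:

- a variable requires only that the element exists;
- a string literal becomes `IS`;
- nested elements become `CONTAINS`, with siblings joined by `AND`.

The function `to_seql` and the constant `EE_FACULTY` are hypothetical.

```python
def to_seql(pattern):
    """Translate a (tag, body) pattern into an SEQL-style string.

    body is None for a variable ($v2, $v3, ...), a str for a literal,
    or a list of child patterns for nested elements.
    """
    tag, body = pattern
    if body is None:
        # Bound to a variable: the search engine only needs the element to exist.
        return tag
    if isinstance(body, str):
        # Literal content: require an exact match.
        return f'{tag} IS "{body}"'
    parts = [to_seql(child) for child in body]
    if not parts:
        return tag
    if len(parts) == 1:
        # A single child needs no parentheses: "faculty CONTAINS name ...".
        return f"{tag} CONTAINS {parts[0]}"
    return f"{tag} CONTAINS ({' AND '.join(parts)})"


# The WHERE pattern of the EE faculty query, written as nested tuples.
EE_FACULTY = ("department", [
    ("deptname", "Electrical Engineering"),
    ("faculty", [
        ("name", [
            ("lastname", None),
            ("firstname", None),
        ]),
    ]),
])

print(to_seql(EE_FACULTY))
```

On this input the function reproduces the search engine query shown above. The extracted query is deliberately weaker than the original XML-QL query: it selects candidate documents but does not compute the answer.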

Control Flow for a Typical Query
• The full flow of a typical XML-QL query:
  • the user submits an XML-QL query
  • the system extracts an SEQL query from the XML-QL query and passes it to the search engine
  • the search engine evaluates the SEQL query and returns a list of URLs to the XML-QL query engine
  • the XML-QL engine fetches the documents in the URL list and evaluates the query
  • the answer is returned to the user
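The five steps can be traced in a toy, single-process sketch. Everything here is a simplifying assumption for illustration:

- `WEB` is a hypothetical dictionary standing in for the Internet, and its URLs and names are invented;
- the query is hand-encoded instead of parsed;
- the "SEQL query" is a small Python predicate (required tags plus required tag values) rather than real SEQL text.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the Web: URL -> XML text.
WEB = {
    "http://example.edu/ee.xml": (
        "<department><deptname>Electrical Engineering</deptname>"
        "<faculty><name><lastname>Robertson</lastname>"
        "<firstname>Pedro</firstname></name></faculty></department>"
    ),
    "http://example.edu/cs.xml": (
        "<department><deptname>Computer Sciences</deptname>"
        "<faculty><name><lastname>Doe</lastname>"
        "<firstname>Jane</firstname></name></faculty></department>"
    ),
    "http://example.edu/menu.xml": "<menu><item>Soup</item></menu>",
}

# Step 1: the user's query (hand-encoded; no XML-QL parser here).
QUERY = {"dept": "Electrical Engineering"}


def extract_seql(query):
    """Step 2: reduce the query to a predicate the search engine can check."""
    return {"tags": {"department", "faculty", "lastname", "firstname"},
            "equals": {"deptname": query["dept"]}}


def search_engine(seql):
    """Step 3: return URLs of matching documents (a real engine would use an index)."""
    hits = []
    for url, text in WEB.items():
        root = ET.fromstring(text)
        tags = {e.tag for e in root.iter()}
        values = {e.tag: (e.text or "").strip() for e in root.iter()}
        if seql["tags"] <= tags and all(
                values.get(t) == v for t, v in seql["equals"].items()):
            hits.append(url)
    return hits


def fetch(url):
    """Step 4a: fetch and parse a document (here: a dictionary lookup)."""
    return ET.fromstring(WEB[url])


def evaluate(query, docs):
    """Step 4b: evaluate the query over the fetched documents, producing XML."""
    answer = ET.Element("answer")
    for root in docs:
        for dept in root.iter("department"):
            if (dept.findtext("deptname") or "").strip() != query["dept"]:
                continue
            for name in dept.iterfind("faculty/name"):
                person = ET.SubElement(answer, "person")
                ET.SubElement(person, "last").text = name.findtext("lastname")
                ET.SubElement(person, "first").text = name.findtext("firstname")
    return answer


def run(query):
    urls = search_engine(extract_seql(query))
    docs = [fetch(u) for u in urls]
    return evaluate(query, docs)  # Step 5: this is what the user receives


print(ET.tostring(run(QUERY), encoding="unicode"))
# <answer><person><last>Robertson</last><first>Pedro</first></person></answer>
```

The search engine step keeps the query engine from fetching `menu.xml` and `cs.xml` at all. Only the query engine produces the actual answer data.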

XML-QL Trigger Engine
• Goals:
  • allow users to define “triggers” on XML files using XML-QL predicates
  • scale to huge numbers of triggers by exploiting commonality among sets of triggers
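One way to exploit commonality is to group triggers that share the same structure and differ only in a constant. Each incoming document is then examined once per group rather than once per trigger. The sketch assumes every trigger has the form “notify me when a `<ship>` element has a `<name>` equal to c”. The class `TriggerGroup` and the trigger IDs are hypothetical. This is a generic grouping technique chosen for illustration, not a description of how the Niagara trigger engine is built.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


class TriggerGroup:
    """Triggers sharing one pattern (parent/child tags), differing only in a constant."""

    def __init__(self, parent_tag, child_tag):
        self.parent_tag = parent_tag
        self.child_tag = child_tag
        # constant value -> IDs of the triggers waiting for that value
        self.by_constant = defaultdict(set)

    def add_trigger(self, trigger_id, constant):
        self.by_constant[constant].add(trigger_id)

    def fire(self, xml_text):
        """Scan the document once and return the IDs of all triggers that match."""
        fired = set()
        root = ET.fromstring(xml_text)
        for parent in root.iter(self.parent_tag):
            for child in parent.findall(self.child_tag):
                value = (child.text or "").strip()
                # One hash lookup replaces evaluating every trigger separately.
                fired |= self.by_constant.get(value, set())
        return fired


group = TriggerGroup("ship", "name")
group.add_trigger("t1", "Montreal")
group.add_trigger("t2", "Quebec")
group.add_trigger("t3", "Montreal")
doc = "<departures><ship><name>Montreal</name></ship></departures>"
print(sorted(group.fire(doc)))  # ['t1', 't3']
```

With this grouping, the cost of processing a document depends on the document size and the number of groups. It does not depend on the number of triggers in each group.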

Some research topics...
• Semantics and implementation of queries over streams?
• Use an RDBMS for anything at all?
• How smart should the search engine be?
• Can you use caching anywhere?
• Query optimization: plan space, statistics?
• How should you index (cached?) XML?
• What do you do with queryable sources?
• How do you handle huge numbers of triggers?
• Performance, performance, performance.

A Petabyte in Your Pocket
David DeWitt, Dave Maier @ OGI, Jeff Naughton

Title of NSF ITR Project
• What does it mean?
  • The goal is to have, available from a PDA, your evolving and customized view of all the online digital data that exists anywhere.
  • The goal is not to develop holographic memory technology or DNA-based storage units.

What the PetDB is:
• An example of what can be done with new software infrastructure termed “Net Data Managers” (NDMs).
• NDMs:
  • focus on data movement as well as storage
  • store and query data of arbitrary types without a schema having been defined
  • execute queries and triggers over tens of thousands of information sites

Connection with Niagara...
• Niagara is a very early prototype of a simple NDM.
• Project goals:
  • continue developing Niagara and working on the research problems that arise
  • prototype a simple NDM application using Niagara to see if we are on the right track

For more information...
• Web site: http://www.cs.wisc.edu/niagara
• Talk to me or any other Niagara project member...
