Haystack Dennis Quan Oxygen Workshop, January, 2002
Introduction • Personalized information store • Semistructured data with arbitrary metadata • Unified ontology • Standards-based components and infrastructure • Compatible with existing systems • Example user interface • Integration with mail and groupware concepts • Collaboration possibilities
What is an Ontology? • “The branch of metaphysics that deals with the nature of being.” – American Heritage Dictionary • Describes relationships between different objects in a system • Like schemata or class hierarchies
Resource Description Framework (RDF) • Standard defined by W3C in 1999 (http://www.w3.org/RDF/) • Models statements of the form: <subject> <predicate> <object> • Can be expressed as a labeled, directed graph • For example, statements “Bob likes Alice” and “Bob likes Jane”: [Figure: graph with edges Bob -likes-> Alice and Bob -likes-> Jane]
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="Bob">
    <likes rdf:resource="Alice" />
    <likes rdf:resource="Jane" />
  </rdf:Description>
</rdf:RDF>
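The subject/predicate/object model can be sketched in a few lines of Python. This is only an illustration of the data model, not Haystack code; the names `statements` and `to_graph` are made up for the example.

```python
from collections import defaultdict

# Each statement is a (subject, predicate, object) triple.
statements = {
    ("Bob", "likes", "Alice"),
    ("Bob", "likes", "Jane"),
}


def to_graph(triples):
    """Group triples into a labeled, directed graph.

    The result maps each subject node to a list of
    (edge label, target node) pairs.
    """
    graph = defaultdict(list)
    for subject, predicate, obj in sorted(triples):
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = to_graph(statements)
print(graph)
```

Each triple becomes one labeled edge, so the two statements about Bob yield two outgoing "likes" edges from the same node.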
RDF Store • RDF Store used by Haystack to store all information • Runs off of a standard SQL database • Provides querying facility • Example: who likes Jane? (?x likes Jane); return ?x
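A minimal sketch of how a triple store might sit on top of a SQL database, using Python's built-in `sqlite3` module. The single-table layout, the table name `statements`, and the helper `who` are assumptions made for this example; the slides do not describe Haystack's actual schema or query syntax.

```python
import sqlite3

# Assumed layout: one row per RDF statement.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE statements (subject TEXT, predicate TEXT, object TEXT)"
)
conn.executemany(
    "INSERT INTO statements VALUES (?, ?, ?)",
    [("Bob", "likes", "Alice"), ("Bob", "likes", "Jane")],
)


def who(predicate, obj):
    """Answer a query of the form (?x predicate obj); return every ?x."""
    rows = conn.execute(
        "SELECT subject FROM statements "
        "WHERE predicate = ? AND object = ? ORDER BY subject",
        (predicate, obj),
    )
    return [row[0] for row in rows]


print(who("likes", "Jane"))
```

The query pattern `(?x likes Jane)` becomes a `WHERE` clause on the fixed terms, and the variable `?x` becomes the selected column.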
Belief • With multitude of information, how much is believable? • Annotate who said what • Also can describe belief network using RDF • Example: John says that Bob likes Jane, and Bob believes John • Belief Server: component of Haystack that evaluates belief network and “filters” the store for information believed by the user [Figure: graph in which the statement “Bob likes Jane” is assertedBy John, and Bob believes John]
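The filtering idea can be sketched as follows. This is a simplified illustration, not the Belief Server's algorithm: it assumes that trust is direct (not transitive) and that users always believe their own statements. The names `asserted`, `believes`, and `believed_by` are hypothetical.

```python
# Each entry pairs a statement with the person who asserted it.
asserted = [
    (("Bob", "likes", "Jane"), "John"),
]

# Who believes whom (assumed to be direct trust only).
believes = {"Bob": {"John"}}


def believed_by(user, asserted, believes):
    """Return the statements asserted by the user or by someone the user believes."""
    trusted = believes.get(user, set()) | {user}
    return [stmt for stmt, author in asserted if author in trusted]


print(believed_by("Bob", asserted, believes))
print(believed_by("Alice", asserted, believes))
```

Bob sees John's statement because Bob believes John; Alice, who has no belief edge to John, sees nothing.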
Collections • Basic means of aggregation • Difference from “folders”: containment versus membership • Categorization and subcategories
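One reading of the containment/membership distinction is that a folder holds an object in exactly one place, whereas an object can be a member of several collections at once. The sketch below illustrates that reading only; the collection names and object identifiers are invented.

```python
# Membership: each collection is a set of object identifiers,
# and the same object may appear in several collections.
members = {
    "work": {"doc1", "doc2"},
    "urgent": {"doc2"},
}


def collections_of(obj):
    """List every collection that has obj as a member."""
    return sorted(name for name, items in members.items() if obj in items)


print(collections_of("doc2"))
```

Under this model, removing the "urgent" collection would not delete `doc2`; it would only drop one of its memberships.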
Queries • One possible means for constructing a collection (result set) • Can use all possible metadata fields to construct query • Natural language • Multiple query sources: the Web, other people’s Haystacks, etc. • Automatic update of query result sets • Possibilities for machine learning (e.g., a user removing an item from a result set signals to Haystack that the object does not belong)
Services • Callable services in Haystack • Also, automatic agents that respond to events • Available methods described in metadata • Haystack service initialization script also described in metadata • Services mainly written in Java, but can be written in any language
SOAP, WSDL and UDDI • Relationship to Web Services standards: • Simple Object Access Protocol (SOAP) http://www.w3.org/TR/SOAP/ • Web Services Description Language (WSDL) http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnwebsrv/html/wsdl.asp • Universal Description, Discovery and Integration (UDDI) http://www.uddi.org/ • SOAP and HTTP PUT used for communication between services, including the RDF Store • RDFized version of WSDL used to describe services’ interfaces • UDDI query functionality easily modeled in RDF query
Inference Layer • The semantics defined in RDF often permit deduction • Example: Fido is a dog and dogs are mammals, therefore Fido is a mammal • Deduced knowledge is useful and should be stored • Inference Layer recognizes patterns and triggers agents/services to perform deduction
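The Fido example can be worked through with a small forward-chaining loop. This is a sketch of the deduction itself, not of how the Inference Layer triggers agents; the predicate names `type` and `subClassOf` are chosen for the example.

```python
def infer_types(triples):
    """Apply the rule (x type C), (C subClassOf D) => (x type D) until nothing new appears."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(facts):
            if p != "type":
                continue
            for s2, p2, o2 in list(facts):
                if p2 == "subClassOf" and s2 == o:
                    deduced = (s, "type", o2)
                    if deduced not in facts:
                        facts.add(deduced)  # store the deduced knowledge
                        changed = True
    return facts


facts = infer_types({
    ("Fido", "type", "Dog"),
    ("Dog", "subClassOf", "Mammal"),
})
print(("Fido", "type", "Mammal") in facts)
```

The deduced triple is added back into the fact set, which mirrors the point that deduced knowledge should be stored rather than recomputed.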
Views • May be several different ways of looking at an object • Example: appointment book can be viewed as a sortable list of appointments or a calendar • Views are a distinct type of object used to model these different ways of looking at objects
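The appointment book example might look like this in code: the same data rendered through two different view functions. The appointment entries and the function names `list_view` and `calendar_view` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical appointment book: (ISO date, title) pairs.
appointments = [
    ("2002-01-15", "Workshop talk"),
    ("2002-01-14", "Slides review"),
    ("2002-01-15", "Dinner"),
]


def list_view(appts):
    """View 1: a flat list sorted by date, then title."""
    return sorted(appts)


def calendar_view(appts):
    """View 2: titles grouped under the day they fall on."""
    calendar = defaultdict(list)
    for day, title in appts:
        calendar[day].append(title)
    return dict(calendar)


print(list_view(appointments))
print(calendar_view(appointments))
```

Neither function changes the underlying object, which is why a view can be modeled as a separate kind of object from the data it displays.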
User Interface Ontology • UI components (e.g. JavaBeans, ActiveX controls) rich sources of metadata • Form descriptions also describable with metadata • Possible to construct a directed graph that models a user interface • Similar in concept to XUL • Permits dynamic deduction of user interface similar to XSLT, except semantic rather than syntactic • Part: a Haystack UI component • ViewPart: a kind of part specially designed to display a specific kind of View
SWT • Cross-platform Java widget toolkit • Part of Eclipse project (http://www.eclipse.org/) • Uses native operating systems’ widgets, avoiding performance problems • Used for Part framework • Integrates with Mozilla web browser • Also possible to use ActiveX controls and GTK widgets
Ozone • Haystack experimental user interface • Modeled after a web browser • Uses parts to describe user interface
Browse/Query Paradigm • Browsing: going through nested folders/categories to locate sought item(s) • Query: giving an explicit set of conditions to locate sought item(s) • Ozone adopts hybrid Browse/Query paradigm • Traditional subcategories still present in Collection view • Also, parameterized categories similar to queries • Previously issued queries persist as subcategories
Mail • E-mail a good source of metadata-rich documents • Messages, e-mail addresses, people and groups can be modeled in RDF • Haystack agents can be used to filter e-mail to make it more manageable • Many e-mail management techniques applicable to documents in general and vice versa
Storage Model • Objects in Haystack named by Uniform Resource Identifiers (URIs) • URLs are a subclass of URIs • Documents and web pages can be named by URLs • HTTP/FTP/WebDAV servers can then be used to store documents • Inefficient to store terabytes of “data” in RDF when existing storage solutions are effective
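As a rough sketch of the naming scheme, the snippet below inspects a URI's scheme to decide whether the object's content can live on an existing server or only as metadata in the RDF store. The routing rule, the example URIs, and the name `storage_for` are assumptions for illustration.

```python
from urllib.parse import urlparse

# Schemes whose URIs are also URLs pointing at an external server (assumed set).
REMOTE_SCHEMES = {"http", "https", "ftp"}


def storage_for(uri):
    """Pick a storage location based on the URI scheme."""
    scheme = urlparse(uri).scheme
    return "external server" if scheme in REMOTE_SCHEMES else "RDF store"


print(storage_for("http://example.org/report.pdf"))
print(storage_for("urn:example:note-1"))
```

Every object keeps a URI as its name either way; only the bulk content is delegated to the server that the URL identifies.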
Collaboration • Allow Haystack-Haystack and Haystack-Semantic Web information exchange • Filtration of imported data • “Who’s the expert?” problem • Privacy concerns • Different ways of organizing information between different parties • Can be used to model mailing lists, newsgroups, and groupware
Ontological Conversion • Unlikely that everyone will agree on the same schemata • Ontological conversion converts from one schema to another • Can be implemented as Haystack agents that respond to metadata with “foreign” schemata
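In its simplest form, a schema conversion is a renaming of predicates. The sketch below assumes a one-to-one mapping between a "foreign" vocabulary and a local one; real conversions may need to restructure statements as well. All predicate names here are hypothetical.

```python
# Hypothetical mapping from a foreign schema's predicates to local ones.
PREDICATE_MAP = {
    "foreign:author": "local:creator",
    "foreign:title": "local:title",
}


def convert(triples, mapping):
    """Rewrite each predicate through the mapping; leave unknown predicates as-is."""
    return [(s, mapping.get(p, p), o) for s, p, o in triples]


imported = [
    ("doc1", "foreign:author", "Alice"),
    ("doc1", "foreign:pages", "12"),
]
print(convert(imported, PREDICATE_MAP))
```

An agent built along these lines would watch for incoming statements that use a foreign predicate and add the converted statements to the store.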
Implementation • Written for Java 2 platform (JDK 1.3.1) • SWT (Eclipse) used for user interface components • Mozilla web browser • HSQL open source SQL database written in Java • Lucene (Apache Jakarta project) search engine written in Java • Tomcat (Apache Jakarta project) web server written in Java • Parts written in Jython, Java-based Python interpreter