PSU/Villanova/VT Discussion: Virginia Tech’s Digital Library Research Laboratory. Jan. 10, 2005 -- PSU. Edward A. Fox, fox@vt.edu, Virginia Tech, Blacksburg, VA 24061 USA. http://fox.cs.vt.edu/talks/ http://fox.cs.vt.edu/cv.htm
Acknowledgements (Selected) • Sponsors: ACM, Adobe, AOL, CAPES, CNI, CONACyT, DFG, IBM, Microsoft, NASA, NDLTD, NLM, NSF (IIS-9986089, 0086227, 0080748, 0325579; ITR-0325579; DUE-0121679, 0136690, 0121741, 0333601), OCLC, SOLINET, SUN, SURA, UNESCO, US Dept. Ed. (FIPSE), VTLS
Acknowledgements: Faculty, Staff • Lillian Cassel, Debra Dudley, Roger Ehrich, Joanne Eustis, Weiguo Fan, James Flanagan, C. Lee Giles, Eberhard Hilf, John Impagliazzo, Filip Jagodzinski, Rohit Kelapure, Neill Kipp, Douglas Knight, Deborah Knox, Aaron Krowne, Alberto Laender, Gail McMillan, Claudia Medeiros, Manuel Perez, Naren Ramakrishnan, Layne Watson, …
Acknowledgements: Students • Pavel Calado, Yuxin Chen, Fernando Das Neves, Shahrooz Feizabadi, Robert France, Marcos Goncalves, Nithiwat Kampanya, S.H. Kim, Aaron Krowne, Bing Liu, Ming Luo, Paul Mather, Saverio Perugini, Unni Ravindranathan, Ryan Richardson, Rao Shen, Ohm Sornil, Hussein Suleman, Ricardo Torres, Wensi Xi, Xiaoyan Yu, Baoping Zhang, Qinwei Zhu, …
Stepping Stones & Pathways: Improving retrieval by chains of relationships between document topics. Fernando Das-Neves, Virginia Tech DLRL
A Little Experiment (Compare a simple query with a longer version that explicitly includes stepping stones) • “Literary Style in Sherlock Holmes stories” • Note: Numbers are the total relevant web pages in the top 20 Google results for the query made up of the terms on either end of the link. [Figure: number of relevant documents for the simple query vs. the version with stepping stones]
Another Example • “What is the Relationship between Data Mining and Recommender Systems?” • Naïve Results: There are many matches that are possible answers. • Discussion: But many of the pages with co-occurrences give no real information about the requested relationship. [Figure: the direct link between Data Mining and Recommender Systems vs. pathways through the stepping stones Machine Learning, Collaborative Filtering, and Social Networks; each link is labeled with its number of relevant documents]
An Alternative Interpretation of a Query in IR: • A query represents two related, separable concepts. • Objective: Retrieve a sequence of documents that support a valid set of chains of relationships between the two concepts. • Input: a query representing two concepts. • Output: two groups of documents + a set of stepping stones (document groups, i.e., clusters) connecting the topics by pathways (relations among clusters).
Types of Questions Matching the Alternative Interpretation • Ill-defined questions, with non-enumerated answers: • “How or why is X related to Y?” • “What is the X of Y?” • Even when queries of the form “give me something about X” lead to relevant docs, making the relations explicit (as our semi-automatic method does) can increase the quantity and quality of information in the query result.
Why is this useful? • Questions of this type are common. • For example, such questions often occur during research studies. • These occur often in educational settings, e.g., for homework. • These occur often in workplace settings, requiring gathering and relating of information. • Handling of this type of question by current systems often is inadequate.
How to Build Stepping Stones and Pathways? • Our approach involves a belief network, to combine content+structure in document similarity calculation, including citation and co-citation similarities. • Find two relevant document sets, each related to one of the two original sub-queries. • Find a diverse set of strong candidates, each connecting the two subsets, but as different as possible from other candidates. • Create stepping stones by finding similar documents to those candidates; keep the clusters that are heavily cited, or whose documents are highly correlated (in all aspects). • Repeat the process, finding a new stepping stone in between each pair of clusters that are weakly related, until the pathway length is too long, or the similarity is sufficient.
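The first three steps above can be sketched in a few lines. This is only an illustration under stated assumptions: plain bag-of-words cosine similarity stands in for the belief network (no citation or co-citation evidence is used), the recursive refinement of weak links is left out, and every function name, threshold, and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def vec(text):
    """Bag-of-words vector (assumed stand-in for the belief-network similarity)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_docs(query, docs, k):
    """Step 1: ids of the k documents most similar to one sub-query."""
    q = vec(query)
    return sorted(docs, key=lambda d: cosine(q, vec(docs[d])), reverse=True)[:k]

def centroid(ids, docs):
    total = Counter()
    for d in ids:
        total.update(vec(docs[d]))
    return total

def diverse_candidates(set_a, set_b, docs, n, min_link=0.1):
    """Step 2: greedily pick documents tied to both sets but unlike each other."""
    ca, cb = centroid(set_a, docs), centroid(set_b, docs)
    pool = [d for d in docs if d not in set_a and d not in set_b]
    # A candidate is only as strong as its weaker link to the two endpoint sets.
    strength = {d: min(cosine(vec(docs[d]), ca), cosine(vec(docs[d]), cb)) for d in pool}
    pool = [d for d in pool if strength[d] >= min_link]
    chosen = []
    while pool and len(chosen) < n:
        def score(d):
            # Penalize similarity to candidates that were already chosen.
            overlap = max((cosine(vec(docs[d]), vec(docs[c])) for c in chosen), default=0.0)
            return strength[d] - overlap
        best = max(pool, key=score)
        chosen.append(best)
        pool.remove(best)
    return chosen

def stepping_stone(candidate, docs, exclude, threshold=0.3):
    """Step 3: cluster the candidate with the documents most similar to it."""
    c = vec(docs[candidate])
    return [d for d in docs if d not in exclude and cosine(c, vec(docs[d])) >= threshold]

# Toy corpus (hypothetical).
docs = {
    "d1": "data mining association rules",
    "d2": "data mining classification",
    "d3": "recommender systems ratings",
    "d4": "recommender systems users",
    "d5": "collaborative filtering data mining recommender systems",
    "d6": "cooking recipes pasta",
    "d7": "collaborative filtering neighbors",
}
set_a = top_docs("data mining", docs, 2)
set_b = top_docs("recommender systems", docs, 2)
candidates = diverse_candidates(set_a, set_b, docs, n=2)
stone = stepping_stone(candidates[0], docs, exclude=set(set_a) | set(set_b))
```

On this toy corpus the only document linked to both endpoint sets is `d5`, and its stepping stone also pulls in `d7`, which matches neither sub-query on its own.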
Streams, Structures, Spaces, Scenarios, and Societies (5S): A Formal Digital Library Framework and Its Applications Marcos André Gonçalves Doctoral defense Virginia Tech, Blacksburg, VA 24061 USA
Informal 5S Definition: DLs are complex systems that • help satisfy info needs of users (societies) • provide info services (scenarios) • organize info in usable ways (structures) • present info in usable ways (spaces) • communicate info with users (streams)
Hypotheses • A formal theory for DLs can be built based on 5S. • The formalization can serve as a basis for modeling and building high-quality DLs.
Research Questions 1. Can we formally elaborate 5S? 2. How can we use 5S to formally describe digital libraries? 3. What are the fundamental relationships among the Ss and high-level DL concepts? 4. How can we allow digital librarians to easily express those relationships? 5. Which are the fundamental quality properties of a DL? Can we use the formalized DL framework to characterize those properties? 6. Where in the life cycle of digital libraries can key aspects of quality be measured and how?
Outline • Motivation: the problem • Hypotheses and research questions • Part 1: Theory • 5S: introduction, formal definitions • The formal ontology • Part 2: Tools/Applications • Language • Visualization • Generation • Logging • Part 3: Quality • Conclusions, Future Work
5S and DL formal definitions and compositions (April 2004 TOIS)
Ontology: Taxonomy of Services • Infrastructure Services, Repository-Building, Creational: Acquiring, Authoring, Cataloging, Crawling (focused), Describing, Digitizing, Harvesting, Submitting • Infrastructure Services, Repository-Building, Preservational: Conserving, Converting, Copying/Replicating, Translating (format) • Infrastructure Services, Add Value: Annotating, Classifying, Clustering, Evaluating, Extracting, Indexing, Linking, Logging, Measuring, Rating, Reviewing (peer), Surveying, Training (classifier), Translating, Visualizing • Information Satisfaction Services: Binding, Browsing, Customizing, Disseminating, Expanding (query), Filtering, Recommending, Requesting, Searching
5SL: a DL Modeling language • Domain specific languages • Address a particular class of problems by offering specific abstractions and notations for the domain at hand • Advantages: domain-specific analysis, program management, visualization, testing, maintenance, modeling, and rapid prototyping. • XML-based realization of 5S • Interoperability • Use of many standard sub-languages (e.g., MIME types, XML Schemas, UML notations)
Overview of 5SGraph [Figure: screenshot with two labeled areas, the workspace (instance model) and the structured toolbox (metamodel)]
5SGen – Version 2: ODL, Services, Scenarios [Figure: generation pipeline driven by the DL Designer. Societies path: 5SL-Societies Model (1), XPath/JDOM Transform (2), XMI:Class Model (3), Xmi2Java (4), Java Classes Model (5). Scenarios path: 5SL-Scenario Model (6), XPath/JDOM Transform (7), StateChart Model (8), Scenario Synthesis (9), Deterministic FSM (10), SMC (11), Java Finite State Machine Class Controller (12), JSP User Interface View (13). The Component Pool (Java ODL Search Wrapping, Java ODL Browse Wrapping, ...) is imported, and the output is the Generated DL Services.]
The XML Log Format [Figure: element tree rooted at Log. Element names shown: Transaction, Timestamp, Statement, SessionId, MachineInfo, Event, ErrorInfo, SessionInfo, RegisterInfo, Action, StatusInfo, Update, StoreSysInfo, Search, Browse, Collection, Catalog, SearchBy, Timeout, PresentationInfo, QueryString]
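As a rough illustration, a single search event could be serialized with the element names listed on this slide. The nesting used here (Log > Transaction > Statement > Event > Action > Search) is an assumption, since the slide's tree layout did not survive; the function name and all sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

def search_log_entry(session_id, timestamp, collection, query):
    """Build one log entry for a search event (assumed nesting, see lead-in)."""
    log = ET.Element("Log")
    transaction = ET.SubElement(log, "Transaction")
    ET.SubElement(transaction, "SessionId").text = session_id
    ET.SubElement(transaction, "Timestamp").text = timestamp
    statement = ET.SubElement(transaction, "Statement")
    event = ET.SubElement(statement, "Event")
    action = ET.SubElement(event, "Action")
    search = ET.SubElement(action, "Search")
    ET.SubElement(search, "Collection").text = collection
    ET.SubElement(search, "QueryString").text = query
    return log

# Hypothetical sample values.
entry = search_log_entry("s-001", "2005-01-10T09:30:00", "CITIDEL", "digital libraries")
xml_text = ET.tostring(entry, encoding="unicode")
```

Keeping the query string and collection as separate elements, rather than one free-text field, is what would let later analysis group searches by collection.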
Rao Shen’s Preliminary Exam: Hypothesis and Research Questions • The 5S framework provides effective solutions to DL integration. • Formally define the DL integration problem? • Guide integration of domain focused DLs? • How to formally model such domain specific DLs? • How to integrate formally defined DL models into a union DL model? • How to use the union DL model to help design and implement high quality integrated DLs? • Assess the integration?
Related Work [Figure: concept map of DL interoperability approaches. Labels: intermediary-based (mediator, wrapper, agent); mapping-based (schema mapping, hybrid mapper, composite mapper, with SemInt and LSD as examples); two architectures (federation, union archiving)]
[Figure: the same concept map of DL interoperability approaches, extended with a node for the DL integration formalization, which is based on the DL interoperability approach; a "trained by GA" label is attached to the mapping side]
Formal Definition of DL Integration • $DL_i = (R_i, DM_i, Serv_i, Soc_i)$, $1 \le i \le n$ • $R_i$ is a network accessible repository • $DM_i$ is a set of metadata catalogs for all collections • $Serv_i$ is a set of services • $Soc_i$ is a society • UnionRep • UnionCat • UnionServices • UnionSociety
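A minimal sketch of the four-tuple and its union counterpart follows. It assumes each union component is a plain set or dictionary union, which the slide does not state; a real integration would also need schema mapping between catalogs. The `DL` class, the `union_dl` function, the single-dictionary catalog, and the sample objects are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DL:
    repository: set  # R_i: identifiers of the stored digital objects
    catalog: dict    # DM_i, simplified to one mapping: object id -> metadata record
    services: set    # Serv_i
    society: set     # Soc_i

def union_dl(dls):
    """Combine DL_1..DL_n into one union DL (assumed: component-wise union)."""
    union = DL(set(), {}, set(), set())
    for dl in dls:
        union.repository |= dl.repository
        for obj_id, record in dl.catalog.items():
            # Keep the first record seen for an id; no metadata mapping is attempted.
            union.catalog.setdefault(obj_id, record)
        union.services |= dl.services
        union.society |= dl.society
    return union

# Hypothetical member DLs.
dl1 = DL({"dl1:pot-17"}, {"dl1:pot-17": {"title": "Storage jar"}},
         {"searching"}, {"archaeologists"})
dl2 = DL({"dl2:bone-4"}, {"dl2:bone-4": {"title": "Animal bone"}},
         {"browsing"}, {"general public"})
union = union_dl([dl1, dl2])
```

Prefixing object identifiers with their source DL (an assumption here) avoids collisions when two repositories reuse the same local id.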
Architecture of a Union DL [Figure: DL1 and DL2 each have a society (archaeologists, general public), services (searching, browsing), a catalog (Catalog1, Catalog2), and a repository (Repository1, Repository2). The Union DL has a Union Society, Union Services (harvesting, mapping, searching, browsing, clustering, visualization), a Union Catalog, and a Union Repository.]
CitiViz: A Visual User Interface to the CITIDEL System. ECDL 2004, Bath, England, September 2004. Nithiwat Kampanya, Rao Shen, Seonho Kim, Chris North, and Edward A. Fox. fox@vt.edu http://fox.cs.vt.edu
A Minimal DL in the 5S Framework [Figure: concept map relating Streams, Structures, Spaces, Scenarios, and Societies to the DL constructs hypertext, indexing, searching, browsing, services, Structured Stream, Structural Metadata Specification, Descriptive Metadata Specification, Digital Object, Metadata Catalog, Collection, Repository, and Minimal DL]
A Minimal ArchDL in the 5S Framework [Figure: concept map relating Streams, Structures, Spaces, Scenarios, and Societies to the archaeology-specific constructs ArchObj, ArchColl, StraDia, SpaTemOrg, ArchDR, ArchDColl, ArchDO, Structured Stream, Arch Descriptive Metadata specification, Arch Metadata catalog, hypertext, services, browsing, indexing, searching, and Minimal ArchDL]
[Figure: ETANA-DL architecture. Modeling: ArchDL Expert, 5S Archaeology MetaModel, ArchDL Designer, 5SGraph, Structure Sub-model, Scenario Sub-model. Metadata: VN Metadata Format, HD Metadata Format, ETANA-DL Metadata Format, VN Catalog, HD Catalog, Mapping Tool, Wrapper4VN, Wrapper4HD. Generation: Component Pool, 5SGen, Union Services Descriptions (Harvesting, Mapping, Searching, Browsing, …). Runtime: XOAI, Union Catalog, Inverted Files, Index, Search Service, Browse DB, Browse Service, Services DB, other ETANA-DL services, Web Interface]
Computing and Information Technology Interactive Digital Educational Library (CITIDEL) • Domain: computing / information technology • Genre: one-stop-shopping for teachers & learners: courseware (CSTC, JERIC), leading DLs (ACM, IEEE-CS, DB&LP, CiteSeer), PlanetMath.org, NCSTRL (technical reports), … • Submission & Collection: sub/partner collections www.citidel.org
www.CITIDEL.org • Led by Virginia Tech, with co-PIs: • Fox (director, DL systems) • Lee (history) • Perez (user interface, Spanish support) • Students: Ryan Richardson, Kate McDevitt, Jon Pryor, Baoping Zhang • Partners • College of New Jersey (Knox) • Hofstra (Impagliazzo) • Villanova (Cassel) • Penn State (Giles)
Digital library architecture for local and interoperable CITIDEL services
CITIDEL Technology Features • Component architecture (Open Digital Library) • Re-use and compose re-deployable digital library components. • Built using open standards & technologies • OAI: used to collect DL resources and for DL interoperability • XSL and XML: interface rendering with multi-lingual, community-based translation of screens and content (Spanish, …) • Perl: component integration • ESSEX: search engine functionality • Very fast, utilizing in-memory processing • Includes snapshots for persistence • Multi-scheming (Aaron Krowne, now at Emory U. Library) • Integrates multiple classifications / views through maps, closure • Extensions: clustering, visualization, personalization, …
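To make the OAI bullet concrete, here is a small sketch of how a harvester might form OAI-PMH `ListRecords` requests. The verb and argument names (`metadataPrefix`, `from`, `set`, `resumptionToken`) come from the OAI-PMH protocol; the base URL, the helper name, and the sample values are hypothetical, and nothing here is taken from CITIDEL's own code.

```python
from urllib.parse import urlencode

def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None,
                     set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token is not None:
        # In OAI-PMH, resumptionToken is an exclusive argument: it replaces all others.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date is not None:
            params["from"] = from_date   # harvest only records changed since this date
        if set_spec is not None:
            params["set"] = set_spec     # restrict the harvest to one set
    return base_url + "?" + urlencode(params)

# Hypothetical repository endpoint.
first_page = list_records_url("http://example.org/oai", from_date="2005-01-01")
next_page = list_records_url("http://example.org/oai", resumption_token="abc123")
```

A harvester would fetch `first_page`, read the `resumptionToken` from the response if one is present, and keep requesting pages until none is returned.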
Naren Ramakrishnan and Saverio Perugini (U. Dayton) CITIDEL + PIPE • Adds interaction personalization to CITIDEL • Automatically handles multi-modal conversion to cell phone, PDA, etc. • Can be adapted to any digital data set; it requires only an XML file of the content with its hierarchy maintained.