Explore the semi-automatic mapping-based integration of diverse collections into archaeological digital libraries using the Megiddo case study, detailing the challenges and innovative approaches in this domain.
Incremental, Semi-automatic, Mapping-Based Integration of Heterogeneous Collections into Archaeological Digital Libraries: Megiddo Case Study ECDL 2005, Vienna, September 19, 2005 Ananth Raghavan, Naga Srinivas Vemuri, Rao Shen, Marcos André Gonçalves, Weiguo Fan, and Edward A. Fox fox@vt.edu http://fox.cs.vt.edu
Acknowledgements (Selected) • Sponsors: NSF grant ITR-0325579; AOL, ASOR, CWRU, ETANA, Vanderbilt U., Virginia Tech • Faculty/Staff: Lillian Cassel, Debra Dudley, Roger Ehrich, Manuel Perez, … • VT (Former) Students: Doug Gorton, Aaron Krowne, Ming Luo, Hussein Suleman, Ricardo Torres, …
Acknowledgements (Selected) • Karen Borstad, MPP • Giorgio Buccellati, UCLA • Douglas Clark, Walla Walla College • Joanne Eustis, CWRU • Nick Fischio, CWRU • Israel Finkelstein, Tel-Aviv University • Paul Gherman, Vanderbilt U. • Andrew Graham, U. Toronto • Tim Harrison, U. Toronto • Larry Herr, Canadian University College • Christopher Holland, LRP • Paul Jacobs, Mississippi State U. • Douglas Knight, Vanderbilt U. • Stan LaBianca, Andrews U. • David McCreery, Willamette U. • Eric Meyers, Duke U. • Adam Porter, Illinois College • Jack Sasson, Vanderbilt U. • Tom Schaub, Indiana U. of Penn. • Randall Younker, Andrews U. • Doug Gorton, Virginia Tech
Outline • Problems • Background: ETANA-DL, Megiddo • Approaches • Within the 5S framework • Visual mapping service • Multi-dimensional browsing • Conclusions • Future Work
Problems • Vast quantities of heterogeneous archaeological data • Integration is a monumental task • Wrapper automation • A global schema is difficult to construct in the archaeological domain
Background ETANA-DL Web Site
Background (Cont.) • Megiddo Collection • Archaeological site in Israel • Contains over 30,000 records • 7 different types of artifacts • Wall • Locus • Pottery Bucket • Flint tool • Vessel • Lab Item • Miscellaneous Artifact
Approaches • Within the 5S framework • Visual mapping service • Semi-automatically generates a wrapper using a visual schema mapping tool, which at the same time improves the global schema. • Multi-dimensional browsing service • Extends access to newly integrated collections through a multi-dimensional browsing component.
[Figure: DL development workflow with four stages: Requirements (1), Analysis (2), Design (3), Implementation (4). Labels: 5S Meta Model, DL Expert, DL Designer, Practitioner, Teacher, Researcher, 5SGraph, 5SL DL Model, component pool (ODLSearch, ODLBrowse, ODLRate, ODLReview, …), 5SLGen, Tailored DL Services. 5SSuite: 5SGraph, 5SGen, Mapping Tool.]
[Figure: ArchDL integration architecture. Labels: ArchDL Expert, 5S Archaeology MetaModel, ArchDL Designer, 5SGraph, Structure Sub-model, Local Schema, ETANA-DL Schema, Mapping Tool, Wrapper, Local data, Global data, Union Catalog, Scenario Sub-model, ETANA-DL Union Services Descriptions (Harvesting, Mapping, Searching, Browsing, …), 5SGen, Component Pool (Browsing, …), Multi-dimension Browsing Service.]
Megiddo Site Organization in Structure Sub-model [Figure: tree rooted at Megiddo with nodes Area, Square, Locus, Pottery bucket, Flint tool, Vessel, Lab item, Miscellaneous artifact]
Visual Mapping Service • Features of visual schema mapping tool • Scenario usage • Mapping Megiddo local schema into ETANA global one • Usability evaluation
Features of Visual Schema Mapping Tool • Schema visualization using hyperbolic trees • Recommendation engine that uses 3 algorithms • Name-based matching (edit distance) • Rules • Mapping history • Colors to distinguish between different types of schema nodes (root, leaf, non-leaf, selected, recommended, and mapped) • Mapping table that stores mappings from local to global nodes • Allows renaming or deleting a node, and adding a local schema sub-tree as a child in the global schema. • Generates an XSLT style sheet as the result of the mapping process.
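The name-based matching listed above can be illustrated with a small sketch. It assumes the standard Levenshtein edit distance and a case-insensitive comparison; the function names and the `max_distance` threshold are hypothetical and are not taken from the tool itself.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1 each."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca with cb
        prev = curr
    return prev[-1]


def recommend_by_name(local_name, global_names, max_distance=2):
    """Return global node names within max_distance of local_name, closest first.

    The threshold and the case-insensitive comparison are assumptions
    made for this sketch.
    """
    scored = [(edit_distance(local_name.lower(), g.lower()), g) for g in global_names]
    return [g for d, g in sorted(scored) if d <= max_distance]
```

With these assumptions, a local node named `Locus` would be recommended for the global node `LOCUS`, while unrelated global names fall outside the threshold.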
Mapping Megiddo Local Schema into ETANA Global Schema • Mapping of flint tool and vessel collections • Name-based matching (edit distance) • Rules • Area -> PARTITION • Square1 -> SUBPARTITION • OriginalBucket -> CONTAINER • Locus -> LOCUS • Mapping history
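A minimal sketch of how the rules and the mapping history could feed a recommendation. The four rules are the ones listed on the slide; the function names, the dictionary representation, and the precedence (rules before history) are assumptions made only for illustration.

```python
# Rules as listed on the slide: local node name -> global node name.
RULES = {
    "Area": "PARTITION",
    "Square1": "SUBPARTITION",
    "OriginalBucket": "CONTAINER",
    "Locus": "LOCUS",
}


def save_mappings(history, mappings):
    """Record confirmed mappings so later collections can reuse them (hypothetical helper)."""
    history.update(mappings)


def recommend(local_name, rules, history):
    """Return (global_name, source), or None when neither a rule nor the history applies.

    Checking rules before history is an assumption of this sketch.
    """
    if local_name in rules:
        return rules[local_name], "rule"
    if local_name in history:
        return history[local_name], "history"
    return None


# Example: a mapping saved while integrating one collection is
# recommended again when the next collection is mapped.
history = {}
save_mappings(history, {"Description": "DESCRIPTION"})
```

Here `recommend("Area", RULES, history)` yields the rule-based target, and `recommend("Description", RULES, history)` yields the target remembered from the earlier session.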
Mapping Megiddo Local Schema into ETANA Global Schema (Cont.) Initial set of mappings for flint tool based on rules and name-based matching
Mapping Megiddo Local Schema into ETANA Global Schema (Cont.) Adding FLINT sub-tree as a child of OBJECT in the global schema
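The sub-tree operation on this slide can be sketched with Python's standard `xml.etree.ElementTree`. The schema fragments below are invented stand-ins (only `OBJECT`, `FLINT`, and `Description` appear in the slides), and `add_subtree` is a hypothetical helper, not the tool's API.

```python
import copy
import xml.etree.ElementTree as ET


def add_subtree(global_root, parent_tag, local_subtree):
    """Attach a copy of local_subtree under the first global node named parent_tag."""
    if global_root.tag == parent_tag:
        parent = global_root
    else:
        parent = global_root.find(f".//{parent_tag}")
    if parent is None:
        raise ValueError(f"no node named {parent_tag!r} in the global schema")
    # Copy so that the local schema tree itself is left untouched.
    parent.append(copy.deepcopy(local_subtree))
    return global_root


# Invented, simplified schema trees used only for illustration.
global_schema = ET.fromstring("<GLOBAL><OBJECT/></GLOBAL>")
local_flint = ET.fromstring("<FLINT><Description/></FLINT>")
add_subtree(global_schema, "OBJECT", local_flint)
```

Copying rather than moving the sub-tree keeps the local schema available for further mapping steps; that choice is part of the sketch, not a documented behavior.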
Mapping Megiddo Local Schema into ETANA Global Schema (Cont.) Global node Description renamed to DESCRIPTION, and user choosing to Save Mappings
Mapping Megiddo Local Schema into ETANA Global Schema (Cont.) Flint tool style sheet generated
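To make the generated artifact concrete, here is a minimal sketch of turning a 1-1 mapping table into an XSLT 1.0 style sheet. It assumes the mapping table is a dictionary from local element names to global element names and that both are valid XML names; the real tool's output format is not shown in the slides, so `generate_xslt` and its layout are hypothetical.

```python
XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def generate_xslt(mappings):
    """Build an XSLT 1.0 style sheet that renames mapped elements.

    mappings: dict of local element name -> global element name.
    Unmapped nodes are copied unchanged by the identity template.
    """
    lines = [
        f'<xsl:stylesheet version="1.0" xmlns:xsl="{XSL_NS}">',
        "  <!-- Identity template: copy anything without a mapping -->",
        '  <xsl:template match="@*|node()">',
        '    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>',
        "  </xsl:template>",
    ]
    for local, global_name in sorted(mappings.items()):
        lines += [
            f'  <xsl:template match="{local}">',
            f'    <xsl:element name="{global_name}">',
            '      <xsl:apply-templates select="@*|node()"/>',
            "    </xsl:element>",
            "  </xsl:template>",
        ]
    lines.append("</xsl:stylesheet>")
    return "\n".join(lines)


stylesheet = generate_xslt({"Area": "PARTITION", "Locus": "LOCUS"})
```

The element-name templates take precedence over the identity template under XSLT's default priority rules, so mapped elements are renamed while everything else passes through. Running the style sheet requires an XSLT processor, which is outside the Python standard library.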
Mapping Megiddo Local Schema into ETANA Global Schema (Cont.) Mapping the Vessel collection using the View Only Top-Level Leaf Nodes option
Mapping Megiddo Local Schema into ETANA Global Schema (Cont.) Name change recommendation based on mapping history
Usability Evaluation • Claims Analysis • Exploring trade-offs between • linear representation vs. hyperbolic tree representation with recommendations, in terms of mapping speed • scrolling in the linear representation vs. re-orient actions in hyperbolic trees • representing mappings as lines across the screen vs. in a separate mapping table • mapping and editing in the same tool vs. in different tools, in terms of ease of use and of editing and mapping speed • Benchmark Tasks (BTs) to explore the above claims • Comparison between Schema Mapper and MapForce for 1-1 schema mapping (as found in ETANA-DL)
Benchmark Task 1 • Required the user to map 6 given nodes from the local to the global schema. • Used to compare time, scrolls vs. re-orients, and number of errors. • Users were asked which tool helped them locate nodes faster.
Benchmark Task 1 Quantitative Results (Cont.) • 2 users made 1 error each when using Schema Mapper; no errors were made with MapForce. • In both cases the user selected the wrong local schema node. • However, both users noticed their error because of the mapping table provided. • The mapping table thus reduces the criticality of this error.
Benchmark Task 1 Qualitative Results • Wins • 8 out of 9 users felt that Schema Mapper helped locate both local schema and global schema nodes faster than MapForce. • The remaining user felt that both tools were equally effective for local schema node detection. However, for global schema node detection, Schema Mapper was superior. • Areas for Improvement • Users complained that they could not look at the full node name in Schema Mapper.
Benchmark Task 2 • Users were asked to map the Megiddo Flint collection into ETANA-DL. • The task involves schema editing. • For comparison with Schema Mapper, the task was also done using MapForce for mapping and XML Spy for editing. • Used to compare efficiency between the two tools.
Benchmark Task 2 Quantitative Results (Cont.) • Schema Mapper: all errors were due to the Rename feature. • The task required the user to rename a node to the uppercase form of its existing name. • The Rename box in the UI did not contain the old name. • Critical incident with high criticality • Rectified by showing the old name in the Rename box while prompting the user to enter a new name. • In MapForce, one user lost all of his mappings.
Benchmark Task 2 Qualitative Results • Wins • All 9 users preferred editing capability of Schema Mapper over that of MapForce and XML Spy combined. • Areas for Improvement • Rename functionality to be extended to the mapping table. • Allowing a group rename by selecting multiple nodes and renaming them in a separate window.
Benchmark Task 3 • Asks the users to identify mappings done in BT-2. • Compares the time taken by each tool to identify the mappings. • Compares errors in identifying mappings.
Benchmark Task 3 Quantitative Results • Wins • 7 out of 9 users were faster using Schema Mapper. • No errors using Schema Mapper whereas 2 users made 1 error each while using MapForce. • Areas for Improvement • Sorting feature can be added to further aid the user in locating the mappings faster. (Has been subsequently added.)
Benchmark Task 3 Qualitative Results • Wins • All 9 users found it easier to identify mappings with Schema Mapper than MapForce.
Benchmark Task 4 • Users were asked whether they would use the View Only Top-Level Leaf Nodes and View Only This Sub-tree features. • The question was posed mainly to find out whether an undo feature (restoring the original view with all nodes displayed) needed to be implemented. • All users agreed that they would use both features. • (Undo feature was implemented subsequently.)
Summary of Usability Evaluation • All claims justified. • Rename box modified to display old name while prompting for new name. • Undo feature implemented. • Sort feature provided for sorting the mapping table.
Multi-dimension Browsing Service • Extend browsing service to integrated Megiddo collection • Flint • Vessel • Lab item • Miscellaneous artifact
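A rough sketch of what browsing along several dimensions at once can look like over integrated records. The slides do not name the dimensions, so the record fields (`type`, `area`), the sample values for `area`, and both function names are assumptions made for illustration.

```python
from collections import Counter


def browse(records, **selected):
    """Return the records that match every selected dimension value."""
    return [r for r in records
            if all(r.get(dim) == value for dim, value in selected.items())]


def facet_counts(records, dimension):
    """Count how many records fall under each value of one dimension."""
    return Counter(r[dimension] for r in records if dimension in r)


# Hypothetical integrated records; the field names are not from the source.
records = [
    {"id": 1, "type": "Flint tool", "area": "K"},
    {"id": 2, "type": "Flint tool", "area": "H"},
    {"id": 3, "type": "Vessel", "area": "K"},
]
```

Selecting one value per dimension narrows the result set, and the counts per value show the user what remains under each choice before they select it.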
Multi-dimension Browsing Service Integrated Megiddo collection
Conclusions • Demonstrated the DL integration workflow through the Megiddo case study. • The visual schema mapping tool supports integration through wrapper generation and global schema enrichment. • Initial pilot studies of the visual schema mapping tool gave positive results.
Future Work • Extensive usability studies • Explore complex mappings • Enhance mapping recommendations
Questions? Comments? Thank You!