
Managing Distributed Collections: Evaluating Web Page Change, Movement, and Replacement

Richard Furuta and Frank Shipman, Center for the Study of Digital Libraries and the Department of Computer Science, Texas A&M University


Presentation Transcript


  1. Managing Distributed Collections: Evaluating Web Page Change, Movement, and Replacement
     Richard Furuta and Frank Shipman
     Center for the Study of Digital Libraries and the Department of Computer Science, Texas A&M University

  2. Distributed Collections
     • The Web is continuously changing
       • .gov and .edu pages change less frequently than .com pages (1999)
     • Collections are needed to “organize” the Web
       • Bookmark lists
       • Yahoo! directories
       • Web portals (NSDL)
       • Walden’s Paths
     • Collection managers cannot control changes

  3. Changes to Items in Collections
     • Items in collections
       • Play specific roles
       • Are semantically related
         • To each other
         • To the collection
     • A change to an item may
       • Change its relationship to the collection
         • Less coherent with other items (default assumption)
         • More coherent, or no change in relationship
       • Affect the role it plays in the collection
         • Less suitable (default assumption)
         • More suitable, or no effect on the role

  4. Research Focuses
     • Develop techniques to help collection managers cope with changes
       • Change, migration, disappearance
     • Categories of change
       • Missing pages (migration and disappearance)
         • Find exact matches
         • Suggest similar pages
       • Changed pages: characterizing change
         • Quantity of change
         • Nature of change
         • Relevance to the collection
     • Implementation: Path Manager – a tool that helps collection managers cope with changes

  5. Management of Distributed Collections
     • Detection of change is easy
     • Determination of
       • Quantity of change is relatively easy
       • Relevance of change is less easy
       • Meaning of change is difficult
     • Approaches
       • Human validation (Yahoo! surfers)
       • Automatic detection of change (Path Manager)
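The claim on slide 5 that detecting change is easy can be illustrated by comparing a digest of the stored copy of a page with a digest of the freshly fetched copy. This is a minimal sketch, not Path Manager's implementation; the function names `fingerprint` and `has_changed` and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib


def fingerprint(page_bytes: bytes) -> str:
    """Return a hex digest identifying the exact byte content of a page."""
    return hashlib.sha256(page_bytes).hexdigest()


def has_changed(stored_bytes: bytes, fetched_bytes: bytes) -> bool:
    """Report whether the fetched copy differs from the stored copy at all.

    This only answers "did anything change?". It says nothing about the
    quantity, relevance, or meaning of the change.
    """
    return fingerprint(stored_bytes) != fingerprint(fetched_bytes)
```

A digest comparison flags even a one-byte difference, which is why the harder questions on the slide (quantity, relevance, meaning) need other measures.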

  6. Path Manager – The tool
     • Types of change
       • Content changes (what)
       • Presentation changes (how)
       • Structural changes (linking)
       • Behavioral changes (scripting – not addressed)
     [Screenshot labels: Collection-level overview, Page-level overview, Page details]
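One way to picture the first three types of change on slide 6 is to split a page into its visible words (content), its sequence of tags (a crude stand-in for presentation), and its link targets (structure), and then compare each part separately. The sketch below uses Python's standard `html.parser`; the class `PageSignature`, the helper names, and this particular decomposition are assumptions for illustration, not a description of how Path Manager works.

```python
from html.parser import HTMLParser


class PageSignature(HTMLParser):
    """Collect three rough views of a page: words, tags, and link targets."""

    def __init__(self):
        super().__init__()
        self.words = []  # content: what the page says
        self.tags = []   # presentation: how it is marked up (approximation)
        self.links = []  # structure: where it links to

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.words.extend(data.split())


def signature(html: str) -> dict:
    parser = PageSignature()
    parser.feed(html)
    parser.close()
    return {
        "content": parser.words,
        "presentation": parser.tags,
        "structure": parser.links,
    }


def changed_aspects(old_html: str, new_html: str) -> list:
    """Return which of the three aspects differ between two versions."""
    old, new = signature(old_html), signature(new_html)
    return [aspect for aspect in old if old[aspect] != new[aspect]]
```

For example, swapping a `<p>` wrapper for a `<div>` while keeping the text and links would be reported as a presentation change only. Behavioral (scripting) changes are left out here, as on the slide.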

  7. Collection-level Overview

  8. Page-level Overview
     [Figure; status labels: No change, Little change, Drastic change, Server unreachable, 404 error]
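The five status labels on slide 8 suggest a simple per-page classification. The sketch below shows one hypothetical way to assign them from a fetch result and a change measure in degrees. The function name, the use of `None` for an unreachable server, and the 30-degree cutoff are all assumptions; the slides do not state how Path Manager draws these lines.

```python
from typing import Optional


def classify_page(
    http_status: Optional[int],
    change_angle: float = 0.0,
    drastic_threshold: float = 30.0,  # hypothetical cutoff, in degrees
) -> str:
    """Map a fetch result and a change measure to one of the slide's labels.

    http_status is None when the server could not be reached at all.
    Other HTTP error codes are not distinguished in this sketch.
    """
    if http_status is None:
        return "Server unreachable"
    if http_status == 404:
        return "404 error"
    if change_angle == 0.0:
        return "No change"
    if change_angle < drastic_threshold:
        return "Little change"
    return "Drastic change"
```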

  9. Page Details
     [Figure; panels: Page Information, Modification details]

  10. Content-based Metrics
     • Angle between original and replacing pages (in degrees)
     • Change is change… High angle of change for all cases
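The angle on slide 10 can be read as the angle between two term vectors in a vector-space model. The sketch below computes it from raw term frequencies; the tokenization, the lack of stop-word removal or term weighting, and the function names are assumptions, since the slides do not specify how the vectors were built.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Count lowercase alphanumeric tokens (a deliberately simple tokenizer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def angle_degrees(a: Counter, b: Counter) -> float:
    """Angle between two term-frequency vectors, from 0 to 90 degrees."""
    if not a and not b:
        return 0.0   # two empty pages are treated as identical
    if not a or not b:
        return 90.0  # an empty page shares no terms with a non-empty one
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    cosine = min(1.0, dot / (norm_a * norm_b))  # guard against rounding above 1
    return math.degrees(math.acos(cosine))
```

Under this measure, any replacement page with a different vocabulary produces a high angle, whether or not it suits the collection, which is consistent with the slide's remark that the angle is high in all cases.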

  11. Context-based Change Detection
     • Context consists of
       • Content from other pages in the path
       • Annotations created by the author
       • Additional metadata provided by the author
     • Distinguishes between edited and replaced pages

  12. Evaluation
     • 20 paths, pages selected from Yahoo! Directories
       • Each path consisted of 10 to 12 pages
       • Pages were randomly selected
         • No Flash presentations or images
     • A page in each path was randomly selected for replacement
     • Each selected page was replaced by 3 pages
       • CNN Financials (large change)
       • Elephants (large change)
       • A page from the same Yahoo! Directory (small change)

  13. Results – Distribution of Context-based changes
     • Experimental thresholds
     • Negative angle = divergence from the collection
     • Distinction between similar and different pages
     • Managers can now focus on divergent pages
     [Figure: Replacements resulting in moving towards and away from the context vector]
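The signed measure on slide 13 can be sketched as the difference between two angles: the angle from the original page to a context vector, minus the angle from the new page to the same context vector. With that convention a negative value means the page has moved away from the collection. The sign convention, the summing of term counts to form the context vector, and all names below are assumptions chosen to match the slide's wording, not the authors' code.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def angle_degrees(a: Counter, b: Counter) -> float:
    if not a or not b:
        return 90.0  # treat an empty vector as sharing no terms
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return math.degrees(math.acos(min(1.0, dot / (norm_a * norm_b))))


def context_shift(original: str, current: str, context_texts: list) -> float:
    """Signed change in degrees relative to the collection context.

    context_texts stands for the other pages in the path plus any
    annotations or metadata. Negative result: the current page is farther
    from the context than the original was (divergence).
    """
    context = Counter()
    for text in context_texts:
        context.update(term_vector(text))
    before = angle_degrees(term_vector(original), context)
    after = angle_degrees(term_vector(current), context)
    return before - after
```

A manager could then sort pages by this value and review the most negative ones first, which matches the slide's point about focusing on divergent pages.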

  14. For more information on Walden’s Paths
     • http://www.csdl.tamu.edu/walden/
     • walden@csdl.tamu.edu
     • Principal Investigators
       • Richard Furuta (furuta@csdl.tamu.edu)
       • Frank Shipman (shipman@csdl.tamu.edu)
