Awareness Services for Digital Libraries Arturo Crespo Hector Garcia-Molina Stanford University
Motivation • Our objective: create the next generation of data repositories tailored to digital library needs: • Persistence, distribution, intellectual property, indexing and cataloging, replication, ... [Diagram: Data Storage, Clients, Naming, Indexers, Replica Data Storage]
Data Stores and Clients [Diagram: data stores (DB Tech Reports, AI Tech Reports, HCI Tech Reports) and clients (DB Indexer, CS Indexer)]
Data Store Services • Object access • Via a handle • Object awareness • Clients must be aware of changes at the store
A Case Study: CS-TR and SIFT • SIFT: a selective dissemination service • CS-TR: A digital library of technical reports from about 50 universities • Awareness based on timestamps • Problems: • File system timestamps • Application timestamps • Deletions
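The deletion problem listed above can be illustrated with a minimal Python sketch. It assumes a hypothetical store modeled as a dictionary from handle to (timestamp, content); the name `changed_since` and the sample handles are illustrative only and are not taken from CS-TR or SIFT.

```python
def changed_since(store, last_update):
    """Return handles whose timestamp is newer than the client's last update."""
    return sorted(h for h, (ts, _content) in store.items() if ts > last_update)

# Hypothetical store: handle -> (timestamp, content)
store = {"tr-1": (10, "a"), "tr-2": (12, "b"), "tr-3": (15, "c")}
last_update = 12  # the client last synchronized at time 12

store["tr-1"] = (20, "a, revised")  # modified after the last update
del store["tr-2"]                   # deleted after the last update

changes = changed_since(store, last_update)
# The deleted report leaves no timestamp behind, so it never shows up here.
print(changes)
```

A client relying only on this query keeps its stale entry for `tr-2`, which is why deletions need separate handling in timestamp-based schemes.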
The Problem • How can a Data Storage client detect the changes that have happened in remote Data Storages since its last update? • There is no “perfect algorithm”: • The best algorithm for solving this problem depends on the characteristics of the relationship between the Data Storage and the client
The Design Space • Ratio of Data Storages per client • Stateful versus stateless Data Storages in relation to the clients • Push versus pull model • Update frequency: how often the repository changes; how often the client is updated • Client awareness of Data Storages • Complexity of the algorithm
Standard Mechanisms for Client Updating • Key Query Algorithm • Snapshot Differential Algorithm • Timestamps and Versions • Logs • Triggers • Signatures
Contributions • Survey of the spectrum of awareness options • Advantages and disadvantages of each one • All mechanisms can be captured by a single algorithm: the UNI-AWARE algorithm • Enhancements for signature-based schemes • Reduced computation • Reduced communication costs
Related Work • Database replica maintenance • Remote file comparison • Deployment of programs over the network
The UNI-AWARE Algorithm • A unified algorithm that “covers” known schemes: • Snapshot algorithm • Timestamps and versions • Logs • Triggers • Signatures • Algorithm is tailored to a specific scheme through the definition of “custom functions”
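The idea of tailoring one algorithm through custom functions might be sketched as follows. This is an illustration of the concept only, not the UNI-AWARE algorithm as published: the function `detect_changes`, its parameters, and the two comparison functions are assumptions made for this example.

```python
def detect_changes(client_tokens, store_tokens, differs):
    """Generic change detection, specialized by the custom function `differs`.

    Tokens are scheme-specific: timestamps, version numbers, signatures, ...
    """
    changed = sorted(
        h for h, token in store_tokens.items()
        if h not in client_tokens or differs(client_tokens[h], token)
    )
    deleted = sorted(h for h in client_tokens if h not in store_tokens)
    return changed, deleted

# Custom function for a timestamp or version scheme (hypothetical).
def newer(old, new):
    return new > old

# Custom function for a signature scheme (hypothetical).
def unequal(old, new):
    return old != new

by_timestamp = detect_changes({"a": 1, "b": 2, "c": 3}, {"a": 1, "b": 5, "d": 4}, newer)
by_signature = detect_changes({"a": "x1", "b": "y1"}, {"a": "x1", "b": "y2"}, unequal)
print(by_timestamp, by_signature)
```

Only the comparison function changes between the two calls; the surrounding loop stays the same, which is the sense in which a single algorithm can cover several schemes.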
UNI-AWARE: Signature Algorithm • Signature: a token associated with each object that has a high probability of being unique and changes when the object's content changes • Examples: CRC, checksums • Advantages: • Robust: does not require metadata maintenance • Easy to manage consistently when the store fails or an object migrates
UNI-AWARE: Signature Algorithm [Diagram: Data Store and Client; labels: Document, Signature, “All signatures transferred”, “Request Documents”]
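A minimal Python sketch of a signature-based exchange, assuming CRC-32 (via the standard `zlib.crc32`) as the signature and in-memory dictionaries standing in for the data store and the client. The function names and sample documents are hypothetical.

```python
import zlib

def signature(content: bytes) -> int:
    """CRC-32 of the content, used here as the per-document signature."""
    return zlib.crc32(content)

def client_update(client_sigs, store_docs):
    """Compare all store signatures with the client's and fetch what differs."""
    # All signatures are transferred from the store to the client.
    remote = {h: signature(c) for h, c in store_docs.items()}
    # The client requests documents that are new or whose signature changed.
    to_request = sorted(h for h, s in remote.items() if client_sigs.get(h) != s)
    # Handles the client knows but the store no longer lists were deleted.
    deleted = sorted(h for h in client_sigs if h not in remote)
    fetched = {h: store_docs[h] for h in to_request}
    return remote, fetched, deleted

store_docs = {"tr-1": b"report v2", "tr-3": b"new report"}
client_sigs = {"tr-1": signature(b"report v1"), "tr-2": signature(b"gone")}

new_sigs, fetched, deleted = client_update(client_sigs, store_docs)
print(sorted(fetched), deleted)
```

Because signatures are derived from content alone, the store can recompute them at any time, and deletions are visible as handles missing from the transferred set.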
DIST-UNI-AWARE Algorithm • Objective: reduce amount of data exchanged between data store and clients • DIST-UNI-AWARE: • Unified algorithm that can be tailored to different schemes: • Hierarchical signatures • Hierarchical timestamps
DIST-UNI-AWARE [Diagram: Data Store and Client; labels: Document, Signature, “Signatures of Buckets transferred”, “Request more Signatures”, “Request Documents”]
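The bucket-based exchange might be sketched as a two-level hierarchy in Python. This is a simplified illustration under stated assumptions, not the DIST-UNI-AWARE algorithm itself: documents are assigned to buckets by a CRC-32 of their handle, a bucket signature is a CRC-32 over its members' signatures, and all names (`bucket_of`, `hierarchical_sync`, the handles) are hypothetical.

```python
import zlib

def doc_signature(content: bytes) -> int:
    return zlib.crc32(content)

def bucket_of(handle: str, num_buckets: int) -> int:
    # Deterministic bucket assignment (Python's built-in hash() is salted per run).
    return zlib.crc32(handle.encode("utf-8")) % num_buckets

def bucket_signatures(docs, num_buckets):
    """One signature per bucket, computed over the member documents' signatures."""
    entries = {b: [] for b in range(num_buckets)}
    for handle in sorted(docs):
        entry = f"{handle}:{doc_signature(docs[handle])}"
        entries[bucket_of(handle, num_buckets)].append(entry)
    return {b: zlib.crc32("|".join(items).encode("utf-8"))
            for b, items in entries.items()}

def hierarchical_sync(client_docs, store_docs, num_buckets=4):
    # Round 1: only bucket signatures are transferred.
    client_b = bucket_signatures(client_docs, num_buckets)
    store_b = bucket_signatures(store_docs, num_buckets)
    dirty = {b for b in range(num_buckets) if client_b[b] != store_b[b]}
    # Round 2: request document signatures for the mismatching buckets only.
    store_sigs = {h: doc_signature(c) for h, c in store_docs.items()
                  if bucket_of(h, num_buckets) in dirty}
    changed = sorted(h for h, s in store_sigs.items()
                     if h not in client_docs or doc_signature(client_docs[h]) != s)
    deleted = sorted(h for h in client_docs
                     if bucket_of(h, num_buckets) in dirty and h not in store_sigs)
    return changed, deleted, len(store_sigs)

client_docs = {f"tr-{i}": b"v1" for i in range(20)}
store_docs = dict(client_docs)
store_docs["tr-7"] = b"v2"  # a single document changes at the store

changed, deleted, sigs_sent = hierarchical_sync(client_docs, store_docs)
print(changed, deleted, sigs_sent)
```

With one changed document, only the signatures of its bucket are transferred in the second round, rather than one signature per document in the store.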
Advantages of Signature Algorithms • Support both the push and pull models • No need for reliable storage of additional data structures: if signatures are lost or corrupted, they can be recomputed • Efficient use of network, client, and data store resources • Scale well with the number of clients and documents
DIST-UNI-AWARE: Enhancements • Increase group split factor • Client sends additional information at split time • Clustering of changed objects
Conclusions • Awareness mechanism for digital libraries • Separation of storage functionality and other services • Awareness schemes must be resilient to computer environment changes and bugs • UNI-AWARE and DIST-UNI-AWARE
Reference • Arturo Crespo, Hector Garcia-Molina. "Awareness Services for Digital Libraries." ECDL'97. http://www-db.stanford.edu/~crespo/publications/