Awareness Services for Digital Libraries Arturo Crespo Hector Garcia-Molina Stanford University
Awareness Services for Digital Libraries Digital library repository: • Data store • Other components: • Indexers • Name manager • Replica manager • Etc
Data Stores and Clients [Figure: data stores (DB Tech Reports, AI Tech Reports, HCI Tech Reports) and their clients (DB Indexer, CS Indexer)]
Data Store Services • Object access • Via a handle • Object awareness • Clients must be aware of changes at the store
A Case Study: CS-TR and SIFT • SIFT: a selective dissemination service • CS-TR: A digital library of technical reports from about 50 universities • Awareness based on timestamps • Problems: • File system timestamps • Application timestamps • Deletions
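The deletion problem listed above can be illustrated with a small sketch. It assumes a hypothetical store modeled as a dict from handle to (timestamp, content) and a client that pulls everything newer than its last sync; none of these names come from CS-TR or SIFT. Because a deleted report leaves no timestamp behind, the query never mentions it and the client keeps a stale entry.

```python
def changed_since(store, last_sync):
    """Return handles whose timestamp is newer than the client's last sync."""
    return sorted(h for h, (ts, _content) in store.items() if ts > last_sync)


# Hypothetical store: handle -> (timestamp, content).
store = {"tr-1": (10, "v1"), "tr-2": (12, "v1")}
client_view = {"tr-1", "tr-2"}   # handles the client already knows about
last_sync = 12

del store["tr-2"]                # a deletion leaves no timestamp to query
store["tr-3"] = (15, "v1")       # a new report arrives after the last sync

updates = changed_since(store, last_sync)
client_view |= set(updates)

# Handles the client still believes in but the store no longer has.
stale = client_view - set(store)
```

Here `updates` contains only `tr-3`, while `tr-2` remains in `client_view` even though the store has dropped it.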
Contributions • Survey of the spectrum of awareness options • Advantages and disadvantages of each one • All mechanisms can be captured by a single algorithm: the UNI-AWARE algorithm • Enhancements for signature-based schemes • Reduced computation • Reduced communication costs
Related Work • Database replica maintenance • Remote file comparison • Deployment of programs over the network
The Client-store Design Space • Push vs. Pull • Stateful vs. stateless stores and clients • Cognizant clients and sources • Number of clients per data store
The UNI-AWARE Algorithm • A unified algorithm that “covers” known schemes: • Snapshot algorithm • Timestamps and versions • Logs • Triggers • Signatures • Algorithm is tailored to a specific scheme through the definition of “custom functions”
UNI-AWARE: Signature Algorithm • Signature: a token associated with each document that has a high probability of being unique and that changes when the content of the object changes • Examples: CRCs, checksums • Advantages: • Robust: it does not require metadata maintenance • Easy to manage consistently when the store fails or an object migrates
UNI-AWARE: Signature Algorithm [Figure: the data store transfers all document signatures to the client; the client then requests the documents it needs]
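A minimal sketch of the flat signature exchange, assuming CRC-32 as the signature and a store modeled as a dict from handle to bytes. The function names and the dict layout are illustrative assumptions, not the authors' implementation.

```python
import zlib


def signature(content: bytes) -> int:
    """CRC-32 of the content; one possible choice of signature."""
    return zlib.crc32(content)


def store_signatures(store):
    """What the data store sends: one signature per document handle."""
    return {handle: signature(content) for handle, content in store.items()}


def diff(client_sigs, store_sigs):
    """Client side: decide which documents to request and which to drop."""
    changed = sorted(h for h, s in store_sigs.items() if client_sigs.get(h) != s)
    deleted = sorted(h for h in client_sigs if h not in store_sigs)
    return changed, deleted


# Hypothetical state: "b" was edited, "c" was deleted, "d" is new.
store = {"a": b"alpha", "b": b"beta v2", "d": b"delta"}
client_sigs = {
    "a": signature(b"alpha"),
    "b": signature(b"beta"),
    "c": signature(b"gamma"),
}

changed, deleted = diff(client_sigs, store_signatures(store))
```

Because signatures are recomputed from content, the store keeps no change log, and deletions show up as handles missing from the transferred set.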
DIST-UNI-AWARE Algorithm • Objective: reduce amount of data exchanged between data store and clients • DIST-UNI-AWARE: • Unified algorithm that can be tailored to different schemes: • Hierarchical signatures • Hierarchical timestamps
DIST-UNI-AWARE [Figure: the data store transfers signatures of buckets to the client; the client requests more signatures for buckets that differ, then requests the changed documents]
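The hierarchical exchange might be sketched as follows. This is an illustration under stated assumptions, not the DIST-UNI-AWARE algorithm itself: both sides hold the same ordered list of handles, a bucket signature is the CRC-32 of its members' concatenated signatures, and `find_changes`, `bucket_sig`, and the `split` parameter are hypothetical names.

```python
import zlib


def sig(data: bytes) -> int:
    return zlib.crc32(data)


def bucket_sig(docs, handles):
    """Signature of a bucket: CRC-32 over its documents' signatures."""
    joined = b"".join(sig(docs[h]).to_bytes(4, "big") for h in handles)
    return sig(joined)


def find_changes(store, client, handles, split=2):
    """Descend only into buckets whose signatures differ.

    Returns (changed handles, rounds, bucket signatures transferred).
    """
    changed, rounds, sent = [], 0, 0
    frontier = [handles]                      # start with one bucket: everything
    while frontier:
        rounds += 1
        sent += len(frontier)                 # one signature per bucket this round
        next_frontier = []
        for bucket in frontier:
            if bucket_sig(store, bucket) == bucket_sig(client, bucket):
                continue                      # bucket unchanged, stop here
            if len(bucket) == 1:
                changed.append(bucket[0])     # narrowed down to one document
            else:
                size = -(-len(bucket) // split)   # ceiling division
                next_frontier.extend(
                    bucket[i:i + size] for i in range(0, len(bucket), size)
                )
        frontier = next_frontier
    return changed, rounds, sent


handles = [f"tr-{i}" for i in range(8)]
store = {h: h.encode() for h in handles}
client = dict(store)                          # client starts in sync

in_sync = find_changes(store, client, handles)
store["tr-5"] = b"revised"                    # one document changes at the store
one_change = find_changes(store, client, handles)
```

With nothing changed, a single bucket signature settles the exchange. With one changed document among eight, this sketch transfers 7 bucket signatures over 4 rounds (it counts the initial whole-collection comparison as a round), instead of 8 document signatures in one round.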
Advantages of Signature Algorithms • Support the push and pull models • No need for reliable storage of additional data structures: if signatures are lost or corrupted, they can be recomputed • Efficient in usage of network resources, clients and data stores • Scales well in number of clients and documents
DIST-UNI-AWARE: Performance • Performance depends on the number of changes: • No changes: only one round is required • Single change: log₂ n rounds • 2 changes: log₂ n rounds, but twice as much data … • Eventually, DIST-UNI-AWARE starts behaving worse than UNI-AWARE
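The trend above can be checked with a small cost model. It is a sketch under assumptions that are not in the slides: documents are indexed 0..n-1, buckets are split evenly, the initial whole-collection comparison counts as a round, and `hierarchical_cost` is a hypothetical helper that only counts signatures rather than computing them.

```python
def hierarchical_cost(n, changed, split=2):
    """Count (rounds, bucket signatures sent) for a set of changed indices."""
    rounds, sent = 0, 0
    frontier = [(0, n)]                       # half-open index ranges
    while frontier:
        rounds += 1
        sent += len(frontier)
        next_frontier = []
        for lo, hi in frontier:
            dirty = any(lo <= c < hi for c in changed)
            if not dirty or hi - lo == 1:
                continue                      # clean bucket or single document
            size = -(-(hi - lo) // split)     # ceiling division
            next_frontier.extend(
                (start, min(start + size, hi)) for start in range(lo, hi, size)
            )
        frontier = next_frontier
    return rounds, sent


n = 1024
flat_cost = n                                 # flat scheme: every signature, one round

none = hierarchical_cost(n, set())
one = hierarchical_cost(n, {0})
two = hierarchical_cost(n, {0, n - 1})
everything = hierarchical_cost(n, set(range(n)))
```

Under this model, one change among 1024 documents costs 21 signatures and two widely separated changes cost 39, both far below the 1024 of the flat scheme. When every document has changed, the hierarchy sends 2047 signatures over 11 rounds, which is worse than the flat exchange on both counts.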
DIST-UNI-AWARE: Enhancements • Increase group split factor • Client sends additional information at split time • Clustering of changed objects
Conclusions • Awareness mechanism for digital libraries • Separation of storage functionality and other services • Awareness schemes must be resilient to computer environment changes and bugs • UNI-AWARE and DIST-UNI-AWARE