Making DADS distributed: a Nordunet2 project
Jochen Hollmann, Chalmers University of Technology <joho@ce.chalmers.se>

Project Aims
Principles for the design of distributed systems devoted to Digital Libraries (DL)
Project results will contribute:
• Tradeoffs for the design of future DL infrastructures
• Knowledge of how users interact with DLs
• Algorithms for data replication and prefetching
• Detailed experience from actual implementations (DADS)
2000

Agenda
• Comparison: centralized and distributed approach
• General techniques for speedup
• System properties and opportunities for improvement
• How to distribute?
• Project plan

Centralized Approach
Potential Advantages:
• Low complexity
• Low total cost of ownership
• Simple administration
Potential Disadvantages:
• Single point of failure
• Latency/Overload
• Availability
• Does not scale
• No parallel activities

Total Replication to all Clients
Potential Advantages:
• High availability
• Minimal latency
• Data retrieval in parallel
Potential Disadvantages:
• Expensive
• Bandwidth used to distribute
• Difficult to allow updates everywhere
• Does not scale

General speedup techniques
• Prefetching: metadata or heuristics make it possible to request a local copy ahead of time
• Caching: keep a retrieved copy for future use (and avoid re-transferring it)
• Replication: select data and distribute copies without a request
[Figure: three timelines (t) marking Start Prefetching, Point of Replication, Search result available, Request 1, Request 2]

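The three techniques differ mainly in what triggers the copy. The following is a minimal Python sketch of that distinction, not the DADS implementation: `LocalStore`, `fetch_remote`, and the in-memory dictionary are hypothetical names introduced here for illustration.

```python
from typing import Callable, Dict, Iterable


class LocalStore:
    """Toy local store illustrating caching, prefetching and replication."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        # fetch_remote is a hypothetical function that transfers one article.
        self.fetch_remote = fetch_remote
        self.copies: Dict[str, bytes] = {}
        self.remote_transfers = 0

    def _transfer(self, article_id: str) -> None:
        self.copies[article_id] = self.fetch_remote(article_id)
        self.remote_transfers += 1

    def get(self, article_id: str) -> bytes:
        """Caching: keep a retrieved copy so a repeated request stays local."""
        if article_id not in self.copies:
            self._transfer(article_id)
        return self.copies[article_id]

    def prefetch(self, hinted_ids: Iterable[str]) -> None:
        """Prefetching: hints name articles to pull before they are requested."""
        for article_id in hinted_ids:
            if article_id not in self.copies:
                self._transfer(article_id)

    def replicate(self, pushed: Dict[str, bytes]) -> None:
        """Replication: copies selected elsewhere arrive without any request."""
        self.copies.update(pushed)
```

In this sketch, a request that follows a prefetch or a replication is served locally, so `remote_transfers` only grows when the store itself has to pull an article.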
Properties of Articles and the System
Articles
• Contain references to related work selected by the author
• Are catalogued by experts
• Published articles went through an acceptance process
• High-quality data
A Search
• Reduces the number of articles to a small number
• Presents the results before retrieving the article
• May contain patterns that hint at what to replicate

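One way these properties could be exploited is to treat the references of the articles in a search result as prefetch hints. The sketch below is an assumption for illustration only: the function `prefetch_candidates`, the `references` mapping, and the ranking by citation count are not taken from the project.

```python
from collections import Counter
from typing import Dict, List


def prefetch_candidates(result_ids: List[str],
                        references: Dict[str, List[str]],
                        limit: int = 3) -> List[str]:
    """Rank articles cited by the search results as prefetch candidates."""
    counts: Counter = Counter()
    for article_id in result_ids:
        for cited in references.get(article_id, []):
            # Articles already in the result list need no extra hint.
            if cited not in result_ids:
                counts[cited] += 1
    # Most frequently cited first; ties are broken by id for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [article_id for article_id, _ in ranked[:limit]]
```

Because the search result is small and is shown before any article is retrieved, such a ranking could be computed while the user is still reading the result list.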
General speedup techniques
[Figure: three timelines (t) marking Start Prefetching, Point of Replication, Search result available, Request 1, Request 2, annotated with the user actions: Search in the index, Selection from the list, fetch a paper, get related articles, manual feedback]

How to distribute?
[Figure: diagram relating three groups of labels]
• Organizational levels: Researcher, Research Group, Department, University, Global Library
• Scope of content: In-depth knowledge, Research area, Journals, Field, Everything
• Techniques: Prefetching, Caching on article basis, Caching on journal basis, Replication of most used journals

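The organizational levels can be read as a chain of stores, from the one nearest the researcher to the global library. The sketch below simplifies this to plain caching at every level, which is an assumption: the slide assigns different techniques to different levels, and the names `LEVELS` and `lookup` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Level names taken from the slide, ordered from nearest to farthest.
LEVELS = ["researcher", "research group", "department",
          "university", "global library"]


def lookup(article_id: str,
           stores: List[Dict[str, bytes]]) -> Tuple[Optional[bytes], Optional[int]]:
    """Search stores from nearest to farthest; copy a hit into nearer stores."""
    for depth, store in enumerate(stores):
        if article_id in store:
            data = store[article_id]
            # Keep a copy at every level that was passed on the way up.
            for nearer in stores[:depth]:
                nearer[article_id] = data
            return data, depth
    return None, None
```

The returned depth indicates how far a request had to travel, which gives a rough proxy for latency in this toy model.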
Project Plan
Phase I (Aug 2000 - Apr 2001)
• Analysis of the current centralized system and construction of a simulation model (using data from DADS)
Phase II (Apr 2001 - Dec 2001)
• Design and evaluation of a distributed version and the contained algorithms
Phase III (Dec 2001 - July 2002)
• Evaluation and fine-tuning of the algorithms in DADS

Phase I: Analysis of the current system
Live System
• analyze the log files
• find locality
• find bottlenecks in the current system
• hints on what should be logged
• hints on what can be replicated
• where latencies occur
• Understand the system properties
Simulation
• build a trace-driven simulation model
• test whether the bottlenecks can be reproduced
• measure the simulation with current and future technology parameters (network technology, storage costs)
• what problems will remain
• Develop a benchmark!
• Develop a metric to quantify the costs.
• Both systems should behave identically
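A trace-driven simulation replays logged requests against a model of the system and measures the outcome. The sketch below is a minimal example under stated assumptions: the trace is a plain sequence of article ids, the model is a single cache, and the LRU replacement policy is chosen here for illustration, since the slides do not name a policy.

```python
from collections import OrderedDict
from typing import Iterable


def simulate_lru(trace: Iterable[str], capacity: int) -> float:
    """Replay a request trace against an LRU cache and return the hit rate."""
    cache: "OrderedDict[str, None]" = OrderedDict()
    hits = 0
    total = 0
    for article_id in trace:
        total += 1
        if article_id in cache:
            hits += 1
            cache.move_to_end(article_id)   # mark as most recently used
        else:
            cache[article_id] = None
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict the least recently used
    return hits / total if total else 0.0
```

Running the same trace with different capacities gives a first, crude measure of locality: the faster the hit rate rises with capacity, the more a local copy would help.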