1 / 11

Building a Large Location Table to Find Replicas of Physics Objects Koen Holtman Heinz Stockinger

Building a Large Location Table to Find Replicas of Physics Objects Koen Holtman Heinz Stockinger CERN/CMS CHEP ’2000 Feb 7-11, 2000. Replication models. Different replication models: File based: FTP run files, Objectivity DB files, ... Import/attach them to local data store

Download Presentation

Building a Large Location Table to Find Replicas of Physics Objects Koen Holtman Heinz Stockinger

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Building a Large Location Table to Find Replicas of Physics Objects Koen Holtman Heinz Stockinger CERN/CMS CHEP ’2000 Feb 7-11, 2000

  2. Replication models Different replication models: • File based: • FTP run files, Objectivity DB files, ... • Import/attach them to local data store • (Objectivity DRO/FTO) • Linkable Objectivity AMS server tricks: • Transparent local / remote file access • Page-level replication?? • Object based (this talk): • Can replicate every single object / set of objects • Transparent local / remote object access • Requires large Object Location Table (OLT) • In Objectivity, go from logical OID to physical OID

  3. Regional Centre Regional Centre Regional Centre The problem • >1010 physics objects (event related objects), all potentially replicated • Lookup: ‘Find the best replica of the physics object with (logical) OID xyz’ • Automatic (grid-like) • Best replica is not always nearest • Lookup operation needs to be fast, scalable

  4. Goal of this talk • Goal: show that an OLT can be built • By showing one viable implementation • This implementation is a spin-off of the work on automatic reclustering • >1010 scalability problem can be solved by • Exploiting specific properties of physics analysis • Limiting freedom of lookup operations • No Objectivity-style ‘on demand’ navigation

  5. Divide and conquer • Partition events into chunks, 1 subjob per chunk • Definition oflogical OID: (chunk ID, event sequence number, type/version ID)

  6. Job and storage model • Job model: job consists of • set of event IDs (chunk ID, event seq nr) • 1 or moretype/version IDs • physics code • code is run usingsubjobs, these get handed iterators • Storage model: • unit of storage is collection • subset of horizontal bar above

  7. Structure… Insert and delete collections... Next slide: 1 subjob, 1 type...

  8. 1 subjob, 1 type • Say subjob at CERN wants to read objects (1,3,1),(1,4,1),(1,7,1), ... • At start of subjob, global schedule is computed, say:try collection 1 first, thentry collection 3. • Subjob iteration is in increasing sequencenumber, step throughindices alongside withiteration

  9. Computing the schedule • Optimal schedule computed at start of subjob • Optimiser finds schedulewith minimal cost • Cost(‘read 3,4,7,…’,[1,3]) =Ccoll(‘read 3,4,… from 1’) +Ccoll(‘read 7,… from 3) = …. • Value of Ccoll( ) depends on location of collection, network load, access method, etc. • Calculations use collection contents indices only • Finding optimal schedule is an NP-complete problem, exponential time complexity, but in practice....

  10. Runtime of scheduler Time to compute optimal schedule 1 sec (micro sec) Can also timeout, gives good schedule

  11. Conclusions • Large object location tables can be built! • They are an important component in implementing the grid-like aims of the CMS CTP • Implementation shown is spinoff from work on reclustering • Scalability by using specific properties of physics analysis, and by limiting the freedom of lookup operations • No Objectivity-style ‘on demand’ navigation, all objects needs to be requested from the start • Cost functions can model any access method • Optimisation is both generic and cheap • Long paper will appear in future

More Related