Building a Large Location Table to Find Replicas of Physics Objects Koen Holtman Heinz Stockinger

Building a Large Location Table to Find Replicas of Physics Objects Koen Holtman Heinz Stockinger CERN/CMS CHEP ’2000 Feb 7-11, 2000

Replication models Different replication models: • File based: • FTP run files, Objectivity DB files, ... • Import/attach them to local data store • (Objectivity DRO/FTO) • Linkable Objectivity AMS server tricks: • Transparent local / remote file access • Page-level replication?? • Object based (this talk): • Can replicate every single object / set of objects • Transparent local / remote object access • Requires large Object Location Table (OLT) • In Objectivity, go from logical OID to physical OID

Regional Centre Regional Centre Regional Centre The problem • >1010 physics objects (event related objects), all potentially replicated • Lookup: ‘Find the best replica of the physics object with (logical) OID xyz’ • Automatic (grid-like) • Best replica is not always nearest • Lookup operation needs to be fast, scalable

Goal of this talk • Goal: show that an OLT can be built • By showing one viable implementation • This implementation is a spin-off of the work on automatic reclustering • >1010 scalability problem can be solved by • Exploiting specific properties of physics analysis • Limiting freedom of lookup operations • No Objectivity-style ‘on demand’ navigation

Divide and conquer • Partition events into chunks, 1 subjob per chunk • Definition oflogical OID: (chunk ID, event sequence number, type/version ID)

Job and storage model • Job model: job consists of • set of event IDs (chunk ID, event seq nr) • 1 or moretype/version IDs • physics code • code is run usingsubjobs, these get handed iterators • Storage model: • unit of storage is collection • subset of horizontal bar above

Structure… Insert and delete collections... Next slide: 1 subjob, 1 type...

1 subjob, 1 type • Say subjob at CERN wants to read objects (1,3,1),(1,4,1),(1,7,1), ... • At start of subjob, global schedule is computed, say:try collection 1 first, thentry collection 3. • Subjob iteration is in increasing sequencenumber, step throughindices alongside withiteration

Computing the schedule • Optimal schedule computed at start of subjob • Optimiser finds schedulewith minimal cost • Cost(‘read 3,4,7,…’,[1,3]) =Ccoll(‘read 3,4,… from 1’) +Ccoll(‘read 7,… from 3) = …. • Value of Ccoll( ) depends on location of collection, network load, access method, etc. • Calculations use collection contents indices only • Finding optimal schedule is an NP-complete problem, exponential time complexity, but in practice....

Runtime of scheduler Time to compute optimal schedule 1 sec (micro sec) Can also timeout, gives good schedule

Conclusions • Large object location tables can be built! • They are an important component in implementing the grid-like aims of the CMS CTP • Implementation shown is spinoff from work on reclustering • Scalability by using specific properties of physics analysis, and by limiting the freedom of lookup operations • No Objectivity-style ‘on demand’ navigation, all objects needs to be requested from the start • Cost functions can model any access method • Optimisation is both generic and cheap • Long paper will appear in future

Building a Large Location Table to Find Replicas of Physics Objects Koen Holtman Heinz Stockinger

Building a Large Location Table to Find Replicas of Physics Objects Koen Holtman Heinz Stockinger

Presentation Transcript

Searching Technology For a Large Number Of Objects

Find your table!

FILS Handling of Large Objects

FILS Handling of Large Objects

Building your Table

Automated Mapping of Large Binary Objects

BASE BUILDING LOCATION

FILS Handling of Large Objects

HOST Test Centers Find a location closest to you

Location: Building 3202

Location of Large Igneous Provinces:

Object Level Replication Koen Holtman Caltech/CMS PPDG meeting, Argonne July 13-14, 2000

The large Extension Table

Caltech and CMS Grid Work Overview Koen Holtman Caltech/CMS May 22, 2002

LOCATION To find where a place is

Building a Predictive Parsing Table

Building A Repository for Digital Objects

How to Find original brands/replicas on AliExpress ?

How To Find A Pediatric Dentist Near Your Location

Automated Mapping of Large Binary Objects

Manipulating Large Objects

Building a Large Database of FinTech Startup