Computing tasks and the computing model
• Computing tasks
  • Identify tasks
  • Identify data types
  • Requirements to constrain the computing model
• Computing model
  • Baseline model including:
    • Computing facilities
    • Data storage model
    • Network constraints
    • Cost + manpower estimates
• Details in two LHC-B technical notes
  • Editors: P. Binko, A. Pacheco
Simulation
• Existing program (SICB) used to:
  • Estimate distribution of event sizes (input to DAQ and data storage models)
  • Estimate complexity of data (input to trigger, reconstruction and analysis algorithm studies)
  • Estimate CPU and memory requirements (input to baseline computing model)
• Evolution of simulation
  • Assess impact of new technologies (e.g. GEANT4, geometry from CAD, ODBMS) on extrapolations from SICB
  • Assess impact of more detailed simulations, parameterization of detector response
  • Estimate simulation needs (CPU, data volume) as a function of time (see the sketch after this list)
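As a rough illustration of the last bullet, the sketch below projects simulation needs from the SICB per-event figures quoted on the next slide; the sample size, effective running time and yearly CPU price/performance gain are placeholder assumptions, not LHC-B estimates.

```python
# Hypothetical projection of simulation needs versus time (all assumptions are
# placeholders except the per-event figures quoted from SICB on the next slide).

CPU_PER_EVENT_MIPS_S = 10_000    # SICB: 10 kMIPS-seconds of CPU per event
EVENT_SIZE_MB = 0.5              # SICB: ~500 kB of raw data per simulated event
EVENTS_PER_YEAR = 1e7            # assumed size of the yearly simulated sample
SECONDS_PER_YEAR = 1e7           # assumed effective farm running time per year
CPU_GAIN_PER_YEAR = 1.6          # assumed yearly CPU price/performance improvement

sustained_mips = EVENTS_PER_YEAR * CPU_PER_EVENT_MIPS_S / SECONDS_PER_YEAR
storage_tb = EVENTS_PER_YEAR * EVENT_SIZE_MB / 1e6

print(f"Sustained CPU needed : {sustained_mips:,.0f} MIPS")
print(f"Raw simulated data   : {storage_tb:.1f} TB/year")

# Relative hardware cost of the same CPU capacity if purchased n years from now.
for n in range(6):
    print(f"  relative cost after {n} years: {1 / CPU_GAIN_PER_YEAR ** n:.2f}")
```

The point is not the absolute numbers but the shape of the extrapolation: the simulated sample size drives both CPU and storage linearly, while the assumed technology trend decides when the capacity becomes affordable.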
SICB measurements
• Mean raw event size: 500 kBytes
  • Breakdown by subdetector
  • Includes all GEANT hit information
  • If only the detector digitizings, the truth tracks and the relationships between the two are stored, this reduces to 200 kBytes/event
• CPU time: 10 kMIPS-seconds/event (see the worked example after this list)
  • No parameterization for calorimeter showers
• Memory requirement: 50 MBytes
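A back-of-the-envelope reading of these figures, assuming a hypothetical 100-MIPS reference processor with 256 MB of memory per node (neither is a SICB or LHC-B number):

```python
# What the SICB per-event figures imply for one processing node.
# The node parameters (100 MIPS, 256 MB RAM) are illustrative assumptions.

CPU_PER_EVENT_MIPS_S = 10_000   # 10 kMIPS-seconds per simulated event (SICB)
FULL_EVENT_KB = 500             # with all GEANT hit information
DIGI_EVENT_KB = 200             # digitizings + truth tracks + relations only
MEMORY_PER_JOB_MB = 50          # SICB memory requirement

NODE_MIPS = 100                 # assumed reference processor
NODE_RAM_MB = 256               # assumed memory per node

seconds_per_event = CPU_PER_EVENT_MIPS_S / NODE_MIPS
jobs_per_node = NODE_RAM_MB // MEMORY_PER_JOB_MB
events_per_node_day = jobs_per_node * 86_400 / seconds_per_event

print(f"{seconds_per_event:.0f} s/event on a {NODE_MIPS}-MIPS CPU")
print(f"{jobs_per_node} concurrent jobs fit in {NODE_RAM_MB} MB of RAM")
print(f"~{events_per_node_day:.0f} simulated events/node/day")
print(f"saving per event if only digitizings are kept: "
      f"{FULL_EVENT_KB - DIGI_EVENT_KB} kB ({1 - DIGI_EVENT_KB / FULL_EVENT_KB:.0%})")
```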
Requirements from tasks
• For each of:
  • Software triggers
  • Reconstruction
  • Calibration and alignment
  • Analysis
• Detailed input from each subsystem (an illustrative record template follows this list):
  • Description of algorithms
  • Dataflow dependencies
  • Input and output data types and volume, calibration, …
  • Reliability
    • Quality assurance, monitoring, documentation, …
  • Performance
    • CPU requirements, frequency, rejection factors, …
• Input to computing model
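One way to collect this input consistently would be a fixed per-task record; the template below is purely illustrative, and its field names and example values are not taken from the LHC-B notes.

```python
# Illustrative template for collecting per-task requirements from a subsystem.
# Field names and the example entry are hypothetical placeholders.
from dataclasses import dataclass, field

@dataclass
class TaskRequirements:
    subsystem: str                  # e.g. "Calorimeter" (placeholder)
    task: str                       # "trigger", "reconstruction", "calibration", "analysis"
    algorithm_description: str
    input_types: list[str]          # data types consumed
    output_types: list[str]         # data types produced
    depends_on: list[str] = field(default_factory=list)  # dataflow dependencies
    cpu_mips_s_per_event: float = 0.0
    event_frequency_hz: float = 0.0
    rejection_factor: float = 1.0
    qa_notes: str = ""              # quality assurance / monitoring / documentation

# Hypothetical example entry:
example = TaskRequirements(
    subsystem="Calorimeter",
    task="trigger",
    algorithm_description="shower-shape selection (placeholder)",
    input_types=["raw data"],
    output_types=["event tag data"],
    cpu_mips_s_per_event=50.0,
    event_frequency_hz=1e5,
    rejection_factor=10.0,
)
print(example)
```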
Requirements from data
• Identify types of data in terms of contents
  • Raw data (real and simulated)
  • Event tag data
  • Reconstruction objects
  • Analysis objects
  • Calibration data, configuration data, detector description
• For each data type:
  • Estimate volume (see the roll-up sketch after this list)
  • Understand access patterns
• Input to computing model
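A minimal roll-up of such volume estimates might look like the sketch below; apart from the raw-event sizes quoted from SICB, every event count and per-event size is a placeholder.

```python
# Rolling up per-type volume estimates into a yearly total.
# Only the raw-event sizes echo the SICB figures; everything else is assumed.

per_type = {
    # type: (events/year assumed, kB/event assumed)
    "raw data (real)":        (1e9, 200),   # digitizings-level size from SICB
    "raw data (simulated)":   (1e8, 500),   # full GEANT hit information
    "event tag data":         (1e9, 1),
    "reconstruction objects": (1e9, 100),
    "analysis objects":       (1e9, 10),
}

total_tb = 0.0
for dtype, (n_events, kb) in per_type.items():
    tb = n_events * kb / 1e9          # kB -> TB
    total_tb += tb
    print(f"{dtype:24s} {tb:8.0f} TB/year")
print(f"{'total':24s} {total_tb:8.0f} TB/year")
```

Feeding estimates of this kind into the computing model, together with the access patterns, is what turns the data-type list into storage and network requirements.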
How to store and access a PByte of data?
• Database approach
  • ODBMS
  • Direct access via queries (cf. AltaVista)
  • Hierarchical storage
• Where is the data physically?
  • All at CERN, duplicated at regional centres, or fully distributed
  • Sociological constraints
  • Network constraints (see the transfer-time sketch after this list)
• Fast networking:
  • 10^7 Internet users within a few years; will bandwidth follow?
  • Security?
  • Internet vs. guaranteed bandwidth
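To give a feeling for the network constraint, the sketch below computes how long a PByte takes to move over a sustained wide-area link; the link speeds are illustrative assumptions, not planned LHC-B figures.

```python
# How long does it take to move a PByte over a wide-area link?
# The link speeds below are illustrative assumptions.

PBYTE_BITS = 1e15 * 8           # one PByte expressed in bits

for mbit_s in (34, 155, 622, 2_500):      # assumed sustained link speeds
    seconds = PBYTE_BITS / (mbit_s * 1e6)
    print(f"{mbit_s:5d} Mbit/s sustained -> {seconds / 86_400:7.0f} days per PByte")
```

Even optimistic sustained rates give transfer times measured in months, which is why the choice between keeping everything at CERN, duplicating at regional centres, or relying on guaranteed bandwidth is a real design decision rather than a detail.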
How to analyse a PByte of data?
• 10^6 CERN Units = 3000 quad Pentium II (300 MHz) (unpacked in the sketch after this list)
  • Current production farms have 100 nodes
    • Scalability?
  • 60 MCHF at today's prices
    • Technology trends?
• Follow industry standards
  • Commodity components
    • Cheap
    • Possibility to mix & match components
      • CPU, disks, memory, video displays, network cards…
      • Software libraries, databases, tools
• Follow technology evolution closely
  • To avoid inappropriate decisions
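The sketch below simply unpacks the figures on this slide (CERN Units per node, cost per node, and the scale-up relative to a 100-node farm); no numbers beyond those quoted above are used.

```python
# Unpacking the slide's figures: 10^6 CERN Units = 3000 quad Pentium II (300 MHz)
# nodes at ~60 MCHF, compared with today's ~100-node production farms.

TOTAL_CERN_UNITS = 1e6
NODES_NEEDED = 3000          # quad Pentium II (300 MHz) boxes
TOTAL_COST_MCHF = 60
CURRENT_FARM_NODES = 100     # typical production farm today

print(f"{TOTAL_CERN_UNITS / NODES_NEEDED:.0f} CERN Units per quad node")
print(f"{TOTAL_COST_MCHF * 1000 / NODES_NEEDED:.0f} kCHF per node")
print(f"scale-up over a current farm: x{NODES_NEEDED / CURRENT_FARM_NODES:.0f}")
```

The factor of 30 over today's largest production farms is what makes scalability, commodity components and technology tracking first-order questions for the baseline model.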
Computing facilities
• Event filter farm and online reconstruction
  • Also available for reprocessing/simulation
  • Homogeneous online/offline computing architecture
• Simulation farms
  • Can (should?) be distributed
• Data server(s)
  • Storage of raw data at 20 MB/s, ODBMS, HSS (see the volume estimate after this list)
  • Regional centres?
• Analysis farms
  • Close to the data?
• Desktop
  • Collaborative tools, SDE, analysis tools
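For orientation, writing raw data at 20 MB/s translates into the yearly volume below, assuming an effective data-taking time of 10^7 s per year (the running time is an assumption, not a slide figure):

```python
# Yearly raw-data volume implied by a sustained 20 MB/s into the data server.
# The 1e7 s/year effective running time is an assumption.

RAW_RATE_MB_S = 20
RUNNING_SECONDS_PER_YEAR = 1e7   # assumed effective data-taking time

raw_tb_per_year = RAW_RATE_MB_S * RUNNING_SECONDS_PER_YEAR / 1e6   # MB -> TB
print(f"~{raw_tb_per_year:.0f} TB of raw data per year")
print(f"~{raw_tb_per_year * 5 / 1000:.0f} PB after five years of running")
```

Under these assumptions the raw data alone approaches the PByte scale discussed on the previous slides within a few years of running, before reconstruction and analysis objects are added.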
Conclusion
• Collect detailed requirements from subsystems
  • Dataflow requirements
  • CPU requirements
  • …
• Evaluate requirements (and constraints) of the computing model
  • Data storage
  • Data access
  • Computing facilities
  • …
• Summarised in (draft) documents