
Distributed Data Access and Analysis for Next Generation HENP Experiments Harvey Newman, Caltech


Presentation Transcript


  1. Distributed Data Access and Analysis for Next Generation HENP Experiments
  • Harvey Newman, Caltech
  • CHEP 2000, Padova, February 10, 2000

  2. LHC Computing: Different from Previous Experiment Generations
  • Geographical dispersion of people and resources
  • Complexity: the detector and the LHC environment
  • Scale: Petabytes per year of data; ~5000 Physicists, 250 Institutes, ~50 Countries
  • Major challenges associated with:
    • Coordinated use of distributed computing resources
    • Remote software development and physics analysis
    • Communication and collaboration at a distance
  • R&D: a new form of distributed system: the Data Grid

  3. Four Experiments: The Petabyte to Exabyte Challenge
  • ATLAS, CMS, ALICE, LHCb
  • Higgs and new particles; Quark-Gluon Plasma; CP Violation
  • Data written to "tape": ~5 Petabytes/Year and up (1 PB = 10^15 Bytes)
  • 0.1 to 1 Exabyte (1 EB = 10^18 Bytes) by ~2010 (~2020?), total for the LHC experiments

  4. To Solve: the LHC "Data Problem"
  • While the proposed LHC computing and data handling facilities are large by present-day standards,
  • They will not support FREE access, transport or processing for more than a minute part of the data
  • Balance between proximity to large computational and data handling facilities, and proximity to end users and more local resources for frequently-accessed datasets
  • Strategies must be studied and prototyped, to ensure both acceptable turnaround times and efficient resource utilisation
  • Problems to be Explored:
    • How to meet the demands of hundreds of users who need transparent access to local and remote data, in disk caches and tape stores
    • Prioritise hundreds of requests from local and remote communities, consistent with local and regional policies
    • Ensure that the system is dimensioned/used/managed optimally, for the mixed workload

  5. MONARC General Conclusions on LHC Computing
  • Following discussions of computing and network requirements, technology evolution, projected costs, support requirements, etc.:
  • The scale of LHC "Computing" requires a worldwide effort to accumulate the necessary technical and financial resources
  • A distributed hierarchy of computing centres will lead to better use of the financial and manpower resources of CERN, the Collaborations, and the nations involved, than a highly centralized model focused at CERN
  • The distributed model also provides better use of physics opportunities at the LHC by physicists and students
  • At the top of the hierarchy is the CERN Center, with the ability to perform all analysis-related functions, but not the capacity to perform them all completely
  • At the next step in the hierarchy is a collection of large, multi-service "Tier1 Regional Centres", each with 10-20% of the CERN capacity devoted to one experiment
  • There will be Tier2 or smaller special-purpose centres in many regions

  6. Bandwidth Requirements Estimate (Mbps) [*]
  • BW Utilized Per Physicist (and Peak BW Used): 1998: 0.05 - 0.25 (0.5 - 2); 2000: 0.2 - 2 (2 - 10); 2005: 0.8 - 10 (10 - 100)
  • BW Utilized by a University Group: 1998: 0.25 - 10; 2000: 1.5 - 45; 2005: 34 - 622
  • BW to a Home Laboratory or Regional Center: 1998: 1.5 - 45; 2000: 34 - 155; 2005: 622 - 5000
  • BW to a Central Laboratory Housing One or More Major Experiments: 1998: 34 - 155; 2000: 155 - 622; 2005: 2500 - 10000
  • BW on a Transoceanic Link: 1998: 1.5 - 20; 2000: 34 - 155; 2005: 622 - 5000
  • [*] ICFA Network Task Force; see http://l3www.cern.ch/~newman/icfareq98.html
  • Circa 2000: predictions roughly on track: "universal" BW growth by ~2X per year; 622 Mbps on European and transatlantic links by ~2002-3; Terabit/sec US backbones (e.g. ESNet) by ~2003-5
  • Caveats: distinguish raw bandwidth and effective line capacity; maximum end-to-end rate for individual data flows; "QoS"/IP has a way to go
  • D388, D402, D274

  7. CMS Analysis and Persistent Object Store
  [Diagram: CMS online/offline data flow. L1, L2/L3 and "L4" filtering, slow control and detector monitoring feed a Persistent Object Store (Object Database Management System); simulation, calibrations, group analyses and user analysis; common filters and pre-emptive object creation; on-demand object creation]
  • Data organized in a(n object) "hierarchy": Raw, Reconstructed (ESD), Analysis Objects (AOD), Tags
  • Data Distribution (a toy placement sketch follows below):
    • All raw, reconstructed and master parameter DBs at CERN
    • All event TAGs and AODs, and selected reconstructed data sets, at each regional center
    • HOT data (frequently accessed) moved to RCs
  • Goal of location and medium transparency
  • C121
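The distribution rules above amount to a simple placement policy over the object hierarchy: RAW data and the master parameter DBs stay at CERN, every regional centre holds the full TAG and AOD samples plus selected reconstructed sets, and "hot" data migrates out to the RCs. A toy encoding of that policy, with class and method names that are mine rather than CMS's:

```java
// Toy encoding of the data placement rules listed on this slide.
// The enum values follow the slide; the method and names are illustrative only.
public class DataPlacement {
    enum Tier { RAW, ESD, AOD, TAG }

    /** Is data of this tier normally resident at a Regional Centre? */
    static boolean atRegionalCentre(Tier tier, boolean selectedOrHot) {
        switch (tier) {
            case TAG:
            case AOD: return true;             // all event TAGs and AODs at each RC
            case ESD: return selectedOrHot;    // selected reconstructed sets, plus HOT data
            case RAW: return false;            // raw data and master parameter DBs stay at CERN
            default:  return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("AOD at RC?            " + atRegionalCentre(Tier.AOD, false));
        System.out.println("Unselected ESD at RC? " + atRegionalCentre(Tier.ESD, false));
    }
}
```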

  8. GIOD Summary
  • GIOD has:
    • Constructed a Terabyte-scale set of fully simulated CMS events and used these to create a large OO database
    • Learned how to create large database federations
    • Completed the "100" (to 170) MByte/sec CMS Milestone
    • Developed prototype reconstruction and analysis codes, and Java 3D OO visualization demonstrators, that work seamlessly with persistent objects over networks
    • Deployed facilities and database federations as useful testbeds for Computing Model studies
  [Figure: Hit, Track and Detector objects in the visualization]
  • C51, C226

  9. Data Grid Hierarchy (CMS Example)
  • 1 TIPS = 25,000 SpecInt95; one PC (today) = 10-15 SpecInt95
  • Online System: bunch crossings every 25 nsec, 100 triggers per second, event size ~1 MByte; ~PBytes/sec off the detector, ~100 MBytes/sec into the Offline Farm (~20 TIPS) (checked in the sketch below)
  • Tier 0: CERN Computer Center, fed at ~100 MBytes/sec, with HPSS mass storage
  • Tier 1: Regional Centers (Fermilab ~4 TIPS; France, Germany, Italy), each with HPSS; reached from CERN at ~622 Mbits/sec or by air freight
  • Tier 2: Tier2 Centers, ~1 TIPS each, at ~2.4 Gbits/sec
  • Tier 3: Institute servers (~0.25 TIPS) with a physics data cache, at ~622 Mbits/sec; physicists work on analysis "channels", each institute has ~10 physicists working on one or more channels, and data for these channels should be cached by the institute server
  • Tier 4: Workstations, at 100 - 1000 Mbits/sec
  • E277
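The rates quoted on this slide are self-consistent: 100 triggers per second at ~1 MByte per event gives the ~100 MBytes/sec into the offline farm, and of order a Petabyte of raw data per year for a nominal 10^7 seconds of running. A minimal back-of-the-envelope check (the 10^7 s live time and ~12.5 SI95 per PC are my assumptions, not figures from the slide):

```java
// Back-of-the-envelope check of the slide's data rates (assumed 1e7 s live time per year).
public class LhcDataRates {
    public static void main(String[] args) {
        double triggerRateHz = 100;   // events accepted per second (slide)
        double eventSizeMB   = 1.0;   // ~1 MByte per event (slide)
        double liveSeconds   = 1e7;   // assumed annual live time, not on the slide

        double rateMBps  = triggerRateHz * eventSizeMB;   // ~100 MB/s
        double annualPB  = rateMBps * liveSeconds / 1e9;  // MB -> PB
        double pcsPerTips = 25000.0 / 12.5;               // 1 TIPS = 25,000 SI95; PC ~10-15 SI95

        System.out.printf("Offline input rate: %.0f MB/s%n", rateMBps);
        System.out.printf("Raw data per year:  ~%.1f PB%n", annualPB);
        System.out.printf("PCs per TIPS:       ~%.0f (at ~12.5 SI95/PC)%n", pcsPerTips);
    }
}
```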

  10. LHC (and HEP) Challenges of Petabyte-Scale Data
  • Technical Requirements:
    • Optimize use of resources with next generation middleware
    • Co-locate and co-schedule resources and requests
    • Enhance database systems to work seamlessly across networks: caching/replication/mirroring
    • Balance proximity to centralized facilities, and to end users for frequently accessed data
  • Requirements of the Worldwide Collaborative Nature of Experiments:
    • Make appropriate use of data analysis resources in each world region, conforming to local and regional policies
    • Involve scientists and students in each world region in front-line physics research
    • Through an integrated collaborative environment
  • E163, C74, C292

  11. Time-Scale: CMS Recent "Events"
  • A PHASE TRANSITION in our understanding of the role of CMS Software and Computing occurred in October - November 1999
  • "Strong coupling" of the S&C task, Trigger/DAQ, Physics TDR, detector performance studies and other main milestones
  • Integrated CMS Software and Trigger/DAQ planning for the next round: May 2000 Milestone
  • Large simulated samples are required: ~1 Million events fully simulated, a few times during 2000, in ~1 month
  • A smoothly rising curve of computing and data handling needs from now on
  • Mock Data Challenges from 2000 (1% scale) to 2005
  • Users want substantial parts of the functionality formerly planned for 2005, starting now
  • A108

  12. Roles of Projects for HENP Distributed Analysis
  • RD45, GIOD: Networked object databases
  • Clipper/GC, FNAL/SAM: High speed access to Object or File data for processing and analysis
  • SLAC/OOFS: Distributed file system + Objectivity interface
  • NILE, Condor: Fault tolerant distributed computing with heterogeneous CPU resources
  • MONARC: LHC Computing Models: architecture, simulation, strategy, politics
  • PPDG: First distributed data services and Data Grid system prototype
  • ALDAP: Database structures and access methods for astrophysics and HENP data
  • GriPhyN: Production-scale Data Grid
  • APOGEE: Simulation/modeling, application + network instrumentation, system optimization/evaluation
  • A391, E277

  13. MONARC: Common Project
  • Models Of Networked Analysis At Regional Centers
  • Caltech, CERN, Columbia, FNAL, Heidelberg, Helsinki, INFN, IN2P3, KEK, Marseilles, MPI Munich, Orsay, Oxford, Tufts
  [Diagram: model circa 2005. CERN: 350k SI95, 350 TBytes disk, robot; N x 622 Mbits/s links (optional air freight) to Tier1 centres such as FNAL/BNL: 70k SI95, 70 TByte disk, robot; Tier2 Centre: 20k SI95, 20 TB disk, robot; universities (Univ 1 ... Univ M) connected at 622 Mbits/s] (collected in the sketch below)
  • PROJECT GOALS:
    • Develop "Baseline Models"
    • Specify the main parameters characterizing the Model's performance: throughputs, latencies
    • Verify resource requirement baselines: computing, data handling, networks
  • TECHNICAL GOALS:
    • Define the Analysis Process
    • Define RC Architectures and Services
    • Provide Guidelines for the final Models
    • Provide a Simulation Toolset for further Model studies
  • F148
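To make the circa-2005 baseline concrete, the site capacities and link speeds in the diagram can be collected as the inputs a simulation run would consume. A minimal illustrative container; the class and field names are mine, only the numbers come from the slide:

```java
// Illustrative container for the circa-2005 baseline parameters shown on the slide.
// Names are hypothetical; only the numbers come from the slide. Requires Java 16+ (records).
public class BaselineModel2005 {
    record Site(String name, int cpuKSI95, int diskTB, boolean tapeRobot, int linkMbps) {}

    public static void main(String[] args) {
        Site[] sites = {
            new Site("CERN (Tier0/1)",   350, 350, true, 622),  // N x 622 Mbits/s into CERN
            new Site("FNAL/BNL (Tier1)",  70,  70, true, 622),
            new Site("Tier2 Centre",      20,  20, true, 622),
        };
        for (Site s : sites)
            System.out.printf("%-18s %4dk SI95  %4d TB disk  link %d Mbps%n",
                              s.name(), s.cpuKSI95(), s.diskTB(), s.linkMbps());
    }
}
```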

  14. MONARC Working Groups/Chairs
  • "Analysis Process Design": P. Capiluppi (Bologna, CMS)
  • "Architectures": Joel Butler (FNAL, CMS)
  • "Simulation": Krzysztof Sliwa (Tufts, ATLAS)
  • "Testbeds": Lamberto Luminari (Rome, ATLAS)
  • "Steering": Laura Perini (Milan, ATLAS), Harvey Newman (Caltech, CMS); & "Regional Centres Committee"

  15. MONARC Architectures WG: Regional Centre Facilities & Services
  • Regional Centres Should Provide:
    • All technical and data services required to do physics analysis
    • All Physics Objects, Tags and Calibration data
    • Significant fraction of raw data
    • Caching or mirroring of calibration constants
    • Excellent network connectivity to CERN and the region's users
    • Manpower to share in the development of common validation and production software
    • A fair share of post- and re-reconstruction processing
    • Manpower to share in ongoing work on Common R&D Projects
    • Excellent support services for training, documentation, troubleshooting at the Centre or remote sites served by it
    • Service to members of other regions
  • Long Term Commitment for staffing, hardware evolution and support for R&D, as part of the distributed data analysis architecture

  16. MONARC and Regional Centres
  • MONARC RC FORUM: representative meetings quarterly
  • Regional Centre planning well-advanced, with optimistic outlook, in the US (FNAL for CMS; BNL for ATLAS), France (CCIN2P3), Italy, UK
    • Proposals submitted late 1999 or early 2000
  • Active R&D and prototyping underway, especially in the US, Italy, Japan; and UK (LHCb), Russia (MSU, ITEP), Finland (HIP)
  • Discussions in the national communities also underway in Japan, Finland, Russia, Germany
  • There is a near-term need to understand the level and sharing of support for LHC computing between CERN and the outside institutes, to enable the planning in several countries to advance. MONARC uses the traditional 1/3 : 2/3 sharing assumption.

  17. Regional Center Architecture Example, by I. Gaines (MONARC)
  • Data Import (network from CERN, network from Tier 2 and simulation centers, tapes) and Data Export (CERN, Tier 2, local institutes, tapes, desktops)
  • Tape Mass Storage & Disk Servers; Database Servers
  • Production Reconstruction (Raw/Sim → ESD): scheduled, predictable; experiment/physics groups
  • Production Analysis (ESD → AOD, AOD → DPD): scheduled; physics groups
  • Individual Analysis (AOD → DPD and plots): chaotic; physicists
  • Support services: Physics Software Development, R&D Systems and Testbeds, Info servers, Code servers, Web Servers, Telepresence Servers, Training, Consulting, Help Desk
  • C169

  18. Data Grid: Tier2 Layer
  • Create an Ensemble of (University-Based) Tier2 Data Analysis Centres
    • Site architectures complementary to the major Tier1 lab-based centers
    • Medium-scale Linux CPU farm, Sun data server, RAID disk array
    • Less need for 24 x 7 operation → some lower component costs
    • Less production-oriented, to respond to local and regional analysis priorities and needs
    • Supportable by a small local team and physicists' help
  • One Tier2 Center in Each Region (e.g. of the US)
    • Catalyze local and regional focus on particular sets of physics goals
    • Encourage coordinated analysis developments emphasizing particular physics aspects or subdetectors. Example: CMS EMU in Southwest US
  • Emphasis on training, and involvement of students at universities in front-line data analysis and physics results
  • Include a high quality environment for desktop remote collaboration
  • E277

  19. MONARC Analysis Process Example
  [Diagram: analysis process flow, starting from Slow Control/Calibration and DAQ/RAW data]

  20. MONARC Analysis Model Baseline: ATLAS or CMS "Typical" Tier1 RC
  • CPU Power: ~100 KSI95
  • Disk space: ~100 TB (see the volume sum below)
  • Tape capacity: 300 TB, 100 MB/sec
  • Link speed to Tier2: 10 MB/sec (1/2 of 155 Mbps)
  • Raw data: 1%, 10-15 TB/year
  • ESD data: 100%, 100-150 TB/year
  • Selected ESD: 25%, 5 TB/year [*]
  • Revised ESD: 25%, 10 TB/year [*]
  • AOD data: 100%, 2 TB/year [**]
  • Revised AOD: 100%, 4 TB/year [**]
  • TAG/DPD: 100%, 200 GB/year
  • Simulated data: 25%, 25 TB/year (repository)
  • [*] Covering five Analysis Groups, each selecting ~1% of the annual ESD or AOD data for a typical analysis
  • [**] Covering all Analysis Groups
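Adding up the per-category volumes gives a feel for why the baseline Tier1 quotes ~100 TB of disk plus ~300 TB of tape: roughly 155-210 TB of data arrive per year, so only part of it can stay disk-resident. A minimal check of that total (my arithmetic, not a MONARC number):

```java
// Sum the yearly data volumes listed for the "typical" Tier1 Regional Centre.
public class Tier1YearlyVolume {
    public static void main(String[] args) {
        double[][] tbPerYear = {    // {low, high} TB/year, taken from the slide
            {10, 15},    // Raw data (1%)
            {100, 150},  // ESD data (100%)
            {5, 5},      // Selected ESD (25%)
            {10, 10},    // Revised ESD (25%)
            {2, 2},      // AOD data (100%)
            {4, 4},      // Revised AOD (100%)
            {0.2, 0.2},  // TAG/DPD (100%)
            {25, 25},    // Simulated data (25%)
        };
        double lo = 0, hi = 0;
        for (double[] v : tbPerYear) { lo += v[0]; hi += v[1]; }
        System.out.printf("Total arriving per year: %.0f - %.0f TB "
                + "(vs. ~100 TB disk + 300 TB tape at the centre)%n", lo, hi);
    }
}
```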

  21. MONARC Testbeds WG: Isolation of Key Parameters
  • Some parameters measured, installed in the MONARC simulation models, and used in the first round of validation of the Models:
    • Objectivity AMS response time-function, and its dependence on:
      • Object clustering, page size, data class-hierarchy and access pattern
      • Mirroring and caching (e.g. with the Objectivity DRO option)
    • Scalability of the system under "stress":
      • Performance as a function of the number of jobs, relative to the single-job performance
    • Performance and bottlenecks for a variety of data access patterns
    • Tests over LANs and WANs
  • D235, D127

  22. MONARC Testbeds WG
  • Test-bed configuration defined and widely deployed
  • "Use Case" applications using Objectivity:
    • GIOD/JavaCMS, CMS Test Beams, ATLASFAST++, ATLAS 1 TB Milestone
    • Both LAN and WAN tests
  • ORCA4 (CMS): first "production" application; realistic data access patterns; Disk/HPSS
  • "Validation" Milestone carried out, with the Simulation WG
  • A108, C113

  23. MONARC Testbed Systems

  24. Multitasking Processing Model
  • A Java 2-based, CPU- and code-efficient simulation for distributed systems has been developed
  • Process-oriented discrete event simulation
  • Concurrent running tasks share resources (CPU, memory, I/O)
  • "Interrupt" driven scheme: for each new task, or when one task finishes, an interrupt is generated and all "processing times" are recomputed (sketched below)
  • It provides: an efficient mechanism to simulate multitask processing, and an easy way to apply different load balancing schemes
  • F148
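The interrupt-driven scheme can be pictured as a small event loop: whenever a task arrives or completes, the per-task CPU share changes, so the remaining work of every running task is advanced and the completion times are recomputed. This is only a sketch of the idea, not MONARC code; all names are invented:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of an interrupt-driven, process-oriented simulation of tasks sharing one CPU.
// At every "interrupt" (here, a task completion) the per-task rate changes, so all
// remaining processing times are recomputed. Illustrative only.
public class SharedCpuSim {
    static class Task {
        final String name;
        double remainingWork;          // arbitrary work units (e.g. SI95 * seconds)
        Task(String name, double work) { this.name = name; this.remainingWork = work; }
    }

    public static void main(String[] args) {
        double cpuPower = 10.0;        // work units per simulated second
        double now = 0.0;
        List<Task> running = new ArrayList<>();
        // All tasks assumed to arrive at t = 0 for simplicity.
        running.add(new Task("job-A", 100));
        running.add(new Task("job-B", 50));
        running.add(new Task("job-C", 50));

        while (!running.isEmpty()) {
            double share = cpuPower / running.size();     // equal CPU share per task
            // Next interrupt = earliest completion at the current share.
            Task next = running.get(0);
            for (Task t : running) if (t.remainingWork < next.remainingWork) next = t;
            double dt = next.remainingWork / share;

            // Advance every task over the interval, then handle the interrupt.
            for (Task t : running) t.remainingWork -= share * dt;
            now += dt;
            running.remove(next);
            System.out.printf("t=%6.2f s  %s finished; %d task(s) left%n",
                              now, next.name, running.size());
        }
    }
}
```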

  25. Role of Simulation for Distributed Systems
  • Simulations are widely recognized and used as essential tools for the design, performance evaluation and optimisation of complex distributed systems
    • From battlefields to agriculture; from the factory floor to telecommunications systems
  • Discrete event simulations with an appropriate and high level of abstraction are just beginning to be part of the HEP culture
    • Some experience in trigger, DAQ and tightly coupled computing systems: CERN CS2 models (event-oriented)
    • MONARC (process-oriented; Java 2 threads + class library)
  • These simulations are very different from HEP "Monte Carlos": "time" intervals and interrupts are the essentials
  • Simulation is a vital part of the study of site architectures, network behavior, and data access/processing/delivery strategies, for HENP Grid design and optimization

  26. Example: Physics Analysis at Regional Centres
  • Similar data processing jobs are performed in each of several RCs
  • Each Centre has the "TAG" and "AOD" databases replicated
  • The Main Centre provides the "ESD" and "RAW" data
  • Each job processes AOD data, and also a fraction of the ESD and RAW data

  27. Example: Physics Analysis

  28. Simple Validation Measurements: The AMS Data Access Case
  • Distribution of 32 jobs' processing time, simulation vs. measurement (4 CPUs, client on the LAN, raw data DB)
  • Simulation mean: 109.5; Measurement mean: 114.3
  • C113

  29. MONARC Phase 3
  • INVOLVING CMS, ATLAS, LHCb, ALICE
  • TIMELY and USEFUL IMPACT:
    • Facilitate the efficient planning and design of mutually compatible site and network architectures, and services, among the experiments, the CERN Centre and the Regional Centres
    • Provide modelling consultancy and service to the experiments and Centres
    • Provide a core of advanced R&D activities, aimed at LHC computing system optimisation and production prototyping
    • Take advantage of work on distributed data-intensive computing for HENP this year in other "next generation" projects [*], for example PPDG

  30. MONARC Phase 3
  • Technical Goal: System Optimisation: maximise throughput and/or reduce long turnaround
  • Phase 3 System Design Elements:
    • RESILIENCE, resulting from flexible management of each data transaction, especially over WANs
    • SYSTEM STATE & PERFORMANCE TRACKING, to match and co-schedule requests and resources, and to detect or predict faults
    • FAULT TOLERANCE, resulting from robust fall-back strategies to recover from bottlenecks or abnormal conditions
  • Base developments on large scale testbed prototypes at every stage: for example ORCA4
  • [*] See H. Newman, http://www.cern.ch/MONARC/progress_report/longc7.html

  31. MONARC Status
  • MONARC is well on its way to specifying baseline Models representing cost-effective solutions to LHC Computing
  • Discussions have shown that LHC computing has a new scale and level of complexity
  • A Regional Centre hierarchy of networked centres appears to be the most promising solution
  • A powerful simulation system has been developed, and is a very useful toolset for further model studies
  • Synergy with other advanced R&D projects has been identified
  • Important information and example Models have been provided, timely for the Hoffmann Review and discussions of LHC Computing over the next months
  • MONARC Phase 3 has been proposed:
    • Based on prototypes, with increasing detail and realism
    • Coupled to Mock Data Challenges in 2000

  32. The Particle Physics Data Grid (PPDG)
  • DoE/NGI Next Generation Internet Project: ANL, BNL, Caltech, FNAL, JLAB, LBNL, SDSC, SLAC, U.Wisc/CS
  • Site-to-Site Data Replication Service at 100 MBytes/sec: PRIMARY SITE (Data Acquisition, CPU, Disk, Tape Robot) to SECONDARY SITE (CPU, Disk, Tape Robot)
  • Multi-Site Cached File Access Service: PRIMARY SITE (DAQ, Tape, CPU, Disk, Robot), Satellite Sites (Tape, CPU, Disk, Robot) and Universities (CPU, Disk, Users)
  • Coordinated reservation/allocation techniques; Integrated Instrumentation; DiffServ
  • First Year Goal: optimized cached read access to 1-10 GBytes, drawn from a total data set of up to one Petabyte

  33. PPDG: Architecture for Reliable High Speed Data Delivery
  • Object-based and File-based Application Services
  • Resource Management
  • File Replication Index; Matchmaking Service
  • File Access Service; Cost Estimation (see the request-flow sketch below)
  • Cache Manager; File Fetching Service; Mass Storage Manager; File Movers at each site
  • End-to-End Network Services
  • Site Boundary / Security Domain
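One way to read this component list is as a request path: a file access request consults the replication index, uses matchmaking/cost estimation to pick a replica, and then either serves the file from the local cache or has a file mover fetch it across the site boundary. A minimal sketch of that flow, with entirely hypothetical interfaces named after the slide's boxes (not PPDG's actual APIs):

```java
import java.util.List;

// Hypothetical interfaces named after the components on the slide; illustrative only.
interface FileReplicationIndex { List<String> replicaSites(String logicalFile); }
interface CostEstimation       { double secondsToDeliver(String site, String logicalFile); }
interface CacheManager         { boolean isCached(String logicalFile); }
interface FileMover            { void fetch(String fromSite, String logicalFile); }

// File Access Service: serve from the local cache if possible, otherwise pick the
// replica with the lowest estimated delivery cost and schedule a transfer.
class FileAccessService {
    private final FileReplicationIndex index;
    private final CostEstimation cost;
    private final CacheManager cache;
    private final FileMover mover;

    FileAccessService(FileReplicationIndex i, CostEstimation c, CacheManager m, FileMover f) {
        index = i; cost = c; cache = m; mover = f;
    }

    void open(String logicalFile) {
        if (cache.isCached(logicalFile)) return;     // cache hit: no wide-area traffic
        String best = null;
        double bestCost = Double.MAX_VALUE;
        for (String site : index.replicaSites(logicalFile)) {
            double c = cost.secondsToDeliver(site, logicalFile);  // matchmaking by cost
            if (c < bestCost) { bestCost = c; best = site; }
        }
        if (best == null) throw new IllegalStateException("no replica of " + logicalFile);
        mover.fetch(best, logicalFile);              // file mover crosses the site boundary
    }

    public static void main(String[] args) {
        FileAccessService svc = new FileAccessService(
            f -> List.of("SLAC", "FNAL"),                     // replication index (fake)
            (site, f) -> site.equals("FNAL") ? 40.0 : 90.0,   // cost estimates (seconds, fake)
            f -> false,                                        // nothing cached yet
            (site, f) -> System.out.println("fetching " + f + " from " + site));
        svc.open("/store/run1234/events.db");
    }
}
```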

  34. Distributed Data Delivery and LHC Software Architecture
  • Software Architectural Choices:
    • Traditional, single-threaded applications: allow for data arrival and reassembly, OR
    • Performance-Oriented (Complex): I/O requests up-front; multi-threaded; data driven; responds to an ensemble of (changing) cost estimates (sketched below)
      • Possible code movement as well as data movement
      • Loosely coupled, dynamic
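A "performance-oriented" client in the sense of this slide issues all of its I/O requests up front and then processes whichever object arrives first, rather than blocking on each read in turn. A minimal sketch of that pattern using standard Java concurrency utilities; the readObject call and object names are placeholders, not an LHC framework API:

```java
import java.util.List;
import java.util.concurrent.*;

// Data-driven client: submit all I/O requests up front, then process results
// in completion order instead of request order. Placeholder I/O; illustrative only.
public class DataDrivenReader {
    static byte[] readObject(String objectId) throws InterruptedException {
        // Stand-in for a remote object/file read with variable latency.
        Thread.sleep((long) (Math.random() * 500));
        return new byte[1024];
    }

    public static void main(String[] args) throws Exception {
        List<String> wanted = List.of("evt-001", "evt-002", "evt-003", "evt-004");
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CompletionService<String> done = new ExecutorCompletionService<>(pool);

        for (String id : wanted)                  // all I/O requests issued up front
            done.submit(() -> { readObject(id); return id; });

        for (int i = 0; i < wanted.size(); i++) { // process in arrival (not request) order
            String id = done.take().get();
            System.out.println("processing " + id);
        }
        pool.shutdown();
    }
}
```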

  35. ALDAP (NSF/KDI) Project
  • ALDAP: Accessing Large Data Archives in Astronomy and Particle Physics
  • NSF Knowledge Discovery Initiative (KDI)
  • Caltech, Johns Hopkins, FNAL (SDSS)
  • Explore advanced adaptive database structures and physical data storage hierarchies for archival storage of next generation astronomy and particle physics data
  • Develop spatial indexes, novel data organizations, and distribution and delivery strategies, for efficient and transparent access to data across networks
    • Example: (Kohonen) maps for data "self-organization"
  • Create prototype network-distributed data query execution systems using Autonomous Agent workers
  • Explore commonalities and find effective common solutions for particle physics and astrophysics data
  • C226

  36. Beyond Traditional Architectures: Mobile Agents (Java Aglets)
  • "Agents are objects with rules and legs" -- D. Taylor
  • Mobile Agents: Reactive, Autonomous, Goal Driven, Adaptive (see the sketch below)
    • Execute asynchronously
    • Reduce network load: local conversations
    • Overcome network latency; some outages
    • Adaptive → robust, fault tolerant
    • Naturally heterogeneous
    • Extensible concept: agent hierarchies (agents, agent services, applications)
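The core idea can be sketched without any agent framework: ship the selection logic to where the data lives, let it "converse locally", and send back only a small summary instead of the data itself. The sketch below is plain Java for illustration and does not use the real Aglets API; all types and names are invented:

```java
import java.util.List;
import java.util.function.Predicate;

// Sketch of the mobile-agent idea: move the computation to the data and return a summary.
// Plain Java, illustrative only; not the IBM Aglets API.
public class AgentSketch {
    interface DataHost { List<double[]> events(); }     // a (remote) store of event records

    static class SelectionAgent {
        private final Predicate<double[]> cut;          // the "rules" the agent carries with it
        SelectionAgent(Predicate<double[]> cut) { this.cut = cut; }

        /** "Moves" to the host and runs locally: only the count crosses the network. */
        long runAt(DataHost host) {
            return host.events().stream().filter(cut).count();
        }
    }

    public static void main(String[] args) {
        DataHost regionalCentre = () -> List.of(
            new double[]{120.5, 2}, new double[]{88.0, 1}, new double[]{250.3, 4});
        SelectionAgent agent = new SelectionAgent(evt -> evt[0] > 100);  // e.g. a kinematic cut
        System.out.println("events passing cut: " + agent.runAt(regionalCentre));
    }
}
```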

  37. D9

  38. Grid Services Architecture [*]: Putting It All Together
  • Applications: HEP data-analysis related applications
  • Application Toolkits: remote data toolkit, remote computation toolkit, remote visualization toolkit, remote collaboration toolkit, remote sensors toolkit, ...
  • Grid Services: protocols, authentication, policy, resource management, instrumentation, data discovery, etc.
  • Grid Fabric: archives, networks, computers, display devices, etc., and associated local services
  • [*] Adapted from Ian Foster
  • E403

  39. Grid Hierarchy Goals: Better Resource Use and Faster Turnaround
  • Efficient resource use and improved responsiveness through:
    • Treatment of the ensemble of site and network resources as an integrated (loosely coupled) system
    • Resource discovery, query estimation (redirection), co-scheduling, prioritization, local and global allocations (see the sketch below)
    • Network and site "instrumentation": performance tracking, monitoring, forward-prediction, problem trapping and handling
  • Exploit superior network infrastructures (national, land-based) per unit cost for frequently accessed data
    • Transoceanic links are relatively expensive
    • Shorter links → normally higher throughput
  • Ease development, operation, management and security, through the use of layered, (de facto) standard services
  • E163, E345
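"Query estimation (redirection)" can be pictured as choosing, per request, the site whose estimated completion time (queueing plus data delivery at the usable bandwidth) is smallest. A toy cost model along those lines; all numbers and names are invented for illustration:

```java
import java.util.List;

// Toy query redirection: send the request to the site with the lowest estimated
// turnaround = current queue wait + data volume / usable bandwidth. Illustrative only.
// Requires Java 16+ (records).
public class QueryRedirector {
    record Site(String name, double queueWaitS, double usableMBps) {}

    static Site choose(List<Site> sites, double dataMB) {
        Site best = null;
        double bestT = Double.MAX_VALUE;
        for (Site s : sites) {
            double t = s.queueWaitS() + dataMB / s.usableMBps();
            if (t < bestT) { bestT = t; best = s; }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Site> sites = List.of(
            new Site("Tier0", 1800, 5),   // heavily loaded, but close to the data
            new Site("Tier1",  120, 10),  // replica available, lighter queue
            new Site("Tier2",   10, 2));  // idle, but thin network path
        System.out.println("Redirect 5 GB query to: " + choose(sites, 5000).name());
    }
}
```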

  40. Grid Hierarchy Concept: Broader Advantages
  • Greater flexibility to pursue different physics interests, priorities, and resource allocation strategies by region
    • Lower tiers of the hierarchy → more local control
  • Partitioning of users into "proximate" communities for support, troubleshooting, mentoring
  • Partitioning of facility tasks, to manage and focus resources
  • "Grid" integration and common services are a principal means for effective worldwide resource coordination
  • An opportunity to maximize global funding resources and their effectiveness, while meeting the needs for analysis and physics

  41. Grid Development Issues
  • Integration of applications with Grid middleware
    • Performance-oriented user application software architecture needed, to deal with the realities of data access and delivery
    • Application frameworks must work with system state and policy information ("instructions") from the Grid
  • ODBMSs must be extended to work across networks
    • "Invisible" (to the DBMS) data transport, and catalog update
  • Interfacility cooperation at a new level, across world regions
    • Agreement on the use of standard Grid components, services, security and authentication
    • Match with heterogeneous resources, performance levels, and local operational requirements
    • Consistent policies on use of local resources by remote communities
    • Accounting and "exchange of value" software


  43. Content Delivery Networks: a Web-Enabled Pre-"Data Grid"
  • Worldwide integrated distributed systems for dynamic content delivery, circa 2000
  • Akamai, Adero, Sandpiper server networks
    • 1200 → thousands of network-resident servers
    • 25 → 60 ISP networks
    • 25 → 30 countries
    • 40+ corporate customers
    • $25 B capitalization
  • Resource discovery
  • Build a "weathermap" of the server network (state tracking)
  • Query estimation; matchmaking/optimization; request rerouting
  • Virtual IP addressing
  • Mirroring, caching
  • (1200) autonomous-agent implementation

  44. The Need for a "Grid": the Basics
  • Computing for the LHC will never be "enough" to fully exploit the physics potential, or exhaust the scientific potential of the collaborations
  • The basic Grid elements are required to make the ensemble of computers, networks and storage management systems function as a self-consistent system, implementing consistent (and complex) resource usage policies
  • A basic "Grid" will be an information gathering, workflow guiding, monitoring and repair-initiating entity, designed to ward off resource wastage (or meltdown) in a complex, distributed and somewhat "open" system
  • Without such information, experience shows that effective global use of such a large, complex and diverse ensemble of resources is likely to fail, or at the very least be sub-optimal
  • The time to accept the charge to build a Grid, for sober and compelling reasons, is now
    • Grid-like systems are starting to appear in industry and commerce
    • But Data Grids on the LHC scale will not be in production until significantly after 2005

  45. Summary: The HENP/LHC Data Analysis Problem
  • Petabyte-scale compact binary data, and computing resources, distributed worldwide
  • Development of an integrated, robust networked data access, processing and analysis system is mission-critical
  • An aggressive R&D program is required to develop reliable, seamless systems that work across an ensemble of networks
  • An effective inter-field partnership is now developing through many R&D projects (PPDG, GriPhyN, ALDAP, ...)
  • HENP analysis is now one of the driving forces for the development of "Data Grids"
  • Solutions to this problem could be widely applicable in other scientific fields and industry, by LHC startup
    • National and multi-national "Enterprise Resource Planning"
