LIGO Scientific Collaboration Data Grid Status
Albert Lazzarini, Caltech, LIGO Laboratory
Trillium Steering Committee Meeting, 20 May 2004, University of Chicago
The LIGO Scientific Collaboration and the LSC Data Grid
• iVDGL has enabled the collaboration to establish a persistent production grid
• LSC Data Grid: 6 US sites + 3 EU sites (Birmingham, Cardiff/UK, AEI-MPII/Germany)
[Map of LSC Data Grid sites, including the EU sites Birmingham, Cardiff, and AEI/Golm]
* LHO, LLO: observatory sites
* LSC - LIGO Scientific Collaboration - iVDGL supported
LIGO Laboratory “Distributed” Tier 1 Center

Site    CPU (GHz)   Disk (TB)   Tape (TB)   Network
LHO         763         14         140      OC-3
LLO         391          7         140      OC-3
CIT        1150         30         500      GigE
MIT         244          9           -      FastE
TOTAL      2548         60         780

• These resources constitute the Tier 1 Center for the Collaboration
• They are dedicated to Collaboration use only
• A subset of the Globus Toolkit (GT) is used to move data and to provide the collaboration with access to computing resources (a sketch follows)
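The data-movement bullet above can be pictured with a minimal sketch. It assumes the Globus Toolkit's standard GridFTP client, globus-url-copy, invoked from Python, with a grid proxy already in place; the hostnames, paths, and frame file name are hypothetical placeholders, not actual LIGO Laboratory endpoints.

#!/usr/bin/env python
# Illustrative sketch only: bulk data movement between Tier 1 sites using the
# Globus Toolkit's GridFTP client (globus-url-copy). Hostnames and paths are
# hypothetical placeholders, not actual LIGO Laboratory endpoints.
import subprocess

# Hypothetical source/destination GridFTP endpoints (gsiftp:// URLs).
SRC = "gsiftp://ldas.example-lho.edu/data/S3/H1/"
DST = "gsiftp://ldas.example-cit.edu/archive/S3/H1/"

def transfer(src_url, dst_url):
    """Copy one file between GridFTP servers.

    Assumes a valid grid proxy already exists (e.g. via grid-proxy-init)."""
    subprocess.run(["globus-url-copy", src_url, dst_url], check=True)

if __name__ == "__main__":
    # Move a single (hypothetical) frame file from the observatory to Caltech.
    transfer(SRC + "H-R-751658000-16.gwf", DST + "H-R-751658000-16.gwf")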
LIGO Scientific Collaboration (LSC) Tier 2 - iVDGL/GRID3 sites
Site 1:
• 889 GHz CPU
• 34 TB RAID 5 storage
• OC-12 (622 Mbps) to Abilene
Site 2:
• 296 GHz CPU
• 64 TB storage (commodity IDE) for data
• OC-12 (622 Mbps) to Abilene
GEO 600 Computing Sites - European data grid sites (Birmingham, Cardiff, AEI/Golm)
Site 1:
• 272 GHz CPU
• 18 TB storage (RAID 5 and commodity IDE)
• GigE to SuperJANET (to GEANT to Abilene)
Site 2:
• 508 GHz CPU
• 18 TB storage (RAID 5 and commodity IDE)
• 10 Gbps to SuperJANET (to GEANT to Abilene)
Site 3:
• 670 GHz CPU
• 40 TB storage (commodity IDE)
• Fast Ethernet to G-WiN (to GEANT to Abilene)
LSC DataGrid Current Status
• Lightweight Data Replicator (LDR)
  - Built upon GT (RLS/GridFTP)
  - Provides automated data movement -- in production, 7x24
• LSC Data Grid Server deployed at 7 of 9 sites
  - Built on VDT + LSC-specific APIs
• Data grid in science production, 7x24
  - Most use is "conventional" use of GT and Condor: job submission to individual sites, manual data product migration, tracking, etc. (sketched below)
  - 35 LSC scientists hold digital credentials
  - VDT use is limited to the subgroup participating in GriPhyN/iVDGL
• Small-scale experiments running analysis jobs across multiple sites have been successful
  - Part of the SC2002 demo
  - The "Big Run" was pulled together as part of the SC2003 demo
  - Production work still requires real effort -- the hurdle is still too high to interest most scientific users
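A minimal sketch of the "conventional use" pattern noted above: authenticate once with a grid proxy, then submit an analysis job to a single site's Condor pool. grid-proxy-init and condor_submit are the standard GT/Condor tools; the executable name, arguments, and file names are hypothetical.

#!/usr/bin/env python
# Sketch of single-site job submission: one grid identity, one Condor pool.
# The executable, arguments, and file names are hypothetical examples.
import subprocess
import textwrap

SUBMIT_FILE = "analyze_segment.sub"

# A minimal Condor submit description (vanilla universe, one job).
submit_description = textwrap.dedent("""\
    universe   = vanilla
    # hypothetical analysis executable and arguments
    executable = analyze_segment
    arguments  = --gps-start 751658000 --gps-end 751659024
    output     = analyze.out
    error      = analyze.err
    log        = analyze.log
    queue
""")

if __name__ == "__main__":
    # 1. Create a short-lived proxy from the user's grid certificate
    #    (single sign-on with a single grid identity).
    subprocess.run(["grid-proxy-init"], check=True)

    # 2. Write the submit file and hand it to the site's Condor scheduler.
    with open(SUBMIT_FILE, "w") as f:
        f.write(submit_description)
    subprocess.run(["condor_submit", SUBMIT_FILE], check=True)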
Summary
• Developed data replication and distribution capabilities over the collaboration Data Grid
  - Robust, fast replication of data sets across 3 continents: 50+ TB over the internet
  - Provides data discovery mechanisms
• Deployed a persistent Data Grid for the international collaboration
  - Access to distributed computing power in the US and EU
  - Single sign-on using a single grid identity
  - Will eventually enable CPU-limited analyses as background jobs
  - Challenge: making full use of the inherent CPU capacity
• Implemented the use of virtual data catalogs for efficient (re)utilization of data as part of SC2002/03 (a toy sketch of the catalog idea follows below)
  - Tracking data locations and availability with catalogs
  - Data discovery, data transformations
  - Ongoing work for two classes of pipeline analyses
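To make the catalog idea concrete, here is a toy, in-memory sketch of a replica catalog that maps logical file names to physical copies at different sites. It is not the Globus RLS interface that LDR actually builds on; all hostnames and file names are hypothetical.

# Toy stand-in for a replica catalog: logical file names (LFNs) mapped to
# physical file URLs (PFNs) at different sites. Not the Globus RLS API;
# all names below are hypothetical.
from collections import defaultdict

class ToyReplicaCatalog:
    def __init__(self):
        self._replicas = defaultdict(set)   # LFN -> set of PFNs

    def register(self, lfn, pfn):
        """Record that a copy of logical file `lfn` exists at `pfn`."""
        self._replicas[lfn].add(pfn)

    def lookup(self, lfn):
        """Data discovery: return all known physical copies of `lfn`."""
        return sorted(self._replicas.get(lfn, ()))

# Example: the same logical frame file replicated at two (hypothetical) sites.
catalog = ToyReplicaCatalog()
catalog.register("H-R-751658000-16.gwf",
                 "gsiftp://ldas.example-lho.edu/data/H-R-751658000-16.gwf")
catalog.register("H-R-751658000-16.gwf",
                 "gsiftp://ldas.example-cit.edu/archive/H-R-751658000-16.gwf")
print(catalog.lookup("H-R-751658000-16.gwf"))

In the production system this role is played by the RLS catalogs that LDR populates as it replicates data.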
Plans
• Continue deployment and evolution of GRID3
  - LIGO will participate with iVDGL partners in the future Open Science Grid (OSG) initiative
• Focus must be on 7x24 production
  - S4 and S5 runs and data analysis, Q4 2004 - Q4 2005
  - This constrains use of, and access to, resources for grid research
  - It also provides an excellent opportunity for use-case studies of successes and failures
• Continue to integrate grid technologies (VDT)
  - Better, wider use of virtual data across all Data Grid sites
  - Publish data automatically as they become available; a prototype exists in non-grid-enabled internal code, so develop an API to expose this module
  - Job scheduling across the distributed grid
• Enhance/extend the persistent data grid for the collaboration
  - Add sites (Tier 3)
  - Add additional LIGO Laboratory (Tier 1) resources
  - Redeploy the SC2003 pipelines using more efficient script topologies (see the DAG sketch below)
  - Target: saturate distributed grid resources
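As a sketch of the "more efficient script topologies" item, the following generates a Condor DAGMan input file for a hypothetical two-stage pipeline: several independent per-segment analysis jobs followed by a single combine step. The DAGMan keywords (JOB, PARENT ... CHILD) are standard; the job names, submit files, and topology are illustrative assumptions, not the actual SC2003 pipelines.

# Generate a Condor DAGMan input file describing a simple pipeline topology:
# N parallel per-segment analysis jobs, then one combine job.
N_SEGMENTS = 4   # number of independent data segments to analyze in parallel

lines = []
for i in range(N_SEGMENTS):
    # Stage 1: per-segment analysis jobs that can run concurrently.
    lines.append(f"JOB analyze_{i} analyze_{i}.sub")
# Stage 2: a single job that combines the per-segment results.
lines.append("JOB combine combine.sub")
# Dependencies: combine runs only after every analysis job finishes.
parents = " ".join(f"analyze_{i}" for i in range(N_SEGMENTS))
lines.append(f"PARENT {parents} CHILD combine")

with open("pipeline.dag", "w") as f:
    f.write("\n".join(lines) + "\n")
# Submit with: condor_submit_dag pipeline.dag

The resulting DAG is handed to condor_submit_dag, which releases the parallel stage onto whatever slots are available, one route toward the "saturate distributed grid resources" target.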