A&T Advisory Board EDC Storage Area Network (SAN) April 19, 2004 Ken Gacke, Brian Sauer, Doug Jaton gacke@usgs.gov bsauer@usgs.gov djaton@usgs.gov
Agenda • Storage Architecture • EDC SAN Architectures • Digital Reproduction SAN • Landsat SAN • LPDAAC SAN • SAN Reality Check
Storage Architecture Direct Attached Storage • Difficult to reallocate resources • File sharing via Network (NFS, FTP) • NFS Performance/Security Issues • Duplicate copies of data • I/O Performance/Bandwidth • Data Availability Concerns • Server failure => no data access [Diagram: Linux, Sun, and SGI servers, each with direct attached storage, connected via Ethernet]
Storage Technology Disk Farm SAN Configuration • Hardware Solution • Fibre Channel Switch • Fibre Channel RAID • Logical Reallocation of Resources • File sharing via Network (NFS, FTP) • NFS Performance/Security Issues • Duplicate copies of data • I/O Performance/Bandwidth • Data Availability Concerns • Server failure => no data access [Diagram: Linux, Sun, and SGI servers attached to a fibre switch and fibre channel RAID, with Ethernet for network file sharing]
Storage Technology Clustered File System SAN Configuration • Hardware/Software Solution • Fibre Channel Switch • Fibre Channel RAID • Sharable File System • Logical Reallocation of Resources • Direct File Sharing • Single data copy • Efficient I/O • Scalable Bandwidth • High Data Availability [Diagram: Linux, Sun, and SGI servers, each running CXFS/CFS, attached via a fibre switch to a shared file system, with Ethernet interconnect]
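To make the contrast between the architectures concrete, below is a minimal sketch (hostnames and paths are hypothetical placeholders, not from the slides) of how the same product file is reached under the two models: in the direct-attached/NFS model it must be copied between servers before use, while under a clustered file system such as CXFS every host opens the single copy on the shared volume directly.

```python
"""Illustrative sketch only: contrasts staged (FTP) access with shared-SAN access.
Hostnames and paths are hypothetical placeholders."""
import ftplib
from pathlib import Path

def stage_via_ftp(host: str, remote_path: str, local_dir: str) -> Path:
    """Direct-attached model: the file must be copied to this server before use,
    consuming network bandwidth and creating a duplicate copy."""
    local_copy = Path(local_dir) / Path(remote_path).name
    with ftplib.FTP(host) as ftp:
        ftp.login()  # anonymous login, for the sketch only
        with open(local_copy, "wb") as out:
            ftp.retrbinary(f"RETR {remote_path}", out.write)
    return local_copy

def open_on_shared_san(shared_mount: str, relative_path: str):
    """Clustered file system model: every host sees the same volume, so the
    single copy is opened in place with no transfer step."""
    return open(Path(shared_mount) / relative_path, "rb")

if __name__ == "__main__":
    # Hypothetical usage on a client of either architecture:
    # staged = stage_via_ftp("dmf-server.example.gov", "/dmf/pds/product.tar", "/scratch")
    # shared = open_on_shared_san("/san/pds", "product.tar")
    pass
```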
Storage Architecture • SAN Goals • File sharing across multiple servers • Heterogeneous Platform Support (IRIX, Solaris, Linux) • Reduce number of file copies • Improve I/O efficiency • Reduce I/O requirements on server • Reduce Network load • Reduce time required to transfer data • Storage Management • Increase disk storage utilization • Logical reallocation of storage resources • Data Availability • Maintain data access when a server fails
Digital Reproduction CR1 SAN April 19, 2004 Ken Gacke SAIC Contractor gacke@usgs.gov
Historical Architecture – No SAN [Diagram: UniTree Server and Product Distribution server connected via Ethernet; tape drives: 8x9840, 2x9940B] Architecture Notes: 1) Data transfer via FTP 2) Duplicate storage on both servers 3) Multiple data file I/O required on both servers 4) System bandwidth constrained by Network
CR1 SAN Timeline • FY2002 – DMF Integration • DMF Production Release in December 2001 • Fully automated Data Migration process • 21TB migrated to DMF within 3 months • Data migration performed during off hours • Full data access maintained throughout the data migration period • FY2003 – CXFS Integration • SGI CXFS Certified SAN Configuration • CXFS on two IRIX servers (DMF and PDS) • SGI TP9400 1TB RAID • 8-port and 16-port Brocade fibre switches • SGI Installed on 10/8/02 • Tested DMF/CXFS configuration • Performed final CXFS testing • DMF/CXFS released to production on 11/5/02
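The automated data migration described above relied on SGI DMF's hierarchical storage management. As a rough illustration only (the file list, age threshold, and scheduling are hypothetical, and the actual EDC scripts are not shown here), an off-hours sweep might look like the sketch below, assuming DMF's standard dmput user command is available:

```python
"""Hypothetical off-hours migration sweep for DMF-managed filesystems.
Filesystem list and age threshold are illustrative placeholders."""
import subprocess
import time
from pathlib import Path

DMF_FILESYSTEMS = ["/dmf/edc", "/dmf/doqq", "/dmf/pds"]  # placeholder list
AGE_SECONDS = 7 * 24 * 3600  # migrate files untouched for a week (illustrative)

def migrate_cold_files(fs_root: str) -> None:
    now = time.time()
    for path in Path(fs_root).rglob("*"):
        if path.is_file() and now - path.stat().st_atime > AGE_SECONDS:
            # dmput -r migrates the file and releases its disk blocks once it is
            # safely on tape; the file stays visible and is recalled on access.
            subprocess.run(["dmput", "-r", str(path)], check=False)

if __name__ == "__main__":
    for fs in DMF_FILESYSTEMS:
        migrate_cold_files(fs)
```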
CR1 SAN Architecture [Diagram: DMF Server and Product Distribution server on Ethernet; 2Gb and 1Gb fibre channel links; tape drives: 8x9840, 2x9940B; disk cache filesystems: /dmf/edc 68GB, /dmf/doqq 547GB, /dmf/guo 50GB, /dmf/pds 223GB, /dmf/pdsc 1100GB]
CR1 SAN Summary • Data Storage • 2TB Disk Cache storing 67 Terabytes on the backend • 2.5 Million Files • 2003 Average Monthly Data Throughput • Data ingest – 3.5TB • Data retrieval – 9.6TB • Average data throughput of 8.5MB/sec (includes tape access) • Minimal System/Ops Administration • Single Vendor Solution • SGI Software, RAID, and Fibre Switches • CXFS supported on SGI IRIX, Linux, Solaris, Windows, etc
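As a quick sanity check on these figures (a back-of-envelope calculation, not from the original slides): the five /dmf filesystems in the architecture diagram sum to roughly the 2TB disk cache quoted, and the combined monthly ingest and retrieval volumes imply a sustained average of about 5MB/sec, comfortably below the 8.5MB/sec the system averages while transfers (including tape recalls) are actually running.

```python
# Back-of-envelope check of the CR1 SAN numbers (illustrative arithmetic only).
cache_gb = {"/dmf/edc": 68, "/dmf/doqq": 547, "/dmf/guo": 50,
            "/dmf/pds": 223, "/dmf/pdsc": 1100}
print(sum(cache_gb.values()), "GB total disk cache")        # ~1988 GB, i.e. ~2TB

monthly_tb = 3.5 + 9.6                                       # ingest + retrieval
seconds_per_month = 30 * 24 * 3600
sustained_mb_s = monthly_tb * 1e6 / seconds_per_month        # TB -> MB
print(f"{sustained_mb_s:.1f} MB/sec sustained average")      # ~5.1 MB/sec
```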
Landsat SAN April 19, 2004 Brian Sauer SAIC Contractor bsauer@usgs.gov
Landsat SAN Goals • Improve Overall Performance (3 Hrs -> 1.5 Hrs) • Maximize Disk Storage Through Shared Resources • Centralized Management (System Admin, Hardware Eng) • Overcome Old SCSI RAID Obsolescence (Ciprico 6900) • Utilize Existing Investment in Fibre Channel Storage • Existing Investment in Ciprico NetArrays • “Open” Solution • High Performance • Combined throughput of over 240MB/sec • High Availability • Total Usable Storage over 10TB • SGI, Linux and SUN Clients • Integrate in Phases as Tasks Become SAN Ready
Landsat SAN Overview • 13 TB of Raw Storage Utilizing Ciprico NetArrays • Three Brocade Switches • Eleven Linux and Six SGI Clients • Data Capture System Database Server (DDS) • Landsat Processing System (LPS) • Landsat Archive Management System (LAM) • Image Assessment System (IAS) • Landsat Product Generation System (LPGS) • ADIC StorNext File System Software • Shared High Performance File System • Qlogic Fibre Channel Host Bus Adapters
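With seventeen clients across multiple operating systems sharing one StorNext file system, a routine administrative task is verifying that every client actually has the shared volume mounted with the expected file system type. The sketch below is a hypothetical check (the hostnames, mount point, and "cvfs" type string are assumptions, not taken from the slides) run over ssh:

```python
"""Hypothetical health check: confirm each SAN client has the StorNext volume mounted.
Hostnames, mount point, and filesystem type string are illustrative assumptions."""
import subprocess

CLIENTS = ["lps1", "lpgs1", "ias1", "ddsserver"]   # placeholder hostnames
MOUNT_POINT = "/san/landsat"                        # placeholder mount point
FS_TYPE = "cvfs"                                    # assumed StorNext fs type string

def has_san_mount(host: str) -> bool:
    """Return True if the `mount` output on the host lists the shared volume."""
    result = subprocess.run(["ssh", host, "mount"],
                            capture_output=True, text=True, check=False)
    return any(MOUNT_POINT in line and FS_TYPE in line
               for line in result.stdout.splitlines())

if __name__ == "__main__":
    for client in CLIENTS:
        status = "OK" if has_san_mount(client) else "MISSING SAN MOUNT"
        print(f"{client}: {status}")
```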
Landsat OLD Data Flow [Diagram: Capture & Transfer System (CTS), DCS Database Server (DDS), L7 Raw CC Archive (LAM), L7 L0Ra Archive (LAM), L7 Processing System (LPS). Timings: 14-minute pass, 20-minute transfer, two 24-minute transfers, 85 minutes to process]
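Summing the stages in the old flow (a rough reconstruction that assumes the transfers and processing ran largely serially; the stage labels are inferred from the diagram) shows where the roughly three-hour end-to-end time in the SAN goals comes from, and why eliminating the FTP transfer steps approaches the 1.5-hour target:

```python
# Rough reconstruction of the old Landsat flow timing (minutes); stages assumed serial.
old_flow = {"pass": 14, "CTS transfer": 20,
            "transfer to processing": 24, "transfer to archive": 24,
            "L0Ra processing": 85}
print(sum(old_flow.values()) / 60, "hours end to end")   # ~2.8 hours

# With the SAN, the FTP transfer stages disappear and capture + processing remain.
san_flow = {"pass": 14, "L0Ra processing": 85}
print(sum(san_flow.values()) / 60, "hours end to end")   # ~1.65 hours
```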
Landsat SAN • Eliminated FTP Transfers [Diagram: satellite dish and LGS feeding capture systems CTS1, CTS2, CTS3 (each with RAID3); DDS, LPS, and LAM sharing RAW DATA and L0RA DATA on the SAN]
Landsat SAN Summary • Advantages • Able to share data in a high performance environment to reduce the amount of storage necessary • Increase in overall performance of the Landsat Ground System • Open Solution • Able to utilize existing equipment • Currently testing with other vendors • Disk availability for projects during off-peak times e.g. IAS • Disadvantages / Challenges • Challenge to integrate an open solution • CIPRICO RAID controller failures • Not good for real-time I/O • Challenge to integrate into multiple tasks • Own agenda and schedule • Individual requirements • Difficult to guarantee I/O
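Given the difficulty of guaranteeing I/O noted above, one practical step is simply measuring what a client can sustain to the shared volume before committing a near-real-time task to it. A minimal, hypothetical measurement sketch (the target path and transfer sizes are placeholders):

```python
"""Hypothetical sustained-write benchmark for a SAN client; path and sizes are placeholders."""
import os
import time

TARGET = "/san/landsat/benchmark.tmp"   # placeholder path on the shared volume
BLOCK = b"\0" * (8 * 1024 * 1024)       # 8MB blocks
TOTAL_BLOCKS = 128                      # ~1GB total

def sustained_write_mb_s() -> float:
    start = time.monotonic()
    with open(TARGET, "wb") as f:
        for _ in range(TOTAL_BLOCKS):
            f.write(BLOCK)
        f.flush()
        os.fsync(f.fileno())            # include the flush to disk in the timing
    elapsed = time.monotonic() - start
    os.remove(TARGET)
    return len(BLOCK) * TOTAL_BLOCKS / (1024 * 1024) / elapsed

if __name__ == "__main__":
    print(f"{sustained_write_mb_s():.1f} MB/sec sustained write")
```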
LP DAAC SAN Forum April 19, 2004 Douglas Jaton SAIC Contractor djaton@usgs.gov
LP DAAC Data Pool – Phase I SAN Goals Phase I – "Data Pool" Implementation in early FY03 • Access/Distribution Method (ftp site): • Support increased electronic distribution • Reduce need to pull data from archive silos • Reduce need for order submissions (and media/shipping costs) • Give science and applications users timely, direct access to data, including machine access • Allow users to tailor their data views to more quickly locate the data they need "The Data Pool SAN infrastructure effectively acts as a subset archive of the full ECS archive"
LP DAAC Data Pool (SAN) Configuration • Data Pools are an additional subset "inventory" of science data (granule, browse, metadata) that resides in a separate inventory database, with the physical files resident on a local storage area network (SAN = 44TB) • STK D178 RAID racks with 1 Sun E450 metadata server • Data Pool inventory is managed via a 2nd Sybase inventory database • Data Pool contents are populated from the primary ECS archive • Subscriptions can be fully qualified, with population occurring at insert time in the primary ECS archive (a function of ingest) (forward population) • Historical data loads from the primary ECS archive via query (historical population capability) in support of science or user requirements • NASA's intent is to grow the on-line holdings into a "working copy" of the most popular data • Dataset "Collections" belong to "Groups", are configured for "N" days of persistence, and are automatically removed at expiration (rolling archive concept) • Keeping this 2nd archive synchronized with the primary has been problematic and has increased O&M costs • Data Pool Web client(s) and/or anonymous ftp site access are used to navigate contents, browse, access, and download data products. Directory structure used: • /datapool/<mode>/<collect grp>/<esdt.version_id>/<acq date> e.g. /datapool/ops/astt/ast_l1b.001/1999.12.31
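A small sketch of how the directory convention above maps granule metadata to a Data Pool path (a hypothetical helper for illustration; the actual ECS software that builds these paths is not shown here):

```python
from datetime import date

def datapool_path(mode: str, collection_group: str, esdt: str,
                  version_id: int, acq_date: date) -> str:
    """Build a path following /datapool/<mode>/<collect grp>/<esdt.version_id>/<acq date>.
    Hypothetical helper, not the actual ECS implementation."""
    return (f"/datapool/{mode}/{collection_group}/"
            f"{esdt}.{version_id:03d}/{acq_date:%Y.%m.%d}")

# Reproduces the example from the slide:
print(datapool_path("ops", "astt", "ast_l1b", 1, date(1999, 12, 31)))
# -> /datapool/ops/astt/ast_l1b.001/1999.12.31
```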
LP DAAC Data Pool Contents & Access Science Data: • ASTER L1B Group (TERRA) • ASTER collection over U.S. States and Territories (no billing!) • MODIS Group (TERRA & AQUA) • 8 day rolling archive of daily data for MODIS • 12 months of data for higher level products • Most 8-day, 16-day, and 96-day products Access Methods: • Anonymous FTP Site • Web Client interface(s) to navigate & browse data holdings via Sybase inventory database • Public Access: http://lpdaac.usgs.gov/datapool/datapool.asp
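For the anonymous FTP access route, a minimal client sketch follows (the hostname and remote directory below are hypothetical placeholders, not the actual Data Pool FTP address):

```python
"""Minimal anonymous-FTP listing/download sketch for a Data Pool style site.
The hostname and remote directory are hypothetical placeholders."""
import ftplib

HOST = "ftp.example.usgs.gov"                       # placeholder, not the real address
REMOTE_DIR = "/datapool/ops/astt/ast_l1b.001/1999.12.31"

with ftplib.FTP(HOST) as ftp:
    ftp.login()                                     # anonymous login
    ftp.cwd(REMOTE_DIR)
    granules = ftp.nlst()                           # list granule files in the directory
    if granules:
        with open(granules[0], "wb") as out:        # download the first file as a demo
            ftp.retrbinary(f"RETR {granules[0]}", out.write)
```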
LP DAAC Data Pool – Phase II SAN Goals Phase II FY04 – Optimize System Throughput (systemic resource): • Maximize Disk Storage Through Shared Resources • Centralized Management (System Admin, Hardware Engr) of disk • High Performance fibre channel connections • SGI, Linux and SUN Clients • Decrease turn-around time for production and distribution orders • Integrate SAN into ECS subsystems in phases as tasks become SAN ready/capable • Granules will be served from the SAN (Data Pool) if available, rather than staged from tape, meaning less thrashing of the archives for popular datasets • Effectively allows for more ingest bandwidth due to less archive drive contention • The trick here is to maintain rule sets for popular data to minimize silo thrashing • Less copying of data – no need for dedicated read-only caches across ingest, archive staging, production, media (PDS), distribution (ftp push & pull) "Fully utilize the SAN infrastructure effectively across the sub-systems of the full ECS archive"
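The "serve from the Data Pool if available, otherwise stage from tape" behavior described above can be pictured as a simple check-then-fallback, sketched below with hypothetical helpers (the real ECS staging services are not shown):

```python
"""Hypothetical sketch of Phase II granule access: prefer the SAN Data Pool copy,
fall back to staging from the tape archive only when it is absent."""
import os

def stage_from_tape_archive(granule_id: str, dest: str) -> str:
    """Placeholder for the (slow) silo recall path; not the real ECS call."""
    raise NotImplementedError("tape staging request would be issued here")

def acquire_granule(granule_id: str, datapool_path: str, work_dir: str) -> str:
    """Return a local path for the granule, avoiding a tape recall when possible."""
    if os.path.exists(datapool_path):
        # Granule already resident on the SAN Data Pool: no silo access,
        # and no extra copy into a dedicated staging cache.
        return datapool_path
    # Otherwise fall back to the archive; this is the contention to minimize.
    dest = os.path.join(work_dir, f"{granule_id}.hdf")
    return stage_from_tape_archive(granule_id, dest)
```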
SAN Reality Check April 19, 2004 Brian Sauer SAIC Contractor bsauer@usgs.gov
EDC SAN Experience • Technology Infusion • TSSC understands this new technology • Bring it in at the right level and at the right time to satisfy USGS programmatic requirements • SAN technology is not a one-size-fits-all solution set • Need to balance complexity vs. benefits • Project Requirements Differ • Size of SAN (Storage, Number of Clients, etc.) • Open System Versus Single Vendor • Experiences Gained • Provides high performance shared storage access • Provides better manageability and utilization • Provides flexibility in reallocating resources • Requires trained Storage Engineers • Complex architecture, especially as the number of nodes increases
EDC SAN Reality Check • SAN Issues • Vendors typically oversell SAN architecture • Infrastructure costs • Hardware – Switches, HBAs, Fibre Infrastructure • Software • Maintenance • Hardware/Software maintenance • Labor • Disk maintenance higher than tape • Power & cooling of disk vs. tape • Complex Architecture • Requires additional/stronger System Engineering • Requires highly skilled System Administration • Lifecycle is significantly shorter with disk vs. tape.
EDC SAN Reality Check • SAN Issues • Difficult to share resources among projects in an enterprise environment • Ability to fund large shared infrastructure has historically been problematic for EDC • Ability to allocate and guarantee performance to projects (storage, bandwidth, security, peak vs. sustained) • Scheduling among multiple projects would be challenging • Not all projects require a SAN • SAN will not replace the Tape Archive(s) anytime soon • Direct attached storage may be sufficient for many projects