Infrastructure

Presentation Transcript


  1. Infrastructure Shane Canon, Leadership Computing Facility, Oak Ridge National Laboratory, U.S. Department of Energy

  2. Summary • LCF Roadmap • Infrastructure for the Petascale • Networking • File Systems • Archival Storage • Data Analytics

  3. Hardware Roadmap As it looks to the future, the NCCS expects to lead the accelerating field of high-performance computing. Upgrades will boost Jaguar’s performance fivefold—to 250 teraflops—by the end of 2007, followed by installation of a petascale system in 2009.

  4. Network • Shifting to a hybrid InfiniBand/Ethernet network • The InfiniBand-based network helps meet the bandwidth and scaling needs of the center • The wide-area network will scale to meet user demand using currently deployed routers and switches • Roadmap: 2007 (100 TF) 60 GB/s LAN, 3 GB/s WAN; 2008 (250 TF) 200 GB/s LAN, 3 GB/s WAN; 2009 (1000 TF) 200 GB/s LAN, 4 GB/s WAN

  5. NCCS network roadmap summary • Ethernet core scaled to match wide-area connectivity and the archive • InfiniBand core scaled to match the center-wide file system and data transfer • [Diagram: Ethernet core, O(10 GB/s), and InfiniBand core, O(100 GB/s), linking Jaguar, Baker, Lustre, the gateway, HPSS, and Viz]

  6. Center-Wide File System (Spider) • [Diagram: Spider connected to NFS servers, data analysis and visualization resources, Phoenix (Cray X1E), Jaguar (Cray XT3), Baker, HPSS, and the wide-area networks (ESnet, USN, TeraGrid, Internet2, NLR)] • 2007: 1 PB, 30 GB/s (aggregate) • 2008: 10 PB, 200 GB/s (aggregate)

  7. Center-Wide File System (Spider) • Increase scientific productivity by providing a single repository for simulation data • Connects to all major LCF resources • Connected to both the InfiniBand and Ethernet networks • Potentially becomes the file system for the 1000 TF system • Roadmap: 2007 (100 TF) 100 TB, 10 GB/s; 2008 (250 TF) 1 PB, 30 GB/s; 2009 (1000 TF) 10 PB, 200 GB/s
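
A back-of-envelope reading of these bandwidth figures: the short Python sketch below estimates how long moving a data set onto Spider would take at each roadmap phase. It uses only the aggregate rates on this slide; the 100 TB data-set size is a hypothetical example, not a number from the presentation.

    # Back-of-envelope: time to move a data set onto the center-wide file
    # system at each roadmap phase (aggregate rates from this slide).
    # The 100 TB data-set size is an assumed example value.

    roadmap_gb_s = {2007: 10, 2008: 30, 2009: 200}   # aggregate GB/s
    dataset_tb = 100                                 # assumed size in TB

    for year, gb_s in roadmap_gb_s.items():
        minutes = dataset_tb * 1000 / gb_s / 60      # 1 TB = 1000 GB
        print(f"{year}: {dataset_tb} TB at {gb_s} GB/s ~ {minutes:.0f} min")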

  8. Center-Wide File System (Spider) • Lustre-based file system • Can natively use the InfiniBand network • Already running on today's XT3 with 10k+ clients • The external system will use the routing capability built into Lustre's networking layer to route between InfiniBand and SeaStar/Gemini • An external system has already been demonstrated on current XT systems (*End of FY, in production)
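
The routing mentioned above is provided by Lustre's networking layer, LNET, which can forward traffic between fabrics through dedicated router nodes. The Python sketch below assembles the kind of modprobe options a client behind such routers might use; the network names, gateway NIDs, and helper function are illustrative assumptions, not the center's actual configuration, and on an XT system the client-side network type would be the Cray SeaStar/Portals (later Gemini) driver rather than tcp.

    # Minimal sketch of LNET routing configuration (illustrative values only).
    # Clients on one fabric list router NIDs as gateways to the server fabric.

    def lnet_client_options(server_net="o2ib0", client_net="tcp0",
                            gateway_nids=("10.10.0.1", "10.10.0.2")):
        """Build an 'options lnet ...' modprobe line for clients that reach
        servers on server_net through LNET router nodes on client_net."""
        routes = "; ".join(f"{server_net} {nid}@{client_net}"
                           for nid in gateway_nids)
        return f'options lnet networks="{client_net}" routes="{routes}"'

    print(lnet_client_options())
    # options lnet networks="tcp0" routes="o2ib0 10.10.0.1@tcp0; o2ib0 10.10.0.2@tcp0"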

  9. Data Storage – Past Usage • Data growth is explosive • Doubling stored data every year since 1998! • Over 1 PB stored today, and adding almost 2 TB of data per day
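
The doubling claim can be made concrete with a small projection. In the sketch below, the 1998 starting volume (about 2 TB) is an assumption chosen so that nine annual doublings land near the roughly 1 PB reported for today; it is not a figure from the presentation.

    # Projection of stored data assuming it doubles every year since 1998.
    # The ~2 TB starting volume is an assumed value, chosen so that nine
    # doublings reach about 1 PB by 2007.

    start_year, start_tb = 1998, 2.0

    for year in range(start_year, 2010):
        stored_tb = start_tb * 2 ** (year - start_year)
        print(f"{year}: ~{stored_tb:,.0f} TB stored")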

  10. Archival Storage • HPSS software has already demonstrated the ability to scale to many PB • Add 2 silos per year • Tape capacity and bandwidth, and disk capacity and bandwidth, are all scaled to maintain a balanced system • Utilize new methods to improve data transfer speeds between the parallel file systems and the archival system • Roadmap: 2007 (100 TF) 4 PB tape capacity, 4 GB/s aggregate bandwidth; 2008 (250 TF) 10 PB, 8 GB/s; 2009 (1000 TF) 18 PB, 18 GB/s
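
One way to see what "scaled to maintain a balanced system" implies is to estimate the tape drive count behind each bandwidth target. The sketch below assumes roughly 120 MB/s per T10000 drive (the "Titanium A" figure on the HPSS slides later in this deck) and treats the aggregate target as tape bandwidth alone, which is a simplification since the real figure also involves disk movers.

    # Rough count of T10K tape drives implied by each aggregate bandwidth
    # target, assuming ~120 MB/s per drive. Illustrative simplification:
    # the published aggregate also includes disk-mover bandwidth.

    import math

    drive_mb_s = 120
    targets_gb_s = {2007: 4, 2008: 8, 2009: 18}

    for year, gb_s in targets_gb_s.items():
        drives = math.ceil(gb_s * 1000 / drive_mb_s)
        print(f"{year}: {gb_s} GB/s aggregate -> roughly {drives} drives")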

  11. Archival Storage • [Chart: archival storage growth by fiscal year; figures exclude older silos; values are mid-fiscal-year, in production]

  12. Data Analytics • Existing resources: visualization cluster (64 nodes, Quadrics) and end-to-end cluster (80 nodes, InfiniBand) • Recently deployed: 32 nodes with 4X DDR InfiniBand, connected to the center-wide file system

  13. Data Analytics – Strategies • Jaguar (250 TF, FY08): utilize a portion of the system for data analysis (50 TF / 20 TB) • Baker (FY08/09): utilize Jaguar as an analysis resource (250 TF / 50 TB) and provision a fraction of Baker for analysis

  14. Milestones – FY08 • First half FY08: perform a "bake-off" of storage for the center-wide file system; expand the IB network; demonstrate 1.5 GB/s sustained with a single OSS node (dual-socket QC); deploy HPSS upgrades • Second half FY08: select the storage system and procure the next phase of center-wide storage (200 GB/s); deploy the next phase of the center-wide file system
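
The 1.5 GB/s-per-OSS demonstration and the 200 GB/s procurement target together imply a rough object storage server count, sketched below. This is only an illustrative calculation from the milestone numbers, not a statement of how Spider was actually sized.

    # Rough sizing: OSS nodes implied by 1.5 GB/s sustained per OSS and a
    # 200 GB/s aggregate target. Real sizing would add headroom and redundancy.

    import math

    per_oss_gb_s = 1.5
    aggregate_gb_s = 200

    oss_nodes = math.ceil(aggregate_gb_s / per_oss_gb_s)
    print(f"~{oss_nodes} OSS nodes to sustain {aggregate_gb_s} GB/s "
          f"at {per_oss_gb_s} GB/s each")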

  15. Operations Infrastructure Systems – Now and Future Estimates

  16. Contacts Shane Canon, Technology Integration, Center for Computational Sciences, (864) 574-2028, canonrs@ornl.gov

  17. CCS WAN overview • [Diagram: Ciena CoreStream and CoreDirector optical equipment, Juniper T320 and T640 routers, a Force10 E600, and a Cisco 6509 connecting CCS and NSTG to external networks over internal 10G and 1G links] • OC-192 to Internet2 • OC-192 to ESnet • OC-192 to TeraGrid • 2 x OC-192 to DOE UltraScience Net • OC-192 to NSF CHEETAH
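
As a rough check on wide-area headroom, the sketch below sums the OC-192 connections listed on this slide, assuming the standard OC-192 line rate of about 9.95 Gb/s. It counts raw link capacity, not achievable application throughput.

    # Aggregate wide-area line rate from the OC-192 links on this slide,
    # assuming ~9.95 Gb/s per OC-192. Raw capacity, not goodput.

    oc192_gbit_s = 9.95
    links = {
        "Internet2": 1,
        "ESnet": 1,
        "TeraGrid": 1,
        "DOE UltraScience Net": 2,
        "NSF CHEETAH": 1,
    }

    total_gbit_s = oc192_gbit_s * sum(links.values())
    print(f"{sum(links.values())} OC-192 links ~ {total_gbit_s:.0f} Gb/s "
          f"({total_gbit_s / 8:.1f} GB/s) aggregate WAN line rate")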

  18. CCS Network 2007 • [Diagram: InfiniBand (SDR/DDR) and Ethernet links connecting Jaguar, Spider10, Spider60, Viz, HPSS, Devel, and E2E; individual link counts not reproduced in this transcript]

  19. CCS IB network 2008 • [Diagram: InfiniBand SDR/DDR links connecting Jaguar, Baker, Spider10, Spider240, Devel, E2E, Viz, and HPSS; link bundles range up to 300 DDR (50 DDR/link); individual connections not reproduced in this transcript]

  20. Code Coupling/Workflow Automation • [Diagram: GTC runs on the Tflop–Pflop Cray and M3D runs on 64 processors of the 160-processor end-to-end system; a 40 Gbps link carries data replication; the workflow includes large data analysis, monitoring routines, user monitoring, data archiving, data replication, and post-processing]

  21. HPSS – 2Q07 • HPSS 6.2; core services on an IBM p550 • 6 STK 9310 silos • 16 Dell 2950 servers • 16 STK T10K tape drives • 1 DDN S2A9550 (100 TB) • Replacing 9940B drives with T10K • 8 Dell 2950 disk movers (10 GbE, 4 Gb Fibre Channel) • 6 Dell 2850 tape movers (10 GbE, 2 Gb Fibre Channel, 9840 and 9940 drives) • 8 Dell 2950 tape movers (10 GbE, 4 Gb Fibre Channel, T10K drives) • DataDirect S2A9500 (38.4 TB) and S2A9500 SATA (100 TB) • Brocade switches: 2 SilkWorm 3800, 2 SilkWorm 3900, 1 SilkWorm 4100 • IBM Ndapi server (1 GbE) • Silos hold mixes of 9840A, 9840A-SCSI, 9940A, 9940B, and T10K drives • Tape drive characteristics: 9840A 10–35 MB/s, 20 GB; 9840C 30–70 MB/s, 40 GB; 9940B 30–70 MB/s, 200 GB; Titanium A (T10K) 120+ MB/s, 500 GB
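
The drive characteristics above also show why the 9940B-to-T10K migration pays off: each new cartridge holds more and streams faster. The sketch below compares the time to fill one cartridge at each drive's peak rate, an idealization that assumes uninterrupted streaming.

    # Time to fill one cartridge at peak rate, from the drive figures on this
    # slide. Assumes uninterrupted streaming at the top of each rate range.

    drives = {
        # name: (cartridge capacity in GB, peak rate in MB/s)
        "9840A": (20, 35),
        "9940B": (200, 70),
        "T10K (Titanium A)": (500, 120),
    }

    for name, (capacity_gb, rate_mb_s) in drives.items():
        minutes = capacity_gb * 1000 / rate_mb_s / 60
        print(f"{name}: {capacity_gb} GB at {rate_mb_s} MB/s "
              f"~ {minutes:.0f} min per cartridge")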

  22. HPSS – 1Q08 • HPSS 6.2; core services on an IBM p550 • 6 STK 9310 silos plus 2 SL8500 libraries • 30 Dell 2950 servers • 32 STK T10K tape drives • 2 DDN S2A9550 (100 TB) • Replacing 9940B drives with T10K • 14 Dell 2950 disk movers (10 GbE, 4 Gb Fibre Channel) • 6 Dell 2850 tape movers (10 GbE, 2 Gb Fibre Channel, 9840 and 9940 drives) • 16 Dell 2950 tape movers (10 GbE, 4 Gb Fibre Channel, T10K drives) • DataDirect S2A9500 (38.4 TB) and S2A9500 SATA (100 TB) • Brocade switches: 2 SilkWorm 3800, 2 SilkWorm 3900, 1 SilkWorm 4100 • IBM Ndapi server (1 GbE) • Tape drive characteristics: 9840A 10–35 MB/s, 20 GB; 9840C 30–70 MB/s, 40 GB; 9940B 30–70 MB/s, 200 GB; Titanium A (T10K) 120+ MB/s, 500 GB

  23. CCS network roadmap Summary: Hybrid Ethernet/InfiniBand (IB) network to provide both high-speed wide-area connectivity and very high-speed local-area data movement • FY 2006–FY 2007 • Scaled Ethernet to meet wide-area needs and current local-area data movement • Developed wire-speed, low-latency perimeter security to fully utilize 10 G production and research WAN connections • Began IB LAN deployment • FY 2007–FY 2008 • Continue building out the IB LAN infrastructure to satisfy the file system needs of the Baker system • Test Lustre on IB over the WAN for possible deployment

  24. Contacts Shane Canon, ORNL, (864) 574-2028, canonrs@ornl.gov
