Storage Transport Services and Business Continuity using TCP/IP: A Case Study
Overview • Business drivers for storage transport and replication • Current environments and challenges • A look at an IP solution • Run through the test cases • Summary
Business Drivers and Implications • Rapid recovery and timely resumption of critical operations following a wide-scale, regional disruption • Compliance with government regulations such as HIPAA that provide for the security and integrity of information http://www.hipaadvisory.com/regs/HIPAAprimer1.htm • Lower-cost alternative for long-distance data replication – most companies already have an IP infrastructure
Reason for Federal Regulations “…the events surrounding September 11, 2001, have altered business recovery and resumption expectations for purposes of ensuring the resilience of the U.S. financial system…” The Fed has summarized its desire for new regulations in a document titled “Draft Interagency White Paper on Sound Practices to Strengthen the Resilience of the U.S. Financial System.” http://www.federalreserve.gov/boarddocs/press/bcreg/2002/20020830/attachment.pdf
Current Solutions • Feasible only for the largest businesses • Costs • Complexity vs Expertise • Typically transports Fibre Channel directly over the medium • SONET • DWDM • Dark Fiber • High costs
Typical Storage Transport Today • [Diagram: FC ports at a Remote Office connected to Corporate HQ across a SONET or DWDM transport network] • Metro or long-haul distances • SONET • DWDM • Dark Fiber • Implementation: • Transparent Fibre Channel transport • Port extension or merging SANs
Today’s Challenges • Disparate SAN islands • Expensive to interconnect • SAN extension beyond the metro • Most replication is remote point-to-point • Limits cost-effectiveness and BCDR effectiveness • Keeping costs down • Is it possible to leverage existing technology? • Management complexity • Incompatible technologies • Expertise
An Alternative Solution - FCIP • FCIP = Fibre Channel over IP • Low cost – much of the equipment is typically in place today • Ubiquitous technology • Abstraction of underlying transport • Extension beyond the metro – Global reach
Example FCIP Environment • Transparently joining SAN islands or ports over the WAN • Transparent bridging of FC over TCP/IP • Extended distance (>2000 miles) • [Diagram: FC fabrics at Corporate HQ and remote sites bridged by FCIP through Cisco MDS 9200 Multilayer Fabric and Cisco MDS 9500 Multilayer Director switches across an IP network] • Several choices for the IP network • Packet over SONET • DSx – OCx TDM facilities • Metro Ethernet
What is FCIP? • [Protocol stack diagram: SCSI over FC data, encapsulated by FCIP, carried over TCP/IP] • FCIP is a mechanism that allows SAN islands and ports to be interconnected over IP networks • Each interconnection is called an FCIP link and can contain one or more TCP connections • Each end of an FCIP link is associated with a Virtual E_Port (VE_Port) • VE_Ports communicate with each other just like normally interconnected E_Ports, using SW_ILS: ELP, ESC, BF, RCF, FSPF, etc. • The result is a fully merged Fibre Channel fabric • TCP/IP is used as the underlying transport to provide congestion control and in-order delivery of error-free data
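To make the layering concrete, here is a minimal, purely illustrative Python sketch: a Fibre Channel frame is prefixed with an encapsulation header and handed to a TCP socket, which supplies the congestion control and in-order delivery noted above. The header shown (a bare length field) is invented for illustration and is not the real FC-BB-2/RFC 3821 FCIP encapsulation format.

```python
# Conceptual sketch only: shows the layering idea (an FC frame wrapped by an
# FCIP-style header and carried over TCP), NOT the real FC-BB-2/RFC 3821
# encapsulation. The field layout here is invented for illustration.
import socket
import struct

def fcip_encapsulate(fc_frame: bytes) -> bytes:
    """Prefix an FC frame with a toy length header, the way FCIP prefixes an
    encapsulation header before handing the frame to TCP."""
    return struct.pack("!I", len(fc_frame)) + fc_frame

def send_fc_frame(sock: socket.socket, fc_frame: bytes) -> None:
    # TCP provides congestion control and in-order, error-checked delivery,
    # which is why FCIP can stretch a fabric across a lossy WAN.
    sock.sendall(fcip_encapsulate(fc_frame))

def recv_fc_frame(sock: socket.socket) -> bytes:
    header = _recv_exact(sock, 4)
    (length,) = struct.unpack("!I", header)
    return _recv_exact(sock, length)

def _recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed TCP connection")
        buf += chunk
    return buf
```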
Sprint / Cisco / HDS Case Study • Goal was to demonstrate the feasibility of using FCIP to accurately and efficiently replicate large amounts of data across trans-continental distances • Main Objectives • Utilize existing IP and SONET infrastructures • FCIP to transport Fibre Channel • Metro to trans-continental distances • SAN extension • Storage Transport • Data Replication
Case Study Topology Overview • HDS 9910 storage array connected via a 2 Gb/s FC interface to one Cisco MDS 9509 • Cisco MDS 9509 1 Gb/s IP Services module connected to a Cisco Catalyst 6509, which in turn connected to a 3600-mile OC-3 Packet over SONET (POS) circuit • 128 GB Oracle database constructed using a Sun Fire V880 • HDS 9910 storage array running asynchronous TrueCopy • [Topology diagram: Burlingame, CA to Kansas City, 3600 miles round trip; at each end a SAN with Sun V880 and HDS 9910, an MDS with IP Storage module, a Catalyst 6509, and a DACS, linked by SONET]
Scenarios Evaluated • Scenario A – Trans-continental 3600-mile OC-3 (Overland Park, KS to Burlingame, CA and back) • Scenario B – Direct “back to back” connection • Scenario C – Metro distances (25 km, 50 km, 75 km) • All scenarios involved the same tests • Replication: time for the complete replication of the Oracle database • Database exercising: row selections, deletes, inserts, and updates • Direct file copy: copying individual ISO files
Equipment Used • Software used: • HDS TrueCopy - sync & async • Oracle Database load and update • Sequential File Copy • Windows IOmeter • VERITAS Volume Manager Mirror • Hardware used: • HDS 9910/9970 storage arrays • Cisco MDS 9509/9216 SAN switches with IPS-8 blade • Cisco Catalyst 6509/6513 & misc. optical network
Cisco IP Storage Switch Connections • Fibre Channel E_port connections via 50 µm fiber cables • Direct VE_port FCIP connection via a single fiber cable • IPS-8 VE_port to PacketSphere (latency injector) to IPS-8 VE_port • IPS-8 VE_port to Catalyst to Catalyst to IPS-8 VE_port • Catalyst-to-Catalyst trunk connections via • Direct short-range multi-mode fiber optic GE connection • Long-range ZX GBIC to fiber spools (25 km, 50 km, 75 km) • OC-3 POS WAN connection (KC to California and return)
Interesting Metrics and Theoretical Performance • Speed of light in fiber is approx. 5 µs/km • Length of a typical Fibre Channel frame in fiber is about 5 km • Potential maximum length of an IP storage frame <= 4 miles • Frames in a “full pipe” over 400 miles ~ 100 • Buffer space for a “full pipe” would be ~240 KB • Max single-directional throughput for 1 Gb/s Ethernet is calculated to be 105 MB/s • Max single-directional throughput for an OC-3 is calculated to be 15.5 MB/s • Bandwidth achievable may be limited by: • Distance-induced latency • Re-transmissions due to errors or dropped frames • Line speed/bandwidth
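The back-of-the-envelope figures above can be reproduced with a short script. The ~5 µs/km propagation delay and the ~5 km per-frame length are taken from the slide; the 2148-byte maximum FC frame size is a standard value assumed here. The outputs land in the same ballpark as the quoted numbers, with small differences due to rounding.

```python
# Reproduce the slide's "interesting metrics" for a few distances.
US_PER_KM = 5.0                 # propagation delay in fiber, microseconds per km
FC_FRAME_BYTES = 2148           # max FC frame (header + 2112-byte payload + CRC/EOF)
FC_FRAME_LENGTH_KM = 5.0        # physical length of one frame "on the wire"
MILES_TO_KM = 1.609

def one_way_latency_ms(miles: float) -> float:
    return miles * MILES_TO_KM * US_PER_KM / 1000.0

def frames_in_flight(miles: float) -> float:
    # How many full-size frames fit end-to-end in the fiber over this distance.
    return miles * MILES_TO_KM / FC_FRAME_LENGTH_KM

def buffer_to_fill_pipe_kb(miles: float) -> float:
    return frames_in_flight(miles) * FC_FRAME_BYTES / 1024.0

if __name__ == "__main__":
    for miles in (400, 1800, 3600):
        print(f"{miles:>5} mi: one-way {one_way_latency_ms(miles):6.1f} ms, "
              f"~{frames_in_flight(miles):5.0f} frames in flight, "
              f"~{buffer_to_fill_pipe_kb(miles):6.0f} KB to keep the pipe full")
```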
Single Outstanding I/O Example @ 3600 Miles • [Sequence diagram between hosts H1 and H2: Write Command, Xfer Rdy, Write Data, SCSI Status, each one-way trip incurring 36 ms of latency, 36 ms * 4 ~ 144 ms total]
I/O Details • The write command from H1 to H2 takes 36 ms • H1 gets the xfer-rdy after 72 ms • H1 sends the data (assume all data is sent instantaneously); it takes 36 ms to reach H2 • H1 gets the status after another 36 ms • Total time for the write I/O to complete is ~144 ms
I/O Calculations • OC-3 speed is 155 Mb/s • Total data sent in 144 ms (in bytes) = (155 * 10^6 * 144 * 10^-3) / 8 = ~2800 KB • So for each round-trip time (72 ms), data sent = ~1400 KB • The write command tested carries 56 KB of data • In the test scenario there were 4 blocks of 4 I/Os = 16 outstanding I/Os * 56 KB = 896 KB • But ~1400 KB must be outstanding at all times to keep the OC-3 link saturated
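A short sketch of the same saturation math, using only the figures quoted above, shows why 16 outstanding 56 KB writes cannot keep the OC-3 full:

```python
# Saturation math from the slide: how much data must be in flight to keep a
# 155 Mb/s OC-3 busy when every write spends a full 144 ms round trip, versus
# the offered load of 16 outstanding 56 KB writes.

OC3_BITS_PER_SEC = 155e6
IO_ROUND_TRIP_SEC = 0.144        # command + xfer-rdy + data + status, 4 x 36 ms
RTT_SEC = 0.072                  # one network round trip
WRITE_SIZE_KB = 56
OUTSTANDING_IOS = 16             # 4 blocks of 4 I/Os in the test scenario

bytes_per_io_time = OC3_BITS_PER_SEC * IO_ROUND_TRIP_SEC / 8   # ~2800 KB
bytes_per_rtt = OC3_BITS_PER_SEC * RTT_SEC / 8                 # ~1400 KB
offered_kb = OUTSTANDING_IOS * WRITE_SIZE_KB                   # 896 KB

print(f"link can carry ~{bytes_per_io_time / 1e3:.0f} KB per 144 ms I/O time")
print(f"link can carry ~{bytes_per_rtt / 1e3:.0f} KB per 72 ms round trip")
print(f"offered load: {offered_kb} KB outstanding -> "
      f"{'saturates' if offered_kb >= bytes_per_rtt / 1e3 else 'cannot saturate'} the OC-3")
```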
Case Study Results – Performance • The 128 GB Oracle database replicated successfully over the initial 1,800-mile point-to-point connection in 3 hours 8 minutes • The 3,600-mile loop took just over 4 hours • Tests proved data integrity at 100% • Response times of the local and remote disk arrays were comparable, which suggests that accessing the database remotely is feasible • Average throughput for the 128 GB database was near wire speed for OC-3
Case Study Results, continued • Average throughput for the 128 GB database replication was 13.3 MB/s: (128 GB * 1024 MB/GB) / (2 hr * 3600 sec/hr + 44 min * 60 sec/min) = 13.3 MB/sec • Real throughput observed was near wire rate throughout the majority of the data transfer (96%), with a drop coming in the last 4% of the data transferred • Additional testing using separate HDS arrays across 1800 miles produced results that were at wire rate throughout the transfer and did not show any signs of slowing towards the end • The number of outstanding I/Os was increased from 4 to 16 on the storage arrays to gain optimal performance on the WAN; this allowed the pipe to stay filled rather than sending data and waiting for the acknowledgement • Database response time of the local disk array and that of the remote disk array across the FCIP link were very comparable
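The 13.3 MB/s figure follows directly from the transfer time; a quick check using the numbers from the slide:

```python
# Quick check of the throughput arithmetic quoted above (figures from the slide).
size_mb = 128 * 1024                          # 128 GB expressed in MB
transfer_sec = 2 * 3600 + 44 * 60             # 2 hours 44 minutes
print(f"{size_mb / transfer_sec:.1f} MB/s")   # -> 13.3 MB/s
```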
Case Study Results, continued • With various latency values, a TrueCopy of the Oracle DB created a perfect copy • With no added latency (0.3 ms), the initial data rate was 43 MB/s • With 25 ms of latency, the initial data rate was 25 MB/s • With 50 ms of latency, the initial data rate was 17 MB/s • With fiber spools (25 km, 50 km, and 75 km), latency was 0.6 ms, 0.9 ms, and 1.2 ms respectively • Copying 100 ISO images created perfect copies (synchronous writes were badly affected by latency)
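As a rough, assumed sanity check (not a measurement from the case study), the fiber-spool latencies are close to what the ~5 µs/km figure predicts once the ~0.3 ms back-to-back baseline is added; the measured values run slightly higher, as expected with real connectors and equipment in the path:

```python
# Rough estimate of expected latency for each fiber spool: the 0.3 ms
# back-to-back baseline plus ~5 us/km of propagation in each direction.
BASELINE_MS = 0.3        # "no latency" back-to-back measurement from the slide
US_PER_KM = 5.0

for spool_km in (25, 50, 75):
    round_trip_ms = 2 * spool_km * US_PER_KM / 1000.0
    print(f"{spool_km} km spool: ~{BASELINE_MS + round_trip_ms:.2f} ms expected")
```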
Sprint July 8th Announcement • Demonstrated a Sprint competitive advantage • Clear lead in FCIP technology via the Sprint/Cisco/Hitachi partnership • All major internal organizations mobilized: Engineering, Product Management, Marketing • Most STS components already in place
Sprint, Cisco, Hitachi Data Systems Team to Achieve Data Storage Breakthrough: Industry first could improve customers’ business continuity efforts
OVERLAND PARK, Kan. – July 8, 2003 – Sprint, along with Hitachi Data Systems and strategic alliance partner Cisco, has achieved a technical breakthrough that could have major implications for how customers deploy effective business continuity strategies. The telecommunications giant, along with its partners, has successfully tested asynchronous data replication over an IP network, using Fibre Channel over IP (FCIP) technology at a distance of more than 3,600 miles.
“This service could enable customers such as banks to replicate their data at extremely remote locations using their existing IP connections.” – Oliver Valente, VP of Technology Development, Sprint
Acknowledgements • We gratefully recognize the individuals from Sprint, HDS, and Cisco whose hard work made this case study possible • Sprint: Ray Dickensheets, Matt Dixon, Mike Haddock, Audrey Harmon, Kelley Ireland, Nhan Tran, Collette Turnbaugh • Cisco: Bruce Winters, Jimmy Ho, Brian Heili • HDS: Tom Coleman, Dave Stratton
Summary • FCIP is a high-performance, low-cost alternative for BCDR applications • Metro to trans-continental+ distances • MAN and/or WAN applications • Synchronous replication over MAN distances • Asynchronous replication over longer WAN distances