Surfing Technology Curves
Steve Kleiman, CTO, Network Appliance Inc.
Book Plug
• The Innovator’s Dilemma: When New Technologies Cause Great Firms to Fail
• Clayton M. Christensen
About NetApp
• Two product lines:
  • Network Attached File Servers (a.k.a. filers)
  • Web proxy caches: NetCache
• Founded in 1992
• >$1B revenue run rate
• >70% CAGR since founding
• >120% last year
Filers: Fast, Simple, Reliable and Multi-protocol

System         | CPUs | SpecRate Result | Overall Resp. | Resp. @ Max Ops | Ops per … | Ops/FS | RAID
Sun E3500/4500 | 2    | 8,165           | 3.04          | 23.8            | 20.4      | 340    | no
HP-9000 N4000  | 4    | 15,270          | 1.91          | 3.7             | 10.4      | 318    | yes
NetApp 840     | 1    | 15,235          | 1.54          | 3.6             | 46        | 15,235 | yes
Filers: Fast, Simple, Reliable and Multi-protocol
• Disk management
  • Filer finds disks and organizes them into RAID groups and spares automatically
  • Simple addition of storage
  • Automatic RAID reconstruction
• Data management
  • Snapshots
  • SnapRestore
  • SnapMirror
• Simple upgrade
• Small command set
Filers: Fast, Simple, Reliable and Multi-protocol
• Built-in RAID
• Easy hardware maintenance
  • Hot plug disk, power, fans
  • Low MTTR
• Cluster failover
• Autosupport
• >99.995% measured field availability
Filers: Fast, Simple, Reliable and Multi-protocol
• NFS
• CIFS
  • CIFS and NFS attributes
• HTTP
• FTP
• DAFS
• Internet Cache
  • FTP
  • Streaming media
Network and Storage Bandwidth

Year | Storage      | Network    | Penalty
1992 | 10 MB/s      | 0.1 MB/s   | 100-to-1
1994 | 20 MB/s      | 1 MB/s     | 20-to-1
1996 | 40 MB/s      | 10 MB/s    | 4-to-1
1998 | 100 MB/s     | 100 MB/s   | 1-to-1
2001 | 200-400 MB/s | 1000 MB/s  | 0.2-to-1
The Appliance Revolution
• 1980s (general purpose): one UNIX or Windows/NT server handles applications, print, file service, routing, …
• 1990s (appliance based): a dedicated printer for print, a filer for file service, and a router/switch for routing
Appliance Philosophy
• Appliance philosophy breeds focus
  • External simplicity → internal simplicity
  • RISC argument
• Don’t have to be all things to all people
  • Limited compatibility constraints
  • Interfaces are bits on the wire
• Think different!
  • Can innovate with both software and hardware
Filer Architecture
• Commercial off-the-shelf chips
  • Any appropriate architecture: i486 → Pentium → Alpha 21064 → Alpha 21164 → PIII
• Board-level integration
  • 1 or more CPUs (4)
  • 1 or more PCI busses (4)
  • High-bandwidth switches
  • Multiple memory banks
  • Integrated I/O
  • NVRAM
[diagram: CPU, memory, PCI bus, and NVRAM]
Roads Not Taken
• No “unobtainium”
• Minimalist infrastructure
  • No special purpose busses
• No big MPs
  • Motherboards only: no cache-coherent backplanes
• No functionally distributed computers
• No special purpose networks (e.g. HIPPI)
• No block access protocols
Data ONTAP Architecture
[diagram: network media (ATM, GbE, FDDI, 100BT) carry NFS, CIFS, and HTTP over TCP/IP, and DAFS over VIPL on a VI NIC*; daemons, shells, commands, a Java Virtual Machine, and libraries sit above; the SK kernel hosts WAFL over RAID over the disk drivers (FC-AL, SCSI)]
* VI supported on FC (future: GbE, InfiniBand)
Data ONTAP
• Simple kernel
  • Message passing
  • Non-preemptive
• Sample optimizations
  • Checksum caching
  • Suspend/resume
  • Cache hit pass-through
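The sketch below illustrates the run-to-completion, suspend/resume style described above. It is a hypothetical Python model (the Kernel and read_request names are invented for illustration), not Data ONTAP code: each request runs until it finishes or explicitly suspends for I/O, so no preemption or locking is needed.

```python
# Hypothetical sketch of a non-preemptive, message-passing request loop.
# Each handler is a generator: it runs until it either finishes or yields
# ("suspends") while waiting for I/O, and is resumed when the I/O completes.
from collections import deque

class Kernel:
    def __init__(self):
        self.runnable = deque()   # handlers ready to run
        self.suspended = {}       # request id -> suspended handler

    def send(self, handler):
        self.runnable.append(handler)

    def run(self):
        while self.runnable:
            handler = self.runnable.popleft()
            try:
                req_id = next(handler)        # run until suspend or completion
                self.suspended[req_id] = handler
            except StopIteration:
                pass                          # request ran to completion

    def resume(self, req_id):
        self.runnable.append(self.suspended.pop(req_id))

def read_request(req_id, cache):
    if req_id in cache:
        print(f"req {req_id}: cache hit, replied immediately")  # hit pass-through
        return
    print(f"req {req_id}: miss, suspending for disk I/O")
    yield req_id                              # suspend until the disk completes
    print(f"req {req_id}: resumed, replied")

k = Kernel()
k.send(read_request(1, cache={1}))
k.send(read_request(2, cache={1}))
k.run()        # req 1 completes; req 2 suspends
k.resume(2)    # disk I/O for req 2 completes
k.run()
```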
WAFL: Write Anywhere File Layout
• Log-like write throughput
  • No segment cleaning (unlike LFS)
  • Write data allocated to optimize RAID performance
  • Delayed write allocation
• Active data is never overwritten (shadow paging)
  • On-disk data is always consistent
  • File system state is changed atomically (every 10 sec, by default)
• Client modification requests are logged to NVRAM
  • NVRAM log is replayed only on reboot
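A minimal Python sketch of the shadow-paging idea; the block-allocation model is invented for illustration and is not WAFL's actual on-disk format. Updates always land in freshly allocated blocks, and the file system changes state in one atomic root-pointer switch, so the on-disk image is consistent at all times.

```python
# Hypothetical sketch of WAFL-style shadow paging: modified data is written
# to newly allocated blocks, and the file system state changes atomically
# when the root pointer is switched to the new tree.
blocks = {}            # block number -> data (the "disk")
next_blkno = 0

def alloc(data):
    global next_blkno
    next_blkno += 1
    blocks[next_blkno] = data   # write to a fresh block; never overwrite
    return next_blkno

# Initial consistent tree: the root points at blocks A, B, C.
root = alloc({"file": [alloc("A"), alloc("B"), alloc("C")]})

def update(root, index, data):
    tree = dict(blocks[root])           # copy, never modify in place
    filemap = list(tree["file"])
    filemap[index] = alloc(data)        # new data goes to a new block
    tree["file"] = filemap
    return alloc(tree)                  # new root written last

new_root = update(root, 1, "B'")
root = new_root   # the atomic "consistency point": one pointer switch
# The old blocks still hold a complete consistent image (a snapshot).
```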
Wave 2: Memory-to-Memory Interconnects (a.k.a. NUMA, NORMA)
Problem
• Remove single points of failure
  • Without doubling hardware
  • While minimizing performance overhead
  • Without decreasing reliability
Clustered Failover Architecture
[diagram: Filer 1 and Filer 2, each with NVRAM, connected to the network, linked by a ServerNet interconnect, and both attached to shared Fibre Channel disk loops]
Memory-to-Memory Interconnects
• Efficient transfer model
  • Allows minimal overhead on receiver
• Scalable bandwidth
  • High-speed ASIC-based switching
  • Gigabit technology
• Open architecture
  • PCI, not coherent bus interface
  • Incorporates multiple technologies
• Relatively inexpensive
NVRAM Mirroring
• NVRAM is split into local and partner regions
• Data is assembled in NVRAM
• Data is DMAed from NVRAM to the equivalent offset in the remote node
• Client reply is sent when the log entry DMA completes
[diagram: each node’s NVRAM on the PCI bus, with DMA over ServerNet carrying log data to and from the partner]
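A toy model of the mirrored log, assuming the simplified Node/log_request structure shown here (invented names, not the real implementation): the client is acknowledged only after the entry exists in both NVRAMs, so the surviving partner can replay it on takeover.

```python
# Hypothetical sketch of the mirrored NVRAM log: each node's NVRAM is split
# into a local region and a partner region; a request is logged locally,
# copied ("DMAed") to the same offset in the partner's NVRAM, and only then
# acknowledged to the client.
class Node:
    def __init__(self, name):
        self.name = name
        self.local_log = []     # this node's own NVRAM log
        self.partner_log = []   # mirror of the partner's log
        self.partner = None

def log_request(node, entry):
    offset = len(node.local_log)
    node.local_log.append(entry)                 # assemble in local NVRAM
    # DMA to the equivalent offset in the partner's NVRAM region:
    assert len(node.partner.partner_log) == offset
    node.partner.partner_log.append(entry)
    return "ack"   # client reply sent only after the mirror DMA completes

def takeover(survivor):
    # On failover the survivor replays the partner region of its NVRAM.
    return list(survivor.partner_log)

f1, f2 = Node("filer1"), Node("filer2")
f1.partner, f2.partner = f2, f1
log_request(f1, "write(fileA, ...)")
print(takeover(f2))   # filer2 can replay filer1's logged request
```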
Leveraged Components
• Memory-to-memory interconnects
  • Low overhead, high bandwidth, cheap
• WAFL
  • Always-consistent file system
  • Built-in NVRAM logging/replay
• Fibre Channel disks
  • Two independent ports
• Single-function appliance software
  • Simple, low-overhead failover
The Consequences of Higher-speed Internet Access
• 200K-400K home cable head-end
  • Requires 1.5-3 Gbps access capability
  • 30% subscription rate, 20% online, minimum 128 Kbps BW (e.g. 400K homes × 30% × 20% × 128 Kbps ≈ 3 Gbps)
• Enterprise
  • Remote sites still connected by slow links
  • Require high-quality access to content
  • Overloaded web servers
• ISP
  • Require distribution and caching of large media files
Yet Another Appliance
[image: a Cisco router alongside a NetApp NetCache appliance]
NetCache
• HTTP/FTP proxy cache appliance
• Highly deployable
• Forward and reverse proxy
  • Transparency
  • Filtering
• iCAP
  • Enables value-added services
  • Virus scanning, transcoding, ad insertion, …
• Stream splitting
• Stream caching
• Content distribution
[chart: cacheable content over time, spanning static content, dynamic content, and streaming media]
SnapMirror
• Remote asynchronous mirroring
• Continuous incremental update
  • Only allocated blocks are transmitted
• Automatic resynchronization after disconnect
• Destination is always a consistent “snapshot” of the source
[diagram: a source filer mirroring to a destination filer across a WAN]
Creating a Snapshot
[diagram: disk blocks A, B, C, D; before the snapshot, the active FS points at all four; after the snapshot, the active FS and the snapshot share the same blocks; after a block update, the active FS points at a new block C’ while the snapshot still points at the original C]
WAFL: Block Map File
• Multiple bits per 4KB block
  • Column for allocated block in the active file system
  • Columns for allocated blocks in snapshots
• Taking a snapshot
  • Copy root inode
[diagram: block map rows for blocks 1-8, with columns for the active FS and snapshots S1-S3]
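A rough Python model of the block map idea (the set-per-block representation is illustrative; WAFL packs this into bits): a block stays allocated while the active file system or any snapshot references it, and taking a snapshot copies no data blocks.

```python
# Hypothetical sketch of a WAFL-style block map: each block has one bit for
# the active file system and one bit per snapshot. A block is free only when
# every bit is clear, so snapshots keep old blocks alive without copying data.
ACTIVE = "fs"

blockmap = {n: set() for n in range(1, 9)}   # block number -> set of users
for n in (1, 2, 3):
    blockmap[n].add(ACTIVE)                  # blocks 1-3 in the active FS

def take_snapshot(name):
    # Taking a snapshot just duplicates the root inode's view: every block
    # currently in the active FS also gets the snapshot's bit set.
    for users in blockmap.values():
        if ACTIVE in users:
            users.add(name)

def is_free(n):
    return not blockmap[n]

take_snapshot("S1")
blockmap[2].discard(ACTIVE)   # active FS rewrites block 2...
blockmap[4].add(ACTIVE)       # ...into newly allocated block 4
print(is_free(2))             # False: snapshot S1 still holds block 2
blockmap[2].discard("S1")     # delete S1's reference
print(is_free(2))             # True: block 2 can now be reused
```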
Consistent Image Propagation
• Fast network or slow modification rate
• Slow network or high modification rate
[diagram: source blocks 1-6 propagating to the destination, which holds only a partial set (e.g. 1, 4, 5, 6) until the transfer completes]
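A sketch of the incremental transfer under the assumption, consistent with the block map model above, that each snapshot is just the set of blocks it has allocated; the function and variable names are hypothetical.

```python
# Hypothetical sketch of SnapMirror's incremental update: compare the block
# maps of two snapshots and transmit only blocks that are allocated in the
# new snapshot but were absent from the previously transferred one.
def blocks_to_send(old_snap, new_snap):
    return sorted(new_snap - old_snap)

s1 = {1, 2, 3, 4, 5, 6}          # baseline snapshot already on destination
s2 = {1, 3, 4, 5, 6, 7, 8}       # next snapshot: 2 freed, 7 and 8 written
print(blocks_to_send(s1, s2))    # [7, 8] -- only new allocated blocks move
# The destination stays a consistent image of s1 until every block of s2
# has arrived, then switches atomically to s2.
```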
Wave 5: Local File Sharing and Virtual Interface Architecture
ISPs: Scalable Services
• Scalability
  • Scale compute power and storage independently
• Resiliency
• Cost
  • Commodity hardware and Open Systems standards
[diagram: the Internet or intranet feeds a load-balancing switch in front of application servers, which reach F760 file servers through a gigabit switch in the data center]
Database
• Better manageability
  • Offline backup with snapshots
  • Replication
  • Recovery from snapshots
  • Easy storage management
• Equal or better performance
  • Less retuning
Local File Sharing
• Geographically constrained
  • 1 or 2 machine rooms
• Mostly homogeneous clients
• Can be large or small
  • 1-100 machines
• Single administrative control
• High-performance applications
  • Web service, cache
  • Email, news
  • Database, GIS
Local File Sharing Architecture Characteristics
• Applications tend to avoid the OS
  • e.g. no virtual memory
• Applications tend to have an OS adaptation layer
• Different access protocol requirements
  • e.g. high-performance locking, recovery, streaming
What is VI?
• Virtual Interface (VI) Architecture
  • VI architecture organization
  • Promoted by Intel, Compaq and Microsoft
  • VI Developer’s Forum
• Standard capabilities
  • Send/receive message, remote DMA read/write
  • Multiple channels with send/completion queues
  • Data transfer bypasses kernel
  • Memory pre-registration
VI Architecture
[diagram: in user space, the application moves data directly through the VIPL library; control operations go from a KVIPL client through the kernel KVIPL module and a VI-compliant NIC driver down to a VI-compliant NIC in hardware]
VI-compliant Implementations
• Fibre Channel (FC-VI draft standard)
  • e.g. Troika, Emulex
• Giganet
• ServerNet II
• InfiniBand
  • Enables 1U MP heads
• Future: VI over TCP/IP
How VI Improves Data Transfer
• No fragmentation, reassembly, or realignment data copies
• No user/kernel boundary crossing
• No user/kernel data copies
• Data transfers directly to application buffers
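A toy model of why pre-registered memory eliminates the copies listed above; register_memory and rdma_write are invented stand-ins for illustration, not the real VIPL calls.

```python
# Toy model of VI-style direct data placement: the application pre-registers
# a buffer, and incoming data is "RDMA written" straight into it, with no
# kernel staging buffer and no user/kernel copy.
registered = {}   # handle -> pre-registered application buffer

def register_memory(buf):
    handle = id(buf)
    registered[handle] = buf   # real VI would pin and map these pages
    return handle

def rdma_write(handle, offset, data):
    buf = registered[handle]   # the "NIC" already knows this memory
    buf[offset:offset + len(data)] = data   # one transfer, zero extra copies
    return len(data)

app_buf = bytearray(64)        # stands in for an application I/O buffer
h = register_memory(app_buf)
rdma_write(h, 0, b"file data direct to user memory")
print(app_buf.rstrip(b"\x00").decode())
```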
Direct Access File System
[diagram: application buffers and a file access API sit over DAFS and the VIPL* API in user space; data moves directly between the VI-compliant NIC and application buffers, while control passes through the kernel VI NIC driver]
* VI Provider Layer specification maintained by the VI Developers Forum
DAFS Benefits
• File access protocol with implicit data sharing
• Direct application access
  • File data transfers directly to application buffers
  • Bypasses operating system
• File semantics
• Optimized for high throughput and low latency
• Consistent high-speed locking
• Graceful recovery/failover of clients and servers
  • Fencing
  • Enhanced data recovery
• Leverages VI for transport independence
DAFS vs. SAN

Wires                              | Block protocols | File protocols
Direct (direct transfer to memory) | SAN             | DAFS
Network (TCP/IP)                   | SCSI over IP    | NAS
Local attached                     | SCSI            |
Summary
• Wave 1: Filers
  • Technology: fast networks, commodity servers
  • Environment: appliance-ization
• Wave 2: Failover
  • Technology: memory-to-memory interconnects, dual-ported FC disks
  • Environment: 24x7 requirements
• Wave 3: NetCache
  • Technology: Internet, HTTP
  • Environment: high BW requirements, POP deployability
Summary
• Wave 4: SnapMirror
  • Technology: disk areal density, Fibre Channel, fast networks
  • Environment: cost of downtime for recovery
• Wave 5: DAFS
  • Technology: VI architecture
  • Environment: local file sharing