A Tentative Proposal for ISTORE-2 July 18, 2000 David A. Patterson pattrsn@cs.berkeley.edu (510) 642-6587 University of California, Berkeley Winfried W. Wilcke wilcke@almaden.ibm.com (408) 927-2139 Almaden Research Center Richard C. Booth rcbooth@us.ibm.com (408) 927-1879 Almaden Research Center
Underlying Beliefs... • Commodity components are quickly winning the server wars • Gigabit Ethernet will win everything • x86 Processors • Linux OS will prosper • Large servers (100-10k nodes) will be quite common - and most are storage centric • What matters most: • Ease of management, density of nodes and seamless geographical interconnect
Generations of IStore • IStore = IStore-1: Present UCB Project • IStore-2: Joint Research Prototype • ~2000 nodes • Split between UCB, IBM and others • Hardware similar to IStore-1 • Focus on real applications and management software • Operational YE 2001 • Follow-on Work
Talk Outline • Project Goals • Applications • Research Topics • Hardware Architecture • Development Schedule • Working Relationships • Next Steps
Candidate Applications • Research Focus • NOAA Severe Weather Warning (R. Arps, ARC) • Fast Image Recognition (J. Malik, UCB) • Commercial Focus • Scalable E-business server (IGS) - a must ! • Deep Searching of Entire Web; Webfountain (N. Pass) • (tbd) Large Scale Network Attached Server (J. Palmer) • (tbd) Speech Recognition Farms for Phone-based Special Web-services
NOAA Severe Weather... (Ron Arps) • Doppler radar enables detection of violent tornadoes and of the wind shear that causes plane crashes • Doubled warning time for residents in Oklahoma during '99 class 5 outbreaks • Goal: 15 minutes avg. warning time in 2004 • Eventually 120 radar sites will be established • Matches well with I-Store characteristics • Needs scalable local storage/processing plus seamless transfer of data on geographical scale, manageable from one site
Webfountain (Norm Pass) • Index entire Web every few weeks • Google, Northernlight index 25% • 4 TB index => 200 TB in two years • 'Miner' technology demonstrated • Resumes, Prices, Geospatial,... • Prototype running on a 30 node Linux farm
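The index growth quoted on this slide (4 TB to 200 TB in two years) implies a steep compound rate. The sketch below works it out, assuming smooth exponential growth over the two years, which the slide itself does not state. Variable names are illustrative only.

```python
import math

# Figures from the slide: index grows from 4 TB to 200 TB in two years.
index_now_tb = 4.0
index_future_tb = 200.0
years = 2

# Total growth factor over the period.
total_growth = index_future_tb / index_now_tb          # 50x

# Assumption (not from the slide): growth is a constant exponential,
# so the annual factor is the square root of the two-year factor.
annual_growth = total_growth ** (1 / years)            # about 7.07x per year

# Implied doubling time in months under the same assumption.
doubling_time_months = 12 * math.log(2) / math.log(annual_growth)

print(f"total growth: {total_growth:.0f}x")
print(f"annual growth: {annual_growth:.2f}x")
print(f"doubling time: {doubling_time_months:.1f} months")
```

Under that assumption the index would double roughly every four months, which is the kind of load the scalability goals are meant to absorb.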
Software Model • Users will see a standard Linux farm (shared nothing) programming model • No porting effort for existing Linux farm applications (except dealing with different versions of Linux, of course) • The system management functions are only visible to system administrators • The exception is the performance monitoring functions useful for tuning apps
Differences from a Linux Farm • Much higher spatial density of Nodes or ‘Bricks’ • Single network protocol (Ethernet) for ALL off-node communications • Design with geographical distribution in mind • Diagnostic Processors • Lego-like, standardized building blocks • Regular and relaxed homogeneous • Monitoring Hardware • Measuring of relevant environmental parameters • (New) System Management Language • AME, SON and RAIN objectives
AME, RAIN and SON • Three areas of system research to be explored with I-Store • These three areas are largely independent of each other
AME • Availability • No single points of failure • Introspection, failover and fast failure • Fast repair by swapping identical blocks • Maintainability • Homogeneous structure • System management language • Extensibility/Scalability • Shared nothing architecture
RAIN • Redundant Array of Inexpensive Network (Switches) • Issues to be explored • Optimal topology • Density/cost of ports, optics vs. copper • Routing algorithms within a machine • Need for TCP hardware acceleration • Performance of Ethernet protocol • Frame sizes • Simplified switches
SON • Storage Oriented Nodes • Basic Premise of one node=one disk=one processor • It works in farms, but is it a good general choice? • Is the loss of flexibility (in the ratio of disks per processor) a good tradeoff for easier management?
Additional Software Research Topics... • Define AME, RAIN, SON benchmarks • Server Management Language • Parallel Searching of geographically distributed database • Dynamic Resource Allocation (e.g. Firewalls) • SCSI over TCP/IP (SAN within I-Store) • Storage for mobile users (à la OceanStore)
System Management Language • Define a high-level, interpretive(?) system management language • May use facilities of system OS • Highly regular I-Store is the first target • Sample Verbs • allocate, protect, share, map, backup, restore, copy, correlate, display, discover, ping, initialize, report, arm, define(node)....
System Management Language • Should easily describe tasks such as: • Backup all data located in the Philippines to Colorado (a volcano is about to blow) • Set alarm if any disk is more than 80% full • Define protected subregions in the system • Display CPU utilization by time and state • Discover present routing topology • Show 3D correlation plot of disk vibration vs brick temperature vs. actual failure events • .....
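As an illustration of one task from the list above ("Set alarm if any disk is more than 80% full"), here is a minimal Python sketch of the check such a rule would reduce to. The proposal does not define the language's syntax or data model, so the `Brick` record, its field names, and the `disks_over_threshold` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Brick:
    """Hypothetical record for one storage brick (one node = one disk)."""
    name: str
    site: str
    disk_capacity_gb: float
    disk_used_gb: float

    @property
    def fill_fraction(self) -> float:
        # Fraction of the brick's single disk that is in use.
        return self.disk_used_gb / self.disk_capacity_gb


def disks_over_threshold(bricks, threshold=0.80):
    """Return the names of bricks whose disk is more than `threshold` full."""
    return [b.name for b in bricks if b.fill_fraction > threshold]


# Made-up inventory for illustration; not data from the proposal.
bricks = [
    Brick("brick-001", "Almaden", 32.0, 12.0),
    Brick("brick-002", "Berkeley", 32.0, 30.0),
    Brick("brick-003", "Berkeley", 32.0, 24.0),
]

alarms = disks_over_threshold(bricks)
print(alarms)
```

A real management language would presumably express this declaratively and evaluate it continuously across all sites; the sketch only shows the predicate being evaluated once.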
Hardware Architecture, Development Schedule & Working Relationships
IStore Hardware Architecture Goals • Seamless Scalability • O(10,000) AME Storage Nodes • Optimized Storage Brick for Packaging Density • Geographically Dispersed Nodes • Gb Ethernet Connections to WAN Routers • Storage Brick • Full PME Brick: Processor, Memory, Cache • Gb Ethernet as the Sole Interconnection Fabric • Embedded Disk with 10s of GBytes
IStore Hardware Architecture Goals (cont.) • State-of-the-art Intel Processor Memory Element (PME) • 650 MHz Pentium III with 100 MHz System Bus • 256 KB L2 cache • O(512MB) main memory • State-of-the-art Interconnect Fabric • 1 Gb Ethernet Runtime Network • 10/100 Mb Ethernet Diagnostic Network • State-of-the-art Disks • 2.5" ~32 GB drive
IStore Hardware Architecture Goals (cont.) • Berkeley AME Hardware Management Support • Diagnostic processor • Environmental sensors • TCP/IP Hardware Accelerator • Class 4: Hardware State Machine • SCSI over TCP ("iSCSI") Support • Compatible with Standard Ethernet Switches/Routers
IStore-1: Current Berkeley Design • 80 nodes • AME • 266 MHz Pentium II • Four 100 Mb Ethernet Ports/brick • Integrated UPS
IStore-2: Deltas from IStore-1 • Geographically Dispersed Nodes • O(1000) nodes at Almaden • O(1000) nodes at Berkeley • Upgraded Storage Brick • Pentium III 650 MHz Processor • Two Gb Ethernet Copper Ports/brick • One 2.5" ATA disk • User Supplied UPS Support • Standard Ethernet Switches
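A rough sizing of the two-site prototype follows from the per-brick figures. The sketch below treats "O(1000)" as exactly 1000 nodes per site and the "~32 GB" drive from the hardware goals as exactly 32 GB. Both are simplifying assumptions, and the totals are raw figures that ignore redundancy and file system overhead.

```python
# Assumptions: order-of-magnitude slide figures taken as exact values.
NODES_PER_SITE = 1000      # "O(1000)" at Almaden and at Berkeley
SITES = 2
DISK_GB = 32               # "~32 GB" 2.5" drive, one per brick
GBE_PORTS_PER_BRICK = 2    # two Gb Ethernet copper ports per brick
PORT_GBPS = 1              # nominal line rate of each port

total_nodes = NODES_PER_SITE * SITES

# Raw disk capacity across both sites, in decimal terabytes.
raw_capacity_tb = total_nodes * DISK_GB / 1000

# Sum of nominal port line rates; an upper bound, not achievable throughput.
aggregate_port_gbps = total_nodes * GBE_PORTS_PER_BRICK * PORT_GBPS

print(total_nodes, raw_capacity_tb, aggregate_port_gbps)
```

On these assumptions the prototype would hold about 64 TB of raw disk across 2000 bricks.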
Follow on Work • Ethernet Sourced in Memory Controller (North Bridge) • TCP/IP Hardware Accelerator • Class 4: Hardware State Machine • SCSI over TCP Support • Integrated UPS
Why an IStore-2 Prototype Is Interesting • Storage Bricks • New ratios for MIPS/bandwidth/storage • New level of density • AME Hardware Support • Seamless scaling • Self maintaining nodes • It Exists
IStore-2: Core Design Team • IBM (full time) • System Architect: Winfried Wilcke • Lead Designer: Richard Booth • 1 Experienced Hardware Designer: tbd • 3 Designers: tbd • Berkeley • 6 Graduate Students
IStore-2: Development Schedule • Working Model • 7/00: Agreement in Principle • 8/00: Working Team Membership • Design • 9/00: Architecture Specification version 1.0 • 11/00: Design Workbook version 1.0 • Implementation • 2Q/01: First 3 Nodes Power-up • 3Q/01: O(64) nodes available to users • 4Q/01: O(2000) nodes available to users
IStore-2 Footprint (per 1000 nodes) • 16 Storage (19") Racks • 64 Storage bricks/rack • 8 type 1 storage bricks/drawer • 8 storage drawers/rack • Ethernet switches in rack • 8 Global Ethernet Switch (19") Racks • Requires 600 sq. ft. lab
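The footprint figures can be checked with a few lines of arithmetic. The sketch below uses only the numbers on this slide. The floor-area-per-rack value is a derived average, which assumes the whole 600 sq. ft. is divided evenly among racks (aisles and service clearance included).

```python
# Figures taken directly from the footprint slide.
BRICKS_PER_DRAWER = 8
DRAWERS_PER_RACK = 8
STORAGE_RACKS = 16
SWITCH_RACKS = 8
LAB_SQ_FT = 600

# 8 bricks/drawer * 8 drawers/rack matches the stated 64 bricks/rack.
bricks_per_rack = BRICKS_PER_DRAWER * DRAWERS_PER_RACK

# 16 storage racks give 1024 bricks, i.e. the "1000 nodes" of the title.
total_bricks = bricks_per_rack * STORAGE_RACKS

# Storage racks plus global Ethernet switch racks.
total_racks = STORAGE_RACKS + SWITCH_RACKS

# Derived average, assuming the lab area is spread evenly over all racks.
sq_ft_per_rack = LAB_SQ_FT / total_racks

print(bricks_per_rack, total_bricks, total_racks, sq_ft_per_rack)
```

The numbers are self-consistent: 16 racks of 64 bricks yield 1024 nodes, with one switch rack for every two storage racks.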
IStore-2 Platform: Required Resources • Staffing • 6 ARC/SSD IBMers • 6 UCB Graduate Students • Lab Space • 600 sq. ft. lab at Almaden • 600 sq. ft. lab at Berkeley • Hardware Costs • $3M (mostly 2001 dollars)
IStore-2: Working Model • Jointly Authored Architecture Specification • 1 or 2 Almaden authors • 1 or 2 Berkeley authors • Design Workbook • Each Core Team Member owns a section • Weekly Half Day Working Face-to-face Meetings • Alternate between Almaden and Berkeley • Shared Electronic Documentation • Machine Available (for free) to Users From Either Institution • IP is Handled Like Previous IBM/UCB Projects ?? • Fabrication (some design ?) Vendored Out
Next Steps • Continue to Seek Feedback on Proposal • Funding Discussion • IBM • Berkeley • Form IBM Team • Begin Regular Working Meetings • Begin Architectural Design