TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned • Armando Fox • University of California, Berkeley • fox@cs.berkeley.edu
Vision: “The Content You Want” What do above apps have in common? • Adapt (collect, filter, transform) existing content… • according to client constraints • respecting network limitations • according to per-user preferences • But: Lack of unified framework for designing apps that exploit this observation
Contributions • TACC, a model for structuring services • Transformation, Aggregation, Caching, Customization of Internet content • Scalable TACC server • Based on clusters of commodity PC’s • Easy to author “industrial strength” services • Scalable Network Service (SNS) platform maps app semantics onto cluster-based availability mechanisms • Experience with real users • ~15,000 today at UCB
What’s TACC? • Transformation (“local”, “one-to-one”) • TranSend, Anonymizer • Aggregation (“nonlocal”, “many-to-one”) • Search engines, crawlers, newswatchers • Caching • Both original and locally-generated content • Customization • Per user: for content generation • Per device: data delivery, content “packaging”
TACC Example: TranSend • Transparent HTTP proxy • On-the-fly, lossy compression of specific MIME types (GIF, JPG...) • Cache both original & transformed • User specifies aggressiveness and “refinement” UI • Parameters to HTML & image transformers
Top Gun Wingman • PalmPilot web browser • Intermediate-form page layout • Image scaling & transcoding • Controlled by layout engine • Device-specific ADU marshalling • Including client versioning • Originals and device-specific pages cached
Application Partitioning • Client competence • Styled text, images, widgets are fine • Bitmaps unnecessary • Client responsiveness • Scrolling, etc. shouldn’t require roundtrip to server • Client independence • Very late conversion to client-specific format
TACC Conceptual Data Flow • Front end accepts RPC-like user requests • User’s customization profile retrieved • Original data fetched from cache or Internet • Aggregation/transformation workers operate on data according to customization profile [Figure labels: User request, FE, To Internet]
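The four steps on this slide can be sketched as a small in-process pipeline. This is only an illustration under stated assumptions, not the TACC server's actual interface: the `FrontEnd` class, the `Worker` signature, the `max_bytes` profile key, and the `fake_fetch` origin are all hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical worker signature: (content, user profile) -> content.
Worker = Callable[[bytes, Dict[str, str]], bytes]


@dataclass
class FrontEnd:
    profiles: Dict[str, Dict[str, str]]    # stand-in for the customization profile store
    fetch_origin: Callable[[str], bytes]   # stand-in for a fetch from the Internet
    workers: List[Worker]                  # aggregation/transformation workers, applied in order
    cache: Dict[str, bytes] = field(default_factory=dict)

    def handle(self, user: str, url: str) -> bytes:
        # 1. The front end accepts the request; 2. the user's profile is retrieved.
        profile = self.profiles.get(user, {})
        # 3. Original data comes from the cache, or from the origin on a miss.
        if url not in self.cache:
            self.cache[url] = self.fetch_origin(url)
        data = self.cache[url]
        # 4. Workers operate on the data according to the profile.
        for worker in self.workers:
            data = worker(data, profile)
        return data


def truncate(content: bytes, profile: Dict[str, str]) -> bytes:
    """Toy transformation: keep at most `max_bytes` bytes (hypothetical profile key)."""
    limit = int(profile.get("max_bytes", len(content)))
    return content[:limit]


origin_calls: List[str] = []


def fake_fetch(url: str) -> bytes:
    """Fake origin server that records how often it is contacted."""
    origin_calls.append(url)
    return b"hello world"


fe = FrontEnd(
    profiles={"alice": {"max_bytes": "5"}},
    fetch_origin=fake_fetch,
    workers=[truncate],
)
alice_view = fe.handle("alice", "http://example.test/page")
bob_view = fe.handle("bob", "http://example.test/page")
```

Here the same cached original serves both users, and only the worker output differs per profile.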
TACC Model Summary • Mostly stateless, composable workers • Unifies previously ad hoc applications under one framework • Encourages re-use through modularization • Composition enables both new services and new clients • TACC breakdown provides unified way to think about app structure
Services Should Be Easy To Write • Rapid prototyping • Insulate workers from “mundane” details • Easy to incorporate existing/legacy code • Few assumptions about code structure • Must support variety of languages • May be fragile • Composition to leverage existing code
Building a TACC Server • Challenge: Scalable Network Service (SNS) requirements • Scalability to 100K’s of users with high availability • Cost effective to deploy & administer • But, services should remain easy to write • Server provides some bug robustness • Server provides availability • Server handles load balancing and scaling • Preserve modularity (& componentwise upgradability) when deploying
Layered Model of Internet Services • TACC Layer • Programming model based on composable building blocks • SNS Layer: “large virtual server” • Implements SNS requirements • Cluster computing for hardware F/T and incremental scaling • Exploit TACC model semantics for software F/T • SNS layer is reusable and isolated from TACC • Application “content” orthogonal to SNS mechanisms • Key to making apps easy to write [Figure labels: httpd, etc.; TACC; Scalable Network Svc]
Why Use a Cluster? • Incremental scalability, low cost components • High availability through hardware redundancy Goals: • Demonstrate that clusters and TACC fit well together • Separate SNS from TACC
Cluster-Based TACC Server • Component replication for scaling and availability • High-bandwidth, low-latency interconnect • Incremental scaling: commodity PCs [Figure: cluster components joined by an interconnect: User Profile Database, Caches, Front Ends, Workers, Load Balancing & Fault Tolerance, Administration Interface]
“Starfish” Availability: LB Death • FE detects via broken pipe/timeout, restarts LB • New LB announces itself (multicast), contacted by workers, gradually rebuilds load tables • If partition heals, extra LB’s commit suicide • FE’s operate using cached LB info during failure
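The recovery behavior described on this slide can be illustrated with a toy, single-process model. It is a sketch under stated assumptions only: the `LoadBalancer` and `FrontEnd` classes, the `LoadBalancerDown` exception (standing in for a broken pipe or timeout), and direct method calls in place of multicast are all invented for the example and are not the system's real interfaces.

```python
class LoadBalancerDown(Exception):
    """Stand-in for a broken pipe or timeout when contacting the load balancer."""


class LoadBalancer:
    def __init__(self):
        self.load = {}      # worker name -> reported queue length
        self.alive = True

    def report(self, worker, queue_len):
        # Workers contact the LB and report their load.
        self.load[worker] = queue_len

    def snapshot(self):
        if not self.alive:
            raise LoadBalancerDown()
        return dict(self.load)


class FrontEnd:
    def __init__(self, lb):
        self.lb = lb
        self.cached_load = {}   # possibly stale copy of the load table
        self.restarts = 0

    def pick_worker(self):
        try:
            # Merge fresh entries; stale ones stay until a worker re-reports.
            self.cached_load.update(self.lb.snapshot())
        except LoadBalancerDown:
            # Restart the LB. The new instance starts with an empty table,
            # so the front end keeps operating on its cached information.
            self.lb = LoadBalancer()
            self.restarts += 1
        if not self.cached_load:
            raise RuntimeError("no load information available")
        return min(self.cached_load, key=self.cached_load.get)


lb = LoadBalancer()
lb.report("w1", 3)
lb.report("w2", 1)
fe = FrontEnd(lb)
before_failure = fe.pick_worker()   # least-loaded worker from fresh data

lb.alive = False                    # simulate LB death
during_failure = fe.pick_worker()   # served from the cached table

fe.lb.report("w2", 9)               # a worker re-registers with the new LB
after_rebuild = fe.pick_worker()    # table is partly fresh, partly stale
```

Merging rather than replacing the cached table is one design choice that matches "gradually rebuilds load tables": the front end never sees an empty table while workers are still re-registering.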
Fault Recovery Latency [Plot; axis label: Task queue length]
Behavior in the Large • TranSend: 160 image transformations/sec = 10 Ultra-1 servers • Peak seen during UCB traces on 700-modem bank: 15/sec • Amortized hardware cost <$0.35/user/month (one $5K PC serving ~15,000 subscribers) • Wingman: factor of 6-8 worse • Administration: one undergraduate part-time
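The throughput figures on this slide imply a rough per-server capacity. The calculation below is a back-of-the-envelope sketch that assumes throughput scales linearly with the number of servers, which the slide does not state; the variable names are chosen for the example.

```python
import math

total_rate = 160   # image transformations/sec quoted for the full configuration
servers = 10       # Ultra-1 servers in that configuration
peak_rate = 15     # peak transformations/sec seen on the 700-modem bank

# Assumption: capacity divides evenly across servers.
per_server_rate = total_rate / servers

# Servers needed to absorb the observed peak under that assumption.
servers_for_peak = math.ceil(peak_rate / per_server_rate)

print(f"per-server rate: {per_server_rate} transformations/sec")
print(f"servers needed for observed peak: {servers_for_peak}")
```

Under this assumption, a single server's capacity slightly exceeds the peak load observed in the traces.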
Building a Big System • Restartable, atomic workers • Read-only data from other origin server(s) • Orthogonal separation of scalability/availability from application “content” • Multiple lines of defense • App modules agree to obey semantics compatible with these mechanisms • Common-case failure behavior compatible with users’ Internet experience • Enables reuse of whole workers, however diverse
Availability & Scalability Summary • Pervasive strategy: timeout, retry, restart • Transient failures usually invisible to user • Process peers watch each other • Mostly stateless workers; transaction support possible • Simplicity from exploiting soft state • Piggyback status info on multicast beacons • Use of stale LB info fine in practice • “Starfish” availability works in practice
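The soft-state idea summarized here (status refreshed by periodic beacons, silently expiring when the beacons stop) can be sketched as a table with a time-to-live. This is a minimal illustration, not the system's implementation: the `SoftStateTable` class, its method names, and the explicit `now` argument (used instead of a real clock so the example is deterministic) are assumptions made for the example.

```python
class SoftStateTable:
    """Entries stay valid only while they keep being refreshed."""

    def __init__(self, ttl):
        self.ttl = ttl
        self._entries = {}   # name -> (status, time of last beacon)

    def beacon(self, name, status, now):
        # Each beacon carries piggybacked status and refreshes the entry.
        self._entries[name] = (status, now)

    def live(self, now):
        # Entries that missed their refresh window simply disappear;
        # no explicit failure notification or cleanup protocol is needed.
        self._entries = {
            name: (status, seen)
            for name, (status, seen) in self._entries.items()
            if now - seen <= self.ttl
        }
        return {name: status for name, (status, _) in self._entries.items()}


table = SoftStateTable(ttl=5)
table.beacon("worker-a", {"queue": 2}, now=0)
table.beacon("worker-b", {"queue": 7}, now=0)
table.beacon("worker-a", {"queue": 3}, now=4)   # worker-a keeps beaconing

view_at_3 = table.live(now=3)   # both entries are still fresh
view_at_8 = table.live(now=8)   # worker-b went silent and has expired
```

A crashed component needs no recovery logic for this table: a restarted process just starts beaconing again and reappears.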
Service Authoring • Keyword highlighting: < 1 day • Wingman: 2-3 weeks • Various apps from graduate seminar projects • Safe worker upload • Annotate the Web • “Channel aggregators”
New Services By Composition • Compose existing services to create a new one • ~2.5 hours to implement • Composes with TranSend or Wingman [Figure labels: Internet, TranSend, Metasearch]
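Composition of this kind can be sketched by treating each worker as a function and chaining them. The example is hypothetical throughout: `compose`, the toy `metasearch` aggregator, the `shorten` transformer, the fake `ENGINES` table, and the `max_lines` profile key are invented for illustration and do not reflect the actual worker API.

```python
from functools import reduce


def compose(*workers):
    """Chain workers left to right; each takes (content, profile) and returns content."""
    def pipeline(content, profile):
        return reduce(lambda data, worker: worker(data, profile), workers, content)
    return pipeline


# Fake search engines standing in for real back ends.
ENGINES = {
    "alpha": lambda q: [f"alpha:{q}:1", f"alpha:{q}:2"],
    "beta": lambda q: [f"beta:{q}:1"],
}


def metasearch(query, profile):
    """Aggregation worker: collect results from every engine into one listing."""
    lines = []
    for name in sorted(ENGINES):
        lines.extend(ENGINES[name](query))
    return "\n".join(lines)


def shorten(content, profile):
    """Transformation worker: keep only the first `max_lines` lines."""
    limit = int(profile.get("max_lines", 10))
    return "\n".join(content.split("\n")[:limit])


# A new service built purely from existing workers.
small_screen_search = compose(metasearch, shorten)
result = small_screen_search("tacc", {"max_lines": "2"})
full = compose(metasearch)("tacc", {})
```

Because both workers share one signature, the aggregator can be reused unchanged in front of any other transformer.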
Experience With Real Users • Transparent enhancements • Minimal downtime • Low administration cost • Multicast-based administration GUI • Virtually no dedicated resources at UCB • “Overflow pool” of ~100 UltraSPARC servers • Users don’t mind relying on middleware proxy
Why Now? • Internet’s critical mass • Commercial push for many device types (transistor curves) • Cluster computing economically viable • A good time for infrastructural services
Related Work • Transformational proxy services: WBI, Strands • Application partitioning: Wit, InfoPad, PARC Ubiquitous Computing • Computing in the infrastructure: Active Networks • Soft state for simplicity and robustness: Microsoft Tiger, multicast routing protocols
Summary of Contributions • TACC, a composition-based Internet services programming model • captures rich variety of apps • one view of customization • No-hassle deployment on a cluster • Automatic and robust partial-failure handling • Availability & scaling strategies work in practice • New apps are easy to write, deploy, debug • SNS behaviors are free • Compose existing services to enable new clients
Non-Contributions (a/k/a Future Work) Accidental contributions: • Legacy code glue • Cheap test rig for next project (prototyping path discovery; a bare bones “cluster OS”) Non-contributions: • Fair resource allocation over cluster • Built-in security abstractions • Rich state management abstractions
What We Really Learned • Design for failure • It will fail anyway • End-to-end argument applied to availability • Orthogonality is even better than layering • Narrow interface vs. no interface • A great way to manage system complexity • The price of orthogonality • Techniques: Refreshable soft state; watchdogs/timeouts; sandboxing
How About State Management? • Transactional apps? • API’s are there, but you have to roll your own consistency • Groupware apps with group state? • One way: distributed, F/T group state like SRM! • Keeps state management orthogonal to SNS layer The Moral: Consistency, Availability, Partition-resilience: pick at most 2
Future Work • TACC as test rig for Ninja • Taxonomy of app structure and platforms • What is the “big picture” of different types of Internet services, and where does TACC fit in? • Joint work with Dr. Murray Mazer at the Open Group Research Institute • Apply lessons to reliable distributed systems • Formalize programming model • Finish writing thesis