The Case for Open Infrastructure Services in Java
David Culler, Computer Science Division, U.C. Berkeley
www.cs.berkeley.edu/~culler
Java Grande Dinner Keynote, June 2000
Appetizer
• ‘Grande’-scale computing is dominated by internet services
• Delivered to millions per day on well-engineered clusters over service interfaces
[Figure: clients and servers connected through the Internet]
Java Grande
Opportunity: infrastructure services
• Prehistoric: DNS, IP route tables, …
• Historic: crawl, index, search
• Emerging: compose and manipulate data and services
And client diversity has just begun!
[Figure: infrastructure services inside the Internet, between clients and servers]
Danger: loss of distributed innovation
• PC generation of individual authoring & distribution
• vs. ATT-, IBM-, AOL-scale service engineering
• …
[Figure: “Open” infrastructure services inside the Internet, between clients and servers]
UCB Ninja Vision
• Open platform architecture for world-scale internet services
  • receptive execution environment
  • push services into the platform
  • scalability and availability “built-in”
  • service composition as a first-class programming concept
=> make it easy to author and publish high-quality services into a well-engineered infrastructure
... for example
Example: Ninja Jukebox
[Figure: Ninja iSpaces hosting a CD “ripper” service, CDDB service, Music Directory service, and HTTPd service, used from a WWW browser and an .au/.mp3 player. Numbered steps:]
1. Fetches track/title & artist information from an online DB.
2. Pushes an index of locally available songs to the master directory.
3. Web page with song playlists.
4. Music stream (.au or .mp3).
• Jukebox 98: collaborative community, anyone can add content => mp3.com, real jukebox, napster
• Authentication and authorization were built in
• Jukebox 99: music similarity query engine => mongomusic.com, ...
Sanctio: universal instant messaging (S. Gribble)
[Figure: an AOL client and an ICQ client speak the AOL and ICQ protocols to an AOL worker and an ICQ worker inside the Sanctio service (cluster), which also contains English-to-Spanish translation and a profile DDS]
Composable, Secure Proxy Architecture for Post-PC devices (S. Ross, J. Hill)
[Figure: proxy between Internet services and diverse clients. Proxy components: Transient Store, Identity Service, Filter and Control Modifier, Format Transcoders (FT), Security Adapters (SA). Services: DATEK (Trust Contract), https. Clients: Personal Appl, Embedded Untrusted Client, Trusted Client]
Reduce value of the information
[Figure: DATEK]
Example: eScience Services
[Figure: ‘Sugar’ MEMS simulation service, Nodal Modeling, LAPACK services, Netsolver]
Outline
• Call for distributed innovation of scalable, composable services
• Wandering Down the Java Garden Path
• Returning to robust building blocks and design patterns
• Postprandial thoughts
A ‘Structured Architecture’ Approach
• Bases (1M’s)
  • scalable, highly available
  • persistent state
  • databases, agents
  • “home” base per user
  • service programming environment
• Active Proxies (100M’s)
  • not packet routers
  • bootstrap thin devices into infrastructure
  • soft-state and well-connected
• Units (1B’s)
  • sensors / actuators
  • PDAs / smartphones / PCs
  • heterogeneous
  • minimal functionality: “Smart Clients”
[Figure: bases, active proxies, and units along a wide-area path]
Guided by the CAP lemma
• Consider
  • Consistency
  • Availability
  • Operation in the presence of network Partitions
You may have any two of the three, but not all three
• Example: replicate for availability
  • lose consistency upon update during partition
  • or can defer the updates till healed
  • or can engineer the system so no partition occurs between replicas
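One way to read the second option above (defer the updates until the partition heals) is sketched here. This is an illustration only: `DeferringReplica` and its methods are hypothetical names, and a plain `Map` stands in for the remote replica. Updates made during a partition are queued rather than applied, so both copies stay consistent at the cost of the update not being available until healing.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: keeps two replicas consistent by deferring updates during a partition.
class DeferringReplica {
    private final Map<String, String> local = new HashMap<>();
    private final Map<String, String> peer;               // stands in for the remote replica
    private final Deque<String[]> deferred = new ArrayDeque<>();
    private boolean partitioned;

    DeferringReplica(Map<String, String> peer) { this.peer = peer; }

    void partition() { partitioned = true; }

    // Healing replays the deferred updates in arrival order on both replicas.
    void heal() {
        partitioned = false;
        while (!deferred.isEmpty()) {
            String[] kv = deferred.poll();
            apply(kv[0], kv[1]);
        }
    }

    // Returns true if applied now, false if deferred because of a partition.
    boolean put(String key, String value) {
        if (partitioned) {
            deferred.add(new String[] { key, value });
            return false;
        }
        apply(key, value);
        return true;
    }

    String get(String key) { return local.get(key); }

    private void apply(String key, String value) {
        local.put(key, value);
        peer.put(key, value);
    }

    public static void main(String[] args) {
        Map<String, String> peer = new HashMap<>();
        DeferringReplica r = new DeferringReplica(peer);
        r.put("song", "v1");
        r.partition();
        r.put("song", "v2");                                         // deferred, not applied
        System.out.println(r.get("song") + " " + peer.get("song")); // prints "v1 v1"
        r.heal();
        System.out.println(r.get("song") + " " + peer.get("song")); // prints "v2 v2"
    }
}
```

The first option on the slide (accept the update locally and lose consistency) would instead apply the write to `local` immediately and reconcile later.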
The Java “Apple”
• Strong typing
• Automatic memory management
• Concurrency built-in: threads and synchronized methods
  • finally!
• Elegant remote access built-in: RMI
  • service lookup yields service object stub
  • transparent access
• Code mobility
  • traditionally for pulling down applets on demand
iSpace Execution Environment
• Service is an interface, plus objects that implement that interface.
• Name service, RMI stub registry, and service control API:
  • LoadService(URL)
  • interf.[ ] = ListServices
  • stub = GetService(name)
  • KillService(name)
• Sandbox that contains untrusted, uploaded services.
• RMI + authentication, encryption, multicast, user-level SAN speed.
• JVM provides service upload capability, plus strong typing of service interfaces.
• Distributed hash table API provides scalable, available hard state.
[Figure: iSpace layers: Untrusted Services and Trusted Services; Loader and Security Mgr; Ninja iSpace + RMI; JVM + persistent store APIs]
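The four control operations listed above might be modeled roughly as follows. This is an in-memory sketch, not the Ninja iSpace API: `ServiceRegistry`, `Service`, and `originOf` are hypothetical names, a `Supplier` stands in for code actually fetched from the URL, and `getService` returns the object itself where the slide returns an RMI stub.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical in-memory model of the service control API named on the slide.
class ServiceRegistry {
    private final Map<String, Service> services = new ConcurrentHashMap<>();
    private final Map<String, String> origins = new ConcurrentHashMap<>();

    // LoadService(URL): the factory stands in for code uploaded from the URL.
    String loadService(String url, Supplier<Service> factory) {
        Service s = factory.get();
        services.put(s.name(), s);
        origins.put(s.name(), url);
        return s.name();
    }

    // ListServices: names of the currently loaded services, sorted.
    List<String> listServices() {
        List<String> names = new ArrayList<>(services.keySet());
        Collections.sort(names);
        return names;
    }

    // GetService(name): null if no such service is loaded.
    Service getService(String name) { return services.get(name); }

    // KillService(name): stop and unregister; false if the name is unknown.
    boolean killService(String name) {
        Service s = services.remove(name);
        if (s == null) return false;
        origins.remove(name);
        s.stop();
        return true;
    }

    String originOf(String name) { return origins.get(name); }

    public static void main(String[] args) {
        ServiceRegistry registry = new ServiceRegistry();
        registry.loadService("http://example.invalid/jukebox.jar", () -> new Service() {
            public String name() { return "jukebox"; }
            public void stop() { System.out.println("jukebox stopped"); }
        });
        System.out.println(registry.listServices());
        registry.killService("jukebox");
    }
}

// A service is an interface plus an object that implements it (hypothetical shape).
interface Service {
    String name();
    void stop();
}
```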
MultiSpace Cluster Platform
• RMI “Redirector Stubs”
  • run-time compiled RMI superstub
  • stub selection policy
  • fail-over
  • broadcast, multicast, fork, etc.
[Figure: a client holding an m-RMI stub; iSpace nodes with MultiSpace Loader and DDS, connected by a SAN]
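The fail-over selection policy can be illustrated with a standard `java.lang.reflect.Proxy`. This is a sketch under assumptions: `FailoverStub` is a hypothetical name, the replicas are local objects rather than RMI stubs, and the MultiSpace stubs were compiled at run time rather than built on dynamic proxies.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical fail-over "redirector": tries each replica in order until one succeeds.
final class FailoverStub {
    private FailoverStub() {}

    @SuppressWarnings("unchecked")
    static <T> T create(Class<T> iface, List<? extends T> replicas) {
        ClassLoader loader = iface.getClassLoader() != null
                ? iface.getClassLoader()
                : FailoverStub.class.getClassLoader();
        return (T) Proxy.newProxyInstance(loader, new Class<?>[] { iface },
            (proxy, method, args) -> {
                Throwable last = new IllegalStateException("no replicas configured");
                for (T replica : replicas) {
                    try {
                        return method.invoke(replica, args);
                    } catch (InvocationTargetException e) {
                        last = e.getCause();   // this replica failed; try the next one
                    }
                }
                throw last;                    // every replica failed
            });
    }

    public static void main(String[] args) {
        Supplier<String> down = () -> { throw new IllegalStateException("node down"); };
        Supplier<String> up = () -> "served by replica 2";
        List<Supplier<String>> replicas = Arrays.asList(down, up);
        Supplier<?> stub = create(Supplier.class, replicas);
        System.out.println(stub.get());        // prints "served by replica 2"
    }
}
```

Broadcast or multicast policies would invoke every replica instead of stopping at the first success.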
After the garden: Post-Prototype Reality
• Powerful, attractive, tantalizing possibilities…
  • see examples ...
• Didn’t scale
  • service concurrency
  • client population
  • service diversity
• Wasn’t robust
• Lessons
  • Thread-per-task considered harmful
  • Woes of blocking interfaces
  • The transparency trap
  • Versions really matter
Thread-per-task services
• Server thread per client thread
  • familiar per-task programming model, including RMI and I/O
• Socket per client JVM (or per thread, per stub!)
[Figure: client and service connected by blocking Java RMI]
The transparency trap
• Server commits a thread regardless of client load
• Client places demand regardless of server concurrency
• Resources held grow in proportion to blocking composition depth
• Ease leads to fine-grained use of remote objects
• RMI “call backs” make the client a server
• Lifetime and scope of a remote object are unlimited
• Inexpressive error model (wait or RemoteException)
• Serialization is costly
Blocking + Thread = Non-blocking ???
• Java I/O and comm APIs are all blocking!
• need JNI for select!
Keep going to the “thread well”
Study a Service “test problem”
• A: popularity (task arrival rate, tasks / sec)
• L: I/O, network, or service composition depth (latency, sec)
• T: # concurrent tasks in server, T = A x L
• S: task completion rate, tasks / sec
• closed loop implies S = A
[Figure: threaded server; each arriving task triggers dispatch( ) or create( )]
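A short worked instance of T = A x L may help. The numbers below are assumed for illustration only and are not measurements from the talk.

```latex
% Illustrative values only: A, L, and T_max are assumptions, not data from the slides.
\begin{gather*}
T = A \times L \\
A = 1000~\text{tasks/s},\quad L = 10~\text{ms}
  \;\Rightarrow\; T = 1000 \times 0.010 = 10~\text{concurrent tasks} \\
S_{\max} = \frac{T_{\max}}{L},\qquad T_{\max} = 500~\text{threads},\quad L = 10~\text{ms}
  \;\Rightarrow\; S_{\max} = \frac{500}{0.010} = 50{,}000~\text{tasks/s}
\end{gather*}
```

With one thread per task, the thread count T grows linearly with both popularity and latency, so a cap on threads becomes a cap on sustainable throughput.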
Response time vs S (= T/L)
[Figure: plot]
Threads are a limited resource
• Fix L = 10 ms; for each T measure max A = S
• Cluster parallelism just raises the threshold
* CPU-bound tasks saturate early
* focus on threads, footprint follows
(ultra 170 and E450, Solaris 7.2, jdk 1.2.2)
Alternative: queues, events, typed msgs
• server provides bounded resources at request interface
  • chooses when to assign resources to request event
  • imposes load-conditioning or admission control
• client retains control of its thread
  • chooses when to block
• permits negotiation protocol
  • key to service composition
• queues absorb load and decouple operations
• provide non-blocking interface
  • RMI as syntax sugar
[Figure: explicit request queue]
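A bounded, non-blocking request interface of the kind described above could look like this minimal sketch. `RequestQueue` and its capacity are hypothetical; it relies on `ArrayBlockingQueue.offer`, which returns false instead of blocking when the queue is full.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical request interface: the server bounds its queue, the client never blocks.
class RequestQueue<R> {
    private final BlockingQueue<R> pending;

    RequestQueue(int capacity) {
        this.pending = new ArrayBlockingQueue<>(capacity);
    }

    // Non-blocking submit: false means the request was shed (admission control),
    // so the caller keeps its thread and can retry, back off, or report an error.
    boolean submit(R request) {
        return pending.offer(request);
    }

    // Server side: take the next request when the server chooses to; null if none.
    R next() {
        return pending.poll();
    }

    // Queue depth is directly observable, which is what makes introspection possible.
    int depth() {
        return pending.size();
    }

    public static void main(String[] args) {
        RequestQueue<String> q = new RequestQueue<>(2);
        System.out.println(q.submit("a") + " " + q.submit("b") + " " + q.submit("c"));
        // prints "true true false"
    }
}
```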
Java Event-based Server
• Fixed # threads, independent of # concurrent tasks in server (A x L)
[Figure: task arrivals at rate A tasks / sec enter a timer queue with latency L seconds; task completions leave at rate S tasks / sec; closed loop implies S = A]
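The event-per-task idea can be sketched with a single-threaded `ScheduledExecutorService` acting as the timer queue. `EventServer` and `runDemo` are hypothetical names; the point is only that waiting for latency L is a queued timer event rather than a blocked thread, so one thread can carry many in-flight tasks.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical event-per-task server: one thread, many concurrent tasks.
class EventServer {
    private final ScheduledExecutorService loop = Executors.newSingleThreadScheduledExecutor();
    private final AtomicInteger completed = new AtomicInteger();

    // Arrival: register a completion event that fires after latencyMillis (the slide's L).
    void arrive(long latencyMillis, Runnable onComplete) {
        loop.schedule(() -> {
            completed.incrementAndGet();
            onComplete.run();
        }, latencyMillis, TimeUnit.MILLISECONDS);
    }

    int completed() { return completed.get(); }

    void shutdown() { loop.shutdown(); }

    // Submits `tasks` arrivals at once and returns how many completed within 5 seconds.
    static int runDemo(int tasks, long latencyMillis) {
        EventServer server = new EventServer();
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) server.arrive(latencyMillis, done::countDown);
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            server.shutdown();
        }
        return server.completed();
    }

    public static void main(String[] args) {
        System.out.println("completed with one thread: " + runDemo(100, 10));
    }
}
```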
Event-per-task saturates gracefully
• Better and more robust performance
• Use cluster parallelism to match demand
• Decompose task into multiple events
  • circulate or pipeline
• but ...
Down side of event approach
• Lose the familiar sequential programming (plus synchronization)
  • need a handler per stage of the task
• Does not naturally exploit SMP parallelism
  • must pipeline multiple event handler blocks
• Blocking interfaces (or faults) cause throughput to follow 1/L in an event block!
Hybrid, Robust building block
• Compose service as graph of task handlers
• Decouple stages of task within a node
• Replicate across cluster nodes for scale and availability
• Thread parallelism and latency tolerance within task handler block (i.e., A x L < T per node)
[Figure: task handler block. Explicit event queue: absorbs bursts of tasks, allows introspection, load conditioning point. Bounded thread pool of T < T’ threads. # concurrent tasks decoupled from # concurrent threads in server]
Hybrid Performance
• Competitive with pure event block
  • small overhead due to extra threads
• Upon blocking op, throughput tracks T/L
(Ultra 1)
Four key task handler design patterns
• Wrap
• Pipeline
• Replicate
• Combine
Wrap
• Take arbitrary piece of code:
  • place queue in front
  • encapsulate with bounded thread pool T < T’
=> get ‘robust’ service with non-blocking interface
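A minimal sketch of the Wrap pattern, assuming a `ThreadPoolExecutor` with a bounded queue as the building block. `Wrapped` is a hypothetical name; the thread count and queue capacity are parameters the caller would tune.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical "Wrap": arbitrary (possibly blocking) code behind a bounded queue
// and a bounded pool of T threads.
class Wrapped<R> {
    private final ThreadPoolExecutor pool;
    private final Consumer<R> handler;

    Wrapped(Consumer<R> handler, int threads, int queueCapacity) {
        this.handler = handler;
        // core == max, so the pool never grows past `threads`; the default
        // rejection policy throws once the queue is full and all threads are busy.
        this.pool = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity));
    }

    // Non-blocking enqueue; false means the load-conditioning point shed the task.
    boolean enqueue(R task) {
        try {
            pool.execute(() -> handler.accept(task));
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }

    // Stop accepting tasks and wait for the queued ones to finish.
    boolean drain(long timeoutMillis) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        Wrapped<String> w = new Wrapped<>(s -> System.out.println("handled " + s), 2, 8);
        for (String s : new String[] { "a", "b", "c" }) w.enqueue(s);
        w.drain(1000);
    }
}
```

The handler may block freely; the blocking is confined to the pool's own threads, while callers only ever see the non-blocking `enqueue`.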
Wrap (thread-per-task server)
• Get robust hybrid task handler with T/L tolerance
• Preserve conventional task sequencing
• Building block for composed services
Pipeline
• Decouple stages within task handler across multiple task handlers
• Wrapped blocking call is natural boundary
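The Pipeline pattern can be sketched as stages that each own a queue and a thread pool and forward results downstream. `Stage` is a hypothetical name. For brevity this uses `Executors.newFixedThreadPool`, whose queue is unbounded; a fuller version would bound each stage's queue as in the Wrap pattern.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical pipeline stage: its own queue and threads, results handed downstream.
class Stage<I, O> {
    private final ExecutorService pool;
    private final Function<I, O> work;
    private final Consumer<O> downstream;

    Stage(int threads, Function<I, O> work, Consumer<O> downstream) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.work = work;
        this.downstream = downstream;
    }

    // Enqueue a task; the caller returns immediately.
    void enqueue(I input) {
        pool.execute(() -> downstream.accept(work.apply(input)));
    }

    // Finish queued work (waiting up to 5 seconds), then stop this stage's threads.
    void close() {
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        List<Integer> out = Collections.synchronizedList(new ArrayList<>());
        // Second stage gets more threads, as a stage wrapping a blocking call might.
        Stage<String, Integer> measure = new Stage<>(4, String::length, out::add);
        Stage<String, String> clean = new Stage<>(1, String::trim, measure::enqueue);
        clean.enqueue(" ab ");
        clean.enqueue("cde ");
        clean.close();      // upstream first, so everything reaches the second stage
        measure.close();
        System.out.println(out.size() + " results");   // prints "2 results"
    }
}
```

Giving each stage its own thread count is what lets a low-concurrency operation be limited without starving the rest of the task.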
Why Pipeline?
• Functional parallelism across stages
  • when thread blocks in one...
• Functional parallelism across processors
• Functional parallelism across nodes
• Increase locality (cache, VM, TLB, …) within node
  • tend to perform operation (stage) on “convoy” of tasks
• Limit number of threads devoted to “low concurrency” operation
  • ex: file system can only handle 40-50 concurrent write requests, so this limits useful T
  • additional threads can be applied to remainder of stage
Replicate
• Scale throughput across nodes
• Provide fault isolation boundary
• Mediate thread-pool bottleneck within node
Combine
• Two task handlers share pool and queue
• Common use is before/after wrapped call
• Avoid wasting threads
A Prescription
• Well-conditioned node
  • Wrap to introduce load conditioning
  • Pipeline to avoid wasting threads at bottlenecks
  • Pipeline to enhance locality
• Available service
  • Replicate for fault tolerance
• Scaling
  • Replicate to meet concurrency demand
• Tuning
  • Combine to limit threads per node
  • Pipeline for functional specialization
Ninja vSPACE design
• Each blocking interface is wrapped
• Service described by collection of task handler modules
  • Each module implements a set of task types
  • includes completion events
  • module clones are replicated on demand
• Most task handlers are state free
  • Persistent state provided by DDS
• Explicit queues are the fundamental means of introspection
Example: Hash Table Distributed Data Structure
[Figure, top to bottom:]
• Clustered service: service instances, each linked with the DDS lib
• Distributed hash table API
• Redundant, low-latency, high-throughput network (System Area Network)
• Storage “bricks”: single-node durable hash tables
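The layering above can be modeled in-process as a hash table partitioned and replicated over bricks. Everything here is an assumption made for illustration: the class names, the placement rule (consecutive bricks starting at the key's hash), and the replication factor. The real DDS runs bricks on separate cluster nodes with durable storage, and recovery of a failed brick is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-process model of a hash table spread over storage "bricks".
class BrickedHashTable {
    // One brick: a single-node hash table that can be marked failed.
    static final class Brick {
        final Map<String, String> table = new ConcurrentHashMap<>();
        volatile boolean alive = true;
    }

    private final List<Brick> bricks = new ArrayList<>();
    private final int replicas;

    BrickedHashTable(int brickCount, int replicas) {
        if (replicas < 1 || replicas > brickCount) {
            throw new IllegalArgumentException("replicas must be between 1 and brickCount");
        }
        for (int i = 0; i < brickCount; i++) bricks.add(new Brick());
        this.replicas = replicas;
    }

    // Replica group for a key: `replicas` consecutive bricks starting at hash(key) mod N.
    private List<Brick> groupFor(String key) {
        int start = Math.floorMod(key.hashCode(), bricks.size());
        List<Brick> group = new ArrayList<>();
        for (int i = 0; i < replicas; i++) group.add(bricks.get((start + i) % bricks.size()));
        return group;
    }

    // Write to every live replica; returns false only if the whole group is down.
    boolean put(String key, String value) {
        boolean stored = false;
        for (Brick b : groupFor(key)) {
            if (b.alive) {
                b.table.put(key, value);
                stored = true;
            }
        }
        return stored;
    }

    // Read from the first live replica that holds the key; null if none does.
    String get(String key) {
        for (Brick b : groupFor(key)) {
            if (b.alive && b.table.containsKey(key)) return b.table.get(key);
        }
        return null;
    }

    void fail(int brickIndex) { bricks.get(brickIndex).alive = false; }

    public static void main(String[] args) {
        BrickedHashTable table = new BrickedHashTable(3, 2);
        table.put("song", "track-1");
        table.fail(0);                              // any single brick may die
        System.out.println(table.get("song"));      // prints "track-1"
    }
}
```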
DDS Hash Table Brick Design
[Figure: DDS brick layers: distributed hashtable, “RPC” skeletons, single-node HT, buffer cache, I/O cores (disk, network). Ideal I/O core: raw disk and network. Pragmatic I/O core: file system / network stack, operating system, raw disk]
Scalable Throughput
[Figure: plot]
Robust under load
[Figure: plot]
Fault and Recovery
[Figure: timeline annotated with: three nodes, one dies, recover start, recover done, recovered node cold, garbage collection]
Dessert thoughts
• Performance and efficiency on Java is a critical first step, but cannot stay in MPP mode
• Huge opportunity
  • distributed innovation of widely used services (with I/O)
  • service composition as new level of programming
• Need to deal with resource containment, load, errors, versions and coupling from the beginning
  • events, queues, typed msgs => managed RMI
• Event-driven execution (encapsulating threads) is exciting & opens a rich set of questions
  • expressiveness, synthesis
  • introspection, scheduling, concurrency control
  • debugging
Where to go for more
• http://ninja.cs.berkeley.edu
• A Design Framework for Highly Concurrent Systems, Matt Welsh, Steven Gribble, Eric Brewer, and David Culler.
• Scalable, Distributed Data Structures for Internet Service Construction, Steven Gribble, Eric Brewer, Joseph Hellerstein, and David Culler.
• A Security Architecture for the Post-PC World, S. Ross, J. Hill, M. Chen, D. Culler, A. Joseph, and E. Brewer.
• The MultiSpace: an Evolutionary Platform for Infrastructural Services, Steven Gribble, Matt Welsh, Eric Brewer, and David Culler.
Backup: Mobility not enough
• RMI names classes / interfaces in the registry
  • which class do you get?
• Class path management nightmare
• Must maintain source web server
• distinct services may need distinct instances
  • service name != class name
• versioning is essential
  • use renaming to allow multiple versions within VM
  • service publication expresses entire dependence set