A Recovery-Friendly, Self-Managing Session State Store

Benjamin Ling, Emre Kiciman, Armando Fox {bling,emrek,fox}@cs.stanford.edu

Outline • Motivation: What is Session State? • SSM: • Architecture • Algorithm • Backpressure and Admission Control • SSM + Pinpoint • Self-recovering, self-monitoring • Benchmarks • Next steps: Sun Reference AppServer integration • Conclusion

Proliferation of J2EE and Web Services • J2EE embraced as industry standard • Framework • Simplifies development • Allows for portability of services • Standardized interfaces • However, difficulties remain…

The Pain – Administration and Maintenance • Administration is difficult and costly • $$ -- Database admins cost ~$200K/yr a head • Development efficiency negatively impacted • Failure/Recovery is costly • Recovery slow, especially site outages • Data loss on crashes • Users adversely affected

Not All State is Created Equal • Various types of state in J2EE… • User profile state • Persistent shared state • Transaction history state • But usually stored in the same place • Stored in DB or FS • Focus on a particular class of state -> exploit its properties -> simplify administration and maintenance

Example of Session State

Properties of Session State [Figure: Browser and App Server, interaction steps 1 to 6] • Subcategory of session state • Single-user, serial access, semi-persistent data • Examples: Temporary application data, application workflow • Example of usage (e.g. J2EE):

Goal • Build a session state store that is: • Failure-friendly • Does not lose data on crash • Degrades gracefully • Recovery-friendly • Recovers fast • Self-Managing

Session State Manager (SSM) [Figure: two AppServers, each with a stub, in front of Bricks 1 to 5; each brick has RAM and a network interface] • Redundant, in-memory hash table distributed across nodes • Algorithm: Redundancy similar to quorums • Write to many random nodes, wait for few (avoid performance coupling) • Read one

Write example: “Write to Many, Wait for Few” [Figure: Browser, AppServer with stub, Bricks 1 to 5] • Try to write to W random bricks, W = 4 • Must wait for WQ bricks to reply, WQ = 2

Write example: “Write to Many, Wait for Few” [Figure: Browser, AppServer with stub, Bricks 1 to 5; labels: "Crashed? Slow?", "14"] • Try to write to W random bricks, W = 4 • Must wait for WQ bricks to reply, WQ = 2 • Cookie holds metadata
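The write path on the two slides above can be sketched as follows. This is a minimal illustration, not the SSM implementation: the `Brick` class, `stub_write`, the thread pool, and the timeout value are all assumptions made for the example. It also assumes the cookie metadata is the list of bricks that acknowledged the write.

```python
import random
from concurrent.futures import ThreadPoolExecutor, as_completed
from concurrent.futures import TimeoutError as FuturesTimeout


class Brick:
    """Hypothetical in-memory brick: a dict plus an 'alive' flag."""

    def __init__(self, brick_id):
        self.brick_id = brick_id
        self.table = {}
        self.alive = True

    def write(self, key, value):
        if not self.alive:
            raise ConnectionError(f"brick {self.brick_id} is down")
        self.table[key] = value
        return self.brick_id  # acknowledgement


def stub_write(bricks, key, value, w=4, wq=2, timeout=1.0):
    """Send the write to w random bricks; return once wq have acknowledged."""
    targets = random.sample(bricks, w)
    pool = ThreadPoolExecutor(max_workers=w)
    futures = [pool.submit(b.write, key, value) for b in targets]
    acked = []
    try:
        for fut in as_completed(futures, timeout=timeout):
            try:
                acked.append(fut.result())
            except ConnectionError:
                continue  # crashed brick: skip it, others may still reply
            if len(acked) >= wq:
                break  # do not wait for the slower bricks
    except FuturesTimeout:
        pass  # slow bricks did not reply in time
    finally:
        pool.shutdown(wait=False)
    if len(acked) < wq:
        raise RuntimeError("write failed: fewer than wq bricks replied")
    return acked  # assumed cookie metadata: which bricks hold the state


bricks = [Brick(i) for i in range(1, 6)]
bricks[2].alive = False  # Brick 3 has crashed
cookie = stub_write(bricks, "session-key", {"step": 3})
```

Because the stub returns after WQ acknowledgements rather than W, one crashed or slow brick among the candidates does not delay the request.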

Read example: [Figure: Browser, AppServer with stub, Bricks 1 to 5; label "14"] • Try to read from Bricks 1, 4

Read example: [Figure: same layout; label "14" at the stub]

Read example: [Figure: same layout] • Brick 1 crashes

Read example: [Figure: same layout, Brick 1 absent]
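The read sequence above (read one brick named in the cookie, fall back to another if it has crashed) might look like the sketch below. The `Brick` class and `stub_read` are hypothetical names for illustration. Treating a restarted, empty brick the same as a crashed one is an assumption.

```python
class Brick:
    """Hypothetical in-memory brick: a dict plus an 'alive' flag."""

    def __init__(self, brick_id):
        self.brick_id = brick_id
        self.table = {}
        self.alive = True

    def read(self, key):
        if not self.alive:
            raise ConnectionError(f"brick {self.brick_id} is down")
        return self.table[key]  # KeyError if the brick restarted empty


def stub_read(bricks_by_id, cookie_brick_ids, key):
    """Try the bricks listed in the cookie one at a time; the first answer wins."""
    for bid in cookie_brick_ids:
        try:
            return bricks_by_id[bid].read(key)
        except (ConnectionError, KeyError):
            continue  # crashed or freshly restarted brick: try the next one
    raise LookupError("no brick named in the cookie could serve the read")


bricks = {i: Brick(i) for i in range(1, 6)}
bricks[1].table["session-key"] = "state"
bricks[4].table["session-key"] = "state"
bricks[1].alive = False  # Brick 1 crashes
value = stub_read(bricks, [1, 4], "session-key")
```

The read still succeeds from Brick 4, and no recovery code runs for Brick 1.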

SSM: Failure and Recovery • Failure of single node • No data loss, WQ-1 remain • State is available for R/W during failure • Recovery • Restart – No recovery • No special case recovery code • State is available for R/W during brick restart • Session state is self-recovering • User’s access pattern causes data to be rewritten

Backpressure and Admission Control [Figure: two AppServers with stubs, Bricks 1 to 5; labels: "Heavy flow to Brick 3", "Drop Requests"]

Backpressure and Admission Control [Figure: same layout; labels: "Drop Requests", "Reduce Sending", "Reject requests"]

Recovery Philosophy [Figure: two plots showing downtime and undetected errors; axes: recovery cost (cheap to expensive) and detection accuracy (lax, accurate, aggressive); regions marked "Hard" and "Ideal"]

Failure detection and Recovery [Figure: timeline with Failure, Detection, Recovery, Recovered] • SSM: Failure masked • Instant recovery

False Positives [Figure: timeline of normal operation] • False positive triggered • Instant recovery

Statistical Monitoring [Figure: Pinpoint collects statistics from Bricks 1 to 5] • Statistics: NumElements, MemoryUsed, InboxSize, NumDropped, NumReads, NumWrites

Statistical Monitoring [Figure: same layout; label "REBOOT"] • Statistics: NumElements, MemoryUsed, InboxSize, NumDropped, NumReads, NumWrites

Statistical Monitoring [Figure: same layout] • Statistics: NumElements, MemoryUsed, InboxSize, NumDropped, NumReads, NumWrites

SSM Monitoring • N replicated bricks handle read/write requests • Cannot do structural anomaly detection! • Alternative features (performance, mem usage, etc) • Activity statistics: How often did a brick do something? • Msgs received/sec, dropped/sec, etc. • Same across all peers, assuming balanced workload • Use anomalies as likely failures • State statistics: Current state of system • Memory usage, queue length, etc. • Similar pattern across peers, but may not be in phase • Look for patterns in time-series; differences in patterns indicate failure at a node.
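One way to act on "same across all peers" for activity statistics is to compare each brick against the median of its peers. The slides do not say which test is used. The median-absolute-deviation rule, the threshold of 3, and the function name below are assumptions chosen only to illustrate the idea.

```python
from statistics import median


def anomalous_bricks(stat_by_brick, threshold=3.0):
    """Flag bricks whose statistic is far from the peer median.

    Distance is measured in units of the median absolute deviation (MAD);
    both the MAD rule and the threshold are illustrative assumptions.
    """
    values = list(stat_by_brick.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    scale = mad if mad > 0 else 1e-9  # avoid dividing by zero when peers agree exactly
    return [
        brick
        for brick, v in stat_by_brick.items()
        if abs(v - med) / scale > threshold
    ]


# Hypothetical messages-received-per-second readings from five bricks.
msgs_per_sec = {"brick1": 100, "brick2": 104, "brick3": 12, "brick4": 98, "brick5": 101}
suspects = anomalous_bricks(msgs_per_sec)
```

A brick flagged this way is only a likely failure. Because restarting a brick is cheap, acting on a false positive costs little.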

Surprising Patterns in Time-Series 1. Discretize the time-series into a string [Keogh]: [0.2, 0.3, 0.4, 0.6, 0.8, 0.2] -> “aaabba”. 2. Calculate the frequencies of short substrings in the string: “aa” occurs twice; “ab”, “bb”, and “ba” each occur once. 3. Compare the frequencies to normal; look for substrings that occur much less or much more often than normal.
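The three steps can be sketched in a few lines. The two-symbol alphabet with a cut at 0.5 is an assumption that happens to reproduce the slide's example. The factor-of-two test in `surprising` is likewise a placeholder for "much less or much more than normal".

```python
from collections import Counter


def discretize(series, threshold=0.5):
    """Step 1: map each sample to 'a' (below threshold) or 'b' (at or above)."""
    return "".join("a" if x < threshold else "b" for x in series)


def substring_counts(s, n=2):
    """Step 2: count every (overlapping) substring of length n."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def surprising(observed, normal, factor=2.0):
    """Step 3: substrings whose relative frequency differs from normal by more than factor."""
    obs_total = sum(observed.values()) or 1
    norm_total = sum(normal.values()) or 1
    flagged = []
    for sub in sorted(set(observed) | set(normal)):
        o = observed[sub] / obs_total  # Counter returns 0 for unseen substrings
        n = normal[sub] / norm_total
        if o > factor * n or n > factor * o:
            flagged.append(sub)
    return flagged


text = discretize([0.2, 0.3, 0.4, 0.6, 0.8, 0.2])
counts = substring_counts(text)
```

Here `text` is "aaabba" and `counts` matches the frequencies listed on the slide.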

Microbenchmarks • UC Berkeley Millennium Cluster • Six bricks running • Candidate Write Set = 3, Write quota = 2 • Candidate Read Set = 2 • State Size = 8K

Induced Fault • SSM unaffected • One brick killed • Brick restarted by PP

Memory fault • SSM unaffected • Memory fault detected in hash • PP restarts Brick

Network Fault – 70% packet loss • Fault detected • Brick killed • Network fault injected • PP restarts Brick

Performance Fault • Performance fault injected

Macrobenchmark • TellMe’s Email-By-Phone Application • Session state stored in memory • Email header information • Index information • Alter application to store session state using • Disk • SSM

Macrobenchmark • Throughput preserved compared to disk • 25% throughput degradation compared to in-memory

Future Work • Integrate with Sun’s reference Application Server • Enterprise benchmarks • Statistical Anomaly Detection • Too many magic numbers • Integrated ROC-J2EE application server

Conclusion • SSM: A Recovery-Friendly, Self-Managing Session State Store • Benjamin Ling, bling@cs.stanford.edu • http://swig.stanford.edu/

Existing solutions: • File System and Databases • Poor failure behavior • Lose data (FS) • Slow recovery (both) • Difficult to administer (DB) • Difficult to tune (both) • In-memory replication using primary/secondary: • Performance coupling • Poor failover (uneven load balancing)

Other implementation details • Garbage collection • Generational hash table • Hash table of hash tables • Each hash table has an associated time range • When time has passed, GC that table • No reference counting, scanning, etc.
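A generational hash table along these lines could be sketched as below. The class name, the choice to bucket entries by expiry time, and the injectable clock are assumptions for illustration. The slide only states that each inner table covers a time range and is discarded whole once that range has passed.

```python
import time


class GenerationalTable:
    """Hash table of hash tables; each inner table covers one time range."""

    def __init__(self, generation_seconds=60.0, clock=time.monotonic):
        self.generation_seconds = generation_seconds
        self.clock = clock
        self.generations = {}  # generation index -> {key: value}

    def _generation_of(self, t):
        return int(t // self.generation_seconds)

    def put(self, key, value, ttl):
        """Store value in the generation that contains its expiry time."""
        expiry = self.clock() + ttl
        self.generations.setdefault(self._generation_of(expiry), {})[key] = value
        return expiry  # the caller keeps this to locate the entry later

    def get(self, key, expiry):
        return self.generations.get(self._generation_of(expiry), {}).get(key)

    def collect(self):
        """Drop every generation whose whole time range is in the past."""
        current = self._generation_of(self.clock())
        for gen in [g for g in self.generations if g < current]:
            del self.generations[gen]  # no per-entry scanning or ref counting


now = [0.0]  # fake clock so the example is deterministic
table = GenerationalTable(generation_seconds=10.0, clock=lambda: now[0])
short = table.put("s1", "cart", ttl=5)
long_lived = table.put("s2", "profile", ttl=15)
now[0] = 12.0
table.collect()
```

After `collect()`, the entry that expired at t=5 is gone along with its whole generation, while the entry expiring at t=15 is untouched.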

SSM: Self-Managing • Adaptive: • Stub maintains count of maximum allowable in-flight requests to each brick • Additive increase on successful request • Multiplicative decrease on timeout • Stubs discover capacity of each brick -> Self-Tuning • Admission control • Stubs say “no” if insufficient bricks • Propagate backpressure from bricks to clients • Turn users away under overload -> Self-Protecting
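The per-brick window and the admission check can be sketched as follows. The class and function names, the initial window, and the increase and decrease constants are assumptions. The slide specifies only additive increase on success, multiplicative decrease on timeout, and refusing requests when too few bricks have capacity.

```python
class BrickWindow:
    """Stub-side limit on in-flight requests to one brick (AIMD)."""

    def __init__(self, initial=4.0, minimum=1.0, maximum=1024.0,
                 increase=1.0, decrease=0.5):
        self.limit = initial
        self.minimum = minimum
        self.maximum = maximum
        self.increase = increase
        self.decrease = decrease
        self.in_flight = 0

    def has_capacity(self):
        return self.in_flight < int(self.limit)

    def try_send(self):
        """Reserve a slot if the window allows it; return False otherwise."""
        if not self.has_capacity():
            return False
        self.in_flight += 1
        return True

    def on_success(self):
        self.in_flight -= 1
        self.limit = min(self.maximum, self.limit + self.increase)  # additive increase

    def on_timeout(self):
        self.in_flight -= 1
        self.limit = max(self.minimum, self.limit * self.decrease)  # multiplicative decrease


def admit(windows, wq):
    """Admission control: accept a request only if at least wq bricks have capacity."""
    return sum(1 for w in windows if w.has_capacity()) >= wq


window = BrickWindow(initial=4.0)
window.try_send()
window.on_success()   # limit grows from 4.0 to 5.0
window.try_send()
window.on_timeout()   # limit halves from 5.0 to 2.5
```

Each stub learns a brick's capacity from its own successes and timeouts, so no capacity figure has to be configured by hand.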
