Engineering Scalable Social Games. Dr. Robert Zubek
Growth Example: Roller Coaster Kingdom • 1M DAUs over a weekend (source: developeranalytics.com)
Growth Example: FishVille • 6M DAUs in its first week (source: developeranalytics.com)
Growth Example: FarmVille • 25M DAUs over five months (source: developeranalytics.com)
Talk Overview: Introduce game developers to best practices for large-scale web development. Part I. Game Architectures. Part II. Scaling Solutions.
Part I. Game Architectures • Server approaches • Client approaches
Three Major Types • HTML client on a web server stack • Flash client on a web server stack • Flash client on a web + MMO stack
Server Side • Web stack: usually based on a LAMP stack; game logic in PHP; HTTP communication • Mixed stack: game logic in MMO server (e.g. Java); web stack for everything else
Example 1: FishVille • Web Servers + CDN • Cache and Queue Servers • DB Servers
Example 2: YoVille • Web Servers + CDN • MMO Servers • Cache and Queue Servers • DB Servers
Why web stack? • HTTP scales very well: stateless request/response; easy to load-balance across servers; easy to add more servers • Some limitations: server-initiated actions; storing state between requests
Example: Web stack and state (diagram: Client, Web #1 to Web #4, caches, DB Servers)
MMO servers • Minimally “MMO”: persistent socket connection per client; live game support (chat, live events); supports data push; keeps game state in memory • Very different from web!
Why MMO servers? • Harder to scale out! • But on the plus side: supports live communication and events; game logic can run on the server, independent of client’s actions; game state can be entirely in RAM
Example: Web + MMO stack (diagram: Client, Web Servers, MMO Servers, caches, DB Servers)
Client Side • Flash: high production quality games; game logic on client; can keep open socket • HTML + AJAX: the game is “just” a web page; minimal system requirements; limited graphics and animation
Social Network Integration • “Easy” because not related to scaling • Calling the host network to get a list of friends, post something, etc. • Networks provide REST APIs, and sometimes client libraries
Introduce game developers to best practices for large-scale web development, i.e. not blowing up as you grow. Part I. Game Architectures. Part II. Scaling Solutions.
Scaling: Scale up vs. scale out • Scale up: “I need a bigger box” • Scale out: “I need more boxes”
Scaling: Scale up vs. scale out • Scale out has been a clear win • At some point you can’t get a box big enough, quickly enough • Much easier to add more boxes • But: requires architectural support from the app!
Example: Scaling up vs. out • 500,000 daily players in first week • Started with just one DB, which quickly became a bottleneck • Short term: scaled up, optimized DB accesses, fixed slow queries, caching, etc. • Long term: must scale out with the size of the player base!
Scaling • We’ll look at four elements: • Databases • Caching • Capacity planning • Game servers (web + MMO)
II A. Databases • DBs are the first to fall over (diagram: Client, App Servers, caches, queues, DB)
(diagrams: App servers sending writes (W) and reads (R) to a single DB; then masters (M) with slaves (S); then several masters; then masters M0 and M1)
How to partition data? • Two most common ways: • Vertical – by table: easy but doesn’t scale with DAUs • Horizontal – by row: harder to do, but gives best results! Different rows live on different DBs; need a good mapping from row # to DB
Row striping (diagram: M0, M1) • Row-to-shard mapping: primary key modulo # of DBs • Like a “logical RAID 0” • More clever schemes exist, of course
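The row-to-shard mapping above can be sketched in a few lines. This is a minimal illustration, not the talk's actual code: the function names are hypothetical, and the shard list simply reuses the M0/M1 labels from the diagram.

```python
def shard_for(primary_key: int, num_shards: int) -> int:
    """Return the index of the database that owns this row (key modulo # of DBs)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    return primary_key % num_shards


# Hypothetical shard names, borrowed from the M0/M1 labels in the diagram.
SHARDS = ["M0", "M1"]


def db_for_user(user_id: int) -> str:
    """Pick the database that holds the row for this user ID."""
    return SHARDS[shard_for(user_id, len(SHARDS))]
```

With two shards, even user IDs land on M0 and odd ones on M1. Every query path has to call something like `db_for_user` first, which is the "architectural support from the app" that scaling out requires.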
Scaling out your shards (diagram: app servers; masters M0 and M1; slaves S0 and S1, relabeled M2 and M3)
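One reading of this diagram, stated here as an assumption rather than something the slide text confirms: each master has a replica, and scaling out means cutting replication and promoting the replicas (S0, S1) to new masters (M2, M3). With modulo striping, doubling the shard count this way needs no data copying, because any key that maps to shard i + 2 under modulo 4 mapped to shard i under modulo 2, so the promoted replica already holds it. The sketch below checks that property; all names are hypothetical.

```python
def shard_for(primary_key: int, num_shards: int) -> int:
    """Row striping: primary key modulo the number of shards."""
    return primary_key % num_shards


def old_home(new_shard: int, old_count: int) -> int:
    """Old shard whose replica becomes `new_shard` after doubling the shard count."""
    return new_shard % old_count


def check_doubling(keys, old_count: int) -> bool:
    """True if every key's new shard was cloned from the shard that held it before."""
    new_count = old_count * 2
    return all(
        shard_for(k, old_count) == old_home(shard_for(k, new_count), old_count)
        for k in keys
    )
```

After such a split each shard still contains rows it no longer owns; under this scheme they are harmless and can be deleted lazily.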
Sharding in a nutshell • It’s just data striping across databases (“logical RAID 0”) • There’s no automatic replication, etc. • No magic about how it works • That’s also what makes it robust, easy to set up and maintain!
Example: Sharding • Biggest challenge: designing how to partition a huge existing dataset • Partitioned both horizontally and vertically • Lots of joins had to be broken • Data patterns needed to be redesigned • With sharding, you need to know the shard ID for each query! • Data replication had difficulty catching up with high-volume usage
Sharding surprises: Be careful with joins • Can’t do joins across shards • Instead, do multiple selects, or denormalize your data
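A minimal sketch of the "multiple selects" alternative, assuming shards striped by `id % 2`. Two in-memory SQLite databases stand in for the shards; the table, the data, and the function names are made up for illustration.

```python
import sqlite3


def make_shard(rows):
    """Create an in-memory SQLite DB standing in for one shard."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    db.executemany("INSERT INTO users VALUES (?, ?)", rows)
    return db


# Two shards striped by id % 2 (hypothetical data).
shards = [
    make_shard([(2, "Ann"), (4, "Bob")]),   # shard 0: even ids
    make_shard([(1, "Cat"), (3, "Dan")]),   # shard 1: odd ids
]


def names_for(user_ids):
    """Replace a cross-shard join with one SELECT per shard, merged in the app."""
    by_shard = {}
    for uid in user_ids:
        by_shard.setdefault(uid % len(shards), []).append(uid)
    names = {}
    for shard_id, ids in by_shard.items():
        marks = ",".join("?" * len(ids))      # one placeholder per id
        query = f"SELECT id, name FROM users WHERE id IN ({marks})"
        for uid, name in shards[shard_id].execute(query, ids):
            names[uid] = name
    return names
```

For example, given a friend list `[1, 3, 4]` already read from one user's shard, `names_for([1, 3, 4])` issues one query per shard involved and merges the rows in application code.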
Sharding surprises: Skip transactions and foreign key constraints • CPU-expensive; easier to pay this cost in the application layer (just be careful) • The more you keep in RAM, the less you’ll need these
II B. Caching • DBs can be too slow • Spend memory to buy speed! (diagram: Client, App Servers, caches, queues, DB)
Caching • Two main uses: • Speed up DB access to commonly used data • Shared game state storage for stateless web game servers • Memcached: network-attached RAM cache
Memcache • Stores simple key-value pairs, e.g. “uid_123” => {name: “Robert”, …} • What to put there? Structured game data; mutexes for actions across servers • Caveat: it’s an LRU cache, not a DB! No persistence!
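Mutexes across servers are commonly built on memcached's `add` command, which stores a value only if the key does not already exist. The sketch below imitates that behavior with a small in-process stand-in class (an assumption for illustration, not a real client library); the key prefix and function names are hypothetical. Because cache entries can be evicted and nothing is persisted, a lock like this is best-effort only.

```python
import time


class FakeMemcache:
    """In-process stand-in that mimics memcached's add/delete semantics."""

    def __init__(self):
        self._data = {}   # key -> (value, expiry time)

    def add(self, key, value, expire):
        """Store only if the key is absent or expired; return True on success."""
        now = time.monotonic()
        entry = self._data.get(key)
        if entry is not None and entry[1] > now:
            return False
        self._data[key] = (value, now + expire)
        return True

    def delete(self, key):
        self._data.pop(key, None)


mc = FakeMemcache()


def try_lock(name, owner, ttl=5.0):
    """Take the lock if nobody holds it; the TTL frees it if the holder crashes."""
    return mc.add(f"lock_{name}", owner, ttl)


def unlock(name):
    mc.delete(f"lock_{name}")
```

If one web server takes `try_lock("trade_42", "web1")`, a second server asking for the same lock gets `False` until the first calls `unlock` or the TTL runs out.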
Memcache scaling out • Shard it just like the DB! (diagram: App, MCs, DBs)
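Cache keys are strings rather than integer row IDs, so one simple way to stripe them is to hash the key and take it modulo the number of cache servers. A minimal sketch; the host names are hypothetical, and real client libraries often use more elaborate schemes.

```python
import zlib

# Hypothetical cache hosts.
CACHE_SERVERS = ["mc0:11211", "mc1:11211", "mc2:11211"]


def server_for(key: str) -> str:
    """Map a cache key to one server, the same way on every app server."""
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    index = zlib.crc32(key.encode("utf-8")) % len(CACHE_SERVERS)
    return CACHE_SERVERS[index]
```

Every app server computes the same answer for a given key, so no coordination between cache servers is needed. The trade-off of plain modulo is that changing the number of servers remaps most keys, which for a cache means a burst of misses rather than lost data.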
II C. Game Servers (diagram: Client, Web Servers, MMO Servers, caches, queues, DB)
Scaling web servers (diagram: DNS in front of several load balancers (LB))
Scaling content delivery • Web Servers: game data only • CDN: media files (diagram: Client, Web Servers, CDN)
Scaling MMO servers • Scaling out is easiest when servers don’t need to talk to each other • Databases: sharding • Memcache: sharding • Web: load-balanced farms • MMO: ??? • Our MMO server approach: shard just like other servers
Scaling MMO servers • Keep all servers separate from each other • Minimize inter-server communication • All participants in a live event should be on the same server • Scaling out means no direct sharing • Sharing through third parties is okay
Problem #1: Load balancing • What if a server gets overloaded? • Can’t move a live player to a different server • Poor man’s load-balancing: • Players don’t pick servers; the LB does • If a server gets hot, remove it from the LB pool • If enough servers get hot, add more server instances and send new connections there
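The poor man's load-balancing rules above can be sketched as a small pool object. This is an illustration under assumptions: "hot" is modeled as a simple player-count threshold, and the class and method names are hypothetical.

```python
class Pool:
    """Tracks which MMO servers may receive new connections."""

    def __init__(self, servers, hot_threshold):
        self.load = {name: 0 for name in servers}   # connected players per server
        self.in_pool = set(servers)                 # servers accepting new players
        self.hot_threshold = hot_threshold

    def assign(self):
        """The LB, not the player, picks the server: least-loaded one in the pool."""
        if not self.in_pool:
            raise RuntimeError("no servers available; add capacity")
        name = min(sorted(self.in_pool), key=lambda s: self.load[s])
        self.load[name] += 1
        if self.load[name] >= self.hot_threshold:
            self.in_pool.discard(name)   # hot: existing players stay, no new ones
        return name

    def add_server(self, name):
        """Bring up a new instance and start sending new connections there."""
        self.load[name] = 0
        self.in_pool.add(name)
```

Note that a hot server is only removed from the pool; its live players are never moved, which matches the constraint stated on the slide.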
Problem #2: Deployment • Downtime = lost revenues • You can measure cost of downtime in $/minute!
Problem #2: Deployment • Deployment comparison: • Web: just copy over PHP files • Socket servers: much harder • How to deploy with zero downtime? • Ideally: set up shadow new server and slowly transition players over
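One possible way to picture the shadow-server idea, offered as an assumption rather than the talk's actual procedure: route every new connection to the new build, let the old server drain as players leave, and shut it down once it is empty. All names below are hypothetical.

```python
class Rollout:
    """Tracks players on the old build and on the shadow (new) server."""

    def __init__(self, old_players):
        self.old = set(old_players)   # players still connected to the old build
        self.new = set()              # players connected to the new build

    def connect(self, player):
        """All new connections go to the new server."""
        self.new.add(player)

    def disconnect(self, player):
        """A player leaves, from whichever server they were on."""
        self.old.discard(player)
        self.new.discard(player)

    def old_server_idle(self):
        """True once the old server has drained and can be shut down."""
        return not self.old
```

Players who reconnect during the rollout land on the new build, so the transition happens gradually without forcing anyone off a live session.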