Scalable and Secure Architectures for Online Multiplayer Games Thesis Proposal Ashwin Bharambe May 15, 2006
Online Games are Huge!
[Chart: MMOG subscriber growth, 1 million to 8 million, 1997-2005, from http://www.mmogchart.com/ — World of Warcraft, Final Fantasy XI, Everquest, Ultima Online]
• These MMORPGs have client-server architectures
• They accommodate ~0.5 million players at a time
Why MMORPGs Scale • Role Playing Games have been slow-paced • Players interact with the server relatively infrequently • Maintain multiple independent game-worlds • Each hosted on a different server • Not true with other game genres • FPS or First Person Shooters (e.g., Quake) • Demand high interactivity • Need a single game-world
FPS Games Don't Scale
[Graph: bandwidth (kbps) of a Quake II server]
• Both bandwidth and computation become bottlenecks
Goal: Cooperative Server Architecture • Focus on fast-paced FPS games
Distributing Games: Challenges • Tight latency constraints • As players or missiles move, updates must be disseminated very quickly • < 150 ms for FPS games • High write-sharing in the workload • Cheating • Execution and state maintenance spread over untrustworthy nodes
Talk Outline • Problem • Background • Game Model • Related Work • Colyseus Architecture • Expected Contributions
Game Model
• Immutable state: interactive 3-D environment (maps, models, textures)
• Mutable state: players, monsters, ammo, game status
[Screenshot of Serious Sam]
Game Execution in Client-Server Model

void RunGameFrame()   // runs every 50-100 ms
{
    // every object in the world thinks once every game frame
    foreach (obj in mutable_objs) {
        if (obj->think)
            obj->think();
    }
    send_world_update_to_clients();
}
Object Partitioning
[Diagram: player and monster objects assigned to different servers]
Distributed Game Execution
• Object Discovery
• Replica Synchronization
[Diagram: monster, missile, and item objects spread across nodes]

class CruzMissile {
    // every object in the world thinks once every game frame
    void think() {
        update_pos();
        if (dist_to_ground() < EPSILON)
            explode();
    }
    void explode() {
        foreach (p in get_nearby_objects()) {
            if (p.type == "player")
                p.health -= 50;
        }
    }
};
Talk Outline • Problem • Background • Game Model • Related Work • Colyseus Architecture • Expected Contributions
Related Work • Distributed Designs • Distributed Interactive Simulation (DIS) • e.g., HLA, DIVE, MASSIVE, etc. • Use region-based partitioning, IP multicast • Butterfly, Second-Life, SimMUD [INFOCOM 04] • Use region-based partitioning, DHT multicast • Cheat-proofing • Lock-step synchronization with commitment
Related Work: Techniques • Region-based Partitioning • Parallel Simulation • Area-of-Interest Management with Multicast
Related Work: Techniques
• Region-based Partitioning
  • Divide the game-world into a fixed number of regions
  • Assign objects in one region to one server
  + Simple to place and discover objects
  – High migration rates, especially for FPS games
  – Regions exhibit very high skews in popularity → can result in severe load imbalance
• Parallel Simulation
• Area-of-Interest Management with Multicast
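The region-based scheme above can be sketched as a fixed grid of cells, each mapped to one server. This is a minimal illustration only; the grid size, server count, and all names here are assumptions, not taken from any of the systems above.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical fixed-grid partitioning: the world is cut into square
// regions, and each region is statically assigned to one server.
const int REGION_SIZE = 256;     // world units per region (assumed)
const int REGIONS_PER_ROW = 64;  // assumed world width in regions
const int NUM_SERVERS = 16;      // assumed server count

// Map a world position to a region id, and a region to a server.
int region_of(double x, double y) {
    int rx = (int)std::floor(x / REGION_SIZE);
    int ry = (int)std::floor(y / REGION_SIZE);
    return ry * REGIONS_PER_ROW + rx;
}

int server_of(int region) { return region % NUM_SERVERS; }
```

A fast-moving FPS player crossing x = 256 changes region, and here also changes server; that boundary crossing is exactly the migration cost the slide flags.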
Related Work: Techniques
• Region-based Partitioning
• Parallel Simulation
  • Peer-to-peer: each peer maintains full state
  • Writes to objects are sent to all peers
  + Updates travel over direct point-to-point links → fastest dissemination
  – Needs lock-step + bucket synchronization
  – No conflict resolution → inconsistency never heals
• Area-of-Interest Management with Multicast
Related Work: Techniques
• Region-based Partitioning
• Parallel Simulation
• Area-of-Interest Management with Multicast
  • Players only need updates from nearby regions
  • 1 region == 1 multicast group, use one shared multicast tree per group
  – Bandwidth load-imbalance due to skews in region popularity
  – Updates need multiple hops, bad for FPS games
Talk Outline • Problem • Background • Colyseus Architecture • Scalability [NSDI 2006] • Evaluation • Security • Expected Contributions
Colyseus Components
• Object Placement
• Object Discovery
• Replica Management
[Diagram: primary objects P1-P4 and replicas R3, R4 spread across servers S1-S3; a server calls get_nearby_objects()]
Object Placement
[Chart: region popularity vs. region rank, showing heavy skew]
• Flexible and dynamic object placement
  • Permits use of clustering algorithms
  • Not tied to "regions"
• Previous systems use region-based placement
  • Frequent, disruptive migration for fast games
  • Regions in a game have very skewed popularity
Replication Model: Primary-Backup Replication
• Single primary, read-only replicas
• Writes are serialized at the primary
• Primary responsible for executing think code
• Replica trails the primary by one hop
• Weakly consistent: low latency is critical
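The replication model above can be sketched for a single game object. The types below (Write, Primary, Replica) are illustrative names for this sketch, not Colyseus APIs.

```cpp
// A write as shipped from the primary to its replicas.
struct Write { int seq; int field; int value; };

struct Primary {
    int next_seq = 0;
    // All writes are serialized at the primary, which stamps each one
    // with a monotonically increasing sequence number.
    Write apply_write(int field, int value) {
        return Write{next_seq++, field, value};
    }
};

struct Replica {
    int last_seq = -1;
    int state[4] = {0, 0, 0, 0};
    // A read-only replica applies updates in primary order; since it
    // trails the primary by one network hop, it is weakly consistent.
    void on_update(const Write& w) {
        if (w.seq != last_seq + 1) return;  // ignore out-of-order updates
        state[w.field] = w.value;
        last_seq = w.seq;
    }
};
```

The one-hop trail is the design choice: updates flow directly primary → replica, trading strict consistency for the low latency that fast-paced games require.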
Object Discovery • Most objects only need other “nearby” objects for executing think functions get_nearby_objects ()
Distributed Object Discovery
• Use a structured overlay to achieve this
• Publication: "My position is x=x1, y=y1, z=z1; located on 128.2.255.255"
• Subscription: "Find all objects with obj.x ∈ [x1, x2], obj.y ∈ [y1, y2], obj.z ∈ [z1, z2]"
Mercury: Range Queriable DHT [SIGCOMM 2004] • Supports range queries vs. exact matches • No need for partitioning into “regions” • Places data contiguously • Can utilize spatial locality in games • Dynamically balances load • Control traffic does not cause hotspots • Provides O(log n)-hop lookup • About 200ms for 225 nodes in our setup
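The range-query interface above boils down to a simple predicate: a publication is a point (the object's position) and a subscription is a box. Mercury routes pubs and subs through the overlay so that they meet at the nodes responsible for the overlapping value ranges; only the matching predicate is sketched here, not the routing, and the struct names are assumptions.

```cpp
// A publication carries an object's current position.
struct Pub { double x, y, z; };

// A subscription is an axis-aligned box of interest.
struct Sub { double x1, x2, y1, y2, z1, z2; };

// A pub matches a sub when the point falls inside the box on every
// attribute; contiguous data placement lets the overlay answer this
// without partitioning the world into fixed regions.
bool matches(const Pub& p, const Sub& s) {
    return s.x1 <= p.x && p.x <= s.x2 &&
           s.y1 <= p.y && p.y <= s.y2 &&
           s.z1 <= p.z && p.z <= s.z2;
}
```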
Object Discovery Optimizations • Pre-fetch soon-to-be required objects • Use game physics for prediction • Pro-active replication • Piggyback object creation on update messages • Soft-state subscriptions and publications • Add object-specific TTLs to pubs and subs
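The soft-state idea in the last bullet can be sketched as follows (the struct name, fields, and TTL handling are assumptions for illustration): a subscription simply expires unless refreshed, so interest left behind by a player who moved away does not linger in the overlay.

```cpp
// Hypothetical soft-state subscription with an object-specific TTL.
struct SoftSub {
    double expires = 0.0;   // absolute expiry time, in seconds

    // Refreshing extends the subscription's lifetime by its TTL.
    void refresh(double now, double ttl) { expires = now + ttl; }

    // An unrefreshed subscription is silently dropped after expiry.
    bool alive(double now) const { return now < expires; }
};
```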
Colyseus Design: Recap 128.2.9.200 128.2.9.100 Replica Direct point-to-point connection Monster on 128.2.9.200 Mercury Find me nearby objects
Talk Outline • Problem • Background • Colyseus Architecture • Scalability • Evaluation[NSDI 2006] • Security • Expected Contributions
Evaluation Goals • Bandwidth scalability • Per-node bandwidth usage should scale with the number of nodes • View inconsistency due to object discovery latency should be small • Discovery latency, pre-fetching overhead in [NSDI 2006]
Experimental Setup • Emulab-based evaluation • Synthetic game • Workload based on Quake III traces • P2P scenario • 1 player per server • Unlimited bandwidth • Modeled end-to-end latencies • More results including a Quake II evaluation, in [NSDI 2006]
Per-node Bandwidth Scaling
[Graph: mean outgoing bandwidth (kbps) vs. number of nodes]
View Inconsistency
[Graph: avg. fraction of mobile objects missing vs. number of nodes, for no delay, 100 ms delay, and 400 ms delay]
Planned Work
• Consistency models
  • Game operations demand differing levels of consistency and latency
  • Causal ordering of events
  • Atomicity
• Deployment
  • Performance metrics depend crucially on the workload
  • A real game workload would be useful for future research
Talk Outline • Problem • Background • Colyseus Architecture • Scalability • Evaluation • Security [Planned Work] • Expected Contributions
Cheating in Online Games • Why do cheats arise? • Distributed system (client-server or P2P) • Bugs in the game implementation • Possible Cheats in Colyseus • Object Discovery • map-hack, subscription-hijack • Replication • god-mode, event-ordering, etc. • Object Placement • god-mode
Object Discovery Cheats • map-hack cheat [Information overexposure] • Subscribe to arbitrary areas in the game • Discover all objects, which may be against game rules • Subscription-hijack cheat • Incorrectly route subscriptions of your enemy • Enemy cannot discover (see) players • Other players can see her and can shoot her
Replication Cheats
• god-mode cheat
  • Primary node has arbitrary control over writes to the object
• Timestamp cheat
  • Primary node decides the serialized write order
[Diagram: node A declares "You die!"; node B, holding the primary, replies "No, I don't!"]
Replication Cheats
• Suppress-update cheat
  • Primary does not send updates to the replicas
• Inconsistency cheat
  • Primary sends incorrect or conflicting updates to the replicas
[Diagram: player A sends conflicting updates ("I am dead", "I moved to another room", "Hide from this guy") to players B, C, and D]
Related Work
• NEO protocol [GauthierDickey 04]
• Lock-step synchronization with commitment
  • Send encrypted update in round 1
  • Send decryption key in round 2, only after you receive updates from everybody
+ Addresses the suppress-update and timestamp cheats
– Lock-step synchronization increases "lag"
– Does not address the god-mode cheat, among others
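The two-round idea behind lock-step commitment can be sketched as commit/reveal. This is an illustration only, not the NEO wire protocol: the hash function and names are assumptions, and a real protocol would use a cryptographic hash (the slides describe encryption plus delayed key release, which serves the same purpose).

```cpp
#include <cstdint>
#include <string>

// Non-cryptographic FNV-1a hash, standing in for a real commitment.
uint64_t fnv1a(const std::string& s) {
    uint64_t h = 1469598103934665603ULL;   // FNV-1a offset basis
    for (unsigned char c : s) { h ^= c; h *= 1099511628211ULL; }
    return h;
}

// Round 1: each player broadcasts a commitment to its move.
uint64_t commit(const std::string& move, const std::string& nonce) {
    return fnv1a(move + "|" + nonce);
}

// Round 2: the move is revealed only after all commitments arrive,
// and peers check it against the earlier commitment, so nobody can
// pick a move after seeing everyone else's.
bool verify_reveal(uint64_t commitment,
                   const std::string& move, const std::string& nonce) {
    return commit(move, nonce) == commitment;
}
```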
Solution Approach
• Philosophy: Detection rather than Prevention
• Preventing cheating ≈ Byzantine fault tolerance
  • Known protocols emphasize strict consistency and assume weak synchrony
  • Multiple rounds unsuitable for game-play
• High-level decisions
  • Make players leave an audit-trail
  • Make peers police each other
  • Always keep detection out of the critical path
Audit Approaches
[Diagram: a centralized auditor collecting logs vs. distributed audit by a randomly chosen witness; every node keeps a log]
Logging Using Witnesses
[Diagram: the player node runs think code and keeps a player log; updates follow an optimistic update path, while the witness node serializes updates and keeps a witness log]
Using Witnesses: Good and Bad
+ Player and witness logs can be used for audits → potentially address the timestamp, god-mode and inconsistency cheats
+ Witness can generate pubs + subs → addresses the map-hack cheat
– Bandwidth overhead
– Does not handle the suppress-update and subscription-hijack cheats
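One way to make the player and witness logs auditable is a hash chain over the serialized updates. This is an illustrative sketch, not the protocol proposed in the slides; the chain constants are FNV-style placeholders for a real cryptographic hash.

```cpp
#include <cstdint>
#include <string>

// Both the player and the witness append every serialized update to
// their own hash chain; an auditor later compares the two chain heads,
// so a suppressed, reordered, or altered update shows up as a mismatch.
struct HashChain {
    uint64_t head = 1469598103934665603ULL;    // FNV-1a offset basis

    void append(const std::string& update) {
        for (unsigned char c : update) {
            head ^= c;
            head *= 1099511628211ULL;
        }
        head ^= 0xFFULL;                       // record separator
        head *= 1099511628211ULL;
    }
};
```

Because the head depends on every prior record, auditing needs only one value per node in the common case; full logs are pulled in only when the heads disagree, keeping detection off the critical path.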
Using Witnesses: Alternate Design
• Move the primary directly to the witness node
  • Code execution and writes are applied directly at the witness
– Primary-to-replica updates go through the witness
– Witness gets arbitrary power
  • Player cannot complain to anybody
[Diagram: the witness node holds the primary copy of the player object]
Challenges • Balance power between player and witness • Use cryptographic techniques • How do players detect somebody is cheating? • Extraction of rules from the game code • Securing the object discovery layer • Leverage DHT security research • Keep bandwidth overhead minimal
Talk Outline • Problem • Background • Colyseus Architecture • Scalability • Evaluation • Security • Expected Contributions
Expected Contributions
• Mercury range-queriable DHT
  • First structured overlay to support range queries and dynamic load balancing
  • Implementation used in other systems
• Design and evaluation of Colyseus
  • First distributed design to be successfully applied for scaling FPS games
  • Demonstrated that low-latency game-play is feasible
  • Flexible architecture for adapting to various types of games
• Real-world measurement of game workloads
  • Deployment of Quake III
• Anti-cheating protocols