Scalable and Secure Architectures for Online Multiplayer Games Thesis Proposal Ashwin Bharambe May 15, 2006
Online Games are Huge!
[Chart: MMOG subscriber growth, 1 million to 8 million, 1997-2005, from http://www.mmogchart.com/ — World of Warcraft, Final Fantasy XI, Everquest, Ultima Online]
• These MMORPGs have client-server architectures
• They accommodate ~0.5 million players at a time
Why MMORPGs Scale • Role Playing Games have been slow-paced • Players interact with the server relatively infrequently • Maintain multiple independent game-worlds • Each hosted on a different server • Not true with other game genres • FPS or First Person Shooters (e.g., Quake) • Demand high interactivity • Need a single game-world
FPS Games Don't Scale
[Graph: bandwidth (kbps) of a Quake II server]
• Both bandwidth and computation become bottlenecks
Goal: Cooperative Server Architecture • Focus on fast-paced FPS games
Distributing Games: Challenges • Tight latency constraints • As players or missiles move, updates must be disseminated very quickly • < 150 ms for FPS games • High write-sharing in the workload • Cheating • Execution and state maintenance spread over untrustworthy nodes
Talk Outline • Problem • Background • Game Model • Related Work • Colyseus Architecture • Expected Contributions
Game Model
• Immutable state: interactive 3-D environment (maps, models, textures)
• Mutable state: players, monsters, ammo, game status
[Screenshot of Serious Sam]
Game Execution in Client-Server Model

void RunGameFrame()   // runs every 50-100 ms
{
    // every object in the world thinks once every game frame
    foreach (obj in mutable_objs) {
        if (obj->think)
            obj->think();
    }
    send_world_update_to_clients();
}
Object Partitioning
[Diagram: player and monster objects assigned to different servers]
Distributed Game Execution
• Object Discovery
• Replica Synchronization
[Diagram: monster, missile, and item objects spread across nodes]

class CruzMissile {
    // every object in the world thinks once every game frame
    void think() {
        update_pos();
        if (dist_to_ground() < EPSILON)
            explode();
    }
    void explode() {
        foreach (p in get_nearby_objects()) {
            if (p.type == "player")
                p.health -= 50;
        }
    }
};
Talk Outline • Problem • Background • Game Model • Related Work • Colyseus Architecture • Expected Contributions
Related Work • Distributed Designs • Distributed Interactive Simulation (DIS) • e.g., HLA, DIVE, MASSIVE, etc. • Use region-based partitioning, IP multicast • Butterfly, Second-Life, SimMUD [INFOCOM 04] • Use region-based partitioning, DHT multicast • Cheat-proofing • Lock-step synchronization with commitment
Related Work: Techniques • Region-based Partitioning • Parallel Simulation • Area-of-Interest Management with Multicast
Related Work: Techniques
• Region-based Partitioning
  • Divide the game-world into a fixed number of regions
  • Assign objects in one region to one server
  + Simple to place and discover objects
  – High migration rates, especially for FPS games
  – Regions exhibit very high skews in popularity → can result in severe load imbalance
• Parallel Simulation
• Area-of-Interest Management with Multicast
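The region-based scheme above can be sketched as a fixed grid of cells, each mapped to one server. This is a minimal illustration only; the grid size, server count, and all names here are assumptions, not taken from any of the systems above.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical fixed-grid partitioning: the world is cut into square
// regions, and each region is statically assigned to one server.
const int REGION_SIZE = 256;     // world units per region (assumed)
const int REGIONS_PER_ROW = 64;  // assumed world width in regions
const int NUM_SERVERS = 16;      // assumed server count

// Map a world position to a region id, and a region to a server.
int region_of(double x, double y) {
    int rx = (int)std::floor(x / REGION_SIZE);
    int ry = (int)std::floor(y / REGION_SIZE);
    return ry * REGIONS_PER_ROW + rx;
}

int server_of(int region) { return region % NUM_SERVERS; }
```

A fast-moving FPS player crossing x = 256 changes region, and here also changes server; that boundary crossing is exactly the migration cost the slide flags.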
Related Work: Techniques
• Region-based Partitioning
• Parallel Simulation
  • Peer-to-peer: each peer maintains full state
  • Writes to objects are sent to all peers
  + Updates travel over direct point-to-point links → fastest dissemination
  – Needs lock-step + bucket synchronization
  – No conflict resolution → inconsistency never heals
• Area-of-Interest Management with Multicast
Related Work: Techniques
• Region-based Partitioning
• Parallel Simulation
• Area-of-Interest Management with Multicast
  • Players only need updates from nearby regions
  • 1 region == 1 multicast group, use one shared multicast tree per group
  – Bandwidth load-imbalance due to skews in region popularity
  – Updates need multiple hops, bad for FPS games
Talk Outline • Problem • Background • Colyseus Architecture • Scalability [NSDI 2006] • Evaluation • Security • Expected Contributions
Colyseus Components
• Object Placement
• Object Discovery
• Replica Management
[Diagram: primary objects P1-P4 and replicas R3, R4 spread across servers S1-S3; a server calls get_nearby_objects()]
Object Placement
[Chart: region popularity vs. region rank, showing heavy skew]
• Flexible and dynamic object placement
  • Permits use of clustering algorithms
  • Not tied to "regions"
• Previous systems use region-based placement
  • Frequent, disruptive migration for fast games
  • Regions in a game have very skewed popularity
Replication Model: Primary-Backup Replication
• Single primary, read-only replicas
• Writes are serialized at the primary
• Primary responsible for executing think code
• Replica trails the primary by one hop
• Weakly consistent: low latency is critical
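The replication model above can be sketched for a single game object. The types below (Write, Primary, Replica) are illustrative names for this sketch, not Colyseus APIs.

```cpp
// A write as shipped from the primary to its replicas.
struct Write { int seq; int field; int value; };

struct Primary {
    int next_seq = 0;
    // All writes are serialized at the primary, which stamps each one
    // with a monotonically increasing sequence number.
    Write apply_write(int field, int value) {
        return Write{next_seq++, field, value};
    }
};

struct Replica {
    int last_seq = -1;
    int state[4] = {0, 0, 0, 0};
    // A read-only replica applies updates in primary order; since it
    // trails the primary by one network hop, it is weakly consistent.
    void on_update(const Write& w) {
        if (w.seq != last_seq + 1) return;  // ignore out-of-order updates
        state[w.field] = w.value;
        last_seq = w.seq;
    }
};
```

The one-hop trail is the design choice: updates flow directly primary → replica, trading strict consistency for the low latency that fast-paced games require.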
Object Discovery • Most objects only need other “nearby” objects for executing think functions get_nearby_objects ()
Distributed Object Discovery
• Use a structured overlay to achieve this
• Publication: "My position is x=x1, y=y1, z=z1; located on 128.2.255.255"
• Subscription: "Find all objects with obj.x ∈ [x1, x2], obj.y ∈ [y1, y2], obj.z ∈ [z1, z2]"
Mercury: Range Queriable DHT [SIGCOMM 2004] • Supports range queries vs. exact matches • No need for partitioning into “regions” • Places data contiguously • Can utilize spatial locality in games • Dynamically balances load • Control traffic does not cause hotspots • Provides O(log n)-hop lookup • About 200ms for 225 nodes in our setup
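The range-query interface above boils down to a simple predicate: a publication is a point (the object's position) and a subscription is a box. Mercury routes pubs and subs through the overlay so that they meet at the nodes responsible for the overlapping value ranges; only the matching predicate is sketched here, not the routing, and the struct names are assumptions.

```cpp
// A publication carries an object's current position.
struct Pub { double x, y, z; };

// A subscription is an axis-aligned box of interest.
struct Sub { double x1, x2, y1, y2, z1, z2; };

// A pub matches a sub when the point falls inside the box on every
// attribute; contiguous data placement lets the overlay answer this
// without partitioning the world into fixed regions.
bool matches(const Pub& p, const Sub& s) {
    return s.x1 <= p.x && p.x <= s.x2 &&
           s.y1 <= p.y && p.y <= s.y2 &&
           s.z1 <= p.z && p.z <= s.z2;
}
```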
Object Discovery Optimizations • Pre-fetch soon-to-be required objects • Use game physics for prediction • Pro-active replication • Piggyback object creation on update messages • Soft-state subscriptions and publications • Add object-specific TTLs to pubs and subs
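The soft-state idea in the last bullet can be sketched as follows (the struct name, fields, and TTL handling are assumptions for illustration): a subscription simply expires unless refreshed, so interest left behind by a player who moved away does not linger in the overlay.

```cpp
// Hypothetical soft-state subscription with an object-specific TTL.
struct SoftSub {
    double expires = 0.0;   // absolute expiry time, in seconds

    // Refreshing extends the subscription's lifetime by its TTL.
    void refresh(double now, double ttl) { expires = now + ttl; }

    // An unrefreshed subscription is silently dropped after expiry.
    bool alive(double now) const { return now < expires; }
};
```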
Colyseus Design: Recap 128.2.9.200 128.2.9.100 Replica Direct point-to-point connection Monster on 128.2.9.200 Mercury Find me nearby objects
Talk Outline • Problem • Background • Colyseus Architecture • Scalability • Evaluation[NSDI 2006] • Security • Expected Contributions
Evaluation Goals • Bandwidth scalability • Per-node bandwidth usage should scale with the number of nodes • View inconsistency due to object discovery latency should be small • Discovery latency, pre-fetching overhead in [NSDI 2006]
Experimental Setup • Emulab-based evaluation • Synthetic game • Workload based on Quake III traces • P2P scenario • 1 player per server • Unlimited bandwidth • Modeled end-to-end latencies • More results including a Quake II evaluation, in [NSDI 2006]
Per-node Bandwidth Scaling
[Graph: mean outgoing bandwidth (kbps) vs. number of nodes]
View Inconsistency
[Graph: avg. fraction of mobile objects missing vs. number of nodes, for no delay, 100 ms delay, and 400 ms delay]
Planned Work
• Consistency models
  • Game operations demand differing levels of consistency and latency
  • Causal ordering of events
  • Atomicity
• Deployment
  • Performance metrics depend crucially on the workload
  • A real game workload would be useful for future research
Talk Outline • Problem • Background • Colyseus Architecture • Scalability • Evaluation • Security [Planned Work] • Expected Contributions
Cheating in Online Games • Why do cheats arise? • Distributed system (client-server or P2P) • Bugs in the game implementation • Possible Cheats in Colyseus • Object Discovery • map-hack, subscription-hijack • Replication • god-mode, event-ordering, etc. • Object Placement • god-mode
Object Discovery Cheats • map-hack cheat [Information overexposure] • Subscribe to arbitrary areas in the game • Discover all objects, which may be against game rules • Subscription-hijack cheat • Incorrectly route subscriptions of your enemy • Enemy cannot discover (see) players • Other players can see her and can shoot her
Replication Cheats
• god-mode cheat
  • Primary node has arbitrary control over writes to the object
• Timestamp cheat
  • Primary node decides the serialized write order
[Diagram: node A declares "You die!"; node B, holding the primary, replies "No, I don't!"]
Replication Cheats
• Suppress-update cheat
  • Primary does not send updates to the replicas
• Inconsistency cheat
  • Primary sends incorrect or conflicting updates to the replicas
[Diagram: player A sends conflicting updates ("I am dead", "I moved to another room", "Hide from this guy") to players B, C, and D]
Related Work
• NEO protocol [GauthierDickey 04]
• Lock-step synchronization with commitment
  • Send encrypted update in round 1
  • Send decryption key in round 2, only after you receive updates from everybody
+ Addresses the suppress-update and timestamp cheats
– Lock-step synchronization increases "lag"
– Does not address the god-mode cheat, among others
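The two-round idea behind lock-step commitment can be sketched as commit/reveal. This is an illustration only, not the NEO wire protocol: the hash function and names are assumptions, and a real protocol would use a cryptographic hash (the slides describe encryption plus delayed key release, which serves the same purpose).

```cpp
#include <cstdint>
#include <string>

// Non-cryptographic FNV-1a hash, standing in for a real commitment.
uint64_t fnv1a(const std::string& s) {
    uint64_t h = 1469598103934665603ULL;   // FNV-1a offset basis
    for (unsigned char c : s) { h ^= c; h *= 1099511628211ULL; }
    return h;
}

// Round 1: each player broadcasts a commitment to its move.
uint64_t commit(const std::string& move, const std::string& nonce) {
    return fnv1a(move + "|" + nonce);
}

// Round 2: the move is revealed only after all commitments arrive,
// and peers check it against the earlier commitment, so nobody can
// pick a move after seeing everyone else's.
bool verify_reveal(uint64_t commitment,
                   const std::string& move, const std::string& nonce) {
    return commit(move, nonce) == commitment;
}
```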
Solution Approach
• Philosophy: Detection rather than Prevention
• Preventing cheating ≈ Byzantine fault tolerance
  • Known protocols emphasize strict consistency and assume weak synchrony
  • Multiple rounds unsuitable for game-play
• High-level decisions
  • Make players leave an audit-trail
  • Make peers police each other
  • Always keep detection out of the critical path
Audit Approaches
[Diagram: a centralized auditor collecting logs vs. distributed audit by a randomly chosen witness; every node keeps a log]
Logging Using Witnesses
[Diagram: the player node runs think code and keeps a player log; updates follow an optimistic update path, while the witness node serializes updates and keeps a witness log]
Using Witnesses: Good and Bad
+ Player and witness logs can be used for audits → potentially address the timestamp, god-mode and inconsistency cheats
+ Witness can generate pubs + subs → addresses the map-hack cheat
– Bandwidth overhead
– Does not handle the suppress-update and subscription-hijack cheats
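One way to make the player and witness logs auditable is a hash chain over the serialized updates. This is an illustrative sketch, not the protocol proposed in the slides; the chain constants are FNV-style placeholders for a real cryptographic hash.

```cpp
#include <cstdint>
#include <string>

// Both the player and the witness append every serialized update to
// their own hash chain; an auditor later compares the two chain heads,
// so a suppressed, reordered, or altered update shows up as a mismatch.
struct HashChain {
    uint64_t head = 1469598103934665603ULL;    // FNV-1a offset basis

    void append(const std::string& update) {
        for (unsigned char c : update) {
            head ^= c;
            head *= 1099511628211ULL;
        }
        head ^= 0xFFULL;                       // record separator
        head *= 1099511628211ULL;
    }
};
```

Because the head depends on every prior record, auditing needs only one value per node in the common case; full logs are pulled in only when the heads disagree, keeping detection off the critical path.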
Using Witnesses: Alternate Design
• Move the primary directly to the witness node
  • Code execution and writes are applied directly at the witness
– Primary-to-replica updates go through the witness
– Witness gets arbitrary power
  • Player cannot complain to anybody
[Diagram: the witness node holds the primary copy of the player object]
Challenges • Balance power between player and witness • Use cryptographic techniques • How do players detect somebody is cheating? • Extraction of rules from the game code • Securing the object discovery layer • Leverage DHT security research • Keep bandwidth overhead minimal
Talk Outline • Problem • Background • Colyseus Architecture • Scalability • Evaluation • Security • Expected Contributions
Expected Contributions
• Mercury range-queriable DHT
  • First structured overlay to support range queries and dynamic load balancing
  • Implementation used in other systems
• Design and evaluation of Colyseus
  • First distributed design to be successfully applied for scaling FPS games
  • Demonstrated that low-latency game-play is feasible
  • Flexible architecture for adapting to various types of games
• Real-world measurement of game workloads
  • Deployment of Quake III
• Anti-cheating protocols