E Pluribus Unum: Matchmaking in Halo 3. Chris Butcher, Bungie Studios, butcher@bungie.com. Game Developers Conference 2008.
2. E Pluribus Unum
Matchmaking in Halo 3
3. Overview What Is Matchmaking?
Matchmaking Basics
Lessons from Halo 2
Halo 3 Design Goals
Voice, Identity, Community Reinforcement
Skill Measurement and Reward Systems
Technical Design
TrueSkill
Matchmaking Algorithms
Recommendations
Results from Halo 3 Live Operation
4. What Is Matchmaking?
5. Manual Game Browsing User is presented with a list of possible games
Tries to find an open slot
Tries to find a fair game
Inconsistent experience
Not good for casual gamers
I Just Want To Play!
Player is responsible for finding a slot themselves, which is tedious.
Lots of people are all trying to join the same games you are.
Can't join up with friends.
Each game is the luck of the draw regarding difficulty of opponents.
6. Terminology Manual game browsing is a standard technique
Host Game / Join Game options in UI are common
Xbox LIVE refers to this as the Matchmaking API
Quick Match, Custom Match
In this presentation, matchmaking means an automated peer-to-peer system that organizes players into groups based on user preference
Game could still be client / server once the game starts
Could even use dedicated servers
7. Vision of Matchmaking Provide an experience that is:
Fast
Reliable
Consistent
8. Matchmaking Basics
9. Matchmaking Ecosystem Continuous stream of groups entering matchmaking
Some groups decide to start gathering a game
The remainder search for games to join
Each group can be multiple machines and players
10. Xbox LIVE Matchmaking Service Gatherers register with XBL service
Each group has unique matchmaking desires
Type of game, skill level, spoken language, etc
Searchers query service with parameter filters
Service returns matching candidates
Lots of parameters depending on your game. I'll talk about these in detail later.
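The register and query flow on this slide can be sketched as a small in-memory service. This is a hypothetical illustration and not the Xbox LIVE API: `Session`, `MatchService`, the attribute names, and the skill-gap filter are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """A gathering group's advertised session (hypothetical fields)."""
    session_id: int
    playlist: str
    skill: int
    language: str


@dataclass
class MatchService:
    """Stand-in for the back-end service that stores advertised sessions."""
    sessions: list = field(default_factory=list)

    def register(self, session):
        # Gatherers advertise their session and its attributes.
        self.sessions.append(session)

    def query(self, playlist, skill, max_skill_gap, language=None):
        # Searchers pass their criteria as filters, so the service
        # only returns sessions they would actually want to join.
        return [
            s for s in self.sessions
            if s.playlist == playlist
            and abs(s.skill - skill) <= max_skill_gap
            and (language is None or s.language == language)
        ]


service = MatchService()
service.register(Session(1, "team_slayer", 20, "en"))
service.register(Session(2, "team_slayer", 35, "en"))
service.register(Session(3, "team_slayer", 21, "fr"))

candidates = service.query("team_slayer", skill=22, max_skill_gap=3, language="en")
print([s.session_id for s in candidates])
```

Passing `language=None` drops that filter, which is one simple way to model a criterion the searcher does not care about.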
11. Candidate Evaluation Searcher evaluates all candidates in parallel, best matches first
Ping network connectivity, get current group state
Measure quality of connection using Xbox QoS probes
Group-to-group XNetConnect
Group-to-group session join
Network layer handles as asynchronous processes
The join protocol has multiple phases, each of which is an asynchronous process. You need to architect your network layer so that they can execute in parallel to multiple targets. Lots of work here to make this seamless and robust.
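A minimal sketch of evaluating candidates in parallel, using Python's `asyncio` as a stand-in for the game's network layer. The two phases, the 150 ms cutoff, and the candidate fields are assumptions for illustration. The real protocol (QoS probes, XNetConnect, session join) is not modeled.

```python
import asyncio


async def probe(candidate):
    """Simulate one candidate's multi-phase check (hypothetical phases)."""
    # Phase 1: connectivity probe. The latency here is canned test data.
    await asyncio.sleep(candidate["latency_ms"] / 1000)
    if candidate["latency_ms"] > 150:
        return None  # connection too poor to consider
    # Phase 2: fetch current group state. A full session cannot be joined.
    await asyncio.sleep(0)
    if candidate["open_slots"] == 0:
        return None
    return candidate


async def evaluate(candidates):
    # Launch every probe at once instead of walking the list serially.
    results = await asyncio.gather(*(probe(c) for c in candidates))
    viable = [c for c in results if c is not None]
    # Prefer the lowest-latency viable candidate.
    return min(viable, key=lambda c: c["latency_ms"], default=None)


candidates = [
    {"id": "A", "latency_ms": 200, "open_slots": 3},
    {"id": "B", "latency_ms": 40, "open_slots": 0},
    {"id": "C", "latency_ms": 80, "open_slots": 2},
    {"id": "D", "latency_ms": 120, "open_slots": 1},
]
best = asyncio.run(evaluate(candidates))
print(best["id"] if best else "no viable candidate")
```

Because the probes overlap, the total wait is bounded by the slowest candidate rather than the sum of all of them, which matters when each candidate list is only viable for a short time.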
12. Matchmaking Life Cycle Groups enter Matchmaking continuously
Each group chooses to gather or search
Gather: register session with XBL service
Search: query service for candidates
Search, evaluate candidates, try to join
If no suitable candidates, search again
Halo specific game flow:
Gatherer waits until game is full
Determine game settings, host selection
Start game
First section: Matchmaking principles (true for any game)
Second section: Halo-specific usage
13. Lessons from Halo 2
14. Halo 2 has had good longevity Year-on-Year retention is > 80%
Players seem to like Matchmaking
Provides enjoyment for many thousands of games
15. Game is well suited to Matchmaking Small-group gameplay (2-5 per team)
Interact with friends in your group
Both coordinated effort and individual skill required
Opponents are anonymous and interchangeable
Long term goals are self-driven rather than peer-driven
"I want to reach Level 30"
Not: "I want to be the best on my server"
16. Lessons Learned - Matchmaking Received well by the majority of players
Always something to do, a mix of novelty and the familiar
Configurable experience allows longevity
Required several early updates to operate robustly
DLC maps locking people out was a problem
People don't like feeling they have no control
International experience was poor
17. Lessons Learned - Skill System Modified Elo rating system
Both a skill measurement and also reward for investment
Non-zero-sum for levels 1-20 to give a hill-climbing experience
Was abused through boosting
Zero-sum competition for advancement
Skill level achievement is always in jeopardy
Leads to anxiety, anger and frustration in players
"WTF I lost my level 30, my team sucks"
Players are locked in a continuous struggle to get and then retain their skill level, as the only visible sign of achievement. This creates tension because it's hard to attain and easy to lose. Tends to manifest as negative emotions.
Playing Halo 2 is stressful!
18. Ranked Matchmaking in Halo 2
Hyper-Competition
+
Anonymity
+
Loss Anxiety
=
Negative Emotional Pressure
We've made several online multiplayer games, so we know that people tend to be jerks online.
But even we were surprised at how this combination of factors led to a pretty negative emotional tone in the community.
19. Ranked Matchmaking in Halo 2
And we're not the only ones who noticed this.
20. Design Goals for Halo 3
21. Overall Goals Make the online experience approachable
Provide accountability and identity
Give players a reason to keep coming back
Tools:
Voice
Identity
Skill System
Reward System
New Player Experience
22. Voice Design Can't predict how players will use voice
Give listeners control over what they hear
Remove temptation to use voice negatively
Allow time for socialization that isn't under pressure
Make it easy for players to opt out or mute
Positive: Chatting idly with friendly strangers
Negative: Being abused by hostile anonymous bigots
Everyone has a different opinion on the correct use of voice communication online. There is no way to matchmake based on this: we don't know if someone is a mellow player or a foulmouthed bigot. You have to put control in the hands of the listener.
We only allow you to communicate with your enemies after the game.
This was a tough decision for us, but the right one.
The difference between a positive and a negative voice interaction is often one of control. Put control in the hands of the listener.
23. Identity Design Every player has a public Service Record
Persistent individual identity reduces anonymity
Goal is to reduce anonymity and provide long-term identification
Publicly accessible in-game to everyone
Reduce sock-puppeting that was prevalent in Halo 2
Rewards are individual
Success recognized directly, or via social comparison with friends
No global leaderboards!
Primarily competing with yourself
Making the Service Record visible to everyone reduces anonymity.
We also want to reduce the sock-puppeting that was prevalent in Halo 2.
Global leaderboards just encourage cheaters. Make rewards individual and direct, rather than forcing players to compete for global recognition. This acts to change the tone of competition.
24. Skill System Design Range 1-50; everyone starts at level 1
Almost everyone gains levels quickly, providing positive feedback
After 50-100 games, skill level stabilizes
Needs to still feel dynamic and not stagnant.
But shouldn't lose a level from one bad game.
Skill should be a statistic, not a reward
After time the skill system will converge to accurately measure your skill. It still needs to move as your ability changes, or you go on a winning / losing streak. But it shouldn't be so reactive that you worry about losing a level.
Note that the goal is no longer to provide a reward for players; we are introducing a separate system for that.
25. Reward System Design Reward for playing
Experience points (XP)
Only for wins, to prevent boosting
Penalty for quitting games early
Experience rating hierarchy
Ratings require both skill and XP
Emphasized in UI over skill
Permanent; no loss anxiety
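The reward rules on this slide can be sketched as follows. The tier names, the XP and skill thresholds, and the one-point quit penalty are hypothetical values chosen for the example; the slide does not give the real numbers.

```python
from dataclasses import dataclass

# Hypothetical tiers: (name, XP required, skill level required).
RATINGS = [
    ("Tier 1", 0, 0),
    ("Tier 2", 10, 0),
    ("Tier 3", 70, 10),
    ("Tier 4", 150, 20),
]


@dataclass
class ServiceRecord:
    xp: int = 0
    highest_skill: int = 1
    best_tier: int = 0  # index into RATINGS; never decreases

    def record_game(self, won, quit_early=False):
        if quit_early:
            # Quitting costs XP, but XP never drops below zero.
            self.xp = max(0, self.xp - 1)
        elif won:
            # XP is only granted for wins.
            self.xp += 1
        self._promote()

    def _promote(self):
        # A tier needs both enough XP and enough skill. Once reached,
        # it is kept even if XP later falls (no loss anxiety).
        for index, (_name, xp, skill) in enumerate(RATINGS):
            if self.xp >= xp and self.highest_skill >= skill:
                self.best_tier = max(self.best_tier, index)

    def rating(self):
        return RATINGS[self.best_tier][0]


record = ServiceRecord(highest_skill=12)
for _ in range(70):
    record.record_game(won=True)
record.record_game(won=False)                   # a loss earns nothing
record.record_game(won=False, quit_early=True)  # quitting costs 1 XP
print(record.xp, record.rating())
```

The record ends below the 70 XP threshold but keeps the tier it already earned, which is the permanence property the slide calls out.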
26. New Player Experience Separate Boot Camp playlist for new players only
Limited set of maps and weapons to ease players in
Small groups for socialization
Reward early and often!
Move skilled players out quickly
5 wins triggers graduation
27. Technical Design (Skill)
28. Xbox LIVE Skill System TrueSkill A mathematical library implemented on XBL back end
Bayesian estimation techniques developed by Microsoft Research Cambridge
Models player skills as probability density functions [μ, σ]
μ is mean (current estimate), σ is standard deviation (uncertainty)
TrueSkill is stored and updated invisibly by XBL back end
Start in the middle as a wide possibility band, μ=0, std dev σ0
As games are scored the skill adjusts rapidly and its band shrinks as the system becomes more certain of a player's skill
Once the skill converges (σ is small) it moves much more slowly
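To make the shrinking band concrete, here is a sketch of the published two-player TrueSkill update for a decisive result (no draw handling, no teams). It is not the Xbox LIVE implementation. The scale used here (σ0 = 25/3, β = σ0/2) is borrowed from the commonly published TrueSkill defaults and is an assumption, as is starting μ at 0.

```python
import math

SIGMA0 = 25.0 / 3.0   # assumed initial uncertainty
BETA = SIGMA0 / 2.0   # assumed performance factor


def pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def update_1v1(winner, loser, beta=BETA):
    """Return updated (mu, sigma) pairs after `winner` beats `loser`."""
    (mu_w, sig_w), (mu_l, sig_l) = winner, loser
    c = math.sqrt(2.0 * beta**2 + sig_w**2 + sig_l**2)
    t = (mu_w - mu_l) / c
    v = pdf(t) / cdf(t)   # larger when the result was an upset
    w = v * (v + t)       # fraction of uncertainty to remove, in (0, 1)
    new_w = (mu_w + sig_w**2 / c * v,
             sig_w * math.sqrt(1.0 - sig_w**2 / c**2 * w))
    new_l = (mu_l - sig_l**2 / c * v,
             sig_l * math.sqrt(1.0 - sig_l**2 / c**2 * w))
    return new_w, new_l


a = (0.0, SIGMA0)
b = (0.0, SIGMA0)
for game in range(1, 6):
    a, b = update_1v1(a, b)   # player a wins every game
    print(f"game {game}: a=({a[0]:.2f}, {a[1]:.2f})  b=({b[0]:.2f}, {b[1]:.2f})")
```

Each step scales the mean shift by σ², so early games move μ a lot and later games move it less as σ shrinks.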
29. Using TrueSkill in Halo 3 Don't show players the raw mathematics of [μ, σ]
Use skill lower bound: s = μ - kσ (we chose k=4)
Transform by remap function into range 1-50 for display in UI
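A small sketch of the lower bound and a remap into 1-50. The lower bound formula and k=4 come from the slide. The remap itself is not described in the talk, so the linear clamp below and its `lo`/`hi` range are purely hypothetical.

```python
def conservative_skill(mu, sigma, k=4.0):
    """Lower bound on skill: k standard deviations below the mean."""
    return mu - k * sigma


def display_level(mu, sigma, k=4.0, lo=-30.0, hi=30.0):
    """Map the lower bound linearly onto 1-50 and clamp (hypothetical remap)."""
    s = conservative_skill(mu, sigma, k)
    fraction = (s - lo) / (hi - lo)
    level = 1 + int(fraction * 49)
    return max(1, min(50, level))


# A brand-new player: wide band, so the lower bound is far below the mean.
print(display_level(0.0, 25.0 / 3.0))
# Same mean, different certainty: the more certain estimate displays higher.
print(display_level(10.0, 3.0), display_level(10.0, 1.0))
```

Using the lower bound means a player's displayed level rises both when they win and when the system simply becomes more certain about them, which is one source of the "ties increase rank" perception mentioned later in the talk.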
30. Customizing TrueSkill Mathematical configuration variables
β (performance factor), τ (dynamics factor), draw probability
Left β alone: dangerous, affects final skill distribution
Increased τ so that a player's skill never fully converges
Draw probability must be accurate; if it is set too low then ties will be considered highly significant
Update Weight modifies rate of change of [μ, σ]
We used this to give players a hill-climbing experience by initially decreasing their TrueSkill update weight
Weights start out small and return to normal over 50-100 games in a playlist
Even though we can identify good or bad players after 8 games, it is more satisfying for them to feel they earned their skill over time
Performance factor (β) describes the randomness of players' performance. High values of β mean that each game has less effect on skill. You might be tempted to modify this so that convergence is faster or slower. This is dangerous because it affects the final distribution of all players. We left it alone.
Dynamics factor (τ) is safe to modify. Higher values mean the skill never fully converges and remains live even after many games.
Draw probability is important to get right, as otherwise ties will be treated as highly significant and cause unexpected updates.
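The effect of the dynamics factor can be illustrated by tracking only the uncertainty of one player over many evenly matched games. In the published TrueSkill model the variance is inflated by τ² before each game. The sketch assumes every opponent has the same mean as the player and a fixed uncertainty of 1.0, and reuses the commonly published default scale. None of these numbers are Halo 3's actual settings.

```python
import math

BETA = 25.0 / 6.0         # assumed performance factor
SIGMA0 = 25.0 / 3.0       # assumed initial uncertainty
W_EVEN = 2.0 / math.pi    # TrueSkill's w(t) at t = 0, an evenly matched game


def sigma_after(games, tau, opponent_sigma=1.0):
    """Player's sigma after `games` evenly matched games with dynamics `tau`."""
    sigma_sq = SIGMA0 ** 2
    for _ in range(games):
        sigma_sq += tau ** 2   # dynamics: inflate variance before each game
        c_sq = 2.0 * BETA ** 2 + sigma_sq + opponent_sigma ** 2
        sigma_sq *= 1.0 - sigma_sq / c_sq * W_EVEN
    return math.sqrt(sigma_sq)


for tau in (0.0, 0.5):
    print(f"tau={tau}: "
          f"sigma after 200 games = {sigma_after(200, tau):.3f}, "
          f"after 1000 games = {sigma_after(1000, tau):.3f}")
```

With τ = 0 the uncertainty keeps shrinking toward zero, so the skill eventually stops moving. With τ > 0 it settles at a floor where the inflation and the per-game shrinkage cancel, which is the "never fully converges" behavior described above.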
32. TrueSkill Summary Advantages
Already implemented for you by Xbox LIVE
Converges quickly
Provides good estimate of player skill for matchmaking
TrueSkill developers are very helpful and knowledgeable
Disadvantages
Complex mathematics, takes an expert to understand and tweak
Hard to predict overall convergence of system
Default behavior does not fit our ideals for a skill system
Vulnerable to exploitation, both real and perceived (ties increase rank)
This is a hard problem with no clear solution
TrueSkill is a good system for what it does. However, when you extend it outside its domain of mathematical estimation and try to turn it into something you can show to players, you will run into trouble. It is possible but it takes a lot of work. And the details will never really be under your control.
Default behavior does not fit our ideals, and changing the settings makes its behavior hard to predict, overconstraining the system.
Multivariate representation leads to perceived issues, e.g. ties increasing rank.
Showing skill to players is valuable, but was it worth all this? Hard problem. We don't know what the right answer is here.
33. Technical Design (Matchmaking)
34. Search Criteria Use precise initial query parameters to find an ideal match
Skill, Experience, Network Connection Quality
Initial queries are less likely to find a match
Allows tight matches in large populations
Query parameters must include all selection criteria
Halo 2 had some criteria that were not stored in XBL service
Searcher spent time querying candidates that they would never want to join e.g. due to spoken language
Wastes bandwidth and also wastes precious search time
We start out with precise filters where we are looking for an ideal match.
Queries are less likely to succeed but the penalty for an empty query is relatively low.
Lets us scale up to large populations and provide tight matchmaking.
It is very important that the client does not do any post-filtering on candidates.
We did this in Halo 2 for a few attributes (spoken language).
Wasting time is VERY bad because each set of candidates is only viable for a short period of time.
35. Search Expansion Fuzzy match in many dimensions
Analog parameters (skill, experience, network connection)
Binary parameters (language, country, DLC maps)
Treat binary parameters as soft filters
Expansion has multiple phases
Look for ideal match, expand analog filter a bit
Remove binary soft filters
Expand analog filter out to max, relax connection quality
Keep trying intermittently, switch to gathering
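The expansion phases can be sketched as an ordered list of progressively looser filters. The specific limits, field names, and the use of language as the only binary soft filter are assumptions for illustration. A real client would re-query the service at each phase over time rather than filter a static list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchFilter:
    max_skill_gap: int           # analog parameter
    max_ping_ms: int             # analog parameter
    require_same_language: bool  # binary soft filter


# Hypothetical phases, ordered from ideal match to most relaxed.
PHASES = [
    SearchFilter(max_skill_gap=2, max_ping_ms=100, require_same_language=True),
    SearchFilter(max_skill_gap=5, max_ping_ms=100, require_same_language=True),
    SearchFilter(max_skill_gap=5, max_ping_ms=100, require_same_language=False),
    SearchFilter(max_skill_gap=50, max_ping_ms=250, require_same_language=False),
]


def matches(me, session, f):
    return (abs(me["skill"] - session["skill"]) <= f.max_skill_gap
            and session["ping_ms"] <= f.max_ping_ms
            and (not f.require_same_language
                 or me["language"] == session["language"]))


def search(me, sessions):
    """Return (phase index, matches), or (None, []) meaning: switch to gathering."""
    for index, f in enumerate(PHASES):
        found = [s for s in sessions if matches(me, s, f)]
        if found:
            return index, found
    return None, []


me = {"skill": 20, "language": "en"}
sessions = [
    {"id": 1, "skill": 28, "ping_ms": 60, "language": "en"},
    {"id": 2, "skill": 23, "ping_ms": 80, "language": "fr"},
]
phase, found = search(me, sessions)
print(phase, [s["id"] for s in found])
```

In this run nothing matches until the language soft filter is removed in the third phase, at which point session 2 becomes acceptable.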
36. Ecosystem Balance Must have good balance of searchers and gatherers
Halo 2: Easy to model in theory, impossible in practice
Global Internet network properties
Latency in Live service updating
Network engine internals (time to discover, time to join)
Expire lists of candidates quickly
Searchers are in a race to join limited set of active games
Make the ecosystem adaptive
Gatherers can also search
If nobody is joining you, you have a chance to join someone
Ecosystem can adaptively balance for low-population scenario
This bit us hard when we launched Halo 2.
Required multiple updates to fix.
No way to diagnose these problems in the wild. (H2 matchmaking still unknowably broken.)
Trying to make H3 ecosystem more fault tolerant
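One way to "expire lists of candidates quickly" is to timestamp each search result and drop it after a short time-to-live. The class name and the 5-second TTL below are hypothetical. The clock is passed in explicitly so the behavior is easy to follow.

```python
class CandidateList:
    """Search results that expire after `ttl` seconds (hypothetical value)."""

    def __init__(self, ttl=5.0):
        self.ttl = ttl
        self._seen = {}  # session id -> time the result was received

    def add(self, session_id, now):
        self._seen[session_id] = now

    def fresh(self, now):
        # Drop anything older than the TTL: the game it described has
        # probably filled or started, so probing it would waste search time.
        self._seen = {sid: t for sid, t in self._seen.items()
                      if now - t <= self.ttl}
        return sorted(self._seen)


results = CandidateList(ttl=5.0)
results.add(1, now=0.0)
results.add(2, now=4.0)
print(results.fresh(now=6.0))
```

A short TTL forces the searcher back to the service for a new query instead of racing other searchers for games that are already gone.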
37. Recommendations
38. Is Matchmaking Right For You? Works with different genres
Works with different game models
Could match into games in progress
Could use dedicated servers
Scales to wide range of user populations
Halo 3 playlists range from < 1k to 100k concurrent users
Significant investment in client software
2-3 developers for project lifecycle
Back end functionality optional but helps a lot
Payoff comes from building a lasting community
39. Design For Security Users have no control so you must provide safe games for them
Every aspect of your game will be attacked
Hardware attacks, network attacks (bridging, standby, DoS), game attacks (modified content, LSP interception), exploits (skill de-leveling, out-of-map), many more
Halo 2 required five updates over three years for security
This is an entire talk in itself
40. Test Early, Test Often MS-Internal Alpha and Beta (10k) 11/06 and 4/07
Rich text data mining as primary feedback
Searchable centralized logging system with event severities
Find hard bugs in client (edge cases, crashes, network protocol)
Transcontinental Matchmaking and network testing
Public Beta (900k) 5/07
PR boost, some gameplay feedback also
Tune TrueSkill distribution curves on real player skill mix
Load balancing of LSP servers to avoid Day 1 meltdown
High population games must involve XBL in testing
Easy to create a scalability problem on back end
41. Collect Data From Production You will need retail environment instrumentation
Can't use data mining: too much data to store and transmit
For Halo 2 launch we had no alternative
Halo 3 uses special-purpose binary uploads
End-of-game report for bungie.net analysis
Matchmaking status report for ecosystem diagnosis
Network QoS report for research
Volume of data is massive, we discard 90%+
Slim fire-and-forget stateless HTTP-over-XLSP upload
Per-machine settings for deep investigation
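A sketch of what a compact, fixed-layout binary report could look like compared with text logging. The field layout is invented for the example, and the talk does not say where or how the 90%+ discard happens. The random sampling below is only one assumed way to do it.

```python
import random
import struct

# Hypothetical layout, little-endian with no padding:
# version (uint8), playlist id (uint16), search seconds (float32), phases used (uint8)
REPORT_FORMAT = "<BHfB"


def pack_report(playlist_id, search_seconds, phases_used, version=1):
    """Encode one matchmaking status report as a fixed-size binary blob."""
    return struct.pack(REPORT_FORMAT, version, playlist_id,
                       search_seconds, phases_used)


def unpack_report(blob):
    return struct.unpack(REPORT_FORMAT, blob)


def should_keep(rng, keep_fraction=0.1):
    # Keep roughly one report in ten and drop the rest (assumed policy).
    return rng.random() < keep_fraction


blob = pack_report(playlist_id=7, search_seconds=12.5, phases_used=2)
print(len(blob), unpack_report(blob))

rng = random.Random(0)
kept = sum(should_keep(rng) for _ in range(10_000))
print(f"kept {kept} of 10000 reports")
```

The version byte at the front lets the receiver recognize older report layouts, which matters when clients in the field cannot all be updated at once.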
42. Results
43. Results Overall Deployed successfully, no client update needed
Some normal LSP scalability balancing in first few days
Reviews mention MP as streamlined, transparent
High penetration of online multiplayer
5.9M unique users observed on Xbox LIVE
5.2M have played in Matchmaking (88%)
Approximately 3x Halo 2 peak concurrency
560k peak concurrent users in 15 playlists
Longevity is an open question
Tracking steady at 1.2M unique users per week
Deployment of Halo 3 matchmaking was successful compared to Halo 2, which required 2 early client updates.
44. Results Launch
This graph shows Halo 2 and Halo 3 MP games/day recorded by bungie.net.
It looks like a sharp decline in H3, but we think this is actually the expected seasonal decline that happens every year in Oct/Nov.
45. Results Player Community Skill numbers that you can actually believe in!
Perception is that level 50 means skilled, not a cheater
Some experience boosting
We assumed it would be possible to circle boost XP
There were ways that let you do it many times faster than normal
No way to advance to higher ratings without ranked play
This was probably a mistake
Player identity features very well received
Social community seems to be better than Halo 2
46. Results MP Game Selection Custom Games: 16% Matchmaking: 84%
These figures are in terms of player-games, so one 4v4 game counts as 8 and a 1v1 game counts as 2.
47. Results Player Retention
5.9M connected to Live
5.2M enter Matchmaking at least once
60% play at least 100 games; this indicates that online multiplayer is no longer a niche
48. Results Skill in Team Slayer
49. Results Overall Skill
50. Results Experience Rating
51. Future Design Thoughts New player experience was good, not great
Implemented late, needs goal-driven UI flow
58% of players went on to play 100 games or more
But 19% of players stopped after < 20 games
Online model is focused on skill improvement
But most players don't care about skill, they like reward better
Negative behavior was reduced somewhat
Tremendous amount of room for improvement
Reputation and social history as part of public player identity
Empowering players to change their experience is the right path
52. Future Technical Thoughts Starting to feel like a solved problem technically
Ecosystem could be more self adjusting
Still some search / gather balance issues
Move more of the ecosystem to a centralized service?
Ubiquitous Matchmaking?
Not just as explicit UI
Invisible fabric of online experience
Peer-to-peer is the future
53. Credits Bungie
This system is the work of many people
Design, Networking, UI, bungie.net, more
Microsoft Research
MSR Cambridge Applied Games Group (TrueSkill)
MSR Networking Research Group (QoS data analysis)
Microsoft Game Studios
Xbox Platform
XDC (XNA Developer Connection)
Xbox LIVE Team
Xbox LIVE Operations Team