SteerBench: a benchmark suite for evaluating steering behaviors. Authors: Singh, Kapadia, Faloutsos, Reinman. Presented by: Jessica Siewert
Content of presentation • Introduction • Previous work • The Method • Assessment
Introduction – Context and motivation • Steering of agents • Objective comparison • Standard? • Test cases and scoring, user evaluation • Metric scoring • Demonstration
Introduction – Previous work • As of the paper's publication (Nov ’08), there was not really anything like it yet
Introduction – Promises • Evaluate objectively • Help researchers • Working towards a standard for evaluation • Take into account: • Cognitive decisions • Situation-specific aspects
The test cases • Simple validation scenarios • Basic one-on-one interactions • Agent interactions including obstacles • Group interactions • Large-scale scenarios
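To make the layered structure of these test cases concrete, here is a minimal Python sketch of how one such scenario could be encoded. The class and field names are assumptions made for illustration only; the actual SteerBench test-case format is not reproduced here.

from dataclasses import dataclass, field

# Hypothetical, simplified encoding of a SteerBench-style test case.
# Class and field names are assumptions made for this illustration.
@dataclass
class AgentSpec:
    start: tuple        # initial (x, z) position
    goal: tuple         # target (x, z) position
    max_speed: float = 1.3

@dataclass
class TestCase:
    name: str
    agents: list = field(default_factory=list)
    obstacles: list = field(default_factory=list)  # e.g. axis-aligned boxes

# A basic one-on-one interaction: two agents walking head-on toward each other.
head_on = TestCase(
    name="oncoming-agents",
    agents=[AgentSpec(start=(-5.0, 0.0), goal=(5.0, 0.0)),
            AgentSpec(start=(5.0, 0.0), goal=(-5.0, 0.0))],
)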
The user’s opinion • Rank on overall score across test cases (for comparison) • Rank algorithms based on • a single case, or • one agent’s behavior • Pass/fail • Visually inspect results • Examine detailed metrics of the performance
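The ranking options above can be read as simple aggregations over a score table. The sketch below uses invented scores purely to show the mechanics; the numbers are not data from the paper.

# Hypothetical per-test-case scores for three steering algorithms
# (treated here as penalties, so lower is better). The numbers are
# invented only to illustrate the two ranking options.
scores = {
    "A": {"oncoming": 2.1, "crossing": 3.4, "doorway": 5.0},
    "B": {"oncoming": 2.8, "crossing": 2.9, "doorway": 4.1},
    "C": {"oncoming": 3.0, "crossing": 3.1, "doorway": 6.2},
}

def overall(per_case):
    """Average score across all test cases for one algorithm."""
    return sum(per_case.values()) / len(per_case)

# Rank on overall score across all test cases:
print(sorted(scores, key=lambda alg: overall(scores[alg])))       # ['B', 'A', 'C']
# Rank on a single case, e.g. the oncoming-agents scenario:
print(sorted(scores, key=lambda alg: scores[alg]["oncoming"]))    # ['A', 'B', 'C']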
The metric • Number of collisions • Time efficiency • Effort efficiency • Penalties?
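A hedged sketch of how these metrics could be folded into a single per-agent penalty score (lower is better). The weights below are placeholders chosen for illustration, not the values used in the SteerBench paper.

def agent_score(num_collisions, total_time, total_effort,
                w_collision=50.0, w_time=1.0, w_effort=1.0):
    """Combine collision count, time efficiency and effort efficiency
    into one penalty-style score. The weights are placeholders for
    illustration, not the paper's actual values."""
    return (w_collision * num_collisions
            + w_time * total_time
            + w_effort * total_effort)

# An agent that reaches its goal quickly but collides once scores worse
# here than a slightly slower, collision-free agent:
print(agent_score(num_collisions=1, total_time=12.0, total_effort=30.0))  # 92.0
print(agent_score(num_collisions=0, total_time=15.0, total_effort=34.0))  # 49.0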
Developments since then • Ioannis Karamouzas, Peter Heil, Pascal Beek, Mark H. Overmars, A Predictive Collision Avoidance Model for Pedestrian Simulation, Proceedings of the 2nd International Workshop on Motion in Games, November 21-24, 2009, Zeist, The Netherlands • Shawn Singh, Mubbasir Kapadia, Billy Hewlett, Glenn Reinman, Petros Faloutsos, A Modular Framework for Adaptive Agent-Based Steering, Symposium on Interactive 3D Graphics and Games, February 18-20, 2011, San Francisco, California • Suiping Zhou, Dan Chen, Wentong Cai, Linbo Luo, Malcolm Yoke Hean Low, Feng Tian, Victor Su-Han Tay, Darren Wee Sze Ong, Benjamin D. Hamilton, Crowd Modeling and Simulation Technologies, ACM Transactions on Modeling and Computer Simulation (TOMACS), v.20 n.4, p.1-35, October 2010
Experiments – Claim recall • Evaluate objectively • Help researchers • Working towards a standard for evaluation
Assessment – good things • All the measured variables seem logical (too logical?) • Extensive variable set, with the option to expand • Customized evaluation • Cheating not allowed • collision penalties • fail constraint • goal constraint • Layered set of test cases
Assessment • The measured scores all seem to come out approximately the same • Does the user test make the difference? • Who are these users? • "Examine", "inspect" – all vague terms • What about the stated objective of objectivity?
Assessment • How desirable is it to be general? • How general/specific is this method? • Time efficiency vs. effort efficiency • Should it be blind to the algorithm itself? • Penalties and the fail/goal constraints are not specified!
Assessment – scoring (1/2) • The test cases are clearly specified, but HOW a GOOD agent SHOULD react is not, even though the authors say such a specification exists • How can you capture cognitive decisions from only position, direction and a goal?
Assessment – scoring (2/2) • "Scoring not intended to be a proof of an algorithm's effectiveness." • How do you interpret scores, and who wins? • "B is slightly better on average, but A has the highest scores."
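A tiny numeric illustration of that ambiguity, using invented scores where higher is better:

# Invented scores (higher = better in this toy example) showing why
# "B is slightly better on average, but A has the highest scores"
# leaves the winner open to interpretation.
a = [9.8, 3.0, 9.5]   # algorithm A: excellent on two cases, poor on one
b = [7.5, 7.6, 7.4]   # algorithm B: consistently decent

print(sum(a) / len(a))   # ~7.43  -> A's average
print(sum(b) / len(b))   # 7.5    -> B is slightly better on average
print(max(a), max(b))    # 9.8 7.6 -> but A holds the highest scores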
Assessment – final questions • Can this method become a standard? • What if someone claims to be so innovative that this standard does not apply to them? • Nice first try, though!