SteerBench: a benchmark suite for evaluating steering behaviors. Authors: Singh, Kapadia, Faloutsos, Reinman. Presented by: Jessica Siewert
Content of presentation • Introduction • Previous work • The Method • Assessment
Introduction – Context and motivation • Steering of agents • Objective comparison • Standard? • Test cases and scoring, user evaluation • Metric scoring • Demonstration
Introduction – Previous work • As of the paper's publication (Nov ’08), there was not really anything like it yet
Introduction – Promises • Evaluate objectively • Help researchers • Working towards a standard for evaluation • Take into account: • Cognitive decisions • Situation-specific aspects
The test cases • Simple validation scenarios • Basic one-on-one interactions • Agent interactions including obstacles • Group interactions • Large-scale scenarios
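To make the layered structure of these test cases concrete, here is a minimal Python sketch of how one such scenario could be encoded. The class and field names are assumptions made for illustration only; the actual SteerBench test-case format is not reproduced here.

from dataclasses import dataclass, field

# Hypothetical, simplified encoding of a SteerBench-style test case.
# Class and field names are assumptions made for this illustration.
@dataclass
class AgentSpec:
    start: tuple        # initial (x, z) position
    goal: tuple         # target (x, z) position
    max_speed: float = 1.3

@dataclass
class TestCase:
    name: str
    agents: list = field(default_factory=list)
    obstacles: list = field(default_factory=list)  # e.g. axis-aligned boxes

# A basic one-on-one interaction: two agents walking head-on toward each other.
head_on = TestCase(
    name="oncoming-agents",
    agents=[AgentSpec(start=(-5.0, 0.0), goal=(5.0, 0.0)),
            AgentSpec(start=(5.0, 0.0), goal=(-5.0, 0.0))],
)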
The user’s opinion • Rank on overall score across test cases (for comparison) • Rank algorithms based on • a single case, or • one agent’s behavior • Pass/fail • Visually inspect results • Examine detailed metrics of the performance
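The ranking options above can be read as simple aggregations over a score table. The sketch below uses invented scores purely to show the mechanics; the numbers are not data from the paper.

# Hypothetical per-test-case scores for three steering algorithms
# (treated here as penalties, so lower is better). The numbers are
# invented only to illustrate the two ranking options.
scores = {
    "A": {"oncoming": 2.1, "crossing": 3.4, "doorway": 5.0},
    "B": {"oncoming": 2.8, "crossing": 2.9, "doorway": 4.1},
    "C": {"oncoming": 3.0, "crossing": 3.1, "doorway": 6.2},
}

def overall(per_case):
    """Average score across all test cases for one algorithm."""
    return sum(per_case.values()) / len(per_case)

# Rank on overall score across all test cases:
print(sorted(scores, key=lambda alg: overall(scores[alg])))       # ['B', 'A', 'C']
# Rank on a single case, e.g. the oncoming-agents scenario:
print(sorted(scores, key=lambda alg: scores[alg]["oncoming"]))    # ['A', 'B', 'C']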
The metric • Number of collisions • Time efficiency • Effort efficiency • Penalties?
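A hedged sketch of how these metrics could be folded into a single per-agent penalty score (lower is better). The weights below are placeholders chosen for illustration, not the values used in the SteerBench paper.

def agent_score(num_collisions, total_time, total_effort,
                w_collision=50.0, w_time=1.0, w_effort=1.0):
    """Combine collision count, time efficiency and effort efficiency
    into one penalty-style score. The weights are placeholders for
    illustration, not the paper's actual values."""
    return (w_collision * num_collisions
            + w_time * total_time
            + w_effort * total_effort)

# An agent that reaches its goal quickly but collides once scores worse
# here than a slightly slower, collision-free agent:
print(agent_score(num_collisions=1, total_time=12.0, total_effort=30.0))  # 92.0
print(agent_score(num_collisions=0, total_time=15.0, total_effort=34.0))  # 49.0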
Developments since then • Ioannis Karamouzas, Peter Heil, Pascal Beek, Mark H. Overmars, A Predictive Collision Avoidance Model for Pedestrian Simulation, Proceedings of the 2nd International Workshop on Motion in Games, November 21-24, 2009, Zeist, The Netherlands • Shawn Singh, Mubbasir Kapadia, Billy Hewlett, Glenn Reinman, Petros Faloutsos, A Modular Framework for Adaptive Agent-Based Steering, Symposium on Interactive 3D Graphics and Games, February 18-20, 2011, San Francisco, California • Suiping Zhou, Dan Chen, Wentong Cai, Linbo Luo, Malcolm Yoke Hean Low, Feng Tian, Victor Su-Han Tay, Darren Wee Sze Ong, Benjamin D. Hamilton, Crowd Modeling and Simulation Technologies, ACM Transactions on Modeling and Computer Simulation (TOMACS), v.20 n.4, p.1-35, October 2010
Experiments – Claim recall • Evaluate objectively • Help researchers • Working towards a standard for evaluation
Assessment – good things • All the measured variables seem logical (too logical?) • Extensive variable set, with the option to expand • Customized evaluation • Cheating not allowed • collision penalties • fail constraint • goal constraint • Layered set of test cases
Assessment • The measured scores all seem to come out approximately the same • Does the user test make the difference? • Who are these users? • "Examine", "inspect" – all vague terms • What about the stated objective of objectivity?
Assessment • How desirable is it to be general? • How general/specific is this method? • Time efficiency vs. effort efficiency • Should it be blind to the algorithm itself? • Penalties and the fail/goal constraints are not specified!
Assessment – scoring (1/2) • The test cases are clearly specified, but HOW a GOOD agent SHOULD react is not, even though the authors say such a specification exists • How can you capture cognitive decisions from only position, direction and a goal?
Assessment – scoring (2/2) • "Scoring not intended to be a proof of an algorithm's effectiveness." • How do you interpret scores, and who wins? • "B is slightly better on average, but A has the highest scores."
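A tiny numeric illustration of that ambiguity, using invented scores where higher is better:

# Invented scores (higher = better in this toy example) showing why
# "B is slightly better on average, but A has the highest scores"
# leaves the winner open to interpretation.
a = [9.8, 3.0, 9.5]   # algorithm A: excellent on two cases, poor on one
b = [7.5, 7.6, 7.4]   # algorithm B: consistently decent

print(sum(a) / len(a))   # ~7.43  -> A's average
print(sum(b) / len(b))   # 7.5    -> B is slightly better on average
print(max(a), max(b))    # 9.8 7.6 -> but A holds the highest scores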
Assessment – final questions • Can this method become a standard? • What if someone claims to be so innovative that this standard does not apply to them? • Nice first try, though!