Learn how the dynamic benchmarking game model is used in software development to assess relative performance, rank software, motivate students, and create a realistic development environment.
Dynamic Benchmarking Alex Dubreuil Northeastern University dubreuil.a@husky.neu.edu acdubre@gmail.com Software development through competition
Contents • Dynamic Benchmarking Introduction • Uses of the Benchmarking Game model • Software Development (CS 4500) • A Lesson I’ve learned Caution: Slide layout may cause drowsiness.
Benchmarking • Assesses relative performance • Typically by running standardized tests • Produces scores which are then compared • SATs • Other options exist • Allowing software to compete directly • Chess game
The Traditional Approach • Developer A → Software A → Static Benchmark → Score A • Developer B → Software B → Static Benchmark → Score B • Developer C → Software C → Static Benchmark → Score C • Parameterized by the domain.
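A minimal sketch of the traditional approach in Python follows. It assumes a toy domain (squaring integers). The names `static_score` and `compare`, the test cases, and the three entries are all hypothetical. Every piece of software runs against the same fixed test set, and the resulting scores are compared.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for a toy domain: software maps an input to an output,
# and a static benchmark is a fixed list of (input, expected output) pairs.
Software = Callable[[int], int]
Benchmark = List[Tuple[int, int]]

def static_score(software: Software, benchmark: Benchmark) -> float:
    """Fraction of the fixed benchmark cases the software answers correctly."""
    correct = sum(1 for x, expected in benchmark if software(x) == expected)
    return correct / len(benchmark)

def compare(entries: Dict[str, Software], benchmark: Benchmark) -> Dict[str, float]:
    """Run every entry against the same standardized tests and collect the scores."""
    return {name: static_score(sw, benchmark) for name, sw in entries.items()}

benchmark = [(0, 0), (2, 4), (3, 9), (-4, 16)]  # toy task: square the input
entries = {
    "Software A": lambda x: x * x,
    "Software B": lambda x: abs(x) * 2,              # right only for 0 and 2
    "Software C": lambda x: x * x if x >= 0 else 0,  # wrong for negative inputs
}
print(compare(entries, benchmark))  # {'Software A': 1.0, 'Software B': 0.5, 'Software C': 0.75}
```

The benchmark stays fixed, so the ranking can only be as good as the test cases someone wrote in advance.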
The Dynamic Approach • Team A → Agent A (Software A + Benchmark A) • Team B → Agent B (Software B + Benchmark B) • Team C → Agent C (Software C + Benchmark C) • The Agents compete in an Artificial World (Game), which produces an Agent Ranking • Parameterized by the domain.
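The dynamic approach can be sketched as a round robin, assuming a toy domain: a problem is a list of positive integers, a solution picks one element, and the public metric is `chosen / max`. Every Agent's benchmark poses a problem, and every opponent's software solves it. The scoring rule is our own assumption: the solver gains the quality `q` and the poser gains `1 - q`. All names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Problem = List[int]  # toy domain (assumption): a list of positive integers
Solution = int       # the solver picks one element

def metric(problem: Problem, solution: Solution) -> float:
    """Well-defined, public quality measure: 1.0 means the largest element was picked."""
    if solution not in problem:
        return 0.0  # invalid solutions earn nothing
    return solution / max(problem)

@dataclass
class Agent:
    name: str
    benchmark: Callable[[], Problem]         # produces a problem for opponents
    software: Callable[[Problem], Solution]  # solves opponents' problems

def play_round_robin(agents: List[Agent]) -> Dict[str, float]:
    """Each agent's problem is solved by every opponent and scored with `metric`."""
    scores = {a.name: 0.0 for a in agents}
    for poser in agents:
        problem = poser.benchmark()
        for solver in agents:
            if solver is poser:
                continue
            q = metric(problem, solver.software(problem))
            scores[solver.name] += q       # reward good solutions
            scores[poser.name] += 1.0 - q  # reward problems opponents find hard
    return scores

agents = [
    Agent("A", lambda: [3, 9, 1], max),
    Agent("B", lambda: [5, 5, 5], lambda p: p[0]),
    Agent("C", lambda: [1, 2, 8, 4], lambda p: p[0]),
]
scores = play_round_robin(agents)
ranking = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
print([name for name, _ in ranking])  # ['A', 'C', 'B']
```

Agents B and C run the same naive software. C still finishes ahead of B because its benchmark poses a harder problem. This shows how ranking Agents, rather than Software alone, rewards understanding of the problem domain.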
An Artificial World: Agent’s View • Agent → Administrator: Beliefs, Challenges, Problems, Solutions • Administrator → Agent: Opponents’ communication, Feedback • Administrator → Results • Problems: Benchmark output • Solutions: Software output • Beliefs/Challenges: statements about algorithms
Problems & Solutions • Problem communication: • Define an instance of a problem in the domain • Solution communication: • Respond to an opponent’s problem • Administrator has a metric for determining how good a solution is • This metric is well defined and known by all
Beliefs & Challenges • General statements about algorithms • Belief: • Defines a subset of the problems in the domain • Makes a statement about the problems in that subset • Challenge: • A response to a belief of an opponent
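One way to model beliefs and challenges in code is shown below. It is a hypothetical encoding, not the game's actual format. A belief pairs a predicate selecting a subset of problems with a claim about each problem in that subset. A challenge is an opponent's response, optionally carrying a counterexample. The toy domain (lists of integers) is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Problem = List[int]  # toy domain (assumption): a list of positive integers

@dataclass(frozen=True)
class Belief:
    """Hypothetical encoding: a subset of the domain plus a statement about that subset."""
    owner: str
    in_subset: Callable[[Problem], bool]  # which problems the belief is about
    claim: Callable[[Problem], bool]      # what the owner asserts for every such problem

@dataclass(frozen=True)
class Challenge:
    """An opponent's response to a belief, optionally backed by a counterexample."""
    challenger: str
    belief: Belief
    counterexample: Optional[Problem] = None

def refutes(ch: Challenge) -> bool:
    """A counterexample refutes a belief if it lies in the subset but violates the claim."""
    p = ch.counterexample
    return p is not None and ch.belief.in_subset(p) and not ch.belief.claim(p)

# Belief: "for lists of length 3, the first element is at least half the maximum".
b = Belief("A", lambda p: len(p) == 3, lambda p: 2 * p[0] >= max(p))
print(refutes(Challenge("B", b, [1, 9, 2])))  # True: in the subset, claim fails
print(refutes(Challenge("C", b, [1, 9])))     # False: outside the subset
```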
Administrator • Opponents’ communication • Filter all communication through the Administrator for security • Filter information when necessary • Feedback: • Inform agents of rule violations • Inform agents of status changes
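The Administrator's role as the sole channel between agents can be sketched as follows. Under our assumptions, it drops messages from unknown senders, reports rule violations back to the sender, strips private fields before forwarding, and confirms status changes. The message kinds, the `secret_solution` field, and the class names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

ALLOWED_KINDS = {"problem", "solution", "belief", "challenge"}
PRIVATE_FIELDS = {"secret_solution"}  # hypothetical: data opponents must never see

@dataclass
class Message:
    sender: str
    kind: str
    payload: Dict[str, object]

@dataclass
class Administrator:
    agents: Set[str]
    inbox: Dict[str, List[Message]] = field(default_factory=dict)
    feedback: Dict[str, List[str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        for name in self.agents:
            self.inbox.setdefault(name, [])
            self.feedback.setdefault(name, [])

    def submit(self, msg: Message) -> bool:
        """Route one agent message to its opponents; return True if it was accepted."""
        if msg.sender not in self.agents:
            return False  # unknown sender: drop without forwarding anything
        if msg.kind not in ALLOWED_KINDS:
            self.feedback[msg.sender].append(f"rule violation: unknown message kind {msg.kind!r}")
            return False
        # Filter information: strip fields opponents should not receive.
        public = {k: v for k, v in msg.payload.items() if k not in PRIVATE_FIELDS}
        for name in self.agents - {msg.sender}:
            self.inbox[name].append(Message(msg.sender, msg.kind, dict(public)))
        self.feedback[msg.sender].append(f"status: {msg.kind} accepted")
        return True

admin = Administrator({"A", "B", "C"})
admin.submit(Message("A", "problem", {"instance": [3, 9, 1], "secret_solution": 9}))
admin.submit(Message("B", "bribe", {}))
print(admin.inbox["C"][0].payload)  # {'instance': [3, 9, 1]}
print(admin.feedback["B"])          # ["rule violation: unknown message kind 'bribe'"]
```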
Administrator • Results • Track state changes through the game • Produce the agent ranking from the end game state
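A sketch of producing results from tracked state changes is given below. It assumes the game state is a numeric balance per agent and that the Administrator records every change as an event. Both assumptions, and the function names, are ours. Replaying the log yields the end-game state, and sorting that state yields the ranking.

```python
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, str, float]  # hypothetical: (round label, agent, change in balance)

def replay(initial: Dict[str, float], events: Iterable[Event]) -> Dict[str, float]:
    """Apply every recorded state change in order to reach the end-game state."""
    state = dict(initial)
    for _round, agent, delta in events:
        state[agent] += delta  # an unknown agent raises KeyError, exposing bookkeeping bugs
    return state

def rank(end_state: Dict[str, float]) -> List[str]:
    """Order agents by end-game balance, highest first; ties broken by name."""
    return sorted(end_state, key=lambda a: (-end_state[a], a))

log = [("r1", "A", +5.0), ("r1", "B", -5.0), ("r2", "C", +3.0), ("r2", "A", -3.0)]
end = replay({"A": 100.0, "B": 100.0, "C": 100.0}, log)
print(end)        # {'A': 102.0, 'B': 95.0, 'C': 103.0}
print(rank(end))  # ['C', 'A', 'B']
```

Keeping the full log, rather than only running totals, lets the Administrator recompute results and audit disputed rounds.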
What’s next • Dynamic Benchmarking Introduction • Uses of the Benchmarking Game model • Software Development (CS 4500) • A Lesson I’ve learned If you can read this, you don’t need glasses.
Overhead • Requires a mature Administrator and communication system for accurate results • Reuse between domains is possible • Each new problem domain requires its own translation
Software Development • Ranks software without a mature benchmark • Dynamic approach excels when a well-defined benchmark does not exist • Creates data to build better benchmarks • Because Agents, not Software, are ranked • Forces developers to consider both their solutions and the problem domain
Education • Motivates students • Mature Administrator/Agent not required • Creates interesting student interaction • Creates a realistic software development environment
What’s next • Dynamic Benchmarking Introduction • Uses of the Benchmarking Game model • Software Development (CS 4500) • A Lesson I’ve learned Yeah, I got nothing.
Specker Challenge Game • The SCG is the basis for Professor Karl Lieberherr’s Software Development class • Uses an arity-3 Boolean constraint satisfaction problem (CSP) as our domain • Teams of 2-3 students produce the components of an Agent
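A sketch of a solution metric for an arity-3 Boolean CSP follows. It encodes each 3-ary relation as the 8-bit number of its truth table, so relation 254 means "at least one true" and relation 1 means "all false". The bit ordering (row `4a + 2b + c`) and the "fraction of constraints satisfied" metric are our own assumptions, not necessarily the SCG's exact format or scoring.

```python
from typing import Sequence, Tuple

# Hypothetical encoding: a constraint is (relation number, (i, j, k)), where the
# relation number 0..255 is the 8-row truth table of a 3-ary Boolean relation.
Constraint = Tuple[int, Tuple[int, int, int]]

def holds(relation: int, a: bool, b: bool, c: bool) -> bool:
    """Bit 4a + 2b + c of the relation number says whether (a, b, c) satisfies it."""
    row = 4 * a + 2 * b + c
    return ((relation >> row) & 1) == 1

def satisfied_fraction(constraints: Sequence[Constraint], assignment: Sequence[bool]) -> float:
    """Assumed solution metric: the share of constraints the assignment satisfies."""
    if not constraints:
        return 1.0
    ok = sum(holds(r, assignment[i], assignment[j], assignment[k]) for r, (i, j, k) in constraints)
    return ok / len(constraints)

cs = [(254, (0, 1, 2)), (1, (0, 1, 2)), (254, (1, 2, 3))]
print(satisfied_fraction(cs, [False, False, False, True]))  # 2 of 3 constraints hold
```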
(Some of the) Skills Involved • Using outsourced tools • DemeterF (developed by Bryan Chadwick) • Component Market • Dealing with users • Underspecified requirements • Source control • Constraint Satisfaction algorithms • Data mining
Added bonus • [Diagram: communication between Domain Knowledge Experts, Programmers, Salespeople, Customers, and Users; labels include Code, So what?, Requirements, Limitations, How-to, Non-technical Requirements, Gibberish]
It’s a busy class • Traditional grading would not work • The competition keeps students motivated
What’s next • Dynamic Benchmarking Introduction • Uses of the Benchmarking Game model • Software Development (CS 4500) • A Lesson I’ve learned
Administrator Security • Never accept extra input • Transaction: Challenge: ID, Type, Price (Type and Price are extra input) • vs. • Transaction: Challenge: ID (only the ID is needed) • Check all necessary input • Transaction: Deliver Problem: ID, Problem • Check: Does the Problem match the Type?
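The two rules above can be sketched as Administrator-side validation. We assume the Administrator already recorded each offered challenge's Type and Price. Accepting a challenge then takes only the ID, and any extra field is rejected. A delivered problem is checked against the recorded Type, modeled here as a relation number. The field names, `OfferedChallenge`, and `TransactionError` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Constraint = Tuple[int, Tuple[int, int, int]]  # (relation number, variable indices), assumed

@dataclass(frozen=True)
class OfferedChallenge:
    """What the Administrator recorded when the challenge was offered (hypothetical)."""
    challenge_id: int
    problem_type: int  # relation number every delivered constraint must use
    price: float

class TransactionError(ValueError):
    pass

def _lookup(fields: Dict[str, object], allowed: Set[str],
            offers: Dict[int, OfferedChallenge]) -> OfferedChallenge:
    extra = set(fields) - allowed
    if extra:
        raise TransactionError(f"unexpected fields: {sorted(extra)}")  # never accept extra input
    cid = fields.get("id")
    if type(cid) is not int or cid not in offers:
        raise TransactionError(f"unknown challenge id: {cid!r}")
    return offers[cid]

def accept_challenge(fields: Dict[str, object],
                     offers: Dict[int, OfferedChallenge]) -> OfferedChallenge:
    """Only the ID is accepted; Type and Price come from the Administrator's own records."""
    return _lookup(fields, {"id"}, offers)

def deliver_problem(fields: Dict[str, object],
                    offers: Dict[int, OfferedChallenge]) -> List[Constraint]:
    """Check all necessary input: the delivered problem must match the challenge's Type."""
    offer = _lookup(fields, {"id", "problem"}, offers)
    problem = fields.get("problem")
    if not isinstance(problem, list) or not problem:
        raise TransactionError("problem missing or empty")
    for item in problem:
        if not (isinstance(item, tuple) and len(item) == 2 and item[0] == offer.problem_type):
            raise TransactionError("problem does not match the challenge type")
    return problem

offers = {7: OfferedChallenge(7, problem_type=254, price=0.5)}
print(accept_challenge({"id": 7}, offers).price)  # 0.5
try:
    accept_challenge({"id": 7, "price": 0.0}, offers)  # client tries to set its own price
except TransactionError as e:
    print(e)  # unexpected fields: ['price']
```

Taking Type and Price from the Administrator's records, rather than from the client, removes the opportunity to tamper with them.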
General Lesson • Never trust user input • Sanitize data • Protect against buffer overflows
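A small sketch of sanitizing raw input at the protocol boundary is shown below. It assumes a hypothetical line format `challenge <id>` and a 256-byte limit. Python is memory-safe, so classic buffer overflows do not arise here. The analogous habit is to bound input size before processing and to accept only an explicitly whitelisted shape.

```python
import re

MAX_LINE = 256  # hypothetical protocol limit, in bytes
CHALLENGE = re.compile(r"challenge ([0-9]{1,9})")  # whitelist: exactly one command shape

def parse_challenge(raw: bytes) -> int:
    """Sanitize one raw protocol line and return the challenge ID, or raise ValueError."""
    if len(raw) > MAX_LINE:
        raise ValueError("line too long")  # bound the size before doing any other work
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError:
        raise ValueError("non-ASCII input") from None
    match = CHALLENGE.fullmatch(text.rstrip("\r\n"))
    if match is None:
        raise ValueError("malformed command")
    return int(match.group(1))

print(parse_challenge(b"challenge 42\n"))  # 42
```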
More General Lesson • It’s good to encounter these problems before they can do you or others harm: • Users you can yell at • Security flaws that don’t cost money • Underspecified requirements
Thank you! Alex Dubreuil Northeastern University dubreuil.a@husky.neu.edu acdubre@gmail.com