
Improving the Automatic Evaluation of Problem Solutions in Programming Contests



  1. Improving the Automatic Evaluation of Problem Solutions in Programming Contests Pedro Ribeiro and Pedro Guerreiro

  2. Presentation Overview • Automatic Evaluation: Past and Present • The case of IOI • A possible path for improving evaluation • Developing only a function (not a complete program) • Abstract Input/Output • Repeat the same function call (+ clock precision) • No hints on expected complexity • Examine runtime behaviour as tests increase in size • Some preliminary results • Conclusions

  3. Programming Contests • All programming contests need an efficient and fair way of distinguishing submitted solutions: (automatic) evaluation • What do we evaluate? • Correctness: does the program produce correct answers for all instances of the problem? • Efficiency: does it do it fast enough? Does it have the necessary time and memory complexity?

  4. Programming Contests • Classic way of evaluating • Set of pre-defined tests (inputs) • Run the program on the tests and check the output • IOI has been doing this in almost the same way since the beginning, with two major advances: • Manual evaluation → Automatic evaluation • Individual tests → Grouped tests • Although IOI has 3 different types of tasks, the core of the event is still batch tasks
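A minimal sketch of the output-checking step in this classic scheme: the contestant's program is run on a pre-defined input and its output is compared token by token against the expected output. The file names and behaviour are illustrative; real contest judges also enforce time and memory limits and may use special checkers.

    // checker.cpp -- compare the contestant's output against the expected output.
    #include <fstream>
    #include <iostream>
    #include <string>

    int main(int argc, char* argv[]) {
        if (argc != 3) {
            std::cerr << "usage: checker <expected_output> <contestant_output>\n";
            return 2;
        }
        std::ifstream expected(argv[1]), produced(argv[2]);
        std::string a, b;
        while (expected >> a) {                 // read expected tokens one by one
            if (!(produced >> b) || a != b) {   // missing or mismatching token
                std::cout << "WRONG ANSWER\n";
                return 1;
            }
        }
        if (produced >> b) {                    // contestant printed extra tokens
            std::cout << "WRONG ANSWER\n";
            return 1;
        }
        std::cout << "ACCEPTED\n";
        return 0;
    }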

  5. IOI Types of Tasks

  6. Programming Contests • Correctness: almost a “black art” • “Program testing can be a very effective way to show the presence of bugs, but it is hopelessly inadequate for showing their absence” (Dijkstra) • Efficiency: • Typically the judges create a set of model solutions of different complexities • Tests are designed so that the model solutions achieve the planned number of points • Considerable amount of tuning (environment) • Considerable amount of manpower needed • More difficult to introduce new languages

  7. Ideas: Single function • Solve the problem by writing a specific function (as opposed to a complete program) • Motivation: • Concentrate on the core algorithm (fewer distractions) • Can be used at earlier stages of learning • Opportunities for new ways of testing (more control over the submitted code) • It is already done in other types of contests: • TopCoder • Teaching environments (Ribeiro and Guerreiro, 2008)

  8. Ideas: I/O Abstraction • The input and output should be “abstract” and not specific to a language • How to do it: • Input already in memory, passed as function arguments (simple form, no complex data structures) • Output as the function return value(s) • Motivation: • Fewer information-processing details • Less complicated problem statements • We can measure the time spent in the solution (not in I/O) • More balanced performance between languages
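A hypothetical task posed in this style could look as follows: the contestant submits only a function that receives the input as arguments and returns the answer, while a judge-supplied driver handles all I/O. The name, signature, and placeholder algorithm are illustrative, not taken from an actual IOI task.

    #include <iostream>
    #include <vector>

    // Contestant's part: no parsing, no printing -- input arrives as arguments,
    // the output is the return value. (Placeholder algorithm: sum of the input.)
    long long solve(int n, const std::vector<int>& a) {
        long long total = 0;
        for (int i = 0; i < n; ++i) total += a[i];
        return total;
    }

    // Judge's part (hidden from the contestant): owns main() and all I/O details.
    int main() {
        int n;
        std::cin >> n;
        std::vector<int> a(n);
        for (int& x : a) std::cin >> x;
        std::cout << solve(n, a) << '\n';
        return 0;
    }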

  9. Idea: Repeat function calls • In the past we used smaller input sizes; with the increased speed of computers we currently use huge input sizes • Clock resolution is poor: small instances finish instantly • Need to distinguish small asymptotic complexities • Historic fact: smallest time limit used at IOI: • IOI 2007, problem training: 0.3 seconds • Future? • Ever more speed → even bigger input sizes

  10. Idea: Repeat function calls • Problems become completely detached from reality: • Ex: IOI 2007 Sails: a ship with 100,000 masts

  11. Idea: Repeat function calls • Problems become completely detached from reality: • Ex: IOI 2007 Sails: a ship with 100,000 masts

  12. Idea: Repeat function calls • Real world: how can we measure the thickness of a sheet of paper with a standard ruler that lacks the necessary accuracy? If a stack of 100 sheets measures 1 cm, then each sheet is ~0.1 mm • We can use the same idea on functions! • Running once on a small instance may be instantaneous, but • Running it multiple times takes more than 0.00s!

  13. Idea: Repeat function calls • Run the same function several times and compute the average time • Pros • Input sizes can be smaller and related to the problem • We can concentrate on the quality of the test cases and rely less on randomization to produce big test cases that are impossible to verify manually • Cons • We must be careful with memory persistence between successive function calls
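The slides do not show the actual measurement harness, so the following is only a sketch of the repeated-call idea, assuming the function-style submission described earlier: call the same function until at least one second of aggregated time has elapsed, then report the average time per call. solve() and its input are illustrative stand-ins.

    #include <algorithm>
    #include <chrono>
    #include <iostream>
    #include <vector>

    long long solve(const std::vector<int>& a) {       // stand-in submitted function
        long long best = 0, cur = 0;
        for (int x : a) { cur = std::max(0LL, cur + x); best = std::max(best, cur); }
        return best;
    }

    int main() {
        std::vector<int> input = {3, -1, 4, -1, 5, -9, 2, 6};   // small instance

        using Clock = std::chrono::steady_clock;
        long long calls = 0;
        volatile long long sink = 0;                    // keep the optimizer from removing calls
        auto start = Clock::now();
        while (Clock::now() - start < std::chrono::seconds(1)) {
            sink = sink + solve(input);                 // repeat the very same call
            ++calls;                                    // note: a real harness must also watch
        }                                               // for state persisting between calls
        double elapsed = std::chrono::duration<double>(Clock::now() - start).count();
        std::cout << "average time per call: " << elapsed / calls << " s\n";
        return 0;
    }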

  14. Idea: No hints on complexity • When we give limits for the input: • we simplify implementation details and avoid the need for dynamic memory allocation, but • we disclose the complexity required for the problem • Trained students can identify precisely the complexity needed • This has a great impact on the problem-solving aspect: • Different mindset: “I know which complexity I am looking for and I settle for a solution that achieves it” vs • a scientific approach to a real-world open problem • Ex: is there a polynomial solution for this problem?

  15. Idea: No hints on complexity • Give limits for implementation purposes, but make it clear that they are not related to the efficiency sought • More scientific and open-ended approach • Need to think about how to really solve the problem (and not how to produce a program that passes the test cases) • Does not overemphasize the runtime of a particular language • (“let me make a test with the maximum limits and see if it runs in X seconds on this machine with this language”)

  16. Idea: Runtime behaviour as tests increase • Typically we measure efficiency by creating a set of tests such that the different model solutions achieve different numbers of points, but • not passing a test does not imply that the required complexity was not achieved (other factors are involved) • passing just means that the test case is solved within the constraints • A lot of manpower is needed for the model solutions and fine tuning (compiler version, computer speed, language used, etc.)

  17. Idea: Runtime behaviour as tests increase • How can we improve on that? • Pen-and-paper analysis is not an option for large-scale evaluation • Need for automatic processes • We have different tests and we have different time measurements, so why don't we use all this information? • Plot the runtime as the data increases and do some curve fitting • It is impossible to determine the complexity of every program, but even a trivial (imperfect) curve fit can give more information than just knowing which test cases are passed

  18. Some Preliminary Results • As a proof of concept, a simple problem: • Input: a sequence of integers • Output: the subsequence of consecutive integers with maximum sum • Only ask for a function, with the I/O already handled • Small input limit (only 100) • Measure time by running the function multiple times (until the aggregated time reaches 1s) • Use random data for N = 1, 4, 8, 12, …, 64

  19. Some Preliminary Results • Implemented 3 model solutions: • A: O(N^3) – Iterate over all possible intervals in O(N^2) and iterate through each interval to compute its sum in O(N) • B: O(N^2) – Iterate over all possible intervals in O(N^2) with O(1) checking of each sum using accumulated sums • C: O(N) – Iterate through the sequence keeping a partial sum; whenever the partial sum is negative it cannot contribute to the best, so “reset” it to zero and continue
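The slides do not include the actual code, so the following is a reconstruction of the three model solutions as described above, assuming the empty subsequence (sum 0) is allowed.

    #include <algorithm>
    #include <iostream>
    #include <vector>

    // A: O(N^3) -- try every interval and re-sum it from scratch.
    long long maxSumCubic(const std::vector<int>& a) {
        long long best = 0;
        for (size_t i = 0; i < a.size(); ++i)
            for (size_t j = i; j < a.size(); ++j) {
                long long s = 0;
                for (size_t k = i; k <= j; ++k) s += a[k];
                best = std::max(best, s);
            }
        return best;
    }

    // B: O(N^2) -- extend each interval one element at a time, updating its sum in O(1).
    long long maxSumQuadratic(const std::vector<int>& a) {
        long long best = 0;
        for (size_t i = 0; i < a.size(); ++i) {
            long long s = 0;
            for (size_t j = i; j < a.size(); ++j) {
                s += a[j];
                best = std::max(best, s);
            }
        }
        return best;
    }

    // C: O(N) -- keep a running partial sum; a negative partial sum can never help
    // a later interval, so reset it to zero and continue.
    long long maxSumLinear(const std::vector<int>& a) {
        long long best = 0, cur = 0;
        for (int x : a) {
            cur = std::max(0LL, cur + x);
            best = std::max(best, cur);
        }
        return best;
    }

    int main() {
        std::vector<int> a = {3, -1, 4, -1, 5, -9, 2, 6};    // tiny illustrative instance
        std::cout << maxSumCubic(a) << ' ' << maxSumQuadratic(a) << ' '
                  << maxSumLinear(a) << '\n';                // all three agree
        return 0;
    }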

  20. Some Preliminary Results • Plot Time(N) / Time(1) • Compute a simple correlation measure against candidate functions
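The slides do not specify which correlation measure was used; as one plausible realization, the sketch below computes the Pearson correlation between the observed normalized times Time(N)/Time(1) and each candidate growth function g(N)/g(1). The candidate set and the hard-coded ratios (roughly quadratic) are placeholders to exercise the code, not measurements from the experiments.

    #include <cmath>
    #include <functional>
    #include <iostream>
    #include <string>
    #include <utility>
    #include <vector>

    // Pearson correlation coefficient between two equally sized samples.
    double pearson(const std::vector<double>& x, const std::vector<double>& y) {
        double n = static_cast<double>(x.size());
        double mx = 0, my = 0;
        for (size_t i = 0; i < x.size(); ++i) { mx += x[i]; my += y[i]; }
        mx /= n; my /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (size_t i = 0; i < x.size(); ++i) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
            syy += (y[i] - my) * (y[i] - my);
        }
        return sxy / std::sqrt(sxx * syy);
    }

    int main() {
        // Input sizes and normalized times Time(N)/Time(1).
        // Placeholder values, roughly quadratic, only to exercise the code.
        std::vector<double> N     = {1, 4, 8, 16, 32, 64};
        std::vector<double> ratio = {1, 17, 66, 250, 1010, 4100};

        std::vector<std::pair<std::string, std::function<double(double)>>> candidates = {
            {"O(N)",       [](double n) { return n; }},
            {"O(N log N)", [](double n) { return n * std::log2(n + 1); }},
            {"O(N^2)",     [](double n) { return n * n; }},
            {"O(N^3)",     [](double n) { return n * n * n; }},
        };

        // The candidate whose normalized curve correlates best with the observed
        // ratios is reported as the apparent runtime behaviour.
        for (const auto& [name, g] : candidates) {
            std::vector<double> model;
            for (double n : N) model.push_back(g(n) / g(N.front()));
            std::cout << name << ": correlation " << pearson(ratio, model) << '\n';
        }
        return 0;
    }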

  21. Some Preliminary Results • It is out of scope to give a more detailed mathematical analysis • We could use other statistical measures • We know that it is impossible to automatically compute and prove complexities, but • this simple approach gives meaningful results • the runtime is consistent with and correlated to a certain function, and therefore appears to grow following a pattern that we were able to identify • Ex: linear → takes roughly twice the time when the data doubles

  22. Some Preliminary Results • What could this do? • More information from the same test cases • Possibility of giving students automatic feedback on runtime behaviour • Possibility of identifying runtime behaviours for which no model solutions were created (less manpower!) • Independent of language-specific details • Ex: Archery problem, IOI 2009, Day 1: there were solutions with O(N^2 R), O(N^3), O(N^2 log N), O(N^2), O(N log N), … No need to code them all in all languages and then tune!

  23. Conclusion • 20 years of IOI: computers are much faster, but the style of evaluation is still the same • Setting up test cases is time consuming and requires manpower • Need to think of ways to improve evaluation • Our proposal, geared to more informal contests or teaching environments, can offer: • No distraction with I/O • No large data sets • More natural problem statements • No hints on complexity (open-ended approach) • No need to implement many model solutions • New languages can be added without changing the tests • More work is still needed to obtain a robust system, but we feel these ideas (or some of them) can already be used in practice • Future: can evaluation be improved in other ways?

  24. The End • And that's all! :-) Questions? Pedro Ribeiro (pribeiro@dcc.fc.up.pt) Pedro Guerreiro (pjguerreiro@ualg.pt)
