Algoval: Evaluation Server – Past, Present and Future Simon Lucas, Computer Science Dept, Essex University, 25 January 2002
Architecture Evolution • Version 1: Centralised evaluation of Java submissions (Spring 2000) • Version 2: Distributed evaluation using Java RMI (Summer 2001) • Version 3: Distributed evaluation using XML over HTTP (Spring 2002)
Competitions • Post-Office Sponsored OCR Competition (Autumn 2000) • IEEE Congress on Evolutionary Computation 2001 • IEEE WCCI 2002 • ICDAR 2003 • Wide range of contests – OCR, Sequence Recognition, Object Recognition
Parameterised Algorithms • League table entries can include the parameters used to configure the algorithm • This allows developers to observe the effect of different parameter settings on the performance measures • E.g.: problems.seqrec.SNTupleRecognizer?n=4&gap=11&eps=0.01
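As a sketch of how such parameter strings might be handled (the class and method names below are illustrative assumptions, not Algoval's actual API), a client or server could parse the query portion of the entry into a configuration map:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: parse a parameter string such as
// "n=4&gap=11&eps=0.01" into a map of configuration settings.
class ParamParser {
    static Map<String, String> parse(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(parse("n=4&gap=11&eps=0.01"));
        // prints {n=4, gap=11, eps=0.01}
    }
}
```

Keeping insertion order (LinkedHashMap) means the parameters can be echoed back into the league table exactly as the developer supplied them.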
Centralised • System restricted submissions to be written in Java, for security reasons • Java programs can be run within a highly restrictive security manager • Does not scale well under heavy load • Many researchers were unwilling to convert their algorithm implementations to Java
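A minimal sketch of such a sandbox, assuming a policy that simply vetoes file and socket access (the real server's policy was presumably more fine-grained; SecurityManager was the standard Java sandboxing mechanism at the time):

```java
import java.security.Permission;

// Illustrative sketch: a SecurityManager subclass that denies file and
// network access to submitted code. The server would install it with
// System.setSecurityManager(new RestrictiveManager()) before invoking
// a submission.
class RestrictiveManager extends SecurityManager {
    @Override
    public void checkPermission(Permission perm) {
        if (perm instanceof java.io.FilePermission
                || perm instanceof java.net.SocketPermission) {
            throw new SecurityException("denied: " + perm);
        }
        // All other permissions are allowed in this sketch.
    }
}
```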
Centralised II • Can measure every aspect of an algorithm's performance • Speed • Memory requirements (static, dynamic) • All algorithms compete on a level playing field • Very difficult for an algorithm to cheat
Distributed • Researchers can test their algorithms against others without submitting their code • Results on new datasets can be generated immediately for all clients connected to the evaluation server • Results are generated by the same evaluation method, hence meaningful comparisons can be made between different algorithms
Distributed (RMI) • Based on Java's Remote Method Invocation (RMI) • Works okay, but client programs still need access to a Java Virtual Machine • BUT: the algorithms themselves can now be implemented in any language • However: there may still be some work in converting the Java data structures to the native language
Distributed II • Since most computation is done on the clients' machines, it scales well. • Researchers can implement their algorithms in any language they choose - it just has to talk to the evaluation proxy on their machine. • When submitting an algorithm it is also possible to specify URLs for the author and the algorithm • Visitors to the web-site can view league tables then follow links to the algorithm and its implementer.
Remote Participation • Developers download a kit • Interface their algorithm to the spec. • Run a command-line batch file to invoke their algorithm on a specified problem
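The interface in the kit might look something like the sketch below. The names here are illustrative assumptions, not Algoval's actual API; the trivial participant at the end is the kind of baseline one might write to check the kit is wired up correctly:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of a developer-kit interface for classification problems.
interface Classifier {
    void train(double[][] inputs, int[] labels); // fit on a training set
    int classify(double[] input);                // label a single pattern
}

// A deliberately trivial participant: always predicts the most frequent
// training label.
class MajorityClassifier implements Classifier {
    private int majority;

    public void train(double[][] inputs, int[] labels) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int l : labels) {
            counts.merge(l, 1, Integer::sum);
        }
        majority = counts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .get().getKey();
    }

    public int classify(double[] input) {
        return majority;
    }
}
```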
Features of RMI • Handles Object Serialization • Hence: problem specifications can easily include complex data structures • Fragile! – changes to the Java classes may require developers to download a new developer kit • Does not work well through firewalls • HTTP Tunnelling can solve some problems, but has limitations (e.g. no callbacks)
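For example, a serializable problem specification might look like the sketch below (illustrative names, not Algoval's actual classes). RMI ships such objects over the wire automatically; the fragility arises because changing the class later can break compatibility with developer kits compiled against the old version. Pinning serialVersionUID mitigates, but does not eliminate, the problem:

```java
import java.io.Serializable;

// Illustrative sketch of a problem specification that RMI could serialize
// and send to clients. Any structural change to this class risks breaking
// deserialization in older developer kits.
class ProblemSpec implements Serializable {
    private static final long serialVersionUID = 1L; // pinned explicitly

    final String name;
    final String[] datasetUrls;

    ProblemSpec(String name, String[] datasetUrls) {
        this.name = name;
        this.datasetUrls = datasetUrls;
    }
}
```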
<future>XML Version</future> • While Java RMI is platform independent (any platform with a JVM), XML is language independent • XML version is HTTP based • No known problems with firewalls
XML Version • Each client (algorithm under test) • parses XML objects (e.g. datasets) • sends back XML objects (e.g. pattern classifications) to the server
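A sketch of that client-side exchange, assuming an illustrative schema with &lt;pattern&gt; elements in the dataset and &lt;label&gt; elements in the reply (not the actual Algoval schema):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Illustrative sketch of an XML client: parse patterns out of a dataset
// message, and build a classification message to send back over HTTP.
class XmlClient {
    static String[] readPatterns(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                        xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("pattern");
        String[] patterns = new String[nodes.getLength()];
        for (int i = 0; i < nodes.getLength(); i++) {
            patterns[i] = nodes.item(i).getTextContent();
        }
        return patterns;
    }

    static String buildResponse(String[] labels) {
        StringBuilder sb = new StringBuilder("<classifications>");
        for (String label : labels) {
            sb.append("<label>").append(label).append("</label>");
        }
        return sb.append("</classifications>").toString();
    }
}
```

Because the exchange is plain text over HTTP, the same messages can be produced and consumed from any language, which is the point of moving away from RMI.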
Pattern recognition servers • Reside at particular URLs • Can be trained on specified or supplied datasets • Can respond to recognition requests
Example Request • Recognize this word: • Given the dictionary at: • http://ace.essex.ac.uk/viadocs/dic/pygenera.txt • And the OCR training set at: • http://ace.essex.ac.uk/algoval/ocr/viadocs1.xml • Respond with your 10 best word hypotheses
Example Response 1. MELISSOBLAPTES 2. ENDOMMMASIS 3. HETEROGRAPHIS 4. TRICHOBAPTES 5. HETEROCHROSIS 6. PHLOEOGRAPTIS 7. HETEROCNEPHES 8. DRESCOMPOSIS 9. MESOGRAPHE 10. DIPSOCHARES
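In the XML version, that exchange might be encoded along the following lines. The element names below are illustrative guesses, not the actual Algoval message format:

```xml
<recognitionRequest>
  <word><!-- word image data --></word>
  <dictionary url="http://ace.essex.ac.uk/viadocs/dic/pygenera.txt"/>
  <trainingSet url="http://ace.essex.ac.uk/algoval/ocr/viadocs1.xml"/>
  <hypotheses max="10"/>
</recognitionRequest>

<recognitionResponse>
  <hypothesis rank="1">MELISSOBLAPTES</hypothesis>
  <hypothesis rank="2">ENDOMMMASIS</hypothesis>
  <!-- remaining hypotheses, down to rank 10 -->
</recognitionResponse>
```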
Issues • How general to make problem specs • Could set up separate problems for OCR and face recognition, or a single problem called ImageRecognition • How does the software effort scale?
Software Scalability • Suppose we have: • A algorithms implemented in L languages • D datasets • P problems • E algorithm evaluators • How will our software effort scale with respect to these numbers?
Scalability (contd.) • Consider server and clients • More effort at the server can mean less effort for clients • For example, language specific interfaces and wrappers can be defined • This makes participation in a particular language much less effort • This could be done on demand
Summary • Independent, automatic algorithm evaluation • Makes sound scientific and economic sense • Existing system works but has some limitations • Future XML-based system will overcome these • Then we need to get people using it • Future contests will help • Industry support will benefit both academic research and commercial exploitation