
Zulu: an active finite state machine learning competition

Zulu: an active finite state machine learning competition. Valencia, September 2010. Colin de la Higuera. General goal (http://labh-curien.univ-st-etienne.fr/zulu): to support research in DFA learning and to promote active learning as an alternative to statistical learning.


Presentation Transcript


  1. Zulu: an active finite state machine learning competition
     Colin de la Higuera
     ICGI, Valencia, September 2010

  2. General goal
     http://labh-curien.univ-st-etienne.fr/zulu
     • To support research in DFA learning
     • To promote active learning as an alternative to statistical learning
     • To attempt to use learning for under-resourced languages

  3. State of the art (1)
     • Learning automata is a difficult but great topic, with not enough positive results (… do come this afternoon…)
     • The question of learning DFA has received attention for 30 years
     • The typical protocol consists of learning from a batch of data: you need a lot of data if you want to learn…

  4. State of the art (2)
     • Alternative introduced by Angluin: the learner can make queries to an oracle
     • Typical queries are membership queries, equivalence queries, subset queries or correction queries
     • The L* algorithm can learn DFA with a polynomial amount of resources

  5. State of the art (3)
     Many reasons for wanting to learn DFA from queries:
     • Useful in a number of fields
     • Start with DFA…
     • Under-resourced languages

  6. The task
     • The participant is told that she is to learn a DFA and is allowed to ask k membership queries
     • She is given the alphabet, k, and an upper bound on the number of states
     • The participant interactively uses the online oracle and, after making k queries, is given 1800 strings that she has to parse and classify. The score is the percentage of correct labels.

  7. The baseline
     • Angluin’s L* algorithm learns perfectly but uses both membership queries (MQs) and equivalence queries (EQs)
     • A version in which EQs are “simulated” by random sampling is provided

  8. A membership query
     • Learner: does aababababbbab belong to the language?
     • Oracle: no
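
A minimal sketch of how an oracle might answer such a query, assuming the hidden target is stored as a complete DFA. The `DFA` class, the `membership_query` function and the example target `EVEN_B` (strings with an even number of b's) are illustrative assumptions, not the language used on the slide or the competition's actual server code.

```python
from typing import Dict, Set, Tuple


class DFA:
    """A complete deterministic finite automaton over a finite alphabet."""

    def __init__(self, start: int, accepting: Set[int],
                 delta: Dict[Tuple[int, str], int]) -> None:
        self.start = start
        self.accepting = accepting
        self.delta = delta  # maps (state, symbol) to the next state

    def accepts(self, word: str) -> bool:
        """Run the automaton on `word` and report whether it ends in an accepting state."""
        state = self.start
        for symbol in word:
            state = self.delta[(state, symbol)]
        return state in self.accepting


# Hypothetical target: strings over {a, b} containing an even number of b's.
EVEN_B = DFA(
    start=0,
    accepting={0},
    delta={(0, "a"): 0, (0, "b"): 1, (1, "a"): 1, (1, "b"): 0},
)


def membership_query(target: DFA, word: str) -> bool:
    """Answer 'does `word` belong to the language?' the way an oracle would."""
    return target.accepts(word)


# The slide's string has seven b's, so this assumed target rejects it.
print(membership_query(EVEN_B, "aababababbbab"))  # False
```

The learner only ever sees the yes/no answer; the automaton itself stays hidden behind the oracle.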

  9. An equivalence query
     • Learner: Is (aa*(b+ab)*bb+aa)* the correct answer?
     • Oracle: No, because aabababba does belong to the language

  10. Simulating an equivalence query
      • Random strings are sampled: aabba, bbabba, aaaababab, bbabababaaaa,…
      • Learner’s hypothesis: aabba ∈ L
      • Learner: does aabba belong to L?
      • Oracle: yes (if we agree many times, I can’t be far off)
      • or Oracle: no (aabba can be used as a counterexample)
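
The sampling idea above can be sketched as follows. This is a hedged illustration, not the provided baseline: the function name, the uniform choice of length and symbols, and the default values of `samples` and `max_length` are all assumptions.

```python
import random
from typing import Callable, Optional


def simulated_equivalence_query(
    hypothesis: Callable[[str], bool],
    membership_query: Callable[[str], bool],
    alphabet: str = "ab",
    samples: int = 200,       # assumed sample size
    max_length: int = 12,     # assumed maximum string length
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Return a sampled string on which hypothesis and oracle disagree, or None."""
    rng = rng or random.Random()
    for _ in range(samples):
        # Draw a random string: uniform length, then uniform symbols.
        length = rng.randint(0, max_length)
        word = "".join(rng.choice(alphabet) for _ in range(length))
        # One membership query per sampled string.
        if hypothesis(word) != membership_query(word):
            return word  # usable as a counterexample
    return None  # agreed on every sample; the hypothesis may still be wrong
```

Note that every sampled string costs one membership query, so a simulated equivalence query draws on the same budget as ordinary queries. A `None` result only means no disagreement was found under this particular sampling distribution.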

  11. The theory
      • DFA are learnable with MQs and EQs
      • DFA are not learnable from a polynomial number of MQs alone
      • You can’t really simulate the EQ through sampling because you don’t know what the distribution is

  12. The oracle (1)
      • is given an upper bound n on the number of states and the size of the alphabet
      • generates a (minimal) DFA with at most n states
      • runs the baseline on this DFA and halts as soon as it is 70% correct. This gives the number of queries (k) for that task.
      • gives the player an identifier.
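
The calibration step (run the baseline until it is 70% correct, then read off k) might look like the sketch below. The baseline learner is abstracted as a sequence of `(queries_used, hypothesis)` pairs, and accuracy is measured on a list of test strings; both choices, and the names `accuracy` and `calibrate_budget`, are assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Classifier = Callable[[str], bool]


def accuracy(hypothesis: Classifier, target: Classifier,
             test_strings: List[str]) -> float:
    """Fraction of test strings on which hypothesis and target agree."""
    agree = sum(hypothesis(w) == target(w) for w in test_strings)
    return agree / len(test_strings)


def calibrate_budget(
    learner_steps: Iterable[Tuple[int, Classifier]],
    target: Classifier,
    test_strings: List[str],
    threshold: float = 0.70,
) -> Optional[int]:
    """Return the query count of the first hypothesis reaching the threshold.

    `learner_steps` yields (queries used so far, current hypothesis) pairs,
    as a baseline learner would produce them over time.
    """
    for queries_used, hypothesis in learner_steps:
        if accuracy(hypothesis, target, test_strings) >= threshold:
            return queries_used  # this becomes k for the task
    return None  # the baseline never reached the threshold
```

Tying k to the baseline's own effort means a participant has to be more query-efficient than the baseline to score well above 70%.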

  13. The oracle (2)
      • interacts with the learner and answers k queries
      • generates 1800 strings and gives them to the learner
      • receives the 1800 labels and computes the score
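
A rough sketch of this side of the protocol, assuming the target language is available as a Python predicate. The class name `BudgetedOracle`, its methods, and the choice to raise an error once the budget is spent are hypothetical; the slides do not say how the real server behaves when a learner asks for more than k queries.

```python
from typing import Callable, List


class BudgetedOracle:
    """Answers at most k membership queries, then scores submitted labels."""

    def __init__(self, target: Callable[[str], bool], k: int) -> None:
        self._target = target  # hidden from the learner
        self.remaining = k     # membership queries still allowed

    def membership_query(self, word: str) -> bool:
        """Answer one membership query and decrement the budget."""
        if self.remaining <= 0:
            raise RuntimeError("query budget exhausted")
        self.remaining -= 1
        return self._target(word)

    def score(self, test_strings: List[str], labels: List[bool]) -> float:
        """Return the percentage of test strings labelled correctly."""
        if len(test_strings) != len(labels):
            raise ValueError("one label is required per test string")
        correct = sum(self._target(w) == y
                      for w, y in zip(test_strings, labels))
        return 100.0 * correct / len(test_strings)
```

In the competition setting the list passed to `score` would hold the 1800 test strings; any shorter list works the same way for experimentation.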

  14. Scientific committee
      • Dana Angluin, Yale University, USA
      • Leo Becerra Bonache, Univ. de Tarragona, Spain
      • François Coste, IRISA, Rennes, France
      • Alex Clark, Royal Holloway Univ. of London, UK
      • Ricard Gavaldá, UPC Barcelona, Spain
      • Colin de la Higuera, U. Saint-Etienne/Nantes, France
      • Jean-Christophe Janodet, U. de Saint-Etienne, France
      • Aurélien Lemay, Université de Lille 3, France
      • Laurent Miclet, ENSSAT Lannion and IRISA, France
      • Tim Oates, University of Maryland, USA
      • Anssi Yli-Jyrä, Helsinki, Finland
      • Menno van Zaanen, Tilburg University, The Netherlands

  15. Organisation committee
      • Myrtille Ponge
      • David Combe
      • Jean-Christophe Janodet
      • Colin de la Higuera

  16. Some open issues
      • How should the DFA be generated?
      • What is a random DFA?
      • Generate random NFA instead?
      • Should they not be “typical DFA”?
      • What distribution for the test set?
      • If the distribution is known, this helps!
      • How do we have a fair competition?

  17. Main dates
      • 23rd of July 2009: official launch
      • till May 2010: advertising and training phase
      • June 2010: competition phase
      • 7th July 2010: results published
      • September 2010: Workshop / Special session

  18. Zulu competition
      • http://labh-curien.univ-st-etienne.fr/zulu
      • 23 competing algorithms, 11 players
      • End of the competition a week ago.
      • Tasks: learn a DFA, be as precise as possible, with n queries

  19. Results

  20. Winners
      • Falk Howar
      • Balle
      • Eisenstat
