Zulu: an active finite state machine learning competition

Colin de la Higuera
ICGI, Valencia, September 2010

General goal
http://labh-curien.univ-st-etienne.fr/zulu
• To support research in DFA learning
• To promote active learning as an alternative to statistical learning
• To attempt to use learning for under-resourced languages

State of the art (1)
• Learning automata is a difficult but exciting topic, with too few positive results (… do come this afternoon…)
• Learning DFA has received attention for 30 years
• The typical protocol consists of learning from a fixed set of data: you need a lot of data if you want to learn…

State of the art (2)
• An alternative, introduced by Angluin: the learner can make queries to an oracle
• Typical queries are membership queries, equivalence queries, subset queries or correction queries
• Algorithm L* can learn DFA with a polynomial amount of resources

State of the art (3)
Many reasons for wanting to learn DFA from queries:
• Useful in a number of fields
• Start with DFA…
• Under-resourced languages

The task
• The participant is told that (s)he must learn a DFA and is allowed to ask k membership queries
• She is given the alphabet, k, and an upper bound on the number of states
• The participant uses the online oracle interactively; after making k queries, she is given 1800 strings that she has to parse and classify. The score is the percentage of correct labels.
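
The query budget at the heart of the task can be pictured as a small wrapper around a membership test. This is a minimal sketch under our own assumptions: the class `BudgetedOracle`, the exception `QueryBudgetExceeded`, and the toy target (strings over {a, b} with an even number of a's) are hypothetical and do not describe the interface of the Zulu server.

```python
from typing import Callable


class QueryBudgetExceeded(Exception):
    """Raised when the learner asks more than k membership queries (hypothetical)."""


class BudgetedOracle:
    """Hypothetical stand-in for the online oracle: answers at most k MQs."""

    def __init__(self, target: Callable[[str], bool], k: int) -> None:
        self.target = target  # hidden language, given as a membership test
        self.k = k            # query budget announced to the participant
        self.used = 0         # number of queries asked so far

    def member(self, w: str) -> bool:
        """Answer one membership query, refusing once the budget is spent."""
        if self.used >= self.k:
            raise QueryBudgetExceeded(f"all {self.k} queries have been used")
        self.used += 1
        return self.target(w)


# Assumed toy target: an even number of a's.
oracle = BudgetedOracle(lambda w: w.count("a") % 2 == 0, k=2)
print(oracle.member("aa"), oracle.member("ab"))  # True False
try:
    oracle.member("b")
except QueryBudgetExceeded as err:
    print("refused:", err)  # refused: all 2 queries have been used
```

Refusing rather than silently answering makes the budget explicit: a learner must plan how to spend its k queries before it is handed the test strings.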

The baseline
• Angluin’s L* algorithm learns the target exactly, but it needs both membership queries (MQs) and equivalence queries (EQs)
• A version in which EQs are “simulated” by random sampling is provided
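
A compact sketch of the baseline idea: an L*-style learner whose equivalence query is passed in as a function, so that it can be answered exactly or simulated by sampling. Two assumptions to flag: counterexamples are processed in the style of Maler and Pnueli (all their suffixes become new experiments) instead of adding prefixes as in Angluin's original L*, and the sampling parameters and target in the usage lines are arbitrary. This is not the baseline code distributed with Zulu.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Member = Callable[[str], bool]


def lstar(member: Member, alphabet: str,
          eq: Callable[[Member], Optional[str]]) -> Member:
    """L*-style learner; eq(h) returns a counterexample string or None."""
    S: List[str] = [""]          # access strings, one per hypothesis state
    E: List[str] = [""]          # experiments (suffixes); E[0] is the empty word
    cache: Dict[str, bool] = {}  # memoised membership answers

    def mq(w: str) -> bool:
        if w not in cache:
            cache[w] = member(w)
        return cache[w]

    def row(s: str) -> Tuple[bool, ...]:
        return tuple(mq(s + e) for e in E)

    while True:
        # Close the table: each one-letter extension of S must match a row of S.
        i = 0
        while i < len(S):
            for a in alphabet:
                if row(S[i] + a) not in {row(s) for s in S}:
                    S.append(S[i] + a)  # a new, distinguishable state
            i += 1

        # Build the hypothesis DFA: its states are the (distinct) rows of S.
        state = {row(s): k for k, s in enumerate(S)}
        delta = [[state[row(s + a)] for a in alphabet] for s in S]
        accept = [mq(s) for s in S]
        index = {a: j for j, a in enumerate(alphabet)}

        def hypothesis(w: str, delta=delta, accept=accept, index=index) -> bool:
            q = 0  # S[0] is the empty word, i.e. the start state
            for a in w:
                q = delta[q][index[a]]
            return accept[q]

        cex = eq(hypothesis)
        if cex is None:
            return hypothesis
        # Add every non-empty suffix of the counterexample as an experiment.
        for k in range(len(cex)):
            if cex[k:] not in E:
                E.append(cex[k:])


def target(w: str) -> bool:  # assumed target: an even number of a's
    return w.count("a") % 2 == 0


rng = random.Random(0)


def sampling_eq(h: Member) -> Optional[str]:
    """Simulated EQ: 500 random strings of length <= 12 (arbitrary choices)."""
    for _ in range(500):
        w = "".join(rng.choice("ab") for _ in range(rng.randint(0, 12)))
        if target(w) != h(w):
            return w
    return None


h = lstar(target, "ab", sampling_eq)
print(h("aa"), h("ab"), h("b"))  # True False True
```

Memoising membership answers matters in the Zulu setting, where every query counts against the budget k.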

A membership query
• Learner: does aababababbbab belong to the language?
• Oracle: no
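
Answering a membership query amounts to running the hidden DFA on the queried string. The sketch below assumes a dictionary-based DFA encoding and an illustrative target (strings over {a, b} with an odd number of a's), chosen only so that it also answers “no” on the string above; the real target is, of course, hidden from the learner.

```python
from typing import Dict, Set

# Hypothetical encoding: start state, transition table DELTA[state][symbol],
# and accepting states. The language "odd number of a's" is an assumption.
START = 0
DELTA: Dict[int, Dict[str, int]] = {0: {"a": 1, "b": 0}, 1: {"a": 0, "b": 1}}
ACCEPT: Set[int] = {1}


def member(w: str) -> bool:
    """Answer a membership query by running the DFA on w."""
    q = START
    for symbol in w:
        q = DELTA[q][symbol]  # one deterministic step per symbol
    return q in ACCEPT


print(member("aababababbbab"))  # False: the string contains six a's
```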

An equivalence query
• Learner: is (aa*(b+ab)*bb+aa)* the correct answer?
• Oracle: no, because aabababba does belong to the language
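
When the target is known, as it is to the oracle, an equivalence query can be answered exactly by a breadth-first search of the product of the two DFAs: the first reachable pair of states in which exactly one automaton accepts gives a shortest counterexample. This sketch assumes a hypothetical (start, delta, accepting) encoding and that both DFAs are complete over the same alphabet.

```python
from collections import deque
from typing import Dict, Optional, Set, Tuple

# Hypothetical encoding: (start state, delta[state][symbol], accepting states).
DFA = Tuple[int, Dict[int, Dict[str, int]], Set[int]]


def counterexample(target: DFA, hyp: DFA, alphabet: str) -> Optional[str]:
    """Return a shortest string on which the DFAs disagree, or None if equivalent."""
    (s1, d1, f1), (s2, d2, f2) = target, hyp
    start = (s1, s2)
    seen = {start}
    queue = deque([(start, "")])
    while queue:
        (p, q), w = queue.popleft()
        if (p in f1) != (q in f2):
            return w  # BFS order makes this a shortest counterexample
        for c in alphabet:
            nxt = (d1[p][c], d2[q][c])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, w + c))
    return None  # no reachable disagreeing pair: same language


even_a: DFA = (0, {0: {"a": 1, "b": 0}, 1: {"a": 0, "b": 1}}, {0})
everything: DFA = (0, {0: {"a": 0, "b": 0}}, {0})
print(counterexample(even_a, everything, "ab"))  # a
print(counterexample(even_a, even_a, "ab"))      # None
```

The product has at most |Q1|·|Q2| pairs, so the search is polynomial; this is exactly what a participant cannot do, since only the oracle knows the target.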

Simulating an equivalence query
• Random strings are sampled: aabba, bbabba, aaaababab, bbabababaaaa,…
• Learner’s hypothesis: aabba ∈ L
• Learner: does aabba belong to L?
• Oracle answers yes: if we agree many times, I can’t be far off
• Oracle answers no: aabba can be used as a counterexample
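
The sampling step can be sketched as follows. Each sampled string costs one membership query, which matters when the budget is only k. The length distribution (uniform on 0..max_len, then uniform symbols), the parameter values, and the name `simulated_eq` are our assumptions; as the theory slide points out, the learner does not know the distribution actually used for testing.

```python
import random
from typing import Callable, Optional


def simulated_eq(member: Callable[[str], bool],
                 hypothesis: Callable[[str], bool],
                 alphabet: str,
                 n_samples: int = 1000,
                 max_len: int = 20,
                 seed: int = 0) -> Optional[str]:
    """Approximate an equivalence query by random testing.

    Returns the first sampled string on which the hypothesis and the oracle
    disagree (a counterexample), or None if they agree on every sample.
    Each sample spends one membership query.
    """
    rng = random.Random(seed)
    for _ in range(n_samples):
        length = rng.randint(0, max_len)  # assumed length distribution
        w = "".join(rng.choice(alphabet) for _ in range(length))
        if member(w) != hypothesis(w):
            return w
    return None


def even_a(w: str) -> bool:  # assumed toy target
    return w.count("a") % 2 == 0


print(simulated_eq(even_a, even_a, "ab"))  # None: they always agree
cex = simulated_eq(even_a, lambda w: True, "ab")
print(cex, even_a(cex))  # a string with an odd number of a's, then False
```

A None answer only means that the two languages agreed on the samples drawn; it is not a proof of equivalence.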

The theory
• DFA are learnable from MQs and EQs
• DFA are not learnable from a polynomial number of MQs alone
• You cannot really simulate EQs through sampling, because you do not know what the distribution is

The oracle (1)
• is given an upper bound n on the number of states and the size of the alphabet
• generates a (minimal) DFA with at most n states
• runs the baseline on this DFA and halts as soon as the baseline is 70% correct; the number of queries used by then is k for that task
• gives the player an identifier
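
The slides do not say how the target DFA is drawn (this is one of the open issues raised in this talk). One plausible procedure, sketched here purely as an assumption, draws random transitions and accepting states for n states, keeps the part reachable from the start state, and minimises it with Moore's partition refinement, so that the result is a minimal DFA with at most n states. The function names, the encoding, and the 0.5 acceptance probability are hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: delta[state][symbol index] and a set of accepting
# states; state 0 is the start state.
DFA = Tuple[List[List[int]], Set[int]]


def random_dfa(n: int, k: int, rng: random.Random) -> DFA:
    """Draw n states over k symbols with uniform transitions (an assumption)."""
    delta = [[rng.randrange(n) for _ in range(k)] for _ in range(n)]
    accept = {q for q in range(n) if rng.random() < 0.5}
    return delta, accept


def minimize(dfa: DFA, k: int) -> DFA:
    """Keep reachable states, then merge equivalent ones (Moore's algorithm)."""
    delta, accept = dfa
    reach, stack = {0}, [0]
    while stack:  # depth-first search from the start state
        q = stack.pop()
        for r in delta[q]:
            if r not in reach:
                reach.add(r)
                stack.append(r)
    states = sorted(reach)  # states[0] == 0, the start state

    block = {q: int(q in accept) for q in states}  # accept / reject split
    while True:
        # Signature of a state: its block plus the blocks of its successors.
        ids: Dict[Tuple[int, ...], int] = {}
        new_block = {}
        for q in states:
            sig = (block[q],) + tuple(block[delta[q][a]] for a in range(k))
            new_block[q] = ids.setdefault(sig, len(ids))
        stable = len(ids) == len(set(block.values()))
        block = new_block
        if stable:
            break  # no block was split: the partition is final

    m = len(set(block.values()))
    new_delta = [[0] * k for _ in range(m)]
    new_accept: Set[int] = set()
    for q in states:  # the start state 0 always lands in block 0
        for a in range(k):
            new_delta[block[q]][a] = block[delta[q][a]]
        if q in accept:
            new_accept.add(block[q])
    return new_delta, new_accept


rng = random.Random(2010)
delta, accept = minimize(random_dfa(10, 2, rng), 2)
print(len(delta) <= 10)  # True
print(minimize(([[1, 2], [2, 3], [3, 0], [0, 1]], {0, 2}), 2))
# ([[1, 0], [0, 1]], {0}): four states collapse to "even number of a's"
```

Whether such uniformly drawn automata are “typical” DFA is precisely the kind of question the open issues ask.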

The oracle (2)
• interacts with the learner and answers up to k queries
• generates 1800 strings and gives them to the learner
• receives the 1800 labels and computes the score
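
The final exchange can be sketched as below. How the 1800 test strings are actually drawn is not stated here (the test distribution is listed among the open issues), so the uniform-length sampler, its parameters, and the function names are assumptions; only the test-set size and the percentage score come from the slides.

```python
import random
from typing import Callable, List


def sample_test_set(alphabet: str, size: int = 1800,
                    max_len: int = 30, seed: int = 0) -> List[str]:
    """Hypothetical test-set generator: uniform length, then uniform symbols."""
    rng = random.Random(seed)
    return ["".join(rng.choice(alphabet) for _ in range(rng.randint(0, max_len)))
            for _ in range(size)]


def score(target: Callable[[str], bool], test_set: List[str],
          labels: List[bool]) -> float:
    """Percentage of test strings whose submitted label matches the target."""
    if len(labels) != len(test_set):
        raise ValueError("exactly one label per test string is required")
    correct = sum(target(w) == y for w, y in zip(test_set, labels))
    return 100.0 * correct / len(test_set)


def even_a(w: str) -> bool:  # assumed toy target
    return w.count("a") % 2 == 0


tests = sample_test_set("ab")
print(len(tests))                                        # 1800
print(score(even_a, tests, [even_a(w) for w in tests]))  # 100.0
print(score(even_a, tests, [True] * len(tests)))         # roughly 50, depending on the sample
```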

Scientific committee
• Dana Angluin, Yale University, USA
• Leo Becerra Bonache, Univ. de Tarragona, Spain
• François Coste, IRISA, Rennes, France
• Alex Clark, Royal Holloway Univ. of London, UK
• Ricard Gavaldá, UPC Barcelona, Spain
• Colin de la Higuera, U. Saint-Etienne/Nantes, France
• Jean-Christophe Janodet, U. de Saint-Etienne, France
• Aurélien Lemay, Université de Lille 3, France
• Laurent Miclet, ENSSAT Lannion and IRISA, France
• Tim Oates, University of Maryland, USA
• Anssi Yli-Jyrä, Helsinki, Finland
• Menno van Zaanen, Tilburg University, The Netherlands

Organisation committee
• Myrtille Ponge
• David Combe
• Jean-Christophe Janodet
• Colin de la Higuera

Some open issues
• How should the DFA be generated?
• What is a random DFA?
• Should random NFA be generated instead?
• Should they not be “typical” DFA?
• What distribution should be used for the test set?
• If the distribution is known, this helps!
• How do we ensure a fair competition?

Main dates
• 23rd of July 2009: official launch
• Until May 2010: advertising and training phase
• June 2010: competition phase
• 7th July 2010: results published
• September 2010: workshop / special session

Zulu competition
• http://labh-curien.univ-st-etienne.fr/zulu
• 23 competing algorithms, 11 players
• The competition ended a week ago
• Task: learn a DFA as precisely as possible with k queries

Results

Winners
• Falk Howar
• Balle
• Eisenstat
