WP3 – Retrieval systems
Susan Terge, project manager, ALL, Hungary

Enabling Access to Sound Archives through Integration, Enrichment and Retrieval

Introduction to the workpackage: participants & schedule
• Participants and their contributions (mm = man-months)
  • QMUL: 22 mm – music & cross-media retrieval
  • DIT: 5 mm – music retrieval
  • ALL: 24 mm – speech retrieval & vocal queries
  • LFUI: 3 mm – integration of retrieval engines
  • NICE: 12 mm – retrieval for attributes of speech
• Schedule
  • T3.1 Music retrieval: month 9 – month 20
  • T3.2 Speech retrieval: month 8 – month 20
  • T3.3 Cross-media retrieval: month 1 – month 6 and month 21 – month 26
  • T3.4 Vocal query interface: month 7 – month 20

Milestones
• M3.1 Initial vocal query system tested; initial speech and music retrieval algorithms developed (Month 8)
• M3.2 Vocal query fully functional; speech and music retrieval implemented; cross-media retrieval method finalized (Month 14)
• M3.3 Vocal query finished; speech and music retrieval systems established; basic cross-media retrieval implemented (Month 20)
• M3.4 Cross-media retrieval fully functional; further work is limited to refinement and optimization (Month 26)

Deliverables
• D3.1 Report outlining retrieval system functionality and specification (Month 6)
• D3.2 Prototype on speech and music retrieval systems with vocal query interface (Month 20 – 2008 Jan. 1.)
  • Schedule:
    • So far, no material has arrived!
    • 2007 Dec. 7. – materials arrive from the partners responsible for chapters: QMUL, SILOGIC, RSAMD
    • 2007 Dec. 12. – first draft completed and shared by ALL
    • 2007 Dec. 19. – refinements from the chapter leads accepted and uniform formatting applied by ALL
    • 2007 Dec. 20. – finalized version shared with the partners by ALL
    • 2008 Jan. 10. – confirmed and accepted by all partners and the coordinator
• D3.3 Prototype on cross-media retrieval system (Month 26 – 2008 July 1.)

Deliverable D3.2 – Prototype on speech and music retrieval with vocal query interface
• 1.1 Introduction – 5 pages (ALL – overall responsibility)
• 1.2 The EASAIER system – 10-15 pages (SILOGIC – overall responsibility)
  • System architecture
  • Integration of components
• 1.3 Music retrieval components – 10-15 pages (QMUL – overall responsibility)
  • Summarizes the research and development
  • Specifies the interfaces and documents the code
• 1.4 Speech retrieval components – 10-15 pages (ALL – overall responsibility)
  • Summarizes the research and development
  • Specifies the interface and documents the DLLs
• 1.5 The vocal query component – 2-3 pages (ALL – overall responsibility)
  • Summarizes the research and development
  • Specifies the interface and documents the DLLs
• 1.6 Evaluation of the prototype – 5 pages (RSAMD – overall responsibility)
  • Module-based and integrated evaluation
• Contributors: ALL, SILOGIC, NICE, QMUL, RSAMD

Deliverable D3.2 – 1.2 The EASAIER system – SILOGIC
• Architecture of the EASAIER system – 5-8 pages (SILOGIC)
  • Retrieval in EASAIER
  • Role of the ontology
  • Client-side functionality
  • Server-side functionality
• Integration of retrieval engines
  • Music retrieval integration – 1-2 pages (SILOGIC & QMUL)
  • Speech retrieval integration – 1-2 pages (SILOGIC & ALL)
• The EASAIER user interface
  • Design – 2½ pages (SILOGIC)
  • Vocal query – ½ page (SILOGIC & ALL)
• QMUL sends material to SILOGIC; SILOGIC sends it to ALL by Dec. 1.

Deliverable D3.2 – 1.4 Speech retrieval components – ALL
• Speaker identification – 2-3 pages (NICE)
• Emotion detection – 1-2 pages (NICE)
• Gender detection – 1-2 pages (NICE)
• Detection of any other feature [language, accent, ...] – 1-2 pages (NICE)
• Speech segmentation – 1-2 pages (ALL)
• Speech content retrieval – 5-6 pages (ALL)
• NICE sends material to ALL by Dec. 1.

Deliverable D3.2 – 1.6 Evaluation of the prototype – RSAMD
• Soundbite (RSAMD & QMUL)
• Speaker retrieval (NICE)
• Vocal query – 1/3 page (ALL)
• Speech retrieval – 1/2 page (ALL)
• RSAMD sends material to ALL by Dec. 1.

D4.1 – algorithms from ALL
• Delivered with a test environment:
  • Speech/non-speech segmentation
  • Silence detection
  • In VAMP-compatible format
• No response has been received about integration
• ALL has finished its contribution to WP4
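ALL's segmentation algorithms are not described on this slide. As a rough illustration of what a silence detector of this kind does, the C sketch below uses a generic short-time energy threshold: it splits the signal into fixed-size frames, marks frames whose mean energy falls below a threshold as silent, and merges consecutive silent frames into segments. The `silence_segment` type, the `detect_silence` function and its parameters are hypothetical. The sketch does not use the VAMP plugin API, although each (start, length) segment is the kind of timed result with a duration that a VAMP segmenter reports.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical output record: one silent segment, measured in samples. */
typedef struct {
    size_t start;    /* index of the first sample in the segment */
    size_t length;   /* number of samples in the segment */
} silence_segment;

/*
 * Splits `samples` into frames of `frame_len` samples, treats a frame as
 * silent when its mean squared amplitude is below `threshold`, and merges
 * consecutive silent frames into segments. A trailing partial frame is
 * ignored. Returns the number of segments written to `out` (<= max_out).
 */
size_t detect_silence(const float *samples, size_t n, size_t frame_len,
                      float threshold, silence_segment *out, size_t max_out)
{
    assert(samples != NULL && out != NULL);
    if (frame_len == 0)
        return 0;

    size_t count = 0;
    int in_silence = 0;

    for (size_t f = 0; f + frame_len <= n; f += frame_len) {
        /* Short-time energy of the current frame. */
        double energy = 0.0;
        for (size_t i = 0; i < frame_len; i++)
            energy += (double)samples[f + i] * samples[f + i];
        energy /= (double)frame_len;

        if (energy < threshold) {
            if (!in_silence) {
                if (count == max_out)
                    break;                        /* no room for another segment */
                out[count].start = f;
                out[count].length = 0;
                count++;
                in_silence = 1;
            }
            out[count - 1].length += frame_len;   /* extend the current segment */
        } else {
            in_silence = 0;
        }
    }
    return count;
}
```

A production detector would typically also smooth the frame decisions and enforce a minimum silence duration; both are left out here for brevity.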

Vocal query – current status
• Phoneme-level recognition was sped up
• The English dictionary was taken from TIMIT; it may later be replaced by one built from the US Supreme Court corpus (or the two may be merged)
• The draft API was accepted for integration
• A dummy mock-up version is being prepared so that integration can start
• A working demo version will be presented
• The vocal query user interface has not been designed yet; this must be discussed with SILOGIC
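The slide does not say how the recognized phonemes are matched against dictionary entries. One common approach, shown here only as a sketch, is to compute the edit (Levenshtein) distance between the recognized phoneme sequence and each dictionary pronunciation, turn it into a percentage confidence, and keep the best entry at or above a threshold, similar in spirit to the `min_confid_percent` parameter of `vocal_query`. The `dict_entry` type, the function names and the toy phoneme symbols are assumptions, not ALL's implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical dictionary entry: a word and its phoneme transcription. */
typedef struct {
    const char *word;
    const char *const *phones;
    size_t n_phones;
} dict_entry;

/* Levenshtein distance between two phoneme sequences. */
static size_t phoneme_distance(const char *const *a, size_t na,
                               const char *const *b, size_t nb)
{
    size_t *prev = malloc((nb + 1) * sizeof *prev);
    size_t *cur = malloc((nb + 1) * sizeof *cur);
    if (prev == NULL || cur == NULL) {
        free(prev);
        free(cur);
        return na > nb ? na : nb;   /* fall back to the worst possible distance */
    }

    for (size_t j = 0; j <= nb; j++)
        prev[j] = j;                /* distance from the empty prefix of a */
    for (size_t i = 1; i <= na; i++) {
        cur[0] = i;
        for (size_t j = 1; j <= nb; j++) {
            size_t sub = prev[j - 1] + (strcmp(a[i - 1], b[j - 1]) != 0);
            size_t del = prev[j] + 1;
            size_t ins = cur[j - 1] + 1;
            size_t best = sub < del ? sub : del;
            cur[j] = best < ins ? best : ins;
        }
        size_t *tmp = prev;         /* current row becomes the previous row */
        prev = cur;
        cur = tmp;
    }

    size_t d = prev[nb];
    free(prev);
    free(cur);
    return d;
}

/* 100 for identical sequences, falling linearly to 0 as the distance reaches the longer length. */
int phoneme_confidence(const char *const *recognized, size_t nr,
                       const char *const *pron, size_t np)
{
    size_t longest = nr > np ? nr : np;
    if (longest == 0)
        return 100;
    size_t d = phoneme_distance(recognized, nr, pron, np);
    return (int)(100 * (longest - d) / longest);
}

/* Returns the best-matching word at or above min_confid_percent, or NULL. */
const char *best_match(const char *const *recognized, size_t nr,
                       const dict_entry *dict, size_t n_dict,
                       int min_confid_percent)
{
    assert(dict != NULL || n_dict == 0);
    const char *best = NULL;
    int best_conf = -1;
    for (size_t k = 0; k < n_dict; k++) {
        int c = phoneme_confidence(recognized, nr, dict[k].phones, dict[k].n_phones);
        if (c >= min_confid_percent && c > best_conf) {
            best = dict[k].word;
            best_conf = c;
        }
    }
    return best;
}
```

With a recognized sequence `k ae t`, the entry `cat` scores 100, `cut` (`k ah t`) scores 66 and `dog` scores 0, so `best_match` returns `cat` for any threshold up to 100.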

Speech retrieval – current status
• The draft speech API was accepted by the partners
• ALL refined the speech API specification (the new version will be shared together with the mock-up version)
• The interfaces were implemented
• The internal speech retrieval architecture is being redesigned to support multiple training threads and to run as a server
• The English version of speech retrieval is under development
• Performance tuning of the English and Hungarian versions is running in parallel and will continue until mid-2008
• A dummy mock-up version for integration is being prepared

Speech retrieval integration – slide 1
• It will be a server application (NT service)
• We propose omitting the C++ layer and providing only the web service layer through our Java-based server application; please accept!
• Java -> C++ wrapping is impractical and severely degrades performance
• ALL will deliver JAR, DLL and EXE files; the DLLs and EXEs are not platform-independent
• The DLLs and EXEs run on Windows XP/2000/2003
• On request, ALL can build the DLLs for another platform
• Our Java source code is platform-independent, but it will not be shared

Speech retrieval integration – slide 2
• Methods of the speech API
  • Asynchronous processes
    • Training
      • char* speech_train(char* xml_descriptor);
    • Preprocessing
      • char* speech_preprocess(char* xml_descriptor);
  • Status queries
    • History
      • speech_process speech_history(char* proc_id);
    • Repository
      • speech_train* speech_repository();
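The asynchronous methods return immediately with a process identifier, and the caller queries the status later. A minimal sketch of that pattern in C with POSIX threads follows; it also reflects the planned support for multiple training threads. Everything in it is hypothetical: `submit_training`, `job_status`, `wait_job`, the `proc-N` id format and the fixed-size job table are illustrative stand-ins, and `run_training` only checks the descriptor instead of training a model. It does not reproduce the signature or behaviour of `speech_train` or `speech_history`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum { JOB_RUNNING, JOB_DONE, JOB_FAILED } job_state;

/* One asynchronous job; the table below stands in for server-side state. */
typedef struct {
    char proc_id[16];
    char *xml_descriptor;   /* private copy, owned by the worker thread */
    job_state state;
    pthread_t thread;
} job;

#define MAX_JOBS 32
static job jobs[MAX_JOBS];
static int n_jobs = 0;
static pthread_mutex_t jobs_lock = PTHREAD_MUTEX_INITIALIZER;

/* Worker thread: placeholder for real training, it only checks the descriptor. */
static void *run_training(void *arg)
{
    job *j = arg;
    assert(j->xml_descriptor != NULL);
    int ok = strstr(j->xml_descriptor, "<train") != NULL;
    free(j->xml_descriptor);
    j->xml_descriptor = NULL;

    pthread_mutex_lock(&jobs_lock);
    j->state = ok ? JOB_DONE : JOB_FAILED;
    pthread_mutex_unlock(&jobs_lock);
    return NULL;
}

/* Starts a job and returns its id without waiting, or NULL on failure. */
const char *submit_training(const char *xml_descriptor)
{
    size_t len = strlen(xml_descriptor) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return NULL;
    memcpy(copy, xml_descriptor, len);

    pthread_mutex_lock(&jobs_lock);
    job *j = n_jobs < MAX_JOBS ? &jobs[n_jobs] : NULL;
    if (j != NULL) {
        snprintf(j->proc_id, sizeof j->proc_id, "proc-%d", n_jobs);
        j->xml_descriptor = copy;
        j->state = JOB_RUNNING;
        if (pthread_create(&j->thread, NULL, run_training, j) == 0)
            n_jobs++;
        else
            j = NULL;
    }
    pthread_mutex_unlock(&jobs_lock);
    if (j == NULL)
        free(copy);
    return j != NULL ? j->proc_id : NULL;
}

/* Looks up a job by id; the caller must hold jobs_lock. */
static job *find_job(const char *proc_id)
{
    for (int k = 0; k < n_jobs; k++)
        if (strcmp(jobs[k].proc_id, proc_id) == 0)
            return &jobs[k];
    return NULL;
}

/* Non-blocking status query; returns -1 for an unknown id. */
int job_status(const char *proc_id)
{
    pthread_mutex_lock(&jobs_lock);
    job *j = find_job(proc_id);
    int s = j != NULL ? (int)j->state : -1;
    pthread_mutex_unlock(&jobs_lock);
    return s;
}

/* Blocks until the job ends; call at most once per job (it joins the thread). */
int wait_job(const char *proc_id)
{
    pthread_mutex_lock(&jobs_lock);
    job *j = find_job(proc_id);
    pthread_mutex_unlock(&jobs_lock);
    if (j == NULL)
        return -1;
    pthread_join(j->thread, NULL);
    return job_status(proc_id);
}
```

A client would normally poll `job_status` (the analogue of a history query) rather than block in `wait_job`; the blocking call is there only to make a demonstration deterministic.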

Speech retrieval integration – slide 3
• Interactive methods
  • Retrieval
    • speech_descriptor* speech_retrieve(SpeechLang lang, char* words, int max_hits, int min_confid_percent);
  • Vocal query
    • char* vocal_query(SpeechLang lang, byte* wave, int min_confid_percent);
• Fast response is required
• A server architecture is required to avoid repeated initialization
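The interactive methods need a fast response, which is why initialization should happen once per server process rather than on every call. The sketch below illustrates both points under stated assumptions: `pthread_once` guarantees that the (here empty) model loading runs exactly once, and `rank_hits` applies the `min_confid_percent` filter and the `max_hits` limit to a list of candidate matches. The `hit` struct, `load_models` and `rank_hits` are hypothetical; the real `speech_descriptor` layout is not given on the slide.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical hit record; the real speech_descriptor layout is not given. */
typedef struct {
    const char *recording;   /* archive item identifier */
    double start_sec;        /* start of the match within the recording */
    int confid_percent;      /* recognizer confidence for this match */
} hit;

static pthread_once_t init_once = PTHREAD_ONCE_INIT;
static int init_count = 0;   /* counts initializations, for illustration only */

/* Stand-in for expensive model loading; runs once per server process. */
static void load_models(void)
{
    init_count++;
}

/* qsort comparator: higher confidence first. */
static int by_confidence_desc(const void *pa, const void *pb)
{
    const hit *a = pa;
    const hit *b = pb;
    return (b->confid_percent > a->confid_percent) -
           (b->confid_percent < a->confid_percent);
}

/*
 * Keeps candidates with confid_percent >= min_confid_percent, sorts them by
 * descending confidence and returns how many leading entries of `out` to
 * report (at most max_hits). `out` must have room for n entries.
 */
size_t rank_hits(const hit *candidates, size_t n, int max_hits,
                 int min_confid_percent, hit *out)
{
    assert(out != NULL);
    pthread_once(&init_once, load_models);   /* models load on the first call only */

    if (max_hits <= 0)
        return 0;

    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (candidates[i].confid_percent >= min_confid_percent)
            out[kept++] = candidates[i];

    qsort(out, kept, sizeof *out, by_confidence_desc);
    return kept < (size_t)max_hits ? kept : (size_t)max_hits;
}
```

Because the loading cost is paid once, later calls only do the filtering and sorting, which is the motivation for running speech retrieval as a long-lived server.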

Speech retrieval – common English corpus
• The US Supreme Court corpus is suitable for ALL's speech retrieval purposes
• Will it be used by the integrated EASAIER system as well?
• Will NICE use it as well?
• Is there any other corpus to be used for speech retrieval in the project?

DEMO
• Sped-up vocal query in English and Hungarian
• Speech API demo via web services
  • Vocal query
  • Retrieval
