
WP3 – Retrieval systems (Susan Terge, project manager, ALL, Hungary)


Presentation Transcript


  1. WP3 – Retrieval systems
     Susan Terge, project manager, ALL, Hungary
     Enabling Access to Sound Archives through Integration, Enrichment and Retrieval

  2. Introduction to Workpackage: participants & schedule
     • Participants and their contributions
        • QMUL: 22 mm – music & cross-media retrieval
        • DIT: 5 mm – music retrieval
        • ALL: 24 mm – speech retrieval & vocal queries
        • LFUI: 3 mm – integration of retrieval engines
        • NICE: 12 mm – retrieval for attributes of speech
     • Schedule
        • T3.1 Music retrieval: month 9 – month 20
        • T3.2 Speech retrieval: month 8 – month 20
        • T3.3 Cross-media retrieval: month 1 – month 6 and month 21 – month 26
        • T3.4 Vocal query interface: month 7 – month 20

  3. Milestones
     • M3.1 Initial vocal query system tested, initial speech and music retrieval algorithms developed (Month 8)
     • M3.2 Vocal query is fully functional, speech and music retrieval implemented, cross-media retrieval method finalized (Month 14)
     • M3.3 Vocal query finished, speech and music retrieval systems established, basic cross-media retrieval implemented (Month 20)
     • M3.4 Cross-media retrieval fully functional, further work is only refinement and optimization (Month 26)

  4. Deliverables
     • D3.1 Report outlining retrieval system functionality and specification (Month 6)
     • D3.2 Prototype on speech and music retrieval systems with vocal query interface (Month 20 – 2008 Jan. 1.)
        • Schedule:
           • Nothing has arrived so far!
           • 2007 Dec. 7. – materials arrive from the partners responsible for chapters: QMUL, SILOGIC, RSAMD
           • 2007 Dec. 12. – first draft done and shared by ALL
           • 2007 Dec. 19. – refinements accepted from the partners responsible for chapters, and uniform formatting done by ALL
           • 2007 Dec. 20. – finalized and shared with partners by ALL
           • 2008 Jan. 10. – confirmed and accepted by all partners and the coordinator
     • D3.3 Prototype on cross-media retrieval system (Month 26 – 2008 July 1.)

  5. Deliverable D3.2 – Prototype on speech and music retrieval with vocal query interface
     1.1 Introduction – 5 pages (ALL – generally responsible)
     1.2 The EASAIER system – 10-15 pages (SILOGIC – generally responsible)
        • System architecture
        • Integration of components
     1.3 Music retrieval components – 10-15 pages (QMUL – generally responsible)
        • Summarizes the research and development
        • Specifies its interfaces and documents the code
     1.4 Speech retrieval components – 10-15 pages (ALL – generally responsible)
        • Summarizes the research and development
        • Specifies its interface and documents the DLLs
     1.5 The vocal query component – 2-3 pages (ALL – generally responsible)
        • Summarizes the research and development
        • Specifies its interface and documents the DLLs
     1.6 Evaluation of the prototype – 5 pages (RSAMD – generally responsible)
        • Module-based and integrated evaluation
     Contributors: ALL, SILOGIC, NICE, QMUL, RSAMD

  6. Deliverable D3.2 – 1.2 The EASAIER system – SILOGIC
     • Architecture of the EASAIER system – 5-8 pages (SILOGIC)
        • Retrieval in EASAIER
        • Role of the ontology
        • Client side functionality
        • Server side functionality
     • Integration of retrieval engines
        • Music retrieval integration – 1-2 pages (SILOGIC & QMUL)
        • Speech retrieval integration – 1-2 pages (SILOGIC & ALL)
     • The EASAIER user interface
        • Design – 2½ pages (SILOGIC)
        • Vocal query – ½ page (SILOGIC & ALL)
     • QMUL sends material to SILOGIC, SILOGIC sends to ALL by Dec. 1.

  7. Deliverable D3.2 – 1.5 Speech retrieval components – ALL
     • Speaker identification – 2-3 pages (NICE)
     • Emotion detection – 1-2 pages (NICE)
     • Gender detection – 1-2 pages (NICE)
     • …[any other feature – language, accent ...]... detection – 1-2 pages (NICE)
     • Speech segmentation – 1-2 pages (ALL)
     • Speech content retrieval – 5-6 pages (ALL)
     • NICE sends material to ALL by Dec. 1.

  8. Deliverable D3.2 – 1.6 Evaluation of the prototype – RSAMD
     • Soundbite (RSAMD & QMUL)
     • Speaker retrieval (NICE)
     • Vocal query – 1/3 page (ALL)
     • Speech retrieval – 1/2 page (ALL)
     • RSAMD sends material to ALL by Dec. 1.

  9. D4.1 – algorithms from ALL
     • Delivered with test environment
     • Speech/non-speech segmentation
     • Silence detection
     • In VAMP-compatible format
     • We have not received any response about integration
     • ALL has finished its contribution to WP4

  10. Vocal query – current status
      • Phoneme-level recognition has been sped up
      • The English dictionary was taken from TIMIT; it can later be replaced by one from the US Supreme Court corpus (or perhaps merged with it)
      • The draft API interface was accepted for integration
      • A dummy mock-up version to start integration is under preparation
      • A working demo version will be presented
      • The vocal query user interface has not been designed yet?! We have to discuss it with SILOGIC

  11. Speech retrieval – current status
      • The draft speech API interface was accepted by partners
      • The speech API interface specification was refined by ALL (the new version will be shared together with the mock-up version)
      • The interfaces were implemented
      • The inner speech retrieval architecture is being redesigned to support multiple training threads and to work as a server
      • The English version of the speech retrieval is under development
      • Parallel performance tuning for the English and Hungarian versions is in progress and will continue until mid-2008
      • A dummy mock-up version for integration is under preparation

  12. Speech retrieval integration – slide 1
      • It will be a server application (NT service)
      • It would be desirable to omit the C++ layer and provide only the web service layer from our Java-based server application – please accept!
      • Java -> C++ wrapping is unreasonable and destroys performance
      • ALL will deliver JAR, DLL and EXE files; the DLLs and EXEs will not be platform independent
      • The DLLs and EXEs will run on Windows XP/2000/2003
      • On request, ALL can produce another version of the DLLs for a new platform
      • Our source (Java) is platform independent; we do not share it

  13. Speech retrieval integration – slide 2
      • Methods of the speech API interface
      • Asynchronous processes
         • Training
            • char* speech_train(char* xml_descriptor);
         • Preprocessing
            • char* speech_preprocess(char* xml_descriptor);
      • Status investigation
         • History
            • speech_process speech_history(char* proc_id);
         • Repository
            • speech_train* speech_repository();
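While the real server is unavailable, the asynchronous calls on this slide could be exercised against a small in-memory mock. The C++ sketch below is only an illustration. It assumes that `speech_train` returns a process id that can later be passed to `speech_history`, which the slides do not state. The `ProcState` values, the fields of `speech_process`, the `mock` namespace and the `finish` helper are hypothetical, and `const char*` is used in place of the slide's `char*`.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical job states; the slides do not define them.
enum class ProcState { Unknown, Running, Finished };

// The slides name the type `speech_process` but not its fields;
// these two fields are assumptions made for the mock.
struct speech_process {
    std::string proc_id;
    ProcState state;
};

namespace mock {

// In-memory job table standing in for the real speech server.
inline std::map<std::string, ProcState>& jobs() {
    static std::map<std::string, ProcState> table;
    return table;
}

// Mock of speech_train: registers a job and returns at once.
// Assumption: the returned string is a process id. The pointer is only
// valid until the next call, so callers should copy it.
inline const char* speech_train(const char* xml_descriptor) {
    static std::string last_id;
    (void)xml_descriptor;  // the mock does not parse the descriptor
    last_id = "train-" + std::to_string(jobs().size() + 1);
    jobs()[last_id] = ProcState::Running;
    return last_id.c_str();
}

// Mock of speech_history: reports the state of a job by id.
// proc_id must not be null.
inline speech_process speech_history(const char* proc_id) {
    auto it = jobs().find(proc_id);
    if (it == jobs().end()) return {proc_id, ProcState::Unknown};
    return {proc_id, it->second};
}

// Hypothetical helper that simulates a background job completing.
inline void finish(const char* proc_id) {
    jobs()[proc_id] = ProcState::Finished;
}

}  // namespace mock
```

A caller would copy the id returned by `mock::speech_train` into a `std::string`, then poll `mock::speech_history` with it until the state is `ProcState::Finished`.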

  14. Speech retrieval integration – slide 3
      • Interactive methods
         • Retrieval
            • speech_descriptor* speech_retrieve(SpeechLang lang, char* words, int max_hits, int min_confid_percent);
         • Vocal query
            • char* vocal_query(SpeechLang lang, byte* wave, int min_confid_percent);
      • Fast response is required
      • Server architecture is required to avoid repeated initialization
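The two requirements above (fast response, no repeated initialization) can be illustrated with a mock in which the expensive engine is built once per process and every retrieval call reuses it. This C++ sketch rests on assumptions: the fields of `speech_descriptor`, the `SpeechLang` enumerator names, the `Engine` type, the sample index data and the `engine_loads` counter are all hypothetical. The mock also returns a `std::vector` rather than the `speech_descriptor*` shown on the slide, and it ignores the query words.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// English and Hungarian are the languages named in the slides;
// the enumerator names are assumptions.
enum class SpeechLang { English, Hungarian };

// Hypothetical hit record; the slides do not list the fields.
struct speech_descriptor {
    std::string document;
    int confid_percent;
};

namespace mock {

// Counts how many times the engine was constructed.
inline int& engine_loads() {
    static int n = 0;
    return n;
}

// Stand-in for models that are expensive to load.
struct Engine {
    std::vector<speech_descriptor> index;
    Engine() : index{{"rec_a", 92}, {"rec_b", 55}, {"rec_c", 78}, {"rec_d", 30}} {
        ++engine_loads();
    }
};

// One engine per process: built on the first call, reused afterwards.
inline Engine& engine() {
    static Engine instance;
    return instance;
}

// Mock retrieval: keeps hits at or above the confidence threshold,
// orders them best first, and truncates to max_hits (negative means 0).
inline std::vector<speech_descriptor> speech_retrieve(SpeechLang lang,
                                                      const std::string& words,
                                                      int max_hits,
                                                      int min_confid_percent) {
    (void)lang;
    (void)words;  // the mock ignores the query itself
    std::vector<speech_descriptor> hits;
    for (const auto& d : engine().index) {
        if (d.confid_percent >= min_confid_percent) hits.push_back(d);
    }
    std::sort(hits.begin(), hits.end(),
              [](const speech_descriptor& a, const speech_descriptor& b) {
                  return a.confid_percent > b.confid_percent;
              });
    const std::size_t limit = static_cast<std::size_t>(std::max(max_hits, 0));
    if (hits.size() > limit) hits.resize(limit);
    return hits;
}

}  // namespace mock
```

With the sample index, `mock::speech_retrieve(SpeechLang::English, "court", 2, 50)` yields `rec_a` followed by `rec_c`, and repeated calls leave `mock::engine_loads()` at 1. A long-running server process gets the same effect by keeping its models in memory between requests.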

  15. Speech retrieval – common English corpus
      • The US Supreme Court corpus is suitable for ALL's speech retrieval purposes
      • Will it be used by the integrated EASAIER system as well?
      • Will it be used by NICE as well?
      • Is there any other corpus to be used in the project for speech retrieval?

  16. DEMO
      • Sped-up vocal query in English and in Hungarian
      • Speech API demo via web services
         • Vocal query
         • Retrieval
