11th ECESS Meeting: College Language Resources
• 0. Taking minutes for College ‘Language Resources’
• 1. Goal of meeting
• 2. Status of College members
• 3. Interests and acceptance of associated members and observers
• 4. Acceptance of College minutes of last meeting
• 5. College Action List of 10th meeting
• 6. Status of partners
  • Pronunciation lexica (Pool Lex1, Pool Lex2)
  • Acoustic data for TTS voices (Pool Voice1, Pool Voice2)
  • Text Corpora (Pool Text1, Pool Text2)
• 7. Current state of LR specifications
  • Accepting the specification for Text Corpora (Pool Text1, Pool Text2)
  • Accepting the specification for Acoustic data for TTS voices (minimal requirements, Pool Voice2)
• 8. Further plans of partners
• 9. Discussion: General issues
  • ECESS LR specification documents (public webpage)
  • LR distribution (internal page)
  • Splitting LRs
• 10. Discussion: Further directions of LR College
  • Extension of LR collection (new languages)
  • Specification for new types of Pools
  • Publications, promotion of ECESS LR
• 11. New Action List of College
Language Resources College, 11th ECESS meeting
1. Goal of Meeting
• Status and further plans of partners
• Interests and acceptance of members, associated members and observers
• Accepting the specification for Text Corpora (Pool Text1, Pool Text2)
• Finalizing the specification for Acoustic data for TTS voices (Pool Voice2)
• ECESS LR specification documents (public and internal page)
• Extension of LR collection
2. Status of College members
Current members of LR College
• AMU (coordinator: Grażyna Demenko)
• Siemens (Ute Ziegenhain)
• Middle East Technical University, Ankara (Tolga Çiloğlu)
• CAS (Jinhua Tao)
• Uni Munich (Uwe Reichel)
Associated partners and observers
• Nokia (Imre Kiss)
• Microsoft Portugal (Daniela Braga)
• University of Bielefeld (Dafydd Gibbon)
• CNRS Aix-en-Provence (Daniel Hirst)
3. Interests and acceptance of associated members and observers
Vote on membership of LR College:
• CNRS, Aix-en-Provence (Daniel Hirst)
• University of Bielefeld (Dafydd Gibbon)
• Others potentially interested in LR?
4. Acceptance of College minutes of last meeting
• Introduction of the agenda
• Dafydd Gibbon (Uni Bielefeld) wants to contribute (MBROLA diphone voice, German lexicon)
• CNRS wants to become a member of LR College
• Present resources: UK lexicon, UK baseline voice, Mandarin lexicon, Mandarin voice, Polish lexicon (extended format), Catalan (the UK baseline voice and the Polish lexicon still have to be validated)
• POS tagging still has to be specified (size of text, domains, tokenisation problems, tag set, format of POS tags, validation)
• Minimal requirements for voice recordings (Hartmut Pfitzinger)
• Plans of partners (table of supported languages)
• Discussion, general issues:
  • settled documents are on the public webpage; documents still under discussion will be on the internal page only
  • agreed specifications will be renamed as ECESS versions, no longer TC-STAR
  • splitting LRs, e.g. the phonetic lexicon: proper names should be put in a separate lexicon, because they are task-specific, may confuse the OOV routines, and increase production costs
  • in the "Tools" college, Maribor acts as distributor of the tools needed for evaluation
  • promotion of ECESS LR (LREC 2008)
  • extension of LR collection (new pools, languages)
5. College Action List of 10th meeting
• Finalizing specifications for Text Corpora (POS): PT1, PT2
• Finalizing specifications for Acoustic data for TTS voices (PV2)
• Lexicons PL1, PL2: final documentation and validation reports to be published on the internal ECESS pages
• Extension of LR collection (new types of Pools, e.g. speaker characterization, emotional, pathological voices/speech)
6. Status of partners
• Pronunciation lexica (Pool Lex1, Pool Lex2)
• Acoustic data for TTS voices (Pool Voice1, Pool Voice2)
• Text Corpora (Pool Text1, Pool Text2)
7. Current state of LR specifications
• Accepting the specification for Text Corpora (Pool Text): Ute Ziegenhain, Siemens; tagged text corpora (end of Sept.)
• Finalizing the specification for Acoustic data for TTS voices (Pool Voice2): IPDS Kiel
• Preparing the Polish lexicon (extended version) for validation
Uni Bielefeld: Input for ECESS
The topics proposed so far by the Bielefeld partner are based on current Bielefeld activities and need to be adapted to ECESS needs. After further discussion, it is suggested that the top priority should be lexicon design, i.e. a formal specification and an XML model for a flexible lexicon format that permits extension in the following areas:
a) Multilingual lexicon for speech synthesis
b) Integrated lexicon for multimodal speech synthesis (e.g. gesture sublexicons)
c) Integrated lexicon for NLP and synthesis components
A demonstration core lexicon for German is being prepared.
9. Discussion: General issues
• ECESS LR specification documents (public page): the language-independent specification is public and should be accessible from the public webpage.
• LR distribution (internal webpage): contact information
• LSP specifications (internal page): the language-specific data (LSP: language-specific peculiarities) is part of the LR dedicated to a pool. The LSPs have to be approved by the LR College and placed on the internal ECESS webpage (College LR).
• Splitting LRs: the data in the lexicon pool could be divided into a lexicon of common words and a lexicon of proper names, so that partners interested only in parts of the lexica could choose what they want to deliver and exchange. Advantages: some partners may only want to deliver or obtain certain parts of a particular language, and production costs for the different parts are more comparable.
10. Discussion: Further directions of LR College
• Extension of LR collection: new types of Pools (e.g. acoustic databases for speaker characterization, emotional databases, special databases with pathological voices/speech), depending on the interests and needs of ECESS; inclusion of new languages
• Specification for new types of Pools: preliminary remarks
• Promotion of ECESS LR, publications: SASR, Poland 2008; update the publication list
11. New Action List of College
• Make available to partners and decide on Ute's specifications (end of Sept.)
• Promotion of ECESS activities at the SASR Workshop, Poland 2008 (flyers, presentation) (AW)
• LR publications / SASR / Poland 2008 (AW)
• Emotional databases (exchange of information) (IH)
• Specifications for the acoustic data: make the information available (Hartmut) (AW)
• Lexicon (PL) evaluation (AW)
• Availability of lexica (splitting) (AW)
• Collect information about lexica for inflected languages (adding a new specification) (ZK)