
CLEF-2007 Cross-Language Speech Retrieval Track Overview

CU: Pavel Pecina, Jan Hajic, Petra Hoffmannova; DCU: Gareth Jones, Ying Zhang; UMD: Doug Oard, Dagobert Soergel, Scott Olsson; IBM: Bhuvana Ramabhadran; JHU: Bill Byrne (Cambridge), Zak Shafran (OHSU); USC: Sam Gustman


Presentation Transcript


  1. CLEF-2007 Cross-Language Speech Retrieval Track Overview CU: Pavel Pecina, Jan Hajic, Petra Hoffmannova DCU: Gareth Jones, Ying Zhang UMD: Doug Oard, Dagobert Soergel, Scott Olsson IBM: Bhuvana Ramabhadran JHU: Bill Byrne (Cambridge), Zak Shafran (OHSU) USC: Sam Gustman UWB: Pavel Ircing

  2. Speech “Retrieval” Evaluations • 1996-1998: TREC SDR • English broadcast news; English queries • 1997-2004: TDT • multilingual news; query by example • 2003-2004: CLEF CL-SDR • English broadcast news; Multilingual queries • 2005-2007: CLEF CL-SR • English/Czech interviews; Multilingual queries • 2007: CLEF QAST • English lectures/meetings, English questions

  3. What’s New in 2007? • Czech • Fixed “quickstart” time alignment problem! • 29 training topics, 42 new evaluation topics • 3 new teams (Brown, Chicago, Charles U) • English • 17% relative improvement over 2006 (TD, ASR) • 4 new teams (Brown, Chicago, Jaen, Amsterdam) • Same topics and ASR as 2006 • 63 training topics, 33 evaluation topics

  4. CLEF-2007 Cross-Language Speech Retrieval Track Overview

  5. English ASR • Systems shown: ASR2003A, ASR2004A, ASR2006A • Training: 200 hours from 800 speakers

  6. <DOC> <DOCNO>VHF00009-056154.003</DOCNO> <INTERVIEWDATA> Sidonia L... | 1927 | Shaindl | L... | Sydzia </INTERVIEWDATA> <NAME>Issac L..., Cyla L...</NAME> <MANUALKEYWORD> Shabbat | Jewish identity | customs and observances, Jewish | Przemysl (Poland) | food | Poland 1918 (November 11) - 1939 (August 31) </MANUALKEYWORD> <SUMMARY>SL recounts her daily activities. She notes her family's Jewish identity and she talks about a typical Shabbat. SL describes cholent.</SUMMARY> <ASRTEXT2003A>december when you talk about what your daily life was like well after realized i was a little girl going to public school during the day after works there were all the lessons and activities in the winter either an awful lot of ice skating and i was a very good ice skater figure i scared that then i would play with friends on the weekends there are things some of =em now are were my parents were religious that meant that on saturday i couldn't do very much i they went to the services i was hanging on outside was uh the kids making all the noise but afterwards we had to go home any … <ASRTEXT2004A>december when you talk about what your daily life was like well after realized i was a little girl going to public school during the day after works there were all the lessons and activities in the winter either an awful lot of ice skating and i was a very good ice skater figure i scared that then i would play with friends on the weekends there are things some of =em now are were my parents were religious that meant that on saturday i couldn't do very much i they went to the services i was hanging on outside was uh the kids making all the noise but afterwards we had to go home … <ASRTEXT2006A>december when you talk about what your daily life was like well after realized i was a little girl going to public school during the day after works there were all the lessons and activities in the winter either an awful lot of ice skating and i was a very good ice skater figure i scared that then i 
would play with friends on the weekends there are things some of =em now are were my parents were religious that meant that on saturday i couldn't do very much i they went to the services i was hanging on outside was uh the kids making all the noise but afterwards we had to go home any … <ASRTEXT2006B>december when you talk about what your daily life was like well after realized i was a little girl going to public school during the day after works there were all the lessons and activities in the winter either an awful lot of ice skating and i was a very good ice skater figure i scared that then i would play with friends on the weekends there are things some of =em now are were my parents were religious that meant that on saturday i couldn't do very much i they went to the services i was hanging on outside was uh the kids making all the noise but afterwards we had to go home any … <AUTOKEYWORD2004A1> cultural and social activities | customs and observances, Jewish | family life | food | Shabbat | sports and games | education | family homes | grandmothers | education, Jewish | Jewish-gentile relations | schools | synagogues | Polish (language) | working life | photographs (stills) 1930s | Poland 1918 (November 11) - 1939 (August 31) | Poland 1935 (May 13) - 1939 (August 31) | Cracow (Poland) | Germany 1918 (November 11) - 1939 (August 31) </AUTOKEYWORD2004A1> <AUTOKEYWORD2004A2> Poland 1918 (November 11) - 1939 (August 31) | customs and observances, Jewish | education | cultural and social activities | extended family members | education, Jewish | family life | Jewish-gentile relations | Jewish identity | Hungary 1918 (November 11) - 1939 (August 31) | Shabbat | sports and games | Budapest (Hungary) | Poland 1941 (June 21) - 1944 (July 21) | synagogue attendance | Hungary 1939 (September 1) - 1944 (March 18) | food in the ghettos | forced labor in the ghettos | fate of loved ones | food </AUTOKEYWORD2004A2> </DOC>

  7. An English Topic Number: 1148 Title: Jewish resistance in Europe Description: Provide testimonies or describe actions of Jewish resistance in Europe before and during the war. Narrative: The relevant material should describe actions of only- or mostly Jewish resistance in Europe. Both individual and group-based actions are relevant. Type of actions may include survival (fleeing, hiding, saving children), testifying (alerting the outside world, writing, hiding testimonies), fighting (partisans, uprising, political security) Information about undifferentiated resistance groups is not relevant.

  8. Automatic English TD Runs • Teams shown: Ottawa, DCU, Brown, Chicago, Amsterdam

  9. Automatic English TD Runs AK1 = AUTOKEYWORD2004A1, AK2 = AUTOKEYWORD2004A2, ASR03 = ASRTEXT2003A, ASR04 = ASRTEXT2004A, ASR06A = ASRTEXT2006A, and ASR06B = ASRTEXT2006B.

  10. Wilcoxon Signed-Rank Test
      • UO ‒ DCU: number of nonzero tests 33; sum of the signed ranks 86.000000; 95.0% level of confidence 184.129807. Methods cannot be separated.
      • UO ‒ BLLIP: number of nonzero tests 33; sum of the signed ranks 147.000000; 95.0% level of confidence 184.129807. Methods cannot be separated.
      • DCU ‒ BLLIP: number of nonzero tests 32; sum of the signed ranks 12.000000; 95.0% level of confidence 175.945801. Methods cannot be separated.
      • BLLIP ‒ UC: number of nonzero tests 33; sum of the signed ranks 198.000000; 95.0% level of confidence 184.129807. Method 1 is better than method 2.
      • UC ‒ UVA: number of nonzero tests 33; sum of the signed ranks 138.000000; 95.0% level of confidence 184.129807. Methods cannot be separated.
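
The thresholds quoted on this slide are consistent with a normal approximation to the signed-rank sum using z = 1.645. That is an inference from the numbers, not something the slide states, and the function names below are illustrative. A minimal sketch under that assumption:

```python
import math

# Assumed z-value: it reproduces the thresholds printed on the slide,
# but the slide does not say how they were computed.
Z_95 = 1.645

def signed_rank_threshold(n_nonzero: int, z: float = Z_95) -> float:
    """Critical value for the signed-rank sum under a normal approximation.

    Under the null hypothesis the signed-rank sum has mean 0 and
    variance n(n+1)(2n+1)/6, where n is the number of nonzero differences.
    """
    variance = n_nonzero * (n_nonzero + 1) * (2 * n_nonzero + 1) / 6
    return z * math.sqrt(variance)

def separable(signed_rank_sum: float, n_nonzero: int) -> bool:
    """True when the signed-rank sum exceeds the critical value in magnitude."""
    return abs(signed_rank_sum) > signed_rank_threshold(n_nonzero)

# Signed-rank sums and nonzero counts as reported on the slide.
comparisons = {
    "UO - DCU": (86.0, 33),
    "UO - BLLIP": (147.0, 33),
    "DCU - BLLIP": (12.0, 32),
    "BLLIP - UC": (198.0, 33),
    "UC - UVA": (138.0, 33),
}

for name, (w, n) in comparisons.items():
    print(f"{name}: W={w:.0f}, threshold={signed_rank_threshold(n):.3f}, "
          f"separable={separable(w, n)}")
```

Only the BLLIP ‒ UC comparison exceeds its threshold, which matches the single "Method 1 is better than method 2" verdict on the slide.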

  11. Monolingual vs. Cross-lingual (Automatic TD Runs)

  12. Pavel Pecina pecina@ufal.mff.cuni.cz CLEF 2007: The CL-SR Czech Track

  13. What’s Different in Czech? • Lack of manual interview annotation • no topic boundaries (start and stop times) • no description (summary, assessors' scratchpad) • English labels (assigned thesaurus terms) not used • Unknown-boundary relevance assessment • manual labeling of start and stop times • Task: to identify appropriate replay points • focus on start times; stop times ignored • modified mGAP used as the evaluation measure • penalty for inexact matches

  14. Interviews • Czech Holocaust survivors' testimonies • 357 mostly seen speakers, ~565 hours • 35% mean ASR Word Error Rate 2007 Quickstart collection • 11,377 automatically generated overlapping passages • average passage duration 3.75 min, 33% overlap • fields: DOCNO, INTERVIEWDATA, ASRSYSTEM, CHANNEL, ASRTEXT • no thesaurus terms used in 2007
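
The passage figures fit together arithmetically: a 3.75 min passage is 225 sec, and slide 22 states that quickstart documents begin every 150 sec, so consecutive passages share 75 sec, one third of a passage. The sketch below assumes fixed-length windows, which simplifies the "average" duration the slide reports; the function names are illustrative.

```python
def passage_windows(total_sec: float, length_sec: float = 225.0,
                    step_sec: float = 150.0) -> list[tuple[float, float]]:
    """Overlapping (start, end) windows covering a recording.

    Assumes fixed-length passages; the slide only gives an average duration.
    """
    windows = []
    start = 0.0
    while start < total_sec:
        # The last window is clipped at the end of the recording.
        windows.append((start, min(start + length_sec, total_sec)))
        start += step_sec
    return windows

def overlap_fraction(length_sec: float = 225.0, step_sec: float = 150.0) -> float:
    """Share of each passage that is repeated in the next one."""
    return (length_sec - step_sec) / length_sec

print(passage_windows(600.0))
print(f"overlap: {overlap_fraction():.1%}")
```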

  15. Interview Usage [Figure: interview usage by minutes from interview start (axis 0 to 180), showing the ASR Train portion. Czech: 335 Seen, 22 Unseen. English: 800 ASR Train, 297 IR Eval.]

  16. <DOCNO>VHF04106-7401.30</DOCNO> <INTERVIEWDATA>Tommy K...-K...</INTERVIEWDATA> <ASRSYSTEM>2006</ASRSYSTEM> <CHANNEL>right</CHANNEL> <ASRTEXT>PŘIVEZLI VĚZNĚ NA NOSÍTKÁCH A PROSTĚ MUSEL TADY BYL SKUTEČNĚ KAŽDÝ KDO TO V TEREZÍNĚ V TÉ DOBĚ BYL ALE STEJNĚ VÝSLEDEK BYL MYSLÍM ŽE MLUVÍ </s> <s> STOJÍM PROSTORU KDE </s> <s> AŽ DO KONCE KVĚTNA ČTYŘICET TŘI PŘICHÁZELY TRANSPORTY Z BOHUŠOVIC TEĎ JEŠTĚ CHODILI PĚŠKY ALE POTOM UŽ JE TO DEVĚT ČASTO MOJE TO ZACHOVALA ČÁST KOLEJÍ KTERÉ VEDLY AŽ TAK ZA HAMBURSKÝ KASÁRNA TAM POTOM UŽ OD KVĚTNA ČTYŘICET TŘI DOCHÁZELY VŠECHNY TRANSPORTY JAK Z VENKU TAKÉ OPAČNĚ KDYŽ TEREZÍNA ŠLY TRANSPORTY DO TAKOVÉHO </s> <s> ZA NÁMI TO BYLA JSOU HAMBURSKÝ KASÁRNA TAM JSOU HANNOVERSKÝCH KASÁRNÁCH TADY NA NA TOM O TOM PROSTORU PŘED TOU HROMADOU VĚCÍ MIMOŘÁDNĚ ŠPATNÉ I TADY TŘEBA MÍSTO KDE PŘICHÁZELY TRANSPORTY KDYŽ ČILI DIALOSTECKÝ DĚTI DIALOSTECKÝ DĚTI DVANÁCTSET NEBO KDYŽ POTOM CHODILY TRANSPORTY Z NĚMECKA TAK PROSTĚ NA TEN NA TĚCH KOLEJÍCH SE DĚLALO JAK TRANSPORTOVÁNI LIDI CO PŘICHÁZELI SEM TAK TRANSPORTOVÁNI TY VĚCI DÁL DO OSVĚTIMI NEJVĚTŠÍ TRANSPORTY SE TO ZÁŘÍ ČTYŘICET ČTYŘI KDY TAKÉ ODEŠEL MŮJ OTEC A BRATR TO UŽ BYLO ASI DESET TRANSPORTU PO TISÍCI MUŽÍCH TI ŽIDOVŠTÍ ČINNOST KTERÝ SE VELKÁ VĚTŠINA NEVRÁTIL </s> <s> TAK SEM PODÍVALA DĚLÁME NA BAŠTU TO JE ZAJÍMALI PROSTOR TÍM NĚJAKÝM OSOBNÍ VZPOMÍNKU </s> <s> NA TOM NAHOŘE TO BYLO ZAHRADNICTVÍ VLÁDNOUT ALE NĚJAKÝ DO DOBY NEŽ U TEN TAM BYL POSTAVENÝ DOMEČEK ALE BYLO TAM TAKÉ MALÝ V ODBOJI HŘIŠTĚ TAM JSME NIKDY NE- NĚKDY PŘESNĚ TO VÍM SEDMADVACÁTÉHO KVĚTNA ČTYŘICET ČTYŘI JSME TAM HRÁLI FOTBAL UTKÁNÍ SPARTA NEŽ JÁ VÍM SPARTA TO BYL KLUK Z KLUKŮ KTERÝ JSME BYDLELI TADY V HAMBURSKÝCH KASÁRNÁCH TAKŽE TEN PODVOZEK ALE NAPŘED BYLI MY JSME MĚLI SVÍČKU NEBO SE SPARTA A SESTRY STA SEDMNÁCT TY TO BYL DĚTSKÝ DOMOV KLUKŮ TAM MĚLI SE STALO PŘED VNUČKU NEŽ ALE MY JSME TEHDY DO UTKÁNÍ </s> <s> PROHRÁLI TŘI JEDNA ALE TO NENÍ PODSTATNÝ PODSTATNĚ TO ŽE TEN ZÁPAS A JÁ TADY DVACÁTÉHO SEDMÉHO KVĚTNA ČTYŘICET ČTYŘI 
A ŽE SE NA NĚ BYL JEŠTĚ PŮJDE OTEC PAK UŽODJEL DÁL A UŽ SEM U NÍ NEVĚDĚL </s> <s> KOUKÁME SE NÁM ÚZKÁ KASÁRNA TO JSOU KASÁRNA KTERÁ BYLA DŘEVĚNÁ PRO ŽENY TAK SEM TADY S MAMINKOU HNED V TOM ČTYŘICÁTÉM DRUHÉM JSME SEM PŘIŠLI TAK JSME TADY BYLI UBYTOVANÝ NA POKOJ MĚSTĚ ŠEST ŽE TO TYPICKY KASÁRENSKÝCH DOBU KASÁRENSKÝCH BUDOVAL S DVĚMA DVORY VELIKÝM </s> <s> A VPRAVO BYLO JSOU HANNOVERSKÝCH KASÁRNÁCH TA MOJE ČERNÁ PEKÁRNA TAM SE TAK CHLEBA I PRO TEREZÍN TO PRO NÁS A BYL TAM TAKÉ TAKOVÝ DVŮR ODJÍŽDĚLI ŽENY CO VOZILI NA TU DOBRANSKÝCH VOZECH VŠECHNO CO BYLO POTŘEBA TEREZÍNĚ DĚLAL VŠE SE VOZILO S- VOZY KTERÝ BYLI V TOM DVOŘE TAMHLE VZADU MAMINKA TAM BYL ZAMĚSTNÁN OSUDU HUNDERTSCHAFT </s> <s> JSME UMĚLY HAMBURSKÝCH KASÁREN PŘED TÍM JSME SE DÍVALI OTEC PO TOM PRVNÍM DVOŘE HAMBURSKÝ KASÁRNA KDE BYLI </s> <s> ÚPLNĚ TEDA VĚTŠINOU ŽENY JÁ SEM ZABIL S MATKOU PŮL ROKU DVAAČTYŘICET </s> <s> BUDE TAKÉ VIDĚT MÍSTO KDE JSME BYLI BYLO TO MÍSTO ČÍSLO DVĚ STĚ ŠEST S MAMINKOU SPALY SAMOŘEJMĚ TÍM ŽE MATKA CHTĚLA</ASRTEXT>
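
A record like the one on slide 16 can be split into its fields with a small regular expression. This is a minimal sketch, assuming each uppercase field has a matching close tag as in the Czech sample; the ASRTEXT value here is shortened for illustration, and `parse_fields` is a hypothetical helper, not part of any track tooling.

```python
import re

# Matches <TAG>...</TAG> for uppercase field names; the lowercase <s> and </s>
# sentence markers inside ASRTEXT are deliberately not treated as fields.
FIELD_RE = re.compile(r"<([A-Z0-9]+)>(.*?)</\1>", re.DOTALL)

def parse_fields(record: str) -> dict[str, str]:
    """Map each field name to its stripped text content."""
    return {tag: text.strip() for tag, text in FIELD_RE.findall(record)}

# Shortened sample: the ASRTEXT is truncated here for illustration only.
sample = (
    "<DOCNO>VHF04106-7401.30</DOCNO> "
    "<INTERVIEWDATA>Tommy K...-K...</INTERVIEWDATA> "
    "<ASRSYSTEM>2006</ASRSYSTEM> <CHANNEL>right</CHANNEL> "
    "<ASRTEXT>PŘIVEZLI VĚZNĚ NA NOSÍTKÁCH </s> <s> STOJÍM PROSTORU KDE</ASRTEXT>"
)

fields = parse_fields(sample)
# Split the ASR output on the sentence-boundary markers.
sentences = [s.strip() for s in re.split(r"</s>\s*<s>", fields["ASRTEXT"]) if s.strip()]

print(fields["DOCNO"], fields["CHANNEL"], len(sentences))
```

The English sample on slide 6 elides the ends of its ASRTEXT fields, so this pattern would not recover them from that excerpt as shown.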

  17. Topics selection 38 2005 English Training 25 2005 English Evaluation 33 2006 English Evaluation 9 Unused 10 2006 Czech Safety 115 28 2006 Czech Evaluation 1 2006 Czech Evaluation 29

  18. Topics selection 38 2005 English Training 25 2005 English Evaluation 33 2006 English Evaluation 9 Unused 10 2006 Czech Safety 115 28 2006 Czech Evaluation 1 2006 Czech Evaluation 29 29 2007 Czech Training 29

  19. Topics selection 38 2005 English Training 25 2005 English Evaluation 33 2006 English Evaluation 9 Unused 10 2006 Czech Safety 3 2007 Czech Safety 118 28 2006 Czech Evaluation 1 2006 Czech Evaluation 29 29 2007 Czech Training 29

  20. Topics selection 40 Possible 2007 Czech Evaluation 10 Possible 2007 Czech Evaluation (6 or more relevant passages identified during search-guided assessment) 50 38 2005 English Training 25 2005 English Evaluation 33 2006 English Evaluation 9 Unused 10 2006 Czech Safety 3 2007 Czech Safety 118 28 2006 Czech Evaluation 1 2006 Czech Evaluation 29 29 2007 Czech Training 29

  21. Topics selection 8 2007 Czech Evaluation 34 2007 Czech Evaluation (highly-ranked assessment completed) 42 40 Possible 2007 Czech Evaluation 10 Possible 2007 Czech Evaluation (6 or more relevant passages identified during search-guided assessment) 50 38 2005 English Training 25 2005 English Evaluation 33 2006 English Evaluation 9 Unused 10 2006 Czech Safety 3 2007 Czech Safety 118 28 2006 Czech Evaluation 1 2006 Czech Evaluation 29 29 2007 Czech Training 29

  22. Evaluation Measure • based on the mean Generalized Average Precision (mGAP) • human assessments are binary • degree of match to the assessments can be partial • penalization for non-100% match, up to 150 sec [Figure: penalty function; time-axis labels 0 sec, -75 sec, +150 sec; score-axis labels 1.0, 0.5, 0.0] • quantization noise (scores lower than for English) • 15 sec assessment granularity • quickstart documents begin every 150 sec
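
The measure described on this slide can be illustrated with a simplified sketch. The penalty shape is an assumption: a symmetric step function that loses 0.1 of the credit per 15 sec and reaches zero at 150 sec. The slide's figure did not survive extraction and the official function may differ (its axis labels mention -75 sec and +150 sec). The greedy matching of returned start times to assessed ones is also a simplification, and all names are illustrative.

```python
def reward(offset_sec: float, step_sec: float = 15.0, max_sec: float = 150.0) -> float:
    """Partial credit for a start time offset_sec away from an assessed one.

    Assumed shape: symmetric, dropping by one step every step_sec,
    and zero at max_sec or beyond.
    """
    n_steps = round(max_sec / step_sec)
    steps = int(abs(offset_sec) // step_sec)
    return max(0, n_steps - steps) / n_steps

def generalized_average_precision(ranked_starts: list[float],
                                  relevant_starts: list[float]) -> float:
    """GAP for one topic: average precision with partial rewards."""
    if not relevant_starts:
        return 0.0
    unmatched = list(relevant_starts)
    cumulative = 0.0
    total = 0.0
    for rank, start in enumerate(ranked_starts, 1):
        # Greedily match to the assessed start time giving the highest reward.
        best = max(unmatched, key=lambda r: reward(start - r), default=None)
        credit = reward(start - best) if best is not None else 0.0
        if credit > 0:
            unmatched.remove(best)  # each assessed start time is rewarded once
            cumulative += credit
            total += cumulative / rank
    return total / len(relevant_starts)

def mean_gap(per_topic: list[float]) -> float:
    """mGAP: the mean of the per-topic GAP values."""
    return sum(per_topic) / len(per_topic) if per_topic else 0.0

print(generalized_average_precision([175.0], [100.0]))
```

Under these assumptions a system that returns every passage boundary still loses credit whenever the assessed start time falls between boundaries, which is one way to read the slide's "quantization noise" remark.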

  23. Relevance Judgements • performed by 6 relevance assessors in Prague Search-guided assessment • completed for 87 topics • 2156 rel. passages identified in the evaluation topics Highly-ranked assessment • completed for 42 topics • pool depth set to 50 start times • 11896 highly-ranked start times checked (284/topic) • 233 rel. passages identified
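
Highly-ranked assessment with a pool depth of 50 amounts to judging the union of each run's top 50 start times per topic. The sketch below shows that pooling step under the usual definition; the run names and start-time identifiers are hypothetical, and the slide does not say which of the submitted runs contributed to the pool.

```python
def build_pool(runs: dict[str, list[str]], depth: int = 50) -> set[str]:
    """Union of the top-`depth` start times from each run, for one topic."""
    pool: set[str] = set()
    for ranked_starts in runs.values():
        pool.update(ranked_starts[:depth])
    return pool

# Hypothetical runs for a single topic; identifiers are made up.
runs = {
    "run_a": ["VHF1-0", "VHF1-150", "VHF2-300"],
    "run_b": ["VHF1-150", "VHF3-0", "VHF2-300"],
}

pool = build_pool(runs, depth=2)
print(sorted(pool))
```

Duplicates collapse in the union, so the number of start times to check per topic is at most the number of pooled runs times the depth, and usually fewer when runs agree.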

  24. Relevance Assessment Interface

  25. Relevance Judgement Results

  26. Relevance Judgement Statistics

  27. Participation • Brown University (BLLIP) • Matthew Lease, Eugene Charniak • Charles University (CUNI) • Pavel Češka, Pavel Pecina • University of Chicago (UC) • Gina-Anne Levow • University of West Bohemia (UWB) • Pavel Ircing, Luděk Müller • total of 15 runs submitted • required condition: automatic queries from Title and Description

  28. Results

  29. Results

  30. Results: Term normalization • the effect of term normalization for handling Czech morphology is quite significant: • 60-120% relative improvement

  31. Alignment Issues in the Quickstart Collection • 2006 data release (affected 2006 working notes) • Time mismatch made mGAP uninformative (pauses ignored) • Post-CLEF 2006 evaluation (“corrected” in 2006 proceedings) • Post-hoc start time correction (but missing tapes counted as 30 min) • AUTO and MANUAL KEYWORDS still misaligned • 2007 data release • Some additional corrections for ASR timing • AUTO and MANUAL KEYWORDS removed (too hard to fix) • 2007 evaluation (reported in 2007 working notes) • Missing-tape timing corrected post-hoc

  32. Test Collection Release • CLEF CL-SR track test collections: • Package for release • Independent (cross-site) validation • Deposit at ELDA • MALACH ASR training data: • Package English and Czech for release • With Polish, Russian, Slovak (+ maybe Hungarian) • Deposit at LDC

  33. What Did We Learn? • Searching conversational speech works • Real user needs, two languages • Improving ASR helps less than expected • Error rates vary by speaker • Ranked retrieval prefers lower error rates • Automatic classification can help ASR • At least if error rates are high • Unsegmented sources bring new challenges • Cross-source alignment • Evaluation measure design

  34. Critiquing the Collection • Large for ASR is small for IR • ~1,000 hours of speech = ~20,000 “documents” • No manual reference transcription • Would cost ~$100,000 • Interviews are just one type of conversation
