Using Oral History to Learn About Searching Spontaneous Conversational Speech Douglas W. Oard College of Information Studies and Institute for Advanced Computer Studies University of Maryland, College Park University of Kentucky
Outline • Spoken word collections • The MALACH Project • Building an IR test collection • First experiments • Some things to think about
Spoken Word Collections • Broadcast programming • News, interview, talk radio, sports, entertainment • Scripted stories • Books on tape, poetry reading, theater • Spontaneous storytelling • Oral history, folklore • Incidental recording • Speeches, oral arguments, meetings, phone calls
Outline • Spoken word collections • The MALACH Project • Building an IR test collection • First experiments • Some things to think about
Shoah Foundation Collection • Substantial scale • 116,000 hours; 52,000 interviews; 32 languages • Spontaneous conversational speech • Accents, elderly, emotional, … • Accessible • $100 million collection and digitization investment • Manually indexed (10,000 hours) • Segmented, thesaurus terms, people, summaries • Users • A department working full time on dissemination
Interview Excerpt • Audio characteristics • Accented (this one is unusually clear) • Separate channels for interviewer / interviewee • Dialog structure • Interviewers have different styles • Content characteristics • Domain-specific terms • Named entity mentions and relationships
The MALACH Project
• Speech technology: English, Czech, Russian, Slovak
• Language technology: topic segmentation, categorization, extraction, translation
• Search technology: interactive search systems, test collection
• Interface development and user studies
English ASR Accuracy Training: 200 hours from 800 speakers
Outline • Spoken word collections • The MALACH Project • Building an IR test collection • First experiments • Some things to think about
Who Uses the Collection?
• Disciplines: history, linguistics, journalism, material culture, education, psychology, political science, law enforcement
• Products: book, documentary film, research paper, CD-ROM, study guide, obituary, evidence, personal use
Based on analysis of 280 written requests
Observational Studies
• 8 independent searchers: Holocaust studies (2), German studies, history/political science, ethnography, sociology, documentary producer, high school teacher
• 8 teamed searchers: all high school teachers, thesaurus-based search
• Rich data collection: intermediary interaction, semi-structured interviews, observational notes, think-aloud, screen capture
• Qualitative analysis: theory-guided coding, abductive reasoning
• Powerful testimonies give teachers ideas on what to discuss in the classroom (topic) and how to introduce it (activity).
• Group discussions clarify themes and define activities, which hone teachers' criteria.
“The brainstorming really guided my search today, and I felt like I finally had a big enough chunk of time on this to really find something, but I need about a couple hundred more hours … Yesterday in my search, I just felt like I was kind of going around in the dark. But that productive writing session really directed my search, even though I stayed with [the same testimony] the whole time.”
8 teachers, working in groups of 4
Thesaurus-Based Search 8 teachers, working in groups of 4
Relevance Criteria 6 Scholars, 1 teacher, 1 film producer, working individually
Topicality (total mentions)
6 scholars, 1 teacher, 1 film producer, working individually
Search Architecture
• Query formulation
• Speech recognition
• Boundary detection
• Content tagging
• Automatic search
• Interactive selection
MALACH Test Collection
• Interviews: speech recognition, boundary detection, content tagging
• Topic statements: query formulation, automatic search → ranked lists
• Relevance judgments (plus a comparable collection)
• Evaluation: Mean Average Precision
4,000 English interviews
9,947 segments, ~400 words each (total: 625 hours)
Drawn from the 10,000 hours with full-description manual indexing
<DOCNO>VHF00017-062567.005</DOCNO> <KEYWORD> Warsaw (Poland), Poland 1935 (May 13) - 1939 (August 31), awareness of political or military events, schools </KEYWORD> <PERSON> Sophie Perutz, Henry Hemar </PERSON> <SUMMARY> AH talks about the college she attended before the war. She mentions meeting her husband. She discusses young peoples' awareness of the political events that preceded the outbreak of war. </SUMMARY> <SCRATCHPAD>graduated HS, went to college 1 year, professional college hotel management; met future husband, knew that they'd end up together; sister also in college, nice social life, lots of company, not too serious; already got news from Czechoslovakia, Sudeten, knew that Poland would be next but what could they do about it, very passive; just heard info from radio and press </SCRATCHPAD> <ASRTEXT> no no no they did no not not uh i know there was no place to go we didn't have family in a in other countries so we were not financially at the at extremely went so that was never at plano of my family it is so and so that was the atmosphere in the in the country prior to the to the war i graduate take the high school i had one year of college which was a profession and that because that was already did the practical trends f so that was a study for whatever management that eh eh education and this i i had only one that here all that at that time i met my future husband and that to me about any we knew it that way we were in and out together so and i was quite county there was so whatever i did that and this so that was the person that lived my sister was it here is first year of of colleagues and and also she had a very strongly this antisemitic trend and our parents there was a nice social life young students that we had open house always pleasant we had a lot of that company here and and we were not too serious about that she we got there we were getting the they already did knew he knew so from czechoslovakia from they saw that from other part and we 
knew the in that that he is uhhuh the hitler spicy we go into this year this direction that eh poland will be the next country but there was nothing that we would do it at that time so he was a very very he says belong to any any organizations especially that the so we just take information from the radio and from the dress </ASRTEXT>
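A segment record in the tagged format shown above can be pulled apart with a few regular expressions. A minimal sketch, assuming well-formed open/close tags as in the example; `parse_segment` is an illustrative helper, not project code:

```python
import re

# Field tags as they appear in the example segment record above.
FIELDS = ["DOCNO", "KEYWORD", "PERSON", "SUMMARY", "SCRATCHPAD", "ASRTEXT"]

def parse_segment(text):
    """Extract each tagged field from one segment record into a dict."""
    record = {}
    for tag in FIELDS:
        m = re.search(r"<%s>(.*?)</%s>" % (tag, tag), text, re.DOTALL)
        if m:
            record[tag] = m.group(1).strip()
    return record

sample = """<DOCNO>VHF00017-062567.005</DOCNO>
<KEYWORD> Warsaw (Poland), schools </KEYWORD>
<ASRTEXT> no no no they did ... </ASRTEXT>"""
seg = parse_segment(sample)
```

The record is then ready for indexing: the ASRTEXT field feeds the automatic-transcript index, while KEYWORD and SUMMARY provide the contrastive metadata fields.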
Topic Construction • 280 topical requests, in folders at VHF • From scholars, teachers, broadcasters, … • 50 selected for use in the collection • Recast in TREC topic format • Some needed to be “broadened” • 30 assessed during Summer 2003 • 28 yielded at least 5 relevant segments
An Example Topic
Number: 1148
Title: Jewish resistance in Europe
Description: Provide testimonies or describe actions of Jewish resistance in Europe before and during the war.
Narrative: The relevant material should describe actions of only- or mostly-Jewish resistance in Europe. Both individual and group-based actions are relevant. Types of actions may include survival (fleeing, hiding, saving children), testifying (alerting the outside world, writing, hiding testimonies), and fighting (partisans, uprising, political security). Information about undifferentiated resistance groups is not relevant.
Assessment Strategy • Exhaustive (Cranfield) is not scalable • Pooled (TREC) is not yet possible • Requires a diverse set of ASR and IR systems • Will be used at CLEF 2005 Speech Retrieval track • Search-guided (TDT) was viable • Iterate topic research/search/assessment • Augment with review, adjudication, reassessment • Requires an effective interactive search system • 28 topics: 821 hours/3 months/4 assessors
Defining Topical “Relevance” • “Classic” relevance (to “food in Auschwitz”) • Direct Knew food was sometimes withheld • Indirect Saw undernourished people • Additional relevance types • Context Intensity of manual labor • Comparison Food situation in a different camp • Pointer Mention of a study on the subject
Recording Judgments • 14 topics independently assessed • Assessors later met to resolve differences • 14 topics assessed and then reviewed • Decisions of the reviewer were final Average: 3.2 minutes per judgment
Mapping “Relevant” to Binary Relevance
Number of judgments, by type and degree of relevance
3,643 adjudicated judgment pairs
Assessor Agreement
• 44% topic-averaged overlap for Direct+Indirect 2/3/4 judgments
14 topics, 4 assessors in 6 pairings, 1806 judgments
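The binary mapping and the agreement figure above can be sketched in a few lines. Two assumptions are made here: "Direct+Indirect 2/3/4" is read as a judgment counting as relevant when its type is direct or indirect and its degree is at least 2, and "overlap" is taken to be intersection-over-union of the two assessors' relevant sets; both function names are illustrative:

```python
def is_relevant(rel_type, degree):
    """Binary mapping: Direct or Indirect judgments of degree 2, 3, or 4
    count as relevant (assumed reading of the slide's criterion)."""
    return rel_type in ("direct", "indirect") and degree >= 2

def topic_averaged_overlap(judgments_a, judgments_b):
    """Mean over shared topics of |A ∩ B| / |A ∪ B| between two assessors'
    relevant-segment sets (intersection-over-union is an assumption)."""
    overlaps = []
    for topic in judgments_a.keys() & judgments_b.keys():
        a, b = judgments_a[topic], judgments_b[topic]
        union = a | b
        overlaps.append(len(a & b) / len(union) if union else 1.0)
    return sum(overlaps) / len(overlaps)
```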
Outline • Spoken word collections • The MALACH Project • Building an IR test collection • First experiments • Some things to think about
ASR-Based Search Mean Average Precision Title queries, adjudicated judgments
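Mean Average Precision, the evaluation measure used throughout these experiments, is computed from the ranked lists and relevance judgments. A standard textbook sketch, not the project's evaluation code:

```python
def average_precision(ranked_ids, relevant):
    """AP for one topic: mean of precision@k over ranks k where a relevant
    segment appears, divided by the total number of relevant segments."""
    hits, precisions = 0, []
    for k, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs, qrels):
    """MAP: mean of AP over topics. `runs` maps topic -> ranked id list,
    `qrels` maps topic -> set of relevant ids."""
    return sum(average_precision(runs[t], qrels[t]) for t in qrels) / len(qrels)
```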
Comparing Index Terms +Persons Title queries, adjudicated judgments
Comparing ASR and Metadata Title queries, adjudicated judgments
Failure Analysis ASR % of Metadata Title queries, adjudicated judgments
What Causes the Difference? • Hypothesis 1: Good human indexers • Maybe people don’t speak the query terms • Human indexers can still detect the topic • Hypothesis 2: Weak ASR language model • ASR does best on “newspaper terms” • {Bulgaria, partisans} >> {Auschwitz, sonderkommando} • Mixture: 200 hours in-domain + gigaword corpus
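The mixture mentioned under Hypothesis 2 amounts to linearly interpolating an in-domain language model with a general (gigaword-trained) one. A unigram sketch with an arbitrary weight `lam=0.5`; the project's actual interpolation weight and n-gram order are not given here:

```python
from collections import Counter

def unigram_lm(tokens):
    """Maximum-likelihood unigram model: word -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def mixture_prob(word, in_domain_lm, general_lm, lam=0.5):
    """p(w) = lam * p_in(w) + (1 - lam) * p_gen(w).
    lam=0.5 is an arbitrary illustration, not a tuned value."""
    return lam * in_domain_lm.get(word, 0.0) + (1 - lam) * general_lm.get(word, 0.0)
```

Under the mixture, domain terms like "sonderkommando" retain probability mass from the in-domain transcripts even though they are vanishingly rare in newspaper text.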
Failure Analysis ASR % of Metadata Title queries, adjudicated judgments
Searching Manual Transcripts
Queries shown: “jewish kapo(s)”, “fort ontario refugee camp”
Title queries, adjudicated judgments
Results on 15 Interviews
• Overall Word Error Rate (WER)
• Named Entity Word Error Rate (NE WER): halved
Key result: use smaller, metadata-adapted vocabularies
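Word error rate, the metric on this slide, is word-level edit distance (substitutions + insertions + deletions) divided by the reference length. A standard dynamic-programming sketch:

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i          # delete all remaining reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j          # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,       # deletion
                           dp[i][j - 1] + 1,       # insertion
                           dp[i - 1][j - 1] + cost)  # match or substitution
    return dp[len(ref)][len(hyp)] / len(ref)
```

NE WER applies the same computation restricted to named-entity tokens in the reference.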
Categorization Using Automatic Speech Recognition
kNN; test: 332 segments, 216 categories with 10+ training samples
Training ASR Word Error Rate ~47%
• Equal performance for human and ASR transcripts
• Improvement with additional training data
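A minimal version of the kNN categorizer: assign thesaurus terms by voting among the k most similar training segments, with cosine similarity over bags of words. The defaults `k=3` and `n_labels=2` are illustrative, not the settings used in the experiment:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def knn_categories(test_text, training, k=3, n_labels=2):
    """Vote for thesaurus terms among the k nearest training segments.
    `training` is a list of (text, set_of_categories) pairs."""
    test_vec = Counter(test_text.split())
    scored = sorted(training,
                    key=lambda seg: cosine(test_vec, Counter(seg[0].split())),
                    reverse=True)
    votes = Counter()
    for text, cats in scored[:k]:
        for c in cats:
            votes[c] += 1
    return [c for c, _ in votes.most_common(n_labels)]
```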
Segment with Categories
what can you tell me about the holidays in your house they were very very nice and my father was very religious and everything was kept like it's suppose to be you know like it's written in the Torah … and there was a extra room special if a very poor man came and he stayed overnight there was a room for him to sleep there … my mother mainly she thought that this is a very big mitzvah to do you know because there in in where I come from there was a lot of poor people not from our town from out of town but use to come and sometimes they couldn't make it back home so they slept over and sometime they stayed over Shabbats … my mother helped in the business and we had a nanna that took care of us and we also had a maid in the house … my mother only did the cooking … my father with his second wife didn't have children for twenty years he lived with her and they didn't have any children … then he married my mother and with my mother they had four children …
Human indexing: Jewish customs and observance; family life; socioeconomic status; Czechoslovakia 11/11/1918 - 3/14/1939
kNN: Jewish customs and observance; family life; extended family members; family homes; Poland 11/11/1918 - 8/31/1939
Precision: 2/5, Recall: 2/4, F = 44%
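The precision, recall, and F figures above follow from a straightforward set comparison; F here is the balanced F1, which is consistent with 2/5, 2/4 → 44%:

```python
def prf(predicted, gold):
    """Precision, recall, and balanced F1 between predicted and gold
    category sets for one segment."""
    correct = len(set(predicted) & set(gold))
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```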
Category Expansion
• 3,199 training segments: spoken words (hand transcribed) → thesaurus terms
• Test segments: spoken words (ASR transcript) → kNN categorization → thesaurus terms (F = 0.19, microaveraged)
• Index: thesaurus terms + ASR words
Title queries, linear score combination, adjudicated judgments
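The linear score combination noted above can be sketched as a weighted sum of per-segment retrieval scores from the ASR-word index and the expanded thesaurus-term index; `alpha=0.5` is an arbitrary illustration, not the tuned weight:

```python
def combine_scores(asr_scores, category_scores, alpha=0.5):
    """Merge two segment-id -> score dicts with a linear weight:
    score = alpha * s_asr + (1 - alpha) * s_category.
    Segments missing from one index contribute 0 from that side."""
    ids = asr_scores.keys() | category_scores.keys()
    return {i: alpha * asr_scores.get(i, 0.0)
               + (1 - alpha) * category_scores.get(i, 0.0)
            for i in ids}
```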
ASR-Based Search
Mean Average Precision: +27%
Average of 3.4 relevant segments in top 20
Title queries, adjudicated judgments
Human-Assigned Segment Boundary ... because the roads were crowded with with army units going back and forth you know .. and you also were off you had to walk no on the main road because you were afraid you were going to be picked up for work .. that's what some did they came to Loetche and some people were picked up and held four weeks for work .. when they came home they told us on the way --- segment boundary --- we came we came home was was about the time of Succoth .. you know the city was deserted there was a they were already taking people to work .. when we came home we couldn't recognize the city .. my parents first of all they confiscated everything .. they told us to get out of the orchard .. they took whatever they wanted they took over the whole ranch ... arrival
Probabilistic Models for Segmentation
Model features:
• Semantic: left-right window similarity
• Lexical: “key” words and phrases (yes: “tell me”, “back to”; no: “did they”, “and there”)
• Prosodic: silence duration, rate of speech
• Structural: position in the file, clause length
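The semantic and lexical features above can be sketched in a TextTiling-style scorer: low similarity between the word windows on either side of a position suggests a topic boundary, and cue phrases adjust the score. Prosodic and structural features are omitted; the window width and cue weights are arbitrary illustrations:

```python
import math
from collections import Counter

CUE_YES = {"tell me", "back to"}    # cue phrases favoring a boundary (from the slide)
CUE_NO = {"did they", "and there"}  # cue phrases disfavoring a boundary

def window_similarity(words, i, width=10):
    """Cosine similarity of the bags of words left and right of position i."""
    left = Counter(words[max(0, i - width):i])
    right = Counter(words[i:i + width])
    dot = sum(left[w] * right[w] for w in left.keys() & right.keys())
    norm = (math.sqrt(sum(v * v for v in left.values()))
            * math.sqrt(sum(v * v for v in right.values())))
    return dot / norm if norm else 0.0

def boundary_score(words, i, width=10):
    """Illustrative boundary score from semantic and lexical features only:
    dissimilar windows raise it, cue phrases nudge it up or down."""
    score = 1.0 - window_similarity(words, i, width)
    bigram = " ".join(words[i:i + 2])
    if bigram in CUE_YES:
        score += 0.5
    if bigram in CUE_NO:
        score -= 0.5
    return score
```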
Segmentation Using Automatic Speech Recognition
Training ASR Word Error Rate ~47%
• Equal performance for human and ASR transcripts
• Modest improvement with additional ASR training data
CLEF CL-SR Evaluation • Test collection release • Available on research license from ELDA • Packaged as a standard IR collection • ASR transcripts / known topic boundaries • Contrastive metadata • Training topics (with relevance judgments) • 25 double-blind evaluation topics • Runs due June 1, 2005; results returned Aug 1 • Plan to add Czech in 2006
What Have We Learned? • User studies help guide test collection design • Named entities are important to scholars • Age at time of experience is important to teachers • Test collections guide component development • Dynamic ASR lexicon cuts NE error rate in half • Text classification seems to be helping • Presently depends on lexical overlap w/thesaurus
The MALACH Team
• Shoah Foundation: Sam Gustman
• Cambridge University: Bill Byrne
• Johns Hopkins: Jim Mayfield (APL)
• Charles University: Jan Hajic
• Univ of West Bohemia: Josef Psutka
• IBM TJ Watson: Bhuvana Ramabhadran, Michael Picheny, Martin Franz, Nanda Kambhatla
• University of Maryland: Doug Oard (IS), Dagobert Soergel (IS), David Doermann (CS), Bonnie Dorr (CS), Philip Resnik (Linguistics)
Some Things to Think About • Privacy protection • Working with real data has real consequences • Are fixed segments the right retrieval unit? • Or is it good enough to know where to start? • What will it cost to tailor an ASR system? • $100K to $1 million per application? • Is ASR fast enough to really scale up? • 0.1 to 10 machine-hours per hour of speech
For More Information • The MALACH project • http://www.clsp.jhu.edu/research/malach • CLEF-2005 evaluation • http://www.clef-campaign.org • NSF/DELOS Spoken Word Access Group • http://www.dcs.shef.ac.uk/spandh/projects/swag