Þórunn Blöndal. ÍSTAL: The Icelandic Corpus of Spoken Language. Nordtalk – NorFa: Using spoken language corpora, Göteborg, Aug 19-24, 2002. Research on Spoken Icelandic: research on regional differences in pronunciation, language acquisition, and the development of narrative skills.
Þórunn Blöndal
ÍSTAL: The Icelandic Corpus of Spoken Language
Nordtalk – NorFa: Using spoken language corpora
Göteborg, Aug 19-24, 2002
Research on Spoken Icelandic
• Research on
  • regional differences in pronunciation
  • language acquisition
  • the development of narrative skills
The ÍSTAL Group
• Ásta Svavarsdóttir, The Institute of Lexicography (asta@lexis.hi.is)
• Eiríkur Rögnvaldsson, University of Iceland (eirikur@hi.is)
• Hrafnhildur Ragnarsdóttir, Iceland University of Education (hragnars@khi.is)
• Kristín Bjarnadóttir, The Institute of Lexicography (kristinb@lexis.hi.is)
• Sigurður Konráðsson, Iceland University of Education (sigkon@khi.is)
• Þóra Björk Hjartardóttir, University of Iceland (thorah@hi.is)
• Þórunn Blöndal, Iceland University of Education (thblond@khi.is)
The Goal
• From the outset, the ÍSTAL group’s primary objective was to establish a corpus of spoken language for use in two broadly defined fields:
  • linguistic research on the spoken language, e.g. in syntax, morphology, and conversation analysis
  • computational linguistics and language technology
[Diagram: types of spoken interaction, with question marks beside some items: interviews; shopping; formal meetings; informal conversation; phone conversation; task-oriented dialogue; formal conversation (doctor/patient consultation, etc.); native / non-native speakers; non-native speakers of Icelandic; children / parents]
The Orthography
Standard orthography is used in ÍSTAL, but deviations from the most common pronunciation are given in brackets:
• dálítið (a little) > dáldið > doldið
Loan words embedded in Icelandic are spelled according to Icelandic phonetic rules:
• OK > ókei
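One way such spelling variants might be handled when counting words is to map each variant back to its standard form. The sketch below is a hypothetical illustration, not the ÍSTAL group's tooling; the table `VARIANT_TO_STANDARD` and the function `normalize` are assumed names, and only the example forms from this slide are included.

```python
# Hypothetical lookup table: pronunciation-based spellings mapped to the
# standard orthographic form. Only the forms quoted on the slide are listed.
VARIANT_TO_STANDARD = {
    "dáldið": "dálítið",
    "doldið": "dálítið",
}


def normalize(tokens):
    """Lower-case each token and replace known variants with the standard form."""
    result = []
    for token in tokens:
        lowered = token.lower()
        result.append(VARIANT_TO_STANDARD.get(lowered, lowered))
    return result


if __name__ == "__main__":
    # "ókei" is already the Icelandic spelling of the loan word, so it is kept.
    print(normalize(["Doldið", "ókei", "dálítið"]))
```

Keeping the mapping in a plain dictionary means new variants can be added without touching the function.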
The Header ...
• Heiti upptöku: 04-701-02 – Number
• Dagsetning upptöku: 040400 – Date
• Stutt lýsing á efni: Spjall á kennarastofu – Short description
• Kaflar umritunar: kynlífsvæðing – Topics transcribed
• Stuttnefni: kennkynlíf – Abbreviated title
• Lengd upptöku: 00:08:58 – Duration
• Upptökutæki: Sony digital, mini disc MZ-B3 – Recording device
• Þátttakandi: A = Þ1; kk 34 kennari – Participant – male, 34, teacher
• Þátttakandi: B = Þ2; kk 41 kennari – Participant – male, 41, teacher
• Þátttakandi: C = Þ3; kvk 40 kennari – Participant – female, 40, teacher
• Þátttakandi: D = Þ4; kvk 45 kennari – Participant – female, 45, teacher
• Heiti umritunar: UM-04-701-02 – Second listening/transcription
• Umritari: KE – Transcriber’s initials
• Dagsetning umritunar: 0800 – Date of second listening/transcription
• Hvað umritað: – Material transcribed
....
• Umritunarkerfi: AUGLUMSTAFS – Standard orthography
• Yfirlesari: – Proofreader
• Dagsetning yfirlesturs: – Proofreading date
• Skráður tími: **?**
• Athugasemdir: Í upphafi koma Sv skólastjóri=Sv, og Be=Be inn í samtalið sem er tekið upp í frímínútum á kennarastofu. Þ1, Þ2, Þ3 og Þ4 eru samkennarar. – Comments: At the beginning of the conversation, the headmaster (Sv) and Be participate; then they leave. Participants 1, 2, 3, and 4 are colleagues, teachers in the same school.
• 2-yfirlestur: HBE 060102 – Second proofreading 060102
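Because the header is a list of `field: value` lines, it could be read into a simple data structure. The following is a minimal sketch under stated assumptions, not the ÍSTAL group's own software: it assumes the header is stored as plain text with one field per line and without the English glosses, and the names `parse_header`, `duration_seconds`, and `PARTICIPANT_RE` are hypothetical.

```python
import re

# Sample lines copied from the header shown on the slides (glosses removed).
SAMPLE_HEADER = """\
Heiti upptöku: 04-701-02
Dagsetning upptöku: 040400
Lengd upptöku: 00:08:58
Þátttakandi: A = Þ1; kk 34 kennari
Þátttakandi: B = Þ2; kk 41 kennari
Þátttakandi: C = Þ3; kvk 40 kennari
Þátttakandi: D = Þ4; kvk 45 kennari
"""

# Assumed participant layout: "<label> = <code>; <kk|kvk> <age> <occupation>"
PARTICIPANT_RE = re.compile(
    r"(?P<label>\S+) = (?P<code>\S+); (?P<gender>kk|kvk) (?P<age>\d+) (?P<role>.+)"
)

# The slides gloss "kk" as male and "kvk" as female.
GENDER = {"kk": "male", "kvk": "female"}


def parse_header(text):
    """Return (fields, participants) parsed from 'field: value' lines."""
    fields = {}
    participants = []
    for line in text.splitlines():
        if ":" not in line:
            continue
        # Split on the first colon only, so durations like 00:08:58 stay whole.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Þátttakandi":
            match = PARTICIPANT_RE.fullmatch(value)
            if match:
                participants.append({
                    "label": match["label"],
                    "code": match["code"],
                    "gender": GENDER[match["gender"]],
                    "age": int(match["age"]),
                    "role": match["role"],
                })
        else:
            fields[key] = value
    return fields, participants


def duration_seconds(hhmmss):
    """Convert an HH:MM:SS duration string to seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


if __name__ == "__main__":
    fields, participants = parse_header(SAMPLE_HEADER)
    print(fields["Heiti upptöku"], duration_seconds(fields["Lengd upptöku"]))
    for person in participants:
        print(person)
```

Participants are collected in a list rather than the field dictionary because the `Þátttakandi` key repeats once per speaker.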
ÍSTAL as it is now
• The data bank contains 31 conversations with 2 to 4 participants.
• The participants are 30-60 years old.
• The data were collected in various geographical regions of Iceland.
• Each transcription is marked with a header showing information on the participants’ age, gender, and relationship to one another; the duration of the conversation; and other relevant information.
• The total duration of transcribed material is approximately 20 hours.
• Of the 31 conversations, 6 take place among males, 5 among females, and 20 are mixed.
• The material is transcribed according to the standard orthography with only slight deviation.
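The figures on this slide can be cross-checked with a few lines of arithmetic. This is only an illustrative sketch: the variable names are hypothetical, and the average length is a rough derived figure, since the slide gives the total duration only as approximately 20 hours.

```python
# Conversation counts by group composition, as stated on the slide.
conversations = {"male only": 6, "female only": 5, "mixed": 20}

# The three groups should add up to the 31 conversations in the data bank.
total = sum(conversations.values())

# Share of each group, as a percentage of all conversations.
shares = {
    group: round(100 * count / total, 1)
    for group, count in conversations.items()
}

# Rough average length per conversation, from the approximate 20-hour total.
total_hours = 20
avg_minutes = total_hours * 60 / total

if __name__ == "__main__":
    print(total, shares, round(avg_minutes, 1))
```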
ÍSTAL’s Role in Research
The following have been presented as works in progress:
• Comparison of word frequencies in spoken and written Icelandic (Ásta)
• Investigation of ‘það’ (‘it’/‘there’) in Icelandic (Eiríkur)
• Collaborative completion of turn-constructional units (TCUs) in conversation (Þórunn)