Introduction to CHILDES and TalkBank Brian MacWhinney CMU - Psychology, Modern Languages, Language Technologies Institute
The core idea • Human communication is a single unified process. • However, patterns in communication are analyzed by 20 different fields. • The time scales of these processes vary from milliseconds to centuries. • But all of these processes must have their ultimate effect in the Moment. • We can capture the Moment on video.
Principles • Data-sharing, Informed Consent • Multimedia • Open Access, Web Access, Commentary • Specified Format • Interoperability • Community integration
Availability • http://childes.psy.cmu.edu • http://talkbank.org • programs, manuals, fonts, morphologies, CA conventions, video production guides, XML Schema, links to other programs • data can be either downloaded or played back over the web
Current target areas • CHILDES • PhonBank • BilingualBank • AphasiaBank • CABank • ClassBank
CHILDES • Child Language Data Exchange System • Founded in 1984 in Concord MA • Director: Brian MacWhinney macw@cmu.edu • Programmers: Leonid Spektor, Franklin Chen • 3000 Members • 130 corpora • Over 3200 published articles
Practical Considerations • Learning CLAN takes about a week • Transcription is slow: perhaps a 15:1 ratio of transcription time to recording time. Shortcuts such as BlitzScribe or LENA probably will not work • Currently available data may not be perfect for a given issue • Corpora may need enhancement through MOR or the Coder's Editor
Tools from the Web • Data: childes.psy.cmu.edu/data • CLAN: childes.psy.cmu.edu/clan • Manuals: childes.psy.cmu.edu/manuals • Morphosyntax: childes.psy.cmu.edu/morgrams • Phon: childes.psy.cmu.edu/phon • Tutorial videos: talkbank.org/training • Digital video: talkbank.org/dv • CA Methods: talkbank.org/CABank
Why no handout? • The “Overviews” link has this PPT presentation • CHILDES is now fully electronic: no more paper
Available Methods • Microanalysis - CA, phonetics, ethology • Microgenetic analysis - CA, code-switching (NEXT) • Group and treatment comparisons - Genesee • Error analysis - Yip & Matthews • Diffusion analysis - in preschools • Longitudinal studies - growth curves • Modeling - neural nets, dynamic systems, evolutionary models
CLAN Tools • Transcribing • Editing • Counts -- FREQ, KWAL • Analyses: MOR, GRASP, PHON • Interoperability -- ELAN, Praat, SFS, EXMARaLDA, CLAPI, PHON
Ground Rules • Ethical use, informed consent • Levels of permission • Respect for dignity of participants • Respect for contributors • Requirement to cite sources • Requirement to contribute data
Info-CHILDES and Membership • Info-childes@googlegroups.com • Archived at LinguistList • Info-CHIBolts for nuts and bolts • Membership list • IASCL Membership
Getting Set Up • Download CLAN from Programs link
Windows issues • You can work in c:\childes • But your administrator may have locked this folder, so you may need shortcuts. • Windows IPA is difficult. • Windows compression may produce .wmf
Downloading Manuals • CHAT • CLAN
Getting Started • Open CLAN Manual to Chapter 2 • Double-click application • Control-D to open Commands Window • Set Working Directory to c:\childes\clan\lib\samples
Should look like this: on Windows, the working directory will be c:\childes\clan\lib\samples
Run FREQ • Type freq sample.cha • Click RUN or press Return • In the output, does “want” occur 3 times?
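The idea behind FREQ, counting word tokens on the main tiers, can be sketched in a few lines of Python. This is a minimal sketch under simplifying assumptions, not CLAN's FREQ: it treats any line starting with "*" as a main tier, splits on whitespace, and keeps only plain alphabetic tokens, whereas the real program understands many more CHAT codes. The mini-transcript is hypothetical and is not the contents of sample.cha.

```python
import re
from collections import Counter

# Hypothetical mini-transcript in simplified CHAT format (NOT the real sample.cha).
SAMPLE = (
    "@Begin\n"
    "@Participants:\tCHI Target_Child, MOT Mother\n"
    "*MOT:\tdo you want some juice ?\n"
    "*CHI:\twant juice .\n"
    "%mor:\tv|want n|juice .\n"
    "*CHI:\tI want more .\n"
    "@End\n"
)

WORD = re.compile(r"[A-Za-z']+")

def main_tier_words(text, speaker=None):
    """Yield lower-cased words from main tiers (lines starting with '*')."""
    for line in text.splitlines():
        if not line.startswith("*"):
            continue  # skip @headers and %dependent tiers such as %mor
        code, _, utterance = line.partition(":")
        if speaker is not None and code[1:] != speaker:
            continue
        for token in utterance.split():
            if WORD.fullmatch(token):  # drops terminators like . ? !
                yield token.lower()

def freq(text, speaker=None):
    """Count word tokens, optionally for one speaker (roughly what +t*CHI does)."""
    return Counter(main_tier_words(text, speaker))

print(freq(SAMPLE)["want"])         # 3
print(freq(SAMPLE, "CHI")["want"])  # 2
```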
Interface Features • Help • CLAN • Files In • Recall • Set MOR, Lib, Output directories
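Building Commands • mlu +t*CHI +f sample.cha (+t*CHI restricts the analysis to the child's tier; +f sends the output to a file) • mlu *.cha (wildcards: *.cha matches every .cha file in the working directory)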
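To make clear what MLU reports, here is a hedged Python sketch of mean length of utterance for one speaker. It counts words rather than morphemes (CLAN's MLU normally counts morphemes), and it assumes each main tier sits on a single line of the form "*CHI:<tab>utterance". The function name mlu_words and the sample lines are hypothetical.

```python
import re

WORD = re.compile(r"[A-Za-z']+")

# Hypothetical lines; not taken from sample.cha.
SAMPLE = (
    "*CHI:\twant juice .\n"
    "*MOT:\tdo you want some juice ?\n"
    "*CHI:\tI want more juice .\n"
    "*CHI:\tno .\n"
)

def mlu_words(text, speaker):
    """Mean number of words per main-tier utterance for one speaker."""
    lengths = []
    for line in text.splitlines():
        if line.startswith(f"*{speaker}:"):
            utterance = line.split(":", 1)[1]
            lengths.append(sum(1 for tok in utterance.split() if WORD.fullmatch(tok)))
    if not lengths:
        raise ValueError(f"no utterances found for speaker {speaker}")
    return sum(lengths) / len(lengths)

print(mlu_words(SAMPLE, "CHI"))  # (2 + 4 + 1) / 3
print(mlu_words(SAMPLE, "MOT"))  # 5.0
```

Restricting to one speaker here plays the same role as +t*CHI on the command line.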
Changing Directories • Set Working to: ne32 • combo +t*MOT +s"is^*ing" *.cha • Set Working to: samples • kwal +sbunny +w2 -w2 0042.cha • Triple-click on an output line to go back to the source file
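The KWAL search above returns each hit for "bunny" with surrounding context. A minimal Python sketch of that keyword-in-context idea follows, assuming a window of two utterances before and after each hit, which is roughly what +w2 -w2 asks for. Unlike KWAL, this sketch looks only at main tiers and ignores dependent tiers. The excerpt is hypothetical; 0042.cha is not reproduced here.

```python
# Hypothetical transcript excerpt; not the contents of 0042.cha.
EXCERPT = (
    "*CHI:\tlook .\n"
    "*MOT:\twhat is it ?\n"
    "%com:\tchild points\n"
    "*CHI:\ta bunny .\n"
    "*MOT:\tyes , a bunny .\n"
    "*CHI:\thop hop .\n"
    "*MOT:\tit hops .\n"
)

def kwal(text, keyword, before=2, after=2):
    """Find main-tier utterances containing `keyword` as a whole word and
    return each hit together with up to `before`/`after` neighbouring utterances."""
    tiers = [line for line in text.splitlines() if line.startswith("*")]
    hits = []
    for i, line in enumerate(tiers):
        if keyword in line.split(":", 1)[1].split():
            hits.append(tiers[max(0, i - before): i + after + 1])
    return hits

for context in kwal(EXCERPT, "bunny"):
    print("\n".join(context), end="\n---\n")
```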
GEM • Set Working to: Workshop • GEM +s* pau001.cha • Open output, play audio
Exercises - Chapter 8 • MLU50 • mlu +t*CHI +z50u +f *.cha • MLU5 • maxwd +t*CHI +g1 +c5 +dl 68.cha | mlu > 68.ml5.cex • TTR • freq +t*CHI +s"*-%%" +f *.cha
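The MLU5 and TTR exercises rest on two simple measures: the mean length of the five longest utterances, and the ratio of distinct word types to total word tokens. The Python sketch below computes both in words. It assumes single-line main tiers and folds case so that "I" and "i" count as one type. It is an illustration of the measures, not a reimplementation of maxwd, mlu, or freq, and the transcript is hypothetical.

```python
import re

WORD = re.compile(r"[A-Za-z']+")

# Hypothetical child utterances, not taken from 68.cha or any real corpus.
TRANSCRIPT = (
    "*CHI:\tmore juice .\n"
    "*CHI:\tI want more juice .\n"
    "*MOT:\there you go .\n"
    "*CHI:\tdaddy go .\n"
    "*CHI:\tno .\n"
    "*CHI:\tI want the big ball .\n"
    "*CHI:\twhere ball ?\n"
)

def utterances(text, speaker):
    """Word lists (lower-cased) for each main-tier utterance by `speaker`."""
    result = []
    for line in text.splitlines():
        if line.startswith(f"*{speaker}:"):
            words = line.split(":", 1)[1].split()
            result.append([w.lower() for w in words if WORD.fullmatch(w)])
    return result

def mlu5(utts, n=5):
    """Mean length in words of the n longest utterances."""
    if not utts:
        raise ValueError("no utterances")
    longest = sorted((len(u) for u in utts), reverse=True)[:n]
    return sum(longest) / len(longest)

def ttr(utts):
    """Type-token ratio: number of distinct words / total number of words."""
    tokens = [w for u in utts for w in u]
    if not tokens:
        raise ValueError("no word tokens")
    return len(set(tokens)) / len(tokens)

chi = utterances(TRANSCRIPT, "CHI")
print(mlu5(chi))  # 3.0
print(ttr(chi))   # 0.6875
```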
BatchFile • maxwd +t*CHI +g1 +c5 +dl 14.cha | mlu > 14.ml5.cex • maxwd +t*CHI +g1 +c5 +dl 55.cha | mlu > 55.ml5.cex • maxwd +t*CHI +g1 +c5 +dl 66.cha | mlu > 66.ml5.cex • maxwd +t*CHI +g1 +c5 +dl 68.cha | mlu > 68.ml5.cex • maxwd +t*CHI +g1 +c5 +dl 98.cha | mlu > 98.ml5.cex • Batch batch.cex • Or just run by highlighting in Commands (Windows)
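Because the batch file repeats one pipeline per transcript, it can be generated rather than typed. The Python sketch below builds the same lines as the slide. The helper names (mlu5_command, stems_in, write_batch) are hypothetical, and the command template is copied from the slide rather than derived from CLAN documentation.

```python
from pathlib import Path

# Transcript names from the slide; replace with your own corpus files.
STEMS = ["14", "55", "66", "68", "98"]

def mlu5_command(stem):
    """The slide's MLU5 pipeline for one transcript."""
    return f"maxwd +t*CHI +g1 +c5 +dl {stem}.cha | mlu > {stem}.ml5.cex"

def stems_in(folder):
    """Names (without .cha) of every CHAT file in `folder`, sorted as strings."""
    return sorted(p.stem for p in Path(folder).glob("*.cha"))

def write_batch(path, stems):
    """Write one command per line, e.g. to batch.cex."""
    Path(path).write_text("\n".join(mlu5_command(s) for s in stems) + "\n")

print("\n".join(mlu5_command(s) for s in STEMS))
```

Calling write_batch("batch.cex", stems_in(".")) in the working directory would produce a file that CLAN can then run with batch batch.cex. Note that string sorting puts "100" before "14", so sort numerically if your file names need it.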
Playing a linked file • Esc-8 • Esc-A • Control-click • F5
Linking a File - F5 • Cursor on *FAT • Find file • F5 • Press space for each utterance • Save
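During the F5 pass, each press of the space bar ends the current utterance and starts the next, so the press times alone determine every utterance's time span. The Python sketch below shows that bookkeeping. It assumes that a CHAT media bullet is written as begin_end in milliseconds between two U+0015 characters after the utterance terminator; check the CHAT manual before relying on that encoding. The function names and times are hypothetical.

```python
# Assumption: CHAT stores a media link as begin_end (milliseconds) between
# two U+0015 characters at the end of the main tier.
BULLET = "\u0015"

def spans_from_presses(start_ms, presses_ms):
    """Each space-bar press ends the current utterance and starts the next."""
    bounds = [start_ms, *presses_ms]
    if any(b <= a for a, b in zip(bounds, bounds[1:])):
        raise ValueError("press times must be strictly increasing")
    return list(zip(bounds, bounds[1:]))

def link(main_tiers, spans):
    """Append one time bullet to each main tier, in order."""
    if len(main_tiers) != len(spans):
        raise ValueError("need exactly one span per utterance")
    return [f"{tier} {BULLET}{b}_{e}{BULLET}" for tier, (b, e) in zip(main_tiers, spans)]

tiers = ["*FAT:\tlook at that .", "*CHI:\tdoggy !", "*FAT:\tyes , a doggy ."]
for line in link(tiers, spans_from_presses(0, [1350, 2200, 4010])):
    print(repr(line))
```

This also shows why "go back to last good link" works: a missed press shifts every later span, so relinking from the last correct boundary repairs everything after it.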
F5 Tricks • Go back to last good link • Space quickly through contained overlap • If a bullet is missing, cut and paste an old one • For precision, try Sonic Mode
Sonic Mode • Esc-0 to start • Highlight area • Shift-click to move edge • Have cursor on line in file • S to insert time marks • Triple click a linked sentence
Transcribing • Open new window (Command-N) • Insert headers • @Begin • @Languages: en • @Participants: CHI Target_Child, MOT Mother, FAT Father, ROS Brother • @Date • F5 with space at each utterance • Go back and transcribe each bullet (c-click) • Adjust time marks using Esc-A
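The header step can be scripted when you start many files. The Python sketch below builds a skeleton from the headers on the slide, plus the closing @End line. It assumes a tab after each header colon and a DD-MMM-YYYY date. It keeps the slide's "en"; current versions of CHAT may expect three-letter codes such as eng, and CHECK may also ask for @ID lines (which INSERT can add). The function name chat_skeleton is hypothetical.

```python
from datetime import date

def chat_skeleton(language, participants, when):
    """Build a minimal CHAT header block from the slide's headers.

    participants: list of (code, role) pairs, e.g. ("CHI", "Target_Child").
    """
    parts = ", ".join(f"{code} {role}" for code, role in participants)
    return "\n".join([
        "@Begin",
        f"@Languages:\t{language}",
        f"@Participants:\t{parts}",
        f"@Date:\t{when.strftime('%d-%b-%Y').upper()}",  # e.g. 05-JAN-2008
        "@End",
    ]) + "\n"

print(chat_skeleton(
    "en",
    [("CHI", "Target_Child"), ("MOT", "Mother"), ("FAT", "Father"), ("ROS", "Brother")],
    date(2008, 1, 5),
))
```

Transcribed utterances then go between the headers and @End.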
CHECK • CHECK is CRUCIAL • Internal: Esc-L • External: check *.cha • External CHECK provides fuller control
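CHECK enforces far more rules than can be shown here, but a toy version makes clear what kinds of errors it catches. The Python sketch below is hypothetical and is not CLAN's CHECK: it tests only that the file begins with @Begin and ends with @End, that every main tier has a colon followed by a tab, and that every speaker code appears in @Participants.

```python
def mini_check(text):
    """Return a list of problems; an empty list means none of these checks failed."""
    raw = text.splitlines()
    body = [l for l in raw if l.strip() and l != "@UTF8"]  # tolerate an encoding line
    problems = []
    if not body or body[0] != "@Begin":
        problems.append("file must start with @Begin")
    if not body or body[-1] != "@End":
        problems.append("file must end with @End")
    declared = set()
    for l in body:
        if l.startswith("@Participants:"):
            for entry in l.split(":", 1)[1].split(","):
                if entry.split():
                    declared.add(entry.split()[0])  # first field is the speaker code
    for n, l in enumerate(raw, start=1):
        if l.startswith("*"):
            code, sep, rest = l.partition(":")
            if not sep or not rest.startswith("\t"):
                problems.append(f"line {n}: main tier needs ':' plus a tab")
            elif code[1:] not in declared:
                problems.append(f"line {n}: {code[1:]} is not in @Participants")
    return problems

bad = "@Begin\n@Participants:\tCHI Target_Child\n*FAT:\thi .\n*CHI: no .\n"
for problem in mini_check(bad):
    print(problem)
```

Running the real CHECK repeatedly until it reports no errors remains the crucial step.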
Options • Backup • Wrapping • Line Numbers • CHECK
More Options • Line numbers • F5 bullets • SoundAnalyzer
Coder's Editor • Open barry.cha • Esc-0 • Cursor on first line • Open codeshar.cut • %spa • Insert $NIA:AC:IN
Coder's Editor Commands • F1 finish current tier and go to the next • Esc-c finish coding current tier • Esc-t restrict coding to a particular speaker • Esc-Esc go on to the next speaker • Esc-s rotate subcodes • Control-g cancel illegal command
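The Coder's Editor works interactively, choosing codes from codeshar.cut, but its end result is a %spa dependent tier under each coded utterance. The Python sketch below shows only that end result. The mapping from utterance position to code is a hypothetical structure, and $NIA:AC:IN is the code from the slide. The sketch places %spa after any dependent tiers the utterance already has.

```python
def add_spa_codes(text, codes):
    """Add a %spa dependent tier to selected utterances.

    `codes` maps the 0-based position of a main-tier utterance to a code
    string. The %spa line goes after the utterance's existing dependent
    tiers, i.e. just before the next '*' or '@' line.
    """
    out, pending, utt = [], None, -1
    for line in text.splitlines():
        if pending and line.startswith(("*", "@")):
            out.append(pending)  # close off the previous utterance
            pending = None
        out.append(line)
        if line.startswith("*"):
            utt += 1
            if utt in codes:
                pending = f"%spa:\t{codes[utt]}"
    if pending:
        out.append(pending)
    return "\n".join(out)

text = "@Begin\n*MOT:\tlook .\n%com:\tpoints\n*CHI:\tbunny .\n@End"
print(add_spa_codes(text, {0: "$NIA:AC:IN"}))
```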
Send to Praat • Open Praat • Click before the link • Send to Praat • Run the analysis
Searching, Replacing • Control-R, Control-F • Space, No, !, Control-G
Fixing Things • CHSTRING • INSERT (inserts @ID headers) • FIXIT • LONGTIER • FIXBULLETS • REN • COMBTIER
Tour of English MOR Files • Download a copy • A-rules • C-rules • Sf.cut • Lexicon