Socal Workshop 2009 @ UCLA. From linear sequences to abstract structures: Distributional information in infant-direct speech. Hao Wang & Toby Mintz Department of Psychology University of Southern California.
Socal Workshop 2009 @ UCLA. From linear sequences to abstract structures: Distributional information in infant-direct speech. Hao Wang & Toby Mintz, Department of Psychology, University of Southern California. This research was supported in part by a grant from the National Science Foundation (BCS-0721328).
Outline • Introduction • Learning word categories (e.g., noun and verb) is a crucial part of language acquisition • The role of distributional information • Frequent frames (FFs) • Analyses 1 & 2, structures of FFs in child-directed speech • Conclusion and implication
Speakers’ Implicit Knowledge of Categories • Upon hearing: I saw him slich. • Hypothesizing: They slich. He sliches. Johnny was sliching. • Upon hearing: The truff was in the bag. • Hypothesizing: He has two truffs. She wants a truff. Some of the truffs are here.
Distributional Information • The contexts in which a word occurs • Words before and after the target word • Example • the cat is on the mat • Affixes in morphologically rich languages • Cartwright & Brent, 1997; Chemla et al., 2009; Maratsos & Chalkley, 1980; Mintz, 2002, 2003; Redington et al., 1998
Frequent frames (Mintz, 2003) • Two words that frequently co-occur with exactly one word intervening

FRAME        FREQ.
you__it      433
you__to      265
you__the     257
what__you    234
to__it       220
want__to     219
. . .
the__is      79
. . .

• Frame you__it, Peter corpus (Bloom, 1970) • 433 tokens, 93 types, 100% verbs
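The frame definition above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in Mintz (2003): the function name `frame_counts`, the whitespace tokenization, and the toy utterances are all assumptions made for the example, and no real corpus data is used.

```python
from collections import Counter, defaultdict

def frame_counts(utterances):
    """Count A__B frames and record which words fill the middle slot."""
    counts = Counter()
    fillers = defaultdict(Counter)
    for utterance in utterances:
        words = utterance.lower().split()
        # Slide a three-word window within the utterance; the frame is the
        # pair (first word, third word) and the middle word is the target.
        for w1, w2, w3 in zip(words, words[1:], words[2:]):
            frame = (w1, w3)
            counts[frame] += 1
            fillers[frame][w2] += 1
    return counts, fillers

# Hypothetical toy utterances, not taken from any CHILDES corpus.
toy = [
    "you want it",
    "you see it",
    "you want to go",
    "do you want it",
]
counts, fillers = frame_counts(toy)
top_frame, top_count = counts.most_common(1)[0]
tokens = sum(fillers[top_frame].values())  # number of frame tokens
types = len(fillers[top_frame])            # number of distinct middle words
print(top_frame, tokens, types)
```

Ranking `counts` by frequency and keeping the top entries gives a list analogous to the table above; the token and type counts for one frame correspond to figures such as "433 tokens, 93 types".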
Structure of Natural Languages • In contemporary linguistics, sentences are analyzed as hierarchical structures • Word categories are defined by their positions in the hierarchical structure • But FFs are defined over linear sequences • How can they accurately capture abstract structural regularities?
Why are FFs so good at categorizing words? • Is there anything special about the structures associated with FFs? • FFs are manifestations of hierarchically coherent and consistent patterns that largely constrain the possible word categories in the target position.
Analysis 1 • Corpora • The same six child-directed speech corpora from CHILDES (MacWhinney, 2000) as in Mintz (2003) • Labeled with dependency structures (Sagae et al., 2007) • Speech to children before the age of 2;6 • Eve (Brown, 1973), Peter (Bloom, Hood, & Lightbown, 1974; Bloom, Lightbown, & Hood, 1975), Naomi (Sachs, 1983), Nina (Suppes, 1974), Anne (Theakston, Lieven, Pine, & Rowland, 2001), and Aran (Theakston et al., 2001).
Grammatical relations • A dependency structure consists of grammatical relations (GRs) between words in a sentence • Like a phrase structure, it is a representation of structural information (Sagae et al., 2005)
Method • Consistency of structures of FFs • Combination of GRs to represent structure • W1-W3, W1-W2, W2-W3, W1-W2-W3 • Measure • For each FF, the percentage of tokens accounted for by the four most frequent GR patterns • Control • The 45 most frequent unigrams (FUs) • E.g., the__ • [Figure: word positions W1, W2, W3]
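The consistency measure described above can be sketched as follows. This is a minimal illustration under stated assumptions: each frame token is assumed to be already annotated with one GR pattern, a pattern is encoded here as a tuple of labels (a hypothetical encoding, not necessarily the one used in the analysis), and the function name `top_k_coverage` and the toy annotations are invented for the example.

```python
from collections import Counter

def top_k_coverage(patterns, k=4):
    """Percentage of tokens covered by the k most frequent patterns."""
    if not patterns:
        raise ValueError("need at least one token")
    counts = Counter(patterns)
    covered = sum(n for _, n in counts.most_common(k))
    return 100.0 * covered / len(patterns)

# Hypothetical GR patterns for ten tokens of a single frame.
# The labels are placeholders, not real annotations.
toy_patterns = (
    [("SUBJ", "OBJ")] * 5
    + [("SUBJ", "INF")] * 2
    + [("SUBJ", "DET")]
    + [("VOC", "OBJ")]
    + [("SUBJ", "JCT")]
)
print(top_k_coverage(toy_patterns))       # four most frequent patterns
print(top_k_coverage(toy_patterns, k=1))  # single most frequent pattern
```

Applying the same function to the tokens of each frequent unigram would give the control values, so that the two sets of percentages can be compared.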
Results * t(5)=26.97, p<.001
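The reported statistic has five degrees of freedom, which is what a comparison across the six corpora would give. Assuming (the slide does not say so) a paired t-test on per-corpus scores for FFs and FUs, with d_i the FF-minus-FU difference for corpus i and n = 6, the statistic would be computed as:

```latex
\[
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(d_i - \bar{d}\right)^2}, \qquad
t = \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad
\mathrm{df} = n - 1 = 5 .
\]
```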
Analysis 1 Summary • Frequent frames in child-directed speech select very consistent structures, which helps categorize words accurately • Analysis 2: internal organization of frequent frames
Analysis 2 • Same corpora as Analysis 1 • GRs between words in a frame and words outside that frame (external links) and GRs between two words within a frame (internal links) • For each FF type, the number of links per token was computed for each word position • [Figure legend: external links, not counted, internal links]
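The link counts described above can be sketched as follows. Several things here are assumptions rather than details from the slides: a parse is represented as (dependent index, head index) pairs, all three positions W1, W2, W3 are treated as lying inside the frame, and the "not counted" category from the figure legend is not modeled. The function names and the toy parse are hypothetical.

```python
def links_by_position(arcs, start):
    """Count external and internal links for the words at start..start+2.

    arcs: iterable of (dependent_index, head_index); head is None for the root.
    """
    span = {start: "W1", start + 1: "W2", start + 2: "W3"}
    external = {"W1": 0, "W2": 0, "W3": 0}
    internal = {"W1": 0, "W2": 0, "W3": 0}
    for dep, head in arcs:
        if head is None:
            continue  # the root arc does not connect two words
        dep_in, head_in = dep in span, head in span
        if dep_in and head_in:
            # Both ends inside the span: an internal link for both words.
            internal[span[dep]] += 1
            internal[span[head]] += 1
        elif dep_in:
            external[span[dep]] += 1
        elif head_in:
            external[span[head]] += 1
    return external, internal

def average_external_links(tokens):
    """Mean number of external links per token at each position."""
    totals = {"W1": 0, "W2": 0, "W3": 0}
    for arcs, start in tokens:
        external, _ = links_by_position(arcs, start)
        for pos in totals:
            totals[pos] += external[pos]
    return {pos: totals[pos] / len(tokens) for pos in totals}

# Hypothetical parse of "do you want it now" (indices 0..4), every other
# word attached to "want"; the frame you__it starts at index 1.
toy_arcs = [(0, 2), (1, 2), (2, None), (3, 2), (4, 2)]
external, internal = links_by_position(toy_arcs, start=1)
print(external, internal)
print(average_external_links([(toy_arcs, 1)]))
```

Averaging these per-token counts over all tokens of a frame type gives one value per word position, which is the kind of quantity a links-per-token table would report.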
Conclusion & implications • Frequent frames, which are simple linear relations between words, achieve accurate categorization by selecting structurally consistent and coherent environments. • The third word (W3) helps FFs to focus on informative structures • This relation between a linear order pattern and internal structures of languages may be a cue for children to bootstrap into syntax
Thank you! • References • MacWhinney, B. (2000). The CHILDES Project: Tools for Analyzing Talk. Mahwah, NJ: Lawrence Erlbaum Associates. • Mintz, T. H. (2003). Frequent frames as a cue for grammatical categories in child directed speech. Cognition, 90(1), 91-117. • Sagae, K., Lavie, A., & MacWhinney, B. (2005). Automatic measurement of syntactic development in child language. ACL Proceedings. • Sagae, K., Davis, E., Lavie, A., MacWhinney, B., & Wintner, S. (2007). High-accuracy annotation and parsing of CHILDES transcripts. In Proceedings of the ACL-2007 Workshop on Cognitive Aspects of Computational Language Acquisition.
Analysis 2: FF external links • Table 3. Average number of links per token for frequent frames