Institute for Language and Information
Depts. of General Linguistics & Computational Linguistics
Conceptual noun types: grammar and automatic classification
Christian Horn & Christof Rumpf
CTF 07, Düsseldorf
Structure
• The four conceptual noun types and their contextual properties
• Investigation of grammatical properties of the conceptual noun types on the basis of a German text corpus
• A framework for the automatic classification of concept types
• Conclusion
Horn & Rumpf: Conceptual noun types: grammar and automatic classification
Conceptual noun types differ in their referential properties. Do they also differ in their grammatical uses?
1. The four conceptual noun types and their contextual properties
Conceptual noun types (Löbner 1979, 1985, 1998)
Grammatical uses of conceptual noun types
Sortal concepts
A rose is a nice present.
Many roses are an even nicer present.
Individual concepts
The sun is burning.
§A sun is burning.
§The suns are burning.
§Many suns are burning.
§My sun is burning. / §The sun of mine is burning.
§ = use differing from the underlying concept type
Grammatical uses of conceptual noun types
Relational concepts
One of Mary's legs is too short.
§Mary's leg is too short. / §The leg of Mary is too short.
§Many legs of Mary are too short.
Functional concepts
Mary is Peter's mother. / Mary is the mother of Peter.
§Mary is a mother of Peter.
§Mary is the mother.
Contextual properties of conceptual noun types
• grammatical characteristics
  • possessive use: his mother / mother of him
  • definiteness: the sun
  • subcategorization: certain verbs require individual or functional concepts (IC/FC) as complements
• morphological properties: certain nouns are often functional
  • deadjectival nouns (Intelligenz ‘intelligence’)
  • deverbal nouns (Krümmung ‘bend’, Dauer ‘length’)
  • compounds: -wert ‘value’ (Bestwert ‘optimum value’), -grad ‘degree’ (Wirkungsgrad ‘degree of efficiency’), -größe ‘size’ (Kleidergröße ‘dress size’)
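The compound cue can be turned into a simple morphological test. The following is a minimal Python sketch, assuming that a compound head is detected purely by its string suffix; the name `has_functional_head` is hypothetical, and the test only signals that a functional reading is likely, it does not decide the concept type.

```python
# Hypothetical heuristic: compound heads listed above as frequent markers
# of functional nouns (-wert 'value', -grad 'degree', -größe 'size').
FUNCTIONAL_HEADS = ("wert", "grad", "größe")


def has_functional_head(noun: str) -> bool:
    """Return True if the German noun ends in one of the listed compound heads.

    This is a plain string-suffix test, so it is only a cue: it cannot tell a
    real compound head from an accidental ending.
    """
    return noun.lower().endswith(FUNCTIONAL_HEADS)


for word in ["Bestwert", "Wirkungsgrad", "Kleidergröße", "Rose"]:
    print(word, has_functional_head(word))
```

In a classifier such a cue would be one context feature among others rather than a rule on its own.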
2. Investigation of grammatical properties of the conceptual noun types on the basis of a German text corpus
Goals:
• to identify the possible uses of the different concept types and their specific context features
• to develop and implement a method for the automatic classification of concept types in texts based on morphosyntactic features
Hybrid approach:
• semantic and grammatical analysis of the conceptual noun types
• statistical investigation: automatic classification allows the processing of large amounts of data
• the investigation is initially carried out on a German text corpus (108,000 words) that serves as training corpus
• outlook: further research on English, French and Japanese is planned
Predictions
Assumptions:
• For each noun, its lexicalized concept type is the type in which the noun is used most frequently.
• Conceptual noun types occur particularly often in grammatical uses that match their underlying conceptual properties:
  • sortal concepts (rose): singular, plural, with quantifiers, indefinite, …
  • individual concepts (sun): singular, definite
  • relational concepts (leg): indefinite, possessive
  • functional concepts (mother): singular, definite, possessive
• Other uses ('type shifts') are still possible. The conditions under which these type shifts occur remain to be investigated.
Counting (selection, 'definiteness')
1 definite: definite determiner, possessive pronoun, genitive pronoun, d-Prep, d-selb, d-einzig, genitive (deren/dessen), d-jen
2 quantifiers/indefinite: quantifiers, indefinite determiner, demonstratives, numbers, kein, d-beid, d-ord
3 null determiner
4 incl. -1
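Counting the selection categories per concept type amounts to a frequency table. This Python sketch assumes that the `art=def`, `art=indef` and `art=none` values used in the training sample of section 3 map onto categories 1, 2 and 3; the mapping and the function name `selection_profile` are hypothetical, and the twelve pairs are the nouns of that training-sample excerpt, used as toy input rather than as corpus results.

```python
from collections import Counter, defaultdict

# Assumed mapping from the art=... values of the training sample onto the
# counting categories: 1 definite, 2 quantifier/indefinite, 3 null determiner.
CATEGORY = {"def": 1, "indef": 2, "none": 3}


def selection_profile(pairs):
    """Count determiner categories per concept type.

    pairs: iterable of (concept type, art value), e.g. ("r2", "def").
    Returns {concept type: Counter({category: count})}.
    """
    profile = defaultdict(Counter)
    for concept, art in pairs:
        profile[concept][CATEGORY[art]] += 1
    return profile


# (concept type, art) values of the twelve nouns in the training-sample excerpt
PAIRS = [("f1", "def"), ("r2", "def"), ("f1", "def"), ("f2", "none"),
         ("r2", "indef"), ("f2", "none"), ("r2", "def"), ("so", "indef"),
         ("r2", "def"), ("f2", "none"), ("r2", "indef"), ("f2", "def")]

profile = selection_profile(PAIRS)
print(dict(profile["r2"]))  # {1: 3, 2: 2}
print(dict(profile["f2"]))  # {3: 3, 1: 1}
```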
Results (selection)
The results so far confirm our predictions.
Tasks & Challenges
• Type shifts in certain readings: The meaning of the word. (FC) / The word bottle has many meanings. (RC)
• Generic and anaphoric uses: The lightbulb was invented by Heinrich Göbel. (generic)
• Polysemy
• Analysis of possessive constructions, plurals and null determiners
3. A framework for the automatic classification of concept types
• Architecture
• Training corpus
• Morphosyntactic analysis
• Training sample
• Computing classifiers
• Maximum entropy models
• Conclusion
Architecture of the framework
• learning: training corpus (manual annotation of concept types) → morphosyntactic analysis (msyn: dependency grammar parser) → extraction of relevant context features → training sample → Generalized Iterative Scaling → maximum entropy model
• application: test corpus → morphosyntactic analysis → extraction of relevant context features → test sample → application of the classifier (maximum entropy model) → annotated test corpus
Training corpus
• Manually annotated version of Löbner (2003), Semantik
• Concept types of nouns are marked with tags:
Die <f1>Semantik</f1> ist das <r2>Teilgebiet</r2> der <f2>Linguistik</f2>, das sich mit <r2>Bedeutung</r2> befasst. Diese <r2>Art</r2> von <f2>Definition</f2> mag vielleicht ihrem <r2>Freund</r2> genügen, der Sie zufällig mit diesem <so>Buch </so> in der <r2>Hand</r2> sieht und Sie fragt, was denn nun schon wieder sei, aber als <f2>Autor</f2> einer solchen <r2>Einführung</r2> muss ich natürlich präziser erklären, was der <f2>Gegenstand</f2> dieser <so>Wissenschaft</so> ist.
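Reading the annotation back out of such a text is a matter of matching paired tags. The following Python sketch assumes that every tag consists of lowercase letters plus an optional digit and is never nested; the pattern `TAG` and the function `annotated_nouns` are hypothetical helpers, not part of the described framework.

```python
import re

# Hypothetical pattern for inline tags such as <f1>Semantik</f1>: an opening
# tag of letters plus an optional digit, the noun, and the matching closing
# tag (\1 back-references the opening tag name). Surrounding spaces are trimmed.
TAG = re.compile(r"<([a-z]+[0-9]?)>\s*([^<]+?)\s*</\1>")


def annotated_nouns(text):
    """Return (concept-type tag, noun) pairs in order of occurrence."""
    return [(m.group(1), m.group(2)) for m in TAG.finditer(text)]


sample = ("Die <f1>Semantik</f1> ist das <r2>Teilgebiet</r2> der "
          "<f2>Linguistik</f2>, das sich mit <r2>Bedeutung</r2> befasst. "
          "Er sieht Sie mit diesem <so>Buch </so> in der <r2>Hand</r2>.")
print(annotated_nouns(sample))
```

The trailing space in `<so>Buch </so>` is absorbed by the `\s*` before the closing tag, so the noun comes out as `Buch`.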
Morphosyntactic analysis
• We use Connexor's msyn to analyse German texts (www.connexor.com).
• Syntactic information is represented as dependency trees.
• Morphological features include part of speech, gender, number, case, tense, mood and others.
• We do some postprocessing ourselves, e.g. to add definiteness markers.
Dependency tree
Die Semantik ist das Teilgebiet der Linguistik, … ('Semantics is the branch of linguistics …')
main: ist
    subj: Semantik
        det: Die (Def)
    comp: Teilgebiet
        det: das (Def)
        mod: Linguistik (Gen) [possessor]
            det: der (Def)
Output of Connexor's msyn
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE analysis SYSTEM "http://www.connexor.com/dtds/4.0/fdg3.dtd">
<analysis>
  <sentence id="w1">
    <token id="w2">
      <text>Die</text>
      <lemma>die</lemma>
      <depend head="w3">det</depend>
      <tags><syntax>PREMOD</syntax><morpho>DET Def FEM SG NOM</morpho></tags>
    </token>
    <token id="w3">
      <text>Semantik</text>
      <lemma>semantik</lemma>
      <depend head="w4">subj</depend>
      <tags><syntax>NH</syntax><morpho>N FEM SG NOM</morpho></tags>
    </token>
    <token id="w4">
      <text>ist</text>
      <lemma>sein</lemma>
      <depend head="w1">main</depend>
      <tags><syntax>MAIN</syntax><morpho>V IND PRES SG P3</morpho></tags>
    </token>
    <token id="w5">
      <text>das</text>
      <lemma>das</lemma>
      <depend head="w6">det</depend>
      <tags><syntax>PREMOD</syntax><morpho>DET Def NEU SG NOM</morpho></tags>
    </token>
    <token id="w6">
      <text>Teilgebiet</text>
      <lemma>teil#gebiet</lemma>
      <depend head="w4">comp</depend>
      <tags><syntax>NH</syntax><morpho>N NEU SG NOM</morpho></tags>
    </token>
    <token id="w7">
      <text>der</text>
      <lemma>die</lemma>
      <depend head="w8">det</depend>
      <tags><syntax>PREMOD</syntax><morpho>DET Def FEM SG GEN</morpho></tags>
    </token>
    <token id="w8">
      <text>Linguistik</text>
      <lemma>linguistik</lemma>
      <depend head="w6">mod</depend>
      <tags><syntax>NH</syntax><morpho>N FEM SG GEN</morpho></tags>
    </token>
    <!-- … remaining tokens of the sentence -->
  </sentence>
</analysis>
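This output format can be read with Python's standard `xml.etree.ElementTree`. The sketch below is an illustration, not Connexor's own API: it assumes only the element names visible in the msyn output above (`token`, `text`, `lemma`, `depend`, `tags/morpho`), drops the XML and DOCTYPE declarations so the excerpt parses as a plain string, and uses hypothetical function names. It rebuilds the dependency links and lists the determiners attached to each noun.

```python
import xml.etree.ElementTree as ET

# Excerpt of the msyn output (tokens w2-w8) without XML/DOCTYPE declarations.
MSYN = """<analysis><sentence id="w1">
<token id="w2"><text>Die</text><lemma>die</lemma><depend head="w3">det</depend>
<tags><syntax>PREMOD</syntax><morpho>DET Def FEM SG NOM</morpho></tags></token>
<token id="w3"><text>Semantik</text><lemma>semantik</lemma><depend head="w4">subj</depend>
<tags><syntax>NH</syntax><morpho>N FEM SG NOM</morpho></tags></token>
<token id="w4"><text>ist</text><lemma>sein</lemma><depend head="w1">main</depend>
<tags><syntax>MAIN</syntax><morpho>V IND PRES SG P3</morpho></tags></token>
<token id="w5"><text>das</text><lemma>das</lemma><depend head="w6">det</depend>
<tags><syntax>PREMOD</syntax><morpho>DET Def NEU SG NOM</morpho></tags></token>
<token id="w6"><text>Teilgebiet</text><lemma>teil#gebiet</lemma><depend head="w4">comp</depend>
<tags><syntax>NH</syntax><morpho>N NEU SG NOM</morpho></tags></token>
<token id="w7"><text>der</text><lemma>die</lemma><depend head="w8">det</depend>
<tags><syntax>PREMOD</syntax><morpho>DET Def FEM SG GEN</morpho></tags></token>
<token id="w8"><text>Linguistik</text><lemma>linguistik</lemma><depend head="w6">mod</depend>
<tags><syntax>NH</syntax><morpho>N FEM SG GEN</morpho></tags></token>
</sentence></analysis>"""


def read_tokens(xml_text):
    """Map token id -> dict with text, lemma, head id, relation and morphology tags."""
    root = ET.fromstring(xml_text)
    tokens = {}
    for tok in root.iter("token"):
        dep = tok.find("depend")
        tokens[tok.get("id")] = {
            "text": tok.findtext("text"),
            "lemma": tok.findtext("lemma"),
            "head": dep.get("head"),
            "rel": dep.text,
            "morpho": tok.findtext("tags/morpho").split(),
        }
    return tokens


def nouns_with_determiners(tokens):
    """For each noun (morpho tags start with N), list the determiners depending on it."""
    return {
        t["text"]: [d["text"] for d in tokens.values()
                    if d["head"] == tid and d["rel"] == "det"]
        for tid, t in tokens.items() if t["morpho"][0] == "N"
    }


print(nouns_with_determiners(read_tokens(MSYN)))
# {'Semantik': ['Die'], 'Teilgebiet': ['das'], 'Linguistik': ['der']}
```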
Training sample
Relevant contextual features are extracted in Perl, using regular expressions matched against the dependency trees. The result is a list of pairs (concept type, list of context features):
(f1, [tnr=2, tok=semantik, suff=ik, num=sg, art=def])
(r2, [tnr=5, tok=teilgebiet, num=sg, art=def, poss=rgen])
(f1, [tnr=7, tok=linguistik, suff=ik, num=sg, art=def])
(f2, [tnr=12, tok=bedeutung, suff=ung, num=sg, art=none])
(r2, [tnr=16, tok=art, num=sg, art=indef, poss=von])
(f2, [tnr=18, tok=definition, num=sg, art=none])
(r2, [tnr=22, tok=freund, num=sg, art=def])
(so, [tnr=30, tok=buch, num=sg, art=indef])
(r2, [tnr=33, tok=hand, num=sg, art=def])
(f2, [tnr=49, tok=autor, num=sg, art=none])
(r2, [tnr=52, tok=einführung, suff=ung, num=sg, art=indef])
(f2, [tnr=61, tok=gegenstand, num=sg, art=def])
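The authors extract these features in Perl with regular expressions; the following is a Python sketch of the same idea that walks the dependency tree directly instead. The rules for `suff`, `art` and `poss=rgen` are assumptions reconstructed from the first three pairs above (for instance, `poss=rgen` is read as "genitive modifier to the right of the noun"), and `SENT`, `SUFFIXES` and `context_features` are hypothetical names. The tree is a hand-coded version of the msyn analysis of the first clause.

```python
# "Die Semantik ist das Teilgebiet der Linguistik", hand-coded from msyn:
# list position = word number - 1, head = list position of the governor
# (None for the main verb).
SENT = [
    {"lemma": "die", "morpho": {"DET", "Def", "FEM", "SG", "NOM"}, "head": 1, "rel": "det"},
    {"lemma": "semantik", "morpho": {"N", "FEM", "SG", "NOM"}, "head": 2, "rel": "subj"},
    {"lemma": "sein", "morpho": {"V", "IND", "PRES", "SG", "P3"}, "head": None, "rel": "main"},
    {"lemma": "das", "morpho": {"DET", "Def", "NEU", "SG", "NOM"}, "head": 4, "rel": "det"},
    {"lemma": "teil#gebiet", "morpho": {"N", "NEU", "SG", "NOM"}, "head": 2, "rel": "comp"},
    {"lemma": "die", "morpho": {"DET", "Def", "FEM", "SG", "GEN"}, "head": 6, "rel": "det"},
    {"lemma": "linguistik", "morpho": {"N", "FEM", "SG", "GEN"}, "head": 4, "rel": "mod"},
]

SUFFIXES = ("ik", "ung")  # hypothetical suffix list (cf. suff=ik, suff=ung)


def context_features(tokens, i):
    """Return the context features of the noun at list position i."""
    t = tokens[i]
    lemma = t["lemma"].replace("#", "")  # msyn marks compound boundaries with '#'
    feats = [f"tnr={i + 1}", f"tok={lemma}"]
    suffix = next((s for s in SUFFIXES if lemma.endswith(s)), None)
    if suffix:
        feats.append(f"suff={suffix}")
    feats.append("num=" + ("sg" if "SG" in t["morpho"] else "pl"))
    deps = [j for j, d in enumerate(tokens) if d["head"] == i]
    dets = [tokens[j] for j in deps if tokens[j]["rel"] == "det"]
    if not dets:
        feats.append("art=none")
    else:
        feats.append("art=def" if "Def" in dets[0]["morpho"] else "art=indef")
    # assumed reading of poss=rgen: a genitive modifier to the right of the noun
    if any(tokens[j]["rel"] == "mod" and "GEN" in tokens[j]["morpho"] and j > i
           for j in deps):
        feats.append("poss=rgen")
    return feats


for i in (1, 4, 6):
    print(context_features(SENT, i))
```

For the three nouns of the clause this reproduces the first three feature lists of the training sample.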
Automatic classification
• given:
  • training sample $t = \{(a_1,b_1),\dots,(a_n,b_n)\}$
  • classes $a_i \in \{f1, f2, r1, r2\}$
  • contexts $b_i = \{m_1,\dots,m_l\}$
  • features $m_i \in \{art{=}def,\ art{=}indef,\ poss{=}lgen,\ \dots\}$
• wanted:
  • classifier $p(a \mid b)$: how probable is class $a$ given context $b$?
  • maximal argument $a' = \arg\max_a p(a \mid b)$: which class $a'$ is the most probable given context $b$?
Computing a (bad) classifier
• simplest account: counting co-occurrences of classes and contexts in $t$:
  $p(a \mid b) = \frac{\mathrm{count}(a,b)}{\mathrm{count}(b)}$
• shortcomings:
  • Only the contexts in $t$ are learned.
  • Varying degrees of evidence of single features are disregarded.
• way out:
  • Computing the classifier with a maximum entropy model.
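A counting classifier of this kind takes only a few lines, and a small run makes both shortcomings visible. This Python sketch assumes contexts reduced to the `num` and `art` features of the twelve training-sample pairs; a context is treated as one indivisible set of features, and all function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_counts(sample):
    """Count how often each class occurs with each (complete) context."""
    counts = defaultdict(Counter)
    for a, b in sample:
        counts[frozenset(b)][a] += 1
    return counts


def p(counts, a, b):
    """Relative frequency count(a, b) / count(b); None for an unseen context."""
    c = counts.get(frozenset(b))
    if not c:
        return None
    return c[a] / sum(c.values())


def classify(counts, b):
    """arg max_a p(a|b), or None if the context b never occurred in training."""
    c = counts.get(frozenset(b))
    return max(c, key=c.get) if c else None


ART = ["def", "def", "def", "none", "indef", "none",
       "def", "indef", "def", "none", "indef", "def"]
CLS = ["f1", "r2", "f1", "f2", "r2", "f2", "r2", "so", "r2", "f2", "r2", "f2"]
sample = [(a, {"num=sg", "art=" + x}) for a, x in zip(CLS, ART)]
counts = train_counts(sample)

print(classify(counts, {"num=sg", "art=def"}))   # r2
print(p(counts, "r2", {"num=sg", "art=def"}))    # 0.5
print(classify(counts, {"num=pl", "art=def"}))   # None
```

The last call returns `None` although `art=def` is well attested: the plural context never occurs in $t$, and the evidence of the single feature `art=def` is not used on its own.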
Maximum entropy models
• Basics
  • Entropy: the average number of bits needed to encode the outcome of an event of a particular type (tossing a fair coin: 1 bit; rolling a fair die: log2 6 ≈ 2.58 bits).
  • Principle of maximum entropy: among all models consistent with the training data, choose the one with maximum entropy, i.e. don't go beyond the data.
• Specific features
  • Decomposition of contexts into single context features or their combinations.
  • Possibility to combine features from heterogeneous sources (e.g. syntax, semantics, morphology, …).
  • Computation of the weights (evidence) of single features or their combinations for every class over all contexts.
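The entropy values for the coin and the die follow directly from the Shannon formula $H = -\sum_x p(x)\log_2 p(x)$. This short Python sketch computes them and adds a loaded die (a hypothetical distribution) to show that any departure from the uniform distribution lowers the entropy, which is why a maximum entropy model stays as uniform as the data allow.

```python
import math


def entropy(probs):
    """Shannon entropy in bits: -sum p log2 p (terms with p = 0 contribute 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


coin = [1 / 2] * 2
die = [1 / 6] * 6
loaded_die = [1 / 2] + [1 / 10] * 5   # hypothetical biased die

print(entropy(coin))                       # 1.0
print(round(entropy(die), 2))              # 2.58
print(entropy(loaded_die) < entropy(die))  # True
```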
Contextual and binary features
The weights for contextual features are determined indirectly via binary features, which relate classes to contextual features.
• simple binary features relate a class $x$ to a single contextual feature $m$:
  $f_j(a,b) = 1$ if $a = x$ and $m \in b$, otherwise $f_j(a,b) = 0$
• complex binary features relate a class $x$ to a combination of contextual features $m_1, m_2$:
  $f_j(a,b) = 1$ if $a = x$ and $m_1 \in b$ and $m_2 \in b$, otherwise $f_j(a,b) = 0$
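Binary features of both kinds can be written as small closures over a class and one or more contextual features. In this Python sketch the helper names and the two feature instances are hypothetical choices for illustration; the context `b` is the feature set of "Teilgebiet" from the training sample, without `tnr` and `tok`.

```python
def simple_feature(cls, m):
    """Binary feature f(a, b) = 1 iff a == cls and contextual feature m is in b."""
    def f(a, b):
        return 1 if a == cls and m in b else 0
    return f


def complex_feature(cls, ms):
    """Binary feature f(a, b) = 1 iff a == cls and all contextual features ms are in b."""
    def f(a, b):
        return 1 if a == cls and all(m in b for m in ms) else 0
    return f


# Illustrative instances (hypothetical choices)
f_def = simple_feature("f2", "art=def")
f_def_poss = complex_feature("r2", ["art=def", "poss=rgen"])

b = {"num=sg", "art=def", "poss=rgen"}   # context of "Teilgebiet"
print(f_def("f2", b), f_def("r2", b))                      # 1 0
print(f_def_poss("r2", b), f_def_poss("r2", {"art=def"}))  # 1 0
```

A complex feature fires only when all of its contextual features co-occur, so it can receive a weight of its own that differs from the combined weights of the corresponding simple features.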
Maximum entropy framework (cf. Ratnaparkhi 1998)
$$p(a \mid b) = \frac{1}{Z(b)} \prod_{j=1}^{k} \alpha_j^{f_j(a,b)}, \qquad Z(b) = \sum_{a} \prod_{j=1}^{k} \alpha_j^{f_j(a,b)}$$
where $\alpha_j > 0$ is the weight of feature $f_j$, $k$ is the total number of binary features, and $Z(b)$ is a normalization constant ensuring that $\sum_a p(a \mid b) = 1$ (i.e. 100%).
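The product formula can be evaluated directly once weights are known. This Python sketch represents each binary feature as a (class, contextual feature) pair, i.e. a simple binary feature; the two features and their weights are hypothetical values chosen only to make the arithmetic easy to follow ($3/(3+2) = 0.6$).

```python
import math


def maxent_prob(classes, features, alphas, b):
    """p(a|b) = prod_j alpha_j ** f_j(a, b) / Z(b) for every class a.

    features: list of (class, contextual feature) pairs; the binary feature
    f_j(a, b) is 1 iff a equals the pair's class and its contextual feature is in b.
    """
    scores = {
        a: math.prod(alpha ** (1 if (a == cls and m in b) else 0)
                     for (cls, m), alpha in zip(features, alphas))
        for a in classes
    }
    z = sum(scores.values())  # normalization constant Z(b)
    return {a: s / z for a, s in scores.items()}


# Hypothetical features and weights, chosen only to illustrate the formula
features = [("f2", "art=def"), ("r2", "poss=rgen")]
alphas = [3.0, 2.0]
print(maxent_prob(["f2", "r2"], features, alphas, {"art=def", "poss=rgen"}))
# {'f2': 0.6, 'r2': 0.4}
```

A class with no active feature still receives the score $\prod_j \alpha_j^0 = 1$, so no class gets probability zero merely because its context features were never seen.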
Generalized Iterative Scaling
Unfortunately, there is no analytical method to determine the weights $\alpha_j$. There are, however, iterative approximation algorithms that determine the $\alpha_j$, converge to the 'correct' $p(a \mid b)$ and respect the principle of maximum entropy. We use Generalized Iterative Scaling (GIS):
• initialization: $\alpha_j^{(0)} = 1$
• iteration:
$$\alpha_j^{(n+1)} = \alpha_j^{(n)} \left( \frac{E_{\tilde{p}} f_j}{E_{p^{(n)}} f_j} \right)^{1/C}$$
where $E_{\tilde{p}} f_j$ is the expectation value of feature $f_j$ in the training corpus and $E_{p^{(n)}} f_j$ is its expectation value under the model of the previous iteration. The constant $C$ is the number of 'active' binary features per pair $(a,b)$: GIS requires $\sum_{j=1}^{k} f_j(a,b) = C$ for all $a$ and $b$, which is usually achieved by taking $C = \max_{a,b} \sum_{j=1}^{k} f_j(a,b)$ and adding a correction feature that makes up the difference.
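The GIS loop can be sketched compactly for conditional models in the style of Ratnaparkhi (1998). This Python version is not the authors' implementation; its assumptions are: binary features are all (class, contextual feature) pairs seen in the training sample, a correction feature makes the feature sum constant, model expectations are averaged over the training contexts, and 200 iterations are run. $C$ is padded by one above the maximum, a valid choice for GIS that keeps the correction feature's expectations positive at the cost of slightly slower convergence. The training data are the `num`/`art` features of the twelve training-sample pairs.

```python
import math


def train_gis(sample, iterations=200):
    """Fit p(a|b) with Generalized Iterative Scaling.

    sample: list of (class, set of contextual features) pairs.
    Returns a function mapping a context b to {class: p(a|b)}.
    """
    classes = sorted({a for a, _ in sample})
    feats = sorted({(a, m) for a, b in sample for m in b})
    index = {f: j for j, f in enumerate(feats)}
    n = len(sample)

    def active(a, b):
        # indices of the binary features f_j with f_j(a, b) = 1
        return [index[(a, m)] for m in b if (a, m) in index]

    # correction feature fills each (a, b) up to C; C padded by one
    C = max(len(active(a, b)) for _, b in sample for a in classes) + 1
    corr = len(feats)                 # index of the correction feature
    alpha = [1.0] * (len(feats) + 1)  # initialization: alpha_j = 1

    def prob(b):
        scores = {}
        for a in classes:
            act = active(a, b)
            scores[a] = math.prod(alpha[j] for j in act) * alpha[corr] ** (C - len(act))
        z = sum(scores.values())      # normalization constant Z(b)
        return {a: s / z for a, s in scores.items()}

    # empirical expectation of every feature in the training sample
    emp = [0.0] * len(alpha)
    for a, b in sample:
        act = active(a, b)
        for j in act:
            emp[j] += 1 / n
        emp[corr] += (C - len(act)) / n

    for _ in range(iterations):
        # model expectation under the current weights, over training contexts
        model = [0.0] * len(alpha)
        for _, b in sample:
            for a, pa in prob(b).items():
                act = active(a, b)
                for j in act:
                    model[j] += pa / n
                model[corr] += pa * (C - len(act)) / n
        for j in range(len(alpha)):
            alpha[j] *= (emp[j] / model[j]) ** (1 / C)
    return prob


ART = ["def", "def", "def", "none", "indef", "none",
       "def", "indef", "def", "none", "indef", "def"]
CLS = ["f1", "r2", "f1", "f2", "r2", "f2", "r2", "so", "r2", "f2", "r2", "f2"]
sample = [(a, {"num=sg", "art=" + x}) for a, x in zip(CLS, ART)]
predict = train_gis(sample)

dist = predict({"num=sg", "art=def"})
print(max(dist, key=dist.get))  # r2
```

Unlike the counting classifier, the trained model also assigns a distribution to contexts that never occurred in training, because each contextual feature carries its own weight.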
4. Conclusion
• The investigations so far support the assumption that the referential properties of the concept types match their grammatical uses.
• The maximum entropy framework allows a fine-grained analysis of the evidence that a single context feature contributes to the classification.
• The selection of relevant features is essential for the success of the automatic classification; examining these features is therefore a major part of our research.
• We are starting experiments with complex features to model the combined evidence of context features.