Resources for paraphrase detection Caroline Hagège Caroline.Hagege@xrce.xerox.com Caroline Brun Caroline.Brun@xrce.xerox.com
Resource types • Derivational morphology • Deep syntax • Domain-specific resources
1. Derivational morphology
• Use of the CELEX database (distributed by the LDC)
  • http://www.kun.nl/celex/index.html
• Hand-made revision of the extracted pairs to classify the type of relation (predicate) holding between them
• Automatic extraction of verbs and their corresponding deverbal nouns
  • Suffixes: +OR, +ER, +ION
• Predicates relating nouns and verbs from the same morphological family (~1600 pairs)
• Predicate types:
  • S0: the noun paraphrases the action expressed by the verb, e.g. S0(acceleration,accelerate)
  • S1H: the noun corresponds to the first actant of the action expressed by the verb and has a human:+ feature, e.g. S1H(writer,write)
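The suffix-driven extraction step can be pictured as a candidate generator: strip a nominal ending, rebuild a verb form, and keep the pair only if that verb is attested. The following is a minimal Python sketch of that idea, not the authors' procedure. The rewrite rules in SUFFIX_RULES, the function name candidate_pairs and the toy word lists are all hypothetical, and a CELEX-based extraction would rely on the database's own morphological analyses. The output would still need the hand-made revision that assigns a predicate type.

```python
from __future__ import annotations

from typing import Iterable

# Hypothetical (noun ending, verb ending) rewrite rules covering the
# +ION, +ER and +OR suffixes; not CELEX's own segmentation.
SUFFIX_RULES: list[tuple[str, str]] = [
    ("ation", "ate"),  # acceleration -> accelerate
    ("ation", ""),     # affirmation -> affirm
    ("ion", "e"),      # creation -> create
    ("ion", ""),       # action -> act
    ("er", "e"),       # writer -> write
    ("er", ""),        # reader -> read
    ("or", "e"),       # creator -> create
    ("or", ""),        # actor -> act
]


def candidate_pairs(nouns: Iterable[str], verbs: set[str]) -> set[tuple[str, str]]:
    """Return (noun, verb) pairs where rewriting a noun suffix yields a known verb."""
    pairs: set[tuple[str, str]] = set()
    for noun in nouns:
        for noun_end, verb_end in SUFFIX_RULES:
            if noun.endswith(noun_end) and len(noun) > len(noun_end):
                verb = noun[: -len(noun_end)] + verb_end
                if verb in verbs:  # keep only attested verbs
                    pairs.add((noun, verb))
    return pairs


# Toy word lists (hypothetical, for illustration only).
verbs = {"accelerate", "affirm", "write", "abbreviate", "act"}
nouns = ["acceleration", "affirmation", "writer", "abbreviation", "water"]
for pair in sorted(candidate_pairs(nouns, verbs)):
    print(pair)
```

Keeping the spelling adjustments (writ-er / writ-e) in an explicit rule table makes them easy to audit, and checking against a verb list filters accidental matches such as "water".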
1. Derivational morphology (cont'd)
  • S1NH: the noun corresponds to the first actant of the action expressed by the verb and has a human:~ (non-human) feature, e.g. S1NH(abbreviation,abbreviate)
  • S2: the noun corresponds to the second actant of the action expressed by the verb, e.g. S2(affirmation,affirm)
• Automatic extraction of nouns and their corresponding adjectives
  • Suffix: +AN
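Once revised, each pair carries one of the four predicate types. The Python sketch below stores the slides' example pairs and shows one way a normalization rule might use them. It assumes that a noun typed S1H or S1NH stands for the subject of its verb and that an S2 noun stands for its object, while an S0 noun denotes the action itself. That actant-to-SUBJ-N/OBJ-N mapping and the names PredType, DerivPair, LEXICON and verbal_relation are hypothetical, chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class PredType(Enum):
    S0 = "noun paraphrases the action"
    S1H = "noun is the first actant, human:+"
    S1NH = "noun is the first actant, human:~"
    S2 = "noun is the second actant"


@dataclass(frozen=True)
class DerivPair:
    noun: str
    verb: str
    kind: PredType


# The four example pairs from the slides; the full resource has ~1600 pairs.
LEXICON: dict[str, DerivPair] = {
    p.noun: p
    for p in [
        DerivPair("acceleration", "accelerate", PredType.S0),
        DerivPair("writer", "write", PredType.S1H),
        DerivPair("abbreviation", "abbreviate", PredType.S1NH),
        DerivPair("affirmation", "affirm", PredType.S2),
    ]
}


def verbal_relation(noun: str, referent: str) -> tuple[str, str, str] | None:
    """Map 'referent is a <noun>' to a verb-centred relation (hypothetical mapping).

    Assumption: first actant -> SUBJ-N, second actant -> OBJ-N.
    S0 nouns denote the action itself, so no participant relation is produced.
    """
    pair = LEXICON.get(noun)
    if pair is None or pair.kind is PredType.S0:
        return None
    if pair.kind in (PredType.S1H, PredType.S1NH):
        return ("SUBJ-N", pair.verb, referent)
    return ("OBJ-N", pair.verb, referent)


print(verbal_relation("writer", "Peter"))  # ('SUBJ-N', 'write', 'Peter')
```

The human:+ / human:~ distinction stays on each entry so that other rules can still test it, even though this particular mapping treats both S1 types alike.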
2. Deep syntax
• Use of the COMLEX lexicon (Grishman et al. 1994) to extract the logical subjects and objects of infinitives
  • Example 1: “He ordered Peter to go”
    SUBJ-N(order,he), OBJ-N(order,Peter), SUBJ-N(go,Peter)
  • Example 2: “He promised Peter to go”
    SUBJ-N(promise,he), OBJ-N(promise,Peter), SUBJ-N(go,he)
• Active-passive transformation
• Use of verb class alternations (Levin 1993)
  • Example 3: “Acetone burns easily”
    SUBJ-N(burn,VARIABLE), OBJ-N(burn,acetone)
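The two control examples differ only in which matrix argument becomes the subject of the infinitive, and the middle example moves the surface subject into object position. Here is a minimal Python sketch of these normalizations. It assumes a hypothetical two-entry CONTROL table in place of the COMLEX features and a hypothetical one-verb MIDDLE_VERBS set in place of Levin's alternation classes. The function names are illustrative, and normalize_passive (including its use of VARIABLE for a missing agent) is an assumption modelled on the middle example, not something the slides spell out.

```python
from __future__ import annotations

# Hypothetical stand-in for COMLEX control information:
# "object"  -> the matrix object is the understood subject of the infinitive,
# "subject" -> the matrix subject is.
CONTROL = {"order": "object", "promise": "subject"}

# Hypothetical stand-in for a Levin (1993) class allowing the middle alternation.
MIDDLE_VERBS = {"burn"}


def normalize_infinitive(verb: str, subj: str, obj: str, infinitive: str) -> list[tuple[str, str, str]]:
    """Return deep relations for 'subj verb obj to infinitive'."""
    rels = [("SUBJ-N", verb, subj), ("OBJ-N", verb, obj)]
    control = CONTROL.get(verb)
    if control == "object":
        rels.append(("SUBJ-N", infinitive, obj))
    elif control == "subject":
        rels.append(("SUBJ-N", infinitive, subj))
    # Verbs missing from the table get no inferred infinitive subject.
    return rels


def normalize_passive(verb: str, surface_subj: str, agent: str | None = None) -> list[tuple[str, str, str]]:
    """Map a passive clause to active-voice relations; a missing agent becomes VARIABLE (assumption)."""
    return [("SUBJ-N", verb, agent or "VARIABLE"), ("OBJ-N", verb, surface_subj)]


def normalize_middle(verb: str, surface_subj: str) -> list[tuple[str, str, str]]:
    """Treat an intransitive use of a middle-alternating verb as a middle.

    Simplification: a real grammar would also check cues such as a manner
    adverb ('easily') before choosing this analysis.
    """
    if verb in MIDDLE_VERBS:
        return [("SUBJ-N", verb, "VARIABLE"), ("OBJ-N", verb, surface_subj)]
    return [("SUBJ-N", verb, surface_subj)]


print(normalize_infinitive("order", "he", "Peter", "go"))
print(normalize_infinitive("promise", "he", "Peter", "go"))
print(normalize_middle("burn", "acetone"))
```

The three calls at the end reproduce the relations listed for Examples 1 to 3. Because the control type sits in a lookup table, the rule code stays the same for every verb and only the lexicon grows.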
The general normalization grammar requires about 120 rules exploiting the derivational morphology and deep syntax resources.
3. Domain-specific resources
• Hand-made resources, directly encoded as XIP rules
• Creation of specific relations between lexical items (about 30 relations)
  • SYNONYMY relation, e.g. odor-smell
  • HASN relation, e.g. evaporate-volatility
  • TURNTO relation, e.g. evaporate-vapor
  • ISAJ relation, e.g. burn-burnable
• Elaboration of XIP rules exploiting these relations and the normalized syntactic analysis (about 150 rules)
    If ( SUBJ-N(#1[lem:have],#2) & OBJ-N(#1,#3) & HASN(#4,#3) )
       PROPERTY(#2,#4)
• This rule gives equivalent representations to “X has volatility” and “X evaporates”, “X has flammability” and “X burns”, etc.
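The PROPERTY rule can be emulated over a set of dependency triples. In this Python sketch, nodes are identified by their lemma, which simplifies XIP's #n variables. The HASN table holds only the two pairs mentioned on the slide. apply_verbal_property_rule is a hypothetical companion rule, not shown on the slide, that lets the verbal form yield the same PROPERTY relation so that the two paraphrases can be compared. All function names are illustrative.

```python
from __future__ import annotations

# HASN(verb, noun): evaporate-volatility is from the slide,
# burn-flammability from its "X has flammability / X burns" example.
HASN = {("evaporate", "volatility"), ("burn", "flammability")}


def apply_property_rule(deps: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """If SUBJ-N(have,#2) & OBJ-N(have,#3) & HASN(#4,#3) then PROPERTY(#2,#4).

    Simplification: every 'have' shares one lemma-based node, so all
    have-subjects are combined with all have-objects.
    """
    out = set(deps)
    subjects = [a2 for rel, a1, a2 in deps if rel == "SUBJ-N" and a1 == "have"]
    objects = [a2 for rel, a1, a2 in deps if rel == "OBJ-N" and a1 == "have"]
    for subj in subjects:
        for obj in objects:
            for verb, noun in HASN:
                if noun == obj:
                    out.add(("PROPERTY", subj, verb))
    return out


def apply_verbal_property_rule(deps: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Hypothetical companion rule (not on the slide):
    if SUBJ-N(#4,#2) and #4 is the verb of some HASN pair, then PROPERTY(#2,#4)."""
    hasn_verbs = {verb for verb, _ in HASN}
    out = set(deps)
    for rel, a1, a2 in deps:
        if rel == "SUBJ-N" and a1 in hasn_verbs:
            out.add(("PROPERTY", a2, a1))
    return out


def properties(deps: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Apply both rules and keep only the derived PROPERTY relations."""
    derived = apply_verbal_property_rule(apply_property_rule(deps))
    return {d for d in derived if d[0] == "PROPERTY"}


has_form = {("SUBJ-N", "have", "acetone"), ("OBJ-N", "have", "volatility")}
verb_form = {("SUBJ-N", "evaporate", "acetone")}
print(properties(has_form))                            # {('PROPERTY', 'acetone', 'evaporate')}
print(properties(has_form) == properties(verb_form))  # True
```

Comparing only the derived PROPERTY sets is one simple way to treat "Acetone has volatility" and "Acetone evaporates" as paraphrases once both are normalized.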