Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland http://umiacs.umd.edu/~ymarton/pub/umanch/Hybrid Knowledge-CorpusBasedSem-Manchester_090614.ppt
Why Care? Tell’em apart: These, too: Yuval Marton, U Manchester talk
FOX • FOX = • FOX = FOrkhead/winged-heliX replicator gene
Road map • Brief overview of doctoral work • Hybrid knowledge / corpus-based semantic similarity methods • Pure and hybrid methods • Hard and soft constraints • Fine-grained • Named-entities
Dissertation Theme • Hybrid Knowledge/Corpus-Based Statistical NLP Models Using Fine-Grained Soft Syntactic and Semantic Constraints • Soft Constraints • Fine-Grained • Syntactic (parsing) • Semantic (“concepts”, paraphrases) • Evaluated in • Word-pair similarity ranking and • Statistical Machine Translation (SMT)
Soft Constraints • Hard constraints • {0, 1}; in/out • Decrease search space • “structural zeroes” • Theory-driven • Faster, slimmer • Soft constraints • [0, 1]; fuzzy • Only bias the model • Data-driven: let patterns emerge • [Figure labels: Univ. Hard, Univ. Soft]
Fine-grained • Granularity is a big deal • Soft syntactic constraints in SMT • Chiang 2005 vs. Marton and Resnik 2008 • Neg results → pos results • Soft semantic constraints in word-pair similarity ranking • Mohammad and Hirst 2006 vs. Marton, Mohammad and Resnik 2009 • Pos results → better results
Soft Syntactic Constraints • X → X1 speech ||| X1 espiche • What should be the span of X1? • Chiang’s 2005 constituency feature • Reward rule’s score if rule’s source-side matches a constituent span • Constituency-incompatible emergent patterns can still ‘win’ (in spite of no reward) • Good idea, but negative result • But what if…
Rule granularity • Chiang: Single weight for all constituents (parse tags) • … But what if we can assign a separate feature and weight for each constituent? • E.g., NP-only: (NP=) • Or VP-only: (VP=)
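A minimal sketch of the granularity contrast on this slide, assuming a source-side parse is available as `(label, start, end)` triples. The function name `constituency_features`, the span encoding, and the coarse feature name `"constituent"` are illustrative assumptions, not taken from the talk; only the `NP=` and `VP=` feature names come from the slide.

```python
from collections import Counter

def constituency_features(span, constituents, fine_grained=True):
    """Return feature counts for one rule application.

    span: (start, end) of the rule's source side (hypothetical encoding).
    constituents: iterable of (label, start, end) from a source-side parse.
    """
    feats = Counter()
    for label, start, end in constituents:
        if (start, end) == span:
            # Fine-grained: one feature (and so one weight) per parse tag,
            # e.g. "NP=" or "VP=". Coarse: one shared feature for any match.
            feats[f"{label}=" if fine_grained else "constituent"] += 1
    return feats

# Toy parse of a five-token sentence (made-up spans).
parse = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5)]
fine = constituency_features((0, 2), parse)
coarse = constituency_features((0, 2), parse, fine_grained=False)
print(dict(fine), dict(coarse))
```

With the coarse setting every matching span fires the same feature, so the model can learn only one weight; with the fine-grained setting an NP match and a VP match can be rewarded or penalized separately.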
Word-pair similarity ranking • Give each word pair a similarity score • Rooster – voyage • Coast – shore • Noun-noun (Rubenstein & Goodenough, 1965) • Verb-verb (Resnik & Diab, 2000) • Result: list of pairs ordered by similarity • Spearman rank correlation
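The evaluation on this slide compares a model's ranking of word pairs with a human ranking. A small pure-Python sketch of Spearman rank correlation follows (Pearson correlation computed on average ranks). The scores in the example are made up for illustration and are not results from the talk.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation; undefined if either list is constant."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores for three word pairs (illustrative only).
human = [0.1, 3.6, 2.0]
model = [0.05, 0.9, 0.4]
rho = spearman(human, model)
```

Only the ordering matters: the model scores live on a different scale from the human ones, yet the correlation is maximal because both rank the pairs identically.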
Similarity measures • Distributional profiles (DP) • Which words did I occur next to? • Context vectors • Similar vectors → similar meaning
Bank (pure word-based) Bank
Bank (pure concept-based) Bank Teller Money … River Bank Water … • Compare closest senses • Bank_river = water ??
Bank (Hybrid Model) Bank_Fin.Inst Bank_River
Unified Model • Soft constraints in a log-linear model • Syntactic • Semantic • … • $\sum_i \lambda_i h_i(x)$ • Add more terms to the sum
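A minimal sketch of the log-linear score on this slide, where each soft constraint is one more feature function with its own weight. The feature names (`lm`, `tm`) and all numeric values are hypothetical; only `NP=` echoes the earlier slide.

```python
def loglinear_score(features, weights):
    """Weighted sum of feature values; features without a weight contribute 0."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def best_candidate(candidates, weights):
    """Pick the candidate with the highest log-linear score."""
    return max(candidates, key=lambda c: loglinear_score(c["features"], weights))

# Hypothetical weights and candidates (illustrative values only).
weights = {"lm": 1.0, "tm": 0.5, "NP=": 0.3}
candidates = [
    {"name": "a", "features": {"lm": -2.0, "tm": -1.0, "NP=": 1.0}},
    {"name": "b", "features": {"lm": -2.0, "tm": -1.0}},
]
winner = best_candidate(candidates, weights)
```

Because the constraint enters as an additive term rather than a filter, candidate `b` is not ruled out; it simply receives no reward from the `NP=` feature.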
Distributional profiles (DPs) • DPW: word-based distributional profile • First order • Distributional Hypothesis (Harris 1940; Firth 1957) • Second order (vector representation) • Strength of association • Counts, PMI, TF/IDF-based, Log-likelihood ratios … • Vector similarity (cosine, L1, L2, ...)
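A rough sketch of building a word-based profile (DPW) and comparing profiles, assuming symmetric window co-occurrence counts, positive PMI as the strength of association, and cosine as the vector similarity. These are just one combination of the options listed on the slide; the window size, function names, and toy sentence are assumptions.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_counts(sentences, window=2):
    """Count, for each target word, the words within `window` positions of it."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[target][tokens[j]] += 1
    return counts

def pmi_profile(target, counts):
    """Positive-PMI profile of `target`; collocates with PMI <= 0 are dropped."""
    total = sum(sum(c.values()) for c in counts.values())
    target_total = sum(counts[target].values())
    profile = {}
    for collocate, joint in counts[target].items():
        collocate_total = sum(counts[collocate].values())
        pmi = math.log2(joint * total / (target_total * collocate_total))
        if pmi > 0:
            profile[collocate] = pmi
    return profile

def cosine(p, q):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

counts = cooccurrence_counts([["river", "bank", "water"]], window=1)
bank_profile = pmi_profile("bank", counts)
```

On a real corpus the profile for "bank" would mix collocates of all its senses, which is the smearing problem the later slides address.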
Taxonomies and Groupings • WordNet • Synsets • Relations (“is-a”) • Arc distance • UMLS • Thesaurus • Flat • Coarse • Bank_river = water ?? • [Figure: is-a taxonomy. CEO is-a Industry job is-a job; Postdoc is-a Academic job is-a job]
Hybrid measures • WordNet • Resnik’s method (info content) • Lin and others • Thesaurus Concept-based • Mohammad and Hirst (coarse-grained) • A word may be listed under several concepts • Distance between most similar senses • Pro: Resource-poor languages and domains • Con: Small thesaurus → low applicability • WCCM: Financial instit. ~ academic instit. • Bank_river = water ??
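One way to read "distance between most similar senses" is sketched below: each word maps to the thesaurus concepts it is listed under, each concept has one profile (DPC), and the word-pair score is the best score over all concept pairs. The thesaurus entries, concept names, and profile values are toy assumptions, and this is an interpretation of the slide rather than the exact published procedure.

```python
import math

def cosine(p, q):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

def closest_sense_similarity(word_a, word_b, thesaurus, dpc):
    """Similarity of the most similar pair of concepts (senses) of the two words."""
    return max(
        cosine(dpc[ca], dpc[cb])
        for ca in thesaurus[word_a]
        for cb in thesaurus[word_b]
    )

# Toy thesaurus and concept profiles (all values are made up).
thesaurus = {"bank": ["FIN_INST", "RIVER"], "water": ["RIVER"], "money": ["FIN_INST"]}
dpc = {
    "FIN_INST": {"teller": 3.0, "loan": 2.0},
    "RIVER": {"boat": 2.0, "shore": 4.0},
}
bank_water = closest_sense_similarity("bank", "water", thesaurus, dpc)
```

In this toy setup "bank" and "water" share the concept `RIVER`, so they are compared through the same profile and score as near-identical, which illustrates the "Bank_river = water ??" concern on the slide.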
WCCM: Concept-Word matrix • WCCM: word-concept collocation matrix • DPC: concept-based distributional profile • Potentially iterative process • Clean-up
Bank Use concept-based DPCs to bias word-based DPWs + =
Fine-grained soft constraints • DPWS: distributional profile of word senses • Use concept-based DPCs to bias word-based DPWs • Hybrid-filtered • Hybrid-proportional
Hybrid-filtered Filter out collocates in DPW, if not appearing in DPC
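A minimal sketch of the filtering step as stated on this slide, assuming both profiles are stored as dicts from collocate to association value. The function name and the toy values are illustrative assumptions.

```python
def hybrid_filtered(dpw, dpc):
    """Keep a collocate of the word profile only if the concept profile has it too."""
    return {collocate: value for collocate, value in dpw.items() if collocate in dpc}

# Toy profiles (made-up values): the word profile mixes both senses of "bank".
dpw_bank = {"water": 4.0, "money": 6.0, "boat": 1.5}
dpc_river = {"water": 9.0, "boat": 3.0, "shore": 2.0}
bank_river = hybrid_filtered(dpw_bank, dpc_river)
```

The surviving values are still the word's own association strengths; the concept profile only decides which collocates are kept.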
Hybrid-proportional Only discount a collocate’s value in the DPW, in proportion to the ratio of its count in the current DPC to its total count across all DPCs of the target word
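A sketch of the proportional discount described on this slide, assuming collocate counts are available for every concept the target word is listed under. The function name, the data layout, and the handling of collocates unseen in any DPC (dropped here) are assumptions, not details given in the talk.

```python
def hybrid_proportional(dpw, concept, dpc_counts):
    """Scale each DPW value by the share of the collocate's count in `concept`.

    dpc_counts: {concept: {collocate: count}} for all concepts of the target word.
    """
    profile = {}
    for collocate, value in dpw.items():
        total = sum(counts.get(collocate, 0) for counts in dpc_counts.values())
        if total == 0:
            continue  # assumption: drop collocates unseen in every DPC
        share = dpc_counts[concept].get(collocate, 0) / total
        profile[collocate] = value * share
    return profile

# Toy data (made-up values).
dpw_bank = {"water": 4.0, "money": 6.0, "the": 2.0}
dpc_counts = {
    "RIVER": {"water": 9, "the": 5},
    "FIN_INST": {"water": 1, "money": 10, "the": 5},
}
bank_river = hybrid_proportional(dpw_bank, "RIVER", dpc_counts)
```

Unlike the filtered variant, a collocate that appears in several concept profiles is split between senses rather than kept or removed outright; here "the" keeps half its value in each sense.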
WSD with DPWS • Each sense of each word has a unique profile • Bank_fin.inst ≠ Bank_river ≠ water ! • Pro: • Not aggregated, whereas DPC profiles are • No or less smearing, whereas DPW profiles smear all senses into a single profile
Results
Evaluation • Word-pair similarity ranking • Spearman rank correlation • Paraphrasing in SMT • BLEU, TER, METEOR, ...
Comparison • WordNet results • LSA results
Challenges • Antonyms (black – white) • “Hyperonyms” (vehicle – car) • Co-hypernyms / co-taxonyms
Named Entities • Challenges: • Bush – Obama • Potentially helpful: • H2O – Water • FOX – “forkhead/winged-helix replicator” • FOXP2 – SPCH1 • “SPCH1” turned out to be a member of the FOX (forkhead/winged-helix replicator genes) family, of which several other genes are known all across the animal world. It was then labeled FOXP2, that being its current, and more conventional, name.
Biomedical/Chemical WSD • Explore hybrid methods to create DPWS • FOX_gene, FOX_animal • Requires a lexical resource • UMLS or other resources • Useful for smaller training sets!
Conclusion • Hybrid Knowledge/Corpus-Based Statistical NLP Models Using Fine-Grained Soft Constraints • Soft Constraints • Fine-Grained • Semantic (“concepts”) • Resource-poor setting, special domains • [Figure label: Univ. Soft]
Thank you! Questions? ymarton@umiacs.umd.edu Advisors: Philip Resnik & Amy Weinberg Department of Linguistics and CLIP Lab
Fine-grained semantic • Word-based: • Bank: river, money, water, teller, … • “Concept”-based: • River: water, bank, boat, … • Financial institution: bank, money, teller, … • Humans compare closest senses • Bank_river = water ?? • Hybrid: • Bank_river: more strongly associated with water • Bank_fin.inst: more strongly associated with money
SMT • Statistical Machine Translation • What translational units to use? • Syntactic constituents, re-ordering • “es gibt” • Paraphrases • Pivoting vs. bitext-free paraphrasing • Typically monolingual • Translation = bilingual / cross-domain paraphrasing • Can be evaluated in SMT