Center for Computational Learning Systems • Independent research center within the Engineering School • NLP people at CCLS: Mona Diab, Nizar Habash, Martin Jansche, Rebecca Passonneau, Owen Rambow • We are part of “The NLP Group” but not of the CS department • What we do: • Researchers • Work with Kathy and Julia • Our own projects • Sometimes teach • Supervise students (PhD, Masters, independent studies) • Some of us are in CEPSR, some in the Interchurch Building • Some NLP Group meetings will take place in Interchurch Center
CLiMB 2: Computational Linguistics for Metadata Building, phase 2 • Becky Passonneau (with University of Maryland) • Interactive workbench for image cataloguers/indexers: Use NLP to extract descriptive terms from scholarly text • Mellon Foundation • http://www.umiacs.umd.edu/~climb/
Automated Readers Advisor, Heiskell Talking Books and Braille Library (NYPL) • Becky Passonneau • Replace some of librarians’ tasks in current over-the-phone borrowing system with automated dialogue system • Use Wizard-of-Oz paradigm for data collection • Joint project with CCNY (Esther Levin) • http://www.cs.columbia.edu/~becky/pubs/WozVariant.ppt
Tracking Emergent Narrative Skills (TENS) • Becky Passonneau • Current data set: ten-year-olds retelling silent movies • Develop quantitative methods to compare semantic and pragmatic content (e.g., adapt the Pyramid Method for evaluating summary content) • Joint project with University of Connecticut (Elena Levy)
Arabic NLP • CADIM Group: Mona Diab, Nizar Habash, Owen Rambow • Focus on Standard Arabic AND the dialects • NLP tools for Arabic: • Morphological analysis (exists) • Morphological tagging (exists, best-performing) • Tokenization • POS tagging (best-performing) • Diacritization (best-performing) • Word-sense disambiguation (in progress) • Sentence-boundary detection for ASR (in progress) • Parsing (initial research) • Named-entity recognition (joint with Fair Isaac, in progress) • …
Machine Translation • Nizar Habash • Focus: Arabic-English MT • Different hybrid MT approaches explored • Linguistic preprocessing for statistical MT • Morphological and syntactic preprocessing • Adding statistical resources to rule-based MT systems • Automatically extracted phrase tables combined with Generation-Heavy MT • Columbia's first participation in NIST MTEval (2006)
Word Sense Modeling and Disambiguation • Mona Diab • Using corpora (including multilingual parallel and similar) for unsupervised learning • Arabic WordNet • Arabic PropBank
Email Summarization: Social Networks • Aaron Harnly (PhD student) and Owen Rambow, with Kathy McKeown • Study interaction between: • Email-intrinsic factors • Language in email (lexicon, syntax, …) • Email genre • Structure of dialog • Threads • Speech acts • Relations among people • Roles in organization • Social networks • Use to predict one factor from the others • Use in high-level summaries of large amounts of email communication
Multilingual Metagrammars • Owen Rambow (with University of Pennsylvania) • Goal: high-level abstract representation of syntax of (many/all) natural languages, from which we can automatically generate grammars that can be used for NLP • Have: Universal Grammar component and language-specific modules for Korean, German, Yiddish • Next: Icelandic, Mainland Scandinavian, English, Kashmiri, …