Infrastructures in Taiwan and for the Chinese Languages Chu-Ren Huang Institute of Linguistics Academia Sinica churen@sinica.edu.tw ACL 2000 WORKSHOP: Infrastructures for Global Collaboration Saturday, October 7, Hong Kong
Types of Infrastructures • Sharable resources (for Chinese computational linguistics) • Mechanisms for international collaboration • Mechanisms for scholarly exchange
Host Institutes • -The Association for Computational Linguistics and Chinese Language Processing (ACLCLP, a.k.a. ROCLING) • -Academia Sinica • -National Science Council (NSC)
Sharable Resources for Chinese Computational Linguistics • Corpora • Lexicons • Procedures • http://rocling.iis.sinica.edu.tw/ROCLING/
Sharable Resources for Chinese Computational Linguistics--Corpora • -Academia Sinica Balanced Corpus of Mandarin Chinese (Sinica Corpus) • -Sinica Treebank • -Standard Segmentation Corpus • -ROCLING Corpus • -Mandarin-Across-Taiwan (MAT) Speech Database
Academia Sinica Balanced Corpus of Mandarin Chinese (Sinica Corpus) • 5 million words, segmented and tagged • Direct WWW Access • -http://www.sinica.edu.tw/~tibe/2-words/modern-words/index.html OR • -http://www.sinica.edu.tw/ftms-bin/kiwi.sh • License Information • -http://rocling.iis.sinica.edu.tw/ROCLING/corpus98/sinicor_E.htm
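A segmented and tagged corpus of this kind can be read as a stream of (word, tag) pairs. The sketch below is illustrative only: it assumes a `word(TAG)` token layout separated by whitespace, which is an assumption and not a description of the distributed file format, and the sample sentence, its tags, and the function name `parse_tagged` are made up for the example.

```python
import re
from collections import Counter

# Assumed token layout: word(TAG), tokens separated by whitespace.
# This layout is an assumption for illustration, not the documented corpus format.
TOKEN = re.compile(r"(\S+?)\(([^()\s]+)\)")

def parse_tagged(line):
    """Return a list of (word, tag) pairs found in one tagged line."""
    return [(m.group(1), m.group(2)) for m in TOKEN.finditer(line)]

# Hypothetical sample line; the words and tags are illustrative only.
sample = "我們(Nh) 研究(VE) 語言(Na)"
pairs = parse_tagged(sample)

# Count how often each tag occurs in the parsed line.
tag_counts = Counter(tag for _, tag in pairs)
```

With pairs in hand, word frequency lists or tag distributions follow from a `Counter` over the first or second element.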
Sinica Treebank 1.0 • 38,725 trees • 239,532 words • Direct WWW Access (1000 sample trees) • -http://godel.iis.sinica.edu.tw/CKIP/trees1000.htm • License Information • -http://rocling.iis.sinica.edu.tw/ROCLING/Treebank/Treebank-E.htm
Mandarin-Across-Taiwan (MAT) Speech Database • Speech files are collected over telephone networks. The content includes spontaneous speech (short answering statements) and read speech (numbers, Mandarin syllables, words of 2 to 4 syllables, phonetically balanced sentences). • MAT-160 (160 speakers) • MAT-2000 • http://rocling.iis.sinica.edu.tw/ROCLING/MAT/index_cf.htm
Sharable Resources for Chinese Computational Linguistics--Procedures • Segmentation Standard for Chinese Language Processing • Segmentation Standard • -http://godel.iis.sinica.edu.tw/ROCLING/juhuashu1.htm • Standard Segmentation Corpus (2 million words, segmented) • -http://godel.iis.sinica.edu.tw/ROCLING/corpus98/segcorp_E.htm • Standard Segmentation Lexicon (42,138 entries, with frequency) • -http://godel.iis.sinica.edu.tw/ROCLING/corpus98/segdic_E.htm • Segmentation Program (free download) • -http://godel.iis.sinica.edu.tw/CKIP/ws/
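To show how a segmentation lexicon can drive word segmentation, here is a minimal sketch of greedy forward maximum matching. It is a generic textbook technique and is not claimed to be the method used by the downloadable segmentation program. The lexicon `TOY_LEXICON`, its frequency counts, and the function name `segment` are hypothetical and are not taken from the Standard Segmentation Lexicon.

```python
# Hypothetical toy lexicon: word -> frequency. Entries and counts are
# illustrative only. The frequencies are not used by this greedy matcher;
# a fuller system might use them to choose between competing segmentations.
TOY_LEXICON = {
    "中央": 120,
    "研究": 300,
    "研究院": 80,
    "中央研究院": 50,
    "語言": 200,
    "語言學": 90,
    "研究所": 150,
}

def segment(text, lexicon, max_len=None):
    """Greedy forward maximum matching.

    At each position, take the longest lexicon entry that matches;
    characters not covered by the lexicon become one-character words.
    """
    if max_len is None:
        max_len = max((len(w) for w in lexicon), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until something matches.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words

result = segment("中央研究院語言學研究所", TOY_LEXICON)
```

Greedy matching is simple but cannot resolve ambiguous strings on its own, which is one reason a shared segmentation standard and a frequency-annotated lexicon are useful as common references.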
Sharable Resources in Languages Other than Modern Mandarin • Classical Chinese Corpora • http://www.sinica.edu.tw/~tibe/2-words/old-words/index.html • Corpus of Formosan Austronesian Languages (under construction, part of the National Digital Archive Initiative) • Lexical Databases of other Sino-Tibetan and Tibeto-Burman Languages
Mechanisms for International Collaboration • Major Sponsors of International Collaboration Involving Taiwan • --The Chiang Ching-kuo Foundation for International Scholarly Exchange • http://www.cckf.org http://www.cckf.org.tw • --The National Science Council • --Academia Sinica
Synchronic and Diachronic Chinese Corpora • Three Projects Sponsored by the CCK Foundation (1990-1995) • Chu-Ren Huang, Keh-jiann Chen and Pei-chuan Wei, Academia Sinica • Paul Thompson, SOAS, University of London • Chaofen Sun, Stanford University
Mechanisms for Scholarly Exchange and Collaboration • Department of International Programs, NSC • http://www.nsc.gov.tw/int/2_cooperation/index_02.html • Canada: NRC • France: CNRS • Japan: EAACST • Germany: DFG, DAAD, DKFG • Netherlands: NWO, IIAS • USA: NSF, NIH • UK: Royal Society of London, etc.
An NSF/NSC International Joint Project • NSF: Asian Language Digital Library Project • Ching-Chih Chen, Simmons College • NSC International Digital Library Collaborative Projects • --Lexicon-based Knowledge Linking - Approaches Towards a WordNet Infrastructure for Multilingual Digital Library • Chu-Ren Huang, Academia Sinica • --Linguistic Technology and Resources for English-Chinese Bilingual Information System • Hsin-Hsi Chen, National Taiwan University
Mechanisms for International Collaboration--Bilateral Projects • -Case-by-case negotiation • Academia Sinica with the Chinese University of Hong Kong, LDC, Stanford, UCSB, etc.
Mechanisms for Scholarly Exchange--Conferences • ROCLING (annually since 1988) • PACLIC [Pacific Asia Conference on Language Information and Computation] (regional conference involving Hong Kong, Japan, Korea, Singapore, and Taiwan) • http://www.rcl.cityu.edu.hk/paclic15 • COLING2002 • http://www.COLING2002.sinica.edu.tw
Mechanisms for Scholarly Exchange--Exchange Scholars • Academia Sinica and EHESS: yearly exchange • Academia Sinica and University of Pennsylvania (under negotiation) • NSC and CNRS, NSC and NWO: Cognitive Science
Mechanisms for Scholarly Exchange--Post-doctoral Fellows • -Academia Sinica Post-doctoral Fellowships: application through project PIs or directly by applicants • -NSC Post-doctoral Fellowships
Mechanisms for Scholarly Exchange--International Students • Computational Linguistics and Chinese Language Processing: an international graduate (PhD) program (proposal under review) • Visiting Students • Internships