Advisor: Hsin-Hsi Chen Speaker: Yong-Sheng Lo Date: 2008/03/14

SIGHAN - 2006 Advisor: Hsin-Hsi Chen Speaker: Yong-Sheng Lo Date: 2008/03/14

Introduction • The Third SIGHAN Chinese Language Processing Bakeoff (Bakeoff-2006) • Task: Chinese word segmentation • Corpus: CKIP (Academia Sinica corpus) • Closed test, highest F-measure: this paper (0.958) • Open test, highest F-measure: this paper (0.959)

Conditional Random Fields (CRF) • CRF is a statistical sequence modeling framework • Peng et al. (2004) first applied this framework to Chinese word segmentation, treating it as a binary decision task in which each Chinese character is labeled either as the beginning of a word (B) or not (I) • Λ = {λ1, λ2, …}: model parameters (feature weights) • y = {y1, …, yT}: tag sequence • x = {x1, …, xT}: Chinese character sequence
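For reference, the notation above fits the standard linear-chain CRF form, sketched below. The feature functions f_k and the normalizer Z(x) do not appear in the slide text; they are the usual textbook symbols and are assumed here.

```latex
P_\Lambda(y \mid x) = \frac{1}{Z(x)} \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \Big),
\qquad
Z(x) = \sum_{y'} \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \Big)
```

Training chooses the weights Λ to maximize this conditional probability on the tagged training data; segmentation then picks the tag sequence y with the highest probability for a given x.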

Conditional Random Fields • For example: 王建民/是/真/男人 • Training format (character, tag): 王 B • 建 I • 民 I • 是 B • 真 B • 男 B • 人 I • Testing input (untagged): 王 • 建 • 民 • 是 • 真 • 男 • 人 • Feature template • Unigram: Cn, n = -1, 0, 1 • Bigram: Cn Cn+1, n = -1, 0

Conditional Random Fields • For example: 王建民/是/真/男人 • Training format (character, tag): 王 B • 建 I • 民 I • 是 B • 真 B • 男 B • 人 I • Testing result (tagged): 王 B • 建 I • 民 I • 是 B • 真 B • 男 B • 人 I • Feature template • Unigram: Cn, n = -1, 0, 1 • Bigram: Cn Cn+1, n = -1, 0
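The training format and feature template on this slide can be illustrated with a minimal Python sketch. The function names, the feature-string format (`U-1=…`, `B0=…`), and the `<PAD>` symbol for positions outside the sentence are assumptions made for illustration; they are not taken from the presentation or from any particular CRF toolkit.

```python
def words_to_bi_tags(words):
    """Label each character B (begins a word) or I (otherwise)."""
    pairs = []
    for word in words:
        for offset, ch in enumerate(word):
            pairs.append((ch, "B" if offset == 0 else "I"))
    return pairs


def char_at(chars, i):
    # Positions outside the sentence get a padding symbol (assumption).
    return chars[i] if 0 <= i < len(chars) else "<PAD>"


def extract_features(chars, i):
    """Unigram Cn (n = -1, 0, 1) and bigram Cn Cn+1 (n = -1, 0) around position i."""
    feats = []
    for n in (-1, 0, 1):
        feats.append(f"U{n}={char_at(chars, i + n)}")
    for n in (-1, 0):
        feats.append(f"B{n}={char_at(chars, i + n)}{char_at(chars, i + n + 1)}")
    return feats


pairs = words_to_bi_tags("王建民/是/真/男人".split("/"))
chars = [ch for ch, _ in pairs]
tags = [tag for _, tag in pairs]
```

Here `tags` reproduces the B/I column of the training format, and `extract_features(chars, 3)` lists the five features generated for 是.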

Tag Set Selection • Previous work uses two kinds of schemes to mark the position of a character within a word • Xue and Ng: maximum entropy model • Peng and Tseng: CRF model

Tag Set Selection • In this paper • Goal: tag long words more effectively • Extend the 4-tag set of Ng / Xue into a 6-tag set • 4-tag set: Begin (B), Middle (M), End (E), Single (S) • Added tags: B2, B3 • Long-word example: 故宮博物院 (National Palace Museum) == { B, B2, B3, M, E }
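A minimal sketch of how a word could be mapped to the 6-tag set is shown below. The five-character and four-character cases follow the examples in the slides (故宮博物院 and 中國大陸); the handling of three-character words (B, B2, E) and of words longer than five characters (extra M tags) is an assumption, and the function name `six_tags` is hypothetical.

```python
def six_tags(word):
    """Return one tag per character using B, B2, B3, M, E, S."""
    n = len(word)
    if n == 0:
        return []
    if n == 1:
        return ["S"]
    # The first (up to) three non-final characters get B, B2, B3.
    head = ["B", "B2", "B3"][: n - 1]
    # Any remaining non-final characters are tagged M.
    middle = ["M"] * (n - 1 - len(head))
    return head + middle + ["E"]
```

For example, `six_tags("故宮博物院")` gives B, B2, B3, M, E, matching the long-word example above.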

Feature Template For Closed Test

Feature Template For Closed Test • Code e: date, digit, and letter classes • Class 1 = Number • Class 2 = Date • Class 3 = English letter • Class 4 = Other • Code f: tone (Mandarin tones 1 to 4; 0 = neutral tone) • For example: 中, 國, 很, 大, 嗎 == { 1, 2, 3, 4, 0 }
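The character classes of code e could be computed along the following lines. The slides do not say which characters belong to each class, so the membership sets below (`NUMBER_CHARS`, `DATE_CHARS`) and the order of the checks are assumptions chosen only to make the sketch concrete.

```python
# Assumed membership sets; the presentation does not list them.
NUMBER_CHARS = set("0123456789０１２３４５６７８９〇一二三四五六七八九十百千萬")
DATE_CHARS = set("年月日")


def char_class(ch):
    """Map one character to class 1 (number), 2 (date), 3 (English letter), 4 (other)."""
    if ch in NUMBER_CHARS:
        return 1
    if ch in DATE_CHARS:
        return 2
    if ch.isascii() and ch.isalpha():
        return 3
    return 4


classes = [char_class(ch) for ch in "2008年A中"]
```

The class of each character in the window would then be used as a feature value in place of the character itself.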

Feature Template For Open Test • (1) External dictionary • Uses the online dictionary from Peking University • About 108,000 words, each 1 to 4 characters long • If some sequence of neighboring characters around C0 in the sentence matches a word in this dictionary, greedily choose the longest such matching word W • For example • Without 中國大陸 in the dictionary: 中國大陸 => 中國/大陸 => { B E B E } • If the dictionary contains 中國大陸: 中國大陸 => { B B2 B3 E }
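The longest-match lookup around C0 can be sketched as follows. The toy dictionaries stand in for the Peking University dictionary; the tie-breaking rule (prefer the leftmost start among equally long matches), the `None` value for characters with no match, and all function names are assumptions of this sketch.

```python
def longest_match(chars, i, dictionary, max_len=4):
    """Find the longest dictionary word covering position i.

    Returns (word, offset of position i inside the word), or (None, None).
    """
    for length in range(max_len, 0, -1):
        for start in range(i - length + 1, i + 1):
            end = start + length
            if start < 0 or end > len(chars):
                continue
            candidate = "".join(chars[start:end])
            if candidate in dictionary:
                return candidate, i - start
    return None, None


def tag_in_word(length, offset):
    """Tag of the character at `offset` in a word of `length` characters."""
    if length == 1:
        return "S"
    if offset == length - 1:
        return "E"
    return ["B", "B2", "B3"][offset] if offset < 3 else "M"


def dictionary_tags(chars, dictionary):
    tags = []
    for i in range(len(chars)):
        word, offset = longest_match(chars, i, dictionary)
        tags.append(tag_in_word(len(word), offset) if word else None)
    return tags


small = {"中國", "大陸"}
large = small | {"中國大陸"}
without_entry = dictionary_tags("中國大陸", small)
with_entry = dictionary_tags("中國大陸", large)
```

With the smaller dictionary each character is tagged from the two-character words, and adding the four-character entry switches every character to the longer match.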

Feature Template For Open Test • (1) External Dictionary (cont.)

Feature Template For Open Test • (2) Assistant segmenter • Idea (observation) • Most words are still segmented in the same way under different segmentation standards • Thus, although segmenters trained on different corpora follow somewhat different segmentation rules, their output is still informative • Main segmenter • Assistant segmenters • A feature template is added for each assistant segmenter: t(C0) • The output tag of the assistant segmenter for C0 (e.g. B)

Feature Template For Open Test • (2) Assistant segmenter (cont.) • Integrate all other segmenters that are trained on the corpora from Bakeoff-2003, 2005, and 2006 with the feature set used in the closed test • The segmenter MSRSeg, described in (Gao, 2003), is also integrated
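One way to picture the t(C0) feature is as an extra string appended to each character's feature list, one per assistant segmenter. The sketch below assumes the assistant segmenters have already been run and their per-character tags collected; the names `add_assistant_features`, `asst1`, and `asst2`, and the feature-string format, are hypothetical.

```python
def add_assistant_features(base_features, assistant_outputs):
    """Append one t(C0) feature per assistant segmenter to each character.

    base_features: list of feature lists, one per character.
    assistant_outputs: dict mapping an assistant's name to its tag sequence.
    """
    augmented = [list(feats) for feats in base_features]
    for name, tags in assistant_outputs.items():
        if len(tags) != len(augmented):
            raise ValueError(f"{name} produced {len(tags)} tags for {len(augmented)} characters")
        for feats, tag in zip(augmented, tags):
            feats.append(f"{name}:t(C0)={tag}")
    return augmented


base = [["U0=男"], ["U0=人"]]
outputs = {"asst1": ["B", "E"], "asst2": ["S", "S"]}
augmented = add_assistant_features(base, outputs)
```

The main segmenter is then trained on the augmented features, so it can learn how much to trust each assistant where their standards disagree.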

Feature Template For Open Test • Assistant segmenter method vs. additional training corpus method • The performance of the additional corpus method depends on the performance of the trained segmenter that carries out the corpus extraction task • If that segmenter is not well trained, it cannot effectively extract the most useful additional corpus • The additional corpus method can only integrate a useful corpus; it cannot integrate a well-trained segmenter whose training corpus cannot be accessed • An additional corpus is very difficult to use in a CRF model • Enlarging the corpus can dramatically increase memory and time consumption in this case

Evaluation Results
