Term Extraction from Financial News. Jian-Shiun 2008/10/31. Financial News -鉅亨網. Data Collection. Period : 2008/10/10 ~ 2008/10/30 Number of news : 1,987. Accumulated Grams. grams. docs. Metrics. Frequency Conditional Probability Mutual Information. Mutual Information.
Term Extraction from Financial News Jian-Shiun 2008/10/31
Data Collection • Period: 2008/10/10 ~ 2008/10/30 • Number of news: 1,987
Accumulated Grams • [Figure: accumulated grams plotted against docs]
Metrics • Frequency • Conditional Probability • Mutual Information
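The first two metrics can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the slides: it assumes a "gram" is a character n-gram, that frequency is a raw count over all documents, and that conditional probability means the probability of a gram's last character given the characters before it. The names `count_ngrams`, `conditional_probability`, and `max_n` are hypothetical.

```python
from collections import Counter


def count_ngrams(docs, max_n=4):
    """Count every character n-gram (1 <= n <= max_n) across all documents."""
    counts = Counter()
    for doc in docs:
        for n in range(1, max_n + 1):
            for i in range(len(doc) - n + 1):
                counts[doc[i:i + n]] += 1
    return counts


def conditional_probability(counts, gram):
    """Estimate P(last char | preceding chars) as count(gram) / count(prefix).

    Assumption: this is one common reading of "conditional probability"
    for term extraction; the slides do not spell out the definition.
    """
    prefix = gram[:-1]
    if not prefix or counts[prefix] == 0:
        raise ValueError("gram needs a prefix that occurs in the corpus")
    return counts[gram] / counts[prefix]


if __name__ == "__main__":
    counts = count_ngrams(["abab", "abc"])
    print(counts["ab"], conditional_probability(counts, "ab"))
```

A conditional probability close to 1 means the prefix is almost always followed by the same character, which hints that the gram behaves as a unit.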
Mutual Information • For a term w = c1 c2 … cn: if f(w) ≥ f(c1) f(c2) … f(cn), then MI(w) ≥ 0
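The condition above is consistent with a log-ratio form of mutual information, MI(w) = log( f(w) / (f(c1) f(c2) … f(cn)) ). The sketch below assumes that form, a base-2 logarithm, and that f denotes relative frequency (a value between 0 and 1); none of these is stated in the slides. The function name `mutual_information` and the `rel_freq` mapping are hypothetical.

```python
import math


def mutual_information(gram, rel_freq):
    """Return log2( f(w) / (f(c1) * f(c2) * ... * f(cn)) ) for gram w = c1...cn.

    rel_freq maps each string (the gram and every single character in it)
    to its relative frequency. This is an assumed definition.
    """
    f_w = rel_freq[gram]
    product = 1.0
    for ch in gram:
        product *= rel_freq[ch]
    if f_w <= 0 or product <= 0:
        raise ValueError("all frequencies must be positive")
    return math.log2(f_w / product)


if __name__ == "__main__":
    # Made-up frequencies, for illustration only.
    rel_freq = {"a": 0.5, "b": 0.5, "ab": 0.25}
    print(mutual_information("ab", rel_freq))
```

Under this definition MI(w) is non-negative exactly when f(w) is at least the product of the component frequencies, which matches the statement on the slide.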
Extreme Cases Using MI • f(w) is very low, and MI is very high* • f(w) is very low, and MI is very low • f(w) is very high, and MI is very high* • f(w) is very high, and MI is very low
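The four cases above can be told apart with two pairs of cut-offs. The sketch below is only an illustration: the slides give no numeric thresholds, so `freq_low`, `freq_high`, `mi_low`, and `mi_high` are hypothetical parameters the caller must choose, and `extreme_status` is a hypothetical name.

```python
def extreme_status(freq, mi, freq_low, freq_high, mi_low, mi_high):
    """Classify a term into one of the four extreme cases, or return None.

    Returns a (frequency_label, mi_label) pair such as ("low", "high").
    Terms whose frequency or MI falls between the cut-offs are not extreme.
    All thresholds are assumptions supplied by the caller.
    """
    if freq <= freq_low:
        freq_label = "low"
    elif freq >= freq_high:
        freq_label = "high"
    else:
        return None

    if mi <= mi_low:
        mi_label = "low"
    elif mi >= mi_high:
        mi_label = "high"
    else:
        return None

    return (freq_label, mi_label)


if __name__ == "__main__":
    # Made-up values, for illustration only.
    print(extreme_status(1e-6, 12.0, 1e-5, 1e-3, 0.0, 10.0))
```

Keeping the thresholds as parameters makes it easy to inspect how many candidate terms land in each case as the cut-offs move.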
Further Work • PAT-Tree • Pattern Filter • Cross-validate with CKIP
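Reference • Liu Kaiying 劉開瑛 (2000). Automatic Word Segmentation and Tagging of Chinese Text (中文文本自動分詞和標註). Beijing: The Commercial Press (商務印書館).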