NLP Project 1

Presentation Transcript


  1. NLP Project 1 程智聪 韩冬 张坚

  2. Agenda
     • Data Preprocess
     • Feature Selection
     • Classification
     • Result Analysis
     • Summary

  3. Data Preprocess
     • XML library: XOM 1.1
     • Missing lexical-sample.dtd
     • Missing POS tags: na, nx, Ug, …
     • Customized POS tag: *
     • Handling subwords
     • JVM heap options: -Xms128M -Xmx512M

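  4. Data Preprocess
     • Handling multiple target items in the context
     • Handling punctuation
     Example with POS tags: 钻研 中医 理论 , 试图从前人 → v n n w * * *
       (roughly: "delving into the theory of traditional Chinese medicine, trying to ... from predecessors")
     Example context: 面临我国心脑血管疾病发病和死亡率逐年上升的严重趋势,现任中国<head>中医</head>研究院长城医院院长的周文志教授历时30余年艰辛探索,采用中医中药方法治疗和预防心脑血管疾病取得显著成果。
       (roughly: "Facing the serious year-on-year rise in the incidence and mortality of cardiovascular and cerebrovascular disease in China, Professor Zhou Wenzhi, currently president of the Great Wall Hospital of the China Academy of Traditional Chinese Medicine, has after more than 30 years of painstaking research achieved notable results in treating and preventing these diseases with traditional Chinese medicine.")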
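
The slide does not state the punctuation rule. One plausible reading of the `v n n w * * *` row is that POS tags beyond the nearest punctuation mark are replaced with the customized `*` tag, so the context does not cross a clause boundary. A minimal Java sketch under that assumption follows; the class `PunctuationMask`, its method, and its parameters are hypothetical names, not the project's code.

```java
final class PunctuationMask {
    /**
     * Returns a copy of the POS sequence in which every tag lying beyond the
     * nearest punctuation mark on either side of the target is replaced by
     * the placeholder. The punctuation tags themselves are kept.
     * ASSUMPTION: this masking rule is inferred from the slide, not confirmed.
     */
    static String[] mask(String[] pos, int head, String punctTag, String placeholder) {
        String[] out = pos.clone();

        // Walk right from the target; once a punctuation tag is seen,
        // everything after it is masked.
        boolean blocked = false;
        for (int i = head + 1; i < out.length; i++) {
            if (blocked) {
                out[i] = placeholder;
            } else if (punctTag.equals(pos[i])) {
                blocked = true;
            }
        }

        // Same walk to the left of the target.
        blocked = false;
        for (int i = head - 1; i >= 0; i--) {
            if (blocked) {
                out[i] = placeholder;
            } else if (punctTag.equals(pos[i])) {
                blocked = true;
            }
        }
        return out;
    }
}
```

For instance, masking the sequence `v n n w v p n` around the target at index 1, with `w` as the punctuation tag, gives `v n n w * * *`. The last three input tags are illustrative, since the slide does not show the originals.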

  5. Feature Selection
     Parameters: StartOffset = -1, EndOffset = 1, IncludeTokenPreOffset = 1, IncludeTokenPostOffset = 1
     Context: 吴宏权<head>使</head>出全身 (roughly: "Wu Hongquan used all his ...")
     Output: 0 nr,v,吴,宏权,出,全身,use
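
The four parameters on this slide suggest a window around the target word: POS tags taken from StartOffset to EndOffset, and surface tokens taken from IncludeTokenPreOffset words before the target to IncludeTokenPostOffset words after it. A rough Java sketch of that reading follows. The class and method names are hypothetical, skipping the target itself and padding out-of-sentence positions with `*` are assumptions, and subword expansion (which the slide's output appears to apply, listing 吴 and 宏权 separately) is left out.

```java
import java.util.ArrayList;
import java.util.List;

final class WindowFeatures {
    // ASSUMPTION: positions outside the sentence are padded with the
    // customized "*" tag mentioned on the preprocessing slide.
    static final String PAD = "*";

    /**
     * Collects POS tags at offsets startOffset..endOffset and tokens at
     * offsets -tokenPre..tokenPost, both relative to the target index and
     * both skipping offset 0 (the target word itself).
     */
    static List<String> extract(String[] tokens, String[] pos, int head,
                                int startOffset, int endOffset,
                                int tokenPre, int tokenPost) {
        List<String> features = new ArrayList<>();
        for (int off = startOffset; off <= endOffset; off++) {
            if (off != 0) {
                features.add(at(pos, head + off));
            }
        }
        for (int off = -tokenPre; off <= tokenPost; off++) {
            if (off != 0) {
                features.add(at(tokens, head + off));
            }
        }
        return features;
    }

    // Returns the item at index i, or the padding tag when i is out of range.
    private static String at(String[] items, int i) {
        return (i >= 0 && i < items.length) ? items[i] : PAD;
    }
}
```

With the slide's settings (StartOffset = -1, EndOffset = 1, both token offsets = 1), the call returns the POS tags at offsets -1 and +1, followed by the tokens at the same two positions. The sense label (`use` in the slide's output) would be appended separately as the class attribute.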

  6. Classification
     • Data mining library
       • Weka 3.6, maxent 20041229
     • Classifier
       • MLP: L: 0.6, H: 12, 4, (adaptive), M: 0.9
       • SMO: NormalizedPolyKernel, C: 1.9
       • NaiveBayes
       • MEM: G: 20

  7. Result Analysis 68% • Only POS

  8. Result Analysis • Only POS (cont’d) Context

  9. Result Analysis • Including tokens

  10. Result Analysis • Including tokens (cont’d)

  11. Result Analysis • Punctuation optimization

  12. Result Analysis • Performance

  13. Summary
     • Fewer POS features are better
     • POS/token features after the target word are more important than those before it
     • Punctuation matters
     • Possible improvements
       • Typical words in the sentence as features
       • Collocations as features

  14. Q & A Thanks
