High Relevance Keyword Extraction facility for Bayesian text classification on different domains of varying characteristic • Presenter: Min-Cong Wu • Authors: Lam Hong Lee, Dino Isa, Wou Onn Choo, Wen Yeen Chue • 2012, ESA
Outlines • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments
Motivation • The advantage of Bayesian classification over other classification approaches is its ability to handle raw text data directly, and its simplicity in doing so. • As a trade-off for this simplicity, Bayesian classification has been reported as one of the poorest-performing classification approaches.
Objectives • Use the HRKE facility to enhance the accuracy of the Bayesian classifier without sacrificing its low cost.
Methodology – TF-IDF method • TF-IDF = TF × IDF • TF (Term Frequency), IDF (Inverse Document Frequency) • N = the number of documents in the dataset that contain the word • Example:
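The slide's own formula and example did not survive in this transcript. As a stand-in, here is a minimal Python sketch of the common textbook formulation, assuming TF is a term's count divided by the document length and IDF is log(total documents / N). The function name `tf_idf`, the variable names, and the toy documents are illustrative assumptions, and the authors may have used a different variant.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document.

    Assumed textbook variant, not necessarily the one on the slide:
    TF = count / document length, IDF = log(total documents / N).
    """
    total_docs = len(docs)
    # N on the slide: the number of documents that contain the word.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(total_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Toy corpus (made up for illustration).
docs = [
    ["bayes", "text", "classification"],
    ["text", "keyword", "keyword"],
    ["bayes", "keyword", "extraction"],
]
scores = tf_idf(docs)
```

Under this variant, a word that appears in every document gets a weight of zero, while a word concentrated in few documents is weighted up.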
Methodology – HRKE facility • The degree of relevance of keywords in the classification task can be adjusted by setting a threshold, m/n.
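The slides do not define m and n, so the following is not the authors' HRKE procedure. It is only an illustrative Python sketch of a threshold-based keyword filter that compares how often a keyword occurs in documents of one category against the competing categories. The function `select_keywords`, the `threshold` parameter (a hypothetical stand-in for m/n), and the scoring rule are all assumptions made for the example.

```python
from collections import Counter

def select_keywords(docs_by_category, target, threshold):
    """Keep keywords that are concentrated in the `target` category.

    docs_by_category: {category: list of token lists}
    threshold: hypothetical stand-in for the slide's m/n setting.
    """
    # Fraction of documents in each category that contain each keyword.
    rate = {}
    for category, docs in docs_by_category.items():
        doc_counts = Counter(term for doc in docs for term in set(doc))
        for term, count in doc_counts.items():
            rate.setdefault(term, {})[category] = count / len(docs)

    selected = []
    for term, by_category in rate.items():
        own = by_category.get(target, 0.0)
        # Strongest competing category for this keyword.
        rival = max(
            (r for c, r in by_category.items() if c != target), default=0.0
        )
        # Assumed rule: keep the keyword when its share of the combined
        # rate reaches the threshold.
        if own > 0 and own / (own + rival) >= threshold:
            selected.append(term)
    return sorted(selected)

# Toy data (made up for illustration).
docs_by_category = {
    "sports": [["match", "goal", "team"], ["team", "goal"]],
    "finance": [["market", "team"], ["market", "stock"]],
}
strict = select_keywords(docs_by_category, "sports", 0.75)
loose = select_keywords(docs_by_category, "sports", 0.6)
```

With the stricter setting, "team" is dropped because it also appears in the competing category. Lowering the threshold lets it back in, which mirrors the idea of adjusting the degree of keyword relevance.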
Conclusions • The HRKE facility is achieved by applying a unique feature selection method that takes the occurrence of keywords in documents from a specified category and compares it with the occurrence of those keywords in each of the competing categories.
Comments • Advantages: improves Bayesian classification performance while maintaining low cost. • Applications: feature selection