Text Mining: an Introduction By: Alireza Vazifedoost University of Tehran Elec. & Computer Eng. Department a.vazifedoost@ece.ut.ac.ir
Agenda • Definitions • Applications • Text Mining Process • Conclusion • References
Introduction • Huge volume of information: it is difficult to find what we really have! • 80% of our information is in unstructured or semi-structured format. • Three main approaches: • Information Retrieval or Document Retrieval: vector space, LSI… • Information Extraction: such as filling a database with information taken from some emails. • Knowledge Discovery: can be described as the process of identifying novel information from a collection of texts.
Introduction (cont.) • Text Data Mining = Text Mining = Knowledge Discovery in Text (KDT) • Some definitions: • Text mining is the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources. (Hearst) • A key element is the linking together of the extracted information to form new facts or new hypotheses to be explored further by more conventional means of experimentation.
Introduction (cont.) • The process of discovering heretofore unknown information from a text source (Hearst) • Looking for patterns in unstructured text (Nahm) • Text mining applies the same analytical functions of data mining to the domain of textual information (Doore)
Introduction (cont.) • Text Mining is different from Information Extraction. • IE is like filling a database with already known information from unstructured texts. • There is no novelty involved in IE.
Text Mining, a Conjunction! • Text mining is an inter-disciplinary field • using techniques from the fields of • information retrieval • natural language processing • machine learning • visualization • clustering • summarization
Some Applications • News Mining • Feature Extraction • Search and Retrieval • Categorization (Supervised Classification) • Clustering (Unsupervised Classification) • Summarization • Trends Analysis • Associations • Visualization
Text Mining Methodologies • Text mining can be performed by a collection of methods from various technological areas. • These methods can be roughly grouped under two main headings: • performance-based • knowledge-based
Performance Based • Designers are concerned with the effective behavior of the system and not necessarily with the means used to obtain that behavior. • Statistical Methods • Neural Networks
Performance Based: Association Rules Extraction • A = {w1, w2, …, wn}: a set of keywords • T = {t1, t2, …, tn}: a collection of documents; each ti is associated with a subset of A, denoted ti(A). • Let W ⊆ A be a set of keywords; the set of all documents t in T such that W ⊆ t(A) is called the covering set for W and is denoted [W]. • Any pair (W, w), where W ⊆ A is a set of keywords and w ∈ A\W, is called an association rule and is denoted W => w.
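As a hedged illustration of the covering set defined above, the following Python sketch computes [W] over a toy collection. The document names and keywords are invented for the example and do not come from the original slides.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical toy collection: each document t is mapped to its keyword set t(A).
TEXTS: Dict[str, FrozenSet[str]] = {
    "t1": frozenset({"oil", "price", "opec"}),
    "t2": frozenset({"oil", "price"}),
    "t3": frozenset({"oil", "export"}),
    "t4": frozenset({"gold", "price"}),
}


def covering_set(W: Set[str], texts: Dict[str, FrozenSet[str]]) -> Set[str]:
    """Return [W]: the documents whose keyword set t(A) contains every keyword in W."""
    return {name for name, keywords in texts.items() if W <= keywords}
```

With this data, `covering_set({"oil", "price"}, TEXTS)` contains only `t1` and `t2`, the two documents indexed with both keywords. The empty keyword set is covered by every document.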
Performance Based: Association Rules Extraction (cont.) • R: W => w • S(R,T) = |[W ∪ {w}]| is called the support of R. • C(R,T) = |[W ∪ {w}]| / |[W]| is called the confidence of R. • Confidence is the conditional probability that a text is indexed with keyword w, given that it is already indexed with keyword set W. • Rules of interest satisfy S(R,T) > σ and C(R,T) > γ for chosen thresholds σ and γ.
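The support and confidence measures above can be sketched in a few lines of Python. This is a minimal brute-force illustration under stated assumptions, not the method of any particular system: the toy documents, the function names, and the `max_lhs` limit on the size of W are all hypothetical. Practical miners usually prune the search (for example, Apriori-style) rather than enumerate every candidate rule.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

# Hypothetical toy collection: each document is represented by its keyword set t(A).
TEXTS: List[FrozenSet[str]] = [
    frozenset({"oil", "price", "opec"}),
    frozenset({"oil", "price"}),
    frozenset({"oil", "export"}),
    frozenset({"gold", "price"}),
]


def cover_size(W: FrozenSet[str], texts: List[FrozenSet[str]]) -> int:
    """|[W]|: number of documents whose keyword set contains all of W."""
    return sum(1 for t in texts if W <= t)


def support(W: FrozenSet[str], w: str, texts: List[FrozenSet[str]]) -> int:
    """S(R,T) = |[W ∪ {w}]| for the rule R: W => w."""
    return cover_size(W | {w}, texts)


def confidence(W: FrozenSet[str], w: str, texts: List[FrozenSet[str]]) -> float:
    """C(R,T) = |[W ∪ {w}]| / |[W]|; defined as 0.0 here when [W] is empty."""
    denom = cover_size(W, texts)
    return support(W, w, texts) / denom if denom else 0.0


def mine_rules(
    texts: List[FrozenSet[str]], sigma: float, gamma: float, max_lhs: int = 2
) -> List[Tuple[FrozenSet[str], str, int, float]]:
    """Enumerate rules W => w with support > sigma and confidence > gamma."""
    keywords = sorted(set().union(*texts))
    rules = []
    for size in range(1, max_lhs + 1):
        for lhs in combinations(keywords, size):
            W = frozenset(lhs)
            for w in keywords:
                if w in W:
                    continue  # w must come from A \ W
                s, c = support(W, w, texts), confidence(W, w, texts)
                if s > sigma and c > gamma:
                    rules.append((W, w, s, c))
    return rules
```

On this toy data the rule {oil} => price has support 2 and confidence 2/3, because three documents contain "oil" and two of them also contain "price". Calling `mine_rules(TEXTS, sigma=1, gamma=0.6)` keeps only that rule and its mirror {price} => oil.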
Knowledge-based systems • Knowledge-based systems, on the other hand, use explicit representations of knowledge: • the meaning of words, relationships between facts, and rules • NLP based • Using patterns • GATE: • POS tagging, geographical tagging, … • Ontology based
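To make the "using patterns" idea concrete, here is a minimal sketch in which the explicit knowledge is a single hand-written regular expression. The pattern, the function name, and the sample sentences are assumptions made for illustration. This is not how GATE is implemented or used; real pattern-based systems typically match over linguistic annotations such as POS tags, not raw strings.

```python
import re
from typing import List, Tuple

# One or more capitalized words, e.g. "Shiraz" or "Cape Town".
_NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"

# Hypothetical hand-written pattern encoding the relation "<Name> is located in <Place>".
LOCATED_IN = re.compile(rf"\b({_NAME}) is located in ({_NAME})")


def extract_locations(text: str) -> List[Tuple[str, str]]:
    """Return (entity, place) pairs for every match of the pattern in the text."""
    return LOCATED_IN.findall(text)
```

For the text "Shiraz is located in Iran. Persepolis is located near Shiraz." the function returns a single pair, ("Shiraz", "Iran"). The second sentence does not fit the pattern, which shows both the precision and the brittleness of hand-written rules.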
Conclusions • There is a great need for transforming information into knowledge. • Text mining is a relatively young field. • NLP will play a major role in this field.
References [1] M. Hearst, Untangling Text Data Mining. [2] Ah-Hwee Tan, Text Mining: The State of the Art and the Challenges. [3] Text Analysis and Understanding. [4] Martin Rajman, Text Mining: Knowledge Extraction from Unstructured Textual Data. [5] Aditya Kumar Sehgal, Text Mining: The Search for Novelty in Text.
THANK YOU Questions