Text Mining Application Programming • Chapter 1: Introduction • Manu Konchady, 2006
Definition: Text Mining • (informal) All types of text processing that deal with finding, organizing, and analyzing information. • (formal) The creation of new information that is not obvious in a collection of documents. • New information is defined as a pattern, trend, or relationship that can’t be easily gleaned by reading individual documents. • The term document refers to any unit of text, such as a Web page, an e-mail, a formatted article, a set of slides, or a plain text file.
Data Mining vs. Text Mining • Data mining deals with structured numeric data; text mining deals with unstructured text. • Data used for data mining is extracted, transformed, and loaded into a data warehouse. • Text mining attempts to build a model from data that is assumed to be imprecise.
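As a rough illustration of the gap between the two, the sketch below turns unstructured text into the kind of structured numeric table that data mining expects, by counting terms per document. The sample documents and the `term_counts` helper are assumptions made for illustration; they are not taken from the slides or from Text Mine.

```python
import re
from collections import Counter

def term_counts(text):
    """Lowercase the text, split it into alphabetic tokens, and count them."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)

# Hypothetical sample documents (not from the slides).
documents = {
    "doc1": "Text mining deals with unstructured text.",
    "doc2": "Data mining deals with structured numeric data.",
}

# One row per document, one column per term: a structured view of the text.
table = {name: term_counts(body) for name, body in documents.items()}
vocabulary = sorted(set().union(*table.values()))

for name, counts in table.items():
    row = [counts[term] for term in vocabulary]
    print(name, row)
```

Simple counts like these discard word order and context, which is one reason a model built from text has to treat its input as imprecise.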
Origins of Text Mining • Information Retrieval • Natural Language Processing
Understanding Text • Ambiguity: “Alice saw the rabbit with glasses.” • Polysemy: one word can have more than one meaning. • “In what state would you find Lincoln?” • “free software” • Synonymy: more than one word can express the same meaning. • Exuberant: lush, luxuriant, profuse, and riotous.
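One common way to cope with synonymy in search is to expand a query with known synonyms. The sketch below is a minimal version of that idea, assuming a hand-built synonym table; the `SYNONYMS` dictionary and the `expand_query` function are hypothetical names, and only the "exuberant" entry comes from the slide.

```python
# Hypothetical synonym table; the "exuberant" entry comes from the slide.
SYNONYMS = {
    "exuberant": ["lush", "luxuriant", "profuse", "riotous"],
}

def expand_query(terms, synonyms=SYNONYMS):
    """Return the query terms plus any listed synonyms, without duplicates."""
    expanded = []
    for term in terms:
        for candidate in [term] + synonyms.get(term.lower(), []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded

print(expand_query(["exuberant", "growth"]))
# prints ['exuberant', 'lush', 'luxuriant', 'profuse', 'riotous', 'growth']
```

Expansion of this kind helps with synonymy only. Polysemy is the opposite problem: a query term such as "state" or "free" has to be narrowed to one sense, which a flat synonym table cannot do.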
Text Mining Functions • Searching • Information Extraction • Clustering • Categorization • Summarization • Information Monitoring • Question and Answer
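To make the first of these functions concrete, searching is often built on an inverted index that maps each term to the documents containing it. The sketch below is a minimal illustration under that assumption; `build_index`, `search`, and the sample documents are hypothetical and do not reflect how Text Mine implements search.

```python
import re
from collections import defaultdict

def build_index(documents):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in re.findall(r"[a-z]+", text.lower()):
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query token (boolean AND)."""
    tokens = re.findall(r"[a-z]+", query.lower())
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Hypothetical sample documents (not from the slides).
documents = {
    1: "Free software for text mining",
    2: "Clustering and categorization of text",
    3: "Summarization software",
}
index = build_index(documents)
print(sorted(search(index, "text software")))  # prints [1]
```

The other functions in the list (clustering, categorization, summarization) typically start from the same tokenized view of the documents but compare or rank them instead of looking terms up.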
Text Mining Installation • Text Mine (http://textmine.sf.net) is a collection of Perl modules and code on SourceForge to index, cluster, classify, and summarize text.
Usage • Command line • Web-based interface