This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License. CS 679: Advanced NLP Lecture #1: Introduction to Text Mining
Objectives for Today • Quick course info. • Overview of Text Mining • Discuss your applications of Text Mining • Elements of Text Mining • Introduce course objectives
Course Info.
• Office Hours:
  • Tue & Thu. 3-4pm (without appointment)
  • OR by appointment
• TA: TBD
• Web page: https://facwiki.cs.byu.edu/cs679
  • Syllabus
  • Regularly updated schedule: Due dates, Reading assignments, Project guidelines, Lecture Notes
• Google Group “BYU CS 679”
• Email: ringger AT cs DOT byu DOT edu
• Grades: http://gradebook.byu.edu
Assignments
• Readings – with max. one page reports
  • Mostly research papers (see course web page for all hyperlinks)
  • Usually one reading report per week
• Intro. Projects
  • Presentation
  • Report
• Semester Project
  • Proposal
  • Presentation
  • Report
Course Policies • Early • Late • Grades • Other. See the Syllabus for details.
Text Mining: the process of discovering previously unknown information in large text collections (paraphrased from M. Hearst).
Other Definitions • Looking for patterns in unstructured text (Nahm) • Text mining applies the same analytical functions of data mining to the domain of textual information (Doore)
“Search” versus “Discover”

                           Search (goal-oriented)   Discover (opportunistic)
Structured Data            Data Retrieval           Data Mining
Unstructured Data (Text)   Information Retrieval    Text Mining

Credit: adapted from slide by Nathan Treloar, AvaQuest
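The contrast between searching and discovering can be made concrete with a small sketch. It is only an illustration under stated assumptions: the four-document corpus is invented, and the helper names `search` and `discover_pairs` are hypothetical, not part of the course materials. Searching starts from a query the user already has in mind; discovering starts from the collection and reports word pairs that co-occur in several documents, without any query.

```python
from collections import Counter
from itertools import combinations

# Toy corpus, invented for illustration only.
docs = [
    "stock prices fall as oil prices rise",
    "oil prices rise after supply cut",
    "team wins final after late goal",
    "late goal decides cup final",
]

def tokenize(text):
    """Lowercase and split on whitespace; real tokenizers do much more."""
    return text.lower().split()

def search(query, documents):
    """Goal-oriented: indices of documents that contain every query term."""
    terms = set(tokenize(query))
    return [i for i, d in enumerate(documents) if terms <= set(tokenize(d))]

def discover_pairs(documents, min_docs=2):
    """Opportunistic: word pairs that co-occur in at least min_docs documents."""
    counts = Counter()
    for d in documents:
        vocab = sorted(set(tokenize(d)))       # each pair counted once per document
        counts.update(combinations(vocab, 2))  # pairs come out in alphabetical order
    return {pair: n for pair, n in counts.items() if n >= min_docs}

hits = search("oil prices", docs)   # the user knows what to look for
pairs = discover_pairs(docs)        # no query; patterns come from the data
```

On this toy corpus the search returns the two oil-related documents, while the discovery step surfaces pairs such as ("oil", "prices") and ("goal", "late") that nobody asked about. Co-occurrence counting is a deliberately crude stand-in for the mining methods covered later in the course.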
Additional Applications • News Mining • Sentiment Detection • Summarization • Trend Analysis • Association Detection
Course Objectives
• Acquire experience conducting exploratory data analysis on large collections of text
• Gain in-depth experience with and understanding of approaches to
  • document classification
  • sentiment classification
  • feature engineering
  • feature selection
  • document clustering
  • unsupervised topic identification
  • visualization, including document summarization
• Build a foundation of techniques for approximate Bayesian reasoning for unsupervised text analysis
Course Objectives (2) • Obtain experience with techniques for evaluating and visualizing the results of unsupervised learning processes • Independent investigation of methods of your choice! • Application of your methods to learn something important from a significant text corpus of your choice
Simplistic Text Mining Process [Figure: process diagram, not reproduced here. Credit: NCSA]
Methods • Feature Engineering • Feature Selection • Information Extraction • Categorization (Supervised) • Clustering (Unsupervised) • Topic Identification / Topic Modeling • Visualization
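Three of the methods listed above (feature engineering, feature selection, and supervised categorization) can be sketched end to end in a few lines. The slides do not name specific algorithms, so the choices here are assumptions made for illustration: TF-IDF weights as features, a document-frequency cutoff as the selection rule, and a nearest-centroid classifier with cosine similarity. The training sentences, labels, and function names are all invented.

```python
import math
from collections import Counter

# Tiny labeled corpus, invented for illustration only.
train = [
    ("the match ended with a late goal", "sports"),
    ("the team won the cup final", "sports"),
    ("oil prices rose on the stock market", "finance"),
    ("the bank raised interest rates", "finance"),
]

def tokenize(text):
    return text.lower().split()

def document_frequency(token_lists):
    """Number of documents in which each term appears."""
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    return df

def select_features(df, n_docs, max_df_ratio=0.5):
    """Feature selection: drop terms found in more than max_df_ratio of documents."""
    return {t for t, n in df.items() if n / n_docs <= max_df_ratio}

def tfidf(tokens, vocab, df, n_docs):
    """Feature engineering: term frequency times log inverse document frequency."""
    tf = Counter(t for t in tokens if t in vocab)
    return {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def fit(examples):
    """Sum the TF-IDF vectors of each class into one centroid per label."""
    token_lists = [tokenize(text) for text, _ in examples]
    n_docs = len(token_lists)
    df = document_frequency(token_lists)
    vocab = select_features(df, n_docs)
    centroids = {}
    for tokens, (_, label) in zip(token_lists, examples):
        centroid = centroids.setdefault(label, {})
        for term, weight in tfidf(tokens, vocab, df, n_docs).items():
            centroid[term] = centroid.get(term, 0.0) + weight
    return vocab, df, n_docs, centroids

def classify(text, model):
    """Categorization: pick the label whose centroid is most similar to the text."""
    vocab, df, n_docs, centroids = model
    vec = tfidf(tokenize(text), vocab, df, n_docs)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

model = fit(train)
label_a = classify("late goal wins the final", model)
label_b = classify("stock prices and interest rates", model)
```

The document-frequency cutoff removes "the", which appears in every training sentence and therefore carries no signal. A text that shares no selected term with any class scores zero against every centroid, and this sketch then simply returns the first label; a real system would handle that case explicitly.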
Some Available Data Sets • 20 Newsgroups -- Usenet • Reuters (1990s) newswire • Del.icio.us bookmarked web pages • Enron Email • Movie Reviews • Gamespot game reviews • General Conference • State of the Union • Campaign Speeches… • Yours!
Assignment
• Reading for next time:
  • Course Syllabus
  • "Tapping the Power of Text Mining" by Fan et al. (CACM 9/2006)
  • "Text-Mining the Voice of the People" by Evangelopoulos et al. (CACM 2/2012)
  • Skim: Alta Plana Text Analytics Report
• Reading Report #1
  • % Completed
  • Questions