A Data Mining Course for Computer Science and non Computer Science Students Jamil Saquer Computer Science Department Missouri State University Springfield, MO
Outline • Introduction • Motivation • Challenges • Design of the Course • Topics Covered • Assignments • Examination Format • Conclusion
Introduction • What is data mining (DM)? • non-trivial process of identifying valid, novel, useful, and ultimately understandable patterns in large volumes of data. • DM is an interdisciplinary topic • Has many things in common with machine learning and pattern recognition
Motivation for the Course • Introducing more electives • Introducing graduate level CS courses • Informatics Program • Of interest to faculty members and students from other departments • Author’s main area of research
Challenges in Designing the Course • Diverse student population • CS vs. non-CS • undergrad vs. grad • Solution • Informatics program in design stages • MNAS CS option is new • Therefore, emphasis on undergrad CS students
Accommodating other students • Minimize prerequisites • CS 2 (or even CS 1) • Able to use DM software • Scientific background/mentality • One from business, another from GGP • For grad CS students: • Project requires more research • Tests could be a little different • Emphasize understanding basic DM concepts and using software for mining data
Design of the Course • Used book by Dunham • Book divided into 3 parts • About 1 week spent on definitions, applications, motivations, challenges, … • Core of the course spent on core DM subjects: classification, clustering, mining association rules • Last week for project presentations
Classification • Assigning objects to predefined classes • Supervised learning • Example: classify a military vehicle as friendly or enemy • Methods covered include: decision trees, naïve Bayes, k-nearest neighbor, backpropagation
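As an illustration of one of the methods listed above, here is a minimal k-nearest neighbor sketch in Python. It is not taken from the course materials; the feature values (speed, length) and the `knn_classify` helper are hypothetical and exist only to mirror the friendly/enemy example.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    train: list of (feature_tuple, label) pairs.
    """
    # Sort training points by Euclidean distance to the query, keep the k closest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training data: (speed in km/h, length in m) -> label.
train = [
    ((60.0, 5.0), "friendly"), ((62.0, 5.5), "friendly"), ((58.0, 4.8), "friendly"),
    ((30.0, 9.0), "enemy"), ((28.0, 9.5), "enemy"), ((33.0, 8.7), "enemy"),
]

print(knn_classify(train, (59.0, 5.1)))  # prints "friendly"
```

Because the labels of the training points are known in advance, this is supervised learning: the classifier only transfers existing labels to a new object.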
Clustering • Grouping similar objects into clusters that are not known in advance • Unsupervised learning • Example: cluster Web log data to discover groups of similar access patterns • Techniques covered include: link algorithms, nearest neighbor, k-means, PAM, BIRCH, DBSCAN, CURE, ROCK
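The k-means technique named above can be sketched in a few lines of Python. This is an illustrative sketch, not the course's implementation: the data points (pages per session, minutes per session) are made up, and the initial centroids are passed in explicitly so the run is deterministic.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Basic k-means: alternate assignment and centroid update until stable."""
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical Web log sessions: (pages viewed, minutes on site).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, [(1.0, 1.0), (8.0, 8.0)])
print(clusters)
```

No labels are supplied here; the two groups emerge from the distances alone, which is what makes the method unsupervised.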
Association Rules • Finding items that frequently occur together • Example: diapers and beer are often bought together • Techniques covered: Apriori, sampling, partitioning, FP-growth
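A compact sketch of the Apriori idea, written in Python for illustration: frequent itemsets are built level by level, and a candidate is kept only if all of its smaller subsets are already frequent. The transactions and the 0.6 support threshold are invented for this example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

# Hypothetical market-basket data.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "diapers", "bread"},
    {"milk", "bread"},
]
frequent = apriori(baskets, min_support=0.6)

# Confidence of the rule beer -> diapers = support(both) / support(beer).
pair = frozenset({"diapers", "beer"})
confidence = frequent[pair] / frequent[frozenset({"beer"})]
print(sorted(sorted(s) for s in frequent), confidence)
```

In this toy data every basket that contains beer also contains diapers, so the rule beer -> diapers has confidence 1.0.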
Assignments • Students need to learn how to mine data • One assignment on each core DM topic • Apply two different algorithms to at least two data sets, one of which must be relatively large • Can use any DM package (e.g., Weka) • Students write a report • Students learn how to run an experiment
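One way to picture "running an experiment" with two algorithms is a holdout comparison, sketched below in Python. This is an assumption about what such an experiment might look like, not the assignment's actual protocol; the students could equally use a package such as Weka. The data set, the 30% test split, and both classifier functions are hypothetical.

```python
import random

def holdout_accuracy(classifier, data, test_fraction=0.3, seed=0):
    """Shuffle, split into train/test, and return accuracy on the test part."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    correct = sum(classifier(train, x) == y for x, y in test)
    return correct / len(test)

def majority_class(train, x):
    """Baseline: ignore x and always predict the most common training label."""
    labels = [y for _, y in train]
    return max(set(labels), key=labels.count)

def one_nearest_neighbor(train, x):
    """Predict the label of the closest training point (squared Euclidean distance)."""
    nearest = min(train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)))
    return nearest[1]

# Hypothetical data set: two well-separated groups of ten points each.
data = [((i * 0.1, i * 0.1), "a") for i in range(10)]
data += [((10 + i * 0.1, 10 + i * 0.1), "b") for i in range(10)]

for name, clf in [("majority", majority_class), ("1-NN", one_nearest_neighbor)]:
    print(name, holdout_accuracy(clf, data))
```

Using the same seed for both algorithms means they are evaluated on the same split, so differences in accuracy reflect the algorithms rather than the partition.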
Term Project • Group projects • Either provide a non-trivial implementation of a DM algorithm • Or, learn about a DM topic not discussed in class • Graduate students required to read at least three research papers and to write a report • All students present their project in class
Examination Format • Open book • Two types of questions • First type requires basic knowledge of the material • Definitions, T/F, short answers • Second type requires applying certain algorithms to small data sets
Conclusion • DM is an interesting course for CS and non-CS students • DM can be taught to non-CS students • A DM course can be taught to students with minimal CS background