
Assignment Help

This group project involves finding and exploring a dataset using data mining (DM) techniques such as clustering, association rule mining (ARM), and classification, with algorithms such as k-means, Apriori, and Naïve Bayes. The project requires a report and a presentation covering data complexity, preprocessing, relevance, and analysis of results. Tasks include exploring how preprocessing affects results, analyzing how clustering outcomes depend on preprocessing, and classifying datasets efficiently. Students will need to handle nominal attributes, zero values, and class labels for accurate analysis.


Presentation Transcript


  1. Assignment Help

  2. Basic Information
     • Due date
       • 5pm Friday 31st May 2019
     • Group work
       • Maximum of two members (up to three may be allowed if there are good reasons)
     • Procedure
       • Find a dataset to explore
       • Explore and find patterns using DM techniques
       • Use AT LEAST one DM area and AT LEAST one DM algorithm
         • Clustering with k-means, DBSCAN and/or hierarchical clustering
         • ARM with Apriori or FP-growth
         • Classification with kNN, DT, NN, BN etc.
     • 40% total
       • 30% Report
       • 10% Presentation

  3. Basic Information
     • 30% Report (10-15 pages)
       • Data complexity & preprocessing: 5%
       • Relevance & appropriateness: 5%
       • Readability & presentation: 5%
       • Analysis of results and conclusion: 5%
       • Scientific & technical quality: 5%
       • Structure & organisation: 5%
     • 10% Presentation (5 min + 2 min Q/A)
       • Visual aids (2%)
       • Information communication (2%)
       • Good eye contact and presentation gestures (2%)
       • Length of presentation (2%)
       • Delivery and Q/A (2%)

  4. ARM
     • Algorithm
       • Apriori or FP-growth
     • Things to consider
       • Ensure all attributes are nominal
       • Ensure zero values (i.e. absent, negative or unimportant values) do not dominate the result
         • Remove zeros
         • Demo: BookClub
     • Possible scenarios
       • Explore how preprocessing affects the results and report
         • NumericToBinary
         • Discretize
       • Explore multi-level (hierarchical) ARM
         • Demo with the crime data
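The point about zero values dominating ARM results can be illustrated with a minimal sketch in plain Python. It is not Apriori or FP-growth: it counts the support of item pairs by brute force, which is enough to show the effect on a tiny example. The purchase matrix, the column names, and the helper functions `to_transactions` and `frequent_itemsets` are all made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Made-up purchase matrix (1 = bought, 0 = not bought); column names are hypothetical.
columns = ["novel", "cookbook", "atlas", "poetry"]
rows = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
]


def to_transactions(rows, columns, keep_zeros):
    """Turn each row into a set of nominal items such as 'novel=1'."""
    transactions = []
    for row in rows:
        items = {f"{c}={v}" for c, v in zip(columns, row) if keep_zeros or v != 0}
        transactions.append(items)
    return transactions


def frequent_itemsets(transactions, size, min_support):
    """Brute-force support counting for itemsets of one fixed size."""
    counts = Counter()
    for t in transactions:
        for combo in combinations(sorted(t), size):
            counts[combo] += 1
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


# Same data, same threshold; the only difference is whether zeros become items.
with_zeros = frequent_itemsets(to_transactions(rows, columns, True), 2, 0.5)
without_zeros = frequent_itemsets(to_transactions(rows, columns, False), 2, 0.5)

print("pairs found with zeros kept:   ", len(with_zeros))
print("pairs found with zeros removed:", len(without_zeros))
for itemset, support in sorted(without_zeros.items()):
    print(itemset, round(support, 2))
```

With zeros kept, most frequent pairs involve "not bought" items such as `atlas=0`, which say little about purchasing behaviour. Removing zeros leaves only co-purchase patterns.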

  5. Clustering
     • Algorithm
       • k-means, DBSCAN and hierarchical
     • Possible scenarios
       • For a dataset with labels
         • Explore how preprocessing affects the clustering results
           • Demo: with the iris dataset (k-means vs. CfsSubsetEval vs. PCs)
         • Explore how parameter tuning affects the clustering results
           • Different seeds (the effect of seeds), different k
       • For a dataset without labels
         • We don't know the value of k here
         • Explore how to choose the best value of k using k-means for a chosen dataset
           • Use the within-cluster sum of squared errors (k might range from 1 to 10)
           • Plot the errors against k and look for the point where they drop suddenly
         • Or explore the dataset to find any interesting patterns
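Choosing k from the within-cluster sum of squared errors can be sketched as follows. This is a plain-Python k-means written for illustration, not the Weka implementation used in the demos; the toy points, the helper names (`sq_dist`, `mean`, `kmeans_sse`) and the choice of five seeds per k are assumptions made for the example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))


def kmeans_sse(points, k, seed, iterations=50):
    """Run Lloyd's k-means from a seeded random start; return the final SSE."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids are k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        new_centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


# Made-up data: three small blobs of five points each.
centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
offsets = [(-1.0, 0.0), (1.0, 0.0), (0.0, -1.0), (0.0, 1.0), (0.0, 0.0)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

# For each k, try several seeds and keep the lowest SSE found.
sse_by_k = {k: min(kmeans_sse(points, k, seed) for seed in range(5)) for k in range(1, 7)}

for k, sse in sse_by_k.items():
    print(k, round(sse, 2))
```

Plotting `sse_by_k` against k and looking for the point after which the error stops dropping sharply gives a candidate value of k. Running the same k with different seeds also shows how much the result depends on the initial centroids.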

  6. Classification
     • Algorithm
       • 1R, J48, IBk, MultilayerPerceptron, SMO, NaïveBayes
     • Things to consider
       • Ensure the dataset has a label attribute
     • Possible scenarios
       • Explore how preprocessing affects the results and report
         • How dimension reduction or attribute selection affects the results and classification efficiency for a certain classifier (or multiple classifiers)
       • Compare and contrast classification accuracy across various classifiers to see which performs well for a certain dataset, and justify why
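The effect of attribute selection on a classifier can be illustrated with a small sketch. It uses a hand-written 1-nearest-neighbour classifier with leave-one-out evaluation in plain Python rather than Weka's IBk. The dataset is made up and deliberately constructed so that the second attribute is large-scale noise; real datasets will rarely show such a clean contrast. The function name `loo_accuracy_1nn` is hypothetical.

```python
# Made-up labelled data: attribute 0 separates the classes,
# attribute 1 is unrelated to the class and has a much larger scale.
data = [
    ((1.0, 50.0), "a"),
    ((1.2, 10.0), "a"),
    ((0.8, 90.0), "a"),
    ((1.1, 30.0), "a"),
    ((3.0, 52.0), "b"),
    ((3.2, 12.0), "b"),
    ((2.8, 88.0), "b"),
    ((3.1, 31.0), "b"),
]


def loo_accuracy_1nn(data, feature_idx):
    """Leave-one-out accuracy of 1-NN using only the attributes in feature_idx."""
    correct = 0
    for i, (x, label) in enumerate(data):
        # Nearest other instance by squared Euclidean distance on the chosen attributes.
        nearest = min(
            (j for j in range(len(data)) if j != i),
            key=lambda j: sum((x[f] - data[j][0][f]) ** 2 for f in feature_idx),
        )
        correct += data[nearest][1] == label
    return correct / len(data)


acc_all = loo_accuracy_1nn(data, [0, 1])    # both attributes
acc_selected = loo_accuracy_1nn(data, [0])  # informative attribute only

print("accuracy with all attributes:    ", acc_all)
print("accuracy with selected attribute:", acc_selected)
```

Because distance-based classifiers are sensitive to attribute scale, the noisy attribute dominates the distance here. Normalising the attributes would be another preprocessing step worth comparing against attribute selection.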

  7. A Sample Report Structure
     • Introduction (1 page)
       • Brief background
     • Description of Dataset (1-2 pages)
       • Details about the dataset, including how many instances, attributes etc.
     • Preprocessing
       • Details about the preprocessing done for the dataset, including cleaning, transformation etc.
     • DM Area and DM Algorithm
       • Brief introduction to the DM area and algorithm of your choice, with justification for the choice
     • Scenarios (e.g. Exploration of the Effect of k in k-means Clustering)
       • Explanation of what you are going to do in your DM project
     • Results and Analysis
     • Conclusion
     • References

  8. Basic Information
     • Due date
       • 5pm Friday 31st May 2019
     • Presentation
       • In class during Week 13 lecture time
     • What to submit? (through LearnJCU)
       • Report & data
       • PowerPoint slides
       • Scripts/code you've written
