7ET023 – MSc Dissertation
Student Name: Colin Hopson
Student Number: 0482647
Course Title: MSc Computer Science (Internet Engineering)
Research Question: What is the most suitable web mining technique for a specified business and mobile application case study?

Contents:
• 1 – Introduction to the subject of web mining and techniques
• 2 – Overview of research conducted (both theory and practical)
• 3 – Software applications on which to test web mining techniques
• 4 – Demonstration (Digital Solutions and Repairs)
• 5 – Evaluating results (suitability and practicality)

1 – Introduction to the subject of web mining and techniques
• Sequential research of techniques for an empirical study
• Initial research into data mining (databases)
• Previous knowledge of web services (RSS, REST, etc.)
• Research into theory of web mining
  • Web usage mining – logs to examine navigation patterns
  • Web structure mining – examine link hierarchy
  • Web content mining – “the discovery of useful information from the Web by examining the data that is contained in the Web site” (Pendharkar, 2003, p. 243)
• Data extraction from HTML (machine learning algorithms)
  • Wrapper Induction
  • Semi-Automatic Extraction

* Pendharkar, P.C. (2003) Managing data mining technologies in organizations: techniques and applications, Idea Group Pub, Hershey.
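
To make the web usage mining bullet above more concrete, here is a minimal Python sketch that is not taken from the dissertation. It assumes server logs in Common Log Format and treats each client address as one visitor; the sample log lines, the page names and the function name `navigation_transitions` are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Captures the client address and the requested path from one
# Common Log Format line (assumed log format, GET/POST requests only).
LOG_PATTERN = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST) (\S+) [^"]*"')

# Hypothetical sample log; real logs would be read from a file.
SAMPLE_LOG = [
    '10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET /index.php HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2024:10:00:05 +0000] "GET /index.php HTTP/1.1" 200 512',
    '10.0.0.1 - - [01/Jan/2024:10:00:09 +0000] "GET /products.php HTTP/1.1" 200 2048',
    '10.0.0.2 - - [01/Jan/2024:10:00:12 +0000] "GET /products.php HTTP/1.1" 200 2048',
    '10.0.0.1 - - [01/Jan/2024:10:00:20 +0000] "GET /basket.php HTTP/1.1" 200 1024',
]


def navigation_transitions(log_lines):
    """Count page-to-page transitions per client, in log order."""
    pages_by_client = defaultdict(list)
    for line in log_lines:
        match = LOG_PATTERN.match(line)
        if match:
            client, path = match.groups()
            pages_by_client[client].append(path)

    transitions = Counter()
    for pages in pages_by_client.values():
        # Pair each page with the next page the same client requested.
        transitions.update(zip(pages, pages[1:]))
    return transitions


if __name__ == "__main__":
    for (src, dst), count in navigation_transitions(SAMPLE_LOG).most_common():
        print(f"{src} -> {dst}: {count}")
```

Counting adjacent page pairs is only the simplest form of navigation pattern; a fuller analysis would also split each client's requests into sessions by time gap.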

2 – Overview of research conducted (both theory and practical)
• Researching Theory of Data and Web Mining
  • Empirical research method to acquire knowledge,
  • Research into data mining, web mining, data extraction algorithms, etc.,
  • Sequential investigation of applicable techniques.
• Artefact Design and Development
  • E-commerce prototype website (Digital Solutions and Repairs),
  • Mobile application (Mobile Shopper).
• Practical Research to Implement Techniques
  • Resolution of web services (Amazon APIs),
  • HTML extraction technique using XML, DOM, XPath and PHP arrays,
  • Consuming Google API with REST, DOM, XPath and PHP arrays,
  • Third-party software (Newprosoft and Automation Anywhere),
  • Functionality of XSLT.
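
The HTML extraction technique listed above (DOM, XPath and arrays) can be sketched as follows. This is an illustrative Python analogue, not the dissertation's PHP code: it uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath and, unlike PHP's `DOMDocument::loadHTML`, requires well-formed markup. The sample page, the class names and `extract_products` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed product listing standing in for a retailer page.
SAMPLE_PAGE = """
<html>
  <body>
    <div class="product">
      <span class="name">USB Cable</span>
      <span class="price">3.99</span>
    </div>
    <div class="product">
      <span class="name">Laptop Charger</span>
      <span class="price">24.50</span>
    </div>
  </body>
</html>
"""


def extract_products(markup):
    """Parse the markup into a tree and collect one dict per product node."""
    root = ET.fromstring(markup)
    products = []
    # XPath-style query: every <div class="product"> anywhere in the tree.
    for node in root.findall(".//div[@class='product']"):
        products.append({
            "name": node.findtext("span[@class='name']", default="").strip(),
            "price": float(node.findtext("span[@class='price']", default="0")),
        })
    return products


if __name__ == "__main__":
    for product in extract_products(SAMPLE_PAGE):
        print(product)
```

Because the XPath expressions are tied to one page layout, an extractor like this has to be rewritten whenever the target site changes its HTML, which matches the "bespoke algorithm" limitation noted in the evaluation.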

3 – Software applications on which to test web mining techniques

4 – Demonstration (Digital Solutions and Repairs)
• Web Mining Technique 1
  • Amazon API
  • (coded class/methods)
• Web Mining Technique 2
  • HTML Extraction
  • (DOMDocument, XPath and PHP arrays)
• Web Mining Technique 3
  • Google API
  • (REST, DOMDocument, XPath and PHP arrays)
• Web Mining Technique 4
  • Third-Party Software
  • (Automation Anywhere and Newprosoft)
• Web Mining Technique 5
  • None implemented, but XSLT investigated

Website Demonstration >>>
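
The REST-based pattern named for Technique 3 (build a request URL, receive XML, query it and store the results in arrays) might look roughly like the Python sketch below. It makes no network call and does not reproduce any real Google or Amazon API: the endpoint `https://api.example.com/products`, the parameter names, the response layout and the key `DEMO_KEY` are all hypothetical placeholders.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical endpoint; a real product search API defines its own URL and parameters.
BASE_URL = "https://api.example.com/products"

# Canned XML standing in for the body of an HTTP response (assumed layout).
CANNED_RESPONSE = """<results>
  <item id="p1">
    <title>USB Cable</title>
    <seller>Shop A</seller>
    <price currency="GBP">3.99</price>
  </item>
  <item id="p2">
    <title>USB Cable 2m</title>
    <seller>Shop B</seller>
    <price currency="GBP">4.49</price>
  </item>
</results>"""


def build_search_url(query, api_key, max_results=10):
    """Encode the search parameters into a REST-style GET URL."""
    params = {"q": query, "key": api_key, "maxResults": max_results}
    return f"{BASE_URL}?{urlencode(params)}"


def parse_results(xml_text):
    """Turn each <item> element into a plain dict."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": item.get("id"),
            "title": item.findtext("title"),
            "seller": item.findtext("seller"),
            "price": float(item.findtext("price")),
        }
        for item in root.findall("item")
    ]


if __name__ == "__main__":
    print(build_search_url("usb cable", "DEMO_KEY"))
    for row in parse_results(CANNED_RESPONSE):
        print(row)
```

In a live setting the canned string would be replaced by the body of an HTTP response, and the registered API key would be supplied in place of the placeholder.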

5 – Evaluating results (suitability and practicality)
• Web Mining Technique 1: Amazon API
  • Requires registration and associate keys,
  • Product Advertising API has most requirements (plus more),
  • ASINs assist administration system,
  • Top quality delivery and discounts,
  • Regular updates although lengthy documentation.
• Web Mining Technique 2: HTML Extraction
  • No cost, but requires programming knowledge,
  • Bespoke algorithm specific for HTML format,
  • Limited to one online organisation.
• Web Mining Technique 3: Google API
  • Requires registration and associate keys,
  • Searches products from many online organisations,
  • GoogleId does not assist administration system,
  • Web service retrieves limited product information,
  • Top security measures, but lengthy documentation.
• Web Mining Technique 4: Third-Party Software
  • Limited free trial with subscription costs,
  • Possible difficulty with integration with administration system
• Web Mining Technique 5: XSLT investigated
  • Limited free trial with subscription costs,
  • Integration difficulties with administration system

SUMMARY
• Study of web mining and some of its techniques
  • Empirical study, data mining, web services, web content mining, data extraction algorithms.
• Sequential research conducted (theory and practical)
  • Web services (APIs), HTML extraction, third-party software, XSLT.
• E-commerce prototype website and mobile application
  • ‘Digital Solutions and Repairs’ and ‘Mobile Shopper’.
• Demonstration of web mining techniques
  • DSR computer repairs administration system
• Evaluation of web mining techniques investigated
  • Comparison between APIs, HTML extraction, third-party software and XSLT.

Questions?