Data Mining in the Weblog
Dr. Teh Ying Wah
Faculty of Computer Science and Information Technology
University of Malaya
Introduction

In a data warehouse environment, sales managers must deal with very large data sets of sales items, because globalised marketing is both a current and a future trend. To make globalisation possible, sales managers throughout the world must be able to log on to the system. On average, users can tolerate a wait of at most 8 seconds, as this is the limit of people's ability to keep their attention focused while waiting. Achieving a reasonable response time is therefore a critical issue for a company that is pursuing globalisation. Indexes have emerged as one of the techniques for dealing with very large data volumes and fast response time requirements in the data warehouse environment.

Table 2 shows a training data set with four data attributes and two classes.

Table 2: Training Data Set

Fig. 2 shows how the data mining technique works with the training data set.

Fig. 2: Decision Tree Model

Literature Review

Current research in query processing techniques comprises either the automatic or the non-automatic selection of query processing techniques (Table 1). Neither approach, however, is suitable for a data warehouse. There are too many parameters to select in data warehouse performance tuning. Microsoft's AutoAdmin and Microsoft SQL 2000's tuning wizard use the optimiser's estimated cost for all the SQL statements. Microsoft SQL 2000's tuning wizard is not open-source software, so its existing code cannot be changed. Therefore, data mining techniques are proposed in this research as intelligent ways to handle the query processing techniques.

Evaluation

The test data evaluated is based on the Transaction Processing Performance Council (TPC) benchmark web log file. Table 3 shows the performance of the TPC-H sample web log file.
Table 3: Performance TPC-H Web Log

Data Mining Techniques in Indexes

The access weblog file of a high-priority user (such as a manager) keeps track of that user's decision-support queries from time T1 to time T19, as shown in Fig. 1.

Fig. 1: Weblog

Conclusion

The response times of queries improve greatly after data mining models are applied to indexes.
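
The slides mention a decision tree model built from a training data set with four attributes and two classes, but the table and figure themselves are not reproduced here. As a rough illustration only, the sketch below shows how the first split of such a tree could be chosen by information gain. Everything in it is an assumption made for the example: the records, the attribute names (priority, table_size, joins, indexed), the class labels "slow" and "fast", and the choice of information gain as the split criterion. Only the 8-second tolerance comes from the introduction. It is not the author's data or method.

```python
import math
from collections import Counter, defaultdict

# Response-time limit quoted in the introduction.
TOLERANCE_SECONDS = 8

# Hypothetical weblog records (NOT the data from Table 2 or Table 3).
# Each record has four illustrative attributes plus a measured response time.
WEBLOG = [
    {"priority": "high", "table_size": "large", "joins": "many", "indexed": "no", "seconds": 14.2},
    {"priority": "high", "table_size": "large", "joins": "few", "indexed": "no", "seconds": 11.0},
    {"priority": "low", "table_size": "small", "joins": "many", "indexed": "no", "seconds": 9.5},
    {"priority": "high", "table_size": "small", "joins": "few", "indexed": "no", "seconds": 8.4},
    {"priority": "high", "table_size": "large", "joins": "many", "indexed": "yes", "seconds": 3.1},
    {"priority": "low", "table_size": "large", "joins": "few", "indexed": "yes", "seconds": 2.0},
    {"priority": "high", "table_size": "small", "joins": "many", "indexed": "yes", "seconds": 1.2},
    {"priority": "low", "table_size": "small", "joins": "few", "indexed": "yes", "seconds": 0.9},
]

ATTRIBUTES = ["priority", "table_size", "joins", "indexed"]


def label(record):
    """Assign one of two classes: 'slow' if the query exceeded the tolerance."""
    return "slow" if record["seconds"] > TOLERANCE_SECONDS else "fast"


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(records, attribute):
    """Reduction in class entropy obtained by splitting on one attribute."""
    groups = defaultdict(list)
    for record in records:
        groups[record[attribute]].append(label(record))
    remainder = sum(len(g) / len(records) * entropy(g) for g in groups.values())
    return entropy([label(r) for r in records]) - remainder


def best_split(records, attributes):
    """Attribute with the highest information gain, i.e. the root of the tree."""
    return max(attributes, key=lambda a: information_gain(records, a))


if __name__ == "__main__":
    for name in ATTRIBUTES:
        print(f"{name}: gain = {information_gain(WEBLOG, name):.3f}")
    print("root split:", best_split(WEBLOG, ATTRIBUTES))
```

In this made-up data the `indexed` attribute separates slow from fast queries perfectly, so it is chosen as the root split. A real weblog would be noisier, and the tree would need further splits below the root.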