10 likes | 232 Views
Class activities. Data mining Software. Topics of Interests. Cost. Market Growth. AN INVESTIGATION OF DATAMINING AND ITS CURRENT TRENDS IN UNIVERSITIES Adrian Nagy, McNair Scholar; Peter Westfall, Ph.D. Area of Information Systems & Quantitative Sciences,
E N D
Class activities Data mining Software Topics of Interests Cost Market Growth AN INVESTIGATION OF DATAMINING AND ITS CURRENT TRENDS IN UNIVERSITIES Adrian Nagy, McNair Scholar; Peter Westfall, Ph.D. Area of Information Systems & Quantitative Sciences, Texas Tech University, Lubbock, TX 79409 ABSTRACT Capturing an accurate data in timely manner in business world is of considerable interest for both industry and academia. Datamining is very promising as a new effective technique for decision making processes. Recently it has been shown that datamining can create a competitive advantage for a company. Because of its decision support capabilities, datamining has substantial significance for academic research and industrial application. Although datamining is widely used in industry, research and teaching at the university level is relatively neglected. The object of this study was to investigate the current status of datamining classes and future trends in this line of study. To analyze current datamining classes, a data set was collected consisting of datamining class syllabi. From these syllabi, datamining software, topics of interest, the length of study, and the department that the class was associated with was identified. Additionally a survey was conducted to determine what these classes plan on incorporating in the future. The hypothesis of this study is that cost has an inverse relationship on the use of data mining software. More data mining classes should implement data mining software as financial resources become available and as software costs decline. The findings of this study suggest that syllabi for data mining classes should be composed of generic components. For instructors interested in establishing datamining classes to improve learning experiences, this study provides a brief foundation for designing a prototype syllabus, the software, and activities associated with a datamining class. • METHODOLOGY • The results of this data mining study are based on data mining class syllabi and survey with faculty members in 66 universities in U.S, two-third of which are currently using data mining softwares, and one-fourth that are planning to do so in the future. • Participant • There were two different participant groups for this study. The first, and primary, group was the faculty members of universities who teach or had taught data mining classes. This group was represented in this study to gather the main factors in selecting data mining software. The secondary group of participants was the syllabi of data mining classes. • Research Method • The primary data collection for this study was gathered from the syllabi of data mining classes in public as well as private universities in the U.S. The data used in this study were of classes offered between the Spring of 2002 to the Spring of 2004. During the data collection, a data set was collected consisting of 56 data mining class syllabi from 66 universities. From these syllabi the courses were examined for data mining software, topics of interest, the length of study, and the department that the class was associated with. In order to collect the secondary data, the faculty members who taught these data mining classes were surveyed. This survey consisted of 8 questions based on the review of data mining software packages from Haughton et al.(2003). The survey was administered by emailing the faculty members and was designed to identify the main factors in selecting a data mining software. Figure 5. Future software adoption factors Figure 6. Data mining Class activities One of the most crucial factors of adopting a data mining software for the future relates to cost. 47% of faculty members ranked cost their main consideration factor. As illustrated in Figure 6, data mining courses evaluated their students through six major class activities consisting of test, project, assignment/homework, literature review, and participation. Tests, which includes both midterms and finals, accounted for 33% of a student’s grade. Projects accounted for 32% of a student’s grade. Among 56 data mining classes, 79% of data mining classes surveyed were taught at a graduate level whereas 21% were taught at an undergraduate level. The instructors who were surveyed in this study varied from full professors to Ph.D students. 50% of data mining classes were taught by assistant professors and full professors followed with 33% of remaining classes. 74% of data mining classes met for 150 minutes per week. The textbook, Data mining : concepts and techniques written by J.Han and M.Kamber, was used by 29% of surveyed classes. The second most popular textbook was Datamining : introductory and advanced topics written by Margaret H. Dunham which was used in 19% of the remaining classes. High Test(midterm/final), 33% Cost, 47% Project, 32% Ubiquitous Computing Assignment/Homework, 20% Features, 12% Presentation, 6% Availability, 31% Paper/Article review, 5% Fitness to course content, 10% Participation & Misc. 4% Low BACKGROUND Computers are so much a part of modern life that we hardly notice them. Today’s technology makes it easy for a business to accumulate massive amount of information. Many organizations are content to retrieve information using the queries, searches, and reports. But others are finding that there is gold hidden in their large databases. Paul Gray defined data mining as “finding answers about an organization from the information in the data warehouse that the executive or the analyst had not thought to ask (Paul Gray, 1997)”. Beekman and Rathswohl add that data mining “uses advanced statistical methods and techniques to capture trends and patterns in data that would have been neglected by normal database queries” (Beekman & Rathswohl, 2003). The data mining market has been rapidly increasing every year. According to recent IDC’s report entitled “Worldwide Business Intelligence Forecast and Analysis, 2003-2007”, the data mining market increased 5.1% to reach $486 million in 2002. How can data mining be so versatile for a company in weak, sluggish economy? Data mining enables a company gain a competitive advantage over other competitors by identifying customer’s buying patterns and trends. For an instance, one leading beverage company is data mining savvy. This company has adopted data mining to figure out their consumers’ buying patterns as well as their competitor’s sales volume on a daily basis. The analysis of primary data collected by sale representatives enables this company to make best fit decisions for operations on a corporate level. This competitive advantage enables this company to gain the largest market share in beverage industry history. To response to the technical increases in this field, companies need more skilled and better trained data mining employees. Where does academia fit into this equation for industries data mining needs? This problem domain is the focus of this study. This study is not attempting to perform a comparison on the abilities of different data mining softwares or classes, but attempts to investigate the current status of datamining classes and their future trends. RESULTS This study tested two hypotheses. The findings from the syllabi and survey are graphically illustrated based on datamining software used, topics of interest, the length of study, and the department that the data mining class was associated with. Figure 1. Types of data mining software Figure 2. Main factors in selecting a data mining software The numbers represent how many times a data mining software was used. The findings show that Weka is the most frequently used in a data mining class. The results of the survey shows that the main factors in selecting a data mining software was an open source software, known as freeware. Fig. 3 Topics of Interests Figure 4. Department associated As Figure 3 shows, the topics of interests in a data mining class vary from introductory courses to research oriented courses. A computer science or computer engineering department is heavily associated with a data mining class as Figure 4 shows DISCUSSION One of the distinctive differences between business schools and Engineering schools was the use of a data mining software. The findings of this study show that engineering schools do not necessarily need a specific data mining software. This is because data mining classes of engineering schools focus on developing algorithms, methods, and techniques. However, business schools have a different focus. Business schools center on applying data mining techniques and software to solve a specific problem. As is common with study involving different universities(56 different universities), a number of factors could have operated separately or in combination to confound results. A limitation of the study design was its focus on one syllabus at the exclusion of others. This reduces ability to generalize to other syllabi. The retrospective nature of the study was the most important limitation of the study. Even though several steps were taken to improve respondent recall, it can not be assumed that individuals had perfect recall during the study period. In conclusion, findings support the hypotheses that more data mining classes would implement data mining software if financial resources were available. 47% of respondents ranked software cost as the most important factor when determining software adoption. Hypothesis 2, if software costs decline more softwares would be adopted, had ambiguous support. Although no respondent specifically said that they would purchase data mining software if costs decreased, it can be inferred by the fact that cost was the most important factor when determining software adoption. Future studies on this topic should attempt to gather a larger data set for statistical relevance. Additionally future studies may focus on the purpose of the data mining courses. This could better define the course offerings as either theory based or application based. Business/Engineering/Statistics ? Hot Data mining Skill Data mining ACKNOWLEDGEMENTS Thanks must first be expressed to Dr. Peter Westfall, without whom this project would have been impossible. I would like to extend my thanks to Dr. Zhangxi Lin and Dr. Elizabeth Teagan for invaluable, generous help in the direction and completion of this study. I really appreciated collaborating with the McNair Scholars Program, Howard Hughes Medical Institute Science Education Foundation, and Texas Tech University. Ms. Pei-Yu, Gao, my graduate mentor, should also be recognized for her help with this project. Finally, great appreciation is given to the data mining faculty members of 56 universities as well as the instructors who it is hoped will one day benefit from this and similar research. HYPOTHESES The use of a data mining software in an educational setting will affect the learning experience. The hypothesis of this study are more data mining classes would implement data mining software: Hypothesis 1. if financial resources were available Hypothesis 2. if software costs decline