Data Mining
Data Mining • Data mining is the analysis of large pools of data to find patterns and rules that can be used to guide decision making and predict future behavior. • A data warehouse is a database that stores current and historical data of potential interest to managers throughout the company. The data originate in many core operational systems and external sources. [Figure: existing databases, operational systems, and the data warehouse, with tools grouped by use. Data Access and Analysis: Excel, Oracle Express, Essbase, DIAPRISM, RedBrick, Business Objects, PowerPlay. Experiments and Research: Excel, S-Plus, STATISTICA, SPSS, SAS, KnowledgeSEEKER, JUSE-MA.]
Knowledge Management • The process of data mining to find potentially useful information. [Figure: process diagram with the labels Obtain Knowledge, Apply Knowledge and Solutions in actual operation, Search Data, Obtain Information, Build Hypothesis, Test Hypothesis, Find Problems, Find Solutions.]
Statistical Techniques Used for Data Mining • Mean • Median • Mode • Standard Deviation • Histogram, Scatter Graph • Regression Analysis (ANOVA, MANOVA)
Mean • The mean is the average of numerical data. For instance, the mean of 2, 5, 9, 3, and 1 is 4: the sum of the values, 20, divided by 5, the number of values. • The mean shows the central tendency of the data. • To calculate a mean, the data must be numeric values whose magnitudes are meaningful, not numbers used only as labels or codes.
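The calculation on this slide can be checked with a few lines of Python. This is only a minimal sketch; the function name `mean` and the variable `data` are chosen here for illustration and are not part of the slides.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# The five values used in the slide's example.
data = [2, 5, 9, 3, 1]
print(mean(data))  # 20 / 5 = 4.0
```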
Median • The median is the central point of numeric data. • The median is found by sorting all values in ascending or descending order and picking the middle value. For instance, to find the median of 2, 5, 9, 3, and 1, sort the values: • 1, 2, 3, 5, 9 • Hence, the median is 3.
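The sort-and-pick-the-middle procedure can be sketched in Python as follows. The slide only covers an odd number of values; averaging the two middle values for an even count is the usual convention and is added here as an assumption. The function name `median` is illustrative.

```python
def median(values):
    """Middle value of the sorted data."""
    ordered = sorted(values)          # ascending order, as on the slide
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd count: the single middle value
    # Even count (not covered by the slide): average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([2, 5, 9, 3, 1]))  # sorted: 1, 2, 3, 5, 9 -> 3
```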
Mode • The mode is the value that appears most frequently in the data. • The mode identifies the typical value in the data. • Example • The number that appears most often is 53, so the mode of these values is 53.
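Finding the mode amounts to counting how often each value occurs. The sketch below uses Python's `collections.Counter`. The slide's data table is not reproduced in this transcript, so the list `sample` is hypothetical and was made up so that 53 is the most frequent value. The function returns a list because a data set can have more than one mode.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    if not counts:
        raise ValueError("the mode of an empty data set is undefined")
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


# Hypothetical data, NOT the values from the original slide.
sample = [53, 47, 53, 61, 53, 47, 58]
print(modes(sample))  # [53]
```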
Standard Deviation • Measures of central tendency such as the mean, median, and mode show one characteristic of the data that can be used as information. However, they are not the only values that describe the data. • The standard deviation is often used to show how widely the values are spread. • The standard deviation can be calculated with the data analysis function in Excel. • In this example, the standard deviation is 8.024951. If the data are approximately normally distributed, about 68% (roughly 70%) of the values lie within one standard deviation of the mean, and about 95% lie within two standard deviations. • 53.4 – 8.024951 ≈ 45.38 • 53.4 + 8.024951 ≈ 61.42 • Hence, roughly 70% of the data is located between 45.38 and 61.42.
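The interval arithmetic on this slide, and a way to check what share of a data set actually falls inside such an interval, can be sketched in Python. The mean 53.4 and standard deviation 8.024951 come from the slide; the helper names `spread_interval` and `share_within` are illustrative. `statistics.stdev` computes the sample standard deviation (divisor n - 1); the slide does not state which divisor its figure uses, so that choice is an assumption.

```python
import statistics


def spread_interval(mean, sd, k=1):
    """Interval from k standard deviations below the mean to k above it."""
    return mean - k * sd, mean + k * sd


# Values taken from the slide.
low, high = spread_interval(53.4, 8.024951)
print(round(low, 2), round(high, 2))  # 45.38 61.42


def share_within(values, k=1):
    """Fraction of the values lying within k sample standard deviations of the mean."""
    m = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, divisor n - 1 (assumption)
    lo, hi = spread_interval(m, sd, k)
    return sum(lo <= v <= hi for v in values) / len(values)
```

Running `share_within` on real data shows how close the data come to the 68% and 95% figures, which hold only for approximately normal distributions.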
Overall • Using the mean, median, mode, and standard deviation allows us to analyze data more objectively, with less room for an analyst's bias. For instance, when you want to compare two different groups of data, you can compare their means, medians, modes, and standard deviations. • Example • You can compare the sales data of a small shop and a market by calculating the mean, median, mode, and standard deviation of each.
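A comparison of two groups along these lines might look like the sketch below, which uses Python's standard `statistics` module. The sales figures for `small_shop` and `market` are hypothetical placeholders, since the slides give no data, and `summarize` is an illustrative helper name.

```python
import statistics


def summarize(values):
    """Collect the four descriptive statistics discussed in the slides."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.multimode(values),  # list: there may be several modes
        "stdev": statistics.stdev(values),     # sample SD, divisor n - 1
    }


# Hypothetical daily sales figures, made up for illustration only.
small_shop = [12, 15, 15, 18, 20]
market = [40, 55, 55, 70, 90]

for name, sales in (("small shop", small_shop), ("market", market)):
    print(name, summarize(sales))
```

Comparing the two summaries side by side shows both the difference in typical sales (mean, median, mode) and the difference in variability (standard deviation).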
Regression Analysis • When you need to analyze the relationship between two or more sets of data, you can use regression analysis. • Regression analysis can be done with the regression analysis function of Excel. • For instance, the size of a shop and its amount of sales may be related, but you cannot be sure. You can use regression analysis to see whether there is a significant relationship.
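For two variables, the simplest form is a least-squares straight-line fit. The sketch below implements that fit by hand in Python. It is not the procedure Excel runs internally, and the shop sizes and sales figures are hypothetical. The correlation coefficient `r` it returns measures how strong the linear relationship is; it is not a significance test, which would need an additional step (for example, a p-value) not shown here.

```python
import math


def linear_regression(xs, ys):
    """Least-squares line y = slope * x + intercept, plus Pearson's r."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r


# Hypothetical data: shop floor size and sales, made up for illustration.
shop_size = [50, 80, 120, 200]
sales = [10, 16, 22, 41]
slope, intercept, r = linear_regression(shop_size, sales)
print(f"sales = {slope:.3f} * size + {intercept:.3f}, r = {r:.3f}")
```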