1 / 23

Basic Statistics with Microsoft Excel

Basic Statistics with Microsoft Excel. Helen Dixon. Aim and Objectives. Aim of today’s course To illustrate how Excel can be used to carry out some basic statistical analyses and tests Objectives To show you how to use some of the statistical worksheet functions available within Excel

Download Presentation

Basic Statistics with Microsoft Excel

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Basic Statistics with Microsoft Excel Helen Dixon

  2. Aim and Objectives • Aim of today’s course • To illustrate how Excel can be used to carry out some basic statistical analyses and tests • Objectives • To show you how to use some of the statistical worksheet functions available within Excel • To show you how to use some of the tools available in the Analysis ToolPak • To make you aware of the limitations of Excel

  3. Why use Excel? • Software more accessible • Previous familiarity with software • Easy to format output • Better charting facilities than some statistical applications • Access to other key Excel facilities • Easy to use results with other applications

  4. Problems with Excel • Errors due to rounding, missing data or extreme values • Not suitable for very large data sets • Output labelled or arranged inappropriately • Need to repeat processes for different variables or options • No record of analyses • Some algorithms are numerically unstable - little or no information about algorithms employed • Analysis ToolPak results are not dynamic and may vary with results generated by functions

  5. Statistical Functions • Frequency Distributions • Mean, Median and Mode • Percentiles and Quartiles • Deviation and Squared Deviation about the Mean • Variance and Standard Deviation • Covariance and the Correlation Coefficient

  6. Frequency • Use COUNTIF to count how many times an item appears in a list • =COUNTIF(range, criteria) • Use FREQUENCY to calculate how often values occur within a range • =FREQUENCY(data_array, bins_array) • Can also use Histogram tool in Analysis Toolpak

  7. Mean, Median, Mode • Use AVERAGE or AVERAGEA to calculate the arithmetic mean • =AVERAGE(number1, number2, etc.) • Use MEDIAN to return the middle number • =MEDIAN(number1, number2, etc) • Use MODE to return the most common value • =MODE(number1, number2, etc)

  8. Percentiles and Quartiles • Use PERCENTILE to return the kth percentile of a data set • =PERCENTILE(array, percentile) • Percentile argument is a value between 0 and 1 • Use QUARTILE to return the given quartile of a data set • =QUARTILE(array, quart) • Quart is 1, 2, 3 or 4 • IQR = Q3-Q1 • May return different values to statistical package

  9. Variance and Standard Deviation • Use VAR, VARA, VARP or VARPA to calculate the variance for a range • E.g. =VAR(value1, value 2, etc.) • Squared deviations about the mean/N or /n-1 • Use STDEV, STDEVA, STDEVP or STDEVPA to calculate the standard deviation for a range • =E.g. =STDEV(value1, value2, etc.) • Positive square root of variance

  10. Covariance and the Correlation Coefficient • Use COVAR to calculate the covariance • =COVAR(array1, array2) • Average of products of deviations for each data point pair • Depends on units of measurement • Use CORREL to return the correlation coefficient • =CORREL(array1, array2) • Returns value between -1 and +1 • Also available in Analysis ToolPak

  11. Probability • Numerical measure of the likelihood that an event will occur • Some probabilities that can be calculated using Excel: • Binomial Probabilities • Poisson Probabilities • Hypergeometric Probabilities • Normal Probabilities • Exponential Probabilities

  12. Binomial Probabilities • Use BINOMDIST to compute binomial distribution probabilities and cumulative binomial probabilities • =BINOMDIST(number_s, trials, probability_s, cumulative) • Calculates the probability that a sequence of independent trials with two possible outcomes will have a given number of successes • Cumulative is either TRUE or FALSE

  13. Poisson Probabilities • Use POISSON to compute Poisson Probabilities • =POISSON(x, mean, cumulative) • Shows the probability of x occurrences of an event over a specified interval of time or space

  14. Hypergeometric Probabilities • Use HYPGEOMDIST to compute hypergeometric probabilities • =HYPGEOMDIST(sample_s, number_sample, population_s, number_pop) • Computes the probability of x successes (sample_s) in n trials (number_sample) when the trials are dependent • Similar to Binomial except trials are not independent – probability of success changes from trial to trial • Does not compute cumulative probabilities

  15. Normal Probabilities • Use NORMSDIST or NORMDIST to compute the cumulative probability • =NORMSDIST(z) • =NORMDIST(x, mean, standard_dev, cumulative) • Use NORMSINV or NORMINV to compute the z or x value given a cumulative probability • =NORMSINV(probability) • =NORMINV(probability, mean, standard_dev)

  16. Exponential Probabilities • Use EXPONDIST to compute exponential probabilities • =EXPONDIST(x, lambda, cumulative) • x is the random variable • Lambda is 1/mean • Useful in computing probabilities for the time it takes to complete a task

  17. Analysis ToolPak • Descriptive Statistics • Correlation • Linear Regression • t-Tests • z-Tests • ANOVA • Covariance

  18. Mean, Median, Mode Standard Error Standard Deviation Sample Variance Kurtosis Skewness Confidence Level for Mean Range Minimum Maximum Sum Count kth Largest kth Smallest Descriptive Statistics

  19. Correlation and Regression • Correlation is a measure of the strength of linear association between two variables • Values between -1 and +1 • Values close to -1 indicate strong negative relationship • Values close to +1 indicate strong positive relationship • Values close to 0 indicate weak relationship • Linear Regression is the process of finding a line of best fit through a series of data points • Can also use the SLOPE, INTERCEPT, CORREL and RSQ functions

  20. t-Tests and z-Tests • Used to test hypotheses by comparing means • If sample means are equal suggests both samples came from same population • t-Test – n <30 • Equal or unequal variances or paired test • Check result using TTEST function • z-Test – n>30 • Used for means with known variances

  21. ANOVA: Analysis of Variances • Compares variances in two or more data sets • If difference is found it can be assumed that the means of the data sets are different • Single Factor – use instead of t-Test for more than 2 samples • Two Factor with Replication – useful when data can be classified along 2 different dimensions • Two Factor without Replication – as above but only one observation for each pair

  22. PivotTables • Use for crosstabulations • Data must be in tabular format: columns with headings, no blank columns • Easy to pivot data • Easy to create PivotCharts • Can summarise and analyse data without affecting data source

  23. Final Tips • Excel only suitable for basic analysis using small data sets • Later versions of Excel more reliable than Excel 97 • Check Analysis TookPak results with worksheet functions • Check overall results by hand or with dedicated statistical package

More Related