Basic Statistics with Microsoft Excel

Basic Statistics with Microsoft Excel Helen Dixon

Aim and Objectives • Aim of today’s course • To illustrate how Excel can be used to carry out some basic statistical analyses and tests • Objectives • To show you how to use some of the statistical worksheet functions available within Excel • To show you how to use some of the tools available in the Analysis ToolPak • To make you aware of the limitations of Excel

Why use Excel? • Software more accessible • Previous familiarity with software • Easy to format output • Better charting facilities than some statistical applications • Access to other key Excel facilities • Easy to use results with other applications

Problems with Excel • Errors due to rounding, missing data or extreme values • Not suitable for very large data sets • Output labelled or arranged inappropriately • Need to repeat processes for different variables or options • No record of analyses • Some algorithms are numerically unstable - little or no information about algorithms employed • Analysis ToolPak results are not dynamic and may vary with results generated by functions

Statistical Functions • Frequency Distributions • Mean, Median and Mode • Percentiles and Quartiles • Deviation and Squared Deviation about the Mean • Variance and Standard Deviation • Covariance and the Correlation Coefficient

Frequency • Use COUNTIF to count how many times an item appears in a list • =COUNTIF(range, criteria) • Use FREQUENCY to calculate how often values occur within a range • =FREQUENCY(data_array, bins_array) • Can also use Histogram tool in Analysis Toolpak

Mean, Median, Mode • Use AVERAGE or AVERAGEA to calculate the arithmetic mean • =AVERAGE(number1, number2, etc.) • Use MEDIAN to return the middle number • =MEDIAN(number1, number2, etc) • Use MODE to return the most common value • =MODE(number1, number2, etc)

Percentiles and Quartiles • Use PERCENTILE to return the kth percentile of a data set • =PERCENTILE(array, percentile) • Percentile argument is a value between 0 and 1 • Use QUARTILE to return the given quartile of a data set • =QUARTILE(array, quart) • Quart is 1, 2, 3 or 4 • IQR = Q3-Q1 • May return different values to statistical package

Variance and Standard Deviation • Use VAR, VARA, VARP or VARPA to calculate the variance for a range • E.g. =VAR(value1, value 2, etc.) • Squared deviations about the mean/N or /n-1 • Use STDEV, STDEVA, STDEVP or STDEVPA to calculate the standard deviation for a range • =E.g. =STDEV(value1, value2, etc.) • Positive square root of variance

Covariance and the Correlation Coefficient • Use COVAR to calculate the covariance • =COVAR(array1, array2) • Average of products of deviations for each data point pair • Depends on units of measurement • Use CORREL to return the correlation coefficient • =CORREL(array1, array2) • Returns value between -1 and +1 • Also available in Analysis ToolPak

Probability • Numerical measure of the likelihood that an event will occur • Some probabilities that can be calculated using Excel: • Binomial Probabilities • Poisson Probabilities • Hypergeometric Probabilities • Normal Probabilities • Exponential Probabilities

Binomial Probabilities • Use BINOMDIST to compute binomial distribution probabilities and cumulative binomial probabilities • =BINOMDIST(number_s, trials, probability_s, cumulative) • Calculates the probability that a sequence of independent trials with two possible outcomes will have a given number of successes • Cumulative is either TRUE or FALSE

Poisson Probabilities • Use POISSON to compute Poisson Probabilities • =POISSON(x, mean, cumulative) • Shows the probability of x occurrences of an event over a specified interval of time or space

Hypergeometric Probabilities • Use HYPGEOMDIST to compute hypergeometric probabilities • =HYPGEOMDIST(sample_s, number_sample, population_s, number_pop) • Computes the probability of x successes (sample_s) in n trials (number_sample) when the trials are dependent • Similar to Binomial except trials are not independent – probability of success changes from trial to trial • Does not compute cumulative probabilities

Normal Probabilities • Use NORMSDIST or NORMDIST to compute the cumulative probability • =NORMSDIST(z) • =NORMDIST(x, mean, standard_dev, cumulative) • Use NORMSINV or NORMINV to compute the z or x value given a cumulative probability • =NORMSINV(probability) • =NORMINV(probability, mean, standard_dev)

Exponential Probabilities • Use EXPONDIST to compute exponential probabilities • =EXPONDIST(x, lambda, cumulative) • x is the random variable • Lambda is 1/mean • Useful in computing probabilities for the time it takes to complete a task

Analysis ToolPak • Descriptive Statistics • Correlation • Linear Regression • t-Tests • z-Tests • ANOVA • Covariance

Mean, Median, Mode Standard Error Standard Deviation Sample Variance Kurtosis Skewness Confidence Level for Mean Range Minimum Maximum Sum Count kth Largest kth Smallest Descriptive Statistics

Correlation and Regression • Correlation is a measure of the strength of linear association between two variables • Values between -1 and +1 • Values close to -1 indicate strong negative relationship • Values close to +1 indicate strong positive relationship • Values close to 0 indicate weak relationship • Linear Regression is the process of finding a line of best fit through a series of data points • Can also use the SLOPE, INTERCEPT, CORREL and RSQ functions

t-Tests and z-Tests • Used to test hypotheses by comparing means • If sample means are equal suggests both samples came from same population • t-Test – n <30 • Equal or unequal variances or paired test • Check result using TTEST function • z-Test – n>30 • Used for means with known variances

ANOVA: Analysis of Variances • Compares variances in two or more data sets • If difference is found it can be assumed that the means of the data sets are different • Single Factor – use instead of t-Test for more than 2 samples • Two Factor with Replication – useful when data can be classified along 2 different dimensions • Two Factor without Replication – as above but only one observation for each pair

PivotTables • Use for crosstabulations • Data must be in tabular format: columns with headings, no blank columns • Easy to pivot data • Easy to create PivotCharts • Can summarise and analyse data without affecting data source

Final Tips • Excel only suitable for basic analysis using small data sets • Later versions of Excel more reliable than Excel 97 • Check Analysis TookPak results with worksheet functions • Check overall results by hand or with dedicated statistical package

Basic Statistics with Microsoft Excel

Basic Statistics with Microsoft Excel

Presentation Transcript

Basic Statistics with Excel

Microsoft Excel

Microsoft  Excel 

Working with Microsoft Excel 2010

Statistics With Excel

Calculating Statistics in Microsoft Excel

Microsoft Excel

Microsoft Excel

Statistics for Managers Using Microsoft Excel

Microsoft Excel Basic Skills

Microsoft Excel