Matlab Training Sessions 8: Introduction to Statistics

Presentation Transcript


  1. Matlab Training Sessions 8: Introduction to Statistics

  2. Course Outline
     Weeks:
     • Introduction to Matlab and its Interface (Jan 13 2009)
     • Fundamentals (Operators)
     • Fundamentals (Flow)
     • Functions and M-Files
     • Importing Data
     • Plotting (2D and 3D)
     • Plotting (2D and 3D)
     • Statistical Tools in Matlab
     Additional classes will begin next week (Feb 10 2009) and will continue from where the first 8 sessions left off. These sessions will be run by Andrew Pruszynski (4jap1@qlink.queensu.ca)
     Course Website: http://www.queensu.ca/neurosci/matlab.php

  3. Week 8 Lecture Outline
     • Basic Matlab Statistics: mean, median, variance; correlations
     • Statistics Toolbox: parametric and non-parametric statistical tests; curve fitting

  4. Part A: Basics • The base Matlab installation contains basic statistical tools, including mean, median, standard deviation, variance, and correlations • More advanced statistics are available from the Statistics Toolbox and include parametric and non-parametric comparisons, analysis of variance, and curve-fitting tools

  5. Mean and Median
     Mean: average value of a distribution
     Median: middle value of a sorted distribution
     M = mean(A), M = median(A): returns the mean or median of vector A. If A is a matrix, the result is a row vector containing the mean or median of each column.
     M = mean(A,dim), M = median(A,dim): returns the mean or median along dimension dim (dim = 2 operates on each row).
     Example:
     A = [0 2 5 7 20];
     B = [1 2 3; 3 3 6; 4 6 8; 4 7 7];
     mean(A)   = 6.8
     mean(B)   = 3.0000 4.5000 6.0000           (column-wise mean)
     mean(B,2) = 2.0000 4.0000 6.0000 6.0000    (row-wise mean, returned as a column vector)

  6. Mean and Median
     Examples:
     A = [0 2 5 7 20];
     B = [1 2 3; 3 3 6; 4 6 8; 4 7 7];
     Mean:
     mean(A)   = 6.8
     mean(B)   = 3.0 4.5 6.0         (column-wise mean)
     mean(B,2) = 2.0 4.0 6.0 6.0     (row-wise mean)
     Median:
     median(A)   = 5
     median(B)   = 3.5 4.5 6.5       (column-wise median)
     median(B,2) = 2.0 3.0 6.0 7.0   (row-wise median)

  7. Standard Deviation and Variance • Standard deviation is calculated using the std() function • std(X): calculates the standard deviation of vector X • If X is a matrix, std() returns the standard deviation of each column • Variance (defined as the square of the standard deviation) is calculated using the var() function • var(X): calculates the variance of vector X • If X is a matrix, var() returns the variance of each column
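As a quick illustration of the column-wise behaviour described above, the following sketch reuses the matrix B from the mean and median slides. The variable names (s_cols, v_cols, s_rows, max_diff) are made up for this example.

```matlab
% Example matrix from the earlier slides (4 observations, 3 variables)
B = [1 2 3; 3 3 6; 4 6 8; 4 7 7];

s_cols = std(B);         % standard deviation of each column (1x3)
v_cols = var(B);         % variance of each column (1x3)
s_rows = std(B, 0, 2);   % standard deviation of each row (4x1);
                         % the 0 selects the default N-1 normalization

% Variance is the square of the standard deviation,
% so this difference should be zero up to rounding error
max_diff = max(abs(v_cols - s_cols.^2));
disp(max_diff)
```

The third argument of std() and var() picks the dimension, in the same way as the dim argument of mean() and median().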

  8. Standard Error of the Mean • Often the most appropriate measure of error/variance is the standard error of the mean • Matlab does not contain a standard error function, so it is useful to create your own • The standard error of the mean is defined as the standard deviation divided by the square root of the number of samples: SE = std(x) / sqrt(n)

  9. Standard Error of the Mean In Class Exercise 1: • Create a function called se that calculates the standard error of a vector supplied to the function, e.g. se(x) should return the standard error of vector x

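  10. Standard Error of the Mean In Class Exercise 1: Solution
     function result = se(input_vect)
     % SE  Standard error of the mean of a vector.
     % Matlab is case-sensitive, so the built-in must be called as std, not STD.
     result = std(input_vect) / sqrt(numel(input_vect));
     end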
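The se function above handles a single vector. If the data are stored as a matrix with one subject per column, a column-wise variant can be sketched as an anonymous function. The name se_cols and the data in X are made up for illustration.

```matlab
% Hypothetical helper: standard error of the mean of each column of X.
% std(X, 0, 1) works down the rows; size(X, 1) is the number of samples.
se_cols = @(X) std(X, 0, 1) ./ sqrt(size(X, 1));

X = [1 2; 3 4; 5 6; 7 8];   % made-up data: 4 samples, 2 columns
sem = se_cols(X);           % 1x2 row vector, one value per column
disp(sem)
```

Dividing by size(X, 1) rather than length(X) matters here, because length() returns the largest dimension of a matrix, which is not always the number of samples.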

  11. In Class Exercise 2 • From the class website download the file testdata1.txt (http://www.queensu.ca/neurosci/matlab.php) • This text file contains data from two subjects arranged in columns • Load the text file into Matlab using any method you like (load, import, textread(), fscanf()) • Calculate the mean and standard error for each subject • In figure 1, plot the data distribution for each subject using the hist() plotting function • In figure 2, plot the mean and standard error of each subject using a bar graph (bar() and errorbar() functions)

  12. In Class Exercise 2: Solution
     % read data (skip one header line)
     [subj1, subj2] = textread('testdata1.txt','%f%f','headerlines',1);
     % plot distributions of each subject
     figure(1)
     subplot(2,1,1)
     hist(subj1)
     subplot(2,1,2)
     hist(subj2)
     % plot mean and standard error on bar graph
     figure(2)
     hold on
     bar([1,2],[mean(subj1),mean(subj2)])
     errorbar([1,2],[mean(subj1),mean(subj2)],[se(subj1),se(subj2)],'r')
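In recent Matlab releases textread is documented as not recommended. As a hedged alternative, the sketch below reads a two-column file with readmatrix (available from R2019a onward). So that it runs on its own, it first writes a small made-up file; the file contents, the header names, and the one-header-line layout are assumptions, not the actual testdata1.txt.

```matlab
% Write a small made-up file with one header line and two numeric columns
tmp_file = [tempname '.txt'];
fid = fopen(tmp_file, 'w');
fprintf(fid, 'subj1 subj2\n');
fprintf(fid, '%f %f\n', [1 2; 3 4; 5 6]');   % transpose so each row is one line
fclose(fid);

% Read the numeric data, skipping the header line
data = readmatrix(tmp_file, 'NumHeaderLines', 1);
subj1 = data(:, 1);
subj2 = data(:, 2);

delete(tmp_file);   % clean up the temporary file
```

With the real file, the same two readmatrix lines would apply with 'testdata1.txt' in place of tmp_file, provided the file really has a single header line.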

  13. In Class Exercise 2: Solution [Figures: results for Subject 1 and Subject 2]

  14. Data Correlations • Matlab can calculate statistical correlations using the corrcoef() function • [R,P] = corrcoef(A,B) • Calculates R, the matrix of correlation coefficients, and P, the matrix of p-values for testing the hypothesis of no correlation (values below 0.05 are significant at the 5% level), for variables A and B
     R = [AcorA BcorA; AcorB BcorB]

  15. Data Correlations (continued) • Each variable is perfectly correlated with itself, so the diagonal of R is 1
     R = [AcorA BcorA; AcorB BcorB] = [1 BcorA; AcorB 1]

  16. Data Correlations (continued) • P has the same layout as R, and Matlab also reports 1 on its diagonal
     R = [AcorA BcorA; AcorB BcorB] = [1 BcorA; AcorB 1]
     P = [sig(AcorA) sig(BcorA); sig(AcorB) sig(BcorB)] = [1 sig(BcorA); sig(AcorB) 1]
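To make the layout of the two output matrices concrete, here is a minimal sketch on synthetic data. The variables a and b, the seed, and the noise level are invented for illustration and are not the course data.

```matlab
rng(0);                               % fix the seed so the run is repeatable
a = randn(100, 1);                    % made-up variable A
b = 0.7 * a + 0.5 * randn(100, 1);    % made-up variable B, correlated with A

[R, P] = corrcoef(a, b);

r_ab = R(1, 2);   % correlation between A and B (R is symmetric, so R(2,1) is the same)
p_ab = P(1, 2);   % p-value for the null hypothesis of no correlation
fprintf('r = %.3f, p = %.3g\n', r_ab, p_ab);
```

Only the off-diagonal entries carry information in the two-variable case, which is why the example reads R(1,2) and P(1,2).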

  17. Data Correlations [Figure: Variable 1 and Variable 2]

  19. Data Correlations
     % Compute sample correlation
     [r, p] = corrcoef([var1,var2])
     [Figure: Variable 1 and Variable 2]

  20. Data Correlations
     % Compute sample correlation
     [r, p] = corrcoef([var1,var2])
     r =
         1.0000    0.7051
         0.7051    1.0000
     p =
         1.0000    0.0000
         0.0000    1.0000
     [Figure: Variable 1 and Variable 2]

  21. In Class Exercise 3 • From the class website download the file testdata2.txt (http://www.queensu.ca/neurosci/matlab.php) • This text file contains data from two variables arranged in columns • Load the text file into Matlab using any method you like (load, import, textread(), fscanf()) • Plot the data points • Calculate the correlation

  22. In Class Exercise 3: Solution
     % read data (skip one header line)
     [var1, var2] = textread('testdata2.txt','%f%f','headerlines',1);
     % Compute sample correlation
     [r] = corrcoef([var1,var2])
     % Plot data points
     figure(1)
     plot(var1,var2,'ro')
     [Figure: Variable 1 and Variable 2]

  23. Part B: Statistics Toolbox • The Statistics Toolbox contains a large array of statistical tools • This lecture will concentrate on some of the statistics most commonly used in research • Parametric and non-parametric comparisons • Curve fitting

  24. Comparison of Means • A wide variety of mathematical methods exist for determining whether the means of different groups are statistically different • Methods for comparing means can be either parametric (assumes the data are normally distributed) or non-parametric (does not assume a normal distribution)

  25. Parametric Tests - TTEST [H,P] = ttest2(X,Y) performs a two-sample t-test to determine whether the means of samples X and Y are statistically different. H returns 0 or 1: 0 means the null hypothesis (that the means are the same) cannot be rejected, and 1 means it is rejected at the default 5% significance level. P returns the p-value of the test.
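A minimal sketch of the call on synthetic data follows. The samples x and y, the seed, and the shift of 2 are invented for illustration, and the example assumes the Statistics Toolbox is installed.

```matlab
rng(1);                    % fix the seed so the run is repeatable
x = randn(50, 1);          % made-up group 1, drawn around a mean of 0
y = randn(50, 1) + 2;      % made-up group 2, drawn around a mean of 2

% Default two-sample t-test (assumes the two groups have equal variances)
[h, p] = ttest2(x, y);

% Same comparison without the equal-variance assumption (Welch's test)
[h_w, p_w] = ttest2(x, y, 'Vartype', 'unequal');

fprintf('h = %d, p = %.3g (equal variances)\n', h, p);
fprintf('h = %d, p = %.3g (unequal variances)\n', h_w, p_w);
```

Note that ttest2 treats the two samples as independent; for paired measurements on the same subjects, ttest(x, y) would be the usual choice instead.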

  27. Parametric Tests - TTEST
     Example: for the data from exercise 3
     >> [H,P] = ttest2(var1,var2)
     H = 1
     P = 0.00000000000014877
     [Figure: Variable 1 and Variable 2]

  28. Non-Parametric Tests - RankSum • The Wilcoxon rank-sum test assesses whether two groups are statistically different from each other; its null hypothesis is that both samples come from distributions with equal medians • This test is non-parametric and should be used when the data are not normally distributed • Matlab implements the Wilcoxon rank-sum test in the ranksum() function • ranksum(X,Y) statistically compares the two data distributions X and Y
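The following sketch applies ranksum() to deliberately skewed synthetic data, the kind of case where a t-test's normality assumption is doubtful. The samples, seed, and shift are invented for illustration, and the Statistics Toolbox is assumed.

```matlab
rng(2);                        % fix the seed so the run is repeatable
x = exprnd(1, 40, 1);          % made-up skewed sample (exponential, mean 1)
y = exprnd(1, 40, 1) + 3;      % same shape, shifted upward by 3

% Note the output order: the p-value comes first, then the decision h
[p, h] = ranksum(x, y);

fprintf('p = %.3g, h = %d\n', p, h);
```

The output order is the reverse of ttest2, which returns the decision first and the p-value second.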

  29. Non-Parametric Tests - RankSum
     Example: for the data from exercise 3
     [P,H] = ranksum(var1,var2)
     P = 1.1431e-014
     H = 1
     [Figure: Variable 1 and Variable 2]

  30. Curve Fitting • Plotting a line of best fit in Matlab can be performed using either a traditional least squares fit or a robust fitting method. [Figure: data points with two fitted lines, labelled "Least squares" and "Robust"]

  31. Curve Fitting • A least squares linear fit minimizes the sum of the squared vertical distances between the data points and the line of best fit • polyfit(X,Y,N) finds the coefficients of a polynomial P(X) of degree N that fits the data, using least-squares minimization; N = 1 gives a linear fit • [P] = polyfit(X,Y,N) returns P, a vector of coefficients; for a linear fit it contains the slope followed by the y-intercept • [Y] = polyval(P,X) calculates the Y value for every X point on the line of best fit
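To show the order of the coefficients returned by polyfit(), the sketch below fits noise-free points that lie exactly on the line y = 2x + 1. The data are made up so that the expected slope and intercept are known in advance.

```matlab
x = 1:10;
y = 2 * x + 1;             % made-up, noise-free data on the line y = 2x + 1

P = polyfit(x, y, 1);      % linear fit: P(1) is the slope, P(2) the y-intercept
y_fit = polyval(P, x);     % evaluate the fitted line at every x

fprintf('slope = %.4f, intercept = %.4f\n', P(1), P(2));
```

Coefficients are ordered from the highest power down, so for a quadratic fit (N = 2) the vector would be [a b c] for a*x^2 + b*x + c.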

  32. Curve Fitting • Example: • Draw a line of best fit using least squares approximation for the data in exercise 3
     [var1, var2] = textread('testdata2.txt','%f%f','headerlines',1);
     P = polyfit(var1,var2,1);   % P = [slope, intercept]
     Y = polyval(P,var1);        % fitted values on the line
     close all
     figure(1)
     hold on
     plot(var1,var2,'ro')
     plot(var1,Y)

  33. Curve Fitting • A robust linear fit reduces the influence of outliers by giving less weight to points that lie far from the line (iteratively reweighted least squares) • B = robustfit(X,Y) returns the vector B containing the y-intercept followed by the slope, obtained by performing a robust linear fit
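The difference between the two fitting methods is easiest to see with a single large outlier. The sketch below builds synthetic data around the line y = 2x + 1, corrupts the last point, and compares the slopes. The data, seed, and outlier size are invented for illustration, and robustfit() requires the Statistics Toolbox.

```matlab
rng(3);                                  % fix the seed so the run is repeatable
x = (1:20)';
y = 2 * x + 1 + 0.1 * randn(20, 1);      % made-up data near the line y = 2x + 1
y(20) = y(20) + 40;                      % one gross outlier at the last point

p_ls = polyfit(x, y, 1);    % least squares: [slope, intercept]
b_rb = robustfit(x, y);     % robust fit:    [intercept; slope]

err_ls = abs(p_ls(1) - 2);  % slope error of the least squares fit
err_rb = abs(b_rb(2) - 2);  % slope error of the robust fit

fprintf('least squares slope = %.3f, robust slope = %.3f\n', p_ls(1), b_rb(2));
```

The two functions return their coefficients in opposite orders, so the robust result has to be flipped before it is passed to polyval().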

  34. Curve Fitting • Example: • Draw a line of best fit using robust fit approximation for the data in exercise 3
     [var1, var2] = textread('testdata2.txt','%f%f','headerlines',1);
     P = robustfit(var1,var2);        % P = [intercept; slope]
     Y = polyval([P(2),P(1)],var1);   % polyval expects [slope, intercept]
     close all
     figure(1)
     hold on
     plot(var1,var2,'ro')
     plot(var1,Y)

  35. Ideas for Next Term? • Additional statistics, ANOVAs, etc. • Curve fitting with quadratic functions and cubic splines • Algorithms and data structures • Improving program execution time • Assistance tutorials for individual programming problems • Any suggestions?

  36. Getting Help • Help and Documentation • Digital • Accessible Help from the Matlab Start Menu • Updated online help from the Mathworks website: • http://www.mathworks.com/access/helpdesk/help/techdoc/matlab.html • Matlab command prompt function lookup • Built-in demos • Websites • Hard Copy • Books, Guides, Reference • The Student Edition of Matlab, pub. Mathworks Inc.
