Data analysis in MATLAB
Christian Ruff
Why use MATLAB to analyse data?
• One single programme can be used for:
  - importing single-subject data from any format
  - re-arranging for multi-subject analyses
  - statistical tests
  - plotting results
• Errors are less likely
• One single script for analysis and documentation
• This can even be used by your experimental COGENT script (online analysis)
• Ultimately, MATLAB is **much** more flexible than SPSS or EXCEL, especially for graphs
• Nuisances:
  - some details of SPSS procedures are not available (but can be found on the web)
  - use is not as intuitive as SPSS buttons, but help <functionname> and doc <functionname> give guidance
Outline
• How to:
  (1) Import single-subject data from any format
  (2) Inspect single-subject data for distribution / outliers etc.
  (3) Re-arrange data for multi-subject analyses
  (4) Perform statistical tests
• All as steps in one single script
(1) Importing data: Reading in files
• MATLAB can read in many different types of files, using different functions
• These can be listed with help fileformats
• Examples are:
  - xlsread: EXCEL data
  - dlmread: tab-delimited text (or any other form of delimited text, e.g., whitespace)
  - csvread: comma-separated numbers
  - textread: any mixture of text and numbers
  - importdata: any formatted data as a full file (looks for the most appropriate function to use)
  - fopen/fread: any formatted data by line, but needs extensive user specification of the format
• help <functionname> and doc <functionname> give instructions and examples
• MATLAB can also be used to save data in the corresponding formats (e.g., dlmwrite, csvwrite, fopen/fwrite/fprintf)
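A minimal sketch of a write/read round trip with the functions listed above. The file name and the reaction-time values are made up for illustration, and the file is written to tempdir so that the example does not depend on any existing data. Recent MATLAB releases recommend readmatrix and writematrix in place of dlmread and dlmwrite, so treat this as one possible route rather than the only one.

```matlab
% Hypothetical single-subject file: trial number and reaction time (ms)
fname = fullfile(tempdir, 'subject01.txt');
rt = [1 512; 2 478; 3 530];

% Save as tab-delimited text
dlmwrite(fname, rt, 'delimiter', '\t');

% Read it back as a numeric matrix
raw = dlmread(fname, '\t');

% importdata chooses a reader based on the file content
imported = importdata(fname);

% Clean up the temporary file
delete(fname);
```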
(1) Importing data: Types of variables
• Data can be stored in files in very different formats (see e.g. the different field types in Excel sheets)
• Three elementary formats are:
  - Strings: characters (such as letters), cannot be (sensibly) manipulated numerically
    e.g., variable names or condition descriptions
    example_string = '123.456';
  - Doubles: used for numbers, can be numerically manipulated
    Doubles are not stored element-by-element, but as wholes
    example_number = 123.456;
  - Logicals: used for boolean logic, so can take only the values 0 (false) and 1 (true)
    can be numerically manipulated, but this rarely makes sense
    often used for indexing
    example_logical = logical(123.456);
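The three example variables from this slide can be inspected as follows. This is only a sketch; the vector named values is an assumption added to illustrate logical indexing.

```matlab
example_string  = '123.456';          % char array with 7 characters
example_number  = 123.456;            % one double value
example_logical = logical(123.456);   % any nonzero number becomes true

% class reports the type of each variable
type_string  = class(example_string);    % 'char'
type_number  = class(example_number);    % 'double'
type_logical = class(example_logical);   % 'logical'

% A string is stored character by character, a double as one value
n_chars = numel(example_string);         % 7
n_nums  = numel(example_number);         % 1

% Logicals are mostly used for indexing
values   = [3 8 1 9];                    % hypothetical data
is_large = values > 5;                   % logical vector
selected = values(is_large);             % keeps 8 and 9
```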
(1) Importing data: Variable conversion
• Raw data files often contain mixtures of strings and numbers
• Numerical values are often represented as strings in imported data
• After importing data into a variable in MATLAB, the format of each variable can be seen by typing whos <variable_name> or class(<variable_name>), or tested with isnumeric, ischar, or islogical
• The MATLAB workspace contains an array editor that is similar to Excel
• Strings can be converted into doubles with str2double or str2num; this turns numbers in "text format" into numbers that you can do computations with (note that double applied to a string returns the character codes, not the number)
  e.g. example_number = str2double(example_string);
• Doubles can be converted into strings with num2str (char would interpret the number as a character code); this makes it possible to include numbers in text that you want to write into a file
  e.g. example_string = num2str(example_number);
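A short sketch of the difference between converting a numeric string and reading out its character codes. The message text at the end is made up for illustration.

```matlab
example_string = '123.456';

% double returns one character code per character, not the number
as_codes = double(example_string);        % 49 50 51 46 52 53 54

% str2double parses the text as a number
as_number = str2double(example_string);

% num2str goes the other way, e.g. to build text for a results file
msg = ['Mean RT: ' num2str(512.3) ' ms'];

% Checking the types after conversion
was_text = ischar(example_string);
is_num   = isnumeric(as_number);
```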
(1) Importing data: Variable formats
• Relevant variable formats include:
• Matrices:
  - contain m (x n x o…) elements, can be accessed by row or column
  - all elements in a matrix are forced to be in the same format
  - well suited for storing numbers
  - not ideal for strings (of different lengths, e.g. words)
• Cells:
  - contain m (x n x o…) elements, can only be accessed element-by-element
  - each element can be of a different format and length
  - well suited for storing string variables, and mixtures of variables
  - not ideal for storing only number variables that have to be accessed and manipulated as a group (e.g., by row and column)
(1) Importing data: Variable formats
• Relevant variable formats include:
• Structures:
  - contain m (x n x o…) elements that all have several fields
  - each field in any element can contain any variable (e.g., string, numerical) in any format (e.g., cell, matrices…)
  - the fields of different elements can easily be combined if they have the same format
  - well suited for different variables that are nevertheless linked (e.g., data from different subjects)
  - not ideal for storing only number variables that have to be accessed and manipulated as a group (e.g., by row and column)
  - easy to combine one field of different elements into a matrix (e.g., different trials)
  - see strucdem
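The following sketch contrasts a cell array of strings with a structure array holding one element per subject. The field names name and rt and all values are hypothetical.

```matlab
% Cell array: elements may differ in length and type
conditions = {'left', 'right', 'bilateral'};
n_chars = cellfun(@numel, conditions);   % length of each string

% Structure array: one element per subject, same fields in each
subj(1).name = 's01';
subj(1).rt   = [512 478 530];
subj(2).name = 's02';
subj(2).rt   = [601 587 566];

% Combine one field across elements into a matrix (subjects x trials)
all_rt = cat(1, subj.rt);

% Collect a string field into a cell array
names = {subj.name};

% One mean per subject (mean along dimension 2)
mean_rt = mean(all_rt, 2);
```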
(1) Importing data: Transforming variables
• Arrays / cells / structures can easily be converted into each other:
  - Numerical array to cell: num2cell or mat2cell (reverse: cell2mat)
  - String array to cell: cellstr (reverse: char)
  - Structure to array: struct2array (reverse: struct)
  - Structure to cell: struct2cell (reverse: cell2struct)
• Arrays / cells / structures can be appended or combined:
  - Numerical arrays: [123;456] or cat
  - String arrays: strvcat or strcat
  - Cells: cat
  - Structures: cat
• If the dimensions of the to-be-combined variables are known, then all of these operations can also be performed simply by indexing (e.g. num3(1,:) = num1; num3(2,:) = num2;)
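A minimal sketch of the conversions and combinations listed on this slide, using small made-up variables. Each conversion is followed by its reverse so the round trip can be checked.

```matlab
% Numerical array <-> cell
num = [1 2 3; 4 5 6];
c = num2cell(num);            % 2x3 cell, one number per element
num_back = cell2mat(c);       % identical to num

% Cell of strings <-> char array (rows padded with blanks)
words = {'go'; 'nogo'};
padded = char(words);         % 2x4 char array
words_back = cellstr(padded); % padding removed again

% Structure <-> cell
s = struct('rt', 512, 'acc', 1);
vals = struct2cell(s);                        % {512; 1}
s_back = cell2struct(vals, {'rt'; 'acc'}, 1); % same as s

% Combining by concatenation or by indexing gives the same result
num1 = [1 2 3];
num2 = [4 5 6];
stacked = cat(1, num1, num2);
num3 = zeros(2, 3);
num3(1,:) = num1;
num3(2,:) = num2;
```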
(2) Inspecting data: Descriptive statistics
• Descriptive statistics: mean, median, min, max, prctile, range, var, std, skewness, kurtosis, cdfplot
  - many of these also work for data with missing values, by prefixing "nan" (e.g., nanmean)
• Visualisation of distribution:
  - Histogram: hist, also available with superimposed normal distribution: histfit
  - Test for normal distribution:
    visually with normplot
    statistically with lillietest (when testing for normality), kstest (when testing against any specified distribution) or kstest2 (when testing whether two samples come from the same distribution)
  - Scatterplot of two variables: scatter, also available for several variables: plotmatrix
  - Lineplot of data against one dimension (e.g., time): plot, or two dimensions: plot3
  - Visual check for outliers: boxplot (or check for impact of outliers with trimmean)
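A sketch of a first look at single-subject data with one missing value and one slow outlier. The reaction times are invented. The first part uses only base MATLAB; trimmean belongs to the Statistics Toolbox, so that call is guarded and skipped if the toolbox is not installed.

```matlab
% Hypothetical reaction times (ms) with a missing trial and an outlier
rt = [480 495 502 NaN 515 523 530 541 1200];

% Drop missing values by logical indexing (base MATLAB alternative to nanmean)
valid = rt(~isnan(rt));

m      = mean(valid);      % pulled upwards by the outlier
md     = median(valid);    % robust against the outlier
sd     = std(valid);
lo_hi  = [min(valid) max(valid)];

% Toolbox route: mean after discarding the most extreme 25% of values
if exist('trimmean', 'file') == 2
    m_trim = trimmean(valid, 25);
end
```

A large gap between mean and median, as here, is a quick hint that a histogram or boxplot is worth looking at.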
(3) Transforming data for multi-subject analyses
Matrices are by far the most convenient data format for statistical analyses:
• Most descriptive-statistics commands work on dimensions of matrices
  e.g., mean(matrix,1) averages over rows (one mean per column), mean(matrix,2) averages over columns (one mean per row), etc.
• Matrices can easily be indexed with logicals
  e.g., rows = (matrix(:,2)==1); data = matrix(rows,:);
• Condition indices can easily be created as matrices
  e.g., data(:,[2:3]) = fullfact([2 12]);
• Matrices can be easily transformed with
  - sort and sortrows to sort data
  - flipud, fliplr, flipdim, rot90 to flip or rotate dimensions
  - reshape to change dimensions
  - squeeze to remove singleton dimensions
  - shiftdim, circshift to shift dimensions
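The matrix operations above can be sketched on a small made-up data set with one row per observation. The column layout (subject, condition, reaction time) is an assumption for illustration.

```matlab
% Columns: subject, condition, reaction time (ms)
matrix = [1 1 512;
          1 2 478;
          2 1 601;
          2 2 587];

% Logical indexing: keep all rows belonging to condition 1
rows  = (matrix(:,2) == 1);
cond1 = matrix(rows, :);

% Means along each dimension
col_means = mean(matrix, 1);   % 1x3, one mean per column
row_means = mean(matrix, 2);   % 4x1, one mean per row

% Sort all rows by reaction time (column 3)
sorted = sortrows(matrix, 3);

% Reshape the RT column: rows are conditions, columns are subjects
rt_table = reshape(matrix(:,3), 2, 2);
```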
(4) Statistics: mean comparison
• The MATLAB Statistics Toolbox contains functions for many (non-)parametric tests (help stats)
• These ask for data in different input formats (help <functionname> and doc <functionname>)
• They give out all relevant statistics as variables, and/or as tables (if displayopt = 'on')
• Comparing several independent measures: anova1, anova2, anovan, manova1, kruskalwallis
• Comparing several dependent (or mixed) measures: rmaov1, rmaov2, bwaov2, rmaov31, rmaov32, rmaov33, friedman, epsGG, epsHF (all repeated-measures ANOVAs from http://www.mathworks.com/matlabcentral/fileexchange)
• Post-hoc contrasts: multcompare, grpstats
• Comparing two independent measures: ttest2, ranksum
• Comparing two dependent measures: ttest, signtest, signrank
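As a sketch of what a dependent-measures comparison does, the paired t statistic can be computed by hand in base MATLAB and then compared with the toolbox function ttest. The pre and post values are invented, and the ttest call is guarded because it needs the Statistics Toolbox.

```matlab
% Hypothetical reaction times (ms) of six subjects in two conditions
pre  = [512 478 530 495 520 505];
post = [498 470 515 490 500 499];

% Paired t-test by hand: mean difference divided by its standard error
d  = pre - post;
n  = numel(d);
t_manual = mean(d) / (std(d) / sqrt(n));
df = n - 1;

% Toolbox route: ttest with two inputs runs the paired test
if exist('ttest', 'file') == 2
    [h, p, ci, stats] = ttest(pre, post);
    % stats.tstat should agree with t_manual, stats.df with df
end
```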
(4) Statistics: association / dimension reduction
• Bivariate associations:
  - correlation: corrcoef
  - linear regression: regress or robustfit (weighted to minimise impact of outliers)
  - nonlinear regression (e.g. logistic regression): nlinfit
• Multivariate associations:
  - canoncorr, manova1, mdscale, classify, cluster
• Dimension reduction:
  - princomp, factoran
• Bootstrapping is available: bootstrp
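A minimal sketch of a bivariate correlation and a simple linear regression on made-up data. corrcoef is part of base MATLAB. The least-squares fit is done with the backslash operator, which is swapped in here so that the example runs without the Statistics Toolbox; regress accepts the same design matrix and is called only if it is available.

```matlab
% Hypothetical predictor and outcome
x = [1 2 3 4 5 6]';
y = [2.1 3.9 6.2 7.8 10.1 12.2]';

% Correlation: corrcoef returns a 2x2 matrix, r is the off-diagonal entry
R = corrcoef(x, y);
r = R(1, 2);

% Linear regression: design matrix with a column of ones for the intercept
X = [ones(size(x)) x];
b = X \ y;                 % least-squares estimates [intercept; slope]
y_hat = X * b;             % fitted values
resid = y - y_hat;         % residuals

% Toolbox route with the same inputs
if exist('regress', 'file') == 2
    b_toolbox = regress(y, X);
end
```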
(4) Statistics: many other useful things
• The Statistics Toolbox contains functions for many statistical distributions (beta, binomial, exponential, gamma, Poisson, Weibull…):
  - Fits
  - Cumulative and probability density functions and their inverses
  - Random number generation
• Efficient design of factorial experiments (e.g. fullfact; randn)
• Advanced statistical methods are either implemented (e.g., hidden Markov models, decision trees) or can be found on the web:
  - http://www.statsci.org/matlab
  - http://www.mathworks.com/matlabcentral/fileexchange
• If you want to know more, look at the excellent MATLAB documentation at:
  - http://www.mathworks.com/access/helpdesk/help/techdoc/