270 likes | 292 Views
Learn MATLAB basics for statistical computing, from variables and matrices to flow control and functions. Develop your skills for data analysis and visualization.
E N D
Statistical Computing in MATLAB AMS597 Spring 2011 Hao Han April 05, 2011
Introduction to MATLAB • The name MATLAB stands for MATrixLABoratory. • Typical uses include: • Math and computation • Algorithm development • Data acquisition • Modeling, simulation, and prototyping • Data analysis, exploration, and visualization • Scientific and engineering graphics • Application development, including graphical user interface (GUI) building • We will focus on the statistical computing in MATLAB.
Desktop Tools &Development Environment • Workspace Browser – View and make changes to the contents of the workspace. • Command Windows – Run MATLAB statements (commands). • Hotkey: Ctrl+c -> break while the status is busy • M-file Editor – Creating, Editing, Debugging and Running Files.
MATLAB Variables • Variable names are case sensitive. Variable names must start with a letter and can be followed by digits and underscores. • MATLAB does not require any type of declarations or dimension statements. When it encounters a new variable name, it automatically creates the variable and allocates the appropriate amount of storage. For example: New_student = 25; To view the matrix assigned to any variable, simply enter the variable name. • Special Variables: • pi value of π • eps smallest incremental number • inf infinity • NaN not a number • realmin the smallest usable positive real number • realmax the largest usable positive real number
MATLAB Matrices • MATLAB treats all variables as rectangular matrices. • Separate the elements of a row with blanks or commas. • Use a semicolon ‘;’ to indicate the end of each row. • Surround the entire list of elements with square brackets ‘[ ]’. Claim a scalar: x = 2; Claim a row vector: r = [1 2 3] r = [1,2,3] Claim a column vector: c = [1;2;3] c = [1 2 3]’ Claim a matrix: a = [1 2 3; 4 5 6; 7 8 9] a = 1 2 3 4 5 6 7 8 9 Subscripts: the element in row i and column j of A is denoted by A(i,j). a(3,2)=8 or a(6)=8
Matrix Manipulations • The Colon Operator: 1:5 is a row vector containing integers from 1 to 5. To obtain non-unit spacing, specify an increment. For example, 100:-7:50 • Extracting a sub-matrix: Sub_matrix = matrix(r1:r2,c1:c2); sub_a= a(2:3,1:2) sub_a= 4 5 7 8 • Replication: b = [1 2; 3 4]; b_rep= repmat(b,1,2) b_rep= 1 2 1 2 3 4 3 4 • Concatenation: c = ones(2,2); c_cat= [c 2*c; 3*c 4*c] c_cat= 1 1 2 2 1 1 2 2 3 3 4 4 3 3 4 4 c_cat= cat(DIM,A,B); • Deleting rows or columns: c_cat(:,2)=[];
Structures and Cell Arrays Structure Cell Array • Way of organizing related data • Create a structure, s, with fields, x, y, and name s.y= 1; s.x = [1 1]; s.name = 'foo'; % or equivalenty s2 = struct('y',1,'x',[1 1],'name','foo'); • Test for equality: % works for any s1, s2 isequal(s1,s2); • Cell arrays can have entries of arbitrary datatype % create 3 by 2 cell array a = cell(3,2); a{1,1} = 1; a{3,1} = 'hello'; a{2,2} = randn(100,100); • Using cell arrays with other datatypescan be tricky % create 2 by 1 cell array a = {[1 2], 3}; y = a{1}; % y is 1 by 2 numeric array ycell =a(1); % is 1 by 1 cell array x = y+1; % allowed xcell = ycell+1; % not allowed onetwothree = [a{1:2}]; % = [1 2 3]
MATLAB Operators • Relational operators: • Less than < • Less than or Equal <= • Great than or Equal >= • Equal to == • Not equal to ~= • Logical operators: • not ~ % highest precedence • and & % equal precedence with or • or | % equal precedence with and • Matrix computations: + - * / ^ A’; % transpose A \ b; % returns x s.t. A*x=b A / b; % returns x s.t.x*A=b • Element wise operators:
MATLAB Functions • MATLAB provides a large number of standard elementary mathematical functions, including abs, sqrt, exp, and sin. • For a list of the elementary mathematical functions, type: help elfun • For a list of more advanced mathematical and matrix functions: helpspecfun helpelmat • Seek help for MATLAB function references, type: helpsomefun or more detailed docsomefun
Flow Control (‘if’ statement) • The general form of the ‘if’ statement is if expression … elseifexpression … else … end • Example 1: if i == j a(i,j) = 2; elseifi >= j a(i,j) = 1; else a(i,j) = 0; end • Example 2: if (common>60)&&(area>60) pass = 1; end
Flow Control (‘switch’ statement) • switchSwitchamong several cases based on expression • The general form of the switchstatement is: switch switch_expr case case_expr1 … case case_expr2 … otherwise … end • Example : x = 2, y = 3; switch x case x==y disp('x and y are equal'); case x>y disp('x is greater than y'); otherwise disp('x is less than y'); end % x is less than y
Flow Control (‘for’ loop) • for Repeat statements a specificnumber of times • The general form of a for statement is for variable=expression … … end • Example 1: for x = 0:0.05:1 fprintf('%3.2f\n',x); end • Example 2: a = zeros(3,4); for i = 1:3 for j = 1:4 a(i,j) = 1/(i+j); end end
Flow Control (‘while’ loop) • while Repeat statements anindefinitenumber of times • The general form of a whilestatement is while expression … … end • Example 1: n = 1; y = zeros(1,10); while n <= 10 y(n) = 2*n/(n+1); n = n+1; end • Example 2: x = 1; while x %execute statements end
Flow Control (‘break’ statement) • breakterminates the execution of for and while loops • In nested loops, break terminates from the innermost loop only • Example: y = 3; for x = 1:10 fprintf('%d\n',x); if (x>y) break; end end % Question: what is the output?
Graphics: 2-D plot • Basic commands: • Example 1 [plot(vector)]: plot(x,'s') plot(x,y,'s') plot(x1, y1,'s1', x2,y2,'s2', …) title('…') xlabel('…') ylabel('…') legend('…','…') x=0:pi/10:2*pi; x=[sin(x)' cos(x)']; figure; plot(x)
Graphics: 2-D plot (cont’d) • Example 2: • Example 3 [plot(vector,matrix)]: x = 0:0.01:2*pi; y = sin(x); z= cos(x); hold on; plot(x,y, 'b'); plot(x,z,'g'); hold off; t=(0:pi/50:2*pi)'; k=0.4:0.1:1; Y=cos(t)*k; plot(t,Y)
Graphics: 2-D plot (cont’d) • plot(x1, y1,’s1’, x2,y2,’s2’, …) t=(0:pi/100:pi)'; y1=sin(t)*[1,-1]; y2=sin(t).*sin(9*t); t3=pi*(0:9)/9; y3=sin(t3).*sin(9*t3); plot(t,y1,'r:',t,y2,'b',t3,y3,'bo') axis([0,pi,-1,1]) • Linetype- : -- -. • Colorb g r c m y k w • Markertype. + * ^ < > v d h o p s x plot(t,y1,'.r',t,y2, 'b+',t3,y3,'ob:')
Subplots >> subplot(2,2,1) >> … >> subplot(2,2,2) >> … >> subplot(2,2,3) >> … >> subplot(2,2,4) >> …
Graphics: 3-D plot • plot3(x,y,z) t=(0:0.02:2)*pi;x=sin(t);y=cos(t);z=cos(2*t); plot3(x,y,z,'b-',x,y,z,'bd'); view([-82,58]); box on; legend('Chain','Gemstone')
Basic Data Analysis • Import/Export data: • Use the system import wizard File -> import data -> find and open files -> finish • Use commands as follows: 1. help load &help save 2. help xlsread&help xlswrite % Reading into a text file fid = fopen(‘filename.txt’,‘r’); X = fscanf(fid,‘%5d’); % or fread fclose(fid); % Writing onto a text file fid = fopen(‘filename.txt’,‘w’); count = fwrite(fid,x); % or fprintf fclose(fid); • Scatter plot • Statistics Toolbox: help stats
Data Preprocessing • Missing values: You should remove NaNsfrom the data before performing statistical computations. • Removing outliers: You can remove outliers or misplaced data points from a data set in much the same manner as NaNs. • Calculate the mean and standard deviation from the data set. • Get the column of points that lies outside the 3*std. (3σ-rule) • Remove these points
Regression and Curve Fitting • The easiest way to find estimated regression coefficients efficiently is by using the MATLAB backslash operator. • Note that we should avoid matrix inversion (from slow to fast…): % Fit X*b=Y xx = x’*x; xy=x’*y; tic; bhat1 = (xx)ˆ(−1)*xy; toc; tic; bhat2 = inv(xx)*xy; toc; tic; bhat3 = xx \ xy; toc; • Other waysuse build-in functions: regress() or glmfit() • Multiple linear regression model: y = b0 + b1x1 + b2x2 + … • Example: Suppose you measure a quantity y at several values of time t. t=[0 .3 .8 1.1 1.6 2.3]'; y=[0.5 0.82 1.14 1.25 1.35 1.40]'; plot(t,y,'o') grid on
Regression Example (cont’d) • Polynomial regression: • There are six equations in three unknowns, represented by the 6-by-3 matrix X = [ones(size(t)) t t.^2] • The solution is found with the backslash operator. a = X\y a =0.53180.9191-0.2387 • Now evaluate the model at regularly spaced points and overlay the original data in a plot. T=(0:0.1:2.5)'; Y=[ones(size(T)) T T.^2]*a; plot(T,Y,'-',t,y,'o') grid on
Regression Example (cont’d) • Linear-in-the-parameters regression, e.g. exponential function: X = [ones(size(t)) exp(-t) t.*exp(-t)]; a = X\y a = 0.1018 0.4844 -0.2847 T=(0:0.1:2.5)'; Y=[ones(size(T)) exp(-T) T.*exp(-T)]*a; plot(T,Y,'-',t,y,'o') grid on