Quantitative Methods of Data Analysis. Bill Menke, Instructor. Natalia Zakharova, TA. Lecture 2: MatLab Tutorial and Issues Associated with Coding. MatLab Fundamentals.
Most important Data Types
Numerical:
    Scalars – single value
    Vectors – column or row of values
    Matrices – two-dimensional tables of values
Text:
    Character string
Scalars
A number you enter:
    a = 1.265;
A predefined number:
    b = pi;
The result of a calculation:
    c = a*b;
Vectors
MatLab can manipulate both column-vectors and row-vectors, but my advice to you is to use only column-vectors, because it's so easy to introduce a bug by doing an operation on one that should have been done on the other. Use the transpose operator ' to immediately convert any row-vector that you must create into a column-vector.
(Figure: examples of a column-vector and a row-vector.)
Transpose Operator
Swaps the rows and columns of an array, so that the column-vector with elements 1, 2, 3, 4 becomes the row-vector [ 1, 2, 3, 4 ] (and vice versa).
Standard mathematical notation: a^T
MatLab notation: a'
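A minimal sketch of why mixing orientations is risky. It assumes MATLAB R2016b or later (or GNU Octave), where adding a row-vector to a column-vector does not raise an error but silently expands the result into a matrix. The variable names are illustrative only.

```matlab
r = [1, 2, 3, 4];      % a row-vector, size 1x4
c = r';                % the transpose operator makes it a column-vector, size 4x1

size(r)                % rows and columns of r
size(c)                % rows and columns of c

s_ok  = c + c;         % column plus column: still 4x1, as intended
s_bug = c + r;         % column plus row: silently becomes a 4x4 matrix
size(s_bug)
```

Checking size() right after creating a vector is a cheap way to catch this kind of slip early.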
Vector
A vector you enter (note the immediate conversion to a column-vector):
    a = [1.88, 7.22, 5.31, 7.53]';
Result of a calculation:
    b = 2 * a;
The result of a function call:
    c = sort(a);
Matrix
A matrix you enter:
    A = [ [1,2,3]', [4,5,6]', [7,8,9]' ];
That's the matrix
    1 4 7
    2 5 8
    3 6 9
by the way …
Result of a calculation:
    B = 2 * A;
The result of a function call:
    C = zeros(3,3);
Character strings
You type in a quoted sequence of characters:
    s = 'hi there';
Occasionally, the result of a function call:
    capS = upper(s);
That's 'HI THERE', by the way …
Arithmetic
    a = 2;             % a scalar
    b = 2;             % a scalar
    c = [1, 2, 3]';    % a column-vector
    d = [2, 3, 4]';    % another column-vector
    M = [ [1,0,0]', [0,2,0]', [0,0,3]' ];
    % M is the matrix
    %   1 0 0
    %   0 2 0
    %   0 0 3
    e = a*b;           % a scalar
    f = c'*d;          % the dot-product, a scalar
    g = M*d;           % a column-vector
    h = d'*M*d;        % a scalar
Normal rules of linear algebra apply, which means that the type of the result depends critically on what's on the right-hand side, and on its order! Lots of room for bugs here!
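To make the point about order concrete, here is a small sketch using the same c, d and M as above. The only addition is P, a hypothetical name for the product taken in the opposite order, which is an outer product rather than a dot product.

```matlab
c = [1, 2, 3]';
d = [2, 3, 4]';
M = [ [1,0,0]', [0,2,0]', [0,0,3]' ];

f = c'*d;      % (1x3)*(3x1): inner (dot) product, a scalar
P = c*d';      % (3x1)*(1x3): outer product, a 3x3 matrix
g = M*d;       % (3x3)*(3x1): a column-vector
h = d'*M*d;    % (1x3)*(3x3)*(3x1): a scalar

size(f)        % compare the shapes of the two orderings
size(P)
```

Both c'*d and c*d' are legal, so MatLab raises no error if you type the wrong one; only the shape of the result tells you.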
Element access
Suppose
    A = [ [1,2,3]', [4,5,6]', [7,8,9]' ];
which is the matrix
    1 4 7
    2 5 8
    3 6 9
Then A(2,3) is the element in row 2, column 3, which is 8.
    b = A(2,3);      % sets b to 8
    A(2,3) = 10;     % resets A(2,3) to 10
so that A becomes
    1 4 7
    2 5 10
    3 6 9
A(:,2) is the second column of A:
    b = A(:,2);      % b = [4, 5, 6]'
and A(3,:) is the third row of A:
    c = A(3,:);      % c = [3, 6, 9]
but we agreed, no row-vectors:
    d = A(3,:)';     % d = [3, 6, 9]'
More on :
Suppose
    A = [ [1,2,3]', [4,5,6]', [7,8,9]' ];
which is the matrix
    1 4 7
    2 5 8
    3 6 9
Then A(1:2,1:2) extracts a range of rows and columns:
    1 4
    2 5
Note that a quick way to make a vector with regularly-spaced elements is:
    dx = 0.01;
    N = 100;
    t = dx*[1:N]';
which is the column-vector 0.01, 0.02, …, 0.99, 1.00.
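A short sketch that checks the two uses of the colon described above. S is a hypothetical name for the extracted block; everything else follows the slide.

```matlab
A = [ [1,2,3]', [4,5,6]', [7,8,9]' ];
S = A(1:2,1:2);        % rows 1 to 2 and columns 1 to 2 of A

dx = 0.01;
N  = 100;
t  = dx*[1:N]';        % column-vector of N regularly-spaced values

size(t)                % N rows, 1 column
t([1, 2, N])'          % first, second and last values: dx, 2*dx, N*dx
```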
Logical functions
MatLab assigns TRUE the value 1 and FALSE the value 0, so
    ( 1 > 2 ) equals 0
    ( 1 < 2 ) equals 1
    a = [1, 2, 3, 4, 5, 4, 3, 2, 1]';
    b = (a>=4);        % b = [0, 0, 0, 1, 1, 1, 0, 0, 0]'
    sum( (a>=4) );
is the number of elements in the vector a that are equal to or greater than 4.
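The same counting idea extends naturally to a percentage, and the logical vector can also be used to pick out the elements themselves. A minimal sketch; mask, count, pct and big are illustrative names, not part of the lecture.

```matlab
a = [1, 2, 3, 4, 5, 4, 3, 2, 1]';

mask  = (a >= 4);              % logical column-vector of 0s and 1s
count = sum(mask);             % number of elements that pass the test
pct   = 100*count/length(a);   % the same count expressed as a percentage
big   = a(mask);               % logical indexing: only the passing elements
```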
Logical tests
Blocks of MatLab code that are executed only when a test is true. One handy use is turning on or off bits of code intended primarily for debugging.
Here it gets plotted:
    doplotone=1;
    if (doplotone)
        plot(t,d);
    end
Here it doesn't:
    doplotone=0;
    if (doplotone)
        plot(t,d);
    end
To Loop or Not to Loop
    a=[1, 2, 3, 4, 3, 2, 1]';
    b=[3, 2, 1, 0, 1, 2, 3]';
    N=length(a);
Dot product using a loop:
    c = 0;
    for i = 1:N
        c = c + a(i)*b(i);
    end
Dot product using MatLab syntax (note the transpose, needed because a and b are both column-vectors):
    c = a'*b;
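A quick sketch confirming that the loop and the one-line forms agree. For two column-vectors the matrix product needs the transpose on the first factor, a'*b, since a*b with two 7x1 vectors is a dimension-mismatch error; the built-in dot(a,b) is an alternative that does not care about orientation. The names c_loop, c_vec, c_dot and same are illustrative.

```matlab
a = [1, 2, 3, 4, 3, 2, 1]';
b = [3, 2, 1, 0, 1, 2, 3]';
N = length(a);

% dot product using a loop
c_loop = 0;
for i = 1:N
    c_loop = c_loop + a(i)*b(i);
end

c_vec = a'*b;          % (1xN)*(Nx1): a scalar
c_dot = dot(a, b);     % built-in dot product

same = (c_loop == c_vec) && (c_vec == c_dot);   % true if all three agree
```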
You should avoid loops except in cases where
    no MatLab syntax is available to provide the functionality in a simpler way, or
    the available MatLab syntax is so inscrutable that a loop more clearly communicates your intent.
A Tutorial using the Neuse River Hydrograph
Rain falls and the river rises: the discharge quickly increases. After the rain, the river falls: the discharge slowly decreases. So, is the river more often falling than rising?
(Figure: sketches of rain vs. time and discharge vs. time.)
What would constitute an appropriate analysis?
    Find, for the 11-year period, the percent of days that the discharge is increasing*, and compare it to 50%.
    Make a histogram of the rate of increase and decrease of discharge and see whether it is centered around zero or some other number.
* Rising today if today's discharge minus yesterday's discharge is positive.
Steps
    1. Import the Neuse hydrograph data
    2. Convert units to those we're most familiar with
    3. Plot discharge vs time, examine it for errors
    4. Compute discharge rate (today minus yesterday)
    5. Plot rate vs time, examine it for errors
    6. Count up % of days rate is positive
    7. Output the % of days
    8. Compute histogram of rates and plot it
Tricks: work first with a subset of the data
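The steps above can be sketched as follows. This is only an outline under loudly stated assumptions: the real Neuse data file and its units are not given here, so the import step is replaced by a synthetic, made-up hydrograph (fast rises, slow decays), and the conversion from cubic feet per second to cubic meters per second is a guess at what the unit conversion might be. All variable names are illustrative.

```matlab
% Step 1 (stand-in): synthetic discharge, NOT the Neuse data
N = 365;
t = [1:N]';                       % time in days
dcfs = 100*ones(N,1);             % ASSUMED units: cubic feet per second
for i = 2:N
    if mod(i,10) == 0
        dcfs(i) = dcfs(i-1) + 500;              % rain event: quick rise
    else
        dcfs(i) = 100 + 0.7*(dcfs(i-1) - 100);  % slow decay toward base flow
    end
end

% Step 2: convert units (1 cubic foot = 0.0283168 cubic meters)
d = 0.0283168*dcfs;

% Step 4: discharge rate, today minus yesterday
rate = d(2:N) - d(1:N-1);

% Steps 6 and 7: percent of days on which the rate is positive
pct = 100*sum(rate > 0)/length(rate);
fprintf('%.1f%% of days have rising discharge\n', pct);

% Step 8: histogram of rates (counts per bin and bin centers)
[counts, centers] = hist(rate, 20);

% Steps 3 and 5: plots, switched on only when wanted
doplot = 0;                       % set to 1 to show the plots
if (doplot)
    figure; plot(t, d);
    figure; plot(t(2:N), rate);
    figure; bar(centers, counts);
end
```

Because the synthetic record is built by hand, the number of rising days is known in advance, which makes it a useful check before the code is pointed at real data.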
The MatLab Web Site is one place that you can get a description of syntax, functions, etc.
Example 1: the LENGTH command Can be very useful in finding exactly what you want if you’ve only found something close to what you want!
Example 2: the SUM command
Some commands have long, complicated explanations. But that's because they can be applied to very complicated data objects. Their application to a vector is usually short and sweet.
(Screenshot of the SUM help page, which continues for two more pages.)
Advice #1
Think about what you want to do before starting to type in code! Block out the necessary steps on a piece of scratch paper. Without some forethought, you can code for an hour and then realize that what you're doing makes no sense at all.
Advice #2
Sure, cannibalize a program to make a new one … But keep a copy of the old one … And make sure the names are sufficiently different that you won't confuse the two …
Advice #3
Be consistent in the use of variable names.
    amin, bmin, cmin, minx, miny, minz    (guaranteed to cause trouble)
Don't use variable names that can be too easily confused, e.g. xmin and minx. (This is especially important because it can interact disastrously with MatLab's automatic creation of variables: a misspelled variable becomes a new variable.)
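A minimal sketch of how a misspelled variable silently becomes a new variable. The names total and totl are hypothetical; the typo is deliberate.

```matlab
total = 0;
for i = 1:3
    totl = total + i;    % typo: assigns to a NEW variable, totl
end

total                    % still 0: the running sum was never updated
exist('totl', 'var')     % 1: MatLab created totl without complaint
```

No error or warning is raised, so the only symptom is a wrong answer further down the script.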
Advice #4
Build code in small sections, and test each section thoroughly before going on to the next. Make lots of plots.
Advice #5 Test code on smallish simple datasets before running it on a large complicated dataset Build test datasets with known properties. Test whether your code gives the right answer!
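As one way to follow this advice for the hydrograph problem, here is a sketch of test datasets whose percent of rising days is known by construction. The anonymous function pctrising is a hypothetical helper written for this example; it applies the "today minus yesterday is positive" rule.

```matlab
% percent of day-to-day changes that are positive
pctrising = @(d) 100*sum((d(2:end) - d(1:end-1)) > 0)/(length(d) - 1);

up   = [1:10]';                   % always rising: expect 100
down = [10:-1:1]';                % always falling: expect 0
saw  = repmat([4, 3, 2, 1]', 25, 1);   % falls 3 days, then jumps up:
                                       % 24 rises out of 99 changes

p_up   = pctrising(up);
p_down = pctrising(down);
p_saw  = pctrising(saw);
```

If the code returns anything other than 100, 0 and 100*24/99 for these three cases, there is a bug to find before moving on to the real data.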
Advice #6 Don’t be too clever! Inscrutable code is very prone to error.
Advice #7
Use comments to communicate the BIG PICTURE. Which set of comments gives you the most sense of what's going on?
First version:
    % c is the dot product of a and b
    c = 0;
    for i = 1:N
        c = c + a(i)*b(i);
    end
Second version:
    % set c to zero
    c = 0;
    % loop from one to N
    for i = 1:N
        % add a times b to c
        c = c + a(i)*b(i);
    % end of the loop
    end
Advice #8
BUGS – DON'T MAKE THEM (an ounce of prevention is worth a pound of cure)
Practices that reduce the likelihood of bugs are almost always worthwhile, even though they may seem to slow you down a bit … They save time in the long run, since you will spend much less time debugging … By the way, cutting-and-pasting code, especially when it must then be modified by changing variable names, is a frequent source of bugs, even though it's so tempting …