
CS 498 Probability & Statistics

CS 498 Probability & Statistics. Lecture 01. Course logistics. Instructor David Forsyth Email: daf@illinois.edu SC3310 (best way to reach) TA: Zicheng Liao Email: liao17@illinois.edu. Class schedule MWF 11:00-11:50 am 1214 Siebel Center. Office hours TBD Evaluation


Presentation Transcript


  1. CS 498 Probability & Statistics Lecture 01

  2. Course logistics • Instructor • David Forsyth • Email: daf@illinois.edu • SC3310 (best way to reach) • TA: Zicheng Liao • Email: liao17@illinois.edu • Class schedule • MWF 11:00-11:50 am • 1214 Siebel Center. • Office hours • TBD • Evaluation • Homework, midterm, final http://luthuli.cs.uiuc.edu/~daf/courses/Probcourse/Probcourse-2013/498-home.html

  3. Where to start?

  4. About matlab • “The language of technical computing” • The language of MATRIX • Easy interface (C-like), simple syntax, and well-documented. • Interpreted rather than compiled • Cross-platform • Cross-language (matlab <==> C/C++) • Free student license!

  5. Install matlab • Half way to success • Step 1: go to http://webstore.illinois.edu/home/ • Step 2: follow instructions.. And you’re all set

  6. Install matlab • Half way to success • Step 1: go to http://webstore.illinois.edu/home/ • Step 2: follow the instructions... and you are likely to run into all sorts of problems • Check out a matlab package (I am using R2012b) • You need License.dat, the installation file, and the installation key. • Strictly follow http://dl.webstore.illinois.edu/docs/ii/matlabconc.htm • Tricky part: connecting to the license manager on the server • Either be physically connected to the campus network • Or use the VPN (http://dl.webstore.illinois.edu/docs/ii/vpn.htm) • Follow the instructions and you’re all set.

  7. You’ve got here!

  8. Create a scalar variable, vector, matrix
     >> a = 1; b = 2;     % create variables a = 1, b = 2
     >> c = [1 0 1]       % create a row vector
     c =
          1     0     1
     >> c = [1, 0, 1]     % a comma is equivalent to a space
     c =
          1     0     1
     >> c = [1; 0; 1]     % create a column vector with semicolons
     c =
          1
          0
          1
     >> c = [1; 0 1]      % every row must have the same number of elements
     Error using vertcat
     CAT arguments dimensions are not consistent.

  9. Create a scalar variable, vector, matrix
     >> d = [1 -2 0; 0 1 2]   % create a 2x3 matrix
     d =
          1    -2     0
          0     1     2
     >> e = zeros(3,3)        % create a 3x3 zero matrix
     e =
          0     0     0
          0     0     0
          0     0     0
     >> f = ones(3,3)         % create a 3x3 matrix of all ones
     f =
          1     1     1
          1     1     1
          1     1     1
     >> g = rand(2)           % create a 2x2 matrix with random values
     g =
         0.6557    0.8491
         0.0357    0.9340

  10. Indexing
      >> a = [1 2 3 4 5 6 7 8 9 10];
      >> a = 1:10      % quick way to create a sequence
      a =
           1     2     3     4     5     6     7     8     9    10
      >> a(3)          % retrieve the 3rd element; indexing is 1-based (C is 0-based)
      ans =
           3
      >> a(end)        % retrieve the last element
      ans =
          10
      >> a(2:6)        % retrieve a sub-sequence
      ans =
           2     3     4     5     6
      >> a(:)'         % colon retrieves the whole vector as a column; ' transposes it back to a row
      ans =
           1     2     3     4     5     6     7     8     9    10

  11. Indexing
      >> a = rand(3,3)
      a =
          0.6555    0.0318    0.0971
          0.1712    0.2769    0.8235
          0.7060    0.0462    0.6948
      >> a(2,3)        % retrieve the element at row 2, column 3
      ans =
          0.8235
      >> a(8)          % linear indexing is column-major; C is row-major
      ans =
          0.8235
      >> a(1,:)        % retrieve the whole first row
      ans =
          0.6555    0.0318    0.0971
      >> a(2:3,1:2)    % retrieve a sub-matrix
      ans =
          0.1712    0.2769
          0.7060    0.0462
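Why a(8) and a(2,3) name the same element: with column-major storage, element (r, c) of a matrix with nrows rows sits at linear index (c-1)*nrows + r. The sketch below checks this on a hypothetical 3x3 matrix A whose entries equal their own linear index (A is not from the slides); sub2ind and ind2sub are the built-in converters.

```matlab
% Hypothetical 3x3 matrix whose entries equal their own linear index.
A = reshape(1:9, 3, 3);           % columns are [1 2 3]', [4 5 6]', [7 8 9]'

r = 2; c = 3;                     % row 2, column 3
nrows = size(A, 1);

idx_manual  = (c - 1)*nrows + r;  % column-major formula
idx_builtin = sub2ind(size(A), r, c);

% Convert the linear index back to (row, column).
[r_back, c_back] = ind2sub(size(A), idx_builtin);

disp([idx_manual, idx_builtin]);  % the two indices agree
disp(A(r, c) == A(idx_builtin));  % same element either way
```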

  12. Basic operators: + - * /
      >> a + b         % a = 1, b = 2
      ans =
           3
      >> c - a         % vector - scalar
      ans =
           0    -1     0
      >> a - c         % scalar - vector, a = 1, c = [1 0 1]
      ans =
           0     1     0
      >> c * b         % = b * c, vector-scalar multiplication is commutative
      ans =
           2     0     2
      >> c / b         % vector divided by scalar, c = [1 0 1], b = 2
      ans =
          0.5000         0    0.5000
      >> b / c         % scalar divided by vector
      Error using /
      Matrix dimensions must agree.

  13. Basic operators: + - * / .* ./
      >> c + d         % vector plus vector, c = [1 0 1], d = [2 2 -1]
      ans =
           3     2     0
      >> c + (1:5)     % a 1x3 vector plus a 1x5 vector; the parentheses are needed because + binds tighter than :
      Error using +
      Matrix dimensions must agree.
      >> e = [2; 2; -1]   % a 3x1 column vector
      e =
           2
           2
          -1
      >> c + e         % a row vector plus a column vector (an error in R2012b; releases from R2016b on expand this to a 3x3 matrix)
      Error using +
      Matrix dimensions must agree.
      >> e'            % transpose of e
      ans =
           2     2    -1
      >> c + e'
      ans =
           3     2     0

  14. Basic operators: + - * / .* ./
      >> c*d           % c = [1 0 1], d = [2 2 -1]
      Error using *
      Inner matrix dimensions must agree.
      >> c*d'          % dot product
      ans =
           1
      >> c.*d          % element-wise operation, [1*2 0*2 1*(-1)]
      ans =
           2     0    -1
      >> c./d
      ans =
          0.5000         0   -1.0000
      >> e = 1:5;
      >> c./e
      Error using ./
      Matrix dimensions must agree.

  15. Basic operators: + - * / .* ./
      %--- implement the dot product of two vectors ---%
      >> c = [1 0 1]; d = [2 2 -1];   % write -1 without a space: [2 2 - 1] would be parsed as [2 1]
      % C-style implementation: avoid for loops whenever possible
      >> s = 0;                       % accumulator (avoid overwriting the built-in ans)
      >> for i = 1:length(c)
           s = s + c(i)*d(i);
         end
      >> disp(s);
           1
      % the matlab way of doing it
      >> c*d';         % already shown
      >> dot(c, d);    % matlab built-in function
      >> sum(c.*d)     % doing it explicitly
      ans =
           1

  16. Basic operators: ^ .^
      >> 5^2           % 5 to the power of 2
      ans =
          25
      >> d^2           % d = [2 2 -1]
      Error using ^
      Inputs must be a scalar and a square matrix.
      >> d.^2          % element-wise square
      ans =
           4     4     1
      >> 2.^d          % scalar .^ vector
      ans =
          4.0000    4.0000    0.5000
      >> d.^c          % vector .^ vector, c = [1 0 1]
      ans =
           2     1    -1

  17. Logical subscripting
      • Logical operators: &(and), |(or), xor, a>b, etc
      >> if (2 > 3) || (1 & 1)
           disp('true');
         else
           disp('false');
         end
      true
      >> a = 1:4;      % a = [1 2 3 4]
      >> res = a > 2
      res =
           0     0     1     1      % logical type

  18. Logical subscripting
      • Logical operators: &(and), |(or), xor, a>b, etc
      >> a = 1:4;      % a = [1 2 3 4]
      >> res = a > 2
      res =
           0     0     1     1      % logical type
      %--- continue from here ---%
      >> a(a>2)        % = a(logical([0 0 1 1])), not a([0 0 1 1])
      ans =
           3     4
      %%
      >> a = randn(1, 10000);       % 10000 samples from the standard normal distribution
      >> sum(a<1 & a>-1)/10000      % guess an answer..
      ans =
          0.6732                    % fraction within 1 sigma of the mean, about 0.68 for a normal distribution
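The value the simulation should approach can be computed directly: for a standard normal Z, P(-1 < Z < 1) = erf(1/sqrt(2)). The sketch below compares that number with a Monte Carlo estimate built from a logical mask; the sample count n and the variable names are illustrative choices, not from the slides.

```matlab
% Fraction of a standard normal that lies within one standard deviation.
expected = erf(1/sqrt(2));            % P(-1 < Z < 1)

% Monte Carlo estimate using a logical mask, as on the slide.
n = 100000;                           % sample count (arbitrary choice)
z = randn(1, n);
inside = (z > -1) & (z < 1);          % logical vector, one entry per sample
estimate = sum(inside)/n;             % true counts as 1, false as 0

fprintf('expected %.4f, estimated %.4f\n', expected, estimate);
```

The estimate changes from run to run, which is why the slide shows 0.6732 rather than the limiting value.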

  19. Concatenate
      >> a = [1 2];
      >> a = [a 3]             % concatenate a scalar
      a =
           1     2     3
      >> a = [a [3 2 1]]       % concatenate a with a vector
      a =
           1     2     3     3     2     1
      >> b = 1:6; a = [a; b]   % concatenate in the vertical dimension
      a =
           1     2     3     3     2     1
           1     2     3     4     5     6
      >> a = [a; 1:7]          % dimensions must match
      Error using vertcat
      CAT arguments dimensions are not consistent.

  20. Delete
      >> a
      a =
           1     2     3     3     2     1
           1     2     3     4     5     6
      >> a(1,:) = []   % delete the first row
      a =
           1     2     3     4     5     6      % matrix size changed
      >> a(2:3) = []   % delete two elements in a vector
      a =
           1     4     5     6
      >> a(2) = []     % delete one more element
      a =
           1     5     6

  21. Online resources • A quick tutorial • http://web.eecs.umich.edu/~aey/eecs451/matlab.pdf • Get started with matlab • http://www.mathworks.com/help/pdf_doc/matlab/getstart.pdf • Matlab online document (everything is here!) • http://www.mathworks.com/help/matlab/ • >> doc func_name • >> doc; search with key words

  22. First tools for looking at Data • It’s all about data • “what’s going on here?” • Descriptive statistics [Flowchart labels: Problem + Data; Look into data (make sense of what’s going on); Algorithm; Bingo!; Not working..; Re-design algorithm]

  23. Datasets • School dataset http://lib.stat.cmu.edu/DASL/Datafiles/PopularKids.html

  24. Bar charts • Count of categorical data matlab\plotschooldata.m (Walk through the whole process)
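The script matlab\plotschooldata.m is not reproduced in this transcript. As a rough sketch of the idea only: a bar chart of categorical data needs one count per category. The answers in goals below are made up for illustration and are not the school dataset; unique and accumarray do the counting.

```matlab
% Made-up categorical answers (not the real school dataset).
goals = {'Grades', 'Popular', 'Sports', 'Grades', 'Popular', 'Grades'};

% unique returns the sorted distinct labels and, for every entry,
% the index of its label; accumarray then counts entries per label.
[labels, ~, idx] = unique(goals);
counts = accumarray(idx(:), 1)';      % row vector: one count per label

% One bar per category, labelled with the category name.
bar(counts);
set(gca, 'XTick', 1:numel(labels), 'XTickLabel', labels);
ylabel('count');
```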

  25. Datasets • Pizza size dataset http://www.amstat.org/publications/jse/jse_data_archive.htm

  26. Histogram • Count of continuous data in even (or uneven) intervals matlab\plotpizzasize.m

  27. Class-conditional histogram • Histogram of a certain class matlab\plotpizzasize_condhist.m
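The script matlab\plotpizzasize_condhist.m is not reproduced here. One way to sketch a class-conditional histogram is to select each class with a logical mask and histogram the two subsets over shared bins. The diameters and store labels below are made up, and the choice of 5 bins is arbitrary; hist is used because it is available in older releases such as R2012b.

```matlab
% Made-up diameters and store labels (not the real pizza dataset).
sizes = [26.9 27.3 27.8 28.4 28.9 29.1 29.3 29.4 29.6 29.8];
store = {'Dominos', 'Dominos', 'Dominos', 'Dominos', 'Dominos', ...
         'EagleBoys', 'EagleBoys', 'EagleBoys', 'EagleBoys', 'EagleBoys'};

is_eagle = strcmp(store, 'EagleBoys');      % logical mask for one class

% Shared bin centers so the two histograms are directly comparable.
centers = linspace(min(sizes), max(sizes), 5);
dcounts = hist(sizes(~is_eagle), centers);  % histogram given class Dominos
ecounts = hist(sizes(is_eagle), centers);   % histogram given class EagleBoys

bar(centers, [dcounts; ecounts]');          % grouped bars, one group per bin
legend('Dominos', 'EagleBoys');
xlabel('diameter'); ylabel('count');
```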

  28. Series data Number of burglaries each month in Hyde Park http://lib.stat.cmu.edu/DASL/Datafiles/timeseriesdat.html

  29. Plot series data matlab\plotburglary.m

  30. Summarizing 1D data • Mean • Standard deviation • Variance • Median • Percentile • Interquartile range Net worth of people you meet in a bar

  31. Mean
      • Mean: $\mathrm{mean}(x) = \frac{1}{N}\sum_{i=1}^{N} x_i$
      >> a = [1 2 3 5 6];
      >> mean(a)
      ans =
          3.4000
      >> a = [1 2 3; 4 5 6];
      >> mean(a)       % by default, take the mean of each column
      ans =
          2.5000    3.5000    4.5000
      >> mean(a, 2)    % take the mean along the 2nd dimension (one value per row)
      ans =
           2
           5

  32. Median
      • Median: the value half way along the sorted data points
      >> a = [1 2 3 5 6];
      >> median(a)
      ans =
           3
      >> a = [a 6];    % a = [1 2 3 5 6 6]
      >> median(a)     % even count: take the mean of the two middle points
      ans =
           4
      >> median([1 2 2 2 2 2 5 10 15 100])   % biased measure
      ans =
           2

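  33. Std. and variance
      • Standard deviation: $\mathrm{std}(x) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mathrm{mean}(x)\right)^2}$
      • Variance: $\mathrm{var}(x) = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mathrm{mean}(x)\right)^2 = \mathrm{std}(x)^2$
      >> a = [1 2 3 5 6];
      >> std(a)
      ans =
          2.0736       % not exactly the formula: the default normalizes by N-1
      >> std(a,1)      % normalizes by N, as in the formula above
      ans =
          1.8547
      >> var(a,1)      % variance, normalized by N
      ans =
          3.4400
      >> std(a,1)^2    % variance = std^2
      ans =
          3.4400
      >> mean((a-mean(a)).*(a-mean(a)))   % what var(a,1) computes
      ans =
          3.4400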

  34. Percentile and interquartile range
      • Percentile: the k-th percentile is the value such that k% of the data is less than or equal to it.
      • Interquartile range: (75th percentile) - (25th percentile)
      >> a = rand(10000,1);
      >> prctile(a, 20)   % 20th percentile of uniform 0-1 random samples
      ans =
          0.1991          % as expected
      >> prctile(a, 80)   % 80th percentile: ~0.8
      ans =
          0.7978
      >> iqr(a)           % interquartile range of a: ~0.5
      ans =
          0.4984
      >> prctile(a, 75) - prctile(a, 25)   % sanity check
      ans =
          0.4984

  35. Summarizing 1D data
      Net worth of people you meet in a bar
      >> networths = [100360, 109770, 96860, 97860, 108930, 124330, 101300, ...
                      112710, 106740, 120170];
      >> m = mean(networths)
      m =
            107903
      >> sd = std(networths)
      sd =
         9.2654e+03
      >> v = var(networths)
      v =
         8.5848e+07

  36. Summarizing 1D data
      A billionaire comes in
      >> bnetworths = [networths, 1e9];
      >> bm = mean(bnetworths)
      bm =
         9.1007e+07
      >> bsd = std(bnetworths)
      bsd =
         3.0148e+08
      >> bv = var(bnetworths)
      bv =
         9.0889e+16
      Sensitive to outliers!

  37. Summarizing 1D data
      Networths with a billionaire
      >> md = median(networths)
      md =
            107835
      >> bmd = median(bnetworths)
      bmd =
            108930

  38. Summarizing 1D data
      Networths with a billionaire
      >> pcts = prctile(networths, [25 50 75])
      pcts =
            100360      107835      112710
      >> bpcts = prctile(bnetworths, [25 50 75])
      bpcts =
            100595      108930      118305
      >> interqtl = iqr(networths)
      interqtl =
             12350
      >> binterqtl = iqr(bnetworths)
      binterqtl =
             17710
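Some of the quartiles reported with the billionaire (100595, 118305) are not values in the data, because prctile interpolates. A minimal sketch of that rule, as I understand the prctile documentation: the i-th sorted value is treated as the 100*(i-0.5)/n percentile, and requests in between are linearly interpolated. my_prctile is a hypothetical helper name, and this version only handles percentiles between the first and last of those anchor points (interp1 returns NaN outside them).

```matlab
networths  = [100360, 109770, 96860, 97860, 108930, 124330, 101300, ...
              112710, 106740, 120170];
bnetworths = [networths, 1e9];

% Hypothetical helper: the i-th sorted value is the 100*(i-0.5)/n percentile;
% interp1 interpolates linearly between neighbours (NaN outside that range).
my_prctile = @(x, k) interp1(100*((1:numel(x)) - 0.5)/numel(x), sort(x), k);

pcts  = my_prctile(networths,  [25 50 75]);
bpcts = my_prctile(bnetworths, [25 50 75]);

fprintf('%.0f  %.0f  %.0f\n', pcts);
fprintf('%.0f  %.0f  %.0f\n', bpcts);
```

With 11 values, the 25th percentile falls a quarter of the way from the 3rd sorted value (100360) to the 4th (101300), which gives 100595.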

  39. The pizzasize puzzle • Understand what’s going on • Look at other labels: type of crust and type of topping • Cannot compare many histograms together • Need a more compact plot

  40. boxplot • More compact way of summarizing data than histogram >> boxplot([dsizes esizes], 'whisker', 1.5);
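As I understand the boxplot documentation, the 'whisker' value w sets how far the whiskers may reach: a point is drawn individually as an outlier when it lies above q3 + w*(q3 - q1) or below q1 - w*(q3 - q1), where q1 and q3 are the 25th and 75th percentiles. The sketch below applies that rule by hand to made-up diameters (not the pizza dataset) and computes the quartiles by interpolation so that it does not depend on a toolbox.

```matlab
% Made-up pizza diameters (not the real dataset).
sizes = [26.5 27.0 27.2 27.4 27.5 27.6 27.8 28.0 28.1 31.0];

% Quartiles by linear interpolation between sorted values, where the
% i-th sorted value is taken as the 100*(i-0.5)/n percentile.
n = numel(sizes);
q = interp1(100*((1:n) - 0.5)/n, sort(sizes), [25 75]);
q1 = q(1); q3 = q(2);

w = 1.5;                              % the 'whisker' parameter
lower_limit = q1 - w*(q3 - q1);
upper_limit = q3 + w*(q3 - q1);

% Points beyond the limits would be drawn individually by boxplot.
outliers = sizes(sizes < lower_limit | sizes > upper_limit);
disp(outliers);
```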

  41. Boxplot with type of crust • EagleBoys has tighter control over size • Dominos ThinNCrispy is unusual • shrinking during baking • control portion by weight • mistakes by chef (?)

  42. Boxplot with crust and topping Dominos EagleBoys

  43. Wrap-up • “A matlab start is half way to success” • It’s all about data. • Plot data with bar chart, histogram, series plot and box plot. • Summarize 1D data with mean, std, variance, median, percentile and interquartile range.
