Statistics 3 F71SC3
Contact Times (Summer Term 2008)
• Monday, 10.15 - 11.15, Lecture in LT1
• Tuesday, 1.15 - 4.15, Maple Practical, in SR G12/13
• Thursday, 12.15 - 1.15 and 2.15 - 4.15, Data Analysis Practical, in SR G12/13
• Friday, 10.15 - 11.15, Lecture in LT1
• There are three practical groups.
• All students should, therefore, attend 2 lectures and 2 one-hour practicals each week.
• AMS: Group 1, 1.15 on Tues and 3.15 on Thurs
• Group 2, 3.15 on Tues and 2.15 on Thurs
The web pages for this module on statistical computing and computer algebra are at http://www.macs.hw.ac.uk/~jphillips/stats3 and http://www.ma.hw.ac.uk/~anatolyk/f71sc3/
Two Data Analysis projects for John Phillips. These will be given out in lectures, and students work on them on their own. Three class tests, given in the Tuesday labs, on Anatoly Konechny's Maple work. No exam!
Statistical Computing Using R
Using a statistical package is essential when you are faced with analysing a large data set. It would take a long time to do the calculations and diagrams by hand. There are many packages that can be used, such as MINITAB and Microsoft Excel, but the one covered in this course is called R.
Example: A survey produced the following 200 results of individuals' salaries:
23454 20622 19314 19882 22467 16611 17790 17613 19892 17397
22340 17731 20058 22083 18055 18212 24114 20396 20394 20521
17643 19692 24214 16876 22545 17608 24631 21333 21797 20734
17836 20930 16709 18319 19097 20512 17693 23130 20316 19209
21220 17315 22102 21472 19974 22764 18183 20918 19358 20685
21261 21394 22333 21732 19734 19280 18696 21055 25762 18258
20255 19762 17016 20326 19479 18699 18686 17483 20843 20395
19734 19911 18990 19220 17313 21357 17514 17455 21932 21523
21606 23169 21461 19624 18931 18785 20225 25406 21376 20141
18541 23768 19024 21353 19802 19216 19442 19450 19385 20995
21162 21399 18805 18217 17847 19992 17105 14488 20522 21032
19191 20268 19996 17428 21877 19433 20625 19453 19081 21502
21890 21844 20116 17601 22296 21751 19513 19300 21031 19784
19767 16619 24021 22686 17818 22233 17774 20918 17180 19279
21029 19983 19703 23421 18140 20845 22054 17858 21523 20041
19968 20537 17755 19872 19005 19835 19717 20134 21757 19093
19692 21445 19219 19669 20769 22049 20561 20810 22525 21458
21618 16973 19093 18551 20841 17032 20549 18219 19224 19999
21367 22332 19235 22697 23620 22420 16811 20250 21124 19267
20400 18743 22448 20443 19634 21185 18448 21236 24047 20621
Graphical Representation
• Histogram
• Stem-and-Leaf
• Boxplot
• Frequency Polygon
Remember, histograms are formed by taking class intervals, for example:

Salary (£)                  Frequency
14 000 - under 16 000            4
16 000 - under 18 000           30
18 000 - under 20 000           69
20 000 - under 22 000           70
22 000 - under 24 000           21
24 000 - under 26 000            6
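A minimal sketch of how class frequencies like these could be tabulated in R, assuming the same 2 000-wide intervals. The vector name `salaries_sample` is hypothetical and holds only the first ten survey values, so its counts are not those in the table above.

```r
# First ten values from the survey, used here purely for illustration
salaries_sample <- c(23454, 20622, 19314, 19882, 22467,
                     16611, 17790, 17613, 19892, 17397)

# Class boundaries: 14 000, 16 000, ..., 26 000
breaks <- seq(14000, 26000, by = 2000)

# right = FALSE makes each class "lower bound - under upper bound"
classes <- cut(salaries_sample, breaks = breaks, right = FALSE, dig.lab = 5)
freq <- table(classes)
print(freq)

# Histogram drawn with the same class intervals
hist(salaries_sample, breaks = breaks, right = FALSE,
     xlab = "Salary (pounds)", main = "Histogram of salaries")
```

Setting `right = FALSE` matters here: by default both `cut()` and `hist()` close each interval on the right, which would put a salary of exactly 16 000 in the lower class.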
> stem(salaries)

  The decimal point is 3 digit(s) to the right of the |

  14 | 5
  15 |
  16 | 66789
  17 | 0001233445556666778888889
  18 | 112222334567777889
  19 | 000111122222223333344445555667777777888889999
  20 | 00000001111233333444445555566667788888999
  21 | 00001122223344444445555556678888999
  22 | 01112333344555778
  23 | 124568
  24 | 00126
  25 | 48
> mean(salaries)
[1] 20123.01
> median(salaries)
[1] 20020
> sd(salaries)
[1] 1878.09
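To see what these three functions compute, here is a small sketch that works each statistic out by hand and compares it with the built-in function. The vector `s` is a hypothetical name holding only the first ten survey values, so the results differ from those shown for the full data set.

```r
# First ten survey values, used only as a small illustration
s <- c(23454, 20622, 19314, 19882, 22467,
       16611, 17790, 17613, 19892, 17397)
n <- length(s)

# Mean: sum of the values divided by the number of values
mean_by_hand <- sum(s) / n

# Median: n is even here, so average the two middle sorted values
sorted <- sort(s)
median_by_hand <- (sorted[n / 2] + sorted[n / 2 + 1]) / 2

# Sample standard deviation: the divisor is n - 1, as in sd()
sd_by_hand <- sqrt(sum((s - mean_by_hand)^2) / (n - 1))

# Each row compares the hand calculation with the built-in function
print(rbind(mean   = c(by_hand = mean_by_hand,   built_in = mean(s)),
            median = c(by_hand = median_by_hand, built_in = median(s)),
            sd     = c(by_hand = sd_by_hand,     built_in = sd(s))))
```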
 x      y
 5    6.2
 7    9.3
 3    6.0
 4    6.1
11   12.8
 7    8.1
 6    8.1
15   16.7
20   23.4
 3    4.7
 8   10.5
 7    7.7
12   14.0
15   16.6
22   24.2
> plot(x,y)
> abline(lm(y~x))
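The two commands above assume `x` and `y` already exist. A sketch of the full sequence, entering the 15 pairs from the table with `c()` and storing the fitted line in a variable (the name `fit` is an assumption) so that its coefficients can be inspected:

```r
# The 15 (x, y) pairs from the table
x <- c(5, 7, 3, 4, 11, 7, 6, 15, 20, 3, 8, 7, 12, 15, 22)
y <- c(6.2, 9.3, 6.0, 6.1, 12.8, 8.1, 8.1, 16.7, 23.4,
       4.7, 10.5, 7.7, 14.0, 16.6, 24.2)

# Least-squares straight line of y on x
fit <- lm(y ~ x)
print(coef(fit))   # intercept and slope of the fitted line

# Scatter plot with the fitted line drawn on top
plot(x, y)
abline(fit)
```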
> television=scan( )
1: 1 1 2 2 1 4 3 3 5 5 1 1 1 2 1 3 3 3 3 3 4 1 2 1 3 4
27:
Read 26 items
> barplot(table(television))
> television.counts=table(television)
> names(television.counts)=c("BBC1","BBC2","ITV1","CH4","Other")
> pie(television.counts,col=c("purple","green2","cyan","yellow","white"))
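`scan( )` waits for values typed at the keyboard, so it only suits interactive use. In a script, the same 26 responses could be entered with `c()`; the sketch below does this and also prints each channel's share of the responses, a step that is an addition here and not part of the session above.

```r
# The 26 survey responses, coded 1 to 5
television <- c(1, 1, 2, 2, 1, 4, 3, 3, 5, 5, 1, 1, 1,
                2, 1, 3, 3, 3, 3, 3, 4, 1, 2, 1, 3, 4)

# Count how often each code occurs, then label the codes
television.counts <- table(television)
names(television.counts) <- c("BBC1", "BBC2", "ITV1", "CH4", "Other")
print(television.counts)

# Proportion of responses for each channel
print(round(prop.table(television.counts), 2))

barplot(television.counts)
```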
Installing R on PC Caledonia
Simply double click on the “Installer” then select the “R” icon. This will produce a short-cut to R which should be available every time you log on.
Installing R on your own PC
Download R free from the Comprehensive R Archive Network (CRAN): http://cran.r-project.org
R screen: type commands here; what you type appears in red.
R screen: the arrow keys on the keyboard are very useful. Pressing the up arrow repeatedly retrieves the commands you entered previously.
Many keys and function names are very much as you would expect.
> 6+4
[1] 10
> 18*3
[1] 54
> log(100)
[1] 4.60517
> pi
[1] 3.141593
> sin(pi)
[1] 1.224606e-16
> cos(pi)
[1] -1
> x=7
> y=10
> x+y
[1] 17
> sqrt(x*x+7*x*y-2*y*y)
[1] 18.41195
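Two of the results in these sessions can surprise at first: `log(100)` gives 4.60517 because `log()` is the natural logarithm, and `sin(pi)` gives a tiny non-zero number because of floating-point rounding. A short sketch of how each might be handled (the variable names are assumptions):

```r
# log() is the natural logarithm; use log10() or the base argument for base 10
natural <- log(100)
base10  <- log10(100)
print(c(natural = natural, base10 = base10, via_base = log(100, base = 10)))

# sin(pi) is not stored as exactly zero, so an exact comparison fails
exactly_zero <- sin(pi) == 0
# all.equal() compares within a small numerical tolerance instead
nearly_zero <- isTRUE(all.equal(sin(pi), 0))
print(c(exactly_zero = exactly_zero, nearly_zero = nearly_zero))
```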
Binomial Distribution
Calculating a whole series of binomial probabilities by hand takes a long time.
If n = 5, a = 0.2 and x runs from 0 to 5:

$$p(0) = \frac{5!}{0!\,5!}\,(0.2)^{0}\,(0.8)^{5} = 0.32768$$

$$p(1) = \frac{5!}{1!\,4!}\,(0.2)^{1}\,(0.8)^{4} = 0.4096$$

$$p(2) = \frac{5!}{2!\,3!}\,(0.2)^{2}\,(0.8)^{3} = 0.2048$$

... and so on.
Using R
> dbinom(0:5,5,0.2)
[1] 0.32768 0.40960 0.20480 0.05120 0.00640 0.00032
> pf=dbinom(0:5,5,0.2)
> pf
[1] 0.32768 0.40960 0.20480 0.05120 0.00640 0.00032
> barplot(pf)
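As a check that `dbinom` matches the hand calculation, the sketch below evaluates the binomial formula directly with `choose()` for n = 5 and a = 0.2 and compares the result with `dbinom`. The names `by_hand` and `cumulative` are assumptions introduced for this illustration.

```r
n <- 5        # number of trials
a <- 0.2      # probability of success on each trial
x <- 0:n      # possible numbers of successes

# Binomial formula: n! / (x! (n - x)!) * a^x * (1 - a)^(n - x)
by_hand <- choose(n, x) * a^x * (1 - a)^(n - x)

# The same probabilities from R's built-in function
pf <- dbinom(x, size = n, prob = a)
print(rbind(x, by_hand, pf))

# The probabilities of all possible outcomes add up to 1
print(sum(pf))

# Cumulative probabilities P(X <= x); pbinom() gives these directly
cumulative <- cumsum(pf)
print(cumulative)

# Bar plot labelled with the values of x
barplot(pf, names.arg = x, xlab = "x", ylab = "p(x)")
```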