
R Installation

R is an open-source software package for statistical data analysis. R download: http://www.r-project.org/ ; Germany (Stefan Drees, Bonn): http://cran.r-mirror.de/ . This is the main program!

lovie

Presentation Transcript


  1. R Installation
     • R is an open-source software package for statistical data analysis

  2. R Installation
     • R download: http://www.r-project.org/
     • Germany (Stefan Drees, Bonn): http://cran.r-mirror.de/
     • This is the main program!
     • RStudio (auxiliary program for editing R files): http://www.rstudio.com/ide/download/
     • Install the programs (R should be there already).

  3. Working with the R command prompt (without RStudio)
     • Practical session

  4. Working with RStudio
     • Start RStudio
     • „File -> New -> R Script“ lets you edit an R command or an R script (= small programme = several consecutive commands)

  5. Working with RStudio
     • Select: Files, Plots, Packages (for advanced analyses), Help
     [Screenshot labels: "New R files", "The command prompt"]

  6. Working with RStudio
     • Commands and programs can be stored in R files
     • Execute one command line: Ctrl+Enter or the „Run“ button
     • Execute several lines: mark the lines and use „Ctrl+Enter“ or the „Run“ button
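
To try out the „Run“ button, a small script is enough. The following is a minimal sketch of such an R file; the variable names and the numbers are illustrative assumptions and do not come from the slides.

```r
# A small R script: several consecutive commands stored in one file.
# Variable names and values are illustrative assumptions.
heights <- c(172, 165, 181, 158)   # create a numeric vector
n <- length(heights)               # number of values
total <- sum(heights)              # sum of all values
print(total / n)                   # average of the four values
```

Placing the cursor on a single line and pressing Ctrl+Enter runs only that line; marking all four lines first runs them one after another.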

  7. What is Statistics?

  8. What is statistics?
     • Statistics is a means to connect empirical knowledge and theory and is constituted as follows:
     • Data representation (empirics)
     • Methods for the description, analysis, and interpretation of data, in order to allow predictions, conclusions and decisions (statistical theory)

  9. What is statistics?
     • Descriptive statistics
     • Probability theory
     • Test theory

  10. Descriptive Statistics

  11. Descriptive Statistics
      • Basic concepts:
        • Population: collection of objects for which a conclusion shall be made (can be human beings but also a collection of atoms when applied in physics)
        • Sample: a representative part/sub-set of the population
        • Random sample: elements of the population drawn randomly and independently of each other
      • Example: „Mietspiegel“ (= statistics of rents) for the city of Bonn
        • Population: all rooms, flats etc. for rent in Bonn (too many to investigate all)
        • Sample: selected part; all flats from Poppelsdorf
        • Random sample: investigation of n = 100, 200, … random objects from Bonn
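
Drawing a random sample can be tried directly in R with `sample()`. The sketch below assumes a hypothetical population of 5000 flats identified by running numbers; the population size and the seed are made up for illustration. It draws without replacement, so no flat is investigated twice.

```r
# Hypothetical population: 5000 flats for rent, identified by number.
# Population size and seed are assumptions for illustration only.
set.seed(1)                                       # makes the draw reproducible
population <- 1:5000
random_sample <- sample(population, size = 100)   # draws without replacement
head(random_sample)                               # first few selected flats
```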

  12. Descriptive statistics
      • Observational objects: patients, blood samples, DNA samples, houses, atoms
      • Attributes / traits: blood pressure, weight, age, blood group, number of siblings, marital status, rent
      • Values:
        • qualitative: blood group, marital status
        • quantitative, discrete: number of siblings
        • quantitative, continuous: blood pressure, weight, age, rent

  13. Descriptive statistics
      • Scaling:
        • Nominal scale: attribute values that are not directly comparable (sex, subject of studies, country of origin) (qualitative)
        • Ordinal scale: attribute values that have a „natural“ order (grades, font sizes: tiny-small-medium-large-huge)
        • Interval scale: difference between attribute values is interpretable (temperature in °C) (quantitative)
      • To be distinguished:
        • Discrete attributes: attribute values can be counted
        • Continuous attributes: all real numbers, or at least all numbers from an interval, are possible
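
In R, the scale of an attribute is usually reflected in the type of the vector that stores it. The following sketch shows one common mapping (factor, ordered factor, numeric vector); the example values are assumptions chosen for illustration.

```r
# Nominal scale: categories without an order -> factor
subject <- factor(c("biology", "physics", "biology"))

# Ordinal scale: categories with a natural order -> ordered factor
size <- factor(c("small", "huge", "medium"),
               levels = c("tiny", "small", "medium", "large", "huge"),
               ordered = TRUE)
size[1] < size[2]          # comparisons are meaningful for ordered factors

# Interval scale: differences are interpretable -> numeric vector
temperature <- c(18.5, 21.0, 19.5)   # degrees Celsius, illustrative values
diff(temperature)          # differences between consecutive values
```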

  14. Descriptive statistics
      • Frequencies:
        • Absolute frequency ni: number of observations with attribute value i (counts)
        • Relative frequency hi: proportion of elements with attribute value i
        • To be computed as the absolute frequency divided by the total number of objects N: hi = ni / N
        • Relative frequencies lie between 0 and 1
        • Relative frequencies have to add up to 1 (<- can be used to check the computation)

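  15. Descriptive statistics: AB0 blood group

      | i   | value | absolute frequency ni, Bonn | absolute frequency ni, Köln | relative frequency hi, Bonn | relative frequency hi, Köln |
      |-----|-------|-----------------------------|-----------------------------|-----------------------------|-----------------------------|
      | 1   | 0     | 17                          | 78                          | 0.34                        | 0.39                        |
      | 2   | A1    | 19                          | 76                          | 0.38                        | 0.38                        |
      | 3   | A2    | 6                           | 20                          | 0.12                        | 0.10                        |
      | 4   | B     | 5                           | 18                          | 0.10                        | 0.09                        |
      | 5   | A1B   | 2                           | 6                           | 0.04                        | 0.03                        |
      | 6   | A2B   | 1                           | 2                           | 0.02                        | 0.01                        |
      | 7   | other | 0                           | 0                           | 0.00                        | 0.00                        |
      | Sum |       | N = 50                      | N = 200                     | 1.00                        | 1.00                        |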
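
The relative frequencies of slide 15 can be recomputed from the absolute ones. The sketch below uses the Bonn counts from that slide; only the variable names are assumptions.

```r
# Absolute frequencies n_i for Bonn, taken from slide 15
n_bonn <- c("0" = 17, A1 = 19, A2 = 6, B = 5, A1B = 2, A2B = 1, other = 0)
N <- sum(n_bonn)          # total number of objects
h_bonn <- n_bonn / N      # relative frequencies h_i = n_i / N
h_bonn
sum(h_bonn)               # check: relative frequencies add up to 1
```

With raw observations instead of counts, `table()` gives the absolute and `prop.table(table(...))` the relative frequencies.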

  16. Descriptive statistics
      • Frequencies:
        • Cumulative frequency: sum of all frequencies up to a given value i
        • Denoted as Ni for absolute frequencies and as Hi for relative frequencies
        • Often used when values are subdivided into classes
      • Classification:
        • Arrangement of attribute values into disjoint groups, so-called „classes“
        • Classes are disjoint, i.e. non-overlapping, and neighbouring intervals of attribute values, which are defined by a lower and an upper bound. Neighbouring implies that each value belongs to a class and does not lie outside (completeness of the classification).

  17. Descriptive statistics: classification of height [cm]
      • Class limits: 150, 160, 170, 180, 190, 200
      • (160; 170] contains all values that are > 160 but ≤ 170.
      • The classification is complete and disjoint (each value belongs to only one class).
      [Figure: height axis from 150 to 200 cm, divided into right-closed classes]
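
Right-closed classes such as (160; 170] are what `cut()` produces by default. A minimal sketch, with four illustrative height values that are not taken from the slides:

```r
# Class limits 150, 160, ..., 200
breaks <- seq(150, 200, by = 10)
heights <- c(160, 160.5, 170, 199)         # illustrative values (assumption)
classes <- cut(heights, breaks = breaks)   # right = TRUE is the default
as.character(classes)
# 160 falls into (150,160], while 160.5 and 170 both fall into (160,170]
```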

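  18. Descriptive statistics: height [cm]

      | Class number i | Class limits (ai-1; ai] | absolute frequency ni | relative frequency hi | cumulative absolute Ni | cumulative relative Hi |
      |----------------|-------------------------|-----------------------|-----------------------|------------------------|------------------------|
      | 1              | ≤ 150                   | 0                     | 0.00                  | 0                      | 0.00                   |
      | 2              | (150; 160]              | 5                     | 0.05                  | 5                      | 0.05                   |
      | 3              | (160; 170]              | 30                    | 0.30                  | 35                     | 0.35                   |
      | 4              | (170; 180]              | 35                    | 0.35                  | 70                     | 0.70                   |
      | 5              | (180; 190]              | 25                    | 0.25                  | 95                     | 0.95                   |
      | 6              | (190; 200]              | 5                     | 0.05                  | 100                    | 1.00                   |
      | 7              | > 200                   | 0                     | 0.00                  | 100                    | 1.00                   |
      | Sum            |                         | N = 100               | 1.00                  |                        |                        |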
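
The cumulative columns of slide 18 are running sums of the frequency columns, which `cumsum()` computes. The counts below are those of the slide; the variable names are assumptions.

```r
# Absolute class frequencies n_i from slide 18 (classes 1 to 7)
n_i <- c(0, 5, 30, 35, 25, 5, 0)
h_i <- n_i / sum(n_i)     # relative frequencies
N_i <- cumsum(n_i)        # cumulative absolute frequencies
H_i <- cumsum(h_i)        # cumulative relative frequencies
data.frame(class = 1:7, n_i, h_i, N_i, H_i)
```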

  19. Descriptive statistics: Graphical representation

  20. Descriptive statistics
      • Pie chart (R function: pie())
      • Shows absolute frequencies
      • Example: blood groups

  21. Descriptive statistics
      • Bar chart (R function: barplot())
      • Shows relative frequencies
      • Example: blood groups
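
Both charts can be drawn from the blood group counts of slide 15. This is a minimal sketch using the Bonn column; the titles and axis label are assumptions, and the empty category "other" is left out.

```r
# Absolute frequencies for Bonn from slide 15 (without the empty "other")
n_bonn <- c("0" = 17, A1 = 19, A2 = 6, B = 5, A1B = 2, A2B = 1)

# Pie chart of the absolute frequencies
pie(n_bonn, main = "AB0 blood group, Bonn")

# Bar chart of the relative frequencies
h_bonn <- n_bonn / sum(n_bonn)
mids <- barplot(h_bonn, ylab = "relative frequency",
                main = "AB0 blood group, Bonn")
```

`barplot()` returns the x positions of the bar midpoints, which is useful when labels are to be added afterwards.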

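  22. Descriptive statistics
      • Representation of cumulative frequencies with the empirical distribution function F
      • Discrete trait: number of children

      | i   | Number of children | absolute frequency ni | relative frequency hi | cumulative absolute Ni | cumulative relative Hi |
      |-----|--------------------|-----------------------|-----------------------|------------------------|------------------------|
      | 1   | 0                  | 5                     | 0.10                  | 5                      | 0.10                   |
      | 2   | 1                  | 20                    | 0.40                  | 25                     | 0.50                   |
      | 3   | 2                  | 15                    | 0.30                  | 40                     | 0.80                   |
      | 4   | 3                  | 5                     | 0.10                  | 45                     | 0.90                   |
      | 5   | 4                  | 3                     | 0.06                  | 48                     | 0.96                   |
      | 6   | >4                 | 2                     | 0.04                  | 50                     | 1.00                   |
      | Sum |                    | N = 50                | 1.00                  |                        |                        |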

  23. Descriptive statistics: number of children
      • F: empirical distribution function
      • Since the attribute is quantitative discrete, we obtain a step function.
      [Figure: bar chart of the relative frequencies hi and step function F of the cumulative relative frequencies Hi, for 0, 1, 2, 3, 4, >4 children]
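
R builds the empirical distribution function with `ecdf()`. The sketch below rebuilds a raw sample from the frequencies on slide 22; coding the class ">4" as the value 5 is an assumption made only so that the sample can be written down.

```r
# Raw sample rebuilt from the frequencies on slide 22.
# Assumption: the class ">4" is coded as 5 for illustration.
children <- rep(0:5, times = c(5, 20, 15, 5, 3, 2))

Fn <- ecdf(children)   # empirical distribution function (a step function)
Fn(1)                  # proportion of observations with at most 1 child
Fn(0:5)                # cumulative relative frequencies H_i
plot(Fn, main = "Empirical distribution function",
     xlab = "Number of children")
```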

  24. Descriptive statistics
      • Histograms (R function: hist())
      • Construction:
        • Data is subdivided into classes
        • The surface area of the columns is proportional to the respective frequencies
        • Columns are neighbouring since classes are neighbouring

  25. Descriptive statistics
      • Example: Height [cm]
      [Figure: histogram of the relative frequencies hi over height [cm], classes from 150 to 200]
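
A histogram with these classes can be sketched with `hist()`. Since the slides give only class counts, the data below are an assumption: every observation is placed at the midpoint of its class, using the counts from slide 18.

```r
# Illustrative data (assumption): each observation sits at its class
# midpoint; the counts per class are those of slide 18.
heights <- rep(c(155, 165, 175, 185, 195), times = c(5, 30, 35, 25, 5))

h <- hist(heights, breaks = seq(150, 200, by = 10), freq = FALSE,
          xlab = "height [cm]", main = "Histogram of heights")
h$counts                           # absolute frequency per class
sum(h$density * diff(h$breaks))    # total area of the columns
```

With `freq = FALSE` the column heights are densities, so the area of each column equals the relative frequency of its class.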

  26. Descriptive statistics
      [Figure: empirical density function f and empirical distribution function F (for a continuous trait), each plotted over height [cm] from 150 to 200]

  27. Descriptive statistics
      [Figure: empirical density function f and empirical distribution function F over height [cm] from 150 to 200, with the relative frequencies hi marked in both plots]

  28. Descriptive Statistics
      • Note: Slides 23 and 26 both show empirical distribution functions. In the first case, we obtain a step function since the trait under investigation is discrete.

  29. Descriptive statistics: Measures of central tendency, dispersion and spread

  30. Descriptive statistics
      • Measures of central tendency:
        • A number to characterize the „center“ of the data
        • Most important: mean, median

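  31. Descriptive statistics
      • Median (R function: median())
      • Sample: x1 = 5, x2 = 9, x3 = 3, x4 = 8, x5 = 19, x6 = 4, x7 = 6 (n = 7); with the additional value x8 = 7 for n = 8
      • Order according to size: x(1) ≤ x(2) ≤ … ≤ x(n)
      • Ordered sample (ranks), n = 7: x(1) = 3, x(2) = 4, x(3) = 5, x(4) = 6, x(5) = 8, x(6) = 9, x(7) = 19
      • Ordered sample (ranks), n = 8: x(1) = 3, x(2) = 4, x(3) = 5, x(4) = 6, x(5) = 7, x(6) = 8, x(7) = 9, x(8) = 19
      • Median, n = 7 odd: the middle value x((n+1)/2) = x(4) = 6
      • Median, n = 8 even: the average of the two middle values, (x(n/2) + x(n/2+1)) / 2 = (6 + 7) / 2 = 6.5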
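
The two cases can be checked in R with the sample values of slide 31. Only the variable names are assumptions.

```r
# Sample with n = 7 (odd) from slide 31
x7 <- c(5, 9, 3, 8, 19, 4, 6)
sort(x7)       # ordered sample
median(x7)     # middle value of the ordered sample

# Sample with n = 8 (even): the additional value x8 = 7
x8 <- c(x7, 7)
sort(x8)
median(x8)     # average of the two middle values
```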

  32. Descriptive statistics
      • Mean (R function: mean())
      • Sample: x1, x2, …, xn
      • Sample size: n
      • Mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
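
As a small check, the mean can be computed both by hand (sum divided by sample size) and with `mean()`. The sample is the one used on slide 31; reusing it here is a choice made for illustration.

```r
# Sample from slide 31, reused here for illustration
x <- c(5, 9, 3, 8, 19, 4, 6)
n <- length(x)           # sample size
by_hand <- sum(x) / n    # sum of all values divided by n
by_hand
mean(x)                  # built-in function, same result
```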

  33. Descriptive statistics
      • Comparison of median and mean:
        • Both samples have median 2500
        • [mean of sample 1] and [mean of sample 2] are the mean values
        • The mean can be strongly influenced by a single value
        • The median is more robust against extreme values („outliers“)
      • Nevertheless, the mean is used more often in practice since it has other desirable properties (see later).
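
The effect of a single extreme value is easy to reproduce. The two samples below are hypothetical and are not the ones shown on the slide; they were chosen so that both have median 2500.

```r
# Two hypothetical samples (not the slide's data), both with median 2500
a <- c(2000, 2400, 2500, 2600, 3000)
b <- c(2000, 2400, 2500, 2600, 30000)   # last value is an outlier

median(a); median(b)   # identical medians
mean(a); mean(b)       # the single outlier pulls the mean of b upwards
```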

  34. Descriptive statistics
      • How to treat outliers?
        1) Discard: No!
        2) Check value and correct: Yes!
      [Figure: data points on the x axis with one outlier]

  35. Descriptive statistics
      • Measure the amount of variation of the data!
      • ⇒ The mean (or median) is not sufficient to describe a sample
      [Figure: sample A and sample B, each shown along the x axis]

  36. Descriptive statistics
      • Measures of dispersion and spread:
        • Numbers to characterize the amount of variation around the center (= mean)
        • Most important:
          • Minimum, maximum, range (dispersion)
          • Empirical variance (spread)
          • Empirical standard deviation (spread)

  37. Descriptive statistics
      • Sample (n = 7): x1 = 5, x2 = 9, x3 = 3, x4 = 8, x5 = 19, x6 = 4, x7 = 6
      • Ranks: x(1) = 3, x(2) = 4, x(3) = 5, x(4) = 6, x(5) = 8, x(6) = 9, x(7) = 19
      • Range:
        • minimum: min = x(1)
        • maximum: max = x(n)
        • range: R = x(n) – x(1)
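
For the sample of slide 37, these quantities can be obtained as follows. Note that `range()` in R returns the pair (minimum, maximum), so the range R in the sense of the slide is the difference of the two.

```r
# Sample from slide 37
x <- c(5, 9, 3, 8, 19, 4, 6)
min(x)                # minimum, x(1)
max(x)                # maximum, x(n)
range(x)              # minimum and maximum as a pair
R <- diff(range(x))   # range R = x(n) - x(1)
R
```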

  38. Descriptive statistics
      • Variance (R function: var()):
        • A measure to express the spread around the center (mean) by a single value
        • The squared deviation of each attribute value from the mean is considered.
        • Formula for the empirical variance from a sample of n elements: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
        • The empirical standard deviation is just the square root of the variance, $s = \sqrt{s^2}$ (R function: sd()).
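
A short sketch comparing a hand computation with the built-in functions. It assumes the usual definition with divisor n - 1, which is what `var()` and `sd()` use; the sample is the one from slide 37.

```r
# Sample from slide 37
x <- c(5, 9, 3, 8, 19, 4, 6)
n <- length(x)

# Empirical variance: sum of squared deviations from the mean,
# divided by n - 1 (assumed definition, matching var())
s2 <- sum((x - mean(x))^2) / (n - 1)
s  <- sqrt(s2)           # empirical standard deviation

all.equal(s2, var(x))    # hand computation agrees with var()
all.equal(s, sd(x))      # and with sd()
```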

  39. Descriptive statistics
      • Why divide by (n – 1) instead of n?
      • Example: x1 = 75, x2 = 2, x3 = 270, x4 = ?, with n = 4 and mean = 100
      • x4 = 53 is not free, but given by the other values when the mean is known.
      • s² has (n – 1) degrees of freedom (f)
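
The example of slide 39 can be retraced in a few lines: once the mean and three of the four values are known, the fourth value follows. The variable names are assumptions.

```r
# Values from slide 39
known <- c(75, 2, 270)   # x1, x2, x3
n     <- 4
x_bar <- 100             # known mean

# The sum of all values must equal n * mean, which fixes x4
x4 <- n * x_bar - sum(known)
x4

x <- c(known, x4)
sum(x - mean(x))         # deviations from the mean always sum to zero
```

Because the deviations sum to zero, only n - 1 of them can vary freely, which is the reason for the divisor n - 1.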

  40. Data
      • If you have data you want to analyse, please bring it along!
