
Univariate Description


alika

Presentation Transcript


  1. Univariate Description

  2. Car mileage data (gasoline consumption). Variable MPG, 49 observations:
          30,8 31,7 30,1 31,6 32,1 33,3 31,3 31,0 32,0 32,4
          30,9 30,4 32,5 30,3 31,3 32,1 32,5 31,8 30,4 30,5
          32,0 31,4 30,8 32,8 32,0 31,5 32,4 31,0 29,8 31,1
          32,3 32,7 31,2 30,6 31,7 31,4 32,2 31,5 31,7 30,6
          32,6 31,4 31,8 31,9 32,8 31,5 31,6 30,6 32,2

  3. Univariate Description (R session)
          > data=read.table("D:/Albert/COURSES/CursLlibreBowerman/Datasets - Text/GasMiles.txt", header=TRUE)
          > names(data)
          [1] "MPG"
          > dim(data)
          [1] 49  1
          > attach(data)
          > stem(MPG)

            The decimal point is at the |

            29 | 8
            30 | 1344
            30 | 5666889
            31 | 001233444
            31 | 55566777889
            32 | 0001122344
            32 | 556788
            33 | 3

          > summary(data)
                MPG
           Min.   :29.80
           1st Qu.:31.00
           Median :31.60
           Mean   :31.55
           3rd Qu.:32.10
           Max.   :33.30
          > data
              MPG
          1  30.8
          2  31.7
          3  30.1
          4  31.6
          5  32.1
          6  33.3
          ....
          47 31.6
          48 30.6
          49 32.2

  4. With SPSS. Copy and paste the data into the SPSS data sheet (with decimal commas instead of points if you are using the Spanish version of SPSS).
          DESCRIPTIVES VARIABLES=gas
            /STATISTICS=MEAN STDDEV MIN MAX .

  5. Histogram
          GRAPH /HISTOGRAM(NORMAL)=gas .

  6. Histogram
          GRAPH /HISTOGRAM(NORMAL)=gas .

  7. Box Plot (figure labels: Q1, Q3, median, min, max). Inner fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR. Outer fences: Q1 - 3*IQR and Q3 + 3*IQR. Points beyond the outer fences are extreme outliers. Points beyond the inner fences but within the outer fences are mild outliers.
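The fence rules on this slide can be tried on the car mileage data. The following is a minimal sketch, not the presenter's code: `box_fences` is a hypothetical helper, the MPG values are retyped from the data slide with decimal points, and the quartiles come from R's default `quantile()` rule, which may differ slightly from the hinges SPSS uses.

```r
# MPG values from the car mileage slide, typed with decimal points for R.
mpg <- c(30.8, 31.7, 30.1, 31.6, 32.1, 33.3, 31.3, 31.0, 32.0, 32.4,
         30.9, 30.4, 32.5, 30.3, 31.3, 32.1, 32.5, 31.8, 30.4, 30.5,
         32.0, 31.4, 30.8, 32.8, 32.0, 31.5, 32.4, 31.0, 29.8, 31.1,
         32.3, 32.7, 31.2, 30.6, 31.7, 31.4, 32.2, 31.5, 31.7, 30.6,
         32.6, 31.4, 31.8, 31.9, 32.8, 31.5, 31.6, 30.6, 32.2)

# Hypothetical helper (not from the slides): inner and outer fences
# computed from the quartiles and the interquartile range.
box_fences <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)  # Q1 and Q3
  iqr <- q[2] - q[1]
  list(inner = c(q[1] - 1.5 * iqr, q[2] + 1.5 * iqr),
       outer = c(q[1] - 3 * iqr, q[2] + 3 * iqr))
}

f <- box_fences(mpg)
beyond_inner <- mpg < f$inner[1] | mpg > f$inner[2]
beyond_outer <- mpg < f$outer[1] | mpg > f$outer[2]

mild <- mpg[beyond_inner & !beyond_outer]  # between inner and outer fences
extreme <- mpg[beyond_outer]               # beyond the outer fences
print(f)
```

With Q1 = 31.0 and Q3 = 32.1 (as in the `summary(data)` output), the IQR is 1.1, so the inner fences are 29.35 and 33.75 and the outer fences are 27.7 and 35.4. All observations lie between 29.8 and 33.3, so this sample has no mild or extreme outliers.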

  8. Box Plot (figure labels: inner fences at 1.5*IQR beyond the box, min, max, IQR). Inner fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR. 3*IQR defines the outer fences. Points beyond the outer fences are extreme outliers. Points beyond the inner fences but within the outer fences are mild outliers.

  9. Box-Plot
          EXAMINE VARIABLES=gas
            /COMPARE VARIABLE
            /PLOT=BOXPLOT
            /STATISTICS=NONE
            /NOTOTAL
            /MISSING=LISTWISE .

  10. Cross-section data: bank data
          > data=read.table("D:/Albert/COURSES/cursDAS/AS2003/DATA/BANK.TXT", header=TRUE)
          > dim(data)
          [1] 100   9
          > names(data)
          [1] "LSALNOW" "LSALBEG" "SEX"     "JOBCAT"  "RACE"    "EDLEVEL" "TIME"
          [8] "AGE"     "WORK"
          > data[sample(1:dim(data)[1],10),]
              LSALNOW LSALBEG SEX JOBCAT RACE EDLEVEL TIME   AGE  WORK
          25   9.4125  8.7483   0      3    0      12   80 61.67 38.33
          47   8.9227  8.3428   1      1    0      15   90 58.00  4.50
          8   10.0078  9.5104   0      4    0      19   81 30.75  5.17
          33   9.5324  8.4888   1      2    0      12   77 24.33  0.33
          97   8.8217  8.3138   1      1    1      12   72 51.50 22.58
          100  8.9065  8.3138   1      1    1      12   85 51.00 19.00
          32   9.5104  8.6995   0      3    0      12   83 50.25 23.67
          94   8.8479  8.3138   1      1    1      12   72 46.50  9.67
          39   9.0711  8.5132   1      1    0       8   74 59.83 26.50
          36   9.1695  8.5942   1      1    0      12   98 47.33 20.33
          > data[runif(dim(data)[1])<.1,]

  11. Salnow by sex (boxplot)
          boxplot(SALNOW ~SEX, col=c("blue", "green"))

  12. Red is the kernel density. Green is the normal distribution.
          > summary(INCOME)
             Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
             2.00   14.00   20.00   22.44   30.00  100.00

  13. Log of income
          linc=log(INCOME)
          hist(linc,12, prob= TRUE, col='blue')
          lines(density(linc,bw=0.4), col='red')
          mu=mean(linc)
          sd=sqrt(var(linc))
          lines(sort(linc),dnorm(sort(linc),mu,sd), col='green')
      Red is the kernel density. Green is the normal distribution.

  14. Shape of the distribution and Mean, Median and Mode

  15. Stata summary of hsnotpau:
          . summarize hsnotpau

              Variable |       Obs        Mean    Std. Dev.       Min        Max
          -------------+--------------------------------------------------------
              hsnotpau |      3609    5.616312    1.123641       1.44        9.6

  16. Proportion of students between marks. The distribution of marks on a multiple-choice exam is approximately normal with mean 6 and standard deviation 1.7 (5.6 and standard deviation 1.1). Find:
          a) The quartiles of the distribution.
          b) The approximate percentage of students with a mark between 5 and 7.
          c) The % of students who fail (a mark below 5 is a fail).
          d) The % with a mark greater than 7.
          e) The probability that, when 5 individuals are chosen from the population of students who took the test, at least 2 have a mark above 7.
          f) What is the distribution of the mean mark of 10 students chosen at random from the population that took the test?

  17. Solution:
      Quartiles (a):
          > qnorm(.25, 6, 1.7)
          [1] 4.853367
          > qnorm(.5, 6, 1.7)
          [1] 6
          > qnorm(.75, 6, 1.7)
          [1] 7.146633
      Percentiles (b, c, d):
          > pnorm(5,6,1.7)
          [1] 0.2781872
          > pnorm(7,6,1.7)
          [1] 0.7218128
          > pnorm(7,6,1.7) - pnorm(5,6,1.7)
          [1] 0.4436256
          > 1- pnorm(7,6,1.7)
          [1] 0.2781872
      So about 44.4% of students score between 5 and 7 (b), about 27.8% fail (c), and about 27.8% score above 7 (d).
      At least 2 of 5 above 7 (e):
          > pbinom(1, 5, 1- pnorm(7,6,1.7))
          [1] 0.5735169
      This value is the probability that at most 1 of the 5 students scores above 7, so the probability that at least 2 do is 1 - 0.5735169 = 0.4264831.
      Sampling distribution (f): the sample mean of a sample of 10 students is normally distributed with mean 6 and standard deviation
          > 1.7/sqrt(10)
          [1] 0.5375872
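The claim about the sampling distribution of the mean can be checked by simulation. This is a rough sketch under the exercise's assumptions (normal marks with mean 6 and standard deviation 1.7); the seed, the number of replications and the variable names are arbitrary choices, not part of the original slides.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible

mu <- 6      # population mean from the exercise
sigma <- 1.7 # population standard deviation from the exercise
n <- 10      # sample size in part f

# Draw many samples of 10 marks and keep the mean of each sample.
sample_means <- replicate(20000, mean(rnorm(n, mu, sigma)))

# The simulated mean and sd should be close to mu and sigma / sqrt(n).
c(sim_mean = mean(sample_means),
  sim_sd = sd(sample_means),
  theory_sd = sigma / sqrt(n))
```

A histogram of `sample_means` with `hist(sample_means, prob = TRUE)` should look roughly bell-shaped and noticeably narrower than the distribution of individual marks.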

  18. Stata detailed summary of hsnotpau:
          . summarize hsnotpau, detail

                                    hsnotpau
          -------------------------------------------------------------
                Percentiles      Smallest
           1%            3           1.44
           5%         3.81           2.11
          10%         4.21           2.27       Obs                3609
          25%         4.85            2.3       Sum of Wgt.        3609

          50%         5.58                      Mean           5.616312
                                  Largest       Std. Dev.      1.123641
          75%         6.38           8.94
          90%         7.09           9.07       Variance       1.262569
          95%         7.61           9.37       Skewness       .0930459
          99%         8.25            9.6       Kurtosis       2.983357

  19. Density function of the normal distribution

  20. Density function of the normal distribution
      Applet for the normal distribution, at Statistical Applets: http://bcs.whfreeman.com/ips4e/pages/bcs-main.asp?v=category&s=00010&n=99000&i=99010.01&o
      Tables of the normal distribution, at Statistical Tables: http://bcs.whfreeman.com/ips4e/pages/bcs-main.asp?v=category&s=00100&n=99000&i=99100.01&o
      Tables of the normal distribution in R: pnorm(), qnorm(). For example:
          > pnorm(1.87)
          [1] 0.969258
          > pnorm(-1.2)
          [1] 0.1150697
          > qnorm(.975)
          [1] 1.959964
          > qnorm(.25)
          [1] -0.6744898
      % of students with a mark between 5 and 7? (mean = 5.616312, standard deviation = 1.123641)
          > Z2=(7- 5.616312)/1.123641
          > Z1=(5- 5.616312)/1.123641
          > pnorm(Z2) - pnorm(Z1)
          [1] 0.5992435
      That is approximately 60%. More directly:
          > pnorm(7, 5.616312,1.123641)- pnorm(5, 5.616312,1.123641)
          [1] 0.5992435

  21. The normal and t distributions (10%, 5%, 1% tails)

  22. Family consumption data (family.dta): summary statistics
          . summarize exp1_1, detail

          -------------------------------------------------------------
                Percentiles      Smallest
           1%     .1520551       7.18e-06
           5%     .3881256       7.65e-06
          10%     .5420735       .0000112       Obs                2640
          25%     .8613541       .0000267       Sum of Wgt.        2640

          50%     1.294648                      Mean           1.473449
                                  Largest       Std. Dev.      .9169822
          75%     1.901873       8.024636
          90%     2.559126       8.826962       Variance       .8408563
          95%      3.10731       9.368608       Skewness       2.150655
          99%     4.331305       10.20112       Kurtosis       13.92168

  23. Quantiles

  24. Dot-plot
          . dotplot food

  25. Histogram
          graph exp1_1, bin(20) normal

  26. Boxplot of expenditure on food

  27. Comparison of Distributions
          graph exp1_1, box by( group)

  28. Distribution of the transformed variable

  29. Stata detailed summary of the transformed variable newfood:
          . summarize newfood, detail

                               BC(exp1_1,.367)
          -------------------------------------------------------------
                Percentiles      Smallest
           1%     -1.35978      -2.689332
           5%    -.7995382        -2.6885
          10%    -.5484188       -2.68311       Obs                2640
          25%    -.1452355      -2.667499       Sum of Wgt.        2640

          50%     .2708729                      Mean           .2863131
                                  Largest       Std. Dev.      .6866023
          75%     .7250079       3.126675
          90%     1.122054       3.334947       Variance       .4714228
          95%     1.406071       3.468853       Skewness       .0912667
          99%     1.941548        3.66543       Kurtosis       4.251757
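The label BC(exp1_1,.367) suggests that newfood is a Box-Cox transform of exp1_1 with parameter 0.367. Taking that as an assumption, the transform can be sketched in R as below. `box_cox` is a hypothetical helper, and the input values are the minimum, median and maximum of exp1_1 copied from the earlier summary slide, not the raw data.

```r
# Hypothetical helper: one-parameter Box-Cox transform for positive x.
# For lambda = 0 the transform is defined as log(x).
box_cox <- function(x, lambda) {
  stopifnot(all(x > 0))
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

# Minimum, median and maximum of exp1_1 as printed by Stata.
exp_q <- c(min = 7.18e-06, median = 1.294648, max = 10.20112)

# Assumed lambda, read off the variable label BC(exp1_1,.367).
bc_q <- box_cox(exp_q, 0.367)
round(bc_q, 3)
```

Because the transform is increasing, the smallest and largest values of exp1_1 map to the smallest and largest values of newfood. The results should come out near -2.69, 0.27 and 3.67, close to the values Stata reports for newfood, which supports the Box-Cox reading of the label. The transform pulls in the long right tail, which is consistent with the skewness dropping from 2.150655 to .0912667.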

  30. PISA 2003: performance in mathematics, number of books at home

  31. Pisa 2003

  32. Pisa 2003

  33. Review of some concepts

  34. Some exercises for practice on the Normal Distribution
          1. The heights of adult men are normally distributed with a mean of 69.5 inches and a variance of 7.025 square inches. Find the probabilities that a man chosen at random will be (a) at least 72 inches tall, (b) at most 72 inches tall.
          2. Scores on standard IQ tests are usually designed to be normally distributed with a mean of 100 and a standard deviation of 15. On such a test, find the probability that a person chosen at random will score (a) below 90, (b) above 90.
          3. On American roulette wheels, the probability of the ball landing on red is 18/38. Suppose 200 bets are placed on red. Use the Normal Approximation of the Binomial to approximate the probability of there being from 100 to 120 winners.
          4. It is estimated that Americans average 200 deaths yearly (per 100,000 people) from heart attacks. Use the Normal Approximation of the Poisson to approximate the probability that 180 to 210 such deaths will occur in a random group of 100,000 Americans during a given year.
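One possible way to set these exercises up in R is sketched below. It is a suggestion rather than an official solution: the variable names are made up, and the continuity correction (the 0.5 adjustments) in exercises 3 and 4 is an added choice that the exercise text does not ask for. The exact binomial and Poisson probabilities are included only for comparison.

```r
# Exercise 1: the exercise gives a variance, but pnorm() expects a
# standard deviation, so take the square root first.
sd_height <- sqrt(7.025)
p1a <- 1 - pnorm(72, 69.5, sd_height)  # at least 72 inches
p1b <- pnorm(72, 69.5, sd_height)      # at most 72 inches

# Exercise 2: IQ scores with mean 100 and standard deviation 15.
p2a <- pnorm(90, 100, 15)              # below 90
p2b <- 1 - p2a                         # above 90

# Exercise 3: Binomial(200, 18/38) approximated by a normal with the
# same mean and standard deviation (continuity correction assumed).
n <- 200
p <- 18 / 38
m3 <- n * p
s3 <- sqrt(n * p * (1 - p))
p3_approx <- pnorm(120.5, m3, s3) - pnorm(99.5, m3, s3)
p3_exact <- pbinom(120, n, p) - pbinom(99, n, p)

# Exercise 4: Poisson(200) approximated by Normal(200, sqrt(200)).
p4_approx <- pnorm(210.5, 200, sqrt(200)) - pnorm(179.5, 200, sqrt(200))
p4_exact <- ppois(210, 200) - ppois(179, 200)

round(c(p1a = p1a, p1b = p1b, p2a = p2a, p2b = p2b,
        p3_approx = p3_approx, p3_exact = p3_exact,
        p4_approx = p4_approx, p4_exact = p4_exact), 4)
```

Comparing each `_approx` value with its `_exact` counterpart gives a feel for how good the normal approximation is at these sample sizes.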

  35. Mean Value (1) (Mean of a random variable). When a random phenomenon is repeated many times, the proportion of trials on which an outcome occurs eventually approaches the probability of the outcome. If the outcomes are numerical, the average of the observed outcomes eventually approaches the expected value. Sometimes we express the random outcome as X, a random variable; then the expected value is also called the mean of X. http://www.whfreeman.com/scc/con_index.htm?99spt
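This long-run behaviour can be illustrated with a small simulation. The sketch below uses rolls of a fair die, an example chosen here for illustration and not taken from the slides; the seed and the number of rolls are arbitrary.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible

# Roll a fair six-sided die many times.
rolls <- sample(1:6, 100000, replace = TRUE)

# Running proportion of sixes: approaches the probability 1/6.
prop_six <- cumsum(rolls == 6) / seq_along(rolls)

# Running average of the outcomes: approaches the expected value 3.5.
running_mean <- cumsum(rolls) / seq_along(rolls)

# Inspect both after 10, 100, 1000, 10000 and 100000 rolls.
checkpoints <- c(10, 100, 1000, 10000, 100000)
data.frame(rolls = checkpoints,
           prop_six = prop_six[checkpoints],
           running_mean = running_mean[checkpoints])
```

Early rows of the table can be far from 1/6 and 3.5, while the later rows settle close to them, which is the behaviour the slide describes.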
