
Jackknife


Presentation Transcript


  1. Jackknife

  2. Motivating example • We want to estimate a quantity of the form f(EX), for example log(EX) = ? (EX)^4 = ? sin(EX) = ? Idea: what if we plug in the sample mean? log(EX) ≈ log(x̄), (EX)^4 ≈ x̄^4, ...

  3. Unfortunately this gives a biased estimate... In statistics the bias of an estimator is surprisingly often of order c/n ( + o(n^-1) terms). How can we get rid of such a bias (or at least make it smaller)?

  4. Initial idea • Let a statistic S be given. • Consider the statistic S*(i) = nS − (n−1)S(i), where S(i) is the value of S computed with the i-th observation left out. The bias of this statistic is:
  bias(S*(i)) = n·bias(S) − (n−1)·bias(S(i))
              = n·(c/n + o(n^-1)) − (n−1)·(c/(n−1) + o((n−1)^-1))
              = c + n·o(n^-1) − c − (n−1)·o((n−1)^-1) = o(1),
  so the leading c/n part of the bias cancels and only smaller-order terms remain.

  5. Why is this first idea not good enough? • Which of the many observations should we leave out? The result clearly depends on that choice. • And if the statistic is the sample mean, what does the value of S*(i) turn out to be?

  6. Solution • Compute n different unbiased (bias-reduced) statistics, one for each left-out observation, and use their average as the final estimate: S* = (1/n) Σ_i S*(i) = nS − ((n−1)/n) Σ_i S(i).
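
A minimal R sketch of this recipe (the helper name jackknife_estimate and the example statistic are illustrations, not from the slides): it computes the leave-one-out values S(i), forms the pseudo-values S*(i) = nS − (n−1)S(i), and averages them.

  # Jackknife bias-corrected estimate of a statistic (illustrative sketch)
  jackknife_estimate <- function(x, statistic) {
    n <- length(x)
    S <- statistic(x)                                 # statistic on the full sample
    S_i <- sapply(1:n, function(i) statistic(x[-i]))  # leave-one-out values S(i)
    pseudo <- n * S - (n - 1) * S_i                   # pseudo-values S*(i)
    mean(pseudo)                                      # jackknife estimate S*
  }

  # Example: plug-in estimator mean(x)^4 for (EX)^4 and its jackknife version
  set.seed(1)
  x <- rnorm(10, mean = 1)
  c(plugin = mean(x)^4, jackknife = jackknife_estimate(x, function(z) mean(z)^4))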

  7. Confidence intervals. Tukey (1958): approximate confidence interval ( S* + t_(α/2; df=n−1)·s* ... S* + t_(1−α/2; df=n−1)·s* ), where s* is the standard error estimated from the pseudo-values.
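
A minimal sketch of how such an interval can be computed, assuming the usual pseudo-value standard error s* = sd(pseudo-values)/sqrt(n); the function name jackknife_ci is a made-up illustration, not from the slides.

  # Jackknife estimate with a Tukey-style approximate confidence interval (sketch)
  jackknife_ci <- function(x, statistic, alpha = 0.05) {
    n <- length(x)
    S <- statistic(x)
    S_i <- sapply(1:n, function(i) statistic(x[-i]))
    pseudo <- n * S - (n - 1) * S_i       # pseudo-values S*(i)
    est <- mean(pseudo)                   # jackknife estimate S*
    se <- sd(pseudo) / sqrt(n)            # assumed pseudo-value standard error s*
    tq <- qt(1 - alpha / 2, df = n - 1)   # t quantile with n - 1 degrees of freedom
    c(estimate = est, lower = est - tq * se, upper = est + tq * se)
  }

  set.seed(2)
  jackknife_ci(rnorm(10, mean = 1), function(z) mean(z)^4)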

  8. Example: (EX)^4 • n = 10 • (sample mean)^4: bias ≈ (2.3 ... 2.6); jackknife: bias ≈ (−0.1 ... 0.2)
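
The slide does not say which distribution these bias ranges were simulated from, so the following sketch only illustrates how such a comparison can be made; X ~ N(1, 1) and the number of replications are assumptions of mine.

  # Simulated bias of the plug-in estimator mean(x)^4 vs. its jackknife correction
  # (X ~ N(1, 1) is an assumed distribution, not taken from the slide)
  set.seed(3)
  n <- 10
  true_value <- 1^4                     # (EX)^4 equals 1 when EX = 1
  plugin <- jack <- numeric(5000)
  for (r in 1:5000) {
    x <- rnorm(n, mean = 1)
    S <- mean(x)^4
    S_i <- sapply(1:n, function(i) mean(x[-i])^4)
    plugin[r] <- S
    jack[r] <- mean(n * S - (n - 1) * S_i)
  }
  c(plugin_bias = mean(plugin) - true_value,
    jackknife_bias = mean(jack) - true_value)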

  9. The coefficient of determination and a fancy model
  > x=1:5
  > y=0*x+rnorm(5)
  > m=lm(y~x+I(x^2))
  > summary(m)$r.square
  [1] 0.3085117
  R² = 0.31

  10. > S=summary(m)$r.square
  >
  > Sn_1=rep(NA, 5)
  > for(i in 1:5){
  +   ym=y[-i]
  +   xm=x[-i]
  +   m1=lm(ym~xm+I(xm^2))
  +   Sn_1[i]=summary(m1)$r.square   # store the leave-one-out R-squared
  + }
  >
  > Si_tarn=5*S-4*Sn_1
  > mean(Si_tarn)
  [1] -0.9277159
  R² = −0.93 ???

  11. R² = ( Var(Y) − Var(prediction errors) ) / Var(Y)
  > m=lm(y~x+I(x^2))
  > xx=rep(x, 10000)
  > yy=0*xx+rnorm(5*10000)
  > prognoosiviga=yy-predict(m, data.frame(x=xx))   # prognoosiviga = prediction error
  > (var(yy)-var(prognoosiviga))/var(yy)
  [1] -0.225787

  12. Model selection

  13. Model selection

  14.–20. Cross-Validation (Ristvalideerimine)

  21. Cross-Validation: for the quadratic model the in-sample residual sum of squares is RV = 2.55, but the cross-validated value is RV = 17.71.

  22. For the intercept-only model RV = 3.69, and the cross-validated value is RV = 5.77.

  23. Leave-One-Out Cross-Validation
  > m1=lm(y~x+I(x^2))
  > sum((y-predict(m1))**2)
  [1] 2.554625
  >
  > m2=lm(y~1)
  > sum((y-predict(m2))**2)
  [1] 3.694387

  prog1=rep(NA, 5)
  prog2=rep(NA, 5)
  for (i in 1:5){
    ym=y[-i]
    xm=x[-i]
    m1=lm(ym~xm+I(xm^2))
    prog1[i]=predict(m1, data.frame(xm=x[i]))
    m2=lm(ym~1)
    prog2[i]=predict(m2, data.frame(xm=x[i]))
  }

  > sum((prog1-y)**2)
  [1] 17.71242
  > sum((prog2-y)**2)
  [1] 5.77248
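
The same leave-one-out idea can be wrapped into a small helper; this is only a sketch for arbitrary lm formulas (the name loocv_press is a made-up illustration, not from the slides):

  # Leave-one-out prediction sum of squares for an lm model (illustrative sketch)
  loocv_press <- function(formula, data) {
    n <- nrow(data)
    pred <- numeric(n)
    for (i in 1:n) {
      fit <- lm(formula, data = data[-i, , drop = FALSE])         # fit without row i
      pred[i] <- predict(fit, newdata = data[i, , drop = FALSE])  # predict row i
    }
    response <- data[[all.vars(formula)[1]]]   # response column named in the formula
    sum((response - pred)^2)
  }

  d <- data.frame(x = 1:5, y = rnorm(5))
  c(quadratic = loocv_press(y ~ x + I(x^2), d), intercept = loocv_press(y ~ 1, d))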

  24. > summary(m1)
  Call:
  lm(formula = y ~ x + I(x^2))

  Residuals:
          1         2         3         4         5
   0.009211  0.336618 -1.065121  1.083543 -0.364252

  Coefficients:
              Estimate Std. Error t value Pr(>|t|)
  (Intercept) -1.21858    2.42397  -0.503    0.665
  x            0.63302    1.84723   0.343    0.764
  I(x^2)      -0.05011    0.30205  -0.166    0.883

  Residual standard error: 1.13 on 2 degrees of freedom
  Multiple R-squared: 0.3085, Adjusted R-squared: -0.383
  F-statistic: 0.4462 on 2 and 2 DF, p-value: 0.6915
