Regression using serial data

This study examines whether a treatment promoted growth of the upper jaw in Japanese children. Using serial ("before" and "after") measurements on each child, it regresses the response ccorr on age and shows how to account for within-subject correlation.

Presentation Transcript


  1. Regression using serial data Jyoti Sarkar, IUPUI jsarkar@math.iupui.edu

  2. The Problem
     Given: on n units, (x, y) “before” and (x, y) “after” a treatment
     Goal: regress y on x
     • X = a predictor (easy/inexpensive)
     • Y = a response (difficult/expensive)
     • Assume the n units are independent

  3. Example
     Concern: Many Japanese exhibit a bigger lower jaw than upper jaw.
     Treatment (for growth of upper jaw): Children (4-12 years old) wore a mouth gear 8-10 hours daily for 1-2 years.
     Questions
     • Was the treatment effective? (There was no control group.)
     • How did measurements change with age?

  5. Face Mask Experiment
     Sample: 25 boys, 18 girls, “before” and “after” treatment
     • age (year ± day)
     From X-ray plates measure:
     • ccorr = corrected C-axis SM (mm)
     • theta = ∠(C-axis, anterior cranial base SN)
     • alpha = ∠(C-axis, palatal plane through M) (± ½ degree)
     Objective: Regress y = ccorr on x = age

  6. Face Mask data
     patient  gender  age1  theta1  alpha1  ccorr1   age2   theta2  alpha2  ccorr2
     1        2       4.99  39.0    35.0    76.076   5.99   40.5    32.0    77.064
     2        2       9.90  43.0    32.0    68.666   11.43  47.0    34.0    72.124
     etc.

  7. Regress y = ccorr on x = age: (1) “before” data (n=18)
     >Regress ccorr1 on 1 age1
     ccorr1 = 66.97 + 0.2530 age1
     • r² = .023, r²(adj) = .000
     • S = 3.632, SE(b1) = 0.4096, p-value = .545
     • t(.975, 16) = 2.120
     • 95% CI(b1) = (-0.6153, 1.1214)

  9. (2) “after” data: girls (n=18)
     >Regress ccorr2 on 1 age2
     ccorr2 = 71.30 + 0.1142 age2
     • r² = .006, r²(adj) = .000
     • S = 3.321, SE(b1) = 0.3738, p-value = .764
     • t(.975, 16) = 2.120
     • 95% CI(b1) = (-0.6782, 0.9066)

  11. (3) “superimposed” data (n=36)
      Data size doubled, range expanded
      >Stack age1 age2 age
      >Stack ccorr1 ccorr2 ccorr
      >Regress ccorr on 1 age
      ccorr = 67.5636 + 0.3793 age
      • r² = .049, r²(adj) = .021
      • S = 3.745, SE(b1) = 0.2880, p-value = .197
      • t(.975, 34) = 2.032
      • 95% CI(b1) = (-0.2060, 0.9646)

  13. Regress y on x: naïve attempts
      All 3 naïve attempts yield
      • Low r²
      • Large p-value ⇒ cannot reject slope = 0
      • CI ∋ 0 (each 95% interval for the slope contains 0)
      Conclusion:
      • Either “ccorr does not depend on age”
      • Or “we need a better regression model”
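The claim that every naïve interval contains 0 can be checked directly from the reported slopes, standard errors, and t critical values. The following is a minimal sketch using only numbers quoted on the slides; the function and variable names are illustrative, not part of the original deck.

```python
# Reported slope b1, SE(b1), and t critical value for each naive fit
# (numbers copied from the three naive regressions in the deck).
fits = {
    "before": (0.2530, 0.4096, 2.120),
    "after": (0.1142, 0.3738, 2.120),
    "superimposed": (0.3793, 0.2880, 2.032),
}


def conf_interval(b1, se, t_crit):
    """Two-sided interval b1 +/- t_crit * se."""
    half_width = t_crit * se
    return (b1 - half_width, b1 + half_width)


intervals = {name: conf_interval(*vals) for name, vals in fits.items()}
contains_zero = {name: lo < 0.0 < hi for name, (lo, hi) in intervals.items()}

for name, (lo, hi) in intervals.items():
    print(f"{name}: ({lo:.4f}, {hi:.4f}) contains 0: {contains_zero[name]}")
```

Each interval straddles 0, which is why none of the three fits can distinguish the slope from zero.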

  15. Serial Bivariate Plot
      • ccorr increases with age (for most girls)
      • Regression of ccorr on age should have a positive slope, especially under treatment
      Why then is r² low? Between-subject variation is high.
      Study within-subject change to see if ccorr depends on age.

  16. Within-subject change
      • Δage = age2 - age1 = Treatment duration
      • Δccorr = ccorr2 - ccorr1 = Change in ccorr
      • Δccorr / Δage = within-subject slope
      Means (n=18 girls)
      age2 = 8.39    ccorr2 = 72.26
      age1 = 7.26    ccorr1 = 68.80
      Δage = 1.13    Δccorr = 3.46    Δccorr/Δage = 3.0251
      Recall b1 = (1) 0.2530 (2) 0.1142 (3) 0.3793

  18. Regress Δccorr on Δage
      >Regress dccorr on 1 dage;
      >noconstant.
      dccorr = 3.0763 dage
      S = 2.374, SE(b1) = 0.4847, p-value = .000
      t(.975, 17) = 2.110
      95% CI(b1) = (2.0536, 4.0990)
      Conclusion: ccorr increases with age
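A regression with no constant has a closed-form slope, b = Σxy / Σx², with n - 1 residual degrees of freedom (17 for the 18 girls here). The sketch below shows that computation in plain Python. The full data set is not in the transcript, so it uses only the two patients listed on the data slide, purely as an illustration; its output is not expected to match the reported 3.0763. Function names are hypothetical.

```python
import math


def slope_through_origin(x, y):
    """Least-squares slope for y = b * x (no intercept)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / sxx


def se_through_origin(x, y, b):
    """Standard error of b, using n - 1 residual degrees of freedom."""
    n = len(x)
    sse = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return math.sqrt(sse / (n - 1) / sxx)


# Illustration only: the two patients shown on the "Face Mask data" slide.
age1, ccorr1 = [4.99, 9.90], [76.076, 68.666]
age2, ccorr2 = [5.99, 11.43], [77.064, 72.124]

dage = [a2 - a1 for a1, a2 in zip(age1, age2)]        # treatment duration
dccorr = [c2 - c1 for c1, c2 in zip(ccorr1, ccorr2)]  # change in ccorr

b = slope_through_origin(dage, dccorr)
print("slope:", b, "SE:", se_through_origin(dage, dccorr, b))
```

Forcing the line through the origin encodes the assumption that zero elapsed time implies zero change in ccorr.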

  19. A Paradox:
      • Naïve regression slopes are zero
      • Within-subject slope is non-zero
      What to do?
      • Find the proper regression model.
      • Repeated Measures/Growth Curves
      • Repeated Measures with Covariate
      • Serial Correlation

  20. Serial Correlation Model 1
      • Regression model: ccorr = β0 + β1 age + error
      • Errors are identically distributed N(0, σ²) but dependent
      • Between-subject errors are uncorrelated
      • Within-subject errors have correlation ρ

  21. Regression Model 1

  22. If ρ unknown: pre-multiply by [matrix not preserved in the transcript]

  23. Orthogonalized Model 1
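The equations on the model slides did not survive in the transcript. The following is a reconstruction under stated assumptions, not the author's original display: it takes the constants k1 and k2 from the MINITAB code in this deck and reads them as the entries of the inverse symmetric square root of the within-subject correlation matrix. The subscript i (subject) and the starred errors are notation introduced here.

```latex
\[
\operatorname{Cov}\begin{pmatrix}\varepsilon_{i1}\\ \varepsilon_{i2}\end{pmatrix}
  = \sigma^2 R,\qquad
R = \begin{pmatrix}1 & \rho\\ \rho & 1\end{pmatrix},\qquad
R^{-1/2} = \begin{pmatrix}k_1 & k_2\\ k_2 & k_1\end{pmatrix},
\]
\[
k_1 = \frac12\left(\frac{1}{\sqrt{1+\rho}} + \frac{1}{\sqrt{1-\rho}}\right),\qquad
k_2 = \frac12\left(\frac{1}{\sqrt{1+\rho}} - \frac{1}{\sqrt{1-\rho}}\right).
\]
\[
\begin{pmatrix}\mathrm{tccorr}_{i1}\\ \mathrm{tccorr}_{i2}\end{pmatrix}
  = R^{-1/2}\begin{pmatrix}\mathrm{ccorr}_{i1}\\ \mathrm{ccorr}_{i2}\end{pmatrix}
  = \frac{\beta_0}{\sqrt{1+\rho}}\begin{pmatrix}1\\ 1\end{pmatrix}
  + \beta_1\begin{pmatrix}\mathrm{tage}_{i1}\\ \mathrm{tage}_{i2}\end{pmatrix}
  + \begin{pmatrix}\varepsilon^{*}_{i1}\\ \varepsilon^{*}_{i2}\end{pmatrix},
\qquad
\operatorname{Cov}\begin{pmatrix}\varepsilon^{*}_{i1}\\ \varepsilon^{*}_{i2}\end{pmatrix} = \sigma^2 I .
\]
```

Because k1 + k2 = 1/√(1+ρ), the transformed intercept column is constant, so an ordinary regression with a constant recovers β1 directly and β0 after multiplying the fitted constant by √(1+ρ).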

  24. Stacking …

  25. If ρ unknown

  26. Algorithm: Estimate ρ
      0. Begin with ρ̂ = correlation(ccorr1, ccorr2)
      1. Orthogonalize age and ccorr using ρ̂ to obtain tage and tccorr
      2. Regress tccorr on 1, tage. Save residuals.
      3. Let r = corr(tresi1, tresi2). If r < .001, STOP. Else set ρ̂ = ρ̂ + r/2 and go to Step 1.
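The iteration can be sketched in plain Python as follows. This is an illustrative translation, not the author's code: the update ρ̂ + r/2 follows the MINITAB commands in this deck, while the absolute-value stopping test, the clamp that keeps ρ̂ inside (-1, 1), the iteration cap, and the six-subject data set are all assumptions added here so the sketch runs safely.

```python
import math


def corr(u, v):
    """Pearson correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def orth_coeffs(rho):
    """k1, k2: entries of the inverse square root of [[1, rho], [rho, 1]]."""
    a = 1 / math.sqrt(1 + rho)
    b = 1 / math.sqrt(1 - rho)
    return (a + b) / 2, (a - b) / 2


def ols(x, y):
    """Simple least squares with a constant; returns (c0, c1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    c1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - c1 * mx, c1


def clamp(rho, limit=0.99):
    """Assumption: keep rho away from +/-1 so the square roots stay defined."""
    return max(-limit, min(limit, rho))


def estimate_rho(x1, x2, y1, y2, tol=0.001, max_iter=50):
    rho = clamp(corr(y1, y2))                            # Step 0
    for _ in range(max_iter):
        k1, k2 = orth_coeffs(rho)
        tx1 = [k1 * a + k2 * b for a, b in zip(x1, x2)]  # Step 1
        tx2 = [k2 * a + k1 * b for a, b in zip(x1, x2)]
        ty1 = [k1 * a + k2 * b for a, b in zip(y1, y2)]
        ty2 = [k2 * a + k1 * b for a, b in zip(y1, y2)]
        c0, c1 = ols(tx1 + tx2, ty1 + ty2)               # Step 2 (stacked data)
        res1 = [y - c0 - c1 * x for x, y in zip(tx1, ty1)]
        res2 = [y - c0 - c1 * x for x, y in zip(tx2, ty2)]
        r = corr(res1, res2)                             # Step 3
        if abs(r) < tol:
            break
        rho = clamp(rho + r / 2)
    return rho, c0, c1


# Hypothetical data for six subjects (NOT the study data).
age1 = [5.0, 6.5, 7.0, 8.2, 9.1, 10.4]
age2 = [6.1, 7.4, 8.3, 9.5, 10.0, 11.9]
y1 = [70.2, 66.1, 72.5, 68.0, 74.3, 69.9]
y2 = [73.0, 69.5, 75.1, 71.8, 77.9, 73.2]

rho_hat, c0, c1 = estimate_rho(age1, age2, y1, y2)
print("rho_hat:", rho_hat, "c0:", c0, "c1:", c1)
```

The returned c1 estimates the slope on the original scale, whereas c0 is the intercept on the transformed scale and still needs back-transforming.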

  27. MINITAB codes 1
      >corr c7 c12      # initial rho
      >let k3=.730      # enter above/updated rho
      >let k1=(1/sqrt(1+k3)+1/sqrt(1-k3))/2
      >let k2=(1/sqrt(1+k3)-1/sqrt(1-k3))/2
      # orthogonalize age
      >let c21=k1*c3+k2*c8
      >let c22=k2*c3+k1*c8
      >stack c21 c22 c31
      >name c31 'tage'

  28. MINITAB codes 2
      >let c23=k1*c7+k2*c12      # orthog… ccorr
      >let c24=k2*c7+k1*c12
      >stack c23 c24 c32
      >name c32 'tccorr'
      >regress 'tccorr' 1 'tage';
      >resi c33;
      >coef c34.
      >unstack c33 c35 c36; subs c18.
      >corr c35 c36              # STOP if <.001, else
      >let k3=k3+corr(c35,c36)/2

  31. “Orthogonalized” data (n=36)
      First iteration (Model 1): initial ρ̂ = .730
      • tccorr = 46.9184 + 1.1271 tage
      • r² = .216, r²(adj) = .193, p-value = .004
      • Corr(tresi1, tresi2) = .191
      Revised ρ̂ = .82545

  33. Iteration History (Model 1)
      Iter   ρ̂         Corr(tresi1, tresi2)
      0      .730      .191
      1      .825      .066
      2      .858      .012
      3      .8641     .001
      4      .8646     .000
      5      .864621
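The iteration history is consistent with the update rule used in the MINITAB code (new estimate = old estimate + residual correlation / 2). A small check, using only the values in the Model 1 history and an illustrative helper name, is sketched below; agreement is only expected up to the rounding shown on the slide.

```python
# (rho_hat, residual correlation) per iteration, copied from the Model 1 history.
# The last row has no residual correlation listed.
history = [
    (0.730, 0.191),
    (0.825, 0.066),
    (0.858, 0.012),
    (0.8641, 0.001),
    (0.8646, 0.000),
    (0.864621, None),
]


def next_rho(rho, r):
    """Update rule from the MINITAB code: add half the residual correlation."""
    return rho + r / 2


# Gap between the predicted next estimate and the tabulated one, row by row.
gaps = [
    abs(next_rho(rho, r) - history[i + 1][0])
    for i, (rho, r) in enumerate(history[:-1])
]
max_gap = max(gaps)
print("largest gap between predicted and tabulated rho_hat:", max_gap)
```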

  35. “Orthogonalized” data (n=36)
      After five iterations: ρ̂ = .8646
      • Corr(tresi1, tresi2) = .000
      • tccorr = 42.132 + 1.6613 tage
      • r² = .347, r²(adj) = .328
      • S = 5.1319, SE(c1) = 0.3908, p-value = .000

  37. Regress y on x (Model 1)
      ccorr = 57.532 + 1.6613 age
      • ρ̂ = 0.8646
      • σ̂ = 5.2091, SE(b1) = .3967, p-value = .000
      • t(.975, 33) = 2.0345
      • 95% CI(b1) = (0.8560, 2.4683)
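The step from the orthogonalized fit to the final Model 1 fit is not spelled out in the transcript. The sketch below shows one reading that reproduces the reported numbers: the intercept is rescaled by √(1+ρ̂), and the scale estimate and standard error are inflated by √(34/33), as if one degree of freedom were spent on estimating ρ. Both steps are inferences from the slide values, not statements by the author, and the variable names are illustrative.

```python
import math

# Values reported for the orthogonalized fit after five iterations.
rho_hat = 0.8646
c0, s, se_c1 = 42.132, 5.1319, 0.3908
n_obs, n_coef = 36, 2

# Intercept on the original scale: the transformed intercept column equals
# 1 / sqrt(1 + rho), so multiply the fitted constant by sqrt(1 + rho).
b0 = c0 * math.sqrt(1 + rho_hat)

# Assumed degrees-of-freedom adjustment: 34 -> 33 for the estimated rho.
df_naive = n_obs - n_coef
df_adj = df_naive - 1
inflate = math.sqrt(df_naive / df_adj)

s_adj = s * inflate       # compare with the reported 5.2091
se_b1 = se_c1 * inflate   # compare with the reported .3967

print(f"b0 = {b0:.3f}, adjusted S = {s_adj:.4f}, adjusted SE(b1) = {se_b1:.4f}")
```

The slope needs no rescaling because age and ccorr are transformed by the same matrix.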

  39. Serial Correlation Model 2
      • Regression model 2: ccorr = β0 + β1 (age) + error
      • Errors are identically distributed N(0, σ²) but dependent
      • Between-subject errors are uncorrelated
      • Within-subject errors have correlation ρ^(age2 - age1)

  40. Regression Model 2

  41. MINITAB Codes 3
      >let c19='age2' - 'age1'
      >name c19 'dage'
      >corr c7 c12
      >let k3=.730      # enter above/updated correlation
      # use rho**dage to orthogonalize
      >let c51=(1/sqrt(1+k3**c19)+1/sqrt(1-k3**c19))/2
      >let c52=(1/sqrt(1+k3**c19)-1/sqrt(1-k3**c19))/2
      >let c21=c51*c3+c52*c8
      >let c22=c52*c3+c51*c8
      etc.
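Under Model 2 the orthogonalizing coefficients differ by subject, because each subject's within-pair correlation is ρ raised to that subject's treatment duration. A minimal Python sketch of that per-subject transform follows. The function names are hypothetical, and the column `intdage` is an assumption: the deck names such a column in its Model 2 fit but elides its construction ("etc."), so it is taken here to be the transformed intercept k1 + k2. The sketch assumes 0 < ρ < 1 so that ρ**dage is real.

```python
import math


def orth_coeffs_model2(rho, dage):
    """Per-subject k1, k2 using within-subject correlation rho ** dage."""
    r = rho ** dage
    a = 1 / math.sqrt(1 + r)
    b = 1 / math.sqrt(1 - r)
    return (a + b) / 2, (a - b) / 2


def orthogonalize_model2(rho, age1, age2, y1, y2):
    """Return stacked columns (intdage, tage, tccorr): 'before' rows, then 'after' rows."""
    int1, int2, tx1, tx2, ty1, ty2 = [], [], [], [], [], []
    for a1, a2, v1, v2 in zip(age1, age2, y1, y2):
        k1, k2 = orth_coeffs_model2(rho, a2 - a1)
        # Assumed definition: transformed intercept column, no longer constant.
        int1.append(k1 + k2)
        int2.append(k1 + k2)
        tx1.append(k1 * a1 + k2 * a2)
        tx2.append(k2 * a1 + k1 * a2)
        ty1.append(k1 * v1 + k2 * v2)
        ty2.append(k2 * v1 + k1 * v2)
    return int1 + int2, tx1 + tx2, ty1 + ty2


# Illustration only: the two patients shown on the "Face Mask data" slide.
intdage, tage, tccorr = orthogonalize_model2(
    0.8781, [4.99, 9.90], [5.99, 11.43], [76.076, 68.666], [77.064, 72.124]
)
print(intdage, tage, tccorr)
```

Since the intercept column now varies across subjects, the transformed regression would be fitted without a constant, using `intdage` and `tage` as the two predictors.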

  42. Iteration History (Model 2)
      Iter   ρ̂         Corr(tresi1, tresi2)
      0      .730      .231
      1      .845      .063
      2      .877      .002
      3      .8782     .001
      4      .8781     .000
      5      .878120

  43. “Orthogonalized” data (n=36)
      After five iterations: ρ̂ = .8781
      • Corr(tresi1, tresi2) = .000
      • tccorr = 57.935 intdage + 1.6097 tage
      • r² = .336, r²(adj) = .316
      • S = 5.092, SE(c1) = 0.3912, p-value = .000

  44. Regress y on x (Model 2)
      ccorr = 57.935 + 1.6098 age
      • ρ̂ = 0.8781
      • σ̂ = 5.169, SE(b1) = 0.3971, p-value = .000
      • t(.975, 33) = 2.0345
      • 95% CI(b1) = (0.8018, 2.4176)

  47. Summary
      • Model serial data properly
      • Estimate the serial correlation using the iterated algorithm
      • Regress the orthogonalized data
      • Obtain the regression of y on x
      • Adjust σ̂, SE(b1) and CI(b1)
      • Can extend to more repeats per subject

  48. Thank you.
