One-Dimensional Curve-Fitting
Presented by Wenli Li, Shuhong Li, and Vivian Tam
Venables and Ripley, Section 8.7
November 22, 2004
INTRODUCTION
• Curve-fitting:
  • Sample data: {(x0, y0), (x1, y1), ..., (xn, yn)}
  • Interpolation & extrapolation
• One-dimensional curve-fitting (Section 8.7):
  • The functional form is not pre-specified
  • Splines (ns, smooth.spline)
  • Local regression (loess, supsmu, kernel smoother, and locpoly)
• Data set:
  • One independent and one dependent variable
  • Examples: GAGurine & Mercury level
Dataset: GAGurine (MASS)
• Variables: Age (independent), GAG (dependent)
• Sample size: 314

    Age: 0.00  0.00 ... 0.46  0.47 ... 17.30  7.67
    GAG: 23.0  23.8 ... 18.6  26.4 ...   1.9   9.3

Classical way (polynomial regression):

    library(MASS)
    attach(GAGurine)
    plot(Age, GAG, main="Degree 6 polynomial")
    # Fit a degree-8 polynomial and test the terms sequentially
    GAG.lm <- lm(GAG ~ Age + I(Age^2) + I(Age^3) + I(Age^4) + I(Age^5) + I(Age^6) + I(Age^7) + I(Age^8))
    anova(GAG.lm)
    # Terms of degree 7 and 8 are not significant, so refit with degree 6
    GAG.lm2 <- lm(GAG ~ Age + I(Age^2) + I(Age^3) + I(Age^4) + I(Age^5) + I(Age^6))
    xx <- seq(0, 17, len=200)
    lines(xx, predict(GAG.lm2, data.frame(Age=xx)), col="red")

Output of anova(GAG.lm):

    Terms added sequentially (first to last)
              Df  Sum of Sq  Mean Sq  F-value    Pr(F)
    Age        1      12590    12590   593.58  0.0000
    I(Age^2)   1       3751     3751   176.84  0.0000
    I(Age^3)   1       1492     1492    70.32  0.0000
    I(Age^4)   1        449      449    21.18  0.00001
    I(Age^5)   1        174      174     8.22  0.00444
    I(Age^6)   1        286      286    13.48  0.00028
    I(Age^7)   1         57       57     2.70  0.10151
    I(Age^8)   1         45       45     2.12  0.14667
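The explicit I(Age^k) terms can also be written more compactly with poly(). The following is a minimal sketch, assuming the GAGurine data from MASS; the object names fit.I and fit.poly are illustrative and do not come from the slides.

```r
library(MASS)  # provides the GAGurine data set

# Degree-6 fit written out term by term, as in the polynomial example
fit.I <- lm(GAG ~ Age + I(Age^2) + I(Age^3) + I(Age^4) + I(Age^5) + I(Age^6),
            data = GAGurine)

# Same model via poly(); raw = TRUE uses plain powers of Age
fit.poly <- lm(GAG ~ poly(Age, 6, raw = TRUE), data = GAGurine)

# Both parameterisations describe the same fitted curve
same.fit <- all.equal(unname(fitted(fit.I)), unname(fitted(fit.poly)))

# Predict on a grid, ready to pass to lines()
xx <- seq(0, 17, length.out = 200)
yy <- predict(fit.poly, data.frame(Age = xx))
```

Dropping raw = TRUE makes poly() use orthogonal polynomials, which are usually better conditioned numerically while describing the same family of curves.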
SPLINES
• Function: ns( )
  • Generates a basis matrix for natural cubic splines
• Usage: ns(x, df, knots, intercept=F, Boundary.knots, derivs)
• Arguments:
  • Required:
    • x: the predictor variable
  • Optional:
    • df: degrees of freedom. One can supply df rather than knots; ns then chooses df-1-intercept knots at suitably chosen quantiles of x. This argument is ignored if knots is supplied.
    • knots: breakpoints that define the spline
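The relationship between df and the number of knots can be checked directly. This is a small sketch, assuming the R implementation of ns() in the splines package and the GAGurine data from MASS; the object names are illustrative.

```r
library(splines)
library(MASS)   # provides the GAGurine data set

# Natural cubic spline basis with 5 degrees of freedom (no intercept column)
basis <- ns(GAGurine$Age, df = 5)

dim(basis)                      # one row per observation, one column per df
attr(basis, "knots")            # df - 1 - intercept = 4 interior knots
attr(basis, "Boundary.knots")   # defaults to the range of Age

# Using the basis inside lm() gives df coefficients plus an intercept
fit <- lm(GAG ~ ns(Age, df = 5), data = GAGurine)
n.coef <- length(coef(fit))
```

Supplying knots explicitly instead of df gives direct control over where the curve is allowed to bend.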
SPLINES
• Function: smooth.spline( )
  • Fits a cubic B-spline smooth to the input data
• Usage: smooth.spline(x, y, w = <<see below>>, df = <<see below>>, spar = 0, cv = F, all.knots = F, df.offset = 0, penalty = 1)
• Arguments:
  • Required:
    • x: values of the predictor variable. There should be at least ten distinct x values.
  • Optional:
    • y: response variable, of the same length as x
    • df: a number that specifies the degrees of freedom, trace(S), as an alternative to supplying a smoothing parameter
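The effect of the df argument can be seen by comparing a default fit with one that requests a fixed number of degrees of freedom. This is a sketch assuming R's smooth.spline() (whose defaults differ slightly from the usage line above) and the GAGurine data; fit.gcv and fit.df5 are illustrative names.

```r
library(MASS)  # provides the GAGurine data set

# Default: the smoothing parameter is chosen by generalized cross-validation
fit.gcv <- smooth.spline(GAGurine$Age, GAGurine$GAG)
fit.gcv$df    # equivalent degrees of freedom selected automatically

# Request about 5 equivalent degrees of freedom instead
fit.df5 <- smooth.spline(GAGurine$Age, GAGurine$GAG, df = 5)
fit.df5$df    # close to the requested value

# Evaluate the smoother at new ages
pred <- predict(fit.df5, x = c(1, 5, 10))
```

A smaller df gives a stiffer curve; a larger df lets the curve follow the data more closely.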
SPLINES

    library(splines)
    plot(Age, GAG, type="n", main="Spline")
    # Regression splines with increasing degrees of freedom
    lines(Age, fitted(lm(GAG ~ ns(Age, df=5))), col="red")
    lines(Age, fitted(lm(GAG ~ ns(Age, df=10))), lty=3, col="green")
    lines(Age, fitted(lm(GAG ~ ns(Age, df=20))), lty=4, col="blue")
    # Smoothing spline
    lines(smooth.spline(Age, GAG), lwd=3, col="black")
    legend(12, 50, c("red: df=5", "green: df=10", "blue: df=20", "Smoothing"),
           lty=c(1, 3, 4, 1), lwd=c(1, 1, 1, 3), bty="n")
KERNEL SMOOTH
• Function: ksmooth( )
  • Estimates a probability density or performs scatterplot smoothing using kernel estimates
• Usage: ksmooth(x, y=NULL, kernel="box", bandwidth=0.5, range.x=range(x), n.points=length(x), x.points=<<see below>>)
• Arguments:
  • Required:
    • x: vector of x data
  • Optional:
    • y: vector of y data. This must be the same length as x, and missing values are not accepted.
    • kernel: "box", "triangle", "parzen", or "normal"
    • bandwidth: larger values give smoother estimates; smaller values give rougher estimates
Kernel Smoother

    # kernel smoother
    plot(Age, GAG, type="n", main="ksmooth")
    lines(ksmooth(Age, GAG, "normal", bandwidth=1), col="red")
    lines(ksmooth(Age, GAG, "normal", bandwidth=5))
    legend(12, 50, c("red: bandwidth=1", "black: bandwidth=5"), bty="n")
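A kernel smoother is a locally weighted average of the responses. The sketch below writes out the simplest case, the "box" kernel, by hand and compares it with R's ksmooth(), whose box kernel averages the points within bandwidth/2 of the target. The helper nw_box is hypothetical and not part of any package, and the evaluation points are arbitrary.

```r
library(MASS)  # provides the GAGurine data set

# Hypothetical helper: box-kernel smoother, i.e. a moving average of y
# over the points whose x lies within bandwidth/2 of each target x0
nw_box <- function(x, y, x0, bandwidth) {
  sapply(x0, function(p) mean(y[abs(x - p) <= bandwidth / 2]))
}

x0 <- c(2.003, 8.003)   # arbitrary ages at which to evaluate the smooth

by.hand  <- nw_box(GAGurine$Age, GAGurine$GAG, x0, bandwidth = 1)
built.in <- ksmooth(GAGurine$Age, GAGurine$GAG, kernel = "box",
                    bandwidth = 1, x.points = x0)$y

agree <- all.equal(by.hand, built.in)
```

The "normal" kernel used in the plots replaces the equal weights inside the window with Gaussian weights that decay smoothly with distance.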
LOESS
• Fits a curve determined by one or more numerical predictors using local polynomial regression
• Obtains the predicted value at each point by fitting a weighted linear regression, where the weights decrease with distance from the point of interest
LOESS Parameters
• f: controls the window size (the smoother span in lowess)
• weights: decrease with the distance from the point x at which the fit is evaluated
• span: the parameter alpha, which controls the degree of smoothing
• degree: the degree of the polynomials to be used, up to 2
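The idea of a weighted local regression can be sketched in a few lines. The helper local_linear below is hypothetical and is not how loess is implemented internally: it assumes tricube weights, a window containing a fraction span of the data, and a local polynomial of degree 1, and it omits the robustness iterations and interpolation that loess can apply.

```r
library(MASS)  # provides the GAGurine data set

# Hypothetical helper: locally weighted linear fit evaluated at one point x0
local_linear <- function(x, y, x0, span = 2/3) {
  n <- length(x)
  q <- max(2, floor(span * n))               # number of points in the window
  d <- abs(x - x0)                           # distance of each point from x0
  h <- sort(d)[q]                            # window half-width
  w <- ifelse(d < h, (1 - (d / h)^3)^3, 0)   # tricube weights, zero outside
  fit <- lm(y ~ x, weights = w)              # weighted linear regression
  unname(predict(fit, data.frame(x = x0)))
}

# Estimated GAG level at age 5 with two different window sizes
wide   <- local_linear(GAGurine$Age, GAGurine$GAG, x0 = 5, span = 2/3)
narrow <- local_linear(GAGurine$Age, GAGurine$GAG, x0 = 5, span = 1/3)
```

Repeating the calculation over a grid of x0 values traces out the smooth curve; a smaller span uses fewer neighbours and therefore follows local features more closely.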
LOESS Code:

    library(MASS)
    attach(GAGurine)
    plot(Age, GAG, type="n", main="loess")
    # degree is limited to 2, so compare local linear and local quadratic fits
    lines(loess.smooth(Age, GAG, span=2/3, degree=1), col="red", lwd=1)
    lines(loess.smooth(Age, GAG, span=2/3, degree=2), col="blue", lwd=2)
    lines(loess.smooth(Age, GAG, span=1/3, degree=2), col="green", lwd=1)
    legend(10, 45, c("Red: span=2/3, deg=1", "Blue: span=2/3, deg=2", "Green: span=1/3, deg=2"), bty="n")
SUPSMU
• Serves a purpose similar to that of the function loess
• A running lines smoother that chooses between three spans; the best of the three smoothers is chosen by cross-validation for each prediction
• If there are substantial serial correlations between observations close in x-value, a pre-specified fixed-span smoother should be used. Reasonable span values are 0.2 to 0.4.
SUPSMU Parameters:
• span: the fraction of the observations in the span of the running lines smoother, or "cv" to choose this by leave-one-out cross-validation
• bass: controls the smoothness of the fitted curve. Values of up to 10 indicate increasing smoothness.
• periodic: if TRUE, the smoother assumes x is a periodic variable with values in the range [0.0, 1.0] and period 1.0. An error occurs if x has values outside this range.

References:
Friedman, J. H. (1984) A variable span scatterplot smoother. Laboratory for Computational Statistics, Stanford University, Technical Report No. 5.
Code:

    plot(Age, GAG, type="n", main="supsmu")
    lines(supsmu(Age, GAG))
    lines(supsmu(Age, GAG, bass=3), lty=3)
    lines(supsmu(Age, GAG, bass=10), lty=4)
    legend(12, 50, c("default", "bass=3", "bass=10"), lty=c(1, 3, 4), bty="n")
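The span argument can be explored in the same way as bass. The sketch below assumes R's supsmu() and the GAGurine data; the fixed span of 0.3 is simply a value from the 0.2 to 0.4 range suggested for fixed-span smoothing, and the object names are illustrative.

```r
library(MASS)  # provides the GAGurine data set

# Default: span = "cv", the span is chosen by cross-validation
fit.cv <- supsmu(GAGurine$Age, GAGurine$GAG)

# Fixed span: 30% of the observations in each running-lines window
fit.fixed <- supsmu(GAGurine$Age, GAGurine$GAG, span = 0.3)

# The result is a list with the sorted unique x values and the smoothed y
str(fit.cv)
n.unique <- length(unique(GAGurine$Age))
```

Either result can be passed straight to lines() because it has x and y components.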
LOCPOLY
• Estimates a probability density function, regression function, or their derivatives using local polynomials
• A fast binned implementation over an equally-spaced grid is used
• In a simple form: locpoly(x, y, degree=#, bandwidth=#)

Parameters:
• Usage: locpoly(x, y, drv=0, degree=<<see below>>, kernel="normal", bandwidth, gridsize=401, bwdisc=25, range.x=<<see below>>, binned=FALSE, truncate=TRUE)
• drv: order of derivative to be estimated
• degree: degree of local polynomial used
• bandwidth: the kernel bandwidth smoothing parameter
• range.x: vector containing the minimum and maximum values of 'x' at which to compute the estimate
LOCPOLY Code:

    library(MASS)
    attach(GAGurine)
    library(KernSmooth)
    plot(Age, GAG, type="n", main="(Age, GAG) Locpoly")
    # Select the bandwidth with the direct plug-in method
    (h <- dpill(Age, GAG))
    lines(locpoly(Age, GAG, degree=0, bandwidth=h), col="red", lty=1, lwd=2)
    lines(locpoly(Age, GAG, degree=1, bandwidth=h), col="blue", lty=3, lwd=3)
    lines(locpoly(Age, GAG, degree=2, bandwidth=h), col="green", lty=4, lwd=3)
    legend(10, 40, c("const=0 red", "linear=1 blue", "quad=2 green"), lty=c(1, 3, 4), bty="n")
    # Name the data frame explicitly; a bare detach() would remove KernSmooth
    detach(GAGurine)
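The drv argument makes locpoly return a derivative instead of the curve itself. The sketch below assumes the KernSmooth and MASS packages; reusing the dpill() bandwidth for the derivative is a convenience assumption here, since dpill() targets the regression function rather than its slope.

```r
library(MASS)        # provides the GAGurine data set
library(KernSmooth)  # provides locpoly() and dpill()

h <- dpill(GAGurine$Age, GAGurine$GAG)   # plug-in bandwidth

# Local linear estimate of the regression curve
curve.fit <- locpoly(GAGurine$Age, GAGurine$GAG, degree = 1, bandwidth = h)

# Local quadratic estimate of the first derivative (rate of change of GAG)
slope.fit <- locpoly(GAGurine$Age, GAGurine$GAG, drv = 1, degree = 2,
                     bandwidth = h)

# Both are evaluated on the default grid of 401 equally spaced points
grid.size <- length(curve.fit$x)
```

Plotting slope.fit against age shows where the GAG level changes fastest.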
Example: Mercury Level
• Model: Mercury and Alkalinity
• From 1990 to 1991, largemouth bass were studied in 53 different Florida lakes to examine the mercury contamination level and the factors that influenced the level of mercury absorption in the fish
• One factor studied was the alkalinity level of the water
• Mercury level is plotted against alkalinity level to study the relationship
Mercury Level Graphs Coding:

    # 1 loess
    plot(Alkalinity, Mercury, main="Alkalinity and Mercury, Loess")
    lines(loess.smooth(Alkalinity, Mercury, span=2/3, degree=1), col="red", lwd=2)
    lines(loess.smooth(Alkalinity, Mercury, span=2/3, degree=2), col="blue", lwd=2)
    legend(65, 1.0, c("deg=1 Red", "deg=2 Blue"), bty="n")

    # 2 supsmu
    plot(Alkalinity, Mercury, main="Alkalinity and Mercury, Supsmu")
    lines(supsmu(Alkalinity, Mercury, bass=1), lty=1, col="red", lwd=2)
    lines(supsmu(Alkalinity, Mercury, bass=10), lty=3, col="blue", lwd=3)
    legend(58, 1.0, c("bass=1 red", "bass=10 blue"), lty=c(1, 3), bty="n", lwd=2)

    # 3 ksmooth
    plot(Alkalinity, Mercury, type="n", main="Alkalinity and Mercury, Ksmooth")
    lines(ksmooth(Alkalinity, Mercury, "normal", bandwidth=1), col="green", lwd=2)
    lines(ksmooth(Alkalinity, Mercury, "normal", bandwidth=5), col="red", lty=2, lwd=2)
    legend(75, 1.0, c("bw=1", "bw=5"), lty=c(1, 2), bty="n")

    # 4 locpoly
    library(KernSmooth)
    plot(Alkalinity, Mercury, type="n", main="Alkalinity and Mercury, Locpoly")
    # select bandwidth
    (h <- dpill(Alkalinity, Mercury))
    lines(locpoly(Alkalinity, Mercury, degree=0, bandwidth=h), lty=1, col="green", lwd=2)
    lines(locpoly(Alkalinity, Mercury, degree=1, bandwidth=h), lty=2, col="red", lwd=2)
    lines(locpoly(Alkalinity, Mercury, degree=2, bandwidth=h), lty=3, col="purple", lwd=3)
    legend(75, 1.0, c("const", "linear", "quad"), lty=c(1, 2, 3), bty="n")
SUMMARY
• Use one-dimensional curve-fitting when:
  • The scatter plot does not suggest a linear model
  • Data transformation does not give a satisfactory linear model
  • Future data must be accommodated
  • Previous outliers are to be included
  • Business applications call for it
• Several methods were discussed, including:
  1. SPLINES
  2. LOESS
  3. SUPSMU
  4. KSMOOTH
  5. LOCPOLY
• Parameters such as bandwidth, df, derivative order, smoothness, and degree control how closely the curve follows the data