We downloaded this data set from the Carbon Dioxide Information Analysis Center at Oak Ridge National Laboratory (http://cdiac.ESD.ORNL.GOV/ftp/ndp019). The data contains mean monthly temperature from 1890 to 1996 from 1221 stations in US (note that this data set has been updated on this site since we downloaded it five years ago), geological locations of these stations in terms of longitude (long) and latitude (lat). We use this data set to illustrate how to fit periodic spline, thin-plate spline, SS ANOVA and NNR models. We use data from 1961 to 1990 from 48 stations in Texas only. The data is saved as TXtemp.
We first fit a periodic spline, compute the estimates without the constant term (yearly mean) and its amplitude for each station.
> data(TXtemp) > TXtemp$cm<- (TXtemp$month-0.5)/12 > amp <- NULL > for(i in 1:48){ tmpfit<- ssr(mmtemp~1, rk=periodic(cm), spar="m", data=TXtemp[TXtemp$stacod==unique(TXtemp$stacod)[i]& TXtemp$mmtemp!=-99.99,]) p.tmpfit <- predict(tmpfit, terms=c(0,1), pstd=F, newdata=data.frame(cm=seq(0,1,len=100)))$fit amp <- c(amp,max(abs(p.tmpfit))) }
Figure shows the estimated amplitudes on the logarithm scale plotted against longitude and latitude. It is clear that the middle and northern parts of Texas tend to have larger amplitudes (hotter Summer and colder Winter).
We now fit a thin-plate spline with and to model the effect of longitude and latitude on the log(amplitude):
> loc <- TXtemp[TXtemp[,4]==1961&TXtemp[,6]==1,2:3] > data <- data.frame(amp=log(amp),lat=loc[,1],long=loc[,2]) > tx.fit1 <- ssr(amp~long+lat,rk=tp.pseudo(list(long,lat)),data=data, spar="m") > i <- interp(data$long,data$lat,data$amp) > grid1 <- list(long=i$x,lat=i$y) > p.tx.fit1 <- predict(tx.fit1,expand.grid(grid1),pstd=F)The contour and 3-d plots of the fit are shown in Figure .
We can use SS ANOVA or NNR models to investigate seasonal trend (temporal) and location effect (spatial) together. For this purpose, we first compute average mean monthly temperature from 1996-1990 for each station.
> y <- gapply(TXtemp, which=5, FUN=function(x) mean(x[x!=-99.99]), group=TXtemp$stacod*TXtemp$month) > tx.dat <- data.frame(y=as.vector(t(matrix(y,48,12,byrow=F)))) > tx.dat$month<-rep((1:12-0.5)/12, 48) > tx.dat$lat<-rep(TXtemp$lat[seq(1, nrow(TXtemp),by=360)] ,rep(12,48)) > tx.dat$long<-rep(TXtemp$long[seq(1, nrow(TXtemp),by=360)] ,rep(12,48)) > tx.dat$stacod<-rep(TXtemp$stacod[seq(1, nrow(TXtemp),by=360)] ,rep(12,48))Denote , , , and . We model month () effect using a periodic spline, and spatial () effect using a TPS. Then we have the following SS ANOVA model:
> tx.fit3 <- ssr(y~long+lat, data=tx.dat, spar="m", rk=list(periodic(month), tp(list(long,lat)), rk.prod(periodic(month),kron(long)), rk.prod(periodic(month),kron(lat)), rk.prod(periodic(month),tp(list(long,lat))))) > tx.fit3 ... GML estimate(s) of smoothing parameter(s) : 8.184394e-06 2.426733e-02 1.076039e-01 5.558457e-02 2.683768e-04 Equivalent Degrees of Freedom (DF): 345.5728 Estimate of sigma: 0.117142
If mean monthly temperature profiles from all stations have the same
shape except a vertical shift and scale transformation, we may consider
the following NNR model
> S3 <- periodic(tx.dat$month) > f3.tmp <- ssr(y~1,rk=S3,data=tx.dat,spar="m") > f3.ini <- as.vector(S3%*%f3.tmp$rkpk.obj$c) > tx.fit4 <- nnr(y~f1(long,lat)+exp(f2(long,lat))*f3(month), func=list(f1(x,z)~list(~x+z,tp(list(x,z))), f2(x,z)~list(~x+z-1,tp(list(x,z))), f3(x)~list(periodic(x))), data=tx.dat,start=list(f1=mean(y),f2=0,f3=f3.ini)) > tx.fit4 Nonlinear Nonparametric Regression Model Fit by Gauss-Newton Method Model: y ~ f1(long, lat) + exp(f2(long, lat)) * f3(month) Data: tx.dat GCV estimate(s) of smoothing parameter(s): 2.576027e-06 1.736111e-03 4.471411e-07 Equivalent Degrees of Freedom (DF): 95.47625 Residual standard error: 0.9433764 Number of Observations: 576 Converged after 3 iterations
Contour plots of the estimates of and are shown in Figure . There is clear spatial effects to the mean temperature and amplitude. Southern part of the state is warmer and has less seasonal variation.
We now discuss two approaches two check the NNR model. We have fitted each station separately and saved the estimates without yearly mean in p.tx.fit1. One way to check if mean monthly temperature profiles from all stations have the same shape after removing yearly mean is to plot the estimates from one station against another to see if the points fall on a straight line. We rescale estimates from all stations such that all of them have amplitudes equal one. We then calculate Euclidean distances between stations. We select paired stations which have distances correspond to the 1%, 5%, 10%, 25%, 50%, 75%, 90%, 95% and 99% quantile of all possible paired stations. Figure shows plots of these estimates for these selected stations. Some deviation from the straight line can be seen when distance becomes large.
> d <- dist(diag(1/amp)%*%t(p.tx.fit1)) > st <- NULL > for (i in 1:47) { for (j in (i+1):48) st <- rbind(st,c(i,j))} > ord <- order(d) > tmp <- cbind(d[ord],st[ord,])
It is easy to see that model () reduces to
the model () iff
> grid2 <- data.frame(month=rep(seq(0,1,len=40), 48), lat=rep(TXtemp$lat[seq(1,nrow(TXtemp),by=360)] ,rep(40,48)), long=rep(TXtemp$long[seq(1, nrow(TXtemp),by=360)] ,rep(40,48))) > p.tx.fit3.f2 <- predict(tx.fit3,grid2[1:40,], terms=c(0,0,0,1,0,0,0,0),pstd=F)$fit > p.tx.fit3.f12 <- predict(tx.fit3,grid2, terms=c(0,0,0,0,1,1,1,1),pstd=F)$fitSuch a plot is shown in Figure which indicates certain deviation from the straight line.
For the purpose of illustration, we now fit the following additive
model
> tx.fit5 <- ssr(y~long+lat, rk=list(periodic(month), tp(list(long,lat))), data=tx.dat, spar="m") ... GML estimate(s) of smoothing parameter(s) : 0.05823508 16.36685971 Equivalent Degrees of Freedom (DF): 41.59342 Estimate of sigma: 1.773092 Number of Observations: 576
It is obvious that above additive model is a special case of the NNR model () with . We can compare three models using a model selection procedure such as GCV, AIC or BIC.
> n <- 576 > rss1 <- sum(tx.fit3$resi**2) > rss2 <- sum(tx.fit4$resi**2) > rss3 <- sum(tx.fit5$resi**2) > gcv1 <- rss1/n/(1-tx.fit3$df/n)**2 > gcv2 <- rss2/n/(1-tx.fit4$df$f/n)**2 > gcv3 <- rss3/n/(1-tx.fit5$df/n)**2 > aic1 <- n*log(rss1/n)+2*tx.fit3$df > aic2 <- n*log(rss2/n)+2*tx.fit4$df$f > aic3 <- n*log(rss3/n)+2*tx.fit5$df > bic1 <- n*log(rss1/n)+log(n)*tx.fit3$df > bic2 <- n*log(rss2/n)+log(n)*tx.fit4$df$f > bic3 <- n*log(rss3/n)+log(n)*tx.fit5$df > print(c(rss1,rss2,rss3)) [1] 3.161981 428.052072 1680.096971 > print(c(gcv1,gcv2,gcv3)) [1] 0.0343016 1.0672903 3.3885449 > print(c(aic1,aic2,aic3)) [1] -2306.88183 19.73069 699.79434 > print(c(bic1,bic2,bic3)) [1] -801.5293 435.1371 880.9798All model selection procedures choose the SS ANOVA model ().