The dataset, downloaded from the website
http://www.psych.mcgill.ca/faculty/ramsay/
fda.html,
includes mean monthly temperatures at 35 Canadian weather stations.
The left panel in Figure shows the plots of mean
temperature from 4 selected weather stations.
Ramsay and Silverman (1997) and Ramsay and Li (1998) used this dataset to illustrate the usefulness of Functional Data Analysis (FDA). Among others, they addressed the following two questions: how to measure the departure from a sinusoidal model and how to register the curves for further analyses. In the following, we approach these questions with SNM models.
We assume that mean temperatures over month at all weather
stations share a common shape function. Variation between
stations are modeled by vertical shift, vertical scale and
horizontal shift parameters. That is, we assume that
> canada.fit <- snm(temp~b1+exp(b2)*f(month-alogit(b3)), func=f(u)~list(~sin(2*pi*u)+cos(2*pi*u)-1,lspline(u,type=''sine0'')), fixed=list(b1~1), random=list(b1+b2+b3~1), cor=corAR1(), groups=~station, data=canadaTemp.dat, control=list(rkpk.control=list(limnla=c(-4,0)),prec.out=0.005)) > summary(canada.fit) ... AIC BIC logLik 1529.772 1607.5 -745.6426 Random effects: Formula: list(b1 ~ 1, b2 ~ 1, b3 ~ 1) Level: station Structure: General positive-definite, Log-Cholesky parametrization StdDev Corr b1 5.94427125 b1 b2 b2 0.30031948 -0.745 b3 0.06308751 -0.013 -0.462 Residual 1.48044625 Correlation Structure: AR(1) Formula: ~1 | station Parameter estimate(s): Phi 0.7258356 Fixed effects: list(b1 ~ 1) Value Std.Error DF t-value p-value b1 1.690498 0.6188728 385 2.731576 0.0066 GCV estimate(s) of smoothing parameter(s): 0.0001000000 Equivalent Degrees of Freedom (DF): 10.24364 Converged after 5 iterationsThe resulting fits are plotted in the left panel of Figure for the four selected stations. Fits for other stations are similar. The estimated correlation coefficient is with an approximate confidence interval . Thus there is evidence that serial correlation does exist in the data. Figure shows the overall fit of the common curve and its projections on subspaces (sinusoidal part) and (remaining part) together with their Bayesian confidence intervals. The small departure from sinusoidal form is statistically significant, especially in Spring and Fall. Our conclusion here is comparable to that of Ramsay and Silverman (1997), but their approach is on individual stations and does not provide a formal test.
By removing the horizontal shift in each curve, we can easily align all curves. Therefore our method can be used for curve registration.
grid <- NULL for (i in 1:35) { grid <- c(grid,seq(0,1,len=40)+ alogit(canada.fit$coef$random$station[70+i])+.5) } p.canada.2 <- predict(canada.fit, newdata=data.frame(month=grid, station=rep(1:35,rep(40,35))))The right panel in Figure displays the fitted curves for all 35 stations after aligning them together by removing shifts.
Below is new
Ramsay and Silverman (1997) also fitted various functional linear models (FLM) to this data set. The functional data can appear in the FLM as (i) the dependent variable, (ii) the independent variable, or (iii) both. We now show that those FLM are special cases of general smoothing spline models. Therefore they can be fitted by the ssr function.
To investigate geographical differences in the pattern of the
annual tempreture function, Ramsay and Silverman (1997)
divided Canada into four meteorological zones: Atlantic, Pacific,
Continental and Arctic. Then they considered the following FLM
(Model (9.1) in Ramsay and Silverman (1997))
Now consider expected temperature as a function of both zone
and time :
.
Suppose that we want to model zone effect using an
one-way ANOVA model and time effect using a periodic spline. Then
we have the following SS ANOVA decomposition
> temps <- matrix(scan(``monthtemp.dat'',0), 35, 12, byrow=T) > atlindex <- c(1,2,3,4,5,6,7,8,9,10,11,13,14,16) > pacindex <- c(25,26,27,28,29) > conindex <- c(12,15,17,18,19,20,21,22,23,24,30,31,35) > artindex <- c(32,33,34) > t <- seq(0.5, 11.5, 1)/12 > tgrid <- seq(0,1,len=50) > x <- apply(temps,1,sum) > n <- 35*12 > temp <- as.vector(t(temps[c(atlindex,pacindex,conindex,artindex),])) > grp <- as.factor(rep(1:4,12*c(14,5,13,3))) > tempdata <- data.frame(temp=temp,t=rep(t,35),grp=grp) > canada4 <- ssr(temp ~ grp, spar="m",data=tempdata, rk=list(periodic(t),rk.prod(periodic(t),shrink1(grp)))) > summary(canada4) Smoothing spline regression fit by GML method Call: ssr(formula = temp ~ grp, rk = list(periodic(t), rk.prod(periodic(t), shrink1(grp))), data = tempdata, spar = "m") Coefficients (d): (Intercept) grp2 grp3 grp4 4.787500 2.809167 -5.204167 -16.651389 GML estimate(s) of smoothing parameter(s) : 0.2622870 0.9971152 Equivalent Degrees of Freedom (DF): 23.74366 Estimate of sigma: 3.762337 Number of Observations: 420 > grid <- list(grp=as.factor(rep(1:4,rep(50,4))),t=rep(tgrid,4)) > p.canada4 <- predict(canada4,newdata=expand.grid(t=tgrid,grp=as.factor(1:4)), terms=c(1,1,1,1,0,1))
Figures and display the estimated zone effects and the fits for four regions. They are very similar to Figures 9.1 and 9.2 in Ramsay and Silverman (1997). In addition to the estimates, we can provide confidence intervals.
To investigate the possible replationship between total annual
precipitation and the temperature function, Ramsay and Silverman (1997)
considered the following FLM (Model (10.3))
Without loss of the generality, suppose that the time interval has been transformed into . That it, . It is reasonable to assume that is a smooth periodic function. Specifically, we model using cubic periodic spline space . Define linear functionals . are bounded when which we assume to be true. Then model () is a partial spline model.
Denote
where
.
Let be the reproducing kernel (RK) matrix for . From the
definition in Section , the th elements of the matrix
> prec <- matrix(scan(``monthprec.dat'',0), 35, 12, byrow=T) > y <- log(apply(prec,1,sum)) > R <- temps %*% periodic(t) %*% t(temps) > canada5 <- ssr(y ~ x, rk=R) > summary(canada5) Smoothing spline regression fit by GML method Call: ssr(formula = y ~ x, rk = R, spar = "m") Coefficients (d): (Intercept) x 6.433069274 0.006122233 GML estimate(s) of smoothing parameter(s) : 0.006772994 Equivalent Degrees of Freedom (DF): 4.062615 Estimate of sigma: 0.3588121 Number of Observations: 35
The estimated weight function is represented by
Bayesian confidence intervals are not available for non-evaluational functionals. Therefore we use the following bootstrap procedure to compute confidence intervals. We used simple percentile intervals. Other approaches may be used (Wang and Wahba, 1995).
> S <- periodic(tgrid,t)%*%t(temps) > f <- canada5$coef$d[2] + S%*%canada5$coef$c > nboot <- 999 > fb <- NULL > for (i in 1:nboot) { yb <- canada5$fit+sample(canada5$resi,35,replace=T) bfit <- ssr(yb ~ x, rk=R) fb <- cbind(fb,bfit$coef$d[2] + S%*%bfit$coef$c) } > lb <- apply(fb,1,quantile,prob=.025) > ub <- apply(fb,1,quantile,prob=.975)
We used the default GCV method to select the smoothing parameter in the above computation. We repeated the computation with the GML method to select the smoothing parameter. Figure displays the estimated regression weight functions. It is interesting to note that the estimates with GCV and GML choices of the smoothing parameter are similar to the upper left plot and the lower right plot in Figure 10.3 of Ramsay and Silverman (1997). As indicated in Figure 10.6 of Ramsay and Silverman (1997) that it is difficult to get a precise estimate of the smoothing parameter due to the small sample size. Interestingly that the GCV method picks an estimate of the smoothing parameter that is close to the lower bound and the GML method picks an estimate that is close to the upper bound. Wide confidence intervals are the consequence of the small sample size ().
To investigate the dependence of the complete log precipitation
profile on the complete tempreture profile,
Ramsay and Silverman (1997) considered the following FLM
The expected annual precipitation profile depends on two non-parametric
functions: and .
Again, we assume that , and and are
smooth periodic functions. Specifically, we assume that
,
and
. Equivalently,
we use the following SS ANOVA decomposition for and :
(92) |
Again, we discretize the annual precipitation profile by 12 months.
Model () becomes
Let
,
,
,
,
,
,
and
.
Then model () can be written in a matrix form
We have two parametric terms, and , and four non-parametric terms, , , and , in model (). To fit this model, we need to find RK matrices for four non-parametric terms. It is not difficult to see that the RK matrices for , and are , , , where is the RK of , , represents a matrix of evaluated at all combinations of , and is a matrix defined in (). To compute the RK matrix for , we define the linear functional and assume that it is bounded. Since the RK of a tensor product space is the product of the two RK's, the RK of is . Then elements of the repreducing kernel matrix for is
> prec <- matrix(scan(``monthprec.dat'',0), 35, 12, byrow=T) > y <- as.vector(t(log(prec))) > xx <- rep(x,rep(12,35)) > R1 <- kronecker(matrix(1,35,35),periodic(t)) > R2 <- kronecker(R,matrix(1,12,12)) > R3 <- kronecker(x%*%t(x),periodic(t)) > R4 <- rk.prod(R1,R2) > canada6 <- ssr(y ~ xx, rk=list(R1,R2,R3,R4)) > summary(canada6) Smoothing spline regression fit by GCV method Call: ssr(formula = y ~ xx, rk = list(R1, R2, R3, R4)) Coefficients (d): (Intercept) xx 4.745601623 0.003035296 GCV estimate(s) of smoothing parameter(s) : 1.727184e+03 3.630885e-02 1.416582e+07 4.597862e-04 Equivalent Degrees of Freedom (DF): 60.52553 Estimate of sigma: 0.3549322 Number of Observations: 420
We now show how to compute estimated functions evaluated at grid points. We stacked all observations as . Denote the corresponding months for as . Denote as the total number of observations. Then the estimated functions are represented by
> S1 <- kronecker(t(rep(1,35)),periodic(tgrid,t)) > S2 <- kronecker(S,t(rep(1,12))) > S3 <- kronecker(t(x),periodic(tgrid,t)) > S4 <- NULL > for (i in 1:50) {for (j in 1:50) S4 <- rbind(S4,S1[i,]*S2[j,])} > alpha <- canada6$coef$d[1] + n*canada6$lambda[1]*S1%*%canada6$coef$c > beta0 <- canada6$coef$d[2] > beta1 <- n*canada6$lambda[2]*S2%*%canada6$coef$c > beta2 <- n*canada6$lambda[3]*S3%*%canada6$coef$c > beta12 <- n*canada6$lambda[4]*S4%*%canada6$coef$c > beta <- beta0 + rep(beta1,rep(50,50)) + rep(beta2,50) + beta12 > nboot <- 99 > alphab <- betab0 <- betab1 <- betab2 <- betab12 <- betab <- NULL > for (i in 1:nboot) { yb <- canada6$fit+rnorm(n,sd=canada6$sig) bfit <- ssr(yb ~ xx, rk=list(R1,R2,R3,R4)) alphab <- cbind(alphab,bfit$coef$d[1] + n*bfit$lambda[1]*S1%*%bfit$coef$c) bb0 <- bfit$coef$d[2] bb1 <- n*bfit$lambda[2]*S2%*%bfit$coef$c bb2 <- n*bfit$lambda[3]*S3%*%bfit$coef$c bb12 <- n*bfit$lambda[4]*S4%*%bfit$coef$c betab0 <- c(betab0,bb0) betab1 <- cbind(betab1,bb1) betab2 <- cbind(betab2,bb2) betab12 <- cbind(betab12,bb12) betab <- cbind(betab, bb0 + rep(bb1,rep(50,50)) + rep(bb2,50) + bb12) } > lba <- apply(alphab,1,quantile,prob=.025) > uba <- apply(alphab,1,quantile,prob=.975) > lbb1 <- apply(betab1,1,quantile,prob=.025) > ubb1 <- apply(betab1,1,quantile,prob=.975) > lbb2 <- apply(betab2,1,quantile,prob=.025) > ubb2 <- apply(betab2,1,quantile,prob=.975) > lbb12 <- apply(betab12,1,quantile,prob=.025) > ubb12 <- apply(betab12,1,quantile,prob=.975) > lbb <- apply(betab,1,quantile,prob=.025) > ubb <- apply(betab,1,quantile,prob=.975)
Figure displays the estimates of and . The scale, as well as the GCV estimates of the smoothing parameters ( ), indicate that the interaction is not small. Figures and display splices of the estimated interaction function and their 95% bootstrap confidence intervals with one variable fixed approximately at the middle points of 12 months. Figures and display splices of the estimated function and their 95% bootstrap confidence intervals with one variable fixed approximately at the middle points of 12 months. Figure displays estimates of , and functions.
We also fit the data using the GML choice of the smoothing parameter.
> canada6.1 <- ssr(y ~ xx, rk=list(R1,R2,R3,R4),spar="m") > summary(canada6.1) Smoothing spline regression fit by GML method Call: ssr(formula = y ~ xx, rk = list(R1, R2, R3, R4), spar = "m") Coefficients (d): (Intercept) xx 4.7327087 0.0030902 GML estimate(s) of smoothing parameter(s) : 7.195469e-01 4.852421e-02 3.633548e+04 3.613019e+00 Equivalent Degrees of Freedom (DF): 27.24519 Estimate of sigma: 0.3733665 Number of Observations: 420
The GML estimates of smoothing parameters indicates that both the interaction and the main effect are small. Figure displays the estimates of and . Figure displays estimates of , and functions.