The dataset, downloaded from the website
http://www.psych.mcgill.ca/faculty/ramsay/
fda.html,
includes mean monthly temperatures at 35 Canadian weather stations.
The left panel in Figure shows the plots of mean
temperature from 4 selected weather stations.
![]() |
Ramsay and Silverman (1997) and Ramsay and Li (1998) used this dataset to illustrate the usefulness of Functional Data Analysis (FDA). Among others, they addressed the following two questions: how to measure the departure from a sinusoidal model and how to register the curves for further analyses. In the following, we approach these questions with SNM models.
We assume that mean temperatures over month at all weather
stations share a common shape function. Variation between
stations are modeled by vertical shift, vertical scale and
horizontal shift parameters. That is, we assume that
> canada.fit <- snm(temp~b1+exp(b2)*f(month-alogit(b3)), func=f(u)~list(~sin(2*pi*u)+cos(2*pi*u)-1,lspline(u,type=''sine0'')), fixed=list(b1~1), random=list(b1+b2+b3~1), cor=corAR1(), groups=~station, data=canadaTemp.dat, control=list(rkpk.control=list(limnla=c(-4,0)),prec.out=0.005)) > summary(canada.fit) ... AIC BIC logLik 1529.772 1607.5 -745.6426 Random effects: Formula: list(b1 ~ 1, b2 ~ 1, b3 ~ 1) Level: station Structure: General positive-definite, Log-Cholesky parametrization StdDev Corr b1 5.94427125 b1 b2 b2 0.30031948 -0.745 b3 0.06308751 -0.013 -0.462 Residual 1.48044625 Correlation Structure: AR(1) Formula: ~1 | station Parameter estimate(s): Phi 0.7258356 Fixed effects: list(b1 ~ 1) Value Std.Error DF t-value p-value b1 1.690498 0.6188728 385 2.731576 0.0066 GCV estimate(s) of smoothing parameter(s): 0.0001000000 Equivalent Degrees of Freedom (DF): 10.24364 Converged after 5 iterationsThe resulting fits are plotted in the left panel of Figure
![]() |
By removing the horizontal shift in each curve, we can easily align all curves. Therefore our method can be used for curve registration.
grid <- NULL for (i in 1:35) { grid <- c(grid,seq(0,1,len=40)+ alogit(canada.fit$coef$random$station[70+i])+.5) } p.canada.2 <- predict(canada.fit, newdata=data.frame(month=grid, station=rep(1:35,rep(40,35))))The right panel in Figure
Below is new
Ramsay and Silverman (1997) also fitted various functional linear models (FLM) to this data set. The functional data can appear in the FLM as (i) the dependent variable, (ii) the independent variable, or (iii) both. We now show that those FLM are special cases of general smoothing spline models. Therefore they can be fitted by the ssr function.
To investigate geographical differences in the pattern of the
annual tempreture function, Ramsay and Silverman (1997)
divided Canada into four meteorological zones: Atlantic, Pacific,
Continental and Arctic. Then they considered the following FLM
(Model (9.1) in Ramsay and Silverman (1997))
Now consider expected temperature as a function of both zone
and time
:
.
Suppose that we want to model zone effect using an
one-way ANOVA model and time effect using a periodic spline. Then
we have the following SS ANOVA decomposition
> temps <- matrix(scan(``monthtemp.dat'',0), 35, 12, byrow=T) > atlindex <- c(1,2,3,4,5,6,7,8,9,10,11,13,14,16) > pacindex <- c(25,26,27,28,29) > conindex <- c(12,15,17,18,19,20,21,22,23,24,30,31,35) > artindex <- c(32,33,34) > t <- seq(0.5, 11.5, 1)/12 > tgrid <- seq(0,1,len=50) > x <- apply(temps,1,sum) > n <- 35*12 > temp <- as.vector(t(temps[c(atlindex,pacindex,conindex,artindex),])) > grp <- as.factor(rep(1:4,12*c(14,5,13,3))) > tempdata <- data.frame(temp=temp,t=rep(t,35),grp=grp) > canada4 <- ssr(temp ~ grp, spar="m",data=tempdata, rk=list(periodic(t),rk.prod(periodic(t),shrink1(grp)))) > summary(canada4) Smoothing spline regression fit by GML method Call: ssr(formula = temp ~ grp, rk = list(periodic(t), rk.prod(periodic(t), shrink1(grp))), data = tempdata, spar = "m") Coefficients (d): (Intercept) grp2 grp3 grp4 4.787500 2.809167 -5.204167 -16.651389 GML estimate(s) of smoothing parameter(s) : 0.2622870 0.9971152 Equivalent Degrees of Freedom (DF): 23.74366 Estimate of sigma: 3.762337 Number of Observations: 420 > grid <- list(grp=as.factor(rep(1:4,rep(50,4))),t=rep(tgrid,4)) > p.canada4 <- predict(canada4,newdata=expand.grid(t=tgrid,grp=as.factor(1:4)), terms=c(1,1,1,1,0,1))
Figures and
display the estimated zone
effects and the fits for four regions. They are very similar to
Figures 9.1 and 9.2 in Ramsay and Silverman (1997). In addition
to the estimates, we can provide confidence intervals.
![]() |
To investigate the possible replationship between total annual
precipitation and the temperature function, Ramsay and Silverman (1997)
considered the following FLM (Model (10.3))
Without loss of the generality, suppose that the time interval
has been transformed into . That it,
. It is reasonable
to assume that
is a smooth periodic function. Specifically,
we model
using cubic periodic spline space
.
Define linear functionals
.
are bounded when
which we assume to be true.
Then model (
) is a partial spline model.
Denote
where
.
Let
be the reproducing kernel (RK) matrix for
. From the
definition in Section
, the
th elements of the matrix
> prec <- matrix(scan(``monthprec.dat'',0), 35, 12, byrow=T) > y <- log(apply(prec,1,sum)) > R <- temps %*% periodic(t) %*% t(temps) > canada5 <- ssr(y ~ x, rk=R) > summary(canada5) Smoothing spline regression fit by GML method Call: ssr(formula = y ~ x, rk = R, spar = "m") Coefficients (d): (Intercept) x 6.433069274 0.006122233 GML estimate(s) of smoothing parameter(s) : 0.006772994 Equivalent Degrees of Freedom (DF): 4.062615 Estimate of sigma: 0.3588121 Number of Observations: 35
The estimated weight function is represented by
Bayesian confidence intervals are not available for non-evaluational functionals. Therefore we use the following bootstrap procedure to compute confidence intervals. We used simple percentile intervals. Other approaches may be used (Wang and Wahba, 1995).
> S <- periodic(tgrid,t)%*%t(temps) > f <- canada5$coef$d[2] + S%*%canada5$coef$c > nboot <- 999 > fb <- NULL > for (i in 1:nboot) { yb <- canada5$fit+sample(canada5$resi,35,replace=T) bfit <- ssr(yb ~ x, rk=R) fb <- cbind(fb,bfit$coef$d[2] + S%*%bfit$coef$c) } > lb <- apply(fb,1,quantile,prob=.025) > ub <- apply(fb,1,quantile,prob=.975)
We used the default GCV method to select the smoothing parameter
in the above computation. We repeated the computation with
the GML method to select the smoothing parameter.
Figure displays the estimated regression weight
functions. It is interesting to note that the estimates with
GCV and GML choices of the smoothing parameter are similar to
the upper left plot and the lower right plot in Figure 10.3 of
Ramsay and Silverman (1997). As indicated in Figure 10.6
of Ramsay and Silverman (1997) that it is difficult to
get a precise estimate of the smoothing parameter due to the
small sample size. Interestingly that the GCV method picks
an estimate of the smoothing parameter that is close to the
lower bound and the GML method picks an estimate that is close
to the upper bound. Wide confidence intervals are the consequence
of the small sample size (
).
![]() |
To investigate the dependence of the complete log precipitation
profile on the complete tempreture profile,
Ramsay and Silverman (1997) considered the following FLM
The expected annual precipitation profile depends on two non-parametric
functions: and
.
Again, we assume that
, and
and
are
smooth periodic functions. Specifically, we assume that
,
and
. Equivalently,
we use the following SS ANOVA decomposition for
and
:
![]() |
![]() |
![]() |
|
![]() |
![]() |
![]() |
(92) |
Again, we discretize the annual precipitation profile by 12 months.
Model () becomes
Let
,
,
,
,
,
,
and
.
Then model () can be written in a matrix form
We have two parametric terms, and
, and four non-parametric terms,
,
,
and
, in model (
). To fit this
model, we need to find RK matrices for four
non-parametric terms. It is not difficult to see that the RK
matrices for
,
and
are
,
,
,
where
is the RK of
,
,
represents a
matrix of
evaluated at all combinations of
, and
is a
matrix defined in (
).
To compute the RK matrix for
,
we define the linear functional
and assume that it is bounded. Since the RK
of a tensor product space is the product of the two RK's, the RK of
is
.
Then elements of the repreducing kernel matrix for
is
> prec <- matrix(scan(``monthprec.dat'',0), 35, 12, byrow=T) > y <- as.vector(t(log(prec))) > xx <- rep(x,rep(12,35)) > R1 <- kronecker(matrix(1,35,35),periodic(t)) > R2 <- kronecker(R,matrix(1,12,12)) > R3 <- kronecker(x%*%t(x),periodic(t)) > R4 <- rk.prod(R1,R2) > canada6 <- ssr(y ~ xx, rk=list(R1,R2,R3,R4)) > summary(canada6) Smoothing spline regression fit by GCV method Call: ssr(formula = y ~ xx, rk = list(R1, R2, R3, R4)) Coefficients (d): (Intercept) xx 4.745601623 0.003035296 GCV estimate(s) of smoothing parameter(s) : 1.727184e+03 3.630885e-02 1.416582e+07 4.597862e-04 Equivalent Degrees of Freedom (DF): 60.52553 Estimate of sigma: 0.3549322 Number of Observations: 420
We now show how to compute estimated functions evaluated at grid points.
We stacked all observations as
. Denote the corresponding months
for
as
.
Denote
as the total number of observations.
Then the estimated functions are represented by
> S1 <- kronecker(t(rep(1,35)),periodic(tgrid,t)) > S2 <- kronecker(S,t(rep(1,12))) > S3 <- kronecker(t(x),periodic(tgrid,t)) > S4 <- NULL > for (i in 1:50) {for (j in 1:50) S4 <- rbind(S4,S1[i,]*S2[j,])} > alpha <- canada6$coef$d[1] + n*canada6$lambda[1]*S1%*%canada6$coef$c > beta0 <- canada6$coef$d[2] > beta1 <- n*canada6$lambda[2]*S2%*%canada6$coef$c > beta2 <- n*canada6$lambda[3]*S3%*%canada6$coef$c > beta12 <- n*canada6$lambda[4]*S4%*%canada6$coef$c > beta <- beta0 + rep(beta1,rep(50,50)) + rep(beta2,50) + beta12 > nboot <- 99 > alphab <- betab0 <- betab1 <- betab2 <- betab12 <- betab <- NULL > for (i in 1:nboot) { yb <- canada6$fit+rnorm(n,sd=canada6$sig) bfit <- ssr(yb ~ xx, rk=list(R1,R2,R3,R4)) alphab <- cbind(alphab,bfit$coef$d[1] + n*bfit$lambda[1]*S1%*%bfit$coef$c) bb0 <- bfit$coef$d[2] bb1 <- n*bfit$lambda[2]*S2%*%bfit$coef$c bb2 <- n*bfit$lambda[3]*S3%*%bfit$coef$c bb12 <- n*bfit$lambda[4]*S4%*%bfit$coef$c betab0 <- c(betab0,bb0) betab1 <- cbind(betab1,bb1) betab2 <- cbind(betab2,bb2) betab12 <- cbind(betab12,bb12) betab <- cbind(betab, bb0 + rep(bb1,rep(50,50)) + rep(bb2,50) + bb12) } > lba <- apply(alphab,1,quantile,prob=.025) > uba <- apply(alphab,1,quantile,prob=.975) > lbb1 <- apply(betab1,1,quantile,prob=.025) > ubb1 <- apply(betab1,1,quantile,prob=.975) > lbb2 <- apply(betab2,1,quantile,prob=.025) > ubb2 <- apply(betab2,1,quantile,prob=.975) > lbb12 <- apply(betab12,1,quantile,prob=.025) > ubb12 <- apply(betab12,1,quantile,prob=.975) > lbb <- apply(betab,1,quantile,prob=.025) > ubb <- apply(betab,1,quantile,prob=.975)
Figure displays the estimates of
and
. The scale, as well as the GCV estimates of the
smoothing parameters (
), indicate that the
interaction
is not small. Figures
and
display splices of the estimated interaction
function
and their 95% bootstrap confidence intervals
with one variable fixed approximately at the middle points of 12
months. Figures
and
display splices
of the estimated function
and their 95% bootstrap
confidence intervals with one variable fixed approximately at the
middle points of 12 months. Figure
displays
estimates of
,
and
functions.
We also fit the data using the GML choice of the smoothing parameter.
> canada6.1 <- ssr(y ~ xx, rk=list(R1,R2,R3,R4),spar="m") > summary(canada6.1) Smoothing spline regression fit by GML method Call: ssr(formula = y ~ xx, rk = list(R1, R2, R3, R4), spar = "m") Coefficients (d): (Intercept) xx 4.7327087 0.0030902 GML estimate(s) of smoothing parameter(s) : 7.195469e-01 4.852421e-02 3.633548e+04 3.613019e+00 Equivalent Degrees of Freedom (DF): 27.24519 Estimate of sigma: 0.3733665 Number of Observations: 420
The GML estimates of smoothing parameters indicates that both
the interaction and the main effect
are small. Figure
displays the estimates of
and
. Figure
displays
estimates of
,
and
functions.
![]() |