In many applications two or more dependent variables are observed at several values of independent variables, for example at multiple time points. Often observations from different variables are contemporaneously correlated, and observations from the same variable may also be correlated. The statistical problems are (i) to estimate functions that model their dependence on the independent variables and (ii) to investigate relationships between these functions. Wang et al. (2000) proved that the joint estimates have smaller posterior variances than the function-by-function estimates and are therefore more efficient. In this section we show how to use ssr and nnr to fit functions jointly.
Consider the following model:
$$
y_{ji} = f_j(t_{ji}) + \epsilon_{ji}, \qquad i=1,\ldots,n_j; \ \ j=1,\ldots,J.
$$
Denote
$y_j=(y_{j1},\ldots,y_{jn_j})^T$,
$f_j=(f_j(t_{j1}),\ldots,f_j(t_{jn_j}))^T$,
$\epsilon_j=(\epsilon_{j1},\ldots,\epsilon_{jn_j})^T$,
$y=(y_1^T,\ldots,y_J^T)^T$,
$f=(f_1^T,\ldots,f_J^T)^T$,
$\epsilon=(\epsilon_1^T,\ldots,\epsilon_J^T)^T$, and
$N=\sum_{j=1}^J n_j$.
Assume that
$\epsilon \sim \mathrm{N}(0, \sigma^2 W^{-1})$,
where $W$ depends on some parameters $\tau$.
We estimate all the functions $f_j$ jointly as the minimizer of the
following penalized weighted least squares criterion:
$$
(y-f)^T W (y-f) + N \sum_{j=1}^J \lambda_j \|P_{1j} f_j\|^2,
$$
where $P_{1j}$ is the projection onto the penalized subspace of the model space of $f_j$ and the $\lambda_j$ are smoothing parameters.
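As a rough numerical illustration of this criterion (a sketch only, not the ASSIST implementation), the following Python fragment minimizes a discretized analogue for a single function, replacing the RKHS penalty by squared second differences of the function values; the weight matrix, penalty, and smoothing parameter are all hypothetical choices:

```python
import numpy as np

# Sketch: minimize (y-f)' W (y-f) + lam * f' Omega f over the vector f.
# The closed-form minimizer is f = (W + lam*Omega)^{-1} W y.
rng = np.random.default_rng(0)
n = 50
t = np.arange(1, n + 1) / n
y = np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal(n)

# Hypothetical weight matrix W: inverse of an AR(1)-style error covariance.
rho = 0.5
cov = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
W = np.linalg.inv(cov)

# Roughness penalty Omega built from second differences (discrete curvature).
D = np.diff(np.eye(n), 2, axis=0)      # (n-2) x n second-difference matrix
Omega = D.T @ D

lam = 1.0                              # hypothetical smoothing parameter
f_hat = np.linalg.solve(W + lam * Omega, W @ y)
```

The solve call is just the normal equation of the quadratic criterion; increasing `lam` trades fidelity to `y` for smoothness.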
We now show how to trick ssr into fitting this model. For simplicity of notation we consider the case of $J=2$; situations with $J>2$ can be fitted similarly. Note that the domains $\mathcal{T}_j$ of the $f_j$ may differ for different $j$. In most applications they are the same, which is assumed in the remainder of this section. Denote the common domain as $\mathcal{T}$. Rewrite $f_j(t)$ as $f(j,t)$, which is now considered as a function of both the $j$ and $t$ variables on the tensor product domain $\{1,2\} \otimes \mathcal{T}$. Then we can represent the original functions as
$$
f(j,t) = I_{[j=1]}\, f_1(t) + I_{[j=2]}\, f_2(t),
$$
where $I_{[A]}$ denotes the indicator of the event $A$.
We may reparametrize as
$$
f(j,t) = f_1(t) + I_{[j=2]}\, g(t),
$$
where $g = f_2 - f_1$ is the difference between the two functions.
For the first representation, let $\mathcal{H}_j = \mathcal{H}_{j0} \oplus \mathcal{H}_{j1}$ be the model space of $f_j$, with $R_j$ the reproducing kernel (RK) of $\mathcal{H}_{j1}$. Then
$$
f \in \big( I_{[j=1]} \otimes \mathcal{H}_{10} \big) \oplus \big( I_{[j=2]} \otimes \mathcal{H}_{20} \big) \oplus \big( I_{[j=1]} \otimes \mathcal{H}_{11} \big) \oplus \big( I_{[j=2]} \otimes \mathcal{H}_{21} \big),
$$
and the RKs of the two penalized subspaces are $I_{[j=1]} R_1$ and $I_{[j=2]} R_2$.
For the reparametrized representation, let $\mathcal{H}_1 = \mathcal{H}_{10} \oplus \mathcal{H}_{11}$ and $\mathcal{G} = \mathcal{G}_0 \oplus \mathcal{G}_1$ be the model spaces of $f_1$ and $g$, with $R_g$ the RK of $\mathcal{G}_1$. Then
$$
f \in \mathcal{H}_{10} \oplus \big( I_{[j=2]} \otimes \mathcal{G}_0 \big) \oplus \mathcal{H}_{11} \oplus \big( I_{[j=2]} \otimes \mathcal{G}_1 \big),
$$
and the RKs of the two penalized subspaces are $R_1$ and $I_{[j=2]} R_g$.
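To make the indicator notation concrete, a small Python check (illustrative only, with hypothetical curves; not ASSIST code) confirms that the two representations define the same bivariate function on $\{1,2\} \otimes \mathcal{T}$:

```python
import numpy as np

# Hypothetical pair of curves on a grid, for illustration only.
t = np.linspace(0, 1, 101)
f1 = np.sin(2 * np.pi * t)
f2 = np.sin(2 * np.pi * t) + 2 * t
g = f2 - f1                        # difference function in the reparametrization

def f_rep1(j):
    """First representation: f(j,t) = I[j=1] f1(t) + I[j=2] f2(t)."""
    return (j == 1) * f1 + (j == 2) * f2

def f_rep2(j):
    """Reparametrized form: f(j,t) = f1(t) + I[j=2] g(t)."""
    return f1 + (j == 2) * g

# The two representations agree for both curves at every t.
for j in (1, 2):
    assert np.allclose(f_rep1(j), f_rep2(j))
```

The same identity is what lets the joint fits below swap `rk.prod(cubic(t), kron(id==0))` for a shared `cubic(t)` term plus a difference term.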
Often we are interested in possible relationships, if any,
between $f_1$ and $f_2$. For example, one may want to check
whether they are equal or parallel. Let $\mu$ be a probability measure
on $\{1,2\}$ assigning probability $1/2$ to each point. Consider the following SS ANOVA decomposition:
$$
f(j,t) = \frac{f_1(t)+f_2(t)}{2} + \big( I_{[j=1]} - I_{[j=2]} \big)\, \frac{f_1(t)-f_2(t)}{2}
\;\triangleq\; s(t) + \big( I_{[j=1]} - I_{[j=2]} \big)\, ss(t),
$$
where $s$ is the average of the two functions and $ss$ is the signed half-difference. Then $f_1 = f_2$ if and only if $ss \equiv 0$, and the two functions are parallel if and only if $ss$ is constant in $t$.
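The roles of $s$ and $ss$ can be verified numerically. The sketch below (hypothetical curves chosen to match the simulation later in this section; not ASSIST code) recovers both components and evaluates the equality and parallelism conditions:

```python
import numpy as np

t = np.linspace(0, 1, 201)
f1 = np.sin(2 * np.pi * t)           # hypothetical first curve
f2 = np.sin(2 * np.pi * t) + 2 * t   # hypothetical second curve

# SS ANOVA components under the uniform measure on {1,2}:
s = (f1 + f2) / 2                    # average of the two curves
ss = (f1 - f2) / 2                   # signed half-difference (interaction)

# Reconstruction: f(1,t) = s + ss and f(2,t) = s - ss.
assert np.allclose(s + ss, f1)
assert np.allclose(s - ss, f2)

# f1 == f2 iff ss vanishes; f1 and f2 are parallel iff ss is constant in t.
equal = np.allclose(ss, 0)
parallel = np.allclose(ss, ss[0])
print(equal, parallel)               # False False: here ss(t) = -t
```

Because $ss(t) = -t$ for these curves, they are neither equal nor parallel, which is what the joint fits below are designed to detect.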
For illustration, we generate a data set from the following model:
$$
y_{1i} = \sin(2\pi t_i) + \epsilon_{1i}, \qquad
y_{2i} = \sin(2\pi t_i) + 2 t_i + \epsilon_{2i}, \qquad
t_i = i/100,\ \ i=1,\ldots,100,
$$
where the pairs $(\epsilon_{1i},\epsilon_{2i})^T$ are iid bivariate normal with mean zero, standard deviations $0.5$ and $1$, and correlation $0.5$.
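In the simulation code below the errors are generated as $\epsilon = A z$ with $z \sim \mathrm{N}(0, I_2)$. A quick Python check mirroring the R construction (remember that R's `matrix()` fills by column) confirms the implied standard deviations and correlation:

```python
import numpy as np

s1, s2, r = 0.5, 1.0, 0.5
# Same construction as the R code: matrix(c(sqrt(1-r^2),0,r,1),2,2),
# which fills column-wise, premultiplied by diag(s1, s2).
A = np.diag([s1, s2]) @ np.array([[np.sqrt(1 - r**2), r],
                                  [0.0,               1.0]])
Sigma = A @ A.T          # covariance of (eps1, eps2) = A z, z ~ N(0, I)

sd1 = np.sqrt(Sigma[0, 0])
sd2 = np.sqrt(Sigma[1, 1])
corr = Sigma[0, 1] / (sd1 * sd2)
print(sd1, sd2, corr)    # approximately 0.5, 1.0, 0.5
```

This matches the `varIdent` variance ratio (about 2 between the two curves) and the `corSymm` correlation (about 0.5) estimated by the joint fits.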
Suppose we want to use cubic splines to model $f_1$ and $f_2$ in the first representation, $f_1$ and $g$ in the reparametrized representation, and $s$ and $ss$ in the SS ANOVA decomposition. Then the parametric (null) spaces are all spanned by $\{1,\ t-0.5\}$, and the RKs of the penalized subspaces all equal the RK of a cubic spline.
In the following we first fit $f_1$ and $f_2$ separately using the marginal data, as bisp.fit1 and bisp.fit2 respectively. Then we fit both functions jointly: using the first representation as bisp.fit3, the reparametrized representation as bisp.fit4, and the SS ANOVA decomposition as bisp.fit5.
```r
> options(contrasts=rep("contr.treatment", 2))
> n <- 100
> s1 <- .5
> s2 <- 1
> r <- .5
> A <- diag(c(s1,s2))%*%matrix(c(sqrt(1-r**2),0,r,1),2,2)
> e <- NULL
> for (i in 1:n) e <- c(e,A%*%rnorm(2))
> t <- 1:n/n
> y1 <- sin(2*pi*t) + e[seq(1,2*n,by=2)]
> y2 <- sin(2*pi*t) + 2*t + e[seq(2,2*n,by=2)]
> bisp.dat <- data.frame(y=c(y1,y2), t=rep(t,2),
      id=rep(c(0,1),rep(n,2)), pair=rep(1:n,2))

# fit each curve separately
> bisp.fit1 <- ssr(y~I(t-.5), rk=cubic(t), spar="m",
      data=bisp.dat[bisp.dat$id==0,])
> p.bisp.fit1 <- predict(bisp.fit1)
> bisp.fit2 <- ssr(y~I(t-.5), rk=cubic(t), spar="m",
      data=bisp.dat[bisp.dat$id==1,])
> p.bisp.fit2 <- predict(bisp.fit2)

# fit both curves jointly
> bisp.fit3 <- ssr(y~id*I(t-.5),
      rk=list(rk.prod(cubic(t),kron(id==0)),
              rk.prod(cubic(t),kron(id==1))),
      spar="m", weights=varIdent(form=~1|id),
      cor=corSymm(form=~1|pair), data=bisp.dat)
> bisp.fit3
...
GML estimate(s) of smoothing parameter(s) : 0.2793703 0.3788567
Equivalent Degrees of Freedom (DF):  11.84544
Estimate of sigma:  0.4616973

Correlation structure of class corSymm representing
 Correlation:
   1
 2 0.523
Variance function structure of class varIdent representing
 0        1
 1 2.270982

> p.bisp.fit3.1 <- predict(bisp.fit3, newdata=bisp.dat[bisp.dat$id==0,],
      terms=c(1,0,1,0,1,0))
> p.bisp.fit3.2 <- predict(bisp.fit3, newdata=bisp.dat[bisp.dat$id==1,],
      terms=c(1,1,1,1,0,1))
> p.bisp.fit3.1$pstd <- p.bisp.fit3.1$pstd*
      sqrt((2*n-bisp.fit3$df)/(2*n-bisp.fit3$df-1))
> p.bisp.fit3.2$pstd <- p.bisp.fit3.2$pstd*
      sqrt((2*n-bisp.fit3$df)/(2*n-bisp.fit3$df-1))

> bisp.fit4 <- ssr(y~id*I(t-.5),
      rk=list(cubic(t), rk.prod(cubic(t),kron(id==1))),
      spar="m", weights=varIdent(form=~1|id),
      cor=corSymm(form=~1|pair), data=bisp.dat)
> p.bisp.fit4.1 <- predict(bisp.fit4, newdata=bisp.dat[bisp.dat$id==0,],
      terms=c(1,0,1,0,1,0))
> p.bisp.fit4.2 <- predict(bisp.fit4, newdata=bisp.dat[bisp.dat$id==1,],
      terms=c(1,1,1,1,1,1))
> p.bisp.fit4.3 <- predict(bisp.fit4, newdata=bisp.dat[bisp.dat$id==1,],
      terms=c(0,1,0,1,0,1))
> p.bisp.fit4.4 <- predict(bisp.fit4, newdata=bisp.dat[bisp.dat$id==1,],
      terms=c(0,0,0,1,0,1))
> p.bisp.fit4.1$pstd <- p.bisp.fit4.1$pstd*
      sqrt((2*n-bisp.fit4$df)/(2*n-bisp.fit4$df-1))
> p.bisp.fit4.2$pstd <- p.bisp.fit4.2$pstd*
      sqrt((2*n-bisp.fit4$df)/(2*n-bisp.fit4$df-1))
> p.bisp.fit4.3$pstd <- p.bisp.fit4.3$pstd*
      sqrt((2*n-bisp.fit4$df)/(2*n-bisp.fit4$df-1))
> p.bisp.fit4.4$pstd <- p.bisp.fit4.4$pstd*
      sqrt((2*n-bisp.fit4$df)/(2*n-bisp.fit4$df-1))

> bisp.fit5 <- ssr(y~id*I(t-.5),
      rk=list(cubic(t), rk.prod(cubic(t),kron((id==0)-(id==1)))),
      spar="m", weights=varIdent(form=~1|id),
      cor=corSymm(form=~1|pair), data=bisp.dat)
> p.bisp.fit5.1 <- predict(bisp.fit5, newdata=bisp.dat[bisp.dat$id==0,],
      terms=c(1,0,1,0,1,1))
> p.bisp.fit5.2 <- predict(bisp.fit5, newdata=bisp.dat[bisp.dat$id==1,],
      terms=c(1,1,1,1,1,1))
> p.bisp.fit5.3 <- predict(bisp.fit5, newdata=bisp.dat[bisp.dat$id==1,],
      terms=c(0,1,0,1,0,1))
> p.bisp.fit5.4 <- predict(bisp.fit5, newdata=bisp.dat[bisp.dat$id==1,],
      terms=c(0,0,0,1,0,1))
> p.bisp.fit5.1$pstd <- p.bisp.fit5.1$pstd*
      sqrt((2*n-bisp.fit5$df)/(2*n-bisp.fit5$df-1))
> p.bisp.fit5.2$pstd <- p.bisp.fit5.2$pstd*
      sqrt((2*n-bisp.fit5$df)/(2*n-bisp.fit5$df-1))
> p.bisp.fit5.3$pstd <- p.bisp.fit5.3$pstd*
      sqrt((2*n-bisp.fit5$df)/(2*n-bisp.fit5$df-1))
> p.bisp.fit5.4$pstd <- p.bisp.fit5.4$pstd*
      sqrt((2*n-bisp.fit5$df)/(2*n-bisp.fit5$df-1))
```
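The posterior-standard-deviation inflation factor $\sqrt{(2n-\mathrm{df})/(2n-\mathrm{df}-1)}$ used above is very close to one. A quick Python check, plugging in the equivalent degrees of freedom reported for bisp.fit3, shows how small the adjustment is:

```python
import math

n = 100          # observations per curve, so 2n observations in total
df = 11.84544    # equivalent degrees of freedom reported for bisp.fit3
# One extra degree of freedom is charged for the estimated correlation.
inflate = math.sqrt((2 * n - df) / (2 * n - df - 1))
print(round(inflate, 6))   # about 1.0027
```

The adjustment widens the intervals by well under one percent here, but keeps the accounting of estimated parameters honest.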
For each fit, we calculate the estimated functions and their posterior
variances. For fits bisp.fit4 and bisp.fit5,
we also calculate the estimates and posterior variances of
the difference between the two functions and of its nonconstant part;
these are used to check whether $f_1 = f_2$ and whether the two
functions are parallel, respectively. Here the six elements of the
terms vectors select among the four fixed-effect terms
(1, id, I(t-.5), id:I(t-.5)) followed by the two rk terms.
Note that we inflate the posterior variances by one more degree of
freedom to account for estimating the correlation parameter. These
estimates are shown in the figures below.
Even though it is not obvious from the figures, the Bayesian
confidence intervals of the joint estimates are uniformly narrower
than those of the function-by-function estimates. It is also clear
from the figures that the two functions are neither equal nor parallel.