Inferences

Consider the following Bayesian model

$\begin{displaymath} y_i =L_iF+\epsilon_i, \ \ \ \ i=1, \cdots, n, \end{displaymath}$

with the prior for $F$ given by

$\begin{displaymath} F(t) =\sum_{\nu=1}^M d_\nu\phi_\nu(t) + \tau^{1/2} X(t), \ \ \ \ t\in {\cal T}, \end{displaymath}$

where $\mbox{\boldmath$d$}=(d_1, \cdots, d_M)^T\sim N(0, a I)$, $a$ and $\tau$ are positive constants, and $X(t)$ is a zero mean Gaussian stochastic process independent of $\mbox{\boldmath$d$}$ with covariance $R_1(s,t)$. Wahba (1978) showed that $\lim_{a\rightarrow\infty}\mbox{E}(F(t)\vert\mbox{\boldmath$y$})=\hat{f}_{\lambda}(t)$ with $\lambda=\sigma^2/(n\tau)$, where $\sigma^2$ is the variance of the random errors $\epsilon_i$. Formulae for computing posterior means and variances were provided in Gu and Wahba (1993b). Posterior variances can be used to construct confidence intervals for $\hat{f}_{\lambda}(t)$:

$\displaystyle \hat{f}_{\lambda}(t)\pm z_{1-\alpha/2} \sqrt{\mbox{Var}(\hat{f}_{\lambda}(t)\vert\mbox{\boldmath$y$})},$

(13)

where $z_{1-\alpha/2}$ is the $1-\alpha/2$ quantile of a standard normal distribution (Wahba, 1990). The intervals defined in (13) are referred to as Bayesian confidence intervals (Wahba, 1983). These Bayesian confidence intervals are not point-wise confidence intervals. Rather, they provide across-the-function coverage: the nominal level $1-\alpha$ holds approximately when coverage is averaged over the design points, not at each individual point (Nychka, 1988; Wang and Wahba, 1995).
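The interval in (13) can be sketched in a few lines of Python. This is a minimal illustration only, assuming the posterior means and posterior variances have already been computed elsewhere; the function name `bayesian_ci` and the numbers in the usage lines are hypothetical and are not part of the software described here.

```python
from math import sqrt
from statistics import NormalDist


def bayesian_ci(post_mean, post_var, alpha=0.05):
    """Return (lower, upper) limits of the intervals in (13).

    post_mean and post_var are sequences of posterior means and posterior
    variances evaluated on the same grid of points (assumed inputs).
    """
    if len(post_mean) != len(post_var):
        raise ValueError("post_mean and post_var must have the same length")
    # z_{1 - alpha/2}: the 1 - alpha/2 quantile of the standard normal
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower, upper = [], []
    for m, v in zip(post_mean, post_var):
        half_width = z * sqrt(v)
        lower.append(m - half_width)
        upper.append(m + half_width)
    return lower, upper


# Hypothetical posterior summaries at three grid points
fit = [0.10, 0.35, 0.80]
var = [0.04, 0.01, 0.09]
lo, hi = bayesian_ci(fit, var)
```

Because the coverage is across-the-function, the limits are best read jointly over the whole grid rather than point by point.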

Often one needs to test

$\begin{displaymath} H_0:~f\in{\cal H}_0~~~~\mbox{against}~~~~ H_1:~f\notin {\cal H}_0. \end{displaymath}$

This hypothesis is equivalent to $\tau=0$ or, since $\lambda=\sigma^2/(n\tau)$, to $\lambda=\infty$. Three tests were considered in Wahba (1990): locally most powerful (LMP), GCV and GML tests. Let

$\begin{displaymath}T = (Q_1 ~~ Q_2) \left( \begin{array}{c} R \\ 0 \end{array} \right) \end{displaymath}$

be the QR decomposition of $T$, and $U\,\mbox{diag}(\lambda_{1n},\cdots,\lambda_{n-M,n})\,U^T$ be the eigenvalue decomposition of $Q_2^T \Sigma Q_2$ with eigenvalues $\lambda_{vn},~v=1,\cdots,n-M$. Let $\mbox{\boldmath$z$}=(z_1,\cdots,z_{n-M})^T=U^TQ_2^T\mbox{\boldmath$y$}$. Then the test statistics for the LMP, GCV and GML tests are

$\begin{displaymath} t_{\mbox{\scriptsize {LMP approx}}} = \sum_{v=1}^{n-M} \lambda_{vn} z_v^2 / \sum_{v=1}^{n-M} z_v^2 , \end{displaymath}$

$\begin{displaymath} t_{\mbox{\scriptsize {GCV}}} = \frac{\sum_{v=1}^{n-M} (z_v^2... ...a} \lambda_{vn})^2)} \times \frac{1}{\sum_{v=1}^{n-M} z_v^2}, \end{displaymath}$

and

$\begin{displaymath} t_{\mbox{\scriptsize {GML}}} = \frac{\sum_{v=1}^{n-M} (z_v^2... ...bda_{vn})^{-1/(n-M)}} \times \frac{1}{\sum_{v=1}^{n-M} z_v^2}, \end{displaymath}$

where $\hat{\gamma}=1/(n\hat{\lambda})$. It can be shown that under the corresponding Bayesian model, the LMP test is the score test and the GML test is the likelihood ratio test. Furthermore, the GCV test is closely related to the F-test based on the extra sum of squares principle (Liu and Wang, 2004). Usually the p-values cannot be calculated analytically because the distributions of the test statistics under $H_0$ are unknown. Standard theory for likelihood ratio tests does not apply because the parameter is on the boundary under the null hypothesis. The non-standard asymptotic theory developed by Self and Liang (1987) does not apply either because of the lack of replicated observations. Monte Carlo methods can be used to approximate the p-values. However, they are usually computationally intensive since the smoothing parameter needs to be estimated for each Monte Carlo sample. In the current version, through the utility function anova, Monte Carlo p-values are calculated with fixed smoothing parameters. The Monte Carlo sample size is specified by the option simu.size. anova also provides approximate p-values for the GML test based on a mixture of two $\chi^2$ distributions (Self and Liang, 1987), even though they tend to be conservative. Methods developed in Liu and Wang (2004) and Liu et al. (2004) will be implemented in the future.
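As an illustration of the Monte Carlo idea, the following Python sketch approximates a p-value for the LMP statistic, which does not involve the smoothing parameter. It is not the code behind anova. It assumes that the eigenvalues $\lambda_{vn}$ and the vector $\mbox{\boldmath$z$}$ are already available, that under $H_0$ the components of $\mbox{\boldmath$z$}$ are independent mean-zero normals with a common variance (so the ratio is scale free and can be simulated with unit variance), and that large values of the statistic count against $H_0$. The helper `chisq_mixture_pvalue` further assumes that the mixture is an equal-weight mixture of $\chi^2_0$ and $\chi^2_1$ and takes a likelihood ratio statistic as input. All function names and the parameter name `simu_size` are hypothetical.

```python
import random
from math import erfc, sqrt


def lmp_statistic(eigvals, z):
    """LMP statistic: sum(lambda_v * z_v^2) / sum(z_v^2)."""
    num = sum(lam * zv * zv for lam, zv in zip(eigvals, z))
    den = sum(zv * zv for zv in z)
    return num / den


def lmp_mc_pvalue(eigvals, z, simu_size=1000, seed=0):
    """Monte Carlo p-value: share of simulated statistics >= the observed one.

    Assumes z has i.i.d. N(0, sigma^2) components under H0; the statistic is
    a ratio, so sigma cancels and standard normals can be simulated.
    """
    rng = random.Random(seed)
    t_obs = lmp_statistic(eigvals, z)
    exceed = 0
    for _ in range(simu_size):
        z_sim = [rng.gauss(0.0, 1.0) for _ in eigvals]
        if lmp_statistic(eigvals, z_sim) >= t_obs:
            exceed += 1
    return exceed / simu_size


def chisq_mixture_pvalue(lrt):
    """Upper tail of 0.5*chi2_0 + 0.5*chi2_1 at a likelihood ratio statistic.

    The equal weights and the degrees of freedom are assumptions here.
    """
    if lrt <= 0:
        return 1.0
    # P(chi2_1 > x) = erfc(sqrt(x / 2)); the chi2_0 part is a point mass at 0
    return 0.5 * erfc(sqrt(lrt / 2.0))


# Hypothetical eigenvalues and z vector, for illustration only
eigvals = [10.0, 1.0, 1.0, 1.0, 1.0]
z_obs = [5.0, 0.1, 0.1, 0.1, 0.1]
p_lmp = lmp_mc_pvalue(eigvals, z_obs, simu_size=1000)
```

For the GCV and GML statistics the same loop would also have to deal with $\hat{\gamma}$, either by re-estimating it for every simulated sample or by holding it fixed, which is the cost trade-off discussed above.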

An alternative approach is to check the above hypothesis visually by plotting the projection of $\hat{f}$ onto the orthogonal complement of ${\cal H}_0$ together with its Bayesian confidence intervals. When $H_0$ is true, this projection should be close to zero, so most of the zero function should lie inside these confidence intervals. See Section 7 for examples. Two utility functions, predict.ssr and plot.bCI, are available to compute posterior means and standard deviations and to plot fits with Bayesian confidence intervals. See the help files of predict.ssr and plot.bCI for more details.
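The visual check can be complemented by a simple numerical summary. The Python sketch below is a hypothetical helper, not a function of the software described here: it assumes the posterior means and standard deviations of the plotted projection have been exported on a grid, and it reports the fraction of grid points at which the Bayesian confidence interval contains zero.

```python
from statistics import NormalDist


def zero_coverage(post_mean, post_sd, alpha=0.05):
    """Fraction of grid points whose interval mean +/- z*sd contains zero."""
    if len(post_mean) != len(post_sd) or not post_mean:
        raise ValueError("inputs must be non-empty and of equal length")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Zero lies inside the interval exactly when |mean| <= z * sd
    inside = [abs(m) <= z * s for m, s in zip(post_mean, post_sd)]
    return sum(inside) / len(inside)


# Hypothetical posterior summaries of the projection at three grid points
share = zero_coverage([0.1, -0.2, 3.0], [0.1, 0.2, 0.5])
```

A fraction close to one is in line with $H_0$, while a clearly smaller fraction suggests departure from it; this is an informal diagnostic, not a formal test.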

Yuedong Wang 2004-05-19