This data set, downloaded from
http://www-personal.buseco.monash.edu.au/hyndman
/TSDL/,
contains monthly number of reported cases of chickenpox in New
York City from 1931 to the first six months of 1972. It has been
analyzed by several authors to investigate dynamics in an epidemic
(Yorke and London, 1973; Schaffer and Kot, 1985). Figure
shows
time series plots of square root of monthly cases. We illustrate
how to use the SS ANOVA and NNR models to investigate long
term trend over years, seasonal trend and their interactions.
![]() |
Denote as the square root of reported cases in month
of year
. Both
and
are transformed into the
interval
. We first consider an additive model
> data(chickenpox) > chickenpox$ct<- sqrt(chickenpox$count) > chickenpox$csmonth<- (chickenpox$month-0.5)/12 > chickenpox$csyear<- ident(chickenpox$year) > chickenpox.fit1 <- ssr(ct~csyear, rk=list(periodic(csmonth),cubic(csyear)), data=chickenpox) > chickenpox.fit1 ... GCV estimate(s) of smoothing parameter(s) : 0.189468039 0.001004518 Equivalent Degrees of Freedom (DF): 46.32076 Estimate of sigma: 3.932079 Number of Observations: 498 > chickenpox.fit2 <- update(chickenpox.fit1, rk=list(periodic(csmonth),cubic(csyear), rk.prod(periodic(csmonth),kron(csyear)), rk.prod(periodic(csmonth),cubic(csyear)))) > chickenpox.fit2 ... GCV estimate(s) of smoothing parameter(s) : 3.175017e-02 1.703363e-04 3.253160e-01 2.387995e-07 Equivalent Degrees of Freedom (DF): 278.5618 Estimate of sigma: 1.975239 Number of Observations: 498
The seasonal variation was mainly caused by two factors: (a) social
behavior of children who made close contacts when school was in session;
and (b) temperature and humidity which may affect the survival and
transmission of dispersal stages (Yorke and London, 1973; Schaffer and Kot, 1985). Thus
the seasonal variations were similar over the years. In the following
NNR model we assume that the seasonal variation has the same shape
after vertical shift and vertical scale transformations. Specifically we
assume that
> S3 <- periodic(chickenpox.data$month) > f3.tmp <- ssr(y~1,rk=S3,data=chickenpox.data,spar=''m'') > f3.ini <- as.vector(S3%*%f3.tmp$rkpk.obj$c) > chickenpox.fit3 <- nnr(y~f1(year)+exp(f2(year))*f3(month), func=list(f1(x)~list(~I(x-.5),cubic(x)), f2(x)~list(~I(x-.5)-1,cubic(x)), f3(x)~list(~sin(2*pi*x)+cos(2*pi*x)-1, lspline(x,type=''sine0''))), data=chickenpox.data,start=list(f1=mean(y),f2=0,f3=f3.ini), control=list(converg=''coef''),spar=''m'') > chickenpox.fit3 ... GML estimate(s) of smoothing parameter(s): 7.7261e-07 1.7787e-03 2.0090e-07 Equivalent Degrees of Freedom (DF): 27.85445 Residual standard error: 3.67597 Number of Observations: 498 Converged after 4 iterationsWe can compare three models using a model selection procedure such as GCV, AIC or BIC.
> n <- 498 > rss1 <- sum(chickenpox.fit1$resi**2) > rss2 <- sum(chickenpox.fit2$resi**2) > rss3 <- sum(chickenpox.fit3$resi**2) > gcv1 <- rss1/n/(1-chickenpox.fit1$df/n)**2 > gcv2 <- rss2/n/(1-chickenpox.fit2$df/n)**2 > gcv3 <- rss3/n/(1-chickenpox.fit3$df$f/n)**2 > aic1 <- n*log(rss1/n)+2*chickenpox.fit1$df > aic2 <- n*log(rss2/n)+2*chickenpox.fit2$df > aic3 <- n*log(rss3/n)+2*chickenpox.fit3$df$f > bic1 <- n*log(rss1/n)+log(n)*chickenpox.fit1$df > bic2 <- n*log(rss2/n)+log(n)*chickenpox.fit2$df > bic3 <- n*log(rss3/n)+log(n)*chickenpox.fit3$df$f > print(c(gcv1,gcv2,gcv3)) > [1] 17.04683 8.85435 14.31334 > print(c(aic1,aic2,aic3)) > [1] 1407.7146 826.9648 1323.6549 > print(c(bic1,bic2,bic3)) > [1] 1602.753 1999.877 1440.939Thus the GCV and AIC criteria select the SS ANOVA model, and the BIC criteria selects the more parsimonious NNR model. The estimates of
![]() |
![]() |