This data set, downloaded from http://www-personal.buseco.monash.edu.au/hyndman /TSDL/, contains monthly number of reported cases of chickenpox in New York City from 1931 to the first six months of 1972. It has been analyzed by several authors to investigate dynamics in an epidemic (Yorke and London, 1973; Schaffer and Kot, 1985). Figure shows time series plots of square root of monthly cases. We illustrate how to use the SS ANOVA and NNR models to investigate long term trend over years, seasonal trend and their interactions.
Denote as the square root of reported cases in month
of year . Both and are transformed into the
interval . We first consider an additive model
> data(chickenpox) > chickenpox$ct<- sqrt(chickenpox$count) > chickenpox$csmonth<- (chickenpox$month-0.5)/12 > chickenpox$csyear<- ident(chickenpox$year) > chickenpox.fit1 <- ssr(ct~csyear, rk=list(periodic(csmonth),cubic(csyear)), data=chickenpox) > chickenpox.fit1 ... GCV estimate(s) of smoothing parameter(s) : 0.189468039 0.001004518 Equivalent Degrees of Freedom (DF): 46.32076 Estimate of sigma: 3.932079 Number of Observations: 498 > chickenpox.fit2 <- update(chickenpox.fit1, rk=list(periodic(csmonth),cubic(csyear), rk.prod(periodic(csmonth),kron(csyear)), rk.prod(periodic(csmonth),cubic(csyear)))) > chickenpox.fit2 ... GCV estimate(s) of smoothing parameter(s) : 3.175017e-02 1.703363e-04 3.253160e-01 2.387995e-07 Equivalent Degrees of Freedom (DF): 278.5618 Estimate of sigma: 1.975239 Number of Observations: 498
The seasonal variation was mainly caused by two factors: (a) social
behavior of children who made close contacts when school was in session;
and (b) temperature and humidity which may affect the survival and
transmission of dispersal stages (Yorke and London, 1973; Schaffer and Kot, 1985). Thus
the seasonal variations were similar over the years. In the following
NNR model we assume that the seasonal variation has the same shape
after vertical shift and vertical scale transformations. Specifically we
assume that
> S3 <- periodic(chickenpox.data$month) > f3.tmp <- ssr(y~1,rk=S3,data=chickenpox.data,spar=''m'') > f3.ini <- as.vector(S3%*%f3.tmp$rkpk.obj$c) > chickenpox.fit3 <- nnr(y~f1(year)+exp(f2(year))*f3(month), func=list(f1(x)~list(~I(x-.5),cubic(x)), f2(x)~list(~I(x-.5)-1,cubic(x)), f3(x)~list(~sin(2*pi*x)+cos(2*pi*x)-1, lspline(x,type=''sine0''))), data=chickenpox.data,start=list(f1=mean(y),f2=0,f3=f3.ini), control=list(converg=''coef''),spar=''m'') > chickenpox.fit3 ... GML estimate(s) of smoothing parameter(s): 7.7261e-07 1.7787e-03 2.0090e-07 Equivalent Degrees of Freedom (DF): 27.85445 Residual standard error: 3.67597 Number of Observations: 498 Converged after 4 iterationsWe can compare three models using a model selection procedure such as GCV, AIC or BIC.
> n <- 498 > rss1 <- sum(chickenpox.fit1$resi**2) > rss2 <- sum(chickenpox.fit2$resi**2) > rss3 <- sum(chickenpox.fit3$resi**2) > gcv1 <- rss1/n/(1-chickenpox.fit1$df/n)**2 > gcv2 <- rss2/n/(1-chickenpox.fit2$df/n)**2 > gcv3 <- rss3/n/(1-chickenpox.fit3$df$f/n)**2 > aic1 <- n*log(rss1/n)+2*chickenpox.fit1$df > aic2 <- n*log(rss2/n)+2*chickenpox.fit2$df > aic3 <- n*log(rss3/n)+2*chickenpox.fit3$df$f > bic1 <- n*log(rss1/n)+log(n)*chickenpox.fit1$df > bic2 <- n*log(rss2/n)+log(n)*chickenpox.fit2$df > bic3 <- n*log(rss3/n)+log(n)*chickenpox.fit3$df$f > print(c(gcv1,gcv2,gcv3)) > [1] 17.04683 8.85435 14.31334 > print(c(aic1,aic2,aic3)) > [1] 1407.7146 826.9648 1323.6549 > print(c(bic1,bic2,bic3)) > [1] 1602.753 1999.877 1440.939Thus the GCV and AIC criteria select the SS ANOVA model, and the BIC criteria selects the more parsimonious NNR model. The estimates of and and their 95% confidence intervals are shown in Figure . We also superimposed yearly averages on the plot of and the logarithm of scaled ranges on the plot of . The scaled range of a specific year was calculated as the differences between the maximum and the minimum monthly number of cases divided by the range of the estimated seasonal trend . It is clear that captures the long term trend in the mean and captures the long term trend in the range of seasonal variation. From Figure , the SS ANOVA model () captures local trend, particularly biennial pattern from 1945 to 1955, better than the NNR model (). From the estimate of , we can see that yearly averages peaked in the 1930s and 1950s, and gradually decreased in the 1960s after the introduction of mass vaccination. The amplitude reflects the seasonal variation in transmission rate. From the estimate of in Figure , we can see that magnitude of the seasonal variation peaked in the 1950s and then declined in the 1960s, possibly as a result of changing public health conditions including mass vaccination. Figure shows the estimate of the seasonal trend and its projections onto the null space (the simple sinusoidal model) and the orthogonal complement of the null space . Since the projection onto the complement space is significantly different from zero, we conclude that a simple sinusoidal model does not provide an accurate approximation.