Today we are finally creating our long-awaited GDP forecast. To build it, we combine the forecast from our deterministic trend model with the forecast from our de-trended GDP model.

Our model for the trend is:

trendyx = 892.656210 - 30.365580*x + 0.335586*x2

where x2 = x^2

and we will extend the index vector out to the 278th observation:

> x=c(1:278)
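Putting the pieces together, the extended trend can be evaluated directly from the fitted coefficients above (x2 is just the squared index):

```r
# build the quadratic trend out to observation 278
x <- 1:278
x2 <- x^2
trendyx <- 892.656210 - 30.365580*x + 0.335586*x2
length(trendyx)   # 278
```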

and our model for the cyclical de-trended series is from an AR(10) process:

GDP.fit<-arima(dt,order=c(10,0,0),include.mean=FALSE)

So let's say we want to predict GDP 21 periods into the future. Type in the following for the cyclical forecast:

> GDP.pred<-predict(GDP.fit,n.ahead=21)

Now when we produce our forecast we can't just add trendyx + GDP.pred$pred because the vector lengths won't match. To see this use the length() function:

> length(trendyx)

[1] 278

> length(GDP.pred$pred)

[1] 21

To fix this problem we are going to remove the first 257 observations from trendyx so that we only have the last 21 left:

> true.trend<-trendyx[-c(1:257)]

> length(true.trend)

[1] 21
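Equivalently, tail() grabs the last 21 values without having to count which indices to drop:

```r
# same result as trendyx[-c(1:257)]
true.trend <- tail(trendyx, 21)
```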

Now we can plot away without any technical difficulties:

> plot(GDP,type="l",xlim=c(40,75),ylim=c(5000,18500),main="GDP Predictions")

> lines(GDP.pred$pred+true.trend,col="blue")

> lines((GDP.pred$pred+true.trend)-2*GDP.pred$se,col="red")

> lines((GDP.pred$pred+true.trend)+2*GDP.pred$se,col="red")

This code results in the following plot:

The blue line represents our point forecast and the red lines represent our 95% confidence interval forecast. I feel like the plot could be significantly cooler, so at its current appearance it receives a 2 out of 10 for style. It's bland, the x-axis doesn't have dates, and there's not even any background color. If this plot had a name it would be doodoo. A war must be fought against the army of lame plots. Epic battles will proceed. Plots will be lost. Only one victor will stand.
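As a first skirmish in that war, a shaded confidence band usually looks better than two bare red lines. A sketch reusing the GDP.pred and true.trend objects from above (the rgb() transparency value is purely a taste choice):

```r
# shaded 95% band instead of two red lines
point.fcst <- GDP.pred$pred + true.trend   # still a ts object, so time() works
upper <- point.fcst + 2*GDP.pred$se
lower <- point.fcst - 2*GDP.pred$se
plot(GDP, type="l", xlim=c(40,75), ylim=c(5000,18500), main="GDP Predictions")
polygon(c(time(point.fcst), rev(time(point.fcst))), c(upper, rev(lower)),
        col=rgb(1, 0, 0, 0.2), border=NA)
lines(point.fcst, col="blue")
```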

Keep Dancin',

Steven J.

## Sunday, September 4, 2011

## Friday, September 2, 2011

### Assessing the Forecasting Ability of Our Model

Today we wish to see how our model would have fared forecasting the past 20 values of GDP. Why? Well, ask yourself this: how can you know where you're going if you don't know where you've been? Once you understand, please proceed with the following post.

First recall the trend portion that we have already accounted for:

> t=(1:258)

> t2=t^2

> trendy = 892.656210 - 30.365580*t + 0.335586*t2

And recall that the de-trended series is just that: the series minus the trend.

dt=GDP-trendy

As the following example will demonstrate, if we assess the model with a forecast of the de-trended series alone, we may come across some discouraging results:

> test.data<-dt[-c(239:258)]

> true.data<-dt[-c(1:238)]

> forecast.data<-predict(arima(test.data,order=c(10,0,0),include.mean=FALSE),n.ahead=20)$pred

Now we want to plot the forecast data vs. the actual values of the forecasted de-trended series to get a sense of whether this is accurate or not.

> plot(true.data,forecast.data,main="True Data vs. Forecast data")

Clearly it appears as though there is little to no accuracy in the forecast of our de-trended model alone. In fact, a linear regression of the forecast data on the true data makes this perfectly clear.

> reg.model<-lm(true.data~forecast.data)

> summary(reg.model)

Call:

lm(formula = true.data ~ forecast.data)

Residuals:

Min 1Q Median 3Q Max

-684.0 -449.0 -220.8 549.4 716.8

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) -2244.344 2058.828 -1.090 0.290

forecast.data 2.955 2.568 1.151 0.265

Residual standard error: 540.6 on 18 degrees of freedom

Multiple R-squared: 0.06851, Adjusted R-squared: 0.01676

F-statistic: 1.324 on 1 and 18 DF, p-value: 0.265

> anova(reg.model)

Analysis of Variance Table

Response: true.data

Df Sum Sq Mean Sq F value Pr(>F)

forecast.data 1 386920 386920 1.3238 0.265

Residuals 18 5260913 292273
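The regression is one way to see the miss; a root mean squared error boils it down to a single number. A quick sketch using the vectors already defined:

```r
# typical size of a forecast error for the de-trended-only model
rmse <- sqrt(mean((true.data - forecast.data)^2))
mae  <- mean(abs(true.data - forecast.data))
```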

Now is a good time not to be discouraged, but rather encouraged to add the trend to our forecast. When we run a linear regression of GDP on the trend, we quickly realize that 99.7% of the variance in GDP can be accounted for by the trend.

> reg.model2<-lm(GDP~trendy)

> summary(reg.model2)

Call:

lm(formula = GDP ~ trendy)

Residuals:

Min 1Q Median 3Q Max

-625.43 -165.76 -36.73 163.04 796.33

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.001371 21.870246 0.0 1

trendy 1.000002 0.003445 290.3 <2e-16 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 250.6 on 256 degrees of freedom

Multiple R-squared: 0.997, Adjusted R-squared: 0.997

F-statistic: 8.428e+04 on 1 and 256 DF, p-value: < 2.2e-16

In the end we would have had to account for the trend anyway, so it just makes sense to use it when testing our model's accuracy.

> test.data1<-dt[-c(239:258)]

# An important note: "-c(239:258)" keeps everything except those particular 20 observations #

> true.data1<-dt[-c(1:238)]

> true.data2<-trendy[-c(1:238)]

> forecast.data1<-predict(arima(test.data1,order=c(10,0,0),include.mean=FALSE),n.ahead=20)$pred

> forecast.data2<-(true.data2)

> forecast.data3<-(forecast.data1+forecast.data2)

> true.data3<-(true.data1+true.data2)

Don't forget to plot your data:

> plot(true.data3,forecast.data3,main="True Values vs. Predicted Values")
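A perfect forecast would put every point on the 45-degree line, so adding that line to the scatter makes the accuracy easier to judge:

```r
plot(true.data3, forecast.data3, main="True Values vs. Predicted Values")
abline(0, 1, lty=2)   # points on this line are forecast exactly
```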

...and regress the forecasted data on the actual data:

> reg.model3<-lm(true.data3~forecast.data3)

> summary(reg.model3)

Call:

lm(formula = true.data3 ~ forecast.data3)

Residuals:

Min 1Q Median 3Q Max

-443.5 -184.2 16.0 228.3 334.8

Coefficients:

Estimate Std. Error t-value Pr(>|t|)

(Intercept) 8.104e+03 1.141e+03 7.102 1.28e-06 ***

forecast.data3 4.098e-01 7.657e-02 5.352 4.37e-05 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 264.8 on 18 degrees of freedom

Multiple R-squared: 0.6141, Adjusted R-squared: 0.5926

F-statistic: 28.64 on 1 and 18 DF, p-value: 4.366e-05

Looking at the plot and the regression results, I feel like this model is pretty accurate considering that this is a point forecast and not an interval forecast. Next time on the Dancing Economist we will plot the forecasts into the future with 95% confidence intervals. Until then-

Keep Dancin'

Steven J

## Thursday, September 1, 2011

### Forecasting In R: A New Hope with AR(10)

In our last post we determined that the ARIMA(2,2,2) model was just plain not going to work for us. Although I didn't show it, its residuals failed to pass the ACF and PACF tests for white noise, and the mean of its residuals was greater than three when it should have been much closer to zero.

Today we discover that an AR(10) of the de-trended GDP series may be the best option at hand. Normally when we do model selection we start with the model that has the lowest AIC and then proceed to test its error terms (or residuals) for white noise. Let's take a look at the model specs for the AR(10):
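That search can be automated: loop over candidate AR orders and compare AICs. A sketch, assuming the de-trended series dt is still in memory:

```r
# AIC for AR(1) through AR(12) fits on the de-trended series
aics <- sapply(1:12, function(p) AIC(arima(dt, order = c(p, 0, 0))))
which.min(aics)   # order with the lowest AIC
```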

> model7<-arima(dt,order=c(10,0,0))

> model7

Call:

arima(x = dt, order = c(10, 0, 0))

Coefficients:

ar1 ar2 ar3 ar4 ar5 ar6 ar7 ar8 ar9 ar10

1.5220 -0.4049 -0.2636 0.2360 -0.2132 0.1227 -0.0439 -0.0958 0.3244 -0.2255

s.e. 0.0604 0.1105 0.1131 0.1143 0.1154 0.1147 0.1139 0.1127 0.1111 0.0627

intercept

-21.6308

s.e. 57.5709

sigma^2 estimated as 1452: log likelihood = -1307.76, aic = 2639.52

The Ljung-Box Q test checks out to the 20th lag:

> Box.test(model7$res,lag=20,type="Ljung-Box")

Box-Ljung test

data: model7$res

X-squared = 15.0909, df = 20, p-value = 0.7712

It even checks out to the 30th lag! I changed the way I plotted the Ljung-Box Q values after finding this little function called "LBQPlot" (from the FitAR package), which makes life way easier.

> LBQPlot(model7$res,lag.max=30,StartLag=1)
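If LBQPlot isn't available, the same picture can be drawn by hand with Box.test over a range of lags:

```r
# Ljung-Box p-values at lags 1 through 30
pvals <- sapply(1:30, function(l)
  Box.test(model7$res, lag = l, type = "Ljung-Box")$p.value)
plot(1:30, pvals, ylim = c(0, 1), xlab = "Lag", ylab = "p-value")
abline(h = 0.05, lty = 2)   # white noise is rejected below this line
```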

Most importantly both the ACF and the PACF of the residuals check out for the white noise process. In the ARIMA(2,2,2) model these weren't even close to what we wanted them to be.

> par(mfrow=c(2,1))

> acf(model7$res)

> pacf(model7$res)

Unfortunately, our residuals continue to fail the formal tests for normality. I don't really know what to do about this or even what the proper explanation is, but I have a feeling that these tests are very sensitive: with a couple hundred observations, even modestly fat tails are enough to produce a rejection.

> jarque.bera.test(model7$res)

Jarque Bera Test

data: model7$res

X-squared = 7507.325, df = 2, p-value < 2.2e-16

> shapiro.test(model7$res)

Shapiro-Wilk normality test

data: model7$res

W = 0.7873, p-value < 2.2e-16

The mean is also considerably closer to 0 but just not quite there.

> mean(model7$res)

[1] 0.7901055
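Whether 0.79 is meaningfully far from zero depends on the spread of the residuals; a one-sample t-test gives a quick check:

```r
t.test(model7$res)   # tests H0: the residual mean is zero
```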

Take a look at the plot for normality:

> qqnorm(model7$res)

> qqline(model7$res)

In the next post we are going to test how good our model actually is. Today we found our optimal choice in terms of model specs, but we should also see how well it can forecast past values of GDP. In addition to evaluating our model's past accuracy, we will also practice forecasting into the future. As you continue to read these posts, you should be getting significantly better with R- I know I am! We have covered many new commands that stem from many different libraries. I would like to keep on doing analysis in R, so after we finish forecasting GDP I think I may move on to some econometrics. Please keep forecasting and most certainly keep dancin',

Steven J.

Labels:
AR,
ARMA,
Forecasting,
Ljung-Box,
R,
Residual Analysis
