Sunday, September 4, 2011

Ladies and Gents: GDP has finally gotten its long awaited forecast

Today we will be finally creating our long awaited GDP forecast.  In order to create this forecast we have to combine both the forecast from our deterministic trend model and the forecast from our de-trended GDP model.

Our model for the trend is:

trendyx= 892.656210 + -30.365580*x  + 0.335586*x2

where x2=x^2
and the vector length we will make out to the 278th observation:

> x=c(1:278)

and our model for the cyclical de-trended series is from an AR(10) process:

GDP.fit<-arima(dt,order=c(10,0,0),include.mean=FALSE)

So lets say we want to predict GDP 21 periods into the future. Type in the following for the cyclical forecast:

> GDP.pred<-predict(GDP.fit,n.ahead=21)

Now when we produce our forecast we can't just add trendyx + GDP.pred$pred because the vector lengths won't match. To see this use the length() function:


> length(trendyx)
[1] 278
> length(GDP.pred$pred)
[1] 21

In order to fix this problem we are going to remove the first 258 observations from trendyx so that we only have 21 left:


> true.trend<-trendyx[-c(1:257)]
> length(true.trend)
[1] 21

Now we can plot away without any technical difficulties:

> plot(GDP,type="l",xlim=c(40,75),ylim=c(5000,18500),main="GDP Predictions")

> lines(GDP.pred$pred+true.trend,col="blue")
> lines((GDP.pred$pred+true.trend)-2*GDP.pred$se,col="red")
> lines((GDP.pred$pred+true.trend)+2*GDP.pred$se,col="red")

This code results in the following plot:

The blue line represents our point forecast and the red lines represent our 95% Confidence Level interval forecast.  I feel like the plot could be significantly cooler and therefore at its current appearance receives a 2 out of 10 for style.  It's bland, the x-axis doesn't have dates and there's not even any background color. If this plot had a name it would be doodoo.  A war must be fought against the army of lame plots.  Epic battles will proceed. Plots will be lost. Only one victor will stand.


Keep Dancin',

Steven J.


Friday, September 2, 2011

Assessing the Forecasting Ability of Our Model

Today we wish to see how our model would have faired forecasting the past 20 values of GDP. Why? Well ask yourself this: How can you know where your going, if you don't know where you've been? Once you understand please proceed on with the following post.

First recall the trend portion that we have already accounted for:


> t=(1:258)
> t2=t^2
> trendy= 892.656210 + -30.365580*t  + 0.335586*t2

And that the de-trended series is just that- the series minus the trend.

dt=GDP-trendy


As the following example will demonstrate- If we decide to assess the model with a forecast of the de-trended series alone we may come across some discouraging results:


> test.data<-dt[-c(239:258)]
> true.data<-dt[-c(1:238)]
> forecast.data<-predict(arima(test.data,order=c(10,0,0),include.mean=FALSE),n.ahead=20)$pred

Now we want to plot the forecast data vs. the actual values of the forecasted de-trended series to get a sense of whether this is accurate or not.

> plot(true.data,forecast.data)
> plot(true.data,forecast.data,main="True Data vs. Forecast data")





































Clearly it appears as though there is little to no accuracy with the the forecast of our de-trended model alone.  In fact a linear regression of the forecast data on the true data makes this perfectly clear.

> reg.model<-lm(true.data~forecast.data)
> summary(reg.model)

Call:
lm(formula = true.data ~ forecast.data)

Residuals:
   Min     1Q Median     3Q    Max
-684.0 -449.0 -220.8  549.4  716.8

Coefficients:
                    Estimate    Std. Error    t value       Pr(>|t|)
(Intercept)   -2244.344   2058.828   -1.090         0.290
forecast.data     2.955      2.568         1.151         0.265

Residual standard error: 540.6 on 18 degrees of freedom
Multiple R-squared: 0.06851, Adjusted R-squared: 0.01676
F-statistic: 1.324 on 1 and 18 DF,  p-value: 0.265


> anova(reg.model)
Analysis of Variance Table

Response: true.data
                     Df  Sum Sq    Mean Sq   F value Pr(>F)
forecast.data  1     386920    386920      1.3238  0.265
Residuals     18    5260913  292273            


Now, is a good time to not be discouraged, but rather encouraged to add trend to our forecast.  When we run a linear regression of trend on GDP we quickly realize that 99.7 of the variance in GDP can be accounted for by the trend.


> reg.model2<-lm(GDP~trendy)
> summary(reg.model2)

Call:
lm(formula = GDP ~ trendy)

Residuals:
    Min      1Q  Median      3Q     Max
-625.43 -165.76  -36.73  163.04  796.33

Coefficients:
             Estimate Std. Error t value Pr(>|t|)  
(Intercept)  0.001371  21.870246     0.0        1  
trendy       1.000002   0.003445   290.3   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 250.6 on 256 degrees of freedom
Multiple R-squared: 0.997, Adjusted R-squared: 0.997
F-statistic: 8.428e+04 on 1 and 256 DF,  p-value: < 2.2e-16


In the end we would have to had accounted for trend anyway so it just makes sense to use it when testing our models accuracy.  

> test.data1<-dt[-c(239:258)]  

# Important note is that the "-c(239:258)" includes everything except those particular 20 observations #

> true.data1<-dt[-c(1:238)]
> true.data2<-trendy[-c(1:238)]
> forecast.data1<-predict(arima(test.data1,order=c(10,0,0),include.mean=FALSE),n.ahead=20)$pred
> forecast.data2<-(true.data2)

> forecast.data3<-(forecast.data1+forecast.data2)
> true.data3<-(true.data1+true.data2)

Don't forget to plot your data:

> plot(true.data3,forecast.data3,main="True Values vs. Predicted Values")



...and regress the forecasted data on the actual data:

> reg.model3<-lm(true.data3~forecast.data3)
> summary(reg.model3)

Call:
lm(formula = true.data3 ~ forecast.data3)

Residuals:
   Min     1Q Median     3Q    Max 
-443.5 -184.2   16.0  228.3  334.8 

Coefficients:
                       Estimate          Std. Error      t-value    Pr(>|t|)    
(Intercept)        8.104e+03      1.141e+03   7.102       1.28e-06 ***
forecast.data3  4.098e-01        7.657e-02   5.352        4.37e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 264.8 on 18 degrees of freedom
Multiple R-squared: 0.6141, Adjusted R-squared: 0.5926 
F-statistic: 28.64 on 1 and 18 DF,  p-value: 4.366e-05 

Looking at the plot and the regression results, I feel like this model is pretty accurate considering the fact this is a point forecast and not an interval forecast.  Next time on the Dancing Economist we will plot the forecasts into the future with 95% confidence intervals. Until then-

Keep Dancin'

Steven J





Thursday, September 1, 2011

Forecasting In R: A New Hope with AR(10)

In our last post we determined that the ARIMA(2,2,2) model was just plain not going to work for us.  Although i didn't show its residuals failed to pass the acf and pacf test for white noise and the mean of its residuals was greater than three when it should have been much closer to zero.
Today we discover that an AR(10) of the de-trended GDP series may be the best option at hand.  Normally when we do model selection we start with the one that has the lowest AIC and then proceed to test the error terms (or residuals) for white noise. Lets take a look at the model specs for the AR(10):


> model7<-arima(dt,order=c(10,0,0))

> model7

Call:
arima(x = dt, order = c(10, 0, 0))

Coefficients:
         ar1      ar2            ar3       ar4           ar5     ar6           ar7      ar8        ar9     ar10
      1.5220  -0.4049  -0.2636  0.2360  -0.2132  0.1227  -0.0439  -0.0958  0.3244  -0.2255
s.e.  0.0604   0.1105   0.1131  0.1143   0.1154  0.1147   0.1139   0.1127  0.1111   0.0627
      intercept
       -21.6308
s.e.    57.5709

sigma^2 estimated as 1452:  log likelihood = -1307.76,  aic = 2639.52

The Ljung-Box Q test checks out to the 20th lag: 

> Box.test(model7$res,lag=20,type="Ljung-Box")

Box-Ljung test

data:  model7$res 
X-squared = 15.0909, df = 20, p-value = 0.7712

It even checks out to the 30th lag! I changed the way Iplotted the Ljung-Box Q values because after finding this little function called "LBQPlot" which makes life way easier.

> LBQPlot(model7$res,lag.max=30,StartLag=1)
Most importantly both the ACF and the PACF of the residuals check out for the white noise process.  In the ARIMA(2,2,2) model these weren't even close to what we wanted them to be.

> par(mfrow=c(2,1))
> acf(model7$res)
> pacf(model7$res)



Unfortunately, our residuals continue to fail the formal tests for normality. I don't really know what to do about this or even what the proper explanation is, but I have a feeling that these tests are very very sensitive. 

> jarque.bera.test(model7$res)

Jarque Bera Test

data:  model7$res 
X-squared = 7507.325, df = 2, p-value < 2.2e-16

> shapiro.test(model7$res)

Shapiro-Wilk normality test

data:  model7$res 
W = 0.7873, p-value < 2.2e-16

The mean is also considerably closer to 0 but just not quite there.

> mean(model7$res)
[1] 0.7901055

Take a look at the plot for normality:

> qqnorm(model7$res)
> qqline(model7$res)
In the next post we are going to test how good our model actually is.  Today we found our optimal choice in terms of model specs, but we also should see how well it can forecast past values of GDP.  In addition to evaluating our models past accuracy we will also practice forecasting into the future.  As you continue to read these posts, you should be getting significantly better with R- I know I am!  We have covered many new codes that stem from many different libraries. I would like to keep on doing analysis in R so after we finish forecasting GDP I think I may move on to some Econometrics. Please keep forecasting and most certainly keep dancin',

Steven J.