Thursday, August 25, 2011

Forecasting in R: Modeling GDP and dealing with trend.

Okay so we want to forecast GDP. How do we even begin such a burdensome ordeal?

Well each time series has 4 components that we wish to deal with and those are seasonality, trend, cyclicality and error.  If we deal with seasonally adjusted data we don't have to worry about seasonality which leaves us with only three worries.  If we think hard we can also forget about forecasting the error component of the time series because if we can forecast the error we probably can come up with an even better trend or cyclical model.  So that leaves just two things: the trend and cyclical components.

The following graph is of real GDP and the data for this comes from FRED.



Notice the absolute rise in the series. Doesn't that strangely look similar to a quadratic equation in the form of:

y=B0+ B1*t +B2*t^2  ????

I think it does! Lets find out what the coefficients are in R and see if this is actually an appropriate model.

First get your data into R as we discussed in the previous posts:

> GDP=scan("/Users/stevensabol/Desktop/R/gdp.csv")

Then turn it into a time series so R can read it:

> GDP=ts(GDP,start=1,frequency=4)

then you have make sure that the number of observations you have in your time series will match up with the "t's"

to do this simply type


> length(GDP)
[1] 258


Then set "t" to that number:

> t=(1:258)

Then you have to regress time and time^2 on GDP. To do this type the following:

t2=t^2

and run the regression:

reg=lm(y~t+t2) 

to find out the dirty details, use the summary function:

summary(reg)


Call:
lm(formula = y ~ t + t2)

Residuals:
    Min      1Q  Median      3Q     Max
-625.43 -165.76  -36.73  163.04  796.33

Coefficients:
              Estimate            Std. Error      t value              Pr(>|t|)  
(Intercept) 892.656210  47.259202      18.89         <2e-16 ***
t           -30.365580        0.842581        -36.04        <2e-16 ***
t2            0.335586        0.003151        106.51        <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 251.1 on 255 degrees of freedom
Multiple R-squared: 0.997, Adjusted R-squared: 0.9969
F-statistic: 4.197e+04 on 2 and 255 DF,  p-value: < 2.2e-16

Okay so it appears we have a pretty great fit.  With our R^2 almost close to one I think we have a winner!

Lets plot the trend and GDP values against each other so we get the picture of what we just accomplished.

First you have to write out the equation for trend using the coefficient estimates that R gave us:

> trendy= 892.656210 + -30.365580*t  + 0.335586*t2

And for the Plot:

> ts.plot(GDP,trendy,col=1:2,ylab="Trend vs. Actual GDP")

Here is what it should look like:



Hey! Congratulations as you have officially tackled and defeated the trend component of this time series. Now we have to deal with the cyclicality of it all..

In order to do this we have to only look at the what's left when you remove trend from the equation.  we do this by simply subtracting trend from our original GDP series to get what's left- cyclicality.

De-trended (or Cyclical component) =  GDP - TREND

In R we want to write the following and plot it!


> dt=GDP-trendy
> plot(dt)



Now we want to figure out if this de-trended series is an AR, MA or ARMA process.  We do this by looking at the autocorrelation and partial autocorrelation functions. Let's take a look shall we?


par(mfrow=c(2,1))
> acf(dt,100)             *notice that i lag it out to the 100 observation
> pacf(dt,100)

This gives us the following graph:
Next time we will learn how to select a proper model and hopefully get some actual forecasting in there as well!

Keep dancin'

Steven J.












3 comments: