
Monday, January 2, 2012

Monetary Policy & Credit Easing pt. 8: Econometrics Tests in R

Hello, folks, it's time to cover some important econometric tests you can do in R.

The Akaike information criterion (AIC) is a measure of the relative goodness of fit of a statistical model.  If you have 10 models and order them by AIC, the one with the smallest AIC is your best model, ceteris paribus.
The following code computes the AIC and the related Bayesian information criterion (BIC):



> AIC(srp1.gls)
[1] 100.7905


> BIC(srp1.gls)
[1] 140.7421
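
If you have several candidate models in your workspace, you can pass them to AIC() in a single call and get a small comparison table back. This is just a sketch using the OLS and GLS fits from these posts; comparing information criteria across different fitting methods should be done with some care.

> AIC(srp1.lm, srp1.gls)   # one row per model, with degrees of freedom and AIC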

Say we wish to see whether our model has an error term that follows a roughly normal distribution. For this we can perform the Jarque-Bera test, which checks both skewness and kurtosis. This function requires that you load the FitAR package.


> JarqueBeraTest(srp1.gls$res[-(1)])
$LM
[1] 19.2033


$pvalue
[1] 6.761719e-05

To check whether the mean of the residuals is 0, and to see their standard deviation, the following code works:


> mean(srp1.gls$res[-(1)])
[1] 0.003354243
> sd(srp1.gls$res[-(1)])
[1] 0.3666269
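
If you want a formal check rather than eyeballing the mean, a one-sample t-test of the residuals against zero is one option (just a sketch, using the same residual vector as above):

> t.test(srp1.gls$res[-(1)], mu = 0)   # H0: the residual mean is zero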

Other tests, like the Breusch-Pagan and Goldfeld-Quandt, tell us whether our residual variance is stable across observations, i.e., whether heteroskedasticity is present. In order for these to work you have to load the lmtest package. Also, you can only run them on lm objects, i.e., your Ordinary Least Squares regressions; for any Generalized Least Squares regression you'll have to perform these tests manually (a rough sketch of one manual route follows the output below), and if you know of an easier or softer way please share.


> bptest(srp1.lm)


studentized Breusch-Pagan test


data:  srp1.lm 
BP = 48.495, df = 12, p-value = 2.563e-06


> gqtest(srp1.lm)


Goldfeld-Quandt test


data:  srp1.lm 
GQ = 0.1998, df1 = 40, df2 = 40, p-value = 1
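
For the GLS fits, one rough manual route (a sketch only, and not necessarily the exact auxiliary regression that lmtest uses internally) is the standard Breusch-Pagan idea: regress the squared GLS residuals on the explanatory variables and compute the LM statistic as n times the R-squared of that auxiliary regression. Only a few of the regressors are shown here to keep the lines short.

> u2 <- srp1.gls$res[-(1)]^2                               # squared GLS residuals
> aux <- lm(u2 ~ yc1[-(1)] + FF1[-(1)] + default1[-(1)])   # auxiliary regression on (some of) the regressors
> LM <- length(u2) * summary(aux)$r.squared                # LM statistic = n * R-squared
> pchisq(LM, df = 3, lower.tail = FALSE)                   # p-value; df = number of regressors in aux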


You can also use the Durbin-Watson test to check for first-order autocorrelation:


> dwtest(srp1.lm)


Durbin-Watson test


data:  srp1.lm 
DW = 1.4862, p-value = 0.0001955
alternative hypothesis: true autocorrelation is greater than 0 
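
The Durbin-Watson statistic only covers first-order autocorrelation. If you suspect higher-order serial correlation, the Breusch-Godfrey test in the same lmtest package takes an order argument (a sketch; the choice of 4 lags for quarterly data is arbitrary):

> bgtest(srp1.lm, order = 4)   # H0: no serial correlation up to lag 4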

Wish to get confidence intervals for your parameter estimates? Then use the confint() function as shown below for the Generalized Least Squares regression on long-term risk premia from 2001-2011.


> confint(p2lrp.gls)
                        2.5 %        97.5 %
yc              -0.1455727340  0.1498852728
default          0.2994818014  1.0640354237
Volatility       0.0336077958  0.0617798767
CorporateProfit -0.0010916473  0.0006628209
FF              -0.1788624533  0.0931406285
ER               0.0001539035  0.0016060804
Fedmbs          -0.0061554994  0.0085638593
Support         -0.1499342096  0.1615652273
FedComm         -0.0108567077  0.0750407328
FedGdp          -0.1347070955  0.2528217710
ForeignDebt     -0.0441198164  0.1042805549
govcredit        0.1090847204  0.6796839003
FedBalance      -2.0940925835  0.0370114069
UGAP            -0.4821566147  0.3188891550
OGAP            -0.2239749029  0.1073611677

Another nice feature is finding the log-likelihood of your estimation:

> logLik(lrp2.lm)
'log Lik.' 23.05106 (df=17)

Want to see if you have a unit root in your residual values? Then perform the augmented Dickey-Fuller test. For this you'll have to load the 'tseries' package.

> adf.test(lrp2.gls$res[-(1:4)])

                              Augmented Dickey-Fuller Test

data:  lrp2.gls$res[-(1:4)]
Dickey-Fuller = -7.4503, Lag order = 3, p-value = 0.01
alternative hypothesis: stationary

Warning message:
In adf.test(lrp2.gls$res[-(1:4)]) : p-value smaller than printed p-value
> adf.test(lrp2.lm$res)




I hope this mini-series has been informative to all who tuned in. For more info on anything you see here, please don't be shy to comment, and keep dancin',

Steven J.

Sunday, January 1, 2012

Monetary Policy & Credit Easing pt. 7: R Econometrics Tests

In post 6 we introduced some econometrics code that will help those working with time series to gain asymptotically efficient results.  In this post we look at the different commands and libraries necessary for testing our assumptions.

Testing our Assumptions and Meeting the Gauss-Markov Theorem

In this section we will seek to test and verify the assumptions of the simple linear regression model.  These assumptions are laid out as follows and are extracted from Hill, Griffiths and Lim 2008:

SR1. The value of y, for each value of x, is
y= ß_{1}+ß_{2}x+µ
SR2. The expected value of the random error µ is
E(µ)=0
which is equivalent to assuming
E(y)= ß_{1}+ß_{2}x
SR3. The variance of the random error µ is
var(µ)=sigma^2 = var(y)
The random variables y and µ have the same variance because they differ only by a constant.
SR4. The covariance between any pair of random errors µ_{i} and µ_{j} is
cov(µ_{i}, µ_{j})=cov(y_{i},y_{j})=0
SR5. The variable x is not random and must take at least two different values.
SR6. The values of µ are normally distributed about their mean
µ ~ N(0, sigma^2)
if the y values are normally distributed and vice-versa 
Central to this topic's objective is meeting the conditions set forth by the Gauss-Markov Theorem.  The Gauss-Markov Theorem states that if the error term is stationary and has no serial correlation, then the OLS parameter estimate is the Best Linear Unbiased Estimator, or BLUE, which implies that all other linear unbiased estimators will have a larger variance. An estimator that has the smallest possible variance is called an "efficient" estimator.  In essence, the Gauss-Markov Theorem requires that the error term have no structure: the residuals must exhibit no trend and their variance must be constant through time.
When the error term in the regression does not satisfy the assumptions set forth by Gauss-Markov, OLS is still unbiased, but it fails to be BLUE because it no longer gives the most efficient parameter estimates. In this scenario, a strategy that transforms the regression's variables so that the error has no structure is in order. In time-series analysis, autocorrelation between the residual values is a common problem.  There are several ways to approach the transformations necessary to ensure BLUE estimates; the previous post used the following method to gain asymptotic efficiency and improve our estimates:
1. Estimate the OLS regression

2. Fit OLS residual to an AR(p) process using the Yule-Walker Method and find the value of p.

3.  Re-estimate the model using Generalized Least Squares fit by Maximum Likelihood estimation, using the estimated p from step 2 as the order for your correlated error term.

4. Fit the GLS estimated residuals to an AR(p) process and use the estimated AR coefficients as the final parameter estimates for the error term.

What have we done?  First we need to find out what autocorrelation process the error term follows: what order is p? To find this out we fit the OLS residuals to an AR(p) using the Yule-Walker method. Then we take the order p of the estimated error process and run a GLS regression with an AR(p) error term.  This gives us better estimates for our model: GLS estimators are asymptotically more efficient than OLS estimators when the error covariance structure is correctly specified. Notice that in every single regression, the GLS estimator with a twice-iterated AR(p) error term consistently results in a lower standard deviation of the residuals. The model has therefore gained efficiency, which translates into tighter confidence intervals.  Additionally, by fitting the GLS residuals to an AR(p) we remove any autocorrelation (or structure) that may have been present in the residuals.
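
To make the four steps concrete, here is a minimal sketch of the procedure in R, assuming a response y and regressors x1 and x2 (hypothetical names; the actual model formulas appear in the accompanying posts):

> library(nlme)
> ols.fit <- lm(y ~ x1 + x2)                      # step 1: OLS
> p <- ar.yw(ols.fit$res)$order                   # step 2: AR order of the OLS residuals
> gls.fit <- gls(y ~ x1 + x2, corr = corARMA(p = p, q = 0), method = "ML")   # step 3: GLS with AR(p) errors
> ar.yw(gls.fit$res)                              # step 4: refit the GLS residuals to an AR(p)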

Testing for Model Misspecification and Omitted Variable Bias

The Ramsey RESET test (Regression Specification Error Test) is designed to detect omitted variable bias and incorrect functional form. Rejection of H_{0} implies that the original model is inadequate and can be improved.  A failure to reject H_{0} conveys that the test has not been able to detect any misspecification.

Unfortunately, our models of short-term risk premia over both estimation periods reject the null hypothesis, and thus suggest that a better model is out there somewhere.  Correcting for this functional misspecification or omitted variable bias will not be pursued here, but we must keep in mind that our model can be improved upon and is thus not BLUE.

In R you can run the Ramsey Reset test for standard lm functions using the library lmtest:

>library(lmtest)

> resettest(srp1.lm)

RESET test

data:  srp1.lm 
RESET = 9.7397, df1 = 2, df2 = 91, p-value = 0.0001469

For GLS objects, however, you'll need to do it manually, and that procedure will not be outlined here in full; a rough sketch of one way to approximate it is given below.  If you really want the details, please feel free to email or leave a comment below.
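
For what it's worth, one hedged way to approximate a RESET-style check on a GLS fit (a sketch, not the procedure used for the paper) is to add powers of the fitted values as extra regressors and compare the two maximum-likelihood fits with a likelihood-ratio test. The helper variables yhat2 and yhat3 are introduced here for illustration only.

> yhat2 <- fitted(srp1.gls)^2                               # squared fitted values
> yhat3 <- fitted(srp1.gls)^3                               # cubed fitted values
> srp1.gls.aug <- update(srp1.gls, . ~ . + yhat2 + yhat3)   # augmented model
> anova(srp1.gls, srp1.gls.aug)                             # likelihood-ratio test of the added terms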

Addressing Multicollinearity

In the original formulation of the model there was an independent variable called CreditMarketSupport that was very similar to our FedBalance variable.  Both variables are percentages and shared the same numerator while also having very similar denominators.  As a result we suffered from near-exact collinearity, as the correlation between the two variables was nearly one.

> cor(FedBalance1,CreditMarketSupport1)

0.9994248

With such severe collinearity we were unable to obtain reliable least squares estimates of our ß coefficients, and these variables were behaving opposite of what we expected.  This violated one of our least squares assumptions, SR5, which in the multiple regression setting states that the values of x_{ik} are not exact linear functions of the other explanatory variables.  To remedy this problem, we removed CreditMarketSupport from the models, and we were able to achieve BLUE estimates.
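
More generally, variance inflation factors give a quick screen for collinearity among the remaining regressors. The vif() function in the car package works on lm objects (a sketch, assuming the car package is installed):

> library(car)
> vif(srp1.lm)   # values well above 10 are usually taken as a sign of problematic collinearity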

Suspected Endogeneity

In our estimation of long-term risk premia over the first time period we suspect endogeneity in the cyclical variable Output Gap.  In order to remedy this situation we replace it with an instrumental variable, the percentage change in the S&P 500, and perform the Hausman Test, which is laid out as follows:

H_{0}: delta = 0 (no correlation between x_{i} and µ_{i})

H_{1}: delta ≠ 0 (correlation between x_{i} and µ_{i})

When we perform the Hausman Test using the S&P 500 as our instrumental variable, our delta ≠ 0 and is statistically significant.  This means that our Output Gap variable is indeed endogenous and correlated with the residual term.  If you want to learn more about the Hausman Test and how to perform it in R, please leave a comment or email me and I'll make sure to get the code over to you; a rough sketch of the regression-based version is given below.  When we perform the Two-Stage Least Squares regression to correct for this, not a single term is significant.  This can reasonably be attributed to the problem of weak instruments.  The Two-Stage Least Squares estimation is provided below. Since the percentage change in the S&P 500 had a correlation of only 0.110954 with the Output Gap, there is strong reason to suspect that weak instruments are the source of the problem.  We will not attempt to locate a better instrument for the Output Gap; instead we will keep in mind that we have an endogenous variable when interpreting our coefficient estimates, which will now end up being slightly biased.
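
As a rough illustration only (a sketch of the regression-based version of the Hausman test, not necessarily the exact code used for the paper), treating the Output Gap as the suspect variable: regress it on the instruments, then add the first-stage residuals to the structural equation and check their significance.

> stage1 <- lm(OGAP1 ~ yc1 + CP1 + FF1 + default1 + Support1 + ER1 + FedGDP1 + FedBalance1 + govcredit1 + ForeignDebt1 + UGAP1 + sp500ch)
> vhat <- stage1$res                                  # first-stage residuals
> hausman <- lm(lrp1 ~ yc1 + CP1 + FF1 + default1 + Support1 + ER1 + FedGDP1 + FedBalance1 + govcredit1 + ForeignDebt1 + UGAP1 + OGAP1 + vhat)
> summary(hausman)                                    # a significant coefficient on vhat points to endogeneity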

Below is how to perform a two-stage least squares regression in R when you're replacing an endogenous variable with an instrument. First you'll need to load the sem library into R. In the regression below, the first formula includes all the variables from the original model and the second lists all of our exogenous and instrumental variables, which in this case adds just the percentage change in the S&P 500.

> tSLRP1<-tsls(lrp1~yc1+CP1+FF1+default1+Support1+ER1+FedGDP1+FedBalance1+govcredit1+ForeignDebt1+UGAP1+OGAP1,~ yc1+CP1+FF1+default1+Support1+ER1+FedGDP1+FedBalance1+govcredit1+ForeignDebt1+sp500ch+OGAP1 )

> summary(tSLRP1)

 2SLS Estimates

Model Formula: lrp1 ~ yc1 + CP1 + FF1 + default1 + Support1 + ER1 + FedGDP1 + 
    FedBalance1 + govcredit1 + ForeignDebt1 + UGAP1 + OGAP1

Instruments: ~yc1 + CP1 + FF1 + default1 + Support1 + ER1 + FedGDP1 + FedBalance1 + 
    govcredit1 + ForeignDebt1 + sp500ch + OGAP1

Residuals:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -9.030  -1.870   0.021   0.000   2.230   7.310 

             Estimate Std. Error  t value Pr(>|t|)
(Intercept)  -5.28137   44.06906 -0.11984   0.9049
yc1          -1.48564   10.60827 -0.14005   0.8889
CP1          -0.01584    0.09206 -0.17204   0.8638
FF1           0.20998    2.43849  0.08611   0.9316
default1     -7.16622   65.35728 -0.10965   0.9129
Support1      6.39893   47.72244  0.13409   0.8936
ER1           4.56290   35.91837  0.12704   0.8992
FedGDP1       1.86392    9.16081  0.20347   0.8392
FedBalance1   0.73087   12.96474  0.05637   0.9552
govcredit1    0.17051    0.89452  0.19062   0.8492
ForeignDebt1 -0.22396    1.41749 -0.15799   0.8748
UGAP1         4.55897   35.33446  0.12902   0.8976
OGAP1         0.01331    0.09347  0.14235   0.8871

Residual standard error: 3.3664 on 93 degrees of freedom

Notice that our model now doesn't have any significant terms.  This is why we will choose to ignore the endogeneity of our Output Gap, and probably Unemployment Gap, variables.  Correcting for endogeneity does more harm than good in this case.

Results and Concluding Thoughts

As this paper hopefully shows, the Fed's actions did directly impact the easing of broader credit conditions in the financial markets.

Over our first estimation period, from 1971 to 1997, we find that the Fed's support of depository institutions as a percentage of their savings and time deposits is positively related to the short-term risk premia. Specifically, we find that a 1 percentage point increase in Support leads to a 2.1 percent increase in short-term risk premia. This was as expected, because depository institutions would only borrow from the Fed if no other options existed. We also find that a 1 percentage point increase in the federal funds rate leads to a .19 percentage point increase in short-term risk premia.  This is consistent with our original hypothesis, as an increased FF puts upward pressure on short-term rates like the 3-month commercial paper rate, resulting in a widened spread.  With respect to long-term risk premia, we find that a 1 percentage point increase in FF leads the long-term risk premia to decrease by .66 percentage points, while a 1 percent increase in the federal funds rate leads to a .07 decrease in the long-term risk premia.

Over our second estimation period the composition of the Fed's balance sheet is considered.  We see that the CCLF did decrease short-term risk premia, with every one percent increase translating to a decrease in short-term risk premia of .1145 percentage points.  Another important result is that Fed purchases of agency debt and agency MBS had a significant, although almost negligible, effect on short-term risk premia.  One surprising result from the estimation of the long-term risk premia is that our Fed balance sheet size variable has a sign opposite of what we expected, and its significance is particularly surprising.  This may be expected, since this period is largely characterized by both a shrinking balance sheet and narrowing risk premia, as investments were considered relatively safe.  However, towards the end of the period risk premiums shot up, and only afterwards did the size of the balance sheet also increase; thus the sample may place too much weight on the beginning of the time period and not enough on the end. This is a reasonable conjecture given that our estimate of the balance sheet size showed a large negative impact on risk premia over our longer estimation period.


Please, people, keep dancing, and we'll delve further into some additional econometric tests next week.



Friday, December 30, 2011

Monetary Policy and Credit Easing pt. 6: Empirical Estimation and Methodology

It is now appropriate to lay out our two regression models in full for empirical estimation over our two separate time periods. The first estimation is from 4/1/71 to 7/1/97 and the second is from 4/1/01 to 4/1/11. The methodology employed in the estimation of these two models is a Generalized Least Squares procedure with a Cochrane-Orcutt-style iterated residual.  For those who wish to perform the same regressions at home, I have provided links to my data: one file covers the estimation period from 1971 to 1997 and the other covers the 2001 to 2011 estimations. The following four steps were taken for each estimation:

1. Estimate the OLS regression

2. Fit OLS residual to an AR(p) process using the Yule-Walker Method and find the value of p.

3.  Re-estimate the model using Generalized Least Squares fit by Maximum Likelihood estimation, using the estimated p from step 2 as the order for your correlated error term.

4. Fit the GLS estimated residuals to an AR(p) process using the Yule-Walker Method and use the estimated AR coefficients as the final parameter estimates for the error term.

The end goal of the above procedure is to have our models be asymptotically BLUE, the Best Linear Unbiased Estimators.  This implies that they have the smallest variance of all linear unbiased estimators that satisfy the Gauss-Markov assumptions.  A little background on the methodology is in order.  First we perform the standard Ordinary Least Squares (OLS) regression for each dependent variable.  This gives us, at the very least, unbiased estimators.  Second, we take the residuals from the first step and fit them to an AR(p) process, with the order selected by the Yule-Walker method; this selects the lag structure that characterizes the autocorrelation in the residuals.  We take this step automatically, given the well-known fact that most time series suffer from autocorrelation, as verified by our correlograms.  Then we re-estimate the regression using Generalized Least Squares, which transforms the model to account for the estimated error covariance structure while incorporating the AR(p) order discovered in the previous step.  The final step, which fits our GLS model's residuals to an AR(p) process, is what leads to our asymptotically efficient results. We lay out our first estimation period models below.

First Estimation:  4/1/71 to 7/1/97 

1. Monetary Policies Impact On Short-term Risk Premiums

Our first model which seeks to answer how monetary policy impacts the risk premia on short-term commercial paper as estimated over our first time period 4/1/71 to 7/1/97 is as follows:

SR^{premium}_{t}=ß_{0}+ß_{1}*FedBalance^{size}_{t}+ß_{2}*Support_{t} + ß_{3}*UGAP_{t}+ ß_{4}*FF_{t}+ ß_{5}*ER_{t}+ ß_{6}*YC_{t}+ ß_{7}*Default^{spread}_{t}+ ß_{8}*CP_{t}+ ß_{9}*OGAP_{t} + ß_{10}*FedGDP_{t}+ ß_{11}*govcredit_{t}+ ß_{12}*ForeignDebt_{t} +  µ_{t}

Where,

SR^{premium}_{t} = Short-term Risk Premium at time, t

Support_{t}= Fed's funds at depository institutions as a percentage of their main financing streams at time, t

FedBalance^{size}_{t}=  The Fed's credit market asset holdings as percentage of the total credit market assets at time, t

FF_{t}= Federal Funds rate at time, t

ER_{t}= Excess Reserves of Depository Institutions at time, t

YC_{t}= Yield curve at time, t

Default^{spread}_{t}= Default Spread between BAA_{t} & AAA_{t} rated bonds at time, t

CP_{t} = Corporate Profits After Tax at time, t

FedGDP_{t}= Fed's holdings of total public debt as a percentage of GDP at time, t

govcredit_{t}= Government Holdings Of Domestic Credit Market Debt As A Percentage Of The Total at time, t

ForeignDebt_{t}= Foreign Holdings of Federal Debt As A Percentage Of The Total at time, t

UGAP_{t} = Unemployment gap at time, t

OGAP_{t} = Output gap at time, t

µ_{t}= error term at time, t

R DATA WORK

So now it is time for the long-awaited econometrics work in R. The first thing you'll want to do is read the data into R from your data file, which in this case is earlreg.csv.

> earl<- read.csv("/Users/stevensabol/Desktop/R/earlreg.csv",header = TRUE, sep = ",")

Then you define your variable names so you can easily manipulate your data in R. When you open the official .csv data file, take a look at the variable names and rename them using the following procedure.

>yc1<-earl[,"yc"]

After you define what you call everything you're then free to go crazy and run regressions.  Below is how you run the standard Ordinary Least Squares regression.  The lm function enables you to run linear regressions:

1. Estimate the OLS regression

>srp1.lm=lm(srp1~yc1+CP1+FF1+default1+Support1+ER1+FedGDP1+FedBalance1+govcredit1+ForeignDebt1+UGAP1+OGAP1)

In order to get the output you have to use the summary function:

> summary(srp1.lm)

Call:
lm(formula = srp1 ~ yc1 + CP1 + FF1 + default1 + Support1 + ER1 + 
    FedGDP1 + FedBalance1 + govcredit1 + ForeignDebt1 + UGAP1 + 
    OGAP1)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.04289 -0.20145 -0.04041  0.15230  1.21044 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -2.7591194  1.0359966  -2.663  0.00912 ** 
yc1           0.1320996  0.0580500   2.276  0.02516 *  
CP1          -0.0022773  0.0073773  -0.309  0.75825    
FF1           0.1699788  0.0340654   4.990 2.81e-06 ***
default1      0.4382965  0.1876685   2.335  0.02167 *  
Support1      2.2383850  0.6660140   3.361  0.00113 ** 
ER1           0.3351508  0.3017644   1.111  0.26959    
FedGDP1       0.3031938  0.2558144   1.185  0.23895    
FedBalance1   0.4014920  0.3477547   1.155  0.25124    
govcredit1   -0.0928817  0.0401603  -2.313  0.02294 *  
ForeignDebt1 -0.0068900  0.0215393  -0.320  0.74977    
UGAP1        -0.0912273  0.0520491  -1.753  0.08295 .  
OGAP1         0.0006669  0.0014895   0.448  0.65536    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.3789 on 93 degrees of freedom
Multiple R-squared: 0.6474, Adjusted R-squared: 0.6019 
F-statistic: 14.23 on 12 and 93 DF,  p-value: 2.642e-16 

Next we perform step 2 which is:

2. Fit OLS residual to an AR(p) process using the Yule-Walker Method and find the value of p.

> srp.lmfit<-ar.yw(srp1.lm$res)
> srp.lmfit

Call:
ar.yw.default(x = srp1.lm$res)

Coefficients:
     1  
0.2535  

Order selected 1  sigma^2 estimated as  0.1201 

So the Yule-Walker methodology fits the residual series to an AR(1) process.

3.  Re-estimate model using Generalized Least Squares fit by Maximum Likelihood estimation, using the  estimated p from 2, as the order for your correlation residual term.

In order to run a GLS regression you're going to need to load the nlme package:

> library(nlme)

Then you can go crazy:

>srp1.gls=gls(srp1~yc1+CP1+FF1+default1+Support1+ER1+FedGDP1+FedBalance1+govcredit1+ForeignDebt1+UGAP1+OGAP1, corr=corARMA(p=1,q=0),method="ML")

> summary(srp1.gls)

The following output is produced:

Generalized least squares fit by maximum likelihood
  Model: srp1 ~ yc1 + CP1 + FF1 + default1 + Support1 + ER1 + FedGDP1 + FedBalance1 + govcredit1 + ForeignDebt1 + UGAP1 + OGAP1 
  Data: NULL 
       AIC      BIC    logLik
  100.7905 140.7421 -35.39526

Correlation Structure: AR(1)
 Formula: ~1 
 Parameter estimate(s):
      Phi 
0.3696665 

Coefficients:
                  Value Std.Error   t-value p-value
(Intercept)  -3.0219486 1.2595942 -2.399145  0.0184
yc1           0.1929605 0.0640627  3.012054  0.0033
CP1          -0.0060642 0.0071791 -0.844700  0.4004
FF1           0.1918066 0.0362894  5.285466  0.0000
default1      0.5292204 0.2060591  2.568293  0.0118
Support1      2.1086204 0.7405128  2.847514  0.0054
ER1           0.5651430 0.2770125  2.040135  0.0442
FedGDP1       0.1028773 0.3143122  0.327309  0.7442
FedBalance1   0.7845392 0.4130914  1.899190  0.0606
govcredit1   -0.1240196 0.0524191 -2.365922  0.0201
ForeignDebt1  0.0009822 0.0278623  0.035252  0.9720
UGAP1        -0.1266050 0.0657633 -1.925161  0.0573
OGAP1        -0.0014094 0.0014328 -0.983623  0.3279

 Correlation: 
             (Intr) yc1    CP1    FF1    deflt1 Spprt1 ER1    FdGDP1 FdBln1 gvcrd1
yc1          -0.267                                                               
CP1           0.054 -0.062                                                        
FF1          -0.308  0.726 -0.012                                                 
default1      0.081 -0.208  0.235 -0.342                                          
Support1     -0.005 -0.109 -0.107 -0.419  0.137                                   
ER1          -0.208  0.077 -0.081  0.067 -0.180  0.028                            
FedGDP1      -0.728 -0.059 -0.048  0.083 -0.057 -0.002 -0.308                     
FedBalance1   0.461  0.250  0.020  0.208  0.036 -0.081  0.445 -0.887              
govcredit1   -0.570 -0.233 -0.095 -0.291 -0.261  0.072  0.068  0.666 -0.784       
ForeignDebt1 -0.475  0.132  0.006 -0.092  0.093  0.219  0.057  0.059 -0.045  0.227
UGAP1        -0.048 -0.193 -0.062  0.085 -0.447  0.150  0.090  0.045  0.064 -0.053
OGAP1        -0.029  0.092  0.295  0.062 -0.208 -0.024  0.053 -0.021  0.013  0.056
             FrgnD1 UGAP1 
yc1                       
CP1                       
FF1                       
default1                  
Support1                  
ER1                       
FedGDP1                   
FedBalance1               
govcredit1                
ForeignDebt1              
UGAP1        -0.016       
OGAP1         0.064  0.041

Standardized residuals:
        Min          Q1         Med          Q3         Max 
-3.08026826 -0.62589269 -0.08409222  0.39781537  3.24233325 

Residual standard error: 0.3634024 
Degrees of freedom: 106 total; 93 residual

After you perform this step you have to refit the residuals in order to get serially uncorrelated terms.  

4. Fit the GLS estimated residuals to an AR(p) process using the Yule-Walker Method and use the estimated p's as the final parameter estimates for the error term.  

> s1glsres.ar<-ar.yw(srp1.gls$res)
> s1glsres.ar

Call:
ar.yw.default(x = srp1.gls$res)

Coefficients:
     1  
0.3718  

Order selected 1  sigma^2 estimated as  0.1163 

In order to see the results of these actions please refer to the image below.

The Ljung-Box Q is a test for autocorrelation between the lags in the error terms. Ideally, in order to meet the BLUE criteria, the residuals should show no autocorrelation at any lag; in other words, we should fail to reject the Ljung-Box null hypothesis of no autocorrelation.  We can see that both our original OLS and GLS estimations fail to pass the Ljung-Box Q. However, when we re-adjust the error terms the final time, we get residuals that are serially uncorrelated with each other.

In order to get the above graph you have to first load the package that will allow you to perform the Ljung-Box Q plot:

> library(FitAR)

Then you can proceed from there and define how many plots should be in one picture.  In the above image we have 9 therefore:

> par(mfrow=c(3,3))

Then you can start adding in your plots.  Below is the code for producing the plots for the fitted GLS residuals.

> acf(s1glsres.ar$res[-(1)])
> pacf(s1glsres.ar$res[-(1)])
> LBQPlot(s1glsres.ar$res[-(1)])

We include the [-(1)] to exclude the first observation, since the AR(1) fit leaves no residual for it.
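
A formal version of the same check is the Ljung-Box test on the re-fitted residuals (a sketch; the choice of 20 lags is arbitrary):

> Box.test(s1glsres.ar$res[-(1)], lag = 20, type = "Ljung-Box")   # H0: no autocorrelation up to lag 20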

The same steps above can be applied to any time-series regression model.  In the next post we will discuss how to get some summary statistics.  Please keep dancin'

Steven J.





Monday, December 26, 2011

Monetary Policy and Credit Easing

Here at the dancing economist, we wish to educate our followers on the finer points of economics, and this includes econometrics and using R. R, as mentioned previously, is free statistical software that enables regular people like us to do high-end economics research. Recently, I wrote a paper on how the Federal Reserve's actions have impacted both short-term and long-term risk premiums. In the next few blog posts I will be posting sections of the paper along with the R code necessary to perform the statistical analysis involved. One interesting result is that the Fed's balance sheet, although not previously manipulated, was heavily involved in reducing long-term risk premia over the period from 1971 to 1997. The methodology in the paper involved performing a Generalized Least Squares procedure and accounting for residual correlation to meet the assumptions stated by the Gauss-Markov Theorem. More will follow,


Keep Dancing,

Steven J.

Sunday, September 4, 2011

Ladies and Gents: GDP has finally gotten its long awaited forecast

Today we will finally create our long-awaited GDP forecast.  In order to create it we have to combine the forecast from our deterministic trend model with the forecast from our de-trended GDP model.

Our model for the trend is:

trendyx = 892.656210 - 30.365580*x + 0.335586*x2

where x2=x^2
and we make the index vector run out to the 278th observation:

> x=c(1:278)

and our model for the cyclical de-trended series is from an AR(10) process:

GDP.fit<-arima(dt,order=c(10,0,0),include.mean=FALSE)

So let's say we want to predict GDP 21 periods into the future. Type in the following for the cyclical forecast:

> GDP.pred<-predict(GDP.fit,n.ahead=21)

Now when we produce our forecast we can't just add trendyx + GDP.pred$pred because the vector lengths won't match. To see this use the length() function:


> length(trendyx)
[1] 278
> length(GDP.pred$pred)
[1] 21

In order to fix this problem we are going to remove the first 257 observations from trendyx so that we only have the last 21 left:


> true.trend<-trendyx[-c(1:257)]
> length(true.trend)
[1] 21
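
An equivalent and slightly more direct way to keep only the last 21 trend values is tail() (just a sketch of an alternative):

> true.trend <- tail(trendyx, 21)   # keeps the final 21 observations of the trend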

Now we can plot away without any technical difficulties:

> plot(GDP,type="l",xlim=c(40,75),ylim=c(5000,18500),main="GDP Predictions")

> lines(GDP.pred$pred+true.trend,col="blue")
> lines((GDP.pred$pred+true.trend)-2*GDP.pred$se,col="red")
> lines((GDP.pred$pred+true.trend)+2*GDP.pred$se,col="red")

This code results in the following plot:

The blue line represents our point forecast and the red lines represent our 95% Confidence Level interval forecast.  I feel like the plot could be significantly cooler and therefore at its current appearance receives a 2 out of 10 for style.  It's bland, the x-axis doesn't have dates and there's not even any background color. If this plot had a name it would be doodoo.  A war must be fought against the army of lame plots.  Epic battles will proceed. Plots will be lost. Only one victor will stand.


Keep Dancin',

Steven J.


Friday, September 2, 2011

Assessing the Forecasting Ability of Our Model

Today we wish to see how our model would have fared forecasting the past 20 values of GDP. Why? Well, ask yourself this: how can you know where you're going if you don't know where you've been? Once you understand, please proceed with the following post.

First recall the trend portion that we have already accounted for:


> t=(1:258)
> t2=t^2
> trendy = 892.656210 - 30.365580*t + 0.335586*t2

And the de-trended series is just that: the series minus the trend.

dt=GDP-trendy


As the following example demonstrates, if we assess the model with a forecast of the de-trended series alone, we may come across some discouraging results:


> test.data<-dt[-c(239:258)]
> true.data<-dt[-c(1:238)]
> forecast.data<-predict(arima(test.data,order=c(10,0,0),include.mean=FALSE),n.ahead=20)$pred

Now we want to plot the forecast data vs. the actual values of the forecasted de-trended series to get a sense of whether this is accurate or not.

> plot(true.data,forecast.data)
> plot(true.data,forecast.data,main="True Data vs. Forecast data")


Clearly it appears as though there is little to no accuracy in the forecast of our de-trended model alone.  In fact, a linear regression of the true data on the forecast data makes this perfectly clear.

> reg.model<-lm(true.data~forecast.data)
> summary(reg.model)

Call:
lm(formula = true.data ~ forecast.data)

Residuals:
   Min     1Q Median     3Q    Max
-684.0 -449.0 -220.8  549.4  716.8

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)   -2244.344   2058.828  -1.090    0.290
forecast.data     2.955      2.568   1.151    0.265

Residual standard error: 540.6 on 18 degrees of freedom
Multiple R-squared: 0.06851, Adjusted R-squared: 0.01676
F-statistic: 1.324 on 1 and 18 DF,  p-value: 0.265


> anova(reg.model)
Analysis of Variance Table

Response: true.data
              Df  Sum Sq Mean Sq F value Pr(>F)
forecast.data  1  386920  386920  1.3238  0.265
Residuals     18 5260913  292273


Now is a good time not to be discouraged, but rather encouraged to add the trend back to our forecast.  When we run a linear regression of GDP on the trend we quickly realize that 99.7% of the variance in GDP can be accounted for by the trend.


> reg.model2<-lm(GDP~trendy)
> summary(reg.model2)

Call:
lm(formula = GDP ~ trendy)

Residuals:
    Min      1Q  Median      3Q     Max
-625.43 -165.76  -36.73  163.04  796.33

Coefficients:
             Estimate Std. Error t value Pr(>|t|)  
(Intercept)  0.001371  21.870246     0.0        1  
trendy       1.000002   0.003445   290.3   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 250.6 on 256 degrees of freedom
Multiple R-squared: 0.997, Adjusted R-squared: 0.997
F-statistic: 8.428e+04 on 1 and 256 DF,  p-value: < 2.2e-16


In the end we would have had to account for the trend anyway, so it just makes sense to use it when testing our model's accuracy.

> test.data1<-dt[-c(239:258)]  

# Important note is that the "-c(239:258)" includes everything except those particular 20 observations #

> true.data1<-dt[-c(1:238)]
> true.data2<-trendy[-c(1:238)]
> forecast.data1<-predict(arima(test.data1,order=c(10,0,0),include.mean=FALSE),n.ahead=20)$pred
> forecast.data2<-(true.data2)

> forecast.data3<-(forecast.data1+forecast.data2)
> true.data3<-(true.data1+true.data2)

Don't forget to plot your data:

> plot(true.data3,forecast.data3,main="True Values vs. Predicted Values")



...and regress the actual data on the forecasted data:

> reg.model3<-lm(true.data3~forecast.data3)
> summary(reg.model3)

Call:
lm(formula = true.data3 ~ forecast.data3)

Residuals:
   Min     1Q Median     3Q    Max 
-443.5 -184.2   16.0  228.3  334.8 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    8.104e+03  1.141e+03   7.102 1.28e-06 ***
forecast.data3 4.098e-01  7.657e-02   5.352 4.37e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 264.8 on 18 degrees of freedom
Multiple R-squared: 0.6141, Adjusted R-squared: 0.5926 
F-statistic: 28.64 on 1 and 18 DF,  p-value: 4.366e-05 
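
Beyond regressing the actual values on the predictions, a couple of simple error metrics summarize forecast accuracy in the same units as GDP (a sketch using the objects defined above):

> sqrt(mean((true.data3 - forecast.data3)^2))   # root mean squared error
> mean(abs(true.data3 - forecast.data3))        # mean absolute error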

Looking at the plot and the regression results, I feel like this model is pretty accurate, considering that this is a point forecast and not an interval forecast.  Next time on the Dancing Economist we will plot the forecasts into the future with 95% confidence intervals. Until then-

Keep Dancin'

Steven J





Thursday, September 1, 2011

Forecasting In R: A New Hope with AR(10)

In our last post we determined that the ARIMA(2,2,2) model was just plain not going to work for us.  Although I didn't show it, its residuals failed to pass the acf and pacf tests for white noise, and the mean of its residuals was greater than three when it should have been much closer to zero.
Today we discover that an AR(10) fit to the de-trended GDP series may be the best option at hand.  Normally when we do model selection we start with the model that has the lowest AIC and then proceed to test the error terms (or residuals) for white noise. Let's take a look at the model specs for the AR(10):


> model7<-arima(dt,order=c(10,0,0))

> model7

Call:
arima(x = dt, order = c(10, 0, 0))

Coefficients:
         ar1      ar2      ar3     ar4      ar5     ar6      ar7      ar8     ar9     ar10
      1.5220  -0.4049  -0.2636  0.2360  -0.2132  0.1227  -0.0439  -0.0958  0.3244  -0.2255
s.e.  0.0604   0.1105   0.1131  0.1143   0.1154  0.1147   0.1139   0.1127  0.1111   0.0627
      intercept
       -21.6308
s.e.    57.5709

sigma^2 estimated as 1452:  log likelihood = -1307.76,  aic = 2639.52
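
For reference, the order-selection idea described above can be sketched by fitting a range of candidate AR orders and comparing their AIC values (a rough sketch; the range of orders tried is arbitrary, and some fits may fail to converge):

> aics <- sapply(1:12, function(p) AIC(arima(dt, order = c(p, 0, 0))))
> which.min(aics)   # candidate order with the lowest AIC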

The Ljung-Box Q test checks out to the 20th lag: 

> Box.test(model7$res,lag=20,type="Ljung-Box")

Box-Ljung test

data:  model7$res 
X-squared = 15.0909, df = 20, p-value = 0.7712

It even checks out to the 30th lag! I changed the way I plotted the Ljung-Box Q values after finding this little function called "LBQPlot", which makes life way easier.

> LBQPlot(model7$res,lag.max=30,StartLag=1)

Most importantly, both the ACF and the PACF of the residuals check out for a white noise process.  In the ARIMA(2,2,2) model these weren't even close to what we wanted them to be.

> par(mfrow=c(2,1))
> acf(model7$res)
> pacf(model7$res)



Unfortunately, our residuals continue to fail the formal tests for normality. I don't really know what to do about this or even what the proper explanation is, but I have a feeling that these tests are very sensitive: with a couple hundred observations, even modest departures from normality (fat tails or a few outliers) will produce tiny p-values.

> jarque.bera.test(model7$res)

Jarque Bera Test

data:  model7$res 
X-squared = 7507.325, df = 2, p-value < 2.2e-16

> shapiro.test(model7$res)

Shapiro-Wilk normality test

data:  model7$res 
W = 0.7873, p-value < 2.2e-16

The mean is also considerably closer to 0 but just not quite there.

> mean(model7$res)
[1] 0.7901055

Take a look at the plot for normality:

> qqnorm(model7$res)
> qqline(model7$res)

In the next post we are going to test how good our model actually is.  Today we found our optimal choice in terms of model specs, but we should also see how well it can forecast past values of GDP.  In addition to evaluating our model's past accuracy we will also practice forecasting into the future.  As you continue to read these posts, you should be getting significantly better with R - I know I am!  We have covered many new commands that come from many different libraries. I would like to keep doing analysis in R, so after we finish forecasting GDP I think I may move on to some econometrics. Please keep forecasting and most certainly keep dancin',

Steven J.