Using and interpreting output from gvlma

PCUnique · Apr 6, 2017

I want to test whether all the assumptions of my linear regression model hold. I checked them manually and everything seemed fine, but I want to double-check with the function gvlma. The output I get is:

    gvlma(x = m_lag)

                        Value p-value                   Decision
    Global Stat        82.475 0.00000 Assumptions NOT satisfied!
    Skewness           72.378 0.00000 Assumptions NOT satisfied!
    Kurtosis            1.040 0.30778    Assumptions acceptable.
    Link Function       6.029 0.01407 Assumptions NOT satisfied!
    Heteroscedasticity  3.027 0.08187    Assumptions acceptable.

My questions are:

  1. How do I interpret the Global Stat?

  2. Since that assumption is violated, what can I do about it? (The same goes for the other two assumptions that were not satisfied.)

Answer

kamran kausar · Dec 8, 2017
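  1. Global Stat - Do all of the assumptions below hold jointly? The Global Stat is the sum of the four component statistics (72.378 + 1.040 + 6.029 + 3.027 ≈ 82.475 in your output) and is compared against a chi-squared distribution with 4 degrees of freedom. Rejection of the null (p < .05) indicates that at least one assumption is violated; the rows below it tell you which one(s).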

  2. Skewness - Is the distribution of your residuals skewed positively or negatively, so that a transformation is needed to meet the assumption of normality? Rejection of the null (p < .05) indicates that you should probably transform your data (for example, take the log of a strictly positive, right-skewed response).

  3. Kurtosis - Is the distribution of your residuals kurtotic (sharply peaked with heavy tails, or flat with light tails), so that a transformation is needed to meet the assumption of normality? Rejection of the null (p < .05) indicates that you should probably transform your data.

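  4. Link Function - Is the linear model's form appropriate for your dependent variable (is Y truly continuous rather than categorical, and is its relationship with the X's linear)? Rejection of the null (p < .05) indicates that the model form is likely misspecified: consider transforming variables or adding nonlinear terms, or, if Y is categorical, an alternative form of the generalized linear model (e.g. logistic regression for a binary outcome).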

  5. Heteroscedasticity - Is the variance of your model residuals constant across the range of X (the assumption of homoscedasticity)? Rejection of the null (p < .05) indicates that your residuals are heteroscedastic, i.e. their variance changes across the range of X, so your model predicts better for some ranges of X than for others.
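To see how the rows of the table fit together, the p-values can be reproduced in base R from the printed statistics alone. This is a minimal sketch, assuming (as the printed p-values suggest) that each component is referred to a chi-squared distribution with 1 degree of freedom, the Global Stat to one with 4, and that a 0.05 cut-off drives the Decision column. It rebuilds the table from the question's numbers rather than calling gvlma itself.

```r
# Component statistics copied from the gvlma output in the question
component <- c(
  Skewness           = 72.378,
  Kurtosis           = 1.040,
  `Link Function`    = 6.029,
  Heteroscedasticity = 3.027
)

# Assumption: each component is compared with a chi-squared distribution, 1 df
p_component <- pchisq(component, df = 1, lower.tail = FALSE)

# The Global Stat is the sum of the four components, compared with 4 df
global   <- sum(component)
p_global <- pchisq(global, df = 4, lower.tail = FALSE)

# Assumption: the Decision column uses a 0.05 significance level
alpha    <- 0.05
value    <- c(`Global Stat` = global, component)
p        <- c(p_global, p_component)
decision <- ifelse(p < alpha, "Assumptions NOT satisfied!", "Assumptions acceptable.")

# Rebuild the table in the same layout as the gvlma printout
print(data.frame(Value = round(value, 3), `p-value` = round(p, 5),
                 Decision = decision, check.names = FALSE))
```

Up to rounding (the inputs were already rounded to three decimals, so the sum comes out as 82.474), this reproduces the Value, p-value and Decision columns from the question. It also makes question 1 concrete: the Global Stat fails here mostly because of the Skewness component.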
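For the Skewness and Kurtosis rows, a quick way to see what a transformation does is to compare the shape of the residuals before and after it. The sketch below uses simulated, hypothetical data with multiplicative errors (the `m_lag` data are not available), hand-written moment estimators `skewness()` and `excess_kurtosis()` (defined here, not taken from a package), and a log transform of the response as the example remedy. Whether a log is the right transformation for your own data is an assumption you would need to check.

```r
set.seed(42)
n <- 300
x <- runif(n, 0, 2)
# Hypothetical data: log-normal (multiplicative) errors make raw residuals right-skewed
y <- exp(1 + 0.8 * x + rnorm(n, sd = 0.5))

# Moment estimators of sample skewness and excess kurtosis (0 for a normal sample)
skewness <- function(e) { e <- e - mean(e); mean(e^3) / mean(e^2)^1.5 }
excess_kurtosis <- function(e) { e <- e - mean(e); mean(e^4) / mean(e^2)^2 - 3 }

fit_raw <- lm(y ~ x)        # untransformed response
fit_log <- lm(log(y) ~ x)   # log-transformed response (requires y > 0)

# Compare the residual shape of the two fits
shape <- rbind(
  raw = c(skew = skewness(resid(fit_raw)), kurt = excess_kurtosis(resid(fit_raw))),
  log = c(skew = skewness(resid(fit_log)), kurt = excess_kurtosis(resid(fit_log)))
)
print(round(shape, 2))
```

The residuals of the log fit should show skewness and excess kurtosis close to 0, while the raw fit's residuals are clearly right-skewed. `qqnorm(resid(fit_log)); qqline(resid(fit_log))` shows the same thing visually.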
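gvlma's heteroscedasticity statistic is its own directional test, but a widely used independent check of the same assumption is Koenker's studentized Breusch-Pagan test: regress the squared residuals on the predictors and compare n * R^2 with a chi-squared distribution. The sketch below implements that swapped-in test in base R on simulated, hypothetical data whose error spread grows with x. `bp_test()` is a helper defined here, and it assumes the first column of the model matrix is the intercept.

```r
set.seed(7)
n <- 300
x <- runif(n, 1, 10)
# Hypothetical data: the error SD grows with x, so the residuals fan out
y <- 2 + 0.5 * x + rnorm(n, sd = 0.3 * x)

# Koenker's studentized Breusch-Pagan test for an lm fit
bp_test <- function(fit) {
  e2  <- resid(fit)^2
  X   <- model.matrix(fit)
  Z   <- X[, -1, drop = FALSE]   # drop the intercept column; lm() adds its own
  aux <- lm(e2 ~ Z)              # auxiliary regression of squared residuals
  stat <- nrow(X) * summary(aux)$r.squared
  df   <- ncol(Z)
  c(statistic = stat, df = df, p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

bp_hetero <- bp_test(lm(y ~ x))

# For contrast: the same design with constant error variance
y_const  <- 2 + 0.5 * x + rnorm(n, sd = 1.5)
bp_const <- bp_test(lm(y_const ~ x))

print(rbind(hetero = bp_hetero, constant = bp_const))
```

If your data fail a check like this, common remedies are a variance-stabilizing transformation of Y, weighted least squares via the `weights` argument of `lm()`, or heteroscedasticity-robust standard errors (for example `sandwich::vcovHC()` together with `lmtest::coeftest()`). `lmtest::bptest()` provides a packaged version of this test.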
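When the Link Function row fails for a continuous response, one follow-up is to check whether the straight-line form is adequate. The sketch below uses a RESET-style check (a swapped-in technique, not what gvlma computes internally): add the squared fitted values to the model and test with `anova()` whether they improve the fit. The data are simulated and hypothetical, with a deliberately curved relationship.

```r
set.seed(3)
n <- 200
x <- runif(n, 0, 3)
# Hypothetical data: the true mean is quadratic in x, so a straight line is misspecified
y <- 1 + 0.5 * x + 0.8 * x^2 + rnorm(n)

fit_lin <- lm(y ~ x)

# RESET-style check: does a squared fitted-value term add explanatory power?
yhat2     <- fitted(fit_lin)^2
fit_reset <- lm(y ~ x + yhat2)
reset     <- anova(fit_lin, fit_reset)
print(reset)
p_reset <- reset$`Pr(>F)`[2]

# One possible remedy: model the curvature explicitly
fit_quad <- lm(y ~ x + I(x^2))
print(summary(fit_quad)$coefficients)
```

With a single predictor, the squared fitted values carry the same information as an x^2 term; with several predictors, the extra term tests for curvature in the linear predictor as a whole. `lmtest::resettest()` offers a packaged version of this test.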