3.1.6. Independence of Residuals
Independence of errors was tested to check whether the responses of different cases were independent of each other. If errors are not independent, logistic regression can exhibit overdispersion (greater variability than the model assumes). It was therefore necessary to test this assumption.
Independence of errors in logistic regression assumes a between-subjects design. The plot of residuals against lagged residuals in Figure 2 showed a slight pattern among the errors, indicating that the variance was not entirely random. The severity of overdispersion was therefore assessed using the Pearson and Deviance statistics. The Pearson statistic was 923.544 and the Deviance statistic was 886.319. The coefficient of variation (CV) of the two values was 0.029, obtained from

$$CV = \frac{s}{\bar{x}}, \qquad \bar{x} = \frac{X^2 + D}{2}, \qquad s = \sqrt{\frac{(X^2 - \bar{x})^2 + (D - \bar{x})^2}{2 - 1}}$$

where $X^2$ is the Pearson statistic, $D$ the Deviance statistic, $\bar{x}$ their mean and $s$ their sample standard deviation. Since the CV is close to zero, the discrepancy between the Pearson and Deviance statistics is low (923.544 - 886.319 = 37.225). This suggests that overdispersion was not a problem.
Figure 2: Residual Lag Plot
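The CV reported in this section can be reproduced from the two statistics with a short Python sketch. It assumes that the CV uses the sample standard deviation (n - 1 denominator), which is the choice that yields the reported 0.029; with the population standard deviation the value would be about 0.021.

```python
from statistics import mean, stdev

# Pearson and Deviance statistics reported in the text.
pearson = 923.544
deviance = 886.319
values = [pearson, deviance]

# CV = sample standard deviation / mean; statistics.stdev uses the n - 1 denominator.
cv = stdev(values) / mean(values)
discrepancy = pearson - deviance

print(f"CV = {cv:.3f}")                   # CV = 0.029
print(f"Difference = {discrepancy:.3f}")  # Difference = 37.225
```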
3.2. Assessment of Contribution of Variables in the Model
This section discusses the importance of the variables to the model. Two inferential tests were used: tests of the overall model and tests of the individual predictors.
3.2.1. Goodness of Fit of Models
First, the independent variables were tested by comparing the constant-only model with the full model (constant and all variables) to find out whether they contributed to the prediction of the outcome. The two models were compared using the -2 log-likelihood (equations 8 and 9), the Akaike Information Criterion (AIC) (equation 10) and the chi-square statistic.
All these tests are used to determine which model best approximates reality given the data set. The aim of comparing models was to identify the model that loses the least information. A statistically significant difference between the models indicates a relationship between the predictors and the outcome [15].
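The log-likelihood is calculated by summing, over all cases, the logarithms of the probabilities that the model assigns to the observed outcomes:

$$LL = \sum_{i=1}^{N} \left[ Y_i \ln P(Y_i) + (1 - Y_i) \ln\big(1 - P(Y_i)\big) \right] \tag{8}$$

where $Y_i$ is the observed outcome (0 or 1) for case $i$ and $P(Y_i)$ is the predicted probability that $Y_i = 1$. The log-likelihood is multiplied by -2 to obtain a statistic that is approximately chi-square distributed:

$$-2LL = -2 \sum_{i=1}^{N} \left[ Y_i \ln P(Y_i) + (1 - Y_i) \ln\big(1 - P(Y_i)\big) \right] \tag{9}$$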
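A minimal Python sketch of the log-likelihood and -2LL computations described above follows. The outcomes and predicted probabilities are hypothetical values chosen only for illustration; they are not data from this study.

```python
import math

def log_likelihood(y, p):
    """Equation 8: sum over cases of Y*ln(P) + (1 - Y)*ln(1 - P)."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))

# Hypothetical observed outcomes (1 = event, 0 = no event) and predicted
# probabilities P(Y = 1); illustration only, not data from this study.
y = [1, 0, 1]
p = [0.8, 0.3, 0.6]

ll = log_likelihood(y, p)
neg2ll = -2 * ll  # equation 9
print(f"LL = {ll:.4f}, -2LL = {neg2ll:.4f}")  # LL = -1.0906, -2LL = 2.1813
```

Predicted probabilities closer to the observed outcomes push LL towards 0 and therefore lower -2LL, which is why a smaller -2LL indicates a better fit.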
AIC was developed from the Kullback-Leibler Information (KLI), which represents the information lost when a model is used to approximate reality [8]. AIC links the maximum likelihood to the KLI and is defined as

$$AIC = -2LL + 2K \tag{10}$$

where $K$ is the number of estimated parameters included in the model (variables and constant).
The output showed that the AIC value was 1,189.846 for the baseline model (the model containing only the constant) and 923.125 for the full model (the model containing the constant and the variables). Since the full model had the lower value, it was the better-fitting model. The -2 log-likelihood was 1,187.846 for the baseline model compared with 885.125 for the full model. As with AIC, a lower value indicates a better fit, so the full model fitted the data better than the baseline model.
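The reported AIC and -2LL values can be checked against AIC = -2LL + 2K with the Python sketch below. Inverting the formula gives K = 1 for the baseline model (the constant alone) and K = 19 for the full model. That K exceeds the thirteen parameters of twelve predictors plus a constant, which would be consistent with some predictors being dummy-coded categorical variables, although this section does not say so.

```python
def aic(neg2ll, k):
    """AIC = -2LL + 2K (equation 10), where K is the number of estimated parameters."""
    return neg2ll + 2 * k

def implied_k(aic_value, neg2ll):
    """Invert AIC = -2LL + 2K to recover K from a reported (AIC, -2LL) pair."""
    return (aic_value - neg2ll) / 2

# AIC and -2LL values reported in the text.
models = {
    "baseline": {"aic": 1189.846, "neg2ll": 1187.846},
    "full": {"aic": 923.125, "neg2ll": 885.125},
}

for name, m in models.items():
    k = round(implied_k(m["aic"], m["neg2ll"]))
    print(f"{name}: K = {k}, AIC recomputed = {aic(m['neg2ll'], k):.3f}")

# The preferred model is the one with the lower AIC.
best = min(models, key=lambda name: models[name]["aic"])
print("lower AIC:", best)  # lower AIC: full
```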
Furthermore, the baseline model correctly classified 59.1% of cases. Because the baseline model contains only the constant, it assigns every case to the more frequent outcome category, so this figure reflects the distribution of the outcome rather than the contribution of any predictor; it serves as the benchmark against which the full model is judged. The omnibus test of model coefficients further showed that the chi-square statistic was significant at the 0.05 level. This indicates a significant difference between the log-likelihoods of the baseline model and the new (full) model.
Because the difference was significant, the new model was an improvement: it significantly reduced the -2LL compared with the baseline model, meaning that more of the variance in the outcome variable is explained by the new model. The Nagelkerke R² suggests that the model explains about 39% of the variation in the outcome. The Hosmer-Lemeshow statistic indicated a good fit (for this test, a significant result indicates that the model is not a good fit, while a non-significant result indicates a good fit). Together, these tests show that the new model adequately fits the data. The classification accuracy of the new model also improved, to 73.6% compared with 59.1% for the baseline model.
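The omnibus chi-square and the Nagelkerke R² can both be computed from the two reported -2LL values, as in the Python sketch below. Two inputs are assumptions rather than reported figures: the degrees of freedom (18) are inferred from the parameter counts implied by AIC = -2LL + 2K (19 for the full model, 1 for the baseline), and the sample size n = 878 is an illustrative placeholder, since n is not given in this section (a value of roughly this size is consistent with the baseline -2LL and the 59.1% baseline accuracy, but that is an inference). The even-df closed form avoids a SciPy dependency; `scipy.stats.chi2.sf` would give the same tail probability.

```python
import math

def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom.

    Closed form: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)**i / i! incrementally
        total += term
    return math.exp(-half) * total

def nagelkerke_r2(neg2ll_null, neg2ll_model, n):
    """Cox & Snell R^2 divided by its maximum attainable value."""
    cox_snell = 1 - math.exp(-(neg2ll_null - neg2ll_model) / n)
    max_cox_snell = 1 - math.exp(-neg2ll_null / n)
    return cox_snell / max_cox_snell

neg2ll_baseline, neg2ll_full = 1187.846, 885.125  # reported in the text

# Likelihood-ratio (omnibus) chi-square: the reduction in -2LL.
chi2 = neg2ll_baseline - neg2ll_full
# ASSUMPTION: df = 19 - 1 = 18, from the parameter counts implied by AIC.
df = 18
p_value = chi2_sf_even_df(chi2, df)

# ASSUMPTION: the sample size is not reported here; 878 is illustrative only.
n = 878
r2 = nagelkerke_r2(neg2ll_baseline, neg2ll_full, n)
print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value:.1e}, Nagelkerke R2 = {r2:.3f}")
```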
3.2.2. Tests of Individual Variables
For a model to be useful, it must, among other things, contain variables that make a sufficient contribution; otherwise, the results it generates will be doubtful. There are several ways of testing whether the variables included in a model make the required contribution. One of the most common is the p-value.
Despite being widely used, p-values have several shortcomings. They simply provide a cut-off for deciding whether or not to reject a null hypothesis. The authors of [2] argue that non-significant results do not imply that there is no effect, and that statistically significant results do not necessarily imply that the effect is of practical importance. The importance of variables in a model is therefore determined by the size of their effects rather than by statistical significance.
Effect sizes, of which the standardized mean difference is one example, can be measured in various ways, such as the absolute risk reduction, relative risk reduction, relative risk and odds ratio [2]. The odds ratio is used for binary or categorical outcomes [9].
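To make these effect-size measures concrete, the Python sketch below computes them from a 2×2 table. The table layout and the counts are hypothetical, chosen only for illustration; the risk reductions are defined so that they are positive when the exposed group has the lower risk.

```python
def effect_sizes(a, b, c, d):
    """Effect sizes from a 2x2 table laid out as

                   event   no event
        exposed      a        b
        unexposed    c        d

    Risk reductions are positive when the exposed group has the lower risk.
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    relative_risk = risk_exposed / risk_unexposed
    return {
        "absolute_risk_reduction": risk_unexposed - risk_exposed,
        "relative_risk": relative_risk,
        "relative_risk_reduction": 1 - relative_risk,
        "odds_ratio": (a * d) / (b * c),
    }

# Hypothetical counts, for illustration only.
for name, value in effect_sizes(20, 80, 40, 60).items():
    print(f"{name}: {value:.3f}")
```

With these counts the odds ratio (0.375) and the relative risk (0.5) differ noticeably because the event is common; the two measures only approximate each other when the event is rare in both groups.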
The variables in the model were examined on the basis of their p-values and odds ratios. Thereafter, propensity scores were generated to observe whether there was any significant difference. Table 6 shows that, of the twelve variables included in the model, seven were significant (p < 0.05): sex (p = 0.002), marital status (p < 0.001), household size (p = 0.029), distance to corn farm (p = 0.001), distance to district (p < 0.001), distance to tarmac (p = 0.023) and participation in other PFG (p < 0.001). The significance of the coefficients was based on the Wald test, that is,

$$W_j = \left( \frac{b_j}{SE(b_j)} \right)^2$$

where $b_j$ is the estimated coefficient of variable $j$ and $SE(b_j)$ its standard error; under the null hypothesis $b_j = 0$, $W_j$ approximately follows a chi-square distribution with 1 degree of freedom.
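The Wald statistic and its p-value can be sketched in Python as follows. The coefficient and standard error are hypothetical, since the coefficients in Table 6 are not reproduced in this section. The p-value uses the identity that, for 1 degree of freedom, the chi-square upper tail equals erfc(sqrt(W/2)), which is the two-sided normal p-value for z = b/SE.

```python
import math

def wald_test(b, se):
    """Wald chi-square W = (b / SE)^2 on 1 df, with its p-value."""
    w = (b / se) ** 2
    # For 1 df, P(chi-square > W) = erfc(sqrt(W / 2)), i.e. the two-sided
    # normal p-value for z = b / SE.
    p_value = math.erfc(math.sqrt(w / 2))
    return w, p_value

# Hypothetical coefficient and standard error, for illustration only.
w, p_value = wald_test(0.5, 0.2)
print(f"Wald = {w:.2f}, p = {p_value:.4f}")  # Wald = 6.25, p = 0.0124
print("significant at 0.05:", p_value < 0.05)
```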
The odds ratios (OR) show that, of the twelve variables, seven had values greater than or equal to 1: sex (OR = 1.672), age (OR = 1.000), marital status (OR = 2.557), household size (OR = 1.073), land (OR = 1.020), distance to district (OR = 1.023) and weather (OR = 1.072). An OR above 1 indicates a positive contribution, that is, higher odds of the outcome as the variable increases (or relative to the reference category), whereas an OR of exactly 1, as for age, indicates no change in the odds.
The results show that statistical significance and an odds ratio of 1 or above do not necessarily go together. Only four variables were both significant and had odds ratios above 1, namely sex, marital status, household size and distance to district. The significant variables are shown in bold in Table 6.
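The odds ratios and significance results listed above can be cross-checked with the Python sketch below. It converts each reported odds ratio to its logistic coefficient (b = ln OR) and percentage change in odds, and intersects the significant variables with those whose odds ratio exceeds 1. The variable names are copied from the text; the percentage change reads as "per one-unit increase" for continuous variables and "relative to the reference category" for categorical ones.

```python
import math

# Variables reported as significant (p < 0.05) and odds ratios >= 1, from the text.
significant = {
    "sex", "marital status", "household size", "distance to corn farm",
    "distance to district", "distance to tarmac", "participation in other PFG",
}
odds_ratios_at_least_one = {
    "sex": 1.672, "age": 1.000, "marital status": 2.557,
    "household size": 1.073, "land": 1.020,
    "distance to district": 1.023, "weather": 1.072,
}

def describe(or_value):
    """Return b = ln(OR), the % change in odds, and the direction of the effect."""
    b = math.log(or_value)
    pct_change = (or_value - 1) * 100
    if or_value > 1:
        direction = "higher odds"
    elif or_value < 1:
        direction = "lower odds"
    else:
        direction = "no change in odds"
    return b, pct_change, direction

for name, or_value in odds_ratios_at_least_one.items():
    b, pct, direction = describe(or_value)
    print(f"{name:22s} b = {b:+.3f}  change in odds = {pct:+.1f}%  ({direction})")

# Variables that are both significant and have an OR above 1.
both = {name for name in significant
        if odds_ratios_at_least_one.get(name, 0) > 1}
print(sorted(both))
# ['distance to district', 'household size', 'marital status', 'sex']
```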