In a previous tutorial I went over two packages for ordered logits – polr() and vglm() – and walked through some of the basics for running models in both. In those models I focused on more classical cumulative ratio models, but now I am going to introduce continuation ratio models, also called a sequential model. A good review can be found in Chapters 4 and 5 of O’Connell’s Logistic Regression Models for Ordinal Response Variables. In short though, the added value is this: Unlike the more common cumulative ordered logistic regression (i.e. ordered logit), where the model is estimating the likelihood of being at or above a certain category (e.g. P(Y>=2)), a sequential logit evaluates the likelihood of being in a particular level against the likelihood of being in the level just below it (e.g. P[Y>2|Y>=2]).
Let’s see how we can run this models using the VGAM package.
library(VGAM)
Loading required package: stats4
Loading required package: splines
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
I’ll be using the same data set described in the previous tutorial, farmer interest in adopting a new technology.
head(df)
interest acres income education econ.value envt.value info.count
1 Not 4.84 NA NA 4.33 3.6 1
2 Somewhat 3.01 1 5 4.00 3.0 1
3 Somewhat 4.17 3 2 NA NA 1
4 <NA> 4.40 3 3 5.00 NA 3
5 Not 2.44 4 7 4.67 3.6 9
6 Very 6.64 2 7 2.00 4.2 5
If we wanted to run a more classic, cumulative model, the model specifications would be as follows, namely specifying our family as “cumulative”. Note we are also specifying that parallel = T for now, indicating that we assume that the slopes between stages are parallel, and that we reverse the order of our stages because vglm() oddly defaults to descending order.
Call:
vglm(formula = ordered(interest) ~ acres + income + education +
econ.value + envt.value + info.count, family = cumulative(parallel = T,
reverse = T), data = df)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept):1 0.88002 0.54529 1.614 0.10656
(Intercept):2 -1.01208 0.54711 -1.850 0.06433 .
acres 0.06972 0.06660 1.047 0.29517
income -0.18702 0.07119 -2.627 0.00861 **
education -0.02324 0.06035 -0.385 0.70024
econ.value -0.50212 0.10343 -4.855 1.20e-06 ***
envt.value 0.57938 0.09616 6.025 1.69e-09 ***
info.count 0.14728 0.03288 4.479 7.51e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Names of linear predictors: logitlink(P[Y>=2]), logitlink(P[Y>=3])
Residual deviance: 1002.928 on 1010 degrees of freedom
Log-likelihood: -501.4641 on 1010 degrees of freedom
Number of Fisher scoring iterations: 5
No Hauck-Donner effect found in any of the estimates
Exponentiated coefficients:
acres income education econ.value envt.value info.count
1.0722115 0.8294304 0.9770324 0.6052435 1.7849227 1.1586799
The switch to the sequential model takes just a change in the family, now specifying it as “cratio”, and no longer reversing the order.
seqntl.model <-vglm(ordered(interest) ~ acres + income + education + econ.value + envt.value + info.count, data = df, family=cratio(parallel = T))summary(seqntl.model)
Call:
vglm(formula = ordered(interest) ~ acres + income + education +
econ.value + envt.value + info.count, family = cratio(parallel = T),
data = df)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept):1 1.11877 0.49400 2.265 0.0235 *
(Intercept):2 -0.28988 0.49660 -0.584 0.5594
acres 0.05510 0.05989 0.920 0.3576
income -0.16453 0.06407 -2.568 0.0102 *
education -0.03270 0.05432 -0.602 0.5472
econ.value -0.46247 0.09476 -4.881 1.06e-06 ***
envt.value 0.49286 0.08635 5.708 1.14e-08 ***
info.count 0.13283 0.03000 4.427 9.54e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Names of linear predictors: logitlink(P[Y>1|Y>=1]), logitlink(P[Y>2|Y>=2])
Residual deviance: 1004.007 on 1010 degrees of freedom
Log-likelihood: -502.0035 on 1010 degrees of freedom
Number of Fisher scoring iterations: 5
No Hauck-Donner effect found in any of the estimates
Exponentiated coefficients:
acres income education econ.value envt.value info.count
1.0566490 0.8482910 0.9678279 0.6297238 1.6369976 1.1420599
Here we see that the results between the two models are quite similar, though the effect sizes are slightly smaller in the sequential model with less significance. The main difference is in the conditions of the linear predictors (as noted above). Deciding which family of ordered logits is right for your model may depend on model fit, or perhaps theoretically with the nature of the variable. If you believe that the stages you are evaluating are predicated on going through the previous stage, then a sequential model might be for you.
Also important to note in vglm() is that we cannot rely on the brant() function as we can with polr models. So, testing our proportional odds assumption (that is, the assumption that parallel = T), takes a bit of manual work. Let’s first specify the model without making the proportional odds assumption.
seqntl.model.nonprop =vglm(ordered(interest) ~ acres + income + education + econ.value + envt.value + info.count, data = df, family=cratio(parallel = F))summary(seqntl.model.nonprop)
Call:
vglm(formula = ordered(interest) ~ acres + income + education +
econ.value + envt.value + info.count, family = cratio(parallel = F),
data = df)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept):1 -0.883092 0.662736 -1.332 0.182698
(Intercept):2 2.230679 0.792002 2.817 0.004855 **
acres:1 0.093624 0.085864 1.090 0.275547
acres:2 0.006717 0.086027 0.078 0.937768
income:1 -0.223318 0.093201 -2.396 0.016572 *
income:2 -0.109974 0.091128 -1.207 0.227502
education:1 0.029837 0.076562 0.390 0.696746
education:2 -0.092454 0.079330 -1.165 0.243841
econ.value:1 -0.154035 0.129175 -1.192 0.233081
econ.value:2 -0.767367 0.144987 -5.293 1.21e-07 ***
envt.value:1 0.633108 0.114018 5.553 2.81e-08 ***
envt.value:2 0.248064 0.138626 1.789 0.073543 .
info.count:1 0.110310 0.045887 2.404 0.016220 *
info.count:2 0.153613 0.040627 3.781 0.000156 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Names of linear predictors: logitlink(P[Y>1|Y>=1]), logitlink(P[Y>2|Y>=2])
Residual deviance: 980.3759 on 1004 degrees of freedom
Log-likelihood: -490.188 on 1004 degrees of freedom
Number of Fisher scoring iterations: 5
No Hauck-Donner effect found in any of the estimates
Exponentiated coefficients:
acres:1 acres:2 income:1 income:2 education:1 education:2
1.0981470 1.0067392 0.7998604 0.8958571 1.0302870 0.9116913
econ.value:1 econ.value:2 envt.value:1 envt.value:2 info.count:1 info.count:2
0.8572417 0.4642337 1.8834549 1.2815419 1.1166244 1.1660390
A first good sign is that we did not encounter any errors. In some cases, vglm() cannot fit the non-proportional model if it really isn’t suited. If the model runs, however, we still want to check the assumption. Below is a method for testing whether or not the model overall meets the assumption (though it does not provide p-values for specific variables). We calculate the p-value against the hypothesis that the proportion odds assumption is correct.
dev <-deviance(seqntl.model) -deviance(seqntl.model.nonprop)degf <- seqntl.model@df.residual - seqntl.model.nonprop@df.residualp.value <-1-pchisq(q = dev , df = degf)p.value