cumulative {VGAM} | R Documentation |
Fits a cumulative logit/probit/cloglog/cauchit/... regression model to an ordered (preferably) factor response.
cumulative(link = "logit", earg = list(), parallel = FALSE, reverse = FALSE, mv = FALSE, intercept.apply = FALSE) scumulative(link="logit", earg = list(), lscale="loge", escale = list(), parallel=FALSE, sparallel=TRUE, reverse=FALSE, iscale = 1)
In the following, the response Y is assumed to be a factor
with ordered values 1,2,...,J+1.
M is the number of linear/additive predictors
eta_j;
for cumulative()
M=J,
and for scumulative()
M=2J.
link |
Link function applied to the J cumulative probabilities.
See Links for more choices.
|
lscale |
Link function applied to the J scaling parameters.
See Links for more choices.
|
earg, escale |
List. Extra argument for the respective link functions.
See earg in Links for general information.
|
parallel |
A logical or formula specifying which terms have
equal/unequal coefficients.
|
sparallel |
For the scaling parameters.
A logical, or formula specifying which terms have
equal/unequal coefficients.
This argument is not applied to the intercept.
The scumulative() function requires covariates; for
intercept models use cumulative() .
|
reverse |
Logical.
By default, the cumulative probabilities used are
P(Y<=1), P(Y<=2),
..., P(Y<=J).
If reverse is TRUE , then
P(Y>=2), P(Y>=3), ...,
P(Y>=J+1) will be used.
This should be set to TRUE for link=
golf ,
polf ,
nbolf .
For these links the cutpoints must be an increasing sequence;
if reverse=FALSE for then the cutpoints must be an decreasing sequence.
|
mv |
Logical.
Multivariate response? If TRUE then the input should be
a matrix with values 1,2,...,L, where L=J+1 is the
number of levels.
Each column of the matrix is a response, i.e., multivariate response.
A suitable matrix can be obtained from Cut .
|
intercept.apply |
Logical.
Whether the parallel argument should be applied to the intercept term.
This should be set to TRUE for link=
golf ,
polf ,
nbolf .
|
iscale |
Numeric. Initial values for the scale parameters.
|
By default, the non-parallel cumulative logit model is fitted, i.e.,
eta_j = logit(P[Y<=j])
where j=1,2,...,M and
the eta_j are not constrained to be parallel.
This is also known as the non-proportional odds model.
If the logit link is replaced by a complementary log-log link
(cloglog
) then
this is known as the proportional-hazards model.
In almost all the literature, the constraint matrices associated
with this family of models are known. For example, setting
parallel=TRUE
will make all constraint matrices (except for
the intercept) equal to a vector of M 1's.
If the constraint matrices are equal, unknown and to be estimated, then
this can be achieved by fitting the model as a
reduced-rank vector generalized
linear model (RR-VGLM; see rrvglm
).
Currently, reduced-rank vector generalized additive models
(RR-VGAMs) have not been implemented here.
The scaled version of cumulative()
, called scumulative()
,
has J positive scaling factors.
They are described in pages 154 and 177 of McCullagh and Nelder (1989);
see their equation (5.4) in particular,
which they call the generalized rational model.
An object of class "vglmff"
(see vglmff-class
).
The object is used by modelling functions such as vglm
,
rrvglm
and vgam
.
No check is made to verify that the response is ordinal;
see ordered
.
The response should be either a matrix of counts (with row sums that
are all positive), or a factor. In both cases, the y
slot
returned by vglm
/vgam
/rrvglm
is the matrix
of counts.
For a nominal (unordered) factor response, the multinomial
logit model (multinomial
) is more appropriate.
With the logit link, setting parallel=TRUE
will fit a
proportional odds model. Note that the TRUE
here does
not apply to the intercept term.
In practice, the validity of the proportional odds assumption
needs to be checked, e.g., by a likelihood ratio test (LRT).
If acceptable on the data,
then numerical problems are less likely to occur during the fitting,
and there are less parameters. Numerical problems occur when
the linear/additive predictors cross, which results in probabilities
outside of (0,1); setting parallel=TRUE
will help avoid
this problem.
Here is an example of the usage of the parallel
argument.
If there are covariates x1
, x2
and x3
, then
parallel = TRUE ~ x1 + x2 -1
and
parallel = FALSE ~ x3
are equivalent. This would constrain
the regression coefficients for x1
and x2
to be
equal; those of the intercepts and x3
would be different.
In the future, this family function may be renamed to
``cups
'' (for cumulative probabilities)
or ``cute
'' (for cumulative probabilities).
Thomas W. Yee
Agresti, A. (2002) Categorical Data Analysis, 2nd ed. New York: Wiley.
Dobson, A. J. (2001) An Introduction to Generalized Linear Models, 2nd ed. Boca Raton: Chapman & Hall/CRC Press.
McCullagh, P. and Nelder, J. A. (1989) Generalized Linear Models, 2nd ed. London: Chapman & Hall.
Simonoff, J. S. (2003) Analyzing Categorical Data, New York: Springer-Verlag.
Yee, T. W. and Wild, C. J. (1996) Vector generalized additive models. Journal of the Royal Statistical Society, Series B, Methodological, 58, 481–493.
Documentation accompanying the VGAM package at http://www.stat.auckland.ac.nz/~yee contains further information and examples.
acat
,
cratio
,
sratio
,
multinomial
,
pneumo
,
logit
,
probit
,
cloglog
,
cauchit
,
golf
,
polf
,
nbolf
.
# Fit the proportional odds model, p.179, in McCullagh and Nelder (1989) data(pneumo) pneumo = transform(pneumo, let=log(exposure.time)) (fit = vglm(cbind(normal, mild, severe) ~ let, cumulative(parallel=TRUE, reverse=TRUE), pneumo)) fit@y # Sample proportions weights(fit, type="prior") # Number of observations coef(fit, matrix=TRUE) constraints(fit) # Constraint matrices # Check that the model is linear in let ---------------------- fit2 = vgam(cbind(normal, mild, severe) ~ s(let, df=2), cumulative(reverse=TRUE), pneumo) ## Not run: plot(fit2, se=TRUE, overlay=TRUE, lcol=1:2, scol=1:2) ## End(Not run) # Check the proportional odds assumption with a LRT ---------- (fit3 = vglm(cbind(normal, mild, severe) ~ let, cumulative(parallel=FALSE, reverse=TRUE), pneumo)) 1 - pchisq(2*(logLik(fit3)-logLik(fit)), df=length(coef(fit3))-length(coef(fit))) # A factor() version of fit ---------------------------------- nobs = round(fit@y * c(weights(fit, type="prior"))) sumnobs = apply(nobs, 2, sum) mydat = data.frame( response = ordered(c(rep("normal", times=sumnobs["normal"]), rep("mild", times=sumnobs["mild"]), rep("severe", times=sumnobs["severe"])), levels = c("normal","mild","severe")), LET = c(with(pneumo, rep(let, times=nobs[,"normal"])), with(pneumo, rep(let, times=nobs[,"mild"])), with(pneumo, rep(let, times=nobs[,"severe"])))) (fit4 = vglm(response ~ LET, data=mydat, cumulative(parallel=TRUE, reverse=TRUE), trace=TRUE))