multinomial {VGAM}    R Documentation

Multinomial Logit Model

Description

Fits a multinomial logit model to an unordered factor response.

Usage

multinomial(zero = NULL, parallel = FALSE, nointercept = NULL)

Arguments

In the following, the response Y is assumed to be a factor with unordered values 1,2,...,M+1, so that M is the number of linear/additive predictors eta_j.

zero An integer-valued vector specifying which linear/additive predictors are modelled as intercepts only. The values must be from the set {1,2,...,M}. The default value means none are modelled as intercept-only terms.
parallel A logical, or formula specifying which terms have equal/unequal coefficients.
nointercept An integer-valued vector specifying which linear/additive predictors have no intercepts. The values must be from the set {1,2,...,M}.

Details

The model can be written

eta_j = log(P[Y=j] / P[Y=M+1])

where eta_j is the jth linear/additive predictor. Here, j=1,...,M and eta_{M+1} is 0 by definition. That is, the last level of the factor, or last column of the response matrix, is taken as the reference level or baseline—this is for identifiability of the parameters.
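The definition above can be inverted to recover the M+1 probabilities from the M linear predictors. The following is a minimal base-R sketch that does not use VGAM; the helper name eta2prob and the numeric values of eta are hypothetical and chosen only for illustration.

```r
# Hypothetical helper (not part of VGAM): convert M linear predictors
# into M+1 probabilities, taking the last level as baseline (eta_{M+1} = 0).
eta2prob <- function(eta) {
  eta.all <- c(eta, 0)               # append eta_{M+1} = 0
  exp(eta.all) / sum(exp(eta.all))   # P[Y=j], j = 1,...,M+1
}

eta <- c(0.5, -1.2)                  # illustrative values only, M = 2
p <- eta2prob(eta)
p                                    # M + 1 = 3 probabilities summing to 1
log(p[1:2] / p[3])                   # recovers eta
```

Because eta_{M+1} is fixed at 0, only M predictors are needed to describe M+1 probabilities.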

In almost all the literature, the constraint matrices associated with this family of models are known. For example, setting parallel=TRUE will make all constraint matrices (except for the intercept) equal to a vector of M 1's. If the constraint matrices are unknown and to be estimated, then this can be achieved by fitting the model as a reduced-rank vector generalized linear model (RR-VGLM; see rrvglm). In particular, a multinomial logit model with unknown constraint matrices is known as a stereotype model (Anderson, 1984), and can be fitted with rrvglm.
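As a rough illustration of what a known constraint matrix does, the base-R sketch below contrasts the two standard choices for a single covariate. The object names (H.free, H.parallel) and coefficient values are assumptions made for this example and are not taken from VGAM; for a real fit, constraints() shows the matrices actually used.

```r
M <- 3                               # number of linear/additive predictors

H.free     <- diag(M)                # parallel = FALSE: M free coefficients
H.parallel <- matrix(1, M, 1)        # parallel = TRUE: a vector of M 1's

beta.free     <- c(0.4, -0.1, 0.9)   # illustrative values only
beta.parallel <- 0.4                 # a single shared coefficient

H.free %*% beta.free                 # coefficient of the covariate in eta_1,...,eta_M
H.parallel %*% beta.parallel         # the same coefficient in every eta_j
```

In a stereotype model the analogue of H.parallel is not fixed in advance but estimated, which is what rrvglm does.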

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm, rrvglm and vgam.

Warning

The arguments zero and nointercept can be given values that cause the fit to fail. For example, multinomial(zero=2, nointercept=1:3) makes the second linear/additive predictor intercept-only and also removes its intercept, so it is identically zero, which will cause a failure.

Be careful about the use of other potentially contradictory constraints, e.g., multinomial(zero=2, parallel = TRUE ~ x3). If in doubt, apply constraints() to the fitted object to check.

No check is made to verify that the response is nominal.

Note

The response should be either a matrix of counts (with row sums that are all positive), or a factor. In both cases, the y slot returned by vglm/vgam/rrvglm is the matrix of counts.
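The two input forms carry the same information when each row is a single observation. The base-R sketch below builds a 0/1 matrix of counts from a made-up factor; it only illustrates the correspondence and is not a description of what vglm does internally.

```r
f <- factor(c("a", "c", "b", "a", "c"))   # illustrative unordered factor
ymat <- model.matrix(~ f - 1)             # one indicator column per level
colnames(ymat) <- levels(f)
ymat                                      # 5 x 3 matrix of 0/1 counts
rowSums(ymat)                             # every row sum is 1, hence positive
```

The last column of ymat corresponds to the last level of f, which is the baseline used by multinomial.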

The multinomial logit model is more appropriate for a nominal (unordered) factor response. For an ordinal (ordered) factor response, models such as those based on cumulative probabilities (see cumulative) are more suited.

multinomial is prone to numerical difficulties if the groups are separable and/or the fitted probabilities are close to 0 or 1. The fitted values returned are estimates of the probabilities P[Y=j] for j=1,...,M+1.

Here is an example of the usage of the parallel argument. If there are covariates x1, x2 and x3, then parallel = TRUE ~ x1 + x2 -1 and parallel = FALSE ~ x3 are equivalent. This would constrain the regression coefficients for x1 and x2 to be equal; those of the intercepts and x3 would be different.

In Example 4 below, a conditional logit model is fitted to an artificial data set that explores how cost and travel time affect people's decisions about how to travel to work. Walking is the baseline group. The variable Cost.car is the difference between the cost of travel to work by car and walking, etc. The variable Durn.car is the difference between the travel duration/time to work by car and walking, etc. For other details about the xij argument see vglm.control and fill.

The multinom function in the nnet package uses the first level of the factor as baseline, whereas the last level of the factor is used here. Consequently the estimated regression coefficients differ.
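The two parameterizations describe the same probabilities, so one set of linear predictors can be converted to the other by subtraction. The base-R sketch below uses made-up values for M = 2 and does not call either package; since the predictors are linear in the coefficients, the same subtraction applies coefficient by coefficient.

```r
eta.last  <- c(0.5, -1.2)                # log(P[Y=j]/P[Y=3]), j = 1, 2 (last level as baseline)
eta.all   <- c(eta.last, 0)              # append the baseline, eta_3 = 0
eta.first <- eta.all[-1] - eta.all[1]    # log(P[Y=j]/P[Y=1]), j = 2, 3 (first level as baseline)

p <- exp(eta.all) / sum(exp(eta.all))    # probabilities implied by eta.last
eta.first
log(p[2:3] / p[1])                       # agrees with eta.first
```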

Author(s)

Thomas W. Yee

References

Yee, T. W. and Hastie, T. J. (2003) Reduced-rank vector generalized linear models. Statistical Modelling, 3, 15–41.

McCullagh, P. and Nelder, J. A. (1989) Generalized Linear Models, 2nd ed. London: Chapman & Hall.

Agresti, A. (2002) Categorical Data Analysis, 2nd ed. New York: Wiley.

Simonoff, J. S. (2003) Analyzing Categorical Data. New York: Springer-Verlag.

Anderson, J. A. (1984) Regression and ordered categorical variables. Journal of the Royal Statistical Society, Series B, Methodological, 46, 1–30.

Documentation accompanying the VGAM package at http://www.stat.auckland.ac.nz/~yee contains further information and examples.

See Also

acat, cumulative, cratio, sratio, dirichlet, dirmultinomial, rrvglm, Multinomial, iris.

Examples

# Example 1: fit a multinomial logit model to Edgar Anderson's iris data
data(iris)
## Not run: 
fit = vglm(Species ~ ., multinomial, iris)
coef(fit, matrix=TRUE) 
## End(Not run)

# Example 2a: a simple example 
# Counts; prob does not sum to 1 but rmultinom() normalizes it internally
y = t(rmultinom(10, size = 20, prob=c(0.1,0.2,0.8)))
fit = vglm(y ~ 1, multinomial)
fitted(fit)[1:4,]   # Fitted probabilities
fit@prior.weights # Not recommended for extraction of prior weights
weights(fit, type="prior", matrix=FALSE) # The better method
fit@y   # Sample proportions
constraints(fit)   # Constraint matrices

# Example 2b: Different input to Example 2a but same result
w = apply(y, 1, sum) # Prior weights
yprop = y / w    # Sample proportions
fitprop = vglm(yprop ~ 1, multinomial, weights=w)
fitted(fitprop)[1:4,]   # Fitted probabilities
weights(fitprop, type="prior", matrix=FALSE)
fitprop@y # Same as the input

# Example 3: Fit a rank-1 stereotype model 
data(car.all)
fit = rrvglm(Country ~ Width + Height + HP, multinomial, car.all, Rank=1)
coef(fit)   # Contains the C matrix
constraints(fit)$HP     # The A matrix 
coef(fit, matrix=TRUE)  # The B matrix
Coef(fit)@C             # The C matrix 
ccoef(fit)              # Better to get the C matrix this way
Coef(fit)@A             # The A matrix 
svd(coef(fit, matrix=TRUE)[-1,])$d    # One nonzero singular value: C %*% t(A) has rank 1

# Example 4: The use of the xij argument (conditional logit model)
set.seed(111)
n = 100  # Number of people who travel to work
M = 3  # There are M+1 modes of transport
ymat = matrix(0, n, M+1)
ymat[cbind(1:n, sample(x=M+1, size=n, replace=TRUE))] = 1
dimnames(ymat) = list(NULL, c("bus","train","car","walk"))
transport = data.frame(cost.bus=runif(n), cost.train=runif(n),
                       cost.car=runif(n), cost.walk=runif(n),
                       durn.bus=runif(n), durn.train=runif(n),
                       durn.car=runif(n), durn.walk=runif(n))
transport = round(transport, digits=2) # For convenience
transport = transform(transport,
                      Cost.bus   = cost.bus   - cost.walk,
                      Cost.car   = cost.car   - cost.walk,
                      Cost.train = cost.train - cost.walk,
                      Durn.bus   = durn.bus   - durn.walk,
                      Durn.car   = durn.car   - durn.walk,
                      Durn.train = durn.train - durn.walk)
fit = vglm(ymat ~ Cost.bus + Cost.train + Cost.car + 
                  Durn.bus + Durn.train + Durn.car,
           family = multinomial,
           xij = list(Cost ~ Cost.bus + Cost.train + Cost.car,
                      Durn ~ Durn.bus + Durn.train + Durn.car),
           data=transport)
model.matrix(fit, type="lm")[1:7,]   # LM model matrix
model.matrix(fit, type="vlm")[1:7,]  # Big VLM model matrix
coef(fit)
coef(fit, matrix=TRUE)
coef(fit, matrix=TRUE, compress=FALSE)
summary(fit)

[Package VGAM version 0.7-7 Index]