vgam {VGAM}    R Documentation

Fitting Vector Generalized Additive Models

Description

Fit a vector generalized additive model (VGAM). This is a large class of models that includes generalized additive models (GAMs) and vector generalized linear models (VGLMs) as special cases.

Usage

vgam(formula, family, data = list(), weights = NULL, subset = NULL, 
     na.action = na.fail, etastart = NULL, mustart = NULL, 
     coefstart = NULL, control = vgam.control(...), offset = NULL, 
     method = "vgam.fit", model = FALSE, x.arg = TRUE, y.arg = TRUE, 
     contrasts = NULL, constraints = NULL, 
     extra = list(), qr.arg = FALSE, smart = TRUE, ...)

Arguments

In the following, M is the number of additive predictors.

formula a symbolic description of the model to be fit. The RHS of the formula is applied to each linear/additive predictor. Different variables in each linear/additive predictor can be chosen by specifying constraint matrices.
family a function of class "vglmff" (see vglmff-class) describing what statistical model is to be fitted. This is called a "VGAM family function". See CommonVGAMffArguments for general information about many types of arguments found in this type of function.
data an optional data frame containing the variables in the model. By default the variables are taken from environment(formula), typically the environment from which vgam is called.
weights an optional vector or matrix of (prior) weights to be used in the fitting process. If weights is a matrix, then it must be in matrix-band form, whereby the first M columns of the matrix are the diagonals, followed by the upper-diagonal band, followed by the band above that, etc. In this case, there can be up to M(M+1)/2 columns, with the last column corresponding to the (1,M) elements of the weight matrices.
subset an optional logical vector specifying a subset of observations to be used in the fitting process.
na.action a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The "factory-fresh" default is na.omit.
etastart starting values for the linear/additive predictors. It is an M-column matrix. If M=1 then it may be a vector.
mustart starting values for the fitted values. It can be a vector or a matrix. Some family functions do not make use of this argument.
coefstart starting values for the coefficient vector.
control a list of parameters for controlling the fitting process. See vgam.control for details.
offset a vector or M-column matrix of offset values. These are a priori known and are added to the linear/additive predictors during fitting.
method the method to be used in fitting the model. The default (and presently only) method vgam.fit uses iteratively reweighted least squares (IRLS).
model a logical value indicating whether the model frame should be assigned in the model slot.
x.arg, y.arg logical values indicating whether the model matrix and response vector/matrix used in the fitting process should be assigned in the x and y slots. Note that the model matrix is the LM model matrix; to get the VGAM model matrix, type model.matrix(vgamfit), where vgamfit is a vgam object.
contrasts an optional list. See the contrasts.arg of model.matrix.default.
constraints an optional list of constraint matrices. Each component of the list must be named after the term it corresponds to (the name must match the term exactly as a character string). Each constraint matrix must have M rows, and be of full-column rank. By default, constraint matrices are the M by M identity matrix unless arguments in the family function itself override these values. If constraints is used it must contain all the terms; an incomplete list is not accepted.
extra an optional list with any extra information that might be needed by the VGAM family function.
qr.arg logical value indicating whether the QR decomposition of the VLM model matrix is stored in the qr slot of the returned object.
smart logical value indicating whether smart prediction (smartpred) will be used.
... further arguments passed into vgam.control.
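
As a rough illustration of the matrix-band form described for the weights argument, the base R sketch below flattens one symmetric 3 x 3 weight matrix into a single row of such a weights matrix. The helper to_band is hypothetical (it is not part of VGAM) and the numbers are made up; the code only assumes the ordering stated above: the main diagonal first, then each successive band, ending with the (1,M) element.

```r
# Hypothetical helper (not a VGAM function): flatten one symmetric
# M x M weight matrix into matrix-band order.
to_band <- function(W) {
  M <- nrow(W)
  out <- numeric(0)
  for (b in 0:(M - 1)) {            # b = 0 is the main diagonal
    rows <- seq_len(M - b)
    out <- c(out, W[cbind(rows, rows + b)])   # elements (i, i + b)
  }
  out
}

# Made-up symmetric weight matrix for a single observation, M = 3
W <- matrix(c(4.0, 1.0, 0.5,
              1.0, 3.0, 0.2,
              0.5, 0.2, 2.0), nrow = 3, byrow = TRUE)

band <- to_band(W)
band            # diagonal, then the (1,2),(2,3) band, then the (1,3) element
length(band)    # number of distinct elements of a symmetric 3 x 3 matrix
```

In a real weights matrix, each observation would contribute one such row.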

Details

A vector generalized additive model (VGAM) is loosely defined as a statistical model that is a function of M additive predictors. The central formula is given by

eta_j = sum_{k=1}^p f_{(j)k}(x_k)

where x_k is the kth explanatory variable (almost always x_1=1 for the intercept term), and f_{(j)k} are smooth functions of x_k that are estimated by smoothers. The first term in the summation is just the intercept. Currently only one type of smoother is implemented and this is called a vector (cubic smoothing spline) smoother. Here, j=1,...,M where M is finite. If all the functions are constrained to be linear then the resulting model is a vector generalized linear model (VGLM). VGLMs are best fitted with vglm.
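
Which of the M predictors a term enters is governed by its constraint matrix (the constraints argument). The following base R sketch, for M = 2 and purely linear terms, only mimics that bookkeeping and does not call VGAM: each term contributes its constraint matrix times its coefficients to the vector of predictors. The variable names x2 and x3, the list coefs and all numeric values are hypothetical.

```r
M <- 2

# One constraint matrix per term; each has M rows and full column rank.
constraints <- list(
  "(Intercept)" = diag(M),                 # separate intercept per predictor
  "x2"          = matrix(1, M, 1),         # same x2 effect in both predictors
  "x3"          = matrix(c(1, 0), M, 1)    # x3 enters eta_1 only
)

# Hypothetical coefficients: one per column of each constraint matrix.
coefs <- list("(Intercept)" = c(-1, 0.5), "x2" = 2, "x3" = -3)

# M x p matrix whose k-th column is (constraint matrix) %*% (coefficients)
B <- sapply(names(constraints),
            function(nm) drop(constraints[[nm]] %*% coefs[[nm]]))
B

# Predictors eta_1 and eta_2 for one made-up observation
x <- c("(Intercept)" = 1, x2 = 0.4, x3 = 1.5)
eta <- drop(B %*% x)
eta
```

With the identity matrix every predictor gets its own coefficient, a column of ones forces a common effect, and a zero row removes the term from that predictor.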

Vector (cubic smoothing spline) smoothers are represented by s() (see s). Local regression via lo() is not supported. The results of vgam will differ from the S-PLUS and R gam function (in the gam R package) because vgam uses a different knot selection algorithm. In general, fewer knots are chosen because the computation becomes expensive when the number of additive predictors M is large.

The underlying algorithm of VGAMs combines iteratively reweighted least squares (IRLS) with modified vector backfitting using vector splines. B-splines are used as the basis functions for the vector (smoothing) splines. vgam.fit is the function that actually does the work. The smoothing code is based on F. O'Sullivan's BART code.
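
To give a feel for the IRLS part only, here is a minimal base R sketch for the simplest special case: ordinary logistic regression with a single linear predictor (M = 1) and no smoothing or backfitting. It is not the code in vgam.fit; the function name irls_logistic and the simulated data are assumptions made for illustration, and the result is compared with glm as a sanity check.

```r
# Minimal IRLS for logistic regression (M = 1, linear terms only).
# Illustrative only; vgam.fit handles vector responses and smoothers.
irls_logistic <- function(X, y, maxit = 25, tol = 1e-10) {
  beta <- rep(0, ncol(X))
  for (iter in seq_len(maxit)) {
    eta <- drop(X %*% beta)               # current linear predictor
    mu  <- 1 / (1 + exp(-eta))            # fitted probabilities
    w   <- mu * (1 - mu)                  # working weights
    z   <- eta + (y - mu) / w             # working (adjusted) response
    # Weighted least squares step: solve (X'WX) beta = X'Wz
    beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (converged) break
  }
  unname(beta)
}

set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(-0.5 + 1.2 * x))   # simulated binary response
X <- cbind(1, x)

beta_irls <- irls_logistic(X, y)
beta_glm  <- unname(coef(glm(y ~ x, family = binomial)))
cbind(beta_irls, beta_glm)
```

In a VGAM the weighted least squares step is replaced by a backfitting pass over vector smoothers, but the outer loop of recomputing weights and a working response has the same shape.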

A closely related methodology based on VGAMs called constrained additive ordination (CAO) first forms a linear combination of the explanatory variables (called latent variables) and then fits a GAM to these. This is implemented in the function cao for a very limited choice of family functions.

Value

An object of class "vgam" (see vgam-class for further information).

Note

This function can fit a wide variety of statistical models, some of which are harder to fit than others because of inherent numerical difficulties. Successful model fitting benefits from cumulative experience. If difficulties arise, a good first step is to vary the values of arguments in the VGAM family function itself, especially any that supply initial values. A second, more general step is to vary the values of arguments in vgam.control. A third step is to make use of arguments such as etastart, coefstart and mustart.
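
The second and third steps might look like the sketch below, which assumes the VGAM package and its hunua data are installed. maxit and trace are vgam.control arguments passed through the ... argument; reusing the additive predictors of an earlier fit as etastart is only one possible source of starting values, chosen here for convenience. The object names fit_a, fit_b and eta0 are arbitrary.

```r
library(VGAM)
data(hunua)

# Step 2: pass vgam.control() arguments through '...'
fit_a <- vgam(agaaus ~ s(altitude, df = 2), binomialff, data = hunua,
              maxit = 50, trace = TRUE)

# Step 3: supply starting values for the additive predictors.
# Here they are taken from the first fit (an n x M matrix, M = 1).
eta0 <- predict(fit_a)
fit_b <- vgam(agaaus ~ s(altitude, df = 3), binomialff, data = hunua,
              etastart = eta0)
```

Setting trace = TRUE prints the progress of the iterations, which helps to judge whether a change of starting values or control settings has any effect.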

Some VGAM family functions end in "ff" to avoid interference with other functions, e.g., binomialff, poissonff, gaussianff, gammaff. This is because VGAM family functions are incompatible with glm (and also gam in the gam library and gam in the mgcv library).

The smart prediction (smartpred) library is bundled with the VGAM package.

The theory behind the scaling parameter is currently being made more rigorous, but it should give the same value as the scale parameter for GLMs.

Author(s)

Thomas W. Yee

References

Yee, T. W. and Wild, C. J. (1996) Vector generalized additive models. Journal of the Royal Statistical Society, Series B, Methodological, 58, 481–493.

Documentation accompanying the VGAM package at http://www.stat.auckland.ac.nz/~yee contains further information and examples.

See Also

vgam.control, vgam-class, vglmff-class, plotvgam, vglm, s, vsmooth.spline, cao.

Examples

# Nonparametric proportional odds model 
data(pneumo)
pneumo = transform(pneumo, let=log(exposure.time))
vgam(cbind(normal,mild,severe) ~ s(let), cumulative(parallel=TRUE), pneumo)

# Nonparametric logistic regression 
data(hunua) 
fit = vgam(agaaus ~ s(altitude, df=2), binomialff, hunua)
## Not run: 
plot(fit, se=TRUE)
## End(Not run)

# Fit two species simultaneously 
fit2 = vgam(cbind(agaaus, kniexc) ~ s(altitude, df=c(2,3)),
            binomialff(mv=TRUE), hunua)
coef(fit2, mat=TRUE)   # Not really interpretable 
## Not run: 
plot(fit2, se=TRUE, overlay=TRUE, lcol=1:2, scol=1:2)
# with() avoids attach()/detach(), which can mask other objects
o = with(hunua, order(altitude))
with(hunua, matplot(altitude[o], fitted(fit2)[o,], type="l", lwd=2, las=1,
    xlab="Altitude (m)", ylab="Probability of presence",
    main="Two plant species' response curves", ylim=c(0,.8)))
with(hunua, rug(altitude))
## End(Not run)

[Package VGAM version 0.7-7 Index]