vgam {VGAM} | R Documentation |
Fit a vector generalized additive model (VGAM). This is a large class of models that includes generalized additive models (GAMs) and vector generalized linear models (VGLMs) as special cases.
vgam(formula, family, data = list(), weights = NULL, subset = NULL, na.action = na.fail, etastart = NULL, mustart = NULL, coefstart = NULL, control = vgam.control(...), offset = NULL, method = "vgam.fit", model = FALSE, x.arg = TRUE, y.arg = TRUE, contrasts = NULL, constraints = NULL, extra = list(), qr.arg = FALSE, smart = TRUE, ...)
In the following, M is the number of additive predictors.
formula |
a symbolic description of the model to be fit.
The RHS of the formula is applied to each linear/additive predictor.
Different
variables in each linear/additive predictor can be chosen by specifying
constraint matrices.
|
family |
a function of class "vglmff" (see vglmff-class)
describing what statistical model is to be fitted. This is called a
"VGAM family function". See CommonVGAMffArguments
for general information about many types of arguments found in this
type of function.
|
data |
an optional data frame containing the variables in the model.
By default the variables are taken from
environment(formula), typically the environment from which
vgam is called.
|
weights |
an optional vector or matrix of (prior) weights
to be used in the fitting process.
If weights is a matrix, then it must be in
matrix-band form, whereby the first M
columns of the matrix are the
diagonals, followed by the upper-diagonal band, followed by the
band above that, etc. In this case, there can be up to M(M+1)/2
columns, with the last column corresponding to the (1,M) elements
of the weight matrices.
|
subset |
an optional logical vector specifying a subset of
observations to
be used in the fitting process.
|
na.action |
a function which indicates what should happen when
the data contain NAs.
The default is set by the na.action setting
of options, and is na.fail if that is unset.
The "factory-fresh" default is na.omit.
|
etastart |
starting values for the linear/additive predictors.
It is an M-column matrix. If M=1 then it may be a vector.
|
mustart |
starting values for the
fitted values. It can be a vector or a matrix.
Some family functions do not make use of this argument.
|
coefstart |
starting values for the coefficient vector.
|
control |
a list of parameters for controlling the fitting process.
See vgam.control for details.
|
offset |
a vector or M-column matrix of offset values.
These are a priori known and are added to the linear/additive
predictors during fitting.
|
method |
the method to be used in fitting the model.
The default (and presently only) method vgam.fit
uses iteratively reweighted least squares (IRLS).
|
model |
a logical value indicating whether the model frame should be
assigned in the model slot.
|
x.arg, y.arg |
logical values indicating whether the model matrix and response
vector/matrix used in the fitting process should be assigned in the
x and y slots. Note that the model matrix is the LM model
matrix; to get the VGAM model matrix, call model.matrix(vgamfit),
where vgamfit is a vgam object.
|
contrasts |
an optional list. See the contrasts.arg of
model.matrix.default.
|
constraints |
an optional list of constraint matrices. Each component of the list
must be named after the term it corresponds to (the name must match the
term's character representation exactly). Each constraint matrix must
have M rows and be of full column rank. By default, constraint matrices
are the M by M identity matrix unless arguments in the family function
itself override these values. If constraints is used it must
contain all the terms; an incomplete list is not accepted.
|
extra |
an optional list with any extra information that might be needed by
the VGAM family function.
|
qr.arg |
logical value indicating whether the slot qr, which holds
the QR decomposition of the VLM model matrix, is returned on the object.
|
smart |
logical value indicating whether smart prediction
(smartpred) will be used.
|
... |
further arguments passed into vgam.control.
|
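The matrix-band layout described for the weights argument can be illustrated in base R. The following is a minimal sketch, not VGAM code: to_band is a hypothetical helper written only for this illustration, and the 3 by 3 matrix holds made-up values. It packs one symmetric M by M matrix into the order diagonal, first upper band, second upper band, and so on, so the final element is the (1,M) entry.

```r
# Hypothetical helper (not part of VGAM): pack one symmetric M x M matrix
# into matrix-band order: diagonal first, then each successive upper band.
to_band <- function(W) {
  M <- nrow(W)
  out <- numeric(0)
  for (b in 0:(M - 1)) {            # b = 0 is the diagonal, b = 1 the first upper band
    rows <- seq_len(M - b)
    out <- c(out, W[cbind(rows, rows + b)])  # elements (i, i + b)
  }
  out
}

# Made-up symmetric matrix with M = 3
W <- matrix(c(4.0, 1.0, 0.5,
              1.0, 3.0, 0.2,
              0.5, 0.2, 2.0), nrow = 3, byrow = TRUE)

band <- to_band(W)
band           # 4, 3, 2 (diagonal), then 1, 0.2 (first band), then 0.5 (the (1,3) element)
length(band)   # 6
```

A symmetric M by M matrix has M(M+1)/2 distinct entries, which is why the packed vector for M = 3 has 6 elements. In a weights matrix passed to vgam, each row would hold one observation's matrix packed this way.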
A vector generalized additive model (VGAM) is loosely defined as a statistical model that is a function of M additive predictors. The central formula is given by
eta_j = sum_{k=1}^p f_{(j)k}(x_k)
where x_k is the kth explanatory variable
(almost always x_1=1 for the intercept term),
and
f_{(j)k} are smooth functions of x_k that are estimated
by smoothers. The first term in the summation is just the intercept.
Currently only one type of smoother is
implemented and this is called a vector (cubic smoothing spline)
smoother.
Here, j=1,...,M where M is finite.
If all the functions are constrained to be linear then the resulting
model is a vector generalized linear model (VGLM).
VGLMs are best fitted with vglm.
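In the linear (VGLM) case, the role of constraint matrices in deciding which variables enter which predictor can be sketched in base R. This is an illustration under stated assumptions rather than VGAM's internal code: the variable names x2 and x3, the coefficient values in beta_star, and the three-row design matrix are all hypothetical. Each term k contributes x_k times H_k %*% beta_star_k to the vector of M predictors, where H_k is that term's constraint matrix.

```r
# Hypothetical values throughout; nothing here calls VGAM.
M <- 2

# One constraint matrix per term, named after the term
H <- list(
  "(Intercept)" = diag(M),                 # separate intercept for each predictor
  x2            = matrix(1, M, 1),         # same slope in both predictors
  x3            = matrix(c(1, 0), M, 1)    # x3 enters eta_1 only
)

# Each constraint matrix must have M rows and full column rank
ok <- vapply(H, function(Hk) nrow(Hk) == M && qr(Hk)$rank == ncol(Hk),
             logical(1))

# Made-up reduced coefficients, one vector per term
beta_star <- list("(Intercept)" = c(-1, 0.5), x2 = 2, x3 = -3)

# Made-up design matrix with three observations
X <- cbind("(Intercept)" = 1, x2 = c(0, 1, 2), x3 = c(1, 1, 0))

# Accumulate the n x M matrix of linear predictors term by term
eta <- matrix(0, nrow(X), M)
for (k in names(H)) {
  eta <- eta + X[, k] %o% drop(H[[k]] %*% beta_star[[k]])
}
eta   # column 1: -4, -2, 3; column 2: 0.5, 2.5, 4.5
```

The identity matrix leaves a term free to differ across predictors, a column of ones forces a common effect, and a column with a zero removes the term from that predictor.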
Vector (cubic smoothing spline) smoothers are represented
by s() (see s).
Local regression via lo() is not supported.
The results of vgam will differ from the S-PLUS and R gam function
(in the gam R package) because vgam
uses a different knot selection algorithm. In general, fewer knots
are chosen because the computation becomes expensive when the number
of additive predictors M is large.
The underlying algorithm of VGAMs is iteratively
reweighted least squares (IRLS) and modified vector backfitting
using vector splines. B-splines are used as the basis functions
for the vector (smoothing) splines.
vgam.fit is the function that actually does the work.
The smoothing code is based on F. O'Sullivan's BART code.
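The IRLS idea mentioned above can be sketched in base R for the simplest case: one linear predictor (M = 1), a logit link, and no smooth terms. This is not vgam.fit and does not reproduce its vector backfitting; irls_logistic is a hypothetical function written for this illustration, and the data are simulated. Each iteration forms working weights and a working response from the current fit, then solves a weighted least squares problem.

```r
# Minimal IRLS for logistic regression (M = 1, linear predictor only).
# Hypothetical illustration; not the algorithm as implemented in vgam.fit.
irls_logistic <- function(X, y, maxit = 25, tol = 1e-8) {
  beta <- rep(0, ncol(X))
  for (iter in seq_len(maxit)) {
    eta <- drop(X %*% beta)           # current linear predictor
    mu  <- 1 / (1 + exp(-eta))        # inverse logit
    w   <- mu * (1 - mu)              # working weights
    z   <- eta + (y - mu) / w         # working (adjusted) response
    # Weighted least squares step: solve (X' W X) beta = X' W z
    beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (converged) break
  }
  beta
}

# Simulated data
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, size = 1, prob = plogis(-0.5 + 1.2 * x))
X <- cbind(1, x)

beta_irls <- irls_logistic(X, y)
beta_glm  <- unname(coef(glm(y ~ x, family = binomial)))

# The two sets of estimates should agree to within numerical tolerance
all.equal(unname(beta_irls), beta_glm, tolerance = 1e-5)
```

In an additive model the weighted least squares step is replaced by a weighted smoothing step, which is where backfitting comes in.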
A closely related methodology based on VGAMs called
constrained additive ordination (CAO)
first forms a linear combination of the explanatory variables
(called latent variables) and then fits a GAM to these.
This is implemented in the function cao for a very
limited choice of family functions.
An object of class "vgam"
(see vgam-class for further information).
This function can fit a wide variety of statistical models. Some of
these are harder to fit than others because of inherent numerical
difficulties associated with some of them. Successful model fitting
benefits from cumulative experience. Varying the values of arguments
in the VGAM family function itself is a good first step if
difficulties arise, especially if initial values can be inputted.
A second, more general step is to vary the values of arguments in
vgam.control.
A third step is to make use of arguments such as etastart,
coefstart and mustart.
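As a rough illustration of the second step, control settings can be passed through the ... argument of vgam, which forwards them to vgam.control. The sketch below reuses the hunua model from the examples on this page and assumes that the installed version of VGAM provides the hunua data and that vgam.control accepts the trace and maxit arguments; check the vgam.control help page for the arguments available in your version.

```r
# Assumes the VGAM package is installed; the code is skipped otherwise.
fit_ctrl <- NULL
if (requireNamespace("VGAM", quietly = TRUE)) {
  library(VGAM)
  data("hunua", package = "VGAM")

  # trace = TRUE prints progress at each iteration;
  # maxit raises the limit on the number of iterations.
  fit_ctrl <- vgam(agaaus ~ s(altitude, df = 2), binomialff,
                   data = hunua, trace = TRUE, maxit = 50)
}
```

Watching the trace output is often the quickest way to see whether a fit is converging slowly, oscillating, or failing outright.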
Some VGAM family functions end in "ff" to avoid
interference with other functions, e.g., binomialff,
poissonff, gaussianff, gammaff.
This is because VGAM family
functions are incompatible with glm
(and also gam in the gam library and
gam in the mgcv library).
The smart prediction (smartpred) library is packed with
the VGAM library.
The theory behind the scaling parameter is currently being made more rigorous, but it should give the same value as the scale parameter for GLMs.
Thomas W. Yee
Yee, T. W. and Wild, C. J. (1996) Vector generalized additive models. Journal of the Royal Statistical Society, Series B, Methodological, 58, 481–493.
Documentation accompanying the VGAM package at http://www.stat.auckland.ac.nz/~yee contains further information and examples.
vgam.control, vgam-class, vglmff-class, plotvgam, vglm, s, vsmooth.spline, cao.
# Nonparametric proportional odds model
data(pneumo)
pneumo = transform(pneumo, let=log(exposure.time))
vgam(cbind(normal,mild,severe) ~ s(let), cumulative(par=TRUE), pneumo)

# Nonparametric logistic regression
data(hunua)
fit = vgam(agaaus ~ s(altitude, df=2), binomialff, hunua)
## Not run: 
plot(fit, se=TRUE)
## End(Not run)

# Fit two species simultaneously
fit2 = vgam(cbind(agaaus, kniexc) ~ s(altitude, df=c(2,3)),
            binomialff(mv=TRUE), hunua)
coef(fit2, mat=TRUE)   # Not really interpretable
## Not run: 
plot(fit2, se=TRUE, overlay=TRUE, lcol=1:2, scol=1:2)
attach(hunua)
o = order(altitude)
matplot(altitude[o], fitted(fit2)[o,], type="l", lwd=2, las=1,
        xlab="Altitude (m)", ylab="Probability of presence",
        main="Two plant species' response curves", ylim=c(0,.8))
rug(altitude)
detach(hunua)
## End(Not run)