cao {VGAM} | R Documentation |
A constrained additive ordination (CAO) model is fitted using the reduced-rank vector generalized additive model (RR-VGAM) framework.
cao(formula, family, data = list(), weights = NULL, subset = NULL, na.action = na.fail, etastart = NULL, mustart = NULL, coefstart = NULL, control = cao.control(...), offset = NULL, method = "cao.fit", model = FALSE, x.arg = TRUE, y.arg = TRUE, contrasts = NULL, constraints = NULL, extra = NULL, qr.arg = FALSE, smart = TRUE, ...)
The arguments of cao
are a mixture of those from
vgam
and cqo
, but with some extras
in cao.control
. Currently, not all of the following
arguments work properly.
formula |
a symbolic description of the model to be fit. The RHS of the
formula is used to construct the latent variables, upon which the
smooths are applied. All the variables in the formula are used
for the construction of latent variables except for those specified
by the argument Norrr , which is itself a formula. The LHS
of the formula contains the response variables, which should be a
matrix with each column being a response (species).
|
family |
a function of class "vglmff" (see vglmff-class )
describing what statistical model is to be fitted. This is called a
``VGAM family function''. See CommonVGAMffArguments
for general information about many types of arguments found in this
type of function.
See cqo for a list of those presently implemented.
|
data |
an optional data frame containing the variables in the model.
By default the variables are taken from environment(formula) ,
typically the environment from which cao is called.
|
weights |
an optional vector or matrix of (prior) weights to be used in the
fitting process. For cao , this argument currently should
not be used.
|
subset |
an optional logical vector specifying a subset of observations to
be used in the fitting process.
|
na.action |
a function which indicates what should happen when the data contain
NA s. The default is set by the na.action setting of
options , and is na.fail if that is unset.
The ``factory-fresh'' default is na.omit .
|
etastart |
starting values for the linear predictors. It is a M-column
matrix. If M=1 then it may be a vector. For cao ,
this argument currently should not be used.
|
mustart |
starting values for the fitted values. It can be a vector or a
matrix. Some family functions do not make use of this argument.
For cao , this argument currently should not be used.
|
coefstart |
starting values for the coefficient vector. For cao , this
argument currently should not be used.
|
control |
a list of parameters for controlling the fitting process.
See cao.control for details.
|
offset |
a vector or M-column matrix of offset values. These are
a priori known and are added to the linear predictors during
fitting. For cao , this argument currently should not be used.
|
method |
the method to be used in fitting the model. The default
(and presently only) method cao.fit uses iteratively
reweighted least squares (IRLS) within FORTRAN code called from
optim .
|
model |
a logical value indicating whether the model frame should be
assigned in the model slot.
|
x.arg, y.arg |
logical values indicating whether the model matrix and response
vector/matrix used in the fitting process should be assigned in the
x and y slots. Note the model matrix is the linear
model (LM) matrix.
|
contrasts |
an optional list. See the contrasts.arg of
model.matrix.default .
|
constraints |
an optional list of constraint matrices. For cao , this
argument currently should not be used. The components of the list
must be named with the term it corresponds to (and it must match
in character format). Each constraint matrix must have M
rows, and be of full-column rank. By default, constraint matrices
are the M by M identity matrix unless arguments in the
family function itself override these values. If constraints
is used it must contain all the terms; an incomplete list is
not accepted.
|
extra |
an optional list with any extra information that might be needed by
the family function. For cao , this argument currently should
not be used.
|
qr.arg |
For cao , this argument currently should not be used.
|
smart |
logical value indicating whether smart prediction
(smartpred ) will be used.
|
... |
further arguments passed into cao.control .
|
CAO can be loosely be thought of as the result of fitting generalized
additive models (GAMs) to several responses (e.g., species) against
a very small number of latent variables. Each latent variable is a
linear combination of the explanatory variables; the coefficients
C (called C below) are called constrained
coefficients or canonical coefficients, and are interpreted as
weights or loadings. The C are estimated by maximum likelihood
estimation. It is often a good idea to apply scale
to each explanatory variable first.
For each response (e.g., species), each latent variable is smoothed
by a cubic smoothing spline, thus CAO is data-driven. If each smooth
were a quadratic then CAO would simplify to constrained quadratic
ordination (CQO; formerly called canonical Gaussian ordination
or CGO).
If each smooth were linear then CAO would simplify to constrained
linear ordination (CLO). CLO can theoretically be fitted with
cao
by specifying df1.nl=0
, however it is more efficient
to use rrvglm
.
Currently, only Rank=1
is implemented, and only
Norrr = ~1
models are handled.
With binomial data, the default formula is
logit(P[Y_s=1]) = eta_s = f_s(nu), s=1,2,...,S
where x_2 is a vector of environmental variables, and
nu=C^T x_2 is a R-vector of latent variables.
The eta_s is an additive predictor for species s,
and it models the probabilities of presence as an additive model on
the logit scale. The matrix C is estimated from the data, as
well as the smooth functions f_s. The argument Norrr = ~
1
specifies that the vector x_1, defined for RR-VGLMs
and QRR-VGLMs, is simply a 1 for an intercept.
Here, the intercept in the model is absorbed into the functions.
A cloglog
link may be preferable over a
logit
link.
With Poisson count data, the formula is
log(E[Y_s]) = eta_s = f_s(nu)
which models the mean response as an additive models on the log scale.
The fitted latent variables (site scores) are scaled to have
unit variance. The concept of a tolerance is undefined for
CAO models, but the optima and maxima are defined. The generic
functions Max
and Opt
should work for
CAO objects, but note that if the maximum occurs at the boundary then
Max
will return a NA
. Inference for CAO models
is currently undeveloped.
An object of class "cao"
(this may change to "rrvgam"
in the future).
Several generic functions can be applied to the object, e.g.,
Coef
, ccoef
, lvplot
,
summary
.
CAO models present a difficult optimization problem, partly because the
log-likelihood function contains many local solutions. To obtain the
(global) solution the user is advised to try many initial values. This
can be done by setting Bestof
some appropriate value (see
cao.control
). Trying many initial values becomes
progressively more important as the nonlinear degrees of freedom of
the smooths increase. Use set.seed
to make your
results reproducible.
Currently the dispersion parameter for a
gaussianff
CAO model is estimated slightly differently
and may be slightly biassed downwards (usually a little too small).
CAO models are computationally expensive, therefore setting trace
= TRUE
is a good idea, as well as running it on a simple random sample
of the data set instead.
Sometimes the IRLS algorithm does not converge within the FORTRAN code. This results in warnings being issued. In particular, if an error code of 3 is issued, then this indicates the IRLS algorithm has not converged. One possible remedy is to increase or decrease the nonlinear degrees of freedom so that the curves become more or less flexible, respectively.
T. W. Yee
Yee, T. W. (2006) Constrained additive ordination. Ecology, 87, 203–213.
Documentation accompanying the VGAM package at http://www.stat.auckland.ac.nz/~yee contains further information and examples.
cao.control
,
Coef.cao
,
cqo
,
lv
,
Opt
,
Max
,
lv
,
persp.cao
,
poissonff
,
binomialff
,
negbinomial
,
gamma2
,
gaussianff
,
set.seed
,
gam
.
data(hspider) hspider[,1:6] = scale(hspider[,1:6]) # Standardized environmental vars set.seed(149) ap1 = cao(cbind(Pardlugu, Pardmont, Pardnigr, Pardpull) ~ WaterCon + BareSand + FallTwig + CoveMoss + CoveHerb + ReflLux, family = poissonff, data = hspider, Rank = 1, df1.nl = c(Pardpull=2.7, 2.5), Bestof = 7, Crow1positive = FALSE) sort(ap1@misc$deviance.Bestof) # A history of all the iterations Coef(ap1) ccoef(ap1) ## Not run: par(mfrow=c(2,2)) plot(ap1) # All the curves are unimodal; some quite symmetric par(mfrow=c(1,1), las=1) index = 1:ncol(ap1@y) lvplot(ap1, lcol=index, pcol=index, y=TRUE) trplot(ap1, label=TRUE, col=index) abline(a=0, b=1, lty=2) trplot(ap1, label=TRUE, col="blue", log="xy", whichSp=c(1,3)) abline(a=0, b=1, lty=2) persp(ap1, col=index, lwd=2, label=TRUE) abline(v=Opt(ap1), lty=2, col=index) abline(h=Max(ap1), lty=2, col=index) ## End(Not run)