rrvglm {VGAM} | R Documentation |
A reduced-rank vector generalized linear model (RR-VGLM) is fitted. RR-VGLMs are VGLMs in which some of the constraint matrices are estimated rather than specified. In this documentation, M is the number of linear predictors.
rrvglm(formula, family, data = list(), weights = NULL, subset = NULL, na.action = na.fail, etastart = NULL, mustart = NULL, coefstart = NULL, control = rrvglm.control(...), offset = NULL, method = "rrvglm.fit", model = FALSE, x.arg = TRUE, y.arg = TRUE, contrasts = NULL, constraints = NULL, extra = NULL, qr.arg = FALSE, smart = TRUE, ...)
formula |
a symbolic description of the model to be fit.
The RHS of the formula is applied to each linear predictor. Different
variables in each linear predictor can be chosen by specifying
constraint matrices.
|
family |
a function of class "vglmff" (see vglmff-class)
describing what statistical model is to be fitted. This is called a
"VGAM family function". See CommonVGAMffArguments
for general information about many types of arguments found in this
type of function.
|
data |
an optional data frame containing the variables in the model.
By default the variables are taken from environment(formula),
typically the environment from which rrvglm is called.
|
weights |
an optional vector or matrix of (prior) weights
to be used in the fitting process.
If weights is a matrix, then it must be in
matrix-band form, whereby the first M
columns of the matrix are the
diagonals, followed by the upper-diagonal band, followed by the
band above that, etc. In this case, there can be up to M(M+1)/2
columns, with the last column corresponding to the
(1,M) elements of the weight matrices.
|
subset |
an optional logical vector specifying a subset of observations to be
used in the fitting process.
|
na.action |
a function which indicates what should happen when the data contain
NAs.
The default is set by the na.action setting
of options, and is na.fail if that is unset.
The "factory-fresh" default is na.omit.
|
etastart |
starting values for the linear predictors. It is an M-column matrix. If M = 1 then it may be a vector.
|
mustart |
starting values for the fitted values. It can be a vector or a matrix.
Some family functions do not make use of this argument.
|
coefstart |
starting values for the coefficient vector.
|
control |
a list of parameters for controlling the fitting process.
See rrvglm.control for details.
|
offset |
a vector or M-column matrix of offset values.
These are a priori known and are
added to the linear predictors during fitting.
|
method |
the method to be used in fitting the model.
The default (and presently only) method rrvglm.fit
uses iteratively reweighted least squares (IRLS).
|
model |
a logical value indicating whether the model frame
should be assigned in the model slot.
|
x.arg, y.arg |
logical values indicating whether
the model matrix and response vector/matrix used in the fitting
process should be assigned in the x and y slots.
Note that the model matrix is the LM model matrix; to get the VGLM
model matrix, type model.matrix(vglmfit), where
vglmfit is a vglm object.
|
contrasts |
an optional list. See the contrasts.arg
of model.matrix.default.
|
constraints |
an optional list of constraint matrices.
Each component of the list must be named with the term it
corresponds to (and the name must match in character format).
Each constraint matrix must have M rows, and be of
full-column rank.
By default, constraint matrices are the M by M
identity
matrix unless arguments in the family function itself override
these values.
If constraints is used it must contain all the
terms; an incomplete list is not accepted.
|
extra |
an optional list with any extra information that might be needed
by the family function.
|
qr.arg |
logical value indicating whether
the slot qr, which returns the QR decomposition of the
VLM model matrix, is returned on the object.
|
smart |
logical value indicating whether smart prediction
(smartpred) will be used.
|
... |
further arguments passed into rrvglm.control .
|
The central formula is given by
eta = B_1^T x_1 + A nu
where x_1 is a vector (usually just a 1 for an intercept), x_2 is another vector of explanatory variables, and nu = C^T x_2 is an R-vector of latent variables. Here, eta is a vector of linear predictors, e.g., the mth element is eta_m = log(E[Y_m]) for the mth Poisson response. The matrices B_1, A and C are estimated from the data, i.e., they contain the regression coefficients. For ecologists, the central formula represents a constrained linear ordination (CLO) since it is linear in the latent variables. This means that the response is a monotonically increasing or decreasing function of the latent variables.
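The central formula can be traced numerically with a few lines of base R. This is only an illustrative sketch: the dimensions, the random coefficient values and the variable names (B1, A, C, x1, x2, p2) are assumptions made for the illustration, and nothing here calls VGAM.

```r
set.seed(123)
M  <- 3   # number of linear predictors
p2 <- 4   # length of x_2 (assumed for illustration)
R  <- 1   # rank, i.e. the number of latent variables

B1 <- matrix(rnorm(M), nrow = 1)     # 1 x M: intercepts, since x_1 = 1
A  <- matrix(rnorm(M * R), M, R)     # M x R
C  <- matrix(rnorm(p2 * R), p2, R)   # p2 x R

x1 <- 1
x2 <- rnorm(p2)

nu  <- drop(crossprod(C, x2))               # nu = C^T x_2, length R
eta <- drop(crossprod(B1, x1) + A %*% nu)   # eta = B_1^T x_1 + A nu

# The coefficient matrix applied to x_2 is A C^T (M x p2); its rank is
# at most R, which is the "reduced-rank" part of the model.
coef_rank <- qr(A %*% t(C))$rank
```

With R = 1, all M linear predictors depend on x_2 only through the single latent variable nu.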
The underlying algorithm of RR-VGLMs is iteratively reweighted least squares (IRLS) with an optimizing algorithm applied within each IRLS iteration (e.g., alternating algorithm).
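The idea of an alternating algorithm can be sketched for the simplest special case: a Gaussian response with identity link, unit weights and no x_1 term, so that IRLS reduces to ordinary least squares. This is an assumption-laden toy in base R and not the code in rrvglm.fit; all names and the simulated data are hypothetical. Given C, the matrix A is updated by regressing Y on nu = X C; given A, the matrix C is updated by least squares.

```r
set.seed(1)
n <- 200; p <- 4; M <- 3; R <- 1
X <- matrix(rnorm(n * p), n, p)
C_true <- matrix(rnorm(p * R), p, R)
A_true <- matrix(rnorm(M * R), M, R)
Y <- X %*% C_true %*% t(A_true) + matrix(rnorm(n * M, sd = 0.1), n, M)

C   <- matrix(rnorm(p * R), p, R)   # random starting value for C
rss <- numeric(20)
for (iter in seq_along(rss)) {
  # Step 1: with C fixed, regress Y on the latent variables nu = X C
  nu <- X %*% C
  A  <- t(solve(crossprod(nu), crossprod(nu, Y)))            # M x R
  # Step 2: with A fixed, the least squares solution for C is
  # (X'X)^{-1} X'Y A (A'A)^{-1}
  C  <- solve(crossprod(X), crossprod(X, Y) %*% A) %*% solve(crossprod(A))
  rss[iter] <- sum((Y - X %*% C %*% t(A))^2)
}
rss   # residual sum of squares after each pair of updates
```

Each step minimizes the residual sum of squares over one block of coefficients with the other held fixed, so the sequence in rss cannot increase.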
In theory, any VGAM family function that works for vglm and vgam should work for rrvglm too.
rrvglm.fit is the function that actually does the work. It is vglm.fit with some extra code.
An object of class "rrvglm", which has the same slots as a "vglm" object. The only difference is that some of the constraint matrices are estimated rather than known. But VGAM stores the models the same way internally. The slots of "vglm" objects are described in vglm-class.
The smart prediction (smartpred) library is packed with the VGAM library.
The arguments of rrvglm are the same as those of vglm but with some extras in rrvglm.control.
In the example below, a rank-2 stereotype model of Anderson (1984) is fitted to some car data (the call uses Rank = 2). The reduced-rank regression is performed, adjusting for two covariates. Setting a trivial constraint matrix for the variables in x_2 that form the latent variables avoids a warning message when it is overwritten by a (common) estimated constraint matrix. It shows that German cars tend to be more expensive than American cars, given a car of fixed weight and width.
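The requirements stated for the constraints argument (one named matrix per term, M rows each, full column rank) can be checked before fitting. The helper below is a hypothetical base R sketch, not part of VGAM, and the term names passed to it are supplied by hand rather than taken from a formula.

```r
# Hypothetical helper (not part of VGAM): check a constraints list against
# the stated requirements before passing it to a fitting function.
check_constraints <- function(cms, term_names, M) {
  missing_terms <- setdiff(term_names, names(cms))
  if (length(missing_terms) > 0)
    stop("no constraint matrix for: ", paste(missing_terms, collapse = ", "))
  for (nm in names(cms)) {
    cm <- cms[[nm]]
    if (nrow(cm) != M)
      stop(nm, ": constraint matrix must have ", M, " rows")
    if (qr(cm)$rank != ncol(cm))
      stop(nm, ": constraint matrix is not of full column rank")
  }
  invisible(TRUE)
}

ones  <- matrix(1, 3, 1)
cms   <- list("(Intercept)" = diag(3), Width = ones, Weight = ones,
              Disp. = diag(3))
terms_needed <- c("(Intercept)", "Width", "Weight", "Disp.")
ok <- check_constraints(cms, terms_needed, M = 3)
```

Dropping any component of cms makes the helper stop with an error, mirroring the rule that an incomplete list is not accepted.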
If fit <- rrvglm(..., data=mydata) then summary(fit) requires corner constraints and no missing values in mydata.
Often the estimated variance-covariance matrix of the parameters is not positive-definite; if this occurs, try refitting the model with a different value for Index.corner.
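A minimal base R sketch of what choosing corner rows does, assuming (as in Yee and Hastie, 2003) that corner constraints fix the R rows of A selected by Index.corner to the identity matrix. The matrices and the name index_corner are made up for illustration; VGAM is not called.

```r
set.seed(42)
M <- 4; p2 <- 3; R <- 2
A <- matrix(rnorm(M * R), M, R)     # an arbitrary M x R matrix
C <- matrix(rnorm(p2 * R), p2, R)   # an arbitrary p2 x R matrix

index_corner <- 1:R                   # rows of A to fix at the identity
K <- A[index_corner, , drop = FALSE]  # R x R block, assumed nonsingular

A_corner <- A %*% solve(K)   # rows index_corner now equal the identity
C_corner <- C %*% t(K)       # compensating change, so A C^T is unchanged

A_corner[index_corner, ]
```

The product A C^T, and hence the fitted linear predictors, is the same under either parameterization; only the individual coefficients (and therefore their estimated variance-covariance matrix) change. A different choice of index_corner works as long as the corresponding R x R block of A is nonsingular.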
For constrained quadratic ordination (CQO) see cqo for more details about QRR-VGLMs.
With multivariate binary responses, one must use binomialff(mv=TRUE) to indicate that the response (matrix) is multivariate. Otherwise, it is interpreted as a single binary response variable.
Thomas W. Yee
Yee, T. W. and Hastie, T. J. (2003) Reduced-rank vector generalized linear models. Statistical Modelling, 3, 15–41.
Yee, T. W. (2004) A new technique for maximum-likelihood canonical Gaussian ordination. Ecological Monographs, 74, 685–701.
Anderson, J. A. (1984) Regression and ordered categorical variables. Journal of the Royal Statistical Society, Series B, Methodological, 46, 1–30.
Documentation accompanying the VGAM package at http://www.stat.auckland.ac.nz/~yee contains further information and examples.
rrvglm.control, lvplot.rrvglm (same as biplot.rrvglm), rrvglm-class, grc, cqo, vglmff-class, vglm, vglm-class, smartpred, rrvglm.fit.
Methods functions include Coef.rrvglm, summary.rrvglm, etc.
data(car.all)
attach(car.all)
index = Country == "Germany" | Country == "USA" |
        Country == "Japan" | Country == "Korea"
detach(car.all)
scar = car.all[index, ]  # standardized car data
fcols = c(13,14,18:20,22:26,29:31,33,34,36)  # These are factors
scar[,-fcols] = scale(scar[,-fcols])  # Standardize all numerical vars
ones = matrix(1, 3, 1)
cms = list("(Intercept)"=diag(3), Width=ones, Weight=ones,
           Disp.=diag(3), Tank=diag(3), Price=diag(3),
           Frt.Leg.Room=diag(3))
set.seed(111)
fit = rrvglm(Country ~ Width + Weight + Disp. + Tank + Price + Frt.Leg.Room,
             multinomial, data = scar, Rank = 2, trace = TRUE,
             constraints=cms, Norrr = ~ 1 + Width + Weight,
             Uncor=TRUE, Corner=FALSE, Bestof=2)
fit@misc$deviance  # A history of the fits
Coef(fit)
## Not run:
biplot(fit, chull=TRUE, scores=TRUE, clty=2, ccol="blue", scol="red",
       Ccol="darkgreen", Clwd=2, Ccex=2,
       main="1=Germany, 2=Japan, 3=Korea, 4=USA")
## End(Not run)