rcqo {VGAM} | R Documentation |
Random generation for constrained quadratic ordination (CQO).
rcqo(n, p, S, Rank = 1, family = c("poisson", "negbinomial", "binomial-poisson", "Binomial-negbinomial", "ordinal-poisson", "Ordinal-negbinomial", "gamma2"), EqualMaxima = FALSE, EqualTolerances = TRUE, ESOptima = FALSE, loabundance = if (EqualMaxima) hiabundance else 10, hiabundance = 100, sdlv = c(1.5/2^(0:3))[1:Rank], sdOptima = ifelse(ESOptima, 1.5/Rank, 1) * ifelse(scalelv, sdlv, 1), sdTolerances = 0.25, Kvector = 1, Shape = 1, sqrt = FALSE, Log = FALSE, rhox = 0.5, breaks = 4, seed = NULL, Crow1positive=TRUE, xmat = NULL, scalelv = TRUE)
n |
Number of sites. It is denoted by n below.
|
p |
Number of environmental variables, including an intercept term.
It is denoted by p below.
Must be no less than 1+R in value.
|
S |
Number of species.
It is denoted by S below.
|
Rank |
The rank or the number of latent variables or true dimension
of the data on the reduced space.
This must be either 1, 2, 3 or 4.
It is denoted by R.
|
family |
What type of species data is to be returned.
The first choice is the default.
If binomial then a 0 means absence and 1 means presence.
If ordinal then the breaks argument is passed into
the breaks argument of cut .
Note that either the Poisson or negative binomial distributions
are used to generate binomial and ordinal data, and that
an upper-case choice is used for the negative binomial distribution
(this makes it easier for the user).
If "gamma2" then this is the 2-parameter gamma distribution.
|
EqualMaxima |
Logical. Does each species have the same maxima?
See arguments loabundance and hiabundance .
|
EqualTolerances |
Logical. Does each species have the
same tolerance? If TRUE then the common value is 1 along
every latent variable, i.e., all species' tolerance matrices are the
order-R identity matrix.
|
ESOptima |
Logical. Do the species have equally spaced optima?
If TRUE then the quantity
S^(1/R) must be an
integer with value 2 or more. That is, there has to be an
appropriate number of species in total. This is so that a grid
of optimum values is possible in R-dimensional
latent variable space
in order to place the species' optima.
Also see the argument sdTolerances .
|
loabundance, hiabundance |
Numeric. These are recycled to a vector of length S.
The species have a maximum
between loabundance and hiabundance . That is,
at their optimal environment, the mean abundance of each
species is between the two componentwise values. If EqualMaxima
is TRUE then loabundance and hiabundance
must have the same values.
If EqualMaxima is FALSE then the
logarithm of the maxima are uniformly distributed between
log(loabundance) and log(hiabundance) .
|
sdlv |
Numeric, of length R
(recycled if necessary). Site scores along
each latent variable have these standard deviation values.
This must be a decreasing sequence of values because the first
ordination axis contains the greatest spread of the species'
site scores, followed by the second axis, followed by the third
axis, etc.
|
sdOptima |
Numeric, of length R (recycled if necessary).
If ESOptima=FALSE then,
for the rth latent variable axis,
the optima of the species are generated from a
normal distribution centered about 0.
If ESOptima=TRUE then the S optima
are equally spaced about 0 along every latent variable axis.
Regardless of the value of ESOptima , the optima
are then scaled to give standard deviation sdOptima[r] .
|
sdTolerances |
Logical. If EqualTolerances=FALSE then, for the
rth latent variable, the
species' tolerances are
chosen from a normal distribution with mean 1 and
standard deviation
sdTolerances[r] .
However, the first species y1 has its tolerance matrix
set equal to the order-R identity matrix.
All tolerance matrices for all species are diagonal in this function.
This argument is ignored if EqualTolerances is TRUE ,
otherwise it is recycled to length R if necessary.
|
Kvector |
A vector of positive k values (recycled to length S
if necessary) for the negative binomial distribution
(see negbinomial for details).
Note that a natural default value does not exist, however the default
value here is probably a realistic one, and that for large values
of μ one has Var(Y) = mu^2 / k
approximately.
|
Shape |
A vector of positive lambda values (recycled to length
S if necessary) for the 2-parameter gamma distribution (see
gamma2 for details). Note that a natural default value
does not exist, however the default value here is probably a realistic
one, and that Var(Y) = mu^2 / lambda.
|
sqrt |
Logical. Take the square-root of the negative binomial counts?
Assigning sqrt=TRUE when family="negbinomial" means
that the resulting species data can be considered very crudely to be
approximately Poisson distributed.
They will not integers in general but much easier (less numerical
problems) to estimate using something like cqo(..., family="poissonff") .
|
Log |
Logical. Take the logarithm of the gamma random variates?
Assigning Log=TRUE when family="gamma2" means
that the resulting species data can be considered very crudely to be
approximately Gaussian distributed about its (quadratic) mean.
The result is that it is much easier (less numerical
problems) to estimate using something like cqo(..., family="gaussianff") .
|
rhox |
Numeric, less than 1 in absolute value.
The correlation between the environmental variables.
The correlation matrix is a matrix of 1's along the diagonal
and rhox in the off-diagonals.
Note that each environmental variable is normally distributed
with mean 0. The standard deviation of each environmental variable
is chosen so that the site scores have the determined standard
deviation, as given by argument sdlv .
|
breaks |
If family is assigned an ordinal value then this argument
is used to define the cutpoints. It is fed into the
breaks argument of cut .
|
seed |
If given, it is passed into set.seed .
This argument can be used to obtain reproducible results.
If set, the value is saved as the "seed"
attribute of the returned value. The default will
not change the random generator state, and return
.Random.seed as "seed" attribute.
|
Crow1positive |
See qrrvglm.control for details.
|
xmat |
The
n * (p-1)
environmental matrix can be inputted.
|
scalelv |
Logical. If FALSE the argument
sdlv is ignored and no scaling of the latent variable
values is performed.
|
This function generates data coming from a constrained quadratic
ordination (CQO) model. In particular,
data coming from a species packing model can be generated
with this function.
The species packing model states that species have equal tolerances,
equal maxima, and optima which are uniformly distributed over
the latent variable space. This can be achieved by assigning
the arguments ESOptima = TRUE
, EqualMaxima = TRUE
,
EqualTolerances = TRUE
.
At present, the Poisson and negative binomial abundances are
generated first using loabundance
and hiabundance
,
and if family
is binomial or ordinal then it is converted into
these forms.
In CQO theory the n * p matrix X is partitioned into two parts X_1 and X_2. The matrix X_2 contains the `real' environmental variables whereas the variables in X_1 are just for adjustment purposes; they contain the intercept terms and other variables that one wants to adjust for when (primarily) looking at the variables in X_2. This function has X_1 only being a matrix of ones, i.e., containing an intercept only.
A n * (p-1+S) data frame with components and attributes. In the following the attributes are labelled with double quotes.
x2, x3, x4, ..., xp |
The environmental variables. This makes up the
n * (p-1) X_2 matrix.
Note that x1 is not present; it is effectively a vector
of ones since it corresponds to an intercept term when
cqo is applied to the data.
|
y1, y2, x3, ..., yS |
The species data. This makes up the
n * S matrix Y.
This will be of the form described by the argument
family .
|
"ccoefficients" |
The (p-1) * R matrix of
constrained coefficients
(or canonical coefficients).
These are also known as weights or loadings.
|
"formula" |
The formula involving the species and environmental variable names.
This can be used directly in the formula argument of
cqo .
|
"logmaxima" |
The S-vector of species' maxima, on a log scale.
These are uniformly distributed between
log(loabundance) and log(hiabundance) .
|
"lv" |
The n * R matrix of site scores.
Each successive column (latent variable) has
sample standard deviation
equal to successive values of sdlv .
|
"eta" |
The linear/additive predictor value.
|
"optima" |
The S * R matrix of species' optima.
|
"tolerances" |
The S * R matrix of species' tolerances.
These are the
square root of the diagonal elements of the tolerance matrices
(recall that all tolerance matrices are restricted to being
diagonal in this function).
|
Other attributes are "break"
,
"family"
, "Rank"
,
"loabundance"
, "hiabundance"
,
"EqualTolerances"
, "EqualMaxima"
,
"seed"
as used.
This function is under development and is not finished yet. There may be a few bugs.
Yet to do: add an argument that allows absences to be equal to the first level if ordinal data is requested.
T. W. Yee
Yee, T. W. (2004) A new technique for maximum-likelihood canonical Gaussian ordination. Ecological Monographs, 74, 685–701.
Yee, T. W. (2006) Constrained additive ordination. Ecology, 87, 203–213.
ter Braak, C. J. F. and Prentice, I. C. (1988) A theory of gradient analysis. Advances in Ecological Research, 18, 271–317.
cqo
,
qrrvglm.control
,
cut
,
binomialff
,
poissonff
,
negbinomial
,
gamma2
,
gaussianff
.
# Example 1: Species packing model: n = 100; p = 5; S = 5 mydata = rcqo(n, p, S, ESOpt=TRUE, EqualMax=TRUE) names(mydata) myform = attr(mydata, "formula") fit = cqo(myform, fam=poissonff, ITol=TRUE, data=mydata, Bestof=3) # Fit a CQO model to the data ## Not run: matplot(attr(mydata, "lv"), mydata[,-(1:(p-1))], col=1:S) persp(fit, col=1:S, add=TRUE) lvplot(fit, lcol=1:S, y=TRUE, pcol=1:S) # The same plot as above ## End(Not run) # Compare the fitted model with the 'truth' ccoef(fit) # The fitted model attr(mydata, "ccoefficients") # The 'truth' c(sd(attr(mydata, "lv")), sd(lv(fit))) # Both values should be approx equal # Example 2: negative binomial data fitted using a Poisson model: n = 200; p = 5; S = 5 mydata = rcqo(n, p, S, fam="negbin", sqrt=TRUE) myform = attr(mydata, "formula") fit = cqo(myform, fam=poissonff, ITol=TRUE, dat=mydata) ## Not run: lvplot(fit, lcol=1:S, y=TRUE, pcol=1:S) ## End(Not run) # Compare the fitted model with the 'truth' ccoef(fit) # The fitted model attr(mydata, "ccoefficients") # The 'truth' # Example 3: gamma2 data fitted using a Gaussian model: n = 200; p = 5; S = 3 mydata = rcqo(n, p, S, fam="gamma2", Log=TRUE) fit = cqo(attr(mydata, "formula"), fam=gaussianff, ITol=TRUE, dat=mydata) ## Not run: matplot(attr(mydata, "lv"), exp(mydata[,-(1:(p-1))]), col=1:S) # 'raw' data lvplot(fit, lcol=1:S, y=TRUE, pcol=1:S) # Fitted model to transformed data ## End(Not run) # Compare the fitted model with the 'truth' ccoef(fit) # The fitted model attr(mydata, "ccoefficients") # The 'truth'