dirmultinomial {VGAM} | R Documentation |
Fits a Dirichlet-multinomial distribution to a matrix response.
dirmultinomial(lphi="logit", ephi = list(), iphi = 0.10, parallel= FALSE, zero="M")
lphi |
Link function applied to the phi
parameter, which lies in the open unit interval (0,1).
See Links for more choices.
|
ephi |
List. Extra argument for lphi .
See earg in Links for general information.
|
iphi |
Numeric. Initial value for phi.
Must be in the open unit interval (0,1).
If a failure to converge occurs try assigning this argument a different
value.
|
parallel |
A logical (formula not allowed here) indicating whether the
probabilities pi_1,...,pi_{M-1}
are to be equal via equal coefficients.
Note pi_M will generally be different from the
other probabilities.
Setting parallel=TRUE will only work if you also set
zero=NULL because of interference between these arguments
(with respect to the intercept term).
|
zero |
An integer-valued vector specifying which
linear/additive predictors are modelled as intercepts only.
The values must be from the set {1,2,...,M}.
If the character "M" then this means the numerical value
M, which corresponds to linear/additive predictor associated
with phi.
Setting zero=NULL means none of the values from
the set {1,2,...,M}.
|
The Dirichlet-multinomial distribution arises from a multinomial distribution where the probability parameters are not constant but are generated from a multivariate distribution called the Dirichlet distribution. The Dirichlet-multinomial distribution has probability function
P(Y_1=y_1,...,Y_M=y_M) = C_{y_1,...,y_M}^{N_{*}} prod_{j=1}^{M} prod_{r=1}^{y_{j}} (pi_j (1-phi) + (r-1)phi) / prod_{r=1}^{N_{*}} (1-phi + (r-1)phi)
where phi is the over-dispersion parameter
and N_* = y_1+cdots+y_M.
Here, C_b^a means ``a choose b'' and
refers to combinations (see choose
).
The above formula applies to each row of the matrix response.
In this VGAM family function the first M-1 linear/additive
predictors correspond to the first M-1 probabilities via
eta_j = log(P[Y=j]/ P[Y=M]) = log(pi_j/pi_M)
where eta_j is the jth linear/additive predictor
(eta_M=0 by definition for P[Y=M] but not for
phi)
and
j=1,...,M-1.
The Mth linear/additive predictor corresponds to lphi
applied to phi.
Note that E(Y_j) = N_* pi_j but the probabilities (returned as the fitted values) pi_j are bundled together as a M-column matrix. The quantities N_* are returned as the prior weights.
The beta-binomial distribution is a special case of
the Dirichlet-multinomial distribution when M=2;
see betabinomial
. It is easy to show that
the first shape parameter of the beta distribution is
shape1=pi*(1/phi-1) and the second shape
parameter is shape2=(1-pi)*(1/phi-1).
Also, phi=1/(1+shape1+shape2), which
is known as the intra-cluster correlation coefficient.
An object of class "vglmff"
(see vglmff-class
).
The object is used by modelling functions such as vglm
,
rrvglm
and vgam
.
If the model is an intercept-only model then @misc
(which is a
list) has a component called shape
which is a vector with the
M values pi_j * (1/phi-1).
This VGAM family function is prone to numerical problems, especially when there are covariates.
The response can be a matrix of non-negative integers, or else
a matrix of sample proportions and the total number of counts in
each row specified using the weights
argument.
This dual input option is similar to multinomial
.
To fit a `parallel' model with the phi parameter
being an intercept-only you will need to use the constraints
argument.
Currently, Fisher scoring is implemented. To compute the expected
information matrix a for
loop is used; this may be very slow
when the counts are large.
Additionally, convergence may be slower than usual due to round-off
error when computing the expected information matrices.
Thomas W. Yee
Paul, S. R., Balasooriya, U. and Banerjee, T. (2005) Fisher information matrix of the Dirichlet-multinomial distribution. Biometrical Journal, 47, 230–236.
dirmul.old
,
betabinomial
,
betabin.ab
,
dirichlet
,
multinomial
.
n = 10 M = 5 y = round(matrix(runif(n*M)*10, n, M)) # Integer counts fit = vglm(y ~ 1, dirmultinomial, trace=TRUE) fitted(fit)[1:2,] fit@y # Sample proportions weights(fit, type="prior", matrix=FALSE) # Total counts per row x = runif(n) fit = vglm(y ~ x, dirmultinomial, trace=TRUE) ## Not run: Coef(fit) # This does not work ## End(Not run) coef(fit, matrix=TRUE) (sfit = summary(fit)) vcov(sfit)