mbinomial {VGAM} | R Documentation |
Estimation of a binomial regression in a matched case-control study.
mbinomial(mvar=NULL, link="logit", earg=list(), parallel = TRUE, smallno = .Machine$double.eps^(3/4))
mvar |
Formula specifying the matching variable.
This shows which observation belongs to which matching set.
The intercept should be suppressed from the formula, and
the term must be a factor .
|
link |
Parameter link function applied to the probability.
See Links for more choices.
|
earg |
List. Extra arguments for the links.
See earg in Links for general information.
|
parallel |
This should always be set TRUE otherwise there will be
too many parameters to estimate.
See CommonVGAMffArguments for more information.
|
smallno |
Numeric, a small positive value.
For a specific observation, used to nullify the linear/additive
predictors that are not needed.
|
By default, this VGAM family function fits a logistic regression
model to a binary response from a matched case-control study. Here,
each case (Y=1) is matched with one or more controls (Y=0)
with respect to some matching variables (confounders). For example,
the first matched set is all women aged from 20 to 25, the second
matched set is women aged between 26 to 30, etc. The logistic
regression has a different intercept for each matched set but the other
regression coefficients are assumed to be the same across matched sets
(parallel=TRUE
).
Let C be the number of matched sets.
This VGAM family function uses a trick by allowing M,
the number of linear/additive predictors, to be equal to C,
and then nullifying all but one of them for a particular observation.
The term specified by the mvar
argument must be a
factor
.
Consequently, the model matrix contains an intercept plus one
column for each level of the factor (except the first (this is
the default in R)).
Altogether there are C columns.
The algorithm here constructs a different constraint matrix for
each of the C columns.
An object of class "vglmff"
(see vglmff-class
).
The object is used by modelling functions such as vglm
and vgam
.
Both the memory requirements and computational time of this VGAM family function grows very quickly with respect to the number of matched sets. For example, the large model matrix of a data set with 100 matched sets consisting of one case and one control per set will take up at least (about) 20Mb of memory. For a constant number of cases and controls per matched set, the memory requirements are O(C^3) and the the computational time is O(C^4) flops.
The example below has been run successfully with n=700
(this
corresponds to C=350) but only on a big machine and it took over
10 minutes. The large model matrix was 670Mb.
The response is assumed to be in a format that can also be inputted
into binomialff
.
Thomas W. Yee
Section 8.2 of Hastie, T. J. and Tibshirani, R. J. (1990) Generalized Additive Models, London: Chapman & Hall.
Pregibon, D. (1984) Data analytic methods for matched case-control studies. Biometrics, 40, 639–651.
Chapter 7 of Breslow, N. E. and Day, N. E. (1980) Statistical Methods in Cancer Research I: The Analysis of Case-Control Studies. Lyon: International Agency for Research on Cancer.
Holford, T. R. and White, C. and Kelsey, J. L. (1978) Multivariate analysis for matched case-control studies. American Journal of Epidemiology, 107, 245–256.
## Not run: # Cf. Hastie and Tibshirani (1990) p.209. The variable n must be even. # Here, the intercept for each matched set accounts for x3 which is # the confounder or matching variable. n = 700 # Requires a big machine with lots of memory. Expensive wrt time n = 100 # This requires a reasonably big machine. mydat = data.frame(x2 = rnorm(n), x3 = rep(rnorm(n/2), each=2)) xmat = with(mydat, cbind(x2,x3)) mydat = transform(mydat, eta = -0.1 + 0.2 * x2 + 0.3 * x3) etamat = with(mydat, matrix(eta, n/2, 2)) condmu = exp(etamat[,1]) / (exp(etamat[,1]) + exp(etamat[,2])) y1 = ifelse(runif(n/2) < condmu, 1, 0) y = cbind(y1, 1-y1) mydat = transform(mydat, y = c(y1, 1-y1), ID = factor(c(row(etamat)))) fit = vglm(y ~ 1 + ID + x2, trace=TRUE, fam = mbinomial(mvar = ~ ID - 1), data=mydat) dimnames(coef(fit, mat=TRUE)) coef(fit, mat=TRUE) summary(fit) fitted(fit)[1:5] objsizemb = function(object) round(object.size(object) / 2^20, dig=2) objsizemb(fit) # in Mb VLMX = model.matrix(fit, type="vlm") # The big model matrix dim(VLMX) objsizemb(VLMX) # in Mb rm(VLMX) ## End(Not run)