mix2normal1 {VGAM}R Documentation

Mixture of Two Univariate Normal Distributions

Description

Estimates the five parameters of a mixture of two univariate normal distributions by maximum likelihood estimation.

Usage

mix2normal1(lphi="logit", lmu="identity", lsd="loge",
            ephi=list(), emu1=list(), emu2=list(), esd1=list(), esd2=list(),
            iphi=0.5, imu1=NULL, imu2=NULL, isd1=NULL, isd2=NULL,
            qmu=c(0.2, 0.8), ESD=TRUE, nsimEIM=100, zero=1)

Arguments

lphi Link function for the parameter phi. See Links for more choices.
lmu Link function applied to each mu parameter. See Links for more choices.
lsd Link function applied to each sd parameter. See Links for more choices.
ephi, emu1, emu2, esd1, esd2 List. Extra argument for each of the links. See earg in Links for general information. If ESD=TRUE then esd1 must equal esd2.
iphi Initial value for phi, whose value must lie between 0 and 1.
imu1, imu2 Optional initial value for mu1 and mu2. The default is to compute initial values internally using the argument qmu.
isd1, isd2 Optional initial value for sd1 and sd2. The default is to compute initial values internally based on the argument qmu. Currently these are not great, therefore using these arguments where practical is a good idea.
qmu Vector with two values giving the probabilities relating to the sample quantiles for obtaining initial values for mu1 and mu2. The two values are fed in as the probs argument into quantile.
ESD Logical indicating whether the two standard deviations should be constrained to be equal. If TRUE then the appropriate constraint matrices will be used.
nsimEIM See CommonVGAMffArguments.
zero An integer specifying which linear/additive predictor is modelled as intercepts only. If given, the value or values must be from the set 1,2,...,5. The default is the first one only, meaning phi is a single parameter even when there are explanatory variables. Set zero=NULL to model all linear/additive predictors as functions of the explanatory variables. See CommonVGAMffArguments for more information.

Details

The probability function can be loosely written as

f(y) = phi * N(mu1, sd1) + (1-phi) * N(mu2, sd2)

where phi is the probability an observation belongs to the first group. The parameters mu1 and mu2 are the means, and sd1 and sd2 are the standard deviations. The parameter phi satisfies 0 < phi < 1. The mean of Y is phi*mu1 + (1-phi)*mu2 and this is returned as the fitted values. By default, the five linear/additive predictors are (logit(phi), mu1, log(sd1), mu2, log(sd2))^T. If ESD=TRUE then sd1=sd2 is enforced.

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm, and vgam.

Warning

Numerical problems can occur and half-stepping is not uncommon. If failure to converge occurs, try inputting better initial values, e.g., by using iphi, qmu, imu1, imu2, isd1, isd2, etc.

This VGAM family function should be used with care.

Note

Fitting this model successfully to data can be difficult due to numerical problems and ill-conditioned data. It pays to fit the model several times with different initial values and check that the best fit looks reasonable. Plotting the results is recommended. This function works better as mu1 and mu2 become more different.

Convergence can be slow, especially when the two component distributions are not well separated. The default control argument trace=TRUE is to encourage monitoring convergence. Having ESD=TRUE often makes the overall optimization problem easier.

Author(s)

T. W. Yee

References

McLachlan, G. J. and Peel, D. (2000) Finite Mixture Models. New York: Wiley.

Everitt, B. S. and Hand, D. J. (1981) Finite Mixture Distributions. London: Chapman & Hall.

See Also

normal1, Normal, mix2poisson.

Examples

n = 1000
mu1 =  99
mu2 = 150
sd1 = sd2 = exp(3)
(phi = logit(-1, inverse=TRUE))
y = ifelse(runif(n) < phi, rnorm(n, mu1, sd1), rnorm(n, mu2, sd2))

fit = vglm(y ~ 1, mix2normal1(ESD=TRUE))

# Compare the results
cf = coef(fit)
round(rbind('Estimated'=c(logit(cf[1], inv=TRUE),
    cf[2], exp(cf[3]), cf[4]), 'Truth'=c(phi, mu1, sd1, mu2)), dig=2)

## Not run: 
# Plot the results
xx = seq(min(y), max(y), len=200)
plot(xx, (1-phi)*dnorm(xx, mu2, sd2), type="l", xlab="y",
     main="Red=estimate, blue=truth", col="blue")
phi.est = logit(coef(fit)[1], inverse=TRUE)
sd.est = exp(coef(fit)[3])
lines(xx, phi*dnorm(xx, mu1, sd1), col="blue")
lines(xx, phi.est * dnorm(xx, Coef(fit)[2], sd.est), col="red")
lines(xx, (1-phi.est) * dnorm(xx, Coef(fit)[4], sd.est), col="red")
abline(v=Coef(fit)[c(2,4)], lty=2, col="red")
abline(v=c(mu1, mu2), lty=2, col="blue")
## End(Not run)

[Package VGAM version 0.7-7 Index]