gpd {VGAM} | R Documentation |
Maximum likelihood estimation of the 2-parameter generalized Pareto distribution (GPD).
gpd(threshold = 0, lscale = "loge", lshape = "logoff", escale = list(), eshape = if(lshape=="logoff") list(offset=0.5) else if(lshape=="elogit") list(min=-0.5, max=0.5) else NULL, percentiles = c(90, 95), iscale = NULL, ishape = NULL, tshape0=0.001, method.init=1, zero=2)
threshold |
Numeric, values are recycled if necessary.
The threshold value(s), called mu below.
|
lscale |
Parameter link function for the scale parameter sigma.
See Links for more choices.
|
lshape |
Parameter link function for the shape parameter xi.
See Links for more choices.
The default constrains the parameter to be greater than -0.5
because if xi <= -0.5 then Fisher
scoring does not work.
See the Details section below for more information.
|
escale, eshape |
Extra argument for the lscale and lshape arguments.
See earg in Links for general information.
For the shape parameter,
if the logoff link is chosen then the offset is
called A below; and then the second linear/additive predictor is
log(xi+A) which means that
xi > -A.
The working weight matrices are positive definite if A=0.5.
|
percentiles |
Numeric vector of percentiles used
for the fitted values. Values should be between 0 and 100.
See the example below for illustration.
However, if percentiles=NULL then the mean
mu + sigma / (1-xi) is returned;
this is only defined if xi<1.
|
iscale, ishape |
Numeric. Optional initial values for sigma
and xi.
The default is to use method.init and compute a value internally for
each parameter.
Values of ishape should be between -0.5 and 1.
Values of iscale should be positive.
|
tshape0 |
Positive numeric.
Threshold/tolerance value for resting whether xi is zero.
If the absolute value of the estimate of xi is less than
this value then it will be assumed zero and exponential distribution
derivatives etc. will be used.
|
method.init |
Method of initialization, either 1 or 2. The first is the method of
moments, and the second is a variant of this. If neither work, try
assigning values to arguments ishape and/or iscale .
|
zero |
An integer-valued vector specifying which
linear/additive predictors are modelled as intercepts only.
The value must be from the set {1,2} corresponding
respectively to sigma and xi.
It is often a good idea for the sigma parameter only
to be modelled through
a linear combination of the explanatory variables because the
shape parameter is probably best left as an intercept only:
zero=2 .
Setting zero=NULL means both parameters are modelled with
explanatory variables.
|
The distribution function of the GPD can be written
G(y) = 1 - [1 + xi (y-mu)/ sigma ]_{+}^{- 1/ xi}
where
mu is the location parameter
(known, with value threshold
),
sigma > 0 is the scale parameter,
xi is the shape parameter, and
h_+ = max(h,0).
The function 1-G is known as the survivor function.
The limit xi –> 0
gives the shifted exponential as a special case:
G(y) = 1 - exp[-(y-mu)/ sigma].
The support is y>mu for xi>0, and mu < y <mu-sigma / xi for xi<0.
Smith (1985) showed that if xi <= -0.5 then this is known as the nonregular case and problems/difficulties can arise both theoretically and numerically. For the (regular) case xi > -0.5 the classical asymptotic theory of maximum likelihood estimators is applicable; this is the default.
Although for xi < -0.5 the usual asymptotic properties do not apply, the maximum likelihood estimator generally exists and is superefficient for -1 < xi < -0.5, so it is ``better'' than normal. When xi < -1 the maximum likelihood estimator generally does not exist as it effectively becomes a two parameter problem.
The mean of Y does not exist unless xi < 1, and
the variance does not exist unless xi < 0.5. So if
you want to fit a model with finite variance use lshape="elogit"
.
An object of class "vglmff"
(see vglmff-class
).
The object is used by modelling functions such as vglm
and vgam
.
However, for this VGAM family function, vglm
is probably preferred over vgam
when there is smoothing.
Fitting the GPD by maximum likelihood estimation can be numerically
fraught. If 1 + xi*(y-mu)/sigma <=
0 then some crude evasive action is taken but the estimation process
can still fail. This is particularly the case if vgam
with s
is used. Then smoothing is best done with
vglm
with regression splines (bs
or ns
) because vglm
implements
half-stepsizing whereas vgam
doesn't. Half-stepsizing
helps handle the problem of straying outside the parameter space.
The response in the formula of vglm
and vgam
is y.
Internally, y-mu is computed.
With functions rgpd
, dgpd
, etc., the
argument location
matches with the argument threshold
here.
T. W. Yee
Yee, T. W. and Stephenson, A. G. (2007) Vector generalized linear and additive extreme value models. Extremes, 10, 1–19.
Coles, S. (2001) An Introduction to Statistical Modeling of Extreme Values. London: Springer-Verlag.
Smith, R. L. (1985) Maximum likelihood estimation in a class of nonregular cases. Biometrika, 72, 67–90.
rgpd
,
meplot
,
gev
,
pareto1
,
vglm
,
vgam
,
s
.
# Simulated data from an exponential distribution (xi=0) threshold = 0.5 y = threshold + rexp(n=3000, rate=2) fit = vglm(y ~ 1, gpd(threshold=threshold), trace=TRUE) fitted(fit)[1:5,] coef(fit, matrix=TRUE) # xi should be close to 0 Coef(fit) summary(fit) fit@extra$threshold # Note the threshold is stored here # Check the 90 percentile i = fit@y < fitted(fit)[1,"90%"] 100*table(i)/sum(table(i)) # Should be 90 # Check the 95 percentile i = fit@y < fitted(fit)[1,"95%"] 100*table(i)/sum(table(i)) # Should be 95 ## Not run: plot(fit@y, col="blue", las=1, main="Fitted 90% and 95% quantiles") matlines(1:length(fit@y), fitted(fit), lty=2:3, lwd=2) ## End(Not run) # Another example nn = 2000; threshold = 0; x = runif(nn) xi = exp(-0.8)-0.5 y = rgpd(nn, scale=exp(1+0.2*x), shape=xi) fit = vglm(y ~ x, gpd(threshold), trace=TRUE) coef(fit, matrix=TRUE) ## Not run: # Nonparametric fits yy = y + rnorm(nn, sd=0.1) fit1 = vgam(yy ~ s(x), gpd(threshold), trace=TRUE) # Not so recommended par(mfrow=c(2,1)) plot(fit1, se=TRUE, scol="blue") fit2 = vglm(yy ~ bs(x), gpd(threshold), trace=TRUE) # More recommended plotvgam(fit2, se=TRUE, scol="blue") ## End(Not run)