Cahiers du CEREMADE

Unité Mixte de Recherche du C.N.R.S. N°7534
Abstract: The original Studentization was the conversion of a sample mean departure into the familiar $t$-statistic plus the derivation of the corresponding Student distribution function; the observed value of the distribution function is the observed $p$-value, as presented in an elemental form. We examine this process in a broadly general context: a null statistical model is available together with observed data; a statistic $t(y)$ has been proposed as a plausible measure of the location of the data relative to what is expected under the null; a modified statistic, say $\tilde t(y)$, is developed that is ancillary; the corresponding distribution function is determined, exactly or approximately; and the observed value of the distribution function is the $p$-value or percentile position of the data with respect to the model. Such $p$-values have had extensive coverage in the recent Bayesian literature with many variations and some preference for two versions labelled $p_{ppost}$ and $p_{cpred}$. The bootstrap method also directly addresses this Studentization process. We use recent likelihood theory that gives a factorization of a regular statistical model into a marginal density for a full-dimensional ancillary and a conditional density for the maximum likelihood variable. The full-dimensional ancillary is shown to lead to an explicit determination of the Studentized version $\tilde t(y)$ together with a highly accurate approximation to its distribution function; the observed value of the distribution function is the $p$-value, and its value as an integral is available numerically by direct calculation or by Markov chain Monte Carlo or other simulations. Here, for any given initial trial or test statistic proposed as a location indicator for a data point, we develop: an ancillary-based $p$-value designated $p_{\rm anc}$; a special version of the Bayesian $p_{\rm cpred}$; and a bootstrap-based $p$-value designated $p_{\rm bs}$.
We then show under moderate regularity that these are equivalent to third order and have uniqueness as a determination of the statistical location of the data point, as of course derived from the initial location measure. We also show that these $p$-values have a uniform distribution to third order, as based on calculations in the moderate-deviations region. For implementation, the Bayesian and likelihood procedures would perhaps require the same numerical computations, while the bootstrap would require an order of magnitude more computation and would perhaps not be accessible. Examples are given to indicate the ease and flexibility of the approach.
Studentization and the determination of p-values
Université de PARIS - DAUPHINE
Place du Maréchal de Lattre De Tassigny - 75775 PARIS CEDEX 16 - FRANCE
Téléphone : +33 (0)1 44-05-49-23 - fax : +33 (0)1 44-05-45-99