International Centre for Mathematical Sciences,
Edinburgh, March 28 - March 30, 2001
This is a three-day workshop, held at the International Centre for Mathematical Sciences (ICMS) with 45 participants, funded by the EPSRC and supported by the Research Section of the Royal Statistical Society.
The motivation for the Workshop is that mixture modelling, and the wider class of models that constitutes latent structure analysis, is currently of major interest to an increasingly wide range of scientific disciplines. The following are likely to be major sub-topics, although the field is fast-moving:
Until the last five years or so, the theory and methodology of mixture distributions were of interest mainly to statisticians, although there was a substantial body of work in the engineering and speech-modelling literatures on the closely related but technically more complicated version known as hidden Markov modelling. The fields of application in which mixture models have been found relevant have, however, been extremely diverse, throughout science, medicine, engineering and even the humanities. So far as other manifestations of latent structure are concerned, such as latent class analysis and factor analysis, the methodological development and much of the application have traditionally been associated with statistical researchers specialising in the social sciences. For all these types of model, methodology and theory have developed steadily over a period of about 100 years, since Karl Pearson first wrote about mixtures in 1894, but the field has recently exploded in activity, thanks mainly to the following events:
From the Bayesian point of view, estimation and testing for mixtures have also proved to be challenging problems. Improper priors such as those appearing in noninformative analyses are generally not usable unless a priori correlation is introduced between the components (Mengersen and Robert, 1996; Roeder and Wasserman, 1997; Moreno and Liseo, 1998); the default Bayesian testing procedures, such as intrinsic Bayes factors (Berger and Pericchi, 1996), fractional Bayes factors (O'Hagan, 1995) or Schwarz's criterion (Kass and Raftery, 1995), do not apply; and attempts to bypass testing by estimating the number of components give imprecise results, as shown by the analyses of the Galaxy benchmark dataset by Carlin and Chib (1995) [3 components], Phillips and Smith (1996) [7 to 9], Raftery (1996) [2 or 3], Richardson and Green (1997) [3 or 4] and Celeux et al. (1999).
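Although, as noted above, the regularity conditions behind Schwarz's criterion fail for mixtures, the mechanics of such a comparison are easy to sketch. The following is a minimal illustration, not anyone's recommended procedure: it compares a single Gaussian against a two-component mixture on clearly bimodal data, using a short equal-variance EM run (a simplifying assumption) to approximate the mixture's maximised likelihood.

```python
import math
import random

def log_lik_single(data):
    """Maximised log-likelihood of a single N(mu, sigma^2) model (closed form)."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)

def log_lik_two(data, n_iter=100):
    """Approximate maximised log-likelihood of a two-component Gaussian
    mixture, via a short EM run (common variance, for brevity)."""
    data = sorted(data)
    n = len(data)
    mu = [sum(data[: n // 2]) / (n // 2), sum(data[n // 2 :]) / (n - n // 2)]
    w, s = 0.5, (max(data) - min(data)) / 4
    for _ in range(n_iter):
        # E-step: responsibilities (normalising constants cancel, common s).
        resp = []
        for x in data:
            a = w * math.exp(-0.5 * ((x - mu[0]) / s) ** 2)
            b = (1 - w) * math.exp(-0.5 * ((x - mu[1]) / s) ** 2)
            resp.append(a / (a + b))
        # M-step: weighted maximum-likelihood updates.
        r1 = sum(resp)
        w = r1 / n
        mu[0] = sum(r * x for r, x in zip(resp, data)) / r1
        mu[1] = sum((1 - r) * x for r, x in zip(resp, data)) / (n - r1)
        s = math.sqrt(sum(r * (x - mu[0]) ** 2 + (1 - r) * (x - mu[1]) ** 2
                          for r, x in zip(resp, data)) / n)
    c = s * math.sqrt(2 * math.pi)
    return sum(math.log(w * math.exp(-0.5 * ((x - mu[0]) / s) ** 2) / c
                        + (1 - w) * math.exp(-0.5 * ((x - mu[1]) / s) ** 2) / c)
               for x in data)

def bic(log_lik, n_params, n):
    """Schwarz's criterion, -2 log L + d log n: smaller is better."""
    return -2 * log_lik + n_params * math.log(n)

random.seed(2)
sample = ([random.gauss(-3.0, 1.0) for _ in range(150)]
          + [random.gauss(3.0, 1.0) for _ in range(150)])
bic1 = bic(log_lik_single(sample), 2, len(sample))   # mu, sigma
bic2 = bic(log_lik_two(sample), 4, len(sample))      # w, mu1, mu2, sigma
```

On data this well separated the criterion strongly favours two components; the debates cited above concern precisely the cases where the answer is not so clear-cut.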
The connections with the nonparametric community have also increased during the past ten years, as shown for instance by Böhning's (1999) book; mixtures of distributions are models that stand at the interface between parametric and nonparametric Statistics, and recent works like Petrone (1999) show that mixtures can be good alternatives to standard kernel estimates.
For much of the twentieth century the implementation of latent-structure models was very limited because of the associated computational demands. In some cases ad hoc approaches were developed, but the routine implementation of general paradigms was not possible until the last quarter of the century. The two crucial tools were the EM algorithm, for calculating maximum likelihood estimates or maximum a posteriori Bayesian point estimates, and Markov chain Monte Carlo methods, for performing a full Bayesian analysis.
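To make the first of these tools concrete, here is a minimal sketch of EM for a two-component univariate Gaussian mixture; the median-split initialisation and fixed iteration count are crude placeholders for the more careful choices a real implementation would make.

```python
import math
import random

def em_gaussian_mixture(data, n_iter=200):
    """EM for a two-component 1D Gaussian mixture (illustrative sketch).
    Returns (w, mu1, s1, mu2, s2), with w the weight of component 1."""
    data = sorted(data)
    n = len(data)
    # Crude initialisation: split the sample at the median.
    lo, hi = data[: n // 2], data[n // 2 :]
    w = 0.5
    mu1, mu2 = sum(lo) / len(lo), sum(hi) / len(hi)
    s1 = s2 = max(1e-3, (max(data) - min(data)) / 4)

    def phi(x, mu, s):
        return math.exp(-0.5 * ((x - mu) / s) ** 2) / (s * math.sqrt(2 * math.pi))

    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to component 1.
        resp = [w * phi(x, mu1, s1)
                / (w * phi(x, mu1, s1) + (1 - w) * phi(x, mu2, s2))
                for x in data]
        # M-step: responsibility-weighted maximum-likelihood updates.
        r1 = sum(resp)
        w = r1 / n
        mu1 = sum(r * x for r, x in zip(resp, data)) / r1
        mu2 = sum((1 - r) * x for r, x in zip(resp, data)) / (n - r1)
        s1 = math.sqrt(sum(r * (x - mu1) ** 2
                           for r, x in zip(resp, data)) / r1) or 1e-3
        s2 = math.sqrt(sum((1 - r) * (x - mu2) ** 2
                           for r, x in zip(resp, data)) / (n - r1)) or 1e-3
    return w, mu1, s1, mu2, s2

random.seed(1)
sample = ([random.gauss(0.0, 1.0) for _ in range(300)]
          + [random.gauss(5.0, 1.0) for _ in range(300)])
w, mu1, s1, mu2, s2 = em_gaussian_mixture(sample)
```

Each iteration is the familiar alternation: compute the conditional expectations of the latent component labels (E-step), then maximise the resulting complete-data likelihood (M-step).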
However, even with these tools, implementation in latent structure contexts has not been straightforward and remains an important current area. In the case of the EM algorithm, there are problems where each E-step and/or each M-step is itself computationally intensive; both Markov chain Monte Carlo and deterministic approximations (Reference) are being developed to circumvent the difficulties. Ingenious suggestions have been applied and often perform well, but more needs to be done to pin down the properties of the procedures, especially the deterministic (variational) approximations. In the fully Bayesian approach, it is possible in principle to write down obvious Gibbs samplers, but there are identifiability problems and practical dangers of not fully covering the parameter space. New issues and possible solutions for simple mixtures constitute extremely recent work (Celeux et al., 1999), and there is clearly more to come in this line in the near future. In addition, mixtures are proving to be an important testbed for the development of perfect samplers (e.g. Casella et al., 1999; Møller et al., 2000).
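The "obvious" Gibbs sampler referred to above alternates between the latent allocations and the component parameters. The following sketch makes heavy simplifying assumptions not in the text (known unit variances, equal weights, vague N(0, 10^2) priors on the means); note that with exchangeable priors the component labels can switch during the run, which is exactly the identifiability problem mentioned.

```python
import math
import random

def gibbs_mixture(data, n_sweeps=500, seed=0):
    """Data-augmentation Gibbs sampler for a two-component mixture of
    N(mu_k, 1) densities, equal weights, N(0, 10^2) priors on the means.
    Returns the (mu1, mu2) draws after a short burn-in."""
    rng = random.Random(seed)
    mu = [min(data), max(data)]   # well-spread starting values
    chain = []
    for sweep in range(n_sweeps):
        # Step 1: sample each observation's latent allocation given the means.
        counts, sums = [0, 0], [0.0, 0.0]
        for x in data:
            p0 = math.exp(-0.5 * (x - mu[0]) ** 2)
            p1 = math.exp(-0.5 * (x - mu[1]) ** 2)
            k = 0 if rng.random() < p0 / (p0 + p1) else 1
            counts[k] += 1
            sums[k] += x
        # Step 2: sample each mean from its conjugate normal full conditional.
        for k in range(2):
            prec = counts[k] + 1.0 / 100.0   # data precision + prior precision
            mu[k] = rng.gauss(sums[k] / prec, 1.0 / math.sqrt(prec))
        if sweep >= n_sweeps // 5:           # discard burn-in
            chain.append(tuple(mu))
    return chain

gen = random.Random(1)
obs = ([gen.gauss(-2.0, 1.0) for _ in range(200)]
       + [gen.gauss(3.0, 1.0) for _ in range(200)])
chain = gibbs_mixture(obs)
post_mean_1 = sum(m[0] for m in chain) / len(chain)
post_mean_2 = sum(m[1] for m in chain) / len(chain)
```

With components this well separated the sampler in practice never visits the relabelled mode, so posterior means of mu1 and mu2 look sensible; with overlapping components the same averages would be meaningless without some identifiability constraint or relabelling step.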
At another level, mixtures were and are instrumental in the development of more advanced MCMC tools for moving between different dimension models; the methods include the method of Carlin and Chib (1994), reversible jump (Green, 1995; Richardson and Green, 1997) and birth-and-death processes (Phillips and Smith, 1996; Stephens, 1998).
Interest in these models has exploded in the neural-computing community. Most neural-network models owe their flexibility to the inclusion of so-called `hidden nodes', which simply represent latent variables in statistical parlance. In addition, many neural-network researchers have come to the subject with a background in statistical physics, and therefore with knowledge of Gibbs distributions, i.e. of classes of exponential-family models, and of Markov chain Monte Carlo methods. As a result, more and more articles in that literature concern latent structure models of some type and the computational methods required to implement them. Within these contributions are some seminal ideas, such as the hierarchical mixtures of experts model (Jordan and Jacobs, 1994) and the development of the variational methods mentioned above (Jordan, 1999).
An additional feature of the work in neural computing is that the goal is often application to problems that are on a different scale, in terms of either quantity of data or dimensionality of the model, or both, from what is typical in most mainstream statistics. This tends to lead to a concentration on the empirical performance of the methods, rather than on some form of theoretical validation. Both are important, and the bringing together of appropriate individuals should yield considerable benefits in both aspects.
Other application communities.
As mentioned earlier, mixtures and latent structure models find applications in many applied fields, but those of the social sciences, including econometrics, and engineering are arguably the most important, in that the associated literatures have published important methodological advances as well as valuable direct applications. Statisticians with social-science specialisms have been the principal developers and users of factor analysis, and of variations such as latent class and latent trait models, and branches of the engineering literature have heralded much of the development of hidden Markov random field models for image modelling, including the crucial paper on the use of Gibbs sampling with annealing by Geman and Geman (1984), and of hidden Markov chain models (commonly referred to simply as hidden Markov models). Perhaps the key area for the development of the latter has been that of modelling for speech recognition, and the IEEE Transactions on Signal Processing, and on Speech and Audio Processing, abound with relevant papers. As in the case of neural computing, we believe that the inclusion of leading figures from these areas of interest has the potential for important cross-fertilization and mutual benefit.
The Workshop has now reached its maximum audience size.
The list of speakers and talks is as follows:
'Likelihood and Bayesian analysis of mixture models'
C. Andrieu 'SAME, SA^2ME, FAME and RDA'
C.M. Bishop 'Variational methods and latent variables'
G. Celeux 'Assessing the number of mixture components: a survey'
P. Dellaportas 'Latent variables for modelling volatility processes'
E. Gassiat 'The number of populations in a mixture with Markov regime'
P.J. Green 'Mixtures in time and space'
G. Hinton 'Products of mixtures'
N.L. Hjort 'On attempts at generalising the Dirichlet process'
B.G. Lindsay 'On determining an adequate number of mixture components'
D.J.C. MacKay 'The state of the art in error correcting codes'
G.J. McLachlan 'On the incremental EM algorithm for speeding up the fitting of finite mixture models'
E. Moulines 'Maximum likelihood estimation for non-linear autoregressive processes with Markov regime'
R. Neal 'Hierarchical mixtures using diffusion tree priors'
C. Robert 'Where do we stand on mixtures'
G.O. Roberts 'Bayesian inference for discretely observed diffusion processes'
T. Ryden 'Continuous-time jump MCMC and model selection for HMMs'
C. Skinner 'Estimation of distributions in the presence of measurement error'
M. Stephens 'Inferring latent population structure from genetic data'
M. Titterington 'Stock-taking discussion'
C.K.I. Williams 'Image modelling with dynamic trees'
Although many of the proposed participants are very distinguished, many will not give a formal presentation: we prefer a comparatively small number of longer talks to many short presentations. However, everyone has the opportunity to communicate his or her research, since specific periods are reserved for a poster session, organised so that everyone attends, as at the Valencia Bayesian meetings.
Wednesday 28 March
1000 - 1100 C.P. Robert [abstract]
1100 - 1130 Coffee
1130 - 1250 M. Aitkin [abstract] and C. Skinner
1250 - 1420 Lunch
1420 - 1540 M.Stephens and P.J. Green
1540 - 1610 Tea
1610 - 1730 P. Dellaportas and G. Roberts
1730 - 1900 Wine and cheese reception
Thursday 29 March
1050 - 1120 Coffee
1120 - 1240 G. Hinton and C.M. Bishop
1240 - 1400 Lunch
1400 - 1600 R. Neal [abstract], C.K.I. Williams and D.J.C. MacKay
1600 - 1630 Tea
1615 - 1800 Posters [list]
Friday 30 March
0930 - 1050 G. McLachlan and C. Andrieu [abstract]
1050 - 1120 Coffee
1120 - 1240 E. Gassiat and B.G. Lindsay
1240 - 1400 Lunch
1400 - 1520 T. Ryden and G. Celeux
1520 - 1540 Tea
1540 - 1630
Berger, J.O. & Pericchi, L. (1996) J. Am. Statist. Assoc. 91, 109-122.
Böhning, D. (1999) Computer-Assisted Analysis of Mixtures and Applications. Chapman and Hall.
Carlin, B. & Chib, S. (1995) J.R. Statist. Soc. B 57, 473-484.
Celeux, G., Hurn, M. & Robert, C.P. (2000) J. Am. Statist. Assoc. (to appear)
Geman, S. & Geman, D. (1984) IEEE Trans. PAMI 6, 721-741.
Green, P.J. (1995) Biometrika 82, 711-732.
Jordan, M.I. (Ed.) (1999) Learning in Graphical Models. MIT Press.
Jordan, M.I. & Jacobs, R.A. (1994) Neural Computation 6, 181-214.
Kass, R.E. & Raftery, A.E. (1995) J. Am. Statist. Assoc. 90, 773-795.
Lindsay, B.G. (1995) Mixture Models: Theory, Geometry and Applications. IMS.
McLachlan, G.J. (1987) Appl. Statist. 36, 318-324.
Mengersen, K.L. & Robert, C.P. (1996) In Bayesian Statistics 5, 255-276. OUP
Møller, J., Mira, A. & Roberts, G.O. (1999) Preprint.
Moreno, E. & Liseo, B. (1998) Preprint.
O'Hagan, A. (1995) J.R. Statist. Soc. B 57, 99-138.
Petrone, S. (1999) Canadian J. Statist. 27, 105-126.
Phillips, D.B. & Smith, A.F.M. (1996) In MCMC in Practice, 215-240. Chapman and Hall.
Raftery, A.E. (1996) In MCMC in Practice, 163-188. Chapman and Hall
Richardson, S. & Green, P.J. (1997) J.R. Statist. Soc. B 59, 731-792.
Roeder, K. & Wasserman, L. (1997) J. Am. Statist. Assoc. 92, 894-902.
Stephens, M. (1998). D.Phil. Thesis, Oxford University.