International Centre for Mathematical Sciences,
Edinburgh, March 28 - March 30, 2001
This is a three-day workshop, held at the International Centre for Mathematical Sciences (ICMS) with 45 participants, funded by the EPSRC and supported by the Research Section of the Royal Statistical Society.
The motivation for the Workshop is that mixture modelling, and the wider class of models that constitutes latent structure analysis, is currently of major interest to an increasingly wide range of scientific disciplines. The following are likely to be major sub-topics, although the field is fast-moving:
Until the last five years or so, the theory and methodology of mixture distributions were of interest mainly to statisticians, although there was a substantial body of work in the engineering and speech-modelling literatures on the closely related but technically more complicated version known as hidden Markov modelling. The fields of application in which mixture models have been found relevant have, however, been extremely diverse, throughout science, medicine, engineering and even the humanities. So far as other manifestations of latent structure are concerned, such as latent class analysis and factor analysis, the methodological development and much of the application have traditionally been associated with statistical researchers specialising in the social sciences. For all these types of model, methodology and theory have developed steadily over a period of about 100 years, since Karl Pearson first wrote about mixtures in 1894, but the field has recently exploded in activity, thanks mainly to the following events:
From the Bayesian point of view, estimation and testing for mixtures have also proved to be challenging problems. Improper priors such as those appearing in noninformative analyses are generally not usable unless a priori correlation is introduced between the components (Mengersen and Robert, 1996; Roeder and Wasserman, 1997; Moreno and Liseo, 1998); the default Bayesian testing procedures, such as intrinsic Bayes factors (Berger and Pericchi, 1996), fractional Bayes factors (O'Hagan, 1995) or Schwarz's criterion (Kass and Raftery, 1995), do not apply; and attempts to bypass testing by estimating the number of components give imprecise results, as shown by the analyses of the Galaxy benchmark dataset by Carlin and Chib (1995) [3 components], Phillips and Smith (1996) [7 to 9], Raftery (1996) [2 or 3], Richardson and Green (1997) [3 or 4] and Celeux et al. (1999).
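Although, as noted above, the regularity conditions behind Schwarz's criterion fail for mixtures, the mechanics of such a comparison are easy to sketch. The following is a minimal illustration, not anyone's recommended procedure: it compares a single Gaussian against a two-component mixture on clearly bimodal data, using a short equal-variance EM run (a simplifying assumption) to approximate the mixture's maximised likelihood.

```python
import math
import random

def log_lik_single(data):
    """Maximised log-likelihood of a single N(mu, sigma^2) model (closed form)."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)

def log_lik_two(data, n_iter=100):
    """Approximate maximised log-likelihood of a two-component Gaussian
    mixture, via a short EM run (common variance, for brevity)."""
    data = sorted(data)
    n = len(data)
    mu = [sum(data[: n // 2]) / (n // 2), sum(data[n // 2 :]) / (n - n // 2)]
    w, s = 0.5, (max(data) - min(data)) / 4
    for _ in range(n_iter):
        # E-step: responsibilities (normalising constants cancel, common s).
        resp = []
        for x in data:
            a = w * math.exp(-0.5 * ((x - mu[0]) / s) ** 2)
            b = (1 - w) * math.exp(-0.5 * ((x - mu[1]) / s) ** 2)
            resp.append(a / (a + b))
        # M-step: weighted maximum-likelihood updates.
        r1 = sum(resp)
        w = r1 / n
        mu[0] = sum(r * x for r, x in zip(resp, data)) / r1
        mu[1] = sum((1 - r) * x for r, x in zip(resp, data)) / (n - r1)
        s = math.sqrt(sum(r * (x - mu[0]) ** 2 + (1 - r) * (x - mu[1]) ** 2
                          for r, x in zip(resp, data)) / n)
    c = s * math.sqrt(2 * math.pi)
    return sum(math.log(w * math.exp(-0.5 * ((x - mu[0]) / s) ** 2) / c
                        + (1 - w) * math.exp(-0.5 * ((x - mu[1]) / s) ** 2) / c)
               for x in data)

def bic(log_lik, n_params, n):
    """Schwarz's criterion, -2 log L + d log n: smaller is better."""
    return -2 * log_lik + n_params * math.log(n)

random.seed(2)
sample = ([random.gauss(-3.0, 1.0) for _ in range(150)]
          + [random.gauss(3.0, 1.0) for _ in range(150)])
bic1 = bic(log_lik_single(sample), 2, len(sample))   # mu, sigma
bic2 = bic(log_lik_two(sample), 4, len(sample))      # w, mu1, mu2, sigma
```

On data this well separated the criterion strongly favours two components; the debates cited above concern precisely the cases where the answer is not so clear-cut.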
The connections with the nonparametric community have also increased during the past ten years, as shown for instance by Böhning's (1999) book; mixtures of distributions are models that stand at the interface between parametric and nonparametric Statistics, and recent works like Petrone (1999) show that mixtures can be good alternatives to standard kernel estimates.
For much of the twentieth century the implementation of latent-structure models was very limited because of the associated computational demands. In some cases ad hoc approaches were developed, but the routine implementation of general paradigms was not possible until the last quarter of the century. The two crucial tools were the EM algorithm, for calculating maximum likelihood estimates or maximum a posteriori Bayesian point estimates, and Markov chain Monte Carlo methods, for performing a full Bayesian analysis.
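To make the first of these tools concrete, here is a minimal sketch of EM for a two-component univariate Gaussian mixture; the median-split initialisation and fixed iteration count are crude placeholders for the more careful choices a real implementation would make.

```python
import math
import random

def em_gaussian_mixture(data, n_iter=200):
    """EM for a two-component 1D Gaussian mixture (illustrative sketch).
    Returns (w, mu1, s1, mu2, s2), with w the weight of component 1."""
    data = sorted(data)
    n = len(data)
    # Crude initialisation: split the sample at the median.
    lo, hi = data[: n // 2], data[n // 2 :]
    w = 0.5
    mu1, mu2 = sum(lo) / len(lo), sum(hi) / len(hi)
    s1 = s2 = max(1e-3, (max(data) - min(data)) / 4)

    def phi(x, mu, s):
        return math.exp(-0.5 * ((x - mu) / s) ** 2) / (s * math.sqrt(2 * math.pi))

    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to component 1.
        resp = [w * phi(x, mu1, s1)
                / (w * phi(x, mu1, s1) + (1 - w) * phi(x, mu2, s2))
                for x in data]
        # M-step: responsibility-weighted maximum-likelihood updates.
        r1 = sum(resp)
        w = r1 / n
        mu1 = sum(r * x for r, x in zip(resp, data)) / r1
        mu2 = sum((1 - r) * x for r, x in zip(resp, data)) / (n - r1)
        s1 = math.sqrt(sum(r * (x - mu1) ** 2
                           for r, x in zip(resp, data)) / r1) or 1e-3
        s2 = math.sqrt(sum((1 - r) * (x - mu2) ** 2
                           for r, x in zip(resp, data)) / (n - r1)) or 1e-3
    return w, mu1, s1, mu2, s2

random.seed(1)
sample = ([random.gauss(0.0, 1.0) for _ in range(300)]
          + [random.gauss(5.0, 1.0) for _ in range(300)])
w, mu1, s1, mu2, s2 = em_gaussian_mixture(sample)
```

Each iteration is the familiar alternation: compute the conditional expectations of the latent component labels (E-step), then maximise the resulting complete-data likelihood (M-step).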
However, even with these tools, implementation in latent structure contexts has not been straightforward and remains an important current area. In the case of the EM algorithm, there are problems where each E-step and/or each M-step is itself computationally intensive; both Markov chain Monte Carlo and deterministic approximations (Reference) are being developed to circumvent the difficulties. Ingenious suggestions have been applied and often perform well, but more needs to be done to pin down the properties of the procedures, especially the deterministic (variational) approximations. In the fully Bayesian approach, it is possible in principle to write down obvious Gibbs samplers, but there are identifiability problems and practical dangers of not fully covering the parameter space. New issues and possible solutions for simple mixtures constitute extremely recent work (Celeux et al., 1999), and there is clearly more to come in this line in the near future. In addition, mixtures are proving to be an important testbed for the development of perfect samplers (e.g. Casella et al., 1999; Møller et al., 2000).
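The "obvious" Gibbs sampler referred to above alternates between the latent allocations and the component parameters. The following sketch makes heavy simplifying assumptions not in the text (known unit variances, equal weights, vague N(0, 10^2) priors on the means); note that with exchangeable priors the component labels can switch during the run, which is exactly the identifiability problem mentioned.

```python
import math
import random

def gibbs_mixture(data, n_sweeps=500, seed=0):
    """Data-augmentation Gibbs sampler for a two-component mixture of
    N(mu_k, 1) densities, equal weights, N(0, 10^2) priors on the means.
    Returns the (mu1, mu2) draws after a short burn-in."""
    rng = random.Random(seed)
    mu = [min(data), max(data)]   # well-spread starting values
    chain = []
    for sweep in range(n_sweeps):
        # Step 1: sample each observation's latent allocation given the means.
        counts, sums = [0, 0], [0.0, 0.0]
        for x in data:
            p0 = math.exp(-0.5 * (x - mu[0]) ** 2)
            p1 = math.exp(-0.5 * (x - mu[1]) ** 2)
            k = 0 if rng.random() < p0 / (p0 + p1) else 1
            counts[k] += 1
            sums[k] += x
        # Step 2: sample each mean from its conjugate normal full conditional.
        for k in range(2):
            prec = counts[k] + 1.0 / 100.0   # data precision + prior precision
            mu[k] = rng.gauss(sums[k] / prec, 1.0 / math.sqrt(prec))
        if sweep >= n_sweeps // 5:           # discard burn-in
            chain.append(tuple(mu))
    return chain

gen = random.Random(1)
obs = ([gen.gauss(-2.0, 1.0) for _ in range(200)]
       + [gen.gauss(3.0, 1.0) for _ in range(200)])
chain = gibbs_mixture(obs)
post_mean_1 = sum(m[0] for m in chain) / len(chain)
post_mean_2 = sum(m[1] for m in chain) / len(chain)
```

With components this well separated the sampler in practice never visits the relabelled mode, so posterior means of mu1 and mu2 look sensible; with overlapping components the same averages would be meaningless without some identifiability constraint or relabelling step.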
At another level, mixtures were and are instrumental in the development of more advanced MCMC tools for moving between different dimension models; the methods include the method of Carlin and Chib (1994), reversible jump (Green, 1995; Richardson and Green, 1997) and birth-and-death processes (Phillips and Smith, 1996; Stephens, 1998).
Interest in these models has exploded in the neural-computing community. Most neural-network models owe their flexibility to the inclusion of so-called `hidden nodes', which simply represent latent variables in statistical parlance. In addition, many neural-network researchers have come to the subject with a background in statistical physics, and therefore with knowledge of Gibbs distributions, i.e. of classes of exponential-family models, and of Markov chain Monte Carlo methods. As a result, more and more articles in that literature concern latent structure models of some type and the computational methods required to implement them. Within these contributions are some seminal ideas, such as the hierarchical mixtures of experts model (Jordan and Jacobs, 1994) and the development of the variational methods mentioned above (Jordan, 1999).
An additional feature of the work in neural computing is that the goal is often application to problems that are on a different scale, in terms of either quantity of data or dimensionality of the model, or both, from what is typical in most mainstream statistics. This tends to lead to a concentration on the empirical performance of the methods, rather than on some form of theoretical validation. Both are important, and the bringing together of appropriate individuals should yield considerable benefits in both aspects.
Other application communities.
As mentioned earlier, mixtures and latent structure models find applications in many applied fields, but those of the social sciences, including econometrics, and engineering are arguably the most important, in that the associated literatures have published important methodological advances as well as valuable direct applications. Statisticians with social-science specialisms have been the principal developers and users of factor analysis, and of variations such as latent class and latent trait models, and branches of the engineering literature have heralded much of the development of hidden Markov random field models for image modelling, including the crucial paper on the use of Gibbs sampling with annealing by Geman and Geman (1984), and of hidden Markov chain models (commonly referred to simply as hidden Markov models). Perhaps the key area for the development of the latter has been that of modelling for speech recognition, and the IEEE Transactions on Signal Processing, and on Speech and Audio Processing, abound with relevant papers. As in the case of neural computing, we believe that the inclusion of leading figures from these areas of interest has the potential for important cross-fertilization and mutual benefit.
The Workshop has now reached its maximum audience size.
The list of speakers and talks is as follows:
'Likelihood and Bayesian analysis of mixture models'
C. Andrieu 'SAME, SA^2ME, FAME and RDA'
C.M. Bishop 'Variational methods and latent variables'
G. Celeux 'Assessing the number of mixture components: a survey'
P. Dellaportas 'Latent variables for modelling volatility processes'
E. Gassiat 'The number of populations in a mixture with Markov regime'
P.J. Green 'Mixtures in time and space'
G. Hinton 'Products of mixtures'
N.L. Hjort 'On attempts at generalising the Dirichlet process'
B.G. Lindsay 'On determining an adequate number of mixture components'
D.J.C. MacKay 'The state of the art in error correcting codes'
G.J. McLachlan 'On the incremental EM algorithm for speeding up the fitting of finite mixture models'
E. Moulines 'Maximum likelihood estimation for non-linear autoregressive processes with Markov regime'
R. Neal 'Hierarchical mixtures using diffusion tree priors'
C. Robert 'Where do we stand on mixtures'
G.O. Roberts 'Bayesian inference for discretely observed diffusion processes'
T. Ryden 'Continuous-time jump MCMC and model selection for HMMs'
C. Skinner 'Estimation of distributions in the presence of measurement error'
M. Stephens 'Inferring latent population structure from genetic data'
M. Titterington 'Stock-taking discussion'
C.K.I. Williams 'Image modelling with dynamic trees'
Although many of the proposed participants are very distinguished, many will not give a formal presentation: we prefer a comparatively small number of longer talks to many short presentations. However, everyone has the opportunity to communicate his or her research, since specific periods are reserved for a poster session, organised so that everyone attends, as at the Valencia Bayesian meetings.
Wednesday 28 March
1000 - 1100 C.P. Robert [abstract]
1100 - 1130 Coffee
1130 - 1250 M. Aitkin [abstract] and C. Skinner
1250 - 1420 Lunch
1420 - 1540 M.Stephens and P.J. Green
1540 - 1610 Tea
1610 - 1730 P. Dellaportas and G. Roberts
1730 - 1900 Wine and cheese reception
Thursday 29 March
1050 - 1120 Coffee
1120 - 1240 G. Hinton and C.M. Bishop
1240 - 1400 Lunch
1400 - 1600 R. Neal [abstract], C.K.I. Williams and D.J.C. MacKay
1600 - 1630 Tea
1615 - 1800 Posters [list]
Friday 30 March
0930 - 1050 G. McLachlan and C. Andrieu [abstract]
1050 - 1120 Coffee
1120 - 1240 E. Gassiat and B.G. Lindsay
1240 - 1400 Lunch
1400 - 1520 T. Ryden and G. Celeux
1520 - 1540 Tea
1540 - 1630
Berger, J.O. & Pericchi, L. (1996) J. Am. Statist. Assoc. 91, 109-122.
Böhning, D. (1999) Computer-Assisted Analysis of Mixtures and Applications. Chapman and Hall.
Carlin, B. & Chib, S. (1995) J.R. Statist. Soc. B 57, 473-484.
Celeux, G., Hurn, M. & Robert, C.P. (2000) J. Am. Statist. Assoc. (to appear)
Geman, S. & Geman, D. (1984) IEEE Trans. PAMI 6, 721-741.
Green, P.J. (1995) Biometrika 82, 711-732.
Jordan, M.I. (Ed.) (1999) Learning in Graphical Models. MIT Press.
Jordan, M.I. & Jacobs, R.A. (1994) Neural Computation 6, 181-214.
Kass, R.E. & Raftery, A.E. (1995) J. Am. Statist. Assoc. 90, 773-795.
Lindsay, B.G. (1995) Mixture Models: Theory, Geometry and Applications. IMS.
McLachlan, G.J. (1987) Appl. Statist. 36, 318-324.
Mengersen, K.L. & Robert, C.P. (1996) In Bayesian Statistics 5, 255-276. OUP
Møller, J., Mira, A. & Roberts, G.O. (1999) Preprint.
Moreno, E. & Liseo, B. (1998) Preprint.
O'Hagan, A. (1995) J.R. Statist. Soc. B 57, 99-138.
Petrone, S. (1999) Canadian J. Statist. 27, 105-126.
Phillips, D.B. & Smith, A.F.M. (1996) In MCMC in Practice, 215-240. Chapman and Hall.
Raftery, A.E. (1996) In MCMC in Practice, 163-188. Chapman and Hall
Richardson, S. & Green, P.J. (1997) J.R. Statist. Soc. B 59, 731-792.
Roeder, K. & Wasserman, L. (1997) J. Am. Statist. Assoc. 92, 894-902.
Stephens, M. (1998). D.Phil. Thesis, Oxford University.