International Centre for Mathematical Sciences, Edinburgh, March 28 - March 30, 2001
Organizers: C.P. Robert (Paris) and D.M. Titterington (Glasgow)
This is a three-day workshop, held at the International Centre for Mathematical Sciences (ICMS) with 45 participants, funded by the EPSRC and supported by the Research Section of the Royal Statistical Society.
The motivation for the Workshop is the awareness that mixture modelling, and the wider class of models that constitutes latent structure analysis, are currently of major interest to an increasingly wide range of scientific disciplines. The following are likely to be major sub-topics, although the field is a fast-moving one:
Until the last five years or so, the theory and methodology of mixture
distributions were of interest mainly to statisticians, although there
was a substantial amount of work in the engineering and speech-modelling
literatures concerning the closely related but technically more complicated
version known as hidden Markov modelling. The fields of application
in which mixture models have been found relevant have, however, been extremely
diverse, throughout science, medicine, engineering and even the humanities.
So far as other manifestations of latent structure are concerned, such
as latent class analysis and factor analysis, the methodological development
and much of the applications have traditionally been associated with statistical
researchers specialising in the social sciences. In general, with all these
types of model, methodology and theory have developed steadily over a period
of about 100 years, since Karl Pearson first wrote about mixtures in 1894,
but the field has recently exploded in activity, thanks mainly to the following developments:
From the Bayesian point of view, estimation and testing for mixtures have also proved to be challenging problems. Improper priors, such as those appearing in noninformative analyses, are generally not usable unless a priori correlation is introduced between the components (Mengersen and Robert, 1996; Roeder and Wasserman, 1997; Moreno and Liseo, 1998). The default Bayesian testing procedures, such as intrinsic Bayes factors (Berger and Pericchi, 1996), fractional Bayes factors (O'Hagan, 1995) or Schwarz's criterion (Kass and Raftery, 1995), do not apply, and attempts to bypass testing by estimating the number of components give imprecise results, as shown by the analyses of the Galaxy benchmark dataset by Carlin and Chib (1995) [3 components], Phillips and Smith (1996) [7 to 9], Raftery (1996) [2 or 3], Richardson and Green (1997) [3 or 4] and Celeux et al. (2000) [2].
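To see why improper priors fail here, consider a minimal two-component illustration (the notation below is ours and is not part of the original discussion). With
f(x | \theta) = p\,\varphi(x | \mu_1, \sigma_1) + (1 - p)\,\varphi(x | \mu_2, \sigma_2),
the likelihood expands as a sum over the 2^n possible allocations z of observations to components,
L(\theta | x_1, ..., x_n) = \prod_{i=1}^{n} f(x_i | \theta) = \sum_{z \in \{1,2\}^n} \prod_{i=1}^{n} p_{z_i}\,\varphi(x_i | \mu_{z_i}, \sigma_{z_i}), \qquad p_1 = p, \; p_2 = 1 - p,
and the terms in which one component receives no observation leave that component's parameters governed by the prior alone; under an improper prior \pi(\mu_j, \sigma_j) those terms have infinite mass, so the posterior itself is improper unless, for instance, prior dependence between the components is imposed.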
The connections with the nonparametric community have also increased during the past ten years, as shown for instance by Böhning's (1999) book; mixtures of distributions are models that stand at the interface between parametric and nonparametric statistics, and recent work such as Petrone (1999) shows that mixtures can be good alternatives to standard kernel estimates.
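As a small illustration of this interface (a sketch of ours using scikit-learn and SciPy, neither of which is mentioned in the original text), the same sample can be smoothed either by a fitted finite mixture or by a kernel estimate:

# Illustrative sketch only: a finite Gaussian mixture versus a kernel density estimate.
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Any univariate sample would do; here, a simulated two-component one.
x = np.concatenate([rng.normal(-2.0, 1.0, 150), rng.normal(3.0, 0.5, 50)])
grid = np.linspace(x.min() - 1.0, x.max() + 1.0, 400)

# Parametric side: a k-component mixture fitted by EM (k fixed by hand here).
gmm = GaussianMixture(n_components=2).fit(x.reshape(-1, 1))
mixture_density = np.exp(gmm.score_samples(grid.reshape(-1, 1)))

# Nonparametric side: a Gaussian kernel estimate with the default bandwidth rule.
kde_density = gaussian_kde(x)(grid)

The two estimates typically agree in the bulk of the data; the mixture offers interpretable components and a genuine likelihood at the cost of having to choose the number of components.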
Computational aspects.
For much of the twentieth century the implementation of latent-structure
models was very limited because of the associated computational demands.
In some cases ad hoc approaches were developed, but the routine implementation
of general paradigms was not possible until the last quarter of the century.
The two crucial tools were the EM algorithm, for calculating maximum likelihood
estimates or maximum a posteriori Bayesian point estimates, and Markov
chain Monte Carlo methods, for performing a full Bayesian analysis.
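For readers less familiar with the first of these tools, the following minimal sketch (our own illustrative code, not material from the workshop) shows EM for a two-component univariate Gaussian mixture:

# Minimal EM for a two-component univariate Gaussian mixture (illustrative sketch only).
import numpy as np
from scipy.stats import norm

def em_two_gaussians(x, n_iter=200):
    # Crude starting values; serious implementations restart from several initialisations.
    p, mu1, mu2, s1, s2 = 0.5, x.min(), x.max(), x.std(), x.std()
    for _ in range(n_iter):
        # E-step: posterior probability that each observation comes from component 1.
        d1 = p * norm.pdf(x, mu1, s1)
        d2 = (1.0 - p) * norm.pdf(x, mu2, s2)
        w = d1 / (d1 + d2)
        # M-step: weighted updates of the mixture weight, means and standard deviations.
        p = w.mean()
        mu1 = np.average(x, weights=w)
        mu2 = np.average(x, weights=1.0 - w)
        s1 = np.sqrt(np.average((x - mu1) ** 2, weights=w))
        s2 = np.sqrt(np.average((x - mu2) ** 2, weights=1.0 - w))
    return p, mu1, mu2, s1, s2

Each sweep increases the observed-data likelihood, but the algorithm can converge to local maxima or to degenerate solutions in which one variance collapses towards zero.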
However, even with these tools, implementation in latent structure contexts has not been straightforward and remains an important current area. In the case of the EM algorithm, there are problems where each E-step and/or each M-step is itself computationally intensive, and both Markov chain Monte Carlo and deterministic approximations (Reference) are being developed to circumvent the difficulties. Ingenious suggestions have been applied and often perform well, but more needs to be done to pin down the properties of the procedures, especially the deterministic (variational) approximations. In the fully Bayesian approach, it is possible in principle to write down obvious Gibbs samplers, but there are identifiability problems and practical dangers of not fully covering the parameter space. New issues and possible solutions for simple mixtures constitute extremely recent work (Celeux et al., 2000), and there is clearly more to come in this line in the near future. In addition, mixtures are proving to be an important testbed for the development of perfect samplers (e.g. Casella et al., 1999; Møller et al., 2000).
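The 'obvious' Gibbs sampler mentioned above alternates between the allocations and the parameters. The sketch below (a deliberately simplified conjugate model with known unit variances, again our own code rather than anything discussed at the workshop) shows the structure, with the identifiability issue flagged in the comments:

# Data-augmentation Gibbs sampler for a two-component normal mixture with known unit
# variances, a Beta(1,1) prior on the weight and N(0, tau2) priors on the means.
# Illustrative sketch only.
import numpy as np

def gibbs_mixture(x, n_sweeps=5000, tau2=10.0, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x)
    p, mu = 0.5, np.array([x.min(), x.max()], dtype=float)
    draws = []
    for _ in range(n_sweeps):
        # 1. Allocations: observation i joins component 1 with probability
        #    proportional to p * N(x_i | mu_1, 1).
        d1 = p * np.exp(-0.5 * (x - mu[0]) ** 2)
        d2 = (1.0 - p) * np.exp(-0.5 * (x - mu[1]) ** 2)
        z = rng.random(n) < d1 / (d1 + d2)          # True means component 1
        # 2. Weight, from its conjugate Beta full conditional.
        p = rng.beta(1 + z.sum(), 1 + n - z.sum())
        # 3. Means, from their conjugate normal full conditionals.
        for j, members in enumerate([z, ~z]):
            var = 1.0 / (members.sum() + 1.0 / tau2)
            mu[j] = rng.normal(var * x[members].sum(), np.sqrt(var))
        draws.append((p, mu.copy()))
    # Nothing above prevents the two components from swapping roles ("label
    # switching"); in practice the chain may also never visit the swapped mode,
    # so the parameter space is not fully covered.
    return draws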
At another level, mixtures were and are instrumental in the development of more advanced MCMC tools for moving between models of different dimensions; these include the method of Carlin and Chib (1995), reversible jump MCMC (Green, 1995; Richardson and Green, 1997) and birth-and-death processes (Phillips and Smith, 1996; Stephens, 1998).
Neural computing.
Interest in these models has exploded in the neural-computing community.
Most neural-network models owe their flexibility to the inclusion of so-called
`hidden nodes', which simply represent latent variables in statistical
parlance. In addition, many neural-network researchers have come to the
subject with a background in statistical physics, and therefore with knowledge
of Gibbs distributions, i.e. of classes of exponential-family models, and
of Markov chain Monte Carlo methods. As a result, more and more articles
in that literature concern latent structure models of some type and the
computational methods required to implement them. Within these contributions
are some seminal ideas, such as the hierarchical mixtures of experts model
(Jordan and Jacobs, 1994) and the development of the variational methods
mentioned above (Jordan, 1999).
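The variational idea underlying these methods can be stated in one line: for a generic latent-variable model with observed data x, hidden variables z and parameters \theta (our notation, not Jordan's),
\log p(x | \theta) = \log \int p(x, z | \theta)\, dz \;\geq\; E_{q(z)}[\log p(x, z | \theta)] - E_{q(z)}[\log q(z)],
with equality when q(z) = p(z | x, \theta). Variational methods maximise this lower bound over a tractable family of distributions q, typically a fully factorised ('mean-field') family, instead of computing the exact posterior over the hidden nodes.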
An additional feature of the work in neural computing is that the goal is often application to problems that are on a different scale, in terms of either quantity of data or dimensionality of the model, or both, from what is typical in most mainstream statistics. This tends to lead to a concentration on the empirical performance of the methods, rather than on some form of theoretical validation. Both are important, and the bringing together of appropriate individuals should yield considerable benefits in both aspects.
Other application communities.
As mentioned earlier, mixtures and latent structure models find applications
in many applied fields, but those of the social sciences, including econometrics,
and engineering are arguably the most important, in that the associated
literatures have published important methodological advances as well as
valuable direct applications. Statisticians with social-science specialisms
have been the principal developers and users of factor analysis, and of
variations such as latent class and latent trait models, and branches of
the engineering literature have heralded much of the development of hidden
Markov random field models for image modelling, including the crucial paper
on the use of Gibbs sampling with annealing by Geman and Geman (1984),
and of hidden Markov chain models (commonly referred to simply as hidden
Markov models). Perhaps the key area for the development of the latter
has been that of modelling for speech recognition, and the IEEE
Transactions on Signal Processing, and on Speech
and Audio Processing, abound with relevant papers. As in the case of
neural computing, we believe that the inclusion of leading figures from
these areas of interest has the potential for important cross-fertilization
and mutual benefit.
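For completeness, the extra complication in the hidden Markov chain case relative to an ordinary mixture is only that the latent labels are dependent; the likelihood is still computable by a simple forward recursion, sketched below (a generic implementation of our own, not taken from any of the papers cited):

# Forward recursion for the log-likelihood of a discrete-state hidden Markov model
# (illustrative sketch only).
import numpy as np

def hmm_loglik(log_emission, transition, initial):
    # log_emission: (T, K) array of log p(x_t | state k)
    # transition:   (K, K) matrix whose rows sum to one
    # initial:      length-K vector of initial state probabilities
    T, K = log_emission.shape
    alpha = np.log(initial) + log_emission[0]        # log p(x_1, z_1 = k)
    for t in range(1, T):
        c = alpha.max()                              # rescale to avoid underflow
        alpha = c + np.log(np.exp(alpha - c) @ transition) + log_emission[t]
    return np.logaddexp.reduce(alpha)                # log p(x_1, ..., x_T)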
The Workshop has now reached its maximum audience size.
The list of speakers and talks is as follows:
M. Aitkin
'Likelihood and Bayesian analysis of mixture models'
C. Andrieu
'SAME, SA^2ME, FAME and RDA'
C.M. Bishop
'Variational methods and latent variables'
G. Celeux
'Assessing the number of mixture components: a survey'
P. Dellaportas
'Latent variables for modelling volatility processes'
E. Gassiat
'The number of populations in a mixture with Markov regime'
P.J. Green
'Mixtures in time and space'
G. Hinton
'Products of mixtures'
B.G. Lindsay
'On determining an adequate number of mixture components'
N.L. Hjort
'On attempts at generalising the Dirichlet process'
D.J.C. MacKay
'The state of the art in error correcting codes'
G.J. McLachlan
'On the Incremental EM Algorithm for Speeding Up the Fitting of Finite Mixture Models'
E. Moulines
'Maximum likelihood estimation for non-linear autoregressive processes with Markov regime'
R. Neal
'Hierarchical mixtures using diffusion tree priors'
C. Robert
'Where do we stand on mixtures'
G.O. Roberts
'Bayesian inference for discretely observed diffusion processes'
T. Ryden
'Continuous-time jump MCMC and model selection for HMMs'
C. Skinner
'Estimation of distributions in the presence of measurement error'
M. Stephens
'Inferring latent population structure from genetic data'
M. Titterington
'Stock taking discussion'
C.K.I. Williams
'Image modelling with dynamic trees'
Although many of the participants are very distinguished, many will not give a formal presentation: we prefer a comparatively small number of longish talks to many short presentations. However, everyone has the opportunity to communicate his or her research, since specific periods are reserved for a poster session, organised in such a way that everyone attends, as in the Valencia Bayesian meetings.
Programme
Day 1: 28 March
1000 - 1100  C.P. Robert [abstract]
1100 - 1130  Coffee
1130 - 1250  M. Aitkin [abstract] and C. Skinner
1250 - 1420  Lunch
1420 - 1540  M. Stephens and P.J. Green
1540 - 1610  Tea
1610 - 1730  P. Dellaportas and G.O. Roberts
1730 - 1900  Wine and cheese reception

Day 2: 29 March
1050 - 1120  Coffee
1120 - 1240  G. Hinton and C.M. Bishop
1240 - 1400  Lunch
1400 - 1600  R. Neal [abstract], C.K.I. Williams and D.J.C. MacKay
1600 - 1630  Tea
1615 - 1800  Posters [list]
1930  Optional dinner

Day 3: 30 March
0930 - 1050  G. McLachlan and C. Andrieu [abstract]
1050 - 1120  Coffee
1120 - 1240  E. Gassiat and B.G. Lindsay
1240 - 1400  Lunch
1400 - 1520  T. Ryden and G. Celeux
1520 - 1540  Tea
1540 - 1630  D.M. Titterington
References
Berger, J.O. & Pericchi, L. (1996) J. Am. Statist. Assoc. 91, 109-122.
Böhning, D. (1999) Computer-Assisted Analysis of Mixtures and Applications. Chapman and Hall.
Carlin, B. & Chib, S. (1995) J.R. Statist. Soc. B 57, 473-484.
Celeux, G., Hurn, M. & Robert, C.P. (2000) J. Am. Statist. Assoc. (to appear).
Geman, S. & Geman, D. (1984) IEEE Trans. PAMI 6, 721-741.
Green, P.J. (1995) Biometrika 82, 711-732.
Jordan, M.I. (Ed.) (1999) Learning in Graphical Models. MIT Press.
Jordan, M.I. & Jacobs, R.A. (1994) Neural Computation 6, 181-214.
Kass, R.E. & Raftery, A.E. (1995) J. Am. Statist. Assoc. 90, 773-795.
Lindsay, B.G. (1995) Mixture Models: Theory, Geometry and Applications. IMS.
McLachlan, G.J. (1987) Appl. Statist. 36, 318-324.
Mengersen, K.L. & Robert, C.P. (1996) In Bayesian Statistics 5, 255-276. OUP.
Møller, J., Mira, A. & Roberts, G.O. (1999) Preprint.
Moreno, E. & Liseo, B. (1998) Preprint.
O'Hagan, A. (1995) J.R. Statist. Soc. B 57, 99-138.
Petrone, S. (1999) Canadian J. Statist. 27, 105-126.
Phillips, D.B. & Smith, A.F.M. (1996) In MCMC in Practice, 215-240. Chapman and Hall.
Raftery, A.E. (1996) In MCMC in Practice, 163-188. Chapman and Hall.
Richardson, S. & Green, P.J. (1997) J.R. Statist. Soc. B 59, 731-792.
Roeder, K. & Wasserman, L. (1997) J. Am. Statist. Assoc. 92, 894-902.
Stephens, M. (1998) D.Phil. Thesis, Oxford University.