# ABC in Paris

## June 26, 2009

We are organising a one-day meeting on recent advances in ABC methods in Paris, at Université Paris Dauphine, on June 26, 2009, as the final step of our ANR 2005-2008 Misgepop project, with financial support from Université Paris Dauphine (BQR) and from the GIS "Sciences de la Décision" X-HEC-ENSAE. There have been so many advances in this area in the past year or so that a single day is obviously too short to cover the whole field, but it should nonetheless highlight those advances and bring the (local) communities together. Further and longer meetings may also stem from this one.

The program of the workshop is:

• 9h30-10h00: Welcome coffee
• 10h00-10h30: Arnaud Doucet, Institute of Statistical Mathematics, Tokyo, and Ajay Jasra, Imperial College London, "An Adaptive Sequential Monte Carlo Method for Approximate Bayesian Computation" [slides]
Approximate Bayesian computation (ABC) is a popular approach to address inference problems where the likelihood function is intractable, or expensive to calculate. To improve over Markov chain Monte Carlo (MCMC) implementations of ABC, the use of sequential Monte Carlo (SMC) methods has recently been suggested. Effective SMC algorithms that are currently available for ABC have a computational complexity that is quadratic in the number of Monte Carlo samples and require the careful choice of simulation parameters. In this article an adaptive SMC algorithm is proposed which admits a computational complexity that is linear in the number of samples and determines on-the-fly the simulation parameters. We demonstrate our algorithm on both a toy and a population genetics example.
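For readers new to the area, the plain rejection scheme that these SMC variants improve upon can be sketched in a few lines. This is a generic textbook sketch, not code from the talk; the toy normal-mean model, flat prior, and tolerance are illustrative choices.

```python
import random

def abc_rejection(obs, prior_sample, simulate, distance, eps, n_accept):
    """Plain ABC rejection: keep prior draws whose simulated summary
    statistic lies within eps of the observed one."""
    accepted = []
    while len(accepted) < n_accept:
        theta = prior_sample()
        if distance(simulate(theta), obs) <= eps:
            accepted.append(theta)
    return accepted

# Toy example: infer the mean mu of a N(mu, 1) sample of size 30,
# using the sample mean as (here, sufficient) summary statistic.
random.seed(1)
n = 30
obs = sum(random.gauss(2.0, 1) for _ in range(n)) / n   # observed summary

post = abc_rejection(
    obs=obs,
    prior_sample=lambda: random.uniform(-5, 5),          # flat prior on mu
    simulate=lambda mu: sum(random.gauss(mu, 1) for _ in range(n)) / n,
    distance=lambda a, b: abs(a - b),
    eps=0.1,
    n_accept=200,
)
```

The accepted draws approximate the posterior of mu; their average should land close to the true value 2.0. The quadratic-to-linear complexity gain discussed in the talk concerns the SMC machinery built on top of this basic step.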
• 10h30-11h00: Tina Toni, Imperial College London, "ABC SMC for dynamical systems" [slides]
Approximate Bayesian Computation (ABC) methods can be used in situations where the evaluation of the likelihood is computationally prohibitive. They are thus ideally suited for the analysis of complex dynamical systems (Toni et al. 2009), where knowledge of the full (approximate) posterior is often essential. Here we discuss improvements to an ABC approach, which is based on sequential Monte Carlo (SMC). We are particularly interested in applying ABC SMC to the increasingly important model selection problem. We will discuss how ABC SMC can be adapted for model selection for dynamical systems given a set of candidate models. In particular we will discuss how we can balance the "fit" to the data with the complexity of the simulation model. Being based on repeated simulation, ABC SMC is computationally expensive for models with many parameters (such as those considered in systems biology). We present an exploration of different perturbation kernels, which can improve the computational efficiency by exploring large-dimensional parameter spaces, yet still allow us to address the issue of maintaining particle diversity to obtain good approximations to the posterior distribution.
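The ABC SMC scheme underlying this talk (and the previous one) can be sketched in one dimension as follows. This is my generic illustration of the standard algorithm, not the speakers' code; the Gaussian perturbation kernel, the twice-the-weighted-variance scale heuristic, the tolerance schedule, and the toy model are all assumptions made for the example.

```python
import math
import random

def abc_smc(obs, prior_sample, prior_pdf, simulate, distance,
            eps_schedule, n_particles):
    """One-dimensional ABC-SMC sketch with a Gaussian perturbation kernel."""
    # First population: plain ABC rejection at the largest tolerance.
    particles = []
    while len(particles) < n_particles:
        theta = prior_sample()
        if distance(simulate(theta), obs) <= eps_schedule[0]:
            particles.append(theta)
    weights = [1.0] * n_particles

    for eps in eps_schedule[1:]:
        # Perturbation scale: twice the weighted variance of the
        # current population (a common heuristic).
        wsum = sum(weights)
        mean = sum(w * p for p, w in zip(particles, weights)) / wsum
        var = sum(w * (p - mean) ** 2 for p, w in zip(particles, weights)) / wsum
        tau = max(math.sqrt(2 * var), 1e-3)

        new_particles, new_weights = [], []
        while len(new_particles) < n_particles:
            # Resample a particle, perturb it, keep it if the simulated
            # data fall within the (smaller) tolerance.
            theta = random.gauss(random.choices(particles, weights=weights)[0], tau)
            if prior_pdf(theta) == 0 or distance(simulate(theta), obs) > eps:
                continue
            # Importance weight: prior over the mixture of perturbation kernels.
            denom = sum(w * math.exp(-((theta - p) ** 2) / (2 * tau * tau))
                        for p, w in zip(particles, weights))
            new_particles.append(theta)
            new_weights.append(prior_pdf(theta) / denom)
        particles, weights = new_particles, new_weights
    return particles, weights

# Toy run: posterior for the location mu of a single N(mu, 1) observation.
random.seed(2)
particles, weights = abc_smc(
    obs=1.5,
    prior_sample=lambda: random.uniform(-3, 3),
    prior_pdf=lambda t: 1 / 6 if -3 <= t <= 3 else 0,
    simulate=lambda mu: random.gauss(mu, 1),
    distance=lambda a, b: abs(a - b),
    eps_schedule=[2.0, 1.0, 0.5],
    n_particles=100,
)
post_mean = sum(w * p for p, w in zip(particles, weights)) / sum(weights)
```

The choice of perturbation kernel in the resampling step is exactly the efficiency issue Toni's talk addresses for high-dimensional parameter spaces.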
• 11h00-11h30: Marc Briers, QinetiQ Malvern, "Marginal and joint space representations within ABC, and the issue of bias" [slides]
In this talk we will discuss two representations of the target distribution within an ABC context (relating to a marginal and joint space representation of the target distribution). We will also discuss the bias arguments related to the paper by Sisson et al (2007). We will establish a set of unbiased ABC-SMC based algorithms, and finally provide an application.
• 11h30-12h00: Christoph Leuenberger, Université de Fribourg, "ABC and Model Selection in Population Genetics" [slides]
A key innovation to ABC was the use of a post-sampling regression adjustment, allowing larger tolerance values and as such shifting computation time to realistic orders of magnitude (Beaumont et al.). In my talk I propose a reformulation of the regression adjustment in terms of a General Linear Model (GLM). This allows a natural integration into the theoretical framework of Bayesian statistics and the use of its methods, including model selection via Bayes factors. As an illustration, the proposed methodology is applied to the question of population subdivision among western chimpanzees.
• 12h00-12h30: Oliver Ratman, Imperial College London, "Model Criticism based on likelihood-free inference, with an application to protein network evolution" [slides]
In many areas of computational biology, the likelihood of a scientific model is intractable, typically because interesting models are highly complex. This hampers scientific progress in terms of iterative data acquisition, parameter inference, model checking and model refinement within a Bayesian framework. We provide a statistical interpretation to current developments in likelihood-free Bayesian inference that explicitly accounts for discrepancies between the model and the data, termed Approximate Bayesian Computation under model uncertainty (ABCµ) (1). We augment the likelihood of the data with unknown error terms that correspond to freely chosen checking functions, and describe possible Monte Carlo strategies for sampling from the associated joint posterior distribution without the need of evaluating the likelihood. We discuss the benefit of incorporating model diagnostics within an ABC framework, and demonstrate how this method diagnoses model mismatch and guides model refinement by contrasting three qualitative models of protein network evolution to the protein interaction datasets of Helicobacter pylori and Treponema pallidum. The presented methods will be useful in the initial stages of model and data exploration, and in particular to efficiently scrutinize several models for which the likelihood is intractable by direct inspection of their summary errors, prior to more formal analyses.
• 12h30-13h00: Jean-Michel Marin, Université de Montpellier 2, "ABC methods for model choice in Gibbs random fields" [slides]
The core idea is that, for Gibbs random fields and in particular for Ising models, when comparing several neighbourhood structures, the computation of the posterior probabilities of the models under competition can be handled by likelihood-free simulation techniques (ABC). The turning point for this resolution is that, due to the specific structure of Gibbs random field distributions, there exists a sufficient statistic across models which allows for an exact (rather than approximate) simulation from the posterior probabilities of the models. Obviously, when the structures grow more complex, it becomes necessary to introduce a true ABC step with a tolerance threshold ε in order to avoid running the algorithm for too long. Our toy example shows that the accuracy of the approximation of the Bayes factor can be greatly improved by resorting to the original ABC approach, since it allows for the inclusion of many more simulations. In a biophysical application to the choice of a folding structure for two proteins, we also demonstrate that we can implement the ABC solution on realistic datasets and, in the examples processed there, that the Bayes factors allow for a ranking that more standard methods (FROST, TM-score) do not provide.
• 13h00-13h30: Lunch break (bring your own sandwich! Tea and coffee will be available)
• 13h30-14h00:  David Balding and Matt Nunes, Imperial College London, "Selecting summary statistics for ABC" [slides]
Recently Joyce and Marjoram ("Approximately sufficient statistics and Bayesian computation", Stat. Appl. Genet. Mol. Biol. 7(1):26, 2008) developed a sequential scheme for selecting the best subset of summary statistics to use in ABC, given a set of candidate summary statistics. Their approach was based on a notion of approximate sufficiency. We will report the results of our investigation seeking ways to improve on their scheme, using Kullback-Leibler divergence.
• 14h00-14h30: Paul Fearnhead, University of Lancaster, "Choice of Summary Statistics for ABC" [slides]
We will look at how simulation can be used to produce informative summary statistics within ABC. The issue will be investigated both theoretically and via simulation, including comparisons with examples of ABC taken from the literature.
• 14h30-15h00: Marc Beaumont, University of Reading, "ABC and hierarchical models: summary statistics, algorithms, and applications in population genetics" [slides]
Recently a group of techniques, variously called likelihood-free inference, or Approximate Bayesian Computation (ABC), have been quite widely applied in population genetics. These methods typically require the data to be compressed into summary statistics. In a hierarchical setting one may be interested both in hyper-parameters and parameters, and there may be very many of the latter - for example, in a genetic model, these may be parameters describing each of many loci or populations.  This poses a problem for ABC in that one then requires summary statistics for each locus, and,  if used naively, a consequent problem in conditional density estimation.  We develop a general method for addressing these problems efficiently, and we describe recent work in which the ABC method can be used to detect loci under local selection.
• 15h00-15h30: Michael Blum, TIMC, Grenoble, "Approximate Bayesian Computation: a non-parametric perspective" [slides]
We present Approximate Bayesian Computation as a technique of inference that relies on stochastic simulations and non-parametric statistics. For both the original estimator of the posterior distribution based on kernel smoothing and a refined version of the estimator based on a linear adjustment, we give their asymptotic bias and variance. Additionally, we introduce an original estimator of the posterior distribution based on quadratic adjustment and we show that its bias contains a smaller number of terms compared to the estimator with linear adjustment. Although, we find that the estimators with adjustment are not universally superior to the estimator based on kernel smoothing, we find that they can achieve better performance when there is a nearly homoscedastic relationship between the summary statistics and the parameter. Last, we show that both asymptotic results and numerical simulations emphasize the importance of the curse of dimensionality in Approximate Bayesian Computation.
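The linear adjustment analysed in this talk (introduced by Beaumont et al.) can be sketched in a few lines for the one-dimensional case. This is my minimal illustration: regress the accepted parameters on their summary statistics and transport each draw to the observed summary along the fitted line.

```python
# Beaumont-style linear regression adjustment (one-dimensional sketch):
# among accepted ABC draws (theta_i, s_i), fit theta ~ a + beta * s by
# least squares, then shift each draw to the observed summary s_obs:
#   theta_adj_i = theta_i + beta * (s_obs - s_i)
def linear_adjustment(thetas, summaries, s_obs):
    n = len(thetas)
    s_bar = sum(summaries) / n
    t_bar = sum(thetas) / n
    cov = sum((s - s_bar) * (t - t_bar) for s, t in zip(summaries, thetas)) / n
    var = sum((s - s_bar) ** 2 for s in summaries) / n
    beta = cov / var
    return [t + beta * (s_obs - s) for t, s in zip(thetas, summaries)]

# When theta tracks s exactly (beta = 1), every draw is moved to s_obs:
print(linear_adjustment([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], s_obs=2.5))  # [2.5, 2.5, 2.5]
```

The quadratic adjustment of the talk replaces the fitted line with a fitted parabola in s; the bias comparison in the abstract is between these two corrections and the unadjusted kernel-smoothing estimator.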
• 15h30-16h00: Tea (and coffee) break
• 16h00-16h30: Richard Wilkinson, University of Sheffield, "The error in ABC" [slides]
The approximation error in ABC algorithms can be understood by the consideration of an additive error term, where the distribution of this error can be inferred from the choice of metric and acceptance kernel. Once we are aware of this we can begin to think more carefully about what model error we expect for our models, and consequently what metric, tolerance and summaries we would ideally use. There may also be the opportunity to rewrite some models so that sampling can be done by the ABC rejection step, thus raising the possibility of exact inference in some cases.
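One concrete reading of this additive-error view (my illustration, not the speaker's code): replace the hard ε-threshold with a Gaussian acceptance kernel. Accepting θ with probability exp(-d²/(2σ²)) then amounts to exact inference for a model with additive N(0, σ²) error on the summary statistic. The toy model and σ below are illustrative choices.

```python
import math
import random

def abc_soft(obs, prior_sample, simulate, sigma, n_accept):
    """ABC with a Gaussian acceptance kernel instead of a hard threshold:
    a draw at distance d from the observed summary is accepted with
    probability exp(-d^2 / (2 sigma^2))."""
    accepted = []
    while len(accepted) < n_accept:
        theta = prior_sample()
        d = simulate(theta) - obs
        if random.random() < math.exp(-d * d / (2 * sigma * sigma)):
            accepted.append(theta)
    return accepted

# Toy run: location mu of a single N(mu, 1) observation, flat prior.
random.seed(0)
post = abc_soft(
    obs=1.0,
    prior_sample=lambda: random.uniform(-3, 3),
    simulate=lambda mu: random.gauss(mu, 1),
    sigma=0.5,
    n_accept=300,
)
```

Under this reading, the "tolerance" σ is not a numerical nuisance parameter but a statement about how much model error one is willing to assume.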
• 16h30-17h00: Christophe Andrieu, University of Bristol, "ABC and exact approximations"
• 17h00-17h30: Olivier Francois, TIMC, Grenoble, "Non-linear regression models for Approximate Bayesian Computation" [slides]
Approximate Bayesian inference on the basis of summary statistics is well-suited to complex problems for which the likelihood is either mathematically or computationally intractable. However the methods that use rejection suffer from the curse of dimensionality when the number of summary statistics is increased. Here we propose a machine-learning approach to the estimation of the posterior density by introducing two innovations. The new method fits a nonlinear conditional heteroscedastic regression model on the summary statistics by using a penalized least-squares method, and then adaptively improves estimation by using importance sampling. We also investigate the choice of the regularization parameter and the tolerance rate in ABC algorithm with a version of the Deviance Information Criterion. The new algorithm is compared to the state-of-the-art approximate Bayesian methods, and achieves considerable reduction of the computational burden in two examples of inference in statistical genetics and in a queueing model.
• 17h30-18h00: Arnaud Estoup, CBGP, INRA, Montpellier, "From theory to application: DIYABC, a user-friendly program to infer complex population histories using Approximate Bayesian Computation" [slides]
DIYABC is a computer program with a graphical user interface and a fully click-able environment. It allows population biologists to make inference based on Approximate Bayesian Computation (ABC), in which scenarios can be customized by the user to fit many complex situations involving any number of populations and samples. Such scenarios involve any combination of population divergences, admixtures and population size changes. DIYABC can be used to compare competing scenarios, estimate parameters for one or more scenarios, and compute bias and precision measures for a given scenario and known values of parameters (the current version applies to unlinked microsatellite data).

This is definitely a marathon schedule (!), but it should allow attendees from France or nearby countries to make the round trip within the same day (if needed).

This meeting is free, requires no registration, and is open to anyone interested. The talks will take place in Amphitheater 2-3 of Université Paris Dauphine, located on the second floor of the (unique) university building. Université Paris Dauphine is located in downtown Paris (Porte Dauphine) and is accessible by metro (e.g., the Porte Dauphine or Avenue Foch stops) as explained there.

Contact Christian Robert at bayesianstatistics[(à)]gmail.com for further practical information (but the programme is now complete; no more talks, sorry!).