Rencontres Statistiques du CEREMADE (Anthony Ozier-Lafontaine, Monday 4 March 2024)

15 February 2024

The next session of the Rencontres Statistiques du CEREMADE (the CEREMADE statistics seminar) will take place on Monday 4 March 2024 at 2 p.m. in room A707. We will have the pleasure of hearing Anthony Ozier-Lafontaine (École centrale de Nantes), who will present his work on "Kernel-based testing and their application to single-cell data".

Title: Kernel-based testing and their application to single-cell data

Single-cell RNA sequencing (scRNAseq) is a high-throughput technology that quantifies gene expression at the single-cell level, for hundreds to thousands of observations (i.e. cells) and tens of thousands of variables (i.e. genes). Fully exploiting the potential of these complex data raises new methodological challenges. A major statistical challenge in scRNAseq data analysis is to distinguish biological information from technical noise in order to compare conditions or tissues. The main approach to this is Differential Expression Analysis (DEA), which essentially consists of gene-wise univariate two-sample tests. However, DEA misses the multivariate aspects of scRNAseq data, which carry information about gene dependencies and gene regulatory networks, and it says nothing about the global similarity of the compared datasets. There is therefore a need for dedicated multivariate two-sample tests that detect any global difference between two scRNAseq datasets. Moreover, pairwise comparisons are not suited to some complex experimental designs, so general hypothesis testing appears as a natural generalization of these univariate and multivariate two-sample tests.

I propose to apply kernel two-sample tests to compare pairs of conditions. Among kernel two-sample tests, the test based on Kernel Fisher Discriminant Analysis (the KFDA test) may be seen as a regularized version of the well-known Maximum Mean Discrepancy (MMD) test, and it makes it possible to visually highlight the main cell-wise differences through a geometrical interpretation of the discriminant analysis. I then show that a generalization of the KFDA test can be obtained by performing linear hypothesis testing on a linear model defined in the associated Reproducing Kernel Hilbert Space (RKHS). This link with the linear model makes it possible to kernelize all the classical diagnostic and interpretation tools of the linear model.
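To make the MMD/KFDA relationship described above more concrete, here is a minimal NumPy sketch; it is not the speaker's implementation. It assumes a Gaussian kernel with a fixed bandwidth, the biased (V-statistic) estimate of the squared MMD, and a KFDA statistic of the form (n1*n2/n) * <delta, (Sigma_W + gamma*I)^(-1) delta>, where delta is the difference between the two kernel mean embeddings and Sigma_W is the pooled within-group covariance operator in the RKHS. The function names, the default `bandwidth` and `gamma` values, and the label-permutation p-value are all illustrative assumptions. In this form, gamma * n/(n1*n2) times the KFDA statistic tends to the squared MMD as gamma grows, which is one way to read "regularized version of the MMD test".

```python
import numpy as np


def gaussian_gram(A, B, bandwidth):
    """Gaussian kernel matrix with entries exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd2(X, Y, bandwidth=1.0):
    """Biased estimate of MMD^2 = ||mu_X - mu_Y||^2 between kernel mean embeddings."""
    return (
        gaussian_gram(X, X, bandwidth).mean()
        + gaussian_gram(Y, Y, bandwidth).mean()
        - 2.0 * gaussian_gram(X, Y, bandwidth).mean()
    )


def kfda_stat(X, Y, bandwidth=1.0, gamma=1e-2):
    """(n1*n2/n) * <delta, (Sigma_W + gamma*I)^{-1} delta>, computed from Gram matrices only.

    delta = mu_X - mu_Y; Sigma_W is the pooled within-group covariance operator (assumed form).
    """
    n1, n2 = X.shape[0], Y.shape[0]
    n = n1 + n2
    Z = np.vstack([X, Y])
    K = gaussian_gram(Z, Z, bandwidth)
    # delta = Phi @ m, with m_i = 1/n1 on the X sample and -1/n2 on the Y sample.
    m = np.concatenate([np.full(n1, 1.0 / n1), np.full(n2, -1.0 / n2)])
    # C = I - P subtracts each group's mean, so Sigma_W = (1/n) Phi C Phi^T.
    P = np.zeros((n, n))
    P[:n1, :n1] = 1.0 / n1
    P[n1:, n1:] = 1.0 / n2
    C = np.eye(n) - P
    # Woodbury identity: <delta, (Sigma_W + gamma I)^{-1} delta>
    #   = (m'Km - v'(n*gamma*I + CKC)^{-1} v) / gamma, with v = CKm.
    v = C @ K @ m
    w = np.linalg.solve(n * gamma * np.eye(n) + C @ K @ C, v)
    quad = (m @ K @ m - v @ w) / gamma
    return n1 * n2 / n * quad


def permutation_pvalue(stat_fn, X, Y, n_perm=200, seed=0):
    """p-value from randomly permuting the sample labels (an assumption of this sketch)."""
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])
    n1 = X.shape[0]
    observed = stat_fn(X, Y)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(Z.shape[0])
        if stat_fn(Z[idx[:n1]], Z[idx[n1:]]) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(0.0, 1.0, size=(100, 2))  # synthetic "condition A"
    Y = rng.normal(1.0, 1.0, size=(100, 2))  # synthetic "condition B", mean-shifted
    print("MMD^2 p-value:", permutation_pvalue(mmd2, X, Y))
    print("KFDA p-value:", permutation_pvalue(kfda_stat, X, Y))
```

Both statistics are built from the same Gram matrix; the only extra cost of the KFDA version is one n x n linear solve, obtained through the Woodbury identity so that the RKHS covariance operator never has to be formed explicitly.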