**TUTORIAL**

**Symbolic Data Analysis of Complex Data**

### Edwin Diday

CEREMADE, Paris Dauphine University

Place du Maréchal de Lattre de Tassigny, 75775 Paris, France

*diday@ceremade.dauphine.fr*

### Important Dates

**Paper Submission**

~~February 6, 2012~~

**March 9, 2012**

**Paper Notification of Acceptance**

~~March 1, 2012~~

**March 31, 2012**

**Poster & Demo Submission**

~~February 15, 2012~~

**March 15, 2012**

**Poster & Demo Notification of Acceptance**

~~March 8, 2012~~

**March 31, 2012**

**Standard Registration Deadline**

~~March 19, 2012~~

~~April 4, 2012~~

**April 9, 2012**

## May 7, 8 & 9, 2012

## Abstract

"Complex data" have been defined in the following way: "in contrast to the typical tabular data, complex data can consist of heterogeneous data types, can come from different sources, or live in high dimensional spaces. All these specificities call for new data mining strategies". Sometimes "complex data" refers to complex objects such as images, video, audio or text documents. Sometimes it refers to distributed or structured data, or more specifically to spatial-temporal data or to heterogeneous data mixing several kinds of data, for example a medical patient described by images, text documents and socio-demographic information. In practice, complex data are usually based on several kinds of observations described by standard numerical and/or categorical data contained in several related data tables. In this talk, our aim is to show that extracting new knowledge from Complex Data requires the use of “symbolic data”, which are an extension of standard numerical or categorical data. The usual data mining model has two parts: the first concerns the observations; the second contains their description by several standard variables, numerical or categorical.

The Symbolic Data Analysis (SDA) model (see Billard and Diday (2006), Diday and Noirhomme (2008)) needs two more parts: the first concerns classes of observations, called concepts, and the second concerns their description by symbolic data. A concept is characterized by a set of properties, called its intent, and by an extent, defined as the set of observations that satisfy these properties. Concepts are described by symbolic data, which may be standard categorical or numerical values but also intervals, histograms, sequences of weighted values and the like, in order to account for the variation within their extent. These new kinds of data are called symbolic because they cannot be manipulated as numbers. Based on this model, new knowledge can then be extracted with data mining tools extended to concepts, which are treated as a new kind of observation.
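As an illustration of how a concept's extent can be summarized by symbolic data, here is a minimal Python sketch. It is not taken from the SODAS software or from the cited books: the toy observations, the labels `group_A` and `group_B`, and the function `build_symbolic_descriptions` are all hypothetical. It assumes that each observation already carries the label of the concept it belongs to, and it describes each concept by an interval for the numerical variable and by a histogram of relative frequencies for the categorical one.

```python
from collections import Counter, defaultdict

# Hypothetical toy observations: (concept label, numerical value, category).
observations = [
    ("group_A", 10.0, "up"),
    ("group_A", 12.5, "down"),
    ("group_A", 11.0, "up"),
    ("group_B", 7.0, "down"),
    ("group_B", 9.0, "down"),
]


def build_symbolic_descriptions(rows):
    """Aggregate observations into one symbolic description per concept."""
    # The extent of a concept is the set of observations carrying its label.
    extents = defaultdict(list)
    for label, value, category in rows:
        extents[label].append((value, category))

    descriptions = {}
    for label, members in extents.items():
        values = [value for value, _ in members]
        counts = Counter(category for _, category in members)
        size = len(members)
        descriptions[label] = {
            # Interval-valued description of the numerical variable.
            "interval": (min(values), max(values)),
            # Histogram-valued description: relative frequency per category.
            "histogram": {cat: n / size for cat, n in counts.items()},
            # Number of observations in the extent.
            "size": size,
        }
    return descriptions


symbolic = build_symbolic_descriptions(observations)
```

With this toy data, `symbolic["group_A"]["interval"]` is `(10.0, 12.5)` and `symbolic["group_B"]["histogram"]` is `{"down": 1.0}`. Each concept becomes one row of a new table whose cells hold intervals or histograms instead of single numbers, which is the sense in which concepts are treated as a new kind of observation.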

In this talk we try to answer the following questions:

- What are Complex Data?
- What are “symbolic data”?
- How are “Symbolic Data” built?
- Are Symbolic Data Complex Data?
- In which sense are Complex Data Symbolic Data?
- What is “Symbolic Data Analysis”?
- In which sense are Conceptual Lattices the underlying structure of Symbolic Data?

The talk is illustrated by several industrial applications, including text mining of telephone calls in order to discover “themes”, and the analysis of financial data in order to characterise the best stocks and their typical trajectories using an extension of standard PCA to metabins. Finally, we indicate open directions of research and show that SDA provides a framework for extracting new knowledge from Complex Data.

**References**

**Recent books**

E. Diday, M. Noirhomme (eds and co-authors) (2008) “Symbolic Data Analysis and the SODAS software”. 457 pages. Wiley. ISBN 978-0-470-01883-5

L. Billard, E. Diday (2006) “Symbolic Data Analysis: conceptual statistics and data Mining”. 321 pages. Wiley series in computational statistics. Wiley. ISBN 0-470-09016-2

E. Diday (2005) "Categorization in Symbolic Data Analysis". In Handbook of Categorization in Cognitive Science. Edited by H. Cohen and C. Lefebvre. Elsevier.

*http://books.elsevier.com/elsevier/?isbn=0080446124*

**International Journal papers**

Edwin Diday (2008) “Spatial classification”. DAM (Discrete Applied Mathematics) Volume 156, Issue 8, Pages 1271-1294.

L. Billard, E. Diday (2003) “From the Statistics of Data to the Statistics of Knowledge: Symbolic Data Analysis”. JASA (Journal of the American Statistical Association). June, Vol. 98, No. 462.

E. Diday, R. Emilion (2003) "Maximal and stochastic Galois Lattices". Journal of Discrete Applied Mathematics. 127, 271-284.

**TUTORIAL REGISTRATION**

Mail to **slds2012@ceremade.dauphine.fr**

**FEES**

- Normal: 200€
- Student: 100€