TUTORIAL 

 

Symbolic Data Analysis of Complex Data

 

Edwin Diday

CEREMADE, Paris Dauphine University
Place du M. Lattre de Tassigny 75775 Paris, France
diday@ceremade.dauphine.fr

Important Dates

 

Paper Submission: February 6, 2012 / March 9, 2012

Paper Notification of Acceptance: March 1, 2012 / March 31, 2012

Poster & Demo Submission: February 15, 2012 / March 15, 2012

Poster & Demo Notification of Acceptance: March 8, 2012 / March 31, 2012

Standard Registration Deadline: March 19, 2012 / April 4, 2012 / April 9, 2012

 

May 7, 8 & 9, 2012

 

 

Supporter

Associazione Italiana per l'Intelligenza Artificiale

Abstract

"Complex data" have been defined as follows: "in contrast to the typical tabular data, complex data can consist of heterogeneous data types, can come from different sources, or live in high dimensional spaces. All these specificities call for new data mining strategies". Sometimes "complex data" refers to complex objects such as images, video, audio or text documents. Sometimes it refers to distributed or structured data or, more specifically, to spatio-temporal data or heterogeneous data that mix several kinds of data, for example a medical patient described by images, text documents and socio-demographic information. In practice, complex data are more or less based on several kinds of observations described by standard numerical and/or categorical data contained in several related data tables. In this talk our aim is to show that extracting new knowledge from complex data requires the use of “symbolic data”, which are an extension of standard numerical or categorical data. The usual data mining model has two parts: the first concerns the observations, and the second contains their description by several standard variables, numerical or categorical.

The Symbolic Data Analysis (SDA) model (see Billard and Diday (2006), Diday and Noirhomme (2008)) needs two more parts: the first concerns classes of observations, called concepts, and the second concerns their description by symbolic data. A concept is characterized by a set of properties called its intent and by an extent, defined as the set of observations that satisfy these properties. Concepts are described by symbolic data, which include standard categorical or numerical data but also intervals, histograms, sequences of weighted values and the like, in order to account for the variation within their extent. These new kinds of data are called symbolic because they cannot be manipulated as numbers. Based on this model, new knowledge can be extracted by data mining tools extended to concepts, considered as new kinds of observations.
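As an illustration of how observations can be aggregated into concepts described by symbolic data, here is a minimal Python sketch. It is not taken from the SODAS software or from the cited books: the observations, the variable names (town, age, job) and the helper build_symbolic_table are hypothetical. Each concept is a class of observations (its extent), its numerical variable is summarized by an interval and its categorical variable by a histogram of relative frequencies.

```python
from collections import Counter, defaultdict

# Hypothetical observations (illustrative values only): each row is one
# individual described by standard numerical and categorical variables.
observations = [
    {"town": "Paris", "age": 34, "job": "engineer"},
    {"town": "Paris", "age": 51, "job": "teacher"},
    {"town": "Paris", "age": 27, "job": "engineer"},
    {"town": "Lyon", "age": 45, "job": "teacher"},
    {"town": "Lyon", "age": 39, "job": "teacher"},
]


def build_symbolic_table(rows, concept_key, numeric_var, categorical_var):
    """Aggregate observations into concepts described by symbolic data.

    The extent of a concept is the set of rows sharing the same value of
    `concept_key`. The numerical variable becomes an interval (min, max)
    and the categorical variable becomes a relative-frequency histogram.
    """
    extents = defaultdict(list)
    for row in rows:
        extents[row[concept_key]].append(row)

    table = {}
    for concept, extent in extents.items():
        values = [r[numeric_var] for r in extent]
        counts = Counter(r[categorical_var] for r in extent)
        n = len(extent)
        table[concept] = {
            "extent_size": n,
            numeric_var: (min(values), max(values)),
            categorical_var: {cat: c / n for cat, c in counts.items()},
        }
    return table


symbolic = build_symbolic_table(observations, "town", "age", "job")
for concept, description in symbolic.items():
    print(concept, description)
```

In this sketch the intent of each concept is simply the property "town equals a given value". The resulting rows (one per town) are the new units of analysis, and their interval and histogram cells keep the variation that a single mean or mode would lose.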

In this talk we try to answer the following questions:

What are Complex Data?

What are “symbolic data”?

How are “Symbolic Data” built?

Are Symbolic Data Complex Data?

In what sense are Complex Data Symbolic Data?

What is “Symbolic Data Analysis”?

In what sense are Conceptual Lattices the underlying structure of Symbolic Data?

The talk is illustrated by several industrial applications, including text mining of telephone calls in order to discover “themes”, and the analysis of financial data in order to characterise the best stocks and their typical trajectories by using an extension of standard PCA to metabins. Finally, we indicate open directions of research and show that SDA provides a framework for extracting new knowledge from Complex Data.

 

 

References

Recent books
E. Diday, M. Noirhomme (eds and co-authors) (2008) “Symbolic Data Analysis and the SODAS Software”. 457 pages. Wiley. ISBN 978-0-470-01883-5

L. Billard, E. Diday (2006) “Symbolic Data Analysis: Conceptual Statistics and Data Mining”. 321 pages. Wiley Series in Computational Statistics. Wiley. ISBN 0-470-09016-2

E. Diday (2005) “Categorization in Symbolic Data Analysis”. In Handbook of Categorization in Cognitive Science. Edited by H. Cohen and C. Lefebvre. Elsevier.
http://books.elsevier.com/elsevier/?isbn=0080446124

International Journal papers

Edwin Diday (2008) “Spatial classification”. DAM (Discrete Applied Mathematics) Volume 156, Issue 8, Pages 1271-1294.
L. Billard, E. Diday (2003) “From the Statistics of Data to the Statistics of Knowledge: Symbolic Data Analysis”. JASA (Journal of the American Statistical Association). June, Vol. 98, No. 462.
E. Diday, R. Emilion (2003) “Maximal and stochastic Galois Lattices”. Journal of Discrete Applied Mathematics. 127, 271-284.

 

TUTORIAL REGISTRATION

Mail to slds2012@ceremade.dauphine.fr

FEES

 


 
