TREE : Decision Tree
1. Presentation of TREE
    The CSCI Decision Tree proposes an algorithm of tree growing applied to explicitly imprecise data. These are formally described by probabilistic assertions in the framework of symbolic data analysis. In this context, the recursive partition procedure can be viewed as an iterative search for an organized set of symbolic objects which best fits the initial data. At each step, the best split is obtained through the use of a general information measure.

    In output, we obtain a new list of symbolic objects that permit to assign future new objects to one class of the known prior partition.
     
     

2. The input
    The input of the TREE method is a classical data matrix or a symbolic data matrix.

    We take as an example the symbolic transformation of the waveform recognition data proposed L. Breiman, J.H. Friedman, R.A. Oslhen and C. J. Stone in "Classification and Regression Trees"; Belmont Eds, 1984.

    The symbolic data matrix is here composed of 30 symbolic objects described by 21 symbolic interval variables and a classification variable which is a qualitative variable. The "BASE" (see FIG1.) is the file wave30.sds.

    FIG1. Chaining example

3. The parameters definitions
    The TREE method takes also as input a list of variables and some parameters we will define now.

    3.1 List of the selected variables

The user must choose the set of the predictor variables which is: The mixed data table is not treated in this version..
    3.2 Parameter details
Seven parameters have to be defined : *possible values : 2 to number of OS.

*default value : number of OS/2

*possible values : Pure or Fuzzy

*default value : Fuzzy

If the value is Pure then the a posteriori probabilities of OS are 1 or 0. If the value is Fuzzy then a posteriori probabilities of OS are included in [0,1]. *possible values : Gini or Information or Likelihood

*default value : Gini for Pure or Likehood for Fuzzy

Remark:

If Soft Assignment is Fuzzy then Splitting criterion is Likelihood

If Soft Assignment is Pure then Splitting criterion is Gini or Information.

The construction of the decision tree is equivalent to a recursive partitionning of the representation space into subspace. During tree construction, nodes are successively split until the following stopping rule is true for all leafs of the tree. TREE use two stopping rules defined by the parameters Minimum size of no-majority class and Minimum size of right or left descendant nodes.

If the size of the leaf is smaller than Minimum size to split the node this leaf is not split.

If the size of the OS don't belong to the no-majority class is smaller than Minimum size of no-majority this leaf is not split *possible values : 1 to Minimum size to split the node/2

*default value : 1

If the size of right or left descendant nodes is smaller than Minimum size of right or left descendant nodes.then the binary split is not selected. If the value is 0. test set is empty.
    3.3Waveform example
In this, in the SODAS TREE?s parameters definition window (see FIG 2.), the user has to choose between qualitative and continuous variables and should not mix them. In the waveform example, the 21 continuous variables position1 to position21 are selected.
 
 


 
 

FIG2. The SODAS window for the selection of the predictor variables

After you must selected only one qualitative variable which represents classification variable.

FIG3. The SODAS window for the selection of the classification variable

In the waveform example (see FIG3.) we have chosen a dissimilarity with no normalization, the number of classes is 3 which mean that the DIV method will perform a partition in two classes and a partition in 3 classes, and we have chosen not to create a partition file.


 
 

FIG4. The SODAS window for the definition of the parameters





4. The output

After having defined the parameters and run the TREE method, a listing is given as output (see FIG5.)

FIG5. The chaining after running the DIV method

5. Waveform example output
    5.1 General information
      For instance, the end of the listing obtained with the waveform example is the following :
         ID_CLASS  NAME_CLASS
      
            1      wave_1 
            2      wave_2
            3      wave_3
      
        CLASS SIZE LEARNING TEST 
      
         1     10    10      0 
      
         2     10    10      0 
      
         3     10    10      0 
      
        TOTAL  30    30      0
      In this case the test set is empty and each a priori class contains 10 symbolic objects. The first table gives the correspondence between the name NAME_CLASS and the number ID_CLASS for each a priori classes.

      The aim is to build the set of binary questions that will be used in order to grow the tree. TREE will employ standard binary questions.
       

    5.2 Splitting step
      TREE aims at choosing the best binary split at a given node, among all admissible questions. It relies on the choice of an information measure that permits to calculate the degree of homogeneity of two new nodes induced by a given binary split.

      For each node we have :

        ================================= 
        | SPLIT OF A NODE       :     1 |
        ================================= 
      
        LEARNING SET 
      
           ========================================================
           |          |  N(k/t)  |   N(k)   |  P(k/t)  |  P(t/k)  |
           ========================================================
           | wave_1   |    10.00 |    10.00 |    33.33 |   100.00 |
           | wave_2   |    10.00 |    10.00 |    33.33 |   100.00 |
           | wave_3   |    10.00 |    10.00 |    33.33 |   100.00 |
           ========================================================
      TREE CRITERION  0.666667 
      
           ===========================================================
           |  Ord |   variable         |    value     |    criterion |
           ===========================================================
           |    1 |(  13) position 13  |       3.8100 |       0.3333 |
           |    2 |(   5) position  5  |       0.4400 |       0.4167 |
           |    3 |(   9) position  9  |       3.5200 |       0.4167 |
           |    4 |(   7) position  7  |       2.1100 |       0.4284 |
           |    5 |(   8) position  8  |       2.8600 |       0.4444 |
           ===========================================================
      The variables "position 5", "position 9","position 7" and "position 8" are alternative variables of the variable "position 13".
       SPLITTING NODE:    1 
      
       VARIABLE      : (  13) position 13 
       SPLIT         : 3.810000 
       CRITERION     : 0.333333
      The first division has been performed according to the variable "position 13" and to the cut value 3.81. The first binary question is [ position 13 <= 3.81]? The objects in node 2 have answered "yes" to this question. They correspond to the 20 objects ( 10 belong the a priori class wave_1 and 10 belong wave_2. The 10 other objects have answered "no" to this question and correspond to the objects of the node 3 .
            LEARNING SET
      
           ===================================================
           |          | left node  | right node | Row totals |
           |  node    |         2  |       3    |       1    |
           ===================================================
           | wave_1   |     10.00  |      0.00  |     10.00  |
           | wave_2   |     10.00  |      0.00  |     10.00  |
           | wave_3   |      0.00  |     10.00  |     10.00  |
           ===================================================
           |  Total   |     20.00  |     10.00  |     30.00  |
           ===================================================
      Each node has to be tested in order to decide if it should be or not a terminal node (= a leaf). Classical tests will be used for this phase: a node will be considered terminal if its size is less than a given parameter (fixed by the user), or if it contains objects belonging all to the same class of the prior partition (the node is called pure). The node 3 is a terminal node.
        ================================= 
        | SPLIT OF A NODE       :     3 |
        ================================= 
      
        LEARNING SET 
      
           ========================================================
           |          |  N(k/t)  |   N(k)   |  P(k/t)  |  P(t/k)  |
           ========================================================
           | wave_1   |     0.00 |    10.00 |     0.00 |     0.00 |
           | wave_2   |     0.00 |    10.00 |     0.00 |     0.00 |
           | wave_3   |    10.00 |    10.00 |   100.00 |   100.00 |
           ========================================================
      
       THIS STOP-SPLITTING RULE IS TRUE : The size of the no-majority classes is too small 
            SIZE OF THE NO-MAJORITY CLASSES 0.000000  VALUE OF STOP-SPLITTING RULE 2.000000 
      
        THIS NODE IS A TERMINAL NODE
      We have the list of objects assigned to the terminal node 3.
       List of objects :
      
       ( 3)"wave_3_0"     ( 3)"wave_3_6"     ( 3)"wave_3_1"     ( 3)"wave_3_7"     
       ( 3)"wave_3_3"     ( 3)"wave_3_4"     ( 3)"wave_3_5"     ( 3)"wave_3_2" 
       ( 3)"wave_3_8"     ( 3)"wave_3_9"
    5.3 More details on each leaf or terminal node
      Each leaf or terminal node are described by the following tables:
       LEAF    :    4 
       ==============
       
      
           ========================================================
           |          |  N(k/t)  |   N(k)   |  P(k/t)  |  P(t/k)  |
           ========================================================
           | wave_1   |    10.00 |    10.00 |    90.91 |   100.00 |
           | wave_2   |     1.00 |    10.00 |     9.09 |    10.00 |
           | wave_3   |     0.00 |    10.00 |     0.00 |     0.00 |
           ========================================================
       RULE :
          IF    [ position  9 <= 3.800000] IS TRUE 
         AND    [ position 13 <= 3.810000] IS TRUE 
      
           THEN ASSIGN_CLASS IS wave_1
      
       r(t)= 0.090909 p(t)= 0.366667 
       R(t)= 0.033333
       List of objects :
      
       ( 1)"wave_1_3"     ( 1)"wave_1_1"     ( 1)"wave_1_6"     ( 1)"wave_1_9"     
       ( 1)"wave_1_5"     ( 1)"wave_1_7"     ( 1)"wave_1_2"     ( 1)"wave_1_8"   
       ( 1)"wave_1_0"     ( 1)"wave_1_4"     ( 2)"wave_2_5"
    5.4 Final result

    It contains the main following information?s :

       RESULTS BY SYMBOLIC OBJECT 
      
           ==============================================
           | No |    Nom       |Leaf    |  Class        |
           |    |              |  No    | true  | assig.|
           ==============================================
           |  1 | "wave_1_3"   |      4 |     1 |  1    |
           |  2 | "wave_1_1"   |      4 |     1 |  1    |
           |  . |   ...        |      . |     . |  .    |
           | 13 | "wave_2_7"   |      5 |     2 |  2    |
           | 14 | "wave_2_5"   |      4 |     2 |  1 (*)|
           | 15 | "wave_2_4"   |      5 |     2 |  2    |
           |  . |   ...        |      . |     . |  .    |
           | 29 | "wave_3_8"   |      3 |     3 |  3    |
           | 30 | "wave_3_9"   |      3 |     3 |  3    |
           ==============================================
      For each object we have the code of the terminal node or leaf which is assigned, the a priori class or true class and the assign class. If the mark (*) exists then the object is misclassify.
       R(T)=   0.0333
      R(T) is equal to the sum of R(t) for all leaf of the tree T. R(T) represents the resubstitution estimate of the misclassification cost of the tree T.

      On the test set the value R(T) gives an estimate of the misclassification cost.

           CONFUSION MATRIX FOR TRAINNING SET
      
           ===========================================================
           |         | wave_1    | wave_2    | wave_3    |  Total    |
           ===========================================================
           | wave_1  |       10  |        0  |        0  |       10  |
           | wave_2  |        1  |        9  |        0  |       10  |
           | wave_3  |        0  |        0  |       10  |       10  |
           ===========================================================
           | Total   |       11  |        9  |       10  |       30  |
           ===========================================================
        MISCLASSIFICATION RATE BY CLASS
      
           TRUE CLASS            (  ERROR /SIZE    )   FREQUENCY
           wave_1                (      0 /     10 )       0.00
           wave_2                (      1 /     10 )      10.00
           wave_3                (      0 /     10 )       0.00
      
           TOTAL                 (      1 /     30 )       3.33
      The value 1 represents the number of objects of a priori class wave_2 which are assigned to the assign class wave_2. The assign class wave_2 is build by the rules of the decision tree.
        MISCLASSIFICATION RATE BY CLASS
      
           TRUE CLASS            (  ERROR /SIZE    )   FREQUENCY
           wave_1                (      0 /     10 )       0.00
           wave_2                (      1 /     10 )      10.00
           wave_3                (      0 /     10 )       0.00
      
           TOTAL                 (      1 /     30 )       3.33
      The resubstitution global error rate is 3.33 %.
         
         ==================================
         | EDITION OF DECISION TREE       |
         ==================================
      
      
       
                 +----  [ 4 ]wave_1  (    10.00      1.00      0.00 )
                 !   
            !----2[ position  9 <= 3.800000] 
            !    ! 
            !    +----  [ 5 ]wave_2  (     0.00      9.00      0.00 )
            ! 
       !----1[ position 13 <= 3.810000] 
            ! 
            +----  [ 3 ]wave_3  (     0.00      0.00     10.00 )
      If the answers of the binary question[ position 13 <= 3.810000]? is false then the object is assigned to a priori class wave_3 else the result depends of the reponse of the answers position 9 <= 3.800000]?. If the reponse is false then the object is assigned to a priori class wave_2 else the object is assigned to a priori class wave_1 and the vector (10.00, 1.00, 0.00) gives the number of objects of each a priori class assigned to leaf 4 ( 10 is the number of objects of a priori class 1 "wave_1").
    6. Bibliography
[1] Ciampi A. et al. (1996 ): Recursive Partition with probabilistically imprecise data. Ordinal and Symbolic Data Analysis. Springer Verlag, E. Diday et al. (eds.), pp. 201-212.

[2] E. Périnel (1996) Segmentation et Analyse des Données Symboliques. Application à des données probabilistes imprécises. Thèse de doctorat de l'Université Paris IX Dauphine.

[3] E. Périnel, A. Ciampi. (1997). Growing a probabilistic decision tree, AMSDA 1997 Naples, Italy.

[4] E. Périnel (1998) Construire un arbre de discrimination binaire à partir de données imprécises.RSA.