SOFTWARE USER MANUAL
 
 
 
 

PROC Factorial Discriminant Analysis
D.M.S. University of Naples

 
 
 
 
 




1. Introduction
 
 

1.1 Aim and Objective
See scientific report.
 
 

1.2 Printout

Dissimilarities / Distances table

Classification tables

Classification Summary

Classification Ratio

Coordinates of the vertices of hypercubes that represent the objects on each factorial axis.
 
 

1.3 Parameters

All parameters are described in sections 2 and 3.

Parameters without a default value (CLASS_ID, NVARS and SELECT) are mandatory: the computation cannot run until values are assigned to them.
 
 
 
 

2. Commands

Default values are given in brackets after the names of the parameters.

When an error is found in the ".pad" file but can be resolved by applying a default value, the parser writes a WARNING to the ".log" file; otherwise it writes an ERROR and the program stops.

Parameter LIST:

--------------------------------------------------------------------------------

PROC DMS_FDA

title of the procedure

CLASS_ID Specifies the variable that identifies the a priori classes of SOs.

(NO DEFAULT VALUE is available)

NVARS Specifies the number of predictors to be used

(NO DEFAULT VALUE is available)

SELECT Specifies the predictors to be used

(NO DEFAULT VALUE is available)

SET_ID (0) Specifies the binary variable that identifies the SOs of the training set and of the test set (or the set of new SOs to be assigned).

NUMB (2) Number of factorial variables (axes) used to compute the geometrical classification rule

AXES (1,2) Specifies the factorial variables (axes) used to compute the geometrical classification rule

CLSM (1) Type of classification method

CLASSRULE (0) classification rule specification

GAMMA (50) Value of the "gamma" parameter in the Ichino - De Carvalho distance (CLASSRULE = 1)

RHO (2) Value of the parameter r in the Ichino - De Carvalho distance (CLASSRULE = 1)
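The default-handling rule described at the start of this section can be sketched in Python as follows. This is only an illustration, not the SODAS parser: the names `DEFAULTS` and `apply_defaults` and the message texts are hypothetical, and treating an omitted parameter as "an error solved by default" is an assumption. Only the parameter names and default values come from the list above.

```python
# Defaults copied from the parameter list; None marks a mandatory parameter.
DEFAULTS = {
    "CLASS_ID": None, "NVARS": None, "SELECT": None,
    "SET_ID": 0, "NUMB": 2, "AXES": (1, 2),
    "CLSM": 1, "CLASSRULE": 0, "GAMMA": 50, "RHO": 2,
}

def apply_defaults(params):
    """Return (completed_params, log_lines) for a dict of user-supplied values."""
    completed, log = dict(params), []
    for name, default in DEFAULTS.items():
        if name in completed:
            continue
        if default is None:
            # Hypothetical message: no default exists, so this is fatal.
            log.append(f"ERROR: {name} has no default value")
        else:
            completed[name] = default
            log.append(f"WARNING: {name} not given, default {default} used")
    return completed, log
```

A caller would stop as soon as any line in the returned log starts with ERROR.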
 
 

3. Parameter Details
 
CLASS_ID
Specifies which variable should be used as class identifier
* possible values : from 1 to the maximum number of variables.

* default value : NO DEFAULT
 
 

This value specifies which variable identifies the different classes. NB: this variable must be a NOMINAL variable without associated probability. It cannot be a multi-nominal variable.
 
This variable cannot be chosen in the SELECT option in the parameter file ".pad".
 
 


 
 
 
 
 
 
 
 
 
NVARS
Specifies the number of predictive variables to be used

* possible values : from 2 to (maximum number of variables - 1)

* default value : NO DEFAULT

This parameter specifies the number of variables to be used in the analysis as predictors.


 
SELECT
Specifies which variables will be used 

This parameter specifies which variables to use in the determination of the factorial discriminant axes.

These variables can be nominal, multi-nominal, multi-nominal with associated probability (modal), interval, or real.

The variables selected as CLASS_ID or SET_ID cannot be chosen here.

This software version does not accept variables containing NA or NULL values.
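The constraints on CLASS_ID, NVARS and SELECT stated above can be checked before running the procedure. The sketch below is a hypothetical helper (`check_select` is not part of the software). It also assumes that NVARS must equal the number of entries in SELECT, which the manual implies but does not state.

```python
def check_select(select, nvars, class_id, set_id, n_variables):
    """Return a list of problems; an empty list means the selection is consistent."""
    problems = []
    if len(select) != nvars:
        # Assumption: NVARS counts the entries of SELECT.
        problems.append("NVARS does not match the number of entries in SELECT")
    if not 2 <= nvars <= n_variables - 1:
        problems.append("NVARS must lie between 2 and the number of variables minus 1")
    if any(v < 1 or v > n_variables for v in select):
        problems.append("SELECT refers to a variable that does not exist")
    if class_id in select:
        problems.append("CLASS_ID cannot also appear in SELECT")
    if set_id != 0 and set_id in select:
        problems.append("SET_ID cannot also appear in SELECT")
    return problems
```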
 
SET_ID
Specifies which variable should be used as set identifier

* possible values : from 1 to the maximum number of variables.

* default value : 0

This parameter specifies which variable identifies objects belonging to TRAINING and TEST sets.

NB: The SOs identified as belonging to the "TEST set" can also be new SOs to be assigned to the a priori classes according to the classification rule.

The variable SET_ID must be a NOMINAL variable without associated probability (modes) and with only two categories.

If the value is set to 0, the algorithm uses all the SOs in the file as both training set and test set in order to validate the procedure.

This variable cannot be chosen in the SELECT option in the parameter file ".pad".
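The effect of SET_ID on the training and test sets can be sketched as follows. The function `split_sets` and the argument `training_label` are hypothetical: the manual does not say which of the two categories denotes the training set, so the caller has to supply it.

```python
def split_sets(objects, set_labels=None, training_label=None):
    """Split SO names into (training, test).

    objects: list of SO names.
    set_labels: parallel list of SET_ID categories, or None when SET_ID = 0.
    training_label: the category that marks training SOs (an assumption).
    """
    if set_labels is None:
        # SET_ID = 0: every SO belongs to both the training and the test set.
        return list(objects), list(objects)
    training = [o for o, s in zip(objects, set_labels) if s == training_label]
    test = [o for o, s in zip(objects, set_labels) if s != training_label]
    return training, test
```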


 
 
 
NUMB
Number of axes to be used in analysis and printed in output

* possible values : from 2 to the maximum number of meaningful axes.

* default value : 2

NUMB is the number of factorial variables (axes) used to compute the distances (or dissimilarities) between SOs. The scores of the SO images are written to the output file. If more axes are requested than there are meaningful factorial variables, the program automatically reduces the number and writes a WARNING message to the ".LOG" file.


 
AXES
The axes to be used

* possible values : a vector of integers (e.g. 1,2,3)

* default value : 1,2
 
 

AXES selects the factorial variables (axes) to be used when computing the distances or dissimilarities between SOs.

The factorial variables (axes) that can be extracted depend on the values of their associated eigenvalues. If the user chooses a factorial variable (axis) associated with a trivial eigenvalue (e.g. if just 5 eigenvalues are extracted in the analysis and AXES=1,23), the program stops and an ERROR is written.
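The different treatment of NUMB (reduced with a WARNING) and AXES (rejected with an ERROR) can be sketched as follows. `check_axes` is a hypothetical helper and the message texts are illustrative; `n_meaningful` stands for the number of non-trivial eigenvalues found by the analysis.

```python
def check_axes(axes, numb, n_meaningful):
    """Return (numb_used, log); raise ValueError if a requested axis is unavailable."""
    log = []
    if numb > n_meaningful:
        # Too many axes requested: reduce and warn, as described for NUMB.
        log.append(f"WARNING: NUMB reduced from {numb} to {n_meaningful}")
        numb = n_meaningful
    for axis in axes:
        if axis < 1 or axis > n_meaningful:
            # An unavailable axis is fatal, as described for AXES.
            raise ValueError(
                f"axis {axis} is not available; the maximum is {n_meaningful}"
            )
    return numb, log
```

With 5 meaningful eigenvalues, `check_axes((1, 23), 2, 5)` raises an error, matching the example above.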


 
CLSM
Selects the classification method to use

* possible values : 1, 2 or 3

* default value : 1

When computing the proximity between classes and objects, this parameter selects the classification method: single linkage (1), average linkage (2), or complete linkage (3).
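The three CLSM options correspond to the usual linkage definitions: the minimum, the mean, and the maximum of the distances between an object and the members of a class. The sketch below illustrates them with hypothetical helpers (`class_proximity`, `assign`). Assigning the object to the class with the smallest proximity is an assumption about the classification rule; the Scientific Report gives the actual definition.

```python
def class_proximity(distances, clsm=1):
    """Proximity between one object and one class.

    distances: dissimilarities between the object and each class member.
    """
    if clsm == 1:                       # single linkage: nearest member
        return min(distances)
    if clsm == 2:                       # average linkage: mean over members
        return sum(distances) / len(distances)
    if clsm == 3:                       # complete linkage: farthest member
        return max(distances)
    raise ValueError("CLSM must be 1, 2 or 3")

def assign(distances_by_class, clsm=1):
    """Return the class label with the smallest proximity (assumed rule)."""
    return min(distances_by_class,
               key=lambda label: class_proximity(distances_by_class[label], clsm))
```

The choice of linkage can change the assignment: an object very close to a single member of a class wins under single linkage but may lose under average or complete linkage.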
 
CLASSRULE
Specifies the classification rule to be used
* possible values : 0 or 1

* default value : 0

It specifies the classification rule:

With 0, the Potential Descriptor Increase dissimilarity is used.

With 1, the Ichino - De Carvalho distance is used.

Any value other than 0 or 1 is rejected.
 
 
 
GAMMA
Specifies the value of constant g

* possible values : from 0 to 100

* default value : 50
 
 

It specifies, as a percentage, the value of the constant g (gamma) in the Ichino - De Carvalho distance (see Scientific Report).

e.g. : for GAMMA = 50 the value of parameter g is set to 0.5

Note that this parameter is ignored when CLASSRULE = 0.
 
 
 
RHO
Specifies the Minkowski parameter in the Ichino - De Carvalho distance

* possible values : any integer greater than or equal to 1

* default value : 2
 
 

It specifies the Minkowski parameter in the Ichino - De Carvalho distance (see Scientific Report).

Note that this parameter is ignored when CLASSRULE = 0.
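The roles of GAMMA and RHO can be illustrated with two small helpers. This is not the Ichino - De Carvalho distance itself, which is defined in the Scientific Report. The sketch only shows the percentage conversion stated for GAMMA and a generic Minkowski aggregation of order RHO over hypothetical per-axis component values. The names `gamma_fraction` and `minkowski` are illustrative.

```python
def gamma_fraction(gamma_percent):
    """Convert GAMMA (given as a percentage) to the constant g."""
    return gamma_percent / 100.0

def minkowski(components, rho=2):
    """Combine per-axis components with a Minkowski aggregation of order rho."""
    if rho < 1:
        raise ValueError("RHO must be an integer greater than or equal to 1")
    return sum(abs(c) ** rho for c in components) ** (1.0 / rho)
```

With RHO = 1 the components are simply added in absolute value; with RHO = 2 the aggregation is Euclidean.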
 
 
 
 
 
 

4. Examples of Commands
 
 
 
 

----+----1----+----2----+----3----+----4----+----5----+----6----+---

PROC = DMS_FDA

======= TEST ANALYSIS ========

NUMB = 2

AXES = 1,2

CLASS_ID = 1

SET_ID = 5

NVARS = 6

SELECT = 3,5,12,13,16,18

CLSM = 2

CLASSRULE = 1

gamma = 30

rho = 2

----+----1----+----2----+----3----+----4----+----5----+----6----+---
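A command file in the layout shown above can be read with a few lines of Python. This is a minimal sketch and not the SODAS parser. `parse_pad` is a hypothetical name. Treating parameter names as case-insensitive is an assumption, suggested by the example using both GAMMA and gamma spellings. Any line that is not of the form KEY = value is taken to be the title.

```python
def parse_pad(text):
    """Parse 'KEY = value' lines into a dict with upper-cased keys."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("----"):
            continue                          # blank line or column ruler
        key, sep, value = line.partition("=")
        key = key.strip().upper()
        if not sep or not key.replace("_", "").isalnum():
            # Not a KEY = value line: keep the first such line as the title.
            params.setdefault("TITLE", line.strip("= "))
            continue
        value = value.strip()
        if "," in value:
            params[key] = [int(v) for v in value.split(",")]
        elif value.isdigit():
            params[key] = int(value)
        else:
            params[key] = value
    return params

example = """PROC = DMS_FDA
======= TEST ANALYSIS ========
NUMB = 2
AXES = 1,2
gamma = 30
"""
print(parse_pad(example))
```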
 
 
 
 

5. Files Required for Program Operation
 
* input file: SODAS file (.SDS)

* output files: .LST, .LOG and .coo
 
 

6. Data requirements
 
 

Data special requirements are :


 
 

7. Error List

ERROR ... too few factors to continue (less of 3)

... High correlation between variables. Not able to continue.

(This error occurs when the number of meaningful eigenvalues of the coded matrix is less than 3.)

ERROR it is not possible to use NA and NULL value in FDA.

ERROR a continuous or interval variable is constant.

ERROR You must select a nominal variable as SET_ID

ERROR You must select a nominal variable as SET_ID with only two categories

ERROR Not all the categories are in the set variable

ERROR You must select a nominal variable as identification class (CLASS_ID)

ERROR in the sodas input file (.SDS)

Not all the categories are in the training set.

Program stops.

ERROR The variable <Variable Label> does not have all the categories declared in the domain definition !!
 
 

ERROR None of the axes required in the PAD file for distance computing can be used

ERROR : It is not possible to calculate distance on the axis <axis number>

The maximum is <max axis number>

Distance not calculable!
 
 

ERROR : Cannot calculate the inverse! Singular matrix

It is possible that not all the modalities of a variable are present in the
TRAINING set of SO.