Françoise and Jacques GOUPIL
LISE-CEREMADE
University Paris IX Dauphine 75775 Paris France
e-mail: goupil@ceremade.dauphine.fr
Abstract of the methods, input and output
Running STAT
Using the listing
Using the graph
Error messages
Referenced documents
STAT extends to symbolic objects, represented by their description,
several "elementary statistics" methods usually limited to conventional
data.
It is a component of the SODAS software package, and as such is designed to run under the SODAS workbench and process SODAS data bases.
The relevant methods depend on the types of variables found in the SODAS base selected, and are filtered accordingly by the workbench:
1. relative frequencies for multinominal variables
2. relative frequencies for interval variables
3. capacities and min/max/mean for probabilistic multinominal variables
4. biplot for interval variables
Identification of the central object is also covered; it does not depend on specific variable types.
The output of the selected methods can be viewed in two ways, as a listing or as a graph, each opened via its dedicated icon in the workbench.
The graph can interactively be changed and customized (figures, shapes, colors, texts, comments, ...) and can be copied and saved.
For methods a), b) and c) below, the input is always an array with one line per symbolic object (individual) and one column per variable.
a) Relative frequencies for multinominal variables.
This method computes the relative frequency of each modality
of the multinominal variable, taking into account the rules given in
the base. The graph associated with the variable distribution can be either
a bar chart or a pie chart.
In the following example the individuals are species of mushrooms described by several variables such as cap presence, cap shape, etc., and either kind of graph can be associated with them.
INDIVIDUALS :
1 S1
2 S2
3 S3
4 S4
VARIABLES :
1 Cap presence
1 present
2 absent
2 Cap shape
1 square
2 round
3 triangular
4 not applicable
3 Cap color
1 red
2 green
3 white
4 black
5 yellow
6 not applicable
.....
MATRIX :
Var 1 Var 2 Var 3 .....
Ind 1 1 1,3 3,4
Ind 2 1 1,3 2,4
Ind 3 1 1,3 2,3
Ind 4 2 4 6
RULES :
1=2 --> 2=4
1=2 --> 3=6
3=4 --> 2=3
3=4 --> 5=1,3
3=3 --> 5=3
The first rule means that
if the cap is absent (value of var 1 = 2), the value of var 2 (cap shape)
is not applicable.
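As a rough illustration of the relative-frequency computation on the matrix above, here is a minimal Python sketch. It assumes that an object described by k modalities contributes 1/k to each of them, and it ignores the rules; both are assumptions of this sketch, not a description of the algorithm STAT actually uses. The function name `relative_frequencies` is hypothetical.

```python
from collections import Counter


def relative_frequencies(column):
    """Relative frequency of each modality in one matrix column.

    column: list of sets of modality codes, one set per symbolic object.
    Assumption of this sketch: an object with k modalities gives 1/k to each.
    Rules from the base are NOT applied here.
    """
    totals = Counter()
    for modalities in column:
        share = 1.0 / len(modalities)
        for m in modalities:
            totals[m] += share
    n = len(column)
    return {m: totals[m] / n for m in sorted(totals)}


# Columns "Var 1" (Cap presence) and "Var 2" (Cap shape) of the mushroom matrix.
cap_presence = [{1}, {1}, {1}, {2}]
cap_shape = [{1, 3}, {1, 3}, {1, 3}, {4}]

print(relative_frequencies(cap_presence))
print(relative_frequencies(cap_shape))
```

Under these assumptions the modality `present` of Cap presence gets 0.75, which agrees with the value 0.7500 shown in the listing sample later in this document.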
b) Relative frequencies for interval variables.
In the following example, individuals are wine tasters and variables are wine "chateaux". Each matrix cell holds the interval of the ratings given by a taster to a wine.
INDIVIDUALS :
1 AC
2 BY
3 CG
4 CQ
......
VARIABLES :
1 Ausone
2 Cheval Blanc
3 Cos d'Estournel
4 Ducru-Beaucaillou
5 Haut-Brion
6 L'Evangile
7 Lafite-Rothschild
8 Lafleur
............
MATRIX :
Var 1 Var 2 ...... Var 7 ......
Ind 1 56:74 75:92 ...... 64:82 ......
Ind 2 83:85 89:94 ...... 81:92 ......
Ind 3 84:90 86:92 ...... 87:90 ......
Ind 4 80:91 85:93 ...... 85:91 ......
c) Capacities and min/max/mean for probabilistic multinominal Variables.
The values of a probabilistic multinominal variable V on symbolic objects
SO1, SO2, ..., SOn are for example:
SO1 --> p11 M1, p12 M2, p13 M3 with p11+p12+p13=1
SO2 --> p21 M1, p22 M2, p23 M3 with p21+p22+p23=1
.
.
SOn --> pn1 M1, pn2 M2, pn3 M3 with pn1+pn2+pn3=1
In a capacity histogram, the capacity of a modality is the union capacity. Thus the capacity of (SO1 and SO2) for M1 is p11 + p21 - p11 * p21, and the capacity of SO1, SO2, ..., SOn for M1 is obtained by applying this operation repeatedly, which is possible because it is associative.
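The union capacity can be sketched in a few lines of Python. The pairwise formula p + q - p * q is the one given above; folding it over all objects relies on its associativity. The function names and the sample probabilities are hypothetical, and the closed form 1 - (1 - p1)(1 - p2)...(1 - pn) is included only as a cross-check of the fold.

```python
from functools import reduce
from math import isclose, prod


def union_capacity(p, q):
    """Capacity of two objects for one modality: p + q - p*q."""
    return p + q - p * q


def capacity(probabilities):
    """Fold the pairwise formula over all objects (0.0 is the neutral element)."""
    return reduce(union_capacity, probabilities, 0.0)


# Hypothetical probabilities of modality M1 on three symbolic objects.
p_m1 = [0.2, 0.5, 0.1]

folded = capacity(p_m1)
closed_form = 1.0 - prod(1.0 - p for p in p_m1)

print(folded, closed_form)  # both are approximately 0.64
print(isclose(folded, closed_form))
```

Because the operation is associative and commutative, the order in which the objects are combined does not change the result.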
A min/mean/max graph associates with each modality a kind of boxplot that represents the range and the mean of the probabilities of that modality.
The min value associated with M1 is the minimum of p11, p21, ..., pn1.
The mean value associated with M1 is the average of p11, p21, ..., pn1.
The max value associated with M1 is the maximum of p11, p21, ..., pn1.
In the following example, individuals are areas of England and the variables deal with household equipment. Variable number 9 is a probabilistic multinominal variable whose modalities are 1, 2, 3, 4. The matrix gives the probability distribution of the modalities for each area.
INDIVIDUALS :
1 Northern metropolitan
2 North non-metropolitan
3 Yorks and humberside metropoli
4 Yorks and humberside non-metro
5 East midlands non-metropolitan
6 North west metropolitan
....
VARIABLES :
....
2 Central heating in property
....
....
9 QWEtelephone
1 AJ01 v=0
2 AJ02 v1-5
3 AJ03 v6-10
4 AJ04 v>10
....
MATRIX :
..... Var 9 .....
Ind 1 1(0.140), 2(0.640), 3(0.166), 4(0.053)
Ind 2 1(0.126), 2(0.561), 3(0.234), 4(0.076)
Ind 3 1(0.116), 2(0.576), 3(0.220), 4(0.087)
Ind 4 1(0.095), 2(0.600), 3(0.235), 4(0.070)
Ind 5 1(0.114), 2(0.590), 3(0.215), 4(0.079)
Ind 6 1(0.141), 2(0.541), 3(0.227), 4(0.089)
....
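The min/mean/max values described above can be reproduced on the six areas listed in this matrix with a short Python sketch. The function name `min_mean_max` and the dictionary layout are assumptions made for illustration; only the six rows shown above are used, not the full base.

```python
from statistics import mean


def min_mean_max(rows):
    """Return {modality: (min, mean, max)} over all objects.

    rows: list of dicts {modality code: probability}, one dict per object.
    """
    summary = {}
    for m in sorted(rows[0]):
        values = [row[m] for row in rows]
        summary[m] = (min(values), mean(values), max(values))
    return summary


# Variable 9 for Ind 1 to Ind 6, copied from the matrix above.
telephone = [
    {1: 0.140, 2: 0.640, 3: 0.166, 4: 0.053},
    {1: 0.126, 2: 0.561, 3: 0.234, 4: 0.076},
    {1: 0.116, 2: 0.576, 3: 0.220, 4: 0.087},
    {1: 0.095, 2: 0.600, 3: 0.235, 4: 0.070},
    {1: 0.114, 2: 0.590, 3: 0.215, 4: 0.079},
    {1: 0.141, 2: 0.541, 3: 0.227, 4: 0.089},
]

for modality, (lo, avg, hi) in min_mean_max(telephone).items():
    print(f"modality {modality}: min={lo:.3f} mean={avg:.3f} max={hi:.3f}")
```

For modality 1, these six rows give a minimum of 0.095, a mean of 0.122 and a maximum of 0.141.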
d) Biplot for interval variables.
This graph represents each symbolic object of the array as a rectangle in the plane of two variables chosen by the user. Each side of the rectangle spans the range of the corresponding axis variable on that symbolic object.
In the following example, individuals are breeds of dogs and the variables chosen for the biplot are the weight and the height at the withers.
INDIVIDUALS :
1 Caniche
2 Chihuahua
3 Pekinois
.....
9 Mastiff
.....
13 SaiBer
VARIABLES :
1 Hauteur du garrot (height at the withers in cm)
2 Poids (weight in kg)
........
MATRIX :
Var 1 Var 2 .....
Ind 1 20:35 15:25
Ind 2 16:20 0.9:3.5
Ind 3 20:25 3:5
.....
Ind 9 75:75 100:100
.....
Ind 13 70:70 55:80
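To make the geometry of the biplot concrete, the following Python sketch turns two interval cells into the rectangle that would represent one object. It assumes the `low:high` cell notation of the matrix above and places Var 1 on the horizontal axis and Var 2 on the vertical axis; the helper names are hypothetical and nothing here reflects how STAT itself draws the boxes.

```python
def parse_interval(cell):
    """Parse a matrix cell such as '20:35' into (low, high) floats."""
    low, high = (float(part) for part in cell.split(":"))
    return low, high


def rectangle(x_cell, y_cell):
    """Rectangle of one symbolic object in the plane of two interval variables."""
    x0, x1 = parse_interval(x_cell)
    y0, y1 = parse_interval(y_cell)
    return {"x": x0, "y": y0, "width": x1 - x0, "height": y1 - y0}


# Three rows of the dog matrix above: (Var 1, Var 2).
dogs = {
    "Caniche": ("20:35", "15:25"),
    "Chihuahua": ("16:20", "0.9:3.5"),
    "Mastiff": ("75:75", "100:100"),
}

for name, (var1, var2) in dogs.items():
    print(name, rectangle(var1, var2))
```

An object whose two intervals are both reduced to a single value, such as Mastiff here, yields a rectangle of zero width and height, that is, a point.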
STAT comes as part of the SODAS package and does not need any specific
installation task.
4.1. Insert the STAT method in the chaining
N.B.: The term "method" may be confusing: although STAT incorporates various methods statistically speaking, it is seen as a single "method" by the workbench, which manages it as a whole.
From there on, the term
method will apply to each method inside STAT.
4.2. Parametrize STAT
This is done in two steps, involving two dialog boxes:
The workbench prompts for
one or several methods, depending on the types of variables.
4.3. Execute STAT
This is done within the chaining execution managed by the workbench.
4.4. Look at the output
After execution, the listing icon and the graph icon will show up next to the STAT icon. There are two exceptions:
The listing contains all the outcome from the method execution,
whereas it generally takes several graphs to scan all the results (roughly,
as many graphs as variables).
A header indicates the SODAS base used, the method selected and the date and time of execution, as shown below:
--------------------------------------------------------------------
SODAS - STAT RELATIVE FREQUENCIES (MODAL) Nov 01 1999 18:36
File: MUSHROO3.SDS
Title: Mushrooms
--------------------------------------------------------------------
Cap presence
pres present 0.7500
.........
The header is followed by detailed results which depend on the method selected.
Necessary identification is ensured by copying titles, labels and short IDs from the SODAS base.
N.B.: The biplot is not a method in itself, since there is no computation and the only goal is to display a graph of the objects (as boxes); the listing is then irrelevant, for lack of results. Nevertheless, a listing is created in order to keep track of what has been done and to list the objects involved, for information.
Conversely, there is no graph for the central object, so that the listing is the only way to get the results.
The graph is driven by a menu, which makes most functions self-explanatory. Moreover, some functions are the usual ones, such as Saving, Copying, Printing...
The menu items are managed dynamically so that, according to the context, non-applicable items are disabled and the labels of some items may change.
For ease of use, menu items likely to be used repeatedly are mirrored in toolbar buttons (e.g. the graph refresh button), as are those commonly found in toolbars (e.g. Save, Copy, ...).
This section only details the items that are specific to STAT.
6.1 Menu
Since the menu is divided into groups of logically near functions, this description follows the menu sequence.
1. File
A file selection dialog box pops up. The type of file is .GRF.
Recalling a graph implies leaving the current one; a message box asks the user whether to save it.
The recalled graph keeps the same look and same editing and handling facilities as the original one, and can in turn be saved after editing.
The file name is kept from the SODAS file name, with the extension .GRF.
To save under another name, you may use the Save As option below.
This file may later be recalled with the option Open (see above), and reused as if the graph had not been left meanwhile.
a. the internal STAT format .GRF, like Save above
b. the standard Windows bitmap format .BMP
A file selection dialog box pops up. Proper formats are
enforced by the program.
STAT supports both portrait and landscape layouts.
2. Edit
As usual.
3. View
As usual.
4. Process
Only the variables previously selected in the workbench are available.
This is a context sensitive item.
In addition, clicking on an object in that list displays its values for the applicable variables.
Lets you select which objects are displayed on the graph (by default, all objects are displayed).
Lets you look at the listing from within the graph, without having to go back to the workbench.
5. Draw
This is a context sensitive item.
This is a context sensitive item.
This is a context sensitive item.
This is a context sensitive item.
As a result, if the leftmost horizontal limit is positive, the origin will not be in the graph.
This item then enforces inclusion of the origin.
This is a context sensitive item.
This item then enforces extension of the vertical range up to 1.
This is a context sensitive item.
Using the short ID improves graph readability in case of rather long labels.
By default, the short ID is selected.
This is a context sensitive item.
By default, the labels are displayed, and start just above the top left corner of the boxes.
This is a context sensitive item.
A dialog box shows up, with an editing zone and a pushbutton for getting access to the Windows font and text color selection dialogs.
When returning from the dialog, the text is displayed.
The text can be moved, changed and deleted, all through mouse actions (see relevant section later in this document).
Note that almost any text in the graph, whether inserted by the program or by the user, may be handled that way; the only exception is the scale markers, which are static.
An alternate and quicker way is to use the associated toolbar button or, even quicker, the Space key.
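The two range options of the Draw menu described above (forcing the origin into the graph, and extending the vertical range up to 1) can be pictured with the following Python sketch. It assumes that an axis range is otherwise fitted to the data; the function `axis_limits` and its parameters are hypothetical and are not taken from STAT.

```python
def axis_limits(lows, highs, include_origin=False, extend_to_one=False):
    """Axis range fitted to the data, with two optional adjustments.

    include_origin: widen the range so that 0 is inside it.
    extend_to_one:  raise the upper limit to at least 1 (useful for frequencies).
    """
    lo, hi = min(lows), max(highs)
    if include_origin:
        lo, hi = min(lo, 0.0), max(hi, 0.0)
    if extend_to_one:
        hi = max(hi, 1.0)
    return lo, hi


# Horizontal axis of a biplot whose smallest lower bound is positive.
print(axis_limits([20, 16, 20], [35, 20, 25]))
print(axis_limits([20, 16, 20], [35, 20, 25], include_origin=True))

# Vertical axis of a frequency chart whose largest bar is 0.75.
print(axis_limits([0.25], [0.75], include_origin=True, extend_to_one=True))
```

Both adjustments only widen the fitted range; neither can hide data that the default range would show.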
6. Help
See details in the following section.
6.2 Mouse and keyboard
Help on the mouse and keyboard actions can be displayed on-line at any time by selecting the relevant item in the Help popup menu.
One of the following windows is displayed, depending on whether the current graph is a biplot or not.
The way STAT reports errors depends on which main phase it is currently
in:
1. In the method execution phase, it reports the errors in the LOG file, according to the workbench requirements.
This file can be accessed via the listing icon, which in case of error reporting is red-crossed and directs to the LOG file instead of the listing file.
2. In the graphics phase, it can no longer report errors in the LOG file: the LOG replaces the listing file when method execution fails and is not a complement to it, so by this stage it is too late to write to it.
It then reports errors by way of message boxes.
J.L. Blanchard, H. Augendre - "Introduction de méthodes symboliques dans un logiciel statistique classique" - EDF-DER (1993).
A. Chouakria, P. Cazes, E. Diday - "Codage de variables intervalles en vue d'une analyse factorielle des correspondances multiples" - Journées ASU, Carcassonne (1997).
E. Diday, R. Emilion - "Treillis et capacités en analyse d'objets probabilistes" (1996).