...

Nessun titolo diapositiva - Dipartimento di Informatica

by user

on
Category: Documents
11

views

Report

Comments

Transcript

Nessun titolo diapositiva - Dipartimento di Informatica
DISSIMILARITIES AND MATCHING
BETWEEN SYMBOLIC OBJECTS
Prof. Donato Malerba
Department of Informatics,
University of Bari, Italy
[email protected]
ASSO School
Porto, Portugal
May 13-15, 2002
COMPUTING DISSIMILARITIES:
Fare clic per modificare
lo
WHY?
stile del titolo dello schema
• Several data analysis techniques are based on
quantifying
a dissimilarity
measure
Fare
clic per
modificare(or
glisimilarity)
stili del testo
between
multivariate data.
dello schema
• Clustering
Secondo
livello
• Discriminant analysis
Terzo livello
• Visualization-based
approaches
• Quarto livello
• Symbolic –objects
are
a
kind
of
multivariate
data.
Quinto livello
•
Ex.: [colour={red, black}][weight  {60,70,80}][height 
[]1.50,1.60]
• The dissimilarity measures presented here are among
those investigated in the SODAS Project.
2
BOOLEAN SYMBOLIC OBJECTS
Fare clic per modificare
lo
(BSO’S)
stile del titolo dello schema
A BSO is a conjunction of boolean elementary
events: clic per modificare gli stili del testo
Fare
[Ydello
 [Y2=A2]  ...  [Yp=Ap]
1=A1] schema
where
each variable
Secondo
livello Yi takes values in Yi and Ai is
a subset of Yi
Terzo livello
Let a and• Quarto
b be livello
two BSO’s:
a = [Y1=A1]– Quinto
 [Y2=A
livello
2]  ...  [Yp=Ap]
b = [Y1=B1]  [Y2=B2]  ...  [Yp=Bp]
where each variable Yj takes values in Yj and Aj
and Bj are subsets of Yj. We are interested to
compute the dissimilarity d(a,b).
7
Fare clic
per
modificare
lo
CONSTRAINED BSO’S
stile del titolo dello schema
Two types of dependencies between variables:
• Hierarchical
mother-daughter
Fare
clic perdependence
modificare (gli
stili del testo): A
variable
i may be inapplicable if another variable Yj takes
dello Yschema
its values in a subset Sj  Yj. This dependence is
Secondo livello
expressed as a rule:
Terzo livello
if [Yj = Sj] then [Yi = NA]
• Quarto livello
• Logical dependence
: This case occurs, if a subset
– Quinto livello
Sj  Yj of a variable Yj is related to a subset Si  Yi of a
variable Yi by a rule such as:
if [Yj = Sj] then [Yi = Si]
8
DISSIMILARITY AND SIMILARITY
Fare clic perMEASURES
modificare lo
stile del titolo dello schema
Dissimilarity Measure
* = d(a,a)  d(a,b)
Fare
gli stili
del<testo
d: EERclic
suchper
that dmodificare
= d(b,a)
 a,bE
a
dello schemaSimilarity Measure
s: EE
 R such that
s*a = s(a,a)  s(a,b) = s(b,a)  0  a,bE
Secondo
livello
Generally:
Terzo livello
 a  E: d*•a Quarto
= d* and
s*a= s* and specifically, d* = 0 while s*= 1
livello
– Quinto
livello can be transformed into
Dissimilarity
measures
similarity measures (and viceversa):
d=(s)
( s=-1(d) )
where:
(s) strictly decreasing function, and (1) = 0, (0) = 
9
DISSIMILARITY AND SIMILARITY
Fare clic
per
modificare
lo
MEASURES: PROPERTIES
stile del titolo dello schema
Some properties that a dissimilarity measure d on E may
satisfy
Fareare:
clic per modificare gli stili del testo
1. dello
d(a, b) = schema
0   c  E: d(a, c) = d(b, c)
(eveness)
2. d(a,
b) = 0  a =livello
b
Secondo
(definiteness)
3. d(a, b)
 d(a, c) livello
+ d(c, b)
Terzo
(triangle inequality)
• Quartoc),
livello
4. d(a, b)  max(d(a,
d(c, b))
– Quinto livello
(ultrametric inequality )
5. d(a, b) + d(c, d)  max(d(a, c) + d(b, d), d(a, d) +d(b, c)) (Buneman's inequality)
6. Let (E, +) be a group, then d(a, b) = d(a+c, b+c) (translation invariance )
 A dissimilarity function that satisfies proprieties 2 and 3 is called metric.
 A dissimilarity function that satisfies only property 3 is called pseudo
metric or semi- distance.
10
DISSIMILARITY MEASURES
Fare clic BETWEEN
per modificare
lo
BSO’S
stile del titolo dello schema
Author(s) (Year)  Notation from the SODAS Package
Fare clic per modificare gli stili del testo
• Gowda & Diday (1991)  U_1
dello
schema
• Ichino & Yaguchi (1994)  U_2, U_3, U_4
Secondo
livello SO_1, SO_2
• De
Carvalho (1994)
Terzo livello
• De Carvalho
(1996, 1998)  SO_3, SO_4, SO_5, C_1
• Quarto livello
– Quinto livello
U: only for unconstrained BSO’s
C: only for constrained BSO’s
SO: for both constrained and unconstrained BSO’s
11
GOWDA & DIDAY’S DISSIMILARITY
Fare clic perMEASURE
modificare lo
stile del titolo dello schema
Gowda & Diday’s dissimilarity measures for two BSO’s a and b:
p
Fare
clic
per
modificare
gli
stili
del
testo
U_1
D
(
A
,
B
)
D(a, b) = 
j
j
j 1
dello schema
If Yj is a continuous variable:
Secondo
livello
D(Aj, Bj) = D(Aj, Bj) + Ds(Aj, Bj) + Dc(Aj, Bj)
Terzo livello
while if Yj is a nominal variable:
• Quarto livello
D(Aj, Bj) = Ds(A
Dc(Aj, Bj)
j, Bj) +
– Quinto
livello
where the components are defined so that their values are
normalized between 0 and 1:
Aj
Bj
• D(Aj, Bj) due to position,
• Ds(Aj, Bj) due to span,
D
Ds
Dc 12
• Dc(Aj, Bj) due to content
GOWDA & DIDAY’S DISSIMILARITY
Fare clic perMEASURE
modificare lo
stile del titolo dello schema
Properties:
Fare
clic per modificare gli stili del testo
D(a,
b) schema
= 0  a = b (definiteness property),
dello
No proof is reported for the triangle inequality property
Secondo livello
Terzo livello
• Quarto livello
– Quinto livello
13
ICHINO & YAGUCHI’S
FareDISSIMILARITY
clic per modificare
lo
MEASURES
stile del titolo dello schema
Ichino & Yaguchi’s dissimilarity measures are based on the
Cartesian operators join  and meet .
Fare
clic per modificare gli stili del testo
For continuous variables:
Aj  Bj
dello schema
Aj
Bj
Aj  Bj
Secondo
Aj  Bj livello
Terzo
livello
Aj  Bj
while for
nominal
variables:
• Quarto livello
Aj  Bj = Aj  Bj
– Quinto livello
Aj  Bj = Aj  Bj
Given a pair of subsets (Aj, Bj) of Yj the componentwise
dissimilarity(Aj,Bj) is:
(Aj, Bj) =Aj  Bj Aj  Bj+ (2Aj  BjAj Bj)
where 0    0.5 and Ajis defined depending on variable14types.
ICHINO & YAGUCHI’S
FareDISSIMILARITY
clic per modificare
lo
MEASURES
stile del titolo dello schema
(Aj,Bj) are aggregated by an aggregation function such as
the generalised Minkowski’s distance of order q:
Fare
clic per modificare gli stili del testo
U_2
dello schema
d q ( a, b)  q
  ( A j , B j )
p
q
j 1
Secondo
livello on the chosen units of measurements
Drawback
: dependence
Terzo
livello of the componentwise dissimilarity:
Solution
: normalization
U_3
• Quarto livello
p
livello
d q (–a,Quinto
b)  q 
 ( Aj , B j ) q ,

j 1

 ( Aj , B j ) 
 ( Aj , B j )
Yj
The weighted formulation guarantees that dq(a,b)[0,1].
p
U_4
d q (a, b)  q  c j ( A j , B j )q
j 1
The above measures are metrics
15
DE CARVALHO’S DISSIMILARITY
Fare clic per
modificare
lo
MEASURES
stile del titolo dello schema
A straightforward extension of similarity measures for
classical clic
data matrices
with nominal
Fare
per modificare
glivariables.
stili del testo
dello schema
Agreement
Agreement
=(AjBj)
Secondo livello
Disagreement
=(c(Aj)Bj)
Terzo livello
Total
(Bj)
• Quarto livello
– Quinto livello
Disagreement
Total
=(Ajc(Bj))
(Aj)
=(c(Aj)c(Bj)) (c(Aj))
(c(Bj))
(Yi)
where (Vj) is either the cardinality of the set Vj (if Yj is a
nominal variable) or the length of the interval Vj (if Yj is a
continuous variable).
16
DE CARVALHO’S DISSIMILARITY
Fare clic per
modificare
lo
MEASURES
stile
del
titolo
dello
schema
Five different similarity measures s , i = 1, ..., 5,
i
defined:
Fare
clic per modificare
gli stili
del testo
si Comparison
Function Range
Property
s1 /(++)
dello
schema
[0,1]
s2 2/(2++)
[0,1]
Secondo
livello
s3 /(+2+2)
[0,1]
s4 Terzo
½[/(+)+/(+)]
[0,1]
livello
½
s3 /[(+)(+)]
[0,1]
• Quarto livello
are
metric
semi metric
metric
semi metric
semi metric
– Quinto livello
The corresponding dissimilarities are di = 1  si.
The di are aggregated by an aggregation function AF such
as the generalised Minkowski metric, thus obtaining:
SO_1
p
d ai (a, b)  q
 w j di ( A j , B j )
q
j 1
1 i  5
17
DE CARVALHO’S EXTENSION OF
ICHINO
YAGUCHI’S
DISSIMILARITY
Fare
clic&per
modificare
lo
MEASURE
stile del titolo dello schema
A different componentwise dissimilarity measure:
Fare clic per modificaregli
A j , stili
B j  del testo
 A j , B j  
dello schema
 A j  B j 
Secondo
livello
where
 is defined
as in Ichino & Yaguchi’s dissimilarity
Terzo livello
measure.
• Quartofunction
livello
The aggregation
AF suggested by De Carvalho is:
– Quinto livello
SO_2
q
p 1

d q (a, b)  q   ( A j , B j )

j 1 p
This measure is a metric.
18
THE DESCRIPTION-POTENTIAL
Fare clic per
modificare
lo
APPROACH
stile
del
titolo
dello
schema
All dissimilarity measures considered so far are defined
by two
functions: a comparison function (componentwise measure) and
Fare
clic per
modificare
gli stili del testo
an aggregation
function
.
A dello
differentschema
approach is based on the concept of description
potential (a) of a symbolic object a.
Secondo livello
Terzo livello
p
( a )    ( A j )
• Quarto livello
j 1
– Quinto livello
where (Vj) is either the cardinality of the set Vj (if Yj is a nominal
variable) or the length of the interval Vj (if Yj is a continuous
variable).
19
THE DESCRIPTION-POTENTIAL
Fare clic perAPPROACH
modificare lo
stile del titolo dello schema
SO_3
d1 (a, b)  (a  b)  (a  b)  [2(a  b)  (a)  (b)]
Fare clic per modificare gli stili del testo
dello schema
(a  b)  (a  b)  [2(a  b)  (a)  (b)]
SO_4

d 2 ( a, b) 
Secondo livello
(a E )
Terzo livello
• Quarto 
livello
(a  b)  (a  b)  [2(a  b)  (a)  (b)]
SO_5 d 2 (a, –b)Quinto

livello
(a  b)
The triangular inequality does not hold for SO_3 and SO_4,
which are equivalent. SO_5 is a metric.
20
DESCRIPTION POTENTIAL FOR
Fare clic
per
modificare
lo
CONSTRAINED BSO’S
stile del titolo dello schema
Given a BSO a and a logical dependence expressed by the
rule:
Fare clic per modificare gli stili del testo
if [Yj = Sj] then [Yi = Si]
dello schema
the incoherent restriction a’ of a is defined as:
Secondo livello
a’= [Y1=A1]  ...  [Yj-1=Aj-1]  [Yj=Aj Sj]  ...  [Yi-1=Ai-1]
Terzo livello
 [Yi=Ai (Yi\Si)]  ...  [Yp=Ap]
• Quarto livello
Then the description
of a is:
– Quinto potential
livello
p
(a )   ( A j )  (a)
j 1
A similar extension exists for hierarchical dependencies.
21
DISSIMILARITY MEASURES FOR
Fare clic
per
modificare
lo
CONSTRAINED BSO’S
stile
del
titolo
dello
schema
•The extended definition of description potential can be
applied to the computation of the distances SO_3, SO_4
Fare
clic per modificare gli stili del testo
and SO_5.
•De
Carvalho
proposed an extension of ’, so that SO_2
dello
schema
can also be applied to constrained BSO.
livello
•HeSecondo
also proposed
an extension of , , , and  in order
to takeTerzo
into account
livello of constraints. Therefore, SO_1 can
also be applied
tolivello
constrained BSO.
• Quarto
p
q
Finally, C_1 is– defined
as
follows:
Quinto livello
 di A j , B j 
where:
j 1
0
if
Y

NA


d
(
a
,
b
)


q
j
q
p
( j )  
 ( j )
1 otherwise
j 1
If all BSO’s are coherent, then the dissimilarity measures
22
do not change.
Fare clic per
modificare
lo
MATCHING
stile del titolo dello schema
• Matching is the process of comparing two or more
Fare
per modificare
gli stili
del testo
structuresclic
to discover
their similarities
or differences.
dello schema
• Similarity
judgements in the matching process
are
directional: They have a
Secondo livello
• referent, a, a prototype or the description of a class of
objectsTerzo livello
Quarto
livello of the prototype or an instance of a
• subject,• b
, a variant
– Quinto livello
class of objects.
• Matching two structures is a common problem to many
domains, like symbolic classification, pattern recognition,
data mining and expert systems.
29
Fare clicMATCHING
per modificare
lo
BSO’S
stile del titolo dello schema
•Generally, a BSO represents a class description and plays
the
role ofclic
the per
referent
in the matching
process.
Fare
modificare
gli stili
del testo
a:
[color = {black, white}]  [height =[170, 200]]
dello schema
describes a set of individuals either black or white, whose
Secondo
livello [170,200]. Such a set of individuals
height
is in the interval
is called
extension
of the BSO. The extension is a subset of
Terzo
livello
the universe
 of individuals
.
• Quarto
livello
– Quinto
livello
Given two BSO’s
a and
b, the matching operators define
whether b is the description of an individual in the extension
of a.
• In the SODAS software two matching operators for BSO’s
have been defined.
30
Fare
clic
per
modificare
lo
CANONICAL MATCHING OPERATOR
stile del titolo dello schema
• The result of the canonical matching operator is either 0
(false) or 1 (true).
Fare
clic
per
modificare
gli
stili
del
testo
• If E denotes the space of BSO’s described by a set of p
dello schema
variables
Yi taking values in the corresponding domains Yi,
thenSecondo
the matching
operator is a function:
livello
Match: E × E  {0, 1}
Terzo livello
such that for any two BSO’s a, b  E:
• Quarto livello
a –=Quinto
[Y1=Alivello
1]  [Y2=A2] 
...  [Yp=Ap]
b = [Y1=B1]  [Y2=B2]  ...  [Yp=Bp]
it happens that:
• Match(a,b) = 1
• Match(a,b) = 0
if BiAi for each i=1, 2, , p,
otherwise.
31
Fare
clic
per
modificare
lo
CANONICAL MATCHING OPERATOR
stile del titolo dello schema
Examples:
Fare clic per modificare gli stili del testo
District1
[profession={farmer, driver}]  [age=[24,34]]
dello =schema
Indiv1 = [profession=farmer]  [age=28]
Secondo livello
Indiv2 = [profession=salesman]  [age=[27,28]]
Terzo livello
• Quarto livello
Match(District1,Indiv1)
– Quinto livello= 1
Match(District1,Indiv2) = 0
32
Fare
clic
per
modificare
lo
CANONICAL MATCHING OPERATOR
stile del titolo dello schema
• The canonical matching function satisfies two out of three
Fare
modificare
propertiesclic
of aper
similarity
measure: gli stili del testo
dello
•  a,schema
b  E: Match(a, b)  0
•  a, b  E:livello
Match(a, a)  Match(a, b)
Secondo
while itTerzo
does not
satisfy the commutativity or simmetry
livello
property: • Quarto livello
– Quinto
livello
•  a, b 
E: Match(a,
b) = Match(b, a)
because of the different role played by a and b.
33
Fare
clic
per
modificare
lo
FLEXIBLE MATCHING OPERATOR
stile del titolo dello schema
• The requirement BiAi for each i=1, 2, , p, might be too
strict for real-world problems, because of the presence of
Fare
clicdescription
per modificare
gli stiliof del
testo
noise in the
of the individuals
the universe.
dello schema
• Example:
District1
= [profession={farmer,
driver}]  [age=[24,34]]
Secondo
livello
Indiv3livello
= [profession=farmer]  [age=23]
Terzo
=0
• QuartoMatch(District1,Indiv3)
livello
• It is necessary
to rely
on a flexible definition of matching
– Quinto
livello
operator, which returns a number in [0,1] corresponding to
the degree of match between two BSO’s, that is
flexible-matching: E × E  [0,1]
34
Fare
clic
per
modificare
lo
FLEXIBLE MATCHING OPERATOR
stile del titolo dello schema
For any two BSO’s a and b,
i) flexible-matching(a,b)=1 if Match(a,b)=true,
Fare
clic per modificare
gli
stili del testo
ii) flexible-matching(a,b)
[0,1)
otherwise.
dello
The
resultschema
of the flexible matching can be interpreted as the
probability of a matching b provided that a change is made in b.
Secondo livello
Let Ea = {b' E | Match(a,b')=1} and P(b | b') be the
Terzo
livello of observing b given that the original
conditional
probability
Quarto
observation• was
b'.livello
Then
– Quinto livello
flexible - matching( a, b) =
max P(b | b' )
def b' Ea
that is flexible-matching(a,b) equals the maximum conditional
35
probability over the space of BSO’s canonically matched by a.
FLEXIBLE MATCHING: AN
Fare clic per
modificare
lo
APPLICATION
stile del titolo dello schema
• Credit card applications (Quinlan)
Fare
clic
per
modificare
gli
stili
del
testo
• Fifteen variables whose names and values have
dello
schema to meaningless symbols to
been changed
protect
the confidentiality
of the data.
Secondo
livello
+
Terzo livello
• Quarto livello
• class variable:
positive in case of approval of
– Quinto negative
livello
credit facilities,
otherwise.
• Training set: 490 cases
• 6 rules generated by Quinlan’s system C4.5
36
FLEXIBLE MATCHING: AN
Fare clic per
modificare
lo
APPLICATION
stile del titolo dello schema
Rule
Class
Conditions
41
[Y3 > 1.54]  [ Y9 = f ]  [ Y4  {u, y}] 
Fare
clic
per
modificare gli stili del testo
 [Y6{c,d, cc, i, j, k, m, r, q, w, e, aa, ff}]
43
[ Y4  {u, y}]  [ Y8 <= 1.71 ] [ Y9 = f ]
dello
schema
[ Y3 <= 0.835]  [ Y6  {c,d,i,k,m,q,w,e,aa }] 
Secondo livello
[ Y7  {v,bb}]  [Y14 > 102]  [Y15 <= 500]
30 Terzo
+ livello
[ Y9 = t ]
34
[Y3livello
<= 0.125 ]  [Y14 > 221 ]
•+Quarto
46
+ – Quinto
[Y4  {l}
]
livello
6
-
• Such rules can be easily represented by means
of Boolean symbolic objects.
• Both matching operators can be considered in
order to test the validity of the induced rules.37
Fare clic per
modificare
lo
REFERENCES
stile
del
titolo
dello
•
Esposito
F., Malerba
D., V. Tamma,
H.-H.schema
Bock. Classical
resemblance measures. Chapter 8.1
•Fare
Esposito
F., Malerba
D., V. Tamma. Dissimilarity
measures
for
clic
per
modificare
gli
stili
del
testo
symbolic objects. Chapter 8.3
schema
• dello
Esposito
F., Malerba D., F.A. Lisi. Matching symbolic objects.
Chapter 8.4
Secondo
livello
in H.-H.
Bock, E. Diday
(eds.): Analysis of Symbolic Data. Exploratory
methods
for extracting
Terzo
livello statistical information from complex data.
•
•
Springer Verlag, Heidelberg, 2000.
• Quarto livello
D. Malerba, L. Sanarico, & V. Tamma (2000). A comparison of
Quinto livello
dissimilarity–measures
for Boolean symbolic data. In P. Brito, J.
Costa, & D. Malerba (Eds.), Proc. of the ECML 2000 Workshop on
“Dealing with Structured Data in Machine Learning and Statistics”,
Barcelona.
D. Malerba, F. Esposito, V. Gioviale, & V. Tamma. Comparing
Dissimilarity Measures in Symbolic Data Analysis. Pre-Proceedings
of EKT-NTTS, vol. 1, pp. 473-481.
38
Fare clic per
modificare
lo
METHOD
DI
stile
del titolo
dello schema
• Both dissimilarity
measures
and matching operators
between
BSO’s
available in
Fare
clic
perare
modificare
the method DI (Distance
dello
schema
matrix) of the SODAS
software.
Secondo livello
• Input:
Sodaslivello
file of BSO’s
Terzo
• Output for
dissimilarities:
• Quarto
livello
Report + Sodas
filelivello
with
– Quinto
distance matrix
• Output for matching operators:
Report
• Developer: Dipartimento di
Informatica, University of Bari,
Italy.
gli stili del testo
DI
method
Report
39
file
TWO
DIAGRAMS
Fare
clicUSE
perCASE
modificare
lo
stile del titolo dello schema
Create a SODAS chaining
with the DI method
Fare clic Create
pera SODAS
modificare gli stili del testo
chaining with the DI
dello schema
Set up parameters of
method
the DI method
Secondo livello
up parameters
TerzoSet
livello
the DI method
of
• Quarto livello
– Quinto livello
User
User
Run the DI method and generate a
new SODAS file with a
dissimilarity matrix
Create a new chaining with the
new SODAS file
Run the DI method
and generate a report
file
View report file
40
Fare clic
per
modificare
lo
PARAMETER SETUP
stile del titolo dello schema
• The user can select a subset of variables Yi on which the
dissimilarity
measure
or the matching
operator
Fare
clic per
modificare
gli stili del
testohas to
computed .
dello schema
Secondo livello
Terzo livello
• Quarto livello
– Quinto livello
41
Fare clic
per
modificare
lo
PARAMETER SETUP
stile del titolo dello schema
• The user can select a number of parameters.
Fare clic per modificare gli stili del testo
dello schema Dissimilarity measure
Secondo livelloor matching operator
Terzo livello
Name •ofQuarto livello
the new – Quinto livello
SODAS
file
42
Dissimilarity measure Parameters
Constraints
Default
U_1 (Gowda & Diday) none
U_2 (Ichino & Yaguchi) Gamma
[0 .. 0.5]
0.5
Order of power
1 .. 10
2
U_3 (Normalized Ichino Gamma
[0 .. 0.5]
0.5
& Yaguchi)
Order of power
1 .. 10
2
U_4 (Weighted
Gamma
[0 .. 0.5]
0.5
Normalized Ichino &
Order of power
1 .. 10
2
Yaguchi)
List of weights, one per var. Sum(weights) = 1.0 Equal weights
C_1 (Normalized De
Comparison function
D 1, D 2, D 3, D 4, D 5 D 1
Carvalho)
Order of power
1 .. 10
2
SO_1 (De Carvalho)
Comparison function
D 1, D 2, D 3, D 4, D 5 D 1
Order of power
1 .. 10
2
List of weights, one per var. Sum(weights) = 1.0 Equal weights
SO_2 (De Carvalho)
Gamma
[0 .. 0.5]
0.5
Order of power
1 .. 10
2
SO_3 (De Carvalho)
Gamma
[0 .. 0.5]
0.5
Order of power
1 .. 10
2
SO_4 (Normalized De
Gamma
[0 .. 0.5]
0.5
Carvalho)
Order of power
1 .. 10
2
SO_5 (Normalized De
Gamma
[0 .. 0.5]
0.5
Carvalho)
Order of power
1 .. 10
2
OUTPUT
SODAS FILE
Fare clic
per modificare
lo
stile
delSODAS
titolo
delloboth
schema
• The output
file contains
the same input data
and an additional dissimilarity matrix. The dissimilarity
between
theper
i-th modificare
and the j-th BSO
written
the cell
Fare
clic
gliisstili
delintesto
(entry) (i, j) of the matrix.
dello schema
• Only the lower part of the dissimilarity matrix is reported
inSecondo
the file, since
dissimilarities are symmetric.
livello
Terzo livello
TRIANGE_MATRIX
= (
• Quarto livello
(0) ,
– Quinto livello
(0.20531, 0) ,
(12.8626, 15.0793, 0) ,
(14.0338, 15.0403, 8.64626, 0) ,
(14.1655, 15.1651, 11.4512, 6.90074, 0) ,
44
…)
OUTPUT REPORT FILE
The report file is organized as follows:
Page 1
SODAS
07/26/01
Sodas The Statistical Package for Symbolic Data Analysis
Version 1.0
-
05 January 2001
**************D I S T A N C E
Data Information:
M E A S U R E S***********
OUTPUT REPORT FILE
Input Sodas File: C:\ASSO_L~1\SPERIM~3\NUOVAS~1\ABALO.SDS
28
8
Boolean Symbolic Objects (BSOs) read.
Variables selected for each BSO: 1 -- 8
Selected Distance Function: U_1
Gowda & Diday
Distance Matrix
BSO
1
2
3
4
1
0
2
0.2053
0
3
12.86
15.08
0
4
14.03
15.04
8.646
0
5
14.17
15.17
11.45
6.901
-----------------------------------------------------------------------Page 2
SODAS
07/26/01
Fare clic
per
modificare
lo
PARAMETER SETUP
stile del titolo dello schema
• Only one parameter has to be specified for matching
BSO’s. clic per modificare gli stili del testo
Fare
• dello
The Class
BSO represents the BSO that will be used as
schema
referent (default: 2nd BSO).
Secondo
• In
the case oflivello
canonical matching the subject can be
any Terzo
BSO. livello
• In the case
of livello
flexible matching the subject must be a
• Quarto
BSO representing
individual.
– Quinto an
livello
Matching operator
Parameters
CM (Canonical Matching) Class BSO
FM (Flexible Matching)
Class BSO
Constraints
1 .. no BSO
1 .. no BSO
Default
2
2
47
OUTPUT REPORT FILE
The report file is organized as follows:
Page 1
SODAS
12/31/99
Sodas The Statistical Package for Symbolic Data Analysis
*********** C A N O N I C A L
M A T C H I N G **********
Data Information
29
Boolean Symbolic Objects (BSOs) read.
1 : Boolean Symbolic Object (BSO) selected as class.
Matching Vector
BSO
1
1
Match
2
No Match
...
29
No Match
This procedure was completed at 17:57:33
FURTHER DEVELOPMENTS IN THE ASSO
Fare clic perPROJECT
modificare lo
stile del titolo dello schema
Two modules, DISS and MATCH. Both modules will be
upgraded to work with both BSO’s and Problabilistic
Fare
clic
per
modificare
gli
stili
del
testo
Symbolic Objects (PSO’s).
dello schema
Formally, a PSO, which is made up of M probabilistic
Secondo livello
elementary events (PEE) ai , is defined as:
Terzo
livello M
M
a=
 
a

a

[
Y

c
• Quarto
livello
i =
i
ij
i 1
i 1
– Quinto livello
pija

j 1... ki
]
Each PEE ai is attached to an observed quantitative
variables Yi which takes ki values cija with probability pija
(which sum up to 1).
49
Fly UP