I 1
Francesca Bugiotti
Università Roma Tre
17/12/2009
Model management
• What it is
– A systematic approach to metadata management, which handles schemas by means of a set of predefined operators.
• Its goals
– Enhance the productivity of software developers by offering them techniques that allow for high-level specifications and abstraction over recurring tasks involving the manipulation of schemas.
Model management
• Model management systems
– Handle schemas and mappings and support a wide variety of operations on them.
• MIDST
– We propose MIDST [1,2,3], a platform originally conceived for model-independent schema and data translation, as the basis for building a model management system.
– The resulting model management system aims to be model-independent and model-aware.
What model management addresses
• Concrete needs: model management problems are formalizations of concrete and frequent database maintenance problems
– data integration over heterogeneous databases
– data exchange between independent databases
– ETL
– wrapper generation for the access to relational databases from object-oriented applications
– web site generation from databases.
What model management addresses
• Model management solutions to formalized problems:
- schema integration
- schema evolution
- forward engineering
- round-trip engineering
- …
Schema integration
[Figure: diagram with schemas S1, S2, S3 and mappings map12, map23]
Forward engineering
[Figure: diagram with V1, V2, S1, S2 and mappings map1, map2]
Round-trip engineering
[Figure: diagram with schemas S1, S2, I1, I2 and mappings map1, map2]
Solutions to model management problems
• Solutions to model management problems are given in terms of scripts.
• A script is a set of model management operators which are executed according to a specific control flow.
Operators
• The operators involved in the script specifications are:
– Match
– Diff
– Merge
– Compose
– Modelgen
– Copy
– …
Match
• Given two schemas S1 and S2, we define
map12 = MATCH(S1,S2)
where MATCH is the operator that identifies correspondences between the two schemas and hence yields a possible mapping.
• Several algorithms implement the MATCH operator.
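As a rough illustration of what a MATCH implementation could look like, here is a minimal Python sketch. It assumes a schema is just a set of element names and that two elements correspond when their names are equal ignoring case; both the representation and the matching criterion are assumptions made for this example, not the algorithm used by MIDST.

```python
def match(s1, s2):
    """Return a mapping as a set of (element_of_s1, element_of_s2) pairs.

    Assumption: two elements correspond when their names are equal,
    ignoring case. Real MATCH algorithms use much richer evidence.
    """
    by_name = {}
    for element in s2:
        by_name.setdefault(element.lower(), []).append(element)
    return {(a, b) for a in s1 for b in by_name.get(a.lower(), [])}


# Hypothetical schemas, used only for illustration.
s1 = {"Project", "Manager"}
s2 = {"project", "Employee"}
map12 = match(s1, s2)
print(map12)
```

Only Project has a same-named counterpart here, so the mapping contains a single pair.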
Match
[Figure: two schemas, S1 and S2, with elements labelled A, B, C, D, E]
Match(S1,S2) = ?
Match
[Figure: schemas S1 and S2 and the mapping map12 between their elements]
Match(S1,S2) = map12
Diff
• Given two schemas S and S1, the difference
diff(S, S1)
is a schema S2 that contains all the schema elements of S that do not appear in S1.
• It can be interpreted as a set-oriented difference.
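The set-oriented reading can be sketched in a few lines of Python. The sketch assumes schema elements are hashable values that are compared by plain equality, which is a simplification; the element names are hypothetical.

```python
def diff(s, s1):
    """Elements of schema s that do not appear in schema s1."""
    return {element for element in s if element not in s1}


# Hypothetical schemas: each element is a (relation, column) pair.
S = {("Manager", "SSN"), ("Manager", "Name"), ("Manager", "Degree")}
S1 = {("Manager", "SSN"), ("Manager", "Name")}
S2 = diff(S, S1)
print(S2)
```

Note that the operator is not symmetric: diff(S1, S) is empty in this example.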
Example
[Figure: two schemas, S and S1, with elements labelled A, B, C, D, E]
Diff(S,S1) = ?
Example
[Figure: schemas S and S1 and the resulting schema S2]
Diff(S,S1) = S2
Merge
• Given S and S1, their merge
merge(S, S1)
is a schema S2 that contains the schema elements that appear in at least one of S or S1, modulo equivalence.
• It can be interpreted as a set-oriented union.
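A union "modulo equivalence" can be sketched in Python by keeping one representative per equivalence class. The `key` parameter, which maps an element to its class, is an assumption of this example; here equivalence is taken to be name equality ignoring case.

```python
def merge(s, s1, key=lambda element: element):
    """Union of s and s1 that keeps one representative per equivalence class.

    key maps an element to its equivalence class (an assumption of this
    sketch); the first element seen in each class is the one retained.
    """
    merged = {}
    for element in list(s) + list(s1):
        merged.setdefault(key(element), element)
    return list(merged.values())


# Hypothetical schemas given as lists of element names.
S = ["Project", "Manager"]
S1 = ["manager", "Department"]
S2 = merge(S, S1, key=str.lower)
print(S2)
```

Because "Manager" and "manager" fall into the same class, only the first one is kept in the result.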
Example
[Figure: two schemas, S and S1, with elements labelled A, B, C, D, E, F]
Merge(S,S1) = ?
Example
[Figure: schemas S1 and S2 and the resulting schema S3]
Merge(S1,S2) = S3
Compose
• Given three schemas S1, S2, S3 and two mappings, map12 between S1 and S2 and map23 between S2 and S3, we define the composition of map12 and map23 as the mapping map13 between S1 and S3:
Compose(S1, S2, S3, map12, map23) = map13
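If mappings are modelled as sets of element pairs (an assumption of this sketch, as are the element names), composition reduces to ordinary relational composition: an element of S1 is mapped to an element of S3 whenever some element of S2 links them.

```python
def compose(map12, map23):
    """Relational composition of two mappings given as sets of pairs."""
    return {(a, c) for (a, b) in map12 for (b2, c) in map23 if b == b2}


# Hypothetical mappings between three schemas.
map12 = {("PCode", "Code"), ("Title", "Name")}
map23 = {("Code", "ProjectId"), ("Name", "Label")}
map13 = compose(map12, map23)
print(map13)
```

In this simplified form the schemas themselves are not needed as arguments, since the pairs already name their elements.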
Modelgen
• Given a schema S of a source model M and a target model M1, the translation
modelgen(S, M1)
is a schema S1 of M1 that corresponds to S.
Modelgen
M = ER Model
S
M1 = Relational Model
Modelgen(S,M1) = ?
Example
[Figure: schema S and its translation S1]
Modelgen(S,M1) = S1
Operators
• A major goal is to provide model-independent operators, which guarantee some kind of model closure property.
• We start from a simplified version of Bernstein's procedure for solving the round-trip engineering problem [4], in order to introduce the needed operators and explain how they are implemented in a model-independent fashion.
Round-trip engineering
• One of the most meaningful model management problems.
• Let us take it as an example to illustrate our approach to model management problems.
[Figure: schemas S1, I1, I2 and S2]
– S1: specification schema
– I1: an implementation schema obtained from S1
– I2: a modified version of the implementation I1
– S2: a new specification which corresponds to I2.
Round-trip engineering
S1
[Figure: ER schema with entities Project (PCode, Title) and Manager (SSN, Name, EID), linked by a relationship with cardinalities (0,N) and (1,1)]
S1 is the specification schema, which is translated into its corresponding implementation schema I1.
It is a common example where the specification is expressed in ER and the implementation is relational.
I1
Project (PCode, Title, MGRSSN*)
Manager (SSN, EID, Name)
The translation might be performed using MIDST itself, since it was conceived as an implementation of the MODELGEN operator.
Round-trip engineering
I1
Project (PCode, Title, MGRSSN*)
Manager (SSN, EID, Name)
I2
Project (PCode, Title, MGRID*)
Manager (SSN, EID, Name, Degree)
I2 is the implementation schema, a modified version of I1.
The transformation involves a change in the key of a referenced relation. The key of Manager, which is referenced by MGRSSN of Project in I1, becomes EID in I2.
As a consequence, the column MGRSSN of Project, which references SSN of Manager, has to reference EID. MGRID is the version of MGRSSN modified accordingly.
Round-trip engineering
I2
Project (PCode, Title, MGRID*)
Manager (SSN, EID, Name, Degree)
S2
Our goal is to generate S2, the appropriately revised version of the specification schema, such that its corresponding implementation is I2.
Operators in scripts
• The solution to the round-trip engineering problem is based on a set of model management operators: DIFF, MERGE and MODELGEN.
• DIFF and MERGE are used to compute the difference and the union of schemas.
• MODELGEN is used to translate the specification schema into the implementation and to compute the reversed differences.
The Round-trip solving script
MIDST and MODELGEN
• The platform MIDST was originally conceived as a framework to perform model-independent schema and data translations.
• MIDST was designed as a model-generic implementation of MODELGEN.
Translations
[Figure: translations among the Entity-Relationship, Object-Oriented, Object-Relational, Relational, XSD and WSM models]
The metamodel approach
• The constructs in the various models are rather similar:
– They can be classified into a few categories ("metaconstructs")
– E.g.: the Entity of the ER model and the Object of the OO model can be traced back to the same abstract concept, the "Abstract" of our supermodel.
The supermodel
• A model that includes all the metaconstructs (in their most general forms)
– Each model is subsumed by the supermodel (modulo construct renaming)
– Each schema for any model is also a schema for the supermodel (modulo construct renaming).
Translations specification
• Translations can be defined on metaconstructs
– There are standard, accepted ways to deal with the translation of metaconstructs
– They can be performed within the supermodel
• Each translation from the supermodel SM to a target model M is also a translation from any other model to M.
Translation specification
• Datalog is used to specify translations.
• A translation script in our tool is a set of Datalog rules.
Datalog
• Declarative language
• We specify the condition for the insertion
• For every set of constructs that matches the conditions in B we create a new construct A
A <- B
Datalog rule example
• We generate a new Abstract for each Aggregation
Abstract(
OID: SK1(oid),
Name: name )
<-
Aggregation(
OID: oid,
Name: name );
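The effect of this rule can be mimicked in Python as a simple comprehension over the source constructs. The sketch assumes that SK1 behaves like a Skolem function, that is, a deterministic and injective generator of new identifiers; the class and function names below are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Aggregation:
    oid: int
    name: str


@dataclass(frozen=True)
class Abstract:
    oid: tuple
    name: str


def sk1(oid):
    """Assumed Skolem function: the same input always gives the same new OID."""
    return ("SK1", oid)


def aggregations_to_abstracts(aggregations):
    """Head <- body: one Abstract for every Aggregation that matches the body."""
    return [Abstract(oid=sk1(a.oid), name=a.name) for a in aggregations]


source = [Aggregation(1, "Project"), Aggregation(2, "Manager")]
target = aggregations_to_abstracts(source)
print(target)
```

Using a deterministic function for the new OID means other rules can refer to the generated Abstract simply by applying the same function to the same source OID.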
Another rule
• We copy only the Lexicals of an Aggregation
Lexical (
OID: SK1(oid),
aggregationOID: SK2(aggOID),
Name: name,
isIdentifier: isId,
isNullable: isN,
isOptional: isO,
type: t)
<-
Lexical (
OID: oid,
aggregationOID: aggOID,
Name: name,
isIdentifier: isId,
isNullable: isN,
isOptional: isO,
type: t),
Aggregation(
OID: aggOID);
Approach
• Is it possible to apply the same approach to other model management operators?
• How can we define other operators with respect to our supermodel?
Construct characteristics
• Every construct has:
– An identification OID
– A name
– A set of properties
– A set of references
Construct characteristics
SM_Lexical (
OID: SK1(oid),
aggregationOID: aggOID,
Name: name,
isIdentifier: isId,
isNullable: isN,
isOptional: isO,
type: t
)
Construct equivalence
• Two constructs are equivalent if they:
– Have the same name
– Have the same set of properties
– Refer to equivalent constructs
Comparison
• The definition of equivalence is recursive.
• We can order the constructs and start the matching from the constructs without references.
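The recursive definition can be evaluated bottom-up, as in the following Python sketch. Rather than sorting the constructs explicitly, it repeats the comparison until no new equivalent pair is found, so constructs without references are matched in the first pass and the constructs that refer to them in later passes. The `Construct` class and the sample data are hypothetical, and the sketch assumes that references do not form cycles.

```python
from dataclasses import dataclass, field


@dataclass
class Construct:
    oid: int
    name: str
    props: dict = field(default_factory=dict)
    refs: dict = field(default_factory=dict)  # reference name -> OID of the referred construct


def equivalent_pairs(schema1, schema2):
    """Return the set of (oid1, oid2) pairs of equivalent constructs."""
    equiv = set()
    changed = True
    while changed:
        changed = False
        for a in schema1:
            for b in schema2:
                if (a.oid, b.oid) in equiv:
                    continue
                same_shape = (a.name == b.name and a.props == b.props
                              and a.refs.keys() == b.refs.keys())
                # With no references this holds at once; otherwise the
                # referred constructs must already be known to be equivalent.
                if same_shape and all((a.refs[r], b.refs[r]) in equiv for r in a.refs):
                    equiv.add((a.oid, b.oid))
                    changed = True
    return equiv


schema1 = [
    Construct(2, "SSN", {"isIdentifier": True}, {"aggregationOID": 1}),
    Construct(1, "Manager"),
]
schema2 = [
    Construct(20, "SSN", {"isIdentifier": True}, {"aggregationOID": 10}),
    Construct(30, "Degree", {"isIdentifier": False}, {"aggregationOID": 10}),
    Construct(10, "Manager"),
]
pairs = equivalent_pairs(schema1, schema2)
print(pairs)
```

Here the two Manager constructs are matched first, and only then can the two SSN constructs that refer to them be recognised as equivalent.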
Construct characteristics
• These characteristics can also be found in the rules:
– An identification OID
– A name
– A set of properties
– A set of references
SM_Lexical (
OID: SK1(oid),
aggregationOID: SK2(aggOID),
Name: name,
isIdentifier: isId,
isNullable: isN,
isOptional: isO,
type: t
)
<-
SM_Lexical (
OID: oid,
aggregationOID: aggOID,
Name: name,
isIdentifier: isId,
isNullable: isN,
isOptional: isO,
type: t
),
SM_Aggregation(
OID: aggOID
);
Example
• An equivalence comparison may work as follows:
1. comparison of the aggregations or abstracts without any references;
2. comparison of the constructs which may refer to them.
Model management operators by examples
An example of a possible implementation of model management operators follows.
The adopted language is Datalog.
The tool is MIDST.
Datalog implementation of equivalence
• Fundamental functional block to compare two constructs:
EQUIV_Aggregation [DEST] (
OID1: oid1,
OID2: oid2)
<-
SM_Aggregation [SOURCE_1] (
OID: oid1,
Name: name),
SM_Aggregation [SOURCE_2] (
OID: oid2,
Name: name );
Datalog implementation of difference and merge
• Fundamental functional block used to implement a SELECTIVE COPY.
SM_Aggregation(
OID: SK(oid),
Name: name )
<-
SM_Aggregation (
OID: oid,
Name: name ),
!EQUIV_Aggregation (
OID1: oid );
• Used both in difference and in merge.
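To show how the equivalence block and the selective copy might fit together, here is a minimal Python sketch. It assumes that a schema is a dictionary from OID to aggregation name, that equivalence is name equality as in the EQUIV_Aggregation rule, and that merge is obtained by adding the selectively copied constructs to the other schema. This last composition is an assumption of the example, not something the rules state.

```python
def equiv_aggregation(source_1, source_2):
    """Pairs (oid1, oid2) of aggregations that have the same name."""
    return {(oid1, oid2)
            for oid1, name1 in source_1.items()
            for oid2, name2 in source_2.items()
            if name1 == name2}


def selective_copy(source, equiv):
    """Copy the aggregations whose OID never appears as OID1 in equiv."""
    matched = {oid1 for oid1, _ in equiv}
    return {("SK", oid): name for oid, name in source.items() if oid not in matched}


# Hypothetical schemas: OID -> aggregation name.
s = {1: "Project", 2: "Manager", 3: "Department"}
s1 = {10: "Manager"}

equiv = equiv_aggregation(s, s1)
difference = selective_copy(s, equiv)  # what is in s but not in s1
merged_names = sorted(set(s1.values()) | set(difference.values()))
print(difference)
print(merged_names)
```

The same selective copy yields the difference directly and, combined with a full copy of the other schema, a union without duplicates.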
Automatic generation
• These operators can be automatically generated by the MIDST application framework.
• The constructs of the supermodel are used to generate the rules used for the matching.
• The order in which the rules are applied is important.
Example
S1
[Figure: ER schema with entities Project (PCode, Title) and Manager (SSN, Name, EID), linked by a relationship with cardinalities (0,N) and (1,1)]
I1
Project (PCode, Title, MGRSSN*)
Manager (SSN, EID, Name)
Example
I1
Project (PCode, Title, MGRSSN*)
Manager (SSN, EID, Name)
I2
Project (PCode, Title, MGRID*)
Manager (SSN, EID, Name, Degree)
Example
Step 1: difference between the implementation schemas.
I1
Project (PCode, Title, MGRSSN*)
Manager (SSN, EID, Name)
I2
Project (PCode, Title, MGRID*)
Manager (SSN, EID, Name, Degree)
1: DIFF(I1,I2)
G2'-
Project (MGRSSN*)
Manager (SSN, EID)
Example
Step 2: difference between the implementation schemas, taken in the opposite direction.
I1
Project (PCode, Title, MGRSSN*)
Manager (SSN, EID, Name)
I2
Project (PCode, Title, MGRID*)
Manager (SSN, EID, Name, Degree)
2: DIFF(I2,I1)
G2'+
Project (MGRID*)
Manager (SSN, EID, Degree)
Example
Step 3-4: inversion of the two semidifferences.
G2'+
Projectstub (MGRID*)
Managerstub (SSN, EID, Degree)
3: REVERSE
S3'+
[Figure: ER schema with entities Projectstub and Managerstub (SSN, EID, Degree), linked by a relationship with cardinalities (1,1) and (0,N)]
G2'-
Projectstub (MGRSSN*)
Managerstub (SSN, EID)
4: REVERSE
S3'-
[Figure: ER schema with entities Projectstub and Managerstub (SSN, EID), linked by a relationship with cardinalities (1,1) and (0,N)]
Example
Step 5: merge of the initial specification schema with the inverted positive semidifference.
5: MERGE
[Figure: ER schemas S1 and S3'+ and their merge H]
Example
Step 6: difference between H and the inverted negative semidifference.
6: DIFF
[Figure: ER schemas H and S3'- and their difference S2, with entities Project (PCode, Title) and Manager (SSN, EID, Name, Degree)]
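The control flow of the six steps can be summarised in a short Python sketch. The operators are passed in as functions so that the script itself stays independent of how they are implemented. The toy run below uses plain set operations and an identity REVERSE, which is a strong simplification: it ignores stubs, keys and the ER/relational translation and is meant only to show the order of the steps.

```python
def round_trip(s1, i1, i2, diff, merge, reverse):
    """Sketch of the script: returns the revised specification S2."""
    g_minus = diff(i1, i2)      # step 1: what I1 has and I2 lacks
    g_plus = diff(i2, i1)       # step 2: what I2 has and I1 lacks
    s_plus = reverse(g_plus)    # step 3: back to the specification model
    s_minus = reverse(g_minus)  # step 4: back to the specification model
    h = merge(s1, s_plus)       # step 5: add the new elements to S1
    return diff(h, s_minus)     # step 6: remove the obsolete elements


# Toy data (assumption): specification and implementation share one vocabulary.
I1 = {"Project.PCode", "Project.Title", "Project.MGRSSN",
      "Manager.SSN", "Manager.EID", "Manager.Name"}
I2 = {"Project.PCode", "Project.Title", "Project.MGRID",
      "Manager.SSN", "Manager.EID", "Manager.Name", "Manager.Degree"}
S1 = set(I1)

S2 = round_trip(
    S1, I1, I2,
    diff=lambda a, b: a - b,
    merge=lambda a, b: a | b,
    reverse=lambda schema: set(schema),
)
print(sorted(S2))
```

With these toy operators the result coincides with I2, which is what the goal of the example requires: the implementation corresponding to S2 is I2.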
Demo
Properties
• Model independence
– MIDST handles schemas as instances of subsets of the available metaconstructs.
– The operators are defined as Datalog rules declaring transformations in terms of the supermodel metaconstructs.
– The operators are defined in such a way that they are valid for any model, by specifying comparisons between every available construct.
Properties
• Model closure
– A model management operator (except MODELGEN) applied to a set of input schemas of a model M yields output schemas of the same model M.
• Model awareness
– Operators can be defined in such a way that they do not add metaconstructs which are not present in the source schemas.
References
• [1] P. Atzeni, P. Cappellari and P.A. Bernstein. Modelgen: Model-independent schema translation. In ICDE Conference, pages 1111-1112, 2005.
• [2] P. Atzeni, P. Cappellari and G. Gianforme. MIDST: model-independent schema and data translation. In SIGMOD, pages 1134-1136, ACM, 2007.
• [3] P. Atzeni, P. Cappellari, R. Torlone, P.A. Bernstein and G. Gianforme. Model-independent schema translation. In VLDB Journal.
• [4] P.A. Bernstein. Applying model management to classical meta data problems. In CIDR, pages 209-220, 2003.
Summary
• Model management
• Operators
• Model-generic operators
• Operators in MIDST
• Example