Francesca Bugiotti, Università Roma Tre, 17/12/2009

Model management

What it is
– A systematic approach to metadata management, which handles schemas by means of a set of predefined operators.

Its goals
– Enhance the productivity of software developers by offering techniques that allow for high-level specifications and abstraction over recurring tasks involving the manipulation of schemas.

Model management

Model management systems
– Handle schemas and mappings and support a wide variety of operations on them.

MIDST
– We propose MIDST [1,2,3], a platform originally conceived for model-independent schema and data translation, as the basis for building a model management system.
– The resulting model management system aims at being model-independent and model-aware.

What model management addresses

Concrete needs: model management problems are a formalization of concrete and frequent database maintenance problems
– data integration over heterogeneous databases
– data exchange between independent databases
– ETL
– wrapper generation for access to relational databases from object-oriented applications
– web site generation from databases.

What model management addresses

Model management solutions to formalized problems:
– schema integration
– schema evolution
– forward engineering
– round-trip engineering
– …

Schema integration
[Diagram: schemas S1, S2 and S3, with mappings map12 and map23]

Forward engineering
[Diagram: V1 and V2 over schemas S1 and S2, with mappings map1 and map2]

Round-trip engineering
[Diagram: specifications S1 and S2 and implementations I1 and I2, with mappings map1 and map2]

Solving model management problems
Solutions to model management problems are given in terms of scripts. A script is a set of model management operators which are executed according to a specific control flow.

Operators
The operators involved in the script specifications are:
– Match
– Diff
– Merge
– Compose
– Modelgen
– Copy
– …

Match
Given two schemas S1 and S2, we define
    map12 = MATCH(S1, S2)
where MATCH is the operator that identifies correspondences between the two schemas and hence yields a possible mapping. There are several algorithms implementing MATCH operators.

Match
[Diagram: schemas S1 and S2] Match(S1, S2) = ?
[Diagram: schemas S1 and S2 and the mapping between them] Match(S1, S2) = map12

Diff
Given two schemas S and S1, the difference diff(S, S1) is a schema S2 that contains all the schema elements of S that do not appear in S1. It can be interpreted as a set-oriented difference.

Example
[Diagram: schemas S and S1] Diff(S, S1) = ?
[Diagram: schemas S and S1 and the result S2] Diff(S, S1) = S2

Merge
Given S and S1, their merge merge(S, S1) is a schema S2 that contains the schema elements that appear in at least one of S or S1, modulo equivalence. It can be interpreted as a set-oriented union.

Example
[Diagram: schemas S and S1] Merge(S, S1) = ?
[Diagram: schemas S1 and S2 and the result S3] Merge(S1, S2) = S3

Compose
Given three schemas S1, S2 and S3, and two mappings, map12 between S1 and S2 and map23 between S2 and S3, the composition of map12 and map23 is the mapping map13 between S1 and S3:
    Compose(S1, S2, S3, map12, map23) = map13

Modelgen
Given a schema S of a source model M and a target model M1, the translation modelgen(S, M1) is a schema S1 of M1 that corresponds to S.

Modelgen
M = ER model, M1 = relational model
[Diagram: ER schema S] Modelgen(S, M1) = ?

Example
[Diagram: ER schema S and relational schema S1] Modelgen(S, M1) = S1

Operators
A major goal is to provide model-independent operators which guarantee some kind of model closure property.
We start from a simplified version of Bernstein's procedure for solving the round-trip engineering problem [4], in order to introduce the needed operators and explain how they are implemented in a model-independent fashion.

Round-trip engineering
One of the most meaningful model management problems. We take it as an example to illustrate our approach to model management problems.
– S1: a specification schema
– I1: an implementation schema obtained from S1
– I2: a modified version of the implementation I1
– S2: a new specification which corresponds to I2.

Round-trip engineering
S1 (ER): entities Project (PCode, Title) and Manager (SSN, Name, EID), connected by a relationship with cardinalities (0,N) and (1,1).
I1 (relational):
    Project (PCode, Title, MGRSSN*)
    Manager (SSN, EID, Name)
S1 is the specification schema, which is translated into its corresponding implementation schema I1. This is a common case, where the specification is expressed in ER and the implementation is relational.
The translation might be performed using MIDST itself, since it was conceived as an implementation of the MODELGEN operator.

Round-trip engineering
I1:
    Project (PCode, Title, MGRSSN*)
    Manager (SSN, EID, Name)
I2:
    Project (PCode, Title, MGRID*)
    Manager (SSN, EID, Name, Degree)
I2 is the implementation schema, a modified version of I1. The transformation involves a change in the key of a referenced relation. The key of Manager, which is referenced by MGRSSN of Project in I1, becomes EID in I2. As a consequence, the column MGRSSN of Project, which referenced SSN of Manager, has to reference EID. MGRID is the version of MGRSSN modified accordingly.

Round-trip engineering
I2:
    Project (PCode, Title, MGRID*)
    Manager (SSN, EID, Name, Degree)
Our goal is to generate S2, the appropriately revised version of the specification schema, such that its corresponding implementation is I2.

Operators in scripts
The solution provided for round-trip engineering is based on a set of model management operators: DIFF, MERGE and MODELGEN.
– DIFF and MERGE are used to compute the difference and the union of schemas.
– MODELGEN is used to translate the specification schema into the implementation and to compute the reversed differences.

The round-trip solving script
[Figure: the round-trip solving script]

MIDST and Modelgen
The MIDST platform was originally conceived as a framework to perform model-independent schema and data translations. MIDST was designed as a model-generic implementation of MODELGEN.

Translations
[Diagram: translations among models; labels: Entity Relationship, Object Oriented, Object Relational, XSD, Relational, WSM]

The metamodel approach
The constructs in the various models are rather similar:
– They can be classified into a few categories ("metaconstructs").
– E.g. the entity of ER and the object of OO can be traced back to the same abstract concept, the "Abstract" of our supermodel.

The supermodel
A model that includes all the metaconstructs (in their most general forms)
– Each model is subsumed by the supermodel (modulo construct renaming)
– Each schema for any model is also a schema for the supermodel (modulo construct renaming).
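
The idea of "construct renaming" can be sketched in a few lines of Python. This is only an illustration, not MIDST code: the metaconstruct names Abstract, Aggregation and Lexical appear in the slides, but the lookup table below (in particular which relational and attribute constructs map to Aggregation and Lexical) and the helper `to_supermodel` are assumptions made for the example.

```python
# Hypothetical lookup table: (model, construct) -> supermodel metaconstruct.
# Entity/Object -> Abstract comes from the slides; the other rows are assumptions.
METACONSTRUCT_OF = {
    ("ER", "Entity"): "Abstract",
    ("OO", "Object"): "Abstract",
    ("ER", "Attribute"): "Lexical",
    ("Relational", "Table"): "Aggregation",
    ("Relational", "Column"): "Lexical",
}


def to_supermodel(model, schema):
    """Rename every (construct, name) pair of a schema to its metaconstruct."""
    return [(METACONSTRUCT_OF[(model, construct)], name) for construct, name in schema]


er_schema = [("Entity", "Project"), ("Attribute", "PCode")]
oo_schema = [("Object", "Project")]

er_in_sm = to_supermodel("ER", er_schema)
oo_in_sm = to_supermodel("OO", oo_schema)
print(er_in_sm)
print(oo_in_sm)
```

After renaming, the ER entity and the OO object become the same kind of supermodel construct, which is what lets a single rule set work on both.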

Translation specification
– Translations can be defined on metaconstructs, and there are standard, accepted ways to deal with the translation of metaconstructs.
– They can be performed within the supermodel.
– Each translation from the supermodel SM to a target model M is also a translation from any other model to M.

Translation specification
Datalog is used to specify translations. A translation script in our tool is a set of Datalog rules.

Datalog
– Declarative language
– We specify the condition for the insertion
– For every set of constructs that matches the conditions in B, we create a new construct A:
    A <- B

Datalog rule example
We generate a new Abstract for each Aggregation:
    Abstract ( OID: SK1(oid), Name: name )
    <- Aggregation ( OID: oid, Name: name );

Another rule
We copy only the Lexicals of an Aggregation:
    Lexical ( OID: SK1(oid), aggregationOID: SK2(aggOID), Name: name, isIdentifier: isId, isNullable: isN, isOptional: isO, type: t )
    <- Lexical ( OID: oid, aggregationOID: aggOID, Name: name, isIdentifier: isId, isNullable: isN, isOptional: isO, type: t ),
       Aggregation ( OID: aggOID );

Approach
Is it possible to apply the same approach to other model management operators? How can we define other operators with respect to our supermodel?
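
To make the reading of the two rules above concrete, here is a minimal Python sketch of how they could be evaluated over in-memory facts. It is not how MIDST executes Datalog. The `Skolem` class, the functor names `SK_ABS` and `SK_LEX`, and the dictionary layout of the facts are all assumptions; in particular, the sketch reuses the Abstract functor for the copied reference so that it resolves to the newly generated Abstract, which the slides do not spell out.

```python
from itertools import count


class Skolem:
    """Hypothetical Skolem functor: equal arguments always give the same new OID."""

    def __init__(self, name):
        self.name = name
        self._table = {}
        self._fresh = count(1)

    def __call__(self, *args):
        if args not in self._table:
            self._table[args] = f"{self.name}#{next(self._fresh)}"
        return self._table[args]


SK_ABS = Skolem("SK_ABS")  # OIDs of generated Abstracts
SK_LEX = Skolem("SK_LEX")  # OIDs of copied Lexicals


def aggregation_to_abstract(aggregations):
    """Abstract(OID: SK(oid), Name: name) <- Aggregation(OID: oid, Name: name)."""
    return [{"OID": SK_ABS(a["OID"]), "Name": a["Name"]} for a in aggregations]


def copy_lexicals(lexicals, aggregations):
    """Copy a Lexical only if the Aggregation it references exists (the rule body join)."""
    agg_oids = {a["OID"] for a in aggregations}
    return [
        {**lex, "OID": SK_LEX(lex["OID"]), "aggregationOID": SK_ABS(lex["aggregationOID"])}
        for lex in lexicals
        if lex["aggregationOID"] in agg_oids
    ]


aggregations = [{"OID": 1, "Name": "Manager"}]
lexicals = [
    {"OID": 10, "aggregationOID": 1, "Name": "SSN", "isIdentifier": True},
    {"OID": 11, "aggregationOID": 99, "Name": "Orphan", "isIdentifier": False},
]

abstracts = aggregation_to_abstract(aggregations)
copied = copy_lexicals(lexicals, aggregations)
print(abstracts)
print(copied)
```

The second Lexical is dropped because no Aggregation with OID 99 exists, which is the effect of the extra literal in the rule body.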

Construct characteristics
Every construct has:
– An identification OID
– A name
– A set of properties
– A set of references
For example:
    SM_Lexical ( OID: SK1(oid), aggregationOID: aggOID, Name: name, isIdentifier: isId, isNullable: isN, isOptional: isO, type: t )

Construct equivalence
Two constructs are equivalent if they:
– have the same name
– have the same set of properties
– refer to equivalent constructs

Comparison
The definition of equivalence is recursive. We can order the constructs and start the matching from the constructs without references.

Construct characteristics
The same characteristics can also be found in the rules:
– An identification OID
– A name
– A set of properties
– A set of references
    SM_Lexical ( OID: SK1(oid), aggregationOID: SK2(aggOID), Name: name, isIdentifier: isId, isNullable: isN, isOptional: isO, type: t )
    <- SM_Lexical ( OID: oid, aggregationOID: aggOID, Name: name, isIdentifier: isId, isNullable: isN, isOptional: isO, type: t ),
       SM_Aggregation ( OID: aggOID );

Example
An equivalence comparison may work as follows:
1. comparison of the aggregations or abstracts without any references;
2. comparison of the constructs which may refer to them.

Model management operators by examples
An example of a possible implementation of model management operators follows. The adopted language is Datalog. The tool is MIDST.

Datalog implementation of equivalence
Fundamental functional block to compare two constructs:
    EQUIV_Aggregation [DEST] ( OID1: oid1, OID2: oid2 )
    <- SM_Aggregation [SOURCE_1] ( OID: oid1, Name: name ),
       SM_Aggregation [SOURCE_2] ( OID: oid2, Name: name );

Datalog implementation of difference and merge
Fundamental functional block used to implement a SELECTIVE COPY:
    SM_Aggregation ( OID: SK(oid), Name: name )
    <- SM_Aggregation ( OID: oid, Name: name ),
       !EQUIV_Aggregation ( OID1: oid );
Used both in difference and in merge.

Automatic generation
– These operators can be automatically generated by the MIDST application framework.
– The constructs of the supermodel are used to generate the rules used for the matching.
– The order of application is important.

Example
S1 (ER): entities Project (PCode, Title) and Manager (SSN, Name, EID), connected by a relationship with cardinalities (0,N) and (1,1).
I1:
    Project (PCode, Title, MGRSSN*)
    Manager (SSN, EID, Name)
I2:
    Project (PCode, Title, MGRID*)
    Manager (SSN, EID, Name, Degree)

Example
Step 1: difference between the implementation schemas.
    1: G2’- = DIFF(I1, I2)
        Project (MGRSSN*)
        Manager (SSN, EID)

Example
Step 2: difference between the implementation schemas.
    2: G2’+ = DIFF(I2, I1)
        Project (MGRID*)
        Manager (SSN, EID, Degree)

Example
Step 3-4: inversion of the two semidifferences.
    G2’+:
        Projectstub (MGRID*)
        Managerstub (SSN, EID, Degree)
    G2’-:
        Projectstub (MGRSSN*)
        Managerstub (SSN, EID)
    3: REVERSE of G2’+ yields S3’+ (ER): Projectstub and Managerstub (SSN, EID, Degree), connected by a relationship with cardinalities (1,1) and (0,N).
    4: REVERSE of G2’- yields S3’- (ER): Projectstub and Managerstub (SSN, EID), connected by a relationship with cardinalities (1,1) and (0,N).

Example
Step 5: merge of the initial specification schema with the inverted positive semidifference.
    5: H = MERGE(S1, S3’+)
    [ER diagrams: S1, S3’+ and the merged schema H]

Example
Step 6: difference between H and the inverted negative semidifference.
    6: S2 = DIFF(H, S3’-)
    [ER diagrams: H, S3’- and the result S2, with Project (PCode, Title) and Manager (SSN, EID, Name, Degree)]

Demo

Properties
Model independence
– MIDST handles schemas as instances of subsets of the available metaconstructs.
– The operators are defined as Datalog rules declaring transformations in terms of the supermodel metaconstructs.
– The operators are defined in such a way that they are valid for any model, by specifying comparisons between every available construct.

Properties
Model closure
– A model management operator (except MODELGEN) applied to a set of input schemas of a model M yields output schemas of the same model M.
Model awareness
– Operators can be defined in such a way that they do not add metaconstructs which are not present in the source schemas.

References
[1] P. Atzeni, P. Cappellari and P.A. Bernstein. Modelgen: Model-independent schema translation. In ICDE Conference, pages 1111-1112, 2005.
[2] P. Atzeni, P. Cappellari and G. Gianforme. MIDST: model-independent schema and data translation. In SIGMOD, pages 1134-1136, ACM, 2007.
[3] P. Atzeni, P. Cappellari, R. Torlone, P.A. Bernstein and G. Gianforme. Model-independent schema translation. In VLDB Journal.
[4] P.A. Bernstein. Applying model management to classical meta data problems. In CIDR, pages 209-220, 2003.

Summary
– Model management
– Operators
– Model generic operators
– Operators in MIDST
– Example
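
Steps 1 and 2 of the example can be reproduced with a small Python sketch that treats DIFF and MERGE as set difference and set union over column-level constructs. This is an illustration under stated assumptions, not the MIDST implementation: equivalence is simplified to equality of name, properties and referenced table; the `Column` type and the helpers are hypothetical; and the choice of PCode as the key of Project is assumed (the slides only state that the key of Manager changes from SSN to EID).

```python
from typing import NamedTuple, Optional, Set


class Column(NamedTuple):
    """Hypothetical column-level construct: name, one property, one reference."""
    table: str
    name: str
    is_key: bool = False
    references: Optional[str] = None  # referenced table for a foreign key


def diff(s: Set[Column], s1: Set[Column]) -> Set[Column]:
    """Elements of s with no equivalent element in s1 (set-oriented difference)."""
    return {c for c in s if c not in s1}


def merge(s: Set[Column], s1: Set[Column]) -> Set[Column]:
    """Elements appearing in at least one of s or s1 (set-oriented union)."""
    return s | s1


def show(schema: Set[Column]):
    return sorted(f"{c.table}.{c.name}" for c in schema)


I1 = {
    Column("Project", "PCode", is_key=True),  # key of Project: an assumption
    Column("Project", "Title"),
    Column("Project", "MGRSSN", references="Manager"),
    Column("Manager", "SSN", is_key=True),
    Column("Manager", "EID"),
    Column("Manager", "Name"),
}
I2 = {
    Column("Project", "PCode", is_key=True),
    Column("Project", "Title"),
    Column("Project", "MGRID", references="Manager"),
    Column("Manager", "SSN"),
    Column("Manager", "EID", is_key=True),  # the key of Manager is now EID
    Column("Manager", "Name"),
    Column("Manager", "Degree"),
}

minus = diff(I1, I2)  # negative semidifference, step 1
plus = diff(I2, I1)   # positive semidifference, step 2
print(show(minus))
print(show(plus))
```

SSN and EID of Manager show up in both semidifferences because their identifier property changed, which matches the slides. In this simplified setting, `diff(merge(I1, plus), minus)` gives back I2, the same merge-then-diff pattern that steps 5 and 6 apply at the specification level after REVERSE.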