DataMirrorCDC Operator Readme Table of Contents Presented by IBM Software Group -306.ibm.com/software/
by user
Comments
Transcript
DataMirrorCDC Operator Readme Table of Contents Presented by IBM Software Group -306.ibm.com/software/
DataMirrorCDC Operator Readme Presented by IBM Software Group http://www-306.ibm.com/software/ Table of Contents If you're viewing this document online, you can click any of the topics below to link directly to that section. 1. About DataMirrorCDC Operator ..................................... 2. DataMirror and Information Server Integration scenario where the operator can be used ................................................ 3. Stage Properties........................................................ 4. Operator Data Flow .................................................... 5. Data Type Mapping .................................................... 6. Using the DataMirrorCDC Operator................................. 7. Example ................................................................. 8. Limitations ............................................................... 9. Troubleshooting ........................................................ DataMirrorCDC Operator Readme 2 3 4 6 8 10 11 12 13 Page 1 of 13 http://www-306.ibm.com/software/ Presented by IBM Software Group Section 1. About DataMirrorCDC Operator The DataMirrorCDC operator allows you to integrate data between remote source database and a target database by collecting captured data changes from Transformation Server. The target database is then updated with the changes. The DataMirrorCDC operator connects to Transformation Server via a TCP/IP port. The operator collects the captured delta changes made to source database due to insert, update or delete operations from Transformation Server in NetChange mirroring mode. The columns are output so that they can be consumed by the downstream stages (via Switch stage, Transformer stage etc) in DataStage/QualityStage. The delta changes can then be applied to the target database. Page 2 of 13 DataMirrorCDC Operator Readme Presented by IBM Software Group http://www-306.ibm.com/software/ Section 2. DataMirror and Information Server Integration scenario where the operator can be used The scenario describes how DataMirrorCDC operator allows you to integrate data between remote source database and a target database using DataMirror's Change Data Capture technology by streaming data changes into Information Server DataStage. 1. 2. 3. 4. DataStage extracts data from source database using standard ETL functions DataMirrorCDC operator requests the changed data from TS DS over TCP/IP DataMirror captures changes made to remote database Transformation Server for WebSphere DataStage sends the captured changes to custom operator using TCP/IP sockets 5. DataMirrorCDC operator passes the data off to downstream stages 6. Updates are written to target database DataMirrorCDC Operator Readme Page 3 of 13 http://www-306.ibm.com/software/ Presented by IBM Software Group Section 3. Stage Properties The following properties need to be configured for DataMirrorCDC operator. debug: Enter True to get more information if you encounter a problem. The default value is False. Type - Boolean. dmbindir: Enter the full path to the DataMirror binaries directory on the DataStage server machine. This directory should contain the dmshowtableinfo.exe and dmstartnetchangemirror.exe executables for Windows or dmshowtableinfo and dmstartnetchangemirror executables for Linux. For example, if the DataMirror Transformation Server for WebSphere DataStage is installed in the default directories, the value will be "C:\Program Files\DataMirror\Transformation Server for DataStage\bin\" (without quotes). Type - String. dmsrcdatastore: Enter the name of the DataMirror source datastore for the subscription dmsub. This value is case sensitive. Type - String. dmsub: Enter the name of the DataMirror subscription that is providing the change data. This value is case sensitive. Type - String. dmtgtdatastore: Enter the name of the DataMirror target datastore for the subscription dmsub. This value is case sensitive. Type - String. dsport: Enter a port that this stage instance can use to communicate with the DataMirror Transformation Server. Verify that this port is not used by any other standard programs or DataStage jobs using another instance of this operator. This port should be same as the port that was specified when configuring the subscription dmsub for Direct Connect method. Type - Integer. recordtype: Enter the value as "Single" (without quotes) when the subscription dmsub is configured with Single Record format. The value is "Multiple" without quote when the subscription dmsub is configured with Multiple Record format. Note that Multiple Record format is not supported in this release. The values are case sensitive. Type String. sourcetable: Enter the fully qualified name of the table in the subscription that you want to retrieve the changed data from. Value is case sensitive. Type - String. tempdir: Enter the full path for a writable working directory that can be used by this stage for storing temporary data. Type - String. dsautostartTS: Enter False to start mirroring manually either from DataMirror Transformation Server Management Console or Transformation Server for WebSphere DataStage. For the value False, the job starts, turns blue, and waits for user to start mirroring manually. Default is True when this stage starts the mirroring automatically. Type - Boolean. Page 4 of 13 DataMirrorCDC Operator Readme Presented by IBM Software Group http://www-306.ibm.com/software/ In the screen capture below, the DataMirrorCDC operator properties are shown with some example values. DataMirrorCDC Operator Readme Page 5 of 13 http://www-306.ibm.com/software/ Presented by IBM Software Group Section 4. Operator Data Flow The DataMirrorCDC operator does not have input data sets. The operator receives multiple input messages via a TCP/IP connection from Transformation Server. The DataMirrorCDC operator processes the input messages and outputs database operations and data. The operator should be run in sequential mode. The number of output columns is dynamic since it depends on the source table schema. The DataMirrorCDC operator generates the columns in the order as shown below. The first five columns are constants. • DM_OPERATION_ID • DM_TIMESTAMP • DM_TXID • DM_OPERATION_TYPE • DM_USER The rest of the columns are a set of before-image source table columns and a set of after-image source table columns in order. For insert operation, before-image column values are null. For delete operation, after-image column values are null. DM_OPERATION_ID: The column values are 0, 1 or 2. 0 indicates an insert operation, 1 indicates an update operation, and 2 indicates a delete operation. The column type is numeric, so it can be used as the selector ID when a Switch stage is used in the job. DM_TIMESTAMP: Timestamp. DM_TXID: Transaction ID. DM_OPERATION_TYPE: Operation type. Values are I for insert, U for update and D for delete. The column type is varchar. DM_USER: Source table owner. BEFORE_COLUMN1: First column in the source table that contains the value before the change happened. BEFORE_COLUMN2: Second column in the source table that contains the value before the change happened. : : BEFORE_COLUMNn: nth column in the source table that contains the value before the change happened. AFTER_COLUMN1: First column in the source table that contains the value after the Page 6 of 13 DataMirrorCDC Operator Readme Presented by IBM Software Group http://www-306.ibm.com/software/ change happened. AFTER_COLUMN2: Second column in the source table that contains the value after the change happened. : : AFTER_COLUMNn: nth column in the source table that contains the value after the change happened. DataMirrorCDC Operator Readme Page 7 of 13 http://www-306.ibm.com/software/ Presented by IBM Software Group Section 5. Data Type Mapping The following table defines the mapping between source type and operator field type. Source Data Type Operator Field Data Type CHAR String CHARACTER String VARCHAR String LONG VARCHAR String GRPHIC String VARGRAPHIC String LONG GRAPHIC String CLOB String NCHAR Ustring NVARCHAR Ustring BIT 8-bit integer TINYINT 8-bit integer SMALLINT 16-bit integer INTEGER 32-bit integer BIGINT 64-bit integer DECIMAL String NUMERIC String REAL Single-precision float DOUBLE Double-precision float DATE Date TIME Time TIMESTAMP Timestamp BLOB Raw BINARY Raw VARBINARY Raw LONGVARBINARY Raw CHAR FOR BIT DATA Raw VARCHAR FOR BIT DATA Raw LONG VARCHAR FOR BIT DATA Raw Page 8 of 13 DataMirrorCDC Operator Readme Presented by IBM Software Group UNKNOWN DataMirrorCDC Operator Readme http://www-306.ibm.com/software/ String Page 9 of 13 http://www-306.ibm.com/software/ Presented by IBM Software Group Section 6. Using the DataMirrorCDC Operator DataMirrorCDC operator runs on Windows and Linux RedHat platforms that are currently supported by Information Server 8.0.1 FP1. Using this stage requires IBM DataMirror Transformation Server for WebSphere DataStage Version 6.2 to be installed on the same system where IBM Information Server is running. For installing and configuring IBM DataMirror Transformation Server for WebSphere DataStage, refer to the End-User Documentation for IBM DataMirror Transformation Server for WebSphere DataStage. For information on configuring a subscription for the DataMirrorCDC operator, see the documentation for Transformation Server Management Console. Configuring subscription in Transformation Server Management Console for the DataMirrorCDC operator requires the following methods to be used in Transformation Server Management Console: • Mapping Type for Websphere DataStage • Connection Method Direct Connect • Record Format Single Record IBM DataMirror provides a simple way to generate the DataStage jobs automatically with all of the required parameters for the DataMirrorCDC operator filled in. The DataStage job can be configured by using the Transformation Server Management Console drop-down Generate WebSphere DataStage Definition option. This action results in the generation of a DataStage specific import file. This file can then be imported into DataStage using the Repository import function. The resulting generated job can then be modified to process the captured data using any standard DataStage feature. Page 10 of 13 DataMirrorCDC Operator Readme Presented by IBM Software Group http://www-306.ibm.com/software/ Section 7. Example In the example below, we have created a job that uses the DataMirrorCDC operator to retrieve changes made to a DataMirror managed table. The changes that are returned from the operator are processed by the Switch operator. This SWITCH operator will determine whether and insert, update or delete was detected and then pass it on to the appropriate link so that the same action can be applied to a target table The Switch properties are shown below. DataMirrorCDC Operator Readme Page 11 of 13 http://www-306.ibm.com/software/ Presented by IBM Software Group Section 8. Limitations Maximum number of columns in source table: 1024 bytes Maximum column name size: 128 bytes Maximum message size: 65536 bytes Maximum row size: 32256 bytes COMMIT Scope: This operator does not support the unit of work indicators sent by DataMirror. You need to make sure that outbound stages deal appropriately with committing changes to the target on regular intervals. Using the example in the previous section, the DB2 Enterprise stage has to be updated to commit after processing 2000 records as shown below. NULL values are not supported. Page 12 of 13 DataMirrorCDC Operator Readme Presented by IBM Software Group http://www-306.ibm.com/software/ Section 9. Troubleshooting 1. Job is blue and seems to be hanging. Solution: The job may be hanging because dsautostartTS is set to False. Change dsautostartTS to True to start mirroring automatically. 2. Job fails or aborts. Set debug=True in the stage properties to get more information in the event log. To help resolve the issue, contact IBM with the information in the event log. 3. After importing the generated DSX file into DataStage, if you want to use a different stage other than Sequential File stage, drop the Sequential File stage without dropping the link so that column names are saved in the output link. 4. If you will be writing the captured changes from the DataMirrorCDC operator to a relational database using DataStage, you need to structure the DataStage job to sequence the changes properly. Since the DataMirrorCDC operator will output the Inserts, Updates and Deletes operations in the order that they were captured by DataMirror, you should develop your DataStage job to make sure that this order is maintained to ensure the operations also occur in the correct sequence when updating the database table. Colophon This tutorial was written entirely in XML, using the developerWorks Toot-O-Matic tutorial generator. The open source Toot-O-Matic tool is an XSLT style sheet and several XSLT extension functions that convert an XML file into a number of HTML pages, a zip file, JPEG heading graphics, and two PDF files. Our ability to generate multiple text and binary formats from a single source file illustrates the power and flexibility of XML. (It also saves our production team a great deal of time and effort.) DataMirrorCDC Operator Readme Page 13 of 13