...

DataMirrorCDC Operator Readme Table of Contents Presented by IBM Software Group -306.ibm.com/software/

by user

on
Category: Documents
30

views

Report

Comments

Transcript

DataMirrorCDC Operator Readme Table of Contents Presented by IBM Software Group -306.ibm.com/software/
DataMirrorCDC Operator Readme
Presented by IBM Software Group
http://www-306.ibm.com/software/
Table of Contents
If you're viewing this document online, you can click any of the topics below to link directly to that section.
1. About DataMirrorCDC Operator .....................................
2. DataMirror and Information Server Integration scenario where
the operator can be used ................................................
3. Stage Properties........................................................
4. Operator Data Flow ....................................................
5. Data Type Mapping ....................................................
6. Using the DataMirrorCDC Operator.................................
7. Example .................................................................
8. Limitations ...............................................................
9. Troubleshooting ........................................................
DataMirrorCDC Operator Readme
2
3
4
6
8
10
11
12
13
Page 1 of 13
http://www-306.ibm.com/software/
Presented by IBM Software Group
Section 1. About DataMirrorCDC Operator
The DataMirrorCDC operator allows you to integrate data between remote source
database and a target database by collecting captured data changes from
Transformation Server. The target database is then updated with the changes. The
DataMirrorCDC operator connects to Transformation Server via a TCP/IP port. The
operator collects the captured delta changes made to source database due to insert,
update or delete operations from Transformation Server in NetChange mirroring mode.
The columns are output so that they can be consumed by the downstream stages (via
Switch stage, Transformer stage etc) in DataStage/QualityStage. The delta changes
can then be applied to the target database.
Page 2 of 13
DataMirrorCDC Operator Readme
Presented by IBM Software Group
http://www-306.ibm.com/software/
Section 2. DataMirror and Information Server Integration
scenario where the operator can be used
The scenario describes how DataMirrorCDC operator allows you to integrate data
between remote source database and a target database using DataMirror's Change
Data Capture technology by streaming data changes into Information Server
DataStage.
1.
2.
3.
4.
DataStage extracts data from source database using standard ETL functions
DataMirrorCDC operator requests the changed data from TS DS over TCP/IP
DataMirror captures changes made to remote database
Transformation Server for WebSphere DataStage sends the captured changes to
custom operator using TCP/IP sockets
5. DataMirrorCDC operator passes the data off to downstream stages
6. Updates are written to target database
DataMirrorCDC Operator Readme
Page 3 of 13
http://www-306.ibm.com/software/
Presented by IBM Software Group
Section 3. Stage Properties
The following properties need to be configured for DataMirrorCDC operator.
debug: Enter True to get more information if you encounter a problem. The default
value is False. Type - Boolean.
dmbindir: Enter the full path to the DataMirror binaries directory on the DataStage
server machine. This directory should contain the dmshowtableinfo.exe and
dmstartnetchangemirror.exe executables for Windows or dmshowtableinfo and
dmstartnetchangemirror executables for Linux. For example, if the DataMirror
Transformation Server for WebSphere DataStage is installed in the default directories,
the value will be "C:\Program Files\DataMirror\Transformation Server for
DataStage\bin\" (without quotes). Type - String.
dmsrcdatastore: Enter the name of the DataMirror source datastore for the
subscription dmsub. This value is case sensitive. Type - String.
dmsub: Enter the name of the DataMirror subscription that is providing the change
data. This value is case sensitive. Type - String.
dmtgtdatastore: Enter the name of the DataMirror target datastore for the subscription
dmsub. This value is case sensitive. Type - String.
dsport: Enter a port that this stage instance can use to communicate with the
DataMirror Transformation Server. Verify that this port is not used by any other
standard programs or DataStage jobs using another instance of this operator. This port
should be same as the port that was specified when configuring the subscription
dmsub for Direct Connect method. Type - Integer.
recordtype: Enter the value as "Single" (without quotes) when the subscription dmsub
is configured with Single Record format. The value is "Multiple" without quote when the
subscription dmsub is configured with Multiple Record format. Note that Multiple
Record format is not supported in this release. The values are case sensitive. Type String.
sourcetable: Enter the fully qualified name of the table in the subscription that you
want to retrieve the changed data from. Value is case sensitive. Type - String.
tempdir: Enter the full path for a writable working directory that can be used by this
stage for storing temporary data. Type - String.
dsautostartTS: Enter False to start mirroring manually either from DataMirror
Transformation Server Management Console or Transformation Server for WebSphere
DataStage. For the value False, the job starts, turns blue, and waits for user to start
mirroring manually. Default is True when this stage starts the mirroring automatically.
Type - Boolean.
Page 4 of 13
DataMirrorCDC Operator Readme
Presented by IBM Software Group
http://www-306.ibm.com/software/
In the screen capture below, the DataMirrorCDC operator properties are shown with
some example values.
DataMirrorCDC Operator Readme
Page 5 of 13
http://www-306.ibm.com/software/
Presented by IBM Software Group
Section 4. Operator Data Flow
The DataMirrorCDC operator does not have input data sets. The operator receives
multiple input messages via a TCP/IP connection from Transformation Server. The
DataMirrorCDC operator processes the input messages and outputs database
operations and data. The operator should be run in sequential mode.
The number of output columns is dynamic since it depends on the source table
schema. The DataMirrorCDC operator generates the columns in the order as shown
below. The first five columns are constants.
• DM_OPERATION_ID
• DM_TIMESTAMP
• DM_TXID
• DM_OPERATION_TYPE
• DM_USER
The rest of the columns are a set of before-image source table columns and a set of
after-image source table columns in order. For insert operation, before-image column
values are null. For delete operation, after-image column values are null.
DM_OPERATION_ID: The column values are 0, 1 or 2. 0 indicates an insert operation,
1 indicates an update operation, and 2 indicates a delete operation. The column type is
numeric, so it can be used as the selector ID when a Switch stage is used in the job.
DM_TIMESTAMP: Timestamp.
DM_TXID: Transaction ID.
DM_OPERATION_TYPE: Operation type. Values are I for insert, U for update and D
for delete. The column type is varchar.
DM_USER: Source table owner.
BEFORE_COLUMN1: First column in the source table that contains the value before
the change happened.
BEFORE_COLUMN2: Second column in the source table that contains the value
before the change happened.
:
:
BEFORE_COLUMNn: nth column in the source table that contains the value before
the change happened.
AFTER_COLUMN1: First column in the source table that contains the value after the
Page 6 of 13
DataMirrorCDC Operator Readme
Presented by IBM Software Group
http://www-306.ibm.com/software/
change happened.
AFTER_COLUMN2: Second column in the source table that contains the value after
the change happened.
:
:
AFTER_COLUMNn: nth column in the source table that contains the value after the
change happened.
DataMirrorCDC Operator Readme
Page 7 of 13
http://www-306.ibm.com/software/
Presented by IBM Software Group
Section 5. Data Type Mapping
The following table defines the mapping between source type and operator field type.
Source Data Type
Operator Field Data Type
CHAR
String
CHARACTER
String
VARCHAR
String
LONG VARCHAR
String
GRPHIC
String
VARGRAPHIC
String
LONG GRAPHIC
String
CLOB
String
NCHAR
Ustring
NVARCHAR
Ustring
BIT
8-bit integer
TINYINT
8-bit integer
SMALLINT
16-bit integer
INTEGER
32-bit integer
BIGINT
64-bit integer
DECIMAL
String
NUMERIC
String
REAL
Single-precision float
DOUBLE
Double-precision float
DATE
Date
TIME
Time
TIMESTAMP
Timestamp
BLOB
Raw
BINARY
Raw
VARBINARY
Raw
LONGVARBINARY
Raw
CHAR FOR BIT DATA
Raw
VARCHAR FOR BIT DATA
Raw
LONG VARCHAR FOR BIT DATA
Raw
Page 8 of 13
DataMirrorCDC Operator Readme
Presented by IBM Software Group
UNKNOWN
DataMirrorCDC Operator Readme
http://www-306.ibm.com/software/
String
Page 9 of 13
http://www-306.ibm.com/software/
Presented by IBM Software Group
Section 6. Using the DataMirrorCDC Operator
DataMirrorCDC operator runs on Windows and Linux RedHat platforms that are
currently supported by Information Server 8.0.1 FP1.
Using this stage requires IBM DataMirror Transformation Server for WebSphere
DataStage Version 6.2 to be installed on the same system where IBM Information
Server is running. For installing and configuring IBM DataMirror Transformation Server
for WebSphere DataStage, refer to the End-User Documentation for IBM DataMirror
Transformation Server for WebSphere DataStage. For information on configuring a
subscription for the DataMirrorCDC operator, see the documentation for
Transformation Server Management Console.
Configuring subscription in Transformation Server Management Console for the
DataMirrorCDC operator requires the following methods to be used in Transformation
Server Management Console:
• Mapping Type for Websphere DataStage
• Connection Method Direct Connect
• Record Format Single Record
IBM DataMirror provides a simple way to generate the DataStage jobs automatically
with all of the required parameters for the DataMirrorCDC operator filled in. The
DataStage job can be configured by using the Transformation Server Management
Console drop-down Generate WebSphere DataStage Definition option. This action
results in the generation of a DataStage specific import file. This file can then be
imported into DataStage using the Repository import function. The resulting
generated job can then be modified to process the captured data using any standard
DataStage feature.
Page 10 of 13
DataMirrorCDC Operator Readme
Presented by IBM Software Group
http://www-306.ibm.com/software/
Section 7. Example
In the example below, we have created a job that uses the DataMirrorCDC operator to
retrieve changes made to a DataMirror managed table. The changes that are returned
from the operator are processed by the Switch operator. This SWITCH operator will
determine whether and insert, update or delete was detected and then pass it on to the
appropriate link so that the same action can be applied to a target table
The Switch properties are shown below.
DataMirrorCDC Operator Readme
Page 11 of 13
http://www-306.ibm.com/software/
Presented by IBM Software Group
Section 8. Limitations
Maximum number of columns in source table: 1024 bytes
Maximum column name size: 128 bytes
Maximum message size: 65536 bytes
Maximum row size: 32256 bytes
COMMIT Scope: This operator does not support the unit of work indicators sent by
DataMirror. You need to make sure that outbound stages deal appropriately with
committing changes to the target on regular intervals. Using the example in the
previous section, the DB2 Enterprise stage has to be updated to commit after
processing 2000 records as shown below.
NULL values are not supported.
Page 12 of 13
DataMirrorCDC Operator Readme
Presented by IBM Software Group
http://www-306.ibm.com/software/
Section 9. Troubleshooting
1. Job is blue and seems to be hanging.
Solution: The job may be hanging because dsautostartTS is set to False. Change
dsautostartTS to True to start mirroring automatically.
2. Job fails or aborts.
Set debug=True in the stage properties to get more information in the event log. To
help resolve the issue, contact IBM with the information in the event log.
3. After importing the generated DSX file into DataStage, if you want to use a different
stage other than Sequential File stage, drop the Sequential File stage without dropping
the link so that column names are saved in the output link.
4. If you will be writing the captured changes from the DataMirrorCDC operator to a
relational database using DataStage, you need to structure the DataStage job to
sequence the changes properly. Since the DataMirrorCDC operator will output the
Inserts, Updates and Deletes operations in the order that they were captured by
DataMirror, you should develop your DataStage job to make sure that this order is
maintained to ensure the operations also occur in the correct sequence when updating
the database table.
Colophon
This tutorial was written entirely in XML, using the developerWorks Toot-O-Matic tutorial
generator. The open source Toot-O-Matic tool is an XSLT style sheet and several XSLT
extension functions that convert an XML file into a number of HTML pages, a zip file, JPEG
heading graphics, and two PDF files. Our ability to generate multiple text and binary formats
from a single source file illustrates the power and flexibility of XML. (It also saves our
production team a great deal of time and effort.)
DataMirrorCDC Operator Readme
Page 13 of 13
Fly UP