White Paper Best Practices and Performance Guidelines for IBM InfoSphere® Information Server

by user

on 15 сентября 2016

Category: Documents

>> Downloads: 4

views

Report

Comments

Description

Download White Paper Best Practices and Performance Guidelines for IBM InfoSphere® Information Server

Transcript

White Paper Best Practices and Performance Guidelines for IBM InfoSphere® Information Server

White Paper
Best Practices and Performance Guidelines for
IBM InfoSphere® Information Server
running on Intel® Xeon® Servers providing connectivity to
IBM Netezza® Data Warehouse Servers
Date: March 21, 2012
Authors:
Garrett Drysdale, Intel® Corporation
Sriram Padmanabhan, IBM
Branislav Barnak, IBM
Brian Caufield, IBM
Tony Curcio, IBM
Jon Deng, IBM
David Li, IBM
Liang Lu, IBM
Mi Wan Shum, IBM
John Skier, IBM Netezza
Samuel Wong, IBM
1
Table of Contents
Best Practices and Performance Guidelines for IBM InfoSphere® Information Server running on Intel® Xeon®
Servers providing connectivity to IBM Netezza® Data Warehouse Servers ......................................................... 1
1.
Introduction...................................................................................................................................... 3
2.
Overview of IBM InfoSphere Information Server .................................................................................... 3
2.1
IBM InfoSphere Information Server Delivers New, World-class Connectivity to IBM Netezza Data
Warehouse Appliances ........................................................................................................................... 4
2.2
New IBM InfoSphere Information Server Connector for IBM Netezza: Helping Customers Achieve Higher
Performance and Increased Developer efficiency....................................................................................... 5
2.2.1
Increased Developer Efficiency................................................................................................. 5
2.2.2
Achieve Higher Performance.................................................................................................... 5
2.3
New IBM InfoSphere Information Server Balanced Optimization for IBM Netezza ................................ 5
3.
Overview of Intel® Xeon® Series E7 Processors and Intel 10GbE Ethernet Network Adapter ........................ 6
3.1
Performance Scaling When Upgrading IBM InfoSphere Information Server and Intel® Xeon® Servers ..... 8
4.
Overview of IBM Netezza 1000 Series Data Warehouse Appliance ........................................................... 9
5.
Benchmarking Performance Results of Load and Unload Operations between IBM InfoSphere Information
Server and IBM Netezza 1000 Series Data Warehouse Appliance.................................................................... 11
5.1
Test Configuration..................................................................................................................... 11
5.2
Netezza Connector Load Performance Results .............................................................................. 12
5.3
Netezza Connector Unload Performance Results ........................................................................... 13
6.
Information Server Jobs Used for Testing............................................................................................ 14
7.
Standard Practices and Guidelines for Effective Performance ................................................................ 15
7.1
How to Avoid Partial Loads ......................................................................................................... 15
7.2
How to Manage Temporary Work Tables (TWTs) ........................................................................... 16
7.3
How to Use the Temporary Work Table with User-defined SQL ....................................................... 19
7.4
How to Setup Parallel Reads ....................................................................................................... 20
7.5
How to Use the Action Column Mode .......................................................................................... 21
7.6
Transaction Handling ................................................................................................................. 25
7.7
How to Use Quoted IDs.............................................................................................................. 28
7.8
Additional Best Practices for Optimal Performance........................................................................ 31
8.
Conclusion...................................................................................................................................... 32
9.
Acknowledgements ......................................................................................................................... 33
10.
About the Authors..................................................................................................................... 34
11.
Intel Legal Disclaimer................................................................................................................. 36
12.
IBM Legal Disclaimer ................................................................................................................. 37
2
1. Introduction
IBM InfoSphere® Information Server, Intel® with the latest Intel® Xeon® 4 and 8 socket capable Xeon® E7 series
processors and IBM Netezza® Performance Server data warehouse appliance work together to deliver breakthrough analytics based on more trustworthy information. The IBM Netezza data warehousing appliance provides
high-performance analytics to businesses. Since it is an appliance, it is easy to deploy and manage. IBM InfoSphere
Information Server helps ensure that information loaded into the IBM Netezza data warehouse appliance is comprehensive, accurate, timely and in context. This allows clients to rapidly drive innovation and improve business
results. IBM InfoSphere Information Server is hosted on Intel® Xeon® E7 series processors which provide a highly
scalable and reliable computing platform aimed at mission critical environments.
The purpose of this white paper is to help you become familiar with the new and comprehensive integration that is
now available between IBM InfoSphere Information Server and IBM Netezza data warehouse appliance. We summarize exciting results of internal benchmarking tests focused on Load and Unload performance when using the
new InfoSphere Information Server connector for IBM Netezza.
This paper is one of a series of white papers intended to provide IBM InfoSphere Information Server customers
with helpful performance tuning guidelines for deployment on Intel® Xeon® processor based platforms. We tested
connectivity between IBM InfoSphere Information Server running on Intel® Xeon® E7-4870 processors in a 4 socket
configuration and an IBM Netezza 1000-12 data warehouse appliance. The goal was to optimize performance of
the data transfer and load capabilities between these two platforms, which are critical for business intelligence.
Part of this work involved the creation and optimization of the new Information Server connector for IBM Netezza.
We show how to best utilize the new Netezza Connector to optimize performance between IBM Netezza Appliances and Information Server. We also share many of the important new features provided by this connector.
We demonstrate over 2.3 TB/hour of load and unload performance between Information Server running on a four
socket server using Intel® Xeon® E7 processors and an IBM Netezza 1000-12 data warehouse appliance running
version 6.0.2 of the IBM Netezza Appliance software. To attain this performance, Intel® 10 Gbit/s Ethernet Adapters are utilized with over 6 Gbits/s of bandwidth utilization. The IBM Netezza 1000 series is rated to attain in excess of 2TB/hour of load and unload performance which we were able to achieve. While over 2TB/hour of load
and unload performance is impressive, it is only the beginning in our quest of tremendous load and unload performance. We are confident that Information Server to IBM Netezza performance will increase substantially when
anticipated enhancements are available. This is due to the substantial network bandwidth and CPU utilization
headroom available on the Intel® Xeon® E7 servers in the current test configurations. In addition, Information
Server running on Intel® Xeon® E7 processor based systems allows scaling out to multiple systems as demand on
processing capacity increases. This allows performance capability to be increased as customer requirements grow.
2. Overview of IBM InfoSphere Information Server
IBM InfoSphere Information Server offers data integration through extract, transform, and load (ETL) capabilities.
It provides a designer tool that allows developers to visually create integration jobs. A job is used within IBM
InfoSphere Information Server to describe ETL tasks. Jobs are composed from a rich palette of reusable functions
called stages. These stages include:
•
Source and target access for databases, data warehouse appliances, applications and files
•
General processing stages such as filter, sort, join, union, lookup and aggregations
•
Built-in and custom transformations
3
•
Copy, move, FTP and other data movement stages
•
Real-time, XML, SOA and message queue processing
•
Stages for accessing Big Data, running BRMS rules, and integrating with custom built Java components
Additionally, IBM InfoSphere Information Server allows pre-conditions and post-conditions to be applied to the
stages. Multiple jobs can be controlled and linked by a sequencer. The sequencer provides the control logic that
can be used to process the appropriate data integration jobs. IBM InfoSphere Information Server also supports a
rich administration capability for deploying, scheduling and monitoring jobs.
One of the great strengths of IBM InfoSphere Information Server is that very little consideration to the underlying
structure of the system is necessary to design integration jobs. In most cases, if the system changes, is upgraded
or improved, or if a job is developed on one platform and implemented on another, the job design will not require
changes. IBM InfoSphere Information Server has the capability to learn about the shape and size of the system
from the configuration file. Further, it has the capability to organize the resources needed for a job according to
what is defined in the configuration file. When a system changes, the configuration file is changed, not the jobs. A
configuration file defines the number of data partitions (or “processing nodes”) with which the job will run. The
processing nodes are logical rather than physical.
The following are factors that affect the optimal degree of parallelism:
• CPU-intensive applications, which typically perform multiple CPU-demanding operations on each record, benefit
from the greatest possible parallelism up to the capacity supported by a given system.
• Jobs with large memory requirements can benefit from parallelism if they act on data that has been partitioned
and if the required memory is also divided among partitions.
• Applications that make intensive use of disk or other I/O resources, such as those that extract data from and load
data into databases, benefit from configurations in which the number of logical nodes equals the number of I/O
paths being accessed.
Another great strength of IBM InfoSphere Information Server is that it does not rely on the functions and processes of a database to perform transformations to scale and optimize processing. While it can generate complex
SQL queries and leverage database processing capability, it has been designed from the ground up as a multipath
data integration engine equally at home with files, streams, databases, and internal caching in single-machine,
cluster, and grid implementations. As a result, customers in many circumstances find they do not also need to invest in staging databases to support scaling their data integration environment. For additional information on Information Server jobs and stages, please see IBM InfoSphere Information Server Version 8.7 product documentation or IBM InfoSphere DataStage and QualityStage 8.7 Information Center.
.
2.1
IBM InfoSphere Information Server Delivers New, World-class Connectivity to IBM Netezza Data Warehouse Appliances
The new IBM InfoSphere Information Server connector for IBM Netezza, which is delivered for the Information
Server 8.5 and 8.7 releases, helps you realize faster time to value with IBM Netezza products running releases 4.6.x
and 6.x. The new connector provides the latest native connectivity features for parallel reading and writing of data,
and delivers capabilities to push various levels of transformation work into the IBM Netezza appliance. These new
features allow users to achieve higher performance and increase developer efficiency without requiring complex
custom coding.
4
2.2
New IBM InfoSphere Information Server Connector for IBM Netezza:
Helping Customers Achieve Higher Performance and Increased Developer
efficiency
2.2.1 Increased Developer Efficiency
The new, next-generation IBM InfoSphere Information Server connector for IBM Netezza provides a rich set of outof-the-box features. The connector selects the best-practice connectivity options by default, which generally allows it to optimize performance. To this end, the connector is able to:
•
Auto create a Temporary Work Table (TWT)
•
Auto generate DDL and load SQL
•
Use a TWT table as a default instead of direct insert
•
Use a random distribution key when creating a table
•
Use Action columns for propagating updates
When a developer wants to customize their data integration choices, the Netezza Connector provides those customization controls through easy-to-use check-box and drop-down configuration options.
The IBM InfoSphere Information Server connector for IBM Netezza also offers many of the automation features
that are common across the other Information Server connector objects. These options allow developers to solve
complex problems with simple choice selections. These options include:
• Multi-action modes that automatically attempt a secondary operation, including “Update then insert”, “Insert then Update” and “Delete then insert”, if the first action results in an error.
• Generated SQL for both the table actions as well as for creating the database objects based on the data fields
being passed through the data integration job.
• Partitioned Reads through scaling out multiple read requests and automatically merging those results.
2.2.2 Achieve Higher Performance
IBM InfoSphere Information Server and IBM Netezza data warehouse appliances are both massively scalable technologies and scale at impressive rates. Internal benchmarks of these technologies working together demonstrate
loading speeds in excess of 2 TB/hour, which allows organizations to keep up with incredibly demanding data volumes. These new speeds for loading and unloading data to and from the IBM Netezza data warehouse appliance
are a 10 – 15% increase in performance over the previous version of InfoSphere Information Server.
In many data integration patterns, it is common to have data that is already in the database that needs to be transformed. A typical example is when a user needs to aggregate a detail table into a summarized set. In this case, it is
often most efficient to allow the database to perform these transformation tasks. For this reason it is important
that a data integration technology can support multiple processing styles, both ETL (running transformations in a
scalable engine) as well as ELT (running the transformations on the database itself).
2.3
New IBM InfoSphere Information Server Balanced Optimization for IBM
Netezza
Data integration employs various runtime paradigms for optimizing performance. Combining both ETL and ELT
styles of processing allows organizations to apply the best solution for any given problem. In order to achieve
these benefits with IBM Netezza data warehouse appliances, Information Server 8.7 has extended the balanced
optimization capabilities to be able to run data transformations on the appliance.
IBM InfoSphere Information Server Balanced Optimization facilitates the creation of scalable, flexible, and optimized data integration architectures that leverage investments in parallel databases while increasing the efficiency
of integration developers. It enables users to express the logic of their data integration processes in current natural
5
flow-oriented Information Server conventions and automatically optimize their designs to achieve the benefits of
extract, load, transform and SQL pushdown.
Balanced Optimization provides the flexibility to leverage the most effective processing capabilities available for
faster integration and delivery of information, regardless of whether the processing takes place in the IBM Netezza
database or the Information Server transformation engine.
The new connector for IBM Netezza works with the Balanced Optimization technology to provide capabilities that:
• Design the job logic in the common Information Server paradigm and automate which pieces of job logic
can be performed on the database
• Significantly reduce the amount of data movement across the network by allowing the database to perform data reduction tasks
• Maximize optimization where source and target databases are homogenous
• Balance the processing usage on IBM Netezza with transformations on the Information Server engine for
the unique transformations and quality elements it provides – all without ever landing data to disk and
while maintaining parallel processing
3. Overview of Intel® Xeon® Series E7 Processors and Intel 10GbE Ethernet
Network Adapter
Servers using the Intel® Xeon® E7 series processor can deliver dramatic increases in performance and scalability
when compared to previous generation servers. The chipset includes new embedded technologies that give professionals in business, information management, creative, and scientific fields the tools to solve problems faster,
process larger data sets, and address bigger challenges.
With intelligent performance, a new high-bandwidth interconnect architecture, and greater memory capacity, platforms based on the Intel® Xeon® series E7 processor are ideal for demanding workloads. A standard four-socket
server provides up to 40 processor cores, 80 execution threads and up to two terabytes of memory. Eight-socket
and larger systems are in development by leading system vendors. The Intel® Xeon® series E7 processor also includes extensive reliability, availability and serviceability (RAS) features that improve data integrity and uptime.
One of the most important is Intel® Machine Check Architecture Recovery, which allows the operating system to
take corrective action and continue running when uncorrected errors are detected. These highly scalable servers
can be used to support enormous user populations.
Server platforms based on the Intel® Xeon® series E7 processor deliver a number of additional features that help to
improve performance, scalability and energy-efficiency.
• Next-generation Intel® Virtualization Technology (Intel® VT) provides extensive hardware assists in processors, chipsets and I/O devices to enable fast application performance in virtual machines, including
near-native I/O performance. Intel® VT also supports live virtual machine migration among current and future Intel® Xeon® processor-based servers, so businesses maintain a common pool of virtualized resources as they add new servers.
• Intel® QuickPath Interconnect Technology provides point-to-point links to distributed shared memory.
The Intel® Xeon® E7 series processors with QPI feature two integrated memory controllers with and 3 QPI
links to deliver scalable interconnect bandwidth, outstanding memory performance and flexibility and
tightly integrated interconnect RAS features. Technical articles on QPI can be found at
http://www.intel.com/technology/quickpath/.
• Intel® Turbo Boost Technology boosts performance when it’s needed most by dynamically increasing core
frequencies beyond rated values for peak workloads.
• Intel® Intelligent Power Technology adjusts core frequencies to conserve power when demand is lower.
• Intel® Hyper-Threading Technology can improve throughput and reduce latency for multithreaded applications and for multiple workloads running concurrently in virtualized environments.
6
•
New instruction support for AES encryption and decryption providing significant increases in performance
of these critical algorithms.
For additional information on the Intel® Xeon® Series E7 Processor for mission critical applications, please see
http://www.intel.com/content/www/us/en/processors/xeon/xeon-processor-e7family.html?iid=subhdr%20itc_xeon.
Intel® 7500 Chipset
Xeon® E7
Xeon® E7
Xeon® E7
Xeon® E7
Memory
Intel® Scalable
Memory Buffer
PCI Express* 2.0
Figure 1 - Intel® Xeon® E7 four socket system block diagram.
Many integration solutions for IBM Netezza data warehouse appliances such as the one described in this paper
produce high network bandwidth requirements exceeding ~6 Gbits/s. For this reason, Intel 10 GbE Ethernet Server
Adapters are used for these system topologies. 10 Gigabit Ethernet has moved past the early adoption stage and
is rapidly becoming main stream for backbones within enterprise, data center and service provider networks. The
escalating deployments of servers with multi-core processors and demanding applications such as High Performance Computing (HPC), database clusters, business intelligence and business analytics are driving the need for 10
Gigabit connections. Intel® 10 Gigabit Ethernet controller Adapters are designed to meet the throughput and latency requirements of bandwidth-hungry applications, while offering a very low power envelope for energy efficiency. Intel® 10 Gigabit Server Adapters are available in single and dual port configurations and provide a simplified alternative to multiple 1 Gbps server adapters. Intel® 10 GbE adapters provide load balancing across multiple
CPUs, multiple receive and transmit queues.
Virtualization performance is enhanced through Intel® VMDq technology which provides virtual machine device
queues to significantly reduce virtualization overhead. Intel 10 Gigabit XF Server Adapters are optimized for virtualized environments, supporting multiple queues, alleviating I/O bottlenecks between virtual machines. Virtual Machine Device queue1 (VMDq) technology offloads data sorting and data copying from the virtual machine monitor
(VMM) software layer to the hardware, improving overall throughput and CPU utilization on virtualized servers.
7
Additionally, Intel 10 Gigabit XF Server Adapters enable Intel® I/O Acceleration Technology (Intel® I/OAT) with support for Intel® QuickData for faster I/O processing on the new Quad-Core and Dual-Core Intel® Xeon® processorbased servers.
The tests shown in this paper are performed with native operating system configurations. However, Intel E7 processors and platforms support a wide range of virtualization features to provide great performance for virtualized
operating environments.
3.1
Performance Scaling When Upgrading IBM InfoSphere Information
Server and Intel® Xeon® Servers
There are significant benefits of upgrading hardware and software concurrently. Figure 2 shows that upgrading
both hardware and software together resulted in a 1.50x performance increase in throughput.
The horizontal axis has IBM InfoSphere Information Server v8.1 on the left and v8.5 on the right. The vertical axis
has the older Intel® Xeon® X7560 (code name Nehalem-EX, 2.26 GHz, 8 cores per socket, 16 threads per socket, 24
MB Last level cache) on the bottom, and the Intel® Xeon® E7-4870 (code name Westmere EX, 2.4 GHz, 10 cores per
socket, 20 threads per socket, 30 MB last level cache) on the top. Upgrading from v8.1 to v8.5 of IBM InfoSphere
Information Server on either processor resulted in a 1.11x performance gain. Upgrading the hardware from the
Intel® Xeon® X7560 processor system to the Intel® Xeon® E7-4870 processor system resulted in a 1.35x performance gain for both versions of IBM InfoSphere Information Server.
Figure 2 - Upgrading to the latest Intel Processors and latest IBM Information Server concurrently
yielded a 1.50x gain in throughput.
8
4. Overview of IBM Netezza 1000 Series Data Warehouse Appliance
The IBM Netezza 1000 series (http://www.netezza.com/data-warehouse-appliance-products/twinfin.aspx) is
a purpose-built, standards-based data warehouse appliance that architecturally integrates database, server
and storage into a single, easy-to-manage system. It is a high-performance appliance that can serve as an enterprise data warehouse, data mart, edge appliance or a data integration hub. Scaling from 1 terabyte to 1.5
petabytes, IBM Netezza 1000 series data warehouse appliances are designed for rapid analysis of large data
volumes, delivering 10-100x performance improvements at a much lower cost compared to other options
available from traditional database vendors.
With its powerful multicore CPU configuration, the IBM Netezza 1000 series is designed to harness the power
of all the computational cores in the appliance, to offer significant performance and scalability. A versatile
appliance, it is expandable to up to 10 racks, hosting up to 120 S-Blades and 960 CPU and FPGA cores. The
IBM Netezza 1000 series delivers world-class performance, value and simplicity to organizations enabling
them to ask questions and execute complex analytics never before contemplated.
Every IBM Netezza data warehouse appliance is delivered with IBM Netezza Analytics, an embedded software
platform for geospatial and advanced analytics. It provides the technology infrastructure to support enterprise deployments of parallel, in-database analytics. Support for a variety of popular tools and languages as
well as a built-in library of parallelized analytic functions make it simple to move analytic modeling and scoring inside the data warehouse appliance. IBM Netezza Analytics is fully integrated into the IBM Netezza data
warehouse asymmetric massively parallel processing (AMPP) architecture enabling data exploration, modelbuilding, model-diagnostics and scoring with unprecedented speed.
Key product highlights:
•
•
•
•
The IBM Netezza 1000 series data warehouse appliance delivers high-performance out-of-the-box, with
no indexing or tuning required, leading to shorter deployment cycles and faster time to value
The appliance is delivered ready-to-go for immediate data loading and query execution and integrates
with all leading ETL, BI and analytic applications
Powerful platform — brings together high-performance data warehousing, business intelligence and
advanced analytics in one easy to use appliance
Appliance simplicity — easy to deploy and manage; dramatically simplifies organizational data warehouse and analytics infrastructures
9
Figure 3 - IBM Netezza 1000-12 Data Warehouse Appliance Specification
10
5. Benchmarking Performance Results of Load and Unload Operations between IBM InfoSphere Information Server and IBM Netezza 1000 Series
Data Warehouse Appliance
5.1
Test Configuration
The test environment was an IBM InfoSphere Information Server v8.7 2-Tier configuration. The client tier was used
to run just the Information Server client applications. The other Information Server tiers (including Services, Engine, Metadata Repository) were installed on a single Intel® Xeon® E7-4870 Server. IBM Netezza 1000-12 Performance Server was used as the data warehouse. Information Server and the IBM Netezza server were connected
by a 10 Gb Ethernet network. The operating system for the IBM InfoSphere Information Server system was Red
Hat* Enterprise Linux* 5.3, 64-bit version.
The IBM InfoSphere Information Server system was configured with 1TB of DRAM. Most Information Server jobs
do not need this amount of DRAM and our server configuration will perform well with much less DRAM. We used
the DRAM to configure a RAMDISK to source the data that was processed and sent to the IBM Netezza data warehouse appliance. A RAMDISK was used to replace slower hard drive based storage.
Figure 4 – IBM InfoSphere Information Server and IBM Netezza Performance Server system topology
11
Table 1 and Table 2 list the specifics of the platforms tested:
Information Server on Intel® Xeon® E7—4780
OEM
Intel®
CPU Model ID
E7-4870
Platform Name
Boxboro
Sockets
4
Cores per Socket
10
Threads per core
2
CPU Code Name
Westmere-EX
CPU Frequency (GHz)
2.40
QPI GT/s
6.4
Hyperthreading
Enabled
Prefetch Settings
Default
LLC Size (MB)
30
BIOS Version
R24
Memory Installed (GB)
64
DIMM Type
DDR3-1066
DIMM Size (GB)
16
Number of DIMMS
64
NUMA
Enabled
OS
RHEL 5.3 64 bit
Table 1 – Intel® Platform Tested
IBM Netezza 1000-12 Performance Server data warehouse appliance
OEM
IBM
Racks
1
S-Blades
12
CPU Cores
104
OS
RHEL 5.3 64 bit
Software level
NPS 6.0
Table 2 – IBM Netezza Data Warehouse Appliance Platform Tested
5.2
Netezza Connector Load Performance Results
A 202 GB data set was used for each load test. The data was evenly pre-partitioned to equal the number of parallel processing nodes in the Information Server job doing the load. Matching the data set partitions to the processing node counts reduces contention for the I/O subsystem and reduces CPU utilization.
Figure 5 shows test results for loading the dataset using the Information Server to the target table on the IBM
Netezza 1000-12 server over a 10 Gb/s Ethernet connection. As the number of Information Server data partitions is
increased from 1 to 23, the throughput of the job increased from 146 to 2377 GB/hour. CPU monitoring data
showed that the InfoSphere Information Server CPU utilization was only 30% even when the load job was running
at peak throughput. The maximum load rate was achieved with 23 processing nodes which is the rated capacity of
12
the IBM Netezza appliance tested. (Other IBM Netezza appliances are available that provide even higher load
rates.)
Netezza Connector Load Throughput
Throughput
Avg. CPU %
2500
30
2377
25.4
2000
22.2
1500
25
20
GB/h
1161
15
1000
10
597
9.3
500
146
5
5.6
1.6
0
1
CPU Utilization (%)
2212
0
4
8
16
23
Number of DataStage Data Partitions
Figure 5 – Loading data from Information Server engine server to IBM Netezza 1000-12 server
5.3
Netezza Connector Unload Performance Results
Figure 6 shows test results of unloading data from the source table on IBM Netezza 1000-12 server using Information Server over a 10 Gbits/s Ethernet connection. Throughput of the unload job increased from 133 to 2579
GB/hour as we increased number of Information Server data partitions (or “processing nodes”) from 1 to 23. CPU
monitoring data showed that CPU utilization on the InfoSphere Information Server system increased from 1.7% to
33.5% as the degree of parallelism increased.
13
Netezza Connector Unload Throughput
Throughput
Avg. CPU %
40
2579
2500
33.5
GB/h
2000
1838
1500
20
985
15
1000
500
133
10
9.5
5
5.6
1.7
0
1
30
25
22.4
513
35
CPU Utilization (%)
3000
0
4
8
16
23
Number of DataStage Data Partitions
Figure 6 – Unloading data from IBM Netezza 1000-12 server using InfoSphere Information Server
6. Information Server Jobs Used for Testing
To fully test the capabilities of the Netezza Connector stage, we must make sure that job performance is not bottlenecked by system resources, such as CPU, disk or network, on the Information Server Intel® Xeon® server. While
a number of different input methods are available to the Netezza Connector, investigation revealed that a prepartitioned dataset would generate the best load and unload throughput. Using a pre-partitioned dataset significantly reduces CPU overhead and I/O requirements resulting in sufficient remaining resources on the Information
Server Intel® Xeon® server to handle other workloads. Figure 7 shows screen shots of the load and unload jobs.
Figure 7 – Load and Unload Information Server Jobs
14
7. Standard Practices and Guidelines for Effective Performance
In this section, we present guidelines for the development of highly efficient and scalable implementation of integration jobs using the Netezza Connector for IBM InfoSphere Information Server and IBM Netezza data warehouse
appliance. We have included a comprehensive detailed description of the different options available, and descriptions on how to use them to obtain scalable and efficient solutions for your scenarios.
This section is organized as follows:
• How to avoid partial loads
• How to manage Temporary Work Tables (TWTs)
• How to use the Temporary Work Table with user-defined SQL
• How to set up parallel reads
• How to use the action column mode
• Transaction handling
• How to use quoted IDs
7.1
How to Avoid Partial Loads
When loading data into a IBM Netezza table in parallel using Information Server, there are multiple processing
nodes loading data to the same target table in parallel. If one of the processing nodes encounters a problem, it
aborts the job. However, since processing nodes run in parallel there is a possibility that one or more of them
completes their part of the load before the job is terminated. This leads to partial data loads. This is referred to as
the Direct insert mode in the Netezza Connector.
To avoid this problem with loads, the Netezza Connector uses a staging table, referred to as the Temporary Work
Table or TWT. The load of input rows to the target table is done in two phases. (1) First, each processing node
writes its input records to the TWT. (2) Once all processing nodes complete the load phase successfully, the conductor process (one such process per job) executes a single insert statement from the TWT into the target table.
This way the connector either inserts all input records to the target table or none at all.
The other write modes such as Update, Delete, Update then Insert, Delete then Insert, User-defined SQL, and Action column always use TWT.
15
Figure 8 – Option for Direct Insert Mode
7.2
How to Manage Temporary Work Tables (TWTs)
By default the connector automatically creates and drops the TWT as it runs in an InfoSphere Information Server
job. Hence, the user name specified in the connection properties needs to have create and drop table privileges,
which, depending on the security requirements in some deployments, may not be granted.
If the user privileges would violate security policy, the connector can be configured to use a previously created
TWT. This table can be created by an administrator with necessary privileges ahead of time. The TWT must match
the input schema of the connector stage in a job and therefore a separate TWT must be maintained for each table
you intend to load in this manner. Additionally, jobs using Runtime Column Propagation (RCP) might not be able to
use a pre-existing TWT at all, since the link schema may not be known until the job runs.
16
A better way to manage TWTs is to keep them separate from the production tables in a separate database. The
benefits of this approach are:
•
TWTs can be excluded from backup and restore.
•
It is easier to keep track of them, because they are kept in their own place.
•
Privileges can be controlled separately from those of the production database.
•
The connector can still create and drop them automatically.
To enable this feature, set the connector’s Connection property Separate connection for TWT and specify the database name (optionally you can set a different user name and password).
Figure 9 – Use separate connection for TWT
17
TIP: You can let the connector manage the TWT by setting the property Temporary work table mode to Automatic,
or you can specify an existing TWT. If you decide to specify an existing TWT name, ensure that you include the database name. You can use a 2-part or a 3-part table name (e.g. twtdb.schema.table or twtdb..table) as shown in
the picture below.
Figure 10 – Temporary work table mode
18
7.3
How to Use the Temporary Work Table with User-defined SQL
The Netezza Connector can either automatically generate write mode SQL statements or it can accept user-defined
SQL statements. These statements are executed from the conductor process once all processing nodes finish loading the TWT.
The question is how these statements should refer to the TWT. If the TWT was created beforehand or in the Before SQL property, the person designing the job knows its name and can refer to it in User-defined SQL. However, if
the TWT is automatically generated by the connector, its name is not known until the job is run.
The connector provides two place holders (or parameters) that can be used in the User-defined SQL to refer to the
TWT and the target table.
Place Holder
Description
[[twt]]
Temporary work table name.
[[table]]
Target table name as specified in the Table name property.
Table 3 –Place holder or parameters to use in SQL statement
The place holders will be replaced with actual names at run time. Here are some examples on how user-defined
SQL statements can be entered using the place holders:
INSERT INTO [[table]] SELECT * FROM [[twt]]
or
UPDATE [[table]] SET [[table]].col1 = [[twt]].col1
Note that if the place holders are within a quoted string they will not be replaced. Single quotes are used to specify
a text value, so '[[twt]]' would be treated as a value and will not be replaced. Double quotes are used for identifiers, so “[[twt]]” would be treated as a table or column name.
19
7.4
How to Setup Parallel Reads
When creating an Information Server read job, by default the connector will read sequentially, that is, it will run on
only one processing node. To set up the connector to read in parallel, set the connector property Enable partitioned reads to Yes.
Figure 11 – Enable partitioned reads
With this option set, the connector will generate statements that are slightly different on each processing node so
that they read a different subset of rows. The connector uses modulus partitioning against the datasliceid column
(a special IBM Netezza column).
Note: When using a user-defined SQL statement with Enable partitioned reads set to Yes, the connector will execute the same statement on each processing node, which will produce duplicate data as each processing node sees
all the rows.
The connector provides the following place holders or parameters that can be used in the SQL statement to
achieve partitioning:
20
Place holder
Description
Value when running sequentially
[[node-count]]
The total number of processing nodes. This
represents the level of parallelism for the
Netezza Connector stage and is equal to the
number of processing nodes
1
[[node-number]]
Current processing node zero-based index.
For example if there are 4 processing nodes,
the processing nodes will have indexes
0,1,2,3
0
Table 4 –Place holder or parameters to use in SQL statement
Users can use these place holders to create partitioning clauses in their SQL statements. For example:
SELECT * FROM table WHERE mod(datasliceid,[[node-count]])=[[node-number]]
[[node-count]] will receive the same value in all processing nodes, while [[node-number]] will be substituted with a
different value in each processing node.
7.5
How to Use the Action Column Mode
Netezza Connector has a special mode suitable for replication jobs. It is called the Action column mode. In this
mode, each incoming row specifies the type of SQL statement that is to be executed with that row's data. There
are five supported actions: Insert (I), Update (U), Delete (D), Replace (R), and Merge (M). As the name of the mode
suggests there is a special column in the input schema called Action column of type char(1) that can take one of
the 5 action mode identifiers: I, U, D, R, M.
The connector executes three statements in sequence: Delete, Update, and Insert. The first statement deletes the
rows in the target table whose keys match the input rows with action D or R. The second statement updates the
rows in the target table whose keys match the input rows with action U or M. The third statement inserts into the
target table the rows with action I or R. It also inserts those input records with action M whose keys do not match
any records already present in the target table.
The Action column should only be present in the input link schema of the Netezza Connector stage and should not
be in the target table. The name of the column is arbitrary but it should not match any target table columns.
To setup an Action column job:
1)
Set Write mode to Action column.
2)
From the list of your input link columns, select the keys that will be used to match the records.
3)
From the list of qualifying (text type) input link columns, select the action column (a single column).
21
Figure 12 – Action column Write mode
22
Figure 13 – Select keys to match records
23
Figure 14 – Select Action column
24
7.6
Transaction Handling
Before and After SQL statement properties allow for multiple SQL statements to be specified (semicolonseparated). There are two properties that control the transaction scope of these statements and affect the job
behavior if any of the statements fail. These properties are: Fail on error and Atomic mode.
When the nature of the SQL statements in Before and After SQL is such that they are allowed to fail, for example,
dropping a table that may not exist, set Fail on error to No. With this setting, the job will continue even if one or
more statements fail. Additionally you can control if and how the related error message is to be logged using Log
statement errors as property.
If the nature of the SQL statements is such that it would not make sense to continue the execution of the job, set
Fail on error to Yes and the job will be aborted on the first statement failure. SQL statements following the one
that failed will not be executed. Additionally you can set Atomic mode to Yes to make sure that the SQL statements
already executed before the failure are rolled back.
Figure 15 – Fail on error and Atomic mode
25
Note: In Atomic mode, you cannot use any non-transactional SQL statements in this property.
The write modes with more than one statement (Update then insert, Delete then insert, Action column and Userdefined SQL) also have the Atomic mode property. When Atomic mode is set to Yes, all statements are executed in
one transaction (e.g. update and insert statements in Update then insert mode). If any one statement fails everything is rolled back. If Atomic mode is set to No, each is executed in a separate transaction. This for example means
that if Insert part of Update then insert failed, the updates would still be committed.
Figure 16 – SQL Atomic mode property
Note:
• Before SQL, After SQL, and Write mode SQL all have separate Atomic mode properties. The effect is that if a SQL
statement fails in one of these properties, the SQL statements already executed in the other properties will not
26
be rolled back. In other words, the scope of Atomic mode is limited to its parent property (and associated SQL
statements).
• The connector processes all input rows in a single transaction that is committed at the end of the job execution.
This makes the connector most suitable for batch-type jobs that can be invoked periodically (once a day for example) to process and load a large amount of data. This connector is not suitable for 'always-running' jobs, because the target table would never get updated.
27
7.7
How to Use Quoted IDs
IBM Netezza supports two types of identifiers: regular and delimited.
Regular identifiers are case insensitive and are stored in the database in lower or upper-case, default upper-case.
They can contain only letters (in any alphabet, not just the Latin alphabet), syllables (as in the Japanese Hiragana
syllabary), ideographs, decimal digits, underscores, and the dollar sign ($). Regular identifiers must begin with a
letter; they cannot begin with a digit, underscore, or dollar sign. Regular identifiers also cannot be a SQL reserved
word. The encoding in the IBM Netezza catalog is in UTF-8; the encoding for any display will depend on the client.
Delimited identifiers are enclosed in double quotation marks and have special considerations. They are casesensitive. For example, a database named “Sales” is not the same database as one named SALES. The IBM Netezza
system does not convert delimited identifiers to the default system case, nor does it save the enclosing double
quotation marks in the database.
Within the double quotation marks, a delimited identifier can include the same letters, syllables, ideographs,
decimal digits, and underscores as a regular identifier, but it can also include spaces, special characters such as
hyphens and percent signs, and SQL reserved keywords. A delimited identifier can begin with any of these letters,
digits, or symbols. The connector supports delimited identifiers in the table name property and in any user-defined
SQL statements. The connector behavior is also affected by the Enable case-sensitive identifiers property.
28
Figure 17 – Delimited identifiers in table name property
29
The following table depicts the behavior of the connector, given the type of table name and case-sensitivity setting:
Table name property
Case sensitive IDs
Disabled
db.schema.table
•
•
“db.schema.table”
•
•
•
db.schema.”table”
•
•
“db”.”schema”.”table”
•
•
Enabled
db, schema, and table are uppercased when used in system catalog
queries.
db, schema, and table are used as-is
in generated SQL statements.
•
The whole string db.schema.table is
assumed to be a delimited table
name (dots are also considered part
of the table name)
This name is used as-is in system
catalog queries.
This name is used as-is in generated
SQL statements.
•
•
•
•
•
db, schema, and table are used as-is
in the system catalog queries.
db, schema, and table are quoted in
generated SQL statements.
The dot character is still used as a
delimiter and cannot be part of the
ID name
The whole string db.schema.table is
assumed to be a delimited table
name (dots are also considered part
of the table name)
This name is used as-is in system
catalog queries.
This name is used as-is in generated
SQL statements.
db and schema are uppercased when •
used in system catalog queries; table
is used as-is.
db and schema are used as-is in gen- •
erated SQL statements; table is
quoted.
db and schema are used as-is in the
system catalog queries; table is used
as-is.
db, schema, and table are quoted in
generated SQL statements.
db, schema, and table are used as-is •
in system catalog queries.
db, schema, and table quoted in gen- •
erated SQL statements.
db, schema, and table are used as-is
in system catalog queries.
db, schema, and table quoted in generated SQL statements.
Table 5 – Case sensitive IDs and connector behavior
30
7.8
Additional Best Practices for Optimal Performance
Summary of Best Practices for
Optimal Performance
The following is a summary of best practices to optimize performance for data integration with IBM Netezza 1000-12 Data Warehouse Appliance, IBM InfoSphere Information Server, and Intel® Xeon® Servers:
•
IBM Netezza 1000-12 data warehouse appliance – Use Software defaults, no tuning required!
•
IBM InfoSphere Information Server connector for IBM Netezza - Default setting in
product. Use Software defaults, no tuning required!
•
Use high-performance solid-state drives (SSD) or RAM disk if adequate memory is
available for temporary and scratch space allocations for Information Server. Allocate at least 100 GB for each on a server. This can improve performance of
jobs using temporary or scratch space
•
Use Intel 10 GbE NICs and drivers configured with jumbo frames.
®
10 GbE is required to reach the data rates shown here as network
bandwidth exceeds 6 Gbits/s.
Jumbo frames provide single digit percentage improvements in our
tests.
•
Use the standard practices for effective performance as described in Section 7:
avoiding partial loads
managing Temporary Work Tables (TWTs)
using the Temporary Work Table with user-defined SQL
setting up parallel reads
using action column mode
using transaction handling
using quoted IDs
31
8. Conclusion
IBM InfoSphere Information Server can be deployed on powerful mission critical servers like the Intel® Xeon® platforms based on the E7-4870 series to efficiently process data at extremely high data rates. As a result, I/O and
network bandwidth are extremely important for high performance. Network interconnects like 10 Gbit/s Ethernet
or 40 Gbit/s Fiber Channel are necessary to fully realize the computation potential of this powerful combination of
hardware and software.
The performance of IBM InfoSphere Information Server on Intel E7-4870 servers can easily exceed the bandwidth
capability of 1 Gbit/s Ethernet when loading data to the IBM Netezza data warehouse appliance (or processing
data to or from many other sources/sinks). Intel 10 Gbit/s Ethernet adapters provide the necessary bandwidth
and features to optimize performance of the network when moving to 10 GbE. The test configuration shown in
the paper yield load and unload data rates in excess of 2.3 TB/hour between the IBM InfoSphere Information
Server running Intel E7-4870 processors and the IBM Netezza 1000 series data warehouse appliance. This performance level resulted in an average of 6 Gbits/s over the network connection between the two servers. Such
high performance rates might also require the use of larger memory based (or SSD devices) to maintain scratch
and temporary data required by Information Server. A RAMDISK was used to replace slower hard drive based
storage.
Accelerate Time to Value by Using IBM InfoSphere Information Server and IBM
Netezza Data Warehouse Appliance Together
IBM InfoSphere Information Server provides more trustworthy information--information that is comprehensive,
accurate, timely, and in context to the data warehouse. The purpose-built IBM Netezza data warehouse appliance
is designed to deliver new standards of performance, simplicity, and scalability for break-through analytics. IBM
InfoSphere Information Server v8.7 introduces new, faster and feature-rich connectivity to IBM Netezza to accelerate your time to value when deploying a more trustworthy data warehouse solution.
The new Netezza Connector is embedded with many user friendly and useful features based on customer feedback
and industry trends. It is based on using IBM Netezza’s External Table method for bulk loading and extraction. It is
capable of doing high performance extraction through parallel read. It supports multiple load operations, such as
Insert, Update then insert, Delete then insert, Update, Delete, Action column based insert, and custom SQL based
insert. It’s smart enough to allow the use of existing temporary work tables or to automatically create new ones,
plus it allows temporary work tables to be located in a separate database, thus avoiding the possibility of corrupting the target table and misuse of user privileges. It supports multiple table operations such as truncate, drop,
replace, or create with the unique ability to utilize different distribution key options such as user defined, automatic or random. There are additional features that provide end users with even more capabilities, such as the
ability to execute before and after SQL statements from either a single node or multiple nodes, schema reconciliation, and run time column propagation.
The IBM InfoSphere Information Server Balanced Optimization is very unique when compared to other offerings in
the market. It is capable of pushing transformation processing to both source and target databases. This provides
users the ability to optimize the usage of the variety of resources in their IT environment. The data transformation
processing can be done entirely on dedicated servers, or the entire processing can be pushed into IBM Netezza, or
a hybrid model can be used where data transformation is done both on dedicated data transformation servers and
inside of IBM Netezza. The original jobs and the Balanced Optimized jobs are kept in the same repository. This
provides users the capability to design execution logic to choose which job to run depending on the current resource load in their environment.
Optimize Return on Investment with the Powerful Combination of IBM Mission
Critical Software and Intel® Xeon® E7 Mission Critical Processors
32
In summary, IBM InfoSphere Information Server provides enhanced native connectivity to the IBM Netezza data
warehouse appliance through the new IBM InfoSphere Information Server connector for Netezza. It supports IBM
InfoSphere Information Server Balanced Optimization, allowing the choice of where transformations occur, either
in the source database, in the target IBM Netezza database, in the high-performance Information Server engine, or
in any or all of those locations. Combining powerful mission critical servers like the Intel® Xeon® Platforms based
on the E7-4870 series processor running IBM InfoSphere Information Server parallel engine can efficiently process
data to and from the IBM Netezza server at impressive data rates.
9. Acknowledgements
Many thanks to the contributors who have provided help during our benchmark effort, their valuable input, and
reviews to this document:
• Jantz Tran (Sr. Staff Software and Hardware Engineer, Intel)
• Yeh-Heng Sheng (Software Architect, Balance Job Optimization, ISAS/Information Server, IBM)
• Gary Faircloth (User Technology Engineer: User Assistance Design, Writing, and Production Infrastructure,
IBM)
33
10.
About the Authors
Garrett Drysdale is a Software and Hardware Performance Engineer for Intel. Garrett has
analyzed and optimized software on Intel® platforms since 1995 spanning client, workstation, and enterprise server market segments. Garrett currently works with Cloud and
Data Center software developers to analyze and optimize Cloud apps on Intel servers, and
with internal design teams to assist in evaluating the impact of new technologies on software performance for future Intel® platforms. Garrett has a BSEE from University of Missouri-Rolla and a MSEE from The Georgia Institute of Technology. His email is
[email protected].
Dr. Sriram Padmanabhan is an IBM Distinguished Engineer, and Chief Architect for IBM
InfoSphere Servers. Most recently, he had led the Information Management Advanced
Technologies team investigating new technical areas such as the impact of Web 2.0 information access and delivery. He was a Research Staff Member and then a manager of the
Database Technology group at IBM T.J. Watson Research Center for several years. He was a
key technologist for DB2’s shared-nothing parallel database feature and one of the originators of DB2’s multi-dimensional clustering feature. He was also a chief architect for Data
Warehouse Edition which provides integrated warehousing and business intelligence capabilities enhancing DB2. Dr. Padmanabhan has authored more than 25 publications including a book chapter on DB2 in a popular database text book, several journal articles,
and many papers in leading database conferences. His email is [email protected].
Branislav Barnak is a Software Architect at IBM. He works on IBM InfoSphere DataStage
connectivity and is focusing on IBM technologies such as DB2, WebSphere MQ, mainframe
data storage systems, and others. Branislav has been involved with the development of
connectivity technologies for more than 10 years. He has worked on several connectivity
frameworks interfacing with database and messaging products such as Oracle, PeopleSoft,
Teradata, DB2, WebSphere MQ, and others. He holds a BS degree in Electrical Engineering
from the University of Belgrade, Serbia. His email address is [email protected].
Brian Caufield is a Software Architect for IBM InfoSphere Information Server responsible
for the definition and design of new IBM InfoSphere Information Server features, and also
works with the Information Server Performance Team. Brian represents IBM at the TPC,
working to define an industry standard benchmark for data integration. Previously, Brian
worked for 10 years as a developer on IBM InfoSphere Information Server specializing in
the parallel engine. His email is [email protected].
As Product Manager for IBM InfoSphere DataStage, Tony Curcio works with organizations
who are leveraging IBM’s scalable data integration platform to enable their Trusted Information infra-structure. In this role, he is responsible for aligning product development
direction to best fit the needs of IBM customers and industry trends. Tony also serves as a
focal point for product synergy on joint InfoSphere and Cognos technologies. His email is
[email protected]
34
Jon Deng is the Product Manager for IBM Information Server Connectivity where he works
with partners and vendors to integrate their products with Information Server. Jon has
over 15 years of experience in enterprise Software, Hardware and Industry solutions. Key
areas of expertise include Information Integration, BIG Data, Data Warehousing, Business
Intelligence, Real Time Analytics, Cloud computing, Web2.0/Mashups, and Production
Planning, Scheduling and Sequencing methodologies. His email is [email protected]
David Qiang Li is a Software Performance Engineer who specializes in software performance, stress and load testing for past 12 years. Before joining IBM InfoSphere product
group, David worked at IBM Rational Performance Engineering team, and played a key role
in improving quality and performance of Rational ClearCase, Rational ClearQuest and Rational Team Concert release over release. David was speaker at Rational Software Development Conference in 2007 and 2009, and IBM Information On Demand Conference in
2011. Based in Littleton MA, David frequently attends lectures and conferences in the IT
field at Harvard and MIT. His email is [email protected].
Lu Liang has many years of experience on System Test and Performance. Currently, Liang is
a software engineer in InfoSphere Information Server Performance Team. He has published many technical articles for Information Server in China's developWorks. Besides
being certified as an advanced DB2 administrator, he is also familiar with other relational
database, acquainted with J2EE and Web technology. He always keeps a keen eye on Information Management, new technology and open source projects. His email is
[email protected].
Mi Wan Shum is the manager of the IBM InfoSphere Information Server performance team
at the IBM Silicon Valley Lab. She graduated from University of Texas at Austin and she has
years of software development experience in IBM. Her email is [email protected]
John Skier is the Director of IBM's Netezza partner integration team where he worked with
partners to integrate their product offerings optimally with IBM Netezza's data warehouse
appliances using set base SQL processing techniques. He is a joint patent holder for Informix SQL engine technology and has been a member of the TPC and SQL standards
committees. He's spent his career developing, implementing and delivering high performance standards based solutions for data warehousing and on-line transaction processing
systems in both commercial and government sectors using SQL technology. His email is
[email protected].
Samuel Wong is a member of the IBM InfoSphere InfoSphere Information Server performance team at the IBM Silicon Valley Lab. He graduated from University of Toronto and he
has over 10 years of software development experience with IBM. His email is
[email protected]
35
11.Intel Legal Disclaimer
Performance tests and ratings are measured using specific computer systems and / or components and reflect the
approximate performance of Intel products as measured by those tests. Any difference in system hardware or
software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, visit http://www.intel.com/performance/.
THIS DOCUMENT AND RELATED MATERIALS AND INFORMATION ARE PROVIDED "AS IS" WITH NO WARRANTIES,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NON-INFRINGEMENT OF INTELLECTUAL PROPERTY RIGHTS, OR ANY WARRANTY OTHERWISE ARISING OUT OF ANY PROPOSAL, SPECIFICATION, OR SAMPLE. INTEL ASSUMES NO RESPONSIBILITY FOR ANY ERRORS CONTAINED IN THIS DOCUMENT AND HAS NO LIABILITIES OR OBLIGATIONS FOR ANY
DAMAGES ARISING FROM OR IN CONNECTION WITH THE USE OF THIS DOCUMENT.
All products, product descriptions, plans, dates, and figures are preliminary based on current expectations and
subject to change without notice. Availability may vary in different channels.
*Other names and brands may be claimed as the property of others.
© 2012, Intel Corporation. All rights reserved. Intel, the Intel logo, Core, Itanium, NetBurst, Pentium, and VTune
are trademarks of Intel Corporation in the U.S. and other countries.
Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that
are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSE3 instruction sets and
other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on
microprocessors not manufactured by Intel.
Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the
applicable product User and Reference Guides for more information regarding the specific instruction sets covered
by this notice.
Notice revision #20110804
36
12.IBM Legal Disclaimer
THE INFORMATION CONTAINED IN THIS DOCUMENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS DOCUMENTATION, IT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT
PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.
IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR
OTHERWISE RELATED TO, THIS OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS DOCUMENTATION IS INTENDED TO, NOR SHALL HAVE THE EFFECT
OF, CREATING ANY WARRANTIES OR REPRESENTATIONS FROM IBM (OR ITS SUPPLIERS
OR LICENSORS), OR ALTERING THE TERMS AND CONDITIONS OF ANY AGREEMENT OR
LICENSE GOVERNING THE USE OF IBM PRODUCTS AND/OR SOFTWARE.
The results reported in this document were achieved under controlled lab conditions that represent an optimal test case scenario. IBM does not guarantee these results and individual results will vary.
IBM, the IBM logo, ibm.com, InfoSphere, Information Server, and DataStage are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or
both. If these and other IBM trademarked terms are marked on their first occurrence in this information
with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks
owned by IBM at the time this information was published. Such trademarks may also be registered or
common law trademarks in other countries. A current list of IBM trademarks is available on the Web at
“Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml
37