White Paper Best Practices and Performance Guidelines for IBM InfoSphere® Information Server
by user
Comments
Transcript
White Paper Best Practices and Performance Guidelines for IBM InfoSphere® Information Server
White Paper Best Practices and Performance Guidelines for IBM InfoSphere® Information Server running on Intel® Xeon® Servers providing connectivity to IBM Netezza® Data Warehouse Servers Date: March 21, 2012 Authors: Garrett Drysdale, Intel® Corporation Sriram Padmanabhan, IBM Branislav Barnak, IBM Brian Caufield, IBM Tony Curcio, IBM Jon Deng, IBM David Li, IBM Liang Lu, IBM Mi Wan Shum, IBM John Skier, IBM Netezza Samuel Wong, IBM 1 Table of Contents Best Practices and Performance Guidelines for IBM InfoSphere® Information Server running on Intel® Xeon® Servers providing connectivity to IBM Netezza® Data Warehouse Servers ......................................................... 1 1. Introduction...................................................................................................................................... 3 2. Overview of IBM InfoSphere Information Server .................................................................................... 3 2.1 IBM InfoSphere Information Server Delivers New, World-class Connectivity to IBM Netezza Data Warehouse Appliances ........................................................................................................................... 4 2.2 New IBM InfoSphere Information Server Connector for IBM Netezza: Helping Customers Achieve Higher Performance and Increased Developer efficiency....................................................................................... 5 2.2.1 Increased Developer Efficiency................................................................................................. 5 2.2.2 Achieve Higher Performance.................................................................................................... 5 2.3 New IBM InfoSphere Information Server Balanced Optimization for IBM Netezza ................................ 5 3. Overview of Intel® Xeon® Series E7 Processors and Intel 10GbE Ethernet Network Adapter ........................ 6 3.1 Performance Scaling When Upgrading IBM InfoSphere Information Server and Intel® Xeon® Servers ..... 8 4. Overview of IBM Netezza 1000 Series Data Warehouse Appliance ........................................................... 9 5. Benchmarking Performance Results of Load and Unload Operations between IBM InfoSphere Information Server and IBM Netezza 1000 Series Data Warehouse Appliance.................................................................... 11 5.1 Test Configuration..................................................................................................................... 11 5.2 Netezza Connector Load Performance Results .............................................................................. 12 5.3 Netezza Connector Unload Performance Results ........................................................................... 13 6. Information Server Jobs Used for Testing............................................................................................ 14 7. Standard Practices and Guidelines for Effective Performance ................................................................ 15 7.1 How to Avoid Partial Loads ......................................................................................................... 15 7.2 How to Manage Temporary Work Tables (TWTs) ........................................................................... 16 7.3 How to Use the Temporary Work Table with User-defined SQL ....................................................... 19 7.4 How to Setup Parallel Reads ....................................................................................................... 20 7.5 How to Use the Action Column Mode .......................................................................................... 21 7.6 Transaction Handling ................................................................................................................. 25 7.7 How to Use Quoted IDs.............................................................................................................. 28 7.8 Additional Best Practices for Optimal Performance........................................................................ 31 8. Conclusion...................................................................................................................................... 32 9. Acknowledgements ......................................................................................................................... 33 10. About the Authors..................................................................................................................... 34 11. Intel Legal Disclaimer................................................................................................................. 36 12. IBM Legal Disclaimer ................................................................................................................. 37 2 1. Introduction IBM InfoSphere® Information Server, Intel® with the latest Intel® Xeon® 4 and 8 socket capable Xeon® E7 series processors and IBM Netezza® Performance Server data warehouse appliance work together to deliver breakthrough analytics based on more trustworthy information. The IBM Netezza data warehousing appliance provides high-performance analytics to businesses. Since it is an appliance, it is easy to deploy and manage. IBM InfoSphere Information Server helps ensure that information loaded into the IBM Netezza data warehouse appliance is comprehensive, accurate, timely and in context. This allows clients to rapidly drive innovation and improve business results. IBM InfoSphere Information Server is hosted on Intel® Xeon® E7 series processors which provide a highly scalable and reliable computing platform aimed at mission critical environments. The purpose of this white paper is to help you become familiar with the new and comprehensive integration that is now available between IBM InfoSphere Information Server and IBM Netezza data warehouse appliance. We summarize exciting results of internal benchmarking tests focused on Load and Unload performance when using the new InfoSphere Information Server connector for IBM Netezza. This paper is one of a series of white papers intended to provide IBM InfoSphere Information Server customers with helpful performance tuning guidelines for deployment on Intel® Xeon® processor based platforms. We tested connectivity between IBM InfoSphere Information Server running on Intel® Xeon® E7-4870 processors in a 4 socket configuration and an IBM Netezza 1000-12 data warehouse appliance. The goal was to optimize performance of the data transfer and load capabilities between these two platforms, which are critical for business intelligence. Part of this work involved the creation and optimization of the new Information Server connector for IBM Netezza. We show how to best utilize the new Netezza Connector to optimize performance between IBM Netezza Appliances and Information Server. We also share many of the important new features provided by this connector. We demonstrate over 2.3 TB/hour of load and unload performance between Information Server running on a four socket server using Intel® Xeon® E7 processors and an IBM Netezza 1000-12 data warehouse appliance running version 6.0.2 of the IBM Netezza Appliance software. To attain this performance, Intel® 10 Gbit/s Ethernet Adapters are utilized with over 6 Gbits/s of bandwidth utilization. The IBM Netezza 1000 series is rated to attain in excess of 2TB/hour of load and unload performance which we were able to achieve. While over 2TB/hour of load and unload performance is impressive, it is only the beginning in our quest of tremendous load and unload performance. We are confident that Information Server to IBM Netezza performance will increase substantially when anticipated enhancements are available. This is due to the substantial network bandwidth and CPU utilization headroom available on the Intel® Xeon® E7 servers in the current test configurations. In addition, Information Server running on Intel® Xeon® E7 processor based systems allows scaling out to multiple systems as demand on processing capacity increases. This allows performance capability to be increased as customer requirements grow. 2. Overview of IBM InfoSphere Information Server IBM InfoSphere Information Server offers data integration through extract, transform, and load (ETL) capabilities. It provides a designer tool that allows developers to visually create integration jobs. A job is used within IBM InfoSphere Information Server to describe ETL tasks. Jobs are composed from a rich palette of reusable functions called stages. These stages include: • Source and target access for databases, data warehouse appliances, applications and files • General processing stages such as filter, sort, join, union, lookup and aggregations • Built-in and custom transformations 3 • Copy, move, FTP and other data movement stages • Real-time, XML, SOA and message queue processing • Stages for accessing Big Data, running BRMS rules, and integrating with custom built Java components Additionally, IBM InfoSphere Information Server allows pre-conditions and post-conditions to be applied to the stages. Multiple jobs can be controlled and linked by a sequencer. The sequencer provides the control logic that can be used to process the appropriate data integration jobs. IBM InfoSphere Information Server also supports a rich administration capability for deploying, scheduling and monitoring jobs. One of the great strengths of IBM InfoSphere Information Server is that very little consideration to the underlying structure of the system is necessary to design integration jobs. In most cases, if the system changes, is upgraded or improved, or if a job is developed on one platform and implemented on another, the job design will not require changes. IBM InfoSphere Information Server has the capability to learn about the shape and size of the system from the configuration file. Further, it has the capability to organize the resources needed for a job according to what is defined in the configuration file. When a system changes, the configuration file is changed, not the jobs. A configuration file defines the number of data partitions (or “processing nodes”) with which the job will run. The processing nodes are logical rather than physical. The following are factors that affect the optimal degree of parallelism: • CPU-intensive applications, which typically perform multiple CPU-demanding operations on each record, benefit from the greatest possible parallelism up to the capacity supported by a given system. • Jobs with large memory requirements can benefit from parallelism if they act on data that has been partitioned and if the required memory is also divided among partitions. • Applications that make intensive use of disk or other I/O resources, such as those that extract data from and load data into databases, benefit from configurations in which the number of logical nodes equals the number of I/O paths being accessed. Another great strength of IBM InfoSphere Information Server is that it does not rely on the functions and processes of a database to perform transformations to scale and optimize processing. While it can generate complex SQL queries and leverage database processing capability, it has been designed from the ground up as a multipath data integration engine equally at home with files, streams, databases, and internal caching in single-machine, cluster, and grid implementations. As a result, customers in many circumstances find they do not also need to invest in staging databases to support scaling their data integration environment. For additional information on Information Server jobs and stages, please see IBM InfoSphere Information Server Version 8.7 product documentation or IBM InfoSphere DataStage and QualityStage 8.7 Information Center. . 2.1 IBM InfoSphere Information Server Delivers New, World-class Connectivity to IBM Netezza Data Warehouse Appliances The new IBM InfoSphere Information Server connector for IBM Netezza, which is delivered for the Information Server 8.5 and 8.7 releases, helps you realize faster time to value with IBM Netezza products running releases 4.6.x and 6.x. The new connector provides the latest native connectivity features for parallel reading and writing of data, and delivers capabilities to push various levels of transformation work into the IBM Netezza appliance. These new features allow users to achieve higher performance and increase developer efficiency without requiring complex custom coding. 4 2.2 New IBM InfoSphere Information Server Connector for IBM Netezza: Helping Customers Achieve Higher Performance and Increased Developer efficiency 2.2.1 Increased Developer Efficiency The new, next-generation IBM InfoSphere Information Server connector for IBM Netezza provides a rich set of outof-the-box features. The connector selects the best-practice connectivity options by default, which generally allows it to optimize performance. To this end, the connector is able to: • Auto create a Temporary Work Table (TWT) • Auto generate DDL and load SQL • Use a TWT table as a default instead of direct insert • Use a random distribution key when creating a table • Use Action columns for propagating updates When a developer wants to customize their data integration choices, the Netezza Connector provides those customization controls through easy-to-use check-box and drop-down configuration options. The IBM InfoSphere Information Server connector for IBM Netezza also offers many of the automation features that are common across the other Information Server connector objects. These options allow developers to solve complex problems with simple choice selections. These options include: • Multi-action modes that automatically attempt a secondary operation, including “Update then insert”, “Insert then Update” and “Delete then insert”, if the first action results in an error. • Generated SQL for both the table actions as well as for creating the database objects based on the data fields being passed through the data integration job. • Partitioned Reads through scaling out multiple read requests and automatically merging those results. 2.2.2 Achieve Higher Performance IBM InfoSphere Information Server and IBM Netezza data warehouse appliances are both massively scalable technologies and scale at impressive rates. Internal benchmarks of these technologies working together demonstrate loading speeds in excess of 2 TB/hour, which allows organizations to keep up with incredibly demanding data volumes. These new speeds for loading and unloading data to and from the IBM Netezza data warehouse appliance are a 10 – 15% increase in performance over the previous version of InfoSphere Information Server. In many data integration patterns, it is common to have data that is already in the database that needs to be transformed. A typical example is when a user needs to aggregate a detail table into a summarized set. In this case, it is often most efficient to allow the database to perform these transformation tasks. For this reason it is important that a data integration technology can support multiple processing styles, both ETL (running transformations in a scalable engine) as well as ELT (running the transformations on the database itself). 2.3 New IBM InfoSphere Information Server Balanced Optimization for IBM Netezza Data integration employs various runtime paradigms for optimizing performance. Combining both ETL and ELT styles of processing allows organizations to apply the best solution for any given problem. In order to achieve these benefits with IBM Netezza data warehouse appliances, Information Server 8.7 has extended the balanced optimization capabilities to be able to run data transformations on the appliance. IBM InfoSphere Information Server Balanced Optimization facilitates the creation of scalable, flexible, and optimized data integration architectures that leverage investments in parallel databases while increasing the efficiency of integration developers. It enables users to express the logic of their data integration processes in current natural 5 flow-oriented Information Server conventions and automatically optimize their designs to achieve the benefits of extract, load, transform and SQL pushdown. Balanced Optimization provides the flexibility to leverage the most effective processing capabilities available for faster integration and delivery of information, regardless of whether the processing takes place in the IBM Netezza database or the Information Server transformation engine. The new connector for IBM Netezza works with the Balanced Optimization technology to provide capabilities that: • Design the job logic in the common Information Server paradigm and automate which pieces of job logic can be performed on the database • Significantly reduce the amount of data movement across the network by allowing the database to perform data reduction tasks • Maximize optimization where source and target databases are homogenous • Balance the processing usage on IBM Netezza with transformations on the Information Server engine for the unique transformations and quality elements it provides – all without ever landing data to disk and while maintaining parallel processing 3. Overview of Intel® Xeon® Series E7 Processors and Intel 10GbE Ethernet Network Adapter Servers using the Intel® Xeon® E7 series processor can deliver dramatic increases in performance and scalability when compared to previous generation servers. The chipset includes new embedded technologies that give professionals in business, information management, creative, and scientific fields the tools to solve problems faster, process larger data sets, and address bigger challenges. With intelligent performance, a new high-bandwidth interconnect architecture, and greater memory capacity, platforms based on the Intel® Xeon® series E7 processor are ideal for demanding workloads. A standard four-socket server provides up to 40 processor cores, 80 execution threads and up to two terabytes of memory. Eight-socket and larger systems are in development by leading system vendors. The Intel® Xeon® series E7 processor also includes extensive reliability, availability and serviceability (RAS) features that improve data integrity and uptime. One of the most important is Intel® Machine Check Architecture Recovery, which allows the operating system to take corrective action and continue running when uncorrected errors are detected. These highly scalable servers can be used to support enormous user populations. Server platforms based on the Intel® Xeon® series E7 processor deliver a number of additional features that help to improve performance, scalability and energy-efficiency. • Next-generation Intel® Virtualization Technology (Intel® VT) provides extensive hardware assists in processors, chipsets and I/O devices to enable fast application performance in virtual machines, including near-native I/O performance. Intel® VT also supports live virtual machine migration among current and future Intel® Xeon® processor-based servers, so businesses maintain a common pool of virtualized resources as they add new servers. • Intel® QuickPath Interconnect Technology provides point-to-point links to distributed shared memory. The Intel® Xeon® E7 series processors with QPI feature two integrated memory controllers with and 3 QPI links to deliver scalable interconnect bandwidth, outstanding memory performance and flexibility and tightly integrated interconnect RAS features. Technical articles on QPI can be found at http://www.intel.com/technology/quickpath/. • Intel® Turbo Boost Technology boosts performance when it’s needed most by dynamically increasing core frequencies beyond rated values for peak workloads. • Intel® Intelligent Power Technology adjusts core frequencies to conserve power when demand is lower. • Intel® Hyper-Threading Technology can improve throughput and reduce latency for multithreaded applications and for multiple workloads running concurrently in virtualized environments. 6 • New instruction support for AES encryption and decryption providing significant increases in performance of these critical algorithms. For additional information on the Intel® Xeon® Series E7 Processor for mission critical applications, please see http://www.intel.com/content/www/us/en/processors/xeon/xeon-processor-e7family.html?iid=subhdr%20itc_xeon. Intel® 7500 Chipset Xeon® E7 Xeon® E7 Xeon® E7 Xeon® E7 Memory Intel® Scalable Memory Buffer PCI Express* 2.0 Figure 1 - Intel® Xeon® E7 four socket system block diagram. Many integration solutions for IBM Netezza data warehouse appliances such as the one described in this paper produce high network bandwidth requirements exceeding ~6 Gbits/s. For this reason, Intel 10 GbE Ethernet Server Adapters are used for these system topologies. 10 Gigabit Ethernet has moved past the early adoption stage and is rapidly becoming main stream for backbones within enterprise, data center and service provider networks. The escalating deployments of servers with multi-core processors and demanding applications such as High Performance Computing (HPC), database clusters, business intelligence and business analytics are driving the need for 10 Gigabit connections. Intel® 10 Gigabit Ethernet controller Adapters are designed to meet the throughput and latency requirements of bandwidth-hungry applications, while offering a very low power envelope for energy efficiency. Intel® 10 Gigabit Server Adapters are available in single and dual port configurations and provide a simplified alternative to multiple 1 Gbps server adapters. Intel® 10 GbE adapters provide load balancing across multiple CPUs, multiple receive and transmit queues. Virtualization performance is enhanced through Intel® VMDq technology which provides virtual machine device queues to significantly reduce virtualization overhead. Intel 10 Gigabit XF Server Adapters are optimized for virtualized environments, supporting multiple queues, alleviating I/O bottlenecks between virtual machines. Virtual Machine Device queue1 (VMDq) technology offloads data sorting and data copying from the virtual machine monitor (VMM) software layer to the hardware, improving overall throughput and CPU utilization on virtualized servers. 7 Additionally, Intel 10 Gigabit XF Server Adapters enable Intel® I/O Acceleration Technology (Intel® I/OAT) with support for Intel® QuickData for faster I/O processing on the new Quad-Core and Dual-Core Intel® Xeon® processorbased servers. The tests shown in this paper are performed with native operating system configurations. However, Intel E7 processors and platforms support a wide range of virtualization features to provide great performance for virtualized operating environments. 3.1 Performance Scaling When Upgrading IBM InfoSphere Information Server and Intel® Xeon® Servers There are significant benefits of upgrading hardware and software concurrently. Figure 2 shows that upgrading both hardware and software together resulted in a 1.50x performance increase in throughput. The horizontal axis has IBM InfoSphere Information Server v8.1 on the left and v8.5 on the right. The vertical axis has the older Intel® Xeon® X7560 (code name Nehalem-EX, 2.26 GHz, 8 cores per socket, 16 threads per socket, 24 MB Last level cache) on the bottom, and the Intel® Xeon® E7-4870 (code name Westmere EX, 2.4 GHz, 10 cores per socket, 20 threads per socket, 30 MB last level cache) on the top. Upgrading from v8.1 to v8.5 of IBM InfoSphere Information Server on either processor resulted in a 1.11x performance gain. Upgrading the hardware from the Intel® Xeon® X7560 processor system to the Intel® Xeon® E7-4870 processor system resulted in a 1.35x performance gain for both versions of IBM InfoSphere Information Server. Figure 2 - Upgrading to the latest Intel Processors and latest IBM Information Server concurrently yielded a 1.50x gain in throughput. 8 4. Overview of IBM Netezza 1000 Series Data Warehouse Appliance The IBM Netezza 1000 series (http://www.netezza.com/data-warehouse-appliance-products/twinfin.aspx) is a purpose-built, standards-based data warehouse appliance that architecturally integrates database, server and storage into a single, easy-to-manage system. It is a high-performance appliance that can serve as an enterprise data warehouse, data mart, edge appliance or a data integration hub. Scaling from 1 terabyte to 1.5 petabytes, IBM Netezza 1000 series data warehouse appliances are designed for rapid analysis of large data volumes, delivering 10-100x performance improvements at a much lower cost compared to other options available from traditional database vendors. With its powerful multicore CPU configuration, the IBM Netezza 1000 series is designed to harness the power of all the computational cores in the appliance, to offer significant performance and scalability. A versatile appliance, it is expandable to up to 10 racks, hosting up to 120 S-Blades and 960 CPU and FPGA cores. The IBM Netezza 1000 series delivers world-class performance, value and simplicity to organizations enabling them to ask questions and execute complex analytics never before contemplated. Every IBM Netezza data warehouse appliance is delivered with IBM Netezza Analytics, an embedded software platform for geospatial and advanced analytics. It provides the technology infrastructure to support enterprise deployments of parallel, in-database analytics. Support for a variety of popular tools and languages as well as a built-in library of parallelized analytic functions make it simple to move analytic modeling and scoring inside the data warehouse appliance. IBM Netezza Analytics is fully integrated into the IBM Netezza data warehouse asymmetric massively parallel processing (AMPP) architecture enabling data exploration, modelbuilding, model-diagnostics and scoring with unprecedented speed. Key product highlights: • • • • The IBM Netezza 1000 series data warehouse appliance delivers high-performance out-of-the-box, with no indexing or tuning required, leading to shorter deployment cycles and faster time to value The appliance is delivered ready-to-go for immediate data loading and query execution and integrates with all leading ETL, BI and analytic applications Powerful platform — brings together high-performance data warehousing, business intelligence and advanced analytics in one easy to use appliance Appliance simplicity — easy to deploy and manage; dramatically simplifies organizational data warehouse and analytics infrastructures 9 Figure 3 - IBM Netezza 1000-12 Data Warehouse Appliance Specification 10 5. Benchmarking Performance Results of Load and Unload Operations between IBM InfoSphere Information Server and IBM Netezza 1000 Series Data Warehouse Appliance 5.1 Test Configuration The test environment was an IBM InfoSphere Information Server v8.7 2-Tier configuration. The client tier was used to run just the Information Server client applications. The other Information Server tiers (including Services, Engine, Metadata Repository) were installed on a single Intel® Xeon® E7-4870 Server. IBM Netezza 1000-12 Performance Server was used as the data warehouse. Information Server and the IBM Netezza server were connected by a 10 Gb Ethernet network. The operating system for the IBM InfoSphere Information Server system was Red Hat* Enterprise Linux* 5.3, 64-bit version. The IBM InfoSphere Information Server system was configured with 1TB of DRAM. Most Information Server jobs do not need this amount of DRAM and our server configuration will perform well with much less DRAM. We used the DRAM to configure a RAMDISK to source the data that was processed and sent to the IBM Netezza data warehouse appliance. A RAMDISK was used to replace slower hard drive based storage. Figure 4 – IBM InfoSphere Information Server and IBM Netezza Performance Server system topology 11 Table 1 and Table 2 list the specifics of the platforms tested: Information Server on Intel® Xeon® E7—4780 OEM Intel® CPU Model ID E7-4870 Platform Name Boxboro Sockets 4 Cores per Socket 10 Threads per core 2 CPU Code Name Westmere-EX CPU Frequency (GHz) 2.40 QPI GT/s 6.4 Hyperthreading Enabled Prefetch Settings Default LLC Size (MB) 30 BIOS Version R24 Memory Installed (GB) 64 DIMM Type DDR3-1066 DIMM Size (GB) 16 Number of DIMMS 64 NUMA Enabled OS RHEL 5.3 64 bit Table 1 – Intel® Platform Tested IBM Netezza 1000-12 Performance Server data warehouse appliance OEM IBM Racks 1 S-Blades 12 CPU Cores 104 OS RHEL 5.3 64 bit Software level NPS 6.0 Table 2 – IBM Netezza Data Warehouse Appliance Platform Tested 5.2 Netezza Connector Load Performance Results A 202 GB data set was used for each load test. The data was evenly pre-partitioned to equal the number of parallel processing nodes in the Information Server job doing the load. Matching the data set partitions to the processing node counts reduces contention for the I/O subsystem and reduces CPU utilization. Figure 5 shows test results for loading the dataset using the Information Server to the target table on the IBM Netezza 1000-12 server over a 10 Gb/s Ethernet connection. As the number of Information Server data partitions is increased from 1 to 23, the throughput of the job increased from 146 to 2377 GB/hour. CPU monitoring data showed that the InfoSphere Information Server CPU utilization was only 30% even when the load job was running at peak throughput. The maximum load rate was achieved with 23 processing nodes which is the rated capacity of 12 the IBM Netezza appliance tested. (Other IBM Netezza appliances are available that provide even higher load rates.) Netezza Connector Load Throughput Throughput Avg. CPU % 2500 30 2377 25.4 2000 22.2 1500 25 20 GB/h 1161 15 1000 10 597 9.3 500 146 5 5.6 1.6 0 1 CPU Utilization (%) 2212 0 4 8 16 23 Number of DataStage Data Partitions Figure 5 – Loading data from Information Server engine server to IBM Netezza 1000-12 server 5.3 Netezza Connector Unload Performance Results Figure 6 shows test results of unloading data from the source table on IBM Netezza 1000-12 server using Information Server over a 10 Gbits/s Ethernet connection. Throughput of the unload job increased from 133 to 2579 GB/hour as we increased number of Information Server data partitions (or “processing nodes”) from 1 to 23. CPU monitoring data showed that CPU utilization on the InfoSphere Information Server system increased from 1.7% to 33.5% as the degree of parallelism increased. 13 Netezza Connector Unload Throughput Throughput Avg. CPU % 40 2579 2500 33.5 GB/h 2000 1838 1500 20 985 15 1000 500 133 10 9.5 5 5.6 1.7 0 1 30 25 22.4 513 35 CPU Utilization (%) 3000 0 4 8 16 23 Number of DataStage Data Partitions Figure 6 – Unloading data from IBM Netezza 1000-12 server using InfoSphere Information Server 6. Information Server Jobs Used for Testing To fully test the capabilities of the Netezza Connector stage, we must make sure that job performance is not bottlenecked by system resources, such as CPU, disk or network, on the Information Server Intel® Xeon® server. While a number of different input methods are available to the Netezza Connector, investigation revealed that a prepartitioned dataset would generate the best load and unload throughput. Using a pre-partitioned dataset significantly reduces CPU overhead and I/O requirements resulting in sufficient remaining resources on the Information Server Intel® Xeon® server to handle other workloads. Figure 7 shows screen shots of the load and unload jobs. Figure 7 – Load and Unload Information Server Jobs 14 7. Standard Practices and Guidelines for Effective Performance In this section, we present guidelines for the development of highly efficient and scalable implementation of integration jobs using the Netezza Connector for IBM InfoSphere Information Server and IBM Netezza data warehouse appliance. We have included a comprehensive detailed description of the different options available, and descriptions on how to use them to obtain scalable and efficient solutions for your scenarios. This section is organized as follows: • How to avoid partial loads • How to manage Temporary Work Tables (TWTs) • How to use the Temporary Work Table with user-defined SQL • How to set up parallel reads • How to use the action column mode • Transaction handling • How to use quoted IDs 7.1 How to Avoid Partial Loads When loading data into a IBM Netezza table in parallel using Information Server, there are multiple processing nodes loading data to the same target table in parallel. If one of the processing nodes encounters a problem, it aborts the job. However, since processing nodes run in parallel there is a possibility that one or more of them completes their part of the load before the job is terminated. This leads to partial data loads. This is referred to as the Direct insert mode in the Netezza Connector. To avoid this problem with loads, the Netezza Connector uses a staging table, referred to as the Temporary Work Table or TWT. The load of input rows to the target table is done in two phases. (1) First, each processing node writes its input records to the TWT. (2) Once all processing nodes complete the load phase successfully, the conductor process (one such process per job) executes a single insert statement from the TWT into the target table. This way the connector either inserts all input records to the target table or none at all. The other write modes such as Update, Delete, Update then Insert, Delete then Insert, User-defined SQL, and Action column always use TWT. 15 Figure 8 – Option for Direct Insert Mode 7.2 How to Manage Temporary Work Tables (TWTs) By default the connector automatically creates and drops the TWT as it runs in an InfoSphere Information Server job. Hence, the user name specified in the connection properties needs to have create and drop table privileges, which, depending on the security requirements in some deployments, may not be granted. If the user privileges would violate security policy, the connector can be configured to use a previously created TWT. This table can be created by an administrator with necessary privileges ahead of time. The TWT must match the input schema of the connector stage in a job and therefore a separate TWT must be maintained for each table you intend to load in this manner. Additionally, jobs using Runtime Column Propagation (RCP) might not be able to use a pre-existing TWT at all, since the link schema may not be known until the job runs. 16 A better way to manage TWTs is to keep them separate from the production tables in a separate database. The benefits of this approach are: • TWTs can be excluded from backup and restore. • It is easier to keep track of them, because they are kept in their own place. • Privileges can be controlled separately from those of the production database. • The connector can still create and drop them automatically. To enable this feature, set the connector’s Connection property Separate connection for TWT and specify the database name (optionally you can set a different user name and password). Figure 9 – Use separate connection for TWT 17 TIP: You can let the connector manage the TWT by setting the property Temporary work table mode to Automatic, or you can specify an existing TWT. If you decide to specify an existing TWT name, ensure that you include the database name. You can use a 2-part or a 3-part table name (e.g. twtdb.schema.table or twtdb..table) as shown in the picture below. Figure 10 – Temporary work table mode 18 7.3 How to Use the Temporary Work Table with User-defined SQL The Netezza Connector can either automatically generate write mode SQL statements or it can accept user-defined SQL statements. These statements are executed from the conductor process once all processing nodes finish loading the TWT. The question is how these statements should refer to the TWT. If the TWT was created beforehand or in the Before SQL property, the person designing the job knows its name and can refer to it in User-defined SQL. However, if the TWT is automatically generated by the connector, its name is not known until the job is run. The connector provides two place holders (or parameters) that can be used in the User-defined SQL to refer to the TWT and the target table. Place Holder Description [[twt]] Temporary work table name. [[table]] Target table name as specified in the Table name property. Table 3 –Place holder or parameters to use in SQL statement The place holders will be replaced with actual names at run time. Here are some examples on how user-defined SQL statements can be entered using the place holders: INSERT INTO [[table]] SELECT * FROM [[twt]] or UPDATE [[table]] SET [[table]].col1 = [[twt]].col1 Note that if the place holders are within a quoted string they will not be replaced. Single quotes are used to specify a text value, so '[[twt]]' would be treated as a value and will not be replaced. Double quotes are used for identifiers, so “[[twt]]” would be treated as a table or column name. 19 7.4 How to Setup Parallel Reads When creating an Information Server read job, by default the connector will read sequentially, that is, it will run on only one processing node. To set up the connector to read in parallel, set the connector property Enable partitioned reads to Yes. Figure 11 – Enable partitioned reads With this option set, the connector will generate statements that are slightly different on each processing node so that they read a different subset of rows. The connector uses modulus partitioning against the datasliceid column (a special IBM Netezza column). Note: When using a user-defined SQL statement with Enable partitioned reads set to Yes, the connector will execute the same statement on each processing node, which will produce duplicate data as each processing node sees all the rows. The connector provides the following place holders or parameters that can be used in the SQL statement to achieve partitioning: 20 Place holder Description Value when running sequentially [[node-count]] The total number of processing nodes. This represents the level of parallelism for the Netezza Connector stage and is equal to the number of processing nodes 1 [[node-number]] Current processing node zero-based index. For example if there are 4 processing nodes, the processing nodes will have indexes 0,1,2,3 0 Table 4 –Place holder or parameters to use in SQL statement Users can use these place holders to create partitioning clauses in their SQL statements. For example: SELECT * FROM table WHERE mod(datasliceid,[[node-count]])=[[node-number]] [[node-count]] will receive the same value in all processing nodes, while [[node-number]] will be substituted with a different value in each processing node. 7.5 How to Use the Action Column Mode Netezza Connector has a special mode suitable for replication jobs. It is called the Action column mode. In this mode, each incoming row specifies the type of SQL statement that is to be executed with that row's data. There are five supported actions: Insert (I), Update (U), Delete (D), Replace (R), and Merge (M). As the name of the mode suggests there is a special column in the input schema called Action column of type char(1) that can take one of the 5 action mode identifiers: I, U, D, R, M. The connector executes three statements in sequence: Delete, Update, and Insert. The first statement deletes the rows in the target table whose keys match the input rows with action D or R. The second statement updates the rows in the target table whose keys match the input rows with action U or M. The third statement inserts into the target table the rows with action I or R. It also inserts those input records with action M whose keys do not match any records already present in the target table. The Action column should only be present in the input link schema of the Netezza Connector stage and should not be in the target table. The name of the column is arbitrary but it should not match any target table columns. To setup an Action column job: 1) Set Write mode to Action column. 2) From the list of your input link columns, select the keys that will be used to match the records. 3) From the list of qualifying (text type) input link columns, select the action column (a single column). 21 Figure 12 – Action column Write mode 22 Figure 13 – Select keys to match records 23 Figure 14 – Select Action column 24 7.6 Transaction Handling Before and After SQL statement properties allow for multiple SQL statements to be specified (semicolonseparated). There are two properties that control the transaction scope of these statements and affect the job behavior if any of the statements fail. These properties are: Fail on error and Atomic mode. When the nature of the SQL statements in Before and After SQL is such that they are allowed to fail, for example, dropping a table that may not exist, set Fail on error to No. With this setting, the job will continue even if one or more statements fail. Additionally you can control if and how the related error message is to be logged using Log statement errors as property. If the nature of the SQL statements is such that it would not make sense to continue the execution of the job, set Fail on error to Yes and the job will be aborted on the first statement failure. SQL statements following the one that failed will not be executed. Additionally you can set Atomic mode to Yes to make sure that the SQL statements already executed before the failure are rolled back. Figure 15 – Fail on error and Atomic mode 25 Note: In Atomic mode, you cannot use any non-transactional SQL statements in this property. The write modes with more than one statement (Update then insert, Delete then insert, Action column and Userdefined SQL) also have the Atomic mode property. When Atomic mode is set to Yes, all statements are executed in one transaction (e.g. update and insert statements in Update then insert mode). If any one statement fails everything is rolled back. If Atomic mode is set to No, each is executed in a separate transaction. This for example means that if Insert part of Update then insert failed, the updates would still be committed. Figure 16 – SQL Atomic mode property Note: • Before SQL, After SQL, and Write mode SQL all have separate Atomic mode properties. The effect is that if a SQL statement fails in one of these properties, the SQL statements already executed in the other properties will not 26 be rolled back. In other words, the scope of Atomic mode is limited to its parent property (and associated SQL statements). • The connector processes all input rows in a single transaction that is committed at the end of the job execution. This makes the connector most suitable for batch-type jobs that can be invoked periodically (once a day for example) to process and load a large amount of data. This connector is not suitable for 'always-running' jobs, because the target table would never get updated. 27 7.7 How to Use Quoted IDs IBM Netezza supports two types of identifiers: regular and delimited. Regular identifiers are case insensitive and are stored in the database in lower or upper-case, default upper-case. They can contain only letters (in any alphabet, not just the Latin alphabet), syllables (as in the Japanese Hiragana syllabary), ideographs, decimal digits, underscores, and the dollar sign ($). Regular identifiers must begin with a letter; they cannot begin with a digit, underscore, or dollar sign. Regular identifiers also cannot be a SQL reserved word. The encoding in the IBM Netezza catalog is in UTF-8; the encoding for any display will depend on the client. Delimited identifiers are enclosed in double quotation marks and have special considerations. They are casesensitive. For example, a database named “Sales” is not the same database as one named SALES. The IBM Netezza system does not convert delimited identifiers to the default system case, nor does it save the enclosing double quotation marks in the database. Within the double quotation marks, a delimited identifier can include the same letters, syllables, ideographs, decimal digits, and underscores as a regular identifier, but it can also include spaces, special characters such as hyphens and percent signs, and SQL reserved keywords. A delimited identifier can begin with any of these letters, digits, or symbols. The connector supports delimited identifiers in the table name property and in any user-defined SQL statements. The connector behavior is also affected by the Enable case-sensitive identifiers property. 28 Figure 17 – Delimited identifiers in table name property 29 The following table depicts the behavior of the connector, given the type of table name and case-sensitivity setting: Table name property Case sensitive IDs Disabled db.schema.table • • “db.schema.table” • • • db.schema.”table” • • “db”.”schema”.”table” • • Enabled db, schema, and table are uppercased when used in system catalog queries. db, schema, and table are used as-is in generated SQL statements. • The whole string db.schema.table is assumed to be a delimited table name (dots are also considered part of the table name) This name is used as-is in system catalog queries. This name is used as-is in generated SQL statements. • • • • • db, schema, and table are used as-is in the system catalog queries. db, schema, and table are quoted in generated SQL statements. The dot character is still used as a delimiter and cannot be part of the ID name The whole string db.schema.table is assumed to be a delimited table name (dots are also considered part of the table name) This name is used as-is in system catalog queries. This name is used as-is in generated SQL statements. db and schema are uppercased when • used in system catalog queries; table is used as-is. db and schema are used as-is in gen- • erated SQL statements; table is quoted. db and schema are used as-is in the system catalog queries; table is used as-is. db, schema, and table are quoted in generated SQL statements. db, schema, and table are used as-is • in system catalog queries. db, schema, and table quoted in gen- • erated SQL statements. db, schema, and table are used as-is in system catalog queries. db, schema, and table quoted in generated SQL statements. Table 5 – Case sensitive IDs and connector behavior 30 7.8 Additional Best Practices for Optimal Performance Summary of Best Practices for Optimal Performance The following is a summary of best practices to optimize performance for data integration with IBM Netezza 1000-12 Data Warehouse Appliance, IBM InfoSphere Information Server, and Intel® Xeon® Servers: • IBM Netezza 1000-12 data warehouse appliance – Use Software defaults, no tuning required! • IBM InfoSphere Information Server connector for IBM Netezza - Default setting in product. Use Software defaults, no tuning required! • Use high-performance solid-state drives (SSD) or RAM disk if adequate memory is available for temporary and scratch space allocations for Information Server. Allocate at least 100 GB for each on a server. This can improve performance of jobs using temporary or scratch space • Use Intel 10 GbE NICs and drivers configured with jumbo frames. ® 10 GbE is required to reach the data rates shown here as network bandwidth exceeds 6 Gbits/s. Jumbo frames provide single digit percentage improvements in our tests. • Use the standard practices for effective performance as described in Section 7: avoiding partial loads managing Temporary Work Tables (TWTs) using the Temporary Work Table with user-defined SQL setting up parallel reads using action column mode using transaction handling using quoted IDs 31 8. Conclusion IBM InfoSphere Information Server can be deployed on powerful mission critical servers like the Intel® Xeon® platforms based on the E7-4870 series to efficiently process data at extremely high data rates. As a result, I/O and network bandwidth are extremely important for high performance. Network interconnects like 10 Gbit/s Ethernet or 40 Gbit/s Fiber Channel are necessary to fully realize the computation potential of this powerful combination of hardware and software. The performance of IBM InfoSphere Information Server on Intel E7-4870 servers can easily exceed the bandwidth capability of 1 Gbit/s Ethernet when loading data to the IBM Netezza data warehouse appliance (or processing data to or from many other sources/sinks). Intel 10 Gbit/s Ethernet adapters provide the necessary bandwidth and features to optimize performance of the network when moving to 10 GbE. The test configuration shown in the paper yield load and unload data rates in excess of 2.3 TB/hour between the IBM InfoSphere Information Server running Intel E7-4870 processors and the IBM Netezza 1000 series data warehouse appliance. This performance level resulted in an average of 6 Gbits/s over the network connection between the two servers. Such high performance rates might also require the use of larger memory based (or SSD devices) to maintain scratch and temporary data required by Information Server. A RAMDISK was used to replace slower hard drive based storage. Accelerate Time to Value by Using IBM InfoSphere Information Server and IBM Netezza Data Warehouse Appliance Together IBM InfoSphere Information Server provides more trustworthy information--information that is comprehensive, accurate, timely, and in context to the data warehouse. The purpose-built IBM Netezza data warehouse appliance is designed to deliver new standards of performance, simplicity, and scalability for break-through analytics. IBM InfoSphere Information Server v8.7 introduces new, faster and feature-rich connectivity to IBM Netezza to accelerate your time to value when deploying a more trustworthy data warehouse solution. The new Netezza Connector is embedded with many user friendly and useful features based on customer feedback and industry trends. It is based on using IBM Netezza’s External Table method for bulk loading and extraction. It is capable of doing high performance extraction through parallel read. It supports multiple load operations, such as Insert, Update then insert, Delete then insert, Update, Delete, Action column based insert, and custom SQL based insert. It’s smart enough to allow the use of existing temporary work tables or to automatically create new ones, plus it allows temporary work tables to be located in a separate database, thus avoiding the possibility of corrupting the target table and misuse of user privileges. It supports multiple table operations such as truncate, drop, replace, or create with the unique ability to utilize different distribution key options such as user defined, automatic or random. There are additional features that provide end users with even more capabilities, such as the ability to execute before and after SQL statements from either a single node or multiple nodes, schema reconciliation, and run time column propagation. The IBM InfoSphere Information Server Balanced Optimization is very unique when compared to other offerings in the market. It is capable of pushing transformation processing to both source and target databases. This provides users the ability to optimize the usage of the variety of resources in their IT environment. The data transformation processing can be done entirely on dedicated servers, or the entire processing can be pushed into IBM Netezza, or a hybrid model can be used where data transformation is done both on dedicated data transformation servers and inside of IBM Netezza. The original jobs and the Balanced Optimized jobs are kept in the same repository. This provides users the capability to design execution logic to choose which job to run depending on the current resource load in their environment. Optimize Return on Investment with the Powerful Combination of IBM Mission Critical Software and Intel® Xeon® E7 Mission Critical Processors 32 In summary, IBM InfoSphere Information Server provides enhanced native connectivity to the IBM Netezza data warehouse appliance through the new IBM InfoSphere Information Server connector for Netezza. It supports IBM InfoSphere Information Server Balanced Optimization, allowing the choice of where transformations occur, either in the source database, in the target IBM Netezza database, in the high-performance Information Server engine, or in any or all of those locations. Combining powerful mission critical servers like the Intel® Xeon® Platforms based on the E7-4870 series processor running IBM InfoSphere Information Server parallel engine can efficiently process data to and from the IBM Netezza server at impressive data rates. 9. Acknowledgements Many thanks to the contributors who have provided help during our benchmark effort, their valuable input, and reviews to this document: • Jantz Tran (Sr. Staff Software and Hardware Engineer, Intel) • Yeh-Heng Sheng (Software Architect, Balance Job Optimization, ISAS/Information Server, IBM) • Gary Faircloth (User Technology Engineer: User Assistance Design, Writing, and Production Infrastructure, IBM) 33 10. About the Authors Garrett Drysdale is a Software and Hardware Performance Engineer for Intel. Garrett has analyzed and optimized software on Intel® platforms since 1995 spanning client, workstation, and enterprise server market segments. Garrett currently works with Cloud and Data Center software developers to analyze and optimize Cloud apps on Intel servers, and with internal design teams to assist in evaluating the impact of new technologies on software performance for future Intel® platforms. Garrett has a BSEE from University of Missouri-Rolla and a MSEE from The Georgia Institute of Technology. His email is [email protected]. Dr. Sriram Padmanabhan is an IBM Distinguished Engineer, and Chief Architect for IBM InfoSphere Servers. Most recently, he had led the Information Management Advanced Technologies team investigating new technical areas such as the impact of Web 2.0 information access and delivery. He was a Research Staff Member and then a manager of the Database Technology group at IBM T.J. Watson Research Center for several years. He was a key technologist for DB2’s shared-nothing parallel database feature and one of the originators of DB2’s multi-dimensional clustering feature. He was also a chief architect for Data Warehouse Edition which provides integrated warehousing and business intelligence capabilities enhancing DB2. Dr. Padmanabhan has authored more than 25 publications including a book chapter on DB2 in a popular database text book, several journal articles, and many papers in leading database conferences. His email is [email protected]. Branislav Barnak is a Software Architect at IBM. He works on IBM InfoSphere DataStage connectivity and is focusing on IBM technologies such as DB2, WebSphere MQ, mainframe data storage systems, and others. Branislav has been involved with the development of connectivity technologies for more than 10 years. He has worked on several connectivity frameworks interfacing with database and messaging products such as Oracle, PeopleSoft, Teradata, DB2, WebSphere MQ, and others. He holds a BS degree in Electrical Engineering from the University of Belgrade, Serbia. His email address is [email protected]. Brian Caufield is a Software Architect for IBM InfoSphere Information Server responsible for the definition and design of new IBM InfoSphere Information Server features, and also works with the Information Server Performance Team. Brian represents IBM at the TPC, working to define an industry standard benchmark for data integration. Previously, Brian worked for 10 years as a developer on IBM InfoSphere Information Server specializing in the parallel engine. His email is [email protected]. As Product Manager for IBM InfoSphere DataStage, Tony Curcio works with organizations who are leveraging IBM’s scalable data integration platform to enable their Trusted Information infra-structure. In this role, he is responsible for aligning product development direction to best fit the needs of IBM customers and industry trends. Tony also serves as a focal point for product synergy on joint InfoSphere and Cognos technologies. His email is [email protected] 34 Jon Deng is the Product Manager for IBM Information Server Connectivity where he works with partners and vendors to integrate their products with Information Server. Jon has over 15 years of experience in enterprise Software, Hardware and Industry solutions. Key areas of expertise include Information Integration, BIG Data, Data Warehousing, Business Intelligence, Real Time Analytics, Cloud computing, Web2.0/Mashups, and Production Planning, Scheduling and Sequencing methodologies. His email is [email protected] David Qiang Li is a Software Performance Engineer who specializes in software performance, stress and load testing for past 12 years. Before joining IBM InfoSphere product group, David worked at IBM Rational Performance Engineering team, and played a key role in improving quality and performance of Rational ClearCase, Rational ClearQuest and Rational Team Concert release over release. David was speaker at Rational Software Development Conference in 2007 and 2009, and IBM Information On Demand Conference in 2011. Based in Littleton MA, David frequently attends lectures and conferences in the IT field at Harvard and MIT. His email is [email protected]. Lu Liang has many years of experience on System Test and Performance. Currently, Liang is a software engineer in InfoSphere Information Server Performance Team. He has published many technical articles for Information Server in China's developWorks. Besides being certified as an advanced DB2 administrator, he is also familiar with other relational database, acquainted with J2EE and Web technology. He always keeps a keen eye on Information Management, new technology and open source projects. His email is [email protected]. Mi Wan Shum is the manager of the IBM InfoSphere Information Server performance team at the IBM Silicon Valley Lab. She graduated from University of Texas at Austin and she has years of software development experience in IBM. Her email is [email protected] John Skier is the Director of IBM's Netezza partner integration team where he worked with partners to integrate their product offerings optimally with IBM Netezza's data warehouse appliances using set base SQL processing techniques. He is a joint patent holder for Informix SQL engine technology and has been a member of the TPC and SQL standards committees. He's spent his career developing, implementing and delivering high performance standards based solutions for data warehousing and on-line transaction processing systems in both commercial and government sectors using SQL technology. His email is [email protected]. Samuel Wong is a member of the IBM InfoSphere InfoSphere Information Server performance team at the IBM Silicon Valley Lab. He graduated from University of Toronto and he has over 10 years of software development experience with IBM. His email is [email protected] 35 11.Intel Legal Disclaimer Performance tests and ratings are measured using specific computer systems and / or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, visit http://www.intel.com/performance/. THIS DOCUMENT AND RELATED MATERIALS AND INFORMATION ARE PROVIDED "AS IS" WITH NO WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NON-INFRINGEMENT OF INTELLECTUAL PROPERTY RIGHTS, OR ANY WARRANTY OTHERWISE ARISING OUT OF ANY PROPOSAL, SPECIFICATION, OR SAMPLE. INTEL ASSUMES NO RESPONSIBILITY FOR ANY ERRORS CONTAINED IN THIS DOCUMENT AND HAS NO LIABILITIES OR OBLIGATIONS FOR ANY DAMAGES ARISING FROM OR IN CONNECTION WITH THE USE OF THIS DOCUMENT. All products, product descriptions, plans, dates, and figures are preliminary based on current expectations and subject to change without notice. Availability may vary in different channels. *Other names and brands may be claimed as the property of others. © 2012, Intel Corporation. All rights reserved. Intel, the Intel logo, Core, Itanium, NetBurst, Pentium, and VTune are trademarks of Intel Corporation in the U.S. and other countries. Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 36 12.IBM Legal Disclaimer THE INFORMATION CONTAINED IN THIS DOCUMENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS DOCUMENTATION, IT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS DOCUMENTATION IS INTENDED TO, NOR SHALL HAVE THE EFFECT OF, CREATING ANY WARRANTIES OR REPRESENTATIONS FROM IBM (OR ITS SUPPLIERS OR LICENSORS), OR ALTERING THE TERMS AND CONDITIONS OF ANY AGREEMENT OR LICENSE GOVERNING THE USE OF IBM PRODUCTS AND/OR SOFTWARE. The results reported in this document were achieved under controlled lab conditions that represent an optimal test case scenario. IBM does not guarantee these results and individual results will vary. IBM, the IBM logo, ibm.com, InfoSphere, Information Server, and DataStage are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml 37