...

Using DB2 BLU to enable IBM Cúram Business Intelligence and Analytics

by user

on
Category: Documents
76

views

Report

Comments

Transcript

Using DB2 BLU to enable IBM Cúram Business Intelligence and Analytics
Using DB2 BLU to enable
IBM Cúram Business Intelligence and Analytics
Anthony Farrell is a senior software engineer in the IBM Cúram platform group. Anthony has
technical responsibility for Business Intelligence in the Cúram platform, has 10 years’ experience of
Cúram product development, is a certified Java programmer with strong Java, Java Platform
Enterprise Edition, and business intelligence experience. Anthony Farrell [email protected]
@ Copyright International Business Machines Corporation 2015.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP
Schedule Contract with IBM Corp.
CONTENTS
1
Introduction .......................................................................................................... 1
1.1
Purpose.................................................................................................... 1
1.2
Audience .................................................................................................. 1
1.3
Pre-requisites ........................................................................................... 1
2
What is DB2 BLU? ............................................................................................... 1
3
Enabling BIA in a BLU sandbox ........................................................................... 2
4
3.1
Enabling BLU in a new database.............................................................. 2
3.2
Enabling BLU in an existing database ...................................................... 2
3.3
Creating column-organized tables ............................................................ 3
Using DDL statements .............................................................................. 3
3.3.2
Using db2convert ...................................................................................... 5
3.3.3
Using the Workload Table Organization Advisor (WTOA) ....................... 5
Planning your BIA Rollout with BLU ..................................................................... 6
4.1
When BLU is best..................................................................................... 6
4.2
Columnar table adoption strategies for Cúram BIA databases ................. 7
4.3
5
3.3.1
4.2.1
BIA staging area ....................................................................................... 7
4.2.2
BIA warehouse.......................................................................................... 7
4.2.3
BIA data mart ............................................................................................ 7
Tuning and other considerations .............................................................. 8
4.3.1
Row and column-based tables co-existing ............................................... 8
4.3.2
Column organized tables and Indexes ..................................................... 8
4.3.3
Materialized Query Tables ........................................................................ 8
4.3.4
Reading from files ..................................................................................... 9
4.3.5
DB2 Warehouse data flow configuration settings ..................................... 9
Acknowledgements .............................................................................................. 9
Appendix A – Sample SQL ............................................................................................ 10
Appendix B – Further Reading ...................................................................................... 11
ii
Introduction
1
Introduction
Cúram Business Intelligence and Analytics is a decision support solution that helps social program
organizations analyze the effectiveness of their programs and gain insight into the efficiency of their
operations. It is configurable to meet each organization’s specific needs and scalable from the
program to enterprise level.
It consists of embedded analytics, domain-specific dashboards, a complete Extract, Transform,
Load (ETL) framework and tool-independent, pre-defined, domain-specific data marts. Cúram
Business Intelligence and Analytics delivers contextually relevant, embedded Analytics through
pre-packaged domain-specific reports.
1.1
Purpose
This article shows that IBM Cúram Business Intelligence and Analytics (BIA) can work with DB2
BLU version 10.5 FP5.
In this article, you are introduced to DB2 10.5 BLU and you are shown how to start by using BLU
technology. You are shown how to set up a BLU Cúram BIA sandbox and several scenarios for
adopting BLU technology with Cúram BIA are discussed.
1.2
Audience
This article is intended for technical users who are tasked with developing solutions that use Cúram
BIA, or for technical users who are aware of DB2 10.5 BLU without extensive expertise.
1.3
2
Pre-requisites

IBM Cúram development environment with Business Intelligence and Analytics
installed. The SQL tools in this article were tested against Cúram version 6.1.0.1.

Knowledge of the Cúram Business Intelligence and Analytics components.

Awareness of DB2 10.5 BLU Acceleration.

SQL skills and the ability to run the sample SQL.

The SQL in this article was tested against DB2 10.5 Fix Pack 5.
What is DB2 BLU?
IBM DB2 with BLU Acceleration is the next generation database technology for in-memory
computing. It delivers a combination of innovations from IBM Research & Development labs, BLU
Acceleration provides breakthrough performance.
BLU technology is seamlessly integrated in DB2 10.5. In addition, BLU enables faster time–tovalue through a choice of deployment models of on-premises or cloud.
DB2 BLU uses in-memory computing for faster insight into your data; it increases the speed of your
analytic workloads with columnar tables. It does not require SQL or schema changes to be
implemented to your application code.
1
If you are seeing storage savings with adaptive compression, further savings are possible with
column-organized tables in BLU. Another direct impact of reducing the on-disk footprint is faster
backup, restore, and smaller backup images.
3
Enabling BIA in a BLU sandbox
For further guidance on creating BIA data warehouse database’s please review the Reporting
developer guide, creating DB2 target schema’s.
3.1
Enabling BLU in a new database
Creating a DB2 database is the typical approach when setting up a new BIA sandbox, this
approach provides the fastest setup to experience BLU technology. Configure your DB2 instance
with the following steps: 1. Set the analytics registry variable. This variable tells the database that it is intended to
be used for analytic workloads. Run:
db2set db2_workload=analytics.
2. Restart the instance by using the following command.
db2stop force followed by db2start.
3. Apply the db2_workload setting to the database by using the following command.
Run:
autoconfigure apply db and dbm.
4. Create the database by using the following command.
create database BIA_BLU using codeset utf-8 territory us collate
using identity
5. Create the table space by using the following command.
create tablespace BIA_BLU_tbsp
Because the database was created after setting DB2_WORKLOAD=ANALYTICS, the default page
size for the table spaces is 32K and the extent size is 4 pages. You are now ready to create
column organized tables. In fact, all tables that are created in this database default to columnar, to
create row organized tables in this database you must use the “organise by row” clause on the
create table definition.
3.2
Enabling BLU in an existing database
You can enable BLU in an existing database, follow the steps to enable BLU in an existing
database.

Set the analytics registry variable. This variable tells the database that it is intended to
be used for analytic workloads. Run:
2
Enabling BIA in a BLU sandbox
db2set db2_workload=analytics

Restart the instance by using.
db2stop force followed by db2start.

The following command applies the db2_workload setting to the database.
autoconfigure apply db and dbm

Now enable automatic storage. Run the following command:
create stogroup BIA_BLU_mystg on '/dbfs1' set as default

To convert existing DBMS table spaces to use automatic storage, run the two ALTER
TABLESPACE commands:
alter tablespace BIA_BLU_tbsp managed by automatic storage
alter tablespace BIA_BLU_tbsp rebalance

To create new automatic storage table spaces, run the CREATE TABLESPACE
command. Because this table space is used to store column-organized table data, use
the recommended page size of 32K and the extent size of 4 pages.
create tablespace BIA_BLU_tbsp pagesize 32k extentsize 4
The cost of database administration can be a large part of the cost of the managing database
software. BLU supports autonomics to lower the cost of database administration. Autonomics is
the concept of database self-management, allowing the database to self-tune, self-manage
containers and self-tune memory all of which drives better default operation that lowers the total
cost of ownership. If you are not already using automatic storage table spaces, you can get started
by running the CREATE STOGROUP command.
3.3
3.3.1
Creating column-organized tables
Using DDL statements
If the database was configured with DB2_WORKLOAD=ANALYTICS, newly created tables are
column-organized by default. For example, running the following DDL statements create a table
that is column organized.
create table DM_DIMCASESTATUSCOL (statuskey bigint not null, status
varchar(60));
Otherwise, you must use the optional “organise by clause”, the following example creates a row
organized table.
create table DM_DIMCASESTATUSROW (statuskey bigint not null, status
varchar(60)) organize by row;
3
4
Enabling BIA in a BLU sandbox
You can also load data from a row-organized table into a column-organized table by using the
following commands:
load from DIMCASESTATUSROW of del replace into DM_DIMCASESTATUSCOL
The load command automatically builds the columnar compression dictionaries, synopsis table and
collect statistics. With regards to compression, bear in mind you need a large enough data set
(several gigabytes) that contains a good representative subset of data values to achieve optimal
compression. In cases with strongly unsorted input data, the effectiveness of data skipping can
sometimes be improved by pre-sorting data on numeric or date/time columns that are frequently
used in query predicates.
3.3.2
Using db2convert
Use the db2convert command to convert one or all row-organized tables into column-organized tables,
row-organized tables remain online during this conversion. The db2convert command starts the
ADMIN_MOVE_TABLE stored procedure. Ensure you back up the database or table spaces as the conversion
is one way.
db2convert -d BIA_BLU -z db2admin -t DM_DIMCASESTATUSROW
This command converts the DM_DIMCASESTATUSROW table in the BIA_BLU database from roworganized to column-organized format. The -z option specifies the schema for the table and the -ts
option species the table space in which the column-organized table should be created.
To convert all tables in a database use the command:
db2convert –d BIA_BLU
When the command completes you get a message:
SQL2446I The db2convert command completed successfully. All row-organized tables that satisfy
the specified matching criteria have been converted to column-organized tables.
3.3.3
Using the Workload Table Organization Advisor (WTOA)
Before you use this tool, make sure to back up the database, and as always refer to the DB2
documentation for full details before implementing any changes.
This advisor tool can be used to identify tables that might be good candidates for columnorganization. The DB2 advisor presents its analysis and rationale, the advisor lists the tables that
are might benefit from conversion and those tables that might not. You can see the cardinalities of
the tables, how many statements in the query workload referenced them, the cumulative cost of
running the statements that reference a table, the statements themselves, and more. With the
advisor, you can view and save DDL for implementing the changes that are necessary to convert
the tables to column organized.
Now, you should be able to measure the compression savings and start running queries without
any SQL or DDL changes. Question two in the Appendix shows the SQL to measure disk space
consumption.
5
4
Planning your BIA Rollout with BLU
4.1
When BLU is best
The BLU feature in DB2 with version 10.5 is intended for analytic (warehouse / data mart type)
workloads. These workloads can often involve many thousands of users who run regular report
queries. If you have databases that have a mixture of tables that are used for transaction
processing workloads (OLTP) and analytic workloads (OLAP). The following table shows some of
the markers that suggest tables can be suited to column-organized.
If the number of read requests is much greater than the number of write requests, then columnorganized tables can help. Column-organized tables have several advantages when compared to
the row-organized tables:

Read efficiency. In a column-organized table, only those columns that are needed are
read, while each row is read in a row-organized table.

Compression efficiency. Typically, a column-organized table features a compression ratio
that is higher than a row-organized table compression ratio.

Sorting and indexing efficiency. Sorted columns compress more efficiently and range
queries run faster.
Use column-organized tables.
Use row-organized tables
Analytical Workloads servicing Warehouse or data
marts (Star schema or dimensionally designed
tables)
Online transaction processing workloads (highly
normalized databases that are designed for high
levels of transaction through put)
Queries that use grouping, sorting, aggregation.
Applications where the majority of queries return
one or a few rows.
Queries where a subset of table columns is typically
used. Table scans over a star schema are typically
suited to column-organized tables.
Applications that insert or update one or few
records in a single transaction.
Performance is optimal when inserting large
numbers of records, 100's or 1000's of records for
large insert or update transactions. Typically 1000's
to amortize the transaction costs.
Materialized Query Tables (MQT) are created to
improve performance. MQT is no longer required
with column-organized tables as performance is
already optimal.
Applications that store LOB within tables or XML
within tables.
6
Planning your BIA Rollout with BLU
4.2
Columnar table adoption strategies for Cúram BIA databases
The default database configuration for BIA is a two tiered model. That is, a single database is
created to host the BIA Staging schema and the BIA Warehouse schema. A second database is
created to host the BIA data mart schema.
4.2.1
BIA staging area
Row-based organization is best for these tables. The number of read requests is not much greater
than the number of write requests. Also, typically, all columns are read by warehouse ETL
processes when updating warehouse tables. Using the Workload Table Advisor Tool, you can
identify any staging tables that might benefit from column organization.
4.2.2
BIA warehouse
Row-based organization is best for these tables. The number of read requests is not much greater
than the number of write requests. Also, typically, all columns are read by data mart ETL
processes when updating data mart tables. Using the Workload Table Advisor Tool, you can
identify any staging tables that might benefit from column organization.
4.2.3
BIA data mart
Analytics are built from data mart tables, as a result, the number of read requests is much greater
than the number of write requests. With green field projects, deciding how to use columnar tables
is simpler. The data mart database is created as a columnar database, this configuration results in
all Fact and Dimensions tables being column-organised.
If you have production data, you might want a more fine grained strategy to achieve “the greatest
return for lowest effort”. Using the Workload Table Advisor Tool, identify the Fact table’s most in
use, for each Fact table identify the most used dimensions and then convert this cohort of Fact and
dimension tables to column-organised tables. Prioritize the remaining output from the advisor tool,
you can then convert the remaining Fact and dimensions tables.
7
4.3
4.3.1
Tuning and other considerations
Row and column-based tables co-existing
To, gain full advantage of the optimizations that are provided by the columnar execution engine,
SQL statements must join columnar tables to columnar tables. Introducing row-based tables into
SQL queries can negate many of the optimizations that are otherwise possible. If you are running a
mixture of row based and columnar tables in a single database, ensure that your database is tuned
such that OLAP workloads are not negatively affected by your columnar tables. Columnar
databases typically require a very high level of database sort memory, this can negatively affect
row-based queries.
With FP4 DB2 introduced configuration that allows row and column-based tables to co-exist in the
same database, without OLTP queries on row tables being adversely affected. Also, DB2 FP5
contains many updates to the DB2 Warehouse components; Warehouse Design Client, SQW, and
so on. To ensure that you are getting the best from your DB2 stack you are advised to stay current
with Fix Packs.
4.3.2
Column organized tables and Indexes
By default BIA DDL includes indexes to improve performance, these indexes are no longer
required with a columnar database. Only indexes that are required to enforce primary keys or
unique constraints are required, other indexes that support performance are no longer required.
4.3.3
Materialized Query Tables
Your current architecture might include MQT to support analytics. For example, to support charts
that are written with BIRT or reports that are written in other technologies like Cognos.
If you created MQT for performance reasons then you should consider changing your analytics to
point directly at columnar tables, as DB2 BLU provides performance that is good when compared
to MQT. DB2 advises that there is now little performance gain from MQT when compared to
columnar tables.
Materialized Query Tables used as part of a “resilience or failover” 1 process.
MQT are sometimes used as part of a “resilience or failover” process. For example, MQT can
remain online for analytics users when the underlying warehouse is being updated. If the
warehouse load completes successfully, MQT are then rebuilt with the latest data. If the warehouse
load fails the existing MQT remain in place until the warehouse load succeeds. Because you are
using MQT for resilience or failover, you can continue to do so.
1
MQT is used by Cúram clients as a resilience technique during ETL processing.
8
Acknowledgements
4.3.4
Reading from files
Some BIA ETL processes read data from comma-separated-files (CSV). As you upgrade to newer
versions of DB2 you might need to rebind ETL data flows against any CSV files. This might be
required as newer versions of DB2 might provide updated usage notes for the LOAD command.
4.3.5
DB2 Warehouse data flow configuration settings
Further design time and runtime configuration options are possible with BIA ETL data flows
including: row compression for internal staging tables, enable optimization for code generation,
jdbc batch size and so on. Depending on your local topology and data patterns, you are advised to
change these configuration settings locally.
5
Acknowledgements
I would like to thank Patrick Fagan, Chief Architect, Cúram Platform Group for his encouragement.
I also want to thank the following members of the Cúram product development team for their input
and support: 
Joseph Glackin, Senior Software Engineer, who tested the SQL tools in this article and
performed a technical review this document.

Eoin Fitzpatrick, Software Engineer, Cúram Product Development, Cúram and Smarter
Care, IBM Watson Health, who performed a technical review this article.

Caitriona nic Lughadha, Information Development Manager, Functional Verification Test
Manager Cúram SPM, who performed an editorial review of this article.

Andrew Foley, System Verification Test Engineer, who installed DB2 version 10.5 FP5 on
Linux.

Conor Cullen, Partner Enablement Manager, Cúram and Smarter Care, IBM Watson
Health, who provided a business review of the document.

I would also like to thank, Albert Z. Yao, Warehouse Tooling, InfoSphere Warehouse, for
confirming that DB2 Warehouse supports DB2 BLU. It was also noted by Albert that
clients should stay current with DB2 Fix packs to avail of the latest supported features and
fixes.
.
9
APPENDIX A – SAMPLE SQL
Question: How do I verify which tables are row-organized and, which are column-organized?
Answer: Read the TABLEORG column from the syscat.tables to verify if a table is row-organized
(TABLEORG=R) or column-organized (TABLEORG=C). See the example SQL which lists the table
organization for the tables created in this article.
select tabname, tableorg from syscat.tables
Question: How do I evaluate any storage saving with BLU compression?
Answer: The following example query can be used to get the total on-disk footprint for the columnorganized tables versus the same data loaded into row-organized tables
select
systemTables.tabname,
rowcompmode,
pctpagessaved,
data_object_p_size,
index_object_p_size,
col_object_p_size,
data_object_p_size + index_object_p_size + col_object_p_size as
totalSizeInKB
from
syscat.tables systemTables,
sysibmadm.admintabinfo adminInfo
where
systemTables.tabname = adminInfo.tabname and
systemTables.tabschema = adminInfo.tabschema
10
APPENDIX B – FURTHER READING

DB2 10.5. What’s new in the SQL Warehousing Tool
What's new in SQW for DB2 Warehouse V10.5

DB2 10.5 Knowledge Center:
www01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.kc.doc/welcome.html

Big Data & Analytics Hub
http://www.ibmbigdatahub.com/blog/super-analytics-super-easy

DB2 Best Practices Community
https://www.ibm.com/developerworks/community/groups/

Tuning and Configuring DB2 BLU:
https://www.ibm.com/developerworks/community

Optimizing ETL performance with DB2 BLU:
http://www.ibm.com/developerworks/data/library/techarticle/dm-1504etl-blu-accel/dm-1504etl-bluaccel-pdf.pdf

Options for converting tables to column organized tables:
http://www.ibm.com/developerworks/library/dm-1406convert-table-db2105/

Cúram Social Program Management
http://www-01.ibm.com/support/knowledgecenter/SS8S5A/welcome?lang=en

Cúram SPMP Analytics Accelerator
http://www01.ibm.com/support/knowledgecenter/SS8S5A_6.1.0/com.ibm.curam.content.doc/SPMAnalytics/SP
MOverview/c_SPM_SPMAnalyticsOverview.html

Installing a Cúram Development Environment
http://www01.ibm.com/support/knowledgecenter/SS8S5A_6.1.0/com.ibm.curam.content.doc/install_Developm
entEnvironment/ctr_install_installingthedevelopmentenvironment.html?lang=en

Cúram Business Intelligence and Analytics
http://www-03.ibm.com/software/products/en/business-intelligence-analytics

Cúram Business Intelligence and Analytics Knowledge Center
http://www01.ibm.com/support/knowledgecenter/SS8S5A_6.1.1/com.ibm.curam.nav.doc/common/t_ctr_bia.ht
ml

Cúram Business Intelligence and Analytics Knowledge Center – Creating BIA database’s
http://www01.ibm.com/support/knowledgecenter/SS8S5A_6.1.1/com.ibm.curam.content.doc/BIReportingDeve
loper/c_BEREP_Installandconfig1CreateDbTargetSchemas1.html
11
®
© Copyright IBM Corporation 2015
IBM United States of America
Produced in the United States of America
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP
Schedule Contract with IBM Corp.
IBM can not offer the products, services, or features discussed in this document in other countries.
Consult your local IBM representative for information on the products and services currently
available in your area. Any reference to an IBM product, program, or service is not intended to
state or imply that only that IBM product, program, or service may be used. Any functionally
equivalent product, program, or service that does not infringe any IBM intellectual property right
may be used instead. However, it is the user's responsibility to evaluate and verify the operation of
any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this
document. The furnishing of this document does not grant you any license to these patents. You
can send license inquiries, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where
such provisions are inconsistent with local law:
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PAPER “AS IS”
WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR
FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or
implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes may be
made periodically to the information herein; these changes may be incorporated in subsequent
versions of the paper. IBM may make improvements and/or changes in the product(s) and/or the
program(s) described in this paper at any time without notice.
Any references in this document to non-IBM Web sites are provided for convenience only and do
not in any manner serve as an endorsement of those Web sites. The materials at those Web sites
are not part of the materials for this IBM product and use of those Web sites is at your own risk.
12
IBM may have patents or pending patent applications covering subject matter described in this
document. The furnishing of this document does not give you any license to these patents. You
can send license inquiries, in writing, to:
IBM Director of Licensing
IBM Corporation
4205 South Miami Boulevard
Research Triangle Park, NC 27709 U.S.A.
All statements regarding IBM's future direction or intent are subject to change or withdrawal without
notice, and represent goals and objectives only.
This information is for planning purposes only. The information herein is subject to change before
the products described become available.
If you are viewing this information softcopy, the photographs and color illustrations may not appear.
13
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corporation in the United States, other countries, or both. If these and other IBM
trademarked terms are marked on their first occurrence in this information with a trademark symbol
(® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the
time this information was published. Such trademarks may also be registered or common law
trademarks in other countries. A current list of IBM trademarks is available on the web at
"Copyright and trademark information" at http://www.ibm.com/legal/copytrade.shtml.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in
the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or
its affiliates.
14
Fly UP