Transcript
z Analytics WW Sales and Technical Sales Boot Camp 2015 – January 26-30, 2015
© 2015 IBM Corporation

Building a Big Data Solution Using IBM DB2 for z/OS
Jane Man, Senior Software Engineer, [email protected]

Please note
IBM's statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM's sole discretion. Information regarding potential future products is intended to outline our general product direction and should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality, and may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remain at our sole discretion.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.

Agenda
• Big Data overview
• What is Hadoop?
• What is BigInsights?
• DB2 11 for z/OS connectors to Big Data: What is it? How does it work?
• Use cases
• Other: BigInsights Big SQL 3.0, DoveTail Co:Z, Sqoop, Veristorm Enterprise
• Summary, questions and answers
IBM Silicon Valley Laboratory - 2014 WW Tech Sales Boot Camp

Big Data is All Data – And All Paradigms
• Transactional & application data: volume, structured, throughput
• Machine data: velocity, semi-structured, ingestion
• Social data: variety, highly unstructured, veracity
• Enterprise content: variety, highly unstructured, volume

There is demand for differently structured data to be seamlessly integrated, to augment analytics and decisions:
• Analytics and decision engines reside where the data warehouse / transaction data is
• "Noise" (veracity) surrounds the core business data: social media, emails, docs, telemetry, voice, video, content
• Expanding our insights – getting closer to the "truth": data warehouse integration, business analytics, DB2 for z/OS, IMS, information governance
• Lower risk and cost, increased profitability – the "circle of trust" widens

Big Data Use Cases
• Big Data exploration: find, visualize, and understand all big data to improve decision making
• Enhanced 360° view of the customer: extend existing customer views (MDM, CRM, etc.) by incorporating additional internal and external information sources
• Security/intelligence extension: lower risk, detect fraud, and monitor cyber security in real time
• Operations analysis: analyze a variety of machine data for improved business results
• Data warehouse augmentation: integrate big data and data warehouse capabilities to increase operational efficiency

What is Hadoop?
An open source software framework that
supports data-intensive distributed applications: high-throughput, batch processing that runs on large clusters of commodity hardware.
• Yahoo ran a 4,000-node Hadoop cluster in 2008
• Two main components:
  – Hadoop Distributed File System: self-healing, high-bandwidth clustered storage
  – MapReduce engine

Hadoop: The Underlying Principle
• Lots of redundant disks – really inexpensive disks
• Lots of cores – inexpensive cores working all the time
• Disks crash – that's OK, just replace them
• Processors fail – that's OK, just replace them
• Network errors happen – that's OK, just retry
• Disks and processors are networked

Hadoop Distributed File System (HDFS)
• Files are broken into large blocks (default = 64 MB)
• Blocks are replicated (default = 3 times) and distributed across the cluster for durability, availability, and throughput
• Optimized for streaming reads of large files
• Write-once-read-many access model, append-only

MapReduce
A simple, yet powerful framework for parallel computation – applicable to many problems, with a flexible data format. Basic steps:
1. Do a parallel computation (Map) on each block (split) of data in an HDFS file and output a stream of (key, value) pairs to the local file system
2. Redistribute (shuffle) the map output by key
3. Do another parallel computation on the redistributed map output and write the results into HDFS (Reduce)
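The three MapReduce steps above can be sketched in miniature as a single-process Python toy (no Hadoop involved; the function names, the two-block input, and the word-count job are all illustrative):

```python
from collections import defaultdict

def map_phase(split):
    # Map: emit a (key, value) pair per word in this block (split)
    for line in split:
        for word in line.split():
            yield (word, 1)

def shuffle(map_outputs):
    # Shuffle: group all values by key across every mapper's output
    groups = defaultdict(list)
    for output in map_outputs:
        for key, value in output:
            groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: combine the grouped values into one result per key
    return (key, sum(values))

# Two "blocks" of an input file, mapped independently (in Hadoop, in parallel)
splits = [["error warn error"], ["warn error"]]
map_outputs = [list(map_phase(s)) for s in splits]
result = dict(reduce_phase(k, v) for k, v in shuffle(map_outputs).items())
print(result)  # {'error': 3, 'warn': 2}
```

In Hadoop the same three phases run across many machines; the toy only shows the data flow.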
Power of Parallelism
• 1 hard disk = 100 MB/sec (~1 Gbps)
• Server = 12 hard disks = 1.2 GB/sec (~12 Gbps)
• Rack = 20 servers = 24 GB/sec (~240 Gbps)
• Average cluster = 6 racks = 144 GB/sec (~1.4 Tbps)
• Large cluster = 200 racks = 4.8 TB/sec (~48 Tbps)
• Scanning 4.8 TB at 100 MB/sec takes 13 hours

From Getting Started to Enterprise Deployment: Different BigInsights Editions for Varying Needs
• Quick Start – free, non-production
• Apache Hadoop
• Standard Edition – spreadsheet-style tool, web console, dashboards, pre-built applications, Eclipse tooling, RDBMS connectivity, Big SQL, monitoring and alerts, platform enhancements, and more
• Enterprise Edition – same features as Standard Edition plus accelerators, GPFS-FPO, Adaptive MapReduce, text analytics, Big R, and enterprise integration (InfoSphere Streams*, Watson Explorer*, Cognos BI*, Data Click*, ...)
(* Limited use license)

InfoSphere BigInsights for Hadoop includes the latest open source components, enhanced by enterprise components
[Architecture diagram: open source components (HDFS, HBase, MapReduce, Pig, Sqoop, Hive, YARN*, HCatalog, Flume, Jaql, Solr/Lucene, Oozie, R, ZooKeeper) alongside IBM enhancements – GPFS-FPO file system, Adaptive MapReduce, Big SQL, Big R, BigSheets, text analytics extractors, Eclipse tooling, governance (data privacy, matching, masking, audit and history), security (LDAP, Kerberos), enterprise search, InfoSphere Streams, console, dashboards, and monitoring. * In beta]

IBM
InfoSphere BigInsights on System z
Deployment flexibility: compatible clusters on or off the mainframe, inside a secure perimeter.

What makes sense when?
Case 1: Hadoop on the mainframe (Linux guests under z/VM on IFLs alongside z/OS)
• Data originates mostly on the mainframe (log files, database extracts: DB2, VSAM, QSAM, IMS, SMF, RMF, logs)
• Data security is a primary concern – clients will not send data across an external network
• Relatively small data – 100 GB to tens of TBs
• Hadoop is valued mainly for the richness of its tools
Case 2: Hadoop off the mainframe
• Most data originates off the mainframe
• Security is less of a concern since the data is not trusted anyway
• Very large data sets – hundreds of TBs to PBs
• Hadoop is valued for its ability to manage large datasets economically
• Desire to leverage cheap processing and potentially cloud elasticity

BigInsights "Connector" for DB2 for z/OS (V11)

Enhancing DB2 Analytics on z with Big Data
DB2 is providing the connectors and the database capability to allow DB2 applications to easily and efficiently access data in Hadoop:
• New user-defined functions
• New generic table UDF capability

DB2 11 Support for Big Data
Goal: integrate DB2 for z/OS with IBM's Hadoop-based BigInsights big data platform, enabling traditional applications on DB2 for z/OS to access big data analytics.
1. Analytics jobs can be specified using JSON Query Language (Jaql), submitted to BigInsights, with results stored in the Hadoop Distributed File System (HDFS).
2.
A table UDF (HDFS_READ) reads the big data analytic result from HDFS, for subsequent use in an SQL query.
– The HDFS_READ output table can have variable shapes
– DB2 11 supports generic table UDFs, enabling this function

BigInsights "Connector" for DB2 for z/OS
Two DB2 "sample" functions:
• JAQL_SUBMIT – submit a JAQL script for execution on BigInsights from DB2
• HDFS_READ – read HDFS files into DB2 as a table for use in SQL
Notes:
• The functions are developed by DB2 for z/OS, shipped with DB2 11 in prefix.SDSNLOD2, and are not installed by default
• Functions and examples are documented by BigInsights: http://www.ibm.com/support/docview.wss?uid=swg27040438
• Intended to be DB2 samples

BigInsights Connector Usage – Key Roles
• DB2 for z/OS DBA / sysprog: issue CREATE FUNCTION DDL on DB2 (refer to the SDSNLOD2 dataset member; determine an appropriate WLM environment)
• BigInsights analytics team: develop Jaql (or other) analytics on BigInsights
• DB2 for z/OS application or analytics team: use the BigInsights Connector functions in DB2 to execute BigInsights analytics and retrieve results

JAQL_SUBMIT
Submit a JAQL script for execution on BigInsights from DB2:

SET RESULTFILE = JAQL_SUBMIT (
  -- JAQL script containing the analysis
  'syslog = lines("hdfs:///idz1470/syslog3sec.txt");
   [localRead(syslog)->filter(strPos($,"$HASP373")>=0)->count()]->
   write(del(location="hdfs:///idz1470/iod00s/lab3e2.csv"));',
  -- intended HDFS file to hold the result
  'http://bootcamp55.democentral.ibm.com:14000/webhdfs/v1/idz1470/iod00s/lab3e2.csv?op=OPEN',
  -- URL of the BigInsights cluster
  'http://bootcamp55.democentral.ibm.com:8080',
  -- options
  '' );
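The result-file argument in the JAQL_SUBMIT example above – the same URL HDFS_READ later consumes – is a WebHDFS REST URL, where `?op=OPEN` streams the file's contents over HTTP. A rough client-side sketch of building such a URL and typing a fetched CSV payload the way HDFS_READ's AS clause does (the host, path, and helper names are illustrative, not part of the connector):

```python
import csv
import io
from urllib.parse import quote

def webhdfs_open_url(host, port, hdfs_path):
    # WebHDFS exposes HDFS files over REST: /webhdfs/v1/<path>?op=OPEN
    return "http://%s:%d/webhdfs/v1%s?op=OPEN" % (host, port, quote(hdfs_path))

def as_typed_rows(payload, types):
    # Mimic the AS clause: apply a column type to each CSV field of each row
    reader = csv.reader(io.StringIO(payload))
    return [tuple(t(field) for t, field in zip(types, row)) for row in reader]

url = webhdfs_open_url("bootcamp55.democentral.ibm.com", 14000,
                       "/idz1470/iod00s/lab3e2.csv")
print(url)

# A one-column result file, read AS (CNT INTEGER)
rows = as_typed_rows("161\n", (int,))
print(rows)  # [(161,)]
```

The real UDF runs inside DB2 and handles authentication, errors, and options; the sketch only shows the URL shape and the CSV-to-typed-row mapping.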
JAQL – JSON Query Language
Java MapReduce is the "assembly language" of Hadoop. JAQL is a high-level query language included in BigInsights with three objectives:
• Semi-structured analytics: analyze and manipulate large-scale semi-structured data, like JSON data
• Parallelism: uses the Hadoop MapReduce framework to process JSON data in parallel
• Extensibility: JAQL UDFs, Java functions, JAQL modules
JAQL provides a hook into any BigInsights analysis.

HDFS_READ Example
HDFS_READ reads a file from HDFS and presents it as a DB2 table for use in SQL:

SET RESULT_FILE = JAQL_SUBMIT(. . . . . );

SELECT BIRESULT.CNT
FROM TABLE(HDFS_READ(RESULT_FILE,  -- URL of the CSV file to be read
                     ''))          -- options
     AS BIRESULT(CNT INTEGER);     -- definition of the "table": how to present the results to SQL

Example of HDFS_READ
The following is an example of a CSV file stored in HDFS:

1997,Ford, E350,"ac, abs, moon",3000.00
1999,Chevy, "Venture ""Extended Edition""", ,4900.00
1996,Jeep,Grand Cherokee,"MUST SELL! AC, moon roof, loaded",4799.00

Example SQL statement:

SELECT * FROM TABLE
(HDFS_Read('http://BI.foo.com/data/controller/dfs/file.csv', ''))
AS X (YEAR INTEGER, MAKE VARCHAR(10), MODEL VARCHAR(30),
      DESCRIPTION VARCHAR(40), PRICE DECIMAL(8,2));

Result:

YEAR  MAKE   MODEL                       DESCRIPTION                       PRICE
----  -----  --------------------------  --------------------------------  -------
1997  Ford   E350                        ac, abs, moon                     3000.00
1999  Chevy  Venture "Extended Edition"  (null)                            4900.00
1996  Jeep   Grand Cherokee              MUST SELL! AC, moon roof, loaded  4799.00
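The sample file above leans on standard CSV quoting rules – embedded commas stay inside quoted fields, and a doubled quote escapes a literal quote – which Python's csv module also implements, so the mapping from the raw file to the result table can be checked in a few lines (a toy reader, not the HDFS_READ implementation):

```python
import csv
import io

data = '''1997,Ford, E350,"ac, abs, moon",3000.00
1999,Chevy, "Venture ""Extended Edition""", ,4900.00
1996,Jeep,Grand Cherokee,"MUST SELL! AC, moon roof, loaded",4799.00
'''

# skipinitialspace lets ', "Venture' still parse as a quoted field
rows = list(csv.reader(io.StringIO(data), skipinitialspace=True))
for year, make, model, description, price in rows:
    # an empty DESCRIPTION field is shown as (null) in the slide's result
    print(year, make, model, description or "(null)", price)
```

Note how `"Venture ""Extended Edition"""` comes out as `Venture "Extended Edition"` and `"ac, abs, moon"` stays a single field, exactly as the result table shows.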
Integrated Query Example
JAQL_SUBMIT can be embedded in HDFS_READ for a synchronous execute/read workflow:

INSERT INTO BI_TABLE (CNT)
 (SELECT CNT FROM TABLE
  (HDFS_READ
   (JAQL_SUBMIT
    ('syslog = lines("hdfs:///idz1470/syslog3sec.txt");
      [localRead(syslog)->filter(strPos($,"$HASP373")>=0)->count()]->
      write(del(location="hdfs:///idz1470/iod00s/lab3e2.csv"));',
     'http://bootcamp55.democentral.ibm.com:14000/webhdfs/v1/idz1470/iod00s/lab3e2.csv?op=OPEN',
     'http://bootcamp55.democentral.ibm.com:8080',
     ''),
    '')
  ) AS BIGINSIGHTS(CNT INTEGER));

DB2 – BigInsights Integration Use Case 1
1. BigInsights ingests data that usually is not ingested by established structured-data analysis systems like DB2, e.g. email from all clients sent to an insurance company.
2. DB2 kicks off a Hadoop job on BigInsights that analyzes the emails and identifies customers who have expressed dissatisfaction with the service and used the word 'cancel', 'terminate', 'switch' or synonyms thereof, or names of the competitors.
3. The BigInsights job runs successfully, creates a file of results (names and email addresses of customers at risk), and terminates.
4. DB2 reads the BigInsights result file using the user-defined table function HDFS_READ.
5. DB2 joins the result with the Agent table and alerts the agents of the at-risk customers. The agents act upon the at-risk customers and offer a promotion to stave off defection.

Use Case 2 – Syslog Analysis
1. The DB2 syslog is sent to BigInsights for analysis
2. Issue SQL statements to:
   1. Count the number of lines in our syslog
   2. Count the number of jobs started in our syslog
   3. Show the lines containing $HASP373 messages
   4. Show job names together with start time
   5. Count the occurrences of each job
   6. Show which jobs had more than one occurrence
   7. Produce a report of jobs with occurrence, departments, and owner names
(DB2 11 for z/OS connects over the internet to InfoSphere BigInsights 2.1.)

Syslog sample
[Figure: sample of the raw syslog file]

Count the number of lines in our syslog

SELECT BIGINSIGHTS.CNT FROM TABLE
 (HDFS_READ
  (JAQL_SUBMIT
   ('syslog = lines("hdfs:///idz1470/syslog3sec.txt");
     [localRead(syslog)->count()]->
     write(del(location="hdfs:///idz1470/iod00s/lab3e1.csv"));',
    'http://biginsights.demos.ibm.com:14000/webhdfs/v1/idz1470/'
    ||'iod00s/lab3e1.csv?op=OPEN',
    'http://biginsights.demos.ibm.com:8080',
    'timeout=1000'),
   '')
 ) AS BIGINSIGHTS(CNT INTEGER);

Result:
CNT
6055
1 record(s) selected

Show the lines containing $HASP373 messages

SELECT BIGINSIGHTS.LINE FROM TABLE
 (HDFS_READ
  (JAQL_SUBMIT
   ('syslog = lines("hdfs:///idz1470/syslog3sec.txt");
     localRead(syslog)->filter(strPos($,"$HASP373")>=0)->
     write(del(location="hdfs:///idz1470/iod00s/lab3e3.csv"));',
    'http://biginsights.demos.ibm.com:14000/webhdfs/v1/idz1470/'
    ||'iod00s/lab3e3.csv?op=OPEN',
    'http://biginsights.demos.ibm.com:8080',
    'timeout=1000'),
   '')
 ) AS BIGINSIGHTS(LINE VARCHAR(116));

Result:
LINE
N 4000000 STLAB28 13189 20:30:00.20 JOB07351 00000010 $HASP373 PJ907163 STARTED - INIT PJ - CLASS V - SYS 28
N 4000000 STLAB28 13189 20:30:00.20 JOB07348 00000010 $HASP373 C5LEG003 STARTED - INIT K1 - CLASS L - SYS 28
N 4000000 STLAB28
13189 20:30:00.21 JOB07288 00000010 $HASP373 T1XRP010 STARTED - INIT D6 - CLASS 8 - SYS 28
N 4000000 STLAB28 13189 20:30:00.21 JOB07290 00000010 $HASP373 T1XRP018 STARTED - INIT D9 - CLASS 8 - SYS 28
N 4000000 STLAB28 13189 20:30:00.21 JOB07287 00000010 $HASP373 T1XRP022 STARTED - INIT D7 - CLASS 8 - SYS 28
N 4000000 STLAB28 13189 20:30:00.21 JOB07289 00000010 $HASP373 T1XRP002 STARTED - INIT D5 - CLASS 8 - SYS 28
N 4000000 STLAB28 13189 20:30:00.26 JOB07413 00000010 $HASP373 PJ907165 STARTED - INIT PJ - CLASS V - SYS 28
N 4000000 STLAB28 13189 20:30:00.26 JOB07419 00000010 $HASP373 PJ907164 STARTED - INIT PJ - CLASS V - SYS 28
N 4000000 STLAB28 13189 20:30:08.71 JOB07349 00000010 $HASP373 T1XRP006 STARTED - INIT D0 - CLASS 8 - SYS 28
N 4000000 STLAB28 13189 20:30:08.71 JOB07350 00000010 $HASP373 T1XRP014 STARTED - INIT D8 - CLASS 8 - SYS 28
N 4000000 STLAB28 13189 20:30:17.25 JOB07490 00000010 $HASP373 T1XRP022 STARTED - INIT D7 - CLASS 8 - SYS 28
N 4000000 STLAB28 13189 20:30:17.25 JOB07492 00000010 $HASP373 T1XRP002 STARTED - INIT D5 - CLASS 8 - SYS 28
N 4000000 STLAB28 13189 20:30:17.28 JOB07494 00000010 $HASP373 PJ907289 STARTED - INIT PJ - CLASS V - SYS 28
N 4000000 STLAB28 13189 20:30:17.29 JOB07495 00000010 $HASP373 PJ907287 STARTED - INIT PJ - CLASS V - SYS 28
....
161 record(s) selected

Count the number of jobs started in our syslog

SELECT BIGINSIGHTS.CNT FROM TABLE
 (HDFS_READ
  (JAQL_SUBMIT
   ('syslog = lines("hdfs:///idz1470/syslog3sec.txt");
     [localRead(syslog)->filter(strPos($,"$HASP373")>=0)->count()]->
     write(del(location="hdfs:///idz1470/iod00s/lab3e2.csv"));',
    'http://biginsights.demos.ibm.com:14000/webhdfs/v1/idz1470/'
    ||'iod00s/lab3e2.csv?op=OPEN',
    'http://biginsights.demos.ibm.com:8080',
    'timeout=1000'),
   '')
 ) AS BIGINSIGHTS(CNT INTEGER);

Result:
CNT
161
1 record(s) selected

Integrating HDFS_READ with SQL

-- read file from exercise lab3e4, group to occurrences in db2
SELECT JOBNAME, COUNT(*) AS OCCURS
FROM TABLE
 (HDFS_READ
  ('http://biginsights.demos.ibm.com:14000/webhdfs/v1/'
   ||'idz1470/iod00s/lab3e4.csv?op=OPEN',
   '')
 ) AS BIGINSIGHTS(JOBNAME CHAR(8), START TIME)
GROUP BY JOBNAME
HAVING COUNT(*) > 1;

Result:
JOBNAME   OCCURS
C5LEG003  8
C5SQ2003  5
T1CMD28   2
T1LEG002  5
T1NES002  3
T1QDM28   3
T1XRP002  8
T1XRP006  7
T1XRP010  8
T1XRP014  7
T1XRP018  8
T1XRP022  8
12 record(s) selected
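The GROUP BY … HAVING COUNT(*) > 1 step above is plain aggregation over the rows HDFS_READ surfaces, so the same filtering can be sanity-checked off-mainframe in a few lines of Python (the job list here is a made-up miniature, not the lab data):

```python
from collections import Counter

# JOBNAME values as HDFS_READ would present them; illustrative data only
jobs = ["C5LEG003", "T1XRP002", "C5LEG003", "T1CMD28", "C5LEG003", "T1XRP002"]

occurs = Counter(jobs)
# HAVING COUNT(*) > 1: keep only job names that appear more than once
repeated = {name: n for name, n in occurs.items() if n > 1}
print(sorted(repeated.items()))  # [('C5LEG003', 3), ('T1XRP002', 2)]
```

In the lab, of course, DB2 does this server-side; the point is only that the connector leaves all SQL set processing (GROUP BY, HAVING, joins) to DB2.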
Report of Jobs with Job Description and Occurrence

-- join to JOBCAT table
SELECT T1.JOBNAME, J.COMMENT, T1.OCCURS
FROM TABLE
 (HDFS_READ
  ('http://biginsights.demos.ibm.com:14000/webhdfs/v1/'
   ||'idz1470/iod00s/lab3e5.csv?op=OPEN',
   '')
 ) AS T1(JOBNAME CHAR(8), OCCURS SMALLINT),
 IOD02S.JOBCAT J
WHERE J.JOBNAME = T1.JOBNAME;

IOD02S.JOBCAT table:
OWNERDEPT  JOBNAME   COMMENT
21020      C5CPX002  RUN XMLTABLE JOB
21020      C5EAX003  RUN XMLQUERY JOB
21020      C5EAX007  RUN XMLTABLE JOB
21020      C5LEG003  RUN XML PERFORMANCE JOB
21020      C5NES001  RUN XML INDEX JOB
21020      C5NES002  RUN XMLQUERY JOB
21020      C5NES004  RUN XMLTABLE JOB
21020      C5SQ2003  RUN XMLEXISTS JOB
…

Result:
JOBNAME   COMMENT                  OCCURS
C5CPX002  RUN XMLTABLE JOB         1
C5EAX003  RUN XMLQUERY JOB         1
C5EAX007  RUN XMLTABLE JOB         1
C5LEG003  RUN XML PERFORMANCE JOB  8
C5NES001  RUN XML INDEX JOB        1
C5NES002  RUN XMLQUERY JOB         1
C5NES004  RUN XMLTABLE JOB         1
C5SQ2003  RUN XMLEXISTS JOB        5
PJ906521  TEST UDFS JOB 100        1
…

More… Create a view on HDFS_READ for transparent access

CREATE VIEW IOD00S.VIEW1 AS
SELECT T1.JOBNAME, T1.OCCURS
FROM TABLE
 (HDFS_READ
  ('http://biginsights.demos.ibm.com:14000/webhdfs/v1/'
   ||'idz1470/iod00s/lab3e5.csv?op=OPEN',
   '')
 ) AS T1(JOBNAME CHAR(8), OCCURS SMALLINT);

More… Write the HDFS_READ result to another table

-- CREATE table to hold job names & start times
CREATE TABLE IOD00S.JOB_STATS (JOBNAME CHAR(8), STARTTIME TIME);

-- INSERT into the newly created table
INSERT INTO IOD00S.JOB_STATS
SELECT JOBNAME, START
FROM TABLE
 (HDFS_READ
  ('http://biginsights.demos.ibm.com:14000/webhdfs/v1/'
   ||'idz1470/iod00s/lab3e4.csv?op=OPEN',
   '')
 ) AS T1(JOBNAME CHAR(8), START TIME);

SELECT * FROM IOD02S.JOB_STATS ORDER BY STARTTIME DESC;

JOBNAME   STARTTIME
PJ930233  20:32:59
PJ908567  20:32:59
PJ908611  20:32:59
PJ908568  20:32:59
C5LEG003  20:32:58
T1QDM28   20:32:58
T1NES002  20:32:58
…

More… Using with Common Table Expressions

WITH JOBS(JOBNAME, START) AS
 (SELECT JOBNAME, START
  FROM TABLE
   (HDFS_READ
    ('http://biginsights.demos.ibm.com:14000/webhdfs/v1/idz1470/'
     ||'iod02s/lab3e4.csv?op=OPEN',
     '')
   ) AS T1(JOBNAME CHAR(8), START TIME)
 )
SELECT J1.JOBNAME, J1.START
FROM JOBS J1
WHERE J1.START >= (SELECT J2.START FROM JOBS J2
                   WHERE
J2.JOBNAME='SMFDMP28');

Result:
JOBNAME   START
C5LEG003  20:32:33
T1XRP002  20:32:33
T1XRP022  20:32:33
T1XRP018  20:32:33
PJ908331  20:32:33
PJ908332  20:32:33
SMFDMP28  20:32:33
PJ908330  20:32:33
PJ908390  20:32:33
T1XRP006  20:32:41
T1XRP014  20:32:41
…

Other

Components of an Integration Solution
• Move data out of DB2 (out of z)
• Move data into DB2 (on to z)
• Execute analysis on the Hadoop cluster

Big SQL and BigInsights 3.0
Big SQL is a standard component in all IBM BigInsights editions (Quick Start, Standard, and Enterprise):
• Seamlessly integrated with other BigInsights components
• Expands IBM's continued focus on large-scale, high-performance analytics
Loading Big SQL from a JDBC data source
A JDBC URL may be used to load directly from an external data source. It supports many options to partition the extraction of the data:
• Providing a table and a partitioning column
• Providing a query and a WHERE clause to use for partitioning

Example usage:

LOAD USING JDBC
 CONNECTION URL 'jdbc:db2://myhost:50000/SAMPLE'
 WITH PARAMETERS ( user = 'myuser', password = 'mypassword' )
 FROM TABLE STAFF WHERE "dept=66 and job='Sales'"
 INTO TABLE staff_sales
 PARTITION ( dept=66 , job='Sales')
 APPEND
 WITH LOAD PROPERTIES (bigsql.load.num.map.tasks = 1);

Co:Z Launcher Example
Run analysisScript on Linux from JCL; ARGS carries the OpenSSH user credentials, and STDIN holds the shell script to be run on the target system:

//COZJOB  JOB ...,
//*=====================================
//*
//* run analysisScript on linux
//*
//*=====================================
//STEP1   EXEC PROC=COZPROC,
//        ARGS='[email protected]'
//STDIN   DD *
# This is input to the remote shell
/home/jason/bin/analysisScript.sh | \
  /home/hadoop/hadoop-2.3.0/bin/hdfs dfs -put - /hadoop/db2table.toz

Co:Z Dataset Pipes Streaming Example
A TEMPLATE points the LOAD utility at a USS pipe; the LOAD listens to the pipe while a separate job writes to it from the remote system:

//COZJOB   JOB ...,
//STEP1    EXEC DSNUPROC,UID='USSPIPE.LOAD',
//         UTPROC='',
//         SYSTEM='DB2A'
//UTPRINT  DD SYSOUT=*
//SYSUT1   DD ...
//SORTOUT  DD ...
//SYSIN    DD *
  TEMPLATE DATAFIL PATH='/u/admf001/test1'
    RECFM(VB) LRECL(33) FILEDATA(TEXT)
  LOAD DATA FORMAT DELIMITED
    COLDEL X'6B' CHARDEL X'7F' DECPT X'4B'
    INTO TABLE JASON.TABLE2
    (C1 POSITION(*) INTEGER,
     C2 POSITION(*) VARCHAR)
    INDDN DATAFIL

A separate job writes to the pipe from the remote system:

//COZJOB2  JOB ...
//STEP1    EXEC PROC=COZPROC,
//         ARGS='[email protected]'
//STDIN    DD *
/home/hadoop/hadoop-2.3.0/bin/hdfs dfs -cat /hadoop/db2table.toz | \
  tofile '/u/admf001/test1'

Apache Sqoop
Open source, top-level Apache project – http://sqoop.apache.org/
• Transfers data between HDFS and an RDBMS (like DB2!)
• Uses DB2 SQL INSERT and SELECT statements to support export and import
• No custom DB2 for z/OS connector means default performance options

# --driver names DB2's JDBC driver; --connect/--username give my server and
# credentials; -m specifies parallelism on the remote cluster
sqoop export \
  --driver com.ibm.db2.jcc.DB2Driver \
  --connect jdbc:db2://9.30.137.207:59000/SYSTMF1 \
  --username JASON --password xxxxxxxx \
  --table SYSADM.NYSETB02 \
  --export-dir "/user/jason/myFile.txt" \
  -m 9

Veristorm Enterprise
http://www.veristorm.com/ (not an IBM product)
• zDoop: Veristorm Hadoop distribution on z/Linux
• User-friendly GUI allows point-and-click data movement
• Copies z/OS VSAM, DB2, IMS, datasets, and log files into zDoop

IBM InfoSphere System z Connector for Hadoop
Hadoop on your platform of choice: point-and-click or batch self-service data access from z/OS sources (DB2, VSAM, SMF, IMS, logs) into HDFS – either on System z itself (Linux for System z under z/VM on IFLs, keeping data inside the secure mainframe) or on Power Systems and Intel servers running InfoSphere BigInsights (MapReduce, HBase, Hive) for lower-cost processing and storage.

Summary
DB2 11 for z/OS
enables traditional applications on DB2 for z/OS to access IBM's Hadoop-based BigInsights big data platform for big data analytics:
• JAQL_SUBMIT – submit a JAQL script for execution on BigInsights from DB2; results are stored in the Hadoop Distributed File System (HDFS)
• HDFS_READ – a table UDF that reads HDFS files (containing the BigInsights analytic result) into DB2 as a table for use in SQL

Resources
• BigInsights (including JAQL documentation): http://www-01.ibm.com/software/data/infosphere/biginsights/
• JAQL_SUBMIT and HDFS_READ: http://www.ibm.com/support/docview.wss?uid=swg27040438
• Integrate DB2 for z/OS with InfoSphere BigInsights, Part 1: Set up the InfoSphere BigInsights connector for DB2 for z/OS: http://www.ibm.com/developerworks/library/ba-integrate-db2biginsights/index.html
• Integrate DB2 for z/OS with InfoSphere BigInsights, Part 2: Use the InfoSphere BigInsights connector to perform analysis using Jaql and SQL: http://www.ibm.com/developerworks/library/ba-integrate-db2biginsights2/index.html

InfoSphere BigInsights Quick Start is the newest member of the BigInsights family
What is BigInsights Quick Start?
• A no-charge, downloadable edition that allows you to experiment with enterprise-grade Hadoop features
• Simplifies the complexity of Hadoop with easy-to-follow tutorials and videos
• No time or data limitations, so you can experiment with a wide range of use cases
Download now: ibm.co/QuickStart
Watch the videos: ibmurl.hursley.ibm.com/3PLJ

© 2015 IBM Corporation