Transcript
z Analytics WW Sales and Technical Sales Boot Camp 2015 – January 26-30, 2015
© 2015 IBM Corporation

Building a Big Data Solution Using IBM DB2 for z/OS
Jane Man, Senior Software Engineer, [email protected]

Please note
IBM's statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM's sole discretion. Information regarding potential future products is intended to outline our general product direction and should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality, and may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remain at our sole discretion.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.

Agenda
• Big Data overview
• What is Hadoop?
• What is BigInsights?
• DB2 11 for z/OS connectors to Big Data: What is it? How does it work?
• Use cases
• Other: BigInsights Big SQL 3.0, DoveTail Co:Z, Sqoop, Veristorm Enterprise
• Summary, questions and answers
IBM Silicon Valley Laboratory - 2014 WW Tech Sales Boot Camp

Big Data is All Data – And All Paradigms
• Transactional & application data: volume, structured, throughput
• Machine data: velocity, semi-structured, ingestion
• Social data: variety, highly unstructured, veracity
• Enterprise content: variety, highly unstructured, volume

There is demand for differently structured data to be seamlessly integrated, to augment analytics and decisions:
• Analytics and decision engines reside where the data warehouse / transaction data is
• "Noise" (veracity) surrounds the core business data: social media, emails, docs, telemetry, voice, video, content
• Expanding our insights – getting closer to the "truth": data warehouse integration, business analytics, DB2 for z/OS, IMS, information governance
• Lower risk and cost, increased profitability – the "circle of trust" widens

Big Data Use Cases
• Big Data exploration: find, visualize, and understand all big data to improve decision making
• Enhanced 360° view of the customer: extend existing customer views (MDM, CRM, etc.) by incorporating additional internal and external information sources
• Security/intelligence extension: lower risk, detect fraud, and monitor cyber security in real time
• Operations analysis: analyze a variety of machine data for improved business results
• Data warehouse augmentation: integrate big data and data warehouse capabilities to increase operational efficiency

What is Hadoop?
An open source software framework that
supports data-intensive distributed applications: high-throughput, batch processing that runs on large clusters of commodity hardware.
• Yahoo ran a 4,000-node Hadoop cluster in 2008
• Two main components:
  – Hadoop Distributed File System: self-healing, high-bandwidth clustered storage
  – MapReduce engine

Hadoop: The Underlying Principle
• Lots of redundant disks – really inexpensive disks
• Lots of cores – inexpensive cores working all the time
• Disks crash – that's OK, just replace them
• Processors fail – that's OK, just replace them
• Network errors happen – that's OK, just retry
• Disks and processors are networked

Hadoop Distributed File System (HDFS)
• Files are broken into large blocks (default = 64 MB)
• Blocks are replicated (default = 3 times) and distributed across the cluster for durability, availability, and throughput
• Optimized for streaming reads of large files
• Write-once-read-many access model, append-only

MapReduce
A simple, yet powerful framework for parallel computation – applicable to many problems, with a flexible data format. Basic steps:
1. Do a parallel computation (Map) on each block (split) of data in an HDFS file and output a stream of (key, value) pairs to the local file system
2. Redistribute (shuffle) the map output by key
3. Do another parallel computation on the redistributed map output and write the results into HDFS (Reduce)
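The three MapReduce steps above can be sketched in miniature as a single-process Python toy (no Hadoop involved; the function names, the two-block input, and the word-count job are all illustrative):

```python
from collections import defaultdict

def map_phase(split):
    # Map: emit a (key, value) pair per word in this block (split)
    for line in split:
        for word in line.split():
            yield (word, 1)

def shuffle(map_outputs):
    # Shuffle: group all values by key across every mapper's output
    groups = defaultdict(list)
    for output in map_outputs:
        for key, value in output:
            groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: combine the grouped values into one result per key
    return (key, sum(values))

# Two "blocks" of an input file, mapped independently (in Hadoop, in parallel)
splits = [["error warn error"], ["warn error"]]
map_outputs = [list(map_phase(s)) for s in splits]
result = dict(reduce_phase(k, v) for k, v in shuffle(map_outputs).items())
print(result)  # {'error': 3, 'warn': 2}
```

In Hadoop the same three phases run across many machines; the toy only shows the data flow.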
Power of Parallelism
• 1 hard disk = 100 MB/sec (~1 Gbps)
• Server = 12 hard disks = 1.2 GB/sec (~12 Gbps)
• Rack = 20 servers = 24 GB/sec (~240 Gbps)
• Average cluster = 6 racks = 144 GB/sec (~1.4 Tbps)
• Large cluster = 200 racks = 4.8 TB/sec (~48 Tbps)
• Scanning 4.8 TB at 100 MB/sec takes 13 hours

From Getting Started to Enterprise Deployment: Different BigInsights Editions for Varying Needs
• Quick Start – free, non-production
• Apache Hadoop
• Standard Edition – spreadsheet-style tool, web console, dashboards, pre-built applications, Eclipse tooling, RDBMS connectivity, Big SQL, monitoring and alerts, platform enhancements, and more
• Enterprise Edition – same features as Standard Edition plus accelerators, GPFS-FPO, Adaptive MapReduce, text analytics, Big R, and enterprise integration (InfoSphere Streams*, Watson Explorer*, Cognos BI*, Data Click*, ...)
(* Limited use license)

InfoSphere BigInsights for Hadoop includes the latest open source components, enhanced by enterprise components
[Architecture diagram: open source components (HDFS, HBase, MapReduce, Pig, Sqoop, Hive, YARN*, HCatalog, Flume, Jaql, Solr/Lucene, Oozie, R, ZooKeeper) alongside IBM enhancements – GPFS-FPO file system, Adaptive MapReduce, Big SQL, Big R, BigSheets, text analytics extractors, Eclipse tooling, governance (data privacy, matching, masking, audit and history), security (LDAP, Kerberos), enterprise search, InfoSphere Streams, console, dashboards, and monitoring. * In beta]

IBM
InfoSphere BigInsights on System z
Deployment flexibility: compatible clusters on or off the mainframe, inside a secure perimeter.

What makes sense when?
Case 1: Hadoop on the mainframe (Linux guests under z/VM on IFLs alongside z/OS)
• Data originates mostly on the mainframe (log files, database extracts: DB2, VSAM, QSAM, IMS, SMF, RMF, logs)
• Data security is a primary concern – clients will not send data across an external network
• Relatively small data – 100 GB to tens of TBs
• Hadoop is valued mainly for the richness of its tools
Case 2: Hadoop off the mainframe
• Most data originates off the mainframe
• Security is less of a concern since the data is not trusted anyway
• Very large data sets – hundreds of TBs to PBs
• Hadoop is valued for its ability to manage large datasets economically
• Desire to leverage cheap processing and potentially cloud elasticity

BigInsights "Connector" for DB2 for z/OS (V11)

Enhancing DB2 Analytics on z with Big Data
DB2 is providing the connectors and the database capability to allow DB2 applications to easily and efficiently access data in Hadoop:
• New user-defined functions
• New generic table UDF capability

DB2 11 Support for Big Data
Goal: integrate DB2 for z/OS with IBM's Hadoop-based BigInsights big data platform, enabling traditional applications on DB2 for z/OS to access big data analytics.
1. Analytics jobs can be specified using JSON Query Language (Jaql), submitted to BigInsights, with results stored in the Hadoop Distributed File System (HDFS).
2.
A table UDF (HDFS_READ) reads the big data analytic result from HDFS, for subsequent use in an SQL query.
– The HDFS_READ output table can have variable shapes
– DB2 11 supports generic table UDFs, enabling this function

BigInsights "Connector" for DB2 for z/OS
Two DB2 "sample" functions:
• JAQL_SUBMIT – submit a JAQL script for execution on BigInsights from DB2
• HDFS_READ – read HDFS files into DB2 as a table for use in SQL
Notes:
• The functions are developed by DB2 for z/OS, shipped with DB2 11 in prefix.SDSNLOD2, and are not installed by default
• Functions and examples are documented by BigInsights: http://www.ibm.com/support/docview.wss?uid=swg27040438
• Intended to be DB2 samples

BigInsights Connector Usage – Key Roles
• DB2 for z/OS DBA / sysprog: issue CREATE FUNCTION DDL on DB2 (refer to the SDSNLOD2 dataset member; determine an appropriate WLM environment)
• BigInsights analytics team: develop Jaql (or other) analytics on BigInsights
• DB2 for z/OS application or analytics team: use the BigInsights Connector functions in DB2 to execute BigInsights analytics and retrieve results

JAQL_SUBMIT
Submit a JAQL script for execution on BigInsights from DB2:

SET RESULTFILE = JAQL_SUBMIT (
  -- JAQL script containing the analysis
  'syslog = lines("hdfs:///idz1470/syslog3sec.txt");
   [localRead(syslog)->filter(strPos($,"$HASP373")>=0)->count()]->
   write(del(location="hdfs:///idz1470/iod00s/lab3e2.csv"));',
  -- intended HDFS file to hold the result
  'http://bootcamp55.democentral.ibm.com:14000/webhdfs/v1/idz1470/iod00s/lab3e2.csv?op=OPEN',
  -- URL of the BigInsights cluster
  'http://bootcamp55.democentral.ibm.com:8080',
  -- options
  '' );
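The result-file argument in the JAQL_SUBMIT example above – the same URL HDFS_READ later consumes – is a WebHDFS REST URL, where `?op=OPEN` streams the file's contents over HTTP. A rough client-side sketch of building such a URL and typing a fetched CSV payload the way HDFS_READ's AS clause does (the host, path, and helper names are illustrative, not part of the connector):

```python
import csv
import io
from urllib.parse import quote

def webhdfs_open_url(host, port, hdfs_path):
    # WebHDFS exposes HDFS files over REST: /webhdfs/v1/<path>?op=OPEN
    return "http://%s:%d/webhdfs/v1%s?op=OPEN" % (host, port, quote(hdfs_path))

def as_typed_rows(payload, types):
    # Mimic the AS clause: apply a column type to each CSV field of each row
    reader = csv.reader(io.StringIO(payload))
    return [tuple(t(field) for t, field in zip(types, row)) for row in reader]

url = webhdfs_open_url("bootcamp55.democentral.ibm.com", 14000,
                       "/idz1470/iod00s/lab3e2.csv")
print(url)

# A one-column result file, read AS (CNT INTEGER)
rows = as_typed_rows("161\n", (int,))
print(rows)  # [(161,)]
```

The real UDF runs inside DB2 and handles authentication, errors, and options; the sketch only shows the URL shape and the CSV-to-typed-row mapping.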
JAQL – JSON Query Language
Java MapReduce is the "assembly language" of Hadoop. JAQL is a high-level query language included in BigInsights with three objectives:
• Semi-structured analytics: analyze and manipulate large-scale semi-structured data, like JSON data
• Parallelism: uses the Hadoop MapReduce framework to process JSON data in parallel
• Extensibility: JAQL UDFs, Java functions, JAQL modules
JAQL provides a hook into any BigInsights analysis.

HDFS_READ Example
HDFS_READ reads a file from HDFS and presents it as a DB2 table for use in SQL:

SET RESULT_FILE = JAQL_SUBMIT(. . . . . );

SELECT BIRESULT.CNT
FROM TABLE(HDFS_READ(RESULT_FILE,  -- URL of the CSV file to be read
                     ''))          -- options
     AS BIRESULT(CNT INTEGER);     -- definition of the "table": how to present the results to SQL

Example of HDFS_READ
The following is an example of a CSV file stored in HDFS:

1997,Ford, E350,"ac, abs, moon",3000.00
1999,Chevy, "Venture ""Extended Edition""", ,4900.00
1996,Jeep,Grand Cherokee,"MUST SELL! AC, moon roof, loaded",4799.00

Example SQL statement:

SELECT * FROM TABLE
(HDFS_Read('http://BI.foo.com/data/controller/dfs/file.csv', ''))
AS X (YEAR INTEGER, MAKE VARCHAR(10), MODEL VARCHAR(30),
      DESCRIPTION VARCHAR(40), PRICE DECIMAL(8,2));

Result:

YEAR  MAKE   MODEL                       DESCRIPTION                       PRICE
----  -----  --------------------------  --------------------------------  -------
1997  Ford   E350                        ac, abs, moon                     3000.00
1999  Chevy  Venture "Extended Edition"  (null)                            4900.00
1996  Jeep   Grand Cherokee              MUST SELL! AC, moon roof, loaded  4799.00
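The sample file above leans on standard CSV quoting rules – embedded commas stay inside quoted fields, and a doubled quote escapes a literal quote – which Python's csv module also implements, so the mapping from the raw file to the result table can be checked in a few lines (a toy reader, not the HDFS_READ implementation):

```python
import csv
import io

data = '''1997,Ford, E350,"ac, abs, moon",3000.00
1999,Chevy, "Venture ""Extended Edition""", ,4900.00
1996,Jeep,Grand Cherokee,"MUST SELL! AC, moon roof, loaded",4799.00
'''

# skipinitialspace lets ', "Venture' still parse as a quoted field
rows = list(csv.reader(io.StringIO(data), skipinitialspace=True))
for year, make, model, description, price in rows:
    # an empty DESCRIPTION field is shown as (null) in the slide's result
    print(year, make, model, description or "(null)", price)
```

Note how `"Venture ""Extended Edition"""` comes out as `Venture "Extended Edition"` and `"ac, abs, moon"` stays a single field, exactly as the result table shows.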
Integrated Query Example
JAQL_SUBMIT can be embedded in HDFS_READ for a synchronous execute/read workflow:

INSERT INTO BI_TABLE (CNT)
 (SELECT CNT FROM TABLE
  (HDFS_READ
   (JAQL_SUBMIT
    ('syslog = lines("hdfs:///idz1470/syslog3sec.txt");
      [localRead(syslog)->filter(strPos($,"$HASP373")>=0)->count()]->
      write(del(location="hdfs:///idz1470/iod00s/lab3e2.csv"));',
     'http://bootcamp55.democentral.ibm.com:14000/webhdfs/v1/idz1470/iod00s/lab3e2.csv?op=OPEN',
     'http://bootcamp55.democentral.ibm.com:8080',
     ''),
    '')
  ) AS BIGINSIGHTS(CNT INTEGER));

DB2 – BigInsights Integration Use Case 1
1. BigInsights ingests data that usually is not ingested by established structured-data analysis systems like DB2, e.g. email from all clients sent to an insurance company.
2. DB2 kicks off a Hadoop job on BigInsights that analyzes the emails and identifies customers who have expressed dissatisfaction with the service and used the word 'cancel', 'terminate', 'switch' or synonyms thereof, or names of the competitors.
3. The BigInsights job runs successfully, creates a file of results (names and email addresses of customers at risk), and terminates.
4. DB2 reads the BigInsights result file using the user-defined table function HDFS_READ.
5. DB2 joins the result with the Agent table and alerts the agents of the at-risk customers. The agents act upon the at-risk customers and offer a promotion to stave off defection.

Use Case 2 – Syslog Analysis
1. The DB2 syslog is sent to BigInsights for analysis
2. Issue SQL statements to:
   1. Count the number of lines in our syslog
   2. Count the number of jobs started in our syslog
   3. Show the lines containing $HASP373 messages
   4. Show job names together with start time
   5. Count the occurrences of each job
   6. Show which jobs had more than one occurrence
   7. Produce a report of jobs with occurrence, departments, and owner names
(DB2 11 for z/OS connects over the internet to InfoSphere BigInsights 2.1.)

Syslog sample
[Figure: sample of the raw syslog file]

Count the number of lines in our syslog

SELECT BIGINSIGHTS.CNT FROM TABLE
 (HDFS_READ
  (JAQL_SUBMIT
   ('syslog = lines("hdfs:///idz1470/syslog3sec.txt");
     [localRead(syslog)->count()]->
     write(del(location="hdfs:///idz1470/iod00s/lab3e1.csv"));',
    'http://biginsights.demos.ibm.com:14000/webhdfs/v1/idz1470/'
    ||'iod00s/lab3e1.csv?op=OPEN',
    'http://biginsights.demos.ibm.com:8080',
    'timeout=1000'),
   '')
 ) AS BIGINSIGHTS(CNT INTEGER);

Result:
CNT
6055
1 record(s) selected

Show the lines containing $HASP373 messages

SELECT BIGINSIGHTS.LINE FROM TABLE
 (HDFS_READ
  (JAQL_SUBMIT
   ('syslog = lines("hdfs:///idz1470/syslog3sec.txt");
     localRead(syslog)->filter(strPos($,"$HASP373")>=0)->
     write(del(location="hdfs:///idz1470/iod00s/lab3e3.csv"));',
    'http://biginsights.demos.ibm.com:14000/webhdfs/v1/idz1470/'
    ||'iod00s/lab3e3.csv?op=OPEN',
    'http://biginsights.demos.ibm.com:8080',
    'timeout=1000'),
   '')
 ) AS BIGINSIGHTS(LINE VARCHAR(116));

Result:
LINE
N 4000000 STLAB28 13189 20:30:00.20 JOB07351 00000010 $HASP373 PJ907163 STARTED - INIT PJ - CLASS V - SYS 28
N 4000000 STLAB28 13189 20:30:00.20 JOB07348 00000010 $HASP373 C5LEG003 STARTED - INIT K1 - CLASS L - SYS 28
N 4000000 STLAB28
13189 20:30:00.21 JOB07288 00000010 $HASP373 T1XRP010 STARTED - INIT D6 - CLASS 8 - SYS 28
N 4000000 STLAB28 13189 20:30:00.21 JOB07290 00000010 $HASP373 T1XRP018 STARTED - INIT D9 - CLASS 8 - SYS 28
N 4000000 STLAB28 13189 20:30:00.21 JOB07287 00000010 $HASP373 T1XRP022 STARTED - INIT D7 - CLASS 8 - SYS 28
N 4000000 STLAB28 13189 20:30:00.21 JOB07289 00000010 $HASP373 T1XRP002 STARTED - INIT D5 - CLASS 8 - SYS 28
N 4000000 STLAB28 13189 20:30:00.26 JOB07413 00000010 $HASP373 PJ907165 STARTED - INIT PJ - CLASS V - SYS 28
N 4000000 STLAB28 13189 20:30:00.26 JOB07419 00000010 $HASP373 PJ907164 STARTED - INIT PJ - CLASS V - SYS 28
N 4000000 STLAB28 13189 20:30:08.71 JOB07349 00000010 $HASP373 T1XRP006 STARTED - INIT D0 - CLASS 8 - SYS 28
N 4000000 STLAB28 13189 20:30:08.71 JOB07350 00000010 $HASP373 T1XRP014 STARTED - INIT D8 - CLASS 8 - SYS 28
N 4000000 STLAB28 13189 20:30:17.25 JOB07490 00000010 $HASP373 T1XRP022 STARTED - INIT D7 - CLASS 8 - SYS 28
N 4000000 STLAB28 13189 20:30:17.25 JOB07492 00000010 $HASP373 T1XRP002 STARTED - INIT D5 - CLASS 8 - SYS 28
N 4000000 STLAB28 13189 20:30:17.28 JOB07494 00000010 $HASP373 PJ907289 STARTED - INIT PJ - CLASS V - SYS 28
N 4000000 STLAB28 13189 20:30:17.29 JOB07495 00000010 $HASP373 PJ907287 STARTED - INIT PJ - CLASS V - SYS 28
....
161 record(s) selected

Count the number of jobs started in our syslog

SELECT BIGINSIGHTS.CNT FROM TABLE
 (HDFS_READ
  (JAQL_SUBMIT
   ('syslog = lines("hdfs:///idz1470/syslog3sec.txt");
     [localRead(syslog)->filter(strPos($,"$HASP373")>=0)->count()]->
     write(del(location="hdfs:///idz1470/iod00s/lab3e2.csv"));',
    'http://biginsights.demos.ibm.com:14000/webhdfs/v1/idz1470/'
    ||'iod00s/lab3e2.csv?op=OPEN',
    'http://biginsights.demos.ibm.com:8080',
    'timeout=1000'),
   '')
 ) AS BIGINSIGHTS(CNT INTEGER);

Result:
CNT
161
1 record(s) selected

Integrating HDFS_READ with SQL

-- read file from exercise lab3e4, group to occurrences in db2
SELECT JOBNAME, COUNT(*) AS OCCURS
FROM TABLE
 (HDFS_READ
  ('http://biginsights.demos.ibm.com:14000/webhdfs/v1/'
   ||'idz1470/iod00s/lab3e4.csv?op=OPEN',
   '')
 ) AS BIGINSIGHTS(JOBNAME CHAR(8), START TIME)
GROUP BY JOBNAME
HAVING COUNT(*) > 1;

Result:
JOBNAME   OCCURS
C5LEG003  8
C5SQ2003  5
T1CMD28   2
T1LEG002  5
T1NES002  3
T1QDM28   3
T1XRP002  8
T1XRP006  7
T1XRP010  8
T1XRP014  7
T1XRP018  8
T1XRP022  8
12 record(s) selected
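The GROUP BY … HAVING COUNT(*) > 1 step above is plain aggregation over the rows HDFS_READ surfaces, so the same filtering can be sanity-checked off-mainframe in a few lines of Python (the job list here is a made-up miniature, not the lab data):

```python
from collections import Counter

# JOBNAME values as HDFS_READ would present them; illustrative data only
jobs = ["C5LEG003", "T1XRP002", "C5LEG003", "T1CMD28", "C5LEG003", "T1XRP002"]

occurs = Counter(jobs)
# HAVING COUNT(*) > 1: keep only job names that appear more than once
repeated = {name: n for name, n in occurs.items() if n > 1}
print(sorted(repeated.items()))  # [('C5LEG003', 3), ('T1XRP002', 2)]
```

In the lab, of course, DB2 does this server-side; the point is only that the connector leaves all SQL set processing (GROUP BY, HAVING, joins) to DB2.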
Report of Jobs with Job Description and Occurrence

-- join to JOBCAT table
SELECT T1.JOBNAME, J.COMMENT, T1.OCCURS
FROM TABLE
 (HDFS_READ
  ('http://biginsights.demos.ibm.com:14000/webhdfs/v1/'
   ||'idz1470/iod00s/lab3e5.csv?op=OPEN',
   '')
 ) AS T1(JOBNAME CHAR(8), OCCURS SMALLINT),
 IOD02S.JOBCAT J
WHERE J.JOBNAME = T1.JOBNAME;

IOD02S.JOBCAT table:
OWNERDEPT  JOBNAME   COMMENT
21020      C5CPX002  RUN XMLTABLE JOB
21020      C5EAX003  RUN XMLQUERY JOB
21020      C5EAX007  RUN XMLTABLE JOB
21020      C5LEG003  RUN XML PERFORMANCE JOB
21020      C5NES001  RUN XML INDEX JOB
21020      C5NES002  RUN XMLQUERY JOB
21020      C5NES004  RUN XMLTABLE JOB
21020      C5SQ2003  RUN XMLEXISTS JOB
…

Result:
JOBNAME   COMMENT                  OCCURS
C5CPX002  RUN XMLTABLE JOB         1
C5EAX003  RUN XMLQUERY JOB         1
C5EAX007  RUN XMLTABLE JOB         1
C5LEG003  RUN XML PERFORMANCE JOB  8
C5NES001  RUN XML INDEX JOB        1
C5NES002  RUN XMLQUERY JOB         1
C5NES004  RUN XMLTABLE JOB         1
C5SQ2003  RUN XMLEXISTS JOB        5
PJ906521  TEST UDFS JOB 100        1
…

More… Create a view on HDFS_READ for transparent access

CREATE VIEW IOD00S.VIEW1 AS
SELECT T1.JOBNAME, T1.OCCURS
FROM TABLE
 (HDFS_READ
  ('http://biginsights.demos.ibm.com:14000/webhdfs/v1/'
   ||'idz1470/iod00s/lab3e5.csv?op=OPEN',
   '')
 ) AS T1(JOBNAME CHAR(8), OCCURS SMALLINT);

More… Write the HDFS_READ result to another table

-- CREATE table to hold job names & start times
CREATE TABLE IOD00S.JOB_STATS (JOBNAME CHAR(8), STARTTIME TIME);

-- INSERT into the newly created table
INSERT INTO IOD00S.JOB_STATS
SELECT JOBNAME, START
FROM TABLE
 (HDFS_READ
  ('http://biginsights.demos.ibm.com:14000/webhdfs/v1/'
   ||'idz1470/iod00s/lab3e4.csv?op=OPEN',
   '')
 ) AS T1(JOBNAME CHAR(8), START TIME);

SELECT * FROM IOD02S.JOB_STATS ORDER BY STARTTIME DESC;

JOBNAME   STARTTIME
PJ930233  20:32:59
PJ908567  20:32:59
PJ908611  20:32:59
PJ908568  20:32:59
C5LEG003  20:32:58
T1QDM28   20:32:58
T1NES002  20:32:58
…

More… Using with Common Table Expressions

WITH JOBS(JOBNAME, START) AS
 (SELECT JOBNAME, START
  FROM TABLE
   (HDFS_READ
    ('http://biginsights.demos.ibm.com:14000/webhdfs/v1/idz1470/'
     ||'iod02s/lab3e4.csv?op=OPEN',
     '')
   ) AS T1(JOBNAME CHAR(8), START TIME)
 )
SELECT J1.JOBNAME, J1.START
FROM JOBS J1
WHERE J1.START >= (SELECT J2.START FROM JOBS J2
                   WHERE
J2.JOBNAME='SMFDMP28');

Result:
JOBNAME   START
C5LEG003  20:32:33
T1XRP002  20:32:33
T1XRP022  20:32:33
T1XRP018  20:32:33
PJ908331  20:32:33
PJ908332  20:32:33
SMFDMP28  20:32:33
PJ908330  20:32:33
PJ908390  20:32:33
T1XRP006  20:32:41
T1XRP014  20:32:41
…

Other

Components of an Integration Solution
• Move data out of DB2 (out of z)
• Move data into DB2 (on to z)
• Execute analysis on the Hadoop cluster

Big SQL and BigInsights 3.0
Big SQL is a standard component in all IBM BigInsights editions (Quick Start, Standard, and Enterprise):
• Seamlessly integrated with other BigInsights components
• Expands IBM's continued focus on large-scale, high-performance analytics
Loading Big SQL from a JDBC data source
A JDBC URL may be used to load directly from an external data source. It supports many options to partition the extraction of the data:
• Providing a table and a partitioning column
• Providing a query and a WHERE clause to use for partitioning

Example usage:

LOAD USING JDBC
 CONNECTION URL 'jdbc:db2://myhost:50000/SAMPLE'
 WITH PARAMETERS ( user = 'myuser', password = 'mypassword' )
 FROM TABLE STAFF WHERE "dept=66 and job='Sales'"
 INTO TABLE staff_sales
 PARTITION ( dept=66 , job='Sales')
 APPEND
 WITH LOAD PROPERTIES (bigsql.load.num.map.tasks = 1);

Co:Z Launcher Example
Run analysisScript on Linux from JCL; ARGS carries the OpenSSH user credentials, and STDIN holds the shell script to be run on the target system:

//COZJOB  JOB ...,
//*=====================================
//*
//* run analysisScript on linux
//*
//*=====================================
//STEP1   EXEC PROC=COZPROC,
//        ARGS='[email protected]'
//STDIN   DD *
# This is input to the remote shell
/home/jason/bin/analysisScript.sh | \
  /home/hadoop/hadoop-2.3.0/bin/hdfs dfs -put - /hadoop/db2table.toz

Co:Z Dataset Pipes Streaming Example
A TEMPLATE points the LOAD utility at a USS pipe; the LOAD listens to the pipe while a separate job writes to it from the remote system:

//COZJOB   JOB ...,
//STEP1    EXEC DSNUPROC,UID='USSPIPE.LOAD',
//         UTPROC='',
//         SYSTEM='DB2A'
//UTPRINT  DD SYSOUT=*
//SYSUT1   DD ...
//SORTOUT  DD ...
//SYSIN    DD *
  TEMPLATE DATAFIL PATH='/u/admf001/test1'
    RECFM(VB) LRECL(33) FILEDATA(TEXT)
  LOAD DATA FORMAT DELIMITED
    COLDEL X'6B' CHARDEL X'7F' DECPT X'4B'
    INTO TABLE JASON.TABLE2
    (C1 POSITION(*) INTEGER,
     C2 POSITION(*) VARCHAR)
    INDDN DATAFIL

A separate job writes to the pipe from the remote system:

//COZJOB2  JOB ...
//STEP1    EXEC PROC=COZPROC,
//         ARGS='[email protected]'
//STDIN    DD *
/home/hadoop/hadoop-2.3.0/bin/hdfs dfs -cat /hadoop/db2table.toz | \
  tofile '/u/admf001/test1'

Apache Sqoop
Open source, top-level Apache project – http://sqoop.apache.org/
• Transfers data between HDFS and an RDBMS (like DB2!)
• Uses DB2 SQL INSERT and SELECT statements to support export and import
• No custom DB2 for z/OS connector means default performance options

# --driver names DB2's JDBC driver; --connect/--username give my server and
# credentials; -m specifies parallelism on the remote cluster
sqoop export \
  --driver com.ibm.db2.jcc.DB2Driver \
  --connect jdbc:db2://9.30.137.207:59000/SYSTMF1 \
  --username JASON --password xxxxxxxx \
  --table SYSADM.NYSETB02 \
  --export-dir "/user/jason/myFile.txt" \
  -m 9

Veristorm Enterprise
http://www.veristorm.com/ (not an IBM product)
• zDoop: Veristorm Hadoop distribution on z/Linux
• User-friendly GUI allows point-and-click data movement
• Copies z/OS VSAM, DB2, IMS, datasets, and log files into zDoop

IBM InfoSphere System z Connector for Hadoop
Hadoop on your platform of choice: point-and-click or batch self-service data access from z/OS sources (DB2, VSAM, SMF, IMS, logs) into HDFS – either on System z itself (Linux for System z under z/VM on IFLs, keeping data inside the secure mainframe) or on Power Systems and Intel servers running InfoSphere BigInsights (MapReduce, HBase, Hive) for lower-cost processing and storage.

Summary
DB2 11 for z/OS
enables traditional applications on DB2 for z/OS to access IBM's Hadoop-based BigInsights big data platform for big data analytics:
• JAQL_SUBMIT – submit a JAQL script for execution on BigInsights from DB2; results are stored in the Hadoop Distributed File System (HDFS)
• HDFS_READ – a table UDF that reads HDFS files (containing the BigInsights analytic result) into DB2 as a table for use in SQL

Resources
• BigInsights (including JAQL documentation): http://www-01.ibm.com/software/data/infosphere/biginsights/
• JAQL_SUBMIT and HDFS_READ: http://www.ibm.com/support/docview.wss?uid=swg27040438
• Integrate DB2 for z/OS with InfoSphere BigInsights, Part 1: Set up the InfoSphere BigInsights connector for DB2 for z/OS: http://www.ibm.com/developerworks/library/ba-integrate-db2biginsights/index.html
• Integrate DB2 for z/OS with InfoSphere BigInsights, Part 2: Use the InfoSphere BigInsights connector to perform analysis using Jaql and SQL: http://www.ibm.com/developerworks/library/ba-integrate-db2biginsights2/index.html

InfoSphere BigInsights Quick Start is the newest member of the BigInsights family
What is BigInsights Quick Start?
• A no-charge, downloadable edition that allows you to experiment with enterprise-grade Hadoop features
• Simplifies the complexity of Hadoop with easy-to-follow tutorials and videos
• No time or data limitations, so you can experiment with a wide range of use cases
Download now: ibm.co/QuickStart
Watch the videos: ibmurl.hursley.ibm.com/3PLJ

© 2015 IBM Corporation