© Copyright 2015 EMC Corporation. All rights reserved. 1
EMC: REDEFINING BIG DATA
Alexander Ermakov, Pivotal

Traditional Data Architecture
[Diagram: Front End and DBMS tiers feeding ETL into a DWH that serves BI, Data Mining and OLAP; latency labels: 100ms, 3 sec, 1 day, 3-4 days]
The path from end users to business decisions takes at least 1 day and typically 3-4 days.

Advanced Data Architecture – ELT
[Diagram: source DBMS systems loaded via ETL/ELT into ODS and DDS layers, then Aggregates, Data Marts, Reports and OLAP]
ELT arose 10 years ago, driven by:
– Storage cost reduction
– Introduction of MPP
– Pushdown optimization in ETL tools

Modern Data Architecture – Data Lake
The concept was introduced 4 years ago by James Dixon.
Data Lake idea: integrate a Hadoop solution into a typical enterprise architecture to improve customer analytics capabilities.
A Data Lake usually combines the following approaches:
– Using Hadoop to store and process unstructured data
– Using Hadoop as a staging platform for all input data and as storage for all source data loaded into the customer platform
– Offloading historical data to Hadoop and using it as cold data storage

Modern Data Architecture – Data Lake
[Diagram labels: DBMS, CDC, ELT, Hadoop, DWH, ODS, UDS, DDS, Aggregates, Data Marts, OLAP, Data Mining]
[Diagram labels, continued: Data Mining At Scale, Analytical Archives, Reports, SQL-on-Hadoop, BI]

Modern Data Architecture – Lambda
– Lambda Architecture was introduced by Nathan Marz 2 years ago
– The goal is to build a robust, scalable, fault-tolerant data processing architecture that is easily extensible and requires minimal maintenance
– Combines near-real-time data processing and batch processing into a single data processing approach
– Based on the functional approach: query = function(all data)

Modern Data Architecture – Lambda
[Diagram labels: Source Data, Batch Layer, Master Dataset, Speed Layer, Real-time View, Serving Layer, Batch View, Query]
– Source data is loaded to both the Speed and Batch layers
– The Master Dataset is maintained in the Batch Layer; it contains all the raw input data and is the basis for any recalculation needed in the system
– The Speed layer handles only a small part of the latest data, discarding all older data entries
– A query merges the results from both the Batch and Speed layers

Modern Data Architecture – Streaming
– The design is based on event stream processing
– Uses a message queue as the main data hub
– Grew out of the introduction of reactive programming; emerged with the introduction of Spark Streaming, Storm and Samza
– Not to be confused with “real-time processing”:
  – Not just a web service and RPC: no “response” exists in this design
  – Not necessarily real-time: save the stream and reprocess it on demand
  – Event stream processing instead of batch extraction of the data
  – Using the same event stream for both OLTP and OLAP systems

Modern Data Architecture – Streaming
[Diagram labels: OLTP, Srv, SOAP]
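The event-stream design from the Streaming slide can be illustrated with a small in-memory sketch. This is not taken from the deck: the names EventLog and Consumer, the event fields (account, amount, day) and the three views are hypothetical, and a real deployment would use a message queue product rather than a Python list. The sketch only shows the three ideas named on the slide: producers publish without waiting for a response, the saved stream can be reprocessed on demand, and the same stream feeds both an OLTP-style and an OLAP-style view.

```python
from collections import defaultdict


class EventLog:
    """Append-only log standing in for the message queue (hypothetical, in-memory)."""

    def __init__(self):
        self._events = []

    def publish(self, event):
        # Producers only append; nothing is returned, so there is no "response".
        self._events.append(event)

    def read_from(self, offset):
        # Events are kept, so any consumer can read from any earlier offset.
        return self._events[offset:]


class Consumer:
    """Reads the log at its own pace by tracking its own offset."""

    def __init__(self, log, handler):
        self._log = log
        self._handler = handler
        self.offset = 0

    def poll(self):
        new_events = self._log.read_from(self.offset)
        for event in new_events:
            self._handler(event)
        self.offset += len(new_events)
        return len(new_events)


log = EventLog()

balances = defaultdict(int)      # OLTP-style view: current state per account
daily_totals = defaultdict(int)  # OLAP-style view: aggregate per day


def update_balance(event):
    balances[event["account"]] += event["amount"]


def update_daily_total(event):
    daily_totals[event["day"]] += event["amount"]


oltp_view = Consumer(log, update_balance)
olap_view = Consumer(log, update_daily_total)

log.publish({"account": "a", "amount": 10, "day": "mon"})
log.publish({"account": "b", "amount": 5, "day": "mon"})
log.publish({"account": "a", "amount": -3, "day": "tue"})

# Both views consume the same stream independently.
oltp_view.poll()
olap_view.poll()

# A view added later reprocesses the saved stream from offset 0.
events_per_account = defaultdict(int)


def count_event(event):
    events_per_account[event["account"]] += 1


late_view = Consumer(log, count_event)
late_view.poll()
```

Because each consumer keeps its own offset, adding a new view never requires changes to producers or to existing consumers; it simply starts reading the log from the beginning.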
[Diagram labels, continued: CDC, Log, JDBC, Batch, SP, Table, copy, Parse, ETL, cp, DWH, load, Data Mart, App, BE, DDS, HTTP, ODS, FE, BI]

Modern Data Architecture – Streaming
[Diagram labels: OLTP, Srv, SOAP, App, CDC, Log, JDBC, Batch, SP, Table, ETL, cp, Parse, copy, DWH, Data Mart, BE, DDS, HTTP, ODS, FE, BI, load, RTI]
Introducing Queue
[Diagram labels: Queue, App, Hadoop, SOAP, Srv, JDBC, SP, Table, Batch, ETL, Data Mart, BI, DWH, DDS, OLTP, ES, HTTP, BE, ODS, FE, SQL on Hadoop, HDFS, STG]

Pivotal and Modern Data Architecture
[Diagram: Pivotal product landscape with Pivotal GemFire, Pivotal Cloud Foundry, Spring XD, Pivotal HD, Pivotal HAWQ and Pivotal Greenplum; other labels: App, FE, BE, Queue, HTTP, Streaming, OLTP, ES, SP, Data, Table, ETL, ODS, DDS, Data Mart, BI]

Pivotal and Modern Data Architecture
[Diagram: Pivotal product landscape, repeated]
Pivotal Labs – agile software development for next-generation applications
Pivotal Cloud Foundry – PaaS for customer applications
RabbitMQ – distributed message queue service on top of PCF
Spring IO – foundation platform for modern applications

Pivotal and Modern Data Architecture
[Diagram: Pivotal product landscape, repeated]
Pivotal GemFire – in-memory data grid enabling real-time data processing and real-time decision making for enterprises
Pivotal and Modern Data Architecture
[Diagram: Pivotal product landscape, repeated]
Spring XD – unified, distributed and extensible framework for data pipelining: ingesting, batching, processing and exporting

Pivotal and Modern Data Architecture
[Diagram: Pivotal product landscape, repeated]
Pivotal HD – leading Hadoop distribution based on ODP
Pivotal HAWQ – bringing the power of MPP to the Hadoop cluster, best-in-class SQL-on-Hadoop solution
Apache Spark – component of the Pivotal HD distribution, modern framework for distributed data processing

Pivotal and Modern Data Architecture
[Diagram: Pivotal product landscape, repeated]
Pivotal Greenplum – leading analytical MPP database, foundation for enterprise data warehousing systems and advanced analytics

Pivotal and Modern Data Architecture
[Diagram: Pivotal product landscape, repeated]
[Diagram, continued; overlay label: Data Lake]

Pivotal and Modern Data Architecture
[Diagram: Pivotal product landscape, repeated; overlay label: Lambda Architecture]

Pivotal and Modern Data Architecture
[Diagram: Pivotal product landscape, repeated; overlay label: Streaming]

Pivotal and Modern Data Architecture
[Diagram: Pivotal product landscape, repeated]