© Copyright 2015 EMC Corporation. All rights reserved. 1

EMC REDEFINING BIG DATA
ALEXANDER ERMAKOV - PIVOTAL
Traditional Data Architecture
[Diagram: several Front End and DBMS pairs, ETL, DWH, BI, Data Mining and OLAP; latency labels on the diagram: 100ms, 3 sec, 1 day, 3-4 days]
The path from end users to business decisions takes 1 day minimum and 3-4 days typically.
Advanced Data Architecture – ELT
[Diagram: source DBMS systems, ETL and ELT steps, ODS, DDS, Aggregates, Data Marts, Reports and OLAP]
ELT arose 10 years ago, driven by:
• Storage cost reduction
• Introduction of MPP (massively parallel processing)
• Pushdown optimization in ETL tools
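As a rough illustration of the ELT idea, the sketch below loads raw rows into a staging table unchanged and only then transforms them with SQL inside the database, which is the kind of pushdown mentioned above. It uses Python's built-in sqlite3 as a stand-in for an MPP database; the table names stg_orders and agg_customer_totals and the sample rows are assumptions made up for this example, not part of the talk.

```python
import sqlite3

# Hypothetical raw source rows: (order_id, customer, amount as text).
RAW_ORDERS = [
    (1, "alice", "10.50"),
    (2, "bob", "4.00"),
    (3, "alice", "7.25"),
]


def run_elt(rows):
    """Load raw rows unchanged, then transform them with SQL inside the database."""
    conn = sqlite3.connect(":memory:")
    # Extract + Load: raw data lands in a staging table as-is.
    conn.execute(
        "CREATE TABLE stg_orders (order_id INTEGER, customer TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", rows)
    # Transform: the cast and aggregation run in the database engine.
    conn.execute(
        """
        CREATE TABLE agg_customer_totals AS
        SELECT customer, SUM(CAST(amount AS REAL)) AS total
        FROM stg_orders
        GROUP BY customer
        """
    )
    result = conn.execute(
        "SELECT customer, total FROM agg_customer_totals ORDER BY customer"
    ).fetchall()
    conn.close()
    return result


totals = run_elt(RAW_ORDERS)
print(totals)
```

In an ETL flow the same aggregation would run in the integration tool before loading; here the database does the work after the load.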
Modern Data Architecture – Data Lake
• The concept was introduced 4 years ago by James Dixon
• Data Lake idea: integrate a Hadoop solution into a typical enterprise architecture to improve customer analytics capabilities
• A Data Lake usually combines the following approaches:
– Using Hadoop to store and process unstructured data
– Using Hadoop as a staging platform for all input data and as storage for all the source data loaded into the customer platform
– Offloading historical data to Hadoop and using it as cold data storage
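The historical offload approach can be pictured with a minimal sketch. The record layout, the cutoff date and the function name offload_historical are hypothetical; in a real deployment the cold set would be written to Hadoop rather than kept in a Python list.

```python
from datetime import date


def offload_historical(records, cutoff):
    """Split records into (hot, cold): anything dated before `cutoff` is cold."""
    hot = [r for r in records if r["day"] >= cutoff]
    cold = [r for r in records if r["day"] < cutoff]
    return hot, cold


# Hypothetical records; only the date matters for the split.
records = [
    {"id": 1, "day": date(2013, 5, 1)},
    {"id": 2, "day": date(2015, 1, 10)},
    {"id": 3, "day": date(2014, 12, 31)},
]

hot, cold = offload_historical(records, cutoff=date(2015, 1, 1))
print([r["id"] for r in hot], [r["id"] for r in cold])
```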
Modern Data Architecture – Data Lake
[Diagram: source DBMS systems feed Hadoop and the DWH through CDC and ELT. Labeled components: ODS, UDS, DDS, Aggregates, Data Marts, OLAP, Data Mining, Data Mining At Scale, Analytical Archives, Reports, SQL-on-Hadoop, BI]
Modern Data Architecture – Lambda
• Lambda Architecture was introduced by Nathan Marz 2 years ago
• The goal is to build a robust, scalable, fault-tolerant data processing architecture that is easily extensible and requires minimal maintenance
• Combines near-real-time and batch data processing in a single data processing approach
• Based on the functional approach:
query = function(all data)
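One way to read "query = function(all data)" is that a query is a pure function over the complete dataset, so running it again on the same data always gives the same answer. The sketch below is only an illustration; the event fields and the function name query_total_by_user are assumptions.

```python
def query_total_by_user(all_data):
    """A query as a pure function of the complete dataset: no hidden state."""
    totals = {}
    for event in all_data:
        totals[event["user"]] = totals.get(event["user"], 0) + event["amount"]
    return totals


# Hypothetical raw events making up "all data".
all_data = [
    {"user": "alice", "amount": 3},
    {"user": "bob", "amount": 5},
    {"user": "alice", "amount": 4},
]

result = query_total_by_user(all_data)
print(result)
```

Because the function depends only on its input, any result can be recomputed from the raw data at any time.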
Modern Data Architecture – Lambda
[Diagram: Source Data feeds the Batch Layer (Master Dataset) and the Speed Layer (Real-time Views); the Serving Layer holds Batch Views; each Query reads from Batch Views and Real-time Views]
• Source data is loaded into both the Speed and Batch layers
• The Master Dataset is maintained in the Batch Layer; it contains all the raw input data and is the basis for any recalculation needed in the system
• The Speed layer handles only a small part of the latest data, discarding all older data entries
• A query merges the results from both the Batch and Speed layers
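The merge of batch and speed results can be sketched as follows. This is a simplified illustration under stated assumptions: the page-view events, the helper names build_view and merged_query, and the use of a simple counter as the "view" are all invented for the example.

```python
from collections import Counter


def build_view(events):
    """Count page views per URL; used for both the batch and real-time view."""
    return Counter(e["url"] for e in events)


def merged_query(batch_view, realtime_view, url):
    """Answer a query by merging the batch view with the real-time view."""
    # Counter returns 0 for URLs it has not seen.
    return batch_view[url] + realtime_view[url]


# Master dataset as of the last batch run (hypothetical data).
master_dataset = [{"url": "/home"}, {"url": "/home"}, {"url": "/cart"}]
# Events that arrived after the last batch run, held only by the speed layer.
recent_events = [{"url": "/home"}]

batch_view = build_view(master_dataset)
realtime_view = build_view(recent_events)

home_views = merged_query(batch_view, realtime_view, "/home")
cart_views = merged_query(batch_view, realtime_view, "/cart")
print(home_views, cart_views)
```

Once the next batch run has absorbed the recent events, the speed layer can discard them and the merged answer stays the same.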
Modern Data Architecture – Streaming
• The design is based on event stream processing
• Uses a message queue as the main data hub
• Born out of the introduction of reactive programming; emerged with the introduction of Spark Streaming, Storm and Samza
• Don’t confuse it with “real-time processing”:
– Not just a web service and RPC: no “response” exists in this design
– Not necessarily real-time: save the stream and reprocess it on demand
– Event stream processing instead of batch extraction of the data
– Using the same event stream for both OLTP and OLAP systems
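The points above can be illustrated with a toy append-only log. EventLog is a hypothetical in-memory stand-in for a message queue, not the API of any real product; the deposit and withdraw events are made up for the example.

```python
class EventLog:
    """Hypothetical append-only event log standing in for a message queue."""

    def __init__(self):
        self._events = []

    def append(self, event):
        # Producers only append; nothing is returned to them ("no response").
        self._events.append(event)

    def read_from(self, offset=0):
        """Return all events at or after `offset`; consumers track their own offsets."""
        return self._events[offset:]


log = EventLog()
for event in [("deposit", 100), ("withdraw", 30), ("deposit", 5)]:
    log.append(event)

# OLTP-style consumer: derives the current balance from the stream.
balance = 0
for kind, amount in log.read_from(0):
    balance += amount if kind == "deposit" else -amount

# OLAP-style consumer: reprocesses the same saved stream on demand.
deposit_count = sum(1 for kind, _ in log.read_from(0) if kind == "deposit")
print(balance, deposit_count)
```

Both consumers read the same stored events independently, so a new consumer can be added later and replay the stream from the start.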
Modern Data Architecture – Streaming
[Diagram: systems and the links between them. Labeled components: FE, BE, App, Srv, OLTP, ODS, DDS, DWH, Data Mart, BI. Labeled links: HTTP, SOAP, JDBC, CDC, Log, SP, Table copy, Parse, cp, load, ETL, Batch ETL]
Modern Data Architecture – Streaming
[Diagram: systems and the links between them, now including RTI. Labeled components: FE, BE, App, Srv, OLTP, ODS, DDS, DWH, Data Mart, BI, RTI. Labeled links: HTTP, SOAP, JDBC, CDC, Log, SP, Table copy, Parse, cp, load, ETL, Batch ETL]
Introducing Queue
[Diagram: the same systems connected through a Queue, with Hadoop added. Labeled components: Queue, FE, BE, App, Srv, OLTP, ES, ODS, DDS, DWH, Data Mart, BI, Hadoop (HDFS, STG, SQL On Hadoop). Labeled links: HTTP, SOAP, JDBC, SP, Table, ETL, Batch ETL]
Pivotal and Modern Data Architecture
[Diagram: Pivotal products mapped to the modern data architecture: Pivotal Cloud Foundry, Pivotal GemFire, Spring XD, Pivotal HD, Pivotal HAWQ and Pivotal Greenplum, alongside FE, BE, App, Queue, Streaming, OLTP, ES, ODS, DDS, Data Mart, ETL and BI]
Pivotal and Modern Data Architecture
[Diagram: Pivotal products mapped to the modern data architecture]
• Pivotal Labs – agile software development for next-generation applications
• Pivotal Cloud Foundry – PaaS for customer applications
• RabbitMQ – distributed message queue service on top of PCF
• Spring IO – foundation platform for modern applications
Pivotal and Modern Data Architecture
[Diagram: Pivotal products mapped to the modern data architecture]
• Pivotal GemFire – in-memory data grid enabling real-time data processing and real-time decision making for enterprises
Pivotal and Modern Data Architecture
[Diagram: Pivotal products mapped to the modern data architecture]
• Spring XD – unified, distributed and extensible framework for data pipelining: ingesting, batching, processing and exporting
Pivotal and Modern Data Architecture
[Diagram: Pivotal products mapped to the modern data architecture]
• Pivotal HD – leading Hadoop distribution based on ODP
• Pivotal HAWQ – brings the power of MPP to the Hadoop cluster; best-in-class SQL-on-Hadoop solution
• Apache Spark – component of the Pivotal HD distribution, modern framework for distributed data processing
Pivotal and Modern Data Architecture
[Diagram: Pivotal products mapped to the modern data architecture]
• Pivotal Greenplum – leading analytical MPP database, foundation for enterprise data warehousing systems and advanced analytics
Pivotal and Modern Data Architecture
[Diagram: Pivotal products mapped to the modern data architecture, with an area labeled "Data Lake"]
Pivotal and Modern Data Architecture
[Diagram: Pivotal products mapped to the modern data architecture, with an area labeled "Lambda Architecture"]
Pivotal and Modern Data Architecture
[Diagram: Pivotal products mapped to the modern data architecture, with an area labeled "Streaming"]