...

ORACLE DATA WAREHOUSE ON EMC SYMMETRIX VMAX 40K

by user

on
Category: Documents
13

views

Report

Comments

Transcript

ORACLE DATA WAREHOUSE ON EMC SYMMETRIX VMAX 40K
White Paper
ORACLE DATA WAREHOUSE
ON EMC SYMMETRIX VMAX 40K
EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler
• Scalable query and ETL performance for very large database (VLDB)
• Reduced backup time and impact enabled by TimeFinder VP Snap
• Unisphere for VMAX enables efficient array management and
provisioning
EMC Solutions Group
Abstract
This white paper describes how EMC® Symmetrix® VMAX® 40K series with
Enginuity™, used with EMC PowerPath®, EMC TimeFinder® Snap, and EMC
Solutions Enabler, can support an Oracle data warehouse environment for very
large database (VLDB) and enterprise data warehouse (EDW) scenarios. The
solution provides the scalability, performance, security, and ease of use
required for mission-critical business demands. This solution can be scaled
many magnitudes bigger in terms of users, performance, and storage volumes.
May 2012
Copyright © 2012 EMC Corporation. All Rights Reserved.
EMC believes the information in this publication is accurate as of its
publication date. The information is subject to change without notice.
The information in this publication is provided “as is.” EMC Corporation makes
no representations or warranties of any kind with respect to the information in
this publication, and specifically disclaims implied warranties of
merchantability or fitness for a particular purpose.
Use, copying, and distribution of any EMC software described in this
publication requires an applicable software license.
For the most up-to-date listing of EMC product names, see EMC Corporation
Trademarks on EMC.com.
All trademarks used herein are the property of their respective owners.
Part Number H10571
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler
2
Table of contents
Executive summary ............................................................................................................................. 5
Business case .................................................................................................................................. 5
Solution overview ............................................................................................................................ 5
Key results ....................................................................................................................................... 6
Introduction ....................................................................................................................................... 7
Purpose ........................................................................................................................................... 7
Scope .............................................................................................................................................. 7
Audience.......................................................................................................................................... 7
Terminology ..................................................................................................................................... 7
Key technology components ............................................................................................................... 9
Overview .......................................................................................................................................... 9
EMC Symmetrix VMAX 40K ............................................................................................................... 9
EMC Unisphere for VMAX ............................................................................................................... 10
EMC Virtual Provisioning ................................................................................................................ 10
EMC TimeFinder Virtual Provisioning Snap ..................................................................................... 10
EMC PowerPath .............................................................................................................................. 11
Oracle Database 11g R2 Enterprise Edition .................................................................................... 12
Solution architecture and design ...................................................................................................... 13
Configuration overview .................................................................................................................. 13
Solution architecture...................................................................................................................... 14
Hardware environment ................................................................................................................... 14
Software environment .................................................................................................................... 15
Storage connectivity overview ........................................................................................................ 16
Brocade DCX 8510 Backbone ......................................................................................................... 17
Storage Virtual Provisioning design ................................................................................................ 17
Drive type....................................................................................................................................... 17
Oracle ASM disk group configuration ............................................................................................. 18
Database and workload profile ...................................................................................................... 18
Preliminary throughput testing using Orion on a single node ............................................................ 19
Overview ........................................................................................................................................ 19
Test objective ................................................................................................................................. 19
Test scenario and methodology ..................................................................................................... 19
Test results .................................................................................................................................... 20
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler
3
Storage provisioning on a VMAX 40K array ....................................................................................... 21
Overview ........................................................................................................................................ 21
Step 1: Create devices and storage groups..................................................................................... 21
Step 2: Label the devices to identify their purposes ....................................................................... 22
Step 3: Provision the storage to the hosts ...................................................................................... 23
Query workload test ......................................................................................................................... 24
Overview ........................................................................................................................................ 24
Test objective ................................................................................................................................. 24
Test scenarios and methodology .................................................................................................... 24
Test scenario 1 .......................................................................................................................... 24
Test scenario 2 .......................................................................................................................... 24
Test results .................................................................................................................................... 25
ETL workload test ............................................................................................................................. 31
Overview ........................................................................................................................................ 31
Test objective ................................................................................................................................. 31
Test scenarios and methodology .................................................................................................... 31
Test results .................................................................................................................................... 32
Rapid deployment of test/dev database environments ...................................................................... 35
Overview ........................................................................................................................................ 35
Test objective ................................................................................................................................. 35
Test scenario.................................................................................................................................. 35
Test procedure ............................................................................................................................... 35
Test results .................................................................................................................................... 38
Offload RMAN backup ....................................................................................................................... 40
Overview ........................................................................................................................................ 40
Test scenario and methodology ..................................................................................................... 40
Test results .................................................................................................................................... 41
Conclusion ....................................................................................................................................... 42
Summary ....................................................................................................................................... 42
Findings ......................................................................................................................................... 42
References ....................................................................................................................................... 43
White papers ................................................................................................................................. 43
Product documentation.................................................................................................................. 43
Other documentation ..................................................................................................................... 43
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler
4
Executive summary
Business case
Today’s enterprise data warehouses (EDWs) are mission-critical resources for
supporting business operations systems. Data warehouses must address an
increasingly broad range of demands:
•
Enterprise data is growing exponentially because businesses incorporate more
detailed data from very large databases (VLDBs).
•
Business intelligence (BI) tools provide deeper insight into the data, which can
lead to new revenue streams. The EDW capacity must be able to scale and
provide predictable performance to keep pace with these new workloads.
•
Online analytical processing (OLAP) requires realtime or near-realtime analytics
that provide timely insight into large or complex data sets. High volume and
frequent data loads are essential to provide the OLAP analytics required for
decision making in the fast-moving businesses.
•
Enterprise-class protection of the data in the data warehouse has become an
implicit business requirement. IT departments must employ techniques to
back up larger data sets during limited backup windows, with minimum impact
on system performance.
The EMC® Symmetrix® VMAX® 40K series with Enginuity™ and EMC Unisphere® for
VMAX can meet the mission-critical requirements of VLDBs and EDWs with
exceptional scalability, enterprise-class data integrity and security, and performance
optimized for today’s environments.
Solution overview
This solution uses the EMC Symmetrix VMAX 40K series to support a realistic data
warehouse environment that can meet enterprise demands for scalability,
performance, and security:
•
This solution implements a 30 TB data warehouse environment on Symmetrix
VMAX 40K and demonstrates how customers can implement key processes,
such as high-speed data loading and efficient backup of the database, with
little or no disruption to the performance, scalability, or availability of the
database system and associated applications.
•
The solution describes a realworld-like lifecycle of an enterprise data
warehouse by simulating the query and ETL (extract, transform, and load)
workloads separately, and validating the performance scalability.
•
The solution demonstrates how to offload backup of the database by
leveraging EMC TimeFinder® Virtual Provisioning™ (VP) Snap. During the
backup against the Virtual Provisioning Snap copy, the query workload is
running on the production, and we validate the impact on the performance
caused by the database backup.
Overall, this solution provides a recommended practice that can scale bandwidth and
users on the data warehouse database deployed on VMAX 40K. In this paper, the
configurations can be extended nearly linearly to support larger and more scalable
configurations.
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler
5
Key results
The solution tests demonstrated the following key results:
•
Performance and scalability
The VMAX 40K environment demonstrated consistent, optimal performance
scalability in query and ETL workloads when more users and more database
servers were added into the 30 TB data warehouse. VMAX 40K is a flexible
platform that grows with a business’s need to provide higher performance by
adding additional database nodes, HBA ports, front-end ports, and disk drives
into the workload.
•
Minimal impact was achieved by offloading backup from the production server
to the backup host by using TimeFinder VP Snap.
When the TimeFinder VP Snap copy was created and mounted on a backup
host, the database was backed up on the backup host. Comparing to when no
backup was running, the average query throughput was minimally affected.
•
Easy to configure and monitor
Unisphere for VMAX provides easier storage management, including logical
unit number (LUN) provisioning, performance monitoring, and other
management tasks.
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler
6
Introduction
Purpose
The solution shows an overall approach to data warehousing on EMC VMAX 40K, and
focuses on the baseline performance and functionality needed to manage a modern
data warehouse, such as query, data load, and other routing data warehouse
management tasks.
Scope
Throughout this white paper, we assume that you have some familiarity with the
concepts and operations related to storage and backup technologies and their use in
data warehouse infrastructures.
This white paper discusses multiple EMC products as well as products from other
vendors. Some general configuration and operational procedures are outlined.
However, for detailed product installation information, refer to the user
documentation for those products.
The scope of this paper is to describe:
•
The scalability of decision support system (DSS) query and data loading
performance on VMAX 40K
•
New environment provisioning based on TimeFinder VP Snap
•
Offloading data warehouse backup to a TimeFinder VP Snap backup host while
under query workload
Audience
This solution is intended for chief information officers (CIOs), data center directors,
Oracle database administrators (DBAs), technical managers, and any others involved
in evaluating, acquiring, managing, operating, or designing data warehouse
environments in large and medium-sized enterprises.
Terminology
Table 1 lists the terminologies used in this solution paper.
Table 1.
Terminology
Term
Definition
Decision support system (DSS)
Decision support systems are applications that use the
summarized data in data warehouses and turn this into
information on which users and managers can make
key business decisions.
Data warehouse
A data warehouse is a database that is designed for
query and analysis rather than for transaction
processing. It usually contains historical data derived
from transaction data, and it can include and
consolidate data from other sources. An enterprise can
use a data warehouse to monitor and report on
operations and to deploy analytical tools for forecasting
and modeling.
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler
7
Term
Definition
ETL
The data warehouse is required to load data regularly to
facilitate business analysis. To do this, data from one or
more systems needs to be extracted and copied into the
data warehouse. The process of extracting data from
source systems and bringing it into the data warehouse
is commonly called ETL, which stands for extract,
transform, and load.
Oracle ASM
Oracle Automatic Storage Management (ASM) is a
volume manager and a file system for Oracle database
files that supports single-instance Oracle Database and
Oracle Real Application Clusters (RAC) configurations.
Oracle ACFS
The ASM Cluster File System (ACFS) extends ASM by
providing a robust, general-purpose, extent-based, and
journaling file system. ACFS provides support for files
such as Oracle binaries, report files, trace files, alert
logs, and other application data files.
Oracle 11g RAC database
RAC is software that enables customers to use clustered
hardware by running multiple instances against the
same database.
SYMCLI
Symmetrix Solutions Enabler command line interface.
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler
8
Key technology components
Overview
EMC Symmetrix
VMAX 40K
This solution used the following key hardware and software components:
•
EMC Symmetrix VMAX 40K
•
EMC Unisphere for VMAX
•
EMC Virtual Provisioning
•
EMC TimeFinder Virtual Provisioning Snap
•
EMC PowerPath®
•
Oracle Database 11g R2 Enterprise Edition
The EMC Symmetrix VMAX 40K builds on EMC’s industry-leading Symmetrix storage
array platform for powerful, trusted, smart storage that provides higher levels of
performance, availability, and intelligence in the virtual data center.
Built on the strategy of simple, intelligent, and modular storage, VMAX 40K
incorporates a highly scalable virtual matrix architecture that enables Symmetrix
VMAX arrays to grow seamlessly and cost-effectively from an entry-level configuration
into the world’s largest storage system. Symmetrix VMAX supports Flash drives, Fibre
Channel (FC) drives, and Serial Attached SCSI (SAS) drives within a single array, as
well as an extensive range of RAID types. EMC Symmetrix VMAX 40K provides the
performance and mission-critical availability to address the challenges of enterprise
data warehouses:
•
Powerful: Symmetrix VMAX 40K delivers the highest levels of performance to
demanding EDW and analytic applications. Line of business users and BI
analysts can experience the fastest response and gain insights more quickly.
The scalable data loading capability of Symmetrix VMAX 40K enables
optimization of business processes by using more frequent data refreshes.
•
Trusted: Enterprise-class protection of the data in the Symmetrix VMAX 40K
data warehouse can be achieved without significant impact on performance.
•
Smart: Enhanced management tools enable efficient deployment of mixed
workloads on the same array.
The EMC Symmetrix VMAX 40K with Enginuity 5876 software simplifies and
automates the management and protection of information:
•
Federated Tiered Storage
•
Fully Automated Storage Tiering for Virtual Pools (FAST VP) enhancements
•
EMC RecoverPoint integration
•
Unisphere for VMAX
•
Dynamic back-end configuration
•
EMC TimeFinder enhancements
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler
9
EMC Unisphere for
VMAX
Unisphere for VMAX is an advanced GUI for managing Symmetrix VMAX arrays.
Unisphere for VMAX enables you to provision, manage, and monitor any Symmetrix
VMAX array from one screen to significantly reduce storage administration time.
Unisphere for VMAX uses the same GUI framework as the unified EMC VNX® family
platform. For customers who use Symmetrix VMAX and VNX in the same data center,
Unisphere provides a consistent look and feel that simplifies management
operations.
Unisphere provides a web interface that includes the following Symmetrix Solutions
Enabler CLI operations:
•
Access management
•
Configuration management
•
Replication management
•
Monitoring and alerts
In addition, Unisphere includes a performance option that collects and stores
historical performance data to analyze and report on workload and resource usage
trends.
EMC Virtual
Provisioning
EMC Virtual Provisioning simplifies storage management, improves capacity
utilization, and enhances performance. Virtual Provisioning provides the separation
of physical storage devices from the storage devices as perceived by the host system.
Virtual Provisioning introduces the following concepts and components:
EMC TimeFinder
Virtual
Provisioning Snap
•
Thin devices (TDEVs) are devices that do not have storage allocated to them
when they are created. Thin devices can be created with an inflated capacity,
because data devices provide the actual storage space for data written to them.
To a host operating system, thin devices look like regular devices with their
configured capacity, and the host interacts with them in the same way as with
regular devices.
•
Data devices are special devices (not mapped to the host) that provide physical
storage for thin devices. Data devices must be enabled in a virtual pool before
they can be used.
•
A virtual pool is a collection of data devices that provides storage capacity for
the thin devices that are bound to the pool. All data devices in a given virtual
pool share the same RAID protection type and are of the same drive technology.
EMC TimeFinder VP Snap is a replication technology that creates a point-in-time copy
of the source device when a target device is virtually provisioned. The copied data
resides on allocations in a virtual pool, but not a save pool. If more VP Snap sessions
are added to the same source device, data is copied to the targets based on whether
the source data is new with respect to the point in time of each copy. When data is
copied to more than one target, only a single, shared copy resides in the VP pool.
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler
10
For example, as shown in Figure 1, multiple VP Snap copies (SNAP1, 2…n) are created
from the same source devices and are bound to the same virtual pool. When a second
VP Snap (SNAP2) session is created and activated, allocation sharing begins. Write
I/Os (target to source devices) that are new to multiple VP Snap copies are saved in a
single set of allocations that is shared by multiple target devices.
Figure 1.
VP Snap extent sharing within a virtual pool
Multiple copies of the data warehouse production environment can be created by VP
Snap, and then mounted on other servers for special purposes; for example, test or
development.
EMC PowerPath
EMC PowerPath is host-based software that provides automated data path
management and load-balancing capabilities for heterogeneous servers, networks,
and storage deployed in physical and virtual environments. PowerPath provides I/O
multipath functionality. With PowerPath, a server node can access the same SAN
volume via multiple paths (HBA ports), which enables both load balancing across the
multiple paths and transparent failover between the paths. PowerPath offers
significant benefits for the high-throughput requirements of an Oracle data
warehouse:
•
Standardize data path management across physical and virtual environments,
and grow without purchasing more infrastructure components.
•
Automate multipathing policies and load balancing to provide predictable and
consistent database availability.
•
Improve service-level agreements by eliminating application impact from I/O
failures.
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler
11
Oracle Database
11g R2 Enterprise
Edition
This paper presents a storage solution for Oracle Database 11g R2 data warehouse
environments. This solution implements many Oracle Database11g R2 features
including RAC, ASM, and ACFS.
In Oracle 11g R2, Oracle ASM and Oracle Clusterware have been integrated into the
Oracle Grid Infrastructure. This provides the cluster and storage services required to
run Oracle RAC databases. Oracle ASM is also extended to include support for Oracle
Cluster Registry (OCR) and voting files to be placed within ASM disk groups.
Oracle ACFS extends ASM functionality to act as a general-purpose cluster file
system. Oracle database binaries can be placed on ACFS together with other files,
such as trace and alert logs. In this solution, we use Oracle ACFS to store the comma
separated value (CSV) files for data loading.
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler
12
Solution architecture and design
Configuration
overview
The data warehouse environment in the solution consists of a four-node Oracle 11g
R2 RAC database. We use 10 gigabit Ethernet (GbE) for the cluster interconnect
network and we allocate storage to Oracle ASM from virtual pools. The underlying
storage is provided by a Symmetrix VMAX 40K array running Enginuity 5876.
In this solution, we populate a 30 TB data warehouse database by using a DSS-like
toolkit. We run two types of workload to validate the performance scalability on the
VMAX 40K array when the number of concurrent users is increased:
•
Query workload for reporting is generated by multiple concurrent users against
the database. We gradually increase the number of multiple concurrent users,
on different numbers of database nodes, to run queries against the database.
•
ETL data loading workload is simulated by multiple concurrent sessions to load
data by external tables. In the ETL workload, multiple sessions are generated
on different database nodes to load data simultaneously.
The configuration used in this solution provides a good performance for this 30 TB
data warehouse environment, and it is the foundation for significant performance
increase in a near-linear manner by adding additional database nodes, HBA and
front-end ports, and disk drives to the workload.
On the VMAX 40K, EMC TimeFinder VP Snap provides the benefits of space saving by
sharing allocation between different VP Snap sessions. In addition, EMC TimeFinder
VP Snap enables offloading of the backup for the data warehouse onto another
dedicated server. It provides minimal impact on the query workload and system
throughput.
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler
13
Solution
architecture
Figure 2 depicts the architecture of the solution environment.
Figure 2.
Hardware
environment
Solution architecture
Table 2 details the hardware environment for the solution.
Table 2.
Hardware environment
Equipment
Quantity
Configuration
Production
server
4
4-node Oracle RAC configuration; each node
contains:
• 4 x 8-core central processing units (CPUs)
• 256 GB random access memory (RAM)
• Dual-port 1 GbE network interface cards (NICs)
• Dual-port 10 GbE Converged Network Adapters
(CNAs)
• 2 x dual-port 8 Gb host bus adapters (HBAs)
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler
14
Equipment
Quantity
Configuration
Backup host
1
One backup host configuration:
• 4 x 4-core CPUs
• 32 GB RAM
• Dual-port 1 GbE NICs
• Dual-port 8 Gb HBAs
Test/dev server
1
One test/dev server configuration:
• 4 x 4-core CPUs
• 32 GB RAM
• Dual-port 1 GbE NICs
• Dual-port 8 Gb HBAs
Storage
1
EMC Symmetrix VMAX 40K with:
• Enginuity 5876
• 4 engines, 128 GB cache each
• 368 x 600 GB 10K 3.5” FC drives
• 39 x 2 TB 7.2K 3.5” SAS drives
Software
environment
Storage area
network (SAN)
switch
2
8 Gb/s FC switches
Ethernet switch
2
10 GbE switches
Table 3 details the software environment for the solution.
Table 3.
Software environment
Software
Version
Description
VMAX Enginuity code
5876
VMAX micro code
EMC Solutions Enabler
7.4
Host CLI storage management
software
EMC PowerPath
5.6
Multipathing and load balancing
software
Oracle Grid Infrastructure
Enterprise Edition
11.2.0.3
Clusterware and ASM software
Oracle Database 11g R2
Enterprise Edition
11.2.0.3
Oracle database software
Red Hat Enterprise Linux
5.5
Production and backup host
operating system
Microsoft Windows 2008
R2
Operating system of the storage
management server
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler
15
Storage
connectivity
overview
The VMAX 40K array and the servers of the four-node RAC cluster are connected as
shown in Figure 3. The SAN switches are Brocade DCX 8510 Backbones that use 16
FC ports for the connection between the servers and the VMAX 40K array. The Brocade
DCX 8510 Backbones provide redundant SAN infrastructure, including fabric
extension. The colored scheme indicates the SAN zoning configuration on each
switch.
The VMAX 40K array has four engines with sixteen 8 Gb front-end FC ports that are
cabled and split across two SAN switches. Each server contains two dual HBA ports.
Each server HBA port is zoned to each VMAX 40K engine. This configuration offers the
highest level of protection with high performance and uses the full features of
PowerPath.
In this configuration, we connect multiple HBAs to the host. Therefore, there are
redundant paths to each VMAX engine. There is no single point of failure. Data
availability is ensured in the event of a HBA, cable, switch, or engine failure. Because
there are multiple paths per engine, this configuration benefits from PowerPath’s
load-balancing feature and thus provides additional performance. With this
configuration, the bandwidth can be scaled by adding additional HBA ports, front-end
ports, and SAN switch ports.
Figure 3.
Storage connectivity
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler
16
Brocade DCX 8510
Backbone
Networks are essential to support the demands for data growth. Brocade DCX 8510
Backbones provide a powerful FC switching infrastructure, and offer a reliable,
scalable, high-performance foundation for IT infrastructure. Brocade DCX 8510
Backbones are designed to increase business agility while providing non-stop access
to information and reducing infrastructure and administrative costs.
The Brocade DCX 8510 offers high throughput to meet the requirements of an Oracle
data warehouse:
Storage Virtual
Provisioning
design
•
Enable simpler, flatter, low-latency chassis connectivity to reduce network
complexity, management, and costs.
•
Optimize data center connectivity over distance with integrated highperformance metro and global connectivity.
•
Simplify and centralize end-to-end SAN management with comprehensive
diagnostics, monitoring, and automation.
•
Maximize performance for I/O- and bandwidth-intensive applications with more
than seven times the performance of competitive offerings.
•
Protect investments in existing SAN fabrics and automation tools while
reducing operational costs and minimizing business disruption.
EMC Virtual Provisioning automatically stripes data across all data devices in a virtual
pool, and balances the workload across storage devices. To ensure even striping of
data, it is recommended that all data devices in a virtual pool are the same size.
Table 4 shows the RAID selections and number of spindles for each virtual pool. In
this solution, Oracle data files, redo log files, temporary files, and CSV files are
located on thin devices using RAID 1 protection and 360 physical spindles for the
best performance and capacity. Use 7.2K SAS drives with RAID 6 protection for Flash
Recovery Area (FRA) files, which are less sensitive to performance.
Table 4.
Virtual pool design on VMAX 40K
Virtual Pool
RAID
protection
Drive type
Physical
spindles
size
Number of
active
spindles
10K_FC_pool
RAID 1
FC 10K
600 GB
360
DATA, REDO,
TEMP, CSV, ETL
SAS_pool
RAID 6
(6+2)
SAS 7.2K
2 TB
32
FRA
Item
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler
17
Oracle ASM disk
group
configuration
Table 5 details the ASM disk group design based on EMC storage best practices. We
used six ASM disk groups to store the relevant database files, including data files,
control files, online redo log files, temporary files, and CSV files (used for ETL).
Table 5.
ASM disk group design
Item
LUN size
(GB)
Meta
members
Number of
LUNs
ASM disk
group name
Virtual pool
DATA
512
8
128
+DATA
10K_FC_pool
TEMP
136
1
4
+TEMP
10K_FC_pool
REDO
16
1
16
+REDO
10K_FC_pool
FRA
512
8
16
+FRA
SAS_pool
CSV
512
8
16
+CSV
10K_FC_pool
ETL
512
8
12
+ETL
10k_FC_pool
In this solution, we use:
Database and
workload profile
•
10K_FC_pool to store the database relevant files in the DATA, REDO, TEMP, and
ETL disk groups, because there are 360 FC drives in the pool to provide high
performance and capacity for data read and writes. The ACFS file system is
created on the CSV disk group and is used to store the flat files for data
loading.
•
SAS_pool to store the archive log files in the FRA disk group.
Table 6 details the database and workload profile for the solution.
Table 6.
Database and workload profile
Profile characteristic
Details
Database type
Data warehouse
Database size
30 TB
Oracle database
11g R2 4-node RAC on ASM
Oracle RAC (physical environment)
4 nodes
Workload profile
DSS workload
Data load source
External flat files on Oracle ACFS used for
external tables
Network connectivity
8 Gb FC for SAN; 10 GbE for IP
We used a DSS-like toolkit to generate a data warehouse database and deliver the
DSS workloads, including the query and ETL workloads required for the solution.
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler
18
Preliminary throughput testing using Orion on a single node
Overview
We conducted a preliminary test to measure the pure I/O capabilities of the
architecture and validate the solution configuration. To do this, we simulated I/O
using an Oracle toolset called Orion. After the I/O profile of the architecture was
established by using these simulation tools, real-world queries were generated
through the DSS-like toolkit and the real database I/O operations, including queries
and data loads, were generated by this toolkit.
Orion is an Oracle I/O Numbers Calibration Tool designed to simulate Oracle I/O
workloads without creating and running an Oracle database. It can also simulate the
effect of the striping performed by ASM. Orion uses the Oracle database I/O libraries
and can simulate online transaction processing (OLTP) workloads (small I/Os), data
warehouses (large I/Os), and mixed workloads.
Orion is useful to measure the performance capabilities of a storage system: either to
uncover performance issues or to size a new database installation.
Note
Test objective
Test scenario and
methodology
Orion overwrites the existing data so you should only run it against raw
devices before installing any database or application.
The objective of our testing was as follows:
•
Use Orion to generate random large I/O workloads to determine the throughput
and to evaluate storage for the Oracle data warehouse.
•
Use Orion to validate the throughput scalability of HBA ports.
•
Use the test results as the baseline to validate the throughput generated by the
DSS-like toolkit.
The test scenario included the following steps:
1.
Enable one HBA port and disable the other ports by using PowerPath software
on a single node.
2.
Run Orion to generate the workload for one HBA port.
3.
Measure the throughput.
4.
Enable the second, third, and fourth HBA ports in turn and repeat the previous
three steps after enabling each port.
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler
19
Test results
Figure 4 demonstrates the observed throughput scaling on a single node.
Figure 4.
8Gb HBA ports scalability on a single node using Orion
The results clearly show that the throughput scaled in a linear manner when adding
HBAs and front-end ports to scale an environment. For example, the average
throughput was 725 MB/s when the workload was running on one HBA port and one
front-end port. It increased to 1,444 MB/s after enabling the second port. Linear
scaling was observed when we enabled the third and the fourth ports. The average
throughput was 2,892 MB/s (2.82 GB/s) when the workload was running on four 8 Gb
HBA ports, which is double the average throughput achieved when the workload was
running on two ports.
We can scale the Orion throughput on one node based on this fundamental
configuration by introducing additional HBA ports and front-end ports.
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler
20
Storage provisioning on a VMAX 40K array
Overview
Unisphere for VMAX, the GUI tool, provides a simple method for storage
administrators to configure VMAX 40K arrays, including simplified navigation,
display, and other storage management tasks.
This section demonstrates how to provision storage through a central console and
outlines the main steps for Virtual Provisioning on a Symmetrix VMAX 40K array:
Step 1: Create
devices and
storage groups
1.
Create devices and storage groups.
2.
Label the devices to identify their purpose (optional but recommended).
3.
Provision the storage to the hosts.
Figure 5 shows the interface for creating devices through the Unisphere GUI.
Unisphere provides an easy way to create devices based on the Oracle database
requirements.
The example shown here uses the Create a storage group wizard from the Common
Tasks view of Unisphere, which is visible at all times on the upper right of the
Unisphere GUI. The operation in the example creates Oracle redo log devices and
puts those devices in the storage group called ORACLE_DW.
Figure 5.
Creating devices in the storage group ORACLE_DW
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler
21
Step 2: Label the
devices to identify
their purposes
After the devices and groups are created, we label the thin devices to identify their
purpose. Figure 6 shows how to set the volume identifier name through Unisphere. To
access this menu, right-click the volume and select Set Volume Identifier.
Figure 6.
Setting the volume identifier name
These identifiers are visible using the EMC inq tool when the LUNs are presented to
the host. The identifiers can also make the devices easier to distinguish by having a
meaningful name as well as the hexadecimal system id of the volume as shown in
Figure 7.
Figure 7.
Note
EMC inq command output
This step is optional but it makes the configuration process easier for storage
administrators.
When labeling multiple volumes, you can generate and run a script using the CLI to
speed up the process if you have many devices. For example:
#symconfigure -sid xxx -cmd "set dev 0BEF:0BFE
device_name='REDO_ASM';" commit
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler
22
Step 3: Provision
the storage to the
hosts
VMAX arrays use the concept of Masking Views for presenting storage to a host. The
Masking View groups three components to form the structure:
•
A storage group is a grouping of devices that are presented to the host.
•
A port group is a grouping of front-end ports that make devices accessible.
•
An initiator group is a container of one or more host initiators that accesses the
storage.
Figure 8 shows the Provision Storage wizard from Unisphere.
Figure 8.
Unisphere Provision Storage wizard
The Provision Storage wizard is accessed through the Common Tasks view. In this
operation, we select the hosts that are created by the Create a new host wizard on the
Common Tasks view, and we use the existing storage group ORACLE_DW created in
Step 1: Create devices and storage groups. The host and storage group are
associated in the Unisphere Provision Storage wizard. The devices in the storage
group become visible to the host.
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler
23
Query workload test
Overview
The DSS-like toolkit provides an Oracle data warehouse test workload to test and
validate the performance of typical Oracle data warehouse workloads on the EMC
VMAX 40K storage platform.
The schema in the kit has 12 tables including two fact tables: sales and returns. The
remaining 10 tables act as dimension tables. The two main fact tables are rangepartitioned by date and sub-partitioned by hash on their join key. Multiple concurrent
users run a series of typical queries against the data. The throughput is measured
during the workload.
Notes • Benchmark results are highly dependent upon workload, specific
application requirements, and system design and implementation.
Relative system performance will vary as a result of these and other
factors. Therefore, the solution test workloads should not be used as a
substitute for a specific customer application benchmark when critical
capacity planning and/or product evaluation decisions are contemplated.
• All performance data contained in this report was obtained in a rigorously
controlled environment. Results obtained in other operating environments
may vary significantly.
• EMC Corporation does not warrant or represent that a user can or will
achieve similar performance expressed in transactions per minute.
Test objective
The objective of the test is to measure the performance scalability when the
concurrent users are scaling. The concurrent users are generated by the DSS-like
workload toolkit with each user running similar queries.
Test scenarios and
methodology
All the test results were captured from Automatic Workload Repository (AWR) reports
and Unisphere for VMAX.
Test scenario 1
Description: gradually increase the number of concurrent users on a single data
warehouse node.
Each user runs a series of queries where the next query cannot start until the previous
one is completed. Multiple users are running simultaneously. However, the query
order changes from one user to another.
The test process was running the workload with 2, 4, 8, 16, 32, and 64 concurrent
users separately.
Test scenario 2
Description: measure the performance scalability by adding nodes into the test.
When a database node is added (doubling the CPU power and HBA ports), additional
front-end connectivity is added. With the addition of each new node, we tested the
system again by gradually increasing the number of concurrent query users. For this
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler
24
test, queries are running simultaneously from all the nodes added against the
database.
The test process included the following steps:
Test results
1.
Run the workload with two concurrent users on each node of the two RAC
nodes – that is, with four concurrent users running simultaneously.
2.
Run the workload with 4, 8, 16, 32, and 64 concurrent users on each of the
two RAC nodes separately – that is, with a total of 8, 16, 32, 64, and 128
concurrent users running simultaneously.
3.
Repeat the previous two steps after adding the third and fourth nodes
separately.
Performance statistics were captured using AWR reports and Unisphere for VMAX:
•
AWR reports
Read the throughput (GB/s) data from the “physical read total bytes” row in the
AWR report. For example, the highlighted statistics in Figure 9 show the
throughput of 64 concurrent users running on one node.
Figure 9.
•
Query workload test (AWR report): throughput on one node when 64 concurrent
users running
VMAX management console: Unisphere
Enable realtime storage monitoring using Unisphere, and open the monitoring
window for the metric (Host MBs/sec) to get the throughput number from the
storage perspective. Figure 10 shows an example.
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler
25
Figure 10.
Query workload test: Unisphere for VMAX monitoring example
Figure 11 shows the result of Test Scenario 1.
Figure 11.
Query workload test: concurrent user scaling and throughput of one Oracle RAC
node
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler
26
In Figure 11:
•
The x-axis shows the number of concurrent users.
•
The y-axis shows the average throughput during the workload.
Figure 11 shows that the average throughput (GB/s) increased to a certain extent with
the scale of the users, because the higher workload increased storage utilization and
cache sharing when additional users were running. For example, Figure 12 shows the
throughput monitored from Unisphere while two, four, and eight users were running.
2 users with average
throughput: 2.25 GB/s
Figure 12.
4 users with average
throughput: 2.37 GB/s
8 users with average throughput: 2.53
GB/s
Query workload test: realtime throughput when Unisphere for 2, 4, and 8 users
running
Although the throughputs at the peak time were similar for two, four, and eight users,
at around 2,830 MB/s (which was the limit of one server with four 8 Gb HBA ports, as
validated by the Orion test), the throughput cycling decreased when additional users
were added. Scaling the number of users caused higher workload, which increased
storage utilization and cache sharing to some degree. Therefore, the average
throughput calculated in the AWR reports increased to some extent when additional
users were running simultaneously.
The average throughput reached 2.77 GB/s when 64 concurrent users were running
simultaneously, when the throughput of the HBA ports on the server reached its limit.
From the performance of the HBA port scalability testing, the HBA throughput limit
was verified in the Preliminary throughput testing using Orion on a single node
section: the average throughput of four 8 Gb HBA ports was 2.82 GB/s.
Higher throughput can be achieved if additional HBA ports are added into the
database server because the throughput was limited by the number of HBA ports in
the configuration used in this solution.
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler
27
Figure 13 shows the Test Scenario 2 results, including the performance statistics
concluded from the AWR reports.
Figure 13.
Query workload test: average throughput comparison between one, two, three,
and four RAC nodes
In Figure 13:
•
•
The x-axis shows the number of concurrent users on each node. For example,
for “2 users on each node” in the figure:

The red bar describes two users running on a single node.

The blue bar describes two users running on each of two RAC nodes; in
total four users are running.

The purple bar describes two users running on each of three RAC nodes; in
total six users are running.

The green bar describes two users running on each of four RAC nodes; in
total eight users are running.
The y-axis shows the average throughput during the workload.
The results confirmed that the average throughput scaled nearly linearly along with
the scale of nodes and concurrent users. For example, when 64 concurrent users were
running on a single node, the average throughput was 2.77 GB/s. The throughput
almost doubled to 5.48 GB/s when the second node was added into the environment;
it tripled to 8.23 GB/s when the third node was added into the workload; it almost
quadrupled to 10.68 GB/s when four RAC nodes were running for the workload.
When the query workload was running, the CPU and memory resources were not
bottlenecked, according to the AWR reports.
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler
28
Figure 14 shows the operating system statistics when the throughput reached 10.68
GB/s, which was generated by 256 concurrent users running on four RAC nodes.
Figure 14.
Query workload test: foreground wait classes percentage
Figure 14 shows the host CPU usage for the duration of the tests. CPU utilization
across all nodes was under 30%. 95.37% of all the CPU busy time was spent on user
I/O operations, and the remaining 3.76% was spent on database running.
Figure 15 describes the throughput with 64, 128, 192, and 256 concurrent users
running on one, two, three, and four RAC nodes respectively. Figure 16 shows the
corresponding realtime throughput received from Unisphere.
Figure 15.
Query workload test: Oracle RAC nodes scaling and throughput
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler
29
One node (64
concurrent users)
Figure 16.
Two nodes (128
concurrent users)
Three nodes (192
concurrent users)
Four nodes (256
concurrent users)
Query workload test: realtime throughput from Unisphere
The throughput increased nearly linearly with each additional node added into the
test environment. The throughput of four RAC nodes reached 10.68 GB/s (37.55
TB/hour), which could be higher if additional nodes, front-end ports, and disk drives
were added into the environment.
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler
30
ETL workload test
Overview
Modern data warehouses require frequent and large data loads periodically
throughout the day. The 24x7 nature of the EDW no longer allows a long window of
data loading for DBAs. Therefore, it is important to simulate the impact of ETL data
load on the performance of the database.
This test scenario demonstrates the ETL data loading on the production database and
records the performance data, especially the throughput (physical write total
megabytes per second), during the ETL load.
We used the function of Oracle external table for the data loading. This test scenario
shows the throughput scalability when data is loading from external tables from
Oracle ACFS file system into the Oracle 11g R2 database.
The testing procedure started with loading the data on one of the four RAC nodes, and
then scaled out to two, three, and four RAC nodes.
Test objective
The objective of the test is to show the throughput scalability of the VMAX 40K, with
the disk configuration used in this solution, while the session running the data
loading are scaling out on four RAC nodes. Each session runs a similar ETL workload
by loading CSV files into the database.
Test scenarios and
methodology
This solution test demonstrates performance scalability on the VMAX 40K by loading
data from external tables. Testing includes the following test scenarios:
•
The first scenario uses one RAC node and runs one session to load data from
one external table. The session loads one CSV file with a size of 120 GB for ETL
loading. The CSV file is located on the Oracle ACFS file system. The external
table is created as follows:
create table sales_ext (
id integer,
…)
organization external(
type oracle_loader
default directory EXT_DIR
access parameters (fields terminated by "|")
location ('sales.csv'))
parallel reject limit unlimited;
The data is loaded from the external table as follows:
alter session enable parallel dml;
alter table sales parallel;
alter table sales_ext parallel;
insert into /*+ append */ sales select * from sales_ext;
Note
The table “sales” has the same structure as the table “sales_ext”. The
data is loaded directly with the “append” hint, and multiple parallel
slaves are used for data loading.
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler
31
•
Test results
The second test scenario scales data loading sessions with external tables on
additional RAC nodes. Based on the performance data from the first test
scenario, the second, third, and fourth nodes were added into the environment.
With one load session on each node, there are four sessions on four RAC nodes
to load data concurrently from four 120 GB CSV files. In this configuration, the
data loading process is the same as in the first scenario, but the load data is
scaled linearly by running one to four RAC nodes. This resulted in improved
bulk-load functionality and enhanced overall VMAX 40K performance.
We read the throughput (MB/s) from the physical write total bytes row in the AWR
report. For example, the highlighted statistics in Figure 17 show the average
throughput of one session running on one node to load data from an external table.
Figure 17.
ETL workload test (AWR report): average throughput on one node when one
session running to load data from an external table
As shown in Figure 17, the average throughput for data loading from the external
table was 893 MB/s (936064378.09/1024/1024). Figure 18 describes the realtime
throughput of the first test scenario from Oracle Enterprise Manager.
Figure 18.
ETL workload test: the realtime throughput on one node from Oracle Enterprise
Manager
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler
32
Note
The throughput data shown in Figure 17 and Figure 18 differ because Figure
17 shows the average statistics in the AWR report while Figure 18 shows the
realtime number in Oracle Enterprise Manager.
Figure 19 shows the average throughput of the second scenario from the AWR report
and Figure 20 shows the realtime throughput from Unisphere.
Figure 19.
ETL test: average throughput in multiple nodes scenario
One node
(one session for
data loading)
Figure 20.
Two nodes
(two sessions for
data loading)
Three nodes
(three sessions for
data loading)
Four nodes
(four sessions for
data loading)
ETL test: realtime throughput from Unisphere
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler
33
From the results, the average throughput increased nearly linearly when the second
node was added into the workload and the number of data loading sessions doubled.
For example, running one session to load data from an external table against one
node measured throughput of 893 MB/s, and this increased to 1,790 MB/s when two
sessions were running in two nodes with one session on each node. The throughput
increased similarly when the third and fourth node was added into the environment.
Note
The throughput data shown in Figure 19 and Figure 20 differ because the
average statistics were used in the AWR report and the realtime number was
captured from Unisphere.
Overall, the throughput of data loading on the VMAX 40K array was nearly linear
scaling when more nodes and sessions were added. Higher throughput can be
achieved if additional HBA ports, front-end ports, and RAC database nodes are added
into the environment.
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler
34
Rapid deployment of test/dev database environments
Overview
We tested the ability of the Symmetrix VMAX 40K solution infrastructure to support
the rapid deployment of a test or development database environment. With EMC
TimeFinder VP Snap technology, customers can easily create multiple copies of the
production database and repurpose those copies for testing, development,
offloading RMAN backup, and other purposes.
VP Snap enables storage administrators to create a point-in-time copy of a source
thin device where the virtually provisioned target devices bind to a virtual pool. Up to
16 VP Snap sessions can be taken from a single source. When all target devices are
bound to the same virtual pool allocations, VP Snap sessions can be shared between
copies.
Source updates that are new to all point-in-time copies are saved in a single set of
allocations that is shared by all target devices, which brings snap space efficiencies
to thin devices. (See EMC TimeFinder Virtual Provisioning Snap in the Key technology
components section).
Test objective
The objective of the test was to demonstrate the use of multiple EMC TimeFinder VP
Snap copies to maximize space savings on the array by sharing virtual pool
allocations, and to demonstrate the deployment of test or development environments
by EMC TimeFinder VP Snap technology.
Test scenario
This solution test demonstrates the process of deploying test and development
environments and validating the benefits of using VP Snap. It included the following
test scenarios:
Test procedure
•
Measure VMAX 40K virtual pool space utilization and compare the space usage
of four VP Snap sessions to one VP Snap session.
•
Mount the VP Snap replica on the test/dev server for test or development
purposes.
In the first scenario, we created one VP Snap session and examined the virtual pool
usage, as follows:
1.
Create TimeFinder VP Snap session target devices for all of the ASM disk
groups, including DATA, ETL, REDO and others, and then bind them to a new
virtual pool (see Step 1: Create devices and storage groups in the Storage
provisioning on a VMAX 40K array section).
2.
Define a device group to associate the source devices to the target devices
created in step 1, as shown in Figure 21.
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler
35
Figure 21.
3.
Define device group through Unisphere for VMAX
Create one VP Snap session with the Use VSE option, as shown in Figure 22.
Figure 22.
Create TimeFinder clone session with the Use VSE option
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler
36
4.
Activate one VP Snap session with the Use Consistent option, as shown in
Figure 23.
Figure 23.
Note
Activate one VP Snap session with the Use Consistent option
Use the VSE option to invoke VP Snap functionality. After activating VP
Snap sessions, the copied data for each track resided in a unique
allocation within the virtual pool if I/O tracks were written to the
source devices. The pair state was CopyOnWrite, as shown in Figure
24.
Figure 24.
Detail of the pairs of VP Snap session
5.
Run the ETL data loading workload on the production environment.
6.
Terminate the VP Snap session to reset the pool allocations.
Similarly, we created and activated another four TimeFinder VP Snap sessions based
on step 1, 2, 3, and 4 and ran the same ETL data loading as in step 5. The test results
compared space usage by one VP Snap session and four VP Snap sessions.
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler
37
After that, we tested the second scenario, which was to mount VP Snap copies on the
test/dev server for rapid deployment of test or development database environments,
per the following steps:
1.
Present the TimeFinder VP Snap copies to the test/dev server (see Step 3:
Provision the storage to the hosts in the Storage provisioning on a VMAX 40K
array section).
2.
Scan and list the ASM disks:
# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
# oracleasm listdisks
3.
Mount all of the ASM disk groups, for example:
# $GRID_HOME/bin/crsctl start resource ora.REDO.dg
CRS-2672: Attempting to start 'ora.REDO.dg' on ' oradwvps'
CRS-2676: Start of 'ora.REDO.dg' on ' oradwvps' succeeded
4.
Start up the database. It can then be used for test or development purposes:
$ sqlplus / as sysdba
SQL> startup
Test results
After activating one VP Snap session, the tracks that were written to the source thin
device resided on allocations in the virtual pool. The allocation that was used by one
VP Snap session in the virtual pool was 435.22 GB, as shown in Figure 25.
Figure 25.
Virtual pool allocations used by one VP Snap session
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler
38
After four VP Snap sessions were created and activated from the source device, the
allocations were shared. Source updates that were new to multiple VP Snap copies
were saved in a single set of allocations that was shared by multiple target devices.
As shown in Figure 26, the allocation that was used by four VP Snap sessions in the
virtual pool was 435.25 GB.
Figure 26.
Virtual pool allocations used by four VP Snap sessions
As shown in Figure 27, the allocations used by four sessions were almost the same
as those used by one session. The result showed that only one single shared copy
resided in the virtual pool when data was copied to four targets. Multiple test or
development environments can be deployed by using VP Snap copies with efficient
space saving benefit.
Figure 27.
Comparison of space usage by one VP Snap and four VP Snaps
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler
39
Offload RMAN backup
Overview
An Oracle database is a critical component in many organizations. IT managers regard
data corruption as a huge threat to their Oracle database environment. Therefore,
backup is very important to protect the database from data loss. On the other hand,
backup is a resource intensive activity that can cause contention between the
resources and the workload on the production environment, especially for the CPU
and I/O resources. The challenge that backup administrators have is how to offload
the backup process from a production server to a backup host, to free up the
resources of the production server.
In this solution, VMAX 40K VP Snap technology addresses this challenge by enabling
offloaded, non-disruptive RMAN backup. VP Snap not only offers offloaded backup,
but also dramatically improves the mean time to recovery (MTTR) by reducing the time
required for the restore operation for certain types of corruptions (for example, the
logical errors resulting from dropping important tables by mistake). Furthermore, as
the backup operation has minimal impact on the database server performance, the
backup can be run more frequently. The recovery operation is also optimized since
fewer archived logs need to be applied.
Test scenario and
methodology
This solution test consisted of offloading backup of the database using the
TimeFinder VP Snap while the database was under intensive query load and
monitoring the impact on performance:
1.
Create and activate a TimeFinder VP Snap session (with the User Consistent
option) against the database while the database is online, to make sure the
Oracle database is consistent. (See the detailed steps in the Rapid
deployment of test/dev database environments section for how to enable a
VP Snap session with the User Consistent option).
Note
2.
Oracle support the use of third-party snapshot technologies for
backup, restore, and recovery operations without putting the
database in backup mode. (Refer to My Oracle Support Note: ID
604683.1)
Mount the TimeFinder VP Snap to a dedicated backup host.
After creating the TimeFinder VP Snap replicas against the Oracle database
devices, the replicas were mounted on a dedicated backup server (see the
detailed steps in the Rapid deployment of test/dev database environments
section).
3.
Start up the database on the backup host:
$ sqlplus / as sysdba
SQL> startup mount
4.
Run RMAN backup on the backup host.
The RMAN backup started on the VP Snap backup host while the DSS query
workload was running. The DSS query workload was generated with 256
concurrent users running simultaneously against the four-node RAC
production database. (See the Query workload test section)
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler
40
Test results
5.
Measure the throughput (GB/s) and compare the data to that of the query
workload result.
6.
Restore the data files from the VP Snap copy.
When performing the offloaded RMAN backup with EMC TimeFinder VP Snap
technology, the average throughput for the workload generated by 256 concurrent
users was 10.67 GB/s, compared to the baseline of 10.68 GB/s. There was minimal
impact (0.1%) on the workload.
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler
41
Conclusion
Summary
IT departments are seeking ways to cope with the exponential growth of data and the
new business demands that are being placed on data warehouses. The growing data
is being refreshed more frequently, allowing for timely alerts, detailed reporting,
predictive modeling, and business optimization. As a result, more enterprises are
using business intelligence analysis tools to gain competitive advantage and to
create new revenue streams.
Using the methodology and test processes in this solution as a guide for the server
and storage configuration, customers can satisfy new query and loading performance
requirements while maintaining the availability and security of their data.
TimeFinder VP Snap technology enables fast deployment of test or development
environments. The space used by multiple VP Snap replicas is reduced dramatically.
By leveraging TimeFinder VP Snap technology for data warehouse backup offloading,
the system throughput on the production environment is minimally affected.
Findings
The key findings of the solution include:
•
In the 30 TB database, the average query throughput increased when
additional RAC nodes were added. For the query workload, it increased from
2.77 GB/s to 5.48 GB/s when the database servers were scaled out from one to
two, and increased to 10.68 GB/s when four RAC nodes were used.
•
The average throughput of the ETL workload increased linearly along with the
addition of the nodes. The throughput was 893 MB/s for one node, and it
increased to 1,790 MB/s when the second node was added. The throughput
increased to 3,540 MB/s by running four RAC nodes.
•
This solution can be taken as a baseline or foundation that can be scaled in a
flexible, predicable, and near-linear way, by adding additional HBA ports, frontend ports, and RAC nodes, to provide higher throughput based on the
configuration in this solution.
•
When offloading nondisruptive RMAN backup by using EMC TimeFinder VP
Snap technology, the performance for the query workload on four nodes was
minimally affected.
•
With EMC TimeFinder VP Snap technology, multiple copies of the production
database were created easily for test/dev purposes, with the added benefit of
reduced space use on the array.
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler
42
References
White papers
For additional information, see the white papers listed below.
•
Oracle Data Warehouse Sizing with EMC Symmetrix DMX-4 and Dell R900—Best
Practices Planning
•
Introduction to EMC Oracle Optimized Warehouse Reference Configurations
•
Deploying EMC VNX Unified Storage Systems for Data Warehouse
Applications—Best Practices for Adoption and Deployment
•
Maximize operational efficiency for Oracle RAC Environments with EMC
Symmetrix FAST VP (Automated Tiering)—An Architectural Overview
•
EMC Symmetrix VMAX Best Practices—Technical Notes
Product
documentation
For additional information, see the product documents listed below.
Other
documentation
For additional information, see the documents listed below.
•
EMC Symmetrix VMAX Virtual Provisioning Space Reclamation and Application
Considerations—Applied Technology
•
Oracle Real Application Clusters Installation Guide 11g Release 2 (11.2) for
Linux and UNIX
•
Oracle Real Application Clusters Administration and Deployment Guide 11g
Release 2 (11.2)
•
Oracle Grid Infrastructure Installation Guide 11g Release 2 (11.2) for Linux
•
Oracle Clusterware Administration and Deployment Guide 11g Release 2 (11.2)
•
My Oracle Support Note: ID 604683.1
Oracle Data Warehouse on EMC Symmetrix VMAX 40K
EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler
43
Fly UP