Sizing DB2 BLU Solutions May 28, 2015 Presented by: Michael Kwok
DB2 Tech Talk series host and today's presenter
• DB2 Tech Talk host: Jessica Rockwood, Senior Manager, DB2 Systems of Record - OLTP
• Today's technical presenter: Michael Kwok, Ph.D., Program Director and Architect, Analytic Warehouse Performance (dashDB, BLU, DB2 Warehouse)

A few details
1. The presentation for this Tech Talk: Click here to download
2. Next steps (see "resources" widget)

Disclaimer
The information contained in this presentation is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided "as is", without warranty of any kind, express or implied. In addition, this information is based on IBM's current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other documentation. Nothing contained in this presentation is intended to, or shall have the effect of:
• Creating any warranty or representation from IBM (or its affiliates or its or their suppliers and/or licensors); or
• Altering the terms and conditions of the applicable license agreement governing the use of IBM software.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.

Safe Harbour Statement
IBM's statements regarding its plans, directions, and intent, including the statements made in and during this presentation, are subject to change or withdrawal without notice at IBM's sole discretion. Information regarding future products or features is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding future products or features is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about future products or features may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Performance and compression data is based on measurements and projections using IBM benchmarks in a controlled environment. The actual throughput, performance or compression that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
Agenda
• Performance Foundation
• BLU Acceleration – Hardware & Software Recommendations
• Sizing Guidelines
• Example
• Summary

PERFORMANCE FOUNDATION

Performance: Orchestration matters
• Hardware: CPU, memory, storage and network
  • CPU cores and RAM available continue to grow
  • Up to 12 cores/chip for IBM POWER8, up to 18 cores/chip for Intel Haswell
  • 16GB RAM/core is a good rule of thumb
  • Think in terms of IOPS and MB/sec read/write when sizing storage and networks
• Include SSDs in your storage strategy – they can make an astonishing difference!
  • Random I/O, especially reads, benefits significantly
  • Internal SSDs offer the best and most cost-effective performance, but they often lack a write cache and HA
• 10GbE networks are the way to go
• Software: OS, DB2 and application
  • Use best practice recommendations
  • Stay current on maintenance levels

New Technology Cornucopia*
• Hardware
  • POWER8 S and E Class servers
  • Intel Haswell-EX servers
  • TMS 900 Flash Storage
• Software
  • AIX 7.1 TL3 SPx
  • RHEL 7.1
  • Ubuntu 14.04 LTS
  • Windows Server 2012 R2
  • DB2 10.5 FP5
*This highlights the newest announced technology and does not imply any required minimum

POWER8 – Continued Leadership & Innovation!
• POWER8 scale-out (1-2 socket) systems
• Virtualization: PowerKVM, PowerVM
• Expanded Linux focus: Little Endian Ubuntu 14.04, RedHat 7.1 and SUSE 12
• Mixed Endian VM support on a single PowerKVM host
• S812L: 1-socket, 2U; Linux only
• S822L: 2-socket, 2U; up to 24 cores; 1 TB memory; 9 PCIe Gen3 slots; Linux only; PowerVM & PowerKVM
• S822: 2-socket, 2U; up to 20 cores; 1 TB memory; 9 PCIe Gen3 slots; AIX & Linux; PowerVM
• S814: 1-socket, 4U; up to 8 cores; 512 GB memory; 7 PCIe Gen3 slots; AIX, IBM i, Linux; PowerVM
• S824L: 2-socket, 4U; up to 24 cores; 1-2 NVidia GPUs; Linux only
• S824: 2-socket, 4U; up to 24 cores; 1-2 TB memory; 11 PCIe Gen3 slots; AIX, IBM i, Linux; PowerVM

Enterprise POWER8 Systems
Designed to take on the most complex data challenges:
• Tackle your largest workloads with increased system scalability
• Deliver insights in real time with increased performance per core
• Maximize your customers' experience with enterprise RAS
• Reduce costs with increased energy efficiency
• Manage the peaks and valleys of workloads with Power Enterprise Pools
• Manage a wider range of workloads with up to 20 VMs per core
• Power E870: up to 80 cores; 32 or 40 core nodes (5U); up to 4 TB memory; 1 or 2 nodes per system
• Power E880: up to 192 cores; 32 or 48 core nodes (5U); up to 16 TB memory; 1 to 4* nodes per system

Intel Xeon E7-4800/8800 v3 (Haswell) Processor
Haswell processors offer increased performance over Ivy Bridge:
• 20% increase in cores
• 20% increase in cache
• 17% increase in max memory speed
• 20% increase in QPI speed
• DDR3 and DDR4 memory (*DDR4 on rack systems only; availability as of July 31, 2015)
• Available in the x3850 X6 and x3950 X6 with the E7-4800/8800 v3 processor family (Haswell EX) and the Flex x480 X6 node with the E7-4800/8800 v3 processor family (Haswell EX)
Linux Evolution
• Linux continues to evolve and improve
  • The 3 main distros are both the same and different
  • Beware of subtle kernel/library differences:
    Ubuntu 14.04: glibc 2.19, kernel 3.13
    SLES 12: glibc 2.19, kernel 3.12
    RHEL 7.1: glibc 2.17, kernel 3.10
• Filesystems
  • ext4 is still a good choice (most popular and recommended)
  • SLES 12: btrfs for root, xfs is the default for everything else
  • RHEL 7: xfs is the default; btrfs and ext4 are supported
  • We are still evaluating xfs

Summary of POWER and Intel Processors
(columns: Xeon E5 | Xeon E5 v2 | Xeon E5 v3 | Xeon E7 v2 | Xeon E7 v3 | POWER7+ | POWER8)
• Clock rates: 1.8-3.6 GHz | 1.7-3.7 GHz | 1.7-3.7 GHz | 1.9-3.4 GHz | 2.0-3.2 GHz | 3.1-4.4 GHz | 3.0-4.15 GHz
• SMT options: 1, 2* | 1, 2* | 1, 2* | 1, 2* | 1, 2* | 1, 2, 4 | 1, 2, 4, 8
• Cores per socket: 8 | 12 | 18 | 15 | 18 | 8 | 12
• Max threads per socket: 16 | 24 | 36 | 30 | 36 | 32 | 96
• Max L1 cache: 32 KB | 32 KB | 32 KB | 32 KB | 32 KB | 32 KB | 64 KB
• Max L2 cache: 256 KB | 256 KB | 256 KB | 256 KB | 256 KB | 256 KB | 512 KB
• Max L3 cache: 20 MB | 30 MB | 45 MB | 37.5 MB | 45 MB | 80 MB | 96 MB
• Max L4 cache: 0 | 0 | 0 | 0 | 0 | 0 | 128 MB
• Memory bandwidth: 31.4-51.2 GB/s | 42.6-59.7 GB/s | 51.2-68.3 GB/s | 68-85 GB/s | 102 GB/s | 100-180 GB/s | 230-410 GB/s

z13 Enterprise Linux Server – The enterprise grade Linux solution
• Up to 10 TB memory (>3X more available memory)
• Up to 141 configurable cores
• Up to 85 configurable LPARs
• IBM zAware – maximize service levels
• Larger cache – more workloads per server
• Crypto performance and Express5S function
• SMT2 and SIMD – enhanced performance
• Upgradeable from z196 and zEC12
* All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice.

IBM FlashSystem 900
Gain faster insights with extreme performance, enterprise reliability and operational efficiency.
Overview: Easy to deploy and manage, FlashSystem 900 is designed to accelerate the applications that drive business. Powered by FlashCore technology, FlashSystem 900 delivers the high performance, MicroLatency, enterprise reliability and operational efficiencies required for gaining competitive advantage in today's dynamic marketplace.
• Minimum latency: write 90 µs, read 155 µs
• Maximum IOPS (4 KB): read (100%, random) 1,100,000; read/write (70%/30%, random) 800,000; write (100%, random) 600,000
• Maximum bandwidth (256 KB): read (100%, sequential) 10 GB/s; write (100%, sequential) 4.5 GB/s
Highlights
• Accelerate critical applications, support more concurrent users, speed batch processes and lower virtual desktop costs with the extreme performance of IBM FlashCore technology
• Harness the power of data with the ultra-fast response times of IBM MicroLatency
• Leverage macro efficiency for high storage density, low power consumption and improved resource utilization

BLU ACCELERATION

DB2 with BLU Acceleration
• Innovative new technology for analytic queries
  • Columnar storage
  • New run-time engine with vector processing, deep multi-core optimizations and cache-aware memory management
  • "Active compression" – unique encoding for further storage reduction beyond DB2 10 levels, and runtime processing without decompression
• Built directly into the DB2 kernel
  • BLU tables can coexist with traditional row tables in the same schema, tablespaces and bufferpools
  • Query any combination of BLU or row data
• Memory and CPU cache optimized
• Value: order-of-magnitude gains in performance, storage savings and time to value

BLU Acceleration
Introducing BLU Acceleration – IBM Research & Development Lab innovations

Hardware and Software Configuration
(operating system | minimum version requirements supporting BLU | recommended OS versions | processor recommendations)
• AIX: AIX 6.1 TL7 SP6 or AIX 7.1 TL1 SP6 | AIX 7.1 TL3 or higher | POWER8
• Linux x86 (64-bit only): RHEL 6, SLES 10 SP4 or SLES 11 SP2, Ubuntu 14.04 | RHEL 7.1 or higher, SLES 11 SP3 or higher, Ubuntu 14.04 | Intel Xeon E5 v3 or E7 v3
• Linux on Power (LE): RHEL 7.1, Ubuntu 14.04.02 | RHEL 7.1, Ubuntu 14.04.02 | POWER8
• Linux on z: RHEL 6, SLES 10 SP4 or SLES 11 SP2 | RHEL 6.5 or higher, SLES 11 SP3 or higher | z13
• Windows (64-bit only): Windows 7, Windows Server 2008 | Windows Server 2012 R2 | Intel Xeon E5 v3 or E7 v3 (up to 4 sockets)

What About Storage?
• General storage recommendations are unchanged with BLU Acceleration
• When BLU Acceleration does I/O, it is typically non-sequential (skip-sequential or random I/O)
• High-performing storage (e.g., SSD) can benefit workloads with undersized bufferpools or significant spilling
• Define active data tablespaces and temporary tablespaces on high-performance SSD storage

Performance vs. Memory Per Core and Storage Type
Chart: query performance (streams per hour) for a 5-stream BI workload vs. memory per core and storage type (8GB/HDD, 8GB/Flash, 16GB/HDD)
• Test system: 48-core P7+ with 8GB/core memory; bufferpool/sortheap split 40%/40%
• Flash configuration: FlashSystem 810, four 8Gb FC ports, five 480 GB modules (1 used as active spare); LUNs: eight 239 GB LUNs
• HDD configuration: DS4800, four 4Gb FC ports, 12 x EXP810 with 300 GB 15K RPM HDDs; LUNs: eight 12+P RAID5 arrays (240 GB of each used)

SIZING GUIDELINES

Sizing Guideline Considerations
• This is a rule of thumb guideline which does not differentiate among:
  • Family of processor (Power, Intel, or z) – an IFL on Linux on z is treated as equivalent to a core on Power/Intel
  • Generations of processor (e.g., P7 vs P8, Ivy Bridge vs Haswell, zEC12 vs z13)
• It is a general guideline for sizing a system for performance in a production scenario, irrespective of platform of choice and without detailed workload knowledge
• The recommended minimum production configuration for BLU Acceleration is documented at http://www-01.ibm.com/support/docview.wss?uid=swg27038033 (System requirements for IBM DB2 for Linux, Unix, and Windows)
• For successful operation, the memory minimum is more critical than the core count minimum
Sizing Guideline Summary
• The initial sizing is based on these assumptions for all decision criteria:
  • 20-30 active, concurrent users
  • 80% simple, 20% complex queries
  • 30% active rows, 50% active columns, 7.5x compression
• Minimum recommended: 8 cores or IFLs and 128GB of RAM for each 3TB of uncompressed (raw) user data
• For higher concurrency/complexity, increase the minimum configuration and maintain the 16GB RAM per core ratio (e.g., multiply CPU and RAM by a factor of 2)
• Increased core/IFL counts and memory lead to greater performance improvements
• If a customer will settle for smaller performance improvements, the minimum configuration (or less) can be used, but under-sizing memory can lead to queries failing due to insufficient memory
(The rule of thumb above is worked through in the sketch that follows.)
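To make the rule of thumb concrete, here is a minimal Python sketch. It is not part of the original deck: the function name, the rounding up to whole 3 TB increments of raw data, and the concurrency_factor knob are illustrative assumptions layered on top of the stated guideline.

```python
import math

def minimum_blu_config(raw_tb, concurrency_factor=1):
    """Rule-of-thumb starting point: 8 cores (or IFLs) and 128 GB of RAM
    per 3 TB of uncompressed (raw) user data, preserving the 16 GB/core ratio.

    concurrency_factor (e.g. 2) scales the result for workloads with higher
    concurrency or complexity than the baseline assumptions
    (20-30 active users, 80% simple / 20% complex queries).
    Rounding up to whole 3 TB blocks is an assumption, not from the deck.
    """
    blocks = math.ceil(raw_tb / 3.0)              # whole 3 TB blocks of raw data
    cores = 8 * blocks * concurrency_factor       # 8 cores per 3 TB block
    ram_gb = 128 * blocks * concurrency_factor    # 128 GB per 3 TB block (16 GB/core)
    return cores, ram_gb

# Example: 10 TB of raw data at baseline concurrency
print(minimum_blu_config(10))       # -> (32, 512)
# The same data with roughly double the concurrency/complexity
print(minimum_blu_config(10, 2))    # -> (64, 1024)
```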
To Improve Sizing Accuracy
• Determine the size of active uncompressed table data
  • Size of uncompressed data
  • Active rows
  • Active columns
• Determine core and memory requirements
  • Expected BLU compression rates
  • Degree of concurrency and complexity of the workload
  • Target memory usage for bufferpools

Determine the Size of Active Uncompressed Table Data
Step (a): Determine the total size of uncompressed table data
• For a new database, use the filesystem-reported size of delimited ASCII input files
• For an existing, uncompressed database, calculate the total size of table data only
• For an existing, compressed database, estimate the uncompressed table data size
  • Ensure all table statistics are up to date
  • Use a SQL query on the system catalog tables to estimate the total size (note that this query depends on table statistics being up to date):

    select sum(a.fpages * (1.0/(1.0 - (cast(a.pctpagessaved as decimal(5,2))/100)))
               * c.pagesize/1024/1024/1024) as uncompressed_table_data_GB
    from syscat.tables a, syscat.datapartitions b, syscat.tablespaces c
    where a.tabschema not like 'SYS%'
      and a.tabschema = b.tabschema
      and a.tabname = b.tabname
      and b.datapartitionid = 0
      and b.tbspaceid = c.tbspaceid

Determine the Size of Active Uncompressed Table Data
Step (b): Determine the percentage of active rows
• Limit the estimate to active rows
  • Review how hot/warm/cold the data rows are
  • Focus on what percentage of the range of data in the database a query typically touches
  • Example: the database stores 7 years of data. If a typical query accesses:
    • all 7 years, 100% of rows are active
    • 3 of 7 years, 43% of rows are active
    • 1 of 7 years, 14% of rows are active
• Default assumption: a common rule of thumb in warehouses is that 25-30% of all rows are active at any given time

Determine the Size of Active Uncompressed Table Data
Step (c): Determine the percentage of active columns
• Limit the estimate to active columns
  • Review how hot/warm/cold the columns are
  • Focus on what percentage of the columns in the tables a query typically touches
  • EXPLAIN plans will show the column access per query
  • Use EXPLAIN_OBJECT.COLUMN_COUNT and EXPLAIN_STREAM.COLUMN_COUNT to determine the ratio per query
• Default assumption: a common rule of thumb in warehouses is that less than 50% of all columns are active at any given time

Determine the Size of Active Uncompressed Table Data
• The size of active uncompressed table data = size of uncompressed data * percentage of active rows * percentage of active columns
• Use the size of active uncompressed table data to determine core and memory requirements based on:
  • Expected BLU compression rates
  • Degree of concurrency and complexity of the workload
  • Target memory usage for bufferpools

BLU Acceleration Compression
• To determine the amount of memory required for BLU Acceleration, you need to estimate the BLU compression rate
• Conservative rule of thumb: 7x-8x compression of table data
• Over-estimating the compression rate will result in an under-sized system
• Calculate the size of compressed active table data by dividing the size of uncompressed active table data (obtained above) by this ratio
(Steps (a) through (c) and the compression estimate are pulled together in the sketch below.)
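The following Python sketch combines the three inputs and the compression estimate into a single calculation of compressed active data (CAD), which the memory sizing in the next slides is based on. It is illustrative only: the function name is made up here, and the defaults simply restate the deck's rules of thumb (25% active rows, 50% active columns, a conservative 7.5x compression).

```python
def compressed_active_data_gb(raw_gb, active_rows=0.25, active_cols=0.50,
                              compression_ratio=7.5):
    """Estimate the compressed active data (CAD) that BLU should keep in memory.

    active uncompressed data = raw size * % active rows * % active columns
    CAD = active uncompressed data / expected BLU compression ratio

    Using a conservative (lower) compression ratio avoids under-sizing the system.
    """
    active_uncompressed_gb = raw_gb * active_rows * active_cols
    return active_uncompressed_gb / compression_ratio

# 10 TB (10240 GB) of raw data with the default rules of thumb
print(round(compressed_active_data_gb(10240)))   # ~171 GB of compressed active data
```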
How Much Memory Does BLU Need?
• Memory is allocated within BLU in three areas:
  • Bufferpool
  • Sort memory (for joins and group-bys)
  • Other memory (for locking, utility heap, etc.)
• In general you want to keep the compressed active data in memory
• In general you want to balance bufferpool and working memory, with some bias towards working memory
• Calculate the total amount of memory required for the desired bufferpool configuration (using 32K pages)

How Much Memory Does BLU Need? (cont.)
• BLU Acceleration best practices recommend two different memory distributions (bufferpool / sort heap threshold / other):
  • Typical workloads: 40/40/20
  • High concurrency: 25/55/20
• For a given bufferpool size (large enough to contain all compressed active data), determine the total amount of memory required for the desired distribution (see above)
• Calculate the cores required to support that memory using the recommended 16GB/core ratio
• Round the number of cores up to the nearest appropriate socket (e.g., 2 sockets, 12 cores)
• Determine the amount of memory, keeping in mind the number of DIMM slots on the machine (e.g., 24 DIMM slots x 16GB/DIMM = 384GB)
• Increase the memory/core ratio if required to address concurrency/complexity requirements

Effect of Reducing Memory/Core with BLU
Chart: query performance at 16, 12, 8 and 4 GB of memory per core on a 64-core P7+ LPAR with FlashSystem 810 storage

Degree of Concurrency and Complexity
• A more complex, higher-concurrency workload will benefit from more cores and memory
• For degree of concurrency, consider factors including:
  • How many active concurrent users are there?
  • What is the breakdown of simple, intermediate, and complex users?
• For degree of complexity, examine the query workload:
  • What is the breakdown between simple, moderate, and complex queries?
    • Simple: single-table scans with simple predicates and aggregation
    • Moderate: a small number of tables joined, moderate sort and group-by requirements, and modest result sets
    • Complex: multi-page SQL with huge joins, huge sorts, huge group-bys and larger result sets
  • How much parallelism should be dedicated to each query?

EXAMPLE

Example
Sizing flow: 10 TB raw uncompressed data -> apply BLU's compression (10x here) -> 1 TB -> factor out inactive rows -> 250 GB -> factor out inactive columns -> 100 GB compressed active data (CAD)
• Raw uncompressed data: estimate via the size of delimited ASCII load files and/or the catalog query for (uncompressed) table size on an existing database
• Apply BLU's compression: a conservative estimate is 7x-8x, but 10x or higher is not uncommon. Refine the estimate for your data by loading data into BLU tables and measuring actual compression, or by using a compression estimation tool
• Factor out inactive rows: most warehouse queries access only recent data. A common rule of thumb (RoT) is that 25-30% of the data in a warehouse is active. For example, in a warehouse which stores 8 years of data, and in which queries access the 2 most recent years' data, the proportion of active data rows would be 25%
• Factor out inactive columns: most warehouse queries access only a subset of columns. A conservative RoT is that 50% of column data is frequently accessed; for this example, let's assume that only 40% of column data is frequently accessed

Example (cont.)
Sizing flow (cont.): 100 GB CAD -> decide % of CAD to buffer in the bufferpool -> 100 GB bufferpool -> apply RoT to determine overall memory sizing -> 256 GB instance memory
• Decide % of CAD to buffer in the bufferpool: we recommend sizing bufferpools at 100% of CAD. This is a conservative approach and provides headroom for possible errors in previous estimates. BLU does not require all active data to be buffered: BLU's dynamic list prefetching is designed to keep the CPUs running at full speed by prefetching the next set of columnar data while processing the current set. In some cases, sizing bufferpools as low as 50% of CAD can be sufficient if you have a capable I/O subsystem.
• Apply RoT to determine overall memory sizing: we recommend that bufferpools comprise between 25% and 40% of overall instance memory. Use 40% for typical workloads; use a lower percentage (e.g., 25%) for higher-concurrency and/or higher-complexity workloads, which benefit from more working memory for sorting, hash joins and grouping operations. Here we apply the typical 40%.
(The sketch below re-derives these example numbers.)
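As a cross-check, this Python sketch re-derives the example end to end, treating 1 TB as 1024 GB (the deck rounds the intermediate values to 1 TB, 250 GB and 100 GB). The function name and parameters are illustrative, and the final core count, taken from the 16 GB/core ratio, is an extrapolation that is not shown on the slide.

```python
import math

def example_sizing(raw_tb=10, compression=10.0, active_rows=0.25,
                   active_cols=0.40, bufferpool_pct_of_cad=1.00,
                   bufferpool_pct_of_memory=0.40, gb_per_core=16):
    """Re-derive the worked example: 10 TB raw / 10x compression, 25% active
    rows, 40% active columns -> ~100 GB CAD; bufferpool at 100% of CAD,
    held to ~40% of instance memory for a typical workload -> ~256 GB.
    """
    cad_gb = raw_tb * 1024 / compression * active_rows * active_cols
    bufferpool_gb = cad_gb * bufferpool_pct_of_cad
    instance_memory_gb = bufferpool_gb / bufferpool_pct_of_memory
    # round() guards against floating-point noise before taking the ceiling
    cores = math.ceil(round(instance_memory_gb / gb_per_core, 6))
    return round(cad_gb), round(bufferpool_gb), round(instance_memory_gb), cores

print(example_sizing())                                # -> (102, 102, 256, 16)
# High-concurrency variant: bufferpool held to 25% of instance memory
print(example_sizing(bufferpool_pct_of_memory=0.25))   # -> (102, 102, 410, 26)
```

The second call illustrates the deck's guidance for higher-concurrency or higher-complexity workloads: holding the bufferpool to a smaller share of instance memory leaves more working memory for sorts, hash joins and grouping, at the cost of a larger overall configuration.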
SUMMARY

Summary
• DB2 with BLU Acceleration is now supported on an even broader set of platforms: Linux on Power, Linux on z, and Windows
• New generations of hardware match well with the memory- and compute-rich environments that BLU prefers
• Additional field experience has helped us validate and refine our sizing "rules of thumb" and overall methodology
• The general recommendation is for more memory per core
• Workload specifics and specific performance SLAs will continue to require more detailed analysis

DB2 Tech Talk: Sizing BLU Acceleration – Next Steps Roadmap
Step One: Learn more about BLU Acceleration
• www.ibmbluhub.com
• Read the BLU Acceleration compression blog
• Get technical information on sizing BLU
• Read the installation requirements in the DB2 Information Center
• For reference, read the IBM Redbooks publication: Architecting and Deploying BLU Acceleration
Step Two: Call IBM to schedule a demo or learn more
• 1 800 966-9875 (U.S.)
• 1-888-746-7426 (Canada)
• 1800-425-3333 (India)
• Or visit http://www.ibm.com/planetwide/ for contact information worldwide
Step Three: Download the trial
• Download the 90-day free trial of DB2 with BLU Acceleration if you have not done so (be sure to select the warehouse option)
Step Four: Join the community
• Twitter: @trydb2 and @IBM_DB2
• LinkedIn: BLU Acceleration Group

Upcoming Tech Talks
Don't miss these in-depth DB2 feature talks!
• Next DB2 Tech Talk: dashDB – scale your BLU warehouse in the cloud
  • June 25, 2015 at 12:30 PM EST
  • Speaker: Kelly Schlamb, Executive IT Specialist, WW Cloud Data Services Technical Sales, DB2 and pureScale specialist
  • How to register: DB2 Tech Talks web site – http://www.idug.org/p/cm/ld/fid=209
• IDUG DB2 Educational Event in AUS: September 15-18, 2015, Melbourne, Australia – www.idug.org/p/cm/ld/fid=586
• Insight 2015: October 25-29, 2015, Las Vegas, Nevada – http://www-01.ibm.com/software/events/insight/
• IDUG Europe Tech Conference: November 15-20, 2015, Dublin, Ireland – www.idug.org/emea
Dates and topics subject to change and modification.

Find a local user group chapter! http://bit.ly/1z2RYcl

Questions
• Listening live? Submit questions at the top of this webcast.
• Listening in replay and have questions? Email our DB2 Tech Talk coordinator: Ammar Naji ([email protected])

Thanks for attending! Download this presentation on the right-hand side of your window under the "resources" widget.