A next-generation many-core processor with reliability, fault tolerance and adaptive power management features optimized for embedded and high performance computing applications

Simon McIntosh-Smith, VP of Applications, [email protected]
HPEC, September 2008
Copyright © 2008 ClearSpeed Technology Inc. All rights reserved.
www.clearspeed.com

The CSX700 Processor

• Includes dual MTAP cores:
  – 96 GFLOPS peak (32 & 64-bit)
  – 48 GMACS peak (16x16 → 32+64)
  – 10W max power consumption
  – 250MHz clock speed
  – 192 Processing Elements (2x96)
  – 8 spare PEs for resiliency
  – ECC on all internal memories
• On-die temperature sensors
• Active power management
• Dual integrated 64-bit DDR2 memory controllers with ECC
• Integrated PCI Express x16
• CCBR chip-to-chip bridge port
• IBM 90nm process
• 266 million transistors
• Shipping to customers since June 2008

The ClearSpeed Advance™ e710, e720 and CATS-700

• 96 GFLOPS e710 & e720 fit standard 1U & HP blade servers
  – Low power consumption of 25W max, small, light, passively cooled
  – Designed for high reliability (MTBF)
  – All memory is error protected; no moving parts (e.g. fans) are required
• CATS-700 1U system
  – 1.152 TFLOPS 32- and 64-bit floating point
  – 96 GBytes/s memory bandwidth to 24 GB of ECC protected DDR2
  – 300W typical power consumption
• Easy to use Software Development Kit
  – ANSI C compiler, gdb-based debugger, advanced profiler
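The peak figures quoted for the CSX700 and the CATS-700 can be cross-checked with a little arithmetic. The sketch below is only a sanity check, not ClearSpeed's own derivation. It assumes, which the slides do not state, that each processing element completes one multiply-accumulate per clock cycle, counted as 2 floating-point operations. The function names are hypothetical, and the processor count for the CATS-700 is inferred from the quoted totals rather than taken from the slides.

```python
# Sketch: cross-check the peak-throughput figures quoted on the slides.
# ASSUMPTION (not stated on the slides): each PE completes one
# multiply-accumulate per clock, counted as 1 MAC or 2 floating-point ops.
FLOPS_PER_MAC = 2


def peak_gmacs(num_pes: int, clock_mhz: int) -> float:
    """Peak giga-MACs per second, assuming one MAC per PE per cycle."""
    return num_pes * clock_mhz / 1000


def peak_gflops(num_pes: int, clock_mhz: int) -> float:
    """Peak GFLOPS, assuming each MAC counts as two floating-point ops."""
    return FLOPS_PER_MAC * peak_gmacs(num_pes, clock_mhz)


# Figures from the CSX700 slide: 192 PEs at 250MHz.
csx700_gmacs = peak_gmacs(192, 250)
csx700_gflops = peak_gflops(192, 250)

# INFERENCE (not stated on the slides): number of CSX700 processors that
# would add up to the CATS-700 figure of 1.152 TFLOPS (1152 GFLOPS).
cats700_processors = round(1152 / csx700_gflops)

print(f"CSX700 peak: {csx700_gmacs:g} GMACS, {csx700_gflops:g} GFLOPS")
print(f"CATS-700 would need {cats700_processors} processors at that peak")
```

Under that assumption the result agrees with the quoted 48 GMACS and 96 GFLOPS, and the CATS-700 total corresponds to twelve processors' worth of peak throughput.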
CSX700 FFT performance and e710 power consumption

[Figure: CSX700 core power (W) versus core clock speed (50MHz to 250MHz) for sizes 128, 256, 512, 1024 and 2048. Data labels range from 3.6 W to 7.4 W.]

1D FFT performance up to 20 GFLOPS, 2D FFT performance up to 16 GFLOPS
1D convolution performance up to 22 GFLOPS, ~3 GFLOPS/watt on FFTs

1D convolutions per second, by size and core clock speed:

Size    50MHz      100MHz     150MHz       200MHz       250MHz
128     454,215    909,053    1,362,637    1,816,039    2,267,810
256     199,524    398,708      597,996      797,284      995,686
512      87,118    174,218      261,323      348,092      435,204
1024     38,132     76,330      114,468      152,549      190,808
2048     16,455     32,872       49,282       65,734       82,093

CSX700 and beyond

• The CSX700 is much more power efficient than Cell and GPUs for embedded processing.
  – E.g. for single precision complex 1024x1024 2D FFT:
  – Cell (8 SPE):       38 GFLOPS    40W   0.95 GFLOPS/watt
  – S870 (Tesla) GPU:   50 GFLOPS   170W   0.07 GFLOPS/watt
  – x86 core:            3 GFLOPS    25W   0.12 GFLOPS/watt
  – CSX700:             20 GFLOPS     7W   2.86 GFLOPS/watt
• Next generation processor “Carnac” in design now
  – Focusing on 1D and 2D FFT performance
  – Design goal is 100 GFLOPS/watt sustained on 2D FFTs
• ClearSpeed Federal Systems launched to support defense programs
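The efficiency column in the 2D FFT comparison on the "CSX700 and beyond" slide is simply sustained GFLOPS divided by power in watts. The sketch below recomputes it from the quoted GFLOPS and wattage as a check. The row data is copied from the slide; the helper names and the tolerance are hypothetical choices made for illustration.

```python
# Sketch: recompute GFLOPS/watt from the figures on the comparison slide.
# Each row: (name, GFLOPS, watts, GFLOPS/watt as printed on the slide).
ROWS = [
    ("Cell (8 SPE)", 38, 40, 0.95),
    ("S870 (Tesla) GPU", 50, 170, 0.07),
    ("x86 core", 3, 25, 0.12),
    ("CSX700", 20, 7, 2.86),
]


def gflops_per_watt(gflops: float, watts: float) -> float:
    """Efficiency as throughput divided by power draw."""
    return gflops / watts


def mismatches(rows, tol: float = 0.005):
    """Names of rows whose printed ratio differs from the recomputed one."""
    return [
        name
        for name, gflops, watts, quoted in rows
        if abs(round(gflops_per_watt(gflops, watts), 2) - quoted) > tol
    ]


for name, gflops, watts, quoted in ROWS:
    ratio = gflops_per_watt(gflops, watts)
    print(f"{name:18s} {ratio:5.2f} GFLOPS/watt (slide: {quoted:.2f})")

print("Rows that do not match the slide:", mismatches(ROWS))
```

Three rows reproduce the printed ratio. For the S870 row, 50 GFLOPS at 170W works out to roughly 0.29 GFLOPS/watt rather than the printed 0.07; the slide does not indicate which of the three numbers is off, so the figures are left as quoted.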