AFRL Space-Based Radar Workshop Topic Area: "PROCESSING"
A Power-Efficient Embedded High-Performance Computer for Space-Based Radar Signal Processing

Carl Puschak, Lockheed Martin Advanced Technology Laboratories
1 Federal Street, Camden, NJ 08104
609-338-4233, [email protected]

Virginia W. Ross, Air Force Research Laboratory/IFTC
26 Electronic Parkway, Rome, NY 13441
315-330-4384, DSN 587-4384, [email protected]

ABSTRACT - This paper presents the development of a 450-processor, 6-board embedded signal processing system that provides 400 MFLOPS per watt of processing capability. A performance analysis for on-board radar processing applications is presented, together with an overview of the CompactPCI- and Myrinet-based multiprocessor system architecture and details of the mechanical packaging, power distribution, and thermal management issues encountered during the design process.

A Power-Efficient, Embedded, High-Performance Computer for Space-Based Radar Signal Processing
Carl Puschak, Lockheed Martin/ATL, [email protected], 609-338-4233
Virginia Watson Ross, Air Force Research Laboratory/IFTC, [email protected], 315-330-4384

Wafer-Scale Signal Processor 6U Compact PCI Card
[Block diagram. Labels: 64-bit PCI backplane; PCI-to-PCI bridges; PCI buses; 3D MCMs built from MCM2 layers; 1D MCM2s; PMC I/O]
• 150 CPUs, 200 Mflops/CPU
• 30 Gflops/Card
• 512 KByte Memory/CPU
• 75 MBytes/Card
• 50 MHz Programmable Clock
• 180-200 Mflops/Watt
• 168 Watts/Card, 3.3 Volt Supply
• Liquid Cooled Cold Plate
• 3 PCI Mezzanine Card Slots
• ~$50K/Card
CNP 10/15/98-2

Multi-Chip Module Processing Block
[Schematic. Labels: FPASP5 dual-CPU chips (A and B sides); 64K x 36 SRAMs; PCIF chip; PCI bus; IOBUS; CLK_NOC; IOCK_NOC; IOB_CLOCK; JTAG; DONEA, DONEB]
• 2" x 2" Ceramic Substrate
• 5 Dual CPU Processing Elements
• 2 Gflops per Substrate
• 20 Synchronous SRAMs
• 5 MByte Memory
• 33 MHz 64 Bit PCI Bus
• 50 MHz Local CPU IO Bus
• 64 Bit PCI I/F Chip
• 10.7 Watts
• 324 Pin Ceramic Package

Wafer-Scale Dual Processor Chip
• GNU C Compiler
• Assembler
• ISA Simulator
• RTL VHDL Model
• RTEMS Kernel
• Very Long Instruction Word (VLIW) Microprogrammed Architecture
• 0.5 Micron, 50 MHz Clock Goal
• Dual 72 Bit External Memory Banks
• Complete I/O and Internal JTAG Test Interface
• Shared 64 Bit I/O Bus
• On-Chip Microstore RAM for Custom Instructions
• Lockstep Mode
• Two Single Precision Multiplies and ALU Operations Per Clock
• One Double Precision Multiply and ALU Operation Per Clock

3D Multi-Chip Module Processor Stack
[Cutaway drawing of the 3D MCM package. Labels: welded lid structure; separation areas; bus bars; copper planes; Minco leads; solder to HDI I/O leads; solder or Ag epoxy; 3D MCM glue]
• 40-Processor Four Stack
• 8 Gflops
• 43 Watts at 66% Utilization
• Full JTAG Built-In Test
• Size 2.4"L x 2.4"W x 0.4"H
• 324 Pin Leaded Package

SPace Electronically Agile Radar (SPEAR) GMTI STAP Parameters
• Frequency         10 GHz
• PRF               2015 Hz
• Tx Pulse Width    124 µs
• A/D Rate          10 MHz
• Rx Channels       18
• CPI Duration      31.7618 ms
• CPIs/sec          31.4844
• Peak Power        8000 W
• Avg Power         2000 W
• Min Detect Vel    1.7 m/s
• PFA               1.00e-06
• Search            750 km²/sec

SPEAR GMTI STAP Algorithm Pipeline

Pipe Stage  Operation      MFLOPs/CPI  MClks/CPI  Mem (MB)
1A          receive                 0          2        18
1B          int->cmplx              9         13        36
1C          copy stagger            0         17       104
1D          real wind              13         13       104
1E          doppler FFT           386        166       104
1F          send to QR              0         10       124
2A          QR Factor            1044        310       130
2B          Back Sub                9          9       120
2C          Send Weights            0          0       120
3A          Apply Wts              77         26       104
3B          Partial Sum            14         22        12
3C          Send Range              0          2        24
4A          Final Sum               3          4         2
4B          Pulse Cmprs            77         22         4
4C          CFAR                    9         10         4
Total                            1641        626       130
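The totals row of the pipeline table can be cross-checked with a short script. This is only a sketch: the rows are transcribed from the table above, the names `PIPELINE`, `totals`, and `per_stage` are hypothetical, and reading the "Mem (MB)" total as the largest single entry (rather than a sum) is an assumption that happens to reproduce the 130 MB figure.

```python
# Rows transcribed from the pipeline table:
# (stage id, operation, MFLOPs/CPI, MClks/CPI, memory in MB)
PIPELINE = [
    ("1A", "receive", 0, 2, 18),
    ("1B", "int->cmplx", 9, 13, 36),
    ("1C", "copy stagger", 0, 17, 104),
    ("1D", "real wind", 13, 13, 104),
    ("1E", "doppler FFT", 386, 166, 104),
    ("1F", "send to QR", 0, 10, 124),
    ("2A", "QR Factor", 1044, 310, 130),
    ("2B", "Back Sub", 9, 9, 120),
    ("2C", "Send Weights", 0, 0, 120),
    ("3A", "Apply Wts", 77, 26, 104),
    ("3B", "Partial Sum", 14, 22, 12),
    ("3C", "Send Range", 0, 2, 24),
    ("4A", "Final Sum", 3, 4, 2),
    ("4B", "Pulse Cmprs", 77, 22, 4),
    ("4C", "CFAR", 9, 10, 4),
]


def totals(rows):
    """Return (sum of MFLOPs, sum of MClks, largest memory entry).

    Taking the maximum for memory is an assumption, not something
    the slide states.
    """
    mflops = sum(r[2] for r in rows)
    mclks = sum(r[3] for r in rows)
    peak_mem = max(r[4] for r in rows)
    return mflops, mclks, peak_mem


def per_stage(rows):
    """Sum MFLOPs and MClks for each pipe stage (first character of the id)."""
    out = {}
    for stage, _op, mflops, mclks, _mem in rows:
        key = stage[0]
        f, c = out.get(key, (0, 0))
        out[key] = (f + mflops, c + mclks)
    return out


if __name__ == "__main__":
    print("totals:", totals(PIPELINE))
    for stage, (f, c) in sorted(per_stage(PIPELINE).items()):
        print(f"stage {stage}: {f} MFLOPs/CPI, {c} MClks/CPI")
```

Grouping by stage shows where the work concentrates: stage 2 (QR factorization and back substitution) accounts for 1053 of the 1641 MFLOPs per CPI.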

SPEAR GMTI STAP Algorithm Flow

Block                             Work per CPI   Memory       Data size
Receive Data (18 Chan)            "0" FLOPs      18 MBytes    66 x 3722 x 18
Int to Cmplx                      9 MFLOPs       36 MBytes    66 x 3722 x 18
Copy 3 Stagger                    "0" MFLOPs     104 MBytes   64 x 3722 x 3 x 18
Real Window                       26 MFLOPs      104 MBytes   64 x 3722 x 3 x 18
Doppler FFT                       386 MFLOPs     104 MBytes   64 x 3722 x 3 x 18
Send QR (Decimate, Move)                         125 MBytes   135 x 54 x 6 x 64
QR Factorize                      1.044 GFLOPs   130 MBytes   54 x 54 x 6 x 64
Backsub                           9 MFLOPs       120 MBytes   54 x 64 x 6
Send Weights
Apply Weights                     78 MFLOPs      120 MBytes   64 x 3722 x 3 x 18
Partial Sum                       14 MFLOPs      12 MBytes    64 x 3722 x 5 sums x 6
Transpose, Send to Range Proc     "0" MFLOPs     12 MBytes    3722 x 64 x 6

SPEAR GMTI STAP Algorithm Flow (cont.)

Block                             Work per CPI   Memory           Data size
Complete Sum (from sum)           3 MFLOPs       4 MBytes total   3722 x 64 x 5 sums
Zero Fill (reserve 8192 range)    "0" MFLOPs     4 MBytes         8192 x 64
Range FFT                         37 MFLOPs      4 MBytes         64 x 8192
Multiply                          3 MFLOPs       4 MBytes         64 x 8192
Inverse FFT                       37 MFLOPs      4 MBytes         64 x 8192
CFAR                              9 MFLOPs       4 MBytes

SPEAR WSSPT Architecture
[System diagram. Labels: 6U ruggedized card cage; power supply; from radar A-to-Ds; 3 radar channels; 6 Myrinet; Board 1 (front side, back side); Board 2 (front side); Myrinet switch; Board 3; Board 4]

Input Processing Partitioning
[Data-cube diagram. Labels: 64 pulses; 3722 range; 18 channels total, split into six groups of 3 channels; front side Myrinet; back side Myrinet]

Pipe Stage 1 and 3 Processor Allocation Driven by Memory Requirements
[Diagram. Labels: 3722 range intervals, 3 channels; 3 MB cube section; 3 MBytes + 3 MBytes double buffer (6 MByte double buffer); 4.6 MB in each of Layers 0 through 4; Total 2 3 19 X 64]
• Use 5 MCM layers on each side

Subdivision of One Layer (MCM 0, Layer 0)
• 120 KB receive double buffer (each proc)
• 3 channels subdivided in contiguous range segments
• 4.6 MB

Card Algorithm Mapping
For the 3 cards with 3 Myrinet interfaces each [4th card is dedicated to stage 2 (QR processing)]
Pipeline:
• Stage 1 - Doppler
• Stage 2 - QR
• Stage 3 - Weight & Sum
• Stage 4 - Range & CFAR
[Card layout diagram. Labels: front and back sides; 4-stack MCMs (Layer 0, Layers 1-3) and 1-D MCMs, each tagged with the pipeline stage (1 through 4) it runs; Myrinet interfaces]

SPEAR Processing Summary
• 1641 MFLOPs/CPI and 626 MCLKs/CPI
• FPASPs running at 65% of peak
• Myrinet running at 56% of peak
• Total of 20 GCLK/sec required for 31.5 CPI/sec
  - 52.5 GFLOPs/sec
• 400 procs required (minimum)
  - 200 FPASP5s, 40 MCM layers, 3 boards
• Use 4 boards with 10 Myrinet cards
  - Gives us 570 procs, 285 MBytes total (plenty)

References and Acknowledgment
• References
  [1] "SPace Electronically Agile Radar (SPEAR)," John W. Garhham; PL/VTMS (RDL), SPEAR Radar Parameters
  [2] "HDI Design of a 100 MFlops/Watt Floating Point DSP," Dr. R. Linderman, Dr. M. Linderman, R. Kohler, Maj. J. Comtois, PhD; Air Force Research Laboratory; J. Sabatini, GE Corporate Research and Development Laboratory
• Acknowledgment
  - The authors thank Jon Russo, Lockheed Martin Advanced Technology Laboratories, for mapping the SPEAR GMTI algorithm onto the Wafer-Scale Processor
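
The sizing arithmetic on the SPEAR Processing Summary slide can be retraced with a few lines of Python. This is a rough sketch, not the authors' method: the constants are the figures quoted on the slides, the function names are hypothetical, and it assumes clock cycles divide evenly across processors with no load imbalance or communication stalls. Under that assumption the unrounded results come out slightly below the slide's rounded 20 GCLK/sec and 400 processors.

```python
import math

# Figures quoted on the slides.
MFLOPS_PER_CPI = 1641      # pipeline total, MFLOPs per CPI
MCLKS_PER_CPI = 626        # pipeline total, millions of clocks per CPI
CPIS_PER_SEC = 31.4844     # from the radar parameter slide
CLOCK_HZ = 50e6            # FPASP5 clock goal
CPUS_PER_CHIP = 2          # FPASP5 is a dual-processor chip
CHIPS_PER_LAYER = 5        # five dual-CPU elements per MCM substrate
CPUS_PER_CARD = 150
MBYTES_PER_CPU = 0.5       # 512 KBytes per CPU


def clock_demand_hz(mclks_per_cpi, cpis_per_sec):
    """Aggregate clock cycles per second needed to keep up with the radar."""
    return mclks_per_cpi * 1e6 * cpis_per_sec


def min_processors(mclks_per_cpi, cpis_per_sec, clock_hz):
    """Smallest processor count if cycles split perfectly (an assumption)."""
    return math.ceil(clock_demand_hz(mclks_per_cpi, cpis_per_sec) / clock_hz)


def hardware_for(procs):
    """Chips, MCM layers, and boards needed to host `procs` processors."""
    chips = math.ceil(procs / CPUS_PER_CHIP)
    layers = math.ceil(chips / CHIPS_PER_LAYER)
    boards = math.ceil(procs / CPUS_PER_CARD)
    return chips, layers, boards


def total_memory_mbytes(procs):
    """Local memory available across `procs` processors."""
    return procs * MBYTES_PER_CPU


if __name__ == "__main__":
    demand = clock_demand_hz(MCLKS_PER_CPI, CPIS_PER_SEC)
    print(f"clock demand: {demand / 1e9:.2f} GCLK/sec")
    print("min processors:", min_processors(MCLKS_PER_CPI, CPIS_PER_SEC, CLOCK_HZ))
    print("hardware for 400 procs:", hardware_for(400))
    print("memory with 570 procs:", total_memory_mbytes(570), "MBytes")
```

Passing the slide's rounded 400 processors to `hardware_for` reproduces the 200 FPASP5s, 40 MCM layers, and 3 boards listed in the summary, and 570 processors at 512 KBytes each give the quoted 285 MBytes.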