AFRL Space-Based Radar Workshop
Topic Area: "PROCESSING"
A Power Efficient Embedded High Performance Computer for Space-Based Radar
Signal Processing
Carl Puschak, Lockheed Martin Advanced Technology Laboratories
1 Federal Street, Camden NJ 08104
609-338-4233
[email protected]
Virginia W. Ross, Air Force Research Laboratory/IFTC
26 Electronic Parkway, Rome, NY 13441
315-330-4384, DSN 587-4384
[email protected]
ABSTRACT - This paper presents the development of a 450-processor, 6-board
embedded signal processing system that provides 400 MFLOPS per watt of processing
capability. A performance analysis for on-board radar processing applications is
presented. An overview of the CompactPCI- and Myrinet-based multi-processor system
architecture is given, with details of the mechanical packaging, power distribution,
and thermal management design issues encountered during the design process.
A Power-Efficient, Embedded,
High-Performance Computer for
Space-Based Radar Signal Processing
Carl Puschak
Lockheed Martin/ATL
[email protected]
609-338-4233
Virginia Watson Ross
Air Force Research Laboratory/IFTC
[email protected]
315-330-4384
Wafer-Scale Signal Processor
[Figure: 6U CompactPCI card block diagram. Labels: 64-bit PCI backplane, PCI-to-PCI bridges, PCI buses, 3D MCM stacks built from MCM2 modules, 1D MCM2 modules, PMC I/O.]
• 150 CPUs, 200 Mflops/CPU
• 30 Gflops/Card
• 512 KByte Memory/CPU
• 75 MBytes/Card
• 50 MHz Programmable Clock
• 180-200 Mflops/Watt
• 168 Watts/Card, 3.3 Volt Supply
• Liquid Cooled Cold Plate
• 3 PCI Mezzanine Card Slots
• ~$50K/Card
CNP 10/15/98-2
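The card-level figures above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the numbers quoted on the slide, and the function and key names are made up for this example. Dividing peak throughput by card power gives roughly 179 Mflops/Watt, which sits at the low end of the quoted 180-200 Mflops/Watt range.

```python
# Figures quoted on the card slide.
CPUS_PER_CARD = 150
MFLOPS_PER_CPU = 200
KBYTES_PER_CPU = 512
WATTS_PER_CARD = 168
COST_PER_CARD_USD = 50_000  # the slide says "~$50K/Card", so this is approximate


def card_budget():
    """Roll the per-CPU figures up to card-level totals."""
    peak_gflops = CPUS_PER_CARD * MFLOPS_PER_CPU / 1000
    memory_mbytes = CPUS_PER_CARD * KBYTES_PER_CPU / 1024
    mflops_per_watt = peak_gflops * 1000 / WATTS_PER_CARD
    usd_per_gflops = COST_PER_CARD_USD / peak_gflops
    return {
        "peak_gflops": peak_gflops,
        "memory_mbytes": memory_mbytes,
        "mflops_per_watt": mflops_per_watt,
        "usd_per_gflops": usd_per_gflops,
    }


if __name__ == "__main__":
    for name, value in card_budget().items():
        print(f"{name}: {value:.1f}")
```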
Multi-Chip Module Processing Block
[Figure: MCM block diagram. Labels: FPASP5 (CPU A and CPU B), SRAM 64K x 36 (x20), IOBUS, PCIF, PCI BUS, CLK_NOC, IOCK_NOC, IOB_CLOCK, JTAG, DONEA, DONEB.]
• 2" x 2" Ceramic Substrate
• 5 Dual CPU Processing Elements
• 2 Gflops per Substrate
• 20 Synchronous SRAMs
• 5 MByte Memory
• 33 MHz 64 Bit PCI Bus
• 50 MHz Local CPU IO Bus
• 64 Bit PCI I/F Chip
• 10.7 Watts
• 324 Pin Ceramic Package
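The memory figures on this slide are consistent with the SRAM part count, as the sketch below shows. It assumes that each 36-bit SRAM word carries 32 data bits, with the remaining 4 bits used for parity or tags; the slide does not say this. All names are illustrative.

```python
SRAM_WORDS = 64 * 1024      # "64K x 36" parts
SRAM_WIDTH_BITS = 36
DATA_BITS_PER_WORD = 32     # assumption: 4 of the 36 bits are parity/tag, not data
SRAMS_PER_MCM = 20
CPUS_PER_MCM = 5 * 2        # 5 dual-CPU processing elements


def mcm_memory():
    """Derive per-SRAM, per-CPU, and per-MCM data capacity."""
    bytes_per_sram = SRAM_WORDS * DATA_BITS_PER_WORD // 8
    total_bytes = bytes_per_sram * SRAMS_PER_MCM
    srams_per_cpu = SRAMS_PER_MCM // CPUS_PER_MCM
    return {
        "kbytes_per_sram": bytes_per_sram // 1024,
        "mbytes_per_mcm": total_bytes / 2**20,
        "kbytes_per_cpu": total_bytes // CPUS_PER_MCM // 1024,
        # Two 36-bit parts side by side would form one 72-bit bank per CPU.
        "bank_width_bits": srams_per_cpu * SRAM_WIDTH_BITS,
    }


if __name__ == "__main__":
    print(mcm_memory())
```

Under that assumption the result is 256 KBytes per SRAM, 5 MBytes per MCM, and 512 KBytes per CPU, matching the per-CPU memory quoted on the card slide.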
Wafer-Scale Dual Processor Chip
• GNU C Compiler
• Assembler
• ISA Simulator
• RTL VHDL Model
• RTEMS Kernel
• Very Long Instruction Word (VLIW) Microprogrammed Architecture
• 0.5 Micron, 50 MHz Clock Goal
• Dual 72 Bit External Memory Banks
• Complete I/O and Internal JTAG Test Interface
• Shared 64 Bit I/O Bus
• On-Chip Microstore RAM for Custom Instructions
• Lockstep Mode
• Two Single Precision Multiplies and ALU Operations Per Clock
• One Double Precision Multiply and ALU Operation Per Clock
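The per-clock operation counts above determine the peak floating-point rate of one CPU. The sketch below assumes the bullet "Two Single Precision Multiplies and ALU Operations Per Clock" means two multiplies plus two ALU operations per CPU per clock; that reading is an interpretation, chosen because it reproduces the 200 Mflops/CPU figure quoted on the card slide.

```python
CLOCK_MHZ = 50  # the stated clock goal

# Assumption: each CPU issues two multiplies AND two ALU operations per clock
# in single precision, and one of each in double precision.
SP_MULTIPLIES_PER_CLOCK = 2
SP_ALU_OPS_PER_CLOCK = 2
DP_MULTIPLIES_PER_CLOCK = 1
DP_ALU_OPS_PER_CLOCK = 1


def peak_mflops(multiplies_per_clock, alu_ops_per_clock, clock_mhz=CLOCK_MHZ):
    """Peak rate in Mflops for one CPU: operations per clock times clock rate."""
    return (multiplies_per_clock + alu_ops_per_clock) * clock_mhz


if __name__ == "__main__":
    sp = peak_mflops(SP_MULTIPLIES_PER_CLOCK, SP_ALU_OPS_PER_CLOCK)
    dp = peak_mflops(DP_MULTIPLIES_PER_CLOCK, DP_ALU_OPS_PER_CLOCK)
    print(f"single precision peak: {sp} Mflops per CPU")
    print(f"double precision peak: {dp} Mflops per CPU")
```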
3D Multi-Chip Module Processor Stack
[Figure: partial cutaway view of the 3D MCM package. Labels: welded lid structure, separation areas, bus bars, copper planes, Minco leads, I/O leads soldered to HDI, solder or Ag epoxy, 3D MCM glue.]
• 40-Processor Four Stack
• 8 Gflops
• 43 Watts at 66% Utilization
• Full JTAG Built-In Test
• Size 2.4"L x 2.4"W x 0.4"H
• 324 Pin Leaded Package
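The stack figures follow from stacking four of the MCM substrates described earlier, and the same roll-up reaches the 150-CPU card. The sketch below is an illustration under two assumptions: the count of three 3D stacks and three single ("1D") MCMs per card is read off the labels in the card diagram, and the 10.7 Watt MCM figure is taken to apply under the same conditions as the stack figure. Neither is stated outright in the slides.

```python
# Per-MCM figures from the Multi-Chip Module slide.
CPUS_PER_MCM = 10
GFLOPS_PER_MCM = 2
WATTS_PER_MCM = 10.7

LAYERS_PER_STACK = 4       # "40-Processor Four Stack"
STACKS_PER_CARD = 3        # assumption: counted from "3D MCM" labels in the card diagram
SINGLE_MCMS_PER_CARD = 3   # assumption: counted from "1D MCM2" labels in the card diagram


def stack_totals():
    """Totals for one four-layer 3D MCM stack."""
    return {
        "cpus": CPUS_PER_MCM * LAYERS_PER_STACK,
        "gflops": GFLOPS_PER_MCM * LAYERS_PER_STACK,
        "watts": WATTS_PER_MCM * LAYERS_PER_STACK,
    }


def card_totals():
    """Totals for one card built from stacks plus single MCMs."""
    mcms = STACKS_PER_CARD * LAYERS_PER_STACK + SINGLE_MCMS_PER_CARD
    return {
        "mcms": mcms,
        "cpus": mcms * CPUS_PER_MCM,
        "gflops": mcms * GFLOPS_PER_MCM,
        "mcm_watts": mcms * WATTS_PER_MCM,
    }


if __name__ == "__main__":
    print("stack:", stack_totals())
    print("card: ", card_totals())
```

Four layers give 40 processors, 8 Gflops, and about 42.8 Watts, close to the 43 Watts quoted. Fifteen MCMs give the 150 CPUs and 30 Gflops of the card slide.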
SPace Electronically Agile Radar (SPEAR) GMTI STAP Parameters
• Frequency: 10 GHz
• PRF: 2015 Hz
• Tx Pulse Width: 124 µs
• A/D Rate: 10 MHz
• Rx Channels: 18
• CPI Duration: 31.7618 ms
• CPIs/sec: 31.4844
• Peak Power: 8000 W
• Avg Power: 2000 W
• Min Detect Vel: 1.7 m/s
• PFA: 1.00e-06
• Search: 750 km²/sec
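Several of these parameters are related by simple timing arithmetic. The sketch below assumes 64 pulses per CPI (the figure shown on the later partitioning slide) and assumes the receiver samples only in the interval between transmit pulses. The second assumption is a guess, adopted because it reproduces the 3722 range samples used throughout the later slides.

```python
PRF_HZ = 2015.0
TX_PULSE_WIDTH_S = 124e-6
AD_RATE_HZ = 10e6
PEAK_POWER_W = 8000.0
PULSES_PER_CPI = 64  # assumption: taken from the "64 Pulse" label on a later slide


def derived_timing():
    """Derive CPI timing, average power, and range sample count from the table."""
    pri_s = 1.0 / PRF_HZ                      # pulse repetition interval
    cpi_ms = PULSES_PER_CPI / PRF_HZ * 1e3    # coherent processing interval
    cpis_per_sec = PRF_HZ / PULSES_PER_CPI
    duty_cycle = TX_PULSE_WIDTH_S * PRF_HZ
    avg_power_w = PEAK_POWER_W * duty_cycle
    # Assumption: the A/D runs only while the transmitter is off.
    range_samples = int((pri_s - TX_PULSE_WIDTH_S) * AD_RATE_HZ)
    return {
        "cpi_ms": cpi_ms,
        "cpis_per_sec": cpis_per_sec,
        "duty_cycle": duty_cycle,
        "avg_power_w": avg_power_w,
        "range_samples": range_samples,
    }


if __name__ == "__main__":
    for name, value in derived_timing().items():
        print(f"{name}: {value}")
```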
SPEAR GMTI STAP Algorithm Pipeline
Pipe Stage   Operation      MFLOPs/CPI   MClks/CPI   Mem (MB)
1A           receive                 0           2         18
1B           int->cmplx              9          13         36
1C           copy stagger            0          17        104
1D           real wind              13          13        104
1E           doppler FFT           386         166        104
1F           send to QR              0          10        124
2A           QR Factor            1044         310        130
2B           Back Sub                9           9        120
2C           Send Weights            0           0        120
3A           Apply Wts              77          26        104
3B           Partial Sum            14          22         12
3C           Send Range              0           2         24
4A           Final Sum               3           4          2
4B           Pulse Cmprs            77          22          4
4C           CFAR                    9          10          4
Total                             1641         626        130
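The Total row can be reproduced from the per-stage rows. In the sketch below the MFLOPs and MClks totals are column sums, while the memory total matches the largest single entry rather than a sum. Reading it as a peak working-set figure is an interpretation, not something the slide states.

```python
# (stage, operation, MFLOPs/CPI, MClks/CPI, memory in MB), copied from the table.
PIPELINE = [
    ("1A", "receive", 0, 2, 18),
    ("1B", "int->cmplx", 9, 13, 36),
    ("1C", "copy stagger", 0, 17, 104),
    ("1D", "real wind", 13, 13, 104),
    ("1E", "doppler FFT", 386, 166, 104),
    ("1F", "send to QR", 0, 10, 124),
    ("2A", "QR Factor", 1044, 310, 130),
    ("2B", "Back Sub", 9, 9, 120),
    ("2C", "Send Weights", 0, 0, 120),
    ("3A", "Apply Wts", 77, 26, 104),
    ("3B", "Partial Sum", 14, 22, 12),
    ("3C", "Send Range", 0, 2, 24),
    ("4A", "Final Sum", 3, 4, 2),
    ("4B", "Pulse Cmprs", 77, 22, 4),
    ("4C", "CFAR", 9, 10, 4),
]


def totals(rows=PIPELINE):
    """Sum compute and clock columns; take the maximum of the memory column."""
    mflops = sum(r[2] for r in rows)
    mclks = sum(r[3] for r in rows)
    peak_mem = max(r[4] for r in rows)
    return mflops, mclks, peak_mem


def per_stage(rows=PIPELINE):
    """Aggregate MFLOPs and MClks by pipe stage number (first character of the label)."""
    out = {}
    for stage, _, mflops, mclks, _ in rows:
        key = stage[0]
        f, c = out.get(key, (0, 0))
        out[key] = (f + mflops, c + mclks)
    return out


if __name__ == "__main__":
    print("totals:", totals())
    print("per stage:", per_stage())
```

Grouping by stage shows that Stage 2 (QR) accounts for about two thirds of the floating-point work, which fits with a full card being dedicated to it in the card mapping.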
SPEAR GMTI STAP Algorithm Flow
[Figure: data-flow diagram. Box labels with per-CPI compute, memory, and data dimensions:]
• 18 Chan Receive Data: "0" FLOPs/CPI, 18 MBytes total, 66 x 3722 x 18
• Int to Cmplx: 9 MFLOPs, 36 MBytes, 66 x 3722 x 18
• Copy 3 Stagger: "0" MFLOPs, 104 MBytes, 64 x 3722 x 3 x 18
• Real Window: 26 MFLOPs, 104 MBytes, 64 x 3722 x 3 x 18
• Doppler FFT: 386 MFLOPs, 104 MBytes, 64 x 3722 x 3 x 18
• Send QR (Decimate, Move): 125 MBytes, 135 x 54 x 6 x 64
• QR Factorize: 1.044 GFLOPs, 130 MBytes, 54 x 54 x 6 x 64
• Backsub: 9 MFLOPs, 120 MBytes, 54 x 64 x 6
• Send Weights
• Apply Weights: 78 MFLOPs, 120 MBytes, 64 x 3722 x 3 x 18
• Partial Sum: 14 MFLOPs, 12 MBytes, 64 x 3722 x 5 sums x 6
• Send to Range Proc (Transpose): "0" MFLOPs, 12 MBytes, 3722 x 64 x 6
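The two largest compute figures in this flow can be approximated from the stated data dimensions using textbook operation counts. This is a sketch under assumptions, not the authors' accounting. It assumes 5·N·log2(N) real flops per complex FFT, a Householder-style QR cost of 2mn² − 2n³/3 real operations scaled by 4 for complex data, and that "135 X 54 X 6 X 64" means 6 x 64 matrices of 135 rows by 54 columns.

```python
import math


def complex_fft_flops(n):
    """Common estimate for a length-n complex FFT: 5 * n * log2(n) real flops."""
    return 5 * n * math.log2(n)


def complex_qr_flops(rows, cols):
    """Householder QR estimate (2mn^2 - 2n^3/3), scaled by 4 for complex arithmetic."""
    real_ops = 2 * rows * cols**2 - 2 * cols**3 / 3
    return 4 * real_ops


def doppler_fft_mflops(pulses=64, ranges=3722, staggers=3, channels=18):
    """One FFT across the pulse dimension per range gate, stagger, and channel."""
    return complex_fft_flops(pulses) * ranges * staggers * channels / 1e6


def qr_mflops(rows=135, cols=54, range_segments=6, doppler_bins=64):
    """Assumption: one 135 x 54 factorization per range segment and Doppler bin."""
    return complex_qr_flops(rows, cols) * range_segments * doppler_bins / 1e6


if __name__ == "__main__":
    print(f"Doppler FFT: {doppler_fft_mflops():.1f} MFLOPs per CPI")
    print(f"QR factorize: {qr_mflops():.1f} MFLOPs per CPI")
```

Under these assumptions the estimates come out near 386 MFLOPs and 1048 MFLOPs, close to the 386 MFLOPs and 1.044 GFLOPs shown in the flow.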
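SPEAR GMTI STAP Algorithm Flow (cont.)
[Figure: data-flow diagram, continued from the partial sums. Box labels with per-CPI compute, memory, and data dimensions:]
• Complete Sum (from sum): 3 MFLOPs, 4 MBytes total, 3722 x 64 x 5 sums
• Zero Fill (reserve 8192 range): "0" MFLOPs, 4 MBytes, 8192 x 64
• Range FFT: 37 MFLOPs, 4 MBytes, 64 x 8192
• Multiply: 3 MFLOPs, 4 MBytes, 64 x 8192
• Inverse FFT: 37 MFLOPs, 4 MBytes, 64 x 8192
• CFAR: 9 MFLOPs, 4 MBytes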
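The zero fill to 8192 range samples is what FFT-based pulse compression would call for. The sketch below is one plausible explanation, not a statement of the authors' reasoning. It assumes the transmit pulse is sampled at the A/D rate, that the transform length must cover a linear convolution of the range line with the pulse replica, and that each complex sample occupies 8 bytes.

```python
RANGE_SAMPLES = 3722
TX_PULSE_WIDTH_US = 124
AD_RATE_MHZ = 10
DOPPLER_BINS = 64
BYTES_PER_COMPLEX = 8  # assumption: two 32-bit floats per sample


def next_power_of_two(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def pulse_compression_sizing():
    """Pick an FFT length for fast convolution and size the resulting buffer."""
    replica_samples = TX_PULSE_WIDTH_US * AD_RATE_MHZ        # samples in the pulse
    linear_conv_len = RANGE_SAMPLES + replica_samples - 1    # length with no wraparound
    fft_len = next_power_of_two(linear_conv_len)
    buffer_mbytes = DOPPLER_BINS * fft_len * BYTES_PER_COMPLEX / 2**20
    return {
        "replica_samples": replica_samples,
        "linear_conv_len": linear_conv_len,
        "fft_len": fft_len,
        "buffer_mbytes": buffer_mbytes,
    }


if __name__ == "__main__":
    print(pulse_compression_sizing())
```

With these inputs the transform length comes out at 8192 and the 64 x 8192 buffer at 4 MBytes, in line with the figures on the slide.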
SPEAR WSSPT Architecture
[Figure: 6U ruggedized card cage with power supply, holding Boards 1 through 4. Data from the radar A-to-Ds arrives on 6 Myrinet links of 3 radar channels each, attached to the front and back sides of the boards, together with a Myrinet switch.]
Input Processing Partitioning
[Figure: input data cube of 64 pulses x 3722 range x 18 channels total, split into six 3-channel slices, each delivered over a front-side or back-side Myrinet link (the first is labeled "Front side Myrinet 1").]
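The per-link input load implied by this partitioning can be estimated as follows. The sketch rests on two assumptions that are not in the slides: each raw sample occupies 4 bytes (for example 16-bit I and Q), and a Myrinet link of that period peaks at 160 MBytes/sec (1.28 Gbit/s). With those values the estimate happens to land near the 56% Myrinet utilization quoted in the processing summary.

```python
RANGE_SAMPLES = 3722
PRF_HZ = 2015
TOTAL_CHANNELS = 18
CHANNELS_PER_LINK = 3
BYTES_PER_SAMPLE = 4                 # assumption: 16-bit I plus 16-bit Q
MYRINET_PEAK_MBYTES_PER_S = 160.0    # assumption: 1.28 Gbit/s link rate


def link_load():
    """Estimate the number of input links and the sustained load on each."""
    links = TOTAL_CHANNELS // CHANNELS_PER_LINK
    samples_per_sec = RANGE_SAMPLES * PRF_HZ * CHANNELS_PER_LINK
    mbytes_per_sec = samples_per_sec * BYTES_PER_SAMPLE / 1e6
    utilization = mbytes_per_sec / MYRINET_PEAK_MBYTES_PER_S
    return {
        "links": links,
        "mbytes_per_sec": mbytes_per_sec,
        "utilization": utilization,
    }


if __name__ == "__main__":
    load = link_load()
    print(f"{load['links']} links, {load['mbytes_per_sec']:.1f} MBytes/sec each, "
          f"{load['utilization']:.0%} of assumed peak")
```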
Pipe Stage 1 and 3
Processor Allocation Driven by Memory Requirements
[Figure: a 3 MB data cube section (3722 range intervals x 3 channels) with a 6 MByte double buffer (3 MBytes + 3 MBytes), spread across MCM Layers 0 through 4 at 4.6 MB per layer. Rotated labels are only partly legible and appear to read "Total 23" and "19 X 64".]
Use 5 MCM layers on each side
Subdivision of One Layer (MCM 0, Layer 0)
[Figure: one 4.6 MB layer holding 3 channels, subdivided in contiguous range segments, with a 120 KB receive double buffer on each processor.]
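The buffer sizes on the last two slides fit together roughly as sketched below. The assumptions are that the receive buffer holds 66 pulses (the 66 x 3722 x 18 input dimension from the algorithm flow), that each raw sample occupies 4 bytes, and that each MCM layer contributes 10 processors. The computed per-processor share comes out slightly under the slide's 120 KB, which may simply reflect rounding on the slide.

```python
RANGE_INTERVALS = 3722
CHANNELS_PER_SIDE = 3
PULSES_BUFFERED = 66       # assumption: from the 66 x 3722 x 18 input dimension
BYTES_PER_SAMPLE = 4       # assumption: 16-bit I plus 16-bit Q
MCM_LAYERS_PER_SIDE = 5
CPUS_PER_LAYER = 10        # 5 dual-CPU processing elements per MCM


def receive_buffer_plan():
    """Size the cube section, its double buffer, and each processor's share."""
    section_bytes = (RANGE_INTERVALS * CHANNELS_PER_SIDE
                     * PULSES_BUFFERED * BYTES_PER_SAMPLE)
    double_buffer_bytes = 2 * section_bytes
    processors = MCM_LAYERS_PER_SIDE * CPUS_PER_LAYER
    return {
        "section_mbytes": section_bytes / 1e6,
        "double_buffer_mbytes": double_buffer_bytes / 1e6,
        "processors": processors,
        "kbytes_per_processor": double_buffer_bytes / processors / 1e3,
    }


if __name__ == "__main__":
    for name, value in receive_buffer_plan().items():
        print(f"{name}: {value:.2f}")
```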
Card Algorithm Mapping
Pipeline:
Stage 1 - Doppler
Stage 2 - QR
Stage 3 - Weight & Sum
Stage 4 - Range & CFAR
For the 3 cards with 3 Myrinet interfaces each [4th card is dedicated to Stage 2 (QR processing)]
[Figure: front and back views of a card. Labels: 4-Stack (Layer 0, Layers 1-3), 1-D, and Myrinet, with Stages 1 through 4 assigned to the MCM layers.]
SPEAR Processing Summary
• 1641 MFLOPs/CPI and 626 MCLKs/CPI
• FPASPs running at 65% of peak
• Myrinet running at 56% of peak
• Total of 20 GCLK/sec req. for 31.5 CPI/sec
  - 52.5 GFLOPs/sec
• 400 Procs required (minimum)
  - 200 FPASP5s, 40 MCM Layers, 3 Boards
• Use 4 Boards with 10 Myrinet cards
  - Gives us 570 procs, 285 MBytes total (plenty)
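The sizing chain in this summary can be retraced with the sketch below. It assumes a peak of 4 flops per clock per processor (two multiplies plus two ALU operations, which is an interpretation of the chip slide) and a 50 MHz clock. The exact clock demand comes out a little under 20 GCLK/sec, so the 400-processor minimum looks like a rounded-up figure; the code takes 400 as given when deriving chip, layer, and board counts.

```python
import math

MFLOPS_PER_CPI = 1641
MCLKS_PER_CPI = 626
CPIS_PER_SEC = 31.5
CLOCK_MHZ = 50
PEAK_FLOPS_PER_CLOCK = 4     # assumption: 2 multiplies + 2 ALU operations per clock
SLIDE_MIN_PROCESSORS = 400   # the rounded minimum quoted on the slide

CPUS_PER_FPASP5 = 2
CPUS_PER_MCM_LAYER = 10
CPUS_PER_BOARD = 150


def processing_summary():
    """Retrace utilization, clock demand, and hardware counts from the summary."""
    utilization = MFLOPS_PER_CPI / (MCLKS_PER_CPI * PEAK_FLOPS_PER_CLOCK)
    gclk_per_sec = MCLKS_PER_CPI * CPIS_PER_SEC / 1000
    exact_processors = gclk_per_sec * 1000 / CLOCK_MHZ
    return {
        "utilization": utilization,
        "gclk_per_sec": gclk_per_sec,
        "exact_processors": exact_processors,
        "fpasp5_chips": SLIDE_MIN_PROCESSORS // CPUS_PER_FPASP5,
        "mcm_layers": SLIDE_MIN_PROCESSORS // CPUS_PER_MCM_LAYER,
        "boards": math.ceil(SLIDE_MIN_PROCESSORS / CPUS_PER_BOARD),
    }


if __name__ == "__main__":
    for name, value in processing_summary().items():
        print(f"{name}: {value}")
```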
References and Acknowledgment
• References
[1] "SPace Electronically Agile Radar (SPEAR)," John W. Garhham;
PL/VTMS (RDL), SPEAR Radar Parameters
[2] "HDI Design of a 100 MFlops/Watt Floating Point DSP," Dr. R. Linderman,
Dr. M. Linderman, R. Kohler, Maj. J. Comtois, PhD; Air Force Research
Laboratory; J. Sabatini, GE Corporate Research and Development
Laboratory
• Acknowledgment
  - The authors thank Jon Russo, Lockheed Martin Advanced Technology
Laboratories, for mapping the SPEAR GMTI Algorithm onto the Wafer-Scale Processor