Deployment of SAR and GMTI Signal Processing
on a Boeing 707 Aircraft using pMatlab and a
Bladed Linux Cluster
Jeremy Kepner, Tim Currie, Hahn Kim, Bipin Mathew,
Andrew McCabe, Michael Moore, Dan Rabinkin, Albert
Reuther, Andrew Rhoades, Lou Tella and Nadya Travinin
September 28, 2004
This work is sponsored by the Department of the Air Force under Air Force contract F19628-00-C-002.
Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily
endorsed by the United States Government.
MIT Lincoln Laboratory
Slide-1
Quicklook
Outline
• Introduction
  – LiMIT
  – Technical Challenge
  – pMatlab
  – “QuickLook” Concept
• System
• Software
• Results
• Summary
LiMIT
• Lincoln Multifunction Intelligence, Surveillance and Reconnaissance Testbed
  – Boeing 707 aircraft
  – Fully equipped with sensors and networking
  – Airborne research laboratory for development, testing, and evaluation of sensors and processing algorithms
• Employs Standard Processing Model for Research Platform
  – Collect in the air/process on the ground
Processing Challenge
• Can we process radar data (SAR & GMTI) in flight and provide feedback on sensor performance in flight?
• Requirements and Enablers
  – Record and playback data: high speed RAID disk system (SGI RAID disk recorder)
  – High speed network
  – High density parallel computing: ruggedized bladed Linux cluster (14x2 CPU IBM blade cluster)
  – Rapid algorithm development: pMatlab
pMatlab: Parallel Matlab Toolbox
Goals
• Matlab speedup through transparent parallelism
• Near-real-time rapid prototyping
Lab-Wide Usage
• Ballistic Missile Defense
• Laser Propagation Simulation
• Hyperspectral Imaging
• Passive Sonar
• Airborne Ground Moving Target Indicator (GMTI)
• Airborne Synthetic Aperture Radar (SAR)
[Diagram: High Performance Matlab Applications (DoD Sensor Processing, Scientific Simulation, DoD Decision Support, Commercial Applications) → User Interface → Parallel Matlab Toolbox → Hardware Interface → Parallel Computing Hardware; also labeled: Matlab*P, PVL, MatlabMPI]
“QuickLook” Concept
[Diagram: Streaming Sensor Data → RAID Disk Recorder (Data Files) → 28 CPU Bladed Cluster Running pMatlab → Analyst Workstation Running Matlab (SAR, GMTI, … (new))]
Outline
• Introduction
• System
  – ConOps
  – Ruggedization
  – Integration
• Software
• Results
• Summary
Concept of Operations
Timeline (~1 second = 1 dwell)
• Record streaming data: ~30 seconds
• Copy to bladed cluster
• Process on bladed cluster: 1st CPI ~2 minutes, 2 dwells ~2 minutes
• Process on SGI: 1st CPI ~1 minute, 2 dwells ~1 hour
Data flow
• Streaming sensor data → RAID disk recorder: 600 MB/s (1x RT)
• RAID disk recorder → bladed cluster running pMatlab: split files, copy w/rcp over Gbit Ethernet (1/4x RT rate); 1 TB local storage ~ 20 min data
• RAID disk recorder → other systems
• Bladed cluster → analyst workstation running Matlab: Xwindows over LAN; SAR, GMTI, … (new)
• Net benefit: 2 dwells in 2 minutes vs. 1 hour
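As a rough check on the Concept of Operations figures, the copy step can be estimated from the slide's own numbers: ~30 seconds of data recorded at 600 MB/s (1x real time) and copied at about 1/4 of the real-time rate. The Python sketch below is illustrative only; the function and variable names are hypothetical, and the 1/4x factor is taken at face value.

```python
# Back-of-the-envelope check of the Concept of Operations timeline.
# All inputs are the approximate figures quoted on the slide.
RECORD_SECONDS = 30        # ~30 seconds of recorded streaming data
REALTIME_RATE_MB_S = 600   # sensor data rate (1x RT)
COPY_RATE_FRACTION = 0.25  # Gbit Ethernet copy runs at ~1/4x RT rate


def data_volume_gb(record_seconds, rate_mb_s):
    """Size of the recording in GB (taking 1 GB = 1000 MB)."""
    return record_seconds * rate_mb_s / 1000


def copy_time_seconds(record_seconds, copy_rate_fraction):
    """Time to copy a recording when the link runs at a fraction of real time."""
    return record_seconds / copy_rate_fraction


volume = data_volume_gb(RECORD_SECONDS, REALTIME_RATE_MB_S)
copy_time = copy_time_seconds(RECORD_SECONDS, COPY_RATE_FRACTION)
print(f"{volume:.0f} GB recorded, ~{copy_time:.0f} s to copy")
```

Under these assumptions the copy takes about two minutes, the same order as the quoted cluster processing time.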
Vibration Tests
• Tested only at operational (i.e. in-flight) levels:
  – 0dB = 1.4G (above normal)
  – -3dB = ~1.0G (normal)
  – -6dB = ~0.7G (below normal)
• Tested in all 3 dimensions
• Ran MatlabMPI file based communication test up to 14 CPUs/14 hard drives
• Throughput decreases seen at 1.4 G
[Plot: X-axis, 13 CPU/13 HD. Throughput (MBps) vs. Message Sizes (Bytes); curves: No Vibration, ~0.7G (-6dB), ~1.0G (-3dB), 1.4G (0dB)]
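The three vibration levels are consistent with an amplitude-style decibel scale referenced to 1.4 G. The short Python sketch below checks that arithmetic; the 20·log10 convention is an assumption inferred from the quoted values, not something the slide states, and the function names are hypothetical.

```python
import math

REFERENCE_G = 1.4  # the slide's 0 dB level


def level_in_g(db, reference_g=REFERENCE_G):
    """Convert a level in dB (assumed amplitude convention, 20*log10) to G."""
    return reference_g * 10 ** (db / 20)


def level_in_db(g, reference_g=REFERENCE_G):
    """Inverse conversion: G level to dB relative to the reference."""
    return 20 * math.log10(g / reference_g)


for db in (0, -3, -6):
    print(f"{db:>3} dB -> {level_in_g(db):.2f} G")
```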
Thermal Tests
• Temperature ranges
  – Test range: -20°C to 40°C
  – Bladecenter spec: 10°C to 35°C
• Cooling tests
  – Successfully cooled to -10°C
  – Failed at -20°C
  – Cargo bay typically ≥ 0°C
• Heating tests
  – Used duct to draw outside air to cool cluster inside oven
  – Successfully heated to 40°C
  – Outside air cooled cluster to 36°C
Mitigation Strategies
• IBM Bladecenter is not designed for 707’s operational environment
• Strategies to minimize risk of damage:
  1. Power down during takeoff/landing
     • Avoids damage to hard drives
     • Radar is also powered down
  2. Construct duct to draw cabin air into cluster
     • Stabilizes cluster temperature
     • Prevents condensation of cabin air moisture within cluster
Integration
SGI RAID System
• Scan catalog files, select dwells and CPIs to process (C/C shell)
• Assign dwells/CPIs to nodes, package up signature / aux data, one CPI per file. Transfer data from SGI to each processor’s disk (Matlab)
Bladed Cluster
• Nodes process CPIs in parallel, write results onto node 1’s disk. Node 1 processor performs final processing
• Results displayed locally
[Diagram: SGI RAID → rcp over Gigabit Connection → IBM Bladed Cluster, NODE 1 … NODE 14, each labeled P1, P2 and VP1, VP2]
• pMatlab allows integration to occur while algorithm is being finalized
Outline
• Introduction
• Hardware
• Software
  – pMatlab architecture
  – GMTI
  – SAR
• Results
• Summary
MatlabMPI & pMatlab Software Layers
[Layer diagram, top to bottom: Application (Input, Analysis, Output) → User Interface → Library Layer (pMatlab): Parallel Library (Vector/Matrix, Comp, Task, Conduit) → Kernel Layer: Messaging (MatlabMPI), Math (Matlab) → Hardware Interface → Parallel Hardware]
• Can build a parallel library with a few messaging primitives
• MatlabMPI provides this messaging capability:
    MPI_Send(dest,comm,tag,X);
    X = MPI_Recv(source,comm,tag);
• Can build applications with a few parallel structures and functions
• pMatlab provides parallel arrays and functions
    X = ones(n,mapX);
    Y = zeros(n,mapY);
    Y(:,:) = fft(X);
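The deck describes MatlabMPI communication as file based. As a loose illustration of how a send/receive pair can be built on a shared file system, here is a minimal Python sketch. It is not MatlabMPI's implementation: the `send` and `recv` names, the file naming scheme, and the use of `pickle` are all assumptions made for the example.

```python
import os
import pickle
import tempfile
import time


def _base(comm_dir, source, dest, tag):
    """Hypothetical naming scheme: one file pair per (source, dest, tag)."""
    return os.path.join(comm_dir, f"p{source}_p{dest}_t{tag}")


def send(comm_dir, source, dest, tag, data):
    """Write the message to a buffer file, then create a lock file to signal it is complete."""
    base = _base(comm_dir, source, dest, tag)
    with open(base + ".buf", "wb") as f:
        pickle.dump(data, f)
    # The lock file is created last so a receiver never reads a partial buffer.
    open(base + ".lock", "w").close()


def recv(comm_dir, source, dest, tag, poll=0.01, timeout=5.0):
    """Poll for the lock file, read the buffer, then clean up both files."""
    base = _base(comm_dir, source, dest, tag)
    deadline = time.monotonic() + timeout
    while not os.path.exists(base + ".lock"):
        if time.monotonic() > deadline:
            raise TimeoutError(f"no message from {source} with tag {tag}")
        time.sleep(poll)
    with open(base + ".buf", "rb") as f:
        data = pickle.load(f)
    os.remove(base + ".lock")
    os.remove(base + ".buf")
    return data


# Usage: rank 0 sends a small message to rank 1 through a shared directory.
with tempfile.TemporaryDirectory() as shared:
    send(shared, source=0, dest=1, tag=7, data={"cpi": 3, "samples": [1, 2, 3]})
    received = recv(shared, source=0, dest=1, tag=7)
    leftover = os.listdir(shared)
```

A scheme like this ties messaging throughput to disk performance, which is presumably why the vibration tests measured throughput against hard-drive behavior.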
LiMIT GMTI
Parallel Implementation
GMTI Block Diagram
[Block diagram labels. DWELL: Input SIG Data, Input Aux Data, Input LOD Data, INS Process, EQC, Deal to Nodes. CPI (N/dwell, parallel): EQ, Recon, Range Walk Correction, Crab Correct. SUBBAND (1, 12, or 48): Subband, PC, Doppler Process, Beam Resteer Correction, Adaptive Beamform, STAP, CPI Detection Processing. Then: Dwell Detect Processing, Angle/Param Est, Geolocate, Display]
Approach
• Deal out CPIs to different CPUs
Performance
  TIME/NODE/CPI         ~100 sec
  TIME FOR ALL 28 CPIS  ~200 sec
  Speedup               ~14x
• Demonstrates pMatlab in a large multi-stage application
  – ~13,000 lines of Matlab code
• Driving new pMatlab features
  – Parallel sparse matrices for targets (dynamic data sizes)
    Potential enabler for a whole new class of parallel algorithms
    Applying to DARPA HPCS GraphTheory and NSA benchmarks
  – Mapping functions for system integration
    Needs expert components!
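The GMTI performance figures fit a simple model in which whole CPIs are dealt out to nodes and processed independently. The Python sketch below works through that arithmetic. It assumes 14 nodes each handling one CPI at a time and ignores copy and communication costs; the function name is hypothetical.

```python
import math


def parallel_time(num_cpis, num_nodes, time_per_cpi):
    """Wall-clock time when whole CPIs are dealt out to nodes, ignoring communication."""
    rounds = math.ceil(num_cpis / num_nodes)  # each node works through its CPIs in turn
    return rounds * time_per_cpi


NUM_CPIS = 28       # "TIME FOR ALL 28 CPIS"
TIME_PER_CPI = 100  # ~100 sec per node per CPI
NUM_NODES = 14      # assumption: 14 nodes, one CPI in flight per node

serial = NUM_CPIS * TIME_PER_CPI
parallel = parallel_time(NUM_CPIS, NUM_NODES, TIME_PER_CPI)
speedup = serial / parallel
print(f"serial {serial} s, parallel {parallel} s, speedup {speedup:.0f}x")
```

With these inputs the model gives 200 seconds and a 14x speedup, matching the quoted ~200 sec and ~14x.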
GMTI pMatlab Implementation
• GMTI pMatlab code fragment
    % Create distribution spec: b = block, c = cyclic.
    dist_spec(1).dist = 'b';
    dist_spec(2).dist = 'c';
    % Create Parallel Map.
    pMap = map([1 MAPPING.Ncpus],dist_spec,0:MAPPING.Ncpus-1);
    % Get local indices.
    [lind.dim_1_ind lind.dim_2_ind] = global_ind(zeros(1,C*D,pMap));
    % loop over local part
    for index = 1:length(lind.dim_2_ind)
      ...
    end
• pMatlab primarily used for determining which CPIs to work on
  – CPIs dealt out using a cyclic distribution
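To make the block versus cyclic distinction in the fragment concrete, the Python sketch below computes which item indices each CPU would own under either scheme. It is a stand-alone illustration, not pMatlab code: the function names are hypothetical and indices are zero-based, unlike Matlab's.

```python
import math


def cyclic_indices(num_items, num_cpus, rank):
    """Items owned by `rank` when items are dealt out round-robin, like cards."""
    return list(range(rank, num_items, num_cpus))


def block_indices(num_items, num_cpus, rank):
    """Items owned by `rank` when items are split into contiguous blocks."""
    size = math.ceil(num_items / num_cpus)
    start = rank * size
    return list(range(start, min(start + size, num_items)))


# Example: 10 CPIs over 4 CPUs.
for rank in range(4):
    print(rank, "cyclic:", cyclic_indices(10, 4, rank),
          "block:", block_indices(10, 4, rank))
```

With a cyclic deal, consecutive CPIs land on different CPUs, so the first few CPIs of a dwell are processed at the same time.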
LiMIT SAR
SAR Block Diagram
[Block diagram labels: A/D, Buffer, FFT, Select FFT Bins (downsample), Σ, Chirp Coefficients, Equalization Coefficients, Collect Pulse, Collect Cube, Polar Remap, Histogram & Register Image, SAR w. Autofocus, IMU, Output Display. Data rates: Real Samples @480MS/Sec, Complex Samples @180MS/Sec]
• Most complex pMatlab application built (at that time)
  – ~4000 lines of Matlab code
  – CornerTurns of ~1 GByte data cubes
• Drove new pMatlab features
  – Improving Corner turn performance
    Working with Mathworks to improve
  – Selection of submatrices
    Will be a key enabler for parallel linear algebra (LU, QR, …)
  – Large memory footprint applications
    Can the file system be used more effectively
SAR pMatlab Implementation
• SAR pMatlab code fragment
    % Create Parallel Maps.
    mapA = map([1 Ncpus],0:Ncpus-1);
    mapB = map([Ncpus 1],0:Ncpus-1);
    % Prepare distributed Matrices.
    fd_midc=zeros(mw,TotalnumPulses,mapA);
    fd_midr=zeros(mw,TotalnumPulses,mapB);
    % Corner Turn (columns to rows).
    fd_midr(:,:) = fd_midc;
• Cornerturn Communication performed by overloaded ‘=’ operator
  – Determines which pieces of the matrix belong where
  – Executes appropriate MatlabMPI send commands
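The corner turn described here can be pictured as every CPU sending one sub-block to every other CPU. The Python sketch below simulates that redistribution in a single process using plain lists. It is only a model of the data movement, not how pMatlab implements the overloaded assignment: all function names are hypothetical, and the inner loops stand in for the messages that would be sent.

```python
def block_ranges(n, nparts):
    """Split range(n) into nparts contiguous blocks whose sizes differ by at most 1."""
    base, extra = divmod(n, nparts)
    ranges, start = [], 0
    for p in range(nparts):
        size = base + (1 if p < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def split_by_columns(matrix, ncpus):
    """Give each CPU all rows of its own block of columns."""
    col_blocks = block_ranges(len(matrix[0]), ncpus)
    return [[[row[j] for j in cols] for row in matrix] for cols in col_blocks]


def corner_turn(col_pieces, nrows, ncols):
    """Redistribute column-block pieces so each CPU holds all columns of a row block."""
    ncpus = len(col_pieces)
    row_blocks = block_ranges(nrows, ncpus)
    col_blocks = block_ranges(ncols, ncpus)
    row_pieces = [[[None] * ncols for _ in rows] for rows in row_blocks]
    for src, cols in enumerate(col_blocks):
        for dst, rows in enumerate(row_blocks):
            # "Message" from src to dst: the sub-block rows(dst) x cols(src).
            for local_i, i in enumerate(rows):
                for local_j, j in enumerate(cols):
                    row_pieces[dst][local_i][j] = col_pieces[src][i][local_j]
    return row_pieces


# Usage: a 4 x 6 matrix on 2 CPUs, turned from column blocks to row blocks.
matrix = [[10 * i + j for j in range(6)] for i in range(4)]
pieces_by_col = split_by_columns(matrix, ncpus=2)
pieces_by_row = corner_turn(pieces_by_col, nrows=4, ncols=6)
```

Every CPU exchanges data with every other CPU, which is consistent with the deck identifying corner turn performance as the limiting factor for SAR.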
Outline
• Introduction
• Implementation
• Results
  – Scaling Results
  – Mission Results
  – Future Work
• Summary
Parallel Performance
[Plot: Parallel Speedup vs. Number of Processors, both axes 0 to 30; curves: GMTI (1 per node), GMTI (2 per node), SAR (1 per node), Linear]
SAR Parallel Performance
[Plot: Corner Turn bandwidth]
• Application memory requirements too large for 1 CPU
  – pMatlab a requirement for this application
• Corner Turn performance is limiting factor
  – Optimization efforts have improved time by 30%
  – Believe additional improvement is possible
July Mission Plan
• Final Integration
  – Debug pMatlab on plane
  – Working ~1 week before mission (~1 week after first flight)
  – Development occurred during mission
• Flight Plan
  – Two data collection flights
  – Flew a 50 km diameter box
  – Six GPS-instrumented vehicles
      Two 2.5T trucks
      Two CUCV's
      Two M577's
July Mission Environment
• Stressing desert environment
July Mission GMTI results
• GMTI successfully run on 707 in flight
  – Target reports
  – Range Doppler images
• Plans to use QuickLook for streaming processing in October mission
Embedded Computing Alternatives
• Embedded Computer Systems
  – Designed for embedded signal processing
  – Advantages
    1. Rugged - Certified Mil Spec
    2. Lab has in-house experience
  – Disadvantage
    1. Proprietary OS ⇒ No Matlab
• Octave
  – Matlab “clone”
  – Advantage
    1. MatlabMPI demonstrated using Octave on SKY computer hardware
  – Disadvantages
    1. Less functionality
    2. Slower?
    3. No object-oriented support ⇒ No pMatlab support ⇒ Greater coding effort
Petascale pMatlab
• pMapper: automatically finds best parallel mapping
  [Diagram: computation graph with nodes A, B, C, D, E and stages FFT, FFT, MULT → Optimal Mapping → Parallel Computer]
• pOoc: allows disk to be used as memory
  [Diagram: Matlab (~1 GByte RAM); pMatlab (N x GByte): ~1 GByte RAM per node; Petascale pMatlab (N x TByte): ~1 GByte RAM and ~1 TByte RAID disk per node]
  [Plot: Performance (MFlops), 0 to 100, vs. Matrix Size (MBytes), 10 to 10000; curves: in core FFT, out of core FFT]
• pMex: allows use of optimized parallel libraries (e.g. PVL)
  [Diagram labels: pMatlab User Interface; pMex dmat/ddens translator; pMatlab Toolbox; Matlab math libraries; Matlab*P Client/Server; Parallel Libraries: PVL, ||VSIPL++, ScaLapack]
Summary
• Airborne research platforms typically collect and process data later
• pMatlab, bladed clusters and high speed disks enable parallel processing in the air
  – Reduces execution time from hours to minutes
  – Uses rapid prototyping environment required for research
• Successfully demonstrated in LiMIT Boeing 707
  – First ever in-flight use of bladed clusters or parallel Matlab
• Planned for continued use
  – Real-time streaming of GMTI to other assets
• Drives new requirements for pMatlab
  – Expert mapping
  – Parallel Out-of-Core
  – pMex