Deployment of SAR and GMTI Signal Processing on a Boeing 707 Aircraft using pMatlab and a Bladed Linux Cluster
Slide 1: Deployment of SAR and GMTI Signal Processing on a Boeing 707 Aircraft using pMatlab and a Bladed Linux Cluster
Jeremy Kepner, Tim Currie, Hahn Kim, Bipin Mathew, Andrew McCabe, Michael Moore, Dan Rabinkin, Albert Reuther, Andrew Rhoades, Lou Tella and Nadya Travinin
MIT Lincoln Laboratory, September 28, 2004
This work is sponsored by the Department of the Air Force under Air Force contract F19628-00-C-002. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government.

Slide 2: Outline
• Introduction
  – LiMIT
  – Technical Challenge
  – pMatlab
  – "QuickLook" Concept
• System
• Software
• Results
• Summary

Slide 3: LiMIT
• Lincoln Multifunction Intelligence, Surveillance and Reconnaissance Testbed
  – Boeing 707 aircraft
  – Fully equipped with sensors and networking
  – Airborne research laboratory for development, testing, and evaluation of sensors and processing algorithms
• Employs the standard processing model for a research platform
  – Collect in the air / process on the ground

Slide 4: Processing Challenge
• Can we process radar data (SAR & GMTI) in flight and provide feedback on sensor performance in flight?
• Requirements and enablers
  – Record and playback data: high-speed RAID disk system (SGI RAID Disk Recorder)
  – High-speed network
  – High-density parallel computing: ruggedized bladed Linux cluster (14x2 CPU IBM Blade Cluster)
  – Rapid algorithm development: pMatlab

Slide 5: pMatlab: Parallel Matlab Toolbox
• Goals
  – Matlab speedup through transparent parallelism
  – Near-real-time rapid prototyping
• Lab-wide usage in high performance Matlab applications: DoD sensor processing (Ballistic Missile Defense, Laser Propagation Simulation, Hyperspectral Imaging, Passive Sonar, Airborne Ground Moving Target Indicator (GMTI), Airborne Synthetic Aperture Radar (SAR)), DoD decision support, scientific simulation, and commercial applications
• Related technologies: MatlabMPI, Matlab*P, PVL
[Diagram: the Parallel Matlab Toolbox sits between the Matlab user interface and the hardware interface to the parallel computing hardware]

Slide 6: "QuickLook" Concept
[Diagram: streaming sensor data → RAID disk recorder → data files → 28 CPU bladed cluster running pMatlab (SAR, GMTI, … (new)) → analyst workstation running Matlab]

Slide 7: Outline
• Introduction
• System
  – ConOps
  – Ruggedization
  – Integration
• Software
• Results
• Summary

Slide 8: Concept of Operations
• Timeline (~1 second = 1 dwell)
  – Record streaming data: ~30 seconds
  – Copy to bladed cluster and process on bladed cluster: 1st CPI in ~1 minute, 2 dwells in ~2 minutes
  – Process on SGI: 1st CPI in ~2 minutes, 2 dwells in ~1 hour
• Data flow: streaming sensor data → RAID disk recorder (600 MB/s, 1x real-time) → split files and copy with rcp over Gbit Ethernet (1/4x real-time rate) → bladed cluster running pMatlab (1 TB local storage, ~20 min of data; SAR, GMTI, … (new)) → X windows over LAN → analyst workstation running Matlab → to other systems
• Net benefit: 2 dwells in 2 minutes vs. 1 hour
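The "split files, copy w/rcp" step on the slide above amounts to dealing one-CPI data files out to the nodes' local disks. The sketch below illustrates that idea in Matlab; it is not the actual LiMIT script, and the file names, node host names, and target path are placeholder assumptions.

    % Illustrative only: deal one-CPI files out to cluster nodes round-robin
    % and copy each file to its node's local disk with rcp.
    cpiFiles = {'cpi_001.dat','cpi_002.dat','cpi_003.dat'};   % hypothetical file names
    nodes    = {'node1','node2','node3','node4'};             % hypothetical node host names
    for k = 1:length(cpiFiles)
      node = nodes{mod(k-1, length(nodes)) + 1};              % round-robin node assignment
      cmd  = ['rcp ' cpiFiles{k} ' ' node ':/local/data/'];   % hypothetical target path
      if system(cmd) ~= 0                                     % system returns 0 on success
        warning('Copy of %s to %s failed.', cpiFiles{k}, node);
      end
    end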
Slide 9: Vibration Tests
• Tested only at operational (i.e. in-flight) levels:
  – 0 dB = 1.4 G (above normal)
  – -3 dB = ~1.0 G (normal)
  – -6 dB = ~0.7 G (below normal)
• Tested in all 3 dimensions
• Ran the MatlabMPI file-based communication test with up to 14 CPUs / 14 hard drives
• Throughput decreases seen at 1.4 G
[Chart: throughput (MBps) vs. message size (bytes) for X-axis vibration with 13 CPUs / 13 hard drives, under no vibration, ~0.7 G (-6 dB), ~1.0 G (-3 dB), and 1.4 G (0 dB)]

Slide 10: Thermal Tests
• Temperature ranges
  – Test range: -20°C to 40°C
  – Bladecenter spec: 10°C to 35°C
• Cooling tests
  – Successfully cooled to -10°C
  – Failed at -20°C
  – Cargo bay typically ≥ 0°C
• Heating tests
  – Used a duct to draw outside air to cool the cluster inside the oven
  – Successfully heated to 40°C
  – Outside air cooled the cluster to 36°C

Slide 11: Mitigation Strategies
• The IBM Bladecenter is not designed for the 707's operational environment
• Strategies to minimize risk of damage:
  1. Power down during takeoff/landing
     – Avoids damage to hard drives
     – Radar is also powered down
  2. Construct a duct to draw cabin air into the cluster
     – Stabilizes cluster temperature
     – Prevents condensation of cabin air moisture within the cluster

Slide 12: Integration
[Diagram: the SGI RAID system feeds the IBM bladed cluster (NODE 1 … NODE 14, processors P1/P2 hosting virtual processors VP1/VP2) over a Gigabit connection, with files moved by rcp]
• On the SGI RAID system: scan catalog files and select dwells and CPIs to process (C/C shell); assign dwells/CPIs to nodes and package up signature/aux data, one CPI per file; transfer data from the SGI to each processor's disk (Matlab)
• On the bladed cluster: nodes process CPIs in parallel and write results onto node 1's disk; the node 1 processor performs final processing; results are displayed locally
• pMatlab allows integration to occur while the algorithm is being finalized

Slide 13: Outline
• Introduction
• Hardware
• Software
  – pMatlab architecture
  – GMTI
  – SAR
• Results
• Summary

Slide 14: MatlabMPI & pMatlab Software Layers
[Diagram: the application (input, comp, analysis, output) is built on the library layer (pMatlab: vector/matrix, comp, task, conduit) and the kernel layer (MatlabMPI messaging, Matlab math), which span the user interface and the hardware interface to the parallel hardware]
• Can build a parallel library with a few messaging primitives; MatlabMPI provides this messaging capability:
    MPI_Send(dest,comm,tag,X);
    X = MPI_Recv(source,comm,tag);
• Can build applications with a few parallel structures and functions; pMatlab provides parallel arrays and functions:
    X = ones(n,mapX);
    Y = zeros(n,mapY);
    Y(:,:) = fft(X);
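To make the two layers concrete, the sketch below pairs the messaging primitives with the parallel-array syntax from the slide. It is illustrative only: it assumes the pMatlab toolbox is on the Matlab path, Ncpus and n are placeholder values, and the call forms are copied from the slide fragments rather than from any particular toolbox release.

    % Kernel layer (MatlabMPI): explicit message passing, in the form shown above.
    %   MPI_Send(dest,comm,tag,X);           % send array X to processor dest
    %   X = MPI_Recv(source,comm,tag);       % blocking receive from source
    % Library layer (pMatlab): the same data movement expressed through maps.
    Ncpus = 4;                                % assumed number of processors
    n     = 1024;                             % assumed problem size
    mapX  = map([1 Ncpus], 0:Ncpus-1);        % distribute X across columns
    mapY  = map([Ncpus 1], 0:Ncpus-1);        % distribute Y across rows
    X = ones(n, mapX);                        % distributed input array
    Y = zeros(n, mapY);                       % distributed output array
    Y(:,:) = fft(X);                          % fft, with redistribution handled by the overloaded '='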
Slide 15: LiMIT GMTI Parallel Implementation
[GMTI block diagram: per dwell, the N CPIs are processed in parallel. The per-CPI chain takes SIG data, aux data, LOD data, EQC, and INS inputs and runs range walk correction, subband processing (1, 12, or 48 subbands), equalization, pulse compression, crab correction, Doppler processing, beam resteer correction, adaptive beamforming (STAP), and CPI detection processing; per-dwell processing then performs dwell detection processing, angle/parameter estimation, geolocation, and display. Approach: deal out CPIs to different CPUs (deal to nodes).]
• Performance: ~100 sec per CPI per node; ~200 sec for all 28 CPIs; speedup ~14x (28 CPIs x ~100 sec serially vs. ~200 sec in parallel)
• Demonstrates pMatlab in a large multi-stage application
  – ~13,000 lines of Matlab code
• Driving new pMatlab features
  – Parallel sparse matrices for targets (dynamic data sizes): a potential enabler for a whole new class of parallel algorithms; applying to DARPA HPCS GraphTheory and NSA benchmarks
  – Mapping functions for system integration: needs expert components!

Slide 16: GMTI pMatlab Implementation
• GMTI pMatlab code fragment:
    % Create distribution spec: b = block, c = cyclic.
    dist_spec(1).dist = 'b';
    dist_spec(2).dist = 'c';
    % Create parallel map.
    pMap = map([1 MAPPING.Ncpus],dist_spec,0:MAPPING.Ncpus-1);
    % Get local indices.
    [lind.dim_1_ind lind.dim_2_ind] = global_ind(zeros(1,C*D,pMap));
    % Loop over the local part.
    for index = 1:length(lind.dim_2_ind)
      ...
    end
• pMatlab primarily used for determining which CPIs to work on
  – CPIs dealt out using a cyclic distribution
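A sketch of how this dealing pattern plays out is shown below. Only the dist_spec, map, and global_ind calls follow the fragment above; Ncpus, C, D, and the per-CPI processing call are placeholder assumptions, and the actual loop body is elided in the original.

    % Illustrative expansion of the CPI dealing pattern (placeholder values).
    Ncpus = 28;                      % assumed: 14 nodes x 2 CPUs
    C = 4;  D = 7;                   % placeholder sizes, C*D = 28 CPIs
    dist_spec(1).dist = 'b';         % block distribution in dimension 1
    dist_spec(2).dist = 'c';         % cyclic distribution in dimension 2
    pMap = map([1 Ncpus], dist_spec, 0:Ncpus-1);
    % Each processor asks which global CPI indices it owns ...
    [lind.dim_1_ind lind.dim_2_ind] = global_ind(zeros(1, C*D, pMap));
    % ... and loops over only those CPIs.
    for index = 1:length(lind.dim_2_ind)
      cpi = lind.dim_2_ind(index);
      % process_cpi(cpi);            % placeholder for the per-CPI GMTI chain
    end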
Slide 17: LiMIT SAR
[SAR block diagram: A/D real samples at 480 MS/sec are buffered, FFT'd, and reduced by FFT-bin selection and summation (downsampling) to complex samples at 180 MS/sec, using chirp and equalization coefficients; pulses are collected into a data cube, followed by polar remap and histogram, SAR image formation with autofocus and registration with IMU data, and output display.]
• Most complex pMatlab application built (at that time)
  – ~4,000 lines of Matlab code
  – Corner turns of ~1 GByte data cubes
• Drove new pMatlab features
  – Improving corner turn performance: working with MathWorks to improve it
  – Selection of submatrices: will be a key enabler for parallel linear algebra (LU, QR, …)
  – Large memory footprint applications: can the file system be used more effectively?

Slide 18: SAR pMatlab Implementation
• SAR pMatlab code fragment:
    % Create parallel maps.
    mapA = map([1 Ncpus],0:Ncpus-1);
    mapB = map([Ncpus 1],0:Ncpus-1);
    % Prepare distributed matrices.
    fd_midc = zeros(mw,TotalnumPulses,mapA);
    fd_midr = zeros(mw,TotalnumPulses,mapB);
    % Corner turn (columns to rows).
    fd_midr(:,:) = fd_midc;
• Corner turn communication is performed by the overloaded '=' operator
  – Determines which pieces of the matrix belong where
  – Executes the appropriate MatlabMPI send commands

Slide 19: Outline
• Introduction
• Implementation
• Results
  – Scaling Results
  – Mission Results
  – Future Work
• Summary

Slide 20: Parallel Performance
[Chart: parallel speedup vs. number of processors (0 to 30) for GMTI (1 per node), GMTI (2 per node), and SAR (1 per node), compared against linear speedup]

Slide 21: SAR Parallel Performance
[Chart: corner turn bandwidth]
• Application memory requirements are too large for 1 CPU
  – pMatlab is a requirement for this application
• Corner turn performance is the limiting factor
  – Optimization efforts have improved time by 30%
  – Believe additional improvement is possible

Slide 22: July Mission Plan
• Final integration
  – Debug pMatlab on the plane
  – Working ~1 week before the mission (~1 week after the first flight)
  – Development occurred during the mission
• Flight plan
  – Two data collection flights
  – Flew a 50 km diameter box
  – Six GPS-instrumented vehicles: two 2.5T trucks, two CUCVs, two M577s

Slide 23: July Mission Environment
• Stressing desert environment

Slide 24: July Mission GMTI Results
• GMTI successfully run on the 707 in flight
  – Target reports
  – Range-Doppler images
• Plans to use QuickLook for streaming processing in the October mission

Slide 25: Embedded Computing Alternatives
• Embedded computer systems
  – Designed for embedded signal processing
  – Advantages: (1) rugged, certified Mil Spec; (2) the Lab has in-house experience
  – Disadvantage: (1) proprietary OS ⇒ no Matlab
• Octave
  – Matlab "clone"
  – Advantage: (1) MatlabMPI demonstrated using Octave on SKY computer hardware
  – Disadvantages: (1) less functionality; (2) slower?; (3) no object-oriented support ⇒ no pMatlab support ⇒ greater coding effort

Slide 26: Petascale pMatlab
• pMapper: automatically finds the best parallel mapping
  [Diagram: an example signal flow graph (FFT and MULT operations on arrays A through E) and its optimal mapping onto a parallel computer]
• pOoc: allows disk to be used as memory
  [Diagram: Matlab (~1 GByte RAM) scales to pMatlab (N x GByte, ~1 GByte RAM per node) and to petascale pMatlab (N x TByte, ~1 GByte RAM plus ~1 TByte RAID disk per node). Chart: performance (MFlops) vs. matrix size (MBytes) for in-core FFT and out-of-core FFT]
• pMex: allows use of optimized parallel libraries (e.g. PVL)
  [Diagram: the pMatlab user interface sits above the Matlab*P client/server, the pMex dmat/ddens translator, and the pMatlab toolbox, which in turn use parallel libraries (PVL, ||VSIPL++, ScaLapack) and the Matlab math libraries]

Slide 27: Summary
• Airborne research platforms typically collect data in the air and process it later on the ground
• pMatlab, bladed clusters, and high speed disks enable parallel processing in the air
  – Reduces execution time from hours to minutes
  – Uses the rapid prototyping environment required for research
• Successfully demonstrated on the LiMIT Boeing 707
  – First ever in-flight use of bladed clusters or parallel Matlab
• Planned for continued use
  – Real-time streaming of GMTI to other assets
• Drives new requirements for pMatlab
  – Expert mapping
  – Parallel out-of-core
  – pMex
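As a closing illustration of the pattern that carried most of the SAR work above, the corner-turn fragment from Slide 18 is written out below as a small self-contained sketch; the sizes are placeholders, the pMatlab toolbox is assumed to be on the path, and the map/zeros/assignment forms follow the slide fragment.

    % Illustrative corner-turn sketch (placeholder sizes).
    Ncpus = 28;                                 % assumed: 14 nodes x 2 CPUs
    mw = 4096;  TotalnumPulses = 2048;          % placeholder data-cube dimensions
    mapA = map([1 Ncpus], 0:Ncpus-1);           % columns dealt across processors
    mapB = map([Ncpus 1], 0:Ncpus-1);           % rows dealt across processors
    fd_midc = zeros(mw, TotalnumPulses, mapA);  % column-distributed matrix
    fd_midr = zeros(mw, TotalnumPulses, mapB);  % row-distributed matrix
    % The overloaded '=' determines which pieces of fd_midc belong on which
    % processor under mapB and issues the corresponding MatlabMPI messages.
    fd_midr(:,:) = fd_midc;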