An FPGA Implementation of the Two-Dimensional Finite-Difference Time-Domain (FDTD) Algorithm Wang Chen
Slide 1
An FPGA Implementation of the Two-Dimensional Finite-Difference Time-Domain (FDTD) Algorithm
Wang Chen, Panos Kosmas, Miriam Leeser, Carey Rappaport
Northeastern University, Boston, MA

Slide 2: FDTD Algorithm and Implementation
- Finite-Difference Time-Domain
  - Method for solving Maxwell's equations
  - Used for buried object detection
- Hardware implementation
  - 3D to 2D model simplification
  - Data dependency analysis
  - Fixed-point quantization

Slide 3: Finite-Difference Time-Domain Method
- A direct time-domain solution of Maxwell's equations
- Accurate and flexible for solving electromagnetic problems
- Discretizes time and electromagnetic space

Slide 4: FDTD Method (cont'd)
[Figure: Yee cell drawn on X, Y, and Z axes with cell dimensions Δx, Δy, Δz, showing the field components Ex, Ey, Ez, Hx, Hy, and Hz at grid point (i,j,k) and at half-cell offsets such as (i, j+1/2, k+1/2). Panel labels: "Taylor Series Expansion", "One FDTD Equation", "Adjacent Cells".]

Slide 5: FDTD Applications
- Antenna design
- Discrete scattering studies
- Medical studies
  - Effect of cell-phone electromagnetic waves on the human brain
  - Breast cancer detection using electromagnetic antennas

Slide 6: Buried Object Detection Forward Model
Flowchart:
- Initialization
- Excitation
- Calculate E field (t = n)
- Exterior boundary conditions
- Calculate H field (t = n + 0.5)
- Time over? Yes: End. No: n = n + 1, go to next time step.
[Figure: buried object detection model space on X, Y, and Z axes, with a transmitting antenna, a receiving antenna, and a buried object (mine).]

Slide 7: FDTD Simulated Model Space
[Figure only.]

Slide 8: FDTD Simulated Model Space (cont'd)
[Figure only.]

Slide 9: Related Work
- Software acceleration of FDTD
  - Parallel computers do not provide significant speedup
- FPGA implementations of FDTD
  - 1D FDTD on hardware: architecture is too simple
  - Full 3D FDTD on hardware developed at UDel
    - Design is slower than software: uses a complex floating-point representation
    - No parallelism or pipelining
- Our 2D FDTD hardware implementation
  - 24 times speedup compared to a 3.0 GHz PC:
    - Fixed-point representation
    - Expandable structure

Slide 10: 3D to 2D Model Simplification
[Figure: the 3D flowchart and model space next to the simplified 2D flowchart and model space. Each model space shows the transmitting antenna, the receiving antenna, and the mine.]
Steps in both flowcharts:
- Initialization
  - Initialize parameters of model space and time step
  - Build parameters of soil and buried object
  - Load all the EM space data into memory
- Excitation
- Calculate E field (t = n)
  - 3D: update Exs, Eys, and Ezs fields
  - 2D: update Eys field
- Exterior boundary conditions
  - 3D: boundaries of EYX, EZX, EZY, EXY, EXZ, EYZ
  - 2D: boundaries of EYX, EYZ
- Calculate H field (t = n + 0.5)
  - 3D: update Hxs, Hys, and Hzs fields
  - 2D: update Hxs and Hzs fields
- Time over? Yes: End. No: n = n + 1, go to next time step.

Slide 11: Exterior Boundary Conditions
- Mur-type absorbing boundary condition
- 3D model space: 6 faces and 12 edges
- 2D model space: 4 edges

Slide 12: Data Dependency Analysis
Flowchart steps, in the order they appear on the slide:
- Initialization
  - Initialize parameters of model space and time step
  - Build parameters of soil and buried object
  - Load all the EM space data into memory
- Excitation
- Calculate Hzs field
- Calculate Hxs field
- Calculate Eys field
- Exterior boundary conditions
  - Boundary of EYX
  - Boundary of EYZ
- Time over? Yes: End. No: time step n = n + 1, go to next time step.
[Figure: memory space for electric field data and memory space for magnetic field data, each M cells by N cells, processed two rows (A and B) at a time. Row pairs are labeled T-3, T-2, T-1, and T to show the sequence of the processing. The mine is marked in the model space.]

Slide 13: Hardware Acceleration
- Smart memory interface
- Parallelism
- Pipelining
- Quantized fixed-point representation
  - Less area in the datapath, so more parallelism
  - Careful error analysis to ensure accurate results
[Figure: fixed-point word layout "S AAA . BBBBBBBBBBBBBBBBBBBBBBBBBB". Bit positions 2 … 0 are labeled "3 bits"; bit positions -1 … -26 are labeled "26 bits".]

Slide 14: Fixed-point quantization
[Figure: plot of average relative error (%), axis from 0 to 2, against bit-width (24, 25, 26, 27, and 28 bits). Series: electric field value at R1, electric field value at R2, magnetic field value at R1, magnetic field value at R2, source data.]

Slide 15: Design Flow
[Figure only.]

Slide 16: Firebird FPGA Board from Annapolis
- A Xilinx VIRTEX-E XCV2000E with 2.5 million system gates
- Processing clock up to 150 MHz; FDTD runs at 70 MHz
- Five independent memory banks (4 x 64-bit, 1 x 32-bit), 288 Mbytes in total
- 6.6 Gbytes/sec of memory bandwidth
- 3 Gbytes/sec of I/O bandwidth

Utilization of Xilinx XCV2000E FPGA Chip
                    Slices    BlockRAM
Number Available    19200     160
Number Used         8837      86
Percentage Used     46%       54%

Slide 17: FDTD on Firebird Board
[Figure: block diagram. The PC host (memory in PC) connects over the PCI bus to the Firebird board. On the board, the on-board memories hold the simulated electromagnetic space, and the FPGA design contains the memory interface, the electric field pipeline module, the magnetic field pipeline module, and the boundary conditions module.]

Slide 18: Memory Interface
[Figure: on-board memories holding interleaved EYS, HXS, and HZS data in banks 0 to 3 feed input BlockRAMs (A, B, C, D) on the FPGA chip. The pipelined electric field module, pipelined magnetic field module, and boundary module read from these BlockRAMs, and results pass through output BlockRAMs back to the on-board memories.]

Slide 19: Pipelining and Parallelism
[Figure: pipeline diagram with stages 0 to 9. The first stages read operands (Hxs_A, Hzs_A, Eys_A, Hxs_B, Hzs_B, Hxs_C, Hzs_C, EXS, Ezs, EysBo, EysCo, EysDo, DTin_1, DTin_2); later stages apply subtract, multiply, and add operations in parallel; the final stages write to Eys, Hxs, and Hzs.]

Slide 20: Data Flow
[Figure: data flow between the electric field, magnetic field, and source data on-board memories and the FPGA. Memory interface modules and BlockRAMs feed the Hzs pipeline, Hxs pipeline, Eys pipeline, boundary pipeline, and source adder, and results return to the electric field and magnetic field on-board memories.]

Slide 21: Results and Performance
[Figure: bar chart "Performance Result", executing time in seconds, axis from 0 to 25.]
     Implementation             Platform                                 Executing time
A    Software floating-point    Fortran code, 440 MHz Sun workstation    ~25 s
B    Software fixed-point       C code, 3.0 GHz PC                       ~3.375 s
C    Hardware                   Design working at 70 MHz                 ~0.145 s
Model space: 100 x 100 cells, iterated for 200 time steps.

Slide 22: Conclusions
- The FPGA implementation of FDTD exhibits significant speedup compared to software: 24 times faster than a 3 GHz PC
- With a larger FPGA, more parallelism will be available, hence more speedup
- The current design is easily extended to handle multiple types of materials and 3D space

Slide 23: Future Work
- Upgrade the current design to handle multiple types of materials
- Upgrade to 3D model space
  - Add three more field updating algorithms, with the same structure as the original three algorithms
  - Upgrade the boundary condition updating algorithm
  - Redesign the memory interface
- Apply the FDTD hardware to other applications
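
The 2D update order on the Data Dependency Analysis slide (Hzs, then Hxs, then Eys) can be illustrated with a small leapfrog loop. This is only a sketch under stated assumptions, not the authors' design: it uses a generic free-space Yee-style update with normalized fields, floating-point arithmetic, a Gaussian source added at the grid center, and an outer ring of cells held at zero instead of the Mur boundary. The names `make_grid`, `fdtd_step`, `run`, and `courant` are hypothetical.

```python
import math

def make_grid(nx, nz):
    """Return an nx-by-nz grid of zeros (list of rows)."""
    return [[0.0] * nz for _ in range(nx)]

def fdtd_step(ey, hx, hz, courant):
    """Advance Hz, Hx, then Ey by one leapfrog step, in place.

    Assumption: free space, normalized fields, so one coefficient
    (courant = c*dt/dx) scales every finite difference.
    """
    nx, nz = len(ey), len(ey[0])
    # H fields at t = n + 0.5 depend only on Ey at t = n.
    for i in range(nx - 1):
        for k in range(nz):
            hz[i][k] -= courant * (ey[i + 1][k] - ey[i][k])
    for i in range(nx):
        for k in range(nz - 1):
            hx[i][k] += courant * (ey[i][k + 1] - ey[i][k])
    # Ey at t = n + 1 on interior cells; the outer ring stays at zero
    # here (a stand-in for the absorbing boundary on the slides).
    for i in range(1, nx - 1):
        for k in range(1, nz - 1):
            curl_h = (hx[i][k] - hx[i][k - 1]) - (hz[i][k] - hz[i - 1][k])
            ey[i][k] += courant * curl_h

def run(nx, nz, steps, courant=0.5):
    """Run the loop with a hypothetical Gaussian pulse at the center."""
    ey, hx, hz = make_grid(nx, nz), make_grid(nx, nz), make_grid(nx, nz)
    ci, ck = nx // 2, nz // 2
    for n in range(steps):
        ey[ci][ck] += math.exp(-((n - 8) / 3.0) ** 2)  # excitation
        fdtd_step(ey, hx, hz, courant)
    return ey

if __name__ == "__main__":
    field = run(41, 41, 10)
    print(field[20][20])
```

Each Ey cell needs only its own H values and those of one neighbor in X and one in Z, which is the kind of local dependency that lets a hardware pipeline stream the grid a few rows at a time.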
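
The Exterior Boundary Conditions slide names a Mur-type absorbing boundary on the four edges of the 2D model space. As a hedged illustration, the sketch below applies the textbook first-order Mur update to the edges of an Ey grid. The slides do not state which order of Mur condition the design uses, so the first-order form is an assumption, and the names `mur_first_order`, `apply_mur_edges`, and `courant` are hypothetical.

```python
def mur_first_order(edge_old, inner_old, inner_new, courant):
    """First-order Mur update for one edge cell.

    edge_old  : edge value at the previous time step
    inner_old : neighboring interior value at the previous time step
    inner_new : neighboring interior value at the new time step
    courant   : c*dt/dx (assumed equal in both grid directions)
    """
    coeff = (courant - 1.0) / (courant + 1.0)
    return inner_old + coeff * (inner_new - edge_old)

def apply_mur_edges(ey_new, ey_old, courant):
    """Overwrite the four edges of ey_new in place (corners untouched)."""
    nx, nz = len(ey_new), len(ey_new[0])
    for k in range(1, nz - 1):  # the two edges normal to X
        ey_new[0][k] = mur_first_order(
            ey_old[0][k], ey_old[1][k], ey_new[1][k], courant)
        ey_new[nx - 1][k] = mur_first_order(
            ey_old[nx - 1][k], ey_old[nx - 2][k], ey_new[nx - 2][k], courant)
    for i in range(1, nx - 1):  # the two edges normal to Z
        ey_new[i][0] = mur_first_order(
            ey_old[i][0], ey_old[i][1], ey_new[i][1], courant)
        ey_new[i][nz - 1] = mur_first_order(
            ey_old[i][nz - 1], ey_old[i][nz - 2], ey_new[i][nz - 2], courant)

if __name__ == "__main__":
    print(mur_first_order(1.0, 2.0, 5.0, 0.5))
```

The update needs the previous-step values next to each edge, so an implementation has to keep a copy of those cells before the E field is overwritten. The first-order form absorbs waves best at normal incidence; higher-order variants trade more storage for better absorption at oblique angles.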
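
The fixed-point quantization discussed on the Hardware Acceleration and Fixed-point quantization slides can be sketched as rounding a value onto a grid of step 2^-26 and measuring the relative error. This is an illustration under assumptions: the sketch treats the word as a sign plus 3 integer bits plus a configurable number of fraction bits, rounds to nearest, and saturates on overflow. The slides do not say whether the hardware rounds or truncates, or whether the sign is counted among the 3 integer bits. The names `quantize` and `relative_error_percent` are hypothetical.

```python
def quantize(value, frac_bits=26, int_bits=3):
    """Round value to signed fixed point and return it as a float.

    Assumptions: round to nearest (Python's round, ties to even) and
    saturation at the largest and smallest representable codes.
    """
    scale = 1 << frac_bits
    max_code = (1 << (int_bits + frac_bits)) - 1
    min_code = -(1 << (int_bits + frac_bits))
    code = round(value * scale)
    code = max(min_code, min(max_code, code))
    return code / scale

def relative_error_percent(value, frac_bits=26):
    """Relative quantization error of one nonzero value, in percent."""
    return abs(quantize(value, frac_bits) - value) / abs(value) * 100.0

if __name__ == "__main__":
    # Hypothetical sample value; the slides plot errors of simulated fields.
    for bits in range(20, 27):
        print(bits, relative_error_percent(0.001234, bits))
```

The error of a single value is at most half a step, so each extra fraction bit halves the worst case. The plot on the slides reports an average over simulated field values, where errors also accumulate across time steps, so its numbers are not reproduced by this per-value calculation.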
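
The figures on the Results and Performance slide can be cross-checked with a few lines of arithmetic. The times below are the approximate values read from the slide. The cycles-per-cell estimate additionally assumes that the whole 0.145 s is spent computing at 70 MHz, which the slides do not state. All variable names are hypothetical.

```python
# Workload from the slide: 100 x 100 cells for 200 time steps.
cells = 100 * 100
time_steps = 200
cell_steps = cells * time_steps  # cell updates per field

# Approximate executing times from the slide, in seconds.
time_sun_s = 25.0   # A: Fortran floating-point, 440 MHz Sun workstation
time_pc_s = 3.375   # B: C fixed-point, 3.0 GHz PC
time_hw_s = 0.145   # C: FPGA design at 70 MHz

speedup_vs_pc = time_pc_s / time_hw_s
speedup_vs_sun = time_sun_s / time_hw_s

# Assumption: all of time_hw_s is compute time at the 70 MHz design clock.
clock_hz = 70e6
cycles_per_cell_step = time_hw_s * clock_hz / cell_steps

if __name__ == "__main__":
    print(cell_steps, speedup_vs_pc, speedup_vs_sun, cycles_per_cell_step)
```

With these rounded inputs the hardware-to-PC ratio comes out a little above 23, in line with the roughly 24 times quoted on the slides given that the times are marked as approximate. The estimate of about five clock cycles per cell per time step covers all three field updates, and is only as reliable as the assumption behind it.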