Project Context: ESL Design Out-of-Order Parallel Simulation of SystemC Models on Many-Core Platforms
Out-of-Order Parallel Simulation of SystemC Models on Many-Core Platforms
Presentation at UNSW, 2 March 2016
Rainer Dömer, Guantao Liu, Tim Schmidt {doemer,guantaol,schmidtt}@uci.edu
Center for Embedded and Cyber-Physical Systems, University of California, Irvine
Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS

Project Context: ESL Design
• Electronic System Level Models
  – Abstract description of a complete system
  – Hardware + Software
• Key Concepts in System Modeling
  – Explicit Structure
    • Block diagram structure
    • Connectivity through ports
  – Explicit Hierarchy
    • System composed of components
  – Explicit Concurrency
    • Potential for parallel execution
    • Potential for pipelined execution
  – Explicit Communication and Computation
    • Modules
    • Channels and Interfaces
[Figure: SystemC Model with blocks B0, B1, B2, B3]

Project Context: ESL Design
• Model Validation through Simulation!
  – Efficient system-level simulation is critical
    • Fast, and
    • Accurate!
  – Complexity of system models grows constantly
    • Need for speed!
• Parallel Simulation!
  – Parallelism explicitly specified in model
    • System-level Description Language (SLDL)
      – SystemC [Groetker et al., 2002]: SC_THREAD, SC_METHOD
      – SpecC [Gajski et al., 2000]: par { }, pipe { }
  – Parallel processing available in standard PCs
    • Multi-core host PCs readily available
    • Many-core technology is arriving

Project Context: Related Work
Discrete Event Simulation is slow
• Modeling Techniques
  – Transaction-level modeling (TLM)
  – TLM temporal decoupling
  – Savoiu et al. [MEMOCODE’05]
  – Razaghi et al. [ASPDAC’12]
• Hardware-based Acceleration
  – Sirowy et al. [DAC’10]
  – Nanjundappa et al. [ASPDAC’10]
  – Sinha et al. [ASPDAC’12]
• Distributed Simulation
  – Chandy et al. [TSE’79]
  – Huang et al. [SIES’08]
  – Chen et al. [CECS’11]
• SMP Parallel Simulation
  – Fujimoto [CACM’90]
  – Chopard et al. [ICCS’06]
  – Ezudheen et al. [PADS’09]
  – Mello et al. [DATE’10]
  – Schumacher et al. [CODES’11]
  – Chen et al. [IEEE D&T’11]
  – Yun et al. [TCAD’12]

Project with Intel: Key Points
• Advanced Parallel SystemC Simulation
  – Out-of-Order PDES on many-core host platforms
  – Maximum compliance with current execution semantics
  – Support for parallel execution of virtual platforms
• Introduction of a Dedicated SystemC Compiler
  – Recoding Infrastructure for SystemC (RISC)
  – Advanced static analysis for parallel execution
  – Model instrumentation and code generation
• Parallel SystemC Core Library
  – Out-of-order parallel scheduler, multi-thread safe primitives
  – Many-core target platform (e.g. Intel® Xeon Phi™)
• Open Source
  – Collaboration with Accellera SystemC Language WG

Outline
• Advanced Parallel SystemC Simulation
  – Traditional Discrete Event Simulation (DES)
  – Parallel Discrete Event Simulation (PDES)
  – Out-of-Order Parallel Discrete Event Simulation (OoO PDES)
• Project Overview
  – SystemC Compiler and Parallel Simulation Kernel
    • Recoding Infrastructure for SystemC (RISC), Segment Graph
    • Out-of-order parallel thread scheduling, many-core platforms
  – Demo and Experimental Results
    • Embedded application: Conceptual DVD player
    • Highly parallel application: Mandelbrot renderer
  – Prototype Implementation
    • Open source alpha release available
• Concluding Remarks

Out-of-Order PDES Technology
SystemC Simulation must be Fast and Accurate!
• Traditional Discrete Event Simulation (DES)
  – Reference simulators run sequentially, only one thread at a time (cooperative multi-threading model)
  – Cannot utilize the capabilities of multi- or many-core hosts
• Parallel Discrete Event Simulation (PDES)
  – Threads run in parallel (if at the same delta cycle and time)
  – Simulation cycles are absolute barriers!
• Out-of-order Parallel DE Simulation (OoO PDES)
  – Threads run in parallel and out-of-order [DATE’12, TCAD’14], even in different delta and time cycles, if there are no conflicts!
  – Aggressive, runs maximum number of threads in parallel, but fully preserves DES semantics and model accuracy!

Discrete Event Simulation (DES)
• Traditional DES
  – Concurrent threads of execution
  – Managed by a central scheduler
  – Driven by events and time advances
    • Delta-cycle
    • Time-cycle
  – Partial temporal order with barriers
• Standard Simulator
  – SystemC reference simulator uses cooperative multi-threading
  – A single thread is active at any time!
    • Cannot exploit parallelism
    • Cannot utilize multiple cores
[Figure: threads th1 to th4 scheduled over timestamps T:Δ = 0:0, 10:0, 10:1, 10:2, 20:0, 20:1, 20:2, 30:0]

Parallel Discrete Event Simulation (PDES)
• Parallel DES
  – Threads execute in parallel iff
    • in the same delta cycle, and
    • in the same time cycle
  – Significant speed up!
  – Synchronous PDES: Cycle boundaries are absolute barriers!
• Aggressive Parallel DES
  – Conservative Approaches
    • Careful static analysis prevents conflicts
  – Optimistic Approaches
    • Conflicts are detected and addressed (roll back)
[Figure: threads th1 to th4 running in parallel within each cycle, T:Δ = 0:0 through 30:0]

Out-of-Order PDES Technology
• Out-of-Order Parallel DES
  – Breaks synchronization barrier!
  – Threads execute in parallel iff
    • in the same delta cycle, and
    • in the same time cycle,
    • OR if there are no conflicts!
  – Allows as many threads in parallel as possible
  – Significantly higher speedup!
    • Results at [DATE’12], [IEEE TCAD’14]
• Advanced compiler fully preserves…
  – DES execution semantics
  – Accuracy in results and timing
[Figure: threads th1 to th4 running out-of-order across cycles, T:Δ = 0:0 through 30:0]

Out-of-Order PDES Technology
• OoO PDES Key Ideas
  1. Dedicated SystemC compiler with advanced model analysis
     – Static conflict analysis based on Segment Graphs
  2. Parallel simulator with out-of-order scheduling on many cores
     – Fast decision making at run-time, optimized mapping
• Fundamental Data Structure: Segment Graph
  – Key to semantics-compliant out-of-order execution [DATE’12]
  – Key to prediction of future thread state [DATE’13]
    • “Optimized Out-of-Order Parallel DE Simulation Using Predictions”
  – Key to May-Happen-in-Parallel Analysis [DATE’14]
    • “May-Happen-in-Parallel Analysis based on Segment Graphs for Safe ESL Models” (Best Paper Award)
  – Journal publication: “OoO PDES for TLM” [IEEE TCAD’14]
    • Comprehensive article with HybridThreads extension
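
The synchronous PDES rule stated on the slides (threads run together only within the same time and delta cycle) can be sketched as a small selection function. This is an illustrative sketch, not the RISC scheduler: the `Timestamp` type and `synchronous_ready_set` are hypothetical names, and each thread is assumed to carry the T:Δ timestamp at which it is next ready.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Simulation timestamp T:Δ (assumed representation): time cycle, then delta cycle.
struct Timestamp {
    unsigned long time;  // simulated time
    unsigned delta;      // delta-cycle count within that time
};

inline bool operator==(const Timestamp& a, const Timestamp& b) {
    return a.time == b.time && a.delta == b.delta;
}

inline bool operator<(const Timestamp& a, const Timestamp& b) {
    return std::tie(a.time, a.delta) < std::tie(b.time, b.delta);
}

// Synchronous PDES: only threads at the earliest timestamp may run in parallel.
// Every later thread waits behind the cycle barrier.
std::vector<std::size_t> synchronous_ready_set(const std::vector<Timestamp>& threads) {
    assert(!threads.empty());
    Timestamp earliest = threads[0];
    for (const Timestamp& t : threads) {
        if (t < earliest) earliest = t;
    }
    std::vector<std::size_t> ready;
    for (std::size_t i = 0; i < threads.size(); ++i) {
        if (threads[i] == earliest) ready.push_back(i);
    }
    return ready;
}
```

With threads at 10:1, 10:1, 10:2 and 20:0, only the first two are selected; the out-of-order approach relaxes exactly this barrier when static analysis shows no conflicts.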

Project Overview and Tool Flow
• Research and Development Tasks
  1) Dedicated SystemC compiler (RISC compiler)
  2) Parallel SystemC simulator
  3) Performance tuning for many-core hosts
  4) Virtual Platform (VP) integration
  5) Model analysis (may-happen-in-parallel, MHP)
  6) Model recoding, transformation and optimization
[Figure: tool flow. SystemC Model → Model Analysis (MHP Analysis Tools, Reports) → Model Transformation and Optimization (RISC Recoding Tools, Refined SystemC Model) → SystemC Compiler RISC → Parallel C++ Model → C++ Compiler (with Parallel SystemC Headers, Parallel SystemC Library, Virtual Platform Library, VP Engine, VP-based Prototyping) → Parallel Executable → Parallel Simulation on Many-Core Host Platform]

Project Status after Year 1 of 3
• Research and Development Tasks
  Y1 Dedicated SystemC compiler (RISC compiler)
  Y1 Parallel SystemC simulator
  Y1 Performance tuning for many-core hosts
  Y2 Virtual Platform (VP) integration
  Y3 Model analysis (may-happen-in-parallel, MHP)
  Y3 Model recoding, transformation and optimization
[Figure: same tool flow as on the previous slide, with a "Completed" marker]

Dedicated SystemC Compiler
• RISC Software Stack: Recoding Infrastructure for SystemC
  – C/C++ foundation
  – ROSE compiler infrastructure
    • ROSE Internal Representation
    • Explicit support for
      • Source code analysis
      • Source-to-source transformations
[Figure: software stack, top to bottom: RISC, ROSE IR, C/C++ Foundation]

Dedicated SystemC Compiler
• RISC Software Stack: Recoding Infrastructure for SystemC
  – SystemC Internal Representation
    • Class hierarchy to represent SystemC objects
[Figure: software stack, top to bottom: RISC, SystemC IR, ROSE IR, C/C++ Foundation]

Dedicated SystemC Compiler
• RISC Software Stack: Recoding Infrastructure for SystemC
  1) Segment Graph construction
  2) Segment conflict analysis
• Step 1: Build a Segment Graph
[Figure: software stack, top to bottom: RISC, Segment Graph, SystemC IR, ROSE IR, C/C++ Foundation]
[Figure: SystemC Model (systemc.h, Model.cpp) → SystemC Compiler RISC (Segment Graph Construction, Parallel Access Conflict Analysis, …) → Parallel C++ Model (Model_par.cpp) → Compilation, Simulation; Segment Graph with Seg 1 to Seg 6]

Dedicated SystemC Compiler
• Segment Graph
  – Segment Graph is a directed graph
    • Nodes: Segments
      Code statements executed between two scheduling steps
      – Expression statements
      – Control flow statements (if, while, …)
      – Function calls
    • Edges: Segment boundaries
      Primitives that trigger scheduler entry
      – wait(event)
      – wait(time)
  – Segment Graph can be constructed statically by the compiler from the model source code
    • (see example on next slide)
[Figure: Segment Graph with Seg 1 to Seg 6]
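
The Segment Graph definition above (segments as nodes, scheduler-entry primitives as edges) maps onto a small directed-graph container. The following is a minimal sketch under assumptions: `Edge`, `Segment` and `SegmentGraph` are hypothetical names chosen for illustration, statements are kept as plain strings, and nothing here reflects the actual RISC data structures built on the ROSE IR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Edge: a segment boundary, i.e. the wait(...) that hands control to the scheduler.
struct Edge {
    std::size_t target;    // segment entered after the boundary
    std::string boundary;  // e.g. "wait(event)" or "wait(time)", kept as text here
};

// Node: the statements executed between two scheduling steps.
struct Segment {
    std::vector<std::string> statements;
    std::vector<Edge> out;
};

class SegmentGraph {
public:
    // Creates an empty segment and returns its ID.
    std::size_t add_segment() {
        nodes_.push_back(Segment{});
        return nodes_.size() - 1;
    }

    void add_statement(std::size_t seg, const std::string& stmt) {
        assert(seg < nodes_.size());
        nodes_[seg].statements.push_back(stmt);
    }

    void add_edge(std::size_t from, std::size_t to, const std::string& boundary) {
        assert(from < nodes_.size() && to < nodes_.size());
        nodes_[from].out.push_back(Edge{to, boundary});
    }

    const Segment& segment(std::size_t id) const {
        assert(id < nodes_.size());
        return nodes_[id];
    }

    std::size_t size() const { return nodes_.size(); }

private:
    std::vector<Segment> nodes_;
};
```

A loop around a wait() simply becomes an edge back to an earlier segment, which is why a graph rather than a list is needed.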

Dedicated SystemC Compiler
• Segment Graph Construction
  – Example: Source code and Segment Graph

    int a;
    if(cond) {
      int b;
      wait(1);
      int c;
    } else {
      int d;
    }
    int e;
    wait(2);
    int f;
    while(cond) {
      int g;
    }
    int h;

    Initial segment:      int a; condition; int b; int d; int e;
    Segment at wait(1):   int c; int e;
    Segment at wait(2):   int f; condition; int g; int h;

Dedicated SystemC Compiler
• Segment Graph Construction:
  – Support for straight-line code

    void straight() {
      x = 42;
      int xx = 43;
      int yy;
      yy;
      int o = y;
      wait(10, SC_NS);
      wait();
      int kk;
      wait();
      int oo;
    }

    Segment ID: 0
      input_straight.cpp:24 (this) -> x = 42
      input_straight.cpp:25 int xx = 43;
      input_straight.cpp:26 int yy;
      input_straight.cpp:27 yy
      input_straight.cpp:28 int o = (this) -> y;
    Segment ID: 1 (input_straight.cpp:30)
    Segment ID: 2 (input_straight.cpp:32)
      input_straight.cpp:34 int kk;
    Segment ID: 3 (input_straight.cpp:37)
      input_straight.cpp:39 int oo;

Dedicated SystemC Compiler
• Segment Graph Construction:
  – Support for conditional statements: if, if-else, switch-case (with break)

    void if_statement() {
      wait();
      int aaa;
      if(test) {
        int bbb;
        wait();
        int ccc;
      }
      int ddd;
      wait();
      int eee;
    }

    Segment ID: 0
    Segment ID: 1 (input_if_else.cpp:27)
      input_if_else.cpp:28 int aaa;
      compilerGenerated:0 (this) -> test
      input_if_else.cpp:30 int bbb;
      input_if_else.cpp:34 int ddd;
    Segment ID: 2 (input_if_else.cpp:31)
      input_if_else.cpp:32 int ccc;
      input_if_else.cpp:34 int ddd;
    Segment ID: 3 (input_if_else.cpp:35)
      input_if_else.cpp:36 int eee;
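
For the straight-line case shown above, segment construction amounts to cutting the statement list at every wait call. The sketch below illustrates only that simplest case and is not the RISC algorithm: `is_wait` and `split_into_segments` are hypothetical helpers, statements are plain strings, and branches, loops and function calls (which the compiler handles on its IR) are out of scope.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Statements = std::vector<std::string>;

// Hypothetical helper: treat any statement starting with "wait(" as a segment boundary.
inline bool is_wait(const std::string& stmt) {
    return stmt.compare(0, 5, "wait(") == 0;
}

// Splits a straight-line thread body into segments.
// Segment 0 starts at thread entry; every wait(...) opens the next segment.
std::vector<Statements> split_into_segments(const Statements& body) {
    std::vector<Statements> segments(1);
    for (const std::string& stmt : body) {
        if (is_wait(stmt)) {
            segments.emplace_back();  // boundary: the scheduler is entered here
        } else {
            assert(!segments.empty());
            segments.back().push_back(stmt);
        }
    }
    return segments;
}
```

Applied to the body of straight(), this yields four segments: five statements in segment 0, an empty segment 1 between the two consecutive waits, then `int kk;` and `int oo;` in segments 2 and 3, matching the listing on the slide.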

Dedicated SystemC Compiler
• Segment Graph Construction:
  – Support for loop statements: while, do-while, for

    void while_statement() {
      wait();
      int kk;
      while(test) {
        int aa;
        wait();
        int bb;
      }
      int cc;
      wait();
      int dd;
    }

    Segment ID: 0
    Segment ID: 1 (input_while.cpp:13)
      input_while.cpp:14 int kk;
      compilerGenerated:0 (this) -> test
      input_while.cpp:16 int aa;
      input_while.cpp:20 int cc;
    Segment ID: 2 (input_while.cpp:17)
      input_while.cpp:18 int bb;
      compilerGenerated:0 (this) -> test
      input_while.cpp:16 int aa;
      input_while.cpp:20 int cc;
    Segment ID: 3 (input_while.cpp:21)
      input_while.cpp:22 int dd;

Dedicated SystemC Compiler
• Segment Graph Construction:
  – Support for loop statements: while, do-while, for (with break, continue)

    void while_continue_statement() {
      int kk;
      while(test) {
        int aa;
        wait();
        int bb;
        if(test1) {
          continue;
        }
        int oo;
        wait();
        int cc;
      }
      int dd;
      wait();
    }

    Segment ID: 0
      input_while_continue.cpp:49 int kk;
      compilerGenerated:0 (this) -> test
      input_while_continue.cpp:51 int aa;
      input_while_continue.cpp:61 int dd;
    Segment ID: 1 (input_while_continue.cpp:52)
      input_while_continue.cpp:53 int bb;
      compilerGenerated:0 (this) -> test1
      input_while_continue.cpp:55 continue;
      input_while_continue.cpp:57 int oo;
      compilerGenerated:0 (this) -> test
      input_while_continue.cpp:51 int aa;
      input_while_continue.cpp:61 int dd;
    Segment ID: 2 (input_while_continue.cpp:58)
      input_while_continue.cpp:59 int cc;
      compilerGenerated:0 (this) -> test
      input_while_continue.cpp:51 int aa;
      input_while_continue.cpp:61 int dd;
    Segment ID: 3 (input_while_continue.cpp:62)

Dedicated SystemC Compiler
• Segment Graph Construction:
  – Support for function calls: f(x), return

    void f() {
      int aa;
      wait();
      int bb;
      g1();
      int cc;
      wait();
      int dd;
    }

    int g1() {
      int g_0;
      wait();
      int g_1 = 33;
      if(g_1 == 88) {
        int g_2;
        wait();
        int g_3 = 44;
        return 43;
        int DEAD_CODE;
      }
      int g_4;
      wait();
      int g_5;
      wait();
      int g_6;
      int return_value = 2;
      return return_value;
    }

    Segment ID: 0
      input_function_calls.cpp:151 int aa;
    Segment ID: 1 (input_function_calls.cpp:152)
      input_function_calls.cpp:153 int bb;
      input_function_calls.cpp:154 (this) -> g1();
      input_function_calls.cpp:162 int g_0;
    Segment ID: 2 (input_function_calls.cpp:163)
      input_function_calls.cpp:164 int g_1 = 33;
      input_function_calls.cpp:166 g_1 == 88
      input_function_calls.cpp:167 int g_2;
      input_function_calls.cpp:173 int g_4;
    Segment ID: 3 (input_function_calls.cpp:168)
      input_function_calls.cpp:169 int g_3 = 44;
      input_function_calls.cpp:170 43
      input_function_calls.cpp:155 int cc;
    Segment ID: 4 (input_function_calls.cpp:174)
      input_function_calls.cpp:175 int g_5;
    Segment ID: 5 (input_function_calls.cpp:176)
      input_function_calls.cpp:177 int g_6;
      input_function_calls.cpp:178 int return_value = 2;
      input_function_calls.cpp:179 return_value
      input_function_calls.cpp:155 int cc;
    Segment ID: 6 (input_function_calls.cpp:156)
      input_function_calls.cpp:157 int dd;

Dedicated SystemC Compiler
• Segment Graph Construction:
  – Support for recursive function calls: direct, indirect recursion

    void main() {
      wait();
      f();
      wait();
    }

    void f() {
      wait();
      if(xx>0) {
        wait();
        g();
        wait();
      }
      wait();
      return;
    }

    void g() {
      xx--;
      wait();
      if(xx>0) {
        wait();
        int before_rec;
        f();
        int after_rec;
        wait();
      } else {
        wait();
        return;
      }
    }

    Segment ID: 0
    Segment ID: 1 (input_recursive.cpp:151)
      input_recursive.cpp:152 (this) -> recursive1();
    Segment ID: 2 (input_recursive.cpp:159)
      input_recursive.cpp:160 (this) -> xx > 0
    Segment ID: 3 (input_recursive.cpp:161)
      input_recursive.cpp:162 (this) -> recursive2();
      input_recursive.cpp:171 (this) -> xx--
    Segment ID: 4 (input_recursive.cpp:172)
      input_recursive.cpp:173 (this) -> xx > 0
    Segment ID: 5 (input_recursive.cpp:180)
      compilerGenerated:0
    Segment ID: 6 (input_recursive.cpp:174)
      input_recursive.cpp:175 int before_rec;
      input_recursive.cpp:176 (this) -> recursive1()
    Segment ID: 7 (input_recursive.cpp:163)
    Segment ID: 8 (input_recursive.cpp:165)
      compilerGenerated:0
      input_recursive.cpp:177 int after_rec;
    Segment ID: 9 (input_recursive.cpp:178)
    Segment ID: 10 (input_recursive.cpp:153)

Dedicated SystemC Compiler
• RISC Software Stack: Recoding Infrastructure for SystemC
  1) Segment Graph construction
  2) Segment conflict analysis
[Figure: SystemC Model (systemc.h, Model.cpp) → SystemC Compiler RISC (Segment Graph Construction, Parallel Access Conflict Analysis, …) → Parallel C++ Model (Model_par.cpp) → Compilation, Simulation. Instrumentation!]
• Step 2: Perform Conflict Analysis
[Figure: Segment Graph with Seg 1 to Seg 6; variable accesses Seg 2: R: a, b; W: x; RW: z and Seg 3: R: a, b; W: x, y; RW: (none); conflict table over Seg 1 to Seg 3 with entries marked True]

Dedicated SystemC Compiler
• Segment Conflict Analysis
  – Need to comply with SystemC LRM [IEEE Std 1666™]
    • Cooperative (or co-routine) multitasking semantics
      – “process instances execute without interruption”
      – System designer “can assume that a method process will execute in its entirety without interruption”
    • A parallel implementation “would be obliged to analyze any dependencies between processes and constrain their execution to match the co-routine semantics.”
  – Must avoid race conditions when using shared variables!
    • Prevent conflicting segments from being scheduled in parallel
[Figure: variable accesses Seg 2: R: a, b; W: x; RW: z and Seg 3: R: a, b; W: x, y; RW: (none); conflict table over Seg 1 to Seg 3 with entries marked True]

Dedicated SystemC Compiler
• Segment Conflict Analysis:
  – Variable access analysis for Read, Write, and Read/Write
  – Example:

    class Conflict: public sc_module {
      SC_CTOR(Conflict) {
        SC_THREAD(thread1);
        SC_THREAD(thread2);
      }
      int x, y, z;

      void thread1() {
        int a;
        a = 2;
        wait();
        a = x + y;
        wait();
        z++;
      }

      void thread2() {
        int b = 2;
        x = y;
        wait();
        x = y * z;
        wait();
        z++;
        wait();
        x++;
      }
    };

    Segment Graph:
    Segment ID: 0
      conflict.cpp:24 int a;
      conflict.cpp:25 a = 2
    Segment ID: 1 (conflict.cpp:26)
      conflict.cpp:27 a = x + y
    Segment ID: 2 (conflict.cpp:28)
      conflict.cpp:29 z++
    Segment ID: 3
      conflict.cpp:34 int b = 2;
      conflict.cpp:35 x = y
    Segment ID: 4 (conflict.cpp:36)
      conflict.cpp:37 x = y * z
    Segment ID: 5 (conflict.cpp:38)
      conflict.cpp:39 z++
    Segment ID: 6 (conflict.cpp:40)
      conflict.cpp:41 x++

[Figure: Segment Variable Accesses for the Segment Graph above]
[Figure: Segment Data Conflict Table derived from the Segment Variable Accesses]
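
The step from per-segment variable accesses to a data conflict table can be sketched for the Conflict example above. This is a hedged illustration: `Access`, `conflicts`, `build_conflict_table` and `example_segments` are hypothetical names, the access sets are read off the example by hand (only the shared members x, y, z count; a and b are thread-local), and the usual rule "two segments conflict if one writes a variable the other reads or writes" is assumed. The actual RISC analysis additionally filters by scope, instance path, references and port mapping.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Shared variables read and written by one segment.
struct Access {
    std::set<std::string> reads;
    std::set<std::string> writes;
};

static bool intersects(const std::set<std::string>& a, const std::set<std::string>& b) {
    for (const std::string& v : a) {
        if (b.count(v) != 0) return true;
    }
    return false;
}

// Assumed rule: write/write, write/read or read/write on the same variable is a conflict.
bool conflicts(const Access& a, const Access& b) {
    return intersects(a.writes, b.writes) ||
           intersects(a.writes, b.reads) ||
           intersects(a.reads, b.writes);
}

// table[i][j] is true if segments i and j must not run in parallel.
std::vector<std::vector<bool>> build_conflict_table(const std::vector<Access>& segs) {
    assert(!segs.empty());
    const std::size_t n = segs.size();
    std::vector<std::vector<bool>> table(n, std::vector<bool>(n, false));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            table[i][j] = conflicts(segs[i], segs[j]);
        }
    }
    return table;
}

// Access sets for segments 0..6 of the Conflict example (derived by hand).
std::vector<Access> example_segments() {
    return {
        Access{{}, {}},               // Seg 0: int a; a = 2        (locals only)
        Access{{"x", "y"}, {}},       // Seg 1: a = x + y
        Access{{"z"}, {"z"}},         // Seg 2: z++
        Access{{"y"}, {"x"}},         // Seg 3: int b = 2; x = y
        Access{{"y", "z"}, {"x"}},    // Seg 4: x = y * z
        Access{{"z"}, {"z"}},         // Seg 5: z++
        Access{{"x"}, {"x"}},         // Seg 6: x++
    };
}
```

Under these assumptions, segment 1 of thread1 (reads x) conflicts with segments 3, 4 and 6 of thread2 (all write x), while it may run in parallel with segment 5, which only touches z.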

SystemC Compiler and Simulator
• Compiler and Simulator work hand in hand!
  – Compiler performs conservative static analysis
  – Analysis results are passed to the simulator
  – Simulator can make safe scheduling decisions quickly
• Automatic Model Instrumentation
  – Static analysis results are inserted into the source code
  – Model Instrumentation:
    • Segment and Instance IDs
    • Segment Conflict Tables
    • Time Advance Tables
[Figure: Input Model (systemc.h, Model.cpp) → SystemC Compiler RISC (…, Source Code Instrumentation) → Parallel C++ Model (systemc_par.h, Model_par.cpp) → C++ Compiler with Parallel SystemC Library → SystemC Simulator, Parallel Simulation]

SystemC Compiler and Simulator
• Compiler and Simulator work hand in hand!
  – Compiler performs conservative static analysis
  – Analysis results are passed to the simulator
  – Simulator can make safe scheduling decisions quickly
• Automatic Model Instrumentation
  1) Segment and instance IDs
     • Threads identified by creator instance and current code location
  2) Data and event conflict tables
     • Segment concurrency hazards identified by fast table lookup (filtered for scope, instance path, references and port mapping)
  3) Current and next time advance tables
     • Prediction of future thread states: better scheduling decisions by looking ahead in time (future optimization)

Parallel SystemC Simulator
• Simulator kernel with Out-of-Order Parallel Scheduler
  – Conceptual OoO PDES execution
  – Issue threads…
    • truly in parallel and out-of-order
    • whenever they are ready
    • and will have no conflicts!
  – Fast conflict table lookup
  – Smart thread-to-core mapping (future optimization)
[Figure: Issue Threads]
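
How a scheduler might combine thread timestamps with the instrumented conflict table can be sketched as follows. This is a simplified, assumed decision rule for illustration only: a ready thread is issued unless a thread with an earlier timestamp is in a segment that conflicts with it. `ThreadState`, `ConflictTable` and `can_issue` are hypothetical names; event conflicts and the time advance tables mentioned on the slides are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Per-thread scheduling state (assumed representation).
struct ThreadState {
    unsigned long time;   // local time cycle of the thread
    unsigned delta;       // local delta cycle of the thread
    std::size_t segment;  // ID of the segment the thread executes next
};

// table[i][j] is true if segments i and j must not run in parallel.
using ConflictTable = std::vector<std::vector<bool>>;

static bool earlier(const ThreadState& a, const ThreadState& b) {
    return std::tie(a.time, a.delta) < std::tie(b.time, b.delta);
}

// Out-of-order issue check: one table lookup per other thread.
bool can_issue(std::size_t candidate,
               const std::vector<ThreadState>& threads,
               const ConflictTable& table) {
    assert(candidate < threads.size());
    const ThreadState& c = threads[candidate];
    for (std::size_t i = 0; i < threads.size(); ++i) {
        if (i == candidate) continue;
        const ThreadState& other = threads[i];
        assert(other.segment < table.size() && c.segment < table[other.segment].size());
        // A thread that is behind in simulated time could still affect the candidate.
        if (earlier(other, c) && table[other.segment][c.segment]) return false;
    }
    return true;
}
```

Because the table is computed at compile time, the run-time cost per decision is a handful of array lookups, which is what makes the "fast decision making at run-time" on the slides plausible.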

Parallel SystemC Simulator
• Protection of Inter-Thread Communication
  – Need to comply with SystemC LRM [IEEE Std 1666™]
    • Cooperative (or co-routine) multitasking semantics
      – Threads can assume execution “without interruption”
  – Must protect inter-thread communication in channels!
    • Primitive SystemC channels
      Static protection (special parallel SystemC headers, library)
    • User-defined hierarchical channels
      Dynamic protection through source code instrumentation
[Figure: Thread 1 and Thread 2 communicating through a Channel]

Demo and Experimental Results
• Interactive Demonstration
  – Two Application Examples
    • DVD player (conceptual)
    • Mandelbrot renderer (embarrassingly parallel)
  – Compilation
    • Static analysis based on segment graph
    • Conflict analysis and source code instrumentation
  – Simulation
    • Accellera reference library (Posix-based, sequential)
    • RISC simulator library (Posix-based, out-of-order parallel)
[Figure: Input Model (systemc.h, Model.cpp) → SystemC Compiler RISC (Segment Graph, Conflict Analysis, Source Code Instrumentation) → systemc_par.h, Model_par.cpp → SystemC Simulator with Parallel SystemC Library, Parallel Simulation]
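
The channel protection described for the Parallel SystemC Simulator can be pictured as a lock around every channel method, so that two threads running truly in parallel never interleave inside the channel. The sketch below is plain standard C++ and only an analogy: `ProtectedFifo` is a hypothetical class, not a SystemC channel and not the RISC library's mechanism, and it is non-blocking where a real channel would suspend the caller with wait(event).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>

// Bounded FIFO whose methods are mutually exclusive (illustrative only).
template <typename T>
class ProtectedFifo {
public:
    explicit ProtectedFifo(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Returns false instead of blocking when the FIFO is full.
    bool try_send(const T& value) {
        std::lock_guard<std::mutex> guard(lock_);  // protects the whole method body
        if (items_.size() >= capacity_) return false;
        items_.push_back(value);
        return true;
    }

    // Returns false instead of blocking when the FIFO is empty.
    bool try_receive(T& out) {
        std::lock_guard<std::mutex> guard(lock_);
        if (items_.empty()) return false;
        out = items_.front();
        items_.pop_front();
        return true;
    }

private:
    std::mutex lock_;
    std::size_t capacity_;
    std::deque<T> items_;
};
```

Taking the lock at method entry and releasing it at exit is the kind of wrapper that source code instrumentation could insert into user-defined channel methods, which is presumably why the slides call that protection "dynamic".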

Example Model 1: DVD Player
• DVD Player Example (conceptual)
  – Parallel video and audio decoding with different frame rates

    SC_MODULE(VideoCodec)
    { sc_port<i_receiver> p1;
      sc_port<i_sender> p2;
      …
      while(1){
        p1->receive(&inFrm);
        outFrm = decode(inFrm);
        wait(33330, SC_US);
        p2->send(outFrm);
      }
    };

    SC_MODULE(AudioCodec)
    { sc_port<i_receiver> p1;
      sc_port<i_sender> p2;
      …
      while(1){
        p1->receive(&inFrm);
        outFrm = decode(inFrm);
        wait(26120, SC_US);
        p2->send(outFrm);
      }
    };

[Figure: Stimulus (multimedia input stream) → DUT (Video Codec, Left Audio Codec, Right Audio Codec) → Video Monitor, Left Speaker, Right Speaker; Video at 30 FPS, 2 Audio Channels at 38.28 FPS]

Example Model 1: DVD Player
• DVD Player Example (conceptual)
  – Parallel video and audio decoding with different frame rates
  1. Real time schedule: fully parallel
  2. Reference simulator schedule (DES)
  3. Synchronous parallel schedule (PDES)
  4. Out-of-order parallel schedule (OoO PDES)
[Figure: four schedules over Time [ms]; Video row with Frame 1 to Frame 3 and boundaries at 33.33, 66.67, 100; Left and Right audio rows with LF 1 to LF 4 and RF 1 to RF 4 and boundaries at 0, 26.12, 52.25, 78.38]

Example Model 2: Mandelbrot
• Mandelbrot Renderer (Graphics Pipeline Application)
  – Mandelbrot Set
    • Mathematical set of points in complex plane
      – Two-dimensional fractal shape
    • High computation load
      – Recursive/iterative function
    • Embarrassingly parallel
      – Parallelism at pixel level
  – SystemC Model
    • TLM abstraction
    • Parallel slices
    • Configurable
    • Executable
[Figure: Top, Platform, Stimulus → din → DUT (Coordinator with parallel Mandelbrot modules M) → dout → Monitor]
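
The pixel-level parallelism of the Mandelbrot model can be illustrated with the classic escape-time iteration plus a split of the image into independent row slices. This is a generic sketch, not the model from the talk: `escape_time`, `Slice`, `make_slices` and `render_slice` are hypothetical names, and the viewport ([-2, 1] by [-1.5, 1.5]) and escape radius 2 are assumptions.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Number of iterations of z = z*z + c before |z| exceeds 2 (capped at max_iter).
int escape_time(std::complex<double> c, int max_iter) {
    std::complex<double> z(0.0, 0.0);
    int n = 0;
    while (n < max_iter && std::norm(z) <= 4.0) {  // std::norm is |z| squared
        z = z * z + c;
        ++n;
    }
    return n;
}

// A horizontal band of the image that one worker can render on its own.
struct Slice {
    std::size_t first_row;
    std::size_t row_count;
};

// Splits `height` rows into `slices` bands whose sizes differ by at most one row.
std::vector<Slice> make_slices(std::size_t height, std::size_t slices) {
    assert(slices > 0 && slices <= height);
    std::vector<Slice> result;
    const std::size_t base = height / slices;
    const std::size_t extra = height % slices;
    std::size_t row = 0;
    for (std::size_t s = 0; s < slices; ++s) {
        const std::size_t rows = base + (s < extra ? 1 : 0);
        result.push_back(Slice{row, rows});
        row += rows;
    }
    return result;
}

// Renders one slice; no data is shared with other slices.
std::vector<int> render_slice(const Slice& slice, std::size_t width,
                              std::size_t height, int max_iter) {
    std::vector<int> pixels;
    pixels.reserve(slice.row_count * width);
    for (std::size_t y = slice.first_row; y < slice.first_row + slice.row_count; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            const double re = -2.0 + 3.0 * static_cast<double>(x) / static_cast<double>(width);
            const double im = -1.5 + 3.0 * static_cast<double>(y) / static_cast<double>(height);
            pixels.push_back(escape_time(std::complex<double>(re, im), max_iter));
        }
    }
    return pixels;
}
```

Since each slice reads only its own coordinates and writes only its own pixels, slices have no data conflicts with each other, which fits the "no conflicts" observation reported for this model.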

Example Model 2: Mandelbrot
• Mandelbrot Renderer (Graphics Pipeline Application)
  – Simulated Graphics Demonstration (when network delays prevent actual graphical demo)

Experimental Results
• DVD Player Example (conceptual)
  Parallel video and audio decoding with different frame rates
  – Simulator run times on Intel® Xeon® multi-core host (‘delta’)
    (1 E3-1240 CPU, 3.4 GHz, 4 cores, 2 way hyper-threaded)
  – RISC V0.2.1, Posix-thread based comparison

                       Seq        Par        OoO
    10 sec stream
      Run Time         6.98 s     4.67 s     2.94 s
      CPU Load         97%        145%       238%
      Speedup          1x         1.49x      2.37x
    100 sec stream
      Run Time         68.21 s    45.91 s    28.13 s
      CPU Load         100%       149%       251%
      Speedup          1x         1.49x      2.42x

Experimental Results
• Mandelbrot Renderer Example
  Graphics Pipeline Application, embarrassingly parallel
  – Simulator run times on Intel® Xeon® multi-core host (‘phi’)
    (2 E5-2680 CPUs, 2.7 GHz, 8 cores, 2 way hyper-threaded)
  – RISC V0.2.1, Posix-thread based comparison

    Parallel   DES                  PDES                            OoO PDES
    Slices     Run Time   CPU Load  Run Time   CPU Load  Speedup    Run Time   CPU Load  Speedup
    1          162.13 s   99%       162.06 s   100%      1.00x      161.90 s   100%      1.00x
    2          162.19 s   99%       96.50 s    168%      1.68x      96.48 s    168%      1.68x
    4          162.56 s   99%       54.00 s    305%      3.01x      53.85 s    304%      3.02x
    8          163.10 s   99%       29.89 s    592%      5.46x      30.05 s    589%      5.43x
    16         164.01 s   99%       19.03 s    1050%     8.62x      20.08 s    997%      8.17x
    32         165.89 s   99%       11.78 s    2082%     14.08x     11.99 s    2023%     13.84x
    64         170.32 s   99%       9.79 s     2607%     17.40x     9.85 s     2608%     17.29x
    128        174.55 s   99%       9.34 s     2793%     18.69x     9.39 s     2787%     18.59x
    256        185.47 s   100%      8.91 s     2958%     20.82x     8.90 s     2964%     20.84x
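
As a worked check of the Speedup columns, assuming speedup is defined as the sequential (DES) run time divided by the parallel run time for the same configuration, the reported values can be reproduced from the run times. The slides do not spell out the definition, but this one matches the listed numbers.

```latex
\[
S = \frac{T_{\mathrm{DES}}}{T_{\mathrm{par}}}
\]
\[
S_{\mathrm{PDES}}(256\ \text{slices}) = \frac{185.47\,\mathrm{s}}{8.91\,\mathrm{s}} \approx 20.82,
\qquad
S_{\mathrm{OoO}}(256\ \text{slices}) = \frac{185.47\,\mathrm{s}}{8.90\,\mathrm{s}} \approx 20.84
\]
\[
S_{\mathrm{Par}}(\text{DVD},\ 10\,\mathrm{s}) = \frac{6.98\,\mathrm{s}}{4.67\,\mathrm{s}} \approx 1.49,
\qquad
S_{\mathrm{OoO}}(\text{DVD},\ 10\,\mathrm{s}) = \frac{6.98\,\mathrm{s}}{2.94\,\mathrm{s}} \approx 2.37
\]
```

For the Mandelbrot model the two parallel modes are nearly identical because the slices have no conflicts, whereas the DVD player, with its different video and audio frame periods, is where out-of-order issue gains over synchronous PDES.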

Experimental Results
• Mandelbrot Renderer Example
  Graphics Pipeline Application, embarrassingly parallel
  – Simulator run times on Intel® Many Integrated Core (MIC) Architecture
    Intel® Xeon Phi™ coprocessor
    • 5110P CPU at 1.052 GHz
    • 60 cores, 4 way hyper-threaded
    • Bidirectional ring interconnect, L2 cache
    • Appears as regular Linux machine with 240 cores!
  – Experimental result:
    • Traditional DES vs. synchronous PDES (no conflicts):
    • Run time (seq/par): 226.57 sec / 5.50 sec, 41x speedup!

RISC Compiler and Simulator
Out-of-Order Parallel SystemC Compiler and Simulator
• Open Source Prototype Implementation
  – Alpha Release V0.2.1 published October 30, 2015
    • http://www.cecs.uci.edu/~doemer/risc.html

      Source tar ball:         risc_v0.2.1.tar.gz
      Installation:            INSTALL, Makefile
      Doxygen documentation:   RISC API
      Doxygen documentation:   OoO Parallel SystemC API
      BSD license terms:       LICENSE

• Downloads and feedback welcome!
  – Code hardening
  – Extension of supported parallel SystemC subset
  – Standardization…

Concluding Remarks
• Project on Advanced Parallel SystemC Simulation
  – Out-of-Order PDES on many-core host platforms
  – Maximum compliance with current execution semantics
• SystemC Compiler Integrated with Parallel Simulator
  – Segment Graph based static analysis for parallel execution
  – Model instrumentation and protection of communication
  – Out-of-order parallel scheduler, many-core platform support
• Open Source
  – RISC V0.2.1, working prototype implementation
  – Available at www.cecs.uci.edu/~doemer/risc.html
• Ongoing and Future Work
  – Code hardening and virtual platform integration (i.e. Simics®)
  – Collaboration with Accellera SystemC Language WG

References
• [CECS-TR-15-02] G. Liu, T. Schmidt, R. Dömer: "RISC Compiler and Simulator, Alpha Release V0.2.1: Out-of-Order Parallel Simulatable SystemC Subset", Center for Embedded and Cyber-physical Systems, CECS, April 2015.
• [ASPDAC’15] G. Liu, T. Schmidt, R. Dömer, A. Dingankar, D. Kirkpatrick: "Optimizing Thread-to-Core Mapping on Manycore Platforms with Distributed Tag Directories", Proceedings of ASPDAC, Tokyo, Japan, January 2015.
• [IEEE TCAD’14] W. Chen, X. Han, C. Chang, G. Liu, R. Dömer: "Out-of-Order Parallel Discrete Event Simulation for Transaction Level Models", IEEE Transactions on CAD, vol. 33, no. 12, pp. 1859-1872, December 2014.
• [DATE’14] W. Chen, X. Han, R. Dömer: "May-Happen-in-Parallel Analysis based on Segment Graphs for Safe ESL Models", Proceedings of DATE, Dresden, Germany, March 2014. (Best Paper Award!)
• [DATE’13] W. Chen, R. Dömer: "Optimized Out-of-Order Parallel Discrete Event Simulation Using Predictions", Proceedings of DATE, Grenoble, France, March 2013.
• [IEEE D&T’13] W. Chen, X. Han, C. Chang, R. Dömer: "Advances in Parallel Discrete Event Simulation for Electronic System-Level Design", IEEE Design & Test of Computers, vol. 30, no. 1, pp. 45-54, Jan.-Feb. 2013.
• [DATE’12] W. Chen, X. Han, R. Dömer: "Out-of-Order Parallel Simulation for ESL Design", Proceedings of DATE, Dresden, Germany, March 2012.
• [ASPDAC’12] W. Chen, R. Dömer: "An Optimizing Compiler for Out-of-Order Parallel ESL Simulation Exploiting Instance Isolation", Proceedings of ASPDAC, Sydney, Australia, February 2012.