Out-of-Order Parallel Simulation of SystemC Models on
Many-Core Platforms
UNSW, 2 March 2016
Out-of-Order Parallel Simulation
of SystemC Models
on Many-Core Platforms
Presentation at UNSW, 2 March 2016
Rainer Dömer, Guantao Liu, Tim Schmidt
{doemer,guantaol,schmidtt}@uci.edu
Center for Embedded and Cyber-Physical Systems
University of California, Irvine
Project Context: ESL Design
• Electronic System Level Models
– Abstract description of a complete system
– Hardware + Software
• Key Concepts in System Modeling
– Explicit Structure
• Block diagram structure
• Connectivity through ports
[Figure: SystemC model with blocks B0, B1, B2, B3]
– Explicit Hierarchy
• System composed of components
– Explicit Concurrency
• Potential for parallel execution
• Potential for pipelined execution
– Explicit Communication and Computation
• Modules
• Channels and Interfaces
Out-of-Order Parallel SystemC, UNSW, 2 March 2016
(c) 2016 R. Doemer, CECS
Project Context: ESL Design
• Model Validation through Simulation!
– Efficient system-level simulation is critical
• Fast, and
• Accurate!
– Complexity of system models grows constantly
• Need for speed!
• Parallel Simulation!
– Parallelism explicitly specified in model
• System-level Description Language (SLDL)
– SystemC [Groetker et al., 2002]: SC_THREAD, SC_METHOD
– SpecC [Gajski et al., 2000]: par { }, pipe { }
– Parallel processing available in standard PCs
• Multi-core host PCs readily available
• Many-core technology is arriving
Project Context: Related Work
• Discrete Event Simulation is slow
• Modeling Techniques
– Transaction-level modeling (TLM)
– TLM temporal decoupling
– Savoiu et al. [MEMOCODE’05]
– Razaghi et al. [ASPDAC’12]
• Hardware-based Acceleration
– Sirowy et al. [DAC’10]
– Nanjundappa et al. [ASPDAC’10]
– Sinha et al. [ASPDAC’12]
• Distributed Simulation
– Chandy et al. [TSE’79]
– Huang et al. [SIES’08]
– Chen et al. [CECS’11]
• SMP Parallel Simulation
– Fujimoto [CACM’90]
– Chopard et al. [ICCS’06]
– Ezudheen et al. [PADS’09]
– Mello et al. [DATE’10]
– Schumacher et al. [CODES’11]
– Chen et al. [IEEE D&T’11]
– Yun et al. [TCAD’12]
Project with Intel: Key Points
• Advanced Parallel SystemC Simulation
– Out-of-Order PDES on many-core host platforms
– Maximum compliance with current execution semantics
– Support for parallel execution of virtual platforms
• Introduction of a Dedicated SystemC Compiler
– Recoding Infrastructure for SystemC (RISC)
– Advanced static analysis for parallel execution
– Model instrumentation and code generation
• Parallel SystemC Core Library
– Out-of-order parallel scheduler, multi-thread safe primitives
– Many-core target platform (e.g. Intel® Xeon Phi™)
• Open Source
– Collaboration with Accellera SystemC Language WG
Outline
• Advanced Parallel SystemC Simulation
– Traditional Discrete Event Simulation (DES)
– Parallel Discrete Event Simulation (PDES)
– Out-of-Order Parallel Discrete Event Simulation (OoO PDES)
• Project Overview
– SystemC Compiler and Parallel Simulation Kernel
• Recoding Infrastructure for SystemC (RISC), Segment Graph
• Out-of-order parallel thread scheduling, many-core platforms
– Demo and Experimental Results
• Embedded application: Conceptual DVD player
• Highly parallel application: Mandelbrot renderer
– Prototype Implementation
• Open source alpha release available
• Concluding Remarks
Out-of-Order PDES Technology
→ SystemC Simulation must be Fast and Accurate!
• Traditional Discrete Event Simulation (DES)
– Reference simulators run sequentially, only one thread at a time (cooperative multi-threading model)
– Cannot utilize the capabilities of multi- or many-core hosts
• Parallel Discrete Event Simulation (PDES)
– Threads run in parallel (if at the same delta cycle and time)
– Simulation cycles are absolute barriers!
• Out-of-order Parallel DE Simulation (OoO PDES)
– Threads run in parallel and out-of-order [DATE’12, TCAD’14], even in different delta and time cycles, if there are no conflicts!
– Aggressive, runs maximum number of threads in parallel, but fully preserves DES semantics and model accuracy!
Discrete Event Simulation (DES)
• Traditional DES
– Concurrent threads of execution
– Managed by a central scheduler
– Driven by events and time advances
• Delta-cycle
• Time-cycle
→ Partial temporal order with barriers
• Standard Simulator
– SystemC reference simulator uses cooperative multi-threading
→ A single thread is active at any time!
→ Cannot exploit parallelism
→ Cannot utilize multiple cores
[Figure: threads th1 to th4 over simulation cycles T:Δ = 0:0, 10:0, 10:1, 10:2, 20:0, 20:1, 20:2, 30:0]
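The T:Δ ordering that drives traditional DES can be sketched in a few lines. This is a minimal illustration, not the SystemC reference kernel: `Stamp`, `Wakeup`, `LaterFirst`, and `run_sequential` are hypothetical names, and the sketch only models the order in which a central scheduler would resume threads, one at a time.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Simulation timestamp T:Δ, ordered first by time, then by delta count.
struct Stamp {
    std::uint64_t time;
    std::uint32_t delta;
};

inline bool earlier(const Stamp& a, const Stamp& b) {
    return std::tie(a.time, a.delta) < std::tie(b.time, b.delta);
}

// Hypothetical record: thread `name` becomes runnable at stamp `at`.
struct Wakeup {
    Stamp at;
    std::string name;
};

// Comparator that makes std::priority_queue pop the earliest stamp first.
struct LaterFirst {
    bool operator()(const Wakeup& a, const Wakeup& b) const {
        return earlier(b.at, a.at);
    }
};

// Sequential DES: resume exactly one thread at a time in timestamp order.
// Returns the order in which the threads would be resumed.
inline std::vector<std::string> run_sequential(std::vector<Wakeup> pending) {
    std::priority_queue<Wakeup, std::vector<Wakeup>, LaterFirst> queue(
        LaterFirst{}, std::move(pending));
    std::vector<std::string> order;
    while (!queue.empty()) {
        order.push_back(queue.top().name);
        queue.pop();
    }
    return order;
}
```

Threads with identical stamps may be resumed in any order by this sketch, which mirrors the fact that DES only defines a partial temporal order.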
Parallel Discrete Event Simulation (PDES)
• Parallel DES
– Threads execute in parallel iff
• in the same delta cycle, and
• in the same time cycle
→ Significant speedup!
– Synchronous PDES: Cycle boundaries are absolute barriers!
• Aggressive Parallel DES
– Conservative Approaches
• Careful static analysis prevents conflicts
– Optimistic Approaches
• Conflicts are detected and addressed (roll back)
[Figure: threads th1 to th4 over simulation cycles T:Δ = 0:0, 10:0, 10:1, 10:2, 20:0, 20:1, 20:2, 30:0]
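The synchronous PDES rule (parallel only within the same time and delta cycle, with cycle boundaries as barriers) amounts to grouping ready threads by their stamp. The following is a sketch under that reading; `Stamp`, `Ready`, and `synchronous_batches` are hypothetical names, and real kernels also handle event notification and updates, which are omitted here.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// (time cycle, delta cycle); std::pair compares lexicographically.
using Stamp = std::pair<std::uint64_t, std::uint32_t>;
// A thread name together with the stamp at which it is ready to run.
using Ready = std::pair<Stamp, std::string>;

// Group ready threads into batches, one batch per simulation cycle.
// Threads inside a batch may run in parallel; consecutive batches are
// separated by a barrier and are returned in ascending stamp order.
inline std::vector<std::vector<std::string>> synchronous_batches(
    const std::vector<Ready>& ready) {
    std::map<Stamp, std::vector<std::string>> by_cycle;
    for (const Ready& r : ready) {
        by_cycle[r.first].push_back(r.second);
    }
    std::vector<std::vector<std::string>> batches;
    for (auto& entry : by_cycle) {
        batches.push_back(std::move(entry.second));
    }
    return batches;
}
```

With four threads of which two share stamp 10:0, the result has three batches, so at most two threads ever run concurrently. This is the limitation that out-of-order scheduling targets.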
Out-of-Order PDES Technology
• Out-of-Order Parallel DES
→ Breaks synchronization barrier!
– Threads execute in parallel iff
• in the same delta cycle, and
• in the same time cycle,
• OR if there are no conflicts!
→ Allows as many threads in parallel as possible
→ Significantly higher speedup!
• Results at [DATE’12], [IEEE TCAD’14]
→ Advanced compiler fully preserves…
• DES execution semantics
• Accuracy in results and timing
[Figure: threads th1 to th4 over simulation cycles T:Δ = 0:0, 10:0, 10:1, 10:2, 20:0, 20:1, 20:2, 30:0]
Out-of-Order PDES Technology
• OoO PDES Key Ideas
1. Dedicated SystemC compiler with advanced model analysis
→ Static conflict analysis based on Segment Graphs
2. Parallel simulator with out-of-order scheduling on many cores
→ Fast decision making at run-time, optimized mapping
• Fundamental Data Structure: Segment Graph
– Key to semantics-compliant out-of-order execution [DATE’12]
– Key to prediction of future thread state [DATE’13]
• “Optimized Out-of-Order Parallel DE Simulation Using Predictions”
– Key to May-Happen-in-Parallel Analysis [DATE’14]
• “May-Happen-in-Parallel Analysis based on Segment Graphs for Safe ESL Models” (Best Paper Award)
– Journal publication: “OoO PDES for TLM” [IEEE TCAD’14]
• Comprehensive article with HybridThreads extension
Project Overview and Tool Flow
• Research and Development Tasks
1) Dedicated SystemC compiler (RISC compiler)
2) Parallel SystemC simulator
3) Performance tuning for many-core hosts
4) Virtual Platform (VP) Library integration
5) Model analysis (may-happen-in-parallel, MHP)
6) Model recoding, transformation and optimization
[Figure: tool flow with boxes SystemC Model, SystemC Compiler (RISC), Parallel SystemC Headers, Parallel C++ Model, C++ Compiler, Parallel SystemC Library, Virtual Platform Library, VP Engine, VP-based Prototyping, Parallel Executable, Parallel Simulation, Many-Core Host Platform; Model Analysis with RISC MHP Analysis Tools and Reports; Model Transformation and Optimization with RISC Recoding Tools and Refined SystemC Model]
Project Status after Year 1 of 3
• Research and Development Tasks Completed
Y1 Dedicated SystemC compiler (RISC compiler)
Y1 Parallel SystemC simulator
Y1 Performance tuning for many-core hosts
Y2 Virtual Platform (VP) Library integration
Y3 Model analysis (may-happen-in-parallel, MHP)
Y3 Model recoding, transformation and optimization
[Figure: same tool flow as on the Project Overview slide]
Dedicated SystemC Compiler
• RISC Software Stack
→ Recoding Infrastructure for SystemC
– C/C++ foundation
– ROSE compiler infrastructure
• ROSE Internal Representation
• Explicit support for
• Source code analysis
• Source-to-source transformations
[Figure: software stack with layers RISC, ROSE IR, C/C++ Foundation]
Dedicated SystemC Compiler
• RISC Software Stack
→ Recoding Infrastructure for SystemC
– SystemC Internal Representation
• Class hierarchy to represent SystemC objects
[Figure: software stack with layers RISC, SystemC IR, ROSE IR, C/C++ Foundation]
Dedicated SystemC Compiler
• RISC Software Stack
→ Recoding Infrastructure for SystemC
1) Segment Graph construction
2) Segment conflict analysis
[Figure: software stack with layers RISC, Segment Graph, SystemC IR, ROSE IR, C/C++ Foundation]
[Figure: compiler flow with boxes SystemC Model (systemc.h, Model.cpp), SystemC Compiler (RISC: Segment Graph Construction, Parallel Access Conflict Analysis, …), Parallel C++ Model (Model_par.cpp), Compilation, Simulation]
Step 1: Build a Segment Graph
[Figure: Segment Graph with nodes Seg 1 to Seg 6]
Dedicated SystemC Compiler
• Segment Graph
– Segment Graph is a directed graph
• Nodes: Segments
→ Code statements executed between two scheduling steps
– Expression statements
– Control flow statements (if, while, …)
– Function calls
• Edges: Segment boundaries
→ Primitives that trigger scheduler entry
– wait(event)
– wait(time)
→ Segment Graph can be constructed statically by the compiler from the model source code
• (see example on next slide)
[Figure: Segment Graph with nodes Seg 1 to Seg 6]
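For straight-line code, the segment definition above reduces to cutting a statement list at every wait() call. The sketch below shows only that special case on statements given as plain strings; `Segment`, `is_wait`, and `split_segments` are hypothetical helpers, whereas the RISC compiler itself works on the ROSE internal representation and also handles branches, loops, and function calls.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One node of the segment graph: the statements between two wait() calls.
struct Segment {
    int id;
    std::vector<std::string> statements;
};

// A statement is treated as a segment boundary if it starts with "wait(".
inline bool is_wait(const std::string& stmt) {
    return stmt.rfind("wait(", 0) == 0;
}

// Split a straight-line thread body into consecutive segments.
// Segment 0 holds the code before the first wait(); each wait() opens
// a new segment, which stays empty if another wait() follows directly.
inline std::vector<Segment> split_segments(const std::vector<std::string>& body) {
    std::vector<Segment> segments;
    segments.push_back(Segment{0, {}});
    for (const std::string& stmt : body) {
        if (is_wait(stmt)) {
            int next_id = static_cast<int>(segments.size());
            segments.push_back(Segment{next_id, {}});
        } else {
            segments.back().statements.push_back(stmt);
        }
    }
    return segments;
}
```

In straight-line code the resulting graph is a simple chain, with one edge from each segment to the next. Control flow is what turns the chain into a general directed graph.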
Dedicated SystemC Compiler
• Segment Graph Construction
– Example: Source code
int a;
if(cond) {
    int b;
    wait(1);
    int c;
} else {
    int d;
}
int e;
wait(2);
int f;
while(cond) {
    int g;
}
int h;
– Resulting Segment Graph
• Initial segment: int a; condition; int b; int d; int e;
• Segment after wait(1): int c; int e;
• Segment after wait(2): int f; condition; int g; int h;
Dedicated SystemC Compiler
• Segment Graph Construction:
– Support for straight-line code
void straight()
{
    x = 42;
    int xx = 43;
    int yy;
    yy;
    int o = y;
    wait(10, SC_NS);
    wait();
    int kk;
    wait();
    int oo;
}
– Resulting Segment Graph
Segment ID: 0
input_straight.cpp:24 (this) -> x = 42
input_straight.cpp:25 int xx = 43;
input_straight.cpp:26 int yy;
input_straight.cpp:27 yy
input_straight.cpp:28 int o =(this) -> y;
Segment ID: 1 (input_straight.cpp:30)
Segment ID: 2 (input_straight.cpp:32)
input_straight.cpp:34 int kk;
Segment ID: 3 (input_straight.cpp:37)
input_straight.cpp:39 int oo;
Dedicated SystemC Compiler
• Segment Graph Construction:
– Support for conditional statements
→ if, if-else, switch-case (with break)
void if_statement()
{
    wait();
    int aaa;
    if(test) {
        int bbb;
        wait();
        int ccc;
    }
    int ddd;
    wait();
    int eee;
}
– Resulting Segment Graph
Segment ID: 0
Segment ID: 1 (input_if_else.cpp:27)
input_if_else.cpp:28 int aaa;
compilerGenerated:0 (this) -> test
input_if_else.cpp:30 int bbb;
input_if_else.cpp:34 int ddd;
Segment ID: 2 (input_if_else.cpp:31)
input_if_else.cpp:32 int ccc;
input_if_else.cpp:34 int ddd;
Segment ID: 3 (input_if_else.cpp:35)
input_if_else.cpp:36 int eee;
Dedicated SystemC Compiler
• Segment Graph Construction:
– Support for loop statements
→ while, do-while, for
void while_statement()
{
    wait();
    int kk;
    while(test) {
        int aa;
        wait();
        int bb;
    }
    int cc;
    wait();
    int dd;
}
– Resulting Segment Graph
Segment ID: 0
Segment ID: 1 (input_while.cpp:13)
input_while.cpp:14 int kk;
compilerGenerated:0 (this) -> test
input_while.cpp:16 int aa;
input_while.cpp:20 int cc;
Segment ID: 2 (input_while.cpp:17)
input_while.cpp:18 int bb;
compilerGenerated:0 (this) -> test
input_while.cpp:16 int aa;
input_while.cpp:20 int cc;
Segment ID: 3 (input_while.cpp:21)
input_while.cpp:22 int dd;
Dedicated SystemC Compiler
• Segment Graph Construction:
– Support for loop statements
→ while, do-while, for (with break, continue)
void while_continue_statement()
{
    int kk;
    while(test){
        int aa;
        wait();
        int bb;
        if(test1) {
            continue;
        }
        int oo;
        wait();
        int cc;
    }
    int dd;
    wait();
}
– Resulting Segment Graph
Segment ID: 0
input_while_continue.cpp:49 int kk;
compilerGenerated:0 (this) -> test
input_while_continue.cpp:51 int aa;
input_while_continue.cpp:61 int dd;
Segment ID: 1 (input_while_continue.cpp:52)
input_while_continue.cpp:53 int bb;
compilerGenerated:0 (this) -> test1
input_while_continue.cpp:55 continue;
input_while_continue.cpp:57 int oo;
compilerGenerated:0 (this) -> test
input_while_continue.cpp:51 int aa;
input_while_continue.cpp:61 int dd;
Segment ID: 2 (input_while_continue.cpp:58)
input_while_continue.cpp:59 int cc;
compilerGenerated:0 (this) -> test
input_while_continue.cpp:51 int aa;
input_while_continue.cpp:61 int dd;
Segment ID: 3 (input_while_continue.cpp:62)
Dedicated SystemC Compiler
• Segment Graph Construction:
– Support for function calls
→ f(x), return
void f()
{
    int aa;
    wait();
    int bb;
    g1();
    int cc;
    wait();
    int dd;
}
int g1()
{
    int g_0;
    wait();
    int g_1 = 33;
    if(g_1 == 88) {
        int g_2;
        wait();
        int g_3 = 44;
        return 43;
        int DEAD_CODE;
    }
    int g_4;
    wait();
    int g_5;
    wait();
    int g_6;
    int return_value = 2;
    return return_value;
}
– Resulting Segment Graph
Segment ID: 0
input_function_calls.cpp:151 int aa;
Segment ID: 1 (input_function_calls.cpp:152)
input_function_calls.cpp:153 int bb;
input_function_calls.cpp:154 (this) -> g1();
input_function_calls.cpp:162 int g_0;
Segment ID: 2 (input_function_calls.cpp:163)
input_function_calls.cpp:164 int g_1 = 33;
input_function_calls.cpp:166 g_1 == 88
input_function_calls.cpp:167 int g_2;
input_function_calls.cpp:173 int g_4;
Segment ID: 3 (input_function_calls.cpp:168)
input_function_calls.cpp:169 int g_3 = 44;
input_function_calls.cpp:170 43
input_function_calls.cpp:155 int cc;
Segment ID: 4 (input_function_calls.cpp:174)
input_function_calls.cpp:175 int g_5;
Segment ID: 5 (input_function_calls.cpp:176)
input_function_calls.cpp:177 int g_6;
input_function_calls.cpp:178 int return_value = 2;
input_function_calls.cpp:179 return_value
input_function_calls.cpp:155 int cc;
Segment ID: 6 (input_function_calls.cpp:156)
input_function_calls.cpp:157 int dd;
Dedicated SystemC Compiler
• Segment Graph Construction:
– Support for recursive function calls
→ Direct, indirect recursion
void main()
{
    wait();
    f();
    wait();
}
void g()
{
    xx--;
    wait();
    if(xx>0) {
        wait();
        int before_rec;
        f();
        int after_rec;
        wait();
    } else {
        wait();
        return;
    }
}
void f()
{
    wait();
    if(xx>0) {
        wait();
        g();
        wait();
    }
    wait();
    return;
}
– Resulting Segment Graph
Segment ID: 0
Segment ID: 1 (input_recursive.cpp:151)
input_recursive.cpp:152 (this) -> recursive1();
Segment ID: 2 (input_recursive.cpp:159)
input_recursive.cpp:160 (this) -> xx > 0
Segment ID: 3 (input_recursive.cpp:161)
input_recursive.cpp:162 (this) -> recursive2();
input_recursive.cpp:171 (this) -> xx--
Segment ID: 4 (input_recursive.cpp:172)
input_recursive.cpp:173 (this) -> xx > 0
Segment ID: 5 (input_recursive.cpp:180)
compilerGenerated:0
Segment ID: 6 (input_recursive.cpp:174)
input_recursive.cpp:175 int before_rec;
input_recursive.cpp:176 (this) -> recursive1()
Segment ID: 7 (input_recursive.cpp:163)
Segment ID: 8 (input_recursive.cpp:165)
compilerGenerated:0
input_recursive.cpp:177 int after_rec;
Segment ID: 9 (input_recursive.cpp:178)
Segment ID: 10 (input_recursive.cpp:153)
Dedicated SystemC Compiler
• RISC Software Stack
→ Recoding Infrastructure for SystemC
1) Segment Graph construction
2) Segment conflict analysis
[Figure: software stack with layers RISC, Segment Graph, SystemC IR, ROSE IR, C/C++ Foundation]
[Figure: compiler flow with boxes SystemC Model (systemc.h, Model.cpp), SystemC Compiler (RISC: Segment Graph Construction, Parallel Access Conflict Analysis, …, Instrumentation), Parallel C++ Model (Model_par.cpp), Compilation, Simulation]
Step 2: Perform Conflict Analysis
• Segment variable accesses
– Seg 2: R: a, b; W: x; RW: z
– Seg 3: R: a, b; W: x, y; RW: (none)
[Figure: Segment Graph with nodes Seg 1 to Seg 6, and conflict table for Seg 1, Seg 2, Seg 3 with conflicting pairs marked True]
Dedicated SystemC Compiler
• Segment Conflict Analysis
– Need to comply with SystemC LRM [IEEE Std 1666™]
• Cooperative (or co-routine) multitasking semantics
– “process instances execute without interruption”
– System designer “can assume that a method process will execute in its entirety without interruption”
→ A parallel implementation “would be obliged to analyze any dependencies between processes and constrain their execution to match the co-routine semantics.”
– Must avoid race conditions when using shared variables!
→ Prevent conflicting segments from being scheduled in parallel
• Segment variable accesses
– Seg 2: R: a, b; W: x; RW: z
– Seg 3: R: a, b; W: x, y; RW: (none)
[Figure: conflict table for Seg 1, Seg 2, Seg 3 with conflicting pairs marked True]
Dedicated SystemC Compiler
• Segment Conflict Analysis:
– Variable access analysis for Read, Write, and Read/Write
– Example:
class Conflict: public sc_module {
public:
    SC_CTOR(Conflict)
    {
        SC_THREAD(thread1);
        SC_THREAD(thread2);
    }
    int x, y, z;   // shared between both threads
    void thread1()
    {
        int a;
        a = 2;
        wait();
        a = x + y;
        wait();
        z++;
    }
    void thread2()
    {
        int b = 2;
        x = y;
        wait();
        x = y * z;
        wait();
        z++;
        wait();
        x++;
    }
};
– Resulting Segment Graph
Segment ID: 0
conflict.cpp:24 int a;
conflict.cpp:25 a = 2
Segment ID: 1 (conflict.cpp:26)
conflict.cpp:27 a = x + y
Segment ID: 2 (conflict.cpp:28)
conflict.cpp:29 z++
Segment ID: 3
conflict.cpp:34 int b = 2;
conflict.cpp:35 x = y
Segment ID: 4 (conflict.cpp:36)
conflict.cpp:37 x = y * z
Segment ID: 5 (conflict.cpp:38)
conflict.cpp:39 z++
Segment ID: 6 (conflict.cpp:40)
conflict.cpp:41 x++
Dedicated SystemC Compiler
• Segment Conflict Analysis:
– Variable access analysis for Read, Write, and Read/Write
– Example:
[Figure: Segment Graph of the Conflict example (Segment IDs 0 to 6) next to the table of Segment Variable Accesses]
Dedicated SystemC Compiler
• Segment Conflict Analysis:
– Variable access analysis for Read, Write, and Read/Write
– Example:
[Figure: Segment Variable Accesses and the resulting Segment Data Conflict Table, with conflicting segment pairs marked x]
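The step from variable accesses to a data conflict table can be sketched as a set-intersection test. This is an illustration under a common assumption (two segments conflict if one writes a shared variable that the other reads or writes); the access sets below are read off the thread1/thread2 example by hand, and `Access`, `conflict`, `conflict_table`, and `example_segments` are hypothetical names, not the RISC API.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Shared variables read and written by one segment (locals are ignored).
struct Access {
    std::set<std::string> reads;
    std::set<std::string> writes;
};

inline bool overlaps(const std::set<std::string>& a, const std::set<std::string>& b) {
    for (const std::string& v : a) {
        if (b.count(v) != 0) return true;
    }
    return false;
}

// Assumed rule: conflict on write/write, write/read, or read/write overlap.
inline bool conflict(const Access& a, const Access& b) {
    return overlaps(a.writes, b.writes) || overlaps(a.writes, b.reads) ||
           overlaps(a.reads, b.writes);
}

// Full symmetric table; entry [i][j] is true if segments i and j conflict.
inline std::vector<std::vector<bool>> conflict_table(const std::vector<Access>& segs) {
    const std::size_t n = segs.size();
    std::vector<std::vector<bool>> table(n, std::vector<bool>(n, false));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            table[i][j] = conflict(segs[i], segs[j]);
        }
    }
    return table;
}

// Access sets for Segment IDs 0..6 of the Conflict example (x, y, z only).
inline std::vector<Access> example_segments() {
    std::vector<Access> segs;
    segs.push_back(Access{{}, {}});              // Seg 0: int a; a = 2
    segs.push_back(Access{{"x", "y"}, {}});      // Seg 1: a = x + y
    segs.push_back(Access{{"z"}, {"z"}});        // Seg 2: z++
    segs.push_back(Access{{"y"}, {"x"}});        // Seg 3: int b = 2; x = y
    segs.push_back(Access{{"y", "z"}, {"x"}});   // Seg 4: x = y * z
    segs.push_back(Access{{"z"}, {"z"}});        // Seg 5: z++
    segs.push_back(Access{{"x"}, {"x"}});        // Seg 6: x++
    return segs;
}
```

Under this rule, Seg 1 (reads x and y) conflicts with Segs 3, 4, and 6, which write x, but not with Seg 2. The sketch also reports pairs within the same thread and a writing segment against itself; a real analysis would filter the table further, for example by scope and instance.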
SystemC Compiler and Simulator
• Compiler and Simulator work hand in hand!
– Compiler performs conservative static analysis
– Analysis results are passed to the simulator
– Simulator can make safe scheduling decisions quickly
→ Automatic Model Instrumentation
→ Static analysis results are inserted into the source code
• Model Instrumentation:
– Segment and Instance IDs
– Segment Conflict Tables
– Time Advance Tables
[Figure: tool flow with boxes Input Model (systemc.h, Model.cpp), SystemC Compiler (RISC, …, Source Code Instrumentation), Parallel C++ Model (systemc_par.h, Model_par.cpp), SystemC Simulator (C++ Compiler, Parallel SystemC Library, Parallel Simulation)]
SystemC Compiler and Simulator
• Compiler and Simulator work hand in hand!
– Compiler performs conservative static analysis
– Analysis results are passed to the simulator
– Simulator can make safe scheduling decisions quickly
→ Automatic Model Instrumentation
1) Segment and instance IDs
• Threads identified by creator instance and current code location
2) Data and event conflict tables
• Segment concurrency hazards identified by fast table lookup (filtered for scope, instance path, references and port mapping)
3) Current and next time advance tables
• Prediction of future thread states
→ better scheduling decisions by looking ahead in time (future optimization)
Parallel SystemC Simulator
• Simulator kernel with Out-of-Order Parallel Scheduler
– Conceptual OoO PDES execution
Issue threads…
• truly in parallel and out-of-order
• whenever they are ready
• and will have no conflicts!
→ Fast conflict table lookup
→ Smart thread-to-core mapping (future optimization)
[Figure: conceptual scheduler issuing threads]
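The issue decision can be pictured as a table lookup per candidate thread. The sketch below is a deliberately simplified reading, not the RISC scheduler: it assumes a ready thread may be issued if its current segment has no conflict with the current segment of any other thread whose timestamp is earlier or equal. `ThreadState` and `may_issue` are hypothetical names, and event and time-advance conflicts are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical per-thread scheduler record.
struct ThreadState {
    std::size_t segment;   // ID of the segment the thread is in
    std::uint64_t time;    // local time cycle
    std::uint32_t delta;   // local delta cycle
};

// Decide whether `candidate` may be issued right now.
// `conflicts[i][j]` is the statically computed segment conflict table.
inline bool may_issue(std::size_t candidate,
                      const std::vector<ThreadState>& threads,
                      const std::vector<std::vector<bool>>& conflicts) {
    const ThreadState& c = threads[candidate];
    for (std::size_t i = 0; i < threads.size(); ++i) {
        if (i == candidate) continue;
        const ThreadState& other = threads[i];
        const bool not_later = std::tie(other.time, other.delta) <=
                               std::tie(c.time, c.delta);
        // A thread at an earlier or equal stamp could still touch shared
        // state, so a conflict with it blocks the candidate.
        if (not_later && conflicts[c.segment][other.segment]) {
            return false;
        }
    }
    return true;
}
```

In this model a thread at time 30 may run ahead of threads at times 10 and 20 as long as its segment is independent of theirs, which is the out-of-order behaviour described above. Because the table is precomputed, each decision costs one lookup per other thread.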
Parallel SystemC Simulator
• Protection of Inter-Thread Communication
– Need to comply with SystemC LRM [IEEE Std 1666™]
• Cooperative (or co-routine) multitasking semantics
– Threads can assume execution “without interruption”
– Must protect inter-thread communication in channels!
• Primitive SystemC channels
→ Static protection (special parallel SystemC headers, library)
• User-defined hierarchical channels
→ Dynamic protection through source code instrumentation
[Figure: Thread 1 and Thread 2 communicating through a Channel]
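What "protecting a channel" means can be illustrated with a lock around every channel method, so that a caller again sees each method run without interruption. This is a generic sketch using std::mutex; `ProtectedFifo` is a hypothetical class and does not show how the parallel SystemC headers or the RISC instrumentation actually implement the protection.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <queue>

// A small integer FIFO channel whose methods are serialized by one lock.
class ProtectedFifo {
public:
    // Append a value; safe to call from several threads at once.
    void send(int value) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push(value);
    }

    // Remove the oldest value into `out`; returns false if the FIFO is empty.
    bool try_receive(int& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        out = items_.front();
        items_.pop();
        return true;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return items_.size();
    }

private:
    mutable std::mutex mutex_;   // guards items_
    std::queue<int> items_;
};
```

The non-blocking `try_receive` keeps the sketch free of condition variables; a blocking receive in a simulator would instead suspend the calling thread through the scheduler.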
Demo and Experimental Results
• Interactive Demonstration
– Two Application Examples
• DVD player (conceptual)
• Mandelbrot renderer (embarrassingly parallel)
– Compilation
• Static analysis based on segment graph
• Conflict analysis and source code instrumentation
– Simulation
• Accellera reference library (Posix-based, sequential)
• RISC simulator library (Posix-based, out-of-order parallel)
[Figure: tool flow with boxes Input Model (systemc.h, Model.cpp), SystemC Compiler (RISC: Segment Graph, Conflict Analysis, Source Code Instrumentation), systemc_par.h, Model_par.cpp, SystemC Simulator (Parallel SystemC Library, Parallel Simulation)]
Example Model 1: DVD Player
• DVD Player Example (conceptual)
– Parallel video and audio decoding with different frame rates
SC_MODULE(VideoCodec)
{
    sc_port<i_receiver> p1;
    sc_port<i_sender> p2;
    …
    while(1){
        p1->receive(&inFrm);
        outFrm = decode(inFrm);
        wait(33330, SC_US);   // one video frame period
        p2->send(outFrm);
    }
};
SC_MODULE(AudioCodec)
{
    sc_port<i_receiver> p1;
    sc_port<i_sender> p2;
    …
    while(1){
        p1->receive(&inFrm);
        outFrm = decode(inFrm);
        wait(26120, SC_US);   // one audio frame period
        p2->send(outFrm);
    }
};
[Figure: Stimulus (multimedia input stream) feeding a DUT with Video Codec, Left Audio Codec, and Right Audio Codec, connected to Video Monitor, Left Speaker, and Right Speaker. Video: 30 FPS; 2 audio channels: 38.28 FPS]
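The frame rates quoted for the DVD player follow directly from the wait() periods in the two codecs. A small worked calculation, with hypothetical helper names `frames_per_second` and `frame_done_ms`:

```cpp
#include <cassert>
#include <cmath>

// Frames per second for a codec that waits `period_us` microseconds per frame.
inline double frames_per_second(double period_us) {
    assert(period_us > 0.0);
    return 1.0e6 / period_us;
}

// Simulated time in milliseconds at which the n-th frame (1-based) is sent,
// assuming decoding itself takes no simulated time.
inline double frame_done_ms(int n, double period_us) {
    return n * period_us / 1000.0;
}
```

With 33330 us per video frame this gives about 30.00 FPS, and with 26120 us per audio frame about 38.28 FPS. The second video frame is sent at 66.66 ms, which agrees with the 66.67 ms tick of the schedule charts up to rounding.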
Example Model 1: DVD Player
• DVD Player Example (conceptual)
– Parallel video and audio decoding with different frame rates
[Figure: Stimulus, DUT (Video Codec, Left Audio Codec, Right Audio Codec), Video Monitor, Left Speaker, Right Speaker]
1. Real time schedule: fully parallel
[Figure: schedule over Time [ms]. Video: Frames 1 to 3 with ticks at 0, 33.33, 66.67, 100. Left and Right audio: LF/RF 1 to 4 with ticks at 0, 26.12, 52.25, 78.38]
2. Reference simulator schedule (DES)
[Figure: the same video and audio frames over the same time axis, as scheduled by the sequential reference simulator]
Example Model 1: DVD Player
• DVD Player Example (conceptual)
– Parallel video and audio decoding with different frame rates
[Figure: Stimulus, DUT (Video Codec, Left Audio Codec, Right Audio Codec), Video Monitor, Left Speaker, Right Speaker]
1. Real time schedule: fully parallel
[Figure: schedule over Time [ms]. Video: Frames 1 to 3 with ticks at 0, 33.33, 66.67, 100. Left and Right audio: LF/RF 1 to 4 with ticks at 0, 26.12, 52.25, 78.38]
3. Synchronous parallel schedule (PDES)
[Figure: the same video and audio frames over the same time axis, as scheduled by synchronous PDES]
Example Model 1: DVD Player
• DVD Player Example (conceptual)
– Parallel video and audio decoding with different frame rates
[Figure: Stimulus, DUT (Video Codec, Left Audio Codec, Right Audio Codec), Video Monitor, Left Speaker, Right Speaker]
1. Real time schedule: fully parallel
[Figure: schedule over Time [ms]. Video: Frames 1 to 3 with ticks at 0, 33.33, 66.67, 100. Left and Right audio: LF/RF 1 to 4 with ticks at 0, 26.12, 52.25, 78.38]
4. Out-of-order parallel schedule (OoO PDES)
[Figure: the same video and audio frames over the same time axis, as scheduled by OoO PDES]
Example Model 2: Mandelbrot
• Mandelbrot Renderer (Graphics Pipeline Application)
– Mandelbrot Set
• Mathematical set of points in complex plane
– Two-dimensional fractal shape
• High computation load
– Recursive/iterative function
• Embarrassingly parallel
– Parallelism at pixel level
– SystemC Model
• TLM abstraction
• Parallel slices
• Configurable
• Executable
[Figure: Top, Platform, Stimulus, DUT with din, Coordinator, four Mandelbrot modules M, and dout, Monitor]
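The two properties named above, an iterative per-pixel function and independent slices, can be sketched as follows. This is the textbook escape-time iteration, not the code of the demonstrated model; `escape_iterations` and `slice_rows` are hypothetical names, and splitting the image by rows is an assumption, since the slides do not say how the slices are cut.

```cpp
#include <cassert>
#include <complex>
#include <utility>
#include <vector>

// Escape-time iteration for one pixel: count how many steps of
// z = z*z + c it takes until |z| exceeds 2, capped at max_iter.
inline int escape_iterations(std::complex<double> c, int max_iter) {
    std::complex<double> z(0.0, 0.0);
    for (int i = 0; i < max_iter; ++i) {
        if (std::norm(z) > 4.0) return i;   // norm(z) is |z| squared
        z = z * z + c;
    }
    return max_iter;
}

// Split `rows` image rows into `slices` contiguous [begin, end) ranges
// whose sizes differ by at most one row.
inline std::vector<std::pair<int, int>> slice_rows(int rows, int slices) {
    assert(rows >= 0 && slices > 0);
    std::vector<std::pair<int, int>> ranges;
    const int base = rows / slices;
    const int extra = rows % slices;
    int begin = 0;
    for (int s = 0; s < slices; ++s) {
        const int length = base + (s < extra ? 1 : 0);
        ranges.emplace_back(begin, begin + length);
        begin += length;
    }
    return ranges;
}
```

Each pixel depends only on its own coordinate, so slices share no data and a conflict analysis finds nothing to serialize between them.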
Example Model 2: Mandelbrot
• Mandelbrot Renderer (Graphics Pipeline Application)
→ Simulated Graphics Demonstration (when network delays prevent actual graphical demo)
Experimental Results
• DVD Player Example (conceptual)
→ Parallel video and audio decoding with different frame rates
– Simulator run times on Intel® Xeon® multi-core host (‘delta’) (1 E3-1240 CPU, 3.4 GHz, 4 cores, 2 way hyper-threaded)
– RISC V0.2.1, Posix-thread based comparison

Stream    Metric     Seq       Par       OoO
10 sec    Run Time   6.98 s    4.67 s    2.94 s
10 sec    CPU Load   97%       145%      238%
10 sec    Speedup    1x        1.49x     2.37x
100 sec   Run Time   68.21 s   45.91 s   28.13 s
100 sec   CPU Load   100%      149%      251%
100 sec   Speedup    1x        1.49x     2.42x
Experimental Results
• Mandelbrot Renderer Example
→ Graphics Pipeline Application, embarrassingly parallel
– Simulator run times on Intel® Xeon® multi-core host (‘phi’) (2 E5-2680 CPUs, 2.7 GHz, 8 cores, 2 way hyper-threaded)
– RISC V0.2.1, Posix-thread based comparison

Parallel  DES                  PDES                            OoO PDES
Slices    Run Time   CPU Load  Run Time   CPU Load  Speedup    Run Time   CPU Load  Speedup
1         162.13 s   99%       162.06 s   100%      1.00x      161.90 s   100%      1.00x
2         162.19 s   99%       96.50 s    168%      1.68x      96.48 s    168%      1.68x
4         162.56 s   99%       54.00 s    305%      3.01x      53.85 s    304%      3.02x
8         163.10 s   99%       29.89 s    592%      5.46x      30.05 s    589%      5.43x
16        164.01 s   99%       19.03 s    1050%     8.62x      20.08 s    997%      8.17x
32        165.89 s   99%       11.78 s    2082%     14.08x     11.99 s    2023%     13.84x
64        170.32 s   99%       9.79 s     2607%     17.40x     9.85 s     2608%     17.29x
128       174.55 s   99%       9.34 s     2793%     18.69x     9.39 s     2787%     18.59x
256       185.47 s   100%      8.91 s     2958%     20.82x     8.90 s     2964%     20.84x
Experimental Results
• Mandelbrot Renderer Example
→ Graphics Pipeline Application, embarrassingly parallel
– Simulator run times on Intel® Many Integrated Core (MIC) Architecture
→ Intel® Xeon Phi™ coprocessor
• 5110P CPU at 1.052 GHz
• 60 cores, 4 way hyper-threaded
• Bidirectional ring interconnect, L2 cache
• Appears as regular Linux machine with 240 cores!
→ Experimental result:
• Traditional DES vs. synchronous PDES (no conflicts):
• Run time (seq/par): 226.57 sec / 5.50 sec
→ 41x speedup!
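The speedup figures in these results are the ratio of sequential to parallel run time. A short check against the reported numbers, with `speedup` as a hypothetical helper name:

```cpp
#include <cassert>
#include <cmath>

// Speedup of a parallel run over the sequential reference run.
inline double speedup(double sequential_seconds, double parallel_seconds) {
    assert(parallel_seconds > 0.0);
    return sequential_seconds / parallel_seconds;
}
```

For the 10 sec DVD stream, 6.98 s / 4.67 s is about 1.49 and 6.98 s / 2.94 s is about 2.37. For the Mandelbrot renderer on the Xeon Phi, 226.57 s / 5.50 s is about 41.2, reported as a 41x speedup.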
RISC Compiler and Simulator
→ Out-of-Order Parallel SystemC Compiler and Simulator
• Open Source Prototype Implementation
– Alpha Release V0.2.1 published October 30, 2015
• http://www.cecs.uci.edu/~doemer/risc.html
→ Source tar ball: risc_v0.2.1.tar.gz
→ Installation: INSTALL, Makefile
→ Doxygen documentation: RISC API
→ Doxygen documentation: OoO Parallel SystemC API
→ BSD license terms: LICENSE
• Downloads and feedback welcome!
→ Code hardening
→ Extension of supported parallel SystemC subset
→ Standardization…
Concluding Remarks
• Project on Advanced Parallel SystemC Simulation
– Out-of-Order PDES on many-core host platforms
– Maximum compliance with current execution semantics
• SystemC Compiler Integrated with Parallel Simulator
– Segment Graph based static analysis for parallel execution
– Model instrumentation and protection of communication
– Out-of-order parallel scheduler, many-core platform support
• Open Source
– RISC V0.2.1, working prototype implementation
– Available at www.cecs.uci.edu/~doemer/risc.html
• Ongoing and Future Work
– Code hardening and virtual platform integration (i.e. Simics®)
– Collaboration with Accellera SystemC Language WG
References
• [CECS-TR-15-02] G. Liu, T. Schmidt, R. Dömer: "RISC Compiler and Simulator, Alpha Release V0.2.1: Out-of-Order Parallel Simulatable SystemC Subset", Center for Embedded and Cyber-physical Systems, CECS, April 2015.
• [ASPDAC’15] G. Liu, T. Schmidt, R. Dömer, A. Dingankar, D. Kirkpatrick: "Optimizing Thread-to-Core Mapping on Manycore Platforms with Distributed Tag Directories", Proceedings of ASPDAC, Tokyo, Japan, January 2015.
• [IEEE TCAD’14] W. Chen, X. Han, C. Chang, G. Liu, R. Dömer: "Out-of-Order Parallel Discrete Event Simulation for Transaction Level Models", IEEE Transactions on CAD, vol. 33, no. 12, pp. 1859-1872, December 2014.
• [DATE’14] W. Chen, X. Han, R. Dömer: "May-Happen-in-Parallel Analysis based on Segment Graphs for Safe ESL Models", Proceedings of DATE, Dresden, Germany, March 2014. (Best Paper Award!)
• [DATE’13] W. Chen, R. Dömer: "Optimized Out-of-Order Parallel Discrete Event Simulation Using Predictions", Proceedings of DATE, Grenoble, France, March 2013.
• [IEEE D&T’13] W. Chen, X. Han, C. Chang, R. Dömer: "Advances in Parallel Discrete Event Simulation for Electronic System-Level Design", IEEE Design & Test of Computers, vol. 30, no. 1, pp. 45-54, Jan.-Feb. 2013.
• [DATE’12] W. Chen, X. Han, R. Dömer: "Out-of-Order Parallel Simulation for ESL Design", Proceedings of DATE, Dresden, Germany, March 2012.
• [ASPDAC’12] W. Chen, R. Dömer: "An Optimizing Compiler for Out-of-Order Parallel ESL Simulation Exploiting Instance Isolation", Proceedings of ASPDAC, Sydney, Australia, February 2012.