## EECS 222: Embedded System Modeling Lecture 11

#### Rainer Dömer

doemer@uci.edu

The Henry Samueli School of Engineering, Electrical Engineering and Computer Science, University of California, Irvine

#### Lecture 11: Overview

- Discrete Event Simulation Semantics
  - Discrete Event Simulation
  - Parallel Discrete Event Simulation
  - Out-of-Order Parallel Discrete Event Simulation
- Formal Execution Semantics
  - Time-Interval Formalism

EECS222: Embedded System Modeling, Lecture 11

(c) 2019 R. Doemer


### **Discrete Event Simulation Semantics**

- Discrete Event Simulation Algorithm for SpecC
  - available in the Language Reference Manual (LRM, appendix), good for documentation
  - ⇒ abstract definition (defines a set of valid implementations)
  - ⇒ not general (possibly incomplete)
- Definitions:
  - At any time, each thread t is in one of the following sets:
    - READY: set of threads ready to execute (initially root thread)
    - WAIT: set of threads suspended by wait (initially Ø)
    - WAITFOR: set of threads suspended by waitfor (initially Ø)
  - Notified events are stored in a set N
    - notify e1 adds event e1 to N
    - wait e1 will wake up when e1 is in N
    - Consumption of event e means event e is taken out of N
    - Expiration of notified events means N is set to Ø
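
The definitions above can be sketched as a small scheduler. This is a minimal illustration, not the algorithm from the LRM: threads are modeled as Python generators, and the `Simulator` class, its `spawn` method (a stand-in for the root thread forking children), and the yielded tuples `("notify", e)`, `("wait", e)`, `("waitfor", delay)` are hypothetical names chosen here. READY threads are picked in FIFO order, which is only one of many valid orders.

```python
import heapq
from itertools import count


class Simulator:
    """Toy discrete event scheduler (illustrative names, not the LRM algorithm)."""

    def __init__(self):
        self.now = 0
        self.ready = []        # READY: threads ready to execute
        self.wait = []         # WAIT: (thread, event) pairs suspended by wait
        self.waitfor = []      # WAITFOR: heap of (wake_time, seq, thread)
        self.notified = set()  # N: set of notified events
        self._seq = count()    # tie-breaker so the heap never compares threads

    def spawn(self, thread):
        self.ready.append(thread)

    def _step(self, thread):
        """Run one thread until it suspends; return its request, or None if it ended."""
        for op, arg in thread:
            if op == "notify":
                self.notified.add(arg)   # notify e adds e to N; the thread keeps running
            else:
                return op, arg
        return None

    def run(self):
        while True:
            while self.ready:
                thread = self.ready.pop(0)   # FIFO pick (assumption; any order is valid)
                request = self._step(thread)
                if request is None:
                    continue                 # thread terminated
                op, arg = request
                if op == "wait":
                    self.wait.append((thread, arg))
                elif op == "waitfor":
                    entry = (self.now + arg, next(self._seq), thread)
                    heapq.heappush(self.waitfor, entry)
            # Delta cycle: wake waiting threads whose event is in N, then expire N.
            woken = [t for t, e in self.wait if e in self.notified]
            self.wait = [(t, e) for t, e in self.wait if e not in self.notified]
            self.notified.clear()
            self.ready.extend(woken)
            if self.ready:
                continue
            # Timed cycle: advance time to the earliest waitfor deadline.
            if not self.waitfor:
                return self.now
            self.now = self.waitfor[0][0]
            while self.waitfor and self.waitfor[0][0] == self.now:
                self.ready.append(heapq.heappop(self.waitfor)[2])


log = []
sim = Simulator()


def producer():
    yield ("waitfor", 10)
    log.append((sim.now, "producer notifies e1"))
    yield ("notify", "e1")


def consumer():
    yield ("wait", "e1")
    log.append((sim.now, "consumer woke up"))


sim.spawn(producer())
sim.spawn(consumer())
end_time = sim.run()
```

In this run the consumer sits in WAIT until the producer, released from WAITFOR at time 10, puts `e1` into N; the following delta cycle moves the consumer back to READY and clears N.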


#### **Formal Execution Semantics**

- Examples of Formally Defined Semantics
  - 1) Time-interval formalism
    - Formally defines timed execution semantics of SpecC
      - Covers sequentiality, concurrency, synchronization
      - Allows reasoning over execution order, dependencies
      - ⇒ Discussed in the following slides!
  - 2) Abstract State Machines (ASM)
    - Complete execution semantics of SpecC
      - wait, notify, notifyone, par, pipe, try-trap-interrupt
      - Operational semantics only (no data types!)
    - Abstract model closely matches SystemC
    - Abstract model closely matches VHDL, Verilog
    - ⇒ Not discussed in this course


#### **Formal Execution Semantics**

- Time-interval formalism
  - Atomicity
    - Since there is generally no atomicity guaranteed, a safe mechanism for mutual exclusion is necessary
    - SpecC 2.0: Channels behave as Monitors!
      - A mutex is implicitly contained in each channel instance
      - Each channel method implicitly
        - acquires the mutex when it starts execution, and
        - releases the mutex again when it finishes
      - wait and waitfor statements implicitly (and atomically!)
        - release an acquired mutex in a channel, and
        - re-acquire the mutex before execution resumes
      - This easily enables safe communication without heavy restrictions on the implementation!
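
The monitor behavior described above can be mimicked with standard threading primitives. The sketch below is an analogy in Python, not SpecC code: `QueueChannel`, `send`, and `receive` are hypothetical names, and `threading.Condition.wait()` plays the role of a `wait` inside a channel, since it releases the mutex and re-acquires it before returning.

```python
import threading
from collections import deque


class QueueChannel:
    """Hypothetical channel that behaves as a monitor (illustrative names)."""

    def __init__(self):
        self._mutex = threading.Lock()                        # one mutex per channel instance
        self._data_ready = threading.Condition(self._mutex)   # event bound to that mutex
        self._items = deque()

    def send(self, item):
        with self._mutex:                 # acquired on method entry, released on exit
            self._items.append(item)
            self._data_ready.notify()

    def receive(self):
        with self._mutex:
            while not self._items:
                # Releases the mutex while suspended and re-acquires it
                # before execution resumes.
                self._data_ready.wait()
            return self._items.popleft()


channel = QueueChannel()
received = []


def consume():
    for _ in range(3):
        received.append(channel.receive())


consumer = threading.Thread(target=consume)
consumer.start()
for value in (1, 2, 3):
    channel.send(value)
consumer.join()
```

Because the consumer gives up the mutex while it is suspended, the sender can always enter `send`, so the two threads cannot deadlock on the channel.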

## Discrete Event Simulation (DES)

- Parallel Simulation!?
- Safe Communication in Parallel Execution Context
  - ⇒ Requires protection of inter-thread communication!
  - SpecC
    - Preemptive multi-threading mandates channels as "monitors"
  - SystemC
    - Cooperative multi-threading assumes execution "without interruption"
  - ⇒ Protection: Insert a mutex lock into channel instances
    - Lock the channel on thread entry
    - Unlock the channel on thread exit
    - Atomic execution of channel methods
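
Inserting the lock can be automated rather than written by hand in every method. The sketch below shows one way to do that in Python with a class decorator; `monitor` and `Counter` are hypothetical names, and this is an analogy to the idea on the slide, not how any SystemC simulator instruments channels.

```python
import functools
import threading


def monitor(cls):
    """Wrap every public method of cls so that it runs under a per-instance lock."""
    original_init = cls.__init__

    @functools.wraps(original_init)
    def init(self, *args, **kwargs):
        self._channel_lock = threading.RLock()   # inserted mutex, one per instance
        original_init(self, *args, **kwargs)

    def locked(method):
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            with self._channel_lock:              # lock on entry, unlock on exit
                return method(self, *args, **kwargs)
        return wrapper

    cls.__init__ = init
    for name, attr in list(vars(cls).items()):
        if callable(attr) and not name.startswith("_"):
            setattr(cls, name, locked(attr))
    return cls


@monitor
class Counter:
    def __init__(self):
        self.value = 0

    def increment(self):
        current = self.value        # read-modify-write: unsafe without the lock
        self.value = current + 1


counter = Counter()


def work():
    for _ in range(1000):
        counter.increment()


workers = [threading.Thread(target=work) for _ in range(4)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
```

A re-entrant lock is used so that one protected method may call another on the same instance without blocking itself.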



### Recoding Infrastructure for SystemC (RISC)

- Advanced Parallel SystemC Simulation
  - Aggressive PDES on many-core host platforms
  - Maximum compliance with IEEE SystemC semantics
- Introduction of a Dedicated SystemC Compiler
  - Advanced conflict analysis for safe parallel execution
  - Automatic model instrumentation and code generation
- Parallel SystemC Simulator
  - Out-of-order parallel scheduler, multi-thread safe primitives
  - Multi- and many-core host platforms (e.g. Intel® Xeon Phi™)
- Open Source
  - Freely available for evaluation and collaboration
  - Thanks to Intel Corporation!

FDL '18 Keynote, "Limits of Standard-compliant Parallel SystemC"

(c) 2018 R. Doemer, CECS


### Recoding Infrastructure for SystemC (RISC)

- Out-of-Order PDES Key Ideas
  - 1. Dedicated SystemC compiler with advanced model analysis
    - ⇒ Static conflict analysis based on Segment Graphs
  - 2. Parallel simulator with out-of-order scheduling
    - ⇒ Fast decision making at run-time, optimized mapping
- Fundamental Data Structure: Segment Graph
  - Key to semantics-compliant out-of-order execution [DATE'12]
  - Key to prediction of future thread state [DATE'13]
    - "Optimized Out-of-Order Parallel DE Simulation Using Predictions"
  - Key to May-Happen-in-Parallel Analysis [DATE'14]
    - "May-Happen-in-Parallel Analysis based on Segment Graphs for Safe ESL Models" (Best Paper Award)
  - Combined: "OoO PDES for TLM" [IEEE TCAD'14]
    - Comprehensive summary with HybridThreads extension
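
To give a feel for what a static conflict analysis produces, the sketch below builds a data-conflict table for a few code segments. It rests on assumptions made here for illustration only: each segment is summarized by hand as a set of variables it may read and a set it may write, and `Segment`, `conflicts`, and `conflict_table` are hypothetical names. It is not the RISC implementation, and it leaves out the construction of the Segment Graph itself as well as event and timing conflicts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A code segment summarized by its variable accesses (illustrative model)."""
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def conflicts(a, b):
    """Two segments conflict if one may write a variable the other reads or writes."""
    return bool(a.writes & (b.reads | b.writes)) or bool(b.writes & (a.reads | a.writes))


def conflict_table(segments):
    """Precompute the pairwise conflict relation once, before simulation starts."""
    return {(a.name, b.name): conflicts(a, b) for a in segments for b in segments}


segments = [
    Segment("producer_s0", writes=frozenset({"buffer"})),
    Segment("consumer_s0", reads=frozenset({"buffer"}), writes=frozenset({"total"})),
    Segment("timer_s0", reads=frozenset({"clock"}), writes=frozenset({"ticks"})),
]
table = conflict_table(segments)

# Segments that do not conflict are candidates for running in parallel.
parallel_ok = sorted(pair for pair, hit in table.items() if not hit and pair[0] < pair[1])
```

With a table like this computed ahead of time, a scheduler only needs a lookup at run time to decide whether two threads may run side by side, which is the kind of fast run-time decision the slide refers to.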


## RISC: Experiments and Results

- Mandelbrot Renderer (Graphics Pipeline Application)
  - Simulator run times on 16-core Intel® Xeon® multi-core host
  - 2 CPUs at 2.7 GHz, 8 cores each, 2-way hyper-threaded
  - RISC V0.2.1, POSIX threads

| Parallel Slices | DES Run Time | DES CPU Load | PDES Run Time | PDES CPU Load | PDES Speedup | OoO PDES Run Time | OoO PDES CPU Load | OoO PDES Speedup |
|-----------------|--------------|--------------|---------------|---------------|--------------|-------------------|-------------------|------------------|
| 1                  | 162.13 s | 99%  | 162.06 s | 100%  | 1.00 x  | 161.90 s | 100%  | 1.00 x  |
| 2                  | 162.19 s | 99%  | 96.50 s  | 168%  | 1.68 x  | 96.48 s  | 168%  | 1.68 x  |
| 4                  | 162.56 s | 99%  | 54.00 s  | 305%  | 3.01 x  | 53.85 s  | 304%  | 3.02 x  |
| 8                  | 163.10 s | 99%  | 29.89 s  | 592%  | 5.46 x  | 30.05 s  | 589%  | 5.43 x  |
| 16                 | 164.01 s | 99%  | 19.03 s  | 1050% | 8.62 x  | 20.08 s  | 997%  | 8.17 x  |
| 32                 | 165.89 s | 99%  | 11.78 s  | 2082% | 14.08 x | 11.99 s  | 2023% | 13.84 x |
| 64                 | 170.32 s | 99%  | 9.79 s   | 2607% | 17.40 x | 9.85 s   | 2608% | 17.29 x |
| 128                | 174.55 s | 99%  | 9.34 s   | 2793% | 18.69 x | 9.39 s   | 2787% | 18.59 x |
| 256                | 185.47 s | 100% | 8.91 s   | 2958% | 20.82 x | 8.90 s   | 2964% | 20.84 x |
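
The speedup columns appear to be the sequential DES run time divided by the parallel run time at the same number of slices. That reading is an inference from the numbers, not something the slide states; the sketch below checks it against the PDES columns, with the values copied from the table and `speedup` as a hypothetical helper name.

```python
# (parallel slices, DES run time [s], PDES run time [s], reported PDES speedup)
rows = [
    (1, 162.13, 162.06, 1.00),
    (2, 162.19, 96.50, 1.68),
    (4, 162.56, 54.00, 3.01),
    (8, 163.10, 29.89, 5.46),
    (16, 164.01, 19.03, 8.62),
    (32, 165.89, 11.78, 14.08),
    (64, 170.32, 9.79, 17.40),
    (128, 174.55, 9.34, 18.69),
    (256, 185.47, 8.91, 20.82),
]


def speedup(sequential_time, parallel_time):
    """Speedup of a parallel run relative to the sequential reference run."""
    return sequential_time / parallel_time


computed = {slices: round(speedup(des, pdes), 2) for slices, des, pdes, _ in rows}
max_error = max(abs(speedup(des, pdes) - reported) for _, des, pdes, reported in rows)
```

For example, at 2 slices the ratio is 162.19 s / 96.50 s, which rounds to the reported 1.68x.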

## RISC: Experiments and Results

- Many-Core Target Platform: Intel® Xeon Phi™
  - Many Integrated Core (MIC) architecture
    - 1 Coprocessor 5110P CPU at 1.052 GHz
      - 60 physical cores with 4-way hyper-threading
        - Appears as regular Linux host with 240 cores
      - Up to 8 lanes available for vector processing
- ⇒ RISC extended for exploiting 2 types of parallelism
  - Out-of-Order PDES: thread-level parallelism
  - Intel® compiler SIMD: data-level parallelism
  - ⇒ RISC SIMD Advisor identifies functions with data-level parallelism suitable for SIMD vectorization
  - DAC '17 paper: "Exploiting Thread and Data Level Parallelism for Ultimate Parallel SystemC Simulation"

## RISC: Experiments and Results

- Many-Core Target Platform: Intel® Xeon Phi™
  - Exploiting thread- and data-level parallelism [DAC'17]
  - Mandelbrot renderer (graphics pipeline application)
- Experimental Results:

| PAR | MT    | SIMD | MT+SIMD |
|-----|-------|------|---------|
| 1   | 1.00  | 6.92 | 6.94    |
| 2   | 1.68  | 6.92 | 11.77   |
| 4   | 3.04  | 6.92 | 21.19   |
| 8   | 5.84  | 6.92 | 40.10   |
| 16  | 11.37 | 6.92 | 72.52   |
| 32  | 21.32 | 6.91 | 137.21  |
| 64  | 41.07 | 6.90 | 208.41  |
| 128 | 46.29 | 6.89 | 212.96  |
| 256 | 49.90 | 6.87 | 194.19  |



With an increasing degree of parallelism (PAR = number of threads), the combined multi-threading (MT) and data-level (SIMD) speedup reaches up to 212x (at PAR = 128)!
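
A quick way to read these results is to compare the combined speedup with the product MT × SIMD, which is what one would get if thread-level and data-level parallelism composed perfectly. That ideal is an assumption introduced here for comparison, not a claim from the slides. The numbers are copied from the table above.

```python
# (PAR, MT speedup, SIMD speedup, combined MT+SIMD speedup)
results = [
    (1, 1.00, 6.92, 6.94),
    (2, 1.68, 6.92, 11.77),
    (4, 3.04, 6.92, 21.19),
    (8, 5.84, 6.92, 40.10),
    (16, 11.37, 6.92, 72.52),
    (32, 21.32, 6.91, 137.21),
    (64, 41.07, 6.90, 208.41),
    (128, 46.29, 6.89, 212.96),
    (256, 49.90, 6.87, 194.19),
]

# Row with the highest combined speedup.
best = max(results, key=lambda row: row[3])

# Fraction of the ideal product MT * SIMD that the combined run achieves.
fraction_of_ideal = {
    par: combined / (mt * simd) for par, mt, simd, combined in results
}

for par, fraction in fraction_of_ideal.items():
    print(f"PAR={par:>3}: combined speedup is {fraction:.0%} of MT x SIMD")
```

Under this comparison the combined speedup tracks the product closely for small PAR and falls well below it at the largest thread counts, where the combined figure also drops from its peak at PAR = 128.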

## RISC Open Source Software

- RISC Compiler and Simulator are freely available
  - http://www.cecs.uci.edu/~doemer/risc.html#RISC042

  - Installation notes and script: INSTALL, Makefile
  - Open source tar ball: risc\_v0.4.2.tar.gz
  - Docker script and container: Dockerfile
  - Doxygen documentation: RISC API, OOPSC API
  - Tool manual pages: risc, simd, visual, ...
  - BSD license terms: LICENSE
  - Companion Technical Report: CECS Technical Report 17-05, CECS\_TR\_17\_05.pdf

Docker container: https://hub.docker.com/r/ucirvinelecs/risc/

    bash# docker pull ucirvinelecs/risc
    bash# docker run -it ucirvinelecs/risc
    [dockeruser]# cd demodir
    [dockeruser]# make test
