
# Target Simulation (Virtual Platforms)

When we discussed [[System Level Simulation|system-level simulation]], we referred rather generically to a "Target" that would run the embedded binary or binaries of our system. There, we focused on modeling the outer part of the system (equipment, sensors, actuators, environment). Now it's time to discuss said "Target". And here is where it gets a bit tricky.

When we discussed the [[Hierarchy of Digital Systems (Start Here)|hierarchical composition]] of digital systems, we saw that such a system can be formed in many different ways. We can talk about OEM boards alone, a board with a mezzanine, backplanes carrying many boards inside a unit, a collection of units in racks, or full-blown data centers. By generically calling it a "Target", we relieve ourselves from having to specify exactly what our target is composed of. But when we are supposed to replicate the behavior of (i.e., simulate) this target, we cannot escape that anymore. Depending on the level of fidelity we aim for in our simulation, we might need to create a model that represents the internal hierarchy of the Target accurately; otherwise, the simulation may lose features that are necessary for development.

What I call "Target" here is also called a Virtual Platform (VP), a Virtual Prototype, or a Full-System Simulator (FSS). Naturally, we would like to simulate the target system with ultra-high accuracy. But, as we saw when we talked about [[System Level Simulation]], model fidelity never goes unpaid. A virtual replica of our target, just like any other simulation environment, must therefore strike a balance between accuracy and performance. That is, it must be abstract enough to achieve tolerable performance while retaining sufficient functional accuracy to run realistic workloads and sufficient timing accuracy to interface with detailed hardware models.

![Use of languages for different levels of abstraction in simulation (source: "SystemC: From the Ground Up", David C. Black, Jack Donovan, Bill Bunton, Anna Keist, Springer)](image437.png)

> [!Figure]
> *Use of languages for different levels of abstraction in simulation (source: #/ref/Black )*

Necessary features in Virtual Platforms/Prototypes are:

- Breakpoints: VPs must offer streamlined use of GDB so that software developers can keep the same workflow they would use with the real hardware.
- Checkpointing: Checkpointing is the ability to save the complete state of a simulation to disk and later bring the saved state back and continue the simulation without any logical interruption from the perspective of the modeled hardware and especially the target software. Checkpoints contain the state of both the hardware and the software, although the latter is implicit in the hardware state. Checkpointing needs to support the following operations:
    - Storing the simulation state to disk.
    - Restoring the simulation state on the same host into the same simulation binary.
    - Restoring on a different host machine, possibly belonging to another user or organization, where different can mean a machine with a different word length, endianness, operating system, and installed software base.
    - Restoring into an updated version of the same simulation model, for example, with bug fixes or expanded functionality.
    - Restoring into a completely different simulation model that uses the same architectural state, for example, a detailed clock-cycle-driven model.

    Checkpointing can be used to support workflow optimization, such as a "nightly boot" setup where target system configurations are booted as part of a nightly build and the resulting checkpoints are saved. During the workday, software developers simply pick up checkpoints of the relevant target states, with no need to boot the target machines themselves.

    Another important use of checkpointing is to package bugs and communicate them between testing and engineering, between companies, and across the world. With a deterministic simulator and a checkpoint attached to the bug report, reproducing a bug becomes much easier.
- Reconfiguration: Virtual Platforms must be easy to reconfigure and extend at any point during a simulation. VPs must provide mechanisms for loading new modules and adding new hardware models. Connections in the VP should be easy to change at will from scripts or the command line. Reconfiguration can also be used to implement fault injection and to test the target software's response to faults or sudden changes in the hardware.
- Repeatability: VPs must show repeatable, deterministic behavior. It must be possible to repeat any simulation run precisely on any host at any time. Note that determinism does not mean that the simulation will always run the same target software in the same way. If the timing of any input or any part of the initial state changes, the simulation flow will change accordingly.
- Scripting: A decent Virtual Platform must be easy to script, either through a Command Line Interface (CLI) or through a popular language such as Python. The CLI offers an accessible way to perform common tasks, while Python makes it possible to write programs that accomplish more complex tasks. Scripts can hook into events in the simulation, such as printouts on serial consoles, network packets being sent, breakpoints being hit, interrupts firing in the target hardware, and task switches done by target operating systems.

When it comes to modeling Targets in software, one may stand on the shoulders of a wealth of libraries and languages that have been conceived for that purpose. Although this is not strictly necessary (one could, for instance, model a Target in plain C++ if desired), these languages and libraries are equipped with special features that make the modeling task more straightforward than starting from scratch.
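
To make the checkpointing and repeatability requirements a bit more tangible, here is a minimal sketch of a device model in plain C++, with no simulation library involved. Everything in it is an assumption made for illustration: `TimerModel`, its fields, and the "name value" text format are hypothetical and do not come from any real virtual platform. The state is written field by field as decimal text instead of dumping raw memory, which is one simple way to keep a checkpoint independent of the host's word length and endianness.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical device model: a free-running timer that raises an interrupt
// every `compare` ticks. All names here are illustrative.
struct TimerModel {
    std::uint64_t now = 0;        // simulated time, in ticks
    std::uint64_t counter = 0;    // architectural counter register
    std::uint64_t compare = 5;    // interrupt period, in ticks
    std::uint64_t irq_count = 0;  // interrupts raised so far

    // Advance by one tick. The next state depends only on the current state
    // (no host clock, no randomness), which is what makes a run repeatable.
    void step() {
        assert(compare > 0);
        ++now;
        if (++counter == compare) {
            counter = 0;
            ++irq_count;
        }
    }

    // Store the state as "name value" lines of decimal text.
    std::string save() const {
        std::ostringstream out;
        out << "now " << now << "\n"
            << "counter " << counter << "\n"
            << "compare " << compare << "\n"
            << "irq_count " << irq_count << "\n";
        return out.str();
    }

    // Restore from a checkpoint. Missing fields keep their defaults and
    // unknown fields are skipped, which is one simple way to tolerate
    // checkpoints written by a different version of the model.
    static TimerModel restore(const std::string& text) {
        TimerModel m;
        std::istringstream in(text);
        std::string name;
        std::uint64_t value = 0;
        while (in >> name >> value) {
            if (name == "now") m.now = value;
            else if (name == "counter") m.counter = value;
            else if (name == "compare") m.compare = value;
            else if (name == "irq_count") m.irq_count = value;
        }
        return m;
    }

    bool operator==(const TimerModel& other) const {
        return now == other.now && counter == other.counter &&
               compare == other.compare && irq_count == other.irq_count;
    }
};
```

Stepping one instance for a while, calling `save()`, restoring into a second instance with `TimerModel::restore()`, and then stepping both by the same number of ticks should leave the two in identical states. A hand-rolled model like this has no notion of concurrency, module hierarchy, or communication channels; that is what the modeling languages and libraries add.
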
We explore these next.

## SystemC/TLM

As the electronics industry builds more complex systems involving large numbers of components, including software, there is an increasing need for a modeling language that can manage the complexity and size of these systems. SystemC provides a mechanism for managing this complexity with its facility for modeling hardware and software together at multiple levels of abstraction.

Stakeholders in SystemC include Electronic Design Automation (EDA) companies who implement SystemC class libraries and tools, System-on-chip (SoC) suppliers who extend those class libraries and use SystemC to model their intellectual property, and end users who use SystemC to model their systems.

Before the publication of the standard, SystemC was defined by an open-source proof-of-concept C++ library available from the Open SystemC Initiative (OSCI^[https://systemc.org/]). SystemC was then approved by the IEEE Standards Association as IEEE 1666-2011, the SystemC Language Reference Manual (LRM). The LRM provides the definitive statement of the semantics of SystemC. The proof-of-concept simulator, sometimes incorrectly referred to as the reference simulator, can be downloaded from the OSCI website. Although OSCI intended that commercial vendors and academia could create original software compliant with IEEE 1666, in practice most SystemC implementations have been at least partly based on the OSCI proof-of-concept simulator.

### Relationship with C++

The general purpose of SystemC is to provide a C++-based facility for designers and architects who need to address complex systems that are a hybrid between hardware and software. The SystemC standard is closely related to the C++ programming language and adheres to the terminology used in ISO/IEC 14882^[https://www.iso.org/standard/79358.html].
The SystemC standard does not seek to restrict the usage of the C++ programming language; a SystemC application may use any of the facilities provided by C++, which in turn may use any of the facilities provided by C. However, where the facilities provided by SystemC are used, they must be used in accordance with the rules and constraints set out in its standard, which defines the public interface to the SystemC class library and the constraints on how those classes may be used. The SystemC class library may be implemented in any manner whatsoever, provided only that the obligations imposed by the standard are respected.

A C++ class library may be extended using the mechanisms provided by the C++ language. Developers and users are free to extend SystemC in this way, provided that they do not violate the standard.

### Architecture

The architecture of a SystemC application is shown in the figure below. The shaded blocks represent the SystemC class library itself. The layer shown immediately above the SystemC class library represents standard or proprietary C++ libraries associated with specific design or verification methodologies or specific communication channels and is outside the scope of the standard.

![](SystemC_1.png)

> [!Figure]
> _SystemC language architecture_

==The classes of the SystemC library fall into four categories: the core language, the SystemC data types, the predefined channels, and the utilities.== The core language and the data types may be used independently of one another, although they are more typically used together.

At the core of SystemC is a simulation engine containing a process scheduler. Processes are executed in response to the notification of events. Events are notified at specific points in simulated time. In the case of time-ordered events, the scheduler is deterministic. In the case of events occurring at the same point in simulation time, the scheduler is non-deterministic; that is, the standard does not specify the order in which the corresponding processes run. The scheduler is non-preemptive.

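
The scheduling rules above fit in a few lines of plain C++. The sketch below is a toy kernel, not the SystemC implementation: `MiniKernel`, `notify`, `run`, and `run_ticker` are hypothetical names, and simulated time is a bare integer. Events that fall on the same time point are run in notification order here; that is a choice of this sketch only, and SystemC itself makes no such promise.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical miniature event-driven kernel; not the SystemC API.
class MiniKernel {
public:
    using Process = std::function<void()>;

    std::uint64_t now() const { return now_; }

    // Notify an event `delay` time units from now; `p` runs when it fires.
    void notify(std::uint64_t delay, Process p) {
        queue_.push(Event{now_ + delay, next_seq_++, std::move(p)});
    }

    // Run events in time order until none remain at or before `until`.
    void run(std::uint64_t until) {
        while (!queue_.empty() && queue_.top().time <= until) {
            Event ev = queue_.top();
            queue_.pop();
            assert(ev.time >= now_);  // simulated time never moves backwards
            now_ = ev.time;
            ev.process();  // runs to completion: the kernel never preempts it
        }
    }

private:
    struct Event {
        std::uint64_t time;
        std::uint64_t seq;  // tie-breaker for events at the same time
        Process process;
    };
    struct Later {
        bool operator()(const Event& a, const Event& b) const {
            if (a.time != b.time) return a.time > b.time;
            return a.seq > b.seq;
        }
    };
    std::priority_queue<Event, std::vector<Event>, Later> queue_;
    std::uint64_t now_ = 0;
    std::uint64_t next_seq_ = 0;
};

// Example process: a clock-like ticker that re-notifies itself every `period`.
// Returns the simulated times at which it ran.
inline std::vector<std::uint64_t> run_ticker(std::uint64_t period,
                                             std::uint64_t until) {
    MiniKernel kernel;
    std::vector<std::uint64_t> fired;
    std::function<void()> tick = [&] {
        fired.push_back(kernel.now());
        kernel.notify(period, tick);
    };
    kernel.notify(period, tick);
    kernel.run(until);
    return fired;
}
```

Simulated time only advances when the kernel pops the next event, so a process can do an arbitrary amount of host work at a single time point. That is also why a process that never returns control stalls the whole simulation.
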
The _module_ is the basic structural building block. Systems are represented by a module hierarchy consisting of a set of modules related by instantiation. A module can contain the following:

- Ports
- Exports
- Channels
- Processes
- Events
- Instances of other modules
- Other data members
- Other member functions

Modules, ports, exports, channels, interfaces, events, and times are implemented as C++ classes.

The execution of a SystemC application consists of an elaboration phase, during which the module hierarchy is created, followed by a simulation phase, during which the scheduler runs. Both elaboration and simulation involve the execution of code both from the application and from the kernel. The kernel is the part of a SystemC class library implementation that provides the core functionality for elaboration and the scheduler.

Instances of ports, exports, channels, and modules can only be created during elaboration. Once created during elaboration, this hierarchical structure remains fixed for the remainder of elaboration and simulation. Process instances can be created statically during elaboration or dynamically during simulation. ==Modules, channels, ports, exports, and processes are derived from a common base class **sc_object**, which provides methods for traversing the module hierarchy.== Arbitrary attributes (name-value pairs) can be attached to instances of **sc_object**.

Instances of ports, exports, channels, and modules can only be created within modules. The only exception to this rule is top-level modules.

Processes are used to perform computations and hence to model the functionality of a system. Although notionally concurrent, processes are actually scheduled to execute in sequence. Processes are C++ functions registered with the kernel during elaboration (static processes) or during simulation (dynamic processes), and called from the kernel during simulation.

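
The split between elaboration and simulation, and the idea of a common base class that lets you walk the hierarchy, can be sketched in plain C++. This is only an analogy under stated assumptions: `Object`, `Design`, `create`, `end_elaboration`, and `collect_names` are hypothetical names, not **sc_object** or any other SystemC class.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical named object in a hierarchy; not sc_object itself.
class Object {
public:
    Object(std::string name, Object* parent)
        : name_(std::move(name)), parent_(parent) {
        assert(!name_.empty());
        if (parent_) parent_->children_.push_back(this);
    }

    // Hierarchical name built from the path to the root, e.g. "top.cpu.alu".
    std::string full_name() const {
        return parent_ ? parent_->full_name() + "." + name_ : name_;
    }

    const std::vector<Object*>& children() const { return children_; }

private:
    std::string name_;
    Object* parent_;
    std::vector<Object*> children_;
};

// Owns all objects and enforces "structure is fixed after elaboration".
class Design {
public:
    Object* create(const std::string& name, Object* parent = nullptr) {
        if (simulating_) {
            throw std::logic_error("hierarchy is fixed after elaboration");
        }
        objects_.push_back(std::make_unique<Object>(name, parent));
        return objects_.back().get();
    }

    void end_elaboration() { simulating_ = true; }

private:
    std::vector<std::unique_ptr<Object>> objects_;
    bool simulating_ = false;
};

// Depth-first traversal that collects every hierarchical name.
inline void collect_names(const Object& node, std::vector<std::string>& out) {
    out.push_back(node.full_name());
    for (const Object* child : node.children()) collect_names(*child, out);
}
```

Building `top`, `top.cpu`, `top.cpu.alu`, and `top.mem`, then calling `end_elaboration()`, gives a tree that `collect_names` can walk but that `create` refuses to extend any further.
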
The sensitivity of a process identifies the set of events that would cause the scheduler to execute that process should those events be notified. Both static and dynamic sensitivity are provided. Static sensitivity is created at the time the process instance is created, whereas dynamic sensitivity is created during the execution of the function associated with the process during simulation. A process may be sensitive to named events or to events buried within channels or behind ports and located using an _event finder_. Furthermore, dynamic sensitivity may be created with a time-out, meaning that the scheduler executes the process after a given time interval has elapsed.

Channels serve to encapsulate the mechanisms through which processes communicate and hence to model the communication aspects or protocols of a system. Channels can be used for inter-module communication or inter-process communication within a module.

Interfaces provide a means of accessing channels. An interface proper is an abstract class that declares a set of pure virtual functions (interface methods). A channel is said to implement an interface if it defines all of the methods (that is, member functions) declared in that interface. The purpose of interfaces is to exploit the object-oriented type system of C++ so that channels can be refined independently from the modules that use them. Specifically, any channel that implements a particular interface can be interchanged with any other such channel in a context that names that interface type. The methods defined within a channel are typically called through an interface. A channel may implement more than one interface and a single interface may be implemented by more than one channel.

Interface methods implemented in channels may create dynamic sensitivity to events contained within those same channels.
This is a typical coding idiom and results in a so-called blocking method in which the process calling the method is suspended until the given event occurs. Such methods can only be called from certain kinds of processes known as thread processes.

Because processes and channels may be encapsulated within modules, communication between processes (through channels) may cross boundaries within the module hierarchy. Such boundary crossing is mediated by ports and exports, which serve to forward method calls from the processes within a module to channels to which those ports or exports are bound. A port specifies that a particular interface is required by a module, whereas an export specifies that a particular interface is provided by a module. Ports allow interface method calls within a module to be independent of the context in which the module is instantiated, in the sense that the module need not have any explicit knowledge of the identity of the channels to which its ports are bound. Exports allow a single module to provide multiple instances of the same interface.

Ports belonging to specific module instances are bound to channel instances during elaboration. The port binding policy can be set to control whether a port need be bound, but the binding cannot be changed subsequently. Exports are bound to channel instances that lie within or below the module containing the export. Hence, each interface method call made through a port or export is directed to a specific channel instance in the elaborated module hierarchy, namely the channel instance to which that port is bound.

Ports can only forward method calls up or out of a module, whereas exports can only forward method calls down or into a module. Such method calls always originate from processes within a module and are directed to channels instantiated elsewhere in the module hierarchy.

Ports and exports are instances of a templated class that is parameterized with an interface type.
The port or export can only be bound to a channel that implements that particular interface or one derived from it.

There are two categories of channels: hierarchical channels and primitive channels. A hierarchical channel is a module. A primitive channel is derived from a specific base class (**sc_prim_channel**) and is not a module. Hence, a hierarchical channel can contain processes and instances of modules, ports, and other channels, whereas a primitive channel can contain none of these. It is also possible to define channels derived from neither of these base classes, but every channel implements one or more interfaces. A primitive channel provides unique access to the update phase of the scheduler, enabling the very efficient implementation of certain communication schemes.

The SystemC standard includes a set of predefined channels, together with associated interfaces and ports, as follows:

- **sc_signal**
- **sc_buffer**
- **sc_clock**
- **sc_signal_resolved**
- **sc_signal_rv**
- **sc_fifo**
- **sc_mutex**
- **sc_semaphore**
- **sc_event_queue**

Class **sc_signal** provides the semantics for creating register transfer level or pin-accurate models of digital hardware. Class **sc_fifo** provides the semantics for point-to-point FIFO-based communication appropriate for models based on networks of communicating processes. Classes **sc_mutex** and **sc_semaphore** provide communication primitives appropriate for software modeling.

The SystemC standard includes a set of data types for modeling digital logic and fixed-point arithmetic, as follows:

- **sc_int<>**
- **sc_uint<>**
- **sc_bigint<>**
- **sc_biguint<>**
- **sc_logic**
- **sc_lv<>**
- **sc_bv<>**
- **sc_fixed<>**
- **sc_ufixed<>**

Classes **sc_int** and **sc_uint** provide signed and unsigned limited-precision integers with a word length limited by the C++ implementation. Classes **sc_bigint** and **sc_biguint** provide finite-precision integers. Class **sc_logic** provides four-valued logic.
Classes **sc_bv** and **sc_lv** provide two- and four-valued logic vectors. Classes **sc_fixed** and **sc_ufixed** provide signed and unsigned fixed-point arithmetic.

The classes **sc_report** and **sc_report_handler** provide a general mechanism for error handling that is used by the SystemC class library itself and is also available to the user. Reports can be categorized by severity and by message type, and customized actions can be set for each category of report, such as writing a message, throwing an exception, or aborting the program.

### Example

Here's a basic example of a SystemC module that implements a simple counter:

```cpp
#include <systemc.h>

SC_MODULE(Counter) {
    sc_in<bool> clock;
    sc_in<bool> reset;
    sc_out<int> count;

    int current_count = 0;

    void counter_process() {
        while (true) {
            wait();  // Wait for the next positive clock edge
            if (reset.read()) {
                current_count = 0;  // Synchronous reset
            } else {
                current_count++;
            }
            count.write(current_count);
        }
    }

    SC_CTOR(Counter) {
        SC_THREAD(counter_process);
        sensitive << clock.pos();
    }
};

int sc_main(int argc, char* argv[]) {
    sc_clock clock("clock", 10, SC_NS);
    sc_signal<bool> reset;
    sc_signal<int> count;

    Counter counter("counter");
    counter.clock(clock);
    counter.reset(reset);
    counter.count(count);

    reset = 0;
    sc_start(100, SC_NS);  // Simulate for 100 ns

    reset = 1;             // Reset the counter
    sc_start(100, SC_NS);  // Simulate for another 100 ns

    reset = 0;             // Deassert reset
    sc_start(100, SC_NS);  // Simulate for another 100 ns

    sc_stop();             // Stop simulation
    return 0;
}
```

This example defines a SystemC module `Counter` that increments a count value on each positive clock edge. The count can be reset to 0 by setting the `reset` input to true. In the `sc_main` function, a clock signal, reset signal, and count signal are declared and connected to an instance of the `Counter` module. The simulation is then run for a certain period of time with the count being reset in the middle.
Finally, the simulation is stopped.

A naive implementation of a CPU core with support for LOAD, STORE, ADD, SUB, and MOV instructions in SystemC would look like this:

```cpp
#include <systemc.h>

SC_MODULE(CPU) {
    // Inputs
    sc_in<bool> clock;
    sc_in<bool> reset;
    sc_in<int> data_in;  // Input data for LOAD and MOV instructions

    // Outputs
    sc_out<int> data_out;  // Output data for STORE instructions

    // Internal registers
    sc_signal<int> register_file[4];  // 4 general-purpose registers

    // Internal control signals
    sc_signal<int> opcode;
    sc_signal<int> operand1;
    sc_signal<int> operand2;
    sc_signal<int> result;
    sc_signal<bool> enable_write;

    void fetch() {
        while (true) {
            wait();  // Wait for clock edge
            if (reset.read()) {
                opcode = 0;
                operand1 = 0;
                operand2 = 0;
                enable_write = false;
            } else {
                // Fetch instruction from memory (not implemented in this example).
                // For simplicity, we directly decode a fixed instruction here.
                // Instruction format: opcode (3 bits) | operand1 (2 bits) | operand2 (2 bits)
                // Example: LOAD R1, data_in -> opcode = 000, operand1 = 01 (R1), operand2 = 00 (data_in)
                opcode = 0b000;
                operand1 = 0b01;
                operand2 = 0b00;
                enable_write = true;
            }
        }
    }

    void execute() {
        while (true) {
            wait();  // Wait for clock edge
            if (reset.read()) {
                result = 0;
            } else {
                // Execute instruction based on opcode
                switch (opcode.read()) {
                    case 0b000:  // LOAD: load data_in into the result register
                        result = data_in.read();
                        break;
                    case 0b001:  // STORE: drive the register value onto data_out
                        data_out.write(register_file[operand1.read()].read());
                        break;
                    case 0b010:  // ADD
                        result = register_file[operand1.read()].read() +
                                 register_file[operand2.read()].read();
                        break;
                    case 0b011:  // SUB
                        result = register_file[operand1.read()].read() -
                                 register_file[operand2.read()].read();
                        break;
                    case 0b100:  // MOV: move data_in to the result register
                        result = data_in.read();
                        break;
                    default:  // Invalid opcode
                        result = 0;
                }
            }
        }
    }

    void write_back() {
        while (true) {
            wait();  // Wait for clock edge
            if (enable_write.read()) {
                // Write result to the destination register
                register_file[operand1.read()].write(result.read());
            }
        }
    }

    SC_CTOR(CPU) {
        SC_THREAD(fetch);
        sensitive << clock.pos();
        SC_THREAD(execute);
        sensitive << clock.pos();
        SC_THREAD(write_back);
        sensitive << clock.pos();
    }
};

int sc_main(int argc, char* argv[]) {
    sc_clock clock("clock", 10, SC_NS);
    sc_signal<bool> reset;
    sc_signal<int> data_in;
    sc_signal<int> data_out;

    CPU cpu("cpu");
    cpu.clock(clock);
    cpu.reset(reset);
    cpu.data_in(data_in);
    cpu.data_out(data_out);

    sc_start(100, SC_NS);  // Simulate for 100 ns
    sc_stop();             // Stop simulation
    return 0;
}
```

Note that this example implements no memory interface, which any realistic CPU core model would require. If we were to model the memory interface in a cycle-accurate manner, that would hurt the overall performance of the simulation. The key to obtaining better execution speed is to raise the abstraction level when modeling memory interfaces, and that is done through Transaction-Level Modeling (TLM).

### Transaction-Level Modeling (TLM)

Transaction-level modeling (TLM) focuses on the communication between different components rather than the internal details of each component. TLM operates at a higher level of abstraction compared to RTL (Register Transfer Level) modeling. It abstracts away implementation details and focuses on the interactions between different modules or components in a system.

TLM models communication between modules using transactions, which represent abstract data transfers between components. Transactions encapsulate the essential information exchanged between modules, such as data, addresses, and control signals.

TLM typically does not specify precise timing details such as cycle-accurate behavior. Instead, it allows for loosely timed modeling, where the timing behavior is abstracted to a level sufficient for system-level exploration and analysis without getting into low-level timing intricacies.
TLM interfaces define a standardized way for modules to communicate with each other. These interfaces specify the methods and protocols used for initiating transactions, transferring data, and handling responses.

The TLM standard has gone through two major versions, TLM-1 and TLM-2.0. TLM-2.0 is the more advanced of the two and the one most commonly used in modern SystemC designs. The example below uses its blocking transport interface:

```cpp
#include <cstring>
#include <iostream>
#include <systemc.h>
#include <tlm.h>
#include <tlm_utils/simple_initiator_socket.h>
#include <tlm_utils/simple_target_socket.h>

using namespace sc_core;
using namespace sc_dt;
using namespace tlm;

// Memory module
SC_MODULE(Memory) {
    tlm_utils::simple_target_socket<Memory> socket;

    // Byte-addressable storage, so transaction addresses index it directly
    static const unsigned int SIZE = 1024;
    unsigned char memory_data[SIZE];

    SC_CTOR(Memory) : socket("socket") {
        // Initialize memory data
        std::memset(memory_data, 0, SIZE);
        // Register callback function for incoming transactions
        socket.register_b_transport(this, &Memory::b_transport);
    }

    // Blocking transport function
    virtual void b_transport(tlm_generic_payload& trans, sc_time& delay) {
        tlm_command cmd = trans.get_command();
        sc_dt::uint64 addr = trans.get_address();
        unsigned char* data_ptr = trans.get_data_ptr();
        unsigned int length = trans.get_data_length();

        // Reject accesses that fall outside the memory
        if (length > SIZE || addr > SIZE - length) {
            trans.set_response_status(TLM_ADDRESS_ERROR_RESPONSE);
            return;
        }

        if (cmd == TLM_READ_COMMAND) {
            // Read data from memory
            std::memcpy(data_ptr, &memory_data[addr], length);
        } else if (cmd == TLM_WRITE_COMMAND) {
            // Write data to memory
            std::memcpy(&memory_data[addr], data_ptr, length);
        }

        // Finish the transaction
        trans.set_response_status(TLM_OK_RESPONSE);
    }
};

// CPU module
SC_MODULE(CPU) {
    // TLM initiator socket for CPU to memory communication
    tlm_utils::simple_initiator_socket<CPU> socket;

    // Internal registers
    sc_signal<int> register_file[4];  // 4 general-purpose registers

    void execute() {
        // Dummy code for instruction execution
        wait(SC_ZERO_TIME);

        // Set up a memory read transaction
        tlm_generic_payload trans;
        unsigned int data = 0;
        trans.set_command(TLM_READ_COMMAND);
        trans.set_address(0x100);  // Address to read from
        trans.set_data_ptr(reinterpret_cast<unsigned char*>(&data));
        trans.set_data_length(sizeof(data));
        trans.set_streaming_width(sizeof(data));
        trans.set_byte_enable_ptr(0);
        trans.set_dmi_allowed(false);
        trans.set_response_status(TLM_INCOMPLETE_RESPONSE);

        // Perform memory read; the delay must be a named variable because
        // b_transport takes it by non-const reference
        sc_time delay = sc_time(10, SC_NS);
        socket->b_transport(trans, delay);

        // Check if read was successful
        if (trans.is_response_error()) {
            std::cerr << "Error reading memory" << std::endl;
        } else {
            std::cout << "Data read from memory: " << data << std::endl;
        }

        // Perform other operations here...
    }

    SC_CTOR(CPU) : socket("socket") {
        // Register the thread process for execution
        SC_THREAD(execute);
    }
};

// Top module
SC_MODULE(Top) {
    Memory memory;
    CPU cpu;

    SC_CTOR(Top) : memory("memory"), cpu("cpu") {
        // Connect CPU socket to memory socket
        cpu.socket.bind(memory.socket);
    }
};

int sc_main(int argc, char* argv[]) {
    Top top("top");

    // Simulate for 100 ns
    sc_start(100, SC_NS);
    return 0;
}
```

- The `Memory` module represents an external memory with a simple target socket for TLM communication. It has a byte array as storage and implements a `b_transport()` method to handle incoming read and write transactions, rejecting out-of-range addresses.
- The `CPU` module represents the CPU core with a simple initiator socket for TLM communication. It performs memory read operations by initiating read transactions using the `b_transport()` method.
- The `Top` module instantiates both the `Memory` and `CPU` modules and connects their sockets together.
- The simulation in `sc_main()` runs for 100 ns.
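
The blocking-transport pattern itself does not depend on SystemC, so it can be tried out without installing the library. The following is a minimal plain C++ sketch of the same idea: a payload object travels by reference from initiator to target, and the target annotates a delay instead of waiting. `Payload`, `BlockingTransport`, `ByteMemory`, and `access32` are hypothetical stand-ins invented for this sketch, not the `tlm::` classes, and time is a plain nanosecond counter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-ins for the TLM concepts; not the tlm:: classes.
enum class Command { Read, Write };
enum class Status { Incomplete, Ok, AddressError };

struct Payload {
    Command command = Command::Read;
    std::uint64_t address = 0;      // byte address
    unsigned char* data = nullptr;  // buffer owned by the initiator
    std::size_t length = 0;         // number of bytes to transfer
    Status status = Status::Incomplete;
};

// Interface that every target implements.
struct BlockingTransport {
    virtual ~BlockingTransport() = default;
    virtual void b_transport(Payload& trans, std::uint64_t& delay_ns) = 0;
};

// Byte-addressable memory target with a fixed access latency.
class ByteMemory : public BlockingTransport {
public:
    explicit ByteMemory(std::size_t size, std::uint64_t latency_ns = 10)
        : bytes_(size, 0), latency_ns_(latency_ns) {}

    void b_transport(Payload& trans, std::uint64_t& delay_ns) override {
        assert(trans.data != nullptr);
        // Reject accesses that fall outside the memory.
        if (trans.length > bytes_.size() ||
            trans.address > bytes_.size() - trans.length) {
            trans.status = Status::AddressError;
            return;
        }
        unsigned char* cell = bytes_.data() + trans.address;
        if (trans.command == Command::Read) {
            std::memcpy(trans.data, cell, trans.length);
        } else {
            std::memcpy(cell, trans.data, trans.length);
        }
        delay_ns += latency_ns_;  // annotate timing instead of waiting
        trans.status = Status::Ok;
    }

private:
    std::vector<unsigned char> bytes_;
    std::uint64_t latency_ns_;
};

// Initiator-side helper: one 32-bit read or write through the interface.
inline Status access32(BlockingTransport& target, Command cmd,
                       std::uint64_t addr, std::uint32_t& value,
                       std::uint64_t& delay_ns) {
    Payload p;
    p.command = cmd;
    p.address = addr;
    p.data = reinterpret_cast<unsigned char*>(&value);
    p.length = sizeof(value);
    target.b_transport(p, delay_ns);
    return p.status;
}
```

Because the initiator only sees the `BlockingTransport` interface, the memory could be swapped for any other target without touching the initiator code, which is the same decoupling the sockets provide in the SystemC version.
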

> [!warning]
> This section is under #development

### A RISC-V-Based Virtual Platform in SystemC

As we commented before, the [[Semiconductors#RISC-V|RISC-V]] ecosystem is rapidly growing, ranging from HW, e.g., various HW implementations (free as well as commercial), to high-speed Instruction Set Simulators (ISSs). These ISSs facilitate functional verification of RTL implementations as well as early SW development to some extent. However, being designed predominantly for speed, they can hardly be extended to support further system-level use cases such as design space exploration, power/timing/performance validation, or analysis of complex HW/SW interactions.

The RISC-V ecosystem already has various high-speed ISSs such as the reference simulator Spike, RISCV-QEMU, RV8, or DBT-RISE. They are mainly designed to simulate as fast as possible and predominantly employ [[Semiconductors#Dynamic Translation|dynamic binary translation]] (to x86_64) techniques. This is, however, a trade-off, as accurately modeling power or timing information for instructions becomes much more challenging. The full-system simulator gem5, at the time of writing, also has initial support for RISC-V. gem5 provides more detailed models of processors and memories and can in principle also be extended for accurate modeling of extra-functional properties. Renode is another full-system simulator with RISC-V support. Renode puts a particular focus on modeling and debugging multi-node networks of embedded systems. FORVIS and GRIFT are Haskell-based implementations that aim to provide an executable formalization of the RISC-V ISA to be used as a foundation for several (formal) analysis techniques. SAIL-RISCV aims to be another RISC-V formalization that is implemented in Sail, which is a special language for describing ISAs with support for the generation of simulator back-ends (in C and OCaml) as well as theorem-prover definitions.
Commercial VP tools such as Synopsys Virtualizer or Mentor Vista might also support RISC-V, but their implementation is proprietary.

This section presents an open-source RISC-V-based Virtual Platform implemented in SystemC/TLM, which attempts to close the gap in virtual prototyping in the RISC-V ecosystem. The VP shown here^[https://github.com/vherdt/riscv-vp/] is implemented in SystemC and designed as an extensible and configurable platform around a RISC-V RV32/64IMAC (multi-)core with a generic bus system employing TLM 2.0 communication and support for the GNU toolchain with SW coverage measurement (GCOV) and debug capabilities (using GDB). A block diagram of the VP is shown in the figure below.

![](riscv_vp.png)

> [!Figure]
> RISC-V Virtual Platform Block Diagram (source: #ref/Herdt )

The simulated CPU core loads, decodes, and executes one instruction after another. RISC-V compressed instructions are expanded on the fly into regular instructions in a pre-processing step before being passed to the normal execution unit. The VP provides a 32-bit and a 64-bit core supporting the RISC-V RV32IMAC and RV64IMAC instruction sets, respectively. Besides the mandatory Machine mode, each core implements the RISC-V Supervisor and User mode privilege levels and provides support for user mode interrupt and trap handling (N extension). This includes the Control and Status Registers (CSRs) for the corresponding privilege levels (as specified in the RISC-V privileged architecture specification) as well as instructions for interrupt handling and environment interaction.

Multiple RISC-V cores can be integrated to build a multi-core platform. It is also possible to mix 32- and 64-bit cores. The Atomic extension provides instructions to enable synchronization of the cores. Each core is attached to the bus through a memory interface.
Essentially, the memory interface translates load/store requests into [[Target Simulation (Virtual Platforms)#SystemC/TLM|TLM]] transactions and ensures that the atomic instructions are handled correctly.

The TLM bus is responsible for routing transactions from an initiator, i.e., (bus) master, to a target. Therefore, all target components are attached to the TLM bus at specific non-overlapping address ranges. The bus will match the transaction address with the address ranges and dispatch the transaction accordingly to the matching target. Note that, in this process, the bus performs a global-to-local address translation in the transaction. For example, assume that a sensor component is mapped to the address range (start=0x50000000, end=0x50001000) and the transaction address is 0x50000010; then the bus will route the transaction to the sensor and change the transaction address to 0x00000010 before passing it on to the sensor. Thus the sensor works on local address ranges.

The TLM bus supports multiple masters initiating transactions. Currently, the CPU core as well as the DMA controller are configured as bus masters. Note that a single component can be both master and target: the DMA controller receives transactions initiated by the CPU core to configure the source and destination address ranges, and it also initiates transactions by itself to perform the memory access operations without the CPU core.

Traps and interrupts result in the CPU core performing a context switch to the trap/interrupt handler. Traps are raised to perform a system call or when an execution exception, e.g., an invalid memory access, is encountered. Two sources of interrupts are available: (1) local and (2) external. Essentially, there are two sources of local interrupts: SW as well as timer interrupts generated by the RISC-V-specific CLINT (Core Local INTerruptor). The timer is part of the CLINT, and the interrupt frequency can be configured for each core through memory-mapped I/O.
The CLINT also provides a memory-mapped register for each core to trigger a SW interrupt for the corresponding core. External interrupts are all remaining interrupts triggered by the various components in the system. To handle external interrupts, we provide the RISC-V-specific PLIC (Platform Level Interrupt Controller). The PLIC collects and prioritizes all external interrupts and then routes them to each CPU core one by one. The core that claims the interrupt first will process it. According to the RISC-V specification, external interrupts are processed with higher priority than local interrupts, and SW interrupts take priority over timer interrupts.

The C/C++ library defines a set of system calls as an abstraction from the actual execution environment. For example, the `printf` function performs the formatting in platform-independent C code and finally invokes the `write` system call with a fixed char array. Typically, an embedded system provides a trap handler that redirects the `write` system call to a UART/terminal component. We also provide a SyscallHandler component to emulate system calls of the C/C++ library by redirecting them to the simulation host system. Our emulation layer, for example, allows the target SW to open, read, and write files of the host system. We use this functionality, among other things, to support SW coverage measurement with GCOV. The syscall handler can be called in one of two ways: (1) through a trap handler that redirects the system call to the syscall handler from SW using memory-mapped I/O (this approach enables a flexible redirection of selected system calls), or (2) by directly intercepting the system call (i.e., the RISC-V ECALL instruction) in the CPU core instead of jumping to the trap handler. The behavior is configurable per core.

The main function in the VP is responsible for instantiating, initializing, and connecting all components, i.e., setting up the architecture.
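
The interrupt priority order stated above (external before SW, SW before timer) can be illustrated with a small, self-contained sketch. It is not code from the VP: `PendingIrq` and `select_interrupt` are hypothetical names, and the sketch only considers the machine-level bits of the `mip`/`mie` CSRs, whose positions are taken from the RISC-V privileged specification. The global interrupt enable in `mstatus` and the supervisor/user-level interrupts are deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// Machine-level interrupt bit positions in the mip/mie CSRs.
constexpr uint32_t MSIP_BIT = 3;   // machine software interrupt (CLINT)
constexpr uint32_t MTIP_BIT = 7;   // machine timer interrupt (CLINT)
constexpr uint32_t MEIP_BIT = 11;  // machine external interrupt (PLIC)

enum class PendingIrq { None, External, Software, Timer };

// Pick the interrupt to take next: an interrupt is a candidate only if it is
// both pending (mip) and enabled (mie); candidates are checked in priority order.
inline PendingIrq select_interrupt(uint32_t mip, uint32_t mie) {
	const uint32_t ready = mip & mie;
	if (ready & (1u << MEIP_BIT))
		return PendingIrq::External;
	if (ready & (1u << MSIP_BIT))
		return PendingIrq::Software;
	if (ready & (1u << MTIP_BIT))
		return PendingIrq::Timer;
	return PendingIrq::None;
}
```

If a timer and an external interrupt are pending and enabled at the same time, the function returns `PendingIrq::External`; the timer interrupt stays pending and is selected on a later call once the external interrupt has been handled.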
An ELF loader is provided to parse and load an executable RISC-V ELF file into the memory and set up the program counter in the CPU core accordingly. Finally, the SystemC simulation is started. The ELF file is produced by the GNU toolchain by (cross-)compiling the application program and optionally linking it with the C/C++ standard library or other RISC-V libraries. The VP also supports a bare-metal execution environment without any additional libraries. Two building blocks of the VP, the simple memory model and the instruction set simulator, are listed below.

#### Simple Memory Simulation

```cpp
#ifndef RISCV_ISA_MEMORY_H
#define RISCV_ISA_MEMORY_H

#include <stdint.h>
#include <cassert>
#include <cstring>
#include <iostream>
#include <string>

#include <boost/iostreams/device/mapped_file.hpp>

#include "bus.h"

#include <tlm_utils/simple_target_socket.h>
#include <systemc>

struct SimpleMemory : public sc_core::sc_module {
	tlm_utils::simple_target_socket<SimpleMemory> tsock;

	uint8_t *data;
	uint32_t size;
	bool read_only;

	SimpleMemory(sc_core::sc_module_name, uint32_t size, bool read_only = false)
	    : data(new uint8_t[size]()), size(size), read_only(read_only) {
		// register the blocking, DMI, and debug transport callbacks on the target socket
		tsock.register_b_transport(this, &SimpleMemory::transport);
		tsock.register_get_direct_mem_ptr(this, &SimpleMemory::get_direct_mem_ptr);
		tsock.register_transport_dbg(this, &SimpleMemory::transport_dbg);
	}

	// copy the content of a binary file on the host into the memory at offset addr
	void load_binary_file(const std::string &filename, unsigned addr) {
		boost::iostreams::mapped_file_source f(filename);
		assert(f.is_open());
		write_data(addr, (const uint8_t *)f.data(), f.size());
	}

	void write_data(unsigned addr, const uint8_t *src, unsigned num_bytes) {
		assert(addr + num_bytes <= size);

		memcpy(data + addr, src, num_bytes);
	}

	void read_data(unsigned addr, uint8_t *dst, unsigned num_bytes) {
		assert(addr + num_bytes <= size);

		memcpy(dst, data + addr, num_bytes);
	}

	// blocking transport: perform the access and add a fixed access latency
	void transport(tlm::tlm_generic_payload &trans, sc_core::sc_time &delay) {
		transport_dbg(trans);
		delay += sc_core::sc_time(10, sc_core::SC_NS);
	}

	// debug transport: perform the access without consuming simulation time
	unsigned transport_dbg(tlm::tlm_generic_payload &trans) {
		tlm::tlm_command cmd = trans.get_command();
		unsigned addr = trans.get_address();
		auto *ptr = trans.get_data_ptr();
		auto len = trans.get_data_length();

		assert(addr < size);

		if (cmd == tlm::TLM_WRITE_COMMAND) {
			write_data(addr, ptr, len);
		} else if (cmd == tlm::TLM_READ_COMMAND) {
			read_data(addr, ptr, len);
		} else {
			sc_assert(false && "unsupported tlm command");
		}

		return len;
	}

	// direct memory interface: hand out a raw pointer to the backing storage
	bool get_direct_mem_ptr(tlm::tlm_generic_payload &trans, tlm::tlm_dmi &dmi) {
		(void)trans;
		dmi.set_start_address(0);
		dmi.set_end_address(size);
		dmi.set_dmi_ptr(data);
		if (read_only)
			dmi.allow_read();
		else
			dmi.allow_read_write();
		return true;
	}
};

#endif  // RISCV_ISA_MEMORY_H
```

#### Instruction Set Simulation

```cpp
#include "iss.h"

// to save *cout* format setting, see *ISS::show*
#include <boost/io/ios_state.hpp>
// for safe down-cast
#include <boost/lexical_cast.hpp>

using namespace rv32;

#define RAISE_ILLEGAL_INSTRUCTION() raise_trap(EXC_ILLEGAL_INSTR, instr.data());

#define REQUIRE_ISA(X)          \
	if (!(csrs.misa.reg & X)) \
	RAISE_ILLEGAL_INSTRUCTION()

#define RD instr.rd()
#define RS1 instr.rs1()
#define RS2 instr.rs2()
#define RS3 instr.rs3()

const char *regnames[] = {
    "zero (x0)", "ra (x1)",  "sp (x2)",  "gp (x3)",  "tp (x4)",  "t0 (x5)",  "t1 (x6)",   "t2 (x7)",
    "s0/fp(x8)", "s1 (x9)",  "a0 (x10)", "a1 (x11)", "a2 (x12)", "a3 (x13)", "a4 (x14)",  "a5 (x15)",
    "a6 (x16)",  "a7 (x17)", "s2 (x18)", "s3 (x19)", "s4 (x20)", "s5 (x21)", "s6 (x22)",  "s7 (x23)",
    "s8 (x24)",  "s9 (x25)", "s10 (x26)", "s11 (x27)", "t3 (x28)", "t4 (x29)", "t5 (x30)", "t6 (x31)",
};

int regcolors[] = {
#if defined(COLOR_THEME_DARK)
    0,  1,  2,  3,  4,  5,  6,  52, 8,  9,  53, 54, 55, 56, 57, 58,
    16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
#elif defined(COLOR_THEME_LIGHT)
    100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 153, 154, 155, 156, 157, 158,
    116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131,
#else
#endif
};

RegFile::RegFile() {
	memset(regs, 0, sizeof(regs));
}

RegFile::RegFile(const RegFile &other) {
	memcpy(regs, other.regs, sizeof(regs));
}

void RegFile::write(uint32_t index, int32_t value) {
	assert(index <= x31);
	assert(index != x0);
	regs[index] = value;
}

int32_t RegFile::read(uint32_t index) {
	if (index > x31)
		throw std::out_of_range("out-of-range register access");
	return regs[index];
}

uint32_t RegFile::shamt(uint32_t index) {
	assert(index <= x31);
	return BIT_RANGE(regs[index], 4, 0);
}

int32_t &RegFile::operator[](const uint32_t idx) {
	return regs[idx];
}

#if defined(COLOR_THEME_LIGHT) || defined(COLOR_THEME_DARK)
#define COLORFRMT "\e[38;5;%um%s\e[39m"
#define COLORPRINT(fmt, data) fmt, data
#else
#define COLORFRMT "%s"
#define COLORPRINT(fmt, data) data
#endif

void RegFile::show() {
	for (unsigned i = 0; i < NUM_REGS; ++i) {
		printf(COLORFRMT " = %8x\n", COLORPRINT(regcolors[i], regnames[i]), regs[i]);
	}
}

ISS::ISS(uint32_t hart_id, bool use_E_base_isa) : systemc_name("Core-" + std::to_string(hart_id)) {
	csrs.mhartid.reg = hart_id;
	if (use_E_base_isa)
		csrs.misa.select_E_base_isa();

	sc_core::sc_time qt = tlm::tlm_global_quantum::instance().get();
	cycle_time = sc_core::sc_time(10, sc_core::SC_NS);

	assert(qt >= cycle_time);
	assert(qt % cycle_time == sc_core::SC_ZERO_TIME);

	for (int i = 0; i < Opcode::NUMBER_OF_INSTRUCTIONS; ++i)
		instr_cycles[i] = cycle_time;

	const sc_core::sc_time memory_access_cycles = 4 * cycle_time;
	const sc_core::sc_time mul_div_cycles = 8 * cycle_time;

	instr_cycles[Opcode::LB] = memory_access_cycles;
	instr_cycles[Opcode::LBU] = memory_access_cycles;
	instr_cycles[Opcode::LH] = memory_access_cycles;
	instr_cycles[Opcode::LHU] = memory_access_cycles;
	instr_cycles[Opcode::LW] = memory_access_cycles;
	instr_cycles[Opcode::SB] = memory_access_cycles;
	instr_cycles[Opcode::SH] = memory_access_cycles;
	instr_cycles[Opcode::SW] = memory_access_cycles;
	instr_cycles[Opcode::MUL] = mul_div_cycles;
	instr_cycles[Opcode::MULH] = mul_div_cycles;
	instr_cycles[Opcode::MULHU] = mul_div_cycles;
	instr_cycles[Opcode::MULHSU] = mul_div_cycles;
	instr_cycles[Opcode::DIV] = mul_div_cycles;
	instr_cycles[Opcode::DIVU] = mul_div_cycles;
	instr_cycles[Opcode::REM] = mul_div_cycles;
	instr_cycles[Opcode::REMU] = mul_div_cycles;

	op = Opcode::UNDEF;
}

void ISS::exec_step() {
	assert(((pc & ~pc_alignment_mask()) == 0) && "misaligned instruction");

	try {
		uint32_t mem_word = instr_mem->load_instr(pc);
		instr = Instruction(mem_word);
	} catch (SimulationTrap &e) {
		op = Opcode::UNDEF;
		instr = Instruction(0);
		throw;
	}

	if (instr.is_compressed()) {
		op = instr.decode_and_expand_compressed(RV32);
		pc += 2;
		if (op != Opcode::UNDEF)
			REQUIRE_ISA(C_ISA_EXT);
	} else {
		op = instr.decode_normal(RV32);
		pc += 4;
	}

	if (trace) {
		printf("core %2u: prv %1x: pc %8x: %s ", csrs.mhartid.reg, prv, last_pc, Opcode::mappingStr[op]);
		switch (Opcode::getType(op)) {
			case Opcode::Type::R:
				printf(COLORFRMT ", " COLORFRMT ", " COLORFRMT,
				       COLORPRINT(regcolors[instr.rd()], regnames[instr.rd()]),
				       COLORPRINT(regcolors[instr.rs1()], regnames[instr.rs1()]),
				       COLORPRINT(regcolors[instr.rs2()], regnames[instr.rs2()]));
				break;
			case Opcode::Type::I:
				printf(COLORFRMT ", " COLORFRMT ", 0x%x",
				       COLORPRINT(regcolors[instr.rd()], regnames[instr.rd()]),
				       COLORPRINT(regcolors[instr.rs1()], regnames[instr.rs1()]), instr.I_imm());
				break;
			case Opcode::Type::S:
				printf(COLORFRMT ", " COLORFRMT ", 0x%x",
				       COLORPRINT(regcolors[instr.rs1()], regnames[instr.rs1()]),
				       COLORPRINT(regcolors[instr.rs2()], regnames[instr.rs2()]), instr.S_imm());
				break;
			case Opcode::Type::B:
				printf(COLORFRMT ", " COLORFRMT ", 0x%x",
				       COLORPRINT(regcolors[instr.rs1()], regnames[instr.rs1()]),
				       COLORPRINT(regcolors[instr.rs2()], regnames[instr.rs2()]), instr.B_imm());
				break;
			case Opcode::Type::U:
				printf(COLORFRMT ", 0x%x", COLORPRINT(regcolors[instr.rd()], regnames[instr.rd()]), instr.U_imm());
				break;
			case Opcode::Type::J:
				printf(COLORFRMT ", 0x%x", COLORPRINT(regcolors[instr.rd()], regnames[instr.rd()]), instr.J_imm());
				break;
			default:;
		}
		puts("");
	}

	switch (op) {
		case Opcode::UNDEF:
			if (trace)
				std::cout << "WARNING: unknown instruction '" << std::to_string(instr.data()) << "' at address '"
				          << std::to_string(last_pc) << "'" << std::endl;
			raise_trap(EXC_ILLEGAL_INSTR, instr.data());
			break;

		case Opcode::ADDI:
			regs[instr.rd()] = regs[instr.rs1()] + instr.I_imm();
			break;

		case Opcode::SLTI:
			regs[instr.rd()] = regs[instr.rs1()] < instr.I_imm();
			break;

		case Opcode::SLTIU:
			regs[instr.rd()] = ((uint32_t)regs[instr.rs1()]) < ((uint32_t)instr.I_imm());
			break;

		case Opcode::XORI:
			regs[instr.rd()] = regs[instr.rs1()] ^ instr.I_imm();
			break;

		case Opcode::ORI:
			regs[instr.rd()] = regs[instr.rs1()] | instr.I_imm();
			break;

		case Opcode::ANDI:
			regs[instr.rd()] = regs[instr.rs1()] & instr.I_imm();
			break;

		case Opcode::ADD:
			regs[instr.rd()] = regs[instr.rs1()] + regs[instr.rs2()];
			break;

		case Opcode::SUB:
			regs[instr.rd()] = regs[instr.rs1()] - regs[instr.rs2()];
			break;

		case Opcode::SLL:
			regs[instr.rd()] = regs[instr.rs1()] << regs.shamt(instr.rs2());
			break;

		case Opcode::SLT:
			regs[instr.rd()] = regs[instr.rs1()] < regs[instr.rs2()];
			break;

		case Opcode::SLTU:
			regs[instr.rd()] = ((uint32_t)regs[instr.rs1()]) < ((uint32_t)regs[instr.rs2()]);
			break;

		case Opcode::SRL:
			regs[instr.rd()] = ((uint32_t)regs[instr.rs1()]) >> regs.shamt(instr.rs2());
			break;

		case Opcode::SRA:
			regs[instr.rd()] = regs[instr.rs1()] >> regs.shamt(instr.rs2());
			break;

		case Opcode::XOR:
			regs[instr.rd()] = regs[instr.rs1()] ^ regs[instr.rs2()];
			break;

		case Opcode::OR:
			regs[instr.rd()] = regs[instr.rs1()] | regs[instr.rs2()];
			break;

		case Opcode::AND:
			regs[instr.rd()] = regs[instr.rs1()] & regs[instr.rs2()];
			break;

		case Opcode::SLLI:
			regs[instr.rd()] = regs[instr.rs1()] << instr.shamt();
			break;

		case Opcode::SRLI:
			regs[instr.rd()] = ((uint32_t)regs[instr.rs1()]) >> instr.shamt();
			break;

		case Opcode::SRAI:
			regs[instr.rd()] = regs[instr.rs1()] >> instr.shamt();
			break;

		case Opcode::LUI:
			regs[instr.rd()] = instr.U_imm();
			break;

		case Opcode::AUIPC:
			regs[instr.rd()] = last_pc + instr.U_imm();
			break;

		case Opcode::JAL: {
			auto link = pc;
			pc = last_pc + instr.J_imm();
			trap_check_pc_alignment();
			regs[instr.rd()] = link;
		} break;

		case Opcode::JALR: {
			auto link = pc;
			pc = (regs[instr.rs1()] + instr.I_imm()) & ~1;
			trap_check_pc_alignment();
			regs[instr.rd()] = link;
		} break;

		case Opcode::SB: {
			uint32_t addr = regs[instr.rs1()] + instr.S_imm();
			mem->store_byte(addr, regs[instr.rs2()]);
		} break;

		case Opcode::SH: {
			uint32_t addr = regs[instr.rs1()] + instr.S_imm();
			trap_check_addr_alignment<2, false>(addr);
			mem->store_half(addr, regs[instr.rs2()]);
		} break;

		case Opcode::SW: {
			uint32_t addr = regs[instr.rs1()] + instr.S_imm();
			trap_check_addr_alignment<4, false>(addr);
			mem->store_word(addr, regs[instr.rs2()]);
		} break;

		case Opcode::LB: {
			uint32_t addr = regs[instr.rs1()] + instr.I_imm();
			regs[instr.rd()] = mem->load_byte(addr);
		} break;

		case Opcode::LH: {
			uint32_t addr = regs[instr.rs1()] + instr.I_imm();
			trap_check_addr_alignment<2, true>(addr);
			regs[instr.rd()] = mem->load_half(addr);
		} break;

		case Opcode::LW: {
			uint32_t addr = regs[instr.rs1()] + instr.I_imm();
			trap_check_addr_alignment<4, true>(addr);
			regs[instr.rd()] = mem->load_word(addr);
		} break;

		case Opcode::LBU: {
			uint32_t addr = regs[instr.rs1()] + instr.I_imm();
			regs[instr.rd()] = mem->load_ubyte(addr);
		} break;

		case Opcode::LHU: {
			uint32_t addr = regs[instr.rs1()] + instr.I_imm();
			trap_check_addr_alignment<2, true>(addr);
			regs[instr.rd()] = mem->load_uhalf(addr);
		} break;

		case Opcode::BEQ:
			if (regs[instr.rs1()] == regs[instr.rs2()]) {
				pc = last_pc + instr.B_imm();
				trap_check_pc_alignment();
			}
			break;

		case Opcode::BNE:
			if (regs[instr.rs1()] != regs[instr.rs2()]) {
				pc = last_pc + instr.B_imm();
				trap_check_pc_alignment();
			}
			break;

		case Opcode::BLT:
			if (regs[instr.rs1()] < regs[instr.rs2()]) {
				pc = last_pc + instr.B_imm();
				trap_check_pc_alignment();
			}
			break;

		case Opcode::BGE:
			if (regs[instr.rs1()] >= regs[instr.rs2()]) {
				pc = last_pc + instr.B_imm();
				trap_check_pc_alignment();
			}
			break;

		case Opcode::BLTU:
			if ((uint32_t)regs[instr.rs1()] < (uint32_t)regs[instr.rs2()]) {
				pc = last_pc + instr.B_imm();
				trap_check_pc_alignment();
			}
			break;

		case Opcode::BGEU:
			if ((uint32_t)regs[instr.rs1()] >= (uint32_t)regs[instr.rs2()]) {
				pc = last_pc + instr.B_imm();
				trap_check_pc_alignment();
			}
			break;

		case Opcode::FENCE:
		case Opcode::FENCE_I: {
			// not using out of order execution so can be ignored
		} break;

		case Opcode::ECALL: {
			if (sys) {
				sys->execute_syscall(this);
			} else {
				switch (prv) {
					case MachineMode:
						raise_trap(EXC_ECALL_M_MODE, last_pc);
						break;
					case SupervisorMode:
						raise_trap(EXC_ECALL_S_MODE, last_pc);
						break;
					case UserMode:
						raise_trap(EXC_ECALL_U_MODE, last_pc);
						break;
					default:
						throw std::runtime_error("unknown privilege level " + std::to_string(prv));
				}
			}
		} break;

		case Opcode::EBREAK: {
			// TODO: also raise trap and let the SW deal with it?
			status = CoreExecStatus::HitBreakpoint;
		} break;

		case Opcode::CSRRW: {
			auto addr = instr.csr();
			if (is_invalid_csr_access(addr, true)) {
				RAISE_ILLEGAL_INSTRUCTION();
			} else {
				auto rd = instr.rd();
				auto rs1_val = regs[instr.rs1()];
				if (rd != RegFile::zero) {
					regs[instr.rd()] = get_csr_value(addr);
				}
				set_csr_value(addr, rs1_val);
			}
		} break;

		case Opcode::CSRRS: {
			auto addr = instr.csr();
			auto rs1 = instr.rs1();
			auto write = rs1 != RegFile::zero;
			if (is_invalid_csr_access(addr, write)) {
				RAISE_ILLEGAL_INSTRUCTION();
			} else {
				auto rd = instr.rd();
				auto rs1_val = regs[rs1];
				auto csr_val = get_csr_value(addr);
				if (rd != RegFile::zero)
					regs[rd] = csr_val;
				if (write)
					set_csr_value(addr, csr_val | rs1_val);
			}
		} break;

		case Opcode::CSRRC: {
			auto addr = instr.csr();
			auto rs1 = instr.rs1();
			auto write = rs1 != RegFile::zero;
			if (is_invalid_csr_access(addr, write)) {
				RAISE_ILLEGAL_INSTRUCTION();
			} else {
				auto rd = instr.rd();
				auto rs1_val = regs[rs1];
				auto csr_val = get_csr_value(addr);
				if (rd != RegFile::zero)
					regs[rd] = csr_val;
				if (write)
					set_csr_value(addr, csr_val & ~rs1_val);
			}
		} break;

		case Opcode::CSRRWI: {
			auto addr = instr.csr();
			if (is_invalid_csr_access(addr, true)) {
				RAISE_ILLEGAL_INSTRUCTION();
			} else {
				auto rd = instr.rd();
				if (rd != RegFile::zero) {
					regs[rd] = get_csr_value(addr);
				}
				set_csr_value(addr, instr.zimm());
			}
		} break;

		case Opcode::CSRRSI: {
			auto addr = instr.csr();
			auto zimm = instr.zimm();
			auto write = zimm != 0;
			if (is_invalid_csr_access(addr, write)) {
				RAISE_ILLEGAL_INSTRUCTION();
			} else {
				auto csr_val = get_csr_value(addr);
				auto rd = instr.rd();
				if (rd != RegFile::zero)
					regs[rd] = csr_val;
				if (write)
					set_csr_value(addr, csr_val | zimm);
			}
		} break;

		case Opcode::CSRRCI: {
			auto addr = instr.csr();
			auto zimm = instr.zimm();
			auto write = zimm != 0;
			if (is_invalid_csr_access(addr, write)) {
				RAISE_ILLEGAL_INSTRUCTION();
			} else {
				auto csr_val = get_csr_value(addr);
				auto rd = instr.rd();
				if (rd != RegFile::zero)
					regs[rd] = csr_val;
				if (write)
					set_csr_value(addr, csr_val & ~zimm);
			}
		} break;

		case Opcode::MUL: {
			REQUIRE_ISA(M_ISA_EXT);
			int64_t ans = (int64_t)regs[instr.rs1()] * (int64_t)regs[instr.rs2()];
			regs[instr.rd()] = ans & 0xFFFFFFFF;
		} break;

		case Opcode::MULH: {
			REQUIRE_ISA(M_ISA_EXT);
			int64_t ans = (int64_t)regs[instr.rs1()] * (int64_t)regs[instr.rs2()];
			regs[instr.rd()] = (ans & 0xFFFFFFFF00000000) >> 32;
		} break;

		case Opcode::MULHU: {
			REQUIRE_ISA(M_ISA_EXT);
			int64_t ans = ((uint64_t)(uint32_t)regs[instr.rs1()]) * (uint64_t)((uint32_t)regs[instr.rs2()]);
			regs[instr.rd()] = (ans & 0xFFFFFFFF00000000) >> 32;
		} break;

		case Opcode::MULHSU: {
			REQUIRE_ISA(M_ISA_EXT);
			int64_t ans = (int64_t)regs[instr.rs1()] * (uint64_t)((uint32_t)regs[instr.rs2()]);
			regs[instr.rd()] = (ans & 0xFFFFFFFF00000000) >> 32;
		} break;

		case Opcode::DIV: {
			REQUIRE_ISA(M_ISA_EXT);
			auto a = regs[instr.rs1()];
			auto b = regs[instr.rs2()];
			if (b == 0) {
				regs[instr.rd()] = -1;
			} else if (a == REG_MIN && b == -1) {
				regs[instr.rd()] = a;
			} else {
				regs[instr.rd()] = a / b;
			}
		} break;

		case Opcode::DIVU: {
			REQUIRE_ISA(M_ISA_EXT);
			auto a = regs[instr.rs1()];
			auto b = regs[instr.rs2()];
			if (b == 0) {
				regs[instr.rd()] = -1;
			} else {
				regs[instr.rd()] = (uint32_t)a / (uint32_t)b;
			}
		} break;

		case Opcode::REM: {
			REQUIRE_ISA(M_ISA_EXT);
			auto a = regs[instr.rs1()];
			auto b = regs[instr.rs2()];
			if (b == 0) {
				regs[instr.rd()] = a;
			} else if (a == REG_MIN && b == -1) {
				regs[instr.rd()] = 0;
			} else {
				regs[instr.rd()] = a % b;
			}
		} break;

		case Opcode::REMU: {
			REQUIRE_ISA(M_ISA_EXT);
			auto a = regs[instr.rs1()];
			auto b = regs[instr.rs2()];
			if (b == 0) {
				regs[instr.rd()] = a;
			} else {
				regs[instr.rd()] = (uint32_t)a % (uint32_t)b;
			}
		} break;

		case Opcode::LR_W: {
			REQUIRE_ISA(A_ISA_EXT);
			uint32_t addr = regs[instr.rs1()];
			trap_check_addr_alignment<4, true>(addr);
			regs[instr.rd()] = mem->atomic_load_reserved_word(addr);
			if (lr_sc_counter == 0)
				lr_sc_counter = 17;  // this instruction + 16 additional ones, (an over-approximation) to cover the RISC-V forward progress property
		} break;

		case Opcode::SC_W: {
			REQUIRE_ISA(A_ISA_EXT);
			uint32_t addr = regs[instr.rs1()];
			trap_check_addr_alignment<4, false>(addr);
			uint32_t val = regs[instr.rs2()];
			regs[instr.rd()] = 1;  // failure by default (in case a trap is thrown)
			regs[instr.rd()] = mem->atomic_store_conditional_word(addr, val) ? 0 : 1;  // overwrite result (in case no trap is thrown)
			lr_sc_counter = 0;
		} break;

		case Opcode::AMOSWAP_W: {
			REQUIRE_ISA(A_ISA_EXT);
			execute_amo(instr, [](int32_t a, int32_t b) {
				(void)a;
				return b;
			});
		} break;

		case Opcode::AMOADD_W: {
			REQUIRE_ISA(A_ISA_EXT);
			execute_amo(instr, [](int32_t a, int32_t b) { return a + b; });
		} break;

		case Opcode::AMOXOR_W: {
			REQUIRE_ISA(A_ISA_EXT);
			execute_amo(instr, [](int32_t a, int32_t b) { return a ^ b; });
		} break;

		case Opcode::AMOAND_W: {
			REQUIRE_ISA(A_ISA_EXT);
			execute_amo(instr, [](int32_t a, int32_t b) { return a & b; });
		} break;

		case Opcode::AMOOR_W: {
			REQUIRE_ISA(A_ISA_EXT);
			execute_amo(instr, [](int32_t a, int32_t b) { return a | b; });
		} break;

		case Opcode::AMOMIN_W: {
			REQUIRE_ISA(A_ISA_EXT);
			execute_amo(instr, [](int32_t a, int32_t b) { return std::min(a, b); });
		} break;

		case Opcode::AMOMINU_W: {
			REQUIRE_ISA(A_ISA_EXT);
			execute_amo(instr, [](int32_t a, int32_t b) { return std::min((uint32_t)a, (uint32_t)b); });
		} break;

		case Opcode::AMOMAX_W: {
			REQUIRE_ISA(A_ISA_EXT);
			execute_amo(instr, [](int32_t a, int32_t b) { return std::max(a, b); });
		} break;

		case Opcode::AMOMAXU_W: {
			REQUIRE_ISA(A_ISA_EXT);
			execute_amo(instr, [](int32_t a, int32_t b) { return std::max((uint32_t)a, (uint32_t)b); });
		} break;

		// RV32F Extension

		case Opcode::FLW: {
			REQUIRE_ISA(F_ISA_EXT);
			uint32_t addr = regs[instr.rs1()] + instr.I_imm();
			trap_check_addr_alignment<4, true>(addr);
			fp_regs.write(RD, float32_t{(uint32_t)mem->load_word(addr)});
		} break;

		case Opcode::FSW: {
			REQUIRE_ISA(F_ISA_EXT);
			uint32_t addr = regs[instr.rs1()] + instr.S_imm();
			trap_check_addr_alignment<4, false>(addr);
			mem->store_word(addr, fp_regs.u32(RS2));
		} break;

		case Opcode::FADD_S: {
			REQUIRE_ISA(F_ISA_EXT);
			fp_prepare_instr();
			fp_setup_rm();
			fp_regs.write(RD, f32_add(fp_regs.f32(RS1), fp_regs.f32(RS2)));
			fp_finish_instr();
		} break;

		case Opcode::FSUB_S: {
			REQUIRE_ISA(F_ISA_EXT);
			fp_prepare_instr();
			fp_setup_rm();
			fp_regs.write(RD, f32_sub(fp_regs.f32(RS1), fp_regs.f32(RS2)));
			fp_finish_instr();
		} break;

		case Opcode::FMUL_S: {
			REQUIRE_ISA(F_ISA_EXT);
			fp_prepare_instr();
			fp_setup_rm();
			fp_regs.write(RD, f32_mul(fp_regs.f32(RS1), fp_regs.f32(RS2)));
			fp_finish_instr();
		} break;

		case Opcode::FDIV_S: {
			REQUIRE_ISA(F_ISA_EXT);
			fp_prepare_instr();
			fp_setup_rm();
			fp_regs.write(RD, f32_div(fp_regs.f32(RS1), fp_regs.f32(RS2)));
			fp_finish_instr();
		} break;

		case Opcode::FSQRT_S: {
			REQUIRE_ISA(F_ISA_EXT);
			fp_prepare_instr();
			fp_setup_rm();
			fp_regs.write(RD, f32_sqrt(fp_regs.f32(RS1)));
			fp_finish_instr();
		} break;

		case Opcode::FMIN_S: {
			REQUIRE_ISA(F_ISA_EXT);
			fp_prepare_instr();

			bool rs1_smaller = f32_lt_quiet(fp_regs.f32(RS1), fp_regs.f32(RS2)) ||
			                   (f32_eq(fp_regs.f32(RS1), fp_regs.f32(RS2)) && f32_isNegative(fp_regs.f32(RS1)));

			if (f32_isNaN(fp_regs.f32(RS1)) && f32_isNaN(fp_regs.f32(RS2))) {
				fp_regs.write(RD, f32_defaultNaN);
			} else {
				if (rs1_smaller)
					fp_regs.write(RD, fp_regs.f32(RS1));
				else
					fp_regs.write(RD, fp_regs.f32(RS2));
			}

			fp_finish_instr();
		} break;

		case Opcode::FMAX_S: {
			REQUIRE_ISA(F_ISA_EXT);
			fp_prepare_instr();

			bool rs1_greater = f32_lt_quiet(fp_regs.f32(RS2), fp_regs.f32(RS1)) ||
			                   (f32_eq(fp_regs.f32(RS2), fp_regs.f32(RS1)) && f32_isNegative(fp_regs.f32(RS2)));

			if (f32_isNaN(fp_regs.f32(RS1)) && f32_isNaN(fp_regs.f32(RS2))) {
				fp_regs.write(RD, f32_defaultNaN);
			} else {
				if (rs1_greater)
					fp_regs.write(RD, fp_regs.f32(RS1));
				else
					fp_regs.write(RD, fp_regs.f32(RS2));
			}

			fp_finish_instr();
		} break;

		case Opcode::FMADD_S: {
			REQUIRE_ISA(F_ISA_EXT);
			fp_prepare_instr();
			fp_setup_rm();
			fp_regs.write(RD, f32_mulAdd(fp_regs.f32(RS1), fp_regs.f32(RS2), fp_regs.f32(RS3)));
			fp_finish_instr();
		} break;

		case Opcode::FMSUB_S: {
			REQUIRE_ISA(F_ISA_EXT);
			fp_prepare_instr();
			fp_setup_rm();
			fp_regs.write(RD, f32_mulAdd(fp_regs.f32(RS1), fp_regs.f32(RS2), f32_neg(fp_regs.f32(RS3))));
			fp_finish_instr();
		} break;

		case Opcode::FNMADD_S: {
			REQUIRE_ISA(F_ISA_EXT);
			fp_prepare_instr();
			fp_setup_rm();
			fp_regs.write(RD, f32_mulAdd(f32_neg(fp_regs.f32(RS1)), fp_regs.f32(RS2), f32_neg(fp_regs.f32(RS3))));
			fp_finish_instr();
		} break;

		case Opcode::FNMSUB_S: {
			REQUIRE_ISA(F_ISA_EXT);
			fp_prepare_instr();
			fp_setup_rm();
			fp_regs.write(RD, f32_mulAdd(f32_neg(fp_regs.f32(RS1)), fp_regs.f32(RS2), fp_regs.f32(RS3)));
			fp_finish_instr();
		} break;

		case Opcode::FCVT_W_S: {
			REQUIRE_ISA(F_ISA_EXT);
			fp_prepare_instr();
			fp_setup_rm();
			regs[RD] = f32_to_i32(fp_regs.f32(RS1), softfloat_roundingMode, true);
			fp_finish_instr();
		} break;

		case Opcode::FCVT_WU_S: {
			REQUIRE_ISA(F_ISA_EXT);
			fp_prepare_instr();
			fp_setup_rm();
			regs[RD] = f32_to_ui32(fp_regs.f32(RS1), softfloat_roundingMode, true);
			fp_finish_instr();
		} break;

		case Opcode::FCVT_S_W: {
			REQUIRE_ISA(F_ISA_EXT);
			fp_prepare_instr();
			fp_setup_rm();
			fp_regs.write(RD, i32_to_f32(regs[RS1]));
			fp_finish_instr();
		} break;

		case Opcode::FCVT_S_WU: {
			REQUIRE_ISA(F_ISA_EXT);
			fp_prepare_instr();
			fp_setup_rm();
			fp_regs.write(RD, ui32_to_f32(regs[RS1]));
			fp_finish_instr();
		} break;

		case Opcode::FSGNJ_S: {
			REQUIRE_ISA(F_ISA_EXT);
			fp_prepare_instr();
			auto f1 = fp_regs.f32(RS1);
			auto f2 = fp_regs.f32(RS2);
			fp_regs.write(RD, float32_t{(f1.v & ~F32_SIGN_BIT) | (f2.v & F32_SIGN_BIT)});
			fp_set_dirty();
		} break;

		case Opcode::FSGNJN_S: {
			REQUIRE_ISA(F_ISA_EXT);
			fp_prepare_instr();
			auto f1 = fp_regs.f32(RS1);
			auto f2 = fp_regs.f32(RS2);
			fp_regs.write(RD, float32_t{(f1.v & ~F32_SIGN_BIT) | (~f2.v & F32_SIGN_BIT)});
			fp_set_dirty();
		} break;

		case Opcode::FSGNJX_S: {
			REQUIRE_ISA(F_ISA_EXT);
			fp_prepare_instr();
			auto f1 = fp_regs.f32(RS1);
			auto f2 = fp_regs.f32(RS2);
			fp_regs.write(RD, float32_t{f1.v ^ (f2.v & F32_SIGN_BIT)});
			fp_set_dirty();
		} break;

		case Opcode::FMV_W_X: {
			REQUIRE_ISA(F_ISA_EXT);
			fp_prepare_instr();
			fp_regs.write(RD, float32_t{(uint32_t)regs[RS1]});
			fp_set_dirty();
		} break;

		case Opcode::FMV_X_W: {
			REQUIRE_ISA(F_ISA_EXT);
			fp_prepare_instr();
			regs[RD] = fp_regs.u32(RS1);
		} break;

		case Opcode::FEQ_S: {
			REQUIRE_ISA(F_ISA_EXT);
			fp_prepare_instr();
			regs[RD] = f32_eq(fp_regs.f32(RS1), fp_regs.f32(RS2));
			fp_update_exception_flags();
		} break;

		case Opcode::FLT_S: {
			REQUIRE_ISA(F_ISA_EXT);
			fp_prepare_instr();
			regs[RD] = f32_lt(fp_regs.f32(RS1), fp_regs.f32(RS2));
			fp_update_exception_flags();
		} break;

		case Opcode::FLE_S: {
			REQUIRE_ISA(F_ISA_EXT);
			fp_prepare_instr();
			regs[RD] = f32_le(fp_regs.f32(RS1), fp_regs.f32(RS2));
			fp_update_exception_flags();
		} break;

		case Opcode::FCLASS_S: {
			REQUIRE_ISA(F_ISA_EXT);
			fp_prepare_instr();
			regs[RD] = f32_classify(fp_regs.f32(RS1));
		} break;

		// RV32D Extension

		case Opcode::FLD: {
			REQUIRE_ISA(D_ISA_EXT);
			uint32_t addr = regs[instr.rs1()] + instr.I_imm();
			trap_check_addr_alignment<8, true>(addr);
			fp_regs.write(RD, float64_t{(uint64_t)mem->load_double(addr)});
		} break;

		case Opcode::FSD: {
			REQUIRE_ISA(D_ISA_EXT);
			uint32_t addr = regs[instr.rs1()] + instr.S_imm();
			trap_check_addr_alignment<8, false>(addr);
			mem->store_double(addr, fp_regs.f64(RS2).v);
		} break;

		case Opcode::FADD_D: {
			REQUIRE_ISA(D_ISA_EXT);
			fp_prepare_instr();
			fp_setup_rm();
			fp_regs.write(RD, f64_add(fp_regs.f64(RS1), fp_regs.f64(RS2)));
			fp_finish_instr();
		} break;

		case Opcode::FSUB_D: {
			REQUIRE_ISA(D_ISA_EXT);
			fp_prepare_instr();
			fp_setup_rm();
			fp_regs.write(RD, f64_sub(fp_regs.f64(RS1), fp_regs.f64(RS2)));
			fp_finish_instr();
		} break;

		case Opcode::FMUL_D: {
			REQUIRE_ISA(D_ISA_EXT);
			fp_prepare_instr();
			fp_setup_rm();
			fp_regs.write(RD, f64_mul(fp_regs.f64(RS1), fp_regs.f64(RS2)));
			fp_finish_instr();
		} break;

		case Opcode::FDIV_D: {
			REQUIRE_ISA(D_ISA_EXT);
			fp_prepare_instr();
			fp_setup_rm();
			fp_regs.write(RD, f64_div(fp_regs.f64(RS1), fp_regs.f64(RS2)));
			fp_finish_instr();
		} break;

		case Opcode::FSQRT_D: {
			REQUIRE_ISA(D_ISA_EXT);
			fp_prepare_instr();
			fp_setup_rm();
			fp_regs.write(RD, f64_sqrt(fp_regs.f64(RS1)));
			fp_finish_instr();
		} break;

		case Opcode::FMIN_D: {
			REQUIRE_ISA(D_ISA_EXT);
			fp_prepare_instr();

			bool rs1_smaller = f64_lt_quiet(fp_regs.f64(RS1), fp_regs.f64(RS2)) ||
			                   (f64_eq(fp_regs.f64(RS1), fp_regs.f64(RS2)) && f64_isNegative(fp_regs.f64(RS1)));

			if (f64_isNaN(fp_regs.f64(RS1)) && f64_isNaN(fp_regs.f64(RS2))) {
				fp_regs.write(RD, f64_defaultNaN);
			} else {
				if (rs1_smaller)
					fp_regs.write(RD, fp_regs.f64(RS1));
				else
					fp_regs.write(RD, fp_regs.f64(RS2));
			}

			fp_finish_instr();
		} break;

		case Opcode::FMAX_D: {
			REQUIRE_ISA(D_ISA_EXT);
			fp_prepare_instr();

			bool rs1_greater = f64_lt_quiet(fp_regs.f64(RS2), fp_regs.f64(RS1)) ||
			                   (f64_eq(fp_regs.f64(RS2), fp_regs.f64(RS1)) && f64_isNegative(fp_regs.f64(RS2)));

			if (f64_isNaN(fp_regs.f64(RS1)) && f64_isNaN(fp_regs.f64(RS2))) {
				fp_regs.write(RD, f64_defaultNaN);
			} else {
				if (rs1_greater)
					fp_regs.write(RD, fp_regs.f64(RS1));
				else
					fp_regs.write(RD, fp_regs.f64(RS2));
			}

			fp_finish_instr();
		} break;

		case Opcode::FMADD_D: {
			REQUIRE_ISA(D_ISA_EXT);
			fp_prepare_instr();
			fp_setup_rm();
			fp_regs.write(RD, f64_mulAdd(fp_regs.f64(RS1), fp_regs.f64(RS2), fp_regs.f64(RS3)));
			fp_finish_instr();
		} break;

		case Opcode::FMSUB_D: {
			REQUIRE_ISA(D_ISA_EXT);
			fp_prepare_instr();
			fp_setup_rm();
			fp_regs.write(RD, f64_mulAdd(fp_regs.f64(RS1), fp_regs.f64(RS2), f64_neg(fp_regs.f64(RS3))));
			fp_finish_instr();
		} break;

		case Opcode::FNMADD_D: {
			REQUIRE_ISA(D_ISA_EXT);
			fp_prepare_instr();
			fp_setup_rm();
			fp_regs.write(RD, f64_mulAdd(f64_neg(fp_regs.f64(RS1)), fp_regs.f64(RS2), f64_neg(fp_regs.f64(RS3))));
			fp_finish_instr();
		} break;

		case Opcode::FNMSUB_D: {
			REQUIRE_ISA(D_ISA_EXT);
			fp_prepare_instr();
			fp_setup_rm();
			fp_regs.write(RD, f64_mulAdd(f64_neg(fp_regs.f64(RS1)), fp_regs.f64(RS2), fp_regs.f64(RS3)));
			fp_finish_instr();
		} break;

		case Opcode::FSGNJ_D: {
			REQUIRE_ISA(D_ISA_EXT);
			fp_prepare_instr();
			auto f1 = fp_regs.f64(RS1);
			auto f2 = fp_regs.f64(RS2);
			fp_regs.write(RD, float64_t{(f1.v & ~F64_SIGN_BIT) | (f2.v & F64_SIGN_BIT)});
			fp_set_dirty();
		} break;

		case Opcode::FSGNJN_D: {
			REQUIRE_ISA(D_ISA_EXT);
			fp_prepare_instr();
			auto f1 = fp_regs.f64(RS1);
			auto f2 = fp_regs.f64(RS2);
			fp_regs.write(RD, float64_t{(f1.v & ~F64_SIGN_BIT) | (~f2.v & F64_SIGN_BIT)});
			fp_set_dirty();
		} break;

		case Opcode::FSGNJX_D: {
			REQUIRE_ISA(D_ISA_EXT);
			fp_prepare_instr();
			auto f1 = fp_regs.f64(RS1);
			auto f2 = fp_regs.f64(RS2);
			fp_regs.write(RD, float64_t{f1.v ^ (f2.v & F64_SIGN_BIT)});
			fp_set_dirty();
		} break;

		case Opcode::FCVT_S_D: {
			REQUIRE_ISA(D_ISA_EXT);
			fp_prepare_instr();
			fp_setup_rm();
			fp_regs.write(RD, f64_to_f32(fp_regs.f64(RS1)));
			fp_finish_instr();
		} break;

		case Opcode::FCVT_D_S: {
			REQUIRE_ISA(D_ISA_EXT);
			fp_prepare_instr();
			fp_setup_rm();
			fp_regs.write(RD, f32_to_f64(fp_regs.f32(RS1)));
			fp_finish_instr();
		} break;

		case Opcode::FEQ_D: {
			REQUIRE_ISA(D_ISA_EXT);
			fp_prepare_instr();
			regs[RD] = f64_eq(fp_regs.f64(RS1), fp_regs.f64(RS2));
			fp_update_exception_flags();
		} break;

		case Opcode::FLT_D: {
			REQUIRE_ISA(D_ISA_EXT);
			fp_prepare_instr();
			regs[RD] = f64_lt(fp_regs.f64(RS1), fp_regs.f64(RS2));
			fp_update_exception_flags();
		} break;

		case Opcode::FLE_D: {
			REQUIRE_ISA(D_ISA_EXT);
			fp_prepare_instr();
			regs[RD] = f64_le(fp_regs.f64(RS1), fp_regs.f64(RS2));
			fp_update_exception_flags();
		} break;

		case Opcode::FCLASS_D: {
			REQUIRE_ISA(D_ISA_EXT);
			fp_prepare_instr();
			regs[RD] = (int64_t)f64_classify(fp_regs.f64(RS1));
		} break;

		case Opcode::FCVT_W_D: {
			REQUIRE_ISA(D_ISA_EXT);
			fp_prepare_instr();
			fp_setup_rm();
			regs[RD] = f64_to_i32(fp_regs.f64(RS1), softfloat_roundingMode, true);
			fp_finish_instr();
		} break;

		case Opcode::FCVT_WU_D: {
			REQUIRE_ISA(D_ISA_EXT);
			fp_prepare_instr();
			fp_setup_rm();
			regs[RD] = (int32_t)f64_to_ui32(fp_regs.f64(RS1), softfloat_roundingMode, true);
			fp_finish_instr();
		} break;

		case Opcode::FCVT_D_W: {
			REQUIRE_ISA(D_ISA_EXT);
			fp_prepare_instr();
			fp_setup_rm();
			fp_regs.write(RD, i32_to_f64((int32_t)regs[RS1]));
			fp_finish_instr();
		} break;

		case Opcode::FCVT_D_WU: {
			REQUIRE_ISA(D_ISA_EXT);
			fp_prepare_instr();
			fp_setup_rm();
			fp_regs.write(RD, ui32_to_f64((int32_t)regs[RS1]));
			fp_finish_instr();
		} break;

		// privileged instructions

		case Opcode::WFI:
			// NOTE: only a hint, can be implemented as NOP
			// std::cout << "[sim:wfi] CSR mstatus.mie " << csrs.mstatus->mie << std::endl;
release_lr_sc_reservation(); if (s_mode() && csrs.mstatus.tw) raise_trap(EXC_ILLEGAL_INSTR, instr.data()); if (u_mode() && csrs.misa.has_supervisor_mode_extension()) raise_trap(EXC_ILLEGAL_INSTR, instr.data()); if (!ignore_wfi && !has_local_pending_enabled_interrupts()) sc_core::wait(wfi_event); break; case Opcode::SFENCE_VMA: if (s_mode() && csrs.mstatus.tvm) raise_trap(EXC_ILLEGAL_INSTR, instr.data()); mem->flush_tlb(); break; case Opcode::URET: if (!csrs.misa.has_user_mode_extension()) raise_trap(EXC_ILLEGAL_INSTR, instr.data()); return_from_trap_handler(UserMode); break; case Opcode::SRET: if (!csrs.misa.has_supervisor_mode_extension() || (s_mode() && csrs.mstatus.tsr)) raise_trap(EXC_ILLEGAL_INSTR, instr.data()); return_from_trap_handler(SupervisorMode); break; case Opcode::MRET: return_from_trap_handler(MachineMode); break; // instructions accepted by decoder but not by this RV32IMACF ISS -> do normal trap // RV64I case Opcode::LWU: case Opcode::LD: case Opcode::SD: case Opcode::ADDIW: case Opcode::SLLIW: case Opcode::SRLIW: case Opcode::SRAIW: case Opcode::ADDW: case Opcode::SUBW: case Opcode::SLLW: case Opcode::SRLW: case Opcode::SRAW: // RV64M case Opcode::MULW: case Opcode::DIVW: case Opcode::DIVUW: case Opcode::REMW: case Opcode::REMUW: // RV64A case Opcode::LR_D: case Opcode::SC_D: case Opcode::AMOSWAP_D: case Opcode::AMOADD_D: case Opcode::AMOXOR_D: case Opcode::AMOAND_D: case Opcode::AMOOR_D: case Opcode::AMOMIN_D: case Opcode::AMOMAX_D: case Opcode::AMOMINU_D: case Opcode::AMOMAXU_D: // RV64F case Opcode::FCVT_L_S: case Opcode::FCVT_LU_S: case Opcode::FCVT_S_L: case Opcode::FCVT_S_LU: // RV64D case Opcode::FCVT_L_D: case Opcode::FCVT_LU_D: case Opcode::FMV_X_D: case Opcode::FCVT_D_L: case Opcode::FCVT_D_LU: case Opcode::FMV_D_X: RAISE_ILLEGAL_INSTRUCTION(); default: throw std::runtime_error("unknown opcode"); } } uint64_t ISS::_compute_and_get_current_cycles() { assert(cycle_counter % cycle_time == sc_core::SC_ZERO_TIME); assert(cycle_counter.value() 
           % cycle_time.value() == 0);

    uint64_t num_cycles = cycle_counter.value() / cycle_time.value();

    return num_cycles;
}

bool ISS::is_invalid_csr_access(uint32_t csr_addr, bool is_write) {
    if (csr_addr == csr::FFLAGS_ADDR || csr_addr == csr::FRM_ADDR || csr_addr == csr::FCSR_ADDR) {
        REQUIRE_ISA(F_ISA_EXT);
    }
    PrivilegeLevel csr_prv = (0x300 & csr_addr) >> 8;
    bool csr_readonly = ((0xC00 & csr_addr) >> 10) == 3;
    bool s_invalid = (csr_prv == SupervisorMode) && !csrs.misa.has_supervisor_mode_extension();
    bool u_invalid = (csr_prv == UserMode) && !csrs.misa.has_user_mode_extension();
    return (is_write && csr_readonly) || (prv < csr_prv) || s_invalid || u_invalid;
}

void ISS::validate_csr_counter_read_access_rights(uint32_t addr) {
    // match against counter CSR addresses, see RISC-V privileged spec for the address definitions
    if ((addr >= 0xC00 && addr <= 0xC1F) || (addr >= 0xC80 && addr <= 0xC9F)) {
        auto cnt = addr & 0x1F;  // 32 counters in total, naturally aligned with the mcounteren and scounteren CSRs

        if (s_mode() && !csr::is_bitset(csrs.mcounteren, cnt))
            RAISE_ILLEGAL_INSTRUCTION();

        if (u_mode() && (!csr::is_bitset(csrs.mcounteren, cnt) || !csr::is_bitset(csrs.scounteren, cnt)))
            RAISE_ILLEGAL_INSTRUCTION();
    }
}

uint32_t ISS::get_csr_value(uint32_t addr) {
    validate_csr_counter_read_access_rights(addr);

    auto read = [=](auto &x, uint32_t mask) { return x.reg & mask; };

    using namespace csr;

    switch (addr) {
        case TIME_ADDR:
        case MTIME_ADDR: {
            uint64_t mtime = clint->update_and_get_mtime();
            csrs.time.reg = mtime;
            return csrs.time.low;
        }

        case TIMEH_ADDR:
        case MTIMEH_ADDR: {
            uint64_t mtime = clint->update_and_get_mtime();
            csrs.time.reg = mtime;
            return csrs.time.high;
        }

        case MCYCLE_ADDR:
            csrs.cycle.reg = _compute_and_get_current_cycles();
            return csrs.cycle.low;

        case MCYCLEH_ADDR:
            csrs.cycle.reg = _compute_and_get_current_cycles();
            return csrs.cycle.high;

        case MINSTRET_ADDR:
            return csrs.instret.low;

        case MINSTRETH_ADDR:
            return csrs.instret.high;

        SWITCH_CASE_MATCH_ANY_HPMCOUNTER_RV32:  // not implemented
            return 0;

        case MSTATUS_ADDR:
            return read(csrs.mstatus, MSTATUS_MASK);
        case SSTATUS_ADDR:
            return read(csrs.mstatus, SSTATUS_MASK);
        case USTATUS_ADDR:
            return read(csrs.mstatus, USTATUS_MASK);

        case MIP_ADDR:
            return read(csrs.mip, MIP_READ_MASK);
        case SIP_ADDR:
            return read(csrs.mip, SIP_MASK);
        case UIP_ADDR:
            return read(csrs.mip, UIP_MASK);

        case MIE_ADDR:
            return read(csrs.mie, MIE_MASK);
        case SIE_ADDR:
            return read(csrs.mie, SIE_MASK);
        case UIE_ADDR:
            return read(csrs.mie, UIE_MASK);

        case SATP_ADDR:
            if (csrs.mstatus.tvm)
                RAISE_ILLEGAL_INSTRUCTION();
            break;

        case FCSR_ADDR:
            return read(csrs.fcsr, FCSR_MASK);

        case FFLAGS_ADDR:
            return csrs.fcsr.fflags;

        case FRM_ADDR:
            return csrs.fcsr.frm;

        // debug CSRs not supported, thus hardwired
        case TSELECT_ADDR:
            return 1;  // if a zero write by SW is preserved, then debug mode is supported (thus hardwire to non-zero)
        case TDATA1_ADDR:
        case TDATA2_ADDR:
        case TDATA3_ADDR:
        case DCSR_ADDR:
        case DPC_ADDR:
        case DSCRATCH0_ADDR:
        case DSCRATCH1_ADDR:
            return 0;
    }

    if (!csrs.is_valid_csr32_addr(addr))
        RAISE_ILLEGAL_INSTRUCTION();

    return csrs.default_read32(addr);
}

void ISS::set_csr_value(uint32_t addr, uint32_t value) {
    auto write = [=](auto &x, uint32_t mask) { x.reg = (x.reg & ~mask) | (value & mask); };

    using namespace csr;

    switch (addr) {
        case MISA_ADDR:                         // currently, read-only, thus cannot be changed at runtime
        SWITCH_CASE_MATCH_ANY_HPMCOUNTER_RV32:  // not implemented
            break;

        case SATP_ADDR: {
            if (csrs.mstatus.tvm)
                RAISE_ILLEGAL_INSTRUCTION();
            write(csrs.satp, SATP_MASK);
            // std::cout << "[iss] satp=" << boost::format("%x") % csrs.satp.reg << std::endl;
        } break;

        case MTVEC_ADDR:
            write(csrs.mtvec, MTVEC_MASK);
            break;
        case STVEC_ADDR:
            write(csrs.stvec, MTVEC_MASK);
            break;
        case UTVEC_ADDR:
            write(csrs.utvec, MTVEC_MASK);
            break;

        case MEPC_ADDR:
            write(csrs.mepc, pc_alignment_mask());
            break;
        case SEPC_ADDR:
            write(csrs.sepc, pc_alignment_mask());
            break;
        case UEPC_ADDR:
            write(csrs.uepc, pc_alignment_mask());
            break;

        case MSTATUS_ADDR:
            write(csrs.mstatus, MSTATUS_MASK);
            break;
        case SSTATUS_ADDR:
            write(csrs.mstatus, SSTATUS_MASK);
            break;
        case USTATUS_ADDR:
            write(csrs.mstatus, USTATUS_MASK);
            break;

        case MIP_ADDR:
            write(csrs.mip, MIP_WRITE_MASK);
            break;
        case SIP_ADDR:
            write(csrs.mip, SIP_MASK);
            break;
        case UIP_ADDR:
            write(csrs.mip, UIP_MASK);
            break;

        case MIE_ADDR:
            write(csrs.mie, MIE_MASK);
            break;
        case SIE_ADDR:
            write(csrs.mie, SIE_MASK);
            break;
        case UIE_ADDR:
            write(csrs.mie, UIE_MASK);
            break;

        case MIDELEG_ADDR:
            write(csrs.mideleg, MIDELEG_MASK);
            break;

        case MEDELEG_ADDR:
            write(csrs.medeleg, MEDELEG_MASK);
            break;

        case SIDELEG_ADDR:
            write(csrs.sideleg, SIDELEG_MASK);
            break;

        case SEDELEG_ADDR:
            write(csrs.sedeleg, SEDELEG_MASK);
            break;

        case MCOUNTEREN_ADDR:
            write(csrs.mcounteren, MCOUNTEREN_MASK);
            break;

        case SCOUNTEREN_ADDR:
            write(csrs.scounteren, MCOUNTEREN_MASK);
            break;

        case MCOUNTINHIBIT_ADDR:
            write(csrs.mcountinhibit, MCOUNTINHIBIT_MASK);
            break;

        case FCSR_ADDR:
            write(csrs.fcsr, FCSR_MASK);
            break;

        case FFLAGS_ADDR:
            csrs.fcsr.fflags = value;
            break;

        case FRM_ADDR:
            csrs.fcsr.frm = value;
            break;

        // debug CSRs not supported, thus hardwired
        case TSELECT_ADDR:
        case TDATA1_ADDR:
        case TDATA2_ADDR:
        case TDATA3_ADDR:
        case DCSR_ADDR:
        case DPC_ADDR:
        case DSCRATCH0_ADDR:
        case DSCRATCH1_ADDR:
            break;

        default:
            if (!csrs.is_valid_csr32_addr(addr))
                RAISE_ILLEGAL_INSTRUCTION();

            csrs.default_write32(addr, value);
    }
}

void ISS::init(instr_memory_if *instr_mem, data_memory_if *data_mem, clint_if *clint, uint32_t entrypoint,
               uint32_t sp) {
    this->instr_mem = instr_mem;
    this->mem = data_mem;
    this->clint = clint;
    regs[RegFile::sp] = sp;
    pc = entrypoint;
}

void ISS::sys_exit() {
    shall_exit = true;
}

unsigned ISS::get_syscall_register_index() {
    if (csrs.misa.has_E_base_isa())
        return RegFile::a5;
    else
        return RegFile::a7;
}

uint64_t ISS::read_register(unsigned idx) {
    return (uint32_t)regs.read(idx);  // NOTE: zero extend
}

void ISS::write_register(unsigned idx, uint64_t value) {
    regs.write(idx,
               boost::lexical_cast<uint32_t>(value));
}

uint64_t ISS::get_progam_counter(void) {
    return pc;
}

void ISS::block_on_wfi(bool block) {
    ignore_wfi = !block;
}

CoreExecStatus ISS::get_status(void) {
    return status;
}

void ISS::set_status(CoreExecStatus s) {
    status = s;
}

void ISS::enable_debug(void) {
    debug_mode = true;
}

void ISS::insert_breakpoint(uint64_t addr) {
    breakpoints.insert(addr);
}

void ISS::remove_breakpoint(uint64_t addr) {
    breakpoints.erase(addr);
}

uint64_t ISS::get_hart_id() {
    return csrs.mhartid.reg;
}

std::vector<uint64_t> ISS::get_registers(void) {
    std::vector<uint64_t> regvals;

    for (auto v : regs.regs)
        regvals.push_back((uint32_t)v);  // NOTE: zero extend

    return regvals;
}

void ISS::fp_finish_instr() {
    fp_set_dirty();
    fp_update_exception_flags();
}

void ISS::fp_prepare_instr() {
    assert(softfloat_exceptionFlags == 0);
    fp_require_not_off();
}

void ISS::fp_set_dirty() {
    csrs.mstatus.sd = 1;
    csrs.mstatus.fs = FS_DIRTY;
}

void ISS::fp_update_exception_flags() {
    if (softfloat_exceptionFlags) {
        fp_set_dirty();
        csrs.fcsr.fflags |= softfloat_exceptionFlags;
        softfloat_exceptionFlags = 0;
    }
}

void ISS::fp_setup_rm() {
    auto rm = instr.frm();
    if (rm == FRM_DYN)
        rm = csrs.fcsr.frm;
    if (rm >= FRM_RMM)
        RAISE_ILLEGAL_INSTRUCTION();
    softfloat_roundingMode = rm;
}

void ISS::fp_require_not_off() {
    if (csrs.mstatus.fs == FS_OFF)
        RAISE_ILLEGAL_INSTRUCTION();
}

void ISS::return_from_trap_handler(PrivilegeLevel return_mode) {
    switch (return_mode) {
        case MachineMode:
            prv = csrs.mstatus.mpp;
            csrs.mstatus.mie = csrs.mstatus.mpie;
            csrs.mstatus.mpie = 1;
            pc = csrs.mepc.reg;
            if (csrs.misa.has_user_mode_extension())
                csrs.mstatus.mpp = UserMode;
            else
                csrs.mstatus.mpp = MachineMode;
            break;

        case SupervisorMode:
            prv = csrs.mstatus.spp;
            csrs.mstatus.sie = csrs.mstatus.spie;
            csrs.mstatus.spie = 1;
            pc = csrs.sepc.reg;
            if (csrs.misa.has_user_mode_extension())
                csrs.mstatus.spp = UserMode;
            else
                csrs.mstatus.spp = SupervisorMode;
            break;

        case UserMode:
            prv = UserMode;
            csrs.mstatus.uie =
                csrs.mstatus.upie;
            csrs.mstatus.upie = 1;
            pc = csrs.uepc.reg;
            break;

        default:
            throw std::runtime_error("unknown privilege level " + std::to_string(return_mode));
    }

    if (trace)
        printf("[vp::iss] return from trap handler, time %s, pc %8x, prv %1x\n",
               quantum_keeper.get_current_time().to_string().c_str(), pc, prv);
}

void ISS::trigger_external_interrupt(PrivilegeLevel level) {
    if (trace)
        std::cout << "[vp::iss] trigger external interrupt, " << sc_core::sc_time_stamp() << std::endl;

    switch (level) {
        case UserMode:
            csrs.mip.ueip = true;
            break;
        case SupervisorMode:
            csrs.mip.seip = true;
            break;
        case MachineMode:
            csrs.mip.meip = true;
            break;
    }

    wfi_event.notify(sc_core::SC_ZERO_TIME);
}

void ISS::clear_external_interrupt(PrivilegeLevel level) {
    if (trace)
        std::cout << "[vp::iss] clear external interrupt, " << sc_core::sc_time_stamp() << std::endl;

    switch (level) {
        case UserMode:
            csrs.mip.ueip = false;
            break;
        case SupervisorMode:
            csrs.mip.seip = false;
            break;
        case MachineMode:
            csrs.mip.meip = false;
            break;
    }
}

void ISS::trigger_timer_interrupt(bool status) {
    if (trace)
        std::cout << "[vp::iss] trigger timer interrupt=" << status << ", " << sc_core::sc_time_stamp() << std::endl;

    csrs.mip.mtip = status;
    wfi_event.notify(sc_core::SC_ZERO_TIME);
}

void ISS::trigger_software_interrupt(bool status) {
    if (trace)
        std::cout << "[vp::iss] trigger software interrupt=" << status << ", " << sc_core::sc_time_stamp() << std::endl;

    csrs.mip.msip = status;
    wfi_event.notify(sc_core::SC_ZERO_TIME);
}

PrivilegeLevel ISS::prepare_trap(SimulationTrap &e) {
    // undo any potential pc update (for traps the pc should point to the originating instruction and not its
    // successor)
    pc = last_pc;
    unsigned exc_bit = (1 << e.reason);

    // 1) machine mode execution takes any traps, independent of delegation setting
    // 2) non-delegated traps are processed in machine mode, independent of current execution mode
    if (prv == MachineMode || !(exc_bit & csrs.medeleg.reg)) {
        csrs.mcause.interrupt = 0;
        csrs.mcause.exception_code = e.reason;
        csrs.mtval.reg = boost::lexical_cast<uint32_t>(e.mtval);
        return MachineMode;
    }

    // see above machine mode comment
    if (prv == SupervisorMode || !(exc_bit & csrs.sedeleg.reg)) {
        csrs.scause.interrupt = 0;
        csrs.scause.exception_code = e.reason;
        csrs.stval.reg = boost::lexical_cast<uint32_t>(e.mtval);
        return SupervisorMode;
    }

    assert(prv == UserMode && (exc_bit & csrs.medeleg.reg) && (exc_bit & csrs.sedeleg.reg));
    csrs.ucause.interrupt = 0;
    csrs.ucause.exception_code = e.reason;
    csrs.utval.reg = boost::lexical_cast<uint32_t>(e.mtval);
    return UserMode;
}

void ISS::prepare_interrupt(const PendingInterrupts &e) {
    if (trace) {
        std::cout << "[vp::iss] prepare interrupt, pending=" << e.pending << ", target-mode=" << e.target_mode
                  << std::endl;
    }

    csr_mip x{e.pending};

    ExceptionCode exc;
    if (x.meip)
        exc = EXC_M_EXTERNAL_INTERRUPT;
    else if (x.msip)
        exc = EXC_M_SOFTWARE_INTERRUPT;
    else if (x.mtip)
        exc = EXC_M_TIMER_INTERRUPT;
    else if (x.seip)
        exc = EXC_S_EXTERNAL_INTERRUPT;
    else if (x.ssip)
        exc = EXC_S_SOFTWARE_INTERRUPT;
    else if (x.stip)
        exc = EXC_S_TIMER_INTERRUPT;
    else if (x.ueip)
        exc = EXC_U_EXTERNAL_INTERRUPT;
    else if (x.usip)
        exc = EXC_U_SOFTWARE_INTERRUPT;
    else if (x.utip)
        exc = EXC_U_TIMER_INTERRUPT;
    else
        throw std::runtime_error("some pending interrupt must be available here");

    switch (e.target_mode) {
        case MachineMode:
            csrs.mcause.exception_code = exc;
            csrs.mcause.interrupt = 1;
            break;

        case SupervisorMode:
            csrs.scause.exception_code = exc;
            csrs.scause.interrupt = 1;
            break;

        case UserMode:
            csrs.ucause.exception_code = exc;
            csrs.ucause.interrupt = 1;
            break;

        default:
            throw std::runtime_error("unknown privilege level " + std::to_string(e.target_mode));
    }
}

PendingInterrupts ISS::compute_pending_interrupts() {
    uint32_t pending = csrs.mie.reg & csrs.mip.reg;

    if (!pending)
        return {NoneMode, 0};

    auto m_pending = pending & ~csrs.mideleg.reg;
    if (m_pending && (prv < MachineMode || (prv == MachineMode && csrs.mstatus.mie))) {
        return
               {MachineMode, m_pending};
    }

    pending = pending & csrs.mideleg.reg;
    auto s_pending = pending & ~csrs.sideleg.reg;
    if (s_pending && (prv < SupervisorMode || (prv == SupervisorMode && csrs.mstatus.sie))) {
        return {SupervisorMode, s_pending};
    }

    auto u_pending = pending & csrs.sideleg.reg;
    if (u_pending && (prv == UserMode && csrs.mstatus.uie)) {
        return {UserMode, u_pending};
    }

    return {NoneMode, 0};
}

void ISS::switch_to_trap_handler(PrivilegeLevel target_mode) {
    if (trace) {
        printf("[vp::iss] switch to trap handler, time %s, last_pc %8x, pc %8x, irq %u, t-prv %1x\n",
               quantum_keeper.get_current_time().to_string().c_str(), last_pc, pc, csrs.mcause.interrupt, target_mode);
    }

    // free any potential LR/SC bus lock before processing a trap/interrupt
    release_lr_sc_reservation();

    auto pp = prv;
    prv = target_mode;

    switch (target_mode) {
        case MachineMode:
            csrs.mepc.reg = pc;

            csrs.mstatus.mpie = csrs.mstatus.mie;
            csrs.mstatus.mie = 0;
            csrs.mstatus.mpp = pp;

            pc = csrs.mtvec.get_base_address();

            if (csrs.mcause.interrupt && csrs.mtvec.mode == csrs.mtvec.Vectored)
                pc += 4 * csrs.mcause.exception_code;
            break;

        case SupervisorMode:
            assert(prv == SupervisorMode || prv == UserMode);

            csrs.sepc.reg = pc;

            csrs.mstatus.spie = csrs.mstatus.sie;
            csrs.mstatus.sie = 0;
            csrs.mstatus.spp = pp;

            pc = csrs.stvec.get_base_address();

            if (csrs.scause.interrupt && csrs.stvec.mode == csrs.stvec.Vectored)
                pc += 4 * csrs.scause.exception_code;
            break;

        case UserMode:
            assert(prv == UserMode);

            csrs.uepc.reg = pc;

            csrs.mstatus.upie = csrs.mstatus.uie;
            csrs.mstatus.uie = 0;

            pc = csrs.utvec.get_base_address();

            if (csrs.ucause.interrupt && csrs.utvec.mode == csrs.utvec.Vectored)
                pc += 4 * csrs.ucause.exception_code;
            break;

        default:
            throw std::runtime_error("unknown privilege level " + std::to_string(target_mode));
    }
}

void ISS::performance_and_sync_update(Opcode::Mapping executed_op) {
    ++total_num_instr;

    if (!csrs.mcountinhibit.IR)
        ++csrs.instret.reg;

    if (lr_sc_counter != 0) {
        --lr_sc_counter;
        assert(lr_sc_counter >=
               0);
        if (lr_sc_counter == 0)
            release_lr_sc_reservation();
    }

    auto new_cycles = instr_cycles[executed_op];

    if (!csrs.mcountinhibit.CY)
        cycle_counter += new_cycles;

    quantum_keeper.inc(new_cycles);
    if (quantum_keeper.need_sync()) {
        if (lr_sc_counter == 0)  // match SystemC sync with bus unlocking in a tight LR_W/SC_W loop
            quantum_keeper.sync();
    }
}

void ISS::run_step() {
    assert(regs.read(0) == 0);

    // speeds up the execution performance (non debug mode) significantly by
    // checking the additional flag first
    if (debug_mode && (breakpoints.find(pc) != breakpoints.end())) {
        status = CoreExecStatus::HitBreakpoint;
        return;
    }

    last_pc = pc;
    try {
        exec_step();

        auto x = compute_pending_interrupts();
        if (x.target_mode != NoneMode) {
            prepare_interrupt(x);
            switch_to_trap_handler(x.target_mode);
        }
    } catch (SimulationTrap &e) {
        if (trace)
            std::cout << "take trap " << e.reason << ", mtval=" << e.mtval << std::endl;
        auto target_mode = prepare_trap(e);
        switch_to_trap_handler(target_mode);
    }

    // NOTE: writes to zero register are supposedly allowed but must be ignored
    // (reset it after every instruction, instead of checking *rd != zero*
    // before every register write)
    regs.regs[regs.zero] = 0;

    // Do not use a check *pc == last_pc* here. The reason is that due to
    // interrupts *pc* can be set to *last_pc* accidentally (when jumping back
    // to *mepc*).
    if (shall_exit)
        status = CoreExecStatus::Terminated;

    performance_and_sync_update(op);
}

void ISS::run() {
    // run a single step until either a breakpoint is hit or the execution
    // terminates
    do {
        run_step();
    } while (status == CoreExecStatus::Runnable);

    // force sync to make sure that no action is missed
    quantum_keeper.sync();
}

void ISS::show() {
    boost::io::ios_flags_saver ifs(std::cout);
    std::cout << "=[ core : " << csrs.mhartid.reg << " ]===========================" << std::endl;
    std::cout << "simulation time: " << sc_core::sc_time_stamp() << std::endl;
    regs.show();
    std::cout << "pc = " << std::hex << pc << std::endl;
    std::cout << "num-instr = " << std::dec << csrs.instret.reg << std::endl;
}
```

#### Peripheral Simulation

Sensor simulation:

```C
#ifndef RISCV_ISA_SENSOR_H
#define RISCV_ISA_SENSOR_H

#include <array>
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <unordered_map>

#include <systemc>

#include <tlm_utils/simple_target_socket.h>

#include "core/common/irq_if.h"

struct SimpleSensor : public sc_core::sc_module {
    tlm_utils::simple_target_socket<SimpleSensor> tsock;

    interrupt_gateway *plic = 0;
    uint32_t irq_number = 0;
    sc_core::sc_event run_event;

    // memory mapped data frame
    std::array<uint8_t, 64> data_frame;

    // memory mapped configuration registers
    uint32_t scaler = 25;
    uint32_t filter = 0;
    std::unordered_map<uint64_t, uint32_t *> addr_to_reg;

    enum {
        SCALER_REG_ADDR = 0x80,
        FILTER_REG_ADDR = 0x84,
    };

    SC_HAS_PROCESS(SimpleSensor);

    SimpleSensor(sc_core::sc_module_name, uint32_t irq_number) : irq_number(irq_number) {
        tsock.register_b_transport(this, &SimpleSensor::transport);
        SC_THREAD(run);

        addr_to_reg = {
            {SCALER_REG_ADDR, &scaler},
            {FILTER_REG_ADDR, &filter},
        };
    }

    void transport(tlm::tlm_generic_payload &trans, sc_core::sc_time &delay) {
        auto addr = trans.get_address();
        auto cmd = trans.get_command();
        auto len = trans.get_data_length();
        auto ptr = trans.get_data_ptr();

        if (addr <= 63) {
            // access data frame
            assert(cmd == tlm::TLM_READ_COMMAND);
            assert((addr + len) <= data_frame.size());

            // return last generated random data at requested address
            memcpy(ptr, &data_frame[addr], len);
        } else {
            assert(len == 4);  // NOTE: only allow to read/write whole register

            auto it = addr_to_reg.find(addr);
            assert(it != addr_to_reg.end());  // access to non-mapped address

            // trigger pre read/write actions
            if ((cmd == tlm::TLM_WRITE_COMMAND) && (addr == SCALER_REG_ADDR)) {
                uint32_t value = *((uint32_t *)ptr);
                if (value < 1 || value > 100)
                    return;  // ignore invalid values
            }

            // actual read/write
            if (cmd == tlm::TLM_READ_COMMAND) {
                *((uint32_t *)ptr) = *it->second;
            } else if (cmd == tlm::TLM_WRITE_COMMAND) {
                *it->second = *((uint32_t *)ptr);
            } else {
                assert(false && "unsupported tlm command for sensor access");
            }

            // trigger post read/write actions
            if ((cmd == tlm::TLM_WRITE_COMMAND) && (addr == SCALER_REG_ADDR)) {
                run_event.cancel();
                run_event.notify(sc_core::sc_time(scaler, sc_core::SC_MS));
            }
        }

        (void)delay;  // zero delay
    }

    void run() {
        while (true) {
            run_event.notify(sc_core::sc_time(scaler, sc_core::SC_MS));
            sc_core::wait(run_event);  // 40 times per second by default

            // fill with random data
            for (auto &n : data_frame) {
                if (filter == 1) {
                    n = rand() % 10 + 48;
                } else if (filter == 2) {
                    n = rand() % 26 + 65;
                } else {
                    // fallback for all other filter values
                    n = rand() % 92 + 32;  // random printable char
                }
            }

            plic->gateway_trigger_interrupt(irq_number);
        }
    }
};

#endif  // RISCV_ISA_SENSOR_H
```

#### System Build

```C
#include <cstdlib>
#include <ctime>

#include "basic_timer.h"
#include "core/common/clint.h"
#include "display.hpp"
#include "dma.h"
#include "elf_loader.h"
#include "ethernet.h"
#include "fe310_plic.h"
#include "flash.h"
#include "debug_memory.h"
#include "iss.h"
#include "mem.h"
#include "memory.h"
#include "mram.h"
#include "sensor.h"
#include "sensor2.h"
#include "syscall.h"
#include "terminal.h"
#include "util/options.h"
#include "platform/common/options.h"

#include "gdb-mc/gdb_server.h"
#include "gdb-mc/gdb_runner.h"

#include <boost/io/ios_state.hpp>
#include <boost/program_options.hpp>
#include <fstream>
#include <iomanip>
#include <iostream>

using namespace rv32;
namespace po = boost::program_options;

class BasicOptions : public Options {
public:
    typedef unsigned int addr_t;

    std::string mram_image;
    std::string flash_device;
    std::string network_device;
    std::string test_signature;

    addr_t mem_size = 1024 * 1024 * 32;  // 32 MB ram, to place it before the CLINT and run the base examples (assume
                                         // memory start at zero) without modifications
    addr_t mem_start_addr = 0x00000000;
    addr_t mem_end_addr = mem_start_addr + mem_size - 1;
    addr_t clint_start_addr = 0x02000000;
    addr_t clint_end_addr = 0x0200ffff;
    addr_t sys_start_addr = 0x02010000;
    addr_t sys_end_addr = 0x020103ff;
    addr_t term_start_addr = 0x20000000;
    addr_t term_end_addr = term_start_addr + 16;
    addr_t ethernet_start_addr = 0x30000000;
    addr_t ethernet_end_addr = ethernet_start_addr + 1500;
    addr_t plic_start_addr = 0x40000000;
    addr_t plic_end_addr = 0x41000000;
    addr_t sensor_start_addr = 0x50000000;
    addr_t sensor_end_addr = 0x50001000;
    addr_t sensor2_start_addr = 0x50002000;
    addr_t sensor2_end_addr = 0x50004000;
    addr_t mram_start_addr = 0x60000000;
    addr_t mram_size = 0x10000000;
    addr_t mram_end_addr = mram_start_addr + mram_size - 1;
    addr_t dma_start_addr = 0x70000000;
    addr_t dma_end_addr = 0x70001000;
    addr_t flash_start_addr = 0x71000000;
    addr_t flash_end_addr = flash_start_addr + Flashcontroller::ADDR_SPACE;  // Usually 528 Byte
    addr_t display_start_addr = 0x72000000;
    addr_t display_end_addr = display_start_addr + Display::addressRange;

    bool use_E_base_isa = false;

    OptionValue<unsigned long> entry_point;

    BasicOptions(void) {
        // clang-format off
        add_options()
            ("memory-start", po::value<unsigned int>(&mem_start_addr),"set memory start address")
            ("memory-size", po::value<unsigned int>(&mem_size), "set memory size")
            ("use-E-base-isa", po::bool_switch(&use_E_base_isa), "use the E instead of the I integer base ISA")
            ("entry-point", po::value<std::string>(&entry_point.option),"set entry point address (ISS program counter)")
            ("mram-image", po::value<std::string>(&mram_image)->default_value(""),"MRAM image file for persistency")
            ("mram-image-size", po::value<unsigned int>(&mram_size), "MRAM image size")
            ("flash-device", po::value<std::string>(&flash_device)->default_value(""),"blockdevice for flash emulation")
            ("network-device", po::value<std::string>(&network_device)->default_value(""),"name of the tap network adapter, e.g. /dev/tap6")
            ("signature", po::value<std::string>(&test_signature)->default_value(""),"output filename for the test execution signature");
        // clang-format on
    }

    void parse(int argc, char **argv) override {
        Options::parse(argc, argv);

        entry_point.finalize(parse_ulong_option);
        mem_end_addr = mem_start_addr + mem_size - 1;
        assert((mem_end_addr < clint_start_addr || mem_start_addr > display_end_addr) &&
               "RAM too big, would overlap memory");
        mram_end_addr = mram_start_addr + mram_size - 1;
        assert(mram_end_addr < dma_start_addr && "MRAM too big, would overlap memory");
    }
};

int sc_main(int argc, char **argv) {
    BasicOptions opt;
    opt.parse(argc, argv);

    std::srand(std::time(nullptr));  // use current time as seed for random generator

    tlm::tlm_global_quantum::instance().set(sc_core::sc_time(opt.tlm_global_quantum, sc_core::SC_NS));

    // instantiate the core, the bus, and all peripherals
    ISS core(0, opt.use_E_base_isa);
    SimpleMemory mem("SimpleMemory", opt.mem_size);
    SimpleTerminal term("SimpleTerminal");
    ELFLoader loader(opt.input_program.c_str());
    SimpleBus<3, 12> bus("SimpleBus");
    CombinedMemoryInterface iss_mem_if("MemoryInterface", core);
    SyscallHandler sys("SyscallHandler");
    FE310_PLIC<1, 64, 96, 32> plic("PLIC");
    CLINT<1> clint("CLINT");
    SimpleSensor sensor("SimpleSensor", 2);
    SimpleSensor2 sensor2("SimpleSensor2", 5);
    BasicTimer timer("BasicTimer", 3);
    SimpleMRAM mram("SimpleMRAM", opt.mram_image, opt.mram_size);
    SimpleDMA dma("SimpleDMA", 4);
    Flashcontroller flashController("Flashcontroller", opt.flash_device);
    EthernetDevice ethernet("EthernetDevice", 7, mem.data, opt.network_device);
    Display display("Display");
    DebugMemoryInterface dbg_if("DebugMemoryInterface");

    MemoryDMI dmi = MemoryDMI::create_start_size_mapping(mem.data, opt.mem_start_addr, mem.size);
    InstrMemoryProxy instr_mem(dmi, core);

    std::shared_ptr<BusLock> bus_lock = std::make_shared<BusLock>();
    iss_mem_if.bus_lock = bus_lock;

    instr_memory_if *instr_mem_if = &iss_mem_if;
    data_memory_if *data_mem_if = &iss_mem_if;
    if (opt.use_instr_dmi)
        instr_mem_if = &instr_mem;
    if (opt.use_data_dmi) {
        iss_mem_if.dmi_ranges.emplace_back(dmi);
    }

    uint64_t entry_point = loader.get_entrypoint();
    if (opt.entry_point.available)
        entry_point = opt.entry_point.value;

    loader.load_executable_image(mem.data, mem.size, opt.mem_start_addr);
    core.init(instr_mem_if, data_mem_if, &clint, entry_point, rv32_align_address(opt.mem_end_addr));
    sys.init(mem.data, opt.mem_start_addr, loader.get_heap_addr());
    sys.register_core(&core);

    if (opt.intercept_syscalls)
        core.sys = &sys;

    // address mapping
    bus.ports[0] = new PortMapping(opt.mem_start_addr, opt.mem_end_addr);
    bus.ports[1] = new PortMapping(opt.clint_start_addr, opt.clint_end_addr);
    bus.ports[2] = new PortMapping(opt.plic_start_addr, opt.plic_end_addr);
    bus.ports[3] = new PortMapping(opt.term_start_addr, opt.term_end_addr);
    bus.ports[4] = new PortMapping(opt.sensor_start_addr, opt.sensor_end_addr);
    bus.ports[5] = new PortMapping(opt.dma_start_addr, opt.dma_end_addr);
    bus.ports[6] = new PortMapping(opt.sensor2_start_addr, opt.sensor2_end_addr);
    bus.ports[7] = new PortMapping(opt.mram_start_addr, opt.mram_end_addr);
    bus.ports[8] = new PortMapping(opt.flash_start_addr, opt.flash_end_addr);
    bus.ports[9] = new PortMapping(opt.ethernet_start_addr, opt.ethernet_end_addr);
    bus.ports[10] = new PortMapping(opt.display_start_addr, opt.display_end_addr);
    bus.ports[11] = new PortMapping(opt.sys_start_addr, opt.sys_end_addr);

    // connect TLM sockets
    iss_mem_if.isock.bind(bus.tsocks[0]);
    dbg_if.isock.bind(bus.tsocks[2]);

    PeripheralWriteConnector dma_connector("SimpleDMA-Connector");  // to respect ISS bus locking
    dma_connector.isock.bind(bus.tsocks[1]);
    dma.isock.bind(dma_connector.tsock);
    dma_connector.bus_lock = bus_lock;

    bus.isocks[0].bind(mem.tsock);
    bus.isocks[1].bind(clint.tsock);
    bus.isocks[2].bind(plic.tsock);
    bus.isocks[3].bind(term.tsock);
    bus.isocks[4].bind(sensor.tsock);
    bus.isocks[5].bind(dma.tsock);
    bus.isocks[6].bind(sensor2.tsock);
    bus.isocks[7].bind(mram.tsock);
    bus.isocks[8].bind(flashController.tsock);
    bus.isocks[9].bind(ethernet.tsock);
    bus.isocks[10].bind(display.tsock);
    bus.isocks[11].bind(sys.tsock);

    // connect interrupt signals/communication
    plic.target_harts[0] = &core;
    clint.target_harts[0] = &core;
    sensor.plic = &plic;
    dma.plic = &plic;
    timer.plic = &plic;
    sensor2.plic = &plic;
    ethernet.plic = &plic;

    std::vector<debug_target_if *> threads;
    threads.push_back(&core);

    core.trace = opt.trace_mode;  // switch for printing instructions
    if (opt.use_debug_runner) {
        auto server = new GDBServer("GDBServer", threads, &dbg_if, opt.debug_port);
        new GDBServerRunner("GDBRunner", server, &core);
    } else {
        new DirectCoreRunner(core);
    }

    sc_core::sc_start();

    core.show();

    // optionally dump the test signature region of memory as hex bytes
    if (opt.test_signature != "") {
        auto begin_sig = loader.get_begin_signature_address();
        auto end_sig = loader.get_end_signature_address();

        {
            boost::io::ios_flags_saver ifs(std::cout);
            std::cout << std::hex;
            std::cout << "begin_signature: " << begin_sig << std::endl;
            std::cout << "end_signature: " << end_sig << std::endl;
            std::cout << "signature output file: " << opt.test_signature << std::endl;
        }

        assert(end_sig >= begin_sig);
        assert(begin_sig >= opt.mem_start_addr);

        auto begin = begin_sig - opt.mem_start_addr;
        auto end = end_sig - opt.mem_start_addr;

        std::ofstream sigfile(opt.test_signature, std::ios::out);

        auto n = begin;
        while (n < end) {
            sigfile << std::hex << std::setw(2) << std::setfill('0') << (unsigned)mem.data[n];
            ++n;
        }
    }

    return 0;
}
```

> [!warning]
> This section is under #development

## System Verilog

SystemVerilog is a hardware description and verification language that extends
the capabilities of [[Semiconductors#Using Verilog for Simulating Digital Logic Building Blocks|Verilog]], which is widely used for designing digital systems. It was standardized by the IEEE in 2005 as IEEE 1800 and has since become a prominent language in both the digital design and verification domains.

1. **Integration of Hardware Description and Verification**:
    - One of the significant advancements introduced by SystemVerilog is the integration of hardware description and verification features into a single language. Engineers can describe both the design and its verification environment within the same language framework, which streamlines the development process.
2. **Enhanced Design Constructs**:
    - SystemVerilog adds data types such as `bit`, `logic`, `byte`, and `shortint`, which provide more flexibility and clarity in specifying data widths and types.
    - It also adds enumerated types (`enum`), structures (`struct`), and unions (`union`), allowing designers to organize and manage complex data structures more effectively.
    - Another important enhancement is the interface (`interface`), a mechanism for bundling the signals and protocol of a connection between modules, which promotes modularity and reuse.
3. **Concurrency Constructs**:
    - Verilog already offers `fork`/`join`, which blocks until all spawned processes have finished. SystemVerilog adds `fork`/`join_any`, which resumes the parent as soon as any one of the spawned processes finishes, and `fork`/`join_none`, which lets the parent continue immediately while the spawned processes run in the background.
    - The `initial` and `always` blocks are inherited from Verilog. SystemVerilog refines the general-purpose `always` block into `always_comb`, `always_ff`, and `always_latch`, which state the designer's intent (combinational logic, flip-flops, or latches) so that tools can check the code against it.
4. **Assertion-Based Verification**:
    - One of the significant additions in SystemVerilog is support for assertion-based verification (ABV). Assertions allow designers to specify properties and constraints on the behavior of their designs, facilitating formal verification and debugging.
    - SystemVerilog provides built-in assertion constructs such as `assert`, `assume`, `cover`, and `expect`, which can be used to specify properties, assumptions, coverage goals, and expected behavior of a design.
5. **Constrained Random Testing**:
    - SystemVerilog introduces constrained random testing (CRT), a verification methodology in which the stimulus for a simulation is generated randomly, but only within user-defined constraints.
    - This methodology is particularly useful for stress-testing designs by generating corner-case scenarios and exploring the behavior of the design under various operating conditions.
6. **Unified Verification Methodologies**:
    - SystemVerilog supports unified verification methodologies such as the Universal Verification Methodology (UVM), which provides a standardized framework for developing, organizing, and executing verification environments.
    - UVM leverages the features of SystemVerilog, including assertions, constrained random testing, and transaction-level modeling, to create robust and scalable verification environments for complex digital designs.

### RISC-V Core Implementation in SystemVerilog

An example implementation of a RISC-V (RV32I) core in SystemVerilog is shown below. The code is written in a subset of SystemVerilog understood by [Yosys](http://www.clifford.at/yosys/), the open-source hardware synthesis framework, and [Verilator](https://www.veripool.org/wiki/verilator), an open-source Verilog to C++ compiler. The implementation is simple and reasonably modularized, so that the schematics generated by synthesis tools are readable.
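
Because Verilator turns a design into a C++ model, the constrained random testing idea described earlier in this section can also be exercised from a plain C++ testbench. The following is a minimal sketch under stated assumptions, and it is not part of the referenced repository: `ref_div` is a reference model of the RV32M `DIV` semantics, `dut_div` is a hypothetical stand-in for the simulated divider (a real testbench would drive the Verilated model instead), and the 50% bias toward corner values is an arbitrary choice made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Reference ("golden") model of the RV32M DIV instruction, including the two
// corner cases fixed by the RISC-V specification.
int32_t ref_div(int32_t a, int32_t b) {
    if (b == 0) return -1;                    // division by zero: all bits set
    if (a == INT32_MIN && b == -1) return a;  // signed overflow: result is the dividend
    return a / b;                             // C++ truncates toward zero, like DIV
}

// Unsigned restoring (shift-and-subtract) divider, one quotient bit per step.
uint32_t udiv_restoring(uint32_t n, uint32_t d) {
    assert(d != 0);
    uint32_t q = 0;
    uint64_t r = 0;
    for (int i = 31; i >= 0; --i) {
        r = (r << 1) | ((n >> i) & 1u);  // shift in the next dividend bit
        if (r >= d) {
            r -= d;
            q |= (1u << i);
        }
    }
    return q;
}

// Hypothetical stand-in for the device under test: an independent
// sign-magnitude implementation built on the restoring divider.
int32_t dut_div(int32_t a, int32_t b) {
    if (b == 0) return -1;
    uint32_t ua = a < 0 ? 0u - static_cast<uint32_t>(a) : static_cast<uint32_t>(a);
    uint32_t ub = b < 0 ? 0u - static_cast<uint32_t>(b) : static_cast<uint32_t>(b);
    uint32_t uq = udiv_restoring(ua, ub);
    bool negative = (a < 0) != (b < 0);
    return static_cast<int32_t>(negative ? 0u - uq : uq);
}

// Constrained-random operand: half of the draws come from a small set of
// corner values, the other half from the full 32-bit range.
int32_t random_operand(std::mt19937 &rng) {
    static const int32_t corners[] = {0, 1, -1, INT32_MIN, INT32_MAX};
    std::uniform_int_distribution<int> pick(0, 9);
    int p = pick(rng);
    if (p < 5) return corners[p];
    std::uniform_int_distribution<int32_t> any(INT32_MIN, INT32_MAX);
    return any(rng);
}

// Scoreboard loop: compare DUT and reference model, return the mismatch count.
unsigned run_random_div_test(unsigned trials, uint32_t seed) {
    std::mt19937 rng(seed);
    unsigned mismatches = 0;
    for (unsigned i = 0; i < trials; ++i) {
        int32_t a = random_operand(rng);
        int32_t b = random_operand(rng);
        if (dut_div(a, b) != ref_div(a, b)) ++mismatches;
    }
    return mismatches;
}
```

A test harness would call, for example, `run_random_div_test(100000, 1234)` and treat any non-zero return value as a failure. Fixing the seed keeps a failing run reproducible, which is the same reason SystemVerilog simulators expose the random seed.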
The code snippets below are pulled from: https://github.com/tilk/riscv-simple-sv

#### Register File

```Verilog
// RISC-V SiMPLE SV -- register file
// BSD 3-Clause License
// (c) 2017-2019, Arthur Matos, Marcus Vinicius Lamar, Universidade de Brasília,
//                Marek Materzok, University of Wrocław

`include "config.sv"
`include "constants.sv"

module regfile (
    input  clock,
    input  write_enable,
    input  [4:0] rd_address,
    input  [4:0] rs1_address,
    input  [4:0] rs2_address,
    input  [31:0] rd_data,
    output [31:0] rs1_data,
    output [31:0] rs2_data
);

    // 32 registers of 32-bit width
    logic [31:0] register [0:31];

    // Read ports for rs1 and rs2
    assign rs1_data = register[rs1_address];
    assign rs2_data = register[rs2_address];

    // Register x0 is always 0
    initial register[0] = 32'b0;

    // Write port for rd
    always_ff @(posedge clock)
        if (write_enable)
            if (rd_address != 5'b0)
                register[rd_address] <= rd_data;

endmodule
```

#### ALU

```Verilog
// RISC-V SiMPLE SV -- ALU module
// BSD 3-Clause License
// (c) 2017-2019, Arthur Matos, Marcus Vinicius Lamar, Universidade de Brasília,
//                Marek Materzok, University of Wrocław

`include "config.sv"
`include "constants.sv"

module alu (
    input  [4:0] alu_function,
    input  signed [31:0] operand_a,
    input  signed [31:0] operand_b,
    output logic [31:0] result,
    output result_equal_zero
);

`ifdef M_MODULE
    logic [63:0] signed_multiplication;
    logic [63:0] unsigned_multiplication;
    logic [63:0] signed_unsigned_multiplication;
`endif

    assign result_equal_zero = (result == 32'b0);

    always_comb begin
        result = `ZERO;
        case (alu_function)
            `ALU_ADD:  result = operand_a + operand_b;
            `ALU_SUB:  result = operand_a - operand_b;
            `ALU_SLL:  result = operand_a << operand_b[4:0];
            `ALU_SRL:  result = operand_a >> operand_b[4:0];
            `ALU_SRA:  result = operand_a >>> operand_b[4:0];
            `ALU_SEQ:  result = {31'b0, operand_a == operand_b};
            `ALU_SLT:  result = {31'b0, operand_a < operand_b};
            `ALU_SLTU: result = {31'b0, $unsigned(operand_a) < $unsigned(operand_b)};
            `ALU_XOR:  result = operand_a ^ operand_b;
            `ALU_OR:   result = operand_a | operand_b;
            `ALU_AND:  result = operand_a & operand_b;
`ifdef M_MODULE
            `ALU_MUL:    result = signed_multiplication[31:0];
            `ALU_MULH:   result = signed_multiplication[63:32];
            `ALU_MULHSU: result = signed_unsigned_multiplication[63:32];
            `ALU_MULHU:  result = unsigned_multiplication[63:32];
            // NOTE: the RISC-V M extension defines the quotient of a division
            // by zero as all ones (32'hFFFFFFFF), and signed overflow as
            // 32'h80000000 divided by -1. The literal 32'b1 used in the
            // DIV, DIVU, and REM branches below is the value 1.
            `ALU_DIV:
                if (operand_b == `ZERO)
                    result = 32'b1;
                else if ((operand_a == 32'h80000000) && (operand_b == 32'b1))
                    result = 32'h80000000;
                else
                    result = operand_a / operand_b;
            `ALU_DIVU:
                if (operand_b == `ZERO)
                    result = 32'b1;
                else
                    result = $unsigned(operand_a) / $unsigned(operand_b);
            `ALU_REM:
                if (operand_b == `ZERO)
                    result = operand_a;
                else if ((operand_a == 32'h80000000) && (operand_b == 32'b1))
                    result = `ZERO;
                else
                    result = operand_a % operand_b;
            `ALU_REMU:
                if (operand_b == `ZERO)
                    result = operand_a;
                else
                    result = $unsigned(operand_a) % $unsigned(operand_b);
`endif
            default: result = `ZERO;
        endcase
    end

`ifdef M_MODULE
    always_comb begin
        signed_multiplication = operand_a * operand_b;
        unsigned_multiplication = $unsigned(operand_a) * $unsigned(operand_b);
        signed_unsigned_multiplication = $signed(operand_a) * $unsigned(operand_b);
    end
`endif

endmodule
```

> [!warning]
> This section is under #development

#### Instruction Parser/Decoder

```Verilog
// RISC-V SiMPLE SV -- instruction decoder
// BSD 3-Clause License
// (c) 2017-2019, Arthur Matos, Marcus Vinicius Lamar, Universidade de Brasília,
//                Marek Materzok, University of Wrocław

`include "config.sv"
`include "constants.sv"

module instruction_decoder(
    input  [31:0] inst,
    output [6:0]  inst_opcode,
    output [2:0]  inst_funct3,
    output [6:0]  inst_funct7,
    output [4:0]  inst_rd,
    output [4:0]  inst_rs1,
    output [4:0]  inst_rs2
);

    assign inst_opcode = inst[6:0];
    assign inst_funct3 = inst[14:12];
    assign inst_funct7 = inst[31:25];
    assign inst_rd     = inst[11:7];
    assign inst_rs1    = inst[19:15];
    assign inst_rs2    = inst[24:20];

endmodule
```

#### Data Memory Interface

```Verilog
// RISC-V SiMPLE SV -- data memory interface
// BSD 3-Clause License
// (c) 2017-2019, Arthur Matos, Marcus Vinicius Lamar, Universidade de Brasília,
//                Marek Materzok, University of Wrocław

`include "config.sv"
`include "constants.sv"

module data_memory_interface (
    input  clock,
    input  read_enable,
    input  write_enable,
    input  [2:0]  data_format,
    input  [31:0] address,
    input  [31:0] write_data,
    output [31:0] read_data,

    output [31:0] bus_address,
    input  [31:0] bus_read_data,
    output [31:0] bus_write_data,
    output logic [3:0] bus_byte_enable,
    output bus_read_enable,
    output bus_write_enable
);

    logic [31:0] position_fix;
    logic [31:0] sign_fix;

    assign bus_address      = address;
    assign bus_write_enable = write_enable;
    assign bus_read_enable  = read_enable;
    assign bus_write_data   = write_data << (8*address[1:0]);

    // calculate byte enable
    always_comb begin
        bus_byte_enable = 4'b0000;
        case (data_format[1:0])
            2'b00: bus_byte_enable = 4'b0001 << address[1:0];
            2'b01: bus_byte_enable = 4'b0011 << address[1:0];
            2'b10: bus_byte_enable = 4'b1111 << address[1:0];
            default: bus_byte_enable = 4'b0000;
        endcase
    end

    // correct for unaligned accesses
    always_comb begin
        position_fix = bus_read_data >> (8*address[1:0]);
    end

    // sign-extend if necessary
    always_comb begin
        case (data_format[1:0])
            2'b00: sign_fix = {{24{~data_format[2] & position_fix[7]}}, position_fix[7:0]};
            2'b01: sign_fix = {{16{~data_format[2] & position_fix[15]}}, position_fix[15:0]};
            2'b10: sign_fix = position_fix[31:0];
            default: sign_fix = 32'bx;
        endcase
    end

    assign read_data = sign_fix;

endmodule
```

## Fault-injection in Simulated Targets

> [!warning]
> This section is under #development

## The Trade-Off of Virtual Platforms

When modeling complex targets, an uncomfortable question tends to always float around: is it worth the effort? Can't we just wait until the real hardware is available and do the development on top of that? This is one of the most fundamental conundrums in embedded systems development.
Simulating complex targets is always an appealing task for engineers, so they might be biased toward building their own models instead of using dev kits or similar. Procuring proprietary virtual platform creation software also tends to be a tricky path due to factors like cost and vendor lock-in. As in many aspects of engineering, the answer as to whether creating Virtual Platforms makes sense is: it depends. In fact, it literally depends on the scenario in front of you.

If the design team is in a "pre-silicon" stage (the processor or the SoC on which the software is supposed to run is being developed as well, so it does not exist yet), a virtual platform may make sense. This would also feed useful information to the hardware team and perhaps find design issues early in the process.

When the team is working on a product that uses existing silicon, virtual platforms may lose some of their meaning. I mean, it's always great to have high-fidelity models you can code with, with breakpoints and other nice-to-have facilities. The question here is: isn't there a development kit that might get you going until you get the real target? If the virtual platform is too "lo-fi", it might require the software team to tweak the code too much to make it run, which might invalidate the whole point of having a virtual platform in the first place.

A third scenario is about needing a simulated target because the team will never use a real target. How is this scenario even possible? Simulators are used for a wide variety of purposes, and one of those purposes might be training. We are all familiar with the flight simulators that pilots use for training. Flight simulators for pilot training are typically categorized into several classes based on their capabilities and the level of fidelity they provide in replicating real-world flying conditions.
These classes are defined by regulatory bodies such as the Federal Aviation Administration (FAA) in the United States and the European Union Aviation Safety Agency (EASA). Here are the main classes of flight simulators, ordered from highest to lowest fidelity:

1. **Full Flight Simulator (FFS)**:
   - The FFS is the highest-fidelity simulator class, offering the most realistic replication of aircraft systems, flight dynamics, and environmental conditions.
   - These simulators are used for the most critical training scenarios, including type rating and recurrent training for commercial aircraft.
   - Both the FAA and EASA qualify FFSs at Levels A through D, with Level D being the highest.
   - Level D standards require the simulator to accurately simulate the aircraft type it represents, including its flight envelope, systems behavior, and environmental effects. They also mandate rigorous motion and visual system requirements.
2. **Higher-level Flight Training Device (FTD)**:
   - FTDs at the upper qualification levels offer a high level of fidelity but are not as comprehensive as an FFS.
   - These devices are suitable for training tasks that do not require the full range of motion or systems replication found in an FFS.
   - The FAA qualifies FTDs at Levels 4 through 7, and EASA uses its own FTD level numbering. FAA Level 6 is a typical example of this group.
   - These levels involve requirements for the simulation of aircraft systems and flight dynamics, and in some cases visual systems, though they are less stringent than FFS Level D requirements.
3. **Lower-level Flight Training Device (FTD)**:
   - FTDs at the lower qualification levels (for example, FAA Level 4 or 5) offer moderate fidelity and are used for a wide range of training tasks, including procedural training, instrument training, and recurrent training.
   - These devices provide a representative flight experience but may lack certain advanced features found in higher-level simulators.
   - Their requirements cover aircraft systems and flight dynamics, but with some simplifications compared to the higher FTD levels.
4. **Basic Aviation Training Device (BATD)**:
   - BATDs offer basic flight training capabilities suitable for private pilot training, instrument training, and proficiency maintenance.
   - These devices are typically desktop or compact setups and do not provide motion cues.
   - The FAA approves BATDs as Aviation Training Devices, a category separate from the FFS and FTD qualification levels. They must meet specific requirements that ensure a minimum level of fidelity for training purposes.

![](FFS.jpg)

> [!Figure]
> _CAE 7000XR Series Level D Full-flight Simulator (credit: CAE)_

![](FTD.jpeg)

> [!Figure]
> _CAE600XR Flight Training Device (credit: CAE)_

If a simulator for training purposes requires high-fidelity simulation of the most relevant constituent parts, then having good virtual platforms makes a lot of sense. That way, the overall behavior of the system will closely match the behavior of the machine the personnel are training for, including accurate simulation of emergency and off-nominal scenarios. In these cases, Virtual Platforms are a good option as long as the level of abstraction of the System Simulation architecture requires it.

> [!warning]
> This section is under #development