RIOS Lab Student was Invited to Give a Talk at RISC-V Summit Europe

RIOS-Lab

Posted on

12/06/2023

RIOS Lab student Yifei Zhu was invited to give a talk on GreenRio: A Linux-Compatible RISC-V Processor Designed for Open-Source EDA Implementations at RISC-V Summit Europe. She introduced the background and shared her experience on this project.

Fig.1 Yifei Zhu gave a talk at RISC-V Summit Europe

Fig.3 Group photo of RIOS Lab Students and RISC-V Foundation CEO Calista Redmond

Background

Traditional fabrication methods often require large margins to ensure functional performance, But the adoption of public tool chains and open PDK can recycle these margins by the wisdom of the masses. Because, EDA licenses will no longer serve as a bottleneck to design space exploration in this field.

Therefore, we are trying to answer whether the open EDA tools are competent for building a modern processor? What’s the gap between open and proprietary ASIC fabrication flow? That’s why we build GreenRio, a RISC-V core with modern processors characteristics.

Through building such non-toy-like design, we optimized the OpenEDA along with the RISC-V architecture together and explore every stage of this flow.

Fig.2  Milestone of GreenRio

GreenRio1.0 is a 7-stage, dual-issue, out-of-order (OoO) processor. We demonstrated it in the efabless OpenMPW-7 program using the Skywater 130nm process. Related works were accepted by the Workshop on Open-Source EDA Technology (WOSET) as well as RISC-V Days Tokyo 2022 Autumn2. Greenrio 2.0 adds more hardware supports allowing for running applications. GreenRio2.0 has won the ISSCC Code-a-Chip competition and is becoming a benchmark in the OpenEDA domain.

User Experience of Open Toolchains

One of the challenges that the community faces is the gap between open and proprietary tools in terms of QoR and runtime. So We quantified this gap by hardening GreenRio. We used open EDA Tools: OpenROAD/OpenLANE versus appropriate EDA tools: Genus/Design Compiler/Innovus to harden this design.

We can concluded that:

  1. The open pdk lack of process corners and key parameters for timing optimization, also exists bugs
  2. Compared to propriety tools, the open one is not complete enough and needs further optimization
  3. For logic synthesis, propriety tools bring about smaller area, less gate count and lower leakage power. But the open tools win in synthesis time and can get time convergence netlists without human adjustment.
  4. For physical implementation, compared to Innovus, open tools cannot achieve higher core utilization and higher cell density. Currently, it is inferior in both runtime and performance, because its placement/routing engines are out-of-date.
  5. For the runtime allocation, open tools spend most of the time in routing, but the other one puts more effort in placement.

Fig.3  Different macro placement

RVV Sail Model

RIOS Lab cooperate with the RISC-V foundation and design a configurable RVV test generator for the RISC-V community. We first designed and implemented the RVV Sail Model as the golden model to execute the RVV instructions. We also employ Spike for cross-validation of RVV program executions as compared to our Sail model. After massive iterations of RVV instruction tests, we ensure the correctness of our introduced Sail Model. We have issued pull requests to the official RISC-V Sail Model and our effort is partly upstream now.

With the Sail model support, we further developed a configurable test generator with quantifiable coverage for RISC-V Vector Extension. The test generator covers all RVV extension instructions,  7 new CSRs and diverse configuration parameters, especially due to the combinations of distinct RVV parameter configurations like vlen, vsew, vmul, etc. With a given vlen, about 30,000 tests are generated by our test generator.

The introduced test generator has provides the coverage support. We designed the RVV coverage points based on the test coverage points in RISC-V Arch Test for RV64I/M/F/D, and tuned the coverage setting for vector instructions in a fine-grained manner.

PicoRio

PicoRio is an open-source CPU project developed by the RISC-V International Open Source (RIOS) laboratory with the goal of creating an open, affordable, Linux-capable RISC-V hardware platform to help software developers and elevate the RISC-V software and hardware ecosystem collaboratively with both academia and industry.

PicoRio 2.0 CPU is a RV64GC (IMAFDC) core with 13+ stages superscalar out-of-order pipeline. The front-end features two-level branch prediction, and decoupled fetch pipeline and predict pipeline. The PicoRio 2.0 system is a scalable multi-core design for server application with 8/16-core configuration. The system features a coherent network, scalable cache coherence support, mesh linker for on-die network scale-up, and die linker for inter-die network scale-up. The PicoRio 2.0 CPU tile includes a PicoRio 2 core, configurable core number per tile (1~4 cores), configurable exclusive private L2 cache, and configurable NoC Router (0~4 local device ports).

Software Simulation

Software simulation plays a crucial role in the development of hardware systems and microarchitectures. To ensure that chip designs meet established requirements, developers typically analyze and verify all aspects of the architecture. In this context, software simulators, facilitated by software tools, are often more effective than hardware languages. The high degree of flexibility offered by software languages enables the efficient implementation of hardware models, as well as debugging and verification of their design. Moreover, software simulators are significantly more efficient than hardware emulators, enabling substantial time savings in the development process.

However, software simulation can be biased depending on the specific requirements. Different software simulators prioritize different designs, striking a trade-off between performance and accuracy. For complex workloads, balancing efficiency and accuracy poses significant challenges to mainstream simulators, which limit their utility in analyzing complex programs.

This thesis proposes a full-system cycle accurate software simulation framework, (Name), based on QEMU platform. The framework connects the CPU performance model to the virtual System-of-Chip (SOC) via a bus, allowing for the support of complex IO devices while accurately simulating cycle-level instructions. Furthermore, this thesis proposes a dynamic CPU model switching mechanism, which enables architects to switch between functional implementation and timing simulation in real-time, providing high flexibility and configurability to suit specific simulation accuracy requirements.

To demonstrate the effectiveness, we implement a case study based on the RISC-V Instruction Set Architecture (ISA). This includes an out-of-order CPU model, memory model, and common IO devices. Using this implementation, we conduct experiments on some complex benchmarks, and demonstrate the superiority of our simulator through experimental data and comparisons with mainstream simulators.

GreenRio

GreenRio1.0 is a seven-stage RISC-V core that supports dynamic branch prediction and out-of-order execution, as well as a non-blocking data cache. The decode unit separates the CPU’s backend and frontend. During stages 0-3, the predicted PC is determined via gshare and the branch target buffer, with the redirect target PC being obtained from the backend. In stage 4, fetched instructions are placed into a FIFO queue for decoding. Then, the ISA registers are renamed, and the reorder buffer maintains the program order of the instructions, providing precise exception. Dependency checks are performed between instructions to enable dual issuing. During the execution stage, instructions are processed by their corresponding units. Load and store operations involve calculating the address, sending the request to Dcache, and utilizing MSRHs to increase cache bandwidth. In the final stage, results are written back. Each Instruction can graduate once its all preceding instructions have successfully written back their results. The SOC of Greenrio1.0 is adapted from another open project in Google’s MPW6.

GreenRio v2.0 employs a dual-issue 7-stage out-of-order architecture supporting the RV64ICMA unprivileged RISC-V ISA. It also supports Zicsr, Zifencei and Sfence.vma privileged extensions as well as the S, M and U privilege mode. GreenRio 2 is written in openEDA friendly synthesizable Verilog. Synthesized with the latest version of OpenLane, the gate count of a single GreenRio 2.0 CPU pipeline is slightly over 230K on the Skywater 130nm process. The area is 15.87 mm^2 at a frequency of 25 MHz.

GreenRio 2 is an improvement from the previous version 1.0, which has already been validated in the Google/Skywater OpenMPW-7 tapeout November last year. GreeRio 1.0 is already the most complex processor design that has ever been fabricated by all past Google/SkyWater OpenMPW shuttle runs.

With the close collaboration between RIOS and Google along with helps from the OpenEDA tool community,  GreenRio 2 pushes the design complexity limit that OpenMPW and OpenEDA flow can handle. It is also a very meaningful project demonstrating building a high-performance processor with complete open-source technologies from architecture, RTL to GDSII possible. And GreenRio2.0 is the first place winner in the inaugural international Code-a-Chip competition.

code:https://github.com/0616ygh/GreenRio2

ISSCC code-a-chip competition:

https://github.com/sscs-ose/sscs-ose-code-a-chip.github.io/tree/main/ISSCC23