RIOS Retreat 2023

RIOS&CALAS HK RISC-V Day 2023

Since its establishment, the RISC-V International Open Source (RIOS) Laboratory has garnered widespread attention and support from various sectors of society. In order to further promote and enhance the development of the global RISC-V open-source technology ecosystem, RIOS, in collaboration with the CityU Architecture Lab for Arithmetic and Security (CALAS) of City University of Hong Kong, co-hosted the “RIOS & CALAS HK RISC-V Day 2023” on October 27-28, 2023.

The visit to Hong Kong aimed to deepen understanding of the academic and innovation ecosystem in Hong Kong, strengthen exchanges and cooperation between academia in Hong Kong and mainland China, promote academic research and knowledge sharing, and facilitate collaboration among domestic and international chip and computer industry leaders. Together, they explored the current development status and future opportunities of RISC-V in the Guangdong-Hong Kong-Macao Greater Bay Area.

On one hand, the laboratory showcased recent research achievements to industry experts and scholars; on the other hand, the laboratory invited frontline experts from internationally renowned universities and supercomputing centers to deliver exciting cutting-edge academic presentations on the new computing architecture revolution driven by RISC-V and core technologies of open-source chips. This initiative aimed to provide students with a more comprehensive understanding of the international academic environment, enhance their cross-cultural communication skills and professional capabilities, and lay a solid foundation for their future development on international platforms.

The Highlights of the Event

Cutting-edge Technology Presentations: The RIOS Laboratory showcased its advancements in chip design, large language models, AI chip engineering, low-level computer compilation, vector extension instruction set testing and verification, and Open EDA/PDK to renowned experts and scholars from around the world.

Student Achievements: Notable achievements by students were highlighted, including winning first place in international semiconductor design competitions and receiving awards for designing the world’s first AI-designed RISC-V CPU in the “AI-generated Open Source Chip Challenge.”

Special Presentations by Experts: World-class experts in relevant fields delivered specialized presentations, sharing insights into research trends and the latest developments in international advanced semiconductor initiatives, including open-source chip design and high-efficiency supercomputing architectures based on RISC-V and AI.

Personalized Guidance: Senior experts provided face-to-face, one-on-one guidance to students on their research topics and future development, fostering open and engaging discussions and providing invaluable mentorship.

Opportunities for Students: The event provided students with a rare opportunity to gain insights into the latest industry technologies, understand the newest innovations and technological trends in the industry, and maintain their research capabilities at the forefront of relevant disciplines.

The RIOS Laboratory organized a trip for students to visit the Hong Kong Science Park, where they toured multiple international research laboratories. This experience gave them firsthand knowledge of the forefront of innovation and technological development in the Guangdong-Hong Kong-Macao Greater Bay Area, as well as insights into its ecosystem and industrial prospects. The students found the experience highly enriching.

    Original link: https://mp.weixin.qq.com/s/vYvrZ31iqHmF7URfEm8DYA


    Generative AI for Chip Design

    Overview

    The Generative AI for Chip Design project at RIOS Lab is at the forefront of merging artificial intelligence (AI) with the intricate world of chip design. Our mission is to harness the power of generative AI models, such as the open-source Large Language Models (LLMs) Llama2 and Mixtral, and tailor them specifically for the challenges and opportunities within the semiconductor industry. By fine-tuning these models on a rich dataset of Verilog code and other domain-specific data, we aim to create specialized tools that can automate and innovate the processes involved in designing chips and Systems on Chip (SoCs).

    Research Focus Areas

    Fine-Tuning LLMs for Chip Design

    We start with open-source LLMs, recognizing their vast potential. The process of fine-tuning these models with additional, highly specialized Verilog data allows us to create domain-specific LLMs that understand the intricacies of chip design. This initiative not only streamlines the design process but also significantly reduces the entry barrier for complex chip design tasks.

    Automation of Chip Design Tasks

    Our laboratory focuses on automating time-consuming chip design tasks that traditionally require extensive human intervention. This includes tasks that involve interfacing with both natural languages and programming languages, thereby generating Register Transfer Level (RTL) code for simple to complex design modules and scripting for Electronic Design Automation (EDA) tools.

    Overcoming Data Scarcity and Privacy Challenges

    A major focus of our work is to solve the dual challenges of training data scarcity and privacy concerns in the semiconductor industry. We employ innovative strategies to generate synthetic data and use privacy-preserving techniques to enable the training of potent AI models without compromising sensitive information.

    Resource Optimization

    Understanding the constraints of GPU resources essential for training sophisticated models, our research also delves into optimizing the use of computational resources. This involves developing more efficient algorithms that can run on limited resources without sacrificing performance.

    VLSI Bug Summarization and Analysis

    We leverage generative AI to summarize and analyze bugs in Very Large Scale Integration (VLSI) designs. This approach helps in quickly identifying potential issues, speeding up the debug process, and ensuring a higher quality of the final chip design.

    Design Debug and PPA Prediction

    Our research extends to using AI for design debug and for predicting the Power, Performance, and Area (PPA) of chip designs. By predicting these critical parameters early in the design cycle, we can significantly enhance the productivity and efficiency of chip design projects.

    Enhancing Chip Design Productivity

    Ultimately, the goal of our research is to significantly improve chip design productivity. By automating routine tasks, providing early insights into potential design issues, and optimizing the design process, we aim to free human designers to focus on the most innovative aspects of chip design.

    Industrial Collaboration

    We are committed to translating our research findings into practical solutions for the industry. Collaboration with chip and SoC design companies ensures that our innovations are grounded in real-world needs and can directly contribute to advancing the state of the art in chip design.

    Conclusion

    At the Generative AI for Chip Design Laboratory, we are not just envisioning the future of chip design; we are actively creating it. Through our cutting-edge research and industry collaborations, we aim to lead the way in making chip design more efficient, innovative, and accessible.

    RIOS Lab Student was Invited to Give a Talk at RISC-V Summit Europe

    RIOS-Lab

    Posted on

    12/06/2023

    RIOS Lab student Yifei Zhu was invited to give a talk on "GreenRio: A Linux-Compatible RISC-V Processor Designed for Open-Source EDA Implementations" at RISC-V Summit Europe. She introduced the background of the project and shared her experience working on it.

    Fig.1 Yifei Zhu gave a talk at RISC-V Summit Europe

    Fig.3 Group photo of RIOS Lab Students and RISC-V Foundation CEO Calista Redmond

    Background

    Traditional fabrication methods often require large margins to ensure functional performance, but the adoption of public toolchains and open PDKs can reclaim these margins through the wisdom of the crowd, because EDA licenses no longer serve as a bottleneck to design space exploration.

    We therefore set out to answer two questions: are open EDA tools capable of building a modern processor, and what is the gap between the open and proprietary ASIC fabrication flows? That is why we built GreenRio, a RISC-V core with the characteristics of a modern processor.

    By building such a non-toy design, we optimized the open EDA flow together with the RISC-V architecture and explored every stage of the flow.

    Fig.2  Milestone of GreenRio

    GreenRio 1.0 is a 7-stage, dual-issue, out-of-order (OoO) processor. We demonstrated it in the efabless OpenMPW-7 program using the SkyWater 130nm process. Related work was accepted by the Workshop on Open-Source EDA Technology (WOSET) as well as RISC-V Days Tokyo 2022 Autumn. GreenRio 2.0 adds more hardware support, allowing it to run applications. GreenRio 2.0 won the ISSCC Code-a-Chip competition and is becoming a benchmark in the OpenEDA domain.

    User Experience of Open Toolchains

    One of the challenges the community faces is the gap between open and proprietary tools in terms of quality of results (QoR) and runtime. We quantified this gap by hardening GreenRio with both open EDA tools (OpenROAD/OpenLANE) and proprietary EDA tools (Genus/Design Compiler/Innovus).

    We concluded that:

    1. The open PDK lacks process corners and key parameters for timing optimization, and it also contains bugs.
    2. Compared to proprietary tools, the open tools are not yet complete and need further optimization.
    3. For logic synthesis, proprietary tools yield smaller area, lower gate count, and lower leakage power, but the open tools win on synthesis time and can produce timing-converged netlists without manual adjustment.
    4. For physical implementation, open tools cannot reach the core utilization and cell density that Innovus achieves. They are currently inferior in both runtime and performance because their placement and routing engines are out of date.
    5. In terms of runtime allocation, open tools spend most of their time in routing, whereas the proprietary tools put more effort into placement.

    Fig.3  Different macro placement

    RVV Sail Model

    RIOS Lab cooperates with the RISC-V Foundation and has designed a configurable RVV test generator for the RISC-V community. We first designed and implemented the RVV Sail model as the golden model for executing RVV instructions. We also use Spike to cross-validate RVV program executions against our Sail model. After many iterations of RVV instruction tests, we are confident in the correctness of our Sail model. We have issued pull requests to the official RISC-V Sail model, and part of our work has been upstreamed.
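
    The cross-validation step can be pictured as comparing the architectural state that two models report after running the same program. The sketch below is only an illustration under stated assumptions: the dictionaries stand in for final vector register dumps from the Sail model and from Spike, and both the dump format and the function name `diff_vector_state` are hypothetical rather than the lab's actual tooling.

```python
from typing import Dict, List, Optional, Tuple

Mismatch = Tuple[str, Optional[bytes], Optional[bytes]]


def diff_vector_state(golden: Dict[str, bytes],
                      reference: Dict[str, bytes]) -> List[Mismatch]:
    """Return (register, golden_value, reference_value) for every mismatch.

    A register that is missing from one dump is reported with None on
    that side, so an incomplete dump is never silently accepted.
    """
    mismatches: List[Mismatch] = []
    for reg in sorted(set(golden) | set(reference)):
        g = golden.get(reg)
        r = reference.get(reg)
        if g != r:
            mismatches.append((reg, g, r))
    return mismatches


# Hypothetical dumps: two 16-byte vector registers from each model.
sail_dump = {"v0": bytes(16), "v1": bytes([1] * 16)}
spike_dump = {"v0": bytes(16), "v1": bytes([2] * 16)}
report = diff_vector_state(sail_dump, spike_dump)
```

    An empty report means the two models agree on the compared registers for that test; any entry points at a register worth inspecting in both models.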

    With the Sail model in place, we further developed a configurable test generator with quantifiable coverage for the RISC-V Vector Extension. The test generator covers all RVV extension instructions, the 7 new CSRs, and diverse configuration parameters, in particular the combinations of distinct RVV parameter settings such as vlen, vsew, and vmul. For a given vlen, our test generator produces about 30,000 tests.
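
    To give a feel for why parameter combinations multiply the number of tests, the sketch below enumerates element-width and register-grouping settings for one vector length. It is an illustration under assumptions, not the generator itself: it assumes ELEN = 64, uses the RVV 1.0 rule that a fractional LMUL only needs to support SEW up to LMUL × ELEN, and computes VLMAX = LMUL × VLEN / SEW. The function name `legal_configs` is hypothetical.

```python
from fractions import Fraction
from itertools import product

SEWS = (8, 16, 32, 64)  # selected element widths in bits
LMULS = (Fraction(1, 8), Fraction(1, 4), Fraction(1, 2),
         Fraction(1), Fraction(2), Fraction(4), Fraction(8))


def legal_configs(vlen: int, elen: int = 64):
    """List (sew, lmul, vlmax) tuples that are legal for this VLEN/ELEN."""
    configs = []
    for sew, lmul in product(SEWS, LMULS):
        # Assumed rule: SEW must fit in ELEN, and a fractional LMUL
        # only supports SEW up to LMUL * ELEN.
        if sew > elen or sew > lmul * elen:
            continue
        vlmax = int(lmul * vlen / sew)  # elements per register group
        configs.append((sew, lmul, vlmax))
    return configs


configs_128 = legal_configs(128)
```

    Each instruction then has to be exercised under every such setting, and again for masked and unmasked forms, different vl values, and so on, which is how the test count grows quickly for a single vlen.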

    The test generator also provides coverage support. We designed the RVV coverage points based on the test coverage points in RISC-V Arch Test for RV64I/M/F/D, and tuned the coverage settings for vector instructions in a fine-grained manner.
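
    Quantifiable coverage can be thought of as counting how many predefined bins the generated tests actually hit. The following is a minimal sketch under assumptions: the bins here are simply (instruction, SEW, LMUL) triples, which is far coarser than real coverage points, and the class `CoverageTracker` is a hypothetical name that does not reflect the RISC-V Arch Test coverage format.

```python
from itertools import product


class CoverageTracker:
    """Count hits per (instruction, sew, lmul) bin and report coverage."""

    def __init__(self, instructions, sews, lmuls):
        self.bins = {key: 0 for key in product(instructions, sews, lmuls)}

    def hit(self, instruction, sew, lmul):
        key = (instruction, sew, lmul)
        if key in self.bins:  # ignore samples outside the defined bins
            self.bins[key] += 1

    def percent(self) -> float:
        if not self.bins:
            return 0.0
        covered = sum(1 for count in self.bins.values() if count > 0)
        return 100.0 * covered / len(self.bins)

    def uncovered(self):
        return sorted(key for key, count in self.bins.items() if count == 0)


tracker = CoverageTracker(["vadd.vv", "vsub.vv"], [8, 16], [1])
tracker.hit("vadd.vv", 8, 1)
```

    The list returned by `uncovered()` is what makes the metric actionable, since it names the settings that still need tests.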

    PicoRio

    PicoRio is an open-source CPU project developed by the RISC-V International Open Source (RIOS) laboratory with the goal of creating an open, affordable, Linux-capable RISC-V hardware platform to help software developers and elevate the RISC-V software and hardware ecosystem collaboratively with both academia and industry.

    The PicoRio 2.0 CPU is an RV64GC (IMAFDC) core with a 13+ stage superscalar out-of-order pipeline. The front end features two-level branch prediction and decoupled fetch and prediction pipelines. The PicoRio 2.0 system is a scalable multi-core design for server applications with 8- or 16-core configurations. The system features a coherent network, scalable cache coherence support, a mesh linker for on-die network scale-up, and a die linker for inter-die network scale-up. The PicoRio 2.0 CPU tile includes a PicoRio 2 core, a configurable number of cores per tile (1~4 cores), a configurable exclusive private L2 cache, and a configurable NoC router (0~4 local device ports).
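
    The configuration ranges above can be captured as a small validated record. This is a sketch under assumptions rather than the project's configuration format: the names `TileConfig` and `tiles_needed` are hypothetical, and it assumes every tile holds the same number of cores so that the system core count must divide evenly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileConfig:
    cores_per_tile: int      # stated range: 1~4
    local_device_ports: int  # stated range for the NoC router: 0~4

    def __post_init__(self):
        if not 1 <= self.cores_per_tile <= 4:
            raise ValueError("cores_per_tile must be between 1 and 4")
        if not 0 <= self.local_device_ports <= 4:
            raise ValueError("local_device_ports must be between 0 and 4")


def tiles_needed(total_cores: int, tile: TileConfig) -> int:
    """Number of identical tiles for an 8- or 16-core system."""
    if total_cores not in (8, 16):
        raise ValueError("only 8- and 16-core systems are described")
    if total_cores % tile.cores_per_tile:
        # Assumption: tiles are uniform, so partial tiles are rejected.
        raise ValueError("core count is not a multiple of cores_per_tile")
    return total_cores // tile.cores_per_tile


quad_tile = TileConfig(cores_per_tile=4, local_device_ports=2)
tiles_for_16 = tiles_needed(16, quad_tile)
```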

    Software Simulation

    Software simulation plays a crucial role in the development of hardware systems and microarchitectures. To ensure that chip designs meet established requirements, developers typically analyze and verify all aspects of the architecture. In this context, software simulators are often more effective than models written in hardware description languages. The high degree of flexibility offered by software languages enables the efficient implementation of hardware models, as well as the debugging and verification of their design. Moreover, software simulators are significantly more efficient than hardware emulators, enabling substantial time savings in the development process.

    However, software simulation can be biased depending on the specific requirements. Different software simulators prioritize different design goals, striking a trade-off between performance and accuracy. For complex workloads, balancing efficiency and accuracy poses significant challenges to mainstream simulators, which limits their utility in analyzing complex programs.

    This thesis proposes a full-system, cycle-accurate software simulation framework, (Name), based on the QEMU platform. The framework connects the CPU performance model to the virtual System-on-Chip (SoC) via a bus, allowing it to support complex IO devices while accurately simulating instructions at the cycle level. Furthermore, this thesis proposes a dynamic CPU model switching mechanism, which enables architects to switch between functional implementation and timing simulation in real time, providing high flexibility and configurability to suit specific simulation accuracy requirements.
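
    The idea of switching CPU models at run time can be sketched as two interchangeable models that operate on one shared architectural state. This toy example is an assumption-laden illustration, not the framework: the two-instruction mini ISA, the per-instruction latency table, and all class names are hypothetical, and the timing model here is a trivial in-order cost model rather than a cycle-accurate out-of-order one.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Instruction = Tuple[str, int, int, int]  # (op, rd, rs1, imm)
MASK64 = 0xFFFFFFFFFFFFFFFF


@dataclass
class ArchState:
    pc: int = 0
    regs: List[int] = field(default_factory=lambda: [0] * 32)


def execute(state: ArchState, inst: Instruction) -> None:
    """Apply the architectural effect of one instruction."""
    op, rd, rs1, imm = inst
    if op == "addi":
        if rd != 0:  # x0 stays hard-wired to zero
            state.regs[rd] = (state.regs[rs1] + imm) & MASK64
    elif op != "nop":
        raise ValueError(f"unsupported op: {op}")
    state.pc += 4


class FunctionalModel:
    def step(self, state: ArchState, inst: Instruction) -> int:
        execute(state, inst)
        return 0  # fast mode: no cycles are accounted


class TimingModel:
    LATENCY = {"addi": 1, "nop": 1}  # hypothetical per-op cycle cost

    def step(self, state: ArchState, inst: Instruction) -> int:
        execute(state, inst)
        return self.LATENCY[inst[0]]


class Simulator:
    def __init__(self):
        self.state = ArchState()  # shared by both models
        self.models = {"functional": FunctionalModel(), "timing": TimingModel()}
        self.active = self.models["functional"]
        self.cycles = 0

    def switch(self, name: str) -> None:
        self.active = self.models[name]

    def run(self, program: List[Instruction]) -> None:
        for inst in program:
            self.cycles += self.active.step(self.state, inst)


sim = Simulator()
sim.run([("addi", 1, 1, 5)] * 3)  # fast-forward functionally
sim.switch("timing")
sim.run([("addi", 1, 1, 5)] * 2)  # measure only the region of interest
```

    Because both models read and write the same `ArchState`, a switch needs no state transfer in this sketch. A real timing model would also carry microarchitectural state (caches, predictors, queues) that has to be warmed up or reconstructed after a switch.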

    To demonstrate its effectiveness, we implement a case study based on the RISC-V Instruction Set Architecture (ISA). This includes an out-of-order CPU model, a memory model, and common IO devices. Using this implementation, we conduct experiments on several complex benchmarks and demonstrate the advantages of our simulator through experimental data and comparisons with mainstream simulators.