Welcome to the lowRISC project!

lowRISC is creating a fully open-sourced, Linux-capable, RISC-V-based SoC that can be used either directly or as the basis for a custom design. We aim to complete our SoC design this year.

Latest news

Generating a Gantt chart from HJSON input

This blog post is a slight departure from the normal topics here. Worry not, we’ll return to discussing Verilog, Chisel, and low-level software work soon. I wrote a quick script to help serve a need (producing a Gantt chart) and thought perhaps others would find it useful.

There is a wide range of online services to help produce and maintain Gantt charts, but none quite offered what I was looking for. I want something open source, easy to use, and where the underlying data is human readable and can be version controlled. The python-gantt library formed an excellent starting point for generating a Gantt chart in SVG, but I thought it was worth trying to support a slightly less verbose input format.

Enter hjson, which aims to be a superset of JSON with much more forgiving syntax. This has its disadvantages, but it does seem to work well as a concise and easy to edit data format. A quick Python script to parse an hjson input to produce a Gantt chart and we’re away. One feature I do like is the use of fuzzy matching for project references and dependencies. Again, this makes it easy to hack on by hand. In the example below, I’m able to use “mftr widgets” to refer to the “Manufacture widgets” task.

Example input:

        {
          projects: [
            {
              name: Project Alpha
              color: green
            }
          ]

          tasks: [
            {
              name: Design widget
              begin: 2016-10-14
              duration: 7
              people: Farquaad
              project: alpha
            }
            {
              name: Set up widget production line
              begin: 2016-10-19
              duration: 6
              people: Zack
              project: alpha
            }
            {
              name: Manufacture widgets
              duration: 7
              people: Carrie
              deps: ["design widget", "widget prod line"]
              project: alpha
            }
          ]

          milestones: [
            {
              name: Widgets start shipping
              start: 2016-10-30
              deps: ["mftr widgets"]
              project: alpha
            }
          ]
        }
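
As a rough illustration of the fuzzy matching (this is not the actual hjson_to_gantt code, which may well work differently), Python's standard difflib can resolve a loosely typed reference against the known task names. The `resolve` helper and the 0.6 cutoff are assumptions made for this sketch.

```python
import difflib

TASK_NAMES = [
    "Design widget",
    "Set up widget production line",
    "Manufacture widgets",
]

def resolve(reference, names, cutoff=0.6):
    """Return the name that most closely matches a loosely typed reference.

    Hypothetical helper: compares lower-cased strings with difflib and
    raises ValueError if nothing scores at or above the cutoff.
    """
    by_lower = {name.lower(): name for name in names}
    matches = difflib.get_close_matches(
        reference.lower(), list(by_lower), n=1, cutoff=cutoff)
    if not matches:
        raise ValueError(f"no task matches {reference!r}")
    return by_lower[matches[0]]

if __name__ == "__main__":
    for ref in ["design widget", "widget prod line", "mftr widgets"]:
        print(ref, "->", resolve(ref, TASK_NAMES))
```

A stricter cutoff reduces the chance of a typo silently binding to the wrong task, at the cost of rejecting shorter abbreviations.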

Example output (./hjson_to_gantt --begin-date 2016-10-10 --end-date 2016-11-13 example.hjson --name example):

Example Gantt chart

hjson_to_gantt is available on GitHub.

Alex Bradbury

lowRISC+IMC internship: second update

This is the second update from our team of interns, made up of four University of Cambridge undergrads. Their work is kindly sponsored by IMC Financial Markets, who are also helping to advise this summer project.

At the time of our last blog post, we had just finished VGA and were working on implementing the frame buffer. Over the last 2 weeks, we have made significant progress, completing the frame buffer and starting video decode.

The frame buffer was developed iteratively. Initially, it was a small Block RAM attached to the SoC’s AXI-Lite bus. This was useful for creating a prototype, but its limited size led to it being replaced with an in-memory frame buffer, supplemented with a BRAM line buffer. The end result is shown below; the component is connected to the TileLink bus.

Framebuffer diagram

This component builds on the VGA controller, for which documentation will be added shortly. We have added a data mover (a unidirectional DMA) to move data from the in-memory frame buffer into the local video memory. The video memory acts as a line buffer: the data mover (DM) moves one line from the in-memory frame buffer at a time. Meanwhile, the VGA controller flushes the new lines to the screen, displaying the image from memory. The DM obeys the state machine shown in the diagram below. The DM Controller is memory mapped, allowing the CPU to communicate with it. Currently, it can only accept one request at a time; further requests are ignored until the component moves back to the IDLE state. Requests consist of source and destination addresses, and a length. A planned extension is adding a FIFO queue to the controller to allow multiple requests to be supported.

Data mover state diagram
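
The request handling described above can be sketched as a small behavioural model in Python. This is only an illustration, not our RTL: the single `BUSY` state stands in for whatever non-idle states the real state machine has, and the class and method names are made up for the example.

```python
from enum import Enum

class State(Enum):
    IDLE = "idle"
    BUSY = "busy"  # assumption: stands in for all non-idle states

class DataMover:
    """Toy model of a unidirectional data mover with a single request slot."""

    def __init__(self, memory, line_buffer):
        self.memory = memory            # models the in-memory frame buffer
        self.line_buffer = line_buffer  # models the local video memory
        self.state = State.IDLE
        self.request = None

    def submit(self, src, dst, length):
        """Offer a request; it is ignored (returns False) unless we are IDLE."""
        if self.state is not State.IDLE:
            return False
        self.request = (src, dst, length)
        self.state = State.BUSY
        return True

    def step(self):
        """Finish the outstanding transfer, if any, and return to IDLE."""
        if self.state is State.BUSY:
            src, dst, length = self.request
            self.line_buffer[dst:dst + length] = self.memory[src:src + length]
            self.request = None
            self.state = State.IDLE

if __name__ == "__main__":
    dm = DataMover(bytearray(range(16)), bytearray(8))
    print(dm.submit(4, 0, 4))  # accepted
    print(dm.submit(0, 4, 4))  # ignored while busy
    dm.step()
    print(bytes(dm.line_buffer))
```

The planned FIFO extension would replace the single `request` slot with a queue, so that `submit` only refuses work when the queue is full.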

The decision was made to implement a unidirectional data mover over a more complex and capable bidirectional DMA component as only unidirectional movement was needed for now. Later on, we will need bidirectional access as the video accelerator will need to write back to memory. We hope to enable bidirectionality by simply duplicating the existing unidirectional design.

Our next tasks relate to video decoding. We will be adapting a reference MPEG-2 codec to decode video on the FPGA and adding extra components to the SoC design to improve the performance of the codec, such as DCT and iDCT accelerators.

Notes from the fourth RISC-V workshop

Many of the lowRISC team (Robert Mullins, Wei Song, and Alex Bradbury) have been in Boston this week for the fourth RISC-V workshop. By any measure, this has been a massive success with over 250 attendees representing 63 companies and 42 universities. Wei presented our most recent work on integrating trace debug, which you’ll soon be able to read much more about here (it’s worth signing up to our announcement list if you want to be informed of each of our releases).

RISC-V Foundation update: Rick O’Connor

  • Next RISC-V Workshop will be Nov 29th-30th at Google in Mountain View, CA
  • The RISC-V ISA and related standards shall remain open and license-free to all parties, and the member agreement with RISC-V Foundation will include a license for the trademark
  • Trademark license for commercial use is part of being a silver, gold, or platinum member
  • Founding member status has now finished.
  • You don’t have to be a member to participate in specifications - each task group must include at least one round of public consultation.
  • Question: any plans for workshops outside of the USA? Answer: yes, we would like to do that

RISC-V interrupts: Krste Asanović

  • Want a standard that is useful in high performance Unix-like systems (fast cores, smart devices), low/mid embedded systems (slow cores, dumb devices), and high-performance realtime systems (can’t waste time on interrupt overhead)
  • Design goals: simplicity, support all kinds of platforms, allow tradeoffs between performance and implementation cost, be flexible to support specialised needs
  • Interrupts are either local or global.
    • Local interrupts are directly connected to one hardware thread (hart) with no arbitration. On RISC-V, there are currently only two of these: software and timer.
    • For global (external) interrupts, they are routed via the memory-mapped Platform-Level Interrupt Controller (PLIC)
  • A new CSR, the Machine Interrupt Pending (mip) register is added. It has separate interrupts for each supported privilege level.
  • User-level interrupt handling is an optional feature. This may be used in secure embedded systems.
  • Software interrupts
    • MSIP (machine software interrupt) can only be written in machine mode via a memory-mapped control register. This is used for inter-hart interrupts. Also have HSIP, SSIP, USIP. A hart can only write its own HSIP, SSIP, USIP.
    • The App/OS/Hypervisor can only perform inter-hart interrupts via ABI/SBI/HBI calls
  • Timer interrupts: MTIP is a single 64-bit real-time hardware timer and comparator in M-mode. You want this because due to frequency scaling etc, just going by cycle count is not useful. HTIP, STIP, UTIP are set up by M-mode software.
  • When running at a given privilege level, all interrupts for lower levels are disabled.
  • All interrupts trap to M-mode by default, and M-mode software can redirect to other privilege levels as necessary. mideleg can be used to automatically delegate interrupts to the next privilege level.
  • Conceptually, when interrupts come in to the PLIC they are handled by the gateway. This abstracts away differences between different interrupt sources. e.g. level-triggered, edge-triggered etc. A new request isn’t forwarded to the PLIC core unless the previous request’s handler has signaled completion.
  • Each interrupt has an ID and priority. These priorities can be fixed or variable. The PLIC stores per-target information
  • An interrupted hart will try to claim an interrupt from the PLIC with a read of the memory-mapped register. It could have been claimed by someone else, and the PLIC core is responsible for ensuring each interrupt is received by only one hart.
  • If you want to add more levels of nested interrupt handling, add more harts to your system.
  • The position of the PLIC in the memory map isn’t defined by the specification because many people will have existing memory maps.
  • Question: would you have multiple PLICs on a multi-core system? Answer: conceptually, there is only one PLIC though it could be implemented in a distributed fashion.
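
To make the claim/complete flow in these notes more concrete, here is a toy Python model. It is a sketch under assumptions rather than anything from the talk or the specification text: per-target enables and thresholds are left out, ID 0 is used to mean "nothing to claim", and ties are broken in favour of the lowest ID.

```python
class Plic:
    """Toy PLIC: a gateway per source plus a core that hands out claims."""

    def __init__(self, priorities):
        # priorities maps interrupt ID (1 and up) to priority, higher wins
        self.priorities = dict(priorities)
        self.pending = set()     # forwarded to the core, not yet claimed
        self.in_service = set()  # claimed, handler has not signalled completion
        self.held = set()        # raised again while an earlier request is in flight

    def raise_irq(self, irq):
        """Gateway: forward a request only if the previous one has completed."""
        if irq in self.pending or irq in self.in_service:
            self.held.add(irq)
        else:
            self.pending.add(irq)

    def claim(self):
        """Model a hart reading the claim register; each ID is handed out once."""
        if not self.pending:
            return 0
        irq = max(self.pending, key=lambda i: (self.priorities[i], -i))
        self.pending.remove(irq)
        self.in_service.add(irq)
        return irq

    def complete(self, irq):
        """Handler signals completion, letting the gateway forward a held request."""
        self.in_service.discard(irq)
        if irq in self.held:
            self.held.remove(irq)
            self.pending.add(irq)

if __name__ == "__main__":
    plic = Plic({1: 1, 2: 5})
    plic.raise_irq(1)
    plic.raise_irq(2)
    print(plic.claim(), plic.claim(), plic.claim())
```

Because `claim` removes the ID from `pending` before returning it, a second hart racing for the same interrupt simply sees the next candidate or 0.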

Formal specification of RISC-V uniprocessor consistency: Arvind

  • Joint project with Adam Chlipala. The slogan is “chips with proofs”. These are multicore chips that satisfy the RISC-V ISA specifications and are capable of booting Linux.
  • Both the design and the proofs must be modular and amenable to modular refinement.
  • Mostly concerned about microarchitecture and memory system correctness.
  • Specs and designs are expressed in Bluespec.
  • See also Kami, a framework for Coq for performing proofs about Bluespec programs.
  • A specification should avoid using concepts such as partially executed instructions or “a store has been performed with respect to…”. Non-determinism is necessary, but unspecified behaviour should be avoided.
  • Semantics are defined in terms of ‘I2E’, the Instantaneous Instruction Execution framework. Simply, an instruction executes instantaneously ensuring the processor state is always up to date. Data moves between processors and memory asynchronously according to some background rules. Memory model-specific buffers are placed between the processor state and memory.
  • WMM is a possible memory model for RISC-V, where both loads and stores can be re-ordered. Conceptually, invalidation buffers are added alongside the store buffer in order to make stale values visible. Whenever a stale value is removed from the invalidation buffer, any values that are older (more stale) must also be removed.
  • Memory issues arise even within a uniprocessor, due to self-modifying code and page table access and the TLB. The fundamental issue is with multiple paths to the same memory.
  • Arvind is concerned that when defining formal semantics, a very weak memory model may become very (too?) complex.

Heterogeneous Multicore RISC-V Processors in FD-SOI Silicon: Thomas Peyret

  • Want to build a large ecosystem around FD-SOI in Europe, including IP and chipset libraries.
  • PULSAR is a RISC-V big.LITTLE-style heterogeneous multicore. Two small cores (Rocket without FPU, 8KB L1 caches) and two big cores (3-way super-scalar BOOM and 32KB L1 caches). It features an AMBA interconnect generated by Synopsys CoreAssembler and has multiple body-bias zones.
  • Currently looking to use it in the context of a pedestrian navigation system.
  • 128-bit link to DDR5 controller, plus 4+4GTX SERDES to a separate FPGA.
  • Also features the AntX processor, which is a very small 32-bit RISC Harvard design from CEA Tech.
  • Used hardware emulation with ZeBu (Synopsys)
  • Also used SESAM for virtual prototyping (based on SystemC/TLM 2.0). This is up to 90% accurate compared to RTL. Have also developed SCale, a new parallel SystemC kernel.
  • Synthesis results show 2.64mm2, 0.6W, 700MHz.
  • Question: will the work be open-sourced? Answer: Don’t know yet.

NVIDIA RISC-V evaluation story: Joe Xie

  • Want to reproduce the existing NVIDIA Falcon CPU with a new ISA
  • Falcon - FAst Logic CONtroller. Introduced over 10 years ago and used in more than 15 different hardware engines today. Low area, secure, flexible. 6 stage pipeline, variable length instructions (proprietary NVIDIA ISA).
  • The next generation for Falcon is needed for higher performance and rich OS support. Old Falcon is 0.67 DMIPS/MHz, 1.4 CoreMark/MHz
  • Options were to buy access to a current architecture (MIPS, ARM, others) or build (move to RISC-V or improve Falcon). Obviously, they elected to move to RISC-V. The fact the ISA is extensible is a key advantage. Want an area of less than 0.1mm2 at 16FF.
  • NV-RISCV is 5 stage in-order issue, out-of-order execution. It has an in-order write buffer. No FPU. Makes use of an MPU with base and bound protection. It will initially be added to the Falcon as a 2nd core to provide easy backwards compatibility.
  • Area for 16FF: Falcon 0.03mm2 vs Rocket 0.055mm2 vs NV-RISC-V 0.05-0.06mm2.
  • Did a lot of cache optimisations to tolerate large latency. Store buffer, write merging, line-fill buffer, victim buffer, stream buffer.
  • Areas of interest include toolchain (for automotive, debug, performance tuning, flexibility, ilp32/ilp64). Also security (crypto instructions and extensions), and adding cache manipulation instructions.
  • Question: why design your own core rather than use an existing one? Answer: after evaluating the options, it made the most sense. The motivation to go to RISC-V was technical as well as influenced by cost.

ISA Shootout – a Comparison of RISC-V, ARM, and x86: Chris Celio

  • Recently released a new tech report The renewed case for the Reduced Instruction Set Computer.
  • The conventional wisdom is that CISC ISAs are more expressive and dense than RISC ISAs, while RISC ISAs map well to high-performance pipelines. Of course, a number of designs have CISC instructions translating to RISC-like micro-ops.
  • Chris’ contention is that a well designed RISC ISA can be very competitive with CISC ISAs. It can be denser and higher performance.
  • ldmiaeq sp!, {r4-r7, PC} is an ARMv7 instruction (load multiple-increment-address) which will write to 7 registers and perform 6 loads. This is a common idiom for stack pop and return from a function call.
  • Goal is to get a baseline to measure the current code generation quality of the RISC-V gcc port. Given a fixed ISA, what can the compiler do to improve performance? What can the programmer do? What can the micro-architect do? A specific non-goal is to lobby for more instructions (CISC or otherwise).
  • Dynamic instruction count can be very misleading due to the possibility it decodes to many micro-ops. Conversely, macro-op fusion may take multiple instructions and fuse them on the fly.
  • Looking at 6 ISAs, using 12 benchmarks from SPECint 2006.
  • Average 16% more instructions for RISC-V vs x86-64, though roughly even in terms of micro-ops. With the compressed instruction set extension, RISC-V wins in terms of instruction bytes fetched on many of the benchmarks. Overall, 28% fewer instruction bytes than ARMv7 and 8% fewer than x86-64.
  • Adding array indices is the most common idiom, so why not add an indexed load instruction to match x86? But with the compressed ISA, a pair of compressed instructions can be treated as an indexed load by the decoder.
  • Proposed macro-op fusion pairs: load effective address, indexed load, clear upper word. These idioms provide 5.4% fewer “effective” instructions for RV64.
  • Fusion isn’t just for superscalar out-of-order cores. Chris believes it should be used by all RISC-V cores. For instance, Rocket (single issue) can be modified to perform this.
  • Better code generation is possible if the compiler knows fusion is available.
  • RISC can be denser, faster, and stay simple!
  • Question: will compressed become standard? Answer: it may become part of the de-facto standard or even Linux ABI standard. Still more to be done to fully understand the complexity for processor implementations.
  • Question: how does macro-op fusion interact with things like faults and precise exceptions? Answer: it does add extra complexity. One solution is if you get a fault, then re-fetch and execute without fusion.
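
As a rough sketch of how “effective” instruction counts could be measured, the Python below walks an instruction stream and fuses adjacent pairs that match the three proposed idioms. The exact pair patterns (for example `slli` followed by `add` for load effective address) and the `Instr` representation are my assumptions for illustration, not taken from the talk slides.

```python
from typing import NamedTuple, Optional

class Instr(NamedTuple):
    op: str
    rd: str
    rs1: str
    rs2: Optional[str] = None  # second source register, if any
    imm: Optional[int] = None  # immediate or offset, if any

def fusible(first, second):
    """Return the name of the fused macro-op for an adjacent pair, or None."""
    # The second instruction must consume and overwrite the first one's result.
    if second.rd != first.rd or second.rs1 != first.rd:
        return None
    if first.op == "slli" and second.op == "add" and first.imm in (1, 2, 3):
        return "load effective address"
    if first.op == "add" and second.op == "ld" and second.imm == 0:
        return "indexed load"
    if (first.op == "slli" and second.op == "srli"
            and first.imm == 32 and second.imm == 32):
        return "clear upper word"
    return None

def effective_count(instrs):
    """Count instructions, treating each fusible adjacent pair as one."""
    count = 0
    i = 0
    while i < len(instrs):
        if i + 1 < len(instrs) and fusible(instrs[i], instrs[i + 1]):
            i += 2
        else:
            i += 1
        count += 1
    return count

if __name__ == "__main__":
    program = [
        Instr("slli", "a0", "a1", imm=2),
        Instr("add", "a0", "a0", rs2="a2"),
        Instr("ld", "a3", "a0", imm=0),
    ]
    print(len(program), "instructions,", effective_count(program), "effective")
```

Requiring the second instruction to overwrite the first one's destination is what keeps the fused pair architecturally equivalent to the original two instructions, since no intermediate value needs to survive.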

Trace debugging in lowRISC: Wei Song

  • Watch this blog for much more on our trace debug work very soon.

RISC-V I/O Scale Out Architecture for Distributed Data Analytics: Mohammad Akhter

  • For analytics, need a deep net with many nodes. This demands balanced low-latency computing I/O, memory, and storage processing.
  • Wireless network evolution is driven by real-time data with better QoS. Very rapid growth rate in bandwidth and reduction in round-trip time latency for LTE, LTE-A, 5G, …
  • Built a deep learning micro-cluster. Uses RapidIO, NVIDIA GPUs. No RISC-V though. They’ve then looked at how this might look with RISC-V cores instead.
  • Want to support AXI and RapidIO.
  • Produced a hardware simulation model with TileLink packet generators producing data that is transferred over a RapidIO transport.
  • A RISC-V CPU generator model with port for RapidIO available (where?)

Coherent storage. The brave new world of non-volatile main memory: Dejan Vucinić

  • There are two emerging resistive non-volatile memories: ReRAM and PCM. Read latency is orders of magnitude lower than NAND, somewhere between that of DRAM and NAND.
  • Should non-volatile memories be treated like memory, or like storage?
  • For now, it seems to make sense to have the digital logic for the NVM controller off-chip (including coherence state).
  • Wear levelling, data protection at rest further motivate the controller being placed along with non-volatile media.
  • One potential approach is a coherent storage controller in reconfigurable logic.
  • RISC-V shopping list: Hardware coherence, fast+wide ports for peripherals to join the coherence domain, relinquish the non-volatile memory controller for now, and get used to high variability in main memory response time.

RISC-V as a basis for ASIP design: Drake Smith

  • “Every design is different, so why is every embedded processor the same?”
  • Using RISC-V as a basis for ASIP can avoid many concerns. SecureRF produces quantum-resistant security for low resource devices using group theoretic cryptography.
  • Using the Microsemi Smartfusion2+ board as the test platform.
  • With a software-only port, their WalnutDSA was 63x faster than Micro-ECC.
  • Adding a custom instruction to accelerate it only added 2% area on the FPGA and gave another 3x increase in speed.

An update on building the RISC-V software ecosystem: Arun Thomas

  • 2016 wishlist: Upstream GNU toolchain, Clang/LLVM and QEMU. Also Linux kernel, Yocto, Gentoo, and BSD. Plus Debian/RISC-V port.
  • Now people are getting ready to send patches for review for toolchains and QEMU.
  • FreeBSD 11 will officially support RISC-V. For the Debian/RISC-V port, see Manuel’s talk tomorrow.
  • Arun argues the Foundation should fund developers to build core software infrastructure. Additionally, we should also decide on a process for proposing ISA enhancements.
  • What might funded developers do? Upstreaming and maintainership, porting software, performance optimisation/analysis, enhancing test suites and methodologies, continuous integration and release management.
  • How should proposals be handled? Various groups have approaches for this already. e.g. Rust, Python, IEEE, IETF. Arun has put together a straw-man proposal on specification development.
  • Arun would like to see the next iteration for the privileged spec to go through a comment period.

ORCA-LVE, embedded RISC-V with lightweight vector extensions: Guy Lemieux

  • Using 900LUTs for a speedup of 12x. Proposed and added a standardised vector engine to their processor.
  • Smallest version of the ORCA implementation can fit in 2k LUTs on the Lattice iCE40 and runs at about 20MHz.
  • Their approach for lightweight vector extensions is to add a dedicated vector data scratchpad and to re-use the RISC-V ALU.
  • Vector operands are just RISC-V scalar registers containing pointers into the vector scratchpad.
  • To encode vector operations, they use two 32-bit instruction bundles.
  • To allocate vector data, just use an alternative malloc function. Intrinsics are available to manipulate vectors.
  • In the future, want to add 2D and 3D operations as well as subword SIMD.
  • Why not use the proposed RISC-V vector extensions? Because the detailed proposal isn’t yet released, and LVE intends to be more lightweight and lower overhead.
  • Question: can these instructions raise exceptions? Answer: that hasn’t been properly defined yet.
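
The idea that vector operands are just pointers into a dedicated scratchpad can be illustrated with a toy Python model. Everything here is hypothetical (the `Scratchpad` class, its bump allocator, and the `vadd` method are not ORCA-LVE APIs); it only shows the addressing style described in the notes.

```python
class Scratchpad:
    """Toy vector scratchpad addressed by plain integer pointers."""

    def __init__(self, words):
        self.mem = [0] * words
        self.next_free = 0

    def alloc(self, n):
        """Stand-in for the alternative malloc: returns a scratchpad pointer."""
        if self.next_free + n > len(self.mem):
            raise MemoryError("scratchpad full")
        ptr = self.next_free
        self.next_free += n
        return ptr

    def vadd(self, dst, src_a, src_b, n):
        """Element-wise add; all three operands are scratchpad pointers."""
        for i in range(n):
            self.mem[dst + i] = self.mem[src_a + i] + self.mem[src_b + i]

if __name__ == "__main__":
    sp = Scratchpad(16)
    a, b, c = sp.alloc(4), sp.alloc(4), sp.alloc(4)
    sp.mem[a:a + 4] = [1, 2, 3, 4]
    sp.mem[b:b + 4] = [10, 20, 30, 40]
    sp.vadd(c, a, b, 4)
    print(sp.mem[c:c + 4])
```

In this style the scalar register file only ever holds addresses and lengths, which is what lets the design avoid a separate vector register file.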

FPGArduino. A cross-platform RISC-V IDE for the masses: Marko Zec

  • The main attraction of the Arduino IDE is simplicity and quick results
  • Provide pre-compiled toolchains for OSX, Windows, and Linux. For C libraries, took mainly from FreeBSD.
  • boards.txt defines IDE menu entries and options. Also support pre-built FPGA bitstreams and support for upload from IDE.
  • Have produced f32c, a retargetable scalar RISC-V core written (mostly) in VHDL.

SiFive’s RISC-V computer: Jack Kang

  • SiFive is a fabless semiconductor company building customisable SoCs
  • They produce a free and open platform spec for their platforms
  • This week announced “Freedom Unleashed” (Linux application cores, high speed peripherals), and “Freedom Everywhere” (targeted at embedded and IoT).
  • The Freedom Unleashed demo will be shown today, running on an FPGA connected to PCIe.
  • Question: why 180nm for the Freedom Everywhere, isn’t it rather old now? Answer: it is low cost and fast time to market so will make sense for some.
  • Question: will peripherals etc be open sourced? Answer: things we do ourselves e.g. SPI will be.

MIT’s RISCy expedition: Andy Wright

  • Build proofs from small components, build them up to complete, real processors.
  • They are now releasing their work, the Riscy processor library, Riscy BSV utility library, and reference processor implementations. Currently multi-cycle and in-order pipelined. Soon, out-of-order execution.
  • Have infrastructure for tandem verification.
  • How is modular design possible? RTL modules are not modularly refinable under composition, i.e. implementation details of one module may put additional constraints on another. But BSV language features do support composability.
  • The processor design flow involves taking the Riscy blocks, forming the initial connections, performing modular refinement, and then scheduling optimisation to reduce overheads due to BSV scheduling logic.
  • Connectal implements the connections from FPGA to a host computer through PCIe. This also works on Zynq FPGAs with an AXI transport.
  • Tandem verification: run the same program on two RISC-V implementations at once. Generated verification packets at commit stage, use non-deterministic information from the implementation under test for synchronisation, and then compare the results.
  • Check out the code on GitHub.
  • Planned work involves formal specifications, proofs for modules, and proof for processors.
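
The comparison step of tandem verification can be sketched in a few lines of Python. This is a simplified illustration, not the Riscy infrastructure: the `CommitPacket` fields are assumptions, and the synchronisation on non-deterministic events mentioned above is left out entirely.

```python
from itertools import zip_longest
from typing import NamedTuple, Optional

class CommitPacket(NamedTuple):
    pc: int
    rd: Optional[int]     # destination register index, if any
    value: Optional[int]  # value written back, if any

def first_divergence(dut_packets, ref_packets):
    """Return (index, dut, ref) for the first mismatching commit, else None.

    A missing packet on either side (one trace ending early) also counts
    as a divergence and shows up as None in the returned tuple.
    """
    pairs = zip_longest(dut_packets, ref_packets)
    for index, (dut, ref) in enumerate(pairs):
        if dut != ref:
            return index, dut, ref
    return None

if __name__ == "__main__":
    reference = [CommitPacket(0x80000000, 1, 5), CommitPacket(0x80000004, 2, 7)]
    under_test = [CommitPacket(0x80000000, 1, 5), CommitPacket(0x80000004, 2, 9)]
    print(first_divergence(under_test, reference))
```

Reporting the first divergence rather than all of them is deliberate: once architectural state differs, every later mismatch is usually a consequence of the first.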

A software-programmable FPGA IoT platform: Andrew Canis

  • Lattice’s vision for an FPGA IoT platform is that it has high ease of use (use C/C++ as design entry), and flexibility for a range of sensors, actuators, communication devices.
  • A hybrid computing solution: the RISC-V processor with FPGA hardware. RISC-V processor plus LegUp-generated hardware accelerators to handle the processing part of the IoT platform.
  • The Lattice RISC-V processor has a 4 stage pipeline, and can be configured for RV32I, RV32IM, and RV32IC. It compares favourably to the LM32, e.g. RV32IC takes 1.6K LUTs vs 2K LUTs for the LM32 while also achieving higher DMIPS and code density.
  • LegUp is a high level synthesis tool.
  • For a sum of squares of speech samples example, the LegUp synthesized accelerator gives a 5.4x speedup vs the RISC-V software implementation.
  • LegUp has plans to support LegUp-synthesized custom instruction implementations

Apache mynewt: James Pace

  • Mynewt is an open source OS for constrained IoT. Supports ARM, AVR, MIPS (and now RISC-V?).
  • Apache Mynewt is “Linux” for devices that cannot run Linux.
  • It is a community effort, run through the Apache Software Foundation. Currently ~280k lines of code.
  • Plans for Bluetooth 5 support in the future, deployments for industrial wireless sensor networks.
  • The Mynewt kernel is a pre-emptive, multi-tasking RTOS with a tickless kernel.
  • Question: does Mynewt support SMP? Answer: not currently.

DSP ISA extensions for an open-source RISC-V implementation (PULP): Pasquale Davide Schiavone

  • RI5CYv2 is an evolution of their RISC-V implementation. It is an RV32IMC implementation with some PULP-specific ISA extensions to target energy efficiency.
  • Includes support for profiling and core execution trace.
  • CoreMark/MHz is competitive with the ARM Cortex M4.
  • Hardware loop instructions benefit control-intensive applications
  • Add DSP extensions to improve energy efficiency for signal processing algorithms. Want to execute more quickly so the core can enter a low-power mode.
  • RI5CYv2 adds dot product between vectors, saturation instructions, small vector instructions, … GCC support is present for these.
  • These additional instructions give a performance increase of up to 9.5x, 6.4x on average for data-intensive kernels.
  • The fan-out 4 of the critical path is 31. When laying out at 65nm, area is 67 kilo-gate equivalents.
  • Released so far just 10% of what they will release in the future, so there’s much more to come. The full PULP will be released in December, and in the meantime you can use the PULPino core.

The DOVER Edge: A Metadata-Enhanced RISC-V Architecture: André DeHon

  • How do we handle the ‘edge’ of a metadata tagged system? e.g. I/O to the untagged world, legacy devices, DMA.
  • PUMP is a metadata processing engine that checks tags upon every instruction and memory access.
  • For slave devices, tags can be associated with memory mapped devices. These are used to write rules to control access. This allows giving configuration control to particular drivers, without giving the driver control to all devices or other privileges.
  • DMA I/O policies might target: containment (who’s allowed to read/write a buffer), integrity (mark incoming data as untrusted), secrecy, and data presence/synchronisation.
  • Add new supported opcodes as input to the PUMP representing DMA load and DMA store. Modify PC tag and Instr tag to represent the state of the DMA and the DMA source.
  • If a DMA is deemed to be misbehaving, it can be totally disabled by the PUMP or the particular operation could be discarded.
  • In this design, there is both an IO pump and a processor PUMP. The IO pump is pipelined so it will not reduce system throughput.
  • The IO pump generates an interrupt on a rule miss. The miss handler uses the same rule function as for the processor PUMP.
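
A toy Python model may help show how a rule cache with a software miss handler fits together, and how the IO pump and processor PUMP can share one rule function. The tag encoding, the `dma_load`/`dma_store` opcode names, and the containment rule are all assumptions invented for this sketch rather than details from the talk.

```python
class Pump:
    """Toy rule cache: maps (opcode, pc_tag, instr_tag, mem_tag) to a decision."""

    def __init__(self, policy):
        self.policy = policy  # software rule function run by the miss handler
        self.rules = {}       # cached decisions
        self.misses = 0

    def check(self, opcode, pc_tag, instr_tag, mem_tag):
        key = (opcode, pc_tag, instr_tag, mem_tag)
        if key not in self.rules:
            # Rule miss: in hardware this would raise an interrupt or trap.
            self.misses += 1
            self.rules[key] = self.policy(*key)
        return self.rules[key]

def containment_policy(opcode, pc_tag, instr_tag, mem_tag):
    """Hypothetical rule: a DMA source may only touch buffers tagged for it.

    Here instr_tag carries the DMA source and pc_tag the DMA state.
    """
    if opcode in ("dma_load", "dma_store"):
        return mem_tag == ("buffer", instr_tag)
    return True

if __name__ == "__main__":
    io_pump = Pump(containment_policy)
    processor_pump = Pump(containment_policy)  # same rule function, separate cache
    print(io_pump.check("dma_store", "active", "nic", ("buffer", "nic")))
    print(io_pump.check("dma_store", "active", "nic", ("buffer", "disk")))
    print(io_pump.misses)
```

Only the first occurrence of each tag combination pays the cost of the miss handler, which is why a pipelined IO pump need not reduce steady-state throughput.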

Improving the performance-per-area factor of RISC-V based multi-core systems: Tobias Strauch

  • The speaker has spent many years working on C-slow retiming
  • System hyper pipelining is based on C-slow retiming. It replaces original registers with memories, and adds thread stalling and bypassing features.
  • In ‘deep pipelining’, run one thread in ‘beast mode’. Switch to another thread if an instruction dependency is detected.
  • Created the microRISC project, working on the V-scale design. With SHP was able to move from 80MHz to 250MHz.
  • miniRISC (based on lowRISC). Want to perform SHP on the Rocket core. The speaker proposes that instead of having multiple minions you have a hyper-pipelined core.
  • The source code of the projects will be released in PDVL, a new language “way better than Chisel and Bluespec Verilog(!)” that produces VHDL and Verilog.

Working towards a Debian RISC-V port: Manuel A. Fernandez Montecelo

  • Debian is a community of volunteers who care about free and open-source software.
  • Debian contains more than 22k source packages
  • Debian contains a mix of officially supported ports, unofficial releases (on Debian infrastructure but not part of the stable release process), and others that are outside of Debian infrastructure (e.g. Raspbian).
  • Why a Debian port for RISC-V? Interested as Manuel feels affinity with the goals of the project, previously enjoyed working with the OpenRISC port.
  • Goal is to have a complete, fully supported, continuously updated Debian port. The initial step is to bootstrap a viable, basic OS disk image.
  • The chosen RISC-V target is 64-bit little endian. This is the recommended default and what is planned for the lowRISC board.
  • Been working on and off since November 2014. Upstreaming of toolchains etc would be very helpful. Have now built 300-400 “essential” packages.
  • Packages were mostly cross-compiled, with some compiled ‘natively’ inside emulators. Some require building multiple times to e.g. break circular dependencies.
  • ABI changes mean work has to restart from scratch.

Kami. A framework for hardware verification: Murali Vijayaraghavan

  • This work is part of the “Riscy Expedition” by MIT. Want to build chips with proofs.
  • Must be able to verify an optimisation is correct independent of contexts, to enable modular verification of a full system.
  • Kami is a DSL inside the Coq proof assistant for verifying Bluespec-style hardware.
  • Have finished building required theory and proof automation infrastructure.
  • Are currently working on proving that a cluster of multi-cycle cores connected to a coherent cache hierarchy implements sequential consistency.

lowRISC / IMC internship week one - VGA output

Beginning on Monday, June 27th, we had a team of four University of Cambridge undergrads begin a 10 week internship working on the lowRISC project at the Computer Laboratory, kindly sponsored by IMC Financial Markets (who are also helping to advise this project). The team will be blogging regularly over the course of the summer - I’ll pass over to them to introduce themselves.

After some initial brainstorming, we decided to aim to extend the current lowRISC SoC design to enable video output, with the final goal of playing video smoothly at a resolution of 640x480 on FPGA. The photo below shows the four of us (left to right: Gary Guo, Profir-Petru Pârțachi, Alistair Fisher, Nathanael Davison).

2016 lowRISC/IMC internship team

The final goal has been decomposed into several milestones: adding VGA functionality to lowRISC, adding an in-memory framebuffer, implementing a video codec for RISC-V, and designing and creating a 2D accelerator to speed up video decoding. Our plan for the augmented SoC architecture is shown in the diagram below:


In our first week, we’ve succeeded in adding VGA output to lowRISC; a demonstration of this is shown in the video below. The demo shows lowRISC instantiated on a Nexys4 DDR board (Artix-7) displaying a static image that has been loaded into its BRAM. This image is read from SD card by a bare-metal program on the RISC-V application core, which then loads it into the memory-mapped BRAM we hooked up to the AXI-Lite bus. The on-chip BRAM is obviously a very limited resource, so our next step is to use the board’s DRAM to hold the framebuffer and make use of the BRAM for a line-buffer.
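
To put rough numbers on the difference between holding a whole frame and holding a single line at 640x480, here is the arithmetic as a short Python snippet. The pixel formats are assumptions for illustration only, since the colour depth is not stated in this post.

```python
WIDTH, HEIGHT = 640, 480

def frame_bytes(width, height, bytes_per_pixel):
    """Storage needed for one full frame."""
    return width * height * bytes_per_pixel

def line_bytes(width, bytes_per_pixel):
    """Storage needed for one scan line, as held in a line buffer."""
    return width * bytes_per_pixel

if __name__ == "__main__":
    for bpp in (1, 2, 4):  # assumed pixel sizes in bytes
        frame_kib = frame_bytes(WIDTH, HEIGHT, bpp) / 1024
        print(f"{bpp} B/pixel: frame {frame_kib:.0f} KiB, "
              f"line {line_bytes(WIDTH, bpp)} B")
```

At an assumed 2 bytes per pixel, a frame needs 600 KiB while a single line needs only 1,280 bytes, so the line buffer is smaller by a factor equal to the number of lines (480).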

We aim to publish something every week, either in the form of a blog post like this or as a more detailed guide showing how to repeat our work. Next week we’ll share a guide on how to enable VGA in lowRISC. By the end of the summer, as well as a working technical demo, we will also have produced detailed documentation on the whole process of adding a customised accelerator to lowRISC.

Announcing the LibreCores design contest and ORConf 2016

Our friends and collaborators at the Free and Open Source Silicon Foundation have launched the LibreCores design contest. This is a student design contest which aims to recognise and reward contributions to the open source hardware ecosystem. The main evaluation criteria are:

  • Openness. Your work must be published under an established Open Source license.
  • Reusability. How easily can your work be used and modified by someone else? Is it well documented? Do you plan to continue to work on your project and help others to get started?
  • Usefulness. Is your work filling a much-needed gap in the world of Open Source hardware design? Something that was not there before?

See the design contest site for full details. The deadline for submission is August 31st, 2016. Entrants will have the opportunity to present their work at ORConf, with some travel funding available thanks to the sponsors.

This neatly leads me to the next activity I wanted to highlight in this post. Again, thanks to the hard work of our friends at FOSSi, registration is now open for ORConf 2016. To quote that page: “ORCONF is an open source digital design and embedded systems conference, covering areas of electronics from the transistor level up to Linux user space and beyond. Expect presentations and discussion on free and open source IP projects, implementations on FPGA and in silicon, verification, EDA tools, licensing and embedded software”.

Last year’s ORConf was the biggest and most enthusiastic event focused on open source digital design I’ve ever been to, and saw the birth of the FOSSi Foundation. Please do register to attend and submit talks. With the growing interest in open source hardware and open source digital logic design, I expect it will be even bigger and better than last year. ORConf 2016 will be held at the University of Bologna in Italy between October 7th and October 9th. Register now.

Alex Bradbury

lowRISC's 2016 Google Summer of Code Students

The 2016 Google Summer of Code is now underway and we’re delighted to be working with five students, covering a variety of interesting projects. They have all introduced themselves over the past few weeks on our project mailing list. Many thanks to everyone who applied, to the mentors who volunteered, and to Google for sponsoring this programme. If your application was unsuccessful, I hope you’ll try again next year.

The projects for lowRISC in the 2016 GSoC are:

  • Porting the Arduino library to RISC-V (PULPino). Mahmoud Elmohr, mentored by Andreas Traber
  • Implementing an open-source DDRx controller. Bittu N, mentored by Wei Song
  • Porting Musl libc to RISC-V. Masanori Ogino, mentored by Rich Felker. See Masanori’s first status update
  • Porting the OP-TEE Trusted Execution Environment to the lowRISC platform. Rahul S Mahadev, mentored by Stefan Wallentowitz. See Rahul’s blog post on starting this project.
  • Porting the xv6 teaching operating system to the lowRISC platform. Jeff Rogers, mentored by Alex Bradbury

Apply now to work with lowRISC in Google Summer of Code

We are very grateful to have been selected to take part as a mentoring organisation in the Google Summer of Code for the second year running. As with last year, we’re working with a number of friends from across the wider open source hardware community to act as an umbrella for a range of hardware-related projects. If you are a student who would like to be paid to work on open source during the summer, then take a look at the lowRISC ideas list and apply. As was pointed out on the Google Open Source Program’s blog, there is a good showing from hardware-related projects in GSoC this year. The deadline for applications is this coming Friday, 25th March at 7pm GMT.

We welcome ideas of your own creation, but the ideas we’ve suggested this year include:

  • A trace debug analysis tool (ideally using TypeScript and Electron)
  • Improving device-tree support for the Linux RISC-V port
  • Various ideas related to the Yosys open-source synthesis tool.
  • Porting a teaching OS such as xv6 or XINU to the lowRISC platform
  • Porting CMSIS-DSP to PULPino
  • Doom on PULPino
  • Porting the Arduino libraries to PULPino
  • Integrating additional open-source IP for lowRISC on FPGA
  • Implementing a Trusted Execution Environment
  • Porting musl libc to RISC-V
  • A Generic hardware/software interface for software-defined radio
  • A SPIR-V frontend for Nyuzi
  • Porting an OS kernel to Nyuzi

Third RISC-V Workshop: Day Two

Today is the second day of the third RISC-V workshop. Again, I’ll be keeping a semi-live blog of talks and announcements throughout the day. See here for notes from the first day.

RISC-V ASIC and FPGA implementations: Richard Herveille

  • Look for freedom of design. Want to freely migrate between FPGAs, structured ASICs, standard cell ASICs.
  • Want to make it easier to migrate FPGAs to ASICs for advantages in price, performance, power, IP protection.
  • Roa Logic’s RV32I/64 implementations are called RV11 and RV22. RV11 is in-order, single-issue, single thread. RV22 is in-order, dual-issue and dual thread.
  • Implement a ‘folded’ optimizing 5-stage pipeline, where some classic RISC stages are folded together for performance improvement. e.g. the instruction decode stage decides if the instruction sequence can be optimized.
  • Ported a debug unit for or1k from OpenCores
  • Mostly target the eASIC nextreme platform. Start with an existing FPGA design, then transfer.
  • Achieved Fmax of 649MHz (32-bit core) on a nextreme-3, vs 114MHz on the customer’s current CYCLONE-V. Achieved a 70% power reduction.
  • Next steps are to improve resource utilization, increase offering of extensions, and add multi-threading and multi-issue.

lowRISC: plans for RISC-V in 2016: Alex Bradbury

  • Find my slides here. Apologies for not live-blogging my own talk.

A 32-bit 100MHz RISC-V microcontroller with 10-bit SAR ADC in 130nm GP: Elkim Roa

  • Current goal: a low-footprint RISC-V microcontroller like EFM32 or SAMD11 with USB low-speed PHY on-chip.
  • Looked at picorv32 and vscale, but ultimately wrote their own implementation. Also adding in a 10-bit SAR ADC, DAC, PLL, GPIO.
  • Implement RV32IM using a 3-stage pipeline. IRQ handling is adapted from the picorv32 timer.
  • Provides AXI-4 lite and an APB bridge.
  • SAR ADC intends to be fully synthesizable, 10-bit 10MHz.
  • The chip was taped out in October 2015 on 130nm TSMC GP. The core+interfaces area was 800um x 480um.
  • Undertook a large effort on verification, implemented verification testbenches for AXI-4 and APB peripheral functionality. Would like to partner to get access to proven VIP.
  • Future work to be done on a USB PHY low-speed interface, DMA channels, watchdog timer, eNVM 1-poly ROM
  • Question: where and when can I download it? Soon! Still cleaning it up.

SoC for satellite navigation unit based on the RISC-V single-core Rocket chip: Sergei Khabarov

  • Currently have an RF-mezzanine card for FPGA prototypes, and silicon-verified GNSS IP and ASIC development board with a LEON3 CPU inside.
  • On the software, have universal receive firmware and plug-and-play support for different targets. Plus a host application for data analysis. See gnss-snsor.com.
  • Now transitioning from the previous 180nm ASIC to a new 90nm chip with a RISC-V core. The target frequency is 300MHz.
  • The new SoC design aims to take the best ideas of the GPL-licensed grlib library (Gaisler Research) and will be written in VHDL.
  • Current code can be found here.
  • Plug and play approach taken from grlib to help quickly assemble a complex SoC design. Device ID, vendor ID, address and interrupt configuration, cacheability, etc. are routed in sideband signals accessible via a dedicated slave device on a system bus.
  • Some memory access optimizations have been implemented to allow access to AXI peripherals in one clock cycle.
  • Implemented (or implementing?) the multi-core debug protocol, potentially supported by Trace32 (Lauterbach).

RISC-V photonic processor: Chen Sun

  • Process scaling has helped massively for data transfer within a chip, but we’ve had little improvement for moving data off-chip.
  • The I/O wall involves being both power and pin limited. Silicon photonics may help overcome this.
  • Started by trying to provide DRAM connected by photonics (as part of the DARPA POEM project).
  • What about the foundry? Do it without a foundry. How to connect electronics and photonics? Put them on the same chip. Where are you going to get a processor? RISC-V!
  • They produced it and it was published in Nature last month. Dual-core 1.65GHz RISC-V. Manufactured in a commercial SOI process.
  • Build a waveguide with planar silicon processing. Silicon is the high-index core. Oxides form the low-index cladding.
  • Transmitter is driven by a CMOS logic inverter. 5Gb/s data rate at 30fJ/b.
  • An issue with ring resonators is the massive variation with process and temperature. Need to stabilise this somehow. Add thermal tuning.
  • To demonstrate the optical memory system, had a second chip emulating a DRAM controller.
  • We proposed an architecture, built it, and got performance competitive to our predictions!

Untethering the RISC-V Rocket Chip: Wei Song

  • Rocket is an open-source SoC generator from Berkeley. The base Rocket core is a 5-stage single-issue in-order processor.
  • Previously a host-target interface was connected to the L2 bus, which communicated with an ARM core on the Zynq to provide peripherals.
  • The untethered Rocket chip adds a separate I/O bus. Currently uses Xilinx IP for peripherals. First stage bootloader is on FPGA block RAM, second stage bootloader is loaded from SD. I/O read and write are totally uncached.
  • The top-level (including I/O devices) is all in System Verilog. There is a separate ‘Chisel island’ containing the Rocket interface.
  • The I/O and memory maps can be configured by CSRs.
  • The first-stage bootloader copies the second stage to DRAM, performing an uncached copy. It then remaps the DRAM to memory address 0, resets Rocket and starts the second stage. The second stage uses a version of the Berkeley bootloader. It starts multi-core, VM support, then loads and boots RISC-V Linux in a virtual address space.
  • Currently the second stage bootloader stays resident to handle HTIF requests.
  • Our code release contains a very detailed tutorial. Key features include support for the Nexys4 as well as the more expensive KC705. You can also use Verilator for simulation and a free WebPACK Vivado license.
  • The code release includes a rewritten TileLink/NASTI interface, DDR2/3 controller, SD, UART.
  • Future work: re-integrate tagged memory, remove HTIF from Linux kernel (help wanted), interrupt controller, trace debugger (Stefan Wallentowitz), run-control debug (SiFive), platform spec.

MIT’s RISCY expedition: Andy Wright

  • MIT implemented an IMAFD 64-bit RISC-V processor in Bluespec System Verilog. It supports machine, supervisor, and user modes, boots Linux, and has been tandem-verified with Spike.
  • Want to work on formal specification of the ISA, formally verified processor implementations, memory consistency models, accelerators, microarchitecture exploration, VLSI implementations using a standard ASIC flow.
  • Philosophy: get a working processor first, figure out why it’s slow, and make it faster without breaking it.
  • Moving to work on formal verification, which requires a formal specification for RISC-V. Lots of questions to be answered, e.g. whether referenced bits in page table entries should be set for speculatively accessed pages.
  • A single instruction can result in up to 13 effective memory accesses - how do these interact with each other, and how do they influence the memory model?
  • Want simple operational definitions where legal behaviours can be observable on a simple abstract machine consisting of cores and a shared monolithic memory.
  • Looked at defining WMM, a new easy-to-specify weak consistency model.
  • Propose there should be a new instruction similar to sfence.vm, but going in the opposite direction.
  • See also their website.
  • Question: will it be open source? Concerned currently because some aspects of the design are used as challenges for student projects and releasing it could compromise the projects.

Pydgin for RISC-V, a fast and productive instruction-set simulator: Berkin Ilbeyi

  • Simple interpreted instruction set simulators get 1-10MIPS of performance. Typical dynamic binary translation may achieve 100s of MIPS, with QEMU achieving up to 1000 MIPS.
  • Another aspect worthy of consideration is productivity when working with the simulator, for instance when looking to extend it to explore new hardware features. Can you achieve high developer productivity and runtime performance?
  • Observe there are similar productivity-performance challenges for building high performance language runtimes as for simulators. e.g. the PyPy project.
  • Pydgin uses PyPy’s RPython framework.
  • Pydgin describes its own architectural description language (really a Python DSL).
  • Pydgin running on a standard Python interpreter gives ~100KIPS, but going through RPython gives 10MIPS. When targeting the RPython JIT, adding extra JIT hints achieved up to a 23x performance improvement over no annotations.
  • Performs 2-3x better than Spike for many workloads. Spike caches decoded instructions and uses PC-indexed dispatch to improve performance.
  • Achieved a 100MIPS+ RISC-V port after 9 days of development.
  • Pydgin is used in the Cornell research group to gain statistics for software-defined regions, experimentations with data-structure specialisation, control and memory divergence for SIMD, and more.
  • Pydgin is online at Github.
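
As a rough illustration of the “simple interpreted simulator” end of the spectrum, and of the decoded-instruction caching with PC-indexed dispatch that the notes attribute to Spike, here is a toy sketch. It is not code from Pydgin or Spike: the function names are made up, and only two RV32I instructions (ADDI and ADD) are modelled.

```python
# Toy interpreted instruction-set simulator with a PC-indexed decode cache.
# ASSUMPTION: this is an illustrative sketch, not code from Pydgin or Spike.
# Only two RV32I instructions (ADDI and ADD) are modelled.

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def decode(instr):
    """Decode a 32-bit instruction word into (name, rd, rs1, operand)."""
    opcode = instr & 0x7F
    rd = (instr >> 7) & 0x1F
    funct3 = (instr >> 12) & 0x7
    rs1 = (instr >> 15) & 0x1F
    if opcode == 0x13 and funct3 == 0:                         # ADDI
        return ("addi", rd, rs1, sign_extend(instr >> 20, 12))
    if opcode == 0x33 and funct3 == 0 and (instr >> 25) == 0:  # ADD
        return ("add", rd, rs1, (instr >> 20) & 0x1F)
    raise ValueError(f"unsupported instruction {instr:#010x}")


def run(program, steps):
    """Execute `steps` instructions; program maps PC -> instruction word."""
    regs = [0] * 32
    decode_cache = {}          # PC -> decoded tuple, filled on first visit
    pc = 0
    for _ in range(steps):
        decoded = decode_cache.get(pc)
        if decoded is None:
            decoded = decode_cache[pc] = decode(program[pc])
        name, rd, rs1, operand = decoded
        if name == "addi":
            result = regs[rs1] + operand
        else:  # "add": operand is the rs2 register index
            result = regs[rs1] + regs[operand]
        if rd != 0:            # x0 is hard-wired to zero
            regs[rd] = result & 0xFFFFFFFF
        pc += 4
    return regs


if __name__ == "__main__":
    program = {
        0: 0x00500093,  # addi x1, x0, 5
        4: 0x00700113,  # addi x2, x0, 7
        8: 0x002081B3,  # add  x3, x1, x2
    }
    print(run(program, 3)[3])  # prints 12
```

In this straight-line example each cache entry is used only once; the cache would only pay off once branches were modelled and loops revisited the same PC.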

ORCA, FPGA-optimized RISC-V soft processors: Guy Lemieux

  • ORCA is completely open-source. See it at Github.
  • Initially targeted the Lattice iCE40 (3.5kLUTs, under $5 in low quantities).
  • RV32M implemented in 2kLUTs at around 20MHz on the iCE40.
  • ORCA is highly parameterized, ideally suitable for FPGAs, portable across FPGA vendors, and BSD-licensed.
  • Achieved 244MHz, 212MIPS on an Altera Stratix-V. Lots of room for further improvements.
  • Clock speed is close to matching the picorv32 clock speed, but with higher DMIPS/MHz.
  • Found 64-bit vs 32-bit counters added a lot of area.
  • A good FPGA implementation often leads to a good ASIC implementation, but a good ASIC implementation often leads to a poor FPGA implementation.
  • Use dual-ported block RAMs on the FPGA for the register file.
  • Observe that reduced register count in RV32E makes no difference for FPGAs. Divide is very expensive.
  • Beware when writing software, a shift could be as slow as 1b/cycle.
  • The privileged architecture spec contains too many CSRs and the 64-bit counters are too big, putting pressure on multiplexers. For FPGAs, perhaps define small/med/full versions.

PULPino, a small single-core RISC-V SoC: Andreas Traber

  • PULP and PULPino developed at ETH Zurich and University of Bologna with many partners.
  • Develop an ultra low power processor for computing IoT. Exploit parallelism using multiple small simple cores organised in clusters.
  • Share memory within the cluster.
  • Support near-threshold operation for very low power.
  • PULP has been taped out over a dozen times across multiple process technologies, down to 28nm.
  • PULP has a large number of IPs. To start with, open source PULPino as a starting point.
  • PULPino is a microcontroller-style platform. No caches, no memory hierarchy, no DMA. It re-uses IP from the PULP project and will be taped out in 65nm UMC.
  • The boot ROM loads a program from SPI flash.
  • Motivated to switch to RISC-V due to more modern design (no set flags, no delay slot), compressed instructions, easily extensible.
  • Looking to extend RISC-V with non-standard extensions for hardware loops, post-increment load+store, multiply-accumulate, ALU extensions (min, max, absolute value).
  • RI5CY core has full support for RV32I, implements just the mul from RV32M. It has a 4 stage pipeline.
  • Performed a comparison based on published Cortex M4 numbers. RI5CY is a little faster and a little smaller.
  • The RI5CY core itself is just 7% of the area of a PULPino SoC (assuming 32KiB instruction and data RAM).
  • Open source release will follow shortly, including a FreeRTOS port. Using the Solderpad license. Just awaiting final approval from the University (expected by the end of the month).
  • Want to port PULPino to IP-XACT. Also want to add floating point support, branch prediction, and evaluate further non-standard ISA extensions.

The Berkeley Out-of-Order Machine (BOOM): Christopher Celio

  • An out-of-order core. It’s synthesizable, parameterizable, and open-source.
  • Out-of-order is great for tolerating variable latencies, finding instruction-level parallelism, and working with poor compilers or lazily written code.
  • Downsides are it’s more complex and potentially expensive in area and power.
  • Should work as an interesting baseline for micro-architecture research. Also enables research that needs an out-of-order core (e.g. on memory systems, accelerators, VLSI methodologies).
  • BOOM implements IMAFD and the RV64G+ privileged spec.
  • The RISC-V ISA is easy to implement. Relaxed memory model, accrued FP exception flags, no integer side-effects (e.g. condition codes), no cmov or predication, no implicit register specifiers, rs1+rs2+rs3+rd are always in the same space allowing decode and rename to proceed in parallel.
  • As Rocket-chip gets better, so does BOOM.
  • The host-target interface is being removed from rocket-chip to provide an untethered system.
  • BOOM has a unified physical register file (floating point and integer).
  • Masses of parameters can be tweaked.
  • A 2-wide BOOM with 16KiB L1 caches is 1.2mm² in TSMC 45nm and can clock at 1.5GHz.
  • Currently designed for single-cycle SRAM access as the critical path.
  • Planning on a tapeout later this year.
  • Achieve 50MHz on an FPGA where the bottleneck is the FPGA tools can’t register-rename the FPU.
  • BOOM is 9kloc, plus 11kloc from Rocket.
  • 9% CoreMarks/MHz for ARM Cortex-A9 with similar architectural parameters and smaller (but lacking the NEON unit). Power is also similar based on public numbers.
  • Don’t yet have a SPEC score. Need more DRAM on the FPGA and DRAM emulation. With a cluster of FPGAs, this should only take about a day to run.
  • BOOM-chip is currently a branch of rocket-chip. See the slides for how to get going.
  • Currently test/verify using riscv-tests, coremark, spec, and the riscv-torture tool.
  • riscv-torture, a randomised test generator, was open-sourced yesterday. If it finds a bug, it will minimise the program for you.
  • A design document is a work in progress up on github.com/ccelio (doesn’t seem to be published yet?)
  • Want to grow a community of “baby BOOMers” :)
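
The point above about register specifiers always sitting in the same place can be sketched as follows. The bit positions are those of the base RISC-V instruction formats; the helper name is ours, and whether a given field is meaningful still depends on the instruction format (a store has no rd, for example).

```python
# Register specifier fields sit at fixed bit positions in RISC-V base formats,
# so they can be extracted before (or in parallel with) opcode decode.
# The helper name is hypothetical; the bit positions follow the ISA formats.

def register_fields(instr):
    """Extract rd, rs1, rs2 and rs3 without looking at the opcode first."""
    return {
        "rd":  (instr >> 7) & 0x1F,    # bits 11:7
        "rs1": (instr >> 15) & 0x1F,   # bits 19:15
        "rs2": (instr >> 20) & 0x1F,   # bits 24:20
        "rs3": (instr >> 27) & 0x1F,   # bits 31:27 (R4-type, e.g. fused multiply-add)
    }


if __name__ == "__main__":
    # add x3, x1, x2
    print(register_fields(0x002081B3))
```

Because the extraction never inspects the opcode, a renamer can start looking up all four candidate specifiers while the decoder works out which of them the instruction actually uses.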

Bluespec’s “RISC-V Factory”: Rishiyur Nikhil

  • Bluespec’s ‘RISC-V Factory’ is aimed at organisations who want to create their own RISC-V based CPUs or SoCs without the usual learning curve, startup costs, and ownership costs.
  • Currently working with Draper on the DOVER project we’ll be hearing about in the next talk.
  • The Flute RISC-V CPU has interfaces for direct GDB control, an elastic (latency-insensitive) pipeline, hooks for optional tagged data.
  • Have components such as interconnect, memory controller, DMA engine, devices. Working on flash for booting and Ethernet.
  • Provide a complete development and verification environment.

DOVER, a metadata-extended RISC-V: Andre DeHon

  • Current computer systems are insecure, and the processor architecture contributes by blindly running code and making the secure and safe thing expensive.
  • Add software defined metadata processing as implemented in PUMP.
  • Give each word a programmable tag. This is indivisible from a word and its interpretation is programmable.
  • PUMP is a function from (Opcode, PCtag, Instrtag, RS1tag, RS2tag, MRtag) to (Allowed?, PCtag, Resulttag).
  • Possible policies include access control, types, fine-grained instruction permissions, memory safety, control flow integrity, taint tracking and information flow control.
  • Rules are installed by software on PUMP misses. This demands metadata structures be immutable.
  • A metadata tag can be a pointer, meaning it can point to a data structure of arbitrary size.
  • Can support composite policies. i.e. no limit of only one policy at once.
  • There are no instructions to read or write metadata, i.e. no set-tag or read tag. All tag manipulation is done through the PUMP.
  • In RISC-V use PUMP CSRs for rule inputs and outputs.
  • Compared to lowRISC: lowRISC has a limited number of tag bits, tags are accessible to user code. Good for self-protection safety but argue it’s not adequate to enforce policies against malicious code (i.e. code actively trying to circumvent protection).
  • Compare to Oracle M7. M7 has a limited number of colors and a fixed policy.
  • Have some global tags and rules so they have the same meaning across different processes.
  • Idiosyncrasies about RISC-V: one instruction uses RS3, sparse opcode space increases table size, multiple instructions per machine word (policies want tagged instructions)
  • Draper plans to make available Bluespec RISC-V and metadata changes, PUMP, set of basic micropolicies, runtime support and tools all under open source licenses.
  • Building a consortium around Dover, an “Inherently Secure Processing Hive”.
  • Question: what is the overhead? Don’t have figures yet for RISC-V. From previous work, have 10% runtime overhead, twice the area, and 60% energy overhead.
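
To make the shape of the PUMP function more concrete, here is a minimal sketch of a rule cache backed by a software miss handler. Only the input and output tuple shapes come from the notes above; the taint-tracking policy, the tag values, and all class and function names are hypothetical.

```python
# Sketch of a PUMP-style rule cache with a software miss handler.
# ASSUMPTIONS: the taint policy, tag names and class/function names are
# hypothetical; only the input/output tuple shape comes from the talk notes.

CLEAN, TAINTED = "clean", "tainted"


def taint_policy(opcode, pc_tag, instr_tag, rs1_tag, rs2_tag, mr_tag):
    """Software miss handler: returns (allowed, new_pc_tag, result_tag)."""
    if opcode == "jump" and rs1_tag == TAINTED:
        return (False, pc_tag, CLEAN)          # refuse tainted jump targets
    if opcode == "load":
        return (True, pc_tag, mr_tag)          # loaded value inherits memory tag
    tainted = TAINTED in (rs1_tag, rs2_tag)
    return (True, pc_tag, TAINTED if tainted else CLEAN)


class RuleCache:
    """Caches rule results so the policy code only runs on a miss."""

    def __init__(self, miss_handler):
        self.miss_handler = miss_handler
        self.rules = {}
        self.misses = 0

    def check(self, opcode, pc_tag, instr_tag, rs1_tag, rs2_tag, mr_tag):
        key = (opcode, pc_tag, instr_tag, rs1_tag, rs2_tag, mr_tag)
        if key not in self.rules:
            self.misses += 1
            self.rules[key] = self.miss_handler(*key)
        return self.rules[key]


if __name__ == "__main__":
    pump = RuleCache(taint_policy)
    print(pump.check("add", CLEAN, CLEAN, TAINTED, CLEAN, CLEAN))
    print(pump.check("add", CLEAN, CLEAN, TAINTED, CLEAN, CLEAN))
    print("misses:", pump.misses)
```

Caching by tag tuple only works if the same tags always mean the same thing, which is one way to read the requirement above that metadata structures be immutable.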

Third RISC-V Workshop: Day One

The third RISC-V workshop is going on today and tomorrow at the Oracle Conference Center, California. I’ll be keeping a semi-live blog of talks and announcements throughout the day. See here for notes from the second day.

Introductions and RISC-V Foundation Overview: Rick O’Connor

  • Save the date, the 4th RISC-V workshop will be July 12th-13th at the MIT CSAIL/Stata Center.
  • In August 2015, articles of incorporation were filed to create a non-profit RISC-V Foundation to govern the ISA.
  • RISC-V Foundation mission statement: The RISC-V Foundation is a non-profit consortium chartered to standardize, protect, and promote the free and open RISC-V instruction set architecture together with its hardware and software ecosystem for use in all computing devices.
  • The RISC-V ISA and related standards shall remain open and license-free to all parties.
  • The compatibility test suites shall always be publicly available as a source code download.
  • To protect the standard, only members (with commercial RISC-V products) of the Foundation in good standing can use “RISC-V” and associated trademarks, and only for devices that pass the tests in the open-source compatibility suites maintained by the Foundation.
  • Drafting a new license for the ISA ‘combining the best of BSD, GPL and so on’.
  • The Foundation will have a board of 7 directors elected by the members. All members of committees must be members of the RISC-V Foundation.
  • All details of the foundation are a work in progress. Feedback is welcome!
  • 16 member companies so far. Bluespec, Draper, Google, Hewlett Packard Labs, Microsemi, Oracle, SiFive, antmicro, codasip, Gray Research, Lattice Semiconductor, lowRISC, ROA logic, Rumble Development, Syntacore, Technolution

RISC-V Updates: Krste Asanovic

  • The 1.9 version of the compressed extension is on track to become frozen and become blessed as v2.0. Now is the time to speak up if you have any feedback or concerns!
  • Privileged architecture 1.7 was released in May 2015 and has received a lot of feedback. Hope to release an updated draft addressing feedback in the next month or two. Doubt it will really settle down before the summer, as it needs more OS and driver development against it.
  • ‘V’ Vector Extension: not yet ready to release a draft for this workshop, but the RTL of the Hwacha vector coprocessor has been open-sourced along with 3 tech-reports.
  • Hwacha is NOT the V extension. It’s a research project designed to push the limits of in-order decoupled data-parallel accelerators (e.g. GPUs). The microarchitecture does demonstrate some of the ideas that will appear in V.
  • Ongoing ISA research at UCB: virtual local store (VLS, Henry Cook’s 2009 master’s thesis) and user-level DMA (copying data between DRAM and VLS directly).
  • New Berkeley open-source cores: BOOM out-of-order implementation and V-Scale (Verilog implementation of Z-Scale).
  • RISC-V is being transitioned out of Berkeley. This involves upstreaming of tools and the RISC-V Foundation taking over the standards process.
  • SiFive has been founded by some of the lead RISC-V developers (Krste Asanovic, Yunsup Lee, Andrew Waterman). Sutter Hill Ventures have invested.
  • Most urgent outstanding issues: holes and ambiguities in the specification and the platform specification.
  • Holes in the spec: floating-point NaNs (resolved and updated), CSR read/write semantics (resolved, updated spec), memory model (far from resolved), Hypervisor support (no proposal). A formal spec of the whole ISA and memory model is desired.
  • Although RISC-V was designed in reusable layers, some concrete standards for hardware platforms are desirable e.g. memory maps, interrupt controller, power management. To what extent can platform specs be shared across microcontroller-class, application cores, and rack-scale designs?

RISC-V External Debug (aka JTAG debugging): Tim Newsome

  • Goals: a debug system that works for everybody with a working system done by July 1st 2016.
  • The specification will be submitted to the RISC-V Foundation and there are plans for an open source release of the debugger and implementations for Rocket and Z-Scale.
  • Status: the specification is mostly complete.
  • Features: (many, see the slides). Interested in tracing core execution to on or off-chip RAM and providing a framework to debug any component on the system, not just the RISC-V cores.
  • The debug transport module provides access to the system bus. It implements a message queue and optional authentication.
  • New CSRs will be added and exposed on the system bus.
  • The spec describes a small amount of debug memory (1KiB of ROM and 8-16 bytes of RAM). This memory is not cached and is shared between all cores.
  • Up to 4095 hardware breakpoints supported (but 4 is more typical). Each may support exact address match, address range match, exact data match, data range match, …
  • The work-in-progress spec will be posted to the hw-dev RISC-V mailing list later today. Comments very welcome.
  • Question: how does this map to gdb’s capabilities? Are there things it can do but gdb can’t or vice-versa? It’s not a 1:1 match but should be a superset of what gdb can do.
  • Question: how does the tracing scale? Hasn’t been investigated yet.
  • Question: will support be integrated into the BOOM codebase? Answer: not currently planned.
  • Question: there’s a wide spectrum of functionality that different implementations could implement. Is there a discovery mechanism for the functionality that is supported? Yes.
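
The exact-match and range-match breakpoint kinds mentioned above can be sketched like this. The class and function names are hypothetical and do not come from the debug specification; only the match kinds are taken from the notes.

```python
# Sketch of hardware-breakpoint style matching on addresses.
# ASSUMPTION: the class and field names are hypothetical and do not come from
# the debug specification; only the match kinds are taken from the notes.

from dataclasses import dataclass


@dataclass(frozen=True)
class Trigger:
    low: int
    high: int  # inclusive; equal to low for an exact match

    def matches(self, value):
        return self.low <= value <= self.high


def exact(address):
    """Trigger that fires on a single address."""
    return Trigger(address, address)


def address_range(low, high):
    """Trigger that fires on any address in [low, high]."""
    return Trigger(low, high)


def first_hit(triggers, address):
    """Return the index of the first trigger that fires, or None."""
    for index, trigger in enumerate(triggers):
        if trigger.matches(address):
            return index
    return None


if __name__ == "__main__":
    triggers = [exact(0x80000000), address_range(0x1000, 0x1FFF)]
    print(first_hit(triggers, 0x1800))
```

Data matches would work the same way, with the comparison applied to the value being loaded or stored instead of the address.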

RISC-V in Axiom: Michael Gielda

  • Axiom is a fully open source 4K film camera design. It is an EU Horizon 2020 project with multiple partners.
  • Aim of Axiom is to create an extensible open source platform that is also a desirable project in itself. The aim is to open up what is currently a fairly closed industry and lower barriers of entry to new players. There is an obvious parallel to the RISC-V and lowRISC projects.
  • Chose the largest Zynq FPGA that had zero-cost tool support to maximise the number of people who can hack on the design.
  • Using the Z-Scale as a soft-core for communication between the FPGA pre-processing board (Kintex-7) and the FPGA SoC main board with a dual-core Cortex-A9 (Zynq 7030).
  • Long-term goals are to ensure the design can be reused through parameterisation, and look at broadening adoption of the Chisel design methodology.
  • The Axiom Gamma project is almost half-way done. There will be an EU technical review in March at which point it should be working.

Emulating future HPC SoC architectures using RISC-V: Farzad Fatollahi-Fard

  • Should HPC take inspiration from the embedded market?
  • Is building an SoC for HPC a good idea? HPC is power limited (performance/Watt) which arguably means HPC and embedded requirements are aligned.
  • From a previous project case study (Green Wave), they found an embedded SoC design was performance/power competitive with Fermi. This had a 12x12 2D on-chip torus network with 676 compute cores, 33 supervisor cores, 1 PCI express interface, 8 Hybrid Memory Cube interfaces, …
  • Proposed an FPGA-implemented SoC for HPC. This contains 4 Z-scale processors with a 2x2 concentrated mesh with 2 virtual channels. The Z-Scale was chosen for area-efficiency on FPGA.
  • The network is implemented using the OpenSoC fabric (open source and in Chisel). AHB endpoints have now been added and AXI is in development.
  • A 96-core system was constructed using multiple FPGAs.
  • For more info, see CoDEx HPC.

GRVI Phalanx. A massively parallel RISC-V FPGA accelerator: Jan Gray

  • GRVI is pronounced ‘groovy’.
  • There’s lots of interest in FPGA accelerators right now (Altera acquisition, MSR’s catapult).
  • FPGAs are an interesting platform. Massively parallel. Specialized. Connected. High throughput. Low latency. The big barrier is of course porting your software. Jan argues OpenCL for FPGAs is a major breakthrough for this problem, if you’re lucky enough to have an application that can be expressed in OpenCL.
  • Phalanx is an accelerator accelerator - an infrastructure making it easier to run your application on an FPGA and connect everything together. It is composed of processor+accelerator clusters+NoC.
  • Jan’s Razor: “In a CMP, cut inessential resources from each CPU, to maximize CPUs per die.”
  • Jan has achieved an RV32I datapath in about 250 LUTs. This core can achieve 300-375MHz, 1.3-1.6CPI. The ‘GRVI’ core is ~320 6-LUTs so “1 MIPS/LUT”.
  • How many can you fit on a modern FPGA? The limiting resource is really the block RAMs. In a cluster, two processing elements can share an instruction BRAM, and all PEs can share a cluster memory.
  • How should the clusters be interconnected? A 5-port virtual channel router might be a sensible choice in an ASIC, but does not map well to an FPGA. Instead use a Hoplite 2D router. This is only 1% of the area x delay product of FPGA-optimized VC routers. Each cluster has a 300 bit connection to the Hoplite router (with a 256bit payload).
  • 400 of the GRVI Phalanx PEs can fit on a Xilinx KU040. The amortized cost of the router per processor is only 40 LUTs.
  • Can fit 32 GRVI Phalanx PEs on an Artix-7-35T.
  • Want to support different accelerated parallel programming models: SPMD, MIMD, MP. All potentially accelerated by custom GRVI and cluster function units, custom memory or interconnects, custom accelerators on the NoC.
  • Next steps: debug/trace over NoC, Hoplite/AXI4 bridges, OpenCL stack, potential bridge to Chisel RISC-V infrastructure?
  • Question: how do I get this? Not available yet, and not yet sure on the licensing model.
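
As a back-of-the-envelope check on the “1 MIPS/LUT” figure, the numbers quoted above can be plugged in directly. Pairing the slowest clock with the highest CPI (and vice versa) and adding the 40-LUT router share to each 320-LUT core are our assumptions; nothing here is measured.

```python
# Back-of-the-envelope check of the "1 MIPS/LUT" figure quoted above.
# All inputs are numbers from the talk notes; how they are combined
# (worst/best pairing, core + router share) is an assumption.

def mips(clock_mhz, cpi):
    """Millions of instructions per second for a given clock and CPI."""
    return clock_mhz / cpi


def mips_per_lut(clock_mhz, cpi, luts):
    """Throughput per LUT for one processing element."""
    return mips(clock_mhz, cpi) / luts


if __name__ == "__main__":
    luts = 320
    worst = mips_per_lut(300, 1.6, luts)   # slowest clock, highest CPI
    best = mips_per_lut(375, 1.3, luts)    # fastest clock, lowest CPI
    print(f"{worst:.2f} to {best:.2f} MIPS/LUT")
    # 400 PEs, each with a core plus its amortized 40-LUT share of the router
    print(400 * (luts + 40), "LUTs for cores and routers")
```

With these inputs the range works out at roughly 0.6 to 0.9 MIPS/LUT, so “1 MIPS/LUT” is best read as an order-of-magnitude figure.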

Coreboot on RISC-V: Ron Minnich

  • Initializing the stuff outside the main CPU on a chromebook takes about 1 billion instructions before it can start Linux.
  • Firmware, 1990-2005: “fire and forget”. Set up all the stuff kernels can’t do (e.g. LinuxBIOS), then get out of the way. But now there’s a push for the firmware to hang around after boot.
  • Ron argues this sucks. It’s slow, there’s no easy bugfix path, and it’s not SMP capable on x86.
  • Why doesn’t Ron like persistent firmware? It’s another attack vector, indistinguishable from a persistent embedded threat. Ron’s preference is for the platform management code to run as a kernel thread. Minion cores are ideal for this (Ron’s words rather than mine - I of course agree whole-heartedly).
  • coreboot is a GPLv2 BIOS replacement (not a bootloader). It has multiple possible payloads including SeaBIOS and depthcharge (used for verified boot on Chromebooks).
  • The port was started in October 2014 as a side project. The effort resumed in July 2015 with the privileged spec, and as of September is up and running again. The most recent port runs on Spike but not QEMU (due to lack of support for the privileged spec).
  • RISC-V is a first class citizen in coreboot, all commits must pass tests for the RISC-V buildbot.
  • src/arch/riscv is 2685 LoC.
  • The Federal Office for Information Security in Germany runs a hardware test station for coreboot. As soon as real hardware is running, they’ve offered to integrate it into their system.
  • Lessons learned
    • Provide a boot time SRAM (make sure the address is fixed and not aliased by DRAM once DRAM is up).
    • Provide a serial port.
    • Ron reiterates that runtime functions belong in the kernel, not persistent firmware.
    • Firmware tables always need translation by kernel, so make them text not binary.
    • Keep the mask ROM as simple as possible.
    • Don’t cheap out on SPI or flash part size. Just plan a 64MiB part.
    • Don’t reset the chipset on IE device not present.
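
The boot SRAM advice amounts to checking that the SRAM window never overlaps the DRAM window in the memory map. A minimal sketch of that check follows. The region names, addresses and sizes are hypothetical and are not taken from the talk or from any real platform.

```python
from typing import NamedTuple

class Region(NamedTuple):
    name: str
    base: int
    size: int

    @property
    def end(self):
        # Exclusive end address of the region.
        return self.base + self.size

def overlaps(a, b):
    """True if the two address ranges share at least one byte."""
    return a.base < b.end and b.base < a.end

# Hypothetical memory maps, for illustration only.
boot_sram = Region("boot_sram", 0x0800_0000, 64 * 1024)
good_dram = Region("dram", 0x8000_0000, 1 << 30)
bad_dram = Region("dram", 0x0000_0000, 1 << 30)  # would alias the SRAM

print(overlaps(boot_sram, good_dram))
print(overlaps(boot_sram, bad_dram))
```

A check like this could run as part of a platform's build or bring-up tests, so that an aliased SRAM is caught before firmware relies on it.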

RISC-V and UEFI: Dong Wei and Abner Chang

  • There is a UEFI Forum consisting of a board of 12 directors, 12 promoters, 42 contributors, 213 adopters.
  • UEFI and ACPI are both now handled by the UEFI Forum.
  • A RISC-V UEFI port is taking place using EDKII (EFI Development Kit II) and OVMF (Open Virtual Machine Firmware).
  • The speakers are giving a very thorough description of the UEFI boot mechanism which I’m not able to do justice. You’re best waiting for the slides+video I’m afraid.
  • The project was started a few months ago, and can now boot to a UEFI shell.
  • They have created a new RISC-V QEMU target with some PC peripherals (CMOS, PM, PCI and other devices), and also implemented RISC-V machine mode.
  • Requests for new RISC-V spec additions: a periodic timer CSR, RTC with alarm CSR, PI management mode support, … (sorry, missed some).

FreeBSD and RISC-V: Ruslan Bukin

  • FreeBSD will support RV64I in release 11.0.
  • Why use FreeBSD? Among other reasons, it gives a full-stack BSD license (RISC-V, FreeBSD, LLVM/Clang).
  • FreeBSD has been brought up on Spike.
  • The early boot assembly code will put the hardware in a known state, build a ring buffer for interrupts, initialise the page tables, enable the MMU, then finally branch to a virtual address and call into C code.
  • Userspace porting required modifications to jemalloc, csu (crt1.S, crtn.S, crti.S), libc, msun (libm), rtld-elf (the runtime linker).
  • The FreeBSD port is based on the ARMv8 port. It has a 25k line diff and took 6 months from scratch.
  • Userland started working in December. Support will now be committed to FreeBSD SVN.
  • Next plans include multicore, FPU, increasing the virtual address space, DTrace, performance monitoring counters, QEMU, …
  • Proposed changes: split sptbr into sptbr0 and sptbr1 for the user VA and the kernel VA. This means there is no need to change sptbr when changing the privilege level, and should reduce code size.
  • For more on the project, see the relevant FreeBSD wiki page.
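
To illustrate the idea behind the proposed sptbr split, here is a small sketch of how a page-table root might be chosen per address. The talk did not describe the selection rule, so everything here is an assumption: I assume a 39-bit virtual address, and that the top address bit picks the kernel root, in the style of ARMv8's TTBR0/TTBR1.

```python
VA_BITS = 39  # assumption: Sv39-style virtual addresses

def select_root(va, sptbr0, sptbr1):
    """Pick the page-table root for a virtual address.

    Assumption: addresses with the top VA bit set belong to the kernel
    and use sptbr1; all other addresses are user space and use sptbr0.
    """
    top_bit = (va >> (VA_BITS - 1)) & 1
    return sptbr1 if top_bit else sptbr0

# Hypothetical root page-table addresses.
USER_ROOT, KERNEL_ROOT = 0x8020_0000, 0x8010_0000

user_va = 0x0000_0000_0001_0000
kernel_va = 1 << (VA_BITS - 1)  # lowest address in the upper half

print(hex(select_root(user_va, USER_ROOT, KERNEL_ROOT)))
print(hex(select_root(kernel_va, USER_ROOT, KERNEL_ROOT)))
```

With both roots live at once, entering or leaving the kernel would not need to rewrite a single shared sptbr, which is the saving the proposal describes.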

Building the RISC-V software ecosystem: Arun Thomas

  • “2016 is the year of RISC-V”. Or at least, the year of RISC-V software. We have a great opportunity to push the software stack forwards.
  • What can we achieve by the end of the year? Hopefully an upstreamed GNU toolchain and QEMU. More mature Clang/LLVM support, upstreamed OS support, a Debian/RISC-V port, and a start on thinking about Android and a real-time OS.
  • How do we get there? We need to recruit more RISC-V developers and make it easier for people to get started by producing more docs and specifications.
  • Right now, the RISC-V GitHub has had 48 contributors from a wide range of universities, companies and OSS projects.
  • We should present talks and tutorials at developer conferences and local user group meetings.
  • If you have local patches, upstream them!
  • How to attract developers? Could fund developers/projects via the Foundation, apply to be a Google Summer of Code mentoring organization, update the list of open bugs and feature requests on GitHub, and track contribution statistics.
  • We can make it much easier for people to get started by building Debian packages, upstreaming, and providing regular binary snapshots.
  • Spike is great for prototyping hardware features, but QEMU is a better tool for software development and a critical part of the RISC-V software story.
  • There’s more to specify, e.g. a platform specification (compare the ARMv8 Server Base System Architecture), boot architecture (look at the ARMv8 Server Base Boot Requirements), RISC-V ABI, hypervisor, security.
  • Useful documents include a RISC-V Assembly Guide, some equivalent of the ARM Cortex-A Programmer’s Guide, and a New Contributor’s Guide.

Untethered lowRISC release

Over the past several months, we’ve been working to provide a standalone or ‘untethered’ SoC. Cores in the original Rocket chip rely on communicating with a companion processor via the host-target interface (HTIF) to access peripherals and I/O. This release removes this requirement, adding an I/O bus and instantiating FPGA peripherals. The accompanying tutorial, written by Wei Song, describes how to build this code release and explains the underlying structural changes. We support both the Xilinx KC705 and the lower-priced Nexys4 DDR development boards. We would gladly welcome assistance in supporting other boards.

Please note that the codebase temporarily lacks support for tagged memory included in the previous release. We plan to re-integrate tagged memory support with additional optimisations early next year. You can find a detailed list of changes in the release notes. One highlight is support for RTL simulation using the open-source Verilator tool.

This development milestone should make it easier for others to contribute. If you’re looking to get stuck in, you might want to consider looking at tasks such as:

  • Cleaning up the RISC-V Linux port, improving devicetree support and removing the host-target interface.
  • Replacing use of proprietary peripheral IP with open-source IP cores.
  • Adding support for different FPGA development boards, including Altera boards.
  • Implementing the BERI Programmable Interrupt Controller (p73), and adding necessary Linux support.

Our next development priorities are the re-integration of tagged memory support and an initial integration of a minion core design. We also expect to put out a job advert in the next few weeks for a new member of the lowRISC development team at the University of Cambridge Computer Laboratory. Interested applicants are encouraged to make informal enquiries about the post to Rob Mullins (Robert.Mullins@cl.cam.ac.uk).

We hope to see many of you at the 3rd RISC-V Workshop in January, where Wei Song and Alex Bradbury will be presenting about lowRISC.