Welcome to the lowRISC project!

lowRISC is producing fully open hardware systems. From the processor core to the development board, our goal is to create a completely open computing ecosystem. Find out more

Latest news

Second RISC-V Workshop: Day Two

It’s the second day of the second RISC-V workshop today in Berkeley, California. I’ll be keeping a semi-live blog of talks and announcements throughout the day.

Z-scale. Tiny 32-bit RISC-V Systems: Yunsup Lee

  • Z-Scale is a family of tiny cores, similar in spirit to the ARM Cortex-M family. It integrates with the AHB-Lite interconnect.
  • Contrast to Rocket (in-order cores, 64-bit, 32-bit, dual-issue options), and BOOM (a family of out-of-order cores).
  • Z-Scale is a 32-bit 3-stage single-issue in-order pipeline executing the RV32IM ISA.
  • The instruction bus and data bus are 32-bit AHB-Lite buses.
  • There is a plan to publish a microarchitecture specification to make it easy for others to implement an equivalent design in the language of their choice.
  • Z-Scale is slightly larger than the Cortex-M0 due to having 32 vs 16 registers, 64-bit performance counters, and a fast multiply and divide. The plan is to add an option to generate a Z-Scale implementing RV32E (i.e. only having 16 registers).
  • Z-Scale is only 604 lines of Chisel: 274 lines for control, 267 for the datapath, and 63 for the top level, combined with 983 lines borrowed from Rocket.
  • A Verilog implementation of Z-Scale is also under development. It’s currently 1215 lines of code.
  • The repo is here, but Yunsup needs to do a little more work to make it easily buildable. There will be a blog post on the RISC-V site soon.
  • All future Rocket development will move to the public rocket-chip repo!
  • Memory interfaces:
    • TileLink is the Berkeley cache-coherent interconnect
    • NASTI (Berkeley implementation of AXI4)
    • HASTI (implementation of AHB-lite)
    • POCI (implementation of APB)
  • The plan is to dump HTIF in Rocket, and add a standard JTAG debug interface.
  • Future work for Z-Scale includes a microarchitecture document, improving performance, implementing the C extensions, adding an MMU option, and adding more devices.

BOOM. Berkeley Out-of-order-Machine: Chris Celio

  • BOOM is a (work in progress) superscalar, out-of-order RISC-V processor written in Chisel.
  • Chris argues there’s been a general lack of effort in academia to build and evaluate out-of-order designs. As he points out, much research relies on software simulators with no area or power numbers.
  • Some of the difficult questions for BOOM are which benchmarks to use, and how many cycles you need to run. He points out that mapped to an FPGA running at 50MHz, it would take around a day to run the SPEC benchmarks on a cluster of FPGAs.
  • The fact that rs1, rs2, rs3 and rd are always in the same position in the RISC-V instruction encoding allows decode and rename to proceed in parallel.
  • BOOM supports the full RV64G standard. It benefits from reusing Rocket as a library of components.
  • BOOM uses explicit renaming, with a unified register file holding both x-regs and f-regs (floating point). A unified issue window holds all instructions.
  • BOOM is synthesisable and hits 2GHz (30 FO4) in TSMC 45nm.
  • BOOM is 9kloc of its own code, and pulls in 11.5kloc from other libraries (rocket, uncore, floating point).
  • BOOM compares well to an ARM Cortex-A9 and A15 in terms of CoreMark/MHz. A 4-wide BOOM gives a similar CoreMark/MHz to the A15.
  • Future work will look at new applications, a ROCC interface, new microarchitecture designs. The plan is to open source by this winter.
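The fixed register field positions mentioned above can be illustrated with a short sketch (my own illustration, not from the talk): because rs1, rs2 and rd always sit at the same bit offsets, a renamer can read the register specifiers without waiting for full decode.

```python
# Sketch: RISC-V keeps register specifiers at fixed bit positions in the
# 32-bit encoding, so rename can read them in parallel with decode.

def reg_fields(insn: int) -> dict:
    """Extract the register specifiers from a 32-bit RISC-V instruction word."""
    return {
        "rd":  (insn >> 7)  & 0x1F,   # bits 11:7
        "rs1": (insn >> 15) & 0x1F,   # bits 19:15
        "rs2": (insn >> 20) & 0x1F,   # bits 24:20
    }

# add x3, x1, x2 encodes as 0x002081B3
print(reg_fields(0x002081B3))
```

The same extraction works for every base instruction format that uses those fields, which is exactly what makes parallel rename cheap.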

Fabscalar RISC-V: Rangeen Basu Roy Chowdhury

  • A FabScalar RISC-V version should be released in the next few days.
  • FabScalar generates synthesisable RTL for arbitrary superscalar cores with a canonical superscalar template.
  • FabScalar uses a library of pipeline stages, providing many different designs for each canonical pipeline stage.
  • Two chips have been built with FabScalar so far (using PISA).
  • The RISC-V port was built on the previous PISA ‘Superset Core’. This had 64-bit instructions and 32-bit addresses and data.
  • For RISC-V FabScalar they have a unified physical register file and unified issue queue for floating point (so the FP ALU is treated like just another functional unit).
  • FabScalar RISC-V will be released as an open source tool complete with uncore components and verification infrastructure. It will be available on GitHub in the fall.
  • The license isn’t yet decided, but there’s a good chance it will be BSD.

Aristotle. A Logically Determined (Clockless) RISC-V RV32I: Matthew Kim

  • Two logical values are defined. Data and null (not data). Then define threshold operators to produce ‘null convention logic’.
  • See here for more on Null Convention Logic
  • This results in a system built entirely of combinational logic. I couldn’t hope to accurately summarise the work here. I’m afraid you might be best off waiting for the recording.
  • Currently executing compiled quicksort at approximately 400 MIPS (without serious optimisation).
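Null Convention Logic is hard to summarise, but its core primitive can be sketched (my own illustration, not from the talk): a THmn threshold gate asserts DATA once m of its n inputs are DATA, and only returns to NULL when all inputs are NULL. That hysteresis is what makes the logic self-timed.

```python
# Sketch of an NCL threshold gate with hysteresis. Signal values are NULL
# (no data) or DATA; the gate holds its output until a complete NULL
# wavefront passes, signalling completion without a clock.

NULL, DATA = 0, 1

class ThresholdGate:
    def __init__(self, m: int, n: int):
        self.m, self.n = m, n
        self.state = NULL

    def evaluate(self, inputs):
        assert len(inputs) == self.n
        if sum(inputs) >= self.m:
            self.state = DATA        # threshold reached: assert DATA
        elif all(i == NULL for i in inputs):
            self.state = NULL        # only an all-NULL wavefront resets
        return self.state            # otherwise hold the previous state

g = ThresholdGate(2, 3)              # TH23: 2-of-3 with hysteresis
print(g.evaluate([DATA, NULL, NULL]))  # below threshold: still NULL
print(g.evaluate([DATA, DATA, NULL]))  # threshold met: DATA
print(g.evaluate([DATA, NULL, NULL]))  # holds DATA until all-NULL
print(g.evaluate([NULL, NULL, NULL]))  # resets to NULL
```

This is only a behavioural model; the linked Null Convention Logic material covers the real gate families.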

RISC-V(erification): Prashanth Mundkur

  • Current architectures e.g. those from Intel and ARM have large errata sheets published semi-regularly. Can we do better for RISC-V?
  • Need an unambiguous formal ISA specification, coupled to a processor implementation with a formal link between the two.
  • Currently specifying RISC-V in the L3 DSL. The interpreter is used as a reference oracle for processor implementations.
  • The current state of the spec is available on Github.
  • The work has already helped to highlight some areas where clarification is needed in the written specification.
  • Next steps would involve support for the compressed instruction set and floating point, booting Linux, and use for tandem verification (e.g. with Flute from Bluespec).
  • Hope to export usable HOL4 formal definitions, and use that to prove information properties (e.g. non-interference and information flow in low-level privileged code).
  • The talk is now moving to its second half, where Prashanth is presenting Nirav Dave’s work.
  • This work looks at rapidly verifying architectural and microarchitectural variants of RISC-V, relying on translating between microarchitectural states and ISA-level states.

Towards General-Purpose Tagged Memory: Wei Song

Raven3, a 28nm RISC-V Vector Processor with On-Chip DC/DC Converters: Brian Zimmer

  • Support dynamic voltage and frequency scaling on-chip with no off-chip components.
  • Want to switch all converters simultaneously to avoid charge sharing. The clock frequency adapts to track the voltage ripple.
  • Raven3 has a RISC-V scalar core and vector accelerator, with a 16KB scalar instruction cache, 32KB shared data cache, and 8KB vector instruction cache. This has a 1.19mm^2 area.
  • The converter area is 0.19mm^2
  • The chip was wire-bonded on to a daughter-board, which was then connected to a larger motherboard connected to a Zedboard
  • Converter transitions are less than 20ns, which allows extremely fine-grained DVFS.
  • Raven3 showed 80% efficiency across a wide voltage range and achieved 26GFLOPS/W using the on-chip conversion.

Evaluating RISC-V Cores for PULP. An Open Parallel Ultra-Low-Power Platform : Sven Stucki

  • Approximately 40 people working on PULP in some way
  • The ultimate (ambitious) goal is 1GOPS/mW (or 1pJ/op). They also hope to achieve energy proportionality.
  • Plan is to be open source on GitHub
  • PULP has been silicon-proven in 28nm, 65nm, 130nm and 180nm. The team have tape-outs planned through to 2016.
  • Sven has replaced the OpenRISC frontend with a RISC-V decoder, hoping to take advantage of the more active community and compressed instruction set support.
  • PULP is a simple 4-stage design supporting RV32IC as well as the mul instruction from the M extension.
  • Synthesising for UMC65, they see 22 kilo-gate equivalents per core.
  • The OR10N core was a new OpenRISC implementation with support for hardware loops, pre/postincrement memory access and vector instructions.
  • Heading for a GlobalFoundries 28nm tapeout in Q4 2015
  • See more on PULP at the website.

Alex Bradbury

Second RISC-V Workshop: Day One

The second RISC-V workshop is going on today and tomorrow in Berkeley, California. I’ll be keeping a semi-live blog of talks and announcements throughout the day.

Introductions and welcome: Krste Asanović

  • The beginning of Krste’s talk will be familiar to anyone who’s seen an introduction to RISC-V before. Pleasingly, there are a lot of new faces here at the workshop, so the introduction makes a lot of sense.
  • Although the core RISC-V effort is focused on the ISA specification, there is interest in looking to expand this to look at how to standardise access to I/O etc.
  • RV32E is a “pre-emptive strike” at those who might be tempted to fragment the ISA space for very small cores. It is a 16-register subset of RV32I.
  • The compressed instruction set has been released since the last workshop, and there will be a talk later today about it. It gives 25-30% code size reduction, and surprisingly there’s still lots of 16-bit encoding space left for additional extensions.
  • Krste makes the point that AArch64 has 8 addressing modes vs just 1 for RISC-V. The comparison of the size of the GCC/LLVM backends is perhaps less interesting given that the ARM backend actually has rather a lot more optimisations.
  • “Simplicity breeds contempt”. “So far, no evidence more complex ISA is justified for general code”
  • Will be talking about a Cray-style vector ISA extension later today (i.e. not packed-SIMD ISA or GPU-style).
  • Rocket core is only about ~12kloc of Chisel in total. ~5kloc for the processor, ~2kloc for floating-point units, ~4.6kloc for ‘uncore’ (coherence hubs, L2, caches etc).
  • State of the RISC-V Nation: many companies ‘kicking the tires’. If you were thinking of designing your own RISC ISA for a project, then use RISC-V. If you need a complete working supported core today, then pay $M for an industry core. If you need it in 6 months, then consider spending that $M on RISC-V development.
  • RISC-V Foundation is being formed, a 501(c)(6), with Rick O’Connor as Executive Director. The mission statement is “to standardize, protect, and promote the free and open RISC-V instruction set architecture and its hardware and software ecosystem for use in all computing devices”. The plan is to publicly announce before HotChips later this year, and the foundation is actively recruiting companies who want to be ‘founding’ members. You will need to be a member of the foundation in good standing to use the RISC-V trademark (unless you are a non-profit).

An update on lowRISC: Alex Bradbury

  • Many thanks to the audience for all the questions. My slides are available here.
  • Unfortunately the SHAKTI project team from India have been caught up in the malfunctioning US State Department computer systems, and so haven’t been able to get visas to give their talk.

Compressed Extension Proposal: David Patterson

  • Looked at existing compressed instruction sets, and tried to simplify things and throw away ideas that add complexity but provide little benefit.
  • Ended up with a specification that is pleasingly minimal, with each instruction decoding to a single RV32 instruction.
  • Keen on community feedback on additional RVC instructions. Identified a set of 24 that have little impact on current compiler-generated code, but could be useful for some use cases.
  • You can read the RVC spec here.
  • Points out that Thumb2 is only a 32-bit address ISA. Although it is slightly smaller than RV32C, the RISC-V compressed spec has the benefit of supporting 64-bit addressing.
  • Rather than adding the complexity of load multiple and store multiple, experimented with adding calls to a function that does the same thing. This hurts performance, but gives a large benefit for code size.
  • One question was on the power consumption impact. Don’t have numbers on that
  • Should we require the compressed instruction set? Don’t want to add it to the minimal ‘I’ instruction set, but could add it to the standard expected by Linux.
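The "each instruction decodes to a single RV32 instruction" property can be sketched for one case (my own illustration, based on the published RVC encoding; treat the details as illustrative rather than a reference decoder).

```python
# Sketch: expanding the 16-bit C.ADDI into its single 32-bit ADDI
# equivalent, illustrating the one-to-one mapping in the RVC proposal.

def expand_c_addi(c_insn: int) -> int:
    """Expand C.ADDI (quadrant 1, funct3=000) to ADDI rd, rd, imm."""
    assert c_insn & 0x3 == 0b01 and (c_insn >> 13) == 0b000
    rd  = (c_insn >> 7) & 0x1F                       # rd/rs1 share a field
    imm = ((c_insn >> 2) & 0x1F) | (((c_insn >> 12) & 1) << 5)
    if imm & 0x20:                                   # sign-extend 6-bit imm
        imm -= 0x40
    # I-type: imm[11:0] | rs1 | funct3=000 | rd | opcode=OP-IMM (0x13)
    return ((imm & 0xFFF) << 20) | (rd << 15) | (rd << 7) | 0x13

# c.addi a0, 4 (0x0511) expands to addi a0, a0, 4 (0x00450513)
print(hex(expand_c_addi(0x0511)))
```

Because expansion is a pure bit rearrangement like this, a decoder can do it combinationally in front of a standard RV32 pipeline.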

GoblinCore64. A RISC-V Extension for Data Intensive Computing: John Leidel

  • Building a processor design aimed at data intensive algorithms and applications. Applications tend to be very cache unfriendly.
  • GC64 (GoblinCore64) has a thread control unit. A very small microcoded unit (e.g. implementing RV64C) performs the context switching task.
  • Have added user-visible registers for thread id, thread context, task exception register etc etc.
  • The GKEY supervisor register contains a 64-bit key loaded by the kernel. It determines whether a task may spawn and execute work on neighboring task processors, providing a very rudimentary protection mechanism.
  • Making use of RV128I - it’s not just there for fun!
  • Support various instruction extensions, e.g. IWAIT, SPAWN, JOIN, GETTASK, SETTASK: the basic operations needed to write a thread management system (such as pthreads), implemented as microcoded instructions in the RISC-V ISA.
  • Also attempting to define the data structures which contain task queue data.
  • Currently looking at lowRISC-style minion cores to implement microcoded memory coalescing units.
  • Read the GC64 specification doc here.
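To give a feel for the thread-management primitives listed above, here is a hypothetical software model of SPAWN/GETTASK/JOIN-style operations. The names follow the talk but the semantics here are my own illustration; the GC64 specification doc is the authority.

```python
# Hypothetical model of SPAWN/GETTASK/JOIN-style task management primitives.
# In GC64 these would be microcoded instructions; here they are methods on
# a software task queue, purely to illustrate the programming model.
from collections import deque

class TaskUnit:
    def __init__(self):
        self.queue = deque()        # pending task contexts
        self.completed = 0

    def spawn(self, fn, *args):     # SPAWN: enqueue work for a task processor
        self.queue.append((fn, args))

    def gettask(self):              # GETTASK: pop the next pending task
        return self.queue.popleft() if self.queue else None

    def join(self):                 # JOIN: drain the queue to completion
        while (task := self.gettask()) is not None:
            fn, args = task
            fn(*args)
            self.completed += 1
        return self.completed

tu = TaskUnit()
results = []
for i in range(4):
    tu.spawn(results.append, i * i)
print(tu.join(), results)
```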

Vector Extension Proposal: Krste Asanović

  • Goals: efficient and scalable to all reasonable design points. Be a good compiler target, and to support implicit auto-vectorisation through OpenMP and explicit SPMD (OpenCL) programming models. Want to work with virtualisation layers, and fit into the standard 32-bit encoding space.
  • Krste is critical of GPUs for general compute. I can’t hope to summarise his arguments here, but the slides will be well worth a read. Krste has spent decades working on vector machines.
  • With packed SIMD you tend to need completely new instructions for wider SIMD. Traditional vector machines allow you to set the vector length register to provide a more uniform programming model. This makes loop strip-mining more straight-forward.
  • Add up to 32 vector data registers (v0-v31) in addition to the basic scalar x and f registers. Each vector register holds at least 3 elements, with a variable number of bits per element. Also add 8 vector predicate registers, with 1 bit per element. Finally, add vector configuration and vector length CSR registers.
  • Other features
    • Reconfigurable vector registers allow you to exchange unused architectural registers for longer vectors. e.g. if you only need 4 architectural vector registers you’ll have a larger vector length.
    • Mixed-precision support
    • Integer, fixed-point, floating-point arithmetic
    • Unit-stride, strided, indexed load/stores
    • Predication
  • Mixed-precision support allows you to subdivide a physical register into multiple narrower architectural registers as requested.
  • The same binary code works regardless of the number of physical register bits and the number of physical lanes.
  • Use a polymorphic instruction encoding. e.g. a single signed integer ADD opcode that works on different size inputs and outputs.
  • Have separate integer and floating-point loads and stores, where the size is inherent in the destination register number.
  • All instructions are implicitly predicated under the first predicate register by default.
  • What is the difference between V and Hwacha? Hwacha is a non-standard Berkeley vector extension designed to push the state-of-the-art for in-order/decoupled vector machines. There are similarities in the lane microarchitecture. The current focus is bringing up OpenCL for Hwacha, with the V extension to follow.
  • Restartable page faults are supported. Similar to the DEC Vector VAX.
  • Krste pleads with people not to implement a packed SIMD extension, pointing out that a minimal V implementation would be very space efficient.
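The strip-mining model described above can be sketched as follows (my own illustration; `setvl` and `MVL` are stand-ins for the real instruction and hardware maximum, which the V proposal defines).

```python
# Sketch of Cray-style strip-mining: software asks for a vector length and
# the hardware grants min(requested, MVL), so the same loop binary runs
# correctly on any implementation, with no scalar clean-up loop.

MVL = 8                                # maximum vector length (hardware-dependent)

def setvl(requested: int) -> int:
    """Model of a setvl-style instruction: grant min(requested, MVL)."""
    return min(requested, MVL)

def vector_add(a, b):
    out, i, n = [], 0, len(a)
    while i < n:
        vl = setvl(n - i)              # hardware picks this iteration's length
        out.extend(x + y for x, y in zip(a[i:i+vl], b[i:i+vl]))
        i += vl
    return out

print(vector_add(list(range(10)), list(range(10))))
```

Contrast this with packed SIMD, where widening the datapath means new instructions and recompiled (or rewritten) loops.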

Privileged Architecture Proposal: Andrew Waterman

  • Aims to provide a clean split between layers of the stack.
  • You can read the privileged spec here.
  • Supports four privilege modes. User, Supervisor, Hypervisor and Machine mode.
  • For a simple embedded system that only needs M-mode there is a low implementation cost. Only 2^7 bits of architectural state in addition to the user ISA, plus 2^7 more bits for timers and another 2^7 for basic performance counters.
  • Defined the basic virtual memory architectures to support current Unix-style operating systems. The design is fairly conventional, using 4KiB pages.
  • Why go with 4KiB pages rather than 8KiB as was the initial plan? Concerned with porting software hard-coded to expect 4KiB pages. Also concerns about internal fragmentation.
  • Physical memory attributes such as cacheability are not encoded in the page table in RISC-V. Two major reasons Andrew gives for this choice are that the attribute granularity may not be tied to the page size, and that page-table encoding is problematic for virtualisation. Coherent DMA will potentially become more common, meaning you needn’t worry about these attributes.
  • Want to support device interactions via a virtio-style interface.
  • The draft Supervisor Binary Interface will be released with the next privileged ISA draft. It includes common functionality for TLB shootdowns, reboot/shutdown, sending inter-processor interrupts etc etc. This is a similar idea to the PALCode on the Alpha.
  • Hardware-accelerated virtualization (H-mode) is planned, but not yet specified.
  • A draft version of v1.8 of the spec is expected this summer, with a frozen v2.0 targeted for the fall.
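The 4KiB page choice implies a conventional multi-level address split. Below is a sketch of a three-level, Sv39-style scheme (39-bit virtual addresses, 9-bit VPN fields); the field widths follow the draft privileged spec, but treat them as illustrative here.

```python
# Sketch of the virtual address split implied by 4KiB pages with 8-byte
# PTEs: a 12-bit page offset and 9-bit VPN fields (512 PTEs fill exactly
# one 4KiB page), three levels deep for a 39-bit virtual address space.

PAGE_OFFSET_BITS = 12                  # 4KiB pages -> 12-bit offset
VPN_BITS = 9                           # 512 PTEs per page of 8-byte PTEs

def split_va(va: int):
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
    vpn = [(va >> (PAGE_OFFSET_BITS + i * VPN_BITS)) & ((1 << VPN_BITS) - 1)
           for i in range(3)]          # vpn[0] is the least significant
    return vpn, offset

vpn, offset = split_va(0x3FFFFFF123)
print(vpn, hex(offset))
```

A page-table walk then indexes one table per VPN field, from most to least significant, before appending the untranslated offset.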

RapidIO. The Unified Fabric for Performance-Critical Computing: Rick O’Connor

  • There are more 10Gbps RapidIO ports on the planet than there are 10Gbps Ethernet ports. This is primarily due to 100% market penetration in 4G/LTE and 60% penetration in global 3G.
  • The IIT Madras team are using RapidIO extensively for their RISC-V work
  • Has been doing work in the data center and HPC space. Looking to use the AXI ACE and connect that to RapidIO.
  • There is interesting work on an open source RapidIO stack.

CAVA. Cluster in a rack: Peter Hsu

  • Problem: designing a new computer is expensive. But 80% is the same every time.
  • CAVA is not the same as the Oracle RAPID project.
  • Would like to build a 1024-node cluster in a rack. DDR4 3200 = 25.6GB/s per 64-bit channel. Each 1U card would be about 600W with 32 nodes.
  • Looking at a 96-core 10nm chip (scaled from a previous 350nm project). Suppose you have a 3-issue out-of-order core (600K gates) and 32KiB I+D caches; that would be around 0.24mm^2 in 10nm.
  • Estimate a vector unit might be around the same area.
  • Peter has detailed estimates for per-chip power, but it’s probably best to refer to the slides for these.
  • Research plan for the cluster involves a unified simulation environment, running on generic clusters of x86 using open-source software. Everyone uses the same simulator to perform “apples to apples” comparison. This allows easy replication of published work.
  • Simulation infrastructure will involve a pipeline simulator, SoC simulator (uncore logic), and a network simulator.
  • Interested in applying Cray-style vectors to database workloads
  • Could also make associativity ways, and possibly individual cache lines, lockable.
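The bandwidth figure quoted above checks out with some quick arithmetic (my own back-of-envelope, not from the slides):

```python
# DDR4-3200 performs 3200 mega-transfers/s on a 64-bit (8-byte) channel.
transfers_per_s = 3200e6
bytes_per_transfer = 64 // 8           # 64-bit channel
bandwidth_gb_s = transfers_per_s * bytes_per_transfer / 1e9
print(bandwidth_gb_s)                  # 25.6 GB/s per channel, as quoted

# 1024 nodes at 32 nodes per ~600W 1U card:
cards = 1024 // 32
print(cards, "cards, roughly", cards * 600, "W for the rack's compute")
```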

Alex Bradbury

Summer of Code students for lowRISC

lowRISC was fortunate enough to be chosen as a mentoring organisation in this year’s Google Summer of Code. The Google Summer of Code program funds students to work on open source projects over the summer. We had 52 applications across the range of project ideas we’ve been advertising. As you can see from the range of project ideas, lowRISC is taking part as an umbrella organisation, working with a number of our friends in the wider open source software and hardware community. We were allocated three slots from Google, and given the volume of high quality applications making the selection was tremendously difficult. We have actually been able to fund an additional three applicants from other sources, but even then there were many promising projects we couldn’t support. We are extremely grateful to all the students who put so much time and effort in to their proposals, and to everyone who volunteered to mentor. The six ‘summer of code’ projects for lowRISC are:

  • An online Verilog IDE based on YosysJS. Baptiste Duprat mentored by Clifford Wolf

    • Baptiste will be working with an Emscripten-compiled version of the Yosys logic synthesis tool, building an online Verilog IDE on top of it which would be particularly suitable for training and teaching materials. A big chunk of the proposed work is related to visualisation of the generated logic. Improving the accessibility of hardware design is essential for growing the potential contributor base to open source hardware projects like lowRISC, and this is just the start of our efforts in that space.
  • Porting seL4 to RISC-V. Hesham ALMatary mentored by Stefan Wallentowitz

    • seL4 is a formally verified microkernel, which currently has ports for x86 and ARM. Hesham will be performing a complete port to RISC-V/lowRISC. Security and microkernels are of great interest to many in the community. It’s also a good opportunity to expand RISC-V platform support and to put the recently released RISC-V Privileged Architecture Specification through its paces. Hesham previously performed a port of RTEMS to OpenRISC.
  • Porting jor1k to RISC-V. Prannoy Pilligundla mentored by Sebastian Macke

    • jor1k is by far the fastest Javascript-based full system simulator. It also features a network device, filesystem support, and a framebuffer. Prannoy will be adding support for RISC-V and looking at supporting some of the features we offer on lowRISC, such as minion cores or tagged memory. This will be great not only as a demo, but will also have practical uses in tutorial or educational material.
  • TCP offload to minion cores using rump kernels. Sebastian Wicki mentored by Justin Cormack

    • The intention here is to get a rump kernel (essentially a libified NetBSD) running bare-metal on a simple RISC-V system and evaluate exposing the TCP/IP stack for use by other cores. e.g. a TCP/IP offload engine running on a minion core. TCP offload is a good starting point, but of course the same concept could be applied elsewhere. For example, running a USB mass storage driver (and filesystem implementation) on a minion core and providing a simple high-level interface to the application cores.
  • Extend Tavor to support directed generation of assembly test cases. Yoann Blein mentored by Markus Zimmermann

    • Tavor is a sophisticated fuzzing tool implemented in Go. Yoann will be extending it to more readily support specifying instruction set features and generating a fuzzing suite targeting an ISA such as RISC-V. Yoann has some really interesting ideas on how to go about this, so I’m really interested in seeing where this one ends up.
  • Implement a Wishbone to TileLink bridge and extend TileLink documentation. Thomas Repetti mentored by Wei Song

    • Wishbone is the interconnect of choice for most existing open source IP cores, including most devices on opencores.org. The Berkeley Rocket RISC-V implementation uses its own ‘TileLink’ protocol (we provide a brief overview). By providing a reusable bridge, this project will allow the easy reuse of opencores devices and leverage the many man-years of effort that have already gone in to them.

The first 3 of the above projects are part of Google Summer of Code and the bottom 3 are directly funded, operating over roughly the same timeline. We’re also going to be having two local students interning with us here at the University of Cambridge Computer Lab starting towards the end of June, so it’s going to be a busy and productive summer. It bears repeating just how much we appreciate the support of everyone involved so far - Google through their Summer of Code initiative, the students, and those who’ve offered to act as mentors. We’re very excited about these projects, so please join us in welcoming the students involved to our community. If you have any questions, suggestions, or guidance please do leave them in the comments.

Alex Bradbury

lowRISC tagged memory preview release

We’re pleased to announce the first lowRISC preview release, demonstrating support for tagged memory as described in our memo. Our ambition with lowRISC is to provide an open-source System-on-Chip platform for others to build on, along with low-cost development boards featuring a reference implementation. Although there’s more work to be done on the tagged memory implementation, now seemed a good time to document what we’ve done in order for the wider community to take a look. Please see our full tutorial which describes in some detail the changes we’ve made to the Berkeley Rocket core, as well as how you can build and try it out for yourself (either in simulation, or on an FPGA). We’ve gone to some effort to produce this documentation, both to document our work, and to share our experiences building upon the Berkeley RISC-V code releases in the hopes they’ll be useful to other groups.

The initial motivation for tagged memory was to prevent control-flow hijacking attacks, though there are a range of other potential uses including fine-grained memory synchronisation, garbage collection, and debug tools.
Please note that the instructions used to manipulate tagged memory in this release (ltag and stag) are only temporary and chosen simply because they require minimal changes to the core pipeline. Future work will include exploring better ISA support, collecting performance numbers across a range of tagged memory uses and tuning the tag cache. We are also working on developing an ‘untethered’ version of the SoC with the necessary peripherals integrated for standalone operation.
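To make the ltag/stag model concrete, here is a hypothetical software sketch of per-word tags and one possible policy built on them (a read barrier that traps on tagged words, in the spirit of the control-flow hijacking use case). The tag width, trap behaviour, and class names are illustrative only; the tutorial and memo describe the real implementation.

```python
# Hypothetical model of tagged memory: each 64-bit word carries a small
# tag, read and written by ltag/stag-style operations. A policy layered on
# top (here: trap on any nonzero tag) can e.g. poison return addresses.

class TaggedMemory:
    def __init__(self):
        self.data, self.tags = {}, {}

    def store(self, addr, value):
        self.data[addr] = value        # normal store leaves the tag alone

    def stag(self, addr, tag):
        self.tags[addr] = tag          # store-tag operation

    def ltag(self, addr):
        return self.tags.get(addr, 0)  # load-tag operation; tags default to 0

    def load(self, addr):
        if self.ltag(addr) != 0:       # illustrative read-barrier policy
            raise RuntimeError(f"tag trap at {addr:#x}")
        return self.data[addr]

mem = TaggedMemory()
mem.store(0x1000, 0xDEADBEEF)
print(hex(mem.load(0x1000)))           # untagged word: loads normally
mem.stag(0x1000, 1)                    # mark the word
try:
    mem.load(0x1000)
except RuntimeError as e:
    print("trapped:", e)
```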

If you’ve visited lowrisc.org before, you’ll have noticed we’ve changed a few things around. Keep an eye on this blog (and its RSS feed) to follow developments - we expect to be updating at least every couple of weeks. We’re very grateful to the RISC-V team at Berkeley for all their support and guidance. A large portion of the credit for this initial code release goes to Wei Song, who’s been working tirelessly on the HDL implementation.