Welcome to the lowRISC project!

lowRISC is creating a fully open-sourced, Linux-capable, RISC-V-based SoC, that can be used either directly or as the basis for a custom design. We aim to tape out our first volume chip this year. Find out more

Latest news

Apply now to work with lowRISC in Google Summer of Code

We are very grateful to have been selected to take part as a mentoring organisation in the Google Summer of Code for the second year running. As with last year, we’re working with a number of friends from across the wider open source hardware community to act as an umbrella for a range of hardware-related projects. If you are a student who would like to be paid to work on open source during the summer, then take a look at the lowRISC ideas list and apply. As was pointed out on the Google Open Source Program’s blog, there is a good showing from hardware-related projects in GSoC this year. The deadline for applications is this coming Friday, 25th March at 7pm GMT.

We welcome ideas of your own creation, but the ideas we’ve suggested this year include:

  • A trace debug analysis tool (ideally using TypeScript and Electron)
  • Improving device-tree support for the Linux RISC-V port
  • Various ideas related to the Yosys open-source synthesis tool
  • Porting a teaching OS such as xv6 or XINU to the lowRISC platform
  • Porting CMSIS-DSP to PULPino
  • Doom on PULPino
  • Porting the Arduino libraries to PULPino
  • Integrating additional open-source IP for lowRISC on FPGA
  • Implementing a Trusted Execution Environment
  • Porting musl libc to RISC-V
  • A generic hardware/software interface for software-defined radio
  • A SPIR-V frontend for Nyuzi
  • Porting an OS kernel to Nyuzi

Third RISC-V Workshop: Day Two

Today is the second day of the third RISC-V workshop. Again, I’ll be keeping a semi-live blog of talks and announcements throughout the day. See here for notes from the first day.

RISC-V ASIC and FPGA implementations: Richard Herveille

  • Look for freedom of design. Want to migrate freely between FPGAs, structured ASICs, and standard cell ASICs.
  • Want to make it easier to migrate FPGAs to ASICs for advantages in price, performance, power, IP protection.
  • Roa Logic’s RV32I/64 implementations are called RV11 and RV22. RV11 is in-order, single-issue, single thread. RV22 is in-order, dual-issue and dual thread.
  • Implement a ‘folded’ optimizing 5-stage pipeline, where some classic RISC stages are folded together for performance improvement. e.g. the instruction decode stage decides if the instruction sequence can be optimized.
  • Ported a debug unit for or1k from OpenCores
  • Mostly target the eASIC nextreme platform. Start with an existing FPGA design, then transfer.
  • Achieved Fmax of 649MHz (32-bit core) on a nextreme-3, vs 114MHz on the customer’s current Cyclone V. Achieved a 70% power reduction.
  • Next steps are to improve resource utilization, increase offering of extensions, and add multi-threading and multi-issue.

lowRISC: plans for RISC-V in 2016: Alex Bradbury

  • Find my slides here. Apologies for not live-blogging my own talk.

A 32-bit 100MHz RISC-V microcontroller with 10-bit SAR ADC in 130nm GP: Elkim Roa

  • Current goal: a low-footprint RISC-V microcontroller like EFM32 or SAMD11 with USB low-speed PHY on-chip.
  • Looked at picorv32 and vscale, but ultimately wrote their own implementation. Also adding in a 10-bit SAR ADC, DAC, PLL, GPIO.
  • Implement RV32IM using a 3-stage pipeline. IRQ handling is adapted from the picorv32 timer.
  • Provides AXI-4 lite and an APB bridge.
  • SAR ADC intends to be fully synthesizable, 10-bit 10MHz.
  • The chip was taped out in October 2015 on 130nm TSMC GP. The core+interfaces area was 800um x 480um.
  • Undertook a large effort on verification, implemented verification testbenches for AXI-4 and APB peripheral functionality. Would like to partner to get access to proven VIP.
  • Future work to be done on a USB PHY low-speed interface, DMA channels, watchdog timer, eNVM 1-poly ROM
  • Question: where and when can I download it? Soon! Still cleaning it up.

SoC for satellite navigation unit based on the RISC-V single-core Rocket chip: Sergei Khabarov

  • Currently have an RF-mezzanine card for FPGA prototypes, and silicon-verified GNSS IP and ASIC development board with a LEON3 CPU inside.
  • On the software, have universal receive firmware and plug-and-play support for different targets. Plus a host application for data analysis. See gnss-snsor.com.
  • Now transitioning from the previous 180nm ASIC to a new 90nm chip with a RISC-V core. The target frequency is 300MHz.
  • The new SoC design aims to take the best ideas of the GPL-licensed grlib library (Gaisler Research) and will be written in VHDL.
  • Current code can be found here.
  • Plug and play approach taken from grlib to help quickly assemble a complex SoC design. Device ID, vendor ID, address and interrupt configuration, cacheability, etc. are routed in sideband signals accessible via a dedicated slave device on a system bus.
  • Some memory access optimizations have been implemented to allow access to AXI peripherals in one clock cycle.
  • Implemented (or implementing?) the multi-core debug protocol, potentially supported by Trace32 (Lauterbach).

RISC-V photonic processor: Chen Sun

  • Process scaling has helped massively for data transfer within a chip, but we’ve had little improvement for moving data off-chip.
  • The I/O wall involves being both power and pin limited. Silicon photonics may help overcome this.
  • Started by trying to provide DRAM connected by photonics (as part of the DARPA POEM project).
  • What about the foundry? Do it without a foundry. How to connect electronics and photonics? Put them on the same chip. Where are you going to get a processor? RISC-V!
  • They produced it and it was published in Nature last month. Dual-core 1.65GHz RISC-V. Manufactured in a commercial SOI process.
  • Build a waveguide with planar silicon processing. Silicon is the high-index core. Oxides form the low-index cladding.
  • Transmitter is driven by a CMOS logic inverter. 5Gb/s data rate at 30fJ/b.
  • Issue with ring resonators is massive variation based on process and temperature variation. Need to stabilise this somehow. Add thermal tuning.
  • To demonstrate the optical memory system, had a second chip emulating a DRAM controller.
  • We proposed an architecture, built it, and got performance competitive to our predictions!

Untethering the RISC-V Rocket Chip: Wei Song

  • Rocket is an open-source SoC generator from Berkeley. The base Rocket core is a 5-stage single-issue in-order processor.
  • Previously a host-target interface was connected to the L2 bus, communicating with an ARM core on the Zynq to provide peripherals.
  • The untethered Rocket chip adds a separate I/O bus. Currently uses Xilinx IP for peripherals. First stage bootloader is on FPGA block RAM, second stage bootloader is loaded from SD. I/O read and write are totally uncached.
  • The top-level (including I/O devices) is all in System Verilog. There is a separate ‘Chisel island’ containing the Rocket interface.
  • The I/O and memory maps can be configured by CSRs.
  • The first-stage bootloader copies the second stage to DRAM, performing an uncached copy. It then remaps the DRAM to memory address 0, resets Rocket and starts the second stage. The second stage uses a version of the Berkeley bootloader. It starts multi-core, VM support, then loads and boots RISC-V Linux in a virtual address space.
  • Currently the second stage bootloader stays resident to handle HTIF requests.
  • Our code release contains a very detailed tutorial. Key features include support for the Nexys4 as well as the more expensive KC705. You can also use Verilator for simulation and a free WebPACK Vivado license.
  • The code release includes a rewritten TileLink/NASTI interface, DDR2/3 controller, SD, UART.
  • Future work: re-integrate tagged memory, remove HTIF from Linux kernel (help wanted), interrupt controller, trace debugger (Stefan Wallentowitz), run-control debug (SiFive), platform spec.

MIT’s RISCY expedition: Andy Wright

  • MIT implemented an IMAFD 64-bit RISC-V processor in Bluespec System Verilog. It supports machine, supervisor, and user modes, boots Linux, and has been tandem-verified with Spike.
  • Want to work on formal specification of the ISA, formally verified processor implementations, memory consistency models, accelerators, microarchitecture exploration, VLSI implementations using a standard ASIC flow.
  • Philosophy: get a working processor first, figure out why it’s slow, and make it faster without breaking it.
  • Moving to work on formal verification, which requires a formal specification for RISC-V. Lots of questions to be answered, e.g. whether referenced bits in page table entries should be set for speculatively accessed pages.
  • A single instruction can result in up to 13 effective memory accesses - how do these interact with each other, and how do they influence the memory model?
  • Want simple operational definitions where legal behaviours can be observable on a simple abstract machine consisting of cores and a shared monolithic memory.
  • Looked at defining WMM, a new easy-to-specify weak consistency model.
  • Propose there should be a new instruction similar to sfence.vm, but going in the opposite direction.
  • See also their website.
  • Question: will it be open source? Concerned currently because some aspects of the design are used as challenges for student projects and releasing it could compromise the projects.

Pydgin for RISC-V, a fast and productive instruction-set simulator: Berkin Ilbeyi

  • Simple interpreted instruction set simulators get 1-10MIPS of performance. Typical dynamic binary translation may achieve 100s of MIPS, with QEMU achieving up to 1000 MIPS.
  • Another aspect worthy of consideration is productivity when working with the simulator, for instance when looking to extend it to explore new hardware features. Can you achieve high developer productivity and runtime performance?
  • Observe there are similar productivity-performance challenges for building high performance language runtimes as for simulators. e.g. the PyPy project.
  • Pydgin uses PyPy’s RPython framework.
  • Pydgin describes its own architectural description language (really a Python DSL).
  • Pydgin running on a standard Python interpreter gives ~100KIPS. But when going through RPython this gives 10MIPS. When targeting the RPython JIT, adding extra RPython JIT hints achieved up to a 23x performance improvement over no annotations.
  • Performs 2-3x better than Spike for many workloads. Spike caches decoded instructions and uses PC-indexed dispatch to improve performance.
  • Achieved a 100MIPS+ RISC-V port after 9 days of development.
  • Pydgin is used in the Cornell research group to gain statistics for software-defined regions, experimentations with data-structure specialisation, control and memory divergence for SIMD, and more.
  • Pydgin is online at Github.
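
The decoded-instruction caching and PC-indexed dispatch mentioned for Spike can be illustrated with a small sketch. This is not code from Pydgin or Spike: the two-field toy instruction word, the handler names, and the `run` function are all hypothetical, chosen only to show why a loop body needs to be decoded once rather than on every iteration.

```python
# Toy interpreter with a PC-indexed decode cache.
# The instruction format (opcode in the high bits, 8-bit operand) is
# hypothetical and is NOT RISC-V; it only illustrates the caching idea.

def op_addi(state, operand):
    # Add an immediate to the accumulator and fall through.
    state["acc"] += operand
    state["pc"] += 1

def op_loop_back(state, operand):
    # Decrement the loop counter; jump back `operand` slots while non-zero.
    state["count"] -= 1
    if state["count"]:
        state["pc"] -= operand
    else:
        state["pc"] += 1

def op_halt(state, operand):
    state["running"] = False

HANDLERS = {0: op_addi, 1: op_loop_back, 2: op_halt}

def decode(word):
    """Split a toy instruction word into (handler, operand)."""
    return HANDLERS[word >> 8], word & 0xFF

def run(program, count):
    state = {"pc": 0, "acc": 0, "count": count, "running": True}
    decode_cache = {}  # pc -> (handler, operand)
    decodes = 0
    while state["running"]:
        pc = state["pc"]
        entry = decode_cache.get(pc)
        if entry is None:
            # Slow path: decode once, then remember the result for this PC.
            entry = decode(program[pc])
            decode_cache[pc] = entry
            decodes += 1
        handler, operand = entry
        handler(state, operand)
    return state["acc"], decodes

# addi 5; loop_back 1; halt
program = [(0 << 8) | 5, (1 << 8) | 1, (2 << 8)]
acc, decodes = run(program, count=1000)
```

Here the loop body executes 1000 times but each of the three instructions is decoded only once. A JIT such as the one RPython generates goes further by compiling hot loops, which this sketch does not attempt.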

ORCA, FPGA-optimized RISC-V soft processors: Guy Lemieux

  • ORCA is completely open-source. See it at Github.
  • Initially targeted the Lattice iCE40 (3.5kLUTs, under $5 in low quantities).
  • RV32M implemented in 2kLUTs at around 20MHz on the iCE40.
  • ORCA is highly parameterized, ideally suitable for FPGAs, portable across FPGA vendors, and BSD-licensed.
  • Achieved 244MHz, 212MIPS on an Altera Stratix-V. Lots of room for further improvements.
  • Clock speed is close to matching the picorv32 clock speed, but with higher DMIPS/MHz.
  • Found 64-bit vs 32-bit counters added a lot of area.
  • A good FPGA implementation often leads to a good ASIC implementation, but a good ASIC implementation often leads to a poor FPGA implementation.
  • Use dual-ported block RAMs on the FPGA for the register file.
  • Observe that reduced register count in RV32E makes no difference for FPGAs. Divide is very expensive.
  • Beware when writing software, a shift could be as slow as 1b/cycle.
  • The privileged architecture spec contains too many CSRs and the 64-bit counters are too big, putting pressure on multiplexers. For FPGAs, perhaps define small/med/full versions.

PULPino, a small single-core RISC-V SoC: Andreas Traber

  • PULP and PULPino developed at ETH Zurich and University of Bologna with many partners.
  • Develop an ultra low power processor for computing IoT. Exploit parallelism using multiple small simple cores organised in clusters.
  • Share memory within the cluster.
  • Support near-threshold operation for very low power.
  • PULP has been taped out over a dozen times across multiple process technologies, down to 28nm.
  • PULP has a large number of IPs. To start with, open source PULPino as a starting point.
  • PULPino is a microcontroller-style platform. No caches, no memory hierarchy, no DMA. It re-uses IP from the PULP project and will be taped out in 65nm UMC.
  • The boot ROM loads a program from SPI flash.
  • Motivated to switch to RISC-V due to more modern design (no set flags, no delay slot), compressed instructions, easily extensible.
  • Looking to extend RISC-V with non-standard extensions for hardware loops, post-increment load+store, multiply-accumulate, ALU extensions (min, max, absolute value).
  • RI5CY core has full support for RV32I, implements just the mul from RV32M. It has a 4 stage pipeline.
  • Performed a comparison based on published Cortex M4 numbers. RI5CY is a little faster and a little smaller.
  • The RI5CY core itself is just 7% of the area of a PULPino SoC (assuming 32KiB instruction and data RAM).
  • Open source release will follow shortly, including a FreeRTOS port. Using the Solderpad license. Just awaiting final approval from the University (expected by the end of the month).
  • Want to port PULPino to IP-XACT. Also want to add floating point support, branch prediction, and evaluate further non-standard ISA extensions.

The Berkeley Out-of-Order Machine (BOOM): Christopher Celio

  • An out-of-order core. It’s synthesizable, parameterizable, and open-source.
  • Out-of-order is great for tolerating variable latencies, finding instruction-level parallelism, and working with poor compilers or lazily written code.
  • Downsides are it’s more complex and potentially expensive in area and power.
  • Should work as an interesting baseline for micro-architecture research. Also enables research that needs an out-of-order core (e.g. on memory systems, accelerators, VLSI methodologies).
  • BOOM implements IMAFD and the RV64G+ privileged spec.
  • The RISC-V ISA is easy to implement. Relaxed memory model, accrued FP exception flags, no integer side-effects (e.g. condition codes), no cmov or predication, no implicit register specifiers, rs1+rs2+rs3+rd are always in the same space allowing decode and rename to proceed in parallel.
  • As Rocket-chip gets better, so does BOOM.
  • The host-target interface is being removed from rocket-chip to provide an untethered system.
  • BOOM has a unified physical register file (floating point and integer).
  • Masses of parameters can be tweaked.
  • 2-wide BOOM with 16KiB L1 caches is 1.2mm² in TSMC 45nm. Can clock at 1.5GHz for two-wide.
  • Currently designed for single-cycle SRAM access as the critical path.
  • Planning on a tapeout later this year.
  • Achieve 50MHz on an FPGA where the bottleneck is the FPGA tools can’t register-rename the FPU.
  • BOOM is 9kloc, plus 11kloc from Rocket.
  • 9% CoreMarks/MHz for ARM Cortex-A9 with similar architectural parameters and smaller (but lacking the NEON unit). Power is also similar based on public numbers.
  • Don’t yet have a SPEC score. Need more DRAM on the FPGA and DRAM emulation. With a cluster of FPGAs, this should only take about a day to run.
  • BOOM-chip is currently a branch of rocket-chip. See the slides for how to get going.
  • Currently test/verify using riscv-tests, coremark, spec, and the riscv-torture tool.
  • riscv-torture, a randomised test generator, was open-sourced yesterday. If it finds a bug, it will minimise the program for you.
  • A design document is a work in progress up on github.com/ccelio (doesn’t seem to be published yet?)
  • Want to grow a community of “baby BOOMers” :)
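
The point about register specifiers always being in the same place can be shown with a short sketch. The bit positions below are the ones in the RISC-V base instruction formats (rd in bits 11:7, rs1 in 19:15, rs2 in 24:20, rs3 in 31:27). The function name and dictionary layout are hypothetical and have nothing to do with BOOM's Chisel source.

```python
# Register specifiers sit at fixed bit positions in a 32-bit RISC-V
# instruction, so they can be extracted without knowing the opcode.
# That is what lets rename start in parallel with full decode.

def register_fields(insn):
    """Extract all register specifier fields, whether or not this
    particular instruction actually uses each of them."""
    return {
        "rd": (insn >> 7) & 0x1F,    # bits 11:7
        "rs1": (insn >> 15) & 0x1F,  # bits 19:15
        "rs2": (insn >> 20) & 0x1F,  # bits 24:20
        "rs3": (insn >> 27) & 0x1F,  # bits 31:27 (fused multiply-add only)
    }

# Encoding of `add x3, x1, x2`.
ADD_X3_X1_X2 = 0x002081B3
fields = register_fields(ADD_X3_X1_X2)
```

For this `add`, the sketch yields rd=3, rs1=1, rs2=2. The rs3 field is extracted too, and the decoder later decides which of the fields are meaningful for the instruction.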

Bluespec’s “RISC-V Factory”: Rishiyur Nikhil

  • Bluespec’s ‘RISC-V Factory’ is aimed at organisations who want to create their own RISC-V based CPUs or SoCs without the usual learning curve, startup costs, and ownership costs.
  • Currently working with Draper on the DOVER project we’ll be hearing about in the next talk.
  • The Flute RISC-V CPU has interfaces for direct GDB control, an elastic (latency-insensitive) pipeline, hooks for optional tagged data.
  • Have components such as interconnect, memory controller, DMA engine, devices. Working on flash for booting and Ethernet.
  • Provide a complete development and verification environment.

DOVER, a metadata-extended RISC-V: Andre DeHon

  • Current computer systems are insecure, and the processor architecture contributes by blindly running code and making the secure and safe thing expensive.
  • Add software defined metadata processing as implemented in PUMP.
  • Give each word a programmable tag. This is indivisible from a word and its interpretation is programmable.
  • PUMP is a function from (Opcode, PCtag, Instrtag, RS1tag, RS2tag, MRtag) to (Allowed?, PCtag, Resulttag).
  • Possible policies include access control, types, fine-grained instruction permissions, memory safety, control flow integrity, taint tracking and information flow control.
  • Rules are installed by software on PUMP misses. This demands metadata structures be immutable.
  • A metadata tag can be a pointer, meaning it can point to a data structure of arbitrary size.
  • Can support composite policies. i.e. no limit of only one policy at once.
  • There are no instructions to read or write metadata, i.e. no set-tag or read tag. All tag manipulation is done through the PUMP.
  • In RISC-V use PUMP CSRs for rule inputs and outputs.
  • Compared to lowRISC: lowRISC has a limited number of tag bits, tags are accessible to user code. Good for self-protection safety but argue it’s not adequate to enforce policies against malicious code (i.e. code actively trying to circumvent protection).
  • Compare to Oracle M7. M7 has a limited number of colors and a fixed policy.
  • Have some global tags and rules so they have the same meaning across different processes.
  • Idiosyncrasies about RISC-V: one instruction uses RS3, sparse opcode space increases table size, multiple instructions per machine word (policies want tagged instructions)
  • Draper plans to make available Bluespec RISC-V and metadata changes, PUMP, set of basic micropolicies, runtime support and tools all under open source licenses.
  • Building a consortium around Dover, an “Inherently Secure Processing Hive”.
  • Question: what is the overhead? Don’t have figures yet for RISC-V. From previous work, have 10% runtime overhead, twice the area, and 60% energy overhead.
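
The PUMP description above (a function from input tags to an allow decision plus output tags, with rules installed by software on a miss) can be sketched as a memoising rule cache. Everything here is an assumption made for illustration: the `RuleCache` class, the toy taint-tracking policy, and the use of Python frozensets as tags are not taken from the DOVER or PUMP implementations.

```python
# Minimal sketch of a PUMP-style rule cache with a software miss handler.
# Tags are immutable frozensets so they can be used as cache keys, which
# mirrors the note that metadata structures must be immutable.

CLEAN = frozenset()
TAINTED = frozenset({"tainted"})

def taint_policy(rule_input):
    """Hypothetical policy: results inherit taint from their inputs, and
    jumping through a tainted register is not allowed."""
    opcode, pc_tag, instr_tag, rs1_tag, rs2_tag, mr_tag = rule_input
    if opcode == "jump":
        allowed = not (rs1_tag & TAINTED)
        return allowed, pc_tag, CLEAN
    result_tag = rs1_tag | rs2_tag | mr_tag
    return True, pc_tag, result_tag

class RuleCache:
    """Maps (opcode, PC tag, instruction tag, RS1 tag, RS2 tag, MR tag)
    to (allowed, new PC tag, result tag)."""

    def __init__(self, miss_handler):
        self.rules = {}
        self.miss_handler = miss_handler
        self.misses = 0

    def lookup(self, rule_input):
        if rule_input not in self.rules:
            # Miss: software computes the rule and installs it.
            self.misses += 1
            self.rules[rule_input] = self.miss_handler(rule_input)
        return self.rules[rule_input]

pump = RuleCache(taint_policy)
add_input = ("add", CLEAN, CLEAN, TAINTED, CLEAN, CLEAN)
first = pump.lookup(add_input)    # miss, handled by the policy
second = pump.lookup(add_input)   # hit, served from the cache
jump = pump.lookup(("jump", CLEAN, CLEAN, TAINTED, CLEAN, CLEAN))
```

In this sketch the `add` propagates the taint to its result, the repeated lookup does not invoke the policy again, and the jump through a tainted register is rejected. Composite policies could be modelled by letting a tag carry entries from several policies at once.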

Third RISC-V Workshop: Day One

The third RISC-V workshop is going on today and tomorrow at the Oracle Conference Center, California. I’ll be keeping a semi-live blog of talks and announcements throughout the day. See here for notes from the second day.

Introductions and RISC-V Foundation Overview: Rick O’Connor

  • Save the date, the 4th RISC-V workshop will be July 12th-13th at the MIT CSAIL/Stata Center.
  • In August 2015, articles of incorporation were filed to create a non-profit RISC-V Foundation to govern the ISA.
  • RISC-V Foundation mission statement: The RISC-V Foundation is a non-profit consortium chartered to standardize, protect, and promote the free and open RISC-V instruction set architecture together with its hardware and software ecosystem for use in all computing devices.
  • The RISC-V ISA and related standards shall remain open and license-free to all parties.
  • The compatibility test suites shall always be publicly available as a source code download.
  • To protect the standard, only members (with commercial RISC-V products) of the Foundation in good standing can use “RISC-V” and associated trademarks, and only for devices that pass the tests in the open-source compatibility suites maintained by the Foundation.
  • Drafting a new license for the ISA ‘combining the best of BSD, GPL and so on’.
  • The Foundation will have a board of 7 directors elected by the members. All members of committees must be members of the RISC-V Foundation.
  • All details of the foundation are a work in progress. Feedback is welcome!
  • 16 member companies so far. Bluespec, Draper, Google, Hewlett Packard Labs, Microsemi, Oracle, SiFive, antmicro, codasip, Gray Research, Lattice Semiconductor, lowRISC, ROA logic, Rumble Development, Syntacore, Technolution

RISC-V Updates: Krste Asanovic

  • The 1.9 version of the compressed extension is on track to become frozen and become blessed as v2.0. Now is the time to speak up if you have any feedback or concerns!
  • Privileged architecture 1.7 was released in May 2015 and has received a lot of feedback. Hope to release an updated draft addressing feedback in the next month or two. Doubt it will really settle down before the summer, as it needs more OS and driver development against it.
  • ‘V’ Vector Extension: not yet ready to release a draft for this workshop, but the RTL of the Hwacha vector coprocessor has been open-sourced along with 3 tech-reports.
  • Hwacha is NOT the V extension. It’s a research project designed to push the limits of in-order decoupled data-parallel accelerators (e.g. GPUs). The microarchitecture does demonstrate some of the ideas that will appear in V.
  • Ongoing ISA research at UCB: virtual local store (VLS, Henry Cook’s 2009 master’s thesis) and user-level DMA (copying data between DRAM and VLS directly).
  • New Berkeley open-source cores: BOOM out-of-order implementation and V-Scale (Verilog implementation of Z-Scale).
  • RISC-V is being transitioned out of Berkeley. This involves upstreaming of tools and the RISC-V Foundation taking over the standards process.
  • SiFive has been founded by some of the lead RISC-V developers (Krste Asanovic, Yunsup Lee, Andrew Waterman). Sutter Hill Ventures have invested.
  • Most urgent outstanding issues: holes and ambiguities in the specification and the platform specification.
  • Holes in the spec: floating-point NaNs (resolved and updated), CSR read/write semantics (resolved, updated spec), memory model (far from resolved), Hypervisor support (no proposal). A formal spec of the whole ISA and memory model is desired.
  • Although RISC-V was designed in reusable layers, some concrete standards for hardware platforms are desirable e.g. memory maps, interrupt controller, power management. To what extent can platform specs be shared across microcontroller-class, application cores, and rack-scale designs?

RISC-V External Debug (aka JTAG debugging): Tim Newsome

  • Goals: a debug system that works for everybody with a working system done by July 1st 2016.
  • The specification will be submitted to the RISC-V Foundation and there are plans for an open source release of the debugger and implementations for Rocket and Z-Scale.
  • Status: the specification is mostly complete.
  • Features: (many, see the slides). Interested in tracing core execution to on or off-chip RAM and providing a framework to debug any component on the system, not just the RISC-V cores.
  • The debug transport module provides access to the system bus. It implements a message queue and optional authentication.
  • New CSRs will be added and exposed on the system bus.
  • The spec describes a small amount of debug memory (1KiB ROM) and 8-16 bytes of RAM. This memory is not cached and is shared between all cores.
  • Up to 4095 hardware breakpoints supported (but 4 is more typical). Each may support exact address match, address range match, exact data match, data range match, …
  • The work-in-progress spec will be posted to the hw-dev RISC-V mailing list later today. Comments very welcome.
  • Question: how does this map to gdb’s capabilities? Are there things it can do but gdb can’t or vice-versa? It’s not a 1:1 match but should be a superset of what gdb can do.
  • Question: how does the tracing scale? Hasn’t been investigated yet.
  • Question: will support be integrated into the BOOM codebase? Answer: not currently planned.
  • Question: there’s a wide spectrum of functionality that different implementations could implement. Is there a discovery mechanism for the functionality that is supported? Yes.

RISC-V in Axiom: Michael Gielda

  • Axiom is a fully open source 4K film camera design. It is an EU Horizon 2020 project with multiple partners.
  • Aim of Axiom is to create an extensible open source platform that is also a desirable project in itself. The aim is to open up what is currently a fairly closed industry and lower barriers of entry to new players. There is an obvious parallel to the RISC-V and lowRISC projects.
  • Chose the largest Zynq FPGA that had zero-cost tool support to maximise the number of people who can hack on the design.
  • Using the Z-Scale as a soft-core for communication between the FPGA pre-processing board (Kintex-7) and the FPGA SoC main board with a dual-core Cortex-A9 (Zynq 7030).
  • Long-term goals are to ensure the design can be reused through parameterisation, and look at broadening adoption of the Chisel design methodology.
  • The Axiom Gamma project is almost half-way done. There will be an EU technical review in March at which point it should be working.

Emulating future HPC SoC architectures using RISC-V: Farzad Fatollahi-Fard

  • Should HPC take inspiration from the embedded market?
  • Is building an SoC for HPC a good idea? HPC is power limited (performance/Watt) which arguably means HPC and embedded requirements are aligned.
  • From a previous project case study (Green Wave), they found an embedded SoC design was performance/power competitive with Fermi. This had a 12x12 2D on-chip torus network with 676 compute cores, 33 supervisor cores, 1 PCI express interface, 8 Hybrid Memory Cube interfaces, …
  • Proposed an FPGA-implemented SoC for HPC. This contains 4 Z-scale processors with a 2x2 concentrated mesh with 2 virtual channels. The Z-Scale was chosen for area-efficiency on FPGA.
  • The network is implemented using the OpenSoC fabric (open source and in Chisel). AHB endpoints have now been added and AXI is in development.
  • A 96-core system was constructed using multiple FPGAs.
  • For more info, see CoDEx HPC.

GRVI Phalanx. A massively parallel RISC-V FPGA accelerator: Jan Gray

  • GRVI is pronounced ‘groovy’.
  • There’s lots of interest in FPGA accelerators right now (Altera acquisition, MSR’s catapult).
  • FPGAs are an interesting platform. Massively parallel. Specialized. Connected. High throughput. Low latency. The big barrier is of course porting your software. Jan argues OpenCL for FPGAs is a major breakthrough for this problem, if you’re lucky enough to have an application that can be expressed in OpenCL.
  • Phalanx is an accelerator accelerator - an infrastructure making it easier to run your application on an FPGA and connect everything together. It is composed of processor+accelerator clusters+NoC.
  • Jan’s Razor: “In a CMP, cut inessential resources from each CPU, to maximize CPUs per die.”
  • Jan has achieved an RV32I datapath in about 250 LUTs. This core can achieve 300-375MHz, 1.3-1.6CPI. The ‘GRVI’ core is ~320 6-LUTs so “1 MIPS/LUT”.
  • How many can you fit on a modern FPGA? The limiting resource is really the block RAMs. In a cluster, two processing elements can share an instruction BRAM, and all PEs can share a cluster memory.
  • How should the clusters be interconnected? A 5-port virtual channel router might be a sensible choice in an ASIC, but does not map well to an FPGA. Instead use a Hoplite 2D router. This is only 1% of the area x delay product of FPGA-optimized VC routers. Each cluster has a 300 bit connection to the Hoplite router (with a 256bit payload).
  • 400 of the GRVI Phalanx PEs can fit on a Xilinx KU040. The amortized cost of the router per processor is only 40 LUTs.
  • Can fit 32 GRVI Phalanx PEs on an Artix-7-35T.
  • Want to support different accelerated parallel programming models: SPMD, MIMD, MP. All potentially accelerated by custom GRVI and cluster function units, custom memory or interconnects, custom accelerators on the NoC.
  • Next steps: debug/trace over NoC, Hoplite/AXI4 bridges, OpenCL stack, potential bridge to Chisel RISC-V infrastructure?
  • Question: how do I get this? Not available yet, and not yet sure on the licensing model.
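
As a rough check on the “1 MIPS/LUT” figure, the quoted clock, CPI and LUT numbers can be combined directly. This is only back-of-envelope arithmetic on the figures in these notes. Pairing the lowest clock with the highest CPI (and vice versa) is an assumption, and the function name is made up for the example.

```python
# Back-of-envelope MIPS/LUT from the figures quoted in the talk notes.

GRVI_LUTS = 320  # approximate 6-LUT count of the GRVI core

def mips_per_lut(freq_mhz, cpi, luts=GRVI_LUTS):
    """MIPS = MHz / CPI; divide by LUT count for area efficiency."""
    return (freq_mhz / cpi) / luts

# Assumed pairings of the quoted ranges (300-375MHz, 1.3-1.6 CPI).
pessimistic = mips_per_lut(freq_mhz=300, cpi=1.6)
optimistic = mips_per_lut(freq_mhz=375, cpi=1.3)

print(f"{pessimistic:.2f} to {optimistic:.2f} MIPS/LUT")
```

Under these assumptions the result falls a little below 1 MIPS/LUT at both ends of the range, so the quoted figure reads as a round number rather than an exact one.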

Coreboot on RISC-V: Ron Minnich

  • Initializing the stuff outside the main CPU on a chromebook takes about 1 billion instructions before it can start Linux.
  • Firmware, 1990-2005: “fire and forget”. Set up all the stuff kernels can’t do (e.g. LinuxBIOS), then get out of the way. But now there’s a push for the firmware to hang around after boot.
  • Ron argues this sucks. It’s slow, there’s no easy bugfix path, and it’s not SMP capable on x86.
  • Why doesn’t Ron like persistent firmware? It’s another attack vector, indistinguishable from a persistent embedded threat. Ron’s preference is the platform management code run as a kernel thread. Minion cores are ideal for this (Ron’s words rather than mine - I of course agree whole-heartedly).
  • coreboot is a GPLv2 BIOS replacement (not a bootloader). It has multiple possible payloads including SeaBIOS and depthcharge (used for verified boot on Chromebooks).
  • Port was started in October 2014 as a side project. The effort resumed in July 2015 with the privileged spec, and as of September is up and running again. The most recent port runs on Spike but not QEMU (due to lack of support for the privileged spec).
  • RISC-V is a first class citizen in coreboot, all commits must pass tests for the RISC-V buildbot.
  • src/arch/riscv is 2685 LoC.
  • The Federal Office for Information Security in Germany runs a hardware test station for coreboot. As soon as real hardware is running, they’ve offered to integrate it into their system.
  • Lessons learned
    • Provide a boot time SRAM (make sure the address is fixed and not aliased by DRAM once DRAM is up).
    • Provide a serial port.
    • Ron reiterates that runtime functions belong in the kernel, not persistent firmware.
    • Firmware tables always need translation by kernel, so make them text not binary.
    • Keep the mask ROM as simple as possible.
    • Don’t cheap out on SPI or flash part size. Just plan a 64MiB part.
    • Don’t reset the chipset on IE device not present.

RISC-V and UEFI: Dong Wei and Abner Chang

  • There is a UEFI Forum consisting of a board of 12 directors, 12 promoters, 42 contributors, 213 adopters.
  • UEFI and ACPI are both now handled by the UEFI Forum.
  • A RISC-V UEFI port is taking place using EDKII (EFI Development Kit II) and OVMF (Open Virtual Machine Firmware).
  • The speakers are giving a very thorough description of the UEFI boot mechanism which I’m not able to do justice. You’re best waiting for the slides+video I’m afraid.
  • The project was started a few months ago, and can now boot to a UEFI shell.
  • They have created a new RISC-V QEMU target with some PC peripherals (CMOS, PM, PCI and other devices), and also implemented RISC-V machine mode.
  • Requests for new RISC-V spec additions: a periodic timer CSR, RTC with alarm CSR, PI management mode support, … (sorry, missed some).

FreeBSD and RISC-V: Ruslan Bukin

  • FreeBSD will support RV64I in release 11.0.
  • Why use FreeBSD? Among other reasons, it gives a full-stack BSD license (RISC-V, FreeBSD, LLVM/Clang).
  • FreeBSD has been brought up on Spike.
  • The early boot assembly code will put the hardware in a known state, build a ring buffer for interrupts, initialise the page tables, enable the MMU, then finally branch to a virtual address and call into C code.
  • Userspace porting required modifications to jemalloc, csu (crt1.S, crtn.S, crti.S), libc, msun (libm), rtld-elf (the runtime linker).
  • The FreeBSD port is based on the ARMv8 port. It has a 25k line diff and took 6 months from scratch.
  • Userland started working in December. Support will now be committed to FreeBSD SVN.
  • Next plans include multicore, FPU, increasing the virtual address space, DTrace, performance monitoring counters, QEMU, …
  • Proposed changes: split sptbr into sptbr0 and sptbr1 for the user VA and the kernel VA. This means there is no need to change SPTBR when changing the privilege level, and should reduce code size.
  • For more on the project, see the relevant FreeBSD wiki page.

Building the RISC-V software ecosystem: Arun Thomas

  • “2016 is the year of RISC-V”. Or at least, the year of RISC-V software. We have a great opportunity to push the software stack forwards.
  • What can we achieve by the end of the year? Hopefully an upstreamed GNU toolchain and QEMU. More mature Clang/LLVM support, upstreamed OS support, a Debian/RISC-V port, and a start on thinking about Android and a real-time OS.
  • How do we get there? We need to recruit more RISC-V developers and make it easier for people to get started by producing more docs and specifications.
  • Right now, the RISC-V GitHub has had 48 contributors from a wide range of universities, companies and OSS projects.
  • We should present talks and tutorials at developer conferences and local user group meetings.
  • If you have local patches, upstream them!
  • How to attract developers? Could fund developers/projects via the Foundation, apply to be a Google Summer of Code mentoring organization, update the list of open bugs and future requests on GitHub and track contribution statistics.
  • We can make it much easier for people to get started by building Debian packages, upstreaming, and providing regular binary snapshots.
  • Spike is great for prototyping hardware features, but QEMU is a better tool for software development and a critical part of the RISC-V software story.
  • There’s more to specify. e.g. a platform specification (e.g. ARMv8 Server Base System Architecture), boot architecture (look at the ARMv8 Server Base Boot Requirements), RISC-V ABI, hypervisor, security.
  • Useful documents include a RISC-V Assembly Guide, some equivalent of the ARM Cortex-A Programmer’s Guide, and a New Contributor’s Guide.

Untethered lowRISC release

Over the past several months, we’ve been working to provide a standalone or ‘untethered’ SoC. Cores in the original Rocket chip rely on communicating with a companion processor via the host-target interface (HTIF) to access peripherals and I/O. This release removes this requirement, adding an I/O bus and instantiating FPGA peripherals. The accompanying tutorial, written by Wei Song, describes how to build this code release and explains the underlying structural changes. We support both the Xilinx KC705 and the lower-priced Nexys4 DDR development boards. We would gladly welcome assistance in supporting other boards.

Please note that the codebase temporarily lacks support for tagged memory included in the previous release. We plan to re-integrate tagged memory support with additional optimisations early next year. You can find a detailed list of changes in the release notes. One highlight is support for RTL simulation using the open-source Verilator tool.

This development milestone should make it easier for others to contribute. If you’re looking to get stuck in, you might want to consider looking at tasks such as:

  • Cleaning up the RISC-V Linux port, improving devicetree support and removing the host-target interface.
  • Replacing use of proprietary peripheral IP with open-source IP cores.
  • Adding support for different FPGA development boards, including Altera boards.
  • Implementing the BERI Programmable Interrupt Controller (p73), and adding necessary Linux support.

Our next development priorities are the re-integration of tagged memory support and an initial integration of a minion core design. We also expect to put out a job advert in the next few weeks for a new member of the lowRISC development team at the University of Cambridge Computer Laboratory. Interested applicants are encouraged to make informal enquiries about the post to Rob Mullins Robert.Mullins@cl.cam.ac.uk.

We hope to see many of you at the 3rd RISC-V Workshop in January, where Wei Song and Alex Bradbury will be presenting about lowRISC.

lowRISC at ORConf 2015

Please join us October 9th-11th in Geneva, Switzerland for ORConf 2015. The event is kindly being hosted by CERN at the IdeaSquare. Last year’s ORConf was home to the first public talk on lowRISC and we’re delighted this year it will also be hosting a series of lowRISC and RISC-V discussions, serving as a European lowRISC and RISC-V workshop. ORConf has in recent years grown to cover a range of open source hardware topics beyond the original OpenRISC focus. Expect presentations and discussion on free and open source IP projects, implementations on FPGA and in silicon, verification, EDA tools, licensing and embedded software, to name a few.

The event will run from 13:00 until 18:30 on Friday, 09:30 until 19:30 on Saturday, and from 09:30 until 15:30 on Sunday. Friday will consist primarily of breakout sessions, planning, and discussion regarding lowRISC. If you are already contributing or you are thinking of getting involved and want to learn more, you are very welcome to join us. If you would like to present, do submit a proposal either via the link at the ORConf website or to me at asb@lowrisc.org. We hope to see many of you there - please register here.

Second RISC-V Workshop: Day Two

It’s the second day of the second RISC-V workshop today in Berkeley, California. I’ll be keeping a semi-live blog of talks and announcements throughout the day.

Z-scale. Tiny 32-bit RISC-V Systems: Yunsup Lee

  • Z-Scale is a family of tiny cores, similar in spirit to the ARM Cortex-M family. It integrates with the AHB-Lite interconnect.
  • Contrast to Rocket (in-order cores, 64-bit, 32-bit, dual-issue options), and BOOM (a family of out-of-order cores).
  • Z-Scale is a 32-bit 3-stage single-issue in-order pipeline executing the RV32IM ISA.
  • The instruction bus and data bus are 32-bit AHB-Lite buses.
  • There is a plan to publish a microarchitecture specification to make it easy for others to implement an equivalent design in the language of their choice.
  • The Zscale is slightly larger than the Cortex-M0 due to having 32 vs 16 registers, 64-bit performance counters, and a fast multiply and divide. The plan is to add an option to generate a Zscale implementing RV32E (i.e. only having 16 registers).
  • Zscale is only 604 loc in Chisel. 274 lines for control, 267 for the datapath, and 63 for the top-level. Combine with 983loc borrowed from Rocket.
  • A Verilog implementation of Z-scale is being implemented. It’s currently 1215 lines of code.
  • The repo is here, but Yunsup needs to do a little more work to make it easily buildable. There will be a blog post on the RISC-V site soon.
  • All future Rocket development will move to the public rocket-chip repo!
  • Memory interfaces:
    • TileLink is the Berkeley cache-coherent interconnect
    • NASTI (Berkeley implementation of AXI4)
    • HASTI (implementation of AHB-lite)
    • POCI (implementation of APB)
  • The plan is to dump HTIF in Rocket, and add a standard JTAG debug interface.
  • Future work for Z-Scale includes a microarchitecture document, improving performance, implementing the C extensions, adding an MMU option, and adding more devices.

BOOM. Berkeley Out-of-order-Machine: Chris Celio

  • BOOM is a (work in progress) superscalar, out-of-order RISC-V processor written in Chisel.
  • Chris argues there’s been a general lack of effort in academia to build and evaluate out-of-order designs. As he points out, much research relies on software simulators with no area or power numbers.
  • Some of the difficult questions for BOOM are which benchmarks to use, and how many cycles you need to run. He points out that when mapped to an FPGA running at 50MHz, the SPEC benchmarks would take around a day on a cluster of FPGAs.
  • The fact that rs1, rs2, rs3 and rd are always in the same space in the RISC-V ISA allows decode and rename to proceed in parallel.
  • BOOM supports the full RV64G standard. It benefits from reusing Rocket as a library of components.
  • BOOM uses explicit renaming, with a unified register file holding both x-regs and f-regs (floating point). A unified issue window holds all instructions.
  • BOOM is synthesisable and hits 2GHz (30 FO4) in TSMC 45nm.
  • BOOM is 9kloc of its own code, and pulls in 11.5kloc from other libraries (rocket, uncore, floating point).
  • BOOM compares well to an ARM Cortex-A9 and A15 in terms of CoreMark/MHz. A 4-wide BOOM gives a similar CoreMark/MHz to the A15.
  • Future work will look at new applications, a ROCC interface, new microarchitecture designs. The plan is to open source by this winter.
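
To illustrate the point above about register specifiers always sitting in the same place, here is a minimal sketch that pulls rd, rs1, rs2 and rs3 out of a 32-bit instruction word using fixed bit positions, without looking at the opcode first. The bit positions follow the base RISC-V encoding; the helper names `register_fields` and `encode_r_type` are made up for this example and are not taken from BOOM.

```python
def register_fields(insn: int) -> dict:
    """Extract register specifiers from a 32-bit RISC-V instruction word.

    The fields live at fixed bit positions, so they can be read before the
    opcode has been decoded. Not every instruction format uses every field.
    """
    return {
        "rd": (insn >> 7) & 0x1F,    # bits 11:7
        "rs1": (insn >> 15) & 0x1F,  # bits 19:15
        "rs2": (insn >> 20) & 0x1F,  # bits 24:20
        "rs3": (insn >> 27) & 0x1F,  # bits 31:27 (R4-type only)
    }


def encode_r_type(funct7: int, rs2: int, rs1: int, funct3: int, rd: int, opcode: int) -> int:
    """Hypothetical helper: assemble an R-type instruction word for testing."""
    return (
        (funct7 << 25) | (rs2 << 20) | (rs1 << 15)
        | (funct3 << 12) | (rd << 7) | opcode
    )


# add x3, x1, x2 (opcode 0x33, funct3 0, funct7 0)
ADD_X3_X1_X2 = encode_r_type(0, 2, 1, 0, 3, 0x33)
fields = register_fields(ADD_X3_X1_X2)
print(fields["rd"], fields["rs1"], fields["rs2"])
```

Because the shifts and masks never depend on the opcode, a hardware decoder can start renaming the source and destination registers in parallel with working out what the instruction actually is.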

Fabscalar RISC-V: Rangeen Basu Roy Chowdhury

  • A FabScalar RISC-V version should be released in the next few days
  • FabScalar generates synthesisable RTL for arbitrary superscalar cores with a canonical superscalar template.
  • FabScalar uses a library of pipeline stages, providing many different designs for each canonical pipeline stage.
  • Two chips have been built with FabScalar so far (using PISA).
  • The RISC-V port was built on the previous PISA ‘Superset Core’. This had 64-bit instructions and 32-bit address and data.
  • For RISC-V FabScalar they have a unified physical register file and unified issue queue for floating point (so the FP ALU is treated like just another functional unit).
  • FabScalar RISC-V will be released as an open source tool complete with uncore components and verification infrastructure. It will be available on GitHub in the fall.
  • The license isn’t yet decided, but there’s a good chance it will be BSD.

Aristotle. A Logically Determined (Clockless) RISC-V RV32I: Matthew Kim

  • Two logical values are defined. Data and null (not data). Then define threshold operators to produce ‘null convention logic’.
  • See here for more on Null Convention Logic
  • This results in a system built entirely of combinational logic. I couldn’t hope to accurately summarise the work here. I’m afraid you might be best off waiting for the recording.
  • Currently executing compiled quicksort at approximately 400mips (without serious optimisation).
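
For readers unfamiliar with the threshold operators mentioned above, here is a rough behavioural sketch of one, based on how Null Convention Logic threshold gates are commonly described rather than on anything shown in the talk. The assumption is that an m-of-n gate asserts once at least m inputs carry data, only returns to null when every input is null, and otherwise holds its previous output. The class name `ThresholdGate` is invented for this example.

```python
class ThresholdGate:
    """Behavioural model of an m-of-n threshold gate with hysteresis.

    Inputs and outputs use 1 for 'data' and 0 for 'null'.
    """

    def __init__(self, threshold: int, n_inputs: int):
        if not 1 <= threshold <= n_inputs:
            raise ValueError("threshold must be between 1 and n_inputs")
        self.threshold = threshold
        self.n_inputs = n_inputs
        self.output = 0  # start in the null state

    def update(self, inputs) -> int:
        if len(inputs) != self.n_inputs:
            raise ValueError("wrong number of inputs")
        asserted = sum(inputs)
        if asserted >= self.threshold:
            self.output = 1  # enough inputs carry data: assert
        elif asserted == 0:
            self.output = 0  # every input is null: reset
        # otherwise hold the previous output (hysteresis)
        return self.output


gate = ThresholdGate(threshold=2, n_inputs=3)
trace = [gate.update(v) for v in ([1, 0, 0], [1, 1, 0], [1, 0, 0], [0, 0, 0])]
print(trace)
```

The hold behaviour in the middle case is what lets a circuit signal completion without a clock: the output only changes once a full data or null wavefront has arrived.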

RISC-V(erification): Prashanth Mundkur

  • Current architectures e.g. those from Intel and ARM have large errata sheets published semi-regularly. Can we do better for RISC-V?
  • Need an unambiguous formal ISA specification, coupled to a processor implementation amenable to verification, with a formal link between the two.
  • Currently specifying RISC-V in the L3 DSL. The interpreter is used as a reference oracle for processor implementations.
  • The current state of the spec is available on GitHub.
  • The work has already helped to highlight some areas where clarification is needed in the written specification
  • Next steps would involve support for the compressed instruction set and floating point, booting Linux, and using for tandem-verification (e.g. with Flue from Bluespec).
  • Hope to export usable HOL4 formal definitions, and use that to prove information properties (e.g. non-interference and information flow in low-level privileged code).
  • The talk is now moving to the second half, where Prashanth is presenting Nirav Dave’s work
  • This work is looking at rapidly verifying architectural and micro-architectural variants of RISC-V. Rely on translating between microarchitectural-states and ISA-level states.

Towards General-Purpose Tagged Memory: Wei Song

Raven3, a 28nm RISC-V Vector Processor with On-Chip DC/DC Convertors: Brian Zimmer

  • Support dynamic voltage and frequency scaling on-chip with no off-chip components.
  • Want to switch all converters simultaneously to avoid charge sharing. The clock frequency adapts to track the voltage ripple.
  • Raven has a RISC-V scalar core, vector accelerator. 16KB scalar instruction cache, 32KB shared data cache, and 8KB instruction cache. This has a 1.19mm^2 area.
  • The converter area is 0.19mm^2
  • The chip was wire-bonded on to a daughter-board, which was then connected to a larger motherboard connected to a Zedboard
  • Converter transitions are less than 20ns, which allows extremely fine-grained DVFS.
  • Raven3 showed 80% efficiency across a wide voltage range and achieved 26GFLOPS/W using the on-chip conversion.

Evaluating RISC-V Cores for PULP. An Open Parallel Ultra-Low-Power Platform : Sven Stucki

  • Approximately 40 people working on PULP in some way
  • Ultimate (ambitious) goal is 1GOPS/mW (or 1pJ/op). Also hope to achieve energy proportionality.
  • Plan is to be open source on GitHub
  • PULP has been silicon-proven in 28nm, 65nm, 130nm and 180nm. The team have tape-outs planned through to 2016.
  • Sven has replaced the OpenRISC frontend with a RISC-V decoder, hoping to take advantage of the more active community and compressed instruction set support.
  • PULP is a simple 4-stage design supporting RV32IC as well as the mul instruction from the M extension.
  • Synthesising for UMC65, they see 22kilo-gate equivalent per core
  • The OR10N core was a new OpenRISC implementation with support for hardware loops, pre/postincrement memory access and vector instructions.
  • Heading for a GlobalFoundries 28nm tapeout in Q4 2015
  • See more on PULP at the website.
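
The equivalence between 1GOPS/mW and 1pJ/op quoted above is just a unit conversion, which the following sketch spells out. The function name is made up for illustration.

```python
def energy_per_op_picojoules(gops_per_milliwatt: float) -> float:
    """Convert an efficiency in GOPS/mW into energy per operation in pJ."""
    # 1 GOPS/mW = 1e9 ops/s per 1e-3 W = 1e12 ops per joule
    ops_per_joule = gops_per_milliwatt * 1e9 * 1e3
    joules_per_op = 1.0 / ops_per_joule
    return joules_per_op * 1e12  # joules to picojoules


print(energy_per_op_picojoules(1.0))
```

Halving the efficiency to 0.5GOPS/mW doubles the energy budget to 2pJ per operation, so the two ways of stating the goal are interchangeable.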

Alex Bradbury

Second RISC-V Workshop: Day One

The second RISC-V workshop is going on today and tomorrow in Berkeley, California. I’ll be keeping a semi-live blog of talks and announcements throughout the day.

Introductions and welcome: Krste Asanović

  • The beginning of Krste’s talk will be familiar for anyone who’s seen an introduction to RISC-V before. Pleasingly, there are a lot of new faces here at the workshop so the introduction of course makes a lot of sense.
  • Although the core RISC-V effort is focused on the ISA specification, there is interest in looking to expand this to look at how to standardise access to I/O etc.
  • RV32E is a “pre-emptive strike” at those who might be tempted to fragment the ISA space for very small cores. It is a 16-register subset of RV32I.
  • The compressed instruction set has been released since the last workshop, there will be talk later today about it. It gives 25-30% code size reduction, and surprisingly there’s still lots of 16-bit encode space for additional extensions.
  • Krste makes the point that AArch64 has 8 addressing modes vs just 1 for RISC-V. The comparison of the size of the GCC/LLVM backends is perhaps less interesting given that the ARM backend actually has rather a lot more optimisations.
  • “Simplicity breeds contempt”. “So far, no evidence more complex ISA is justified for general code”
  • Will be talking about a Cray-style vector ISA extension later today (i.e. not packed-SIMD ISA or GPU-style).
  • Rocket core is only about 12kloc of Chisel in total. ~5kloc for the processor, ~2kloc for floating-point units, ~4.6kloc for ‘uncore’ (coherence hubs, L2, caches etc).
  • State of the RISC-V Nation: many companies ‘kicking the tires’. If you were thinking of designing your own RISC ISA for a project, then use RISC-V. If you need a complete working support core today then pay $M for an industry core. If you need it in 6 months, then consider spending that $M on RISC-V development.
  • RISC-V Foundation is being formed, a 501(c)(6), with Rick O’Connor as Executive Director. The mission statement is “to standardize, protect, and promote the free and open RISC-V instruction set architecture and its hardware and software ecosystem for use in all computing devices”. Plan is to publicly announce before HotChips later this year and is actively recruiting companies who want to be ‘founding’ members. You will need to be a member of the foundation in good standing to use the RISC-V trademark (unless you are a non-profit).

An update on lowRISC: Alex Bradbury

  • Many thanks to the audience for all the questions. My slides are available here.
  • Unfortunately the SHAKTI project team from India have been caught up in the malfunctioning US State Department computer systems and so haven’t been able to get visas to give their talk

Compressed Extension Proposal: David Patterson

  • Looked at existing compressed instruction sets, and tried to simplify things and throw away ideas that add complexity but provide little benefit.
  • Ended up with a specification that is pleasingly minimal, with each instruction decoding to a single RV32 instruction.
  • Keen on community feedback on additional RVC instructions. Identified a set of 24 that have little impact on current compiler-generated code, but could be useful for some use cases.
  • You can read the RVC spec here.
  • Points out that Thumb2 is only a 32-bit address ISA. Although it is slightly smaller than RV32C, the RISC-V compressed spec has the benefit of supporting 64-bit addressing.
  • Rather than adding the complexity of load multiple and store multiple, experimented with adding calls to a function that does the same thing. This hurts performance, but gives a large benefit for code size.
  • One question was on the power consumption impact. Don’t have numbers on that yet.
  • Should we require the compressed instruction set? Don’t want to add it to the minimal ‘I’ instruction set, but could add it to the standard expected by Linux.

GoblinCore64. A RISC-V Extension for Data Intensive Computing: John Leidel

  • Building a processor design aimed at data intensive algorithms and applications. Applications tend to be very cache unfriendly.
  • GC64 (Goblin Core) has a thread control unit. A very small unit (e.g. implementing RV64C) is microcoded to perform the context switching task.
  • Have added user-visible registers for thread id, thread context, task exception register etc etc.
  • The GKEY supervisor register contains a 64-bit key loaded by the kernel. It determines whether a task may spawn and execute work on neighboring task processors, providing a very rudimentary protection mechanism.
  • Making use of RV128I - it’s not just there for fun!
  • Support various instruction extensions, e.g. IWAIT, SPAWN, JOIN, GETTASK, SETTASK. Basic operations needed to write a thread management system (such as pthreads) implemented as microcoded instructions in the RISC-V ISA.
  • Also attempting to define the data structures which contain task queue data.
  • Currently looking at lowRISC-style minion cores to implement microcoded memory coalescing units.
  • Read the GC64 specification doc here.

Vector Extension Proposal: Krste Asanović

  • Goals: efficient and scalable to all reasonable design points. Be a good compiler target, and to support implicit auto-vectorisation through OpenMP and explicit SPMD (OpenCL) programming models. Want to work with virtualisation layers, and fit into the standard 32-bit encoding space.
  • Krste is critical of GPUs for general compute. I can’t summarise his arguments here, but the slides will be well worth a read. Krste has spent decades working on vector machines.
  • With packed SIMD you tend to need completely new instructions for wider SIMD. Traditional vector machines allow you to set the vector length register to provide a more uniform programming model. This makes loop strip-mining more straight-forward.
  • Add up to 32 vector data registers (v0-v31) in addition to the basic scalar x and f registers. Each vector register holds at least 3 elements, with variable bits per element. Also add 8 vector predicate registers, with 1 bit per element. Finally, add vector configuration and vector length CSRs.
  • Other features
    • Reconfigurable vector registers allow you to exchange unused architectural registers for longer vectors. e.g. if you only need 4 architectural vector registers you’ll have a larger vector length.
    • Mixed-precision support
    • Integer, fixed-point, floating-point arithmetic
    • Unit-stride, strided, indexed load/stores
    • Predication
  • Mixed-precision support allows you to subdivide a physical register into multiple narrower architectural registers as requested.
  • Same binary code works regardless of number of physical register bits and the number of physical lanes.
  • Use a polymorphic instruction encoding. e.g. a single signed integer ADD opcode that works on different size inputs and outputs.
  • Have separate integer and floating-point loads and stores, where the size is inherent in the destination register number.
  • All instructions are implicitly predicated under the first predicate register by default.
  • What is the difference between V and Hwacha? Hwacha is a non-standard Berkeley vector extension designed to push the state-of-the-art for in-order/decoupled vector machines. There are similarities in the lane microarchitecture. Current focus is bringing up OpenCL for Hwacha, with the V extension to follow.
  • Restartable page faults are supported. Similar to the DEC Vector VAX.
  • Krste pleads with people not to implement a packed SIMD extension, pointing out that a minimal V implementation would be very space efficient.
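
The strip-mining point above can be made concrete with a small software model. This is only a sketch of the general technique, not of the proposed V extension's actual instructions: `max_vector_length` stands in for whatever the hardware supports, and setting `vl` models writing the vector length register at the top of each strip.

```python
def strip_mined_add(a, b, max_vector_length):
    """Add two equal-length lists in strips of at most max_vector_length.

    Returns the result and the vector length used for each strip, so the
    effect of different hardware vector lengths can be compared.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    if max_vector_length < 1:
        raise ValueError("max_vector_length must be at least 1")
    result = []
    strip_lengths = []
    i = 0
    while i < len(a):
        # Model of setting the vector length register for this strip:
        # take as many elements as the hardware allows, or whatever is left.
        vl = min(max_vector_length, len(a) - i)
        # One 'vector instruction' operating on vl elements.
        result.extend(x + y for x, y in zip(a[i:i + vl], b[i:i + vl]))
        strip_lengths.append(vl)
        i += vl
    return result, strip_lengths


total, strips_short = strip_mined_add(list(range(10)), [1] * 10, max_vector_length=4)
_, strips_long = strip_mined_add(list(range(10)), [1] * 10, max_vector_length=8)
print(total, strips_short, strips_long)
```

The same loop runs unchanged whether the machine offers 4 or 8 elements per vector; only the number of strips differs. With packed SIMD, by contrast, a wider datapath typically means new instructions and a rewritten loop.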

Privileged Architecture Proposal: Andrew Waterman

  • Aims to provide a clean split between layers of the stack.
  • You can read the privileged spec here.
  • Supports four privilege modes. User, Supervisor, Hypervisor and Machine mode.
  • For a simple embedded system that only needs M-mode there is a low implementation cost. Only 2^7 bits of architectural state in addition to the user ISA, plus 2^7 more bits for timers and another 2^7 for basic performance counters.
  • Defined the basic virtual memory architectures to support current Unix-style operating systems. The design is fairly conventional, using 4KiB pages.
  • Why go with 4KiB pages rather than 8KiB as was the initial plan? Concerned with porting software hard-coded to expect 4KiB pages. Also concerns about internal fragmentation.
  • Physical memory attributes such as cacheability are not encoded in the page table in RISC-V. Two major reasons Andrew gives against encoding them there are that the granularity may not be tied to the page size, and that it is problematic for virtualisation. Potentially coherent DMA will become more common, meaning you needn’t worry about these attributes.
  • Want to support device interactions via a virtio-style interface.
  • The draft Supervisor Binary Interface will be released with the next privileged ISA draft. It includes common functionality for TLB shootdowns, reboot/shutdown, sending inter-processor interrupts etc etc. This is a similar idea to the PALCode on the Alpha.
  • Hardware-accelerated virtualization (H-mode) is planned, but not yet specified.
  • A draft version of v1.8 of the spec is expected this summer, with a frozen v2.0 targeted for the fall.

RapidIO. The Unified Fabric for Performance-Critical Computing: Rick O’Connor

  • There are more 10Gbps RapidIO ports on the planet than there are 10Gbps Ethernet ports. This is primarily due to the 100% market penetration in 4G/LTE and 60% global 3G.
  • The IIT Madras team are using RapidIO extensively for their RISC-V work
  • Has been doing work in the data center and HPC space. Looking to use the AXI ACE and connect that to RapidIO.
  • There is interesting work on an open source RapidIO stack.

CAVA. Cluster in a rack: Peter Hsu

  • Problem: designing a new computer is expensive. But 80% is the same every time.
  • CAVA is not the same as the Oracle RAPID project.
  • Would like to build a 1024-node cluster in a rack. DDR4 3200 = 25.6GB/s per 64-bit channel. Each 1U card would be about 600W with 32 nodes.
  • Looking at a 96-core 10nm chip (scaled from a previous 350nm project). Suppose you have a 3-issue out of order core (600K gates) and 32KiB I+D cache, that would be around 0.24mm^2 in 10nm.
  • Estimate a vector unit might be around the same area.
  • Peter has detailed estimates for per-chip power, but it’s probably best to refer to the slides for these.
  • Research plan for the cluster involves a unified simulation environment, running on generic clusters of x86 using open-source software. Everyone uses the same simulator to perform “apples to apples” comparison. This allows easy replication of published work.
  • Simulation infrastructure will involve a pipeline simulator, SoC simulator (uncore logic), and a network simulator.
  • Interested in applying Cray-style vectors to database workloads
  • Could also have the ability to make associativity ways and possibly individual cache lines lockable.
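
The DDR4 bandwidth figure above, and what the per-card numbers imply for a full rack, can be checked with some back-of-envelope arithmetic. The rack-level totals below are my own derivation from the quoted figures (1024 nodes, 32 nodes and about 600W per 1U card), not numbers given in the talk, and they ignore networking, cooling and power-conversion overheads. The function names are invented for this sketch.

```python
def channel_bandwidth_gb_per_s(mega_transfers_per_s: float, bus_width_bits: int) -> float:
    """Peak bandwidth of one memory channel in GB/s (decimal gigabytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


def cards_needed(total_nodes: int, nodes_per_card: int) -> int:
    """Number of cards required, rounding up to a whole card."""
    return -(-total_nodes // nodes_per_card)


def rack_power_kw(total_nodes: int, nodes_per_card: int, watts_per_card: float) -> float:
    """Total card power in kW, ignoring everything except the cards."""
    return cards_needed(total_nodes, nodes_per_card) * watts_per_card / 1000


print(channel_bandwidth_gb_per_s(3200, 64))  # DDR4-3200 on a 64-bit channel
print(cards_needed(1024, 32))
print(rack_power_kw(1024, 32, 600))
```

On these assumptions the cluster needs 32 cards drawing roughly 19.2kW in total, or a little under 19W per node.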

Alex Bradbury

Summer of Code students for lowRISC

lowRISC was fortunate enough to be chosen as a mentoring organisation in this year’s Google Summer of Code. The Google Summer of Code program funds students to work on open source projects over the summer. We had 52 applications across the range of project ideas we’ve been advertising. As you can see from the range of project ideas, lowRISC is taking part as an umbrella organisation, working with a number of our friends in the wider open source software and hardware community. We were allocated three slots from Google, and given the volume of high quality applications making the selection was tremendously difficult. We have actually been able to fund an additional three applicants from other sources, but even then there were many promising projects we couldn’t support. We are extremely grateful to all the students who put so much time and effort in to their proposals, and to everyone who volunteered to mentor. The six ‘summer of code’ projects for lowRISC are:

  • An online Verilog IDE based on YosysJS. Baptiste Duprat mentored by Clifford Wolf

    • Baptiste will be working with an Emscripten-compiled version of the Yosys logic synthesis tool, building an online Verilog IDE on top of it which would be particularly suitable for training and teaching materials. A big chunk of the proposed work is related to visualisation of the generated logic. Improving the accessibility of hardware design is essential for growing the potential contributor base to open source hardware projects like lowRISC, and this is just the start of our efforts in that space.
  • Porting seL4 to RISC-V. Hesham ALMatary mentored by Stefan Wallentowitz

    • seL4 is a formally verified microkernel, which currently has ports for x86 and ARM. Hesham will be performing a complete port to RISC-V/lowRISC. Security and microkernels are of great interest to many in the community. It’s also a good opportunity to expand RISC-V platform support and to put the recently released RISC-V Privileged Architecture Specification through its paces. Hesham previously performed a port of RTEMS to OpenRISC.
  • Porting jor1k to RISC-V. Prannoy Pilligundla mentored by Sebastian Macke

    • jor1k is by far the fastest Javascript-based full system simulator. It also features a network device, filesystem support, and a framebuffer. Prannoy will be adding support for RISC-V and looking at supporting some of the features we offer on lowRISC such as minion cores or tagged memory. This will be great not only as a demo, but will also have practical uses in tutorial or educational material.
  • TCP offload to minion cores using rump kernels. Sebastian Wicki mentored by Justin Cormack

    • The intention here is to get a rump kernel (essentially a libified NetBSD) running bare-metal on a simple RISC-V system and evaluate exposing the TCP/IP stack for use by other cores. e.g. a TCP/IP offload engine running on a minion core. TCP offload is a good starting point, but of course the same concept could be applied elsewhere. For example, running a USB mass storage driver (and filesystem implementation) on a minion core and providing a simple high-level interface to the application cores.
  • Extend Tavor to support directed generation of assembly test cases. Yoann Blein mentored by Markus Zimmermann

    • Tavor is a sophisticated fuzzing tool implemented in Go. Yoann will be extending it to more readily support specifying instruction set features and generating a fuzzing suite targeting an ISA such as RISC-V. Yoann has some really interesting ideas on how to go about this, so I’m really interested in seeing where this one ends up.
  • Implement a Wishbone to TileLink bridge and extend TileLink documentation. Thomas Repetti mentored by Wei Song

    • Wishbone is the interconnect of choice for most existing open source IP cores, including most devices on opencores.org. The Berkeley Rocket RISC-V implementation uses its own ‘TileLink’ protocol (we provide a brief overview). By providing a reusable bridge, this project will allow the easy reuse of opencores devices and leverage the many man-years of effort that have already gone in to them.

The first 3 of the above projects are part of Google Summer of Code and the bottom 3 directly funded, operating over roughly the same timeline. We’re also going to be having two local students interning with us here at the University of Cambridge Computer Lab starting towards the end of June, so it’s going to be a busy and productive summer. It bears repeating just how much we appreciate the support of everyone involved so far - Google through their Summer of Code initiative, the students, and those who’ve offered to act as mentors. We’re very excited about these projects, so please join us in welcoming the students involved to our community. If you have any questions, suggestions, or guidance please do leave them in the comments.

Alex Bradbury

lowRISC tagged memory preview release

We’re pleased to announce the first lowRISC preview release, demonstrating support for tagged memory as described in our memo. Our ambition with lowRISC is to provide an open-source System-on-Chip platform for others to build on, along with low-cost development boards featuring a reference implementation. Although there’s more work to be done on the tagged memory implementation, now seemed a good time to document what we’ve done in order for the wider community to take a look. Please see our full tutorial which describes in some detail the changes we’ve made to the Berkeley Rocket core, as well as how you can build and try it out for yourself (either in simulation, or on an FPGA). We’ve gone to some effort to produce this documentation, both to document our work, and to share our experiences building upon the Berkeley RISC-V code releases in the hopes they’ll be useful to other groups.

The initial motivation for tagged memory was to prevent control-flow hijacking attacks, though there are a range of other potential uses including fine-grained memory synchronisation, garbage collection, and debug tools. Please note that the instructions used to manipulate tagged memory in this release (ltag and stag) are only temporary and chosen simply because they require minimal changes to the core pipeline. Future work will include exploring better ISA support, collecting performance numbers across a range of tagged memory uses and tuning the tag cache. We are also working on developing an ‘untethered’ version of the SoC with the necessary peripherals integrated for standalone operation.
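
To give a flavour of how tags can help against control-flow hijacking, here is a toy software model. It is purely illustrative and makes several assumptions that are not part of this release: that an ordinary store clears the tag on the word it writes, that a single tag value marks a saved return address, and that the check happens in a hypothetical `checked_return` helper. Only the names ltag and stag come from the release; their behaviour here is a simplification.

```python
RETURN_ADDRESS_TAG = 1  # assumed tag value marking a saved return address


class TaggedMemory:
    """Toy word-addressed memory where every word carries a small tag."""

    def __init__(self, size_words: int):
        self.data = [0] * size_words
        self.tags = [0] * size_words

    def store(self, addr: int, value: int) -> None:
        # Assumed policy: an ordinary store clears the tag.
        self.data[addr] = value
        self.tags[addr] = 0

    def stag(self, addr: int, tag: int) -> None:
        """Model of a store-tag operation: set the tag on one word."""
        self.tags[addr] = tag

    def ltag(self, addr: int) -> int:
        """Model of a load-tag operation: read the tag on one word."""
        return self.tags[addr]


def checked_return(memory: TaggedMemory, slot: int) -> int:
    """Hypothetical check: only return through a word still tagged as a return address."""
    if memory.ltag(slot) != RETURN_ADDRESS_TAG:
        raise RuntimeError("return address tag missing: possible overwrite")
    return memory.data[slot]


mem = TaggedMemory(4)
mem.store(0, 0x1000)              # function prologue saves the return address
mem.stag(0, RETURN_ADDRESS_TAG)   # and marks the word
safe_target = checked_return(mem, 0)

mem.store(0, 0xDEAD)              # a buffer overflow overwrites the slot
try:
    checked_return(mem, 0)
    hijack_detected = False
except RuntimeError:
    hijack_detected = True
print(hex(safe_target), hijack_detected)
```

In this model an attacker who can only issue ordinary stores cannot forge a valid return address, because the overwrite strips the tag that the check relies on.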

If you’ve visited lowrisc.org before, you’ll have noticed we’ve changed a few things around. Keep an eye on this blog (and its RSS feed) to follow developments - we expect to be updating at least every couple of weeks. We’re very grateful to the RISC-V team at Berkeley for all their support and guidance. A large portion of the credit for this initial code release goes to Wei Song, who’s been working tirelessly on the HDL implementation.