How we used differential testing to rapidly find and fix missed optimisation opportunities in LLVM's RISC-V backend

by Luís Marques,

At this October 2020 LLVM Developers’ Meeting I presented a poster about how, with a surprisingly simple tool, we were able to rapidly identify, isolate, and fix a range of missed optimisation opportunities in LLVM’s RISC-V backend.

The tool works by generating random C programs, compiling each program with both Clang and GCC (targeting RISC-V) and comparing the assembly generated by both compilers. If it estimates that Clang/LLVM generated worse code than GCC then it saves that case for further analysis. We found that, even with a simple implementation, this tool was surprisingly effective.

The tool works in multiple stages:

  • The random code generator (a simple recursive descent generator) directly emits the code, without building an AST. This is easy to customise, so we can use our knowledge of the RISC-V ISA and of the backend to ensure it generates C code that is more likely to identify problematic cases.
  • To perform quality estimation and comparison of the assembly output, the tool assigns costs to the individual instructions, and adds up the costs for the entire assembly sequence. An individual instruction cost is determined by its operation kind (load/store/branch/arithmetic/…). The tool detects cases where Clang could improve by checking if its assembly output has a higher total cost than that of GCC.
  • Detected cases are minimized by running C-Reduce in combination with the tool. By minimizing the source program while ensuring that the quality difference is preserved we produce concise test cases that isolate code quality issues.

This straight-forward approach has proven to be powerful enough to detect issues across a variety of categories. Examples include:

  • Poor constant materialisation sequences;
  • Unnecessary sign extensions;
  • Using branches instead of RISC-V comparison instructions;
  • Not using the offset immediates in some load and store instructions;
  • Unnecessary floating-point conversions;
  • Dead instructions.

We have already addressed a range of the issues we found. As we take care of all of the low-hanging fruit, we can also keep improving the tool to detect more subtle cases.

If you missed the poster session, you can still watch a 5-minute video summary, shown below.