The state of GHC on ARM

| tagged with
  • GHC
  • ARM
  • linking

tl;dr, GHC 7.8 should run well on ARM with the LLVM code generator and dynamic linking. The process of building GHC itself might be a bit hairy due to linker terribleness. After this, however, you’ll have fully-featured GHC installation supporting all of the modern amenities including GHCi and Template Haskell.

ARM has long been a second-class citizen in the GHC world. At first, this was due to the lack of a native code generator. This however, was resolved with the arrival of the LLVM code generator. At this point the primary sticking point is the lack of a runtime linker, which GHC relies on for evaluating Template Haskell splices and GHCi. Over the last few years, there have been an effort by me and others to try to resolve this.

Much of this effort has centered around improving ARM support in GHC’s own home-brew linker, neccessary as GHC until recently has loaded modules by linking static objects at runtime. Unfortunately, writing a fully functional linker is no small task, especially in the case of ARM where one must worry about no fewer than two instruction sets and a seemingly endless set of relocation types (the curious reader is invited to look over the ELF for ARM specification which defines some of what the linker must do; the rest is documented in the minds of various linker authors). Even while the code has seen numerous eyes over the years, GHC’s runtime linker still suffers from some rather serious bugs.

Thankfully, there is another path for ARM support: recently, GHC has been gradually improving its support for dynamic linking. This carries with it the advantage that we can punt the details of section loading and relocation handling to the system’s dynamic linker, opening up an easy migration path away from the maintenance burden of GHC’s runtime linker.

PLT handling

Note: This section will be a bit technical; feel free to skip it.

Enabling dynamic linking with the LLVM backend, while easier than writing a linker from the ground up, is complicated by a small detail of relocation handling. The relocation of code references is often handled through the procedure linkage table (PLT). This is a table of trampoline fragments that allows the runtime linker to intercept function calls. The PLT is often the first choice of the compile-time linker when emitting function references as it allows the runtime linker perform just-in-time relocation, reducing process start-up times by deferring relocation of rarely used code to the point when the code is needed.

As it turns out, the PLT interacts very badly with GHC’s runtime system (RTS), which requires that all Haskell objects (constructors or closures) are preceded by an info table, where RTS can track garbage collection and profiling information. If the PLT is used, pointers to what might otherwise refer to a closure (e.g. executable code) and its preceding info table instead point to a linker-generated trampoline. While things may appear to run fine at first, calamity ensues after the run-time system attempts its first garbage collection pass, usually ending in the untimely death of the process with an “evacuate(static): strange closure type” error from the RTS.

For this reason, GHC must ensure that no relocations in Haskell code go through the PLT. In assembler, one typically indicates to the compile-time linker that a symbol is a code (which can go through the PLT) or data (which must be handled with any of a number of other relocation schemes) with a .type directive. Unfortunately, it seems that LLVM provides no analogous mechanism in its own intermediate language. Lacking this, we must settle for a workaround, adding yet another hack to the LLVM code generator’s assembler mangler to add .type @object directives to every emitted symbol.

A note about performance (technical)

In discussions on ghc-dev Simon Marlow expressed the very reasonable concern that LLVM may not perform an important optimization used by the native code generator in dynamically linked code. Emitting position-independent code like is seen in a dynamic library requires a small amount of overhead due to relocations. However, this cost can be avoided for calls within an object as the relative offsets between symbols are known by the linker at compile-time. The native code generator goes to some effort to ensure that local references are used whenever possible. Thankfully, it seems that LLVM is smart enough to do the right thing with intra-object references by default. Unfortunately, it’s not entirely clear to me where LLVM pulls this trick from.

Installation

With the release of GHC 7.8 imminent, I’m happy to say that ARM support is now finally looking sustainable. With the dynamic linking now viable, the surface area of the ARM-specific code in GHC has been greatly reduced, most of the nasty details now being handled by either LLVM or the system’s dynamic linker.

Unfortunately, there are still a few rough edges, mostly in the build process of GHC itself. This is largely due to a bug in bfd ld’s (the old ld implementation distributed with binutils) relocation support on ARM. Unfortunately, bfd ld is still the default linker in most Linux distributions. In short, ld.bfd needlessly produces ARM_R_COPY relocations for some Haskell objects. During dynamic linking, the linker copies the body of the function but not its info table. This will usually manifest itself as a failure of dll-split during the build process with internal error: evacuate(static): strange closure type 0.

Until this is fixed, the newer gold linker will be the only supported linker with GHC on ARM (at least when tables-next-to-code is enabled). GHC now checks that the linker being used isn’t affected by the bug in question, so hopefully users won’t be affected beyond needing to switch linkers.

Unfortunately, gold doesn’t support producing dynamic objects with certain relocation types that seem to appear in (at least Debian’s) GHC 7.6 object code1. For added fun, GHC 7.6 tries to blindly pass bfd ld-specific flags to the linker, causing gold to barf.

All of this unfortunately means that users upgrading from GHC 7.6 to 7.8 may need to perform an awkward dance to ensure that the correct linker over the course of the build process. I’ve written a small script to automate this process which works for me (tested on Debian Jessie on an Hardkernel Odroid XU). This ensures that ld.bfd is used while building with the phase 0 (GHC 7.6) compiler and ld.gold is used thereafter. After GHC has been built, ld.gold should be used.

There are still a few more patches that should go in to GHC 7.8 before release to make building on ARM as painless as possible.

In sum, bringing up a GHC build on my ARM Debian installation isn’t so different from what one would expect on any other machine,

$ sudo apt-get install ghc automake build-essential cabal-install
$ cabal update
$ cabal install happy alex
$ git clone git://git.haskell.org/ghc
$ cd ghc
$ git checkout --track origin/ghc-7.8
$ ./sync-all get
$ ./boot
$ ./configure --with-ghc=/usr/bin/ghc
$ wget https://gist.githubusercontent.com/bgamari/9399430/raw/build-ghc-arm.sh
$ chmod ugo+rx build-ghc-arm.sh
$ ./build-ghc-arm.sh -j4
$ make test

The proof is in the pudding

Of course, it is easy to claim that a compiler works. The proof, however, is in the testsuite results,

Unexpected results from:
TEST="T8557 T7386 print021 print020 print022 Flags02 T8766 ghci053 T703 linker_unload T7574 T7850 rangeTest ghcirun002 ghcirun001"

OVERALL SUMMARY for test run started at Fri Mar  7 05:27:05 2014 CET
 0:58:11 spent to go through
    3895 total tests, which gave rise to
   12782 test cases, of which
    9299 were skipped

      28 had missing libraries
    3368 expected passes
      56 expected failures

      20 caused framework failures
       0 unexpected passes
      16 unexpected failures

That’s right, you can count the testsuite failures on two hands and a foot!


  1. Namely ld.gold will produce errors of the form,

    /usr/bin/ld.gold: error: /usr/lib/ghc/libHSrts.a(GetTime.o): requires unsupported dynamic reloc R_ARM_THM_MOVW_ABS_NC; recompile with -fPIC
    ↩︎