Understanding GHC Core

| tagged with
  • ghc
  • core
  • work-in-progress

This document is a work-in-progress. If there is something that you would like to see discussed feel free to let me know.

While there have been a variety of writings about GHC’s Core representation, the language is a living entity which changes quickly in ways that aren’t always well documented. This

What is Core?

GHC Core is a close relative of the System FC language and the first of a series of intermediate representations used by GHC to turn Haskell into machine code (the others being STG and C–).

While on face value Core appears very similar to Haskell, there are a few

Type application A transformation-based optimiser for Haskell

Coercions

Core preserves newtypes as types unique from their representation (e.g. the type that they wrap). Wrapping and unwrapping of newtype values are represented by applications of the cast function. This essentially has the type,

data Coercion a b

cast :: a -> Coercion a b -> b

Coercions are GHC’s internal representation for type equalities1. On account of various

TODO ~R#

@~: type application of a coercion. For instance myFunction @~ (<a>_N :: a GHC.Prim.~# a) means that myFunction is being given a coercion argument <a>_N, a nominal coercion between a and a.

cast

Sym

Sub

<ty>_R is a type parameter with representational role. Roughly speaking this means that given a type constructor T and types A and B, T <A>_R and T <B>_R are representationally distinct. @~

IdInfo

Attached to identifiers GHC will often keep metadata known as IdInfo to inform various optimization passes. This information can be found within brackets in Core’s textual representation. In the event that you just want to merely see the “shape” of a program IdInfo annotations are often noise that can be safely suppressed with -dsuppress-idinfo.

The information in IdInfo can be useful when examining the optimization with respect to specific passes. In this case it can be useful hto have a rough understanding of what these annotations mean. A brief summary of the IdInfo annotations used by GHC 7.10 is included below although these will likely change with future compiler versions.

annotation example meaning
Caf Caf=NoCafRefs The value is a a function or static constructor that refers to no CAFs defined in basicTypes/IdInfo.hs(CafInfo).
Arity Arity=1 An ArityInfo of \(n\) tells us that partial application of the value to up to \(n-1\) value arguments does essentially no work. defined in basicTypes/IdInfo.hs(ArityInfo)
GlbId GblId[[RecSel]] TODO
Unf Unf=Unf{...} The value has an associated unfolding template.
SpecInfo TODO Records the available specializations of the identifier defined in basicTypes/IdInfo.hs(SpecInfo).
OS OS=OneShot Records that the identifier will be used precisely once TODO
Str Str=DmdType <S(L...) The result of demand analysis. defined in basicTypes/Demand.hs(DmdType).
Occ Occ=Dead The result of occurrence analysis

Unfolding templates

Unfolding (sometimes known as inlining) is one of the central optimizations performed by GHC’s Core optimizer. In this optimization, identifier occurrences are substituted by the right-hand side of their respective definition.

Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
        WorkFree=True, Expandable=True,
        Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=False)
        Tmpl= \ (ds_daj5 [Occ=Once!] :: Memcpy2Dargs) ->
                case ds_daj5
                of _ [Occ=Dead]
                { Memcpy2Dargs _ [Occ=Dead] ds2_daj7 [Occ=Once] _ [Occ=Dead]
                               _ [Occ=Dead] _ [Occ=Dead] _ [Occ=Dead] _ [Occ=Dead] _ [Occ=Dead]
                               _ [Occ=Dead] _ [Occ=Dead] ->
                ds2_daj7
                }}]

Src

How are values used?

GHC uses a variety of local analyses to determine various ways function arguments are used (or unused). These analyses include,

  • strictness (or sometimes called demand) analysis: Is an argument forced (partially or completely) by a function?
  • absence analysis: Is an argument entirely unused by the function?
  • occurrence analysis:

When optimization is enabled GHC runs these analyses on all top-level bindings and records the results in the binding’s IdInfo. These results are then examined by later stages of compilation to help guide further optimizations (notably the worker-wrapper transformation).

In the external Core output, the results of demand analysis can be found in the Str= section of IdInfo,

foldl' :: forall a b. (b -> a -> b) -> b -> [a] -> b
[Str=DmdType <L,C(C1(U))><S,1*U><S,1*U>]
foldl' = ...

Here we see the IdInfo of the usual strict left fold on list. Here we see that GHC has associated with each argument a tuple enclosed in angle brackets. These tuples consist of the strictness and usage annotations of the argument. Let’s look at each,

argument type demand annotation
b -> a -> a <L,C(C1(U))>
b <S,1*U>
[a] <S,1*U>

We will discuss the meaning of these annotations in the following sections.

Demand analysis

Demand analysis (or strictness analysis) is an analysis which attempts to determine how the arguments of a function are used by its definition. The result of the analysis is a demand signature

signature description
strictness
L lazy. As far as the analysis could tell, the argument isn’t demanded
S head-strict. The analysis determined that the argument will be evaluated at least to head-normal form.
S(...) structured demand. The argument will itself be evaluated to at least head-normal form and the values within it (e.g. its fields) will be evaluated to at least the given strictness.
absence
U used. The binder is used
1* used once. The argument is used precisely once.
A absent == definitely unused. Indicates that the binder is certainly not used.
B hyper-strict.
C call. Applied to functio
suffixes
m returns a constructed product result.
b bottoming. The function is known to diverge.

Let’s consider a few examples (stolen from the paper),

-- Demand signature: <S,1*U>
null :: [a] -> Bool
null v = case v of []   -> True
                   x:xs -> False

The S here arises from the case-analysis, which implies that if we evaluate null v then v must be evaluated to at least WHNF.

-- Demand signature: <S(SL),1*U(1*U,A)>
fst :: (a,b) -> a
fst p = case p of (x,_) -> x
-- Demand signature: <S,1*U(U,U)>m
swap :: (a,b) -> (b,a)
swap p = case p of (x,y) -> (y,x)
-- Demand signature: <S,1*U(U)><L,A>m
f :: Int -> Int -> Int
f x y = x + 1
-- Demand signature: <L,C(C1(U))><S,1*U><S,1*U>
foldl' :: (a -> b -> a) -> a -> [b] -> a
foldl' f x0

Absence analysis

Consider the function,

f :: (Int, Int) -> Int
f (a,_) = a + 42

The strictness analysis discussed above allows us to determine that the first element of the tuple argument is demanded strictly. This observation allows to perform the [worker-wrapper transformation][worker-wrapper] on f, resulting in,

f_workerer

However, what of the second element?

It is quite obvious that the second element is not called for at all

Trick: Locating Outputable instances

As GHC Core is an internal representation there is a tendency for documentation to fall behind what GHC actually emits. For this reason, it can be useful to quickly find the Outputable instance defintion from which a given piece of output originated. ghci is quite handy for this purpose,

$ ghci
GHCi, version 7.10.2.20151118: http://www.haskell.org/ghc/  :? for help
λ> :set -package ghc
package flags have changed, resetting and loading new packages...
λ> import GHC
λ> import Outputable
λ> :info Outputable
class Outputable a where
  ppr :: a -> SDoc
  pprPrec :: Rational -> a -> SDoc
    -- Defined in ‘Outputable’
instance Outputable BreakInfo -- Defined in ‘ByteCodeInstr’
instance Outputable Type -- Defined in ‘TypeRep’
instance Outputable TyThing -- Defined in ‘TypeRep’
instance Outputable TyCon -- Defined in ‘TyCon’
instance Outputable ClsInst -- Defined in ‘InstEnv’

Optimizations

Float-out

Float-in

Float-

In brackets: IdInfo

Role annotations

Glossary

binding

the introduction of a name for a value. For instance, let hello = 5 in ... is a binding.

binder

the name of a binding. For instance, in let hello = 5 in ... hello is the binder


  1. GHC’s Core language is known in the literature as System FC, owing to the fact that it is closely related to System F. In fact, the “C” in “FC” stands for “coercion”, indicative of the fact that System FC is simply System F with coercions.