Understanding GHC Core
This document is a work-in-progress. If there is something that you would like to see discussed feel free to let me know.
While there have been a variety of writings about GHC’s Core representation, the language is a living entity which changes quickly in ways that aren’t always well documented. This
What is Core?
GHC Core is a close relative of the System FC language and the first of a series of intermediate representations used by GHC to turn Haskell into machine code (the others being STG and C–).
While on face value Core appears very similar to Haskell, there are a few
Type application A transformation-based optimiser for Haskell
Coercions
Core preserves newtypes
as types unique from their representation (e.g. the type that they wrap). Wrapping and unwrapping of newtype
values are represented by applications of the cast
function. This essentially has the type,
data Coercion a b
cast :: a -> Coercion a b -> b
Coercion
s are GHC’s internal representation for type equalities^{1}. On account of various
TODO ~R#
@~
: type application of a coercion. For instance myFunction @~ (<a>_N :: a GHC.Prim.~# a)
means that myFunction
is being given a coercion argument <a>_N
, a nominal coercion between a
and a
.
cast
Sym
Sub
<ty>_R
is a type parameter with representational role. Roughly speaking this means that given a type constructor T
and types A
and B
, T <A>_R
and T <B>_R
are representationally distinct. @~
IdInfo
Attached to identifiers GHC will often keep metadata known as IdInfo
to inform various optimization passes. This information can be found within brackets in Core’s textual representation. In the event that you just want to merely see the “shape” of a program IdInfo
annotations are often noise that can be safely suppressed with -dsuppress-idinfo
.
The information in IdInfo
can be useful when examining the optimization with respect to specific passes. In this case it can be useful hto have a rough understanding of what these annotations mean. A brief summary of the IdInfo annotations used by GHC 7.10 is included below although these will likely change with future compiler versions.
annotation | example | meaning |
---|---|---|
Caf |
Caf=NoCafRefs |
The value is a a function or static constructor that refers to no CAFs defined in basicTypes/IdInfo.hs(CafInfo) . |
Arity |
Arity=1 |
An ArityInfo of \(n\) tells us that partial application of the value to up to \(n-1\) value arguments does essentially no work. defined in basicTypes/IdInfo.hs(ArityInfo) |
GlbId |
GblId[[RecSel]] |
TODO |
Unf |
Unf=Unf{...} |
The value has an associated unfolding template. |
SpecInfo |
TODO | Records the available specializations of the identifier defined in basicTypes/IdInfo.hs(SpecInfo) . |
OS |
OS=OneShot |
Records that the identifier will be used precisely once TODO |
Str |
Str=DmdType <S(L...) |
The result of demand analysis. defined in basicTypes/Demand.hs(DmdType) . |
Occ |
Occ=Dead |
The result of occurrence analysis |
Unfolding templates
Unfolding (sometimes known as inlining) is one of the central optimizations performed by GHC’s Core optimizer. In this optimization, identifier occurrences are substituted by the right-hand side of their respective definition.
Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True,
Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=False)
Tmpl= \ (ds_daj5 [Occ=Once!] :: Memcpy2Dargs) ->
case ds_daj5
of _ [Occ=Dead]
{ Memcpy2Dargs _ [Occ=Dead] ds2_daj7 [Occ=Once] _ [Occ=Dead]
_ [Occ=Dead] _ [Occ=Dead] _ [Occ=Dead] _ [Occ=Dead] _ [Occ=Dead]
_ [Occ=Dead] _ [Occ=Dead] ->
ds2_daj7
}}]
Src
How are values used?
GHC uses a variety of local analyses to determine various ways function arguments are used (or unused). These analyses include,
- strictness (or sometimes called demand) analysis: Is an argument forced (partially or completely) by a function?
- absence analysis: Is an argument entirely unused by the function?
- occurrence analysis:
When optimization is enabled GHC runs these analyses on all top-level bindings and records the results in the binding’s IdInfo
. These results are then examined by later stages of compilation to help guide further optimizations (notably the worker-wrapper transformation).
In the external Core output, the results of demand analysis can be found in the Str=
section of IdInfo
,
foldl' :: forall a b. (b -> a -> b) -> b -> [a] -> b
[Str=DmdType <L,C(C1(U))><S,1*U><S,1*U>]
foldl' = ...
Here we see the IdInfo
of the usual strict left fold on list. Here we see that GHC has associated with each argument a tuple enclosed in angle brackets. These tuples consist of the strictness and usage annotations of the argument. Let’s look at each,
argument type | demand annotation |
---|---|
b -> a -> a |
<L,C(C1(U))> |
b |
<S,1*U> |
[a] |
<S,1*U> |
We will discuss the meaning of these annotations in the following sections.
Demand analysis
Demand analysis (or strictness analysis) is an analysis which attempts to determine how the arguments of a function are used by its definition. The result of the analysis is a demand signature
signature | description |
---|---|
strictness | |
L |
lazy. As far as the analysis could tell, the argument isn’t demanded |
S |
head-strict. The analysis determined that the argument will be evaluated at least to head-normal form. |
S(...) |
structured demand. The argument will itself be evaluated to at least head-normal form and the values within it (e.g. its fields) will be evaluated to at least the given strictness. |
absence | |
U |
used. The binder is used |
1* |
used once. The argument is used precisely once. |
A |
absent == definitely unused. Indicates that the binder is certainly not used. |
B |
hyper-strict. |
C |
call. Applied to functio |
suffixes | |
m |
returns a constructed product result. |
b |
bottoming. The function is known to diverge. |
Let’s consider a few examples (stolen from the paper),
-- Demand signature: <S,1*U>
null :: [a] -> Bool
null v = case v of [] -> True
x:xs -> False
The S
here arises from the case-analysis, which implies that if we evaluate null v
then v
must be evaluated to at least WHNF.
-- Demand signature: <S(SL),1*U(1*U,A)>
fst :: (a,b) -> a
fst p = case p of (x,_) -> x
-- Demand signature: <S,1*U(U,U)>m
swap :: (a,b) -> (b,a)
swap p = case p of (x,y) -> (y,x)
-- Demand signature: <S,1*U(U)><L,A>m
f :: Int -> Int -> Int
f x y = x + 1
-- Demand signature: <L,C(C1(U))><S,1*U><S,1*U>
foldl' :: (a -> b -> a) -> a -> [b] -> a
foldl' f x0
Absence analysis
Consider the function,
f :: (Int, Int) -> Int
f (a,_) = a + 42
The strictness analysis discussed above allows us to determine that the first element of the tuple argument is demanded strictly. This observation allows to perform the [worker-wrapper transformation][worker-wrapper] on f
, resulting in,
f_workerer
However, what of the second element?
It is quite obvious that the second element is not called for at all
Trick: Locating Outputable
instances
As GHC Core is an internal representation there is a tendency for documentation to fall behind what GHC actually emits. For this reason, it can be useful to quickly find the Outputable
instance defintion from which a given piece of output originated. ghci
is quite handy for this purpose,
$ ghci
GHCi, version 7.10.2.20151118: http://www.haskell.org/ghc/ :? for help
λ> :set -package ghc
package flags have changed, resetting and loading new packages...
λ> import GHC
λ> import Outputable
λ> :info Outputable
class Outputable a where
ppr :: a -> SDoc
pprPrec :: Rational -> a -> SDoc
-- Defined in ‘Outputable’
instance Outputable BreakInfo -- Defined in ‘ByteCodeInstr’
instance Outputable Type -- Defined in ‘TypeRep’
instance Outputable TyThing -- Defined in ‘TypeRep’
instance Outputable TyCon -- Defined in ‘TyCon’
instance Outputable ClsInst -- Defined in ‘InstEnv’
Optimizations
Float-out
Float-in
Float-
In brackets: IdInfo
Role annotations
Glossary
- binding
the introduction of a name for a value. For instance,
let hello = 5 in ...
is a binding.- binder
the name of a binding. For instance, in
let hello = 5 in ...
hello
is the binder