The tale of keepAlive#
bgamari - 2021-06-08
In GHC 9.0 we have reworked the primops used for guaranteeing heap object
lifetime in the presence of off-heap references.
The result is a new primop, keepAlive#
, which largely replaces the venerable
touch#
operation. This post will give some background on touch#
, describe
the rationale for this change, discuss some of the alternatives that were
evaluated on the way to this new design, and provide some guidance on how this
change affects users.
Users who merely want to know how to adapt their existing touch#
uses for the
new keepAlive#
should feel free to skip to the last section of
the post.
The challenge of foreign references
One of the joys of working in a garbage collected language is that one is freed from the burden of thinking about object lifetime. Instead, it is the responsibility of the runtime to retain each object as long as it is reachable from the roots of the heap graph.
Moreover, one of the great joys of Haskell is its robust foreign function interface, giving users the ability to easily weave together Haskell and foreign code in a safe and efficient manner.
However, taken together these two features pose significant challenges to language implementors; specifically, consider a program such as:
-- Recall that:
-- newPinnedByteArray :: Int -> IO (MutableByteArray RealWorld)
-- setByteArray :: Prim a => MutableByteArray RealWorld -> Int -> Int -> a -> IO ()
-- mutableByteArrayContents :: MutableByteArray RealWorld -> Ptr a
import Data.Primitive.ByteArray
= printDots 10
main
printDots :: Int -> IO ()
= do
printDots len <- newPinnedByteArray len
arr 0 len (fromIntegral $ ord '.' :: Word8)
setByteArray arr let ptr :: Ptr CChar
= byteArrayContents arr
ptr
puts ptr
-- Recall that:
-- void puts(const char *);
import ccall "puts"
foreign puts :: Ptr CChar -> IO ()
Here we have allocated an array on the Haskell heap, initialized it, and passed
it to the C puts
function, which will print it to stdout
. On its face, this
program looks completely innocuous: it contains no mentions of unsafe
functions, the array’s bounds are clearly respected, and puts
is specified to
treat its argument as immutable. Indeed, running the program even prints the
expected result (at least most of the time).
However, despite its appearance, this program is fatally flawed. Specifically,
note that puts
is (implicitly) declared to be a safe
foreign call; this
means that GHC is free to do as it pleases (e.g. continue running another
Haskell thread or, perhaps, initiate a GC) while the C call executes in another
operating system thread. This is problematic since upon entering puts
there
are no references to arr
remaining on the Haskell heap, leading the collector
to conclude that the array can be safely reclaimed. However, arr
is not
dead: it is still in use via the off-heap reference passed to puts
.
However, this reference is not visible to the garbage collector.
Premature collection of live heap objects may result in arbitrary heap
corruption, which may cause a program crash at best and incorrect runtime
result at worst. Also note that while FFI is the most common reason for off-heap
references, it is certainly not the only reason. In particular, any use of
byteArrayContents
is potentially problematic in this way.
To make such off-heap references visible to the runtime, GHC has long provided the
touch
operation (embodied by the GHC.Exts.touch#
primop):
touch :: forall a. a -> IO ()
Semantically (and, mostly, operationally) the action touch x
is a no-op; it
merely instructs GHC to ensure that the given value is alive at the point when the
action is executed. Using this operation we can fix printDots
as follows:
=
printDots len <- newPinnedByteArray len
arr 0 len (fromIntegral $ ord '.' :: Word8)
setByteArray arr
puts (byteArrayContents arr) touch arr
Here we have added a touch
after the puts
, ensuring that arr
is retained
at least until the call has returned.
To a functional programmer, touch
may smell a bit odd, merely serving to
introduce a side-effect. By contrast, most idiomatic Haskell libraries rather
express resource lifetimes using the “with” pattern. That is, rather than
write:
<- openFile "hello.txt"
hdl "hello world!"
hPutStrLn hdl hClose hdl
We prefer to write:
withFile
:
"hello.txt" $ \hdl ->
withFile "hello world!" hPutStrLn hdl
as the latter makes the lifetime of the resource apparent from the program’s
syntactic structure alone. In the same way, users have historically been advised
to stay away from using touch
and its variants directly. For instance, while
Foreign.ForeignPtr
provides a touchForeignPtr
operation the documentation
strong recommends that the user rather use withForeignPtr
.
While it is rarely seen in user code, the touch#
primop has served
the Haskell community well for many years: a handful of occurrences in base
,
bytestring
, and other foundational libraries ensured that garbage colection
and foreign calls could peacefully coexist. However, as we will see, the design
of the primop is ripe for unsoundness.
Aside: A bit of simplifier background
Over the years GHC’s simplifier (the GHC subsystem responsible for
optimising programs in its Core intermediate representation)
has consistently grown in its abilities. By GHC 8.2 we started to come to
the realization that all was not well with touch#
. To see why, let us examine
a transform performed by GHC’s simplifier which I will call dead-continuation
elimination (DCE).
Consider that you have a program such as:
=
doThings case error "hello world" :: Int of
0 -> {- some large expression -}
1 -> {- another large expression -}
...
Here we have a program with a divergent expression (namely the error
application) scrutinized by a case
with a number of large alternatives.
Under Haskell’s semantics, it is impossible for program execution to reach
any of these alternatives’ right-hand-sides.
By the dead-continuation elimination transform, GHC’s simplifier will cull this unreachable code, turning the program into:
=
doThings case error "hello world" :: Int of { }
This optimisation can significantly reduce program sizes. Note that exceptions are only one source of divergence: any non-terminating scrutinee can trigger this transformation.
The shortcomings of touch#
With this transformation in mind, consider a trivial modification to the
original printDots
program which loops the puts
call infinitely:
printManyDots :: Int -> IO ()
= do
printManyDots len <- newPinnedByteArray len
arr 0 len (fromIntegral $ ord '.' :: Word8)
setByteArray arr $ puts (byteArrayContents arr)
forever touch arr
At its face, this modification looks benign enough; indeed, many server
applications take a very similar form (replacing puts
with some request
handling logic). However in #14346 we noticed that the simplifier could in
some cases determine that the forever $ ...
expression would diverge and
consequently that the touch arr
continuation was unreachable. By the DCE
transform, this continuation could be dropped. Without the touch
application, the garbage collector may prematurely collect arr
, resulting in
unsoundness in an otherwise reasonable-looking program.
In light of this, it becomes clear that the design of touch#
is not
compatible with DCE (which is otherwise a perfectly sound transform under
Haskell’s semantics): to faithfully serve its purpose, touch#
must be
retained even if it could never be executed.
As there are relatively few uses of touch#
, we tried for a while to paper
over this issue with the judicious use of NOINLINE
to hide the divergence
from GHC’s sights. However, between 8.2 and 8.10 we encountered
at least six other user reports of unsoundness due to various manifestations of
this issue in various base
interfaces
(#15260, #14403, #15544, #13707, #14329, #18061). For this reason, last year we
resolved to address the issue in the 9.0 release, marking the beginning of a
nearly year-long journey towards a possible replacement which lead us through
nearly a dozen design and implementation variants.
Finding an alternative design
At first glance, the design of a safer touch#
seems obvious by analogy to
withFile
:
with :: a -> (a -> m b) -> m b
Under such a design with x f
would evaluate to f x
, ensuring that x
is
kept alive (at least) until evaluation finishes.
While this interface is simple and pleasingly similar to patterns which
Haskellers expect, it does not make for a good primop design. In particular,
touch#
incurs nearly no runtime overhead (it doesn’t even emit code of its own) and
generally occurs in performance-critical, low-level code. Consequently,
it is important to minimize the performance overhead imposed by our new
primitive. However, the type given to with
above is extremely restrictive to
the optimisations available to the simplifier.
For instance, consider a function like,
readWord8_touch :: ForeignPtr Word8 -> IO Word8
= do
readWord8_touch fptr <- peek (unsafeForeignPtrToPtr fptr)
n
touch fptrreturn n
After a bit of simplification this will become:
readWord8_touch :: ForeignPtr Word8 -> IO Word8
= IO (\s0 ->
readWord8_touch fptr case readWord8# (unsafeForeignPtrToPtr fptr) s0 of (# s1, n #) ->
case touch# fptr s1 of s2 ->
# s2, W8# n #) (
This function will benefit from GHC’s worker/wrapper transform, resulting in a pair of Core bindings1:
-- The worker:
worker_readWord8_touch :: ForeignPtr Word8 -> State# RealWorld -> (# State# RealWorld, Word8# #)
=
worker_readWord8_touch fptr s0 case peek (unsafeForeignPtrToPtr fptr) s0 of (# s0, n #) ->
case touch# fptr s0 of s1 ->
# s1, n #)
(
-- The wrapper:
readWord8_touch :: ForeignPtr Word8 -> IO Word8
= IO $ \s ->
readWord8_touch fptr case worker_readWord fptr s of (# s1, n #) ->
# s1, W8# n #)
({-# INLINE readWord8_touch #-}
The readWord8_touch
wrapper can then be inlined into call sites, allowing
elimination of the W8#
allocation (via case-of-known-constructor) at
call-sites which strictly consume the result.
However, if we rewrite readWord8
using with
we will rather get Core of the
form:
readWord8_with :: ForeignPtr Word8 -> IO Word8
= with fptr $ \fptr' -> IO $ \s0 ->
readWord8_with fptr case peek (unsafeForeignPtrToPtr fptr') s0 of (# s1, n #) ->
# s1, W8# n #) (
Unlike readWord8_touch
, this right-hand side does not have the
constructed-product property and therefore can not benefit from unboxing by the
worker/wrapper transform. Consequently, readWord8_with
will allocate
a W8#
result on every call. Empirically we found that these added
allocations significantly regress runtime performance of many, if not most,
Haskell programs. GHC in particular regressed in compile-time by nearly 10% in
some cases.
Recovering the CPR property?
At first appearance, there is no reason why GHC could not learn to transform
programs defined in terms of with
to retain the CPR property. In particular,
this would require three changes:
since worker/wrapper may produce results of arbitrary
RuntimeRep
we must generalize the type ofwith
:# :: forall a (r :: RuntimeRep) (b :: TYPE r). with-> (a -> b) -> b a
use
State#
tokens to enforce execution ordering:# :: forall a (r :: RuntimeRep) (b :: TYPE r). with-> State# s -> (State# s -> b) -> b a
teaching the simplifier to “push” strict contexts (e.g.
case
) intowith#
applications. For instance, transformcase (with# x s (\s' -> k)) of alt -> rhs
into
# x s (\s' -> case k of alt -> rhs) with
This transformation, which we call “strict context pushing” (SCP), is similar to a transform already performed on
runRW#
applications (#15127). In this case ofwith#
, it is sound as it strictly grows the scope of the continuation and therefore lengthens the lifetime of the kept-alive value.
This SCP transformation allows us to eliminate the allocation incurred in the
case of readWord8
by way of inlining and simplification. That is, consider a
strict call-site of readWord8
:
IO $ \s0 ->
case readWord8 fptr s0 of (# s1, n #) ->
case n of W8# n# ->
# doSomething n
First, we inline readWord8
, resulting in:
IO $ \s0 ->
case (
# fptr s0 (\s1 ->
withcase peek (unsafeForeignPtrToPtr fptr') s1 of (# s2, n# #) ->
# s2, W8# n# #)
(
)of (# s3, n #) ->
) case n of W8# n# ->
# doSomething n
Now, by the SCP transform above, the outer case
can be pushed inside of the
with#
:
IO $ \s0 ->
# fptr s0 (\s1 ->
withcase (
case peek (unsafeForeignPtrToPtr fptr') s1 of (# s2, n# #) ->
# s2, W8# n# #)
(of (# s3, n #) ->
) case n of W8# n# ->
#
doSomething n )
Finally, case-of-known constructor can be applied twice, eliminating the W8#
allocation:
IO $ \s0 ->
# fptr s0 (\s1 ->
withcase peek (unsafeForeignPtrToPtr fptr') s1 of (# s2, n# #) ->
#
doSomething n )
Hooray!
The trouble with SCP
At first glance, this SCP transform looks like a promising way to eliminate a
major source of overhead imposed by with#
. Unfortunately, testing quickly
revealed that matters are not so simple. For instance, consider a loop which
uses readWord8
to sum the elements of an array:
sumArray :: ForeignPtr Word8 -- ^ an array of Word8s
-> Int -- ^ array length
-> IO Word8
= IO (go 0# n)
sumArray fptr n where
go :: Word8# -> Int -> State# RealWorld -> (# State# RealWorld, Word8 #)
0 s0 = (# s0, W8# accum #)
go accum =
go accum i s0 case readWord8 (fptr `plusForeignPtr` i) s0 of (# s1, n #) ->
case n of W8# n# ->
+# n#) (i-1) s1 go (accum
As written, this program will run in constant space (modulo short-lived garbage
W8#
constructors). However, consider what happens if we inline readWord8#
,
apply the SCP transform, and simplify in the second equation of go
:
=
go accum i s0 # (fptr `plusForeignPtr` i) s0 (\s1 ->
withcase peek (unsafeForeignPtrToPtr (fptr `plusForeignPtr` i)) s1 of (# s2, n# #) ->
+# n#) (i-1) s1 go (accum
While we have successfully eliminated the garbage W8#
allocations,
the recursive go
call (previously a tail-call) is now beneath a with#
application. Since most implementations of with#
will have the operational
effect of pushing a frame to the evaluation stack, this transforms the
otherwise constant-space sumArray
into an O(n) program. This is clearly not
acceptable.
Unfortunately, this problem is quite difficult to avoid. While one can think of placing various heuristic restrictions on the contexts where SCP is applied, there is no obviously-correct, local criterion which robostly preserves CPR while avoiding stack blow-up.
While we found no way to make SCP work, the modifications we made to with#
’s
type at the beginning of this section are desireable regardless. That is:
with#
must be polymorphic in theRuntimeRep
of its result lest the user is unable to use it to compute unboxed results.- the use of
State#
tokens is a good hint to the user that they must take care to control execution ordering to safely use the operation
Consequently, we will adopt the following type for with#
:
# :: forall a (r :: RuntimeRep) (b :: TYPE r).
with-> State# s -> (State# s -> b) -> b a
Generating code for with#
Above we considered some of the design decisions surrounding the user-facing
interface of with#
. However, the interface is only half of the story: the
compiler must also know how to generate code for with#
.
Thankfully, doing so is (currently) reasonably straightforward as GHC’s DCE
optimisation strictly occurs on Core. For this reason, we can safely desugar
with#
into touch#
by inlining after the Core-to-Core optimisation pipeline
has run (namely in the CorePrep
stage of compilation). Specifically, an
application:
-- Recall that
-- with# :: forall a (r :: RuntimeRep) (b :: TYPE r).
-- a -> State# s -> (State# s -> b) -> b
# x s0 (\s0 -> k) with
can be rewritten to
case k s0 of r ->
case touch# x s0 of s1 ->
r
Admittedly, this rewrite is a bit suspicious as we there is nothing
ensuring the ordering between the case k s
and the touch#
(since they are
both applied to the State#
token s
). However, this is a pragmatic choice:
GHC’s STG pipeline currently performs no optimisation which could reorder
these expressions and forcing with#
to return a (# State# s, b #)
would
preclude its use in some pure contexts.
On the bright side, the fact that we can reuse touch#
means that we can
provide with#
with minimal overhead and no further changes to the backend.
keepAlive#
in GHC 9.0
While we originally set out to supercede (and ultimately remove) touch#
, the
issues described above (and others described in the Wiki) made it clear
that this plan was not feasible: what touch#
lacks in robustness and
aesthetics, it makes up for in efficiency and ease-of-optimisation. Moreover,
there are occassionally cases which do not fit with the lexically-scoped
semantics of with#
.
Consequently, in GHC 9.0 we opted for a pragmatic solution: continue to allow
use of touch#
where necessary but prefer to use the safer with
-style where
possible. GHC 9.0.1 exposes this operation as keepAlive#
(the kinds in the
type below are slightly different than those present in 9.0, referring to
BoxedRep
introduced in 9.2 for added precision):
# :: forall (lev :: Levity) (a :: TYPE (BoxedRep lev))
keepAliverep :: RuntimeRep) (b :: TYPE rep) s.
(
a-> State# s
-> (State# s -> b)
-> b
What implications does this have on user code? For most users, none: nearly all
uses of touch#
are buried in low-level libraries which have already been
adapted by the GHC maintainers.
If you do for some reason use touch#
, we advise that you first consider
whether it is possible to replace the usage with a higher-level, less fragile
abstraction: in our experience, there are few cases where touch#
is necessary
which would not be equally-well served by, e.g., ForeignPtr
or a higher-level
scoped allocation primitive.
If you have determined that the low-level primitive is indeed unavoidable then
the next question is whether touch#
should be replaced with keepAlive#
. In the
general case this unfortunately comes down to a safety/performance tradeoff.
Specifically, let us consider the example of the withForeignPtr
operation
exposed by base
. When written in terms of touch
this operation is defined
as:
withForeignPtr :: ForeignPtr a -> (Ptr a -> IO b) -> IO b
= do
withForeignPtr fptr k
k (unsafeForeignPtrToPtr fptr) touch fptr
While this can produce very efficient code, it is also deeply unsafe as the
user may provide a divergent continuation (e.g. withForeignPtr fptr (forever ...)
). For the reasons described above, this can result in the dropping of the
touch#
, resulting in unsoundness.
To avoid this, we should rather express withForeignPtr
in terms of
keepAlive#
:
withForeignPtr :: ForeignPtr a -> (Ptr a -> IO b) -> IO b
= IO $ \s0 ->
withForeignPtr fptr k # fptr s0 (unIO $ k (unsafeForeignPtrToPtr fptr)) keepAlive
This avoids unsoundness at the expense of needing to allocate the b
result of
the continuation. This is the implementation provided with base-4.15
.
Introducing unsafeWithForeignPtr
While the introduction of keepAlive#
affects relatively little user-code, the
added cost of withForeignPtr
has the potential to be more impactful. In
particular, it is not uncommon to find code such as:
@Word8) withForeignPtr fptr (peek
in tight inner-loops. Adding an allocation in such a context would be
catestrophic for performance and consequently GHC.ForeignPtr
exports the old,
touch#
-based implementation of withForeignPtr
as unsafeWithForeignPtr
.
This allows performance sensitive code to avoid boxing when it is known that the
continuation is not certain to diverge. Since this operation is unsafe in the
presence of known-divergent continuations, users are strong encouraged to
use this operation sparingly and only in cases where the continuation is
obviously not divergent.
Guidance for earlier GHC releases
As noted, use of touch#
in previous GHC versions is a bit perilous unless due
care is taken. Let’s consider an example:
foo :: ByteArray -> IO ()
= do
foo x -- (a)
doSomething x ...
-- (b)
doForeignCall (byteArrayContents x) ...
-- (c) touch x
For this program to be safe, the user must guarantee that no subexpression in
the body of foo
between (a) (the point that x
potentially becomes
unreachable) and (c) (the touch
, which we must ensure DCE does not drop)
is provably divergent. In the case that all of the ...
s are statically known
it is possible to determine this by inspection.
However, things are harder if the function takes a user-provided continuation a
la withForeignPtr
:
withForeignPtr :: ForeignPtr a -> (Ptr a -> IO b) -> IO b
= do
withForeignPtr fptr k
k (unsafeForeignPtrToPtr fptr) touch fptr
In this case the only reliable way to avoid unsoundness is to place a NOINLINE
on withForeignPtr
. This deprives the simplifier of knowledge of k
, ensuring
that it cannot conclude that touch
is unreachable.
Conclusion
Here we have described a bit of the story that lead to keepAlive#
. While the
full story is significantly longer (and ultimately lead to the three-month deferral
of the 9.0.1 release), the result is a primop which avoids a class of
pernicious soundness issues which had lurked in previous releases.
If you have any questions regarding touch#
and keepAlive#
, do be in touch
on the ghc-devs
mailing list.
To avoid cluttering the code too much, we are playing a bit fast-and-loose with the
IO
newtype here. For instance, we treatpeek
as a function of typeState# RealWorld -> (# State# RealWorld, a #)
(i.e. the representation ofIO a)
) instead ofIO a
. We will continue this pragmatic abuse of notation below.↩︎