Custom Traversals

You can get a lot of mileage out of the library without having to resort to writing your own traversals. This section is a bit more advanced and is aimed at anyone who wants to get their hands dirty.

There are two built-in traversal functions, one which traverses sequentially and one in parallel.

Using one of these two traversal modes is a good idea because it’s easy to write a traversal which is not tail-recursive and will use all your memory.

Sequential Traversal

The interface to the sequential traversal allows you to specify continuations to apply to each of the four different pointer types which are present in the AST.

Most of the time you just care about closure pointers, but the interface is general enough to deal with the other cases as well. The GHC.Debug.Trace module provides the generic traversal function called traceFromM:

traceFromM :: C m => TraceFunctions m -> [ClosurePtr] -> m DebugM ()

data TraceFunctions m =
      TraceFunctions { papTrace :: !(GenPapPayload ClosurePtr -> m DebugM ())
      , stackTrace :: !(GenStackFrames ClosurePtr -> m DebugM ())
      , closTrace :: !(ClosurePtr -> SizedClosure -> m DebugM () -> m DebugM ())
      , visitedVal :: !(ClosurePtr -> m DebugM ())
      , conDescTrace :: !(ConstrDesc -> m DebugM ())
      }

The four continuations papTrace, conDescTrace, stackTrace and closTrace are applied the first time the relevant object is visited. The closTrace function takes an additional continuation which must be called to carry on the traversal. The visitedVal function is applied when a closure has already been visited.
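For instance, here is a minimal sketch (the closureCount name is my own; it assumes Control.Monad.State.Strict is imported, as in the census example below) which counts the closures reachable from a set of roots by bumping a counter in closTrace and then calling the supplied continuation:

closureCount :: [ClosurePtr] -> DebugM Int
closureCount cps = snd <$> runStateT (traceFromM funcs cps) 0
  where
    funcs = TraceFunctions {
               papTrace = const (return ())
              , stackTrace = const (return ())
              -- Count this closure, then call the continuation to keep traversing
              , closTrace = \_cp _sc k -> modify' (+ 1) >> k
              , visitedVal = const (return ())
              , conDescTrace = const (return ())
              }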

The census2LevelClosureType analysis is also implemented using traceFromM. Constant functions are supplied for all the fields apart from closTrace, which performs the dereferencing work for the analysis. The whole analysis runs in the StateT monad.

census2LevelClosureType :: [ClosurePtr] -> DebugM CensusByClosureType
census2LevelClosureType cps = snd <$> runStateT (traceFromM funcs cps) Map.empty
  where
    funcs = TraceFunctions {
               papTrace = const (return ())
              , stackTrace = const (return ())
              , closTrace = closAccum
              , visitedVal = const (return ())
              , conDescTrace = const (return ())

            }
    -- Dereference the closure and its children, then add an entry to the
    -- census keyed by the closure type and the types of its arguments
    closAccum  :: ClosurePtr
               -> SizedClosure
               -> (StateT CensusByClosureType DebugM) ()
               -> (StateT CensusByClosureType DebugM) ()
    closAccum _ s k = do
      s' <- lift $ quadtraverse dereferencePapPayload dereferenceConDesc dereferenceStack pure s
      pts <- lift $ mapM dereferenceClosure (allClosures (noSize s'))
      pts' <- lift $ mapM (quadtraverse pure dereferenceConDesc pure pure) pts
      modify' (go s' pts')
      k

    go d args =
      let k = closureToKey (noSize d)
          kargs = map (closureToKey . noSize) args
          final_k :: Text
          final_k = k <> "[" <> T.intercalate "," kargs <> "]"
      in Map.insertWith (<>) final_k (mkCS (dcSize d))
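For instance, a small sketch of how the census might be invoked (the censusFromRoots name is hypothetical; gcRoots comes from GHC.Debug.Client):

-- Run the census over everything reachable from the GC roots
censusFromRoots :: DebugM CensusByClosureType
censusFromRoots = gcRoots >>= census2LevelClosureType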

Parallel Traversal

The parallel traversal has a slightly different interface to the sequential traversal to account for the necessary communication and combination of the results of running actions on different threads. The traceParFromM function is from GHC.Debug.ParTrace.

traceParFromM :: Monoid s => TraceFunctionsIO a s -> [ClosurePtrWithInfo a] -> DebugM s

data TraceFunctionsIO a s =
      TraceFunctionsIO { papTrace :: !(GenPapPayload ClosurePtr -> DebugM ())
      , stackTrace :: !(GenStackFrames ClosurePtr -> DebugM ())
      , closTrace :: !(ClosurePtr -> SizedClosure -> a -> DebugM (a, s, DebugM () -> DebugM ()))
      , visitedVal :: !(ClosurePtr -> a -> DebugM s)
      , conDescTrace :: !(ConstrDesc -> DebugM ())
      }

TraceFunctionsIO a s communicates values of type a between threads and produces a result of type s. Like the sequential traversal, there is one continuation for each pointer type and one for already-visited values, but the type of closTrace is quite different. In addition to the current pointer and decoded closure, a value of type a is passed to the interpretation function. The function computes a new value of type a to communicate to the closure's children, a value of type s with which to update the thread-local result, and a continuation which decides whether to carry on the traversal.

The a value is used to communicate information about a closure's parents in the findRetainers function.
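For illustration, here is a sketch in the same style (not the real findRetainers implementation; the retainerPaths name and the predicate argument are hypothetical) which uses the a value to thread the list of parent pointers down to each child and records the path to every closure satisfying the predicate:

retainerPaths :: (SizedClosure -> Bool) -> [ClosurePtr] -> DebugM [[ClosurePtr]]
retainerPaths interesting cps =
    traceParFromM funcs (map (ClosurePtrWithInfo []) cps)
  where
    funcs = TraceFunctionsIO
              (const (return ()))     -- papTrace
              (const (return ()))     -- stackTrace
              clos
              (\_ _ -> return mempty) -- visitedVal: already-visited closures contribute nothing
              (const (return ()))     -- conDescTrace

    clos :: ClosurePtr -> SizedClosure -> [ClosurePtr]
         -> DebugM ([ClosurePtr], [[ClosurePtr]], DebugM () -> DebugM ())
    clos cp sc parents =
      -- Extend the path with the current pointer (this is what the children see)
      -- and, if this closure is interesting, record the path from a root down to it.
      let parents' = cp : parents
          found    = if interesting sc then [reverse parents'] else []
      in return (parents', found, id)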

The performance of the parallel traversal depends on the contention placed on the common socket. For analysis modes which require many expensive DebugM actions, it may be faster to use the sequential traversal. A workload which primarily consults the block cache, on the other hand, parallelises well.

As an example, the parCount function is implemented using the parallel traversal:

parCount :: [ClosurePtr] -> DebugM CensusStats
parCount = traceParFromM funcs . map (ClosurePtrWithInfo ())
  where
    nop = const (return ())
    funcs = TraceFunctionsIO nop nop clos (const (const (return mempty))) nop

    clos :: ClosurePtr -> SizedClosure -> ()
              -> DebugM ((), CensusStats, DebugM () -> DebugM ())
    clos _cp sc _ = do
      return ((), mkCS (dcSize sc), id)
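A sketch of how parCount might be invoked (the countFromRoots name is hypothetical; gcRoots comes from GHC.Debug.Client). Since in snapshot mode all the requests it makes can be answered from the cache, this workload should parallelise well:

-- Count everything reachable from the GC roots, in parallel
countFromRoots :: DebugM CensusStats
countFromRoots = gcRoots >>= parCount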

I think there are still performance improvements to be made. In future I want to look at the performance using ThreadScope. It seems that in snapshot mode, where all requests can be answered using the cache (as in parCount, for example), there should be almost full core utilisation.