HIE Files - coming soon to a GHC near you!

wz1000 - 2019-06-26

When GHC compiles your programs, it works out a lot of information: it figures out where things are defined, assigns types to expressions, solves constraints and so on. However, once GHC has finished, none of this information is easily accessible to you. To get at it, you need to set up a GHC environment and tediously use the GHC API to extract whatever information you care about.

HIE files are a new type of file that can be emitted by GHC via the -fwrite-ide-info flag. They serialize a bunch of useful information about your source, so that it is easy and fast to access when you need it.

HIE files will be appearing for the first time in GHC 8.8.

How you can make use of .hie files right now

First, you need to generate some .hie files. To do this, you need GHC 8.8, or a recently built version of GHC HEAD.

Once you have a suitable compiler, compile your modules with -fwrite-ide-info.

You can do this by invoking GHC directly, or by adding the flag to the ghc-options field of your .cabal file. The compiler outputs one .hie file for each module it compiles. The files use a binary format that can be read with a single function from the GHC API.

The output directory for the files can be set using the -hiedir <dir> flag. This way it’s easy to collate all the files for your project into one place.
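Here is a minimal sketch of how the two flags above might be combined when calling GHC directly. The write_hie helper and the hiefiles default directory are hypothetical names chosen for illustration. Only -fwrite-ide-info and -hiedir come from GHC.

```shell
#!/bin/sh
# Hypothetical helper: compile the given modules with GHC, writing one
# .hie file per module into $HIE_DIR (defaults to ./hiefiles).
HIE_DIR=${HIE_DIR:-hiefiles}

write_hie() {
  ghc -fwrite-ide-info -hiedir "$HIE_DIR" "$@"
}

# Example: write_hie Main.hs
#
# In a cabal project, the same flags can go in the ghc-options field of
# the relevant component in the .cabal file instead, e.g.:
#   ghc-options: -fwrite-ide-info -hiedir hiefiles
```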

The rest of this blog post will be about projects which already use HIE files and some future ideas that we have for using them.

Using hiedb to lookup information about your code

hiedb is a command line tool that makes it easy to query .hie files. It stores the information from the files in a SQLite database, enabling very fast indexing and querying.

Once you have installed hiedb, you can index your .hie files. For fun, we’ll index GHC itself. The files were easy to generate just by passing -fwrite-ide-info.

$ hiedb index hiefiles/ghc
Processing file 561/561: /home/matt/ghc/hie-files/Check.hie... done
Completed!

Indexing all 561 files takes less than a minute. Reindexing is very fast (under a second), so you can update the database incrementally as you work on your project and regenerate HIE files.
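The incremental workflow described above could be scripted roughly as follows. This is a sketch, not something hiedb provides. The refresh_index function, the project.hiedb database file and the hiefiles directory are hypothetical, and the script assumes a cabal project built from its root directory. The -D option and the index command are the ones listed in hiedb's help text further down this post.

```shell
#!/bin/sh
# Hypothetical helper: rebuild the project so GHC writes fresh .hie files,
# then re-index them into a per-project hiedb database.
HIE_DIR=${HIE_DIR:-hiefiles}
HIEDB_DB=${HIEDB_DB:-project.hiedb}

refresh_index() {
  # The hiedb step only runs if the build succeeds.
  cabal build --ghc-options="-fwrite-ide-info -hiedir $HIE_DIR" &&
    hiedb -D "$HIEDB_DB" index "$HIE_DIR"
}
```

Running refresh_index after each edit keeps the database in step with your code. Because the two steps are chained with &&, a failed build leaves the existing index untouched.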

hiedb can dump the original Haskell source of a module (useful when your build tool doesn’t keep this around):

$ hiedb cat Main
{-# LANGUAGE CPP, NondecreasingIndentation, TupleSections #-}
{-# OPTIONS -fno-warn-incomplete-patterns -optc-DNON_POSIX_SOURCE #-}

-----------------------------------------------------------------------------
--
-- GHC Driver program
--
-- (c) The University of Glasgow 2005
--
-----------------------------------------------------------------------------

module Main (main) where

Now suppose we want to find every place in GHC that uses the eqType function. hiedb can be queried for all references to this name:

$ hiedb name-refs eqType
TcBinds:863:34-863:42
TcBinds:1018:48-1018:56
OptCoercion:121:55-121:63
OptCoercion:121:116-121:124
OptCoercion:266:9-266:17
OptCoercion:799:9-799:17
OptCoercion:1045:8-1045:16
TcHsSyn:1692:37-1692:45
TcMType:354:20-354:28
TcMType:355:20-355:28
...

There are quite a few!
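Each reference is printed as Module:startLine:startCol-endLine:endCol, one per line, so ordinary text tools can summarise the results. The hypothetical refs_per_module helper below counts how many references to a name each module contains. It assumes the output format shown above.

```shell
#!/bin/sh
# Hypothetical helper: count references to NAME per module, busiest first.
# Assumes hiedb prints one "Module:line:col-line:col" entry per line.
refs_per_module() {
  # cut keeps only the module name; sort | uniq -c counts each module;
  # the final sort -rn puts the modules with the most references first.
  hiedb name-refs "$1" |
    cut -d: -f1 |
    sort | uniq -c |
    sort -rn
}
```

For example, refs_per_module eqType would list the modules that use eqType most often first.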

Tooling often needs the references for the identifier at a particular source position. hiedb supports this query too, and it works for locally defined and unexported names.

eqType t1 t2 = isEqual $ nonDetCmpType t1 t2

Pointing it at the t1 parameter of eqType, in module Type at line 2500, column 9:

$ hiedb point-refs Type 2500 9
Name t1 at (2500,9) is used in:
Type:2500:8-2500:10
Type:2500:40-2500:42

Both references to t1 are returned: its definition and its sole use. Similarly, we can query for the type at a point. Asking for the type of t1 where it is used in the body, we find that its type is … Type!

$ hiedb point-types Type 2500 40
Type
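The spans in these results use GHC's convention of 1-based lines and columns with an exclusive end column. For example, t1 occupies columns 8 and 9 of line 2500, so its span is 2500:8-2500:10, and the query point (2500, 9) falls inside it. Below is a sketch of that containment check, written as a hypothetical span_contains helper. It takes one span in the format printed by hiedb.

```shell
#!/bin/sh
# Hypothetical helper: succeed if point (LINE, COL) lies inside SPAN,
# where SPAN looks like "Type:2500:8-2500:10". Lines and columns are
# 1-based and the end column is exclusive, as in GHC's source spans.
span_contains() {
  printf '%s\n' "$1" | awk -F'[-:]' -v l="$2" -v c="$3" '
    {
      # Fields: $1 module, $2 start line, $3 start col, $4 end line, $5 end col
      sl = $2 + 0; sc = $3 + 0; el = $4 + 0; ec = $5 + 0
      l += 0; c += 0
      after_start = (l > sl) || (l == sl && c >= sc)
      before_end  = (l < el) || (l == el && c < ec)
      # Exit status 0 means the point is inside the span, 1 means outside.
      exit !(after_start && before_end)
    }'
}
```

With this convention, span_contains 'Type:2500:8-2500:10' 2500 9 succeeds, while column 10 is just past the end of t1 and fails.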

There are a few more commands you can learn about by looking at the help text:

$ hiedb --help
hiedb - a tool to query groups of .hie files

Usage: hiedb [-D|--database DATABASE] [-v|--trace] [-q|--quiet]
             [-f|--virtual-file] COMMAND
  Query .hie files

Available options:
  -D,--database DATABASE   References
                           Database (default: "/home/zubin/.local/share/default.hiedb")
  -h,--help                Show this help text

Available commands:
  init                     Initialize databse
  index                    Index database
  name-refs                Lookup references of value MODULE.NAME
  type-refs                Lookup references of type MODULE.NAME
  name-def                 Lookup definition of value MODULE.NAME
  type-def                 Lookup definition of type MODULE.NAME
  cat                      Dump contents of MODULE as stored in the hiefile
  ls                       List all indexed files/modules
  rm                       Remove targets from index
  module-uids              List all the UnitIds MODULE is indexed under in the
                           db
  lookup-hie               Lookup the location of the .hie file corresponding to
                           MODULE
  point-refs               Find references for symbol at point/span
  point-types              List types of ast at point/span
  point-defs               Find definition for symbol at point/span
  point-info               Print name, module name, unit id for symbol at
                           point/span

Type information in Haddock’s hyperlinked source

hiedb is a great way to consume HIE files, but it’s not the only one. Starting with GHC 8.8 and the Haddock version that ships with it (2.23.0), Haddock's hyperlinker has been re-engineered to generate hyperlinked source from .hie files. This allows the generated hyperlinked source pages to show the types of expressions on hover.

To use this, you just need to build haddock documentation for your project with --hyperlinked-source enabled.
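For a cabal project, this might look like the sketch below. It assumes your cabal-install's haddock command accepts --haddock-hyperlink-source, which is cabal's way of requesting Haddock's hyperlinked source. build_docs is a hypothetical helper name, and the source path in the direct Haddock invocation is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: build HTML docs with hyperlinked source for the
# current cabal project. Types on hover need the .hie-based hyperlinker
# in Haddock 2.23.0 (shipped with GHC 8.8) or later.
build_docs() {
  cabal haddock --haddock-hyperlink-source "$@"
}

# Calling Haddock directly instead would look something like:
#   haddock --html --hyperlinked-source -o docs src/Main.hs
```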

Generating LSIF files with hie-lsif

The Language Server Protocol (LSP) is a language agnostic protocol for editors and tools to communicate. An LSP client, such as VS Code, sends requests to an LSP server, such as haskell-ide-engine, for information such as what to display on hover or which references a symbol has.

LSIF (Language Server Index Format) files are a new extension to the protocol. They provide a static snapshot of how a language server would respond to requests for a fixed version of the code. This lets editors and other clients provide code intelligence for a project without running an LSP server. LSIF files can answer queries like “go to definition”, “find references” and “hover”, both in editors and in interfaces like GitHub’s pull request code review.

You can use the hie-lsif tool to generate an LSIF file for your project from its .hie files.

LSIF files are planned to be integrated into GitHub pull requests soon. With the release of GHC 8.8, Haskell will be in a good position to take advantage of this feature.

Future developments

There are a couple more applications of .hie files in the works that we’re excited about.

Fast, low memory usage and persistent code intelligence for language servers

.hie files are a good way to power some of the code intelligence features of language servers like haskell-ide-engine. They are easy to use and require very few system resources to answer queries. They can be used on their own, or as a fallback when typechecked modules aren’t available.

Currently, if you load a file that doesn’t compile into haskell-ide-engine, it cannot give you any code intelligence for that file. .hie files offer a way to persist some of the data haskell-ide-engine needs to answer queries, so information from the last successful compilation remains available.

Integrating hiedb with haskell-ide-engine will also enable cross-project go-to-definition and symbol references, two long-requested features.

I plan to work on this during the rest of the summer.

GHCi’s :set +c - now powered by .hie files

GHCi’s :set +c functionality also exposes features like type-at-point, references-at-point etc. This is being rewritten to use .hie files, which should make it faster and more reliable.

You can track progress on this here: https://gitlab.haskell.org/ghc/ghc/issues/16804

Conclusion

.hie files have already been put to good use, and they haven’t even been released yet! They provide a convenient middle ground between querying GHCi for information and writing a full-blown source plugin. They contain a lot of information, probably all you need, in a simple, easy-to-consume format.

We’re looking forward to seeing how people use .hie files. If you run into any problems with them, please open a ticket on the issue tracker.