HIE Files - coming soon to a GHC near you!
wz1000 - 2019-06-26
When GHC compiles your programs, it has to work out a bunch of information - it figures out where stuff is defined, assigns types to expressions, solves constraints and so on. However, none of this information is easily accessible to you after GHC is finished. To get your hands on it, you need to set up a GHC environment and tediously use the GHC API to extract whatever information you care about.
HIE files are a new type of file that can be emitted by GHC via the -fwrite-ide-info flag. They serialize a bunch of useful information about your source, so that it is easy and fast to access when you need it.
HIE files will be appearing for the first time in GHC 8.8.
How you can make use of .hie files right now
First, you need to generate some .hie files. To do this, you need GHC 8.8, or a recently built version of GHC HEAD.
Once you have these, you need to compile some files with -fwrite-ide-info. You can do this by invoking GHC explicitly, or by adding this flag to the ghc-options section of a .cabal file. The compiler will output one .hie file for each module it compiles. Each file is in a binary format which can be read using a simple function from the GHC API.
The output directory for the files can be set using the -hiedir <dir> flag. This way it’s easy to collate all the files for your project into one place.
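As a rough sketch of the two flags in action, the snippet below compiles a single module and lists the .hie files that come out. The names Main.hs and hie-files are placeholders picked for illustration, and the snippet assumes a ghc of version 8.8 or later is on your PATH.

```shell
#!/bin/sh
# Sketch only: SRC and HIE_DIR are placeholder names, not anything GHC requires.
SRC=Main.hs
HIE_DIR=hie-files

if command -v ghc >/dev/null 2>&1 && [ -f "$SRC" ]; then
  # -fwrite-ide-info emits one .hie file per compiled module;
  # -hiedir collects them all in a single directory.
  ghc -fwrite-ide-info -hiedir "$HIE_DIR" "$SRC"
  # Show what was generated.
  find "$HIE_DIR" -name '*.hie'
else
  echo "need ghc (8.8 or later) on PATH and $SRC in the current directory" >&2
fi
```

For a cabal project, the same two flags would go into the ghc-options section instead of the command line.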
The rest of this blog post will be about projects which already use HIE files and some future ideas that we have for using them.
Using hiedb to look up information about your code
hiedb is a command line tool that makes it easy to query .hie files. It stores the information from the files in a SQLite db, to enable very fast indexing and querying.
Once you have installed hiedb, you can index your .hie files. For fun, we’ll index GHC itself. The files were easy to generate just by passing -fwrite-ide-info.
$ hiedb index hiefiles/ghc
Processing file 561/561: /home/matt/ghc/hie-files/Check.hie... done
Completed!
Indexing all 561 files takes less than a minute. Reindexing is very fast (less than one second), so you can incrementally update the database as you work on your project and generate updated HIE files.
hiedb can dump the original Haskell source of a module (useful when your build tool doesn’t keep this around):
$ hiedb cat Main
{-# LANGUAGE CPP, NondecreasingIndentation, TupleSections #-}
{-# OPTIONS -fno-warn-incomplete-patterns -optc-DNON_POSIX_SOURCE #-}
-----------------------------------------------------------------------------
--
-- GHC Driver program
--
-- (c) The University of Glasgow 2005
--
-----------------------------------------------------------------------------
module Main (main) where
Now let’s suppose we want to find every place in GHC that uses the eqType function. hiedb can be queried to find all references to this name:
$ hiedb name-refs eqType
TcBinds:863:34-863:42
TcBinds:1018:48-1018:56
OptCoercion:121:55-121:63
OptCoercion:121:116-121:124
OptCoercion:266:9-266:17
OptCoercion:799:9-799:17
OptCoercion:1045:8-1045:16
TcHsSyn:1692:37-1692:45
TcMType:354:20-354:28
TcMType:355:20-355:28
...
There are quite a few!
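Because each result is a plain Module:line:col-line:col line, the output is easy to post-process with standard tools. Here is a small sketch that counts references per module. The function name refs_per_module is made up for this example, and the sample input is just the ten lines from the listing above.

```shell
#!/bin/sh
# Count references per module from `hiedb name-refs` style output on stdin.
# Lines that are not of the form Module:line:col-line:col are ignored.
# (refs_per_module is a hypothetical helper name, not part of hiedb.)
refs_per_module() {
  awk -F: '/^[^:]+:[0-9]+:[0-9]+-[0-9]+:[0-9]+$/ { n[$1]++ }
           END { for (m in n) print m, n[m] }' | LC_ALL=C sort
}

# Sample input copied from the listing above.
refs_per_module <<'EOF'
TcBinds:863:34-863:42
TcBinds:1018:48-1018:56
OptCoercion:121:55-121:63
OptCoercion:121:116-121:124
OptCoercion:266:9-266:17
OptCoercion:799:9-799:17
OptCoercion:1045:8-1045:16
TcHsSyn:1692:37-1692:45
TcMType:354:20-354:28
TcMType:355:20-355:28
EOF
```

With a real database you would pipe the query in directly, for example hiedb name-refs eqType | refs_per_module.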
For tooling, you might want to know about the references for an identifier at a certain source position. hiedb also supports this query, and it works for locally defined and unexported names.
eqType t1 t2 = isEqual $ nonDetCmpType t1 t2
Pointing it to the t1 parameter of eqType, line 2500, column 9:
$ hiedb point-refs Type 2500 9
Name t1 at (2500,9) is used in:
Type:2500:8-2500:10
Type:2500:40-2500:42
Both references to t1 are returned: its definition and its sole usage.
Similarly, we can query for the type at a point. Asking for the type of the usage of t1 in the body, we find that its type is … Type!
$ hiedb point-type Type 2500 40
Type
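Since point-refs prints spans and point-type takes a module, line and column, the two queries can be chained. The sketch below does that with two helpers, span_start and types_of_refs, whose names are invented for this example. It assumes that the start of a reported span is an acceptable position to query, which matches the point-type call above but is not something we have checked for every kind of reference.

```shell
#!/bin/sh
# Print "MODULE LINE COL" for the start of each span read from stdin.
# Lines that are not of the form Module:line:col-line:col (such as the
# "Name ... is used in:" header) are skipped.
span_start() {
  awk -F'[-:]' 'NF == 5 && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ { print $1, $2, $3 }'
}

# Ask for the type at every reference of the symbol at MODULE LINE COL.
# (Hypothetical helper; it only uses the invocations shown above.)
types_of_refs() {
  hiedb point-refs "$1" "$2" "$3" | span_start |
    while read -r mod line col; do
      # </dev/null keeps hiedb from consuming the loop's input.
      hiedb point-type "$mod" "$line" "$col" </dev/null
    done
}
```

For the example above, types_of_refs Type 2500 9 would query the type at both occurrences of t1.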
There are a few more commands you can learn about by looking at the help text:
$ hiedb --help
hiedb - a tool to query groups of .hie files
Usage: hiedb [-D|--database DATABASE] [-v|--trace] [-q|--quiet]
             [-f|--virtual-file] COMMAND
  Query .hie files
Available options:
  -D,--database DATABASE   References Database (default: "/home/zubin/.local/share/default.hiedb")
  -h,--help                Show this help text
Available commands:
  init                     Initialize databse
  index                    Index database
  name-refs                Lookup references of value MODULE.NAME
  type-refs                Lookup references of type MODULE.NAME
  name-def                 Lookup definition of value MODULE.NAME
  type-def                 Lookup definition of type MODULE.NAME
  cat                      Dump contents of MODULE as stored in the hiefile
  ls                       List all indexed files/modules
  rm                       Remove targets from index
  module-uids              List all the UnitIds MODULE is indexed under in the db
  lookup-hie               Lookup the location of the .hie file corresponding to MODULE
  point-refs               Find references for symbol at point/span
  point-types              List types of ast at point/span
  point-defs               Find definition for symbol at point/span
  point-info               Print name, module name, unit id for symbol at point/span
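The -D option in the usage line suggests one handy pattern: keeping a separate database per project rather than sharing the default one. A minimal sketch, assuming a database file called project.hiedb and a wrapper called hiedb_project (both names are made up here):

```shell
#!/bin/sh
# Sketch only: DB and the wrapper name are invented for illustration.
DB=project.hiedb

# Run any hiedb command against the project-local database.
# -D comes before the command, matching the usage line in the help text.
hiedb_project() {
  hiedb -D "$DB" "$@"
}

# Example session (requires hiedb and a directory of .hie files):
#   hiedb_project index hie-files
#   hiedb_project name-refs eqType
```

Every command from the help text can be passed through the wrapper unchanged, since it simply forwards its arguments.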
Type information in Haddock’s hyperlinked source
hiedb is a great way to consume HIE files but it’s not the only way. Since GHC 8.8 and the Haddock version that ships with it (2.23.0), the hyperlinker has been re-engineered to use .hie files to generate hyperlinked source. This allows the generated hyperlinked source files to report types of expressions on hover.
To use this, you just need to build Haddock documentation for your project with --hyperlinked-source enabled.
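If you build your documentation through cabal rather than calling Haddock yourself, the option is spelled --haddock-hyperlink-source there. A small sketch, where build_docs is just a made-up wrapper name:

```shell
#!/bin/sh
# Sketch only: build_docs is a hypothetical wrapper name.
# --haddock-hyperlink-source is cabal's spelling of Haddock's
# --hyperlinked-source flag; extra arguments are passed through.
build_docs() {
  cabal haddock --haddock-hyperlink-source "$@"
}

# Example (requires cabal and a project in the current directory):
#   build_docs
```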
Code navigation and type information in your GitHub editor using hie-lsif
The Language Server Protocol is a language-agnostic protocol for editors and tools to communicate. An LSP client, such as vscode, sends requests to an LSP server, such as haskell-ide-engine, for information such as what to display on a hover or what references a certain symbol has.
LSIF files are a new extension to the protocol which provide a static snapshot of how a language server would respond to a request for a fixed piece of code. These allow editors and other clients to provide code intelligence for a project without having to run an LSP server. This can be used to answer queries like “go to definition”, “find references”, “hover”, etc. in editors as well as interfaces like GitHub’s PR code review.
You can use the hie-lsif tool to generate an LSIF file for your project using .hie files.
LSIF files are planned to be integrated into GitHub pull requests soon. With the release of GHC 8.8, Haskell will be in a good position to take advantage of this feature.
Future developments
There are a couple more applications of .hie files in the works that we’re excited about.
Fast, low memory usage and persistent code intelligence for language servers
.hie files are a good way to power some of the code intelligence features of language servers like haskell-ide-engine. They are easy to use and require very few system resources to answer queries. They can be used on their own, or as a fallback when typechecked modules aren’t available.
Furthermore, if you load a file in haskell-ide-engine that doesn’t compile, it will not be able to give you any code intelligence for that file. .hie files offer a way to persist some of the data haskell-ide-engine requires to answer queries.
Integrating hiedb with haskell-ide-engine will also enable cross-project go-to-definition and symbol references, two long-requested features. I plan to work on this during the rest of the summer.
GHCi’s :set +c - now powered by .hie files
GHCi’s :set +c functionality also exposes features like type-at-point, references-at-point etc. This is being rewritten to use .hie files, which should make it faster and more reliable.
You can track progress on this here: https://gitlab.haskell.org/ghc/ghc/issues/16804
Conclusion
.hie files have already been put to good use and they haven’t even been released yet! They provide a convenient middle ground between querying GHCi for information and writing a full-blown source plugin. There’s a lot of information, probably all you need, in a simple and easy to consume format.
We’re looking forward to seeing how people use .hie files. If you have any problems with them, make sure to open tickets on the issue tracker.