Fast Map Union and Local Instances Through Instance Types26 February 2023
In part 31 of my crusade against GHC’s coherence guarantees, I have actually done it! This time we will end up with a way to generate local type class instances without any asterisks about code breaking with optimizations. On the way, we are going to end up solving the dreaded Fast Map Union Problem, combining two of my favorite Haskell tricks, and discovering a bug in a previous version of GHC.
But before we get to that, let’s start at the beginning
What even is this ‘fast Map union problem’?
Let’s pretend that Haskell did have consistent locally
overridable type class instances. In that case, the interface for
Map would be completely broken.
Look at the type of
insert :: Ord k => k -> v -> Map k v -> Map k v
See the issue?
Map is some kind of ordered tree
internally, so it depends on the
Ord instance being
consistent across different operations, but with local instances, we
don’t have any guarantees like that.
-- This uses the regular instance for Ord Int let map = insert (1 :: Int) 1 mempty -- This uses a different, incompatible instance on the same map! $ insert (2 :: Int) 2 map withLocalOrd reverseOrdInt
If you think about this for a bit, you may come up with a solution: You can store the instance in the map. This is how most map implementations in other languages work after all.
data Map k v = MkMap (Dict (Ord k)) (ActualMapImplementation k v)
And now the type of insert doesn’t mention
Ord at all
anymore so everything is sunshine and roses
insert :: k -> v -> Map k v -> Map k v
…until it isn’t! Consider the type of
union :: Map k v -> Map k v -> Map k v
What happens if both maps use different
Ord k instances?
We have no way to statically ensure that they don’t. The way to solve
this in practice would be to just reinsert every value from one map in
the other, but that means that union is now drastically slower than it
would be if we could assume that both maps use the same instance (It’s
now at least linear in the size of one map vs. logarithmic in that of
the smaller one).
Now, what we really need here is a way to statically ensure that both
maps use the same
Ord implementation. In a dependently
typed language, one could just carry the instance dictionary in the
data Map k v (inst :: Ord k) = ... union :: Map k v inst -> Map k v inst -> Map k v inst
But we cannot do that in Haskell (yet?), so we need a few more tricks. What we absolutely need to do is to somehow move the used instance to the type level, so let’s start there. While we cannot directly depend on the instance value, we can define a dummy type as a stand-in.
With this new type, we can write a version of
carries its concrete instance in the type. Every function with an
Ord constraint should now be polymorphic over this instance
class OrdI inst a | inst -> a where compareI :: a -> a -> Ordering data RegularOrdInt instance OrdI RegularOrdInt Int where = compare compareI data ReverseOrdInt instance OrdI ReverseOrdInt Int where = case compare x y of compareI x y LT -> GT EQ -> EQ GT -> LT
And now we can finally parameterize Map over the used
data IMap inst k v = ... insert :: OrdI inst k => k -> v -> IMap inst k v -> IMap inst k v union :: OrdI inst k => IMap inst k v -> IMap inst k v -> IMap inst k v
union statically ensures that both maps agree on their
The only ‘issue’ here is that we lose some inference. This is more or
less unavoidable if we want multiple instances to coexist (well, more on
that later). We need to tell Haskell what instance to use at some point,
but it’s not that bad, since at least for
carry the instance in their type, we only need to do this once.
let map1 = insert 1 1 $ insert 2 2 $ empty @RegularOrdInt let map2 = insert @ReverseOrdInt 3 3 $ insert 4 4 $ empty -- Type error: Couldn't match type 'RegularOrdInt' with `ReverseOrdInt`union map1 map2
How do local instances fit into this?
Quite well as it turns out! We can combine two of my favorite Haskell tricks to implement them in current Haskell (You can stop pretending that they exist. We are going to implement them for real now)
Local instances might differ between different executions of the same
code, so we need to make sure that the instance types can only be used
locally and cannot escape. If this sounds familiar that is because there
is a decent chance that you have used something similar before:
ST is a monad that is used for local mutability inside
pure code. You can use it to create
STRefs and mutate them,
just like you would in your favorite imperative language2,
except that Haskell, by virtue of being a pure language, needs to make
absolutely sure that you never ever return an
runST. Doing so would allow effects in one usage of the
runST to affect the result of another
ST do this? It uses higher-rank types!
newSTRef :: a -> ST s (STRef s a) runST :: (forall s. ST s a) -> a
If you have never seen this before, you’re probably quite confused
right now. The type
forall s. ST s a specifies that the
runST needs to be an
ST value that
works for any possible instantiation of
s. Crucially for
this case, this means that the
s variable is, unlike
a, not a type parameter of
itself, but actually one of the arguments to runST. This is
much clearer if we write
runST with an explicit outer
runST :: forall a. (forall s. ST s a) -> a
Now, the reason this ensures that no STRefs escape is that the
s parameter of the
STRef is the same as that
of the containing
ST monad. If you were able to
STRef from the argument to
e.g. by passing something of type
forall s. ST s (STRef s Int), what type would the result
have? Blindly substituting would yield
STRef s Int, but
s now? Previously,
s was bound by the
forall in the type of the argument to
forall does not exist anymore! This is why GHC
will not accept this code and complain about an ‘escaping skolem’.
We can use exactly the same trick for local type class instances to invent an instance type that only exists locally! If we can implement a function of the following type, then this will ensure that instance types cannot possibly escape.
withOrdI :: forall a b. (a -> a -> Ordering) -> (forall inst. OrdI inst a => b) -> b
Now, how do we implement this function? If you read the first post in this series, you will probably know the answer already.
At runtime, GHC represents type classes via a technique called
dictionary passing. This means that a function with an
Ord a => a -> a -> Ordering will
be turned into a function that takes an implementation of that type
class as an argument (
Ord a -> a -> a). This
implementation, called a dictionary, is just a regular record-like data
type that contains an implementation for every method.
GADTs, we can capture this dictionary in a
data OrdIDict a where OrdIDict :: Ord a => OrdIDict a
If we are now able to replicate the structure of this
OrdIDict exactly, but with a custom record of methods, we
unsafeCoerce to convert it to a functional
OrdIDict a. By pattern matching on the result, we can
release the dictionary back into a regular instance, which now contains
our handwritten type class instance!
Instinctively, your definition of
probably look like this
data OrdIDict inst a where OrdIDict :: OrdI inst a => OrdIDict inst a
This is not going to work though. We need to invent a new type for
OrdIDict cannot take it as a
parameter. Thanks to Haskell’s support for existential types, this is
not actually an issue, since we can just leave it off3.
data OrdIDict a where OrdIDict :: OrdI inst a => OrdIDict a
OrdI only has a single method, so it will be represented
by the equivalent of a newtype record at runtime. This is erased
entirely, so we do not need to build a record around our
a -> a -> Ordering function.
OrdIDict wrapper adds some indirection though, so we
need to replicate that.
data FakeDict a = FakeDict a
All preparations are complete. We are ready for the magic4!
withOrdI :: forall a b. (a -> a -> Ordering) -> (forall inst. OrdI inst a => b) -> b = withOrdI dict body case unsafeCoerce (FakeDict dict) :: OrdIDict a of OrdIDict @inst) -> body @inst (
And that’s… it!
Well, not quite. If you paid very close attention, you may notice
that we have no way to bind the
inst type variable in the
(forall inst. OrdI inst a => b) argument. This means if
we put a lambda there, we cannot actually mention
anywhere. This is easy enough to solve by introducing a
withOrdIProxy :: forall a b. (a -> a -> Ordering) -> (forall inst. OrdI inst a => Proxy inst -> b) -> b = withOrdIProxy dict body case unsafeCoerce (FakeDict dict) :: OrdIDict a of OrdIDict @inst) -> body @inst (Proxy @inst) (
Let’s try it out!
We can start with some concrete instances
= insert 1 1 (empty @Int) map1 = insert 2 2 (empty @Int) map2 = insert 3 3 (empty @ReverseIntOrd) map3 = insert 4 4 (empty @ReverseIntOrd) map4 = union map1 map2 fine1 = union map3 map4 fine2 = union map1 map3 -- fails!notFine
Looking good. Now let’s try local instances
= withOrdIProxy compare \(Proxy @inst) -> do local let localMap1 :: IMap inst Int Int = insert @inst 1 1 (empty @inst) ()
$ ghc-9.2 insttypes.hs insttypes.hs:80:42: error: • Could not deduce (OrdI (*) Int) arising from a use of ‘insert’ from the context: OrdI inst a0 ...
Huh. That is… strange. It looks like for some strange reason, GHC
inst parameter to…
(*)5? Even if we add as many type
annotations as physically possible?
If this smells like a bug in GHC, that is because it is! Or well, was.
If you run the same example with GHC 9.4, it will compile without complaining.
Now, let’s try that again
= withOrdIProxy compare \(Proxy @inst) -> do local let localMap1 = insert 1 1 (empty @inst) let localMap2 = insert 2 2 (empty @inst) let perfectlyFine = union localMap1 localMap2 let notFine :: IMap RegularIntOrd Int Int = insert @inst 2 2 (empty @inst) -- fails! let notFine2 = union localMap1 map1 -- also fails! -- fails! localMap1
Do we have to annotate the instance every time?
In this specific example with
IMap, the number of type
annotations required was quite manageable, but in code that uses
OrdI like regular
Ord, this is much less
min3 :: IOrd inst a => a -> a -> a -> a = case compare @inst x y of -- needs a type application! min3 x y z GT -> case compare @inst y z of -- this one as well GT -> z -> y _ -> case compare @inst x z of -- same here _ GT -> case compare @inst y z of -- also here GT -> z -> y _ -> x _
Used directly like this, this technique is really just Scrap your type classes with extra steps.
This is frustrating because we are just running up against GHC’s
stubbornness here. There is only a single possible instance for
IOrd inst a in scope, but GHC doesn’t want to choose it.
Usually, this behavior might make sense, but it’s really not helpful for
Fortunately for us though, we are not the first to run into this
issue! Polysemy, the
popular effect system library, had the exact same problem. Polysemy
makes it possible to define effects of the form
Member Effect r over a set of effects
issue here is that, unlike mtl, Polysemy allows
duplicate effects, so GHC again doesn’t trust that you really wanted to
use the constraint from the signature and not another, currently
How did Polysemy solve that? They wrote a type checker plugin that disambiguates Polysemy constraints whenever there is exactly one relevant instance in scope.
This is exactly what we want, so we could probably copy most of it.
But that is a topic for a future blog post.
We covered quite a bit of ground there.
- We parameterized type classes over a type representing the instance
- This made it possible to statically ensure that maps can safely be merged in logarithmic time, even in the presence of local instances
- We managed to implement local type class instances that are actually safe, even in the presence of optimizations
- We discovered a bug in GHC 9.2
- There is probably a way to avoid unnecessary type applications with a GHC plugin
This is the first time in this series that I can write a conclusion
without begging you to ‘please never ever use this anywhere near
production’. This technique still relies on GHC’s internal type class
representation (and you should always be careful around
unsafeCoerce) but so does the popular reflection
package, so you should be fine.
If you want to try this for yourself, you can get the code in a gist. Just make sure to use GHC 9.4 or above.
You don’t need to have read any of the previous posts to understand this, don’t worry. On the other hand, if you like this one, you will probably enjoy the others as well ;)↩︎
If your favorite imperative language is OCaml or Standard ML that is.↩︎
In fact, pattern matching on a value containing an existentially quantified type variable will turn that variable into a ‘skolem’, just like the higher rank type in the
runSTexample, so we are truly conjuring a new type from thin air with this.↩︎
This is an OCaml joke. Don’t worry about it.↩︎
(*)is another way of writing
Type, the type of types. The only reason this is even valid is that modern GHC Haskell has dependent kinds and
Type : Type.↩︎
If you want to tell me your thoughts about this post, you can leave a comment on Reddit or contact me directly.