# Fast Map Union and Local Instances Through Instance Types

26 February 2023In part 3^{1} of my crusade against GHC’s coherence guarantees, I have
actually done it! This time we will end up with a way to generate local
type class instances without any asterisks about code breaking with
optimizations. On the way, we are going to end up solving the dreaded
Fast Map Union Problem, combining two of my favorite Haskell tricks, and
discovering a bug in a previous version of GHC.

But before we get to that, let’s start at the beginning

## What even is this ‘fast Map union problem’?

Let’s pretend that Haskell *did* have consistent locally
overridable type class instances. In that case, the interface for
`Map`

would be completely broken.

Look at the type of `Data.Map.insert`

`insert :: Ord k => k -> v -> Map k v -> Map k v`

See the issue? `Map`

is some kind of ordered tree
internally, so it depends on the `Ord`

instance being
consistent across different operations, but with local instances, we
don’t have any guarantees like that.

```
-- This uses the regular instance for Ord Int
let map = insert (1 :: Int) 1 mempty
-- This uses a different, incompatible instance on the same map!
$ insert (2 :: Int) 2 map withLocalOrd reverseOrdInt
```

If you think about this for a bit, you may come up with a solution: You can store the instance in the map. This is how most map implementations in other languages work after all.

`data Map k v = MkMap (Dict (Ord k)) (ActualMapImplementation k v)`

And now the type of insert doesn’t mention `Ord`

at all
anymore so everything is sunshine and roses

`insert :: k -> v -> Map k v -> Map k v`

…until it isn’t! Consider the type of `union`

now

`union :: Map k v -> Map k v -> Map k v`

What happens if both maps use different `Ord k`

instances?
We have no way to statically ensure that they don’t. The way to solve
this in practice would be to just reinsert every value from one map in
the other, but that means that union is now drastically slower than it
would be if we could assume that both maps use the same instance (It’s
now at least linear in the size of one map vs. logarithmic in that of
the smaller one).

## Instance Types

Now, what we really need here is a way to statically ensure that both
maps use the same `Ord`

implementation. In a dependently
typed language, one could just carry the instance dictionary in the
type

```
data Map k v (inst :: Ord k) = ...
union :: Map k v inst -> Map k v inst -> Map k v inst
```

But we cannot do that in Haskell (yet?), so we need a few more
tricks. What we absolutely *need* to do is to somehow move the
used instance to the type level, so let’s start there. While we cannot
directly depend on the instance value, we can define a dummy type as a
stand-in.

With this new type, we can write a version of `Ord`

that
carries its concrete instance in the type. Every function with an
`Ord`

constraint should now be polymorphic over this instance
parameter.

```
class OrdI inst a | inst -> a where
compareI :: a -> a -> Ordering
data RegularOrdInt
instance OrdI RegularOrdInt Int where
= compare
compareI
data ReverseOrdInt
instance OrdI ReverseOrdInt Int where
= case compare x y of
compareI x y LT -> GT
EQ -> EQ
GT -> LT
```

And now we can finally parameterize Map over the used
`Ord`

instance!

```
data IMap inst k v = ...
insert :: OrdI inst k => k -> v -> IMap inst k v -> IMap inst k v
union :: OrdI inst k => IMap inst k v -> IMap inst k v -> IMap inst k v
```

`union`

statically ensures that both maps agree on their
instances.

The only ‘issue’ here is that we lose some inference. This is more or
less unavoidable if we want multiple instances to coexist (well, more on
that later). We need to tell Haskell what instance to use at some point,
but it’s not that bad, since at least for `IMap`

s, which
carry the instance in their type, we only need to do this once.

```
let map1 = insert 1 1 $ insert 2 2 $ empty @RegularOrdInt
let map2 = insert @ReverseOrdInt 3 3 $ insert 4 4 $ empty
-- Type error: Couldn't match type 'RegularOrdInt' with `ReverseOrdInt` union map1 map2
```

## How do local instances fit into this?

Quite well as it turns out! We can combine two of my favorite Haskell tricks to implement them in current Haskell (You can stop pretending that they exist. We are going to implement them for real now)

Local instances might differ between different executions of the same
code, so we need to make sure that the instance types can only be used
locally and cannot escape. If this sounds familiar that is because there
is a decent chance that you have used something similar before: `runST`

.

`ST`

is a monad that is used for local mutability inside
pure code. You can use it to create `STRef`

s and mutate them,
just like you would in your favorite imperative language^{2},
except that Haskell, by virtue of being a pure language, needs to make
absolutely sure that you never ever return an `STRef`

from
`runST`

. Doing so would allow effects in one usage of the
supposedly pure `runST`

to affect the result of another
one.

How does `ST`

do this? It uses higher-rank types!

```
newSTRef :: a -> ST s (STRef s a)
runST :: (forall s. ST s a) -> a
```

If you have never seen this before, you’re probably quite confused
right now. The type `forall s. ST s a`

specifies that the
argument to `runST`

needs to be an `ST`

value that
works for any possible instantiation of `s`

. Crucially for
this case, this means that the `s`

variable is, unlike
`a`

, *not* a type parameter of `runST`

itself, but actually one of *the arguments to runST*. This is
much clearer if we write `runST`

with an explicit outer
`forall`

.

`runST :: forall a. (forall s. ST s a) -> a`

Now, the reason this ensures that no STRefs escape is that the
`s`

parameter of the `STRef`

is the same as that
of the containing `ST`

monad. If you *were* able to
return an `STRef`

from the argument to `runST`

,
e.g. by passing something of type
`forall s. ST s (STRef s Int)`

, what type would the result
have? Blindly substituting would yield `STRef s Int`

, but
what is `s`

now? Previously, `s`

was bound by the
`forall`

in the type of the argument to `runST`

,
but that `forall`

does not exist anymore! This is why GHC
will not accept this code and complain about an ‘escaping skolem’.

We can use exactly the same trick for local type class instances to invent an instance type that only exists locally! If we can implement a function of the following type, then this will ensure that instance types cannot possibly escape.

`withOrdI :: forall a b. (a -> a -> Ordering) -> (forall inst. OrdI inst a => b) -> b`

Now, how do we implement this function? If you read the first post in this series, you will probably know the answer already.

At runtime, GHC represents type classes via a technique called
dictionary passing. This means that a function with an `Ord`

constraint like `Ord a => a -> a -> Ordering`

will
be turned into a function that *takes an implementation of that type
class as an argument* (`Ord a -> a -> a`

). This
implementation, called a dictionary, is just a regular record-like data
type that contains an implementation for every method.

Using `GADT`

s, we can capture this dictionary in a
value.

```
data OrdIDict a where
OrdIDict :: Ord a => OrdIDict a
```

If we are now able to replicate the structure of this
`OrdIDict`

exactly, but with a custom record of methods, we
can use `unsafeCoerce`

to convert it to a functional
`OrdIDict a`

. By pattern matching on the result, we can
release the dictionary back into a regular instance, which now contains
our handwritten type class instance!

Instinctively, your definition of `OrdIDict`

would
probably look like this

```
data OrdIDict inst a where
OrdIDict :: OrdI inst a => OrdIDict inst a
```

This is not going to work though. We need to invent a new type for
`inst`

, so `OrdIDict`

cannot take it as a
parameter. Thanks to Haskell’s support for existential types, this is
not actually an issue, since we can just leave it off^{3}.

```
data OrdIDict a where
OrdIDict :: OrdI inst a => OrdIDict a
```

`OrdI`

only has a single method, so it will be represented
by the equivalent of a newtype record at runtime. This is erased
entirely, so we do not need to build a record around our
`a -> a -> Ordering`

function.

The `OrdIDict`

wrapper adds some indirection though, so we
need to replicate that.

`data FakeDict a = FakeDict a`

All preparations are complete. We are ready for the magic^{4}!

```
withOrdI :: forall a b. (a -> a -> Ordering) -> (forall inst. OrdI inst a => b) -> b
=
withOrdI dict body case unsafeCoerce (FakeDict dict) :: OrdIDict a of
OrdIDict @inst) -> body @inst (
```

And that’s… it!

Well, not quite. If you paid very close attention, you may notice
that we have no way to bind the `inst`

type variable in the
`(forall inst. OrdI inst a => b)`

argument. This means if
we put a lambda there, we cannot actually mention `inst`

anywhere. This is easy enough to solve by introducing a `Proxy`

value.

```
withOrdIProxy :: forall a b. (a -> a -> Ordering) -> (forall inst. OrdI inst a => Proxy inst -> b) -> b
=
withOrdIProxy dict body case unsafeCoerce (FakeDict dict) :: OrdIDict a of
OrdIDict @inst) -> body @inst (Proxy @inst) (
```

## Let’s try it out!

We can start with some concrete instances

```
= insert 1 1 (empty @Int)
map1 = insert 2 2 (empty @Int)
map2
= insert 3 3 (empty @ReverseIntOrd)
map3 = insert 4 4 (empty @ReverseIntOrd)
map4
= union map1 map2
fine1 = union map3 map4
fine2
= union map1 map3 -- fails! notFine
```

Looking good. Now let’s try local instances

```
= withOrdIProxy compare \(Proxy @inst) -> do
local let localMap1 :: IMap inst Int Int = insert @inst 1 1 (empty @inst)
()
```

```
$ ghc-9.2 insttypes.hs
insttypes.hs:80:42: error:
• Could not deduce (OrdI (*) Int) arising from a use of ‘insert’
from the context: OrdI inst a0
...
```

Huh. That is… strange. It looks like for some strange reason, GHC
defaults the `inst`

parameter to… `(*)`

^{5}? Even if we add as many type
annotations as physically possible?

If this smells like a bug in GHC, that is **because it
is!** Or well, *was*.

If you run the same example with GHC 9.4, it will compile without complaining.

Now, let’s try that again

```
= withOrdIProxy compare \(Proxy @inst) -> do
local let localMap1 = insert 1 1 (empty @inst)
let localMap2 = insert 2 2 (empty @inst)
let perfectlyFine = union localMap1 localMap2
let notFine :: IMap RegularIntOrd Int Int = insert @inst 2 2 (empty @inst) -- fails!
let notFine2 = union localMap1 map1 -- also fails!
-- fails! localMap1
```

Perfect!

### Do we have to annotate the instance every time?

In this specific example with `IMap`

, the number of type
annotations required was quite manageable, but in code that uses
`OrdI`

like regular `Ord`

, this is much less
pleasant

```
min3 :: IOrd inst a => a -> a -> a -> a
= case compare @inst x y of -- needs a type application!
min3 x y z GT -> case compare @inst y z of -- this one as well
GT -> z
-> y
_ -> case compare @inst x z of -- same here
_ GT -> case compare @inst y z of -- also here
GT -> z
-> y
_ -> x _
```

Used directly like this, this technique is really just Scrap your type classes with extra steps.

This is frustrating because we are just running up against GHC’s
stubbornness here. There is only a single possible instance for
`IOrd inst a`

in scope, but GHC doesn’t want to choose it.
Usually, this behavior might make sense, but it’s really not helpful for
us.

Fortunately for us though, we are not the first to run into this
issue! Polysemy, the
popular effect system library, had the exact same problem. Polysemy
makes it possible to define effects of the form
`Member Effect r`

over a set of effects `r`

. The
issue here is that, unlike mtl, Polysemy allows
duplicate effects, so GHC again doesn’t trust that you really wanted to
use the constraint from the signature and not another, currently
unwritten one.

How did Polysemy solve that? They wrote a type checker plugin that disambiguates Polysemy constraints whenever there is exactly one relevant instance in scope.

This is exactly what we want, so we could probably copy most of it.

But that is a topic for a future blog post.

## Conclusion

We covered quite a bit of ground there.

- We parameterized type classes over a type representing the instance
- This made it possible to statically ensure that maps can safely be merged in logarithmic time, even in the presence of local instances
- We managed to implement local type class instances that are actually safe, even in the presence of optimizations
- We discovered a bug in GHC 9.2
- There is probably a way to avoid unnecessary type applications with a GHC plugin

This is the first time in this series that I can write a conclusion
without begging you to ‘please never ever use this anywhere near
production’. This technique still relies on GHC’s internal type class
representation (and you should always be careful around
`unsafeCoerce`

) but so does the popular reflection
package, so you should be fine.

If you want to try this for yourself, you can get the code in a gist. Just make sure to use GHC 9.4 or above.

If you want to tell me your thoughts about this post, you can leave a comment on Reddit or contact me directly.