Phantom type problems
The last few days i’ve been hacking on a data encoding library in Haskell. Haskell has long been lacking in this area and receiving a double-encoded email about a week ago alerted me to this (you might have had to actually be in my head to follow that train of thought
). Yesterday I put up a Wiki page and posted an email to the Haskell library list. I also logged into #Haskell and asked for feedback. This was when Saizan came with an interesting suggestion:
What about using phantom types to annotate the encoding and use a typeclass to unify the interfaces?
That’s an excellent idea. One thought has been bugging me since beginning implementing the library, each module exposes the same methods, with the same type signatures, but there is no type-safe polymorphism in the API. I’ve never really understood phantom types so Saizan’s idea was a bit of an eye-opener for me. I think they can help solve the problem. Unfortunately I ran into another problem when playing.
I picked base16 encoding to play a bit with this because the implementation is fairly short and simple. I began with creating a new typeclass:
class DataEncoding d where
encode :: [Word8] -> d String
decode :: d String -> [Word8]
chop :: Int -> d String -> d [String]
unchop :: d [String] -> d String
liberate :: d [String] -> [String]
incarcerate :: [String] -> d [String]
I added the last two functions since it’s no good to have the encoding locked up and inaccessible all the time, at some point we might actually want to do something with the data.
The implementation in Base16 looks like this:
data B16Enc p = B16EncS String
| B16EncLS [String]
deriving (Eq, Show)
instance DataEncoding B16Enc where
encode os = B16EncS $ b16Encode os
decode (B16EncS s) = b16Decode s
chop n (B16EncS s) = B16EncLS $ b16Chop n s
unchop (B16EncLS ss) = B16EncS $ b16Unchop ss
liberate (B16EncLS ss) = ss
incarcerate ss = B16EncLS ss
Then I realised that it might be useful to be able to liberate not only the result of a chop but also the result of a plain encode. That is I want something like
liberate :: d a -> a
incarcerate :: a -> d a
Where ideally a can be String or [String] and nothing else (I’ll settle for a weaker type if I have to though). What I want to avoid, if possible is the need for two functions for both liberate and incarcerate:
liberateS :: d String -> String
incarcerateS :: String -> d String
liberateSL :: d [String] -> [String]
incarcerateSL :: [String] -> d [String]
I’ve tried just introducing the a like above, but that causes a failure to match types, and declaring the class on DataEncoding d a (and enabling Glasgow extensions) causes arity problems in the types that confuse me to no end.
So, introducing phantom types and a typeclass is still ongoing. Any tip on solving the immidiate issue is welcome. I’ll be back as soon as I’ve come to any conclusion as to whether it’s worth the trouble or not
Antoine:
I’m not really clear on why liberate and incarcerate can’t be polymorphic for all types a, seeing as their operation doesn’t depend on the type it’s operating on.
15 October 2007, 2:57 amMagnus:
Antoine, exactly! That’s what’s puzzling me as well.
15 October 2007, 11:48 amDN:
Try this:
16 October 2007, 9:28 amMagnus:
DN, thanks. I’ll have to take a closer look at it. From my cursory look it seems to be exactly what I was aiming for.
16 October 2007, 10:51 amAdam Turoff:
This is very interesting, but the module(s) seem to presume that a string will be encoded once. How would you represent a string that’s both UUEncoded and XML encoded (& => &, etc.)? RSS/Atom items also periodically deal with content that’s doubly XML encoded as well, which is a nasty and bothersome problem.
It looks like the approach here would be to incarcerate a string into one encoding, decode, and then incarcerate it into a different encoding….
16 October 2007, 3:44 pmMagnus:
Adam, that is a good observation. I’m not sure whether it’s an argument against using phantom types, but it certainly is a limitation worth noting. AFAICS it’s impossible in the general case to determine layered encodings so I’m afraid every API requires several iterations of detecting what encoding was used and then decoding.
17 October 2007, 10:34 amDN:
We’re talking two different kinds of encoding here: something like UTF-8 is typed [Word8] -> String but XML escaping is typed String -> String. So you’ve got to decide what you intend there.
But once past that, there’s no problem with nested encodings. Something UUEncoded then XMLEscaped would be of type Encoded XMLEsc (Encoded UUEnc String). You’ll have to tweak your definition of encode/decode to wrap or unwrap just one layer of encoding. It might be useful to define a NullEncoding that does nothing to get you from raw data to encoded data; then all your functions can work on encoded data.
17 October 2007, 11:09 pmMagnus:
DN, yes, XML (and HTML) escaping are special in so much as I wouldn’t call them encodings. The library I’ve been implementing is for data encodings and for that I think we agree that
[Word8] -> Stringis the type that fits best. Escaping, whereString -> Stringdoes fit, should arguably be in another module. (I’m not sure what URL encoding should be classed as, I suppose it falls somewhere in between encoding and escaping.)Typewise there’s no problem with layered encodings. All I was trying to say was that I’m not sure what advantages it offers for data encodings since it isn’t possible to look at an encoded string and “see past” the first layer. Even for “multiply-escaped” strings I’m not sure if it’s possible to look at the string and see what type to give it in order to do the correct sequence of de-escaping.
I think it might be worth taking your point into consideration when comming up with the API for the string escaping library, but for data encodings I’m not convinced that a “layered type” is worth it in practice. Hell, I’m not even sure using a phantom type to record one level of encoding is worth it in practice.
17 October 2007, 11:58 pm