Adventures in parsing, part 3

I got a great many comments, at least by my standards, on my two earlier posts on parsing in Haskell, especially on the latest one. Conal posted a comment on the first one pointing me towards liftM and its siblings, without telling me that it would only be the first step towards “applicative style”. So, here I go again…

First off, importing Control.Applicative. Apparently <|> is defined both in Control.Applicative (as the Alternative operator) and in Parsec. I use the <|> from Parsec, so hiding the one from Control.Applicative seemed like a good idea:

import Control.Applicative hiding ( (<|>) )

Second, Cale pointed out that I need to make GenParser an instance of Control.Applicative.Applicative. He was kind enough to show me how, leaving the syntax as the only thing I had to struggle with:

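import Control.Monad ( ap )

-- Needed with Parsec 2; Parsec 3 already provides an Applicative
-- instance for ParsecT (GenParser is just a type synonym there).
instance Applicative (GenParser c st) where
    pure = return
    (<*>) = ap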
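With today's Parsec 3, GenParser is only a type synonym for ParsecT, which already comes with Functor, Applicative and Monad instances, so the instance above is only needed with the old Parsec 2 and would clash with the library's own instance under Parsec 3. A minimal sketch, assuming Parsec 3 and a GHC recent enough that pure and <*> are exported from the Prelude; pairDigits is a made-up parser used only for illustration:

```haskell
-- Assumes Parsec 3: no hand-written Applicative instance is needed.
import Control.Applicative hiding ((<|>), many, optional)
import Text.ParserCombinators.Parsec

-- Hypothetical example: read two digits and return them as a pair.
-- pure lifts the pair constructor, and each <*> feeds it one result.
pairDigits :: GenParser Char st (Char, Char)
pairDigits = pure (,) <*> digit <*> digit
```

Hiding many and optional as well avoids ambiguity errors later, since Control.Applicative and Parsec both export those names too.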

I decided to take baby steps and started with parseAddress. Here’s what it used to look like:

parseAddress = let
        hexStr2Int = Prelude.read . ("0x" ++)
    in do
        start <- liftM hexStr2Int $ thenChar '-' $ many1 hexDigit
        end <- liftM hexStr2Int $ many1 hexDigit
        return $ Address start end
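For anyone who wants to run this starting point, here is a self-contained sketch. It assumes Parsec 3's Text.ParserCombinators.Parsec compatibility module, an Address type with two Integer fields, and a reconstruction of thenChar (run the parser, then require the character, and return the parser's result). All three are assumptions, since none of them are shown in this post:

```haskell
import Control.Monad (liftM)
import Text.ParserCombinators.Parsec

-- Hypothetical type; the post does not show its definition.
data Address = Address Integer Integer deriving (Show, Eq)

-- Hypothetical reconstruction of thenChar: run p, then match the
-- character c, and return p's result.
thenChar :: Char -> GenParser Char st a -> GenParser Char st a
thenChar c p = do
    r <- p
    _ <- char c
    return r

parseAddress :: GenParser Char st Address
parseAddress = let
        -- "08048000" is read as the Haskell literal 0x08048000
        hexStr2Int = Prelude.read . ("0x" ++)
    in do
        start <- liftM hexStr2Int $ thenChar '-' $ many1 hexDigit
        end <- liftM hexStr2Int $ many1 hexDigit
        return $ Address start end
```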

On Twan’s suggestion I switched from let ... in to where. Since this was my first function, I decided to go via the ap function first. At the same time I broke hexStr2Int out into a top-level function, since it’s used in so many places:

parseAddress = do
    start <- return hexStr2Int `ap` (thenChar '-' $ many1 hexDigit)
    end <- return hexStr2Int `ap` (many1 hexDigit)
    return $ Address start end
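The ap step works because, for a monad whose Applicative instance agrees with it (as Parsec's does), return f `ap` p, liftM f p and f <$> p all mean the same thing: run p and apply f to its result. A small check, assuming Parsec 3 and an Integer-valued hexStr2Int (the result type is my choice; the post doesn't state it); viaAp, viaLiftM and viaFmap are illustrative names:

```haskell
import Control.Monad (ap, liftM)
import Text.ParserCombinators.Parsec

hexStr2Int :: String -> Integer
hexStr2Int = read . ("0x" ++)

-- Three spellings of "parse hex digits, then convert them".
viaAp, viaLiftM, viaFmap :: GenParser Char st Integer
viaAp    = return hexStr2Int `ap` many1 hexDigit
viaLiftM = liftM hexStr2Int (many1 hexDigit)
viaFmap  = hexStr2Int <$> many1 hexDigit
```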

Then on to applying some functions from Applicative:

-- start and end are parsers now, so Address has to be lifted over them
parseAddress = liftA2 Address start end
    where
        start = hexStr2Int <$> (thenChar '-' $ many1 hexDigit)
        end = hexStr2Int <$> (many1 hexDigit)
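Because start and end are parsers rather than numbers, the Address constructor has to be lifted over them before it can be applied. By the Applicative laws, liftA2 Address start end and Address <$> start <*> end are two spellings of the same thing. A quick check, assuming Parsec 3, a hypothetical two-Integer Address type and an Integer-valued hexStr2Int; withLiftA2 and withApply are illustrative names:

```haskell
import Control.Applicative (liftA2)
import Text.ParserCombinators.Parsec

-- Hypothetical type; the post does not show its definition.
data Address = Address Integer Integer deriving (Show, Eq)

hexStr2Int :: String -> Integer
hexStr2Int = read . ("0x" ++)

-- The two halves of an address as separate parsers;
-- the first one also consumes the '-' separator.
start, end :: GenParser Char st Integer
start = hexStr2Int <$> many1 hexDigit <* char '-'
end   = hexStr2Int <$> many1 hexDigit

-- liftA2 f a b behaves like f <$> a <*> b.
withLiftA2, withApply :: GenParser Char st Address
withLiftA2 = liftA2 Address start end
withApply  = Address <$> start <*> end
```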

By now the use of thenChar looks a little silly, so I changed that part into many1 hexDigit <* char '-' instead. Finally I removed the where part altogether and used <*> to string it all together:

parseAddress = Address <$>
    (hexStr2Int <$> many1 hexDigit <* char '-') <*>
    (hexStr2Int <$> (many1 hexDigit))
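In many1 hexDigit <* char '-' both parsers run in sequence, and the arrow points at the result that is kept: <* keeps the left one, *> the right one. A small sketch, assuming Parsec 3, a hypothetical two-Integer Address type and an Integer-valued hexStr2Int; digitsThenDash and dashThenDigits are illustrative names:

```haskell
import Text.ParserCombinators.Parsec

-- Hypothetical type; the post does not show its definition.
data Address = Address Integer Integer deriving (Show, Eq)

hexStr2Int :: String -> Integer
hexStr2Int = read . ("0x" ++)

-- Both parsers run; <* keeps the digits before the dash,
-- *> keeps the digits after it.
digitsThenDash, dashThenDigits :: GenParser Char st String
digitsThenDash = many1 hexDigit <* char '-'
dashThenDigits = char '-' *> many1 hexDigit

-- The final applicative parseAddress from the post.
parseAddress :: GenParser Char st Address
parseAddress = Address <$>
    (hexStr2Int <$> many1 hexDigit <* char '-') <*>
    (hexStr2Int <$> many1 hexDigit)
```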

From here on I skipped the intermediate steps and went straight for the last form. Here’s what I ended up with:

parsePerms = Perms <$>
    ( (== 'r') <$> anyChar) <*>
    ( (== 'w') <$> anyChar) <*>
    ( (== 'x') <$> anyChar) <*>
    (cA <$> anyChar)

    where
        cA a = case a of
            'p' -> Private
            's' -> Shared

parseDevice = Device <$>
    (hexStr2Int <$> many1 hexDigit <* char ':') <*>
    (hexStr2Int <$> (many1 hexDigit))

parseRegion = MemRegion <$>
    (parseAddress <* char ' ') <*>
    (parsePerms <* char ' ') <*>
    (hexStr2Int <$> (many1 hexDigit <* char ' ')) <*>
    (parseDevice <* char ' ') <*>
    (Prelude.read <$> (many1 digit <* char ' ')) <*>
    (parsePath <|> string "")

    where
        parsePath = (many1 $ char ' ') *> (many1 anyChar)
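To try the pieces together, here is a sketch that assembles the post's parsers into one module and runs parseRegion on a line in the Linux /proc/<pid>/maps layout that the fields suggest. It assumes Parsec 3; the data type definitions (including the name Access) are guesses, since the post doesn't show them, and Integer is assumed for every numeric field:

```haskell
import Control.Applicative hiding ((<|>), many, optional)
import Text.ParserCombinators.Parsec

-- Hypothetical types, guessed from the constructors the post uses.
data Address = Address Integer Integer deriving (Show, Eq)
data Access = Private | Shared deriving (Show, Eq)
data Perms = Perms Bool Bool Bool Access deriving (Show, Eq)
data Device = Device Integer Integer deriving (Show, Eq)
-- address, permissions, offset, device, inode, path
data MemRegion = MemRegion Address Perms Integer Device Integer String
    deriving (Show, Eq)

hexStr2Int :: String -> Integer
hexStr2Int = Prelude.read . ("0x" ++)

parseAddress :: GenParser Char st Address
parseAddress = Address <$>
    (hexStr2Int <$> many1 hexDigit <* char '-') <*>
    (hexStr2Int <$> many1 hexDigit)

parsePerms :: GenParser Char st Perms
parsePerms = Perms <$>
    ( (== 'r') <$> anyChar) <*>
    ( (== 'w') <$> anyChar) <*>
    ( (== 'x') <$> anyChar) <*>
    (cA <$> anyChar)
    where
        -- Still partial, as in the post.
        cA a = case a of
            'p' -> Private
            's' -> Shared

parseDevice :: GenParser Char st Device
parseDevice = Device <$>
    (hexStr2Int <$> many1 hexDigit <* char ':') <*>
    (hexStr2Int <$> many1 hexDigit)

parseRegion :: GenParser Char st MemRegion
parseRegion = MemRegion <$>
    (parseAddress <* char ' ') <*>
    (parsePerms <* char ' ') <*>
    (hexStr2Int <$> (many1 hexDigit <* char ' ')) <*>
    (parseDevice <* char ' ') <*>
    (Prelude.read <$> (many1 digit <* char ' ')) <*>
    (parsePath <|> string "")
    where
        -- One or more spaces, then the rest of the line is the path.
        parsePath = (many1 $ char ' ') *> (many1 anyChar)
```

As written, parseRegion needs a space after the inode even when no path follows, e.g. "bfd4e000-bfd63000 rw-p 00000000 00:00 0 " with a trailing space. Also note that the string "" fallback only kicks in when parsePath fails without consuming input, because Parsec's <|> does not backtrack over consumed input unless the first branch is wrapped in try.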

I have to say I’m fairly pleased with this version of the parser. It reads about as easily as the first version, and there’s none of the “reversing” that thenChar introduced, where the trailing character had to be written before the parser it follows.

6 Comments

  1. A thing of beauty! I’m glad you stuck with it, Magnus.

    Some much smaller points:

    • The pattern (== c) <$> anyChar (nicely written, btw) arises three times, so it might merit a name.
    • Similarly for hexStr2Int <$> many1 hexDigit, especially when you rewrite f <$> (a <* b) to (f <$> a) <* b.
    • The pattern (a <* char ' ') <*> b comes up a lot. How about naming it also, with a nice infix op, say a <#> b?
    • The cA definition could use pattern matching instead (e.g., cA 'p' = Private and cA 's' = Shared).
    • Some of your parens are unnecessary (3rd line of parseDevice and last of parseRegion), since application binds more tightly than infix ops.
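Here is one way Conal's suggestions could look when applied, as a sketch. It assumes Parsec 3; the names isChar and parseHex and the infixl 4 fixity for <#> are my choices, and the data types are hypothetical since the post doesn't define them. Giving <#> the same fixity as <*> lets it chain after <$> like the other operators:

```haskell
import Text.ParserCombinators.Parsec

-- Hypothetical types; the post does not show their definitions.
data Access = Private | Shared deriving (Show, Eq)
data Perms = Perms Bool Bool Bool Access deriving (Show, Eq)
data Device = Device Integer Integer deriving (Show, Eq)

hexStr2Int :: String -> Integer
hexStr2Int = read . ("0x" ++)

-- Named versions of the repeated patterns (names are illustrative).
isChar :: Char -> GenParser Char st Bool
isChar c = (== c) <$> anyChar

parseHex :: GenParser Char st Integer
parseHex = hexStr2Int <$> many1 hexDigit

-- p <#> q: run p, skip one space, then run q and apply p's function to it.
infixl 4 <#>
(<#>) :: GenParser Char st (a -> b) -> GenParser Char st a -> GenParser Char st b
p <#> q = (p <* char ' ') <*> q

-- cA by pattern matching instead of case (still partial).
cA :: Char -> Access
cA 'p' = Private
cA 's' = Shared

parsePerms :: GenParser Char st Perms
parsePerms = Perms <$> isChar 'r' <*> isChar 'w' <*> isChar 'x' <*> (cA <$> anyChar)

parseDevice :: GenParser Char st Device
parseDevice = Device <$> parseHex <* char ':' <*> parseHex

-- Two space-separated fields, with no explicit char ' ' in sight.
permsThenDevice :: GenParser Char st (Perms, Device)
permsThenDevice = (,) <$> parsePerms <#> parseDevice
```

With these definitions, permsThenDevice reads "r-xp 03:0c" as the pair (Perms True False True Private, Device 3 12).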
  2. Hm. I wonder why there are boxes around the list items in my previous reply.

  3. First of all, note that you don’t need parentheses around parseSomething <* char ' '.

    You can also simplify things a bit more by combining hexStr2Int <$> many1 hexDigit into a function; then you could say:

    parseHex = hexStr2Int <$> many1 hexDigit
    parseAddress = Address <$> parseHex <* char '-' <*> parseHex
    parseDevice  = Device  <$> parseHex <* char ':' <*> parseHex

    Also, in cA, should there be a case for characters other than ‘p’ or ‘s’? Otherwise the program could fail with a pattern match error.
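One way to address the question about cA, sketched under a hypothetical Access type and assuming Parsec 3: let the parser itself accept only the two valid characters, so any other character becomes an ordinary parse error rather than a pattern-match crash. parseAccess is an illustrative name:

```haskell
import Text.ParserCombinators.Parsec

-- Hypothetical type; the post does not show its definition.
data Access = Private | Shared deriving (Show, Eq)

-- Instead of reading any character and mapping it with a partial
-- function, accept only 'p' or 's'; x <$ p runs p and returns x.
parseAccess :: GenParser Char st Access
parseAccess = Private <$ char 'p' <|> Shared <$ char 's' <?> "access flag (p or s)"
```

In parsePerms, parseAccess would then take the place of cA <$> anyChar.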

  4. Damn markdown/lack of preview button. The code block in the previous post should be

    parseHex = hexStr2Int <$> many1 hexDigit
    parseAddress = Address <$> parseHex <* char '-' <*> parseHex
    parseDevice  = Device  <$> parseHex <* char ':' <*> parseHex
    
  5. Magnus says:

    Hmm, it seems the increasing traffic and commenting are highlighting some shortcomings of my WordPress setup :-)

    First, Conal, the boxes are due to the theme I’m using; apparently list items are boxed. I don’t like it either, and I’ll try to get around to modifying the theme.

    Twan, I’ve now added a preview plugin for WordPress. It seems to work quite well, and hopefully it’ll make it easier to avoid some of the editing problems I’ve seen in comments lately.

  6. Magnus says:

    Conal and Twan, thanks for your suggestions. I’ll put them into practice and post the “final” result as soon as I find some time.
