Adventures in parsing, part 3
I got a great many comments, at least by my standards, on my earlier two posts on parsing in Haskell. Especially on the latest one. Conal posted a comment on the first pointing me towards liftM and its siblings, without telling me that it would only be the first step towards “applicative style”. So, here I go again…
First off, importing Control.Applicative. Apparently <|> is defined in both Applicative and Parsec. I use the one from Parsec, so hiding it when importing Applicative seemed like a good idea:
import Control.Applicative hiding ( (<|>) )
Second, Cale pointed out that I need to make GenParser an instance of Control.Applicative.Applicative. He was nice enough to show me how to do that, leaving the syntax as the only thing I had to struggle with:
instance Applicative (GenParser c st) where
    pure = return
    (<*>) = ap
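To see why those two lines are enough, here is a minimal sketch on a toy parser type. The `Parser` newtype and `item` are made up for illustration and are not Parsec's `GenParser`. One assumption to flag: on current GHC, where Applicative is a superclass of Monad and `return` defaults to `pure`, `pure` has to be written out directly, so the sketch does that and keeps only `(<*>) = ap` from the instance above.

```haskell
import Control.Monad (ap, liftM)

-- Hypothetical toy parser, not Parsec's GenParser: consume a String and
-- maybe return a result together with the remaining input.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

-- Functor and Applicative are both derived from the Monad operations.
instance Functor Parser where
  fmap = liftM

instance Applicative Parser where
  pure x = Parser $ \s -> Just (x, s)
  (<*>)  = ap

instance Monad Parser where
  p >>= f = Parser $ \s -> case runParser p s of
    Nothing      -> Nothing
    Just (a, s') -> runParser (f a) s'

-- Parse a single character, failing on empty input.
item :: Parser Char
item = Parser $ \s -> case s of
  []     -> Nothing
  (c:cs) -> Just (c, cs)

main :: IO ()
main = print (runParser ((,) <$> item <*> item) "abc")
-- prints Just (('a','b'),"c")
```

The point of the sketch is that `ap` runs the left parser, then the right one, and applies the first result to the second, which is all that `<*>` needs to do.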
I decided to take baby-steps and I started with parseAddress. Here’s what it used to look like:
parseAddress = let
        hexStr2Int = Prelude.read . ("0x" ++)
    in do
        start <- liftM hexStr2Int $ thenChar '-' $ many1 hexDigit
        end <- liftM hexStr2Int $ many1 hexDigit
        return $ Address start end
On Twan’s suggestion I rewrote it using where rather than let ... in and since this was my first function I decided to go via the ap function (at the same time I broke out hexStr2Int since it’s used in so many places):
parseAddress = do
    start <- return hexStr2Int `ap` (thenChar '-' $ many1 hexDigit)
    end <- return hexStr2Int `ap` (many1 hexDigit)
    return $ Address start end
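Going from liftM to return plus ap should not change behaviour, since both just apply a pure function inside the monad. A small sketch of that equivalence, using `Maybe` in place of the parser monad (an assumption made only to keep the example free of Parsec) and assuming `hexStr2Int` produces an `Int`:

```haskell
import Control.Monad (ap, liftM)

-- Same helper as in the post; the Int result type is an assumption.
hexStr2Int :: String -> Int
hexStr2Int = read . ("0x" ++)

main :: IO ()
main = do
  let input = Just "ff"                  -- stands in for a parser result
  print (liftM hexStr2Int input)         -- the original style
  print (return hexStr2Int `ap` input)   -- the intermediate style
  print (hexStr2Int <$> input)           -- the functor/applicative style
```

All three lines print the same value.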
Then on to applying some functions from Applicative:
parseAddress = liftA2 Address start end
    where
        start = hexStr2Int <$> (thenChar '-' $ many1 hexDigit)
        end = hexStr2Int <$> (many1 hexDigit)
By now the use of thenChar looks a little silly so I changed that part into many1 hexDigit <* char '-' instead. Finally I removed the where part altogether and used <*> to string it all together:
parseAddress = Address <$>
    (hexStr2Int <$> many1 hexDigit <* char '-') <*>
    (hexStr2Int <$> (many1 hexDigit))
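A sketch to check that the applicative form still parses what a do-block version does. It uses `Text.ParserCombinators.ReadP` from base as a stand-in for Parsec so that it runs without extra packages. The `Address` type with two `Int` fields is an assumption, since its definition lives in an earlier post, and `hexDigits`, `addressDo`, `addressAp` and `parseAll` are names invented for this example.

```haskell
import Data.Char (isHexDigit)
import Text.ParserCombinators.ReadP (ReadP, char, eof, munch1, readP_to_S)

-- Assumed shape of the type from the earlier posts.
data Address = Address Int Int deriving (Eq, Show)

hexStr2Int :: String -> Int
hexStr2Int = read . ("0x" ++)

-- ReadP counterpart of Parsec's `many1 hexDigit`.
hexDigits :: ReadP String
hexDigits = munch1 isHexDigit

-- Monadic version in do-notation.
addressDo :: ReadP Address
addressDo = do
  start <- hexStr2Int <$> hexDigits
  _     <- char '-'
  end   <- hexStr2Int <$> hexDigits
  return (Address start end)

-- Applicative version: `<*` keeps the left result and drops the '-'.
addressAp :: ReadP Address
addressAp = Address
  <$> (hexStr2Int <$> hexDigits <* char '-')
  <*> (hexStr2Int <$> hexDigits)

-- Run a parser over the whole input and return the first full parse.
parseAll :: ReadP a -> String -> Maybe a
parseAll p s = case readP_to_S (p <* eof) s of
  ((x, _) : _) -> Just x
  []           -> Nothing

main :: IO ()
main = do
  print (parseAll addressDo "08048000-0804c000")
  print (parseAll addressAp "08048000-0804c000")
```

Both calls print the same `Address`, and dropping the '-' from the input makes both fail.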
From here on I skipped the intermediate steps and went straight for the last form. Here’s what I ended up with:
parsePerms = Perms <$>
    ( (== 'r') <$> anyChar) <*>
    ( (== 'w') <$> anyChar) <*>
    ( (== 'x') <$> anyChar) <*>
    (cA <$> anyChar)
    where
        cA a = case a of
            'p' -> Private
            's' -> Shared

parseDevice = Device <$>
    (hexStr2Int <$> many1 hexDigit <* char ':') <*>
    (hexStr2Int <$> (many1 hexDigit))

parseRegion = MemRegion <$>
    (parseAddress <* char ' ') <*>
    (parsePerms <* char ' ') <*>
    (hexStr2Int <$> (many1 hexDigit <* char ' ')) <*>
    (parseDevice <* char ' ') <*>
    (Prelude.read <$> (many1 digit <* char ' ')) <*>
    (parsePath <|> string "")
    where
        parsePath = (many1 $ char ' ') *> (many1 anyChar)
I have to say I’m fairly pleased with this version of the parser. It reads about as easily as the first version and there’s none of the “reversing” that thenChar introduced.
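For anyone who wants to run the whole thing, here is a self-contained sketch of the parser above. It is not the Parsec code itself: it swaps in `Text.ParserCombinators.ReadP` from base so no extra package is needed, with `get` standing in for anyChar, `munch1` for many1, and `<++` for Parsec's <|>. The data types, including the name `Access`, are assumptions because their definitions are in the earlier posts, and the sample line is made up in the style of /proc/PID/maps.

```haskell
import Data.Char (isDigit, isHexDigit)
import Text.ParserCombinators.ReadP
  (ReadP, char, eof, get, munch1, readP_to_S, (<++))

-- Assumed data types; the real definitions are in the earlier posts.
data Address   = Address Int Int deriving (Eq, Show)
data Access    = Private | Shared deriving (Eq, Show)
data Perms     = Perms Bool Bool Bool Access deriving (Eq, Show)
data Device    = Device Int Int deriving (Eq, Show)
data MemRegion = MemRegion Address Perms Int Device Int String
  deriving (Eq, Show)

hexStr2Int :: String -> Int
hexStr2Int = read . ("0x" ++)

-- ReadP counterpart of Parsec's `many1 hexDigit`.
hexDigits :: ReadP String
hexDigits = munch1 isHexDigit

parseAddress :: ReadP Address
parseAddress = Address
  <$> (hexStr2Int <$> hexDigits <* char '-')
  <*> (hexStr2Int <$> hexDigits)

parsePerms :: ReadP Perms
parsePerms = Perms
  <$> ((== 'r') <$> get)
  <*> ((== 'w') <$> get)
  <*> ((== 'x') <$> get)
  <*> (cA <$> get)
  where
    -- Partial, as in the post: only 'p' and 's' are handled.
    cA a = case a of
      'p' -> Private
      's' -> Shared

parseDevice :: ReadP Device
parseDevice = Device
  <$> (hexStr2Int <$> hexDigits <* char ':')
  <*> (hexStr2Int <$> hexDigits)

parseRegion :: ReadP MemRegion
parseRegion = MemRegion
  <$> (parseAddress <* char ' ')
  <*> (parsePerms <* char ' ')
  <*> (hexStr2Int <$> (hexDigits <* char ' '))
  <*> (parseDevice <* char ' ')
  <*> (read <$> (munch1 isDigit <* char ' '))
  <*> (parsePath <++ pure "")   -- empty path when nothing follows
  where
    parsePath = munch1 (== ' ') *> munch1 (const True)

-- Run a parser over the whole input and return the first full parse.
parseAll :: ReadP a -> String -> Maybe a
parseAll p s = case readP_to_S (p <* eof) s of
  ((x, _) : _) -> Just x
  []           -> Nothing

-- Made-up sample line.
sampleLine :: String
sampleLine = "08048000-0804c000 r-xp 00000000 03:01 12345      /bin/cat"

main :: IO ()
main = print (parseAll parseRegion sampleLine)
```

Note that, as in the post, a line without a path still needs the single space after the inode field to be present.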
A thing of beauty! I’m glad you stuck with it, Magnus.
Some much smaller points:
(== c) <$> anyChar (nicely written, btw) arises three times, so it might merit a name.
Likewise hexStr2Int <$> many1 hexDigit, especially when you rewrite f <$> (a <* b) to (f <$> a) <* b.
(a <* char ' ') <*> b comes up a lot. How about naming it also, with a nice infix op, say a <#> b?
cA could be defined by pattern-matching equations (cA 'p' = Private and cA 's' = Shared).

Hm, I wonder why there are boxes around the list items in my previous reply.
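The naming suggestions in the list above could look roughly like this. It is only a sketch: `flag`, `hex`, `access`, `perms` and `parseAll` are hypothetical names, the `infixl 4` fixity for `<#>` is an assumption chosen so that it chains with <$> and <*>, and `Text.ParserCombinators.ReadP` from base again stands in for Parsec. The `access` parser is one possible answer to the question, raised in a later comment, of what to do with characters other than 'p' and 's'.

```haskell
import Data.Char (isHexDigit)
import Text.ParserCombinators.ReadP
  (ReadP, char, eof, get, munch1, readP_to_S, (<++))

-- Assumed type; the real definition is in the earlier posts.
data Access = Private | Shared deriving (Eq, Show)

hexStr2Int :: String -> Int
hexStr2Int = read . ("0x" ++)

-- `(== c) <$> anyChar`, given a name.
flag :: Char -> ReadP Bool
flag c = (== c) <$> get

-- `hexStr2Int <$> many1 hexDigit`, given a name.
hex :: ReadP Int
hex = hexStr2Int <$> munch1 isHexDigit

-- `(a <* char ' ') <*> b` as an infix operator.
infixl 4 <#>
(<#>) :: ReadP (a -> b) -> ReadP a -> ReadP b
a <#> b = (a <* char ' ') <*> b

-- Fails as a parse error, not a pattern-match error, on other characters.
access :: ReadP Access
access = (Private <$ char 'p') <++ (Shared <$ char 's')

perms :: ReadP (Bool, Bool, Bool, Access)
perms = (,,,) <$> flag 'r' <*> flag 'w' <*> flag 'x' <*> access

-- Run a parser over the whole input and return the first full parse.
parseAll :: ReadP a -> String -> Maybe a
parseAll p s = case readP_to_S (p <* eof) s of
  ((x, _) : _) -> Just x
  []           -> Nothing

main :: IO ()
main = do
  print (parseAll ((,) <$> hex <#> hex) "ff 10")
  print (parseAll perms "r-xp")
  print (parseAll perms "r-x?")   -- unknown access character
```

With `<#>` in place, a chain such as `f <$> a <#> b <#> c` consumes exactly one space between each pair of fields.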
First of all, note that you don’t need parentheses around parseSomething <* char ' '.
You can also simplify things a bit more by combining hexStr2Int <$> many1 hexDigit into a function, then you could say:
parseHex = hexStr2Int many1 hexDigit
parseAddress = Address parseHex <* char ‘-’ <> parseHex
parseDevice = Device parseHex < char ‘:’ <*> parseHex
Also, in cA, should there be a case for characters other than ‘p’ or ‘s’? Otherwise the program could fail with a pattern match error.

Damn markdown/lack of preview button. The code block in the previous post should be:
parseHex = hexStr2Int <$> many1 hexDigit
parseAddress = Address <$> parseHex <* char '-' <*> parseHex
parseDevice = Device <$> parseHex <* char ':' <*> parseHex
Hmm, this increasing traffic and commenting are highlighting some shortcomings of my WordPress setup, it seems.
First, Conal, the boxes are due to the theme I’m using; apparently list items are boxed. I don’t like it either and I’ll try to get around to modifying the theme.
Twan, I’ve now added a preview plugin for WordPress. It seems to work quite well and hopefully it’ll make it easier to avoid some of the editing problems I’ve seen in comments lately.
Conal and Twan, thanks for your suggestions. I’ll put them into practice and post the “final” result as soon as I find some time.