More adventures in parsing
I received an interesting comment from Conal Elliott on my previous post on parsing. I have to admit I wasn’t sure I understood him at first, and I’m still not sure I do, but I think I have an idea of what he means.
Basically my code is very sequential in that I use the do construct everywhere in the parsing code. Personally I thought that made the parser very easy to read, since the code very much mimics the structure of the maps file. I do realise the code isn’t very “functional” though, so I thought I’d take Conal’s comments to heart and see what the result would be.
Let’s start with the observation that every entity in a line is separated by a space. However, some things are separated by other characters. So the first thing I did was write a higher-order function that first runs a parser, then reads a separator character, and returns the result of the first parser:
thenChar c f = f >>= (\ r -> char c >> return r)
Since space is used as a separator so often I added a short-cut for that:
thenSpace = thenChar ' '
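For anyone who wants to try the two combinators in isolation, here is a minimal sketch. It assumes Parsec 3 (the Text.Parsec modules); the type signatures are my guess at what the compiler would infer for String input, and firstWord is a hypothetical helper that exists only for this example.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Run the parser f, then consume the separator c, keeping f's result.
thenChar :: Char -> Parser a -> Parser a
thenChar c f = f >>= (\r -> char c >> return r)

-- Short-cut for the most common separator.
thenSpace :: Parser a -> Parser a
thenSpace = thenChar ' '

-- Hypothetical helper (not in the post): read one word followed by a space.
firstWord :: String -> Either ParseError String
firstWord = parse (thenSpace (many1 letter)) "example"
```

Running firstWord on "abc def" should give back the word "abc", while an input with no space after the word should produce a parse error. As far as I can tell, thenChar c f behaves the same as f <* char c, so the helper is mostly there to give the pattern a readable name.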
Then I put that to use on parseAddress:
parseAddress = let
        hexStr2Int = Prelude.read . ("0x" ++)
    in do
        start <- thenChar '-' $ many1 hexDigit
        end <- many1 hexDigit
        return $ Address (hexStr2Int start) (hexStr2Int end)
Modifying the other parsing functions using thenChar and thenSpace is straightforward.
I’m not entirely sure I understand what Conal meant with the part about liftM in his comment. I suspect he’s referring to the fact that I first read characters and then convert them in the “constructors”. By using liftM I can move the conversion “up in the code”. Here’s parseAddress after I’ve moved the calls to hexStr2Int:
parseAddress = let
        hexStr2Int = Prelude.read . ("0x" ++)
    in do
        start <- liftM hexStr2Int $ thenChar '-' $ many1 hexDigit
        end <- liftM hexStr2Int $ many1 hexDigit
        return $ Address start end
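One possible next step in the same direction, and this is only my guess, is to drop the do block entirely and combine the two lifted parsers with liftM2. The sketch below assumes Parsec 3 and an Address type with two Integer fields (the real definition is in the previous post); hexNumber is a hypothetical helper name.

```haskell
import Control.Monad (liftM, liftM2)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Assumed shape of the Address type; the real one is in the previous post.
data Address = Address Integer Integer deriving (Show, Eq)

thenChar :: Char -> Parser a -> Parser a
thenChar c f = f >>= (\r -> char c >> return r)

-- Hypothetical helper: one or more hex digits, converted to a number.
hexNumber :: Parser Integer
hexNumber = liftM (read . ("0x" ++)) (many1 hexDigit)

-- Same grammar as the do version: hex number, '-', hex number.
parseAddress :: Parser Address
parseAddress = liftM2 Address (thenChar '-' hexNumber) hexNumber
```

With this version, parsing "08048000-08056000" should yield Address 0x08048000 0x08056000. Whether it reads better than the do version is a matter of taste.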
After modifying the other parsing functions in a similar way I ended up with this:
parsePerms = let
        cA a = case a of
            'p' -> Private
            's' -> Shared
    in do
        r <- liftM (== 'r') anyChar
        w <- liftM (== 'w') anyChar
        x <- liftM (== 'x') anyChar
        a <- liftM cA anyChar
        return $ Perms r w x a

parseDevice = let
        hexStr2Int = Prelude.read . ("0x" ++)
    in do
        maj <- liftM hexStr2Int $ thenChar ':' $ many1 hexDigit
        min <- liftM hexStr2Int $ many1 hexDigit
        return $ Device maj min

parseRegion = let
        hexStr2Int = Prelude.read . ("0x" ++)
        parsePath = (many1 $ char ' ') >> (many1 $ anyChar)
    in do
        addr <- thenSpace parseAddress
        perm <- thenSpace parsePerms
        offset <- liftM hexStr2Int $ thenSpace $ many1 hexDigit
        dev <- thenSpace parseDevice
        inode <- liftM Prelude.read $ thenSpace $ many1 digit
        path <- parsePath <|> string ""
        return $ MemRegion addr perm offset dev inode path
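To see all the pieces working together, here is a self-contained sketch that runs the region parser on one line. It assumes Parsec 3; the data types are my assumptions about their shape (the real ones are in the previous post), and sampleLine is a made-up line in the maps format. I have hoisted hexStr2Int to the top level and renamed min to mnr to avoid shadowing the Prelude function.

```haskell
import Control.Monad (liftM)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Assumed data types; the real definitions are in the previous post.
data Address = Address Integer Integer deriving (Show, Eq)
data Access = Private | Shared deriving (Show, Eq)
data Perms = Perms Bool Bool Bool Access deriving (Show, Eq)
data Device = Device Integer Integer deriving (Show, Eq)
data MemRegion = MemRegion Address Perms Integer Device Integer String
    deriving (Show, Eq)

thenChar :: Char -> Parser a -> Parser a
thenChar c f = f >>= (\r -> char c >> return r)

thenSpace :: Parser a -> Parser a
thenSpace = thenChar ' '

hexStr2Int :: String -> Integer
hexStr2Int = read . ("0x" ++)

parseAddress :: Parser Address
parseAddress = do
    start <- liftM hexStr2Int $ thenChar '-' $ many1 hexDigit
    end <- liftM hexStr2Int $ many1 hexDigit
    return $ Address start end

parsePerms :: Parser Perms
parsePerms = let
        -- Partial, as in the post: any other character is a runtime error.
        cA a = case a of
            'p' -> Private
            's' -> Shared
    in do
        r <- liftM (== 'r') anyChar
        w <- liftM (== 'w') anyChar
        x <- liftM (== 'x') anyChar
        a <- liftM cA anyChar
        return $ Perms r w x a

parseDevice :: Parser Device
parseDevice = do
    maj <- liftM hexStr2Int $ thenChar ':' $ many1 hexDigit
    mnr <- liftM hexStr2Int $ many1 hexDigit
    return $ Device maj mnr

parseRegion :: Parser MemRegion
parseRegion = let
        -- The path is optional: padding spaces, then the rest of the line.
        parsePath = many1 (char ' ') >> many1 anyChar
    in do
        addr <- thenSpace parseAddress
        perm <- thenSpace parsePerms
        offset <- liftM hexStr2Int $ thenSpace $ many1 hexDigit
        dev <- thenSpace parseDevice
        inode <- liftM read $ thenSpace $ many1 digit
        path <- parsePath <|> string ""
        return $ MemRegion addr perm offset dev inode path

-- Made-up sample line in the maps format.
sampleLine :: String
sampleLine = "08048000-08056000 r-xp 00000000 03:0c 64593      /usr/sbin/gpm"
```

Calling parse parseRegion "maps" sampleLine should return a MemRegion with the address range, the r-x permissions marked Private, offset 0, device 3:12, inode 64593 and the path /usr/sbin/gpm. Note that thenSpace on the inode requires a space after it, so a line that ends directly after the inode would be rejected by this sketch.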
Is this code more “functional”? Is it easier to read? You’ll have to be the judge of that…
Conal, if I got the intention of your comment completely wrong then feel free to tell me I’m an idiot.