Posts tagged ‘enumerator’

Trying to work out iteratees

A few days ago I decided to explore the idea of using iteratee to do IO in Haskell. I read most of what Oleg has written on input processing using left-fold enumerators. Only a little wiser I took a look at the Iteratee IO package on Hackage. Unfortunately it still hadn’t quite sunk in. To be honest I couldn’t quite make heads or tails of it. Often just lining up the types properly will just work, even if I don’t understand whyi and soon after I usually gain some sort of understanding. This strategy didn’t seem to work with this particular package though :(

Somewhat based on Don’s answer to my question on Stackoverflow.com I thought I’d try to work through an implementation of my own. I just really hope I’ve got it right :)

I must admit I used Iteratee for inspiration at times. However, I couldn’t copy it straight off since I decided to first implement iteratee without involving monads. Sure, it’s rather silly to do “input processing” in Haskell without involving monads, but I find that including monads often clouds the problem at hand. So, I left them out to begin with, and add them again once I feel I know what I’m doing. So, here goes nothing…

The basic idea is to process a stream of data, presented in chunks. Each chunk is handed to an iteratee by an enumerator. For each chunk the iteratee signals to the enumerator whether it’s done or requires more data. These are the types I cooked up for this.

data Stream c e
    = Eof
    | Chunk (c e)
 
data Stepper c e a
    = Done a (Stream c e)
    | NeedAnotherChunk (Iteratee c e a)
 
data Iteratee c e a = Iteratee
    { runIteratee :: (Stream c e) -> (Stepper c e a) }
 
type Enumerator c e a = Iteratee c e a -> Iteratee c e a

I found it rather useful to implement Show for the two first, but I’ll leave that out of this post since it’s a simple thing to do.

I should probably point out that the container type for the stream (that’s the ‘c’ in Stream c e) is rather pointless in what I’ve done; it’s always going to be [] in this post. Keeping it in does provide some more similarity with the Iteratee on Hackage.

At this point I jumped to turning a list into an enumerator. The way I implemented it the list is split up in chunk of 3 items and present each chunk in order to the passed in iteratee.

enumList l iter = loop grouped iter
    where
        grouped = groupList 3 l
 
        groupList n l = let
                (p, r) = splitAt n l
            in case p of
                [] -> []
                xs -> xs:groupList n r

The loop function is the main part. Ending when the chunks are all used up is the easy part:

        loop [] i = let
                s = runIteratee i Eof
            in case s of
                Done v str -> Iteratee $ \ _ -> s
                NeedAnotherChunk i -> i

It is arguably an error if the iteratee returns NeedAnotherChunk when passed an Eof stream, but for now I’ll leave it the way it is. Doing the recursive step is very similar:

        loop (x:xs) i = let
                s = runIteratee i (Chunk x)
            in case s of
                Done v str -> Iteratee $ \ _ -> s
                NeedAnotherChunk i' -> loop xs i'

Here it is worth noticing that the iteratee is expected to return any part of the chunk that wasn’t processed.

Next I coded up my first iteratee, a map over the stream:

iFMap f = let
        doAcc acc Eof = Done acc Eof
        doAcc acc (Chunk i) = NeedAnotherChunk $ Iteratee $ doAcc (acc `mappend` (fmap f i))
    in Iteratee $ doAcc mempty

Now I can run the iterator over a list enumerator:

> runIteratee (enumList [1..9] (iFMap (*2))) Eof
Stepper Done <<[2,4,6,8,10,12,14,16,18]>> <<Stream: Eof>>

I found that a bit verbose, especially for interactive experimentation so the following simplifies it a bit

run iter = case (runIteratee iter) Eof of
    Done a _ -> a
    NeedAnotherChunk _ -> error "run: Iterator didn't finish on Eof"

As Oleg pointed out in his writings it turns out that Iteratee c e is a monad.

instance Monad (Iteratee c e) where
    return x = Iteratee $ \ s -> Done x s

The implementation of return is obvious, there really isn’t any other option than to encode is a continuation that returns Done irrespective of what is offered, and passes along the rest of the stream no matter what’s passed in. Bind (>>=) is a bit more complicated:

    i >>= f = Iteratee $ \ s ->
        let
            c = runIteratee i s
        in case c of
            Done v str -> runIteratee (f v) str
            NeedAnotherChunk i' -> NeedAnotherChunk $ i' >>= f

My understanding is that the left iteratee should be stepped along until it returns Done, at that point the result is passed to the right-side function, which results in an iteratee. The rest of the stream is then passed on to the new iteratee.

I implemented two other iteratees, iDrop :: Int -> Iteratee [] e () and iTakeWhile :: (e -> Bool) -> Iteratee [] e [e] with the obvious implementations. This then allows me to write a little test like this:

iTest = do
    iDrop 2
    t <- iTakeWhile (< 5)
    a <- return 'c'
    m <- iFMap (+ 3)
    return (t, a, m)

Running it gives the expected result:

> run (enumList [1..9] iTest)
([3,4],'c',[8,9,10,11,12])

That’s pretty much it. Not that much to it. At least not as long as I’ve actually understood iteratees.

  1. This reminds me of a math professor I had at university who said something like: “When it feels like the pen understands more than your head, persevere! It means you are close to getting it.”[back]