Trying to work out iteratees

A few days ago I decided to explore the idea of using iteratee to do IO in Haskell. I read most of what Oleg has written on input processing using left-fold enumerators. Only a little wiser I took a look at the Iteratee IO package on Hackage. Unfortunately it still hadn’t quite sunk in. To be honest I couldn’t quite make heads or tails of it. Often just lining up the types properly will just work, even if I don’t understand whyi and soon after I usually gain some sort of understanding. This strategy didn’t seem to work with this particular package though :(

Somewhat based on Don’s answer to my question on Stackoverflow.com I thought I’d try to work through an implementation of my own. I just really hope I’ve got it right :)

I must admit I used Iteratee for inspiration at times. However, I couldn’t copy it straight off since I decided to first implement iteratee without involving monads. Sure, it’s rather silly to do “input processing” in Haskell without involving monads, but I find that including monads often clouds the problem at hand. So, I left them out to begin with, and add them again once I feel I know what I’m doing. So, here goes nothing…

The basic idea is to process a stream of data, presented in chunks. Each chunk is handed to an iteratee by an enumerator. For each chunk the iteratee signals to the enumerator whether it’s done or requires more data. These are the types I cooked up for this.

data Stream c e
    = Eof
    | Chunk (c e)
 
data Stepper c e a
    = Done a (Stream c e)
    | NeedAnotherChunk (Iteratee c e a)
 
data Iteratee c e a = Iteratee
    { runIteratee :: (Stream c e) -> (Stepper c e a) }
 
type Enumerator c e a = Iteratee c e a -> Iteratee c e a

I found it rather useful to implement Show for the two first, but I’ll leave that out of this post since it’s a simple thing to do.

I should probably point out that the container type for the stream (that’s the ‘c’ in Stream c e) is rather pointless in what I’ve done; it’s always going to be [] in this post. Keeping it in does provide some more similarity with the Iteratee on Hackage.

At this point I jumped to turning a list into an enumerator. The way I implemented it the list is split up in chunk of 3 items and present each chunk in order to the passed in iteratee.

enumList l iter = loop grouped iter
    where
        grouped = groupList 3 l
 
        groupList n l = let
                (p, r) = splitAt n l
            in case p of
                [] -> []
                xs -> xs:groupList n r

The loop function is the main part. Ending when the chunks are all used up is the easy part:

        loop [] i = let
                s = runIteratee i Eof
            in case s of
                Done v str -> Iteratee $ \ _ -> s
                NeedAnotherChunk i -> i

It is arguably an error if the iteratee returns NeedAnotherChunk when passed an Eof stream, but for now I’ll leave it the way it is. Doing the recursive step is very similar:

        loop (x:xs) i = let
                s = runIteratee i (Chunk x)
            in case s of
                Done v str -> Iteratee $ \ _ -> s
                NeedAnotherChunk i' -> loop xs i'

Here it is worth noticing that the iteratee is expected to return any part of the chunk that wasn’t processed.

Next I coded up my first iteratee, a map over the stream:

iFMap f = let
        doAcc acc Eof = Done acc Eof
        doAcc acc (Chunk i) = NeedAnotherChunk $ Iteratee $ doAcc (acc `mappend` (fmap f i))
    in Iteratee $ doAcc mempty

Now I can run the iterator over a list enumerator:

> runIteratee (enumList [1..9] (iFMap (*2))) Eof
Stepper Done <<[2,4,6,8,10,12,14,16,18]>> <<Stream: Eof>>

I found that a bit verbose, especially for interactive experimentation so the following simplifies it a bit

run iter = case (runIteratee iter) Eof of
    Done a _ -> a
    NeedAnotherChunk _ -> error "run: Iterator didn't finish on Eof"

As Oleg pointed out in his writings it turns out that Iteratee c e is a monad.

instance Monad (Iteratee c e) where
    return x = Iteratee $ \ s -> Done x s

The implementation of return is obvious, there really isn’t any other option than to encode is a continuation that returns Done irrespective of what is offered, and passes along the rest of the stream no matter what’s passed in. Bind (>>=) is a bit more complicated:

    i >>= f = Iteratee $ \ s ->
        let
            c = runIteratee i s
        in case c of
            Done v str -> runIteratee (f v) str
            NeedAnotherChunk i' -> NeedAnotherChunk $ i' >>= f

My understanding is that the left iteratee should be stepped along until it returns Done, at that point the result is passed to the right-side function, which results in an iteratee. The rest of the stream is then passed on to the new iteratee.

I implemented two other iteratees, iDrop :: Int -> Iteratee [] e () and iTakeWhile :: (e -> Bool) -> Iteratee [] e [e] with the obvious implementations. This then allows me to write a little test like this:

iTest = do
    iDrop 2
    t <- iTakeWhile (< 5)
    a <- return 'c'
    m <- iFMap (+ 3)
    return (t, a, m)

Running it gives the expected result:

> run (enumList [1..9] iTest)
([3,4],'c',[8,9,10,11,12])

That’s pretty much it. Not that much to it. At least not as long as I’ve actually understood iteratees.

  1. This reminds me of a math professor I had at university who said something like: “When it feels like the pen understands more than your head, persevere! It means you are close to getting it.”[back]
Share

6 Comments

  1. If this is all there is to it, why is there so much hype? It’s just the standard stream fusion hylomorphism stuff with a left fold. Which is nice stuff, don’t get me wrong, but it’s fairly well known already. From all I’ve heard about iteratees (not having read Oleg yet) it seems like there should be more to it, e.g. why all the bugs/issues keeping it from official release?

  2. @wren, Yes, that seems to be it indeed. I don’t quite understand what you mean with “keeping it from official release”. There’s the work to bring stream fusion into GHC ticket #915, but what I wrote is more related to Oleg’s iteratee-based IO. I thought the two things were completely separate, and that the latter wasn’t related to GHC development at all (hence the official release would be whatever can be found on Hackage). Of course I could be wrong, am I?

    I do have a limited implementation of iteratee-based IO (only ‘I’ really), which is derived from what I have in this post. I’m not sure it’s worth posting though.

  3. Hi Magnus, in my opinion it looks like you’ve got it. At least as well as I do anyway.

    Did you look at the “Examples” directory of the iteratee package? You’ll need to download the tarball and unpack it, but those examples are relatively simple. wave_reader.hs is the shortest, but headers.hs is probably more familiar to most programmers. If you did look at these and they weren’t helpful, I would appreciate knowing about it so I can improve them for the next release. Or if you could tell me where you got stuck at the beginning, maybe I can address that.

    As you’ve likely figured out, there’s actually a lot that can be done without including monads in the iteratee type. If you have an enumerator defined as

    type EnumeratorM m c e a = Iteratee c e a -> m (Iteratee c e a)

    you can do input. The IterateeG type in the iteratee package is actually a monad transformer, which allows for some very interesting applications. I’ve considered changing the iteratee package to provide pure iteratees as well as the monadic variety, but I don’t know that they would find much use. You need the monad to stream output, for example.

    wren, I think the biggest thing about iteratees is that not many people understand them. I/O in Haskell isn’t quite right yet, and I think most programmers are aware of this either intuitively or explicitly. So they think enumerator I/O could be the Holy Grail that will solve all their problems. The Oleg label adds to the mystique. There are advantages to enumerator I/O, but there are also drawbacks that anyone who tries to use them will surely discover sooner or later. Anyway, iteratees do offer some good features. They’re composable, have predictable memory usage, and allow short-cut evaluation of streams. They also fully encapsulate the handle so it cannot leak and is GC’d when the iteratee finishes processing, either naturally or on exceptions. Finally they can be very efficient; my iter-audio package (not on hackage yet, but public darcs repo) is the fastest Haskell wave library I know of. It’s faster than the hsndfile binding to libsndfile, which is the only real competitor performance-wise.

    Also, I don’t know what you mean about an “official release” either. Don released “liboleg”, which is basically a cabal-ization of Oleg’s releases, and the “iteratee” package is released too. If you know of any bugs/problems with “iteratee”, please report them to the maintainer, because right now the trac instance is empty :)

  4. @John, I guess I can comfortably say that I’ve got it then :-)

    Yes, I had a look at the Examples directory and found both the files you mention. I’ve so far only taken a look at headers.hs. Now I find it rather a lot easier to see what is going on and I realise that what I had problems with was actually more related to Char and streams. What I tried to do was to call readLines in a fileDriver and expected to get a list of all lines. That failed miserably, and all I got was Left []. After writing this post, and reading headers.hs again I realised I needed to map chr . fromIntegral over the stream first. That got me to at least get some output, though I was still surprised (see below).

    The main issue I’d try to address is simple examples along the lines of reading and printing (to the screen) a file, line by line and char by char.

    I haven’t studied the API extensively (yet) but I’ve already found a few things that surprised me, I’d address those as the second issue. What I’m thinking of is readLines, which for some reason terminates on an empty line and breaks on EOF. The same goes for enumLines. They both seem to be geared to usage for parsing HTTP, but their general-sounding names don’t give that away.

    The third issue as I see it is that there are users of the library mixed in with the library itself. Here I’m thinking of Data.Iteratee.Codecs.*. They are nice to have, but I’d argue they deserve their own little package.

  5. Sorry for the late reply; been busy with family stuff recently.

    I agree that the issues you mention with iteratee do all need to be dealt with. Reading strings (as you tried to do) is probably the most important. I don’t have a great solution yet, but I’m working on it. Thanks for the other issues as well. Solutions are planned for all of them, just a case of limited developer time.

  6. I am following your blog to write an Iteratee. But I change the “c” in Stream c e to be list by default. My implementation of takeWhileI seems not right. Could you give the implementation of it? Thanks. takeWhileI :: Int -> Iteratee e [e] takeWhileI 0 = return [] takeWhileI n = do x <- readOne xs <- takeWhileI (n-1) return (x:xs)

    readOne :: Iteratee e e readOne = Iteratee $ \s -> case s of Eof -> error “need more” Chunk [] -> error “need more1″ Chunk (x:xs) -> Done x (Chunk (xs))

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>