Timing out in conduit

The previous post in my play-with-state-machines-and-conduit series was a bit of a detour into how to combine inputs in conduit. In this post I’ll try to wrap it up by making a conduit pipeline with a timeout.

Prelude

This post doesn’t reflect my first attempt at writing a state machine that times out. My first attempt was to actually write a state machine that timed out. I combined the input from the user with a tick generated once a second into a single event using Either. Each state of the machine then needed to keep track of the number of ticks it had seen in order to time out on seeing the fourth one. The counter was reset on each state transition. Obviously this led to a rather complicated state machine.

Then I decided to attempt to explore having more but simpler parts in the conduit pipeline. I think that strategy resulted in a much cleaner and simpler design.

A state machine for timing out

The state machine for the addition is unchanged from the earlier post and the timing out is added via a second state machine:

data TOState = TOGood Int | TOTimedOut
    deriving (Eq, Show)

data TOSignal a = TONoop | TOVal a | TOTime

toMachine :: Machine TOState (Either t a) (TOSignal a)
toMachine = createMachine (TOGood 0) go
    where
        go (TOGood _) (Right e) = (TOGood 0, TOVal e)

        go (TOGood n) (Left _)
            | n > 3 = (TOTimedOut, TOTime)
            | otherwise = (TOGood (n + 1), TONoop)

        go TOTimedOut _ = (TOTimedOut, TOTime)

It’s a tiny machine of only two states. The ticks come in via Left _ values and cause the counter to be increased each time. The Right e values are simply re-wrapped and passed on together with a resetting of the counter.

The process

The process is built up by combing the users input with a ticker source, the combination is passed through the time-out machine. After that the signal is inspected to see if it’s time to terminate, if not the inner value is extracted so it can be shown to the user.

The ticker source is a source that repeatedly waits followed by yielding a Tick:

sourceTick :: MonadIO m => Int -> Source m Tick
sourceTick w = forever $ do
    liftIO $ threadDelay w
    C.yield (Tick w)

The full process can then be written like this:

process :: ResourceT IO ()
process = ticker source $= timerOuter $= calcRes $$ output
    where
        source = stdinC $= concatMapC unpack $= calculator $= mapC Right

        calculator = wrapMachine calcMachine

        ticker s = whyTMVar s tickSource =$= wrapMachine toMachine
            where
                tickSource = sourceTick 1000000 $= mapC Left

        timerOuter = takeWhileC p1 =$= filterC p2 =$= mapC u
            where
                p1 TOTime = False
                p1 _ = True

                p2 (TOVal _) = True
                p2 _ = False

                u (TOVal a) = a
                u _ = undefined -- partial, but all right

        calcRes = mapC f
            where
                f CalcNothing = ""
                f (CalcResult n) = " =" ++ show n ++ "\n"
                f e@(CalcError {}) = "\n*** " ++ show e ++ "\n"

        output = mapC pack =$ stdoutC

Using the same definition of main as from the earlier post results in the following behaviour:

% runhaskell CalculatorTimeOut.hs
5 6 + =11
5 6 7
*** CalcError (ReadOperator 5 6) "Bad operator"
42 17 + =59

The process can be terminated by either pressing Ctrl-C, just like before, or by waiting for a time-out, like I did above.

Combining inputs in conduit

The code the post on using my simplistic state machine together with conduit will happily wait forever for input, and the only way to terminate it is pressing Ctrl-C. In this series I’m writing a simple adder machine, but my actual use case for these ideas are protocols for communication (see the first post), and waiting forever isn’t such a good thing; I want my machine to time out if input doesn’t arrive in a timely fashion.

I can think of a few ways to achieve this:

  1. Use a watchdog thread that is signalled from the main thread, e.g. by a Conduit that sends a signal as each Char passes through it.
  2. As each Char is read kick off a “timeout thread” which throws an exception back to the main thread, unless the main thread kills it before the timeout expires.
  3. Run a thread that creates ticks that then can be combined with the Chars read and fed into the state machine itself.

Since this is all about state machines I’ve opted for option 3. Furthermore, since I’ve recently finished reading the excellent Parallel and Concurrent Programming in Haskell I decided to attempt writing the conduit code myself instead of using something like stm-conduit.

The idea is to write two functions:

  1. one Source to combine two Sources, and
  2. one Sink that writes its input into a TMVar.

The latter of the two is the easiest one. Given a TMVar it just awaits input, stores it in the TMVar and then calls itself:

sinkTMVar :: MonadIO m => TMVar a -> Sink a m ()
sinkTMVar tmv = forever $ do
    v <- await
    case v of
        Nothing -> return ()
        Just v' -> liftIO (atomically $ putTMVar tmv v')

The other one is only slightly more involved:

whyTMVar :: MonadIO m => Source (ResourceT IO) a -> Source (ResourceT IO) a -> Source m a
whyTMVar src1 src2 = do
    t1 <- liftIO newEmptyTMVarIO 
    t2 <- liftIO newEmptyTMVarIO
    void $ liftIO $ async $ fstProc t1
    void $ liftIO $ async $ sndProc t2
    forever $ liftIO (atomically $ takeTMVar t1 `orElse` takeTMVar t2) >>= C.yield
    
    where
        fstProc t = runResourceT $ src1 $$ sinkTMVar t
        sndProc t = runResourceT $ src2 $$ sinkTMVar t

Rather short and sweet I think. However, there are a few things that I’m not completely sure of yet.

forkIO vs. async vs. resourceForkIO
There is a choice between at least three functions when it comes to creating the threads and I haven’t looked into which one is better to use. AFAIU there may be issues around exception handling and with resources. For now I’ve gone with async for no particular reason at all.
Using TMVar
In this example the input arrives rather slowly, which means having room for a single piece at a time is enough. If the use case changes and input arrives quicker then this decision has to be revisited. I’d probably choose to use stm-conduit in that case since it uses TMChan internally.
Combining only two Sources
Again, this use case doesn’t call for more than two Sources, at least at this point. If the need arises for using more Sources I’ll switch to stm-conduit since it already defines a function to combine a list of Sources.

The next step will be to modify the conduit process and the state machine.

Thought on JavaScript

Over the holidays I’ve been reading Douglas Crockford’s excellent book JavaScript: The Good Parts. About halfway through I came to the conclusion that JavaScript is an “anti-LISP”. There are many reasons to learn LISP, but none of them is “LISP is widely used in industry.” As Eric Raymond is famous words claim, knowing LISP will make you a better programmer. On the other hand there seems to be almost no reasons to learn JavaScript. It sure doesn’t seem to teach anything that’ll make you a better programmer. The only reason I can come up with is “JavaScript is widely used in industry.”

State machine with conduit

Previously I wrote about a simple state machine in Haskell and then offered a simple example, an adder. As I wrote in the initial post I’ve been using this to implement communication protocols. That means IO, and for that I’ve been using conduit. In this post I thought I’d show how easy it is to wrap up the adder in conduit, read input from stdin and produce the result on stdout.

At the top level is a function composing three parts:

  1. input from stdin
  2. processing in the adder
  3. output on stdout

This can be written as this

process :: ResourceT IO ()
process = source $= calc $$ output
    where
        source = stdinC

        output = stdoutC

        calc = concatMapC unpack =$= wrapMachine calcMachine =$= filterC f =$= mapC (pack . show)
            where
                f (CalcResult _) = True
                f _ = False

The input to the adder is ordinary Chars so the ByteStrings generated by stdinC need to be converted before being handed off to the state machine, which is wrapped to make it a Conduit (see below). The state machine is generating more signals that I want printed, so I filter out everything but the CalcResults. Then it’s passed to show and turned into a ByteString before sent to stdoutC.

I think the implementation of wrapMachine shows off the simplicity of conduit rather well:

wrapMachine :: (Monad m) => Machine state event signal -> Conduit event m signal
wrapMachine mach = do
    maybeEvent <- await
    case maybeEvent of
        Nothing -> return ()
        Just event -> let
                (newMach, signals) = stepMachine mach event
            in do
                yield signals
                wrapMachine newMach

The main function is as short as this:

main :: IO ()
main = do
    hSetBuffering stdin NoBuffering
    hSetBuffering stdout NoBuffering
    runResourceT process

The result is about as exciting as expected

% runhaskell Calculator.hs
56 42 +CalcResult 98
17 56 +CalcResult 73

It never exits on its own so one has to use Ctrl-C to terminate. I thought I’d try to address that in the next post by adding a time-out to the adder machine.

Adder as a state machine

In my previous post I wrote about a, probably rather naïve, approach to constructing state machines in Haskell. I ended it with a saying that the pattern matching of Haskell makes it rather simple to manually write the step function required to create a working state machines. Hopefully this post can convince you I’m right.

The vehicle I’ve chosen is a very simple machine capable of reading two integers and adding them. In slightly more detail it’s a machine that:

  1. reads a total of three parts, two integers followed by a plus sign (+)
  2. each part is separated by whitespace
  3. on errors the machine resets and starts over

The signals (a.k.a. the output alphabet) is the following type

data CalcSignal = CalcNothing | CalcResult Int | CalcError CalcStates String
    deriving (Eq, Show)

The events (a.k.a. the input alphabet) is simply Char. The states are

data CalcStates = Start | ReadFirst Int | ReadSecond Int Int | ReadOperator Int Int
    deriving (Eq, Show)

where the names hopefully are self explanatory. The type of the machine is then

type CalcMachine = Machine CalcStates Char CalcSignal

The machine itself can then be written like this:

calcMachine :: CalcMachine
calcMachine = createMachine Start go
    where
        go Start e
            | isNumber e = (ReadFirst (read [e]), CalcNothing)
            | otherwise = (Start, CalcError Start "No number")

        go s@(ReadFirst i) e
            | isNumber e = (ReadFirst (10 * i + read [e]), CalcNothing)
            | isSpace e = (ReadSecond i 0, CalcNothing)
            | otherwise = (Start, CalcError s "Bad format")

        go s@(ReadSecond l i) e
            | isNumber e = (ReadSecond l (10 * i + read [e]), CalcNothing)
            | isSpace e = (ReadOperator l i, CalcNothing)
            | otherwise = (Start, CalcError s "Bad format")

        go s@(ReadOperator i j) e
            | e == '+' = (Start, CalcResult (i + j))
            | isSpace e = (s, CalcNothing)
            | otherwise = (Start, CalcError s "Bad operator")

That’s rather simple and easy to read I find. Though I’m not sure it scales too well to much larger machines. I’ve not really used any DSLs to create large machines either, so I don’t know how well any method scales ;)

To do a bit of exploratory testing it’s handy to create the following function

calculate :: String -> IO ()
calculate = foldM_ go calcMachine
    where 
        go mach c = do
            let (m, s) = stepMachine mach c
            print s 
            return m

Using that function it’s easy to check if the machine works as intended.

> calculate "56 67 +"
CalcNothing
CalcNothing
CalcNothing
CalcNothing
CalcNothing
CalcNothing
CalcResult 123

So far so good. What about the behaviour on errors?

> calculate "5a6 67 +"
CalcNothing
CalcError (ReadFirst 5) "Bad format"
CalcNothing
CalcNothing
CalcNothing
CalcNothing
CalcNothing
CalcResult 73

That looks good enough to me. Though there is (at least) one detail of how the machine works that might be surprising and hence should be fixed, but I’ll leave that as an exercise for the reader ;)

As I mentioned in the previous post I’ve been using this method for writing state machines to implement two different protocols. For the IO I used conduit which means I had to turn the state machine into a conduit. I’ll write about that in a later post though.

Simple state machines in Haskell

During this past autumn I’ve had to work with products implementing two small and simple protocols, one was USB based, the other completely proprietary. In both cases I found it very helpful to write clients where the central behaviour was modelled as a state machine. In the case of the proprietary protocol I’ve ended up writing a couple of servers implementing a subset of the behaviour of the real product, also here I found using a state machine as the central piece to be very useful.

I started out having a look at [Hackage][] to see if there was already some tool or DSL that would help me create machines. Surprisingly I found nothing, which probably just means I didn’t look carefully enough. Anyway, after reading the Wikipedia articles on Finite-state machine, finite state transducer, Moore machine and Mealy machine I decided to write my own naïve implementation of something resembling Mealy machines.

A Mealy machine is a 6-tuple comprising

  • a finite set of states
  • a start state
  • a finite set called the input alphabet
  • a finite set called the output alphabet
  • a transition function mapping a state and an input symbol to the next state
  • an output function mapping a state and an input symbol to an output symbol

If the transition and output functions are combined into a single function, and we don’t care about being able to reset a machine, it’s possible to use a representation as simple as this:

data Machine state event signal = Machine
    { mCurState :: state
    , mTransFunction :: state -> event -> (state, signal)
    }

I’ve opted to call the input alphabet event and the output alphabet signal.

Stepping a machine is then performed by passing the current state and an event to the combined function and updating the machine with the result. Of course the generated signal has to be dealt with too. Or in other words:

stepMachine :: Machine state event signal -> event -> (Machine state event signal, signal)
stepMachine machine event = (machine {mCurState = newState}, output)
    where
        curState = mCurState machine
        (newState, output) = mTransFunction machine curState event

I also decided to add a function for creating a machine given a start state and a function. With the definition above it becomes trivial:

createMachine :: state -> (state -> event -> (state, signal)) -> Machine state event signal
createMachine = Machine

That’s it, really. Except of course that the actual transition function still has to be written. However, with the pattern matching of Haskell I’ve found that to be rather simple, but I’ll keep that for the next post.

Adding some strictness

I’ve finally found some time to solve a few more exercises on exercim.io. One exercise, called Robot Simulation, I solved in a straight forward way and promptly got the feedback that my choice of data type could lead to space leaks. Since it’s a dimension of Haskell programming that I rarely (read “never”) have had to think about I thought I’d take the chance to play a bit.

A simplified solution

I don’t want to give away the solution to the exercise so I’m using some code of a similar shape.

import Data.List

newtype P = P (Int, Int)
    deriving (Eq, Show)

step :: P -> Char -> P
step (P c) 'A' = P (c `add` (0, 1))
step _ _ = undefined

add :: (Num a, Num b) => (a, b) -> (a, b) -> (a, b)
add (x0, y0) (x1, y1) = (x0 + x1, y0 + y1)

p :: P
p = foldl' step (P (0, 0)) $ replicate 5000000 'A'

main :: IO ()
main = print p

There’s nothing complicated about the code, but the memory usage is rather ridiculous:

   1,132,746,080 bytes allocated in the heap
   1,149,823,448 bytes copied during GC
     346,747,064 bytes maximum residency (11 sample(s))
       3,817,608 bytes maximum slop
             600 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      2148 colls,     0 par    0.58s    0.58s     0.0003s    0.0006s
  Gen  1        11 colls,     0 par    0.56s    0.56s     0.0510s    0.2278s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.25s  (  0.25s elapsed)
  GC      time    1.15s  (  1.14s elapsed)
  EXIT    time    0.02s  (  0.02s elapsed)
  Total   time    1.42s  (  1.41s elapsed)

  %GC     time      80.9%  (81.0% elapsed)

  Alloc rate    4,593,597,274 bytes per MUT second

  Productivity  19.0% of total user, 19.1% of total elapsed

Surely it ought to be possible to add 5 million numbers in less than 600MB, right?

The reason, AFAIU is that tuples are lazy, which means a lot of thunk building when doing the additions.

Using a stricter data type

The maybe easiest way to add some strictness is to add a few exclamations marks in the data type. Modifying the data type brings a few other changes though, so here’s first a version without any strictness annotations so I have something to compare with later on1:

{-# LANGUAGE RecordWildCards #-}
import Data.List

data P = P { pX :: Int, pY :: Int }
    deriving (Eq, Show)

p :: P
p = foldl' step (P 0 0) $ replicate 5000000 'A'

step :: P -> Char -> P
step q@(P {..}) 'A' = q { pX = pX +1 }
step _ _ = undefined

main :: IO ()
main = print p

This solution uses about half as much memory:

     766,417,952 bytes allocated in the heap
     639,064,600 bytes copied during GC
     155,866,064 bytes maximum residency (11 sample(s))
       1,914,016 bytes maximum slop
             345 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1460 colls,     0 par    0.32s    0.32s     0.0002s    0.0006s
  Gen  1        11 colls,     0 par    0.33s    0.33s     0.0303s    0.1071s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.15s  (  0.15s elapsed)
  GC      time    0.65s  (  0.65s elapsed)
  EXIT    time    0.01s  (  0.01s elapsed)
  Total   time    0.81s  (  0.81s elapsed)

  %GC     time      80.2%  (80.2% elapsed)

  Alloc rate    5,212,863,895 bytes per MUT second

  Productivity  19.8% of total user, 19.8% of total elapsed

Halving the memory usage is rather nice, but adding some strictness ought to make it better still. The code with added exclamation marks:

{-# LANGUAGE RecordWildCards #-}
import Data.List

data P = P { pX :: !Int, pY :: !Int }
    deriving (Eq, Show)

p :: P
p = foldl' step (P 0 0) $ replicate 5000000 'A'

step :: P -> Char -> P
step q@(P {..}) 'A' = q { pX = pX +1 }
step _ _ = undefined

main :: IO ()
main = print p

The runtime behaviour of this code is remarkably more frugal with memory:

     480,054,816 bytes allocated in the heap
          57,968 bytes copied during GC
          44,312 bytes maximum residency (2 sample(s))
          21,224 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0       918 colls,     0 par    0.00s    0.00s     0.0000s    0.0000s
  Gen  1         2 colls,     0 par    0.00s    0.00s     0.0000s    0.0000s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.09s  (  0.09s elapsed)
  GC      time    0.00s  (  0.00s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.09s  (  0.09s elapsed)

  %GC     time       2.4%  (2.4% elapsed)

  Alloc rate    5,630,441,644 bytes per MUT second

  Productivity  97.3% of total user, 98.2% of total elapsed

Indeed, 1MB is a bit more like it!

Strictness with the original type

However impressive the results when using a stricter data type, it might still be preferable to keep the original, somewhat simpler, type. It is possible to do just that by placing some explicit strictness in well chosen places. Since my suspicion is that it’s the addition in step that causes the space leak we’ll rewrite it:

import Data.List

import Control.DeepSeq

newtype P = P (Int, Int)
    deriving (Eq, Show)

add :: (Num a, Num b) => (a, b) -> (a, b) -> (a, b)
add (x0, y0) (x1, y1) = (x0 + x1, y0 + y1)

p :: P
p = foldl' step (P (0, 0)) $ replicate 5000000 'A'

step :: P -> Char -> P
step (P c) 'A' = let np = c `add` (0, 1) in np `deepseq` P np
step _ _ = undefined

main :: IO ()
main = print p

Note that it’s not enough to use seq; a bit more force is needed, which is what deepseq offers. It delivers sure enough:

     920,052,632 bytes allocated in the heap
         198,736 bytes copied during GC
          44,312 bytes maximum residency (2 sample(s))
          21,224 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1774 colls,     0 par    0.00s    0.00s     0.0000s    0.0000s
  Gen  1         2 colls,     0 par    0.00s    0.00s     0.0001s    0.0001s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.29s  (  0.29s elapsed)
  GC      time    0.00s  (  0.00s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.30s  (  0.30s elapsed)

  %GC     time       1.4%  (1.4% elapsed)

  Alloc rate    3,124,303,717 bytes per MUT second

  Productivity  98.5% of total user, 98.9% of total elapsed

A further slight change is possible: implement NFData for P itself. That way it’s not necessary to pull out and force evaluation of the innards of P. Using the default implementation of rnf did not cut it though, since it uses seq it just isn’t forceful enough:

import Data.List

import Control.DeepSeq

newtype P = P (Int, Int)
    deriving (Eq, Show)

instance NFData P where
    rnf (P q) = deepseq q ()

add :: (Num a, Num b) => (a, b) -> (a, b) -> (a, b)
add (x0, y0) (x1, y1) = (x0 + x1, y0 + y1)

p :: P
p = foldl' step (P (0, 0)) $ replicate 5000000 'A'

step :: P -> Char -> P
step (P c) 'A' = let np = P (c `add` (0, 1)) in force np
step _ _ = undefined

main :: IO ()
main = print p

The result is close to the previous:

     920,052,632 bytes allocated in the heap
         198,736 bytes copied during GC
          44,312 bytes maximum residency (2 sample(s))
          21,224 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1774 colls,     0 par    0.00s    0.00s     0.0000s    0.0000s
  Gen  1         2 colls,     0 par    0.00s    0.00s     0.0001s    0.0001s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.33s  (  0.33s elapsed)
  GC      time    0.00s  (  0.00s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.33s  (  0.33s elapsed)

  %GC     time       1.2%  (1.2% elapsed)

  Alloc rate    2,817,175,186 bytes per MUT second

  Productivity  98.7% of total user, 99.0% of total elapsed

It looks like it’s slightly slower though, which might be caused by the double call to deepseq in this version.

Conclusion

From this little example it’s clear that a little strictness in well-chosen places can go a long way in reducing memory footprint. The various solutions also show that the results can vary quite a bit depending on the strategy for introducing strictness. In this particular case I think the stricter data type (see Using a stricter data type above) is the best choice, it’s both faster and uses less memory than any of the others.


  1. I’m using RecordWildCards to simplify the code a bit, which possibly might influence evaluation as well. I’m simply not sure.

FP101x completed

TL;DR

As of last evening I finished the last exercise/homework of the course FP101x Introduction to Functional Programming. I mostly signed up for fun, but did intend all along to go through with it, watch the lectures, do the homework… in short, take the course ;)

Now that it’s done I’m happy I did. It gave about as much as I expected on the programming side, and it gave me some experience with using the edX MOOC. I will suggest the course to colleagues and I’m already looking around for the next course to take myself.

Thoughts about the course

The course was, on the whole, very good. It was well planned; I felt it started from a level suitable for people unfamiliar with FP and then progressed at a reasonable pace. It should be noted though that the lectures probably isn’t enough for someone who’s new to FP; reading the book or researching the subjects on one’s own is required to follow along.

There were a few things in the course that I don’t really see the point of though. The thing that stands out in particular is the 12th lecture that covers proofs by induction of functions that work on lists. I think it would have been worthwhile spending some time on making more “realistic programs.” Furthermore, picking the “Countdown Problem” as the topic of an entire lecture feels very “academic”. Personally I would have liked an example that comes a bit closer to a problem that better reflects that FP can be used to solve problems that occur in a professional developer’s work.

Thoughts about the edX MOOC

The edX site is fairly nice to work with. There are several things that I liked from the beginning:

  1. Several methods of logging in are offered, I opted to use my Google-plus account since I like to avoid creating yet another account if I can.
  2. Interacting with the site is natural and finding my way around was easy from the start.
  3. The page I land on after logging in shows my current courses and it’s easy to from there.
  4. There’s a page showing my progress in the course; that turned out to be the page I go to all the time since I can find the next lecture from there.

There are also a few things I find really irritating:

  1. In many of the exercises the code isn’t plain text but rather images; this makes it impossible to copy the code and paste it into a REPL.
  2. In some cases the layout of the code in the exercises, especially in the answer options, were very badly laid out. For instance in FP101x there are several question where the answer options are code and rather wordy, due to the layout the code becomes unnecessarily difficult to read. (And thanks to the previous issue with the site, it was not an option to copy the code and reformat it to make it easier to read.)

Regular Haskelling. How?

Ever since ICFP 2014 I’ve had as a goal to get into the habit of coding in Haskell. It’s been the language I enjoy most for a few years now, but being surrounded by and talking to so many brilliant developers as I did during that week really drove home that I will only have more fun the more code I write. My goal was not very ambitious; just write something in Haskell most days every week. So far I’ve managed to keep it up.

These are a few tricks I’ve used and they’ve worked well for me so far.

Just write, no matter what, just write

In ninth grade a rather successful Swedish author visited my school and what I remember most from that is one thing he said:

Just read! It doesn’t matter what. It doesn’t matter if what you read isn’t considered good literature; read Harlequin books if that’s what you like, read magazines, read comics. Just read!

I think the same holds for writing code; it’s only with practice that one gets comfortable expressing oneself in a particular language.

Fix warts

I can’t actually think of any piece of code I’ve written that doesn’t have some warts. It may be in the form of missing features, or quirks (bugs) in the implementation that forces the user to regularly work in a less-than-optimal way. I’ve found fixing warts in tools and libraries I use myself to be one of the most rewarding tasks to take on; the feedback is so immediate that every fix cause a boost in motivation to fix the next one.

Exercise sites

Sometimes it’s simply difficult to find the motivation to tackle working on an existing project, and inspiration for starting something new might be lacking too. This happens to me regularly, and I used to simply close the lid on the computer earlier, but now I try to find some exercises to do instead.

There are several sources of exercises. I know Project Euler is rather popular among new Haskellers, but there are others.

  • CodeEval is a site with problems in three different levels. It may be extra interesting for people in the US since some of the problems are sponsored by companies which seem to use the site as a place for recruiting. So far I’ve only seen American companies do that, but I suppose it might catch on in other parts of the world too. Haskell is one of several languages supported.
  • Exercism is both a site and a tool. The goal is to facilitate learning of languages. On first use the tool will download the first exercise, and after completion one uses it to upload the solution to the site. Once uploaded the solution is visible to other users, and they are allowed to “nitpick” (comment). After uploading a solution to one exercise the next exercise in the series becomes available. It supports a rather long list programming languages.

I like both of these, but I’ve spent more time on the latter one. Personally I find the idea behind Exercism very appealing and I’ve been recommending it to a couple of co-workers already.

Feel free to put links to other sources of exercises in the comments.

Simplify old code

With more practice comes more and more insights into what functions are available and how to string them together. When I don’t even feel like doing a full exercise on Exercism I just dig out something that smells a little and clean it up. Anything is fair game, no matter how tiny. Just take a look at my implementation of reThrowError.

What else?

I’d love to hear tips and tricks from other people who aren’t lucky enough to have a day job where they get to write Haskell. How do you keep up the learning and practice?

Dijkstra quotes from EWD1284

I recently read through this long post entitles Object Oriented Programming is an expensive disaster which must end. I have to agree I largely agree with what he writes, but I don’t think I ever could have written such a well-researched article, and absolutely never one of equal length ;)

It does include some nice quotes and references and so far I’ve only read one of the many that I bookmarked, Computing Science: Achievements and Challenges (EWD1284). It does include a few quips that, based on other articles I’ve read, seem fairly typical to Dijkstra. I simply love the way he expressed his opinions at times.

This one really ought to have been in the lengthy post on the OOP disaster:

After more than 45 years in the field, I am still convinced that in computing, elegance is not a dispensable luxury but a quality that decides between success and failure; in this connection I gratefully quote from The Concise Oxford Dictionary a definition of “elegant”, viz. “ingeniously simple and effective”. Amen. (For those who have wondered: I don’t think object-oriented programming is a structuring paradigm that meets my standards of elegance.)

And his thoughts on formal methods are well-known of course, as are his thoughts on iterative design. However, I rather like how he expresses a certain level of disgust of the Anglo-Saxon world when writing about those topics:

The idea of a formal design discipline is often rejected on account of vague cultural/philosophical condemnations such as “stifling creativity”; this is more pronounced in the Anglo-Saxon world where a romantic vision of “the humanities” in fact idealizes technical incompetence. Another aspect of that same trait is the cult of iterative design.

It amazes me every time I read something by someone like Dijkstra, just how much things stay the same, even in a field like computing science, which is said to be moving so fast.