Timing out in conduit

The previous post in my play-with-state-machines-and-conduit series was a bit of a detour into how to combine inputs in conduit. In this post I’ll try to wrap it up by making a conduit pipeline with a timeout.

Prelude

This post doesn’t reflect my first attempt at writing a state machine that times out. That first attempt baked the timeout into the state machine itself: I combined the input from the user with a tick generated once a second into a single event using Either. Each state of the machine then needed to keep track of the number of ticks it had seen in order to time out on seeing the fourth one, and the counter was reset on each state transition. Obviously this led to a rather complicated state machine.

Then I decided to explore having more, but simpler, parts in the conduit pipeline. I think that strategy resulted in a much cleaner and simpler design.

A state machine for timing out

The state machine for the addition is unchanged from the earlier post and the timing out is added via a second state machine:

data TOState = TOGood Int | TOTimedOut
    deriving (Eq, Show)

data TOSignal a = TONoop | TOVal a | TOTime

toMachine :: Machine TOState (Either t a) (TOSignal a)
toMachine = createMachine (TOGood 0) go
    where
        go (TOGood _) (Right e) = (TOGood 0, TOVal e)

        go (TOGood n) (Left _)
            | n > 3 = (TOTimedOut, TOTime)
            | otherwise = (TOGood (n + 1), TONoop)

        go TOTimedOut _ = (TOTimedOut, TOTime)

It’s a tiny machine of only two states. The ticks come in via Left _ values and cause the counter to be increased each time. The Right e values are simply re-wrapped and passed on together with a resetting of the counter.
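
A quick way to see the behaviour is to drive the machine by hand. The following driver is only an illustrative sketch, not part of the original code, and it assumes the Machine and stepMachine definitions from the earlier posts are in scope:

driveTO :: [Either t a] -> [TOSignal a]
driveTO = go toMachine
    where
        go _ [] = []
        go m (e:es) = let (m', s) = stepMachine m e in s : go m' es

Feeding it five consecutive ticks, driveTO (replicate 5 (Left ())), produces four TONoop signals followed by a TOTime.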

The process

The process is built up by combining the user's input with a ticker source; the combination is passed through the time-out machine. After that the signal is inspected to see if it's time to terminate, and if not, the inner value is extracted so it can be shown to the user.

The ticker source repeatedly waits and then yields a Tick:

sourceTick :: MonadIO m => Int -> Source m Tick
sourceTick w = forever $ do
    liftIO $ threadDelay w
    C.yield (Tick w)
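
The Tick type isn’t shown in the post; for the snippet to stand on its own, assume it’s a trivial wrapper along these lines:

data Tick = Tick Int
    deriving (Eq, Show)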

The full process can then be written like this:

process :: ResourceT IO ()
process = ticker source $= timerOuter $= calcRes $$ output
    where
        source = stdinC $= concatMapC unpack $= calculator $= mapC Right

        calculator = wrapMachine calcMachine

        ticker s = whyTMVar s tickSource =$= wrapMachine toMachine
            where
                tickSource = sourceTick 1000000 $= mapC Left

        timerOuter = takeWhileC p1 =$= filterC p2 =$= mapC u
            where
                p1 TOTime = False
                p1 _ = True

                p2 (TOVal _) = True
                p2 _ = False

                u (TOVal a) = a
                u _ = undefined -- partial, but all right

        calcRes = mapC f
            where
                f CalcNothing = ""
                f (CalcResult n) = " =" ++ show n ++ "\n"
                f e@(CalcError {}) = "\n*** " ++ show e ++ "\n"

        output = mapC pack =$ stdoutC

Using the same definition of main as from the earlier post results in the following behaviour:

% runhaskell CalculatorTimeOut.hs
5 6 + =11
5 6 7
*** CalcError (ReadOperator 5 6) "Bad operator"
42 17 + =59

The process can be terminated by either pressing Ctrl-C, just like before, or by waiting for a time-out, like I did above.

Combining inputs in conduit

The code in the post on using my simplistic state machine together with conduit will happily wait forever for input, and the only way to terminate it is by pressing Ctrl-C. In this series I’m writing a simple adder machine, but my actual use case for these ideas is communication protocols (see the first post), and waiting forever isn’t such a good thing; I want my machine to time out if input doesn’t arrive in a timely fashion.

I can think of a few ways to achieve this:

  1. Use a watchdog thread that is signalled from the main thread, e.g. by a Conduit that sends a signal as each Char passes through it.
  2. As each Char is read kick off a “timeout thread” which throws an exception back to the main thread, unless the main thread kills it before the timeout expires.
  3. Run a thread that creates ticks that then can be combined with the Chars read and fed into the state machine itself.

Since this is all about state machines I’ve opted for option 3. Furthermore, since I’ve recently finished reading the excellent Parallel and Concurrent Programming in Haskell I decided to attempt writing the conduit code myself instead of using something like stm-conduit.

The idea is to write two functions:

  1. one Source to combine two Sources, and
  2. one Sink that writes its input into a TMVar.

The latter of the two is the easier one. Given a TMVar it just awaits input, stores it in the TMVar and then loops:

sinkTMVar :: MonadIO m => TMVar a -> Sink a m ()
sinkTMVar tmv = forever $ do
    v <- await
    case v of
        Nothing -> return ()
        Just v' -> liftIO (atomically $ putTMVar tmv v')

The other one is only slightly more involved:

whyTMVar :: MonadIO m => Source (ResourceT IO) a -> Source (ResourceT IO) a -> Source m a
whyTMVar src1 src2 = do
    t1 <- liftIO newEmptyTMVarIO 
    t2 <- liftIO newEmptyTMVarIO
    void $ liftIO $ async $ fstProc t1
    void $ liftIO $ async $ sndProc t2
    forever $ liftIO (atomically $ takeTMVar t1 `orElse` takeTMVar t2) >>= C.yield
    
    where
        fstProc t = runResourceT $ src1 $$ sinkTMVar t
        sndProc t = runResourceT $ src2 $$ sinkTMVar t

Rather short and sweet I think. However, there are a few things that I’m not completely sure of yet.

forkIO vs. async vs. resourceForkIO
There is a choice between at least three functions when it comes to creating the threads and I haven’t looked into which one is better to use. AFAIU there may be issues around exception handling and with resources. For now I’ve gone with async for no particular reason at all.
Using TMVar
In this example the input arrives rather slowly, which means having room for a single piece at a time is enough. If the use case changes and input arrives quicker then this decision has to be revisited. I’d probably choose to use stm-conduit in that case since it uses TMChan internally.
Combining only two Sources
Again, this use case doesn’t call for more than two Sources, at least at this point. If the need arises for using more Sources I’ll switch to stm-conduit since it already defines a function to combine a list of Sources.
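
For completeness, generalising whyTMVar to any number of Sources is mostly mechanical. Here is a minimal, untested sketch that shares a single TMVar between all producer threads instead of using one per source:

whyTMVarN :: MonadIO m => [Source (ResourceT IO) a] -> Source m a
whyTMVarN srcs = do
    t <- liftIO newEmptyTMVarIO
    -- one producer thread per source, all writing to the same TMVar
    mapM_ (void . liftIO . async . runResourceT . ($$ sinkTMVar t)) srcs
    forever $ liftIO (atomically $ takeTMVar t) >>= C.yield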

The next step will be to modify the conduit process and the state machine.

Thought on JavaScript

Over the holidays I’ve been reading Douglas Crockford’s excellent book JavaScript: The Good Parts. About halfway through I came to the conclusion that JavaScript is an “anti-LISP”. There are many reasons to learn LISP, but none of them is “LISP is widely used in industry.” As Eric Raymond’s famous words claim, knowing LISP will make you a better programmer. On the other hand there seems to be almost no reason to learn JavaScript. It sure doesn’t seem to teach anything that’ll make you a better programmer. The only reason I can come up with is “JavaScript is widely used in industry.”

State machine with conduit

Previously I wrote about a simple state machine in Haskell and then offered a simple example, an adder. As I wrote in the initial post I’ve been using this to implement communication protocols. That means IO, and for that I’ve been using conduit. In this post I thought I’d show how easy it is to wrap up the adder in conduit, read input from stdin and produce the result on stdout.

At the top level is a function composing three parts:

  1. input from stdin
  2. processing in the adder
  3. output on stdout

This can be written like this:

process :: ResourceT IO ()
process = source $= calc $$ output
    where
        source = stdinC

        output = stdoutC

        calc = concatMapC unpack =$= wrapMachine calcMachine =$= filterC f =$= mapC (pack . show)
            where
                f (CalcResult _) = True
                f _ = False

The input to the adder is ordinary Chars so the ByteStrings generated by stdinC need to be converted before being handed off to the state machine, which is wrapped to make it a Conduit (see below). The state machine generates more signals than I want printed, so I filter out everything but the CalcResults. The result is then passed to show and turned into a ByteString before being sent to stdoutC.

I think the implementation of wrapMachine shows off the simplicity of conduit rather well:

wrapMachine :: (Monad m) => Machine state event signal -> Conduit event m signal
wrapMachine mach = do
    maybeEvent <- await
    case maybeEvent of
        Nothing -> return ()
        Just event -> let
                (newMach, signals) = stepMachine mach event
            in do
                yield signals
                wrapMachine newMach

The main function is as short as this:

main :: IO ()
main = do
    hSetBuffering stdin NoBuffering
    hSetBuffering stdout NoBuffering
    runResourceT process

The result is about as exciting as expected:

% runhaskell Calculator.hs
56 42 +CalcResult 98
17 56 +CalcResult 73

It never exits on its own so one has to use Ctrl-C to terminate. I thought I’d try to address that in the next post by adding a time-out to the adder machine.

Adder as a state machine

In my previous post I wrote about a, probably rather naïve, approach to constructing state machines in Haskell. I ended it by saying that the pattern matching of Haskell makes it rather simple to manually write the step function required to create a working state machine. Hopefully this post can convince you I’m right.

The vehicle I’ve chosen is a very simple machine capable of reading two integers and adding them. In slightly more detail it’s a machine that:

  1. reads a total of three parts, two integers followed by a plus sign (+)
  2. each part is separated by whitespace
  3. on errors the machine resets and starts over

The signals (a.k.a. the output alphabet) are given by the following type

data CalcSignal = CalcNothing | CalcResult Int | CalcError CalcStates String
    deriving (Eq, Show)

The events (a.k.a. the input alphabet) are simply Chars. The states are

data CalcStates = Start | ReadFirst Int | ReadSecond Int Int | ReadOperator Int Int
    deriving (Eq, Show)

where the names hopefully are self-explanatory. The type of the machine is then

type CalcMachine = Machine CalcStates Char CalcSignal

The machine itself can then be written like this:

calcMachine :: CalcMachine
calcMachine = createMachine Start go
    where
        go Start e
            | isNumber e = (ReadFirst (read [e]), CalcNothing)
            | otherwise = (Start, CalcError Start "No number")

        go s@(ReadFirst i) e
            | isNumber e = (ReadFirst (10 * i + read [e]), CalcNothing)
            | isSpace e = (ReadSecond i 0, CalcNothing)
            | otherwise = (Start, CalcError s "Bad format")

        go s@(ReadSecond l i) e
            | isNumber e = (ReadSecond l (10 * i + read [e]), CalcNothing)
            | isSpace e = (ReadOperator l i, CalcNothing)
            | otherwise = (Start, CalcError s "Bad format")

        go s@(ReadOperator i j) e
            | e == '+' = (Start, CalcResult (i + j))
            | isSpace e = (s, CalcNothing)
            | otherwise = (Start, CalcError s "Bad operator")

That’s rather simple and easy to read, I find, though I’m not sure it scales too well to much larger machines. I’ve not really used any DSLs to create large machines either, so I don’t know how well any method scales ;)

To do a bit of exploratory testing it’s handy to create the following function

calculate :: String -> IO ()
calculate = foldM_ go calcMachine
    where 
        go mach c = do
            let (m, s) = stepMachine mach c
            print s 
            return m

Using that function it’s easy to check if the machine works as intended.

> calculate "56 67 +"
CalcNothing
CalcNothing
CalcNothing
CalcNothing
CalcNothing
CalcNothing
CalcResult 123

So far so good. What about the behaviour on errors?

> calculate "5a6 67 +"
CalcNothing
CalcError (ReadFirst 5) "Bad format"
CalcNothing
CalcNothing
CalcNothing
CalcNothing
CalcNothing
CalcResult 73

That looks good enough to me. Though there is (at least) one detail of how the machine works that might be surprising and hence should be fixed, but I’ll leave that as an exercise for the reader ;)

As I mentioned in the previous post I’ve been using this method for writing state machines to implement two different protocols. For the IO I used conduit which means I had to turn the state machine into a conduit. I’ll write about that in a later post though.

Simple state machines in Haskell

During this past autumn I’ve had to work with products implementing two small and simple protocols: one was USB based, the other completely proprietary. In both cases I found it very helpful to write clients where the central behaviour was modelled as a state machine. In the case of the proprietary protocol I’ve ended up writing a couple of servers implementing a subset of the behaviour of the real product; here too I found using a state machine as the central piece to be very useful.

I started out having a look at Hackage to see if there was already some tool or DSL that would help me create machines. Surprisingly I found nothing, which probably just means I didn’t look carefully enough. Anyway, after reading the Wikipedia articles on Finite-state machine, finite state transducer, Moore machine and Mealy machine I decided to write my own naïve implementation of something resembling Mealy machines.

A Mealy machine is a 6-tuple comprising

  • a finite set of states
  • a start state
  • a finite set called the input alphabet
  • a finite set called the output alphabet
  • a transition function mapping a state and an input symbol to the next state
  • an output function mapping a state and an input symbol to an output symbol

If the transition and output functions are combined into a single function, and we don’t care about being able to reset a machine, it’s possible to use a representation as simple as this:

data Machine state event signal = Machine
    { mCurState :: state
    , mTransFunction :: state -> event -> (state, signal)
    }

I’ve opted to call the input alphabet event and the output alphabet signal.

Stepping a machine is then performed by passing the current state and an event to the combined function and updating the machine with the result. Of course the generated signal has to be dealt with too. Or in other words:

stepMachine :: Machine state event signal -> event -> (Machine state event signal, signal)
stepMachine machine event = (machine {mCurState = newState}, output)
    where
        curState = mCurState machine
        (newState, output) = mTransFunction machine curState event
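
To see how stepping works, here is a tiny made-up machine, not from the post, that counts 'a' characters and signals the running count on every event:

-- state: the count so far; event: a Char; signal: the count after the event
countA :: Machine Int Char Int
countA = Machine 0 go
    where
        go n 'a' = (n + 1, n + 1)
        go n _ = (n, n)

Stepping it with stepMachine countA 'a' gives back a machine in state 1 together with the signal 1; stepping that machine with 'b' keeps the state at 1 and signals 1 again.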

I also decided to add a function for creating a machine given a start state and a function. With the definition above it becomes trivial:

createMachine :: state -> (state -> event -> (state, signal)) -> Machine state event signal
createMachine = Machine

That’s it, really. Except of course that the actual transition function still has to be written. However, with the pattern matching of Haskell I’ve found that to be rather simple, but I’ll keep that for the next post.

Adding some strictness

I’ve finally found some time to solve a few more exercises on exercism.io. One exercise, called Robot Simulation, I solved in a straightforward way and promptly got the feedback that my choice of data type could lead to space leaks. Since it’s a dimension of Haskell programming that I rarely (read “never”) have had to think about I thought I’d take the chance to play a bit.

A simplified solution

I don’t want to give away the solution to the exercise so I’m using some code of a similar shape.

import Data.List

newtype P = P (Int, Int)
    deriving (Eq, Show)

step :: P -> Char -> P
step (P c) 'A' = P (c `add` (0, 1))
step _ _ = undefined

add :: (Num a, Num b) => (a, b) -> (a, b) -> (a, b)
add (x0, y0) (x1, y1) = (x0 + x1, y0 + y1)

p :: P
p = foldl' step (P (0, 0)) $ replicate 5000000 'A'

main :: IO ()
main = print p

There’s nothing complicated about the code, but the memory usage is rather ridiculous:

   1,132,746,080 bytes allocated in the heap
   1,149,823,448 bytes copied during GC
     346,747,064 bytes maximum residency (11 sample(s))
       3,817,608 bytes maximum slop
             600 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      2148 colls,     0 par    0.58s    0.58s     0.0003s    0.0006s
  Gen  1        11 colls,     0 par    0.56s    0.56s     0.0510s    0.2278s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.25s  (  0.25s elapsed)
  GC      time    1.15s  (  1.14s elapsed)
  EXIT    time    0.02s  (  0.02s elapsed)
  Total   time    1.42s  (  1.41s elapsed)

  %GC     time      80.9%  (81.0% elapsed)

  Alloc rate    4,593,597,274 bytes per MUT second

  Productivity  19.0% of total user, 19.1% of total elapsed

Surely it ought to be possible to add 5 million numbers in less than 600MB, right?

The reason, AFAIU, is that tuples are lazy, which means the additions build up long chains of thunks in the pair’s components.
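
One way to convince oneself that it really is the pair’s components that leak is to force them by hand in step. This is only a sketch for illustration, not the route taken below:

-- force both components of the new pair on every step
step' :: P -> Char -> P
step' (P c) 'A' = let (x, y) = c `add` (0, 1) in x `seq` y `seq` P (x, y)
step' _ _ = undefined

With both components forced on every step no chains of thunks can build up; the rest of the post gets to the same effect in tidier ways.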

Using a stricter data type

Maybe the easiest way to add some strictness is to add a few exclamation marks in the data type. Modifying the data type brings a few other changes though, so here’s first a version without any strictness annotations so I have something to compare with later on [1]:

{-# LANGUAGE RecordWildCards #-}
import Data.List

data P = P { pX :: Int, pY :: Int }
    deriving (Eq, Show)

p :: P
p = foldl' step (P 0 0) $ replicate 5000000 'A'

step :: P -> Char -> P
step q@(P {..}) 'A' = q { pX = pX + 1 }
step _ _ = undefined

main :: IO ()
main = print p

This solution uses about half as much memory:

     766,417,952 bytes allocated in the heap
     639,064,600 bytes copied during GC
     155,866,064 bytes maximum residency (11 sample(s))
       1,914,016 bytes maximum slop
             345 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1460 colls,     0 par    0.32s    0.32s     0.0002s    0.0006s
  Gen  1        11 colls,     0 par    0.33s    0.33s     0.0303s    0.1071s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.15s  (  0.15s elapsed)
  GC      time    0.65s  (  0.65s elapsed)
  EXIT    time    0.01s  (  0.01s elapsed)
  Total   time    0.81s  (  0.81s elapsed)

  %GC     time      80.2%  (80.2% elapsed)

  Alloc rate    5,212,863,895 bytes per MUT second

  Productivity  19.8% of total user, 19.8% of total elapsed

Halving the memory usage is rather nice, but adding some strictness ought to make it better still. The code with added exclamation marks:

{-# LANGUAGE RecordWildCards #-}
import Data.List

data P = P { pX :: !Int, pY :: !Int }
    deriving (Eq, Show)

p :: P
p = foldl' step (P 0 0) $ replicate 5000000 'A'

step :: P -> Char -> P
step q@(P {..}) 'A' = q { pX = pX + 1 }
step _ _ = undefined

main :: IO ()
main = print p

The runtime behaviour of this code is remarkably more frugal with memory:

     480,054,816 bytes allocated in the heap
          57,968 bytes copied during GC
          44,312 bytes maximum residency (2 sample(s))
          21,224 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0       918 colls,     0 par    0.00s    0.00s     0.0000s    0.0000s
  Gen  1         2 colls,     0 par    0.00s    0.00s     0.0000s    0.0000s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.09s  (  0.09s elapsed)
  GC      time    0.00s  (  0.00s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.09s  (  0.09s elapsed)

  %GC     time       2.4%  (2.4% elapsed)

  Alloc rate    5,630,441,644 bytes per MUT second

  Productivity  97.3% of total user, 98.2% of total elapsed

Indeed, 1MB is a bit more like it!

Strictness with the original type

However impressive the results when using a stricter data type, it might still be preferable to keep the original, somewhat simpler, type. It is possible to do just that by placing some explicit strictness in well-chosen places. Since my suspicion is that it’s the addition in step that causes the space leak, we’ll rewrite it:

import Data.List

import Control.DeepSeq

newtype P = P (Int, Int)
    deriving (Eq, Show)

add :: (Num a, Num b) => (a, b) -> (a, b) -> (a, b)
add (x0, y0) (x1, y1) = (x0 + x1, y0 + y1)

p :: P
p = foldl' step (P (0, 0)) $ replicate 5000000 'A'

step :: P -> Char -> P
step (P c) 'A' = let np = c `add` (0, 1) in np `deepseq` P np
step _ _ = undefined

main :: IO ()
main = print p

Note that it’s not enough to use seq here, since seq only evaluates to weak head normal form (the outermost constructor); a bit more force is needed, which is what deepseq offers. It delivers, sure enough:

     920,052,632 bytes allocated in the heap
         198,736 bytes copied during GC
          44,312 bytes maximum residency (2 sample(s))
          21,224 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1774 colls,     0 par    0.00s    0.00s     0.0000s    0.0000s
  Gen  1         2 colls,     0 par    0.00s    0.00s     0.0001s    0.0001s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.29s  (  0.29s elapsed)
  GC      time    0.00s  (  0.00s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.30s  (  0.30s elapsed)

  %GC     time       1.4%  (1.4% elapsed)

  Alloc rate    3,124,303,717 bytes per MUT second

  Productivity  98.5% of total user, 98.9% of total elapsed
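
For contrast, a version using only seq, again just a sketch and not from the post, would still leak: seq evaluates the pair no further than its outermost constructor, so the additions inside it stay unevaluated.

-- seq stops at the (,) constructor; the components remain thunks
stepSeq :: P -> Char -> P
stepSeq (P c) 'A' = let np = c `add` (0, 1) in np `seq` P np
stepSeq _ _ = undefined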

A further slight change is possible: implement NFData for P itself. That way it’s not necessary to pull out and force evaluation of the innards of P. Using the default implementation of rnf did not cut it though; since it uses seq it just isn’t forceful enough:

import Data.List

import Control.DeepSeq

newtype P = P (Int, Int)
    deriving (Eq, Show)

instance NFData P where
    rnf (P q) = deepseq q ()

add :: (Num a, Num b) => (a, b) -> (a, b) -> (a, b)
add (x0, y0) (x1, y1) = (x0 + x1, y0 + y1)

p :: P
p = foldl' step (P (0, 0)) $ replicate 5000000 'A'

step :: P -> Char -> P
step (P c) 'A' = let np = P (c `add` (0, 1)) in force np
step _ _ = undefined

main :: IO ()
main = print p

The result is close to the previous:

     920,052,632 bytes allocated in the heap
         198,736 bytes copied during GC
          44,312 bytes maximum residency (2 sample(s))
          21,224 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1774 colls,     0 par    0.00s    0.00s     0.0000s    0.0000s
  Gen  1         2 colls,     0 par    0.00s    0.00s     0.0001s    0.0001s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.33s  (  0.33s elapsed)
  GC      time    0.00s  (  0.00s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.33s  (  0.33s elapsed)

  %GC     time       1.2%  (1.2% elapsed)

  Alloc rate    2,817,175,186 bytes per MUT second

  Productivity  98.7% of total user, 99.0% of total elapsed

It looks like it’s slightly slower though, which might be caused by the double call to deepseq in this version.

Conclusion

From this little example it’s clear that a little strictness in well-chosen places can go a long way in reducing memory footprint. The various solutions also show that the results can vary quite a bit depending on the strategy for introducing strictness. In this particular case I think the stricter data type (see Using a stricter data type above) is the best choice: it’s both faster and uses less memory than any of the others.


  1. I’m using RecordWildCards to simplify the code a bit, which possibly might influence evaluation as well. I’m simply not sure.

FP101x completed

TL;DR

As of last evening I finished the last exercise/homework of the course FP101x Introduction to Functional Programming. I mostly signed up for fun, but did intend all along to go through with it, watch the lectures, do the homework… in short, take the course ;)

Now that it’s done I’m happy I did. It gave about as much as I expected on the programming side, and it gave me some experience with using the edX MOOC. I will suggest the course to colleagues and I’m already looking around for the next course to take myself.

Thoughts about the course

The course was, on the whole, very good. It was well planned; I felt it started from a level suitable for people unfamiliar with FP and then progressed at a reasonable pace. It should be noted though that the lectures probably aren’t enough for someone who’s new to FP; reading the book or researching the subjects on one’s own is required to follow along.

There were a few things in the course that I don’t really see the point of though. The thing that stands out in particular is the 12th lecture, which covers proofs by induction of functions that work on lists. I think it would have been worthwhile spending some time on making more “realistic programs.” Furthermore, picking the “Countdown Problem” as the topic of an entire lecture feels very “academic”. Personally I would have liked an example closer to the kinds of problems that occur in a professional developer’s work, to show that FP can be used to solve them.

Thoughts about the edX MOOC

The edX site is fairly nice to work with. There are several things that I liked from the beginning:

  1. Several methods of logging in are offered; I opted to use my Google-plus account since I like to avoid creating yet another account if I can.
  2. Interacting with the site is natural and finding my way around was easy from the start.
  3. The page I land on after logging in shows my current courses, and it’s easy to get to them from there.
  4. There’s a page showing my progress in the course; that turned out to be the page I go to all the time since I can find the next lecture from there.

There are also a few things I find really irritating:

  1. In many of the exercises the code isn’t plain text but rather images; this makes it impossible to copy the code and paste it into a REPL.
  2. In some cases the code in the exercises, especially in the answer options, was very badly laid out. For instance, in FP101x there are several questions where the answer options are code and rather wordy; due to the layout the code becomes unnecessarily difficult to read. (And thanks to the previous issue with the site, it was not an option to copy the code and reformat it to make it easier to read.)

Regular Haskelling. How?

Ever since ICFP 2014 I’ve had as a goal to get into the habit of coding in Haskell. It’s been the language I enjoy most for a few years now, but being surrounded by and talking to so many brilliant developers as I did during that week really drove home that I will only have more fun the more code I write. My goal was not very ambitious; just write something in Haskell most days every week. So far I’ve managed to keep it up.

These are a few tricks I’ve used and they’ve worked well for me so far.

Just write, no matter what, just write

In ninth grade a rather successful Swedish author visited my school and what I remember most from that is one thing he said:

Just read! It doesn’t matter what. It doesn’t matter if what you read isn’t considered good literature; read Harlequin books if that’s what you like, read magazines, read comics. Just read!

I think the same holds for writing code; it’s only with practice that one gets comfortable expressing oneself in a particular language.

Fix warts

I can’t actually think of any piece of code I’ve written that doesn’t have some warts. It may be in the form of missing features, or quirks (bugs) in the implementation that force the user to regularly work in a less-than-optimal way. I’ve found fixing warts in tools and libraries I use myself to be one of the most rewarding tasks to take on; the feedback is so immediate that every fix causes a boost in motivation to fix the next one.

Exercise sites

Sometimes it’s simply difficult to find the motivation to tackle working on an existing project, and inspiration for starting something new might be lacking too. This happens to me regularly; I used to simply close the lid of the computer, but now I try to find some exercises to do instead.

There are several sources of exercises. I know Project Euler is rather popular among new Haskellers, but there are others.

  • CodeEval is a site with problems in three different levels. It may be extra interesting for people in the US since some of the problems are sponsored by companies which seem to use the site as a place for recruiting. So far I’ve only seen American companies do that, but I suppose it might catch on in other parts of the world too. Haskell is one of several languages supported.
  • Exercism is both a site and a tool. The goal is to facilitate learning of languages. On first use the tool will download the first exercise, and after completion one uses it to upload the solution to the site. Once uploaded the solution is visible to other users, and they are allowed to “nitpick” (comment). After uploading a solution to one exercise the next exercise in the series becomes available. It supports a rather long list of programming languages.

I like both of these, but I’ve spent more time on the latter one. Personally I find the idea behind Exercism very appealing and I’ve been recommending it to a couple of co-workers already.

Feel free to put links to other sources of exercises in the comments.

Simplify old code

With more practice comes more and more insights into what functions are available and how to string them together. When I don’t even feel like doing a full exercise on Exercism I just dig out something that smells a little and clean it up. Anything is fair game, no matter how tiny. Just take a look at my implementation of reThrowError.

What else?

I’d love to hear tips and tricks from other people who aren’t lucky enough to have a day job where they get to write Haskell. How do you keep up the learning and practice?

Dijkstra quotes from EWD1284

I recently read through this long post entitled Object Oriented Programming is an expensive disaster which must end. I have to admit I largely agree with what he writes, but I don’t think I ever could have written such a well-researched article, and absolutely never one of equal length ;)

It includes some nice quotes and references, and so far I’ve only read one of the many that I bookmarked, Computing Science: Achievements and Challenges (EWD1284). It contains a few quips that, based on other articles I’ve read, seem fairly typical of Dijkstra. I simply love the way he expressed his opinions at times.

This one really ought to have been in the lengthy post on the OOP disaster:

After more than 45 years in the field, I am still convinced that in computing, elegance is not a dispensable luxury but a quality that decides between success and failure; in this connection I gratefully quote from The Concise Oxford Dictionary a definition of “elegant”, viz. “ingeniously simple and effective”. Amen. (For those who have wondered: I don’t think object-oriented programming is a structuring paradigm that meets my standards of elegance.)

And his thoughts on formal methods are well-known of course, as are his thoughts on iterative design. However, I rather like how he expresses a certain level of disgust of the Anglo-Saxon world when writing about those topics:

The idea of a formal design discipline is often rejected on account of vague cultural/philosophical condemnations such as “stifling creativity”; this is more pronounced in the Anglo-Saxon world where a romantic vision of “the humanities” in fact idealizes technical incompetence. Another aspect of that same trait is the cult of iterative design.

It amazes me every time I read something by someone like Dijkstra, just how much things stay the same, even in a field like computing science, which is said to be moving so fast.

Optparse-applicative and custom argument parsers

The latest update of optparse-applicative triggered me to look over the functions in cblrepo for parsing a few custom command line options. I used to do the parsing in a rather ad-hoc way with lots of use of list functions to split on specific characters. For instance, some option values are pairs of package name and version separated by a comma: PKG-NAME,VERSION. This worked fine and was easy to plug into version 0.10 of optparse-applicative. It was also easily extended to triples, PKG-NAME,VERSION,RELEASE, but it started feeling a bit brittle when some tuples got extended with an optional list of flag assignments, PKG-NAME,VERSION[:FLAG,FLAG,FLAG,...]. The recent release of version 0.11 of optparse-applicative changed the API for custom option value parsers radically; instead of being passed the string, the parser has to use readerAsk to get it. In short, ReadM turned into a reader monad.

In adjusting to the new API I noticed that the code was organised in such a way that some low-level parsing functions were used directly from command line option definitions, while also being used as building blocks for the more complex parsers. This of course meant that the structuring of the functions needed to be changed completely to deal with the API change.

It turns out there already was a parser that was written in a different style (here already adjusted to the 0.11 API):

readerGhcVersion :: ReadM Version
readerGhcVersion = do
    arg <- readerAsk
    case lastMay $ readP_to_S parseVersion arg of
        Just (v, "") -> return v
        _ -> fail $ "cannot parse value `" ++ arg ++ "`"

So I rewrote the rest of the parsers in a similar style. Arguably the most complicated is this one:

readPkgNVersion :: ReadP (String, Version)
readPkgNVersion = do
    n <- many (satisfy (/= ','))
    char ','
    v <- parseVersion
    return (n, v)

readFlag :: ReadP (FlagName, Bool)
readFlag = readNegFlag <++ readPosFlag
    where
        readNegFlag = do
            char '-'
            n <- many (satisfy (/= ','))
            return (FlagName n, False)

        readPosFlag = do
            n0 <- get
            n <- many (satisfy (/= ','))
            return (FlagName (n0 : n), True)

strCblPkgArgReader :: ReadM (String, Version, FlagAssignment)
strCblPkgArgReader = let
        readWithFlags = do
            (n, v) <- readPkgNVersion
            char ':'
            fas <- sepBy readFlag (char ',')
            return (n, v, fas)

        readWithoutFlags = do
            (n, v) <- readPkgNVersion
            return (n, v, [])

    in do
        s <- readerAsk
        case lastMay (readP_to_S (readWithFlags <++ readWithoutFlags) s) of
            Just (r, "") -> return r
            _ -> fail $ "Cannot parse: " ++ s
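
For context, here is a minimal sketch, with made-up option names, of how a reader like strCblPkgArgReader gets wired into an option definition:

cblPkgOption :: Parser (String, Version, FlagAssignment)
cblPkgOption = option strCblPkgArgReader
    (long "add-pkg" <> metavar "PKG-NAME,VERSION[:FLAGS]")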

The parsers are slightly longer after this rewrite, but it’s rather a lot easier to read what’s happening. ReadP feels like a lighter option than pulling in parsec as a dependency, but I’d love to hear any comments or suggestions, as well as pointers to how other people deal with parsing of non-trivial types of arguments in combination with optparse-applicative.

Script for migrating from WordPress to Hakyll

As I wrote in a previous post on converting posts from WordPress to Hakyll, I couldn’t quite find a tool that met my needs out of the box. I had a look at the source of hakyll-convert; it uses a library for RSS parsing, which sounds good. However, the export of posts from WordPress is in an extended RSS format; among other things it contains the comments of posts. Unfortunately it doesn’t look like the RSS library supports the WordPress extensions, so modifying hakyll-convert to also extract comments seems like a bit more work than I’d like to put into it. Especially since I had a feeling that hacking up something using tagsoup would be quite a bit faster.

I put the resulting script in a gist on GitHub. I call it bbwp, which is short for Bye, bye, WordPress.

Adding tags

Adding tags to a Hakyll site brought some surprises. In retrospect it all makes sense, but it took some thinking on my part to work out the why of it all. The resulting code was heavily influenced by Erik Kronberg’s site.

First I thought I’d just add tags to each rendered post, by building the tags

tags <- buildTags "posts/*" (fromCapture "tags/*.html")

then adding them to the post context

let postCtx =
        field "previousPostUrl" (previousPostUrl "posts/*") <>
        field "previousPostTitle" (previousPostTitle "posts/*") <>
        field "nextPostUrl" (nextPostUrl "posts/*") <>
        field "nextPostTitle" (nextPostTitle "posts/*") <>
        field "postId" getPostId <>
        tagsField "tags" tags <>
        listFieldFunc "comments" defaultContext (getComments "comments/*") <>
        baseCtx

and lastly modifying the template

<p>Tags: $tags$</p>

Easy! Except it doesn’t work that way; $tags$ is always empty. To actually get tagsField to work as intended it’s necessary to build the tag pages, which can be accomplished using tagsRules

tagsRules tags $ \ tagStr pattern -> do
    route idRoute
    compile $ do
        posts <- loadAll pattern >>= recentFirst
        let tagsCtx =
                constField "thetag" tagStr <>
                listField "posts" baseCtx (return posts) <>
                baseCtx
        makeItem ""
            >>= loadAndApplyTemplate "templates/tag-post-list.html" tagsCtx
            >>= loadAndApplyTemplate "templates/default.html" tagsCtx
            >>= relativizeUrls

The template for the tags pages is very simple at the moment

<h1>Posts tagged $thetag$</h1>
<ul>
  $for(posts)$
  <li>
    <a href="$url$">$title$</a> - $date$
  </li>
  $endfor$
</ul>

That’s it. With that in place the $tags$ field renders properly in the post pages as well.

Adding support for comments

It seems most people using Hakyll or other static site generators rely on services like Disqus, but I really don’t like the idea of putting a bunch of JavaScript on each page and dynamically loading all comments off some cloud storage. It sort of flies in the face of the idea of having a static site to begin with. Searching online resulted in a few posts related to a plugin for static comments in Jekyll.

This post only covers dealing with the comments, and not how the reader actually submits a comment. I’ll save that for the next post.

Code changes

I settled on the following naming scheme for comments. The comments for a post P, which is found at posts/<P>.mkd, will be put into files named comments/<P>-c000.mkd, comments/<P>-c001.mkd, and so on. The crucial bits are, first, that the post’s name is a prefix of all its comments’ names, and second, that the identifiers (basically the filenames) of the comments are, just like identifiers for posts, easy to sort in date order.

Adding a rule for the comments is easy:

match "comments/*" $ compile pandocCompiler

Then it got a little more tricky. The comments for each post need to be put into the context used to build the posts. Previously I’ve used field, which takes a function turning an Item String into a String. I’ve also used listField, which ties a key to a list of Item a. What I needed here doesn’t seem to exist, i.e. a context function that takes an Item a and returns a list of Item a. So after a bit of studying the source of field and listField I came up with listFieldFunc:

listFieldFunc :: Show a => String -> Context a -> (Item a -> Compiler [Item a]) -> Context a
listFieldFunc key ctx func = Context $ \ k i -> if k == key then value i else empty
    where
        value i = do
            is <- func i
            return $ ListField ctx is

The function for extracting a post’s comments can then be written as

getComments :: (Binary a, Typeable a) => Pattern -> Item a -> Compiler [Item a]
getComments pattern item = do
    idents <- getMatches pattern >>= sortChronological
    let iId = itemIdentifier item
        comments = filter (isCommentForPost iId) idents
    mapM load comments

isCommentForPost :: Identifier -> Identifier -> Bool
isCommentForPost post comment = let
        postBase = takeBaseName $ toFilePath post
        cmtBase = takeBaseName $ toFilePath comment
    in isPrefixOf postBase cmtBase

Adding the key to the context used for the posts results in

let postCtx =
        field "previousPostUrl" (previousPostUrl "posts/*") <>
        field "previousPostTitle" (previousPostTitle "posts/*") <>
        field "nextPostUrl" (nextPostUrl "posts/*") <>
        field "nextPostTitle" (nextPostTitle "posts/*") <>
        field "postId" getPostId <>
        listFieldFunc "comments" defaultContext (getComments "comments/*") <>
        baseCtx

Template changes

The template changes are trivial of course

$for(comments)$
<div>
<p>$author$</p>
$body$
</div>
$endfor$

Links to previous and next post

Currently the landing page contains all the posts I’ve written so far. That works for now, but it won’t for very long if I keep on writing posts. It certainly won’t work when I get around to importing all the posts from my old blog. Clearly I need to limit the posts that appear in index.html at some point, but before that two things need to be in place:

  1. The post titles on the landing page should be links to the single-post page.
  2. The single-post pages should be linked together with links to the previous and next posts.

I found another Hakyll-based site that already implemented such previous-next links, Richard Goulter’s blog. He even has a post about its implementation. It was a rather straightforward implementation and it served well as inspiration.

Code changes

I want the links to have the post titles, not just general text like “previous post” and “next post”. That meant putting a total of four values into the context used to render the individual post pages, two URLs and two titles. That means postCtx will look like this:

let postCtx =
        field "previousPostUrl" (previousPostUrl "posts/*") <>
        field "previousPostTitle" (previousPostTitle "posts/*") <>
        field "nextPostUrl" (nextPostUrl "posts/*") <>
        field "nextPostTitle" (nextPostTitle "posts/*") <>
        baseCtx

In implementing the functions to extract the URLs and titles I first tried working on a snapshot, but that only resulted in circular dependencies, so I had to follow Richard’s lead and use Identifier to drive the whole thing. When done with the functions for extracting both URL and title there was an obvious pattern, so I generalised a little and refactored into a single function to do the grunt work:

withRelatedPost :: (MonadMetadata m, Alternative m) =>
    (Identifier -> [Identifier] -> Maybe t) -> (t -> m b) -> Pattern -> Item a -> m b
withRelatedPost r f pattern item = do
    idents <- getMatches pattern >>= sortRecentFirst
    let id = itemIdentifier item
        prevId = r id idents
    case prevId of
        Just i -> f i
        Nothing -> empty

The r argument is a function that, given the ID of the current item and a list of the IDs of all items (sorted with the most recent first), returns the ID of a single item. With that it’s possible to create two more functions, one that operates on the previous post and one on the next post:

withPreviousPost :: (MonadMetadata m, Alternative m) => (Identifier -> m b) -> Pattern -> Item a -> m b
withPreviousPost = withRelatedPost itemAfter
    where
        itemAfter x xs = lookup x $ zip xs (tail xs)

withNextPost :: (MonadMetadata m, Alternative m) => (Identifier -> m b) -> Pattern -> Item a -> m b
withNextPost = withRelatedPost itemBefore
    where
        itemBefore x xs = lookup x $ zip (tail xs) xs
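
To make the lookup direction concrete, here are the two helpers restated as standalone functions together with a tiny example; this is only an illustration, not code from the site:

itemAfter, itemBefore :: Eq a => a -> [a] -> Maybe a
itemAfter x xs = lookup x $ zip xs (tail xs)
itemBefore x xs = lookup x $ zip (tail xs) xs

-- with identifiers sorted most recent first:
--   itemAfter  'b' "abc" == Just 'c'   -- the next older item, i.e. the previous post
--   itemBefore 'b' "abc" == Just 'a'   -- the next newer item, i.e. the next post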

Now getting the URLs and titles becomes a few single-line functions:

previousPostUrl :: Pattern -> Item String -> Compiler String
previousPostUrl = withPreviousPost (fmap (maybe empty toUrl) . getRoute)

previousPostTitle :: Pattern -> Item String -> Compiler String
previousPostTitle = withPreviousPost (\ i -> getMetadataField' i "title")

nextPostUrl :: Pattern -> Item String -> Compiler String
nextPostUrl = withNextPost (fmap (maybe empty toUrl) . getRoute)

nextPostTitle :: Pattern -> Item String -> Compiler String
nextPostTitle = withNextPost (flip getMetadataField' "title")

What’s left is putting the fields to use in a template. The template itself is in the following section. The compilation step is modified to use the new context, and the new template is put in after the snapshot so the added links don’t appear in the index.html.

pandocCompiler
    >>= loadAndApplyTemplate "templates/single-post.html" postCtx
    >>= saveSnapshot "posts-content"
    >>= loadAndApplyTemplate "templates/single-post-prev-next.html" postCtx
    >>= loadAndApplyTemplate "templates/default.html" postCtx
    >>= relativizeUrls

Template changes

The added template is rather short and obvious:

$body$

<div>
$if(previousPostUrl)$
<a href="$previousPostUrl$">&#10232; $previousPostTitle$</a>
$endif$

$if(nextPostUrl)$
<a href="$nextPostUrl$">$nextPostTitle$ &#10233;</a>
$endif$
</div>

Converting posts from WordPress

I’ve found two tools for converting posts from a WordPress site to Hakyll (or to Jekyll too I suppose): hakyll-convert and exitwp.

I ran both tools on an export of my posts (made with a tool included in WordPress), and both tools spat out one file per post. So far it looked good. Then I put all the posts into my Hakyll blog project and tried to build the site.

hakyll-convert

  • The conversion finished without any reported errors.
  • The individual files were rather badly named; the name of each was based on the post ID rather than on the post date (but there’s a bug report for that).
  • The posts were consumed by the build script without problems.
  • The resulting HTML was not satisfactory, but that’s not due to the tool; rather it’s due to my choice of using GeSHi (via WP-Syntax).

exitwp

  • The conversion finished with a few reported errors, which I didn’t investigate further.
  • The posts were not consumed by the build script due to categories and tags not being comma-separated, but rather YAML lists.

The conclusion is that hakyll-convert will be what I use, but it’ll take a little while before I get around to importing my old posts since it’ll require manual edits to ensure they look all right.

Dipping toes into CSS, normalizing

The site doesn’t look that good. Actually it is pathetically simple. However, I’m not that good at CSS and I don’t really have a good sense of design either, so making it look good is going to be an uphill battle. The first step is easy though: add a standard CSS file to normalize the look.

The CSS

I grabbed normalize.css from its web site.

Template changes

Well, obviously the CSS has to be pulled into the web pages. The full webpages are provided by templates/default.html and the line that needs adding is

<link rel="stylesheet" type="text/css" href="/css/normalize.css" />

Of course it goes into the <head> section of the file.

Code changes

The changes to the build script are equally straightforward.

    match "css/*" $ do
        route idRoute
        compile copyFileCompiler

I opted to just copy the file rather than compress it using compressCssCompiler. I don’t think the speedup is really worth it, and I personally find it very handy to be able to read the CSS files on sites that I think look nice. Of course I need to enable others to do the same.
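
For reference, the compressing variant would have looked something like this (an untested sketch):

    match "css/*" $ do
        route idRoute
        compile compressCssCompiler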

Adding a feed

A blog isn’t complete without a feed, so that was the first thing to add. Luckily there’s a good tutorial on adding a feed in Hakyll. The following code snippets are basically just copied from that tutorial.

I’ve also decided to start publishing the result; it’s available on my github pages.

RSS or atom?

I decided to use Atom since it’s a standard. Yeah, there’s not much more to write about that I suppose.

Code changes

The first thing to do was to add a feed configuration. It’s a simple adaption on what’s found in the tutorial.

postFeedConfig :: FeedConfiguration
postFeedConfig = FeedConfiguration
    { feedTitle = "Magnus web site"
    , feedDescription = "Random stuff"
    , feedAuthorName = "Magnus Therning"
    , feedAuthorEmail = "magnus@therning.org"
    , feedRoot = "http://magthe.github.io"
    }

Then the build logic has to be extended with a rule for making atom.xml. This is also a straight forward adaptation of the information found in the tutorial.

    create ["atom.xml"] $ do
        route idRoute
        compile $ do
            posts <- fmap (take 50) . recentFirst =<< loadAllSnapshots "posts/*" "posts-content"
            let feedCtx = baseCtx <> bodyField "description"
            renderAtom postFeedConfig feedCtx posts

Once again the snapshot of the posts comes in handy. Since the old WordPress site limited the feed to the 50 latest posts I decided to do that in the new one too. Maybe it’s a bit excessive, 10-20 ought to be enough, but I’ll leave it for now. The feed-specific context is a nice little detail, also from the tutorial. The feed builder requires the presence of a description for each post, but to avoid having to remember to add one to every post I just add a description field containing the body of the post.

The generated feed is available at ./atom.xml

The current Hakyll build script

I’m fairly sure I understand the current build script, and it seems to be rather minimal. Hopefully it can serve as an introduction for someone who, just like me, is new to Hakyll.

The site layout

Before getting into the build script it’s worth describing the folder layout of the project; it looks like this:

.
├── build_site.hs
├── index.html
├── posts
│   ├── 2014-09-23-000-moving-to-hakyll.mkd
│   └── 2014-09-23-001-hakyll-build-script.mkd
└── templates
    ├── default.html
    └── single-post.html

The templates

single-post.html
A template for, as the name suggests, a single post. It renders the post into an HTML snippet.
default.html
The main template, i.e. the one that provides the entire structure of a complete HTML page.

It should also be noted that index.html basically is a template itself. These three files fit together such that each post is turned into an HTML snippet (single-post.html), all the snippets are then pulled into index.html, which finally is wrapped into a proper page by default.html.

The script

The full script looks like this:

#! /usr/bin/runhaskell

{-# LANGUAGE OverloadedStrings #-}
import Data.Monoid
import Hakyll

main :: IO ()
main = hakyll $ do
    match "posts/*" $ do
        route $ setExtension "html"
        compile $ pandocCompiler
            >>= loadAndApplyTemplate "templates/single-post.html" baseCtx
            >>= saveSnapshot "posts-content"
            >>= loadAndApplyTemplate "templates/default.html" baseCtx
            >>= relativizeUrls

    match "index.html" $ do
        route idRoute
        compile $ do
            posts <- recentFirst =<< loadAllSnapshots "posts/*" "posts-content"
            let indexCtx =
                    listField "posts" baseCtx (return posts) <>
                    baseCtx
            getResourceBody
                >>= applyAsTemplate indexCtx
                >>= loadAndApplyTemplate "templates/default.html" indexCtx
                >>= relativizeUrls

    match "templates/*" $ compile templateCompiler

baseCtx :: Context String
baseCtx =
    dateField "date" "%Y-%m-%d" <>
    constField "site-title" "Magnus web site" <>
    constField "site-subtitle" "Random stuff" <>
    constField "author" "Magnus Therning" <>
    defaultContext

The only slightly unnecessary thing is that all posts are turned into complete web pages (the loadAndApplyTemplate "templates/default.html" step in the posts rule), but none of those files is actually reachable from the generated landing page. My plan is to limit the number of posts on the landing page, so they will be used later on.

For the moment some site constants are put into the base context. I’m not completely convinced that’s the wise thing to do, but I have a vague feeling that it’s better to put stuff like that into the context than to hard-code it into the templates. I guess the better solution would be to have a configuration file for them. That’s for the future though; for now I’m keeping it simple.

Moving to Hakyll

I’ve decided to ditch WordPress and switch to Hakyll. I’ll keep the old site intact until I’ve gotten around to importing all the posts. That’ll probably take quite a while.

As you can see the site is terribly bare bones at the moment. It’ll stay like this until I work out enough of Hakyll to get most of the items into the site. There’s simply no point in getting distracted with the craziness that’s CSS and making it pretty until I’m happy with the content handling.

More Visitor (in Python)

Right after writing the previous post on the Visitor pattern in Python I picked up another paper on the same topic, Visitor Combination and Traversal Control. Of course this one also used Java for its examples, so once again I decided to use Python to explore the ideas presented.

The first part of the paper is all about writing small visitors that are then combined into more complicated ones. This part is nice but not that exciting. The interesting bit is when it gets to controlling traversal, which makes it possible to remove the traversal code that usually appears in the accept method each visited type has to implement. Let’s see how that can look in Python.

The full code in this post is available at https://gist.github.com/magthe/beddad5c627946f28748.

First we need a structure to traverse; a simple tree will do.

class Tree(Visitable):
    def __init__(self, left, right):
        self.left = left
        self.right = right

class Leaf(Visitable):
    def __init__(self, value):
        self.value = value

def build_tree():
    l0 = Leaf(0)
    l1 = Leaf(1)
    t0 = Tree(l0, l1)
    l2 = Leaf(2)
    t1 = Tree(t0, l2)
    l3 = Leaf(3)
    l4 = Leaf(4)
    t2 = Tree(l3, l4)
    return Tree(t1, t2)

But before this we really should define Visitor, the base class for visitors, and Visitable, the base class of everything that can be visited.

class Visitor:
    def visit(self, obj):
        getattr(self, 'visit_' + obj.__class__.__name__)(obj)

    def visit_Tree(self, t):
        pass

    def visit_Leaf(self, l):
        pass

class Visitable:
    def accept(self, visitor):
        visitor.visit(self)

We’ll throw in a visitor for printing the whole tree too:

class Print(Visitor):
    def visit_Tree(self, t):
        print('Tree (%s)' % hex(id(t)))

    def visit_Leaf(self, l):
        print('Leaf (%s): %i' % (hex(id(l)), l.value))

Due to the lack of traversal in the accept methods it’s easy to be underwhelmed by the Print visitor:

In [32]: build_tree().accept(Print())
Tree (0x7f1680681a90)

To address this we first need a visitor combinator that runs two visitors in sequence. Unsurprisingly we’ll call it Sequence. Its constructor takes two visitors, and for each node in the tree it passes each one to the node’s accept method.

class Sequence(Visitor):
    def __init__(self, first, then):
        self.first = first
        self.then = then

    def visit_Tree(self, t):
        t.accept(self.first)
        t.accept(self.then)

    def visit_Leaf(self, l):
        l.accept(self.first)
        l.accept(self.then)

The next building block is a visitor that descends one level down from a Tree node.

class All(Visitor):
    def __init__(self, v):
        self.v = v

    def visit_Tree(self, t):
        t.left.accept(self.v)
        t.right.accept(self.v)

At this point it’s worth noting that the name All probably isn’t very well chosen, since we don’t really get all nodes:

In [33]: build_tree().accept(All(Print()))
Tree (0x7f1680681278)
Tree (0x7f1680681be0)

We only descend one level, but we still keep the name since that’s the name they use in the paper.

With this in place it does become possible to create combinations that do traverse the full tree though. It even becomes rather simple. Traversing top-down is a simple matter of using a sequence that ends with All, and bottom-up is a matter of using a sequence starting with All.

class TopDown(Sequence):
    def __init__(self, v):
        Sequence.__init__(self, v, All(self))


class BottomUp(Sequence):
    def __init__(self, v):
        Sequence.__init__(self, All(self), v)

First top-down:

In [34]: build_tree().accept(TopDown(Print()))
Tree (0x7f1680681ef0)
Tree (0x7f16806814a8)
Tree (0x7f16806813c8)
Leaf (0x7f1680681278): 0
Leaf (0x7f1680681550): 1
Leaf (0x7f1680681a90): 2
Tree (0x7f1680681f28)
Leaf (0x7f1680681ba8): 3
Leaf (0x7f1680681a20): 4

Then bottom-up:

In [35]: build_tree().accept(BottomUp(Print()))
Leaf (0x7f1680681ba8): 0
Leaf (0x7f16806814a8): 1
Tree (0x7f1680681a90)
Leaf (0x7f16806813c8): 2
Tree (0x7f1680681550)
Leaf (0x7f1680681278): 3
Leaf (0x7f16806a1048): 4
Tree (0x7f16806a1198)
Tree (0x7f16806a1390)

That’s all rather cute I think.

Dealing with Microsoft Products, or Battling Loss of Gumption

I was slightly frustrated and irritated with a situation at work today, which caused me to think about the word “gumption” as it’s used in Pirsig’s Zen and the Art of Motorcycle Maintenance. That led me to Wikipedia’s article on gumption trap which in turn led me to learn about the concept of learned helplessness.

So, what was the situation and how is it connected to learned helplessness?

The rest is just slightly tongue-in-cheek ;)

What to standardise on

I’m in a situation where the powers-that-be have standardised on applications. Not on open formats or open protocols, but on specific applications that use proprietary formats and proprietary protocols. Of course these applications suck. That’s what a lack of competition does: it removes any will for a company to actually make improvements to their applications! Some of these applications have captured such a large market share that reverse engineering of the formats was inevitable. Yay! That means I can use a sane OS and vastly better applications. However, one protocol has not been reverse engineered yet and I’m forced to use the standard application. This application is painful to use and only runs on a crap OS.

How bad can it be? you ask. The application is Outlook, the OS is Windows! Yes! It’s that bad. Hence the thoughts of gumption, or rather the loss of it. Which is exactly what starting Outlook causes. Every time!

Connection to learned helplessness

It continues to amaze me that companies standardise on Windows and applications that only run on Windows. There are better alternatives, especially in this day and age with fast networks and powerful and fast execution environments that completely sidestep the whole question of which OS to run. Still there seems to be very little will to upgrade to Linux, or to standardise on web-based applications. Why is that? In the past I’ve thought it might be the network effect. Most often I’ve come to the conclusion that it most likely is simple inertia. What’s the explanation for the inertia though?

This is where learned helplessness can offer an explanation. People have been conditioned and have grown so used to Windows and other Microsoft products that they simply don’t recognise that there now is a way out. No matter how many escape routes become available, people simply won’t see them.

What to do about it

As the experiments on dogs showed there is hope (from the Wikipedia page):

To change their expectation and to recover the dogs from helplessness, experimenters had to physically pick up the dogs and move the legs in a close replication of the physical actions the dogs needed to take to remove themselves from the electrified grid. This had to be replicated at least 2 times before the dogs would exhibit the functional response of jumping over the barrier to get away from the electrified grid. Threats, rewards, and observed demonstrations had no observed effect in helping the dogs to independently move away from the shocks.

Oh how I wish I could pull off the direct translation to my workplace: re-install my co-workers’ computers and replace servers and services. Too bad that’s not a realistic plan. What I can do though is civil disobedience (or maybe it should be called civil disobedience in the workplace). By simply not conforming, and at the same time showing that there are better ways of getting the job done, others will hopefully notice and either adopt my way, or come up with something that suits them better (which I then can learn from). Even if that doesn’t happen at least I’ll keep my gumption at healthy levels :)

What I’m doing at the moment

This is what I’m doing at work right now to avoid loss of gumption:

  • Use Linux as my main OS.
  • Run Windows in a VM.
  • Use pandoc to generate MSWord docs.
  • Use LibreOffice.

Finally, for Outlook. The decision of the powers-that-be to disable IMAP forces me to:

  • Limit my mail reading to twice per day.
  • Be logged into Skype to make up for not reading mail more often.

Visitor and Walkabout (in Python)

A couple of weeks ago I found a link to a stackoverflow question I’d saved away a long time ago. I’d saved it due to having asked myself the exact same question and being curious about the papers when I found that answer. Over the last few weeks I’ve made my way through those papers and as so often I found a couple of references to other papers that sound interesting. One such paper was The Essence of the Visitor Pattern by Jens Palsberg and C. Barry Jay. There was however one little thing that bugged me with the Walkabout pattern and I thought I’d try to work that out for myself. Of course I’m using Python rather than Java ;)

The full code can be found at https://gist.github.com/magthe/ad6e23fb560a8a494fd2

Visitor

The Visitor pattern separates the traversal and the operation. It does this by using an accept method in the classes making up the data structure; this method takes a Visitor instance which implements the operation to be carried out. This enables adding new operations without modifying the data structure.

First we need a simple structure to play with: a tree where each node can have zero or more sub-trees.

class Tree:
    def __init__(self, value, children=[]):
        self.value = value
        self.children = children

The implementation of accept for this type is rather obvious:

    def accept(self, visitor):
        visitor.visit(self)
        for c in self.children:
            c.accept(visitor)

Next we need an implementation of a Visitor that all visitors can derive from. Since Python’s dispatch doesn’t depend at all on the type of the argument we’ll have to implement the necessary behaviour ourselves, i.e. inspect the type and then pick the correct method to call.

class Visitor:
    def visit(self, obj):
        func_name = 'visit_' + obj.__class__.__name__
        visit_func = getattr(self, func_name)
        visit_func(obj)

In order to visit a Tree type it also needs the appropriately named method:

    def visit_Tree(self, obj):
        pass

Now it’s easy to create a visitor. Here’s a very simple one:

class TreePrintVisitor(Visitor):
    def visit_Tree(self, obj):
        print('Tree (%s): %i' % (hex(id(obj)), obj.value))

Finally here’s a function that exercises what we’ve just put together:

def test_visitor():
    leaves = [Tree(42), Tree(17)]
    tree = Tree(1, leaves)
    printer = TreePrintVisitor()
    tree.accept(printer)

Looking at this it’s easy to see the objections Palsberg and Jay present in their paper:

  1. A data type is only ‘visitable’ if it has an accept method, and
  2. we must know the types of all objects we want to visit, so changes to the class structure require changes to the visitors.

The authors then introduce Walkabout in order to remove these limitations.

Walkabout

To remove these limitations the authors use reflection to find out if the visitor has a method to carry out the operation on the type of the current object. If such a method doesn’t exist they use reflection to find all members of the object and then visit them. The Walkabout class and its visit method then look something like this:

class Walkabout:
    def visit(self, obj):
        func_name = 'visit_%s' % obj.__class__.__name__
        if hasattr(self, func_name):
            visit_func = getattr(self, func_name)
            visit_func(obj)
        elif hasattr(obj, '__dict__'):
            for m in obj.__dict__.keys():
                self.visit(getattr(obj, m))

The accept method can be removed from Tree and the visitor is changed to include code to continue the traversal:

class TreePrintWalkabout(Walkabout):
    def visit_Tree(self, tree):
        print('Tree (%s): %i' % (hex(id(tree)), tree.value))
        for c in tree.children:
            self.visit(c)

The function for exercising this only changes slightly:

def test_walkabout():
    leaves = [Tree(42), Tree(17)]
    tree = Tree(1, leaves)
    printer = TreePrintWalkabout()
    printer.visit(tree)

This is where Palsberg and Jay stop, but I think there is one little detail in Walkabout that’s worth looking a little closer at.

Walkabout Two

I personally find it a little strange that the authors first note that the visitor pattern suffers from the limitation that one has to know up front the full set of types to visit, but then don’t recognise that their own solution instead requires one to know the shape of (parts of) the structure to operate on. In other words, the classes deriving from Walkabout are in charge of carrying on the traversal (the last two lines of visit_Tree above).

This little detail is of course easy to work around: we just modify visit to always visit all members, irrespective of whether there is a method to handle the current object. There may be cases where we want to stop the walking about for efficiency reasons; we can address that at the same time, and WalkaboutTwo then looks like this:

class WalkaboutTwo:
    def visit(self, obj):
        func_name = 'visit_%s' % obj.__class__.__name__
        descend = True
        if hasattr(self, func_name):
            visit_func = getattr(self, func_name)
            # the visit_<Type> method's return value decides whether to keep descending
            descend = visit_func(obj)
        if descend and hasattr(obj, '__dict__'):
            for m in obj.__dict__.keys():
                self.visit(getattr(obj, m))
        if descend and hasattr(obj, '__iter__'):
            # if the object itself is iterable, e.g. a list of sub-trees, visit its items
            for o in obj:
                self.visit(o)

One little detail above is that if we use only reflection to traverse the Tree we won’t actually find the sub-trees as immediate members since they are contained in a list. We address this by checking whether the object has an iterator as well, and if it does we visit all items.
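To round things off, here’s a small usage sketch (my own addition, not from the gist) showing that a print visitor built on WalkaboutTwo no longer carries any traversal code; its visit_Tree only has to signal that the walk should continue:

class TreePrintWalkaboutTwo(WalkaboutTwo):
    def visit_Tree(self, tree):
        print('Tree (%s): %i' % (hex(id(tree)), tree.value))
        return True  # keep descending into the members, and thereby the children

def test_walkabout_two():
    leaves = [Tree(42), Tree(17)]
    tree = Tree(1, leaves)
    TreePrintWalkaboutTwo().visit(tree)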

2014-06-12 Fixed minor typo, thanks to Joe for pointing it out.

Localised configuration in Vim: localcfg

For quite a while I’ve been using a small Vim plugin that lets me write configuration that is specific to a system; it loaded a config file based on the system’s host name. Unfortunately I can’t seem to find that plugin anywhere now, so I’ve put it in a snippet. This allowed me to easily create a single clean Vim configuration and check it into version control, while still allowing for settings that are unique to a system.

Lately I’ve found it slightly limited though; I really wanted to use other things to trigger the loading of some piece of configuration. So I wrote my first ever Vim plugin: localcfg.

Hopefully someone will find it useful.

Phabricator on ArchLinux

At work we’ve been using Trac for quite a while now, but it’s always interesting to look at other options. When listening to a recent episode of git Minutes on Blender’s move to using git I heard of Phabricator for the first time. There are good instructions for how to install it on Ubuntu, but I couldn’t find any for ArchLinux.

These are my notes for installing Phabricator on ArchLinux. Hopefully someone will find them useful.

Basic setup

I performed this install in a virtual machine (using VirtualBox). The virtual machine is configured with a bridged network and I gave it the FQDN of phabarch.vbox.net.

Packages

Beyond the packages installed as part of the basic install I needed the following packages:

  • lighttpd
  • git
  • mariadb
  • php
  • php-cgi
  • php-apcu
  • php-gd

Setup of mariadb

Following the instructions on the ArchLinux wiki page on mariadb I first started the service and then finished the installation:

# systemctl start mysqld.service
# mysql_secure_installation

After this I restarted the service, and made sure it’ll restart on system start:

# systemctl restart mysqld.service
# systemctl enable mysqld.service

I picked the root password mariaroot.

Setup of lighttpd

Modify /etc/lighttpd/lighttpd.conf to look something like this:

server.modules = (
  "mod_rewrite",
  "mod_fastcgi",
)

server.port = 80
server.username  = "http"
server.groupname = "http"
server.document-root = "/srv/http"
server.errorlog  = "/var/log/lighttpd/error.log"
dir-listing.activate = "enable"
index-file.names = ( "index.php", "index.html" )
static-file.exclude-extensions = ( ".php" )
mimetype.assign  = (
  ".html" => "text/html",
  ".txt" => "text/plain",
  ".css" => "text/css",
  ".js" => "application/x-javascript",
  ".jpg" => "image/jpeg",
  ".jpeg" => "image/jpeg",
  ".gif" => "image/gif",
  ".png" => "image/png",
  "" => "application/octet-stream"
  )

fastcgi.server += ( ".php" =>
  ((
    "bin-path" => "/usr/bin/php-cgi",
    "socket" => "/var/run/lighttpd/php.socket",
    "max-procs" => 1,
    "bin-environment" => (
      "PHP_FCGI_CHILDREN" => "4",
      "PHP_FCGI_MAX_REQUESTS" => "10000",
    ),
    "bin-copy-environment" => (
      "PATH", "SHELL", "USER",
    ),
    "broken-scriptfilename" => "enable",
  ))
)

$HTTP["host"] =~ "phabarch(\.vbox\.net)?" {
  server.document-root = "/srv/http/phabricator/webroot"
  url.rewrite-once = (
    "^(/rsrc/.*)$" => "$1",
    "^(/favicon.ico)$" => "$1",
    # This simulates QSA (query string append) mode in Apache
    "^(/[^?]*)\?(.*)" => "/index.php?__path__=$1&$2",
    "^(/.*)$" => "/index.php?__path__=$1",
  )
}

Setup of php

Modify /etc/php/php.ini and enable the following extensions:

  • mysqli.so
  • openssl.so
  • iconv.so
  • apcu.so
  • gd.so
  • posix.so

Also disable the open_basedir setting.

Getting and setting up Phabricator

Checking it out

I placed it in /srv/http:

# cd /srv/http
# git clone git://github.com/facebook/libphutil.git
# git clone git://github.com/facebook/arcanist.git
# git clone git://github.com/facebook/phabricator.git

Database configuration

# cd /srv/http/phabricator
# ./bin/storage upgrade --user root --password mariaroot
# ./bin/config set mysql.user root
# ./bin/config set mysql.pass mariaroot

Set the base URI

# cd /srv/http/phabricator
# ./bin/config set phabricator.base-uri 'http://phabarch.vbox.net/'

Diffusion configuration

# mkdir /var/repo
# ./bin/config set diffusion.allow-http-auth true

At this point I started lighttpd and used the web interface to configure environment.append-paths to include the path of git-core, /usr/lib/git-core.
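Starting and enabling lighttpd mirrors the mariadb setup above (the service name here is an assumption based on the standard ArchLinux lighttpd package):

# systemctl start lighttpd.service
# systemctl enable lighttpd.service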

phd configuration

First create the daemon user:

# useradd -r -M -d /tmp phabd

Then create the phd log dir and set its owner and group:

# mkdir /var/tmp/phd/log
# chown -R phabd:phabd /var/tmp/phd/

Also make the daemon user the owner of the repo folder used by Diffusion:

# chown phabd:phabd /var/repo

Configure sudo

Create the file /etc/sudoers.d/phabricator with this content:

http ALL=(phabd) SETENV: NOPASSWD: /usr/lib/git-core/git-http-backend

User configuration

In the web interface set a VCS password for the user.

Play

Now the system should be ready to be played with :)

How do professional Windows programmers stand Visual Studio?

I have a new assignment at work and now find myself at yet another Windows shop. They are making embedded systems, but have for some strange reason decided that Windows is the only development platform to use. After only a few weeks here I’m noting a growing irritation with the tools offered for my use. The amount of forced mouse usage is astounding and the main tool, Visual Studio, is turning out to be the main culprit. After a week or so of exposure to VS I’ve found what I consider to be a serious flaw with a tool for developers: it doesn’t scale.

  1. No hierarchical structure. It doesn’t scale very well with the size of a project. The concept of having a solution with a number of projects is not bad, but the implementation doesn’t scale. A project can’t have sub-projects, which means I can’t layer the code in the IDE the same way I do on disk. The only thing I can do is organise the viewing of files through the concept of filters.
  2. All configuration is manual, part 1. MS seems to have optimised for small projects. All configuration is kept in XML files, and the only interface to them is a set of property dialogues (some of which can be resized, others not) requiring an amazing amount of pointing and clicking to get anything done.
  3. All configuration is manual, part 2. MS have optimised for beginning new projects. Getting started is amazingly quick, but once you reach a size of about 10 projects it becomes daunting to fix anything that requires configuration in all of them. Making sure that all configurations are correct is a major undertaking, and requires an insane amount of mouse use. Some earlier versions of VS even seem to have made it impossible to edit the common settings of configurations properly; a simple mistake and the value of one configuration is lost.
  4. Experience. There are no shortcuts to discover. The configuration is all in XML, which means it’s not really possible to jump out of VS and use a text editor for fixing up semi-broken configurations (like dealing with someone’s decision to place all intermediate files and final results in the top-level solution directory).

So, how do Windows developers cope with this? Don’t they use VS? Do they police the configurations diligently to ensure no silliness creeps in? Or are they all using tools to address these points (like CMake)?

TCLAP for command line argument parsing in C++

A long while ago I was looking for a way to handle command line arguments in a C++ program I was writing for Windows. At the time I only found Boost.Program_options. After a bit of experimenting I found that the pre-built Boost libs available back then had some issues, and after a bit of thinking I decided to write the program in Python instead :) Now I once again find that I need to handle command line arguments in C++ on Windows, but this time I found quite a few options: gflags, ezOptionParser, and TCLAP. They are all stand-alone, meaning that I don’t need to pull in a huge dependency like Boost or Qt, and liberally licensed (BSD3 and MIT) so using them at work is no problem. After a bit of playing with all three I found that TCLAP is most to my liking, but gflags does have one nice little feature: it allows putting command line option definitions pretty much anywhere in the code. This would solve one of the problems I’m facing; the tool shall consist of a main program and pluggable modules, where each module must be able to add command line arguments of its own. However, it turns out that the TCLAP API easily allows implementing such a scheme. Here’s how I implemented it in the experiment code I’ve been playing with this weekend.

How it works

The basic idea is to create a central instance of TCLAP::CmdLine that options can be registered with, both in the main program and in the pluggable modules. By declaring the option instances as top-level variables in compilation units it’s enough to load a pluggable module for the constructors to run, and by passing the central TCLAP::CmdLine to the constructors the options register themselves properly. This seems to be the same idea used in gflags.

Registering a command line option

The following is the code my pluggable module used to register an option:

TCLAP::ValueArg<int> value("v",
                           "value",
                           "ONE: the value to return",
                           false,
                           42,
                           "integer",
                           argparse::argparser);

The parser

I put the parser in its own namespace, and due to the API of TCLAP::CmdLine I found that I needed to subclass it.

#include <tclap/CmdLine.h>

namespace argparse {

class ArgParser : public TCLAP::CmdLine {
public:
    ArgParser() : TCLAP::CmdLine("No message set.", ' ', "No version set.") {}

    void setMessage(std::string msg) { _message = msg; }
    void setVersion(std::string version) { _version = version; }
};

extern ArgParser argparser;

}

The main function

After this I just need to instantiate the central parser instance and set it up.

argparse::ArgParser argparse::argparser;

int main(int ac, char **av)
{
    argparse::argparser.setMessage("The message.");
    argparse::argparser.setVersion("0.0.0");

    // load the plugin modules

    argparse::argparser.parse(ac, av);

    // do the work

    return 0;
}

Limitations

The main limitation I can see with this approach, and AFAICS this is true for gflags as well, is that it’s not very easy to find the plugin modules to load by passing them on the command line. If the TCLAP API allowed parsing without acting on --help and --version, as well as being forgiving with arguments it doesn’t know how to handle, it would be possible to use an option in the main application to find the modules, load them, and then re-parse the command line. In my particular case it doesn’t matter much; the plugin modules will all be found in a well-known place and all available modules should be loaded every time. It does make testing a bit more cumbersome though.

Eclipse and greyed out #ifdef sections

A note to the future me:

Are your #ifdef sections greyed out despite switching to a profile where the macro is set?

Read this bug comment!

Strachey, referential transparency, Haskell

This is my input into the recent discussion on referential transparency (RT). I’m nowhere near as well versed in the subject as others, but how am I ever to learn anything unless I put my thoughts out there for them to be laughed at and ridiculed? ;)

It all started with a [post on stackoverflow.com](http://stackoverflow.com/questions/210835/what-is-referential-transparency), which received several very long and detailed responses, in particular from Uday Reddy (here and here). His answers were also linked to from Reddit. His second response contains a link to an excellent paper by Strachey, Fundamental concepts in programming languages. I’d go as far as saying that, despite it being lecture notes rather than a fully worked paper, it ought to be required reading for all software developers.

The rest of what I write here hinges on me actually understanding what Strachey writes in his paper. Of course I’m looking forward to comments/corrections/etc that help me correct my understanding.

What Strachey says about RT

In section 3.2.1 he introduces RT like this:

One of the most useful properties of expressions is that called by Quine referential transparency. In essence this means that if we wish to find the value of an expression which contains a sub-expression, the only thing we need to know about the sub-expression is its value. Any other features of the sub-expression, such as its internal structure, the number and nature of its components, the order in which they are evaluated or the colour of the ink in which they are written, are irrelevant to the value of the main expression.

There is however a crucial bit just before that:

Like the rest of mathematics, we shall be concerned only with R-values.

That is, he starts out with a very limited subset of what most people would consider a usable imperative programming language.

He then dives into some more details in section 3.2.2 by adding the concept of an environment, which is handled through the use of a where-clause, or alternatively using let-statements (this ought to make any Haskell developer feel right at home). After a few interesting sections on stuff like applicative structure, evaluation, and conditional expressions he finally tackles the issue of variables in section 3.3.1. There are two pieces to the trick; the first is to take advantage of his earlier insight that led to a split of values into L-values and R-values:

If we consider L-values as well as R-values, however, we can preserve referential transparency as far as L-values are concerned. This is because L-values, being generalised addresses, are not altered by assignment commands. Thus the command x := x+1 leaves the address of the cell representing x (L-value of x) unchanged although it does alter the contents of this cell (R-value of x). So if we agree that the values concerned are all L-values, we can continue to use where-clauses and lambda-expressions for describing parts of a program which include assignments.

The cost of this is that the entire theory constructed earlier for operations taking R-values now has to be revised to incorporate L-values. The outline for this is in the rest of section 3.3, and it basically comes down to including an abstract store in the environment. However, before doing that he mentions that:

I think these problems are inevitable and although much of the work remains to be done, I feel hopeful that when completed it will not seem so formidable as it does at present, and that it will bring clarification to many areas of programming language study which are very obscure today. In particular the problems of side effects will, I hope, become more amenable.

He does reach his goal, but it’s a bit unfortunate that he stops short of considering the wider problem of side effects. My assumption is that this would have to be dealt with in a similar way to assignment, but that would mean that rather than just adding a store to the environment, the world, or a subset of it, would need to be added.

An open question (to me) is if anyone has built on Strachey’s work in this area and thought of the details of RT and general side effects?

RT in Haskell

The original question described RT as

it means you can replace equals with equals

which I actually think is a rather good, and very short, description of it. It’s not the full story (there are further important details), but it’s a good first intuition. Also, it’s a description usable in Haskell. Well, to be slightly more nuanced, it’s good for Haskell without IO (Haskell-IO). However, this is where the strict type system of Haskell really starts to shine, because (here I’m probably a bit imprecise) we only have R-values in Haskell-IO. If we want to use assignment we add the use of a state monad, and we do that explicitly.

A former colleague of mine said that in Haskell we need to build up our own computational models ourselves. For instance, if we need assignment we use State, if we need to record progress we use Writer, etc. In other languages the language designer has already made all those choices for us, we don’t get to make them ourselves. For RT it means that Haskell is more explicit about what the environment of a function is.

Moving on to general side effects, those are also more explicit in Haskell since they have to happen inside the IO monad. That alone is a great boon for RT in Haskell since it becomes explicit where RT as worked out by Strachey applies directly, and where there are (hopefully amenable) problems of side effects left. Even further, in Haskell it’s possible to make subsets of IO (by wrapping IO, see e.g. my own posts on wrapping IO, part 1 and wrapping IO, part 2). I’m sure that if including the world in the environment is the way to achieve RT with general side effects, then it’s highly beneficial to be able to create subsets of the world.

RT in Haskell vs. RT in (common) imperative languages

Uday writes in his first answer that:

But, today, functional programmers claim that imperative programming languages are not referentially transparent. Strachey would be turning in his grave.

This may well be true, but I think that when a Haskell programmer says it, he’s only twitching slightly. The reason? Strachey writes:

Any departure of R-value referential transparency in a R-value context should either be eliminated by decomposing the expression into several commands and simpler expressions, or, if this turns out to be difficult, the subject of a comment.

Which is something that Haskell programmers do naturally by use of IO. That is, in Haskell you either have an R-value, and you clearly see that you do, or you put in a comment, which is encoded in the type of the function.

This rather lengthy post basically arrives at the following, which is what I suspect the user pacala is saying about RT on Reddit:

Imperative languages may well be RT, but when trying to understand a code base the environment of each function is so large that understanding it is an intractable problem. I don’t have this problem in Haskell.

Compiling boost for Windows, with MinGW on Linux

Just in case you see the utter logic in developing for Windows on Linux :)

In the root of the unpacked Boost:

  • Run ./bootstrap.sh --with-python=$(which python2) --prefix=${HOME}/opt/boost-win --without-icu
  • Modify project-config.jam like this:

     # file.
     if ! gcc in [ feature.values <toolset> ]
     {
    -    using gcc ;
    +    using gcc : : i486-mingw32-g++ ;
     }
    
     project : default-build <toolset>gcc ;
  • Compile and install by running ./bjam --layout=system variant=release threading=multi link=shared runtime-link=shared toolset=gcc target-os=windows threadapi=win32 install

Limit what is built by adding e.g. --with-program_options to that last command.

Qt state machines and automatic timed transitions

In the interest of full disclosure: this post is related to what I do for a living, development of and for embedded systems. I work for Semcon, but they don’t make me wear a suit and tie so these are my words, and mine alone.

A bit of background info

In a recent project we had a system where the turning of the wheels was controlled by a simple dial. It emitted pulses as it was turned and the pulse train was shifted slightly depending on the direction of the turn. In software this was mapped onto two signals, one for each direction, with one signal emitted for each pulse in the train. All very straightforward so far.

To avoid accidental change of direction we decided that

  1. we would only start turning the wheels after having received four initial signals, and
  2. a full second without receiving any signal meant that the turning had stopped.

The solution

The application was to be implemented using Qt, so using the Qt state machine framework was an obvious choice. The full state machine wouldn’t have to be large, only 8 states. The initial state (sResting) would indicate that the system was in a steady state (no turning), and from there any received signal would advance the machine into a successive state (sOne, sTwo, sThree, sFour) to indicate the number of received signals. From the fourth state the machine would advance directly to a state (sTurning) where a received signal would initiate an actual turn of the wheels. The turning would happen upon entry into two separate states (sTurnRight and sTurnLeft), and each of these states would instantly return to sTurning. All of this is simple and straightforward; what wasn’t so clear was how to implement the automatic return to the initial state after 1 s of inactivity.

The implementation

As I like to do, I first experimented a little to find a suitable solution to the problem. What follows is the resulting code of that experiment. The final code used in the project ended up being very similar. It’s all based around the method postDelayedEvent() found in QStateMachine.

First off a new type of event is needed, a ReturnEvent:

class ReturnEvent : public QEvent
{
public:
    ReturnEvent() : QEvent(QEvent::Type(QEvent::User + 1)) {}
};

There is also a need for a new type of transition, ReturnTransition:

class ReturnTransition : public QAbstractTransition
{
public:
    ReturnTransition(QState *target=0) { if(target) setTargetState(target); }

protected:
    virtual bool eventTest(QEvent *e) {
        return(e->type() == QEvent::Type(QEvent::User + 1));
    }

    virtual void onTransition(QEvent *) {}
};

For the experiment I decided to use a simple widget containing two buttons; it would also hold the state machine:

class MButtons : public QWidget
{
    Q_OBJECT;

public:
    MButtons(QStateMachine &m)
        : _right("Right"), _left("Left"),
        _m(m), _delayed(0) {
        QBoxLayout *lo = new QBoxLayout(QBoxLayout::TopToBottom);
        lo->addWidget(&_right);
        lo->addWidget(&_left);

        setLayout(lo);
    }
    virtual ~MButtons() {}

    QPushButton _right,
                _left;
    QStateMachine &_m;

The widget also holds the slots for all the state entry functions:

public slots:
    void sRestingEntered() {
        qDebug() << __PRETTY_FUNCTION__;
        if(_delayed) { _m.cancelDelayedEvent(_delayed); _delayed = 0; }
    }

    void sOneEntered() {
        qDebug() << __PRETTY_FUNCTION__;
        if(_delayed) { _m.cancelDelayedEvent(_delayed); _delayed = 0; }
        _delayed = _m.postDelayedEvent(new ReturnEvent, 1000);
    }

    void sTwoEntered() {
        qDebug() << __PRETTY_FUNCTION__;
        if(_delayed) { _m.cancelDelayedEvent(_delayed); _delayed = 0; }
        _delayed = _m.postDelayedEvent(new ReturnEvent, 1000);
    }
    void sThreeEntered() {
        qDebug() << __PRETTY_FUNCTION__;
        if(_delayed) { _m.cancelDelayedEvent(_delayed); _delayed = 0; }
        _delayed = _m.postDelayedEvent(new ReturnEvent, 1000);
    }
    void sFourEntered() {
        qDebug() << __PRETTY_FUNCTION__;
        if(_delayed) { _m.cancelDelayedEvent(_delayed); _delayed = 0; }
        _delayed = _m.postDelayedEvent(new ReturnEvent, 1000);
    }
    void sTurningEntered() {
        qDebug() << __PRETTY_FUNCTION__;
        if(_delayed) { _m.cancelDelayedEvent(_delayed); _delayed = 0; }
        _delayed = _m.postDelayedEvent(new ReturnEvent, 1000);
    }
    void sTurnRightEntered() {
        qDebug() << __PRETTY_FUNCTION__;
    }
    void sTurnLeftEntered() {
        qDebug() << __PRETTY_FUNCTION__;
    }

Sure, several of the entry functions could be folded into one, but in order to validate the idea it’s easier to make separate ones for each state. The pattern is easy to spot: on entry a delayed return event is registered (if there’s a previous one it’s replaced with a new one), except for the steady state (sResting), where any delayed event is removed, and the turning states (sTurnRight and sTurnLeft), since those states immediately return to sTurning anyway.
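If one did fold them, a single slot along these lines would probably do (just a sketch, not the code used in the project; all the counting states’ entered() signals would then be connected to this one slot):

    void countingStateEntered() {
        qDebug() << __PRETTY_FUNCTION__;
        // cancel any earlier delayed event and arm a new 1 s return-to-resting event
        if(_delayed) { _m.cancelDelayedEvent(_delayed); _delayed = 0; }
        _delayed = _m.postDelayedEvent(new ReturnEvent, 1000);
    }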

Finally it also holds the handle for the delayed event:

private:
    int _delayed;
};

Now the main function for setting it all up is simple:

int main(int argc, char **argv)
{
    QApplication app(argc, argv);
    QStateMachine m;
    MButtons b(m);
    b.show();

    QState *sResting = new QState(),
           *sOne = new QState(),
           *sTwo = new QState(),
           *sThree = new QState(),
           *sFour = new QState(),
           *sTurning = new QState(),
           *sTurnRight = new QState(),
           *sTurnLeft = new QState();

    m.addState(sResting);
    m.addState(sOne);
    m.addState(sTwo);
    m.addState(sThree);
    m.addState(sFour);
    m.addState(sTurning);
    m.addState(sTurnRight);
    m.addState(sTurnLeft);
    m.setInitialState(sResting);

    sResting->addTransition(&b._right, SIGNAL(clicked()), sOne);
    sResting->addTransition(&b._left, SIGNAL(clicked()), sOne);
    sOne->addTransition(&b._right, SIGNAL(clicked()), sTwo);
    sOne->addTransition(&b._left, SIGNAL(clicked()), sTwo);
    sOne->addTransition(new ReturnTransition(sResting));
    sTwo->addTransition(&b._right, SIGNAL(clicked()), sThree);
    sTwo->addTransition(&b._left, SIGNAL(clicked()), sThree);
    sTwo->addTransition(new ReturnTransition(sResting));
    sThree->addTransition(&b._right, SIGNAL(clicked()), sFour);
    sThree->addTransition(&b._left, SIGNAL(clicked()), sFour);
    sThree->addTransition(new ReturnTransition(sResting));
    sFour->addTransition(sTurning);
    sTurning->addTransition(&b._right, SIGNAL(clicked()), sTurnRight);
    sTurning->addTransition(&b._left, SIGNAL(clicked()), sTurnLeft);
    sTurning->addTransition(new ReturnTransition(sResting));
    sTurnRight->addTransition(sTurning);
    sTurnLeft->addTransition(sTurning);

    QObject::connect(sResting, SIGNAL(entered()), &b, SLOT(sRestingEntered()));
    QObject::connect(sOne, SIGNAL(entered()), &b, SLOT(sOneEntered()));
    QObject::connect(sTwo, SIGNAL(entered()), &b, SLOT(sTwoEntered()));
    QObject::connect(sThree, SIGNAL(entered()), &b, SLOT(sThreeEntered()));
    QObject::connect(sFour, SIGNAL(entered()), &b, SLOT(sFourEntered()));
    QObject::connect(sTurning, SIGNAL(entered()), &b, SLOT(sTurningEntered()));
    QObject::connect(sTurnRight, SIGNAL(entered()), &b, SLOT(sTurnRightEntered()));
    QObject::connect(sTurnLeft, SIGNAL(entered()), &b, SLOT(sTurnLeftEntered()));

    m.start();

    return(app.exec());
}

Conclusion and open questions

I’m fairly happy with the solution, but I’d be curious to hear how other people, people more skilled in using Qt, would have solved the problem.

For a while I considered solving the skipping of four initial signals using a single state and a counter, but I saw no obvious easy way to implement that, so I instead opted to use separate states. Slightly wasteful of resources, but not too bad, and simplicity is important. I’m very curious to find out if there’s a simple way to implement it using a single state.

Manual setup of Qt+Eclipse on Windows

Before the weekend I started looking at using Qt on Windows. More specifically I wanted to know whether this combination could be an option for a sub-project at work. We need to develop a program for the Windows desktop, and due to the overall context it would make sense to write it in C++ (that’s what we use for another part of the project). We already use both Eclipse and Visual Studio in the project, but I strongly prefer Eclipse, so I was hoping to be able to use it. However, it seems that the Qt developers strongly favour their own tool Qt Creator, though there are (outdated?) integrations for both Eclipse and Visual Studio. I’d rather avoid introducing a third IDE into a project—two is already one too many in my opinion. Anyway, I think I managed to find an acceptable configuration of Eclipse without using that old Qt integration plugin, together with MSVC (I was using the gratis version of MSVC for this).

Qt setup

I decided to install Qt into C:\QtSDK, and then I made the following permanent changes to the environment:

> set QTDIR=C:\QtSDK\Desktop\Qt\4.8.0\msvc2010
> set QMAKESPEC=%QTDIR%\mkspecs\win32-msvc2010
> set PATH=%PATH%;%QTDIR%\bin;C:\QtSDK\QtCreator\bin

Starting Eclipse so that it finds the compiler

It’s slightly disappointing that Eclipse happily lets one create an MSVC project that isn’t buildable because it doesn’t know where the compiler is located. One easy way to remedy that seems to be to create a BAT file that sets up the proper environment for Eclipse:

@echo off
setlocal
call "C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\vcvarsall.bat"
start C:\Eclipse\Indigo\eclipse.exe
endlocal

Creating the project

Creating a “makefile” project in Eclipse is fairly straightforward; one creates a C/C++ project of the makefile type, and makes it empty too so that there isn’t any cruft in the way. Then add a single source file, e.g. main.cxx:

#include <iostream>
#include <Qt/QtGui>

int main(int argc, char **argv)
{
    std::cout << __FUNCTION__ << std::endl;
    QApplication app(argc, argv);
    return(app.exec());
}

And then a project file, e.g. Test.pro:

TEMPLATE = app
TARGET = 
DEPENDPATH += .
INCLUDEPATH += .

CONFIG += qt

HEADERS +=
SOURCES += main.cxx

After this use qmake to create the required makefile. I decided to use a subdirectory (_build) in the project, which qmake seems to have full support for:

> qmake ..\Test.pro

Setting up building from Eclipse

In the project properties modify the C/C++ Build settings for the Debug target. Instead of the default build command (which is make) one can use nmake, or even better jom:

  • Build command: C:/QtSDK/QTCreator/bin/jom -f Makefile.Debug
  • Build directory: ${workspace_loc:/Test}/_build

Then one can create a Release target, which differs only in that it builds using Makefile.Release.

Running qmake from inside Eclipse

It’s very convenient to be able to run qmake and re-generate the makefiles from inside Eclipse. One can set that up by adding an external tool:

  • Location: C:\QtSDK\Desktop\Qt\4.8.0\msvc2010\bin\qmake.exe
  • Working directory: ${workspace_loc:/Test}/_build
  • Arguments: ../Test.pro

In closing

I plan to also have a look at the Qt Visual Studio Add-in, though I suspect we might be using the latest version of VS, which might cause trouble.

Suggestions for further integration with Eclipse would be most welcome, e.g. for forms and translations.

LXDE and multiple screens: replacing lxrandr with a script

When using Gnome3 I was really impressed with the support for multiple screens. Then I switched to LXDE and was very disappointed in that desktop’s support for multiple screens. In fact so disappointed that I sat down and read the man-page for ‘randr’ and hacked up the following script:

#! /bin/bash

cmd=$1; shift

case $cmd in
    on)
        # turn on VGA1, auto-select size, right of laptop screen
        xrandr --output VGA1 --auto --right-of LVDS1
        ;;
    off)
        xrandr --output VGA1 --off
        ;;
    list)
        xrandr
        ;;
    *)
        echo "Commands: on, off, list"
esac

In my mind it’s vastly more usable than ‘lxrandr’ :)

0MQ and Haskell

Ever since I heard the FLOSS weekly episode on 0MQ I’ve been looking for a reason to take a look at it. Well, to hell with reason, I’ll have a first look without any specific goal in mind.

I found a simple introduction to it in Nicholas Piël’s post ZeroMQ an introduction. The only issue was that it was based on Python, and Python2 at that. So here are my attempts at translating two of the clients to Haskell (using zeromq-haskell).

req-rep

Here’s the client in Python3 first:

import zmq

ctx = zmq.Context()
socket = ctx.socket(zmq.REQ)
socket.connect('tcp://127.0.0.1:5000')

for i in range(10):
    msg = "msg %s" % i
    socket.send_unicode(msg)
    print('Sending', msg)
    msg_in = socket.recv()

And here in Haskell:

import System.ZMQ
import Data.ByteString.Char8 as CBS

main = withContext 1 $ \ ctx -> withSocket ctx Req $ \ soc -> do
    connect soc "tcp://127.0.0.1:5000"
    let msgs = [pack ("msg " ++ show i) | i <- [0..9]]
    flip mapM_ msgs $ \ msg -> do
        send soc msg []
        CBS.putStrLn msg
        receive soc []

pub-sub

In Python3:

import zmq

ctx = zmq.Context()
socket = ctx.socket(zmq.SUB)
socket.connect('tcp://127.0.0.1:5000')
socket.setsockopt(zmq.SUBSCRIBE, b'sweden')
socket.setsockopt(zmq.SUBSCRIBE, b'denmark')

while True:
    print(socket.recv())

Haskell:

import System.ZMQ
import Control.Monad
import Data.ByteString.Char8 as CBS

main = withContext 1 $ \ ctx -> withSocket ctx Sub $ \ soc -> do
    connect soc "tcp://127.0.0.1:5000"
    subscribe soc "sweden"
    subscribe soc "denmark"
    forever $ receive soc [] >>= CBS.putStrLn

Two comments on the Haskell code here:

  • I’m not sure why, but the Haskell client dies after receiving just a few messages (they are properly filtered though).
  • The API for subscribe is a bit strange, it would make more sense if it took a ByteString rather than a String.

Shelltestrunner to the rescue

A little while ago shelltestrunner was announced on haskell-cafe. At the time I was slowly losing hope of ever getting decent test coverage in cblrepo using HUnit. Using something like shelltestrunner could be an easier and more workable solution, especially since what cblrepo needed most in the short term was a bit of integration testing.

shelltestrunner is basically just a tool that runs shell commands and compares output (both stdout and stderr) and the exit code. It’s also possible to provide data to be passed to the command on stdin. The documentation on the shelltestrunner home page is very good and accessible (there’s a rough example of the test format after the list below). There are only a few things that I’d like to add to it:

  • Use the --with (-w) flag, it’s very handy to avoid littering the tests with long paths to the output of your build environment.
  • There is no support for set-up and tear-down steps in the tests (in my opinion this would be a very nice addition to the tool), so anything that needs to be set up for the actual tests will itself have to be a test.
  • There is no way to name tests (which would be another good addition), so I found it crucial to organise tests into several files.
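To give a flavour of what a test looks like, here’s a rough sketch from memory (check the documentation for the exact syntax; the command and the expected exit code are just an example):

# syncing the Hackage index should succeed
cblrepo idxsync
>>>= 0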

Compiling U-Boot for use in QEMU (VersatilePB)

Since I’m now working a bit with embedded systems I thought I’d take a look at compiling for one of the ARM-based machines that QEMU supports. I settled for VersatilePB after finding this old-ish article. Rather optimistically I thought that maybe, just maybe, things had changed in a year and that the limitation on flash had been removed. How wrong I was.

I did find an easier way to get it working, though with the limitation that Linux has to be started via tftpboot or in some other network-based fashion. The patch looks like this:

--- u-boot.orig/src/u-boot-2011.12/include/configs/versatile.h
+++ u-boot/src/u-boot-2011.12/include/configs/versatile.h
@@ -31,6 +31,8 @@
 #ifndef __CONFIG_H
 #define __CONFIG_H
 
+#define CONFIG_ARCH_VERSATILE_QEMU
+
 /*
  * High Level Configuration Options
  * (easy to change)

Then just go ahead and modify the default boot argument (CONFIG_BOOTARGS in the same file) to your heart’s content to minimise the amount of manual work for booting.

Adjusting to Sweden with XKB

Having lived outside of Sweden for about a decade I’ve grown accustomed to non-Swedish keyboard layouts, first the US (as it’s widely used in The Netherlands) and later on the UK layout. Moving back to Sweden had me swearing over the layout used here within only a few days. The placement of “{[]}” is especially painful. Clearly the Swedish layout wasn’t designed for developers! Rather than go on muscle memory I decided to first attempt a small change to the X key mappings.

I found a good description of per-user XKB configuration after a bit of searching. Then I modified it slightly to fit better in my Arch-based LXDE system.

The XKB config

I started with removing all the configuration I’d previously put into /etc/X11/xorg.conf.d – if I’m to use per-user configuration then there should be no system-wide settings at all. Then I put the output of setxkbmap -print into ~/.xkb/keymap/$(hostname) as a starting point. The main goal is to move the characters that require awkward single-hand combinations with AltGr to slightly more comfortable mappings. After a bit of experimentation I settled on the following (which I put in ~/.xkb/symbols/sedev)

partial alphanumeric_keys
xkb_symbols "devkeys" {
    key <AD01> { [ q, Q, backslash ] };
    key <AD02> { [ w, W, asciitilde ] };

    key <AC01> { [ a, A, braceleft ] };
    key <AC02> { [ s, S, bracketleft ] };
    key <AC03> { [ d, D, bracketright ] };
    key <AC04> { [ f, F, braceright ] };
};

After setting it manually and verifying that the new mappings work I added it to my keymap, which ended up looking like this

xkb_keymap {
    xkb_keycodes  { include "evdev+aliases(qwerty)" };
    xkb_types     { include "complete" };
    xkb_compat    { include "complete" };
    xkb_symbols   { include "pc+se(nodeadkeys)+inet(evdev)+capslock(swapescape)+compose(paus)+sedev(devkeys)" };
    xkb_geometry  { include "pc(pc104)" };
};

Tying it together

Now all the remains is to load the new configuration on login. Based on madduck’s example I put the following into ~/.xprofile

# load XKB, if there is one
XKBDIR=${HOME}/.xkb
XKBMAPFILE=${XKBDIR}/keymap/$(hostname)
if [[ -f ${XKBMAPFILE} ]]; then
    xkbcomp -I${XKBDIR} ${XKBMAPFILE} ${DISPLAY}
fi

Now I just have to get used to using the new mappings.

LXDE and xmonad

A few days ago I created the page on LXDE and Xmonad on the Xmonad area of the Haskell Wiki. It’s very short, mainly due to it being very simple to set up. My config is a bit bare-bones at the moment though and I’m sure others have more to contribute.

And yes! This means I’ve left the Gnome camp. Quite possibly for good.

Xmonad and Gnome 3

The upgrade to Gnome3 in ArchLinux a few days ago broke my previous setup that combined xmonad with Gnome. Gnome 3 has a fallback mode, but I found that the instructions for replacing metacity under Gnome 2 no longer worked. With some help from the xmonad mailing list (in particular Jens Petersen and his efforts of providing a working setup on Fedora) I now finally have a working setup again. Here’s how I did it.

Add a session file for use by Gnome Session (/usr/share/gnome-session/sessions/xmonad.session):

[GNOME Session]
Name=Xmonad session
RequiredComponents=gnome-panel;gnome-settings-daemon;
RequiredProviders=windowmanager;notifications;
DefaultProvider-windowmanager=xmonad
DefaultProvider-notifications=notification-daemon

And a desktop file for GDM (/usr/share/xsessions/xmonad-gnome-session.desktop):

[Desktop Entry]
Name=Xmonad GNOME
Comment=Tiling window manager
TryExec=/usr/bin/gnome-session
Exec=gnome-session --session=xmonad
Type=XSession

That’s all it takes. Of course I’ve raised a ticket against the Arch package.

Per-user Gnome 3 configuration

Gnome 3 just hit the official ArchLinux repos a few days ago. It’s new, it’s slick, it’s shiny… but I don’t think it’s ready for general use just yet. It seems stable enough, but there are just a few too many things missing to make it feel complete. Anyway, running Arch means that at times one has to live with not-quite-release-ready software :-)

The biggest issue I’ve come across with Gnome 3, and especially Gnome Shell and the window manager, is configuring the themes. I was pointed to a fairly good article on customising the Gnome Shell, but it suggests modifying system files which is a bad thing to do even on single-user systems. So this post should be read as an addendum to that one.

First of all install the User Theme Gnome Shell Extension. The AUR packages available pull the source from its git repo because there don’t seem to be any releases of the extension just yet. When using the bleeding edge source I ran into problems with Gnome Shell crashing, so I advise against using it. I’ve had success with the source tagged at 3.0.1; you can find an Arch source package for Gnome Shell User Theme that I put together based on one of the AUR packages. Build and install that, then restart Gnome Shell (Alt-F2, r, return). Then verify that the extension has been loaded by using Looking Glass.

Then create copies of the default themes using rsync:

% rsync -a /usr/share/themes/Adwaita ~/.themes
% mv ~/.themes/Adwaita ~/.themes/Adwaita2
% mkdir -p ~/.themes/Default/gnome-shell
% rsync -a /usr/share/gnome-shell/theme ~/.themes/Default/gnome-shell

Then modify the file ~/.themes/Adwaita2/index.theme so that each mention of Adwaita says Adwaita2 instead, except for the cursor theme.

Make sure gnome-tweak-tool is installed (it’s in a package with the same name). Run it and change the shell theme to Default, the windows theme to Adwaita2, and the interface gtk+ theme to Adwaita2 as well.

Now you return to the article on configuring Gnome Shell, but instead of modifying the system files modify the ones in your ~/.themes.

ArchHaskell HABS with cblrepo

As a follow-up to my earlier post on cblrepo I thought I’d convert a snapshot of ArchHaskell HABS to cblrepo. It’s mostly done as an exercise and to serve as an example. You can find it at http://www.kiwilight.com/~magnus/habs/. Of course I have used it to build all the packages, and I still have the result of that around, so if anyone asks I just might upload that as well.

Revisiting JSON in Haskell

I just received an email with some praise for my earlier post on JSON in Haskell–it’s always nice to receive some praise ;-) However, the sender also mentioned that the mLookup function as coded there would blow up on incomplete JSON objects. That was by design, as a simplification, but the sender needed to deal with just that and asked if I had some more elegant solution than making every field in the data type a Maybe.

As I said, it’s always nice to receive praise, so here’s one solution that came to mind as I was reading the email.

I should mention that it relies on there being a reasonable default value for each type of the fields, and that the default is the same for all fields sharing a type.

First off, define a type class for types with default values:

class Defaultable d where
    def :: d

Then modify mLookup so that it uses Defaultable. I renamed it to mLookupAndReadJSON:

mLookupAndReadJSON a as = maybe def readJSON (lookup a as)

Now we need to provide some instances of Defaultable too. I limit this example to cover only GlossDef, so only the following instances are required:

instance Defaultable [a] where
    def = []

instance Defaultable a => Defaultable (Result a) where
    def = Ok def

Now it’s possible to decode incomplete JSON objects:

ghci> decode "{ \"GlossSeeAlso\": [\"GML\", \"XML\"] }" :: Result GlossDef
Ok (GlossDef {glossDefPara = "", glossDefSeeAlso = ["GML","XML"]})

I’m sure there are other ways of achieving what the author of the email asked for. Please let me know of them in comments.

Maintaining Haskell packages for a Linux distribution---cblrepo

Maintaining a large set of Haskell packages for a Linux distribution is quite a chore. Especially if one wants to track Hackage as far as possible. Several distributions have tools to automatically convert Cabal-based packages into distribution packages, e.g. cabal2arch for ArchLinux and cabal-rpm. They are just conversion tools though, and the most time-consuming activity in maintaining Haskell packages is resolving and verifying dependencies.

At least that was my experience when I was actively involved in ArchHaskell. I only saw two options when adding or upgrading a package: either I worked out the dependencies manually, or I simply tried it out. Neither of them was very appealing, and both were very time-consuming. It seemed obvious that I needed some tool to help out.

Enter cblrepo!

It allows me to maintain a database of specific versions of packages, and when I want to upgrade a package, or add a new one, it’ll verify that all dependencies can be satisfied. In other words, it helps me maintain a buildable set of packages at all times.

The tool also has some functionality that helps in tracking Hackage as packages are updated there.

Something about how it works

At the moment I maintain a small repository of Arch packages, mostly just to try out cblrepo and convince myself that it works. The work environment contains a database and a directory of patches:

% ls
cblrepo.db  patches/
%

The database is a cleartext file containing the information on the packages. It’s basically just a dump of the related Haskell datatype, encoded in JSON. The patches directory holds patches for Cabal files and PKGBUILD files. They must be named patch.cabal.<hackage name> or patch.pkgbuild.<hackage name> in order to be picked up by cblrepo.

There’s also an application directory (~/.cblrepo) for caching info about the packages available on Hackage:

% ls ~/.cblrepo
00-index.tar.gz
%

How to use it

A session with cblrepo looks something like this. First we update the information about what packages are available on Hackage:

% cblrepo idxsync
%

After that it’s possible to see what packages are out-of-date:

% cblrepo updates
cmdargs: 0.6.8 (0.6.9)
test-framework-th: 0.1.3 (0.2.0)
xml: 1.3.7 (1.3.8)
language-haskell-extract: 0.1.2 (0.2.0)
blaze-builder: 0.2.1.4 (0.3.0.0)
%

Let’s check whether cmdargs can be updated:

% cblrepo add -n cmdargs,0.6.9
%

It generates no output, so that means it’s possible to update. When attempting to add all the packages we run into a problem:

% cblrepo add -n cmdargs,0.6.9 \
> test-framework-th,0.2.0 \
> xml,1.3.7 \
> language-haskell-extract,0.2.0 \
> blaze-builder,0.3.0.0
Adding blaze-builder 0.3.0.0 would break:
  haxr : blaze-builder ==0.2.*

We’ll leave blaze-builder at the current version for now:

% cblrepo add cmdargs,0.6.9 \
> test-framework-th,0.2.0 \
> xml,1.3.7 \
> language-haskell-extract,0.2.0
%

After these updates we also need to make sure that all packages that depend on these ones are re-built, that is we need to bump their release version:

% cblrepo bump -n cmdargs \
> test-framework-th \
> xml \
> language-haskell-extract
Would bump:
test-framework
test-framework-hunit
test-framework-quickcheck2
%

Just re-run that without the -n to actually perform the bump. Now that all this is done we need to generate the files necessary to build the Arch packages. We can easily check what packages need re-building, and get a good order for building them:

% cblrepo build cmdargs \
> test-framework-th \
> xml \
> language-haskell-extract
cmdargs
xml
test-framework
test-framework-quickcheck2
test-framework-hunit
language-haskell-extract
test-framework-th
%

And generating the required files is also easy:

% cblrepo pkgbuild $(!!)
% tree
.
|-- cblrepo.db
|-- haskell-cmdargs
|   |-- haskell-cmdargs.install
|   `-- PKGBUILD
|-- haskell-language-haskell-extract
|   |-- haskell-language-haskell-extract.install
|   `-- PKGBUILD
|-- haskell-test-framework
|   |-- haskell-test-framework.install
|   `-- PKGBUILD
|-- haskell-test-framework-hunit
|   |-- haskell-test-framework-hunit.install
|   `-- PKGBUILD
|-- haskell-test-framework-quickcheck2
|   |-- haskell-test-framework-quickcheck2.install
|   `-- PKGBUILD
|-- haskell-test-framework-th
|   |-- haskell-test-framework-th.install
|   `-- PKGBUILD
|-- haskell-xml
|   |-- haskell-xml.install
|   `-- PKGBUILD
`-- patches

8 directories, 15 files
%

Now all that’s left is running makepkg in each of the directories, in the order indicated by cblrepo build above.

Unfortunately they won’t all build—generating the Haddock docs for test-framework-th fails. That’s however fairly easy to remedy by patching the PKGBUILD to disable the generation of docs.

I’ll get back to that in a later post though.

Your comments, please

Please leave comments and suggestions. I’m planning on uploading the source to github shortly.

On maintaining Haskell packages for a Linux distro

When trying to maintain a set of binary packages of Haskell libraries for a Linux distribution there are a few issues that come up:

  1. The set of packages must be compilable at all times, and
  2. Updating one package requires all packages that depend on it, in one or more steps, to be re-compiled.

The first requires keeping track of all dependencies of the packages in the set and making sure that they are satisfiable at all times. For a while I was doing this by simply attempting to compile all updated packages and checking for breakages, which was both time-consuming and painful when build failures had to be resolved.

The second requires bumping the package release number for all packages that are reachable when following the dependencies in the reverse direction. Doing this manually is tedious and very error prone in my experience.

Of course it ought to be possible to make this a lot easier with the help of a tool. The last few days I’ve been writing such a tool. This is how I’ve been using it so far.

Building the initial database

GHC in ArchLinux ships with a few Haskell libraries and ArchLinux also has a few Haskell packages in its base repositories. Since I don’t need to maintain any of those packages I decided to treat these as a sort of base. Adding those is as simple as this:

% head base-pkgs
base,4.2.0.2
array,0.3.0.1
bytestring,0.9.1.7
Cabal,1.8.0.6
containers,0.3.0.0
directory,1.0.1.1
extensible-exceptions,0.1.1.1
filepath,1.1.0.4
haskell98,1.0.1.1
hpc,0.5.0.5
% cblrepo addbasepkg $(cat base-pkgs)
Success

Then I need to add the packages of the binary repo provided by ArchHaskell. I wrote a little script that extracts the package name and version from the ArchHaskell HABS tree (get-ah-cabals):

#! /bin/bash

habsdir=$1

for d in ${habsdir}/habs/*; do
    . ${d}/PKGBUILD
    case $_hkgname in
        (datetime|haskell-platform)
            ;;
        (*)
            echo ${_hkgname},${pkgver}
            ;;
    esac
done

echo http://hackage.haskell.org/platform/2010.2.0.0/haskell-platform.cabal

Since haskell-platform isn’t on Hackage it requires special handling. The reason why datetime is excluded is slightly different: it’s the only package that requires the old base (version <4). GHC in Arch does ship with both old and new base so datetime can be built, but cblrepo can’t deal with two versions of the same package. This is a limitation, but I’m not sure it’s worth fixing since base is the only library that comes in two versions, and datetime is the only package that hasn’t been updated to use the new base.

Knowing this it’s easy to add all the ArchHaskell packages to the database:

% cblrepo idxupdate
% cblrepo add $(get-ah-cabals path/to/habs)
Success

Attempting an update

Now it’s possible to attempt an update:

% cblrepo add neither,0.2.0
Failed to satisfy the following dependencies for neither:
  monad-peel >=0.1 && <0.2
Adding neither 0.2.0 would break:
  yesod : neither >=0.1.0 && <0.2
  persistent : neither >=0.1 && <0.2

The way to read this is that there first of all is a missing dependency to satisfy for neither itself, and second there are two packages, yesod and persistent, that wouldn’t be buildable if neither were updated.

Now if it were possible to update neither, what packages would require a bump?

% cblrepo bump neither     
persistent
yesod

Dear Google Code,

I’m impressed that you sent me an email asking for my input when someone wanted to use the project name bunny just because I happen to have a project called bunny on SourceForge1. However, neither your email, nor the page where I can record my opinion gives me enough information to decide. Please consider at least including in future emails the description of the proposed project.

Thank you!

XML character dereferencer

Just in case you ever need one:

xmlCharDeref :: String -> String
xmlCharDeref [] = []
xmlCharDeref ('&':'#':'x':r) = let
        (digits, remainder) = span (/= ';') r
        c = chr (read ("0x" ++ digits))
    in
        c : xmlCharDeref (tail remainder)
xmlCharDeref ('&':'#':r) = let
        (digits, remainder) = span (/= ';') r
        c = chr (read digits)
    in
        c : xmlCharDeref (tail remainder)
xmlCharDeref (c:r) = c : xmlCharDeref r

In ghci:

*Foo> xmlCharDeref "hello there"
"hello there"
*Foo> xmlCharDeref "hello&#32;there"
"hello there"
*Foo> xmlCharDeref "hello&#x32;there"
"hello2there"

Any Haskell puzzlers?

I just watched Joshua Bloch’s talk Effective Java - Still Effective After All These Years. I’m not a Java developer1 but I still found the talk very interesting. Mr Bloch offers tips and tricks for dealing effectively with a few aspects of Java, and I’m sure many a Java developer out there would find that part very interesting. For me however, the most interesting part was the appetizers and the dessert :-)

The appetizers and dessert consisted of three puzzlers. A puzzler is a piece of code that highlights some peculiarity of the language or standard libraries. The puzzlers in this talk were as follows:

Simple question

What is printed by the following code, and why?

public class SimpleQuestion {
  static boolean yesOrNo(String s) {
    s = s.toLowerCase();
    if(s.equals("yes") || s.equals("t") || s.equals("y"))
      s = "true";
    return Boolean.getBoolean(s);
  }

  public static void main(String[] args) {
    System.out.println(yesOrNo("true") + " " + yesOrNo("YeS"));
  }
}

Searching

What is the result of the following code, and why?

import java.util.*;

public class Searching {
  public static void main(String[] args) {
    String[] strs = { "0", "1", "2", "3", "4", "5" };

    // Translate string array into list of integer
    List<Integer> ints = new ArrayList<Integer>();
    for(String s : strs)
      ints.add(Integer.valueOf(s));

    System.out.println(Collections.binarySearch(ints, 1, cmp));
  }

  static Comparator<Integer> cmp = new Comparator<Integer>() {
    public int compare(Integer i, Integer j) {
      return i < j ? -1 : (i == j ? 0 : 1);
    }
  };
}

PrintWords

This one consists of two classes, which are compiled together:

public class PrintWords {
  public static void main(String[] args) {
    System.out.println(Words.FIRST + " " + Words.SECOND + " " + Words.THIRD);
  }
}
public class Words {
  public static final String FIRST = "the";
  public static final String SECOND = null;
  public static final String THIRD = "set";
}

Now modify the latter like this:

public class Words {
  public static final String FIRST = "physics";
  public static final String SECOND = "chemistry";
  public static final String THIRD = "biology";
}

Compile the second version of Words.java alone and then run PrintWords; what is the result, and why?

Any puzzlers for Haskell?

Of course I couldn’t help but wonder what puzzlers there are for Haskell. Do note though that puzzlers aren’t just obfuscated code; they are readable code that you think does one thing but in reality does something else. I’d really like to read any Haskell puzzlers you can come up with. Post them on your own blogs, or as comments to this post.

NB I should probably mention that I really don’t want answers to the puzzlers. I’ve watched Josh Bloch’s presentation, and I think anyone interested in finding out should watch it for themselves.


  1. If I ever find myself in a situation that calls for Java I’m very likely to spend some time looking at Scala :-)

Modifying Twitter Tools (WordPress plugin) for use with identi.ca

This is almost silly, so I’ll be surprised if it works flawlessly. So far, though, it seems to work well enough. :-)

I got Twitter Tools to work with identi.ca by modifying the file twitter-tools.php so that the URIs used are:

define('AKTT_API_POST_STATUS', 'http://identi.ca/api/statuses/update.json');
define('AKTT_API_USER_TIMELINE', 'http://identi.ca/api/statuses/user_timeline.json');
define('AKTT_API_STATUS_SHOW', 'http://identi.ca/api/statuses/show/###ID###.json');
define('AKTT_PROFILE_URL', 'http://identi.ca/api/###USERNAME###');
define('AKTT_STATUS_URL', 'http://identi.ca/api/###USERNAME###/statuses/###STATUS###');
define('AKTT_HASHTAG_URL', 'http://search.identi.ca/api/search.json?q=###HASHTAG###');

Hopefully this is all that’s needed :-)

Validating names in SSL certificates using OpenSSL (0.9.8)

Recently I’ve battled with OpenSSL at work. One thing I needed to do was add name validation to a program that previously didn’t have it. In an attempt to avoid obvious mistakes I went looking for existing examples of how to do it. I came across some code from Secure Programming.com; it can be found in the code from the book in “/spc-1.1/chapter10/8-unix.c”. It’s just too bad that only part of the code actually works as advertised. On top of that, the working part uses old functions which remain in the API only for backwards compatibility.

In trying to fix up that code I wrote the following little example code for extracting CN and subjectAltName:

#include <stdlib.h>
#include <stdio.h>

#include <openssl/pem.h>
#include <openssl/x509v3.h>

void getCN( X509 * );
void getSubjectAltName( X509 * );

int
main( int argc, char **argv )
{
    FILE *fpem;
    X509 *cert;

    if( !( fpem = fopen( argv[1], "r" ))) {
        fprintf( stderr, "Couldn't open the PEM file: %s\n", argv[1] );
        return( EXIT_FAILURE );
    }

    if( !( cert = PEM_read_X509( fpem, NULL, NULL, NULL ))) {
        fclose( fpem );
        fprintf( stderr, "Failed to read the PEM file: %s\n", argv[1] );
        return( EXIT_FAILURE );
    }

    getCN( cert );
    getSubjectAltName( cert );

    fclose( fpem );
    return( EXIT_SUCCESS );
}

void
getCN( X509 *cert )
{
    printf( "## %s\n", __PRETTY_FUNCTION__ );

    X509_NAME *subjName;
    int idx;

    if( !( subjName = X509_get_subject_name( cert )))
        fprintf( stderr, "X509_get_subject_name failed" );

    idx = X509_NAME_get_index_by_NID( subjName, NID_commonName, -1 );
    X509_NAME_ENTRY *entry = X509_NAME_get_entry( subjName, idx );
    ASN1_STRING *entryData = X509_NAME_ENTRY_get_data( entry );
    unsigned char *utf8;
    int length = ASN1_STRING_to_UTF8( &utf8, entryData );
    printf( "CN value: %s\n", utf8 );
    printf( "CN length: %d\n", length );
    OPENSSL_free( utf8 );

    return;
}

void getSubjectAltName( X509 *cert )
{
    printf( "## %s\n", __PRETTY_FUNCTION__ );

    GENERAL_NAMES *sANs;

    if( !( sANs = X509_get_ext_d2i( cert, NID_subject_alt_name, 0, 0 ))) {
        printf( "No subjectAltName extension\n" );
        return;
    }

    int i, numAN = sk_GENERAL_NAME_num( sANs );
    printf( "subjectAltName entries: %d\n", numAN );
    for( i = 0; i < numAN; ++i ) {
        GENERAL_NAME *sAN = sk_GENERAL_NAME_value( sANs, i );
        // we only care about DNS entries
        if( GEN_DNS == sAN->type ) {
            unsigned char *dns;
            ASN1_STRING_to_UTF8( &dns, sAN->d.dNSName );
            printf( "subjectAltName DNS: %s\n", dns );
            OPENSSL_free( dns );
        }
    }

    return;
}

Based on this I should be able to finish the patch I’ve been working on.

My new browser setup at work

I am very keen on keeping my private and work information separate. E.g. I would never read my personal and work email in the same MUA, instead I read work email in Thunderbird and the few times I read private email during working hours I do that using the web interface to GMail. At home it’s the other way around, Thunderbird for personal email, and a web interface to read work email. I used to have a similar setup for my browsing to keep bookmarks and saved passwords for the different areas of my life separate. Firefox was my work browser and Epiphany was my personal browser.

With the recent move to WebKit I noticed that there are a few bits of Epiphany that really bug me though, especially its inability to remember passwords; on my Eee not being able to do that is just a killer. So I decided to take a look at Firefox again, especially to see whether there are any add-ons that would help. And there are. These are the add-ons I found useful for this:

Profile Manager and Synchronizer

The most important piece of the setup is the add-on Profile Manager and Synchronizer. It makes it easy to have more than one instance of Firefox running at the same time, with a different profile active in each one.

At first I tried synchronising profiles via Dropbox, but that resulted in a lot of updates each time, so I quickly stopped. I can recommend using it once though, to get the profiles onto all the computers in the first place.

The plugin author says there will be a version that works with 3.6 soon. In the meantime I can report that I’ve had no issues with manually modifying the version range just to get it to install.

Xmarks

Since I don’t synchronise my profiles I do need to synchronise my bookmarks, and for that I use Xmarks.

Diigo

Diigo is a social bookmarking site. There seem to be thirteen to the dozen of those, but there are a couple of things that make Diigo different.

With the plugin I can easily store away pages for reading at some later date. In the past I’ve had a bookmark folder, or slightly more recently a tag, that I used to mark up pages that I’d like to take a closer look at. I’ve stopped that completely, and now I just mark pages as unread in Diigo. Just another way of reducing the clutter among my bookmarks.

Probably the coolest feature is commenting on web pages. I mostly use that to add private comments to web pages, e.g. when I do research into some topic (so far it’s mostly been for items I’m considering buying), but it’s also possible to make public comments. I’ve found it useful on more than one occasion to have a quick look through the public comments other people have put on pages.

Good online stores for yoga equipment

The condensed version: I’ve found both YogaStudio and Yoga United to be excellent online stores for yoga equipment.

Longer version

About a month ago I decided to finally invest in a yoga mat. After a bit of research I found the prAna Revolution, an extra-wide, extra-long mat made of natural rubber. I decided to order it from YogaStudio. The whole thing was completely uneventful and slightly quicker than I expected. So two big thumbs up for YogaStudio.

After receiving the mat I realised I really would need a bag for it. I found it very difficult to find a bag that would fit my slightly over-sized mat. Finally I stumbled on Yoga United, who had a good price on an extra long bag made out of cotton. They also delivered quicker than expected; unfortunately they didn’t ship the one I actually ordered. What I got was the smaller size bag, too small for my mat, but it fit my wife’s mat perfectly and she wanted to keep it. Instead of the hassle of sending it back I agreed with the lady at Yoga United that it would be simpler for them to just let me order another extra long bag and let me keep the one I had. The second bag arrived the next day. Again brilliant service.

Finally, I can really recommend the mat I bought. Yes, it’s pricy, but so far I’ve found it to be brilliant. The bag, you ask. Well, the mat doesn’t really want to stay rolled up, if I put it that way. Also, cotton isn’t a material that natural rubber slides easily on. It isn’t that hard to get the mat into the bag, but it helps to be patient :-)

Gnome: 2 questions that go unanswered

Since no one on the Gnome mailing list seems to be able to answer these questions I thought I’d try some other venues for getting them answered. The audience for my blog isn’t that big, but just maybe there’s someone out there who knows the answers to these questions related to Gnome configuration. Mail one and mail two.

1: Running GUI tool after NM has brought up network

I run Dropbox on my laptop, but their software is crap at handling the case where the network comes up only after the Dropbox service has started ((I’ve noticed no problem when the network goes away and then comes back; Dropbox picks that up just fine. But if the network isn’t there to start with, that it can’t handle. I’m somewhat at a loss as to how one writes a program that handles the former but not the latter. :-) ))

I know of the possibility of dropping a file in /etc/NetworkManager/dispatcher.d/, but that doesn’t work in this case since I need the program to run “in my desktop”. Well, at least I haven’t managed to get the Dropbox server to throw up an icon in my Gnome toolbar like it should, unless I run it from inside the desktop environment.

Any suggestions on how to solve this problem?

2: Changing background of the Gnome screensaver

I think Gnome comes with quite possibly the ugliest background for a screensaver I’ve ever seen. It’s a close-up on a green leaf or something. Absolutely hideous. I want to change it. To something nice, like a solid black. Actually just about anything else would do. But how?

GDM came with the same ugly background. Luckily I managed to find instructions on the Arch wiki to change GDM background. I’ve tried and failed to use a similar trick on the screensaver.

Please, help me escape the ugly background of the Gnome screensaver!

Bash is simply insane

What do you think the following script ((The script is somewhat artificial, who would ever use the construct true | while ...? I’ve used this just to show the point while keeping the examples short. Feel free to replace that part with something more useful, like cat myfile | while read ....)) will print?

foo() {
    true | while true; do
        false
        rc=$?
        if [ $rc -eq 1 ]; then
            return 1
        fi
    done
    echo $?
    return 0
}

foo || echo "Failed foo"

Run it and see. I suspect everyone but the script gurus out there will be surprised.

What about this script then?

bar () {
    local rc=0
    true | while true; do
        false
        rc=$?
        if [ $rc -eq 1 ]; then
            return 1
        fi
    done
    echo $?
    echo $rc
    return 0
}

bar

Surprised again?

I guess this means that scoping in bash is somewhat more complicated than I would ever have guessed: each command in a pipeline runs in its own subshell, so both the return and the assignment to rc happen in a child process and never reach the calling function.

Playing with sockets in Haskell

This is another one of those posts that I make mostly for myself, you know, for organising my thoughts and helping my memory :-)

There are, as far as I can see, three ways to deal with sockets in Haskell. There’s the type Socket, which is used throughout Network.Socket. From that it’s possible to get to the underlying file descriptor, and that in turn can be converted to a Handle.

When coupled with fork+exec it’s crucial to make sure the child process can find the socket. Leaving it at a predictable place seems to be the easiest way to do that, and as far as I can see that requires using dupTo from System.Posix.IO. So, on the child side it’s necessary to find a way to turn an integer (CInt) into something that can be treated as a socket (i.e. a Socket, a Handle, or a file descriptor).
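
As a rough sketch of the parent side of that idea (my own code, not from the original post; it assumes the pre-3.0 network API used elsewhere in this post and an agreed-upon descriptor number of 3):

import Network.Socket
import System.Posix.IO (closeFd, dupTo)
import System.Posix.Process (executeFile, forkProcess)
import System.Posix.Types (Fd(..), ProcessID)

-- Hypothetical helper: fork, move the socket's file descriptor to the
-- agreed-upon number 3, and exec the child program. A real version would
-- handle the case where the socket already happens to sit on fd 3.
spawnWithSocketOnFd3 :: Socket -> FilePath -> [String] -> IO ProcessID
spawnWithSocketOnFd3 sock prog args = forkProcess $ do
    let sockFd = Fd (fdSocket sock)   -- fdSocket as in network < 3.0
    _ <- dupTo sockFd (Fd 3)          -- put the socket at a predictable place
    closeFd sockFd
    executeFile prog False args Nothing

On the exec’ed side, 3 would then be the integer to convert using one of the variants below.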

A basic parent-child which obviously won’t work since the child’s socket is represented as a Socket:

import Control.Concurrent
import System.Posix.Process
import Network.Socket

childFunc s = send s "Ping from child" >> return ()

main = do
    (childSock, parentSock) <- socketPair AF_UNIX Stream defaultProtocol
    print (childSock, parentSock)
    child <- forkProcess $ childFunc childSock
    recv parentSock 10 >>= print

Let the child take a CInt and turn it into a filedescriptor:

import Control.Concurrent
import Control.Concurrent.MVar
import System.Posix.Process
import System.Posix.IO
import System.Posix.Types
import Network.Socket

childFunc sInt = do
    let fd = Fd sInt
    fdWrite fd "Ping from child" >> return ()

main = do
    (childSock, parentSock) <- socketPair AF_UNIX Stream defaultProtocol
    let childInt = fdSocket childSock
    print (childInt, parentSock)
    child <- forkProcess $ childFunc childInt
    recv parentSock 10 >>= print

Let the child take a CInt and turn it into a Handle:

import Control.Concurrent
import System.Posix.Process
import System.Posix.IO
import System.Posix.Types
import Network.Socket
import System.IO

childFunc sInt = do
    h <- fdToHandle $ Fd sInt
    hPutStr h "Ping from child"
    hFlush h

main = do
    (childSock, parentSock) <- socketPair AF_UNIX Stream defaultProtocol
    let childInt = fdSocket childSock
    print (childSock, parentSock)
    child <- forkProcess $ childFunc childInt
    recv parentSock 10 >>= print

Let the child take a CInt and turn it into a Socket:1

import Control.Concurrent
import Control.Concurrent.MVar
import System.Posix.Process
import System.Posix.IO
import System.Posix.Types
import Network.Socket

childFunc sInt = do
    s <- mkSocket sInt AF_UNIX Stream defaultProtocol Connected
    send s "Ping from child" >> return ()

main = do
    (childSock, parentSock) <- socketPair AF_UNIX Stream defaultProtocol
    let childInt = fdSocket childSock
    print (childInt, parentSock)
    child <- forkProcess $ childFunc childInt
    recv parentSock 10 >>= print

  1. It seems the socket is in the Connected state after socketPair succeeds.

Epilicious is dead, say hello to BMS

With Python being dropped as a language for extensions in epiphany 2.28 I needed to replace epilicious. I tried writing it in JavaScript (seed was integrated in 2.28), but I gave up due to hitting too many hurdles on the way. Instead I decided to rewrite epilicious using Vala and a minimal layer of C.

It turned out to be very doable, despite epiphany’s extension API lacking Vala bindings (Cosimo Cecchi, I’m looking at you :-) ). Basically I did the following setup:

  1. Add the extension following the instructions in Writing Epiphany Extensions by Adam Hooper.
  2. Add a check for valac in configure.ac.
  3. Add a rule in the extension’s Makefile.am to generate a C header file for all the Vala code, for use from C.

Then I started writing the actual extension. I did the minimal amount of work in C, trying to escape as soon as possible into Vala. In the few places I needed to call from Vala into C I would declare a delegate in Vala, and pass a function from C.1

I call this new version BMS, for bookmark synchronisation. I have a patch for BMS that applies to epiphany 2.28.1. (The file also contains a PKGBUILD in order to delight Arch users :-) )

I should probably point out that while epilicious never could be called polished, BMS is even less so. I might find the time to make it multi-threaded, so that it’s possible to do some sort of progress dialogue, but don’t hold your breath. In the back of my mind is also the thought of rewriting it yet again, in JavaScript/seed.


  1. The format of .vapi files is unknown to me, and I haven’t managed to find much documentation. Using function pointers seemed like an easier way, especially given that I needed this in exactly 3 places.

Updating Haskell packages on Arch

A few days ago I noticed that there were a few Haskell packages on AUR that had received updates. This was the excuse I had been waiting for to address the second part of keeping my Haskell packages up-to-date (I’ve written about the first part before, dealing with an update to GHC).

It’s easy to find the packages with available updates:

% yaourt -Su --aur

Unfortunately it isn’t as easy as just updating the listed packages. Any package that depends on an updated package really should be re-compiled and re-installed to guarantee that the entire system behaves as expected after the upgrade. Of course pacman can handle it:

% pacman -Rcn <all pkgs with updates>

This will list all packages that will be removed. After removing them all, they can all be re-installed. That is of course not quite as nice as it could be, since they will then all be marked as explicitly installed; it would be nicer to re-install only the “top-level packages”. This is one way to achieve that.

I did a bit of refactoring and put the Arch-related functions from the previous post in their own module, Arch. Then I added a function that takes a package and recursively follows Required By until a “top-level package” (i.e. a package that no other package requires) is reached:

getTopRequiredBy pkg = let
        tops = do
            first <- getRequiredBy pkg
            if null first
                then return [pkg]
                else liftM concat $ mapM getTopRequiredBy first
    in liftM nub tops

After that it’s straightforward to write up a little tool which offers some advice on what to do:

main = do
    pkgsToUpgrade <- getArgs
    pkgsToReinstall <- liftM (nub . concat) $ mapM Arch.getTopRequiredBy pkgsToUpgrade
    putStrLn $ "To remove     : pacman -Rnc " ++ unwords pkgsToUpgrade
    putStrLn $ "To re-install : yaourt -S " ++ unwords pkgsToReinstall

Using it on the packages I wanted to upgrade gave the following output:

% runhaskell PkgUpgrade.hs haskell-{configfile,haxml,json,missingh,safe,testpack,time}
To remove     : pacman -Rnc haskell-configfile haskell-haxml haskell-json haskell-missingh haskell-safe haskell-testpack haskell-time
To re-install : yaourt -S haskell-configfile haskell-haxml haskell-json haskell-hsh haskell-safe

Following that advice seemed to work just like I intended.

Wrapping IO, part 2

The previous post was a fairly silly example, unless of course it’s more useful than I realise :) However, here’s something that I can see a bit more use for: a monad that restricts reading and writing to two files, one to read from and one to write to.

Again, the first step is to create a data type:

newtype TwoFileIO a = TwoFileIO { execTwoFileIO :: (Handle, Handle) -> IO a }

This defines a type wrapping a function that takes a pair of handles (one for input and one for output) and returns an “IO action”. Turning this into a monad is straightforward (actually it’s similar to the Reader monad):

instance Monad TwoFileIO where
    return v = TwoFileIO $ \ _ -> return v
    (>>=) m f = let
            fInIO = execTwoFileIO . f
        in TwoFileIO $ \ hs ->
            execTwoFileIO m hs >>= \v -> fInIO v hs

To return a value we can simply drop the pair of handles and return the value in IO. Bind (>>=) only looks complicated: what happens is that the first argument is “executed” with the provided handles, then the second argument is passed the result and executed with the same pair of handles. Of course the handles aren’t actually known yet, so an anonymous function is created and wrapped in an instance of TwoFileIO. That’s it for the most complicated part.
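
A small aside: on GHC 7.10 and later, Monad has Functor and Applicative as superclasses, so those instances are needed as well. A minimal sketch, written in terms of the same wrapped function:

instance Functor TwoFileIO where
    fmap f m = TwoFileIO $ \ hs -> fmap f (execTwoFileIO m hs)

instance Applicative TwoFileIO where
    pure v = TwoFileIO $ \ _ -> pure v
    mf <*> mv = TwoFileIO $ \ hs -> do
        f <- execTwoFileIO mf hs
        v <- execTwoFileIO mv hs
        return (f v)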

In order to avoid having to manually open files and wire everything up I wrote the following convenience function:

runFileIO m iFn oFn = do
    iH <- openFile iFn ReadMode
    oH <- openFile oFn WriteMode
    res <- execTwoFileIO m (iH, oH)
    mapM_ hClose [iH, oH]
    return res

Yes, it does lack a bit in exception handling, but it’s good enough for now.
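
For a version where the handles are closed even when the wrapped action throws, something bracket-based could be used instead; a sketch of my own (runFileIO' is just a made-up name):

-- needs: import Control.Exception (bracket)
runFileIO' m iFn oFn =
    bracket (openFile iFn ReadMode) hClose $ \ iH ->
        bracket (openFile oFn WriteMode) hClose $ \ oH ->
            execTwoFileIO m (iH, oH)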

Then I can define the actions/functions that are available inside TwoFileIO. Reading and writing lines:

fioPutStrLn s = TwoFileIO $ \ (iH, oH) ->
    hPutStrLn oH s

fioGetLine = TwoFileIO $ \ (iH, oH) ->
    hGetLine iH

Note how it now becomes very hard to mix up the files and accidentally read from the output file or write to the input file.

As a little test function I used this one, which reads two lines and then writes them in the reversed order:

get1stN2ndPutLast = do
    first <- fioGetLine
    second <- fioGetLine
    fioPutStrLn second
    fioPutStrLn first

I can now test this using ghci:

> h <- openFile "testIn.txt" ReadMode
> hGetContents h
"line 0\nline 1\nline 2\n"
> runFileIO get1stN2ndPutLast "testIn.txt" "testOut.txt"
> h <- openFile "testOut.txt" ReadMode
> hGetContents h
"line 1\nline 0\n"

Wrapping IO, part 1

I’ve many times heard that Haskell can be used to prevent certain kinds of programmer mistakes. In a presentation on Darcs it was explained how GADTs (especially phantom types) are used in Darcs to make sure that operations on patches follow certain rules. Another way, and at least it sounds easier, is to limit the available functions by running code in some sort of container. This being Haskell, that container is often a monad. I’ve really never seen this presented1, so I thought I’d try to do it, and indeed it turns out to be very simple.

I started with a data type:

newtype HideIO a = HideIO { runHideIO :: IO a }

which I then made into a Monad in order to make it easy to work with:

instance Monad HideIO where
    return = HideIO . return

    (>>=) m f = HideIO $ runHideIO m >>= runHideIO . f

Then I can create the IO functions that are allowed in the HideIO monad:

hioPutStrLn = HideIO . putStrLn

In ghci I can then do the following:

> runHideIO $ hioPutStrLn "Hello, World!"
Hello, World!

But I can’t do much else.


  1. Most probably due to my weak searching-fu more than anything else.

Trying to work out iteratees

A few days ago I decided to explore the idea of using iteratee to do IO in Haskell. I read most of what Oleg has written on input processing using left-fold enumerators. Only a little wiser I took a look at the Iteratee IO package on Hackage. Unfortunately it still hadn’t quite sunk in. To be honest I couldn’t quite make heads or tails of it. Often just lining up the types properly will work, even if I don’t understand why ((This reminds me of a math professor I had at university who said something like: “When it feels like the pen understands more than your head, persevere! It means you are close to getting it.”)) and soon after I usually gain some sort of understanding. This strategy didn’t seem to work with this particular package though :(

Somewhat based on Don’s answer to my question on Stackoverflow.com I thought I’d try to work through an implementation of my own. I just really hope I’ve got it right :)

I must admit I used Iteratee for inspiration at times. However, I couldn’t copy it straight off since I decided to first implement iteratee without involving monads. Sure, it’s rather silly to do “input processing” in Haskell without involving monads, but I find that including monads often clouds the problem at hand. So, I left them out to begin with, and add them again once I feel I know what I’m doing. So, here goes nothing…

The basic idea is to process a stream of data, presented in chunks. Each chunk is handed to an iteratee by an enumerator. For each chunk the iteratee signals to the enumerator whether it’s done or requires more data. These are the types I cooked up for this.

data Stream c e
    = Eof
    | Chunk (c e)

data Stepper c e a
    = Done a (Stream c e)
    | NeedAnotherChunk (Iteratee c e a)

data Iteratee c e a = Iteratee
    { runIteratee :: (Stream c e) -> (Stepper c e a) }

type Enumerator c e a = Iteratee c e a -> Iteratee c e a

I found it rather useful to implement Show for the two first, but I’ll leave that out of this post since it’s a simple thing to do.
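
For reference, one possible pair of instances (my own guess at what they could look like; it matches the ghci output shown further down, and FlexibleContexts is needed for the Show (c e) constraints):

{-# LANGUAGE FlexibleContexts #-}

instance Show (c e) => Show (Stream c e) where
    show Eof = "<<Stream: Eof>>"
    show (Chunk xs) = "<<Stream: " ++ show xs ++ ">>"

instance (Show (c e), Show a) => Show (Stepper c e a) where
    show (Done a s) = "Stepper Done <<" ++ show a ++ ">> " ++ show s
    show (NeedAnotherChunk _) = "Stepper NeedAnotherChunk"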

I should probably point out that the container type for the stream (that’s the ‘c’ in Stream c e) is rather pointless in what I’ve done; it’s always going to be [] in this post. Keeping it in does provide some more similarity with the Iteratee on Hackage.

At this point I jumped to turning a list into an enumerator. The way I implemented it, the list is split up into chunks of 3 items, and each chunk is presented in order to the passed-in iteratee.

enumList l iter = loop grouped iter
    where
        grouped = groupList 3 l

        groupList n l = let
                (p, r) = splitAt n l
            in case p of
                [] -> []
                xs -> xs:groupList n r

The loop function is the main part. Ending when the chunks are all used up is the easy part:

        loop [] i = let
                s = runIteratee i Eof
            in case s of
                Done v str -> Iteratee $ \ _ -> s
                NeedAnotherChunk i -> i

It is arguably an error if the iteratee returns NeedAnotherChunk when passed an Eof stream, but for now I’ll leave it the way it is. Doing the recursive step is very similar:

        loop (x:xs) i = let
                s = runIteratee i (Chunk x)
            in case s of
                Done v str -> Iteratee $ \ _ -> s
                NeedAnotherChunk i' -> loop xs i'

Here it is worth noticing that the iteratee is expected to return any part of the chunk that wasn’t processed.

Next I coded up my first iteratee, a map over the stream:

iFMap f = let
        doAcc acc Eof = Done acc Eof
        doAcc acc (Chunk i) = NeedAnotherChunk $ Iteratee $ doAcc (acc `mappend` (fmap f i))
    in Iteratee $ doAcc mempty

Now I can run the iteratee over a list enumerator:

> runIteratee (enumList [1..9] (iFMap (*2))) Eof
Stepper Done <<[2,4,6,8,10,12,14,16,18]>> <<Stream: Eof>>

I found that a bit verbose, especially for interactive experimentation, so the following simplifies it a bit:

run iter = case (runIteratee iter) Eof of
    Done a _ -> a
    NeedAnotherChunk _ -> error "run: Iterator didn't finish on Eof"

As Oleg pointed out in his writings it turns out that Iteratee c e is a monad.

instance Monad (Iteratee c e) where
    return x = Iteratee $ \ s -> Done x s

The implementation of return is obvious; there really isn’t any option other than to encode it as a continuation that returns Done irrespective of what is offered, and passes along the rest of the stream no matter what’s passed in. Bind (>>=) is a bit more complicated:

    i >>= f = Iteratee $ \ s ->
        let
            c = runIteratee i s
        in case c of
            Done v str -> runIteratee (f v) str
            NeedAnotherChunk i' -> NeedAnotherChunk $ i' >>= f

My understanding is that the left iteratee should be stepped along until it returns Done; at that point the result is passed to the right-side function, which results in an iteratee. The rest of the stream is then passed on to the new iteratee.

I implemented two other iteratees, iDrop :: Int -> Iteratee [] e () and iTakeWhile :: (e -> Bool) -> Iteratee [] e [e], with the obvious implementations (a sketch of what they might look like follows the example below). This then allows me to write a little test like this:

iTest = do
    iDrop 2
    t <- iTakeWhile (< 5)
    a <- return 'c'
    m <- iFMap (+ 3)
    return (t, a, m)

Running it gives the expected result:

> run (enumList [1..9] iTest)
([3,4],'c',[8,9,10,11,12])
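
For completeness, here is a sketch of how iDrop and iTakeWhile could be implemented on top of the types above (my own guesses, not the code from the post, though they do reproduce the result shown):

iDrop :: Int -> Iteratee [] e ()
iDrop n = let
        step _ Eof = Done () Eof
        step k (Chunk xs)
            | k <= length xs = Done () (Chunk (drop k xs))
            | otherwise = NeedAnotherChunk $ Iteratee $ step (k - length xs)
    in Iteratee $ step n

iTakeWhile :: (e -> Bool) -> Iteratee [] e [e]
iTakeWhile p = let
        step acc Eof = Done acc Eof
        step acc (Chunk xs) = let
                (ok, rest) = span p xs
            in if null rest
                then NeedAnotherChunk $ Iteratee $ step (acc ++ ok)
                else Done (acc ++ ok) (Chunk rest)
    in Iteratee $ step []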

That’s pretty much it. Not that much to it. At least not as long as I’ve actually understood iteratees.

Making a choice from a list in Haskell, Vty (part 5, the last one)

The time has come for the final installment of this series of “discussions of a refactoring”. These are the earlier installments. This is where I finally add the ability to collapse a list item. This is a rather terse description of the changes, since I feel all of them are fairly obvious, and hence require no lengthy explanation.

First the Option type has to be changed to keep track of whether an item is collapsed or not:

data Option = Option
    { optionRange::(Int, Int)
    , optionCollapsed::Bool
    , optionS1::String
    , optionS2::String
    } deriving (Show)

Next the rendering of an item has to be changed, so that collapsed items really appear collapsed. I thought displaying a collapsed item as its first line, with “…” added to the end would be acceptable for a first version:

instance Pretty Option where
    pretty (Option _ False s1 s2) = string s1 <> line <> indent 2 (string s2)
    pretty (Option _ True s1 _) = string s1 <> string "..."

Later on I’ll need to update the range of an item. For the forgetful, the range of an item is the starting and ending line. Obviously the range changes when an item is collapsed:

optionUpdateRange o = let
        (b, _) = optionRange o
        l = length $ lines $ show $ pretty o
    in o { optionRange = (b, b + l - 1) }

The implementation of optionIsInRange has to change due to adding the optionCollapsed field. It’ll also be useful to have a few functions for manipulating the collapsed state of an item:

optionIsInRange (Option (b, e) _ _ _) i = b <= i && i <= e

optionIsCollapsed (Option _ c _ _) = c
optionToggleCollapse o = o { optionCollapsed = not (optionCollapsed o) }
optionCollapse o = o { optionCollapsed = True }
optionExpand o = o { optionCollapsed = False }

One thing that I didn’t think about until after doing some manual testing was that moving the cursor up in the list should always put the cursor on the line above, even when moving from one item to the previous. This was a bug in the previous version :-)

ozPreviousLine o@(OptionZipper 0 _ _) = o
ozPreviousLine o = let
        c = fromJust $ ozCursor o
        i = ozIdx o
    in if optionIsInRange c (i - 1)
        then o { ozIdx = i - 1 }
        else ozJumpToCursorBottom $ ozLeft o

I also have to change ozCursorMod due to adding the new field:

ozCursorMod f o@(OptionZipper _ _ (r:rs)) = let
        _r = f r
    in o { ozRS = (_r:rs) }
ozCursorMod _ o = o

It turns out to be useful to be able to jump to the top and bottom of an item (there’s already an example of the latter above):

ozJumpToCursorTop o@(OptionZipper _ _ (r:rs)) = let
        (newIdx, _) = optionRange r
    in o { ozIdx = newIdx }

ozJumpToCursorBottom o@(OptionZipper _ _ (r:rs)) = let
        (_, newIdx) = optionRange r
    in o { ozIdx = newIdx }

Creating the list of items needs a slight modification as well:

options = ozFromListWithMod (optionSetRange 0) [Option (0, 0) False ((show i) ++ " Foo") "Bar" | i <- [0..2]]

The last change adds the actual collapsing of an item in the UI controller code:

_getChoice vt opts sx sy =
    let
        _converted_opts = lines $ show $ pretty opts
        _idx = ozIdx opts
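        -- _calcTop: pick the first visible line so that the cursor ends up roughly
        -- in the middle of the window, clamped so we never scroll past the start
        -- or the end of the list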
        _calcTop winHeight listLength idx = max 0 ((min listLength ((max 0 (idx - winHeight `div` 2)) + winHeight)) - winHeight)
        _top = _calcTop sy (length _converted_opts) _idx
        _visible_opts = take sy (drop _top _converted_opts)
    in do
        update vt (render _visible_opts (_idx - _top) sx)
        k <- getEvent vt
        case k of
            EvKey (KASCII ' ') [] -> let
                    newOpts = ozJumpToCursorTop $ ozCursorMod (optionUpdateRange . optionToggleCollapse) opts
                in _getChoice vt newOpts sx sy
            EvKey KDown [] -> _getChoice vt (ozNextLine opts) sx sy
            EvKey KUp [] -> _getChoice vt (ozPreviousLine opts) sx sy
            EvKey KEsc [] -> shutdown vt >> return Nothing
            EvKey KEnter [] -> shutdown vt >> return (Just $ (_idx, ozCursor opts))
            EvResize nx ny -> _getChoice vt opts nx ny
            _ -> _getChoice vt opts sx sy

That’s it.

Fork/exec in Haskell

Here’s some simple code I put together. I’m mostly posting it so I won’t have any problems finding it in the future.

module Main where

import Control.Monad
import System.Exit
import System.Posix.IO
import System.Posix.Process

executeChild = do
    mapM_ closeFd [stdInput, stdOutput, stdError]
    devnull <- openFd "/dev/null" ReadWrite Nothing defaultFileFlags
    dup devnull; dup devnull
    executeFile "./Child" False [] Nothing

main = do
    child <- forkProcess executeChild
    putStrLn "ForkExec: main - forked, going to wait"
    s <- getProcessStatus True True child
    case s of
        Nothing -> -- this shouldn't happen, ever
            print s >>  exitFailure
        Just s -> do
            print s
            case s of
                Exited _ -> putStrLn "Child exited properly, though possibly unsuccessfully"
                Terminated _ -> putStrLn "Terminated!"
                Stopped _ -> putStrLn "Stopped (only SIGSTOP?)"
            exitSuccess
    exitFailure

It’d be really nice to be able to, after the fork, close all open file descriptors in the child. But how can I find all the open file descriptors in a process? Ideally it should be fairly portable, though portability to major Unix/Linux systems is enough for me.
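
One Linux-specific answer I’m aware of (not from the original post) is to read /proc/self/fd and close everything that shouldn’t be kept; a rough sketch:

import Control.Exception (SomeException, try)
import Control.Monad (forM_, void)
import System.Directory (getDirectoryContents)
import System.Posix.IO (closeFd)
import System.Posix.Types (Fd(..))

-- Close every open file descriptor except the ones in `keep'. Linux-only,
-- since it relies on /proc/self/fd. The entries "." and ".." fail to parse
-- and are skipped; `try' covers descriptors that are already gone by the
-- time we get to them.
closeAllFdsExcept :: [Fd] -> IO ()
closeAllFdsExcept keep = do
    entries <- getDirectoryContents "/proc/self/fd"
    let fds = [Fd (fromIntegral n) | e <- entries, [(n, "")] <- [reads e :: [(Int, String)]]]
    forM_ fds $ \ fd ->
        if fd `elem` keep
            then return ()
            else void (try (closeFd fd) :: IO (Either SomeException ()))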

JSON in Haskell

The other day I wanted to experiment a bit with the JSON interface to AUR. Of course my first stop was at HackageDB to look for a Haskell package for parsing JSON. There are several of them, but only one that seemed suitable for some quick experimentation, especially since I wanted to avoid pre-defining data types for the objects in the JSON interface. That failed, however, and I ended up switching to Python. It did bother me though, and later on, when I had some more time, I decided to have another look at json. I was also helped by Don’s recent work on wrapping up the AUR JSON interface in Haskell.

After some searching online I found a reasonably good example1:

{ "ID": "SGML"
, "SortAs": "SGML"
, "GlossDef":
    { "para": "A meta-markup language, used to create markup languages such as DocBook."
    , "GlossSeeAlso": ["GML", "XML"]
    }
}

As a slight aside, the absolutely easiest way to add JSON to your program is to derive Data (and by implication Typeable too). This is the way I might have represented the data above in Haskell2:

data GlossDef = GlossDef
    { glossDefPara :: String
    , glossDefSeeAlso :: [String]
    } deriving (Eq, Show, Typeable, Data) 

data GlossEntry = GlossEntry
    { glossEntryId :: String
    , glossEntrySortAs :: String
    , glossEntryGlossDef :: GlossDef
    } deriving (Eq, Show, Typeable, Data)

After that it’s as easy as using Text.JSON.Generic.toJSON followed by Text.JSON.encode:

> let gd = GlossDef "foo" ["bar", "baz"]
> let ge = GlossEntry "aa" "bb" gd
> putStrLn $ encode $ toJSON ge
{"glossEntryId":"aa","glossEntrySortAs":"bb","glossEntryGlossDef":{"glossDefPara":"foo","glossDefSeeAlso":["bar","baz"]}}

As can be seen the “names” of the members are derived from the field names in the datatypes. Great for when you are designing new JSON objects, not when you are writing code to parse an already existing object. For that there is another, more verbose way to do it.

Start with the same data types, but without deriving Typeable and Data:

data GlossDef = GlossDef
    { glossDefPara :: String
    , glossDefSeeAlso :: [String]
    } deriving (Eq, Show)

data GlossEntry = GlossEntry
    { glossEntryId :: String
    , glossEntrySortAs :: String
    , glossEntryGlossDef :: GlossDef
    } deriving (Eq, Show)

Then you have to implement Text.JSON.JSON. Only two of the four functions must be implemented, showJSON and readJSON. Starting with GlossDef:

instance JSON GlossDef where
    showJSON gd = makeObj
        [ ("para", showJSON $ glossDefPara gd)
        , ("GlossSeeAlso", showJSON $ glossDefSeeAlso gd)
        ]

Basically this part defers to the already supplied implementations for the fields’ types. The same approach works for readJSON too:

    readJSON (JSObject obj) = let
            jsonObjAssoc = fromJSObject obj
        in do
            para <- mLookup "para" jsonObjAssoc >>= readJSON
            seeAlso <- mLookup "GlossSeeAlso" jsonObjAssoc >>= readJSON
            return $ GlossDef
                { glossDefPara = para
                , glossDefSeeAlso = seeAlso
                }

    readJSON _ = fail ""

The function mLookup is a wrapper around lookup that makes it a bit nicer to work with in monads other than Maybe:

mLookup a as = maybe (fail $ "No such element: " ++ a) return (lookup a as)

(The choice to include the key in the string passed to fail limits the usefulness somewhat in the general case, but for this example it doesn’t make any difference.)

Implementing the interface for GlossEntry is analogous:

instance JSON GlossEntry where
    showJSON ge = makeObj
        [ ("ID", showJSON $ glossEntryId ge)
        , ("SortAs", showJSON $ glossEntrySortAs ge)
        , ("GlossDef", showJSON $ glossEntryGlossDef ge)
        ]

    readJSON (JSObject obj) = let
            jsonObjAssoc = fromJSObject obj
        in do
            id <- mLookup "ID" jsonObjAssoc >>= readJSON
            sortAs <- mLookup "SortAs" jsonObjAssoc >>= readJSON
            gd <- mLookup "GlossDef" jsonObjAssoc >>= readJSON
            return $ GlossEntry
                { glossEntryId = id
                , glossEntrySortAs = sortAs
                , glossEntryGlossDef = gd
                }

With the JSON object mentioned at the top in the file test.json the following is possible:

> f <- readFile "test.json"
> let (Ok j) = decode f :: Result GlossEntry
> putStrLn $ encode j
{"ID":"SGML","SortAs":"SGML","GlossDef":{"para":"A meta-markup language, used to create markup languages such as DocBook.","GlossSeeAlso":["GML","XML"]}}

I have a feeling the implementation of readJSON could be simplified by using an applicative style, but I leave that as an exercise for the reader :-)
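
For the curious, one possible applicative rendering of readJSON for GlossDef could look like this (a sketch only; it assumes an Applicative instance for Result, which recent versions of the json package provide, requires Control.Applicative on older GHCs, and field is a little local helper I made up):

    readJSON (JSObject obj) = let
            jsonObjAssoc = fromJSObject obj
            field k = mLookup k jsonObjAssoc >>= readJSON
        in GlossDef <$> field "para" <*> field "GlossSeeAlso"

    readJSON _ = fail ""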


  1. It’s a modified version of what I found here.

  2. The file should include {-# LANGUAGE DeriveDataTypeable #-} and both Data.Typeable and Data.Data must be imported.

Updating GHC on Arch

Arch is somewhat of a hybrid distribution in the sense that if you have any sort of ‘peculiar’ needs then you are likely to have to build packages from source. As expected developing in Haskell is a “peculiar need” :-)

After every upgrade of GHC I find myself in the situation where the system (pacman) and GHC have different views of what packages are available. What is needed then is somehow finding out the difference, and this is how I found that difference after the recent 6.10.3 -> 6.10.4 upgrade of GHC. Once I know what packages are missing from GHC’s view of the world I can use pacman to first remove and then yaourt to rebuild the packages.

First I noted that the old package.conf file wasn’t deleted during the upgrade; apparently pacman noted the changes that installing packages had resulted in and saved the file as package.conf.pacsave. Finding the names of all the ‘missing’ packages was then as simple as loading both /usr/lib/ghc-6.10.3/package.conf.pacsave and /usr/lib/ghc-6.10.4/package.conf, filtering out the package names, and taking the difference:

printMissingPackages = let
        pkgNameStr = PackageName . display . packageName
    in do
        oldPackConf <- readFile "/usr/lib/ghc-6.10.3/package.conf.pacsave"
        curPackConf <- readFile "/usr/lib/ghc-6.10.4/package.conf"
        let oldPacks = (read oldPackConf) :: [InstalledPackageInfo_ String]
        let curPacks = (read curPackConf) :: [InstalledPackageInfo_ String]
        let gonePacks = (map pkgNameStr oldPacks) \\ (map pkgNameStr curPacks)
        putStrLn "Missing packages:"
        mapM_ (putStrLn . display) gonePacks

That isn’t the most useful output however, so I decided to modify it to print out the names of the Arch packages that need re-compilation. The following function generates the name of the .hi file of the first exposed module in the Haskell package, and then uses pacman to look up the owner of that file:

ghcPkg2ArchPkg pkg = let
        hsFileLoc = head $ libraryDirs pkg
        hsFile = map (\ c -> if c == '.' then '/' else c) $ head $ exposedModules pkg
        hsFullFile = hsFileLoc </> hsFile <.> "hi"
    in do
        exists <- doesFileExist hsFullFile
        if exists
            then liftM Just $ archOwnerOfFile hsFullFile
            else return Nothing

archOwnerOfFile fn = let
        pkgFromPacmanOutput = head . tail . reverse . words
    in do
        res <- rawSystemStdout silent "/usr/bin/pacman" ["-Qo", fn]
        return $ pkgFromPacmanOutput res

Now I can find the list of Arch packages that aren’t known to the new version of GHC by mapping ghcPkg2ArchPkg over gonePkgs. In other words, that is the list of packages that need to be removed, but that can be different from the list of packages that need to be rebuilt with yaourt (basically I only want to tell it to build and install the ‘top-level’ packages, i.e. packages that aren’t dependencies of any other packages). It’s of course possible to build that second list from the first one, with the help of pacman.

archGetRequiredBy pkg = let
        extractPkgs pkgDesc = let
                deps = (drop 3 . words . head . filter (isPrefixOf "Required By") . lines) pkgDesc
            in
                if deps == ["None"]
                    then []
                    else deps
    in do
        res <- rawSystemStdout silent "/usr/bin/pacman" ["-Qi", pkg]
        return $ extractPkgs res

Now I can modify printMissingPackages to print some more useful information. This is the full function:

printMissingPackages = let
        pkgNameStr = PackageName . display . packageName
    in do
        oldPackConf <- readFile "/usr/lib/ghc-6.10.3/package.conf.pacsave"
        curPackConf <- readFile "/usr/lib/ghc-6.10.4/package.conf"
        let oldPacks = (read oldPackConf) :: [InstalledPackageInfo_ String]
        let curPacks = (read curPackConf) :: [InstalledPackageInfo_ String]
        let gonePacks = (map pkgNameStr oldPacks) \\ (map pkgNameStr curPacks)
        putStrLn "Missing packages:"
        mapM_ (putStrLn . display) gonePacks
        let gonePkgs = filter (\ p -> pkgNameStr p `elem` gonePacks) oldPacks
        archPkgs <- liftM catMaybes $ mapM ghcPkg2ArchPkg gonePkgs
        putStrLn "Packages to remove:"
        mapM_ putStrLn archPkgs
        archTopPkgs <- filterM (liftM ([] ==) . archGetRequiredBy) archPkgs
        putStrLn "\nPackages to install:"
        mapM_ putStrLn archTopPkgs

On my system it produced the following output:

Missing packages:
terminfo
vty
wl-pprint

Packages to remove:
haskell-terminfo
haskell-vty
haskell-wl-pprint

Packages to install:
haskell-vty
haskell-wl-pprint

And after a quick pacman -Rn ... and a not so quick yaourt -S ... I reran it and the output was

Packages to remove:

Packages to install:

Exactly as expected.

Making a choice from a list in Haskell, Vty (part 4)

After part 3 in this series, which might have been the longest post I’ve ever put on this blog, follows a much shorter post. In fact it’s so short it’s rather silly.

In this post I’ll modify the Option type to render into multiple lines; two in fact (it’s easy to see that it would work with more lines).

So, to start off, I add a second string to Option:

data Option = Option { optionRange::(Int, Int), optionS1::String, optionS2::String }
    deriving (Show)

Next the definition for Pretty is changed to render an Option on two lines:

instance Pretty Option where
    pretty (Option _ s1 s2) = string s1 <> line <> indent 2 (string s2)

Due to the change to Option I also need to modify optionIsInRange:

optionIsInRange (Option (b, e) _ _) i = b <= i && i <= e

Finally the options need to be modified as well:

options = ozFromListWithMod (optionSetRange 0) [Option (0, 0) ((show i) ++ " Foo") "Bar" | i <- [0..2]]

That’s all there is to it. Short and sweet.

Making a choice from a list in Haskell, Vty (part 3)

This is the third part, and it’s likely to be the longest one in the series. The three previous parts have been rather short, but now it’s time for a longer post because in this one I completely change the representation of the options that are rendered.

Instead of using a list and an integer I’ll use what is basically a zipper (with some extra fields for bookkeeping). At the same time I also add a new field to the Option type to keep track of how many lines the option renders to. At the moment it will always be one line, but the next part will actually make use of it. (Yes, that part probably should have been kept in a separate part, but this happens to be how I wrote the code.)

First some changes to the Option type and its implementation of Pretty:

data Option = Option { optionRange::(Int, Int), optionS1::String }
    deriving (Show)

instance Pretty Option where
    pretty (Option _ s) = string s

Then two functions related to the “range” of an Option. The first updates it based on a new start line (a new beginning), the second checks whether a line falls within the range of an Option:

optionSetRange nb o = let
        l = length $ lines $ show $ pretty o
    in o { optionRange = (nb, nb + l - 1) }

optionIsInRange (Option (b, e) _) i = b <= i && i <= e

Now it’s time to introduce the zipper that replaces the list of options. The basic idea is that there are two parts to a list, the left side (ozLS) and the right side (ozRS), and a current item. In this list zipper the current item is the first item on the right side:

data OptionZipper = OptionZipper { ozIdx::Int, ozLS::[Option], ozRS::[Option] }
    deriving (Show)

Making the zipper an instance of Pretty is as simple as this:

instance Pretty OptionZipper where
    pretty = vcat . map pretty . ozToList

It’s useful to be able to both convert to and from lists (as seen just above in the Pretty instance):

ozFromList l = OptionZipper 0 [] l
ozFromListWithMod f = ozCursorMod f . ozFromList

ozToList (OptionZipper _ l r) = reverse l ++ r

The function for getting the current item is obvious. At the same time I’ll define a function that applies a function to the item at the cursor.

ozCursor (OptionZipper _ _ (r:_)) = Just r
ozCursor _ = Nothing

ozCursorMod f o@(OptionZipper _ _ (r:rs)) = o { ozRS = (f r:rs) }
ozCursorMod _ o = o

Usually a list zipper has functions to move the cursor, i.e. move items between the left and right sides. In this zipper there is some extra bookkeeping that has to be done to make sure that the index is correct and that the current item has a correct range:

ozLeft (OptionZipper _ (l:ls) rs) = let
        (newIdx, _) = optionRange l
    in OptionZipper newIdx ls (l:rs)
ozLeft o = o

ozRight (OptionZipper _ ls (r:rs)) = let
        (_, pe) = optionRange r
    in ozCursorMod (optionSetRange $ pe + 1) $ OptionZipper (pe + 1) (r:ls) rs
ozRight o = o

That’s all good and well, but what I really need is to be able to navigate based on lines. Expressing that using ozLeft and ozRight is fairly straightforward. Let’s start with shifting to the next line, ozNextLine; it has two cases, one general case and one for when the cursor points to the last item:

ozNextLine o@(OptionZipper i _ [c]) =
    if optionIsInRange c (i + 1)
        then o { ozIdx = i + 1 }
        else o
ozNextLine o = let
        c = fromJust $ ozCursor o
        i = ozIdx o
    in if optionIsInRange c (i + 1)
        then o { ozIdx = i + 1 }
        else ozRight o

Anyone who pays attention will realise that this definition of ozNextLine isn’t complete. The zipper is capable of pointing to the empty spot after the last item (when ozRS is [], as would be the case for an empty list turned into a zipper). For this occasion that is all right, but this would need some attention when using this in a proper program.

The definition of ozPreviousLine also has two cases:

ozPreviousLine o@(OptionZipper 0 _ _) = o
ozPreviousLine o = let
        c = fromJust $ ozCursor o
        i = ozIdx o
    in if optionIsInRange c (i - 1)
        then o { ozIdx = i - 1 }
        else ozLeft o

Yes, this function also has some assumptions built into it; just like for ozNextLine it’s enough to simply be aware of that for this exercise.

That’s it for the zipper; now it’s possible to create the options:

options = ozFromListWithMod (optionSetRange 0) [Option (0, 0) ((show i) ++ " Foo") | i <- [0..99]]

The introduction of the zipper requires large changes to both getChoice and _getChoice. The changes are however very straightforward and in my opinion they make both functions easier to read and understand. I’ll simply copy in the definitions without any comments in the hope that, thanks to using a zipper, the code is self-explanatory :-) It might be worth pointing out though that render is still passed a list of strings to render, so it requires no changes at this point.

getChoice vt opts = do
    (sx, sy) <- getSize vt
    _getChoice vt opts sx sy


_getChoice vt opts sx sy =
    let
        _converted_opts = lines $ show $ pretty opts
        _idx = ozIdx opts
        _calcTop winHeight listLength idx = max 0 ((min listLength ((max 0 (idx - winHeight `div` 2)) + winHeight)) - winHeight)
        _top = _calcTop sy (length _converted_opts) _idx
        _visible_opts = take sy (drop _top _converted_opts)
    in do
        update vt (render _visible_opts (_idx - _top) sx)
        k <- getEvent vt
        case k of
            EvKey KDown [] -> _getChoice vt (ozNextLine opts) sx sy
            EvKey KUp [] -> _getChoice vt (ozPreviousLine opts) sx sy
            EvKey KEsc [] -> shutdown vt >> return Nothing
            EvKey KEnter [] -> shutdown vt >> return (Just $ (_idx, ozCursor opts))
            EvResize nx ny -> _getChoice vt opts nx ny
            _ -> _getChoice vt opts sx sy

Ping server in Haskell (not that kind of ping, and rather silly)

Yesterday I needed to do some tests involving tunneling of network connections. Rather than firing up the full client-server setup that I want to tunnel I thought I’d use something simple to test with first. Instead of looking online for a simple server to use, or hack one up using netcat, or even hack one in Python I decided to hack one in Haskell:

module Main where

import Control.Monad
import System.Environment(getArgs)
import Network
import System.IO

main = withSocketsDo $ do
    [port_str] <- getArgs
    let port = fromIntegral (read port_str :: Int)
    serv_sock <- listenOn (PortNumber port)
    forever $ do
        (handle, host, port) <- accept serv_sock
        cmd <- hGetLine handle
        when (cmd == "Ping") $ hPutStr handle "Pong"
        hClose handle
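
For completeness, here’s a minimal client sketch for poking the server. It uses the same old Network API as the server, and the host and port below are of course just examples:

module Main where

import Network
import System.IO

main = withSocketsDo $ do
    h <- connectTo "localhost" (PortNumber 2233)
    -- the server reads a full line, so send a newline and flush
    hPutStrLn h "Ping"
    hFlush h
    -- the server closes the connection after answering, so just read it all
    answer <- hGetContents h
    putStrLn answer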

XML prettifier in Haskell

I don’t know how many times I’ve gone looking for one of these but my search-fu is weak and I always give up, instead resorting to manual editing in Vim (no, I hardly ever need the entire file to be pretty, only one or two tags that I’m interested in). Anyway, here’s a quick hack in Haskell, relying on xml for the heavy lifting:

#! /usr/bin/env runhaskell

module Main where

import Control.Monad
import System.Environment
import Text.XML.Light.Input
import Text.XML.Light.Output

main = do
    fn <- liftM (!! 0) $ getArgs
    xml_contents <- readFile fn
    let (Just doc) = parseXMLDoc xml_contents
    writeFile ("pretty-" ++ fn) (ppTopElement doc)

Making a choice from a list in Haskell, Vty (part 2)

Following on from the previous part is another baby step. This just changes the options from a list of strings to a list of objects (the only requirement being that they implement Pretty):

data Option = Option { optionValue::String }
    deriving (Show)

instance Pretty Option where
    pretty (Option s) = string s

After this it’s an easy step to replace the list of strings with a list of Option:

options = [Option ((show i) ++ " Foo") | i <- [0..99]]

That’s it! Yes, yet another ridiculously short post, but I promise the next one will be considerably longer.

Making a choice from a list in Haskell, Vty (part 1)

After posting the zeroth part of this series I realised I hadn’t said anything about the final goal of this exercise. The first post contained code for choosing one one-line string (String) out of a list of one-line strings ([String]). What I really want is the ability to choose one item out of a list of items, where each item may render to multiple lines. It would also be really cool if an item could be collapsed and expanded in the rendering. This is the first step in my journey towards these loosely specified requirements.

Rendering items into strings sounds like pretty-printing to me, so I played around a little with a few pretty-printing libraries. Finally I settled on the Wadler/Leijen Pretty Printer (Text.PrettyPrint.Leijen). I didn’t really have any strong reason for choosing it, beyond that it comes with its own type class whereas the pretty-printing library that ships with GHC (Text.PrettyPrint.HughesPJ) doesn’t (though there is a package on HackageDB with a class for it).
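
Just to illustrate where this is heading, an item whose rendering spans more than one line could look something like this (Item and its fields are made up purely for the example):

import Text.PrettyPrint.Leijen

data Item = Item { itemTitle :: String, itemBody :: String }

instance Pretty Item where
    pretty (Item t b) = vcat [text t, indent 4 (string b)]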

I did the smallest change I could think of to add pretty-printing. First the module needs to be imported of course:

import Text.PrettyPrint.Leijen

Then I added a function to turn a list of items into a document (Doc) where each item is pretty-printed on its own line:

myPrettyList :: Pretty a => [a] -> Doc
myPrettyList = vcat . map pretty

I then decided that _getChoice should be left unchanged and instead modified getChoice to turn the list of items into a list of strings:

getChoice vt opts = let
        _converted_opts = myPrettyList opts
    in do
        (sx, sy) <- getSize vt
        _getChoice vt (lines $ show _converted_opts) 0 sx sy

That’s it. The first step, albeit a small one.

Setting up Epiphany to play with Seed extensions

Since the Python extensions to Epiphany have been removed from the repository I thought it was high time to start playing with what seems to be the replacement to Python extensions: Seed extensions. The first step is of course to get a version of Epiphany that supports Seed extensions. After a few emails on the mailing list I’ve come to a recipe (I’ve done this twice now on different machines so I’m fairly confident it works). I should probably preface this by saying that I run an up-to-date Arch system, if you run something else you might need to do a bit more, or less if you’re lucky :-)

  1. Make sure the following packages are installed (you might need a few more packages depending on what desktop environment you use; those were the packages I needed to add to my machine where I run Gnome and do regular non-Gnome development): libsoup, libwebkit, gnome-common, intltool, libffi
  2. Clone the following Git repositories from git.gnome.org: epiphany-extensions, epiphany, seed, gobject-introspection, gnome-js-common, gir-repository
  3. Decide on a $prefix, i.e. where you want it all installed (I use ~/opt/gnome-trunk). Then export the following environment variables:

    export PATH=$prefix/bin:$PATH
    export PKG_CONFIG_PATH=$prefix/lib/pkgconfig
  4. Then configure, build and install everything. Use the autogen.sh script to create the configuration, and make sure to pass it --prefix=$prefix each time. Some of the modules need extra arguments as well. This is the order and extra arguments I used:

    1. gnome-js-common (--disable-seed --disable-gjs)
    2. seed
    3. gnome-js-common (--disable-gjs)
    4. gobject-introspection
    5. gir-repository
    6. epiphany (--enable-introspection --enable-seed)
    7. epiphany-extensions

After that you can put your extensions in ~/.gnome2/epiphany/extensions/. I have two instances of Epiphany installed, a stable from my distro, and the dev version I built myself. I haven’t managed to run them both side by side, but beyond that there seems to be no bad interaction between them.

Dataenc finally making it into Debian

Erik de Castro Lopo is having a lot more success with getting dataenc into Debian than I ever had.

Making a choice from a list in Haskell, Vty (part 0)

I haven’t had much free time lately, which means I haven’t written much non-work code. The only exception is some experiments with a small piece of Haskell code using the Vty module. Many moons ago I wrote a small piece of code that lets the user choose options from a list in a terminal. Somewhat similar to what you get using dialog --menu ..., but of course a lot more limited and less good looking.

Anyway, over the last few weeks I’ve slowly expanded it in a direction that would be useful if I ever get around to work on yet another of those projects that so far only exist in my head :-)

I’ve kept the transformations in a stack of patches using quilt, and I thought I’d write a little about them. Not because I think they are extremely useful or even good in any way, but more because I really need to get back to writing some blog posts ;-)

This is the zeroth post containing the version I put together when I first came across Vty. It is an executable program so it starts with the familiar

module Main where

Next comes a few modules that have to be imported:

import Data.Maybe
import Graphics.Vty
import qualified Data.ByteString.Char8 as B

The options are, in this version, represented as a list of strings. For now it’s enough to have a nonsensical list of unique strings.

options = [ (show i) ++ " Foo" | i <- [0..59]]

The main function is as small as possible, two lines, the first creating an instance of Vty and the second getting the choice and feeding it into print.

main = do
    vt <- mkVty
    getChoice vt options >>= print

Of course one would think that getChoice would be the meat of the code, but it is also short. After getting the size of the terminal it calls _getChoice, which is the meat of the code. The reason for this split is the handling of resize events.

getChoice vt opts = do
    (sx, sy) <- getSize vt
    _getChoice vt opts 0 sx sy

The main part of _getChoice is straightforward: first update the terminal, then wait for an event, and finally handle the event. Unless the user wants to exit (pressing enter chooses an item, pressing escape exits without a choice) a recursive call is made to _getChoice with slightly modified arguments.

Probably the most complicated part is the calculation of the top of the list of visible items. The idea is that if the list has more items than there are lines in the terminal then the cursor moves down until the middle line, once there any down movement will result in the list scrolling up. This continues until the end of the list is visible, at that point the cursor moves down towards the last line in the terminal. I doubt that explanation makes sense, hopefully it’ll be clear to anyone who bothers running the code.

_getChoice vt opts idx sx sy =
    let
        _calcTop winHeight listLength idx = max 0 ((min listLength ((max 0 (idx - winHeight `div` 2)) + winHeight)) - winHeight)
        _top = _calcTop sy (length opts) idx
        _visible_opts = take sy (drop _top opts)
    in do
        update vt (render _visible_opts (idx - _top) sx)
        k <- getEvent vt
        case k of
            EvKey KDown [] -> _getChoice vt opts (min (length opts - 1) (idx + 1)) sx sy
            EvKey KUp [] -> _getChoice vt opts (max 0 (idx - 1)) sx sy
            EvKey KEsc [] -> shutdown vt >> return Nothing
            EvKey KEnter [] -> shutdown vt >> return (Just $ (idx, opts !! idx))
            EvResize nx ny -> _getChoice vt opts idx nx ny
            _ -> _getChoice vt opts idx sx sy
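
To make the behaviour of _calcTop a bit more concrete, with a 10-line window (winHeight) over a 100-item list it works out like this:

_calcTop 10 100 0  == 0    -- near the top: no scrolling, the cursor simply moves down
_calcTop 10 100 7  == 2    -- the cursor has reached the middle line and stays there
_calcTop 10 100 50 == 45   -- still pinned to the middle, the list scrolls instead
_calcTop 10 100 99 == 90   -- the end is visible, the cursor moves towards the last line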

The final piece is the code that renders the list. The items of the list are zipped together with a list of integers. Each such tuple is then rendered into a line,1 where the line of the cursor is highlighted. The resulting list of rendered lines is then folded into a full image.

render opts idx sx = pic {
    pImage = foldr1 (<->) $ map _render1 $ zip [0..] opts
    }
    where
        _render1 (i, o) = renderHFill attr ' ' 5 <|> renderBS (_attr i) (B.pack o) <|> renderHFill attr ' ' (sx - 5 - length o)
        _attr i = if i /= idx
            then attr
            else setRV attr

That’s it, that’s the starting point. It’s also likely to be the longest post in this planned series. :-)


  1. The reason for the line rendering looking so complicated is that Vty requires each line to be of equal length.

Using msmtp with darcs

To get darcs to use msmtp I put the following in my ~/.darcs/defaults:

send sendmail-command /usr/bin/msmtp %t %<

Using msmtp with Mercurial

After configuring msmtp I put the following in my ~/.hgrc:

[extensions]
hgext.patchbomb =

[email]
from = my.email@my.isp.com
method = /usr/bin/msmtp

Using msmtp for a desktop system

I read all my email via IMAP, and send all email via external SMTP servers, but at times it’s really useful to be able to run something like /usr/sbin/sendmail to send an email. Many tools depend on it, for me I want reports on cron jobs and I like using darcs send straight to an email address. For this a full-fledged SMTP server like exim/postfix/sendmail clearly is overkill. First I had a quick look at ssmtp but it only allows for a system-wide configuration file. Ideally I’d like “system-mails” (like cron) to be sent via my ISP while more personal emails should be sent via my Google account. After a suggestion on the Arch mailing list I checked out msmtp and it was exactly what I was looking for.

I created /etc/msmtprc with the following contents:

defaults
logfile ~/.msmtp.log
    
# MyISP
account myisp
host smtp.myisp.com
from my.email@myisp.com

account default : myisp

I then found out that dcron is severely limited; it doesn’t honour MAILTO in crontabs and it’s not possible to use anything but /usr/sbin/sendmail to send emails. The lack of support for MAILTO is the deal breaker in this case.

Rather than create a symbolic link I modified fcron’s configuration file, /etc/fcron/fcron.conf, to use /usr/bin/msmtp:

sendmail = /usr/bin/msmtp

I also made sure to stick MAILTO=my.email@myisp.com in the system crontab, using fcrontab -u systab -e.

Then I created the file ~/.msmtprc:

defaults
logfile ~/.msmtp.log

# MyISP
account myisp
host smtp.myisp.com
from my.email@myisp.com

# google
account google
host smtp.gmail.com
from user@googlemail.com
tls on
tls_certcheck off
auth on
user user@googlemail.com
password MySecretPassword

account default : google

Finally I put MAILTO=user@googlemail.com in my user’s crontab.

Reply to Randal on dynamic type systems

At first I wrote this as a comment on my original post, but it grew a bit too long so I decided to post it like this instead.

@Randal, “Static typing is still effectively a syntax check, not a semantic check though.” Now, I think we might have a different understanding of the word “syntax”. I can guess at yours, but my understanding mirrors what’s on Foldoc, that is, syntax determines what is a valid string in a program; in this case it determines where a type declaration/definition/statement goes, but not what type declaration/definition/statement is semantically valid in a particular position in a string. That “what” is very much part of the semantics in my mind, it tells the world (me, other users of my library, the compiler) a part of the intention of a function, it tells how something may be used. I gather you are mostly familiar with type systems like Java’s, so which is clearer to you, this declaration:

int add(int i, int j);

or this declaration:

int add(i, j);

In the former the intent of the function is obvious, it adds integers. In the latter it isn’t, does it handle complex numbers? Does it handle matrices? In dynamic languages you’d have to document it somewhere, but there is no consistency check between code and documentation (maybe there are external tools for that though, but why not use the compiler to check?). You would also have to test that no caller of this function can be tricked into calling add with a non-integer.
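
For what it’s worth, in a language like Haskell the same intent is carried by the type signature (a trivial example of my own, not one from the talk):

add :: Int -> Int -> Int
add i j = i + j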

Also, though your example of “semantic type checking” in the talk is interesting (I simply don’t know if there are any type systems that could deal with it) you completely skip all the cases where the type system can do the job and where it does save on testing and typing. In these cases you would have a proof relating to your use of types in the program; unit testing can never give that proof, it can only show you haven’t disproved it yet :-)

If you remain unconvinced then I strongly urge you to read Tom Moertel’s post on providing safe strings, and the assurance you can achieve by using a strong type system in relation to information flow through your program. The same technique has been used in Darcs (watch the video, the bit relevant for this discussion starts about 43 minutes in).

I urge you to read Kristian’s blog post (linked to in his comment).

If you want a podcast to listen to there’s episode 62 of Software Engineering Radio where Martin Odersky talks about Scala (a statically typed language built on the JVM).

I’d also like to clear one thing up, I don’t dislike dynamic languages and I don’t think that static languages are inherently better. What I do dislike about your talk is that it’s uninformed, presents very narrow arguments and then draws conclusions that are very general and simply don’t follow from the given arguments.

Finally, I really enjoy FLOSS Weekly. You and Leo are doing a fantastic job on it, but since it is where I first heard of your talk (I suspect Industry Misinterpretations might not get a lot of attention outside the Smalltalk community) I really think you should talk to someone from the FLOSS FP community. Get someone who can explain, much better than me, what a modern, advanced, strong, statically typed language will get you. I only have experience with OCaml and Haskell, and there are others, like Scala, all are FLOSS and hopefully it wouldn’t be too much work to find someone knowledgeable who’d be willing to set you straight on the dynamic vs static typing issue. I’d be happy to do what I can to help you find someone, just let me know if you are interested :-)

Struggling with "the Arch way"

So, I’m struggling somewhat with the move to Arch. Not that Arch is problematic in any way, it seems to be me and my “debianised thinking”.

I noticed that the Vim plugins that were packaged for Arch all install in system directories and hence are imposed on all users. Not a good thing. So, inspired by Debian’s vim-addons-manager, I hacked up a solution relying on pacman. Then I packaged two Vim plugins for Arch. The idea was to do rather than talk, so it wasn’t until after this that I asked for feedback on aur-general. Just to have the idea shot down :-)

Anyway, after being presented with an existing solution more inline with “the Arch way” I decided to try it out. I’ve now subscribed to using GetLatestVimScripts. Brilliant.

Now I need to convince the author of haskellmode to put it on vim.org and I really ought to get my packages off AUR.

Randal Schwartz on why dynamic beat static---a rather rubbish talk

Well, not the most diplomatic title maybe, but listening to Randal Schwartz’ talk “Dynamic Returns” first made me laugh and then made my fingers itch to write him an email addressing the fallacies he is peddling. Anyway, I wrote this large, rather ranting, blog post but then decided to not publish it.1

Randal, my suggestion to you is that you get someone like Don Stewart on FLOSS Weekly (I believe he is based in Portland as well) to discuss how Haskell’s advanced type system can be harnessed to completely remove whole classes of bugs.


  1. Randal, if you read this I’ll be happy to send the draft version of the post to you, just drop me an email :-)

Vim haskellmode packaged for Arch

A few days ago I packaged the vim haskellmode for Arch Linux and uploaded it to AUR.

Configuring audio for my Arch desktop

I probably started at the wrong end, by installing a few GStreamer packages:

  1. gstreamer0.10-bad-plugins
  2. gstreamer0.10-ugly-plugins
  3. gstreamer0.10-ffmpeg

It was only after that that I actually configured sound :-)

As always the Arch Wiki contains all the information needed, and more. There’s a page for configuring Alsa. All I did was install alsa-utils, make sure my user was a member of audio and then configure using alsamixer until aplay /usr/share/sounds/alsa/Front_Center.wav played properly. Then I put alsa in the DAEMONS list in /etc/rc.conf.

Next I installed and configured PulseAudio. Again there’s an excellent page with instructions for configuring PulseAudio on the wiki. For me that meant first installing pulseaudio and alsa-plugins. Then I used vigr to add my user to pulse-access and pulse-rt. I also created /etc/asound.conf; following the instructions on the wiki, it contains the following:

pcm.pulse {
    type pulse
}
ctl.pulse {
    type pulse
}

pcm.!default {
    type pulse
}
ctl.!default {
    type pulse
}

I then double checked that PulseAudio uses HAL to detect the needed modules. Lastly I made it possible for gstreamer to use PulseAudio by installing gstreamer0.10-pulse.

[edited 22/5/2009] Added info on group membership.

Odds and ends for my Arch desktop

After the previous setup steps I found the next task was to add a few odds and ends that really were needed before proceeding to other major things, like audio, email, and such. Several of the things I needed are available from AUR so to make things easier I started by adding archlinuxfr to my /etc/pacman.conf:

--- pacman.conf_orig    2009-05-09 18:47:19.825013872 +0100
+++ pacman.conf 2009-04-26 08:22:05.471685249 +0100
@@ -70,6 +70,9 @@
 # Add your preferred servers here, they will be used first
 Include = /etc/pacman.d/mirrorlist
 
+[archlinuxfr]
+Server = http://repo.archlinux.fr/x86_64
+
 # An example of a custom package repository.  See the pacman manpage for
 # tips on creating your own repositories.
 #[custom]

After that I updated the package listings (pacman -Sy) and installed yaourt (pacman -S yaourt).

After this I could easily install the other packages I need:

  1. nautilus-dropbox
  2. encfs
  3. pam_mount
  4. keysafe
  5. hpodder
  6. twitux

Only pam_mount needed some extra configuration. First I added two lines each to /etc/pam.d/gdm and /etc/pam.d/login:

auth		optional	pam_mount.so
session		optional	pam_mount.so

Then I modified /etc/security/pam_mount.conf.xml to allow users to have their own configs, that’s done by uncommenting the line

<luserconf name=".pam_mount.conf.xml" />

Arch and Haskell, one little snag

Dear lazyweb (Arch users and especially DonS :-) ),

I just used yaourt to update my system and it noticed that haskell-time was available in a new version. After answering a few questions I was greeted with this message:

ghc-pkg: unregistering time-1.1.2.3 would break the following packages: hslogger-1.0.7 MissingH-1.1.0
    ConfigFile-1.0.4 convertible-1.0.1 HDBC-2.1.0 HDBC-sqlite3-2.1.0.0 HSH-1.2.6 (use --force to override)
error: scriptlet failed to execute correctly

Is there already some automated way to deal with this? (How to deal with it manually is fairly obvious to me…)

Setting up my network services on Arch

After having set up X (not sure I really can call just installing it “setting up” though :-) ) and Gnome, which of course included installing and configuring Xmonad, I went on to configure some network-related stuff.

At home I use mDNS to make it easy to connect to the computers we have at home. I added avahi to the list of services to start in /etc/rc.conf and installed nss-mdns. Then I followed the instructions on how to configure mDNS on Arch, which resulted in the following changes:

--- /etc/nsswitch.conf_old	2009-05-02 21:17:06.460591935 +0100
+++ /etc/nsswitch.conf	2009-04-25 18:08:39.719853968 +0100
@@ -6,7 +6,7 @@
 
 publickey: files
 
-hosts: files dns
+hosts: files mdns_minimal [NOTFOUND=return] dns mdns
 networks: files
 
 protocols: db files

I also need SSH access to my desktop machine so I installed OpenSSH (openssh). No changes were needed to the configuration but following the instructions I made the following change to /etc/hosts.allow to allow connections at all:

--- /etc/hosts.allow_orig	2009-05-02 21:31:16.243401337 +0100
+++ /etc/hosts.allow	2009-04-28 17:55:57.553603986 +0100
@@ -2,5 +2,6 @@
 # /etc/hosts.allow
 #
 
+sshd: ALL
 
 # End of file

Then of course I added sshd to the list of services in /etc/rc.conf.

The last step was setting up NTP, based on suggestions on the Arch Wiki I installed OpenNTP (openntp) rather than the standard NTP package. No changes to the configuration were needed, just adding it to DAEMONS in /etc/rc.conf.

That was it, all the crucial network-related services set up and running at boot.

Arch, Gnome, and shutdown/reboot

I’ll soon continue documenting my experience with installing and configuring my desktop, but first a note on fixing an irritating issue in Gnome 2.26. Soon after installing Gnome I noticed that I couldn’t reboot or shutdown from inside my Gnome session, I had to log out and use GDM’s action menu instead. Rather irritating, but something I could live with until the weekend. It took a while to find the solution so to save myself time in the future I’ll document it here. It might even help someone else, who knows?

Whenever I tried to reboot or shutdown I saw the following message appear in ~/.xsession-errors:

gnome-session[4134]: WARNING: Unable to list sessions: Rejected send message, 2 matched rules;
type="method_call", sender=":1.50" (uid=1000 pid=4134 comm="gnome-session ")
interface="org.freedesktop.ConsoleKit.Manager" member="GetSessions" error name="(unset)"
requested_reply=0 destination="org.freedesktop.ConsoleKit" (uid=0 pid=2838
comm="/usr/sbin/console-kit-daemon "))

After giving up on finding a solution with Google I turned to #archlinux and was pointed to a forum post. That discussion doesn’t mention the warning message at all, so it wasn’t obvious to me at first that it was the same issue. However, about half-way down the page I found a post by justusjonas that looked relevant. I tried that change, or rather something very similar, and a manual reboot later I could confirm that the change indeed fixed the issue. This is the change I applied:

--- /etc/dbus-1/system.d/ConsoleKit.conf_org	2009-05-02 15:59:40.235077390 +0100
+++ /etc/dbus-1/system.d/ConsoleKit.conf	2009-05-02 16:00:10.108231150 +0100
@@ -32,6 +32,8 @@
     <allow send_interface="org.freedesktop.ConsoleKit.Manager"
            send_member="GetSeats"/>
     <allow send_interface="org.freedesktop.ConsoleKit.Manager"
+           send_member="GetSessions"/>
+    <allow send_interface="org.freedesktop.ConsoleKit.Manager"
            send_member="GetSessionForCookie"/>
     <allow send_interface="org.freedesktop.ConsoleKit.Manager"
            send_member="GetSessionForUnixProcess"/>

More on the move to Arch Linux

After installing a basic Arch system and adding X.org it still isn’t very usable. My desktop of choice is Gnome, so that’s what I installed next, with a few extra packages to make it prettier and more useful.

I found out the hard way a few weeks ago that if you install gnome without gnome-extras then it’s a bad idea to install gdm and configure your system to boot to X. That way you’ll end up in a situation where your desktop doesn’t have an X terminal. I made sure to avoid that situation this time around. After installing gnome, gnome-extras and gdm I noticed that both gdm and policykit created users with uid 120 and 102, respectively. Just like expected from services they have low uids. What surprised me was that they ended up having gid 1001 and 1002. That looks like bugs to me :-) I decided to fix that up manually myself by editing the group file (using vigr of course) and then searching for all files with the offending gids (using find / -gid 1001 for the first, with obvious changes to find the second).

After configuring gdm I realised I also needed gdm-themes :-) And I also added a few pretty fonts, and removed a few of the unpretty ones:

# pacman -S ttf-ms-fonts ttf-cheapskate artwiz-fonts ttf-bitstream-vera
# pacman -Rns xorg-fonts-75dpi xorg-fonts-100dpi

Then of course I needed to install xmonad:

# pacman -S xmonad-contrib

Coercing Gnome to actually use it was interesting though. I followed the instructions for running Xmonad in Jaunty, but that wasn’t enough. I also needed to place a file named xmonad.desktop in /usr/share/applications/ with the following contents (greatly inspired by the metacity.desktop found in the same location):

[Desktop Entry]
Type=Application
Encoding=UTF-8
Name=Xmonad
Exec=xmonad
NoDisplay=true
X-GNOME-WMName=Xmonad
X-GNOME-Autostart-Phase=WindowManager
X-GNOME-Provides=windowmanager

After that it was time to install stuff that is useful :-) First off was Thunderbird, which I immediately had to equip with a few plug-ins of course. When setting up my mail account again I noticed that my Gnome session wasn’t quite the way I wanted—there was no gpg-agent running. After a quick check with the people on arch-general, without receiving any definitive resolution, I wrote my own little hack to address it. Of course I posted it in the thread I had started, but I might as well include it here too:

--- /etc/gdm/Xsession_orig	2009-04-27 17:13:50.346834448 +0100
+++ /etc/gdm/Xsession	2009-04-27 17:16:25.310151728 +0100
@@ -213,6 +213,14 @@
   fi
 fi
 
+# add seahorse if found
+seahorse="`gdmwhich seahorse-agent`"
+if [ -n "${seahorse}" ] && [ -x "${seahorse}" ]; then
+    command="seahorse-agent --execute $command"
+elif [ -z "${seahorse}" ]; then
+    echo "$0: seahorse not found!"
+fi
+
 # add ssh-agent if found
 sshagent="`gdmwhich ssh-agent`"
 if [ -n "$sshagent" ] && [ -x "$sshagent" ] && [ -z "$SSH_AUTH_SOCK" ]; then

There’s one more tool I can’t live without, parcellite, which is also available pre-built on Arch.

Moving to Arch Linux

<rant>So my irritation with Debian on the desktop reached a critical moment over the last few days. Sid is a good desktop system for most things, but if you are interested in Haskell then you really should prepare yourself for being a little behind on things. If it isn’t GHC that is out of date then it’s some library that hasn’t been rebuilt for the latest version of GHC. If you’re lucky then new packages are uploaded to the NEW queue and it’s only a matter of waiting for the autobuilders to get to them. Of course that doesn’t mean you can get those packages, so not that much luck in the end anyway.</rant>

I decided to move to the only other distro I’ve used seriously since Linux’ move to 2.x kernels, Arch. I remember it as slightly less polished than Debian, but more up-to-date than even Sid. Also, there is a vibrant group of Haskell hackers working on providing almost all of Hackage in Arch’s native packaging system.

Installing Arch turned out to be a bit of a blast from the past, even when compared to Debian. Luckily the Arch wiki is there to help with the things that are different compared to Debian. I would suggest having a second computer with a browser pointed to Arch’s home pages nearby during the entire install.

For my installation of Debian I had opted to use LVM and I wanted to keep my /home around so I had to follow the steps required to use LVM in Arch. Installation went smoothly and the reboot was successful. Now starts the hard work—configuring the system and updating my home folder.

The very first thing I noticed was that vim wasn’t installed by default, only vi was available. Not a big problem, except that my muscle memory resulted in a lot of vim: command not found messages. Clearly I had to install vim just to keep my sanity.

Next thing to do was to install X.org and the video driver I need:

# pacman -S xorg xf86-video-intel

After adding hal to /etc/rc.conf I started it manually and ran startx as non-root. Wow, it all worked perfectly, without creating any configuration file for X first. I double checked and it used the driver I wanted, i.e. it didn’t pick a safe default, like vesa, as its first option. Brilliant.

After following the instructions for changing the keyboard layout in X.org using hal I even had my ‘@’ on the right key. (The most confusing thing was that in X.org the UK keyboard layout is called gb, while in the console setup it’s called uk.)

I’ll post a bit more about the setup and configuration of my system as I progress towards my ideal desktop.

dataenc 0.12.1.0 released

Yesterday I released a new version of dataenc. It’s available on Hackage of course. Summary of (visible) changes:

  • implementation of a bunch of new encodings:
      • xxencode
      • quoted-printable
      • python escaping
      • url encoding
  • squashing of a bug in the yEncoding implementation that only manifested on 32-bit systems
  • an attempt to conform to the guidelines for the Haskell Platform regarding versioning and dependencies

Missing data encodings?

Dear lazyweb, are there any useful data encodings missing from the following list?

  • Uuencode
  • Base64
  • Base64Url
  • Base32
  • Base32Hex
  • Base16
  • Base85
  • yEncoding

My thoughts on debianisation of haskell packages

I have written a few posts on debianising haskell packages in the past. Back then I stopped looking at it because of the age of the ghc-related packages in Debian Sid. Things have since improved and I’ve finally found some time to revisit the problem. This isn’t however a technical post, but rather a post on the strategy I think would be best for making Debian the best Haskell development environment in the universe.

In my opinion there is no point in trying to get all packages on Hackage into Debian proper. The few Debian developers (DD) who are interested in Haskell and maintain the existing packages are already terribly short on time, and adding further burden would hardly improve the chances of timely availability of GHC in the future. Sure, it would be possible to recruit more DDs with an interest in Haskell (I’ve considered in the past to try to become a DD, but the timing was off on both sides and I decided to postpone it) but it would take time. Maintaining 1000 packages is a huge task, and would require quite a large team I think. I suspect the recruiting would be such that if one were to plot the number of packages the team could maintain as it grew over time, compared to the number of packages on Hackage, then the lines would diverge rather than converge ;-)

Instead I think all packages on Hackage should first be debianised and put in a source-only repository. As a package grows in importance and gets added into Debian proper (e.g. a killer app written in Haskell) it and its dependencies are removed from the source-only repository.

I believe that with this solution it’d be possible to quickly make a large part of Hackage available on Debian while still giving the DDs time to grow the number of “Haskell-friendlies” within Debian.

This of course rests on a few assumptions:

  1. It would require a smaller team to maintain source packages than binary packages. At least it would require less time from DDs.
  2. There is some space to store the source packages, ideally non-Debian space since that would mean even less involvement required by DDs.
  3. There’s a tool available in Debian that makes it easy for end-users to download, compile, and install source packages.
  4. There’s a tool that automates the creation of a Debian source package from a Cabal package.

In my optimism I’ve decided to not worry about the first two points and instead focus on the last two. Besides the optimism they also feel like the points I can actually address on my own; I’ll need help with the first two.

The tool in Debian for handling source packages is apt-src. In my opinion it will suffice, but if the developer would address one issue then it would be perfect.

The tool for automating the creation of Debian source packages could be cabal-debian. It is a very good tool, and it seems to be suited for maintaining Debian packages. I only have two objections to it. The first objection I have is that it is almost too well engineered; it does more than is necessary, e.g. there is no need to maintain proper changelogs in the source repo. The second is that it requires all dependencies of a package to be installed when it’s debianised. It is almost certainly the tool for the team charged with maintaining a large set of Cabal-based packages in Debian proper, but for the source-repo it might be too much. Because of that I’ve started hacking on a simplistic tool that I feel would be better suited for the source-repo.

VirtualBox with Linux 2.6.29

After a recent upgrade which brought in kernel 2.6.29 VirtualBox stopped working for me; the kernel module failed to find the symbol g_SUPGlobalInfoPage. How irritating!

Here’s a solution I found:

  1. Modify /usr/share/virtualbox/src/vboxdrv/Makefile, uncomment the line

    # VBOX_USE_INSERT_PAGE = 1
  2. Recompile the kernel modules:

    # /etc/init.d/vboxdrv setup

Now I only have to find a solution for the second problem that upgrade brought in, deskbar-applet failing to load in the Gnome panel.

Even more Cabal fun, visualising in 3d

avsm pointed me to ubigraph. So after a bit of XML-RPC hacking (haxr really rocks, it is just so amazingly easy to use) I now have code that sends the graph data over to a ubigraph server.

Of course I had to create a video of it. The video shows the dependencies of just two packages (dataenc and datetime). I tried creating a second video of the dependencies of more packages, but xvidcap isn’t very reliable it seems. One irritating thing is that I run out of filehandles fairly soon, so I couldn’t create a 3d graph of more than about 75 packages (all packages starting with ‘a’ and ‘b’ on Hackage).

More fun with Cabal, visualising dependencies

It wasn’t why I started playing with Cabal, but after extracting dependencies from a single package the thought struck me that I could extract dependencies from many packages, e.g. all of Hackage, and draw a dependency graph of the result.

The basic idea is to use the code from my earlier post and accumulate dependency information by mapping it over several cabal files. Then convert that information into nodes and edges suitable for building a graph (Data.Graph.Inductive). That graph is then “graphviz’ed” using Data.Graph.Inductive.Graphviz. Note that since this is performed on Debian Sid I’m using some rather old versions of packages.1

First off a shortcut for reading cabal files:

readCabalFile = readPackageDescription silent

Once I have a GenericPackageDescription I want to collapse it into a regular PackageDescription (see the comments in my previous post for some details regarding this). Then I extract the package name and its dependencies and package them into a tuple. By mapping this function over a list of GenericPackageDescription I end up with an association list where the key is the package and the value is a list of all its dependencies.

processFile gpd = let
        finPkgDesc = finalizePackageDescription [] Nothing
            "Linux" "X86_64" ("GHC", Version [6, 8, 2] [])
        (Right (pd, _)) = finPkgDesc gpd
        getPackageName (Dependency name _) = name
        nameNDeps = (pkgName . package) &&& (nub . map getPackageName . buildDepends)
    in
        nameNDeps pd

In order to create the graph later on I need a complete list of all nodes. To do this I take all the keys and all the values in the association list, collapse them into a single list, and remove duplicates. To turn this resulting list of packages into a list of LNode I then zip it with a list of integers.

getNodes = let
        assocKeys = map fst
        assocVals = concat . map snd
    in zip [1..] . nub . uncurry (++) . (assocKeys &&& assocVals)

Building the edges is straightforward, but a bit more involved. An edge is a tuple of two integers and something else; I don’t need to label the edges so in my case it is (Int, Int, ()). The list of nodes is basically an association list (an integer for key and a string for value), but I need to flip keys and values since I know the package name and need its node number.

getEdges deps = let
        nodes = getNodes deps
        nodesAssoc = map (\ (a, b) -> (b, a)) nodes
        buildEdges (name, dep) = let
                getNode n = fromJust $ lookup n nodesAssoc
                fromNode = getNode name
            in map (\ t -> (fromNode, getNode t, ())) dep
    in concat $ map buildEdges deps

Now that that’s done I can put it all together in a main function. The trickiest bit of that was to find the size of A4 in inches :-)

main = do
    files <- getArgs
    gpds <- mapM readCabalFile files
    let deps = map processFile gpds
    let (nodes, edges) = (getNodes &&& getEdges) deps
    let depGraph = mkGraph nodes edges :: Gr String ()
    putStrLn $ graphviz depGraph "Dependency_Graph" (11.7, 16.5) (1,1) Landscape
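
For completeness, these are roughly the imports the snippets assume; the exact module layout shifts a bit between versions of Cabal and fgl, so treat this as a sketch:

import Control.Arrow ((&&&))
import Data.List (nub)
import Data.Maybe (fromJust)
import System.Environment (getArgs)

import Data.Version
import Distribution.Package
import Distribution.PackageDescription
import Distribution.Verbosity
import Distribution.Version

import Data.Graph.Inductive
import Data.Graph.Inductive.Graphviz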

I downloaded all cabal files from Hackage and ran this code over them. I bumped into a few that use Cabal features not supported in the ancient version I’m using. I was a bit disappointed that Cabal wouldn’t let me handle that kind of errors myself (as I already expressed in an earlier post) so I was forced to delete them manually.

Here’s the output, completely unreadable, I know, but still sort of cool.


  1. Yes, I’d be really happy if the transition to GHC 6.10 would finish soon.

A no-no in my book (found in Cabal)

A recent “find” in Cabal made me think of this:

In my opinion the final decision of what to do in case of an error belongs in the application, not in libraries.

I’m sure there are exceptions to this, but I believe it’s good as a guiding principle when defining APIs.

Here’s the code in Cabal that breaks this rule. Basically, if you write an application and call readPackageDescription on a cabal file with errors then your application is terminated (the call to dieWithLocation ends up calling exitWith). I would much rather it returned an error (e.g. based on Either) so I can decide what to do with bad cabal files myself.

I’ve raised a ticket for it of course ;-)

Another reason to create distro-specific packages for Haskell modules

Since the question comes up every now and then on haskell-cafe; why bother with distro-specific packages when we have Cabal? This problem is trivially solved with packaging like Debian’s.

Simple Cabal parsing

This is going to be a silly post, simply because it is so amazingly simple to do some basic parsing of Cabal files. It turns out to be true even for old versions of Cabal, the code below is for 1.2.3.0 since that’s the version still found in Debian Unstable.1

First import the required modules:

import Distribution.PackageDescription
import Data.Version
import Distribution.Verbosity
import Distribution.Version

Then a simplistic shortcut; I don’t care about setting flags or pretty much anything else

finPkgDesc = finalizePackageDescription [] Nothing "Linux" "X86_64" ("GHC", Version [6, 8, 2] [])

and I’m only interested in listing the package names of dependencies

getPackageName (Dependency name _) = name

After this it’s possible to do the following in ghci (ex01.cabal is a copy of the Cabal file for dataenc):

> gpd <- readPackageDescription silent "ex01.cabal"
...
> let pd = finPkgDesc gpd
> let deps = either (\_ -> []) (\ (p, _) -> buildDepends p) pd
> map getPackageName deps
["array","base","containers"]

  1. That’ll change when GHC 6.10 makes it onto my system. It’s already in Debian, just not on my system since some Xmonad-related packages haven’t been rebuilt yet.

Another way to start Xmonad in Gnome

I stumbled on this email thread on starting Xmonad in Fedora 10. In anticipation of Gnome 2.24 being installable from Debian Experimental I dropped the following into /usr/share/applications/xmonad.desktop:

[Desktop Entry]
Type=Application
Encoding=UTF-8
Name=Xmonad
Exec=xmonad
NoDisplay=true
X-GNOME-WMName=Xmonad
X-GNOME-Autostart-Phase=WindowManager
X-GNOME-Provides=windowmanager
X-GNOME-Autostart-Notify=true

and modified the GConf key /desktop/gnome/applications/window_manager/current to hold the value /usr/bin/xmonad. Last but not least I removed my ~/.gnomerc. Log out, log in. Enjoy!

Yes, this is mentioned on the Xmonad wiki, though with slightly less details. I’ll hold off updating the page until I have Gnome 2.24 on my desktop.

Python extensions: Please tell me there's an easier way!

Dear Lazyweb,

Please tell me there’s an easier way of building Python C/C++ extensions (read shared objects) using automake than by using libtool. Please! Please!

I just want a frickin’ .so file, nothing complicated. Is the only way, besides using libtool, to write custom targets?

Python coding dojo in Cambridge

I just came across this post about a coding dojo in Cambridge. Sounds interesting.

New backend for epilicious pushed.

I’ve just finished pushing support for diigo as a backend for epilicious. For the impatient there is the epilicious darcs repo. The patient will wait for the next version of Gnome to be released.

A really good version of Star Wars

If George ever decides to make even more money off Star Wars then he should release a remix of the existing films based on this story.

More sensible comments on cabal-debian

First of all I decided to make this a new post rather than answer in the comment section of my previous post. I think that this way the chance is greater that both David and Jeremy will read this :-)

So, first let’s set the stage… It’s almost midnight and I’ve just spent somewhere between 30 and 45 minutes hunting down packages on Hackage and navigating the directory listings on src.seereason.com. I’ve been sent in the wrong direction once (downloading a Debian package that seems to be from Linspire instead of grabbing the identically named package from the seereason site). I’m tired and slightly frustrated. I run cabal-debian, it works, then I run debuild just to be told that I really should have built cabal-debian itself into a Debian package and installed it that way and I also need another package that isn’t in Debian at all.

Anyway, having slept and let almost 24 hours pass by I’ve come to my senses somewhat ;-)

Here’s what I think is a slightly more tempered wishlist:

  1. Remove haskell-devscripts-cdbs and include hlibrary.mk in haskell-devscripts.
  2. Modify dh_haskell* so that the two snippets for -doc are automatically generated instead (just like the script snippets for -dev already are).

I think it would be easy to stick that in seereason’s repos for public consumption. Then interested people, like me, can start thinking about how to use that as a base to make Debian the best Haskell platform in the world, or at least making it better than Arch ;-)

Completely unrelated question for seereason people

I have the following line in my source.list:

deb     http://deb.seereason.com/debian sid-seereason main

Why don’t I see the seereason-keyring package?

Experience with cabal-debian

So, after receiving several pointers to seereason’s cabal-debian tool I thought I’d take it for a spin. (I did pull down autobuilder as well, but didn’t feel I had enough time to explore it tonight.)

After about 30 minutes of browsing through HackageDB and seereason’s source repos, building and installing, I had finally satisfied all dependencies and the build of cabal-debian succeeded. (Oh, BTW, seereason people, it’s really confusing that you have a package called debian among your source repos, when there already is a different debian package on HackageDB. Please consider renaming it!) I decided to take the tool for a spin on my own package, dataenc.

The result was fairly good. It seems the generated files depend on some packages that either aren’t in Debian or that the people at seereason have modified. With the following changes to the generated files I was happy with the contents of ./debian and I successfully built an (almost) autogenerated Debian package:

  1. Download hlibrary.mk into ./debian.
  2. Modify ./debian/rules so that hlibrary.mk is loaded from $(CURDIR)/debian/hlibrary.mk.
  3. Delete ./debian/libghc6-dataenc-doc.post*.
  4. Remove cabal-debian and haskell-devscripts-cdbs from the build dependencies in ./debian/control.

I’ll try to find the time to put those changes into cabal-debian itself. Then I’d have a rather nice tool for building Debian packages automatically.

Building Debian packages of (cabalised) Haskell packages

I’ve just spent the last hour or so ironing out the details required to automate the building of cabalised Haskell packages for Debian. At the same time I also built Debian packages for 5 Haskell packages (test-framework and its dependencies).

These are the basic steps I’ve followed:

  1. Extract the tar-ball in the working directory.
  2. Rename the directory to match the form haskell-X-0.1.
  3. Re-tar the directory into an archive named haskell-X_0.1.orig.tar.gz.
  4. Change into the package directory.
  5. Run Cabal’s configure command followed by register --gen-pkg-conf=pkg.conf.
  6. Modify the generated file and turn it into a .ini-style file:

    echo '[conf]' > pkg.conf.tmp; cat pkg.conf >> pkg.conf.tmp; mv pkg.conf.tmp pkg.conf
  7. Use Python to extract the dependencies:

    python << EOF
    from ConfigParser import ConfigParser
    cfg = ConfigParser()
    cfg.read('pkg.conf')
    deps = cfg.get('conf', 'depends').split()
    for d in map(lambda s: s.rsplit('-', 1)[0].lower(), deps):
        print d
    EOF
  8. Clean up by rm pkg.conf and running Cabal’s clean command.
  9. Create the debian directory.
  10. Create debian/compat: echo 5 > debian/compat
  11. Create debian/changelog, it should be something like:

    haskell-X (0.1-1) unstable; urgency=low
    
      * Automatic build.
    
     -- Magnus Therning <magnus@therning.org>  Mon, 19 Jan 2009 22:35:00 +0000
  12. Create debian/control. This is by far the most time consuming part to do manually. It’s also a bit long so I’ll leave it out of this post. Take a look in my APT repo if you are interested in the details.
  13. Copy hlibrary.mk into debian/. At some point it will be included in some Debian package (bug #462482) but for now it can be found here.
  14. Create debian/rules with

    #!/usr/bin/make -f
    include $(CURDIR)/debian/hlibrary.mk
    and make it executable, chmod +x debian/rules.
  15. Done. Run debuild.

My intention is to at some point encode these steps in a tool so that I can spew out Debian packages for all interesting packages on HackageDB at the press of a button. Of course that tool will be written in Haskell, and it sure won’t rely on Python for any part of the process ;-)
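
As a taste of what that might look like, here’s a rough Haskell sketch of step 7. It reads the generated pkg.conf directly (no .ini massaging needed) and assumes the field is spelled depends: and fits on a single line:

module Main where

import Data.Char (toLower)
import Data.List (isPrefixOf)

-- Turn e.g. "HUnit-1.2.0.0" into "hunit", mirroring the Python rsplit.
stripVersion :: String -> String
stripVersion = map toLower . reverse . drop 1 . dropWhile (/= '-') . reverse

main :: IO ()
main = do
    ls <- fmap lines (readFile "pkg.conf")
    let deps = words $ concat [drop (length "depends:") l | l <- ls, "depends:" `isPrefixOf` l]
    mapM_ (putStrLn . stripVersion) deps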

Readers who are familiar with the rules for Debian packages will notice that this won’t create well-formed packages. I’m not particularly interested in getting these packages into Debian proper; I see packages generated this way living in an unofficial repo. So there is no need to make sure they are acceptable for inclusion in the distribution itself. The resulting packages might offer a good starting point for someone who feels that a package deserves to become official. I also suspect that a tool that generates fully compliant packages, and is able to deal with keeping them up-to-date, is too big a task for me. I’m also not sure it really is worth it; better to keep things as simple as possible in order to deal with the amazing rate of new releases on Hackage.

Ubuntu - now found in the UK too

It’s strange but Ubuntu cola seems to be more mainstream in Sweden than in the UK. My brother found it in downtown Gothenburg, in a branch of a major chain of grocery stores. On the other hand, I haven’t seen it anywhere in the UK until yesterday. I found it in a fairtrade store in a tiny village outside Cambridge. A little strange since Ubuntu Trading is a UK company.

Ubuntu cola

Ubuntu cola

Useful thing when adopting test-framework after already using HUnit

I found test-framework today. In adopting it for my already existing unit tests I found this function to be rather useful:

import Test.HUnit
import Test.Framework.Providers.HUnit (testCase)
import qualified Test.Framework.Providers.API as TFAPI

unitTest2TFTest :: Test -> [TFAPI.Test]
unitTest2TFTest t = let
        unitTest2TFTest' desc (TestCase a) = [testCase desc a]
        unitTest2TFTest' desc (TestLabel s t) = unitTest2TFTest' (desc ++ ":" ++ s) t
        unitTest2TFTest' desc (TestList ts) = concat $ map (unitTest2TFTest' desc) ts
    in unitTest2TFTest' "" t
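
The converted tests can then be handed to test-framework’s defaultMain. A sketch, assuming the function above is in scope; oldTests is just a stand-in for whatever HUnit tests already exist:

import Test.HUnit
import Test.Framework (defaultMain, testGroup)

oldTests :: Test
oldTests = TestLabel "arithmetic" (TestCase (1 + 1 @?= 2))

main :: IO ()
main = defaultMain [testGroup "old HUnit tests" (unitTest2TFTest oldTests)]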

Series of posts on testing Haskell code

I’ve been using both QuickCheck and HUnit to make sure that the functions in dataenc both have the expected properties and follow the specifications.

This morning I stumbled on a series of posts by Matthew Podwysocki on Functional Programming Unit Testing (part 1, part 2, part 3, part 4, part 5). I think I really could have used these posts about two years ago when I first ventured into the world of testing Haskell. The only new thing I learnt was the existence of the package test-framework in part three. Anyway, they are a good introduction to testing Haskell code (and F#).

dataenc 0.12 posted to HackageDB

Yesterday I implemented yEncoding in dataenc and uploaded version 0.12 to HackageDB.

This morning on Swedish telly...

Today I started the day by watching some Swedish telly, Gomorron Sverige I think the show was called. Three guests got to comment on the news year that just passed. They were asked for their nominations in categories such as Most Over-Reported News Story, Most Important Public Figure and the like.

One of them nominated climate change for the category Most Under-Reported News Story. Immediately one of the others blurted out that he thinks that climate change is exaggerated and has received too much attention in media. He was told that there now is proof that large portions of the polar ice have melted (25% was mentioned, though I didn’t catch whether it was an increase of 25% or if 25% was gone; anyway, it doesn’t really matter here) and that therefore this is a very real problem. In answer to this the climate-change doubter mentioned that he has been keeping an eye on the water level at the pier near his summer place and there has been no change over the years. So, if climate change is real and the polar ice is melting, where does the water go?

My initial reaction against this line of reasoning seems to have been shared by the person who initially brought up the subject: “How could he argue against climate change? And with such a simplistic argument as well?” Then I was disappointed to hear that the only counter argument basically boiled down to “But, it doesn’t work that way” and “How can you argue against all the scientists?” First I was disappointed in her for not providing any answers, then I grew more disappointed in myself as I realised that had I been sitting next to this climate-change doubter at a dinner I wouldn’t have been able to provide a stronger argument myself.

How should a counter-argument be constructed? What scientific ideas and proofs should be used to point out the fallacy of the “pier argument”?

I can think of these:

  • There is evidence of climate change on a macro level, his argument doesn’t explain those away. All he’s really saying is that on a micro level, and in a very specific geographical location, measuring only the water level, he can’t find any evidence for climate change.
  • It’s probably also simple to point out the lack of scientific rigor in his methods. Somehow though, I feel this is a bad way to try to convince anyone.

Book: Lighttpd

As I mentioned in the previous post I received an email pointing me towards a newly released book on lighttpd. Only one chapter is available free of charge so that and the table of contents are all I can comment on at this point.

Let’s start with the table of contents. To be honest I found it very disappointing; there were so many other chapters that I would have found much more interesting than chapter 10. Given that I’ve never really gotten into Apache to begin with I would much rather have read about advanced configuration (chapters 2 and 3), or CGI (chapters 3 and 11), or maybe most about securing lighttpd, both securing what it serves (through SSL, chapter 6) and by reducing privileges (chapter 8). Yes, for me personally they’ve chosen the least interesting chapter to make available free of charge ;-)

So, what do I think about chapter 10 then? Overall I’m happy with it. In a book like this, one that covers usage of a specific piece of software, I’m happy to find text that is clear and simple to understand. It isn’t very deep in concepts and details, but there are links to places where I can find out more if I need it. I’m not familiar enough with lighttpd to judge whether the details are correct, but I assume they are, and as I already mentioned it isn’t a complete reference, but offers links to where the reader can find more details. One thing that did upset me a little is the presence of rather lengthy pieces of Python script to aid in converting some of the Apache configuration to lighttpd. Though I suppose they fit in I’d rather have seen them made available only online, with just the descriptions and usage in the book. I couldn’t help but feel they were there only to increase the page count.

Anyway, what I’ve seen so far makes me want to read the rest of the book. As I’ve made clear I’m not a serious user of lighttpd, but it is the web server I reach for first when I find myself needing one. I think this book will make me a more efficient user of lighttpd.

A first for me...

I’ve read quite a few articles about the changing landscape of marketing, how regular ads and old-style marketing is no longer trusted. I suppose companies have over-promised and under-delivered for too long. Instead PR firms are adding blogging into the mix, which probably means blogs will be ruined in the long run ;-) For now though blogs are still trusted, especially when it’s regular people who write about products.

So, why this post? Well, it seems the few posts I’ve made here on lighttpd have attracted some minimal attention and I received an email regarding a new book on lighttpd, published by Packt publishing. This despite the fact that I am what can only be called a cursory user of lighttpd. It contained a link to a free chapter so I decided to play along and publicly comment on the chapter. If you don’t like this kind of thing then you should skip the next post.

Re: Redesigning GHC’s build system

The blog GHC Mutterings doesn’t allow posting of comments without logging in. Since I have absolutely no interest at all in creating yet another online account I hope they allow pingbacks :-)

In this post they discuss the planned redesign of GHC’s build system, and here’s my comment:

The goal of moving away from make is probably a good one, but when failing to get Cabal to fit your needs you revert to redesigning it using make again? There are numerous other build tools out there, many of them remarkably better than make. What other tools were considered and why were they discarded? I simply refuse to believe that the only build tool to make the shortlist was make. :-)

Keysafe and screenshots.debian.net

After reading about screenshots.debian.net I’ve now uploaded a screenshot for the only package I have in Debian, keysafe. It will be visible to the public as soon as it has been approved.

Another item for the GMail wishlist

Why can’t I change the ordering of mail in GMail?

I’ve never quite understood why anyone would want to have their most recently received mail at the top. Personally I always want to read mail in the order it arrives.

Adventures with a certain Xen vulnerability (in the PVFB backend)

Here’s another post about a paper I’ve read recently. This time it’s not entirely for fun, but I still thought I’d write about this one, Adventures with a certain Xen vulnerability (in the PVFB backend). I’ve read a few security-related papers and articles. In general I’ve found that there’s a huge gap in quality (and sometimes rigor) between the practitioners and academia. This however is a paper that I found to be of good quality, while still being produced by a member of the former camp. Hopefully it will start a trend :-)

Evaluating High-Level Distributed Language Constructs

I thought I’d start posting little notes about papers I read, especially if I find them interesting and worth reading. So here we go!

I read this paper on the bus this morning. I suspect I got it off Lambda the Ultimate a while back, printed it and then kept it in my bag for several months.

Evaluating High-Level Distributed Language Constructs

What money is

A friend pointed me to this film about what money is. I’ve known since high school how banks create money, but I’ve never thought it through to its logical, and somewhat worrying, conclusion. Well worth watching I thought.

Haskell and Time

I’ve found myself getting confused over the Data.Time modules a few times now. Today I wanted to do something that ought to be simple both to do (it was) and to find out how to do (it wasn’t, at least not for me). I thought I’d write it down, because I’m bound to forget about it otherwise, and it might be helpful for some other poor soul out there.

I wanted to store a date in a database (SQLite) using Haskell (HDBC). Since I’m only interested in the date and not the time I chose to represent it in Haskell as a Data.Time.Calendar.Day. For Day, 0 means 1858-11-17. HDBC on the other hand needs the date in seconds since the epoch (1970-01-01 00:00 UTC), which conveniently can be represented with a single Integer. Data.Time.Clock.POSIX contains a few handy functions to convert between POSIXTime (NominalDiffTime) and UTCTime. However, there are no special functions for converting between an Integer and NominalDiffTime. The way I found was to do it via the type classes that NominalDiffTime is an instance of.

First I thought that Enum might offer the solution, but my experiments with that were a bit confusing:

> pt <- getPOSIXTime
1221603766.526661s
> fromEnum pt
4118657661830593344
> toEnum 4118657661830593344 :: POSIXTime 
4118657.661830593344s

Not really the result I had hoped for. No, instead the function to use to get the Integer is floor, and the only way I found to construct a NominalDiffTime from an Integer is to go via a Ratio:

> fromRational (1221603766 % 1) :: POSIXTime
1221603766s

Have I missed something or is this the best way to achieve this?
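For my own future reference, here’s a minimal sketch of the full round trip I ended up with (dayToEpoch and epochToDay are names I made up; the rest comes from Data.Time.Calendar, Data.Time.Clock, Data.Time.Clock.POSIX and Data.Ratio):

import Data.Ratio ((%))
import Data.Time.Calendar (Day)
import Data.Time.Clock (UTCTime (..))
import Data.Time.Clock.POSIX (POSIXTime, posixSecondsToUTCTime, utcTimeToPOSIXSeconds)

-- Day -> whole seconds since the Unix epoch, ready for handing to HDBC
dayToEpoch :: Day -> Integer
dayToEpoch d = floor $ utcTimeToPOSIXSeconds (UTCTime d 0)

-- seconds since the Unix epoch -> Day, going via a Ratio as described above
epochToDay :: Integer -> Day
epochToDay s = utctDay $ posixSecondsToUTCTime (fromRational (s % 1) :: POSIXTime)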

GPG and Gmail?

I’d love to see the Gmail S/MIME plugin for Firefox gain support for PGP/MIME. I can’t imagine it’d be too hard to add for someone who’s familiar with JavaScript and Firefox plugins. Or maybe there’s some other plugin out there that offers PGP/MIME support in Gmail?

Oh, and no, FireGPG isn’t good enough, it doesn’t do PGP/MIME, it doesn’t automatically verify/decrypt received emails, and it requires too much mouse interaction!

On twitter...

Username: magthe

Confusion with HaskellDB

Dear LazyWeb,

I’ve recently made some comparisons between HaskellDB and HDBC. My gut feeling is that I’d like to use HaskellDB due to its type safety and the fact that it allows me to abstract away from SQL—something in me twitches every time I see strings containing SQL mixed in with other code, no matter what language it is. However there are some things that confuse me with HaskellDB. After overcoming the initial puzzlement that I seem to hit whenever I look at a Haskell API I stumbled on the apparent lack of support for primary keys. This surprised me, and to check whether I had overlooked something I wrote the following code:

module Main where

import System.Environment
import System.IO
import Database.HaskellDB.HSQL.SQLite3
import Database.HaskellDB.DBSpec

testDBOpts = DBOptions { useBString=False }

sqliteOptions = SQLiteOptions { filepath="replace me", mode=ReadMode }

main :: IO ()
main = do
    dbfn <- fmap (!! 0) getArgs
    let options = sqliteOptions { filepath=dbfn }
    sqliteConnect options (dbToDBSpec False "extracted.db") >>= print

This is the surprising output when I point it at two different databases. The first database has no primary key:

% sqlite3 test1.db .dump
BEGIN TRANSACTION;
CREATE TABLE person (name text  not null,
                     age int);
COMMIT;

and the DBInfo looks the way I expected:

% ./extract test1.db 
DBInfo {dbname = "extracted.db", opts = DBOptions {useBString = False},
tbls = [TInfo {tname = "person", cols = [CInfo {cname = "name",
descr = (StringT,False)},CInfo {cname = "age", descr = (StringT,True)}]}]}

The second database does have a primary key:

% sqlite3 test2.db .dump
BEGIN TRANSACTION;
CREATE TABLE person (name text primary key not null, age int);
COMMIT;

but not only does the DBInfo lack any mention of it, it also contains the table info twice:

% ./extract test2.db 
DBInfo {dbname = "extracted.db", opts = DBOptions {useBString = False},
tbls = [TInfo {tname = "person", cols = [CInfo {cname = "name",
descr = (StringT,False)},CInfo {cname = "age", descr = (StringT,True)}]},
TInfo {tname = "person", cols = [CInfo {cname = "name",
descr = (StringT,False)},CInfo {cname = "age", descr = (StringT,True)}]}]}

Am I doing something fundamentally wrong or does HaskellDB not support the notion of a primary key?

Is it a bug in HaskellDB that the second extracted DBInfo contains two instances of TInfo when the database clearly only has one table?

Fixing fonts in Debian

This is how it’s done.

TagSoup, meet Parsec!

Recently I began writing a tool to scrape some information off a web site for some off-line processing. After writing up the basics using TagSoup I showed what I had to a colleague. His first comment was “Can’t you use Parsec for that?” It took me a second to realise that he didn’t mean that I should write my own XML parser but rather that Parsec allows writing parsers over lists of anything. So I thought I’d see just what it’d take to create a parser for [Tag].

A look at the string parser shipped with Parsec offered a lot of inspiration.

First the basic type, TagParser:

type TagParser = GenParser Tag

The fundamental building block in Parsec is tokenPrim; it’s what the other basic parsers are built on. Taking a cue from the string parser implementation I defined a function called satisfy:

satisfy f = tokenPrim
        show
        (\ pos t _ -> updatePosTag pos t)
        (\ t -> if (f t) then Just t else Nothing)

The position in a list of tags is simply an increase of the column, irrespective of what tag is processed:

updatePosTag s _ = incSourceColumn s 1

Now I have enough to create the first Tag parser—one that accepts a single instance of the specified kind:

tag t = satisfy (~== t) <?> show t

It’s important to stick the supplied tag on the right of (~==). See its documentation for why that is. The second parser is one that accepts any kind of tag:

anyTag = satisfy (const True)

So far so good. The next parser to implement is one that accepts any kind of tag out of a list of tags. Here I want to make use of the convenient behaviour of (~==) so I’ll need to implement a custom version of elem:

l `elemTag` r = or $ l `elemT` r
    where
        l `elemT` [] = [False]
        l `elemT` (r:rs) = (l ~== r) : l `elemT` rs

With that in place it’s easy to implement oneOf and noneOf:

oneOf ts = satisfy (`elemTag` ts)
noneOf ts = satisfy (\ t -> not (t `elemTag` ts))

So, as an example of what this can be used for, here is a re-implementation of TagSoup’s partitions:

partitions t = liftM2 (:)
        (many $ noneOf [t])
        (many $ liftM2 (:) (tag t) (many $ noneOf [t]))
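To round off, here is a small, hypothetical example of using these parsers; it assumes the definitions above together with Parsec’s runParser and TagSoup’s parseTags (the names hrefs and page.html are made up), and pulls out the href of every <a> tag on a page:

-- collect the href attribute of every <a> tag in the input
hrefs :: TagParser st [String]
hrefs = many $ try $ do
    skipMany $ noneOf [TagOpen "a" []]
    TagOpen _ attrs <- tag (TagOpen "a" [])
    return $ maybe "" id (lookup "href" attrs)

main :: IO ()
main = do
    html <- readFile "page.html"
    either print (mapM_ putStrLn) $
        runParser hrefs () "page.html" (parseTags html)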

Of course the big question is whether I’ll rewrite my original code using Parsec. Hmm, probably not in this case, but the next time I need to do some web page scraping it offers yet another option for doing it.

Burning audio CDs on Linux

I just had a blast from the past!

Being somewhat spoiled by the current state of Linux-on-the-desktop (Gnome in my case) I have become used to inserting a blank CD, pointing Nautilus to burn:///, dragging and dropping a few files into it, and finally clicking the “Burn to CD” button. Works beautifully. Of course I thought that Nautilus’ CD-burning would know about audio CDs. So I dragged a few WAV files into Nautilus and clicked the button. Voilà, a data CD containing a few WAV files! Not really what I wanted.

So, to avoid this in the future here is the command line to use:

wodim -v dev=0,0,0 speed=40 -audio -pad *.wav

More prefixes

As Edward pointed out, with the code in the previous post expressions such as a .+. b .+. c are problematic. By expanding the expression and introducing a few parentheses it becomes apparent what the problem is:

fromIB $ (toIB (fromIB $ (toIB a) `addIB` (toIB b))) `addIB` (toIB c)

The problem is that inner fromIB. Unfortunately GHC doesn’t realise that the left-most sum could take on any type that is a CIB, the exact type doesn’t really matter since the result is passed to toIB anyway. It would be sort of cool if I could tell the compiler to prefer a specific CIB, basically a directive along the lines of “if in doubt, use Bytes”. I don’t think that’s possible in GHC. As it stands one would have to specify the type of the inner sum:

(a .+. b :: Bytes) .+. c :: KBytes

One possible solution would be to remove the call to fromIB from the definition of .+. and instead require the user to call it explicitly:

fromIB $ a .+. b .+. c :: KBytes

I suppose it’s all right, but not quite as elegant as I had hoped.

Now on to the more interesting part of Edward’s comment. I needed quite a bit of clarification, but I now know what he is proposing, I even think I understand it ;-)

The general idea is to have the compiler choose the largest prefix that can represent the sum of two values without losing precision. The representation I used didn’t have the problem of ever losing precision, so I’ll change the representation to better show what Edward meant.

First off I need some extensions to GHC:

{-# OPTIONS_GHC -XGeneralizedNewtypeDeriving
        -XMultiParamTypeClasses
        -XFunctionalDependencies #-}

Now the prefixes are represented with a single integer (or rather a single instance of Num). This is easily done thanks to GeneralizedNewtypeDeriving.

newtype Bytes a = Bytes a
    deriving (Eq, Show, Num)

newtype KBytes a = KBytes a
    deriving (Eq, Show, Num)

newtype KiBytes a = KiBytes a
    deriving (Eq, Show, Num)

Now I need a typeclass to bind together two types in a ‘lesser-than-or-equal’ relationship and provide a conversion function:

class LEq s u where
    lower :: Num a => s a -> u a

Now the relation has to be implemented for the prefixes. In short the following says that Bytes is less than both KBytes and KiBytes and that each prefix is less than or equal to itself:

instance LEq Bytes Bytes where
    lower = id

instance LEq KBytes KBytes where
    lower = id

instance LEq KiBytes KiBytes where
    lower = id

instance LEq KBytes Bytes where
    lower (KBytes k) = Bytes $ k * 10^3

instance LEq KiBytes Bytes where
    lower (KiBytes ki) = Bytes $ ki * 2 ^ 10

Now there’s a second relationship that is designed to relate two prefixes to a third one. Basically the third is the largest prefix that can be used to represent the sum of the other two without a loss of precision. One could say that the third is where the other two meet.

class (LEq s u, LEq t u) => Meet s t u | s t -> u

I have to manually define where the different prefixes meet:

instance Meet Bytes Bytes Bytes
instance Meet KBytes Bytes Bytes
instance Meet Bytes KBytes Bytes
instance Meet KiBytes Bytes Bytes
instance Meet Bytes KiBytes Bytes
instance Meet KBytes KBytes KBytes
instance Meet KBytes KiBytes Bytes
instance Meet KiBytes KBytes Bytes
instance Meet KiBytes KiBytes KiBytes

Now I can define addition and subtraction in terms of Meet instances:

(.+.) :: (Meet s t u, Num a, Num (u a)) => s a -> t a -> u a
(.+.) a b = (lower a) + (lower b)

(.-.) :: (Meet s t u, Num a, Num (u a)) => s a -> t a -> u a
(.-.) a b = (lower a) - (lower b)

Finally I’ve arrived at the destination, but as so often I have to admit that the journey was at least half the fun.

*Prefixes> let i = KiBytes 4
*Prefixes> let j = KBytes 3
*Prefixes> let k = Bytes 900
*Prefixes> i .+. j
Bytes 7096
*Prefixes> i .+. i
KiBytes 8
*Prefixes> i .+. j .+. k
Bytes 7996

Playing with prefixes

I just realised that it’s been over a month since I posted something. All I can do is blame life ;-) With a daughter of 8 months and a recent switch of positions I just haven’t found time to play around really. Hopefully that’ll change now that the situation at work is “calming down” a bit.

Anyway, here’s a bit of code I played with today. A friend described a problem inside a tool related to reporting of free and used memory. Apparently, depending on the location in the code, memory could be stored in integers as bytes, kilobytes (kB), kibibytes (Ki), mebibytes (Mi), etc. In every place the prefix would be ‘recorded’ in the variable name, resulting in a situation where a change of prefix used in one place would force a change of variable names in callers/callees. A bit of a mess. I thought I’d try to explore what Haskell’s type system could produce ;-)

First approach

My first thought was to create a data type for the prefixes:

data Size = Bytes Int
    | KB Int Int
    | KiBi Int Int
    deriving (Show, Eq)

Both for KB and KiBi I record the number of full KBs (Kis) and the remaining bytes. Then I thought that it’d be easiest to do all the calculations in bytes, which required a conversion function:

toBytes (KB i r) = Bytes (10^3 * i + r)
toBytes (KiBi i r) = Bytes $ 2^10 * i + r
toBytes b = b

After that it’s possible to make Size an instance of Num (incomplete, but the rest of the functions in Num are straightforward):

instance Num Size where
    (Bytes b1) + (Bytes b2) = Bytes $ b1 + b2
    s1  + s2 = (toBytes s1) + (toBytes s2)
    (Bytes b1) - (Bytes b2) = Bytes $ b1 - b2
    s1  - s2 = (toBytes s1) - (toBytes s2)

Unfortunately the result will always be in Bytes so a few other conversion functions are necessary:

toKB (Bytes i) = uncurry KB $ divMod i (10^3)
toKB s = toKB $ toBytes s

toKiBi (Bytes i) = uncurry KiBi $ divMod i (2^10)
toKiBi s = toKiBi $ toBytes s

Now it’s easy to create a type for bookkeeping of memory and make it an instance of Num as well:

data MemUsage = Free Size | Used Size
    deriving (Eq, Show)

instance Num MemUsage where
    (Free i) + (Free j) = Free $ i + j
    (Used i) + (Used j) = Used $ i + j
    (Free i) + (Used j) = Free $ i - j
    (Used i) + (Free j) = Used $ i - j

    (Free i) - (Free j) = Free $ i - j
    (Used i) - (Used j) = Used $ i - j
    (Free i) - (Used j) = Free $ i + j
    (Used i) - (Free j) = Used $ i + j
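
A couple of examples of what this gives (a hypothetical GHCi session, assuming the definitions above):

> toKB (Bytes 2500)
KB 2 500
> Free (KB 1 0) + Used (Bytes 200)
Free (Bytes 800)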

On some level this ‘algebra’ for memory usage makes sense, though refinements are certainly possible. I suspect there are some interesting choices to be made regarding the remaining functions in Num, e.g. should negate turn free memory into used memory?

I find this solution somewhat lacking in the extensibility department—as the commonly available amount of memory grows (the code above is stuck in the 80’s it seems) changes have to be made to Size. The same goes for changes to support non-standard prefixes, probably not a common problem but it would be nice to have a solution that is “independently extensible”.

Second approach

Here my idea was to create one type per prefix, use a separate type for the calculations and use a type class for the conversions. Quite straightforward when put like that, though it took me a while to come up with it ;-)

Here is the internal representation used during calculations:

data IB = IB Int
    deriving (Eq, Show)

Simple enough. Then the class for conversions to and from IB:

class CIB a where
    toIB :: a -> IB
    fromIB :: IB -> a

No surprises there either. Now I want to add and subtract, but due to the definition of Num (it requires all arguments to be of the same type, e.g. the type of addition is (+) :: a -> a -> a) I’ll need two new functions for that (at least I don’t know a way around this):

(.+.) :: (CIB a, CIB b, CIB c) => a -> b -> c
i .+. j = fromIB $ (toIB i) `addIB` (toIB j)
    where
        addIB (IB i1) (IB i2) = IB (i1 + i2)

(.-.) :: (CIB a, CIB b, CIB c) => a -> b -> c
i .-. j = fromIB $ (toIB i) `subIB` (toIB j)
    where
        subIB (IB i1) (IB i2) = IB (i1 - i2)

One interesting observation is that due to the type of .+. and .-. it is possible, and often required, to specify the type of the result, somewhat similar to HSH. Now the stage is set for the prefix types to be defined, starting with plain bytes:

data Bytes = Bytes Int
    deriving (Eq, Show)

instance CIB Bytes where
    toIB (Bytes i) = IB i
    fromIB (IB i) = Bytes i

Similarly for kilobytes and kibibytes:

data KBytes = KBytes Int Int
    deriving (Eq, Show)

instance CIB KBytes where
    toIB (KBytes i j) = IB $ i * 10^3 + j
    fromIB (IB i) = uncurry KBytes $ divMod i (10^3)

data KiBytes = KiBytes Int Int
    deriving (Eq, Show)

instance CIB KiBytes where
    toIB (KiBytes i j) = IB $ i * 2^10 + j
    fromIB (IB i) = uncurry KiBytes $ divMod i (2^10)

The data type for memory usage wraps the size:

data MemUsage a = Free a | Used a
    deriving (Eq, Show)

Again I can’t make MemUsage a an instance of Num, instead there are two new operators:

(~+~) :: (CIB a, CIB b, CIB c) => MemUsage a -> MemUsage b -> MemUsage c
(Free i) ~+~ (Free j) = Free $ i .+. j
(Used i) ~+~ (Used j) = Used $ i .+. j
(Free i) ~+~ (Used j) = Free $ i .-. j
(Used i) ~+~ (Free j) = Used $ i .-. j

(~-~) :: (CIB a, CIB b, CIB c) => MemUsage a -> MemUsage b -> MemUsage c
(Free i) ~-~ (Free j) = Free $ i .-. j
(Used i) ~-~ (Used j) = Used $ i .-. j
(Free i) ~-~ (Used j) = Free $ i .+. j
(Used i) ~-~ (Free j) = Used $ i .+. j
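
A couple of hypothetical GHCi examples showing the explicit result types in action (assuming the definitions above):

> (Bytes 512 .+. KBytes 1 0) :: KiBytes
KiBytes 1 488
> (Free (KBytes 2 0) ~+~ Used (Bytes 500)) :: MemUsage Bytes
Free (Bytes 1500)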

End thoughts

There is one thing I like about the first approach and one thing I don’t like. That both the types are instances of Num is something I like since I am interested in adding and subtracting memory, though arguably there are functions in Num that don’t make much sense for memory (e.g. multiplication and negation). The thing I don’t like is the need for explicit conversion functions (toKB and friends).

There is also one thing that I like about the second approach and one I don’t like. I would have liked to use the standard functions in Num; introducing new (type-specific) operators is something that I’m never particularly keen on. Though I suppose that implementing separate operators does emphasize that memory isn’t quite like regular integers. I really do like using Haskell’s own syntax to do the conversion; it’s more readable than having to call conversion functions.

That’s it! Comments and suggestions for improvements are always welcome of course.

Google Treasure Hunt primes question

In the comments to my post on the Google treasure hunt Andre asked me to put up my solution for the fourth problem. The problems were personalised and here is mine:

Find the smallest number that can be expressed as
the sum of 3 consecutive prime numbers,
the sum of 29 consecutive prime numbers,
the sum of 373 consecutive prime numbers,
the sum of 665 consecutive prime numbers,
and is itself a prime number.

The idea behind my solution is that if I have four lists of the sums, and a list of primes, then the solution to the problem is the smallest member of the intersection of all five lists.

I saved the problem of finding primes for last and instead concentrated on summing up lists of integers in a convenient way. I ended up with the following solution to that:

sumN :: Int -> [Integer]
sumN n = map fst sumi
    where
        sumi = tail $ iterate (sumn) (0, primes)
        sumn (r, xs) = (sum $ take n xs, tail xs)

The real work happens in the inner functions. sumn picks out as many numbers as I want and sums them up; the reason for taking and returning a tuple is that sumi can use iterate to calculate the sequence of sums. The first item in the result of iterate f a is a, so sumi drops the first one. This lets me create the needed lists of sums:

sum3 = sumN 3
sum29 = sumN 29
sum373 = sumN 373
sum665 = sumN 665
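
As a quick sanity check (a hypothetical GHCi session, with primes starting 2, 3, 5, 7, 11, ...):

> take 3 sum3
[10,15,23]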

After having a quick look at the implementation of Data.List.intersect I realised it probably would be a bit too slow. Instead I wrote one that exploits that all the involved lists are ordered:

intersection a@(x:xs) b@(y:ys)
    | x == y = x : intersection xs ys
    | x < y = intersection (dropWhile (< y) xs) b
    | x > y = intersection a (dropWhile (x >) ys)
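
A small example of how it behaves on two ordered, infinite lists (a hypothetical GHCi session):

> take 3 $ intersection [2,4..] [3,6..]
[6,12,18]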

After convincing myself that this code did what I wanted (basically through playing in GHCi using an incorrect definition of primes, primes = [1..]) I turned to the problem of finding the primes. Of course someone else had already written the Haskell code to find primes. That code fitted perfectly with mine since it produces an infinite list of Integers. Brilliant! :-)

Finally here’s the main that allows me to compile the code and get the answer quickly (it’s slightly modified for the medium, basically I wrote it all on one long line):

main :: IO ()
main = let
        i1 = (primes `intersection` sum665)
        i2 = (sum3 `intersection` sum373)
        s =  i1 `intersection` i2 `intersection` sum29
    in
        print $ head s

Compiled, it runs in about 5.5 seconds on my desktop machine.

Google Treasure Hunt, check!

I personally found the first one to be the trickiest; it took quite a bit of thinking with a colleague before we saw Pascal’s triangle in the problem. I was a bit worried that things would get trickier from there on. They surely didn’t.

The second, summing certain lines taken from files matching certain glob patterns, was easily solved with a short Haskell program.

The third, a routing problem, was probably the easiest and I think most easily solved on paper. I did make a mistake though, which I caught before typing my answer into the webform.

The fourth one, adding sequences of primes, was fairly straightforward. I think it would have presented a bit more trouble if I hadn’t made use of Haskell’s laziness and support for infinite lists though.

Silly realisation

I bet a lot of people would consider this to be fairly obvious, but today, while solving the fourth of Google’s Treasure Hunt problems, I realised just how similar in essence laziness + infinite lists in Haskell is to generators in Python. It’s just much easier to work with the former ;-)

Linux 2.6.25 + nvidia module

I hope this will be out of date really soon, but here’s what I did to solve the problems I had with the nVidia driver after upgrading to the latest kernel in Sid (2.6.25). First download the file NVIDIA_kernel-169.12-2286310.diff.txt from this forum article. Then, as root do:

# cd /usr/src
# m-a clean nvidia-kernel
# tar -x -j -f nvidia-kernel.tar.bz2
# pushd modules/nvidia-kernel/nv
# patch -p3 < NVIDIA_kernel-169.12-2286310.diff.txt
# popd
# m-a -O build nvidia-kernel

Now there’s a package ready for installation :-)

Dealing with life in Haskell

I bet you have at least one silly little thing at work that, whenever it happens, makes you let out a sigh, maybe roll your eyes and wish that everyone would use a proper operating system. A few days ago I finally decided to do something about one of my things like that. At work, Windows users will at times, for some strange reason, manually create directories inside their work area, even though the directories actually are under version control. Invariably they get the case wrong, and due to an unfortunate combination of a case-insensitive filesystem on the client (Windows) and a version control system that cares about case (Perforce), files end up all over the place even though they belong in the same directory. The Windows users are none the wiser, they simply don’t see the problem. Since I use a sane system (Linux) I do notice, and when I see it I sigh and roll my eyes.

Here’s my take on solving the problem:

module Main where

import Control.Monad
import Data.Char
import System.Directory
import System.Environment
import System.FilePath
import System.IO.HVFS.Utils
import System.Posix.Files

data IFile = IFile
    { iFileIPath :: FilePath
    , iFileIName :: FilePath
    , iFileFull :: FilePath
    } deriving (Show)

toIFile :: FilePath -> IFile
toIFile fp =  IFile path file fp
    where
        path = map toLower $ takeDirectory fp
        file = map toLower $ takeFileName fp

listFilesR :: FilePath -> IO [IFile]
listFilesR path =
    recurseDir SystemFS path >>=
    filterM doesFileExist >>=
    mapM (return . toIFile)

linkFile :: FilePath -> IFile -> IO ()
linkFile dest ifile = do
        createDirectoryIfMissing True newDir
        createLink (iFileFull ifile) newFile
    where
        newDir = normalise $ dest </> (iFileIPath ifile)
        newFile = newDir </> (iFileIName ifile)

main :: IO ()
main = do
    args <- getArgs
    listFilesR (args !! 0) >>= mapM_ (linkFile $ args !! 1)

Yes, this is the code I wrote in Literate Haskell, but I think I’d better not disclose my rant against clueless Windows users publicly ;-)

Rubber and lhs?

Another dear lazyweb post.

Yesterday night I hacked up a silly little tool for use at work. Nothing spectacular ((I’ll post the source here shortly.)) but I thought I’d try this Literate Haskell thing. I decided to go the LaTeX route and created a .lhs file. To my amazement rubber has a module for handling Literate Haskell; unfortunately it passes everything ending in .lhs through lhs2TeX. I didn’t really want to use lhs2TeX ((I’ll probably have a closer look at it later but for now I wanted to keep things as easy as possible.)) but I haven’t been able to find any way of changing rubber’s behaviour through its command line arguments. What I’d like is a way to disable a module for a single invocation. So, dear lazyweb, is there a way to do this from the command line?

So far I’ve worked around this by creating a symbolic link named MyTool.tex that points to the Haskell source, then I run rubber on the link. Not too much of a hassle, but it’d be nicer to pass an argument to rubber.

Debian, lighttpd and MediaWiki

I just went through the exercise of setting up MediaWiki on a Debian Etch box, but instead of serving it using the common choice of Apache I wanted to use lighttpd. It seems the combination MediaWiki and Apache is well supported on Debian. There was even an install-time question whether I wanted the configuration files set up for me. Well, it’s boring to follow the common path, right?

As I said, MediaWiki plus Apache is the well-supported combination on Debian, but luckily that doesn’t mean it’s difficult to serve it with lighttpd. Here’s how I configured things, and it all seems to work just fine ;-)

Installation

First off, I installed mediawiki1.7, lighttpd, php5-cgi, and mysql-server. I made sure that aptitude only pulled in required packages and then I started un-choosing all Apache-related packages. By the time you get to installing MySQL you’ll be asked for a root password, that is if your debconf is set to ask medium-level questions.

Setting up MediaWiki

The Debian package for MediaWiki creates a site in /var/lib/mediawiki1.7 consisting mostly of symbolic links to the actual location of its files. I kept lighttpd’s default document root of /var/www and linked the two by creating a symbolic link /var/www/mw pointing to the MediaWiki site (/var/lib/mediawiki1.7).

Configuration of lighttpd

First enable FastCGI by linking /etc/lighttpd/conf-available/10-fastcgi.conf into /etc/lighttpd/conf-enabled/. Then go in and change the bin-path to point to /usr/bin/php5-cgi. To prepare for the MediaWiki configuration, enable mod_rewrite in /etc/lighttpd/lighttpd.conf and finally create the file /etc/lighttpd/conf-enabled/20-mediawiki.conf with the following contents:

url.rewrite-once = (
    "^/$" => "/mw/index.php",
    "^/wiki/([^?]*)(?:\?(.*))?" => "/mw/index.php?title=$1&$2"
  )

Now it’s time to restart lighttpd.

Configuration of MediaWiki

Point a browser to http://your.server/mw/config/index.php to configure the site settings. When that’s done copy the file /var/lib/mediawiki1.7/config/LocalSettings.php one layer up.

That should be it! Enjoy!

Debian: pbuilder tip

I’m writing this mostly so that I’ll remember myself :-)

If pbuilder --create fails when building a base image for Sid then it’s worth trying to build a base image for the stable release and then upgrade to Sid afterwards. The very manual way is pbuilder --login --save-after-login.

Packages for omnicodec and dataenc

As promised I’ve now created Debian packages for omnicodec. Since dataenc is a pre-requisite for building from source there are packages for it as well.

I’ve uploaded the source to Debian Mentors here and here. As usual pre-built binaries are available from my own APT repo.

ctypes and flexible (zero-length) arrays

Dear lazyweb, I’ve been racking my brain trying to get information out of the following C struct in a nice way when using ctypes:

struct Foo
{
    unsigned int length;
    char name[];
};

I wrote a little C function to easily get an instance of the struct into Python:

struct Foo *
put_name(char *name)
{
    struct Foo *foo = malloc(sizeof(struct Foo) + strlen(name));
    foo->length = strlen(name);
    memcpy(foo->name, name, foo->length);
    return(foo);
}

Now, how do I actually get the full contents of Foo.name in python? The only way I could think of that actually works is to create a dummy-struct type in order to get the length out and then use it to dynamically create a new sub-class of ctypes.Structure, then create an instance based on the address of what was returned. I think the following shows it pretty clearly:

class FOO(Structure):
    _fields_ = [('length', c_uint), ('name', c_char * 0)]

liblib = cdll.LoadLibrary('./_build/liblib.so')
c_put_name = liblib.put_name
c_put_name.argtypes = [c_char_p]
c_put_name.restype = POINTER(FOO)

def put_name(str):
    f = c_put_name(str)
    K = type('FOO%s' % f.contents.length, (Structure,),
                        {'_fields_' : [('length', c_uint), ('name', c_char * f.contents.length)]})
    return K.from_address(addressof(f.contents))

I still think there ought to be some other way that ‘feels’ nicer. I mean the use of “short arrays” (declared as here with [], or zero size as supported by GCC, or the more portable array of size one) seems to be common enough to warrant some support from ctypes, right?

Any suggestions for improvements?

Kid's play with HTML in Haskell

In my ever-continuing attempts to replace Python by Haskell as my language of first choice I’ve finally managed to dip a toe in the XML/HTML sea. I decided to use the Haskell XML Toolkit (HXT) even though it’s not packaged for Debian (something I might look into doing one day). HXT depends on tagsoup which also isn’t packaged for Debian. Both packages install painlessly thanks to Cabal.

As the title suggests my itch wouldn’t require anything complicated, but when I’ve previously looked at any Haskell XML library I’ve always shied away. It all just looks so complicated. It turns out it looks worse than it is, and of course the documentation is poor when it comes to simple, concrete examples with adequate explanations. HXT would surely benefit from documentation at a level similar to what’s available for Parsec. I wish I were equipped to write it.

Anyway, this was my problem. I’ve found an interesting audio webcast. The team behind it has published around 90 episodes already and I’d like to listen to all of them. Unfortunately their RSS feed doesn’t include all episodes so I can’t simply use the trusted hpodder to get all episodes. After manually downloading about 20 of them I thought I’d better write some code to make it less labour-intensive. Here’s the complete script:

module Main where

import System.Environment
import Text.XML.HXT.Arrow

isMp3Link = (==) "3pm." . take 4 . reverse

myReadDoc = readDocument [(a_parse_html, "1"), (a_encoding, isoLatin1),
    (a_issue_warnings, "0"), (a_issue_errors, "0")]

myProc src = (runX $ myReadDoc src >>> deep selectMp3Links >>> getAttrValue "href")
    >>= mapM_ putStrLn

selectMp3Links = hasName "a" >>> hasAttrValue "href" isMp3Link

main = do
    [src] <- getArgs
    myProc src

The thing that took by far the most time was finding out that hasAttrValue exists. I’m currently downloading episodes using the following command line:

curl -L $(for h in $(runhaskell get_mp3links.hs 'http://url.of.page/'); do \
    echo '-O' $h; done)

Yet another set of itches where Haskell has displaced Python as the utensil used for scratching. :-)

Saddle, two SDDL related tools

I’ve just uploaded two small tools that make it a little easier to deal with SDDL (Security Descriptor Definition Language, this is a good resource for SDDL):

  1. saddle-ex - “extract” the security descriptor for a number of different kinds of objects
  2. saddle-pp - a pretty printer for an SDDL string

Full build instructions are included. Download the archive here.

The tools are written (mostly) in Haskell and I’ll probably upload it all to hackage at some point in the future. I’m holding out for a fix to ticket 2097 since a fix for it will allow me to add a third tool which I feel belongs in the package.

Suggestions for improvements are always welcome, patches even more so :-)

Omnicodec released

Four days ago I uploaded omnicodec to Hackage. It’s a collection of two simple command-line utilities for encoding and decoding data built on the dataenc package. I’m planning on building a Debian package for it as soon as I find the time.

Haskell on Windows

Recently I’ve had reason to write a few tools for Windows. Nothing complicated, and since there was nothing “mission critical” (there hardly ever is for me :-) ) I decided to try to write them in Haskell rather than Python or C/C++. This post contains some sort of recollection of my experience so far.

First off I should probably disclose that I don’t particularly like Windows. Besides feeling more at home on a Unix (Linux) system I find that Windows also upsets me on a very basic level. If I spend a whole day at work in Windows I tend to go home more stressed and irritated than after a day in Linux. Part of this is probably due to my familiarity with Linux. My hope was that with the combination of the immense joy of programming Haskell and my loathing of the Windows platform I would end up with something that at least was bearable. It turns out I did.

My setup

Besides installing the pre-built GHC I also installed Visual Studio Express; more later on the need for it. I don’t like the Visual Studio environment so I installed CMake and jEdit ((ordinarily I would install Vim but I thought I’d try out another editor. Vim on Windows is somewhat “leaky”, i.e. it sometimes shows behaviour that betrays its Unix roots.)). Last but not least I installed Console, it addresses the fact that Windows’ “DOS-box” is an amazing piece of crap.

FFI on Windows

First, GHC comes with a number of Windows-specific modules and they cover an impressive number of the Win32 API functions. The Win32 API is simply huge though, and sooner or later one will find the need to add to what those modules offer. Second, GHC is able to compile C code by delegating to MinGW; however, MinGW doesn’t completely cover the Win32 API either, and if one is unlucky MinGW isn’t enough for the task at hand. Of course that’s exactly what happened to me. So, I installed Visual Studio Express, but since I don’t like VS that much and I didn’t want VS solutions in my darcs repository I decided to use CMake to build a library that I then instruct GHC, through a Cabal file, to link with my Haskell code. The end result is that building requires a few extra steps before the familiar configure/build sequence. It works remarkably well really, and I’ll definitely use that strategy again if I need to.
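
For the curious, the relevant part of the Cabal file ends up looking roughly like this (a rough sketch with made-up names: mytool is the executable and winextra is the library built by CMake, assumed to end up in cbuild/):

executable: mytool
main-is: Main.hs
hs-source-dirs: src
include-dirs: csrc
extra-lib-dirs: cbuild
extra-libraries: winextra
extensions: ForeignFunctionInterface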

There is a benign warning thrown when linking the final executable though:

Warning: .drectve `/manifestdependency:"type='win32' name='Microsoft.VC90.CRT'
version='9.0.21022.8' processorArchitecture='x86' publicKeyToken='1fc8b3b9a1e18e3b'"
/DEFAULTLIB:"uuid.lib" /DEFAULTLIB:"uuid.lib" /DEFAULTLIB:"MSVCRT" /DEFAULTLIB:"OLDNAMES" '
unrecognized

It seems that it might be possible to skip the extra steps in the future if Cabal ticket #229 is addressed.

Windows System modules

Admittedly I haven’t been using much of the functionality in these modules, but it happens that the only function I needed had a bug that results in a segmentation fault, sometimes. See GHC ticket #2097 for some details. I guess this confirms my impression of most Haskell hackers being Linux/BSD/Unix users.

Conclusions

I won’t drop Haskell on Linux anytime soon, but I’ve found Haskell on Windows to be possible. As I suspected when I started out, Haskell does make Windows somewhat easier to bear.

Rotonyms in Haskell

I have to say rotonyms seem kind of pointless, but since there is a post with a Python solution I thought I’d make an attempt at a solution in Haskell. Of course I didn’t manage on my own, I received some much needed help from Cale and dmwit on #haskell.

First some rotation functions; as far as I know there’s no built-in rot13 encoding in Haskell:

rot13Char c = chr ( (ord c - ord 'a' + 13) `mod` (ord 'z' - ord 'a' + 1) + ord 'a')
rot13 = map rot13Char
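
A quick check in GHCi (note that this simple version only handles lower-case letters):

> rot13 "rotonym"
"ebgbalz"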

In order to ease working with rotonyms I introduced a type to represent one. This turned out to pay off later on.

data Rotonym = R String

It would be easy to let the compiler derive Show, but I wanted to mimic the output produced by Will’s solution in Python:

instance Show Rotonym where
    show (R w) =
        let r = rot13 w
        in (min w r) ++ "\t" ++ (max w r)

Now I’m ready for the meat of the matter: reading the word list, picking out the rotonyms and finally printing them. Following Will’s cue I represent the word list as a set of strings (importing Data.Set as S), which also eliminates duplicates.

main = do
    words <- liftM lines $ readFile "WORD.LST"
    let wordset = S.fromList words
        isRotonym w = S.member (rot13 w) wordset
        rots = S.fromList [R w | w <- words, isRotonym w]
    mapM_ (putStrLn . show) (S.elems rots)

Sticking Rotonym instances in a set requires them to be instances of Ord and Eq:

instance Eq Rotonym where
    (R w1) == (R w2) =
        let r1 = rot13 w1
            r2 = rot13 w2
        in (w1 == w2 && r1 == r2) || (w1 == r2 && w2 == r1)

instance Ord Rotonym where
    compare (R w1) (R w2) =
        let r1 = rot13 w1
            r2 = rot13 w2
        in compare (length w1, (min w1 r1)) (length w2, (min w2 r2))

dmwit pointed out that what is actually going on here is taking an intersection of the words and their rotated counterparts. This means that main could be written

main = readFile "WORD.LST" >>=
        mapM_ (putStrLn . show) . S.toList . S.map R . findRotonyms . S.fromList . lines
    where findRotonyms ws = ws `S.intersection` (S.map rot13 ws)

Another idea that dmwit came up with was to introduce a class for rot13:

class Rot13 a where
    rot13 :: a -> a

Char is an instance of that class with the expected implementation of rot13:

instance Rot13 Char where
    rot13 c = chr ((ord c - ord 'a' + 13) `mod` (ord 'z' - ord 'a' + 1) + ord 'a')

If a is an instance of Rot13 then a list of a is as well, again with the expected implementation:

instance (Rot13 a) => Rot13 [a] where
    rot13 = map rot13

It is possible to intersect lists (Data.List.intersect) but it turned out to be painfully slow, so back to sets again. A set of a is an instance of Rot13:

instance (Ord a, Rot13 a) => Rot13 (S.Set a) where
    rot13 = S.map rot13

Finding all rotonyms is then straightforward:

main = readFile "WORD.LST" >>=
        mapM_ (putStrLn . show) . S.toList . S.map R . findRotonyms . S.fromList . lines
    where findRotonyms ws = ws `S.intersection` (rot13 ws)

Back to listening to pauldotcom

I just began listening to episode 95 of pauldotcom and was glad to hear that they enjoyed my email. Here’s the complete email I sent them:

Well, something must have changed since I last communicated with you (see http://therning.org/magnus/archives/257). I’m not sure what though. I heard you when you were on the Linux Reality audio cast and thought I’d check you out again, just to see what you were up to. Well, episode 92 (both parts) was bloody brilliant, episode 93 was good too, and now I’m halfway into episode 94. I have no recollection of the earlier episodes being this organised and good. At some point when I wasn’t listening you must have learnt to rock!

I enjoy the tech segment. The amount of banter is down and the episodes move along a lot more than I remember. No offence to Twitchy, but I’m not sad he isn’t as involved any more, you know, Kramer is brilliant but Seinfeld just wouldn’t be a good show if he were in each and every scene. Twitchy has more of a “celebrity guest” personality… The only criticism I have, and this is pushing it I know; given my walk to work I’d prefer each episode to be around 60 minutes, rather than 80-90 ;-)

Keep it up!

/M

PS I’m planning on posting this email on my blog. I’ll put any reply from you on there as well.

Reading my email on the show sure beats any reply they could have sent by email :-) At some point I have to go back and check out the other podcast I stopped listening to.

BSD or GPL?

Today I had a long and interesting discussion about what is “more free”, GPL or the BSD license. I did a little searching on the internet afterwards and found the following posts quite illuminating:

I’ll refrain from saying where I come down on this issue :-)

The KLM way of respecting privacy...

I just received an email from KLM UK. I have probably given them my email address and thereby my consent to them sending me “informative emails”. Still, I think their words sound a bit hollow when the link to unsubscribe takes me to doubleclick:

If you do not wish to receive future communication from us click here to unsubscribe. KLM is firmly committed to respecting your privacy. We don’t share your information with any third party without your consent.

Shame on you!

Podcasts I no longer listen to

  • The security catalyst—in the end it was simply too vague and a bit too “managerial” to suit me.
  • PaulDotCom Security Weekly—spending 90 minutes a week listening to 3-4 friends (not mine) drinking beers, talking about security-related news snippets you’ve already read and sometimes making wildly inaccurate claims regarding the security of MacOSX/Windows/Linux. Thanks but no thanks! I have better things to do with my time.

C and Haskell sitting in a tree...

A few days ago I thought I’d take a look at calling C functions from Haskell. I wrote up the following set of files:

foo.h:

    int foo(int i);

foo.c:

    int
    foo(int i)
    {
        return i * i;
    }

Foo.hs:

    module Main where
    
    import Foreign.C.Types
    
    main = do
        r <- foo 2
        putStrLn $ show r
    
    foreign import ccall safe "foo.h foo" foo :: CInt -> IO CInt

Compiling the C file was of course no problem:

% gcc -c foo.c

The Haskell file offered some resistance:

% ghc -c Foo.hs
Foo.hs:9:8: parse error on input `import'

It took me a round on haskell-cafe before I found out that ghc needs to be told to use the foreign function interface, -ffi or -fffi:

% ghc -c -fffi Foo.hs

Linking is a snap after that:

% ghc -o foo foo.o Foo.o
% ./foo
4

It’s also possible to build and link it all in one go:

% ghc --make -fffi -o foo foo.c Foo.hs

Now, that’s pretty nice, however it’d be even nicer to use cabal to do the building. At the same time I decided to put c2hs to use. It seemed to be a lot easier than having to create the import statements manually. I ended up with the following:

csrc/foo.h:

    #ifndef _FOO_H_
    
    int foo(int);
    
    #endif

csrc/foo.c:

    #include "foo.h"
    
    int
    foo(int i)
    {
        return i * i;
    }

I couldn’t get cabal to accept Foo.chs as the file containing the Main module in my project. So I ended up putting all the relevant code in Foo and then having a dummy Main.

src/Foo.chs:

    module Foo where
    
    #include "foo.h"
    
    import Foreign.C.Types
    
    main = do
        r <- {# call foo #} 2
        putStrLn $ show r

Here’s the dummy Main.

src/Main.hs:

    module Main where
    
    import qualified Foo
    
    main = Foo.main

The cabal file is rather straightforward. It took me a round on haskell-cafe to find out how to let the compiler know that I need the foreign function interface without putting compiler directives in the source file.

cnh.cabal:

name: cnh
version: 0.1
build-depends: base

executable: cnh
main-is: Main.hs
hs-source-dirs: src
include-dirs: csrc
c-sources: csrc/foo.c
extensions: ForeignFunctionInterface
other-modules: Foo

Nothing special is needed in the Setup.hs:

    #! /usr/bin/env runhaskell

    import Distribution.Simple
    main = defaultMain

Make it executable and you can build in two easy steps:

% ./Setup.hs configure && ./Setup.hs build