Regular Haskelling. How?

- Magnus Therning

Ever since ICFP 2014 I’ve had as a goal to get into the habit of coding in Haskell. It’s been the language I enjoy most for a few years now, but being surrounded by and talking to so many brilliant developers as I did during that week really drove home that I will only have more fun the more code I write. My goal was not very ambitious; just write something in Haskell most days every week. So far I’ve managed to keep it up.

These are a few tricks I’ve used and they’ve worked well for me so far.

Just write, no matter what, just write

In ninth grade a rather successful Swedish author visited my school and what I remember most from that is one thing he said:

Just read! It doesn’t matter what. It doesn’t matter if what you read isn’t considered good literature; read Harlequin books if that’s what you like, read magazines, read comics. Just read!

I think the same holds for writing code; it’s only with practice that one gets comfortable expressing oneself in a particular language.

Fix warts

I can’t actually think of any piece of code I’ve written that doesn’t have some warts. They may be in the form of missing features, or quirks (bugs) in the implementation that force the user to regularly work in a less-than-optimal way. I’ve found fixing warts in tools and libraries I use myself to be one of the most rewarding tasks to take on; the feedback is so immediate that every fix causes a boost in motivation to fix the next one.

Exercise sites

Sometimes it’s simply difficult to find the motivation to tackle working on an existing project, and inspiration for starting something new might be lacking too. This happens to me regularly, and earlier I used to simply close the lid of the computer, but now I try to find some exercises to do instead.

There are several sources of exercises. I know Project Euler is rather popular among new Haskellers, but there are others.

  • CodeEval is a site with problems at three different difficulty levels. It may be extra interesting for people in the US since some of the problems are sponsored by companies which seem to use the site as a place for recruiting. So far I’ve only seen American companies do that, but I suppose it might catch on in other parts of the world too. Haskell is one of several languages supported.
  • Exercism is both a site and a tool. The goal is to facilitate learning of languages. On first use the tool will download the first exercise, and after completion one uses it to upload the solution to the site. Once uploaded the solution is visible to other users, and they are allowed to “nitpick” (comment on) it. After uploading a solution to one exercise the next exercise in the series becomes available. It supports a rather long list of programming languages.

I like both of these, but I’ve spent more time on the latter one. Personally I find the idea behind Exercism very appealing and I’ve been recommending it to a couple of co-workers already.

Feel free to put links to other sources of exercises in the comments.

Simplify old code

With more practice come more and more insights into what functions are available and how to string them together. When I don’t even feel like doing a full exercise on Exercism I just dig out something that smells a little and clean it up. Anything is fair game, no matter how tiny. Just take a look at my implementation of reThrowError.
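
To give a flavour of the kind of tiny cleanup I mean, here’s a made-up example (it’s not the actual reThrowError change, just an illustration of replacing a hand-rolled pattern match with a library function):

import Data.Maybe (fromMaybe)

-- Made-up example: before, with explicit pattern matching ...
nameOrDefault :: Maybe String -> String
nameOrDefault m = case m of
    Just n  -> n
    Nothing -> "unknown"

-- ... and after, the same function expressed with fromMaybe.
nameOrDefault' :: Maybe String -> String
nameOrDefault' = fromMaybe "unknown"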

What else?

I’d love to hear tips and tricks from other people who aren’t lucky enough to have a day job where they get to write Haskell. How do you keep up the learning and practice?

Dijkstra quotes from EWD1284

- Magnus Therning

I recently read through this long post entitled Object Oriented Programming is an expensive disaster which must end. I have to admit I largely agree with what he writes, but I don’t think I ever could have written such a well-researched article, and absolutely never one of equal length ;)

It does include some nice quotes and references, and so far I’ve only read one of the many that I bookmarked, Computing Science: Achievements and Challenges (EWD1284). It contains a few quips that, based on other articles I’ve read, seem fairly typical of Dijkstra. I simply love the way he expressed his opinions at times.

This one really ought to have been in the lengthy post on the OOP disaster:

After more than 45 years in the field, I am still convinced that in computing, elegance is not a dispensable luxury but a quality that decides between success and failure; in this connection I gratefully quote from The Concise Oxford Dictionary a definition of “elegant”, viz. “ingeniously simple and effective”. Amen. (For those who have wondered: I don’t think object-oriented programming is a structuring paradigm that meets my standards of elegance.)

And his thoughts on formal methods are well-known of course, as are his thoughts on iterative design. However, I rather like how he expresses a certain level of disgust of the Anglo-Saxon world when writing about those topics:

The idea of a formal design discipline is often rejected on account of vague cultural/philosophical condemnations such as “stifling creativity”; this is more pronounced in the Anglo-Saxon world where a romantic vision of “the humanities” in fact idealizes technical incompetence. Another aspect of that same trait is the cult of iterative design.

It amazes me every time I read something by someone like Dijkstra, just how much things stay the same, even in a field like computing science, which is said to be moving so fast.

Optparse-applicative and custom argument parsers

- Magnus Therning

The latest update of optparse-applicative triggered me to look over the functions in cblrepo for parsing a few custom command line options. I used to do the parsing in a rather ad-hoc way with lots of use of list functions to split on specific characters. For instance, some option values are pairs of package name and version separated by a comma: PKG-NAME,VERSION. This worked fine and was easy to plug into version 0.10 of optparse-applicative. It was also easily extended to triples, PKG-NAME,VERSION,RELEASE, but it started feeling a bit brittle when some tuples got extended with an optional list of flag assignments, PKG-NAME,VERSION[:FLAG,FLAG,FLAG,...]. The recent release of version 0.11 of optparse-applicative changed the API for custom option value parsers radically; instead of passing a string to the parser, the parser has to use readerAsk to get the string. In short, custom readers went from being functions of the argument string to being computations in a reader monad (ReadM).

In adjusting to the new API I noticed that the code was organised in such a way that some low-level parsing functions were used directly from command line option definitions, while also being used as building blocks for the more complex parsers. This of course meant that the structuring of the functions needed to be changed completely to deal with the API change.

It turns out there already was one parser written in a different style (shown here adjusted to the 0.11 API):

readerGhcVersion :: ReadM Version
readerGhcVersion = do
    arg <- readerAsk
    case lastMay $ readP_to_S parseVersion arg of
        Just (v, "") -> return v
        _ -> fail $ "cannot parse value `" ++ arg ++ "`"

So I rewrote the rest of the parsers in a similar style. The arguably most complicated is this one:

readPkgNVersion :: ReadP (String, Version)
readPkgNVersion = do
    n <- many (satisfy (/= ','))
    char ','
    v <- parseVersion
    return (n, v)

readFlag :: ReadP (FlagName, Bool)
readFlag = readNegFlag <++ readPosFlag
    where
        readNegFlag = do
            char '-'
            n <- many (satisfy (/= ','))
            return (FlagName n, False)

        readPosFlag = do
            n0 <- get
            n <- many (satisfy (/= ','))
            return (FlagName (n0 : n), True)

strCblPkgArgReader :: ReadM (String, Version, FlagAssignment)
strCblPkgArgReader = let
        readWithFlags = do
            (n, v) <- readPkgNVersion
            char ':'
            fas <- sepBy readFlag (char ',')
            return (n, v, fas)

        readWithoutFlags = do
            (n, v) <- readPkgNVersion
            return (n, v, [])

    in do
        s <- readerAsk
        case lastMay (readP_to_S (readWithFlags <++ readWithoutFlags) s) of
            Just (r, "") -> return r
            _ -> fail $ "Cannot parse: " ++ s

It is slightly longer, but it’s rather a lot easier to read what’s happening after this rewrite. ReadP feels like a lighter option than pulling in parsec as a dependency, but I’d love to hear any comments or suggestions, as well as pointers to how other people deal with parsing of non-trivial types of arguments in combination with optparse-applicative.
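
For completeness, a reader like this is then hooked into an option definition in the usual optparse-applicative way. Roughly like this, a sketch with a made-up option name and help text, not the actual cblrepo code:

import Data.Monoid
import Options.Applicative

-- Made-up option name and help text; cblrepo's real definitions differ.
cblPkgOption :: Parser (String, Version, FlagAssignment)
cblPkgOption = option strCblPkgArgReader
    (long "pkg" <> metavar "PKG-NAME,VERSION[:FLAGS]" <> help "package to add")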

Script for migrating from WordPress to Hakyll

- Magnus Therning

As I wrote about in a previous post on converting posts from WordPress to Hakyll I couldn’t quite find a tool that met my needs out of the box. I had a look at the source of hakyll-convert; it uses a library for RSS parsing, which sounds good. However, the export of posts from WordPress is in an extended RSS format, which among other things contains the comments on posts. Unfortunately it doesn’t look like the RSS library supports the WordPress extensions, so modifying hakyll-convert to also extract comments seems like a bit more work than I’d like to put into it. Especially since I had a feeling that hacking up something using tagsoup would be quite a bit faster.
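
To give an idea of why tagsoup felt like the quicker route: picking bits out of the export only takes a few combinators. The following is just a rough sketch of the approach, not the actual script:

import Text.HTML.TagSoup

-- Rough sketch, not the actual bbwp script: pull the titles of all <item>
-- elements out of a WordPress export.
itemTitles :: String -> [String]
itemTitles doc =
    [ innerText $ takeWhile (~/= "</title>") $ drop 1 titleSec
    | item <- sections (~== "<item>") (parseTags doc)
    , titleSec <- take 1 (sections (~== "<title>") item)
    ]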

I put the resulting script in a gist on github. I call it bbwp, which is short for Bye, bye, WordPress.

Adding tags

- Magnus Therning

Adding tags to a Hakyll site brought some surprises. In retrospect it all makes sense, but it took some thinking on my part to work out the why of it all. The resulting code was heavily influenced by Erik Kronberg’s site.

First I thought I’d just add tags to each rendered post, by building the tags

tags <- buildTags "posts/*" (fromCapture "tags/*.html")

then adding them to the post context

let postCtx =
        field "previousPostUrl" (previousPostUrl "posts/*") <>
        field "previousPostTitle" (previousPostTitle "posts/*") <>
        field "nextPostUrl" (nextPostUrl "posts/*") <>
        field "nextPostTitle" (nextPostTitle "posts/*") <>
        field "postId" getPostId <>
        tagsField "tags" tags <>
        listFieldFunc "comments" defaultContext (getComments "comments/*") <>
        baseCtx

and last modify the template

<p>Tags: $tags$</p>

Easy! Except it doesn’t work that way; $tags$ always comes out empty. To actually get tagsField to work as intended it’s necessary to build the tag pages as well, which can be accomplished using tagsRules

tagsRules tags $ \ tagStr pattern -> do
    route idRoute
    compile $ do
        posts <- loadAll pattern >>= recentFirst
        let tagsCtx =
                constField "thetag" tagStr <>
                listField "posts" baseCtx (return posts) <>
                baseCtx
        makeItem ""
            >>= loadAndApplyTemplate "templates/tag-post-list.html" tagsCtx
            >>= loadAndApplyTemplate "templates/default.html" tagsCtx
            >>= relativizeUrls

The template for the tags pages is very simple at the moment

<h1>Posts tagged $thetag$</h1>
<ul>
  $for(posts)$
  <li>
    <a href="$url$">$title$</a> - $date$
  </li>
  $endfor$
</ul>

That’s it. With that in place the $tags$ field renders properly in the post pages as well.

Adding support for comments

- Magnus Therning

It seems most people using Hakyll or other static site generators rely on services like Disqus, but I really don’t like the idea of putting a bunch of JavaScript on each page and dynamically loading all comments off some cloud storage. It sort of flies in the face of the idea of having a static site to begin with. Searching online resulted in a few posts related to a plugin for static comments in Jekyll.

This post only covers dealing with the comments, and not how the reader actually submits a comment. I’ll save that for the next post.

Code changes

I settled on the following naming scheme for comments. The comments for a post P, which is found at posts/<P>.mkd, will be put into files named comments/<P>-c000.mkd, comments/<P>-c001.mkd, and so on. The crucial bits are, first, that the post’s name is a prefix of all its comments’ names, and second, that the identifiers (basically the filenames) of the comments are, just like identifiers for posts, easy to sort in date order.
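
As an example, a (hypothetical) post and its first two comments would be laid out like this:

posts/2014-10-05-000-some-post.mkd
comments/2014-10-05-000-some-post-c000.mkd
comments/2014-10-05-000-some-post-c001.mkd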

Adding a rule for the comments is easy:

match "comments/*" $ compile pandocCompiler

Then it got a little more tricky. The comments for each post need to be put into the context used to build the posts. Previously I’ve used field, which takes a function turning an Item String into a Compiler String. I’ve also used listField, which ties a key to a list of Item a. What I needed here though doesn’t seem to exist, i.e. a context function that takes an Item a and returns a list of Item a. So after a bit of studying the source of field and listField I came up with listFieldFunc:

listFieldFunc :: Show a => String -> Context a -> (Item a -> Compiler [Item a]) -> Context a
listFieldFunc key ctx func = Context $ \ k i -> if k == key then value i else empty
    where
        value i = do
            is <- func i
            return $ ListField ctx is

The function for extracting a post’s comments can then be written as

getComments :: (Binary a, Typeable a) => Pattern -> Item a -> Compiler [Item a]
getComments pattern item = do
    idents <- getMatches pattern >>= sortChronological
    let iId = itemIdentifier item
        comments = filter (isCommentForPost iId) idents
    mapM load comments

isCommentForPost :: Identifier -> Identifier -> Bool
isCommentForPost post comment = let
        postBase = takeBaseName $ toFilePath post
        cmtBase = takeBaseName $ toFilePath comment
    in isPrefixOf postBase cmtBase

Adding the key to the context used for the posts results in

let postCtx =
        field "previousPostUrl" (previousPostUrl "posts/*") <>
        field "previousPostTitle" (previousPostTitle "posts/*") <>
        field "nextPostUrl" (nextPostUrl "posts/*") <>
        field "nextPostTitle" (nextPostTitle "posts/*") <>
        field "postId" getPostId <>
        listFieldFunc "comments" defaultContext (getComments "comments/*") <>
        baseCtx

Template changes

The template changes are trivial of course

$for(comments)$
<div>
<p>$author$</p>
$body$
</div>
$endfor$

Links to previous and next post

- Magnus Therning

Currently the landing page contains all the posts I’ve written so far. That works for now, but it won’t for very long if I keep on writing posts. It certainly won’t work when I get around to importing all the posts from my old blog. Clearly I need to limit the posts that appear in index.html at some point, but before that two things need to be in place:

  1. The post titles on the landing page should be links to the single-post page.
  2. The single-post pages should be linked together with links to the previous and next posts.

I found another Hakyll-based site that already implemented such previous-next links, Richard Goulter’s blog. He even has a post about its implementation. It was a rather straightforward implementation and it served well as inspiration.

Code changes

I want the links to have the post titles, not just generic text like “previous post” and “next post”. That meant putting a total of four values into the context used to render the individual post pages: two URLs and two titles. So postCtx now looks like this:

let postCtx =
        field "previousPostUrl" (previousPostUrl "posts/*") <>
        field "previousPostTitle" (previousPostTitle "posts/*") <>
        field "nextPostUrl" (nextPostUrl "posts/*") <>
        field "nextPostTitle" (nextPostTitle "posts/*") <>
        baseCtx

In implementing the functions to extract the URLs and titles I first tried working on a snapshot, but that only resulted in circular dependencies, so I had to follow Richard’s lead and use Identifier to drive the whole thing. When done with functions for extracting both URL and title there was an obvious pattern, so I generalised a little and refactored into a single function to do the grunt work:

withRelatedPost :: (MonadMetadata m, Alternative m) =>
    (Identifier -> [Identifier] -> Maybe t) -> (t -> m b) -> Pattern -> Item a -> m b
withRelatedPost r f pattern item = do
    idents <- getMatches pattern >>= sortRecentFirst
    let id = itemIdentifier item
        prevId = r id idents
    case prevId of
        Just i -> f i
        Nothing -> empty

The r argument is a function that given the ID of the current item, and a list of the IDs of all items (sorted with the most recent first) returns the ID of a single item. With that it’s possible to create two more functions, one that operates on the previous post and one on the next post:

withPreviousPost :: (MonadMetadata m, Alternative m) => (Identifier -> m b) -> Pattern -> Item a -> m b
withPreviousPost = withRelatedPost itemAfter
    where
        itemAfter x xs = lookup x $ zip xs (tail xs)

withNextPost :: (MonadMetadata m, Alternative m) => (Identifier -> m b) -> Pattern -> Item a -> m b
withNextPost = withRelatedPost itemBefore
    where
        itemBefore x xs = lookup x $ zip (tail xs) xs

Now getting the URLs and titles becomes a few single-line functions:

previousPostUrl :: Pattern -> Item String -> Compiler String
previousPostUrl = withPreviousPost (fmap (maybe empty toUrl) . getRoute)

previousPostTitle :: Pattern -> Item String -> Compiler String
previousPostTitle = withPreviousPost (\ i -> getMetadataField' i "title")

nextPostUrl :: Pattern -> Item String -> Compiler String
nextPostUrl = withNextPost (fmap (maybe empty toUrl) . getRoute)

nextPostTitle :: Pattern -> Item String -> Compiler String
nextPostTitle = withNextPost (flip getMetadataField' "title")

What’s left is putting the fields to use in a template; the template itself is in the following section. The compilation step is modified to use the new context, and the new template is applied after the snapshot is saved so the added links don’t appear in index.html.

pandocCompiler
    >>= loadAndApplyTemplate "templates/single-post.html" postCtx
    >>= saveSnapshot "posts-content"
    >>= loadAndApplyTemplate "templates/single-post-prev-next.html" postCtx
    >>= loadAndApplyTemplate "templates/default.html" postCtx
    >>= relativizeUrls

Template changes

The added template is short and rather obvious:

$body$

<div>
$if(previousPostUrl)$
<a href="$previousPostUrl$">&#10232; $previousPostTitle$</a>
$endif$

$if(nextPostUrl)$
<a href="$nextPostUrl$">$nextPostTitle$ &#10233;</a>
$endif$
</div>

Converting posts from WordPress

- Magnus Therning

I’ve found two tools for converting posts from a WordPress site to Hakyll (or to Jekyll too, I suppose): hakyll-convert and exitwp.

I ran both tools on an export of my posts (a tool included in WordPress), and both tools spat out one file per post. So far it looked good. Then I put all the posts into my Hakyll blog project and tried to build the site.

hakyll-convert

  • The conversion finished without any reported errors.
  • The individual files were rather badly named; the name of each was based on the post ID rather than on the post date (but there’s a bug report for that).
  • The posts were consumed by the build script without problems.
  • The resulting HTML was not satisfactory, but that’s not due to the tool, instead it’s my choice of using GeSHi (via WP-Syntax).

exitwp

  • The conversion finished with a few reported errors, which I didn’t investigate further.
  • The posts were not consumed by the build script due to categories and tags not being comma-separated, but rather YAML lists.

The conclusion is that hakyll-convert will be what I use, but it’ll take a little while before I get around to importing my old posts since it’ll require manual edits to ensure they look all right.

Dipping toes into CSS, normalizing

- Magnus Therning

The site doesn’t look that good. Actually it is pathetically simple. I’m not that good at CSS, however, and I don’t really have a good sense of design either, so making it look good is going to be an uphill battle. The first step is easy though: add a standard CSS file to normalize the look.

The CSS

I grabbed normalize.css from the web site.

Template changes

Well, obviously the CSS has to be pulled into the web pages. The full webpages are provided by templates/default.html and the line that needs adding is

<link rel="stylesheet" type="text/css" href="/css/normalize.css" />

Of course it goes into the <head> section of the file.

Code changes

The changes to the build script are equally straight forward.

    match "css/*" $ do
        route idRoute
        compile copyFileCompiler

I opted to just copy the file rather than compress it using compressCssCompiler. I don’t think the speedup is really worth it, and I personally find it very handy to be able to read the CSS files on sites that I think look nice. Of course I need to enable others to do the same.
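
For reference, the compressing variant would only differ in the compile step. A sketch (I’m sticking with the plain copy):

    match "css/*" $ do
        route idRoute
        compile compressCssCompiler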

Adding a feed

- Magnus Therning

A blog isn’t complete without a feed, so that was the first thing to add. Luckily there’s a good tutorial on adding a feed in Hakyll. The following code snippets are basically just copied from that tutorial.

I’ve also decided to start publishing the result; it’s available on my github pages.

RSS or atom?

I decided to use atom since it’s a standard. Yeah, not more to write about that I suppose.

Code changes

The first thing to do was to add a feed configuration. It’s a simple adaption on what’s found in the tutorial.

postFeedConfig :: FeedConfiguration
postFeedConfig = FeedConfiguration
    { feedTitle = "Magnus web site"
    , feedDescription = "Random stuff"
    , feedAuthorName = "Magnus Therning"
    , feedAuthorEmail = "magnus@therning.org"
    , feedRoot = "http://magthe.github.io"
    }

Then the build logic has to be extended with a rule for making atom.xml. This is also a straightforward adaptation of the information found in the tutorial.

    create ["atom.xml"] $ do
        route idRoute
        compile $ do
            posts <- fmap (take 50) . recentFirst =<< loadAllSnapshots "posts/*" "posts-content"
            let feedCtx = baseCtx <> bodyField "description"
            renderAtom postFeedConfig feedCtx posts

Once again the snapshot of the posts comes in handy. Since the old WordPress site limited the feed to the 50 latest posts I decided to do that in the new one too. Maybe it’s a bit excessive, 10-20 ought to be enough, but I’ll leave it for now. The feed-specific context is a nice little detail, also from the tutorial. The feed builder requires the presence of a description for each post, but to avoid having to remember to add one to every post I just add a description field containing the body of the post.

The generated feed is available at ./atom.xml

The current Hakyll build script

- Magnus Therning

I’m fairly sure I understand the current build script, and it seems to be rather minimal. Hopefully it can serve as an introduction for someone who, just like me, is new to Hakyll.

The site layout

Before getting into the build script it’s worth describing the folder layout of the project; it looks like this:

.
├── build_site.hs
├── index.html
├── posts
│   ├── 2014-09-23-000-moving-to-hakyll.mkd
│   └── 2014-09-23-001-hakyll-build-script.mkd
└── templates
    ├── default.html
    └── single-post.html

The templates

  • single-post.html: A template for, as the name suggests, a single post. It renders the post into an HTML snippet.
  • default.html: The main template, i.e. the one that provides the entire structure of a complete HTML page.

It should also be noted that index.html basically is a template itself. These three files fit together such that each post is turned into an HTML snippet (single-post.html), all the snippets are then pulled into index.html, which finally is wrapped into a proper page by default.html.
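
To make that concrete, a minimal single-post.html could look something like the following. This is an illustrative sketch rather than the actual file; it only uses fields available from baseCtx and defaultContext:

<h2>$title$</h2>
<p>$date$ - $author$</p>
$body$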

The script

The full script looks like this:

 1  #! /usr/bin/runhaskell
 2
 3  {-# LANGUAGE OverloadedStrings #-}
 4  import Data.Monoid
 5  import Hakyll
 6
 7  main :: IO ()
 8  main = hakyll $ do
 9      match "posts/*" $ do
10          route $ setExtension "html"
11          compile $ pandocCompiler
12              >>= loadAndApplyTemplate "templates/single-post.html" baseCtx
13              >>= saveSnapshot "posts-content"
14              >>= loadAndApplyTemplate "templates/default.html" baseCtx
15              >>= relativizeUrls
16
17      match "index.html" $ do
18          route idRoute
19          compile $ do
20              posts <- recentFirst =<< loadAllSnapshots "posts/*" "posts-content"
21              let indexCtx =
22                      listField "posts" baseCtx (return posts) <>
23                      baseCtx
24              getResourceBody
25                  >>= applyAsTemplate indexCtx
26                  >>= loadAndApplyTemplate "templates/default.html" indexCtx
27                  >>= relativizeUrls
28
29      match "templates/*" $ compile templateCompiler
30
31  baseCtx :: Context String
32  baseCtx =
33      dateField "date" "%Y-%m-%d" <>
34      constField "site-title" "Magnus web site" <>
35      constField "site-subtitle" "Random stuff" <>
36      constField "author" "Magnus Therning" <>
37      defaultContext

The only slightly unnecessary thing is that all posts are turned into complete web pages (line 14), but none of those pages is actually reachable from the generated landing page. My plan is to limit the number of posts on the landing page, so those pages will come into use later on.

For the moment some site constants are put into the base context. I’m not completely convinced that’s the wise thing to do, but I have a vague feeling that it’s better to put stuff like that into the context than to hard-code it into the templates. I guess the better solution would be to have a configuration file for them. That’s for the future though; for now I’m keeping it simple.

Moving to Hakyll

- Magnus Therning

I’ve decided to ditch WordPress and switch to Hakyll. I’ll keep the old site intact until I’ve gotten around to importing all the posts. That’ll probably take quite a while.

As you can see the site is terribly bare-bones at the moment. It’ll stay like this until I’ve worked out enough of Hakyll to get most of the items into the site. There’s simply no point in getting distracted with the craziness that is CSS and making it pretty until I’m happy with the content handling.

More Visitor (in Python)

- Magnus Therning

Right after writing the previous post on the Visitor pattern in Python I picked up another paper on the same topic, Visitor Combination and Traversal Control. Of course this one also used Java for its examples, so once again I decided to use Python to explore the ideas presented.

The first part of the paper is all about writing small visitors that are then combined into more complicated ones. This part is nice but not that exciting. The interesting bit is when it gets to controlling traversal, which makes it possible to remove the traversal code that usually appears in the accept method each visited type has to implement. Let’s see how that can look in Python.

The full code in this post is available at https://gist.github.com/magthe/beddad5c627946f28748.

First we need a structure to traverse, a simple tree will do.

class Tree(Visitable):
    def __init__(self, left, right):
        self.left = left
        self.right = right

class Leaf(Visitable):
    def __init__(self, value):
        self.value = value

def build_tree():
    l0 = Leaf(0)
    l1 = Leaf(1)
    t0 = Tree(l0, l1)
    l2 = Leaf(2)
    t1 = Tree(t0, l2)
    l3 = Leaf(3)
    l4 = Leaf(4)
    t2 = Tree(l3, l4)
    return Tree(t1, t2)

But before this we really should define Visitor, the base class for visitors, and Visitable, the base class of everything that can be visited.

class Visitor:
    def visit(self, obj):
        getattr(self, 'visit_' + obj.__class__.__name__)(obj)

    def visit_Tree(self, t):
        pass

    def visit_Leaf(self, l):
        pass

class Visitable:
    def accept(self, visitor):
        visitor.visit(self)

We’ll throw in a visitor for printing the whole tree too:

class Print(Visitor):
    def visit_Tree(self, t):
        print('Tree (%s)' % hex(id(t)))

    def visit_Leaf(self, l):
        print('Leaf (%s): %i' % (hex(id(l)), l.value))

Due to the lack of traversal in the accept methods it’s easy to be underwhelmed by the Print visitor:

In [32]: build_tree().accept(Print())
Tree (0x7f1680681a90)

To address this we first need a visitor combinator that runs two visitors in sequence. Unsurprisingly we’ll call it Sequence. Its constructor takes two visitors, and for each node in the tree it passes each one to the node’s accept method.

class Sequence(Visitor):
    def __init__(self, first, then):
        self.first = first
        self.then = then

    def visit_Tree(self, t):
        t.accept(self.first)
        t.accept(self.then)

    def visit_Leaf(self, l):
        l.accept(self.first)
        l.accept(self.then)

The next building block is a visitor that descends one level down from a Tree node.

class All(Visitor):
    def __init__(self, v):
        self.v = v

    def visit_Tree(self, t):
        t.left.accept(self.v)
        t.right.accept(self.v)

At this point it’s worth noting that the name All probably isn’t very well chosen, since we don’t really get all nodes:

In [33]: build_tree().accept(All(Print()))
Tree (0x7f1680681278)
Tree (0x7f1680681be0)

We only descend one level, but we still keep the name since that’s the name they use in the paper.

With this in place it does become possible to create combinations that do traverse the full tree though. It even becomes rather simple. Traversing top-down is a simple matter of using a sequence that ends with All, and bottom-up is a matter of using a sequence starting with All.

class TopDown(Sequence):
    def __init__(self, v):
        Sequence.__init__(self, v, All(self))


class BottomUp(Sequence):
    def __init__(self, v):
        Sequence.__init__(self, All(self), v)

First top-down:

In [34]: build_tree().accept(TopDown(Print()))
Tree (0x7f1680681ef0)
Tree (0x7f16806814a8)
Tree (0x7f16806813c8)
Leaf (0x7f1680681278): 0
Leaf (0x7f1680681550): 1
Leaf (0x7f1680681a90): 2
Tree (0x7f1680681f28)
Leaf (0x7f1680681ba8): 3
Leaf (0x7f1680681a20): 4

Then bottom-up:

In [35]: build_tree().accept(BottomUp(Print()))
Leaf (0x7f1680681ba8): 0
Leaf (0x7f16806814a8): 1
Tree (0x7f1680681a90)
Leaf (0x7f16806813c8): 2
Tree (0x7f1680681550)
Leaf (0x7f1680681278): 3
Leaf (0x7f16806a1048): 4
Tree (0x7f16806a1198)
Tree (0x7f16806a1390)

That’s all rather cute I think.

Dealing with Microsoft Products, or Battling Loss of Gumption

- Magnus Therning

I was slightly frustrated and irritated with a situation at work today, which caused me to think about the word “gumption” as it’s used in Pirsig’s Zen and the Art of Motorcycle Maintenance. That led me to Wikipedia’s article on gumption trap which in turn led me to learn about the concept of learned helplessness.

So, what was the situation and how is it connected to learned helplessness?

The rest is just slightly tongue-in-cheek ;)

What to standardise on

I’m in a situation where the powers-that-be have standardised on applications. Not on open formats or open protocols, but on specific applications that use proprietary formats and proprietary protocols. Of course these applications suck. That’s what a lack of competition does: it removes any incentive for a company to actually improve its applications! Some of these applications have captured such a large market share that reverse engineering of the formats was inevitable. Yay! That means I can use a sane OS and vastly better applications. However, one protocol is not reverse engineered yet and I’m forced to use the standard application. This application is painful to use and only runs on a crap OS.

How bad can it be? you ask. The application is Outlook, the OS is Windows! Yes! It’s that bad. Hence the thoughts of gumption, or rather the loss of it. Which is exactly what starting Outlook causes. Every time!

Connection to learned helplessness

It continues to amaze me that companies standardise on Windows and applications that only run on Windows. There are better alternatives, especially in this day and age with fast networks and powerful and fast execution environments that completely sidestep the whole question of which OS to run. Still there seems to be very little will to upgrade to Linux, or to standardise on web-based applications. Why is that? In the past I’ve thought it might be the network effect. Most often I’ve come to the conclusion that it most likely is simple inertia. What’s the explanation for the inertia though?

This is where learned helplessness can offer an explanation. People have been conditioned and have grown so used to Windows and other Microsoft products that they simply don’t recognise that there now is a way out. No matter how many escape routes become available, people simply won’t see them.

What to do about it

As the experiments on dogs showed there is hope (from the wikipedia page):

To change their expectation and to recover the dogs from helplessness, experimenters had to physically pick up the dogs and move the legs in a close replication of the physical actions the dogs needed to take to remove themselves from the electrified grid. This had to be replicated at least 2 times before the dogs would exhibit the functional response of jumping over the barrier to get away from the electrified grid. Threats, rewards, and observed demonstrations had no observed effect in helping the dogs to independently move away from the shocks.

Oh how I wish I could pull off the direct translation to my workplace: re-install my co-workers’ computers and replace servers and services. Too bad that’s not a realistic plan. What I can do though is civil disobedience (or maybe it should be called something like civil disobedience in the workplace instead). By simply not conforming, and at the same time showing that there are better ways of getting the job done, others will hopefully notice and either adopt my way, or come up with something that suits them better (which I then can learn from). Even if that doesn’t happen, at least I’ll keep my gumption at healthy levels :)

What I’m doing at the moment

This is what I’m doing at work right now to avoid loss of gumption:

  • Use Linux as my main OS.
  • Run Windows in a VM.
  • Use pandoc to generate MSWord docs.
  • Use LibreOffice.

Finally, for Outlook. The decision of the powers-that-be to disable IMAP forces me to:

  • Limit my mail reading to twice per day.
  • Be logged into Skype to make up for not reading mail more often.

Visitor and Walkabout (in Python)

- Magnus Therning

A couple of weeks ago I found a link to a stackoverflow question I’d saved away a long time ago. I’d saved it due to having asked myself the exact same question and being curious about the papers when I found that answer. Over the last few weeks I’ve made my way through those papers and as so often I found a couple of references to other papers that sound interesting. One such paper was The Essence of the Visitor Pattern by Jens Palsberg and C. Barry Jay. There was however one little thing that bugged me with the Walkabout pattern and I thought I’d try to work that out for myself. Of course I’m using Python rather than Java ;)

The full code can be found at https://gist.github.com/magthe/ad6e23fb560a8a494fd2

Visitor

The Visitor pattern separates the traversal and the operation. It does this by using an accept method in the classes making up the data structure; this method takes a Visitor instance which implements the operation to be carried out. This enables adding new operations without modifying the data structure.

First we need a simple structure to play with: a tree where each node can have zero or more sub-trees.

class Tree:
    def __init__(self, value, children=[]):
        self.value = value
        self.children = children

The implementation of accept for this type is rather obvious:

    def accept(self, visitor):
        visitor.visit(self)
        for c in self.children:
            c.accept(visitor)

Next we need an implementation of a Visitor that all visitors can derive from. Since Python’s dispatch doesn’t depend at all on the type of the argument, we’ll have to implement the necessary behaviour ourselves, i.e. inspect the type and then pick the correct method to call.

class Visitor:
    def visit(self, obj):
        func_name = 'visit_' + obj.__class__.__name__
        visit_func = getattr(self, func_name)
        visit_func(obj)

In order to visit a Tree type it also needs the appropriately named method:

    def visit_Tree(self, obj):
        pass

Now it’s easy to create a visitor. Here’s a very simple one:

class TreePrintVisitor(Visitor):
    def visit_Tree(self, obj):
        print('Tree (%s): %i' % (hex(id(obj)), obj.value))

Finally here’s a function that exercises what we’ve just put together:

def test_visitor():
    leaves = [Tree(42), Tree(17)]
    tree = Tree(1, leaves)
    printer = TreePrintVisitor()
    tree.accept(printer)

Looking at this it’s easy to see the objections Palsberg and Jay present in their paper:

  1. A data type is only ‘visitable’ if it has an accept method, and
  2. we must know the types of all objects we want to visit, so changes to the class structure require changes to the visitors.

The authors then introduce Walkabout in order to remove these limitations.

Walkabout

To remove these limitations the authors use reflection to find out if the visitor has a method to carry out the operation on the type of the current object. If such a method doesn’t exist they use reflection to find all members of the object and then visit them. The Walkabout class and its visit method then look something like this:

class Walkabout:
    def visit(self, obj):
        func_name = 'visit_%s' % obj.__class__.__name__
        if hasattr(self, func_name):
            visit_func = getattr(self, func_name)
            visit_func(obj)
        elif hasattr(obj, '__dict__'):
            for m in obj.__dict__.keys():
                self.visit(getattr(obj, m))

The accept method can be removed from Tree and the visitor is changed to include code to continue the traversal:

class TreePrintWalkabout(Walkabout):
    def visit_Tree(self, tree):
        print('Tree (%s): %i' % (hex(id(tree)), tree.value))
        for c in tree.children:
            self.visit(c)

The function for exercising this only changes slightly:

def test_walkabout():
    leaves = [Tree(42), Tree(17)]
    tree = Tree(1, leaves)
    printer = TreePrintWalkabout()
    printer.visit(tree)

This is where Palsberg and Jay stop, but I think there is one little detail in Walkabout that’s worth looking a little closer at.

Walkabout Two

I personally find it a little strange that the authors first note that the visitor pattern suffers from the limitation that one has to know up front the full set of types to visit, but then don’t recognise that their own solution instead requires one to know the shape of (parts of) the structure to operate on. In other words, the classes deriving from Walkabout are in charge of carrying on the traversal (the last two lines of visit_Tree above).

This little detail is of course easy to work around: we just modify visit to always visit all members, irrespective of whether there is a method to handle the current object. There may be cases where we want to stop the walking about for efficiency reasons; we can address that at the same time, and WalkaboutTwo then looks like this:

class WalkaboutTwo:
    def visit(self, obj):
        func_name = 'visit_%s' % obj.__class__.__name__
        descend = True
        if hasattr(self, func_name):
            visit_func = getattr(self, func_name)
            descend = visit_func(obj)
        if descend and hasattr(obj, '__dict__'):
            for m in obj.__dict__.keys():
                self.visit(getattr(obj, m))
        if descend and hasattr(obj, '__iter__'):
            for o in obj:
                self.visit(o)

One little detail above is that if we use only reflection to traverse the Tree we won’t actually find the sub-trees as immediate members, since they are contained in a list. We address this by checking whether the object has an iterator as well, and if it does we visit all its items.

2014-06-12 Fixed minor typo, thanks to Joe for pointing it out.

Localised configuration in Vim: localcfg

- Magnus Therning

For quite a while I’ve been using a small Vim plugin that lets me write configuration specific to a system; it loads a config file based on the system’s host name. Unfortunately I can’t seem to find that plugin anywhere now, so I’ve put it in a snippet. It has allowed me to keep a single clean Vim configuration and check it into version control, while still allowing for settings that are unique to a system.

Lately I’ve found it slightly limited though; I really wanted other things to be able to trigger the loading of some piece of configuration. So I wrote my first ever Vim plugin: localcfg.

Hopefully someone will find it useful.

Phabricator on ArchLinux

- Magnus Therning

At work we’ve been using Trac for quite a while now, but it’s always interesting to look at other options. When listening to a recent episode of GitMinutes on Blender’s move to using git I heard of Phabricator for the first time. There are good instructions for how to install it on Ubuntu, but I couldn’t find any for ArchLinux.

These are my notes for installing Phabricator on ArchLinux. Hopefully someone will find them useful.

Basic setup

I performed this install in a virtual machine (using VirtualBox). The virtual machine is configured with a bridged network and I gave it the FQDN of phabarch.vbox.net.

Packages

Beyond the packages installed as part of the basic install I needed the following packages:

  • lighttpd
  • git
  • mariadb
  • php
  • php-cgi
  • php-apcu
  • php-gd

Setup of mariadb

Following the instructions on the ArchLinux wiki page on mariadb I first started the service and then finished the installation:

# systemctl start mysqld.service
# mysql_secure_installation

After this I restarted the service, and made sure it’ll restart on system start:

# systemctl restart mysqld.service
# systemctl enable mysqld.service

I picked the root password mariaroot.

Setup of lighttpd

Modify /etc/lighttpd/lighttpd.conf to look something like this:

server.modules = (
  "mod_rewrite",
  "mod_fastcgi",
)

server.port = 80
server.username  = "http"
server.groupname = "http"
server.document-root = "/srv/http"
server.errorlog  = "/var/log/lighttpd/error.log"
dir-listing.activate = "enable"
index-file.names = ( "index.php", "index.html" )
static-file.exclude-extensions = ( ".php" )
mimetype.assign  = (
  ".html" => "text/html",
  ".txt" => "text/plain",
  ".css" => "text/css",
  ".js" => "application/x-javascript",
  ".jpg" => "image/jpeg",
  ".jpeg" => "image/jpeg",
  ".gif" => "image/gif",
  ".png" => "image/png",
  "" => "application/octet-stream"
  )

fastcgi.server += ( ".php" =>
  ((
    "bin-path" => "/usr/bin/php-cgi",
    "socket" => "/var/run/lighttpd/php.socket",
    "max-procs" => 1,
    "bin-environment" => (
      "PHP_FCGI_CHILDREN" => "4",
      "PHP_FCGI_MAX_REQUESTS" => "10000",
    ),
    "bin-copy-environment" => (
      "PATH", "SHELL", "USER",
    ),
    "broken-scriptfilename" => "enable",
  ))
)

$HTTP["host"] =~ "phabarch(\.vbox\.net)?" {
  server.document-root = "/srv/http/phabricator/webroot"
  url.rewrite-once = (
    "^(/rsrc/.*)$" => "$1",
    "^(/favicon.ico)$" => "$1",
    # This simulates QSA (query string append) mode in Apache
    "^(/[^?]*)\?(.*)" => "/index.php?__path__=$1&$2",
    "^(/.*)$" => "/index.php?__path__=$1",
  )
}

Setup of php

Modify /etc/php/php.ini and enable the following extensions:

  • mysqli.so
  • openssl.so
  • iconv.so
  • apcu.so
  • gd.so
  • posix.so

Also disable the open_basedir setting.

Getting and setting up Phabricator

Checking it out

I placed it in /srv/http:

# cd /srv/http
# git clone git://github.com/facebook/libphutil.git
# git clone git://github.com/facebook/arcanist.git
# git clone git://github.com/facebook/phabricator.git

Database configuration

# cd /srv/http/phabricator
# ./bin/storage upgrade --user root --password mariaroot
# ./bin/config set mysql.user root
# ./bin/config set mysql.pass mariaroot

Set the base URI

# cd /srv/http/phabricator
# ./bin/config set phabricator.base-uri 'http://phabarch.vbox.net/'

Diffusion configuration

# mkdir /var/repo
# ./bin/config set diffusion.allow-http-auth true

At this point I started lighttpd and used the web interface to configure environment.append-paths to include the path of git-core, /usr/lib/git-core.

phd configuration

First create the daemon user

# useradd -r -M -d /tmp phabd

Then create the phd log dir and set its owner and group:

# mkdir /var/tmp/phd/log
# chown -R phabd:phabd /var/tmp/phd/

Also make the daemon user the owner of the repo folder used by Diffusion:

# chown phabd:phabd /var/repo

Configure sudo

Create the file /etc/sudoers.d/phabricator with this content:

http ALL=(phabd) SETENV: NOPASSWD: /usr/lib/git-core/git-http-backend

User configuration

In the web interface set a VCS password for the user.

Play

Now the system should be ready to be played with :)

How do professional Windows programmers stand Visual Studio?

- Magnus Therning

I have a new assignment at work and now find myself at yet another Windows shop. They are making embedded systems, but have for some strange reason decided that Windows is the only development platform to use. After only a few weeks here I’m noticing a growing irritation with the tools offered for my use. The amount of forced mouse usage is astounding, and the main tool, Visual Studio, is turning out to be the main culprit. After a week or so of exposure to VS I’ve found what I consider to be a serious flaw in a tool for developers: it doesn’t scale.

  1. No hierarchical structure: It doesn’t scale very well with the size of a project. The concept of having a solution with a number of projects is not bad, but the implementation doesn’t scale. A project can’t have sub-projects, which means I can’t really layer the code in the IDE in the same way I do on disk. The only thing I can do is organise the viewing of files through the concept of filters.
  2. All configuration is manual, part 1: MS seems to have optimised for small projects. All configuration is kept in XML files, and the only interface to them is a set of property dialogues (some of which can be resized, others not) requiring an amazing amount of pointing and clicking to get anything done.
  3. All configuration is manual, part 2: MS has optimised for beginning new projects. Getting started is amazingly quick, but once you reach a size of about 10 projects it becomes daunting to fix anything that requires configuration changes in all projects. Making sure that all configurations are correct is a major undertaking and requires an insane amount of mouse use. Some earlier versions of VS even seem to have made it impossible to edit the common settings of configurations properly; a simple mistake and the value of one configuration is lost.
  4. Experience doesn’t pay off: There are no shortcuts to discover. The configuration is all in XML, which means it’s not really possible to jump out of VS and use a text editor to fix up semi-broken configurations (like dealing with someone’s decision to place all intermediate files and final results in the top-level solution directory).

So, how do Windows developers cope with this? Don’t they use VS? Do they police the configurations diligently to ensure no silliness creeps in? Or are they all using tools to address these points (like CMake)?

TCLAP for command line argument parsing in C++

- Magnus Therning

A long while ago I was looking for a way to handle command line arguments in a C++ program I was writing for Windows. At the time I only found Boost.Program_options. After a bit of experimenting I found that the pre-built Boost libs I found back then had some issues, and after a bit of thinking I decided to write the program in Python instead :) Now I once again find that I need to handle command line arguments in C++ on Windows, but this time I found quite a few options: gflags, ezOptionParser, and TCLAP. They are all stand-alone, meaning that I don’t need to pull in a huge dependency like Boost or Qt, and liberally licensed (BSD3 and MIT) so usage at work is no problem. After a bit of playing with all three I found that TCLAP is most to my liking, but gflags does have one nice little feature: it allows putting command line option definitions pretty much anywhere in the code. This would solve one of the problems I’m facing; the tool shall consist of a main program and pluggable modules, where each module must be able to add command line arguments of its own. However, it turns out that the TCLAP API easily allows implementing such a scheme. Here’s how I implemented it in the experiment code I’ve been playing with this weekend.

How it works

The basic idea is to create a central instance of TCLAP::CmdLine that options can be registered with, both in the program and in the pluggable modules. By declaring the option instances as top-level variables in compilation units it’s enough to just load a pluggable module for the constructors to run, and by passing the central TCLAP::CmdLine to the constructors they will register themselves properly. This seems to be the same idea used in gflags.

Registering a command line option

The following is the code my pluggable module uses to register an option:

TCLAP::ValueArg<int> value("v",
                           "value",
                           "ONE: the value to return",
                           false,
                           42,
                           "integer",
                           argparse::argparser);

The parser

I put the parser in its own namespace, and due to the API of TCLAP::CmdLine I found that I needed to subclass it.

#include <tclap/CmdLine.h>

namespace argparse {

class ArgParser : public TCLAP::CmdLine {
public:
    ArgParser() : TCLAP::CmdLine("No message set.", ' ', "No version set.") {}

    void setMessage(std::string msg) { _message = msg; }
    void setVersion(std::string version) { _version = version; }
};

extern ArgParser argparser;

}

The main function

After this I just need to instantiate the central parser instance and set it up.

argparse::ArgParser argparse::argparser;

int main(int ac, char **av)
{
    argparse::argparser.setMessage("The message.");
    argparse::argparser.setVersion("0.0.0");

    // load the plugin modules

    argparse::argparser.parse(ac, av);

    // do the work

    return 0;
}

Limitations

The main limitation I can see with this approach, and AFAICS that would be true for gflags as well, is that it’s not very easy to find the plugin modules to load by passing them on the command line. If the TCLAP API allowed parsing without acting on --help and --version, as well as being forgiving with arguments it doesn’t know how to handle, it would be possible to use an option in the main application to find the modules, load them, and then re-parse the command line. In my particular case it doesn’t matter much; the plugin modules will all be found in a well-known place and all available modules should be loaded every time. It does make testing a bit more cumbersome though.

Eclipse and greyed out #ifdef sections

- Magnus Therning

A note to the future me:

Are your #ifdef sections greyed out despite switching to a profile where the macro is set?

Read this bug comment!

Strachey, referential transparency, Haskell

- Magnus Therning

This is my input into the recent discussion on referential transparency (RT). I’m nowhere near as well versed in the subject as others, but how am I ever to learn anything unless I put my thoughts out there for them to be laughed at and ridiculed? ;)

It all started with a post on stackoverflow.com (http://stackoverflow.com/questions/210835/what-is-referential-transparency), which received several very long and detailed responses, in particular from Uday Reddy (here and here). His answers were also linked to from Reddit. His second response contains a link to an excellent paper by Strachey, Fundamental concepts in programming languages. I’d go as far as saying that, despite it being lecture notes rather than a fully worked paper, it ought to be required reading for all software developers.

The rest of what I write here hinges on me actually understanding what Strachey writes in his paper. Of course I’m looking forward to comments/corrections/etc that help me correct my understanding.

What Strachey says about RT

In section 3.2.1 he introduces RT like this:

One of the most useful properties of expressions is that called by Quine referential transparency. In essence this means that if we wish to find the value of an expression which contains a sub-expression, the only thing we need to know about the sub-expression is its value. Any other features of the sub-expression, such as its internal structure, the number and nature of its components, the order in which they are evaluated or the colour of the ink in which they are written, are irrelevant to the value of the main expression.

There is however a crucial bit just before that:

Like the rest of mathematics, we shall be concerned only with R-values.

That is, he starts out with a very limited subset of what most people would consider a usable imperative programming language.

He then dives into some more details in section 3.2.2 by adding the concept of environment, which is handled through the use of a where-clause, or alternatively using let-statements (this ought to make any Haskell developer feel right at home). After a few interesting sections on stuff like applicative structure, evaluation, and conditional expressions he finally tackles the issue of variables in section 3.3.1. There are two pieces to the trick; the first is to take advantage of his earlier insight that led to a split of values into L-values and R-values:

If we consider L-values as well as R-values, however, we can preserve referential transparency as far as L-values are concerned. This is because L-values, being generalised addresses, are not altered by assignment commands. Thus the command x := x+1 leaves the address of the cell representing x (L-value of x) unchanged although it does alter the contents of this cell (R-value of x). So if we agree that the values concerned are all L-values, we can continue to use where-clauses and lambda-expressions for describing parts of a program which include assignments.

The cost of this is that the entire theory constructed earlier for operations taking R-values now has to be revised to incorporate L-values. The outline for this is in the rest of section 3.3, and it basically comes down to including an abstract store in the environment. However, before doing that he mentions that:

I think these problems are inevitable and although much of the work remains to be done, I feel hopeful that when completed it will not seem so formidable as it does at present, and that it will bring clarification to many areas of programming language study which are very obscure today. In particular the problems of side effects will, I hope, become more amenable.

He does reach his goal, but it’s a bit unfortunate that he stops short of considering the wider problem of side effects. My assumption is that this would have to be dealt with in a similar way to assignment, but that would mean that rather than just adding a store to the environment, the world, or a subset of it, would need to be added.

An open question (to me) is whether anyone has built on Strachey’s work in this area and worked out the details of RT and general side effects.

RT in Haskell

The original question described RT as

it means you can replace equals with equals

which I actually think is a rather good, and very short, description of it. It’s not the full story, there are further important details, but it’s a good first intuition. Also, it’s a description usable in Haskell. Well, to be slightly more nuanced, it’s good for Haskell without IO (Haskell-IO). However, this is where the strict type system of Haskell really starts to shine because (here I’m probably a bit imprecise) we only have R-values in Haskell-IO. If we want to use assignment we add the use of a state monad, and we do that explicitly.

A former colleague of mine said that in Haskell we need to build up our own computational models ourselves. For instance, if we need assignment we use State, if we need to record progress we use Writer, etc. In other languages the language designer has already made all those choices for us; we don’t get to make them ourselves. For RT it means that Haskell is more explicit in what the environment of a function is.
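
To make that concrete, here’s a minimal sketch (assuming the mtl package): an “assignment” in Haskell is an explicit State computation, so the mutated cell shows up in the type rather than in some ambient environment.

import Control.Monad.State

-- A counter with explicit "assignment": the mutable cell is part of the
-- type (State Int), not something ambient.
tick :: State Int Int
tick = do
    n <- get       -- read the current R-value of the "variable"
    put (n + 1)    -- x := x + 1, made explicit
    return n

-- runState also makes the initial and final contents of the cell explicit:
-- runState tick 0 == (0, 1)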

Moving on to general side effects, those are also more explicit in Haskell since they have to happen inside the IO monad. That alone is a great boon for RT in Haskell since it becomes explicit where RT as worked out by Strachey applies directly, and where there are (hopefully amenable) problems of side effects left. Even further, in Haskell it’s possible to make subsets of IO (by wrapping IO, see e.g. my own posts on wrapping IO, part 1 and wrapping IO, part 2). I’m sure that if including the world in the environment is the way to achieve RT with general side effects, then it’s highly beneficial to be able to create subsets of the world.

RT in Haskell vs. RT in (common) imperative languages

Uday writes in his first answer that:

But, today, functional programmers claim that imperative programming languages are not referentially transparent. Strachey would be turning in his grave.

This may well be true, but I think that when a Haskell programmer says it, he’s only twitching slightly. The reason? Strachey writes:

Any departure of R-value referential transparency in a R-value context should either be eliminated by decomposing the expression into several commands and simpler expressions, or, if this turns out to be difficult, the subject of a comment.

Which is something that Haskell programmers do naturally by use of IO. That is, in Haskell you either have an R-value, and you clearly see that you do, or you put in a comment, which is encoded in the type of the function.
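
A tiny sketch of what such a “comment encoded in the type” looks like: two definitions that superficially do similar things get different types depending on whether side effects are involved, and every caller can see it.

-- Pure: only R-values are involved, so equals can be replaced by equals.
double :: Int -> Int
double x = x + x

-- The "comment" is the IO in the type: this computation may have side
-- effects, and everything that uses it is told so explicitly.
readInt :: IO Int
readInt = fmap read getLine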

This rather lengthy post basically arrives at the following, which is what I suspect the user pacala is saying about RT on Reddit:

Imperative languages may well be RT, but when trying to understand a code base the environment of each function is so large that understanding is an intractable problem. I don’t have this problem in Haskell.

Compiling boost for Windows, with MinGW on Linux

- Magnus Therning

Just in case you see the utter logic in developing for Windows on Linux :)

In the root of the unpacked Boost:

  • Run ./bootstrap.sh --with-python=$(which python2) --prefix=${HOME}/opt/boost-win --without-icu
  • Modify project-config.jam like this:

     # file.
     if ! gcc in [ feature.values <toolset> ]
     {
    -    using gcc ;
    +    using gcc : : i486-mingw32-g++ ;
     }
    
     project : default-build <toolset>gcc ;
  • Compile and install by running ./bjam --layout=system variant=release threading=multi link=shared runtime-link=shared toolset=gcc target-os=windows threadapi=win32 install

Limit what is built by adding e.g. --with-program_options to that last command.

Qt state machines and automatic timed transitions

- Magnus Therning

In the interest of full disclosure: this post is related to what I do for a living, development of and for embedded systems. I work for Semcon, but they don’t make me wear a suit and tie so these are my words, and mine alone.

A bit of background info

In a recent project we had a system where the turning of the wheels was controlled by a simple dial. It emitted pulses as it was turned and the pulse train was shifted slightly depending on the direction of the turn. In software this was mapped onto two signals, one for each direction, with one signal emitted for each pulse in the train. All very straightforward so far.

To avoid accidental changes of direction we decided that

  1. the wheels would only start turning after four initial signals had been received, and
  2. a full second without receiving any signal meant that the turning had stopped.

The solution

The application was to be implemented using Qt, so using the Qt state machine framework was an obvious choice. The full state machine wouldn’t have to be large, only 8 states. The initial state (sResting) would indicate that the system was in a steady state (no turning); from there any received signal would advance into a successive state (sOne, sTwo, sThree, sFour) to indicate the number of received signals. From the fourth state the machine would advance directly to a state (sTurning) where a received signal would initiate an actual turn of the wheels. The turning would happen upon the entry into two separate states (sTurnRight and sTurnLeft), and each of these states would instantly return to sTurning. All of this is simple and straightforward; what wasn’t so clear was how to implement the automatic return to the initial state after 1s of inactivity.

The implementation

As I like to do, I first experimented a little to find a suitable solution to the problem. What follows is the resulting code of that experiment. The final code used in the project ended up being very similar. It’s all based around the method postDelayedEvent() found in QStateMachine.

First off a new type of event is needed, a ReturnEvent:

class ReturnEvent : public QEvent
{
public:
    ReturnEvent() : QEvent(QEvent::Type(QEvent::User + 1)) {}
};

There is also a need for a new type of transition, ReturnTransition:

class ReturnTransition : public QAbstractTransition
{
public:
    ReturnTransition(QState *target=0) { if(target) setTargetState(target); }

protected:
    virtual bool eventTest(QEvent *e) {
        // Accept only the custom ReturnEvent type (QEvent::User + 1).
        return(e->type() == QEvent::Type(QEvent::User + 1));
    }

    virtual void onTransition(QEvent *) {}
};

For the experiment I decided to use a simple widget containing two buttons, it would also hold the state machine:

class MButtons : public QWidget
{
    Q_OBJECT;

public:
    MButtons(QStateMachine &m)
        : _right("Right"), _left("Left"),
        _m(m), _delayed(0) {
        QBoxLayout *lo = new QBoxLayout(QBoxLayout::TopToBottom);
        lo->addWidget(&_right);
        lo->addWidget(&_left);

        setLayout(lo);
    }
    virtual ~MButtons() {}

    QPushButton _right,
                _left;
    QStateMachine &_m;

The widget also holds the slots for all the state entry functions:

public slots:
    void sRestingEntered() {
        qDebug() << __PRETTY_FUNCTION__;
        if(_delayed) { _m.cancelDelayedEvent(_delayed); _delayed = 0; }
    }

    void sOneEntered() {
        qDebug() << __PRETTY_FUNCTION__;
        if(_delayed) { _m.cancelDelayedEvent(_delayed); _delayed = 0; }
        _delayed = _m.postDelayedEvent(new ReturnEvent, 1000);
    }

    void sTwoEntered() {
        qDebug() << __PRETTY_FUNCTION__;
        if(_delayed) { _m.cancelDelayedEvent(_delayed); _delayed = 0; }
        _delayed = _m.postDelayedEvent(new ReturnEvent, 1000);
    }
    void sThreeEntered() {
        qDebug() << __PRETTY_FUNCTION__;
        if(_delayed) { _m.cancelDelayedEvent(_delayed); _delayed = 0; }
        _delayed = _m.postDelayedEvent(new ReturnEvent, 1000);
    }
    void sFourEntered() {
        qDebug() << __PRETTY_FUNCTION__;
        if(_delayed) { _m.cancelDelayedEvent(_delayed); _delayed = 0; }
        _delayed = _m.postDelayedEvent(new ReturnEvent, 1000);
    }
    void sTurningEntered() {
        qDebug() << __PRETTY_FUNCTION__;
        if(_delayed) { _m.cancelDelayedEvent(_delayed); _delayed = 0; }
        _delayed = _m.postDelayedEvent(new ReturnEvent, 1000);
    }
    void sTurnRightEntered() {
        qDebug() << __PRETTY_FUNCTION__;
    }
    void sTurnLeftEntered() {
        qDebug() << __PRETTY_FUNCTION__;
    }

Sure, several of the entry functions could be folded into one, but in order to validate the idea it’s easier to make separate ones for each state. The pattern is easy to spot: on entry a delayed return event is registered (if there’s a previous one it’s replaced with a new one), except for the steady state (sResting), where any delayed event is removed, and the turning states (sTurnRight and sTurnLeft), since those states immediately return to sTurning anyway.

Finally it also holds the handle for the delayed event:

private:
    int _delayed;
};

Now the main function for setting it all up is simple:

int main(int argc, char **argv)
{
    QApplication app(argc, argv);
    QStateMachine m;
    MButtons b(m);
    b.show();

    QState *sResting = new QState(),
           *sOne = new QState(),
           *sTwo = new QState(),
           *sThree = new QState(),
           *sFour = new QState(),
           *sTurning = new QState(),
           *sTurnRight = new QState(),
           *sTurnLeft = new QState();

    m.addState(sResting);
    m.addState(sOne);
    m.addState(sTwo);
    m.addState(sThree);
    m.addState(sFour);
    m.addState(sTurning);
    m.addState(sTurnRight);
    m.addState(sTurnLeft);
    m.setInitialState(sResting);

    sResting->addTransition(&b._right, SIGNAL(clicked()), sOne);
    sResting->addTransition(&b._left, SIGNAL(clicked()), sOne);
    sOne->addTransition(&b._right, SIGNAL(clicked()), sTwo);
    sOne->addTransition(&b._left, SIGNAL(clicked()), sTwo);
    sOne->addTransition(new ReturnTransition(sResting));
    sTwo->addTransition(&b._right, SIGNAL(clicked()), sThree);
    sTwo->addTransition(&b._left, SIGNAL(clicked()), sThree);
    sTwo->addTransition(new ReturnTransition(sResting));
    sThree->addTransition(&b._right, SIGNAL(clicked()), sFour);
    sThree->addTransition(&b._left, SIGNAL(clicked()), sFour);
    sThree->addTransition(new ReturnTransition(sResting));
    sFour->addTransition(sTurning);
    sTurning->addTransition(&b._right, SIGNAL(clicked()), sTurnRight);
    sTurning->addTransition(&b._left, SIGNAL(clicked()), sTurnLeft);
    sTurning->addTransition(new ReturnTransition(sResting));
    sTurnRight->addTransition(sTurning);
    sTurnLeft->addTransition(sTurning);

    QObject::connect(sResting, SIGNAL(entered()), &b, SLOT(sRestingEntered()));
    QObject::connect(sOne, SIGNAL(entered()), &b, SLOT(sOneEntered()));
    QObject::connect(sTwo, SIGNAL(entered()), &b, SLOT(sTwoEntered()));
    QObject::connect(sThree, SIGNAL(entered()), &b, SLOT(sThreeEntered()));
    QObject::connect(sFour, SIGNAL(entered()), &b, SLOT(sFourEntered()));
    QObject::connect(sTurning, SIGNAL(entered()), &b, SLOT(sTurningEntered()));
    QObject::connect(sTurnRight, SIGNAL(entered()), &b, SLOT(sTurnRightEntered()));
    QObject::connect(sTurnLeft, SIGNAL(entered()), &b, SLOT(sTurnLeftEntered()));

    m.start();

    return(app.exec());
}

Conclusion and open questions

I’m fairly happy with the solution, but I’d be curious how other people, people more skilled in using Qt, would have solved the problem.

For a while I considered solving the skipping of the four initial signals using a single state and a counter, but I saw no obvious easy way to implement that, so I instead opted to use separate states. Slightly wasteful of resources, but not too bad, and simplicity is important. I’m very curious to find out if there’s a simple way to implement it using a single state.

Manual setup of Qt+Eclipse on Windows

- Magnus Therning

Before the weekend I started looking at using Qt on Windows. More specifically I wanted to know whether this combination could be an option for a sub-project at work. We need to develop a program for the Windows desktop, and due to the overall context it would make sense to write it in C++ (that’s what we use for another part of the project). We already use both Eclipse and Visual Studio in the project, but I strongly prefer Eclipse, so I was hoping to be able to use it. However, it seems that the Qt developers strongly favour their own tool Qt Creator, though there are (outdated?) integrations for both Eclipse and Visual Studio. I’d rather avoid introducing a third IDE into a project—two is already one too many in my opinion. Anyway, I think I managed to find an acceptable configuration of Eclipse without using that old Qt integration plugin, together with MSVC (I was using the gratis version of MSVC for this).

Qt setup

I decided to install Qt into C:\QtSDK, and then I made the following permanent changes to the environment:

> set QTDIR=C:\QtSDK\Desktop\Qt\4.8.0\msvc2010
> set QMAKESPEC=%QTDIR%\mkspecs\win32-msvc2010
> set PATH=%PATH%;%QTDIR%\bin;C:\QtSDK\QtCreator\bin

Starting Eclipse so that it finds the compiler

It’s slightly disappointing that Eclipse happily lets one create an MSVC project that isn’t buildable because it doesn’t know where the compiler is located. One easy way to remedy that seems to be a BAT file that sets up the proper environment before starting Eclipse:

@echo off
setlocal
call "C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\vcvarsall.bat"
start C:\Eclipse\Indigo\eclipse.exe
endlocal

Creating the project

Creating a “makefile” project in Eclipse is fairly straightforward: create a C/C++ project of the makefile type, and make it empty too so that there isn’t any cruft in the way. Then add a single source file, e.g. main.cxx:

#include <iostream>
#include <Qt/QtGui>

int main(int argc, char **argv)
{
    std::cout << __FUNCTION__ << std::endl;
    QApplication app(argc, argv);
    return(app.exec());
}

And then a project file, e.g. Test.pro:

TEMPLATE = app
TARGET = 
DEPENDPATH += .
INCLUDEPATH += .

CONFIG += qt

HEADERS +=
SOURCES += main.cxx

After this use qmake to create the required makefile. I decided to use a subdirectory (_build) in the project, which qmake seems to have full support for:

> qmake ..\Test.pro

Setting up building from Eclipse

In the project properties modify the C/C++ Build settings for the Debug target. Instead of the default build command (which is make) one can use nmake, or even better jom:

  • Build command: C:/QtSDK/QTCreator/bin/jom -f Makefile.Debug
  • Build directory: ${workspace_loc:/Test}/_build

Then one can create a Release target, which differs only in that it builds using Makefile.Release.

Running qmake from inside Eclipse

It’s very convenient to be able to run qmake and re-generate the makefiles from inside Eclipse. One can set that up by adding an external tool:

  • Location: C:\QtSDK\Desktop\Qt\4.8.0\msvc2010\bin\qmake.exe
  • Working directory: ${workspace_loc:/Test}/_build
  • Arguments: ../Test.pro

In closing

I plan to also have a look at the Qt Visual Studio Add-in, though I suspect we might be using the latest version of VS, which might cause trouble.

Suggestions for further integration with Eclipse would be most welcome, e.g. for forms and translations.

LXDE and multiple screens: replacing lxrandr with a script

- Magnus Therning

When using Gnome3 I was really impressed with the support for multiple screens. Then I switched to LXDE and was very disappointed in that desktop’s support for multiple screens. In fact so disappointed that I sat down and read the man-page for ‘randr’ and hacked up the following script:

#! /bin/bash

cmd=$1; shift

case $cmd in
    on)
        # turn on VGA1, auto-select size, right of laptop screen
        xrandr --output VGA1 --auto --right-of LVDS1
        ;;
    off)
        xrandr --output VGA1 --off
        ;;
    list)
        xrandr
        ;;
    *)
        echo "Commands: on, off, list"
esac

In my mind it’s vastly more usable than ‘lxrandr’ :)

0MQ and Haskell

- Magnus Therning

Ever since I heard the FLOSS weekly episode on 0MQ I’ve been looking for a reason to take a look at it. Well, to hell with reason, I’ll have a first look without any specific goal in mind.

I found a simple introduction to it in Nicholas Piël’s post ZeroMQ an introduction. The only issue was that it was based on Python, and Python2 at that. So here are my attempts at translating two of the clients to Haskell (using zeromq-haskell).

req-rep

Here’s the client in Python3 first:

import zmq

ctx = zmq.Context()
socket = ctx.socket(zmq.REQ)
socket.connect('tcp://127.0.0.1:5000')

for i in range(10):
    msg = "msg %s" % i
    socket.send_unicode(msg)
    print('Sending', msg)
    msg_in = socket.recv()

And here in Haskell:

import System.ZMQ
import Data.ByteString.Char8 as CBS

main = withContext 1 $ \ ctx -> withSocket ctx Req $ \ soc -> do
    connect soc "tcp://127.0.0.1:5000"
    let msgs = [pack ("msg " ++ show i) | i <- [0..9]]
    flip mapM_ msgs $ \ msg -> do
        send soc msg []
        CBS.putStrLn msg
        receive soc []

pub-sub

In Python3:

import zmq

ctx = zmq.Context()
socket = ctx.socket(zmq.SUB)
socket.connect('tcp://127.0.0.1:5000')
socket.setsockopt(zmq.SUBSCRIBE, b'sweden')
socket.setsockopt(zmq.SUBSCRIBE, b'denmark')

while True:
    print(socket.recv())

Haskell:

import System.ZMQ
import Control.Monad
import Data.ByteString.Char8 as CBS

main = withContext 1 $ \ ctx -> withSocket ctx Sub $ \ soc -> do
    connect soc "tcp://127.0.0.1:5000"
    subscribe soc "sweden"
    subscribe soc "denmark"
    forever $ receive soc [] >>= CBS.putStrLn

Two comments on the Haskell code here:

  • I’m not sure why, but the Haskell client dies after receiving just a few messages (they are properly filtered though).
  • The API for subscribe is a bit strange, it would make more sense if it took a ByteString rather than a String.

Shelltestrunner to the rescue

- Magnus Therning

A little while ago shelltestrunner was announced on haskell-cafe. At the time I was slowly losing hope of ever getting decent test coverage in cblrepo using HUnit. Using something like shelltestrunner could be an easier and more workable solution, especially since what cblrepo needed most in the short term was a bit of integration testing.

shelltestrunner is basically just a tool that runs shell commands and compares output (both stdout and stderr) and the exit code. It’s also possible to provide data to be passed to the command on stdin. The documentation on the shelltestrunner home page is very good and accessible. There are only a few things that I’d like to add to it:

  • Use the --with (`-w`) flag; it’s very handy to avoid littering the tests with long paths to the output of your build environment.
  • There is no support for set-up and tear-down steps in the tests (in my opinion this would be a very nice addition to the tool), so anything that needs to be set up for the actual tests will itself have to be a test.
  • There is no way to name tests (which would be another good addition), so I found it crucial to organise the tests into several files.

Compiling U-Boot for use in QEMU (VersatilePB)

- Magnus Therning

Since I’m now working a bit with embedded systems I thought I’d take a look at compiling for one of the ARM-based machines that QEMU supports. I settled for VersatilePB after finding this old-ish article. Rather optimistically I thought that maybe, just maybe, things had changed in a year and that the limitation on flash had been removed. How wrong I was.

I did find an easier way to get it working, though with the limitation that Linux has to be started via tftpboot or some other network-based fashion. The patch looks like this:

--- u-boot.orig/src/u-boot-2011.12/include/configs/versatile.h
+++ u-boot/src/u-boot-2011.12/include/configs/versatile.h
@@ -31,6 +31,8 @@
 #ifndef __CONFIG_H
 #define __CONFIG_H
 
+#define CONFIG_ARCH_VERSATILE_QEMU
+
 /*
  * High Level Configuration Options
  * (easy to change)

Then just go ahead and modify the default boot arguments (CONFIG_BOOTARGS in the same file) to your heart’s content to minimise the amount of manual work for booting.

Adjusting to Sweden with XKB

- Magnus Therning

Having lived outside of Sweden for about a decade I’ve grown accustomed to non-Swedish keyboard layouts, first the US (as it’s widely used in The Netherlands) and later on the UK layout. Moving back to Sweden had me swearing over the layout used here within only a few days. The placement of “{[]}” is especially painful. Clearly the Swedish layout wasn’t designed for developers! Rather than go on muscle memory I decided to first attempt a small change to the X key mappings.

I found a good description of per-user XKB configuration after a bit of searching. Then I modified it slightly to fit better in my Arch-based LXDE system.

The XKB config

I started with removing all the configuration I’d previously put into /etc/X11/xorg.conf.d – if I’m to use per-user configuration then there should be no system-wide settings at all. Then I put the output of setxkbmap -print into ~/.xkb/maps/$(hostname) as a starting point. The main goal is to move the characters that require awkward single-hand combinations with AltGr to slightly more comfortable mappings. After a bit of experimentation I settled on the following (which I put in ~/.xkb/symbols/sedev):

partial alphanumeric_keys
xkb_symbols "devkeys" {
    key <AD01> { [ q, Q, backslash ] };
    key <AD02> { [ w, W, asciitilde ] };

    key <AC01> { [ a, A, braceleft ] };
    key <AC02> { [ s, S, bracketleft ] };
    key <AC03> { [ d, D, bracketright ] };
    key <AC04> { [ f, F, braceright ] };
};

After setting it manually and verifying that the new mappings work I added it to my keymap, which ended up looking like this

xkb_keymap {
    xkb_keycodes  { include "evdev+aliases(qwerty)" };
    xkb_types     { include "complete" };
    xkb_compat    { include "complete" };
    xkb_symbols   { include "pc+se(nodeadkeys)+inet(evdev)+capslock(swapescape)+compose(paus)+sedev(devkeys)" };
    xkb_geometry  { include "pc(pc104)" };
};

Tying it together

Now all that remains is to load the new configuration on login. Based on madduck’s example I put the following into ~/.xprofile:

# load XKB, if there is one
XKBDIR=${HOME}/.xkb
XKBMAPFILE=${XKBDIR}/keymap/$(hostname)
if [[ -f ${XKBMAPFILE} ]]; then
    xkbcomp -I${XKBDIR} ${XKBMAPFILE} ${DISPLAY}
fi

Now I just have to get used to using the new mappings.

LXDE and xmonad

- Magnus Therning

A few days ago I created the page on LXDE and Xmonad on the Xmonad area of the Haskell Wiki. It’s very short, mainly due to it being very simple to set up. My config is a bit bare-bones at the moment though and I’m sure others have more to contribute.

And yes! This means I’ve left the Gnome camp. Quite possibly for good.

Xmonad and Gnome 3

- Magnus Therning

The upgrade to Gnome3 in ArchLinux a few days ago broke my previous setup that combined xmonad with Gnome. Gnome 3 has a fallback mode, but I found that the instructions for replacing metacity under Gnome 2 no longer worked. With some help from the xmonad mailing list (in particular Jens Petersen and his efforts of providing a working setup on Fedora) I now finally have a working setup again. Here’s how I did it.

Add a session file for use by Gnome Session (/usr/share/gnome-session/sessions/xmonad.session):

[GNOME Session]
Name=Xmonad session
RequiredComponents=gnome-panel;gnome-settings-daemon;
RequiredProviders=windowmanager;notifications;
DefaultProvider-windowmanager=xmonad
DefaultProvider-notifications=notification-daemon

And a desktop file for GDM (/usr/share/xsessions/xmonad-gnome-session.desktop):

[Desktop Entry]
Name=Xmonad GNOME
Comment=Tiling window manager
TryExec=/usr/bin/gnome-session
Exec=gnome-session --session=xmonad
Type=XSession

That’s all it takes. Of course I’ve raised a ticket against the Arch package.

Per-user Gnome 3 configuration

- Magnus Therning

Gnome 3 just hit the official ArchLinux repos a few days ago. It’s new, it’s slick, it’s shiny… but I don’t think it’s ready for general use just yet. It seems stable enough, but there’s just a few too many things missing to make it feel like it’s complete. Anyway, running Arch means that at times one has to live with not-quite-release-ready software anyway :-)

The biggest issue I’ve come across with Gnome 3, and especially Gnome Shell and the window manager, is configuring the themes. I was pointed to a fairly good article on customising the Gnome Shell, but it suggests modifying system files which is a bad thing to do even on single-user systems. So this post should be read as an addendum to that one.

First of all install the User Theme Gnome Shell Extension. The available AUR packages pull the source from its git repo because there don’t seem to be any releases of the extension just yet. When using the bleeding edge source I ran into problems with Gnome Shell crashing, so I advise against using it. I’ve had success with the source tagged at 3.0.1, and you can find an Arch source package for Gnome Shell User Theme that I put together based on one of the AUR packages. Build and install that, then restart Gnome Shell (Alt-F2, r, return). Then verify that the extension has been loaded by using Looking Glass.

Then create copies of the default themes using rsync:

% rsync -a /usr/share/themes/Adwaita ~/.themes
% mv ~/.themes/Adwaita ~/.themes/Adwaita2
% mkdir -p ~/.themes/Default/gnome-shell
% rsync -a /usr/share/gnome-shell/theme ~/.themes/Default/gnome-shell

Then modify the file ~/.themes/Adwaita2/index.theme so that each mention of Adwaita says Adwaita2 instead, except for the cursor theme.

Make sure gnome-tweak-tool is installed (it’s in a package with the same name). Run it and change the shell theme to Default, the windows theme to Adwaita2, and the interface gtk+ theme to Adwaita2 as well.

Now you return to the article on configuring Gnome Shell, but instead of modifying the system files modify the ones in your ~/.themes.

ArchHaskell HABS with cblrepo

- Magnus Therning

As a follow-up to my earlier post on cblrepo I thought I’d convert a snapshot of ArchHaskell HABS to cblrepo. It’s mostly done as an exercise and to serve as an example. You can find it at http://www.kiwilight.com/~magnus/habs/. Of course I have used it to build all the packages, and I still have the result of that around, so if anyone asks I just might upload that as well.

Revisiting JSON in Haskell

- Magnus Therning

I just received an email with some praise for my earlier post on JSON in Haskell–it’s always nice to receive some praise ;-) However, the sender also mentioned that the mLookup function as coded there would blow up on incomplete JSON objects. That was by design, as a simplification, but the sender needed to deal with just that and asked if I had some more elegant solution than making every field in the data type a Maybe.

As I said, it’s always nice to receive praise, so here’s one solution that came to mind as I was reading the email.

I should mention that it relies on there being a reasonable default value for each type of the fields, and that the default is the same for all fields sharing a type.

First off, define a type class for types with default values:

class Defaultable d where
    def :: d

Then modify mLookup so that it uses Defaultable. I renamed it to mLookupAndReadJSON:

mLookupAndReadJSON a as = maybe def readJSON (lookup a as)
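
Its inferred type (roughly) also shows why an instance of Defaultable for Result a is needed below:

mLookupAndReadJSON :: (Defaultable (Result a), JSON a)
                   => String -> [(String, JSValue)] -> Result a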

Now we need to provide some instances of Defaultable too. I limit this example to cover only GlossDef, so only the following instances are required:

instance Defaultable [a] where
    def = []

instance Defaultable a => Defaultable (Result a) where
    def = Ok def
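
With those in place the readJSON from the original post only needs its two lookups swapped out; here’s a sketch of what that could look like (showJSON stays exactly as before):

instance JSON GlossDef where
    showJSON gd = makeObj
        [ ("para", showJSON $ glossDefPara gd)
        , ("GlossSeeAlso", showJSON $ glossDefSeeAlso gd)
        ]

    readJSON (JSObject obj) = let
            jsonObjAssoc = fromJSObject obj
        in do
            -- missing fields now fall back to def instead of failing
            para <- mLookupAndReadJSON "para" jsonObjAssoc
            seeAlso <- mLookupAndReadJSON "GlossSeeAlso" jsonObjAssoc
            return $ GlossDef
                { glossDefPara = para
                , glossDefSeeAlso = seeAlso
                }

    readJSON _ = fail ""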

Now it’s possible to decode incomplete JSON objects:

ghci> decode "{ \"GlossSeeAlso\": [\"GML\", \"XML\"] }" :: Result GlossDef
Ok (GlossDef {glossDefPara = "", glossDefSeeAlso = ["GML","XML"]})

I’m sure there are other ways of achieving what the author of the email asked for. Please let me know of them in comments.

Maintaining Haskell packages for a Linux distribution---cblrepo

- Magnus Therning

Maintaining a large set of Haskell packages for a Linux distribution is quite a chore. Especially if one wants to track Hackage as far as possible. Several distributions have tools to automatically convert Cabal-based packages into distribution packages, e.g. cabal2arch for ArchLinux and cabal-rpm. They are just conversion tools though, and the most time-consuming activity in maintaining Haskell packages is resolving and verifying dependencies.

At least that was my experience when I was actively involved in ArchHaskell. I only saw two options when adding or upgrading a package, either I worked out dependencies manually, or I simply tried it out. Neither of them was very appealing, and both were very time-consuming. It seemed obvious that I needed some tool to help out.

Enter cblrepo!

It allows me to maintain a database of specific versions of packages, and when I want to upgrade a package, or add a new one, it’ll verify that all dependencies can be satisfied. In other words, it helps me maintain a buildable set of packages at all times.

The tool also has some functionality that helps in tracking Hackage as packages are updated there.

Something about how it works

At the moment I maintain a small repository of Arch packages, mostly just to try out cblrepo and convince myself that it works. The work environment contains a database and a directory of patches:

% ls
cblrepo.db  patches/
%

The database is a cleartext file containing the information on the packages. It’s basically just a dump of the related Haskell datatype, encoded in JSON. The patches directory holds patches for Cabal files and PKGBUILD files. They must be named patch.cabal.<hackage name> or patch.pkgbuild.<hackage name> in order to be picked up by cblrepo.

There’s also an application directory (~/.cblrepo) for caching info about the packages available on Hackage:

% ls ~/.cblrepo
00-index.tar.gz
%

How to use it

A session with cblrepo looks something like this. First we update the information about what packages are available on Hackage:

% cblrepo idxsync
%

After that it’s possible to see what packages are out-of-date:

% cblrepo updates
cmdargs: 0.6.8 (0.6.9)
test-framework-th: 0.1.3 (0.2.0)
xml: 1.3.7 (1.3.8)
language-haskell-extract: 0.1.2 (0.2.0)
blaze-builder: 0.2.1.4 (0.3.0.0)
%

Let’s check whether cmdargs can be updated:

% cblrepo add -n cmdargs,0.6.9
%

It generates no output, so that means it’s possible to update. When attempting to add all the packages we run into a problem:

% cblrepo add -n cmdargs,0.6.9 \
> test-framework-th,0.2.0 \
> xml,1.3.7 \
> language-haskell-extract,0.2.0 \
> blaze-builder,0.3.0.0
Adding blaze-builder 0.3.0.0 would break:
  haxr : blaze-builder ==0.2.*

We’ll leave blaze-builder at the current version for now:

% cblrepo add cmdargs,0.6.9 \
> test-framework-th,0.2.0 \
> xml,1.3.7 \
> language-haskell-extract,0.2.0
%

After these updates we also need to make sure that all packages that depend on these ones are re-built; that is, we need to bump their release version:

% cblrepo bump -n cmdargs \
> test-framework-th \
> xml \
> language-haskell-extract
Would bump:
test-framework
test-framework-hunit
test-framework-quickcheck2
%

Just re-run that without the -n to actually perform the bump. Now that all this is done we need to generate the files necessary to build the Arch packages. We can easily check what packages need re-building, and get a good order for building them:

% cblrepo build cmdargs \
> test-framework-th \
> xml \
> language-haskell-extract
cmdargs
xml
test-framework
test-framework-quickcheck2
test-framework-hunit
language-haskell-extract
test-framework-th
%

And generating the required files is also easy:

% cblrepo pkgbuild $(!!)
% tree
.
|-- cblrepo.db
|-- haskell-cmdargs
|   |-- haskell-cmdargs.install
|   `-- PKGBUILD
|-- haskell-language-haskell-extract
|   |-- haskell-language-haskell-extract.install
|   `-- PKGBUILD
|-- haskell-test-framework
|   |-- haskell-test-framework.install
|   `-- PKGBUILD
|-- haskell-test-framework-hunit
|   |-- haskell-test-framework-hunit.install
|   `-- PKGBUILD
|-- haskell-test-framework-quickcheck2
|   |-- haskell-test-framework-quickcheck2.install
|   `-- PKGBUILD
|-- haskell-test-framework-th
|   |-- haskell-test-framework-th.install
|   `-- PKGBUILD
|-- haskell-xml
|   |-- haskell-xml.install
|   `-- PKGBUILD
`-- patches

8 directories, 15 files
%

Now all that’s left is running makepkg in each of the directories, in the order indicated by cblrepo build above.

Unfortunately they won’t all build—generating the Haddock docs for test-framework-th fails. That’s however fairly easy to remedy by patching the PKGBUILD to disable the generation of docs.

I’ll get back to that in a later post though.

Your comments, please

Please leave comments and suggestions. I’m planning on uploading the source to github shortly.

On maintaining Haskell packages for a Linux distro

- Magnus Therning

When trying to maintain a set of binary packages of Haskell libraries for a Linux distribution there are a few issues that come up:

  1. The set of packages must be compilable at all times, and
  2. Updating one package requires all packages that depend on it, in one or more steps, to be re-compiled.

The first requires keeping track of all dependencies of the packages in the set and making sure that they are satisfiable at all times. For a while I was doing this by simply attempting to compile all updated packages and checking for breakages, which was both time-consuming and painful when build failures had to be resolved.

The second requires bumping the package release number for all packages that are reachable when following the dependencies in the reverse direction. Doing this manually is tedious and very error prone in my experience.

Of course it ought to be possible to make this a lot easier with the help of a tool. The last few days I’ve been writing such a tool. This is how I’ve been using it so far.

Building the initial database

GHC in ArchLinux ships with a few Haskell libraries and ArchLinux also has a few Haskell packages in its base repositories. Since I don’t need to maintain any of those packages I decided to treat these as a sort of base. Adding those is as simple as this:

% head base-pkgs
base,4.2.0.2
array,0.3.0.1
bytestring,0.9.1.7
Cabal,1.8.0.6
containers,0.3.0.0
directory,1.0.1.1
extensible-exceptions,0.1.1.1
filepath,1.1.0.4
haskell98,1.0.1.1
hpc,0.5.0.5
% cblrepo addbasepkg $(cat base-pkgs)
Success

Then I need to add the packages of the binary repo provided by ArchHaskell. I wrote a little script that extracts the package name and version from the ArchHaskell HABS tree (get-ah-cabals):

#! /bin/bash

habsdir=$1

for d in ${habsdir}/habs/*; do
    . ${d}/PKGBUILD
    case $_hkgname in
        (datetime|haskell-platform)
            ;;
        (*)
            echo ${_hkgname},${pkgver}
            ;;
    esac
done

echo http://hackage.haskell.org/platform/2010.2.0.0/haskell-platform.cabal

Since haskell-platform isn’t on Hackage it requires special handling. The reason why datetime is excluded is slightly different. It’s the only package that requires old base (version <4). GHC in Arch does ship with both old and new base so datetime can be built, but cblrepo can’t deal with two versions of the same package. This is a limitation, but I’m not sure it’s worth fixing, since base is the only library that comes in two versions, and datetime is the only package that hasn’t been updated to use new base.

Knowing this it’s easy to add all the ArchHaskell packages to the database:

% cblrepo idxupdate
% cblrepo add $(get-ah-cabals path/to/habs)
Success

Attempting an update

Now it’s possible to attempt an update:

% cblrepo add neither,0.2.0
Failed to satisfy the following dependencies for neither:
  monad-peel >=0.1 && <0.2
Adding neither 0.2.0 would break:
  yesod : neither >=0.1.0 && <0.2
  persistent : neither >=0.1 && <0.2

The way to read this is that there first of all is a missing dependency to satisfy for neither itself, and second there are two packages, yesod and persistent, that wouldn’t be buildable if neither were updated.

Now if it were possible to update neither, what packages would require a bump?

% cblrepo bump neither     
persistent
yesod

Wrapping IO, part 2

- Magnus Therning

The previous post was a fairly silly example, unless of course it’s more useful than I realise :) However, here’s something that I can see a bit more use of: a monad that restricts reading and writing of files to two specific files, one to read from and one to write to.

Again, the first step is to create a data type:

newtype TwoFileIO a = TwoFileIO { execTwoFileIO :: (Handle, Handle) -> IO a }

This defines a type wrapping a function that takes a pair of handles (one for input and one for output) and returns an “IO action”. Turning this into a monad is straightforward (actually it’s similar to the Reader monad):

instance Monad TwoFileIO where
    return v = TwoFileIO $ \ _ -> return v
    (>>=) m f = let
            fInIO = execTwoFileIO . f
        in TwoFileIO $ \ hs ->
            execTwoFileIO m hs >>= \v -> fInIO v hs

To return a value we can simply drop the pair of handles and return the value in IO. Bind (>>=) only looks complicated: what happens is that the first argument is “executed” with the provided handles, then the second argument is passed the result and executed with the same pair of handles. Of course the handles aren’t actually known yet, so an anonymous function is created and wrapped in an instance of TwoFileIO. That’s it for the most complicated part.
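
(With GHC 7.10 and later a Monad also needs Functor and Applicative instances, so to compile this with a modern compiler something along the lines of the following minimal sketch has to be added as well.)

instance Functor TwoFileIO where
    fmap f m = TwoFileIO $ \ hs -> fmap f (execTwoFileIO m hs)

instance Applicative TwoFileIO where
    pure v = TwoFileIO $ \ _ -> return v
    mf <*> mv = TwoFileIO $ \ hs ->
        execTwoFileIO mf hs >>= \ f -> fmap f (execTwoFileIO mv hs)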

In order to avoid having to manually open files and wire everything up I wrote the following convenience function:

runFileIO m iFn oFn = do
    iH <- openFile iFn ReadMode
    oH <- openFile oFn WriteMode
    res <- execTwoFileIO m (iH, oH)
    mapM_ hClose [iH, oH]
    return res

Yes, it does lack a bit in exception handling, but it’s good enough for now.

Then I can define the actions/functions that are available inside TwoFileIO. Reading and writing lines:

fioPutStrLn s = TwoFileIO $ \ (iH, oH) ->
    hPutStrLn oH s

fioGetLine = TwoFileIO $ \ (iH, oH) ->
    hGetLine iH

Note how it now becomes very hard to mix up the files and accidentally read from the output file or write to the input file.

As a little test function I used this one, which reads two lines and then writes them in the reversed order:

get1stN2ndPutLast = do
    first <- fioGetLine
    second <- fioGetLine
    fioPutStrLn second
    fioPutStrLn first

I can now test this using ghci:

> h <- openFile "testIn.txt" ReadMode
> hGetContents h
"line 0\nline 1\nline 2\n"
> runFileIO get1stN2ndPutLast "testIn.txt" "testOut.txt"
> h <- openFile "testOut.txt" ReadMode
> hGetContents h
"line 1\nline 0\n"

Wrapping IO, part 1

- Magnus Therning

I’ve many times heard that Haskell can be used to prevent certain kinds of programmer mistakes. In a presentation on Darcs it was explained how GADTs (especially phantom types) are used in Darcs to make sure that operations on patches follow certain rules. Another way, and at least it sounds easier, is to limit the available functions by running code in some sort of container. This being Haskell, that container is often a monad. I’ve really never seen this presented1, so I thought I’d try to do it, and indeed it turns out to be very simple.

I started with a data type:

newtype HideIO a = HideIO { runHideIO :: IO a }

which I then made into a Monad in order to make it easy to work with:

instance Monad HideIO where
    return = HideIO . return

    (>>=) m f = HideIO $ runHideIO m >>= runHideIO . f

Then I can create an IO function that is allowed in the HideIO monad:

hioPutStrLn = HideIO . putStrLn

In ghci I can then do the following:

> runHideIO $ hioPutStrLn "Hello, World!"
Hello, World!

But I can’t do much else.
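
Extending the set of allowed operations is just a matter of wrapping more IO functions; a quick sketch (hioGetLine is only an illustrative name):

hioGetLine :: HideIO String
hioGetLine = HideIO getLine

-- Anything that isn't wrapped explicitly, e.g. readFile or network access,
-- simply isn't available inside HideIO.

After that runHideIO $ hioGetLine >>= hioPutStrLn echoes a line back, still without giving access to any other IO.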


  1. Most probably due to my weak searching-fu more than anything else.

JSON in Haskell

- Magnus Therning

The other day I wanted to experiment a bit with the JSON interface to AUR. Of course my first stop was at HackageDB to look for a Haskell package for parsing JSON. There are several of them, but only one that seemed suitable for some quick experimentation; in particular I wanted to avoid pre-defining data types for the objects in the JSON interface. That failed however, and I ended up switching to Python. It did bother me though, and later on, when I had some more time, I decided to have another look at json. I was also helped by Don’s recent work on wrapping up the AUR JSON interface in Haskell.

After some searching online I found a reasonably good example1:

{ "ID": "SGML"
, "SortAs": "SGML"
, "GlossDef":
    { "para": "A meta-markup language, used to create markup languages such as DocBook."
    , "GlossSeeAlso": ["GML", "XML"]
    }
}

As a slight aside, the absolutely easiest way to add JSON to your program is to derive Data (and by implication Typeable too). This is the way I might have represented the data above in Haskell2:

data GlossDef = GlossDef
    { glossDefPara :: String
    , glossDefSeeAlso :: [String]
    } deriving (Eq, Show, Typeable, Data) 

data GlossEntry = GlossEntry
    { glossEntryId :: String
    , glossEntrySortAs :: String
    , glossEntryGlossDef :: GlossDef
    } deriving (Eq, Show, Typeable, Data)

After that it’s as easy as using Text.JSON.Generic.toJSON followed by Text.JSON.encode:

> let gd = GlossDef "foo" ["bar", "baz"]
> let ge = GlossEntry "aa" "bb" gd
> putStrLn $ encode $ toJSON ge
{"glossEntryId":"aa","glossEntrySortAs":"bb","glossEntryGlossDef":{"glossDefPara":"foo","glossDefSeeAlso":["bar","baz"]}}

As can be seen the “names” of the members are derived from the field names in the datatypes. Great for when you are designing new JSON objects, not when you are writing code to parse an already existing object. For that there is another, more verbose way to do it.

Start with the same data types, but without deriving Typeable and Data:

data GlossDef = GlossDef
    { glossDefPara :: String
    , glossDefSeeAlso :: [String]
    } deriving (Eq, Show)

data GlossEntry = GlossEntry
    { glossEntryId :: String
    , glossEntrySortAs :: String
    , glossEntryGlossDef :: GlossDef
    } deriving (Eq, Show)

Then you have to implement Text.JSON.JSON. Only two of the four functions must be implemented: showJSON and readJSON. Starting with GlossDef:

instance JSON GlossDef where
    showJSON gd = makeObj
        [ ("para", showJSON $ glossDefPara gd)
        , ("GlossSeeAlso", showJSON $ glossDefSeeAlso gd)
        ]

Basically this part defers to the already supplied implementations for the fields’ types. The same approach works for readJSON too:

    readJSON (JSObject obj) = let
            jsonObjAssoc = fromJSObject obj
        in do
            para <- mLookup "para" jsonObjAssoc >>= readJSON
            seeAlso <- mLookup "GlossSeeAlso" jsonObjAssoc >>= readJSON
            return $ GlossDef
                { glossDefPara = para
                , glossDefSeeAlso = seeAlso
                }

    readJSON _ = fail ""

The function mLookup is a wrapper around lookup that makes it a bit nicer to work with in monads other than Maybe:

mLookup a as = maybe (fail $ "No such element: " ++ a) return (lookup a as)

(The choice to include the key in the string passed to fail limits the usefulness somewhat in the general case, but for this example it doesn’t make any difference.)

Implementing the interface for GlossEntry is analogous:

instance JSON GlossEntry where
    showJSON ge = makeObj
        [ ("ID", showJSON $ glossEntryId ge)
        , ("SortAs", showJSON $ glossEntrySortAs ge)
        , ("GlossDef", showJSON $ glossEntryGlossDef ge)
        ]

    readJSON (JSObject obj) = let
            jsonObjAssoc = fromJSObject obj
        in do
            id <- mLookup "ID" jsonObjAssoc >>= readJSON
            sortAs <- mLookup "SortAs" jsonObjAssoc >>= readJSON
            gd <- mLookup "GlossDef" jsonObjAssoc >>= readJSON
            return $ GlossEntry
                { glossEntryId = id
                , glossEntrySortAs = sortAs
                , glossEntryGlossDef = gd
                }

With the JSON object mentioned at the top in the file test.json the following is possible:

> f <- readFile "test.json"
> let (Ok j) = decode f :: Result GlossEntry
> putStrLn $ encode j
{"ID":"SGML","SortAs":"SGML","GlossDef":{"para":"A meta-markup language, used to create markup languages such as DocBook.","GlossSeeAlso":["GML","XML"]}}

I have a feeling the implementation of readJSON could be simplified by using an applicative style, but I leave that as an exercise for the reader :-)
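
For reference, here’s a sketch of what that could look like (it needs Control.Applicative imported on older compilers, and an Applicative instance for Result, which current versions of the json package provide; valueOf is just a local helper for the example):

    readJSON (JSObject obj) = GlossDef
            <$> valueOf "para"
            <*> valueOf "GlossSeeAlso"
        where
            jsonObjAssoc = fromJSObject obj
            valueOf k = mLookup k jsonObjAssoc >>= readJSON

    readJSON _ = fail ""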


  1. It’s a modified version of what I found here.

  2. The file should include {-# LANGUAGE DeriveDataTypeable #-} and both Data.Typeable and Data.Data must be imported.