Posts tagged ‘haskell’

0MQ and Haskell

Ever since I heard the FLOSS weekly episode on 0MQ I’ve been looking for a reason to take a look at it. Well, to hell with reason, I’ll have a first look without any specific goal in mind.

I found a simple introduction to it in Nicholas Piël’s post ZeroMQ an introduction. The only issue was that it was based on Python, and Python2 at that. So here are my attempts at translating two of the clients to Haskell (using zeromq-haskell).

req-rep

Here’s the client in Python3 first:

1
2
3
4
5
6
7
8
9
10
11
import zmq
 
ctx = zmq.Context()
socket = ctx.socket(zmq.REQ)
socket.connect('tcp://127.0.0.1:5000')
 
for i in range(10):
    msg = "msg %s" % i
    socket.send_unicode(msg)
    print('Sending', msg)
    msg_in = socket.recv()

And here in Haskell:

1
2
3
4
5
6
7
8
9
10
import System.ZMQ
import Data.ByteString.Char8 as CBS
 
main = withContext 1 $ \ ctx -> withSocket ctx Req $ \ soc -> do
    connect soc "tcp://127.0.0.1:5000"
    let msgs = [pack ("msg " ++ show i) | i <- [0..9]]
    flip mapM_ msgs $ \ msg -> do
        send soc msg []
        CBS.putStrLn msg
        receive soc []

pub-sub

In Python3:

1
2
3
4
5
6
7
8
9
10
import zmq
 
ctx = zmq.Context()
socket = ctx.socket(zmq.SUB)
socket.connect('tcp://127.0.0.1:5000')
socket.setsockopt(zmq.SUBSCRIBE, b'sweden')
socket.setsockopt(zmq.SUBSCRIBE, b'denmark')
 
while True:
    print(socket.recv())

Haskell:

1
2
3
4
5
6
7
8
9
import System.ZMQ
import Control.Monad
import Data.ByteString.Char8 as CBS
 
main = withContext 1 $ \ ctx -> withSocket ctx Sub $ \ soc -> do
    connect soc "tcp://127.0.0.1:5000"
    subscribe soc "sweden"
    subscribe soc "denmark"
    forever $ receive soc [] >>= CBS.putStrLn

Two comments on the Haskell code here:

  • I’m not sure why, but the Haskell client dies after receiving just a few messages (they are properly filtered though).
  • The API for subscribe is a bit strange, it would make more sense if it took a ByteString rather than a String.

Shelltestrunner to the rescue

A little while ago shelltestrunner was announced on haskell-cafe. At the time I was slowly losing hope on ever getting decent test coverage in cblrepo using HUnit. Using something like shelltestrunner could be an easier and more workable solution, especially since what cblrepo needed most in the short term is a bit of integration testing.

shelltestrunner is basically just a tool that runs shell commands and compares output (both stdout and stderr) and the exit code. It’s also possible to provide data to be passed to the command on stdin. The documentation on the shelltestrunner home page is very good and accessible. There are only a few things that I’d like to add to it:

  • Use the --with (`-w´) flag, it’s very handy to avoid littering the tests with long paths to the output of your build environment.
  • There is no support for set-up and tear-down steps in the tests (in my opinion this would be a very nice addition to the tool) so anything needed to be set up for the actual tests, will itself have to be tests.
  • There is no way to name tests (would be another good addition) so I found it crucial to organise tests into several files.

ArchHaskell HABS with cblrepo

As a follow-up to my earlier post on cblrepo I thought I’d convert a snapshot of ArchHaskell HABS to cblrepo. It’s mostly done as an exercise and to serve as an example. You can find it at http://www.kiwilight.com/~magnus/habs/.

Of course I have used it to build all the packages, and I still have the result of that around, so if anyone asks I just might upload that as well.

Maintaining Haskell packages for a Linux distribution—cblrepo

Maintaining a large set of Haskell packages for a Linux distribution is quite a chore. Especially if one wants to track Hackage as far as possible. Several distributions have tools to automatically convert Cabal-based packages into distribution packages, e.g. cabal2arch for ArchLinux and cabal-rpm. They are just conversion tools though, and the most time-consuming activity in maintaining Haskell packages is resolving and verifying dependencies.

At least that was my experience when I was actively involved in ArchHaskell. I only saw two options when adding or upgrading a package, either I worked out dependencies manually, or I simply tried it out. Neither of them was very appealing, and both were very time-consuming. It seemed obvious that I needed some tool to help out.

Enter cblrepo!

It allows me to maintain a database of specific versions of packages, and when I want to upgrade a package, or add a new one, it’ll verify that all dependencies can be satisfied. In other words, it helps me maintain a buildable set of packages at all times.

The tool also has some functionality that helps in tracking Hackage as packages are updated there.

Something about how it works

At the moment I maintain a small repository of Arch packages, mostly just to try out cblrepo and convince myself that it works. The work environment contains a database and a directory of patches:

% ls
cblrepo.db  patches/
%

The database is a cleartext file containing the information on the packages. It’s basically just a dump of the related Haskell datatype, encoded in JSON. The patches directory holds patches for Cabal files and PKGBUILD files. They must be named patch.cabal.<hackage name> or patch.pkgbuild.<hackage name> in order to be picked up by cblrepo.

There’s also an application directory (~/.cblrepo) for caching info about the packages available on Hackage:

% ls ~/.cblrepo
00-index.tar.gz
%

How to use it

A session with cblrepo looks something like this. First we update the information about what packages are available on Hackage:

% cblrepo idxsync
%

After that it’s possible to see what packages are out-of-date:

% cblrepo updates
cmdargs: 0.6.8 (0.6.9)
test-framework-th: 0.1.3 (0.2.0)
xml: 1.3.7 (1.3.8)
language-haskell-extract: 0.1.2 (0.2.0)
blaze-builder: 0.2.1.4 (0.3.0.0)
%

Let’s check whether cmdargs can be updated:

% cblrepo add -n cmdargs,0.6.9 %

It generates no output, so that means it’s possible to update. When attempting to add all the packages we run into a problem:

% cblrepo add -n cmdargs,0.6.9 \
> test-framework-th,0.2.0 \
> xml,1.3.7 \
> language-haskell-extract,0.2.0 \
> blaze-builder,0.3.0.0
Adding blaze-builder 0.3.0.0 would break:
  haxr : blaze-builder ==0.2.*

We’ll leave blaze-builder at the current version for now:

% cblrepo add cmdargs,0.6.9 \
> test-framework-th,0.2.0 \
> xml,1.3.7 \
> language-haskell-extract,0.2.0
%

After these updates we also need to make sure that all packages that depend on these ones are re-built, that is we need to bump their release version:

% cblrepo bump -n cmdargs \
> test-framework-th \
> xml \
> language-haskell-extract
Would bump:
test-framework
test-framework-hunit
test-framework-quickcheck2
%

Just re-run that without the -n to actually perform the bump. Now that all this is done we need to generate the files necessary to build the Arch packages. We can easily check what packages need re-building, and get a good order for building them:

% cblrepo build cmdargs \
> test-framework-th \
> xml \
> language-haskell-extract
cmdargs
xml
test-framework
test-framework-quickcheck2
test-framework-hunit
language-haskell-extract
test-framework-th
%

And generating the required files is also easy:

% cblrepo pkgbuild $(!!)
% tree
.
|-- cblrepo.db
|-- haskell-cmdargs
|   |-- haskell-cmdargs.install
|   `-- PKGBUILD
|-- haskell-language-haskell-extract
|   |-- haskell-language-haskell-extract.install
|   `-- PKGBUILD
|-- haskell-test-framework
|   |-- haskell-test-framework.install
|   `-- PKGBUILD
|-- haskell-test-framework-hunit
|   |-- haskell-test-framework-hunit.install
|   `-- PKGBUILD
|-- haskell-test-framework-quickcheck2
|   |-- haskell-test-framework-quickcheck2.install
|   `-- PKGBUILD
|-- haskell-test-framework-th
|   |-- haskell-test-framework-th.install
|   `-- PKGBUILD
|-- haskell-xml
|   |-- haskell-xml.install
|   `-- PKGBUILD
`-- patches

8 directories, 15 files
%

Now all that’s left is running makepkg in each of the directories, in the order indicated by cblrepo build above.

Unfortunately they won’t all build—generating the Haddock docs for test-framework-th fails. That’s however fairly easy to remedy by patching the PKGBUILD to disable the generation of docs.

I’ll get back to that in a later post though.

Your comments, please

Please leave comments and suggestions. I’m planning on uploading the source to github shortly.

On maintaining Haskell packages for a Linux distro

When trying to maintain set of binary packages of Haskell libraries for a Linux distribution there are a few issues that come up:

  1. The set of packages must be compilable at all times, and
  2. Updating one package requires all packages that depend on it, in one or more steps, to be re-compiled.

The first requires keeping track of all dependencies of the packages in the set and making sure that they are satisfiable at all times. For a while I was doing this by simple attempting to compile all updated packages and check for breakages. Which was both time-consuming and a painful when build-failures had to be resolved.

The second requires bumping the package release number for all packages that are reachable when following the dependencies in the reverse direction. Doing this manually is tedious and very error prone in my experience.

Of course it ought to be possible to make this a lot easier with the help of a tool. The last few days I’ve been writing such a tool. This is how I’ve been using it so far.

Building the initial database

GHC in ArchLinux ships with a few Haskell libraries and ArchLinux also has a few Haskell packages in its base repositories. Since I don’t need to maintain any of those packages I decided to treat these as a sort of base. Adding those is as simple as this:

% head base-pkgs
base,4.2.0.2
array,0.3.0.1
bytestring,0.9.1.7
Cabal,1.8.0.6
containers,0.3.0.0
directory,1.0.1.1
extensible-exceptions,0.1.1.1
filepath,1.1.0.4
haskell98,1.0.1.1
hpc,0.5.0.5
% cblrepo addbasepkg $(cat base-pkgs)
Success

Then I need to add the packages of the binary repo provided by ArchHaskell. I wrote a little script that extracts the package name and version from the ArchHaskell HABS tree (get-ah-cabals):

#! /bin/bash
 
habsdir=$1
 
for d in ${habsdir}/habs/*; do
    . ${d}/PKGBUILD
    case $_hkgname in
        (datetime|haskell-platform)
            ;;
        (*)
            echo ${_hkgname},${pkgver}
            ;;
    esac
done
 
echo http://hackage.haskell.org/platform/2010.2.0.0/haskell-platform.cabal

Since haskell-platform isn’t on Hackage it requires special handling. The reason why datetime is excluded is slightly different. It’s the only package that requires old base (version <4). GHC in Arch does whip with both old and new base so datetime can be built, but cblrepo can’t deal with two versions of the same package. This is a limitation, but I’m not sure it’s worth fixing it since base is the only library that comes in two versions, and datetime is the only package that hasn’t been updated to use new base.

Knowing this it’s easy to add all the ArchHaskell packages to the database:

% cblrepo idxupdate
% cblrepo add $(get-ah-cabals path/to/habs)
Success

Attempting an update

Now it’s possible to attempt to attempt an update:

% cblrepo add neither,0.2.0
Failed to satisfy the following dependencies for neither:
  monad-peel >=0.1 && <0.2
Adding neither 0.2.0 would break:
  yesod : neither >=0.1.0 && <0.2
  persistent : neither >=0.1 && <0.2

The way to read this is that there first of all is a missing dependency to satisfy for neither itself, and second there are two packages, yesod and persistent, that wouldn’t be buildable if neither were updated.

Now if it were possible to update neither, what packages would require a bump?

% cblrepo bump neither     
persistent
yesod

XML character dereferencer

Just in case you ever need one:

xmlCharDeref :: String -> String
xmlCharDeref [] = []
xmlCharDeref ('&':'#':'x':r) = let
        (digits, remainder) = span (/= ';') r
        c = chr (read ("0x" ++ digits))
    in
        c : xmlCharDeref (tail remainder)
xmlCharDeref ('&':'#':r) = let
        (digits, remainder) = span (/= ';') r
        c = chr (read digits)
    in
        c : xmlCharDeref (tail remainder)
xmlCharDeref (c:r) = c : xmlCharDeref r

In ghci:

*Foo> xmlCharDeref "hello there"
"hello there"
*Foo> xmlCharDeref "hello&#32;there"
"hello there"
*Foo> xmlCharDeref "hello&#x32;there"
"hello2there"

Any Haskell puzzlers?

I just watched Joshua Block’s talk Effective Java – Still Effective After All These Years. I’m not a Java developeri but I still found the talk very interesting. Mr Block offers tips and tricks to deal effectively with a few aspects of Java, and I’m sure many a Java developer out there would find that part very interesting. For me however, the most interesting part was the appetizers and the dessert :-)

The appetizer and dessert consisted of three puzzlers. A puzzler is a piece of code that high-lights some peculiarity of the language or standard libraries. The puzzlers in this talk were are follows:

Simple question

What is printed by the following code, and why?

public class SimpleQuestion {
  static boolean yesOrNo(String s) {
    s = s.toLowerCase();
    if(s.equals("yes") || s.equals("t") || s.equals("y"))
      s = "true";
    return Boolean.getBoolean(s);
  }
 
  public static void main(String[] args) {
    System.out.println(yesOrNo("true") + " " + yesOrNo("YeS"));
  }
}

Searching

What is the result of the following code, and why?

import java.util.*;
 
public class Searching {
  public static void main(String[] args) {
    String[] strs = { "0", "1", "2", "3", "4", "5" };
 
    // Translate string array into list of integer
    List<Integer> ints = new ArrayList<Integer>();
    for(String s : strs)
      ints.add(Integer.valueOf(s));
 
    System.out.println(Collections.binarySearch(ints, 1, cmp));
  }
 
  static Comparator<Integer> cmp = new Comparator<Integer>() {
    public int compare(Integer i, Integer j) {
      return i < j ? -1 : (i == j ? 0 : 1);
    }
  };
}

PrintWords

This one consists of two classes, which are compiled together:

public class PrintWords {
  public static void main(String[] args) {
    System.out.println(Words.FIRST + " " + Words.SECOND + " " + Words.THIRD);
  }
}
public class Words {
  public static final String FIRST = "the";
  public static final String SECOND = null;
  public static final String THIRD = "set";
}

Now modify the latter like this:

public class Words {
  public static final String FIRST = "physics";
  public static final String SECOND = "chemistry";
  public static final String THIRD = "biology";
}

Compile the second version of Words.java alone and then run PrintWords, what is the result and why?

Any puzzlers for Haskell?

Of course I couldn’t help but wonder what puzzlers there are for Haskell. Do note though that puzzlers aren’t just obfuscated code; they are readable code that you think does one thing but in reality it does something else. I’d really like to read any Haskell puzzlers you can come up with. Post them on your own blogs, or as comments to this post.

NB I should probably mention that I really don’t want answers to the puzzlers. I’ve watched Josh Bloch’s presentation, and I think anyone interested in finding out should watch it for themselves.

  1. If I ever find myself in a situation that calls for Java I’m very likely to spend some time looking at Scala :-) [back]

Playing with sockets in Haskell

This is another one of those posts that I make mostly for myself, you know for organising and help my memory :-)

There are as far as I can see three ways to deal with sockets in Haskell. There’s the type Socket which is used throughout Network.Socket. From that it’s possible to get to the underlying filedescriptor, and it in turn can be converted to a Handle.

When coupled with fork+exec it’s crucial to make sure the child process can find the socket Leaving it in a predictable place seems to be the easiest way to do that, and as far as I can see that requires using dupTo from System.Posix.IO. So, on the child-side it’s necessary to find a way to turn an integer (CInt) into something that can be treated as a socket (i.e. a Socket, a Handle, or a filedescriptor).

A basic parent-child which obviously won’t work since the child’s socket is represented as a Socket:

import Control.Concurrent
import System.Posix.Process
import Network.Socket
 
childFunc s = send s "Ping from child" >> return ()
 
main = do
    (childSock, parentSock) <- socketPair AF_UNIX Stream defaultProtocol
    print (childSock, parentSock)
    child <- forkProcess $ childFunc childSock
    recv parentSock 10 >>= print

Let the child take a CInt and turn it into a filedescriptor:

import Control.Concurrent
import Control.Concurrent.MVar
import System.Posix.Process
import System.Posix.IO
import System.Posix.Types
import Network.Socket
 
childFunc sInt = do
    let fd = Fd sInt
    fdWrite fd "Ping from child" >> return ()
 
main = do
    (childSock, parentSock) <- socketPair AF_UNIX Stream defaultProtocol
    let childInt = fdSocket childSock
    print (childInt, parentSock)
    child <- forkProcess $ childFunc childInt
    recv parentSock 10 >>= print

Let the child take a CInt and turn it into a Handle:

import Control.Concurrent
import System.Posix.Process
import System.Posix.IO
import System.Posix.Types
import Network.Socket
import System.IO
 
childFunc sInt = do
    h <- fdToHandle $ Fd sInt
    hPutStr h "Ping from child"
    hFlush h
 
main = do
    (childSock, parentSock) <- socketPair AF_UNIX Stream defaultProtocol
    let childInt = fdSocket childSock
    print (childSock, parentSock)
    child <- forkProcess $ childFunc childInt
    recv parentSock 10 >>= print

Let the child take a CInt and turn it into a Socketi:

import Control.Concurrent
import Control.Concurrent.MVar
import System.Posix.Process
import System.Posix.IO
import System.Posix.Types
import Network.Socket
 
childFunc sInt = do
    s <- mkSocket sInt AF_UNIX Stream defaultProtocol Connected
    send s "Ping from child" >> return ()
 
main = do
    (childSock, parentSock) <- socketPair AF_UNIX Stream defaultProtocol
    let childInt = fdSocket childSock
    print (childInt, parentSock)
    child <- forkProcess $ childFunc childInt
    recv parentSock 10 >>= print
  1. It seems the socket is in the Connected state after socketPair succeeds.[back]

Updating Haskell packages on Arch

A few days ago I noticed that there were a few Haskell packages on AUR that had received updates. This was the excuse I had been waiting for to address the second part of keeping my Haskell packages up-to-date (I’ve written about the first part before, dealing with an update to GHC).

It’s easy to find the packages with available updates:

% yaourt -Su --aur

Unfortunately it isn’t as easy as just updating the listed packages. Any package that depends on an updated package really should be re-compiled and re-installed to guarantee that the entire system behaves as expected after the upgrade. Of course pacman can handle it:

% pacman -Rcn <all pkgs with updates>

This will list all packages that will be removed. After removing them all, they can all be re-installed. That is of course not quite as nice as it could be, since they all then will be explicitly installed, it would be nicer to just re-install the “top-level packages”. This is one way to achieve this.

I did a bit of refactoring and put the Arch-related functions from the previous post in their own module, Arch. Then I added a function that takes a package and recursively inspects Required By until a “top-level package” (i.e. a package that doesn’t require any other package) is reached:

getTopRequiredBy pkg = let
        tops = do
            first <- getRequiredBy pkg
            if null first
                then return [pkg]
                else liftM concat $ mapM getTopRequiredBy first
    in liftM nub tops

After that it’s straight forward to write up a little tool which offers some advice on what to do:

main = do
    pkgsToUpgrade <- getArgs
    pkgsToReinstall <- liftM (nub . concat) $ mapM Arch.getTopRequiredBy pkgsToUpgrade
    putStrLn $ "To remove     : pacman -Rnc " ++ unwords pkgsToUpgrade
    putStrLn $ "To re-install : yaourt -S " ++ unwords pkgsToReinstall

Using it on the packages I wanted to upgrade gave the following output:

% runhaskell PkgUpgrade.hs haskell-{configfile,haxml,json,missingh,safe,testpack,time}
To remove     : pacman -Rnc haskell-configfile haskell-haxml haskell-json haskell-missingh haskell-safe haskell-testpack haskell-time
To re-install : yaourt -S haskell-configfile haskell-haxml haskell-json haskell-hsh haskell-safe

Following that advice seemed to work just like I intended.

Wrapping IO, part 2

The previous post was a fairly silly example, unless of course it’s more useful than I realise :) However, here’s something that I can see a bit more use of, a monad that restricts reading and writing of files to two files, one to read from and one to write to.

Again, the first step is to create a data type:

newtype TwoFileIO a = TwoFileIO { execTwoFileIO :: (Handle, Handle) -> IO a }

This defines a type wrapping a function that takes a pair of handles (one for input and one for output) and returns an “IO action”. Turning this into a monad is straight forward (actually it’s similar to the Reader monad):

instance Monad TwoFileIO where
    return v = TwoFileIO $ \ _ -> return v
    (>>=) m f = let
            fInIO = execTwoFileIO . f
        in TwoFileIO $ \ hs ->
            execTwoFileIO m hs >>= \v -> fInIO v hs

To return a value we can simply drop the pair of handles and return the value in IO. Bind (>>=) only looks complicated, what happens is that the first argument is “executed” with the provided handles, then the second argument is passed the result and executed with the same pair of handles. Of course the handles aren’t actually known yet, so an anynmous function is created, and wrapped in an instance of TwoFileIO. That’s it for the most complicated part.

In order to avoid having to manually open files and wire everything up I wrote the following convenience function:

runFileIO m iFn oFn = do
    iH <- openFile iFn ReadMode
    oH <- openFile oFn WriteMode
    res <- execTwoFileIO m (iH, oH)
    mapM_ hClose [iH, oH]
    return res

Yes, it does lack a bit in exception handling, but it’s good enough for now.

Then I can define the actions/functions that are available inside TwoFileIO. Reading and writing lines:

fioPutStrLn s = TwoFileIO $ \ (iH, oH) ->
    hPutStrLn oH s
 
fioGetLine = TwoFileIO $ \ (iH, oH) ->
    hGetLine iH

Note how it now becomes very hard to mix up the files and accidentally read from the output file or write to the input file.

As a little test function I used this one, which reads two lines and then writes them in the reversed order:

get1stN2ndPutLast = do
    first <- fioGetLine
    second <- fioGetLine
    fioPutStrLn second
    fioPutStrLn first

I can now test this using ghci:

> h <- openFile "testIn.txt" ReadMode
> hGetContents h
"line 0\nline 1\nline 2\n"
> runFileIO get1stN2ndPutLast "testIn.txt" "testOut.txt"
> h <- openFile "testOut.txt" ReadMode
> hGetContents h
"line 1\nline 0\n"