This is my input into the recent discussion on referential transparency (RT). I’m nowhere near as well versed in the subject as others, but how am I ever to learn anything unless I put my thoughts out there for them to be laughed at and ridiculed?

It all started with a post on stackoverflow.com, which received several very long and detailed responses, in particular from Uday Reddy (here and here). His answers were also linked to from Reddit. His second response contains a link to an excellent paper by Strachey, Fundamental concepts in programming languages. I’d go as far as saying that, despite it being lecture notes rather than a fully worked paper, it ought to be required reading for all software developers.

The rest of what I write here hinges on me actually understanding what Strachey writes in his paper. Of course I’m looking forward to comments/corrections/etc that help me correct my understanding.

### What Strachey says about RT

In section 3.2.1 he introduces RT like this:

One of the most useful properties of expressions is that called by Quine referential transparency. In essence this means that if we wish to find the value of an expression which contains a sub-expression, the only thing we need to know about the sub-expression is its value. Any other features of the sub-expression, such as its internal structure, the number and nature of its components, the order in which they are evaluated or the colour of the ink in which they are written, are irrelevant to the value of the main expression.

There is however a crucial bit just before that:

Like the rest of mathematics, we shall be concerned only with R-values.

That is, he starts out with a very limited subset of what most people would consider a usable imperative programming language.

He then dives into some more details in section 3.2.2 by adding the concept of an environment, which is handled through the use of a where-clause, or alternatively using let-statements (this ought to make any Haskell developer feel right at home). After a few interesting sections on stuff like applicative structure, evaluation, and conditional expressions he finally tackles the issue of variables in section 3.3.1. There are two pieces to the trick. The first is to take advantage of his earlier insight that led to a split of values into L-values and R-values:

If we consider L-values as well as R-values, however, we can preserve referential transparency as far as L-values are concerned. This is because L-values, being generalised addresses, are not altered by assignment commands. Thus the command `x := x+1` leaves the address of the cell representing `x` (L-value of `x`) unchanged although it does alter the contents of this cell (R-value of `x`). So if we agree that the values concerned are all L-values, we can continue to use where-clauses and lambda-expressions for describing parts of a program which include assignments.

The cost of this is that the entire theory constructed earlier for operations taking R-values now has to be revised to incorporate L-values. The outline for this is in the rest of section 3.3, and it basically comes down to including an abstract store in the environment. However, before doing that he mentions that:

I think these problems are inevitable and although much of the work remains to be done, I feel hopeful that when completed it will not seem so formidable as it does at present, and that it will bring clarification to many areas of programming language study which are very obscure today. In particular the problems of side effects will, I hope, become more amenable.

He does reach his goal, but it’s a bit unfortunate that he stops short of considering the wider problem of side effects. My assumption is that this would have to be dealt with in a similar way to assignment, but that would mean that rather than just adding a store to the environment, the world, or a subset of it, would need to be added.

An open question (to me) is whether anyone has built on Strachey’s work in this area and worked out the details of RT and general side effects.

The original question described RT as

it means you can replace equals with equals

which I actually think is a rather good, and very short, description of it. It’s not the full story, there are further important details, but it’s a good first intuition. Also, it’s a description usable in Haskell. Well, to be slightly more nuanced, it’s good for Haskell without IO (Haskell-IO). However, this is where the strict type system of Haskell really starts to shine, because (here I’m probably a bit imprecise) we only have R-values in Haskell-IO. If we want to use assignment we add the use of a state monad, and we do that explicitly.
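
To make the “replace equals with equals” intuition concrete, here’s a tiny sketch (the names are mine, purely for illustration):

```haskell
-- Hypothetical function, just to illustrate the substitution.
f :: Int -> Int
f n = n * 2

-- Naming the sub-expression ...
shared :: Int
shared = let x = f 3 in x + x

-- ... or substituting its definition: the value is the same.
inlined :: Int
inlined = f 3 + f 3
```

In a language where `f` could also update a variable, naming the call and writing it out twice would no longer be interchangeable; in Haskell-IO the substitution is always valid.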

A former colleague of mine said that in Haskell we need to build up our own computational models ourselves. For instance, if we need assignment we use State, if we need to record progress we use Writer, etc. In other languages the language designer has already made all those choices for us; we don’t get to make them ourselves. For RT it means that Haskell is more explicit about what the environment of a function is.
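
As a sketch of what that explicitness looks like (using `Control.Monad.State` from mtl; the name `tick` is made up): Strachey’s `x := x + 1` becomes a value in a state monad, so the store shows up in the type.

```haskell
import Control.Monad.State

-- Strachey's `x := x+1`, with the store made explicit:
-- the type says this computation touches an Int cell.
tick :: State Int ()
tick = modify (+ 1)

-- Run two assignments starting from 0; execState returns the final store.
twoTicks :: Int
twoTicks = execState (tick >> tick) 0
```

`execState` threads the store for us, but nothing is hidden: a function that assigns must say so in its type.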

Moving on to general side effects, those are also more explicit in Haskell since they have to happen inside the IO monad. That alone is a great boon for RT in Haskell, since it becomes explicit where RT as worked out by Strachey applies directly, and where there are (hopefully amenable) problems of side effects left. Even further, in Haskell it’s possible to make subsets of IO (by wrapping IO, see e.g. my own posts on wrapping IO, part 1 and wrapping IO, part 2). I’m sure that if including the world in the environment is the way to achieve RT with general side effects, then it’s highly beneficial to be able to create subsets of the world.
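
A minimal sketch of such a wrapper (all the names here are made up for illustration, this is not the code from those posts): the newtype exports only a chosen subset of IO, so the part of the world a function can touch is narrowed by its type.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

-- A wrapper that exposes only terminal I/O; code in TeletypeIO cannot
-- open files or sockets, because no such operations are exported.
newtype TeletypeIO a = TeletypeIO (IO a)
    deriving (Functor, Applicative, Monad)

tPutStrLn :: String -> TeletypeIO ()
tPutStrLn = TeletypeIO . putStrLn

tGetLine :: TeletypeIO String
tGetLine = TeletypeIO getLine

-- The only way back out; typically kept in a small trusted module.
runTeletypeIO :: TeletypeIO a -> IO a
runTeletypeIO (TeletypeIO a) = a
```

A `main` using it would be e.g. `main = runTeletypeIO (tPutStrLn "hello")`.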

### RT in Haskell vs. RT in (common) imperative languages

Uday writes in his first answer that:

But, today, functional programmers claim that imperative programming languages are not referentially transparent. Strachey would be turning in his grave.

This may well be true, but I think that when a Haskell programmer says it, he’s only twitching slightly. The reason? Strachey writes:

Any departure of R-value referential transparency in a R-value context should either be eliminated by decomposing the expression into several commands and simpler expressions, or, if this turns out to be difficult, the subject of a comment.

Which is something that Haskell programmers do naturally by use of IO. That is, in Haskell you either have an R-value, and you clearly see that you do, or you put in a comment, which is encoded in the type of the function.

This rather lengthy post basically arrives at the following, which is what I suspect the user pacala is saying about RT on Reddit:

Imperative languages may well be RT, but when trying to understand a code base the environment of each function is so large that understanding it is an intractable problem. I don’t have this problem in Haskell.

Ever since I heard the FLOSS Weekly episode on 0MQ I’ve been looking for a reason to take a look at it. Well, to hell with reason, I’ll have a first look without any specific goal in mind.

I found a simple introduction to it in Nicholas Piël’s post ZeroMQ an introduction. The only issue was that it was based on Python, and Python2 at that. So here are my attempts at translating two of the clients to Haskell (using zeromq-haskell).

## req-rep

Here’s the client in Python3 first:

```
import zmq

ctx = zmq.Context()
socket = ctx.socket(zmq.REQ)
socket.connect('tcp://127.0.0.1:5000')

for i in range(10):
    msg = "msg %s" % i
    socket.send_unicode(msg)
    print('Sending', msg)
    msg_in = socket.recv()
```

And my translation to Haskell:

```
import System.ZMQ
import Data.ByteString.Char8 as CBS

main = withContext 1 $ \ ctx ->
    withSocket ctx Req $ \ soc -> do
        connect soc "tcp://127.0.0.1:5000"
        let msgs = [pack ("msg " ++ show i) | i <- [0..9]]
        flip mapM_ msgs $ \ msg -> do
            send soc msg []
            CBS.putStrLn msg
            receive soc []
```

## pub-sub

In Python3:

```
import zmq

ctx = zmq.Context()
socket = ctx.socket(zmq.SUB)
socket.connect('tcp://127.0.0.1:5000')
socket.setsockopt(zmq.SUBSCRIBE, b'sweden')
socket.setsockopt(zmq.SUBSCRIBE, b'denmark')

while True:
    print(socket.recv())
```

And in Haskell:

```
import System.ZMQ
import Control.Monad
import Data.ByteString.Char8 as CBS

main = withContext 1 $ \ ctx ->
    withSocket ctx Sub $ \ soc -> do
        connect soc "tcp://127.0.0.1:5000"
        subscribe soc "sweden"
        subscribe soc "denmark"
        forever $ receive soc [] >>= CBS.putStrLn
```

• I’m not sure why, but the Haskell client dies after receiving just a few messages (they are properly filtered, though).
• The API for `subscribe` is a bit strange; it would make more sense if it took a `ByteString` rather than a `String`.

## Shelltestrunner to the rescue

A little while ago shelltestrunner was announced on `haskell-cafe`. At the time I was slowly losing hope of ever getting decent test coverage in cblrepo using `HUnit`. Using something like `shelltestrunner` could be an easier and more workable solution, especially since what `cblrepo` needed most in the short term was a bit of integration testing.

`shelltestrunner` is basically just a tool that runs shell commands and compares output (both `stdout` and `stderr`) and the exit code. It’s also possible to provide data to be passed to the command on `stdin`. The documentation on the `shelltestrunner` home page is very good and accessible. There are only a few things that I’d like to add to it:

• Use the `--with` (`-w`) flag; it’s very handy to avoid littering the tests with long paths to the output of your build environment.
• There is no support for set-up and tear-down steps in the tests (in my opinion this would be a very nice addition to the tool), so anything that needs to be set up for the actual tests will itself have to be a test.
• There is no way to name tests (which would be another good addition), so I found it crucial to organise the tests into several files.
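
For reference, an entry in a test file looks something like this (the command and expectations here are hypothetical): the first line is the command to run, `>>>` introduces the expected stdout (a `/regex/` matches instead of comparing exactly), and `>>>=` the expected exit code.

```
# a hypothetical test: listing updates mentions cmdargs and succeeds
cblrepo updates
>>> /cmdargs/
>>>= 0
```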

As a follow-up to my earlier post on `cblrepo` I thought I’d convert a snapshot of ArchHaskell HABS to `cblrepo`. It’s mostly done as an exercise and to serve as an example. You can find it at http://www.kiwilight.com/~magnus/habs/.

Of course I have used it to build all the packages, and I still have the result of that around, so if anyone asks I just might upload that as well.

## Maintaining Haskell packages for a Linux distribution—cblrepo

Maintaining a large set of Haskell packages for a Linux distribution is quite a chore. Especially if one wants to track Hackage as far as possible. Several distributions have tools to automatically convert Cabal-based packages into distribution packages, e.g. cabal2arch for ArchLinux and cabal-rpm. They are just conversion tools though, and the most time-consuming activity in maintaining Haskell packages is resolving and verifying dependencies.

At least that was my experience when I was actively involved in ArchHaskell. I only saw two options when adding or upgrading a package, either I worked out dependencies manually, or I simply tried it out. Neither of them was very appealing, and both were very time-consuming. It seemed obvious that I needed some tool to help out.

Enter `cblrepo`!

It allows me to maintain a database of specific versions of packages, and when I want to upgrade a package, or add a new one, it’ll verify that all dependencies can be satisfied. In other words, it helps me maintain a buildable set of packages at all times.
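
The core of that verification can be sketched in a few lines of Haskell (my own simplification, not `cblrepo`’s actual code): pin every package to one version, and check each dependency range against the pins.

```haskell
import qualified Data.Map as M

type Version = [Int]           -- e.g. [0,2,1,4] for 0.2.1.4

-- A dependency like `blaze-builder ==0.2.*`, expressed as an
-- inclusive lower bound and an exclusive upper bound.
data Dep = Dep String Version Version

-- A dependency is satisfied iff the pinned version falls in range.
satisfied :: M.Map String Version -> Dep -> Bool
satisfied db (Dep name lo hi) = case M.lookup name db of
    Nothing -> False
    Just v  -> lo <= v && v < hi
```

Checking a whole set of packages is then just `all (satisfied db)` over every dependency of every pinned package.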

The tool also has some functionality that helps in tracking Hackage as packages are updated there.

### Something about how it works

At the moment I maintain a small repository of Arch packages, mostly just to try out `cblrepo` and convince myself that it works. The work environment contains a database and a directory of patches:

```
% ls
cblrepo.db  patches/
%
```

The database is a cleartext file containing the information on the packages. It’s basically just a dump of the related Haskell datatype, encoded in JSON. The `patches` directory holds patches for Cabal files and PKGBUILD files. They must be named `patch.cabal.<hackage name>` or `patch.pkgbuild.<hackage name>` in order to be picked up by `cblrepo`.

There’s also an application directory (`~/.cblrepo`) for caching info about the packages available on Hackage:

```
% ls ~/.cblrepo
00-index.tar.gz
%
```

### How to use it

A session with `cblrepo` looks something like this. First we update the information about what packages are available on Hackage:

```
% cblrepo idxsync
%
```

After that it’s possible to see what packages are out-of-date:

```
% cblrepo updates
cmdargs: 0.6.8 (0.6.9)
test-framework-th: 0.1.3 (0.2.0)
xml: 1.3.7 (1.3.8)
blaze-builder: 0.2.1.4 (0.3.0.0)
%
```

Let’s check whether `cmdargs` can be updated:

```
% cblrepo add -n cmdargs,0.6.9
%
```

It generates no output, so that means it’s possible to update. When attempting to add all the packages we run into a problem:

```
% cblrepo add -n cmdargs,0.6.9 \
> test-framework-th,0.2.0 \
> xml,1.3.8 \
> blaze-builder,0.3.0.0
haxr : blaze-builder ==0.2.*
```

We’ll leave `blaze-builder` at the current version for now:

```
% cblrepo add cmdargs,0.6.9 \
> test-framework-th,0.2.0 \
> xml,1.3.8
%
```

After these updates we also need to make sure that all packages that depend on these ones are re-built, that is, we need to bump their release version:

```
% cblrepo bump -n cmdargs \
> test-framework-th \
> xml
Would bump:
test-framework
test-framework-hunit
test-framework-quickcheck2
%
```

Just re-run that without the `-n` to actually perform the bump. Now that all this is done we need to generate the files necessary to build the Arch packages. We can easily check what packages need re-building, and get a good order for building them:

```
% cblrepo build cmdargs \
> test-framework-th \
> xml
cmdargs
xml
test-framework
test-framework-quickcheck2
test-framework-hunit
test-framework-th
%
```

And generating the required files is also easy:

```
% cblrepo pkgbuild $(!!)
% tree
.
|-- cblrepo.db
|-- cmdargs
|   `-- PKGBUILD
|-- patches
|-- test-framework
|   `-- PKGBUILD
|-- test-framework-hunit
|   `-- PKGBUILD
|-- test-framework-quickcheck2
|   `-- PKGBUILD
|-- test-framework-th
|   `-- PKGBUILD
`-- xml
    `-- PKGBUILD

8 directories, 15 files
%
```

Now all that’s left is running `makepkg` in each of the directories, in the order indicated by `cblrepo build` above.

Unfortunately they won’t all build—generating the Haddock docs for `test-framework-th` fails. That’s however fairly easy to remedy by patching the `PKGBUILD` to disable the generation of docs.

I’ll get back to that in a later post though.

## On maintaining Haskell packages for a Linux distro

When trying to maintain a set of binary packages of Haskell libraries for a Linux distribution there are a few issues that come up:

1. The set of packages must be compilable at all times, and
2. Updating one package requires all packages that depend on it, in one or more steps, to be re-compiled.

The first requires keeping track of all dependencies of the packages in the set and making sure that they are satisfiable at all times. For a while I was doing this by simply attempting to compile all updated packages and checking for breakages. This was both time-consuming and painful when build failures had to be resolved.

The second requires bumping the package release number for all packages that are reachable when following the dependencies in the reverse direction. Doing this manually is tedious and very error-prone in my experience.
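
The computation itself is just a transitive closure over reverse dependencies, which a pure sketch makes clear (the hardcoded map below is hypothetical example data standing in for the package database):

```haskell
import Data.List (nub)
import qualified Data.Map as M

-- Map each package to the packages that directly depend on it.
revDeps :: M.Map String [String]
revDeps = M.fromList
    [ ("cmdargs",              ["test-framework"])
    , ("test-framework",       ["test-framework-hunit", "test-framework-th"])
    , ("test-framework-hunit", [])
    , ("test-framework-th",    [])
    ]

-- Everything reachable by repeatedly following reverse dependencies
-- needs its release number bumped.
needBump :: [String] -> [String]
needBump pkgs = case concatMap direct pkgs of
    []   -> []
    next -> nub (next ++ needBump next)
  where
    direct p = M.findWithDefault [] p revDeps
```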

Of course it ought to be possible to make this a lot easier with the help of a tool. The last few days I’ve been writing such a tool. This is how I’ve been using it so far.

### Building the initial database

GHC in ArchLinux ships with a few Haskell libraries, and ArchLinux also has a few Haskell packages in its base repositories. Since I don’t need to maintain any of those packages I decided to treat them as a sort of base. Adding them is as simple as this:

```
% head base-pkgs
base,4.2.0.2
array,0.3.0.1
bytestring,0.9.1.7
Cabal,1.8.0.6
containers,0.3.0.0
directory,1.0.1.1
extensible-exceptions,0.1.1.1
filepath,1.1.0.4
haskell98,1.0.1.1
hpc,0.5.0.5
% cblrepo addbasepkg $(cat base-pkgs)
Success
```

Then I need to add the packages of the binary repo provided by ArchHaskell. I wrote a little script that extracts the package name and version from the ArchHaskell HABS tree (`get-ah-cabals`):

```
#! /bin/bash

habsdir=$1

for d in ${habsdir}/habs/*; do
    . ${d}/PKGBUILD
    case $_hkgname in
        (datetime|haskell-platform) ;;
        (*) echo ${_hkgname},${pkgver} ;;
    esac
done

echo http://hackage.haskell.org/platform/2010.2.0.0/haskell-platform.cabal
```

Since `haskell-platform` isn’t on Hackage it requires special handling. The reason why `datetime` is excluded is slightly different. It’s the only package that requires the old `base` (version `<4`). `GHC` in Arch does ship with both the old and new `base`, so `datetime` can be built, but `cblrepo` can’t deal with two versions of the same package. This is a limitation, but I’m not sure it’s worth fixing, since `base` is the only library that comes in two versions, and `datetime` is the only package that hasn’t been updated to use the new `base`.

Knowing this it’s easy to add all the ArchHaskell packages to the database:

```
% cblrepo idxupdate
% cblrepo add $(get-ah-cabals path/to/habs)
Success
```

### Attempting an update

Now it’s possible to attempt an update:

```
% cblrepo add neither,0.2.0
Failed to satisfy the following dependencies for neither:
monad-peel >=0.1 && <0.2
Adding neither 0.2.0 would break:
yesod : neither >=0.1.0 && <0.2
persistent : neither >=0.1 && <0.2
```

The way to read this is that, first of all, there is a missing dependency to satisfy for `neither` itself, and second, there are two packages, `yesod` and `persistent`, that wouldn’t be buildable if `neither` were updated.

Now if it were possible to update `neither`, what packages would require a bump?

```
% cblrepo bump neither
persistent
yesod
```

## XML character dereferencer

Just in case you ever need one:

```
import Data.Char (chr)

xmlCharDeref :: String -> String
xmlCharDeref [] = []
xmlCharDeref ('&':'#':'x':r) =
    let (digits, remainder) = span (/= ';') r
        c = chr (read ("0x" ++ digits))
    in c : xmlCharDeref (tail remainder)
xmlCharDeref ('&':'#':r) =
    let (digits, remainder) = span (/= ';') r
        c = chr (read digits)
    in c : xmlCharDeref (tail remainder)
xmlCharDeref (c:r) = c : xmlCharDeref r
```

In ghci:

```
*Foo> xmlCharDeref "hello there"
"hello there"
*Foo> xmlCharDeref "hello&#32;there"
"hello there"
*Foo> xmlCharDeref "hello&#x32;there"
"hello2there"
```

I just watched Joshua Bloch’s talk Effective Java – Still Effective After All These Years. I’m not a Java developer¹ but I still found the talk very interesting. Mr Bloch offers tips and tricks for dealing effectively with a few aspects of Java, and I’m sure many a Java developer out there would find that part very interesting. For me, however, the most interesting part was the appetizers and the dessert.

The appetizers and dessert consisted of three puzzlers. A puzzler is a piece of code that highlights some peculiarity of the language or standard libraries. The puzzlers in this talk were as follows:

### Simple question

What is printed by the following code, and why?

```
public class SimpleQuestion {
    static boolean yesOrNo(String s) {
        s = s.toLowerCase();
        if(s.equals("yes") || s.equals("t") || s.equals("y"))
            s = "true";
        return Boolean.getBoolean(s);
    }

    public static void main(String[] args) {
        System.out.println(yesOrNo("true") + " " + yesOrNo("YeS"));
    }
}
```

### Searching

What is the result of the following code, and why?

```
import java.util.*;

public class Searching {
    public static void main(String[] args) {
        String[] strs = { "0", "1", "2", "3", "4", "5" };

        // Translate string array into list of integer
        List<Integer> ints = new ArrayList<Integer>();
        for(String s : strs)
            ints.add(Integer.valueOf(s));

        System.out.println(Collections.binarySearch(ints, 1, cmp));
    }

    static Comparator<Integer> cmp = new Comparator<Integer>() {
        public int compare(Integer i, Integer j) {
            return i < j ? -1 : (i == j ? 0 : 1);
        }
    };
}
```

### PrintWords

This one consists of two classes, which are compiled together:

```
public class PrintWords {
    public static void main(String[] args) {
        System.out.println(Words.FIRST + " " + Words.SECOND + " " + Words.THIRD);
    }
}
```
```
public class Words {
    public static final String FIRST = "the";
    public static final String SECOND = null;
    public static final String THIRD = "set";
}
```

Now modify the latter like this:

```
public class Words {
    public static final String FIRST = "physics";
    public static final String SECOND = "chemistry";
    public static final String THIRD = "biology";
}
```

Compile the second version of `Words.java` alone and then run `PrintWords`, what is the result and why?

Of course I couldn’t help but wonder what puzzlers there are for Haskell. Do note, though, that puzzlers aren’t just obfuscated code; they are readable code that you think does one thing but in reality does something else. I’d really like to read any Haskell puzzlers you can come up with. Post them on your own blogs, or as comments to this post.

NB I should probably mention that I really don’t want answers to the puzzlers. I’ve watched Josh Bloch’s presentation, and I think anyone interested in finding out should watch it for themselves.

1. If I ever find myself in a situation that calls for Java I’m very likely to spend some time looking at Scala. [back]

## Playing with sockets in Haskell

This is another one of those posts that I make mostly for myself, you know, for organising my thoughts and helping my memory.

There are, as far as I can see, three ways to deal with sockets in Haskell. There’s the type `Socket`, which is used throughout `Network.Socket`. From that it’s possible to get to the underlying file descriptor, and that in turn can be converted to a `Handle`.

When coupled with fork+exec it’s crucial to make sure the child process can find the socket. Leaving it in a predictable place seems to be the easiest way to do that, and as far as I can see that requires using `dupTo` from `System.Posix.IO`. So, on the child side it’s necessary to find a way to turn an integer (`CInt`) into something that can be treated as a socket (i.e. a `Socket`, a `Handle`, or a file descriptor).

A basic parent-child which obviously won’t work since the child’s socket is represented as a `Socket`:

```
import Control.Concurrent
import System.Posix.Process
import Network.Socket

childFunc s = send s "Ping from child" >> return ()

main = do
    (childSock, parentSock) <- socketPair AF_UNIX Stream defaultProtocol
    print (childSock, parentSock)
    child <- forkProcess $ childFunc childSock
    recv parentSock 10 >>= print
```

Let the child take a `CInt` and turn it into a file descriptor:

```
import Control.Concurrent
import Control.Concurrent.MVar
import System.Posix.Process
import System.Posix.IO
import System.Posix.Types
import Network.Socket

childFunc sInt = do
    let fd = Fd sInt
    fdWrite fd "Ping from child" >> return ()

main = do
    (childSock, parentSock) <- socketPair AF_UNIX Stream defaultProtocol
    let childInt = fdSocket childSock
    print (childInt, parentSock)
    child <- forkProcess $ childFunc childInt
    recv parentSock 10 >>= print
```

Let the child take a `CInt` and turn it into a `Handle`:

```
import Control.Concurrent
import System.Posix.Process
import System.Posix.IO
import System.Posix.Types
import Network.Socket
import System.IO

childFunc sInt = do
    h <- fdToHandle $ Fd sInt
    hPutStr h "Ping from child"
    hFlush h

main = do
    (childSock, parentSock) <- socketPair AF_UNIX Stream defaultProtocol
    let childInt = fdSocket childSock
    print (childSock, parentSock)
    child <- forkProcess $ childFunc childInt
    recv parentSock 10 >>= print
```

Let the child take a `CInt` and turn it into a `Socket`¹:

```
import Control.Concurrent
import Control.Concurrent.MVar
import System.Posix.Process
import System.Posix.IO
import System.Posix.Types
import Network.Socket

childFunc sInt = do
    s <- mkSocket sInt AF_UNIX Stream defaultProtocol Connected
    send s "Ping from child" >> return ()

main = do
    (childSock, parentSock) <- socketPair AF_UNIX Stream defaultProtocol
    let childInt = fdSocket childSock
    print (childInt, parentSock)
    child <- forkProcess $ childFunc childInt
    recv parentSock 10 >>= print
```
1. It seems the socket is in the `Connected` state after `socketPair` succeeds. [back]

## Updating Haskell packages on Arch

A few days ago I noticed that there were a few Haskell packages on AUR that had received updates. This was the excuse I had been waiting for to address the second part of keeping my Haskell packages up-to-date (I’ve written about the first part before, dealing with an update to GHC).

It’s easy to find the packages with available updates:

`% yaourt -Su --aur`

Unfortunately it isn’t as easy as just updating the listed packages. Any package that depends on an updated package really should be re-compiled and re-installed to guarantee that the entire system behaves as expected after the upgrade. Of course `pacman` can handle it:

`% pacman -Rcn <all pkgs with updates>`

This will list all packages that will be removed. After removing them all, they can be re-installed. That’s not quite as nice as it could be though, since all of them will then be marked as explicitly installed; it would be nicer to just re-install the “top-level packages”. Here is one way to achieve that.

I did a bit of refactoring and put the Arch-related functions from the previous post in their own module, `Arch`. Then I added a function that takes a package and recursively inspects `Required By` until a “top-level package” (i.e. a package that isn’t required by any other package) is reached:

```
getTopRequiredBy pkg =
    let tops = do
            first <- getRequiredBy pkg
            if null first
                then return [pkg]
                else liftM concat $ mapM getTopRequiredBy first
    in liftM nub tops
```

After that it’s straightforward to write up a little tool which offers some advice on what to do:

```
main = do
    pkgsToUpgrade <- getArgs
    pkgsToReinstall <- liftM (nub . concat) $ mapM Arch.getTopRequiredBy pkgsToUpgrade
    putStrLn $ "To remove     : pacman -Rnc " ++ unwords pkgsToUpgrade
    putStrLn $ "To re-install : yaourt -S " ++ unwords pkgsToReinstall
```

Using it on the packages I wanted to upgrade gave the following output:

```
% runhaskell PkgUpgrade.hs haskell-{configfile,haxml,json,missingh,safe,testpack,time}
To remove     : pacman -Rnc haskell-configfile haskell-haxml haskell-json haskell-missingh haskell-safe haskell-testpack haskell-time
To re-install : yaourt -S haskell-configfile haskell-haxml haskell-json haskell-hsh haskell-safe
```

Following that advice seemed to work just like I intended.