This is my input into the recent discussion on referential transparency (RT). I’m nowhere near as well versed in the subject as others, but how am I ever to learn anything unless I put my thoughts out there for them to be laughed at and ridiculed?

It all started with a post on stackoverflow.com, which received several very long and detailed responses, in particular from Uday Reddy (here and here). His answers were also linked to from Reddit. His second response contains a link to an excellent paper by Strachey, Fundamental concepts in programming languages. I’d go as far as saying that, despite it being lecture notes rather than a fully worked paper, it ought to be required reading for all software developers.

The rest of what I write here hinges on me actually understanding what Strachey writes in his paper. Of course I’m looking forward to comments/corrections/etc that help me correct my understanding.

### What Strachey says about RT

In section 3.2.1 he introduces RT like this:

One of the most useful properties of expressions is that called by Quine referential transparency. In essence this means that if we wish to find the value of an expression which contains a sub-expression, the only thing we need to know about the sub-expression is its value. Any other features of the sub-expression, such as its internal structure, the number and nature of its components, the order in which they are evaluated or the colour of the ink in which they are written, are irrelevant to the value of the main expression.

There is however a crucial bit just before that:

Like the rest of mathematics, we shall be concerned only with R-values.

That is, he starts out with a very limited subset of what most people would consider a usable imperative programming language.

He then dives into some more details in section 3.2.2 by adding the concept of environment, which is handled through the use of a where-clause, or alternatively using let-statements (this ought to be making any Haskell developer feel right at home). After a few interesting sections on stuff like applicative structure, evaluation, and conditional expressions he finally tackles the issue of variables in section 3.3.1. There are two pieces to the trick, the first is to take advantage of his earlier insight that lead to a split of values into L-values and R-values:

If we consider L-values as well as R-values, however, we can preserve referential transparency as far as L-values are concerned. This is because L-values, being generalised addresses, are not altered by assignment commands. Thus the command x := x+1 leaves the address of the cell representing x (L-value of x) unchanged although it does alter the contents of this cell (R-value of x). So if we agree that the values concerned are all L-values, we can continue to use where-clauses and lambda-expressions for describing parts of a program which include assignments.

The cost of this is that the entire theory constructed earlier for operations taking R-values now has to be revised to incorporate L-values. The outline for this is in the rest of section 3.3 and it basically comes down to include an abstract store in the environment. However, before doing that he mentions that:

I think these problems are inevitable and although much of the work remains to be done, I feel hopeful that when completed it will not seem so formidable as it does at present, and that it will bring clarification to many areas of programming language study which are very obscure today. In particular the problems of side effects will, I hope, become more amenable.

He does reach his goal, but it’s a bit unfortunate that he stops short of considering the wider of problem of side effects. My assumption is that this would have to be dealt with in a similar way to assignment, but that would mean that rather than just adding an store to the environment the world, or a subset of it, would need to be added.

An open question (to me) is if anyone has built on Strachey’s work in this area and thought of the details of RT and general side effects?

The original question described RT as

it means you can replace equals with equals

which I actually think is a rather good, and very short, description of it. It’s not the full story, there are further important details, but it’s a good first intuition. Also, it’s a description usable in Haskell. Well, to be slightly more nuanced, it good for Haskell without IO (Haskell-IO). However, this is where the strict type system of Haskell really starts to shine because (here I’m probably a bit imprecise) we only have R-values in Haskell-IO. If we want to use assignment we add the use of a state monad, and we do that explicitly.

A former colleague of mine said that in Haskell we need to build up our own computational models ourselves. For instance, if we need assigment we use State, if we need to record progress we use Writer, etc. In other languages the language designer has already made all those choices for us, we don’t get to make them ourselves. For RT it means that Haskell is more explicit in what the environment of a function is.

Moving on to general side effects those are also more explicit in Haskell since they have to happen inside the IO monad. That alone is a great boon for RT in Haskell since it becomes explicit where RT as worked out by Strachey applies directly, and where there are (hopefully amenable) problems of side effects left. Even further, in Haskell it’s possible to make subsets of IO (by wrapping IO, see e.g. my own posts on wrapping IO, part 1 and wrapping IO, part 2). I’m sure that if including the world in the environment is the way to achieve RT with general side effects, then it’s highly beneficial to be able to create subsets of the world.

### RT in Haskell vs. RT in (common) imperative languages

Uday writes in his first answer that:

But, today, functional programmers claim that imperative programming languages are not referentially transparent. Strachey would be turning in his grave.

This may well be true, but I think that when a Haskell programmer says it, he’s only twitching slightly. The reason? Strachey writes:

Any departure of R-value referential transparency in a R-value context should either be eliminated by decomposing the expression into several commands and simpler expressions, or, if this turns out to be difficult, the subject of a comment.

Which is something that Haskell programmers do naturally by use of IO. That is, in Haskell you either have an R-value, and you clearly see that you do, or you put in a comment, which is encoded in the type of the function.

This rather lengthy post basically arrives at the following, which is what I suspect the user [pacala is saying about RT on Reddit][reddit-pacala]:

Imperative languages my well be RT, but when trying to understand a code base the environment of each function is so large that understanding is an intractable problem. I don’t have this problem in Haskell.

1. I think we have to carefully consider the language level whose RT we’re talking about: do we include statements, function calls, embedded languages, and the language runtime?

Monadic expressions (of Haskell) are a RT way of encoding an imperative language that isn’t necessarily RT. The potential side effects lie in the semantics of the encoded language (which isn’t Haskell, although you could consider it a part of Haskell). Similarily, the underlying implementation of a monad might or might not rely on unsafe/non-RT constructions.

From that point of view, Haskell with the IO monad is RT, but the execution of Haskell with IO is not RT.

In a similar manner, you might claim that C is RT and the side effects lie on a different level. But as a programmer, you don’t have separate access to the RT level so its existence is theoretical. In C, you just can’t replace the expression random() + random() with 2 * random() but in (non-unsafe) Haskell you can. In the end, you can twist the terms in various ways but you shouldn’t hide the clear differences between C and Haskell.

2. AntC

Hi Magnus, Re: “An open question (to me) is if anyone has built on Stracheyâ€™s work in this area …”

At the time of his writing, Strachey was heavily involved with the team developing the language CPL.

The most rigorous use of L-values and R-values was (is) BCPL (Martin Richards) — which some claims stands for “Basically Christopher’s Programming Language”. BCPL inspired C, and the rest is history.

Regarding IO and side effects, CPL, and BCPL were conventional for their day (and followed ALGOL 60) in leaving IO as being ‘implementation dependent’. These were all in the von Neumann/Turing/procedural model of computing. They all used destructive assignment.

BCPL supports assignment to L-values. In particular, the R-values of functions/procedures are just static cells holding pointers to the code body. Function application is just entering that code point using the current stack frame. So you can put an arbitrary expression to yield the code point:

(x > y -> SIN, COS) (z)                        // ... -> ... , ... is conditional expression, apply to z

Or put function calls on the LHS of assignments:

Array (x, y) := z                              // array update (note that arrays are not native)

Or change a function to point to a different code point:

TRIG := (x > y -> SIN, COS)                    // excellent way to perplex any reader
// but can achieve something like anonymous funs

Strachey died tragically young. Oh, that he could have continued, and collaborated with the wealth of PL research going on at the time.

AntC

3. IMO

All that buzz about RT is going nowhere.

Maybe, I’ve learned it wrong. But I’ve never heard of RT when I was absorbed in an imperative world. RT is a common term in functional programming. I’ve read the term and it’s “definition” just after a couple of Haskell’s tutorials.

To me, it’s simple as that: RT is the ability to switch sentences that refer to the same thing without changing the result.

In Haskell, you can do that freely. Even inside IO monad! If you do anything wrong, the compiler won’t let you continue. In Java, you can do that too, but you need to be aware of all the environment and sequencing in order to preserve the result.

Sequencing in the IO monad is enforced by the types, an IO String will not fit as a valid argument for a function that needs a String, so you will not be able to inline an IO String value, you are enforced to use the bind operator.