CMake, ExternalData, and custom fetch script

I failed to find a concrete example on how to use the CMake module ExternalData with a custom fetch script. Since I finally manage to work out how to use it I thought I’d try to help out the next person who needs to go down this route.

Why ExternalData?

I thought I’d start with a short justification of why I was looking at the module at all.

At work I work with a product that processes images and video. When writing tests we often need some rather large files (from MiB to GiB) as input. The two obvious options are:

  1. Check the files into our Git repo, or
  2. Put them on shared storage

Neither of these are very appealing. The former just doesn’t feel quite right, these are large binary files that rarely, if ever, change, why place them under version control at all? And if they do change the Git repo is likely to balloon in size and impact cloning times negatively. The latter makes it difficult to run our tests on a machine that isn’t on the office network and any changes to the files will break older tests, unless we always only add files, never modify any in place. On the other hand, the former guarantees that the files needed for testing are always available and it is possible to modify the files without breaking older tests. The pro of the latter is that we only download the files needed for the current tests.

ExternalData is one option to address this. On some level it feels like it offers a combination of both options above:

  • It’s possible to use the shared storage
  • When the shared storage isn’t available it’s possible to fall back on downloading the files via other means
  • The layout of the storage is such that modifying in place is much less likely
  • Only the files needed for the currest tests will be downloaded when building off-site

The object store

We do our building in docker images that do have our shared storage mapped in, so I’d like them to take advantage of that. At the same time I want the builds performed off-site to download the files. To get this behaviour I defined two object stores:


The module will search each of these for the required files and download only if they aren’t found. Downloaded files will be put into the first of the stores. Oh, and it’s very important that the first store is given with an absolute path!

The store on the shared storage looks something like this:

└── MD5
    ├── 94ed17f9b6c74a732fba7b243ab945ff
    └── a2036177b190fbee6e9e038b718f1c20

I can then drop a file MyInput.avi.md5 in my source tree with the md5 of the real file (e.g. a2036177b190fbee6e9e038b718f1c20) as the content. Once that is done I can follow the example found in the introduction of the reference documentation.

curl vs sftp

So far so good. This now works on-site, but for off-site use I need to fetch the needed files. The last section of the reference documentation is called Custom Fetch Scripts. It mentions that files are normally downloaded using file(DOWNLOAD). Neither there, nor in the documentation for file is there a mention of what is used under the hood to fetch the files. After asking on in #cmake I found out that it’s curl. While curl does handle SFTP I didn’t get it to work with my known_hosts file, nor with my SSH agent (both from OpenSSH). On the other hand it was rather easy to configure sftp to fetch a file from the internet-facing SSH server we have. Now I just had to hook it into CMake somehow.

Custom fetch script

As the section on “Custom Fetch Scripts” mention three things are needed:

  1. Specify the script via the ExternalDataCustomScript:// protocol.
  2. Tell CMake where it can find the fetch script.
  3. The fetch script itself.

The first two steps are done by providing a URL template and pointing to the script via a special variable:

set(ExternalData_URL_TEMPLATES "ExternalDataCustomScript://sftp/mnt/shared/over/nfs/Objects/%(algo)/%(hash)")
set(ExternalData_CUSTOM_SCRIPT_sftp ${CMAKE_SOURCE_DIR}/cmake/FetchFromSftp.cmake)

It took me a ridiculous amount of time to work out how to write a script that turns out to be rather short. This is an experience that seems to repeat itself when using CMake; it could say something about me, or something about CMake.

get_filename_component(FFS_ObjStoreDir ${ExternalData_CUSTOM_FILE} DIRECTORY)
get_filename_component(FFS_InputFilename ${ExternalData_CUSTOM_LOCATION} NAME)
get_filename_component(FFS_OutputFilename ${ExternalData_CUSTOM_FILE} NAME)

execute_process(COMMAND sftp${ExternalData_CUSTOM_LOCATION}

  set(ExternalData_CUSTOM_ERROR "Failed to fetch from SFTP - ${FFS_SftpErr}")
  file(MAKE_DIRECTORY ${FFS_ObjStoreDir})
  file(RENAME ${FFS_InputFilename} ${FFS_ObjStoreDir}/${FFS_OutputFilename})

This script is run with cmake -P in the binary dir of the CMakeLists.txt where the test is defined, which means it’s oblivious about the project it’s part of. PROJECT_BINARY_DIR is empty and CMAKE_BINARY_DIR is the same as CMAKE_CURRENT_BINARY_DIR. This is the reason why the first store in ExternalData_OBJECT_STORES has to be an absolute path – it’s very difficult, if not impossible, to find the correct placement of the object store otherwise.

Qt5+D-Bus+CMake, a complete example

Yesterday I started digging into Qt5 and D-Bus. I never found a complete example, so I put one together myself:

Final version of JSON to sum type

After some feedback on my previous post I ended up with the following implementation.

instance FromJSON V.VersionRange where
  parseJSON = withObject "VersionRange" go
      go o =
        V.thisVersion <$> o .: "ThisVersion" <|>
        V.laterVersion <$> o .: "LaterVersion" <|>
        V.earlierVersion <$> o .: "EarlierVersion" <|>
        V.WildcardVersion <$> o .: "WildcardVersion" <|>
        nullaryOp V.anyVersion <$> o .: "AnyVersion" <|>
        binaryOp V.unionVersionRanges <$> o .: "UnionVersionRanges" <|>
        binaryOp V.intersectVersionRanges <$> o .: "IntersectVersionRanges" <|>
        V.VersionRangeParens <$> o .: "VersionRangeParens"

      nullaryOp :: a -> Value -> a
      nullaryOp = const

      binaryOp f [a, b] = f a b

Thanks David for your suggestions.

From JSON to sum type

For a while I’ve been planning to take full ownership of the JSON serialisation and parsing in cblrepo. The recent inclusion of instances of ToJSON and FromJSON for Version pushed me to take the first step by writing my own instances for all external types.

When doing this I noticed that all examples in the aeson docs use a product

data Person = Person {
      name :: Text
    , age  :: Int

whereas I had to deal with quite a few sums, e.g. VersionRange. At first I struggled a little with how to write an instance of FromJSON. After quite a bit of thinking I came up with the following, which I think is fairly nice, but I’d really like to hear what others think about it. Maybe I’ve just missed a much simpler way of implementing parseJSON:

instance FromJSON V.VersionRange where
  parseJSON = withObject "VersionRange" go
      go o = do
        lv <- (o .:? "LaterVersion") >>= return . fmap V.laterVersion
        tv <- (o .:? "ThisVersion") >>= return . fmap V.thisVersion
        ev <- (o .:? "EarlierVersion") >>= return . fmap V.earlierVersion
        av <- (o .:? "AnyVersion") >>= \ (_::Maybe [(Int,Int)]) -> return $ Just V.anyVersion
        wv <- (o .:? "WildcardVersion") >>= return . fmap V.WildcardVersion
        uvr <- (o .:? "UnionVersionRanges") >>= return . fmap toUvr
        ivr <- (o .:? "IntersectVersionRanges") >>= return . fmap toIvr
        vrp <- (o .:? "VersionRangeParens") >>= return . fmap V.VersionRangeParens
        maybe (typeMismatch "VersionRange" $ Object o)
          (lv <|> tv <|> ev <|> uvr <|> ivr <|> wv <|> vrp <|> av)

      toUvr [v0, v1] = V.unionVersionRanges v0 v1
      toIvr [v0, v1] = V.intersectVersionRanges v0 v1

Any and all comments and suggestions are more than welcome!

Freer play with effects

In the previous posts on my playing with free I got stuck at combining APIs. I recalled reading a paper on extensible effects as an alternatve to monad transformers. I have to admit to not having finished the paper, and not quite understanding the part I did read. When looking it up again I found that the work had continued and that there is a paper on more extensible effects. (I got to it via

A quick search of Hackage revealed the package extensible-effects with an implementation of the ideas, including the stuff in the latter paper. So, what would the examples from my previous posts look like using extensible effects?


The examples require a few extensions and modules:

{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE TypeOperators #-}


import Control.Eff
import Control.Eff.Lift
import Control.Eff.Operational
import Data.Typeable

Just an API

This part was greatly helped by the fact that there is a example in extensible-effects.

I start with defining the SimpleFile API using GADTs

data SimpleFileAPI a where
  LoadFile :: FilePath -> SimpleFileAPI String
  SaveFile :: FilePath -> String -> SimpleFileAPI ()

The usage of the constructors need to be wrapped up in singleton. To remember that I create two convenience functions

loadFile :: Member (Program SimpleFileAPI) r => FilePath -> Eff r String
loadFile = singleton . LoadFile

saveFile :: Member (Program SimpleFileAPI) r => FilePath -> String -> Eff r ()
saveFile fp = singleton . SaveFile fp

For withSimpleFile I only have to modify the type

withSimpleFile :: Member (Program SimpleFileAPI) r => (String -> String) -> FilePath -> Eff r ()
withSimpleFile f fp = do
  d <- loadFile fp
  let result = f d
  saveFile (fp ++ "_new") result

Now for the gut of it, the interpreter.

runSimpleFile :: (Member (Lift IO) r, SetMember Lift (Lift IO) r) => Eff (Program SimpleFileAPI :> r) a -> Eff r a
runSimpleFile = runProgram f
    f :: (Member (Lift IO) r, SetMember Lift (Lift IO) r) => SimpleFileAPI a -> Eff r a
    f (LoadFile fp) = lift $ readFile fp
    f (SaveFile fp s) = lift $ writeFile fp s

Runnnig it is fairly simple after this

> :! cat test.txt 
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec a diam lectus.
Sed sit amet ipsum mauris. Maecenas congue ligula ac quam viverra nec
consectetur ante hendrerit.
> runLift $ runSimpleFile $ withSimpleFile (map toUpper) "test.txt"
> :! cat test.txt_new 

Now, that was pretty easy. It looks almost exactly like when using Free, only without the Functor instance and rather more complicated types.

Combining two APIs

Now I get to the stuff that I didn’t manage to do using Free; combining two different APIs.

I start with defining another API. This one is truly a play example, sorry for that, but it doesn’t really matter. The type with convenience function looks like this

data StdIoAPI a where
  WriteStrLn :: String -> StdIoAPI ()

writeStrLn :: Member (Program StdIoAPI) r => String -> Eff r ()
writeStrLn = singleton . WriteStrLn

The interpreter then is straight forward

runStdIo :: (Member (Lift IO) r, SetMember Lift (Lift IO) r) => Eff (Program StdIoAPI :> r) a -> Eff r a
runStdIo = runProgram f
    f :: (Member (Lift IO) r, SetMember Lift (Lift IO) r) => StdIoAPI a -> Eff r a
    f (WriteStrLn s) = lift $ putStrLn s

Now I just need a program that combines the two APIs

verboseWithSimpleFile :: (Member (Program StdIoAPI) r, Member (Program SimpleFileAPI) r) =>
                         (String -> String) -> String -> Eff r ()
verboseWithSimpleFile f fp = writeStrLn ("verboseWithSimpleFile on " ++ fp) >> withSimpleFile f fp

That type is surprisingly clear I find, albeit a bit on the long side. Running it is just a matter of combining runStdIo and runSimpleFile.

> :! cat test.txt 
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec a diam lectus.
Sed sit amet ipsum mauris. Maecenas congue ligula ac quam viverra nec
consectetur ante hendrerit.
> runLift $ runSimpleFile $ runStdIo $ verboseWithSimpleFile (map toUpper) "test.txt"
verboseWithSimpleFile on test.txt
> :! cat test.txt_new 

Oh, and it doesn’t matter in what order the interpreters are run!

At this point I got really excited about Eff because now it’s obvious that I’ll be able to write the logging “decorator”, in fact it’s clear that it’ll be rather simple too.

The logging

As before I start with a data type and a convenience function

data LoggerAPI a where
  Log :: String -> LoggerAPI ()

logStr :: Member (Program LoggerAPI) r => String -> Eff r ()
logStr = singleton . Log

For the decorating I can make use of the fact that APIs can be combined like I did above. That is, I don’t need to bother with any coproduct (Sum) or anything like that, I can simply just push in a call to logStr before each use of SimpleFileAPI

logSimpleFileOp :: (Member (Program SimpleFileAPI) r, Member (Program LoggerAPI) r) => SimpleFileAPI a -> Eff r a
logSimpleFileOp op@(LoadFile fp) = logStr ("LoadFile " ++ fp) *> singleton op
logSimpleFileOp op@(SaveFile fp _) = logStr ("SaveFile " ++ fp) *> singleton op

Of course an interpreter is needed as well

runLogger :: (Member (Lift IO) r, SetMember Lift (Lift IO) r) => Eff (Program LoggerAPI :> r) a -> Eff r a
runLogger = runProgram f
    f :: (Member (Lift IO) r, SetMember Lift (Lift IO) r) => LoggerAPI a -> Eff r a
    f (Log s) = lift $ putStrLn s

Running is, once again, a matter of stacking interpreters

> :! cat test.txt
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec a diam lectus.
Sed sit amet ipsum mauris. Maecenas congue ligula ac quam viverra nec
consectetur ante hendrerit.
> runLift $ runLogger $ runSimpleFile $ runProgram logSimpleFileOp $ withSimpleFile (map toUpper) "test.txt"
LoadFile test.txt
SaveFile test.txt_new
> :! cat test.txt_new 

Closing thoughts

With Eff I’ve pretty much arrived where I wanted, I can

  • define APIs of operations in a simple way (simpler than when using Free even).
  • write a definitional interpreter for the operations.
  • combine two different APIs in the same function.
  • translate from one API to another (or even to a set of other APIs).

On top, I can do this without having to write a ridiculous amount of code.

I’m sure there are drawbacks as well. There’s a mention of some of them in the paper. However, for my typical uses of Haskell I haven’t read anything that would be a deal breaker.

Free play, part three

The code in the previous post can do with a bit of cleaning up. I start with introducing a type class for an API that can be run

class RunnableF f where
  runOp :: f a -> IO a

and a function that actually runs iteFile

runF :: (RunnableF o) => Free o a -> IO a
runF = foldFree runOp

A coproduct (Sum) of two runnable APIs can itself be runnable

instance (RunnableF f, RunnableF g) => RunnableF (Sum f g) where
  runOp (InL op) = runOp op
  runOp (InR op) = runOp op

After this all I have to do is to move the guts of runSimpleF and runLogging into implementations for SimpleFil and LogF respectively

instance RunnableF SimpleFileF where
  runOp (LoadFile fp f') = liftM f' $ readFile fp
  runOp (SaveFile fp d r) = writeFile fp d >> return r

instance RunnableF LogF where
  runOp (Log s a)= putStrLn s >> return a

The rest is left as is. Running now looks like this

> :! cat test.txt
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec a diam lectus.
Sed sit amet ipsum mauris. Maecenas congue ligula ac quam viverra nec
consectetur ante hendrerit.
> runF $ withSimpleFile (map toUpper) "test.txt"
> :! cat test.txt_new
> runF $ foldFree loggingSimpleFileI $ withSimpleFile (map toLower) "test.txt"
** load file test.txt
** save file test.txt_new
> :! cat test.txt_new
lorem ipsum dolor sit amet, consectetur adipiscing elit. donec a diam lectus.
sed sit amet ipsum mauris. maecenas congue ligula ac quam viverra nec
consectetur ante hendrerit.

What else?

With these changes it becomes slightly esier to create new basic API types. but there are still a few things I think could be useful:

  • More generic decorators. logSimpleFileI is tied to decorating only SimpleFileF. I’m fairly sure this could be dealt with by a type class LoggableF and have APIs implement it. I also think there’s a rather natural implementation of LoggableF for Sum f g.
  • Combining decorators (and other interpreters). I’m guessing this is what John refers to as “generalizing interpreters.”
  • Combining APIs. I’d prefer making small APIs of related operations, but then that also means I need a nice way of combining APIs. I had a go at using FreeT but simply gave up on bending the types to my will. In any case I’m not completely sure that a stack of various FreeT is a good direction to go.

Free play, part two

This post on Free play builds on my previous one. It was John A De Goes’ post that finally got me started on playing with Free and his post contian the following:

Moreover, not only do algebras compose (that is, if you have algebras f and g, you can compose them into a composite algebra Coproduct f g for some suitable definition of Coproduct), but interpreters also compose — both horizontally and vertically.

And a little bit later he offers a few types and some details, but not all details. How could something like that look in Haskell?

Starting from the code in the previous post I first created a new type for a logging action

data LogF a = Log String a

This type has to be a Functor

instance Functor LogF where
  fmap f (Log s a) = Log s (f a)

The logging should basically decorate a SimpleFileF a action, so I need a function to map one into a Free LogF a

logSimpleFileI :: SimpleFileF a -> Free LogF ()
logSimpleFileI (LoadFile fp _) = liftF $ Log ("** load file " ++ fp) ()
logSimpleFileI (SaveFile fp _ _) = liftF $ Log ("** save file " ++ fp) ()

Now I needed a Coproduct for Functor. Searching hackage only offered up one for Monoid (in monoid-extras) so I first translated one from PureScript, but later I got some help via Twitter and was pointed to two in Haskell, Data.Functor.Coproduct from comonad and Data.Functor.Sum from transformers, I decided on the one from transformers because of its shorter name and the fact that it was very different from my translated-from-PureScript version.

Following John’s example I use Applicative to combine the logging with the file action

loggingSimpleFileI :: SimpleFileF a -> Free (Sum LogF SimpleFileF) a
loggingSimpleFileI op = toLeft (logSimpleFileI op) *> toRight (liftF op)

with toLeft and toRight defined like this

toLeft :: (Functor f, Functor g) => Free f a -> Free (Sum f g) a
toLeft = hoistFree InL
toRight :: (Functor f, Functor g) => Free g a -> Free (Sum f g) a
toRight = hoistFree InR

With all of this in place I can decorate the program from the last post like this foldFree loggingSimpleFileI (withSimpleF toUpper "FreePlay.hs"). What’s left is a way to run it. The function for that is a natural extension of runsimpleFile

runLogging :: Free (Sum LogF SimpleFileF) a -> IO a
runLogging = foldFree f
    f :: (Sum LogF SimpleFileF) a -> IO a
    f (InL op) = g op
    f (InR op) = h op

    g :: LogF a -> IO a
    g (Log s a)= putStrLn s >> return a

    h :: SimpleFileF a -> IO a
    h (LoadFile fp f') = liftM f' $ readFile fp
    h (SaveFile fp d r) = writeFile fp d >> return r

Running the decorated program

runLogging $ foldFree loggingSimpleFileI (withSimpleF toUpper "FreePlay.hs")

does indeed result in the expected output

** load file FreePlay.hs
** save file FreePlay.hs_new

and the file FreePlay.hs_new contains only uppercase letters.

My thoughts

This ability to decorate actions (or compose algebras) is very nice. There’s probably value in the “multiple interpreters for a program” in some domains, but I have a feeling that it could be a hard sell. However, combining it with this kind of composability adds quite a bit of value in my opinion. I must say I don’t think my code scales very well for adding more decorators (composing more algebras), but hopefully some type wizard can show me a way to improve on that.

The code above is rather crude though, and I have another version that cleans it up quite a bit. That’ll be in the next post.

Free play, part one

When I read John A De Goes post A Modern Architecture for FP I found it to be a bit too vague for me, but the topic was just interesting enough to finally push me to play a little with free monads. It’s not the first post I’ve read on the topic, there have been many before. None have quite managed to push me into actually doing something though!

A file API

To make it concrete but still short enough to not bore readers I came up with a small API for working with files:

data SimpleFileF a =
  LoadFile FilePath (String -> a)
  | SaveFile FilePath String a

The free monad wraps a functor, so here’s an implementation of that

instance Functor SimpleFileF where
  fmap f (LoadFile fp f')= LoadFile fp (f . f')
  fmap f (SaveFile fp d a) = SaveFile fp d (f a)

Now for some convenient functions to work with the API type

loadFile :: FilePath -> Free SimpleFileF String
loadFile fp = liftF $ LoadFile fp id

saveFile :: FilePath -> String -> Free SimpleFileF ()
saveFile fp d = liftF $ SaveFile fp d ()

With this in place I can write a somewhat more complex one

withSimpleFile :: (String -> String) -> FilePath -> Free SimpleFileF ()
withSimpleFile f fp = do
  d <- loadFile fp
  let result = f d
  saveFile (fp ++ "_new") result

Now I need a way to run programs using the API

runSimpleFile :: Free SimpleFileF a -> IO a
runSimpleFile = foldFree f
    f (LoadFile fp f') = liftM f' $ readFile fp
    f (SaveFile fp d r) = writeFile fp d >> return r

If this code was save in the file FreePlay.hs I can now convert it all to upper case by using

runSimpleFile $ withSimpleFile toUpper "FreePlay.hs"

which of course will create the file FreePlay.hs_new.

What have I bought so far?

Well, not much really.

So far it’s not much more than a rather roundabout way to limit what IO actions are available. In other words it’s not much more than what can be done by creating a limited IO, as in my earlier posts here and here.

Of course it would be possible to write another run function, e.g. one that doesn’t actually perform the file actions but just says what it would do. The nice thing though is, to use that same metaphor as John does, that I can add a layer to the onion. In other words, I can decorate each use of my SimpleFileF API with some other API. I think it was John’s description of this layering that pushed me to play with Free.

Trick for pre-processing source in CMake

When using <LANG>_COMPILER_LAUCHER or RULE_LAUNCH_COMPILER the following is a nice little pattern to deal with the arguments.

#! /usr/bin/python3

import argparse
import subprocess
import sys

# create an argumentparser for the stuff the launcher script needs, some
# specific to it and some of the compiler arguments
parser = argparse.ArgumentParser(description='Compiler launcher for C.')
parser.add_argument('--flag-for-wrapper', type=str)
parser.add_argument('-I', dest='includes', action='append', type=str, default=[])
parser.add_argument('-D', dest='defines', action='append', type=str, default=[])
parser.add_argument('-o', dest='output', type=str)
parser.add_argument('-c', dest='input', type=str)

args, rest = parser.parse_known_args(sys.argv)

# do stuff with the source, e.g. pass it through the C pre-processor (using the
# arguments picked out above) and then process the output

# build another argument parser to prepare the call to the compiler
cc_parser = argparse.ArgumentParser(description='Parser for removing launcher-specific arguments.')
cc_parser.add_argument('--flag-for-wrapper', type=str)
cc_cmd = cc_parser.parse_known_args(sys.argv)[1][1:]

That is, first create a parser for launcher-specific arguments and the compiler arguments1 that are of interest to the launcher. Then perform the magic of the launcher. Finally, create another parser and use it to remove the launcher-specific arguments and use the remainder to perform the compilation.

  1. Here I’m relying on CMake always putting -c just before the source file, which seems to hold most of the time.

Elm 0.16 with stack

I just noticed that there’s been a new release of the elm tools. The previous release was fairly easy to build using stack, but this release made it a little bit easier still.

The script for cloning should be modified to look like this:

#! /usr/bin/zsh

repos=("git clone -b 0.16"
       "git clone -b 0.16"
       "git clone -b 0.16"
       "git clone -b 0.16"
       "git clone -b 0.16"

for r in ${repos[@]}; do
    eval ${r}

When creating the inital stack.yaml tell stack to use release 7.10.2 of ghc:

% stack init --resolver ghc-7.10.2

That makes it possible to use the installed version of ghc. It’s also unecessary to provide any flag for aeson, which means there’s no need for any manual changed to stack.yaml prior to running (add --install-ghc if you don’t have it on your system already)

% stack solver --modify-stack-yaml

Now just build and install as before:

% stack install elm-{compiler,make,package,repl}
% stack install elm-reactor

That’s it!