Using QuickCheck to test C APIs

Last year at ICFP I attended the tutorial on QuickCheck with John Hughes. We got to use the Erlang implementation of QuickCheck to test a C API. Ever since I’ve been planning to do the same thing using Haskell. I’ve put it off for the better part of a year now, but then Francesco Mazzoli wrote about inline-c (Call C functions from Haskell without bindings and I found the motivation to actually start writing some code.

The general idea

Many C APIs are rather stateful beasts so to test it I

  1. generate a sequence of API calls (a program of sorts),
  2. run the sequence against a model,
  3. run the sequence against the real implementation, and
  4. compare the model against the real state each step of the way.

The C API

To begin with I hacked up a simple implementation of a stack in C. The “specification” is

/**
 * Create a stack.
 */
void *create();

/**
 * Push a value onto an existing stack.
 */
void push (void *, int);

/**
 * Pop a value off an existing stack.
 */
int pop(void *);

Using inline-c to create bindings for it is amazingly simple:

{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE TemplateHaskell #-}

module CApi
    where

import qualified Language.C.Inline as C
import Foreign.Ptr

C.include "stack.h"

create :: IO (Ptr ())
create = [C.exp| void * { create() } |]

push :: Ptr () -> C.CInt -> IO ()
push s i = [C.exp| void { push($(void *s), $(int i)) } |]

pop :: Ptr () -> IO C.CInt
pop s = [C.exp| int { pop($(void *s)) } |]

In the code below I import this module qualified.

Representing a program

To represent a sequence of calls I first used a custom type, but later realised that there really was no reason at all to not use a wrapped list:

newtype Program a = P [a]
    deriving (Eq, Foldable, Functor, Show, Traversable)

Then each of the C API functions can be represented with

data Statement = Create | Push Int | Pop
    deriving (Eq, Show)

Arbitrary for Statement

My implementation of Arbitrary for Statement is very simple:

instance Arbitrary Statement where
    arbitrary = oneof [return Create, return Pop, liftM Push arbitrary]
    shrink (Push i) = Push <$> shrink i
    shrink _ = []

That is, arbitrary just returns one of the constructors of Statement, and shrinking only returns anything for the one constructor that takes an argument, Push.

Prerequisites of Arbitrary for Program Statement

I want to ensure that all Program Statement are valid, which means I need to define the model for running the program and functions for checking the precondition of a statement as well as for updating the model (i.e. for running the Statement).

Based on the C API above it seems necessary to track creation, the contents of the stack, and even if it isn’t explicitly mentioned it’s probably a good idea to track the popped value. Using record (Record is imported as R, and Record.Lens as RL) I defined it like this:

type ModelContext = [R.r| { created :: Bool, pop :: Maybe Int, stack :: [Int] } |]

Based on the rather informal specification I coded the pre-conditions for the three statements as

preCond :: ModelContext -> Statement -> Bool
preCond ctx Create = not $ RL.view [R.l| created |] ctx
preCond ctx (Push _) = RL.view [R.l| created |] ctx
preCond ctx Pop = RL.view [R.l| created |] ctx
    where
        l = length $ RL.view [R.l| stack |] ctx

That is

  • Create requires that the stack hasn’t been created already.
  • Push i requires that the stack has been created.
  • Pop requires that the stack has been created and that it holds at least one value. (That last bit isn’t explicitly mentioned but seems like a good thing to assume.)

Furthermore the “specification” suggests the following definition of a function for running a statement:

modelRunStatement :: ModelContext -> Statement -> ModelContext
modelRunStatement ctx Create = RL.set [R.l| created |] True ctx
modelRunStatement ctx (Push i) = RL.over [R.l| stack |] (i :) ctx
modelRunStatement ctx Pop = [R.r| { created = c, pop = headMay s, stack = tail s } |]
    where
        c = RL.view [R.l| created |] ctx
        s = RL.view [R.l| stack |] ctx

(This definition assumes that the model satisfies the pre-conditions, as can be seen in the use of tail.)

Arbitrary for Program Statement

With this in place I can define Arbitrary for Program Statement as follows.

instance Arbitrary (Program Statement) where
    arbitrary = liftM P $ ar baseModelCtx
        where
            ar m = do
                push <- liftM Push arbitrary
                let possible = filter (preCond m) [Create, Pop, push]
                if null possible
                    then return []
                    else do
                        s <- oneof (map return possible)
                        let m' = modelRunStatement m s
                        frequency [(499, liftM2 (:) (return s) (ar m')), (1, return [])]

The idea is to, in each step, choose a valid statement given the provided model and cons it with the result of a recursive call with an updated model. The constant 499 is just an arbitrary one I chose after running arbitrary a few times to see how long the generated programs were.

For shrinking I take advantage of the already existing implementation for lists:

    shrink (P p) = filter allowed $ map P (shrink p)
        where
            allowed = and . snd . mapAccumL go baseModelCtx
                where
                    go ctx s = (modelRunStatement ctx s, preCond ctx s)

Some thoughts so far

I would love making an implementation of Arbitrary s, where s is something that implements a type class that contains preCond, modelRunStatement and anything else needed. I made an attempt using something like

class S a where
    type Ctx a :: *

    baseCtx :: Ctx a
    preCond :: Ctx a -> a -> Bool
    ...

However, when trying to use baseCtx in an implementation of arbitrary I ran into the issue of injectivity. I’m still not entirely sure what that means, or if there is something I can do to work around it. Hopefully someone reading this can offer a solution.

Running the C code

When running the sequence of Statement against the C code I catch the results in

type RealContext = [r| { o :: Ptr (), pop :: Maybe Int } |]

Actually running a statement and capturing the output in a RealContext is easily done using inline-c and record:

realRunStatement :: RealContext -> Statement -> IO RealContext
realRunStatement ctx Create = CApi.create >>= \ ptr -> return $ RL.set [R.l| o |] ptr ctx
realRunStatement ctx (Push i) = CApi.push o (toEnum i) >> return ctx
    where
        o = RL.view [R.l| o |] ctx
realRunStatement ctx Pop = CApi.pop o >>= \ v -> return $ RL.set [R.l| pop |] (Just (fromEnum v)) ctx
    where
        o = RL.view [R.l| o |] ctx

Comparing states

Comparing a ModelContext and a RealContext is easily done:

compCtx :: ModelContext -> RealContext -> Bool
compCtx mc rc = mcC == rcC && mcP == rcP
    where
        mcC = RL.view [R.l| created |] mc
        rcC = RL.view [R.l| o |] rc /= nullPtr
        mcP = RL.view [R.l| pop|] mc
        rcP = RL.view [R.l| pop|] rc

Verifying a Program Statement

With all that in place I can finally write a function for checking the validity of a program:

validProgram :: Program Statement -> IO Bool
validProgram p = and <$> snd <$> mapAccumM go (baseModelCtx, baseRealContext) p
    where
        runSingleStatement mc rc s = realRunStatement rc s >>= \ rc' -> return (modelRunStatement mc s, rc')

        go (mc, rc) s = do
            ctxs@(mc', rc') <- runSingleStatement mc rc s
            return (ctxs, compCtx mc' rc')

(This uses mapAccumM from an earlier post of mine.)

The property, finally!

To wrap this all up I then define the property

prop_program :: Program Statement -> Property
prop_program p = monadicIO $ run (validProgram p) >>= assert

and a main function

main :: IO ()
main = quickCheck prop_program

`mapAccum` in monad

I recently had two functions of very similar shape, only difference was that one was pure and the other need some I/O. The former was easily written using mapAccumL. I failed to find a function like mapAccumL that runs in a monad, so I wrote up the following:

mapAccumM :: (Monad m, Traversable t) => (a -> b -> m (a, c)) -> a -> t b -> m (a, t c)
mapAccumM f a l = swap <$> runStateT (mapM go l) a
    where
        go i = do
            s <- get
            (s', r) <- lift $ f s i
            put s'
            return r

Bring on the comments/suggestions/improvements/etc!

Oh no! Success

This can’t possibly be good for Haskell…

We chose Haskell and the FP Complete stack because we knew the complexity of the problem required a new approach. The result was that we built better software faster than ever, and delivered it defect-free into a production environment where it has proven robust and high-performance

Re. cryptonite

Excellent work Vincent, but then you’ve never done anything but! I’m looking forward to purging all the old crypto packages from ArchHaskell in the near future.

Oh, BTW, why Disqus for comments?

Facebook and PGP

I wonder if this will have any impact at all on PGP usage… I doubt it, but it’s still nice to see it done. The Register also writes about it.

Timing out in conduit

The previous post in my play-with-state-machines-and-conduit series was a bit of a detour into how to combine inputs in conduit. In this post I’ll try to wrap it up by making a conduit pipeline with a timeout.

Prelude

This post doesn’t reflect my first attempt at writing a state machine that times out. My first attempt was to actually write a state machine that timed out. I combined the input from the user with a tick generated once a second into a single event using Either. Each state of the machine then needed to keep track of the number of ticks it had seen in order to time out on seeing the fourth one. The counter was reset on each state transition. Obviously this led to a rather complicated state machine.

Then I decided to attempt to explore having more but simpler parts in the conduit pipeline. I think that strategy resulted in a much cleaner and simpler design.

A state machine for timing out

The state machine for the addition is unchanged from the earlier post and the timing out is added via a second state machine:

data TOState = TOGood Int | TOTimedOut
    deriving (Eq, Show)

data TOSignal a = TONoop | TOVal a | TOTime

toMachine :: Machine TOState (Either t a) (TOSignal a)
toMachine = createMachine (TOGood 0) go
    where
        go (TOGood _) (Right e) = (TOGood 0, TOVal e)

        go (TOGood n) (Left _)
            | n > 3 = (TOTimedOut, TOTime)
            | otherwise = (TOGood (n + 1), TONoop)

        go TOTimedOut _ = (TOTimedOut, TOTime)

It’s a tiny machine of only two states. The ticks come in via Left _ values and cause the counter to be increased each time. The Right e values are simply re-wrapped and passed on together with a resetting of the counter.

The process

The process is built up by combing the users input with a ticker source, the combination is passed through the time-out machine. After that the signal is inspected to see if it’s time to terminate, if not the inner value is extracted so it can be shown to the user.

The ticker source is a source that repeatedly waits followed by yielding a Tick:

sourceTick :: MonadIO m => Int -> Source m Tick
sourceTick w = forever $ do
    liftIO $ threadDelay w
    C.yield (Tick w)

The full process can then be written like this:

process :: ResourceT IO ()
process = ticker source $= timerOuter $= calcRes $$ output
    where
        source = stdinC $= concatMapC unpack $= calculator $= mapC Right

        calculator = wrapMachine calcMachine

        ticker s = whyTMVar s tickSource =$= wrapMachine toMachine
            where
                tickSource = sourceTick 1000000 $= mapC Left

        timerOuter = takeWhileC p1 =$= filterC p2 =$= mapC u
            where
                p1 TOTime = False
                p1 _ = True

                p2 (TOVal _) = True
                p2 _ = False

                u (TOVal a) = a
                u _ = undefined -- partial, but all right

        calcRes = mapC f
            where
                f CalcNothing = ""
                f (CalcResult n) = " =" ++ show n ++ "\n"
                f e@(CalcError {}) = "\n*** " ++ show e ++ "\n"

        output = mapC pack =$ stdoutC

Using the same definition of main as from the earlier post results in the following behaviour:

% runhaskell CalculatorTimeOut.hs
5 6 + =11
5 6 7
*** CalcError (ReadOperator 5 6) "Bad operator"
42 17 + =59

The process can be terminated by either pressing Ctrl-C, just like before, or by waiting for a time-out, like I did above.

Combining inputs in conduit

The code the post on using my simplistic state machine together with conduit will happily wait forever for input, and the only way to terminate it is pressing Ctrl-C. In this series I’m writing a simple adder machine, but my actual use case for these ideas are protocols for communication (see the first post), and waiting forever isn’t such a good thing; I want my machine to time out if input doesn’t arrive in a timely fashion.

I can think of a few ways to achieve this:

  1. Use a watchdog thread that is signalled from the main thread, e.g. by a Conduit that sends a signal as each Char passes through it.
  2. As each Char is read kick off a “timeout thread” which throws an exception back to the main thread, unless the main thread kills it before the timeout expires.
  3. Run a thread that creates ticks that then can be combined with the Chars read and fed into the state machine itself.

Since this is all about state machines I’ve opted for option 3. Furthermore, since I’ve recently finished reading the excellent Parallel and Concurrent Programming in Haskell I decided to attempt writing the conduit code myself instead of using something like stm-conduit.

The idea is to write two functions:

  1. one Source to combine two Sources, and
  2. one Sink that writes its input into a TMVar.

The latter of the two is the easiest one. Given a TMVar it just awaits input, stores it in the TMVar and then calls itself:

sinkTMVar :: MonadIO m => TMVar a -> Sink a m ()
sinkTMVar tmv = forever $ do
    v <- await
    case v of
        Nothing -> return ()
        Just v' -> liftIO (atomically $ putTMVar tmv v')

The other one is only slightly more involved:

whyTMVar :: MonadIO m => Source (ResourceT IO) a -> Source (ResourceT IO) a -> Source m a
whyTMVar src1 src2 = do
    t1 <- liftIO newEmptyTMVarIO 
    t2 <- liftIO newEmptyTMVarIO
    void $ liftIO $ async $ fstProc t1
    void $ liftIO $ async $ sndProc t2
    forever $ liftIO (atomically $ takeTMVar t1 `orElse` takeTMVar t2) >>= C.yield
    
    where
        fstProc t = runResourceT $ src1 $$ sinkTMVar t
        sndProc t = runResourceT $ src2 $$ sinkTMVar t

Rather short and sweet I think. However, there are a few things that I’m not completely sure of yet.

forkIO vs. async vs. resourceForkIO
There is a choice between at least three functions when it comes to creating the threads and I haven’t looked into which one is better to use. AFAIU there may be issues around exception handling and with resources. For now I’ve gone with async for no particular reason at all.
Using TMVar
In this example the input arrives rather slowly, which means having room for a single piece at a time is enough. If the use case changes and input arrives quicker then this decision has to be revisited. I’d probably choose to use stm-conduit in that case since it uses TMChan internally.
Combining only two Sources
Again, this use case doesn’t call for more than two Sources, at least at this point. If the need arises for using more Sources I’ll switch to stm-conduit since it already defines a function to combine a list of Sources.

The next step will be to modify the conduit process and the state machine.

Thought on JavaScript

Over the holidays I’ve been reading Douglas Crockford’s excellent book JavaScript: The Good Parts. About halfway through I came to the conclusion that JavaScript is an “anti-LISP”. There are many reasons to learn LISP, but none of them is “LISP is widely used in industry.” As Eric Raymond is famous words claim, knowing LISP will make you a better programmer. On the other hand there seems to be almost no reasons to learn JavaScript. It sure doesn’t seem to teach anything that’ll make you a better programmer. The only reason I can come up with is “JavaScript is widely used in industry.”

State machine with conduit

Previously I wrote about a simple state machine in Haskell and then offered a simple example, an adder. As I wrote in the initial post I’ve been using this to implement communication protocols. That means IO, and for that I’ve been using conduit. In this post I thought I’d show how easy it is to wrap up the adder in conduit, read input from stdin and produce the result on stdout.

At the top level is a function composing three parts:

  1. input from stdin
  2. processing in the adder
  3. output on stdout

This can be written as this

process :: ResourceT IO ()
process = source $= calc $$ output
    where
        source = stdinC

        output = stdoutC

        calc = concatMapC unpack =$= wrapMachine calcMachine =$= filterC f =$= mapC (pack . show)
            where
                f (CalcResult _) = True
                f _ = False

The input to the adder is ordinary Chars so the ByteStrings generated by stdinC need to be converted before being handed off to the state machine, which is wrapped to make it a Conduit (see below). The state machine is generating more signals that I want printed, so I filter out everything but the CalcResults. Then it’s passed to show and turned into a ByteString before sent to stdoutC.

I think the implementation of wrapMachine shows off the simplicity of conduit rather well:

wrapMachine :: (Monad m) => Machine state event signal -> Conduit event m signal
wrapMachine mach = do
    maybeEvent <- await
    case maybeEvent of
        Nothing -> return ()
        Just event -> let
                (newMach, signals) = stepMachine mach event
            in do
                yield signals
                wrapMachine newMach

The main function is as short as this:

main :: IO ()
main = do
    hSetBuffering stdin NoBuffering
    hSetBuffering stdout NoBuffering
    runResourceT process

The result is about as exciting as expected

% runhaskell Calculator.hs
56 42 +CalcResult 98
17 56 +CalcResult 73

It never exits on its own so one has to use Ctrl-C to terminate. I thought I’d try to address that in the next post by adding a time-out to the adder machine.

Adder as a state machine

In my previous post I wrote about a, probably rather naïve, approach to constructing state machines in Haskell. I ended it with a saying that the pattern matching of Haskell makes it rather simple to manually write the step function required to create a working state machines. Hopefully this post can convince you I’m right.

The vehicle I’ve chosen is a very simple machine capable of reading two integers and adding them. In slightly more detail it’s a machine that:

  1. reads a total of three parts, two integers followed by a plus sign (+)
  2. each part is separated by whitespace
  3. on errors the machine resets and starts over

The signals (a.k.a. the output alphabet) is the following type

data CalcSignal = CalcNothing | CalcResult Int | CalcError CalcStates String
    deriving (Eq, Show)

The events (a.k.a. the input alphabet) is simply Char. The states are

data CalcStates = Start | ReadFirst Int | ReadSecond Int Int | ReadOperator Int Int
    deriving (Eq, Show)

where the names hopefully are self explanatory. The type of the machine is then

type CalcMachine = Machine CalcStates Char CalcSignal

The machine itself can then be written like this:

calcMachine :: CalcMachine
calcMachine = createMachine Start go
    where
        go Start e
            | isNumber e = (ReadFirst (read [e]), CalcNothing)
            | otherwise = (Start, CalcError Start "No number")

        go s@(ReadFirst i) e
            | isNumber e = (ReadFirst (10 * i + read [e]), CalcNothing)
            | isSpace e = (ReadSecond i 0, CalcNothing)
            | otherwise = (Start, CalcError s "Bad format")

        go s@(ReadSecond l i) e
            | isNumber e = (ReadSecond l (10 * i + read [e]), CalcNothing)
            | isSpace e = (ReadOperator l i, CalcNothing)
            | otherwise = (Start, CalcError s "Bad format")

        go s@(ReadOperator i j) e
            | e == '+' = (Start, CalcResult (i + j))
            | isSpace e = (s, CalcNothing)
            | otherwise = (Start, CalcError s "Bad operator")

That’s rather simple and easy to read I find. Though I’m not sure it scales too well to much larger machines. I’ve not really used any DSLs to create large machines either, so I don’t know how well any method scales ;)

To do a bit of exploratory testing it’s handy to create the following function

calculate :: String -> IO ()
calculate = foldM_ go calcMachine
    where 
        go mach c = do
            let (m, s) = stepMachine mach c
            print s 
            return m

Using that function it’s easy to check if the machine works as intended.

> calculate "56 67 +"
CalcNothing
CalcNothing
CalcNothing
CalcNothing
CalcNothing
CalcNothing
CalcResult 123

So far so good. What about the behaviour on errors?

> calculate "5a6 67 +"
CalcNothing
CalcError (ReadFirst 5) "Bad format"
CalcNothing
CalcNothing
CalcNothing
CalcNothing
CalcNothing
CalcResult 73

That looks good enough to me. Though there is (at least) one detail of how the machine works that might be surprising and hence should be fixed, but I’ll leave that as an exercise for the reader ;)

As I mentioned in the previous post I’ve been using this method for writing state machines to implement two different protocols. For the IO I used conduit which means I had to turn the state machine into a conduit. I’ll write about that in a later post though.