Haskell and C—structs

When creating Haskell bindings for a C library one will sooner or later have to deal with ‘structs’. Let’s look at how it can be done using hsc2hs.

Here’s a somewhat silly struct that’ll do for this example:

typedef struct {
    int a;
    int b;
} Foo;

It goes the file foo.h. I need some way of making sure that the data was passed properly from Haskell to C so here’s a declaration for a function to print an instance of Foo:

void print_foo(Foo *);

The actual implementation of print_foo goes into foo.c:

void
print_foo(Foo *f)
{
    printf("%s\n", __FUNCTION__);
    printf("f->a: %i\n", f->a);
    printf("f->b: %i\n", f->b);
}

No surprises so far. Now onto the Haskell side of things. First some basic setup:

{-# OPTIONS -ffi #-}
module Main
    where

import Foreign
import Foreign.C.Types

That makes sure I don’t forget to tell GHC that I’m using the foreign function interface (FFI) and imports everything I need for the rest. hsc2hs needs to know about the struct so I simply include the header file. I also need the Haskell representation of Foo, called Bar here, and for convenience I add a type for a pointer to Bar:

#include "foo.h"

data Bar = Bar { a :: Int, b :: Int }
type BarPtr = Ptr (Bar)

Now I’m ready to add the declaration of the “foreign” function:

foreign import ccall "static foo.h print_foo"
    f_print_foo :: BarPtr -> IO ()

Looking through the standard modules it isn’t completely obvious how to create a BarPtr for passing to f_print_foo. Whit the help of people on #haskell I found with, which has the type Storable a => a -> (Ptr a -> IO b) -> IO b. That means I have to make Bar a Storable. According to the documentation on Storable and some experimentation I found that for this particular example I only need full implementations of sizeOf, alignment and poke:

instance Storable Bar where
    sizeOf _ = (#size Foo)
    alignment _ = alignment (undefined :: CInt)
    peek _ = error "peek is not implemented"
    poke ptr (Bar a' b') = do
        (#poke Foo, a) ptr a'
        (#poke Foo, b) ptr b'

([Edited 09-08--2007 07:43 BST] See DeeJay’s comment below for an explanation of the definition of alignment.)

Using with every time I have to call f_print_foo will get tiring so here’s a function that’s a bit more convenient to use:

printFoo b = with b f_print_foo

Now I can write a small main function to test it all:

main = printFoo $ Bar { a=17, b=47 }

It works beautifully. However it’s very limited since it only covers the cases when a C function takes a pointer to a struct as a pure in argument. What about inout? (It’ll be easy to see how to deal with out arguments once the inout case is covered.) So, here’s a C function that adds 1 to one of the members in the struct:

void
add_a(Foo *f)
{
    printf("%s\n", __FUNCTION__);
    f->a++;
}

Of course a declaration in the header file is needed as well, but that’s pretty obvious so I’ll skip it here. The declaration of the foreign function is as simple as for f_print_foo:

foreign import ccall "static foo.h add_a"
    f_add_a :: BarPtr -> IO ()

Now comes the interesting part. Writing a convenience function for f_add_a isn’t as straight forward as for f_print_foo. I think something with the type Bar -> IO Bar would be useful. with creates a temporary BarPtr for the duration of the call which means I have to have an inner function that takes the Bar part out of the BarPtr and returns it inside IO. Luckily peek does exactly that. Adding an implementation of it means that Bar as a Storable is implemented like this:

instance Storable Bar where
    sizeOf _ = (#size Foo)
    alignment _ = 1 -- ???
    peek ptr = do
        a' <- (#peek Foo, a) ptr
        b' <- (#peek Foo, b) ptr
        return Bar { a=a', b=b' }
    poke ptr (Bar a' b') = do
        (#poke Foo, a) ptr a'
        (#poke Foo, b) ptr b'

And the convenience function for f_add_a can be written like this:

addA b = with b $ \ p -> f_add_a p >> peek p

Then I can modify the main function to test this as well:

main = do
    b <- return $ Bar { a=17, b=47 }
    printFoo b
    d <- addA b
    printFoo b
    printFoo d

Indeed, produces the expected out put:

print_foo
f->a: 17
f->b: 47
add_a
print_foo
f->a: 17
f->b: 47
print_foo
f->a: 18
f->b: 47

Pure out arguments can be handled using alloca with a function similar to the one I pass to with in addA above.

Share

9 Comments

  1. Thanks for the informative post. When messing around with the foreign function interface some time ago I certainly could have used this information. I spent quite some time reading up on structure packing and alignment. There seems to be a fairly standard way of packing structures, however I’m sure there are plenty of exceptions and oddities. The hsc2hs special constructs take the pain out of poking and peeking data to and from structs. Otherwise you would have to determine the offsets manually (taking padding into account) and code that explicitly in Haskell. The hsc2hs special constructs just use some C in the process of the Haskell code generation to compute the offsets.

    So your implementation of sizeOf, peek and poke are fine, however defining alignment to be 1 is not! An incorrect alignment will work for x86 however a program will suffer a significant performance penalty and in other architectures it will raise an exception in the processor!

    Wikipedia has an entry on struct alignment which references a short definition and example of alignment on MSDN, an article on IBM developerworks discussing the performance issues with alignment as well as a further article about alignment. In addition to its article, MSDN has a collection of alignment examples. As an aside here is an article discussing the implications of structure padding.

    To summarize: Your particular struct should have an alignment of 4 not 1. An int (32bit) has a size of 4 bytes and an alignment of 4 (You can verify this with ghci). Therefore an int must be placed at a memory location which is divisible by 4. When an int is within a structure, the structure is padded to ensure that the int starts at a 4 byte boundary relative to the start of the structure. However if the structure were to be misaligned in memory the int would not aligned in the global sense. Therefore the structure should also have an alignment of 4. To correct your specific example:

    alignment _ = alignment (undefined::CInt)

    In general the struct should have the same alignment as the largest data type within the struct. So for example a double is 8 byte aligned on Windows and 4 byte aligned on Linux and OS X, so a struct with a double or a 64 bit integer is likely to be the largest data field and thus the struct will have a matching alignment. While all of the links above discuss structure padding rather than structure alignment explicitly, I believe my rule is correct and that the same logic holds for C unions.

    Finally, GCC has an extension which allows one to programmatically query the alignment for data types. You can use this to experiment with more complex structs such as the examples that MSDN provide and verify my claims.

  2. DeeJay,

    Thanks for that excellent comment. My original code still has some question marks after the alignment since I wasn’t sure of it. I’ll modify the post to correct for my new-found knowledge.

    One thing I’m still wondering though, is it always that easy, just pick the size of the largest (built-in) data type within the struct, or will the CPU’s address size or word size ever play a role? I guess what I’m really wondering is if I have to take special care to write code to run on both 32-bit and 64-bit platforms?

  3. I am very confident that as long as you define the alignment of a struct in terms of the alignment of basic types within the struct, rather than a constant/literal then you should be fine. E.g. an Int maybe 4 byte aligned on a 32bit architecture and 8 byte aligned on a 64bit architecture (I don’t know) but as long as you define the alignment in terms of something else you should be correct. Having an alignment that is too small is potentially disastrous for some architectures, however having an alignment that is too big will not cause any problems other than wasting space (fragmentation) and less efficient use of the cache. As alignment values are powers of two, any larger alignment will be an integer multiple of any smaller alignment.

    Anyway after further exploration down the rabbit hole that is GHC’s source code I am pretty sure that the alignment field in the type class Storable is not actually used in the implementation of allocating memory. It seems to be purposed as information that can be queried by the programmer and little more. So having an alignment of 1 does no more than mislead a programmer… not the low level allocation routines.

    For what it is worth, here is an interesting approach to computing alignment using C++ templates.

    Alignment becomes of particular concern when dealing with SSE on x86 and AltiVec on PowerPC. As these instructions work on 128bit/16byte data chunks, they must be 16 byte aligned. I’m assuming that GHC doesn’t support these instructions, however if you wanted to add it, you would have to make sure that the allocation routines respected these constraints… and thus the flip-side of my above observation is that defining alignment to be 16 wouldn’t cut it.

  4. I use this trick for computing the alignment:

    #include <stddef.h>
    #let alignment t = "%lu", (unsigned long)offsetof(struct {char x__; t (y__); }, y__)
    
    instance Storable Foo where
        sizeOf    _ = #size struct foo
        alignment _ = #alignment struct foo
        {- ... -}
    
  5. Wow, a year later and I still found this post rather useful. I had been trying to make sense of how I should be using the (#peek) & (#poke) directives in hsc2hs, and this was very helpful.

  6. Pingback: gamr7.com » Accessing Recursive Haskell Data Structures from C/Python

  7. It’s now been two years since this article was posted, but I’ve bookmarked it, because it’s going to be extremely helpful in an upcoming program I’ll be writing. Thanks a lot!

  8. @Vincent, I’m happy you find the post useful. Please let me know if there’s any extra info that’s worth adding, either as a comment here or in a new post.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>