Posts tagged ‘haskell’

Series of posts on testing Haskell code

I’ve been using both QuickCheck and HUnit to make sure that the functions in dataenc both have the expected properties and follow the specifications.

This morning I stumbled on a series of posts by Matthew Podwysocki on Functional Programming Unit Testing (part 1, part 2, part 3, part 4, part). I think I really could have used these posts about two years ago when I first ventured into the world of testing Haskell. The only new thing I learnt was the existance of the package test-framwork in parth three. Anyway, they are a good introduction to testing Haskell code (and F#).

Useful thing when adopting test-framework after already using HUnit

I found test-framework today. In adopting it with my already existing unit tests I found this function to be rather useful:

import Test.HUnit
import qualified Test.Framework.Providers.API as TFAPI
 
unitTest2TFTest :: Test -> [TFAPI.Test]
unitTest2TFTest t = let
        unitTest2TFTest' desc (TestCase a) = [testCase desc a]
        unitTest2TFTest' desc (TestLabel s t) = unitTest2TFTest' (desc ++ ":" ++ s) t
        unitTest2TFTest' desc (TestList ts) = concat $ map (unitTest2TFTest' desc) ts
    in unitTest2TFTest' "" t

dataenc 0.12 posted to HackageDB

Yesterday I implemented yEncoding in dataenc and uploaded version 0.12 to HackageDB.

Haskell and Time

I’ve found myself getting confused over the Data.Time modules a few times now. Today I wanted to do something that I ought to be simple both to do (it was) and find out how to do (it wasn’t, at least not for me). I thought I’d write it down, because I’m bound to forget about it otherwise, and it might be helpful for some other poor soul out there.

I wanted to store a date in a database (SQLite) using Haskell (HDBC). Since I’m only interested in the date and not the time I chose to represent it in Haskell as a Date.Time.Calendar.Day. For Day 0 means 1858-11-17. HDBC on the other hand needs the date in seconds since epoch (1970-01-01 00:00 UTC), which conveniently can be represented with a single Integer. Time.Data.Clock.POSIX contains a few handy functions to convert between POSIXTime (NominalDiffTime) and UTCTime. However, there are no special functions for converting between an Integer and NominalDiffTime. The way I found was to do it via the class types that NominalDiffTime implements.

First I thought that Enum might offer the solution, but my experiments with that was a bit confusing:

> pt <- getPOSIXTime
1221603766.526661s
> fromEnum pt
4118657661830593344
> toEnum 4118657661830593344 :: POSIXTime 
4118657.661830593344s

Not really the result I had hoped for. No, instead the function to use to get the Integer is floor, and the only way I found to convert a NominalDiffTime from an Integer is to go via a Ratio:

> fromRational (1221603766 % 1) :: POSIXTime
1221603766s

Have I missed something or is this the best way to achieve this?

Confusion with HaskellDB

Dear LazyWeb,

I’ve recently made some comparisons between HaskellDB and HDBC. My gut feeling is that I’d like to use HaskellDB due to its type safety and that it allows me to abstract away from SQL—something in me twitches every time I see strings containing SQL mixed in with other code, no matter what language it is. However there are some things that confuse me with HaskellDB. After overcoming the initial puzzlement that I seem to hit whenever I look at a Haskell API I stumbled on the apparent lack of support for primary keys. This surprised me and to check if I overlooked something I wrote the following code:

module Main where
 
import System.Environment
import System.IO
import Database.HaskellDB.HSQL.SQLite3
import Database.HaskellDB.DBSpec
 
testDBOpts = DBOptions { useBString=False }
 
sqliteOptions= SQLiteOptions { filepath="replace me", mode=ReadMode }
 
main :: IO ()
main = do
    dbfn <- fmap (!! 0) getArgs
    let options = sqliteOptions { filepath=dbfn }
    sqliteConnect options (dbToDBSpec False "extracted.db") >>= print

This is the surprising output when I point this to two different databases. The first database has no primary key:

% sqlite3 test1.db .dump
BEGIN TRANSACTION;
CREATE TABLE person (name text  not null,
                     age int);
COMMIT;

and the DBInfo looks the way I expected:

% ./extract test1.db 
DBInfo {dbname = "extracted.db", opts = DBOptions {useBString = False},
tbls = [TInfo {tname = "person", cols = [CInfo {cname = "name",
descr = (StringT,False)},CInfo {cname = "age", descr = (StringT,True)}]}]}

The second database does have a primary key:

% sqlite3 test2.db .dump
BEGIN TRANSACTION;
CREATE TABLE person (name text primary key not null, age int);
COMMIT;

but not only does the DBInfo lack any mention of it, it also contains the table info twice:

% ./extract test2.db 
DBInfo {dbname = "extracted.db", opts = DBOptions {useBString = False},
tbls = [TInfo {tname = "person", cols = [CInfo {cname = "name",
descr = (StringT,False)},CInfo {cname = "age", descr = (StringT,True)}]},
TInfo {tname = "person", cols = [CInfo {cname = "name",
descr = (StringT,False)},CInfo {cname = "age", descr = (StringT,True)}]}]}

Am I doing something fundamentally wrong or does HaskellDB not support the notion of a primary key?

Is it a bug in HaskellDB that the second extracted DBInfo contains two instances of TInfo when the database clearly only has one table?

TagSoup, meet Parsec!

Recently I began writing a tool to scrape some information off a web site for some off-line processing. After writing up the basics using TagSoup I showed what I had to a colleague. His first comment was “Can’t you use Parsec for that?” It took me a second to realise that he didn’t mean that I should write my own XML parser but rather that Parsec allows writing parsers of a list of anything. So I thought I’d see just what it’d take to create a parser for [Tag].

A look at the string parser shipped with Parsec offered a lot of inspiration.

First the basic type, TagParser:

type TagParser = GenParser Tag

The basic function of Parsec is tokenPrim, basically that’s what other basic parsers use. Taking a cue from the string parser implementation I defined a function called satisfy:

satisfy f = tokenPrim
        show
        (\ pos t _ -> updatePosTag pos t)
        (\ t -> if (f t) then Just t else Nothing)

The positioning in a list of tags simply an increase of column, irrespective of what tag is processed:

updatePosTag s _ = incSourceColumn s 1

Now I have enough to create the first Tag parser—one that accepts a single instance of the specified kind:

tag t = satisfy (~== t) <?> show t

It’s important to stick the supplied tag on the right of (~==). See its documentation for why that is. The second parser is one that accepts any kind of tag:

anyTag = satisfy (const True)

So far so good. The next parser to implement is one that accepts any kind of tag out of a list of tags. Here I want to make use of the convenient behaviour of (~==) so I’ll need to implement a custom version of elem:

l `elemTag` r = or $ l `elemT` r
    where
        l `elemT` [] = [False]
        l `elemT` (r:rs) = (l ~== r) : l `elemT` rs

With that in place it’s easy to implement oneOf and noneOf:

oneOf ts = satisfy (`elemTag` ts)
noneOf ts = satisfy (\ t -> not (t `elemTag` ts))

So, as an example of what this can be used for here is a re-implementation of TagSoup’s partitions:

partitions t = liftM2 (:)
        (many $ noneOf [t])
        (many $ liftM2 (:) (tag t) (many $ noneOf [t]))

Of course the big question is whether I’ll rewrite my original code using Parsec. Hmm, probably not in this case, but the next time I need to do some web page scraping it offers yet another option for doing it.

More prefixes

As Edward pointed out the code in the previous post expressions such as a .+. b .+. c are problematic. By expanding the expression and introducing a few parentheses it becomes apparent what the problem is:

fromIB $ (toIB (fromIB $ (toIB a) `addIB` (toIB b))) `addIB` (toIB c)

The problem is that inner fromIB. Unfortunately GHC doesn’t realise that the left-most sum could take on any type that is a CIB, the exact type doesn’t really matter since the result is passed to toIB anyway. It would be sort of cool if I could tell the compiler to prefer a specific CIB, basically a directive along the lines of “if in doubt, use Bytes”. I don’t think that’s possible in GHC. As it stands one would have to specify the type of the inner sum:

(a .+. b :: Bytes) .+. c :: KBytes

One possible solution would be to remove the call to fromIB from the definition of .+. and instead require the user to call it explicitly:

fromIB $ a .+. b .+. c :: KBytes

I suppose it’s all right, but not quite as elegant as I had hoped.

Now on to the more interesting part of Edward’s comment. I needed quite a bit of clarification, but I now know what he is proposing, I even think I understand it ;-)

The general idea is to have the compiler choose the largest prefix that can represent the sum of two values without losing precision. The represention I used didn’t have the problem of ever losing precision, so I’ll change the representation to better show what Edward meant.

First off I need some extensions to GHC:

{-# OPTIONS_GHC -XGeneralizedNewtypeDeriving
        -XMultiParamTypeClasses
        -XFunctionalDependencies #-}

Now the prefixes are represented with a single integer (or rather a single instance of Num). This is easily done thanks to GeneralizedNewtypeDeriving.

newtype Bytes a = Bytes a
    deriving (Eq, Show, Num)
 
newtype KBytes a = KBytes a
    deriving (Eq, Show, Num)
 
newtype KiBytes a = KiBytes a
    deriving (Eq, Show, Num)

Now I need a typeclass to bind together two types in a ‘lesser-than-or-equal’ relationship and provide a conversion function:

class LEq s u where
    lower :: Num a => s a -> u a

Now the relation has to be implemented for the prefixes. In short the following says that Bytes is the less than both KBytes and KiBytes and that each prefix is less than or equal to itself:

instance LEq Bytes Bytes where
    lower = id
 
instance LEq KBytes KBytes where
    lower = id
 
instance LEq KiBytes KiBytes where
    lower = id
 
instance LEq KBytes Bytes where
    lower (KBytes k) = Bytes $ k * 10^3
 
instance LEq KiBytes Bytes where
    lower (KiBytes ki) = Bytes $ ki * 2 ^ 10

Now there’s a second relationship that is designed to relate two prefixes to a third one. Basically the third is the largest prefix that can be used to represent the sum of the other two without a loss of precision. One could say that the third is where the other two meet.

class (LEq s u, LEq t u) => Meet s t u | s t -> u

I have to manually define where the different prefixes meet:

instance Meet Bytes Bytes Bytes
instance Meet KBytes Bytes Bytes
instance Meet Bytes KBytes Bytes
instance Meet KiBytes Bytes Bytes
instance Meet Bytes KiBytes Bytes
instance Meet KBytes KBytes KBytes
instance Meet KBytes KiBytes Bytes
instance Meet KiBytes KBytes Bytes
instance Meet KiBytes KiBytes KiBytes

Now I can define addition and subtraction in terms of Meet instances:

(.+.) :: (Meet s t u, Num a, Num (u a)) => s a -> t a -> u a
(.+.) a b = (lower a) + (lower b)
 
(.-.) :: (Meet s t u, Num a, Num (u a)) => s a -> t a -> u a
(.-.) a b = (lower a) - (lower b)

Finally I’ve arrived at the destination, but as so often I have to admit that the journey was at least half the fun.

*Prefixes> let i = KiBytes 4
*Prefixes> let j = KBytes 3
*Prefixes> let k = Bytes 900
*Prefixes> i .+. j
Bytes 7096
*Prefixes> i .+. i
KiBytes 8
*Prefixes> i .+. j .+. k
Bytes 7996

Playing with prefixes

I just realised that it’s been over a month since I posted something. All I can do is blame life ;-) With a daughter of 8 months and a recent switch of positions I just haven’t found time to play around really. Hopefully that’ll change now that the situation at work is “calming down” a bit.

Anyway, here’s a bit of code I played with today. A friend described a problem inside a tool related to reporting of free and used memory. Apparently, depending on the location in the code, memory could be stored in integers in bytes or in kilobytes (kB) or in kibibytes (Ki) or in mebibytes (Mi) etc. In every place the prefix would be ‘recorded’ in the variable name, resulting in a situation where a change of prefix used in one place would force a change of variable names in callers/callees. A bit of a mess. I thought I’d try to explore what Haskell’s type system could produce ;-)

First approach

My first thought was to create a data type for the prefixes:

data Size = Bytes Int
    | KB Int Int
    | KiBi Int Int
    deriving (Show, Eq)

Both for KB and KiBi I record the number of full KBs (Kis) and the remaining bytes. Then I thought that it’d be easiest to do all the calculations in bytes, which required a conversion function:

toBytes (KB i r) = Bytes (10^3 * i + r)
toBytes (KiBi i r) = Bytes $ 2^10 * i + r
toBytes b = b

After that it’s possible to make Size an instance of Num (incomplete, but the rest of the functions in Num are straight forward):

instance Num Size where
    (Bytes b1) + (Bytes b2) = Bytes $ b1 + b2
    s1  + s2 = (toBytes s1) + (toBytes s2)
    (Bytes b1) - (Bytes b2) = Bytes $ b1 - b2
    s1  - s2 = (toBytes s1) - (toBytes s2)

Unfortunately the result will always be in Bytes so a few other conversion functions are necessary:

toKB (Bytes i) = uncurry KB $ divMod i (10^3)
toKB s = toKB $ toBytes s
 
toKiBi (Bytes i) = uncurry KiBi $ divMod i (2^10)
toKiBi s = toKiBi $ toBytes s

Now it’s easy to create a type for bookeeping of memory and making it an instance of Num as well:

data MemUsage = Free Size | Used Size
    deriving (Eq, Show)
 
instance Num MemUsage where
    (Free i) + (Free j) = Free $ i + j
    (Used i) + (Used j) = Used $ i + j
    (Free i) + (Used j) = Free $ i - j
    (Used i) + (Free j) = Used $ i - j
 
    (Free i) - (Free j) = Free $ i - j
    (Used i) - (Used j) = Used $ i - j
    (Free i) - (Used j) = Free $ i + j
    (Used i) - (Free j) = Used $ i + j

On some level this ‘algebra’ for memory usage makes sense, though refinements are certainly possible e.g. . I suspect there are some interesting choices to be made regarding the remaining functions in Num, e.g. should negate turn free memory into used memory?

I find this solution somewhat lacking in the extensibility department—as the commonly available amount of memory grows (the code above is stuck in the 80′s it seems) changes have to be made to Size. The same goes for changes to support non-standard prefixes, probably not a common problem but It would be nice to have a solution that is “independently extensible”.

Second approach

Here my idea was to create one type per prefix, use a separate type for the calculations and use a type class for the conversions. Quite straightforward when put like that, though it took me a while to come up with it ;-)

Here is the internal representation used during calculations:

data IB = IB Int
    deriving (Eq, Show)

Simple enough. Then the class for conversions to and from IB:

class CIB a where
    toIB :: a -> IB
    fromIB :: IB -> a

No surprises there either. Now I want to add and subtract, but due to the definition of Num (it requires all arguments to be of the same type, e.g. the type of addition is (+) :: a -> a -> a) I’ll need two new functions for that (at least I don’t know a way around this):

(.+.) :: (CIB a, CIB b, CIB c) => a -> b -> c
i .+. j = fromIB $ (toIB i) `addIB` (toIB j)
    where
        addIB (IB i1) (IB i2) = IB (i1 + i2)
 
(.-.) :: (CIB a, CIB b, CIB c) => a -> b -> c
i .-. j = fromIB $ (toIB i) `subIB` (toIB j)
    where
        subIB (IB i1) (IB i2) = IB (i1 - i2)

One interesting observation is that due to the type of .+. and .-. it is possible, and often required, to specify the type of the result, somewhat similar to HSH. Now the stage is set for the prefix types to be defined, starting with plain bytes:

data Bytes = Bytes Int
    deriving (Eq, Show)
 
instance CIB Bytes where
    toIB (Bytes i) = IB i
    fromIB (IB i) = Bytes i

Similarly for kilobytes and kibibytes:

data KBytes = KBytes Int Int
    deriving (Eq, Show)
 
instance CIB KBytes where
    toIB (KBytes i j) = IB $ i * 10^3 + j
    fromIB (IB i) = uncurry KBytes $ divMod i (10^3)
 
data KiBytes = KiBytes Int Int
    deriving (Eq, Show)
 
instance CIB KiBytes where
    toIB (KiBytes i j) = IB $ i * 2^10 + j
    fromIB (IB i) = uncurry KiBytes $ divMod i (2^10)

The data type for memory usage wrap the memory usage:

data MemUsage a = Free a | Used a
    deriving (Eq, Show)

Again I can’t make MemUsage a an instance of Num, instead there are two new operators:

(~+~) :: (CIB a, CIB b, CIB c) => MemUsage a -> MemUsage b -> MemUsage c
(Free i) ~+~ (Free j) = Free $ i .+. j
(Used i) ~+~ (Used j) = Used $ i .+. j
(Free i) ~+~ (Used j) = Free $ i .-. j
(Used i) ~+~ (Free j) = Used $ i .-. j
 
(~-~) :: (CIB a, CIB b, CIB c) => MemUsage a -> MemUsage b -> MemUsage c
(Free i) ~-~ (Free j) = Free $ i .-. j
(Used i) ~-~ (Used j) = Used $ i .-. j
(Free i) ~-~ (Used j) = Free $ i .+. j
(Used i) ~-~ (Free j) = Used $ i .+. j

End thoughts

There is one thing I like about the first approach and one thing I don’t like. That both the types are instances of Num is something I like since I am interested in adding and subtracting memory, though arguably there are functions in Num that don’t make much sense for memory (e.g. multiplication and negation). The thing I don’t like is the need for explicit conversion functions (toKB and friends).

There are also one thing that I like about the second approach and one I don’t like. I would have liked to use the standard functions in Num, introducing new (type-specific) operators is something that I’m never particularly keen on. Though I suppose that implementing separate operators does emphasize that memory isn’t quite like regular integers. I really do like using Haskell’s own syntax to do the conversion, it’s more readable than having to call conversion functions.

That’s it! Comments and suggestions for improvements are always welcome of course.

Google Treasure Hunt primes question

In the comments to my post on the Google treasure hunt Andre asked me to put up my solution for the fourth problem. The problems were personalised and here is mine:

Find the smallest number that can be expressed as
the sum of 3 consecutive prime numbers,
the sum of 29 consecutive prime numbers,
the sum of 373 consecutive prime numbers,
the sum of 665 consecutive prime numbers,
and is itself a prime number.

The idea behind my solution is that if I have four lists of the sums, and a list of primes, then the solution to the problem is the smallest member of the intersection of all five lists.

I saved the problem of finding primes to the last and instead concentrated on summing up lists of integers in a convenient way. I ended up with the following solution to that:

sumN :: Int -> [Integer]
sumN n = map fst sumi
    where
        sumi = tail $ iterate (sumn) (0, primes)
        sumn (r, xs) = (sum $ take n xs, tail xs)

The real work happens in the inner functions. sumn picks out as many numbers as I want and sums them up, the reason for taking and returning a tuple is that sumi can use iterate to calculate the sequence of sums. The first item in the result of iterate a is a, so sumi drops the first one. This lets me create the needed lists of sums:

sum3 = sumN 3
sum29 = sumN 29
sum373 = sumN 373
sum665 = sumN 665

After having a quick look at the implementation of Data.List.intersect I realised it probably would be a bit too slow. Instead I wrote one that exploits that all the involved lists are ordered:

intersection a@(x:xs) b@(y:ys)
    | x == y = x : intersection xs ys
    | x < y = intersection (dropWhile (< y) xs) b
    | x > y = intersection a (dropWhile (x >) ys)

After convincing myself that this code did what I wanted (basically through playing in GHCi using an incorrect definition of primes, primes = [1..]) I turned to the problem of finding the primes. Of course someone else had already written the Haskell code to find primes. That code fitted perfectly with mine since it produces an infinate list of Integers. Brilliant! :-)

Finally here’s the main that allows me to compile the code and get the answer quickly (it’s slightly modified for the medium, basically I wrote it all on one, long line):

main :: IO ()
main = let
        i1 = (primes `intersection` sum665)
        i2 = (sum3 `intersection` sum373)
        s =  i1 `intersection` i2 `intersection` sum29
    in
        print $ head s

Compiled, it runs in about 5.5 seconds on my desktop machine.

Google Treasure Hunt, check!

I personally found the first one to be the trickiest, quite a bit of thinking with a colleague before we saw Pascal’s triangle in the problem. I was a bit worried that things would get trickier from there on. It surely didn’t.

The second, summing certain lines taken from files matching certain glob patterns, was easily solved with a short Haskell program.

The third, a routing problem was probably the easiest and I think most easily solved on paper. I did make a mistake though, which I caught before typing my answer into the webform.

The fourth one, adding sequences of primes, was fairly straight forward. I think it would have presented a bit more trouble if I didn’t make use of Haskell’s laziness and support for infinate lists though.