Archive for March 2008

Kid’s play with HTML in Haskell

In my ever-continuing attempts to replace Python by Haskell as my language of first choice I’ve finally managed to dip a toe in the XML/HTML sea. I decided to use the Haskell XML Toolkit (HXT) even though it’s not packaged for Debian (something I might look into doing one day). HXT depends on tagsoup which also isn’t packaged for Debian. Both packages install painlessly thanks to Cabal.

As the title suggests my itch wouldn’t require anything complicated, but when I’ve previously have looked at any Haskell XML library I’ve always shied away. It all just looks so complicated. It turns out it looks worse than it is, and of course the documentation is poor when it comes to simple, concrete examples with adequate explanations. HXT would surely benefit from documentation at a level similar to what’s available for Parsec. I whish I were equipped to write it.

Anyway, this was my problem. I’ve found an interesting audio webcast. The team behind it has published around 90 episodes already and I’d like to listen to all of them. Unfortunately their RSS feed doesn’t include all episodes so I can’t simply use the trusted hpodder to get all episodes. After manually downloading about 20 of them I thought I’d better write some code to make it less labour-intensive. Here’s the complete script:

module Main where

import System.Environment
import Text.XML.HXT.Arrow

isMp3Link = (==) "3pm." . take 4 . reverse

myReadDoc = readDocument [(a_parse_html, "1"), (a_encoding, isoLatin1),
    (a_issue_warnings, "0"), (a_issue_errors, "0")]

myProc src = (runX $ myReadDoc src >>> deep selectMp3Links >>> getAttrValue “href”)
    >>= mapM_ putStrLn

selectMp3Links = hasName “a” >>> hasAttrValue “href” isMp3Link

main = do
    [src] <- getArgs
    myProc src

The thing that took by far the most time was finding out that hasAttrValue exists. I’m currently downloading episodes using the following command line:

curl -L $(for h in $(runhaskell get_mp3links.hs 'http://url.of.page/'); do \
    echo '-O' $h; done)

Yet another set of itches where Haskell has displaced Python as the utensil used for scratching. :-)

Saddle, two SDDL related tools

I’ve just uploaded two small tools that makes it a little easier to deal with SDDL (Security Descriptor Description Language, this is a good resource for SDDL):

  1. saddle-ex - “extract” the security descriptor for a number of different kinds of objects
  2. saddle-pp - a pretty printer for an SDDL string

Full build instructions are included. Download the archive here.

The tools are written (mostly) in Haskell and I’ll probably upload it all to hackage at some point in the future. I’m holding out for a fix to ticket 2097 since a fix for it will allow me to add a third tool which I feel belong in the package.

Suggestions for improvements are always welcome, patches even more so :-)

Omnicodec released

Four days ago I uploaded omnicodec to Hackage. It’s a collection of two simple command-line utilities for encoding and decoding data built on the dataenc package. I’m planning on building a Debian package for it as soon as I find the time.