In my ever-continuing attempts to replace Python with Haskell as my language of first choice I’ve finally managed to dip a toe in the XML/HTML sea. I decided to use the Haskell XML Toolkit (HXT) even though it’s not packaged for Debian (something I might look into doing one day). HXT depends on tagsoup, which also isn’t packaged for Debian. Both packages install painlessly thanks to Cabal.
As the title suggests, my itch wouldn’t require anything complicated, but whenever I’ve previously looked at a Haskell XML library I’ve shied away. It all just looks so complicated. It turns out it looks worse than it is, and of course the documentation is poor when it comes to simple, concrete examples with adequate explanations. HXT would surely benefit from documentation at a level similar to what’s available for Parsec. I wish I were equipped to write it.
Anyway, this was my problem. I’ve found an interesting audio webcast. The team behind it has published around 90 episodes already and I’d like to listen to all of them. Unfortunately their RSS feed doesn’t include all episodes so I can’t simply use the trusted hpodder to get all episodes. After manually downloading about 20 of them I thought I’d better write some code to make it less labour-intensive. Here’s the complete script:
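module Main where
-- Assumed imports: getArgs, plus the arrow API module of the HXT 8.x releases.
import System.Environment (getArgs)
import Text.XML.HXT.Arrow
-- True when the URL ends in ".mp3" (compared back to front).
isMp3Link = (==) "3pm." . take 4 . reverse
-- Parse leniently as HTML and silence warnings and errors.
myReadDoc = readDocument [(a_parse_html, "1"), (a_encoding, isoLatin1),
    (a_issue_warnings, "0"), (a_issue_errors, "0")]
-- Print the href of every <a> element that points at an MP3.
myProc src = (runX $ myReadDoc src >>> deep selectMp3Links >>> getAttrValue "href")
    >>= mapM_ putStrLn
selectMp3Links = hasName "a" >>> hasAttrValue "href" isMp3Link
main = do
    [src] <- getArgs
    myProc src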
The thing that took by far the most time was finding out that hasAttrValue exists. I’m currently downloading episodes using the following command line:
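The predicate handed to hasAttrValue is just a function from String to Bool, so it can be tried out on its own without HXT. Here is a minimal sketch using only base: the first function is the one from the script, the second says the same thing with isSuffixOf, and the third is a hypothetical variant (not in the original script) that also accepts upper-case extensions. The names other than the original definition are made up for illustration.

```haskell
import Data.Char (toLower)
import Data.List (isSuffixOf)

-- The trick from the script: reverse the URL and compare its first
-- four characters with ".mp3" spelled backwards.
isMp3LinkReversed :: String -> Bool
isMp3LinkReversed = (==) "3pm." . take 4 . reverse

-- The same check written with isSuffixOf from Data.List.
isMp3LinkSuffix :: String -> Bool
isMp3LinkSuffix = (".mp3" `isSuffixOf`)

-- Hypothetical variant (an assumption, not part of the script):
-- lower-case the URL first so that ".MP3" is accepted too.
isMp3LinkAnyCase :: String -> Bool
isMp3LinkAnyCase = isMp3LinkSuffix . map toLower
```

Loaded into GHCi, isMp3LinkSuffix "http://example.org/ep1.mp3" evaluates to True, and any of the three could be passed to hasAttrValue "href" in place of isMp3Link.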
curl -L $(for h in $(runhaskell get_mp3links.hs 'http://url.of.page/'); do \
echo '-O' $h; done)
Yet another set of itches where Haskell has displaced Python as the utensil used for scratching.
I just began listening to episode 95 of pauldotcom and was glad to hear that they enjoyed my email. Here’s the complete email I sent them:
Well, something must have changed since I last communicated with you
(see http://therning.org/magnus/archives/257). I’m not sure what
though. I heard you when you were on the Linux Reality audio cast and
thought I’d check you out again, just to see what you were up to. Well,
episode 92 (both parts) was bloody brilliant, episode 93 was good too,
and now I’m halfway into episode 94. I have no recollection of the
earlier episodes being this organised and good. At some point when I
wasn’t listening you must have learnt to rock!
I enjoy the tech segment. The amount of banter is down and the episodes
move along a lot more than I remember. No offence to Twitchy, but I’m
not sad he isn’t as involved any more, you know, Kramer is brilliant but
Seinfeld just wouldn’t be a good show if he were in each and every
scene. Twitchy has more of a “celebrity guest” personality… The only
criticism I have, and this is pushing it I know; given my walk to work
I’d prefer each episode to be around 60 minutes, rather than 80-90.
Keep it up!
PS I’m planning on posting this email on my blog. I’ll put any reply
from you on there as well.
Reading my email on the show sure beats any reply they could have sent by email. At some point I have to go back and check out the other podcast I stopped listening to…
I just received my new Samsung portable music player. The main reason for choosing Samsung was that it plays OGG.
After going through the podcasts I listen to I have two that I want to shame. I won’t name the podcasts that don’t do OGG; I can’t really blame them because the demand for it is probably not that high. The podcasts that deserve a bork-out are LQ and FLOSS Weekly. Both do offer OGG versions, but neither offers a feed for them!
I found the podcast Security Now the other day. (Actually it was before I listened to episode 139 of TLLTS, which contains an interview with SecurityMonkey. Well, back to Security Now.) It’s a rather good show which offers good explanations, though rather basic sometimes, of security-related topics. Episode 38 was on browser security and the latest, 39, discussed buffer overruns. I found it to be a little too basic at times, but it’s a good starting point for someone who’s interested in security but finds reading about it difficult. Anyone with a genuine interest should go to Google and do some searching after listening to the podcasts.
Oh, for episode 39 I’d recommend having pen and paper nearby. Drawing the stack on paper will make the explanation so much clearer.
Well, it sure looks like it. After reading Cory Doctorow’s posting about how the UN is about to kill podcasts I can’t help but wonder where this world is headed. The UN’s homepage greets you with “Welcome to the UN. It’s your world.” It’s increasingly looking like they are turning into yet another US lapdog. It’s a pity, because what the world really needs is a body with power that can stand against the short-termist and often idiotic thinking that the US seems to be filled to the brim with.
I’ve been listening to a few episodes of LugRadio and I’ve found them very enjoyable so far. What I haven’t enjoyed so much though is my silly MP3 player. The episodes are recorded in variable bit rate (VBR) and my player is utterly pants at handling that: I can’t rely on the timer, I can’t see the total length of a track and I can’t fast forward. Add to this that the bloody thing doesn’t remember its position within a track and it’s obvious just how much pain it’s bringing me at the moment. In a relatively recent episode someone mentioned gnormalize as a way to turn such a file into a constant bit rate (CBR) MP3. Just too bad that gnormalize isn’t available as a Debian package.
“Why have an entire program written for something that should be simple to do on the commandline using GStreamer?”
Well, here’s the command line that recodes an MP3 (no matter what bit rate it has) to an MP3 with a CBR of 96:
gst-launch-0.8 filesrc location=lugradio.mp3 ! mad \
! lame bitrate=96 ! filesink location=lugradio_cbr.mp3
Now using my player is slightly less painful.