Archive for March 2007

More on the weirdness that is the del.icio.us API

As I’ve said before, del.icio.us is not easy to program against. Here’s a brief account of my latest explorations.

I received a report about some problems with epilicious and responded with a pointer to my post about my past experiences. In return I got a suggestion to pick apart the “official” Firefox plugin for del.icio.us. Doing so turned up some interesting details: AFAICS they are using the API in an undocumented way!

First of all, every single request is a POST rather than a GET. The data they send is _user=<cookie>. That cookie is retrieved from Firefox’s cookie jar; it’s set after the user logs in at https://secure.del.icio.us/login. Apparently this allows them to use cookie:cookie as the username and password for Basic Authentication. Also, every request has src=ffbmext1.4.27 as an argument in the URI (1.4.27 is the current version of their plugin). There’s also an undocumented “function” for getting only the hashes of the bookmarks at https://api.del.icio.us/v1/posts/hashes?. Perhaps the most interesting detail is that they don’t seem to bother about throttling at all!
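To make the shape of those requests concrete, here is a minimal sketch that builds (but does not send) such a request with Python 3's urllib.request. The endpoint, the _user field, the src value and the Basic Authentication credentials come from the description above; I read "cookie:cookie" as the literal username and password, which is my interpretation. The function name and the example cookie value are hypothetical.

```python
import base64
import urllib.parse
import urllib.request

API_BASE = 'https://api.del.icio.us/v1/'
PLUGIN_SRC = 'ffbmext1.4.27'  # src value seen in the plugin (1.4.27 is its version)


def build_plugin_style_request(api_function, user_cookie, params=None):
    """Hypothetical helper: build a request shaped like the plugin's, without sending it."""
    query = dict(params or {})
    query['src'] = PLUGIN_SRC                      # every plugin request carries src=...
    url = API_BASE + api_function + '?' + urllib.parse.urlencode(query)
    # The login cookie travels in the POST body as _user=<cookie>.
    body = urllib.parse.urlencode({'_user': user_cookie}).encode('ascii')
    req = urllib.request.Request(url, data=body, method='POST')
    # Assumption: "cookie:cookie" means the literal username and password.
    credentials = base64.b64encode(b'cookie:cookie').decode('ascii')
    req.add_header('Authorization', 'Basic ' + credentials)
    return req


req = build_plugin_style_request('posts/hashes', 'value-from-firefox')
print(req.get_method(), req.full_url)
```

Passing the Request to urllib.request.urlopen would actually send it; keeping construction separate makes it easy to inspect what goes over the wire.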

Hoping that this might offer a way around the increasingly erratic behaviour of the API, I played around with it in Python. No luck though! A few hours of investigation, some quick hacking and lengthy testing (once you’re throttled it takes a LONG time for del.icio.us to let you back in, sometimes more than 10 minutes!!) later, I found myself unable to hold back the profanities. The API usage description is a joke! Programming against something like that is little better than programming Windows based on MSDN ;-)

It seems the official plugin can get away with disregarding throttling because there is user interaction between each call to del.icio.us. It takes a very fast user to trigger the throttling.

I can understand that del.icio.us wasn’t designed to be used as a base for synchronising bookmarks; the throttling clearly shows that. However, waiting one second between requests is not enough to avoid throttling, and there is no indication of how long one remains in a throttled state.
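Since a fixed one-second pause isn’t enough, one option is to back off exponentially whenever the server signals throttling. This is a minimal sketch for Python 3, assuming throttling shows up as an HTTP 503 response (an assumption; check what the service actually returns). The function name is hypothetical, and fetch and sleep are passed in so the logic can be exercised without a network or real waiting.

```python
import time
import urllib.error


def fetch_with_backoff(fetch, url, first_delay=1.0, max_tries=5, sleep=time.sleep):
    """Call fetch(url), doubling the pause after every throttled attempt.

    Assumption: the service signals throttling with HTTP 503. Any other
    error, or a 503 on the final attempt, is re-raised to the caller.
    """
    if max_tries < 1:
        raise ValueError('max_tries must be at least 1')
    delay = first_delay
    for attempt in range(max_tries):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            if err.code != 503 or attempt == max_tries - 1:
                raise
            sleep(delay)  # wait before retrying...
            delay *= 2    # ...and wait twice as long next time
```

Against a live server you would pass urllib.request.urlopen as fetch, since it raises HTTPError for a 503. Given how long a throttled state can last, a larger first_delay is probably wiser than the one-second default.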

This exercise has pushed me to put more energy into looking for alternative backends for epilicious. So far ma.gnolia seems good and I’d like to add support for Google bookmarks as well.

Python, web forms and cookies

Just the other day I finally got around to something I’ve wanted to play with for a fairly long time: posting web forms using Python. As an added bonus I also took a look at dealing with cookies in Python.

For posting forms there is of course a module that makes things a lot easier, mechanize, but I wanted first to understand how to do it myself and second to avoid using anything but the standard Python modules. It turns out there isn’t much to understand. Say we have a very simple login form containing two text entries:

<form method="post" action="/login">
<label for="user_name">username</label>
<input type="text" name="user_name" id="user_name" value="" />
<label for="password">password</label>
<input type="password" name="password" id="password" class="sized" />
<input type="submit" class="button" name="login" value="log in" />
</form>

One way to post this form would be the following:

import urllib
import urllib2

login_data = urllib.urlencode({'user_name' : 'foo', 'password' : 'bar'})
resp = urllib2.urlopen('http://url.for.my.site/login', login_data)

Simple enough, I’d say. urllib2.urlopen automatically switches from GET to POST when data is passed.
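Here is the same idea in Python 3, where urllib.urlencode became urllib.parse.urlencode and urllib2 became urllib.request. It shows what the encoded body looks like and that a Request with data reports POST. One extra detail: a browser also sends the name and value of the submit button that was clicked (login=log in in the form above), which some sites check for, so the sketch includes it. The URL is the placeholder from the example and nothing is sent.

```python
import urllib.parse
import urllib.request

form_fields = {
    'user_name': 'foo',
    'password': 'bar',
    'login': 'log in',  # the clicked submit button is part of what a browser submits
}
# Form data goes over the wire as application/x-www-form-urlencoded bytes.
body = urllib.parse.urlencode(form_fields).encode('ascii')
req = urllib.request.Request('http://url.for.my.site/login', data=body)

print(body)              # b'user_name=foo&password=bar&login=log+in'
print(req.get_method())  # POST
```

Passing req to urllib.request.urlopen would send it; urllib.request also adds the application/x-www-form-urlencoded Content-Type header for you when data is present.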

On most sites a cookie is used to track whether a user is logged in or not. Extending the example above to deal with this and enable subsequent requests to the site as a logged-in user leads us to the CookieJar:

import urllib
import urllib2
import cookielib

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'user_name' : 'foo', 'password' : 'bar'})
resp = opener.open('http://url.for.my.site/login', login_data)

After this, cj will hold all the cookies returned in the response. You can iterate over them like this:

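# iterating over a CookieJar yields Cookie objects directly
# (enumerate(cj) would yield (index, cookie) tuples instead)
for c in cj:
    print c.name, c.value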

Making requests with a cookie c is simple as well; just add c to the cookie jar before making the request:

cj.set_cookie(c)
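That line assumes you already have a Cookie object c. Building one by hand means calling the Cookie constructor with its long list of arguments. This is a minimal sketch for Python 3, where cookielib became http.cookiejar and urllib2 became urllib.request. make_cookie is a hypothetical helper, and the host is the placeholder from the login example. add_cookie_header is the step HTTPCookieProcessor performs on each outgoing request, so calling it directly shows what would be sent without touching the network.

```python
import http.cookiejar
import urllib.request


def make_cookie(name, value, domain, path='/'):
    """Hypothetical helper: build a plain Netscape-style (version 0) cookie."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={})


cj = http.cookiejar.CookieJar()
cj.set_cookie(make_cookie('session', 'abc123', 'url.for.my.site'))

req = urllib.request.Request('http://url.for.my.site/private')
cj.add_cookie_header(req)        # what HTTPCookieProcessor does before sending
print(req.get_header('Cookie'))  # session=abc123
```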

The cookie jar also has a policy object and a method, set_cookie_if_ok, that sets a cookie for a specific request only if the policy allows it. That is, it seems fairly simple to make sure there is no cookie leakage when making requests to multiple sites. I’ll leave playing with that for another day though.
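A first stab at that might look like this. It is a sketch for Python 3, assuming an allow-list is the policy you want: DefaultCookiePolicy(allowed_domains=...) makes set_cookie_if_ok refuse cookies for any other domain, and the jar only attaches a stored cookie to requests for a matching host. make_cookie and both host names are hypothetical.

```python
import http.cookiejar
import urllib.request


def make_cookie(name, value, domain):
    """Hypothetical helper: a plain Netscape-style cookie valid for the whole site."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value, port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path='/', path_specified=True, secure=False, expires=None,
        discard=True, comment=None, comment_url=None, rest={})


# Only accept cookies for the site we actually log in to.
policy = http.cookiejar.DefaultCookiePolicy(allowed_domains=['url.for.my.site'])
cj = http.cookiejar.CookieJar(policy)

login_req = urllib.request.Request('http://url.for.my.site/login')
cj.set_cookie_if_ok(make_cookie('session', 'abc123', 'url.for.my.site'), login_req)
cj.set_cookie_if_ok(make_cookie('tracker', 'xyz', 'ads.example.com'), login_req)
print([c.name for c in cj])  # ['session'] (the ads.example.com cookie was refused)

# The stored cookie is not attached to requests for a different host.
other_req = urllib.request.Request('http://ads.example.com/')
cj.add_cookie_header(other_req)
print(other_req.get_header('Cookie'))  # None
```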

repeat and sequence

It seems to happen to me all the time. I ask a question on haskell-cafe and get an answer involving at least one function I’ve never heard of before. This time around it was sequence (or rather sequence_) and repeat.

I took a look at Hoogle, of course, but found it as sparse as always, so I decided to play a little.

First repeat:

> :t repeat
repeat :: a -> [a]
> repeat 4
[4,4,4,4,4,4,4,4,4,4,4,4,4,...

The list repeats until I interrupt it by pressing Ctrl-C. Let's limit the list to 5 items.

> take 5 $ repeat 2
[2,2,2,2,2]

Simple enough. Can I do “IO stuff” and stuff it in the list?

> take 5 $ repeat getChar
-- error because there's no instance of Show (IO Char)

Ah, that’s another one of those pesky little things when using GHC interactively: an IO action can’t be printed. I’m guessing there’s a reason why the reply I got used sequence_:

> :t sequence_
sequence_ :: (Monad m) => [m a] -> m ()
> :t sequence
sequence :: (Monad m) => [m a] -> m [a]

Combining them seems like a good idea:

> sequence . (take 5) . repeat $ getChar
abcde"abcde"

Brilliant. However, to make sure I really understand what’s happening I wanted to write the functions myself. First my repeat:

foo x = x : foo x

Making infinite lists in Haskell is quite simple :-) Replacing repeat by foo in the code snippets above shows that foo indeed exhibits the same behaviour as repeat.

It proved a little more difficult to get the behaviour of sequence. I usually find it easiest to try for a recursive function first; using higher-order functions is a lot easier once I have grasped a recursive solution. Here’s my recursive version of sequence:

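bar :: Monad m => [m a] -> m [a]
bar [] = return []
bar (h:t) = do
    x <- h          -- run the first action and keep its result
    res <- bar t    -- then run the remaining actions
    return $ x : res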

The termination case is simple. The recursive step is straightforward as well: pull the values out of the monad in order (this matters, since monads do impose an ordering) and at the end put together the list and return it in the monad.

Looking at that recursive definition, it’s not unreasonable to suspect that foldM can be used to define another variant of sequence. Given foldM’s type ((Monad m) => (a -> b -> m a) -> a -> [b] -> m a), it’s obvious that in this case a is a list (e.g. [Char]) and b is a monadic action (e.g. IO Char). This means that the first argument should be of type (Monad m) => [t] -> m t -> m [t], and something like qux here seems to fit the bill:

qux l n = do
    v <- n
    return $ v : l

It’s basically an abridged version of bar from above. It could also be written as a one-line lambda expression: \ l n -> n >>= (\ v -> return $ v : l). The termination case becomes the initial accumulator, the empty list, so a sequence implemented using foldM ought to look like this:

bar2 = foldM (\ l n -> n >>= (\ v -> return $ v : l)) []

Unfortunately this definition has a small bug: the resulting list is reversed, because each new value is consed onto the front of the accumulator. Lifting reverse into the monad takes care of that:

import Control.Monad (foldM, liftM)

bar2 l1 = liftM reverse $ foldM (\ l n -> n >>= (\ v -> return $ v : l)) [] l1

It seems both bar and bar2 mimic the behaviour of sequence, even to the point of being “un-lazy”:

> (take 5) . sequence . repeat $ getChar
abcdefg... (ad infinitum)

I’ll leave adding laziness to people who are smarter than me.

On the “divorce scenario” in DRM…

It’s always fun to find things mentioned in articles that I’ve heard discussions about, or even in some cases discussed myself, years ago. Here’s a piece on DVB’s CPCM. I heard of the “divorce scenario” during my time at Philips. It was fun to see how really smart people would point out that this scenario was largely unsolved in most DRM solutions and then go on trying to “solve” it. In the meantime they totally missed the real point.

The “divorce scenario” is simply the first of a large set of issues that DRM systems will have to “solve” in order to fit with how our societies work and how people interact socially. I believe there are several other scenarios that need special attention within a DRM system, and I also believe many of those scenarios are tied to culture. This leads me to believe that if one wants a DRM system to gain acceptance in all the locations where DVB has been chosen as a standard, then one ends up with a DRM without DRM!

That divorce came up as a special case shows what happens when you let smart, but very technical people create standards that will end up interacting with people’s social life. It’ll most likely be tailored for situations common among white people living in the northern hemisphere.

Paying your users?

Microsoft, if you need to pay people to use your product then it’s probably not a very good one!

You know you are a bit screwed up when…

…you walk into TK Maxx, see a picture of Kelly Osborne and wonder why she would be interested in random numbers.

Epilicious and ma.gnolia.com

As I’ve said before, del.icio.us can be a very fickle partner to interact with. Lately I’ve again started seeing obstinate behaviour from del.icio.us, and yesterday I finally got around to adding support for ma.gnolia.com to epilicious. It offers an API that is “del.icio.us compatible” (I believe some things are missing, but everything epilicious needs is there), so adding the feature was extremely simple.

So far it’s not quite complete. There is no GUI to change the backend store, so you’ll have to interact with GConf directly, and other parts of the GUI lie by still referring to del.icio.us even though ma.gnolia.com is being used. However, the backend is working, and the only problem so far seems to be that ma.gnolia.com at times returns badly formatted XML (I still haven’t investigated it thoroughly enough to say anything definite).
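Since the failures show up as badly formatted XML, a client can at least detect them and treat them like a failed request instead of crashing. This is a minimal sketch using Python 3’s xml.etree.ElementTree. The response shape (a <posts> element containing <post href="..."> elements) is my assumption about a del.icio.us-style reply, and parse_post_urls is a hypothetical name, not epilicious code.

```python
import xml.etree.ElementTree as ET


def parse_post_urls(xml_text):
    """Return the href of every <post> element, or None if the XML is malformed.

    Assumes a del.icio.us-style reply such as
    <posts><post href="..." description="..."/></posts>.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None  # malformed reply: let the caller retry or give up
    return [post.get('href') for post in root.iter('post')]
```

Returning None rather than raising keeps "the server sent garbage" distinct from "the server sent an empty list", which matters when synchronising.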

I haven’t pushed the changes into any of my repos yet. The only way to get it is by using the epiphany-extension-epilicious-pre package in my APT repo. Debian Sid users can simply install the package, others will have to use the source.

Epilicious 0.11

I’ve been sitting on a few changes for a while, and now I’ve finally rolled them into a release. Get it in the normal place.

My programmer personality

I did the Programmer personality test:

Your programmer personality type is:

   PHTB

You’re a Planner.

You may be slow, but you’ll usually find the best solution. If something’s worth doing, it’s worth doing right.

You like coding at a High level.

The world is made up of objects and components, you should create your programs in the same way.

You work best in a Team.

A good group is better than the sum of its parts. The only thing better than a genius programmer is a cohesive group of genius programmers.

You are a liBeral programmer.

Programming is a complex task and you should use white space and comments as freely as possible to help simplify the task. We’re not writing on paper anymore so we can take up as much room as we need.

More on Vista’s “integrity control”

I just noticed a post by Joanna Rutkowska on a very handy little tool: chml.

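For the record, I’d like to point out that this tool further highlights how confused the MIC is in Windows Vista. A no-read-up policy in integrity control? No-read-up is the confidentiality rule of Bell-LaPadula; a Biba-style integrity model forbids reading down and writing up. I rest my case.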