Python, web forms and cookies

Just the other day I finally got around to something that I’ve wanted to play around with for a fairly long time: posting web forms using Python. As an added bonus I also took a look at dealing with cookies in Python.

For posting forms there is of course a module that makes things a lot easier, mechanize, but I wanted first of all to understand how to do it myself and secondly to avoid using anything but the standard Python modules. It turns out there isn’t much to understand. Say that we have a very simple login form containing two text entries:

<form method="post" action="/login">
<label for="user_name">username</label>
<input type="text" name="user_name" id="user_name" value="" />
<label for="password">password</label>
<input type="password" name="password" id="password" class="sized" />
<input type="submit" class="button" name="login" value="log in" />
</form>

One way to post this form would be the following:

import urllib
import urllib2

login_data = urllib.urlencode({'user_name' : 'foo', 'password' : 'bar'})
resp = urllib2.urlopen('http://url.for.my.site/login', login_data)

Simple enough, I’d say. urllib2.urlopen automatically switches from GET to POST when a data argument is supplied.
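
To see the GET/POST switch without touching the network, here is a minimal sketch. It assumes Python 3, where urllib2 was split into urllib.request and urllib.parse, so the module names differ from the Python 2 code above. The URL is still just the placeholder used in this post, and Request.get_method() reports which HTTP method would be used.

```python
import urllib.parse
import urllib.request

# Same form fields as in the login form; the URL is only a placeholder.
login_data = urllib.parse.urlencode({'user_name': 'foo', 'password': 'bar'})

# Python 3 wants the request body as bytes rather than str.
post_req = urllib.request.Request('http://url.for.my.site/login',
                                  data=login_data.encode('ascii'))
get_req = urllib.request.Request('http://url.for.my.site/login')

print(login_data)             # user_name=foo&password=bar
print(post_req.get_method())  # POST
print(get_req.get_method())   # GET
```

Nothing is sent here; the Request objects are only built and inspected.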

On most sites a cookie is used to track whether a user is logged in or not. Extending the example above to deal with this and enable subsequent requests to the site as a logged-in user leads us to the CookieJar:

import urllib
import urllib2
import cookielib

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'user_name' : 'foo', 'password' : 'bar'})
resp = opener.open('http://url.for.my.site/login', login_data)

After this cj will hold all the cookies returned in the response. You can iterate over them like this:

for c in cj:
    print c.name, c.value
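
The whole round trip can be tried locally. The following is only a sketch under a few assumptions: it is written for Python 3 (cookielib became http.cookiejar and urllib2 became urllib.request), and LoginHandler is a made-up stand-in server that sets a hypothetical session=abc123 cookie on POST and echoes back whatever Cookie header it receives on GET. It does not reflect how any real site behaves.

```python
import threading
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer


class LoginHandler(BaseHTTPRequestHandler):
    """Hypothetical site: sets a cookie on POST, echoes the Cookie header on GET."""

    def do_POST(self):
        length = int(self.headers.get('Content-Length', 0))
        self.rfile.read(length)  # consume the posted form data
        self.send_response(200)
        self.send_header('Set-Cookie', 'session=abc123; Path=/')
        self.send_header('Content-Length', '0')
        self.end_headers()

    def do_GET(self):
        body = self.headers.get('Cookie', '').encode('ascii')
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def login_and_fetch():
    # Port 0 lets the OS pick a free port for the throwaway server.
    server = HTTPServer(('127.0.0.1', 0), LoginHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = 'http://127.0.0.1:%d' % server.server_port
    try:
        cj = CookieJar()
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({}),  # skip any system proxy for localhost
            urllib.request.HTTPCookieProcessor(cj))
        login_data = urllib.parse.urlencode(
            {'user_name': 'foo', 'password': 'bar'}).encode('ascii')
        opener.open(base + '/login', login_data).close()
        cookies = [(c.name, c.value) for c in cj]
        # The same opener sends the stored cookie back on the next request.
        with opener.open(base + '/account') as resp:
            echoed = resp.read().decode('ascii')
        return cookies, echoed
    finally:
        server.shutdown()
        server.server_close()


cookies, echoed = login_and_fetch()
print(cookies)  # [('session', 'abc123')]
print(echoed)   # session=abc123
```

Note that resp.read() is what returns the body of a response; printing the response object itself only shows its repr.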

Making requests with a cookie c is simple as well. Just add c to the cookie jar before making the request:

cj.set_cookie(c)
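
If the cookie does not come from an earlier response, it has to be built by hand first. Here is a rough sketch of that, assuming Python 3 (http.cookiejar instead of cookielib). The make_cookie helper, the cookie name and value, and the /account path are all made up for illustration. The Cookie constructor takes a long list of arguments, so the helper fills in plain defaults. add_cookie_header then shows which header the jar would attach, without any network access.

```python
import urllib.request
from http.cookiejar import Cookie, CookieJar


def make_cookie(name, value, domain, path='/'):
    """Hypothetical helper: build a plain version-0 session cookie."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


cj = CookieJar()
c = make_cookie('session', 'abc123', 'url.for.my.site')
cj.set_cookie(c)

# Ask the jar which Cookie header it would send for this request.
req = urllib.request.Request('http://url.for.my.site/account')
cj.add_cookie_header(req)
print(req.get_header('Cookie'))  # session=abc123
```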

The cookie jar also has a policy object and a method, set_cookie_if_ok, that will set a cookie for a specific request only if the policy allows it. In other words, it seems fairly simple to make sure there is no cookie leakage when making requests to multiple sites. I’ll leave playing with that for another day though.
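
As a first taste of what the policy does, here is a small sketch, again assuming Python 3 module names (http.cookiejar, urllib.request). The make_cookie helper, the cookie names and the second site are invented for the example. With the default policy, a cookie whose Domain attribute does not match the request host should be refused, and cookies in the jar should not be attached to requests for an unrelated host.

```python
import urllib.request
from http.cookiejar import Cookie, CookieJar, DefaultCookiePolicy


def make_cookie(name, value, domain):
    """Hypothetical helper: a version-0 cookie with an explicit Domain attribute."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith('.'),
        path='/', path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


cj = CookieJar(policy=DefaultCookiePolicy())
req = urllib.request.Request('http://url.for.my.site/login')

# Domain matches the request host: the default policy accepts it.
cj.set_cookie_if_ok(make_cookie('session', 'abc123', 'url.for.my.site'), req)
# Domain belongs to an unrelated (made-up) site: the policy refuses it.
cj.set_cookie_if_ok(make_cookie('tracker', 'xyz', '.some.other.site'), req)

names = [c.name for c in cj]
print(names)  # ['session']

# A request to the other site gets no Cookie header from this jar.
other = urllib.request.Request('http://some.other.site/')
cj.add_cookie_header(other)
print(other.get_header('Cookie'))  # None
```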

9 Comments

  1. nice recipe, maybe we can implement a session module using this, which can manage connections from the client.

  3. I had to use c[1] instead of c because I receive a tuple (at least with Python 2.6).

    for c in enumerate(cj): print c[1].name, c[1].value

    Thanks for your article!

  4. Very helpful, but I can’t figure out how to print (or save to txt) the results of the search.

    When I do a print resp command I get this in the shell:

    addinfourl at 21503312 whose fp = >

    Could someone please help?
