Archive for June 2007
The other day I was talking to a mate and former colleague of mine. He'd done a lot of Java and C# before, but recently he was hired by a small company to do Python work. Anyway, he related a funny part of the interview: he mentioned he'd worked with design patterns and was asked to explain one he'd used. He chose Decorator. After he was done explaining, the interviewer commented that surely he meant Proxy. The interviewer was wrong, and my mate suspects this confusion might be common in the Python world due to the language's built-in support for function/method decorators. I suspect he's right. Anyway, he showed me what he was playing with and I couldn't help but play a bit on my own afterwards.
Here’s the class of the core object, a simple self-explanatory piece of code:
class Writer(object):
    def write(self, s):
        print s
Here’s a not very exciting example of using it:
> w = Writer()
> w.write('hello')
hello
We want to decorate it by modifying the string passed to
write in different ways. First here’s a base decorator class:
class WriterDecorator(object):
    def __init__(self, wrappee):
        self.wrappee = wrappee

    def write(self, s):
        self.wrappee.write(s)
Using it is straightforward, and still not very exciting:
> wd = WriterDecorator(w)
> wd.write('hello')
hello
The constructor requires a wrappee object and the implementation of write is straightforward. Strictly speaking this class is unnecessary, but it's convenient once we implement "real" decorators. Here's the first one; it converts the string to upper case before passing it on down the chain:
class UpperWriter(WriterDecorator):
    def write(self, s):
        self.wrappee.write(s.upper())
This is where it gets a little more exciting, not much though:
> uw = UpperWriter(w)
> uw.write('hello')
HELLO
Here's a nice detail about Python that I'd never reflected on myself: constructors are inherited in Python. Here's another decorator, one that makes the string "shouty":
class ShoutWriter(WriterDecorator):
    def write(self, s):
        self.wrappee.write('!'.join([t for t in s.split(' ') if t]) + '!')
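Incidentally, the inherited-constructor detail is easy to verify in isolation. A minimal sketch (the class names Base and Child are made up for illustration, shown in modern Python syntax):

```python
class Base(object):
    def __init__(self, x):
        self.x = x

class Child(Base):
    pass  # defines no __init__ of its own

# Child picks up Base's constructor, just like the decorator
# subclasses above pick up WriterDecorator's.
c = Child(42)
print(c.x)  # 42
```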
Now it’s getting a little more interesting, because the decorators can be combined:
> sw1 = ShoutWriter(w)
> sw1.write('hello again')
hello!again!
> sw2 = ShoutWriter(uw)
> sw2.write('hello again')
HELLO!AGAIN!
Some of these combinations are more useful than others, and if they’re used very often then it might be worth creating a convenience class for them. Here’s one that I imagine could be useful if you’re a writer for The Register:
class YahooWriter(WriterDecorator):
    def __init__(self, wrappee):
        self.wrappee = UpperWriter(ShoutWriter(wrappee))
Using it is simple:
> yw = YahooWriter(w)
> yw.write('hello again')
HELLO!AGAIN!
Well, so far it's been child's play and I wouldn't have bothered writing about this unless I'd taken it a little further. Something about how the convenience class worked felt familiar. I vaguely remembered reading about super being considered harmful, and there seemed to be similarities between the behaviour described there and the desired behaviour when nesting decorators. Rewriting the basic decorator classes using super like this retains their behaviour:
class UpperWriter(WriterDecorator):
    def write(self, s):
        super(UpperWriter, self).write(s.upper())

class ShoutWriter(WriterDecorator):
    def write(self, s):
        super(ShoutWriter, self).write('!'.join([t for t in s.split(' ') if t]) + '!')
What this does, though, is allow implementing YahooWriter like this:
class YahooWriter(UpperWriter, ShoutWriter):
    pass
I think that’s pretty cute.
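Since the snippets above are spread out, here's the whole super-based experiment collected into one runnable sketch. It's in modern Python 3 syntax, and the Writer here collects its output in a list rather than printing it (a change made purely so the result is easy to inspect):

```python
class Writer:
    def __init__(self):
        self.lines = []  # collect output instead of printing

    def write(self, s):
        self.lines.append(s)

class WriterDecorator:
    def __init__(self, wrappee):
        self.wrappee = wrappee

    def write(self, s):
        self.wrappee.write(s)

class UpperWriter(WriterDecorator):
    def write(self, s):
        super().write(s.upper())

class ShoutWriter(WriterDecorator):
    def write(self, s):
        super().write('!'.join(t for t in s.split(' ') if t) + '!')

# The MRO is YahooWriter -> UpperWriter -> ShoutWriter -> WriterDecorator,
# so each super().write() hands the string to the next decorator in line.
class YahooWriter(UpperWriter, ShoutWriter):
    pass

w = Writer()
YahooWriter(w).write('hello again')
print(w.lines[0])  # HELLO!AGAIN!
```

The combination works because Python's method resolution order threads the super().write() calls through every decorator base class in declaration order before finally reaching WriterDecorator.write.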
Here's where I have to stop though. I don't know whether this is even useful. Maybe it has some serious drawbacks that my inexperience and ignorance prevent me from seeing. Has super been used like this somewhere? I'd love pointers to that code.
[Edited 16-06-2007 00:34 BST] Bloody hell, can’t believe I had a spelling error in the title all this time. Embarrassing really!
Just after being added to Planet Haskell I changed the theme of WPi but as always with themes there were things I didn’t really like. I was happy to notice that this time I’d chosen a theme written by someone who knew English which was a relief since the previous theme was commented and even contained id names in Spanish. Still, modifying the theme, especially the style sheet, is a pain. Then I found Firebug. Let’s just say I’m never going to bother looking through the style sheet for a theme again without first having found the exact line number by using Firebug. It’s simply a brilliant add-on for Firefox.
After talking to a mate I dropped
~/.gnupg/gpg-agent.conf and stopped using
ssh-agent altogether. The Debian developers seem to have anticipated this and there’s full support for this in the scripts in
I've finally taken the time to look into getting the webcam that I bought from OpenForEveryone working. After a false start with spca5xx (it doesn't build on recent kernels) I built a kernel module for gspca. Firing up Camorama revealed that the cam was indeed working; however, colours, contrast and brightness were all screwed up and couldn't be changed. That later turned out to be a problem with Camorama rather than with the cam itself; it works perfectly well in Ekiga.
- I had made a manual change to the old theme that really didn’t belong on a Planet. It can only be described as discrimination against IE users.[back]
I’ve read very little about Unicode before but today I had the questionable pleasure of delving a bit deeper into it. Mind you, it still feels like I’ve just dipped a foot in the water, but before today I had only dipped a single toe.
I was especially interested in URI encoding ("percent-encoding") and Unicode. According to RFC 3986:
When a new URI scheme defines a component that represents textual data consisting of characters from the Universal Character Set [UCS], the data should first be encoded as octets according to the UTF-8 character encoding [STD63]; then only those octets that do not correspond to characters in the unreserved set should be percent-encoded.
Of course this particular document is fairly new (January 2005), so I bet there are quite a few URI codecs out there that don't behave this way yet. Another interesting detail is that Microsoft has long supported a special URI encoding, especially suited for dealing with UCS-2i, which takes the form %uhhhh. E.g. the character 'A' would be %41 in the standard encoding; in Microsoft's encoding it looks like %u0041.

So far it's quite straightforward, but then enters something strange in Unicode: compatibility characters. They make a certain sense when they are combinations of a base character and some sort of marker (I'm not sure I'm using the right terminology here). E.g. the character 'å' can be constructed in two ways, either using the code point U+00E5 or by combining an 'a' (U+0061) and the "combining diacritical mark" ' ̊' (U+030A). Of course, comparing these two representations, which are encoded completely differently while having exactly the same semantics, is a bit of a problem. That's solved by canonicalisation, for which there are two standards. I didn't go further into that, because my real problem, the reason I started all of this, is that there are compatibility characters for something called "Halfwidth and Fullwidth Forms" (block FF01–FFEF). For the non-latin characters in that block this makes sense, but for some strange reason all printable characters in the Basic Latin block (0000–007F) are present as "fullwidth forms" as well. The reason for this is unclear to me and I'd really love an explanation. The result is that there apparently is some confusion about just what to do with these "fullwidth forms" when decoding them; in some cases they are treated just like their "halfwidth form" cousins in the Basic Latin block. The end result is that on Microsoft products 'A' can also be encoded as %uff21.
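All of this is easy to poke at from a modern Python prompt. The sketch below uses today's standard library (urllib.parse.quote for RFC 3986-style UTF-8 percent-encoding, unicodedata for the normalisation forms); Microsoft's %u encoding has no stdlib equivalent, so it isn't shown:

```python
import unicodedata
from urllib.parse import quote

# RFC 3986: encode the text as UTF-8 octets, then percent-encode
# any octet outside the unreserved set.
print(quote('å'))  # %C3%A5

# 'å' in its two forms: precomposed vs. base letter + combining mark.
precomposed = '\u00e5'  # å as a single code point
combined = 'a\u030a'    # 'a' followed by combining ring above

# The raw strings differ, but canonical (NFC) normalisation
# makes them compare equal.
assert precomposed != combined
assert unicodedata.normalize('NFC', combined) == precomposed

# Fullwidth 'A' (U+FF21) folds back to plain 'A' only under
# compatibility (NFKC) normalisation.
print(unicodedata.normalize('NFKC', '\uff21'))  # A
```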
While reading about Unicode I always have to remind myself that “for every complex problem, there is a solution that is simple, neat, and wrong”. I simply can’t help but think “this is so complicated, there must be an easier solution”…
Re-reading this post I realise there isn’t much of a point to it, besides possibly that writing (or talking) about something always helps my understanding of it. Please let me know if my understanding of Unicode or URI encoding is wrong…
- I suspect this is connected to Microsoft’s love for UCS-2 in other areas of their operating system.[back]
I received a few comments on part 3 of this little mini-series and I just wanted to address them. While doing so I still want the main functions of the parser, parseXxx, to read like the maps file itself. That means I want to avoid "reversing order" the way thenSpace did in part 2. I also don't want to hide things; e.g. I don't want to introduce a function that turns (a <* char ' ') <*> b into a <#> b.
So, first up is to do something about hexStr2Int <$> many1 hexDigit, which appears all over the place. I made it appear in even more places by moving around a few parentheses; the following two functions are the same:
foo = a <$> (b <* c)
bar = (a <$> b) <* c
Then I scrapped
hexStr2Int completely and instead introduced
hexStr = Prelude.read . ("0x" ++) <$> many1 hexDigit
This means that
parseAddress can be rewritten to:
parseAddress = Address <$> hexStr <* char '-' <*> hexStr
Rather than, as Conal suggested, introducing an infix operator that addresses the pattern (a <* char ' ') <*> b, I decided to do something about a <* char c. I feel Conal's suggestion, while shortening the code more than my solution, goes against my wish not to hide things. Here's the definition of <##>:
(<##>) l r = l <* char r
After this I rewrote parseAddress:
parseAddress = Address <$> hexStr <##> '-' <*> hexStr
(== c) <$> anyChar appears three times in parsePerms, so it got a name and moved down into the where clause. I also modified cA to use pattern matching. I haven't spent much time considering error handling in the parser, so I didn't introduce a pattern matching everything else.
parsePerms = Perms <$> pP 'r' <*> pP 'w' <*> pP 'x' <*> (cA <$> anyChar)
  where
    pP c = (== c) <$> anyChar
    cA 'p' = Private
    cA 's' = Shared
The last change I made was to remove a bunch of parentheses. I'm always a little hesitant to remove parentheses and rely on precedence rules, and I find I'm even more hesitant when programming Haskell, probably because Haskell has a lot of infix operators that I'm unused to.
The rest of the parser now looks like this:
parseDevice = Device <$> hexStr <##> ':' <*> hexStr

parseRegion = MemRegion
    <$> parseAddress <##> ' '
    <*> parsePerms <##> ' '
    <*> hexStr <##> ' '
    <*> parseDevice <##> ' '
    <*> (Prelude.read <$> many1 digit) <##> ' '
    <*> (parsePath <|> string "")
  where
    parsePath = (many1 $ char ' ') *> (many1 anyChar)
I think these changes address most of the comments Conal and Twan made on the previous part. Where they don’t I hope I’ve explained why I decided not to take their advice.
I got a great many comments, at least by my standards, on my earlier two posts on parsing in Haskell. Especially on the latest one. Conal posted a comment on the first pointing me towards
liftM and its siblings, without telling me that it would only be the first step towards “applicative style”. So, here I go again…
First off, <|> is defined in both Applicative and Parsec. I do use Parsec, so preventing it from being imported from Applicative seemed like a good idea:
import Control.Applicative hiding ( (<|>) )
Second, Cale pointed out that I need to make GenParser an instance of Applicative. He was nice enough to point out how to do that, leaving syntax as the only thing I had to struggle with:
instance Applicative (GenParser c st) where
    pure  = return
    (<*>) = ap
I decided to take baby-steps and I started with
parseAddress. Here’s what it used to look like:
parseAddress =
    let hexStr2Int = Prelude.read . ("0x" ++)
    in do
        start <- liftM hexStr2Int $ thenChar '-' $ many1 hexDigit
        end <- liftM hexStr2Int $ many1 hexDigit
        return $ Address start end
On Twan's suggestion I rewrote it using where rather than let ... in, and since this was my first function I decided to go via the ap function (at the same time I broke out hexStr2Int, since it's used in so many places):
parseAddress = do
    start <- return hexStr2Int `ap` (thenChar '-' $ many1 hexDigit)
    end <- return hexStr2Int `ap` (many1 hexDigit)
    return $ Address start end
Then on to applying some functions from Applicative:

parseAddress = do
    s <- start
    e <- end
    return $ Address s e
  where
    start = hexStr2Int <$> (thenChar '-' $ many1 hexDigit)
    end   = hexStr2Int <$> (many1 hexDigit)
By now the use of thenChar looks a little silly, so I changed that part into many1 hexDigit <* char '-' instead. Finally I removed the where part altogether and used <*> to string it all together:
parseAddress = Address
    <$> (hexStr2Int <$> many1 hexDigit <* char '-')
    <*> (hexStr2Int <$> (many1 hexDigit))
From here on I skipped the intermediate steps and went straight for the last form. Here’s what I ended up with:
parsePerms = Perms
    <$> ((== 'r') <$> anyChar)
    <*> ((== 'w') <$> anyChar)
    <*> ((== 'x') <$> anyChar)
    <*> (cA <$> anyChar)
  where
    cA a = case a of
        'p' -> Private
        's' -> Shared

parseDevice = Device
    <$> (hexStr2Int <$> many1 hexDigit <* char ':')
    <*> (hexStr2Int <$> (many1 hexDigit))

parseRegion = MemRegion
    <$> (parseAddress <* char ' ')
    <*> (parsePerms <* char ' ')
    <*> (hexStr2Int <$> (many1 hexDigit <* char ' '))
    <*> (parseDevice <* char ' ')
    <*> (Prelude.read <$> (many1 digit <* char ' '))
    <*> (parsePath <|> string "")
  where
    parsePath = (many1 $ char ' ') *> (many1 anyChar)
I have to say I'm fairly pleased with this version of the parser. It reads about as easily as the first version, and there's none of the "reversing" that thenChar introduced.
I just listened to episode 10 of the Get Illuminated audio cast where Steven E. Landsburg is interviewed (I found the link on Boing Boing). It sounds like a very interesting book; I really love that sort of provocative writing, the sort that challenges your common sense.
The only argument I can raise against the author's reasoning (bear in mind that I haven't actually read the book yet; this is all based on the interview) is that it's "how not to be part of the problem", not "how to solve the problem". I suppose it really highlights the difference between "do no evil" and "do good".