Date: 2006-11-18 18:26:00
content negotiation breakdown

IE7's content negotiation appears slightly broken. While checking out my web site in IE7, I clicked on the Card Games link on my home page (under Widgets), and was presented with only the widget screenshot JPG. Ok, I'll rewind a bit for some background.

The /widgets/ directory contains (among other things) card-games.html and card-games.jpg. The HTML file refers to the JPG file to show a screenshot of the widget. The home page links to /widgets/card-games (no extension), which Apache (using MultiViews) usually resolves to card-games.html. Actually, Apache's content negotiation relies on what the browser declares as its preferred formats. It's a conceptually simple idea but not at all straightforward in practice.

Firefox uses an HTTP Accept: header with a higher q value for text/html than for image/jpeg. IE7 asks for all kinds of formats without specifying any q values at all. Whatever Apache is doing, it seems to be selecting the card-games.jpg file when IE7 asks for the /widgets/card-games URL.
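
To make the q value comparison concrete, here is a rough shell sketch that splits an Accept: header into media types and sorts them by q value, treating a missing q as 1 (the HTTP default). The rank_accept name and both sample headers are made up for illustration; they were not captured from Firefox or IE7. This is also not Apache's actual selection algorithm, which additionally weighs wildcards and server-side variant quality.

```shell
# rank_accept HEADER: print one "q media-type" line per entry, highest q first.
rank_accept() {
  printf '%s\n' "$1" | tr ',' '\n' | LC_ALL=C awk -F';' '
    {
      type = $1
      gsub(/^[ \t]+|[ \t]+$/, "", type)   # trim whitespace around the media type
      q = 1                                # a missing q parameter means q=1
      for (i = 2; i <= NF; i++) {
        p = $i
        gsub(/[ \t]/, "", p)               # drop spaces inside the parameter
        if (p ~ /^q=/) q = substr(p, 3) + 0
      }
      printf "%.3f %s\n", q, type
    }' | LC_ALL=C sort -k1,1nr
}

# Illustrative header with explicit q values (hypothetical, not real browser output).
rank_accept 'text/html;q=0.9, image/jpeg;q=0.5'

# Illustrative header with no q values at all (hypothetical).
rank_accept 'image/gif, image/jpeg, */*'
```

With the first header, text/html sorts to the top at 0.900. With the second, every entry comes out at 1.000, so the header alone gives no reason to prefer one type over another.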

The weird thing is, after clicking on Card Games from the home page in IE7 and getting only the screenshot, pressing Reload causes Apache to serve the card-games.html file and all appears well.

I suppose what I need to do is rename the screenshot file so it's called card-games-screenshot.jpg or something, or put it in a different directory from the HTML file. Having a .html and a .jpg file with the same base name in the same directory, when they are not different representations of the same content, appears to go against the spirit of content negotiation.

I came up with a wizardous unix command line to find files that might be subject to content negotiation problems, which I'd like to share:

for a in `find . -type d`; do ls $a|sed -e "s#^#$a/#"|sed -e 's/\.[^.]*$//'|uniq -c; done|grep -v '^ *1'|less

Comprehension and use of the above is left as an exercise for the reader.

Durr. Fixed version:

find . -type f | sed -e 's/\.[^./]*$//' | sort | uniq -c | awk '$1 > 1'
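
For anyone who wants to try this without touching a real site, here is a small sketch that wraps the pipeline in a function and runs it against a scratch directory. The list_clashes name and the scratch layout are assumptions for the demo. The sed pattern here excludes / as well as . so that only the final path component can lose an extension, which leaves files with no extension alone.

```shell
# list_clashes DIR: for each path under DIR that exists with more than one
# extension, print the number of files and the path minus its extension.
list_clashes() {
  (
    cd "$1" || exit 1
    # [^./] keeps the match inside the last path component.
    find . -type f | sed -e 's/\.[^./]*$//' | sort | uniq -c | awk '$1 > 1'
  )
}

# Hypothetical scratch tree; index.html is an extra file with no clash.
demo=$(mktemp -d)
mkdir "$demo/widgets"
touch "$demo/widgets/card-games.html" "$demo/widgets/card-games.jpg" "$demo/widgets/index.html"

list_clashes "$demo"   # reports a count of 2 for ./widgets/card-games
rm -rf "$demo"
```

Anything this prints is a candidate for MultiViews picking the wrong file; hidden files such as .htaccess can still show up as false positives.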
Ah, good one. I was inadvertently making things harder for myself by listing each directory independently. And, does never having learned awk make me bad at unix? :)
No, but awk is very convenient because it autosplits the input (with arbitrary separator if you use -F) and allows regexes and expressions almost to perl's extent. In many cases, like this one, that autosplit makes it more convenient than perl.
I am curious, and I don't have convenient access to unix right now: what does the single caret in the first sed command match? That seems totally weird to me.
The ^ matches the beginning of the string, so I'm prepending stuff (in this case the directory name) to each line. I used # as the delimiter because the directory name contains /. As it turns out, there are multiple better ways to do this. :)
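
A tiny sketch of that substitution on its own, in case it helps. The prefix_lines name and the sample file names are just for illustration, and it assumes the directory name contains no #, & or backslash, since sed would treat those specially.

```shell
# prefix_lines DIR: prepend "DIR/" to every line read from standard input.
# ^ matches the empty position at the start of each line, so nothing is
# replaced; the text is simply inserted there. # is the s command delimiter.
prefix_lines() {
  sed -e "s#^#$1/#"
}

printf '%s\n' card-games.html card-games.jpg | prefix_lines ./widgets
```

This prints ./widgets/card-games.html and ./widgets/card-games.jpg, one per line.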
banana: Tangentially...
I looked at your card games page, and then I wanted to look at your other widgets, but in the breadcrumb trail the word "widgets" isn't a link.

Also I'm guessing that these are OS X widgets, but you don't say.
ghewgill: Re: Tangentially...
You're right, I should clean that up a bit and provide some context.

They are Yahoo Widgets (formerly Konfabulator), which nicely runs on both Windows and OS X.

I keep meaning to blog about all the different "widget" (in a generic sense) engines, and why I think Yahoo Widgets is currently the best choice for desktop widget authors.
Greg Hewgill