Apache mod_xsendfile and non-ASCII filenames

If you are using a web development framework like Django, you know that it is not efficient at serving static files. However, if access to the files must be restricted to specific users, and you want your application code to make that decision, you need a solution that lets Django do the authorization while offloading the actual sending of the file to the web server.

If you are using Apache as your web server, one option is mod_xsendfile, which lets your application set a header instructing Apache to send the file to the client efficiently. This gives you the best of both worlds.
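To make the division of labour concrete, here is a minimal sketch of the pattern. It is not Django: it is a plain WSGI app (so it runs with no dependencies), and FILES, PERMISSIONS and download_app are hypothetical names. It assumes the app runs under Apache with XSendFile On enabled for this location. Python decides whether the user may have the file; Apache does the sending.

```python
# Hypothetical data: file id -> path on disk, and who may download it.
FILES = {"report": "/srv/protected/report.pdf"}
PERMISSIONS = {"report": {"alice"}}


def download_app(environ, start_response):
    """WSGI app: decide access in Python, let Apache send the bytes."""
    file_id = environ.get("PATH_INFO", "").lstrip("/")
    user = environ.get("REMOTE_USER")  # set by Apache authentication, if any

    if file_id not in FILES:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    if user not in PERMISSIONS.get(file_id, set()):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"forbidden"]

    # Authorized: no file I/O here. mod_xsendfile sees the X-SendFile
    # header and replaces the empty body with the file's contents.
    start_response("200 OK", [
        ("Content-Type", "application/octet-stream"),
        ("X-SendFile", FILES[file_id]),
    ])
    return [b""]
```

In Django the equivalent is setting the same X-SendFile header on the HttpResponse returned by a view that has already checked permissions.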

However, in an international application where file names are chosen by customers, you will often have file names that contain non-ASCII characters. In this case, Django raises an exception when you set the X-SendFile header:

UnicodeEncodeError: 'ascii' codec can't encode character u'\uf026' in position 74:
ordinal not in range(128), HTTP response headers must be in US-ASCII format

This is because HTTP response headers must be US-ASCII only, as the error message says. The problem is well documented. So, what is the solution?
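The failure can be reproduced outside Django. The error message names the 'ascii' codec, so a small check with the same codec (is_ascii_header_value is a hypothetical helper, written in Python 3) tells you in advance whether a path can be used as a header value as-is:

```python
def is_ascii_header_value(value: str) -> bool:
    """True if value can pass through the 'ascii' codec named in the error."""
    try:
        value.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


print(is_ascii_header_value("/srv/files/report.pdf"))     # True
print(is_ascii_header_value("/srv/files/caf\u00e9.pdf"))  # False
```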

My solution was to patch mod_xsendfile so that it accepts an additional header, X-SendFile-Encoding, which tells it to decode the file name before use. The Django application encodes the file name into ASCII and passes it to the module, which decodes it and sends the file. I chose URL encoding (percent-encoding) as the scheme. Given that my file system encoding is UTF-8, the full solution is:

import urllib  # Python 2; on Python 3 this is urllib.parse.quote

response['X-SendFile-Encoding'] = 'url'
response['X-SendFile'] = urllib.quote(path.encode('utf8'))
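To see why this round-trips, here is a Python 3 sketch of both halves. encode_for_header mirrors the urllib.quote call above; decode_like_server is a hypothetical stand-in for the module's C-side decoding, which turns the %XX escapes back into raw bytes. The file name is made up, and the sketch assumes a UTF-8 file system as in the text.

```python
from urllib.parse import quote, unquote_to_bytes


def encode_for_header(path: str) -> str:
    """Percent-encode the UTF-8 bytes of path; the result is pure ASCII."""
    return quote(path.encode("utf-8"))  # '/' is left as-is by default


def decode_like_server(value: str) -> bytes:
    """Stand-in for the server side: turn %XX escapes back into raw bytes."""
    return unquote_to_bytes(value)


path = "/srv/files/r\u00e9sum\u00e9 \u65e5\u672c.pdf"
header = encode_for_header(path)
print(header)  # /srv/files/r%C3%A9sum%C3%A9%20%E6%97%A5%E6%9C%AC.pdf
assert header.isascii()
# On a UTF-8 file system these bytes are the file's real name.
assert decode_like_server(header) == path.encode("utf-8")
```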

The patch is available at: http://ben.timby.com/pub/mod_xsendfile-url_encoding.patch
I have also built an RPM for CentOS at: http://dagobah.ftphosting.net/yum/mod_xsendfile-0.11.1-5.x86_64.rpm
And of course the SRPM: http://dagobah.ftphosting.net/yum/SRPMS/mod_xsendfile-0.11.1-5.x86_64.rpm

I hope this helps somebody!

Update

The author of mod_xsendfile implemented a better version of my patch. The header's value is now URL-decoded by default, so values that contain no percent-escapes pass through unchanged and no extra header is needed. This behavior can be turned off with the XSendFileUnescape directive (XSendFileUnescape off).
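With unescaping on by default, a sensible habit is to always percent-encode the path, even when it is plain ASCII. The Python sketch below uses the standard library's unquote as a stand-in for the module's decoding (an assumption about how closely the two behave). It shows that an unencoded ASCII path is left alone, but a hypothetical file whose real name contains a literal "%20" would be misread unless it was encoded first.

```python
from urllib.parse import quote, unquote

# Python's unquote stands in for the module's unescaping step here.
plain = "/srv/files/report.pdf"
print(unquote(plain) == plain)  # True: no escapes, nothing changes

# Hypothetical file whose real name contains a literal "%20".
literal = "/srv/files/50%20off.txt"
print(unquote(literal))  # /srv/files/50 off.txt  (wrong file!)

# Encoding first turns '%' into '%25', so decoding restores the real name.
encoded = quote(literal)
print(encoded)  # /srv/files/50%2520off.txt
print(unquote(encoded) == literal)  # True
```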

https://github.com/nmaier/mod_xsendfile/commit/b98d2d1df9f7acd720bf082e32b1392188a23379

https://github.com/nmaier/mod_xsendfile/commit/0efcd03ac196930da6b139b77972c0d430e0225c

Thank you Nils!

