DivergentCoder - HTML Sanitization

After posting that WebSockets chat demo, I got some grief from a co-worker about the amount of security, or rather the lack thereof, that I put in. With that in mind, I set off to learn more about the ways someone could use the chat as a vector for some rather nasty things, reading up on all sorts of XSS vulnerabilities. Needless to say, the topic is vast (and interesting), and it soon became apparent that unless I wanted to devote a lot of time to it, it would be best to simply grab a library to do the job for me. A few searches later I had just such a thing. Well, two such things, actually.

On the client side, I integrated the HTML sanitizer from the Google Caja library. The library is simple to use: at its most basic, I call a single function and pass in my own functions for deciding how to handle URL strings and id/class names. It ends up looking something like this:

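// Return an empty string for every URL, so no URL from the input is kept.
function url_filter(url) { return ""; }
// Likewise, return an empty string for every id/class name.
function id_filter(id) { return ""; }

// html_sanitize returns the sanitized markup as a string.
html_sanitize("<img src=javascript:alert('xss')>", url_filter, id_filter);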

Super simple. You can of course do even more complicated checks with it as well.
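
As a rough sketch of what a more complicated check might look like, here is a URL filter that keeps only absolute http and https URLs and blanks out everything else. The name strict_url_filter is hypothetical, and returning an empty string for rejected URLs simply mirrors the filters above; treat both as assumptions for illustration rather than anything the library requires.

```javascript
// Hypothetical stricter URL filter (not part of Caja itself).
// Allow-list approach: only absolute http:// and https:// URLs pass;
// javascript:, data:, relative paths and anything else are rejected.
function strict_url_filter(url) {
  var value = String(url).trim();
  if (/^https?:\/\//i.test(value)) {
    return value;
  }
  // Assumption: an empty string means "reject", as in the filters above.
  return "";
}

// It would be passed in place of url_filter, e.g.:
// html_sanitize(input, strict_url_filter, id_filter);
```

Checking for an allowed prefix, rather than trying to spot known-bad schemes, means an obfuscated scheme fails the test by default.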

On the server side, I checked out the Sanitize Ruby library. It is also very easy to use, working off a whitelist concept where the default behavior is to strip everything down to plain text. From there you can allow the tags and attributes that you have decided are safe. A simple example:

require 'sanitize'

output = Sanitize.clean(input, {:elements => ['b', 'i', 'u']}) # allows bold, italic and underline

I have integrated both libraries into the current incarnation of the chat, and they seem to work pretty well, handling all of the attack avenues I tried from this page. Of course, that doesn't mean the chat is now invulnerable; I am sure there are still a variety of ways this sort of thing could be compromised.

The new code can be found here.