I know I said I was done with MLAA for a while, but what can I say - I can be a little obsessive.

This demo again focuses on speeding up the process. The main speed improvements come from simplifying the final edge blending pass and using the stencil buffer to minimize the number of pixels being processed. Together these optimizations bring the cost down to around 2.5 ms on my GeForce 9600M GS and 0.7 ms on my Radeon 4850. Beyond this, I suppose it would be possible to gain some performance by cleaning up the edge detection so we aren't wasting time on noise; I could also explore reimplementing this with DirectCompute.
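To make the stencil idea concrete, here is a rough CPU-side sketch of the pattern, not the demo's actual GPU code. An edge detection pass marks the pixels that matter, and the expensive later pass visits only those. On the GPU the mask would live in the stencil buffer, and the stencil test would typically reject unmarked pixels before the pixel shader runs. The names `Image`, `buildEdgeMask`, and `processMasked` are hypothetical, and the luma threshold and the left/top neighbor comparison are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical single-channel image used only for this illustration.
struct Image {
    std::size_t width = 0;
    std::size_t height = 0;
    std::vector<float> luma;  // row-major, width * height entries

    float at(std::size_t x, std::size_t y) const { return luma[y * width + x]; }
};

// Pass 1: mark a pixel when its luma differs from its left or top neighbor
// by more than `threshold`. This plays the role of writing the stencil buffer.
std::vector<std::uint8_t> buildEdgeMask(const Image& img, float threshold) {
    assert(img.luma.size() == img.width * img.height);
    std::vector<std::uint8_t> mask(img.width * img.height, 0);
    for (std::size_t y = 0; y < img.height; ++y) {
        for (std::size_t x = 0; x < img.width; ++x) {
            const float c = img.at(x, y);
            const bool leftEdge =
                x > 0 && std::fabs(c - img.at(x - 1, y)) > threshold;
            const bool topEdge =
                y > 0 && std::fabs(c - img.at(x, y - 1)) > threshold;
            mask[y * img.width + x] = (leftEdge || topEdge) ? 1 : 0;
        }
    }
    return mask;
}

// Pass 2: run the expensive per-pixel work only where the mask is set.
// This plays the role of the stencil test. Returns the number of pixels visited.
template <typename Fn>
std::size_t processMasked(const Image& img,
                          const std::vector<std::uint8_t>& mask,
                          Fn&& work) {
    assert(mask.size() == img.width * img.height);
    std::size_t processed = 0;
    for (std::size_t y = 0; y < img.height; ++y) {
        for (std::size_t x = 0; x < img.width; ++x) {
            if (mask[y * img.width + x] == 0) continue;  // "stencil test" fails
            work(x, y);
            ++processed;
        }
    }
    return processed;
}
```

In a typical frame only a small fraction of pixels sit on an edge, so the later passes touch far fewer pixels than a full-screen pass would. That is where the savings come from.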

I have also tossed in a text display of a running 30-frame average of the time per frame. This proved very helpful in uncovering some interesting behavior in my timing on the 9600M. It seems that when I was running in a window the timing would fluctuate by as much as 2 ms - obviously problematic when I am trying to measure timing differences of around that same magnitude. After a quick driver update the issue seems to have disappeared.
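For anyone wanting a similar readout, a running average over the last N frame times can be kept with a small ring buffer. The following is a minimal sketch, not the demo's code; the class name `RunningAverage` and its interface are made up for illustration, and it assumes frame times are supplied in milliseconds by whatever timer you already use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: averages the most recent `window` frame times.
class RunningAverage {
public:
    explicit RunningAverage(std::size_t window) : samples_(window, 0.0) {
        assert(window > 0 && "window must hold at least one sample");
    }

    // Record one frame time (ms). Once the window is full, the oldest
    // sample is overwritten and removed from the running sum.
    void add(double ms) {
        sum_ += ms - samples_[next_];  // slot holds 0.0 until first written
        samples_[next_] = ms;
        next_ = (next_ + 1) % samples_.size();
        if (count_ < samples_.size()) ++count_;
    }

    // Mean of the samples recorded so far (at most `window`); 0 if none.
    double average() const {
        return count_ == 0 ? 0.0 : sum_ / static_cast<double>(count_);
    }

private:
    std::vector<double> samples_;
    double sum_ = 0.0;
    std::size_t next_ = 0;
    std::size_t count_ = 0;
};
```

Constructing it as `RunningAverage avg(30);` and calling `avg.add(frameMs)` once per frame gives a 30-frame window. Keeping a running sum makes each update constant time, at the cost of slow floating-point drift over very long runs.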

You can download the demo here. It clocks in at around 35 MB this time, since I have changed it to use the fairy scene from the University of Utah.