I just started reading about Intel's SPMD Program Compiler - ISPC - tonight and thought I would give it a quick test. If you haven't been following, a little while back I wrote a series of posts looking at data organization and SIMD optimization for a very dumb particle system - you can read over the posts starting here. To briefly recap, I set up a simple system with a 4 component position and a 4 component velocity, went over various ways to organize the data and write the update loop, and ran each variant over a range of particle counts, providing timings for those tests.
After reading about ISPC I went ahead and dusted off the code and test setup I used for those posts. Very quickly I wrote a pretty straightforward program to compile with ISPC, looking like this:
export void simple_update(uniform float px[],
                          uniform float py[],
                          uniform float pz[],
                          uniform float pw[],
                          uniform float vx[],
                          uniform float vy[],
                          uniform float vz[],
                          uniform float vw[],
                          uniform int count,
                          uniform float dt)
{
    // Each pass handles programCount particles, one per program instance.
    // Note: this assumes count is a multiple of programCount; otherwise the
    // last pass reads and writes past the end of the arrays.
    for (uniform int i=0; i<count; i += programCount)
    {
        int index = i + programIndex;
        px[index] = px[index] + vx[index] * dt;
        py[index] = py[index] + vy[index] * dt;
        pz[index] = pz[index] + vz[index] * dt;
        pw[index] = pw[index] + vw[index] * dt;
    }
}
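task void update_axis(uniform float pos[],
                      uniform float vel[],
                      uniform int count,
                      uniform float dt)
{
    // Same loop as simple_update, but for a single axis.
    // The same assumption applies: count is a multiple of programCount.
    for (uniform int i=0; i<count; i += programCount)
    {
        int index = i + programIndex;
        pos[index] = pos[index] + vel[index] * dt;
    }
}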
export void task_update(uniform float px[],
                        uniform float py[],
                        uniform float pz[],
                        uniform float pw[],
                        uniform float vx[],
                        uniform float vy[],
                        uniform float vz[],
                        uniform float vw[],
                        uniform int count,
                        uniform float dt)
{
    launch < update_axis(px, vx, count, dt) >;
    launch < update_axis(py, vy, count, dt) >;
    launch < update_axis(pz, vz, count, dt) >;
    launch < update_axis(pw, vw, count, dt) >;
}
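If the programCount / programIndex loop looks unfamiliar, here is a rough scalar C++ picture of what I understand update_axis to be doing. This is only a sketch: kProgramCount and update_axis_emulated are names I made up for illustration, the gang width of 4 is an assumption (it is what I would expect for 32-bit floats on an SSE2 target), and the inner loop stands in for lanes that really run in lockstep. Unlike the ISPC version, this one guards the tail so count does not have to be a multiple of the gang width.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ISPC's built-in programCount (the gang width).
// 4 is an assumption, not something queried from the compiler.
constexpr int kProgramCount = 4;

// Scalar emulation of the update_axis loop. The outer loop advances one gang
// at a time; the inner loop plays the role of programIndex, which in ISPC is
// a different value in each program instance rather than a loop counter.
void update_axis_emulated(std::vector<float>& pos,
                          const std::vector<float>& vel,
                          float dt)
{
    const int count = static_cast<int>(pos.size());
    for (int i = 0; i < count; i += kProgramCount) {
        for (int programIndex = 0; programIndex < kProgramCount; ++programIndex) {
            const int index = i + programIndex;
            // Tail guard: skip lanes that fall past the end of the data.
            if (index < count) {
                pos[index] = pos[index] + vel[index] * dt;
            }
        }
    }
}
```

Calling update_axis_emulated once per axis gives the same result as the plain SOA loop; the point is only to show how the index is built from the gang offset plus the lane number.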
Like I said - straightforward. As you can see I made two different implementations: the simple_update just goes through the elements, performing the update "one at a time" much like the original SOA loop I wrote in C++; the task_update behaves similarly but launches a task for each axis independently. The language itself is obviously very understandable, reminiscent of writing a simple shader program. To compile this I just do the following at the command line:
ispc -O2 --arch=x86 --target=SSE2 -o ParticleSimd.o -h ParticleSimd.h ParticleSimd.ispc
ParticleSimd.ispc is the name of the file that contains the code from above. After entering this at the command line I get ParticleSimd.o, which I link with my C/C++ program, and ParticleSimd.h, which I include from it. To use the code in a program all I have to do is call ispc::simple_update or ispc::task_update (the ispc namespace only applies when compiling as C++). Pretty simple! How does it perform? I re-ran the last few test cases using this code and got the following results:
So, interesting results - not that much different from what I already had, but still a little bit faster for the task-based update. It seems like there is some definite potential here for some cool stuff, so I'll probably dig into it a little deeper and see what I can get out of it. Fun.
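For anyone who wants to try the same thing, the C++ side of the call could look roughly like the sketch below. To keep it compilable on its own I have written a scalar stand-in for the ispc namespace; in the real program you would delete that stand-in, include ParticleSimd.h and link ParticleSimd.o instead. The parameter types in the stand-in are my assumption about what the generated header declares, and Particles, make_particles and step are hypothetical names used only for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the generated ParticleSimd.h (assumed signature, scalar body).
// Replace with: #include "ParticleSimd.h"
namespace ispc {
inline void simple_update(float* px, float* py, float* pz, float* pw,
                          float* vx, float* vy, float* vz, float* vw,
                          std::int32_t count, float dt)
{
    for (std::int32_t i = 0; i < count; ++i) {
        px[i] = px[i] + vx[i] * dt;
        py[i] = py[i] + vy[i] * dt;
        pz[i] = pz[i] + vz[i] * dt;
        pw[i] = pw[i] + vw[i] * dt;
    }
}
} // namespace ispc

// Hypothetical SOA container: one contiguous array per component.
struct Particles {
    std::vector<float> px, py, pz, pw;
    std::vector<float> vx, vy, vz, vw;
};

// Build count particles at the origin with unit velocity on every axis.
// Keep count a multiple of the gang width, since the ISPC loop has no tail guard.
Particles make_particles(std::size_t count)
{
    Particles p;
    p.px.assign(count, 0.0f); p.py.assign(count, 0.0f);
    p.pz.assign(count, 0.0f); p.pw.assign(count, 0.0f);
    p.vx.assign(count, 1.0f); p.vy.assign(count, 1.0f);
    p.vz.assign(count, 1.0f); p.vw.assign(count, 1.0f);
    return p;
}

// Advance every particle by dt through the exported entry point.
void step(Particles& p, float dt)
{
    ispc::simple_update(p.px.data(), p.py.data(), p.pz.data(), p.pw.data(),
                        p.vx.data(), p.vy.data(), p.vz.data(), p.vw.data(),
                        static_cast<std::int32_t>(p.px.size()), dt);
}
```

Swapping ispc::simple_update for ispc::task_update in step would exercise the task version with the same arguments. Keep in mind that launching tasks needs a task system linked into the program, so that variant takes a little more build setup than the plain one.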