Fast Inverse Square Root approximation from Quake III engine
-
-
That’s clever
-
Ah, the joys of IEEE754 arithmetic.
I believe the trick is outdated these days, as modern processors support this operation directly, and in a faster way than that algorithm.
I also believe the code only works with certain compilers and platforms. According to the C specification, reading a float through a pointer to long is undefined behavior, so a compiler can do whatever it wants here; it's pure luck if the code works. There are also portability issues related to endianness.
Newton's method is a standard example in my introduction to programming class.
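For reference, here is a minimal sketch of how the same approximation can be written without the pointer cast, assuming a 32-bit IEEE 754 float and a C11 compiler. The function name rsqrt_approx is made up for illustration; the constant 0x5f3759df is the one from the Quake III source. memcpy copies the bit pattern in a way the C standard defines, and the last step is a single Newton iteration.

```c
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for positive, finite x.
 * Hypothetical name; assumes float is a 32-bit IEEE 754 value. */
float rsqrt_approx(float x)
{
    _Static_assert(sizeof(float) == sizeof(uint32_t),
                   "this sketch expects a 32-bit float");

    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);    /* read the float's bit pattern */
    bits = 0x5f3759dfu - (bits >> 1);  /* integer trick: rough first guess */

    float y;
    memcpy(&y, &bits, sizeof y);       /* reinterpret the guess as a float */

    /* One Newton step for f(y) = 1/(y*y) - x:
     *   y_next = y * (3/2 - x*y*y/2)                                  */
    y = y * (1.5f - 0.5f * x * y * y);
    return y;
}
```

Calling rsqrt_approx(4.0f) should give a value close to 0.5; a second Newton step would tighten the result at the cost of a few more multiplications.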
-
Game engine “performance optimization” is often platform specific and tool chain specific. It’s not “luck” that it worked, but testing and careful tool chain selection. If the compiler does not do pointer type casting the way you expect it to, you either find another way or find another compiler.
But, yes, given how frequently the inverse square root approximation is used, I would not be surprised if modern CPUs and GPUs just provide an instruction to do it with hardware acceleration, rendering the Quake III engine’s software approximation unnecessary.
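One concrete case of such an instruction: x86 processors with SSE have an approximate reciprocal square root (RSQRTSS), reachable from C through the _mm_rsqrt_ss intrinsic in <xmmintrin.h>. The following is a minimal sketch, assuming an SSE-capable x86 target. The wrapper name rsqrt_hw and the fallback branch are illustrative only, not taken from any engine.

```c
#if defined(__SSE__) || defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>

/* Hypothetical wrapper around the SSE approximate reciprocal square root.
 * The result is an approximation (Intel documents a relative error of at
 * most 1.5 * 2^-12), not a correctly rounded value. */
float rsqrt_hw(float x)
{
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
}

#else
#include <math.h>

/* Fallback for targets without SSE: the plain library computation. */
float rsqrt_hw(float x)
{
    return 1.0f / sqrtf(x);
}
#endif
```

Whether this beats the bit trick on a given machine is something to measure rather than assume.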
-
@axtremus said in Fast Inverse Square Root approximation from Quake III engine:
Game engine “performance optimization” is often platform specific and tool chain specific. It’s not “luck” that it worked, but testing and careful tool chain selection.
Yeah, but at what price? Large system design works only if you decompose it into small interchangeable parts, not if you glue everything together with super glue. "Premature optimization is the root of all evil" is a famous quote from Donald Knuth that I never fail to mention in my programming classes. Only in very exceptional circumstances is it justified to forsake portability, maintainability, readability etc. for these kinds of "clever" tricks. If you do many of these "tricks", you'll soon end up with a mess that you can only throw in the garbage bin. In the Q3 engine, it may have been justified, but in most similar cases it is not.
-
@klaus said in Fast Inverse Square Root approximation from Quake III engine:
@axtremus said in Fast Inverse Square Root approximation from Quake III engine:
Game engine “performance optimization” is often platform specific and tool chain specific. It’s not “luck” that it worked, but testing and careful tool chain selection.
Yeah, but at what price? ...
That pretty much comes down to the business model. If your business is to provide libraries for others to integrate into their products, then chances are you want to be very general with your implementation to maximize compatibility with many platforms and many tool chains. But if you build end-products yourself, you know a priori which platforms you build for. And in the gaming world where you want bragging rights for greatest graphics with highest frame rates, you bake into your business model the need to re-optimize code for every new platform that comes out that you want to sell into.
The Quake III game engine's inverse square root function shown in the video may seem very specific to a certain 32-bit architecture and endianness, very specific to IEEE 754, very specific to a certain C compiler ... but the fact that it's wrapped in a Q_rsqrt() function leaves a lot of flexibility for different platforms and different compilers. It's fairly common for software shops to craft compiler preprocessing directives to compile different implementations of any given function depending on the target platform. E.g.:
#if defined (__WIN32__)
    // compile the Intel x86-32 bit implementation here
#elif defined (__PlayStation2__)
    // compile the PlayStation2 specific implementation here
#else
    // call the good old generic sqrt() function
#endif
How many and which platforms you specifically optimize for again comes down to business model -- which platforms make up the bulk of your business, which platforms would get you the biggest "bang for the buck" to invest engineering time to craft platform specific optimizations?
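To make the preprocessor dispatch above concrete, here is a minimal compilable sketch. The platform checks and the HYPOTHETICAL_CONSOLE macro are assumptions chosen for illustration, not what the Quake III source actually tests for; only the Q_rsqrt() name and the 0x5f3759df constant come from the original code.

```c
#include <stdint.h>
#include <string.h>

#if defined(__i386__) || defined(__x86_64__) || defined(_M_IX86) || \
    defined(_M_X64) || defined(__aarch64__)

/* Targets assumed, for this sketch, to use 32-bit IEEE 754 floats:
 * compile the bit-trick approximation. */
static float Q_rsqrt(float x)
{
    uint32_t i;
    float y;
    memcpy(&i, &x, sizeof i);           /* float bits -> integer */
    i = 0x5f3759dfu - (i >> 1);         /* rough first guess */
    memcpy(&y, &i, sizeof y);           /* integer bits -> float */
    return y * (1.5f - 0.5f * x * y * y);  /* one Newton step */
}

#elif defined(HYPOTHETICAL_CONSOLE)

/* Placeholder for a console-specific version (made-up macro). */
#error "supply the console-specific Q_rsqrt here"

#else

#include <math.h>

/* Unknown target: fall back to the generic library computation. */
static float Q_rsqrt(float x)
{
    return 1.0f / sqrtf(x);
}

#endif

/* Call sites look the same whichever branch was compiled. */
float vec3_inv_length(const float v[3])
{
    return Q_rsqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
}
```

The point of the wrapper is that code such as vec3_inv_length() never needs to know which implementation it got.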
-
@axtremus said in Fast Inverse Square Root approximation from Quake III engine:
That pretty much comes down to the business model.
Actually, not really.
Only a tiny part of the code in a 10 MLOC code base will have a measurable impact on the performance of the system.
But "performance-aware programmers" think they should do "clever" coding all the time, even though most of the time it merely turns the code into an unmaintainable mess without tangible benefits.
Only after a careful analysis of the performance relevance, and a consideration of all alternatives, can it ever be justified to write code like that. Otherwise planes and rockets will crash, people will die, or, in the most harmless case, millions of dollars will be wasted. And that's completely independent of one's business model.
-
@klaus said in Fast Inverse Square Root approximation from Quake III engine:
@axtremus said in Fast Inverse Square Root approximation from Quake III engine:
That pretty much comes down to the business model.
Only after a careful analysis of the performance relevance, and a consideration of all alternatives, can it ever be justified to write code like that.
Business considerations determine which aspects of the system's performance are prized. The planes and rockets business prioritizes different aspects of the system than the gaming business does.
It's not like the Quake III engine's developer team did this sort of "unusual" coding all over the place. This inverse square root function garnered extra attention in large part because it's an outlier compared to the rest of the code base. The video called out why it is justified: the inverse square root is used to calculate light reflection off every polygon surface in 3D space. Considering that a game can cycle through millions of polygons per second, and that this "unusual" implementation gets the inverse square root approximation "three times faster" than going through the standard sqrt() function, it seems sufficiently justified.
There is no shortage of "cowboy" software coders who like to write "unnecessarily clever" code, but there are also real resource and time constraints that strongly incentivize professional software shops to reuse code rather than write new code. "Student projects" and "hobbyist projects" aside, mentally reviewing professional software shops I have come into contact with over the years, I really cannot recall seeing "overly clever" coding being anything but generally discouraged, and have no reason to suspect the Quake team operated differently in this regard.