When designing a complex signal or image processing system, the choice of math libraries is probably pretty low on the list of priorities—but the wrong choice can lead you down a rocky road, either today when you can’t get the performance you need, or tomorrow when you need to do a technology insertion because of obsolescence issues.
Often, the primary reason for using a math library is the desire to get maximum performance out of a specific processor without the programmer having to be an expert in the low-level workings of the device. This is a complex area because, in most cases, some form of vector unit attached to the CPU is involved, with its own instruction set and its own peculiarities when it comes to maximizing performance.
Managing data locality is key to optimization for most architectures. The time cost of fetching data rises dramatically the further you have to reach out from the processor: register file, the various levels of cache, then main memory. This can sometimes be mitigated by placing hints in the instruction stream to pre-fetch data that is not needed immediately but can be predicted to be needed soon. That way, data can be moved closer while the processor is crunching the data it already has in hand. How and when to do this is a difficult practice to master, and is not necessarily something an algorithm designer is, or should be, proficient in. Similarly, squeezing clock cycles out of the inner loops of an algorithm is a specific skill, and is different for each processor type.
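To make the pre-fetch idea concrete, here is a minimal sketch in C. It assumes a GCC or Clang compiler, whose `__builtin_prefetch` builtin emits a pre-fetch hint. The function names, the `PREFETCH_READ` macro, and the `PREFETCH_DISTANCE` value are hypothetical and chosen for illustration only; the right distance depends on the processor and memory system and has to be found by measurement.

```c
#include <assert.h>
#include <stddef.h>

/* __builtin_prefetch is a GCC/Clang builtin; on other compilers this
 * hypothetical wrapper compiles to nothing, so the code stays portable. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_READ(addr) __builtin_prefetch((addr), 0, 3)
#else
#define PREFETCH_READ(addr) ((void)0)
#endif

/* Assumed tuning knob: how many elements ahead of the current one to request. */
#define PREFETCH_DISTANCE 64

/* Baseline dot product with no explicit hints. */
float dot_plain(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += a[i] * b[i];
    return acc;
}

/* Same arithmetic in the same order, but each iteration also asks for the
 * data that will be needed PREFETCH_DISTANCE elements later, so the fetch
 * can overlap with the multiply-accumulate on data already in hand.
 * A tuned version would typically issue one hint per cache line instead. */
float dot_prefetch(const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        if (i + PREFETCH_DISTANCE < n) {
            PREFETCH_READ(&a[i + PREFETCH_DISTANCE]);
            PREFETCH_READ(&b[i + PREFETCH_DISTANCE]);
        }
        acc += a[i] * b[i];
    }
    return acc;
}
```

The hint changes only when data arrives, never the result, so both functions return the same value. For a simple sequential walk like this one, the hardware prefetcher on many processors may already do the job, and explicit hints can even slow things down. That uncertainty is part of why this kind of tuning is usually better left to a library.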
Technology insertion
These are all reasons why people turn to math libraries in the first place. Another good reason is that the right choice will allow for technology insertion when newer processors come along. For example, it may be that today's Intel processor is identified as being superior in performance to yesterday's Power Architecture device, or vice versa. Moving between these different types of processor, or even between iterations of the same processor family, can be a challenging and expensive process.
Consider, for example, moving from SSE to AVX to AVX2 to AVX-512 (sometimes informally called AVX3) on Intel Architecture, or from an AltiVec vector engine to an AVX one. Even with higher-level intrinsics support, these all look quite different to the application. Many COTS board vendors and third parties have libraries available that implement proprietary APIs. Here, performance is stressed, but at the expense of true Open Architecture (OA) and portability. Because of this, a common choice is to use VSIPL (the Vector Signal and Image Processing Library), a true OA math library API definition. Now, all the boxes can be checked. Performance is usually good; it is available from multiple commercial and open-source repositories; it can be procured for a variety of processor architectures; and it is available for a variety of operating systems.
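As a rough illustration of how different the vector engines look to the application, the sketch below implements one element-wise add three ways (AVX intrinsics, SSE intrinsics, and plain C) behind a single function. The names `vadd_f32` and `vadd_backend` are hypothetical; this is not the VSIPL or AXISLib API, only a small-scale picture of what a portable library API hides. The backend is selected at compile time from the compiler's predefined `__AVX__` and `__SSE__` macros, so the code falls back to scalar C where neither is defined.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__AVX__)
#include <immintrin.h>          /* 256-bit AVX intrinsics */
#define VADD_BACKEND "AVX"
#elif defined(__SSE__)
#include <xmmintrin.h>          /* 128-bit SSE intrinsics */
#define VADD_BACKEND "SSE"
#else
#define VADD_BACKEND "scalar C"
#endif

/* Hypothetical portable entry point: r[i] = a[i] + b[i] for i in [0, n).
 * The caller sees one signature regardless of the vector engine used. */
void vadd_f32(const float *a, const float *b, float *r, size_t n)
{
    assert(a != NULL && b != NULL && r != NULL);
    size_t i = 0;
#if defined(__AVX__)
    /* AVX: eight floats per iteration, unaligned loads and stores. */
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(r + i, _mm256_add_ps(va, vb));
    }
#elif defined(__SSE__)
    /* SSE: four floats per iteration, different types and intrinsic names. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(r + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail, which is also the whole loop on the reference build. */
    for (; i < n; ++i)
        r[i] = a[i] + b[i];
}

/* Reports which implementation was compiled in. */
const char *vadd_backend(void)
{
    return VADD_BACKEND;
}
```

An application that calls only `vadd_f32` does not change when the build moves from SSE to AVX. An application written directly against the intrinsics has to be revisited at every step. Supporting a non-x86 vector engine would mean adding another branch inside the function, again without touching the caller.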
Consider Abaco's AXISLib, a mature, robust implementation that is available for PPC/AltiVec and Intel SSE/AVX/AVX2, and also has a straight C code reference version. It supports Linux, Windows, VxWorks, LynxOS, and others. By selecting AXISLib, a developer can be assured of achieving highly optimized performance that is portable across a variety of platforms, because it complies with the VSIPL API definition that is managed by the Object Management Group.
Evolution
AXISLib has been available since 2005, but that doesn’t mean it is static: it continues to evolve. New processor and vector engine types have been and will be added, and performance continues to improve. AXISLib-AVX Revision 2.5 has just been released, and includes enhancements that respond directly to customer feedback and to experience of its use in real-world applications. There are over 60 improvements to existing functions, and 15 new functions have been added. Many of these resulted from work performed as part of the USAF’s Next Gen Radar initiative, which proved to be an ideal test case for many functions in the library, with aggressive demands for performance within a specified system size, weight and power (SWaP).
If you want to see how Abaco can help you to speed time to solution, improve performance, optimize SWaP, and help move your application to an Open Architecture environment, give us a call. Also, look out for our upcoming white paper: “Design for Refresh—Strategies for Open Architecture COTS Systems Software Operating Systems.”