Freescale QorIQ AMP microprocessor offers alternative for low-power high-performance embedded computing
By Bill Mercer and Glenn Beck
Guest viewpoint -- AltiVec technology from Freescale Semiconductor Inc. in Austin, Texas, expands the capabilities of microprocessors built on Power Architecture technology by providing general-purpose processing, data processing, and digital signal processing (DSP) on one chip.
AltiVec technology is the single-instruction, multiple-data (SIMD) accelerator on the Freescale e600 core microprocessors, and is part of the Freescale QorIQ communications processor roadmap. The QorIQ platform includes Power Architecture cores, accelerators, and security, as well as QorIQ advanced multiprocessing (AMP) processors based on 28-nanometer process technology. Blending AltiVec technology with the QorIQ microprocessors enables aerospace and defense embedded computing designers to use a SIMD engine within a multicore processor.
Aerospace and defense applications need power-efficient SIMD performance, as well as fast signal processing, image processing, and math operations such as matrix multiplication. Real-time imaging and signal processing enable autonomous decision making and provide image and data information quickly to control centers.
A major portion of defense budgets is expected to go into unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs). The number of sensors and the data they generate are increasing exponentially from first-generation UAVs to what will be seen in the future. The need for autonomous real-time decision making within size, weight, and power (SWaP) requirements will place increasing demands on processing capability in future systems, making a SIMD engine well-suited to perform the vector processing for these applications.
Synchronous processing and DSPs
AltiVec technology provides functionality similar to that of a DSP or a graphics processing unit (GPU). Typically a CPU would offload data to a DSP or GPU to be processed; meanwhile, the CPU core would stall or move on to other processing. Although this is fast processing, it is not quite real-time processing. DSPs are more powerful than the AltiVec engine in terms of mathematical processing power; however, DSPs and AltiVec technology are geared toward different needs. The synchronous behavior of the AltiVec engine in the core is what sets it apart from DSPs, particularly in aerospace and defense applications.
GPUs are designed to process polygons and triangles for applications like video game graphics, yet embedded computing designers are attracted to GPUs for their supercomputer-class performance and their ability to show real-time data about real-life situations.
An AltiVec engine can provide that fast processing capability without the software overhead of a GPU. AltiVec technology provides some of the functions of graphics processors and DSPs without the asynchronicity of sending the data off-chip and waiting for it to come back. Combining a core with an AltiVec engine and a DSP on the same chip can provide more options for fast processing. For example, telecom applications use DSPs for packet processing because they can do fast math, but that can happen off-chip. Providing quality of service for voice and data traffic does not incur the same real-time requirements as many aerospace and defense applications.
In the vector execution flow within the core, the dispatch unit looks at the incoming instructions and sends vector work to the vector unit. Without a vector unit, the data would have to be processed by the scalar integer or floating point units. The integer unit (IU) and floating point unit (FPU) would stall because of all the cycles it takes to perform these higher math calculations. The vector unit, with its large register set and data bus width, can manipulate several sets of data with a single instruction. This synchronous processing frees the IU and FPU to continue integer and floating point calculations without putting additional processing load onto the AltiVec engine.
AltiVec’s vector execution unit operates concurrently with the Power Architecture integer and floating-point units. It features separate, dedicated vector registers, and there is no penalty for mixing integer, floating point, and AltiVec operations.
A vector architecture processes several data items in parallel: one instruction performs the same operation on several data elements at once. This is called single-instruction, multiple-data (SIMD) parallel processing. With SIMD, one instruction does not span several cores; the parallelism comes from the vector unit inside a single core.
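
The difference between scalar and SIMD execution can be illustrated with a small model. The sketch below is plain Python, not AltiVec code: it assumes a 128-bit vector register holding four 32-bit floats, and the function names `scalar_madd` and `vector_madd` are hypothetical. It counts how many multiply-add instructions each approach would issue for the same data.

```python
LANES = 4  # assumption: one 128-bit vector register holds four 32-bit floats


def scalar_madd(a, b, c):
    """Scalar model: one multiply-add instruction per data element."""
    out = []
    instructions = 0
    for x, y, z in zip(a, b, c):
        out.append(x * y + z)
        instructions += 1
    return out, instructions


def vector_madd(a, b, c):
    """SIMD model: one multiply-add instruction per group of LANES elements."""
    out = []
    instructions = 0
    for i in range(0, len(a), LANES):
        # A single vector instruction updates every lane of the register.
        lanes = zip(a[i:i + LANES], b[i:i + LANES], c[i:i + LANES])
        out.extend(x * y + z for x, y, z in lanes)
        instructions += 1
    return out, instructions


a = [float(i) for i in range(16)]
b = [2.0] * 16
c = [1.0] * 16

scalar_result, scalar_count = scalar_madd(a, b, c)
vector_result, vector_count = vector_madd(a, b, c)
print(scalar_count, vector_count)
```

Both paths produce the same results; under the four-lane assumption the vector path issues one quarter as many instructions, all within a single core.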
Low power
AltiVec technology already benefits from the low power consumption of Power Architecture processors. AltiVec engines in multicore processors benefit from a power-saving technique called cascading power management, which steers tasks to a relatively small number of cores so that the idle cores can enter a minimal-power or “drowsy” state. Cascading power management reduces energy consumption under light network loads and then enables cores to return to full function quickly and automatically when network loads increase.
A 12-core device with 24 virtual cores, for example, would have 12 physical cores and 12 physical AltiVec engines. Systems designers who want to use all those cores can do so, at the cost of the accompanying power load. Alternatively, dedicating only two or four cores to a vector task leaves the other cores free for floating point and integer instructions.
The power drawn by the AltiVec logic in those other cores would drop close to zero, which enables wider adoption of multicore AltiVec devices without using excessive power.
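
The idea behind steering work onto a few cores can be sketched as a simple planning function. This is an illustrative model only, not Freescale's algorithm: the per-core capacity figure and the names `cores_needed` and `plan` are hypothetical, and the 12-core count is taken from the example in the text.

```python
import math

TOTAL_CORES = 12      # from the 12-core example in the text
CORE_CAPACITY = 100   # hypothetical work units one core can handle per interval


def cores_needed(load):
    """Smallest number of cores able to carry the load; one core always stays awake."""
    return max(1, min(TOTAL_CORES, math.ceil(load / CORE_CAPACITY)))


def plan(load):
    """Pack the load onto as few cores as possible and mark the rest drowsy."""
    active = cores_needed(load)
    return ["active"] * active + ["drowsy"] * (TOTAL_CORES - active)


light = plan(150)    # light load: only a couple of cores stay active
heavy = plan(1150)   # heavy load: every core returns to full function
print(light.count("active"), heavy.count("active"))
```

In this model a light load keeps two cores active and leaves ten drowsy, while a heavy load wakes all twelve.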
AltiVec in QorIQ AMP
The newly introduced Freescale QorIQ AMP series of microprocessors delivers up to four times the performance of Freescale’s previous-generation flagship, the eight-core QorIQ P4080. The AMP series is for next-generation control and data plane processors, scaling from cost-effective, ultra-low-power single-core products to advanced systems on chips featuring 24 virtual cores that target the most demanding networking, industrial, and aerospace/defense applications.
Aerospace and defense systems designers face demanding performance needs and power requirements of smart mobile devices and associated IP traffic. Meanwhile, services-oriented networks face increasing software complexity and demand for greater processing per packet, while the adoption of cloud computing requires networks to handle more applications converging on multicore processors with virtualized processing and I/O resources. AMP series processors deliver precise blends of performance, power, and state-of-the-art embedded intelligence to help address these challenges, along with the demanding requirements of other aerospace and defense applications, including target acquisition, cockpit displays, and long-range, all-weather, all-altitude radar imaging.
Central to Freescale’s AMP series is the new multithreaded, 64-bit Power Architecture e6500 core running at up to 2.5 GHz. Ideal for both high-end control plane and high-performance data plane applications, the e6500 will populate the AMP series products. The e6500 incorporates an enhanced version of the proven, high-performance, widely adopted AltiVec vector processing unit. In the T4240 device, the AltiVec engines can deliver 240 billion floating point operations per second (240 GFLOPS) within a single monolithic silicon substrate.
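
One way to arrive at a figure of this size is a peak-rate calculation. The sketch below is not the authors' derivation. It assumes the T4240 is the 12-core device described earlier, that each AltiVec engine works on four 32-bit floats per instruction, and that a fused multiply-add counts as two floating point operations per lane. The clock rate is the 2.5 GHz maximum quoted in the text.

```python
CORES = 12          # assumption: the T4240 is the 12-core device described earlier
CLOCK_GHZ = 2.5     # maximum e6500 clock rate quoted in the text
LANES = 4           # assumption: four 32-bit floats per 128-bit vector
FLOPS_PER_LANE = 2  # assumption: a fused multiply-add counts as two operations

# Theoretical peak: every core issues one vector multiply-add on every cycle.
peak_gflops = CORES * CLOCK_GHZ * LANES * FLOPS_PER_LANE
print(peak_gflops)
```

Under these assumptions the peak works out to 240 GFLOPS. This is a theoretical ceiling; sustained throughput depends on memory bandwidth and on how well the workload vectorizes.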
Editor's note: Bill Mercer is an applications engineer, and Glenn Beck is an industrial segment market manager, at Freescale Semiconductor Inc. in Austin, Texas.