Using the h.264 standard might not be the best idea for real-time mission-critical UAV video data feeds
By Brian Ames
Guest viewpoint, 8 Dec. 2011.
Unmanned aerial vehicles (UAVs) must be able to receive and transmit streaming video data. There is generally no substantive visual intelligence at the node, which is the tradeoff of moving to unmanned aircraft. This means transmitted video intelligence must be as reliable as direct visual contact.
Enter analog video, which transmits TV-quality imagery from the UAV to the ground. Mission accomplished -- right? Yes, if the goal is to send up a few UAVs. The problem involves the spectrum necessary to send and receive standard analog color TV signals. National Television System Committee (NTSC) TV signals, which are standard in most of North America and in several other parts of the world, require 6 MHz per channel. Put too many UAVs up in the air at the same time and there simply is not enough bandwidth to go around.
Even single UAVs feel this bandwidth pinch when users ask for dual cameras, high frame rates, or high resolution. These present huge benefits on the ground, but come with a tradeoff -- the need for more bandwidth. The solution is compression.
Standards like MJPEG and JPEG2000 gained popularity as they sent information in ever-smaller streams. There seemed to be no limit to this synergy. Compression and camera quality both appeared to be following some version of Moore’s Law. The problem, however, was that compression and video quality were actually headed in different directions. Compression was heading toward a practical limit, and video was about to enter a period of aggressive growth. About two years ago, cameras and compression hit an inflection point that would make all the difference in real-time, over-the-air transmission of video.
The inflection point was the move from spatial compression to temporal compression. Spatial compression condenses any repeated information within a single frame down to a few commands. That means that if a line in a frame is all white, instead of sending information about every pixel, spatial compression just tells the system to write the same pixel value n times.
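The all-white-line example can be sketched as run-length encoding. This is only an illustration of the idea: MJPEG and JPEG2000 rely on transform coding rather than plain run-length encoding, and the function names and the 640-pixel line width are assumptions made for the sketch.

```python
from itertools import groupby

def rle_encode(line):
    """Collapse runs of identical pixel values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(line)]

def rle_decode(pairs):
    """Expand (value, count) pairs back into the original line of pixels."""
    return [value for value, count in pairs for _ in range(count)]

white_line = [255] * 640           # an all-white line, 640 pixels wide (assumed width)
encoded = rle_encode(white_line)   # one "write this value n times" command
restored = rle_decode(encoded)     # the receiver rebuilds the full line
```

A line with no repetition gains nothing from this scheme, which is why spatial compression alone runs into a practical limit.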
The h.264 standard, which introduces temporal compression, was created to send video over the Internet. Temporal compression takes the form of I-frames, P-frames, and B-frames. An I-frame takes a snapshot of the entire scene, like a photograph, and spatially compresses it. The new feature was the P-frame, which looks at the I-frame and says, “I see that the line from frame one is still all white -- so just send the command to repeat the same pixel, line, or group from the last image.”
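A rough sketch of the P-frame idea follows: send only what changed relative to the reference frame. Real h.264 encoders use motion-compensated prediction on blocks of pixels, not the per-pixel differences shown here, and the function names are hypothetical.

```python
def encode_p_frame(reference, current):
    """Keep only the pixels that differ from the reference, as {index: new_value}."""
    return {i: cur for i, (ref, cur) in enumerate(zip(reference, current)) if ref != cur}

def decode_p_frame(reference, delta):
    """Rebuild the current frame by applying the changes to a copy of the reference."""
    frame = list(reference)
    for i, value in delta.items():
        frame[i] = value
    return frame

i_frame = [255] * 8                                   # full snapshot: an all-white line
next_frame = [255, 255, 255, 0, 0, 255, 255, 255]     # two pixels changed
delta = encode_p_frame(i_frame, next_frame)           # only the two changes are sent
rebuilt = decode_p_frame(i_frame, delta)
```

Note that the decoder needs an intact copy of the reference frame. If the reference is damaged, every frame rebuilt from it inherits the damage.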
The B-frame took this one step further and focused on out-of-order frame sequencing. By keeping many frames in a buffer and comparing both forward and backward, the system is able to achieve impressive compression ratios. While this required more logic and computational power, advances in technology made this all possible with minimal latency.
The best part was the ability to set quality vs. compression ratios. The details are complex, but the simple explanation is that the compression level is based on a grouping of I-frames, P-frames, and B-frames. This is known as the Group of Pictures (GOP), which tells the system how often to take a full image (I-frame), and how long to make the chain of compressed images based off that initial full image. The longer the chain, the greater the compression. The natural tendency of system designers was to say “how low can this compression go and keep my video usable?” The results in the lab were stunning.
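To make the GOP idea concrete, the sketch below lists the frame types in one GOP in display order. The function name, its parameters, and the closed-GOP layout (every group ends on a P-frame) are assumptions chosen for illustration; real encoders offer many more options.

```python
def gop_pattern(gop_length, b_frames):
    """Frame types for one GOP in display order: one I-frame, then
    P-frames that are each preceded by up to `b_frames` B-frames."""
    pattern = ["I"]
    while len(pattern) < gop_length:
        remaining = gop_length - len(pattern)
        n_b = min(b_frames, remaining - 1)   # always leave room for the closing P-frame
        pattern.extend(["B"] * n_b)
        pattern.append("P")
    return "".join(pattern)

short_chain = gop_pattern(gop_length=4, b_frames=0)    # full image every 4th frame
long_chain = gop_pattern(gop_length=16, b_frames=2)    # full image every 16th frame
print(short_chain)
print(long_chain)
```

In the longer chain only one frame in 16 is a full image, which is where the extra compression comes from, and also why more frames depend on a single I-frame.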
Still, this reliance on forward and backward frames is exactly why h.264 does not work well for mission-critical, real-time military avionics applications. UAVs must be designed to operate in any theater at any time if they are to revolutionize the way U.S. military forces engage threats. The challenge in this case is a hidden problem -- the invisible, situationally dependent bit error rate (BER) profile.
Bit errors present huge problems for temporal compression schemes. A single bit error can confuse the coding algorithm and lock up entire sections of the video stream for the whole Group of Pictures, which can be 16 frames or more.
Worse, a bit error every 0.5 second (which might be considered good in some circles) might result in a system that rarely sees any video. Greatly simplified, consider the following scenario: the encoder sends an I-frame at the very outset of the GOP. The very next frame encounters a bit error. This triggers what is known as a “cascading error.” The bit error can cascade to every follow-on P- and B-frame. The error is only corrected when the next I-frame is sent. The new I-frame “wipes the slate clean.” How long it takes to reset the video stream, and how the system handles a cascading error, is the challenge.
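The scenario above can be turned into a small counting model. It is greatly simplified: bit errors arrive at perfectly regular intervals, every frame after the I-frame depends on its predecessor, and an error spoils the rest of the GOP. The 30 frames-per-second rate and the function name are assumptions; the 16-frame GOP and the 0.5-second error interval come from the text.

```python
def count_corrupted(fps, gop_length, error_interval_s, duration_s, first_error_frame=1):
    """Count corrupted frames under two simplified models.

    Returns (temporal_bad, spatial_bad, total_frames).
    """
    total = int(fps * duration_s)
    step = max(1, round(error_interval_s * fps))       # frames between bit errors
    hit = set(range(first_error_frame, total, step))   # frames struck by a bit error

    temporal_bad = 0
    gop_is_bad = False
    for frame in range(total):
        if frame % gop_length == 0:
            gop_is_bad = False      # a fresh I-frame wipes the slate clean
        if frame in hit:
            gop_is_bad = True       # the error cascades through the rest of this GOP
        if gop_is_bad:
            temporal_bad += 1

    spatial_bad = len(hit)          # spatial only: damage stays in the struck frame
    return temporal_bad, spatial_bad, total

temporal_bad, spatial_bad, total = count_corrupted(
    fps=30, gop_length=16, error_interval_s=0.5, duration_s=8)
print(f"temporal compression: {temporal_bad}/{total} frames corrupted")
print(f"spatial compression only: {spatial_bad}/{total} frames corrupted")
```

Under these assumptions the model counts 135 of 240 frames as corrupted with temporal compression, against 16 of 240 with spatial compression only. The exact figures shift with how the errors line up against GOP boundaries, but the gap between the two cases is the point.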
UAV operators who are landing vehicles, tracking objects of interest, or selecting targets cannot tolerate these problems. For these UAV operators, it would be preferable to have analog video transmission where the image gets hazy when the conditions are noisy -- or where the image fades out as the UAV enters the limits of radio range.
The UAV operator would also be better off with spatial compression only, which is what we generally find in the field. In these cases there is just a bit error, and not a “cascading bit error.” A single frame would have a corrupt line or a corrupt set of pixels for one frame. The error is confined to a certain duration and area.
The question becomes, can we correct for these problems and make h.264 act like analog video while achieving the low bandwidth necessary to enable U.S. military forces to send waves upon waves of UAVs?
Think of waves and waves of cell phones. That works. Common cell phones are deployed massively and need to deliver high-quality service in all sorts of lossy environments. How can they operate effectively? Outside of the two obvious points that voice is low bandwidth (vs. video) and that cell phones can fail without major consequences, there are fundamental lessons that apply.
The first lesson is that cell phones use forward error correction (FEC) algorithms to handle bit errors. A common form is convolutional coding decoded with the Viterbi algorithm (often coupled with Reed-Solomon codes). FEC requires sending redundant data to cross-check and remove bit errors. This redundancy is generally tuned to allow the receiver to detect and correct a limited number of bit errors without a resend of the data. This implementation of the practical optimum bandwidth makes cell phones work.
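The principle of FEC, redundant bits that let the receiver repair an error without a resend, can be shown with a Hamming(7,4) code. This is a much simpler code than the Viterbi-decoded and Reed-Solomon schemes mentioned above and is swapped in purely for illustration; it corrects one flipped bit per 7-bit block.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as a 7-bit codeword laid out as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4     # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4     # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4     # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(codeword):
    """Correct up to one flipped bit, then return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_pos = s1 + 2 * s2 + 4 * s3   # 1-based position of the bad bit, 0 if none
    if error_pos:
        c[error_pos - 1] ^= 1          # flip the bad bit back
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
sent = hamming74_encode(data)
received = list(sent)
received[4] ^= 1                        # the radio link flips one bit in transit
recovered = hamming74_decode(received)  # the receiver repairs it without a resend
```

The cost is visible in the sizes: seven bits are transmitted for every four bits of data, which is the bandwidth-for-reliability trade that has to be tuned.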
The second lesson is that in cell phones, FEC and encoding are handled in one unit. This allows the phone to detect the BER and take any number of corrective actions.
Challenges for the military and h.264
In the military, independent system components handle FEC and encoding. The video encoder generally works in the blind. The radio link -- where one would find the bit errors -- is neither under the control of the encoder nor feeding information to it. Without visibility into the BER, the encoder cannot take advantage of real-time intelligence to adjust bandwidth to ensure quality.
The h.264 standard has FEC options built in, although they are not generally activated, especially not in commercial encoders. With advanced coding, it would be possible to monitor the BER and tune the amount of bandwidth. While this is positive for generally predictable environments, a deploy-anywhere-at-any-time UAV may find that it has a certain BER at the airfield, another over the battlefield, another as it passes behind a mountain and loses line of sight, and another when it passes near other aircraft. The unpredictable nature of the BER makes h.264 risky when set for maximum compression. Determining a practical optimum bandwidth for UAVs that is based on actual use patterns can mitigate this problem.
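What monitoring the BER and tuning the bandwidth might look like is sketched below as a simple lookup policy. Everything here is hypothetical: the function name, the BER thresholds, the GOP lengths, and the FEC overhead figures are placeholders, not values from any standard or fielded system. Real thresholds would have to come from measured field data.

```python
def choose_settings(measured_ber):
    """Hypothetical policy: as the link degrades, shorten the GOP so errors
    cascade less, and spend more bandwidth on FEC redundancy."""
    if measured_ber < 1e-6:        # clean link: favor compression
        return {"gop_length": 30, "fec_overhead": 0.05}
    if measured_ber < 1e-4:        # noisy link: shorter chains, more redundancy
        return {"gop_length": 8, "fec_overhead": 0.25}
    # very poor link: every frame is a full image, heavy redundancy
    return {"gop_length": 1, "fec_overhead": 0.50}

at_airfield = choose_settings(1e-7)
behind_mountain = choose_settings(1e-3)
```

A policy like this only works if the radio reports the BER to the encoder, which is exactly the link that is missing when FEC and encoding live in separate components.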
First, following the cell phone model, we must develop a practical optimum bandwidth profile. To accomplish this, extensive data about actual bit error rate profiles in the field must be collected and studied. This will help define the theoretical maximum compression ratio for h.264 under given BER conditions, and allow the system to deliver higher compression with the necessary quality.
Second, we must understand how to deal with BERs that we control -- specifically contention for bandwidth. A UAV set to minimum bandwidth in a crowded environment, where its data stream must compete for space, is at a huge technical disadvantage. There is no mechanism to ensure quality of service for incoming video streams.
Until systems designers figure out these two issues, h.264 will have reliability and usability issues. In the meantime, remember that in a lab anyone can make claims of wild improvements in compression ratios.
Don’t be fooled.
Brian Ames is strategic accounts manager at Digital Design Corp. in Arlington Heights, Ill.