Military and aerospace systems designers see machine vision as an important enabling technology in robotics and unmanned vehicles, yet they propose moving forward cautiously by ensuring that humans retain ultimate control.
By J.R. Wilson
Machine vision has been in use for decades in its most rudimentary forms, such as infrared and motion sensors. In those forms it helps security systems detect an unauthorized presence; controls quality on assembly lines, where visual sensors compare products against standardized norms to find anomalies; and identifies known cheats and criminals in Las Vegas casinos by comparing camera images with photographic databases. In each case, human operators — not machines — make the final judgments before anyone takes action.
In light of today's advanced imaging and information processing, these applications are relatively simple. Yet for such military applications as reconnaissance or even battle damage assessment, something more complex is necessary. In those instances, the pure mechanics of "vision" — the gathering of data by any of a wide range of sensors — is insufficient. Instead, the concept must move to the next level of actually recognizing objects and properly interpreting them in a fluid, real-time environment where static comparisons against standard templates would not suffice.
"Machine vision is trying to automate the process of having recognition of objects in the natural world," says Alan Schultz, head of the intelligence system section at the Navy Center for Applied Research in Artificial Intelligence (NCARAI) in Washington. "What we're really interested in is more general purpose vision so we can build robots that can spot snipers or navigate through difficult terrain."
That task can be more complicated than it sounds, Schultz points out. "It is difficult when you go off-road and encounter a ravine or creek that is overgrown with weeds. How do you recognize that?" he asks. In some cases, Schultz says, systems designers can use sensors that are not analogous to eyes, such as radio- and laser-detection-and-ranging sensors — better known as radar and ladar.
The term machine vision means different things to different people, and becomes increasingly confusing when it relates to concepts in artificial intelligence (AI), where "we are more interested in doing more complex efforts such as humans would do them," Schultz says. Perhaps a baseline definition for machine vision comes from the United Kingdom Department of Trade and Industry:
"When an image is captured in an electronic form, it can be treated as a piece of data," according to the U.K. Trade and Industry department. "Machine vision techniques use this data to simulate human vision by recognizing objects within the image and by interpreting aspects of the scene. The raw image is now commonly captured in a purely digital form through the use of Charge Coupled Devices, but traditional analogue sources, such as television cameras, can also be digitally captured through frame-grabbing techniques."
Two disciplines

When machine vision systems designers seek to improve their technology, they look on two fronts — vision and information processing, Schultz says. Better vision is necessary for general manipulation and for increasing the autonomy of robots. Advancing image-processing software, meanwhile, will join forces with ever-faster computer hardware that has made once-impractical algorithms more useful.

"Continued developments in algorithms promise to enable the economical use of machine vision in newer applications. The latest algorithms are expected to provide robust geometric pattern-matching with several degrees of freedom, compensate for extreme illumination and perform 3D analysis of complex shapes," notes an October 2000 technology trends report by the board of directors of the Machine Vision Association of the Society of Manufacturing Engineers (MVA/SME) in Dearborn, Mich.
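The report does not spell out those algorithms, but the simplest relative of geometric pattern-matching is normalized template matching, sketched below in Python with OpenCV. The file names and the 0.8 score threshold are illustrative assumptions; a production matcher would also handle rotation, scale, and partial occlusion.

```python
# Toy pattern-matching sketch: slide a template over a scene and score the fit.
import cv2

scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)            # hypothetical files
template = cv2.imread("part_template.png", cv2.IMREAD_GRAYSCALE)

scores = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
_, best_score, _, best_xy = cv2.minMaxLoc(scores)                 # best match location

if best_score > 0.8:                                              # illustrative threshold
    print("part found near", best_xy, "with score", round(best_score, 3))
else:
    print("no confident match; lighting or pose may differ from the template")
```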
"One thing we really need to address is having multiple modalities of vision," Schultz says. "Some scientists just look at color tracking or motion or range of images. To have robotic vision, we need to combine those techniques to use the appropriate one at appropriate times.
"There are still some advances that can be made on the hardware. For example, we are looking at some new hardware than can address individual pixels rather than needing the whole frame. If there is something coming at you, you don't necessarily have to look at the whole thing, just that object. Addressing individual pixels also allows you to have much faster algorithms."
Yet while finding patterns amid hundreds or thousands of pixels is one computing task, determining what those patterns represent is quite another, experts say. Machines, in short, need to know what they are looking at.
"What you need to know about it depends on your goals. If you are just trying to navigate from point A to B, it doesn't matter if you encounter a table or a chair, just that there is no empty space. If a robot is detecting a sniper, we have the ability to look at motion, heat, skin tone, and face recognition to determine if it is a human. But is it a sniper? AI for years has been trying to look at what is a cup — does it have to have a handle, is it cylindrical? What makes it a cup? It holds liquid, but that is hard to tell by just looking at it. That's a problem with AI that hasn't been solved yet."
Military applications

Scientists in the Robotics and Process Systems Division (RPSD) at Oak Ridge National Laboratory (ORNL) in Oak Ridge, Tenn., have tested some evolving machine vision technologies for several military applications, including the U.S. Army Future Armor Rearm System (FARS) to rearm tanks automatically in the field. RPSD experts say FARS will "improve the safety and efficiency of U.S. warfare by automatically reloading battlefield vehicles with ammunition."

In another program, researchers in the Oak Ridge Image Science and Machine Vision (ISMV) group equipped eight-wheeled, remotely operated robots with electronic systems, sensors, computers, and an ORNL-developed automated data acquisition system. The resulting robots surveyed radioactive waste sites around the United States.
The ISMV group formed in the 1990s to conduct pure and applied research in machine vision and perception. Its goal was to emulate the human visual sensory and cognitive process in computers and robots.
A robot traditionally must compare what it sees against some recognizable object in its field of view and estimate its position using the vehicle's recent motion history, observe Rustam Stolkin and Mark Hodgetts of Sira Ltd. in the U.K. and Alastair Greig of the Department of Mechanical Engineering at University College London, in a paper on robotic vision for underwater navigation delivered to the 11th British Machine Vision Conference in September 2000. Most of the solutions proposed to achieve that, they added, depend on acquiring high-quality images under conditions of constant, uniform lighting and excellent visibility.
"Comparatively little attention has been given to the more difficult problem of recognising an object in the adverse conditions of the real world," they wrote. "In contrast, we know that the human visual system is extremely robust, even in conditions of dynamic, non-uniform lighting, poor visibility, and occlusion. Further, there are many instances where the human visual system can correctly interpret an image, even when that image possesses insufficient information to enable such interpretation. It is our belief that such a system can only function by making use of some prior knowledge of the scene."
To achieve the same thing with a robotic system, the authors developed a new algorithm to interpret murky underwater images, which they say demonstrates a clear improvement over model-based scene interpretation and conventional Markov Random Field (MRF) methods.
Software issues"The algorithm allows machine vision systems to make use of prior knowledge of their environment in several novel ways," they wrote. "Firstly, the predicted image is used to automatically update the probability density functions required for MRF segmentation. Secondly, the predicted class of each pixel is introduced within an extended MRF model to enable image segmentation to be both data and expectation driven. Thirdly, our estimates of image interpretation and camera position are mutually refined within an Expectation-Maximisation framework."The use of prior knowledge has enabled the algorithm to interpret parts of the image which contain little or no useful information, producing similar interpretations to those arrived at intuitively by human observers," they wrote. "The algorithm learns about its environment with each successive iteration and adjusts the relative contributions of the predicted and observed information by responding to the visibility conditions both at any given moment and in any given portion of the image. Future work will investigate the use of more sophisticated tracking algorithms and the incorporation of underwater lighting models."
Similar problems arise when experts apply machine vision to automatic target recognition (ATR), which detects, tracks, and recognizes small targets using data from a variety of sensors, including forward-looking infrared (FLIR), synthetic aperture radar (SAR), and ladar sensors. ATR can help military commanders assess the battlefield, monitor targets, and re-evaluate target position during unmanned weapons fire. These jobs require precise separation and identification of elements within a noisy and usually fluid background.
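The detection step of such a pipeline can be sketched simply: pick out hot pixels in a FLIR-like intensity image, group them into blobs, and hand the blob centers to a tracker and classifier. The Python/OpenCV sketch below shows only that first step, with an assumed 8-bit frame and made-up threshold and size limits.

```python
# Sketch of the detection front end of an ATR chain on a FLIR-like frame.
import cv2
import numpy as np

def detect_hot_targets(flir_frame: np.ndarray,
                       hot_threshold: int = 200,
                       min_area_px: int = 25) -> list:
    """flir_frame: 8-bit single-channel image. Returns candidate target centroids."""
    hot = (flir_frame > hot_threshold).astype(np.uint8)
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(hot)
    # Label 0 is the background; keep blobs large enough to be plausible targets.
    return [tuple(centroids[i]) for i in range(1, n)
            if stats[i, cv2.CC_STAT_AREA] >= min_area_px]
```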
Support for machine vision

As with most technology developments, the future of robotic vision has two parents — government and private industry. While much of machine vision and artificial intelligence started with government funding, private industry also has pushed forward in factory environments on such elements as robot arms, robotic production, security, and inspection. The primary difference, researchers note, is the generally static nature of commercial applications, with robots looking for specific parts on a never-changing assembly line, versus the chaotic environment in which military robots will operate.

Even so, NCARAI has cooperative efforts to share technology and development with non-military users. One is a cooperative research and development agreement with Ford Motor Co. and Perceptron Inc. of Plymouth, Mich., to give assembly-line robots the ability to spot objects on a conveyor belt and pick them up without regard to the precise placement of each object on the belt. Another would apply Navy-developed machine vision techniques to NASA's problems with rendezvous, docking, and assembly of parts in space.
The Defense Advanced Research Projects Agency (DARPA) in Arlington, Va., and the U.S. Army are pursuing machine vision for robotic navigation in a program called Perception for Off-Road Robots (PerceptOR), one of six key supporting technology programs of the DARPA/Army Future Combat Systems (FCS) program.
A prototype development effort investigating the practicality of ground robots, PerceptOR aims to advance the state of the art in off-road obstacle detection and to enable the high levels of autonomous mobility necessary for FCS operations. That means navigating in a variety of terrains and conditions — including crowded cities and bad weather — with minimal human intervention.
"PerceptOR will provide strong data for the rapid advance of perception system algorithms under real world conditions," says DARPA program manager Scott Fish. "The resulting prototype performance will also enable the Army to make clear robot application decisions based on the conditions of employment and the level of human involvement in the robot navigation."
Fish has his own two-part definition of machine vision. The first part, he says, is "static data understanding," which draws on information from one sensor or from a sweep of sensors. Static data understanding has three goals:
- classify objects in the image;
- understand relationships between objects in the image, such as which ones are closer than others and which objects are moving; and
- determine how the objects in the image relate to the people viewing the image.
The second part of machine vision, Fish says, involves what he calls "dynamic analysis." This considers how objects in an image relate to one another over time, their movement, and whether or not each object is important. "In a machine vision application aboard an unmanned aerial vehicle, for example, you then have to determine what to do about what is seen, which gets into autonomy," Fish says.
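Fish's split between static understanding and dynamic analysis maps naturally onto a two-stage software pipeline. The sketch below is a hypothetical Python illustration of that structure only; the Detection fields and the motion threshold are invented for the example.

```python
# Hypothetical two-stage pipeline: static understanding, then dynamic analysis.
from dataclasses import dataclass

@dataclass
class Detection:
    label: str       # result of static classification
    range_m: float   # relation to the viewer: distance from the sensor
    x_px: float      # image position, used by the dynamic pass

def static_understanding(detections: list) -> list:
    """Order objects by their relationship to the viewer: nearest first."""
    return sorted(detections, key=lambda d: d.range_m)

def dynamic_analysis(prev: list, curr: list, move_px: float = 5.0) -> list:
    """Flag labels whose image position changed noticeably between frames."""
    prev_x = {d.label: d.x_px for d in prev}
    return [d.label for d in curr
            if d.label in prev_x and abs(d.x_px - prev_x[d.label]) > move_px]
```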
As part of its phased approach to PerceptOR, at the end of March DARPA awarded contracts worth roughly $1.5 million each to four industry/university teams for an eight-month effort to develop and test sensors, algorithms, mounting mechanisms, and processors.
A down-select to three teams will continue into Phase II, a yearlong period of field tests with full perception system prototypes mounted on all-terrain vehicles and operating under a variety of terrain and weather conditions to evaluate their ability to perceive obstacles in foliage, at night, and through obscurants such as smoke and dust. During the program's last phase, two teams will continue experiments for another year in more challenging terrain and under degraded component conditions.
"We're trying to keep PerceptOR pretty generic, to develop something that will have applications across whatever elements of FCS the Army decides to pursue," Fish says. "We're focusing on navigation — just getting from A to B and understanding all the different environmental hazards that can crop up."
Making judgments

Fish says the machine has better sensors than human eyes, but inferior data processing compared with the human brain. "We take a lot for granted about the processing that goes on in our brains to automatically handle reactionary behaviors," he says.

The challenge for machine vision systems engineers involves programming the relationships between sensor data and what the machine does with that data. Fish says one unique aspect of PerceptOR revolves around machine learning. Another is the use of multiple perspectives — having two ground robots look out for each other, for example, or combining the ground perspective with overhead assets to help resolve navigation problems.
Everyone involved in developing machine vision and autonomous vehicles for the military agrees the goal is not a "Terminator," or an autonomous robot that not only senses and identifies but also takes independent action in the form of weapons use. "We are interested in systems that work with the human in the loop," says the Navy's Schultz.
Inserting a human in the process calls for dynamic autonomy, or different levels of autonomy based on the situation, the task at hand, and the rules of engagement, he says. "The interaction between human and robot will determine the level of autonomy, but there is a human in the loop." Future unmanned combat air vehicles, or UCAVs, for example, will have weapons, "but would the robot autonomously decide to release weapons? Never," Schultz says. "Humans must be held accountable for that final decision. So the vehicle may be able to go out and find what it thinks is the bad guy, but in the end it must show that to a human for a final decision on what to do."
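The rule Schultz describes can be expressed as a hard constraint in the control logic: the autonomy level may change with the task, but weapon release always waits for a human decision. The sketch below is a hypothetical Python illustration of that constraint, not any fielded system's code.

```python
# Hypothetical dynamic-autonomy gate: navigation autonomy varies, firing never does.
from enum import Enum

class Autonomy(Enum):
    TELEOPERATED = 1
    SUPERVISED = 2
    AUTONOMOUS_NAV = 3   # may plan and drive on its own

def may_navigate_autonomously(level: Autonomy) -> bool:
    return level is Autonomy.AUTONOMOUS_NAV

def may_release_weapon(level: Autonomy, human_confirmed: bool) -> bool:
    # Deliberately independent of the autonomy level: the release decision is
    # never delegated to the machine, only reported to a human for confirmation.
    return human_confirmed
```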
Schultz says Navy leaders want robots "to do the three D's — dull, dirty, and dangerous." That way, military commanders can keep their fighting forces out of harm's way as much as possible, and concentrate their use of humans on "those things humans are really good at; we're not ready to replace them," he says.
At the same time, Fish says, researchers are making every effort to reduce the level of human involvement outside that final decision-making role.
"We're trying to implement a system that reduces the amount of human operator involvement," he says "In the timeframe of PerceptOR, I doubt we will get to the point of just tasking the robot and not having to hear back from it until the job is done. We don't expect to get a single answer to handle all environmental issues. How autonomous we can make these vehicles depends on the environment in which they will have to operate. They may be able to operate in a desert with little user input, but in a woods, the robot may have to call back for help along the way."