spike@vered.rose.utoronto.ca
"Stereoscopic Vision and Augmented Reality", Scientific Computing and Automation, 9(7), 31-34, June 1993.
I have repaired some of the (minor) editing, and updated some terminology, but it is essentially identical to the published version.
The effectiveness of human-machine systems is often determined by the quality of the human-machine interface. Unfortunately, most existing telerobots are equipped with standard monoscopic video (MV) displays as the main source of information to the operator. MV displays eliminate all binocular depth cues (i.e. eye convergence and disparity), as well as several monocular depth cues (e.g. texture gradient). The loss of these important depth cues results in situations where the location of objects in the remote scene is ambiguous. While motion parallax or multiple views can sometimes resolve these ambiguities, operating conditions may render these options infeasible.
A related problem is the difficulty of estimating absolute sizes with an MV system. It is difficult to determine whether an obstacle is too steep to climb, or whether a depression is deep enough to present a hazard. One British study reported that using standard MV systems made bomb squad personnel reluctant to use their remote manipulator (Robinson, M., "Remote control vehicle guidance using stereoscopic displays", Proc. Human Factors Society Meeting, 1984).
Human Engineering Research and Consulting (HERC) recently investigated the benefits of using 3-D, or stereoscopic video (SV) for teleoperation applications in the Canadian Armed Forces. SV provides an immediate and compelling sense of depth, which can greatly simplify teleoperation tasks requiring delicate manipulation.

At the lowest level of difficulty, it was found that the benefit of SV faded as subjects repeated a single task again and again. However, whenever the task changed, the advantages of SV were once again immediately apparent. At the highest levels of difficulty, the performance advantages of SV were found even after subjects had performed the same task many times. Since defence-related teleoperation tasks, such as bomb disposal and hazardous materials management, are all characterised by an unpredictable and changing environment, operators will not have the luxury of repeating a task several times. Thus even for very simple tasks, it is reasonable to expect the benefits of SV to be significant and important. For difficult tasks, it can mean the difference between success and failure.
More recently, Human Engineering Research and Consulting (HERC), in conjunction with the Defence and Civil Institute of Environmental Medicine (DCIEM), investigated the benefits of SV for experienced telerobot operators in Canadian Armed Forces teleoperation applications. Using several tasks related to bomb-disposal teleoperation, these experiments showed that even expert operators perform better when using SV. More importantly, the operators strongly preferred SV to MV, judging it highly desirable for a variety of tasks, and rating it more usable and more comfortable than a comparable MV display.
Until recently, the high cost and technical complexity of flickerless SV systems have limited their use, but the recent introduction of 120 Hz SV systems has made it possible to consider these systems for a wide range of new applications. Several different systems are available, ranging in price up to US$15,000. DCIEM has obtained one of these systems and is considering it as an alternative to the low-end NTSC SV system. It is expected that the flicker-free display will be more readily accepted by operators and should result in greater user satisfaction. Initial results are encouraging, but cross-talk (seeing the right image with the left eye, and vice versa) due to phosphor persistence in the 120 Hz monitor is distracting. It remains to be seen whether the lack of flicker will outweigh the greater cross-talk and considerably greater expense.
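The flicker difference between the two kinds of system comes down to simple arithmetic, assuming (the text does not say so outright, although the cross-talk description suggests it) that these displays are time-multiplexed, showing left-eye and right-eye images alternately. Under that assumption each eye sees only half of the displayed fields. The sketch below is illustrative only; `per_eye_rate_hz` is a hypothetical helper, and the NTSC figure is the nominal 60 fields per second.

```python
NTSC_FIELD_RATE_HZ = 60.0      # nominal NTSC field rate (fields per second)
FAST_SV_FIELD_RATE_HZ = 120.0  # the 120 Hz systems mentioned in the text


def per_eye_rate_hz(display_field_rate_hz: float) -> float:
    """Refresh rate seen by each eye on a display that alternates
    left-eye and right-eye fields (assumption: strict alternation)."""
    if display_field_rate_hz <= 0:
        raise ValueError("field rate must be positive")
    return display_field_rate_hz / 2.0


if __name__ == "__main__":
    for name, rate in [("NTSC SV", NTSC_FIELD_RATE_HZ),
                       ("120 Hz SV", FAST_SV_FIELD_RATE_HZ)]:
        print(f"{name}: {per_eye_rate_hz(rate):.0f} Hz per eye")
```

Under these assumptions, an NTSC-based system delivers about 30 Hz to each eye, while a 120 Hz system delivers 60 Hz per eye, the same rate at which an ordinary monoscopic NTSC display refreshes.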
Since 1989, Drascic and Milgram have been breaking new ground by combining computer-generated stereoscopic graphics with live SV, a technology they dub ARGOS, for "Augmented Reality through Graphic Overlays on Stereo-video". Using ARGOS it is possible to create virtual objects that appear to exist in the video image. By generating a carefully calibrated virtual pointer, and allowing the operator to adjust the position of this pointer in the three-dimensional video space, it is possible for the operator to indicate a precise destination for the telerobot, or to indicate a path for it to follow. Positioning a virtual pointer is a much simpler task than driving a telerobot, so such an interface would reduce operator workload considerably.
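The geometry behind such a calibrated pointer can be sketched briefly. What follows is a minimal illustration, not the ARGOS implementation: it assumes an idealised pair of parallel pinhole cameras, and the names `StereoRig` and `project_pointer`, together with the focal length, camera separation, and image-centre values, are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StereoRig:
    """Idealised parallel-camera stereo rig (all values are assumptions)."""
    focal_px: float    # focal length, in pixels
    baseline_m: float  # separation between the two cameras, in metres
    cx: float          # image centre, x (pixels)
    cy: float          # image centre, y (pixels)


def project_pointer(rig: StereoRig, x: float, y: float, z: float
                    ) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Project a 3-D pointer position into the left and right images.

    (x, y, z) is in metres, measured from the midpoint between the
    cameras: x to the right, y up, z forward into the scene.
    Returns ((x_left, y_left), (x_right, y_right)) in pixels.
    """
    if z <= 0:
        raise ValueError("pointer must be in front of the cameras")
    half = rig.baseline_m / 2.0
    # The left camera sits at x = -half, the right camera at x = +half.
    x_left = rig.cx + rig.focal_px * (x + half) / z
    x_right = rig.cx + rig.focal_px * (x - half) / z
    # Image y grows downwards, so "up" in the scene is subtracted.
    y_img = rig.cy - rig.focal_px * y / z
    return (x_left, y_img), (x_right, y_img)


if __name__ == "__main__":
    rig = StereoRig(focal_px=800.0, baseline_m=0.1, cx=320.0, cy=240.0)
    left, right = project_pointer(rig, 0.0, 0.0, 2.0)
    print("left:", left, "right:", right,
          "disparity:", left[0] - right[0])
```

In this model the horizontal offset between the two drawn pointers (the disparity) equals focal length times camera separation divided by depth, so a pointer moved farther away is drawn with a smaller offset and appears to recede into the video scene. A real system would also need to account for lens distortion and camera convergence, which this sketch ignores.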

ARGOS is the foundation of the University of Toronto's Augmented Reality system. Much media attention has been devoted to the phenomenon of Virtual Reality, which generally entails immersing people in completely artificial computer-generated worlds, using as many different senses as possible to complete the illusion. By contrast, Augmented Reality does not attempt to create a virtual world; instead, its goal is to allow the user to perceive the real world more clearly and with greater understanding than is possible using ordinary vision.
Several different kinds of Augmented Reality systems exist. ARGOS is one of the simplest and most robust, because it uses a standard monitor as the stereoscopic display device. Other augmented reality systems use immersive head-mounted displays, but there are many perceptual and calibration issues that remain to be resolved before these systems can be used by industry.
Since the virtual pointer can be used to specify single points in the remote space, it is a simple extension to create a virtual tape-measure, so that the operator can make measurements of the locations and sizes of remote objects.
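Once the pointer's three-dimensional position is known, the tape-measure reduces to computing distances between marked points. The following is a minimal sketch under that assumption; the class `VirtualTapeMeasure` and its methods are hypothetical names chosen for illustration, and positions are taken to be in metres in a common coordinate frame.

```python
import math
from typing import List, Tuple

Point3D = Tuple[float, float, float]


class VirtualTapeMeasure:
    """Records pointer positions and reports distances between them."""

    def __init__(self) -> None:
        self._points: List[Point3D] = []

    def mark(self, point: Point3D) -> None:
        """Record the current pointer position (metres, assumed frame)."""
        self._points.append(point)

    def segment_lengths(self) -> List[float]:
        """Straight-line distance between each consecutive pair of marks."""
        return [math.dist(a, b)
                for a, b in zip(self._points, self._points[1:])]

    def total_length(self) -> float:
        """Length of the whole path through the marked points."""
        return sum(self.segment_lengths())


if __name__ == "__main__":
    tape = VirtualTapeMeasure()
    tape.mark((0.0, 0.0, 2.0))
    tape.mark((3.0, 4.0, 2.0))
    print(tape.segment_lengths())
```

Marking the two ends of an obstacle gives its size directly, and marking a series of points gives the length of a path. The accuracy of any such measurement depends on how well the pointer is calibrated against the video.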
As a further example of Augmented Reality, consider a space-going telerobot. All video images in space suffer from the same problem with shadows: because there is no air in space to scatter light, shadows are completely black, and anything in shadow is completely invisible. However, since the dimensions of everything sent into space are very well known, it is possible to use ARGOS to generate the missing images, carefully drawn to appear at the correct location in the video image.
In other situations, objects that are invisible to normal vision may be detectable with other sensors. In many underwater situations, normal vision is good only for a very limited distance. While it is easier to see through murky depths with SV than with MV, operators are still very limited. However, using radar, sonar, and infra-red cameras, it is possible to sense objects that would otherwise be invisible. If the information from these sensors is sent to the ARGOS computer, appropriately shaped graphic objects can be drawn at the correct position in space, in effect making visible what is normally invisible.
Similarly, information from various medical imaging sensors, such as CAT, PET, and NMR scanners, can be used to generate graphic images of the interior of the human body. These images can be superimposed onto a live video image of the body using ARGOS, and seen in three dimensions, providing a clear advantage over systems that use flat two-dimensional displays.
Improving the human-machine interface of telerobots will enable them to fulfil the myriad tasks they will be facing in the future. Stereoscopic Video and Augmented Reality can greatly improve the feedback of information from the remote machine to the human operator, and tools such as the Virtual Pointer can greatly facilitate the communication of human instructions to the machine.