* Defence and Civil Institute of Environmental Medicine,
PO Box 2000, Station A, Downsview, Ontario, Canada M3M 3B9
(c) Copyright 1995.
Keywords: Augmented reality, mixed reality, virtual reality, augmented virtuality, virtual control, telerobotic control, stereoscopic displays.
The topic of Augmented Reality has begun to appear in the literature with increasing frequency, usually in conjunction with some treatment of the better known subject of Virtual Reality (VR). However, there is currently little consensus on precise definitions of either VR or AR. VR, for example, is used to refer to systems ranging from totally immersive computer generated virtual environments, to interactive desktop computer graphic applications, to text-only "Adventure"-style computer games. The term "Augmented Reality" is also used in different ways by different people, without what could reasonably be considered a consistent definition. We use AR to refer to real scenes that are enhanced or "augmented" with computer graphics. Although in terms of their fundamental properties VR and AR may appear to be quite different, they face many of the same issues, and much of the research and technology of one pertains to the other.
In general, a Virtual Reality environment is one in which the user is immersed in a completely synthetic world, which mimics the properties of a real-world environment to a certain extent, and which may also exceed the bounds of physical reality by creating a world in which the physical laws governing gravity, time and material properties no longer hold. In contrast, the real-world environments of Augmented Reality systems are obviously constrained by the laws of physics, which necessarily impose certain restrictions on one's ability to interact with the world. AR tools are designed to facilitate such interactions. Rather than regard the concepts of VR and AR as antitheses, however, it is more instructive to view them as lying at opposite ends of a continuum, which we refer to as the Reality-Virtuality (RV) continuum (Milgram et al, 1994).
Figure 1: Schematic Representation of the Reality-Virtuality (RV) Continuum.
The RV continuum concept is illustrated in Figure 1, where the "completely real environment" shown at the left side of the RV continuum defines any environment consisting solely of real objects, and includes whatever is observed when viewing a real-world scene either directly (i.e. in person) or by means of a video display. An illustration of this case is given in Fig. 1a. The "completely virtual environment" case at the right defines any world consisting solely of virtual objects, examples of which would include conventional computer graphic simulations, either monitor-based or immersive. An illustration of this case is given in Fig. 1d. Between the extremes of the RV continuum lies the range of Mixed Reality (MR) environments in which real and virtual worlds are combined in various proportions and presented as a unified whole.
Augmented Reality (AR) and Augmented Virtuality (AV) are special cases of Mixed Reality (MR), within the RV continuum.
Fig 1a: Completely Real
Fig 1b: Augmented Reality
Fig 1c: Augmented Virtuality
Fig 1d: Completely Virtual
Figures 1c and 1d were kindly provided by the ATR Communication Systems Research Laboratories, Kyoto, Japan, as an illustration of a portion of their Virtual Space Teleconferencing system.
Using the framework of the RV continuum, the definition of Augmented Reality, as indicated in Fig. 1b, pertains to any otherwise completely real environment which is somehow enhanced by means of computer graphics. (Although in this paper our treatment of Augmented Reality is limited strictly to visual displays, analogous concepts apply also to both auditory and haptic display modalities.) That is, the image in Fig. 1b comprises essentially the same objects as the real-world image of Fig. 1a, with the addition of the virtual robot shown in the foreground.
Another important class of MR displays, also shown along the RV continuum in Fig. 1, is labelled Augmented Virtuality (AV). This class is similar to AR, but comprises enhancement of otherwise completely virtual environments with real images and objects. One example of such a display is shown in Fig. 1c, which is to be compared to Fig. 1d. Another example of what might be considered Augmented Virtuality occurs in haptic display research in which subjects viewing virtual objects can touch corresponding real objects, where the real object simulates an advanced haptic display of a virtual object.
In the completely virtual environment shown in Fig. 1d, we have a modelled participant in a virtual space teleconference (Takemura & Kishino, 1992), manipulating a collection of modelled (virtual) objects. In Fig. 1c we see that, through the addition of an (unmodelled) background video scene, a significant degree of richness and realistic detail has been added, at minimal computational expense, to replace the otherwise plain backdrop of the completely modelled environment.
A block diagram of the ARGOS (TM) system is given in Fig. 2. Separate left and right camera images are combined into one video signal using the alternating-field method (Milgram, Drascic & Grodski, 1990). The heart of the computer generated imaging system is the graphics workstation, which creates the stereographic (SG) images, which can in turn be manipulated interactively in 3D space by means of one of a variety of 6 degree-of-freedom input devices (Zhai and Milgram, 1993). Stereoscopic images can be presented either as conventional 60 Hz NTSC images (in North America), using alternating left and right fields, or as 120 Hz non-interlaced flicker-free images (Lipton and Meyer, 1984). The Stereo Format Conversion System is able to convert stereoscopic video back and forth between 60 and 120 Hz formats, allowing us to generate images for 120 Hz display on a graphics workstation, to store these on a regular 60 Hz VCR, and then view the images again during playback on a(nother) 120 Hz flicker free system.
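The alternating-field idea can be illustrated with a minimal sketch. It is not the ARGOS encoder itself: it assumes each image is simply a list of scan lines, and it assumes that the left view occupies the even-numbered lines and the right view the odd-numbered lines (the actual field assignment is not stated here). The function names are hypothetical.

```python
def interleave_fields(left, right):
    """Combine two images into one interlaced frame.

    Assumption: even-numbered scan lines carry the left view and
    odd-numbered scan lines carry the right view.
    """
    if len(left) != len(right):
        raise ValueError("left and right images must have the same height")
    return [left[r] if r % 2 == 0 else right[r] for r in range(len(left))]


def split_fields(frame):
    """Recover the two half-height fields from an interlaced frame."""
    left_field = frame[0::2]   # even scan lines
    right_field = frame[1::2]  # odd scan lines
    return left_field, right_field


# Tiny 4-line example: each scan line is a list of pixel labels.
left_image = [["L"] * 4 for _ in range(4)]
right_image = [["R"] * 4 for _ in range(4)]
frame = interleave_fields(left_image, right_image)
left_field, right_field = split_fields(frame)
```

Each eye receives only half of the scan lines of a frame, which is why a field-sequential 60 Hz signal delivers 30 fields per second to each eye.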
One important related class of AR systems are those which use head-mounted displays rather than a monitor for viewing the remote scene, with the remote cameras slaved to the head motions of the user (e.g. Tachi, 1993). Such systems are typically used for manually controlling remote robotic manipulators or vehicles. Other immersive forms of AR use see-through head-mounted display systems, in which the real world surrounding the observer is viewed directly, either using half-silvered mirrors as in optical see-through displays (e.g. Caudell & Mizell, 1992; Janin, Mizell & Caudell, 1993; Feiner, MacIntyre & Seligmann, 1993), or using carefully aligned video see-through imaging (e.g. Rolland, Holloway & Fuchs, 1994; Edwards, Rolland & Keller, 1993). All of the immersive approaches are therefore based on the observer's feeling naturally part of the world being viewed, which brings with it clear advantages for applications in which presence is required.
Although immersive displays such as head-mounted video can be very effective for certain tasks in which the full attention of the operator is required, they interfere with the operator's ability to attend to more than one task at the same time. An advantage of the monitor-based approach is that operators can selectively attend to and meet the demands of a variety of tasks simultaneously; monitor-based displays are therefore well suited to both manual and supervisory control tasks. In addition, when computer augmentation (AR) is added to the immersive displays, new technical difficulties arise with respect to body tracking and graphic calibration and registration (Rolland et al, 1994). A detailed taxonomy of immersive and non-immersive AR and other related MR displays is given elsewhere (Milgram & Kishino, 1994; Milgram et al, 1994).

Fig. 2: Block diagram of ARGOS system. The signals from the two cameras are combined with a stereoscopic encoder, and images from the graphic workstation are combined with the video. The AR display is presented on a monitor at either 60 Hz or 120 Hz, depending on the specific hardware being used.
The problem of absolute position determination in a video image can be solved by using ARGOS to transform it from an absolute position estimation task into a relative position estimation task. We present the viewer with a Virtual Pointer, which appears to "float" in the remote scene (Milgram et al, 1990). By aligning the Virtual Pointer with objects in the remote scene, the operator is able to specify precise three dimensional coordinates. Because it is a relatively straightforward task to calibrate and align the stereoscopic graphics with the stereoscopic video, it is possible to animate the Virtual Pointer in such a way as to appear as realistic as desired, given the limits of the graphics computer. While our current implementations use a variety of Silicon Graphics workstations and IBM PC-compatibles, our early work used Commodore Amigas to animate the Virtual Pointer. Our studies with the earlier systems showed that operators can use the Virtual Pointer with essentially the same accuracy as a real pointer, with standard errors in depth corresponding to less than one pixel on the display (Drascic & Milgram, 1991).
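The geometry behind a stereoscopic pointer can be sketched as follows. This is an illustration under simplifying assumptions, not the ARGOS calibration procedure: it assumes two ideal pinhole cameras with parallel optical axes, a focal length expressed in pixels, and a coordinate frame centred midway between the cameras. All function and parameter names are hypothetical.

```python
def project_stereo(point, focal_px, baseline):
    """Project a 3D point into left and right image coordinates.

    Assumes parallel pinhole cameras placed at x = -baseline/2 and
    x = +baseline/2, both looking along the +z axis.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the cameras")
    x_left = focal_px * (x + baseline / 2.0) / z
    x_right = focal_px * (x - baseline / 2.0) / z
    y_img = focal_px * y / z
    return x_left, x_right, y_img


def triangulate(x_left, x_right, y_img, focal_px, baseline):
    """Recover the 3D point from its two image positions."""
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    z = focal_px * baseline / disparity
    x = z * (x_left + x_right) / (2.0 * focal_px)
    y = z * y_img / focal_px
    return x, y, z


def depth_change_per_pixel(z, focal_px, baseline):
    """First-order depth change caused by a one-pixel disparity change."""
    return z * z / (focal_px * baseline)


# Draw the pointer at a chosen 3D location, then read the location back.
pointer = (0.1, -0.05, 2.0)
xl, xr, yi = project_stereo(pointer, focal_px=800.0, baseline=0.1)
recovered = triangulate(xl, xr, yi, focal_px=800.0, baseline=0.1)
```

Under this model, drawing the pointer amounts to calling `project_stereo`, and reading back the coordinates of an aligned pointer amounts to calling `triangulate`. The last function shows why depth resolution degrades with the square of distance for a fixed camera separation.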
A simple extension of the Virtual Pointer is the Virtual Tape-measure, both of which are illustrated in Fig. 3. The Virtual Tape-measure can be used to measure sizes and distances in the remote world. Operators use it by first specifying the starting point of the Virtual Tape-measure with the Virtual Pointer, and then dragging it out toward the end point. The ARGOS system can then report the depth information in a variety of ways, in addition to transmitting it to the telerobot. For example, it can display text floating in space at the depth of the end point, comprising the absolute {x,y,z} locations of the start and end points and the total distance between them. Another option is to use speech synthesis to report the distance audibly.
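The measurement reported by such a tape-measure reduces to the Euclidean distance between two pointer positions. The sketch below assumes that both end points are already expressed in a common {x,y,z} frame; the label layout and the default unit are illustrative assumptions, not the ARGOS display format.

```python
import math


def tape_measure(start, end):
    """Straight-line distance between two 3D points."""
    return math.dist(start, end)


def tape_label(start, end, units="m"):
    """Text that could be drawn at the depth of the end point."""
    def fmt(p):
        return "{" + ", ".join(f"{c:.3f}" for c in p) + "}"

    distance = tape_measure(start, end)
    return f"start {fmt(start)}  end {fmt(end)}  distance {distance:.3f} {units}"


start_point = (0.0, 0.0, 0.0)
end_point = (3.0, 4.0, 12.0)
length = tape_measure(start_point, end_point)
label = tape_label(start_point, end_point)
```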
Because the user is free to position the Virtual Pointer at any location in space, it can also be used to specify a target or mark a path for an intelligent mobile telerobot to follow. (See Section 5.)

Figure 4. The Virtual Tether concept, for a peg-in-hole experiment. The Virtual Tether is shown joining the gripper to the cylindrical target tube.
Some of these limitations can be overcome by providing the operator with a graphical simulation of the robot operating in a modelled workspace. (In terms of the RV continuum this would be considered a "completely virtual environment".) With such a tool the operator can plan the robot's tasks by issuing high level "put-that-there" types of commands (Cannon & Leifer, 1994), while lower level task execution can be governed by local sensing and automatic control. For improved scene interpretation the operator could be provided with a variety of artificially generated cues about the relationships between objects in the workspace and the robot. Such a control scheme requires complete and up to date knowledge about the robot and its workspace, which can be expensive to acquire and maintain. Such models are typically created only for repeatedly-used sites and are unfeasible for one-time sites and one-time operations.
To overcome some of the difficulties of completely real and completely virtual environments, we have been developing a new application of Augmented Reality for telerobotics, which we call "Virtual Control". Using a stereoscopic video display, the operator views the robot workspace and receives continuously updated task information from the remote site. Because the robot itself remains invariant, it can be modelled beforehand. No model of the remote world is necessary. Initially, a 3D wireframe image of the robot is rendered stereoscopically and superimposed on the stereovideo display, conforming exactly in size and location to the video image of the real robot, thereby creating a highlighted outline around the real robot. An example of such a virtual robot (Rastogi, Milgram, Grodski & Drascic, 1993) is shown (monoscopically) in Fig. 5.


The virtual robot can also be rendered fully, as shown in Fig. 6. The operator controls the virtual robot, as it appears to interact with objects in the real environment. The operator can define a task off-line by directing the virtual robot to different locations in the workspace. Paths and trajectories can also be displayed if required. When desired, the planned task can be executed by the real robot.
Another option of our system is for the operator to use the virtual pointer to define virtual planes, which can be used as a means of specifying constraints in the workspace to the robot control system. These boundaries can be displayed as opaque or transparent planes, if desired. For example, in Fig. 6, the four corners of the table on which the robot is operating can be selected interactively, creating a computer model of the surface of the table. This constraining boundary can be used to prevent collisions of the robot with the table. A related capability is for real objects to be interactively encapsulated within graphical wireframe boxes. These virtual encapsulators serve to prevent potential collisions of the real robot with the encapsulated real objects. With these Augmented Reality tools, the operator has the capability to create a partial world model of the robot workspace. With this approach we obtain many of the benefits of full graphical simulation, without the difficulties of acquiring a complete virtual world database.
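How such constraints might be represented can be sketched as follows. This is an illustration only, not the ARGOS control software: it assumes that a virtual plane is built from three of the selected corner points, that the corners are given in counter-clockwise order as seen from the free side of the plane, and that an encapsulator is an axis-aligned box. All names are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plane_from_points(p0, p1, p2):
    """Return (unit normal, offset) of the plane through three points."""
    n = _cross(_sub(p1, p0), _sub(p2, p0))
    length = math.sqrt(_dot(n, n))
    if length == 0.0:
        raise ValueError("points are collinear")
    n = (n[0] / length, n[1] / length, n[2] / length)
    return n, -_dot(n, p0)


def signed_distance(plane, point):
    """Distance from the plane; positive on the side the normal points to."""
    normal, offset = plane
    return _dot(normal, point) + offset


def inside_box(box_min, box_max, point):
    """True if the point lies inside an axis-aligned encapsulating box."""
    return all(lo <= c <= hi for lo, c, hi in zip(box_min, point, box_max))


def is_safe(point, plane, boxes, clearance=0.0):
    """Accept a robot position only if it respects every constraint."""
    if signed_distance(plane, point) < clearance:
        return False
    return not any(inside_box(lo, hi, point) for lo, hi in boxes)


# Table surface selected via three corners; one encapsulated object on it.
table = plane_from_points((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
obstacles = [((0.4, 0.4, 0.0), (0.6, 0.6, 0.3))]
```

A planned position of the virtual robot could be screened with `is_safe` before the corresponding command is passed to the real robot.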
A final class of applications involves using overlaid stereographic objects for the purpose of visualising how these modelled objects might appear were they to be really added to the scene. This is anticipated to be useful in architecture or interior design, for example, for visualising how changes or additions to existing rooms, buildings, neighbourhoods, or landscapes might appear. This same concept is illustrated in Fig. 7, where the potential for using Augmented Reality as a tool for choreography is presented. In this example, we assume that the choreographer does have a means of interactively controlling the positions and motions of individual dancer mannequins, but does not necessarily have a sufficiently detailed (world) model of a particular stage. The AR system can then be used to assist the choreographer in visualising how some dance combinations might appear, not within a crudely modelled virtual computer environment, but superimposed on a high quality stereoscopic video image of the actual stage.

Figure 7. Illustration of ARGOS as a tool for on-line virtual choreography.
[1] DJ Cannon and LJ Leifer. "Point-and-direct robotics". Proc. International Conference on Intelligent Teleoperation, Greensboro, NC, 95-106, 1991.
[2] TP Caudell and DW Mizell. "Augmented reality: An application of heads-up display technology to manual manufacturing processes". Proc. IEEE Hawaii International Conf. on Systems Sciences, 1992.
[3] DB Diner and DH Fender. Human Engineering in Stereoscopic Viewing Devices. Plenum Publishing, 1993.
[4] D Drascic. "Skill acquisition and task performance in teleoperation using monoscopic and stereoscopic video Remote Viewing", Proc. Human Factors Society 35th Annual Mtg, San Francisco, 1367-71, 1991.
[5] D Drascic and P Milgram. "Positioning accuracy of a virtual stereographic pointer in a real stereoscopic video world", SPIE Vol 1457 - Stereoscopic Displays and Applications II, San Jose, Calif., Feb. 1991.
[6] D Drascic and JJ Grodski. "Defence teleoperation and stereoscopic video", SPIE Volume 1915 - Stereoscopic Displays and Applications IV, San Jose California, 58-69, Feb. 1993.
[7] D Drascic, JJ Grodski, P Milgram, K Ruffo, P Wong and S Zhai. "ARGOS: A display system for augmenting reality", ACM SIGGRAPH Tech Video Review, Vol 88: InterCHI '93 Conf on Human Factors in Computing Systems, (Abstract in Proceedings of InterCHI'93, p 521), Amsterdam, April 1993.
[8] EK Edwards, JP Rolland and KP Keller. "Video see-through design for merging of real and virtual environments". Proc. IEEE Virtual Reality International Symp. (VRAIS'93), Seattle, WA, 223-233, 1993.
[9] S Feiner, B MacIntyre and D Seligmann. "Knowledge-based augmented reality". Communications of the ACM, 36(7), 52-62, 1993.
[10] Imagina 95, Programme notes, Monte Carlo, Feb. 1-3, 1995.
[11] AL Janin, DW Mizell and TP Caudell. "Calibration of head-mounted displays for augmented reality". Proc. IEEE Virtual Reality International Symposium (VRAIS'93), Seattle, WA, 246-255, 1993.
[12] L Lipton and L Meyer. "A flicker-free field-sequential stereoscopic video system". J. Society of Motion Picture & TV Engineers (SMPTE), 1047-1051, Nov. 1984.
[13] SG Maclean, M Rioux, F Blais, JJ Grodski, P Milgram, HFL Pinkney and BA Aikenhead. "Vision system development in a space simulation laboratory". Proc. ISPRS: Close Range Photogrammetry & Machine Vision. 1990.
[14] DF McAllister (ed). Stereo Computer Graphics and Other True 3D Technologies. Princeton University Press, Princeton, NJ, 1993.
[15] P Milgram, D Drascic and JJ Grodski. "A virtual stereographic pointer for a real three dimensional video world", in Human-Computer Interaction -- INTERACT'90, D Diaper, D Gilmore, G Cockton & B Shackel (ed's), Elsevier, 695-700, 1990.
[16] P Milgram, D Drascic & JJ Grodski. "Enhancement of 3-D video displays by means of superimposed stereographics", Proc. Human Factors Soc. 35th Annual Meeting, San Francisco, 1457-1461, 1991.
[17] P Milgram, D Drascic and JJ Grodski. "Stereoscopic video-graphic coordinate specification system". US Patent No. 5,175,616; Dec. 29, 1992.
[18] P Milgram and F Kishino. "A taxonomy of mixed reality visual displays", IEICE (Institute of Electronics, Information and Communication Engineers) Transactions on Information and Systems, Special issue on Networked Reality, Dec. 1994.
[19] P Milgram, H Takemura, A Utsumi and F Kishino. "Augmented Reality: A class of displays on the reality-virtuality continuum". SPIE Vol. 2351-34, Telemanipulator and Telepresence Technologies, 1994.
[20] P Milgram, S Zhai, D Drascic & JJ Grodski. "Applications of augmented reality for human-robot communication", Proc. IROS'93: Int'l Conf. on Intelligent Robots and Systems, Yokohama, 1467-72, 1993.
[21] A Rastogi, P Milgram, JJ Grodski and D Drascic. "Virtual telerobotic control". Proc. DND Knowledge-based Systems and Robotics Workshop, Ottawa, Ontario, Canada, 1993.
[22] JP Rolland, RL Holloway and H Fuchs. "Comparison of optical and video see-through head-mounted displays". Proc. SPIE Vol. 2351-35, Telemanipulator and Telepresence Technologies, 1994.
[23] K Ruffo and P Milgram. "Effects of stereographic + stereovideo "tether" enhancement for a peg in hole task", Proc. 1992 IEEE Int'l Conf. on Systems Man and Cybernetics, 1992.
[24] S Tachi. "Virtual reality and tele-existence - Harmonious integration of synthesized worlds and the real world", Proc. Industrial Virtual Reality Conf. (IVR'93), Makuhari Messe, Japan, June 23-25, 1993.
[25] H Takemura and F Kishino. "Cooperative work environment using virtual workspace". Proc. Computer Supported Cooperative Work (CSCW'92), 226-232, 1992.
[26] S Zhai and P Milgram. "Human performance evaluation of manipulation schemes in virtual environments", VRAIS'93: 1st IEEE Virtual Reality Annual International Symposium, Seattle, Sept 1993.