The increasing power of mobile devices and smartphones has significantly contributed to augmented reality (AR) applications becoming mobile. The first mobile AR system, the Columbia Touring Machine (Feiner, MacIntyre, Hollerer, & Webster, 1997), relied on a wearable laptop together with GPS and orientation sensors housed in a backpack. In comparison, present-day hand-held devices such as smartphones and tablet PCs contain similar sensors and computing power, and are capable of much the same, if not more.
Mobile Augmented Reality
In general, mobile AR applications can be classified as either globally applicable at any location in the world, or local to the user's current position within a local coordinate frame. Globally applicable AR applications usually depend on GPS-based location and orientation, together with a known 3D model of the surrounding world, to determine what the user is looking at. Many mobile AR systems ignore the three-dimensional nature of the world around the user and instead present information at the horizon (e.g., www.layar.com/). In such applications there is no precise registration between the user's view and the surrounding 3D world. As a consequence, it is impossible to augment the world with objects that are correctly scaled and positioned with respect to the user's surroundings.
On the other hand, most local AR applications make use of 2D markers (M. Fiala, 2005; Hirokazu Kato & Billinghurst, 1999; Wagner, Reitmayr, Mulloni, Drummond, & Schmalstieg, 2008), rather than GPS, to visually localize the user in a local coordinate frame. The marker thus provides the required registration between the captured image (view) and the surrounding 3D world. As a consequence, a much better localization (pose) of the viewer is obtained, together with potentially the exact scale and size of objects in the world. Recent improvements in natural feature detection, tracking and recognition have also led to natural features being used instead of the traditional marker to estimate the pose and motion of a mobile device (see Bay, Ess, Tuytelaars, & Van Gool, 2008; Kurz & Ben Himane, 2011; Wagner et al., 2008).
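The registration a planar marker provides can be sketched as follows: the four detected corners of a square marker yield four correspondences between the marker's own plane and the image, which determine a 3x3 homography; camera pose and scale then follow from the known physical marker size. The snippet below is a minimal illustration of this first step using the direct linear transform (DLT); the corner coordinates are hypothetical, and a real system (e.g., ARToolKit-style trackers) adds corner detection, refinement and pose decomposition on top.

```python
import numpy as np

def homography_from_marker(marker_corners, image_corners):
    """Estimate the 3x3 homography mapping points on the marker plane
    (marker coordinates, z = 0) to their detected pixel positions,
    via the direct linear transform on the 4 corner correspondences."""
    A = []
    for (x, y), (u, v) in zip(marker_corners, image_corners):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The homography is the null vector of A (smallest singular value).
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the projective scale ambiguity

# Corners of a 10 cm square marker in its own coordinate frame (metres).
marker = [(0.0, 0.0), (0.1, 0.0), (0.1, 0.1), (0.0, 0.1)]
# Hypothetical pixel positions of those corners in one camera frame.
image = [(320.0, 240.0), (420.0, 245.0), (415.0, 345.0), (318.0, 340.0)]

H = homography_from_marker(marker, image)
# The recovered homography maps each marker corner to its image position.
p = H @ np.array([0.1, 0.0, 1.0])
print(p[:2] / p[2])  # close to (420, 245)
```

Because the marker's physical side length is known (here 10 cm), the same correspondences also fix metric scale, which is what lets marker-based AR place virtual objects at their true size.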
Determining the location of the user/viewer is usually a prerequisite for correctly augmenting the viewed world with additional virtual information or content. Augmented reality is used effectively in a wide range of domains, including entertainment, education/training, and interaction with virtual objects (see Chang, Koh, & Been-Lirn Duh, 2011a; Olsson & Salo, 2011; de Sa, Churchill, & Isbister, 2011). In all cases, the content needs to be correctly augmented onto the video captured by the camera and displayed to the user, coherent with the user's perspective. For this purpose, a marker is typically a standard, robust means of determining the correct perspective of the user.
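Rendering content "coherent with the user's perspective" amounts to projecting virtual 3D points through the same camera model that produced the video frame. A minimal sketch, assuming a simple pinhole camera with hypothetical intrinsics and a pose recovered by tracking:

```python
import numpy as np

# Hypothetical intrinsics for a 640x480 phone camera
# (focal length 500 px, principal point at the image centre).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

def project(point_world, R, t):
    """Project a 3D world point into pixel coordinates, given the
    camera pose (rotation R, translation t) obtained from tracking."""
    p_cam = R @ point_world + t   # world -> camera coordinates
    p_img = K @ p_cam             # camera -> homogeneous pixels
    return p_img[:2] / p_img[2]   # perspective division

# Identity pose: camera at the origin, looking down +z.
R, t = np.eye(3), np.zeros(3)
# A vertex of a virtual object 2 m ahead and 0.2 m to the right.
uv = project(np.array([0.2, 0.0, 2.0]), R, t)
print(uv)  # -> [370. 240.]
```

If the pose (R, t) is wrong, as with the horizon-only systems above, every projected point lands in the wrong place, which is exactly why precise registration is a prerequisite for convincing augmentation.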
However, AR goes beyond simply augmenting the visual experience of the real world with virtual content. Many AR applications require the user to interact with the virtual content (Chang et al., 2011a; Chang, Koh, & Been-Lirn Duh, 2011b; Gjosaeter, 2009; H. Kato, Billinghurst, Poupyrev, Imamoto, & Tachibana, n.d.; Olsson & Salo, 2011; Papagiannakis, Singh, & Magnenat-thalmann, 2008). In this context, the metaphors used for interaction, the enabling technologies and the display techniques all play a key role in the overall user experience. Interaction techniques in the mobile AR context vary widely, and a full treatment is beyond the scope of this paper (see Kolsch, Bane, Hollerer, & Turk, 2006; Papagiannakis et al., 2008 for an overview of interaction with AR systems and enabling technologies). We shall instead focus on AR with hand-held devices such as mobile phones, and particularly on the modeling and manipulation of 3D objects using AR for virtual furnishing.