A Survey of Mobile Vision Recognition Applications

Andrew Molineux (Lancaster University, UK) and Keith Cheverst (Lancaster University, UK)
DOI: 10.4018/978-1-4666-0954-9.ch014


In recent years, vision recognition applications have made the transition from desktop computers to mobile phones. This has allowed a new range of mobile interactions and applications to be realised. However, this shift has unearthed new issues in mobile hardware, interaction and usability. As such, the authors present a survey of mobile vision recognition, outlining a number of academic and commercial applications and analysing what tasks they are able to perform and how they achieve them. The authors conclude with a discussion of the issues and trends found in the survey.
Chapter Preview


Mobile phones are becoming an ever more ubiquitous resource in modern society. This, paired with the increasing range of mobile resources (e.g., GPS, digital cameras), has allowed different application domains to evolve, one of which is mobile vision recognition. In this chapter we survey a number of academic and commercial applications that use mobile vision recognition, analysing what tasks they perform and how they achieve them, and comparing them according to five criteria: system architecture, target range, restricted domain, interaction style and modification required to the environment. We focus largely on mobile phone-based vision recognition applications, although early PDA- and tablet PC-based systems are also described as they represent key work in this field.

Vision recognition uses statistical methods to disentangle image data, using models constructed with the aid of geometry, physics and learning theory (Forsyth & Ponce, 2002). It also has the ability to extend beyond the visible spectrum of humans, and may include other forms of vision such as infrared (Cao & Balakrishnan, 2006), heat, x-ray and radar. Some examples of vision recognition applications include object recognition, motion detection, optical character recognition (OCR) and barcode scanning. Vision recognition is also often used within augmented reality applications. One example of an augmented reality interaction style that we will be exploring in our survey is the magic lens metaphor. First envisaged in 1993 (Bier, Stone, Pier, Buxton, & DeRose, 1993), the magic lens metaphor was designed to work using small lenses placed over objects. The concept used these lenses to reveal hidden information, enhance data of interest or suppress distracting information. This metaphor was later realised on camera phones (Schöning, Krüger, & Müller, 2006), where the screen and camera acted as a magic lens by augmenting content beneath the phone that was visible to the phone’s camera.
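To make the "statistical methods" framing concrete, one of the simplest object recognition techniques is template matching scored with normalized cross-correlation (NCC). The following is a minimal illustrative sketch, not a method described in the chapter; the images are tiny hand-made grayscale arrays, and real mobile systems use far more robust feature-based approaches.

```python
# Illustrative sketch (assumed example, not from the chapter): object
# recognition by template matching, scored with normalized cross-correlation.
# Images are plain lists of grayscale rows; all pixel data here is made up.

def ncc(patch, template):
    """Normalized cross-correlation between two equal-sized patches."""
    a = [p for row in patch for p in row]
    b = [t for row in template for t in row]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [x - ma for x in a]
    db = [x - mb for x in b]
    num = sum(x * y for x, y in zip(da, db))
    den = (sum(x * x for x in da) * sum(y * y for y in db)) ** 0.5
    return num / den if den else 0.0

def best_match(image, template):
    """Slide the template over the image; return (row, col, score) of best fit."""
    th, tw = len(template), len(template[0])
    best = (0, 0, -1.0)
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best[2]:
                best = (r, c, score)
    return best

image = [
    [10, 10, 10, 10, 10],
    [10, 200, 50, 10, 10],
    [10, 60, 220, 10, 10],
    [10, 10, 10, 10, 10],
]
template = [[200, 50], [60, 220]]
print(best_match(image, template))  # -> (1, 1, 1.0): exact match at row 1, col 1
```

The exhaustive sliding-window search here is exactly the kind of per-pixel arithmetic that made early mobile hardware a poor fit for vision workloads, which motivates the performance discussion below.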

Historically, vision recognition applications were most commonly found on desktop computers due to their high processing overhead. This limited the domain in which applications of this nature could be used to a desktop environment, with some impractical exceptions (Höllerer & Feiner, 2004). More recently, however, vision recognition applications have made the transition onto mobile phones, which is the focus of this chapter. This change can mostly be attributed to increased processor speeds and improved mobile resources. One significant development is the introduction of cameras to mobile phones. It is estimated that there are currently 4.6 billion mobile phones in use worldwide, and of those more than one billion are equipped with cameras (“Camera-phones dotty but dashing,” 2010). The first commercial camera phone, the Sharp J-SH04, was released in 2000 (Sharp, 2010) and contained a 110,000-pixel CMOS sensor. Since then, major strides have been made in the field of camera phones, including xenon/LED flashes, front-facing cameras and vast improvements in image quality; the Nokia N8 (Nokia, 2010), for example, has a 12 megapixel camera with HD video recording. However, mobile phone hardware is far from ideally set up for vision recognition. At the time of writing (2011) there is still a lack of mobile devices with hardware support for floating point mathematics, which greatly hinders the performance of vision recognition algorithms. Recent trends in handset hardware have shown improvement in this area (Yoskowitz, 2010).

Vision recognition allows applications to build a picture of the visual setting surrounding them. When an application or device can read information about the surrounding environment and act upon this information, it is often said to be context-aware. In a typical use case, a mobile phone could be set up to be contextually aware of a number of environmental variables including location, light levels, sound levels and even the phone’s orientation and momentum. By placing an ordinary camera phone within a room, an application has the potential to acquire a range of visual contextual elements such as who is in the room (face recognition), what items are in the room (object recognition) or even the position of the device within the room (using stereo vision techniques). For example, the Nexus S (Google, 2011a) has an ambient light sensor that allows the screen’s backlight to automatically adapt to varying lighting conditions.
