Mobile Applications for Automatic Object Recognition

Danilo Avola (University of Udine, Italy), Gian Luca Foresti (Department of Mathematics and Computer Science, University of Udine, Italy), Claudio Piciarelli (University of Udine, Italy), Marco Vernier (University of Udine, Italy) and Luigi Cinque (Sapienza University, Italy)
Copyright: © 2018 |Pages: 12
DOI: 10.4018/978-1-5225-2255-3.ch538


In recent years, improvements in the computational capacity, embedded sensors, natural interaction and high-speed connectivity of mobile devices have enabled an ever-increasing number of designers to develop advanced mobile applications for everyday use. Among these, vision-based applications for Automatic Object Recognition (AOR) play a key role, since they enable users to interact with the world around them in innovative ways that make their entertainment, learning and working activities more productive and profitable. The chapter is divided into four sections. The first, Background, explores the most recent work on AOR mobile applications, highlighting the feature extraction processes and the implemented classifiers. The second, MV Development Technologies, provides an overview of the current frameworks used to support mobile AOR applications. The third, Future Research Trends, discusses the aims of the next generation of AOR applications. Finally, the Conclusion closes the chapter.
Chapter Preview


In the last decade, mobile devices have undergone continuous technological advances. Today even low-cost devices have hardware features that make them comparable with a wide range of desktop processing units. In particular, the latest generation of mobile devices offers a set of significant improvements, including:

  • Multi-Core Processor (MCP): A single processor that contains several cores. This technology, typical of common processing units (e.g., workstations, servers), allows mobile devices to process large amounts of data rapidly and improves performance when multiple applications are running.

  • Advanced Storage Capacity (ASC): A large amount of internal memory and the possibility of adding external memory (e.g., compact flash, memory stick). This technology allows mobile devices to support both complex data and bulky applications.

  • Mobile Sensing (MS): The set of sensors embedded in a mobile device. This technology equips mobile devices with a wide range of sensors, including: Red-Green-Blue (RGB) sensor (i.e., image camera), Global Positioning System (GPS) receiver (i.e., localization system), accelerometer (i.e., proper acceleration, or “g-force”), gyroscope (i.e., orientation system) and others. These sensors are used to acquire information about both the external world (e.g., images, temperature, pressure) and the user’s behavior (e.g., speed, locations, actions).

  • Natural User Interfaces (NUIs): The set of natural interfaces that support Human-Computer Interaction (HCI) between users and mobile devices. This technology allows users to adopt human-oriented interfaces (e.g., speech recognition, touch-screen interaction) to manage data and devices.

  • Fast Internet Connections (FICs): The set of information and communication technologies (ICTs) that allow devices to access any type of resource, including: World Wide Web (WWW), cloud computing, dedicated networks and others.

Key Terms in this Chapter

Natural User Interface (NUI): The field of computer science that deals with human-oriented interfaces. The term NUI highlights that these interfaces should be invisible to users. NUIs are designed to detect the natural actions of human beings (e.g., hand movements, body poses, body gestures) and use them to interact with any kind of system.

Object Classification (OC): The field of computer science that deals with the classification of objects. The classification is performed by a specific algorithm, called a classifier, which can be implemented using different theoretical principles, including: machine learning, statistical approaches, mathematical approaches, geometrical approaches and exhaustive computations.
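As an illustration of what a classifier does, the following minimal sketch implements a nearest-centroid classifier, one simple geometrical approach among the many mentioned above. It is not a method described in the chapter; the function names, the two-dimensional feature vectors and the labels "cup" and "book" are hypothetical.

```python
import math


def train_centroids(samples):
    """Average the feature vectors of each label.

    samples: list of (feature_vector, label) pairs (hypothetical data layout).
    Returns a dict mapping each label to its centroid.
    """
    grouped = {}
    for features, label in samples:
        grouped.setdefault(label, []).append(features)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def classify(centroids, features):
    """Return the label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Toy training set: labels and values are illustrative only.
training = [
    ([1.0, 1.0], "cup"), ([1.2, 0.8], "cup"),
    ([5.0, 5.0], "book"), ([4.8, 5.2], "book"),
]
centroids = train_centroids(training)
label = classify(centroids, [1.1, 1.1])
print(label)
```

In a real AOR pipeline the feature vectors would come from a feature extraction step rather than being written by hand.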

RGB Camera (RGB-Cam): A camera equipped with a standard CMOS sensor through which color images of persons and objects are acquired. The resolution of still photos is usually expressed in megapixels (e.g., 12MP, 16MP), which give the total number of pixels (i.e., width x height) that compose a photo. Video resolution, in contrast, is usually expressed with descriptive terms such as Full HD (i.e., 1920 x 1080 pixels at 30 frames per second) or Ultra HD (i.e., 3840 x 2160 pixels at 30/60 frames per second).
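The relation between frame dimensions and megapixels is simple arithmetic, sketched below. The helper names are hypothetical, and the uncompressed data rate assumes 3 bytes per RGB pixel, which is an assumption of this example rather than a figure from the chapter.

```python
def megapixels(width: int, height: int) -> float:
    """Pixel count of a width x height frame, in megapixels."""
    return width * height / 1_000_000


def raw_video_rate_mb_per_s(width: int, height: int, fps: int,
                            bytes_per_pixel: int = 3) -> float:
    """Uncompressed data rate in megabytes per second.

    Assumes bytes_per_pixel bytes per pixel (3 for 8-bit RGB).
    """
    return width * height * bytes_per_pixel * fps / 1_000_000


full_hd_mp = megapixels(1920, 1080)    # 2.0736
ultra_hd_mp = megapixels(3840, 2160)   # 8.2944
full_hd_rate = raw_video_rate_mb_per_s(1920, 1080, fps=30)  # 186.624
print(full_hd_mp, ultra_hd_mp, full_hd_rate)
```

A Full HD frame therefore holds roughly 2 megapixels and an Ultra HD frame roughly 8, which is one reason on-device processing of raw video is demanding.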

IR Camera (IR-Cam): A camera equipped with infrared (IR) technology (i.e., an IR projector and an IR sensor) through which depth maps of persons and objects are built. These maps show the distance between each element and the camera, thus making the recognition process simpler than with traditional approaches based on RGB cameras.
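One way to see why depth maps can simplify recognition is that an object can be separated from its background with a plain distance threshold. The sketch below assumes a depth map stored as a grid of distances in millimeters; the function name, the units and the sample values are illustrative assumptions, not details from the chapter.

```python
def segment_by_depth(depth_map, near_mm, far_mm):
    """Binary mask: 1 where near_mm <= depth <= far_mm, otherwise 0."""
    return [
        [1 if near_mm <= depth <= far_mm else 0 for depth in row]
        for row in depth_map
    ]


# Hypothetical 4x4 depth map: a near object (about 0.8 m) in front of a
# wall at 3 m.
depth_map = [
    [3000, 3000, 3000, 3000],
    [3000,  800,  820, 3000],
    [3000,  810,  790, 3000],
    [3000, 3000, 3000, 3000],
]
mask = segment_by_depth(depth_map, near_mm=500, far_mm=1000)
for row in mask:
    print(row)
```

With an RGB image alone, the same separation would generally require color or texture cues that vary with lighting.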

Object Recognition (OR): The field of computer science that deals with the recognition of objects. It consists of a set of algorithms, including: feature extraction, feature matching and object classification. OR applies to both images and video frames.

Feature Matching (FM): The process of comparing two sets of keypoints coming from two different images or video frames. The process compares the descriptor of each keypoint of the first image (or frame) with the descriptor of each keypoint of the second image (or frame). A ranking algorithm then produces a list of the best matches between them. This process is used to check the one-to-one correspondence between the keypoints of two similar or overlapping images (or frames).
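The matching process described above can be sketched as a brute-force comparison followed by a mutual check and a ranking by distance. This is a minimal illustration under stated assumptions: descriptors are plain numeric vectors compared with Euclidean distance, and the function name and sample values are hypothetical.

```python
import math


def match_descriptors(desc_a, desc_b):
    """Brute-force matching with a mutual (cross-check) test.

    desc_a, desc_b: lists of equal-length numeric descriptor vectors.
    Returns (index_a, index_b, distance) tuples, best match first.
    """
    def nearest(query, candidates):
        distances = [math.dist(query, candidate) for candidate in candidates]
        best = min(range(len(candidates)), key=distances.__getitem__)
        return best, distances[best]

    matches = []
    for i, descriptor in enumerate(desc_a):
        j, distance = nearest(descriptor, desc_b)
        # Keep the pair only if i is also the nearest neighbour of j,
        # which enforces a one-to-one correspondence.
        back, _ = nearest(desc_b[j], desc_a)
        if back == i:
            matches.append((i, j, distance))
    # Rank the surviving pairs from most to least similar.
    return sorted(matches, key=lambda match: match[2])


first = [[0.0, 0.0], [1.0, 1.0], [1.5, 1.5]]
second = [[1.1, 0.9], [0.1, 0.0]]
matches = match_descriptors(first, second)
for i, j, distance in matches:
    print(i, j, round(distance, 3))
```

Here the third descriptor of the first set is discarded: its nearest neighbour in the second set is already a better match for another descriptor, so the mutual check fails.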

Human-Computer Interaction (HCI): The field of computer science that deals with the interaction between users and computers. The aim of HCI is to define interactive interfaces through which any system (desktop or mobile) can be operated in accordance with usability principles.

Feature Extraction (FE): The process of extracting and describing salient points from images and videos. Any image, or frame of a video, can be represented by a set of keypoints (i.e., features) whose aim is to highlight and summarize the shapes, objects and properties it contains. These keypoints are first extracted from the image by a detector algorithm and then described by a descriptor algorithm.
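The two-stage structure (detector, then descriptor) can be illustrated with a deliberately simple toy example. The detector and descriptor below are stand-ins chosen for brevity, not the algorithms used in the chapter or in production systems; all names and the sample image are hypothetical.

```python
def detect_keypoints(image, threshold):
    """Toy detector: interior pixels that exceed `threshold` and are strict
    local maxima of their 3x3 neighbourhood."""
    keypoints = []
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            centre = image[r][c]
            neighbours = [
                image[r + dr][c + dc]
                for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            ]
            if centre > threshold and centre > max(neighbours):
                keypoints.append((r, c))
    return keypoints


def describe_keypoint(image, keypoint):
    """Toy descriptor: the 3x3 patch around the keypoint, flattened and
    mean-centred so that a uniform brightness offset cancels out."""
    r, c = keypoint
    patch = [image[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
    mean = sum(patch) / len(patch)
    return [value - mean for value in patch]


# Hypothetical 5x5 grayscale image with two bright spots.
image = [
    [0, 0, 0, 0, 0],
    [0, 9, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 7, 0],
    [0, 0, 0, 0, 0],
]
keypoints = detect_keypoints(image, threshold=5)
descriptors = [describe_keypoint(image, kp) for kp in keypoints]
print(keypoints)
```

Practical detectors and descriptors are far more robust to scale, rotation and noise, but they follow the same division of labour: find where the salient points are, then encode what each one looks like.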

Mobile Vision (MV): The field of computer science that deals with the analysis and understanding of images and videos on mobile devices (e.g., smartphones, tablets).

Global Positioning System (GPS): A radio navigation system that provides accurate positional data for an object equipped with a GPS receiver. The positional data refer to the surface of the Earth and include latitude, longitude and altitude, along with the time. This basic information can be processed by the receiver to derive other dynamic data, such as speed and acceleration. GPS relies on a network of orbiting satellites that continually emit a measurable signal.
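As an example of deriving dynamic data from basic fixes, the sketch below estimates average speed from two timestamped positions using the haversine great-circle distance. It assumes a spherical Earth with a mean radius of 6,371 km and ignores altitude; the function names and the fix layout are hypothetical, and real receivers may compute speed differently.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def average_speed_m_per_s(fix_a, fix_b):
    """Average speed between two fixes.

    Each fix is (latitude_deg, longitude_deg, time_s); altitude is ignored.
    """
    distance = haversine_m(fix_a[0], fix_a[1], fix_b[0], fix_b[1])
    elapsed = fix_b[2] - fix_a[2]
    if elapsed <= 0:
        raise ValueError("the second fix must be later than the first")
    return distance / elapsed


# One degree of latitude covered in one hour.
speed = average_speed_m_per_s((0.0, 0.0, 0.0), (1.0, 0.0, 3600.0))
print(round(speed, 2))
```

Acceleration could be estimated the same way, by differencing successive speed values over time.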
