Improving Live Augmented Reality With Neural Configuration Adaptation

Copyright: © 2024 | Pages: 28
DOI: 10.4018/979-8-3693-0230-9.ch007

Abstract

Instead of relying on remote clouds, today's augmented reality (AR) applications send videos to nearby edge servers for analysis to optimize users' quality of experience (QoE). Many studies have been conducted to help adaptively choose the best video configuration, e.g., resolution and frames per second (fps). However, prior works consider only network bandwidth and ignore the video content itself. In this chapter, the authors design Cuttlefish, a system that generates video configuration decisions using reinforcement learning (RL) based on the network condition as well as the video content. Cuttlefish does not rely on any pre-programmed models or specific assumptions about the environment. Instead, it learns to make configuration decisions solely by observing the resulting performance of its historical decisions. Cuttlefish automatically learns an adaptive configuration policy for diverse AR video streams and obtains a gratifying QoE. The experimental results show that Cuttlefish achieves an 18.4%-25.8% higher QoE than prior designs.

1. Introduction

Augmented reality (AR) is a technology that overlays virtual objects on the real world. With the increasing demand for intelligent mobile devices, AR is becoming more popular among users with diverse requirements. According to Azuma et al. (2001), an AR system should have the following attributes: the ability to combine real and virtual objects in a real environment, to geometrically align virtual objects with real ones in the real world, and to run interactively and in real time. AR technology has been applied to a wide range of fields, including tourism, entertainment, marketing, surgery, logistics, manufacturing, and maintenance (Westerfield et al., 2015; Akçayır & Akçayır, 2017). One report forecast that shipments of AR/VR devices would reach 99 million in 2021 (Virtual Reality and Augmented Reality Device Sales to Hit 99 Million Devices in 2021, 2017), with the market reaching 108 billion dollars by then (The reality of VR/AR growth, 2017). Existing mobile AR systems, such as ARKit, Microsoft HoloLens (Microsoft HoloLens, 2020), and the announced Magic Leap One (Magic Leap One, 2020), facilitate interaction between humans and the virtual world.

With the emergence of mobile edge computing (MEC) (Shi et al., 2016; Satyanarayanan, 2017; Roman et al., 2018), object detection in AR applications has shifted from remote clouds to edge servers, benefiting from reduced latency and increased reliability. In this approach, the AR device encodes and uploads the video to the edge server for detection and rendering, then downloads the processed video. State-of-the-art object detection algorithms, such as YOLO (Redmon et al., 2016; Redmon & Farhadi, 2017; Farhadi & Redmon, 2018), are deployed on the edge; these adopt a single-stage detector strategy that regresses bounding-box coordinates and the corresponding class probabilities in one pass.
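
To make this pipeline concrete, below is a minimal, self-contained Python sketch of the encode-upload-detect loop. The EdgeServer class, its detect method, and the byte-concatenation "encoding" are hypothetical stand-ins for illustration, not the chapter's actual system.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Detection:
    box: Tuple[float, float, float, float]  # (x, y, w, h), normalized to [0, 1]
    label: str
    score: float

class EdgeServer:
    """Hypothetical stand-in for an edge server running a single-stage detector."""
    def detect(self, encoded_slot: bytes) -> List[Detection]:
        # A real server would decode the slot and run, e.g., YOLOv3, which
        # regresses box coordinates and class probabilities in a single pass.
        return [Detection((0.4, 0.4, 0.2, 0.2), "person", 0.93)]

def offload_slot(frames: List[bytes], server: EdgeServer) -> List[Detection]:
    encoded = b"".join(frames)      # stand-in for real video encoding (e.g., H.264)
    return server.detect(encoded)   # upload, detect on the edge, download results

print(offload_slot([b"frame0", b"frame1"], EdgeServer()))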

Current AR systems are not equipped to handle the performance gap caused by several factors. First, network throughput fluctuates over time, causing inconsistent performance. Second, quality of experience (QoE) requirements, such as detection accuracy, detection latency, and video playback fluency, often conflict with one another. Finally, the time-varying moving velocities of target objects pose a challenge. To illustrate the impact of AR video configuration on user QoE, we take fps and resolution selection as an example. We divide the total time of interest into multiple slots of equal length and define fps as the number of frames per slot. Higher-resolution images, which YOLOv3 divides into more grid cells, improve detection accuracy but cause longer transmission delays. Similarly, videos encoded at a high fps yield better fluency but incur larger uploading and detection delays. Encoding videos with an excessive configuration may deteriorate QoE and degrade network status, while assigning a poor configuration underutilizes the network and hurts QoE as well. The moving trends of objects, in terms of velocity and direction, are also unknown in advance, which presents additional challenges: high-speed objects require a high fps to guarantee fluency, but a much lower fps suffices if the objects are almost static. Thus, the video configuration must match both the time-varying network bandwidth and the moving velocities of objects in the videos, as the sketch below illustrates. These challenges are described in greater detail in the next section.
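
The tension among these metrics can be made concrete with a toy per-slot QoE model. Everything below, including the linear delay model, the weights, and the target fps, is an illustrative assumption rather than the chapter's actual formulation.

def upload_delay_s(width: int, height: int, fps: int,
                   bandwidth_bps: float, bits_per_pixel: float = 0.1) -> float:
    # Transmission delay for one slot: encoded bits / current throughput.
    return width * height * fps * bits_per_pixel / bandwidth_bps

def qoe(accuracy: float, delay_s: float, fps: int, target_fps: int = 30,
        w_acc: float = 1.0, w_delay: float = 0.5, w_flu: float = 0.5) -> float:
    fluency = min(fps / target_fps, 1.0)  # higher fps -> smoother playback
    return w_acc * accuracy - w_delay * delay_s + w_flu * fluency

# A 720p/30fps slot vs. a 480p/15fps slot over a 20 Mbps link:
print(qoe(0.90, upload_delay_s(1280, 720, 30, 20e6), 30))
print(qoe(0.80, upload_delay_s(640, 480, 15, 20e6), 15))

Under a wide link the high configuration wins on accuracy and fluency; shrink bandwidth_bps and the delay penalty flips the ranking, which is exactly the adaptation problem the chapter targets.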

We propose a novel approach to adaptive configuration of AR video that does not rely on detailed analytical performance modeling but instead embraces learning-based inference. Our approach is inspired by recent successes of deep reinforcement learning (DRL) (Mnih et al., 2015, 2016; Henderson et al., 2018) in diverse fields such as the game of Go (Silver et al., 2017), video streaming (Mao et al., 2017), and job scheduling (Mao, Schwarzkopf, et al., 2019). To this end, we introduce Cuttlefish, an intelligent encoder that employs a learning-based approach to select the optimal video configuration without relying on any pre-programmed models or specific assumptions.
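
As a rough illustration of learning a policy purely from observed outcomes, the toy tabular sketch below picks a (width, height, fps) configuration each slot and nudges its value estimate toward the measured QoE. Cuttlefish itself uses deep RL over a richer state (network condition plus video content); the discretized state, configuration list, and constants here are all assumptions made for this sketch.

import random
from collections import defaultdict

CONFIGS = [(640, 480, 15), (1280, 720, 30), (1920, 1080, 30)]  # (w, h, fps)

q = defaultdict(float)      # value estimate per (state, action)
alpha, epsilon = 0.1, 0.2   # learning rate, exploration rate

def choose(state):
    # Epsilon-greedy: explore occasionally, otherwise exploit the
    # best-known configuration for this state.
    if random.random() < epsilon:
        return random.randrange(len(CONFIGS))
    return max(range(len(CONFIGS)), key=lambda a: q[(state, a)])

def update(state, action, reward):
    # One-step update toward the QoE observed for this slot.
    q[(state, action)] += alpha * (reward - q[(state, action)])

# Per slot: discretize (bandwidth, object motion) into a state, pick a
# configuration, stream the slot, then feed back the measured QoE.
state = ("bw_high", "motion_low")
a = choose(state)
update(state, a, reward=0.8)  # 0.8 = hypothetical measured QoE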
