Semantic Analysis of Videos for Tags Prediction and Segmentation

Semantic Analysis of Videos for Tags Prediction and Segmentation

Umair Ali Khan (Quaid-e-Awam University of Engineering, Science and Technology, Pakistan)
DOI: 10.4018/978-1-7998-2803-7.ch014
OnDemand PDF Download:
No Current Special Offers


Due to the growing volume of multimedia data generated these days, it has become extremely difficult to manually analyze the data and extract useful information from it. Especially the analysis of videos pertaining to different fields such as surveillance, videos, social media, education, etc. cannot be done efficiently by manual methods. This requires automatic analysis algorithms that can intelligently analyze videos and derive salient information from them. This information can be useful in a number of tasks such as video segmentation, incident detection, anomaly detection, query-based video retrieval, and content censorship. This chapter provides a detailed review of the techniques proposed for video analysis to provide a compact set of video tags. This chapter considers it a joint tag-segmentation problem and critically analyzes the relevant literature to highlight their respective pros and cons. At the end, potential research areas in this domain and suggestions for improvement are discussed.
Chapter Preview

Video Tagging & Segmentation

The problem of video tagging and segmentation has not been addressed in combination, though its applications are ostensible. The existing approaches of video segmentation operate on surficial level and/or are focused on specific videos. As the related work in this domain in the coming sections will show that the approaches dependent on the hand-crafted image features do not suffice to deliver an adequate level of accuracy in this problem, an intelligent algorithm for this semantic analysis is required. At the same time, the problem of video segmentation in the context of video tagging needs to be re-formulated.

Applying the traditional approaches of object detection on the individual video frames will be inefficient as we will end up with low-level information (e.g., the localization of objects in the individual frames instead of the contextual relationship among them). In addition, processing each video frame will also introduce computational inefficiency and result in redundant information. This problem can only be addressed by a machine learning algorithm which can be trained to predict the high-level, representative features of video scenes.

Complete Chapter List

Search this Book: