An Effective Edge and Texture Based Approach towards Curved Videotext Detection and Extraction

P. Sudir (VTU, Belgaum, India) and M. Ravishankar (VVIT, Mysore, India)
Copyright: © 2015 | Pages: 29
DOI: 10.4018/IJSDA.2015070101


Video text often carries significant semantic information and thus greatly aids present-day video indexing and retrieval systems. Video text analysis is challenging due to varying backgrounds, multiple orientations, and low contrast between text and non-text regions. The proposed approach explores a new framework for curved video text detection and recognition. Starting from the observation that curved text regions are well characterized by edge size and uniform texture, probable curved text edges are detected by processing wavelet sub-bands, followed by text localization using the fast texture descriptor LU-transform. Binarization is achieved by the maximal H-transform. A connected-component filtering method, followed by B-spline curve fitting on the centroid of each character, vertically aligns each oriented character. The aligned text string is then recognized by optical character recognition (OCR). Experiments on various curved video frames show that the proposed method is efficacious and robust in detecting and recognizing curved video text.
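The alignment step in the abstract, fitting a smooth curve through per-character centroids and measuring each character's vertical offset from it, can be sketched as follows. This is a minimal illustration, not the paper's implementation: it substitutes a least-squares polynomial fit (numpy.polyfit) for the B-spline fit described in the paper, and the `align_centroids` function name and the sample centroids are hypothetical.

```python
import numpy as np

def align_centroids(centroids):
    """Fit a smooth curve through character centroids and order the
    characters along it. A least-squares polynomial stands in for the
    B-spline fit described in the paper."""
    pts = np.asarray(centroids, dtype=float)
    order = np.argsort(pts[:, 0])          # left-to-right reading order
    xs, ys = pts[order, 0], pts[order, 1]
    coeffs = np.polyfit(xs, ys, deg=min(2, len(xs) - 1))
    fitted_y = np.polyval(coeffs, xs)      # vertical position on the curve
    offsets = ys - fitted_y                # per-character vertical shift
    return order, offsets

# Characters lying on a gentle arc, listed out of reading order
cents = [(0, 0.0), (2, 1.8), (1, 1.0), (3, 2.1)]
order, offsets = align_centroids(cents)
```

Shifting each character by its offset flattens the curved string into a horizontal line that a standard OCR engine can read.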

1. Introduction

Today an enormous amount of visual information [Goldman, 2014; Hopkins, 2014; West, 2014; Ranker, 2014] is generated every second and stored in digital formats such as images and videos. Rapid technical advances in mobile-phone cameras [Chen, 2014; Li, 2014; Ojala, 2014], televisions [Cesar, 2014; Trundle, 2014; Blanchfield, 2014], and the Internet [Hürst, 2015; Ahsan, 2014; Sankaran, 2014] add to this repository of multimedia information. According to statistics, 4 billion videos are viewed per day on YouTube, one of the most popular video portals [Yu, 2015; Cheng, 2008], and about 60 hours of video are uploaded every minute. Such an enormous volume of data creates an urgent need for efficient video indexing and retrieval algorithms [Saravanan, 2015; Souza, 2014; Chen, 2014]. The relevance of video content is better represented by effectively extracting descriptive features. Perceptual features such as color, intensity, object shape, and texture are widely used for image classification and video indexing [Mehmood, 2015; Gupta, 2015]; however, they cannot provide an exact description of the video content. Text, images, speech, and music are the typical contents of a video. Of these, text carries the most significant semantic information, contributing greatly to video content understanding and analysis. With relatively reliable OCR techniques, video text analysis is a promising route to video content processing. However, detecting and recognizing video text is a challenging task due to variations in size, color, font, and alignment. Video text analysis [Nguyen, 2014; Lu, 2014] is therefore a widely researched approach, as video text provides high-level semantic information about the content along with distinctive visual characteristics.
Being highly compact and structured, video text provides valuable indexing information such as scene locations, speaker names, program introductions, sports scores, special announcements, and dates and times. There are two types of text in video: (1) caption/graphics/artificial text [Lu, 2014; Castillo, 2013; Wang, 2012], which is artificially superimposed on the video at editing time, and (2) scene text [Zhu, 2015; Weinman, 2014], which occurs naturally in the camera's field of view during video capture. Text detection and extraction in video must deal with problems peculiar to video, namely low contrast, low resolution, color bleeding, text movement, and blurring.

Several approaches have been proposed in recent years for the automatic extraction of text in digital videos. They fall into three major categories: connected-component (CC) based [Zhang, 2008; Yi, 2007], edge based [Huang, 2014; Kumar, 2012], and texture based [Shivakumara, 2014; Prakash, 2014; Shekar, 2014]. CC-based methods group small components into successively larger ones until all regions are identified in the image. Kim [Kim, 1996] proposed a method in which the image is segmented by color clustering using a color histogram in the RGB space; non-text components, such as long horizontal lines and spurious text-like segments, are then filtered out. Kim [Kim, 2008] proposed a novel static text region detection algorithm for preventing motion-compensation error in frame rate conversion (FRC). Based on the observation that the color of text is spatio-temporally consistent and the orientation of text boundaries is preserved across consecutive frames, the algorithm reliably extracts static text regions.
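The color-clustering idea behind Kim [1996] can be sketched with a coarse RGB histogram: quantize each channel, count bin occupancy, and label pixels by their bin's rank. This is a minimal sketch of the histogram-clustering concept only; the function name, bin width, and synthetic frame below are illustrative assumptions, not the paper's actual parameters.

```python
import numpy as np

def histogram_color_clusters(img, bits=2, k=3):
    """Cluster pixels via a coarse RGB color histogram (sketch of the
    color-clustering step in Kim [1996]). Pixels outside the k dominant
    color bins are labeled -1."""
    q = img >> (8 - bits)                          # coarse channel levels
    codes = (q[..., 0] << (2 * bits)) | (q[..., 1] << bits) | q[..., 2]
    bins, counts = np.unique(codes, return_counts=True)
    top = bins[np.argsort(counts)[::-1][:k]]       # k dominant color bins
    labels = np.full(codes.shape, -1)              # -1 = minor color
    for rank, b in enumerate(top):
        labels[codes == b] = rank
    return labels

# Tiny synthetic frame: a red half and a blue half
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[:, :2] = (250, 10, 10)   # red region
img[:, 2:] = (10, 10, 250)   # blue region
labels = histogram_color_clusters(img, k=2)
```

Each resulting cluster region would then be passed to the text-localization stage, where non-text components are filtered out.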

Jain [Jain, 1998] suggested applying connected-component-based techniques such as bit dropping, color clustering, multivalued image decomposition, and foreground image generation. The 24-bit color image is reduced to a 6-bit image and then quantized by a color clustering algorithm. Each clustered region then undergoes text localization, and the results are merged into one output image. The algorithm fails when the color histogram is sparse.
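The bit-dropping step above, reducing a 24-bit color image to 6 bits, can be illustrated by keeping only the two most significant bits of each RGB channel and packing them into a single 6-bit code per pixel (64 possible colors). This is a plausible reading of the reduction described in Jain [1998], sketched here as an assumption; the function name and sample pixels are hypothetical.

```python
import numpy as np

def drop_to_6bit(img):
    """Bit dropping: keep the top 2 bits of each RGB channel and pack
    them into one 6-bit color code per pixel (values 0..63)."""
    top2 = img >> 6                                   # 2 bits per channel
    return (top2[..., 0] << 4) | (top2[..., 1] << 2) | top2[..., 2]

img = np.array([[[255, 128, 0], [64, 200, 30]]], dtype=np.uint8)
codes = drop_to_6bit(img)
```

The 64-color image is far easier for a subsequent color-clustering algorithm to quantize than the original 16.7-million-color space.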
