
Word-Level Script Identification Using Texture Based Features

Pawan Kumar Singh, Ram Sarkar, Mita Nasipuri

Source Title: International Journal of System Dynamics Applications (IJSDA) 4(2)

DOI: 10.4018/ijsda.2015040105


Abstract

Script identification has been an active research topic in document image analysis over the last few decades. Accurate recognition of the script is essential to many subsequent processing steps, such as automated document sorting, machine translation, and searching for text written in a particular script in a multilingual environment. To process such documents automatically with Optical Character Recognition (OCR) software, the script of each word must be identified before the word is passed to the OCR engine for that script. This paper proposes a robust word-level handwritten script identification technique that uses texture-based features to identify words written in any of seven popular scripts, namely Bangla, Devanagari, Gurumukhi, Malayalam, Oriya, Telugu, and Roman. The texture-based features combine Histograms of Oriented Gradients (HOG) and moment invariants. The technique has been tested on 7000 handwritten words, with each script contributing 1000 words. Based on the identification accuracies and statistical significance testing of seven well-known classifiers, the Multi-Layer Perceptron (MLP) was chosen as the final classifier and then tested comprehensively using different numbers of folds and different epoch sizes. The overall accuracy of the system is 94.7% using a 5-fold cross-validation scheme, a strong result considering the complexities and shape variations of these scripts. This paper is an extended version of the work described in (Singh et al., 2014).
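To make the feature combination concrete, the following is a minimal sketch of a HOG-style descriptor concatenated with moment invariants. It is not the authors' configuration: the single-cell histogram, the 9 orientation bins, the use of only the first two Hu invariants, and the function names `orientation_histogram`, `hu_first_two`, and `feature_vector` are all assumptions made for illustration. Images are plain nested lists of pixel intensities.

```python
import math


def orientation_histogram(img, bins=9):
    """Single-cell, magnitude-weighted histogram of unsigned gradient
    orientations over [0, 180) degrees, L2-normalized.
    A simplification of HOG: no cells, blocks, or bin interpolation."""
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]  # central difference in x
            gy = img[y + 1][x] - img[y - 1][x]  # central difference in y
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            b = min(int(angle / (180.0 / bins)), bins - 1)
            hist[b] += mag
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


def hu_first_two(img):
    """First two Hu moment invariants, computed from normalized
    central moments of order two (pixel intensity acts as mass)."""
    m00 = sum(sum(row) for row in img)
    if m00 == 0:
        return (0.0, 0.0)
    cx = sum(x * v for row in img for x, v in enumerate(row)) / m00
    cy = sum(y * v for y, row in enumerate(img) for v in row) / m00
    mu20 = mu02 = mu11 = 0.0
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            dx, dy = x - cx, y - cy
            mu20 += dx * dx * v
            mu02 += dy * dy * v
            mu11 += dx * dy * v
    # eta_pq = mu_pq / m00 ** (1 + (p + q) / 2); the exponent is 2 when p + q = 2
    eta20, eta02, eta11 = mu20 / m00 ** 2, mu02 / m00 ** 2, mu11 / m00 ** 2
    phi1 = eta20 + eta02
    phi2 = (eta20 - eta02) ** 2 + 4 * eta11 ** 2
    return (phi1, phi2)


def feature_vector(img):
    """Concatenate the orientation histogram with the moment invariants."""
    return orientation_histogram(img) + list(hu_first_two(img))
```

A vector built this way could be passed to any classifier, such as the MLP mentioned above. A practical system would more likely rely on a tested image-processing library than on hand-written loops.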


1. Introduction

Optical Character Recognition (OCR) is the conversion of scanned images of handwritten, typewritten, or printed text into a machine-encoded format. In a multilingual and multi-script world, OCR systems need to be capable of recognizing characters irrespective of the script in which they are written. In general, recognizing characters written in different scripts with a single OCR module is next to impossible. This is because the features needed for character recognition depend on the structural properties, style, and nature of writing, which vary from one script to another. For example, features used for recognizing Roman script might not be useful for other scripts. One possible solution is a bank of OCRs, with a different OCR for each script. The text in an input document can then be recognized reliably by selecting the appropriate OCR system from this bank. However, this requires a priori knowledge of the script in which the document is written, and manual identification of the documents' scripts may be monotonous and time consuming (P. K. Singh et al., 2014).
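The bank-of-OCRs idea can be sketched as a lookup from an identified script to a script-specific recognizer. Everything in this sketch is hypothetical: the stub engines, the `toy_identifier` function, and the word identifiers stand in for real OCR systems, a trained script classifier, and word images. The sketch only illustrates the control flow.

```python
from typing import Callable, Dict, List

# A word is represented by a string identifier here; a real system
# would pass a word image instead (assumption for illustration).
Word = str
OcrEngine = Callable[[Word], str]


def make_stub_ocr(script: str) -> OcrEngine:
    """Build a placeholder recognizer that only labels its input."""
    def recognize(word: Word) -> str:
        return f"[{script} OCR] {word}"
    return recognize


# Hypothetical bank of script-specific OCR engines.
OCR_BANK: Dict[str, OcrEngine] = {
    script: make_stub_ocr(script) for script in ("Bangla", "Devanagari", "Roman")
}


def toy_identifier(word: Word) -> str:
    """Stand-in for a script classifier: reads a two-letter prefix."""
    prefixes = {"bn": "Bangla", "dv": "Devanagari", "ro": "Roman"}
    return prefixes[word[:2]]


def recognize_words(
    words: List[Word],
    identify_script: Callable[[Word], str],
    ocr_bank: Dict[str, OcrEngine],
) -> List[str]:
    """Identify the script of each word, then route it to the matching OCR."""
    results = []
    for word in words:
        script = identify_script(word)
        engine = ocr_bank.get(script)
        if engine is None:
            raise KeyError(f"no OCR engine registered for script {script!r}")
        results.append(engine(word))
    return results
```

Calling `recognize_words(["bn_001", "ro_002"], toy_identifier, OCR_BANK)` routes the first word to the Bangla stub and the second to the Roman stub. Replacing `toy_identifier` with an automatic classifier is what removes the manual step.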

India is a multilingual country where people living in different regions use different languages and scripts. In addition, Roman script is frequently used alongside the Indic scripts in daily life and regular work. Therefore, in this multilingual environment, separating or identifying the different scripts is of utmost importance for developing a successful OCR system for any script. Without automatic script identification, document processing systems that rely on OCR would need human intervention to select the appropriate OCR package, which is inefficient, undesirable, and impractical. It is difficult to feed a document to an OCR system unless the script or language of its text has been determined in advance, since a single OCR cannot recognize multiple scripts. The solution to this problem is to develop an automatic script identification system. Script identification facilitates many important applications, such as sorting document images, selecting the appropriate script-specific text understanding system, and searching online archives of document images for a particular script.

The large variations in handwriting styles make recognizing handwritten text inherently difficult. Because of these varied writing styles, resemblances among different scripts are more likely in handwritten documents than in printed ones. Cultural and individual differences, and even differences in the way people write at different times, enlarge the inventory of possible word shapes seen in handwritten documents. Problems typically encountered in preprocessing, such as ruling lines, word fragmentation, noise, and skew, are also common in handwritten documents. Since the script in a multilingual document may vary from word to word, but not from character to character, identifying scripts at word level is preferable to identifying them at character or line level. However, script identification at word level is much more challenging than at text-line or page level, because the information gathered from the few characters present in a single word may not be adequate for recognizing the script.

The rest of the paper is organized as follows. Section 2 gives a brief survey of work related to script identification, and Section 3 presents some basic information about the scripts used in the present work. Section 4 presents the proposed technique using texture-based features, and Section 5 describes the selection of the well-known classifiers used in the present work. Section 6 gives the experimental results and discussion. Section 7 concludes the work and lists future directions.
