Tifinaghe Document Converter

Mehdi Boutaounte, Driss Naji, M. Fakir, B. Bouikhalene, A. Merbouha

Source Title: International Journal of Computer Vision and Image Processing (IJCVIP) 3(3)

DOI: 10.4018/ijcvip.2013070104

Abstract

Document recognition has become a basic necessity for two reasons: first, to preserve data that exist only on paper, which has a limited lifetime and is easily destroyed by insects, fire, or humidity; and second, to reduce the space taken up by archives. The aim of this work is to build a converter that detects the images and text within a document image acquired by a scanner and applies an optical character recognition (OCR) system to the text, in order to obtain a web page (HTML file) ready to be used on the same computer or on a web host where it is accessible to everyone.

1. Introduction

The problem of building a converter from a document image to a document can be divided into two parts. The first is optical character recognition, here specifically for Tifinagh characters, for which existing work uses neural networks (R. El Ayachi et al., 2011) or other methods such as the horizontal and vertical centerlines of the character (Y. Es Saady et al., 2011). The second is document layout analysis, that is, recovering the physical structure of the page (K. Hadjar et al., 2004); methods in the literature fall into two categories, top-down methods and bottom-up methods (S. N. Srihari et al., 1986).
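To make the top-down category concrete, the sketch below implements a recursive XY-cut, a classic top-down method that repeatedly splits the page at empty rows or columns of its projection profiles. It is an illustration under stated assumptions, not necessarily the layout analysis used in this work: the binary image is a list of rows with 1 for ink and 0 for background, boxes are (top, left, bottom, right) with exclusive bottom and right, and the gap parameter `min_gap` and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def _runs(profile: List[int], min_gap: int) -> List[Tuple[int, int]]:
    """Group the non-empty entries of a projection profile into [start, end)
    runs; a run is closed when at least `min_gap` empty entries follow it."""
    ink = [i for i, v in enumerate(profile) if v > 0]
    if not ink:
        return []
    runs = []
    start = prev = ink[0]
    for i in ink[1:]:
        if i - prev - 1 >= min_gap:  # gap wide enough: close the current run
            runs.append((start, prev + 1))
            start = i
        prev = i
    runs.append((start, prev + 1))
    return runs


def xy_cut(img: List[List[int]], box: Optional[Box] = None,
           min_gap: int = 1, axis: int = 0) -> List[Box]:
    """Recursive XY-cut on a binary image (1 = ink). axis 0 cuts between
    rows, axis 1 between columns; the preferred direction alternates."""
    if box is None:
        box = (0, 0, len(img), len(img[0]) if img else 0)
    top, left, bottom, right = box
    # Horizontal projection (ink per row) and vertical projection (ink per column)
    row_runs = _runs([sum(img[r][left:right]) for r in range(top, bottom)], min_gap)
    col_runs = _runs([sum(img[r][c] for r in range(top, bottom))
                      for c in range(left, right)], min_gap)
    if not row_runs:
        return []  # blank region: nothing to keep
    # Try the preferred axis first, then the other one
    for ax in (axis, 1 - axis):
        runs = row_runs if ax == 0 else col_runs
        if len(runs) > 1:
            blocks: List[Box] = []
            for s, e in runs:
                sub = ((top + s, left, top + e, right) if ax == 0
                       else (top, left + s, bottom, left + e))
                blocks.extend(xy_cut(img, sub, min_gap, 1 - ax))
            return blocks
    # No cut possible in either direction: the tight ink bounding box is a leaf block
    return [(top + row_runs[0][0], left + col_runs[0][0],
             top + row_runs[-1][1], left + col_runs[-1][1])]
```

For example, a 4×5 image with ink in its top-left 2×2 corner and in the last two columns of its last row splits into the blocks (0, 0, 2, 2) and (3, 3, 4, 5). A bottom-up method would instead start from pixels or connected components and merge them into progressively larger blocks.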

Most work is devoted to converting image files to DOC or PDF, which makes it hard to preserve the original structure of the document (the positions of the images and text blocks within it). Among the works that target conversion to an HTML page, the majority do not support pictures, and those that do handle images cannot recognize Tifinagh letters.

The first type of converter transforms a document image into an HTML page, but it handles only text blocks, not non-text blocks, and does not take Tifinagh characters into account. The second type of conversion software either embeds the whole image directly in an HTML page or generates a sequence of characters in different colors that looks like the original image. The last type of converter is not free of charge; these tools give good results in preserving both the content and the structure of documents, but they do not support Tifinagh characters either.

To keep up with the evolution of technology and to build intelligent systems that meet our needs, this paper describes a system that converts a document image acquired with a scanner into an HTML page ready to be used on a website. Figure 1 illustrates the flow chart of the Tifinagh document converter developed here. It starts by preprocessing the acquired image, then segments the image and saves the coordinates of each area. These coordinates are used after the stage that classifies the areas into text and non-text and applies an OCR system to the text regions, in order to create the structure of the page.

Figure 1.

Flow chart of the conversion system
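The last stage of this pipeline can be sketched as follows, assuming each region has already been classified as text or non-text and that text regions carry their OCR output as Unicode Tifinagh. The `Region` type, the `to_html` function, and the image file names are hypothetical; the sketch simply places one absolutely positioned element per region at the pixel coordinates saved during segmentation, which is one possible way to keep the original positions of text and image blocks.

```python
from dataclasses import dataclass
from html import escape
from typing import List, Optional


@dataclass
class Region:
    top: int
    left: int
    bottom: int
    right: int
    kind: str                  # "text" or "image", from the text/non-text classifier
    text: str = ""             # OCR output for text regions (assumed Unicode Tifinagh)
    src: Optional[str] = None  # hypothetical file name of the cropped image


def to_html(regions: List[Region], page_width: int, page_height: int) -> str:
    """Emit one absolutely positioned element per region so that the page
    keeps the block positions measured on the scanned image (in pixels)."""
    parts = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8"><title>Converted page</title></head>',
        f'<body><div style="position:relative;width:{page_width}px;height:{page_height}px">',
    ]
    for r in regions:
        style = (f"position:absolute;top:{r.top}px;left:{r.left}px;"
                 f"width:{r.right - r.left}px;height:{r.bottom - r.top}px")
        if r.kind == "text":
            # Escape OCR output so stray characters cannot break the markup
            parts.append(f'<div style="{style}">{escape(r.text)}</div>')
        elif r.src is not None:
            # Non-text regions are linked as images; regions without a file are skipped
            parts.append(f'<img style="{style}" src="{escape(r.src)}" alt="">')
    parts.append("</div></body></html>")
    return "\n".join(parts)
```

For instance, `to_html([Region(10, 20, 40, 300, "text", text="ⴰⵣⵓⵍ"), Region(60, 20, 200, 150, "image", src="fig1.png")], 600, 800)` yields a page with the Tifinagh text at (20, 10) and the image below it. Absolute positioning reproduces the scanned layout at its original pixel scale; a flowing layout would adapt better to screen size but would not preserve where each block sat on the page.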

This paper is organized as follows. The first section describes the method used to analyze the physical structure of the document in order to extract homogeneous components from the original image (text, titles, images, etc.), which are used in the next section. The second section classifies these components into text and non-text (images, graphics, etc.); the text then undergoes further processing, namely segmentation and character recognition using a neural network. The last section is devoted to generating the HTML page code.

2. Preprocessing

The acquired image is always affected by artifacts such as noise and skew (tilt). The preprocessing applied in this study is described as follows:

Binarization is an operation that separates the pixels into two classes, black and white. The method selected is Otsu's method (N. Thi Oanh et al., 2004), which computes a threshold automatically from the image histogram, as given by Equation (1).
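As a sketch of the standard Otsu criterion (whose notation may differ from Equation (1)), the code below builds the grey-level histogram and picks the threshold t that maximizes the between-class variance ω0·ω1·(μ0 − μ1)², where ω0, ω1 are the weights and μ0, μ1 the mean grey levels of the classes at or below t and above t. It assumes 8-bit grey levels (0 to 255) and text darker than the background; `otsu_threshold` and `binarize` are hypothetical names.

```python
from typing import Iterable, List


def otsu_threshold(pixels: Iterable[int], levels: int = 256) -> int:
    """Return the grey level t maximizing the between-class variance when
    pixels <= t form class 0 and pixels > t form class 1."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = 0     # number of pixels in class 0 (levels 0..t)
    sum0 = 0   # sum of grey levels in class 0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty: no valid split at this level
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Counts instead of probabilities scale the variance by total**2,
        # which does not change where the maximum is
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image: List[List[int]]) -> List[List[int]]:
    """Map a grey-level image to 1 (black ink) and 0 (white background)."""
    t = otsu_threshold(p for row in image for p in row)
    return [[1 if p <= t else 0 for p in row] for row in image]
```

For instance, for grey levels 0, 1, 2, 250, 251, 252 the threshold is 2, which separates the three dark pixels from the three bright ones.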