Four-Layer Grapheme Model for Computational Paleography

Raymond E.I. Pardede, Loránd L. Tóth, György A. Jeney, Ferenc Kovács, Gábor Hosszú

Source Title: Journal of Information Technology Research (JITR) 9(4)

DOI: 10.4018/JITR.2016100105

Abstract

This article proposes a novel mathematical model of the logical relationships among glyphs belonging to the same grapheme. The research belongs to computational paleography, a field of applied computer science. The proposed grapheme model is presented in four logical layers, from the bottom up: the Topology, Visual Identity, Phonetic, and Semantic Layers. In the Topology Layer, a unique glyph is defined by a set of topological properties. To describe the logical relations among various glyphs, their topological properties must be examined in a higher-layer framework, the so-called Visual Identity Layer. In that layer, the glyphs of a single grapheme share some topological attributes. These common topological attributes form the main identity of the grapheme, called the Common Identity template, which is obtained by a supervised learning method. The Phonetic Layer gives the sound values associated with the grapheme, and the Semantic Layer describes the usage of the grapheme in texts. Some potential implementations of the grapheme model are also presented.
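
The four layers can be pictured as fields of one grapheme record. The following Python sketch is only an illustration under our own assumptions: a glyph is reduced to a set of named topological properties (names such as `closed_loop` are hypothetical), and the Common Identity template is learned from labeled glyphs by keeping the properties that all of them share, in the manner of a Find-S style conjunctive learner. The article states only that a supervised learning method is used; this particular learner is a stand-in, not the authors' method.

```python
from dataclasses import dataclass, field

# Topology Layer (assumed representation): a glyph is a set of named topological
# properties. Property names such as "closed_loop" are hypothetical.
Glyph = frozenset[str]


def learn_common_identity(glyphs: list[Glyph]) -> Glyph:
    """Visual Identity Layer (stand-in learner): keep only the properties present in
    every labeled glyph of one grapheme, as a Find-S style conjunctive learner does."""
    if not glyphs:
        raise ValueError("at least one labeled glyph is required")
    template = set(glyphs[0])
    for glyph in glyphs[1:]:
        template &= glyph  # drop properties this glyph does not have
    return frozenset(template)


@dataclass
class Grapheme:
    name: str
    glyphs: list[Glyph]                                      # Topology Layer
    sound_values: list[str] = field(default_factory=list)    # Phonetic Layer
    usages: list[str] = field(default_factory=list)          # Semantic Layer

    @property
    def common_identity(self) -> Glyph:                      # Visual Identity Layer
        return learn_common_identity(self.glyphs)

    def matches(self, glyph: Glyph) -> bool:
        """Accept a glyph if it has every property of the Common Identity template."""
        return self.common_identity <= glyph


if __name__ == "__main__":
    a = Grapheme(
        name="a",
        glyphs=[
            frozenset({"closed_loop", "right_stem", "top_hook"}),
            frozenset({"closed_loop", "right_stem"}),
        ],
        sound_values=["/a/"],
    )
    print(sorted(a.common_identity))                                    # ['closed_loop', 'right_stem']
    print(a.matches(frozenset({"closed_loop", "right_stem", "tail"})))  # True
```

A pure intersection is the simplest possible template learner; it is brittle when a labeled glyph is damaged or incomplete, which is one reason a real system would likely use a more tolerant supervised method.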

Introduction

Natural Language Processing (NLP), a significant research field in human-computer interface development, deals with processing input given in oral or written form (Kovács 2012). To handle such input, various grapheme processing methods have been developed, e.g., Optical Character Recognition (OCR) and Automatic Speech Recognition (ASR). These approaches require in-depth analysis of writing systems, especially of graphemes. Our research focuses on the general modeling of graphemes.

A writing system, or script, can be associated with different orthographies. For example, the Latin script has several associated orthographies, such as the French, German, Indonesian, English, and Hungarian orthographies. In a writing system, a grapheme is defined as the smallest semantically distinguishing fundamental unit, that is, a minimally distinctive unit of the writing system. Graphemes may take the form of alphabetic letters, ligatures, numerical digits, or punctuation marks. A glyph generally refers to a unique shape (an image) that represents a single grapheme and carries topological information about the shape of that grapheme. In several cases, however, different glyphs represent exactly the same abstract grapheme (Hosszú 2014). A character, on the other hand, refers to the encoded representation of a grapheme. Note that the terms grapheme and character are not used consistently in the scientific literature.
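
The many-to-one relation between glyphs and graphemes, and the separate notion of an encoded character, can be illustrated with a small lookup table. In this sketch the glyph identifiers (e.g. `a_single_storey`) are hypothetical, and Unicode code points are assumed as the encoding of a character.

```python
# Hypothetical glyph identifiers: several glyphs realize the same abstract grapheme.
GLYPH_TO_GRAPHEME = {
    "a_double_storey": "a",
    "a_single_storey": "a",
    "g_double_storey": "g",
    "g_single_storey": "g",
}


def grapheme_of(glyph_id: str) -> str:
    """Abstract a glyph to the grapheme it represents."""
    return GLYPH_TO_GRAPHEME[glyph_id]


def glyphs_of(grapheme: str) -> list[str]:
    """List all known glyphs of a grapheme (the many-to-one relation, inverted)."""
    return sorted(g for g, gr in GLYPH_TO_GRAPHEME.items() if gr == grapheme)


def character_of(grapheme: str) -> str:
    """Encoded form of a grapheme. Assumes Unicode and a grapheme that maps to
    exactly one code point (not true for every ligature)."""
    return f"U+{ord(grapheme):04X}"


if __name__ == "__main__":
    print(glyphs_of("a"))                                # ['a_double_storey', 'a_single_storey']
    print(character_of(grapheme_of("a_single_storey")))  # U+0061
```

Both glyphs of `a` abstract to the same grapheme and therefore to the same encoded character; the shape difference exists only at the glyph level.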

Historical script relics bear one or more inscriptions. An inscription is composed of symbols, which are the smallest individual units of an inscription from a visual perspective. Typically, a symbol is the materialization of a certain grapheme; in other words, the grapheme is the abstraction of a symbol and, conversely, a symbol is the realization of a glyph of a certain grapheme. Note that Kohrt (1986) and August (1986) use the term graph in essentially the same sense as we use the term symbol.
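
The chain symbol, glyph, grapheme suggests a simple transcription procedure. As a hedged sketch, assume each symbol on an inscription and each reference glyph is described by a set of topological properties; a symbol is then taken as the realization of the most similar reference glyph (Jaccard similarity here, which is our choice, not the article's) and abstracted to that glyph's grapheme. The reference table and its property names are hypothetical.

```python
# Hypothetical reference glyphs: glyph id -> (grapheme, topological properties).
REFERENCE_GLYPHS = {
    "A1": ("A", frozenset({"apex", "crossbar", "two_legs"})),
    "A2": ("A", frozenset({"apex", "two_legs"})),
    "O1": ("O", frozenset({"closed_loop"})),
}


def jaccard(x: frozenset[str], y: frozenset[str]) -> float:
    """Size of the intersection divided by the size of the union (1.0 for two empty sets)."""
    union = x | y
    return len(x & y) / len(union) if union else 1.0


def realize(symbol: frozenset[str]) -> tuple[str, str]:
    """Return (glyph id, grapheme) of the reference glyph most similar to the symbol.
    Ties go to the glyph listed first in REFERENCE_GLYPHS."""
    glyph_id = max(REFERENCE_GLYPHS, key=lambda gid: jaccard(symbol, REFERENCE_GLYPHS[gid][1]))
    return glyph_id, REFERENCE_GLYPHS[glyph_id][0]


def transcribe(inscription: list[frozenset[str]]) -> str:
    """Abstract every symbol of an inscription to a grapheme and join the results."""
    return "".join(realize(symbol)[1] for symbol in inscription)


if __name__ == "__main__":
    inscription = [
        frozenset({"apex", "two_legs", "crossbar"}),
        frozenset({"closed_loop", "notch"}),  # a worn or variant symbol
    ]
    print(transcribe(inscription))  # AO
```

The second symbol carries an extra property absent from every reference glyph, yet it is still abstracted to `O`; a nearest-match rule tolerates such variation at the cost of always producing some answer, even for genuinely unknown symbols.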

The study of the glyphs of a particular script is a special and challenging subject for pattern recognition. It includes, but is not limited to, deciphering encrypted glyphs discovered through excavation and recognizing patterns in glyph transformations. Effective software can help researchers shorten research time and obtain more accurate results through automated processing. Building such software requires a solid mathematical model. Therefore, our main objective is to develop such a descriptive mathematical model as a framework for building tools that support the deciphering of historical or hard-to-read inscriptions. Note that an appropriate software-based solution for analyzing historical inscriptions needs normalized data models, powerful parallel-processing databases, and parallel-computing approaches (Willson 2011).

In this article, glyph relations and glyph identification are modeled with a layer-based approach that, from the bottom up, consists of the Topology Layer, the Visual Identity Layer, the Phonetic Layer, and the Semantic Layer. The relations among the topological, phonetic, and semantic components of our four-layer grapheme model are especially significant in ASR, where the appropriate graphemes or words must be selected among homonyms. Hidden Markov Models (HMMs) are typically employed for this problem (Segi et al. 2014). Kovács (2012) developed a morpheme analyzer for an NLP engine. The Phonetic Layer of the grapheme model is important, e.g., in script deciphering (Hosszú 2014) and in ASR. In such systems, the pronunciation (phonetic) dictionary is a significant component that needs an appropriate phonetic model of the graphemes, which are the elementary units of any written content (Ali et al. 2009). The Semantic Layer of the grapheme model relates to several semantics-based research works. One of them concerns the karaka relations in Hindi, which represent syntactico-semantic (or semantico-syntactic) relationships between various elements of Hindi sentences. The computational identification of the sense of a word in a given context, called Word Sense Disambiguation (WSD), is part of natural language processing; it has been investigated, and two supervised WSD algorithms were developed (Singh & Siddiqui 2015). The principles of the four-layer grapheme model and examples of its implementation are described in the following sections.
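
Selecting among homonyms with an HMM is commonly done by Viterbi decoding, which finds the most probable sequence of hidden states for a sequence of observations. The sketch below is a generic Viterbi decoder, not the model of Segi et al. (2014): hidden states are words, observations are pronunciations, and all probabilities (and the rough phonetic spellings `tu` and `rait`) are invented toy values. It multiplies raw probabilities for readability; practical decoders work with log-probabilities to avoid numeric underflow.

```python
# Generic Viterbi decoding for a toy HMM (hypothetical words and probabilities).
STATES = ["to", "two", "write", "right"]
START_P = {s: 0.25 for s in STATES}          # uniform start distribution (assumed)
EMIT_P = {                                   # P(pronunciation | word), toy values
    "to": {"tu": 1.0},
    "two": {"tu": 1.0},
    "write": {"rait": 1.0},
    "right": {"rait": 1.0},
}
TRANS_P = {                                  # P(next word | word), toy bigram values
    "to": {"write": 0.5, "right": 0.1},
    "two": {"write": 0.01, "right": 0.08},
    "write": {},
    "right": {},
}


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable hidden state sequence for obs and its probability."""
    # best[t][s] = (probability of the best path ending in state s at step t, predecessor)
    best = [{s: (start_p.get(s, 0.0) * emit_p[s].get(obs[0], 0.0), None) for s in states}]
    for t in range(1, len(obs)):
        column = {}
        for s in states:
            e = emit_p[s].get(obs[t], 0.0)
            # Choose the predecessor that maximizes the path probability into s.
            prev = max(states, key=lambda p: best[t - 1][p][0] * trans_p[p].get(s, 0.0))
            column[s] = (best[t - 1][prev][0] * trans_p[prev].get(s, 0.0) * e, prev)
        best.append(column)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


if __name__ == "__main__":
    print(viterbi(["tu", "rait"], STATES, START_P, TRANS_P, EMIT_P))  # (['to', 'write'], 0.125)
```

With these toy numbers the decoder picks `to write` over the homophone alternatives because the transition from `to` to `write` has the highest probability; the emission probabilities alone cannot separate the homonyms.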
