Novel Algorithmic Approach to Deciphering Rovash Inscriptions


Loránd Lehel Tóth, Raymond Pardede, Gábor Hosszú
Copyright: © 2015 | Pages: 12
DOI: 10.4018/978-1-4666-5888-2.ch711

Abstract

The article presents a method for deciphering Rovash inscriptions made by the Szekelys in the 15th-18th centuries. Deciphering is difficult because a large portion of the Rovash inscriptions contain incomplete words, calligraphic glyphs, or grapheme errors. Based on the topological parameters of the undeciphered symbols registered in the database, the novel algorithm estimates the meaning of an inscription from the matching accuracies of the recognized graphemes and assigns a statistical probability to each decipherment. The algorithm was implemented in software that also contains a built-in dictionary, which allows the method to take context into account when identifying the meaning of an inscription. As a result, the algorithm offers one or more candidate words, each with a different probability value, from which the user can select the relevant one. The article also presents experimental results that demonstrate the efficiency of the method.
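The abstract describes three steps: matching each unknown symbol against normalized glyphs by topological parameters, using a dictionary as context, and returning candidate words with probabilities. The chapter preview does not give SID's actual topological parameters, its matching-accuracy formula, or how it combines accuracies with the dictionary, so the sketch below is hypothetical throughout. It assumes each glyph is a tuple of integer features (for example counts of endpoints, junctions, loops and strokes), takes the matching accuracy to be the fraction of features that agree, scores a candidate word as the product of its symbols' accuracies, keeps only dictionary words, and normalizes the surviving scores into probabilities. The names `NormalizedGlyph`, `matching_accuracy`, `top_matches` and `decipher`, and all data values, are invented for illustration.

```python
from dataclasses import dataclass
from itertools import product
from math import prod

# Hypothetical topological parameters of a glyph, e.g. counts of
# (endpoints, junctions, closed loops, strokes). The chapter preview does
# not list SID's actual parameter set; these are placeholders.
TopoParams = tuple[int, ...]


@dataclass(frozen=True)
class NormalizedGlyph:
    grapheme: str       # grapheme name
    sound: str          # simplified sound value (a Latin letter, not IPA)
    params: TopoParams


def matching_accuracy(a: TopoParams, b: TopoParams) -> float:
    """Assumed metric: fraction of topological parameters that agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def top_matches(unknown: TopoParams, glyphs: list[NormalizedGlyph], k: int):
    """The k normalized glyphs that best match one unknown symbol."""
    scored = [(g, matching_accuracy(unknown, g.params)) for g in glyphs]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stable: ties keep list order
    return [(g, acc) for g, acc in scored[:k] if acc > 0]


def decipher(inscription: list[TopoParams], glyphs: list[NormalizedGlyph],
             dictionary: set[str], k: int = 2) -> list[tuple[str, float]]:
    """Candidate dictionary words with normalized probabilities, best first."""
    per_symbol = [top_matches(sym, glyphs, k) for sym in inscription]
    scores: dict[str, float] = {}
    for combo in product(*per_symbol):
        word = "".join(g.sound for g, _ in combo)
        if word in dictionary:  # context check: keep only known words
            score = prod(acc for _, acc in combo)  # assumed independence of symbols
            scores[word] = max(scores.get(word, 0.0), score)
    total = sum(scores.values())
    if total == 0:
        return []
    ranked = [(word, s / total) for word, s in scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Toy data (invented, not real Rovash glyph measurements).
GLYPHS = [
    NormalizedGlyph("A", "a", (2, 1, 0, 3)),
    NormalizedGlyph("B", "b", (2, 2, 1, 3)),
    NormalizedGlyph("E", "e", (4, 1, 0, 3)),
    NormalizedGlyph("T", "t", (2, 0, 0, 2)),
]
DICTIONARY = {"bat", "bet", "tab"}
# Three damaged symbols, listed in reading order.
INSCRIPTION = [(2, 2, 1, 2), (2, 1, 1, 3), (2, 0, 0, 2)]

if __name__ == "__main__":
    # Prints one line each: bat: 0.545, bet: 0.364, tab: 0.091
    for word, p in decipher(INSCRIPTION, GLYPHS, DICTIONARY, k=3):
        print(f"{word}: {p:.3f}")
```

In this toy run the damaged second symbol matches both A and E partially, so the sketch returns several words ranked by probability rather than a single answer, mirroring the abstract's description of letting the user choose among candidates. Enumerating all combinations with `itertools.product` is exponential in inscription length; it is used here only for clarity.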
Chapter Preview

Introduction

In the Middle Ages, the Szekely-Hungarian Rovash script was carved into stone and wood, written on paper, and inscribed on the walls of buildings, where it was exposed to the weather for centuries. As a result, paleographers often have difficulty interpreting the text because of incomplete graphemes or grapheme errors. The Rovash (pronounced "rove-ash"; also spelled Rovas) is a script family that was used by peoples in the Carpathian Basin and the Eurasian Steppe. One member of this script family is the Szekely-Hungarian Rovash, which was used and gradually developed by the Szekelys in Szekelyland (present-day Romania) (Hosszú, 2013).

The article introduces the methods used in computerized paleography for identifying unknown inscriptions. Script identification differs from Optical Character Recognition (OCR), because in OCR the normalized glyph of each grapheme belonging to a given script is known (Szűcs, 2009). The task of OCR is therefore to convert the signs in an inscription into the well-known normalized glyphs of an alphabet; in other words, OCR focuses on automatic grapheme extraction from an inscription. In contrast, in computerized paleography, and more specifically in script identification, the main problem is the correct interpretation of the signs in an inscription. In some cases, the shapes of the signs can easily be copied onto a sheet of paper, yet the inscription remains undeciphered. The possible reasons are: (1) the script used to make the inscription is unknown; (2) the script is known, but its normalized glyphs (which can be specific to a certain period and area) are unknown; or (3) the language of the inscription is unidentified. Script identification focuses on these three problems. Naturally, in several practical cases OCR and script identification can overlap.

After briefly presenting script identification methods, the article describes a novel script identification algorithm and its implementation, the SID software. The algorithm is general-purpose; however, it was applied exclusively to the Szekely-Hungarian Rovash script (Hosszú, 2011). The results obtained with SID are also presented. Finally, the conclusions summarize the new method and the experimental results.

Key Terms in this Chapter

Standard Dictionary: Hungarian words stored together with their pronunciation, which is represented with the graphemes of the International Phonetic Alphabet (IPA). The database also records the period during which each word was used in the Hungarian language.

Grapheme: An abstraction with several properties, including its glyphs (the shapes used for the grapheme), sound values, a grapheme name, and the period of its use. A grapheme belongs to a certain script. One grapheme can have more than one glyph; one of these glyphs is taken as the normalized glyph, and it is typically the one presented in alphabet listings.

Normalized Glyph Storage: The Topological Parameter values of the normalized glyph of each Szekely-Hungarian Rovash grapheme. It also contains the names of the graphemes.

Script: A system for representing human language(s). A script is composed of graphemes.

IPA Storage: A database that contains the sound values of each grapheme, represented by IPA signs.

Unknown Glyph Storage: The Topological Parameters of the glyphs of the symbols in unknown inscriptions. It also contains the name of each glyph, which refers to the relic from which the glyph was extracted.

Character: A grapheme with an additional property, namely a code point, which is used in computerized representation.

Symbol: A grapheme unit that is part of an inscription. Each symbol represents a certain grapheme.

Dictionary of Unknown Words: Contains unknown inscriptions.

Inscription: A sequence of symbols that has a certain meaning in a language.
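The key terms above amount to a small data model: a grapheme belongs to a script and has one or more glyphs, exactly one of which is normalized; a character adds a code point; a symbol is a grapheme unit in an inscription; and an inscription is a sequence of symbols. The chapter preview does not give SID's actual schema, so the classes below (`Glyph`, `Grapheme`, `Character`, `Symbol`, `Inscription`) and the helper `normalized_glyph_storage` are a hypothetical sketch of those relationships only. Field names, the century-based encoding of the period of use, and all toy values are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Glyph:
    """One drawn shape; params are hypothetical topological parameters."""
    label: str
    params: tuple[int, ...]


@dataclass
class Grapheme:
    name: str
    script: str
    glyphs: list[Glyph]         # every attested shape of the grapheme
    normalized: Glyph           # the shape shown in alphabet listings
    sound_values: list[str]     # IPA signs (cf. IPA Storage)
    period: tuple[int, int]     # first/last century of use (assumed encoding)

    def __post_init__(self) -> None:
        # The normalized glyph must be one of the grapheme's own glyphs.
        if self.normalized not in self.glyphs:
            raise ValueError("normalized glyph must be one of the grapheme's glyphs")


@dataclass
class Character:
    """A grapheme plus a code point for computerized representation."""
    grapheme: Grapheme
    code_point: int

    def as_text(self) -> str:
        return chr(self.code_point)


@dataclass
class Symbol:
    """A grapheme unit found in an inscription."""
    glyph: Glyph                          # shape as copied from the relic
    grapheme: Optional[Grapheme] = None   # set once the symbol is identified


@dataclass
class Inscription:
    """A sequence of symbols, in reading order, from one relic."""
    relic: str
    symbols: list[Symbol] = field(default_factory=list)

    def is_deciphered(self) -> bool:
        return all(s.grapheme is not None for s in self.symbols)

    def reading(self) -> str:
        # First sound value of each identified grapheme; "?" marks unknowns.
        return "".join(s.grapheme.sound_values[0] if s.grapheme is not None else "?"
                       for s in self.symbols)


def normalized_glyph_storage(graphemes: list[Grapheme]) -> dict[str, tuple[int, ...]]:
    """Grapheme name -> topological parameters of its normalized glyph."""
    return {g.name: g.normalized.params for g in graphemes}


# Toy example: labels, parameters, sound value, period and relic name are placeholders.
a_plain = Glyph("a-1", (2, 1, 0, 3))
a_variant = Glyph("a-2", (2, 1, 1, 3))
grapheme_a = Grapheme("A", "Szekely-Hungarian Rovash",
                      [a_plain, a_variant], a_plain, ["a"], (15, 18))
unknown = Glyph("relic-X-2", (3, 0, 2, 4))
inscription = Inscription("relic-X", [Symbol(a_variant, grapheme_a), Symbol(unknown)])
# U+10C80 is the first code point of Unicode's Old Hungarian block.
char_a = Character(grapheme_a, 0x10C80)
```

Here `inscription.reading()` yields `"a?"`: the first symbol is a calligraphic variant already attributed to grapheme A, while the second remains unidentified, which is the situation the deciphering algorithm is meant to resolve.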
