Chinese-Braille Translation Based on Braille Corpus

Xiangdong Wang (Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China), Yang Yang (Jiangsu Enterprise Information Operation Center, China Telecom Corporation Limited, Beijing, China), Hong Liu (Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China) and Yueliang Qian (Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China)
DOI: 10.4018/IJAPUC.2016040104

Abstract

For people with visual disabilities, reading Braille text is an important way to acquire information. Chinese-Braille translation is highly challenging because of the word segmentation and tone marking conventions of Chinese Braille. In this paper, a novel scheme for Chinese-Braille translation is proposed. Unlike current methods, which use heuristic rules defined by experts for Braille word segmentation, the proposed method performs Chinese-Braille translation based on a Braille corpus, without requiring Braille experts. Under this scheme, a Braille word segmentation model based on statistical machine learning is trained on a Braille corpus, and Braille word segmentation is carried out directly with the statistical model, skipping the stage of Chinese word segmentation. Tone marking and some special processing are also performed based on word and rule mining on the corpus. This method avoids the manual establishment of rules involving syntactic and semantic information, and instead uses a statistical model to learn these rules implicitly and automatically. Experimental results show the effectiveness of the proposed approach.
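
The corpus-based segmentation step can be framed as character-level sequence labeling, a common setup for statistical segmenters such as conditional random fields (CRFs). The minimal Python sketch below assumes a corpus in which each Chinese sentence comes with its Braille word boundaries; it converts those boundaries into B/M/E/S tags (begin, middle, end, single) for training, and decodes predicted tags back into Braille words. The tag set, the function names and the example boundaries are illustrative assumptions, not the authors' implementation, and the statistical model itself is omitted.

```python
from typing import List


def words_to_tags(words: List[str]) -> List[str]:
    """Turn Braille word boundaries into per-character B/M/E/S training tags."""
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")  # single-character Braille word
        else:
            tags.append("B")  # first character of a multi-character word
            tags.extend(["M"] * (len(word) - 2))  # interior characters
            tags.append("E")  # last character
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild Braille words from characters and (predicted) tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("S", "E"):  # a word ends at this character
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


# Hypothetical boundaries for illustration only, not taken from the standard.
example = ["我们", "是", "学生"]
print(words_to_tags(example))  # ['B', 'E', 'S', 'B', 'E']
print(tags_to_words("我们是学生", words_to_tags(example)))
```

A trained tagger would supply the tag sequence in the decoding call; because the tags are derived directly from Braille word boundaries, no separate Chinese word segmentation step is involved.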

1. Introduction

For people with visual disabilities, reading Braille text is an important way to acquire information. Braille is a tactile writing and reading system used by blind people, who read it by touching and recognizing the raised dots on paper or on a refreshable Braille display connected to a computer or other terminal. Many systems have been developed to convert text in languages and scripts such as English, Danish, Spanish, Portuguese and Devanagari into the corresponding Braille text (Christensen et al., 2012; Christensen and Chourasia, 2014; Christensen and Stevns, 2015; Coutinho et al., 2012; Bodale et al., 2014). Conversion from these languages to Braille is relatively simple, since there are direct mappings from letters or words to Braille characters (known as cells). For Chinese, however, the characteristics of word segmentation and tone marking in Chinese Braille pose great challenges (Jiang and Zhu, 2006).
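
To illustrate what a direct mapping means for an alphabetic language, the following minimal Python sketch transcribes lowercase English letters one-to-one into six-dot cells, encoded as characters of the Unicode Braille Patterns block (U+2800 plus one bit per raised dot). It covers only uncontracted, letter-by-letter transcription: capitals, digits, punctuation and the contractions of real English braille are deliberately left out, and the function names are illustrative.

```python
from typing import Iterable, Tuple

# Dot numbers for a-j, the "first decade" of uncontracted literary braille.
FIRST_DECADE = {
    "a": (1,), "b": (1, 2), "c": (1, 4), "d": (1, 4, 5), "e": (1, 5),
    "f": (1, 2, 4), "g": (1, 2, 4, 5), "h": (1, 2, 5), "i": (2, 4), "j": (2, 4, 5),
}


def letter_dots(letter: str) -> Tuple[int, ...]:
    """Return the raised dots (numbered 1-6) for one lowercase Latin letter."""
    if letter == "w":  # w does not follow the decade pattern
        return (2, 4, 5, 6)
    if letter in FIRST_DECADE:
        return FIRST_DECADE[letter]
    if letter in "klmnopqrst":  # k-t: matching a-j letter plus dot 3
        base = FIRST_DECADE["abcdefghij"["klmnopqrst".index(letter)]]
        return tuple(sorted(base + (3,)))
    if letter in "uvxyz":  # u, v, x, y, z: matching a-e letter plus dots 3 and 6
        base = FIRST_DECADE["abcde"["uvxyz".index(letter)]]
        return tuple(sorted(base + (3, 6)))
    raise ValueError(f"no single-cell mapping for {letter!r}")


def dots_to_cell(dots: Iterable[int]) -> str:
    """Encode dots as a Unicode Braille pattern: dot n sets bit n-1."""
    return chr(0x2800 + sum(1 << (d - 1) for d in dots))


def transcribe(text: str) -> str:
    """Letter-by-letter transcription; spaces are kept as spaces."""
    return "".join(" " if ch == " " else dots_to_cell(letter_dots(ch))
                   for ch in text.lower())


print(transcribe("braille"))  # ⠃⠗⠁⠊⠇⠇⠑
```

Each letter maps to exactly one cell without any context, which is precisely the property that Chinese lacks.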

Unlike alphabetic languages such as English, the basic unit of written Chinese is the character, and tens of thousands of characters are in use. It is therefore infeasible to map Chinese characters directly to Braille cells. Instead, the most widely used Braille system in China (called prevailing Mandarin Braille) maps the pronunciation of characters to Braille cells: each syllable is written with up to three cells, representing the initial, the final and the tone, respectively. To reduce ambiguity, Braille words are separated by spaces, unlike Chinese text, which has no spaces between words. Furthermore, there is no direct mapping from Chinese words to Braille words, because phrases with relatively complete meaning are defined as words in Braille to further reduce ambiguity. The Chinese Braille standard gives hundreds of rules defining Braille words, most of which are syntactic or semantic (AQSIQ, 2009; Teng and Li, 1996). This poses a great challenge to automatic Chinese-Braille conversion, since according to these rules the vocabulary of Braille words is effectively unbounded, and syntactic or semantic rules are difficult for computers to understand and apply accurately.

Another challenge of automatic Chinese-Braille conversion is tone marking. To reduce printing costs and avoid information overload, prevailing Mandarin Braille marks tones on only about 5% of syllables on average. Apart from some simple situations, most tone-marking decisions depend on the subjective judgment of experts, e.g., for unfamiliar words and for words that may cause ambiguity, which makes tone marking difficult to automate.
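
The syllable structure of prevailing Mandarin Braille can be sketched as follows. In this minimal Python example the cell tables are symbolic placeholders such as `[I:zh]`, not the actual dot patterns of the national standard, and they cover only a hypothetical handful of initials and finals. The `mark_tone` flag stands in for whatever procedure decides whether the tone cell is written (expert judgment, or rules mined from a corpus as in the proposed scheme). A syllable without an initial, such as *an*, yields fewer cells, which is why a syllable takes "up to" three cells.

```python
from typing import List, Optional

# Hypothetical placeholder tables: symbolic labels stand in for real dot patterns.
INITIAL_CELLS = {"zh": "[I:zh]", "g": "[I:g]", "b": "[I:b]"}
FINAL_CELLS = {"ong": "[F:ong]", "uo": "[F:uo]", "an": "[F:an]"}
TONE_CELLS = {1: "[T1]", 2: "[T2]", 3: "[T3]", 4: "[T4]"}


def syllable_cells(initial: Optional[str], final: str, tone: int,
                   mark_tone: bool) -> List[str]:
    """Compose the cells of one syllable: [initial] + final + [tone]."""
    cells: List[str] = []
    if initial:  # zero-initial syllables skip this cell
        cells.append(INITIAL_CELLS[initial])
    cells.append(FINAL_CELLS[final])
    if mark_tone:  # tone cell only when the (external) decision says so
        cells.append(TONE_CELLS[tone])
    return cells


print(syllable_cells("zh", "ong", 1, mark_tone=False))  # 2 cells, tone omitted
print(syllable_cells("g", "uo", 2, mark_tone=True))     # 3 cells, tone marked
print(syllable_cells(None, "an", 4, mark_tone=False))   # 1 cell, no initial
```

Keeping the tone decision outside the cell composition reflects the point above: the mapping from syllable to cells is mechanical, while deciding when to write the tone cell is the hard part.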
