Support for Semitic languages on mobile devices, while not yet universal, is becoming more widespread. Content written in Arabic or Hebrew, for example, can be seen on a whole range of mobile devices. It is not uncommon, however, to encounter difficulties in entering and displaying Semitic text; as experienced mobile and computer users know, there is a variety of Semitic input methods and character encoding schemes. The challenge of constructing a text entry system for Semitic scripts is amplified by the fact that existing resources are inadequate. This chapter reviews the current state of text entry for Semitic scripts on mobile devices in order to provide a starting point for further investigation in this area.
The Semitic language family includes many languages with large numbers of native speakers. However, Semitic languages are still understudied. Support for Semitic languages on mobile devices, while not yet universal, is becoming more widespread. Content written in Semitic scripts can be seen on a whole range of devices, from the simplest mobile handsets to smartphones to full-featured PDAs. It is not uncommon, however, to encounter difficulties in entering and displaying Semitic text; as experienced mobile and computer users know, there is a variety of Semitic input methods and character encoding schemes. The challenge of constructing a text entry system for Semitic languages is amplified by the fact that existing resources are inadequate.
Semitic Languages and Scripts History
The Semitic languages are a family of languages spoken by more than 370 million people across much of the Middle East, where they probably originated, as well as in North and East Africa. They constitute the northeastern subfamily of the Afro-Asiatic languages and are the only branch of this group spoken in Asia (see Figure 1).
Figure 1. Semitic languages family tree
The most prominent members of this family are Arabic (206 million speakers), followed by Amharic (26 million speakers), Tigrinya (6.75 million speakers), and Hebrew (6 million speakers). Semitic languages were among the earliest to attain a written form, with Akkadian writing beginning in the middle of the 3rd millennium b.c. The term Semitic for these languages, after Shem, a son of Noah, is etymologically a misnomer in some ways, but is nonetheless standard (Wikipedia, 2006).
The Aramaic Language
The Aramaic language was the international trade language of the ancient Middle East between 1000 and 600 b.c., spoken from the Mediterranean coast to the borders of India. Aramaic was used by the conquering Assyrians as a language of administration and communication, and later by the Babylonian and Persian empires, which ruled from India to Ethiopia and employed Aramaic as their official language. During this period (about 700–320 b.c.), Aramaic held a position similar to that occupied by English today. The most important documents of this period are numerous papyri from Egypt and Palestine. Its script, derived from Phoenician and first attested during the 9th century b.c., also became extremely popular and was adopted by many peoples, with or without any previous writing system (Lo, 2005).
The Arabic Language
The Arabic language, which is the mother tongue of more than 300 million people, presents significant challenges to many text entry applications. Arabic is a highly inflectional and derivational language. The Arabic alphabet consists of 28 letters, which can be extended to 90 by additional shapes, marks, and vowels (Tayli & Al-Salamah, 1990). Eight letter pairs (doublets) are differentiated by diacritics. Some letters are ambiguous between two or more sounds, and some letters do not indicate a sound at all; they have only a grammatical function. Unlike Latin-based alphabets, Arabic is written from right to left. In written Arabic, short vowels are often omitted.
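Two of the properties noted above, the omission of short vowels and the right-to-left direction, affect how a text entry system compares and displays what the user types. The following Python sketch illustrates both with the standard unicodedata module. The function names are illustrative only, and the set of marks treated as short vowels (Unicode U+064B through U+0652) is an assumption made for this example, not a complete treatment of Arabic diacritics.

```python
import unicodedata

# Assumption for this sketch: the short-vowel and related marks (harakat)
# are the code points U+064B (fathatan) through U+0652 (sukun).
HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_short_vowels(text: str) -> str:
    """Remove harakat so vocalized and unvocalized spellings compare equal."""
    return "".join(ch for ch in text if ch not in HARAKAT)


def is_right_to_left(ch: str) -> bool:
    """True if the Unicode bidi class is R (e.g. Hebrew) or AL (Arabic letters)."""
    return unicodedata.bidirectional(ch) in ("R", "AL")


# kaf, teh, beh, each followed by a fatha mark (U+064E)
vocalized = "\u0643\u064E\u062A\u064E\u0628\u064E"
unvocalized = strip_short_vowels(vocalized)  # kaf, teh, beh only
```

A dictionary lookup or word-completion component could apply a normalization of this kind before matching, so that a word typed without vowel marks still matches a vocalized entry.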
The Arabic script stems from the same source as the Latin, Greek, and Hebrew alphabets: Phoenician. The underlying proto-alphabet had some two dozen characters. The direct forebear of the Arabic alphabet is an Aramaic alphabet from which it inherits the tendency to merge letter groups into larger units marked by a final swash instead of a space (Djoudi, 1991).
Key Terms in this Chapter
Linguistics: The scientific study of language, which can be theoretical or applied. Someone who engages in this study is called a linguist.
QWERTY Soft Keyboard: A virtual (on-screen) keyboard with the QWERTY layout that can be used with any other Windows application.
Text Entry: An input method for entering text into mobile devices. Three text entry methods are the standard MultiTap system, the pen-based Graffiti, and the scaled-down QWERTY soft keyboard.
Semitic Languages: Languages that have their roots in Semitic (from the Biblical “Shem”) and include the ancient and modern forms of Amharic, Arabic, Aramaic, Akkadian, Ge’ez, Hebrew, Phoenician, Maltese, Tigre, and Tigrinya, among others.
Script: A set of defined base elements or symbols, individually termed characters, or graphemes.
Root: The primary lexical unit of a word, which carries the most significant aspects of semantic content and cannot be reduced into smaller constituents.
Mobile Device: A pocket-sized computing device, typically comprising a small visual display screen for user output and a miniature keyboard or touch screen for user input.
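As an illustration of the MultiTap method named under Text Entry, the following Python sketch decodes a sequence of keypad presses into letters: pressing a key repeatedly cycles through the letters assigned to it. The key-to-letter assignment below is hypothetical and covers only three keys; actual handset layouts for Arabic differ. The use of a space to mark the pause between two letters on the same key is likewise an assumption of this sketch.

```python
from itertools import groupby

# Hypothetical key-to-letter assignment, for illustration only.
KEYPAD = {
    "2": "\u0628\u062A\u062B",  # beh, teh, theh
    "3": "\u0627\u0621",        # alef, hamza
    "7": "\u0643\u0644",        # kaf, lam
}


def multitap_decode(presses: str) -> str:
    """Decode key presses; a space separates letters typed on the same key."""
    letters = []
    for group in presses.split(" "):
        # Each run of identical digits selects one letter on that key.
        for key, run in groupby(group):
            options = KEYPAD[key]
            count = len(list(run))
            # Extra presses wrap around to the first letter again.
            letters.append(options[(count - 1) % len(options)])
    return "".join(letters)


# One press of 7 (kaf), two presses of 2 (teh), pause, one press of 2 (beh)
word = multitap_decode("722 2")
```

The sketch shows why MultiTap becomes slow for a script with many letters: the more letters share a key, the more presses the later ones require.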