In this chapter we propose a metropolitan information system (MIS) that provides a range of information to inhabitants of a city as well as to visitors. The main principle is to access data from the Internet and to present these data through a user-friendly interface offered by several types of intelligent kiosks. Emphasis is placed on multimodal human-computer communication in both directions, using image, audio/speech, and text modes. We propose several versions of the intelligent kiosk and several types of communication with the MIS. The first version is installed in public places and presents a three-dimensional human head on a large display that gives information about the city, institutions, weather, and so on. It integrates a microphone array, a camera, and a touch screen as inputs, and two displays and loudspeakers as outputs. A spoken question is recognized, an answer is retrieved from a database or the Internet, and the answer is then presented to the customer visually and acoustically with the help of a robust multilingual speech synthesizer and a powerful graphics engine. The second version, more flexible but with limited functionality, uses a mobile phone as a multimedia terminal for accessing information. The last option is to use a regular phone (fixed or mobile) to access the MIS through an intelligent speech communication interface. The type of communication depends on the version of the terminal. The kiosk (stand) terminals are expected to use mainly a fixed IP connection to the MIS, although wireless access can be used as well. The second version of the terminal connects to the MIS over WiFi. The last solution, the ordinary phone, can access the MIS over either the fixed telecommunication network or GSM.
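The three terminal versions and their access networks can be summarized in a small data model. The following is only an illustrative sketch: the names `Access`, `Terminal`, `TERMINALS`, and `terminals_reachable_over` are hypothetical, and the input and output modes listed for the mobile phone are assumptions, since the text specifies them in detail only for the kiosk.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Access(Enum):
    """Access networks named in the text."""
    FIXED_IP = "fixed IP"
    WIRELESS = "wireless"
    WIFI = "WiFi"
    FIXED_TELEPHONE = "fixed telecommunication network"
    GSM = "GSM"


@dataclass(frozen=True)
class Terminal:
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]
    access: Tuple[Access, ...]  # first entry is the primary connection


TERMINALS = (
    # Kiosk: inputs, outputs, and connections as described in the text.
    Terminal(
        "kiosk",
        ("microphone array", "camera", "touch screen"),
        ("display 1", "display 2", "loudspeakers"),
        (Access.FIXED_IP, Access.WIRELESS),
    ),
    # Mobile phone: the modes below are an assumption, not from the text.
    Terminal(
        "mobile phone",
        ("speech", "text"),
        ("image", "audio", "text"),
        (Access.WIFI,),
    ),
    # Regular phone (fixed or mobile): speech-only interface.
    Terminal(
        "regular phone",
        ("speech",),
        ("speech",),
        (Access.FIXED_TELEPHONE, Access.GSM),
    ),
)


def terminals_reachable_over(access: Access) -> List[str]:
    """Return the names of the terminal versions that can use a given network."""
    return [t.name for t in TERMINALS if access in t.access]
```

For example, `terminals_reachable_over(Access.GSM)` lists only the regular phone, which reflects the statement that the type of communication depends on the version of the terminal.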
One information kiosk that demonstrated a significant improvement over earlier systems is the MINNELLI system (Steiger & Suter, 1994). MINNELLI facilitates interaction with bank customers primarily through short animated cartoons that present information on bank services. However, MINNELLI requires basic user training, which limits its applicability at most public sites. Another successful kiosk, with a broader scope than MINNELLI, is the MACK system (Cassell et al., 2002). MACK is an embodied conversational kiosk that provides information on residents and directions to locations at a research site. It integrates multiple input sources, including speech, gesture, and pressure. The system also exhibits a degree of spatial intelligence: it uses its awareness of its own location and of the layout of the building to reference physical locations when it gives directions (Stocky & Cassell, 2002). The August spoken dialog system is also kiosk-based and helps users find their way around Stockholm, Sweden, using an on-screen street map. The most advanced system is MIKI (McCauley & D’Mello, 2006). MIKI is a three-dimensional, directory-assistance-type digital persona displayed on an LCD in the FedEx Institute of Technology at the University of Memphis. MIKI stands for Memphis Intelligent Kiosk Initiative and guides students, staff, and visitors through the Institute’s maze of classrooms, labs, lecture halls, and offices by means of graphically rich, multidimensional, interactive, touch- and voice-sensitive digital content. MIKI differs from the intelligent kiosk systems mentioned above in its advanced natural language understanding capabilities, which allow it to answer informal verbal queries without the need for rigorous phraseology.
The idea of a communication application running on a mobile phone is not new, and several systems have already been developed, both embedded (Németh, Kiss, & Tóth, 2005; Gros et al., 2001) and client-server based (Farrugia, 2005).
Key Terms in this Chapter
Data Mining: The process of sorting through large amounts of data and picking out relevant information.
Speech Synthesis: The ability of a machine or program to convert text into speech.
Java Mobile: A technology that allows programmers to use the Java programming language and related tools to develop programs for mobile wireless information devices such as cellular phones and personal digital assistants (PDAs).
Multimedia: Media that combine different content forms.
Human Computer Interaction: The study of how people interact with computers and of the extent to which computers are, or are not, developed for successful interaction with human beings.
Speech Recognition: The ability of a machine or program to recognize voice commands or take dictation.