Using Spatial Audio in Game Technology for Expressing Mathematical Problems to Blind Students

Flaithrí Neff (Limerick Institute of Technology, Ireland) and Ian Pitt (University College Cork, Ireland)
DOI: 10.4018/978-1-60960-495-0.ch021


Game technology often offers solutions to problems that are difficult or impossible to solve in traditional educational settings. Spatial audio technology, originally matured to enhance the playing experience of gamers, is increasingly recognized as a promising method for relaying complex educational scenarios to blind students. Mathematics is a prime example of a subject that has challenged teachers of blind students, the students themselves, and researchers for many years. This is especially true of mathematics with inherent spatial attributes or complex sequences, which is most effectively portrayed in the traditional medium using visual diagrams or spatially organized symbols on a page. This chapter discusses how spatial sound techniques from the gaming industry can overcome some of the problems of presenting these complex attributes of mathematics to blind students. The authors also present a theoretical framework designed to offer guidelines to audio game designers who wish to present complex information to blind students using spatial sound technology. Finally, the authors present the results of a pilot study examining the presentation of trigonometric shapes using game surround-sound tools.
Chapter Preview


We live in a society that is fundamentally dependent upon what is frequently regarded as our primary sensory apparatus – vision. However, for many members of our society, the sense of vision is either impaired or completely absent. For a blind person, the senses of hearing and touch play an extremely important role in gathering information about the surrounding world. Blind people have overcome many difficulties in an environment that relies predominantly on the visual presentation of data and that presumes one must ‘see’ data in order to interact with it. Digital technology and the digitization of data have made information accessible to the visually disabled that was previously out of reach or difficult to acquire. For example, instead of depending on others to relay the text of a book, or waiting for a Braille print version, thousands of books now have digital equivalents that can be easily accessed using standard Text-To-Speech (TTS) software. However, a significant amount of data remains available only to those who are sighted, and even though there is increasing recognition that information should be available to all members of our society, progress remains slow in some areas. In many cases this is due to the unwillingness, or lack of awareness, of those disseminating the information, but in other instances it remains technically difficult to represent certain types of information efficiently and comprehensively in a non-visual format.

An example of information that is difficult to comprehend without some form of visual representation is certain forms of mathematics. Some mathematics is, of course, easily translatable to standard non-visual formats, such as TTS output. Other forms, however, involve abstract elements, spatial composition and structural organization that are incompatible with the linear nature of speech output or Braille. This represents a serious problem for mathematics education for blind students. A lack of access to mathematics leads to problems with subjects such as science, engineering and technology, since many topics in these areas rely on sound mathematical knowledge. The mathematical elements that pose most problems for blind students using Braille devices or TTS are those that traditionally rely on key visual elements, such as diagrammatic aids (e.g. trigonometry), spatial association (e.g. matrices), or a particular structural arrangement that influences the problem’s outcome (e.g. algebra). Representing these non-linear visual elements using either TTS output or a digital Braille device is very difficult, largely because both forms of output are linear in nature. Furthermore, even when the mathematical content is purely linear, speech presentation of complicated equations can easily overwhelm the listener, given the transient nature of speech (Edwards & Stevens, 1993). Sighted users can review elements from the beginning of an equation on the screen display, but blind users lack this form of external memory. Without an external memory facility in speech output systems, the listener can rely only on internal memory, which may be incomplete (Edwards & Stevens, 1993).

Although speech output is one of the richest and most successful methods of presenting digital information to visually disabled computer users (especially for relaying precise information), it is not ideal for representing an overview of complex content. An overview, or glance, is fundamental to understanding and tackling a mathematical problem. For example, algebraic equations displayed in a visual format present key syntactic elements in their notation (Stevens et al., 1994). A sighted user recognizes this relational structure immediately, even before examining the precise content of the equation itself. An overview, therefore, relays information to the user that is fundamental to the planning and decision-making process of tackling an equation (Ernest, 1987; Stevens et al., 1994). As a consequence of the difficulty of obtaining an overview of a mathematical problem via speech, researchers in the field have looked to non-speech sound (Stevens et al., 1996; Stevens et al., 1994; Brewster et al., 1994).

Key Terms in this Chapter

MathML: Mathematical Markup Language. A standardized, XML-based description of mathematical notation, structure and content.

Crosstalk Cancellation: A method for eliminating an undesirable effect whereby the signal from one channel interferes with the other. In audio technology, this primarily concerns sound files that already incorporate binaural filters being output via standard stereo speakers. The listener’s auditory system then filters the binaurally encoded output a second time, and sound intended for one ear also reaches the opposite ear, causing undesirable effects. Crosstalk cancellation applies an additional set of filters to reduce these effects.

OpenGL: A cross-platform graphics API (see API description above).

Interaural Level/Intensity Differences: The sound-wave pressure difference registered in the left and right ears. For example, a sound source on the left will have a higher sound pressure level when reaching the left ear compared to the right.

Auditory Pathway: An illustrative description of the capture and processing stages of sound information in the human hearing system.

Spatialization Cues: The attributes of a sound that determine its location in space. Various elements can influence these attributes, such as the immediate environmental surroundings (echoes from walls etc.), the sound-source’s distance from the observer, and the sound-source’s angle relative to the observer.

Lossless Compression: A form of data compression where techniques are used to reduce the file size without deleting any information during the process. In audio, this involves predictive algorithms and other advanced processes. Examples of lossless audio compression are Free Lossless Audio Codec© and Apple™’s Apple Lossless©.

Interaural Time Difference: The temporal difference between a sound-wave reaching one ear before the other. For example, a sound source on the left will reach the left ear before it reaches the right.
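For illustration, the interaural time difference can be approximated with Woodworth’s spherical-head model (not discussed in the chapter itself; the function name and default head radius below are illustrative choices):

```python
import math

def itd_woodworth(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Approximate interaural time difference (in seconds) using
    Woodworth's spherical-head model: ITD = (r / c) * (theta + sin(theta)).
    azimuth_deg: source angle from straight ahead (0 = front, 90 = side)."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (theta + math.sin(theta))

# A source directly ahead produces no ITD; a source at the side,
# the maximum (on the order of a few hundred microseconds).
print(f"front: {itd_woodworth(0) * 1e6:.0f} us")
print(f"side : {itd_woodworth(90) * 1e6:.0f} us")
```

The sub-millisecond values this produces illustrate why ITD is such a fine-grained localization cue.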

XNA: A set of programming tools facilitating the development of games for Microsoft™ platforms (including the Xbox™).

ASCII Nemeth code: Nemeth code is a form of Braille code for mathematics and scientific notation. ASCII Braille correlates Braille cells with the corresponding ASCII digits and characters.

Auditory Graph: A graph whereby some, or all, of its data elements are presented using sound.

Fuzzy Felt: Simple fabric shapes, used conventionally as toys, but also used as tactile representations of 3D shapes for the blind in education.

Multimodal: A system that allows the user to interact with data and functionality using several different senses - vision, hearing, touch etc.

Accelerometer: In the context of this chapter, it is a sensor in a device that determines orientation, vibration, and shock.

Audio Middleware: Middleware is software allowing developers to more easily integrate assets and implement various services in a larger software environment. Audio Middleware allows sound designers to easily attach sound files to a game environment and to readily determine the interaction of those files with existing, non-audio assets such as graphic sprites etc.

XACT: An audio library, audio engine and front-end GUI for incorporating audio elements into an XNA© Project. It is designed for Microsoft™ platforms.

Audio API: API is an acronym for Application Programming Interface. An API is a controlled and regulated interface allowing a programmer to create programming elements and to integrate these in an existing software system or framework. An audio API is a controlled programming environment specifically relating to creating and connecting audio elements to a games engine for example.

HRTF: An acronym for Head Related Transfer Function. This is a complex set of filters simulating the effect imposed by the head, shoulders and ears of the listener on a sound entering the auditory canal.

PCM file: Pulse Code Modulation. This is the raw, uncompressed digital representation of an analog signal. Typically, the analog signal is sampled several thousand times a second and represented in a binary format. Popular audio file formats, such as AIFF and WAV, employ PCM bitstream encoding.
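As a sketch of this idea, a raw PCM signal can be generated sample by sample and stored in a WAV container using Python’s standard library (the 440 Hz tone, amplitude and file name are arbitrary choices for the example):

```python
import math
import struct
import wave

# One second of a 440 Hz sine tone as raw 16-bit PCM samples.
SAMPLE_RATE = 44100   # samples per second
AMPLITUDE = 32000     # close to the 16-bit maximum of 32767

frames = b"".join(
    struct.pack("<h", int(AMPLITUDE * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)))
    for n in range(SAMPLE_RATE)
)

# Wrap the PCM bitstream in a WAV container.
with wave.open("tone.wav", "wb") as wav:
    wav.setnchannels(1)          # mono
    wav.setsampwidth(2)          # 2 bytes = 16-bit samples
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes(frames)
```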

Doppler Shift: Represents the frequency-change effect observed by a stationary listener as a sound-emitting source passes by. It also represents the frequency-change effect observed by a moving listener relative to the sound-emitting source.
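For the stationary-listener case, the perceived frequency follows f' = f · c / (c − v), where v is the source’s speed toward the listener. A minimal sketch (the siren frequency and speed are illustrative values):

```python
def doppler_shift(source_freq_hz, source_speed_ms,
                  speed_of_sound=343.0, approaching=True):
    """Frequency heard by a stationary listener as a source moves
    toward (approaching=True) or away from (approaching=False) them:
    f' = f * c / (c -/+ v)."""
    v = source_speed_ms if approaching else -source_speed_ms
    return source_freq_hz * speed_of_sound / (speed_of_sound - v)

# A 500 Hz siren on a vehicle moving at 30 m/s (~108 km/h):
print(round(doppler_shift(500, 30, approaching=True), 1))   # pitch rises
print(round(doppler_shift(500, 30, approaching=False), 1))  # pitch falls
```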

Auditory Stream Segregation: A primitive perceptual process of the human hearing system where sound that is present in the auditory scene is organized into individual streams, thereby associating one or more sounds with the same source in the environment. In a rich auditory scene, the perceptual system will segregate and associate various sonic elements so that it can formulate how many sources are present in the environment and determine what sonic elements are associated with each source. These different sources can have one or many individual auditory streams associated with them.

Notch Filter: A filter that blocks or attenuates frequencies that fall within the stop band of a particular center frequency. Frequencies above or below the band are not affected.
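A minimal sketch of a notch filter, using the biquad coefficient formulas from the widely used audio-EQ cookbook (the Q value and direct-form-I implementation are illustrative choices, not the chapter’s own method):

```python
import math

def notch_coefficients(center_hz, sample_rate, q=30.0):
    """Biquad notch coefficients (audio-EQ cookbook form), normalized so
    that a[0] == 1. Frequencies near center_hz fall in the stop band and
    are attenuated; those well above or below pass largely unchanged."""
    w0 = 2 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    b = [1.0, -2 * math.cos(w0), 1.0]
    a = [1 + alpha, -2 * math.cos(w0), 1 - alpha]
    return [bi / a[0] for bi in b], [ai / a[0] for ai in a]

def apply_biquad(samples, b, a):
    """Direct-form I filtering:
    y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out
```

Filtering a sine tone at the center frequency drives its level towards zero after the filter settles, while tones outside the stop band pass nearly unchanged.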

Lossy Compression: A form of data compression where elements of the data are deleted from the original file in order to reduce its file size. In relation to audio technology, this is usually associated with compression techniques based on perceptual factors. A lossy compression technique for audio therefore incorporates some form of perceptual analysis to determine what elements can be permanently deleted without impacting on the perceived quality of the sound file during playback. A prime example of this approach is MPEG-1, Layer 3 or .mp3.

VRML/X3D: Virtual Reality Modeling Language. A standard file format incorporating three-dimensional, vector graphics and limited spatial audio primarily for Web deployment. X3D is the XML-based replacement of VRML.

Decoder: A decoder interprets file formats that have been encoded in a particular format. In terms of audio technology, this often refers to software that can interpret compressed formats and output the file as sound playback. It also refers to software that decodes audio incorporating particular spatial cues and speaker matrices.

VDU: Visual Display Unit. Technology that allows data to be presented in a medium that is accessible using vision.

Down-mix: The process of reducing the channel count of a multi-channel audio signal.
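A naive sketch of a stereo-to-mono down-mix, averaging the two channels (production down-mixers apply standardized channel weights and limiting rather than a plain average):

```python
def downmix_to_mono(left, right):
    """Naive stereo-to-mono down-mix: average the two channels
    sample by sample."""
    return [(l + r) / 2 for l, r in zip(left, right)]

mono = downmix_to_mono([0.5, 1.0, -0.2], [0.1, 0.0, -0.6])
print(mono)  # roughly [0.3, 0.5, -0.4]
```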

OpenAL: A cross-platform audio API (see API description above).

OpenSL-ES: Open Sound Library for Embedded Systems. This is a cross-platform audio API capable of implementing spatial sound on mobile devices.

Basilar Membrane: A component of the inner ear that vibrates as a function of pressure waves entering the inner ear structure. Sensory cells along its length are activated when particular movement of the membrane occurs, thereby encoding the pressure disturbances as neuro-chemical events. The Basilar membrane movement is representative of the incoming-signal’s physical shape.

Unreal Audio System: The cross-platform audio tool component of the Unreal Game Engine© developed by Epic Games™.

Spatial Audio Coding: A process whereby spatial cues are extracted from a multi-channel recording and compiled as side/ancillary information. The actual audio is down-mixed (see description of down-mix). At the client side, the audio is up-mixed (see description of up-mix) in accordance with the side information.

Multi-Channel: In audio technology, multi-channel refers to a system where sonic elements are output on discrete or shared speakers. Usually, it refers to systems incorporating more than two speakers and typically at least four.

Encoder: An encoder, in its simplest definition, is hardware or software that converts data from one format to another. In terms of audio technology, an encoder is predominantly used to describe software applications that compress raw audio files to a more compact format. It also refers to software applications that extract and format specific information relating to audio (such as spatial cues) that can be used for the purpose of an intermediary task, or for changing the final file format.

B-format: A multi-channel audio recording format associated with the Ambisonic technique.

External Memory: Within the context of this chapter, this term represents the visual display. When using a GUI, users can keep much of the information onscreen without having to memorize it. At a later stage, they may then resort back to that information on screen.

Pinnae: (singular: Pinna). The portion of the ear that is visible and protrudes from the head.

Sweet spot: In a multi-speaker setup, the center-most location within the speaker configuration, which usually provides the best spatial sound experience for the listener. Listening outside the sweet spot results in a poorer spatial experience.

Auditory Scene: A term used to describe the overall soundscape, or aural environment, where many independent and interactive sonic events/sources are present.

LFE: An acronym for Low Frequency Effect. This denotes the channel in surround sound that is dedicated to very low frequencies, usually below 120Hz. The subwoofer in a typical surround setup reproduces the LFE channel.

JAWS: Job Access With Speech. An advanced screen-reader developed by Freedom Scientific™ aimed at the Windows™ system and tightly integrated with common OS functions and applications.

SAPI: Speech Application Programming Interface. An advanced speech recognition and synthesis API developed by Microsoft™. It is integrated into the Windows™ OS and other Microsoft™ applications.

Azimuth: In the context of the spatial localization of sound, the azimuth describes the position of a sound source in the horizontal plane, 360º around the listener at approximately head height.

Pinna Transfer Function: A mathematical representation of the filters imposed by the human pinna due to its physical structure.

Distance Attenuation: As an observer/listener moves farther away from a virtual sound-source, the sound-source’s emitted sound gradually decreases in volume/loudness to simulate the increasing distance between source and listener.
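As a sketch, the clamped inverse-distance attenuation model used by game audio APIs such as OpenAL can be expressed as follows (the reference-distance and rolloff defaults are illustrative):

```python
def inverse_distance_gain(distance, ref_distance=1.0, rolloff=1.0):
    """Inverse-distance attenuation: full gain at the reference distance,
    then gain = ref / (ref + rolloff * (distance - ref)). Distances closer
    than the reference distance are clamped to it."""
    d = max(distance, ref_distance)
    return ref_distance / (ref_distance + rolloff * (d - ref_distance))

# Gain falls away smoothly as the source recedes:
for d in (1, 2, 4, 8):
    print(d, round(inverse_distance_gain(d), 3))
```

With the defaults above, doubling the distance from the reference point roughly halves the gain, mimicking the familiar fall-off of a receding sound source.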

Flanger: An audio effect created when two signals of the same frequency are mixed, but where the phase of one of the signals is slightly delayed.
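A minimal sketch of this effect, mixing each sample with a copy whose delay is slowly swept by a sine LFO (the maximum delay, LFO rate and depth values below are illustrative choices):

```python
import math

def flange(samples, sample_rate, max_delay_ms=3.0, lfo_hz=0.5, depth=0.7):
    """Minimal flanger sketch: mix each input sample with a delayed copy,
    the delay sweeping between 0 and max_delay_ms under a slow sine LFO."""
    max_delay = max_delay_ms / 1000 * sample_rate   # delay in samples
    out = []
    for n, x in enumerate(samples):
        # LFO sweeps the delay between 0 and max_delay samples.
        delay = max_delay * 0.5 * (1 + math.sin(2 * math.pi * lfo_hz * n / sample_rate))
        d = int(delay)
        delayed = samples[n - d] if n - d >= 0 else 0.0
        out.append(x + depth * delayed)
    return out
```

The time-varying delay causes comb-filter notches to sweep through the spectrum, producing the characteristic “whooshing” flange sound.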

German Film: Plastic sheets that produce raised lines when a pen is applied to their surface. Used in the past for creating tactile drawings for the blind.

LaTeX: A document markup language, formatting and typesetting system.

Irrelevant Sound Effect: Derived from the Irrelevant Speech Effect, whereby a person’s recall of visually presented information is disrupted by concurrent irrelevant speech in the environment. A similar effect can be observed when irrelevant non-speech sound is present, albeit to a lesser degree. Recall of speech and text content can likewise be affected by concurrent irrelevant non-speech sound. In the case of irrelevant non-speech sound, a link has been observed between the degree of impact on the recall process and the degree of variation in the irrelevant signal.

AMMS: (Advanced Multimedia Supplements, JSR-234). The most advanced multimedia API extending the capabilities of MMAPI (Mobile Media API) for Java-enabled Symbian™ mobile devices. It allows the implementation of virtual binaural spatial sound via small Java applications on the highest-spec devices (for example, the Nokia™ N97). A special type of Java environment (J2ME – Java 2 Micro Edition) is employed, designed for devices with limited processing power, capacity and battery life.

Text-to-Speech (TTS): A system that converts normal language text on a computer into audible speech output. The speech output may be pre-recorded pieces of concatenated human speech stored in a database, or it may be purely synthetic, based on synthesis techniques such as models of the human vocal cords, vocal tract and mouth. Visually disabled or impaired computer users rely heavily on this kind of technology for hearing emails, menu items, word documents etc.

Virtual Binaural: An approach to spatial audio rendering in which filters and other spatial cues inherent in the human auditory system are utilized or simulated. Typically, the best results with this system are obtained by listening through standard headphones. With the addition of extra filters, it can also be rendered on a stereo speaker setup.

ISO/IEC: International Organization for Standardization / International Electrotechnical Commission. These are non-governmental international standards bodies, acting as umbrella organizations for many national standards authorities worldwide. The subcommittees of these bodies often encompass academic, commercial and industrial interests. They foster international collaboration between industry and academia, as well as promoting compatibility in software and hardware design and development. They also provide established and rigorous methodologies for the development of internationally compatible standards.

Up-mix: Where the channel count of an audio file is increased, usually in strict accordance with ancillary data containing precise spatial cue information.

AC-3: Advanced Codec 3, Audio Codec 3. An audio coding algorithm developed by Dolby™.

Sonic Dimensions: Sound has several concurrent characteristics that can exist independently, yet interact with each other. These include rhythm, pitch, loudness, texture etc.

Sonify: Sonification. This is the process of representing events, structures and assets using non-speech sound. Sonification, in the context of computer interface design, is used to present GUI elements (such as icons) or events (such as alarms or warnings). Sonification is also used to convey scientific, mathematical and statistical data to the end-user. Sonification plays an increasingly important role for visually disabled users of technology.

.NET Framework: A very large library of pre-coded elements for the Windows™ platform, together with a virtual-machine environment that allows programmers to use several different compatible languages to create applications and services.

DirectX: A collection of APIs (see description of API above) handling multimedia and game elements for Microsoft™ platforms, including the Xbox™.
