Introduction
Mobile devices are no longer just tools for calling someone; they are powerful small computers that can run a wide variety of applications beyond telephony and messaging. One challenge is to design these applications for diverse usage scenarios given the limited screen size and resolution of mobile devices. For many years, most interfaces were predominantly GUI-oriented and presented little information via other modalities. Recently, the potential of auditory and tactile output - especially in mobile situations where users need their visual attention for other tasks pursued in parallel (e.g. walking) - has become a focus of research (Brewster, 2002; Brewster, Chohan, & Brown, 2007; Hoggan & Brewster, 2007). One study examining the advantages and disadvantages of auditory and tactile feedback in different usage contexts was conducted by Hoggan and colleagues (Hoggan, Crossan, Brewster, & Kaaresoja, 2009). However, they focused mainly on environmental effects on preference and performance, aiming to find threshold levels for the different modalities, including audio.
Auditory feedback has the advantage that it is easy to implement and is therefore frequently used by system designers. Sound can be heard even if the user's eyes are not focused on the device or the user's attention is allocated to something else, as is frequently the case with mobile devices. In these situations, important information can be conveyed very well via sound. On the other hand, sound can be annoying and disturbing, especially in social contexts.
One way to describe sounds is offered by Jekosch from a semiotic perspective. It focuses on the meaning of a sound, interpreting it as an information carrier or, more simply, a sign (Jekosch, 1999). In the sign theory developed by Peirce, three types of relation exist between a sign and the object it refers to: symbolic, iconic, and indexical (Peirce, 1960). Symbolic relationships are arbitrary and offer the widest range in sound design, but users need to learn the concrete mapping. Iconic relationships are representational, with a certain similarity between the sound and the denoted object. The closest relationship is the indexical, also called causal, relation, where the sound and the object are directly connected.
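As an illustration, the three Peircean relation types can be sketched as a small classification of feedback sounds. This is a minimal sketch for clarity only; the example sounds are invented here and are not taken from the cited literature:

```python
from enum import Enum

class SignRelation(Enum):
    """Peirce's three relations between a sign (here: a sound) and its object."""
    SYMBOLIC = "arbitrary mapping; must be learned by the user"
    ICONIC = "representational; similar to the denoted object or event"
    INDEXICAL = "causal; sound and object are directly connected"

# Hypothetical feedback sounds classified by relation type (invented examples)
examples = {
    "abstract two-tone beep signalling a new e-mail": SignRelation.SYMBOLIC,
    "paper-crumpling sound for deleting a file": SignRelation.ICONIC,
    "click of a physical key actually being pressed": SignRelation.INDEXICAL,
}

for sound, relation in examples.items():
    print(f"{relation.name}: {sound}")
```

The enum values simply restate the defining property of each relation; the dictionary shows how a designer might audit a sound set against this taxonomy.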
In this article, we will focus on sounds as non-speech auditory feedback. For this type of feedback, two concepts that make use of the different relations between sounds and objects will be described in the next sections: auditory icons and earcons. Sounds that are produced by the devices themselves - e.g. the vibration motor sounds that occur during the presentation of tactile feedback - are not within the scope of this article.
Auditory Icons
Auditory icons were established by Gaver (1986) as caricatures of sounds that occur in the real world. He emphasized that their most important advantage is that people are exposed to these sounds in everyday life and are therefore accustomed to this kind of auditory information. Additionally, the mapping between a certain sound and the event or object it represents is not arbitrary, but rather iconic.
Therefore, auditory icons can be intuitively mapped as analogies to actions or events and do not have to be learned. For instance, deleting a file can be mapped to the sound of a crumpled piece of paper being thrown into a recycling bin. Finding suitable sounds that are associated with certain events and objects in human-computer interfaces is a challenging task, as not all events in this context produce a sound that is obviously related to them. In these cases, metaphorical mappings (ideally unambiguous ones) need to be determined. The stronger the existing associations are, the better the learning and retention rates of sound-event pairings (Stephan, Smith, Martin, Parker, & McAnally, 2006). However, the use of natural or representational relationships may have its disadvantages: as different people may have diverse associations, the intended intuitive mapping can get lost. Hence, auditory icons can be a powerful approach for providing information about an event or object in human-computer interaction, provided that their acoustic meaning evokes clear and distinct associations for the user. The challenge is to find sounds with this property.
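To make the design problem concrete, the following sketch shows a lookup table from interface events to auditory icons. All event names and sound file names are invented placeholders, not part of any cited system; events without an obviously related real-world sound return None, signalling that a metaphorical or learned (symbolic) mapping must be designed instead:

```python
from typing import Optional

# Hypothetical mapping of interface events to auditory icon files.
# File names are illustrative placeholders only.
AUDITORY_ICONS = {
    "file_deleted": "paper_crumple.wav",  # iconic: recycling-bin analogy
    "message_sent": "whoosh.wav",         # metaphorical mapping
    "battery_low": None,                  # no obvious real-world sound exists
}

def feedback_sound(event: str) -> Optional[str]:
    """Return the auditory icon for an event, or None if the designer
    must fall back on a metaphorical or symbolic sound instead."""
    return AUDITORY_ICONS.get(event)

print(feedback_sound("file_deleted"))
print(feedback_sound("battery_low"))
```

The None entries make explicit the point in the text: some events simply have no self-evident acoustic counterpart, which is where the design challenge lies.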