Accent Conversion in Computer-Assisted Pronunciation Training (CAPT)


Copyright: © 2021 | Pages: 34
DOI: 10.4018/978-1-7998-6609-1.ch006

Abstract

This study investigated a computer-assisted pronunciation training (CAPT) tool that used automatic speech recognition (ASR) and accent conversion (AC) technology to improve the pronunciation of second language learners. This speech-processing approach can address a typical shortcoming of ASR technology for L2 pronunciation training: its difficulty in providing meaningful corrective feedback. Thirty-six students participated in the treatment group. For the treatment, they worked on a CAPT tool that used ASR and AC to provide them with corrective feedback. A comparison group, also of 36 students, worked on a different type of CAPT tool. Two trained raters scored each monologue completed for the pretest, posttest, and comparison data. Findings showed preliminary evidence of a statistically significant improvement in pronunciation for the treatment group. However, additional results showed no statistically significant differences between the rater scores of the control (comparison) group and the posttest scores of the experimental group.

Introduction

We are living in an era in which fast-paced phenomena such as globalization, advances in technology, and immigration have led to more communication between native and non-native speakers of a language (Portes & Rumbaut, 2006). English in particular, as a lingua franca, is spoken as a non-native language by a large number of people worldwide, estimated at over one billion (Melitz, 2016). However, despite this rapid increase in the number of immigrants and second language speakers around the world, attitudes towards immigration and immigrants have remained mainly negative (Kessler & Freeman, 2005; Simon & Lynch, 1999). Such negative attitudes not only manifest themselves in different aspects of a non-native speaker’s life but also have several ramifications for the individual, with non-native accent/pronunciation at the forefront (Derwing & Munro, 2009; Moyer, 2004). Unfortunately, many second language speakers experience discrimination as a result of their accented speech, in areas ranging from accommodation (Zhao, Ondrich, & Yinger, 2006) to education (Marvasti, 2005) to employability (Carlson & McHenry, 2006; Lacey, 2011; Lippi-Green, 1997), and even the legal system (Lippi-Green, 1994).

Despite the prominent role of pronunciation in the process of language learning (Celce-Murcia, Brinton, & Goodwin, 2010), the language-teaching profession has taken different positions on the teaching of pronunciation. Although pronunciation does receive attention in second language teaching, it has long taken a back seat to other, more prominent language skills and has even been called the “Cinderella area” of foreign language teaching (Kelly, 1969).

Pronunciation gained considerable prominence under Audiolingualism, in which it was taught explicitly from the start, but it was marginalized and largely neglected in early Communicative Language Teaching (Thomson & Derwing, 2014). Derwing and Munro (2005) argued that the emergence of Communicative Language Teaching and its emphasis on meaning pushed pronunciation instruction aside; however, because Communicative Language Teaching treats language as communication, the focus has since shifted, and accent has regained considerable public interest (Celce-Murcia, Brinton, & Goodwin, 2010). Another major reason for this renewed attention is that, more than ever, immigrants and second language learners feel the need to become more competent in how they sound because of the stigma attached to being a non-native speaker of a language (Derwing & Munro, 2009; Thomson, 2012).

Speaking with a foreign accent is, in most cases, an inevitable feature of second language performance, even after a person has spent years in the target culture. Despite non-native speakers’ rigorous efforts to improve their pronunciation, age seems to exert a more resilient influence than any other factor. According to the Critical Period Hypothesis proposed by Lenneberg (1967), a foreign accent is unavoidable once the critical period has passed; this period is the ideal window for acquiring language in a linguistically rich environment, after which further language learning becomes much more difficult. Over the past few decades, the intelligibility principle has been emphasized, reflecting the view that native-like pronunciation is neither a realistic nor a desirable goal for second language learners (Derwing & Munro, 2005; Felps, Bortfeld, & Gutierrez-Osuna, 2009; Neri et al., 2002). However, despite the dominance of the intelligibility principle and its emphasis that an accent does not necessarily impede communication, negative, discriminatory attitudes towards non-native speakers and their accents often persist (Gluszek & Dovidio, 2010; Levis, 2015; Lippi-Green, 1997; Thomson & Derwing, 2014; Trofimovich & Isaacs, 2012).

Whether it takes the form of discrimination, stereotyping and prejudice, or an identity crisis, the stigma of a non-native accent seems to be a paralyzing factor for second language learners, particularly immigrants and learners in ESL settings (Lacey, 2011; Lippi-Green, 1997; Zhao, Ondrich, & Yinger, 2006), and it therefore demands particular attention in the field of second language learning.

Key Terms in this Chapter

Voice Conversion: Voice conversion techniques aim to transform utterances from a source speaker so that they sound as if a target speaker had produced them (Felps, Bortfeld, & Gutierrez-Osuna, 2009).

Automatic Speech Recognition (ASR): This term refers to the use of computer hardware and software-based techniques to identify and process the human voice, in particular to identify the words a person has spoken. It is related to, but distinct from, automatic speaker recognition, which is used to authenticate the identity of the person speaking into the system: while speech recognition aims to recognize the words spoken, the goal of automatic speaker recognition systems is to extract, characterize, and recognize the information in the speech signal that conveys speaker identity.

Stigma: Stigma is defined as an attribute of a person that is intensely discrediting, which in others’ minds reduces that person from a whole and usual person to a tainted, discounted one (Gluszek & Dovidio, 2010).

Accent Conversion: In contrast to voice conversion, accent conversion seeks to transform only those features of an utterance that contribute to accent while maintaining those that carry the identity of the speaker. The goal of accent conversion is to capture the regional accent of the source while preserving the voice quality of the target (Felps, Bortfeld, & Gutierrez-Osuna,

Accent: Generally, accent refers to the distinctive phonological features of a person’s speech, which are influenced by the individual’s mother tongue, social rank, or geographical origin.
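To make the contrast between voice conversion and accent conversion more concrete, here is a minimal, hypothetical Python sketch of one ingredient of accent conversion: transferring prosody (intonation) while keeping voice quality. It assumes each utterance has already been divided into frames and time-aligned one-to-one with a native reference (in practice this requires an alignment step such as dynamic time warping), and that each frame carries a pitch value plus a vector of spectral-envelope features. The names `Frame`, `log_f0_stats`, and `convert_prosody` are illustrative assumptions, not part of the CAPT tool described in this chapter. The sketch re-expresses the native speaker's intonation shape in the learner's own pitch range through a log-F0 mean-and-variance normalization, and copies the learner's spectral features unchanged as a stand-in for the identity-bearing part of the signal. Accent is also carried by segmental (spectral) cues, which this sketch deliberately leaves out.

```python
import math
from dataclasses import dataclass
from statistics import fmean, pstdev


@dataclass
class Frame:
    """One analysis frame of an utterance (hypothetical representation)."""
    f0_hz: float           # fundamental frequency in Hz; 0.0 marks an unvoiced frame
    spectrum: list[float]  # spectral-envelope features, standing in for voice quality


def log_f0_stats(frames: list[Frame]) -> tuple[float, float]:
    """Mean and population standard deviation of log-F0 over voiced frames.

    Raises statistics.StatisticsError if no frame is voiced.
    """
    logs = [math.log(f.f0_hz) for f in frames if f.f0_hz > 0]
    return fmean(logs), pstdev(logs)


def convert_prosody(learner: list[Frame], native: list[Frame]) -> list[Frame]:
    """Impose the native intonation shape on the learner's frames.

    The native log-F0 contour is z-normalized with the native speaker's statistics
    and rescaled with the learner's, so the result follows the native rise and fall
    in the learner's own pitch range. Spectral features are copied from the learner.
    Assumes the two frame lists are already time-aligned one-to-one.
    """
    if len(learner) != len(native):
        raise ValueError("learner and native frames must be aligned one-to-one")
    mu_l, sd_l = log_f0_stats(learner)
    mu_n, sd_n = log_f0_stats(native)
    converted = []
    for lf, nf in zip(learner, native):
        if nf.f0_hz > 0:
            # Position of this native frame within the native pitch distribution.
            z = (math.log(nf.f0_hz) - mu_n) / sd_n if sd_n > 0 else 0.0
            # Map that position into the learner's pitch distribution.
            f0 = math.exp(mu_l + z * sd_l)
        else:
            f0 = 0.0  # voicing decisions follow the native reference in this sketch
        # Keep the learner's spectral features: the identity-bearing part is untouched.
        converted.append(Frame(f0_hz=f0, spectrum=list(lf.spectrum)))
    return converted
```

For example, with learner pitches of 100, 200, and 100 Hz and native pitches of 200, 400, and 200 Hz, the converted contour follows the native rise and fall but stays an octave lower, at approximately 100, 200, and 100 Hz, and every frame keeps the learner's spectral features. Working in the log domain reflects the fact that perceived pitch intervals correspond to frequency ratios rather than differences.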
