1. Introduction
Traditionally, communication systems have been designed using a layered architecture. Each layer accomplishes a well-defined set of tasks and communicates with the layer immediately above and the layer immediately below through well-defined interfaces. The layered architecture reduces the complexity of systems and simplifies the design because it allows the designers to focus on optimizing each layer.
However, this architecture is not always optimal. In some scenarios, optimizing a given layer without taking other layers into consideration may result in a loss of efficiency. One example is the behavior of congestion-avoidance protocols in networks where the physical medium is a wireless channel with fading. Another example is resource allocation in MIMO systems, where it is no longer possible for the MAC and PHY layers to coordinate through a few simple metrics and the allocation needs to be performed jointly.
The transmission of audio over wireless networks is one case where a cross-layer approach may prove advantageous over a purely layered one (Chen, Liu, Wu, and Chen, 2011). Currently, state-of-the-art wireless systems use Adaptive Modulation and Coding (AMC) in the physical layer to transport bit streams with a Bit-Error Rate (BER) not exceeding a value specified by upper layers. To maintain the target BER, the rate of the raw bit stream is adapted as the quality of the channel changes. When the channel quality is low because of fading and/or interference, or when more users enter a system with limited resources, the data rate is reduced. Conversely, more data can be transmitted in a given time slot when the Signal-to-Noise Ratio (SNR) is high or when more bandwidth can be allocated to a given user.
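As a rough illustration of the rate adaptation described above, the following sketch picks the densest constellation whose estimated BER stays below the target. It is not the scheme of any particular system: it assumes uncoded square M-QAM over an AWGN channel, uses the common approximation BER ≈ 0.2·exp(−1.5·SNR/(M−1)), and the function names and candidate set are hypothetical.

```python
import math

def mqam_ber(snr_db, bits_per_symbol):
    """Approximate BER of uncoded M-QAM on an AWGN channel.

    Assumption: uses the approximation 0.2 * exp(-1.5 * snr / (M - 1)),
    which is only a rough estimate and ignores channel coding.
    """
    snr = 10 ** (snr_db / 10)          # convert dB to linear scale
    m = 2 ** bits_per_symbol           # constellation size M
    return 0.2 * math.exp(-1.5 * snr / (m - 1))

def select_bits_per_symbol(snr_db, target_ber, candidates=(2, 4, 6)):
    """Return the highest-rate candidate that meets the target BER.

    Returns 0 when no candidate meets the target (no transmission).
    The candidate set (4-, 16-, 64-QAM) is an illustrative assumption.
    """
    feasible = [b for b in candidates if mqam_ber(snr_db, b) <= target_ber]
    return max(feasible) if feasible else 0

if __name__ == "__main__":
    for snr_db in (0, 15, 20, 25):
        print(snr_db, "dB ->", select_bits_per_symbol(snr_db, 1e-3), "bits/symbol")
```

Under these assumptions the selected rate rises with the SNR and falls when the channel degrades, which is the behavior the paragraph above attributes to AMC.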
In live audio transmission, if it is important that the audio quality remain the same throughout transmission, the user will need to boost its power if the SNR decreases, resulting in higher power consumption. Moreover, the system may be forced to deny service to users requesting transmission in order to maintain the data rates of existing users. An alternative approach is to drop the quality of the transmitted audio signal. In some cases, such as audio broadcasting in home networks, this may be highly undesirable.
In other applications, it might be preferable to maintain a given stream rate, even if this results in some degradation of the audio or some periods of silence (e.g. secured low bit-rate voice or video transmissions). Moreover, in applications where some delay can be tolerated, and, therefore, interleaving can be used either in the audio encoder or by physical layer algorithms, it may not be necessary to change the data rate.
When AMC is used, the adaptation is based on appropriate metrics. Typically, the metric used is the BER that the channel can offer at a given instant. The BER is a function not only of the channel condition but also of the transmission rate. However, when the physical medium is used to transport audio, it might be possible to base the adaptation decision on other metrics. For example, if some amount of audio buffering is used, it might be possible to delay switching to a lower rate, as the channel may improve before the buffer empties. For this decision, metrics such as the size of the buffer and the statistics of the channel could be considered. Even if a larger BER cannot be avoided, it might be better to maintain the same transmission rate at the physical layer if the audio decoder can conceal the additional errors. Bandwidth savings, latency, and packet loss for different options are studied in (Saldana, Fernandez-Navajas, Ruiz-Mas, Murillo, Viruete-Navarro, and Aznar, 2012). In this paper, the Mean Opinion Score (MOS) is used as the metric (ITU-T Rec. P.862, 2001), (ITU-T Rec. P.862.1, 2003), (ITU-T Rec. P.863, 2011). The MOS quantifies the quality of the audio as perceived by a listener. It can be obtained via listening tests by experts, or through objective testing by means of speech transmission quality algorithms (ITU-T Rec. P.862, 2001), (ITU-T Rec. P.862.1, 2003), (ITU-T Rec. P.863, 2011).
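The buffer-based decision mentioned above (delaying a rate switch because the channel may recover before the buffer empties) can be sketched as follows. This is only a toy model, not the method of this paper: it assumes that fade durations are exponentially distributed with a known mean and that no audio is delivered during a fade. The function names and the 0.9 confidence threshold are hypothetical.

```python
import math

def prob_fade_ends_before_underrun(buffered_seconds, mean_fade_seconds):
    """Probability that a fade ends before the playout buffer drains.

    Assumption: fade duration is exponentially distributed with the
    given mean, and the buffer drains in real time during the fade.
    """
    if mean_fade_seconds <= 0:
        return 1.0  # no fading expected, so the buffer never underruns
    return 1.0 - math.exp(-buffered_seconds / mean_fade_seconds)

def should_hold_rate(buffered_seconds, mean_fade_seconds, min_confidence=0.9):
    """Keep the current rate if the channel is likely to recover in time."""
    p = prob_fade_ends_before_underrun(buffered_seconds, mean_fade_seconds)
    return p >= min_confidence

if __name__ == "__main__":
    print(should_hold_rate(0.5, 0.1))   # deep buffer relative to the fade
    print(should_hold_rate(0.1, 0.1))   # shallow buffer relative to the fade
```

The sketch combines the two metrics named in the text, the amount of buffered audio and a statistic of the channel, into a single hold-or-switch decision; a real system would need a fading model validated against measurements.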
Our aim is to investigate whether the common practice of adapting the physical layer parameters is beneficial when the goal is to maintain a given MOS rather than a given BER of the encoded audio bit stream. This is motivated by the observation (Slavata & Holub, 2014) that, although larger BER values introduce more bit errors into the encoded audio stream, keeping the audio rate constant instead of reducing it may allow the audio decoder to better conceal the effect of the additional errors on the quality of the audio.