1. Introduction
Embedded multimedia systems continue to evolve, offering many solutions that facilitate everyday life. They cover diverse application areas (digital TV, military, medical, mobile cellular, automotive, etc.). Moreover, customer needs keep growing as users expect these systems to meet more daily requirements. To satisfy the constraints imposed by end users, progress in microelectronics technology has provided several solutions. It is now possible to develop complex circuits for embedded systems that integrate many microprocessors (CPUs), memories, buses, and coprocessors (coproc) on the same chip, as in Multiprocessor System on Chip (MPSoC) technology (Wolf, 2008; Vakili, 2010). MPSoC architectures are usually devoted to computation-intensive applications.
Video applications such as H.264/AVC encoding are considered demanding because of their algorithmic complexity. These applications require powerful platforms to achieve real-time processing while respecting circuit area and power consumption constraints. The H.264/AVC standard was developed jointly by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). It integrates various modules to achieve a good trade-off between encoding time and video quality (Ghanbari, 2011; Richardson, 2010; Zrida, 2011). However, this encoding efficiency comes at the cost of increased algorithmic complexity.
Related work has shown that parallel architectures effectively reduce encoder processing time (Tushar, 2012). Consequently, different parallelization methods have been proposed. We distinguish three levels of parallelism for the H.264/AVC encoder: task, component, and data. Task-level parallelism (TLP) assigns different functions of the application to separate CPUs. Component-level parallelism (CLP) splits the processing of the luma and chroma components across separate CPUs. Finally, data-level parallelism (DLP) identifies data structures that can be assigned to separate CPUs. To parallelize H.264/AVC, the data dependencies among the different blocks of the encoder must be taken into account in order to preserve the quality of the reconstructed video. Several studies have demonstrated the effectiveness of MPSoC technology for H.264/AVC (Kulmala, 2008; Amari, 2009). It can meet real-time processing, power consumption, and circuit area constraints (Yan, 2009; Zrida, 2011; Javaid, 2011).
Various parallel algorithms have been proposed for H.264/AVC in previous work, and these partitioning schemes have been implemented on various hardware platforms. In this paper, an efficient Macro Blocks Line Parallelism (MBLLP) is proposed for the intra prediction encoding chain of H.264 to reduce processing time. The proposed parallel algorithm takes into account the data dependencies in the intra prediction and filter modules. This approach is implemented on a new MPSoC architecture based on the SoCLib platform, an open platform for virtual prototyping of MPSoC architectures. An MPSoC architecture requires the memory footprint to be optimized as far as possible. The experimental results show a significant time saving while requiring the smallest memory size compared with other parallelism approaches. The approach therefore respects the memory size constraint, which directly affects the area of the final circuit.
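To make the role of data dependencies in macroblock-line parallelism concrete, the following minimal Python sketch computes a wavefront schedule. It is an illustration under stated assumptions, not the implementation evaluated in this paper: it assumes that each macroblock (MB) takes one time step, that MB lines are distributed round-robin over the CPUs, and that an MB may start only after its left neighbour and the top-right neighbour in the line above are finished (the usual intra prediction neighbourhood). The names `wavefront_schedule` and `makespan` are hypothetical, and the dependencies of the filter module are not modelled separately.

```python
def wavefront_schedule(mb_rows, mb_cols, num_cpus):
    """Return {(row, col): (cpu, start_step)} for a frame of mb_rows x mb_cols MBs.

    Assumptions (illustrative only): one time step per MB, MB line r runs on
    CPU r % num_cpus, and MB (r, c) waits for MB (r-1, c+1) in the line above.
    """
    finish = {}                 # (row, col) -> step at which the MB is done
    schedule = {}
    cpu_free = [0] * num_cpus   # step at which each CPU becomes idle

    for r in range(mb_rows):
        cpu = r % num_cpus
        for c in range(mb_cols):
            # Left neighbour is on the same CPU, so cpu_free already covers it.
            ready = cpu_free[cpu]
            if r > 0:
                # Top-right neighbour (clamped at the last column); once it is
                # done, the top and top-left neighbours are done as well.
                top_right = (r - 1, min(c + 1, mb_cols - 1))
                ready = max(ready, finish[top_right])
            schedule[(r, c)] = (cpu, ready)
            finish[(r, c)] = ready + 1
            cpu_free[cpu] = ready + 1
    return schedule


def makespan(schedule):
    """Total number of time steps needed to process the whole frame."""
    return max(start for _, start in schedule.values()) + 1


if __name__ == "__main__":
    parallel = wavefront_schedule(mb_rows=3, mb_cols=4, num_cpus=3)
    sequential = wavefront_schedule(mb_rows=3, mb_cols=4, num_cpus=1)
    print("parallel steps:", makespan(parallel))
    print("sequential steps:", makespan(sequential))
```

In this simplified model, each MB line starts two steps after the line above it, so the gain from adding CPUs is bounded by the frame width and the dependency pattern rather than by the CPU count alone.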