This paper compares two systems that simultaneously decode multiple videos on a simple CPU with dedicated function-level hardware accelerators. The first system is implemented in the traditional way: the decoder instances access the accelerators concurrently, without external coordination. The second implementation coordinates the tasks' accelerator accesses by scheduling. The solutions are compared in terms of execution cycles, energy consumption, and cache hit ratios. In the traditional solution, each decoder task continuously requests access to the hardware accelerators it needs. Since the other tasks compete for the same resources, however, a task must often yield and wait for its turn, which reduces energy efficiency.
The scheduling-based approach assumes that the accelerator latencies are deterministic and assigns a time slot to each accelerator access required by each task. The accelerator access schedule is redesigned for each macroblock at run time, which avoids over-allocating resources and improves energy efficiency. Deterministic accelerator latencies ensure that the CPU is not interrupted when an accelerator finishes. The contribution of this study is the comparison of this accelerator timing solution against the traditional approach.
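The time-slot idea can be illustrated with a minimal sketch. All names, data shapes, and the accelerator labels below are illustrative assumptions, not the paper's implementation: because each accelerator's latency is deterministic, the start and end slot of every access can be computed up front, so no task needs to poll or be interrupted.

```python
def schedule_accesses(requests):
    """Assign non-overlapping time slots to accelerator accesses.

    requests: list of (task, accelerator, latency) tuples in arrival order,
              where latency is the accelerator's deterministic cycle count.
    Returns:  list of (task, accelerator, start_slot, end_slot) tuples.
    """
    next_free = {}   # accelerator -> first free time slot
    schedule = []
    for task, acc, latency in requests:
        start = next_free.get(acc, 0)
        schedule.append((task, acc, start, start + latency))
        # Deterministic latency: the completion slot is known in advance,
        # so the CPU never has to handle a completion interrupt.
        next_free[acc] = start + latency
    return schedule

# Hypothetical example: two decoder tasks share an "idct" accelerator,
# so their accesses are serialized; "mc" is uncontended.
sched = schedule_accesses([("dec0", "idct", 4),
                           ("dec1", "idct", 4),
                           ("dec1", "mc", 6)])
# → [('dec0', 'idct', 0, 4), ('dec1', 'idct', 4, 8), ('dec1', 'mc', 0, 6)]
```

In the paper's scheme such a schedule would be rebuilt for every macroblock, so slots are only reserved for the accesses that macroblock actually needs.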