## 1. Introduction

Layered coding combined with unequal error protection (UEP) is a promising method for video transmission over error-prone channels. Data partitioning^{1} in the H.264/AVC standard^{2} is an effective layered coding technique and is utilized in many UEP schemes, owing to its lower overhead and better error resilience.^{3, 4} With data partitioning, the partitions of a slice differ in their importance for the reconstructed video quality because of their different dependency relationships.

In one group of pictures (GOP), which consists of one intraframe [I-frame or instantaneous decoder refresh (IDR)-frame] and a set of interframes (P-frames and B-frames), the intraframe is the prediction reference for the subsequent interframes and is therefore more important than the interframes. Furthermore, earlier P-frames are more important than later P-frames, and B-frames are the least important when hierarchical B-frames are not used. Based on this fact, some video transmission schemes provide unequal error protection to frames with different encoding types.^{5}

From the prior analysis, the importance measurement at different coded levels, e.g., slice level, frame level, or GOP level, is a prerequisite for designing an effective UEP scheme. In this work, we develop an algorithm for deriving distortion as a metric to evaluate importance, jointly considering the unequal importance of different partitions in one slice and frames with different positions in one GOP. Moreover, a priority sorting method, which employs the characteristics of wireless channels, is proposed.

## 2. Distortion Derivation

According to the definition of data partitioning,^{1, 2} the encoded video data of one slice is rearranged into three different partitions, namely, partitions A, B, and C. Without loss of generality, we adopt a simple slice mode, i.e., one slice per frame, so that the encoded data of one frame is divided into three partitions. The average distortion over one GOP is selected as the metric to assess the importance of different partitions in different frames. A GOP is composed of one I-frame followed by ${N}_{G}-1$ P-frames, where ${N}_{G}$ denotes the total number of frames in one GOP. Due to its importance, the I-frame is assumed to be perfectly protected and correctly received.

Let ${f}_{n}^{i}$ denote the $i$th pixel in frame $n$, and ${\widehat{f}}_{n}^{i}$ and ${\tilde{f}}_{n}^{i}$ denote its corresponding reconstructed pixel at the encoder and the decoder, respectively. Moreover, let ${d}_{n}^{i}$ denote the distortion in terms of the mean squared error (MSE) of ${f}_{n}^{i}$, which can be calculated by:
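A form of Eq. 1 consistent with these definitions, assuming the distortion is the expected squared difference between the original pixel and its decoder reconstruction (the expectation operator $E\{\cdot\}$ is an assumption and reduces to the plain squared difference when the loss pattern is fixed), is:

$${d}_{n}^{i}=E\left\{{\left({f}_{n}^{i}-{\tilde{f}}_{n}^{i}\right)}^{2}\right\}.$$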

To analyze the importance of different partitions in frame $n$ , we derive the distortion of each partition in frame $n$ under the assumption that only the partition under discussion is lost and other partitions in the current frame and all partitions in subsequent frames are error-free. As for the error concealment scheme, the modified temporal replacement is adopted, in which the intracoded macroblock (MB) or intercoded MB with lost motion information is replaced by the colocated MB in the previous frame.

To compute ${d}_{n}^{i}$ using Eq. 1, ${\tilde{f}}_{n}^{i}$ is required. For the determination of ${\tilde{f}}_{n}^{i}$, three cases are considered, depending on which partition in frame $n$ is lost. We refer to an intracoded MB as I-MB, and an intercoded MB as P-MB.

### Partition A is lost

When partition A in frame $n$ is lost, the whole frame is corrupted, because partition B and partition C are dependent on it. Hence, for all pixels in the current frame, the reconstructed pixel ${\tilde{f}}_{n}^{i}$ can be expressed as:
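Under the temporal replacement concealment described above, every pixel of the lost frame is copied from the colocated pixel of the previous decoded frame. A form of Eq. 2 consistent with that description (a reconstruction from the surrounding text, not a verbatim copy of the original display) would be:

$${\tilde{f}}_{n}^{i}={\tilde{f}}_{n-1}^{i}.$$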

### Partition B is lost

When partition B in frame $n$ is lost, the pixels in an I-MB are affected, but the reconstructed pixels in a P-MB can be obtained by motion compensated prediction with the correct motion vector, reference frame, and residual data. Let ${\widehat{f}}_{n-1}^{k}$ denote the pixel from which ${f}_{n}^{i}$ is predicted, and let ${\widehat{e}}_{n}^{i}$ denote the quantized prediction error. Accordingly, ${\tilde{f}}_{n}^{i}$ can be formulated as:
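A form of Eq. 3 consistent with this description, assuming that an I-MB pixel is concealed by the colocated pixel of the previous decoded frame and that a P-MB pixel is decoded normally, would be:

$${\tilde{f}}_{n}^{i}=\begin{cases}{\tilde{f}}_{n-1}^{i}, & i\in \mathrm{I}\text{-}\mathrm{MB}\\ {\widehat{f}}_{n-1}^{k}+{\widehat{e}}_{n}^{i}, & i\in \mathrm{P}\text{-}\mathrm{MB}\end{cases}.$$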

### Partition C is lost

When partition C in frame $n$ is lost, the reconstructed pixels in an I-MB will not be influenced when constrained intraprediction is utilized. Therefore, the reconstructed pixel ${\tilde{f}}_{n}^{i}$ can be computed as:

## Eq. 4

$${\tilde{f}}_{n}^{i}=\begin{cases}{\widehat{f}}_{n}^{i}, & i\in \mathrm{I}\text{-}\mathrm{MB}\\ {\tilde{f}}_{n-1}^{k}, & i\in \mathrm{P}\text{-}\mathrm{MB}\end{cases}.$$

For frames from frame $n+1$ to frame ${N}_{G}$ in the same GOP, a reconstructed pixel in an I-MB will be ${\widehat{f}}_{n+1}^{i}$, and a reconstructed pixel in a P-MB will be ${\tilde{f}}_{n}^{k}+{\widehat{e}}_{n+1}^{i}$, in which ${\tilde{f}}_{n}^{k}$ can be computed recursively by Eqs. 2, 3, and 4.
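The three loss cases and the recursion into later frames can be illustrated with a small pure-Python sketch. It is a toy model under stated assumptions, not the authors' implementation: frames are flat pixel lists, intra/inter status is tracked per pixel rather than per MB, and the names `decode_gop`, `toy_gop`, and the dictionary keys `orig`, `enc`, `res`, `intra`, and `ref` are hypothetical.

```python
from typing import Dict, List, Optional

Frame = Dict[str, list]  # hypothetical layout: orig, enc, res, intra, ref


def _mse(a: List[float], b: List[float]) -> float:
    """Mean squared error between two equally long pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def decode_gop(frames: List[Frame], lost_frame: Optional[int] = None,
               lost_part: Optional[str] = None) -> List[float]:
    """Per-frame MSE when partition `lost_part` of frame `lost_frame` is lost.

    frames[0] is the I-frame (assumed intact); the rest are P-frames.
    Per pixel i: enc[i] is the encoder reconstruction, res[i] the quantized
    residual, intra[i] marks intra-coded pixels, ref[i] is the index k of
    the reference pixel in the previous frame.
    """
    prev = list(frames[0]["enc"])
    mse = [_mse(frames[0]["orig"], prev)]
    for n in range(1, len(frames)):
        f, cur = frames[n], []
        for i in range(len(prev)):
            intra, k = f["intra"][i], f["ref"][i]
            if n == lost_frame and lost_part == "A":
                pix = prev[i]                       # whole frame concealed
            elif n == lost_frame and lost_part == "B":
                pix = prev[i] if intra else prev[k] + f["res"][i]
            elif n == lost_frame and lost_part == "C":
                pix = f["enc"][i] if intra else prev[k]
            else:                                   # frame received intact
                pix = f["enc"][i] if intra else prev[k] + f["res"][i]
            cur.append(pix)
        mse.append(_mse(f["orig"], cur))
        prev = cur                                  # errors propagate via prev
    return mse


def toy_gop() -> List[Frame]:
    """Three 4-pixel frames, zero motion (ref[i] == i), lossless coding."""
    f0 = {"orig": [10, 20, 30, 40], "enc": [10, 20, 30, 40]}
    f1 = {"orig": [12, 22, 32, 42], "enc": [12, 22, 32, 42],
          "res": [0, 2, 2, 2], "intra": [True, False, False, False],
          "ref": [0, 1, 2, 3]}
    f2 = {"orig": [13, 23, 33, 43], "enc": [13, 23, 33, 43],
          "res": [1, 1, 1, 1], "intra": [False] * 4, "ref": [0, 1, 2, 3]}
    return [f0, f1, f2]
```

For this toy GOP, losing partition A, B, or C of frame 1 gives per-frame MSE values of `[0.0, 4.0, 4.0]`, `[0.0, 1.0, 1.0]`, and `[0.0, 3.0, 3.0]`, respectively; the ordering depends entirely on the made-up pixel data and says nothing about real sequences.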

As analyzed above, ${\tilde{f}}_{n}^{i}$ in frame $n$ and in the subsequent frames of the same GOP can be obtained for each of the three cases. Let ${\overline{D}}_{n,1}$, ${\overline{D}}_{n,2}$, and ${\overline{D}}_{n,3}$ denote the average distortion for cases 1, 2, and 3 (loss of partition A, B, and C), respectively, and let $L$ denote the total number of pixels in one frame. Substituting ${\tilde{f}}_{n}^{i}$ into Eq. 1 yields ${d}_{n}^{i}$, and the corresponding average distortions, which are used to assess the importance of the different partitions, are given by:
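One form of Eq. 5 consistent with these definitions is sketched here. The frame index $t$, the summation range (frame $n$ through frame ${N}_{G}$), and the normalization by ${N}_{G}L$ are assumptions made for illustration, with ${d}_{t}^{i}$ evaluated under the loss case $j$:

$${\overline{D}}_{n,j}=\frac{1}{{N}_{G}L}\sum_{t=n}^{{N}_{G}}\sum_{i=1}^{L}{d}_{t}^{i},\qquad j=1,2,3.$$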

## 3. Priority Sorting

Based on the analysis of ${\overline{D}}_{n,1}$, ${\overline{D}}_{n,2}$, and ${\overline{D}}_{n,3}$ for frames from frame 2 to frame ${N}_{G}$, we present a priority sorting method that jointly takes into account the unequal importance of the partitions in one frame and of the P-frames at different positions in one GOP. That is, all partitions of the first $m$ P-frames and partition A of the $n$ P-frames that follow them are labeled as high priority (HP), whereas partition B and partition C of these $n$ P-frames and all partitions of the remaining $p$ P-frames in the GOP are labeled as low priority (LP).

For efficient delivery over wireless channels, the HP coded video data must reach the decoder with a small probability of being affected by bit errors, which provides a basic reconstructed video quality. In wireless channels employing orthogonal frequency division multiplexing (OFDM), each subcarrier experiences a different degree of fading attenuation, which can be used to categorize the subcarriers into two groups. Subcarriers with a signal-to-noise ratio (SNR) above ${S}_{0}$ are assigned to the high quality (HQ) subchannel group; the remaining subcarriers are assigned to the low quality (LQ) subchannel group. The threshold ${S}_{0}$ is determined by the bit error rate (BER) requirement of the HP coded video data. Let ${C}_{H}$ and ${C}_{L}$ denote the available transmission capacity provided by the HQ and LQ subchannel groups during ${N}_{G}-1$ P-frame intervals, respectively, and let ${C}_{0}$ denote the total number of coded bits of partition A over all P-frames. Moreover, let ${b}_{i,1}$, ${b}_{i,2}$, and ${b}_{i,3}$ denote the total coded bits for partitions A, B, and C in frame $i$, respectively, and let ${b}_{i,0}={b}_{i,2}+{b}_{i,3}$ for simplicity. Then $m$, $n$, and $p$, which determine the priority of the coded video data in all P-frames, should satisfy the following constraints:

## Eq. 6

$$\begin{cases}m+n+p={N}_{P}\\ {C}_{0}\le \sum_{i=1}^{m}({b}_{i,1}+{b}_{i,0})+\sum_{j=m+1}^{m+n}{b}_{j,1}\le {C}_{H}\\ \sum_{j=m+1}^{m+n}{b}_{j,0}+\sum_{k={N}_{P}-p+1}^{{N}_{P}}({b}_{k,1}+{b}_{k,0})\le {C}_{L}\end{cases}$$

where ${N}_{P}={N}_{G}-1$ is the number of P-frames in one GOP. Given ${C}_{H}$, ${C}_{L}$, and the video encoding parameters, $m$, $n$, and $p$ can be determined by a heuristic algorithm summarized as follows:

1. Calculate ${C}_{0}$ and subtract it from ${C}_{H}$, and then initialize $m$ and $p$ to one.

2. Compare ${\sum}_{i=1}^{m}{b}_{i,0}$ with the remaining transmission capacity supported by the HQ subchannel group, i.e., ${C}_{H}-{C}_{0}$. If ${\sum}_{i=1}^{m}{b}_{i,0}$ is less than or equal to the latter, increase $m$ by one and repeat step 2; otherwise go to step 3.

3. Similarly, compare ${\sum}_{i=1}^{m}{b}_{i,0}$ with ${C}_{H}-{\sum}_{k=1}^{{N}_{P}-p}{b}_{k,1}$. If ${\sum}_{i=1}^{m}{b}_{i,0}$ is less than or equal to the latter, increase $m$ by one and repeat step 3; otherwise go to step 4.

4. Increase $p$ by one and compare ${\sum}_{i=1}^{m}{b}_{i,0}$ with the rest of the transmission capacity provided by the HQ subchannel group. If ${\sum}_{i=1}^{m}{b}_{i,0}$ is less than or equal to the latter, increase $m$ by one and repeat step 4; otherwise the procedure is finished.
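As a cross-check on the constraints of Eq. 6, the following sketch enumerates every split $(m, n, p)$ and keeps a feasible one that places the most bits on the HQ group. It is deliberately a brute-force search rather than the heuristic listed above, and the function name `choose_split`, its argument names, and the "maximize HP load" tie-breaking rule are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def choose_split(b_a: List[int], b_bc: List[int],
                 c_h: int, c_l: int) -> Optional[Tuple[int, int, int]]:
    """Return (m, n, p) satisfying Eq. 6 with the largest HP load, or None.

    b_a[i]  : coded bits of partition A in P-frame i (b_{i,1})
    b_bc[i] : coded bits of partitions B and C in P-frame i (b_{i,0})
    c_h, c_l: capacities of the HQ and LQ subchannel groups
    """
    n_p = len(b_a)
    c_0 = sum(b_a)                       # all partition-A bits of the GOP
    best, best_load = None, -1
    for m in range(n_p + 1):
        for n in range(n_p - m + 1):
            p = n_p - m - n              # first constraint: m + n + p = N_P
            # HP load: all partitions of the first m frames plus
            # partition A of the next n frames.
            hp = sum(b_a[:m]) + sum(b_bc[:m]) + sum(b_a[m:m + n])
            # LP load: partitions B and C of the n middle frames plus
            # all partitions of the last p frames.
            lp = sum(b_bc[m:m + n]) + sum(b_a[m + n:]) + sum(b_bc[m + n:])
            if c_0 <= hp <= c_h and lp <= c_l and hp > best_load:
                best, best_load = (m, n, p), hp   # ties keep the first hit
    return best
```

With six P-frames of 100 bits for partition A and 200 bits for partitions B and C each, `choose_split([100] * 6, [200] * 6, 1000, 1200)` returns `(2, 4, 0)`. For the GOP sizes considered here the exhaustive search is cheap; the heuristic above reaches a split without enumerating all candidates.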

## 4. Experimental Results

We first examine the performance of the distortion derivation algorithm. The sequences “Grandma” and “Silent” in QCIF format with a frame rate of $30\,\mathrm{Hz}$ were encoded using H.264/AVC reference software JM15.1. The GOP consisted of one I-frame and ${N}_{G}-1$ P-frames without B-frames, where ${N}_{G}$ was set to 15. In addition, the UseConstrainedIntraPred option was enabled, and the quantization parameter (QP) was fixed at 28 for all experiments.

Figure 1 shows the distortion of all subsequent frames versus frame index when partition A of the first P-frame is lost for “Silent.” From Fig. 1, we can observe that the derived distortion closely follows the simulated distortion, so the distortion determined by Eq. 5 can be used as the metric to evaluate the importance of different partitions of P-frames in one GOP.

Next, the performance of the priority sorting method is verified. Five video sequences of 300 frames, “Mother&Daughter,” “Grandma,” “Salesman,” “Silent,” and “Foreman,” were encoded with the same parameters used in the first set of experiments. The BER of the HP coded video data was assumed to be ${10}^{-6}$, the BER of the LP data was ${10}^{-3}$ or ${10}^{-2}$, and ${C}_{H}$ and ${C}_{L}$ satisfied ${C}_{H}:{C}_{L}=1:2$. For comparison, the ratios of $m$, $n$, and $p$ to ${N}_{P}$ were fixed at $1/6$, $2/3$, and $1/6$, respectively.

The proposed priority sorting scheme is compared with data partitioning in H.264/AVC at the frame level. The average peak signal-to-noise ratio (PSNR) for all sequences and the PSNR versus the frame number for the sequence “Silent” are given in Table 1 and Fig. 2, respectively. Table 1 and Fig. 2 show that the proposed method improves the reconstructed quality owing to its better error resilience and adaptability, especially for sequences with medium motion.
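For reference, PSNR is related to the MSE distortion used in the derivation by the standard definition $10\log_{10}(\mathrm{peak}^{2}/\mathrm{MSE})$. The sketch below assumes 8-bit samples (peak value 255); the function name `mse_to_psnr` is hypothetical and the code is not taken from the experimental setup.

```python
import math


def mse_to_psnr(mse: float, peak: float = 255.0) -> float:
    """Convert a mean squared error to PSNR in dB (8-bit samples assumed)."""
    if mse < 0:
        raise ValueError("MSE must be non-negative")
    if mse == 0:
        return math.inf                  # identical frames: unbounded PSNR
    return 10.0 * math.log10(peak * peak / mse)
```

Because of the logarithm, halving the MSE raises the PSNR by about 3 dB, which helps when reading the ΔPSNR values in the table.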

## Table 1

Average PSNR comparison of the proposed scheme against data partitioning (all values in dB).

| Sequence | Original (LP BER $10^{-3}$) | Proposed (LP BER $10^{-3}$) | ΔPSNR (LP BER $10^{-3}$) | Original (LP BER $10^{-2}$) | Proposed (LP BER $10^{-2}$) | ΔPSNR (LP BER $10^{-2}$) |
|---|---|---|---|---|---|---|
| Mother and Daughter | 36.66 | 36.89 | $+0.23$ | 35.62 | 36.29 | $+0.67$ |
| Grandma | 36.62 | 36.66 | $+0.04$ | 35.98 | 36.20 | $+0.22$ |
| Salesman | 35.02 | 35.25 | $+0.23$ | 33.72 | 34.62 | $+0.90$ |
| Silent | 32.19 | 32.74 | $+0.55$ | 30.63 | 31.72 | $+1.09$ |
| Foreman | 29.95 | 31.47 | $+1.52$ | 28.66 | 31.09 | $+2.43$ |
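The ΔPSNR columns can be recomputed from the Original and Proposed columns with a few lines of Python. The values below are copied from Table 1; the names `TABLE_1`, `psnr_gains`, and `mean_gains` are hypothetical.

```python
from typing import Dict, Tuple

# (original, proposed) PSNR in dB at LP BER 1e-3, then at LP BER 1e-2.
TABLE_1: Dict[str, Tuple[float, float, float, float]] = {
    "Mother and Daughter": (36.66, 36.89, 35.62, 36.29),
    "Grandma": (36.62, 36.66, 35.98, 36.20),
    "Salesman": (35.02, 35.25, 33.72, 34.62),
    "Silent": (32.19, 32.74, 30.63, 31.72),
    "Foreman": (29.95, 31.47, 28.66, 31.09),
}


def psnr_gains(table: Dict[str, Tuple[float, float, float, float]]
               ) -> Dict[str, Tuple[float, float]]:
    """Per-sequence PSNR gain (proposed minus original) at both LP BERs."""
    return {name: (round(p3 - o3, 2), round(p2 - o2, 2))
            for name, (o3, p3, o2, p2) in table.items()}


def mean_gains(table: Dict[str, Tuple[float, float, float, float]]
               ) -> Tuple[float, float]:
    """Gain averaged over all sequences at LP BER 1e-3 and 1e-2."""
    k = len(table)
    g3 = sum(p3 - o3 for o3, p3, _, _ in table.values()) / k
    g2 = sum(p2 - o2 for _, _, o2, p2 in table.values()) / k
    return round(g3, 3), round(g2, 3)
```

The recomputed per-sequence gains agree with the ΔPSNR columns, and the gain averaged over the five sequences is about 0.51 dB at an LP BER of $10^{-3}$ and about 1.06 dB at $10^{-2}$.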

## 5. Conclusions

A distortion derivation model is proposed by jointly considering the unequal importance of different partitions in one slice and each frame in one GOP. Furthermore, a heuristic algorithm is designed to determine priority sorting. Simulation results show that the distortion derivation model matches the actual distortion well, and the introduced priority sorting method improves the reconstructed video quality under error-prone environments.

## Acknowledgments

This research was supported by the National Natural Science Foundation Research Program of China, numbers 60772134 and 60902081, and the 111 project (B08038). We would like to thank the editors and anonymous reviewers for their valuable comments and suggestions.

## References

“*H.264/AVC over IP*,” IEEE Trans. Circuits Syst. Video Technol., 13 (7), 645–656 (2003). https://doi.org/10.1109/TCSVT.2003.814966

“*Prioritized transmission of data partitioned H.264 video with hierarchical QAM*,” IEEE Signal Process. Lett., 12 (8), 577–580 (2005). https://doi.org/10.1109/LSP.2005.851261

“*Toward an improvement of H.264 video transmission over IEEE 802.11e through a cross-layer architecture*,” IEEE Commun. Mag., 44 (1), 107–114 (2006). https://doi.org/10.1109/MCOM.2006.1580940

“*Unequal error protection for MPEG-2 video transmission over wireless channels*,” Signal Process. Image Commun., 19 (1), 67–79 (2004). https://doi.org/10.1016/j.image.2003.08.018