Block boundary matching algorithm for generating side information in distributed video codec

Abstract. The distributed video codec (DVC) was developed to provide a simple encoder for mobile multimedia communication by applying the information theory of distributed sources. In the DVC codec, an efficient algorithm for generating side information (SI) is one of the most important techniques for improving coding performance. We propose a three-step scheme to increase the quality of the SI frame. In the first step, a temporary SI frame is constructed by motion estimation and motion compensation in the DVC decoder. In the second step, the blocks in the temporary SI frame are classified as reliable or unreliable. In the third step, the unreliable blocks are updated by a block boundary matching algorithm. Simulation results show that the proposed algorithm outperforms the conventional methods significantly. In addition, the proposed scheme can be combined with conventional SI generation schemes to increase the coding performance of the DVC codec.


Introduction
In the last few years, a variety of mobile multimedia devices have been developed and have become popular. End-users would like to capture various scenes and send them through up-link channels. Usually, the capability of these mobile devices is constrained by low power and low CPU speed. In this circumstance, conventional video codecs such as MPEG-2, H.264/AVC, and MPEG-4 are not appropriate for portable devices because they need high power to encode video sequences. Thus, the demand for a simple, low-power video encoder has been continuously increasing. The distributed video codec (DVC) is one solution for encoding video sequences with low complexity on the encoder side. The complexity of the DVC encoder is significantly lower than that of the conventional encoders. The DVC codec is regarded as an advanced codec that is appropriate for low-power applications such as wireless surveillance, sensor networks, and mobile camera phones.
DVC was developed on the basis of the Slepian-Wolf [1] and Wyner-Ziv [2] theorems, which proved that two signals encoded without prediction between them on the encoder side can be decoded with prediction on the decoder side. A theoretical method for lossless coding with side information (SI) was presented in Ref. 1. This method is referred to as Slepian-Wolf coding and is used with a channel-coding scheme. Slepian-Wolf coding was extended to lossy compression by Wyner and Ziv. [2,3] The Wyner-Ziv codec consists of a quantization module, a channel coding module (such as a turbo code or a low-density parity-check accumulate code), a puncturing matrix, and an SI generator. Most DVC codecs have been developed based on the models [4,5] provided by Stanford University and the University of California, Berkeley. The UC Berkeley model is named power-efficient, robust, high-compression, syndrome-based multimedia (PRISM) coding. [5] In Ref. 6, a project called distributed coding for video services (DISCOVER) was carried out to construct a low-power video codec based on the Slepian-Wolf and Wyner-Ziv theorems. In the DISCOVER codec, key frames are encoded with a conventional intra-coding technique (the intra-mode of H.264/AVC). The Wyner-Ziv frames, on the other hand, are split into nonoverlapping blocks, and the blocks are transformed and quantized. The quantized coefficients are ordered bit plane by bit plane and fed into a systematic channel encoder.
Qing et al. [7] proposed a model for the correlation of noise statistics, which was utilized to increase coding performance. The coding performances of various DVC codecs were analyzed in Ref. 8, where motion vectors (MVs) were estimated with subpixel resolution to generate SI. In Refs. 9-11, enhanced SI was generated using various algorithms. Petrazzuoli et al. [9] proposed a method to generate the SI in which more than two intra-decoded frames are used to estimate the position of the current MC block. A bilateral ME-based scheme to generate SI was described in Ref. 10. In Ref. 11, SI was generated using a side matching algorithm. Among the tools for increasing the performance of the DVC codec, an efficient scheme to generate the SI frame is one of the most important because it strongly affects both the picture quality in the DVC decoder and the encoding rate. In this paper, we propose a scheme that generates the SI frame using block boundary matching after analyzing the coding blocks in the initial SI frame.
The main functions of the SI generation module are motion estimation (ME) and motion compensation (MC). Thus, it is important to estimate MVs efficiently in the DVC decoder. In this paper, we propose an efficient scheme to estimate MVs that takes the specific properties of MVs in the DVC decoder into account.
This paper is organized as follows. In Sec. 2, the model for the DVC codec is formulated. The proposed scheme is explained in Sec. 3. Simulation results are presented in Sec. 4. Section 5 concludes this paper.

Wyner-Ziv Codec Model
Figure 1 shows a video communication system that incorporates the Wyner-Ziv (WZ) codec. In this system, the odd frames $\{X_{2i-1};\ i = 1, 2, 3, \ldots\}$ are encoded using the intra-coding mode of the H.264/AVC standard, [18] while the even frames $\{X_{2i};\ i = 1, 2, 3, \ldots\}$ are encoded using the WZ codec, which consists of a Slepian-Wolf coding module and an outer quantizer-reconstruction pair. The data generated from encoding the intra-frame $X_{2i-1}$ and the WZ frame $X_{2i}$ are transmitted separately over independent channels. We assume that the channel for the intra-frames is robust enough to prevent channel errors. $\{X'_{2i-1};\ i = 1, 2, 3, \ldots\}$ and $\{X'_{2i};\ i = 1, 2, 3, \ldots\}$ are the decoded frames on the decoder side.
When a WZ frame is encoded on the encoder side, the frame is split into nonoverlapping 8 × 8 blocks, and the blocks are transformed by the discrete cosine transform (DCT). The transformed frame is denoted by $T_{2i}$. The DCT-transformed frame is quantized by a scalar quantizer. The quantized frame is denoted by $Q_{2i}$, where each quantized datum is represented by a binary index. The binary indexes are encoded using a turbo coder to protect them from channel errors. The turbo encoder changes each binary index into another binary vector that consists of information bits and redundant bits. Among the resulting bits, only the redundant bits are transmitted to the decoder; the information bits are not sent. The set of binary vectors that the turbo encoder generates is denoted by $P_{2i}$.
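The transform, quantization, and bit-plane ordering described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the codec's actual implementation: the function names, the quantizer step size `step`, the index width `num_bits`, and the offset-and-clip mapping to unsigned indexes are all assumptions made for the example, and the turbo coding stage is omitted.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def basis(k, x):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        return scale * math.cos(math.pi * (2 * x + 1) * k / (2 * n))

    return [[sum(basis(u, y) * basis(v, x) * block[y][x]
                 for y in range(n) for x in range(n))
             for v in range(n)]
            for u in range(n)]


def quantize(coeffs, step=16, num_bits=4):
    """Uniform scalar quantizer mapping each coefficient to an unsigned index.

    `step` and `num_bits` are illustrative values, not taken from the paper.
    Indexes are offset by 2^(num_bits-1) and clipped to [0, 2^num_bits - 1].
    """
    offset, top = 1 << (num_bits - 1), (1 << num_bits) - 1
    return [[min(max(round(c / step) + offset, 0), top) for c in row]
            for row in coeffs]


def bit_planes(indexes, num_bits=4):
    """Split the index array into bit planes, most significant plane first."""
    return [[[(q >> b) & 1 for q in row] for row in indexes]
            for b in range(num_bits - 1, -1, -1)]
```

Each bit plane returned by `bit_planes` would then be passed to the channel encoder, which keeps only the redundant (parity) bits for transmission.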
On the decoder side, the odd frames $\{X'_{2i-1};\ i = 1, 2, 3, \ldots\}$ are reconstructed by the H.264/AVC decoder in intra-mode, while the even frames $\{X'_{2i};\ i = 1, 2, 3, \ldots\}$ are produced by the WZ decoder. In decoding WZ frames, the turbo decoder needs both redundant and information bits to generate the quantized frame $Q'_{2i}$. Because the information bits have not been sent from the encoder, the turbo decoder must use predicted values for them. In Fig. 1, $QT(Y_{2i})$ is used instead of the information bits, where $Y_{2i}$ is the SI frame. $Y_{2i}$ is constructed using ME and MC on the previous and next intra-frames, $X'_{2i-1}$ and $X'_{2i+1}$, which have been reconstructed by the H.264/AVC decoder. $QT(Y_{2i})$ results from applying the DCT to $Y_{2i}$ and quantizing the result. After the turbo decoder has generated $Q'_{2i}$, the decoded WZ frame $X'_{2i}$ is reconstructed by inverse quantizing $Q'_{2i}$ and then applying the inverse DCT. Because $X'_{2i}$ is made using $QT(Y_{2i})$, the quality of $X'_{2i}$ depends on the quality of $Y_{2i}$.
Therefore, various studies [9-11] have been performed to increase the quality of the SI frame. Petrazzuoli et al. [9] proposed a method to generate the SI in which more than two intra-decoded frames are used to estimate the position of the current MC block. In the first step of Ref. 9, temporary MVs are estimated from $X'_{2i+1}$ to $X'_{2i-1}$. Then, the half-sized vectors of the temporary MVs are used as MVs for the corresponding blocks in the SI frame. This step provides a temporary SI frame. In the next step, new forward and backward MVs are estimated from $Y_{2i}$ to $X'_{2i+1}$ and $X'_{2i-1}$, respectively. Based on the forward and backward MVs, the SI frame is refined. In Ref. 10, a bilateral ME-based scheme to generate the SI frame was proposed, where side matching distortion was used in the ME process. In the initial step of Ref. 10, seed blocks were selected to improve the performance of the ME process, which increases the quality of the SI frame. Ko et al. [11] proposed an algorithm to generate the SI frame using a side matching algorithm, where blocks in the SI frame were refined by considering the mismatch error between the template of the current block and the corresponding pixels in the intra-frames $X'_{2i+1}$ and $X'_{2i-1}$.
In this paper, we make an initial SI frame using ME and MC processes that are not constrained to a specific algorithm; any of the conventional ME and MC schemes can be used in this step. Because the quality of some blocks in the initial SI frame may be very low, the blocks are refined according to their reliabilities. The reliability of each block is calculated using the concentration ratio of the MVs of the neighboring blocks.

Proposed Algorithm
The proposed scheme consists of three steps to generate the SI frame. In the first step, the initial SI frame is constructed using forward, backward, and bidirectional ME and MC between $X'_{2i-1}$ and $X'_{2i+1}$. In the second step, the blocks in the SI frame are analyzed. Finally, in the third step, the SI frame is refined using the information generated in the second step.

First Step: Generating the Initial SI
In the first step, a temporary SI frame is constructed by ME and MC for $X'_{2i-1}$ and $X'_{2i+1}$. After temporary backward and forward MVs are estimated from $X'_{2i+1}$ to $X'_{2i-1}$ and from $X'_{2i-1}$ to $X'_{2i+1}$, respectively, the first temporary SI frame is constructed by MC based on the temporary MVs. Then, new forward and backward MVs are estimated from the first temporary SI frame to $X'_{2i+1}$ and $X'_{2i-1}$, respectively. The second temporary SI frame is constructed by MC based on the new forward and backward MVs.
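As a rough illustration of how a temporary SI frame can be interpolated from two key frames, the sketch below uses a simplified symmetric (bilateral) full search followed by averaging of the two motion-compensated blocks. It is a stand-in under stated assumptions, not the two-pass procedure described above: the function names, the SAD matching criterion, integer-pixel vectors, and the `search` range are all choices made for the example.

```python
def block(frame, y, x, n):
    """Return the n x n block whose top-left pixel is (y, x)."""
    return [row[x:x + n] for row in frame[y:y + n]]


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def initial_si(prev, nxt, n=8, search=4):
    """Temporary SI frame from symmetric block matching between key frames.

    For each n x n block, the vector (dy, dx) is chosen so that the block at
    (pos - v) in `prev` best matches the block at (pos + v) in `nxt`; the SI
    block is the average of the two. Frame sizes must be multiples of n.
    Returns the SI frame and a dict (block_row, block_col) -> (dy, dx).
    """
    h, w = len(prev), len(prev[0])
    si = [[0.0] * w for _ in range(h)]
    mvs = {}
    for y in range(0, h, n):
        for x in range(0, w, n):
            best = None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    ys, xs = (y - dy, y + dy), (x - dx, x + dx)
                    if (min(ys) < 0 or min(xs) < 0
                            or max(ys) + n > h or max(xs) + n > w):
                        continue  # candidate block leaves the frame
                    a = block(prev, ys[0], xs[0], n)
                    b = block(nxt, ys[1], xs[1], n)
                    cost = sad(a, b)
                    if best is None or cost < best[0]:
                        best = (cost, (dy, dx), a, b)
            _, mv, a, b = best  # the zero vector is always a valid candidate
            mvs[(y // n, x // n)] = mv
            for i in range(n):
                for j in range(n):
                    si[y + i][x + j] = (a[i][j] + b[i][j]) / 2.0
    return si, mvs
```

The per-block vectors returned here play the role of the MV field that the second step analyzes; any other conventional ME/MC scheme could supply that field instead.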
Figure 2 shows the relationship between the two key frames ($X'_{2i-1}$, $X'_{2i+1}$) and the second temporary SI frame ($Y_{2i}$). In $Y_{2i}$, a block $B_{s,t}$ of size N × N is constructed by ME and MC, where $s$ and $t$ denote the horizontal and vertical indexes of the block in $Y_{2i}$. $MV^{0}_{s,t}$ and $MV^{1}_{s,t}$ are the MVs estimated for $X'_{2i-1}$ and $X'_{2i+1}$, respectively, to reconstruct $B_{s,t}$. The superscripts 0 and 1 indicate backward and forward data, respectively. The horizontal and vertical components of $MV^{l}_{s,t}$ are denoted by $MV^{l}_{s,t}(x)$ and $MV^{l}_{s,t}(y)$, respectively, where the superscript $l$ is 0 or 1. As can be seen from previous studies, [9-11] it is difficult for the temporary SI frame to achieve high quality because the ME and MC modules generate some poor-quality blocks.

Second Step: Evaluating the Reliability
In the second step, the reliability of each $B_{s,t}$ is evaluated. If a block $B_{s,t}$ has high quality, then its MVs are highly correlated with those of the neighboring blocks. To describe the neighboring blocks, we define two sets, $\Omega = \{B_{s+m,t+n};\ (m,n) = (-1,-1), (-1,0), (-1,1), (0,-1), (0,1), (1,-1), (1,0), (1,1)\}$ and $\Phi = \Omega \cup \{B_{s,t}\}$. $\Omega$ is the set of the neighboring blocks of the current block $B_{s,t}$. Adding the current block $B_{s,t}$ to $\Omega$ results in the extended set $\Phi$. Note that the numbers of blocks in $\Omega$ and $\Phi$ are 8 and 9, respectively. If the variance of the MVs of the blocks in $\Phi$ is smaller than that of the blocks in $\Omega$, then adding the MVs of $B_{s,t}$ to the set of MVs of the blocks in $\Omega$ increases the concentration of the MVs. This case implies that the quality of $B_{s,t}$ is high and the block is reliable.
The variance of the MVs can be evaluated using the eigenvalues of the covariance matrix of the MVs. The covariance matrices of the MVs related to $\Omega$ and $\Phi$ are denoted by $\Upsilon^{l}_{\Omega}$ and $\Upsilon^{l}_{\Phi}$, respectively. With the eigenvalues of these matrices denoted by $\lambda_1(\Upsilon^{l}_{\Omega})$, $\lambda_2(\Upsilon^{l}_{\Omega})$, $\lambda_1(\Upsilon^{l}_{\Phi})$, and $\lambda_2(\Upsilon^{l}_{\Phi})$, respectively, the reliability $J_{s,t}$ of the block $B_{s,t}$ is defined by Eq. (9). If the block $B_{s,t}$ is made using ME and MC from only the previous intra-coded frame $X'_{2i-1}$ or only the next frame $X'_{2i+1}$, then the reliability of Eq. (9) reduces to Eq. (10) or Eq. (11), respectively. In Eqs. (9)-(11), as the value of $|\lambda_1(\Upsilon^{l}_{\Phi})| + |\lambda_2(\Upsilon^{l}_{\Phi})|$ becomes smaller, the MVs related to $\Phi$ become more locally concentrated. Therefore, $J_{s,t} > 1$ refers to the case in which adding $MV^{l}_{s,t}$ to the MVs related to $\Omega$ results in an increased concentration of the MVs related to $\Phi$. This implies that if the $J_{s,t}$ of a block $B_{s,t}$ is larger than those of the other blocks, the quality of $B_{s,t}$ is higher than that of the others.
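The reliability test can be sketched in code as follows. Because the exact expressions of Eqs. (9)-(11) are not reproduced in this text, the ratio used below is only an assumed form, chosen to match the stated behavior: it divides the summed eigenvalue magnitudes for the neighbor set by those for the extended set, so that a value above 1 means the block's own MV tightens the spread. The function names, the data layout, the population (divide-by-count) covariance, and the small `eps` guard are all assumptions for the example.

```python
import math

# Offsets (m, n) of the eight neighbors that form the set Omega.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def eigen_spread(mvs):
    """|lambda_1| + |lambda_2| of the 2 x 2 covariance matrix of the MVs."""
    k = len(mvs)
    mx = sum(v[0] for v in mvs) / k
    my = sum(v[1] for v in mvs) / k
    cxx = sum((v[0] - mx) ** 2 for v in mvs) / k
    cyy = sum((v[1] - my) ** 2 for v in mvs) / k
    cxy = sum((v[0] - mx) * (v[1] - my) for v in mvs) / k
    # Closed-form eigenvalues of the symmetric matrix [[cxx, cxy], [cxy, cyy]].
    mid = (cxx + cyy) / 2.0
    rad = math.sqrt(((cxx - cyy) / 2.0) ** 2 + cxy ** 2)
    return abs(mid + rad) + abs(mid - rad)


def reliability(fields, s, t, eps=1e-9):
    """Assumed ratio form of J_{s,t} for an interior block (s, t).

    fields: one MV field per direction in use (backward, forward, or both);
            each field is a dict mapping (s, t) -> (mv_x, mv_y).
    eps:    guard against division by zero; not part of the paper.
    """
    num = den = 0.0
    for field in fields:
        omega = [field[(s + m, t + n)] for m, n in OFFSETS]
        phi = omega + [field[(s, t)]]
        num += eigen_spread(omega)  # spread without the current block
        den += eigen_spread(phi)    # spread with the current block added
    return (num + eps) / (den + eps)
```

Passing a single field corresponds to the forward-only or backward-only case; passing two fields corresponds to the bidirectional case. With this assumed form, a block whose MV is an outlier among uniform neighbors gets a value far below 1, while a block whose MV sits at the mean of its neighbors gets a value above 1.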

Third Step: Refinement of Blocks
In this step, blocks whose $J_{s,t}$ is less than 1 are classified as unreliable and are remade; all other blocks are classified as reliable. The unreliable blocks are sorted in decreasing order of $J_{s,t}$ and remade in that order by an ME/MC procedure using block boundary matching (BBM). Figure 3 shows an SI frame consisting of reliable and unreliable blocks. Because the neighboring reliable blocks can be used to remake the unreliable block $B_{s,t}$, the MV of $B_{s,t}$ is re-estimated with the templates $T_U(s,t)$, $T_B(s,t)$, $T_L(s,t)$, and $T_R(s,t)$, which are the upper, bottom, left, and right templates, respectively. Note that these templates consist of pixels in the neighboring reliable blocks. If a neighboring block of the current unreliable block $B_{s,t}$ is itself unreliable, the corresponding template is not used. The MV of $B_{s,t}$ is estimated by minimizing the cost function of Eq. (12), in which the superscripts $l = 0$ and $l = 1$ indicate that the data are related to $X'_{2i-1}$ and $X'_{2i+1}$, respectively. $R^{l}_{U}(MV^{l}_{s,t})$, $R^{l}_{B}(MV^{l}_{s,t})$, $R^{l}_{L}(MV^{l}_{s,t})$, and $R^{l}_{R}(MV^{l}_{s,t})$ are the sets of pixels in the reference frames that overlap the templates $T_U(s,t)$, $T_B(s,t)$, $T_L(s,t)$, and $T_R(s,t)$ displaced by $MV^{l}_{s,t}$, respectively. $D[T_q(s,t), R^{l}_{q}(MV^{l}_{s,t})]$ denotes the mean-squared difference between $T_q(s,t)$ and $R^{l}_{q}(MV^{l}_{s,t})$, where $q \in \{U, B, L, R\}$. If $T_q(s,t)$ is used because it lies in a reliable neighboring block, then $w_q$ is set to 1; otherwise, it is set to 0. If ME/MC is performed in the forward or backward direction only, then only the corresponding value of $l \in \{0, 1\}$ is used in Eq. (12). The variable $Z$ is the number of pixels used to calculate the cost function of Eq. (12).
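The boundary-matching search can be sketched as follows. The expression of Eq. (12) is not reproduced in this text, so the cost below is an assumed form built from the definitions above: squared differences between each enabled template and the displaced reference pixels, summed over directions and sides and divided by the number of pixels used. The function names, the template thickness `width`, the integer-pixel full search, and the `search` range are hypothetical choices for the example.

```python
def template_pixels(y, x, n, width):
    """Pixel coordinates of the four boundary templates around a block."""
    rows, cols = range(y, y + n), range(x, x + n)
    return {
        "U": [(r, c) for r in range(y - width, y) for c in cols],
        "B": [(r, c) for r in range(y + n, y + n + width) for c in cols],
        "L": [(r, c) for r in rows for c in range(x - width, x)],
        "R": [(r, c) for r in rows for c in range(x + n, x + n + width)],
    }


def bbm_cost(si, refs, mvs, y, x, n, weights, width=2):
    """Assumed form of the boundary-matching cost for the block at (y, x).

    refs, mvs: reference frames and candidate vectors (dy, dx), one pair per
               direction in use.
    weights:   side -> 1 if the neighbor on that side is reliable, else 0.
    All displaced template pixels must lie inside the reference frame.
    """
    total, z = 0.0, 0
    for ref, (dy, dx) in zip(refs, mvs):
        for side, pixels in template_pixels(y, x, n, width).items():
            if not weights.get(side, 0):
                continue  # template lies in an unreliable neighbor
            total += sum((si[r][c] - ref[r + dy][c + dx]) ** 2
                         for r, c in pixels)
            z += len(pixels)
    return total / z if z else float("inf")


def refine_mv(si, ref, y, x, n, weights, search=2, width=2):
    """Full search over one reference frame for the minimum-cost vector."""
    candidates = [(dy, dx)
                  for dy in range(-search, search + 1)
                  for dx in range(-search, search + 1)]
    return min(candidates,
               key=lambda v: bbm_cost(si, [ref], [v], y, x, n, weights, width))
```

Once a vector is selected, the unreliable block would be replaced by the motion-compensated block from the reference frame, and the block could then serve as a reliable neighbor for the blocks processed after it.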

Simulation Results
In this section, the gain of the proposed scheme is represented by the Bjøntegaard delta (BD) rate reduction. [19] The intra-frames are encoded with the intra-mode of JM18.0. The number of frames to be encoded is 300. The GOP structure is "IWIWI...," where "I" and "W" denote the intra- and WZ frames, respectively. When the intra- and WZ frames are encoded, the QPs are set to {27, 31, 35, 39}, which are used in the calculation of the BD rate reduction. [19] Note that the quantization module of H.264/AVC is used for both intra- and WZ frames. In the decoders of the DVC codecs, the MVs are estimated with 1/4-pixel resolution.
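For readers unfamiliar with the BD rate, the sketch below follows the commonly used Bjøntegaard procedure: fit a cubic polynomial to log10(rate) as a function of PSNR for each codec, integrate both fits over the common PSNR interval, and convert the average difference into a percentage. It is an illustration rather than the evaluation tool used for the reported results; the function names are hypothetical, and the code assumes exactly four rate-PSNR points per codec (one per QP), in which case the least-squares cubic coincides with the interpolating cubic.

```python
import math


def fit_cubic(xs, ys):
    """Cubic through four points, solved by Gauss-Jordan elimination.

    Returns [c0, c1, c2, c3] for c0 + c1*x + c2*x^2 + c3*x^3.
    """
    rows = [[1.0, x, x * x, x ** 3, y] for x, y in zip(xs, ys)]
    for i in range(4):
        pivot = max(range(i, 4), key=lambda r: abs(rows[r][i]))
        rows[i], rows[pivot] = rows[pivot], rows[i]
        for r in range(4):
            if r != i:
                f = rows[r][i] / rows[i][i]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[i])]
    return [rows[i][4] / rows[i][i] for i in range(4)]


def integral(coeffs, lo, hi):
    """Definite integral of the polynomial over [lo, hi]."""
    def anti(x):
        return sum(c * x ** (k + 1) / (k + 1) for k, c in enumerate(coeffs))
    return anti(hi) - anti(lo)


def bd_rate(rates_ref, psnrs_ref, rates_test, psnrs_test):
    """Average bit-rate difference (%) of the test codec against the reference.

    A negative value means the test codec needs fewer bits at equal PSNR.
    """
    lo = max(min(psnrs_ref), min(psnrs_test))
    hi = min(max(psnrs_ref), max(psnrs_test))
    mid = (lo + hi) / 2.0  # center the PSNR axis to keep the fit well conditioned

    def area(rates, psnrs):
        coeffs = fit_cubic([p - mid for p in psnrs],
                           [math.log10(r) for r in rates])
        return integral(coeffs, lo - mid, hi - mid)

    avg_diff = (area(rates_test, psnrs_test) - area(rates_ref, psnrs_ref)) / (hi - lo)
    return (10.0 ** avg_diff - 1.0) * 100.0
```

As a sanity check with made-up numbers, a test codec that spends 10% fewer bits than the reference at every PSNR point yields a BD rate of -10%.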
In Figs. 4 and 5 and Tables 1 and 2, the DVC codec incorporating the proposed scheme (BBM) is compared with DVC codecs using the conventional algorithms, [9-11] where the size N × N of $B_{s,t}$ is set to 8 × 8 or 16 × 16. Figures 4 and 5 show that the rate-distortion (RD) curves of the DVC codecs using the proposed scheme are higher than those of the codecs incorporating the conventional methods. This implies that the proposed scheme outperforms the conventional schemes from the viewpoint of RD. In Tables 1 and 2, the proposed algorithm has gains of −5.9%, −5.3%, −8.1% and −6.3%, −5.7%, −8.1%, respectively, in average BD rate against the conventional schemes. Note that a negative BD rate implies that the proposed scheme reduces the total number of bits generated from encoding the video sequence while the resulting image quality is equal to that of the conventional schemes. The gains for N × N = 16 × 16 are larger than those for N × N = 8 × 8 because the template of a 16 × 16 block contains more useful information for constructing the SI frame than that of an 8 × 8 block.
Fig. 3 The SI frame is updated through ME and MC using block boundary matching.
Optical Engineering 103111-4 October 2013/Vol. 52(10)
To understand how the gains vary with the block size, we show the relationship between the BD rate gain and the block size N × N in Fig. 6, where the DVC codec incorporating BBM is compared with the DVC codec using the conventional algorithm. [9] Although the performance depends on the test sequence, the overall gains show that N × N = 16 × 16 provides the best performance among the three block sizes. The boundary template for N × N = 8 × 8 is less useful for updating the unreliable blocks than that for N × N = 16 × 16 because it contains fewer pixels. On the other hand, for N × N = 32 × 32, the correlation between the pixels in a block and the templates of the block is lower than for N × N = 16 × 16, so N × N = 16 × 16 gives more gain than N × N = 32 × 32.
In Tables 1 and 2, the complexity of the DVC codec using the proposed scheme is between 97% and 105% of that of the codecs using the conventional schemes. Note that 100% implies that the complexity of the proposed scheme is equal to that of the conventional scheme. The complexities in the tables show that the complexity of the proposed scheme is approximately equal to those of the conventional schemes.
The validity of the reconstructed SI is checked in Table 3, where the average PSNRs of the SI reconstructed in the DVC decoder are reported. Although such a measurement is not possible in an actual DVC scenario, it is helpful for checking the accuracy of SI generation algorithms. As can be seen from this table, the PSNRs of the SI generated by the proposed BBM are higher than those resulting from the conventional algorithms. This implies that the proposed scheme outperforms the conventional algorithms in increasing the quality of the generated SI frame.
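The PSNR of an SI frame against the corresponding original frame can be computed as in the sketch below. The function name and the assumption of 8-bit samples (peak value 255) are choices made for the example; frames are given as lists of rows.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """PSNR in dB between two equally sized frames given as lists of rows."""
    diffs = [(p - q) ** 2
             for ra, rb in zip(reference, estimate)
             for p, q in zip(ra, rb)]
    mse = sum(diffs) / len(diffs)
    # Identical frames have zero error, so the PSNR is unbounded.
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)
```

Averaging this value over all WZ frames of a sequence gives a per-sequence figure of the kind reported in a table of SI quality.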
To analyze the contribution of each step of the proposed algorithm, the percentage of blocks regarded as unreliable in the proposed scheme is shown in Table 4, where the percentages of unreliable blocks are <3%. Although this portion is small, the gains obtained by updating the unreliable blocks are significant. In this table, we measure the BD rate gains of the third step against the first step of the proposed scheme. The table shows that the third step of the proposed algorithm increases the coding efficiency significantly.
The second and third steps of the proposed scheme described in Sec. 3 can be used to enhance the performance of the conventional algorithms, [9-11] because these steps increase the quality of the SI frame by updating the blocks that have been made by the conventional algorithms. To demonstrate this, performance comparisons between DVC codecs using the combined techniques (conventional schemes + second and third steps of BBM) and the conventional schemes are presented in Table 5, where N × N = 16 × 16. In the DVC codec using the combined techniques, one of the conventional schemes is used instead of the first step of the proposed algorithm to make the temporary SI frame. The temporary SI frame is then updated by the second and third steps of the proposed algorithm. The DVC codecs using the combined techniques are more efficient than the codecs using the conventional schemes. For the results related to Ref. 9 in Table 5, the gains of the combined method are insignificant. Because the SI frame constructed by the method of Petrazzuoli et al. [9] contains many unreliable blocks, the number of reliable neighboring blocks that the BBM scheme can utilize is small. Note that BBM is useful when many reliable neighboring blocks are included in the temporary SI frame.
In the DVC codec, the complexity of the channel decoder (turbo decoder) decreases as the quality of the SI frame increases, because the decoder requests additional parity bits less often. Therefore, in Table 5, because the quality of the SI frame generated by the combined techniques is higher than that resulting from the conventional schemes, the complexity of the DVC codec using the combined techniques may be smaller than that of the conventional codecs. This table shows that the proposed scheme (BBM) can be used to increase the performance of the conventional schemes.

Conclusions
This paper proposes an efficient method to reconstruct the SI frame in a DVC decoder. In the proposed algorithm, the blocks in the SI frame are classified into reliable and unreliable blocks, and the unreliable blocks are then remade using a BBM scheme. Simulation results show that the proposed scheme outperforms the conventional methods. The proposed scheme can also be combined with the conventional schemes to increase the coding performance further.

Fig. 4 Comparison between rate-distortion (RD) curves of the conventional and the proposed algorithms, where the BBM is performed with N × N = 8 × 8. (a) Foreman, (b) mobile, (c) hall, (d) silent.

Fig. 5 Comparison between RD curves of the conventional and the proposed algorithms, where the BBM is performed with N × N = 16 × 16. (a) Foreman, (b) mobile, (c) hall, (d) silent.

Table 1 Bjøntegaard delta (BD) rate gains and the relative complexity of the proposed DVC codec. N × N = 8 × 8.

Table 2 BD rate gains and the relative complexity of the proposed DVC codec. N × N = 16 × 16.