Spatiotemporal deinterlacing using a maximum a posteriori estimator based on multiple-field registration

Abstract. This paper proposes an accurate deinterlacing algorithm using a maximum a posteriori (MAP) estimator. First, we produce accurate motion vector fields between the current field and its adjacent fields by employing an advanced motion compensation scheme suited to the interlaced format. Next, the progressive frame corresponding to the current field is found via the MAP estimator based on the derived motion vector fields. Here, in order to obtain a stable solution, the well-known bilateral total variation-based regularization is applied. Then, a mode decision step determines whether the result of this temporal deinterlacing is acceptable. Finally, if the mode decision finds the temporal deinterlacing inappropriate, a typical spatial deinterlacing is applied instead of the MAP estimator-based temporal deinterlacing. Experimental results show that the proposed algorithm provides up to 2 dB higher PSNR than a cutting-edge deinterlacing algorithm, while also providing better visual quality.


Introduction
Deinterlacing is an important technique because it converts interlaced video sequences into progressive sequences for progressive digital display devices, such as LCDs, plasma display panels, and organic light-emitting diode TVs. However, visually annoying artifacts, such as edge flicker, jagging, blurring, and feathering, often occur due to imperfect deinterlacing. Thus, a critical issue in deinterlacing is removing such artifacts as much as possible.
During the past several decades, numerous deinterlacing algorithms have been developed. Spatial deinterlacing algorithms have the advantages of simple operation and straightforward integration into hardware. The edge-based line averaging (ELA) algorithm is one of the most popular deinterlacing algorithms in this category. However, since all spatial deinterlacing schemes interpolate the missing pixels using only intrafield information, the interpolated pixels in textured areas tend to be somewhat blurred.
Motion-adaptive (MA) deinterlacing methods detect the existence of motion and then use either an intrafield method, i.e., a typical spatial deinterlacing, or an interfield method. Here, the interfield method is a simple interfield interpolation without motion compensation. However, MA deinterlacing still suffers from blur or jagging artifacts because it must apply the intrafield method when interpolating missing pixels in moving objects.
Among motion-compensated deinterlacing methods, for example, Chang et al. presented an adaptive four-field global/local motion-compensated approach in which same-parity four-field motion detection and four-field motion estimation detect static areas and fast motion using four reference fields, and global motion estimation detects camera panning and zooming motions [13]. However, this algorithm may give rise to feathering artifacts when the assumption of strong continuity of local motion does not hold. Fan and Chung proposed a temporal deinterlacing algorithm using strong spatial-temporal correlation-assisted motion estimation [17]. To our knowledge, Fan's algorithm is the state-of-the-art deinterlacing scheme. However, since conventional temporal deinterlacing algorithms inherently cannot compensate for registration error between the current field and its reference fields, they are limited in how far they can improve visual quality.
On the other hand, maximum a posteriori (MAP) estimation has recently been applied to numerous image reconstruction techniques, e.g., deblurring, denoising, and superresolution. In statistics, MAP estimation is a method of estimating the parameters of a statistical model. When applied to a data set and given a statistical model, a MAP estimator provides estimates for the model's parameters. For example, MAP estimators are often used for superresolution reconstruction [23,24]. Superresolution restoration aims to solve the following problem: given a set of observed low-resolution images, estimate a high-resolution image. The observed low-resolution images are regarded as degraded observations of a real, high-resolution texture. These degradations typically include geometric warping, optical blur, spatial sampling, and noise. Given several such observations, the MAP estimate of the superresolution image may be obtained such that, when reprojected back into the images via a generative imaging model, it minimizes the difference between the actual and predicted observations [24]. Note that if accurate motion information for a missing pixel is given, MAP-based superresolution can reconstruct the missing pixel so that it is very close to its original value.
In Ref. 20, we showed that applying the MAP estimator to temporal deinterlacing improves both subjective and objective visual quality by minimizing the inherent registration errors in missing pixels. In this paper, we propose an advanced MAP estimator-based deinterlacing algorithm using a high-performance motion compensation (MC) method and a strong mode decision. The proposed algorithm consists of four steps. First, accurate multiple-field registration is performed between the current field and its neighboring fields. Second, the progressive frame corresponding to the current field is reconstructed via an L1-norm-based MAP estimator using the predicted motion fields. Here, in order to obtain an optimal solution, the well-known steepest descent algorithm is employed. Also, bilateral total variation (BTV)-based regularization is applied to obtain a stable solution and preserve edges well. Third, a mode decision module determines whether the result of the temporal deinterlacing is acceptable. The proposed mode decision relies on three factors: feathering artifacts, registration errors, and motion vector (MV) correlation. Fourth, if the mode decision finds the temporal deinterlacing inappropriate, a typical spatial deinterlacing based on edge-directional interpolation is applied instead of the MAP estimator-based temporal deinterlacing. Experimental results show that the proposed algorithm obtains up to 2 dB higher peak signal-to-noise ratio (PSNR) than the state-of-the-art spatiotemporal deinterlacing algorithm, i.e., Fan's algorithm [17], while providing better visual quality.
The remainder of this paper is organized as follows. Section 2 describes the proposed deinterlacing algorithm in detail. Section 3 provides extensive experimental results. Finally, we conclude this paper in Sec. 4.

Proposed Algorithm
As illustrated in Fig. 1, the proposed deinterlacing algorithm has a structure similar to a typical spatiotemporal deinterlacing scheme based on a hard decision between spatial deinterlacing and temporal deinterlacing. The main contribution of this paper is that even registration errors in temporal deinterlacing can be corrected by employing a MAP estimator. Note that previous temporal deinterlacing methods output motion-compensated pixels only. The proposed algorithm, meanwhile, can produce outstanding visual quality, especially around diagonal edges, while significantly reducing jagging and feathering artifacts in comparison with state-of-the-art deinterlacing schemes.
The proposed algorithm consists of four steps, as seen in Fig. 1. First, an advanced motion estimation (ME) algorithm, which is an improved version of the spatial-temporal correlation-assisted search (STFS) proposed in Ref. 17, is applied to the current field and its adjacent fields. Second, based on the estimated motion information, MAP estimator-based temporal deinterlacing of the current field is performed. Third, on a block basis, a mode decision module determines whether the result of the temporal deinterlacing is acceptable. Fourth, if the temporal deinterlacing result is determined to be unacceptable, a typical spatial deinterlacing is applied. The following subsections describe each step of the proposed algorithm in more detail.

Conventional STFS
Prior to describing the advanced STFS, we explain the conventional STFS (Ref. 17) in detail. As illustrated in Fig. 2(a), three consecutive fields, i.e., f_{n−1}, f_n, and f_{n+1}, are involved in the estimation of the motion trajectory. The bidirectional ME and the two kinds of unidirectional ME, i.e., forward and backward ME, are combined in order to fully exploit the information from the three fields.
Fig. 2 Conventional spatial-temporal correlation-assisted search (STFS) scheme.
Let SAD_P and SAD_N be the sums of absolute differences (SAD) between the two blocks used in the backward ME and forward ME, respectively, and let SAD_B be the SAD between the two blocks used in the bidirectional ME. Each SAD is accumulated over the positions (x, y) in the current block b, and q ≡ (m, n) denotes a candidate MV within the given search range. The best MV q̂ = (m̂, n̂) for the block b is then obtained by minimizing a matching cost built from these SADs. Note that the information from the three consecutive fields is involved in the estimation of the motion trajectory. From Fig. 2(a), q̂ is chosen as the initial MV candidate ṽ_ini. For each matching block, there are four spatial MV candidates and five temporal MV candidates, as in Fig. 2(b). The spatial/temporal bidirectional MV candidates are sorted according to their number of occurrences. For the current block, the matching error corresponding to each spatial/temporal MV candidate ṽ_ST is compared to the matching error of the initial MV candidate ṽ_ini according to Eq. (4). The comparison is performed starting from the candidate with the maximum number of occurrences. In Eq. (4), T_1 is a predetermined threshold that is a function of SAD(ṽ_ini). If Eq. (4) is satisfied, ṽ_ST is chosen as the final MV. Otherwise, the candidate with the second largest number of occurrences is compared to the initial candidate, and so on. If all the spatial/temporal MV candidates fail to pass the matching error check, ṽ_ini is chosen as the final MV.
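The search and candidate check described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of Ref. 17: the matching cost is assumed to be the unweighted sum SAD_P + SAD_N + SAD_B, the previous and next blocks are assumed to be displaced by +q and −q, field parity is ignored, and the acceptance test standing in for Eq. (4) is assumed to be SAD(ṽ_ST) ≤ SAD(ṽ_ini) + T_1. The function names are hypothetical.

```python
from collections import Counter

import numpy as np


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return float(np.abs(a.astype(np.float64) - b.astype(np.float64)).sum())


def block(field, y, x, size):
    return field[y:y + size, x:x + size]


def best_bidirectional_mv(prev_f, cur_f, next_f, y, x, size=8, search=2):
    """Full search over q = (m, n) minimizing SAD_P + SAD_N + SAD_B.

    Assumption: the three SADs are summed without weights.
    """
    h, w = cur_f.shape
    cur = block(cur_f, y, x, size)
    best_cost, best_mv = None, (0, 0)
    for n in range(-search, search + 1):        # vertical component
        for m in range(-search, search + 1):    # horizontal component
            py, px = y + n, x + m               # block in f_{n-1}
            ny, nx = y - n, x - m               # block in f_{n+1}
            if min(py, px, ny, nx) < 0:
                continue
            if max(py, ny) + size > h or max(px, nx) + size > w:
                continue
            p = block(prev_f, py, px, size)
            nb = block(next_f, ny, nx, size)
            cost = sad(cur, p) + sad(cur, nb) + sad(p, nb)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (m, n)
    return best_mv, best_cost


def pick_final_mv(candidates, cost_fn, v_ini, t1):
    """Try spatial/temporal candidates in order of occurrence count.

    Assumption: a candidate passes when its cost is at most
    cost(v_ini) + t1; otherwise v_ini is kept.
    """
    base = cost_fn(v_ini)
    for mv, _count in Counter(candidates).most_common():
        if cost_fn(mv) <= base + t1:
            return mv
    return v_ini
```

Here `candidates` would hold the four spatial and five temporal MVs of Fig. 2(b), and `cost_fn` maps an MV to its matching error for the current block.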

Advanced STFS
In order to obtain more accurate and denser MVs than the conventional STFS, we apply an overlapped motion compensation scheme to the STFS. Note that MV estimation is performed on an 8 × 8 block basis, but the matching block size is 16 × 16. As shown in Fig. 3, the best MV for a specific 8 × 8 block is obtained by block matching for the extended 16 × 16 block with the 8 × 8 block at its center. The conventional STFS is used for this block matching. Finally, the estimated MV for the extended 16 × 16 block is allocated to the 8 × 8 center of the matching block. Table 1 shows the superiority of the advanced STFS over the conventional STFS in terms of PSNR. For this experiment, we employed the same progressive common intermediate format (CIF) sequences as in Sec. 3 and produced their interlaced versions in the same fashion as described in Sec. 3. For each method, we computed the PSNR between the motion-compensated frame and the original frame. We used the PSNR averaged over the first 300 frames of each sequence as an objective evaluation measure. For example, Table 1 shows that the advanced STFS outperforms the conventional STFS by up to 0.8 dB, for the container sequence. This is because the advanced STFS can produce more accurate and denser MVs than the conventional STFS. Thus, we applied the advanced STFS for registration in the proposed temporal deinterlacing, which is described in the following subsection.
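The overlapped matching can be illustrated as follows. The sketch only shows how each 8 × 8 block is mapped to a 16 × 16 matching window and how the resulting MV is allocated back to the 8 × 8 block. The clamping of windows at the frame border and the callable `match_fn` (which stands in for the STFS block matcher) are assumptions made for illustration.

```python
import numpy as np


def extended_window(y, x, height, width, block=8, match=16):
    """Top-left corner of the match x match window centered on a block.

    Assumption: windows that would leave the frame are clamped inside it.
    """
    pad = (match - block) // 2
    y0 = min(max(y - pad, 0), height - match)
    x0 = min(max(x - pad, 0), width - match)
    return y0, x0


def overlapped_mv_field(height, width, match_fn, block=8, match=16):
    """One MV per block x block cell, estimated on the larger window.

    match_fn(y0, x0, size) is a hypothetical matcher returning (m, n).
    """
    rows, cols = height // block, width // block
    mvs = np.zeros((rows, cols, 2), dtype=int)
    for r in range(rows):
        for c in range(cols):
            y0, x0 = extended_window(r * block, c * block,
                                     height, width, block, match)
            mvs[r, c] = match_fn(y0, x0, match)  # allocate to the 8x8 center
    return mvs
```

Because neighboring windows overlap by half their size, adjacent 8 × 8 blocks share most of their matching support, which is what makes the resulting MV field denser and smoother than non-overlapped 16 × 16 matching.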

MAP Estimator-Based Temporal Deinterlacing
The key point of the proposed algorithm is to correct even registration errors in temporal deinterlacing by employing a MAP estimator, whereas previous temporal deinterlacing methods output motion-compensated pixels only. In order to formulate the MAP estimation problem for temporal deinterlacing, we should define the acquisition model of an interlaced field. First, we assume that a progressive frame F is related to its adjacent frames by a motion model M. Next, a space-invariant point spread function (PSF) H is applied to F for antialiasing. Finally, an interlaced field f is generated by applying an interlace decimation operator D and adding a noise component V. Note that the decimation operator D alternately subsamples even lines and odd lines, as defined in Eq. (5). According to Eq. (5), the nth field f_n is derived from the nth frame F_n. The current field and its neighboring fields are then defined by Eq. (6). Let the processing block size be b × b and the scaling ratio be set to r:1. Then, F in Eq. (6) has a dimension of [r²b² × 1], arranged in lexicographic order. The [r²b² × r²b²] matrix M_n is the geometric motion operator between the original progressive frame F and the nth interlaced field f_n of size [b² × 1]. The PSF is modeled by the [r²b² × r²b²] matrix H. The [b² × r²b²] matrix D_n represents the decimation operator and the [b² × 1] matrix V_n denotes the system noise. Based on this model, we propose the MAP estimator-based temporal deinterlacing algorithm.
Fig. 3 The overlapped matching of the advanced STFS: a 16 × 16 matching block with MV allocation on an 8 × 8 block basis.
In general, a MAP estimator for the original progressive frame F given the observed interlaced fields f_n can be formulated to find the MAP estimate F̂ via the L_p minimization of Eq. (7), where λ is a control parameter and N is the number of reference fields for MAP estimation. Note that H_n = H because a common PSF is assumed for all fields without loss of generality. The second term of Eq. (7) is known as regularization. Regularization is very useful in deinterlacing for finding a stable solution [23]. Among many possible regularization terms, we select BTV-based regularization, which results in a full progressive frame with sharp edges and is easy to implement. The BTV-based regularization term is defined by Eq. (8), where τ and α are parameters that control the regularization level, and S_x^l and S_y^m are operators that shift by l and m pixels in the horizontal and vertical directions, respectively. This paper employs L1-norm regularization owing to its simplicity and edge-preserving property. Also, the L1-norm is applied to the data term, i.e., the first term of Eq. (7), to minimize sensitivity to noise, i.e., p = 1.
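For orientation, the model, cost, and regularizer discussed in this subsection can be written in the form that is standard in the multiframe superresolution literature. The block below is a sketch under that assumption rather than a transcription of Eqs. (6) to (8): the field index range and the summation limits of the BTV term in particular are assumed.

```latex
% Assumed acquisition model for each of the N reference fields:
f_n = D_n H M_n F + V_n, \qquad n = 1, \dots, N

% Assumed regularized L_p cost, with H_n = H for all fields:
\hat{F} = \arg\min_{F} \left[ \sum_{n=1}^{N}
          \left\| D_n H M_n F - f_n \right\|_p^p
          + \lambda \, \Upsilon_{\mathrm{BTV}}(F) \right]

% Bilateral total variation in its commonly used half-plane form:
\Upsilon_{\mathrm{BTV}}(F) =
  \sum_{l=-\tau}^{\tau} \; \sum_{\substack{m=0 \\ l+m \ge 0}}^{\tau}
  \alpha^{|l|+|m|} \left\| F - S_x^{l} S_y^{m} F \right\|_1
```

With p = 1, both the data term and the regularizer are L1 norms, so each residual contributes to the gradient only through its sign, which is what limits the influence of outliers such as misregistered pixels.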
In addition, we compare the BTV-based regularization of Eq. (8) with a typical TV-based regularization [23] to show the superiority of BTV-based regularization quantitatively and qualitatively. Table 2 compares BTV-based regularization and TV-based regularization in terms of PSNR. Each PSNR value in Table 2 is the average PSNR over the first 50 deinterlaced frames of a sequence. We can see that BTV-based regularization provides up to 1.1 dB and on average 0.4 dB higher PSNR than TV-based regularization. For example, Fig. 4 shows the deinterlacing results for the carphone and foreman sequences. Note that TV-based regularization causes some artifacts in flat areas due to its sensitivity to noise. On the other hand, the proposed BTV-based regularization seldom shows such a phenomenon because it is inherently robust against noise and preserves details such as edges and textures better than TV-based regularization. As a result, we can achieve high-quality temporal deinterlacing via this robust regularized MAP estimator.
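To make the difference between the two regularizers concrete, the following sketch evaluates a BTV term and a plain anisotropic TV term on an image. It is illustrative only: circular shifts via `np.roll` are used as a boundary convention, the half-plane summation (m ≥ 0, l + m ≥ 0) follows the commonly used BTV form, and the TV variant shown is one typical choice, not necessarily the one used for Table 2.

```python
import numpy as np


def shift(img, l, m):
    """Shift by l pixels horizontally and m pixels vertically (circular)."""
    return np.roll(img, shift=(m, l), axis=(0, 1))


def btv(img, tau=2, alpha=0.5):
    """Bilateral total variation: weighted L1 differences over many shifts."""
    total = 0.0
    for l in range(-tau, tau + 1):
        for m in range(0, tau + 1):
            if l + m < 0 or (l == 0 and m == 0):
                continue
            weight = alpha ** (abs(l) + abs(m))
            total += weight * np.abs(img - shift(img, l, m)).sum()
    return float(total)


def tv_l1(img):
    """Anisotropic TV: L1 differences with the left and upper neighbors."""
    horizontal = np.abs(img - shift(img, 1, 0)).sum()
    vertical = np.abs(img - shift(img, 0, 1)).sum()
    return float(horizontal + vertical)
```

TV looks only at immediate neighbors, whereas BTV also compares each pixel with more distant and diagonal neighbors, with weights decaying as α^(|l|+|m|). Averaging over this larger neighborhood is the usual explanation for its better noise robustness.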
We use the steepest descent algorithm to find the solution to Eq. (7). We can thus obtain an optimal solution in an iterative manner. That is, the (t+1)th MAP estimate F̂_{t+1} is derived from the tth estimate F̂_t by Eq. (9), where β is a scalar defining the step size in the direction of the gradient and I is an identity matrix. The matrices M, H, and D and their transposes can be exactly interpreted as direct image operators, such as shift, blur, and decimation [23]. Implementing the effects of these matrices as a sequence of image operators spares us from explicitly constructing them as matrices. This property allows the proposed temporal deinterlacing method to be implemented in an extremely fast and memory-efficient way. After several iterations according to Eq. (9), an optimal progressive frame F̂ corresponding to the current field is reconstructed from N consecutive fields including the current field. Finally, MAP estimator-based temporal deinterlacing is completed by replacing the missing field pixels in the current field with the corresponding pixels in F̂.
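The operator view of the iteration can be sketched as follows. This is a simplified illustration, not the paper's implementation: the motion operator M_n is restricted to an integer circular translation, the PSF H is taken as the identity, and the update uses the sign-based gradient that follows from the L1 data and BTV terms in their standard form. All function names are hypothetical.

```python
import numpy as np


def shift(img, l, m):
    """S_x^l S_y^m: circular shift by l pixels horizontally, m vertically."""
    return np.roll(img, shift=(m, l), axis=(0, 1))


def decimate(frame, parity):
    """D: keep every second line, starting at line `parity` (0 or 1)."""
    return frame[parity::2, :]


def decimate_T(field, parity, height):
    """D^T: put field lines back on the frame grid, zeros elsewhere."""
    frame = np.zeros((height, field.shape[1]))
    frame[parity::2, :] = field
    return frame


def map_step(F, fields, beta, lam, tau=2, alpha=0.5):
    """One steepest-descent update of the frame estimate F.

    fields: list of (field, parity, (l, m)), where (l, m) is the assumed
    integer translation that registers F to that field.
    """
    grad = np.zeros_like(F)
    # Data term: M^T D^T sign(D M F - f), with H assumed to be identity.
    for f, parity, (l, m) in fields:
        residual = np.sign(decimate(shift(F, l, m), parity) - f)
        grad += shift(decimate_T(residual, parity, F.shape[0]), -l, -m)
    # BTV term: alpha^(|l|+|m|) (I - S_y^-m S_x^-l) sign(F - S_x^l S_y^m F).
    for l in range(-tau, tau + 1):
        for m in range(0, tau + 1):
            if l + m < 0 or (l == 0 and m == 0):
                continue
            s = np.sign(F - shift(F, l, m))
            grad += lam * alpha ** (abs(l) + abs(m)) * (s - shift(s, -l, -m))
    return F - beta * grad
```

Note that no matrix is ever built: each of M, D, and their transposes is applied as an array operation, which is the property the text relies on for speed and memory efficiency. Because the gradient is built from signs, a fixed step β makes the estimate hover within about β of the minimizer, which is consistent with the importance of the stopping iteration discussed in the experiments.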

Mode Decision
Conventional spatiotemporal deinterlacing methods often suffer from feathering, blur, and jagging artifacts caused by incorrect mode decisions. For instance, feathering artifacts occur near object boundaries due to occlusion or inaccurate MVs. Mode decision, hence, significantly affects the visual quality of spatiotemporal deinterlacing algorithms. Fan and Chung employed the so-called slope detector (SD)-based feathering artifact detector due to its low complexity and high detection ability [17]. This paper presents a stronger mode decision method that additionally takes into account mean of absolute differences (MAD) values, MV correlation, and MAD correlation. Figure 5 illustrates the proposed mode decision scheme, in which the MV reliability is examined on a block basis in two steps. Assume that the MVs are produced by the advanced STFS. First, for each block that is motion-compensated by the advanced STFS, feathering artifacts are detected by the aforementioned SD-based detector of Ref. 17. If a feathering artifact is not detected, temporal deinterlacing is applied to the block. Otherwise, an additional examination is performed with more information. We chose such a conservative approach because the blur caused by spatial deinterlacing is visually less annoying than feathering artifacts. The second examination procedure is explained in greater detail below.
Since the SD-based detector depends on the motion-compensated image only, it may cause false positives in detecting various feathering artifacts. We hence propose an additional step to reduce the false positive rate of the first step by taking into account MAD values, MV correlation, and MAD correlation. As shown in Fig. 5, if the MAD of the current block is sufficiently small, i.e., MAD(v) < δ_1, and the MAD and MV of the current block are highly correlated with those of its neighbors, i.e., cor(MV) < δ_2 and cor(MAD) < δ_3, we determine that the current block has a reliable MV. Here, cor(MV) is the L1-norm distance between the current MV and the component-wise median of the MVs of its eight-connected blocks, while cor(MAD) is the L1-norm distance between MAD(v) and the average MAD of the eight-connected neighbor blocks. In this paper, δ_1, δ_2, and δ_3 are empirically set to 20, 5, and 5, respectively. Since this second decision step examines the MV reliability independently of the motion-compensated image, it can effectively prevent the entire mode decision from being trapped in false positives.
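The two-step decision can be summarized in code. The sketch below takes the output of the SD-based feathering detector as a given boolean, since that detector is defined in Ref. 17 and not reproduced here. It also assumes that a block flagged in the first step but judged reliable in the second step is deinterlaced temporally. Function and argument names are hypothetical.

```python
import numpy as np


def is_mv_reliable(mad, mv, neighbor_mads, neighbor_mvs,
                   delta1=20.0, delta2=5.0, delta3=5.0):
    """Second-step check on MAD, MV correlation, and MAD correlation."""
    neighbor_mvs = np.asarray(neighbor_mvs, dtype=float)
    # cor(MV): L1 distance to the component-wise median of neighbor MVs.
    median_mv = np.median(neighbor_mvs, axis=0)
    cor_mv = float(np.abs(np.asarray(mv, dtype=float) - median_mv).sum())
    # cor(MAD): distance to the average MAD of the neighbors.
    cor_mad = abs(float(mad) - float(np.mean(neighbor_mads)))
    return bool(mad < delta1 and cor_mv < delta2 and cor_mad < delta3)


def choose_mode(feathering_detected, mad, mv, neighbor_mads, neighbor_mvs):
    """Return "temporal" or "spatial" for one block."""
    if not feathering_detected:
        return "temporal"
    # Assumption: a reliable MV overrides the first-step detection.
    if is_mv_reliable(mad, mv, neighbor_mads, neighbor_mvs):
        return "temporal"
    return "spatial"
```

In use, `neighbor_mads` and `neighbor_mvs` would hold the values of the eight-connected blocks around the current block.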

Spatial Deinterlacing
For the spatial deinterlacing of the proposed algorithm, we adopted the block-based directional interpolation algorithm proposed by Chen and Tai in Ref. 14. This subsection briefly introduces Chen's algorithm. Figure 6 shows a deinterlaced frame, where the dotted line indicates the missing scan line that needs to be interpolated and the solid lines indicate the original field data. It is assumed that the missing pixel f_n(x, y) is centered in a target block BC; BU_q and BL_q are defined as the corresponding upper and lower reference blocks, respectively, and q is a candidate directional vector between BC and its upper candidate block. Note that the blocks BC, BU_q, and BL_q contain only pixels in f_n. The best directional vector q̂ is detected using Eq. (10), where SAD_U(q) and SAD_L(q) denote the SAD between BC and BU_q and the SAD between BC and BL_q, respectively. The search range of q, i.e., Ω, includes 23 directional vectors so as to detect edges whose angles are greater than 7.5 deg. Let SAD(q̂) be the minimal cost according to Eq. (10). When SAD(q̂) is greater than a certain threshold θ_1, or the difference between SAD(q̂) and SAD(q̂ + 90 deg) is not greater than a threshold θ_2, the selected q̂ is not reliable, and the typical line average filter is used to construct the final result. In this paper, θ_1 and θ_2 are empirically set to 30 and 10, respectively.
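The reliability test and the fallback to line averaging can be sketched as below. This is a simplified, row-wise illustration rather than Chen's block-based algorithm: the direction is represented by a single integer horizontal offset `q`, the SAD values are taken as inputs, and border pixels are clamped. These choices and the function names are assumptions made for the example.

```python
import numpy as np


def direction_is_reliable(sad_best, sad_perp, theta1=30.0, theta2=10.0):
    """Apply the two threshold tests to the best direction's cost.

    sad_best: SAD of the selected direction.
    sad_perp: SAD of the direction rotated by 90 deg.
    """
    if sad_best > theta1:
        return False
    if sad_perp - sad_best <= theta2:
        return False
    return True


def interpolate_row(upper, lower, q=0):
    """Interpolate a missing row from the rows above and below.

    With q = 0 this is the plain line average; otherwise the pixels
    upper[x + q] and lower[x - q] along the assumed edge are averaged.
    """
    upper = np.asarray(upper, dtype=float)
    lower = np.asarray(lower, dtype=float)
    x = np.arange(upper.size)
    xu = np.clip(x + q, 0, upper.size - 1)
    xl = np.clip(x - q, 0, lower.size - 1)
    return 0.5 * (upper[xu] + lower[xl])
```

A caller would pass the detected offset when `direction_is_reliable` returns True and `q=0` otherwise, which reproduces the line-average fallback described above.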

Experimental Results

Experimental Condition
First, for an objective quality evaluation, we used 12 progressive CIF sequences: foreman, mobile, mother and daughter (M&D), Stefan, coastguard, table tennis, container, flower garden, football, highway, Paris, and tempete.
We generated the interlaced fields by applying a simple low-pass filter of {1, 2, 1} to those progressive frames and alternately subsampling the low-pass filtered frames. After deinterlacing, we computed the PSNR between the reconstructed progressive frame and the original frame. Note that only Y components are treated here. We chose the PSNR averaged over the first 300 frames of each sequence as an objective evaluation measure. We compared the proposed algorithm with LA, ELA, Lee's algorithm [9], Chang's algorithm [13], Chen's algorithm [14], Fan's algorithm [17], Yang's algorithm [18], Mohammadi's algorithm [21], and Trocan's algorithm [22]. The number of reference fields for the proposed temporal deinterlacing was set to 3, i.e., the current field and its temporally previous and next fields.
Fig. 4 The deinterlaced frames according to regularization term.
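The test-data generation and the PSNR measure described in this subsection can be sketched as follows. Several details are assumptions, since the text does not state them: the {1, 2, 1} filter is applied vertically and normalized by 4, border rows are replicated, the kept line parity alternates with the frame index, and the peak value is 255 for 8-bit luma.

```python
import numpy as np


def vertical_lowpass(frame):
    """Apply the {1, 2, 1}/4 filter along the vertical axis."""
    p = np.pad(frame.astype(float), ((1, 1), (0, 0)), mode="edge")
    return (p[:-2] + 2.0 * p[1:-1] + p[2:]) / 4.0


def make_field(frame, index):
    """Low-pass filter a progressive frame and keep alternating lines."""
    parity = index % 2          # even frames keep even lines (assumed)
    return vertical_lowpass(frame)[parity::2, :]


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two Y-component frames."""
    diff = reference.astype(float) - test.astype(float)
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))
```

Averaging `psnr` over the deinterlaced frames of a sequence gives the per-sequence figure of merit used in the tables.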
The search range for registration in the proposed algorithm and Chang's algorithm was ±32 both horizontally and vertically. The matching block size in the advanced STFS was set to 16 × 16, and the overlapped MV allocation block size was set to 8 × 8. In Eq. (8), α and τ are fixed to 0.5 and 2, respectively. Figure 7 shows the influence of the parameter α on the overall performance of the proposed algorithm for four test sequences, i.e., highway, carphone, foreman, and football. For this experiment, when deinterlacing the first field of each sequence, we progressively changed α from 0.1 to 1.0 and tracked the PSNR values accordingly. From Fig. 7, we can observe that as α becomes larger than 0.5, the PSNR starts to decrease. So, we set α to 0.5. Through a similar experiment, we found that the other parameter τ rarely affects the overall performance of the proposed algorithm. So, we fixed τ to an acceptable value. Table 3 shows the values of the other parameters, i.e., β and λ, chosen for each sequence to maximize the temporal deinterlacing performance in terms of PSNR. Note that β and λ should be determined on a shot basis within a sequence. For instance, since the foreman and table tennis sequences each consist of two different shots, we derived and applied two different parameter sets to those sequences, as given in Table 3.
Figure 8 shows the effects of the parameters in Eq. (7) on the overall performance of the proposed temporal deinterlacing. This experiment was performed for a field in the first shot of the foreman sequence. From Fig. 8(a), we can observe that if the optimal parameters are used, the PSNR performance is very stable. On the contrary, Fig. 8(b) shows that if nonoptimal parameters, i.e., the parameters of the second shot, are employed, the PSNR drops drastically as the iterations go on, and the peak PSNR is also lower than that in Fig. 8(a). From Fig. 8, we can also see that the termination point of the iteration is important. Thus, we selected the optimal number of iterations for each sequence, as given in Table 3.

Performance Evaluation
Table 4 shows the PSNR results of several algorithms for the 12 test sequences. For this experiment, we implemented LA, ELA, Chang's algorithm [13], and Mohammadi's algorithm [21] ourselves and verified them. We can see that the proposed algorithm provides 5 dB higher PSNR on average than Chang's [13] and Mohammadi's [21] algorithms. In particular, for the container sequence, the proposed algorithm outperforms Chang's algorithm [13] by up to 12 dB. In addition, Table 5 provides PSNR comparison results for several recent methods in the literature. In Table 5, the numerical values of Refs. 9, 14, 17, 18, and 22 were taken directly from those papers. Hence, a few values are not available in the table. Note that the proposed algorithm provides on average 0.8 dB higher PSNR than the cutting-edge algorithm, i.e., Fan's algorithm [17]. For the Stefan sequence, in particular, we achieve a significant PSNR improvement of 2 dB over Fan's algorithm. We also find from the available PSNR values in Table 5 that the proposed algorithm is superior to Trocan's algorithm [22]. As a result, the proposed algorithm outperforms the previous works in terms of objective visual quality, i.e., PSNR. In addition, Fig. 9 shows the PSNR values according to the frame number. The proposed algorithm consistently provides higher PSNR values than the previous works. In Figs. 10 and 11, the deinterlaced results for the mobile and Stefan sequences are compared. Note that the proposed algorithm produces a deinterlaced result close to the original frame, while Chang's algorithm and LA suffer from visually annoying artifacts, such as jagging and blur. For the mobile sequence (see Fig. 10), we find that the proposed algorithm outperforms the others. In particular, for the numbers in the calendar, the proposed algorithm shows almost the same visual quality as the original frame. From Fig. 11, we can see a clean court line generated by the proposed temporal deinterlacing. Note that the competitors show severe jagging artifacts and line crawling along the near-horizontal edges.
In addition, Figs. 12 and 13 show the deinterlaced results for real interlaced video sequences. We adopted two well-known interlaced sequences (720 × 480i), car and table tennis. Even for real interlaced video sequences, the proposed algorithm still provides outstanding visual quality without visible artifacts in comparison with the existing methods. For example, LA and Chang's algorithm show jagging and feathering artifacts [see Figs. 12(b) and 12(c)], but the proposed algorithm does not cause such artifacts, as shown in Fig. 12(d).
In addition, we measured the execution times of several algorithms. This experiment was executed on a quad-core CPU at 2.66 GHz with 3 GB of DDR2 DRAM. Since motion estimation accounts for most of the complexity of an MC-based deinterlacing algorithm, we adopted a fast full search [25] and a well-known fast search algorithm, i.e., enhanced predictive zonal search (EPZS) [26], for fast motion estimation in this experiment. Table 6 compares the proposed algorithm with LA, ELA, and Ref. 13 in terms of CPU running time. Note that each numerical value is the average over the first 10 fields of the foreman sequence, and all the algorithms were implemented in MATLAB®. We found that, in comparison with the fast full search, EPZS cuts the CPU running time of the proposed algorithm in half with an acceptable PSNR drop of 1 dB on average. The proposed algorithm using EPZS still takes 1.8 times longer to execute than Ref. 13. However, with additional optimization techniques, the computational cost of the proposed algorithm could be reduced significantly.

Concluding Remarks
This paper presents a robust temporal deinterlacing algorithm based on a MAP estimator. First, registration using an advanced STFS algorithm is performed between the current field and its neighboring fields. Second, the progressive frame corresponding to the current field is found via an L1-norm-based MAP estimator using the predicted interfield MV information. Third, a mode decision module determines whether the result of the temporal deinterlacing is acceptable. Finally, edge-directional interpolation is applied, instead of the aforementioned temporal deinterlacing, to the pixels whose MVs are not reliable. Experimental results show that the proposed algorithm yields up to 2 dB higher PSNR than the cutting-edge deinterlacing algorithm [17], while providing better visual quality.

Fig. 1 Block diagram of the proposed deinterlacing algorithm.

Fig. 7 Peak signal-to-noise ratio (PSNR) performance according to α for four video sequences.

Fig. 8 PSNR values according to the number of iterations for the first shot of the foreman sequence. (a) Results with the optimal parameters. (b) Results with nonoptimal parameters.

Fig. 13 The results for a cropped region of the table tennis sequence. (a) Input interlaced frame. (b) LA. (c) Chang's method [13]. (d) Proposed method.

Table 1 Peak signal-to-noise ratio (PSNR) comparison between conventional spatial-temporal correlation-assisted search (STFS) and the advanced STFS (dB).

between the original progressive frame F and the nth interlaced field f_n of size [b² × 1]. The PSF is modeled by the [r²b² × r²b²] matrix H. The [b² × r²b²] matrix D_n represents the decimation operator and the [b² × 1] matrix V_n denotes the system noise. Based on this model, we propose the MAP estimator-based temporal deinterlacing algorithm.

Table 3 Parameter setting for temporal deinterlacing.

Table 5 PSNR comparison for different deinterlacing algorithms in the literature (dB). Note: Bold values indicate the maximum PSNR for each sequence.