JPEG 2000 [1,2] is the latest standard for still-image coding. It offers high compression performance together with new features. However, the high computational complexity behind its excellent performance and rich feature set also restricts its use in real-time applications [3,4].
Embedded block coding with optimal truncation (EBCOT), proposed by Taubman, is the most complex and time-consuming part of JPEG 2000 [5,6]. EBCOT is a bit-plane coder: each bit-plane is coded in three passes, called the significance propagation pass (Pass 1), the magnitude refinement pass (Pass 2), and the cleanup pass (Pass 3). The context of a sample is formed from the significance states of the sample and its eight neighbors within a 3×3 context window, and the context and the binary decision are then passed to the arithmetic coder. The scan order and the context window are shown in Fig. 1. During each pass, all samples of the bit-plane are scanned to determine whether each one is coded in the current pass. Every sample is therefore scanned three times per bit-plane, which consumes excessive processing time.

Recently, Jen et al. [7] proposed a method that processes the passes in parallel: samples belonging to Pass 1 and Pass 2 are scanned concurrently, while samples belonging to Pass 3 are delayed by one column. Because Pass 1 and Pass 2 execute concurrently, however, Pass 2 cannot use the output of Pass 1. We propose a fast context modeling method based on a parallel-pass scheme in which all three coding passes of the same bit-plane are processed in parallel.
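The stripe-oriented scan order and the 3×3 context window described above can be illustrated with a minimal Python sketch. The helper names (`stripe_scan`, `bit_plane`, `significant_neighbours`) are hypothetical. The shift-and-mask in `bit_plane` is only a stand-in for the bit-plane extraction a real coder performs. Counting significant neighbours is also a simplification: actual JPEG 2000 context labels are looked up from separate horizontal, vertical, and diagonal neighbour counts.

```python
def stripe_scan(height, width):
    """Yield (row, col) in EBCOT stripe order: stripes of four rows,
    column by column within a stripe, top to bottom within a column."""
    for top in range(0, height, 4):
        for c in range(width):
            for r in range(top, min(top + 4, height)):
                yield r, c


def bit_plane(coeffs, p):
    """Extract magnitude bit-plane p of each coefficient with shift-and-mask."""
    return [[(abs(v) >> p) & 1 for v in row] for row in coeffs]


def significant_neighbours(sig, r, c):
    """Count significant samples among the eight neighbours of (r, c) in the
    3x3 context window; positions outside the block count as insignificant."""
    h, w = len(sig), len(sig[0])
    return sum(
        sig[rr][cc]
        for rr in range(r - 1, r + 2)
        for cc in range(c - 1, c + 2)
        if (rr, cc) != (r, c) and 0 <= rr < h and 0 <= cc < w
    )


print(list(stripe_scan(8, 2))[:6])  # [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1)]
```

Each of the three passes repeats a full `stripe_scan` over the block, which is the triple scanning cost discussed above.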
In EBCOT, the coding pass to which a sample belongs must be determined first; the sample is then coded during that pass. In this way, each sample in a bit-plane is coded in exactly one of the three passes. To reduce the processing time, the three passes could be processed in parallel, but parallel processing raises a problem. If the three coding passes are executed concurrently, a sample coded in Pass 3 can become significant before its neighboring samples are coded in Passes 1 and 2, so the coder no longer behaves as EBCOT specifies. Moreover, in EBCOT the processing of samples in Passes 2 and 3 depends on the results of Pass 1, yet in a naive parallel-pass mode samples in Passes 2 and 3 cannot use those results.
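The hazard can be reproduced with a deliberately naive one-scan coder, sketched below in Python for a single four-row stripe. `naive_single_sweep` is hypothetical and is neither the method of Jen et al. nor the proposed method. It assigns each sample to a pass and codes it immediately, with all passes sharing one significance map. In the example, no sample is significant at the start of the bit-plane. The sequential EBCOT coder therefore finds nothing to code in Passes 1 and 2 and codes all eight samples in Pass 3. The naive sweep instead lets sample (0, 0), coded in Pass 3, become significant first, which pulls three of its neighbours into Pass 1.

```python
H = 4  # assumption: a code-block consisting of one stripe of four rows


def neighbours(sig, r, c):
    """Count significant samples among the eight neighbours of (r, c)."""
    w = len(sig[0])
    return sum(
        sig[rr][cc]
        for rr in range(r - 1, r + 2)
        for cc in range(c - 1, c + 2)
        if (rr, cc) != (r, c) and 0 <= rr < H and 0 <= cc < w
    )


def naive_single_sweep(sig0, bits):
    """Hypothetical naive coder: one scan, each sample is assigned to a pass
    and coded at once, and every pass updates the same significance map."""
    w = len(sig0[0])
    sig = [row[:] for row in sig0]
    labels = [[0] * w for _ in range(H)]
    for c in range(w):
        for r in range(H):
            if sig0[r][c]:                   # significant before this plane: Pass 2
                labels[r][c] = 2
            elif neighbours(sig, r, c) > 0:  # has a significant neighbour: Pass 1
                labels[r][c] = 1
                sig[r][c] |= bits[r][c]
            else:                            # everything else: Pass 3
                labels[r][c] = 3
                sig[r][c] |= bits[r][c]      # becomes significant too early
    return labels


sig0 = [[0, 0] for _ in range(H)]        # nothing significant before this bit-plane
bits = [[1, 0], [0, 0], [0, 0], [0, 0]]  # only sample (0, 0) has a 1 in this plane
print(naive_single_sweep(sig0, bits))    # [[3, 1], [1, 1], [3, 3], [3, 3]]
```

Samples (1, 0), (0, 1), and (1, 1) end up in Pass 1 even though a conforming coder would place them in Pass 3.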
To solve this problem, the coding operations for Passes 2 and 3 are delayed by one column so that they can use the results of Pass 1, and Passes 2 and 3 are processed simultaneously. Figures 2 and 3 show the proposed scheme. The results for the four samples numbered 1 are stored after they are coded in Pass 1. Then the samples numbered 2 and 3 are coded in Pass 2 or Pass 3, using the stored results of the four samples numbered 1 as neighbors. After Passes 2 and 3 are completed, the two boxed columns move one column to the right. Hence, in the proposed method, the wait for Pass 1 to finish scanning and coding an entire stripe is reduced to the time needed for a single column, and all three passes are coded in one scan. Additionally, Kakadu [5,6] uses a masking operation to extract a single bit-plane for each coding pass, so the three coding passes require three masking operations. In the proposed method, Passes 2 and 3 reuse the masking result of Pass 1, which eliminates the masking overhead for Passes 2 and 3.
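A minimal Python sketch of the one-column-delay schedule follows, restricted to a single four-row stripe so that dependencies across stripe boundaries, which a full implementation must also handle, are left out. The sketch keeps two significance maps. `sig_p1` holds only the Pass 1 updates of the current bit-plane and serves the Pass 1 and Pass 2 contexts. `sig` also includes Pass 3 updates and serves the Pass 3 contexts. This two-map bookkeeping is our assumption about how the stored Pass 1 results can be used; the letter does not spell it out. The number of significant neighbours stands in for the real JPEG 2000 context label, and all function names are hypothetical. The script checks that the single scan emits, for every pass, the same sequence of (row, column, context) triples as a three-scan reference.

```python
import random

H = 4  # assumption: one stripe of four rows


def neighbours(sig, r, c):
    """Count significant samples among the eight neighbours of (r, c)."""
    w = len(sig[0])
    return sum(
        sig[rr][cc]
        for rr in range(r - 1, r + 2)
        for cc in range(c - 1, c + 2)
        if (rr, cc) != (r, c) and 0 <= rr < H and 0 <= cc < w
    )


def three_scans(sig0, bits):
    """Reference: Passes 1, 2 and 3 each scan the whole stripe in turn."""
    w = len(sig0[0])
    sig = [row[:] for row in sig0]
    coded = [[False] * w for _ in range(H)]
    out = {1: [], 2: [], 3: []}
    for c in range(w):  # Pass 1: insignificant samples with a significant neighbour
        for r in range(H):
            if not sig[r][c] and neighbours(sig, r, c) > 0:
                out[1].append((r, c, neighbours(sig, r, c)))
                coded[r][c] = True
                sig[r][c] |= bits[r][c]
    for c in range(w):  # Pass 2: samples significant before this bit-plane
        for r in range(H):
            if sig0[r][c]:
                out[2].append((r, c, neighbours(sig, r, c)))
                coded[r][c] = True
    for c in range(w):  # Pass 3: every sample not yet coded
        for r in range(H):
            if not coded[r][c]:
                out[3].append((r, c, neighbours(sig, r, c)))
                sig[r][c] |= bits[r][c]
    return out


def one_scan(sig0, bits):
    """Pass 1 on column c, then Passes 2 and 3 on the delayed column c - 1."""
    w = len(sig0[0])
    sig_p1 = [row[:] for row in sig0]  # significance after Pass 1 only
    sig = [row[:] for row in sig0]     # significance including Pass 3
    coded = [[False] * w for _ in range(H)]
    out = {1: [], 2: [], 3: []}
    for c in range(w + 1):
        if c < w:
            for r in range(H):
                if not sig_p1[r][c] and neighbours(sig_p1, r, c) > 0:
                    out[1].append((r, c, neighbours(sig_p1, r, c)))
                    coded[r][c] = True
                    sig_p1[r][c] |= bits[r][c]
                    sig[r][c] |= bits[r][c]
        if c > 0:
            d = c - 1  # its right-hand neighbour column has just finished Pass 1
            for r in range(H):
                if sig0[r][d]:
                    out[2].append((r, d, neighbours(sig_p1, r, d)))
                elif not coded[r][d]:
                    out[3].append((r, d, neighbours(sig, r, d)))
                    sig[r][d] |= bits[r][d]
    return out


random.seed(1)
for _ in range(2000):
    w = random.randint(1, 8)
    sig0 = [[int(random.random() < 0.3) for _ in range(w)] for _ in range(H)]
    bits = [[random.randint(0, 1) for _ in range(w)] for _ in range(H)]
    assert one_scan(sig0, bits) == three_scans(sig0, bits)
print("one scan reproduces the three-scan reference")
```

The second map is what lets Pass 2 ignore Pass 3 results from two columns back while Pass 3 still sees them. A real coder would additionally route the interleaved symbols of the three passes to their own coding passes in the codestream, which this sketch does not model.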
Results and Discussion
We measured the encoding time of three test images (Lena, Baboon, and Peppers) to evaluate the proposed method against Taubman's Kakadu architecture (version 3.4). The simulations were run on a TI TMS320C6416 DSP, and the results are listed in Table 1. For Pass 1, the proposed method does not change the execution time, because Pass 1 is processed exactly as in Taubman's architecture: as shown in Fig. 3, all samples are scanned and samples belonging to Pass 1 are coded immediately. For Passes 2 and 3, the proposed method reduces the processing time by up to 41% and 32%, respectively, and the total time of all three passes by up to 22.6%. These results indicate that the proposed method substantially reduces the time spent on scanning and masking. On average, the computational complexity of the whole EBCOT is reduced by 22.6% compared with Taubman's architecture. Because the proposed method changes only when samples are scanned and coded, not the coding decisions themselves, its bit stream is identical to that of the original Kakadu implementation, and the PSNR performance is therefore unchanged.
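To show how the per-pass reductions combine into the total figure, assume (our assumption, not stated in the text) that the total EBCOT time is the sum of the three pass times. For one image, let $T_k^{K}$ and $T_k^{P}$ denote the Kakadu and parallel-pass times of Pass $k$. The ratio columns of Table 1 are then $r_k$, and the total reduction is a time-weighted average of the per-pass reductions:

```latex
R = \left(1 - \frac{\sum_{k=1}^{3} T_k^{P}}{\sum_{k=1}^{3} T_k^{K}}\right)\times 100\%
  = \sum_{k=1}^{3} w_k\,(1 - r_k)\times 100\%,
\qquad
w_k = \frac{T_k^{K}}{\sum_{j=1}^{3} T_j^{K}},
\quad
r_k = \frac{T_k^{P}}{T_k^{K}}.
```

Because Pass 1 is unchanged ($r_1 = 1$), only Passes 2 and 3 contribute, each weighted by its share $w_k$ of the Kakadu time. The total reduction therefore cannot exceed the larger per-pass reduction and is further diluted by the unchanged Pass 1 time.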
Table 1. Processing time of the proposed architecture compared with David Taubman's Kakadu for three 512×512 test images.
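| Kakadu Pass 1 (ms) | Kakadu Pass 2 (ms) | Kakadu Pass 3 (ms) | Parallel-pass Pass 1 (ms) | Parallel-pass Pass 2 (ms) | Parallel-pass Pass 3 (ms) | Parallel-pass/Kakadu, Pass 1 | Parallel-pass/Kakadu, Pass 2 | Parallel-pass/Kakadu, Pass 3 | Total time reduced (%) |
|---|---|---|---|---|---|---|---|---|---|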
In this letter, we proposed a pass-parallel context modeling method that merges three-pass coding into single-pass coding. Processing the three coding passes concurrently significantly reduces the coding time. The experimental results show that the computational complexity is reduced by 22.6% compared with Taubman's architecture.