Fringe patterns occur in many applications across different scientific disciplines and are mainly used for the characterization of object shapes and dimensions. Applications include interferometric synthetic aperture radar, holographic interference microscopy, photoelasticity measurements, digital holography, and even noninterferometric patterns such as structured imaging applications. However, such fringe patterns do not possess the typical frequency distributions found in natural photographic imagery, which results in suboptimal compression performance when using common image compression coder-decoder architectures (codecs). This paper focuses on one particular application, namely digital holographic microscopy (DHM), which is essentially digital holography applied to microscopy.
Holography, first discovered in the late 1940s by Dennis Gabor, measures the full wavefront of a scene by capturing both the amplitude and the phase information. However, practical applications for optical holography only started to appear in the 1960s, mainly due to the development of the laser. For many years, the recording of holograms was only possible by means of analog high-resolution light-sensitive film (like photographic film). Digital representation and reconstruction of holograms were already proposed in the 1970s,1 but the lack of adequate computing power and digital imaging devices made it impractical. Over time, however, with the advent of high-resolution digital image sensors (like CCDs) and increasingly faster computers, practical implementations and applications of digital holography started to appear in the 1990s. As a consequence, DHM has been successfully utilized for many different purposes such as the analysis of biological samples2 and materials, characterization of lenses,3 and tomography.
Because digital holography is becoming more widespread and is being applied in an increasing number of scientific fields, the need for an efficient digital representation technology is growing. Given the multidisciplinary nature of holography, various techniques have been experimented with. Earlier experiments involved the use of histogram-based approaches4 or relied solely on quantization principles.5 Over time, more advanced codecs were evaluated, such as directly coding phase-shifted holograms with the AVC and HEVC codecs6 or the JPEG and JPEG 2000 codecs.7 More recently, compression frameworks have been proposed that were tailored for digital holography, such as computer-generated holograms from encoded multiview video streams8 and a vector-lifting scheme for phase-shifted holograms.9 Alternatively, some proposals even designed specialized transforms for the efficient compression of holograms. One such example is Fresnelets,10 which can be interpreted as Fresnel-transformed B-spline wavelets that asymptotically converge to Gabor functions.11 Although they have several interesting properties, Fresnelets are not suitable for our present coding requirements. The reasons are (1) the lack of support for integer taps that enable lossless compression, (2) the inherent presence of complex-valued coefficients, even for real-valued off-axis holograms, (3) the occasional requirement of zero-padding to correctly simulate Fresnel propagation in order to avoid aliasing, and (4) the need to predetermine a focus depth parameter prior to compression of the data. Several other analysis functions have been investigated as well; one notable example is the use of Gabor wavelets12 for hologram coding: their excellent time-frequency uncertainty bounds and orientability allow for effective descriptions of localized frequency content.
Unfortunately, Gabor wavelets are not reversible and form an overcomplete representation, requiring some coefficient selection method (such as using the ridge of the Gabor wavelet transform12).
Moreover, the wide range of applications and recording setup parameters of DHM typically makes the compression requirements dependent on the use case. In contrast, we aim to propose a generic, modality-independent coding architecture, targeted primarily at lossless and near-lossless compression. Doing so enables us to provide a framework suited for archiving the raw hologram data, while still allowing postprocessing algorithms (such as speckle filtering, unwanted order removal, extended focusing, and so on) to be applied after compression. As such, we propose a framework based on JPEG 2000, with specific additional extensions that significantly improve the off-axis DHM data representation.
This paper is an extension of our work published in two different conference papers.13,14 We extend our previous work by (1) thoroughly presenting and comparing our coding architecture against the state-of-the-art, (2) providing the necessary technical details in order to practically realize our coding system and ensuring the reproducibility of the results, (3) providing experimental results on a larger and more varied collection of holograms, and (4) evaluating more coding technologies (such as JPEG and JPEG-LS) and more wavelet decompositions (5-level wavelet decompositions, with and without directional wavelets and/or packet decompositions).
This paper is structured as follows. In Sec. 2, we briefly explain the principle of off-axis holography and discuss the data characteristics of such digital holograms. Subsequently, we introduce JPEG 2000 in Sec. 3 and explain how it can be efficiently configured to improve the compression performance for digital holographic imagery. Section 4 then discusses our proposed extensions to the JPEG 2000 standard to further improve the compression efficiency by optimally exploiting the specific data characteristics of hologram recordings. We then report on the experiments in Sec. 5. Finally, we present the conclusions in Sec. 6.
Coding of Holograms
Digital holography is a measurement technique based on the interference of electromagnetic waves. This technology allows for the recovery of both the amplitude and the phase of the wavefront and enables the full description and visualization of three-dimensional objects. Besides its attractiveness for entertainment purposes, this is an extremely useful property for many measurement and visualization applications. In particular, the use of digital holography offers many advantages for microscopic applications. Regular microscopes only provide a two-dimensional (2-D) snapshot of the intensity at a single focal plane, while holographic microscopes, on the other hand, capture the full wavefront emanating from the sample. This offers several benefits and substantially expands the available tools for data analysis. For example, the phase data give quantifiable information about optical distance and topographical information, enabling postcapture digital refocusing capabilities. Moreover, a holographic microscope has no image-forming lens and will not suffer from the typical optical aberrations caused by intrinsic lens imperfections of regular microscopes.
Many methods with varying properties, degrees of quality, and feasibility exist for recording holograms15 such as Fourier holography, Gabor holography, phase-shifting digital holography, and off-axis (Fresnel) holography. The holograms used in this paper were recorded using the off-axis configuration, as shown in Fig. 1, also known as Leith–Upatnieks holography.16 Such a configuration allows one to capture a single real-valued recording from which the sought wavefront can be subsequently extracted. Basically, the CCD sensor captures the amplitude of the interference pattern that results from the superposition of a reference beam and an object beam. The expression for the detected irradiance I_H, with R and O representing the amplitudes of the reference and object beams, respectively, and where * is the complex conjugate operator, is then given by

I_H = |R + O|^2 = |R|^2 + |O|^2 + R*O + RO*. (1)
Reproduction of the hologram using the same reference beam to illuminate the recorded hologram effectively requires modulating the irradiance2 with R and can be described by

R I_H = R(|R|^2 + |O|^2) + R^2 O* + |R|^2 O. (2)
The term |R|^2 O in Eq. (2), or the real image, is directly proportional to the sought object beam. However, the same equation shows that the reproduced hologram also contains a number of additional undesired terms. The first term, R(|R|^2 + |O|^2), represents the zero-order diffraction, and the second term, R^2 O*, is a so-called twin image. With off-axis holography, the reference beam will reach the CCD at an offset angle θ instead of being collinear with the object-beam axis. The detected reference wave will, therefore, approximately be a tilted plane wave, denoted by R = R_0 exp(i2π f_c x). The spatial frequency f_c depends on the incidence angle θ.17 The resulting irradiance will now be the following:

I_H = |R_0|^2 + |O|^2 + R_0 exp(-i2π f_c x) O + R_0 exp(i2π f_c x) O*. (3)
Equation (3) shows that the terms can be effectively separated in the frequency domain. In principle, a sufficiently high carrier frequency allows the object beam to be recovered unambiguously. However, in practice, large overlaps are often still present in the frequency domain, especially with the zero-order term: the total term separation constraints are often too stringent, leaving limited spectral support for the real image. Moreover, directly extracting the object wave field is not straightforward as it requires additional nontrivial processing steps. In fact, various techniques have been proposed in the literature for extracting the real image,18 all with their respective advantages and disadvantages. Examples of such methods are (1) the basic spatial filtering techniques19,20 that have the drawback of altering the reconstructed signal as a consequence of their global filtering effect,21 or the more advanced (2) wavelet-domain coefficient selection,22 (3) (linear and nonlinear) Fresnelet filtering,18 or (4) nonlinear cepstrum filtering.23,24
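To make the frequency-domain separation tangible, the following sketch simulates a 1-D off-axis recording with a tilted plane reference wave. All parameters (grid size, carrier frequency, object profile) are hypothetical and chosen only for illustration: the cross terms carrying the object information appear around the carrier bin, well away from the low-frequency zero-order terms.

```python
import numpy as np

# Illustrative 1-D off-axis interference pattern (hypothetical parameters).
N = 512
x = np.arange(N)
carrier_bin = 64
f_c = carrier_bin / N                        # spatial carrier frequency
R = np.exp(2j * np.pi * f_c * x)             # tilted plane reference wave
O = 0.2 * np.exp(-((x - N / 2) ** 2) / (2 * 40.0 ** 2))  # smooth object beam

I = np.abs(R + O) ** 2                       # detected real-valued irradiance
spectrum = np.abs(np.fft.fft(I - I.mean()))  # remove the DC term for clarity

# With DC removed, the dominant remaining component is the up-modulated
# object term sitting at the carrier bin.
peak_bin = int(np.argmax(spectrum[: N // 2]))
```

In an actual off-axis setup, the carrier must be high enough (and the object spectrum narrow enough) for this separation to leave usable spectral support for the real image, which is exactly the constraint discussed above.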
Moreover, the quality of the resulting real image depends on additional factors for which no general solution exists: (1) Scenes with objects at multiple depths have to be modeled using postprocessing such as extended focusing,25 (2) the applied quality metric is extremely use case-specific, as it will, e.g., have to determine the relative importance between the amplitude and the phase information, and (3) distortions caused by the setup’s nonidealities, such as aberrations in the microscope objective and in the wave planarity, have to be taken into account and compensated for.26
The selection of specific methods for preprocessing the recorded image before compression will thus inevitably limit the scope of an applied compression algorithm with respect to the range of supported modalities. Hence, our proposal instead uses a modality-independent compression architecture, intended to allow for compressing the entire interferogram in a progressive lossy-to-lossless manner; that is, the proposed coding architecture controls the losses incurred by coding, eventually offering lossless decoding of the input data when needed.
Due to the nature of off-axis holography, the omnipresent high-frequency components manifest themselves as salient fringes in the hologram and cause the power spectrum distribution to deviate significantly from the typical distribution found in regular imagery (Fig. 2). The important basis functions, therefore, have to consist of well-oriented high-frequency components, as they will largely represent the real image to be viewed. We confirmed this13 by using independent component analysis to formally characterize the nature of the information content in multivariate data. The twin image is also contained in these high-frequency components. However, this is not an issue for off-axis holography compression, because the twin image is the complex conjugate of the real image in the frequency domain. Thus, no extra information needs to be coded in this respect. Some recent publications have also confirmed the importance of orientation and high frequencies in holography by using Gabor wavelets, evaluated at multiple orientations,12 or by using the wavelet-bandelets transform.27 However, these publications mainly used coefficient thresholding and did not provide complete coding frameworks. Finally, the good spatial localization properties of wavelets are of greater importance for DHM data, because the shallow focus distances result in spatially localized structures in the hologram (see Fig. 7).
In the following section, we will first concisely introduce the JPEG 2000 coding architecture on which we will build our system.
JPEG 2000 is a scalable wavelet-based still image coding system. The JPEG 2000 standard represents a family of standards, where Part 1 describes the core coding technology. The other parts define extensions by amending Part 1 with new features or capabilities, thus making JPEG 2000 modular by design. Its core technology, as defined in Part 1, offers a rich set of features such as native tiling support, resolution and quality scalability, progressive decoding, lossy and lossless compression, region-of-interest coding, error resilience, true random access in the code-stream, and so on. Moreover, JPEG 2000 natively supports various color formats, such as RGB, YCbCr, and arbitrary N-channel, at bit depths ranging from 1 to 38 bits per channel. Especially important is the rate-distortion (RD) optimization capability that is inherent to the design of JPEG 2000 and allows for optimal control of the bit rate of the produced code-stream while minimizing the overall distortion in a lossy compression scenario.
Its modular and extendable design, together with its excellent RD characteristics, makes JPEG 2000 a well-suited codec for the compression of many types of holographic images.
This section gives a brief description of the core coding technology of JPEG 2000. As shown in Fig. 3, the core architecture of JPEG 2000 can be roughly divided into two main parts: (1) the discrete wavelet transform (DWT), and (2) the two-tiered embedded block coding by optimized truncation (EBCOT).28,29
The first step in encoding an image with JPEG 2000 consists of a multilevel 2-D DWT using the Mallat dyadic decomposition structure,30 where only the low-pass sub-bands are further decomposed in the subsequent resolution levels. JPEG 2000 employs two wavelet kernels, both synthesized using the lifting scheme: (1) the integer 5/3 kernel for lossless coding and (2) the floating-point 9/7 kernel for lossy to near-lossless coding. Both kernel implementations are strictly defined by the JPEG 2000 specification. Because the lifting coefficients of the 5/3 kernel are rational numbers with denominators that are powers of 2, and in order to guarantee lossless reconstruction at the decoder side, the specification restricts its implementation to integer calculus only, using well-defined rounding rules. The 9/7 kernel, on the other hand, offers much better energy compaction than the 5/3 kernel. However, because its coefficients are real numbers, it cannot easily be fit into an integer-based calculus system without severely sacrificing the energy compaction performance. For this reason, JPEG 2000 specifies this kernel using floating-point calculus, making it an irreversible transform. Extensive results on natural data31 show that the lossy 9/7 wavelet kernel performs better in the RD sense than the 5/3 wavelet kernel; however, due to its pure-integer implementation, the 5/3 kernel is better suited for lossless compression.
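The reversibility of the integer lifting kernel can be illustrated with a small sketch: one 1-D level of 5/3-style lifting with integer-only rounding. Note that the boundary handling below is simplified to edge replication, whereas the standard prescribes whole-sample symmetric extension; as long as the forward and inverse steps use the same extension, the transform remains exactly invertible.

```python
import numpy as np

# Sketch of one 1-D level of reversible 5/3-style lifting. The predict and
# update steps use integer floor rounding, so the inverse reproduces the
# input exactly (">>" on int64 is an arithmetic shift, i.e., floor division).
def fwd_53(x):
    x = np.asarray(x, dtype=np.int64)
    even, odd = x[0::2].copy(), x[1::2].copy()
    even_r = np.append(even, even[-1])              # simplified right extension
    # predict: d[i] = odd[i] - floor((even[i] + even[i+1]) / 2)
    d = odd - ((even_r[:-1] + even_r[1:]) >> 1)
    d_l = np.insert(d, 0, d[0])                     # simplified left extension
    # update: s[i] = even[i] + floor((d[i-1] + d[i] + 2) / 4)
    s = even + ((d_l[:-1] + d_l[1:] + 2) >> 2)
    return s, d

def inv_53(s, d):
    d_l = np.insert(d, 0, d[0])
    even = s - ((d_l[:-1] + d_l[1:] + 2) >> 2)      # undo update
    even_r = np.append(even, even[-1])
    odd = d + ((even_r[:-1] + even_r[1:]) >> 1)     # undo predict
    out = np.empty(len(s) + len(d), dtype=np.int64)
    out[0::2], out[1::2] = even, odd
    return out
```

Running the inverse on the forward output reconstructs the input sample-exactly, which is precisely the property that makes the integer kernel suitable for lossless coding.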
After the wavelet decomposition step, the resulting sub-bands are further entropy coded with the two-tiered EBCOT. For each of the sub-bands, EBCOT Tier-1 starts out by grouping the wavelet coefficients into equally sized rectangular areas, so-called code-blocks. Then, it performs entropy coding on each of these code-blocks by employing context-based binary arithmetic coding. The wavelet coefficients are scanned per bit-plane, starting with the most significant bit-plane, using three types of coding passes in alternating order, namely the significance coding pass, the magnitude refinement coding pass, and the cleanup pass. As such, every coefficient bit becomes a member of exactly one of these three coding passes and gets encoded into the respective code-block bit-stream. The end of every coding pass, and inherently also the end of every bit-plane scan, marks a potential truncation point in the resulting bit-stream. Along with each of these truncation points, Tier-1 also estimates the associated mean square error (MSE) distortion reduction values that will drive the Tier-2 RD optimization process. Thus, after Tier-1 is done, every code-block is represented by an independently compressed bit-stream and an associated table of truncation points with distortion reduction estimates per truncation point.
EBCOT Tier-2 represents the actual RD optimization and packetization process, responsible for generating the final JPEG 2000 code-stream. Given the rate and/or quality constraints, pieces of the individual code-block bit-streams from Tier-1 are selected and recombined into larger packets to form the final JPEG 2000 code-stream. The selection of the bit-stream pieces is performed in an RD optimal manner by prioritizing bit-stream chunks based on their respective RD costs over less important chunks, while still maintaining causality—i.e., by maintaining the information dependency between chunks to allow for correct decoding. The RD optimization stops when the rate and/or quality constraints are met, or when all bit-stream chunks are included in the final code-stream. Finally, the necessary JPEG 2000 headers and markers are appended in order to signal the required decoding options.
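The Tier-2 selection step can be sketched as a greedy, slope-driven allocator. The sketch below assumes (as EBCOT guarantees by pruning non-convex-hull truncation points) that each code-block's chunks have decreasing distortion-rate slopes; the chunk data in any test is hypothetical.

```python
import heapq

# Greedy sketch of Tier-2 bit allocation. Each code-block contributes an
# ordered list of (delta_rate, delta_distortion) chunks between successive
# truncation points; chunks of one block must be taken in order (causality),
# so a block's next chunk is only offered once its predecessor is taken.
def select_chunks(blocks, budget):
    heap = []
    for b, chunks in enumerate(blocks):
        if chunks:
            dr, dd = chunks[0]
            heapq.heappush(heap, (-dd / dr, b, 0))   # steepest slope first
    rate, taken = 0, []
    while heap:
        _, b, i = heapq.heappop(heap)
        dr, dd = blocks[b][i]
        if rate + dr > budget:
            continue  # chunk no longer fits; its successors are dropped too
        rate += dr
        taken.append((b, i))
        if i + 1 < len(blocks[b]):
            ndr, ndd = blocks[b][i + 1]
            heapq.heappush(heap, (-ndd / ndr, b, i + 1))
    return rate, taken
```

Under the convexity assumption, always taking the steepest remaining distortion-rate slope yields the RD-optimal selection for a given rate budget, which mirrors the behavior described above.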
Full Packet Decomposition with JPEG 2000
As stated before, due to the nature of off-axis holography, a significant part of the important information in these recordings is contained in the high-frequency bands. This contrasts with natural images where most of the visually meaningful information resides in the lower-frequency bands. Thus, replacing the Mallat dyadic wavelet transform with a full packet wavelet transform allows for further decomposition of the high-pass sub-bands to improve the compression efficiency.
By default, JPEG 2000 Part 1 features only the Mallat dyadic decomposition. However, JPEG 2000’s Part 2 Arbitrary Decomposition (AD) extension enables the use of alternative decomposition structures, signaled within two additional marker segments in the code-stream. As such, using this extension enables the configuration of various packet decomposition structures. The AD extension specifies a decomposition structure in two parts: (1) an underlying decomposition to generate the resolution levels and (2) per resolution level, the extra sublevel decomposition of the respective high-pass sub-bands.
The resolution levels are defined similarly to Part 1 with the Mallat dyadic decomposition, with the difference that the splitting of the low-pass sub-band at each level can be either in both horizontal and vertical directions or only in one of the two directions. This sequence of resolution reduction splits is signaled in the down-sampling factor styles (DFS) marker in the code-stream, represented as an array, Ddfs, containing two-bit symbols (“1” = both rows and columns, “2” = rows only, and “3” = columns only). In the absence of a DFS marker, the decoder assumes both-ways splitting as the default, which is the compliant case with a Part 1 code-stream.
Subsequently, one or more AD Style (ADS) markers can be used to signal the sublevel decomposition of the high-pass sub-bands. Unfortunately, according to the standard, the ADS syntax only allows for two additional decompositions of the high-pass sub-bands, inherently limiting the possible wavelet packet transforms. The ADS marker contains two arrays: (1) DOads specifies the maximum number of split levels per resolution using entries of two-bit values, and (2) DSads specifies the type of extra split using two-bit symbols (0 = no extra split, 1 = both rows and columns, 2 = rows only, and 3 = columns only). DOads entries with a value of 1 indicate that no extra high-pass decompositions are required, while values 2 and 3 indicate one and two extra decompositions, respectively. The DSads array, on the other hand, describes the depth-first traversal of the decomposition tree, with the sub-bands ordered from the highest resolution to the lowest and, within each level, as HH, LH, and HL (or the corresponding sub-bands for one-directional splits), depending on the applied split type.
Hence, the AD extension of JPEG 2000 limits full packet decompositions to three levels. As such, the application of four or more levels in such a full packet decomposition structure, as illustrated in Fig. 4(c), is not possible without modification of the standard. Figure 4(b) visualizes the closest matching decomposition style to a full packet with four levels that can be described by the AD extension (designated as the “partial packet decomposition”). Finally, to illustrate how the AD extension works, we show in Table 1 some of the more commonly known decomposition structures and how to signal them. The last column lists the code-stream signaling cost in bits.
Various well-known decomposition structures, using the arbitrary decomposition (AD) extension syntax (signaling cost reflects additional required bits, excluding the cost for NL).
|Decomposition style||NL||Ddfs||DOads||DSads||Signaling cost|
|3-level full packet||3||111||321||18 1’s||80 bits|
|4-level partial packet||4||1111||3321||33 1’s||112 bits|
|4-level full packet||NA||NA||NA||NA||NA|
|5-level full packet||NA||NA||NA||NA||NA|
|5-level Federal Bureau of Investigation (FBI)||5||11111||2321||11101111111111111||88 bits|
Proposed Extensions for JPEG 2000
This section discusses the extensions to JPEG 2000 that can significantly enhance the compression efficiency of holographic microscopy images. Figure 5 shows that our two proposed extensions are part of the transform phase of the codec and replace the default DWT block in the JPEG 2000 encoder scheme of Fig. 3.
Truly Arbitrary Packet Decompositions
As explained in Sec. 3.3, JPEG 2000’s AD extension can only handle two additional decompositions within high-pass sub-bands. To overcome the limitation of the AD extension, we designed our codec to employ an alternative code-stream syntax that is able to truly describe arbitrary wavelet decompositions14 including full packet decompositions containing more than three levels.
The proposed syntax describes a decomposition structure as an ordered array of split-operations that work on a stack of available sub-bands. Each split-operation is represented as a tuple (s, m, r) in the array:
1. Symbol s represents the split type (XY = both rows and columns, X− = rows only, −Y = columns only, and − = termination). Furthermore, the selected split type also determines the number of bits required to signal m and their relation with the generated sub-bands, as given in Table 2.
2. A binary pattern mask m indicates which of the resulting sub-bands will be added to the stack after splitting for further processing: a bit-value of 1 marks the respective sub-band for further processing, whereas 0 signals termination. When s = −, m contains no bits, meaning that in this case m is simply not signaled.
3. A positive integer value r is reserved, of which the functional definition depends on the actual applied split type s and the associated value of mask m:
a. If s ≠ − and m ≠ 0, then r specifies the number of times to recursively repeat the split operation onto all of the respectively generated sub-bands.
b. If s ≠ − and m = 0, then the value of r is not signaled, because none of the created sub-bands will be further processed anyway.
c. On the other hand, if s = −, then r specifies the number of times to repeat this termination operation on the stack of sub-bands (thus, one termination is applied with r = 0). Applying the termination operation on the sub-band stack is identical to removing the top element.
The relation between the applied split type, signaled by s, and the definition of the mask bits mn, indicating the termination of the decomposition (mn = 1 marks the respective sub-band for further processing, whereas mn = 0 signals termination). It also specifies the bit-stream encoding for r, with Lsbs equal to the sub-band stack size just before processing the respective tuple.
|Split type s||Code-words for s||Generated sub-bands (in order)||Associated mask bits||Bit coding for r||No. bits for r|
|XY||11b||HH, LH, HL, LL||m3, m2, m1, m0||Golomb||r+1|
|X−||10b||HX, LX||m1, m0||Golomb||r+1|
|−Y||01b||XH, XL||m1, m0||Golomb||r+1|
The decomposition process starts with the original image as the only available sub-band on the stack. Subsequently, the process iterates over the list of (s, m, r)-tuples, and with each tuple the corresponding split-operation is executed. Generated sub-bands that result from a split operation are immediately pushed onto the sub-band stack, following the HH, LH, HL, LL; HX, LX; or XH, XL ordering of Table 2. The process ends when all tuples on the list are processed (left-over sub-bands on the stack are not further processed).
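The stack semantics can be sketched with a small interpreter that tracks only sub-band counts. This sketch is a simplified model: it handles only all-ones and all-zeros masks (the cases occurring in the full and partial packet examples of Table 3) and does not perform any actual wavelet filtering.

```python
# Sketch of the proposed split-operation semantics, counting leaf sub-bands.
# Only all-ones and all-zeros masks are modeled here.
def leaf_count(ops):
    """ops: list of (s, m, r) tuples; returns the number of leaf sub-bands."""
    stack = 1   # the original image is the only band awaiting processing
    done = 0    # bands that are finalized (terminated or never pushed)
    for s, m, r in ops:
        if s == "-":              # termination: pop r + 1 bands off the stack
            stack -= r + 1
            done += r + 1
            continue
        stack -= 1                # split the top-of-stack band
        k = 4 if s == "XY" else 2
        if m == "1" * len(m):     # recurse r times, push every generated band
            stack += k ** (r + 1)
        elif m == "0" * len(m):   # split once, push nothing
            done += k
        else:
            raise NotImplementedError("mixed masks omitted in this sketch")
    return stack + done           # left-over stack bands are leaves as well

# The 3-level full packet, 4-level partial packet, and 5-level full packet
# examples from the proposed syntax:
counts = [leaf_count([("XY", "1111", 2)]),
          leaf_count([("XY", "1111", 2), ("XY", "0000", 0)]),
          leaf_count([("XY", "1111", 4)])]
```

A 3-level full packet yields 4^3 = 64 leaf sub-bands; splitting its remaining low-pass band once more (the 4-level partial packet) yields 67; a 5-level full packet yields 4^5 = 1024.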
Table 3 shows how to specify some commonly used decomposition styles. Please note that although possible, in practice the extended syntax will not be used to specify a Mallat dyadic decomposition style, as this is the JPEG 2000 default anyway.
Various well-known decomposition structures using our proposed syntax.
|Decomposition style||Operation tuples||Signaling cost (bits)|
|3-level full packet||(XY,1111b,2)||9|
|4-level partial packet||(XY,1111b,2), (XY,0000b,0)||15|
|4-level full packet||(XY,1111b,3)||10|
|5-level full packet||(XY,1111b,4)||11|
|5-level FBI||(XY,1111b,1), (−,13), (XY,1111b,1) (−,16), (XY,1111b,1), (−,16), (XY,1111b,1), (−,15), (XY,0000b,0)||65|
Signaling of the decomposition style in the final code-stream happens via a newly proposed XAD marker that encapsulates the binary representation of the array of split-operation tuples, padded to the byte boundary with 0 bits. The marker length field determines the number of elements in the array. It is valid in the main and tile-component headers to allow defining different decomposition structures per tile-component.
Directional Adaptive Discrete Wavelet Transform
A salient feature of off-axis digital holographic images is the strongly oriented interference fringes. This hints that the use of directional wavelet transforms can improve the compression performance, as they are able to align with the directional features of the data.13 For that reason, we show how the JPEG 2000 architecture can easily be extended to include the block-based directional adaptive DWT (DA-DWT).32–34
We employ a separable lifting scheme, similar to that of the JPEG 2000 DWT, but with modified prediction and update functions that are no longer confined to the horizontal direction (1,0) for row-based splits and the vertical direction (0,1) for column-based splits. Doing so enables the directional DWT to adapt to local geometric features by adjusting its operational direction. However, all the applied direction vectors are also required at the decoder side in order to perform the inverse DA-DWT operation. From a compression performance point of view, it is evident that the unavoidable increment in rate for signaling these directions to the decoder should not jeopardize the rate reduction brought by the improved energy compaction of the transform. Thus, in practice, the adaptability of the directional DWT is restricted by (1) allowing the selection of only one vector per block of samples for the row and column splits, (2) limiting the directions to a discrete set of vectors, and (3) only performing the DA-DWT on the low-pass sub-bands (i.e., LL, LX, or XL bands).
With the first restriction, DA-blocks are defined in the same way as JPEG 2000’s code-blocks and precincts. Per decomposition level, they represent a grid of equally sized rectangles that anchor at (0,0), with their dimensions restricted to powers of 2. The width and height parameters are signaled in a dedicated marker segment, which we label XDA, for the DA-DWT. The smallest possible DA-block is .
Second, the proposed extension uses a discrete set of direction vectors for row-based splits and their orthogonal counterparts for column-based splits (for more information on direction vectors and the associated lifting schemes, we refer to Chang and Girod32). Note that the inclusion of the vectors (1,0) and (0,1) allows the DA-DWT to fall back to the classic DWT in the case where no dominant direction is present. Thus, each vector can be represented as an index in the set of available vectors. Moreover, it is also possible to use different sets of direction vectors, depending on the use case and the specific image characteristics. Per DA-DWT level, we use the JPEG 2000 tag-tree system to encode the two grids of direction indexes (one for the row-based split and one for the column-based split). The actual tag-tree values (two for every DA-block) are coded in synchronization with the first instance of any of the possibly associated code-blocks; i.e., depending on the chosen dimensions of the code-blocks and DA-blocks, each DA-block can relate to one or more code-blocks. This also means that, at very low bit-rate constraints, it can happen that no code-block contributions exist whatsoever for a specific DA-block. In such a case, the associated direction indices are simply skipped and not encoded in the tag-trees. The resulting encoded bit-stream is signaled in an XDA marker segment.
Third, the restriction of allowing the DA-DWT only on the low-pass sub-bands does not negatively impact the compression performance of the coding framework. As demonstrated in Fig. 6, an intrinsic property of the directional DWT causes the resulting high-pass coefficients to be already horizontally or vertically aligned after the directional wavelet prediction. As such, a single parameter (i.e., one byte) in the XDA marker segment signals the number of decomposition levels that use the DA-DWT, starting at the first decomposition level.
Technically, an encoder implementation is free to use any type of direction vector selection mechanism to drive the forward DA-DWT. To avoid being trapped in local minima, our framework takes a full-search approach by trying all directions and selecting the one that minimizes the norm of the resulting high-pass coefficients.
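The full-search idea can be sketched as follows for a single DA-block. This is a simplified stand-in, not the codec's exact lifting: the candidate vector set and the two-tap averaging predictor are illustrative, and the cost used here is the l1-norm of the prediction residual (one common choice of norm).

```python
import numpy as np

# Simplified full-search direction selection for one DA-block: predict the
# odd rows from directionally shifted even rows and keep the vector that
# minimizes the l1-norm of the residual (the would-be high-pass signal).
CANDIDATES = [(-2, 1), (-1, 1), (0, 1), (1, 1), (2, 1)]  # illustrative set

def best_direction(block):
    even, odd = block[0::2, :], block[1::2, :]
    below_src = np.vstack([even[1:], even[-1:]])    # even row under each odd row
    best, best_cost = None, np.inf
    for dx, dy in CANDIDATES:
        above = np.roll(even, -dx, axis=1)          # shift along the direction
        below = np.roll(below_src, dx, axis=1)
        cost = np.abs(odd - (above + below) / 2.0).sum()
        if cost < best_cost:
            best, best_cost = (dx, dy), cost
    return best

# Diagonal fringes drifting one pixel to the left per row should make the
# search select the matching direction (-1, 1):
n = 16
yy, xx = np.mgrid[0:n, 0:n]
fringes = np.sin(2 * np.pi * (xx - yy) / n)
```

On the synthetic fringe block, the aligned candidate produces a near-zero residual while the axis-aligned fallback does not, illustrating why oriented fringes benefit from directional prediction.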
The experiments in this paper make use of 12 off-axis holographic test images, kindly provided by Lyncée Tec SA, the Microgravity Research Center (MRC, ULB), and Nicolas Pavillon. For the acquisition of microscopic off-axis holographic images, two typical setups exist. One setup uses transmission imaging, which is well suited for transparent specimens such as biological cells or lenses. The other setup uses reflection imaging, which is mainly useful for capturing opaque objects, such as in surface measurements. Figure 7 shows the thumbnail versions of the holographic images, with their specifications given in Table 4. All images contain 8 bpp samples.
Specification of the 8-bit images that were used for the experiments.
|Image||Provider||Content description||Dimensions||Imaging type|
|Neuron||Lyncée Tec||Slice of neuronal tissue||1024×1024||Transmissive|
|Erythrocyte||Lyncée Tec||Erythrocytes from a blood sample||512×512||Transmissive|
|Microlenses||Lyncée Tec||Array of microlenses||1024×1024||Transmissive|
|Ball||Lyncée Tec||Surface of a rough microball||1024×1024||Reflective|
|Scratch||Lyncée Tec||Scratch in brittle material||1024×1024||Reflective|
|Seaweed 1||MRC/ULB||Green seaweed specimen||1280×1024||Transmissive|
|Seaweed 2||MRC/ULB||Green seaweed specimen||2048×2048||Transmissive|
|Seaweed 3||MRC/ULB||Green seaweed specimen||1280×1024||Transmissive|
|Coin||N. Pavillon||Speckle hologram of a coin||512×512||Reflective|
|Mirror||N. Pavillon||Scratch on a mirror||512×512||Reflective|
|Sine||N. Pavillon||Artificial object with sinusoidal amplitudes at multiple frequencies and orientations||512×512||Transmissive|
|Pollen||N. Pavillon||Solution of yew pollens||512×512||Transmissive|
Objective Quality Metrics
The experiments in this paper report on both lossless and lossy compression performances. In the case of lossless compression, where the input signal and the reconstructed signal are identical, the compression performance is quantified in terms of average bit rate in bits per pixel. In order to facilitate easier comparisons between different compression strategies, we present relative bit rates with respect to a common reference, which is JPEG 2000 with a 4-level Mallat wavelet decomposition structure.
In the case of lossy compression, we calculate for a given set of bit rates the corresponding reconstruction quality (or distortion) as the peak signal-to-noise ratio (PSNR). The PSNR is essentially a logarithmic representation of the MSE between the original signal and the reconstructed signal and is defined as PSNR = 10 log10[(2^B − 1)^2 / MSE] dB, where B denotes the bit depth of the image (here, B = 8, so the peak value is 255).
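As a simple illustration, the PSNR computation for the 8 bpp test images can be sketched in a few lines of NumPy; this is a generic reference implementation, not tied to any particular codec:

```python
import numpy as np

def psnr(original: np.ndarray, reconstructed: np.ndarray, bit_depth: int = 8) -> float:
    """Peak signal-to-noise ratio in dB for integer images of the given bit depth."""
    mse = np.mean((original.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical signals, i.e., the lossless case
    peak = (2 ** bit_depth) - 1  # 255 for the 8 bpp holograms used here
    return 10.0 * np.log10(peak ** 2 / mse)
```

For identical signals the MSE is zero and the PSNR is unbounded, which is why lossless performance is reported separately in terms of bit rate.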
For lossy compression, the paper reports summarized RD results using the Bjøntegaard delta PSNR metric (BD-PSNR),35 which is a commonly accepted objective metric for image compression performance evaluations. The BD-PSNR methodology computes the difference between two rate-distortion (RD) curves as the surface area between the curves within the operating bit-rate range, divided by the length of the integration interval (see Fig. 8). In this paper, the bit rates between which the PSNR differences are measured for the BD-PSNR metric35 lie between 0.125 and 2.00 bpp.
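To make the methodology concrete, the BD-PSNR computation can be sketched as follows: each RD curve is fitted with a cubic polynomial in the log-rate domain, and the difference between the two fits is averaged over the overlapping bit-rate interval. This is a generic sketch of the Bjøntegaard method, not the exact tooling used for the reported numbers:

```python
import numpy as np

def bd_psnr(rates_ref, psnr_ref, rates_test, psnr_test):
    """Average PSNR gap (in dB) of the test RD curve over the reference curve."""
    lr_ref = np.log10(np.asarray(rates_ref, dtype=np.float64))
    lr_test = np.log10(np.asarray(rates_test, dtype=np.float64))
    # Fit each curve with a cubic polynomial in the log-rate domain.
    p_ref = np.polyfit(lr_ref, np.asarray(psnr_ref, dtype=np.float64), 3)
    p_test = np.polyfit(lr_test, np.asarray(psnr_test, dtype=np.float64), 3)
    # Integrate both fits over the overlapping bit-rate interval.
    lo, hi = max(lr_ref.min(), lr_test.min()), min(lr_ref.max(), lr_test.max())
    int_ref = np.diff(np.polyval(np.polyint(p_ref), [lo, hi]))[0]
    int_test = np.diff(np.polyval(np.polyint(p_test), [lo, hi]))[0]
    # The average vertical distance between the two fitted curves.
    return (int_test - int_ref) / (hi - lo)
```

A curve that is uniformly 2 dB above the reference over the whole interval yields a BD-PSNR of 2 dB, matching the intuition of an "average quality gain at equal rate."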
Decomposition Structures and JPEG 2000 Settings
The purpose of these experiments is to assess whether JPEG 2000, or the proposed extended JPEG 2000-compatible coding architecture, can be used to efficiently compress off-axis holographic image data. As such, it is natural to include configurations that are fully compliant with the current JPEG 2000 standard, including the configurations that rely on the AD extension of JPEG 2000 Part 2. More specifically, we test the “Mallat dyadic,” the “3-level full packet,” and the “4-level partial packet” decomposition structures as listed in Table 1. On the other hand, by using our proposed extended syntax for the decomposition structures, we also test the “4-level full packet” and the “5-level full packet” decompositions, as mentioned in Table 3. Still, given the image dimensions at hand, a decomposition tree of typically four levels suffices to reach optimal energy compaction.
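To illustrate the difference between these decomposition structures, the following toy sketch builds the leaf sub-bands of a Mallat (dyadic) and a full packet tree recursively; a simple Haar split stands in for the actual JPEG 2000 wavelet kernels:

```python
import numpy as np

def haar_split(band):
    """One 2-D Haar analysis step: returns the (LL, HL, LH, HH) quarter-size bands."""
    a = band.astype(np.float64)
    lo = (a[:, 0::2] + a[:, 1::2]) / 2.0   # horizontal low-pass
    hi = (a[:, 0::2] - a[:, 1::2]) / 2.0   # horizontal high-pass
    return ((lo[0::2] + lo[1::2]) / 2.0,   # LL
            (hi[0::2] + hi[1::2]) / 2.0,   # HL
            (lo[0::2] - lo[1::2]) / 2.0,   # LH
            (hi[0::2] - hi[1::2]) / 2.0)   # HH

def decompose(band, levels, full_packet=False):
    """List of leaf sub-bands for a Mallat (dyadic) or a full packet tree."""
    if levels == 0:
        return [band]
    ll, hl, lh, hh = haar_split(band)
    if full_packet:  # every sub-band, including the high-pass ones, is split again
        return sum((decompose(b, levels - 1, True) for b in (ll, hl, lh, hh)), [])
    return decompose(ll, levels - 1) + [hl, lh, hh]  # Mallat: split only the low-pass

img = np.random.default_rng(0).normal(size=(256, 256))
mallat = decompose(img, 4)                     # 3 detail bands per level + final LL
packet = decompose(img, 3, full_packet=True)   # 4**3 equally sized sub-bands
```

A 4-level Mallat tree yields 3·4 + 1 = 13 sub-bands, whereas a 3-level full packet tree yields 4³ = 64; both are complete transforms covering the full sample grid. Because the packet structures also subdivide the high-pass bands, they localize the high-frequency fringe carrier of off-axis holograms far better than the dyadic tree.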
In addition to the wavelet packet transform, our framework also provides support for directional wavelets, for which results are also presented. In combination with the described packet decomposition structures, we include the results using a DA-DWT for the first two decomposition levels, applied only on the low-pass sub-bands with DA-blocks of . Please note that the results include the extra overhead cost for signaling the direction vectors, using the described tag-tree encoding methodology.
For the lossless compression experiments, we make use of the standard integer-based 5×3 wavelet kernel. For the lossy compression results, we rely on the more efficient, but inherently lossy, 9×7 kernel. All the experiments use code-blocks of , and precincts and tiling are disabled.
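For reference, the reversible 5/3 kernel can be realized with two integer lifting steps (predict and update). The sketch below is a one-dimensional version assuming an even-length signal, using the whole-sample symmetric boundary extension prescribed by JPEG 2000 Part 1:

```python
import numpy as np

def fwd_53(x):
    """Forward reversible 5/3 lifting on an even-length 1-D integer signal."""
    x = np.asarray(x, dtype=np.int64)
    even, odd = x[0::2], x[1::2]
    # Predict step: detail = odd - floor((left_even + right_even) / 2)
    right = np.append(even[1:], even[-1])      # symmetric extension at the tail
    d = odd - ((even + right) >> 1)
    # Update step: approx = even + floor((left_detail + right_detail + 2) / 4)
    left = np.insert(d[:-1], 0, d[0])          # symmetric extension at the head
    s = even + ((left + d + 2) >> 2)
    return s, d

def inv_53(s, d):
    """Inverse lifting: runs the steps backward, giving exact reconstruction."""
    left = np.insert(d[:-1], 0, d[0])
    even = s - ((left + d + 2) >> 2)
    right = np.append(even[1:], even[-1])
    odd = d + ((even + right) >> 1)
    x = np.empty(even.size + odd.size, dtype=np.int64)
    x[0::2], x[1::2] = even, odd
    return x
```

Because both steps use only integer arithmetic and are exactly invertible, the transform is lossless by construction; the 9×7 kernel, in contrast, uses irrational lifting coefficients and is therefore inherently lossy.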
In order to give an indication of the expected compression performance on holographic images using common JPEG 2000 compression settings and in comparison to regular images, such as Lena, Barbara, and Mandril, we first present results using a conventional 4-level Mallat decomposition. These results, as shown in Table 5, indicate that such a regular Mallat wavelet decomposition performs similarly well for off-axis holographic recordings as for regular images. In fact, all subsequently reported results will be determined relative to these figures.
Lossless compression rates and peak signal-to-noise ratio (PSNR) results on holographic and natural imagery for JPEG 2000 at rates from 2 bpp down to 0.125 bpp, when applying a 4-level Mallat wavelet decomposition structure.
| Image | Lossless (bpp) | 2 bpp (dB) | 1 bpp (dB) | 0.5 bpp (dB) | 0.25 bpp (dB) | 0.125 bpp (dB) |
| --- | --- | --- | --- | --- | --- | --- |
Table 6 summarizes the obtained lossless compression results, presented as bit-rate gains relative to the lossless rate obtained with the 5×3 wavelet kernel in a default 4-level Mallat decomposition mode. These results clearly show that in most cases, the largest compression efficiency gain is obtained by enabling the DA-DWT transform while using a conventional Mallat decomposition structure. A notable exception are the Seaweed recordings, which benefit from the packet decompositions alone. This is caused by the recording setup, in which the fringes align with the horizontal and vertical axes. Such axis alignment during image acquisition is, in fact, sub-optimal, as it minimizes the available bandwidth for the spectral separation of the real and conjugate image parts. The results also show that, even without the DA-DWT, most packet decomposition structures already significantly improve the compression efficiency for most holograms.
Results for lossless compression, where the values represent bit-rate reductions (in Δ bpp) in comparison to the standard 4-level Mallat decomposition. The second column shows the bit rates obtained using the default JPEG 2000 configuration with a 4-level Mallat decomposition. The third column shows the results obtained with JPEG-LS, while the other columns report the results obtained with lossless JPEG 2000, using the 5×3 wavelet kernel. Column notations use abbreviations for Mallat (M), partial packet (PP), full packet (FP), and DA-DWT enabled (+DA), preceded by the number of decomposition levels. Only the columns marked with an asterisk are JPEG 2000 Part 1/Part 2 compliant. The last row shows the averages for the holographic images.
| Lossless 5×3 | Orig. (4M) | JPEG-LS | 3FP* | 4PP* | 4FP | 5M* | 5FP | 3FP + DA | 4M + DA | 5M + DA | 4PP + DA | 4FP + DA | 5FP + DA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
Note: The bold values indicate which compression parameters give the highest gain for a given JPEG 2000 configuration.
Table 7 shows the BD-PSNR results relative to the 4-level Mallat configuration using the 9×7 wavelet kernel. These results indicate that the 9×7 kernel with lossy coding provides the largest compression performance gain when applying the 4-level partial packet decomposition in combination with the DA-DWT transform. Again, similar to lossless coding, the Seaweed images benefit even more from using the packet decompositions alone.
Results for lossy compression, with the values representing the BD-PSNR improvements (in dB) w.r.t. to the 4-level Mallat decomposition, in the range of 0.25 to 2.00 bpp. The second column shows the results obtained with JPEG, while the other columns report the results obtained with lossy JPEG 2000, using the 9×7 wavelet kernel. Column-notations use abbreviations for Mallat (M), partial packet (PP), full packet (FP), and DA-DWT-enabled (+DA), all preceded by the number of decomposition levels. Only the columns marked with an asterisk are JPEG 2000 Part 1/Part 2 compliant. The last row shows the averages for the holographic images.
| Lossy 9×7 | JPEG | 3FP* | 4PP* | 4FP | 5M* | 5FP | 3FP + DA | 4M + DA | 5M + DA | 4PP + DA | 4FP + DA | 5FP + DA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
Note: The bold values indicate which compression parameters give the highest gain for a given JPEG 2000 configuration.
The results from both Tables 6 and 7 show that even in a JPEG 2000-constrained application, the compression efficiency for off-axis holographic images can benefit considerably from the use of a limited packet decomposition, such as the 3-level full packet or 4-level partial packet structures. However, our proposed extensions enhance the compression efficiency even further. The DA-DWT proves to be a very powerful tool, significantly increasing the compression performance on off-axis microscopic holography data.
It should be noted that the measured distortion introduced by the lossy compression of the recorded hologram is not necessarily proportional to the actual perceived distortion of the reconstructed object; this depends entirely on the nature of the introduced distortion. The reconstruction quality can be improved by modifying the quality metric used (the conventional MSE employed by the JPEG 2000 standard) so that it better models the relation between objective and subjective distortions of the hologram. However, as noted in Sec. 2, and given the large number of possible requirements, a universal quality metric directly applicable to every possible type of measurement is unlikely to be feasible or even desirable. Still, it is possible to improve upon the default MSE-based distortion metric; e.g., a weighted MSE metric would allow one to assign lower weights to code-blocks representing frequencies lying far from the carrier frequency, which generally contain less important information. This is similar to the visual frequency weighting used in JPEG 2000 compression for improving the perceived quality of regular imagery.36,37 This subject, however, is beyond the scope of our paper.
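As an illustration of such a weighted metric, the sketch below de-emphasizes distortion in sub-bands whose center frequency lies far from the carrier. The Gaussian fall-off and its width parameter are purely illustrative assumptions, not part of the standard or of the experiments in this paper:

```python
import numpy as np

def subband_weights(band_centers, carrier, sigma=0.15):
    """Gaussian fall-off: sub-bands far from the carrier get a lower weight.

    band_centers are (fx, fy) sub-band center frequencies in cycles/pixel,
    carrier is the off-axis carrier frequency; sigma controls the fall-off
    (both the shape and sigma=0.15 are hypothetical choices)."""
    centers = np.asarray(band_centers, dtype=np.float64)
    dist = np.linalg.norm(centers - np.asarray(carrier, dtype=np.float64), axis=1)
    return np.exp(-0.5 * (dist / sigma) ** 2)

def weighted_mse(band_mse, weights):
    """Weighted MSE: distortion in de-emphasized sub-bands counts for less."""
    w = np.asarray(weights, dtype=np.float64)
    e = np.asarray(band_mse, dtype=np.float64)
    return float(np.sum(w * e) / np.sum(w))
```

A rate allocator driven by such a metric would spend fewer bits on sub-bands far from the carrier, analogous to the visual frequency weighting tables used for regular imagery.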
We demonstrate how JPEG 2000 can be efficiently used to compress microscopic off-axis holograms by proposing two extensions to the standard:
1. We replace the existing ADS feature such that any decomposition structure becomes available. Along with this extension, we provide the means to efficiently signal these AD structures in the code-stream. Our proposed code-stream syntax for the XAD marker requires up to 10 times fewer header bits than JPEG 2000’s AD syntax (ADS and DFS markers) for equal decomposition styles, and it has a lower implementation complexity.
2. We introduce a practical implementation of a block-based DA-DWT for JPEG 2000. From the results, it is clear that the compression performance for off-axis microscopic holography data benefits from employing the DA-DWT, even with the overhead of signaling the direction vectors in the code-stream.
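The principle of compactly signaling an arbitrary decomposition tree can be illustrated with a one-bit-per-node pre-order code; this is a simplified sketch for intuition and does not reproduce the actual XAD marker syntax:

```python
def encode_tree(node):
    """Pre-order, one bit per node: '1' = this band is split further, '0' = leaf.
    A node is either the string 'leaf' or a list of four child nodes."""
    if node == "leaf":
        return "0"
    return "1" + "".join(encode_tree(child) for child in node)

def decode_tree(bits):
    """Rebuild the decomposition tree from the bit string produced by encode_tree."""
    it = iter(bits)
    def parse():
        return [parse() for _ in range(4)] if next(it) == "1" else "leaf"
    return parse()

# A 2-level Mallat tree: only the first (low-pass) child is split again.
mallat2 = [["leaf"] * 4, "leaf", "leaf", "leaf"]
```

Such a code grows linearly with the number of split nodes, which hints at why an explicit tree description can be far more compact than enumerating per-level styles.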
In doing so, we realized a framework that is specific enough to compress DHM data with significantly improved compression performance, yet general enough to leave room for subsequent filtering and postprocessing of the hologram data, depending on the use case. Additionally, we postulate that this framework can be extended to other imaging technologies based on fringe pattern data, as they largely share the frequency and directionality properties of DHM data. The encoding framework also allows for the use of other basis functions.
Using the proposed techniques, we report significant compression performance gains of 1.3 up to 11.6 dB (BD-PSNR) for lossy compression and bit-rate reductions of over 1.6 bpp for lossless compression of off-axis holographic images.
We would like to thank Nicolas Pavillon, the Lyncée Tec SA (Lausanne, Switzerland) and Ahmed El Mallahi (Microgravity Research Center, ULB, Brussels) for providing the digital holographic recordings used in these experiments. The research leading to these results has received funding from the Research Foundation Flanders (FWO) with project no. G014610N and the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013)/ERC Grant Agreement n. 617779 (INTERFERE).
David Blinder is a PhD student working at the Vrije Universiteit Brussel (VUB). He received his BSc degree in electronics and information technology engineering from the VUB and graduated with the MSc degree in applied sciences and engineering at the VUB and the École Polytechnique Fédérale de Lausanne (EPFL) in 2013. His research focuses on the efficient representation and compression of static and dynamic holograms.
Tim Bruylants graduated with an MSc degree in 2001 at the University of Antwerp. In 2005, he participated as a member of the Forms Working Group (W3C). In 2006, he became a PhD student at the VUB. His main research topic is the compression of medical volumetric datasets. He is an active member of the ISO/IEC JTC1/SC29/WG1 (JPEG) and WG11 (MPEG) standardization committees. He is coeditor of the JPEG 2000 Part 10 (JP3D) specification.
Heidi Ottevaere has been a full professor at Vrije Universiteit Brussel (VUB), since 2009. She is responsible for the instrumentation and metrology platform at the Photonics Innovation Center and for the biophotonics research unit of the Brussels Photonics Team (B-PHOT). She is coordinating and working on multiple research and industrial projects focusing on the design, fabrication, and characterization of different types of photonic components and systems in the field of biophotonics, interferometry, holography, and imaging.
Adrian Munteanu has been a professor at VUB since 2007 and a research leader of the 4Media group at the iMinds Institute in Belgium. He is the author or coauthor of more than 200 journal and conference publications, book chapters, patent applications, and contributions to standards. He is the recipient of the 2004 BARCO-FWO prize for his PhD work. He currently serves as an associate editor for IEEE Transactions on Multimedia.
Peter Schelkens currently holds a professorship at the VUB and is a research director at the iMinds Research Institute. In 2013, he obtained an EU ERC Consolidator Grant focusing on digital holography. He (co-)authored over 200 journal and conference publications and books. He is an associate editor of the IEEE Transactions on Circuits and Systems for Video Technology. He is also participating in the ISO/IEC JTC1/SC29/WG1 (JPEG) and WG11 (MPEG) standardization activities.