Video compression to support the expansion of whole-slide imaging into cytology

Abstract. Digital screening and diagnosis from cytology slides can be aided by capturing multiple focal planes. However, using conventional methods, the large file sizes of high-resolution whole-slide images increase linearly with the number of focal planes acquired, leading to significant data storage and bandwidth requirements for the efficient storage and transfer of cytology virtual slides. We investigated whether a sequence of focal planes contained sufficient redundancy to efficiently compress virtual slides across focal planes by applying a commonly available video compression standard, high-efficiency video coding (HEVC). By developing an adaptive algorithm that applied compression to achieve a target image quality, we found that the compression ratio of HEVC exceeded that obtained using JPEG and JPEG2000 compression while maintaining a comparable level of image quality. These results suggest an alternative method for the efficient storage and transfer of whole-slide images that contain multiple focal planes, expanding the utility of this rapidly evolving imaging technology into cytology.


Introduction
Whole-slide imaging (WSI) refers to the process of digitizing glass slides at high resolution. It supports digital workflows in pathology and provides the first necessary step for automation, digital consultation, archival, and the application of emerging artificial intelligence technologies to "virtual slides" of histologic tissue [1-3]. However, even following recent FDA clearance of WSI for primary diagnosis [4], adoption of WSI has been slow due to a number of factors, including the potentially massive storage and bandwidth costs associated with storing, retrieving, and viewing virtual slides. Although conservative data retention policies or selective scanning can help institutions overcome these barriers, collecting images that require scanning at multiple focal planes (z-stacking) still represents a technical challenge for many laboratories, despite potentially significant diagnostic gains [5]. Modern WSI scanners often support z-stack scanning, but the storage requirements are typically an order of magnitude greater than for single-plane scans. As laboratories begin to rely on networked servers or the cloud to meet their image management needs, z-stacking also increases the bandwidth required for live WSI viewing. Since current guidelines for single-plane virtual slides recommend network speeds of at least 1 Gb/s [4], additional networking infrastructure may need to be considered for viewing large z-stack virtual slides at high resolutions. Therefore, alternative methods to reduce the data storage and bandwidth burden must be explored to mitigate some of the costs associated with the expansion of WSI into areas such as cytology that benefit from multiple focal planes captured for each slide.
Methods have been developed that include selectively sampling and storing the single focal plane from a z-stack that contains the most in-focus material [6], but this approach carries the risk that important information is discarded, especially in spatial regions where multiple focal planes carry important information (e.g., overlapping cells or structures).
We explored whether the inherent redundancy observed across focal planes in virtual slides [7] could be harnessed to support a strategy to compress images across focal planes. Video compression faces a similar redundancy: successive video frames are often similar to one another and are usually compressed using a strategy that accounts for redundancy across frames. We therefore turned to an open-source, industry-standard video compression algorithm. The high-efficiency video coding (HEVC) standard is designed to represent video frames in a reduced form; a key frame (I-frame) is identified in a series of video frames and encoded at a relatively high bitrate, and unidirectional or bidirectional information from the series of video frames is then used to compress subsequent frames at lower bitrates (P-frames and B-frames, respectively) [8]. By converting z-stacks of cytology images into video frames and storing the entire z-stack as a single HEVC file, we examined whether this approach could represent an alternative that improved compression ratios while preserving the original image content.

Slide Selection
We retrospectively analyzed three ThinPrep (Hologic, Inc.), three SurePath (Becton, Dickinson and Company), and three conventional smear slides obtained as part of the clinical activities of the Department of Pathology. Slides were randomly selected by an honest broker and delivered to the investigators in a deidentified fashion. The experimental protocol was considered exempt by the Drexel University Institutional Review Board under exemption #4.

Whole-Slide Scanning
Slides were scanned at 0.23 μm/pixel using the Hamamatsu Nanozoomer S210 whole-slide scanner (Hamamatsu Corporation, Bridgewater, New Jersey), considered by the manufacturer equivalent to ×40 magnification. All analyses were performed on the full-resolution images except where otherwise indicated. Ten focal planes were captured in 2 μm increments centered on a plane determined by the whole-slide scanner following the selection of 20 focus points positioned across the slide. Each slide was saved as an 8-bit-per-channel file stored in an image pyramid with downsample factors of 1, 4, 16, and 64. The scanner required the use of JPEG compression by default, but this was applied at the highest quality factor (QF) available (QF = 0.99). As shown in Sec. 5 (Appendix A), compression of ×40 histology images at QF = 0.99 only negligibly altered their quality and did not appreciably impact downstream measurements.

Image Analysis
We used the Bioformats Library [9] to load images and MATLAB (Mathworks, Natick, Massachusetts) to perform image processing and analysis. We analyzed a 1.5 mm × 1.5 mm region from each slide to ensure analysis was applied only to regions that contained material. We analyzed regions in their entirety, not further dividing images into tiles. To compare the pixelwise similarity between images before and after compression, we used the structural similarity index (SSIM) [10], a metric designed to take perceptual factors into account and which has previously been used to characterize the impact of compression on image quality [11-13].
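To make the SSIM comparison concrete, the following is a minimal sketch of the SSIM formula evaluated over a whole image as a single window. This is a simplification: the standard index is computed over local windows and averaged, and the study itself used MATLAB rather than Python. The function name, the flat-list image representation, and the sample pixel values are illustrative assumptions; the constants 0.01 and 0.03 are the commonly used defaults from the SSIM literature.

```python
def global_ssim(reference, degraded, dynamic_range=255.0):
    """Single-window SSIM between two equally sized grayscale images.

    Images are flat sequences of pixel intensities. This is a simplified
    illustration; standard SSIM averages the same expression over local windows.
    """
    if len(reference) != len(degraded) or len(reference) == 0:
        raise ValueError("images must be non-empty and the same size")
    n = len(reference)
    mu_x = sum(reference) / n
    mu_y = sum(degraded) / n
    var_x = sum((a - mu_x) ** 2 for a in reference) / n
    var_y = sum((b - mu_y) ** 2 for b in degraded) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(reference, degraded)) / n
    # Stabilizing constants (commonly used defaults).
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical pixel values, for illustration only.
original = [10, 50, 90, 130, 170, 210]
compressed = [12, 47, 95, 128, 175, 205]
score = global_ssim(original, compressed)
```

An identical pair of images yields a value of 1, and any distortion lowers the score, which is why SSIM can serve as a common quality target across codecs.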

Video Encoding
We used HEVC compression [8] as the basis for the video compression scheme described, relying on the x265 algorithm (MulticoreWare, San Jose, California) running within an open source video manipulation tool (ffmpeg). Since this algorithm depends on a sequence of video frames as input, we first extracted individual focal planes from the whole-slide image and saved each frame as a separate image file compressed in a lossless fashion. Since the lossless conversion was only an intermediate step to ensure compatibility with ffmpeg, we do not further consider it here. We passed the following optional parameters to the x265 algorithm: preset = slow, tuning = ssim, qcomp = 1.0, b-frame-bias = 100. These parameters ensured that frames 2 to 9 would consist solely of B-frames, thereby instructing the algorithm to use the entire set of frames as a data reference to provide optimal compression efficiency. The quality of the encode was controlled by passing a bitrate control parameter (CRF: constant rate factor) that adjusted the overall bitrate of the file.
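The exact command line is not given in the text, so the following is only a sketch of how the listed parameters might be passed to ffmpeg with the libx265 encoder. The file names, the frame rate of 1, and the helper name are assumptions. To our understanding, x265 spells the relevant options `tune` and `bframe-bias`, which is how they appear below. The function only assembles the argument list; it does not run ffmpeg.

```python
def build_hevc_command(frame_pattern, output_path, crf):
    """Assemble an ffmpeg argument list that encodes a z-stack as one HEVC file.

    frame_pattern: printf-style pattern for the losslessly stored focal planes
                   (hypothetical naming, e.g. "plane_%02d.png").
    crf:           constant rate factor controlling overall quality/bitrate.
    """
    # Parameters named in the text, expressed with x265 option spellings.
    x265_params = "qcomp=1.0:bframe-bias=100"
    return [
        "ffmpeg",
        "-framerate", "1",       # arbitrary assumption; focal planes have no true frame rate
        "-i", frame_pattern,     # one input image per focal plane
        "-c:v", "libx265",
        "-preset", "slow",
        "-tune", "ssim",
        "-crf", str(crf),
        "-x265-params", x265_params,
        output_path,
    ]


command = build_hevc_command("plane_%02d.png", "zstack.mkv", crf=28)
```

The resulting list could be handed to `subprocess.run(command, check=True)` on a system where ffmpeg with libx265 is installed; the CRF value of 28 here is a placeholder, not a setting reported in the study.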

Adaptive Compression
We tested three compression schemes: JPEG, JPEG 2000 (denoted JP2), and HEVC video compression. JPEG and JP2 compression were performed within MATLAB using the imwrite function. For JPEG compression, we tested QF values of 0.60, 0.70 (the default setting for our scanner), and 0.80, in which higher values indicate higher preservation of image quality but also larger file sizes. To appropriately compare the three compression schemes, we applied an adaptive compression algorithm to JP2 and HEVC to derive the quality setting necessary to achieve the same SSIM value as the corresponding JPEG compressed image. This was accomplished by applying an optimization algorithm using golden-section search that iteratively adjusted the JP2 or HEVC quality setting to achieve an SSIM value within 0.001 of the corresponding JPEG-compressed SSIM value. The lower and upper bounds for the optimization function were set to compression ratio values of 1 and 600, respectively, for JP2, and CRF values of 10 and 60, respectively, for HEVC. None of the analyses arrived at local minima at the boundaries.
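The adaptive step can be sketched as a golden-section search that minimizes the absolute difference between the achieved SSIM and the JPEG-derived target. The sketch below is in Python rather than MATLAB, and the `toy_ssim` function is a made-up stand-in for the real "encode, decode, measure SSIM" step; only the search bounds (CRF 10 to 60) and the 0.001 tolerance come from the text.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # reciprocal of the golden ratio, about 0.618


def match_target_ssim(ssim_at, target, lower, upper, tol=0.001, max_iter=100):
    """Find a quality setting whose SSIM is within `tol` of `target`.

    ssim_at: callable mapping a quality setting to the SSIM it produces.
    Assumes |ssim_at(q) - target| has a single minimum on [lower, upper].
    """
    def error(q):
        return abs(ssim_at(q) - target)

    a, b = lower, upper
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    err_c, err_d = error(c), error(d)
    for _ in range(max_iter):
        best_q, best_err = (c, err_c) if err_c <= err_d else (d, err_d)
        if best_err <= tol:
            return best_q
        if err_c < err_d:
            # Minimum lies in [a, d]; reuse c as the new upper interior point.
            b, d, err_d = d, c, err_c
            c = b - INV_PHI * (b - a)
            err_c = error(c)
        else:
            # Minimum lies in [c, b]; reuse d as the new lower interior point.
            a, c, err_c = c, d, err_d
            d = a + INV_PHI * (b - a)
            err_d = error(d)
    return c if err_c <= err_d else d


def toy_ssim(crf):
    """Hypothetical monotone quality curve standing in for a real encoder."""
    return 1.0 - 0.0002 * (crf - 10) ** 2


matched_crf = match_target_ssim(toy_ssim, target=0.95, lower=10, upper=60)
```

In practice each call to `ssim_at` would involve a full encode and decode, so the number of iterations matters; golden-section search needs only one new evaluation per iteration.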

Compression Ratio
Compression ratio was defined as the ratio between the file size of the raw image (without compression) and the file size of the image following the compression method under test. For JPEG and JP2 compression ratio measurements, we summed the file sizes of the individually encoded focal planes. For HEVC compression ratio measurements, we used the size of the video file created. We note that metadata is typically stored within whole-slide image files, but this alters file size by a negligible amount.

Results

Video Compression as a Method to Compress Images Across Focal Planes
We scanned a total of nine cytology slides at 10 focal planes spanning 18 μm in 2 μm increments, values similar to those used in previous studies [14-16]. A representative virtual slide is shown in Fig. 1 at three levels of magnification. The scanning regions of virtual slides varied considerably, ranging from 186 to 529 mm² in size (Table 1), and using JPEG compression was associated with file sizes estimated to be between 3.6 and 9.8 GB each. Images were similar across focal planes, although some cells were in focus only at superficial focal planes while others (often overlapping) came into focus at deeper focal planes [Fig. 1(b)]. Given the substantial redundancy in image information across focal planes, we reasoned that a compression method that represents multiple similar image instances in a reduced form may be suited to reduce the large file sizes associated with z-stack whole-slide images. We turned to a popular video compression standard, HEVC, to test whether an alternative representation can be adopted to reduce file sizes without visually impacting image quality. We converted images acquired at different focal planes into video frames and applied video compression to these frames using the x265 algorithm. HEVC compression resulted in a reduction of file size by a factor of 2.6 to 6.1 (median: 3.6) in comparison to standard JPEG compression, and by a factor of 1.0 to 2.1 (median: 1.4) in comparison to JP2 compression (Table 1). We converted the video file back to its constituent focal planes in a lossless fashion to compare the differences between the original image [Fig. 2(b)], the HEVC compressed image [Fig. 2(c)], and for a comparative reference, the JPEG compressed image [Fig. 2(a)].
The observed improvement in compression efficiency using HEVC is assumed to be accomplished by harnessing the redundancy across focal planes. However, HEVC also utilizes a modern compression algorithm that may also contribute to the improvement in compression ratio on an intraframe basis. We also measured the compression ratio of single frames using intraframe HEVC to determine whether the improvement in compression efficiency was due primarily to intrinsic improvements in HEVC image encoding or the interframe relationships.
We found that HEVC in intraframe mode usually performed at approximately the same level as JP2 (Table 1). These results are in contrast to the observation that encoding multiple focal planes with HEVC generally outperformed JP2, suggesting that HEVC achieved improved performance by efficiently encoding the redundancy across multiple focal planes.

Compression Performance Among Multiple Quality Settings
We used SSIM to quantify image quality following compression and compared the compression ratios that were produced by each algorithm when the same SSIM value was reached. We established SSIM values for each image following JPEG compression at QF = 0.60 (low), QF = 0.70 (medium), or QF = 0.80 (high) settings. The mean compression ratios achieved by HEVC and JP2 compression were significantly greater than JPEG (Fig. 3) for all quality settings tested (p < 0.01, Wilcoxon signed-rank test). Furthermore, HEVC achieved significantly higher compression ratios than JP2 for all quality settings tested (p < 0.01, Wilcoxon signed-rank test).
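As an illustration of the paired test named above, the following sketch computes an exact two-sided Wilcoxon signed-rank p-value by enumerating all sign assignments. It assumes nine paired observations (one per slide) per quality setting, which the text does not state explicitly, and the compression ratios shown are invented placeholders, not study data.

```python
from itertools import product


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def wilcoxon_signed_rank_exact(first, second):
    """Return (W+, exact two-sided p) for paired samples; zero differences are dropped."""
    diffs = [x - y for x, y in zip(first, second) if x != y]
    ranks = average_ranks([abs(d) for d in diffs])
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    observed = min(w_plus, total - w_plus)
    extreme = 0
    # Under the null hypothesis every sign pattern is equally likely.
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= observed:
            extreme += 1
    return w_plus, extreme / 2 ** len(ranks)


# Hypothetical compression ratios for nine slides (placeholders only).
hevc_ratios = [120, 95, 150, 80, 110, 135, 90, 105, 140]
jp2_ratios = [100, 80, 120, 75, 85, 100, 70, 95, 99]
w_plus, p_value = wilcoxon_signed_rank_exact(hevc_ratios, jp2_ratios)
```

With nine pairs, the smallest attainable two-sided p-value is 2/2⁹ ≈ 0.0039, reached only when one method wins on every slide, so a reported p < 0.01 is achievable at this sample size.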

Impact of Range and Number of Frames on Compression Performance
To examine the performance of HEVC compression for different z-stack configurations, we measured compression ratio while varying the number of focal planes compressed. Holding the focal plane spacing constant at 2 μm, we tested configurations with 4, 6, 8, and 10 frames [Fig. 4(a)] and found that compression ratio increased when fewer frames were compressed [Fig. 4(b)]. However, because the focal plane spacing was held constant, using fewer frames also decreased the spatial range over which images were compressed. When we compressed four frames with plane spacings of 6 μm, the compression ratio remained high (Fig. 4, dark bar), suggesting that the number of frames compressed may be the dominant factor dictating the compression efficiency of HEVC.
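The configurations above can be expressed as subsets of the ten acquired planes. The sketch below assumes that each subset is centered within the stack, which the text does not specify; the function names are illustrative.

```python
ACQUIRED_PLANES = 10       # focal planes captured per slide
ACQUIRED_SPACING_UM = 2.0  # spacing between adjacent captured planes


def select_planes(n_frames, step=1, n_available=ACQUIRED_PLANES):
    """Return indices of `n_frames` planes taken every `step` acquired planes.

    The subset is centered in the stack (an assumption for illustration).
    """
    span = (n_frames - 1) * step + 1  # acquired planes covered by the subset
    if span > n_available:
        raise ValueError("configuration does not fit within the acquired stack")
    start = (n_available - span) // 2
    return [start + i * step for i in range(n_frames)]


def spatial_range_um(n_frames, step=1):
    """Axial distance between the first and last selected plane."""
    return (n_frames - 1) * step * ACQUIRED_SPACING_UM


# Constant 2 um spacing with a varying number of frames.
constant_spacing = {n: select_planes(n) for n in (4, 6, 8, 10)}
# Four frames at 6 um spacing (every third acquired plane).
wide_spacing = select_planes(4, step=3)
```

Under these assumptions, four frames at 2 μm spacing cover 6 μm, whereas four frames at 6 μm spacing cover the same 18 μm as the full ten-frame stack, which is what allows frame count and spatial range to be separated.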

Multiscale Representation
We tested whether compression ratio and image quality were influenced by the magnification at which the compression was applied. We applied compression to ×40, ×10, and ×2.5 magnifications and found that JPEG and JP2 compression exhibited a reduction in compression ratio with lower magnifications. HEVC, on the other hand, maintained a high compression ratio at 0.23 and 3.7 μm/pixel (×40 and ×2.5, respectively), although it exhibited a reduction in compression ratio at an intermediate resolution (Fig. 5). These results suggest that the improvement in compression ratio observed in this study using HEVC may be even more pronounced for lower resolution scanning than we report at ×40.
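The relationship between the magnifications tested and the pixel sizes quoted can be checked with a short calculation. The sketch assumes that the pyramid downsample factors are applied per axis, which is consistent with the 0.23 and 3.7 μm/pixel figures in the text; the variable names are illustrative.

```python
BASE_PIXEL_SIZE_UM = 0.23   # full-resolution scan
BASE_MAGNIFICATION = 40.0   # nominal equivalent magnification at full resolution


def pyramid_level(downsample_factor):
    """Return (magnification, pixel size in um) for a per-axis downsample factor."""
    magnification = BASE_MAGNIFICATION / downsample_factor
    pixel_size_um = BASE_PIXEL_SIZE_UM * downsample_factor
    return magnification, pixel_size_um


# Factors 1, 4, and 16 correspond to the three magnifications tested.
levels = {factor: pyramid_level(factor) for factor in (1, 4, 16)}
```

Under this assumption the intermediate ×10 level corresponds to roughly 0.92 μm/pixel, and the ×2.5 level to 3.68 μm/pixel, which rounds to the 3.7 μm/pixel stated above.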

Video Compression as a Viable Alternative to Conventional Methods
We demonstrated that HEVC compression can be applied as a viable alternative to conventional storage methods to reduce file sizes of z-stack images without sacrificing image quality. For 10-frame z-stacks, file sizes were reduced by a factor of over three compared to standard JPEG compression. Although we focused specifically on cytology slides, we expect that the results apply to a number of other applications where z-stacking is useful. Notably, JP2 compression also achieved high compression ratios; in two of the nine slides, JP2 compression exhibited similar performance to HEVC, suggesting that JP2 compression may be an appropriate alternative for a subset of cases. However, when we examined the compression ratio at lower resolutions or with fewer frames, HEVC efficiency increased while JP2 efficiency did not. We assumed that the improvement in compression that we observed with HEVC is due to efficiency resulting from interframe compression, a factor that the standard JPEG and JP2 algorithms did not exploit. However, it remains possible that HEVC benefits from more efficient single-frame compression. We tested this possibility by compressing single frames using the x265 algorithm in intraframe mode and found that while compression performance was indeed superior to JPEG, it was inferior to multi-frame HEVC in all images tested. These results imply that HEVC intraframe improvement is partially responsible for the efficiency gains noted versus JPEG, but that interframe compression was necessary to achieve the additional improvement that enabled HEVC to outperform JP2 in this study.
Although HEVC generally outperformed JPEG and JP2 when all 10 frames were compressed, the improvement appeared to be greatest when only four frames were compressed, regardless of the focal plane spacing. Since we forced all but the first frame to be a highly compressed B- or P-frame, our observations may be due to having a suboptimal number of key frames in the video stream. Further research may be warranted to find the optimal set of algorithm tuning parameters for this unique application.

Perceptual Impact of Compression
Several studies have examined the effects of image compression on whole-slide image interpretation. For instance, using a two-alternative forced choice (2AFC) test, Johnson et al. [11] demonstrated that pathologists are sensitive to even minute differences between images when they are presented on the screen together, and that the threshold of detection corresponded to JP2 compression ratios as low as 7 [17]. In comparison, in this study, JP2 at the highest quality setting tested produced compression ratios of 50-130, suggesting that the use of standard WSI scanner settings may potentially introduce visually discriminable compression artifacts. However, it should be noted that the carefully controlled 2AFC test employed by Johnson et al. introduced significant artifice; pathologists are rarely faced with the task of comparing nearly identical images side-by-side in a controlled setting with the intent to determine whether differences exist. This task can often be accomplished by comparing pixelwise differences, which are not typically accessible in standard viewing conditions. Furthermore, it is not clear from their study that the differences detected by pathologists had an impact on perceived quality or that they influenced diagnostic accuracy. Further research is needed to determine how much image compression can be tolerated before diagnosis is potentially impacted.

Proposed Workflow and Pipeline
The compression method we describe requires considerable encoding and decoding time to transform a virtual slide into a video and then back again into a viewable image. Nevertheless, we do not expect that this is an impediment for modern computer systems. First, the video codec we used is commonly employed in high-definition video playback and runs seamlessly on most modern desktop computers, laptop computers, and tablets. Second, we previously showed that virtual slide viewing is typically a much more disk- and network-intensive process than a CPU-intensive process [18], and therefore most computers will likely have the available resources to additionally decode and render images stored in even a computationally complex format. Third, for network installations in which bandwidth is not a significant limitation, decoding can occur on the server side and the images can be delivered to the local computer in a standard fashion. Although the third option would not necessarily achieve the improvements in bandwidth offered by HEVC, reductions in storage size would still be realized. Regardless of approach, WSI viewers may benefit from predictive prefetching to mitigate some of the computational demands associated with more complex encoding/decoding schemes.
We suggest that the proposed compression strategy can be employed in two ways. First, upon image capture, z-stacks can immediately be compressed using the HEVC codec (either by the WSI scanner software or on the server side) and permanently stored in this format. Image viewers can then be adapted to retrieve and render this file format, or image management systems can be adapted to decode the HEVC-encoded file prior to delivering it to the image viewer. Alternatively, a tiered model can be employed where images are captured using conventional compression methods at high-quality settings and HEVC compression can later be deployed as part of an archival strategy. If archived image data are infrequently accessed or used only for research rather than slide viewing, then the image viewer would not necessarily have to be adapted to handle this new format.
We envision HEVC being applied in practice in one of two ways. One way is to simply use a CRF to encode z-stacks guided by some expectation of the quality produced by the selected value. The second way is to use the adaptive algorithm to achieve a target SSIM. At present, most users do not routinely measure SSIM values and instead simply use a recommended JPEG (or JPEG2000) QF setting, even though a constant setting can produce vastly different SSIM values across images. However, as labs begin using WSI on a broader scale and the concern for storage space continues to be an impediment, device vendors may elect to incorporate "smarter" automated methods in their product offerings.

Conclusion
Despite the many recent advances in the area of digital pathology, adoption is still hindered by a number of cost and reimbursement considerations. Nevertheless, WSI continues to expand its footprint in pathology, and cost-effective data management becomes a critical factor to support the storage, backup, and bandwidth limitations that many laboratories face. Although a number of methods can be employed to reduce the data burden for laboratories, including implementing data retention policies that purge older cases or selectively scanning only certain slides, image compression represents an alternative approach that enables laboratories to scan and retain more slides. Continued progress in this area will present significant cost savings to laboratories that may currently view storage costs as the major impediment to adoption of WSI.

Appendix A
The effect of applying JPEG compression to histologic images was examined in additional detail by acquiring four raw images of histologic tissue from a QImaging MicroPublisher 5.0 RTV camera attached to an Olympus BX40 microscope and captured using a ×40 objective. Images were compressed in a lossless fashion upon acquisition, enabling restoration of the raw uncompressed image which served as the reference image for all SSIM comparisons. The SSIM was measured following JPEG compression of the reference image with QF = 0.70. Subsequently, an adaptive algorithm was applied to measure the compression ratio achieved using JP2 or HEVC intraframe (HEVC-IF) compression of the reference image that produced the same SSIM as JPEG compression (Table 2, second and third columns, respectively). As expected, JP2 and HEVC-IF achieved higher compression ratios for all four images. We assessed the extent to which an initial JPEG compression step at the highest available QF degraded image quality.
Following JPEG (QF = 0.99) compression of the reference image, we again compressed the image using either JPEG compression, JP2 compression, or HEVC-IF compression in an adaptive manner to achieve the same SSIM originally obtained using JPEG compression alone (QF = 0.70). The compression ratio of an image with the equivalent SSIM was reduced by a negligible amount following an initial JPEG (QF = 0.99) compression step (Table 2, right columns in comparison to left columns). The slight reduction in the compression ratio that we observed was due to the increase in QF necessary to achieve an equivalent SSIM. Nevertheless, this reduction was much smaller than the differences in compression ratio observed using the compression algorithms under test in this study. Therefore, the results imply that the presence of an intermediate JPEG QF = 0.99 compression step did not appreciably alter the compression ratio measurements in this study.

Table 2. Compression ratios achieved with (right columns) and without (left columns) an initial round of JPEG compression (QF = 0.99) first applied. JPEG, JP2, and HEVC intraframe (IF) compression were applied to only a single image. Using an adaptive algorithm, the SSIM value for each row was held constant and determined by the SSIM produced by applying JPEG QF = 0.70 compression to the raw image. The introduction of an initial JPEG QF = 0.99 round reduced compression ratios by a value much lower than the differences across compression formats. HEVC compression of the single frame typically produced compression ratios that were higher than JPEG but often lower than JP2.