Ultrasonic logging image denoising algorithm based on variational Bayesian and sparse prior

Abstract. An image denoising method is proposed for ultrasonic logging images with severe noise. The proposed method works within a variational Bayesian framework using a block sparse prior. First, the sparse coefficients are simulated by a more appropriate distribution, the Laplacian distribution. Then a variational Bayesian denoising model in which the Laplacian distribution serves as the prior term on the sparse coefficients is proposed. Finally, semiquadratic regularization is used to solve the model with a simplified process. Moreover, during the denoising process, a relaxation factor is introduced to improve the accuracy. In the vast majority of cases, the proposed method obtained better results in both visual quality and objective evaluation. It achieves better denoising performance than existing denoising methods when the edge details of the images are contaminated by noise, especially severe noise. The experimental results show that the proposed method is practical for ultrasonic logging images.


Introduction
With the increasing demand for oil and gas in the 21st century, ultrasonic logging imaging will be more and more widely used due to its intuitive presentation. During logging, a motor drives the transducer and magnetometer to rotate around the axis of the instrument at a fixed rate to scan the entire borehole wall. However, the ultrasonic logging image is inevitably contaminated by noise. It is necessary to remove the noise and improve the image quality, which guarantees the performance of subsequent image processing tasks, such as fracture segmentation, hole recognition, and reservoir interpretation. The purpose of denoising is to preserve image details while removing noise. In the past decades, scholars have proposed a variety of denoising methods, including the average filter, total variation, sparse coding, and deep learning. [1][2][3][4] The average filter is an effective method for removing noise, but the edges become blurry after denoising. Garnett et al. 5 presented a new trilateral filter based on the bilateral filter to remove noise. A weighting cost function was designed to calculate the weights of neighboring pixels and operate the filter. It effectively removed noise while preserving the image edges when the noise intensity was small. However, when the noise intensity was severe, the denoised results became very poor. Li and Suen 6 proposed a new nonlocal means method based on gray theory. The experiments showed that their method has a superior denoising ability. The method accurately discriminated between information and noise while effectively reducing the pseudo-Gibbs artifacts. Unfortunately, when the noise was severe, some pseudo-Gibbs artifacts were still visible after denoising. Then, Li and Wang 7 presented an improved wavelet threshold denoising method combined with nonlocal means filtering. However, it had difficulty exploiting the redundancy of visual images, especially for images with severe noise.
Recently, to solve the problem that the early lesions of COVID-19 are not obvious and the generated images are easily contaminated with noise, Guo et al. 8 proposed an adaptive two-stage filtering method for COVID-19 CT images. The results demonstrated that their method achieved satisfactory denoising performance when the images were contaminated with impulse noise. However, its denoising performance on Gaussian noise still needs to be improved.

*Address all correspondence to Luoyu Zhou, luoyuzh@yangtzeu.edu.cn
In the last decade, total variation has been one of the popular methods in image denoising. The total variation method was first proposed by Rudin et al.; 9 it achieved a trade-off between noise removal and edge preservation. However, the method easily generated block effects. A number of methods based on total variation have been proposed to suppress the block effects. You and Kaveh 10 proposed a fourth-order partial differential equation for noise removal. In their method, a cost function was proposed based on the image intensity function. Then the minimization of the cost function was solved by the time evolution of the partial differential equation. Chan et al. 11 proposed an improved model by adding a nonlinear fourth-order diffusive term to the Euler-Lagrange equations of the total variation model, and then Shahdoosti et al. 12 proposed a new hybrid denoising scheme using total variation and the ripplet transform. The visual quality demonstrated that their schemes provided sharper edges. Recently, Wang et al. 13 proposed a vector total fractional-order variation for image denoising. Furthermore, they introduced a regularization term describing the texture space to improve the denoising performance. Guo and Chen found that the main shortcoming of the traditional total variation was that it cannot reflect local features because it assigns the same weight to different orientations of the total variation. Therefore, they proposed a nonconvex anisotropic total variation for image denoising. 14 The results showed that it was very effective for suppressing staircase effects. Overall, the total variation method can achieve good performance, but it has difficulty minimizing the cost function because of the nondifferentiability of the model. Sparse coding is a newly developed image representation method in signal processing. 15,16 It shows that signals can be exactly reconstructed by fewer coefficients than with traditional methods.
In recent years, sparse coding has been studied quite extensively for image denoising. The existing image denoising methods based on sparse coding include two steps. First, image blocks are expressed by a linear combination of a few coefficients taken from the basis functions. Second, the noise is removed from the image because the noise distribution does not meet the sparse assumption in general. Recently, Dong et al. 17 proposed a nonlocally centralized sparse representation denoising method combining nonlocal self-similarity and sparse representation of images. It has a very powerful denoising performance. However, the K-means clustering and principal component analysis (PCA) steps used in this method are difficult to solve efficiently. Inspired by Dong's method, Nejati et al. 18 proposed a denoising method based on sparsity and low-rank representation that also took advantage of nonlocal self-similarity and image sparsity. Liu et al. 19 proposed a correlation adaptive sparse model for image denoising. Their proposed model adaptively selected different image data using local and nonlocal correlation. In addition, Zhang and Li 20 presented an image denoising and repair model based on compressed sensing theory, and the results demonstrated that their model effectively removed noise.
Deep learning was first used for image denoising in 2015. 21 It does not require the parameters to be manually set in the denoising process. Then, Mao et al. 22 introduced multiple convolutions to remove noise and obtain high-resolution images, and Zhang et al. 23 proposed a flexible neural network for denoising that takes different noise levels and the image patch as input. Scetbon et al. 24 presented an end-to-end deep architecture with the exact K-singular value decomposition (K-SVD) computational path and trained it to optimize denoising performance. Deep learning methods can improve denoising results; however, they place high demands on hardware for training deep networks.
Although various image denoising methods have been applied to numerous fields, few methods have been proposed for ultrasonic logging images. Motivated by this research, we propose a novel variational Bayesian inference framework for ultrasonic logging image denoising using a sparsity prior term. This prior assumes that the outputs of the local sparse coefficients of the ultrasonic logging image obey a Laplacian distribution, which is demonstrated by many simulations and tests. The main contributions of this work lie in three aspects. First, we use the Laplacian distribution to simulate the sparse coefficients of ultrasonic logging images. Second, we introduce the sparse prior term into the variational Bayesian model to improve the denoising performance. Finally, semiquadratic regularization is used to solve the model. Moreover, during the denoising process, a relaxation factor is introduced to further improve the accuracy. The experiments demonstrate that our proposed algorithm achieves competitive performance compared with existing denoising algorithms, especially for ultrasonic logging images with severe noise. This paper is organized as follows. Section 2 presents the image model and the sparse coefficients of ultrasonic logging images. The variational Bayesian denoising model is detailed in Sec. 3. Experimental results are shown in Sec. 4, and finally conclusions and future research directions are presented in Sec. 5.
2 Image Denoising Model and Sparse Coefficients of Image

2.1 Image Denoising Model
For convenience but without loss of generality, we use 1D notation to represent the imaging model, as shown in the following equation:

g = f + n,  (1)

where g denotes the observed image, f denotes the original image, and n denotes the additive noise.
To utilize the image sparsity, we define K filters D_k [assumed to form an orthonormal basis, e.g., the discrete cosine transform (DCT) or wavelets], and the outputs of these filters are given by

ε_k = D_k f,  k = 1, …, K,  (2)

where ε_k satisfies the sparsity constraint and ε = [ε_1, …, ε_K]. Now we determine the appropriate statistical model to simulate the distribution of ε_k.
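As a concrete sketch of Eq. (2), an orthonormal 2-D DCT can stand in for the filter bank D_k; the particular basis and the 8 × 8 block size below are illustrative assumptions, not necessarily the configuration used in this paper:

```python
import numpy as np
from scipy.fft import dctn, idctn

# Toy stand-in for an image block f; in the paper, f would be a patch
# of an ultrasonic logging image.
rng = np.random.default_rng(0)
f = rng.standard_normal((8, 8))

# An orthonormal 2-D DCT plays the role of the filter bank D_k:
# eps collects the filter outputs epsilon_k of Eq. (2).
eps = dctn(f, norm="ortho")

# Orthonormality means the transform is exactly invertible, so no
# information is lost in the coefficient domain.
f_rec = idctn(eps, norm="ortho")
```

Because the basis is orthonormal, the coefficients preserve the signal energy and the inverse transform recovers f exactly; sparsity then comes from most DCT coefficients of natural image patches being near zero.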

2.2 Statistical Model of Sparse Coefficients
Historically, sparse coding can be traced back to the proposition of the wavelet transform 25 and multiresolution analysis. 26 Further, sparse coding can be successfully employed in image processing. To date, studies on sparsity can be divided primarily into two kinds: basis functions and sparse coefficients. For basis functions, the main aim is to represent the sparse vector under a certain basis function. For sparse coefficients, the aim is to find an appropriate statistical model to express the sparse coefficients. We assume that the sparse coefficients ε_k are independent and identically distributed, with the parameter vector θ_k = [θ_k1, θ_k2, …]. So the prior of ε is expressed as

P(ε) = ∏_{k=1}^{K} P(ε_k | θ_k).  (3)

As for the ultrasonic logging images, the distribution of the sparse coefficients obeys a heavy-tailed distribution, as shown in Fig. 1. First, the heavy-tailed distribution of the sparse coefficients results from the filter outputs of the image edge areas being usually small. Thus the heavy-tailed distribution can be regarded as a prior and used as a regularization term in the denoising model. Second, unlike visual images, whose heavy-tailed sparse coefficients are commonly modeled by a Gaussian distribution, the sparse coefficients of ultrasonic logging images obey a different distribution.
The Gaussian distribution, Laplacian distribution, and Student-t distribution are all common candidates for modeling such data. To determine the appropriate statistical model to simulate the distribution of sparse coefficients, we randomly choose 100 ultrasonic logging images (a sample is shown in Fig. 2) and utilize the above three distributions to simulate their sparse coefficients. The simulated results are shown in Fig. 3. As seen from Fig. 3, the three simulated results differ obviously. The Gaussian distribution matches well with the center of the original data, and the Student-t distribution matches well with the kurtosis of the original data. By contrast, the Laplacian distribution matches well with both the center and the kurtosis of the original data. Moreover, we used the root mean square error (RMSE) to quantitatively evaluate the simulated results. The RMSE is defined as

RMSE = √[ (1/Num) Σ_{i=1}^{Num} (Ys_i − Yo_i)² ],  (4)

where Ys_i are the simulated values, Yo_i are the original values, and Num is the length of Ys_i. As seen from Table 1, the results simulated by the Laplacian distribution are better than those of the other two distributions. Therefore, we use the Laplacian distribution as the prior term of the sparse coefficients, expressed as

P(ε_k | θ_k1, θ_k2) = [1/(2θ_k1)] exp(−|ε_k − θ_k2| / θ_k1).  (5)

Existing models always assume the mean θ_k2 = 0, which contradicts the actual data. Therefore, in our model, the mean θ_k2 ≠ 0. However, solving with this prior term is difficult because its normalization constant cannot be found in closed form. One contribution of this paper is that a constrained variational Bayesian model is derived for solving this problem. This algorithm is described in detail next.
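The comparison behind Table 1 can be reproduced in outline: fit the three candidate densities by maximum likelihood and score each against the empirical coefficient histogram with the RMSE of Eq. (4). The sketch below uses synthetic heavy-tailed data as a stand-in for the real coefficient samples, so the specific numbers are illustrative only:

```python
import numpy as np
from scipy import stats

def rmse(ys, yo):
    """RMSE of Eq. (4): simulated values ys against original values yo."""
    ys, yo = np.asarray(ys, float), np.asarray(yo, float)
    return float(np.sqrt(np.mean((ys - yo) ** 2)))

# Synthetic heavy-tailed stand-in for the sparse coefficients; the real
# experiment would use coefficients of the 100 ultrasonic logging images.
rng = np.random.default_rng(1)
coeffs = rng.laplace(loc=0.0, scale=1.0, size=20_000)

# Empirical density of the coefficients.
hist, edges = np.histogram(coeffs, bins=101, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Fit each candidate distribution by maximum likelihood and evaluate
# its density at the histogram bin centers.
fitted = {
    "Gaussian": stats.norm.pdf(centers, *stats.norm.fit(coeffs)),
    "Laplacian": stats.laplace.pdf(centers, *stats.laplace.fit(coeffs)),
    "Student-t": stats.t.pdf(centers, *stats.t.fit(coeffs)),
}
scores = {name: rmse(pdf, hist) for name, pdf in fitted.items()}
```

On heavy-tailed data such as this, the Gaussian fit misses the sharp peak at zero, so its RMSE is visibly larger than the Laplacian fit's, mirroring the pattern reported in Table 1.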

3.1 Hierarchical Bayesian Analysis
The model in Eq. (1) is converted into the following form via Eq. (2):

D_k g = D_k f + D_k n.  (6)

So we use the commuting property and rewrite the model as

w_k = ε_k + n_k,  (7)

where w_k = D_k g denotes the filter outputs of the observed image and n_k denotes the filter outputs of the additive noise, with n_k ∼ N(0, σ_n²). As for the Laplacian prior, the parameter vector θ = [θ_1, θ_2] is estimated. To simplify the computation, we assume that the parameters (θ_1 = θ, θ_2 = u) are directly calculated from the sparse coefficients. Then the Laplacian prior term is written in the simplified form

P(ε_k | θ, u) = [1/(2θ)] exp(−|ε_k − u| / θ).  (8)

The maximum a posteriori estimator is obtained from

P(g, ε, θ) = P(g | ε, θ)P(ε, θ) = P(g | ε, θ)P(ε | θ)P(θ).  (9)

As for the parameters, Jeffrey's prior has been proposed as an appropriate choice, that is,

P(θ_k) ∝ 1/θ_k.  (10)

Therefore, Eq. (9) is translated into a new form using Eqs. (7), (8), and (10):

J(ε, θ) = arg max log[P(g | ε, θ)P(ε | θ)P(θ)],  (11)

where J is the cost function. For Jeffrey's prior, log θ_k → ∞ as θ_k → 0, which makes Eq. (11) unstable. So we use log(θ_k + δ) instead of log(θ_k), where δ is a very small number that improves the stability of Eq. (11). In this case, the model of Eq. (11) is written in the following form:

J(ε, θ) = arg min Σ_k [ ‖w_k − ε_k‖² / (2σ_n²) + |ε_k − u| / θ_k + 2 log(θ_k + δ) ].  (12)

For the Laplacian prior term, we let ε = ρυ and u = ρμ, where ρ = diag(θ_k) is a diagonal matrix for normalizing ε of the image block.
Then the Bayesian model is converted into the joint estimation J(υ, θ) as

J(υ, θ) = arg min ‖w − ρυ‖² / (2σ_n²) + ‖υ − μ‖_1 + 2 Σ_k log(θ_k + δ).  (13)

Unlike Chantas et al., 27 who utilized the Student-t distribution as the prior term and solved the variational Bayesian model by approximating a lower bound on the logarithm of the marginal likelihood, or Shanthi et al., 28 who utilized the Gaussian scale mixture distribution as the prior term and converted their model into an L2-norm, we explicitly utilize the general Laplacian distribution as the prior term and convert the sparse model into an L1-norm. In fact, compared with the L2-norm, the L1-norm is more suitable for image sparsity. Such a sparse model is appealing and motivates us to further exploit the solution.

3.2 Solving the Variational Bayesian Model
For the sparse coefficients, it is necessary to extract similar blocks from the observed image, whose sparse coefficients ε should be simulated by the same prior. Thus, these similar blocks are combined to extend Eq. (13) as

J(Y, θ) = arg min Σ_{j=1}^{n} [ ‖Dg_j − ρυ_j‖² / (2σ_n²) + ‖υ_j − μ_j‖_1 ] + 2 Σ_k log(θ_k + δ),  (14)

where G = [g_1, g_2, …, g_n] is the collection of n similar blocks. Accordingly, Y = [υ_1, υ_2, …, υ_n] and M = [μ_1, μ_2, …, μ_n] are the collections of υ and μ, respectively.
For the model in Eq. (14), there exists a very popular approach for minimizing J called alternating minimization: one starts with some initial guess and then successively obtains the alternating sequence of conditional minimizers.
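The alternating strategy can be illustrated on a toy quadratic cost (not the cost function of this paper): fix one variable, minimize exactly over the other, and alternate until the iterates settle at the joint minimizer:

```python
def alternating_min(num_iters=50):
    """Toy alternating minimization of
        J(x, y) = (x - 1)**2 + (y - 2)**2 + (x - y)**2,
    fixing one variable and minimizing exactly over the other in turn,
    in the same spirit as the alternating theta- and Y-updates."""
    x, y = 0.0, 0.0  # initial guess
    for _ in range(num_iters):
        x = (1.0 + y) / 2.0  # dJ/dx = 0 with y held fixed
        y = (2.0 + x) / 2.0  # dJ/dy = 0 with x held fixed
    return x, y

x_opt, y_opt = alternating_min()  # converges to the joint minimizer (4/3, 5/3)
```

Each sweep shrinks the error by a constant factor here, so a few dozen iterations suffice; in the denoising model, the same scheme alternates between the θ-update and the Y-update of the next two subsections.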

3.2.1 Optimization of θ
For the optimization of θ, the cost is simplified by fixing Y, as shown in the following equation:

θ = arg min ‖W − ρY‖² / (2σ_n²) + 2 Σ_k log(θ_k + δ),

where W = DG. To be more intuitive, the objective is written as

f(θ) = ‖W − ρY‖² / (2σ_n²) + 2 Σ_k log(θ_k + δ).  (17)

The optimization of θ is transformed into the minimization of f(θ), whose first term is quadratic in θ. The function is further decomposed into a series of subproblems as

f(θ_k) = Σ_j (w_kj − θ_k υ_kj)² / (2σ_n²) + 2 log(θ_k + δ),  (18)

where w_kj and υ_kj denote the k'th entries of the j'th columns of W and Y, respectively. Obviously, this is solved by taking the derivative of f(θ_k) with respect to θ_k and equating it to zero, which yields the quadratic equation

a_k θ_k² + b_k θ_k + 2c = 0,  (19)

whose coefficients a_k, b_k, and c are determined by w_kj, υ_kj, σ_n, and δ. If Δ = b_k² − 8a_k c ≥ 0, the equation has the roots θ_k1 and θ_k2, so the minimum of f(θ_k) is min(f(θ_k1), f(θ_k2)). In contrast, if Δ = b_k² − 8a_k c < 0, the function f(θ_k) is monotone. Then the minimum of f(θ_k) is f(0), and the optimization of θ_k is given by

θ_k = arg min_{θ_k ∈ {θ_k1, θ_k2}} f(θ_k) if Δ ≥ 0, and θ_k = 0 otherwise.  (21)

3.2.2 Optimization of Y
With θ fixed, the subproblem for Y is

Y = arg min ‖W − ρY‖² / (2σ_n²) + ‖Y − M‖_1.  (22)

To solve Eq. (22), the algorithm is derived from the well-known variable-splitting and penalty techniques. Therefore, according to semiquadratic regularization, we introduce a relaxation factor z to transform Eq. (22) into the equivalent form

J(Y, z) = arg min ‖W − ρY‖² / (2σ_n²) + ‖z‖_1 + (β/2) ‖z − (Y − M)‖².  (23)

(1) The optimization of z. For a fixed Y, the unique minimum is given by the following shrinkage equation, applied elementwise to the two-dimensional array:

z = sign(Y − M) · max(|Y − M| − 1/β, 0).  (25)

(2) The optimization of Y. In contrast, for a fixed z, the optimization of Eq. (23) is also simplified, as shown in the following equation:

Y = arg min ‖W − ρY‖² / (2σ_n²) + (β/2) ‖z − (Y − M)‖².  (26)

Obviously, Eq. (26) is quadratic in Y, and the minimum of Y is given by the normal equations

(ρ^T ρ + σ_n² β I) Y = ρ^T W + σ_n² β (z + M),  (27)

where I is the identity matrix. Noting that ρ = diag(θ_k) is a diagonal matrix, ρ^T ρ + σ_n² β I is also a diagonal matrix, and therefore its inverse can be easily computed.
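The shrinkage step is the proximal operator of the L1 norm, i.e., soft thresholding. A sketch of the elementwise form (the exact grouping used in the paper's two-dimensional shrinkage may differ) is:

```python
import numpy as np

def shrink(v, t):
    """Elementwise soft-thresholding: the minimizer of
        ||z||_1 + (1 / (2 * t)) * ||z - v||**2.
    With t = 1/beta and v = Y - M, this is the form of the z-update."""
    v = np.asarray(v, dtype=float)
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
```

Entries of v with magnitude below the threshold t are set exactly to zero, which is what makes the z-update sparsity-promoting; larger entries are merely pulled toward zero by t.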
By Eqs. (21), (25), and (27), the denoised result of the noisy image is obtained as

f = DρY.  (28)

3.3 Steps of the Method and Parameter Setting
The steps of this method are summarized below.
① Initialize f and D.
② Update θ by Eq. (21) with Y fixed.
③ Update z by Eq. (25) and Y by Eq. (27), and then set β = 3β.
④ Repeat steps ② and ③ until convergence, and output the denoised image f = DρY by Eq. (28).
The parameter σ_n is empirically estimated, as shown in Refs. 29 and 30. In addition, as discussed in Sec. 3.2.2, when β → ∞, Eq. (23) is equivalent to Eq. (22). However, the larger β is, the slower the convergence rate is. Considering the convergence rate and the equivalence of the two cost functions, the parameter β is set as an increasing sequence. The initial value is 1, and the maximum value is 10^6 to 10^7. After every iteration of z and Y, the value is updated as β = 3β.
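The β continuation described above can be sketched as a simple schedule; the cap of 10^6 below is one choice within the stated 10^6 to 10^7 range:

```python
def beta_schedule(beta0=1.0, growth=3.0, beta_max=1e6):
    """Increasing penalty sequence: beta is tripled after every (z, Y)
    iteration and capped at beta_max (here 1e6, one choice within the
    stated 1e6 to 1e7 range)."""
    beta = beta0
    while beta < beta_max:
        yield beta
        beta = min(beta * growth, beta_max)
    yield beta_max

betas = list(beta_schedule())
```

Starting small keeps the early iterations fast, while the geometric growth drives the penalty toward the regime where Eq. (23) matches Eq. (22).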

4 Experimental Results and Analysis
Now the experiments are presented to demonstrate the denoising performance for ultrasonic logging images. Moreover, we compare our method with three other denoising methods.

4.1 Denoising for Synthetic Noisy Images with Different Noise
In this section, we apply three existing denoising methods (WGSM, TWMF, and NCATV 14) and our proposed method [called Laplacian prior and sparse regularization (LPSR)] to six images contaminated by different noises. These six original ultrasonic logging images are shown in Fig. 2. To comprehensively evaluate the denoised image quality, the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are used as evaluation criteria. Several experiments are reported to validate the denoising performance, and samples of the results are shown for visual comparison.
Generally speaking, when the standard deviation of the noise is more than 50, the noisy image is considered to be severely noisy. In our experiments, the six images are contaminated by different noise levels, with noise standard deviations of 20, 40, 60, and 80. The PSNR and SSIM of all denoised images are shown in Table 2, with the best results marked in bold.
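For reference, the PSNR criterion and the synthetic noise levels can be sketched in a few lines (pure NumPy; SSIM, which requires a windowed computation, is omitted here, and the flat toy image is an illustrative stand-in):

```python
import numpy as np

def psnr(clean, denoised, peak=255.0):
    """Peak signal-to-noise ratio in dB for images on a 0-255 scale."""
    err = np.asarray(clean, float) - np.asarray(denoised, float)
    mse = np.mean(err ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def add_gaussian_noise(img, sigma, rng):
    """Contaminate an image with zero-mean Gaussian noise of std sigma,
    clipping back to the valid intensity range."""
    noisy = np.asarray(img, float) + rng.normal(0.0, sigma, np.shape(img))
    return np.clip(noisy, 0.0, 255.0)

rng = np.random.default_rng(2)
img = np.full((64, 64), 128.0)  # flat toy stand-in for a clean image
noisy = {s: add_gaussian_noise(img, s, rng) for s in (20, 40, 60, 80)}
```

Higher PSNR means the denoised image is closer to the clean one; the PSNR of the unprocessed noisy images drops monotonically as the noise standard deviation grows from 20 to 80.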
As seen from Table 2, compared with the other algorithms, our proposed LPSR method obtains the best denoising performance. In the majority of cases, the highest values of PSNR and SSIM belong to the proposed LPSR. In addition, for the WGSM and TWMF methods, the PSNR and SSIM are significantly lower than those of the NCATV method and our proposed LPSR, which demonstrates that WGSM and TWMF are ineffective on these ultrasonic logging images, especially the images with severe noise (σ_n = 60 and σ_n = 80). Several reasons for these improvements were found. First, the WGSM and TWMF methods follow the hypothesis that the local sparse coefficients of an image obey a Gaussian distribution. However, as demonstrated in Sec. 2, for ultrasonic logging images, the Laplacian distribution is more appropriate. Therefore, our method introduces the Laplacian distribution as a prior term and obtains better performance. Second, the NCATV method, like most total variation methods, generates some block effects, which lower the evaluation values.
In terms of subjective vision, the denoised images of Fig. 2(d) at medium noise (σ_n = 20) are shown in Fig. 4. Two zoomed regions of these images are shown to the right of each result. As seen from Fig. 4, the denoised results of all algorithms have good subjective visual quality at medium noise, especially our proposed LPSR method. These denoised images have few artifacts in the smooth regions. Of course, there are slight differences among these algorithms. The WGSM and TWMF methods more easily generate blurred edges because they cannot preserve the edge details effectively.
Similarly, the denoised images of Fig. 2(f) at severe noise (σ n ¼ 80) are shown in Fig. 5. It is observed that the proposed LPSR outperforms the other three algorithms in preserving smooth regions and image edges and obtains the most visually pleasant results that have fewer artifacts and clearer edges. We also observe that the improvement is easier to distinguish when the noise contamination is severe, especially for the zoomed regions of these denoised images.

4.2 Denoising for Real Ultrasonic Logging Images
In this section, we test our method on various real ultrasonic logging images. All images are denoised using the WGSM, TWMF, and NCATV 14 methods and our proposed LPSR. The real ultrasonic logging images and their denoised results are shown in Figs. 6-8, respectively. First, for Fig. 6, all denoised results exhibit clearer image details compared with the noisy image. However, it is seen from the enlarged rectangle that the result of WGSM [Fig. 6(b)] still retains significant noise. In Figs. 6(c) and 6(d), the details in the denoised results of TWMF are oversmoothed. In contrast, our proposed method [Fig. 6(e)] produces a visually better effect in the smooth regions. Second, for Fig. 7, the small fractures in our denoised results [Fig. 7(e)] are the clearest, especially in the enlarged rectangle. The same denoising effects are also shown in Fig. 8. In particular, the denoising of ultrasonic logging images is challenging. However, our results exhibit sharper details and fewer ringing artifacts compared with the other methods.

5 Conclusions and Future Work
In this paper, an image denoising method is proposed for ultrasonic logging images with severe noise. The denoised images of our method have clearer edges and fewer artifacts. The success of our method benefits from three aspects. First, the sparse coefficients are simulated by a more appropriate distribution, the Laplacian distribution. Second, we use the Laplacian distribution as a prior term and propose the variational Bayesian denoising model. Finally, a relaxation factor is introduced to solve the proposed model. Numerical experiments demonstrate that the proposed algorithm outperforms previous algorithms in terms of both visual quality and objective evaluation. However, due to the complexity of solving the variational Bayesian model, when the size of the noisy images becomes large, it takes too much time to remove noise. For example, it takes about 280 s to remove the noise from a 512 × 512 image with medium noise (σ_n = 20). Obviously, this defect limits the extension of our method, and thus future research will focus on developing parallel technology to decrease the running time.