*in vivo*imaging, this value is not normally accurately known. Here, an autofocus approach for automatically selecting the sound speed is proposed. This is based on maximizing the sharpness of the reconstructed image as quantified by a focus function. Several focus functions are investigated, and their performance is discussed. The method is demonstrated using phantom measurements made in a medium with a known sound speed and

*in vivo*measurements of the vasculature in the flank of an adult mouse.

Photoacoustic tomography is an emerging biomedical imaging modality that is particularly useful for imaging vascular features within small animals and humans.^{1} The technique is based on the localized absorption of pulsed laser light, which creates an increase in acoustic pressure via thermoelastic expansion. By recording the temporal evolution of the acoustic waves reaching the tissue surface, an image of the initial pressure distribution, and thus the regions of optical absorption, can then be reconstructed.^{2} To perform this reconstruction, a value for the sound speed within the tissue medium must be specified. However, in many cases this value is not known. Although book values for the range of sound speeds in soft tissue are readily available, these vary between 1400 and 1600 m/s, depending on the exact tissue composition.^{3} For example, a higher proportion of lipids results in a relative decrease in the sound speed, while a higher proportion of proteins results in a relative increase. This range of observed sound speeds is sufficiently wide to defocus the reconstructed image if the correct value is not selected.

In practice, a manual optimization of the utilized sound speed is often performed to maximize the sharpness of prominent features visible within the reconstructed image. Such an optimization assumes that the image features have intrinsically sharp edges, at least in relation to resolution of the imaging device. In the case of vascular features, this is generally a valid assumption. The sound speed can thus be considered as a focusing parameter that is modified based on a visual assessment of the image sharpness. This has analogs in many other imaging modalities in which a parameter optimization is performed to focus the generated image, for example, in microscopy,^{4} optical coherence tomography,^{5} and computed tomography.^{6} Here, a similar method to automatically select the sound speed that maximizes the sharpness of reconstructed photoacoustic images is proposed. This is based on an autofocus approach in which a focus function is used to give quantitative assessment of the image sharpness.

There is a large body of literature concerning the quantification of image sharpness. In general, a sharp image will contain more high-frequency information than its blurry counterpart, however, there is no universally accepted measure for this difference. A variety of sharpness metrics (or equivalently, focus functions) have been proposed, including those based on the image gradient, pixel count, power, variance, entropy, and autocorrelation.^{4} The relative performance of these different measures have been reviewed by a number of authors.^{4, 7, 8, 9, 10} The best functions produce a single maximum with a sharp peak when the image is in focus, have a large range between the local minima on either side of the maximum, and are robust to noise in the image.^{4} Here, three focus functions that generally perform well—the Brenner gradient, the Tenenbaum gradient, and the normalized variance—are applied to autofocusing in photoacoustic tomography. (A full description of these metrics and their relative merits can be found in the references.)

The Brenner gradient computes the difference between a pixel value and its neighbors two points away. In two dimensions, this may be written as

## 1

[TeX:] \documentclass[12pt]{minimal}\begin{document} \begin{equation} {F_{\rm Brenner}} = \sum _{x,y} (f_{x+2,y} - f_{x,y})^2 + (f_{x, y+2} - f_{x,y})^2, \end{equation} \end{document} $${F}_{\mathrm{Brenner}}=\sum _{x,y}{({f}_{x+2,y}-{f}_{x,y})}^{2}+{({f}_{x,y+2}-{f}_{x,y})}^{2},$$*f*

_{x, y}≡

*f*(

*x*,

*y*) is the gray-level intensity of the pixel at (

*x*,

*y*). The Tenenbaum gradient makes use of the Sobel operators used in edge detection and is given by

## 2

[TeX:] \documentclass[12pt]{minimal}\begin{document} \begin{equation} {F_{\rm Tenenbaum}} = \sum _{x,y} (g*f_{x,y})^2 + (g^T*f_{x,y})^2, \end{equation} \end{document} $${F}_{\mathrm{Tenenbaum}}=\sum _{x,y}{(g*{f}_{x,y})}^{2}+{({g}^{T}*{f}_{x,y})}^{2},$$*g*is the Sobel operator given by

## 3

[TeX:] \documentclass[12pt]{minimal}\begin{document} \begin{equation} g = \left({ {{\begin{array}{c@{\quad}c@{\quad}c} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{array}}} } \right). \end{equation} \end{document} $$g=\left(\begin{array}{ccc}\hfill -1& \hfill 0& \hfill 1\\ \hfill -2& \hfill 0& \hfill 2\\ \hfill -1& \hfill 0& \hfill 1\end{array}\right).$$*g*is replaced with a centered finite-difference operator. In comparison, the Tenenbaum gradient also applies a smoothing perpendicular to the direction of the derivative. This increases the computational complexity of the metric, however, it also improves its robustness to image noise. Finally, the normalized variance quantifies variations in the pixel values about the mean and can be written as

## 4

[TeX:] \documentclass[12pt]{minimal}\begin{document} \begin{equation} {F_{\rm variance}} = \frac{1}{\mu } \sum _{x,y} (f_{x,y} - \mu)^2, \end{equation} \end{document} $${F}_{\mathrm{variance}}=\frac{1}{\mu}\sum _{x,y}{({f}_{x,y}-\mu )}^{2},$$where μ is the mean pixel value. Each of the focus functions also have 3-D equivalents, the form of which may be inferred from the equations given here.

Given a particular focus function, the selection of the optimum sound speed (i.e., the sound speed that maximizes the image sharpness as quantified by the focus function) progresses as follows. First, the image is reconstructed using an initial estimate of the sound speed. For *in vivo* imaging in soft tissue, this will typically be in the range 1500–1550 m/s. Next, the value of the sharpness metric is calculated using the focus function. The sound speed is then systematically modified, the image reconstruction repeated, and the value of the sharpness metric updated. This process continues until a maximum is found. Because the complete procedure can be automated, it can be considered analogous to a conventional autofocusing procedure.

To evaluate the performance of the different focus functions and the autofocus approach more generally, photoacoustic measurements of a blood vessel phantom in a 1.5% solution of intralipid were reconstructed using sound speeds varying from 1400 to 1600 m/s (a detailed description of the experimental measurements is given in Ref. 11, Sec. 3.C, Figs. 14–16). For each value of sound speed, the image sharpness was quantified using the three focus functions [Eqs. 1, 2, 4]. The metrics were calculated using both the 3-D voxel data and a 2-D depth-direction (*en face*) maximum intensity projection (MIP). The variation of the focus functions with sound speed is shown in Fig. 1. The different metrics all demonstrate a maximum for a sound speed of 1483 m/s. The corresponding sound speed in a 1.5% solution of intralipid at room temperature is similarly 1483 m/s.^{12} The excellent agreement between the optimum sound speed determined by the focus functions and the actual sound speed within the medium demonstrates the quantitative validity of using an autofocus approach to select the sound speed.

It is evident from Fig. 1 that the metrics computed from the 2-D MIP perform better than those computed directly from the 3-D voxel data (they have a sharper and better defined maximum). This is because the MIP reduces the overall noise level by isolating the strongest features within the reconstructed volume, in effect acting as an adaptive threshold. It is possible to improve the performance of the 3-D metrics by also introducing a threshold parameter such that only the largest values contribute.^{7} However, using the MIP avoids the task of selecting the value of such a threshold. It is useful to note that in both cases the metric will be dominated by the strongest features in the reconstructed image. These will typically be surface-level features unless compensation for optical and acoustic attenuation is considered.^{13} By adjusting the region of the reconstructed image used to compute the focus metric, it is also possible to focus on particular features or depths within the image or to create composite images using different sound speeds.

An example of using the autofocus approach to select the optimum sound speed in the reconstruction of an *in vivo* image of the vasculature in the flank of an adult mouse is shown in Fig. 2 (a detailed description of the experimental protocol is given in Ref. 14). Again, the three focus functions are unimodal and yield the same maximum, in this case at 1513 m/s (only the metrics computed using the MIP are shown). A depth direction (*en face*) MIP reconstructed using the optimum sound speed is given along with an example of a defocused image produced when the sound speed is overestimated by 5%. The focused image shows a high level of detail in the vascular features down to the pixel level. For quantitative comparison, the change in the full width at half maximum (FWHM) of a representative vessel within the MIP (denoted by the arrows) is shown in Fig. 2a by the dotted line. The minimum FWHM corresponds with the maximum of the focus function.

Although each of the three investigated focus functions give virtually identical results for the examples presented here, the gradient functions were found to consistently yield the best performance when applied to a wide range of other photoacoustic measurements.^{11, 14} It is also important to note that for all test cases, the MIP-derived focus functions remained unimodal over the range of biologically relevant sound speeds (1400–1600 m/s). Consequently, any standard optimization techniques could be used to find the maximum. In the simplest case, a linear-spaced parameter sweep could be performed. Alternatively, to reduce the number of required reconstructions, direct-search or derivative-based optimization methods could also be used. For example, using the Brenner gradient and the golden section search method with parabolic interpolation (fminbnd in MATLAB), the maximum in Fig. 2a can be found in six iterations (the steps are shown with dots).

In summary, a novel method for selecting the sound speed used in photoacoustic reconstruction algorithms has been proposed. This is based on an autofocus approach, which allows the sound speed that maximizes the image sharpness (as determined by a particular focus metric) to be automatically obtained. Although the approach works robustly, there are several items that warrant further discussion. First, the use of an iterative reconstruction procedure inherently increases the computational time. Fortunately, it is possible to minimize this burden using standard optimization techniques. It may also be possible to only reconstruct a small subsection of the image or to downsample the measurement data used during the autofocus procedure.^{6} In any case, given that a manual optimization is often performed, the computational penalty of using the autofocus approach may be justifiable.

Second, the method neglects the fact that the sound speed distribution within the tissue may be nonuniform. In this case, the autofocus approach will yield a kind of average value for the sound speed that places the dominant features in the reconstructed image in focus. For the purpose of generating sharp photoacoustic images, this may be appropriate, even if the optimized sound speed does not correlate exactly with any particular tissue type. Moreover, the vast majority of reconstructions of experimental data assume a constant sound speed, in which case the autofocus approach is applicable. A possible extension would be to use a parametrized sound speed map to account for different tissue layers, for example, the skin and skull layers encountered in *in vivo* mouse brain imaging.

Finally, it is important to note that the efficacy of the proposed method is dependent on both the intrinsic content of the reconstructed image in addition to any image artifacts (for example, limited-view artifacts). If the reconstructed image is rich in features and any image artifacts are comparatively small in magnitude, then the autofocus approach will work correctly to maximize the sharpness of the image features. However, if the converse is true, then the autofocus approach may incorrectly attempt to maximize the sharpness or the prevalence of edges in the image artifacts.

## Acknowledgments

This work was supported in part by the Australian Research Council/Microsoft Linkage Project No. LP100100588 and the Engineering and Physical Sciences Research Council, UK. The authors thank Ben Cox for useful discussion.

## References

*In vivo*high-resolution 3D photoacoustic imaging of superficial vascular anatomy,” Phys. Med. Biol. 54(4), 1035–1046 (2009). 10.1088/0031-9155/54/4/014 Google Scholar