Spatiotemporal image reconstruction to enable high-frame-rate dynamic photoacoustic tomography with rotating-gantry volumetric imagers

Abstract. Significance Dynamic photoacoustic computed tomography (PACT) is a valuable imaging technique for monitoring physiological processes. However, current dynamic PACT imaging techniques are often limited to two-dimensional spatial imaging. Although volumetric PACT imagers are commercially available, these systems typically employ a rotating measurement gantry in which the tomographic data are sequentially acquired as opposed to being acquired simultaneously at all views. Because the dynamic object varies during the data-acquisition process, the sequential data-acquisition process poses substantial challenges to image reconstruction associated with data incompleteness. The proposed image reconstruction method is highly significant in that it will address these challenges and enable volumetric dynamic PACT imaging with existing preclinical imagers. Aim The aim of this study is to develop a spatiotemporal image reconstruction (STIR) method for dynamic PACT that can be applied to commercially available volumetric PACT imagers that employ a sequential scanning strategy. The proposed reconstruction method aims to overcome the challenges caused by the limited number of tomographic measurements acquired per frame. Approach A low-rank matrix estimation-based STIR (LRME-STIR) method is proposed to enable dynamic volumetric PACT. The LRME-STIR method leverages the spatiotemporal redundancies in the dynamic object to accurately reconstruct a four-dimensional (4D) spatiotemporal image. Results The conducted numerical studies substantiate the LRME-STIR method’s efficacy in reconstructing 4D dynamic images from tomographic measurements acquired with a rotating measurement gantry. The experimental study demonstrates the method’s ability to faithfully recover the flow of a contrast agent with a frame rate of 10 frames per second, even when only a single tomographic measurement per frame is available. Conclusions The proposed LRME-STIR method offers a promising solution to the challenges faced by enabling 4D dynamic imaging using commercially available volumetric PACT imagers. By enabling accurate STIRs, this method has the potential to significantly advance preclinical research and facilitate the monitoring of critical physiological biomarkers.


Introduction
2][3] By combining the high spatial resolution of ultrasound imaging with the high soft tissue contrast of optical imaging, PACT offers unique advantages for imaging biological structures while avoiding the use of ionizing radiation.In PACT, a fast laser pulse in the near infrared range illuminates the object.Absorption of optical energy by various molecules within the object (chromophore) induces a localized increase of acoustic pressure through the photoacoustic effect.The acoustic wavefield propagating through the object and coupling medium (water) is subsequently detected by ultrasonic transducers.The measured wavefield data can then be utilized to reconstruct an image that depicts the initial induced pressure distribution within the object.
5][6] For example, tumor vascular perfusion is one such dynamic processes that is critical in the study of cancers.High vascular perfusion is indicative of angiogenesis, a well-established hallmark of cancerous growth. 7,8 ][11][12][13] Despite its considerable promise, current dynamic PACT technologies suffer from fundamental limitations.[16][17][18] Most existing 3D PACT imagers developed to-date utilize a rotating measurement geometry in which the tomographic data are sequentially acquired, 2,[19][20][21] as opposed to being acquired simultaneously at all views.This design is advantageous because it reduces system costs by employing a limited number of acoustic transducers and associated electronics.However, data-acquisition times for a complete tomographic scan can be tens of seconds.Due to the relatively slow rotational speed, the temporal resolution is significantly limited.While enhancing temporal resolution by using sparsely sampled tomographic data is possible, the associated dynamic image reconstruction problem becomes ill-posed and highly challenging.
Previous studies on dynamic PACT, 10,11,[14][15][16][17][18]22 have primarily focused on scenarios where the sufficiently sampled tomographic data can be rapidly acquired. In sch cases, a straightforward approach is to employ a frame-by-frame image reconstruction (FBFIR) method.10,11,[14][15][16][17][18] These techniques utilize conventional static image reconstruction methods to estimate a sequence of images from sufficiently sampled tomographic data.The temporal resolution is limited by the duration of the complete data acquisition process.Rapid data acquisition is feasible either with 2D PACT imaging or by leveraging a dense, albeit expensive, static transducer array in 3D imaging.However, for volumetric imagers with sequential scanning strategy, FBFIR methods are not applicable, primarily due to the extended time required to accumulate the complete set of tomographic measurements.
On the other hand, spatiotemporal image reconstruction (STIR) methods estimate a sequence of images simultaneously instead of frame-by-frame, and they have demonstrated their efficacy in accurately reconstructing dynamic objects from sparsely sampled data in various medical imaging modalities, including computed tomography, 23 positron emission tomography, 24 single photon emission computed tomography, 25 and magnetic resonance imaging. 26While a few STIR techniques have been proposed for PACT, some of them still presume the availability of sufficiently sampled tomographic data and strive for enhanced accuracy and/or reduced computational complexity compared to FBFIR techniques. 27The STIR methods [28][29][30] considering sparsely sampled tomographic data, have relied on the principles of compressed sensing and required specific data sampling schemes that are different than the sequential sampling schemes employed by currently available volumetric imagers.
This study introduces a novel spatiotemporal image reconstruction method based on low-rank matrix estimation (LRME-STIR), which is applicable to currently available sequential 3D PACT imaging systems without requiring hardware modifications.By employing the LRME-STIR technique, it becomes possible to overcome the challenges caused by the sparsely sampled tomographic data.The proposed approach holds the potential to advance the field by enabling accurate and efficient spatiotemporal image reconstruction, thereby facilitating the monitoring of dynamic physiological changes using PACT.
The remainder of the paper is organized as follows.Section 2 presents an imaging model for sequential volumetric imagers and introduces the inverse problem formulation.The proposed LRME-STIR method is described in Section 3. The conducted numerical and experimental studies, and the results are provided in Sections 4 and 5, respectively.Finally, the paper concludes with a discussion in Section 6.

Imaging Model for Sequential Volumetric Imagers and Inverse Problem Formulation
In the context of PACT, the sequential data acquisition strategy commonly involves utilizing one or more rotating or translating ultrasonic transducer arrays within single or multiple acoustic probes. 2,19,20 Tis approach facilitates data collection by rotating the probes along a fixed axis, resulting in the acquisition of a few tomographic measurements at each step, as depicted in Figure 1.These measurements are accumulated sequentially to form a complete tomographic measurement set.During each step of sequential data acquisition, the object can be considered as static (quasistatic assumption), which is justified by the negligible data acquisition time during each step (on the order of 10 −4 s), significantly shorter than the time between consecutive measurements (on the order of 10 −1 s).An object frame is defined as the short period of time when the object is considered static.The sequence of object frames constitutes the dynamic object.Essentially, each data acquisition step, hereafter referred to as an imaging frame, corresponds to an object frame.Although object and imaging frames may seem interchangeable, the distinction lies in the fact that the set of imaging frames is essentially a subset of the set of object frames because the dynamic object might not be imaged throughout all object frames.Under the quasi-static assumption, the time-dependent object function at the k-th imaging frame, specifically the dynamic induced initial pressure distribution, can be represented as f k (r r r) = f (r r r, k∆t) for k = 1, ..., K. Here, K represents the number of imaging frames, and ∆t is the time interval between consecutive imaging frames, which is equal to the laser repetition rate.Finally, the rotation speed of the gantry determines the angular spacing between imaging frames.
Under the quasi-static assumption, the data acquisition process at each imaging frame can be described by a continuous-to-discrete (C-D) imaging model as: 27,31 where τ denotes the fast-time (i.e., the arrival time of acoustic signals to the transducer), and ∆T corresponds to the fast-time sampling interval.The object function at the k-th imaging frame, f k (r), is assumed to be bounded and contained within the volume V .The scalar c 0 denotes the speed of sound, which is assumed to be constant throughout the volume V .The quantities r and r ′ qk specify the spatial coordinates within V and the location of the q-th transducer at the k-th imaging frame, respectively.The vector g k ∈ R P Q represents lexicographically ordered pressure traces measured by the transducers at the k-th imaging frame.Here, Q stands for the number of transducers, and P represents the number of electrical signals recorded by each transducer.The notation [g k ] (q−1)P +p refers to the (q − 1)P + p-th entry of the measurement vector g k .Here, the integer-valued indices q and p denote the transducer index and temporal sample, and Ω qk denotes the detection area of the q-th transducer at the k-th imaging frame.When the transducer size is small and/or the object is located near the center of a relatively large measurement geometry, an idealized point-like transducer model can be assumed and the surface integral over Ω qk can be neglected. 27o facilitate the implementation of an iterative image reconstruction algorithm, a discrete-todiscrete (D-D) imaging model is defined as follow.The spatially continuous object functions f k corresponding to the k-th imaging frame are approximated by use of a finite linear combination of spatial expansion functions {ψ n (r)} N n=1 : where N denotes the number of spatial expansion functions.In this study, the expansion functions are piecewise trilinear Lagrangian functions defined on a uniform Cartesian grid.Their expression is given by: 31 Here, r = (x, y, z) denotes the spatial coordinate and r n = (x n , y n , z n ) specifies the location of the n-th node of the uniform Cartesian grid.The parameter ∆s indicates the distance between adjacent grid points.The coefficients {α k n } N K n=1,k=1 can be organized into a matrix F ∈ R N ×K with entries [F ] nk ≡ α k n .The k-th column of F is denoted by f k ∈ R N and represents the discrete approximation of the object function at the k-th imaging frame.That is, where e k ∈ R K represents the k-th column of the identity matrix in R K and ⊗ denotes the vector outer product.Correspondingly, a frame-dependent D-D imaging model accounting for measurement noise can be expressed as: where g k represents the (noisy) tomographic measurements at the k-th imaging frame and η k accounts for measurement noise and modeling and discretization errors.The operator H k ∈ R QP ×N stems from discretization of the C-D PACT imaging operator associated with the k-th imaging frame.In particular, the vector g k ∈ R QP representing the action of H k on f k is defined as in Eqn.
(1) but with the continuous in space object function f k (r) replaced by its finite dimensional approximation f (N ) k (r) defined in Eqn.(2).Given the data matrix G = K k=1 g k ⊗ e k ∈ R P Q×K and the set of imaging operators H k (k = 1, . . ., K) corresponding to each imaging frame, the goal of dynamic PACT image reconstruction is to find a matrix F = K k=1 f k ⊗e k ∈ R N ×K , whose column f k represents an object estimate at the k-th imaging frame; for simplicity hereafter f k is referred to as the k-th frame of the spatiotemporal object estimate.Given the sparse tomographic measurements acquired for each imaging frame, the task of dynamic image reconstruction constitutes a significantly ill-posed inverse problem.A unique estimator, F , can be obtained by solving the following penalized least squares optimization problem: where R(F ) is the regularization term, which is convex but possibly non-smooth.The total data fidelity term L(F ) = K k=1 L k (F ) is the sum of data fidelity terms L k (F ) associated with the k-th imaging frame.These quantities are defined as respectively, where ∥•∥ F denotes the Frobeniuos norm.
In addition to the inherent ill-posed nature of the considered dynamic image reconstruction problem, significant implementation challenges exist.Firstly, unlike the relatively straightforward task of static image reconstruction in which a single image is estimated, spatiotemporal image reconstruction involves dealing with a considerably higher computational burden as all frames are reconstructed simultaneously, particularly when each frame is 3D in space.Secondly, as the number of frames increases, there is a growing need for memory space during computation, which necessitates effective computational strategies with minimal memory usage.

Low-rank Matrix Estimation-based Spatiotemporal Image Reconstruction
][34][35][36][37] Leveraging the semiseparable approximation, the spatiotemporal reconstruction problem can be reduced to the problem of estimating R weights, and R spatial and temporal functions.In a discretized formulation, this is algebraically equivalent to enforcing a low-rank structure on the matrix F .
A penalty scheme can then be imposed on the nuclear norm of the spatiotemporal reconstruction matrix to promote low-rankness.Specifically, the regularization term stemming from the nuclear norm of F is defined as where σ r (r = 1, . . ., min(N, K)) represent the singular values of F .This approach not only effectively regularizes the ill-posed inverse problem 38 but also reduces memory demands without sacrificing accuracy.Rather than explicitly storing F in memory, its truncated singular value decomposition U R Σ R V T R is stored, resulting in decreased memory requirement.Here, R < min(N, K) denotes the truncation index, Σ R is the diagonal matrix comprising the largest R singular values, U R and V R are matrices with orthonormal columns collecting the left and right singular vectors, respectively, corresponding to the R largest singular vaslues.This approach not only facilitates an efficient approximation of F but also enforces the space-time semiseparability.Specifically, the columns of U R and V R are the algebraic counterpart of the functions u r (r r r) and v r (t), respectively.
To further regularize the inverse problem, under the assumption that the object undergoes a smooth and slow temporal change, another regularization scheme penalizing the difference between two consecutive frames can be employed.This can be accomplished by penalizing the squared Frobenius norm of the temporal difference matrix, which is expressed through the temporal (forward) difference operator, denoted as D ∈ R K×(K−1) , applied to F : where, D is defined as: Here, d k corresponds to the k-th column of the temporal (forward) difference operator, D. Within this column, the k-th and (k + 1)-th elements possess values of −1 and 1, respectively, while all other elements are set to 0.
In this way, the sought after estimate of the dynamic reconstruction problem with consideration of a maximum rank constraint and temporal and nuclear norm penalties can be formulated as: where R N ×K Rmax denotes the set of all N -by-K matrices of rank at most R max and the cost function J(F ) is comprised of the data fidelity term L(F ) in Eqn.(7), the temporal regularization term R t (F ) in Eqn.(9), and the nuclear norm regularization term R nn (F ) in Eqn.(8).Here, the parameters γ ≥ 0 and λ ≥ 0 control the strength of the temporal and nuclear norm regularization terms, respectively.Similarly to the data fidelity computation in Eqn.(7), the temporal regularization term R t (F ) = K k=1 R t k (F ) can written as the sum of contributions R t k from each imaging frame.Each term is defined as where, for uniformity of notation, d K ∈ R K is the zero vector.To highlight the contribution from each frame, the minimization problem in Eqn.(11) can then be rewritten as In the formulated minimization problem, the data fidelity and temporal penalty terms are convex and smooth, while the nuclear norm penalty term is convex but non-smooth.Accordingly, the minimization problem can be solved by use of a proximal gradient descent (PGD) method. 39,40 e convergence speed of the PGD method can be improved with momentum schemes. 41,42  further improve the convergence speed, especially in the early iterations, an ordered subsets (OS) approach 43 can be incorporated with momentum schemes, 44,45 despite the lack of theoretical guarantees of convergence.In addition to acceleration, applying the OS approach to find an approximate solution to Eqn. (13) offers several other significant benefits.Notably, it substantially reduces memory requirements by a factor proportional to the number M of subset used.While the gradient with respect to all imaging frames requires O(N K) memory usage, the gradient corresponding to each subset only requires O(N K/M ) storage.Furthermore, it preserve the low-rank structure of the reconstructed object function estimate at each step of the proximal gradient iteration.
For the optimization problem given by Eqn.(11), the OS-based cost function can be expressed as: 44,45 J where, K j represents the set of frame indices in the j-th ordered subset.The OS approach relies on the "subset balance" approximation, 44,45 implying that J K j (F ) ≈ J(F ).The update procedure in PGD with OS consists of two main steps.Initially, a gradient descent step is performed moving in the negative gradient of the smooth components of the OS-based cost function, which yields Above, η is the step size and F j denotes the spatiotemporal object estimate at the beginning of the j-th update.The first component of the gradient, associated with the data fidelity term for the k-th frame, results in an outer product that produces a rank-1 matrix.Similarly, the second component of the gradient, related to the temporal regularization term for the k-th frame, also yields a rank-1 matrix.Thus, the maximum rank of the gradient of the OS-based cost function J K j is bounded by 2|K j |, where |K j | denotes the number of imaging frames in the ordered subset.Following the gradient descent step, the proximal step is executed, where the proximal operator is applied to account for the nonsmooth component of the objective function.The proximal step corresponds to the solution of the following minimization problem: whose solution can be efficiently implemented via a truncated SVD factorization of F j+ 1 2 and the application of the soft-thresholding operator, S(.), to its singular values {σ i }. 46 The solution of the Eqn.( 16) is expressed as following: where with maximum rank R max and Σj+ 1 2 := S ηλ Σ j+1/2 .The soft-thresholding operator, S(.), is defined (component-wise) as below: The soft-thresholding is computationally efficient and enforces a low-rank structure effectively attenuating the singular values.
Algorithm 1 Low-rank matrix estimation-based spatiotemporal image reconstruction Input: R max , ϵ, γ, λ, η, M Output: F Initialization: ▷ Select the current subset of frames Compute gradient descent for the current subset of frames: Perform proximal operator: 46 ▷ Soft thresholding for nuclear norm regularization ▷ FISTA update for the current iteration; rank( ▷ Increment outer loop iteration end while Algorithm 1 summarizes the proposed accelerated proximal gradient descent algorithm, integrating both momentum and ordered subset techniques to efficiently find an approximate solution to the minimization problem in Eqn.(11).The algorithm takes the following parameters as input: the maximum allowed rank R max ; the threshold ϵ for the stopping criterion; the regularization parameter γ for temporal regularization; the regularization parameter λ for nuclear-norm regularization; the step size η; and the number M of subsets.At the beginning of each iteration, the sequence of frame indices undergoes random shuffling.Within the subsequent inner loop, these shuffled indices are partitioned into M subsets.The gradient pertaining to both the data fidelity and temporal regularization components is computed for these subset frame indices, and the gradient descent step is executed.Then, the proximal mapping associated with nuclear norm regularization is evaluated efficiently by using randomized singular value decomposition (SVD) 47 and soft thresholding.Subsequent to this step, the FISTA 41 acceleration is deployed to enhance the convergence rate.The algorithm terminates when the squared Frobenius norm of the difference between two successive iterations ||F (i) −F (i−1) || 2 F , normalized by its maximum max l≤i ||F (l) −F (l−1) || 2 F over the previous iterations, falls below the threshold ϵ defined by the user.This metric is equivalent to monitoring the norm of the gradient in smooth optimization. 40Study Description

3D PACT Imaging System Specifications for Experimental and Numerical Studies
The TriTom preclinical imaging system 48,49 developed by PhotoSound was employed for the experimental studies and emulated for the numerical in-silico studies.It integrates photoacoustic (PA) and fluorescence (FL) imaging modalities, harnessing their individual strengths.The system comprises a central rotary scanning stage, an optical excitation setup for PA and FL imaging, a curvilinear 96-element PA transducer array, and a fluorescence-enabled scientific complementary metal-oxide-semiconductor (sCMOS) camera.The configuration of the TriTom setup is depicted in the left panel of Figure 2, and the imaging chamber is shown in the right panel.During imaging, the object is immersed in water and continuously rotated while optically stimulated by a short laser pulse with a 10Hz repetition rate.The maximum rotation speed is 10 degrees per second, and therefore completing a full 360-degree scan requires 36 seconds.Four optical fiber bundles are located on the outer circumference of the cylindrical imaging chamber, perpendicular to the scanning plane of the PA array.The laser emits pulses in the 670 to 1064 nm wavelength range at a frequency of 10 Hz, with each pulse lasting 5 ns.Acoustic waves generated by the laser excitation are detected by the PA transducer array, which comprises piezoelectric transducer elements, each measuring 1.3×1.3mm 2 .The center frequency of the transducer elements is 6 MHz ± 10% (at -6 dB) with bandwidth ≥ 55%.The array is vertically oriented and cylindrically focused, with the central element positioned 65 mm from the center of the imaging chamber.For fluorescence imaging, an sCMOS camera with a 2048×2040 pixel resolution and a 40×40 mm 2 field-of-view is placed outside the imaging chamber.
In the acoustic modeling of the TriTom imaging system, the transducer array was assumed to consist of idealized point-like transducers positioned at the central locations of the transducer elements.The acoustic simulation was implemented with a GPU-accelerated D-D imaging model 31 assuming acoustically homogeneous medium.Although the TriTom system has a single transducer arc, other existing 3D PACT designs (e.g. 20,50  feature multiple transducer arcs, thus the numerical studies also explored scenarios where multiple tomographic measurements were acquired per imaging frame.Specifically, measurement configurations in which 2 transducer arcs separated by 90 • and 4 transducer arcs separated by 45 • were considered, as illustrated in Figure 3.The sampling rate for both the experimental and numerical studies was set to 31.25 MHz, with 2048 temporal samples collected per imaging frame.

Inverse Crime Validation Study
To verify that Algorithm 1 was correctly implemented, an inverse crime validation study was conducted in silico in which a simple rank-4 dynamic phantom was employed.The phantom consisted of 40×40×3 spatial voxels and 360 object frames, with a voxel size of 0.4×0.4×0.4 mm 3 .The induced pressure in the phantom was assumed constant along the z-axis and piecewise constant within the xy-plane at each object frame.The phantom was structured into four distinct regions, each characterized by varying temporal activities as shown in Figure 4.The left panel of Figure 4 illustrates the central z-slice of the phantom at the 120-th object frame, while the right panel displays the time activity curves corresponding to the numbered regions.For each imaging frame (360 in total), 4 tomographic measurements separated by 45 • were considered as depicted in the right panel of Figure 3. Simulated acoustic pressure data were generated using a grid voxel size of 0.4×0.4×0.4 mm 3 , assuming the phantom was centered within the imaging system. 31The speed of sound was assumed constant at 1495 m/s, and no noise was added to the simulated measurement data.
During image reconstruction, the same computational grid that was used for generating the measurement data was employed.In the algorithm, the maximum allowed rank, R max , was set to 4, and no temporal or nuclear norm penalties were applied (λ = γ = 0).The step-size, η, was tuned empirically to ensure convergence.To explore the impact of the number of ordered subsets used in the randomized evaluation of the data fidelity term, three different number of ordered subsets were investigated: M ∈ {1, 2, 6}.

Numerical Phantom Study
A dynamic numerical phantom was utilized to evaluate the performance of the proposed method through in silico experiments.The phantom consisted of 360 object frames and contained four convex ellipsoidal blobs within a larger ellipsoidal blob, along with a vasculature mimicking structure as shown in the bottom panel of Figure 5.The size of the numerical phantom was 40×40×30 mm 3 , with a voxel size of 0.4×0.4×0.4 mm 3 .The time activity at each voxel was designed to mimic a contrast agent's flow along the paths from ellipsoidal blobs 1 to 2 and 3 to 4. The top right panel of Figure 5 illustrates the time activity at the center of each ellipsoidal blob, and the blobs are numbered in the bottom right panel.The singular values of the dynamic numerical phantom are shown in Figure 5, where a rapid singular value decay is observed.The rapid singular decay indicates that the phantom can be accurately approximated with a low-rank representation, as demonstrated by the mean squared error (MSE) vs. rank plot depicted in the top middle panel of Figure 5.
To generate the synthetic measurement data, the grid voxel size was set to 0.2×0.2×0.2 mm 3 , and 360 imaging frames (corresponding to a complete rotation of the system) were simulated.The speed of sound was assumed to be constant at 1495 m/s.Zero-mean Gaussian noise with a standard deviation equivalent to a specified percentage of the maximum value of the simulated data (1%, 3%, or 5%) was added to the pressure signals.
The primary objective of this numerical study was to investigate the effect of the following physical factors on the proposed method: • Number of tomographic measurements per imaging frame: In this study, the noise level was fixed at 1%, and three scenarios were examined with 1, 2, and 4 tomographic measurements per imaging frame, as illustrated in Figure 3. • Measurement noise level: In this study, the number of tomographic measurements per imaging frame was set to 2, and three different noise levels were considered: 1%, 3%, and 5%.
For image reconstruction, the maximum allowed rank in the algorithm was set to R max = 40, as it allows for an accurate approximation of the dynamic numerical phantom with MSE around 10 −8 , as shown in Figure 5.To avoid the discretization inverse crime, a coarser grid with voxel size 0.4×0.4×0.4 mm 3 was used for the reconstruction.The speed of sound value was the same used for simulating the data.The threshold ϵ for the stopping criterion of Algorithm 1 was set to 2.5 × 10 −1 .The step-size, η, was tuned empirically to ensure convergence.The number of subsets, M , was set to 18.To select appropriate regularization parameters, the balancing principle 51 was employed to reduce the number of tunable regularization parameters from two to one.The balancing principle rescales each regularization term by an estimate of their value at the object function F true .While an iterative procedure is proposed in 51 to estimate such values, for simplicity, this work assumes direct knowledge of the actual nuclear norm and the Frobenius norm of the temporal difference of F true .Specifically, in this study, the solution to the dynamic image reconstruction problem was defined as: For each case, four different values of the parameter κ were explored: The κ parameter yielding the best average normalized squared error (nSE) over frames was selected. 52The nSE for each frame was computed as: Specifically, the regularization parameter value that yielded the optimal results was κ = 5 × 10 −4 ∥G∥ 2 F when 4 tomographic measurements per frame were available, and κ = 2.5×10 −3 ∥G∥ 2 F for all other cases.

Experimental Study
An open-ended dynamic flow phantom was constructed by bending a silicone tube into a U-shaped structure, as depicted in Figure 6.The left panel of the figure provides an illustration of the phantom within the imaging chamber, while the right panel shows an actual photograph of the physical phantom.The inner diameter of the silicone tube was 0.0635 cm, and the outer diameter was 0.1194 cm.The dynamic phantom, positioned at the center of the imaging system, was illuminated by a laser pulse with a wavelength of 770 nm and an energy of 100 mJ (before entering the fiber optic light delivery unit).PACT data were acquired with the TriTom imaging system during the injection of a photoacoustic-fluorescent contrast agent (PAtIR 53 ) through one end of the tube.Over a span of 36 seconds, the dynamic phantom underwent scanning which resulted in 360 imaging frames, with a frame rate of 0.1 seconds and a single tomographic measurement per imaging frame.Two-dimensional fluorescence images were concurrently gathered, which serve as reference for the time evolution of the contrast agent concentration within the tube.Based on region illuminated by laser, the reconstruction volume was set to a region of 30×30×30 mm 3 located at the center of the imaging system.The voxel size was set to 0.4×0.4×0.4 mm 3 , resulting in 75×75×75 spatial voxels.The maximum allowed rank, R max , was set to 40, and the number of subsets, M , was set to 18.For the stopping criterion of Algorithm 1, the threshold ϵ was set to 5 × 10 −1 .The step-size, η, was tuned empirically to ensure convergence.To calibrate the speed of sound, multiple static estimates of the dynamic object were reconstructed using the Universal Back-Projection (UBP) algorithm 54 assuming speed of sound values in the range of 1480-1520 m/s, with 5 m/s increments.The speed of sound value of 1495 m/s resulted in the best visual appearance and was selected to perform the dynamic image reconstruction.To choose the regularization parameters, 9 dynamic image reconstructions where performed for all possible combinations of γ ∈ {5.5 × 10 0 , 5.5 × 10 1 , 5.5 × 10 2 }, and λ ∈ {10 −1 , 10 0 , 10 1 }.Ultimately, the spatiotemporal object estimate that most closely captured the observed dynamic changes in the reference fluorescence images, as determined through visual examination, was selected.The corresponding regularization parameters were γ = 5.5 × 10 1 and λ = 10 0 .

Inverse Crime Validation Study Results
In this inverse crime study, the step sizes η for M = 1, 2, 6 were empirically tuned to ensure convergence, resulting in values of {10 −4 , 10 −4 , 3.3 × 10 −5 }, respectively.The algorithm was allowed to run for 2500 iterations in each case to ensure convergence of the solution up to numerical precision.For the purpose of this validation study, Algorithm 1 was modified to re-evaluate the total data fidelity term L(F i ), which accumulates the contributions from all time frames, at each iteration.
The results of the validation study are depicted in Figure 7.The left panel exhibits the data fidelity L(F i ) vs. iteration count, while the right panel presents the average nSE vs. iteration count.For all values of M , A significant decrease of approximately 11 and 13 orders of magnitude can be observed in the data fidelity term and average nSE, respectively.This indicates that the proposed method using momentum-acceleration in combination with ordered subsets (case M ∈ {2, 6}), although lacking theoretical guarantee of converge, can achieve (up to machine precision) to the same object estimate produced by the momentum-accelerated proximal gradient descent without subsampling method (M = 1).It is also evident that a larger number of subsets improves the convergence speed, particularly in the early iterations.This also translate in possibly faster time to solution, as the cost of each iteration is dominated by the evaluation of the imaging operator, while the time spent in performing the truncated SVD factorization using randomized method is neglegible.In the numerical studies presented here, the computational time required per iteration was approximately 15 minutes on a workstation (AMD EPYC 7702P 64-Core processor, 32GB RAM, one Nvidia Geforce RTX 2080 graphic processing unit) independently of the number M of subsets used.
In summary, this validation study demonstrates the correct implementation, efficiency, and robust convergence, despite lack of theoretical guarantees, of the proposed method.

Numerical Phantom Study Results
Sensitivity to the number of tomographic measurements per imaging frame: Figure 8 displays a selection of MIP images depicting the numerical phantom and the spatiotemporal estimates from simulated data with varying numbers of tomographic measurements per imaging frame.A video featuring the spatiotemporal evolution of the numerical phantom and its corresponding estimates is available as supplemental material. 55Upon careful inspection, it is apparent that increasing the number of tomographic measurements per imaging frame leads to more accurate estimates of the dynamic object.For instance, object estimates reconstructed from data with 2 and 4 measurements per imaging frame correctly capture the increase in intensity of blob-4 after frame-301.However, this intensity change is less prominent in the dynamic estimate reconstructed from data with only one tomographic measurement per imaging frame, which also exhibits artifacts around blob-4.In addition, the estimate reconstructed using 4 measurements per imaging frame better captures the intensity difference between the two ends of the blob-1 in frame-1.These observations are more evident in Figure 9, which shows the time-activity curves (TACs) at each ellipsoidal blob's center in both the numerical phantom and the spatiotemporal estimates from data with different numbers of tomographic measurements per imaging frame.Notably, the TACs for the reconstruction from 4 tomographic measurements per imaging frame closely aligns with those of the numerical phantom.The normalized squared error (nSE) vs. imaging frame plot for the reconstructions with varying numbers of tomographic measurements per imaging frame is depicted in the left panel of Figure 11.Both the TACs and nSE vs. frame plots confirm the observation that a higher number of tomographic measurements per imaging frame leads to more accurate spatiotemporal reconstructions.
Sensitivity to measurement noise: Figure 10 displays the TACs at each ellipsoidal blob's center in both the numerical phantom and its spatiotemporal estimates from data with varying noise levels, when the number of tomographic measurements per frame is kept at two.Notably, the estimated TACs remain similar across all noise levels, highlighting the algorithm's robustness in capturing dynamic changes even in the presence of increased noise.Figure 11 (right panel) shows the nSE vs. imaging frame plot for estimates reconstructed from data with varying noise levels.As anticipated, the nSE exhibits an upward trend with increasing noise levels.
The numerical phantom study results underscore the effectiveness of the proposed method in accurately estimating dynamic changes, when limited tomographic measurements per frame are available.The study reveals that the ill-posed nature of the dynamic reconstruction problem diminishes significantly as the number of tomographic measurements per imaging frame increases.Notably, the method's robustness in faithfully capturing the temporal dynamics of the object remains evident even in the presence of increased noise levels.These findings collectively underscore the algorithm's reliability and potential significance in addressing challenges associated with dynamic imaging scenarios.A close examination reveals improved reconstruction accuracy with an increased number of tomographic measurements.Notably, the reconstructions using 2 and 4 tomographic measurements per imaging frame more accurately capture the intensity change in blob-4 (located at the bottom right) compared to the one using a single tomographic measurement per imaging frame.This observation is further supported by the TACs shown in Figure 9.A video featuring the spatiotemporal evolution of the numerical phantom and its reconstructed estimates is available as supplemental material.

Experimental Study Results
Figure 12 displays dynamic PACT images reconstructed from experimental data (bottom row), along with the corresponding reference fluorescence images (top row).The fluorescence images were processed to suppress the background, and the contrast agent is highlighted in yellow.This  the imaging frame number for reconstructions with varying numbers of tomographic measurements per imaging frame; the noise level was fixed at 1%.A direct correlation between an increase in measurements and reduction in nSE is observable, thus highlighting the improved reconstruction accuracy.Right: nSE vs. the imaging frame number for reconstructions from data with varying noise levels; the number of tomographic measurements per frame was fixed at 2. As anticipated, the nSE exhibits an upward trend with increasing noise levels.
enhancement was accomplished through manual segmentation of the tube and contrast agent from the raw fluorescence images.The spatiotemporal PACT object estimates were visualized using ParaView. 56To ensure a qualitatively close alignment of the field-of-view between the ParaView visualization of the spatiotemporal PACT object estimate and the 2D fluorescence images, the view angle in ParaView was adjusted manually at each frame; however, a slight misalignment remains.A video featuring the raw images, visually enhanced images through the segmentation of the tube and contrast agent, and ParaView visualizations of the spatiotemporal PACT object estimate, is provided in the supplemental materials. 55n particular, when examining frames 30 to 120, one can readily observe the accurate recovery of dynamic contrast flow within the tube.A similar trend is evident in frames 210 to 300, with the lower right section of frame-240 showing the presence of the contrast agent, albeit with reduced contrast.This reduced contrast is also noticeable in specific frames such as 330 and 360, possibly due to light obstruction by other tube segments.Nevertheless, the overall effective recovery of dynamic flow remains apparent, as convincingly demonstrated in the supplementary video.This highlights the effectiveness and practicality of the proposed method for experimental data beyond simulated measurements, even when only one tomographic measurement per imaging frame is available.
Furthermore, it is worth noting that the spatiotemporal reconstruction offers a frame rate of 0.1 seconds.In contrast, an FBFIR technique would provide a frame rate of 36 seconds, which is the time required for a complete tomographic measurement.This underscores the reconstruction method's potential for monitoring dynamic physiological processes that demand enhanced frame rate.This can enhance the value of 3D PACT systems that involve rotating measurement gantries for preclinical research, enabling spatiotemporal image reconstruction from limited tomographic measurements per imaging frame.

Conclusion and Discussion
This study presents an accurate and computationally efficient LRME-STIR method for dynamic PACT attuned for commercially available volumetric imagers that employ a rotating measurement gantry in which the tomographic data are sequentially acquired.The implementation of the method was verified by an inverse-crime numerical validation study.The effect of varying number of tomographic measurements per imaging frame and the noise level on the method's accuracy was investigated in an in-silico numerical study.The experimental study demonstrated the LRME-STIR method's ability to reconstruct the flow of a contrast agent at a frame rate of 0.1 s, even when only a single tomographic measurement per imaging frame was available.Numerical and experimental studies confirm the accuracy of the proposed technique.Thus, this work will potentially have an immediate and sustained positive impact by creating a new capacity to perform dynamic 3D PACT.
The LRME-STIR method proposed in this study employs an accelerated proximal gradient descent method combined with an ordered subsets approach, distinguishing it from previously proposed LRME-STIR methods. 26,32,33,37 Ths approach yields a rapid and computationally efficient algorithm with reduced memory demands.Nonetheless, it should be noted that the proposed approach lacks theoretical convergence guarantees.While in theory the method may exhibit instabilities, numerical and experimental studies showcase the robustness of the proposed method when the regularization strength and step size are properly chosen.
Future enhancements to the method may involve exploring different heuristics to enhance its stability. 44,45 ubsequent investigations may focus on evaluating the method's efficacy in imaging complex dynamic physiological processes and quantifying relevant parameters, such as wash-in and wash-out rates for tumor vascular perfusion. 34,57 hese evaluations might encompass both in-vivo and in-silico experiments, further advancing the understanding and application of the proposed method.

Disclosures
The author S. Ermilov discloses ownership interest in Photosound Technologies, Inc.Other authors have no relevant financial interests and no potential conflicts of interest to disclose.

Fig 1
Fig 1 Schematic illustrating the sequential scanning strategy employing a rotating acoustic probe around the object.

Fig 2
Fig 2 Illustration of the TriTom imaging system (left) and an image of the imaging chamber (right).

Fig 3
Fig 3 Top view of the virtual imaging systems collecting 1, 2, or 4 tomographic views per imaging frame.

Fig 4
Fig 4 Central z-slice of the simple rank-4 phantom at the 120-th object frame (left), time activity curves at the numbered regions in the z-slice (right).Region 1 denotes a static background.

Fig 5 (
Fig 5 (a) Singular value plot of the dynamic numerical phantom, (b) MSE of low-rank approximations, (c) time activity curves at the center of each ellipsoidal blob, (d) and maximum intensity projection (MIP) images of the 180-th object frame of the dynamic phantom (bottom).The ellipsoidal blobs are numbered in the MIP along the z-axis image.The red dot in panels (a) and (b) denotes the index R max = 40 used as maximum rank constraint in the reconstruction studies.

Fig 6
Fig 6 An illustration (left) and a picture (right) of the experimental dynamic phantom.

Fig 7
Fig 7The data fidelity vs. the iterations (left), and the average nSE vs. iterations (right).The plots confirm the correct implementation of the algorithm and illustrate how increasing the number of subsets can accelerate the reduction of the data fidelity term.Additionally, the plots illustrate the similar order-of-magnitude reduction in both data fidelity and average nSE values for every number of subsets M ∈ {1, 2, 6}.

Fig 8
Fig 8 Selected MIP images (along the z-axis) depicting the numerical phantom and corresponding spatiotemporal estimates from simulated data with varying numbers of tomographic measurements per imaging frame.A close examination reveals improved reconstruction accuracy with an increased number of tomographic measurements.Notably, the reconstructions using 2 and 4 tomographic measurements per imaging frame more accurately capture the intensity change in blob-4 (located at the bottom right) compared to the one using a single tomographic measurement per imaging frame.This observation is further supported by the TACs shown in Figure9.A video featuring the spatiotemporal evolution of the numerical phantom and its reconstructed estimates is available as supplemental material.

Fig 9
Fig 9 TACs at each ellipsoidal blob's center, comparing the numerical phantom with its estimate reconstructed from different numbers of tomographic measurements per imaging frame.The noise level was kept at 1%.The curves demonstrate that increasing the number of measurements enhances the fidelity of temporal activities in the reconstructions.

Fig 10
Fig 10 TACs at each ellipsoidal blob's center, comparing the true phantom with reconstructions from data with different noise levels; number of tomographic measurements per frame was fixed at 2. It is seen that the recovered TACs are close to each other; which shows the robustness of the algorithm against the noise level.

Fig 11 Left:
Fig 11 Left: Normalized Squared Error (nSE) vs. the imaging frame number for reconstructions with varying numbers of tomographic measurements per imaging frame; the noise level was fixed at 1%.A direct correlation between an increase in measurements and reduction in nSE is observable, thus highlighting the improved reconstruction accuracy.Right: nSE vs. the imaging frame number for reconstructions from data with varying noise levels; the number of tomographic measurements per frame was fixed at 2. As anticipated, the nSE exhibits an upward trend with increasing noise levels.

Fig 12
Fig 12 Sample instances of the dynamic recovery from PACT data (at the bottom of each row) and reference fluorescence images (at the top of each row).The spatiotemporal evolution of the contrast agent inferred from the spatiotemporal estimates reconstructed from PACT data is in strong agreement with the reference images collected by the fluorescence-enabled sCMOS camera.A video featuring the raw images, visually enhanced images through the segmentation of the tube and contrast agent, and ParaView visualizations of the spatiotemporal PACT object estimate, is provided in the supplemental materials.