## 1.

## Introduction

The eye is a window through which the vasculature can be directly observed. Recently, the adaptive optics scanning laser ophthalmoscope (AOSLO) has made it possible to directly acquire videos of leukocyte movement through the smallest capillaries in the human eye, without the use of injected dyes.^{1} AOSLO is a research-grade instrument that acquires images at higher resolution and contrast than what is clinically available and is based on adaptive optics technologies, which correct for optical aberrations of the eye.^{2} The AOSLO design has been described in detail elsewhere^{3, 4} and has been used to study phenomena such as the relationship between photoreceptor function and visual receptive fields^{5} and the interpretation of static vascular features.^{6} However, there are key issues that must be addressed to make AOSLO an effective system for the study of hemodynamics.

Quantification of object speeds in AOSLO videos is important for hemodynamics, but is confounded by raster scanning, eye motion, and object motion. Since videos are acquired using a raster scanning system, different pixels within a video frame are acquired at different points in time. This affects the appearance of moving objects: the apparent object speed depends on the speed of the raster scan (Fig. 1). The magnitude of the error in measured speed due to raster scanning depends on the configuration of AOSLO imaging parameters, but can be as large as 37.8% (Sec. 2.6). There is also constant eye motion that occurs during acquisition of video frames.^{7, 8} Since the raster scan continuously scans in a fixed pattern, this results in unique distortions in each video frame.^{9, 10} Finally, the object itself is also in constant motion, simultaneously with raster scanning and eye motion. The motions of the object, eye, and raster scan must be considered simultaneously for accurate quantification of object speeds in an AOSLO system.

In this paper, we describe methods for tracking and accurate speed quantification of moving objects in AOSLO videos. We use spatiotemporal (ST) plot analysis and motion contrast enhancement to track moving objects and measure apparent object speeds. Apparent object speeds are then corrected using a slope modification method that accounts for errors introduced by eye motion and raster scanning. The accuracy of the proposed methods is validated using synthetic data sets generated by a virtual AOSLO.

## 2.

## Materials and Methods

## 2.1.

### AOSLO Imaging

Videos were acquired on an AOSLO as previously described.^{1} AOSLO videos can be acquired using different imaging configurations, depending on the application. We consider three different configurations of imaging parameters (Type 1, 2, and 3), as described in Table 1. Type 1 videos were acquired for the purpose of analyzing blood flow, Type 2 for analyzing photoreceptors, and Type 3 for analyzing both blood flow and photoreceptors. Type 1 videos were acquired using a green wavelength laser, which has been reported to be the optimal wavelength for obtaining good contrast in blood flow imaging.^{11} Type 2 and 3 videos were acquired using a near-IR laser — a more desirable wavelength in terms of (i) risks due to laser damage on the retina, (ii) compatibility with the AOSLO hardware, and (iii) overall subject comfort. Type 1 and 3 videos were acquired for longer durations in order to increase the number of leukocytes that could be counted. Examples of frames from Type 1, 2, and 3 videos are shown in Fig. 2.

## Table 1

Imaging parameters and subject data for various types of AOSLO videos. Three representative configurations of imaging parameters used for AOSLO imaging, for three different subjects. Representative values are given for the scale factors, which vary by small amounts across different imaging sessions due to small variations in hardware alignment and eye morphology.

Specification | Type 1 | Type 2 | Type 3 |
---|---|---|---|
Imaging wavelength [nm] | 532 | 840 | 840 |
Frame rate [Hz] | 30 | 30 | 60 |
Raw video frame size [pixels^{2}] | 525×512 | 512×512 | 512×525 |
Field of view (approx.) [deg^{2}] | 1.5×1.5 | 1.2×1.2 | 1.5×1.5 |
X scale factor [pixels/deg] | 342 | 414 | 328 |
Y scale factor [pixels/deg] | 342 | 409 | 330 |
Length of video [seconds] | 40 | 2–10 | 40 |
Retinal scale factor [mm/deg] | 0.28008 | 0.28697 | 0.28889 |
Subject age | 37 | 26 | 24 |
Refractive error, sphere [D] | +1.0 | −1.0 | +0.5 |
Refractive error, cylinder [D] | −0.25 | 0.0 | 0.0 |

Since the Type 1 videos had the highest spatial contrast for the moving blood cells, we developed our methods using only Type 1 videos, and used Type 2 and 3 videos as well as synthetic videos generated by a virtual AOSLO for verification.

## 2.2.

### Preprocessing

Raw videos were preprocessed to correct for distortions due to raster scanning and eye motion, without considering object motion. Preprocessing involves desinusoiding, stabilization, cropping, and frame deletion.

## 2.2.1.

#### Desinusoiding and stabilization

To achieve high line density and high frame rates, the AOSLO employs a resonant scanner combined with a sensor that reads in data at a constant rate. The velocity of the scanner varies sinusoidally across each scan line, which results in a horizontal distortion in the raw videos. Desinusoiding corrects this distortion, which is characterized from videos of calibration grids. The velocity of the scanner is slowest at the left and right edges of the frame and fastest in the middle; thus, there are more pixels per retinal area toward the edges compared to the center. The redistribution of pixels can result in a desinusoiding artifact due to a change in the distribution of noise. We minimized this artifact using median and Gaussian filtering (Sec. 2.3).

Stabilization is the process that corrects for the distortions due to eye motion that occur during acquisition of each raster-scanned frame. Detailed procedures for desinusoiding and stabilization can be found elsewhere.^{9, 10} Briefly, the task involves splitting each frame in a video into a set of horizontal strips, each of which is registered using affine transformations to a desinusoided reference frame and reassembled using linear interpolation. The result is a desinusoided and stabilized video as well as a high-frequency eye motion trace. This trace is important for accurate calculation of the distance that a moving object has traveled (Sec. 2.5).

## 2.2.2.

#### Cropping the video

Due to eye motion, there are regions of the retina that are not present in all video frames, particularly at the edges of each frame. To account for this, the desinusoiding and stabilization process introduces borders around each frame so that each registered frame will be of the same size. The thickness of each border changes according to the eye motion. We crop the video such that each frame contains only the portion of the video that was visible in the majority of all frames, thereby eliminating the black borders. The number of lines removed from the top edge during cropping was stored in a lookup table and used in the calculation of object speeds (Sec. 2.5).

## 2.2.3.

#### Frame deletion

In the processed videos, there were three types of improper frames that were identified for deletion. First, insufficient overlap between the image and the reference frame resulted in poor stabilization. This occurred when the eye wandered too far away from its fixation target. Second, blinks resulted in the image intensity dropping to zero throughout the blink. Third, large saccades, involuntary fast eye movements, caused intraframe shearing and distortion on single frames and prevented proper image stabilization (1). To generate high quality images of photoreceptors and vessels, these “improper” frames were deleted. However, we did not delete any frames for the speed analysis since deletion of frames would increase the apparent speed of a moving object.

## 2.3.

### Visualization of Moving Objects and Vessels

Since spatial contrast is low, motion contrast enhancement is used to visualize moving objects and vessels (Fig. 3). Methods for motion contrast enhancement have been previously described.^{6, 12, 13, 14} We implement a method that works well with AOSLO videos,^{6} using a multiframe division video and a standard deviation image. The multiframe division videos were used to visualize moving objects and the standard deviation image was used to visualize vessels. Median and Gaussian filtering were applied before and after calculation of the multiframe division video, respectively.

A preprocessed video has moving blood cells in front of a stationary background tissue, consisting of photoreceptors and vessels. Given two frames, $I_j(x,y)$ and $I_{j+1}(x,y)$, the division image $D_j(x,y) = \frac{I_j(x,y)}{I_{j+1}(x,y)}$ emphasizes the objects in motion as long as the intensity of background tissue remains relatively constant. Here, $I_j(x,y)$ represents the intensities of frame $j$ at position $(x,y)$. Division images are used instead of difference images to enable arithmetic averaging of multiple frames, which improves the signal to noise ratio, as opposed to using the arithmetic average of two consecutive difference images, which yields no improvement in signal to noise.^{15} We defined a multiframe division video as $M_j(x,y) = \frac{D_j(x,y) + D_{j+1}(x,y)}{2}$.
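The two definitions above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the function names, the nested-list frame layout, and the small `eps` guard against division by zero are assumptions, and the median and Gaussian filtering steps are omitted.

```python
def division_image(frame_a, frame_b, eps=1e-6):
    """Pixelwise ratio D_j = I_j / I_{j+1}.

    eps is an assumed guard against zero-valued pixels in the denominator.
    """
    return [[a / max(b, eps) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]


def multiframe_division(frames):
    """M_j = (D_j + D_{j+1}) / 2 for every j with two following frames."""
    divisions = [division_image(frames[j], frames[j + 1])
                 for j in range(len(frames) - 1)]
    return [[[(d0 + d1) / 2.0 for d0, d1 in zip(row0, row1)]
             for row0, row1 in zip(divisions[j], divisions[j + 1])]
            for j in range(len(divisions) - 1)]


# Toy 1x3 video: a dark object moves one pixel to the right per frame
# over a constant background of intensity 100.
frames = [
    [[50.0, 100.0, 100.0]],
    [[100.0, 50.0, 100.0]],
    [[100.0, 100.0, 50.0]],
]
M = multiframe_division(frames)
print(M[0])  # values differ from 1.0 only where the object was or is present
```

In this toy case the stationary background would give a ratio of exactly 1, so any departure from 1 marks a pixel touched by the moving object in one of the three frames.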

To visualize the perfused vessels, an image was calculated from the multiframe division video, using the geometric standard deviation. For a video with *n* frames, the geometric standard deviation image, *S(x,y)*, is defined as

$$S(x,y) = \exp\left[\sqrt{\frac{\sum_{j=1}^{n}\left[\ln M_j(x,y) - \ln \overline{M}(x,y)\right]^2}{n-1}}\,\right]. \tag{1}$$

## 2.4.

### Object Tracking

We used cell paths on ST plots for object tracking.^{15} ST plots are a method to visualize hemodynamics and offer two major advantages. First, the ST plot representation is more compact, which assists in pattern identification. Variables such as density, frequency, and variations in speed both spatially and temporally can easily be observed on ST plots, but not from direct examination of the original video. Second, the dimensional complexity of the problem is reduced from a 4-dimensional (3D+1T) problem to a 3-dimensional (2D+T) problem, which minimizes the computation cost. These plots have been used for many systems,^{12, 16, 17, 18} including AOSLO systems.^{15, 19}

ST plots were generated by converting an *X-Y-T* coordinate system into an *s-T* coordinate system (Fig. 4). Consider an arbitrary vessel in a sequence of frames, with an object that moves along the trajectory of the vessel, given by $f(x,y)$. By plotting intensity values along the vessel, and discarding all other pixel values, a two-dimensional plot can be generated that shows the movement of individual objects traveling through a one-dimensional line, given by $f(s)$, where $f(x,y) \leftrightarrow f(s)$ is naturally defined, with the first coordinate of $f(x,y)$ mapping to the first element of $f(s)$, the second coordinate to the second element, and likewise for the remaining elements. Since we are exactly specifying the mapping from each pixel in *X-Y-T* space to *s-T* space, the mapping is invertible.

For single-file flow through capillaries seen by AOSLO videos, there is no loss in speed information when switching from the *X-Y-T* representation to the *s-T* representation.^{15}
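A minimal sketch of the *X-Y-T* to *s-T* conversion and its inverse is given below. It assumes the vessel centerline is already available as an ordered list of pixel coordinates; the names `build_st_plot` and `st_to_video` and the nested-list video layout are hypothetical and chosen only for illustration.

```python
def build_st_plot(video, centerline):
    """Sample every frame along the vessel centerline.

    video:      list of frames, indexed as video[t][y][x].
    centerline: ordered list of (x, y) pixels along the vessel; the list
                index plays the role of the coordinate s.
    Returns st[t][s], the intensity at position s in frame t.
    """
    return [[frame[y][x] for (x, y) in centerline] for frame in video]


def st_to_video(s, t, centerline):
    """Invert the mapping (s, t) -> (x, y, t) using the stored centerline."""
    x, y = centerline[s]
    return (x, y, t)


# Toy example: a bright object moves along a diagonal vessel in a 3x3 video.
centerline = [(0, 0), (1, 1), (2, 2)]
video = []
for t in range(3):
    frame = [[0, 0, 0] for _ in range(3)]
    x, y = centerline[t]
    frame[y][x] = 1  # object occupies centerline position s = t in frame t
    video.append(frame)

st = build_st_plot(video, centerline)
print(st)                             # the object appears as a diagonal trace
print(st_to_video(2, 1, centerline))  # recover (x, y, t) for a point on the plot
```

Because the centerline list is kept, every point selected on the ST plot can be traced back to a unique video pixel, which is what makes tracking from extracted traces possible.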

Motion contrast enhancement improves the ST plots by increasing the accuracy of vessel centerline extraction and by increasing the contrast of cell paths.^{15} Using motion contrast enhanced ST plots, we manually extracted cell traces for tracking and speed quantification. To identify traces, the user was presented with a graphical user interface showing a portion of an ST plot. The user identified traces by selecting points along that trace. For consistency, points were selected at the border between the dark and bright regions of the trace, on the leading edge (Fig. 5). After points were selected, interpolation was performed using piecewise splines constrained to the pixel resolution.

For tracking, the coordinates of each extracted trace were used to register the location of the blood cells in the video. ST coordinates *(s, t)* were converted back to video coordinates *(x, y, t)* using the invertible mapping defined during generation of ST plots. Video coordinates were compiled into a list and then used to mark object locations to visualize the tracking results.

## 2.5.

### Quantification of Object Speeds

In the absence of eye motion, the speed of an object in a raster scanning system can be explicitly computed using line information from pairs of frames.^{20} The correction is based on computing the actual time at which a given line was acquired, as opposed to assuming that the entire frame was acquired at the same time. The true time, *t*, can be computed as

$$t = T_f\left(1 + \frac{l_2 - l_1}{N_1}\right), \tag{3}$$

where $T_f$ is the time per field, $l_1$ and $l_2$ are the scan lines of the object center in the first and second frames, and $N_1$ is the number of scan lines per frame. However, this approach [Eq. 3] must be modified for AOSLO videos since the effect of the raster scan is confounded with eye motion and desinusoiding.
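As a small worked example of Eq. 3, the sketch below evaluates the true elapsed time for an object that appears 128 lines lower in the second frame. The function name and the numerical values (a field time of 1/30 s and 512 lines per frame) are illustrative assumptions, not values prescribed for this correction.

```python
def true_elapsed_time(time_per_field, line_first, line_second, lines_per_frame):
    """Eq. 3: t = T_f * (1 + (l2 - l1) / N_1)."""
    return time_per_field * (1.0 + (line_second - line_first) / lines_per_frame)


T_f = 1.0 / 30.0  # assumed time per field, in seconds
t = true_elapsed_time(T_f, line_first=100, line_second=228, lines_per_frame=512)
print(t)  # 1.25 field times, because the object moved 128 of 512 lines down
```

When the object stays on the same scan line, the expression reduces to exactly one field time, which is the value implicitly assumed when raster scanning is ignored.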

In a nonraster scanning system without eye motion, the speed of an object can be computed by simply computing the slope of the extracted trace from an ST plot. However, in our system, the slope of the trace gives speeds in time units of frames. We present a slope modification procedure to correct speeds, based on computing the acquisition time coordinates in the extracted traces. In order to perform the correction, line numbers on preprocessed videos need to be transformed back to line numbers on the raw videos (not preprocessed). This is important because the correction for intraframe eye motion results in local stretching or compression of pixels, thereby altering line numbers.

The AOSLO uses a fast horizontal scan and a slower vertical scan, from left to right and top to bottom directions, respectively. The main component of the error is due to the slower vertical scan; the error due to the horizontal scan is small and does not need to be corrected (Secs. 2.6, 3.1). As an example, in the absence of eye motion, a downward moving object will have a larger observed displacement compared to the actual displacement, since the scan is chasing a moving target. More generally, any object that is moving in a nonhorizontal trajectory has a vertical component of velocity that needs to be corrected. If there is eye motion, then the actual displacement is also dependent on the amount that the eye has moved.

Consider coordinates from the extracted traces, given as (frame number, *s*). The acquisition time for each line (in units of partial frames) can be computed as

$$\text{acquisition time} = \text{frame number} + \frac{L}{512}. \tag{4}$$

$L$ is the line number at which the data was taken on the raw video, and can be recovered in the following manner:

1 Recover the line number, $L_{crop}$, of the object on the cropped video by determining the $y$ coordinate from the inverse transformation $s \to (x,y)$.

2 Correct the line number for cropping by adding back the number of lines at the top of the image that were removed during cropping, $dL_{crop}$, stored in cropping lines (Sec. 2.2.2).

3 Correct for eye motion by applying the inverse transformation, $S^{-1}$, from raw to stabilized videos. $S^{-1}$ was stored during preprocessing in the eye motion trace (Sec. 2.2.1).

Compute $L = S^{-1}(L_{crop} + dL_{crop})$. The extracted traces are then plotted as (acquisition time, $s$).
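The three recovery steps and Eq. 4 can be sketched as follows. The inverse stabilization transform is passed in as a callable because its actual form depends on the stored eye motion trace; the constant six-line shift used here is a purely hypothetical stand-in, as are the function names and example numbers.

```python
LINES_PER_FRAME = 512  # denominator of Eq. 4


def raw_line_number(line_on_cropped, lines_cropped_from_top, inverse_stabilization):
    """L = S^{-1}(L_crop + dL_crop): undo cropping, then undo stabilization."""
    return inverse_stabilization(line_on_cropped + lines_cropped_from_top)


def acquisition_time(frame_number, raw_line):
    """Eq. 4: acquisition time in units of (partial) frames."""
    return frame_number + raw_line / LINES_PER_FRAME


def example_inverse_stabilization(line):
    """Hypothetical S^{-1}: the eye motion trace shifted this strip by 6 lines."""
    return line - 6


# Object seen at line 240 of the cropped frame 3; 22 lines were cropped from the top.
L = raw_line_number(240, 22, example_inverse_stabilization)
print(L, acquisition_time(3, L))
```

In practice the inverse transform would differ from frame to frame and from strip to strip, so a real implementation would look it up from the eye motion trace for each trace point rather than use a single function.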

For each corrected trace, a linear regression was applied and the slope of the line, with units of pixels/frame, was used to compute the speed of the leukocyte (in units of mm/s) through the selected vessel segment in the following manner:

$$\text{slope} \times \left(\frac{2 \times \text{frame rate} \times \frac{\text{mm}}{\text{deg}}}{\text{X scale factor} + \text{Y scale factor}}\right) = \text{speed in } \frac{\text{mm}}{\text{s}}. \tag{5}$$

^{6, 21}
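The regression and unit conversion can be illustrated with a short sketch. It uses an ordinary least-squares slope and the Type 1 values from Table 1 (30 Hz, 342 pixels/deg in X and Y, 0.28008 mm/deg); the function names and the synthetic trace are assumptions made for the example.

```python
def fit_slope(times, positions):
    """Ordinary least-squares slope of position s versus acquisition time."""
    n = len(times)
    mean_t = sum(times) / n
    mean_s = sum(positions) / n
    numerator = sum((t - mean_t) * (s - mean_s) for t, s in zip(times, positions))
    denominator = sum((t - mean_t) ** 2 for t in times)
    return numerator / denominator


def slope_to_speed(slope_px_per_frame, frame_rate_hz, mm_per_deg, x_scale, y_scale):
    """Eq. 5: convert pixels/frame to mm/s using the mean of the X and Y scale factors."""
    return slope_px_per_frame * (2.0 * frame_rate_hz * mm_per_deg) / (x_scale + y_scale)


# Synthetic corrected trace: the object advances 40 pixels per frame along the vessel.
times = [0.10, 1.18, 2.26, 3.34]          # acquisition times, in frames
positions = [10.0 + 40.0 * t for t in times]

slope = fit_slope(times, positions)
speed = slope_to_speed(slope, frame_rate_hz=30, mm_per_deg=0.28008,
                       x_scale=342, y_scale=342)
print(slope, speed)
```

Averaging the X and Y scale factors, as Eq. 5 does, treats the pixel grid as isotropic, which is a reasonable simplification when the two factors differ only slightly.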

## 2.6.

### Expected Error due to Raster Scanning (RS)

The raster scan error is significant for AOSLO videos. In this section, we develop a theoretical model to quantify the magnitude of the raster scan error. To understand the nature of the expected raster scan error, consider the case of a vertically oriented vessel with a downward-moving object that starts at the top of the image (Fig. 1). Assuming that there is no eye motion, we derive the expected raster scan error in the vertical and horizontal cases for comparison to actual measured error rates and show that it is significant in the vertical direction, but not the horizontal direction.

We introduce the dimensionless number,

$$RS = \frac{v_o}{v_s}, \tag{6}$$

where $v_o$ is the speed of the object and $v_s$ is the speed of the scanning line.

If $RS > 1$, then the system is unable to image the object and the error becomes infinite. When $RS = 0.5$, the error is exactly 100%. When $v_o \ll v_s$, $RS \to 0$, and the raster scan error is negligible. By convention, leukocyte speeds on the retina are reported in mm/s. For an object speed given in mm/s, $\overline{v_o}$, with scan speed, $v_s$, given in pixels/frame, and with scale factor as defined in Table 1, $RS$ can be calculated as

$$RS = \frac{\text{scale factor in scan direction}}{\text{frame rate} \times \text{mm/deg on the retina}} \times \frac{\overline{v_o}}{v_s}. \tag{7}$$

For a downward-moving object, the measured speed, $v_e$, will be overestimated. The percent error due to raster scanning is given by

$$\text{percent error} = \frac{v_e - v_o}{v_o}. \tag{8}$$

For given $v_o$ and $v_s$, the relationship between object speed and measured speed is

$$v_e = \frac{v_o}{1 - RS}. \tag{9}$$

Substituting Eq. 9 into Eq. 8 gives

$$\text{percent error} = 100\% \times \frac{RS}{1 - RS}. \tag{10}$$

Thus, the raster scan error increases when one or more of the following occur: $v_s$ decreases, $\overline{v_o}$ increases, or the field of view decreases (the field of view inversely varies with the scale factor in the scan direction).
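Equations 7 and 10 can be checked numerically. The sketch below uses the virtual AOSLO parameters from Table 2 (341.33 pixels/deg, 30 Hz, 0.296 mm/deg) and assumes a vertical scan speed of 512 pixels/frame, consistent with the 512 lines used in Eq. 4; the function names are hypothetical.

```python
def raster_scan_number(speed_mm_per_s, scale_px_per_deg, frame_rate_hz,
                       retina_mm_per_deg, scan_speed_px_per_frame):
    """Eq. 7: dimensionless ratio of object speed to scan speed."""
    object_speed_px_per_frame = (speed_mm_per_s * scale_px_per_deg
                                 / (frame_rate_hz * retina_mm_per_deg))
    return object_speed_px_per_frame / scan_speed_px_per_frame


def percent_error(rs):
    """Eq. 10: expected overestimate for an object moving with the scan."""
    if rs >= 1.0:
        raise ValueError("RS must be below 1 for the object to be imaged")
    return 100.0 * rs / (1.0 - rs)


for speed in (1.0, 3.0):  # object speeds in mm/s
    rs = raster_scan_number(speed, 341.33, 30.0, 0.296, 512.0)
    print(speed, round(rs, 4), round(percent_error(rs), 1))
```

For these parameters the expected errors come out at roughly 8.1% for 1 mm/s and 29.1% for 3 mm/s, and `percent_error(0.5)` returns 100, matching the limiting case stated above.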

A similar analysis can be used to estimate the expected error in the horizontal scan direction. Since the scan speed is defined as the number of rows required to reach the edge of the frame, for the horizontal direction, $RS_{\text{horizontal}} = RS_{\text{vertical}}/512$. The percent error in the horizontal direction is $100\% \times \frac{RS_{\text{vertical}}}{512 - RS_{\text{vertical}}}$. Since $RS_{\text{vertical}} < 1$ for objects of interest, the percent error in the horizontal direction will always be less than 0.2%. When $RS_{\text{vertical}} = 0.5$, the percent error in the horizontal direction drops to 0.098%. Therefore, we do not need to apply the raster scan correction to the horizontal component of calculated speeds.
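The horizontal bound follows directly from the expression above and can be evaluated in a few lines. The function name is an assumption; the formula is the one stated in the text.

```python
def horizontal_percent_error(rs_vertical, lines_per_frame=512):
    """Percent error for horizontal motion: 100 * RS_v / (512 - RS_v)."""
    return 100.0 * rs_vertical / (lines_per_frame - rs_vertical)


print(round(horizontal_percent_error(0.5), 3))  # error at RS_vertical = 0.5
print(round(horizontal_percent_error(1.0), 3))  # limiting value as RS_vertical approaches 1
```

Even at the limit where the vertical error diverges, the horizontal error stays just under 0.2%, which is why only the vertical component is corrected.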

## 2.7.

### Virtual AOSLO

The AOSLO is a custom-built, unique instrument with ∼5× the resolution of a commercial scanning laser ophthalmoscope. Typically, ground truth for new systems is generated using manual analysis performed by subject experts. However, due to the low contrast of the moving objects, it is difficult and unreliable to analyze videos by naked eye. Therefore, we used a virtual AOSLO to simulate realistic videos to create a synthetic dataset for use as ground truth in order to validate our methods.

## 2.7.1.

#### Parameters

A virtual AOSLO has previously been used to characterize scanning distortions due to raster scanning and eye motion for static images.^{22} We modified this virtual AOSLO to simulate the acquisition of a video in the presence of an object that is moving at the same time as the scanner. For the virtual AOSLO, we selected scanning parameters for a Type 1 video, which is the data set on which the proposed methods were developed. Since we were able to exactly specify the speed and position of the moving object, we considered the simulated videos to be ground truth. Due to the complexity of the AOSLO, the following assumptions were used for the virtual AOSLO:

1 The imaging laser is a perfect, dimensionless dot that samples with true fidelity one pixel of the input image at one time.

2 The retina is rigid and planar across the field of view.

3 Eye motion is strictly translational with no torsional component.

4 Imaging parameters are constant.

The first assumption bypasses sampling and resolution issues introduced by the optics of a human eye. This means that the quality and appearance of the simulated video primarily depends on the input image. Second, for the field of view of the simulation (1.5 deg in each direction), it is reasonable to assume that this region is both rigid and planar. Third, as previously described, the primary components of eye motion are translational.^{7} While we have observed torsional motions, they are typically small (unpublished experimental observations). The final assumption is that imaging parameters are constant — in actual practice, due to additional complexities such as calibration and temperature-dependent drift of electronic components, different imaging sessions have minor variations in imaging parameters (these variations are addressed using calibration steps prior to each imaging session).

An overview of the virtual AOSLO is shown in Fig. 7, and a summary of the parameters selected for the simulation is shown in Table 2. The input image was generated using individual frames from overlapping AOSLO videos near the fovea, which were scaled to the appropriate size (circle diameter = 3.75 deg). The spatial resolution of the input image was selected to be twice that of the output video and sampled by the virtual AOSLO using nearest neighbor interpolation. Thus, the simulated videos are similar to actual AOSLO videos, which allows us to apply the same correction steps that we would have applied to actual AOSLO videos.

## Table 2

Summary of parameters for the virtual AOSLO.

Parameter | Value |
---|---|
Horizontal raster frequency | 15.36 kHz |
Vertical raster frequency | 30 Hz |
Video frame size | 512×512 pixels |
Video acquisition rate | 30 fps |
Sampling band | Central 80% of forward sweep |
Retinal scale factor | 0.296 mm/deg |
X and Y scale factors | 341.33 pixels/deg |
Vessel diameter | 5 μm |
Leukocyte length | 15 μm in the direction of travel |
Leukocyte speeds | 1.00 to 3.00 mm/s |

Simulated videos were generated pixel by pixel. The basic steps were to 1. calculate the pixel timing given the raster parameters, 2. convert timings to spatial coordinates on the input image, and 3. sample the input image at the specified spatial coordinates. The time that each pixel is sampled can be directly computed using the raster scan parameters. To convert times to spatial coordinates, two calculations were done. First, the spatial coordinates were calculated assuming a static image. Second, the spatial coordinates were horizontally and vertically translated as specified by the *X* and *Y* components of eye motion, corresponding to the timing at each pixel location. Finally, sampling was performed after insertion of a moving object into the input image. For the moving object, we specified the trajectory and speed and used a high contrast, elongated object oriented along the direction of travel. The time-dependent input image was then sampled at the corresponding spatial coordinate for each pixel to generate a simulated AOSLO video.
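The interaction between pixel timing and a moving object can be illustrated with a one-dimensional toy model. This is not the virtual AOSLO itself: it assumes a point object moving straight down at constant speed, no eye motion, and a horizontal scan fast enough to be treated as instantaneous, and all names are hypothetical. It records the line at which the vertical scan meets the object in each frame.

```python
def observed_lines(start_line, speed, lines_per_frame, n_frames):
    """Lines at which a downward-moving point object is sampled in each frame.

    speed is in lines per frame and must be below lines_per_frame (RS < 1).
    Time t is measured in frames; within frame k the scan is at line
    (t - k) * lines_per_frame, and the object is at start_line + speed * t.
    """
    if not 0 <= speed < lines_per_frame:
        raise ValueError("speed must satisfy 0 <= speed < lines_per_frame")
    hits = []
    for k in range(n_frames):
        # Solve (t - k) * N = start_line + speed * t for the sampling time t.
        t_hit = (start_line + k * lines_per_frame) / (lines_per_frame - speed)
        if t_hit - k >= 1.0:
            break  # the object has left the frame before the scan reaches it
        hits.append(start_line + speed * t_hit)
    return hits


N = 512        # lines per frame
v_o = 51.2     # true object speed in lines/frame, so RS = v_o / N = 0.1
hits = observed_lines(0.0, v_o, N, 5)
apparent_speed = hits[1] - hits[0]
print(apparent_speed, v_o / (1.0 - v_o / N))
```

The apparent frame-to-frame displacement exceeds the true speed by the factor 1/(1 − RS), which is the relationship given in Eq. 9.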

## 2.7.2.

#### Experiments

The virtual AOSLO was used to generate a synthetic data set for validation. The synthetic data set consisted of simulated videos with different configurations, varying in object speed, vessel geometry and orientation, eye motion, and noise (Table 3). A single moving object was used for each of the videos.

## Table 3

Evaluation of speed quantification.

Video code | Vessel orientation | Object speed | Eye motion |
---|---|---|---|
H3 | Horizontal | 3 mm/s | No |
V1 | Vertical | 1 mm/s | No |
V3 | Vertical | 3 mm/s | No |
V1_EM | Vertical | 1 mm/s | Yes |
A2 | Arbitrary | 2 mm/s | No |
A2_EM | Arbitrary | 2 mm/s | Yes |

The goals of these experiments were to 1. verify that the error in measured speed was negligible in the horizontal direction, 2. verify that the theoretical errors in measured speeds were consistent in the vertical direction, and 3. examine the expected effects on calculated speeds due to experimental conditions. H3 was used to quantify the error due to raster scanning in the horizontal case, for an object moving at 3 mm/s – the faster the object moves, the greater the error expected. Most objects traveled at speeds between 1 and 3 mm/s. V1 and V3 were used to quantify the error due to raster scanning alone for an object traveling in the vertical direction, as modeled in Sec. 2.6. For the experimental conditions, the two factors that contribute most to changes in measured speeds were considered: vessel trajectory and eye motion. For the vessel trajectory videos (A2, A2_EM), we used the vessel centerline extracted from the Type 1 video as the vessel input. For the eye motion, we used the extracted eye motion trace from the first 1.3 s of the Type 1 video. We considered these two factors both separately (V1_EM, A2), and simultaneously (A2_EM).

## 3.

## Results

The proposed methods performed well on both the synthetic data set generated using the virtual AOSLO and on experimental videos acquired on the AOSLO.

## 3.1.

### Evaluation of Accuracy and Validity Using a Virtual AOSLO

We applied the proposed methods for tracking and speed quantification (Fig. 8). To measure speeds in the simulated videos, we repeated the analysis five times and took the average of computed speeds, in order to reduce the errors due to operator bias and differences in data precision. The data precision varied since there were three times as many data points that could be extracted to measure speeds at 1 mm/s versus 3 mm/s. At 3 mm/s, due to large pixels/frame displacements, the traces on the ST plots were disconnected.

To validate that objects were being correctly tracked, we generated a tracked video and used frame-by-frame examination. As expected, for all videos, the extracted traces corresponded to the moving objects. However, the labeled lines would sometimes lead or lag the moving objects by small amounts. Since the amount of lag/lead was preserved for each moving object, the slope of the traces was accurate. The error was due to the estimation of frame number from the coordinates of the extracted traces, due to the low temporal resolution relative to the speed of the leukocytes. Taking this into consideration, there were no false positives and no false negatives.

We compared the corrected speeds to the actual speeds (Table 4). We define the residual error as the percent difference between corrected and actual speeds and found that the residual error was on average 2% for moving objects traveling between 1 and 3 mm/s. The sources of error are most likely due to vessel and trace extraction, which are dependent on user interaction. For experimental data, these sources of error are likely to increase due to 1. lack of prior information about vessel trajectories and 2. variations in trace slopes. For the synthetic data sets, extraction is more accurate due to prior knowledge about the shape of the vessel (since it was specified), and due to the fact that object speeds are uniform (so that there is no variation in trace slopes).

## Table 4

Evaluation of speed quantification. Speeds are reported before (no RS) and after (RS) the proposed correction. Actual speeds are the object speeds corresponding to each video, as listed in Table 3.

Video code | H3 | V1 | V3 | V1_EM | A2 | A2_EM |
---|---|---|---|---|---|---|
No RS [mm/s] | 3.0082 | 1.1428 | 3.9370 | 1.1242 | 1.8956 | 2.0067 |
RS [mm/s] | 3.0110 | 1.0552 | 3.0608 | 1.0361 | 1.9766 | 2.0283 |
No RS versus RS% | −0.09% | 8.31% | 28.63% | 8.50% | −4.10% | −1.07% |
RS versus Actual% | 0.37% | 5.52% | 2.03% | 3.61% | −1.17% | 1.42% |
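The percentage rows of Table 4 can be reproduced from the tabulated speeds and the actual speeds in Table 3. The sketch below is only a consistency check on the reported values; the dictionary layout and function name are assumptions, and small differences in the last digit can arise because the tabulated speeds are rounded.

```python
actual = {"H3": 3.0, "V1": 1.0, "V3": 3.0, "V1_EM": 1.0, "A2": 2.0, "A2_EM": 2.0}
no_rs = {"H3": 3.0082, "V1": 1.1428, "V3": 3.9370,
         "V1_EM": 1.1242, "A2": 1.8956, "A2_EM": 2.0067}
rs = {"H3": 3.0110, "V1": 1.0552, "V3": 3.0608,
      "V1_EM": 1.0361, "A2": 1.9766, "A2_EM": 2.0283}


def percent_difference(value, reference):
    """Signed percent difference of value relative to reference."""
    return 100.0 * (value - reference) / reference


# Row "No RS versus RS%": uncorrected speed relative to corrected speed.
uncorrected_vs_corrected = {c: percent_difference(no_rs[c], rs[c]) for c in actual}
# Row "RS versus Actual%": residual error of the corrected speed.
residual = {c: percent_difference(rs[c], actual[c]) for c in actual}
mean_residual = sum(residual.values()) / len(residual)

for code in actual:
    print(code, round(uncorrected_vs_corrected[code], 2), round(residual[code], 2))
print("mean residual:", round(mean_residual, 2))
```

The mean of the signed residuals is close to 2%, in line with the average residual error quoted in the text.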

H3 confirms that the error in measured speed is negligible in the horizontal direction, since the calculated error was −0.09%. This is also in agreement with the theoretical model, which specifies an upper bound of 0.2% for the error. Therefore, it is a reasonable assumption to neglect the error due to horizontal scanning.

V1 and V3 confirm the theoretical errors due to the vertical component of raster scanning. We found errors of 8.31 and 28.63%, which are in agreement with the theoretical errors of 8.1 and 29.1%. Therefore, in the absence of eye motion, the computed errors are in agreement with the expected errors.

Eye motion can either increase or decrease the magnitude of the error. If eye motion is random and isotropic, then over time the average speed should not be affected by eye motion. However, if the eye favors motion along a preferred direction, then the computed speed is affected: it is maximally increased when the object, raster scan, and eye motion are in the same direction (i.e., all vertical and downward). Initially, the vertical component of the eye motion trace input is in the same direction as the scan; as expected, the error for V1_EM is slightly larger than that for V1.

In practice, vessels are rarely horizontal or vertical, particularly when considering capillaries. First, the magnitude of the error in calculated speed depends on the trajectory of the vessel at the object location, since only the vertical component of speed is corrected. Therefore, deviations from a vertically-oriented vessel should result in diminishing error magnitudes. Second, the start and end points of the vessel ultimately determine whether speeds are over- or underestimated. The vessel in A2 and A2_EM has both up- and downward components, but since the endpoint is higher than the starting point, we should expect speeds to be underestimated when comparing A2 to V2. Since the eye motion results in a slight overestimation (comparing V1_EM to V1 and using the same eye motion input), this explains why the magnitude of the error for A2_EM is smaller than that for A2.

## 3.2.

### Evaluation on Experimental AOSLO Videos

We applied the proposed methods to 40 vessels from ten AOSLO videos; first we report results across all videos and then we show detailed results for one representative vessel for each video Type.

Ten vessels were analyzed from one Type 1 video, ten vessels from three Type 2 videos, and 20 vessels from six Type 3 videos. The average absolute error in measured speed was 2.59% for the Type 1 video, 3.39% for the Type 2 videos, and 2.04% for the Type 3 videos. Here, the absolute error is the absolute value of the percent difference between corrected and noncorrected speeds for one trace, and the average absolute error is the mean of the absolute errors across all extracted traces for each video Type. For comparison, we estimated the error using the RS parameter as defined in Sec. 2.6, taking $\overline{{v}_{o}}$ to be the average object speed. In the absence of eye motion, for a vertically-oriented vessel, the theoretical error was 12.56% for the Type 1 video, 12.84% for the Type 2 videos, and 5.27% for the Type 3 videos. Because the measured errors are smaller than these theoretical values, this suggests that either vessel orientations are horizontally biased or that eye motion is not uniformly distributed across all orientations.

We selected three representative vessels, one each from the Type 1, Type 2, and Type 3 videos, to further characterize the error in measured speeds. For each vessel, traces were extracted from ST plots and used for tracking on the original videos and speed quantification (Fig. 9). Close examination of Fig. 9 shows that the orientation of the vessel has an effect on the slope modification as long as the effect due to eye motion is small (i.e., one can see whether the slope was over- or underestimated for a downward- or upward-oriented vessel, respectively). We will discuss this effect in more detail considering the actual errors in average speeds that were calculated (Table 5). In experimental videos, there are complexities such as arbitrary vessel shapes and orientations, eye motion, and variations in cell speeds both temporally and spatially. There is also noise due to variations in the intensity of the background photoreceptor tissue, likely due to dynamic scattering changes over time^{23} and also coherent artifacts.^{24, 25} These variations generate noise in the multiframe division videos and affect the appearance of the ST plots. Therefore, the actual error in calculated speeds due to raster scanning and eye motion will be different. We compared our corrected speeds to uncorrected speeds, where uncorrected speeds were simply taken as the slope of the manually extracted trace, which assumes that the entire frame was acquired at the same moment in time.
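The difference between the uncorrected and corrected slopes can be illustrated with a minimal sketch. It is not the paper's implementation: it handles only the vertical raster timing, ignores eye motion and retrace time, and uses hypothetical imaging parameters (`FRAME_RATE`, `ROWS_PER_FRAME`) and a hypothetical object speed. The uncorrected fit assigns every point in a frame the frame's start time, while the corrected fit adds the time the scan needs to reach the object's row.

```python
def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def trace_speeds(frames, rows, frame_rate, rows_per_frame):
    """Return (uncorrected, corrected) speed in rows per second.

    frames: frame index of each trace point
    rows:   vertical position (row) of the object in that frame
    """
    # Uncorrected: every row in a frame shares the frame's start time.
    t_frame = [f / frame_rate for f in frames]
    # Corrected: add the time the scan needs to reach the object's row.
    t_row = [f / frame_rate + r / (rows_per_frame * frame_rate)
             for f, r in zip(frames, rows)]
    return fit_slope(t_frame, rows), fit_slope(t_row, rows)


# Hypothetical imaging parameters and object speed (not taken from the paper).
FRAME_RATE = 30.0      # frames per second
ROWS_PER_FRAME = 512   # scan lines per frame
TRUE_SPEED = 1500.0    # rows per second, moving in the scan direction

# Simulate the moments at which the scan line meets a uniformly moving object.
scan_speed = ROWS_PER_FRAME * FRAME_RATE
frames = list(range(8))
sample_times = [f / FRAME_RATE * scan_speed / (scan_speed - TRUE_SPEED)
                for f in frames]
rows = [TRUE_SPEED * t for t in sample_times]

uncorrected, corrected = trace_speeds(frames, rows, FRAME_RATE, ROWS_PER_FRAME)
print(f"uncorrected: {uncorrected:.1f} rows/s, corrected: {corrected:.1f} rows/s")
```

In this idealized case the corrected slope recovers the simulated speed, while the uncorrected slope overestimates it because the object moves in the scan direction. For an object moving against the scan, the same timing correction would raise the estimate instead.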

## Table 5

Summary of cell speeds in selected vessel segments with and without the raster scan correction.

Parameter | Type 1 | Type 1 | Type 2 | Type 2 | Type 3 | Type 3 |
---|---|---|---|---|---|---|
RS | no | yes | no | yes | no | yes |
N | 50 | 50 | 4 | 4 | 12 | 12 |
Mean [mm/s] | 2.04 | 1.99 | 1.89 | 1.73 | 2.18 | 2.12 |
SD [mm/s] | 0.62 | 0.58 | 0.30 | 0.26 | 0.45 | 0.43 |
Min [mm/s] | 0.93 | 0.94 | 1.62 | 1.50 | 1.56 | 1.53 |
Max [mm/s] | 3.47 | 3.27 | 2.16 | 1.97 | 2.90 | 2.80 |

The actual error in the average speeds is 2.51% for the Type 1 video, 9.25% for the Type 2 video, and 2.83% for the Type 3 video. As previously described, the magnitude and sign of the error are largely determined by the trajectory of the vessel. For all three types, the vessels deviate from a purely vertical vessel, and so the magnitude of the error is diminished compared to the model (Sec. 2.6). In addition, because the end point of each vessel is lower than the start point, we expect an overestimation of speed for all three Types. The Type 2 video has the largest error, as predicted by the theoretical model (Fig. 6). Notice that the raster scan correction results in a nonlinear shift. Close examination of the Type 1 traces in Fig. 9 shows that the slopes at the bottom of individual traces were decreased after application of the raster scan correction, while slopes at the top were increased. This suggests that the entrance side of the path segment underestimated speeds, while the exit side overestimated speeds, corresponding to net upward and net downward vertical orientations, respectively. As can be seen from Fig. 5, this was exactly the case. As a final comparison, for the Type 1 video, the error was 2.51%, compared to −1.07% for the same vessel in A2_EM. This is due to a small difference in the starting and ending points of the vessel. Although we used the same trajectory, the end point of the vessel terminates slightly higher than the starting point for A2_EM, which explains the difference in the sign of the error.
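The quoted errors follow directly from the mean speeds in Table 5, and they can be set against the theoretical errors for a vertically-oriented vessel reported in Sec. 3.2. The sketch below assumes the corrected mean is the reference in the denominator, which reproduces the quoted percentages; the ratio of measured to theoretical error is an illustrative quantity that the paper does not define.

```python
# Mean speeds from Table 5 (mm/s): (without correction, with correction).
mean_speeds = {
    "Type 1": (2.04, 1.99),
    "Type 2": (1.89, 1.73),
    "Type 3": (2.18, 2.12),
}
# Theoretical errors (%) for a vertically-oriented vessel, as quoted in Sec. 3.2.
theoretical = {"Type 1": 12.56, "Type 2": 12.84, "Type 3": 5.27}

errors = {}
for video_type, (uncorrected, corrected) in mean_speeds.items():
    # Assumption: the corrected mean is the reference in the denominator.
    errors[video_type] = 100.0 * (uncorrected - corrected) / corrected
    ratio = errors[video_type] / theoretical[video_type]
    print(f"{video_type}: measured {errors[video_type]:.2f}%, "
          f"theoretical {theoretical[video_type]:.2f}%, ratio {ratio:.2f}")
```

All three ratios are below one, consistent with the statement that vessels deviating from the vertical show a smaller error than the model predicts for a purely vertical vessel.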

To verify that each extracted trace corresponded to an object on the input video, extracted traces were registered (Sec. 2.4; illustrated in Fig. 10). We individually verified each extracted trace by examining the tracked video frame-by-frame. Overall, the labeled lines tracked the leukocytes well. There were no false positives; it was not possible to calculate a false negative rate.

## 4.

## Discussion

This paper presents a method for quantifying object speeds in AOSLO videos. We demonstrated a multiframe approach for motion contrast enhancement that improves the contrast of moving objects and vessels. Motion contrast enhanced ST plots were used to visualize hemodynamics and individual traces were extracted for analysis. Extracted traces were used to track objects on the input videos and also for speed quantification. Speed quantification was done using a slope-modification technique that corrects for raster scanning in the presence of eye motion. We validated our results using a virtual AOSLO. Together, these techniques form a complete system of video and image analysis for noninvasive vascular video imaging.

Our results are similar to those obtained with other methods. A previously reported method using manual identification and analysis on the same vessel from the same Type 1 AOSLO video used in this paper found a total of 35 objects with a speed of 1.82±0.42 mm/s, without considering the error due to raster scanning or eye motion; our uncorrected speed was 2.04±0.62 mm/s for 50 objects. While the numbers are similar, the discrepancies can be explained by the following considerations. First, the manual method identified fewer objects than our method, probably due to difficulties in visualizing objects without motion contrast enhancement. Second, it may have been more difficult to visualize objects that were traveling at faster speeds using the manual method. Finally, the vessel trajectory may not have been as accurate in the manual method.

There are a few similar results from different imaging modalities. Using fluorescein-aided scanning laser ophthalmoscopy, blood flow velocity was measured to be 3.29±0.45 mm/s (standard deviation) in the parafoveal capillaries of 21 healthy volunteers.^{26} Our measured speeds are similar in magnitude to these results, but we can explain the discrepancies as follows. First, the location and size of the capillaries were different. Second, we measured leukocyte speeds, while they measured whole blood speeds using fluorescein. It is known that leukocytes travel slower through capillaries than erythrocytes,^{27} which constitute the majority of blood by volume. Thus, differences in spatial locations, small sample size, and differences in the element of blood that is being measured could account for differences in measured speeds. The blue field entoptic phenomenon is another method to examine capillary flow in the parafoveal region that can be used for estimating blood velocities.^{28} The blue field entoptic phenomenon refers to the movement of “flying corpuscles” that can be seen when looking at an illuminated blue background.^{29} It is thought that these flying corpuscles are in fact leukocytes. By having observers compare the speeds of these moving objects to those of simulated velocity fields, one can estimate speeds. One study found a speed of 0.89±0.2 mm/s,^{28} while another found speeds between approximately 0.5 and 1 mm/s.^{30} These speeds are similar in magnitude to those that we obtained, but one needs to be cautious since the blue field technique is subjective in nature.

The methods presented in this paper can potentially be applied to other high-resolution scanning systems with moving objects. There are many areas for future work, including full automation and application of more advanced detection and tracking methods. There are also important microcirculation studies that can be performed, including development of a family of hemodynamic markers to investigate leukocyte behavior. Such markers could be used to quantify changes in leukocyte behavior for normal and diseased retinas. The human eye offers a unique opportunity to directly examine the microcirculation, which has been made possible by improvements in imaging techniques (AOSLO) combined with the image analysis algorithms presented in this paper.

## 5.

## Conclusion

Raster scanning and eye motion are significant sources of error when quantifying speed in AOSLO videos. The magnitude of this error depends on the speed of the moving object, the configuration of AOSLO imaging parameters, the orientation of the vessel, and the isotropy of the eye motion, but can be as large as 37.8%. Slope modification on ST plots can correct for this error, improving the accuracy of hemodynamic measurements using AOSLO.

## Acknowledgments

The authors would like to thank Mark Campanelli for developing the initial implementation of the virtual AOSLO, Scott Stevenson for his valuable insights regarding ST plots, and Qiang Yang for his help with video stabilization and desinusoiding. This work was based on research funded in part by the NSF Center for Adaptive Optics AST-9876783 and the NIH Bioengineering Research Partnership EY014375. Johnny Tam is supported in part by a National Defense Science & Engineering Graduate Fellowship (NDSEG), sponsored by the Department of Defense, and in part by a National Science Foundation Graduate Research Fellowship.