Optical triangulation is one of the oldest and most widely used optical shape measurement methods, particularly in industrial applications, but its potential in medical applications has not yet been fully exploited. Laser triangulation can be performed in different ways, such as single-point measurements, scanning of a laser spot, or simultaneous multipoint measurements using laser line projection.1, 2, 3, 4 One major advantage of laser triangulation is that its working range, resolution, and adaptability to different applications can be easily adjusted by controlling the triangulation angle, the system dimensions, the magnification of the imaging optics, and similar parameters. However, for clinical in vivo applications, the design of a triangulation laryngoscope is much more complex because of the small working space available inside the mouth; the device has to be compact and easy for the clinician to use.
The vocal folds, the main source of sound production in humans, are two similar flexible muscular structures covered with a soft mucous membrane and stretched horizontally inside the larynx. Vocal fold vibration is a complex three-dimensional movement, and its fundamental frequency of opening and closing spans several hundred hertz. These complex three-dimensional movements have not yet been completely investigated because of a lack of experimental data. One of the most commonly used methods to track the vibration dynamics is stroboscopic imaging, in which a flash stroboscope is triggered by a simultaneously acquired audio signal to visualize the vocal fold vibrations in slow motion.5 However, in clinical investigations of subjects with irregular vocal fold vibrations or very short phonation, stroboscopic visualization may fail because of false audio triggering: the microphone may pick up the harmonics as well as the fundamental frequency of the vibrations. Švec and Schutte introduced high-speed videokymography (VKG) in 1996 to visualize the horizontal movements of selected positions of the vocal folds at very high speed and resolution.6, 7 Two-dimensional imaging of the complete vocal folds using a high-speed camera was introduced in the same period, and in recent years it has gained considerable momentum owing to technological advances in digital imaging and efficient data storage.8, 9 However, all these laryngeal imaging methods are capable only of 2-D imaging of the horizontal movements of the vocal folds. Vertical movements of the vocal folds have never been successfully imaged in a quantitative way in vivo, except in work by Larsson and Hertegård.10 However, their measurement used single-point laser spot projection, with which only a single spot on one of the vocal folds was imaged at very low resolution and speed; hence, the results were of limited use for further analysis.
In this paper, we report a novel, compact, high-resolution laser-sheet triangulation laryngoscope capable of visualizing the vertical movements of selected positions of both vocal folds, analogous to the function of VKG for horizontal movements. Because we image the full vocal folds at high speed, we can also generate a VKG pattern at the same position where the vertical movements are measured, which yields a 3-D visualization of the movement of the upper surface of the vocal folds during vibration. More significantly, with the new device we measured both the vertical and horizontal movements of the vibrating vocal folds in absolute SI units. The 3-D measurements not only give insight into the complex movements of human vocal folds but also provide feedback for numerical modeling and its analysis, and they can help clinicians gain a quantitative understanding of pathological conditions of the vocal folds.
Laser triangulation consists of projecting a laser beam onto the object surface and collecting its image at an angle with respect to the incident beam. The angle between the incident beam and the imaging optics is the triangulation angle. It is well known that the resolution of a triangulation device increases with the triangulation angle,11 and hence most triangulation devices use a large triangulation angle in the range of . In this triangulation laryngoscope, however, the triangulation angle has to be kept very small because of the limited working space inside the human mouth. Figure 1 shows three possible configurations of small-angle laser triangulation. In Fig. 1a, the laser beam is incident at an angle to the object surface and the image collection optics is perpendicular to the object surface. In Fig. 1b, the laser beam is incident perpendicular to the object surface but the image collection optics is placed at an angle to it. In both cases, the angle between the incident laser beam and the imaging optics, , is the same. The corresponding image displacements are given by Eqs. 1a and 1b, in which is the magnification factor of the imaging system. For small triangulation angles, Eqs. 1a and 1b give almost identical results. As an example, with typical values for a triangulation laryngoscope, such as , , and , Eqs. 1a and 1b yield and , respectively. The basic difference between Figs. 1a and 1b is that in Fig. 1a, when the vertical position of the surface changes, the beam illuminates a different position, so the measured vertical positions correspond to different, closely spaced positions on the object surface (here, the vocal folds). However, if the triangulation angle is small, this change in illumination position is too small to matter. For the same values of , , and given above, a vertical movement of of the vocal folds shifts the laser spot horizontally on the vocal fold surface by only . In Fig. 1b, by contrast, the same spot on the vocal fold surface remains illuminated, because the laser beam is incident perpendicular to the surface. Therefore, the configuration in Fig. 1b is preferable to that in Fig. 1a. In real in vivo measurements, however, it is practically impossible to introduce the endoscope into the mouth at a predefined angle with respect to the vocal folds; hence, an in vivo measurement may correspond to Fig. 1a, to Fig. 1b, or to a combination of both, as shown in Fig. 1c. In Fig. 1c, both the incident laser beam and the imaging optics are at an angle with respect to the vocal fold surface. If is the angle between the vocal fold surface and the incident laser beam, then the measured movement of the vocal folds is the component of the actual movement along the direction of the laser beam, given by . For small tilt angles, this effect is very small. Assuming a maximum tilt angle of of the endoscope with respect to the vocal folds, the measured vertical amplitude is only less than the actual value. Note that this is the theoretically calculated error due to endoscope tilt, not the total error in the final measured data.
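The comparison between Figs. 1a and 1b can be explored numerically. The sketch below is not a reconstruction of Eqs. 1a and 1b; it assumes an idealized geometry of our own in which a vertical movement dz shifts the laser-line image by M·dz·tan θ when the imaging axis is normal to the surface (Fig. 1a type) and by M·dz·sin θ when the laser is normal to the surface (Fig. 1b type), and in which the illuminated spot in the Fig. 1a type walks sideways by dz·tan θ. The tilt function measures the beam tilt from the surface normal, which is again our convention. All function names and the numeric values in the example are hypothetical.

```python
import math


def image_shift_camera_normal(dz: float, theta_deg: float, magnification: float) -> float:
    """Fig. 1a-type geometry (assumed model): imaging axis along the surface
    normal, laser tilted by theta. The spot walks sideways by dz*tan(theta),
    and the optics magnify that shift by M."""
    return magnification * dz * math.tan(math.radians(theta_deg))


def image_shift_laser_normal(dz: float, theta_deg: float, magnification: float) -> float:
    """Fig. 1b-type geometry (assumed model): laser along the surface normal,
    imaging axis tilted by theta. The spot moves dz along the beam, of which
    only dz*sin(theta) is perpendicular to the imaging axis."""
    return magnification * dz * math.sin(math.radians(theta_deg))


def spot_walk(dz: float, theta_deg: float) -> float:
    """Lateral displacement of the illuminated spot on the surface in the
    Fig. 1a-type geometry (it is zero in the Fig. 1b-type geometry)."""
    return dz * math.tan(math.radians(theta_deg))


def tilt_error_fraction(tilt_deg: float) -> float:
    """Fractional underestimate of a vertical movement when the beam is tilted
    by tilt_deg from the surface normal: only dz*cos(tilt) is measured."""
    return 1.0 - math.cos(math.radians(tilt_deg))


if __name__ == "__main__":
    # Hypothetical example values, not the values used in this study.
    dz_mm, theta, mag = 0.5, 5.0, 1.0
    a = image_shift_camera_normal(dz_mm, theta, mag)
    b = image_shift_laser_normal(dz_mm, theta, mag)
    print(f"Fig. 1a-type shift: {a:.5f} mm, Fig. 1b-type shift: {b:.5f} mm")
    print(f"relative difference: {(a - b) / b:.3%}")
    print(f"spot walk: {spot_walk(dz_mm, theta) * 1000:.1f} um")
    print(f"tilt error at 10 deg: {tilt_error_fraction(10.0):.2%}")
```

Under this model the ratio of the two image shifts is 1/cos θ, which stays close to 1 for small triangulation angles; this is consistent with the statement that the two configurations give almost identical results.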
Design of the Triangulation Laryngoscope
We designed and developed a purpose-built endoscopic laser projection system that stretches the laser beam in one direction and projects it as a thin laser sheet at an angle onto the vocal fold surface. The laser projection channel consists of a semiconductor diode laser, an aspheric convex lens that focuses the laser beam onto the vocal fold surface, a pair of convex and concave cylindrical lenses that stretch the beam in one direction, and a mirror at the end that sets the desired triangulation angle. In our first prototype, the laser projection channel is firmly attached to a rigid endoscope, which acts as the receiving channel. Figure 2 shows a schematic view of the laser triangulation laryngoscope. The optical axes of the two channels are separated by a distance of at the tip of the system, and the triangulation angle is . From trial experiments, we concluded that this is the maximum triangulation angle that can be used for a successful in vivo measurement. Within a normal working distance of from the tip of the endoscope, the laser line is long and wide. We used a semiconductor laser emitting at , which delivers an effective laser power density of , below the exposure limit of for human skin recommended by the International Commission on Non-Ionizing Radiation Protection.12 A red laser was chosen because, within the visible spectral region, it gives minimal tissue absorption and satisfactory tissue reflectance. Only a small area around the center of the original laser beam is used for the projection, to keep the intensity distribution uniform ( variation) along the length of the stretched beam.
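Checking the power density against the exposure limit is a short calculation: the average irradiance is the delivered power divided by the illuminated area. The sketch below approximates the laser line as a uniformly lit rectangle; the power, line dimensions, and limit value are placeholders, not the values from this study, and the applicable ICNIRP limit depends on wavelength and exposure duration. The function names are hypothetical.

```python
def line_irradiance_mw_per_cm2(power_mw: float, length_mm: float, width_mm: float) -> float:
    """Average irradiance of a laser line approximated as a uniform rectangle.

    Assumption: all delivered power falls inside length x width.
    Returns mW/cm^2 (1 cm^2 = 100 mm^2).
    """
    if length_mm <= 0 or width_mm <= 0:
        raise ValueError("line dimensions must be positive")
    area_cm2 = (length_mm * width_mm) / 100.0
    return power_mw / area_cm2


def within_limit(irradiance: float, limit: float) -> bool:
    """True if the irradiance is below the exposure limit (same units)."""
    return irradiance < limit


# Hypothetical numbers for illustration only; the limit is a placeholder.
E = line_irradiance_mw_per_cm2(power_mw=1.0, length_mm=10.0, width_mm=0.5)
print(f"{E:.1f} mW/cm^2, below placeholder limit: {within_limit(E, limit=200.0)}")
```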
A compact high-speed digital color camera (HResEndocam 5562, Richard Wolf GmbH, Germany), developed specifically for laryngeal imaging, is used to record the images. The system can record images continuously for a maximum of at a rate of with an image resolution of , or for at a rate of . We used in our measurements to achieve the maximum temporal resolution. Data temporarily stored in the camera memory are downloaded to a computer for further analysis. A white light source coupled to the endoscope through an optical fiber illuminates the larynx so that a high-speed 2-D image of the vocal folds can be recorded. White light is used so that the vocal folds appear in their natural color, because the 2-D image is also used to investigate pathological conditions, in addition to the 3-D profiling.
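The benefit of the highest frame rate can be seen from the number of frames captured per glottal cycle, which is the frame rate divided by the fundamental frequency of phonation. The numbers below are hypothetical and are not the camera settings used in this study; the function names are also ours.

```python
def frames_per_cycle(frame_rate_hz: float, f0_hz: float) -> float:
    """Number of video frames captured during one glottal cycle."""
    return frame_rate_hz / f0_hz


def total_frames(frame_rate_hz: float, duration_s: float) -> int:
    """Frames stored in a continuous recording of the given duration."""
    return round(frame_rate_hz * duration_s)


# Hypothetical values: 4000 frames/s, a 120-Hz phonation, a 2-s recording.
print(frames_per_cycle(4000.0, 120.0))  # about 33 frames per cycle
print(total_frames(4000.0, 2.0))
```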
For typical vocal fold vibration amplitudes of , the maximum-intensity position of the image envelope of the laser line shifts by only one or two pixels. A typical width of this envelope is . Therefore, an image-processing technique with subpixel resolution is needed to track the vocal fold movements with sufficient resolution. For this purpose, the center of the image of the laser line is tracked, and a curve-fitting method is used to locate it with subpixel accuracy. An algorithm written in Matlab reads each frame of the movie and fits the intensity profile of the laser line image across its width with a Lorentzian curve, whose parameters are the area, the width, and the maximum-intensity position of the profile. The procedure is repeated for all points along the length of the laser line to locate the peak intensity position along the whole line. Only the region of the image with sufficient laser intensity is used for the fitting. High-frequency noise in the fitted data is filtered out using a two-level Daubechies wavelet (db10) filter. The resulting data give the vertical position of the vocal folds during phonation with respect to a baseline position. Strictly, the vertical position of the vocal folds is measured as the distance between the endoscope tip and the vocal fold surface; this distance is then converted into the vertical height of the vocal fold surface above the baseline. The baseline is the largest distance between the endoscope tip and the vocal fold surface, in other words, the lowest position of the vocal folds during vibration.
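A minimal Python sketch of subpixel line-center tracking follows. It deliberately swaps in a different fitting technique from the Matlab nonlinear fit described above: for a background-free, area-normalized Lorentzian, 1/y is a parabola in x, so a quadratic least-squares fit to 1/y gives the center, width, and area in closed form. We assume the background (for example, the red component of the white light) has already been subtracted and that the laser line runs roughly horizontally in the frame, so each column holds one cross-sectional profile. The reciprocal transform amplifies noise in the dim tails, so on real data a nonlinear least-squares fit is usually more robust; the wavelet denoising step is not shown. Function names are hypothetical.

```python
import numpy as np


def lorentzian(x, area, width, center):
    """Area-normalized Lorentzian with full width at half maximum `width`."""
    return (2.0 * area / np.pi) * width / (4.0 * (x - center) ** 2 + width ** 2)


def fit_lorentzian_linearized(x, y, min_fraction=0.3):
    """Estimate (area, width, center) of a background-free Lorentzian profile.

    For this model 1/y = (pi / (2*area*width)) * (4*(x - center)**2 + width**2),
    a parabola in x, so a quadratic least-squares fit to 1/y yields the
    parameters in closed form. Only samples above `min_fraction` of the peak
    are used, i.e. only where the laser line is bright.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = y > min_fraction * y.max()
    a, b, c = np.polyfit(x[mask], 1.0 / y[mask], 2)  # highest power first
    center = -b / (2.0 * a)
    k = a / 4.0                          # k = pi / (2*area*width)
    width = np.sqrt(c / k - 4.0 * center ** 2)
    area = np.pi / (2.0 * k * width)
    return area, width, center


def track_line_center(frame):
    """Subpixel center of the laser line in every column of a 2-D frame."""
    rows = np.arange(frame.shape[0], dtype=float)
    return np.array([fit_lorentzian_linearized(rows, frame[:, j])[2]
                     for j in range(frame.shape[1])])
```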
The values obtained from the fitting are in pixels. To quantify the data, the image must be calibrated to convert pixel values into SI units. A single-step external calibration method is used for both the horizontal and vertical directions. For this purpose, a black square pattern of known length and width on a white background is placed at a series of known distances from the tip of the endoscope system, and an image is recorded at each distance. For any particular distance between the endoscope and the object, the projected laser line occupies a unique position in the image, and the image size of the calibration pattern also differs from one distance to another. Because of the complex focusing optics of the endoscope, both the position of the laser line image and the size of the calibration pattern image vary nonlinearly with the distance between the endoscope and the calibration object surface. A second-order polynomial fit is used to relate them, and the resulting fit is used to calibrate the in vivo measurements. To compare the horizontal movements at the same position as the laser line, we used a Matlab algorithm to generate a 2-D kymogram from the high-speed video.
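The calibration and the distance-to-height conversion described above can be sketched in Python as follows. The code assumes that, for each calibration distance, the laser-line center in the image has already been located, and it fits distance as a second-order polynomial of that pixel position; the direction of the fit, the function names, and the example numbers are our assumptions. The same approach would apply to the horizontal scale, with the imaged size of the calibration square in place of the line position.

```python
import numpy as np


def fit_calibration(pixel_positions, distances_mm, degree=2):
    """Second-order polynomial mapping laser-line image position (pixels)
    to endoscope-to-surface distance (mm), from calibration target images."""
    return np.polynomial.Polynomial.fit(pixel_positions, distances_mm, degree)


def heights_above_baseline(center_px, calibration):
    """Convert tracked line positions to heights above the baseline.

    The baseline is the largest tip-to-surface distance in the series, i.e.
    the lowest position of the fold, so all heights are non-negative.
    """
    distance_mm = calibration(np.asarray(center_px, dtype=float))
    return distance_mm.max() - distance_mm


# Hypothetical calibration series: line position (px) vs. known distance (mm).
cal_px = np.array([100.0, 150.0, 200.0, 250.0, 300.0])
cal_mm = 40.0 - 0.1 * cal_px + 1e-4 * cal_px ** 2
cal = fit_calibration(cal_px, cal_mm)
print(heights_above_baseline([100.0, 150.0, 200.0], cal))
```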
In Vivo Measurements
The rigid endoscope is inserted through the mouth, and the high-speed video from the camera is displayed on a TV monitor in real time to guide and position the endoscope. The endoscope is positioned so that the laser line is projected near the middle of the vocal folds (in the anterior-posterior direction), where the vibration amplitudes are expected to be largest. The recording takes only a few seconds, and the recorded high-speed video is processed afterwards. Depending on the quality of the image, the entire image-processing procedure takes on a computer. Such a long processing time may restrict the use of this method in real-time clinical investigations, but the method can be used for in-depth analysis of clinical data.
We have performed a series of in vivo measurements on several volunteers. To demonstrate the operation of the new device, one measurement on a vocally healthy male subject is shown here. The subject produced a sustained phonation at . Figure 3a shows one frame of the high-speed movie, corresponding to partially opened vocal folds. The laser line projected onto both vocal folds is visible in the (color) image. Because we used a red laser, the red component of each video frame is used for further processing. Figure 3b shows the red component of the image in Fig. 3a: the intensity profile of the laser line sits on top of a relatively flat background contributed by the red component of the white light. Several consecutive frames of the movie are converted into such intensity patterns and then processed to extract the triangulation data over several opening and closing cycles. Figure 4 shows the laser intensity profile on the right vocal fold at a distance of from the glottal midline. For visual clarity, only four alternate frames are shown in Fig. 4. The position of the peak laser intensity is calculated with an accuracy of , and wavelet filtering of the data further improves the accuracy. In Fig. 4, the different peak positions of the theoretical fit correspond to different vertical positions of the upper surface of the vocal folds. The absolute values of the vertical positions are determined from the external calibration procedure described in the image-processing section. Figure 5a depicts the color-coded 3-D vibration pattern of the left and right vocal folds, and Fig. 5b shows the horizontal component of the vibration pattern at the same location where the vertical component is measured.
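Extracting the red component and building a kymogram along the laser line are both simple array operations. The sketch assumes the video is held as a NumPy array of shape (frames, rows, cols, 3) in RGB channel order (some video readers, such as OpenCV, return BGR instead) and that the laser line, and hence the kymogram line, lies along one image row; these layout choices and the function names are hypothetical.

```python
import numpy as np


def red_channel(video_rgb):
    """Red component of an RGB video stored as (frames, rows, cols, 3),
    converted to float for later fitting and filtering."""
    return video_rgb[..., 0].astype(float)


def kymogram(video_gray, row):
    """Stack the same image row from every frame.

    The result has shape (frames, cols): time runs down the first axis and
    position along the selected line (across the glottis) along the second.
    """
    return video_gray[:, row, :]
```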
The “glottal midline” is an imaginary line along which the two folds meet in their closed position, and the positions of the two vocal folds are normally measured in either direction from this midline. Because the glottal midline is used as the reference line for the horizontal position, the vertical positions calculated for each lateral distance from this reference line are, in fact, the instantaneous vertical positions of the upper surface of the vocal folds at a given time. Hence, the lateral propagation of the mucosal wave is included in the calculation and is clearly visible in the 3-D profile. Video 1 (10.1117/1.3041164.1) shows the 3-D vibration dynamics of the vocal folds, with time (in milliseconds) given relative to the first frame. Figure 5a shows that the vertical component of the vibration decays slowly with distance from the glottal midline, which might correspond to a mucosal surface wave. The absence of visible irregularities in the vibration amplitudes, in either the horizontal or the vertical component of the vibration dynamics of the left and right vocal folds, is consistent with our test subject being vocally healthy. Further clinical measurements on subjects with vocal problems are in progress, and the results and a detailed analysis will be published in a clinically oriented journal.
One of the major problems with the acquired images is specular reflection from the vocal fold surface. The vocal fold surface is highly moist, and, depending on the positioning of the endoscope, certain frames or even the entire movie may contain specular reflections. In most cases, the specular reflection is limited to small, localized patterns, but in some cases the reflection pattern may spread over the entire vocal fold surface. Small, localized specular reflections of the laser beam in one or two movie frames can be filtered out by intensity thresholding relative to the surrounding pixels along the length of the laser line. Another difficulty was a reduction in laser intensity during the measurement caused by condensation on the laser channel window. To avoid condensation, the tip of the endoscope is warmed to using a stream of hot air. Because the examiner holds the endoscope by hand, relative movement of the endoscope with respect to the vocal folds is possible during the measurement. However, this movement is at very low frequency, and because the recording is at very high speed, it is easy to select segments of the movie with no endoscope movement. The absolute values of the vibration amplitudes measured with the present method depend on the in vivo measurement configuration. During the external calibration procedure, the calibration object surface is moved in a direction perpendicular to the plane of the projected laser beam. During an in vivo examination, however, the endoscope may be slightly tilted with respect to the vocal fold surface, and this tilt may differ between measurements. This tilt can cause a small error in the measured absolute values, as discussed in the theory section. In a few cases, patients cannot tolerate a rigid endoscope for laryngeal examination; in these cases, the triangulation method described here may not be fully successful.
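The thresholding against surrounding pixels mentioned above could look like the following sketch. The neighborhood median, the window size, the threshold factor, and the linear interpolation used to repair flagged columns are our assumptions rather than the authors' stated procedure, and the function name is hypothetical. The approach only targets small, localized reflections, not reflections spread over the whole fold.

```python
import numpy as np


def suppress_specular(peak_intensity, centers, window=5, factor=2.0):
    """Flag and repair specular spikes along the laser line in one frame.

    peak_intensity[j] is the laser-line peak intensity in column j and
    centers[j] the fitted subpixel center. A column is flagged when its
    intensity exceeds `factor` times the median of its neighbors within
    +/- `window` columns; flagged centers are replaced by linear
    interpolation from the unflagged columns.
    """
    peak_intensity = np.asarray(peak_intensity, dtype=float)
    centers = np.asarray(centers, dtype=float).copy()
    n = peak_intensity.size
    flagged = np.zeros(n, dtype=bool)
    for j in range(n):
        lo, hi = max(0, j - window), min(n, j + window + 1)
        neighbors = np.delete(peak_intensity[lo:hi], j - lo)
        if neighbors.size == 0:
            continue  # a single-column line has nothing to compare against
        flagged[j] = peak_intensity[j] > factor * np.median(neighbors)
    good = ~flagged
    if flagged.any() and good.sum() >= 2:
        idx = np.arange(n)
        centers[flagged] = np.interp(idx[flagged], idx[good], centers[good])
    return centers, flagged
```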
With the help of laser triangulation, we have not only extracted the 3-D vibration profile of the vocal folds but also calibrated and quantified the 2-D image. Vocal fold vibration amplitudes and velocities depend on several parameters, such as the fundamental frequency of phonation, the type of phonation, and the sound pressure level.13, 14 However, fluctuations in these variables are minimal or negligible within a single, very short measurement. Hence, a comparison of the complete 3-D vibration profiles of the left and right vocal folds can be very useful in investigating vocal fold pathology. The present high-speed triangulation method does not require audio triggering and can therefore be used to investigate complex vibration dynamics, such as hoarse voice or paralyzed folds, where stroboscopes cannot track the motion precisely. The position of the laser line on the vocal folds can be easily controlled by tilting the endoscope; hence, this device can be used to study the physical size of vocal fold nodules, polyps, or cysts in 3-D, which can help phonosurgeons in vocal fold surgery.
This work was supported by the Technology Foundation STW (Stichting Technische Wetenschappen) Project No. 06633, Applied Science Division of NWO (Natuurwetenschappelijk Onderzoek), and the technology program of the Ministry of Economic Affairs, The Netherlands. The authors are thankful to Richard Wolf GmbH, Germany, for providing the high-speed camera.