Synergetic use of thermal and visible imaging techniques for contactless and unobtrusive breathing measurement

Abstract. We present a dual-mode imaging system operating at visible and long-wave infrared wavelengths for noncontact and unobtrusive measurement of breathing rate and pattern, regardless of whether subjects breathe through the nose and mouth simultaneously, alternately, or individually. Classifiers improved by screening on biological characteristics outperformed the custom cascade classifiers using the Viola–Jones algorithm for the cross-spectrum detection of the face, nose, and mouth. For breathing rate estimation, the results obtained by this system were consistent with those of the reference method, as verified by a Bland–Altman plot with 95% limits of agreement from −2.998 to 2.391 and by linear correlation analysis with a correlation coefficient of 0.971, indicating that the method is acceptable for the quantitative analysis of breathing. In addition, the breathing waveforms extracted by the dual-mode imaging system closely matched the corresponding standard breathing sequences. Because the validation experiments were conducted under challenging conditions, such as significant positional and abrupt physiological variations, we conclude that this dual-mode imaging system, which exploits the respective advantages of RGB and thermal cameras, is a promising breathing measurement tool for residential care and clinical applications.


Introduction
Breathing rate, along with blood oxygen saturation, heart rate, and blood pressure, is considered one of the four main vital physiological signs. 1 The breathing rate is 12 to 18 breaths per minute (bpm) for a healthy adult at rest, 2 whereas it increases to 35 to 40 bpm during or just after exercise. 3 Alterations in breathing rate and pattern are known to accompany serious adverse events 4 or to be early clues of pathological processes. 5 Some diseases such as sleep disorders cause abnormal breathing rhythms such as Kussmaul breathing. 6 Moreover, the observation of breathing plays a crucial role in many other applications and research areas, including sport studies, 7 quarantine, and security inspections. 8,9 Current noninvasive breathing measurement approaches include electrical impedance tomography, respiratory inductance plethysmography, capnography, tracheal sound measurement, spirometry, respiratory belt transducers, and electrocardiography-derived methods. [10][11][12] Nonetheless, these devices estimate breathing rate through contact, which leads to discomfort, stress, and even soreness for the subject. 12 Increasing daily and clinical demand for contactless and unobtrusive yet accurate breathing measurement in uncontrolled environments has spurred considerable interest among researchers in innovative tools for breathing observation. Doppler radar has been used for noncontact, through-clothing breathing evaluation by measuring chest wall motion. 13 This method is, however, limited by potential radiation and high sensitivity to motion artifacts. A laser Doppler vibrometer determined the breathing rate by assessing chest wall displacements; 14 however, its result is inaccurate when, for example, improper measurement points are selected on the thoracic surface. Min et al. developed an ultrasonic proximity sensing approach that measures breathing signatures by calculating the time intervals between transmitted and received sound waves during abdominal wall fluctuation. 15 Subjects under this test are required to remain still and refrain from any other movement. In addition, owing to mature image processing techniques, visible imaging sensors have attracted much attention for breathing evaluation. 16,17 Shao et al. determined breathing patterns using cameras in the visible region to track the small shoulder movements associated with breathing. 18 Although random body movements can be corrected by a motion-tracking algorithm, breathing rate estimation based on visible imaging is by nature sensitive to slight movements and is thus not appropriate for long-term monitoring.
Compared with the aforementioned active sensors, the passive thermal infrared imaging that records the emitted energy from the objects does not need any harmful radiation and light source. 19 The principle of thermal imaging for breathing estimation is based on the fact that the changes of temperature around the nostrils and mouth will accompany the inhalation and exhalation. 20 Temperature variation is, in contrast to the displacement change, more significant and thus more suitable for deriving breathing signature. Despite many face recognition algorithms in the visible band, 21 locating and tracking face and facial tissues in the thermal band are highly challenging due to the few geometric and textural details as well as the various physiological changes in a thermal image. Basu et al. manually selected the nasal area and it was afterward tracked by the corner detection in conjunction with the registration process. 22 The hyperventilation was therefore successfully monitored by the thresholding technique. Abbas et al. extracted the breathing signals from manually-selected region in thermal images, and the good performance of the proposed method had been shown on the breathing measurements of eight adults and one infant. 23 Several investigators applied the template matching method to track the predefined nasal tissues in thermal infrared images. 24,25 Some literature has attempted to automatically identify the nasal region. Fei and Pavlidis first determined the nasal contour by the use of horizontal and vertical projection profiles in spatial dimension, and subsequently, the nostril regions were found by taking the temporal variances into account. 2 A retained boosted cascade classifier based on the temperature feature was utilized to detect the nasal cavities, 26 while the classification accuracy seemed to be unacceptable for the purpose of breathing measurement. 
The salient physical features of the human face in a thermal image were used to segment the nasal region; 27 however, this approach fails when the subject wears glasses or breathes through an open mouth. Moreover, the direct identification of facial tissues in thermal images is camera-dependent and strongly disturbed by abrupt physiological changes such as perspiration. To address these challenges, cross-spectrum face and facial tissue recognition may make it possible to locate and track regions of interest accurately in thermal images.
The objectives of the current study are to: (1) establish and register the thermal and visible dual-mode imaging system; (2) develop a cross-spectrum face and facial tissue recognition algorithm for long-wave infrared and visible bands and obtain the temperature variation signal; and (3) validate the dual-mode imaging system and the proposed algorithm for breathing rate and pattern measurements.

Methods
The thermal imager for breathing rate and pattern measurement is based on the fact that the temperature around the nose and mouth fluctuates throughout the inspiration and expiration cycles. The disadvantage compared to the RGB image is that, due to few geometric and textural facial details, the thermal image is at present inadequate to design fast and reliable face detection algorithms. 28 Therefore, in this study, the visible imaging technique is adopted to aid in the automatic recognitions of face and facial tissue in thermal images.
The development of the dual-mode imaging system, image registration, detection of the face and its tissues in the two spectral domains, region of interest (ROI) tracking, computation of the breathing signal, and the validation experiment are elaborated in the following sections.

Thermal and Visible Dual-Mode Imaging System
The experimental setup is shown in Fig. 1.

Image Registration
After establishing the dual-mode imaging system, an affine transformation is required to register the thermal and visible images. 29 The first step is to select strongly correlated points in the first frame of the bimodal videos so as to define the control point pairs, viz., the fixed points in the thermal image and the moving points in the RGB image. Subsequently, these points are adjusted by cross-correlation to obtain the transformation matrix. We align each frame from the RGB video according to
$$I_{\mathrm{visr}} = T(s, \theta, b)\, I_{\mathrm{vis}}, \qquad (1)$$
where $I_{\mathrm{vis}}$ is the original RGB image and $I_{\mathrm{visr}}$ is its corresponding transformed RGB image; $T$ represents the transformation matrix; and $s$, $\theta$, and $b$ denote the scaling, rotation, and translation vectors, respectively. The registered RGB images are resized so that their numbers of rows and columns equal those of the thermal images.
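As a rough illustration of the registration step, the sketch below builds a homogeneous transformation matrix from a scale, a rotation angle, and a translation, fits an affine matrix to control point pairs by least squares, and maps RGB coordinates into the thermal frame. The function names (`similarity_matrix`, `estimate_affine`, `map_points`) and the least-squares fit are illustrative assumptions; the source does not state how the matrix is solved beyond the cross-correlation adjustment of the control points.

```python
import numpy as np

def similarity_matrix(s, theta, b):
    """3x3 homogeneous matrix from scale s, rotation theta (radians), translation b=(bx, by)."""
    c, si = np.cos(theta), np.sin(theta)
    return np.array([[s * c, -s * si, b[0]],
                     [s * si,  s * c, b[1]],
                     [0.0,     0.0,   1.0]])

def map_points(T, pts):
    """Apply T to an (N, 2) array of (x, y) points (linear coordinate mapping)."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    return (homog @ T.T)[:, :2]

def estimate_affine(moving, fixed):
    """Least-squares affine matrix that maps moving (RGB) points onto fixed (thermal) points."""
    moving = np.asarray(moving, dtype=float)
    fixed = np.asarray(fixed, dtype=float)
    A = np.hstack([moving, np.ones((len(moving), 1))])
    params, *_ = np.linalg.lstsq(A, fixed, rcond=None)  # shape (3, 2)
    T = np.eye(3)
    T[:2, :] = params.T
    return T
```

Once a matrix has been estimated from the first frame, the same `map_points` call can carry the corners of a detected region from the RGB image into the thermal image.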

Cross-Spectrum Face and Facial Tissue Detection
The cascade object detector using the Viola-Jones algorithm 30 coupled with the screening technique based on biological characteristics was used to detect the face, nose, and mouth in the RGB image, and subsequently, linear coordinate mapping was conducted to determine the corresponding regions in the thermal image. The Haar-like features are extracted from the integral images and then serve as the input of the custom cascade classifier. This algorithm can be summarized as follows. 30 Let us assume that there is a dataset $U = \{x_1, \ldots, x_N\}$, and each data sample $x_i \in U$ carries a label variable $y_i \in Y$ ($Y = \{1, -1\}$), where $i = 1, \ldots, N$. Hence, the initial distribution for the samples in the training set can be represented as $D_1(i) = 1/N$. For every weak classifier $h_t: U \to Y$, the error under distribution $D_t$ can be denoted as $\varepsilon_t = P_{D_t}[h_t(x_i) \neq y_i]$ and therefore the weight of the weak classifier as $\alpha_t = 0.5 \ln[(1 - \varepsilon_t)/\varepsilon_t]$, where $t = 1, \ldots, T$ and $T$ is the number of weak classifiers. The final strong classifier is
$$H_{\mathrm{final}}(x) = \mathrm{sign}\left[\sum_{t=1}^{T} \alpha_t h_t(x)\right], \qquad (2)$$
where $H_{\mathrm{final}}$ represents the final strong classifier. When the custom Viola-Jones detector returns more than one face candidate, we search each candidate for facial tissues such as the nose, and the region that contains them is chosen as the real face. Once the face position has been confirmed, the above procedure is repeated within the face region to detect the locations of the nose and mouth. Nevertheless, several potential nose and mouth regions may be found by the custom cascade classifiers. To solve this problem, we exploit the biological characteristic that the nose lies on the center line of the face [Fig. 2(a)].
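The boosting summary above can be sketched in a few lines. This is a minimal illustration on generic weak classifiers, not the Haar-feature cascade used in the study; the function names are hypothetical, and the sample re-weighting step is the standard AdaBoost update, which the text does not spell out and is therefore an assumption here.

```python
import numpy as np

def adaboost_train(X, y, weak_classifiers, rounds):
    """Return a list of (alpha_t, h_t) pairs chosen from a pool of weak classifiers.

    Each weak classifier is a callable h(X) -> array of +1/-1 labels.
    """
    n = len(y)
    D = np.full(n, 1.0 / n)                    # initial distribution D_1(i) = 1/N
    ensemble = []
    for _ in range(rounds):
        # Weighted error of every candidate under the current distribution.
        errors = [np.sum(D[h(X) != y]) for h in weak_classifiers]
        best = int(np.argmin(errors))
        eps = min(max(errors[best], 1e-10), 1.0 - 1e-10)  # keep the log finite
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        h = weak_classifiers[best]
        # Assumed standard AdaBoost re-weighting: emphasize misclassified samples.
        D = D * np.exp(-alpha * y * h(X))
        D = D / D.sum()
        ensemble.append((alpha, h))
    return ensemble

def strong_classify(ensemble, X):
    """Sign of the alpha-weighted vote of the selected weak classifiers."""
    score = sum(alpha * h(X) for alpha, h in ensemble)
    return np.where(score >= 0, 1, -1)
```

In a Viola-Jones detector, each weak classifier would threshold a single Haar-like feature, and several such strong classifiers would be chained into a cascade.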
The minimum distance between the center lines of the face and the nose candidate regions is calculated by
$$n_{\mathrm{final}} = \arg\min_{n, n' = 1, \ldots, k} \left| \frac{x_{f1} + x_{f2}}{2} - \frac{x_n + x_{n'}}{2} \right|. \qquad (3)$$
In the equation, $n_{\mathrm{final}}$ is the final nose region; $x_{f1}$ and $x_{f2}$ are the horizontal ordinates of the two corners of the face region's top side; $x_n$ and $x_{n'}$ are the horizontal ordinates of the two corners of the top side of the nose region; and $k$ is the number of nose candidate regions.
If several nose candidate regions still remain, Eq. (4) is applied to select the largest nose candidate region as the real nose region. For the further screening of the mouth region, the biological characteristics of the facial tissues require that the vertical ordinate of the mouth be smaller than that of the nose. Simultaneously, the horizontal ordinate of the mouth should be close to that of the nose [Fig. 2(b)]. This step is expressed by Eq. (5), where $m_{\mathrm{final}}$ is the final mouth region; $(x_n, y_n)$ and $(x_m, y_m)$ are the coordinates of the confirmed nose region and the mouth candidate regions, respectively; $m$ denotes the number of mouth candidate regions; $h$ is the distance between the mouth and nose; and $\alpha \in [0, 1]$ is an arbitrary value defined by a priori knowledge.
The mouth search is further refined by applying Eq. (4) to eliminate small interfering blocks near the center line.
Later, the corresponding positions of nose and mouth can be automatically found in the thermal images via the linear coordinate mapping.
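The nose screening described in this section can be sketched as follows: keep the candidates whose center line is closest to that of the face, then take the largest of them. The box format `(x, y, w, h)`, the function name `pick_nose`, and the tie tolerance `tol` are assumptions made for illustration; the mouth screening is omitted because its exact criterion is not recoverable here.

```python
def pick_nose(face, candidates, tol=1e-9):
    """Choose the nose box among candidates given a face box.

    Boxes are (x, y, w, h) tuples with (x, y) the top-left corner.
    Returns None when there are no candidates.
    """
    if not candidates:
        return None
    face_cx = face[0] + face[2] / 2.0           # center line of the face

    def offset(box):
        # Horizontal distance between the candidate's and the face's center lines.
        return abs(box[0] + box[2] / 2.0 - face_cx)

    best = min(offset(c) for c in candidates)
    closest = [c for c in candidates if offset(c) - best <= tol]
    # Several candidates may be equally close: keep the one with the largest area.
    return max(closest, key=lambda c: c[2] * c[3])
```

For example, with a face box `(100, 50, 200, 200)`, two candidates centered on the face center line would both pass the first step, and the larger one would be returned.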

ROI Tracking
The Shi-Tomasi corner detection algorithm, 31 derived from the Harris-Stephens method, 32 is applied to extract the interest points from the nose and mouth ROIs in the visible gray images. For each pixel in the input image, the covariance matrix $M$ corresponding to its neighborhood $S(p)$ is
$$M = \sum_{x, y \in S(p)} w(x, y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}, \qquad (6)$$
where $w(x, y)$ represents the given feature window; $I_x$ and $I_y$ are the differences in the $x$ and $y$ directions, respectively. The strongest key features $C_S$ are calculated by
$$C_S = \left\{ (i, j) \,\middle|\, \min[\det(M)] \geq k \max_{i, j \in I} \{\min[\det(A)]\} \right\}, \qquad (7)$$
where $M$ is the covariance matrix of the pixel to be detected; $A$ denotes the vector containing the covariance matrices in the input image $I(i, j)$ preprocessed by a Gaussian filter; and $k$ is the empirical constant for tuning the threshold (here 0.01). Next, the ROI is tracked via the Kanade-Lucas-Tomasi algorithm 33,34 according to Eq. (8), where $\varepsilon$ is the sum of squared intensity differences between the local image model $A$ at the current time $t$ and the local image model $B$ at time $t + \tau$; $\Delta x$ and $\Delta y$ are the displacements in the $x$ and $y$ directions, respectively; $X$ is the vector including the displacement and time variables; $W$ is the given window; and $\omega$ is the weighting function (here 1).
Based on the above equation, the tracking of the ROI in video sequences can be realized using the displacement $(\Delta x, \Delta y)$, which is determined by minimizing $\varepsilon$. Furthermore, the tracking procedure is refined by the forward-backward error. 35 This method invalidates tracked points whose errors exceed a set threshold, thus enabling the selection of more reliable trajectories among consecutive frames. In this study, the threshold is set to 2 pixels. The cross-spectrum ROI tracking is achieved by linear coordinate mapping.
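The forward-backward check can be illustrated with a short sketch. It assumes that each point has already been tracked forward to the next frame and then backward to the original frame by some tracker (not shown); the function name and array layout are hypothetical, and the Euclidean distance is assumed as the error measure.

```python
import numpy as np

def forward_backward_mask(points, back_tracked, threshold=2.0):
    """Flag tracked points whose forward-backward error is within the threshold.

    points:       (N, 2) original point locations in frame t
    back_tracked: (N, 2) locations obtained by tracking t -> t+1 -> t
    threshold:    maximum allowed error in pixels (2 pixels in this study)
    """
    points = np.asarray(points, dtype=float)
    back_tracked = np.asarray(back_tracked, dtype=float)
    fb_error = np.linalg.norm(points - back_tracked, axis=1)
    return fb_error <= threshold
```

Only the points flagged `True` would be kept to update the ROI in the next frame.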

Extraction of Breathing Signature and Pattern
Because the shape of the original ROI may change from a rectangle to a polygon during tracking, the following equation is used to acquire the average pixel intensity $\bar{s}(k)$ within the ROI of the thermal image:
$$\bar{s}(k) = \frac{1}{n} \sum_{(i, j) \in N} s(i, j, k), \qquad (9)$$
where $s(i, j, k)$ is the pixel intensity of the thermal image at pixel $(i, j)$ and video frame $k$; $N$ is the vector of pixel coordinates in the ROI; and $n$ is the number of its elements. The raw pixel intensities of the ROIs in all frames are smoothed by a moving average filter with a data span of 5, which eliminates abrupt changes in the breathing waveform. The breathing rate and pattern as a function of video frame can therefore be estimated from the smoothed intensity data by counting the obvious peaks. Considering both the speed and the purposes of the analysis, it is not necessary to carry out the unit conversion to make the breathing signature a function of measurement time.
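A minimal sketch of this signal extraction chain is given below: the ROI mean per frame, a moving average with a span of 5, and peak counting. The peak criterion (a local maximum lying above the signal mean by `min_height`) is an assumption, since the text only speaks of "obvious peaks"; the conversion to breaths per minute assumes the 30 frames per second stated for the recordings. All function names are hypothetical.

```python
import numpy as np

def roi_mean(frame, mask):
    """Average thermal intensity over the ROI pixels selected by a boolean mask."""
    return float(np.asarray(frame, dtype=float)[mask].mean())

def moving_average(signal, span=5):
    """Centered moving average; the ends are padded by repeating the edge values."""
    signal = np.asarray(signal, dtype=float)
    half = span // 2
    padded = np.pad(signal, half, mode="edge")
    return np.convolve(padded, np.ones(span) / span, mode="valid")

def count_peaks(signal, min_height=0.0):
    """Count local maxima that exceed the signal mean by more than min_height."""
    s = np.asarray(signal, dtype=float)
    level = s.mean() + min_height
    peaks = 0
    for i in range(1, len(s) - 1):
        if s[i] > s[i - 1] and s[i] >= s[i + 1] and s[i] > level:
            peaks += 1
    return peaks

def breaths_per_minute(n_peaks, n_frames, fps=30.0):
    """Convert a peak count over n_frames into breaths per minute."""
    return n_peaks * 60.0 * fps / n_frames
```

On noisy recordings the `min_height` value would need tuning, because small ripples that survive the smoothing would otherwise be counted as breaths.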

Validation of Proposed Method
To evaluate the performance of the cascade classifier using the Viola-Jones algorithm in tandem with the biological characteristic screening for cross-spectrum face and facial tissue detection, a total of 66 image pairs across the two domains, collected from 11 volunteers aged 23 to 28 under various breathing conditions, were used in the validation experiment, in which the detection accuracy was compared with that of the custom cascade classifier.
A database of thermal and visible dual-mode videos is constructed to quantitatively and qualitatively verify the proposed cross-modal breathing measurement method. The videos were captured with the frame rate of 30 frames per second under the uncontrolled illumination and room temperature. All volunteers involved in the experiments consented to be subjects, and were instructed to breathe using the nose and mouth simultaneously, alternately, or individually. Moreover, in the current work, the different breathing situations, such as the translations and rotations of body and the variations of facial expression when laughing, yawning, and speaking, were allowed (and even encouraged) during the experiments to guarantee that there are larger variations in the obtained videos. The distance between the volunteer and cameras is about 150 cm.
For the quantitative validation, the volunteers were asked to breathe at their own pace for 1 min, and this procedure was repeated six times for each volunteer. At the same time, the reference breathing rate was recorded by two dedicated and qualified human observers. The Bland-Altman plot 36,37 and linear correlation analysis were used to check the effectiveness of our approach.
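The two agreement statistics can be computed as in the sketch below. It assumes the conventional Bland-Altman definition, in which the limits of agreement are the mean difference plus or minus 1.96 sample standard deviations, and Pearson's coefficient for the linear correlation; the function names are hypothetical.

```python
import numpy as np

def bland_altman(estimated, reference):
    """Return (mean difference, lower limit, upper limit) of agreement."""
    diff = np.asarray(estimated, dtype=float) - np.asarray(reference, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)                      # sample standard deviation
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

def pearson_r(estimated, reference):
    """Pearson linear correlation coefficient between two series."""
    return float(np.corrcoef(estimated, reference)[0, 1])
```

Each pair of inputs would hold the camera-derived and observer-counted breathing rates, in bpm, for the same one-minute recordings.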
In terms of qualitative testing, the volunteers were required to complete two intended breathing sequences: (I) eupnea (normal breathing) and tachypnea, followed by apnea and Kussmaul breathing (deep breathing); and (II) Cheyne-Stokes respiration, 38 which is an abnormal periodic breathing pattern of progressively deeper and sometimes faster breathing, followed by a gradual decrease and temporary apnea (see the bottom of Fig. 8). The breathing sequences extracted using the dual-mode imaging system were visually compared with the intended breathing patterns.

Detection Accuracy of Face and Facial Tissue
Prior to the breathing signature extraction, the locations of the face and facial tissues must be determined in the RGB and thermal images. We introduced the biological characteristics into the classification framework, aiming to attain higher accuracy than the custom cascade classifier using the Viola-Jones algorithm. The detection accuracies of the face, nose, and mouth are shown in Fig. 3. The screening based on biological characteristics raised the detection accuracies of the nose and mouth from 47.69% to 95.38% and from 0% to 84.62%, respectively. For breathing measurement, previous studies on detecting the face and its tissues usually applied image processing and analysis to thermal images directly. Pereira et al. segmented the thermal image by the multilevel Otsu's method and identified the largest of the remaining regions in the binary image as the real face region. 12 They also used human anatomy and physiology, limiting the nose search window to the hottest region, to locate the nose. However, this algorithm fails when other large or hot objects appear in the thermal image. Deepika et al. applied a thresholding operation to the green component of thermal images to extract the nose region. 39 This method cannot work if the subject breathes through the mouth. Despite its advantages over single-mode imaging, the visible-thermal imaging system has scarcely been reported for the recognition of the face and facial tissues in breathing estimation applications, 28,40 due in part to the relatively complicated imaging architecture and data processing. Although more effective and efficient approaches exist for finding the face region in the face detection domain, 41,42 the proposed face and facial tissue detection method achieves acceptable accuracy for breathing measurement using triple coordinate calculation operations based on the traditional algorithm, thus meeting the objectives of the current study.
Consequently, considering the results and discussion above, we conclude that the visible and thermal dual-mode imaging framework and the related algorithm in this study offer an alternative or complementary solution for face, nose, and mouth detection in breathing research.
Figure 4 shows the breathing signature processing interface for the visible and thermal dual-mode imaging system. This screenshot is one frame extracted from the short video series, which was intended to illustrate the robustness of our system and the corresponding algorithms. As shown in the video, the imaging system at visible and long-wave infrared wavelengths, together with the proposed object detection and tracking algorithms, was capable of following the ROIs regardless of the actions of other persons, such as walking into the field of view (Fig. 4). In addition, the system was robust against translations and rotations of the body (e.g., the head) within an angle of 90 deg in the field of view, as well as abrupt physiological variations (e.g., yawning, swallowing, and speaking). Hence, with the help of the RGB camera, the thermal imaging-based breathing measurement device can detect and track the nose and mouth accurately, thus largely avoiding erroneous measurement of the breathing signature.

Validation of Breathing Rate Measurement
To test the performance of breathing measurement with the dual-mode imaging technique in a contactless and unobtrusive manner, statistical analysis approaches, viz., linear correlation analysis and the Bland-Altman plot, were used to validate the data from the small-scale pilot experiment. The scatter plot and regression line of the estimated and reference breathing rate (BR) are shown in Fig. 5. Most of the scatter points were close to the line of perfect match (slope = 1) and within the 95% confidence intervals. Linear correlation analysis revealed a strong relationship ($R^2 = 0.971$) between the simultaneously acquired measured and reference BR over the range from 9 to 42 bpm, indicating that the proposed method is acceptable for BR estimation.
The corresponding Bland-Altman plot for the two techniques is shown in Fig. 6, with a mean difference of −0.304 bpm and limits of agreement of −2.998 and 2.391 bpm. The majority of points were dispersed around the line of perfect agreement (BR difference = 0), and 12 points lay approximately on this line and were considered fully consistent. One point (magenta) fell outside the upper limit of agreement, with an offset of about 0.6 bpm. By checking the original video, we inferred that this was probably because the subject made very significant, frequent, and irregular body motions during the experiment. This is also illustrated in Video 3. Three points from two subjects (one in dark yellow and the others in red) fell approximately on the lower limit of agreement, perhaps because of the alternating use of mouth and nose and changes of facial expression when breathing. The distribution of points in the Bland-Altman plot in Fig. 6 was to a great extent similar to that of the scatter points in Fig. 5. In general, the Bland-Altman plot demonstrated the feasibility of using the visible and thermal dual-mode imaging system in tandem with the proposed algorithm for contactless and unobtrusive BR estimation.
A group of investigators measured the breathing rate using dual RGB cameras installed in a smartphone. 17,43 They extracted the BR from the recorded chest movement signals, and the lower and upper limits of agreement were −0.850 and 0.802 bpm, respectively. In other research, limits of agreement between −1.4 and 1.3 bpm were obtained from thermal image sequences, but they widened to −3.7 and 3.9 bpm when the subjects followed a complex breathing profile. 12 Manually defined ROIs in RGB and infrared images were selected for prompt infection screening at airports, 8 and the limits of agreement varied between −1.0 and 0.9 bpm for the measurement of breathing rate. Compared with the published literature, although our limits of agreement covered a relatively wider range, the dual-mode imaging system proved to be more immune to various disturbances in BR extraction owing to the added camera operating at visible wavelengths. Note that the imaging system coupled with the proposed algorithm can minimize BR measurement mistakes but cannot eliminate the errors caused by a variety of uncontrolled variations.

Validation of Breathing Pattern Measurement
The breathing pattern sequences (I) of three subjects, corresponding to nose-dominated, mouth-dominated, and combined nose and mouth breathing, are shown in Fig. 7. According to the labels in Fig. 7, the waveforms from the dual-mode imaging system successfully reproduced the predefined breathing sequences, containing eupnea, tachypnea, apnea, and Kussmaul breathing patterns. Some noise events in the extracted signature distorted the waveforms, which might in turn cause misclassification of breathing patterns. These unwanted signatures were mainly attributable to alternating between mouth and nose breathing; for example, in the case of nose-dominated breathing [Fig. 7(a)], the waveform in the eupnea phase was largely affected by occasional open-mouth breathing. Nevertheless, we could still correctly classify the different breathing patterns in the obtained sequences.
For the sake of further validating the reliability of the proposed method to identify the breathing pattern, the more complex breathing sequence (II), called Cheyne-Stokes respiration, was applied in this study. Figure 8 exhibits the Cheyne-Stokes respiration sequences of two subjects and the corresponding standard profile. In Fig. 8, the top and middle subplots are the breathing sequences of two subjects breathing through nose and mouth simultaneously and mouth primarily, respectively. Overall, the Cheyne-Stokes breathing sequences obtained using the visible and thermal dual-mode imaging system were basically consistent with the standard profile.

Failure Measurement Case Analysis
The validation experiments demonstrated the robust performance of our dual-mode imaging system for breathing rate and pattern measurements. Nonetheless, the system fails when, for example, the tracked points of the ROI are completely obscured. Figure 9 displays two failed breathing measurement cases resulting from the loss of the tracked points. In Fig. 9(a), the subject pushed his glasses during the experiment, which caused the cross-spectrum ROI tracking to fail. In the second case, the targeted points were lost because of out-of-plane movement. Future work should improve the algorithm so that the measurement can continue after the tracked points are lost.

Conclusion
A dual-mode imaging system operating at visible and long-wave infrared wavelengths can be used as a noncontact and unobtrusive tool to estimate breathing rate and pattern, as an alternative to conventional methods. The addition of RGB images allowed more accurate and faster detection and tracking of the face and facial tissues in thermal images. Moreover, integrating the biological characteristics into the custom cascade classifier using the Viola-Jones algorithm yielded superior classifiers for detecting the face, nose, and mouth, with classification accuracies of 98.46%, 95.38%, and 84.62%, respectively. For breathing rate estimation, the results derived from the dual-mode images agreed with those measured by the reference method, regardless of whether the subjects breathed through the nose and mouth simultaneously, alternately, or individually. Taking open-mouth breathing into account made the system highly adaptable for home care and clinical applications. Through visual comparison, the different breathing patterns could be clearly revealed by the extracted pixel intensities of the thermal images. Apart from situations that require recovering the ROIs, the proposed system proved to be robust against challenging conditions such as significant positional and abrupt physiological variations.

Disclosures
Dr. Hu, Dr. Zhai, Mr. Li, Mr. Fan, Mr. Chen, and Dr. Yang have nothing to disclose. The authors have no relevant financial interests in the manuscript and no other potential conflicts of interest to disclose.