Correlation is a robust and efficient pattern recognition technique that has been widely studied in the literature.1 Correlation-based discrimination measures the degree of similarity between a target image (unknown image to be recognized) and a reference image (belonging to a known database). Most researchers implement the correlation technique with one of two families of optical architectures: the VanderLugt correlator (VLC)2 and the joint transform correlator (JTC).3 Unlike the JTC, the VLC requires the implementation of a complex-valued filter. The JTC also avoids the precise alignment that the VLC requires in the Fourier domain between the filter and the target image spectrum. Moreover, the reference images can be updated in near real time.
In this paper, we use the JTC-based correlation architecture to realize a near real-time face tracking system. The proposed system must be robust to changes in in-plane rotation of the target, while ensuring accurate discrimination between multiple targets. The main objective of this paper is to accurately recognize and track the desired target, such as the face of a person in a given scene or underwater mines in video sequences.
To enhance the performance of the proposed tracking system, we used an optimized version of the JTC architecture, namely the fringe-adjusted JTC (FJTC).4 The FJTC provides excellent correlation discrimination, but its robustness may not be sufficient for some applications. Consequently, we added a nonlinearity function to the fringe-adjusted JTC architecture to enhance both robustness and discrimination capability. In addition, special attention is given to generating an optimized correlation output by applying a decision criterion in the correlation plane. The peak to correlation energy (PCE) criterion is not very suitable for automatic decision making.5,6 The presence of the autocorrelation peak and of two cross-correlation peaks can perturb the decision criterion. To address this limitation, we used an adaptive FJTC architecture to suppress the zero-order autocorrelation peak. Since the correlation plane is symmetric, we work on one half of the correlation plane and modify the criterion to obtain values in the interval [0, 1]. Test results are presented to verify the effectiveness of the proposed technique.
The JTC technique is one of the most widely used correlation methods reported in the literature.1 This technique has been adapted and used in many recognition and/or tracking applications.1–4,7,8 In this work, we have used the JTC method for the very demanding application of underwater mine recognition. This application requires an adaptive JTC method to overcome the various problems caused by the specific conditions of underwater imaging. We begin this section by reviewing the principles of several main JTC architectures. The principle of the JTC correlation architecture has been described in detail in the literature.1–4,7,8 In the classical JTC, the known reference image and the unknown input scene (image to be recognized) are displayed side-by-side in the input plane of the correlator, separated by a fixed distance along one axis, to form the input joint image. After Fourier transforming the input plane, we can record the intensity, or joint power spectrum (JPS), using a square-law device such as a CCD detector array. In a classical JTC, an inverse Fourier transform of the JPS yields the correlation output in the output plane.3 This correlation output contains two important terms: a central autocorrelation peak, and two cross-correlation peaks corresponding to the cross-correlation between the unknown input scene and the known reference image. These two cross-correlation peaks are located on either side of the central zero-order peak, at a distance set by the separation of the two images in the input plane.
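The classical JTC chain described above can be simulated numerically. The following is a minimal sketch, assuming a digital simulation with NumPy FFTs in place of the optical setup; the function names, the zero padding, and the random test image are illustrative choices and are not taken from the paper.

```python
import numpy as np

def joint_input(reference, target, gap, pad):
    """Reference on the left, target on the right, `gap` empty columns between
    them, and `pad` zeros on every side to avoid wrap-around (assumed layout)."""
    h, w = reference.shape
    plane = np.zeros((h + 2 * pad, 2 * w + gap + 2 * pad))
    plane[pad:pad + h, pad:pad + w] = reference
    plane[pad:pad + h, pad + w + gap:pad + 2 * w + gap] = target
    return plane

def joint_power_spectrum(plane):
    """Intensity of the Fourier transform, as a square-law detector records it."""
    return np.abs(np.fft.fft2(plane)) ** 2

def correlation_plane(jps):
    """Inverse Fourier transform of the JPS, shifted so zero displacement is central."""
    return np.abs(np.fft.fftshift(np.fft.ifft2(jps)))

rng = np.random.default_rng(0)
reference = rng.random((16, 16))
plane = joint_input(reference, reference, gap=16, pad=16)  # perfect match
corr = correlation_plane(joint_power_spectrum(plane))

centre = (corr.shape[0] // 2, corr.shape[1] // 2)
separation = 16 + 16  # image width + gap = centre-to-centre distance
left = corr[centre[0], centre[1] - separation]
right = corr[centre[0], centre[1] + separation]
print(corr[centre], left, right)
```

In this sketch the zero-order peak sits at the centre of the output plane and the two cross-correlation peaks appear symmetrically on either side of it, offset by the centre-to-centre distance of the two input images.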
The classical JTC provides broad correlation peaks and strong side lobes, which can complicate decision making. Several JTC architectures have been developed to solve this problem, such as the binary JTC, nonzero-order JTC, and fringe-adjusted JTC.4,7,9 In a binary JTC,7 the JPS is binarized using a suitable threshold before applying the inverse Fourier transform step. In a nonlinear JTC, a nonlinearity of degree k is applied to the JPS instead, and the user sets the value of k depending on the application. If k is set to 1, we obtain the classical JTC. The nonlinear JTC provides superior discrimination compared to the classical JTC. It is more efficient when the value of k is lower than 0.5.10,11
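The two modifications of the JPS can be sketched as follows. This is a hedged illustration only: the median threshold, the +1/-1 output levels, and the power-law form of the nonlinearity are assumptions commonly made for binary and kth-law JTCs, chosen here because they reproduce the classical JTC when k = 1; the paper's own definitions are not reproduced in the text.

```python
import numpy as np

def binarize_jps(jps, threshold=None):
    """Binary JTC sketch: +1 where the JPS reaches the threshold, -1 elsewhere.
    The median default is an assumption, not the paper's threshold."""
    if threshold is None:
        threshold = np.median(jps)
    return np.where(jps >= threshold, 1.0, -1.0)

def kth_law_jps(jps, k):
    """Nonlinear JTC sketch: raise the non-negative JPS to the power k.
    k = 1 leaves the JPS unchanged (classical JTC)."""
    return np.power(jps, k)

jps = np.array([[0.0, 1.0], [4.0, 9.0]])
print(kth_law_jps(jps, 0.5))
print(binarize_jps(jps))
```

Lower values of k compress the dynamic range of the JPS, which is one way to read the statement that the nonlinear JTC becomes more efficient for k below 0.5.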
The last optimization we present herein is based on the fringe-adjusted JTC. Alam et al.4,10,11 proposed introducing a real-valued filter, called the fringe-adjusted filter, in the Fourier plane of the correlator. However, because the FJTC architecture is based on spectral filtering, it is very sensitive to changes in noise, which are a very common problem in underwater images. Examples of noise in underwater images are presented in Fig. 1. To overcome this problem and obtain an efficient architecture, the noise function used in the spectral filter would have to be updated for each image, which may become a bottleneck for real-time applications. It is therefore necessary to optimize the method so that it is less sensitive to the noise encountered in underwater imagery. To achieve this objective, we introduced a special image preprocessing step and added nonlinearity to the JTC architecture to make it more robust, while ensuring excellent discrimination.
Another class of architecture, called the multi-object JTC or nonzero-order JTC, has been found to be very useful for enhancing the correlation discrimination.9,11 In this architecture, the reference-only and the input-scene-only power spectra are subtracted from the JPS in the Fourier domain.
In this paper, we propose and validate a nonzero-order fringe-adjusted JTC architecture (NFJTC) to ensure robust discrimination while simplifying the decision making. Because nonlinearity increases the robustness of the JTC architecture, we also applied a special nonlinear function to the NFJTC.
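A fringe-adjusted filter can be sketched numerically as below. The form B / (A + |R|^2), with R the Fourier transform of the reference and A, B constants, is the form commonly attributed to Alam and Karim; it is assumed here because the paper's own expression is not reproduced in the text, and the constants `a` and `b` are hypothetical values.

```python
import numpy as np

def fringe_adjusted_filter(reference_plane, a=1e-3, b=1.0):
    """Real-valued filter b / (a + |R|^2); `a` avoids division by zero and
    limits the gain (assumed form and constants)."""
    ref_power = np.abs(np.fft.fft2(reference_plane)) ** 2
    return b / (a + ref_power)

def fringe_adjusted_jps(jps, faf):
    """Multiply the JPS by the filter before the inverse Fourier transform."""
    return faf * jps

rng = np.random.default_rng(1)
# Reference placed in a plane of the same size as the joint input plane.
reference_plane = np.zeros((32, 64))
reference_plane[8:24, 8:24] = rng.random((16, 16))
faf = fringe_adjusted_filter(reference_plane)
print(faf.shape, faf.min(), faf.max())
```

Because the filter depends only on the reference spectrum and on the constants, any noise term included in it must be re-estimated whenever the noise changes, which is the sensitivity discussed above.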
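The subtraction described above can be illustrated with a short simulation. This is a minimal sketch assuming digital FFTs and a hypothetical layout in which the reference and the target occupy disjoint regions of equally sized planes; it is not the authors' implementation.

```python
import numpy as np

def nonzero_order_jps(reference_plane, target_plane):
    """JPS of the joint image minus the reference-only and target-only power spectra."""
    ref_spec = np.fft.fft2(reference_plane)
    tgt_spec = np.fft.fft2(target_plane)
    jps = np.abs(ref_spec + tgt_spec) ** 2  # spectrum of the joint input
    return jps - np.abs(ref_spec) ** 2 - np.abs(tgt_spec) ** 2

rng = np.random.default_rng(2)
patch = rng.random((16, 16))
reference_plane = np.zeros((32, 96))
reference_plane[8:24, 8:24] = patch      # reference on the left
target_plane = np.zeros((32, 96))
target_plane[8:24, 40:56] = patch        # identical target, 32 columns to the right

corr = np.abs(np.fft.fftshift(np.fft.ifft2(
    nonzero_order_jps(reference_plane, target_plane))))
print(corr[16, 48], corr[16, 16], corr[16, 80])
```

In this sketch the central value of the correlation plane vanishes up to rounding error, while the two cross-correlation peaks are preserved, so only the terms useful for the decision remain.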
To investigate the performance of the proposed technique, let us consider two practical applications. The first application involves recognizing and tracking a human face that appears at different rotation angles. The system proposed in this paper allows robust tracking and discrimination of a face turning in different rotation angles ranging from to . Facial images contain information as well as noise. In general, the noise is known and is quite similar from one image to another in the sequence. Once our system was validated, we applied it to underwater mine detection and recognition.12 The main problem in this application is the poor visibility, which can be limited to only a few meters or less in turbid waters.
Challenges of Underwater Imaging
A generic underwater image formation model is shown in Fig. 2.13 In Fig. 2, light source (1) transmits light (2) towards the object (6). On the trajectory to and from the object, light is intercepted by undesired underwater particles (3), and light rays (2), (5), (7) and (9) become exponentially dimmer due to direct attenuation. Part of the light is also reflected towards the camera by medium (3), resulting in the backscattered component (4), which corresponds to the image of the water volume. Finally, some light is slightly deviated by medium (3) on its way back to the camera, resulting in the forward scattered component (8) which adds blurring to the image. Underwater images are very noisy images. Moreover, for underwater imaging, this noise changes from one image to another. This limits the efficiency of image recording medium and requires a noise model to reduce its impact. Detection and recognition can be significantly perturbed by this noise component.
To address this challenge, we introduced a three-stage preprocessing step as shown in Fig. 3. The first stage removes the moiré effect, which is generated by the use of analog cameras. It is modeled by a sine function following the procedure outlined in Ref. 14. Weak contrast often reduces the efficiency of recognition methods. To improve the underwater image contrast, we adapted the preprocessing technique presented in Ref. 15. The overall algorithm involved in the implementation of this second stage is summarized in Fig. 4. The output image with enhanced contrast is calculated as described in Ref. 15. Finally, in the third stage, we calculate the phase image and reduce noise with a band-pass filter created with a Gaussian wavelet, as depicted in Fig. 5.12
To investigate the robustness of the proposed technique, we selected several existing algorithms and compared their performance using three images, as shown in Figs. 6 to 8. The first two images come from one video sequence acquired by a diver and represent an upside-down Manta mine (MN 103). These two image frames are close together in the video (8 frames apart), so the mine-like characteristics did not change significantly. These two images were used to test the robustness of the proposed and alternate architectures: since they were shot within a short time frame and are similar, the output of the filter should also be similar. The discrimination of our system is tested by applying the JTC architectures to a mine image and the Lena image, as shown in Fig. 8.
At first, we compared the classical JTC and the nonzero-order JTC on the Manta mine image shown in Fig. 6. We used image frame 1082 of the test video as both the reference image and the target image; the results are shown in Fig. 9. From Fig. 9, it is evident that the suppression of the autocorrelation peak simplifies the decision making. Moreover, this suppression reduces the effect of noise in the correlation plane.
Next, we investigated the performance of the nonzero-order JTC for the above-mentioned images shown in Figs. 6 to 8. We used the PCE as the decision criterion in the nonzero-order JTC correlation plane;5 the results are shown in Fig. 10.
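For orientation, a PCE-type criterion can be computed as in the sketch below. It assumes the usual reading of the PCE as the energy contained in the correlation peak divided by the total energy of the correlation plane; the size of the window taken around the peak (`half_width`) is a hypothetical parameter, and the exact definition used in Ref. 5 may differ.

```python
import numpy as np

def pce(corr, half_width=1):
    """Energy in a small window around the highest peak over the total energy."""
    energy = np.abs(corr) ** 2
    r, c = np.unravel_index(np.argmax(energy), energy.shape)
    window = energy[max(r - half_width, 0):r + half_width + 1,
                    max(c - half_width, 0):c + half_width + 1]
    return window.sum() / energy.sum()

sharp = np.zeros((9, 9))
sharp[4, 4] = 1.0          # a single sharp correlation peak
flat = np.ones((9, 9))     # no distinguishable peak
print(pce(sharp), pce(flat))
```

A sharp, isolated peak gives a value near 1, while a plane whose energy is spread out gives a value near 0, which is why close PCE values between a true match and a mismatch indicate weak discrimination.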
Figure 9 shows that the nonzero-order JTC architecture yields helpful results for decision making. Figure 10 shows that the fringe adjusted JTC architecture observes these constraints on PCE values better than the binary JTC architecture. The nonlinear JTC architecture is a robust one, such that, PCE values are close between Figs. 9 and 10 as shown in the left columns. However, it does not provide robust discrimination as shown in Fig. 10. PCE values are close for the cases of (i) two mines on the left, and (ii) one mine and the Lena.
In this paper, our idea is to combine the attractive features of the fringe-adjusted JTC and the nonlinear JTC in order to obtain a robust architecture for correlation discrimination. In the proposed architecture, called nonlinear fringe-adjusted JTC (NNJTC), the JPS is modified to obtain the nonlinear fringe-adjusted JPS. The mathematical steps needed to obtain this nonlinear fringe-adjusted JPS are summarized in Eq. (8).
Binary JTC and Fringe-Adjusted JTC Performance Comparison
To compare the performance of nonzero-order binary JTC and the nonzero-order fringe-adjusted JTC, we consider the Lena image, Fig. 8, as the known reference image as well as the unknown target image, which is a scenario of perfect match between the reference and the target. To mimic the effects of underwater imaging, we modified the target image with underwater-like noise. At first, we added blur as if the image is perturbed by forward scattered light. Then we added nonuniform luminosity, which is a problem when the underwater robot uses artificial light. The next step was to combine the effects of blur and nonuniform light into the final target image. For the last test, we manually removed the background of the reference image. The target image is the Lena image corrupted with blur and nonuniform light.
The correlation output results are presented in Table 1. It is evident that the nonzero-order fringe-adjusted JTC yields better results than the binary JTC. The FJTC is robust against different types of noise, such as the underwater noise depicted in Table 1. According to these results, the FJTC can provide better results than the binary JTC for underwater videos.
Table 1 Comparison of the binary JTC and the fringe-adjusted JTC on images with underwater noise. (a) Lena-Lena. (b) Lena-blurred Lena. (c) Lena-Lena with nonuniform light. (d) Lena-Lena with blur and nonuniform light. (e) Modified Lena-Lena with blur and nonuniform light.
| Reference and target images | Correlation plane, binary JTC | Correlation plane, fringe-adjusted JTC |
At first, we applied the fringe-adjusted JTC to a face recognition and tracking application. The objective is to detect and track a person via face recognition in a video sequence, as shown in Fig. 11. Figure 12 presents the results obtained with the fringe-adjusted JTC when only one reference image is used to construct the fringe-adjusted filter. It should be noted that this reference is the face of the subject shown in Fig. 11 without rotation. The reference image is introduced on one side of the input plane. The background is defined as the information outside a square around the face in the reference image considered. Then we display the target image on the other side of the input plane. The target faces are obtained from the database presented in Fig. 11, one at a time starting from to in a sequential manner.16
We initialize the system with a known reference image, such as the face of the subject to be tracked (number 1), so that the initial position of the subject's face is known. By comparing this reference face with the target image, we ascertain the presence or absence of the subject (number 1) in the target image and the position of the subject's face in it. Thereafter, only the information around the position of the face is selected in this target image. To achieve this, we multiply the target image by a filter equal to 1 around the position of the detected face and 0 elsewhere. Afterwards, we introduce the preprocessed image as the reference image in the fringe-adjusted JTC input plane. Then a new target image, subject number 2, is introduced in the input plane, and the process is repeated until all images from the database are processed. After various tests performed with this algorithm, we identified a tolerance rotation angle of 15 deg between the target image and the reference image within which the correlation remains robust. Beyond this value, the correlation becomes less robust and the system may lose track of the subject's face. With a tolerance angle equal to 15 deg, the tracking problem of a person, with in-plane rotation between and , consists of correlating 13 pairs of images: , and .
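The reference-update step described above (keep only the information around the detected face) can be sketched as follows. The square window, its `half_size`, and the function name are hypothetical choices made for illustration; the paper does not specify the shape or size of the region.

```python
import numpy as np

def keep_face_region(target, centre, half_size):
    """Multiply the target by a mask equal to 1 in a square around `centre`
    and 0 elsewhere; the result serves as the next reference image."""
    r, c = centre
    mask = np.zeros(target.shape)
    mask[max(r - half_size, 0):r + half_size + 1,
         max(c - half_size, 0):c + half_size + 1] = 1.0
    return target * mask

target = np.arange(49, dtype=float).reshape(7, 7)   # stand-in for a target frame
next_reference = keep_face_region(target, centre=(3, 3), half_size=1)
print(next_reference)
```

In a tracking loop, `centre` would be the face position estimated from the correlation peak of the current reference/target pair.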
The tracking results for subject number 1 are shown in Fig. 12 (blue curve). The PCE criterion was used for decision making in the correlation plane.5 Figure 12 also shows the results for subject number 2, an undesirable subject present in the input scene (false alarm: red curve). The results depicted in Fig. 12 show that the proposed fringe-adjusted JTC yields good results. However, in some cases, the values for the two subjects are too close for the system to be considered robust. A robust system is one where the recognition values are close to 1. A discriminating system is one where the false alarm values are close to 0, with a significant difference between true and false alarms.
To optimize the performance of the fringe-adjusted JTC system, we propose to utilize an adaptive fringe-adjusted filter in the Fourier plane.17 In this technique, the background noise is calculated for each reference. For this purpose, after finding the position of the head in the target image at time t, we multiply the target image by a filter that has a value of 0 around the position of the face and 1 elsewhere. One can also replace the 0 value with the average of the values surrounding the zeroed zone.
The target image obtained at time t becomes the reference image at time t + 1 in the proposed tracking system. Thereafter, we introduce the preprocessed image as the background noise in the fringe-adjusted filter formulation. The results obtained after the first iteration are shown in Fig. 13, which clearly shows that the adaptive fringe-adjusted JTC improves discrimination performance considerably compared to the first case. Figure 13 shows a system with better robustness and discrimination, but we believe that a different system and criterion can increase both further.
To further increase the performance of the proposed system, we propose to incorporate a decision criterion in the correlation plane, which involves the following optimization steps:
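The background-extraction step can be sketched in the same spirit. The square face zone, the one-pixel ring used to compute the replacement average, and all names below are assumptions made for illustration only.

```python
import numpy as np

def background_only(image, centre, half_size, fill_with_mean=False):
    """Set a square around `centre` to 0 (or to the mean of the one-pixel ring
    surrounding it) and keep the rest of the image as the background estimate."""
    r, c = centre
    rows, cols = image.shape
    r0, r1 = max(r - half_size, 0), min(r + half_size + 1, rows)
    c0, c1 = max(c - half_size, 0), min(c + half_size + 1, cols)
    out = image.astype(float).copy()
    fill = 0.0
    if fill_with_mean:
        ring = np.zeros(image.shape, dtype=bool)
        ring[max(r0 - 1, 0):min(r1 + 1, rows), max(c0 - 1, 0):min(c1 + 1, cols)] = True
        ring[r0:r1, c0:c1] = False        # keep only the border around the zone
        if ring.any():
            fill = out[ring].mean()
    out[r0:r1, c0:c1] = fill
    return out

frame = np.arange(49, dtype=float).reshape(7, 7)    # stand-in for a target frame
print(background_only(frame, (3, 3), 1))
print(background_only(frame, (3, 3), 1, fill_with_mean=True))
```

Filling the zone with a local average rather than 0 avoids introducing an artificial dark square, whose sharp edges would otherwise contribute to the estimated background spectrum.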
Step 1: Enhanced PCE decision criterion (P): we propose to take into account the correlation peak energy as well as the five highest peaks excluding the highest correlation peak. The Enhanced PCE decision (P) criterion is defined as:
Figure 14 shows the different test results obtained with our tracking system based on the optimized fringe-adjusted JTC. By comparing the results presented in Figs. 12 and 13, it is evident that the proposed technique yields better discrimination. Indeed, the difference between the two curves (red and blue) is much larger. The robustness has also improved as shown by the shorter range of values of the blue curve compared to the previous tests.
For noisy images, we searched for a robust criterion that provides values close to 1 in case of recognition and values close to 0 in case of false alarm. One idea is to compare the highest peak of the correlation plane to the other high peaks. Consequently, we remove the autocorrelation peak in order to compute a criterion from the cross-correlation peaks and noise only. Since the correlation plane is symmetric, we can work on one half of the correlation plane. Our objective is to find the maximum value, M, of the new correlation plane. We set a first threshold to the value of the correlation peak M divided by √2; this value corresponds to 3 dB, that is, to 50% of the correlation peak energy. We compute the cumulative sum (S1) of the pixels around the correlation peak with values higher than this threshold. Next, we compute the sum (S2) of all the pixels, excluding the highest correlation peak, with values higher than this threshold, as shown in Fig. 15. Finally, we introduce a distance metric computed from S1 and S2.
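The quantities M, S1 and S2 described above can be computed as in the following sketch. The window used to decide which pixels lie "around" the peak (`half_width`) is a hypothetical parameter, and the final score S1 / (S1 + S2) is only one possible normalization that falls in [0, 1]; it is an assumption, since the paper's own metric is not reproduced in the text.

```python
import numpy as np

def peak_ratio_criterion(half_plane, half_width=2):
    """Return S1, S2 and an assumed score S1 / (S1 + S2) for one half of a
    correlation plane whose zero-order peak has been removed."""
    plane = np.abs(half_plane)
    peak = plane.max()                       # M
    threshold = peak / np.sqrt(2.0)          # 3 dB below the peak
    r, c = np.unravel_index(np.argmax(plane), plane.shape)
    near_peak = np.zeros(plane.shape, dtype=bool)
    near_peak[max(r - half_width, 0):r + half_width + 1,
              max(c - half_width, 0):c + half_width + 1] = True
    above = plane > threshold
    s1 = plane[above & near_peak].sum()      # energy kept around the main peak
    s2 = plane[above & ~near_peak].sum()     # competing high peaks elsewhere
    return s1, s2, s1 / (s1 + s2)

clean = np.zeros((11, 11))
clean[5, 5] = 1.0                            # one isolated peak
cluttered = clean.copy()
cluttered[0, 0:9] = 0.9                      # nine competing peaks far away
print(peak_ratio_criterion(clean))
print(peak_ratio_criterion(cluttered))
```

With this normalization, an isolated peak gives a score of 1, and the score drops toward 0 as more competing peaks exceed the 3 dB threshold.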
We applied this criterion to the previous test and the corresponding results are shown in Fig. 16. This criterion provides recognition values close to 1 and false alarm values close to 0. Thus, the robustness and discrimination ability of the proposed technique is very high. Using the aforementioned procedure, users can set a threshold for automatic decision making.
Underwater Mine Recognition
In the previous section, we validated the proposed architecture and criteria for face recognition applications. In this section, we apply the proposed technique to underwater mine detection. As there are many images, we used composite filter techniques to fuse close references and reduce the size of the database.12 For this application, we selected five filters to recognize mines in a specific region of an underwater video sequence. As explained earlier, underwater images are perturbed by the medium and need preprocessing. Without preprocessing, correlation will not provide the desired output, since target-related contours will not stand out. The preprocessing step first resizes the image to reduce computation time. Then, we enhance the contrast of the images using the procedure described in Ref. 14.
To obtain the contour image, we chose to compute the phase of the image. In fact, edges correspond to sharp phase changes.18 To achieve this, we compute the Fourier transform of the image, which can be expressed as the product of an amplitude term and a complex exponential term. The amplitude information is contained in the first term and the phase information in the exponential function. To keep only the phase information, all values are divided by their amplitude. Next, we use a band-pass filter, constructed with a Gaussian wavelet, to reduce the remaining noise. The results of this preprocessing step are shown in Fig. 5.
From this contour image database, we constructed five composite filters for mine recognition in the mine-presence zones, as illustrated in Fig. 17. Each filter is constructed from five reference contour images. With these five composite filters and the preprocessed images, we tested the performance of the nonzero-order fringe-adjusted JTC and the nonzero-order nonlinear fringe-adjusted JTC. We set the nonlinearity factor to 0.85; the corresponding correlation outputs are shown in Table 2. From Table 2, it is clear that the nonzero-order fringe-adjusted JTC produces a high false alarm rate, while the nonzero-order nonlinear fringe-adjusted JTC provides a moderate recognition rate.
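The phase-only step followed by band-pass filtering can be sketched as below. The band-pass used here is a simple difference of two Gaussians in the frequency domain, chosen as a stand-in for the Gaussian-wavelet filter mentioned above; its widths (`low_sigma`, `high_sigma`) and the small constant `eps` are hypothetical values.

```python
import numpy as np

def phase_only_contours(image, low_sigma=0.05, high_sigma=0.25, eps=1e-12):
    """Keep only the spectral phase, apply a band-pass, and return to image space."""
    spectrum = np.fft.fft2(image)
    phase_only = spectrum / (np.abs(spectrum) + eps)   # divide by the amplitude
    fy = np.fft.fftfreq(image.shape[0])[:, None]       # cycles per pixel
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    rho2 = fx ** 2 + fy ** 2
    # Difference of Gaussians: zero at DC, attenuated at high frequencies.
    band_pass = (np.exp(-rho2 / (2 * high_sigma ** 2))
                 - np.exp(-rho2 / (2 * low_sigma ** 2)))
    return np.real(np.fft.ifft2(phase_only * band_pass))

rng = np.random.default_rng(3)
image = rng.random((32, 32))                 # stand-in for a resized video frame
contours = phase_only_contours(image)
print(contours.shape, contours.mean())
```

Dividing by the amplitude gives every spatial frequency the same weight, which emphasizes edges; the band-pass then removes the mean level and damps the high-frequency noise that the phase-only step amplifies.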
Table 2 Results of the different JTCs for mine recognition.
| Method | Recognition (%) | False alarm (%) |
| Fringe-adjusted JTC | 93.10 | 56.44 |
| Nonlinear fringe-adjusted JTC, k=0.85 | 52.07 | 16.29 |
To further enhance the decision performance by increasing the recognition rate and decreasing the false alarm rate, we added an optimization step. We assume that if five of the 10 images preceding the present target image produced a correlation, a mine should be detected in the present target image. Because the mine is fixed and the navigation parameters are known, a mine that is detected in one image frame is highly unlikely to disappear in the following frame. We chose 10 images after several tests, as a compromise between robustness and computing time. With this concept, our optimization considers only the last 10 raw decisions, coming from the last 10 images, for a given present target image. Using the last 10 nonoptimized results decreases the false alarm rate. Indeed, if we used the optimized results, the detector would never stop detecting after five detections, even if those five detections were false. With the raw data, however, detection stops five images after the last image in which a mine was detected. The test results in Table 3 show that the recognition rate has improved while the false alarm rate has decreased. Figure 17 shows the regions where a mine is supposed to be detected and the regions where the algorithm is supposed to find nothing. The results presented in Table 3 and Fig. 17 show that we increased the recognition rate to 65.52% and decreased the false alarm rate to 10.13%.
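The temporal rule described above can be sketched as a sliding-window vote over the raw per-frame decisions. The sketch assumes that the window of 10 decisions ends at the present frame (inclusive), which reproduces the stated behavior of detection stopping five images after the last raw detection; the function name and this window convention are illustrative assumptions.

```python
from collections import deque

def smooth_decisions(raw_decisions, window=10, min_hits=5):
    """Declare a mine when at least `min_hits` of the last `window` raw
    decisions are positive. Only raw decisions enter the window, so the
    detector cannot sustain itself on its own smoothed output."""
    recent = deque(maxlen=window)
    smoothed = []
    for decision in raw_decisions:
        recent.append(bool(decision))
        smoothed.append(sum(recent) >= min_hits)
    return smoothed

raw = [1] * 10 + [0] * 10           # mine visible for 10 frames, then gone
print(smooth_decisions(raw))
print(smooth_decisions([0, 0, 1, 0, 0, 0, 0, 0]))   # isolated false alarm
```

Feeding the smoothed decisions back into the window instead of the raw ones would keep the count at or above five indefinitely once it is reached, which is the failure mode the paragraph above warns against.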
Table 3 Results of the nonlinear fringe-adjusted JTC with optimization.
| Method | Recognition (%) | False alarm (%) |
| Optimized nonlinear fringe-adjusted JTC, k=0.85 | 65.52 | 10.13 |
Nevertheless, our optimized algorithm detected some objects in the last region of the video sequence, where there is no mine. This can be explained by the presence of the camera support in the target images, as shown in Fig. 18. We did not add a mask to suppress this man-made object because the camera support is absent from some images and, where present, does not always occupy the same proportion of the image, as shown in Fig. 18. However, this constraint can be avoided when recording the images.
To show the robust performance of our proposed approach, several tests were conducted and the results were compared with nonoptimized methods. To do this, we selected a video sequence where the same mine is recorded, but the distance between the viewer and the mine is shorter in this case compared to the previous case shown in Fig. 19. Results obtained, in terms of recognition and false alarm probabilities, are summarized in Table 4. The visualization of the robustness of the optimized NNJTC is shown in Fig. 20. The nonzero-order fringe adjusted JTC still provides a high false alarm rate while the nonzero-order nonlinear fringe adjusted JTC presents good recognition rate. The optimization allows an increase of the recognition rate and a decrease of the false alarm rate, when compared to the nonlinear fringe adjusted JTC.
Table 4 Robustness results of the different JTCs.
| Method | Recognition (%) | False alarm (%) | True to false alarm ratio |
| Nonlinear fringe-adjusted JTC, k=0.85 | 34.07 | 12.33 | 2.76 |
| Optimized nonlinear fringe-adjusted JTC, k=0.85 | 41.11 | 5.48 | 7.5 |
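The last column of the table above appears to be the recognition rate divided by the false alarm rate. The short check below recomputes it from the two tabulated rates; the dictionary layout and variable names are illustrative.

```python
# Recognition (%) and false alarm (%) as listed in the robustness table.
results = {
    "Nonlinear fringe-adjusted JTC, k=0.85": (34.07, 12.33),
    "Optimized nonlinear fringe-adjusted JTC, k=0.85": (41.11, 5.48),
}

# Ratio of true detections to false alarms, rounded to two decimals.
ratios = {name: round(recognition / false_alarm, 2)
          for name, (recognition, false_alarm) in results.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}")
```

Under this reading, the optimization step raises the ratio by a factor of roughly 2.7, mainly through the lower false alarm rate.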
In this paper, we proposed and validated the first version of a robust recognition and tracking system based on the fringe adjusted JTC architecture. In the proposed technique, we introduced nonlinearity to the fringe adjusted JTC, and also used an adaptive correction criterion in the correlation plane.
For the face recognition application, we showed that a rotation angle of 15 deg between the target image and the reference image does not disable the recognition ability of the system, while considerably reducing the number of correlations necessary and, therefore, increasing the processing speed of the system. Moreover, the adaptive decision criterion used in the output plane minimized the number of false correlation peaks in the correlation plane, thus reducing the false alarms. Finally, the addition of nonlinearity in the Fourier plane makes it possible to find a compromise between the robustness and the discrimination of our tracking system by adjusting the degree of nonlinearity (k). For the underwater mine recognition application, the proposed NNJTC architecture yields promising results.
This paper complements our paper "Nonlinear fringe-adjusted JTC-based face tracking using an adaptive decision criterion".19 The raw mine images used in this publication were provided by the GESMA (Groupe d’Etudes Sous Marines de l’Atlantique) under the TOPVISION project coordinated by Thales Underwater Systems SAS. This project is related to the Techno-Vision Program, which was launched by the French Ministry of Defense. Detailed information can be found on the website http://topvision.gesma.fr. This work is also partly supported by the Regional Council of Brittany.
Isabelle Leonard received the engineering diploma in optronics from the French engineering school Polytech'Paris-Sud in 2009. Since 2009, she has been a PhD student at Institut Supérieur d'Electronique et du numérique (ISEN). Her research interests deal with image processing and the underwater medium. She has published over 7 refereed journal articles or conference papers.
Ayman Alfalou received his PhD in telecommunications and signal processing from École nationale supérieure des télécommunications de Bretagne (ENSTB)-France and the University of Rennes 1 in 1999. He held a one-year post-doctoral position at ENSTB during which he designed and realized a compact and high-rate optical correlator. Since 2000, he has been a professor of telecommunications and signal processing at ISEN-Brest. At ISEN, he founded the Vision-L@bISEN. His research interests deal with optical engineering, optical information processing, signal and image processing, telecommunications and optoelectronics. He has supervised several PhD, MSc, and engineering school students. He has published over 110 refereed journal articles or conference papers and special sessions. He is a senior member of SPIE, OSA, IEEE, and a member of IoP.
Mohammad S. Alam is a professor and Chair at the University of South Alabama (USA). His research interests include ultra fast computing architectures and algorithms, image processing, pattern recognition and tracking, biometric recognition, infrared imaging systems, and smart energy management and control. He is the author or co-author of over 475 refereed journals, conference publications, or project reports, 15 book chapters, and a book on IPTV (IEC Press). He received numerous excellences in research, teaching and service awards. He served or serves as the PI or Co-PI of many research projects totaling over $14M, supported by NSF, FAA, DoE, ARO, AFOSR, SMDC, NASA, WPAFB, BP and ITT industry. He has organized and chaired many international conferences and serves as a Guest Editor for several professional journals. He is a Fellow of OSA, SPIE, IoP, IET, a Life Fellow of the Bangladesh Computer Society (BCS), and of the Institution of Engineers Bangladesh (IEB), and a Senior Member of IEEE.
Andreas Arnold-Bos graduated from École nationale supérieure de l'aéronautique et de l'espace (SUPAERO) [now Institut Supérieur de l'aéronautique et de l'espace (ISAE)], Toulouse, France, in 2004, with an engineering degree in aerospace engineering, as well as a master's degree in Signal and Image Processing. From 2004 to 2007 he worked towards a PhD thesis on radar engineering at the École Nationale Supérieure des Ingénieurs des Études et Techniques d'Armement in Brest, France. In 2007, he joined Thales Underwater Systems as a full-time R&D engineer while simultaneously pursuing his PhD thesis, which he successfully defended in 2010. His past and current interests involve radar and sonar engineering, underwater optics and autonomous systems.