Traditionally, pipeline surveillance is conducted qualitatively by aircraft, driving patrol, and walking inspection to record features along the right of way (RoW) that are important to the pipeline's safety and security. These manual techniques may not be sensitive or accurate enough to localize problems or to identify subtle ones. Considering the vast area to be monitored in sparsely populated regions, aerial monitoring is found to be the most viable option.
Rapid advances in camera and sensor technology have enabled the use of video acquisition systems to monitor the RoW of pipelines. A huge amount of data is thus made available for analysis. However, it would be very expensive to employ analysts to scan through this vast amount of wide-area imagery and identify threats to the RoW. This warrants the deployment of an automated mechanism that can detect threats to the RoW and send out warnings when threats are detected.
Machinery objects, such as construction equipment and heavy vehicles, have been major threats to pipeline infrastructure. Several vehicle detection algorithms have been proposed in the literature. Zhao and Nevatia effectively utilized the car body, the edges of the front windshield, and the shadow as features for car detection. Moon et al. introduced a simple vehicle detection algorithm that explores four elongated edge operators. A top-down matching method has been developed for vehicle detection from high-resolution aerial imagery. Grabner et al. exploited on-line boosting with an interactive training framework for automatic car detection. Sahli et al. presented an alternative approach to car detection using scale-invariant feature transform features and the affinity propagation algorithm. Recently, a three-stage pattern recognition framework was proposed to detect construction equipment in various lighting conditions and at different object orientations.
However, the majority of these techniques are either computationally expensive or suffer from the complex environments found in aerial imagery, and none of them considers the potential security risk posed by the detected objects in wide-area surveillance. Thus, we present a multistage framework for the analysis of aerial imagery that automatically detects and identifies machinery threats along the pipeline right of way, and that takes into account the constraints that come with aerial imagery, such as low resolution, low frame rate, large variations in illumination, and motion blur.
The rest of the paper is organized as follows. Section 2 provides an implementation framework of the proposed scheme. Section 3 presents and discusses the experimental results. Finally, Section 4 outlines concluding remarks and future research directions for this technology.
The proposed machinery threat detection technology comprises three phases: background elimination, part-based object detection, and risk assessment. Figure 1 depicts the flow diagram of the proposed scheme.
The aim of developing the background elimination model was to provide information about the contents of an image that can be used in the process of threat detection. Some of the key observations made in the study are: (a) aerial imagery consists of various kinds of regions, and (b) the regions can be segmented based on their information content in the image domain or in a transformed domain. It is observed that plain ground does not contain much information content, while buildings have strong edge features and trees have strong textural content. Based on these observations, an algorithm is designed to efficiently segment regions in an image.
On the other hand, when monitoring a pipeline from a small aircraft, experienced observers adaptively eliminate from their field of view most objects that are not recognized as threats, such as houses and trees, which are less likely to be a threat to the pipeline. To mimic this behavior of the human vision system, we propose an automatic background elimination algorithm that can be broken into two parts: local textural features based segmentation (LTFS) and adaptive perception based segmentation (APS). The advantages of developing an automatic background elimination technique can be summarized as follows:
• Eliminate background in aerial imagery for a faster threat identification.
• Extract semantic information from scenes that can aid in threat detection.
• Utilize context cues to identify proper landmarks for better accuracy during change detection processes.
• Gather intelligence from a scene to aid in decision making for users.
Image segmentation plays an important role in enhancing the object detection rate. We herein introduce a new segmentation method, named LTFS, which is expected to improve both the accuracy and the efficiency of our present threat detection algorithm. The LTFS is based on the properties of the neighborhood around every pixel within an image. The output of the LTFS contains only the prominent information of the input image, such as abnormal regions or fully connected inhomogeneous areas.
The concept of the proposed algorithm is illustrated in Fig. 2. Let P be a point on the edge of an object in an image. The edge separates the neighbor pixels around P into two classes, and each class has the same intensity value, represented by the two colors in Fig. 2. Thus, the average intensity of all the neighbor pixels will be larger than the intensity of one group of pixels and smaller than that of the other group. If the neighbor pixels are thresholded by the average intensity, one group will become 1 and the other group will become 0, so that the binary sequence formed around P by the thresholded pixels will be a uniform pattern in the circular direction, that is, a circular sequence with at most two transitions between 0 and 1.
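For a given image, let $I_p$ ($p = 1, 2, \ldots, 8$) be the intensity value of the $p$th neighbor pixel in a 3 × 3 neighborhood. Then the average intensity of the neighbor pixels is computed by

$$I_{ave} = \frac{1}{8}\sum_{p=1}^{8} I_p \qquad (1)$$

Each neighbor pixel is then thresholded by this average: if $I_p > I_{ave}$ the pixel is set to 1, otherwise it is set to 0, expressed as

$$I^{new}_p = \begin{cases} 1, & I_p > I_{ave} \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$

Thus, the thresholded neighbor pixel values are 0 and/or 1. Let $I^{new}_p$ be the new value of the $p$th neighbor pixel (1 or 0); then the new value of the center pixel is formed by concatenating all the neighbor pixels in circular order, expressed by

$$I^{new}_c = I^{new}_1 I^{new}_2 \cdots I^{new}_8 \qquad (3)$$

If $I^{new}_c$ is a uniform pattern (except 00000000 and 11111111), then $I^{new}_c = 1$; otherwise, $I^{new}_c = 0$. The last step of the LTFS is to perform a morphological operation to remove imperfections in the binary image.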
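The thresholding and uniform-pattern test described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `ltfs_mask`, the clockwise neighbor ordering, and the edge-replicating border handling are choices made here, and the final morphological clean-up step is omitted. A circular binary sequence that is neither all 0 nor all 1 is uniform exactly when it contains two 0/1 transitions, which is the condition the sketch checks.

```python
import numpy as np

# Offsets (row, col) of the 8 neighbors in clockwise circular order.
# The starting point and direction are assumptions; any circular order works.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def ltfs_mask(image):
    """Return a binary map that is 1 where the thresholded 3x3 neighborhood
    forms a uniform circular pattern other than 00000000 or 11111111."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    # Replicate border pixels so every pixel has 8 neighbors (assumption).
    padded = np.pad(img, 1, mode="edge")
    # Stack the 8 shifted copies of the image: shape (8, h, w).
    neigh = np.stack(
        [padded[1 + dr:1 + dr + h, 1 + dc:1 + dc + w] for dr, dc in OFFSETS]
    )
    # Threshold each neighbor by the average of the 8 neighbors.
    bits = neigh > neigh.mean(axis=0)
    # Count 0/1 transitions between circularly adjacent neighbors.
    transitions = (bits != np.roll(bits, -1, axis=0)).sum(axis=0)
    # Exactly two transitions: uniform and not constant.
    return (transitions == 2).astype(np.uint8)
```

On an ideal vertical step edge, the sketch marks only the two pixel columns that touch the edge, which matches the behavior described for a point P lying on an object boundary.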
Even though the threat objects are not single-intensity objects, the intensity levels of most of their pixels differ only slightly. Therefore, if the difference between the average intensity level and a neighbor pixel's intensity level is within a small range, we treat the neighbor pixels as having one intensity level. As shown in Fig. 3, the majority of the background is eliminated and only some protruding regions remain, which indicate the possible location of the target.
The LTFS method provides global background elimination. However, it cannot be trained to eliminate specific regions in a given image. It is therefore necessary to develop a more advanced algorithm for semantic segmentation. Thus, we propose the APS, an artificial neural network based segmentation algorithm that can be trained to segment out specific objects from images.
The idea of the APS model comes from key observations in aerial imagery. One of the main observations during data analysis is that most regions in the image do not contain much information. There are also many regions where the probability of finding threats is considerably low. In order to reduce the computational load on the object detection, we design a framework to segment out non-salient regions of an input image. The complete architecture of the algorithm is shown in Fig. 4. A test image is divided into segments that are passed through the trained model to detect the presence of an object. In Fig. 4, the local phase and local contrast are contextual features computed from the monogenic signal, expressed separately by

$$\varphi(\mathbf{x}) = \arctan\left(\frac{\sqrt{f_1^2(\mathbf{x}) + f_2^2(\mathbf{x})}}{f(\mathbf{x})}\right) \qquad (4)$$

$$A(\mathbf{x}) = \sqrt{f^2(\mathbf{x}) + f_1^2(\mathbf{x}) + f_2^2(\mathbf{x})} \qquad (5)$$

where $\varphi(\mathbf{x})$ represents the local phase, and $A(\mathbf{x})$ is the local amplitude. To obtain $f(\mathbf{x})$, $f_1(\mathbf{x})$ and $f_2(\mathbf{x})$, we assume that an image $S(\mathbf{x})$ is represented by
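where $\mathbf{x} = (x, y)$ denotes the spatial coordinates of the signal $S$. If $S$ is convolved with the even and odd spherical quadrature filters (SQFs), as shown in Eqs. (7), (8) and (9), we obtain the components of the monogenic signal representation $(f(\mathbf{x}), f_1(\mathbf{x}), f_2(\mathbf{x}))$.

$$f(\mathbf{x}) = S(\mathbf{x}) * g_e(\mathbf{x}) \qquad (7)$$

$$f_1(\mathbf{x}) = S(\mathbf{x}) * g_{o1}(\mathbf{x}) \qquad (8)$$

$$f_2(\mathbf{x}) = S(\mathbf{x}) * g_{o2}(\mathbf{x}) \qquad (9)$$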
where '*' represents 2D convolution, $g_e(\mathbf{x})$ is the spatial-domain representation of the log-Gabor filter, and $g_{o1}(\mathbf{x})$ and $g_{o2}(\mathbf{x})$ are the odd pair of SQFs. In terms of physical interpretation, the local phase contains the structural information of the objects, while the local contrast information is represented by the local amplitude. In this research, the local phase and the local amplitude are used to represent regions of the image in both the training and testing phases. For illustration, a sample result of the APS algorithm is shown in Fig. 5. In this specific example, buildings are being segmented.
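As an illustration of how the local amplitude and local phase can be obtained from the monogenic signal, the sketch below applies a log-Gabor filter and the two odd (Riesz) filters in the frequency domain, which is equivalent to circular 2D convolution in the spatial domain. It is a minimal sketch under stated assumptions rather than the authors' implementation: the function name `monogenic_features`, the single filter scale, and the parameters `wavelength` and `sigma_ratio` are hypothetical choices made here. The phase is computed with `arctan2`, a quadrant-safe form of the arctangent of the odd-to-even ratio, so it lies in [0, π].

```python
import numpy as np


def monogenic_features(image, wavelength=8.0, sigma_ratio=0.55):
    """Return (local amplitude, local phase) of a 2D image at one scale.

    wavelength and sigma_ratio are illustrative log-Gabor parameters,
    not values taken from the paper.
    """
    img = np.asarray(image, dtype=float)
    rows, cols = img.shape
    # Normalized frequency grids (cycles per pixel).
    u = np.fft.fftfreq(cols)[None, :]
    v = np.fft.fftfreq(rows)[:, None]
    radius = np.sqrt(u ** 2 + v ** 2)
    radius[0, 0] = 1.0  # avoid log(0) and division by zero at DC

    # Even filter: radial log-Gabor centered at frequency 1 / wavelength.
    f0 = 1.0 / wavelength
    log_gabor = np.exp(-np.log(radius / f0) ** 2 / (2.0 * np.log(sigma_ratio) ** 2))
    log_gabor[0, 0] = 0.0  # the log-Gabor filter has no DC component

    # Odd filters: Riesz kernels. The sign convention does not affect the
    # amplitude or the phase as defined below.
    riesz1 = 1j * u / radius
    riesz2 = 1j * v / radius

    spectrum = np.fft.fft2(img)
    f = np.real(np.fft.ifft2(spectrum * log_gabor))
    f1 = np.real(np.fft.ifft2(spectrum * log_gabor * riesz1))
    f2 = np.real(np.fft.ifft2(spectrum * log_gabor * riesz2))

    amplitude = np.sqrt(f ** 2 + f1 ** 2 + f2 ** 2)
    phase = np.arctan2(np.sqrt(f1 ** 2 + f2 ** 2), f)
    return amplitude, phase
```

For a pure cosine grating whose wavelength matches the filter, the even response is the cosine itself and the odd response is the corresponding sine, so the local amplitude is constant across the image while the local phase tracks the position within each cycle.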
Part-based object detection
In aerial imagery, a major challenge for detection arises when the object of interest is partially occluded by shrubs, trees, buildings, etc. The part-based model has shown attractive performance in object recognition due to its ability to cope with partial occlusions and large appearance variations [9-. Our proposed part-based model is illustrated in Fig. 6. First, an object is partitioned into a number of parts that varies with the size of the object, and local phase information is used to extract informative attributes that describe the individual parts. Next, the object parts represented using local phase are grouped into a larger number of clusters: similar parts are assigned to the same cluster and then represented by a histogram of oriented phase that describes the specific pattern of the parts. The next step is to organize the detected parts and their attributes as an integrated entity. Since a target can be represented by a certain number of patterns, we can train a classifier to detect such local patterns of the target, so that an occluded object can be detected from its parts in the input scene.
The part-based object detection technique outputs the pixel location of the threat object in the input image. However, in the real world, a pipeline operator has to know the geolocation of an object in order to prevent any damage to the pipeline. This requires a registration process between the images and a geographical map. In addition, some detected machinery may be located far away from the pipeline, or may not be in operation, in which case the probability that it is a threat is significantly low. Considering this issue, we designed a framework, shown in Fig. 7, which automatically analyzes the geolocation and temperature of the detected object and assigns the threat a risk level of high, medium or low.
A threat assigned a “high” level is a more likely threat to the pipeline, whereas an object marked “low” poses less risk to the pipeline. In Fig. 8, assume that a detected object is located at P1 in an input image (the rectangular region in blue), and that P2 is the point on the pipeline centerline, given in geo coordinates, that is nearest to P1. We then compute the shortest distance between the pipeline centerline and the object, denoted as D. Note that the coordinates of P1 are a spatial location in the image. In order to compute the physical distance between P1 and P2, we need to convert the pixel coordinates of P1 into geo coordinates. Since we know the geo coordinates of the pixels at the four corners of the image, we can find a transformation matrix that maps image spatial locations to geolocations, so that the geo coordinates of the object can be obtained. Moreover, the temperature information of the target is obtained from the pixel values of the object in the corresponding infrared imagery.
To evaluate the accuracy of the proposed distance measurement technique, we embedded five sample targets in testing images. The distances from the targets to the pipeline provided by Global Mapper are used as the ground truth for our analysis. The comparison of our proposed method with the ground truth is shown in Table 1.
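The pixel-to-geolocation mapping and the distance D can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the transformation matrix is a 3 × 3 projective transform fitted exactly to the four image corners, that the geo coordinates are planar (for example, a projected coordinate system in feet) so that Euclidean distance is meaningful, and that the centerline is a polyline with distinct consecutive vertices. The function names `fit_corner_transform`, `pixel_to_geo` and `distance_to_centerline` are hypothetical.

```python
import numpy as np


def fit_corner_transform(pixel_corners, geo_corners):
    """Fit the 3x3 projective matrix that maps the 4 pixel corners
    onto the 4 geo corners (8 equations, 8 unknowns)."""
    rows, rhs = [], []
    for (x, y), (gx, gy) in zip(pixel_corners, geo_corners):
        rows.append([x, y, 1, 0, 0, 0, -gx * x, -gx * y])
        rhs.append(gx)
        rows.append([0, 0, 0, x, y, 1, -gy * x, -gy * y])
        rhs.append(gy)
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def pixel_to_geo(transform, pixel):
    """Map one pixel location (x, y) to planar geo coordinates."""
    gx, gy, w = transform @ np.array([pixel[0], pixel[1], 1.0])
    return np.array([gx / w, gy / w])


def distance_to_centerline(point, centerline):
    """Shortest distance D from a point to a polyline of centerline vertices."""
    p = np.asarray(point, dtype=float)
    pts = np.asarray(centerline, dtype=float)
    best = np.inf
    for a, b in zip(pts[:-1], pts[1:]):
        ab = b - a
        # Project p onto the segment and clamp to its end points.
        t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
        best = min(best, np.linalg.norm(p - (a + t * ab)))
    return float(best)
```

With the transform fitted once per image, each detection only needs one matrix-vector product and one pass over the centerline segments, so the cost of this step is negligible compared with detection.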
Table 1. Distance calculation statistics.
| Object | Ground truth (feet) | Proposed (feet) | Average mean square error (feet) |
|---|---|---|---|
| Target 1 | 7.897684 | 7.928991 | 0.1768 |
In this section, we show results of our automatic machinery threat detection technique on a real-world dataset. The images in the database were captured at an altitude of around 1000 feet along the pipeline RoW by one infrared camera and one visible camera pointing towards the pipeline centerline. The objective of this research is to develop a full-fledged system that can automatically detect potential machinery threats and aid human analysts in subsequent actions. The results of our proposed method are presented in two stages. The first stage shows the performance of the proposed background elimination technique in varying background conditions. The second stage presents the detection output using the proposed part-based model after background elimination. Risk assessment results are generated as a text file after these two stages; however, they are not shown here.
Results of background elimination
In Fig. 9, the results are shown in sequential order for the proposed LTFS and APS algorithms. Fig. 9 (a) is a sample test image that contains a threat object (circled in red) close to the pipeline RoW. Fig. 9 (b) shows the output of the LTFS algorithm. As seen in the result, most of the undesired background has been removed, but the object is kept in the output image. Next, the APS is applied to the output image of the LTFS. Fig. 9 (c) shows the local phase image, which was used for training and testing during the APS process as mentioned in Section 2.1.2. Fig. 9 (d) is the final output after the LTFS and APS processes. As seen in Fig. 9 (d), only a few regions of the original image are left. This output contributes significantly to the object detection stage, since only a few patches of the image need to be searched for the object.
Part-based object detection
After background elimination, the part-based object detection model described in Section 2.2 is used for threat object detection. In this model, a sliding window technique is used to scan the image, while an SVM is employed to find the object. Because the background segmentation technique eliminates most non-target regions and sets the intensities of those regions to zero, as shown in Fig. 10 (b), a local region is evaluated only if more than 30% of its intensity values are non-zero. This accelerates processing and also improves the detection rate. Figure 10 shows a sample result using the proposed part-based technique. As shown in Fig. 10 (c), multiple parts of the object are detected without any false alarm.
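The window-skipping rule can be sketched as a generator that yields only the windows worth passing to the classifier. This is a minimal illustration under stated assumptions: the input is a single-channel image in which eliminated background pixels are exactly zero, and the function name `candidate_windows`, the window size `win` and the stride `step` are hypothetical values not given in the paper. Only the 30% threshold comes from the text, and the SVM classification of each candidate window is left out.

```python
import numpy as np


def candidate_windows(masked, win=64, step=32, min_fill=0.30):
    """Yield the (row, col) origin of every window in which the fraction
    of non-zero pixels is greater than min_fill."""
    h, w = masked.shape
    for r in range(0, h - win + 1, step):
        for c in range(0, w - win + 1, step):
            patch = masked[r:r + win, c:c + win]
            # Skip windows that are mostly eliminated background.
            if np.count_nonzero(patch) / patch.size > min_fill:
                yield r, c
```

Because the occupancy test is a single count per window, it is far cheaper than feature extraction and classification, which is why discarding mostly empty windows first reduces the overall processing time.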
In this paper, we have presented a new automated monitoring system that mimics the human vision system for machinery threat detection on the pipeline RoW. The proposed technique has been simulated and tested on a real-life dataset under various challenging conditions to investigate its reliability and feasibility. The proposed system yields above 85% accuracy for machinery threat detection and offers a practical candidate for wide-area surveillance to protect our pipeline infrastructure. Currently, work is in progress to refine the algorithm with respect to computation speed as well as accuracy. In addition, we are establishing a standard database for construction vehicle detection, which will be made available to the public soon.
This project has been funded by the Pipeline Research Council International (PRCI) with the test imagery captured in Gary, Indiana. (Project Number: PR-433-133700).