Open Access
6 June 2019

Ship detection and tracking method for satellite video based on multiscale saliency and surrounding contrast analysis
Haichao Li, Liang Chen, Feng Li, Meiyu Huang
Abstract
In port surveillance, monitoring based on satellite video is a valuable supplement to a ground monitoring system because of its wide monitoring range. Therefore, automatic ship detection and tracking based on satellite video is an important research field. However, because of the small size of ships, their lack of texture, and the interference of sea noise, it is also a challenging subject. An approach for automatically detecting and tracking moving ships of different sizes in satellite video is presented. First, motion compensation between two frames is performed. Then, saliency maps of the differential image at multiple scales are combined to create a dynamic multiscale saliency map (DMSM), which is better suited to detecting ships of different sizes. Third, candidate motion regions are segmented from the DMSM, and moving ships are detected after false alarms are removed based on the surrounding contrast. Fourth, features such as centroid distance, area ratio, and histogram distance of the moving ships are used to perform ship matching. Finally, ship association and tracking are realized by using the intermediate frame of every three adjacent frames. Experimental results on satellite sequences indicate that our method can effectively detect and track ships and obtain the target track, and it is superior in terms of the defined recall and precision compared with other classical target tracking methods.

1.

Introduction

The automatic monitoring of ships using video surveillance plays an important role in ocean security, maritime transportation, fishery management, ship traffic surveillance, and so on. In general, the video sources of ship monitoring systems are mainly taken from ground-based1,2 or aerial-based3,4 cameras. However, these systems have some shortcomings, such as poor concealment, limited spatial coverage, and the required sensor installation and maintenance.

Fortunately, satellite-based monitoring can overcome the above shortcomings. However, video satellites were not available until recently. Therefore, in the past several years, many ship detection algorithms have been proposed based on synthetic aperture radar (SAR) images because of SAR's imaging capacity during day and night under most meteorological conditions.5–8 Compared with SAR images, optical satellite images have higher spatial resolution, which makes them more suitable for ship detection, and they are an important supplement to SAR images.

Therefore, although optical satellite images are usually disturbed by weather conditions, such as clouds or ocean waves,9 a number of ship detection methods have been developed from a single high-resolution optical satellite image.10–13 Most of these methods adopt a coarse-to-fine strategy, which can be divided into a ship candidate extraction stage and a false alarm elimination stage. In the first stage, these methods extract candidate ships according to the differences in gray values between potential targets and the background.14,15 In the second stage, most algorithms utilize ship features with candidate classifiers to discriminate ships from false alarms,16 and an important issue is to find efficient descriptors to describe the ship targets. Further, some existing methods use a priori coastline data to detect the sea area;17 however, they are still disturbed by coastal areas due to the low accuracy of coastline data.

However, the low temporal resolution of existing satellite images limits the timeliness of ship detection and tracking. Fortunately, optical video satellites, such as Skysat and Jilin-1, have been developed thanks to advances in camera technology for systems with high spatial and temporal resolution. In recent years, several studies on target detection and tracking based on optical satellite videos have been carried out.18–20

It should be noted that satellite video plays an important role in ship monitoring because of its strong concealment, wide monitoring range, and real-time continuous monitoring. Although much progress has been achieved in ship detection based on static satellite images, few studies have focused on moving ship detection and tracking based on satellite video, which is becoming a new research topic in the field of remote sensing. Rao et al.21 proposed to estimate ship speed and direction by locating ships and their tracks in multisatellite imagery. Yao et al.22 and Zhang et al.23 proposed ship-tracking methods for GF-4 satellite sequential imagery based on the automatic identification system data of ships, which are used for cooperative ship surveillance. As shown in Fig. 1, ship features can be easily extracted and recognized from ground-based video [see Fig. 1(a)] and aerial-based video [see Fig. 1(b)] because of the sufficient spatial resolution, which is convenient for feature description in ship detection and tracking. Compared with general ground-based and aerial-based video, ship detection and tracking based on satellite video faces several problems (see Fig. 1). The moving ships in satellite video range from just a few pixels to dozens of pixels with similar brightness due to the limited resolution and may even exhibit low contrast to the background due to the influence of clouds and sea waves. Given these characteristics, it is difficult to extract useful appearance features or texture information inside a ship, and some small ships may even be submerged in the complex background. Another problem for ship detection and tracking is the complex background caused by the large field of view, which contains disturbances such as clouds, moving targets on land, and so on. Owing to these characteristics of satellite video, traditional target detection and tracking methods have difficulty performing well on ships in satellite video.

Fig. 1

Comparison of three types of videos. (a) Ground video frame. (b) Aerial video frame. (c) Satellite video frame.


The goal of this paper is to detect and track moving ships based on satellite video. Inspired by the multiscale selective cognition property of the human visual system, we propose an integrated algorithm for ship detection and tracking. The main contributions can be summarized in the following three aspects: (1) to detect ships of different sizes, a dynamic multiscale saliency map (DMSM) computed from the differential image between two frames is proposed, which can detect moving ships with small displacement and avoid the holes that the frame-difference method may generate inside moving ships. (2) The surrounding contrast and ship characteristics are utilized to discriminate moving ships from false alarms such as clouds and moving targets on land. (3) An integrated ship detection and tracking scheme using the intermediate frame of every three adjacent frames is proposed, which avoids the need to register all frames to the same reference frame.

The rest of this paper is organized as follows. Section 2 introduces the proposed moving ship detection and tracking method. Section 3 describes the experimental results and analysis, and Sec. 4 gives the conclusion.

2.

Methodology

The flowchart of the proposed ship detection and tracking method is shown in Fig. 2. It mainly includes two stages: the ship detection stage and the ship tracking stage.

Fig. 2

Flowchart of the proposed ship detection and tracking method.


In the first stage, for two given frames, camera motion compensation based on least squares image matching is first performed. Second, a saliency detection method is presented to construct the DMSM based on the differential image. Third, the DMSM is segmented to obtain a binary image, and then moving ships are extracted with the help of ship characteristics and the surrounding contrast.

In the second stage, features such as centroid distance, area ratio, and histogram distance are first used to perform ship matching between two frames. Then, ship association and tracking are performed using the intermediate frame of every three frames.

2.1.

Motion Compensation

The video captured by the satellite-mounted camera always includes camera motion. Therefore, given two frames t−1 and t, the natural solution is to estimate the transformation parameters and compensate for the camera motion. Many approaches for estimating the affine transformation between two images have been reported in the literature. Most of these methods require a set of matched feature points to estimate the affine transformation parameters. However, matched feature points may fall on moving objects, which can result in wrong affine parameters.

Therefore, the least squares image matching method, which does not require matched feature points, is used to compensate for the motion. Denote frame t−1 and frame t as the source frame $f_S(x, y, t-1)$ and the target frame $f_T(x, y, t)$, respectively. To account for intensity variations, the relationship between the two frames can be modeled by the following transform:

Eq. (1)

$$m_7 f_S(x, y, t-1) + m_8 = f_T(m_1 x + m_2 y + m_5,\; m_3 x + m_4 y + m_6,\; t),$$
where $m_1$–$m_8$ are the transform parameters. Among them, $m_1$–$m_4$ form the $2 \times 2$ affine matrix, $m_5$ and $m_6$ form the translation vector, and $m_7$ and $m_8$ embody a change in contrast and brightness. In order to estimate these parameters, the following quadratic error is to be minimized:

Eq. (2)

$$E = \sum_{(x, y) \in \Omega} \left[ m_7 f_S(x, y, t-1) + m_8 - f_T(m_1 x + m_2 y + m_5,\; m_3 x + m_4 y + m_6,\; t) \right]^2.$$

Here, first, to simplify the minimization, the error function of Eq. (2) is approximated through a Taylor-series expansion. A more accurate estimate of the actual error function can then be obtained using a Newton–Raphson style iterative scheme.24 In particular, on each iteration, the estimated transformation is applied to the source image, and a new transformation is estimated between the newly registered source and target images. Second, to adapt to large displacements between two frames and improve the registration speed, a coarse-to-fine registration scheme is adopted. The details can be found in Ref. 25. After the affine parameters are obtained, they are applied to the source frame to obtain the registered source frame. This method can achieve subpixel registration accuracy and adapt to registration between two frames with illumination changes.
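To make the linearized step concrete, a minimal single-scale sketch is given below in Python with NumPy (the authors' implementation is in C++ with OpenCV, so the function name, the grayscale floating-point frames, and the omission of the warping and iteration loop are our illustrative assumptions; Ref. 25 adds the Newton–Raphson iteration and the coarse-to-fine pyramid).

```python
import numpy as np

def estimate_affine_photometric(f_s, f_t):
    """One linearized least-squares solve for the eight-parameter model of
    Eq. (1): m7*f_s(x, y) + m8 = f_t(m1*x + m2*y + m5, m3*x + m4*y + m6).
    Single-scale sketch; in practice the solve is iterated Newton-Raphson
    style and embedded in a coarse-to-fine pyramid."""
    f_s = np.asarray(f_s, dtype=np.float64)
    f_t = np.asarray(f_t, dtype=np.float64)
    h, w = f_t.shape
    y, x = np.mgrid[0:h, 0:w].astype(np.float64)
    fy, fx = np.gradient(f_t)          # spatial gradients of the target frame
    # Taylor expansion of Eq. (2) gives a residual linear in m = [m1, ..., m8]:
    #   residual = b - c . m, with coefficient vector c and constant b below.
    c = np.stack([x * fx, y * fx, x * fy, y * fy,
                  fx, fy, -f_s, -np.ones_like(f_s)], axis=-1).reshape(-1, 8)
    b = (x * fx + y * fy - f_t).reshape(-1)
    m, *_ = np.linalg.lstsq(c, b, rcond=None)
    return m                           # [m1, m2, m3, m4, m5, m6, m7, m8]
```

In practice, the estimated parameters would be used to warp the source frame, and the solve would be repeated at each pyramid level until convergence.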

2.2.

Dynamic Multiscale Saliency Map

Given two registered frames obtained at different times, the simplest way to detect motion regions is frame differencing.26 However, if the displacement of a moving ship between the two frames is small, holes often appear inside the ship in the differential image. Similar intensity values across the entire ship also cause holes inside the moving ship. In all these cases, it is difficult to detect a complete moving ship, as shown in Fig. 3(c).

Fig. 3

Example of saliency map of differential image. (a) and (b) The registered source frame and the target frame. (c) The differential image of (a) and (b). (d) The saliency map based on SR model. (e) DMSM of proposed method. To display clearly, two regions possibly including ships are shown in detailed forms. (f1) and (f2) Two zoomed-in regions of (c). (g1) and (g2) Two zoomed-in regions of (d). (h1) and (h2) Two zoomed-in regions of (e).


As is known, visual saliency is one of the preattentive processes that make us focus our eyes on attractive regions of a scene.27 Because of its ability to capture salient regions, visual saliency has been widely applied in target detection, where it is usually used to segment the salient target.28,29 Therefore, we take advantage of this technique to highlight ship areas while suppressing the background in the differential image.

However, there may be many moving ships with different sizes and speeds in each frame, so it is difficult to extract a complete ship without holes from a single-scale saliency map of the differential image. As shown in Fig. 3(d), the single-scale saliency map of the spectral residual (SR) model30 cannot eliminate the holes inside moving ships effectively. Therefore, we propose to compute the DMSM of the differential image based on the SR model, which is calculated with the following steps:

  • Step 1: A Gaussian pyramid of the differential image is built with $n$ scales, expressed as $\{G_i \mid i = 1, 2, \ldots, n\}$.

  • Step 2: Calculate the log amplitude and phase spectrum of image Gi

    Eq. (3)

    $$F_i(f) = \mathcal{F}[G_i], \qquad L_i(f) = \log\big(\left|F_i(f)\right|\big), \qquad \phi_i(f) = \mathrm{ph}[F_i(f)],$$
    where $\mathcal{F}(\cdot)$ is the Fourier transform, $\mathrm{ph}(\cdot)$ is a function for computing the phase spectrum, $\phi_i(f)$ and $|F_i(f)|$ are the phase and amplitude spectra, respectively, and $L_i(f)$ denotes the log amplitude spectrum.

  • Step 3: Calculate the SR of Gi:

    Eq. (4)

    $$R_i(f) = L_i(f) - h * L_i(f),$$
    where $h$ is a $3 \times 3$ averaging filter and $*$ denotes convolution.

  • Step 4: Do an inverse Fourier transform for SR by keeping the phase spectrum, and we obtain saliency map of Gi:

    Eq. (5)

    $$S_i = \left| \mathcal{F}^{-1}\{ \exp[ R_i(f) + j \phi_i(f) ] \} \right|^2.$$

  • Step 5: The saliency map Si is resized to the size of differential image

    Eq. (6)

    $$\tilde{S}_i = \mathrm{resize}(S_i),$$
    where the resized saliency map $\tilde{S}_i$ has the same width and height as the differential image.

  • Step 6: The DMSM is calculated by combining the resized saliency maps of all $n$ scales and smoothing with a two-dimensional Gaussian filter $g$:

    Eq. (7)

    $$\mathrm{DMSM} = g * \frac{1}{n} \sum_{i=1}^{n} \tilde{S}_i.$$

As shown in Fig. 3(e), regions of moving ships are highlighted in the output DMSM, and the holes inside the moving ships are successfully eliminated compared with the single-scale SR model. As seen clearly from the zoomed-in regions [shown in Figs. 3(f1)–3(h1) and 3(f2)–3(h2)], our proposed method performs better at eliminating holes, because the DMSM is generated at different resolutions, just like the human visual system.
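A compact sketch of Steps 1 to 6 is shown below in Python with OpenCV and NumPy; the helper names are illustrative (not the authors' code), and the scale count and Gaussian kernel size follow the settings reported later in Sec. 3.2.

```python
import cv2
import numpy as np

def spectral_residual_saliency(img):
    """Spectral residual (SR) saliency of a single-channel float image, Eqs. (3)-(5)."""
    F = np.fft.fft2(img)
    log_amp = np.log(np.abs(F) + 1e-8)         # log amplitude spectrum L(f)
    phase = np.angle(F)                        # phase spectrum phi(f)
    sr = log_amp - cv2.blur(log_amp, (3, 3))   # Eq. (4): residual w.r.t. 3x3 average
    return np.abs(np.fft.ifft2(np.exp(sr + 1j * phase))) ** 2   # Eq. (5)

def dmsm(diff_img, n_scales=3, gauss_ksize=15):
    """Dynamic multiscale saliency map of a differential image: SR saliency on each
    Gaussian-pyramid level, resized back to full size, averaged, and smoothed."""
    diff_img = diff_img.astype(np.float64)
    h, w = diff_img.shape
    level, sal_maps = diff_img, []
    for _ in range(n_scales):                  # Step 1: Gaussian pyramid G_1..G_n
        s = spectral_residual_saliency(level)  # Steps 2-4
        sal_maps.append(cv2.resize(s, (w, h))) # Step 5 / Eq. (6)
        level = cv2.pyrDown(level)
    combined = np.mean(sal_maps, axis=0)       # Step 6 / Eq. (7): average over scales
    return cv2.GaussianBlur(combined, (gauss_ksize, gauss_ksize), 0)
```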

2.3.

Ship Region Detection

Because the ship regions are relatively salient in the DMSM, ship candidates can be obtained by segmenting the DMSM. Here, a simple segmentation method based on the mean and standard deviation of the DMSM is applied to compute an adaptive threshold with the following equation:

Eq. (8)

$$T_D = \mu_s + \lambda \cdot \sigma_s,$$
where $\mu_s$ and $\sigma_s$ are the mean and standard deviation of the DMSM, respectively, and $\lambda$ is a coefficient that was empirically set between 1.0 and 2.0 in our experiments.

Further, a morphological dilation operation is applied to eliminate the remaining holes within regions, and a binary image is obtained [Fig. 4(a)]. Then, the candidate ship regions are obtained by an AND operation between the binary image and each of the two frames, respectively [Figs. 4(b) and 4(c)].
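The segmentation of Eq. (8), the dilation, and the AND operation can be sketched as follows (Python with OpenCV; an illustrative approximation assuming an 8-bit grayscale frame, not the authors' exact implementation).

```python
import cv2
import numpy as np

def candidate_regions(dmsm_map, frame, lam=1.0):
    """Threshold the DMSM with Eq. (8), close remaining holes with a morphological
    dilation, and keep only the candidate motion regions of the given frame."""
    t_d = dmsm_map.mean() + lam * dmsm_map.std()      # Eq. (8): adaptive threshold
    binary = (dmsm_map > t_d).astype(np.uint8)
    binary = cv2.dilate(binary, np.ones((3, 3), np.uint8))
    # AND the binary mask with the frame to obtain the candidate ship regions.
    return cv2.bitwise_and(frame, frame, mask=binary)
```

The same mask is applied to both registered frames to obtain the candidate regions shown in Figs. 4(b) and 4(c).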

Fig. 4

Ship candidate regions in two frames. (a) Regions after morphology operation. (b) The candidate motion regions in registered source frame. (c) The candidate motion regions in target frame.


After the candidate ship regions are obtained, some false alarms may still exist, such as ship wakes, ocean waves, land, and clouds. As shown in Figs. 4(b) and 4(c), there are two regions without moving ships. Therefore, we further eliminate obvious false candidates with the following steps:

  • Step 1: Each candidate region [Fig. 5(a)] is segmented by the Otsu segmentation method31 [result shown in Fig. 5(b)]. Then, to eliminate the small holes that may exist in the candidate region [Fig. 5(b)], a morphological dilation operation is applied to the segmentation result [result shown in Fig. 5(c)]; the corresponding moving ship is shown in Fig. 5(d).

  • Step 2: Ships always have a limited range of area, length, and width. According to these constraints, false candidate regions, such as very large or very small islands and clouds, can be eliminated with proper thresholds. Furthermore, ships are commonly long and thin, so the ratio of length to width of a ship is larger than a given threshold. According to this condition, obvious false alarms, including islands and clouds with very small ratios, are eliminated.13

  • Step 3: To obtain a larger background region, the morphological dilation operation is further applied to Fig. 5(c) [result shown in Fig. 5(e)]; the corresponding region, which includes both the background and the foreground, is shown in Fig. 5(f). Figure 5(g) shows the background region obtained after the subtraction operation.

  • Step 4: Because the intensity of the sea surface in the image is quite different from that of a ship, the surrounding contrast between sea and ship is helpful for eliminating obvious false candidates. Based on this characteristic, we regard candidates that do not satisfy the following conditions as obvious false alarms:

    Eq. (9)

    $$\mu_{FG} > \mu_{FG+BG} > \mu_{BG}, \qquad \sigma_{FG} > \sigma_{FG+BG} > \sigma_{BG}, \qquad \sigma_{FG} > \gamma \sigma_{BG},$$
    where $\mu_{FG}$ and $\sigma_{FG}$ are the mean and standard deviation of the foreground region, $\mu_{BG}$ and $\sigma_{BG}$ are the mean and standard deviation of the background region, $\mu_{FG+BG}$ and $\sigma_{FG+BG}$ are the mean and standard deviation of the combined foreground and background region, respectively, and $\gamma$ is empirically set between 1.5 and 2.0 in our experiments.

After false alarm elimination, the resulting regions are returned as moving ships.
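For a single candidate crop, Steps 1, 3, and 4 can be sketched as follows (Python with OpenCV; the dilation sizes and iteration counts are illustrative assumptions, and the size and aspect-ratio checks of Step 2 are omitted for brevity).

```python
import cv2
import numpy as np

def passes_surrounding_contrast(region, gamma=1.5):
    """Check one candidate region (8-bit grayscale crop) against Eq. (9)."""
    # Step 1: Otsu segmentation plus dilation gives the foreground (ship) mask.
    _, fg = cv2.threshold(region, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = np.ones((3, 3), np.uint8)
    fg = cv2.dilate(fg, kernel)
    # Step 3: a further dilation gives the foreground plus surrounding background.
    fg_bg = cv2.dilate(fg, kernel, iterations=3)
    bg = cv2.subtract(fg_bg, fg)                  # background ring only
    f, b, fb = region[fg > 0], region[bg > 0], region[fg_bg > 0]
    if f.size == 0 or b.size == 0:
        return False
    # Step 4 / Eq. (9): the ship should be brighter and more variable than the sea.
    return (f.mean() > fb.mean() > b.mean() and
            f.std() > fb.std() > b.std() and
            f.std() > gamma * b.std())
```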

Fig. 5

Ship detection from candidate ship region based on surrounding contrast. (a) Candidate ship region. (b) Otsu segmentation result. (c) Dilation operation result of (b). (d) Foreground. (e) Dilation operation result of (c). (f) Foreground and background. (g) Background.


2.4.

Ship Matching between Two Frames

Moving ships are detected after false alarm elimination in each of the two frames, as previously discussed. Suppose the moving ships in the registered source frame are expressed as $\{S_{1,t}, \ldots, S_{m,t}\}$ and those in the target frame as $\{S_{1,t-1}, \ldots, S_{n,t-1}\}$. Ship matching is performed by Algorithm 1.

Algorithm 1

Ships matching between two frames.

Input: $\{S_{1,t}, \ldots, S_{m,t}\}$ and $\{S_{1,t-1}, \ldots, S_{n,t-1}\}$.
$T_d$: threshold of the centroid distance between ships.
$T_\psi$: threshold of the matching metric.
Output: Matched ship pairs.
for $i = 1$ to $m$ do
for $j = 1$ to $n$ do
Step 1. Compute the centroid distance $\mathrm{Dist}(t, t-1)_{i,j}$ between the $i$'th ship $S_{i,t}$ and the $j$'th ship $S_{j,t-1}$:

Eq. (10)

$$\mathrm{Dist}(t, t-1)_{i,j} = \sqrt{(x_t^i - x_{t-1}^j)^2 + (y_t^i - y_{t-1}^j)^2},$$
where $(x_t^i, y_t^i)$ and $(x_{t-1}^j, y_{t-1}^j)$ are the coordinates of the centroids.
Step 2. If $\mathrm{Dist}(t, t-1)_{i,j} < T_d$:
  a. Compute the area ratio $AR(i, j)$ between the $i$'th ship $S_{i,t}$ and the $j$'th ship $S_{j,t-1}$:

Eq. (11)

$$AR(i, j) = \frac{\min[\mathrm{area}(S_{i,t}),\, \mathrm{area}(S_{j,t-1})]}{\max[\mathrm{area}(S_{i,t}),\, \mathrm{area}(S_{j,t-1})]},$$
where $\mathrm{area}(\cdot)$ is the area, that is, the number of pixels. The areas of $S_{j,t-1}$ and $S_{i,t}$ should be similar even though the ship has moved or rotated.
  b. Compute the $H$-bin histograms of $S_{i,t}$ and $S_{j,t-1}$, $p^i = \{p_k^i\}_{k=1,2,\ldots,H}$ and $q^j = \{q_k^j\}_{k=1,2,\ldots,H}$.
  c. The Bhattacharyya coefficient is

Eq. (12)

$$\rho(p^i, q^j) = \sum_{k=1}^{H} \sqrt{p_k^i q_k^j}.$$
   The Bhattacharyya distance32 between the two distributions is defined as

Eq. (13)

$$BD(i, j) = \sqrt{1 - \rho(p^i, q^j)}.$$
  d. The total metric is finally defined by combining the two metrics:

Eq. (14)

$$\psi(t, t-1)_{i,j} = \delta[1 - BD(i, j)] + (1 - \delta)\, AR(i, j),$$
where $\delta$ is a weighting coefficient.
  e. If $\psi(t, t-1)_{i,j} > T_\psi$, $\{S_{i,t}, S_{j,t-1}\}$ is one pair of candidate matching ships.
end for $j$
A high total metric value indicates a good match with the target ship. If $S_{i,t}$ has several candidate matching ships satisfying the above conditions, we take the ship pair with the highest total metric value as the matched ships.
end for $i$
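A plain-Python sketch of Algorithm 1 is given below; the ship representation (centroid, pixel area, and normalized $H$-bin histogram stored in a dictionary) and the function names are illustrative assumptions, and the default thresholds follow the settings in Sec. 3.2.

```python
import numpy as np

def match_score(ship_a, ship_b, delta=0.5):
    """Total metric of Eq. (14) from the area ratio and the Bhattacharyya distance."""
    ar = min(ship_a['area'], ship_b['area']) / max(ship_a['area'], ship_b['area'])  # Eq. (11)
    rho = np.sum(np.sqrt(ship_a['hist'] * ship_b['hist']))   # Eq. (12), normalized histograms
    bd = np.sqrt(max(1.0 - rho, 0.0))                        # Eq. (13)
    return delta * (1.0 - bd) + (1.0 - delta) * ar           # Eq. (14)

def match_ships(ships_t, ships_t1, t_d=70.0, t_psi=0.7, delta=0.5):
    """Greedy matching between the ships of two frames: gate by the centroid
    distance of Eq. (10), then keep the pair with the highest total metric."""
    pairs = []
    for i, a in enumerate(ships_t):
        best, best_j = t_psi, None
        for j, b in enumerate(ships_t1):
            dist = np.hypot(a['centroid'][0] - b['centroid'][0],
                            a['centroid'][1] - b['centroid'][1])   # Eq. (10)
            if dist < t_d:
                psi = match_score(a, b, delta)
                if psi > best:
                    best, best_j = psi, j
        if best_j is not None:
            pairs.append((i, best_j))
    return pairs
```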

2.5.

Ship Tracking

From the above sections, the detected and matched moving ships are obtained from every two frames. Suppose two pairs of matched moving ships S1 and S2 have been obtained based on frames t−1 and t [Figs. 6(a) and 6(b)], and three pairs of matched moving ships S3, S4, and S5 have been obtained based on frames t and t+1 [Figs. 6(c) and 6(d)]. Then, all the moving ships in frames t−1, t, and t+1 are associated using the intermediate frame t, which realizes ship tracking and gives the ship trajectories in the satellite video.

Fig. 6

An example of ship association and tracking in three frames. (a) Two moving ships detected from frame t−1 based on frames t−1 and t. (b) Two moving ships detected from frame t based on frames t−1 and t. (c) Three moving ships detected from frame t based on frames t and t+1. (d) Three moving ships detected from frame t+1 based on frames t and t+1.


Let $S_{i,t}(t-1, t)$ denote ship $i$ in frame $t$, detected based on frames $t-1$ and $t$, and $S_{j,t}(t, t+1)$ denote ship $j$ in frame $t$, detected based on frames $t$ and $t+1$. We define the overlap score $R_{i,j}$ between the two ships as

Eq. (15)

$$R_{i,j} = \frac{\mathrm{area}\left[ S_{i,t}(t-1, t) \cap S_{j,t}(t, t+1) \right]}{\mathrm{area}\left[ S_{i,t}(t-1, t) \cup S_{j,t}(t, t+1) \right]}, \qquad R_{i,j} \in [0, 1],$$
where $S_{i,t}(t-1, t) \cap S_{j,t}(t, t+1)$ and $S_{i,t}(t-1, t) \cup S_{j,t}(t, t+1)$ are the intersection and union of the two ships, respectively, and $\mathrm{area}(\cdot)$ denotes the area.

If $R_{i,j}$ is higher than a set threshold $T_R$, $S_{i,t}(t-1, t)$ and $S_{j,t}(t, t+1)$ are regarded as the same ship, through which ship association and tracking across the three frames is realized. As shown in Figs. 6(b) and 6(c), S1 and S3, and S2 and S4, are two pairs of associated ships, through which ship tracking can be realized.
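The association step can be sketched as follows, assuming each detected ship is represented by a binary mask in the shared intermediate frame t (illustrative helper names; the 0.7 default corresponds to the TR = 70% setting in Sec. 3.2).

```python
import numpy as np

def overlap_score(mask_i, mask_j):
    """Overlap score of Eq. (15) between two binary ship masks in frame t."""
    inter = np.logical_and(mask_i, mask_j).sum()
    union = np.logical_or(mask_i, mask_j).sum()
    return inter / union if union > 0 else 0.0

def associate(masks_prev_pair, masks_next_pair, t_r=0.7):
    """Link ships detected in frame t from the (t-1, t) pair with those detected
    in frame t from the (t, t+1) pair; a link means the same ship across three frames."""
    links = []
    for i, m_i in enumerate(masks_prev_pair):
        for j, m_j in enumerate(masks_next_pair):
            if overlap_score(m_i, m_j) > t_r:
                links.append((i, j))
    return links
```

Ships in the (t, t+1) set that receive no link are treated as newly appearing ships (such as S5), and unlinked ships in the (t−1, t) set are treated as having disappeared.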

The proposed algorithm resolves some problems in multiship tracking, such as the appearance of new moving ships and the disappearance of old moving ships. As shown in Fig. 6, it can be seen that a new moving ship S5 is detected and tracked based on frames t and t+1. Furthermore, the proposed method avoids the registration problem of all frames to the same reference frame for moving ship detection and tracking.

3.

Experiments Results

Considering the lack of benchmark datasets of satellite video containing moving ships, we use sequences from the Geostationary Orbit Space Surveillance System (GO3S) satellite video and our synthetic satellite video to evaluate the performance of the proposed method. All the following experiments are conducted on a desktop PC with a 1.40-GHz CPU and 4-GB memory, and our code is written in Microsoft Visual Studio 2013 with C++ and the OpenCV library.

3.1.

Data Set Description

The first type of satellite image sequence is cut out from the image sequences of the geostationary GO3S satellite and includes moving ships starting from the 1871st frame (see Fig. 7). Video 1 is cut out from frame 1871 (see Fig. 7) of the synthetic GO3S satellite video.33 The cropped frame size is 650×360 pixels containing ships of different sizes, and the original frame rate is 10 frames per second (fps). However, some ships may move slowly, so at a high frame rate the change in the differential image is not obvious. Therefore, we reduce the frame rate to 5 fps to make the change in the differential image obvious, and the video used consists of 19 frames.

Fig. 7

Frame 1871 of the GO3S satellite video; the area of the red rectangle is the image frame of Video 1.


For Video 2, due to the lack of benchmark datasets of satellite video containing moving ships, we further evaluate our proposed algorithm on our synthetic satellite video, which includes both large and small ships. The video consists of 19 frames, and the frame size is 1024×768 pixels containing eight ships of different sizes (Fig. 9).

3.2.

Parameter Selection

In this section, several parameters used in our method during detection and tracking are presented. First, to improve the processing speed, in motion compensation we decompose each of the two frames into a five-level pyramid, and each pyramid level needs only three iterations to obtain good results. Second, in DMSM detection, we set the number of pyramid scales to n=3 and the Gaussian kernel size of the filter g to 15×15 pixels. Third, in ship region detection, the coefficient λ is set to 1.0 for the two image sequences, and the structuring element in all morphological dilation operations is 3×3. Next, for ship matching, we set the three parameters of the matching process: Td=70 (threshold of the distance between ships), Tψ=0.7 (threshold for matching ships), and δ=0.5 (weighting coefficient). Finally, in ship association and tracking, we set TR to 70%. We employ Recall and Precision to evaluate the performance, which are defined as

Eq. (16)

$$\mathrm{Recall} = \frac{TP}{TP + FN},$$

Eq. (17)

$$\mathrm{Precision} = \frac{TP}{TP + FP},$$
where TP (true positive) counts detected and tracked targets that are actually moving ships, FN (false negative) counts actual moving ships that are not detected and tracked, and FP (false positive) counts detected and tracked targets that are not moving ships.

In addition, we apply the widely used spatial overlap to measure whether a bounding box is true positive or false positive, and the threshold of the spatial overlap is set to 0.6. The spatial overlap is calculated by the following equation:34

Eq. (18)

$$A(GT_i^k, DT_j^k) = \frac{\mathrm{area}(GT_i^k \cap DT_j^k)}{\mathrm{area}(GT_i^k \cup DT_j^k)},$$
where $\mathrm{area}(GT_i^k \cap DT_j^k)$ denotes the intersection of the ground-truth bounding box $GT_i$ and the detected bounding box $DT_j$ in frame $k$, and $\mathrm{area}(GT_i^k \cup DT_j^k)$ denotes their union.
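For reference, a minimal sketch of the spatial overlap test of Eq. (18) for axis-aligned bounding boxes is shown below (illustrative helper; boxes are given as corner coordinates, and a detection is counted as a true positive when the overlap exceeds 0.6).

```python
def spatial_overlap(gt, dt):
    """Spatial overlap of Eq. (18) between a ground-truth box gt and a detected
    box dt, each given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(gt[0], dt[0]), max(gt[1], dt[1])
    ix2, iy2 = min(gt[2], dt[2]), min(gt[3], dt[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((gt[2] - gt[0]) * (gt[3] - gt[1]) +
             (dt[2] - dt[0]) * (dt[3] - dt[1]) - inter)
    return inter / union if union > 0 else 0.0
```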

3.3.

Effectiveness of Our Method

Example 1: In Video 1 [Figs. 8(a)–8(e)], the ships are so small that they take up only dozens of pixels. Furthermore, the ships have similar shapes and intensity values and are even covered with thin clouds. All these factors make our detection and tracking task more difficult. Figure 8(a) shows the results of the proposed method for frame 2. We can observe that there are five moving ships, including a smaller ship or yacht (ship 5) that was leaving the coast and covered by thin clouds. However, tracking of this ship fails in frame 3, frame 4 [see Fig. 8(b)], and frame 5, because the contrast of this ship is similar to that of the thin cloud. Because ship 5 moves at very high speed, its wake is misjudged as a ship [see Fig. 8(d)]. As shown in Fig. 8(e), a yacht (ship 6) is released from ship 4; fortunately, our algorithm can detect and track this newly emerging ship (ship 6).

Fig. 8

Results of ship detection and tracking in the GO3S satellite video. (a)–(e) Frames 2, 6, 10, 14, 18, respectively. (f) The trajectory of moving ships.


The trajectories of the ships estimated by the proposed ship tracking algorithm are shown in Fig. 8(f). It can be seen that ship 1 to ship 6 have been tracked very well, except that three false alarms are present for ship 5 and one false alarm for ship 6. The recall and precision, calculated over all 19 frames, are listed in Table 1.

Table 1

Overall performance of the proposed algorithm for Video 1.

No. of ship    Recall    Precision
1              1.0       1.0
2              1.0       1.0
3              1.0       1.0
4              1.0       1.0
5              0.79      0.83
6              0.8       1.0

Example 2: Video 2 is covered by thin cloud, especially on the right side of the image. The frames, shown in Figs. 9(a)–9(e), include eight moving ships. In frame 14 [Fig. 9(d)], tracking of one smaller ship (ship 8) fails as it disappears into the thin cloud, because the contrast of this ship is similar to that of the thin cloud. In frame 19 [Fig. 9(e)], the wake of ship 1 is mistakenly detected as a moving ship (enclosed by a very small rectangle).

Fig. 9

Results of ship detection and tracking in the synthetic satellite video. (a)–(e) Frames 2, 6, 10, 14, 19, respectively. (f) The trajectory of moving ships.


The trajectories of the ships estimated by the proposed method are shown in Fig. 9(f). It can be seen that ship 1 to ship 8 have been tracked very well, except that ship 8 is not tracked successfully in two frames and there is a false alarm near ship 1 in frame 19. The recall and precision, calculated over all 19 frames, are listed in Table 2.

Table 2

Overall performance of the proposed algorithm for Video 2.

No. of ship    Recall    Precision
1              1.0       0.94
2              1.0       1.0
3              1.0       1.0
4              1.0       1.0
5              1.0       1.0
6              1.0       1.0
7              1.0       1.0
8              0.89      1.0

3.4.

Comparison with Other Methods

3.4.1.

Saliency map

In this section, we conducted several experiments to compare our DMSM method with other saliency map methods (SR,30 Itti,27 GBVS,35 and the signature algorithm36). The two examples in Figs. 10 and 11 validate that the saliency map obtained by our approach performs better than those obtained by the other methods. As shown in the differential images [Figs. 10(c) and 11(c)], because the ship displacement between the two frames is small, there are holes inside the moving ships. In Figs. 10(d) and 11(d), the SR model cannot clearly detect the moving ships without holes. In Figs. 10(e)–10(f) and 11(e)–11(f), the Itti model and the GBVS model cannot detect all the moving ships. In Figs. 10(g) and 11(g), the signature algorithm can highlight the moving ships; however, the true ship regions are enlarged too much, which causes nearby ships to merge easily. In contrast, our DMSM highlights the regions with moving ships, and these regions are not enlarged too much, as shown in Figs. 10(h) and 11(h).

Fig. 10

The saliency map of differential image in Video 1, (AVI, 113 KB) [URL: https://doi.org/10.1117/1.JRS.13.026511.1]. (a) and (b) The compensated frame and the current frame. (c) Differential image of (a) and (b). (d) Saliency map for SR. (e) Saliency map for Itti. (f) Saliency map for GBVS. (g) Saliency map for signature algorithm. (h) DMSM.


Fig. 11

The saliency map of the differential image in Video 2, (AVI, 1,953 KB) [URL: https://doi.org/10.1117/1.JRS.13.026511.2]. (a) and (b) The compensated frame and the current frame. (c) Differential image of (a) and (b). (d) Saliency map for SR. (e) Saliency map for Itti. (f) Saliency map for GBVS. (g) Saliency map for signature algorithm. (h) DMSM.


3.4.2.

Ship detection

To test the effectiveness of the proposed method, we compare it with the recent R2CNN_head method10 and several background subtraction methods37 from the BGSLibrary (available at https://github.com/andrewssobral/bgslibrary). For the R2CNN_head method, we use the pretrained model "ResNet_v1_101.ckpt" (available at https://github.com/yangxue0827/R2CNN_HEAD_FPN_Tensorflow) to initialize the network, and we further train the model on our dataset with ships of different sizes for a total of 10k iterations. The results for Video 1 and Video 2 are shown in Tables 3 and 4, respectively. As can be seen, the precision of the proposed method reaches 96% and 98.6%, and its recall reaches 93% and 99.3% for Video 1 and Video 2, respectively, which indicates that the proposed method is highly accurate. The precision of the R2CNN_head method reaches 97.6% and 100%, while its recall is 75% and 64.5% for Video 1 and Video 2 with small ships, respectively.

Table 3

Comparison results of ship detection for Video 1.

Method                                 TP+FN    TP+FP    TP    Recall (%)    Precision (%)
Our method                             100      96       93    93            96
R2CNN_head (Ref. 10)                   100      86       84    84            97.6
AdaptiveBackgroundLearning             100      155      91    91            58.7
FuzzyChoquetIntegral (Ref. 38)         100      150      74    74            49.3
DPWrenGABS (Ref. 39)                   100      147      91    91            61.9
MixtureOfGaussianV1BGS (Ref. 40)       100      39       35    35            89.74
MultiLayerBGS (Ref. 41)                100      108      94    94            87.03
PixelBasedAdaptiveSegmenter (Ref. 42)  100      32       28    28            87.5

Table 4

Comparison results of ship detection for Video 2.

Method                         TP+FN    TP+FP    TP     Recall (%)    Precision (%)
Our method                     152      153      151    99.3          98.6
R2CNN_head                     152      113      113    74.3          100
AdaptiveBackgroundLearning     152      202      150    98.68         74.25
FuzzyChoquetIntegral           152      145      99     65.13         68.27
DPWrenGABS                     152      218      149    98.02         68.3
MixtureOfGaussianV1BGS         152      53       35     23.02         66.03
MultiLayerBGS                  152      156      144    94.73         92.3
PixelBasedAdaptiveSegmenter    152      109      74     48.7          67.9

3.4.3.

Ship tracking

We compared our method with the classical Lucas–Kanade tracker43 and several other tracking methods in the VIVID Tracking Evaluation Testbed V3.044 (basic mean shift,34 histogram shift, variance ratio,45 and peak difference feature shift).

For the Lucas–Kanade tracker,24 corners were first detected in one frame and then tracked in the other frames. For the other tracking methods in the VIVID Tracking Evaluation Testbed V3.0, each moving ship was manually marked in the first frame and tracked in the other frames using these methods, and we computed the recall and precision for each ship.

Figure 12 shows the tracking results of the six ships in Video 1. As can be observed, our method achieves higher recall and precision in ship tracking than the other five methods. Moreover, it is robust to ships of different sizes and provides a feasible way for ship detection and tracking in satellite video. As Fig. 12 shows, our proposed method performs the best among the tested methods, even for ship 5 and ship 6, which are very small. Figure 13 shows the tracking results of the eight ships in Video 2. Our method again achieves higher recall and precision in ship tracking than the other methods. As shown in Fig. 13, our proposed method performs the best among the tested methods for ships of different sizes, which provides a feasible way for ship detection and tracking.

Fig. 12

Recall and precision of six ships with different methods in Video 1. (a) Recall. (b) Precision.


Fig. 13

Recall and precision of eight ships with different methods in Video 2. (a) Recall. (b) Precision.


As shown in Figs. 12 and 13, the Lucas–Kanade method fails to track small ships, such as ship 5 and ship 6 in Video 1 and ship 7 and ship 8 in Video 2. The mean shift method gives poor tracking results for ships of all sizes. The histogram shift method and the peak difference method are good at tracking large ships but very poor at tracking small ships. The variance ratio method is unstable whether tracking large or small ships.

4.

Conclusion

This paper provides a method for ship detection and tracking from satellite video. In the ship detection stage, motion compensation between two adjacent frames is required to stabilize the background. After that, the foreground is extracted from the background based on the DMSM of differential images. Then, moving ships are detected based on the analysis of the surrounding contrast and ship characteristics. In the ship tracking stage, the moving ships are matched based on the combination of centroid distance, area ratio, and histogram distance of ships between every two frames. Finally, ship tracking is realized based on a ship association scheme. Our method has been tested using a set of satellite videos with ships of different sizes. The ships have been successfully detected and tracked, and the performance is analyzed by computing recall and precision.

Acknowledgments

The authors would like to thank the editors and anonymous reviewers for their valuable comments and helpful suggestions, which greatly improved the paper's quality. This work was supported by the National Natural Science Foundation of China under Grant Nos. 61773383 and 61702520.

References

1. 

S. P. Zhang, Z. H. Qi and D. L. Zhang, “Ship tracking using background subtraction and inter-frame correlation,” in Int. Cong. Image Sig. Process., 1 –4 (2009). https://doi.org/10.1109/CISP.2009.5302115 Google Scholar

2. 

S. Fefilatyev, D. Goldgof and C. Lembke, “Tracking ships from fast moving camera through image registration,” in Proc. IEEE Conf. Pattern Recognit., 3500 –3503 (2010). https://doi.org/10.1109/ICPR.2010.854 Google Scholar

3. 

J. W. Wu et al., “Ship target detection and tracking in cluttered infrared imagery,” Opt. Eng., 50 (5), 057207 (2011). https://doi.org/10.1117/1.3578402 OPEGAR 0091-3286 Google Scholar

4. 

S. X. Qi et al., “Low-resolution ship detection from high-altitude aerial images,” Proc. SPIE, 10608 1060805 (2018). https://doi.org/10.1117/12.2282780 PSISDG 0277-786X Google Scholar

5. 

W. Liu et al., “Inshore ship detection with high-resolution SAR data using salience map and kernel density,” Proc. SPIE, 10033 100333V (2016). https://doi.org/10.1117/12.2245325 PSISDG 0277-786X Google Scholar

6. 

X. F. Wei, X. Q. Wang and J. S. Chong, “Local region power spectrum-based unfocused ship detection method in synthetic aperture radar images,” J. Appl. Remote Sens., 12 (1), 016026 (2018). https://doi.org/10.1117/1.JRS.12.016026 Google Scholar

7. 

Q. P. Wang et al., “Inshore ship detection using high-resolution synthetic aperture radar images based on maximally stable extremal region,” J. Appl. Remote Sens., 9 (1), 095094 (2015). https://doi.org/10.1117/1.JRS.9.095094 Google Scholar

8. 

S. Tian, C. Wang and H. Zhang, “Ship detection method for single-polarization synthetic aperture radar imagery based on target enhancement and nonparametric clutter estimation,” J. Appl. Remote Sens., 9 (1), 096073 (2015). https://doi.org/10.1117/1.JRS.9.096073 Google Scholar

9. 

F. Yang, Q. Xu and B. Li, “Ship detection from optical satellite images based on saliency segmentation and structure-LBP feature,” IEEE Geosci. Remote Sens. Lett., 14 (5), 602 –606 (2017). https://doi.org/10.1109/LGRS.2017.2664118 Google Scholar

10. 

X. Yang et al., “Position detection and direction prediction for arbitrary-oriented ships via multitask rotation region convolutional neural network,” IEEE Access., 6 50839 –50849 (2018). https://doi.org/10.1109/ACCESS.2018.2869884 Google Scholar

11. 

G. Yang et al., “Ship detection from optical satellite images based on sea surface analysis,” IEEE Trans. Geosci. Remote Sens. Lett., 11 (3), 641 –645 (2014). https://doi.org/10.1109/LGRS.2013.2273552 Google Scholar

12. 

C. H. Deng et al., “Ship detection from optical satellite image using optical flow and saliency,” Proc. SPIE, 8921 89210F (2013). https://doi.org/10.1117/12.2031115 PSISDG 0277-786X Google Scholar

13. 

Y. Yao et al., “Ship detection in optical remote sensing images based on deep convolutional neural networks,” J. Appl. Remote Sens., 11 (4), 042611 (2017). https://doi.org/10.1117/1.JRS.11.042611 Google Scholar

14. 

J. Tang et al., “Compressed-domain ship detection on spaceborne optical image using deep neural network and extreme learning machine,” IEEE Trans. Geosci. Remote Sens., 53 (3), 1174 –1185 (2015). https://doi.org/10.1109/TGRS.2014.2335751 Google Scholar

15. 

Z. Shi et al., “Ship detection in high-resolution optical imagery based on anomaly detector and local shape feature,” IEEE Trans. Geosci. Remote Sens., 52 (8), 4511 –4523 (2014). https://doi.org/10.1109/TGRS.2013.2282355 Google Scholar

16. 

Z. Zou and Z. Shi, “Ship detection in spaceborne optical image with SVD networks,” IEEE Trans. Geosci. Remote Sens., 54 (10), 5832 –5845 (2016). https://doi.org/10.1109/TGRS.2016.2572736 Google Scholar

17. 

N. Proia and V. Pagé, “Characterization of a Bayesian ship detection method in optical satellite images,” IEEE Geosci. Remote Sens. Lett., 7 (2), 226 –230 (2010). https://doi.org/10.1109/LGRS.2009.2031826 Google Scholar

18. 

G. Kopsiaftis and K. Karantzalos, “Vehicle detection and traffic density monitoring from very high resolution satellite video data,” in IEEE Int. Geosci. and Remote Sens. Symp. (IGARSS), 1881 –1884 (2015). https://doi.org/10.1109/IGARSS.2015.7326160 Google Scholar

19. 

T. Yang et al., “Small moving vehicle detection in a satellite video of an urban area,” Sensors, 16 (9), 1528 (2016). https://doi.org/10.3390/s16091528 SNSRES 0746-9462 Google Scholar

20. 

S. Larsen, H. Koren and R. Solberg, “Traffic monitoring using very high resolution satellite imagery,” Photogramm. Eng. Remote Sens., 75 (7), 859 –869 (2009). https://doi.org/10.14358/PERS.75.7.859 Google Scholar

21. 

N. S. Rao et al., “Estimation of ship velocities from MODIS and OCM,” IEEE Trans. Geosci. Remote Sens. Lett., 2 (4), 437 –439 (2005). https://doi.org/10.1109/LGRS.2005.853572 Google Scholar

22. 

L. B. Yao et al., “A novel ship-tracking method for GF-4 satellite sequential images,” Sensors, 18 2007 (2018). https://doi.org/10.3390/s18072007 SNSRES 0746-9462 Google Scholar

23. 

Z. X. Zhang et al., “Application potential of GF-4 images for dynamic ship monitoring,” IEEE Trans. Geosci. Remote Sens. Lett., 14 (6), 911 –915 (2017). https://doi.org/10.1109/LGRS.2017.2687700 Google Scholar

24. 

J. Shi and C. Tomasi, “Good features to track,” in IEEE Conf. Comput. Vision and Pattern Recognit. (CVPR), 593 –600 (1994). https://doi.org/10.1109/CVPR.1994.323794 Google Scholar

25. 

S. Periaswamy and H. Farid, “Elastic registration in the presence of intensity variations,” IEEE Trans. Med. Imaging, 22 (7), 865 –874 (2003). https://doi.org/10.1109/TMI.2003.815069 ITMID4 0278-0062 Google Scholar

26. 

R. Jain, “Dynamic scene analysis using pixel-based processes,” IEEE Comput., 14 (8), 12 –18 (1981). https://doi.org/10.1109/C-M.1981.220557 Google Scholar

27. 

L. Itti, C. Koch and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Trans. Pattern Anal. Mach. Intell., 20 (11), 1254 –1259 (1998). https://doi.org/10.1109/34.730558 ITPIDJ 0162-8828 Google Scholar

28. 

Z. Liang, L. Yu and S. Yi, “Inshore ship detection via saliency and context information in high-resolution SAR images,” IEEE Trans. Geosci. Remote Sens. Lett., 13 (12), 1870 –1874 (2016). https://doi.org/10.1109/LGRS.2016.2616187 Google Scholar

29. 

S. Wang et al., “New hierarchical saliency filtering for fast ship detection in high-resolution SAR images,” IEEE Trans. Geosci. Remote Sens., 55 (1), 351 –362 (2017). https://doi.org/10.1109/TGRS.2016.2606481 IGRSD2 0196-2892 Google Scholar

30. 

X. D. Hou and L. Q. Zhang, “Saliency detection: a spectral residual approach,” in IEEE Conf. Comput. Vision and Pattern Recognit. (CVPR), 1 –8 (2007). https://doi.org/10.1109/CVPR.2007.383267 Google Scholar

31. 

N. Otsu, “A threshold selection method from gray-level histograms,” IEEE Trans. Syst. Man Cybern., 9 (1), 62 –66 (1979). https://doi.org/10.1109/TSMC.1979.4310076 Google Scholar

32. 

D. Comaniciu, V. Ramesh and P. Meer, “Kernel-based object tracking,” IEEE Trans. Pattern Anal. Mach. Intell., 25 (5), 564 –577 (2003). https://doi.org/10.1109/TPAMI.2003.1195991 ITPIDJ 0162-8828 Google Scholar

34. 

F. Yin, D. Markris and S. Velastin, “Performance evaluation of object tracking algorithms,” in Proc. IEEE Int. Workshop Perform. Eval. Tracking Surveill., 1 –8 (2007). Google Scholar

35. 

J. Harel, C. Koch and P. Perona, “Graph-based visual saliency,” in Proc. Adv. Neural Inf. Process. Syst., 681 –688 (2007). Google Scholar

36. 

X. Hou, J. Harel and C. Koch, “Image signature: highlighting sparse salient regions,” IEEE Trans. Pattern Anal. Mach. Intell., 34 (1), 194 –201 (2012). https://doi.org/10.1109/TPAMI.2011.146 ITPIDJ 0162-8828 Google Scholar

37. 

A. Sobral and A. Vacavant, “A comprehensive review of background subtraction algorithms evaluated with synthetic and real videos,” Comput. Vision Image Understanding, 122 4 –21 (2014). https://doi.org/10.1016/j.cviu.2013.12.005 Google Scholar

38. 

F. El Baf, T. Bouwmans and B. Vachon, “Fuzzy integral for moving object detection,” in IEEE Int. Conf. Fuzzy Syst. (FUZZ-IEEE), 1729 –1736 (2008). https://doi.org/10.1109/FUZZY.2008.4630604 Google Scholar

39. 

C. Wren et al., “Pfinder: real-time tracking of the human body,” IEEE Trans. Pattern Anal. Mach. Intell., 19 (7), 780 –785 (1997). https://doi.org/10.1109/34.598236 ITPIDJ 0162-8828 Google Scholar

40. 

P. Kaewtrakulpong and R. Bowden, “An improved adaptive background mixture model for realtime tracking with shadow detection,” in Eur. Workshop Adv. Video Based Surveill. Syst. (AVSS), (2001). Google Scholar

41. 

J. Yao and J.-M. Odobez, “Multi-layer background subtraction based on color and texture,” in IEEE Conf. Comput. Vision and Pattern Recognit. (CVPR), (2007). https://doi.org/10.1109/CVPR.2007.383497 Google Scholar

42. 

M. Hofmann, P. Tiefenbacher and G. Rigoll, “Background segmentation with feedback: the pixel-based adaptive segmenter,” in IEEE Comput. Soc. Conf. Comput. Vision and Pattern Recognit. Workshops (CVPRW), 38 –43 (2012). https://doi.org/10.1109/CVPRW.2012.6238925 Google Scholar

43. 

J. Y. Bouguet, “Pyramidal implementation of the Lucas Kanade feature tracker,” (2000). Google Scholar

44. 

R. T. Collins, X. H. Zhou and S. K. Teh, “An open source tracking testbed and evaluation web site,” in IEEE Int. Workshop on Performance Evaluation of Tracking and Surveillance (PETS) , (2005). Google Scholar

45. 

R. Collins, Y. Liu and M. Leordeanu, “On-line selection of discriminative tracking features,” IEEE Trans. Pattern Anal. Mach. Intell., 27 (10), 1631 –1643 (2005). https://doi.org/10.1109/TPAMI.2005.205 ITPIDJ 0162-8828 Google Scholar

Biography

Haichao Li received his BS and MS degrees in mechanical and electrical engineering from Beijing University of Chemical Technology, Beijing, China, in 2001 and 2004, respectively, and his PhD from the School of Instrument Science and Opto-Electronics Engineering, Beijing University of Aeronautics and Astronautics, Beijing, China, in 2008. He is currently an associate professor at Qian Xuesen Laboratory of Space Technology, China Aerospace Science and Technology Corporation. His research interests include motion detection and tracking, deep learning, and stereo vision.

Liang Chen received his BS and MS degrees in photogrammetry and remote sensing from Wuhan University, Wuhan, China, in 2003 and 2006, respectively, and his PhD from the State Key Laboratory of Remote Sensing Science, Chinese Academy of Sciences, Beijing, China, in 2009. He is currently a senior engineer at Qian Xuesen Laboratory of Space Technology, China Aerospace Science and Technology Corporation. His research interests include remote sensing target tracking, data mining, and parameter retrieval.

Feng Li (M'07–SM'10) received his BSEE degree from Lanzhou Railway University in 1999, his MEng degree from China Academy of Space Technology in 2002, and his PhD in electrical engineering from The University of New South Wales in 2009. After several years working on astronomical image processing at CSIRO, Australia, and on remote sensing image processing at the Chinese Academy of Sciences, he is currently a PI at the Qian Xuesen Laboratory of Space Technology. His research interests include image registration, super resolution, compressive sensing, and motion detection.

Meiyu Huang received her BS degree in computer science and technology from Huazhong University of Science and Technology, Wuhan, China, in 2010, and her PhD degree in computer application technology from the University of Chinese Academy of Sciences, Beijing, China, in 2016. She is currently an assistant researcher in the Qian Xuesen Laboratory of Space Technology, China Academy of Space Technology, Beijing, China. Her research interests include machine learning, ubiquitous computing, human-computer interaction, computer vision and image processing.

CC BY: © The Authors. Published by SPIE under a Creative Commons Attribution 4.0 Unported License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI.
Haichao Li, Liang Chen, Feng Li, and Meiyu Huang "Ship detection and tracking method for satellite video based on multiscale saliency and surrounding contrast analysis," Journal of Applied Remote Sensing 13(2), 026511 (6 June 2019). https://doi.org/10.1117/1.JRS.13.026511
Received: 11 January 2019; Accepted: 10 May 2019; Published: 6 June 2019
KEYWORDS: Video; Satellites; Video surveillance; Clouds; Detection and tracking algorithms; Image segmentation; Target detection
