With the growing popularity of commercial unmanned aerial vehicles (UAVs), people have easy access to UAVs. However, privacy and safety can be threatened when a UAV flies over airports, private yards, and other sensitive sites, so it is important to detect illegal UAVs accurately and promptly at these vulnerable locations. Detection is difficult because motion blur, occlusion, and truncation occur frequently due to the fast movement of UAVs, and the small size of UAVs in images makes correct predictions hard. In this paper, we propose an anchor-free one-stage method for UAV detection. The method eliminates the anchor boxes used in most existing detectors, which makes it simpler and more efficient. We improve detection accuracy in two ways. First, a new multi-scale feature fusion method is proposed to enhance the exchange of semantic information between different scales. Second, a loss function is adopted that increases the proportion of the loss contributed by small UAVs. Experimental results validate the effectiveness of our improvements, and our proposed detector achieves superior performance.
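The second improvement, increasing the loss share of small UAVs, can be illustrated with a minimal sketch. The paper does not specify the exact weighting formula, so the inverse-square-root scaling below is a hypothetical choice; `alpha`, `per_box_losses`, and `box_areas` are illustrative names, not the authors' API.

```python
import math

def size_weighted_loss(per_box_losses, box_areas, image_area, alpha=1.0):
    """Re-weight each box's loss inversely with its relative size, so that
    small objects (e.g. distant UAVs) contribute proportionally more to the
    total. Illustrative sketch only, not the paper's exact loss function."""
    weighted = []
    for loss, area in zip(per_box_losses, box_areas):
        rel = area / image_area          # fraction of the image the box covers
        weight = alpha / math.sqrt(rel)  # smaller box -> larger weight
        weighted.append(loss * weight)
    return sum(weighted) / len(weighted)
```

With this scheme, a tiny 10x10 box in a 1000x1000 image is weighted far more heavily than a 500x500 box with the same raw loss, which counteracts the tendency of detectors to under-fit small instances.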
KEYWORDS: Visualization, RGB color model, Visual process modeling, Data modeling, Digital signal processing, Classification systems, Video, Performance modeling, Iterated function systems, Network architectures
Zero-shot learning (ZSL) has recently attracted increasing attention in visual tasks such as action recognition. We propose a spatiotemporal visual-semantic embedding network (STVSEM) for zero-shot action recognition. First, since two-stream action recognition architectures have achieved excellent results in recent years, a two-stream module is incorporated into our network, using both spatial features (e.g., RGB appearance) and temporal optical flow as visual features to significantly improve visual expressiveness. Then, to alleviate the semantic loss that typically occurs with embedding-based ZSL methods, an autoencoder is introduced to obtain a better semantic representation and to transfer complementary semantic relationship information from seen to unseen classes. Last but not least, a joint embedding mechanism that explores and exploits the relationships between visual data and semantic information in an intermediate space is employed to narrow the gap between vision and semantics. Experimental results on the Charades and UCF101 datasets indicate that the proposed method outperforms state-of-the-art methods in accuracy, which further demonstrates its effectiveness.
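The final inference step of such an embedding-based ZSL pipeline can be sketched simply: once visual features and class semantics have been projected into the shared intermediate space, an unseen action is assigned the class whose semantic embedding is closest to the projected visual feature. The cosine-similarity nearest-class rule below is a common convention, assumed here for illustration; the paper's actual scoring function may differ, and the function and variable names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def zero_shot_predict(visual_embedding, class_semantic_embeddings):
    """Pick the (possibly unseen) class whose semantic embedding is most
    similar to the projected visual feature in the joint space.
    class_semantic_embeddings: dict mapping class name -> embedding vector."""
    return max(class_semantic_embeddings,
               key=lambda c: cosine(visual_embedding,
                                    class_semantic_embeddings[c]))
```

Because classification reduces to nearest-neighbor search over class embeddings, new (unseen) classes can be recognized at test time simply by adding their semantic vectors to the dictionary, with no retraining of the visual branch.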