Proc. SPIE 9871, Sensing and Analysis Technologies for Biomedical and Cognitive Applications, 2016
KEYWORDS: Visual process modeling, Data modeling, Visualization, Video, Feature extraction, Video surveillance, Video processing, Multimedia, Statistical modeling, RGB color model
Rapid event detection must keep pace with an emerging need to process large video collections; whether for surveillance footage or unconstrained web videos, automatically recognizing high-level, complex events remains a challenging task. Motivated by the complexity, computational demands, and frequent non-replicability of existing methods, we designed a simple system that is fast, effective, and carries minimal memory and storage overhead. Our system is clearly described, modular in nature, replicable on any desktop machine, and demonstrated with extensive experiments, backed by insightful analysis of different Convolutional Neural Networks (CNNs), both stand-alone and fused with others. With a large corpus of unconstrained, real-world video data, we examine the usefulness of different CNN models as feature extractors for modeling high-level events, i.e., pre-trained CNNs that differ in architecture, training data, and number of outputs. For each CNN, we sample frames at 1 fps from all training exemplars to train one-vs-rest SVMs for each event. To represent videos, frame-level features were fused using a variety of techniques; the best was to max-pool between predetermined shot boundaries and then average-pool the shot-level results to form the final video-level descriptor. Through extensive analysis, we draw several insights on using pre-trained CNNs as off-the-shelf feature extractors for event detection. Fusing the SVM outputs of different CNNs showed that some combinations are complementary. We conclude that no single CNN works best for all events, as some events are more object-driven while others are more scene-based. Our top performance resulted from learning event-dependent weights for different CNNs.
When dealing with sparse or no labeled data in the target domain, transfer learning achieves appealing performance by borrowing supervised knowledge from external domains. Recently, deep structure learning has been exploited in transfer learning because of its power to extract effective knowledge through a multi-layer strategy, making deep transfer learning a promising way to address cross-domain mismatch. In general, cross-domain disparity can result from differences between the source and target distributions or from different modalities, e.g., Midwave IR (MWIR) and Longwave IR (LWIR). In this paper, we propose a Weighted Deep Transfer Learning framework for automatic target classification in a task-driven fashion. Specifically, deep features and classifier parameters are obtained simultaneously for optimal classification performance. In this way, the proposed deep structure extracts more effective features under the guidance of the classifier; in turn, the classifier improves because it is optimized on more discriminative features. Furthermore, we build a weighted scheme that couples source and target outputs by assigning pseudo labels to target data, so that knowledge is transferred from the source (i.e., MWIR) to the target (i.e., LWIR). Experimental results on real databases demonstrate the superiority of the proposed algorithm over competing methods.
Discovering the variations in human torso shape plays a key role in many design-oriented applications, such as suit design. With recent advances in 3D surface imaging technologies, people can obtain 3D human torso data that provide more information than traditional measurements. However, how to find distinct human shapes in 3D torso data is still an open problem. In this paper, we propose a spectral clustering approach on the torso manifold to address this problem. We first represent the high-dimensional torso data in a low-dimensional space using a manifold learning algorithm. Spectral clustering is then performed to obtain several disjoint clusters. Experimental results show that the clusters discovered by our approach capture discrepancies in both gender and body shape, and that our approach achieves better performance than the compared clustering method.