Video-based person reidentification is a challenging and important task in surveillance applications. Several shallow and deep networks have been proposed for it; however, the performance of existing shallow networks does not generalize well on large datasets. To improve generalization ability, we propose a shallow end-to-end network that combines a two-stream convolutional neural network, discriminative visual attention, and a recurrent neural network, trained with triplet and softmax losses, to learn spatiotemporal fusion features. To use both spatial and temporal information effectively, we apply spatial, temporal, and spatiotemporal pooling. In addition, we contribute a large dataset of airborne videos for person reidentification, named DJI01. It includes various challenging conditions, such as occlusion, illumination changes, people with similar clothes, and the same people on different days. We perform elaborate qualitative and quantitative analyses to demonstrate the robust performance of the proposed model.
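The two core ingredients named above, temporal pooling of per-frame features and a triplet objective, can be illustrated with a minimal NumPy sketch. These are generic stand-ins for the actual network; the margin value and function names are assumptions, not the paper's.

```python
import numpy as np

def temporal_pool(frame_features):
    """Temporal average pooling: collapse T per-frame feature
    vectors (T x D array) into one clip-level descriptor."""
    return frame_features.mean(axis=0)

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Pull same-identity clip descriptors together and push
    different identities apart by at least `margin` (hinge form)."""
    d_ap = np.linalg.norm(anchor - positive)   # same-identity distance
    d_an = np.linalg.norm(anchor - negative)   # different-identity distance
    return max(0.0, d_ap - d_an + margin)
```

In training, the softmax loss supervises identity classification while the triplet loss shapes the embedding space; the sketch shows only the latter's geometry.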
Visual cryptography is a powerful technique that combines the notions of perfect ciphers and secret sharing in cryptography with that of raster graphics. A binary image can be divided into shares that can be stacked together to approximately recover the original image. Unfortunately, it has not been used much, primarily because the decryption process entails a severe degradation in image quality in terms of loss of resolution and contrast. Its usage is also hampered by the lack of proper techniques for handling gray-scale and color images. We develop a novel technique that enables visual cryptography of color as well as gray-scale images. With the use of halftoning and a novel microblock encoding scheme, the technique has a unique flexibility: a single encryption of a color image supports three types of decryption on the same ciphertext, each recovering the image at a different quality. Physical transparency stacking recovers an image of traditional visual cryptography quality; an enhanced stacking technique decrypts to a halftone-quality image; and a computation-based decryption scheme makes perfect recovery of the original image possible. Based on this basic scheme, we establish a progressive mechanism to share color images at multiple resolutions. We extract shares from each resolution layer to construct a hierarchical structure; the images of different resolutions can then be restored by stacking the different shared images together. Thus, our technique enables flexible decryption. We implement our technique and present results.
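The share-construction step underlying such schemes can be sketched for the classical binary (2, 2) case, where each secret pixel expands into a 2x2 subpixel block. This is the standard textbook construction, not the paper's microblock encoding, and the function names are illustrative.

```python
import numpy as np

RNG = np.random.default_rng(0)

# Complementary 2x2 subpixel patterns (1 = black). A pattern OR-ed with
# its complement is all-black; OR-ed with itself it stays half-black.
PATTERNS = [
    np.array([[1, 0], [0, 1]], dtype=np.uint8),
    np.array([[0, 1], [1, 0]], dtype=np.uint8),
]

def make_shares(secret):
    """Split a binary image (1 = black) into two noise-like shares."""
    h, w = secret.shape
    s1 = np.zeros((2 * h, 2 * w), dtype=np.uint8)
    s2 = np.zeros_like(s1)
    for y in range(h):
        for x in range(w):
            p = PATTERNS[RNG.integers(2)]
            s1[2*y:2*y+2, 2*x:2*x+2] = p
            # White pixel: same pattern in both shares;
            # black pixel: complementary pattern in the second share.
            s2[2*y:2*y+2, 2*x:2*x+2] = p if secret[y, x] == 0 else 1 - p
    return s1, s2

def stack(s1, s2):
    """Physically stacking transparencies is a pixelwise OR."""
    return np.maximum(s1, s2)
```

Stacking makes secret-black blocks fully black and secret-white blocks half black, which is exactly the resolution and contrast loss the abstract says the computation-based decryption avoids.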
During data hiding, distortions are introduced in an original image because of quantization errors, bit replacement, or truncation at the gray-scale limit. These distortions are irreversible and visible, which is unacceptable in some applications, such as medical imaging. The reversible watermarking technique overcomes this problem by retrieving the original image from the watermarked image. We present a novel reversible watermarking algorithm with a high embedding capacity that takes the human visual system (HVS) into account. We use arithmetic coding to compress a part of the original image and store the compressed data, together with the necessary authentication information, as the payload. The payload is then embedded within the original image in a manner guided by the HVS, so the watermarked image contains no perceptible artifacts. During the extraction phase, we extract the payload, restore an exact copy of the original image, and verify its authenticity. Experimental results show that our method provides a higher embedding capacity than other algorithms proposed in the literature.
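The compress-and-embed idea can be sketched as follows, with two loud substitutions: zlib stands in for the paper's arithmetic coder, and plain LSB substitution stands in for the HVS-guided embedding. The payload layout (length header, compressed LSB plane, message) is an illustrative assumption.

```python
import numpy as np
import zlib

def embed(image, message):
    """Compress the LSB plane, then write (header + compressed plane +
    message) back into the LSBs, making the embedding reversible."""
    flat = image.ravel()
    lsb = (flat & 1).astype(np.uint8)
    packed = zlib.compress(np.packbits(lsb).tobytes(), 9)
    payload = len(packed).to_bytes(4, "big") + packed + message
    bits = np.unpackbits(np.frombuffer(payload, np.uint8))
    assert bits.size <= flat.size, "not enough embedding capacity"
    out = flat.copy()
    out[:bits.size] = (out[:bits.size] & np.uint8(0xFE)) | bits
    return out.reshape(image.shape)

def extract(marked, msg_len):
    """Read the payload from the LSBs, restore the original LSB plane,
    and return (restored image, message)."""
    flat = marked.ravel()
    payload = np.packbits(flat & 1).tobytes()
    n = int.from_bytes(payload[:4], "big")
    message = payload[4 + n:4 + n + msg_len]
    lsb = np.unpackbits(np.frombuffer(zlib.decompress(payload[4:4 + n]),
                                      np.uint8))[:flat.size]
    restored = (flat & np.uint8(0xFE)) | lsb
    return restored.reshape(marked.shape), message
```

As in the paper, the usable capacity is whatever the compressor saves on the overwritten plane, so smoother images embed more.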
Music query-by-humming has attracted much research interest recently. It is a challenging problem, since the hummed query inevitably contains much variation and inaccuracy. Furthermore, the similarity computation between the query tune and the reference melody is not easy, due to the difficulty in ensuring proper alignment: the query tune can be rendered at an unknown speed and is usually an arbitrary subsequence of the target reference melody. Many previous methods, which adopt note segmentation and string matching, suffer drastically from note segmentation errors, which hurt both retrieval accuracy and efficiency. Some methods solve the alignment issue by controlling the speed of articulation of queries, which is inconvenient because it forces users to hum along with a metronome. Other techniques introduce arbitrary rescaling in time, but this is computationally very inefficient. In this paper, we introduce a melody alignment technique that addresses the robustness and efficiency issues. We also present a new melody similarity metric, computed directly on the melody contours of the query data. This approach cleanly separates alignment and similarity measurement in the search process. We show how to robustly and efficiently align the query melody with the reference melodies and how to measure the similarity subsequently. We have carried out extensive experiments. Our melody alignment method can reduce the matching candidates to 1.7% with a 95% correct alignment rate, and the overall retrieval system achieves 80% recall in the top-10 rank list. The results demonstrate the robustness and effectiveness of the proposed methods.
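A toy version of contour-based matching illustrates the two problems the abstract names, unknown tempo and unknown offset: normalize the query contour for transposition, resample it at a few tempo scales, and slide it over the reference. The scale set and the distance measure are illustrative assumptions, not the paper's alignment technique.

```python
import numpy as np

def best_alignment(query, reference, scales=(0.8, 0.9, 1.0, 1.1, 1.25)):
    """Return (best distance, offset, scale) for a pitch-contour query
    matched against a longer reference contour."""
    best = (np.inf, -1, 1.0)
    for s in scales:
        n = max(2, int(round(len(query) * s)))
        # Resample the query to simulate an unknown humming speed.
        q = np.interp(np.linspace(0, len(query) - 1, n),
                      np.arange(len(query)), query)
        q = q - q.mean()                       # transposition invariance
        for off in range(len(reference) - n + 1):
            w = reference[off:off + n]
            d = np.linalg.norm(q - (w - w.mean())) / np.sqrt(n)
            if d < best[0]:
                best = (d, off, s)
    return best
```

Mean subtraction lets a user hum in any key; the explicit scale loop is exactly the brute-force time rescaling the abstract calls inefficient, which motivates a smarter alignment stage.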
Digital video broadcasting is increasingly being adopted all over the world. Broadcasters require that the viewable content of pay channels be protected from unauthorized copying and distribution by subscribers (copyright protection), while subscribers require assurance that they cannot be wrongfully implicated by the broadcasters (customers' rights protection). In this paper, we present an integrated solution that addresses both copyright protection and customers' rights protection in a video broadcasting environment. Copyright protection is achieved using a mask-based watermarking technique, and customers' rights protection is obtained through the use of an interactive watermarking protocol.
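A mask-gated additive watermark can be illustrated generically as follows; the mask rule, embedding strength, and correlation detector here are assumptions for illustration, not the paper's scheme or its interactive protocol.

```python
import numpy as np

def embed_mask_watermark(image, key, alpha=8.0):
    """Add a keyed +/-1 pattern only where a binary mask allows it."""
    rng = np.random.default_rng(key)
    pattern = rng.choice([-1.0, 1.0], size=image.shape)
    mask = image > image.mean()        # toy mask: embed in brighter regions
    marked = np.clip(image + alpha * mask * pattern, 0, 255)
    return marked, pattern, mask

def detect(candidate, pattern, mask):
    """Correlation detector: responds near alpha if the watermark is
    present in the masked region, near zero otherwise."""
    return float((candidate * pattern * mask).sum() / max(mask.sum(), 1))
```

Detection needs only the key (to regenerate the pattern) and the mask, not the original frame, which is the property a broadcast-scale scheme typically relies on.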
Video segmentation is an important step in many video applications. We observe that a video shot boundary is a multi-resolution edge phenomenon in the feature space. Based on this observation, we have developed a novel temporal multi-resolution analysis (TMRA) algorithm using Canny wavelets to perform temporal video segmentation. Information across multiple resolutions is used to both detect and locate abrupt and gradual transitions. We present the theoretical basis of the algorithm, followed by the implementation and results. In this paper, the TMRA technique has been implemented using color histograms in the raw domain and DCT coefficients in compressed video streams as the feature space. Experimental results show that this method can detect and characterize both abrupt and gradual shot boundaries. The technique also shows good noise tolerance.
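A toy detector in the same multi-resolution spirit: build a frame-to-frame histogram-difference signal, smooth it at several temporal scales, and keep only cuts that stand out at every scale. Box smoothing stands in for the paper's Canny-wavelet responses, and the bin count and threshold are assumptions.

```python
import numpy as np

def shot_boundaries(frames, scales=(1, 2, 4), thresh=0.3):
    """Return indices of frames that start a new shot, judged by a
    histogram-difference edge that persists across temporal scales."""
    hists = np.stack([np.histogram(f, bins=16, range=(0, 256))[0] / f.size
                      for f in frames])
    diff = np.abs(np.diff(hists, axis=0)).sum(axis=1)   # L1 distance per cut
    keep = np.ones(len(diff), dtype=bool)
    for s in scales:
        smooth = np.convolve(diff, np.ones(s) / s, mode="same")
        keep &= smooth > thresh   # boundary must survive every scale
    return np.flatnonzero(keep) + 1   # +1: index of the new shot's first frame
```

An abrupt cut stays sharp at all scales, while isolated noise spikes are suppressed once the signal is smoothed, which is the intuition behind treating boundaries as multi-resolution edges.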
Interactive medical visualization requires fast processing of huge amounts of data, creating a need for compact storage and efficient handling of the voxel input from CT and MRI machines. The linear octree data structure is an efficient representation that requires less storage and is amenable to different kinds of geometric operations. This data structure is particularly useful in visualizing thresholded images, which are binary images. Several algorithms exist to generate a linear octree from binary voxel data with time complexity O(n^3) for an input of n^3 voxels. We present an algorithm that first extracts the surface of the object. Based on this surface data, the object is partitioned into a set of parallelepipeds, where each parallelepiped is a contiguous run of voxels along one axis. Starting from the lowest level of the octree, the algorithm proceeds iteratively to the highest level, computing maximal overlaps of the parallelepipeds at each level. At any level, the voxels that are not in the overlap are octree nodes and are output at that level; the maximal overlapped parallelepipeds form the input to the next higher level. For a connected object of n^3 voxels, the algorithm has a time complexity of O(S), where S is the size of the surface of the object. The algorithm has been implemented and tested on a variety of medical data. We also show how this algorithm can be parallelized.
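The target representation can be illustrated with a naive O(n^3) recursive builder that emits (locational code, size) leaves; this shows only what a linear octree is, not the paper's O(S) surface-and-parallelepiped algorithm.

```python
import numpy as np

def linear_octree(vol, code="", out=None):
    """Emit (locational code, cube side) leaves of a linear octree for a
    cubical binary volume; uniform occupied blocks become single leaves,
    empty blocks are omitted (as in linear-octree encodings)."""
    if out is None:
        out = []
    if vol.all():                         # fully occupied: one leaf
        out.append((code, vol.shape[0]))
    elif vol.any():                       # mixed: subdivide into 8 octants
        h = vol.shape[0] // 2
        for i, (z, y, x) in enumerate(np.ndindex(2, 2, 2)):
            linear_octree(vol[z*h:(z+1)*h, y*h:(y+1)*h, x*h:(x+1)*h],
                          code + str(i), out)
    return out
```

Each code digit names an octant at one level, so a short list of codes replaces the full voxel grid; the paper's contribution is producing this same output in time proportional to the object's surface rather than its volume.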