SPIE publishes accepted journal articles as soon as they are approved for publication. Journal issues are considered In Progress until all articles for an issue have been published. Articles published ahead of the completed issue are fully citable.
Chinese calligraphy is often used to express the beauty of characters in Chinese culture and is well suited to the study of shape representation. The skeleton of a digital line pattern can be treated as a shape descriptor. However, skeleton-biased and reconstruction-incomplete phenomena often arise in skeletonization methods, making it difficult to use the skeleton to convey the beauty of Chinese calligraphy characters. To overcome this difficulty, skeletal line information derived from the skeletal points and indexed boundary points is defined, and its transformation is implemented by a two-phase skeletal line placement (SLP) procedure. Based on the SLP, an effective algorithm comprising SLP-stroke for strokes, SLP-fork for forks, and SLP-end for the end parts of strokes is developed to construct the skeletal line-based shape descriptor. Four indices, namely measurement of skeleton deviation, number of distorted forks, number of spurious strokes, and measurement of reconstructability, are used to evaluate the performance of the proposed approach. Experimental results show that the skeleton-biased phenomenon is greatly reduced and pattern reconstructability close to 100% is achieved, confirming that the proposed skeletonization approach is suitable for Chinese calligraphy character representation and reconstruction.
Vehicle color recognition is easily affected by subtle environmental changes, and existing recognition methods cannot achieve accurate results. A high-accuracy vehicle color recognition method using a hierarchical fine-tuning strategy for urban surveillance videos is proposed. Unlike conventional convolutional neural network-based methods, which usually produce a single classification model, the proposed method combines pretraining with hierarchical fine-tuning to obtain multiple classification models that can adapt to changing illumination conditions. First, GoogLeNet is pretrained on the ILSVRC-2012 dataset to obtain the initial weight parameters of the network. During the first stage of fine-tuning, the whole vehicle color dataset is used to fine-tune the pretrained network and obtain the initial classification model. Then, an image quality assessment method is proposed to evaluate the illumination conditions of each image, and the vehicle color dataset is divided into subdatasets according to the evaluation results. The second stage of fine-tuning is performed on the initial classification model using each subdataset, yielding the final classification model for each subdataset. Experimental results on different databases demonstrate that the proposed method achieves superior recognition accuracy over state-of-the-art methods.
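The illumination-based dataset split can be sketched with a simple quality score. The mean-luma score and the threshold values below are illustrative assumptions, not the paper's actual image quality assessment method:

```python
import numpy as np

def illumination_score(image_rgb):
    """Mean luma of an RGB image with values in [0, 255] (BT.601 weights)."""
    r, g, b = image_rgb[..., 0], image_rgb[..., 1], image_rgb[..., 2]
    return float(np.mean(0.299 * r + 0.587 * g + 0.114 * b))

def split_by_illumination(images, thresholds=(85.0, 170.0)):
    """Partition image indices into dark / normal / bright subdatasets,
    each of which would then fine-tune its own classification model."""
    subsets = {"dark": [], "normal": [], "bright": []}
    lo, hi = thresholds
    for idx, img in enumerate(images):
        s = illumination_score(img)
        if s < lo:
            subsets["dark"].append(idx)
        elif s > hi:
            subsets["bright"].append(idx)
        else:
            subsets["normal"].append(idx)
    return subsets
```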
Recent studies in saliency detection have exploited contrast value as a main feature and background prior as a secondary feature. To apply the background prior, most approaches are based on soft- or hard-segmentation mechanisms, and significant improvements have been achieved. However, because of their reliance on the contrast feature, soft-segmentation (SS)-wise models face technical challenges when high interobject dissimilarity exists. Although hard-segmentation-wise saliency models intuitively use the background prior without the contrast feature, they suffer from local noise due to undesirable discontinuity artifacts. By analyzing the drawbacks of existing models, a combined saliency model reflecting both soft- and hard-segmentation techniques is presented. The proposed model consists of three phases: SS-wise saliency, hard-segmentation-wise saliency, and a final saliency combination. In particular, we propose an iterative reweighting process that decreases the influence of outlier segmentation maps to improve the hard-segmentation-wise saliency. Experimental results show that the proposed model outperforms state-of-the-art models on various benchmark datasets consisting of single-, multiple-, and complex-object images.
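One way to realize such iterative reweighting is to down-weight maps that disagree with the weighted consensus. The inverse-distance weighting below is a hedged sketch, not the authors' exact formulation:

```python
import numpy as np

def reweighted_fusion(maps, n_iter=10, eps=1e-8):
    """Fuse K segmentation/saliency maps; maps far from the weighted
    consensus (outliers) receive smaller weights at each iteration."""
    maps = np.asarray(maps, dtype=float)           # shape (K, H, W)
    k = maps.shape[0]
    w = np.full(k, 1.0 / k)                        # start with uniform weights
    for _ in range(n_iter):
        consensus = np.tensordot(w, maps, axes=1)  # weighted mean map
        dist = np.array([np.mean((m - consensus) ** 2) for m in maps])
        w = 1.0 / (dist + eps)                     # inverse-distance weights
        w /= w.sum()
    return consensus, w
```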
Eye and mouth state analysis is an important step in fatigue detection. An algorithm that analyzes the state of the eye and mouth by extracting contour features is proposed. First, the face area is detected in the acquired image database. Then, the eyes are located by an EyeMap algorithm, and a clustering method is used to extract the sclera-fitted eye contour and calculate its aspect ratio. In addition, an effective algorithm is proposed to handle contour fitting when the eye is affected by strabismus. Meanwhile, a chromatism value is defined in the RGB space, and the mouth is accurately located through lip segmentation. Based on the color differences among the lip, skin, and mouth interior, the inner mouth contour can be fitted to analyze the opening state of the mouth; in addition, a unique and effective yawning judgment mechanism is introduced to determine whether the driver is tired. Three different databases are used to evaluate the performance of the proposed algorithm, which requires no training and is computationally efficient.
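The contour aspect ratio test can be sketched as follows; the bounding-box ratio and the 0.25 threshold are illustrative assumptions, not the paper's exact values:

```python
import numpy as np

def contour_aspect_ratio(contour):
    """Height/width ratio of a fitted contour given as (x, y) points.
    A small ratio suggests a (nearly) closed eye."""
    pts = np.asarray(contour, dtype=float)
    width = pts[:, 0].max() - pts[:, 0].min()
    height = pts[:, 1].max() - pts[:, 1].min()
    return height / width

def eye_state(contour, threshold=0.25):
    """Classify the eye as open or closed from the contour aspect ratio."""
    return "closed" if contour_aspect_ratio(contour) < threshold else "open"
```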
One of the most challenging tasks in computer vision is to emulate the human cognitive ability to extract the salient object in a scene. We tackle the task of unsupervised salient video object segmentation using boundary connectedness and space-time salient regions. First, a boundary prior measure is used to separate salient regions detected in both space and time. Then, background-foreground region connectedness is computed and combined with an appearance model via an iterative energy minimization framework to segment the salient moving object. For temporal consistency, the segmentation result of the current frame is used, in addition to the optical flow and the boundary prior, to segment the next frame. Experiments show that our algorithm performs well for salient video object segmentation on benchmark datasets, even in the presence of various challenges.
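A common way to quantify boundary connectedness divides a region's overlap with the image border by the square root of its area; this is a hedged variant of the boundary prior, not necessarily the paper's exact measure:

```python
import numpy as np

def boundary_connectedness(mask):
    """Boundary connectivity of a binary region: border overlap length
    divided by sqrt(area). Regions that hug the image border score high
    and are likely background; salient objects rarely touch the border."""
    mask = np.asarray(mask, dtype=bool)
    border = np.zeros_like(mask)
    border[0, :] = border[-1, :] = border[:, 0] = border[:, -1] = True
    area = mask.sum()
    if area == 0:
        return 0.0
    return float((mask & border).sum()) / float(np.sqrt(area))
```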
Characteristics of an image, such as smoothness, edges, and texture, can be better preserved using nonlocal differential operators in image processing. We establish an L1-based nonlocal total variation (NLTVL1) model based on Retinex theory that can be solved by a fast computational algorithm via the alternating direction method of multipliers (ADMM). Experimental results demonstrate that our NLTVL1 method performs well at enhancing contrast, eliminating the influence of nonuniform illumination, and suppressing noise. Furthermore, compared with previous works, including traditional Retinex methods and variational Retinex methods, the proposed approach achieves superior edge and texture preservation and needs fewer iterations to recover the reflectance image, as illustrated by examples and statistics.
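Within each ADMM iteration of such an L1 model, the L1 subproblem has a closed-form solution given by the soft-thresholding (shrinkage) operator; a minimal sketch:

```python
import numpy as np

def shrink(x, tau):
    """Soft-thresholding (shrinkage): the proximal operator of tau*|x|,
    applied elementwise in the L1 update step of each ADMM iteration."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)
```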
An effective method to develop anatomically realistic numerical breast phantoms for T1-weighted MR images of different tissue densities is presented. The dielectric properties of breast tissues are calculated and analyzed using different dispersion models (i.e., one- and two-pole Cole–Cole and Debye models). The presented method offers significant improvements over existing MRI-based numerical phantoms in terms of image denoising, tissue segmentation, nonlinear mapping of dielectric properties with realistic shapes using all the dispersion models, and densitywise classification of phantoms. It is a multistep approach in which each MRI voxel is mapped to the appropriate dielectric properties according to the different dispersion models. The MRI data were collected and interpolated according to the size of the uniform grid for finite-difference time-domain computations, followed by preprocessing of the MR images to enhance them. Thereafter, the voxel intensities were segregated into two groups, adipose and fibroglandular tissues, and these tissue intensities were assigned the corresponding dielectric properties. Three-dimensional (3-D) numerical phantoms were created according to all the dispersion models. Comparison among the models shows that, along with frequency, the dielectric properties vary with the dispersion model parameters. It was also observed that the dielectric properties calculated from the one-pole Cole–Cole and two-pole Debye models are closer to the real properties of breast tissues than those from the other models. A generalized method has been defined for developing 3-D phantoms for all breast classes according to the inhomogeneity of the fibroglandular tissues using the dispersion models, with frequency- and dispersion-model-parameter-dependent dielectric properties assigned to the phantoms.
After 3-D printing, these realistic phantoms would help researchers working in the field of breast cancer detection.
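The dispersion models themselves are standard. A one-pole Cole–Cole evaluation, which reduces to the one-pole Debye model when alpha = 0, can be sketched as follows; the parameter values in the usage are illustrative, not the paper's fitted tissue parameters:

```python
import numpy as np

def cole_cole(freq_hz, eps_inf, delta_eps, tau, alpha=0.0, sigma_s=0.0):
    """Complex relative permittivity of a one-pole Cole-Cole model:
    eps(w) = eps_inf + delta_eps / (1 + (j*w*tau)^(1-alpha)) + sigma_s/(j*w*eps0).
    Setting alpha = 0 gives the one-pole Debye model."""
    eps0 = 8.854187817e-12                 # vacuum permittivity (F/m)
    omega = 2.0 * np.pi * np.asarray(freq_hz, dtype=float)
    eps = eps_inf + delta_eps / (1.0 + (1j * omega * tau) ** (1.0 - alpha))
    if sigma_s != 0.0:
        eps = eps + sigma_s / (1j * omega * eps0)  # static conductivity term
    return eps
```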
A practical two-level robust hand detection system is developed using the proposed sparse texture features and color-texture features. Traditional features such as histograms of oriented gradients, Gabor features, segmentation-based fractal texture analysis, and histogram features are dense in general and have high time complexity, limiting their use in practical hand detection systems. However, if only the prominent edge or texture parameters of an image are processed, the time complexity of the system can be significantly reduced while performance increases. The performance of any practical system is most affected by spurious objects in the background. The proposed approach uses four efficient filtering techniques to extract salient regions of an image while retaining significant object-related information, followed by extraction of the texture features mentioned above. In the first stage, 10 sparse variants of these existing texture features are extracted and assessed using five classification models: Naïve Bayes, Real AdaBoost, Gentle AdaBoost, Modest AdaBoost, and SVM. In the second stage, a two-level hand detection model is built from the proposed texture and color-texture features using a sliding-window-based SVM classification model. Experimental analysis shows that the proposed sparse features are not only time efficient but also yield a 23.4% improvement in the practical two-level detection system over the existing motion and skin filtering-based hand detection system.
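The idea of processing only prominent responses can be sketched by keeping the top fraction of gradient magnitudes before building an orientation histogram. The keep ratio and bin count below are illustrative choices, not the paper's actual filters:

```python
import numpy as np

def sparse_gradient_feature(image, keep_ratio=0.1, n_bins=9):
    """Orientation histogram computed only over the strongest
    `keep_ratio` fraction of gradient magnitudes, so weak (mostly
    background) responses are discarded before feature extraction."""
    img = np.asarray(image, dtype=float)
    gy, gx = np.gradient(img)                  # gradients along rows, cols
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)    # unsigned orientation in [0, pi)
    k = max(1, int(keep_ratio * mag.size))
    idx = np.argsort(mag, axis=None)[-k:]      # indices of the top-k magnitudes
    hist, _ = np.histogram(ang.ravel()[idx], bins=n_bins,
                           range=(0.0, np.pi), weights=mag.ravel()[idx])
    s = hist.sum()
    return hist / s if s > 0 else hist
```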
Many approaches for background subtraction and people detection have been developed so far. However, even the best state-of-the-art methods do not yet give satisfactory results in real transportation environments. Such settings present several difficulties, including fast brightness changes, noise, shadows, and a scrolling background, and no single approach can deal with all of them. We propose an approach for people segmentation and tracking in videos that is suited to real-world conditions. Our strategy combines several state-of-the-art methods for people detection, silhouette appearance modeling, and tracking, and each process uses its own frame preprocessing pipeline. Because the optimal combination of the people classifiers, as well as the optimal parameters of each combined method, is too difficult to determine jointly, a genetic algorithm is used to find the optimal classifier parameters and their combination weights. The output of the latter initializes a multiframe graph cut operating on superpixel graphs. The proposed approach is evaluated on the BOSS European project database, which was acquired in moving trains and contains the typical scientific challenges encountered in real transportation systems.
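A minimal genetic algorithm for searching classifier combination weights might look as follows. Population size, mutation scale, and the selection scheme are illustrative choices, not the paper's configuration; the fitness function would wrap the actual detection score:

```python
import numpy as np

def genetic_weights(fitness, n_weights, pop=30, gens=60, rng=None):
    """Minimal genetic algorithm: evolves a population of weight vectors
    in [0, 1] (one weight per combined detector) to maximize a
    user-supplied fitness function."""
    rng = np.random.default_rng(rng)
    P = rng.random((pop, n_weights))
    for _ in range(gens):
        scores = np.array([fitness(w) for w in P])
        order = np.argsort(scores)[::-1]
        elite = P[order[: pop // 2]]                 # selection: keep best half
        ma = elite[rng.integers(len(elite), size=pop - len(elite))]
        pa = elite[rng.integers(len(elite), size=pop - len(elite))]
        mask = rng.random(ma.shape) < 0.5            # uniform crossover
        children = np.where(mask, ma, pa)
        children += rng.normal(0.0, 0.05, children.shape)  # gaussian mutation
        P = np.vstack([elite, np.clip(children, 0.0, 1.0)])
    scores = np.array([fitness(w) for w in P])
    return P[np.argmax(scores)]
```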
We present a texture-based active contour method for two-phase image segmentation in a statistical framework. The proposed method first combines color, texture, and a saliency weight to form an augmented image and introduces the joint distribution of these features into the image likelihood term of the energy function. Second, we use the local probability distribution to obtain a smooth label that reduces fragmentation in the initialization and evolution of the segmentation contours. Finally, we propose a simple and efficient geometric prior based directly on the level sets and introduce the related spatial constraints into the Bayes inference to estimate the smooth probabilistic label. The image is thus represented by high-dimensional features but segmented in a low-dimensional space. Furthermore, evolution of the level-set function and updating of the smooth probabilistic label alternate in a fast manner. We experimentally compare our texture-based method with others on complicated natural images and demonstrate its good performance in practice.
Both color and depth information may be deployed to search RGB-D imagery by content. Previous works dealing with global descriptors for RGB-D images advocate decision-level fusion, in which independently computed color and depth representations are juxtaposed to pursue a similarity search. In contrast, we propose a “learning-to-rank” paradigm aimed at weighting the two information channels according to the specific traits of the task and data at hand, thereby effortlessly addressing the potential diversity across applications. In particular, we propose a method, referred to as “kNN-rank,” which can learn the regularities among the outputs yielded by similarity-based queries. Another contribution concerns the “HyperRGBD” framework, a set of tools conceived to enable seamless aggregation of existing RGB-D datasets to obtain data with the desired peculiarities and cardinality.
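The channel-weighting idea can be illustrated with a simple validation search over a scalar weight combining color and depth distance matrices; this is a much cruder stand-in for the kNN-rank learner, for illustration only:

```python
import numpy as np

def best_channel_weight(d_color, d_depth, relevant, weights=None):
    """Pick the scalar w in [0, 1] whose combined distance
    w*d_color + (1-w)*d_depth ranks the known-relevant gallery item
    first for as many validation queries as possible."""
    d_color = np.asarray(d_color, dtype=float)   # (n_queries, n_gallery)
    d_depth = np.asarray(d_depth, dtype=float)
    if weights is None:
        weights = np.linspace(0.0, 1.0, 21)
    best_w, best_hits = 0.0, -1
    for w in weights:
        d = w * d_color + (1.0 - w) * d_depth
        hits = int(np.sum(np.argmin(d, axis=1) == relevant))
        if hits > best_hits:
            best_w, best_hits = w, hits
    return best_w
```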
Histopathologic images, with their complex structure and huge dimensions, are time consuming for both specialists and machine learning methods. As a result, diagnosis is delayed and fewer patients can be treated. When histopathological images are examined at low resolution to shorten the examination time, it is almost impossible to identify the cancerous regions; when high-resolution images are examined, inspection takes a long time because the image must be divided into patches. Although fairly fast machine learning methods are available to shorten the analysis period, the number of patches to be examined still has a negative effect on the decision time. For this reason, the area under examination needs to be reduced: first cell-free areas and then areas containing noncancerous cells must be eliminated. An effective and fast area-reduction method is presented for faster analysis and real-time use of histopathological images by machine learning algorithms. The proposed method uses a two-step approach. In the first step, 3 × 3 texture properties of the images are obtained and a discrete wavelet transform is applied; the image is then cleaned with simple morphological operations. In the second step, the cleaned image is subjected to another discrete wavelet transform. Thus, the changes in cell-containing regions are captured, and potentially dangerous regions are identified. The proposed method reduced the areas to be examined by 98.5% to 99.5% with 95.33% accuracy.
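The wavelet step can be sketched with a single-level 2-D Haar transform whose detail energy flags textured (cell-containing) blocks; the energy threshold is an illustrative assumption, not the paper's value:

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2-D Haar wavelet transform over 2x2 blocks,
    returning the LL, LH, HL, and HH subbands."""
    a = np.asarray(img, dtype=float)
    a = a[: a.shape[0] // 2 * 2, : a.shape[1] // 2 * 2]  # crop to even size
    ll = (a[0::2, 0::2] + a[0::2, 1::2] + a[1::2, 0::2] + a[1::2, 1::2]) / 4
    lh = (a[0::2, 0::2] + a[0::2, 1::2] - a[1::2, 0::2] - a[1::2, 1::2]) / 4
    hl = (a[0::2, 0::2] - a[0::2, 1::2] + a[1::2, 0::2] - a[1::2, 1::2]) / 4
    hh = (a[0::2, 0::2] - a[0::2, 1::2] - a[1::2, 0::2] + a[1::2, 1::2]) / 4
    return ll, lh, hl, hh

def detail_mask(img, threshold=1.0):
    """Flag 2x2 blocks with high detail energy: flat (cell-free)
    background is discarded, textured cell regions are kept."""
    _, lh, hl, hh = haar_dwt2(img)
    energy = lh ** 2 + hl ** 2 + hh ** 2
    return energy > threshold
```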
Unlike the conventional feedforward neural network, an emergent learning technique, the extreme learning machine (ELM), provides generalized neural network performance with less user intervention and comparatively faster training. We study ELM with five different activation functions (sigmoidal, sine, hard limiter, triangular basis, and radial basis) for handwritten Indic script identification in multiscript documents. To describe the scripts, both script-dependent and script-independent features are computed. For validation, a dataset of 3300 handwritten line-level document images (300 samples per script) of 11 official Indic scripts is used. In our study, we observe that the sigmoidal activation function performs best regardless of the number of scripts used, i.e., in the biscript, triscript, and multiscript identification cases.
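ELM training is a single least-squares solve over randomly initialized hidden weights; a minimal sketch with the sigmoidal activation (feature extraction and the paper's other activations are omitted):

```python
import numpy as np

def elm_train(X, T, n_hidden, rng=None):
    """Extreme learning machine: random input weights, sigmoidal hidden
    layer, output weights solved in closed form by least squares."""
    rng = np.random.default_rng(rng)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random, never trained
    b = rng.normal(size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))        # sigmoidal hidden outputs
    beta, *_ = np.linalg.lstsq(H, T, rcond=None)  # single analytic solve
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```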
Person reidentification is the process of matching individuals across images taken at different times, often with different cameras. To perform matching, most methods extract features from the entire image; however, this gives no consideration to the spatial context of the information present in the image. We propose a convolutional neural network approach based on ResNet-50 to predict the foreground of an image: the parts containing the head, torso, and limbs of a person. With this information, we use the LOMO and salient color name feature descriptors to extract features primarily from the foreground areas. In addition, we use a distance metric learning technique, XQDA, to calculate optimally weighted distances between the relevant features. We evaluate on the VIPeR, QMUL GRID, and CUHK03 datasets, compare our results against a linear foreground estimation method, and show competitive or better overall matching performance.
Character recognition is a well-studied area because of its wide applications, including text-searchable documents, digital storage and support, and manual work automation. Recognizing characters is challenging, and writers' individual style habits make handwritten character recognition even more so. This article presents a set of features and classifiers that are simple to implement and give significant recognition rates for Hindi-language characters. The proposed work is based on two shape descriptors: (1) the histogram of oriented gradients and (2) geometric moments. When used as features, these descriptors reflect properties of characters that minimize intraclass variations and maximize interclass variations. The generated feature set is tested using two supervised classifiers, namely the support vector machine (SVM) and the multilayer perceptron (MLP). A thorough investigation of various evaluation parameters is presented. Experimental results show that high recognition rates are obtained when both shape descriptors are used in combination with an SVM classifier. The technique is evaluated on four publicly available databases and achieves a recognition rate of 96.8% on one of them. A comparative analysis of the proposed method with relevant recent works in this field proves its superiority. This work introduces a much less complex and promising approach to isolated handwritten character recognition.
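The two descriptors can be sketched as follows: a simplified global orientation histogram (without the cell/block normalization of full HOG) plus raw geometric moments, concatenated into one feature vector. Bin count and moment order are illustrative assumptions:

```python
import numpy as np

def orientation_histogram(img, n_bins=9):
    """Global magnitude-weighted histogram of gradient orientations."""
    g = np.asarray(img, dtype=float)
    gy, gx = np.gradient(g)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned orientation
    hist, _ = np.histogram(ang, bins=n_bins, range=(0.0, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-12)

def geometric_moments(img, order=2):
    """Raw geometric moments m_pq = sum_xy x^p y^q I(x, y), p + q <= order."""
    g = np.asarray(img, dtype=float)
    y, x = np.mgrid[: g.shape[0], : g.shape[1]]
    return np.array([(g * x ** p * y ** q).sum()
                     for p in range(order + 1)
                     for q in range(order + 1 - p)])

def character_features(img):
    """Concatenated shape descriptor used as the classifier input."""
    return np.concatenate([orientation_histogram(img), geometric_moments(img)])
```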