KEYWORDS: Image segmentation, Skin, Education and training, Transformers, Deep learning, Data modeling, Performance modeling, Medical imaging, Skin cancer, RGB color model
Accurately segmenting skin lesions from dermoscopy images is crucial for improving the quantitative analysis of skin cancer. However, segmenting melanoma automatically is difficult due to the significant variation in melanoma appearance and the unclear boundaries of lesion areas. While Convolutional Neural Networks (CNNs) have made impressive progress in this area, most existing solutions struggle to effectively capture global dependencies because of their limited receptive fields. Recently, transformers have emerged as a promising tool for modeling global context through powerful global and local attention mechanisms. In this paper, we investigated the effectiveness of various deep learning approaches, including CNN-based and transformer-based methods, for the segmentation of skin lesions in dermoscopy images. We also studied and compared the performance of transfer learning algorithms built on well-established encoders such as Swin Transformer, Mix-Transformer, Vision Transformer, ResNet, VGG-16, and DenseNet. Our proposed approach involves training a neural network on polar transformations of the original dataset, with the polar origin set to the object’s center point. This simplifies the segmentation and localization tasks and reduces dimensionality, making it easier for the network to converge. The ISIC 2018 dataset, containing 2,594 dermoscopy images with their ground-truth segmentation masks, was used to evaluate our approach on skin lesion segmentation tasks. This dataset was randomly split into 70%, 10%, and 20% subsets for training, validation, and testing, respectively. The experimental results showed that using polar transformations as a pre-processing step generally improved the efficiency of both the CNN-based and transformer-based models across the dataset.
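The polar pre-processing step described above can be illustrated with a minimal sketch: given a known object center, the image is resampled onto an (r, θ) grid by nearest-neighbor lookup, so that a roughly round lesion becomes a band of constant rows. The function name `to_polar` and the grid sizes are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def to_polar(image, center, n_r=64, n_theta=64):
    """Resample a 2-D image onto a polar (r, theta) grid whose origin
    is the object's center point, using nearest-neighbor lookup."""
    h, w = image.shape
    cy, cx = center
    # Largest radius needed to reach any image corner from the center.
    r_max = np.hypot(max(cy, h - 1 - cy), max(cx, w - 1 - cx))
    r = np.linspace(0.0, r_max, n_r)
    theta = np.linspace(0.0, 2 * np.pi, n_theta, endpoint=False)
    rr, tt = np.meshgrid(r, theta, indexing="ij")
    # Map each (r, theta) sample back to Cartesian pixel coordinates.
    ys = np.clip(np.round(cy + rr * np.sin(tt)).astype(int), 0, h - 1)
    xs = np.clip(np.round(cx + rr * np.cos(tt)).astype(int), 0, w - 1)
    return image[ys, xs]
```

In this representation, each row corresponds to a fixed radius, so a centered circular lesion appears as a block of fully foreground rows followed by background rows, which is the simplification the paper exploits.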
KEYWORDS: Steganography, Image quality, Visualization, Steganalysis, Image retrieval, Image processing, Signal to noise ratio, Data hiding, RGB color model
Steganography has a wide range of potential applications, particularly with the advent of the internet and growing intellectual property concerns. One particular technique, Least Significant Bit (LSB) steganography, is commonly used for image-in-image steganography. However, LSB steganography is vulnerable to steganalysis attacks that aim to detect the presence of embedded data. This weakness is mitigated by Least Significant Bit Matching (LSBM) steganography, which attempts to preserve the underlying structure of the image by maintaining a similar proportion of 1s and 0s as in the original image. Standard LSBM steganography, however, can encode only 1 bit of information per channel per pixel, which limits the amount of information that can be embedded. To this end, a novel multibit approach to LSBM steganography is proposed, named Multibit Least Significant Bit Matching (MLSBM) steganography. The MLSBM approach preserves the underlying structure of the image while allowing multiple bits to be encoded in each channel of each pixel. In addition, when the proposed MLSBM technique is used to embed a high number of bits, it significantly reduces the visual perceptibility of the embedding compared with other embedding techniques at the same number of bits.
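The one-bit LSB matching baseline that MLSBM generalizes can be sketched as follows; the random ±1 step at a mismatch is what keeps the +1/−1 perturbations statistically balanced rather than forcing bits directly. This is only an illustrative baseline, not the paper's multibit scheme, and the function names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def lsbm_embed(pixels, bits):
    """Classic one-bit LSB matching: if a pixel's LSB already equals the
    message bit, leave it alone; otherwise add or subtract 1 at random
    (either choice flips the LSB), respecting the [0, 255] range."""
    out = pixels.astype(np.int16).copy()
    for i, b in enumerate(bits):
        if (out[i] & 1) != b:
            if out[i] == 0:
                step = 1
            elif out[i] == 255:
                step = -1
            else:
                step = rng.choice([-1, 1])
            out[i] += step
    return out.astype(np.uint8)

def lsbm_extract(pixels, n):
    """Recover the first n message bits from the stego pixels' LSBs."""
    return [int(p) & 1 for p in pixels[:n]]
```

Because every change is at most ±1 per sample, the stego image stays visually close to the cover; the multibit extension trades more capacity per channel against larger per-pixel changes.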
Measuring the heart rate provides one critical piece of information that a health professional uses to diagnose the health state of an individual. The electrocardiogram (ECG/EKG) is essential for patient monitoring and diagnosis, and the features extracted from the ECG signal play a vital role in diagnosing cardiac disease. Therefore, this paper presents the design, construction, and testing of a cost-effective prototyping tool for ECG feature extraction and recognition. When tested on a real ECG from a human subject, the developed tool preserves useful ECG information while removing unwanted noise and interference components by adaptively determining the filtering values, which translate directly to a real-time analog circuit for rapid prototyping. A decision-making model based on a peak detection strategy is then applied for automated heart rate state recognition in real time.
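The peak-detection strategy behind the decision-making model can be sketched in a few lines: detect R-peaks with an amplitude threshold plus a refractory period (so one beat is not counted twice), then convert the mean R-R interval to beats per minute. The threshold and refractory values here are illustrative assumptions, not the paper's tuned parameters.

```python
import numpy as np

def heart_rate(signal, fs, threshold=0.5, refractory=0.25):
    """Detect R-peaks as local maxima above `threshold`, separated by at
    least `refractory` seconds, and return (BPM, peak indices)."""
    min_gap = int(refractory * fs)
    peaks, last = [], -min_gap
    for i in range(1, len(signal) - 1):
        if (signal[i] > threshold
                and signal[i] >= signal[i - 1]
                and signal[i] > signal[i + 1]
                and i - last >= min_gap):
            peaks.append(i)
            last = i
    rr = np.diff(peaks) / fs       # R-R intervals in seconds
    return 60.0 / rr.mean(), peaks
```

A downstream heart-rate-state classifier would then compare the BPM against clinical ranges (e.g. bradycardia/normal/tachycardia bands).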
How to describe an image accurately with the most useful information is the key issue in any face recognition task. Therefore, finding efficient and discriminative facial information that remains stable under different conditions of the image acquisition process is a major challenge. Most existing approaches use only one type of feature. In this paper, we argue that a robust face recognition technique requires several different kinds of information to be taken into account, suggesting the incorporation of several feature sets into a single fused one. We therefore propose a new technique, multi-feature fusion (MFF), that combines the facial shape with the local structure and texture of the face image. It is based on local boosted features (LBF) and Gabor wavelet techniques. Given an input image, the LBF histogram and the Gabor feature histogram are built separately. A final MFF feature descriptor is then formed by concatenating these histograms, and is fed to a support vector machine (SVM) classifier to recognize the face image. The proposed MFF approach is evaluated on three different face datasets and provides promising results.
Pipeline right-of-way (ROW) monitoring and safety pre-warning is an important way to guarantee the safe operation of oil/gas transportation. Any construction equipment or heavy vehicle intrusion is a potential safety hazard to the pipeline infrastructure. Therefore, we propose a novel technique that can detect and classify intrusions on the oil/gas pipeline ROW. The detection part builds on our previous work, where we constructed a robust feature set using a pyramid histogram of oriented gradients in the Fourier domain with corresponding weights. A support vector machine (SVM) with a radial basis kernel is then used to distinguish threat objects from the background. For the classification part, each object is represented by an integrated color, shape, and texture (ICST) feature set, which combines three different feature extraction techniques: the color histogram in HSV (hue, saturation, value) space, the histogram of oriented gradients (HOG), and the local binary pattern (LBP). Two decision-making models based on the K-nearest neighbor (KNN) and SVM classifiers are then utilized for automatic object identification. On a real-world dataset, the proposed method provides promising results in identifying the objects present on the oil/gas pipeline ROW.
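The ICST idea of concatenating color, shape, and texture cues can be sketched with simplified stand-ins for each component: a per-channel color histogram (in RGB here, rather than the paper's HSV), a single-cell gradient-orientation histogram in place of full HOG, and a basic 8-neighbor LBP histogram. All three blocks are L1-normalized before concatenation so no cue dominates by scale; this is a hedged simplification, not the paper's exact pipeline.

```python
import numpy as np

def icst_descriptor(rgb):
    """Concatenate three simplified cues: per-channel color histograms,
    a 9-bin gradient-orientation histogram, and an LBP code histogram,
    each block L1-normalized."""
    gray = rgb.mean(axis=2)
    # Color: 8 bins per channel (RGB stand-in for the HSV histogram).
    color = np.concatenate(
        [np.histogram(rgb[..., c], bins=8, range=(0, 256))[0] for c in range(3)])
    # Shape: orientation histogram weighted by gradient magnitude.
    gy, gx = np.gradient(gray)
    ang = np.mod(np.arctan2(gy, gx), np.pi)
    hog = np.histogram(ang, bins=9, range=(0, np.pi),
                       weights=np.hypot(gx, gy))[0]
    # Texture: 8-neighbor LBP codes on interior pixels.
    c = gray[1:-1, 1:-1]
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    lbp = np.zeros_like(c, dtype=int)
    h, w = gray.shape
    for k, (dy, dx) in enumerate(shifts):
        lbp |= (gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx] >= c).astype(int) << k
    texture = np.bincount(lbp.ravel(), minlength=256)
    return np.concatenate([f / max(f.sum(), 1) for f in (color, hog, texture)])
```

The resulting 289-dimensional vector is what a KNN or SVM classifier would consume in the identification stage.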
The extreme learning machine (ELM), a single-hidden-layer feedforward neural network, has shown very effective performance in pattern analysis and machine intelligence; however, some limitations constrain the performance of ELM, such as data multicollinearity issues. The generalization capability of ELM can deteriorate significantly when multicollinearity is present in the hidden layer output matrix, causing the matrix to become singular or ill-conditioned. Ridge regression can be utilized to overcome this problem, but the conventional way to avoid multicollinearity in ELM is to adjust the ridge constant precisely, which may not be a sophisticated way to obtain the optimal value. In this paper, we present a solution for finding a satisfactory ridge constant by incorporating variance inflation factors (VIF) when calculating the output weights in ELM; we term this technique ELM-VIF. Experimental results on handwritten digit recognition show that the proposed ELM-VIF, compared with the original ELM, has better stability and generalization performance.
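The VIF diagnostic and the ridge-regularized output-weight solve can be sketched as follows. The exact rule the paper uses to map VIFs to a ridge constant is not given here, so the sketch simply switches the ridge term on when any column's VIF exceeds a conventional threshold of 10; `elm_output_weights` and the threshold are assumptions for illustration.

```python
import numpy as np

def vif(H):
    """Variance inflation factor of each column of H: regress the column
    on all remaining columns and report 1 / (1 - R^2)."""
    _, p = H.shape
    out = np.empty(p)
    for j in range(p):
        y = H[:, j]
        X = np.delete(H, j, axis=1)
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        r2 = 1.0 - ((y - X @ coef) ** 2).sum() / ((y - y.mean()) ** 2).sum()
        out[j] = 1.0 / max(1.0 - r2, 1e-12)
    return out

def elm_output_weights(H, T, vif_limit=10.0, ridge=1e-4):
    """Solve the ELM output weights beta = (H'H + cI)^-1 H'T, enabling
    the ridge term only when VIFs flag multicollinearity in H."""
    c = ridge if vif(H).max() > vif_limit else 0.0
    A = H.T @ H + c * np.eye(H.shape[1])
    return np.linalg.solve(A, H.T @ T)
```

A high VIF means a hidden neuron's output is nearly a linear combination of the others, which is exactly the condition that makes the unregularized normal equations ill-conditioned.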
A strategy for detecting changes in known building regions in multitemporal visible and near-infrared imagery, based on a linear combination of independent features, is presented. Features identified for building and background detection include vegetation, texture, shadow intensity, and distance from known road areas. The resulting building candidates are classified by shape using a unique difference-of-Gaussians technique. Building regions reported in the reference dataset that indicate the initial observation time are revisited to check for changes in building candidates not identified by the feature fusion strategy. The performance of the proposed technique is tested on real-world aerial imagery and is evaluated visually and quantitatively. Compared with the gradient-based and normalized difference vegetation index-based building detection methods, the proposed fusion methodology yields better results. In our evaluations with five sample images covering rural, suburban, and urban areas, it provided an average completeness of 82.08% for building detection and 85.67% for building change detection.
In the imagery and pattern analysis domain, a variety of descriptors have been proposed and employed for different computer vision applications such as face detection and recognition. Many of them degrade under different conditions of the image acquisition process, such as variations in illumination and the presence of noise, because they rely entirely on image intensity values to encode the image information. To overcome these problems, a novel technique named Multi-Texture Local Ternary Pattern (MTLTP) is proposed in this paper. MTLTP combines edges and corners based on the local ternary pattern strategy to extract the local texture features of the input image. It then returns a spatial histogram feature vector as the descriptor for each image, which is used to recognize a person. Experimental results using a k-nearest neighbors (k-NN) classifier on two publicly available datasets demonstrate that our algorithm achieves efficient face recognition under extreme variations of illumination and slight variations of pose.
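The core local ternary pattern encoding underlying MTLTP can be sketched for a single 3x3 patch: neighbors within ±t of the center map to 0, brighter neighbors to +1, and darker neighbors to −1, and the ternary code is split into the conventional "upper" and "lower" binary codes. The dead zone of width t is what gives LTP its noise robustness over LBP. How MTLTP combines this with edge and corner maps is not reproduced here.

```python
import numpy as np

def ltp_codes(patch, t=5):
    """Local ternary pattern of the 3x3 patch center, returned as the
    standard (upper, lower) pair of 8-bit binary codes."""
    c = int(patch[1, 1])
    # Eight neighbors, clockwise from the top-left corner.
    offs = [(0, 0), (0, 1), (0, 2), (1, 2),
            (2, 2), (2, 1), (2, 0), (1, 0)]
    upper = lower = 0
    for k, (i, j) in enumerate(offs):
        d = int(patch[i, j]) - c
        if d >= t:          # significantly brighter neighbor
            upper |= 1 << k
        elif d <= -t:       # significantly darker neighbor
            lower |= 1 << k
    return upper, lower
```

Sliding this over an image and histogramming the upper/lower codes per spatial block yields the kind of spatial histogram feature vector the abstract describes.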
This paper presents a simple but effective algorithm for scene sketch generation from input images. The proposed algorithm combines the edge magnitudes of the directional Prewitt differential gradient kernels with the Kirsch kernels at each pixel position, and then encodes them into an eight-bit binary code that captures local edge and texture information. In this binary encoding step, relative variance is employed to determine the object shape in each local region. Using relative variance makes object sketch extraction fully adaptive to any shape structure. Moreover, the proposed technique requires no parameters to adjust its output and is robust to edge density and noise. Two standard databases are used to show the effectiveness of the proposed framework.
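The eight-bit directional encoding can be illustrated with the Kirsch half of the scheme. The sketch below sets bit d when the absolute Kirsch compass response in direction d exceeds the mean response over all eight directions; this simple mean threshold stands in for the paper's relative-variance rule, and the Prewitt combination is omitted, so treat it as an assumption-laden sketch rather than the full algorithm.

```python
import numpy as np

def rotate45(k):
    """Rotate a 3x3 compass kernel's border one 45-degree step clockwise."""
    idx = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    out = k.copy()
    vals = [k[i] for i in idx]
    for pos, v in zip(idx, vals[-1:] + vals[:-1]):
        out[pos] = v
    return out

def kirsch_code(patch):
    """Eight-bit code for a 3x3 patch: bit d is set when the absolute
    Kirsch response in direction d exceeds the mean over all eight
    directions (a simple adaptive threshold)."""
    k = np.array([[5, 5, 5], [-3, 0, -3], [-3, -3, -3]], float)  # north
    resp = []
    for _ in range(8):
        resp.append(abs(float((k * patch).sum())))
        k = rotate45(k)
    resp = np.array(resp)
    code = 0
    for d, r in enumerate(resp):
        if r > resp.mean():
            code |= 1 << d
    return code
```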
Challenges in object tracking such as object deformation, occlusion, and background variations require a robust tracker to ensure accurate object location estimation. To address these issues, we present Pyramidal Rotation Invariant Features (PRIF), a model that integrates the Gaussian Ringlet Intensity Distribution (GRID) and Fourier Magnitude of Histogram of Oriented Gradients (FMHOG) methods for tracking objects in videos captured in challenging environments. In this model, we initially partition a reference object region into increasingly fine rectangular grid regions to construct a pyramid. Histograms of local features are then extracted for each level of the pyramid. This allows the appearance of a local patch to be captured at multiple levels of detail, making the algorithm insensitive to partial occlusion. GRID and the magnitude of the discrete Fourier transform of the oriented gradients are then utilized to achieve a robust rotation-invariant feature, with the GRID feature providing a weighting scheme that emphasizes the object center. In the tracking stage, a Kalman filter is employed to estimate the center of the object search region in successive frames. Within the search region, we use a sliding-window technique to extract the PRIF of candidate objects, and the Earth Mover’s Distance (EMD) is then used to select the candidate features that best match the reference. Our PRIF object tracking algorithm is tested on two challenging Wide Area Motion Imagery (WAMI) datasets, Columbus Large Image Format (CLIF) and Large Area Image Recorder (LAIR), to evaluate its robustness. Experimental results show that the proposed PRIF approach yields superior results compared to state-of-the-art feature-based object trackers.
An illumination-robust face recognition system using Local Directional Pattern (LDP) descriptors in Phase Congruency (PC) space is proposed in this paper. The proposed Directional Pattern of Phase Congruency (DPPC) is an oriented, multi-scale local descriptor that can encode various patterns of face images under different lighting conditions. It is constructed by applying LDP to the oriented PC images. An LDP feature is obtained by computing the edge response values in eight directions at each pixel position and encoding them into an eight-bit binary code using the relative strength of these edge responses. Phase congruency and the local directional pattern have been used independently in face and facial expression recognition because both are robust to illumination changes. While PC extracts discontinuities in the image, such as edges and corners, LDP computes the edge response values in different directions and uses them to encode the image texture. The local directional pattern descriptor of the phase congruency image is subjected to principal component analysis (PCA) for dimensionality reduction, enabling fast and effective face recognition. The performance of the proposed DPPC algorithm is evaluated on several publicly available databases, where promising recognition rates are observed. The better classification accuracy shows the superiority of the LDP descriptor over other appearance-based feature descriptors such as the Local Binary Pattern (LBP). In other words, our results show that with the LDP descriptor, the Euclidean distance between a reference image and test images of the same class is much smaller than that between the reference image and test images from other classes.
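The per-pixel LDP encoding described above can be sketched directly: compute the absolute Kirsch responses in eight compass directions and set the bits of the k strongest responses. The choice k = 3 follows common LDP practice and is an assumption here, as is applying it to a raw patch rather than a phase congruency image.

```python
import numpy as np

def rotate45(k):
    """Rotate a 3x3 compass kernel's border one 45-degree step clockwise."""
    idx = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    out = k.copy()
    vals = [k[i] for i in idx]
    for pos, v in zip(idx, vals[-1:] + vals[:-1]):
        out[pos] = v
    return out

def ldp_code(patch, top=3):
    """LDP at the patch center: compute the absolute Kirsch edge
    responses in eight directions and set the bits of the `top`
    strongest responses."""
    k = np.array([[-3, -3, 5], [-3, 0, 5], [-3, -3, 5]], float)  # east
    resp = []
    for _ in range(8):
        resp.append(abs(float((k * patch).sum())))
        k = rotate45(k)
    strongest = np.argsort(resp)[-top:]
    return sum(1 << int(d) for d in strongest)
```

In DPPC, these codes would be computed on each oriented PC image, histogrammed, and then projected with PCA before classification.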