Image classification is an area where deep learning, and stacked auto-encoders in particular, have proven their strength. The contribution of this paper is a new classifier designed to remedy some classification problems. The method combines the most widely used techniques from Deep Learning (DL) and Sparse Coding (SC) for classification. The proposed deep neural network consists of three stacked auto-encoders and a Softmax layer used as the output layer for classification. The first auto-encoder is created from a sparse representation of all images in the dataset: this sparse representation forms the decoder part of the first auto-encoder, and the transpose of the decoder matrix gives the encoder part. Experiments performed on standard datasets such as ImageNet and COIL-100 show the efficacy of this approach.
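The encoder/decoder relation described in this abstract can be illustrated with a minimal sketch. The toy dictionary `D`, the helper names, and the absence of any activation function are assumptions made for illustration; the paper's actual dictionary is learned from the images and is not reproduced here.

```python
def transpose(matrix):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*matrix)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


# Hypothetical toy dictionary used as the decoder: 3-dimensional signals,
# 2 atoms stored as columns (orthonormal here so the round trip is exact).
D = [[1.0, 0.0],
     [0.0, 1.0],
     [0.0, 0.0]]

# The encoder is taken as the transpose of the decoder (tied weights).
ENCODER = transpose(D)


def encode(x):
    """Map a signal to its code using the transposed dictionary."""
    return matvec(ENCODER, x)


def decode(code):
    """Reconstruct a signal from its code using the dictionary."""
    return matvec(D, code)
```

With this toy dictionary, `decode(encode(x))` returns `x` for any signal that lies in the span of the two atoms; a real dictionary is overcomplete and would not give an exact round trip.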
Intraspecific nest parasitism is a phenomenon that attracts the attention of biologists. Some bird species, such as the Slender-billed Gull, lay at most three eggs, yet their nests can contain four or five. A genetic study of a set of nests has shown that one or two of the eggs belong to a second female; biologists call such an egg a “parasitic egg”. Because the gulls are protected by law, researchers find it difficult to identify parasitic eggs without a genetic test. Many studies have tried to identify the parasitic egg from morphological parameters and the characteristics of the eggshell, but they have not led to good results. Recent advances in Artificial Intelligence (AI), and particularly in Deep Learning (DL), have increased the motivation to use these techniques to quantify parasitic eggs. In this work, we present a new method to quantify parasitic eggs from a dataset of egg images. One of the most widely used techniques is the Convolutional Neural Network (CNN), a supervised learning method used to classify images. We use it to extract features from each image in order to characterize the egg. To evaluate our approach, we use 31 clutches from the 92-egg dataset to test the performance of the proposed method.
Classical signal representation techniques generally describe a signal by its components on a basis on which the representation is unique, as in wavelet networks. Conversely, sparse representations decompose the signal on a dictionary whose number of elements is much larger than the dimension of the signal. This technique can be widely used for representation, compression, denoising and separation of all types of signals. Moreover, several studies have confirmed that a predefined dictionary is less efficient than a dictionary learned from training data. The idea of this paper is therefore to propose a new technique for creating a dictionary using wavelet decomposition to enhance the sparse representation of images. The technique combines sparse coding and the fast wavelet transform algorithm for image representation. Results obtained on different universal image databases show better image representation performance than some state-of-the-art methods.
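Decomposition on an overcomplete dictionary can be sketched with matching pursuit, a standard greedy sparse coding algorithm. It is used here only as an illustration and is not claimed to be the paper's algorithm; the three-atom dictionary and the function names are assumptions, and the atoms are assumed to have unit norm.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matching_pursuit(signal, atoms, n_iter):
    """Greedy sparse decomposition of `signal` on unit-norm `atoms`.

    Returns the coefficient of each atom and the final residual.
    """
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        # Correlate the residual with every atom and keep the best match.
        scores = [dot(residual, atom) for atom in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(scores[i]))
        coeffs[k] += scores[k]
        # Remove the selected atom's contribution from the residual.
        residual = [r - scores[k] * a for r, a in zip(residual, atoms[k])]
    return coeffs, residual


# Hypothetical overcomplete dictionary: 3 atoms for 2-dimensional signals.
S = 1.0 / math.sqrt(2.0)
ATOMS = [[1.0, 0.0], [0.0, 1.0], [S, S]]
```

For the signal `[3.0, 3.0]`, a single iteration selects the diagonal atom and leaves a residual that is zero up to rounding, so one coefficient suffices where the canonical basis would need two.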
Proc. SPIE. 11041, Eleventh International Conference on Machine Vision (ICMV 2018)
KEYWORDS: Signal to noise ratio, Image encryption, Image processing, Digital filtering, Digital watermarking, Discrete wavelet transforms, Feature extraction, Image quality, Digital imaging, Data communications
Many techniques, such as cryptography and watermarking, are used to solve security problems on the Internet. In this context, watermarking is a way of protecting copyright and proving the authenticity of digital data. In this paper, a non-blind digital watermarking scheme is proposed. It is based on the Discrete Cosine Transform (DCT), Singular Value Decomposition (SVD) and the Beta Chaotic Map (BCM). The experimental results show that this scheme is robust against several attacks compared to other algorithms.
To survive the competition, companies always seek to hire the best employees. The selection depends on the candidate's answers to the interviewer's questions and on the candidate's behavior during the interview session. The study of this behavior is always based on a psychological analysis of the movements accompanying the answers and discussions. Few techniques have been proposed to date to automatically analyze a candidate’s nonverbal behavior. This paper is part of a work psychology recognition system; it concentrates on spontaneous hand gestures, which are very significant in interviews according to psychologists. We propose a motion history representation of the hand based on a hybrid approach that merges optical flow and history motion images. First, the optical flow technique is used to detect hand motions in each frame of a video sequence. Second, we use history motion images (HMI) to accumulate the output of the optical flow, in order to obtain a good representation of the hand’s local movement in a global temporal template.
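The accumulation step can be sketched with the classical motion history update, in which pixels that moved in the current frame are set to a maximum value and all others decay by one. This is a minimal sketch under assumptions: the binary `motion_mask` is supposed to come from thresholding the optical flow magnitude (not shown), and the function name and the decay of one unit per frame are illustrative choices rather than the paper's exact formulation.

```python
def update_mhi(mhi, motion_mask, tau):
    """Update a motion history image with one frame's binary motion mask.

    Pixels with motion are set to `tau`; all others decay by 1 down to 0,
    so brighter values indicate more recent motion.
    """
    return [
        [tau if moved else max(0, previous - 1)
         for previous, moved in zip(mhi_row, mask_row)]
        for mhi_row, mask_row in zip(mhi, motion_mask)
    ]


def accumulate(masks, tau):
    """Fold a sequence of motion masks into a single temporal template."""
    mhi = [[0] * len(masks[0][0]) for _ in masks[0]]
    for mask in masks:
        mhi = update_mhi(mhi, mask, tau)
    return mhi
```

A point moving left to right across a one-row image, `accumulate([[[1, 0, 0]], [[0, 1, 0]], [[0, 0, 1]]], 3)`, produces a ramp of increasing values whose gradient encodes the direction of motion.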
Wireless sensor networks (WSN) consist of a set of sensors that are increasingly used in large-scale surveillance applications in different areas: military, environment, health, etc. Despite miniaturization and reduced manufacturing costs, sensors often operate in places that are difficult to access, where batteries cannot be recharged, and they generally have limited resources in terms of transmission power, processing capacity, data storage and energy. These sensors can be deployed in hostile environments, for example on a battlefield or in the presence of fires, floods or earthquakes. In these environments the sensors can fail, even during normal operation. It is therefore necessary to develop fault-tolerant algorithms that detect defective nodes in wireless sensor networks, since sensor faults reduce the quality of surveillance if they go undetected. The values measured by the sensors are used to estimate the state of the monitored area. We use the Non-linear Auto-Regressive with eXogenous inputs (NARX) model, a recurrent neural network architecture, to predict the state of a sensor node from its previous values, described as time series. The experimental results verify that state prediction is enhanced by the proposed model.
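The structure of a NARX prediction, in which the next value depends on lagged outputs and lagged exogenous inputs, can be sketched as follows. This is a sketch under stated assumptions: a linear map stands in for the neural network, the weights are made up, and flagging a fault by thresholding the prediction error is an illustrative choice that the abstract does not spell out.

```python
def narx_regressor(y, u, t, n_y, n_u):
    """Build the NARX input [y(t-1)..y(t-n_y), u(t-1)..u(t-n_u)]."""
    past_outputs = [y[t - i] for i in range(1, n_y + 1)]
    past_inputs = [u[t - i] for i in range(1, n_u + 1)]
    return past_outputs + past_inputs


def predict(weights, bias, regressor):
    """Linear stand-in for the trained network f in y(t) = f(regressor)."""
    return bias + sum(w * r for w, r in zip(weights, regressor))


def is_faulty(measured, predicted, tolerance):
    """Flag a reading whose prediction error exceeds the tolerance."""
    return abs(measured - predicted) > tolerance


# Hypothetical sensor readings (y) and exogenous input (u).
Y = [1.0, 2.0, 3.0, 4.0]
U = [0.5, 0.5, 0.5, 0.5]
# Made-up weights that extrapolate linearly: y(t) = 2*y(t-1) - y(t-2).
WEIGHTS = [2.0, -1.0, 0.0]
```

Here `predict(WEIGHTS, 0.0, narx_regressor(Y, U, 3, 2, 1))` reproduces the reading at `t = 3`, so that reading is not flagged, while a reading far from the prediction would be.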
Video surveillance is one of the key areas of computer vision research. The scientific challenge in this field involves implementing automatic systems that obtain detailed information about the behavior of individuals and groups. In particular, detecting abnormal movements of groups or individuals requires a fine analysis of the frames in the video stream. In this article, we propose a new method to detect anomalies in crowded scenes. We categorize the video in a supervised mode accompanied by unsupervised learning based on the autoencoder principle. To construct an informative concept for recognizing these behaviors, we use a representation technique based on the superposition of human silhouettes. Evaluation on the UMN dataset demonstrates the effectiveness of the proposed approach.
Camera pose estimation remains a challenging task for augmented reality (AR) applications. Simultaneous localization and mapping (SLAM)-based methods are able to estimate the six degrees of freedom camera motion while constructing a map of an unknown environment. However, these methods do not provide any reference for where to insert virtual objects, since they do not have any information about scene structure, and they may fail in cases of occlusion of three-dimensional (3-D) map points or dynamic objects. This paper presents a real-time monocular piecewise planar SLAM method using the planar scene assumption. Using planar structures in the mapping process allows virtual objects to be rendered in a meaningful way, and it improves the precision of the camera pose and the quality of the 3-D reconstruction of the environment by adding constraints on 3-D points and poses in the optimization process. We propose to exploit the rigid motion of 3-D planes in the tracking process to enhance the system's robustness in dynamic scenes. Experimental results show that using a constrained planar scene improves the accuracy and robustness of our system compared with classical SLAM systems.
Currently, there are several fall detection systems based on video analysis. However, these systems have not yet reached the desired level of appropriateness and robustness. To reduce the risk of falling in insecure environments, a new method is developed in this paper to detect and predict human falls. In this approach, we adopt a block matching motion estimation algorithm based on the acceleration and the changes of the human body silhouette area, which are obtained from a single surveillance camera. The paper also presents an algorithm to accelerate the fall detection system based on a local adjustment of the velocity field.
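Block matching motion estimation can be sketched with an exhaustive search that minimizes the sum of absolute differences (SAD). This is a generic illustration under assumptions: the function names, the SAD criterion, and the full-search strategy are illustrative, and the paper's acceleration scheme and silhouette features are not reproduced.

```python
def sad(prev, curr, py, px, cy, cx, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(prev[py + i][px + j] - curr[cy + i][cx + j])
        for i in range(size)
        for j in range(size)
    )


def match_block(prev, curr, y, x, size, radius):
    """Find the displacement (dy, dx) of the block at (y, x) in `prev`.

    Every candidate position within `radius` pixels is tested in `curr`,
    and the displacement with the lowest SAD is returned.
    """
    height, width = len(curr), len(curr[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            cy, cx = y + dy, x + dx
            # Skip candidates that fall outside the current frame.
            if 0 <= cy <= height - size and 0 <= cx <= width - size:
                cost = sad(prev, curr, y, x, cy, cx, size)
                if best is None or cost < best[0]:
                    best = (cost, dy, dx)
    return best[1], best[2]
```

The per-block displacements form a velocity field; differences between the fields of consecutive frame pairs give an estimate of acceleration.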
Automatic identification of television programs in the TV stream is an important task for operating archives. This article proposes a new spatio-temporal approach that identifies the programs in a TV stream in two main steps. First, a reference catalogue of video features for visual jingles is built. We exploit the features that characterize the instances of the same program type to identify the different types of programs in the TV stream. The role of the video features is to represent the visual invariants of each visual jingle using appropriate automatic descriptors for each television program. Second, programs in television streams are identified by examining the similarity of the video signal to the visual grammars in the catalogue. The main idea of the identification process is to compare the visual similarity of the video signal features in the TV stream to the catalogue. After presenting the proposed approach, the paper reports encouraging experimental results on several streams extracted from different channels and composed of several programs.
In this paper, we propose a deep self-organizing map model (Deep-SOMs) for automated feature extraction and learning from big data streams, using the Spark framework for real-time streams and highly parallel data processing. The deep SOM architecture is based on the notion of abstraction: patterns are automatically extracted from the raw data, from less to more abstract. The proposed model consists of three hidden self-organizing layers, an input layer and an output layer. Each layer is made up of a multitude of SOMs, each map focusing only on a main local sub-region of the input image. Each layer then combines the local information to generate more global information in the next layer. The proposed Deep-SOMs model is unique in terms of its layer architecture, SOM sampling method and learning. During the learning stage we use a set of unsupervised SOMs for feature extraction. We validate the effectiveness of our approach on large datasets such as the Leukemia dataset and SRBCT. Comparison results show that the Deep-SOMs model performs better than many existing algorithms for image classification.
This paper presents a new technique for the detection of dental caries, a bacterial disease that destroys the tooth structure. In our approach, we developed a new segmentation method that combines the advantages of the fuzzy C-means (FCM) algorithm and the level set method. The results obtained by the FCM algorithm are used by the level set algorithm to reduce the influence of noise on each of these algorithms, to facilitate level set manipulation and to lead to a more robust segmentation. The sensitivity and specificity confirm the effectiveness of the proposed method for caries detection.
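One iteration of the standard fuzzy C-means update can be sketched on scalar pixel intensities. The function names, the fuzzifier value `m = 2` and the toy data are assumptions; how the resulting memberships initialize the level set in the paper is not shown.

```python
def fcm_memberships(values, centers, m=2.0):
    """Standard FCM membership update for scalar data.

    u_ik = 1 / sum_j (d_ik / d_jk) ** (2 / (m - 1)), where d_ik is the
    distance between value k and center i. Centers are assumed distinct.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for value in values:
        dists = [abs(value - center) for center in centers]
        if 0.0 in dists:
            # The value coincides with a center: full membership there.
            row = [1.0 if d == 0.0 else 0.0 for d in dists]
        else:
            row = [1.0 / sum((di / dj) ** exponent for dj in dists)
                   for di in dists]
        memberships.append(row)
    return memberships


def fcm_centers(values, memberships, m=2.0):
    """Standard FCM center update: mean weighted by u ** m."""
    centers = []
    for i in range(len(memberships[0])):
        weights = [row[i] ** m for row in memberships]
        weighted_sum = sum(w * v for w, v in zip(weights, values))
        centers.append(weighted_sum / sum(weights))
    return centers
```

Alternating the two updates until the centers stop moving gives the usual FCM clustering; the soft memberships are what distinguish it from hard k-means.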
Face detection has been one of the most studied topics in the computer vision literature due to its role in applications such as video surveillance, human-computer interfaces and face image database management. Here, we present a face detection approach that consists of two steps. The first step is a training phase based on the AdaBoost algorithm. The second step is the detection phase. The proposed approach enhances Viola and Jones’ algorithm by replacing Haar descriptors with Beta wavelets. The results show excellent detection performance not only when a face is in front of the camera but also when it is oriented towards the right or the left. Moreover, thanks to the short period needed for detection, our approach can be applied in real time.
This work fits into the context of the automatic interpretation of gestures based on computer vision. The aim of our work is to transform a conventional screen into a surface that allows users to use their hands as pointing devices. The work can be summarized in three main steps: hand detection in a video, tracking of the detected hands, and conversion of the paths made by the hands into computer commands. To realize this application, it is necessary to detect the hand to follow, and a classification phase is essential in the control part. For this reason, we use a neuro-fuzzy classifier for classification and a pattern matching method for detection.
This work is in the field of human-computer communication, and more specifically gestural communication. The objective was to develop a gesture recognition system that can be used to control a computer without a keyboard. The idea is to use a visual panel printed on ordinary paper to communicate with the computer.
Proc. SPIE. 9445, Seventh International Conference on Machine Vision (ICMV 2014)
KEYWORDS: Detection and tracking algorithms, Data modeling, Databases, Wavelets, Fast wavelet transforms, Speech recognition, Network architectures, Decision support systems, Fuzzy logic, Classification systems
This paper develops a novel approach to speech recognition based on a wavelet network learnt by fast wavelet transform (FWN) and including a fuzzy decision support system (FDSS). Our contributions are twofold. First, we propose a novel learning algorithm for speech recognition based on the fast wavelet transform (FWT), which has many advantages over other algorithms and solves the major problems previous works faced in computing connection weights. In those works, the weights were determined by a direct solution that requires a matrix inversion, which may be computationally intensive. The new algorithm instead computes the connection weights by iterative application of the FWT. Second, we propose a new classification method for this speech recognition system. It follows a human reasoning mode, employing an FDSS to compute similarity degrees between test and training signals. Extensive empirical experiments were conducted to compare the proposed approach with other approaches. The results show that the new speech recognition system performs better than previously established ones.
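The iterative nature of the fast wavelet transform, which obtains coefficients through repeated filtering and downsampling rather than a matrix inversion, can be sketched with the Haar wavelet. The choice of Haar, the averaging normalization and the function names are assumptions for illustration; the wavelet family and network structure used in the paper are not reproduced.

```python
def haar_step(signal):
    """One FWT level: pairwise averages (approximation) and differences."""
    approx = [(signal[i] + signal[i + 1]) / 2
              for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2
              for i in range(0, len(signal), 2)]
    return approx, detail


def haar_fwt(signal):
    """Full Haar decomposition of a signal whose length is a power of two.

    Returns the final approximation coefficient and the detail
    coefficients of each level, finest level first.
    """
    approx = list(signal)
    details = []
    while len(approx) > 1:
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx[0], details
```

Each level halves the amount of data, so the whole decomposition costs a number of operations proportional to the signal length. For example, `haar_fwt([4, 2, 6, 8])` yields the overall mean `5.0` plus the detail coefficients at two scales.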