Speaker emotion recognition has been considered among the most challenging tasks in recent years. Automatic systems for security, medicine, or education can be improved by taking the affective state of speech into account. In this paper, a twofold approach for speech emotion classification is proposed. First, a relevant set of features is adopted; second, numerous supervised training techniques, involving classic methods as well as deep learning, are evaluated. Experimental results indicate that a deep architecture can improve classification performance on two affective databases: the Berlin Dataset of Emotional Speech and the Surrey Audio-Visual Expressed Emotion (SAVEE) dataset.
Local feature detection is a fundamental module in several mobile vision applications such as mobile object recognition and mobile visual search. The effectiveness and efficiency of a local feature detector determine to what extent it is suitable for a mobile application. Over the past decades, several local feature detectors have been developed. In this paper, we focus on the FAST (Features from Accelerated Segment Test) local feature detector because of its efficiency. However, the FAST detector shows poor robustness against both scale and rotation changes. We therefore aim to enhance the robustness of FAST against scale and rotation changes while maintaining good efficiency. To this end, we propose a Scalable and Oriented FAST-based local Feature detector (SOFF). A comprehensive comparison against the FAST detector and its variants is performed on benchmark datasets. Experimental results demonstrate that the SOFF detector outperforms other FAST-based detectors in many cases. Furthermore, it is efficient to compute and is therefore suitable for mobile vision applications.
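As a point of reference, the basic segment test that gives FAST its name can be sketched as follows. This is a minimal illustration of the plain FAST criterion only, not of SOFF or its scale and orientation handling; the function name `is_fast_corner`, the default `threshold=20`, and the arc length `n=9` are assumptions chosen for the example, and the image is assumed to be a list of rows of grayscale values.

```python
# Offsets (dx, dy) of the 16-pixel Bresenham circle of radius 3 used by FAST.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def is_fast_corner(image, x, y, threshold=20, n=9):
    """Segment test: True if n contiguous circle pixels are all brighter than
    center + threshold or all darker than center - threshold.

    The caller must keep (x, y) at least 3 pixels away from the image border.
    threshold and n are illustrative defaults, not values from the paper.
    """
    center = image[y][x]
    ring = [image[y + dy][x + dx] for dx, dy in CIRCLE]
    for sign in (1, -1):  # 1 tests for a brighter arc, -1 for a darker arc
        flags = [sign * (value - center) > threshold for value in ring]
        run = 0
        # Walk the ring twice so arcs that wrap around index 0 are counted.
        for flag in flags + flags:
            run = run + 1 if flag else 0
            if run >= n:
                return True
    return False
```

On a straight intensity edge only 7 contiguous ring pixels differ from the center, so the test rejects it, whereas an isolated bright dot passes; the criterion itself has no notion of scale or orientation, which is the gap the abstract describes.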
The Extreme Learning Machine (ELM) is a well-known learning algorithm in the field of machine learning. It is a feedforward neural network with a single hidden layer, and it is an extremely fast learning algorithm with good generalization performance. In this paper, we compare the Extreme Learning Machine with wavelet neural networks, which are also widely used. We used six benchmark datasets to evaluate each technique: Wisconsin Breast Cancer, Glass Identification, Ionosphere, Pima Indians Diabetes, Wine Recognition, and Iris Plant. Experimental results show that both the extreme learning machine and wavelet neural networks achieve good results.
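The speed of the Extreme Learning Machine comes from the fact that only the output layer is fitted: the hidden-layer weights are drawn at random and left fixed, and the output weights follow from a single least-squares solve. A minimal sketch of that standard formulation is given below, assuming NumPy is available; the names `train_elm` and `predict_elm`, the `tanh` activation, and the hidden-layer size are assumptions for illustration and do not reproduce the configuration used in the comparison.

```python
import numpy as np


def train_elm(X, T, n_hidden=20, seed=0):
    """Fit a single-hidden-layer ELM.

    X: (n_samples, n_features) inputs, T: (n_samples, n_outputs) targets.
    n_hidden and seed are illustrative defaults.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights, never trained
    b = rng.normal(size=n_hidden)                # random hidden biases, never trained
    H = np.tanh(X @ W + b)                       # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                 # least-squares output weights
    return W, b, beta


def predict_elm(X, W, b, beta):
    """Apply the fixed random hidden layer, then the learned output weights."""
    return np.tanh(X @ W + b) @ beta
```

Because training reduces to one pseudo-inverse, there is no iterative gradient descent; the trade-off is that the random hidden layer usually needs more units than a fully trained network would.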
Speaker gender identification is considered among the most important tools in several multimedia applications, namely automatic speech recognition, interactive voice response systems, and audio browsing systems. The performance of gender identification systems is closely linked to the selected feature set and the employed classification model. Typical techniques are based on selecting the best-performing classification method or on searching experimentally for the optimal parameter tuning of a single classifier. In this paper, we consider a relevant and rich set of features involving pitch, MFCCs, and other temporal and frequency-domain descriptors. Five classification models were evaluated: decision tree, discriminant analysis, naive Bayes, support vector machine, and k-nearest neighbor. The three best-performing classifiers among the five are then combined by majority voting over their scores. Experiments were performed on three different datasets spoken in three languages (English, German, and Arabic) in order to validate the language independence of the proposed scheme. Results confirm that the presented system reaches a satisfactory accuracy rate and promising classification performance thanks to the discriminating abilities and diversity of the features used, combined with mid-level statistics.
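The combination step described above (keep the three best of five classifiers, then take a majority vote) can be sketched as follows. This is a simplified illustration that votes on hard labels rather than on scores; the function names and the idea of ranking by a held-out validation accuracy are assumptions, since the abstract does not state how the three best classifiers are selected.

```python
from collections import Counter


def top_k_classifiers(validation_accuracy, k=3):
    """Return the names of the k classifiers with the highest validation accuracy."""
    return sorted(validation_accuracy, key=validation_accuracy.get, reverse=True)[:k]


def majority_vote(predictions, selected):
    """Return the label predicted by most of the selected classifiers.

    predictions maps a classifier name to its predicted label for one utterance.
    With three voters and two gender classes a tie cannot occur.
    """
    votes = Counter(predictions[name] for name in selected)
    return votes.most_common(1)[0][0]
```

For instance, if the selected classifiers predict female, male, and female for an utterance, `majority_vote` returns the label female.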
3D multiresolution mesh compression systems are still widely studied in many domains. These systems increasingly require volumetric data to be processed in real time. Performance is therefore constrained by hardware resource usage and by the need to reduce overall computation time. In this paper, our contribution lies in computing, in real time, the triangle neighborhood of 3D progressive meshes for a robust compression algorithm based on the scan-based wavelet transform (WT) technique. The originality of this algorithm is that it computes the WT with minimal memory usage by processing data as they are acquired. However, with large data, this technique performs poorly in terms of computational complexity. This work therefore exploits the GPU to accelerate the computation, using OpenCL as a heterogeneous programming language. Experiments demonstrate that, aside from the portability across various platforms and the flexibility guaranteed by the OpenCL-based implementation, this method achieves a speedup factor of 5 compared to the sequential CPU implementation.
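To make the notion of a triangle neighborhood concrete, the sketch below computes, for every triangle, the triangles that share an edge with it. It is a sequential CPU illustration only, not the OpenCL kernel; defining neighbors by shared edges (rather than shared vertices) and the name `triangle_neighbors` are assumptions made for the example.

```python
from collections import defaultdict


def triangle_neighbors(triangles):
    """Map each triangle index to the set of triangle indices sharing an edge with it.

    triangles is a list of (a, b, c) vertex-index tuples.
    """
    # Group triangles by undirected edge, stored as a sorted vertex pair.
    edge_to_triangles = defaultdict(list)
    for index, (a, b, c) in enumerate(triangles):
        for u, v in ((a, b), (b, c), (c, a)):
            edge_to_triangles[(min(u, v), max(u, v))].append(index)

    # Triangles listed under the same edge are neighbors of each other.
    neighbors = {index: set() for index in range(len(triangles))}
    for sharing in edge_to_triangles.values():
        for i in sharing:
            for j in sharing:
                if i != j:
                    neighbors[i].add(j)
    return neighbors
```

Each triangle contributes its three edges independently of the others, which is the kind of per-element work that maps naturally onto GPU threads.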
With the great popularity of the photo-sharing site Flickr, the research community has been working on innovative applications to enhance Flickr services. In this paper, we present a new process for generating diverse visual suggestions on Flickr. We combine the social aspect of Flickr with the richness of Wikipedia to produce a large number of meanings, illustrated by diverse visual suggestions, which can bring diversity into Flickr search. We conduct an experimental study to illustrate the effect of fusing Wikipedia and Flickr knowledge on the diversity rate of Flickr search results, and to show how diversity evolves across the images returned by different search engines.
Triangular surfaces are now widely used for modeling three-dimensional objects. Because these models have very high resolution and the mesh geometry is often very dense, it is necessary to remesh the object to reduce its complexity and to improve the mesh quality (connectivity regularity). In this paper, we review the main state-of-the-art methods for semi-regular remeshing, which is mainly relevant for wavelet-based compression. We then present our remeshing method, based on a trust-region spherical geometry image, which yields a good 3D mesh compression scheme used to deform 3D meshes with a Multi-Library Wavelet Neural Network (MLWNN) structure. Experimental results show that the progressive remeshing algorithm is capable of obtaining more compact representations and semi-regular objects, and that it yields efficient compression with a minimal set of features, which supports a good 3D deformation scheme.
In recent years, computer vision has become a very active field. It includes methods for processing, analyzing, and understanding images. Among the most challenging problems in computer vision are image classification and object recognition. This paper presents a new approach to object recognition. The approach builds on the success of the Very Deep Convolutional Neural Network for object recognition and improves its convolutional layers by adding recurrent connections. The proposed approach was evaluated on two object recognition benchmarks: Pascal VOC 2007 and CIFAR-10. The experimental results demonstrate the efficiency of our method in comparison with state-of-the-art methods.
In recent years, 3D shape has emerged in face recognition because of its robustness to pose and illumination changes. These attractive benefits do not, however, address every challenge in achieving a satisfactory recognition rate. Other challenges, such as facial expressions and the computing time of matching algorithms, remain to be explored. In this context, we propose a 3D face recognition approach using 3D wavelet networks. Our approach contains two stages: a learning stage and a recognition stage. For training, we propose a novel algorithm based on the 3D fast wavelet transform. From the 3D coordinates of the face (x, y, z), we apply voxelization to obtain a 3D volume, which is decomposed by the 3D fast wavelet transform and then modeled with a wavelet network (WN); the associated weights are used as the feature vector representing each training face. In the recognition stage, a face of unknown identity is projected onto every training WN, yielding a new feature vector after each projection. A similarity score is then computed between the stored and the newly obtained feature vectors. To show the efficiency of our approach, experiments were performed on the entire FRGC v.2 benchmark.
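The voxelization step, which turns a set of (x, y, z) face coordinates into a 3D volume, can be sketched as a binary occupancy grid. This is a minimal illustration under stated assumptions: the grid resolution, the uniform scaling to the bounding box, and the name `voxelize` are hypothetical, and the abstract does not specify how the volume is actually built.

```python
def voxelize(points, resolution=32):
    """Return a resolution x resolution x resolution binary occupancy grid.

    points is a non-empty list of (x, y, z) tuples; a voxel is 1 if at least
    one point falls inside it. resolution is an illustrative default.
    """
    mins = [min(p[axis] for p in points) for axis in range(3)]
    maxs = [max(p[axis] for p in points) for axis in range(3)]
    # One shared scale for all axes preserves the aspect ratio of the face.
    extent = max(hi - lo for lo, hi in zip(mins, maxs)) or 1.0
    grid = [[[0] * resolution for _ in range(resolution)] for _ in range(resolution)]
    for p in points:
        # Clamp so that points on the upper boundary land in the last voxel.
        i, j, k = (min(int((p[axis] - mins[axis]) / extent * resolution), resolution - 1)
                   for axis in range(3))
        grid[i][j][k] = 1
    return grid
```

The resulting regular grid is what makes a separable 3D wavelet decomposition applicable, since the transform operates along each of the three axes of the volume.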
Driving safety is an important concern for society. The major challenge in the field of accident avoidance systems is monitoring driver vigilance. A lack of vigilance can manifest in various ways, such as fatigue, drowsiness, and distraction. Hence the need for a reliable system that detects a decrease in driver vigilance and alerts drivers before a mishap happens. In this paper, we present a novel approach for vigilance estimation based on a multilevel system that combines head movement analysis and eye blinking. We use the Viola-Jones algorithm to analyse head movement and a wavelet-network classification system to measure eyelid closure. The contribution of our application is classifying the vigilance state at multiple levels. This differs from the binary classification (awake or hypovigilant state) used in most popular systems.
This work is in the field of human-computer communication, specifically gestural communication. The objective was to develop a gesture recognition system that can be used to control a computer without a keyboard. The idea is to use a visual panel printed on ordinary paper to communicate with the computer.
Proc. SPIE. 9445, Seventh International Conference on Machine Vision (ICMV 2014)
KEYWORDS: Detection and tracking algorithms, Data modeling, Databases, Wavelets, Fast wavelet transforms, Speech recognition, Network architectures, Decision support systems, Fuzzy logic, Classification systems
This paper develops a novel approach to speech recognition based on a wavelet network learnt by the fast wavelet transform (FWN) and including a fuzzy decision support system (FDSS). Our contributions are twofold. First, we propose a novel learning algorithm for speech recognition based on the fast wavelet transform (FWT), which has many advantages over other algorithms and solves the major problems previous works faced in computing the connection weights. In previous work, these weights were determined by a direct solution that requires a matrix inversion, which may be computationally intensive. The new algorithm instead computes the connection weights by iterative application of the FWT. Second, we propose a new classification method for this speech recognition system, which follows a human reasoning mode by employing an FDSS to compute similarity degrees between test and training signals. Extensive empirical experiments were conducted to compare the proposed approach with other approaches. The results show that the new speech recognition system performs better than previously established ones.
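The iterative structure of a fast wavelet transform, where each level splits the current approximation into a coarser approximation and a set of details without any matrix inversion, can be sketched with the Haar wavelet. The choice of Haar, the averaging normalization, and the names `haar_fwt` and `haar_ifwt` are assumptions for illustration; the abstract does not name the wavelet used, and this sketch is not the FWN training algorithm itself.

```python
def haar_fwt(signal):
    """Full Haar decomposition of a signal whose length is a power of two.

    Returns [approximation, coarsest details, ..., finest details].
    """
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Prepend so that coarser details end up before finer ones.
        details = [(a - b) / 2 for a, b in pairs] + details
        approx = [(a + b) / 2 for a, b in pairs]
    return approx + details


def haar_ifwt(coeffs):
    """Invert haar_fwt by rebuilding one level at a time."""
    approx = list(coeffs[:1])
    position = 1
    while position < len(coeffs):
        details = coeffs[position:position + len(approx)]
        position += len(approx)
        # Each (average, difference) pair expands back into two samples.
        approx = [value for a, d in zip(approx, details) for value in (a + d, a - d)]
    return approx
```

Each level touches half as many samples as the previous one, so the whole decomposition costs a number of operations proportional to the signal length.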
In this paper we present a method to optimize the computation of the wavelet transform for 3D seismic data while reducing the energy of the coefficients to a minimum. This allows us to reduce the entropy of the signal and thus increase the compression ratio. The proposed method exploits the geometrical information contained in the 3D seismic data to optimize the computation of the wavelet transform: the classic filtering is replaced by filtering that follows the horizons contained in the 3D seismic images. Applying this approach in two dimensions yields wavelet coefficients with the lowest energy. The experiments show that our method saves an extra 8% of the size of the object compared to the classic wavelet transform.
Regular colonoscopy has always been regarded as a complicated procedure requiring a tremendous amount of skill to be performed safely. Indeed, the practitioner needs to contend with both the tortuousness of the colon and the handling of the colonoscope. The practitioner must take into account the visual data acquired by the scope's tip and rely mostly on common sense and skill to steer it in a way that promotes safe insertion of the device's shaft. In that context, we propose a new navigation cue for the tip of a regular colonoscope in order to assist surgeons during a colonoscopic examination. First, we consider a patch of the inner colon depicted in a regular colonoscopy frame. We then perform a coarse 3D reconstruction from the corresponding 2D data. A suggested navigation trajectory is then derived from the obtained relief. Both the visible and the invisible lumen cases are considered. Owing to its low computational cost, such a strategy would accommodate intraoperative configuration changes and thus reduce the effect of the non-rigidity of the colon. Moreover, it would tend to provide a safe navigation trajectory through the whole colon, since the approach aims to keep the extremity of the instrument as far as possible from the colon wall during navigation. To put the process into practice, we replaced the original manual control system of a regular colonoscope with a motorized one allowing automatic pan and tilt motions of the device's tip.