Nowadays there is a trend towards the use of unimodal databases for multimedia content description, organization and retrieval applications of a single type of content like text, voice and images, instead bimodal databases allow to associate semantically two different types of content like audio-video, image-text, among others. The generation of a bimodal database of audio-video implies the creation of a connection between the multimedia content through the semantic relation that associates the actions of both types of information. This paper describes in detail the used characteristics and methodology for the creation of the bimodal database of violent content; the semantic relationship is stablished by the proposed concepts that describe the audiovisual information. The use of bimodal databases in applications related to the audiovisual content processing allows an increase in the semantic performance only and only if these applications process both type of content. This bimodal database counts with 580 audiovisual annotated segments, with a duration of 28 minutes, divided in 41 classes. Bimodal databases are a tool in the generation of applications for the semantic web.
The automatic identification and classification of musical genres based on the sound similarities to form musical textures, it is a very active investigation area. In this context it has been created recognition systems of musical genres, formed by time-frequency characteristics extraction methods and by classification methods. The selection of this methods are important for a good development in the recognition systems. In this article they are proposed the Mel-Frequency Cepstral Coefficients (MFCC) methods as a characteristic extractor and Support Vector Machines (SVM) as a classifier for our system. The stablished parameters of the MFCC method in the system by our time-frequency analysis, represents the gamma of Mexican culture musical genres in this article. For the precision of a classification system of musical genres it is necessary that the descriptors represent the correct spectrum of each gender; to achieve this we must realize a correct parametrization of the MFCC like the one we present in this article. With the system developed we get satisfactory detection results, where the least identification percentage of musical genres was 66.67% and the one with the most precision was 100%.
We propose a convolutional neural network to classify images of buildings using sparse features at the network’s input in conjunction with primary color pixel values. As a result, a trained neuronal model is obtained to classify Mexican buildings in three classes according to the architectural styles: prehispanic, colonial, and modern with an accuracy of 88.01%. The problem of poor information in a training dataset is faced due to the unequal availability of cultural material. We propose a data augmentation and oversampling method to solve this problem. The results are encouraging and allow for prefiltering of the content in the search tasks.
In the computer world, the consumption and generation of multimedia content are in constant growth due to the popularization of mobile devices and new communication technologies. Retrieve information from multimedia content to describe Mexican buildings is a challenging problem. Our objective is to determine patterns related to three building eras (Pre-Hispanic, colonial and modern). For this purpose, existing recognition systems need to process a plenty of videos and images. The automatic learning systems trains the recognition capability with a semantic-annotated database. We built the database taking into account high-level feature concepts, user knowledge and experience. The annotations helps correlating context and content to understand the data on multimedia files. Without a method, the user needs a super mind to remember all and registry this data manually. This article presents a methodology for a quick images annotation using a graphical interface and intuitive controls. Emphasizing in the most two important features: time-consuming during annotations task and the quality of selected images. Though, we only classify images by its era and its quality. Finally, we obtain a dataset of Mexican buildings preserving the contextual information with semantic-annotations for training and test of buildings recognition systems. Therefore, research on content low-level descriptors is other possible use for this dataset.
Biometrics refers to identify people through their physical characteristics or behavior such as fingerprints, face, DNA,
hand geometries, retina and iris patterns. Typically, the iris pattern is to acquire in short distance to recognize a person,
however, in the past few years is a challenge identify a person by its iris pattern at certain distance in non-cooperative
environments. This challenge comprises: 1) high quality iris image, 2) light variation, 3) blur reduction, 4) specular
reflections reduction, 5) the distance from the acquisition system to the user, and 6) standardize the iris size and the density
pixel of iris texture. The solution of the challenge will add robustness and enhance the iris recognition rates. For this
reason, we describe the technical issues that must be considered during iris acquisition. Some of these considerations are
the camera sensor, lens, the math analysis of depth of field (DOF) and field of view (FOV) for iris recognition. Finally,
based on this issues we present experiment that show the result of captures obtained with our camera at distance and
captures obtained with cameras in very short distance.
Current search engines are based upon search methods that involve the combination of words (text-based search); which
has been efficient until now. However, the Internet’s growing demand indicates that there’s more diversity on it with each
passing day. Text-based searches are becoming limited, as most of the information on the Internet can be found in different
types of content denominated multimedia content (images, audio files, video files).
Indeed, what needs to be improved in current search engines is: search content, and precision; as well as an accurate display
of expected search results by the user. Any search can be more precise if it uses more text parameters, but it doesn’t help
improve the content or speed of the search itself. One solution is to improve them through the characterization of the
content for the search in multimedia files. In this article, an analysis of the new generation multimedia search engines is
presented, focusing the needs according to new technologies.
Multimedia content has become a central part of the flow of information in our daily life. This reflects the necessity of
having multimedia search engines, as well as knowing the real tasks that it must comply. Through this analysis, it is shown
that there are not many search engines that can perform content searches. The area of research of multimedia search engines
of new generation is a multidisciplinary area that’s in constant growth, generating tools that satisfy the different needs of
new generation systems.
Multimedia content production and storage in repositories are now an increasingly widespread practice. Indexing concepts for search in multimedia libraries are very useful for users of the repositories. However the search tools of content-based retrieval and automatic video tagging, still do not have great consistency. Regardless of how these systems are implemented, it is of vital importance to possess lots of videos that have concepts tagged with ground truth (training and testing sets). This paper describes a novel methodology to make complex annotations on video resources through ELAN software. The concepts are annotated and related to Mexican nature in a High Level Features (HLF) from development set of TRECVID 2014 in a collaborative environment. Based on this set, each nature concept observed is tagged on each video shot using concepts of the TRECVid 2014 dataset. We also propose new concepts, -like tropical settings, urban scenes, actions, events, weather, places for name a few. We also propose specific concepts that best describe video content of Mexican culture. We have been careful to get the database tagged with concepts of nature and ground truth. It is evident that a collaborative environment is more suitable for annotation of concepts related to ground truth and nature. As a result a Mexican nature database was built. It also is the basis for testing and training sets to automatically classify new multimedia content of Mexican nature.
The video annotation is important for web indexing and browsing systems. Indeed, in order to evaluate the performance of video query and mining techniques, databases with concept annotations are required. Therefore, it is necessary generate a database with a semantic indexing that represents the digital content of the Mexican bullfighting atmosphere. This paper proposes a scheme to make complex annotations in a video in the frame of multimedia search engine project. Each video is partitioned using our segmentation algorithm that creates shots of different length and different number of frames. In order to make complex annotations about the video, we use ELAN software. The annotations are done in two steps: First, we take note about the whole content in each shot. Second, we describe the actions as parameters of the camera like direction, position and deepness. As a consequence, we obtain a more complete descriptor of every action. In both cases we use the concepts of the TRECVid 2014 dataset. We also propose new concepts. This methodology allows to generate a database with the necessary information to create descriptors and algorithms capable to detect actions to automatically index and classify new bullfighting multimedia content.
We present a new method for the estimation of non-planar rotations, i.e. rotations around axis parallel to the image plane, in the context of video compression applications. This method is based on a non planar rotation model which assumes that the moving objects have a planar surface. The proposed block-based motion estimation approach is performed between consecutive or non-consecutive images, which may be contained large displacements, and aims at minimizing the motion compensation error. The efficiency of the method has been compared to the results obtained with the classical full search block matching approach. Experimental results have been done on real video sequences. These results show a significant gain in term of PSNR for the motion compensated P or B frames, compared to the classical full search block matching approach, while the coding cost of the additional motion information is very low, which demonstrates the interest of the proposed rotation model in the context of motion compensation for video compression applications.