In computer vision, recognizing the content of an image is of interest in applications such as search engines, biometric security and autonomous cars, among others. Because the computer must recognize every object that an image may contain, the challenge is to localize and classify different objects within a single image efficiently. In recent years, this challenge has been approached with region-based convolutional neural networks (R-CNN), systems that learn to recognize different objects from their representation in a series of images. Region proposal is essential to the performance of R-CNN, since it determines how accurately and how quickly the individual objects in the image are located. In this article we propose a modification to a region proposal method based on the density of SIFT-like feature points that describe the objects within the image. Regions are selected through a decision based on the values of the cumulative distribution function of a normal distribution constructed from the point density. The results show a significant reduction in the processing time required for object localization, with only slight variations in classification accuracy compared with methods such as KDRP and selective search.
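The density-based selection described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the fixed grid, the cell size, the 0.9 threshold and the function names are all assumptions made for illustration, and the keypoints are taken as plain (x, y) tuples from any SIFT-like detector.

```python
import math

def normal_cdf(x, mean, std):
    """CDF of a normal distribution N(mean, std^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def select_dense_cells(points, width, height, cell=32, threshold=0.9):
    """Return (col, row) grid cells whose keypoint count is high under a normal model.

    Hypothetical helper: the grid layout, cell size and threshold are
    illustrative assumptions, not values taken from the paper.
    """
    cols = math.ceil(width / cell)
    rows = math.ceil(height / cell)

    # Count keypoints per grid cell (the "point density").
    counts = [[0] * cols for _ in range(rows)]
    for x, y in points:
        c = min(int(x // cell), cols - 1)
        r = min(int(y // cell), rows - 1)
        counts[r][c] += 1

    # Fit a normal distribution to the per-cell counts.
    flat = [n for row in counts for n in row]
    mean = sum(flat) / len(flat)
    std = math.sqrt(sum((n - mean) ** 2 for n in flat) / len(flat))
    if std == 0.0:
        return []  # uniform density: no cell stands out

    # Keep the cells whose CDF value reaches the decision threshold.
    return [
        (c, r)
        for r in range(rows)
        for c in range(cols)
        if normal_cdf(counts[r][c], mean, std) >= threshold
    ]
```

Under these assumptions, cells that pass the threshold would serve as seeds for candidate regions; merging adjacent cells into bounding boxes is left out of the sketch.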
Nowadays there is a trend towards the use of unimodal databases for multimedia content description, organization and retrieval applications that handle a single type of content, such as text, voice or images. Bimodal databases, in contrast, make it possible to semantically associate two different types of content, such as audio-video or image-text, among others. Generating a bimodal audio-video database implies creating a connection between the multimedia content through the semantic relation that associates the actions in both types of information. This paper describes in detail the characteristics and methodology used to create a bimodal database of violent content; the semantic relationship is established by the proposed concepts that describe the audiovisual information. The use of bimodal databases in applications related to audiovisual content processing increases semantic performance if and only if these applications process both types of content. The database comprises 580 annotated audiovisual segments, with a duration of 28 minutes, divided into 41 classes. Bimodal databases are a tool for building applications for the semantic web.
Biometrics refers to identifying people through their physical or behavioral characteristics, such as fingerprints, face, DNA,
hand geometry, and retina and iris patterns. Typically, the iris pattern is acquired at short distance to recognize a person;
however, in the past few years it has become a challenge to identify a person by their iris pattern at a distance in non-cooperative
environments. This challenge comprises: 1) acquiring a high-quality iris image, 2) light variation, 3) blur reduction, 4) reduction of
specular reflections, 5) the distance from the acquisition system to the user, and 6) standardizing the iris size and the pixel
density of the iris texture. Solving this challenge will add robustness and improve iris recognition rates. For this
reason, we describe the technical issues that must be considered during iris acquisition. These considerations include
the camera sensor, the lens, and the mathematical analysis of depth of field (DOF) and field of view (FOV) for iris recognition. Finally,
based on these issues, we present experiments that show the results of captures obtained with our camera at a distance and
captures obtained with cameras at very short distance.
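The DOF and FOV analysis mentioned in this abstract can be sketched with the standard thin-lens relations. The abstract does not give the authors' formulas or camera parameters, so the thin-lens model, the function names and every numeric value in the usage note are assumptions chosen only to illustrate the kind of calculation involved.

```python
import math

def field_of_view_mm(sensor_mm, focal_mm, distance_mm):
    """Linear field of view at the subject plane (thin-lens model).

    Magnification is f / (s - f), so the sensor covers sensor / m at the subject.
    """
    return sensor_mm * (distance_mm - focal_mm) / focal_mm

def depth_of_field_mm(focal_mm, f_number, coc_mm, distance_mm):
    """Near and far limits of acceptable sharpness for a subject at distance_mm.

    coc_mm is the circle of confusion, a value the user must choose (assumed here).
    """
    hyperfocal = focal_mm ** 2 / (f_number * coc_mm) + focal_mm
    near = distance_mm * (hyperfocal - focal_mm) / (hyperfocal + distance_mm - 2 * focal_mm)
    if distance_mm >= hyperfocal:
        far = math.inf  # everything beyond the near limit is acceptably sharp
    else:
        far = distance_mm * (hyperfocal - focal_mm) / (hyperfocal - distance_mm)
    return near, far

def pixels_across_iris(iris_mm, focal_mm, distance_mm, pixel_pitch_mm):
    """Number of sensor pixels spanned by an iris of diameter iris_mm."""
    magnification = focal_mm / (distance_mm - focal_mm)
    return iris_mm * magnification / pixel_pitch_mm
```

As a hypothetical example, a 100 mm lens at f/4 focused at 1100 mm with a 0.01 mm circle of confusion gives a depth of field of only a few millimeters on either side of the subject, which suggests why the distance to the user and the blur constraints listed above interact so strongly.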
Current search engines are based on search methods that combine words (text-based search), which
have been efficient until now. However, the growing demand on the Internet means that its content becomes more diverse with each
passing day. Text-based searches are becoming limiting, as most of the information on the Internet is found in different
types of content known as multimedia content (images, audio files, video files).
What needs to be improved in current search engines is search content and precision, as well as the accurate display
of the search results the user expects. Any search can be made more precise by using more text parameters, but this does not
improve the content or speed of the search itself. One solution is to improve them by characterizing the
content for searches in multimedia files. In this article, an analysis of new-generation multimedia search engines is
presented, focusing on the needs that arise from new technologies.
Multimedia content has become a central part of the flow of information in our daily life. This reflects the need for
multimedia search engines, as well as for knowing the actual tasks they must fulfill. This analysis shows
that few search engines can perform content-based searches. Research on new-generation multimedia search
engines is a multidisciplinary area in constant growth, generating tools that satisfy the different needs of
new-generation systems.