Biometrics has been introduced in many facilities, and face authentication is one of the authentication methods attracting attention. However, face authentication is easily affected by changes in lighting conditions. This study proposes a face authentication method that uses only a thermal camera and is therefore robust to lighting changes. The proposed method applies FaceNet [1], an authentication method for visible-light images, to each of four facial parts (whole face, eyes, nose, and mouth) and improves accuracy by taking a majority vote over the per-part results.
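A minimal sketch of the majority-voting step. The encoder `embed`, the crop dictionary `query_crops`, the stored `enrolled_embeddings`, and the similarity threshold are all hypothetical names, assuming a FaceNet-style encoder that outputs unit-length embeddings:

```python
import numpy as np

# `embed` stands in for a FaceNet-style encoder that maps a cropped
# face part (whole face, eyes, nose, mouth) to a unit-length vector.
PARTS = ["face", "eyes", "nose", "mouth"]

def verify_by_majority(query_crops, enrolled_embeddings, embed, threshold=0.7):
    """Accept the identity if at least 3 of the 4 parts match."""
    votes = 0
    for part in PARTS:
        q = embed(query_crops[part])         # embedding of the query part
        e = enrolled_embeddings[part]        # embedding stored at enrollment
        similarity = float(np.dot(q, e))     # cosine similarity (unit vectors)
        votes += similarity >= threshold
    return votes >= 3
```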
Motifs drawn on Nishiki-e (Japanese multicolored woodblock prints) need to be registered in a database as search tags. The accuracy of currently hand-registered motif tags is unstable because it depends on the knowledge and interests of the registrant. Therefore, this study proposes an automatic motif-tag generation method using deep learning to support cultural activities. For Nishiki-e, it is more difficult to collect training images containing specific motifs than it is for photographs. In this study, we propose three methods for preparing training images. First, we applied a model that generates similar images from a single image to a small number of Nishiki-e containing motifs. Second, we applied a Nishiki-e style-conversion model to photographs containing motifs. Third, we combined a small number of photographs containing motifs with background images. In particular, the third method requires only a small number of input images, like the first method, while detecting motifs with an accuracy close to that of the second method.
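As a rough illustration of the third method, the following sketch composites a motif photograph onto a background image; the file paths, scale range, and random placement are assumptions, not the paper's exact procedure:

```python
import random
from PIL import Image

def composite_training_image(motif_path, background_path, out_path):
    """Paste a motif photograph onto a background at a random position/scale."""
    motif = Image.open(motif_path).convert("RGBA")
    background = Image.open(background_path).convert("RGBA")

    # Randomly rescale the motif relative to the background width.
    scale = random.uniform(0.3, 0.7)
    w = int(background.width * scale)
    h = int(motif.height * w / motif.width)
    motif = motif.resize((w, h))

    # Random placement inside the background.
    x = random.randint(0, max(0, background.width - w))
    y = random.randint(0, max(0, background.height - h))
    background.paste(motif, (x, y), motif)   # alpha channel used as paste mask
    background.convert("RGB").save(out_path)
```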
Color constancy is the human ability to recognize the color of an object correctly even when the color of the illumination changes. We constructed a network that reproduces color constancy using pix2pix, a generative adversarial network. However, the current network has problems: for example, it cannot output the color and shape of object parts correctly when the illumination has an extreme color, and the object and the background in the image blend together. This research improves the accuracy of the color constancy network by using a segmentation technique. We generate a mask image from the input image with a segmentation network, where the object region is white and the background is black. Then, we feed the mask image into the network in the same way as the input image, adding the mask information to the network's processing of the input. By inputting the mask image, information about the target object region is added to the color constancy network. This makes it possible to clarify the object region in the input image and to reproduce the shape and color of objects that the existing color constancy network cannot reproduce.
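One simple way to feed the mask "in the same way as the input image" is channel concatenation; this is a sketch under that assumption, with `generator` standing in for a pix2pix-style model whose first convolution accepts four input channels:

```python
import torch

def forward_with_mask(generator, image, mask):
    """Condition the generator on the segmentation mask.

    image: (B, 3, H, W) RGB in [0, 1]
    mask:  (B, 1, H, W), 1 = object, 0 = background
    """
    conditioned = torch.cat([image, mask], dim=1)  # (B, 4, H, W)
    return generator(conditioned)
```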
Various models have been proposed to predict the future head/gaze orientation of a user watching a 360-degree video. However, most of these models do not take sound information into account, and there are few studies on the influence of sound on users in VR space. This study proposes a multimodal model for predicting head/gaze orientation for 360-degree videos based on a new analysis of users' head/gaze behavior in VR space. First, we focus on whether viewers are attracted to the sound source of a 360-degree video. We conducted a head/gaze tracking experiment with 22 subjects under AV (audio-visual) and V (visual-only) conditions using 32 videos. The results confirmed that whether subjects were attracted to the sound source depended on the video. Next, we trained a deep learning model based on these results and constructed and evaluated a multimodal model combining visual and auditory information. As a result, we were able to construct a multimodal head/gaze prediction model that uses the sound source explicitly. However, we could not confirm any accuracy advantage from the multimodal approach. Finally, we discuss this issue and future prospects.
Deep neural networks (DNNs) achieve high performance in various tasks. However, their huge number of parameters and floating-point operations makes them difficult to deploy on edge devices. Therefore, in recent years, much research has been conducted on compressing deep convolutional neural networks. Conventional methods prune based on fixed criteria, but it is unknown whether those criteria are optimal. To solve this problem, this paper proposes a method that selects parameters for pruning automatically. Specifically, all parameter information is given as input, and reinforcement learning is used to select and prune parameters that do not affect accuracy. Our method prunes one filter or node per action and compresses the network by repeating this action. The proposed method highly compressed a CNN with minimal accuracy degradation, removing about 97.0% of the parameters of VGG16 with 2.53% accuracy degradation on the CIFAR-10 image classification task.
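A minimal sketch of what a single pruning action could look like, assuming "pruning a filter" is approximated by zeroing its weights (soft pruning); a deployed version would physically remove the filter and the matching input channels of the next layer:

```python
import torch
import torch.nn as nn

def prune_filter(conv: nn.Conv2d, filter_index: int) -> None:
    """One pruning action: zero out a single output filter of a conv layer.

    Zeroing is the simplest stand-in for removal when measuring how much
    an action degrades accuracy (e.g., as a reinforcement-learning reward).
    """
    with torch.no_grad():
        conv.weight[filter_index].zero_()
        if conv.bias is not None:
            conv.bias[filter_index] = 0.0
```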
A depth image from a single RGBD camera contains many occlusions and much noise, so it is not easy to obtain 3D data of the whole human head. Point cloud deep learning, which allows direct input and output of point clouds, has recently attracted much attention. Among its tasks, point cloud completion, which creates a complete point cloud from a partial one, has been studied. However, existing studies of point cloud completion evaluate only the shape and have not focused on colored point clouds. Therefore, this study proposes a machine-learning-based colored point cloud completion method for the human head. For training, a CG dataset was created from face and hair datasets. The proposed network inputs and outputs point clouds with XYZ coordinates and L*a*b* color information, and optionally has a discriminator that processes L*a*b*-D images produced by a differentiable point renderer. This study ran experiments using the network and the dataset and evaluated the results with point-domain and image-domain metrics.
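The following toy network is not the paper's architecture; it only illustrates the six-dimensional interface (XYZ plus L*a*b*) of a colored completion model, using a PointNet-style max-pooled encoder and an MLP decoder:

```python
import torch
import torch.nn as nn

class ColoredCompletionNet(nn.Module):
    """Toy encoder-decoder over (x, y, z, L*, a*, b*) points."""

    def __init__(self, n_out=2048):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(6, 128), nn.ReLU(),
            nn.Linear(128, 256), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(256, 512), nn.ReLU(),
            nn.Linear(512, n_out * 6),
        )
        self.n_out = n_out

    def forward(self, points):                     # points: (B, N, 6)
        features = self.encoder(points)            # (B, N, 256)
        global_feat = features.max(dim=1).values   # order-invariant pooling
        out = self.decoder(global_feat)            # (B, n_out * 6)
        return out.view(-1, self.n_out, 6)         # completed colored cloud
```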
This paper proposes a method for estimating 3D information such as the shape, orientation, size, and position of objects in a monocular image, and for reproducing scenes as 3D point clouds, using a Convolutional Neural Network (CNN). This study proposes a system that combines depth estimation, object detection, and point cloud estimation to estimate 3D information of objects. The proposed system requires networks for object detection and segmentation, and a point cloud estimation network for object shape estimation. The point cloud estimation network reproduces object surfaces robustly and can handle unknown objects through a semantic understanding of object shape. In addition to these networks, we add a depth estimation network that estimates the depth of the entire scene and the distance between the camera and each object. In this paper, we focus on the point cloud estimation network: we estimate point clouds for real objects in the dataset images and evaluate the output point clouds.
In recent years, virtual reality (VR) and augmented reality (AR) have been developed and applied to various simulations for business and commercial use. In these simulations, computer graphics (CG) is very important for expressing virtual objects, and there are many studies on the expression of cloth. Some optical properties of an object are necessary to represent cloth with CG. These optical properties depend on the thread material, the number of threads, and the thread thickness, so it is difficult to represent clothes across these variations. This study proposes a method to formulate the reflectance and transmittance as functions of the cloth's composition. To formulate the reflectance, we use the Kubelka-Munk theory together with compositional parameters of the cloth that can be easily obtained using, for example, a smartphone.
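For reference, the standard Kubelka-Munk relations such a formulation builds on, for a layer with absorption coefficient K, scattering coefficient S, and thickness X (the exact form used in the paper may differ):

```latex
% Reflectance of an optically thick layer:
\[
  \frac{K}{S} = \frac{(1 - R_\infty)^2}{2 R_\infty},
  \qquad
  R_\infty = 1 + \frac{K}{S} - \sqrt{\left(\frac{K}{S}\right)^{2} + 2\,\frac{K}{S}} .
\]
% Reflectance R and transmittance T of a free-standing layer of thickness X:
\[
  a = 1 + \frac{K}{S}, \qquad b = \sqrt{a^{2} - 1},
\]
\[
  R = \frac{\sinh(bSX)}{a \sinh(bSX) + b \cosh(bSX)},
  \qquad
  T = \frac{b}{a \sinh(bSX) + b \cosh(bSX)} .
\]
```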
Recently, object recognition using CNNs has become widespread. However, medical image datasets are often small because the training data require a doctor's findings. On such small-scale datasets, CNNs cannot achieve sufficiently high recognition accuracy. One solution to this problem is transfer learning, which reuses weights learned on a large dataset. In addition, there is research on pruning parameters unimportant for the target task during transfer learning. In this study, after transfer learning is performed, the convolution filters are evaluated using a pruning criterion, and low-scoring filters are replaced with high-scoring ones. To confirm the usefulness of the proposed method for recognition accuracy, we compare it with three methods: transfer learning only, pruning, and reinitializing the filters. As a result, the proposed method achieved higher recognition accuracy than the other methods. We confirmed that replacing filters can affect CNN object recognition on small-scale datasets.
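A sketch of the evaluate-and-replace step, assuming the common L1-norm pruning criterion as the filter score; the paper's exact criterion and replacement rule are not specified here:

```python
import torch
import torch.nn as nn

def replace_weak_filters(conv: nn.Conv2d, fraction=0.1) -> None:
    """Score filters by L1 norm and overwrite the lowest-scoring filters
    with copies of the highest-scoring ones (illustrative assumption)."""
    with torch.no_grad():
        scores = conv.weight.abs().sum(dim=(1, 2, 3))  # one score per filter
        k = max(1, int(fraction * conv.out_channels))
        low = torch.topk(scores, k, largest=False).indices
        high = torch.topk(scores, k, largest=True).indices
        conv.weight[low] = conv.weight[high].clone()
```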
In recent years, research on virtual fitting has been conducted in the fashion field. Many of these systems have been put to practical use with prepared clothes: companies use information on the shape, size, and fabric of their clothes to provide users with virtual fitting. When such data are not available, many methods estimate the shape and size of the clothes in images. Using these methods, users can virtually try on the clothes they want to wear, fitted to their body shape and pose. On the other hand, a method for estimating the fabric of clothes remains to be developed. Because the material of clothes determines their softness, it is difficult to reproduce realistic movements and wrinkles of clothes with conventional virtual fitting systems. This study proposes a method for estimating the fabric material from clothes images, aiming at realistic virtual fitting. A dataset focusing on each fabric's texture and luster is constructed, and the material is estimated using a Convolutional Neural Network (CNN).
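A minimal sketch of such a CNN-based material estimator, framed as image classification; the backbone choice and the five material classes are placeholders, not the paper's label set:

```python
import torch.nn as nn
from torchvision import models

# Placeholder material classes for illustration only.
FABRICS = ["cotton", "wool", "silk", "denim", "polyester"]

# Fine-tune an ImageNet-pretrained backbone: replace the classifier head
# with one output per fabric class, then train on the fabric dataset.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, len(FABRICS))
```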
In urban development, it is important to make plans that take into account how the appearance of natural objects will change over decades. This study proposes a tree-growth simulation method for predicting such appearance changes.
This study proposes a method for generating 3D models of furniture from 3D point cloud data of a room captured by an RGBD camera, in order to realize layout simulation of a real room with its furniture.
Cleaning is an inseparable part of daily life, but it is impossible to see with the naked eye which areas of a room have actually been cleaned. For this reason, when multiple people clean together and information on where cleaning has been performed cannot be shared, some areas may remain uncleaned. If Augmented Reality (AR) can be used to visualize the areas passed over by the hand or cleaning tool, it will improve cleaning efficiency and increase motivation. The purpose of this research is to obtain the location information of the passed-over areas using Simultaneous Localization and Mapping (SLAM) and to superimpose it on the scene, in order to visualize with AR the areas covered by the hand or cleaning tool.
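A sketch of the bookkeeping behind such an overlay: SLAM-tracked tool positions, projected onto floor-plane coordinates, mark cells of an occupancy grid that the AR layer can render as already cleaned. The grid dimensions, cell size, and coordinate convention are assumptions:

```python
import numpy as np

CELL = 0.05                              # 5 cm cells (assumed resolution)
grid = np.zeros((200, 200), dtype=bool)  # 10 m x 10 m floor area

def mark_cleaned(x: float, y: float) -> None:
    """Record one SLAM-tracked tool position (metres, floor plane)."""
    i, j = int(y / CELL), int(x / CELL)
    if 0 <= i < grid.shape[0] and 0 <= j < grid.shape[1]:
        grid[i, j] = True

def coverage_ratio() -> float:
    """Fraction of the floor area marked as cleaned."""
    return float(grid.mean())
```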
When printed material is imaged by a monocular digital camera, geometric distortions caused by folds result in an appearance different from the content of the original printed material. This study aims to reproduce the original appearance by correcting the obtained image. In the proposed method, the image of the printed material is divided into local regions, and the geometric distortion is corrected by deforming each region. In addition, brightness changes caused by shading are also corrected.
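One way the per-region deformation could be realized is a perspective warp of each local quadrilateral to an undistorted rectangle; the corner detection and the mesh of regions are assumed given:

```python
import cv2
import numpy as np

def rectify_region(image, src_quad, width, height):
    """Warp one local quadrilateral of the captured page to a rectangle.

    src_quad: four corner points (top-left, top-right, bottom-right,
    bottom-left) of the distorted region in the captured image.
    Repeating this over all local regions approximates the fold correction.
    """
    dst_rect = np.float32([[0, 0], [width, 0], [width, height], [0, height]])
    H = cv2.getPerspectiveTransform(np.float32(src_quad), dst_rect)
    return cv2.warpPerspective(image, H, (width, height))
```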
In recent years, many SLAM (simultaneous localization and mapping) systems have appeared that show impressive dense scene reconstruction. However, typical SLAM systems build 3D scenes at the point level without any semantic information. Many computer vision applications require a high level of scene understanding, for which point-based SLAM is insufficient. This paper studies fusing 3D object recognition into a SLAM system, using a hand-held RGB-D camera and RTAB-Map to reconstruct a dense point cloud of a 3D indoor scene. We then use supervoxel-based point cloud segmentation to over-segment the scene. A 3D object classification model trained with PointNet is added to merge the segmentation process and object recognition. Our experiment in an indoor environment shows the effectiveness of this system.
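A sketch of the recognition step applied to the over-segmentation output; `pointnet` is assumed to be a trained classification model taking a (batch, points, 3) tensor, and the segments are assumed to come from the supervoxel stage:

```python
import numpy as np
import torch

def classify_segments(segments, pointnet, n_points=1024):
    """Assign an object label to each supervoxel segment.

    segments: list of (N_i, 3) NumPy arrays from the over-segmentation.
    Each segment is resampled to a fixed size and centered before
    classification, a common PointNet preprocessing convention.
    """
    labels = []
    for pts in segments:
        idx = np.random.randint(0, len(pts), n_points)  # resample to fixed size
        cloud = torch.as_tensor(pts[idx], dtype=torch.float32)
        cloud = cloud - cloud.mean(dim=0)               # center the segment
        with torch.no_grad():
            logits = pointnet(cloud.unsqueeze(0))       # (1, n_classes)
        labels.append(int(logits.argmax(dim=1)))
    return labels
```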