High-accuracy localization and user position tracking are critical to improving the quality of augmented reality environments. The biggest challenge facing developers is localizing the user based on visible surroundings. Current solutions rely on the Global Positioning System (GPS) for tracking and orientation. However, GPS receivers have an accuracy of about 10 to 30 meters, which is insufficient for augmented reality, where precision must be measured in millimeters or smaller. This paper describes the development and demonstration of a head-worn augmented reality (AR) based vision-aid indoor navigation system, which localizes the user without relying on a GPS signal. Commercially available augmented reality headsets allow individuals to capture their field of vision in real time using the front-facing camera. Utilizing captured image features as navigation-related landmarks allows the user to be localized in the absence of a GPS signal. The proposed method involves three steps: collecting and generating detailed front-scene camera data for landmark recognition; detecting and locating the individual’s current position using feature matching; and displaying arrows to indicate areas that require more data collection, if needed. Computer simulations indicate that the proposed augmented reality-based vision-aid indoor navigation system can provide precise simultaneous localization and mapping in a GPS-denied environment.
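The feature-matching step in the abstract above can be illustrated with a minimal sketch. It assumes each stored landmark is a descriptor vector tagged with a location label, matches query descriptors by nearest neighbour with a ratio test, and picks the location with the most accepted matches. The function names, the toy 2-D descriptors, and the 0.75 ratio are illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter

def match_landmarks(query, landmarks, ratio=0.75):
    """Return the label of the nearest landmark for each query descriptor
    that passes the ratio test (nearest clearly closer than second nearest)."""
    labels = []
    for q in query:
        # Rank all stored landmarks by Euclidean distance to this descriptor.
        ranked = sorted((math.dist(q, desc), label) for desc, label in landmarks)
        if len(ranked) >= 2 and ranked[0][0] < ratio * ranked[1][0]:
            labels.append(ranked[0][1])
    return labels

def localize(query, landmarks, ratio=0.75):
    """Vote over accepted matches; return the winning location or None."""
    votes = Counter(match_landmarks(query, landmarks, ratio))
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical landmark database: (descriptor, location label) pairs.
landmarks = [
    ((0.0, 0.0), "lobby"),
    ((10.0, 0.0), "lobby"),
    ((0.0, 10.0), "lab"),
    ((10.0, 10.0), "lab"),
]

# Two descriptors close to lobby landmarks, one ambiguous descriptor.
query = [(0.1, 0.2), (9.8, 0.1), (5.0, 5.0)]
location = localize(query, landmarks)
```

The ambiguous descriptor at (5.0, 5.0) is equidistant from every landmark, so the ratio test rejects it and it casts no vote. A real system would use high-dimensional image descriptors and would likely add geometric verification before trusting the result.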
Augmented Reality (AR) can seamlessly combine a real scene viewed by a user and a virtual component generated by a computer. This work introduces a system architecture integrating augmented reality technology with state-of-the-art computer vision techniques such as image semantic segmentation and style colorization. The proposed production system, ARNature, is able to superimpose a virtual scene, audio, and other enhancements in real time over a real-world environment to enhance the tourism experience. Tourists often have limited money and time to experience a tourist site during different seasons or weather conditions. With ARNature, visitors are able to go on an augmented reality journey using an AR device, such as a HoloLens, tablet, or cellphone, and interact with real objects in a natural scene. Different enhancements of a tourism site are digitally overlaid onto visitors’ direct field of vision in real time. In addition, a voice module can be used to play music and provide additional information. Related algorithms, system design, and simulation results for a prototype ARNature system are presented. Furthermore, a no-reference image quality measure, the Naturalness Image Quality Evaluator (NIQE), was used to evaluate the immersiveness and naturalness of ARNature. The results demonstrate that ARNature can enhance the tourist experience in a truly immersive manner.
Facial emotion recognition technology finds numerous real-life applications in areas such as virtual learning, cognitive psychology analysis, avatar animation, neuromarketing, human-machine interaction, and entertainment systems. Most state-of-the-art techniques focus primarily on visible-spectrum information for emotion recognition. This is difficult because the emotions of individuals vary significantly. Moreover, visible images are susceptible to variation in illumination. Low lighting, variation in poses, aging, and disguise have a substantial impact on the appearance of images and textural information. Even though great advances have been made in the field, facial emotion recognition using existing techniques is often not satisfactory when compared to human performance. To overcome these shortcomings, thermal images are preferred to visible images. Thermal images a) are less sensitive to lighting conditions, b) have consistent thermal signatures, and c) have a temperature distribution formed by the face vein branches. This paper proposes a robust emotion recognition system using thermal images, TERNet. To accomplish this, a customized convolutional neural network (CNN) is employed, which possesses excellent generalization capabilities. The architecture adopts features obtained via transfer learning from the VGG-Face CNN model, which is further fine-tuned with the thermal expression face data from the Tufts Face Database. Computer simulations, compared against state-of-the-art models, demonstrate an accuracy of 96.2%.
Translating environmental knowledge from a bird’s-eye-view perspective, such as a map, to a first-person egocentric perspective is notoriously challenging, but critical for effective navigation and environment learning. Pointing error, or the angular difference between the perceived location and the actual location, is an important measure for estimating how well the environment is learned. Traditionally, errors in pointing estimates were computed by manually noting the angular difference. With the advent of commercial low-cost mobile eye trackers, it has become possible to couple the advantages of automated image-processing-based techniques with these spatial learning studies. This paper presents a vision-based analytic approach for calculating pointing error measures in real-world navigation studies relying only on data from mobile eye tracking devices. The proposed method involves three steps: panorama generation, probe image localization using feature matching, and navigation pointing error estimation. This first-of-its-kind application has game-changing potential in the field of cognitive research using eye-tracking technology for understanding human navigation and environment learning, and it has been successfully adopted by cognitive psychologists.
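The pointing error defined above, the angular difference between the perceived and actual location, can be made concrete with a short sketch. It assumes bearings are expressed in degrees and, as a further assumption not stated in the abstract, that the panorama covers a full 360 degrees horizontally so that a pixel column maps linearly to a bearing. The function names are hypothetical.

```python
def column_to_bearing(x, width):
    """Map a pixel column in a 360-degree panorama to a bearing in degrees.
    Assumes column 0 is bearing 0 and the panorama wraps at `width`."""
    return (x / width) * 360.0

def pointing_error(pointed_deg, actual_deg):
    """Signed angular difference in degrees, wrapped to [-180, 180).
    A negative value means the pointed bearing is counterclockwise of the target."""
    return (pointed_deg - actual_deg + 180.0) % 360.0 - 180.0

# A participant points toward column 1024 of a 4096-pixel-wide panorama,
# while the true target lies at a bearing of 100 degrees.
pointed = column_to_bearing(1024, 4096)
error = pointing_error(pointed, 100.0)
```

Wrapping matters near the 0/360 boundary: a pointed bearing of 350 degrees against a true bearing of 10 degrees is an error of 20 degrees in magnitude, not 340.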
Face recognition technologies have been in high demand in the past few decades due to the increase in human-computer interactions. Face recognition is also one of the essential components in interpreting human emotions, intentions, and facial expressions for smart environments. This non-intrusive biometric authentication system relies on identifying unique facial features and pairing similar structures for identification and recognition. Application areas of facial recognition systems include homeland and border security, identification for law enforcement, access control to secure networks, authentication for online banking, and video surveillance. While it is easy for humans to recognize faces under varying illumination conditions, it is still a challenging task in computer vision. Non-uniform illumination and uncontrolled operating environments can impair the performance of visual-spectrum-based recognition systems. To address these difficulties, a novel Anisotropic Gradient Facial Recognition (AGFR) system that is capable of autonomous thermal infrared to visible face recognition is proposed. The main contributions of this paper include a framework for a thermal/fused-thermal-visible to visible face recognition system and a novel human-visual-system-inspired thermal-visible image fusion technique. Extensive computer simulations using the CARL, IRIS, AT&T, Yale, and Yale-B databases demonstrate the efficiency, accuracy, and robustness of the AGFR system.
Eye tracking technology allows researchers to monitor the position of the eye and infer one’s gaze direction, which is used to understand the nature of human attention within psychology, cognitive science, marketing, and artificial intelligence. Commercially available head-mounted eye trackers allow researchers to track pupil movements (saccades and fixations) using an infrared camera and to capture the field of vision with a front-facing scene camera. The wearable eye tracker has opened a new way to conduct research in unconstrained environment settings; however, the recorded scene video typically has non-uniform illumination, low-quality image frames, and moving scene objects. One of the most important tasks in analyzing the recorded scene video data is finding the boundary between different objects in a single frame. This paper presents a multi-level fixation-oriented object segmentation method (MFoOS) to solve the above challenges in segmenting the scene objects in video data collected by the eye tracker in order to support cognition research. MFoOS is task-driven and offers advantages in position invariance, illumination tolerance, and noise tolerance. The proposed method is tested using real-world case studies designed by our team of psychologists focused on understanding visual attention in human problem solving. Extensive computer simulations demonstrate the method’s accuracy and robustness for fixation-oriented object segmentation. Moreover, a deep-learning image semantic segmentation approach that uses MFoOS results as label data was explored to demonstrate the possibility of online deployment of eye tracker fixation-oriented object segmentation.
Autonomous facial recognition systems are widely used in real-life applications, such as homeland border security, law enforcement identification and authentication, and video-based surveillance analysis. Issues like low image quality, non-uniform illumination, and variations in poses and facial expressions can impair the performance of recognition systems. To address the non-uniform illumination challenge, we present a novel, robust autonomous facial recognition system based on the so-called logarithmical image visualization technique, which is inspired by the human visual system. In this paper, the proposed method, for the first time, utilizes the logarithmical image visualization technique coupled with the local binary pattern to perform discriminative feature extraction for a facial recognition system. The Yale database, the Yale-B database, and the AT&T database are used for computer simulation accuracy and efficiency testing. Extensive computer simulations demonstrate the method’s efficiency, accuracy, and robustness to illumination variation for facial recognition.
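The pairing of a logarithmic transform with the local binary pattern (LBP) can be illustrated with a minimal sketch. The abstract does not specify the logarithmical image visualization technique, so a plain log(1 + I) mapping is used here as a stand-in assumption; the LBP is the basic 8-neighbour variant, and all function names are hypothetical.

```python
import math

def log_transform(image):
    """Stand-in for a logarithmic visualization step: map I to log(1 + I)."""
    return [[math.log1p(p) for p in row] for row in image]

def lbp_codes(image):
    """Basic 8-neighbour LBP codes for the interior pixels of a 2-D image.
    A neighbour sets its bit when it is >= the centre pixel."""
    # Clockwise neighbour order starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = []
    for r in range(1, len(image) - 1):
        row = []
        for c in range(1, len(image[0]) - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes

# Toy 3x3 grayscale patch and a copy captured under doubled illumination.
image = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]
brighter = [[2 * p for p in row] for row in image]

codes_plain = lbp_codes(image)
codes_log = lbp_codes(log_transform(image))
codes_bright = lbp_codes(log_transform(brighter))
```

Because LBP only compares each neighbour with the centre pixel, its codes are unchanged by any monotonic intensity change, including the log mapping and a global brightness scaling. In this sketch all three code maps are therefore identical, which is the illumination-invariance property the abstract relies on.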