This PDF file contains the front matter associated with SPIE Proceedings Volume 9408, including the Title Page, Copyright information, Table of Contents, Authors, Introduction (if any), and Conference Committee listing.
The pipeline industry has millions of miles of pipe buried across the length and breadth of the country. Because the corridors through which pipelines run must not be used for other activities, they need to be monitored to determine whether the right-of-way (RoW) of a pipeline is encroached upon at any point in time. Rapid advances in sensor technology have enabled the use of high-end video acquisition systems to monitor the RoW of pipelines. The images captured by aerial data acquisition systems are affected by a host of factors, including light sources, camera characteristics, geometric positions, and environmental conditions. We present a multistage framework for the analysis of aerial imagery for automatic detection and identification of machinery threats along the pipeline RoW, one capable of accounting for the constraints that come with aerial imagery, such as low resolution, low frame rate, large variations in illumination, and motion blur. The proposed framework consists of three parts. In the first part, a method is developed to eliminate regions of the imagery that are not considered a threat to the pipeline; it feeds monogenic phase features into a cascade of pre-trained classifiers to eliminate unwanted regions. The second part is a part-based object detection model for searching for specific targets that are considered threat objects. The third part assesses the severity of the threats to pipelines by computing the geolocation and temperature of the threat objects. The proposed scheme is tested on real-world datasets captured along the pipeline RoW.
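The cascade idea above can be sketched in a few lines: a region survives only if every pre-trained stage accepts it, so most non-threat regions are rejected cheaply by the early stages. The stage functions below are hypothetical stand-ins for classifiers trained on monogenic phase features, not the authors' actual models.

```python
# Minimal sketch of cascade-based region elimination (assumed interface,
# not the paper's implementation): each stage is a callable that accepts
# or rejects a candidate region.

def cascade_filter(regions, stages):
    """Return only the regions that pass every stage of the cascade."""
    survivors = []
    for region in regions:
        # Short-circuits: the first stage that rejects ends the evaluation.
        if all(stage(region) for stage in stages):
            survivors.append(region)
    return survivors

# Toy stages: accept a region only if its score exceeds a per-stage threshold.
stages = [lambda r, t=t: r["score"] > t for t in (0.2, 0.5, 0.8)]
regions = [{"id": 1, "score": 0.9},
           {"id": 2, "score": 0.6},
           {"id": 3, "score": 0.1}]
print(cascade_filter(regions, stages))  # only region 1 passes all stages
```

Because later stages run only on survivors of earlier ones, the expensive classifiers are applied to a small fraction of the imagery.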
Faces often appear very small and at oblique orientations in surveillance videos because of the need for wide fields of view and the typically large distance between the cameras and the scene. Both low resolution and side-view faces make tasks such as face recognition difficult. As a result, face hallucination or super-resolution techniques for face images are generally needed, and they have become a thriving research field. However, most existing methods assume face images have been well aligned into some canonical form (i.e., frontal and symmetric). Therefore, face alignment, especially for low-resolution face images, is a key first step to the success of many face applications. In this paper, we propose an automatic alignment approach for face images at different resolutions, which consists of two fundamental steps: 1) find the locations of facial landmarks or feature points (e.g., eyes and nose), even for very low-resolution faces; 2) estimate and correct head poses based on the landmark locations and a 3D reference face model. The effectiveness of this method is shown by the aligned face images and the improved face recognition scores on released data sets.
In the past decade, the increasing popularity of imaging devices, especially smartphones, has led to a great increase in the amount of visual data. This rapidly growing, large-scale data poses challenges to storage and computational resources and makes many computer vision and pattern recognition tasks prohibitively expensive. Dimension reduction techniques explore hidden structures of the original high-dimensional data and learn new low-dimensional representations to alleviate these challenges. Popular dimension reduction techniques, such as PCA and NMF, perform an efficient linear mapping to a low-dimensional space, while nonlinear techniques overcome the limitation of linearity at high computational cost (e.g., computing pairwise distances to find geodesic distances). In this paper, a piecewise linear dimension reduction technique with global consistency and smoothness constraints is proposed to overcome the restriction of linearity at relatively low cost. Extensive experimental results show that the proposed method consistently and significantly outperforms the linear methods in clustering scenarios.
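The piecewise linear idea can be made concrete with a simplified sketch: partition the data (here the partition is assumed to be given) and fit a separate linear PCA map per piece. The global-consistency and smoothness constraints of the paper are omitted; this shows only the "locally linear" core.

```python
import numpy as np

# Hedged sketch, not the paper's algorithm: per-cluster PCA as a
# piecewise linear dimension reduction.

def piecewise_pca(X, labels, k_out):
    """Project each cluster of X onto its own top-k_out principal directions."""
    Y = np.zeros((X.shape[0], k_out))
    for c in np.unique(labels):
        idx = labels == c
        Xc = X[idx] - X[idx].mean(axis=0)   # center the cluster
        # Right singular vectors give the principal directions.
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        Y[idx] = Xc @ Vt[:k_out].T
    return Y

rng = np.random.default_rng(0)
X = np.vstack([rng.random((10, 5)), rng.random((10, 5)) + 5.0])
labels = np.array([0] * 10 + [1] * 10)
Y = piecewise_pca(X, labels, 2)
```

Each piece costs only one small SVD, which is the sense in which piecewise linearity stays cheap compared with fully nonlinear methods.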
Segmentation is a fundamental step in quantifying characteristics such as the volume, shape, and orientation of cells and/or tissue. However, quantification of these characteristics still poses a challenge due to the unique properties of microscopy volumes. This paper proposes a 2D segmentation method that combines adaptive and global thresholding, potentials, z-direction refinement, branch pruning, end-point matching, and boundary fitting to delineate tubular objects in microscopy volumes. Experimental results demonstrate that the proposed method achieves better performance than an active-contour-based scheme.
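A minimal illustration of combining a global with an adaptive (local mean) threshold is given below; a pixel is kept only if it passes both tests. This shows the thresholding step only, under assumed parameters; the paper's potentials, refinement, pruning, and boundary fitting are not reproduced.

```python
import numpy as np

# Sketch: intersect a global threshold with a local-mean adaptive threshold.
# Window size and offset are illustrative assumptions.

def combined_threshold(img, win=3, offset=0.0):
    global_mask = img > img.mean()
    # Local mean via an explicit box filter (small windows only).
    pad = win // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    local = np.zeros(img.shape, dtype=float)
    for dy in range(-pad, pad + 1):
        for dx in range(-pad, pad + 1):
            local += padded[pad + dy: pad + dy + img.shape[0],
                            pad + dx: pad + dx + img.shape[1]]
    local /= win * win
    adaptive_mask = img > local + offset
    return global_mask & adaptive_mask

img = np.array([[10, 10, 10],
                [10, 200, 10],
                [10, 10, 10]])
mask = combined_threshold(img)  # only the bright center pixel survives
```

Requiring both criteria suppresses pixels that are bright only relative to the whole image or only relative to their neighborhood, a common way to stabilize thresholding in noisy microscopy data.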
Automatic face recognition in real-life environments is challenged by various issues such as object motion, lighting conditions, pose, and expression. In this paper, we present a system based on a refined Enhanced Local Binary Pattern (ELBP) feature set and a Support Vector Machine (SVM) classifier to perform face recognition in a real-life environment. Instead of counting the number of 1's as in standard ELBP, we keep the 8-bit code of the thresholded data per the ELBP rule, then binarize the image with a predefined threshold value and remove the small connections in the binarized image. The proposed system is currently trained with several people's face images obtained from video sequences captured by a surveillance camera. One test set contains images, disjoint from the training data, of the trained people's faces to measure accuracy, and a second test set contains images of non-trained people's faces to measure the false positive rate. The recognition rate among 570 images of 9 trained faces is around 94%, and the false positive rate over 2600 images of 34 non-trained faces is around 1%. Research is progressing on the recognition of partially occluded faces as well; an appropriate weighting strategy will be applied to different parts of the face area to achieve better performance.
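The 8-bit code referred to above is the standard local binary pattern computation: each of the eight neighbors of the center pixel contributes one bit (1 if the neighbor is at least the center value). A minimal version for a single 3x3 patch, with an assumed clockwise bit ordering:

```python
# Sketch of the per-pixel 8-bit LBP code (bit ordering is an assumption;
# implementations differ). The refinement described above keeps this code
# directly rather than counting its 1 bits.

def lbp_code(patch):
    """patch: 3x3 nested list; returns the 8-bit neighbor code."""
    c = patch[1][1]
    # Neighbors in clockwise order starting at top-left.
    neighbors = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                 patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for bit, n in enumerate(neighbors):
        if n >= c:               # threshold against the center pixel
            code |= 1 << bit
    return code

patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
print(lbp_code(patch))  # -> 241
```

Applying this at every pixel yields the code image that is then binarized and cleaned of small connections, as described above.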
A color theme (palette) is a collection of color swatches for representing or describing the colors in a visual design or an image. Color palettes have broad applications, serving as a means of automatic or semi-automatic design of visual media, as measures for quantifying the aesthetics of visual design, and as metrics in image retrieval, image enhancement, and color semantics. In this paper, we propose an autonomous mechanism for extracting color palettes from an image. Our method is simple and fast, and it is built on the notion of visual saliency. Using visual saliency, we extract the fine colors appearing in the foreground along with the various colors in the background regions of an image. Our method allows different numbers of colors to be defined in the palette and presents the proportion of each color according to its visual conspicuity in a given image. This flexibility supports an interactive color palette, which may facilitate the designer’s color design task. As an application, we show how our extracted color palettes can be utilized as a color similarity metric to enhance current color-semantics-based image retrieval techniques.
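One simple way to realize a saliency-weighted palette, offered purely as an illustrative sketch rather than the paper's method, is to quantize colors into coarse bins, accumulate each bin's saliency, and report the top-k bins with proportions given by their share of total saliency.

```python
# Illustrative saliency-weighted palette extraction (assumed scheme:
# uniform RGB quantization; the bin count and k are arbitrary choices).

def saliency_palette(pixels, saliency, k=3, bins=4):
    """pixels: list of (r, g, b); saliency: per-pixel weights in [0, 1]."""
    step = 256 // bins
    weights = {}
    for (r, g, b), s in zip(pixels, saliency):
        key = (r // step * step, g // step * step, b // step * step)
        weights[key] = weights.get(key, 0.0) + s
    total = sum(weights.values())
    top = sorted(weights.items(), key=lambda kv: -kv[1])[:k]
    return [(color, w / total) for color, w in top]

palette = saliency_palette(
    [(255, 0, 0), (255, 0, 0), (0, 255, 0)],  # two red pixels, one green
    [1.0, 1.0, 0.5],                           # reds are more salient
    k=2)
```

Varying `k` gives palettes of different sizes, and the returned proportions reflect conspicuity rather than raw pixel counts, matching the flexibility described above.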
We adapt a classic online clustering algorithm, Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH), to incrementally cluster large datasets of features commonly used in multimedia and computer vision. We call the adapted version modified-BIRCH (m-BIRCH). The algorithm uses only a fraction of the dataset's memory footprint to perform clustering and updates its clustering decisions as new data arrives. The modifications made in m-BIRCH enable data-driven parameter selection and effectively handle varying-density regions in the feature space. Data-driven parameter selection automatically controls the coarseness of the data summarization, and handling varying-density regions is necessary for the summarization to represent the different density regions of the data well. We use m-BIRCH to cluster 840K color SIFT descriptors and 60K outlier-corrupted grayscale patches, as well as datasets consisting of challenging non-convex clustering patterns. Our implementation of the algorithm provides a useful clustering tool and is made publicly available.
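The memory economy of BIRCH comes from its clustering feature, CF = (N, LS, SS): the point count, linear sum, and sum of squared norms of a cluster. A CF can absorb a new point in O(d) time and still yields the cluster centroid and radius, which is what lets the algorithm keep only a summary of the data in memory. A minimal version:

```python
import math

# Sketch of BIRCH's core summary structure (standard definition;
# the m-BIRCH modifications described above are not reproduced here).

class CF:
    """Clustering feature (N, LS, SS) for one cluster of d-dim points."""
    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim   # linear sum of points
        self.ss = 0.0           # sum of squared norms

    def add(self, x):
        """Absorb one point; O(d), no points are stored."""
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
        self.ss += sum(v * v for v in x)

    def centroid(self):
        return [v / self.n for v in self.ls]

    def radius(self):
        # RMS distance of the summarized points from the centroid.
        c2 = sum(v * v for v in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))
```

Because CFs are additive, two subclusters can be merged by summing their components, which underlies the CF-tree insert and rebuild operations.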
Lecture videos are common, and their number is increasing rapidly. Consequently, indexing such videos automatically and efficiently is an important task. Video segmentation is a crucial step of video indexing that directly affects indexing quality. We are developing a system for automated video indexing, and in this paper we discuss our approach to video segmentation and the classification of video segments. The novel contributions of this paper are twofold. First, we develop a dynamic Gabor filter and use it to extract features for video frame classification. Second, we propose a recursive video segmentation algorithm that clusters video frames into video segments. We then use these to classify and index the video segments. The proposed approach achieves a higher true positive rate (TPR) of 89.5% and a lower false discovery rate (FDR) of 11.2% than a commercial system (TPR = 81.8%, FDR = 39.4%), demonstrating that performance is significantly improved by the enhanced features.
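For readers unfamiliar with Gabor features, a standard (static) Gabor kernel is a Gaussian envelope modulating a sinusoidal carrier; the parameters below (size, sigma, orientation, wavelength) are the knobs a "dynamic" variant would adapt, though the paper's adaptation rule is not reproduced here.

```python
import numpy as np

# Minimal Gabor kernel generator (standard formulation; all parameter
# values used below are illustrative assumptions).

def gabor_kernel(size, sigma, theta, lam, psi=0.0, gamma=1.0):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates to orientation theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * xr / lam + psi)
    return envelope * carrier

kernel = gabor_kernel(size=7, sigma=2.0, theta=0.0, lam=4.0)
```

Convolving a frame with a bank of such kernels at several orientations and wavelengths yields the texture responses typically used as classification features.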
Video surveillance systems are widely deployed for public safety. Real-time monitoring and alerting are among the key requirements for building an intelligent video surveillance system. Real-life settings introduce many challenges that can impact the performance of real-time video analytics, so video analytics should be resilient to adverse and changing scenarios. In this paper we present various approaches to characterize the uncertainty of a classifier and to incorporate crowdsourcing when the method is uncertain about making a particular decision. Incorporating crowdsourcing when a real-time video analytics method is uncertain about a decision is known as online active learning from crowds. We evaluate our proposed approach by testing a method we developed previously for crowd flow estimation: we present three different approaches to characterize the uncertainty of the classifier in the automatic crowd flow estimation method and test them by introducing video quality degradations. Criteria to aggregate crowdsourcing results are also proposed and evaluated. An experimental evaluation is conducted using a publicly available dataset.
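Three standard ways to characterize classifier uncertainty from a predicted probability vector are least confidence, margin, and entropy. These are textbook active-learning measures, shown here as a sketch of the kind of criterion that could trigger a crowdsourcing query; they are not claimed to be the paper's three specific approaches.

```python
import math

# Standard uncertainty measures over a probability vector p
# (each p entry is the predicted probability of one class).

def least_confidence(p):
    return 1.0 - max(p)            # high when the top class is weak

def margin(p):
    a, b = sorted(p, reverse=True)[:2]
    return a - b                   # small margin = uncertain

def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0)
```

A system would compare one of these scores against a threshold and fall back to the crowd only for the frames that exceed it, keeping the human workload bounded.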
With the growing ubiquity of mobile devices, advanced applications are relying on computer vision techniques to provide novel experiences for users. Currently, few tracking approaches take into consideration the resource constraints of mobile devices. Designing efficient tracking algorithms and optimizing performance for mobile devices can result in better, more efficient tracking for applications such as augmented reality. In this paper, we use binary descriptors, including Fast Retina Keypoint (FREAK), Oriented FAST and Rotated BRIEF (ORB), Binary Robust Independent Elementary Features (BRIEF), and Binary Robust Invariant Scalable Keypoints (BRISK), to obtain real-time tracking performance on mobile devices. We consider both Google’s Android and Apple’s iOS operating systems in implementing our tracking approach. The Android implementation uses Android’s Native Development Kit (NDK), which gives the performance benefits of native code as well as access to legacy libraries. The iOS implementation was created using both the native Objective-C and C++ programming languages. We also introduce simplified versions of the BRIEF and BRISK descriptors that improve processing speed without compromising tracking accuracy.
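What makes these binary descriptors fast on mobile hardware is that matching reduces to the Hamming distance, an XOR followed by a popcount on bit strings. A minimal nearest-neighbor matcher over descriptors held as Python integers:

```python
# Hamming-distance matching of binary descriptors (the 8-bit toy
# descriptors below are illustrative; real ones are 256 or 512 bits).

def hamming(a, b):
    """Hamming distance between two descriptors given as ints."""
    return bin(a ^ b).count("1")

def match(query, database):
    """Index of the nearest database descriptor to the query."""
    return min(range(len(database)), key=lambda i: hamming(query, database[i]))

db = [0b10110010, 0b01101100, 0b11110000]
print(match(0b11110001, db))  # descriptor 2 differs in only one bit
```

In native code the same operation maps to a handful of XOR and popcount instructions per descriptor, which is why BRIEF-family descriptors suit real-time tracking far better than floating-point descriptors like SIFT.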
Computer vision researchers have recently developed automated methods for rating the aesthetic appeal of a photograph. Machine learning techniques, applied to large databases of photos, mimic the mean ratings of online viewers with reasonably good accuracy. However, owing to the many factors underlying aesthetics, it is likely that such techniques for rating photos do not generalize well beyond the data on which they are trained. This paper reviews recent attempts to compare human ratings, obtained in a controlled setting, to ratings provided by machine learning techniques. We review methods to obtain meaningful ratings both from selected groups of judges and from crowdsourcing. We find that state-of-the-art techniques for automatic aesthetic evaluation are only weakly correlated with human ratings, which underscores the importance of obtaining the data used for training automated systems under carefully controlled conditions.
Proc. SPIE 9408, Service-oriented workflow to efficiently and automatically fulfill products in a highly individualized web and mobile environment, 94080D (6 March 2015); https://doi.org/10.1117/12.2084942
Service-Oriented Architecture (SOA) is widely used in building flexible and scalable web sites and services. In most of the web and mobile photo book and gifting business space, the products ordered are highly variable: there is no standard template into which one can substitute text or images, as in commercial variable data printing. In this paper, the author describes an SOA workflow for a multi-site, multi-product-line fulfillment system that addresses three major challenges: utilization of hardware and equipment, a high degree of automation with fault recovery, and scalability and flexibility under order volume fluctuation.
The number of network cameras has been growing rapidly in recent years. Thousands of public network cameras provide a tremendous amount of visual information about the environment. There is a need to analyze this valuable information for a better understanding of the world around us. This paper presents an interactive web-based system that enables users to execute image analysis and computer vision techniques on a large scale to analyze data from more than 65,000 worldwide cameras. The paper focuses on how to use both the system's website and its Application Programming Interface (API). Given a computer program that analyzes a single frame, the user needs to make only slight changes to the existing program and choose the cameras to analyze. The system handles the heterogeneity of the geographically distributed cameras (e.g., different brands and resolutions) and allocates and manages Amazon EC2 and Windows Azure cloud resources to meet the analysis requirements.
With the recent introduction of mobile devices and developments in client-side application technologies, there has been an explosion of the parameter matrix for color management: hardware platform (computer vs. mobile), operating system (Windows, Mac OS, Android, iOS), client application (Flash, IE, Firefox, Safari, Chrome), and file format (JPEG, TIFF, PDF of various versions). In a modern digital print shop, multiple print solutions are used: digital presses, wide-format inkjet, and dye-sublimation inkjet produce a wide variety of customizable products, from photo books and personalized greeting cards to canvas prints, mobile phone cases, and more. In this paper, we outline a strategy that spans client-side applications, print file construction, and color setup on the printer to manage consistency and achieve what-you-see-is-what-you-get for customers who use a wide variety of technologies to view and order products.
This paper describes how videos can be incorporated into printed photo books and greeting cards. We will show that, surprisingly or not, pictures from videos are used much like classical images to tell compelling stories.
Videos can be taken with nearly every camera: digital point-and-shoot cameras and DSLRs as well as smartphones, and increasingly with so-called action cameras mounted on sports equipment. The embedding of videos, by generating QR codes and extracting relevant pictures from the video stream in software, was the subject of last year’s paper. This year we present first data about what content is displayed and how users represent their videos in printed products, e.g., CEWE PHOTOBOOKS and greeting cards. We also report the share of the different video formats used.
Fueled by the development of advanced driver assistance systems (ADAS), autonomous vehicles, and the proliferation of cameras and sensors, automotive is becoming a rich new domain for innovations in imaging technology. This paper presents an overview of ADAS, the important imaging and computer vision problems to solve for automotive applications, and examples of how some of these problems are solved, through which we highlight the challenges and opportunities in the automotive imaging space.
Planning a trip requires considering many unpredictable factors along the route, such as traffic, weather, and accidents. People are interested in viewing the places they plan to visit and the routes they plan to take. This paper presents a system with an Android mobile application that allows users to: (i) watch the live feeds (videos or snapshots) from more than 65,000 geotagged public cameras around the world, selecting cameras on an interactive world map; (ii) search for and watch the live feeds from the cameras along the route between a starting point and a destination. The system consists of a server, which maintains a database with the cameras' information, and a mobile application, which shows the camera map and communicates with the cameras. To evaluate the system, we compare it with existing systems in terms of the total number of cameras, the cameras' coverage, and the number of cameras on various routes. We also discuss the response time of loading the camera map, finding the cameras on a route, and communicating with the cameras.
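The "cameras along a route" query can be sketched simply: keep every camera whose great-circle distance to some sampled route point is below a threshold. This is a hedged illustration of the idea, not the system's actual data model or search; the 5 km threshold is an arbitrary assumption.

```python
import math

# Great-circle (haversine) distance plus a naive route filter.

def haversine_km(lat1, lon1, lat2, lon2):
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def cameras_on_route(cameras, route, max_km=5.0):
    """cameras, route: lists of (lat, lon) pairs."""
    return [c for c in cameras
            if any(haversine_km(c[0], c[1], p[0], p[1]) <= max_km for p in route)]

cameras = [(40.0, -86.0), (50.0, 0.0)]
route = [(40.0, -86.01), (40.1, -86.0)]
nearby = cameras_on_route(cameras, route)  # only the first camera is close
```

A production system would replace the linear scan with a spatial index, but the distance test itself is the same.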
Audio is commonly digitized for convenient storage and transmission of music and songs in today's digital age. Analyzing digital audio for an insightful look at a specific musical characteristic, however, can be quite challenging for various types of applications. Many existing musical analysis techniques can examine a particular piece of audio data. For example, the frequency of digital sound can easily be read and identified at a specific section of an audio file; based on this information, we can determine the musical note being played at that instant. But what if you want to see a list of all the notes played in a song? While most existing methods provide information about a single piece of the audio data at a time, few of them can analyze the audio file on a larger scale. The research conducted in this work considers how to further utilize the examination of audio data by retaining more information from the original audio file. In practice, we develop a novel musical analysis system, Musicians Aid, for the representation and examination of audio data. Musicians Aid solves the problem above by storing and analyzing the audio information as it reads it rather than discarding it. The system can provide professional musicians with an insightful look at the music they created and advance their understanding of their work. Amateur musicians can also benefit from using it solely to obtain feedback about a song they were attempting to play: by comparing the system's interpretation of traditional sheet music with their own playing, musicians can verify that what they played was correct. More specifically, the system can show them exactly where they went wrong and how to correct their mistakes. In addition, the application could be extended over the Internet to allow users to play music with one another and then review the audio data they produced. This would be particularly useful for teaching music lessons on the web. The developed system is evaluated with songs played on guitar, keyboard, violin, and other popular musical instruments (primarily electronic or stringed instruments). The Musicians Aid system is successful at both representing and analyzing audio data, and it is also powerful in assisting individuals interested in learning and understanding music.
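The per-instant step mentioned above, mapping a detected frequency to a musical note, follows directly from equal temperament: each semitone is a factor of 2^(1/12), with A4 pinned at 440 Hz. A minimal converter (standard music theory, not Musicians Aid's code):

```python
import math

# Map a frequency to the nearest equal-tempered note name (A4 = 440 Hz).

NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def note_name(freq_hz):
    # Semitones above/below A4, rounded to the nearest note.
    n = round(12 * math.log2(freq_hz / 440.0))
    midi = 69 + n                      # MIDI number of the nearest note
    return NAMES[midi % 12] + str(midi // 12 - 1)

print(note_name(440.0))   # -> A4
print(note_name(261.63))  # -> C4 (middle C)
```

Running this conversion on every detected frequency over time, instead of on one instant, is exactly the "larger scale" analysis the abstract argues for: the result is the list of notes played in the song.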
In this paper, we introduce Delectable, a social wine app. Delectable provides a social platform for users to capture, rate, comment on, and research wine using their mobile devices. We implement a system that automatically recognizes wine when users take a picture of the wine label. We address some of the difficulties of label recognition, such as lighting conditions, viewing angles, and the similarity of labels from the same wine producer. As a recognition system that demands high accuracy, our system integrates both machine recognition and human crowdsourced recognition. We give an overview of the recognition system and illustrate the user experience.
Digital cameras are gradually replacing traditional flat-bed scanners as the main means of acquiring text information, owing to their usability, low cost, and high resolution, and a large amount of research has been done on camera-based text understanding. Unfortunately, an arbitrary position of the camera lens relative to the text area frequently causes perspective distortion, which most current OCR systems cannot manage, creating demand for automatic text rectification. Current rectification research has focused mainly on document images; distortion of natural-scene text is seldom considered. In this paper, a scheme for automatic text rectification in natural-scene images is proposed. It relies on geometric information extracted from the characters themselves as well as from their surroundings. In the first step, linear segments are extracted from the region of interest, and J-Linkage-based clustering is performed, followed by customized refinement, to estimate the primary vanishing points (VPs). To achieve a more comprehensive VP estimate, a second stage inspects the internal structure of the characters, analyzing pixels and connected components of text lines. Finally, the VPs are verified and used to perform perspective rectification. Experiments demonstrate an increased recognition rate and improvement over related algorithms.
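The geometric core of VP estimation is standard projective geometry: in homogeneous coordinates, the line through two points is their cross product, and two lines meet at the cross product of their line vectors. A minimal two-line version (a real estimator like J-Linkage aggregates many noisy segments, which is not shown here; the sample points are invented for illustration):

```python
import numpy as np

# Vanishing point as the intersection of two converging line segments,
# computed with homogeneous coordinates.

def line_through(p, q):
    """Homogeneous line through two 2D points."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def intersection(l1, l2):
    """Inhomogeneous intersection point of two homogeneous lines."""
    v = np.cross(l1, l2)
    return v[:2] / v[2]

# Two converging strokes meet at the vanishing point (2, 20):
l1 = line_through((0, 0), (1, 10))
l2 = line_through((4, 0), (3, 10))
vp = intersection(l1, l2)
```

Once the horizontal and vertical VPs are known, the rectifying homography follows, mapping both VPs back to infinity so the text lines become parallel again.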