Accurately recognizing building roof styles leads to much more realistic 3D building modeling and rendering. In this paper, we propose a novel system for image-based roof style classification using machine learning techniques. Our system is capable of accurately recognizing four individual roof styles as well as complex roofs composed of multiple parts. We make several novel contributions in this paper. First, we propose an algorithm that segments a complex roof into parts, which enables our system to recognize the entire roof based on the recognition of each part. Second, to better characterize a roof image, we design a new feature extracted from a roof edge image. We demonstrate that this feature yields much better recognition results than Histograms of Oriented Gradients (HOG), the Scale-Invariant Feature Transform (SIFT), and Local Binary Patterns (LBP). Finally, to generate a classifier, we propose a learning scheme that trains the classifier using both synthetic and real roof images. Experimental results show that our classifier performs well on several test collections.
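For context on the baseline descriptors mentioned above, the following is a minimal sketch of computing a HOG descriptor from an edge image with OpenCV and scikit-image; the stand-in image, the resize, and all parameter values are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch: HOG descriptor computed on an edge image (one of the baselines above).
# The random array stands in for a real roof photograph; parameters are assumptions.
import cv2
import numpy as np
from skimage.feature import hog

roof = (np.random.rand(256, 256) * 255).astype(np.uint8)   # stand-in for a roof image
edges = cv2.Canny(roof, 50, 150)                            # roof edge image
edges = cv2.resize(edges, (128, 128))                       # fixed size -> fixed-length feature
feature = hog(edges, orientations=9, pixels_per_cell=(16, 16),
              cells_per_block=(2, 2), feature_vector=True)
print(feature.shape)                                        # descriptor fed to a classifier
```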
The deformation field in nonlinear image registration is usually described by a global model. Such models often face the problem that a locally complex deformation cannot be accurately modeled simply by increasing the degrees of freedom (DOF). In addition, highly complex models require additional regularization, which is usually ineffective when applied globally. Registering locally corresponding regions addresses this problem in a divide-and-conquer strategy. In this paper we propose a piecewise image registration approach using Discrete Cosine Transform (DCT) basis functions as the nonlinear model. The contributions of this paper are threefold. First, we develop a multi-level piecewise registration framework that extends the concept of piecewise linear registration and works with any nonlinear deformation model. This framework is then applied to nonlinear DCT registration. Second, we show how adaptive model complexity and regularization can be applied to the registration of local pieces, thus accounting for higher variability. Third, we show how the proposed piecewise DCT can overcome the fundamental problem of inverting a large curvature matrix in global DCT registration when a high number of degrees of freedom is used. The proposed approach can be viewed as an extension of global DCT registration in which the overall model complexity is increased while effective local regularization is achieved. Experimental evaluation compares the proposed approach to piecewise linear registration using an affine transformation model and to global nonlinear registration using a DCT model. Preliminary results show that the proposed approach achieves improved performance.
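As a minimal sketch of the deformation model referred to above, the snippet below builds a one-dimensional displacement field as a weighted sum of low-frequency DCT basis functions (2-D and 3-D fields are handled separably per axis); the grid size, number of coefficients, and coefficient values are illustrative assumptions.

```python
# Sketch: a 1-D displacement field modeled as a weighted sum of DCT basis functions.
# Sizes and coefficient values are illustrative assumptions.
import numpy as np
from scipy.fft import idct

N, K = 128, 8                              # grid size, number of DCT coefficients (DOF)
coeffs = np.zeros(N)
coeffs[:K] = np.random.randn(K)            # low-frequency coefficients are the model parameters
displacement = idct(coeffs, norm="ortho")  # dense displacement along one axis
print(displacement.shape)                  # one displacement value per grid point
```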
Retained surgical items (RSIs) in patients are a major operating room (OR) patient safety concern. An RSI is any surgical tool, sponge, needle, or other item inadvertently left in a patient's body during the course of surgery. If left undetected, RSIs may lead to serious negative health consequences such as sepsis, internal bleeding, and even death. To help physicians efficiently and effectively detect RSIs, we are developing computer-aided detection (CADe) software for X-ray (XR) image analysis, utilizing large amounts of currently available image data to produce a clinically effective RSI detection system. Physician analysis of XRs for the purpose of RSI detection is a relatively lengthy process that may take up to 45 minutes to complete. It is also error prone due to the relatively low acuity of the human eye for RSIs in XR images. The system we are developing is based on computer vision and machine learning algorithms. We address the problem of low incidence by proposing synthesis algorithms. The CADe software we are developing may be integrated into a picture archiving and communication system (PACS), be implemented as a stand-alone software application, or be integrated into portable XR machine software through application programming interfaces. Preliminary experimental results on actual XR images demonstrate the effectiveness of the proposed approach.
Image registration is normally solved as a regularized optimization problem. The line search procedure is commonly employed in unconstrained nonlinear optimization. At each iteration step the procedure computes a step size that achieves adequate reduction in the objective function at minimal cost. In this paper we extend the constrained line search procedure with different regularization terms so as to improve convergence. The extension is addressed in the context of constrained optimization to solve a regularized image registration problem. Specifically, the displacement field between the registered image pair is modeled as the sum of weighted Discrete Cosine Transform basis functions. A Taylor series expansion is applied to the objective function to derive a Gauss-Newton solution. We consider two regularization terms added to the objective function: a Tikhonov regularization term constrains the magnitude of the solution, and a bending energy term constrains the bending energy of the deformation field. We modify both the sufficient decrease and curvature conditions of the Wolfe conditions to accommodate the additional regularization terms. The proposed extension is evaluated on a generated test collection with known deformations. The experimental evaluation results show that the solution obtained with bending energy regularization and Wolfe condition line search achieves the smallest mean deformation field error among 100 registration pairs. In addition, this solution shows improved ability to overcome local minima.
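As a hedged illustration of how a regularization term enters the line search test, the sketch below checks the sufficient decrease (Armijo) part of the Wolfe conditions for an objective augmented with a Tikhonov term; the toy objective, constants, and exact form of the condition are assumptions, not the paper's formulation.

```python
# Sketch: sufficient-decrease check for a regularized objective f(c) + lam * ||c||^2.
# All functions and constants below are illustrative assumptions.
import numpy as np

def regularized_objective(f, c, lam):
    return f(c) + lam * np.dot(c, c)                  # data term plus Tikhonov penalty

def sufficient_decrease(f, grad, c, p, alpha, lam, c1=1e-4):
    # Accept step size alpha along direction p if the regularized objective decreases
    # at least in proportion to the directional derivative (Armijo condition).
    g = grad(c) + 2.0 * lam * c                       # gradient of the regularized objective
    lhs = regularized_objective(f, c + alpha * p, lam)
    rhs = regularized_objective(f, c, lam) + c1 * alpha * np.dot(g, p)
    return lhs <= rhs

f = lambda c: float(np.sum((c - 1.0) ** 2))           # toy data term
grad = lambda c: 2.0 * (c - 1.0)
c0 = np.zeros(4)
p = -grad(c0)                                         # descent direction
print(sufficient_decrease(f, grad, c0, p, alpha=0.1, lam=0.01))
```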
We present a system for image-based measurement of doors and windows using an Android mobile device. In this system a user takes an image of a door or window that needs to be measured and, through interaction, measures specific dimensions of the object. The existing object is removed from the image and a 3D model of a replacement is rendered onto the image. The visualization provides a 3D model with which the user can interact. When tested on a mobile Android platform with an 8MP camera, we obtain an average measurement error of roughly 0.5%. This error rate is stable across a range of view angles, distances from the object, and image resolutions. The main advantages of our mobile device application for image measurement include measuring objects for which physical access is not readily available, documenting in a precise manner the locations in the scene where the measurements were taken, and visualizing a new object with custom selections inside the original view.
Lecture videos are common and their number is increasing rapidly. Consequently, automatically and efficiently indexing such videos is an important task. Video segmentation is a crucial step of video indexing that directly affects the indexing quality. We are developing a system for automated video indexing and in this paper discuss our approach to video segmentation and the classification of video segments. The novel contributions in this paper are twofold. First, we develop a dynamic Gabor filter and use it to extract features for video frame classification. Second, we propose a recursive video segmentation algorithm that is capable of clustering video frames into video segments. We then use these to classify and index the video segments. The proposed approach achieves a higher true positive rate (TPR) of 89.5% and a lower false discovery rate (FDR) of 11.2% compared with a commercial system (TPR = 81.8%, FDR = 39.4%), demonstrating that performance is significantly improved by the enhanced features.
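For illustration, the sketch below extracts simple Gabor filter bank responses from a grayscale frame with OpenCV; the kernel parameters and mean-response pooling are assumptions, and the dynamic adaptation of the filter described above is not shown.

```python
# Sketch: static Gabor filter bank features for a video frame (parameters are assumptions).
import cv2
import numpy as np

def gabor_features(gray_frame, orientations=8):
    feats = []
    for i in range(orientations):
        theta = np.pi * i / orientations
        kernel = cv2.getGaborKernel((21, 21), sigma=4.0, theta=theta,
                                    lambd=10.0, gamma=0.5, psi=0.0)
        response = cv2.filter2D(gray_frame, cv2.CV_32F, kernel)
        feats.append(np.abs(response).mean())          # pool each orientation response
    return np.array(feats)

frame = (np.random.rand(120, 160) * 255).astype(np.uint8)  # stand-in for a video frame
print(gabor_features(frame))
```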
Video segmentation and indexing are important steps in multi-media document understanding and information
retrieval. This paper presents a novel machine learning based approach for automatic structuring and indexing
of lecture videos. By indexing video content, we can support both topic indexing and semantic querying of
multimedia documents. In this paper, our proposed approach extracts features from video images and then uses
these features to construct a model to label video frames. Using this model, we are able to segment and index
videos with an accuracy of 95% on our test collection.
Efficient utilization of videos of lectures and presentations requires indexing based on the extracted background, which includes slides and/or handwritten notes. Since the background in such videos is constantly evolving, special techniques are needed for background recovery. The objective of this paper is a method for automatically extracting the evolving background in such videos. In contrast to general background subtraction techniques, which aim at extracting foreground objects, the goal here is to extract the background and complete it where the foreground is removed. Experimental results comparing the proposed approach to other known techniques demonstrate improved performance when using the proposed approach.
Binarization is of significant importance in document analysis systems. It is an essential first step, prior to further
stages such as Optical Character Recognition (OCR), document segmentation, or enhancement of readability of
the document after some restoration stages. Hence, proper evaluation of binarization methods to verify their
effectiveness is of great value to the document analysis community. In this work, we perform a detailed goal-oriented image quality evaluation of the 18 binarization methods that participated in the DIBCO
2011 competition using the 16 historical document test images used in the contest. We are interested in the
image quality assessment of the outputs generated by the different binarization algorithms as well as the OCR
performance, where possible. We compare our evaluation of the algorithms based on human perception of quality
to the DIBCO evaluation metrics. The results obtained provide an insight into the effectiveness of these methods
with respect to human perception of image quality as well as OCR performance.
Optical character recognition is widely used for converting document images into digital media. Existing OCR
algorithms and tools produce good results from high resolution, good quality, document images. In this paper,
we propose a machine learning based super resolution framework for low resolution document image OCR. Two
main techniques are used in our proposed approach: a document page segmentation algorithm and a modified
K-means clustering algorithm. Using this approach, by exploiting coherence in the document, we reconstruct
a higher resolution image from a low resolution document image and improve OCR results. Experimental results
show substantial gains for low resolution documents such as those captured from video.
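A minimal sketch of the coherence idea, under the assumption that similar low resolution glyph patches can be clustered with K-means and each patch replaced by its cluster mean before upscaling; the patch size and number of clusters are illustrative choices, not the paper's algorithm.

```python
# Sketch: K-means over flattened low-resolution patches; each patch is replaced by its
# (less noisy) cluster mean. Shapes and K are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

def denoise_patches(patches, k=64):
    # patches: (n_patches, h*w) flattened low-resolution glyph patches
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(patches)
    return km.cluster_centers_[km.labels_]   # every patch replaced by its cluster representative

patches = np.random.rand(500, 64)            # stand-in for extracted 8x8 patches
print(denoise_patches(patches).shape)
```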
Computer generated images are common in numerous computer graphics applications such as games, modeling,
and simulation. There is normally a tradeoff between the time allocated to the generation of each image frame
and the quality of the image, where better quality images require more processing time. Specifically, in the
rendering of 3D objects, the surfaces of objects may be manipulated by subdividing them into smaller triangular
patches and/or smoothing them so as to produce better looking renderings. Since unnecessary subdivision
results in increased rendering time and unnecessary smoothing results in reduced details, there is a need to
automatically determine the amount of necessary processing for producing good quality rendered images. In
this paper we propose a novel supervised learning based methodology for automatically predicting the quality
of rendered images of 3D objects. To perform the prediction we train on a data set which is labeled by human
observers for quality. We are then able to predict the quality of renderings (not used in the training) with an
average prediction error of roughly 20%. The proposed approach is compared to known techniques and is shown
to produce better results.
Video structuring and indexing are two crucial processes for multi-media document understanding and information
retrieval. This paper presents a novel approach to the automatic structuring and indexing of lecture videos for
an educational video system. By structuring and indexing video content, we can support both topic indexing
and semantic querying of multimedia documents. In this paper, our goal is to extract indices of topics and link
them with their associated video and audio segments. Two main techniques used in our proposed approach are
video image analysis and video text analysis. Using this approach, we obtain an accuracy of over 90.0% on our test collection.
Document layout analysis is of fundamental importance for document image understanding and information retrieval.
It requires the identification of blocks extracted from a document image via feature extraction and block
classification. In this paper, we focus on the classification of the extracted blocks into five classes: text (machine
printed), handwriting, graphics, images, and noise. We propose a new set of features for efficient classification
of these blocks. We present a comparative evaluation of three ensemble based classification algorithms (boosting,
bagging, and combined model trees) in addition to other known learning algorithms. Experimental results
are demonstrated for a set of 36503 zones extracted from 416 document images which were randomly selected
from the tobacco legacy document collection. The results obtained verify the robustness and effectiveness of
the proposed set of features in comparison to the commonly used Ocropus recognition features. When used in
conjunction with the Ocropus feature set, we further improve the performance of the block classification system
to obtain a classification accuracy of 99.21%.
Layout analysis is a crucial process for document image understanding and information retrieval. Document
layout analysis depends on page segmentation and block classification. This paper describes an algorithm for
extracting blocks from document images and a boosting based method to classify those blocks as machine printed
text or not. The feature vector which is fed into the boosting classifier consists of a four direction run-length
histogram, and connected components features in both background and foreground. Using a combination of
features through a boosting classifier, we obtain an accuracy of 99.5% on our test collection.
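A hedged sketch of one plausible reading of the run-length feature named above: run-length histograms of foreground pixels along the horizontal, vertical, and two diagonal directions; the bin edges and direction handling are assumptions.

```python
# Sketch: four-direction run-length histogram for a binary block (assumed bin edges).
import numpy as np

def run_lengths(line):
    # lengths of consecutive runs of foreground (1) pixels in a 1-D binary array
    padded = np.concatenate(([0], np.asarray(line, dtype=np.int64), [0]))
    changes = np.flatnonzero(np.diff(padded))
    return changes[1::2] - changes[::2]

def run_length_histogram(block, bins=(1, 2, 4, 8, 16, 32, 64)):
    block = (block > 0).astype(np.uint8)
    h, w = block.shape
    directions = [list(block),                                     # horizontal (rows)
                  list(block.T),                                   # vertical (columns)
                  [np.diag(block, k) for k in range(-h + 1, w)],   # main diagonals
                  [np.diag(np.fliplr(block), k) for k in range(-h + 1, w)]]  # anti-diagonals
    hist = []
    for lines in directions:
        runs = np.concatenate([run_lengths(l) for l in lines])
        hist.append(np.histogram(runs, bins=bins)[0])
    return np.concatenate(hist)            # 4 directions x (len(bins) - 1) bins

block = np.random.rand(32, 32) > 0.5       # stand-in for an extracted block
print(run_length_histogram(block))
```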
In previous work, we proposed the application of the Expectation-Maximization (EM) algorithm in the binarization
of historical documents by defining a multi-resolution framework. In this work, we extend the multi-resolution
framework to the Otsu algorithm for effective binarization of historical documents. We compare the
effectiveness of the EM based binarization technique to the Otsu thresholding algorithm on historical documents.
We demonstrate how the EM can be extended to perform an effective segmentation of historical documents by
taking into account multiple features beyond the intensity of the document image. Experimental results, analysis
and comparisons to known techniques are presented using the document image collection from the DIBCO 2009 contest.
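For illustration only, the sketch below applies Otsu thresholding tile by tile, a crude single-level stand-in for the multi-resolution framework described above; the tile size is an arbitrary assumption.

```python
# Sketch: tile-wise Otsu thresholding (tile size is an assumption, not the paper's method).
import cv2
import numpy as np

def tiled_otsu(gray, tile=64):
    out = np.zeros_like(gray)
    for y in range(0, gray.shape[0], tile):
        for x in range(0, gray.shape[1], tile):
            patch = gray[y:y + tile, x:x + tile]
            _, binarized = cv2.threshold(patch, 0, 255,
                                         cv2.THRESH_BINARY + cv2.THRESH_OTSU)
            out[y:y + tile, x:x + tile] = binarized
    return out

gray = (np.random.rand(200, 300) * 255).astype(np.uint8)   # stand-in for a document image
print(tiled_otsu(gray).shape)
```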
Large degradations in document images impede their readability and substantially deteriorate the performance
of automated document processing systems. Image quality metrics have been defined to correlate with
OCR accuracy. However, this does not always correlate with human perception of image quality. When enhancing
document images with the goal of improving readability, it is important to understand human perception
of quality. The goal of this work is to evaluate human perception of degradation and correlate it to known
degradation parameters and existing image quality metrics. The information captured enables the learning and
estimation of human perception of document image quality.
In previous work we showed that shape descriptor features can be used in Look Up Table (LUT) classifiers to
learn patterns of degradation and correction in historical document images. The algorithm encodes the pixel
neighborhood information effectively using a variant of shape descriptor. However, the generation of the shape
descriptor features was approached in a heuristic manner. In this work, we propose a system of learning the
shape features from the training data set by using neural networks: Multilayer Perceptrons (MLP) for feature
extraction. Given that the MLP may be restricted by a limited dataset, we apply a feature selection algorithm to
generalize, and thus improve, the feature set obtained from the MLP. We validate the effectiveness and efficiency
of the proposed approach via experimental results.
In previous work we showed that Look Up Table (LUT) classifiers can be trained to learn patterns of degradation
and correction in historical document images. The effectiveness of such a classifier is directly proportional to the
size of the pixel neighborhood it considers. However, the computational cost increases almost exponentially with
the neighborhood size. In this paper, we propose a novel algorithm that encodes the neighborhood information
efficiently using a shape descriptor. Using shape descriptor features, we are able to characterize the pixel
neighborhood of document images with far fewer bits and so obtain an efficient system with significantly
reduced computational cost. Experimental results demonstrate the effectiveness and efficiency of the proposed approach.
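A minimal sketch of the underlying LUT idea: each 3x3 binary pixel neighborhood is encoded as a 9-bit index into a table of learned output labels. The window size, the encoding, and the majority-vote training rule are illustrative assumptions; the more compact shape descriptor encoding introduced above is not shown.

```python
# Sketch: LUT pixel classifier over raw 3x3 neighborhoods of a binary (0/1) image.
import numpy as np

WEIGHTS = (1 << np.arange(9)).reshape(3, 3)          # bit weight per neighborhood position

def neighborhood_codes(binary_image):
    padded = np.pad(binary_image, 1)
    codes = np.zeros(binary_image.shape, dtype=np.int64)
    h, w = binary_image.shape
    for dy in range(3):
        for dx in range(3):
            codes += WEIGHTS[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return codes                                      # value in [0, 511] per pixel

def train_lut(degraded, clean):
    lut = np.zeros(512, dtype=np.uint8)
    codes = neighborhood_codes(degraded)
    for pattern in range(512):
        mask = codes == pattern
        if mask.any():
            lut[pattern] = clean[mask].mean() > 0.5   # majority label for this pattern
    return lut

def apply_lut(degraded, lut):
    return lut[neighborhood_codes(degraded)]

degraded = (np.random.rand(64, 64) > 0.5).astype(np.uint8)   # stand-in training pair
lut = train_lut(degraded, degraded)
print(apply_lut(degraded, lut).shape)
```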
The fast evolution of scanning and computing technologies has led to the creation of large collections of scanned paper documents. Examples of such collections include historical collections, legal depositories, medical archives, and business archives. Moreover, in many situations, such as legal litigation and security investigations, scanned collections are being used to facilitate systematic exploration of the data. It is almost always the case that
scanned documents suffer from some form of degradation. Large degradations make documents hard to read and substantially deteriorate the performance of automated document processing systems. Enhancement of degraded document images is normally performed assuming global degradation models. When the degradation is large,
global degradation models do not perform well. In contrast, we propose to estimate local degradation models and
use them in enhancing degraded document images. Using a semi-automated enhancement system we have labeled
a subset of the Frieder diaries collection. This labeled subset was then used to train an ensemble classifier. The
component classifiers are based on lookup tables (LUT) in conjunction with the approximate nearest neighbor algorithm. The resulting algorithm is highly efficient. Experimental evaluation results are provided using the Frieder diaries collection.
Degraded documents are frequently obtained in various situations. Examples of degraded document collections include historical document depositories, documents obtained in legal and security investigations, and legal and medical archives. Degraded document images are hard to read and hard to analyze using computerized techniques. There is hence a need for systems that are capable of enhancing such images. We describe a language-independent semi-automated system for enhancing degraded document images that is capable of exploiting inter- and intra-document coherence. The system is capable of processing document images with high levels of degradation and can be used for ground truthing of degraded document images. Ground truthing of degraded document images is extremely important in several respects: it enables quantitative performance measurements
of enhancement systems and facilitates model estimation that can be used to improve performance. Performance evaluation is provided using the historical Frieder diaries collection.
Writer identification in offline handwritten documents is a difficult task with multiple applications such as authentication,
identification, and clustering in document collections. For example, in the context of content-based
document image retrieval, given a document with handwritten annotations it is possible to determine whether
the comments were added by a specific individual and find other documents annotated by the same person. In
contrast to online writer identification in which temporal stroke information is available, such information is not
readily available in offline writer identification. The base approach and the main contribution of our work is the
idea of using derived canonical stroke frequency descriptors from handwritten text to identify writers. We show
that a relatively small set of canonical strokes can be successfully employed for generating discriminative frequency
descriptors. Moreover, we show that by using frequency descriptors alone it is possible to perform writer
identification with a success rate comparable to the known state of the art in offline writer identification,
with close to 90% accuracy. As frequency descriptors are independent of existing descriptors, the performance
of offline writer identification may be improved by combining both standard and frequency descriptors. Quantitative experimental evaluation is provided using the IAM dataset.
Optic fundus assessment is widely used for diagnosing vascular and non-vascular pathology. Inspection of the
retinal vasculature may reveal hypertension, diabetes, arteriosclerosis, cardiovascular disease and stroke. Due to
various imaging conditions retinal images may be degraded. Consequently, the enhancement of such images and
vessels in them is an important task with direct clinical applications. We propose a novel technique for vessel
enhancement in retinal images that is capable of enhancing vessel junctions in addition to linear vessel segments.
This is an extension of vessel filters we have previously developed for vessel enhancement in thoracic CT scans.
The proposed approach is based on probabilistic models which can discern vessels and junctions. Evaluation
shows that the proposed filter outperforms several known techniques and is comparable to the state of the art when
evaluated on a standard dataset. A ridge-based vessel tracking process is applied on the enhanced image to
demonstrate the effectiveness of the enhancement filter.
Subdivision of triangular meshes is a common technique for refining a given surface representation for various
purposes in computer vision, computer graphics, and finite element methods. Particularly, in the processing
of reconstructed surfaces based on sensed data, subdivision can be used to add surface points at locations in
which the sensed data was sparse and so increase the density of various computed surface properties at such
locations. Standard subdivision techniques are normally applied to the complete mesh and so add vertices and
faces throughout the mesh. In modifying global adaptive subdivision schemes to perform local subdivision, it
is necessary to guarantee smooth transition between subdivided regions and regions left at the original level
so as to prevent the formation of surface artifacts at the boundaries between such regions. Moreover, the
produced surface mesh needs to be suitable for successive local subdivision steps. We propose a novel approach
for incremental adaptive subdivision of triangle meshes which may be applied to multiple global subdivision
schemes and which may be repeated iteratively without forming artifacts in the subdivided mesh. The decision
of where to subdivide in each iteration is determined based on an error measure which is minimized through
subdivision. Smoothness between various subdivision levels is obtained through the postponement of local atomic
operations. The proposed scheme is evaluated and compared to known techniques using quantitative measures.
We address the problem of content-based image retrieval in the context of complex document images. Complex
documents typically start out on paper and are then electronically scanned. These documents have rich internal
structure and might only be available in image form. Additionally, they may have been produced by a combination
of printing technologies (or by handwriting); and include diagrams, graphics, tables and other non-textual
elements. Large collections of such complex documents are commonly found in legal and security investigations.
The indexing and analysis of large document collections is currently limited to textual features based on OCR data
and ignores the structural context of the document as well as important non-textual elements such as signatures,
logos, stamps, tables, diagrams, and images. Handwritten comments are also normally ignored due to the
inherent complexity of offline handwriting recognition. We address important research issues concerning content-based
document image retrieval and describe a prototype we are developing for the integrated retrieval and aggregation
of diverse information contained in scanned paper documents. Such complex document information
processing combines several forms of image processing together with textual/linguistic processing to enable
effective analysis of complex document collections, a necessity for a wide range of applications. Our prototype
automatically generates rich metadata about a complex document and then applies query tools to integrate
the metadata with text search. To ensure a thorough evaluation of the effectiveness of our prototype, we are
developing a test collection containing millions of document images.
Poor quality documents are obtained in various situations such as historical document collections, legal archives,
security investigations, and documents found in clandestine locations. Such documents are often scanned for
automated analysis, further processing, and archiving. Due to the nature of such documents, degraded document
images are often hard to read, have low contrast, and are corrupted by various artifacts. We describe
a novel approach for the enhancement of such documents based on probabilistic models which increases the
contrast, and thus the readability, of such documents under various degradations. The enhancement produced by
the proposed approach can be viewed under different viewing conditions if desired. The proposed approach was
evaluated qualitatively and compared to standard enhancement techniques on a subset of historical documents
obtained from the Yad Vashem Holocaust museum. In addition, quantitative performance was evaluated based
on synthetically generated data corrupted under various degradation models. Preliminary results demonstrate
the effectiveness of the proposed approach.
Volume registration is fundamental to multiple medical imaging
algorithms. Specifically, non-rigid registration of thoracic CT scans
taken at different time instances can be used to detect new nodules more
reliably and assess the growth rate of existing nodules.
Voxel-based registration techniques are generally sensitive to intensity
variation and structural differences, which are common in CT scans due
to partial volume effects and naturally occurring motion and deformations.
The approach we propose in this paper is based on vessel tree extraction
which is then used to infer the complete volume registration. Vessels
form unique features with good localization.
Using extracted vessel trees, a minimization process is used to estimate
the motion vectors at vessels. Accurate motion vectors are obtained
at vessel junctions whereas vessel segments support only normal
component estimation. The obtained motion vectors are then interpolated
to produce a dense motion field using thin plate splines.
The proposed approach is evaluated on both real and synthetically
deformed volumes. The obtained results are compared to several
standard registration techniques. It is shown that by using vessel
structure, the proposed approach results in improved performance.
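As a hedged sketch of the final interpolation step, sparse motion vectors are densified with a thin plate spline interpolator (SciPy's RBFInterpolator); the point count, grid size, and 2-D setting are illustrative assumptions.

```python
# Sketch: thin-plate-spline densification of sparse motion vectors (2-D for brevity).
import numpy as np
from scipy.interpolate import RBFInterpolator

junctions = np.random.rand(50, 2) * 100          # sparse control points (e.g. vessel junctions)
motion = np.random.randn(50, 2)                  # motion vector measured at each point

tps = RBFInterpolator(junctions, motion, kernel="thin_plate_spline")
yy, xx = np.mgrid[0:100, 0:100]
grid = np.column_stack([yy.ravel(), xx.ravel()])
dense_field = tps(grid).reshape(100, 100, 2)     # dense two-channel motion field
print(dense_field.shape)
```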
Automated detection of lung nodules in thoracic CT scans is an important clinical challenge. Blood vessels form a major source of false positives in automated nodule detection systems. Hence, the performance of such systems may be improved by enhancing nodules while suppressing blood vessels. Ideally, nodule enhancement filters
should enhance nodules while suppressing vessels and lung tissue. A distinction between vessels and nodules is normally obtained through eigenvalue analysis of the Hessian matrix. The Hessian matrix is a second order differential quantity and so is sensitive to noise. Furthermore, by relying on principal curvatures alone, existing
filters are incapable of distinguishing between nodules and vessel junctions, and are incapable of handling cases in which nodules touch vessels. In this paper we develop novel nodule enhancement filters that are capable of suppressing junctions and of handling cases in which nodules appear to touch or even overlap with vessels. The proposed filters are based on optimized probabilistic models derived from eigenvalue analysis of the gradient correlation matrix, which is a first order differential quantity, and so are less sensitive to noise compared with known vessel enhancement filters. The proposed filters are evaluated and compared to known techniques both qualitatively and quantitatively. The evaluation includes both synthetic and actual clinical data.
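For illustration, the sketch below computes the gradient correlation (structure) tensor and its per-voxel eigenvalues, the first order quantity referred to above; the window size is an assumption, and the optimized probabilistic nodule/vessel models built on top of the eigenvalues are not shown.

```python
# Sketch: per-voxel eigenvalues of the gradient correlation (structure) tensor.
import numpy as np
from scipy.ndimage import uniform_filter

def structure_tensor_eigenvalues(volume, window=5):
    gz, gy, gx = np.gradient(volume.astype(np.float32))
    grads = [gz, gy, gx]
    tensor = np.empty(volume.shape + (3, 3), dtype=np.float32)
    for i in range(3):
        for j in range(3):
            tensor[..., i, j] = uniform_filter(grads[i] * grads[j], size=window)
    return np.linalg.eigvalsh(tensor)            # 3 eigenvalues per voxel, ascending

# Roughly: three comparable large eigenvalues suggest a blob (nodule); one small and two
# large eigenvalues suggest a tube (vessel).
volume = np.random.rand(32, 32, 32).astype(np.float32)   # stand-in for a CT sub-volume
print(structure_tensor_eigenvalues(volume).shape)
```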
Analysis of large collections of complex documents is an increasingly important need for numerous applications. Complex documents are documents that typically start out on paper and are then electronically scanned. These documents have rich internal structure and might only be available in image form. Additionally, they may have been produced by a combination of printing technologies (or by handwriting); and include diagrams, graphics, tables and other non-textual elements. The state of the art today for a large document collection is essentially text search of OCR'd documents with no meaningful use of data found in images, signatures, logos, etc. Our prototype automatically generates rich metadata about a complex document and then applies query tools to integrate the metadata with text search. To ensure a thorough evaluation of the effectiveness of our prototype, we are also developing a roughly 42,000,000 page complex document test collection. The collection will include relevance judgments for queries at a variety of levels of detail and depending on a variety of content and structural characteristics of documents, as well as "known item" queries looking for particular documents.
Vessel enhancement in volumetric data is a necessary prerequisite in
various medical imaging applications. In the context of automated lung nodule detection in thoracic CT scans, segmented blood vessels can be used to resolve local ambiguities based on global considerations and so improve the performance of lung nodule detection algorithms. Segmenting the data correctly is a difficult problem with direct consequences for subsequent processing steps. Voxels belonging to vessels and nodules in thoracic CT scans are both characterized by high contrast with respect to a local neighborhood. Thus in order to enhance vessels while suppressing nodules, additional characteristics should be used. In this paper we propose a novel vessel enhancement filter that is capable of enhancing vessels and junctions in thoracic CT scans while suppressing nodules. The proposed filters are based on a Gaussian mixture model which is optimized through expectation maximization. The proposed filters are based on first order differential quantities and so are less sensitive to noise compared with known Hessian-based vessel enhancement filters. Moreover, the proposed filters utilize an adaptive window and so avoid the common need for multiple scale analysis. The proposed filters are evaluated and compared to known techniques qualitatively and quantitatively on both synthetic and actual clinical data and it is shown that the proposed filters perform better.
Synthesizing new views from existing ones is an emerging field with various applications. Approaches for view synthesis rely upon dense or sparse matching between existing views. In all cases, some parts of the original images are inevitably unmatched and, therefore, the synthesis of new views of the corresponding regions needs to rely upon sparse sets of matched points. Triangulation based upon sparsely matched points constitutes a possible solution for the handling of unmatched regions. However, such a triangulation should respect the underlying geometry of the corresponding scene. In this paper, a novel approach is proposed for a physically valid joint-triangulation of sparsely matched points. The proposed approach is based upon the maximization of a physical validity criterion which is supported by textured regions in the images. The produced triangulation is such that each triangle corresponds approximately to a planar surface in the scene. Given an arbitrary initial triangulation, the proposed approach refines it by modifying links between vertices so as to locally increase the physical validity measure. Furthermore, since missing or incorrect matched points may impede the correctness of the current triangulation, an iterative split-and-merge phase adds or removes matching points in low-score triangles, by means of a correctness evaluation based upon a large evaluation support. The paper presents results of the proposed approach with synthetic views (which allow quantitative performance evaluation) and natural images.
Blood vessel segmentation in volumetric data is a necessary prerequisite in various medical imaging applications. Specifically, when considering the application of automatic lung nodule detection in thoracic CT scans, segmented blood vessels can be used in order to resolve local ambiguities based on global considerations and so improve the performance of lung nodule detection algorithms. In this paper, a novel regulated morphology approach to fuzzy shape analysis is described in the context of blood vessel extraction in thoracic CT scans. The fuzzy shape representation is obtained by using regulated morphological operations. Such a representation is necessary due to noise present in the data and due to the discrete nature of the volumetric data produced by CT scans, and particularly the interslice spacing. Regulated morphological operations are a generalization of ordinary morphological operations which relax the extreme strictness inherent to ordinary morphological operations. Based on constraints of collinearity, size, and global direction, a tracking algorithm produces a set of connected trees representing blood vessels and nodules in the volume. The produced tree structures are composed of fuzzy spheres in which the degree of object membership is proportional to the ratio between the occupied volume and the volume of the discrete sphere encompassing it. The performance of the blood vessel extraction algorithm described in the paper is evaluated based on a distance measure between a known blood vessel structure and a recovered one. As the generation of synthetic data for which the true vessel network is known may not be sufficiently realistic, our evaluation is based on different versions of real data corrupted by multiplicative Gaussian noise.
Accurate curvature estimation in mesh surfaces is an important problem with numerous applications. Curvature is a significant indicator of ridges and can be used in applications such as recognition, adaptive smoothing, and compression. Common methods for curvature estimation are based on quadric surface fitting to local neighborhoods. While the use of quadric surface patches simplifies computations, the quadric surface model is too simple to accurately model the local surface geometry and introduces a strong element of smoothing into the computations, thus reducing the accuracy of the curvature estimation. The method proposed in this paper models the local surface geometry by using a set of quadratic curves. Consequently, as the proposed model has a large number of degrees of freedom, it is capable of modeling the local surface geometry much more accurately compared to quadric surface fitting. The experimental setup for evaluating the proposed approach is composed of randomly generated Bezier surfaces, for which the curvature is known, corrupted with various levels of additive Gaussian noise. We compare the results obtained by the proposed approach to those obtained by other techniques. It is demonstrated that the proposed approach produces significantly improved estimation results which are less sensitive to noise.
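For reference, a minimal sketch of the quadric (Monge patch) fitting baseline discussed above, assuming the neighbor coordinates are already expressed in a local tangent frame centered at the vertex.

```python
# Sketch: least-squares quadric fit z = ax^2 + bxy + cy^2 + dx + ey + f over a vertex
# neighborhood, with Gaussian and mean curvature read off the fitted coefficients.
import numpy as np

def quadric_curvature(xyz):
    # xyz: (n, 3) neighbor coordinates in a local frame with the vertex at the origin
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    A = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    a, b, c, d, e, _ = np.linalg.lstsq(A, z, rcond=None)[0]
    fx, fy, fxx, fxy, fyy = d, e, 2 * a, b, 2 * c
    denom = 1.0 + fx * fx + fy * fy
    K = (fxx * fyy - fxy * fxy) / denom ** 2                        # Gaussian curvature
    H = ((1 + fx * fx) * fyy - 2 * fx * fy * fxy
         + (1 + fy * fy) * fxx) / (2 * denom ** 1.5)                # mean curvature
    return K, H

xy = np.random.randn(30, 2) * 0.1                                   # synthetic neighborhood
z = 0.5 * xy[:, 0] ** 2 + 0.2 * xy[:, 1] ** 2                       # known paraboloid
print(quadric_curvature(np.column_stack([xy, z])))                  # K ~ 0.4, H ~ 0.7
```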
In this paper we extend previous techniques we developed for efficient morphological processing of 2D document images to the analysis of 3D voxel data obtained by Computed Tomography (CT) scans. The proposed approach is based on a directional interval coding scheme of the voxel data and a basic set of operations that can be employed directly to the encoded data. The scan lines can be chosen to be in an arbitrary direction so as to fit directionality inherent to the data. Morphological operations are obtained by manipulating pairs of intervals belonging to the data and the kernel, where such manipulation can result in the addition, removal, or change of existing intervals. In addition to the implementation of ordinary morphological operations we develop a convolution operation that can be applied directly to the encoded data thus enabling the implementation of regulated morphological operations which incorporate a variable level of strictness. The computational complexity of the proposed operations is evaluated and compared to that of the standard implementation. The paper concludes with simulation results in which the execution time of iterative application of different morphological operations based on a standard implementation and the proposed encoded implementation are compared.
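A hedged sketch of one elementary operation on interval-coded data: dilating a single scan line of foreground runs by a flat kernel and merging overlapping runs; the full encoding, arbitrary kernels, and the 3-D directional machinery described above are not reproduced.

```python
# Sketch: dilation of one interval-coded scan line by a flat kernel of given half-width.
def dilate_intervals(intervals, half_width):
    # intervals: sorted list of (start, end) foreground runs, end exclusive
    widened = [(s - half_width, e + half_width) for s, e in intervals]
    merged = []
    for s, e in widened:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))     # merge overlapping runs
        else:
            merged.append((s, e))
    return merged

print(dilate_intervals([(2, 4), (6, 9), (20, 22)], 1))   # -> [(1, 10), (19, 23)]
```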
Mesh surfaces are an important form of 3D object representation which has become increasingly significant in recent years due to evolving 3D acquisition techniques, computing power, and commercial needs. Feature edge extraction in polygonal meshes has numerous applications ranging from scene segmentation and object recognition to mesh compression. The method proposed in this paper for feature edge extraction in polygonal mesh surfaces follows the signal processing approach to edge detection in images at multiple scales and in the presence of noise. Namely, multiple smoothing iterations in the spatial domain are used to produce several versions of the original mesh at different scales, whereas the same topology is maintained throughout the scales. Edges are extracted at each scale by applying a differential operator to an underlying locally smooth surface represented by the mesh at that scale so as to evaluate the curvature at each vertex of the mesh. The proposed approach is based on the identification of multiple local support planes that facilitate the representation of an arbitrary mesh surface by a set of overlapping Monge patches. Such a representation enables the application of curvature estimation techniques commonly employed for range image analysis, and provides an alternative to Laplacian smoothing by local quadratic surface fitting. The proposed approach is suitable for irregular and non-dense meshes.
Computer models of real world objects and scenes are essential in a large and rapidly growing number of applications, hence motivating the automatic generation of models from images. While the completeness and accuracy of extracted models may be essential in some cases, in applications such as image-based view synthesis in which the goal is to produce new views of a scene, partial models with limited accuracy may produce satisfactory results. In this paper a method is described for partial image-based modeling which relies on a sparse set of matching points between several views. While a sparse set of matching points may be obtained more reliably, it provides only partial information on the reconstructed scene and uses only a small subset of the information contained in the images. Consequently, in the proposed approach, correlation constraints are used in order to test hypotheses in projective space so as to improve the correctness of the reconstructed model. The correlation constraints are based on all the image pixels belonging to the convex hull of the matched point set, thus utilizing a large amount of the information contained in the images. The same constraints are then used to modify the reconstructed model by detecting zones in which the model should be broken into several parts in order to accommodate occlusions in the scene and in order to smooth planar surfaces composed of several polygons. The paper demonstrates the application of the proposed approach to image-based view synthesis and to geometric distortion correction in document images.
Robotic teleoperation is a major research area with numerous applications. Efficient teleoperation, however, greatly depends on the provided sensory information. In this paper, an integrated radar-photometry sensor is presented. The developed sensor relies on the strengths of the two main modalities: robust radar-based range data, and high resolution dynamic photometric imaging. While radar data has low resolution and depth from motion in photometric images is susceptible to poor visibility conditions, the integrated sensor compensates for the flaws of the individual components. The integration of the two modalities is achieved by using the radar-based range data in order to constrain the optical flow estimation, and fusing the resulting depth maps. The optical flow computation is constrained by a model flow field based upon the radar data, by using a rigidity constraint, and by incorporating edge information into the optical flow estimation. The data fusion is based upon a confidence estimation of the image-based depth computation. Results with simulated data demonstrate the good potential of the approach.
The teleoperation of equipment under impoverished sensing and communication delays cannot be handled efficiently by conventional remote control techniques. Our approach to this problem is based on an augmented reality control mode in which a graphical model of the equipment is overlaid upon real views from the work-site. A basic capability required in order to produce such an augmented reality mode is the ability to synthesize visual information from new viewpoints based upon existing ones, so as to compensate for the sparsity of real data. Our approach to the problem of image-based view synthesis is based upon the implicit construction of a 3D approximation of the scene, composed of planar triangular patches. New views are then generated by texture-mapping the available real image data onto the reprojected triangles. In order to generate a physically valid joint-triangulation which minimizes the distortions in the rendering of the new view, an iterative approach is utilized. This approach begins with an initial triangulation and refines it iteratively through node-linking alterations and a split-and-merge process, based upon correlation values between corresponding triangular patches. The paper presents results for both synthetic and real scenes.
Directional decomposition of maps and line-drawing images has the advantage of stressing directional information, and so may assist in the analysis of such images. In this paper, a method is described for directional decomposition of maps and line-drawing images into an arbitrary number of directional edge planes, where the range of directions that is included in each directional edge plane may be determined individually. The proposed approach is based on self-dilated line kernels, which are generated by dilating discrete periodic line segments by themselves. These kernels are then used by regulated morphological operations, which extend the fitting interpretation of ordinary morphological operations, in order to obtain the required decomposition. The paper describes necessary propositions of the proposed approach, and presents examples of their use in the analysis of line drawings.
In previous work, we presented algorithms for the analysis of maps and line-drawing images which are based on the processing of directional edge planes by directional morphological operations. This paper discusses the problem of efficient morphological processing of directional edge planes on a serial computer, where it is assumed that arbitrary kernels may be used. The proposed approach is based on a compact representation of the edge planes, which is obtained by using directional interval coding, where the direction of the interval is adapted individually in each directional edge plane. In a broader sense, the proposed approach provides a general framework for efficient processing of binary images which is based on directional interval coding. This framework supports basic morphological operations with arbitrary kernels and basic logical operations between any number of images.
Shape decomposition is mainly motivated by structural shape description methods. Given a complex shape it is possible to decompose it into simpler sub-parts that are well described by scalar global features, and then use the sub-parts in order to compose a structural description of the shape. This paper presents a shape decomposition method that decomposes a polygonal approximation of the shape into nearly convex, possibly overlapping sub-parts by locating structures in a fuzzy relation matrix. The fuzzy relation that is used to construct the fuzzy relation matrix is defined on the set of the polygon vertices by a membership function that has a maximal value when the line connecting two vertices is contained completely within the polygon, and decreases as the deviation of this line from the polygon increases.
A common task in cytogenetic tests is the classification of human chromosomes. Successful separation of touching chromosomes is vital for correct classification. Current systems for automatic chromosome classification are mostly interactive and require human intervention for correct separation of touching chromosomes. Common methods for separation of touching chromosomes tend to fail where ambiguity or incomplete information is involved. We developed a method for the separation of touching objects which incorporates low-level knowledge about the objects and uses only extracted information. This method is fast and does not depend on the existence of a separating path. Finally, the complete process of separation is demonstrated, including cases which are not separable by other methods.
The purpose of this course is to introduce algorithms for 3D structure inference from 2D images. In many applications, inferring 3D structure from 2D images can provide crucial sensing information. The course will begin by reviewing geometric image formation and mathematical concepts that are used to describe it, and then move to discuss algorithms for 3D model reconstruction.
The problem of 3D model reconstruction is an inverse problem in which we need to infer 3D information based on incomplete (2D) observations. We will discuss reconstruction algorithms which utilize information from multiple views. Reconstruction requires the knowledge of some intrinsic and extrinsic camera parameters, and the establishment of correspondence between views. We will discuss algorithms for determining camera parameters (camera calibration) and for obtaining correspondence using epipolar constraints between views. The course will also introduce relevant 3D imaging software components available through the industry standard OpenCV library.
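As a small, hedged illustration of the OpenCV components mentioned, the sketch below projects synthetic 3-D points into two views and recovers the fundamental matrix relating them; the camera intrinsics and poses are arbitrary assumptions.

```python
# Sketch: synthetic two-view setup and fundamental matrix estimation with OpenCV.
import numpy as np
import cv2

rng = np.random.default_rng(0)
points_3d = rng.uniform([-1, -1, 4], [1, 1, 8], size=(50, 3)).astype(np.float32)
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=np.float64)

# View 1: identity pose. View 2: small rotation about y plus a sideways translation.
rvec1, tvec1 = np.zeros(3), np.zeros(3)
rvec2, tvec2 = np.array([0.0, 0.1, 0.0]), np.array([0.5, 0.0, 0.0])

pts1, _ = cv2.projectPoints(points_3d, rvec1, tvec1, K, None)
pts2, _ = cv2.projectPoints(points_3d, rvec2, tvec2, K, None)

F, mask = cv2.findFundamentalMat(pts1.reshape(-1, 2), pts2.reshape(-1, 2), cv2.FM_8POINT)
print(F)   # 3x3 fundamental matrix encoding the epipolar constraint x2^T F x1 = 0
```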