In this work we present the use and properties of a transformation uncertainty polytope for a frequently encountered problem in computer vision: registration in visual inspection. For each feature point in the reference image, a corresponding feature point must be distinguished in the test image among many candidates. A convex polytope is used to capture the uncertainty of the transformation from the reference feature points to uncertainty regions in the test image in which the candidate matches are to be found. By checking the consistency of the uncertainty transformation for pairs of possible matches, we construct a consistency graph. The consistency graph gives us the necessary information to distinguish the good matches from the rest. Based on the best matches, we compute the registration transformation.
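A minimal sketch of the consistency-graph idea, under strong simplifying assumptions that are not taken from the paper: the transformation is reduced to a pure translation, and the uncertainty polytope is replaced by an axis-aligned box of half-width `tol` on the displacement. The names `build_consistency_graph`, `best_matches` and `tol` are hypothetical.

```python
from itertools import combinations


def displacement(match):
    """Displacement from the reference point to the candidate test point."""
    (rx, ry), (tx, ty) = match
    return (tx - rx, ty - ry)


def build_consistency_graph(matches, tol):
    """Connect two candidate matches when one translation could explain both.

    Assumption: a box of half-width `tol` on the displacement stands in for
    the convex uncertainty polytope described in the abstract.
    """
    graph = {i: set() for i in range(len(matches))}
    for i, j in combinations(range(len(matches)), 2):
        if matches[i][0] == matches[j][0]:
            continue  # rival candidates for the same reference point
        di, dj = displacement(matches[i]), displacement(matches[j])
        if abs(di[0] - dj[0]) <= tol and abs(di[1] - dj[1]) <= tol:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def best_matches(matches, tol):
    """For each reference point keep the candidate with the most consistent partners."""
    graph = build_consistency_graph(matches, tol)
    best = {}
    for i, (ref, _) in enumerate(matches):
        if ref not in best or len(graph[i]) > len(graph[best[ref]]):
            best[ref] = i
    return sorted(best.values())


# Three reference points, each with one plausible and one implausible candidate.
matches = [
    ((0, 0), (5, 2)), ((0, 0), (9, 9)),
    ((10, 0), (15, 2)), ((10, 0), (12, -4)),
    ((0, 10), (5, 12)), ((0, 10), (1, 20)),
]
print(best_matches(matches, tol=1))
```

Ranking candidates by vertex degree is only one simple way to read the graph; the abstract does not say which graph criterion the authors use.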
An interesting problem in analysis of video data concerns design of algorithms that detect perceptually significant features in an unsupervised manner, for instance methods of machine learning for automatic classification of human expression. A geometric formulation of this genre of problems could be modeled with help of perceptual psychology. In this article, we outline one approach for a special case where video segments are to be classified according to expression of emotion or other similar facial motions. The encoding of realistic facial motions that convey expression of emotions for a particular person P forms a parameter space XP whose study reveals the “objective geometry” for the problem of unsupervised feature detection from video. The geometric features and discrete representation of the space XP are independent of subjective evaluations by observers. While the “subjective geometry” of XP varies from observer to observer, levels of sensitivity and variation in perception of facial expressions appear to share a certain level of universality among members of similar cultures. Therefore, statistical geometry of invariants of XP for a sample of population could provide effective algorithms for extraction of such features. In cases where frequency of events is sufficiently large in the sample data, a suitable framework could be provided to facilitate the information-theoretic organization and study of statistical invariants of such features. This article provides a general approach to encode motion in terms of a particular genre of dynamical systems and the geometry of their flow. An example is provided to illustrate the general theory.
Recently it has been theorized that some European painters as early as 1420 used concave mirrors (and, later, converging lenses) to project real inverted images onto their supports which they then traced and painted over. We review the image analytic, historical and art historical evidence and counter-evidence for this bold claim, focusing on key paintings in the debate. While some of the evidence is consistent with the use of optical projections in the 15th century, all such evidence is also consistent with other explanations as well. More importantly, for those paintings highlighted as supporting the projection theory, there is much evidence that is inconsistent with the use of optics or extremely difficult to explain as arising from the use of optics. Further, there is no historical documentary evidence from the 15th century suggesting anyone had even seen an image of an illuminated object projected onto a screen -- the first step in the proposed projection method. The projection method would have been the most sophisticated optical procedure of its day, which theory proponents speculate was discovered by artists, not the scientists who were actively exploring optical systems. Because the burden of proof lies foursquare upon the theory’s proponents -- the revisionists -- in the absence of compelling reasons to reject “traditional” (non-optical) explanations we must reject the projection theory. We conclude by rejecting the claims that the optical projection theory has been “proven.”
In humans, recognition of facial expressions is performed by the amygdala, which uses parallel processing streams to identify expressions quickly and accurately. Additionally, a feedback mechanism may play a role in this process. A model with a similar parallel structure and feedback mechanisms could be used to improve current facial recognition algorithms, for which varied expressions are a source of error. An anatomically constrained artificial neural-network model was created that uses this parallel processing architecture and feedback to categorize facial expressions. The presence of a feedback mechanism was not found to significantly improve performance for models with parallel architecture. However, the use of parallel processing streams significantly improved accuracy over a similar network that did not have a parallel architecture. Further investigation is necessary to determine the benefits of using parallel streams and feedback mechanisms in more advanced object recognition tasks.
Reconstructing a parabola in 3-D space from a pair of arbitrary perspective views yields the set of parameters that represent the parabola. This method is widely used in applications of 3-D object recognition, machine inspection and trajectory tracing. However, applications that require a high degree of accuracy call for a study of the errors in the reconstruction process by means of a rigorous performance analysis. In this paper, the reconstruction of a 3-D parabola from two perspective projections is described. In this process, the two end points and the vertex in each of the two projections of the parabola are used as feature points to reconstruct the parabola in 3-D. Simulation studies have been conducted to observe the effect of noise on errors in the reconstruction. A performance analysis is presented that illustrates the effect on reconstruction errors of noise, of loss of accuracy due to mathematical calculations, and of the parameters of the imaging setup. The angle between the reconstructed and original parabola in 3-D space has been used as one of the criteria for the measurement of error. Lower image resolution, certain geometric conditions and certain imaging setups produce poor reconstruction performance. The results of this study are useful for the design of an optimal stereo-based imaging system that reconstructs with minimum error.
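The angular error criterion can be illustrated with a short sketch. The abstract does not define how the angle is measured; here it is assumed to be the angle between the planes that contain the two parabolas, each plane being fixed by the three feature points (end point, vertex, end point). The function names are hypothetical.

```python
import math


def plane_normal(p0, p1, p2):
    """Unit normal of the plane through three 3-D points."""
    u = [b - a for a, b in zip(p0, p1)]
    v = [b - a for a, b in zip(p0, p2)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    length = math.sqrt(sum(c * c for c in n))
    if length == 0.0:
        raise ValueError("feature points are collinear")
    return [c / length for c in n]


def angle_between_planes_deg(points_a, points_b):
    """Angle in degrees between the planes spanned by two point triples.

    Assumption: the plane angle is used as the error measure; the sign of
    the normal is ignored, so the result lies in [0, 90].
    """
    na = plane_normal(*points_a)
    nb = plane_normal(*points_b)
    cos_t = min(1.0, abs(sum(a * b for a, b in zip(na, nb))))
    return math.degrees(math.acos(cos_t))


# Original parabola y = x^2 in the plane z = 0: end point, vertex, end point.
original = [(-1.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
# Hypothetical reconstruction tilted by 45 degrees about the x-axis.
c = math.cos(math.radians(45.0))
s = math.sin(math.radians(45.0))
reconstructed = [(-1.0, c, s), (0.0, 0.0, 0.0), (1.0, c, s)]
print(angle_between_planes_deg(original, reconstructed))
```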
This paper describes a method to achieve different levels of detail for given volumetric data by assigning weights to the data points. The relation between the wavelet transformation and alpha shapes was investigated to define the different levels of resolution. Wavelets are a mathematical tool for hierarchically decomposing functions. We apply this property to rank the importance of each data point. We treat volumetric scattered data as the coefficients corresponding to the three-dimensional piecewise constant basis functions of a wavelet transformation, and we assign each point a weight based on its wavelet coefficient. The given volumetric scattered data points, each with a real weight, are connected by using the concept of weighted alpha shapes. Scattered data is defined as a collection of data that has little specified connectivity between data points. The quality of an interpolant in volumetric trivariate space depends not only on the distribution of the data points in R3, but also on the data value (intensity). Wavelet coefficients describe the data in terms of a coarse overall shape, given by the approximation coefficients, plus details that range from broad to narrow, given by the detail coefficients. We can improve the quality of an approximation by using the wavelet coefficients as weights for the corresponding data points.
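The split into approximation and detail coefficients can be shown in one dimension with the unnormalized Haar wavelet, whose basis functions are piecewise constant. This is only a sketch: the paper works with three-dimensional data, and the function names below are hypothetical.

```python
def haar_step(values):
    """One level of the unnormalized Haar transform: pairwise averages and differences."""
    if len(values) % 2:
        raise ValueError("length must be even")
    pairs = list(zip(values[0::2], values[1::2]))
    approx = [(a + b) / 2.0 for a, b in pairs]
    detail = [(a - b) / 2.0 for a, b in pairs]
    return approx, detail


def haar_decompose(values):
    """Full decomposition: overall average plus detail levels ordered coarse to fine."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = [float(v) for v in values]
    details = []
    while len(approx) > 1:
        approx, detail = haar_step(approx)
        details.insert(0, detail)  # coarser levels go to the front
    return approx[0], details


average, details = haar_decompose([9, 7, 3, 5])
print(average, details)  # 6.0 [[2.0], [1.0, -1.0]]
```

The magnitude of a detail coefficient indicates how much the corresponding samples deviate from the coarser approximation, which is one plausible basis for a per-point weight.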
Two efficient workflows are developed for the reconstruction of a 3D full-color building model. One uses a pointwise sensing device to sample an unknown object densely and attaches color textures from a digital camera separately. The other uses an image-based approach to reconstruct the model with color texture automatically attached.
The pointwise sensing device reconstructs the CAD model using a modified best view algorithm that collects the maximum number of construction faces in one view. The partial views of the point cloud data are then glued together using a common face between two consecutive views. Typical overlapping mesh removal and coarsening procedures are adapted to generate a unified 3D mesh shell structure. A post-processing step then combines the digital image content from a separate camera with the 3D mesh shell surfaces. An indirect uv mapping procedure first divides the model faces into groups within which every face shares the same normal direction. The corresponding images of the faces in a group are then adjusted using the uv map as guidance. The final assembled image is then glued back onto the 3D mesh to present a full-color building model. The result is a virtual building that reflects the true dimensions and surface material conditions of a real-world campus building. The image-based modeling procedure uses a commercial photogrammetry package to reconstruct the 3D model. A novel view planning algorithm is developed to guide the photo-taking procedure. This algorithm generates a minimum set of view angles such that each model face appears in at least two and no more than three of the pictures taken at these angles. The 3D model can then be reconstructed with a minimum amount of labor spent in correlating picture pairs. The finished model is compared with the original object in both topological and dimensional aspects. All the test cases show exactly the same topology and a reasonably low dimensional error ratio, again demonstrating the applicability of the algorithm.
We propose a new implicit surface polygonalization algorithm based on front propagation. The algorithm starts from a simple seed (e.g. a triangle) that can be automatically initialized, and always enlarges its boundary contour outwards along its tangent direction suggested by the underlying volume data. Our algorithm can conduct mesh optimization and Laplacian smoothing on-the-fly and generate meshes of much higher quality than the Marching-cubes algorithm. Experimental results on both real and synthetic volumetric datasets are shown to demonstrate the robustness and effectiveness of the new algorithm.
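The Laplacian smoothing step mentioned in the abstract can be sketched in its textbook form: each vertex moves a fraction of the way toward the centroid of its neighbors. This is a generic illustration, not the authors' on-the-fly variant; `laplacian_smooth`, `neighbors` and `lam` are hypothetical names.

```python
def laplacian_smooth(vertices, neighbors, lam=0.5, iterations=1):
    """Move each vertex toward the centroid of its neighbors.

    vertices:  list of coordinate tuples
    neighbors: dict mapping a vertex index to a list of adjacent vertex indices;
               vertices without an entry are left in place
    lam:       step size in (0, 1]
    """
    pts = [tuple(v) for v in vertices]
    for _ in range(iterations):
        new_pts = []
        for i, p in enumerate(pts):
            nbrs = neighbors.get(i, [])
            if not nbrs:
                new_pts.append(p)
                continue
            dims = range(len(p))
            centroid = [sum(pts[j][k] for j in nbrs) / len(nbrs) for k in dims]
            new_pts.append(tuple(p[k] + lam * (centroid[k] - p[k]) for k in dims))
        pts = new_pts  # update all vertices simultaneously
    return pts


# A spike at vertex 1 is pulled halfway toward the midpoint of its two neighbors.
verts = [(0, 0, 0), (1, 0, 1), (2, 0, 0)]
print(laplacian_smooth(verts, {1: [0, 2]}))
```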
In this paper, we propose a method to estimate the pose of tube-shaped flexible objects from their shadows cast on non-planar backgrounds. To compensate for the distortion of the shadow by the non-planar backgrounds, we introduce a model-based method to calculate a look-up table that enables scene-specific undistortion of shadow images if the geometric relations of camera, background and light source are known. By analogy with the established term image rectification, which is used for correcting images for camera lens errors, we propose the term geometric rectification for this process. It is shown how to estimate the 3D position of tube-shaped flexible objects from geometrically rectified shadow images. For unknown lamp positions, we present a method to estimate the position of the point light from an image showing a special calibration rig. The method is verified on synthetic and real-world example images taken from industrial machine vision applications.
We demonstrate a standalone digital pen that writes on regular paper by tracking the writing nib’s absolute position in paper coordinates. The pen writes like a regular pen, but simultaneously captures handwritten information digitally with the aid of a Navigation Engine mounted atop the pen. The Navigation Engine has a wide-field-of-view vision system with a single-viewpoint catadioptric lens for observing the environment as the pen writes. A processor belonging to the Navigation Engine applies computationally efficient navigation algorithms to a stream of images of the pen’s environment to capture the pen’s full movement. The resultant data stream, including the x-y position of the nib and the pen’s Euler angles, facilitates application of this technology to a wide range of tasks. In contrast to all other known digital pen technologies, this pen not only functions like a regular pen, but also provides an electronic copy of the writing without using any special paper.
The perspex machine is a theoretical computer that arose from the unification of the Turing machine with projective geometry. It is super-Turing because it can operate on any, or all, numbers on the real number line, for example, along a geometrical ray. It can operate on individual Turing-incomputable numbers and on sets of numbers with the cardinality of the continuum (real number line). This exceeds the cardinality of the natural numbers, that is, the number of symbols accessible to a Turing machine. The perspex machine can compute continuous events in addition to discrete ones. We suppose this gives it enough power to describe the operation of the physical universe, including the operation of minds.
We review the perspex machine and improve it by reducing its halting conditions to one condition. We introduce a data structure that can accelerate a wide class of perspex programs. We show how the perspex can be visualised as a tetrahedron, artificial neuron, computer program, and as a geometrical transformation. We discuss the temporal properties of the perspex machine, dissolve the famous time travel paradox, and present a hypothetical time machine. Finally, we discuss some mental properties and show how the perspex machine solves the mind-body problem and, specifically, how it provides one physical explanation for the occurrence of paradigm shifts.
The perspex machine is a continuous machine that performs perspective transformations. It is a super-Turing machine that contains the Turing machine at discrete locations in perspex space. We show that perspex spaces can be constructed so that all of the operations in a Turing program lie in a continuum of similar operations in the space, except for the Turing halt, which is always a discontinuous operation. We then show how to convolve a Turing program to produce an isolinear program that is robust to missing instructions and degrades gracefully when started incorrectly, sometimes even recovering in performance. We hypothesize that animal brains are similarly robust and graceful because animal neurons share the geometrical properties of the perspex machine. Furthermore, convolution of Turing programs makes possible the band-pass filtering and reconstruction of programs. Global processing can then be obtained by executing the broad bands before the finer ones. Hence, any existing computer program can be compiled on a perspex machine to make it global in operation, robust to damage, and gracefully degrading in the presence of error. The three “Holy Grails” of AI -- globality, robustness, and graceful degradation -- can be supplied by a compiler. They do not require specific programming in individual cases because they are geometrical properties of the perspex machine.
In this paper we present end-to-end algorithms that address curvature extraction, shape representation and shape similarity retrieval. Our novel shape contour tracing algorithm can trace open, ill-defined and closed shapes and returns an ordered set of background points adjacent to the shape’s contour. Our shape descriptor builds a multi-resolution, equal-segmentation, polygon-based shape representation that uses the center of the shape as a reference point, is invariant to scale, rotation and translation, and is efficient in terms of time and space. The shape descriptor captures three contour primitives, including distance and slope, at regular intervals around the center. The novel shape matching algorithm works in two stages. The first is data driven and uses a shape signature metric to filter out dissimilar shapes, while the second linearly scans the remaining shapes and measures similarity using elasticity with a distance and a user-friendly fuzzy measure. We have applied our algorithms to the MPEG-7 shape core experiment and achieved the best result reported based on the number of queries. Our algorithms achieved 83.23% for the similarity test of part B, where the optimized CSS shape descriptor came second at 81.12%.
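One of the contour primitives, the distance from the shape center sampled at regular intervals, can be sketched as a centroid-distance signature. This is a generic illustration and not the authors' descriptor: the function name is hypothetical, the samples are taken at equal steps along the ordered contour, and only translation and scale invariance are handled (rotation invariance would additionally need a canonical starting point).

```python
import math


def centroid_distance_signature(contour, samples=8):
    """Distances from the centroid at equally spaced contour positions, scaled to [0, 1].

    contour: ordered list of (x, y) points along the shape boundary
    """
    n = len(contour)
    if n == 0:
        raise ValueError("empty contour")
    cx = sum(x for x, _ in contour) / n
    cy = sum(y for _, y in contour) / n
    picked = [contour[(i * n) // samples] for i in range(samples)]
    dists = [math.hypot(x - cx, y - cy) for x, y in picked]
    peak = max(dists)
    if peak == 0.0:
        raise ValueError("degenerate contour")
    # Subtracting the centroid removes translation; dividing by the peak removes scale.
    return [d / peak for d in dists]


square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
moved = [(3 * x + 10, 3 * y - 5) for x, y in square]  # scaled by 3 and shifted
print(centroid_distance_signature(square))
print(centroid_distance_signature(moved))
```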
For a binary image containing only curves (and lines) on a background corrupted by binary noise (e.g., salt-and-pepper noise), an efficient way to extract the image data and save it in a very compact file for accurate and complete image recall later is attractive to many image processing and pattern recognition systems. This paper reports the data-extraction method we developed recently for inputting a binary image to a special neural-network pattern recognition system, the noniterative, real-time learning system. We use an adaptive tracking window to track the direction of a continuous curve in the binary image and record the xy-coordinates of all points on this curve until the window hits an end point, a branch point, or the original starting point. By scanning this tracking window across the whole image frame, we can then segment the original binary image into many single curves. The xy-coordinates of the points on each curve can then be analyzed by a curve fitting process, and the analytic data can be stored very compactly in an analog data file. This data file can be recalled very efficiently to reconstruct the original binary image, or can be used directly as input to a special neural network to carry out an extremely fast pattern learning process. This paper reports the image-processing steps, the programming algorithm, and the experimental results for this novel image extraction technique. The technique is verified in each experiment by reconstructing the original image from the compactly extracted analog data file.
The question of which shapes can be digitized without any change in their fundamental geometric and topological properties is of great importance for the reliability of image analysis algorithms, but it remains unanswered for many digitization schemes. While r-regularity is a sufficient criterion for shapes to be reconstructed correctly by using any regular or irregular sampling grid of a certain density, necessary criteria have been unknown up to now. The author proves such a necessary criterion: if a shape is to be digitized correctly with any alignment of a chosen sampling grid, then the shape has to be a bordered 2D-manifold, i.e. its boundary must have no junctions. This implies that any correct digitization is an extended well-composed set, and thus the well-known problems of defining connectivity in 2D are always due to wrong sampling or improper original shapes. This is of great importance, since extended well-composed sets have many nice topological properties; for example, the Jordan curve theorem holds and the Euler characteristic is locally computable. Moreover, the author proves a second necessary criterion: for a correct digitization with a grid of a certain density, shapes are not allowed to have corners with an angle smaller than 60 degrees. For common square grids the smallest possible angle is even 90 degrees. If a shape has a corner with too small an angle, the shape cannot be digitized topologically correctly with every alignment of a sampling grid once this grid exceeds a certain density. Thus the intuitive assumption that a finer grid would lead to a better digitization (in a topological sense) is simply wrong.
A 3D binary digital image is said to be well-composed if and only if the set of points in the faces shared by the voxels of foreground and background points of the image is a 2D manifold. Well-composed images enjoy important topological and geometric properties; in particular, there is only one type of connected component in any well-composed image, as 6-, 14-, 18-, and 26-connected components are equal. This implies that several algorithms used in computer vision, computer graphics, and image processing become simpler. For example, thinning algorithms do not suffer from the irreducible thickness problem if the image is well-composed, and the extraction of isosurfaces from well-composed images using the Marching Cubes (MC) algorithm or some of its enhanced variations can be simplified, as only six out of the fourteen canonical cases of cube-isosurface intersection can occur. In this paper, we introduce a new randomized algorithm for making 3D binary digital images that are not well-composed into well-composed ones. We also analyze the complexity and convergence of our algorithm, and present experimental evidence of its effectiveness when faced with practical medical imaging data.
The normalized cut algorithm is a graph partitioning algorithm that has previously been used successfully for image segmentation. It was originally applied to pixels by considering each pixel in the image as a node in the graph. In this paper we investigate the feasibility of applying the normalized cut algorithm to micro segments by considering each micro segment as a node in the graph. This substantially reduces the computational demand of the normalized cut algorithm, because the graph contains far fewer nodes. The foundation of the translation to micro segments is the region adjacency graph. A floating-point-based rainfalling watershed algorithm creates the initial micro segmentation. We first explain the rainfalling watershed algorithm. Then we review the original normalized cut algorithm for image segmentation and describe the changes that are needed when we apply the normalized cut algorithm to micro segments. We investigate the noise robustness of the complete segmentation algorithm on an artificial image and show the results we obtained on photographic images. We also illustrate the reduction in computational demand by comparing the running times.
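The quantity that the normalized cut algorithm minimizes can be evaluated directly on a small region adjacency graph. The sketch below uses the standard definition Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V); the edge weights and segment labels are made up for illustration, and the function name is hypothetical. It only scores a given partition; it does not perform the eigenvector-based minimization.

```python
def normalized_cut(weights, part_a):
    """Normalized cut value of the partition (part_a, remaining nodes).

    weights: symmetric adjacency dict, weights[u][v] == weights[v][u] > 0,
             one node per micro segment
    """
    nodes = set(weights)
    a = set(part_a)
    b = nodes - a
    if not a or not b:
        raise ValueError("both sides of the partition must be non-empty")
    cut = sum(w for u in a for v, w in weights[u].items() if v in b)
    assoc_a = sum(w for u in a for w in weights[u].values())
    assoc_b = sum(w for u in b for w in weights[u].values())
    if assoc_a == 0 or assoc_b == 0:
        raise ValueError("a side of the partition has no edges")
    return cut / assoc_a + cut / assoc_b


# Four micro segments in a chain; segments 1 and 2 are only weakly similar.
rag = {
    0: {1: 0.9},
    1: {0: 0.9, 2: 0.1},
    2: {1: 0.1, 3: 0.8},
    3: {2: 0.8},
}
print(normalized_cut(rag, {0, 1}))  # cutting the weak edge gives a low value
print(normalized_cut(rag, {0}))     # isolating one segment gives a high value
```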
We present a method for building panoramic video for Video GIS. Video GIS (geographic information system) is a new field of application for mosaics, which is used in automobile navigation systems; the panoramic video, which is composed of several images taken by adjacent cameras, provides sufficient information to a driver making a trip for the first time. The perspective transformation, which is estimated from appropriate corresponding pairs between adjacent images, can construct the panoramic video without unwanted distortions. We use corner points as the corresponding features, and local peak detection of the corner strength estimated from morphological structures is used for fast and robust corner detection. The corner strength criterion we propose guarantees robust corner detection in any situation. For the perspective transformation, eight parameters are estimated from the perspective equations, and four pairs of matched points in the adjacent images are selected via pattern matching of corner points to construct the equations. In general, when we stitch two adjacent images together using the 8-parameter transform, some unwanted discontinuities of intensity or color will exist across their common areas, so a bilinear blending technique is used to construct a seamless panorama. Experiments show that our method yields good results in various conditions.
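Estimating the eight parameters from four matched point pairs can be sketched as an 8x8 linear system. The parameterization x' = (a x + b y + c)/(g x + h y + 1), y' = (d x + e y + f)/(g x + h y + 1) is the usual one and is assumed here, since the abstract does not spell out its equations; the function names are hypothetical, and the solver is plain Gaussian elimination with partial pivoting.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def perspective_from_four_pairs(src, dst):
    """Return [a, b, c, d, e, f, g, h] mapping each src point onto its dst point."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -x * u, -y * u])
        rhs.append(float(u))
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -x * v, -y * v])
        rhs.append(float(v))
    return solve_linear([[float(c) for c in row] for row in rows], rhs)


def apply_perspective(params, point):
    """Apply the 8-parameter transform to one (x, y) point."""
    a, b, c, d, e, f, g, h = params
    x, y = point
    w = g * x + h * y + 1.0
    return ((a * x + b * y + c) / w, (d * x + e * y + f) / w)


src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(0, 0), (2, 0), (3, 2), (0, 1)]
params = perspective_from_four_pairs(src, dst)
print([apply_perspective(params, p) for p in src])
```

With exactly four pairs the system is square, so the transform reproduces the four matches up to rounding; no three of the points may be collinear.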
This paper proposes a novel method for synthesizing a series of continuous and realistic intermediate views between a pair of reference images without 3D knowledge. The synthesis procedure consists of four main steps: segmentation, image pre-warping, view morphing and image post-warping. With the KLT tracker, the feature correspondence between a pair of reference images can be found accurately and robustly, so the fundamental matrix constructed from the correspondence is sufficiently reliable. Based on the fundamental matrix, the number and coordinates of the scanlines can be computed in order to pre-warp the reference images. By morphing the pre-warped images of the pair of reference images across the positions of the virtual viewpoint, a series of continuous and realistic intermediate views can be constructed. By post-warping the interpolated views, the virtual images are obtained; they are very similar to the actual views that can be seen in the real environment. View morphing relies exclusively on feature correspondence. Since the KLT tracker automatically builds robust feature correspondence, virtual images are synthesized more accurately with our method than with previous methods. Experimental results show that higher-quality feature correspondence produces a more realistic visual effect in morphing.
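The morphing step between the two pre-warped images can be sketched as linear interpolation of corresponding feature positions, which is the standard formulation of view morphing. The sketch assumes the correspondences are already given (for example from a feature tracker) and that both images have been pre-warped; the function name and the parameter `s` are hypothetical.

```python
def interpolate_features(pts0, pts1, s):
    """Linearly interpolate corresponding feature positions.

    pts0, pts1: lists of (x, y) positions of the same features in the two
                pre-warped reference images
    s:          position of the virtual viewpoint, 0 for the first image
                and 1 for the second
    """
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must lie in [0, 1]")
    if len(pts0) != len(pts1):
        raise ValueError("feature lists must have the same length")
    return [((1.0 - s) * x0 + s * x1, (1.0 - s) * y0 + s * y1)
            for (x0, y0), (x1, y1) in zip(pts0, pts1)]


# After pre-warping, corresponding features lie on the same scanline (equal y).
left = [(10, 5), (20, 8)]
right = [(14, 5), (30, 8)]
print(interpolate_features(left, right, 0.5))  # [(12.0, 5.0), (25.0, 8.0)]
```

Sweeping `s` from 0 to 1 yields the series of intermediate feature layouts; each layout would then be post-warped to obtain the final virtual view.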