Automatic affective expression recognition has attracted growing attention from researchers in different
disciplines. It is expected to contribute significantly to a new paradigm for human-computer interaction (affect-sensitive
interfaces, socially intelligent environments) and to advance research in affect-related fields including psychology,
psychiatry, and education. Multimodal information integration is a process that enables humans to assess affective states
robustly and flexibly. To understand the richness and subtlety of human emotional behavior, the computer
should be able to integrate information from multiple sensors. We introduce in this paper our efforts toward machine
understanding of audio-visual affective behavior, based on both deliberate and spontaneous displays. Some promising
methods are presented to integrate information from both audio and visual modalities. Our experiments show the
advantage of audio-visual fusion in affective expression recognition over audio-only or visual-only approaches.
Combining different modalities for pattern recognition tasks is a very promising field. Humans routinely
fuse information from different modalities to recognize objects and perform inference. Audio-visual gender
recognition is one of the most common tasks in human social communication. Humans can identify gender
from facial appearance, from speech, and also from body gait. Indeed, human gender recognition is a multimodal
data acquisition and processing procedure. However, computational multimodal gender recognition has not
been extensively investigated in the literature. In this paper, speech and facial images are fused to perform
multimodal gender recognition and to explore the improvement gained by combining different modalities.
In this paper, an efficient rotation-invariant texture classification method is proposed. Compared with the previous texture classification method, which is also based on Gabor wavelets, two modifications are made. First, an adaptive circular orientation normalization scheme is proposed. Because the effects of both orientation and frequency on Gabor features are considered, our method can effectively eliminate inter-frequency disturbance and can therefore reduce the effect of image rotation. Second, besides the Gabor features, which mainly represent the local texture information of an image, the statistical properties of the image intensity values are also used for texture classification. Our method is evaluated on the Brodatz album, and the experimental results show that it outperforms the traditional algorithms.
A radar target's HRRP usually contains redundant information and is easily affected by noise or a lack of
separability. In this paper, exploiting the strength of kernel methods on nonlinear problems, we propose a radar
target HRRP feature extraction method based on Kernel Principal Component Analysis (KPCA) and a radar target
fuzzy recognition method based on Support Vector Data Description (SVDD). During feature extraction, KPCA
is used to reduce the redundancy of the HRRP and to compress its dimension, which suppresses noise
and sensitivity to target posture. During recognition, we first find, for each class of training samples, the smallest
hypersphere in feature space that encloses it, then construct a fuzzy membership function from the distance
between each test sample and the hypersphere surface, and finally recognize each test sample by its fuzzy
membership. Simulation results for multi-target recognition show that the proposed method not only
achieves high recognition accuracy but also generalizes well; for instance, it achieves high
recognition accuracy at low SNR. The proposed feature extraction and recognition method is therefore
particularly suitable for radar target recognition.
Consensus building in group support systems relies on mutual questioning and mutual elicitation among experts, so a
feedback mechanism is required that guides experts to converge in their thinking by visualizing individual opinions and
the consistency state of the group. This paper proposes a new feedback mechanism, which first clusters the experts'
preferences into a set of subgroups and then uses different line types or line colors to display the clustered opinions in
parallel coordinates. With this mechanism, group consistency is analyzed and the group discussion is conducted
efficiently. One characteristic of the proposed method is that it protects minority views automatically. An
example is presented to illustrate the application of the method.
This paper presents a novel scene classification method using low-level and intermediate features. The purpose of
the proposed method is to improve scene classification performance and reduce the labeled data required by exploiting the
complementary information between low-level and intermediate features. The proposed method uses the co-training
algorithm to classify scenes, with the low-level and intermediate features serving as the two views of
co-training. For the low-level feature, a Block Based Gabor Texture (BBGT) feature is extracted to describe the texture
properties of images while incorporating spatial layout information. For the intermediate feature, a Bag Of Word (BOW) feature is
extracted to describe the distribution of local semantic concepts in images based on quantized local descriptors.
Experimental results show that the proposed method achieves satisfactory classification performance on a large set of 13 categories of complex scenes.
A method is put forward for 3D point reconstruction from an image taken by a monocular camera. In the algorithm, the coordinate
conversion factor of a given point is calculated from a pair of parallel lines whose positions in the scene are
known. Any point in a 3D space that contains the parallel lines can then be reconstructed using the coordinate
conversion factor. In experiments, the algorithm is used to estimate the trajectories of walking people.
To reconstruct the 3D scene around a lunar rover from stereo image pairs captured by its panoramic cameras, and thereby
provide an intuitive platform on which scientists can plan exploration commands, we study 3D
scene reconstruction. This paper mainly presents a scheme for registering local scene models to reconstruct a large
scene. Once several local 3D scene models have been reconstructed separately, we first find the common 3D point
sets between every two adjacent local models based on edge detection and image matching, then fit the
coordinate transformation matrix by separating the rotation matrix from the translation vector, and finally register
these local 3D models into a uniform coordinate system. This scheme does not require control points to be set in the
scene beforehand, and the rotation matrix is determined from a system of linear equations based on the Cayley transformation.
Experimental results on reconstructing indoor and outdoor scenes show that our scene registration method is feasible.
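The registration step described above (separate rotation from translation, then obtain the rotation from linear equations via the Cayley transformation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the correspondences are assumed to be already matched, and the rotation is written as R = (I - S)^-1 (I + S) with S skew-symmetric, which cannot represent a rotation of exactly 180 degrees. For centred point pairs (p, q) with q = R p, the relation (I - S) q = (I + S) p becomes the linear system -[p + q]x s = q - p in the three entries of S.

```python
import math

def cross_matrix(v):
    """Skew-symmetric matrix [v]x such that [v]x w equals the cross product v x w."""
    x, y, z = v
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(a, b):
    """Solve the 3x3 linear system a x = b by Cramer's rule."""
    d = det3(a)
    return [det3([[b[i] if j == k else a[i][j] for j in range(3)]
                  for i in range(3)]) / d for k in range(3)]

def centroid(points):
    return [sum(p[i] for p in points) / len(points) for i in range(3)]

def estimate_rigid_transform(src, dst):
    """Return (R, t) such that dst ~ R src + t (hypothetical helper)."""
    cs, cd = centroid(src), centroid(dst)
    ata = [[0.0] * 3 for _ in range(3)]  # accumulates A^T A
    atb = [0.0] * 3                      # accumulates A^T b
    for p, q in zip(src, dst):
        p = [p[i] - cs[i] for i in range(3)]  # centring removes the translation
        q = [q[i] - cd[i] for i in range(3)]
        a = cross_matrix([-(p[i] + q[i]) for i in range(3)])  # -[p + q]x
        b = [q[i] - p[i] for i in range(3)]
        for i in range(3):
            atb[i] += sum(a[k][i] * b[k] for k in range(3))
            for j in range(3):
                ata[i][j] += sum(a[k][i] * a[k][j] for k in range(3))
    s = solve3(ata, atb)  # least-squares Cayley parameters
    S = cross_matrix(s)
    eye = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    i_minus = [[eye[i][j] - S[i][j] for j in range(3)] for i in range(3)]
    i_plus = [[eye[i][j] + S[i][j] for j in range(3)] for i in range(3)]
    # Solve (I - S) R = (I + S) one column of R at a time.
    cols = [solve3(i_minus, [i_plus[r][c] for r in range(3)]) for c in range(3)]
    R = [[cols[c][r] for c in range(3)] for r in range(3)]
    t = [cd[i] - sum(R[i][j] * cs[j] for j in range(3)) for i in range(3)]
    return R, t
```

At least three non-collinear common points are needed for the normal equations to be solvable. With noisy correspondences the Cayley solution is a least-squares estimate, and the resulting R is still exactly orthogonal by construction.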
In this paper we propose an approach to modeling the posterior probability output of multi-class SVMs. The sigmoid
function is used to estimate the posterior probability output in binary classification. The posterior
probability output of the multi-class SVM is then obtained by directly solving equations that combine
the probability outputs of the binary classifiers using Bayes' rule. The method accounts for the differences among the
two-class SVM classifiers and assigns them different weights, based on the posterior probability, when combining
their probability outputs. Comparative experimental results show that
our method achieves better classification precision and a better distribution of the posterior probability
than the pairwise coupling method and Hastie's optimization method.
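The two ingredients named above, a sigmoid that turns a binary SVM decision value into a probability and a rule that combines the pairwise probabilities into multi-class posteriors, can be illustrated with a short sketch. The combination shown here is a simple closed-form rule, p_i = 1 / (sum over j != i of 1/r_ij - (K - 2)), which is exact when the pairwise estimates are mutually consistent; it is used only as a stand-in and is not the weighted combination the abstract proposes. The function names and the sigmoid parameters `a` and `b` are assumptions (in practice they would be fitted on held-out decision values).

```python
import math

def sigmoid_posterior(f, a, b):
    """Sigmoid mapping of an SVM decision value f to P(class = +1 | f)."""
    return 1.0 / (1.0 + math.exp(a * f + b))

def combine_pairwise(r):
    """Combine pairwise posteriors r[i][j] = P(class i | class i or j, x).

    Applies p_i = 1 / (sum_{j != i} 1 / r[i][j] - (K - 2)) and then
    renormalises so that the K estimates sum to one.
    """
    k = len(r)
    raw = []
    for i in range(k):
        total = sum(1.0 / r[i][j] for j in range(k) if j != i)
        raw.append(1.0 / (total - (k - 2)))
    z = sum(raw)
    return [p / z for p in raw]
```

Each `r[i][j]` would come from `sigmoid_posterior` applied to the decision value of the SVM trained on classes i and j, with `r[j][i] = 1 - r[i][j]`; the diagonal entries are ignored.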
In this paper, we integrate a pair of CCD cameras and a digital pan/tilt unit with two degrees of freedom into a binocular stereo
vision system, which simulates the panoramic camera system of the lunar rover. Constraints on the placement and
parameter choice of the stereo camera pair are proposed based on the science objectives of the Chang'e-II mission. These
constraints are then applied to our binocular stereo vision system, and its localization precision is analyzed. Simulation and
experimental results confirm the proposed constraints and the analysis of the localization precision.
A novel shape classification method based on Hidden Markov Models (HMMs) is proposed in the paper. Instead of
characterizing points along an object contour, our method employs HMMs to model the relationship among structural
segments of the contour. First, an object contour is partitioned into segments at points of zero curvature.
Second, each segment is represented by structural features. Finally, an HMM is used to characterize the object
contour by treating each segment as an observation of a hidden state. Promising experimental results obtained on two
popular shape datasets demonstrate that the proposed method is efficient in classifying shapes, particularly unclosed
shapes and similar shapes.
As the number of targets keeps growing, it is increasingly difficult for a radar network to track the important targets. To
reduce the load on the radar network and the waste of ammunition, the radar network needs to
recognize the targets. Two target recognition approaches for radar networks based on fuzzy mathematics are proposed
in this paper: the multi-level fuzzy synthetic evaluation technique and the lattice approaching degree technique.
Based on an analysis of their principles, application techniques are given, merits and shortcomings are analyzed, and
suitable application environments are recommended. Another emphasis is the comparison between multiple mono-level fuzzy synthetic
evaluation and multi-level fuzzy synthetic evaluation: an example is worked through to illustrate the problem, the
results are analyzed theoretically, and conclusions are drawn that can guide engineering applications.
In industry, the three-dimensional shape of an object often has to be measured, but the usual method is contact measurement,
which is slow. Measurement needs both high precision and high speed, so non-contact measurement
is required. Grating projection is a promising non-contact measurement method, but it has some
difficulties. First, when the object has a stepped shape or there are shadows in the grating stripes, the discontinuous phase
cannot be unwrapped correctly. Second, real-time digital filtering is very difficult to realize: at present the digital filter is
designed through man-machine interaction, so it is slow. Third, to measure different objects, an adaptive
grating is needed.
To resolve the above problems, a grating program is created on the computer. The program has many
functions, including phase shifting and two-frequency gratings, and the grating frequency is easy to adjust, so the
adaptive grating is realized. The two-frequency grating is generated by the computer and projected onto the
measured object, which is placed on a precise rotary platform. The deformed grating is captured by a
Charge Coupled Device (CCD). After two images are acquired, they are mosaicked, yielding a clear object image
modulated by the grating. This solves the problem of stepped shapes and shadows in the grating stripes.
The Fourier transform is then used to process the image. In traditional Fourier transform profilometry, the phase is
obtained as follows. After the Fourier transform, the zero-frequency spectrum is shifted to the frequency origin and
the needed signal is filtered out. The needed signal is then shifted to the center of the frequency axis, and the zero frequency is shifted back to
both sides. After the inverse Fourier transform, the imaginary part is obtained, and from it the phase. This method has a difficulty:
it needs three frequency shifts, and because the center frequency is hard to determine, the frequency
shift cannot be exact and the filter cannot be designed correctly. The error propagates, so the filtering result is
poor, which degrades the subsequent measurement. To overcome this
difficulty, after the Fourier transform we filter the needed signal without frequency shifting and then apply the inverse Fourier
transform. This gives the phase as a function of frequency and coordinate. The phase of the reference surface is
obtained in the same way, and the phase difference is then computed. The true phase difference at the low frequency is easy to
obtain, and the true phase difference at the high frequency is then worked out from it. Finally, from the relation between
phase difference and height, the three-dimensional profile of the object is reconstructed.
A stepped object is measured as an example, and its three-dimensional profile is reconstructed successfully. The reconstruction takes 3
seconds, and the precision is 0.5 mm. The result indicates that the method
overcomes the above problems.
The result also indicates that the method is simple, fast, and precise. Three-dimensional profilometry
of objects that have stepped shapes or shadows in the projection can be carried out successfully.
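The filtering step described above (select the carrier lobe where it sits in the spectrum, with no frequency shifting, inverse-transform, and subtract the reference phase) can be sketched on a one-dimensional fringe profile. This is a minimal sketch under stated assumptions rather than the authors' program: the signal length, carrier frequency, band limits, and Gaussian test phase are all hypothetical, only a single grating frequency is shown, and a naive DFT is used so that the example needs nothing beyond the standard library (a real system would use an FFT routine).

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, kept dependency-free for illustration."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for m in range(n):
            acc += x[m] * cmath.exp(sign * 2j * math.pi * k * m / n)
        out.append(acc / n if inverse else acc)
    return out

def carrier_lobe(signal, lo, hi):
    """Keep only spectrum bins lo..hi (the positive carrier lobe) in place,
    without shifting the spectrum, and return the complex filtered signal."""
    spectrum = dft(signal)
    kept = [c if lo <= k <= hi else 0j for k, c in enumerate(spectrum)]
    return dft(kept, inverse=True)

def phase_difference(deformed, reference, lo, hi):
    """Wrapped phase difference between the deformed and the reference fringe."""
    g_obj = carrier_lobe(deformed, lo, hi)
    g_ref = carrier_lobe(reference, lo, hi)
    return [cmath.phase(a * b.conjugate()) for a, b in zip(g_obj, g_ref)]

# Hypothetical test data: 256 samples, 32 fringe periods, a smooth 1-radian bump.
N, F0 = 256, 32
true_phase = [math.exp(-((x - N / 2) ** 2) / (2 * 20.0 ** 2)) for x in range(N)]
reference = [1.0 + 0.5 * math.cos(2 * math.pi * F0 * x / N) for x in range(N)]
deformed = [1.0 + 0.5 * math.cos(2 * math.pi * F0 * x / N + true_phase[x]) for x in range(N)]
recovered = phase_difference(deformed, reference, F0 // 2, 3 * F0 // 2)
```

Multiplying the filtered object signal by the conjugate of the filtered reference signal cancels the carrier term, so the carrier never has to be moved to the origin. The result is still wrapped to (-pi, pi]; for steps larger than that, the abstract's two-frequency scheme would supply the unwrapping.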
The rapid development of astronomical observation has led to many large sky surveys such as SDSS, 2DF, and LAMOST.
Because of the sheer size of these surveys, reliable and automated methods of spectral
recognition are urgently needed. A new cross-correlation technique for redshift determination of galaxy spectra is presented in this paper. We
use principal component analysis to construct galaxy templates. For the redshift candidates determined from
spectral line features, the cross-correlation between the observed spectrum and the templates is measured by the weighted
sum of several similarity measures. The candidate with the highest correlation is chosen as the estimated redshift. Both
simulated and observed spectra are used to test the proposed method; the correct rate exceeds 97%.
Based on the main characteristics of urban water affairs dispatching, a dispatching structure
based on rough set theory is proposed. After all factors are considered together, a knowledge representation system
is set up, and the dispatching control rules are reduced and acquired. To some extent, this is a new
method of processing uncertain information in water affairs dispatching. The example demonstrates that the
method reduces the dispatching control rules and that the acquired rules are objective, so it can better solve the
control problem of urban water affairs dispatching.
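To make the reduction step concrete, the following sketch computes the rough-set dependency degree of a decision attribute on a set of condition attributes and searches for the smallest attribute subsets that preserve it. Everything specific here is an assumption for illustration: the decision table, the attribute names (`level`, `demand`, `season`, `pump`), and the brute-force subset search are hypothetical and are not taken from the paper.

```python
from itertools import combinations

def partition(rows, attrs):
    """Group row indices into indiscernibility classes with respect to attrs."""
    blocks = {}
    for idx, row in enumerate(rows):
        blocks.setdefault(tuple(row[a] for a in attrs), []).append(idx)
    return list(blocks.values())

def dependency(rows, cond, dec):
    """Dependency degree: share of rows whose class under cond has one decision value."""
    positive = 0
    for block in partition(rows, cond):
        if len({rows[i][dec] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)

def minimal_reducts(rows, cond, dec):
    """Smallest subsets of cond that keep the dependency degree of the full set."""
    full = dependency(rows, cond, dec)
    for size in range(1, len(cond) + 1):
        found = [set(sub) for sub in combinations(cond, size)
                 if dependency(rows, sub, dec) == full]
        if found:
            return found
    return [set(cond)]

# Hypothetical decision table: three condition attributes, one decision.
rows = [
    {"level": "low", "demand": "high", "season": "dry", "pump": "on"},
    {"level": "low", "demand": "low", "season": "dry", "pump": "on"},
    {"level": "high", "demand": "high", "season": "wet", "pump": "on"},
    {"level": "high", "demand": "low", "season": "wet", "pump": "off"},
    {"level": "high", "demand": "low", "season": "dry", "pump": "off"},
    {"level": "low", "demand": "high", "season": "wet", "pump": "on"},
]
cond = ["level", "demand", "season"]
reducts = minimal_reducts(rows, cond, "pump")
```

In this toy table `season` turns out to be dispensable, so control rules can be stated over `level` and `demand` alone. The exhaustive search is exponential in the number of attributes and is only reasonable for small tables.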
Classification is a basic topic in data mining and pattern recognition. Following advances in computer science, a lot of
new methods have been proposed in recent years, such as artificial neural networks, decision trees, fuzzy sets, and
Bayesian Networks. As a probabilistic network, a Bayesian Network is a powerful tool for handling uncertainty in
data mining and many other domains. The Naïve Bayes Classifier (NBC) is a simple and effective classification method
built on the assumption that the attributes are conditionally independent given the class. This topology
cannot describe the inherent relations among the features. In this paper, we apply the Bayesian Network Augmented Naïve Bayes (BAN) classifier, which relaxes the independence assumption of NBC, to the texture classification of aerial images. A new method for learning the network topology from training samples is adopted. Comparison experiments show that the BAN classifier is more accurate than NBC. The results also show the potential applicability of the proposed method.
In an LPR system, the character recognition subsystem is heavily affected by image quality. To resolve this problem and
improve the recognition rate, a new algorithm is proposed in which a pulse coupled neural network (PCNN) is applied to the
recognition of license plate characters. The PCNN model is simplified to improve computational efficiency and is then used to
extract three features from the size-normalized binary image of the input character. Based on these features, weighted
voting is performed and the final estimate of the input character is made. The experimental results show that, compared with common algorithms based on BP networks, the new algorithm based on the simplified PCNN model has a higher overall recognition rate and stronger robustness, and is more convenient and flexible.
To resolve the conflict between visualization efficiency and the large data volume of a digital city in a 3D GIS system, the
author puts forward a 3D data model with LOD capability, designed especially for complex 3D objects such as buildings in a 3D city GIS
system. The paper describes the components of a complex 3D object and then explains how the
conflict mentioned above can be resolved by designing a new model capable of describing levels of detail. The author
expounds the basic theory of the viewpoint-dependent LOD model, defines several key
concepts, and analyzes the principles and the visualization process of the model in 3D space. At the
end of the paper, the data model is verified through an experiment, whose results show that the proposed model
is effective and efficient.
A CAD-based visual inspection system is introduced, which is designed to automatically measure geometric dimensions of sheet metal parts,
such as distances, angles, and circle parameters. The inspection system extracts features from image sequences
guided by CAD data and then reconstructs the real 3D shape of the part. It outputs the measurement results according to the
various requirements of the customer. The main contents of the paper include searching and matching groups of lines with
image space constraints, and visual inspection and reconstruction of sheet metal parts based on
generalized point photogrammetry. Experimental results show that the inspection system is robust and achieves the
precision level of repeated manual measurement by an experienced inspector. The algorithm discussed in the paper has
the potential to handle other objects with sharp edges besides sheet metal parts.
With emerging security demands, biometric identification technology has attracted growing attention in recent
years, and iris recognition is one of the most reliable biometric technologies. Iris localization is a crucial part of iris
recognition; it is quite time-consuming and easily disturbed by noise, especially eyelashes. A novel iris
localization method is proposed in this paper. To locate the inner iris boundary, the gray-level curves of a row and a
column crossing the pupil edge are used to estimate the coarse center and radius of the pupil, which rejects eyelash noise.
Experiments show that this coarse localization method is more accurate and faster than the common gray projection. Edge
points of the pupil are extracted by a gradient operator and fitted as the inner iris boundary. To locate the outer iris
boundary, image binarization is used to mark most of the noise, and the outer iris boundary is then extracted by an
integro-differential operator in a coarse-to-fine manner. Performance experiments show
that the developed algorithm takes about 0.175 seconds and reaches a precision of 99.5%. In comparison with other
classical methods, this algorithm is faster and more robust.
Probability-Based Covering Algorithm (PBCA) is a new algorithm based on probability distribution. It uses the probability of samples and decides the class of a sample on the border of a coverage by voting. In the original covering algorithm, many test samples cannot be classified by the spherical neighborhoods obtained. The network structure of PBCA is a mixed structure composed of a feed-forward network and a feedback network. Adding
some samples of different classes and enlarging the coverage radius are used to decrease the number of rejected samples and
improve the recognition rate. The algorithm is effective in improving learning precision.
The Differential Evolution (DE) method is introduced in this paper to compensate for the shortcomings of the basic probabilistic neural
network. Consequently, a new texture image recognition method based on a Modified Probabilistic Neural Network
(MPNN) is proposed. First, the tree-structured wavelet packet transform is used to extract energy features,
and statistical methods are used to extract the mean value, average energy, standard deviation, and mean residual
features, which together form the feature vector; the MPNN is then trained on the feature vectors of texture images, and the
texture image is identified. The experimental results indicate that, compared with the BP neural network, the RBF neural
network, and the basic probabilistic neural network, the modified probabilistic neural network has higher accuracy and
faster convergence.
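For readers unfamiliar with the basic probabilistic neural network that the MPNN modifies, a minimal sketch follows. It places one Gaussian kernel on each training pattern and assigns a sample to the class with the largest average kernel response. The abstract does not say which quantities DE adjusts; the leave-one-out accuracy helper below assumes, purely for illustration, that the smoothing parameter `sigma` is what an optimiser such as DE would tune. All names and the toy data are hypothetical.

```python
import math

def pnn_classify(x, train, sigma):
    """Basic PNN decision: average Gaussian kernel response per class.

    train maps a class label to a list of feature vectors.
    Returns (predicted label, per-class scores).
    """
    scores = {}
    for label, patterns in train.items():
        total = 0.0
        for p in patterns:
            d2 = sum((a - b) ** 2 for a, b in zip(x, p))
            total += math.exp(-d2 / (2.0 * sigma ** 2))
        scores[label] = total / len(patterns)
    return max(scores, key=scores.get), scores

def loo_accuracy(train, sigma):
    """Leave-one-out accuracy for a given sigma (needs >= 2 patterns per class).

    Assumption: this is the kind of objective an optimiser could maximise.
    """
    hits = total = 0
    for label, patterns in train.items():
        for i, x in enumerate(patterns):
            held_out = dict(train)
            held_out[label] = patterns[:i] + patterns[i + 1:]
            predicted, _ = pnn_classify(x, held_out, sigma)
            hits += predicted == label
            total += 1
    return hits / total

# Hypothetical two-dimensional texture features for two classes.
train = {
    "smooth": [[0.10, 0.20], [0.20, 0.10], [0.15, 0.15]],
    "rough": [[0.80, 0.90], [0.90, 0.80], [0.85, 0.85]],
}
label, scores = pnn_classify([0.2, 0.2], train, sigma=0.1)
```

The network has no iterative weight training; its behaviour is governed by the stored patterns and the kernel width, which is why the choice of `sigma` matters so much in practice.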
In this paper, we propose a manifold-based algorithm called Orthogonal Neighborhood Preserving Embedding (ONPE)
for dimensionality reduction and feature extraction. The ONPE algorithm is based on the Neighborhood Preserving
Embedding (NPE) algorithm. NPE is an unsupervised dimensionality reduction method that is a linear
approximation of a classical nonlinear method. However, the feature vectors obtained by NPE are nonorthogonal. ONPE
inherits NPE's neighborhood preserving property and produces orthogonal feature vectors. As orthogonal eigenvectors
preserve the metric structure of the image space, the ONPE algorithm has more neighborhood preserving power and
discriminating power than NPE. Furthermore, ONPE can find the mapping which best preserves the manifold's
estimated intrinsic geometry structure in a linear sense. Experimental results show that ONPE is an effective method for
The development of hardware and software is not sufficient to meet the real-time visualization requirements of large
scale 3D City Models. How to adaptively coordinate the speed and quality of rendering according to the data volume and
hardware/software environment is therefore a critical issue. This paper proposes an algorithm that first predicts the
rendering time from the features of 3D City Models, then calculates an importance value for each object
using a mathematical model that considers four indicators of the object (location, distance, visibility, and
semantics), and finally selects the set of objects to be rendered with a fast recursive algorithm. Five factors are selected
to test their influence on rendering time: triangle number, vertex number, texture number, screen pixel number, and
texture image size. Multivariate statistical analysis of the experimental results shows that both geometry and texture
data size are significant for the rendering time of 3D City Models. A typical group of 3D building models is employed for
experimental analysis. The results show that the proposed method accurately predicts the
rendering time of 3D models with detailed textures. The adaptive rendering performance is also significantly improved.
We propose a novel method based on a revised ant colony clustering algorithm (ACCA) for
textural defect detection. Our efforts focus mainly on the definition of a local irregularity
measurement and the implementation of the revised ACCA. The local irregularity measurement evaluates the
textural inconsistency of each pixel against its mini-environment. In our revised ACCA, the behavior of each ant is
divided into two steps: releasing pheromone and acting. The quantity of pheromone released is proportional to the irregularity
measurement; the next actions of the ants are chosen independently of each other, in a stochastic way, according to
some evaluated heuristic knowledge. The independence of the ants implies that the algorithm is inherently parallel.
We apply the proposed method to some typical textural images with defects. From the series of
pheromone distribution maps (PDMs), it can be clearly seen that the pheromone distribution gradually approaches the textural defects.
After some post-processing, the final distribution of pheromone can demonstrate the shape and area of the
Image semantic understanding is one of the most important techniques for solving the problem of semantic gap. By
introducing generalized computing into image semantic understanding, this paper presents a third-class image
description model. Then, under the guidance of the model, an approach to image semantic information extraction is
proposed based on generalized sets and generalized transformations. Finally, an image semantic understanding
system based on generalized computing is sketched out.
We propose an appearance based model for face recognition in news videos using an enormously large databank of still
images. This is a step towards building an elaborate face-query system using multimodal audio-visual data. We use the
fact that faces of the same person appear more similar than those of different people. We preprocess the videos and apply feature
extraction, feature matching, and a unique parallel line matching algorithm to develop a simple yet powerful face
recognition system. We tested our approach on real-world data, and the results show good performance both for
high-resolution still images and low-resolution news videos, without involving any training or tasks such as face rectification or
warping. It can be incorporated as part of a larger multimodal news video analysis system with problems of time
alignment between text and faces. Our results show that this simple approach also works well where video modality is
the only source of information.
The simulation of the ocean environment in the infrared has been a hot yet difficult problem in the field of computer
simulation. In this paper, the shortcomings of simulating infrared ocean images with Vega are analyzed, and then a new
simulation method based on 3D modeling with OpenGL is introduced. The new method abandons the high-precision
mesh and instead uses a mathematical model to manipulate the mesh vertices and build the model. Experiments demonstrate
that the proposed method is much more efficient while preserving the quality of the simulated images. Finally, a similarity
evaluation function based on features extracted from the co-occurrence matrix, such as angular second moment, entropy,
correlation coefficient, contrast, and uniformity, is put forward to evaluate the similarity of the images.
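The co-occurrence features named above can be illustrated with a small sketch that builds a normalised grey-level co-occurrence matrix for one pixel offset and derives three of the listed features from it. This is an illustration under stated assumptions, not the paper's evaluation function: the function names, the single horizontal offset, and the choice of angular second moment, entropy, and contrast as the features shown are assumptions, and how the paper combines the features into a similarity score is not specified in the abstract.

```python
import math

def cooccurrence(image, levels, dx=1, dy=0):
    """Normalised grey-level co-occurrence matrix for the pixel offset (dx, dy).

    image is a list of rows of integer grey levels in range(levels).
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for y in range(rows):
        for x in range(cols):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < cols and 0 <= y2 < rows:
                counts[image[y][x]][image[y2][x2]] += 1
                total += 1
    return [[c / total for c in row] for row in counts]

def texture_features(p):
    """Angular second moment, entropy, and contrast of a normalised matrix p."""
    asm = sum(v * v for row in p for v in row)
    entropy = -sum(v * math.log(v) for row in p for v in row if v > 0)
    contrast = sum((i - j) ** 2 * v
                   for i, row in enumerate(p) for j, v in enumerate(row))
    return {"asm": asm, "entropy": entropy, "contrast": contrast}

# Hypothetical two-level test image with alternating grey levels.
checker = [[0, 1, 0, 1],
           [1, 0, 1, 0]]
features = texture_features(cooccurrence(checker, levels=2))
```

Two images could then be compared through the distance between their feature vectors, for example after scaling each feature to a comparable range.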
Logo recognition has developed considerably in the document retrieval and shape analysis domains. As human-computer
interaction becomes more and more popular, logo recognition through a web camera is a promising technology from an
application point of view. For practical applications, however, logo recognition in real scenes is much more difficult than
in clear scenes. To meet this need, we make some improvements to the conventional method. First, moment information is
used to calculate the test image's orientation angle, which is used to normalize the test image. Second, the main structure of
the test image, represented by line patterns, is extracted, and the modified Hausdorff distance is employed to match the
image against each of the existing templates. The proposed method, which is invariant to scale and rotation, gives good results
and runs in real time. The main contribution of this paper is a set of improvements to the existing
recognition framework that make it perform much better than the original. In addition, we have built a highly successful logo
recognition system using our improved method.
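The two improvements described above can be sketched on plain point sets. The orientation angle is taken from the second-order central moments, and the modified Hausdorff distance is the larger of the two mean nearest-neighbour distances between the sets. This is a simplified illustration: the paper matches line patterns, whereas the sketch below works on 2D points, and all function names are hypothetical.

```python
import math

def orientation_angle(points):
    """Principal-axis angle in radians from second-order central moments."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mu20 = sum((x - cx) ** 2 for x, _ in points)
    mu02 = sum((y - cy) ** 2 for _, y in points)
    mu11 = sum((x - cx) * (y - cy) for x, y in points)
    return 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)

def directed_distance(a, b):
    """Mean distance from each point of a to its nearest point in b."""
    return sum(min(math.hypot(px - qx, py - qy) for qx, qy in b)
               for px, py in a) / len(a)

def modified_hausdorff(a, b):
    """Modified Hausdorff distance: the larger of the two directed mean distances."""
    return max(directed_distance(a, b), directed_distance(b, a))

def best_template(test_points, templates):
    """Name of the template (dict of name -> point list) closest to the test set."""
    return min(templates, key=lambda name: modified_hausdorff(test_points, templates[name]))
```

In use, the test image would first be rotated by the negative of `orientation_angle` and rescaled, and `best_template` would then be applied to the normalised point set. Averaging the nearest-neighbour distances makes the measure less sensitive to a few outlier points than the classical Hausdorff distance, which takes the maximum.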
Rapid texture mapping of buildings is a key aspect of the reconstruction of 3D city landscapes. An effective
coarse-to-fine approach to 3D building model generation, integrating LIDAR and multiple overlapping images, is
proposed. Classification and segmentation are performed by combining the multi-spectral information provided
by color aerial images with the geometric information from multi-return laser scanning data. A connected graph of the segment
label image is created to derive the neighborhood relations of the planar segments. Line segment matching, based
on geometric and chromatic constraints, is applied to automatically obtain the corresponding line features in multiple target
images. Hypotheses for polyhedral surfaces are selected using topological relations and verified using geometry.
Nonlinear CCA extends linear CCA in that it operates in the kernel space and thus implies nonlinear
combinations in the original space. This paper presents a classification method based on kernel canonical correlation
analysis (KCCA). We introduce probabilistic label vectors (PLVs) for a given pattern, which extend the conventional
concept of a class label, and investigate the correlation between feature variables and PLV variables. A PLV predictor
based on KCCA is presented, and classification is then performed on the predicted PLV. We formulate a framework for
classification by integrating class information through PLVs. Experimental results on Iris data set classification and facial
expression recognition show the effectiveness of the proposed method.
Human tracking has attracted much attention from the researchers in the fields of computer vision and pattern
recognition. The problem is generally extremely challenging, partly because human bodies are articulated and versatile
and partly because of background clutter, both of which demand a strong human model. However, there is usually a trade-off
between the discriminative power and the complexity of a given model. This paper presents a simple yet distinctive
appearance model for real time human tracking by exploiting the pairwise constraints between parts. The parts in our
model are generated online by sampling the foreground of the scene into overlapping blocks and grouping them into
appearance coherent parts with mean shift algorithm. Constraints between the resulting parts are defined and used to
encode the structure of human body. To tolerate the possible human deformations and occlusions, the model is layered.
With this model, we design an algorithm for human tracking and test its performance on real world image sequences.
Experimental results show that the proposed appearance model, although simple, has enough discriminative power to
classify multiple humans even in the presence of occlusions, and that the associated tracking method can run in real time.
The similarity measure based on the Spatial color Mixture Of Gaussians model (SMOG model) is superior to the popular
color histogram based one since it considers not only the colors in a region but also the spatial layout of these colors.
However, SMOG still has two obvious drawbacks. First, in the initialization of SMOG, some background pixels are
inevitably introduced and clustered as an object mode for tracking, which often degrades the tracking performance.
Second, the weight of each Gaussian mode is limited by the probability of the pixels belonging to it, so a
low-probability Gaussian mode always contributes little to the similarity measure even if it is highly
discriminative for the object. A revised SMOG model is proposed to cope efficiently with these two problems by fully
considering the object's local background. Experimental results on synthetic and real image sequences verify the validity
of the revised model.
Self-calibration, as put forward by computer vision researchers, commonly assumes a camera with a linear imaging model.
Because lens distortion exists in practice, especially for ordinary cameras, a calibration that ignores distortion cannot meet
the demands of high-accuracy vision measurement. The distortion is mainly systematic and is a
function of the distortion coefficients, principal point, principal distance ratio, skew factor, etc. In theory, therefore,
there exists a set of parameters, comprising the distortion coefficients, principal point, principal distance ratio, skew
factor and fundamental matrix, that makes homologous points satisfy the epipolar constraint. Accordingly, this
paper proposes a self-calibration method for cameras with a non-linear imaging model, based on the Kruppa
equation. While calculating the fundamental matrix, we obtain the interior orientation elements other than the principal distance
by incorporating distortion correction of the image coordinates. The principal distance is then obtained from the Kruppa
equation. The method needs only some homologous points between two images and no known information about the
objects. Extensive experiments demonstrate its correctness and reliability.
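The constraint described above can be written out as a short sketch. The notation is assumed here for illustration and is not taken from the paper: $(x_d, y_d)$ is a measured image point, $(x_0, y_0)$ the principal point, $k_1$ a single radial distortion coefficient (the paper's actual distortion model may contain more terms), and $F$ the fundamental matrix. Primed symbols refer to the second image.

```latex
% Assumed single-coefficient radial model; the paper's exact model is not stated in the abstract.
\begin{aligned}
r^2 &= (x_d - x_0)^2 + (y_d - y_0)^2,\\
x_u &= x_d + (x_d - x_0)\,k_1 r^2, \qquad y_u = y_d + (y_d - y_0)\,k_1 r^2,\\
0 &= \mathbf{m}'^{\top} F\,\mathbf{m}, \qquad
\mathbf{m} = (x_u,\, y_u,\, 1)^{\top},\quad \mathbf{m}' = (x_u',\, y_u',\, 1)^{\top}.
\end{aligned}
```

Under this reading, the distortion and interior parameters and $F$ are sought jointly so that the corrected homologous points satisfy the last equation as closely as possible.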
UAV video has rapidly emerged in recent years as a widely used source of imagery for many applications. This paper
presents our research on a UAV video processing system for fire surveillance, which includes three parts. (1) UAV
video stream processing. This step involves three aspects: decoding, re-sampling and matching. Microsoft(R) DirectX(R)
technology is used to decode the highly compressed video stream and re-sample it into still video frames based on the
time base and rate of the UAV navigation sensor. A feature-based image-matching algorithm is developed to quickly
obtain tie points for the later calibration operation. (2) UAV system orientation. This step also involves three aspects:
camera IOP, boresight alignment and bundle adjustment. Tsai's two-stage technique is used to obtain the initial camera
focal length f, distortion coefficient k1 and six exterior orientation parameters (EOPs) for one selected video image. Meanwhile, the boresight matrix is deduced by comparing the GPS/INS-derived parameters with the solved EOPs. Furthermore, all
parameters, including the EOPs of all re-sampled video images and the camera IOP, are optimally estimated with the developed
bundle adjustment algorithm. (3) UAV video geo-registration and mosaicking. All re-sampled video frames are geo-registered
into a uniform geo-reference coordinate frame via the classic photogrammetric orthorectification model and merged
with each other using the developed mosaic algorithm. The results demonstrate that the geo-accuracy of the mosaic image
generated from UAV video can reach 1-2 pixels in planimetry. Its combination with GIS-supported data for fast
response to time-critical events, e.g., forest fires, is also described.
Wide baseline stereo correspondence has become a challenging and attractive problem in computer vision and its related
applications. Obtaining initial matches with a high ratio of correct matches is a very important step in general wide baseline stereo
correspondence algorithms. Ferrari et al. suggested a voting scheme called the topological filter to discard mismatches
from the initial matches, but they did not give a theoretical analysis of their method. Furthermore, the parameter of their
scheme was uncertain. In this paper, we improve the method of Ferrari et al. based on our theoretical analysis and present a
novel scheme called topologically clustering to discard mismatches. The proposed method has been tested on many
well-known wide baseline image pairs, and the experimental results show that it can efficiently extract
matches with a high correct ratio from initial matches with a low correct ratio for wide baseline image pairs.
An extended Bayesian classifier, which is able to fuse information from the original image and from its wavelet domain, is
designed for infrared image segmentation. The algorithm begins with a re-sampling process over the original image and
a wavelet transformation of the original image. Then, the Spatially Variant Mixture Model (SVMM) is applied to the
bootstrap samples and the wavelet coefficients. The corresponding parameters are estimated by the EM
(Expectation-Maximization) algorithm. Finally, a two-element Bayesian classifier is constructed. One part of the classifier is designed to
exploit information in the original image, and the other part is designed to exploit information obtained in the wavelet
domain. Theoretical analysis and experimental results confirm that the approach is efficient for infrared image
segmentation, robust to noise and computationally less demanding.
Most methods of close-range photogrammetry are based on the Direct Linear Transformation (DLT). However, DLT often has
an unstable solution, and every image needs more than six ground control points to compute the DLT parameters, so it is
hard to achieve high accuracy with this method and its efficiency is low. This paper discusses a new method of digital
close-range photogrammetry: panning and multi-baseline digital close-range photogrammetry. This method enlarges the
intersection angle and improves the intersection precision by using multiple baselines. At the same time, it applies
classic aerotriangulation and bundle adjustment to close-range photogrammetry, so that the exterior orientation
elements of all images can be computed from more than three ground control points. The experiments show that this method can
achieve high accuracy.
A clustering method based on Joint Boost for Synthetic Aperture Radar (SAR) images is proposed. In this method, we follow
the steps of Joint Boost but substitute a basic clustering algorithm for the weak learners. We compute the features shared
between samples in order to reduce the clustering time. The proposed clustering method, JBC, constructs a new training set
by random sampling from the original dataset, then selects the best feature and the best clusters for sharing,
calculates a distribution over the training samples using the current shared feature and clusters, and finally applies a basic clustering
algorithm (e.g. K-means) to partition the new training set. The final clustering solution is produced by
aggregating the obtained partitions. The clustering results for SAR images show that the proposed method performs well.
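The aggregation step can be illustrated with a minimal sketch. The abstract does not say how JBC aggregates its partitions, so the co-association (evidence accumulation) rule below is an assumption chosen for illustration, and the names `co_association`, `consensus_clusters` and `threshold` are hypothetical.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of partitions in which each pair of samples shares a cluster."""
    n = len(partitions[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def consensus_clusters(partitions, threshold=0.5):
    """Assumed aggregation rule: link samples that co-occur in more than
    `threshold` of the partitions, then return connected components as labels."""
    n = len(partitions[0])
    co = co_association(partitions)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(j)] = find(i)

    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]
```

For three partitions of five samples, `consensus_clusters([[0, 0, 1, 1, 1], [1, 1, 0, 0, 1], [0, 0, 0, 1, 1]])` groups the first two samples together and the last three together, even though no single partition is required to agree on label names.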
This paper presents a new distance measure for image matching based on local Kullback-Leibler divergence, which we
call Image Kullback-Leibler Distance (IKLD). Unlike traditional methods, IKLD takes into account not only the spatial
relationships of pixels but also the structural information around pixels. Therefore, it is robust to small changes
in viewpoint. To illustrate its performance, we embed it into support vector machines for view-based object
recognition. Experimental results on the COIL-100 dataset show that it outperforms most existing techniques, such as
traditional PCA+LDA (principal component analysis, linear discriminant analysis), non-linear SVM, Discriminant
Tensor Rank-One Decomposition (DTROD) and Sparse Network of Winnows (SNoW).
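To make the idea of a local Kullback-Leibler distance concrete, here is a minimal sketch. It is not the IKLD definition, which the abstract does not give: the windowed intensity histograms, the additive smoothing and the symmetrized divergence are all assumptions, and the function names are hypothetical. Images are plain nested lists of integers in 0..255.

```python
import math


def local_histogram(img, r, c, radius, bins, max_val=256):
    """Smoothed, normalized intensity histogram of the window centred at (r, c)."""
    hist = [1e-3] * bins  # small prior so that no bin is exactly zero
    for i in range(max(0, r - radius), min(len(img), r + radius + 1)):
        for j in range(max(0, c - radius), min(len(img[0]), c + radius + 1)):
            hist[img[i][j] * bins // max_val] += 1.0
    total = sum(hist)
    return [h / total for h in hist]


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for strictly positive distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def local_kl_distance(a, b, radius=1, bins=4):
    """Mean symmetrized KL divergence between co-located local histograms
    of two equally sized images (an assumed stand-in for IKLD)."""
    total = 0.0
    for r in range(len(a)):
        for c in range(len(a[0])):
            p = local_histogram(a, r, c, radius, bins)
            q = local_histogram(b, r, c, radius, bins)
            total += kl(p, q) + kl(q, p)
    return total / (len(a) * len(a[0]))
```

Because each pixel is compared through the histogram of its neighbourhood rather than its single value, a small shift of image content changes the distance only gradually, which is the kind of tolerance the abstract attributes to using structure around pixels.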
Classifying video shots into a predefined genre set according to their semantic content is a challenging task that is
helpful for video indexing, summarization and retrieval. This research proposes a novel shot classification algorithm with
concept detection for news video programs. Six semantic shot types are studied and categorized: Anchorperson,
Monologue, Reporter, Commercial, Still image and Miscellaneous. Anchorperson shots are detected by
clustering methods, reporter and monologue shots are distinguished by Conditional Random Fields (CRFs), and the last
three categories are picked out by rule-based methods. Multimodal features are employed, such as visual, audio, face,
temporal and contextual features. The experimental results show that the algorithm is effective and achieves a high average accuracy of 96.5%.
In lunar exploration, scientists on the ground need to understand the environment around the unmanned lunar rover by
analyzing data obtained by various payloads. There are two main materials on the moon: highland material and mare
material. We use reflectance spectra of lunar soils from the Apollo missions, measured by LSCC, to classify
the two kinds of materials. Principal component analysis is applied to reduce and select the features of the reflectance
spectra. These features are fed to a support vector machine, which is based on statistical learning theory and is widely used for
classification in modern pattern recognition. Our work shows that the reflectance spectra of lunar soils are strongly linked to
the materials they represent.
This paper proposes an algorithm for merging images based on the statistical principle of region geometric shape.
The algorithm offers high precision and speed for joining images that contain few point features.
To enhance the region geometric shape features, we first describe an Image Difference Dynamic Binary Method.
We then present the principle for merging images that contain few point features. Finally, the process and steps of
image merging are described in detail.
This paper deals with image quality analysis, considering the impact of the psychological factors involved in assessment. The
attributes of the image quality requirements were partitioned according to visual perception characteristics, and the
image quality preferences were obtained by the factor analysis method. The features of image quality that support
the subjective preferences were identified. The adequacy of the image is shown to be the top requirement for
improving display image quality. The approach will benefit research on subjective
quantitative assessment methods for image quality.
The Ant Colony Optimization (ACO) algorithm, which takes inspiration from the coordinated behavior of ant swarms, has
been applied in many fields as a novel evolutionary technique for solving optimization problems, but it has rarely
been used to process remote sensing data. Applying the ACO algorithm to remote sensing image classification does not
require assuming an underlying statistical distribution for the pixel data, allows contextual information to be taken into account, and
is highly robust. In this paper, taking Landsat TM data as an example, the process of the ACO method in remote
sensing data classification is introduced in detail, and a good result is achieved. The results suggest that ACO
can become a new and effective method for remote sensing data processing.
In marker-based human motion capture systems, accurately extracting and tracking the 2-D coordinates
of the body joints is a key step, because the 3-D reconstruction process and the reliability of the capture system depend heavily
on it. Unlike traditional solutions, we use ordinary industrial cameras and colored balls as the markers
to address this key step. We have also extended our solution to handle the problem of occlusion. Finally, we obtained excellent
results in practical applications, and the whole process can be computed in real time. The method will be extended in the
A model of a measurement system composed of two CCD cameras using parallel binocular line-structured light is
proposed. Singular value decomposition is used to solve the over-determined equation and obtain the parameters of the
imaging model of the cameras. A feature recognition technique is applied to segment the feature information of the image
during range image acquisition. The image is then pre-processed (smoothing, binarization and segmentation)
and condensed to remove useless information. Image acquisition and
condensing are carried out in parallel so that images are gathered and effective data extracted simultaneously. The proposed method
solves the difficulty of removing interfering information from range images and realizes parallel data processing, which
greatly simplifies the subsequent work of image matching and image feature data extraction.
Confidence evaluation is an important technique in the image matching process. This paper proposes a confidence
evaluation method for image matching results based on the support vector machine (SVM). We divide matching results
into two types: correct and wrong. The confidence evaluation of a matching result is thereby
translated into a classification of that result. The paper first describes how to prepare characteristic
parameters that accurately reflect the matching performance. An SVM with a Gaussian kernel is then used
as a classifier to determine the type of each matching result. The experiments show that this
method is effective. Compared with the Dempster-Shafer (D-S) evidence reasoning fusion method it has much higher
Spectropolarimetric imaging can provide useful discriminating information for human face recognition that cannot be
obtained by other imaging methods. This paper examines the capability of face recognition using spectropolarimetric
images. The spectropolarimetric images were collected with a CCD camera equipped with a liquid crystal tunable
filter, which could capture 32 bands of images over the visible and near-infrared range (0.4μm-0.72μm). Since
polarization techniques have better contrast mechanisms for tissue imaging and spectroscopy, and can also provide
additional information about the structure of tissues, better discriminative performance is expected
from using polarimetric and spectral information together than from using spectral information alone. An algorithm for facial
characteristics analysis is presented that exploits only the spectropolarimetric information from different types of facial
tissues. Experiments demonstrate that the proposed algorithm can efficiently distinguish the different facial tissues.
Local-feature-based image categorization algorithms have attracted increasing attention in the computer vision
community. In this paper, we present a local-feature-based image categorization scheme that uses a Multi-Scale
Vocabulary. This technique works by partitioning the feature space into clusters at several different levels to form a multi-scale
vocabulary and generating corresponding fixed-length descriptors at different scales for each image. We then design
a dedicated similarity measure for the multi-scale descriptors and finally apply KNN and SVM to perform the image categorization
task. Experiments conducted on the ETH80 dataset demonstrate the effectiveness of our approach.
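The descriptor construction described above can be sketched as follows. This is only an illustration under stated assumptions: the cluster centres of each level are taken as given (for example from k-means at increasing k, which the abstract does not specify), nearest-centre assignment with Euclidean distance is assumed, and the names `nearest` and `multi_scale_descriptor` are hypothetical.

```python
def nearest(centers, feature):
    """Index of the centre closest to `feature` (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda k: sum((f - c) ** 2 for f, c in zip(feature, centers[k])),
    )


def multi_scale_descriptor(features, vocabularies):
    """Return one normalized visual-word histogram per vocabulary level.

    `features` is a non-empty list of local feature vectors from one image;
    `vocabularies` is a list of centre lists, ordered from coarse to fine.
    Each histogram has a fixed length equal to the vocabulary size of its level.
    """
    if not features:
        raise ValueError("at least one local feature is required")
    descriptor = []
    for centers in vocabularies:
        hist = [0.0] * len(centers)
        for f in features:
            hist[nearest(centers, f)] += 1.0
        descriptor.append([h / len(features) for h in hist])
    return descriptor
```

The coarse levels tolerate variation within a category, while the finer levels keep more discriminative detail; a similarity measure over the per-level histograms can then weight the levels differently.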
Automatic target recognition (ATR) is the key to image guidance technology, yet it is difficult to recognize a target
by depending merely on the real-time images acquired by the cameras of a flying vehicle; moreover, recognizing the
target from real-time images with the on-board image processing system is itself a hard task. The main trend in
ATR today is to use images produced by high-resolution remote sensing satellites to retrieve
the front elevation of the region of interest beforehand. These front elevations are loaded onto the flying vehicles and
matched with the real-time images acquired by the on-board cameras to recognize the target of interest. Clearly,
the key step of this method is to recover 3D information from 2D images. This paper proposes a framework for
producing multi-scale and multi-viewpoint projection images from a remote sensing satellite stereopair by means of
photogrammetry and computer vision. First, we propose an algorithm that reconstructs the 3D structure of the target with
digital photogrammetric techniques and establishes the 3D model of the target using the OpenGL visualization toolkit.
Then the conversion relationship between the world coordinate system and the simulation space coordinate system is
provided to produce the front elevation in the simulation space.
In recent years, the tasks of fingerprint examiners have been greatly aided by the development of automatic fingerprint
classification systems. These systems operate by matching low-level features automatically extracted from fingerprint
images, often represented collectively as numeric vectors, for their decision. However, there are two major shortcomings
in current systems. First, the result of classification depends solely on the chosen features and the algorithm that matches
them. Second, the systems cannot adapt their results over time through interaction with individual fingerprint examiners
who often have different degrees of experience. In this paper, we demonstrate that, by incorporating relevance feedback in a
fingerprint classification system, a personalized semantic space over the database of fingerprints can be
incrementally learned for each user. The fingerprint features that induce the initial feature space, from which individual semantic
spaces are learned, are obtained by multispectral decomposition of fingerprints using a bank of Gabor filters. In
this learning framework, the out-of-sample extension of a recently introduced dimensionality reduction method, called
Twin Kernel Embedding (TKE), is applied to learn both the semantic space and a mapping function for classifying novel
fingerprints. Experimental results confirm this learning framework for examiner-centric fingerprint classification.
Automatic generation of 3D models of buildings and other man-made structures from images has become a topic of
increasing importance, as these models may be used in applications such as virtual reality, the entertainment industry and urban
planning. In this paper we address the main problems and available solutions for the generation of 3D models from
terrestrial images. We first generate a coarse planar model of the principal scene planes and then reconstruct windows to
refine the building models. There are several points of novelty. First, we reconstruct the coarse wire frame model using
line segment matching with the epipolar geometry constraint. Second, we detect the positions of all windows in the image
and reconstruct the windows from corner point correspondences established between images, then add the windows to the
coarse model to refine the building models. The strategy is illustrated on an image triple of a college building.
The development of remote sensing technology has given us abundant information about nature; in particular,
the appearance of high-resolution remote sensing images has extended our view of nature. High-resolution
satellite images such as Quickbird and IKONOS have been applied in many fields. The challenge we face is
how to use the data effectively and obtain more useful information through processing. In
target recognition, the results obtained by different classifiers using the same features are usually
strongly complementary, and high-resolution remote sensing data have many characteristics,
such as spectral, texture and context, compared with lower-resolution remote sensing data. This paper therefore
proposes multiple classifiers that use multiple characteristics to improve the classification accuracy of
high-resolution remote sensing images. The experiments show that the approach obtains higher classification accuracy
and better classification results than a single classifier.
With the development of aerial LIDAR technology and the widespread application of LIDAR data in city modeling, urban planning, etc., automatically recognizing and reconstructing buildings from LIDAR datasets has become an important research topic. Using the first-and-last echo data of the same laser point, this paper presents a scheme for the 3D reconstruction of simple buildings, which mainly includes the following steps: recognizing non-boundary and boundary building points and generating each building point cluster; localizing the boundary of each building; detecting the planes included in each cluster; and reconstructing the building in 3D form. Experiments show that, for LIDAR data with first-and-last echo information, the scheme can effectively and efficiently reconstruct simple buildings in 3D, such as flat and gabled buildings.
In this paper, a face recognition method using local qualitative representations is proposed to solve the problem of face
recognition under varying lighting. Based on the observation that the ordinal relationship between the average brightness of
pairs of image regions is invariant under lighting changes, Local Binary Mapping is defined as an illumination invariant for
face recognition based on the Local Binary Pattern descriptor, which extracts the local variance features of an image. For the
'symbol' feature vector, the Hamming distance is used as the similarity measure. Experiments show that the proposed
method achieves an accuracy of 100 percent on subsets 2, 3 and 4 and 98.89 percent on subset 5 of the Yale Face
Database B when all images in subset 1 are used as the gallery.
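The ordinal-invariance argument can be illustrated with a short sketch. It uses the standard 8-neighbour Local Binary Pattern on single pixels rather than the paper's Local Binary Mapping on region averages, so it is a simplified stand-in; the function names are hypothetical and images are nested lists of numbers.

```python
def lbp_code(img, r, c):
    """8-bit Local Binary Pattern at (r, c): bit k is set when the k-th
    neighbour (clockwise from top-left) is at least as bright as the centre."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_map(img):
    """LBP codes of all interior pixels, in row-major order."""
    return [
        lbp_code(img, r, c)
        for r in range(1, len(img) - 1)
        for c in range(1, len(img[0]) - 1)
    ]


def hamming(codes_a, codes_b):
    """Total number of differing bits between two equally long code lists."""
    return sum(bin(a ^ b).count("1") for a, b in zip(codes_a, codes_b))
```

Any strictly increasing change of brightness, such as `2 * v + 10` applied to every pixel, preserves each comparison against the centre, so the codes are unchanged and the Hamming distance between the original and the relit image is zero.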
The problem of object category recognition has long challenged the computer vision community. In this paper, we
address this task by learning two-class and multi-class discriminative models. The proposed approach, called DB-Tree,
integrates the AdaBoost algorithm into a decision tree structure in which each tree node combines a number of weak
classifiers into a strong classifier (a conditional posterior probability). In the learning stage, each boosted classifier in a
tree node is trained to split the training set into left and right sub-trees; the classifier is thus used not to return the class
of the sample but to assign the sample to the left or right sub-tree. Therefore, the DB-Tree can be built
automatically and recursively. In the testing stage, the posterior probability of each node is computed as the weighted
conditional probability of its left and right sub-trees. Thus, the top node of the tree outputs the overall posterior
probability. In addition, the multi-class and two-class learning procedures are unified by treating the multi-class
classification problem as a special two-class classification problem, in which either a positive or a negative label is
assigned to each class so as to minimize the total entropy in each node.
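The testing-stage recursion can be sketched as below. This is an illustration under assumptions, not the paper's implementation: the node classifier is any callable returning the probability of routing a sample to the right sub-tree (in the paper it would be a boosted classifier), leaves store an empirical positive-class probability, and the class names `Leaf` and `Node` are hypothetical.

```python
class Leaf:
    """Terminal node holding an assumed empirical P(y = +1) of its training samples."""

    def __init__(self, p_positive):
        self.p_positive = p_positive

    def posterior(self, x):
        return self.p_positive


class Node:
    """Internal node: `classifier(x)` returns q in [0, 1], the probability
    that x belongs to the right sub-tree."""

    def __init__(self, classifier, left, right):
        self.classifier = classifier
        self.left = left
        self.right = right

    def posterior(self, x):
        # Weighted combination of the sub-tree posteriors; the weights
        # (1 - q) and q come from this node's classifier.
        q = self.classifier(x)
        return (1.0 - q) * self.left.posterior(x) + q * self.right.posterior(x)
```

With a toy tree such as `Node(lambda x: 0.8 if x > 0 else 0.1, Leaf(0.2), Node(lambda x: 0.5, Leaf(0.6), Leaf(1.0)))`, calling `posterior(1.0)` on the root mixes all three leaves in one pass, so a sample near a decision boundary is not forced down a single branch.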
With the proliferation of multimedia information on the network, automatic rating of web pages is becoming more and more
important for web management. Image analysis is clearly very important in this kind of task. To the best of our
knowledge, however, no publications on it have been reported. In this paper, we propose a novel framework for rating web pages using
image content analysis. The rated categories comply with the standard used by most web browsers, such as
Internet Explorer, and include Normal, Revealing Attire, Exposed Breasts and Bare Buttocks. To make the rating
feasible, we analyze the images mainly using skin detection and body region detection, with face detection used as guidance
for detecting skin and body regions. After that, all the results of image content analysis are integrated to produce
content rating results for web pages. We tested our system on two data sets and demonstrated its effectiveness.