The automatic restoration of image colours (inpainting) is an interesting challenge in computer vision. The aim is to restore/recover missing colour information in a region, based on information from the surrounding region, in a way that looks acceptable to the human eye. We investigate two functionals based on the difference between the directional derivatives of the gradients (and the Laplacians) of two channels. The Euler-Lagrange process applied to the two functionals produces a nonlinear second-order (and a nonlinear fourth-order) PDE, the numerical solutions of which restore the colour in the region of interest. The first method extends an already established scheme that was developed for only one specific colour space. We shall establish the effectiveness of the corresponding image inpainting schemes, in both the spatial and wavelet domains, for 8 different colour spaces. We demonstrate the success of both schemes for a large number of natural images, with better performance in comparison with the popular Poisson formula.
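To make the baseline concrete, the following minimal sketch (in Python with NumPy, our own illustrative choice) fills a masked region of a single channel by Jacobi relaxation towards a solution of Laplace's equation, i.e. the harmonic fill that Poisson-type baselines generalise. It is not the paper's coupled-channel functional; the iteration count and the assumption that the mask avoids the image border are ours.

```python
import numpy as np

def harmonic_inpaint(channel, mask, iters=500):
    """Fill the masked region by Jacobi relaxation of Laplace's equation:
    interior values converge to the average of their four neighbours,
    while the known pixels supply the boundary data.
    Assumes the mask does not touch the image border (np.roll wraps)."""
    u = channel.astype(float).copy()
    u[mask] = u[~mask].mean()          # crude initial guess inside the hole
    for _ in range(iters):
        avg = 0.25 * (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
                      np.roll(u, 1, 1) + np.roll(u, -1, 1))
        u[mask] = avg[mask]            # update only the unknown pixels
    return u
```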
Image inpainting is the process of filling in a missing region so as to preserve the continuity of the image's overall content and semantics. In this paper, we present a novel approach to improving an existing scheme, the exemplar-based inpainting algorithm, using Topological Data Analysis (TDA). TDA is a mathematical approach concerned with studying shapes or objects to gain information about their connectivity and closeness properties. The challenge in using exemplar-based inpainting is that the neighborhood of the missing region needs to have relatively simple texture and structure. We study the topological properties (e.g. the number of connected components) of the regions surrounding the missing area by building a sequence of simplicial complexes (analysed via persistent homology) based on a selected group of uniform Local Binary Patterns (LBP). Connected components of image regions generated by certain landmark pixels, at different thresholds, automatically quantify the texture of the areas surrounding the missing region. Such quantification helps determine the appropriate patch size for propagation. We have modified the patch propagation priority function using geometrical properties of the curvature of isophotes, and improved the patch matching criteria by calculating correlation coefficients in the spatial, gradient and Laplacian domains. We use several image quality measures to illustrate the performance of our approach in comparison to similar inpainting algorithms. In particular, we shall illustrate that our proposed scheme outperforms state-of-the-art exemplar-based inpainting algorithms.
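The 0-dimensional topological summary described above can be illustrated with a short sketch: sweep a binarisation threshold over a patch and record how the number of connected components evolves. This is a deliberate simplification of the persistent-homology construction on LBP-selected landmarks; the use of scipy.ndimage and a plain grey-level sweep are our assumptions.

```python
import numpy as np
from scipy import ndimage

def component_counts(gray_patch, thresholds):
    """Track how the number of connected components in a patch evolves
    as the binarisation threshold sweeps upward: a 0-dimensional
    summary in the spirit of a persistence-based texture descriptor."""
    counts = []
    for t in thresholds:
        binary = gray_patch >= t            # super-level set at threshold t
        _, n = ndimage.label(binary)        # count connected components
        counts.append(n)
    return counts

# Strong fluctuation in the counts suggests complex texture, which
# would argue for a smaller patch size in exemplar propagation.
```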
Facial expression identification is an important part of face recognition and is closely related to emotion detection from face images. Various solutions have been proposed in the past using different types of cameras and features. The Microsoft
Kinect device has been widely used for multimedia interactions. More recently, the device has been increasingly
deployed for supporting scientific investigations. This paper explores the effectiveness of using the device in identifying
emotional facial expressions such as surprise, smile, and sadness, and evaluates the usefulness of 3D data points on a face
mesh structure obtained from the Kinect device. We present a distance-based geometric feature component that is
derived from the distances between points on the face mesh and selected reference points in a single frame. The feature
components extracted across a sequence of frames that starts and ends with a neutral expression represent a whole expression.
The feature vector eliminates the need for complex face orientation correction, simplifying the feature extraction process
and making it more efficient. We applied a kNN classifier that exploits a feature-component-based similarity measure
following the principle of dynamic time warping to determine the closest neighbors. Preliminary tests on a small-scale database of different facial expressions show the promise of the newly developed features and the usefulness of the Kinect device in facial expression identification.
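A minimal sketch of the classification stage described above: DTW aligns two expression sequences of per-frame feature vectors, and a kNN vote over DTW distances picks the label. The Euclidean local cost and k=3 are illustrative assumptions, not values from the paper.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping between two sequences of per-frame feature
    vectors, using Euclidean distance as the local cost."""
    n, m = len(seq_a), len(seq_b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def knn_label(query, gallery, k=3):
    """gallery: list of (sequence, label) pairs. Majority vote over the
    k sequences with the smallest DTW distance to the query."""
    dists = sorted(((dtw_distance(query, s), lbl) for s, lbl in gallery),
                   key=lambda t: t[0])
    votes = [lbl for _, lbl in dists[:k]]
    return max(set(votes), key=votes.count)
```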
Video compression and encryption have become essential for secure real-time video transmission. Applying both techniques simultaneously is a challenge when size and quality are important in multimedia transmission. In this paper we propose a new technique for video compression and encryption. Both encryption and compression are based on edges extracted from the high-frequency sub-bands of a wavelet decomposition. The compression algorithm is based on a hybrid of discrete wavelet transforms, the discrete cosine transform, vector quantization, wavelet-based edge detection, and phase sensing. The compression encoding algorithm treats reference and non-reference video frames in two different ways. The encryption algorithm utilises the A5 cipher combined with a chaotic logistic map to encrypt the significant parameters and wavelet coefficients. Both algorithms can be applied simultaneously after applying the discrete wavelet transform to each individual frame. Experimental results show that the proposed algorithms have the following features: high compression, acceptable quality, and resistance to statistical and brute-force attacks with low computational processing.
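As an illustration of the chaotic component of the cipher, the sketch below derives a keystream from the logistic map and XORs it with the data to be protected. It shows only the logistic-map part, not the A5 combination used in the paper; the parameters x0 and r are hypothetical key material.

```python
import numpy as np

def logistic_keystream(n_bytes, x0=0.3141592, r=3.99):
    """Generate a pseudo-random byte stream from the chaotic logistic
    map x_{k+1} = r * x_k * (1 - x_k); x0 and r act as key material
    (illustrative values, not the paper's)."""
    x, out = x0, np.empty(n_bytes, dtype=np.uint8)
    for i in range(n_bytes):
        x = r * x * (1.0 - x)
        out[i] = int(x * 256) % 256     # quantise the orbit to a byte
    return out

def xor_encrypt(data, key_bytes):
    """XOR the bytes of the significant parameters/coefficients with the
    keystream; applying the same operation again decrypts."""
    return np.bitwise_xor(np.frombuffer(data, dtype=np.uint8),
                          key_bytes[:len(data)]).tobytes()
```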
This paper proposes gender classification based on human gait features and investigates the problem of two variations, clothing (wearing coats) and carrying a bag, in addition to the normal gait sequence. The feature vectors in the proposed system are constructed after applying the wavelet transform. Three different feature sets are proposed in this method. The first consists of spatio-temporal distances between different parts of the human body (feet, knees, hands, shoulders, and overall height) during one gait cycle. The second and third feature sets are constructed from the wavelet approximation and detail coefficients of the human body, respectively. To extract these two feature sets we divide the human body into upper and lower parts based on the golden ratio proportion. In this paper, we adopt a statistical method for constructing the feature vector from the above sets. The dimension of the constructed feature vector is reduced using the Fisher score as a feature selection method, to optimize the discriminating significance of the features. Finally, k-Nearest Neighbor is applied as the classification method. Experimental results demonstrate that our approach addresses a more realistic scenario and achieves relatively better performance compared with existing approaches.
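The Fisher-score selection step can be sketched as follows: score each feature by the ratio of between-class scatter to within-class variance and keep the highest-scoring ones before classification. This is the standard formulation; its correspondence to the paper's exact variant is an assumption.

```python
import numpy as np

def fisher_scores(X, y):
    """Fisher score per feature: between-class scatter of the feature
    means divided by the within-class variance. Higher = more
    discriminative. X: (n_samples, n_features), y: class labels."""
    overall_mean = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        den += len(Xc) * Xc.var(axis=0)
    return num / (den + 1e-12)          # guard against zero variance

# Keep only the top-k features before the k-NN classifier:
# top_k = np.argsort(fisher_scores(X, y))[::-1][:k]
```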
Video compression and encryption have become an essential part of multimedia applications, and of video conferencing in particular. Applying both techniques simultaneously is a challenge when size and quality are important. In this paper we suggest using the wavelet transform, dealing with the low-frequency coefficients for compression while undertaking the encryption on the wavelet high-frequency coefficients. Applying both methods simultaneously is not new; here we suggest a way to improve the security level of the encryption with better computational performance in both encryption and compression. Both encryption and compression in this paper are based on edge extraction from the wavelet high-frequency sub-bands. Although some research performs edge detection in the spatial domain, the number of edges produced in the wavelet domain can vary dynamically, which in turn affects the compression ratio dynamically. Moreover, this kind of edge detection in the wavelet domain adds a further level of selective encryption.
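A sketch of wavelet-domain edge extraction in the spirit described above: take a one-level 2-D DWT and mark locations where the detail sub-bands have large magnitude. The wavelet choice and the mean-plus-two-sigma threshold rule are our assumptions; varying the threshold varies the number of edges, and hence the compression ratio, dynamically.

```python
import numpy as np
import pywt

def wavelet_edge_map(frame, wavelet='haar', threshold=None):
    """One-level 2-D DWT; edges are taken where the magnitude of the
    detail (high-frequency) sub-bands is large. Raising or lowering
    the threshold changes the number of detected edges."""
    cA, (cH, cV, cD) = pywt.dwt2(frame.astype(float), wavelet)
    magnitude = np.sqrt(cH ** 2 + cV ** 2 + cD ** 2)
    if threshold is None:
        threshold = magnitude.mean() + 2 * magnitude.std()  # assumed rule
    return magnitude > threshold   # boolean edge map at half resolution
```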
One common type of steganography conceals an image, as a secret message, in another image normally called the cover image; the resulting image is called a stego image. The aim of this paper is to investigate the effect of using cover images of different quality, and to analyse the use of different bit-planes in terms of robustness against well-known active attacks such as gamma correction, statistical filters, and linear spatial filters. The secret messages are embedded in a higher bit-plane, i.e. other than the Least Significant Bit (LSB), in order to resist active attacks. The embedding process is performed in three major steps. First, the embedding algorithm selectively identifies useful areas (blocks) for embedding based on their lighting conditions. Second, the most useful blocks for embedding are nominated based on their entropy and mean. Third, the right bit-plane is selected for embedding. This kind of block selection makes the embedding process scatter the secret message(s) randomly around the cover image. Different tests have been performed to select a proper block size, which is related to the nature of the cover image used. Our proposed method suggests a suitable embedding bit-plane as well as the right blocks for embedding. Experimental results demonstrate that the quality of the cover image has an effect when the stego image is subjected to different active attacks. Although the secret messages are embedded in a higher bit-plane, they cannot be recognised visually within the stego image.
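The block-selection and embedding steps can be sketched as below: blocks are ranked by the entropy (and mean) of their histograms, and a secret bit is written into a chosen higher bit-plane. Embedding one bit via a block's centre pixel is a deliberate simplification of the scheme.

```python
import numpy as np

def block_entropy(block):
    """Shannon entropy of an 8-bit block's intensity histogram; used
    together with the block mean to rank candidate blocks."""
    hist = np.bincount(block.ravel(), minlength=256) / block.size
    p = hist[hist > 0]
    return -(p * np.log2(p)).sum()

def embed_bit(block, bit, plane=3):
    """Write one secret bit into the given bit-plane (plane 0 = LSB)
    of the block's centre pixel. Higher planes resist filtering
    attacks at the cost of a more visible change."""
    r, c = block.shape[0] // 2, block.shape[1] // 2
    pixel = int(block[r, c])
    pixel = (pixel & ~(1 << plane)) | (bit << plane)   # clear, then set
    block[r, c] = pixel
    return block
```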
This paper presents gait recognition based on the human skeleton and the trajectories of joint points captured by the Microsoft Kinect sensor. Two sets of dynamic features are extracted during one gait cycle: the first is the Horizontal Distance Features (HDF), based on the distances between the ankles, knees, hands, and shoulders; the second is the Vertical Distance Features (VDF), which provide significant gait information extracted from the heights of the hands, shoulders, and ankles above the ground during one gait cycle. Extracting these two feature sets with a traditional camera is difficult and inaccurate, so the Kinect sensor is used in this paper to obtain precise measurements. The two feature sets are tested separately and then fused into one feature vector. A database has been created in house to perform our experiments, consisting of sixteen males and four females. For each individual, 10 videos have been recorded, each containing on average two gait cycles. The Kinect sensor is used to extract all the skeleton points, and these points are used to build the feature vectors mentioned above. K-nearest neighbor is used as the classification method, based on the city-block distance function. In the experiments the proposed method achieves a 56% recognition rate using HDF, while VDF provides 83.5% recognition accuracy. When the HDF and VDF are fused into one feature vector, the recognition rate increases to 92%; the experimental results show that our method provides a significant result compared to existing methods.
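A minimal sketch of the fusion and classification stage: concatenate the HDF and VDF vectors and assign the label of the nearest gallery entry under the city-block distance. The plain 1-NN form is an illustrative reduction of the kNN classifier used in the paper.

```python
import numpy as np

def cityblock(a, b):
    """City-block (L1) distance between two feature vectors."""
    return np.abs(a - b).sum()

def classify(query_hdf, query_vdf, gallery):
    """Concatenate the horizontal (HDF) and vertical (VDF) distance
    features into one fused vector, then return the label of the
    nearest gallery entry. gallery: list of (vector, label) pairs."""
    fused = np.concatenate([query_hdf, query_vdf])
    return min(gallery, key=lambda item: cityblock(fused, item[0]))[1]
```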
This paper presents a new algorithm for human gait recognition based on spatio-temporal body biometric features using wavelet transforms. The proposed algorithm extracts the gait cycle based on the width of the bounding box in a sequence of silhouette images. Gait recognition is based on feature-level fusion of three feature vectors: the gait spatio-temporal features, represented by the distances between feet, knees, hands, and shoulders, together with the height; the binary difference between consecutive silhouette frames for each leg, detected separately based on the Hamming distance; and a vector of statistical parameters captured from the wavelet low-frequency domain. The fused feature vector is subjected to dimension reduction using linear discriminant analysis. Nearest Neighbour with a certain threshold is used for classification. The threshold is obtained experimentally from a set of data captured from the CASIA database. We shall demonstrate that our method provides a non-traditional identification that uses the threshold to classify outsiders as non-classified members.
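The thresholded classification can be sketched as follows: a query is assigned to its nearest neighbour only if the distance falls below the threshold, and is otherwise rejected as an outsider. The Euclidean metric here is an assumption; the threshold would come from the CASIA-derived tuning described above.

```python
import numpy as np

def open_set_nn(query, gallery, threshold):
    """Nearest-neighbour classification with a rejection threshold:
    if even the best match is farther than the threshold, the subject
    is reported as an outsider (non-classified member).
    gallery: list of (feature_vector, label) pairs."""
    dists = [(np.linalg.norm(query - v), lbl) for v, lbl in gallery]
    best_dist, best_label = min(dists, key=lambda t: t[0])
    return best_label if best_dist <= threshold else 'unclassified'
```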
Analysing a text or part of it is key to handwriting identification. Generally, handwriting is learnt over time and people develop habits in their style of writing. These habits are embedded in special parts of handwritten text. In Arabic, each word consists of one or more sub-word(s), and the end of each sub-word is considered to be a connected stroke. The main hypothesis in this paper is that sub-words are an essential reflection of an Arabic writer's habits that could be exploited for writer identification. Testing this hypothesis is based on experiments that evaluate writer identification, mainly using K-nearest neighbor on groups of sub-words extracted from longer texts. The experimental results show that a group of sub-words can be used to identify the writer with a success rate between 52.94% and 82.35% at top-1, rising to 100% at top-5 based on K-nearest neighbor. The results show that the majority of writers are identified using 7 sub-words with a reliability confidence of about 90% (i.e. 90% of the rejected templates have significantly larger distances to the tested example than the distance from the correctly identified template). By contrast, previous work using a complete word reports a success rate of at most 90% at top-10.
Natural languages like Arabic, Kurdish, Farsi (Persian), Urdu, and other similar languages have many features which make them different from languages based on the Latin script. One of these important features is diacritics. These diacritics are classified as compulsory, like the dots used to identify/differentiate letters, and optional, like the short vowels used to emphasise consonants. Most indigenous and well-trained writers often do not use all or some of this second class of diacritics, and expert readers can infer their presence from the context of the written text. In this paper, we investigate the use of diacritic shapes and other characteristics as parameters of feature vectors for Arabic writer identification/verification. Segmentation techniques are used to extract the diacritics-based feature vectors from examples of Arabic handwritten text. The results of evaluation tests, carried out on an in-house database of 50 writers, will be presented, and the viability of using diacritics for writer recognition will be demonstrated.
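One plausible way to realise the diacritics segmentation, sketched under our own assumptions: label the connected components of the binarised text and treat components that are small relative to the largest one as candidate diacritics. The area-ratio cut-off is illustrative, not a value from the paper.

```python
import numpy as np
from scipy import ndimage

def split_diacritics(binary_text, area_ratio=0.15):
    """Label connected components of a binarised text image and treat
    the small ones (relative to the largest component) as candidate
    diacritics; the remaining components are sub-word bodies.
    area_ratio is a tunable assumption."""
    labels, n = ndimage.label(binary_text)
    areas = ndimage.sum(binary_text, labels, range(1, n + 1))
    cutoff = area_ratio * areas.max()
    diacritic_ids = [i + 1 for i, a in enumerate(areas) if a < cutoff]
    return np.isin(labels, diacritic_ids)   # mask of diacritic pixels
```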
This paper is concerned with pre-processing and segmentation tasks that influence the performance of Optical Character Recognition (OCR) systems and handwritten/printed text recognition. In Arabic, these tasks are adversely affected by the fact that many words are made up of sub-words, many sub-words have one or more associated diacritics that are not connected to the sub-word's body, and there can be multiple instances of overlap between sub-words. To overcome these problems we investigate and develop segmentation techniques that first segment a document into sub-words, link the diacritics with their sub-words, and remove possible overlap between words and sub-words. We shall also investigate two approaches to pre-processing tasks that estimate sub-word baselines and determine parameters yielding appropriate slope correction and slant removal. We shall investigate the use of linear regression on sub-word pixels to determine their central x and y coordinates, as well as their high-density part. We also develop a new incremental rotation procedure, performed on sub-words, that determines the best rotation angle needed to realign baselines. We shall demonstrate the benefits of these proposals by conducting extensive experiments on publicly available databases and in-house created databases. These algorithms help improve character segmentation accuracy by transforming handwritten Arabic text into a form that could benefit from analysis techniques developed for printed text.
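The two pre-processing ideas can be sketched together: a least-squares line through a sub-word's foreground pixels estimates the baseline slope, and an incremental rotation search picks the angle that best flattens it. The step size, angle range, and interpolation settings are our assumptions.

```python
import numpy as np
from scipy import ndimage

def baseline_slope(binary_subword):
    """Fit a line through the foreground pixels of a sub-word by least
    squares; the slope estimates the baseline inclination."""
    ys, xs = np.nonzero(binary_subword)
    slope, _ = np.polyfit(xs, ys, 1)
    return slope

def align_baseline(binary_subword, step=0.5, max_angle=15.0):
    """Incrementally rotate the sub-word and keep the angle that makes
    the regression slope closest to zero (a horizontal baseline)."""
    best, best_slope = binary_subword, abs(baseline_slope(binary_subword))
    for angle in np.arange(-max_angle, max_angle + step, step):
        rotated = ndimage.rotate(binary_subword.astype(float), angle,
                                 reshape=False, order=1) > 0.5
        if rotated.any():
            s = abs(baseline_slope(rotated))
            if s < best_slope:
                best, best_slope = rotated, s
    return best
```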
Noise is generally considered to be a degradation of image quality, and image quality is often judged by the appearance and clarity of image edges. The performance of most applications is affected by image quality and by the level of different types of degradation. Measuring image quality and identifying the type of noise or degradation is therefore a key factor in raising application performance, yet this task can be very challenging. Nowadays the wavelet transform is widely used in many applications, which mostly benefit from the wavelet's localisation in the frequency domain. The coefficients of the high-frequency sub-bands in the wavelet domain are well represented by a Laplace distribution. In this paper we propose to use the Laplace distribution histogram to measure image quality and to identify the type of degradation affecting a given image.
Image quality and the level of degradation are usually measured against a reference image of reasonable quality. The Laplace distribution histogram discussed here instead provides a self-testing (reference-free) measurement of image quality. The measurement is based on constructing the theoretical Laplace distribution histogram of a high-frequency wavelet sub-band from the sub-band's actual standard deviation, and then comparing it with the empirical histogram. The comparison is performed using the histogram intersection method. All the experiments are performed using the extended Yale database.
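The proposed reference-free measure can be sketched directly: estimate the Laplace scale from the sub-band's standard deviation (for a zero-mean Laplace distribution the scale is the standard deviation divided by the square root of two), build the theoretical histogram, and intersect it with the empirical one. The bin count is an illustrative choice.

```python
import numpy as np

def laplace_quality_score(hf_coeffs, bins=101):
    """Compare the empirical histogram of a high-frequency wavelet
    sub-band with the Laplace histogram implied by its own standard
    deviation. Assumes the sub-band is (approximately) zero-mean.
    A score near 1 suggests clean natural-image statistics; lower
    scores indicate degradation."""
    c = hf_coeffs.ravel()
    b = c.std() / np.sqrt(2.0)                 # Laplace scale from std
    edges = np.linspace(c.min(), c.max(), bins + 1)
    empirical, _ = np.histogram(c, bins=edges, density=True)
    centres = 0.5 * (edges[:-1] + edges[1:])
    theoretical = np.exp(-np.abs(centres) / b) / (2.0 * b)
    # histogram intersection on the density-normalised histograms
    width = edges[1] - edges[0]
    return np.minimum(empirical, theoretical).sum() * width
```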
This paper proposes a wireless mesh network implementation consisting of both Wi-Fi ad-hoc networks and Bluetooth piconet/scatternet networks, organised in an energy- and throughput-efficient structure. This type of network can be easily constructed for crisis-management applications, for example in an earthquake disaster. The motivation for this research is to form a mesh network from the mass of WiFi- and Bluetooth-enabled electronic devices, such as mobile phones and PCs, that are normally present in most regions where major crises occur.
The target of this study is an effective solution that enables Wi-Fi and/or Bluetooth nodes to seamlessly configure themselves to act as bridges between their own network and the other network, achieving continuous routing in our proposed mesh networks.
Mobile phones and other handheld devices are constrained in their memory and computational power, yet new generations of these devices provide access to web-based services and are equipped with digital cameras that make them more attractive to users. These added capabilities are expected to help incorporate such devices into the global communication system. In order to take advantage of these capabilities, there is a pressing need for highly efficient algorithms, including real-time image and video processing and transmission. This paper is concerned with high-quality video compression for constrained mobile devices. We attempt to adapt a wavelet-based feature-preserving image compression technique that we developed recently, so as to make it suitable for implementation on mobile phones and PDAs. The earlier version of the compression algorithm exploits the statistical properties of multi-resolution wavelet-transformed images. The main modification is based on the observation that in many cases the statistical parameters of the wavelet subbands of adjacent video frames do not differ significantly. We shall investigate the possibility of re-using codebooks for a sequence of adjacent frames without any adverse effect on image quality. Such an approach results in significant bandwidth and processing-time savings. The performance of this scheme will be tested in comparison to other video compression methods. Such a scheme is expected to be of use in security applications such as the transmission of biometric data for server-based verification.
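The codebook-reuse idea can be sketched as a simple test: compare the standard deviations of the detail sub-bands of adjacent frames and reuse the previous frame's codebooks when they agree within a tolerance. The 5% tolerance and the single decomposition level are our assumptions, not the paper's rule.

```python
import numpy as np
import pywt

def subband_stats(frame, wavelet='haar'):
    """Standard deviations of the one-level detail sub-bands (cH, cV, cD)."""
    _, details = pywt.dwt2(frame.astype(float), wavelet)
    return np.array([d.std() for d in details])

def reuse_codebook(prev_frame, next_frame, tol=0.05):
    """Reuse the previous frame's VQ codebooks when the sub-band
    statistics of adjacent frames are close, saving the cost of
    retraining; tol is an illustrative relative tolerance."""
    s_prev, s_next = subband_stats(prev_frame), subband_stats(next_frame)
    return np.all(np.abs(s_next - s_prev) <= tol * (s_prev + 1e-9))
```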
Person authentication can be strongly enhanced by combining different modalities. This is also true for face and voice signals, which can be obtained with minimal inconvenience to the user. However, features from each modality can be combined at various levels of processing, and for face and voice signals the advantage of fusion depends strongly on the way they are combined. The aim of the work presented here is to investigate the optimal strategy for combining the voice and face modalities for signals of varying quality. The experimental data are taken from a newly acquired database, recorded using a PDA, which contains audio-visual recordings in different conditions. Voice features use mel-frequency cepstral coefficients, while the face signal is parameterised using wavelet coefficients in certain subbands. Results are presented for both early (feature-level) and late (score-level) fusion. At each level different fixed and variable weightings are used, both to weight between frames within each modality and to weight between modalities, where weights are based on some measure of signal reliability, such as the accuracy of automatic face detection or the audio signal-to-noise ratio. In addition, the contribution to authentication of information from different areas of the face is explored to determine a regional weighting for the face coefficients.
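A minimal sketch of the late-fusion variant with reliability-driven weights: the voice score is weighted by a normalised SNR and the face score by a detection-confidence value, then the weighted scores are combined. The linear SNR-to-weight mapping and the SNR range are illustrative assumptions, not the paper's exact rule.

```python
import numpy as np

def fused_score(face_score, voice_score, face_quality, snr_db,
                snr_range=(0.0, 30.0)):
    """Late (score-level) fusion with reliability-driven weights: the
    voice weight grows with the audio SNR, the face weight with a
    face-detection confidence in [0, 1]. Returns the weighted
    combination of the two matching scores."""
    lo, hi = snr_range
    voice_rel = np.clip((snr_db - lo) / (hi - lo), 0.0, 1.0)
    w_face, w_voice = face_quality, voice_rel
    total = w_face + w_voice + 1e-9          # guard against zero weights
    return (w_face * face_score + w_voice * voice_score) / total
```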
Biometric databases form an essential tool in the fight against international terrorism, organised crime and fraud. Various government and law-enforcement agencies have their own biometric databases consisting of combinations of fingerprints, iris codes, face images/videos and speech records for an increasing number of persons. In many cases the personal data linked to biometric records are incomplete and/or inaccurate; besides, biometric data in different databases for the same individual may be recorded with different personal details. Following the recent terrorist atrocities, law-enforcement agencies collaborate more than before and rely more heavily on database sharing. In such an environment, reliable biometric-based identification must determine not only who you are but also who else you are. In this paper we propose a compact content-based video signature and indexing scheme that can facilitate the retrieval of multiple records in face biometric databases that belong to the same person, even if their associated personal data are inconsistent. We shall assess the performance of our system using a benchmark audio-visual face biometric database that has multiple videos for each subject but with different identity claims. We shall demonstrate that retrieving the relatively small number of videos that are nearest, in terms of the proposed index, to any video in the database returns a significant proportion of that individual's records.
Advances in digital image processing, the advent of multimedia computing, and the availability of affordable high-quality digital cameras have led to increased demand for digital images/videos. There has been fast growth in the number of information systems that benefit from digital imaging techniques, and these present many tough challenges. In this paper we are concerned with applications for which image quality is a critical requirement. The fields of medicine, remote sensing, real-time surveillance, and image-based automatic fingerprint/face identification systems are but a few examples of such applications. Medical care is increasingly dependent on imaging for diagnostics, surgery, and education. It is estimated that a medium-sized hospital in the US generates terabytes of MRI and X-ray images, which are stored in very large databases that are frequently accessed and searched for research and training. On the other hand, the rise of international terrorism and the growth of identity theft have added urgency to the development of new, efficient biometric-based person verification/authentication systems. In future, such systems could provide an additional layer of security for online transactions or for real-time surveillance.