In this paper we describe a mobile-based system that allows first responders to identify and track gang graffiti
by combining image analysis and location-based services. The gang graffiti image and its metadata
(geoposition, date and time) obtained automatically are transferred to a server and uploaded to a database of
graffiti images. The database can then be queried, with matched results sent back to the mobile device, where
the user can review them and provide additional input to refine the information.
The area of dietary assessment is becoming increasingly important as obesity rates soar, but valid measurement of the
food intake in free-living persons is extraordinarily challenging. Traditional paper-based dietary assessment methods
have limitations due to bias, user burden and cost, and therefore improved methods are needed to address important
hypotheses related to diet and health. In this paper, we will describe the progress of our mobile Diet Data Recorder
System (DDRS), where an electronic device is used for objective measurement of dietary intake in real time and at
moderate cost. The DDRS consists of (1) a mobile device that integrates a smartphone and an integrated laser package,
(2) software on the smartphone for data collection and laser control, (3) an algorithm to process acquired data for food
volume estimation, which is the largest source of error in calculating dietary intake, and (4) database and interface for
data storage and management. The estimated food volume, together with direct entries of food questionnaires and voice
recordings, could provide dietitians and nutritional epidemiologists with more complete food description and more
accurate food portion sizes. In this paper, we will describe the system design of DDRS and initial results of dietary data collection.
The Frankencamera (FCam) architecture and API enable precise control over the camera in computational photography applications. We present an extension to the FCam API for systems equipped with multiple cameras. The proposed extension allows enumeration of the cameras and their corresponding properties, such as position or orientation. In addition, we explicitly support camera synchronization, either through hardware mechanisms or software primitives. If hardware synchronization is available, cameras can be grouped together under the concept of a multi-sensor. Otherwise, multiple camera streams are scheduled asynchronously and synchronized using our software control primitives.
Mobile devices present a challenging platform for 3D video because of inherent device limitations. Continuously
Adjustable Pulfrich Spectacles for Mobile Devices (CAPS-MD) is a new implementation of the Pulfrich 3D stereoscopic
effect. For every scene that contains lateral motion in a 2D movie, CAPS-MD provides realistic 3D. Since it requires
minimal additional processing, it is appropriate for mobile devices.
3D movies utilizing the Pulfrich stereoscopic effect have been made for 80 years using passive viewing spectacles.
CAPS-MD uses active viewing spectacles to overcome the limitations of passive spectacles. 3D movies normally employ
the asymmetry of dual images to produce stereopsis. CAPS-MD works on the principle of illumination asymmetry, and
only needs to control the differential lens optical densities.
CAPS-MD is fabricated from optoelectronic materials that electronically control the lens optical densities. CAPS-MD
uses the eye's retinal triggering to determine the differential lens optical densities. Motion estimation
calculations from the digital image processing used to display 2D video on mobile devices are reused to calculate
real-time lens adjustments, so CAPS-MD always conforms to the optical density that optimizes the Pulfrich stereoscopic
effect. Only negligible additional processing is necessary for CAPS-MD to show 3D for every scene that contains lateral
motion in any 2D movie.
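The mapping from reused motion-estimation output to lens settings can be sketched in a few lines. This is an illustrative Python sketch, not the authors' implementation: the function name, the linear mapping, the 8 px/frame saturation point, and the left/right lens assignment are all assumptions.

```python
def pulfrich_density(motion_vectors, max_density=0.9):
    """Map mean lateral motion to a differential lens optical density.

    motion_vectors: list of (dx, dy) block motion vectors already computed
    by the device's 2-D video pipeline. Returns (left, right) densities in
    [0, max_density]. The linear mapping, the saturation point, and which
    lens is darkened are illustrative assumptions.
    """
    if not motion_vectors:
        return (0.0, 0.0)  # no lateral motion: no Pulfrich effect, lenses clear
    mean_dx = sum(dx for dx, _ in motion_vectors) / len(motion_vectors)
    # Saturate at an assumed 8 px/frame of lateral motion.
    density = min(abs(mean_dx) / 8.0, 1.0) * max_density
    # Assumed convention: darken the lens on the side the scene moves toward.
    return (0.0, density) if mean_dx > 0 else (density, 0.0)
```

Because the motion vectors already exist in the 2D playback pipeline, the marginal cost of this mapping is a handful of arithmetic operations per frame, consistent with the "negligible additional processing" claim.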
We present an approach to measure and model the parameters of human point-of-gaze (PoG) in 3D space. Our model
considers the following three parameters: position of the gaze in 3D space, volume encompassed by the gaze and time
for the gaze to arrive on the desired target.
Extracting the 3D gaze position from binocular gaze data is hindered by three problems. The first problem is the lack of
convergence: due to microsaccadic movements, the optical lines of both eyes rarely intersect at a point in space. The
second problem is resolution: the combination of short observation distance and the limited comfort disparity zone typical
of a mobile 3D display does not allow the depth of the gaze position to be reliably extracted. The third problem is
measurement noise: due to the limited display size, the noise range is close to the range of properly measured data.
We have developed a methodology which allows us to suppress most of the measurement noise. This allows us to
estimate the typical time which is needed for the point-of-gaze to travel in x, y or z direction. We identify three temporal
properties of the binocular PoG. The first is reaction time, which is the minimum time in which vision reacts to a stimulus
position change, and is measured as the time between the event and the time the PoG leaves the proximity of the old
stimulus position. The second is the travel time of the PoG between the old and new stimulus position. The third is the
time-to-arrive, which is the time combining the reaction time, travel time, and the time required for the PoG to settle in
the new position.
We present methods for filtering PoG outliers, for deriving the PoG center from binocular eye-tracking data, and
for calculating the gaze volume as a function of the distance between the PoG and the observer. As an outcome of our
experiments we present binocular heat maps aggregated over all observers who participated in a viewing test. We also
show the mean values for all temporal properties separately for x, y and z direction averaged over all observers. We
show the typical size of a binocular area of interest for a portable autostereoscopic display, as well as the typical time in
which 3D vision can react to sudden changes in a 3D scene.
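The three temporal properties can be illustrated with a small sketch that scans gaze samples relative to a stimulus jump. All names and the proximity radius are illustrative assumptions, and settling is approximated here by first arrival at the new proximity, which slightly understates the time-to-arrive as defined above.

```python
def pog_timing(samples, t_event, old_pos, new_pos, radius=1.0):
    """Estimate reaction time, travel time, and time-to-arrive of the PoG.

    samples: list of (t, x, y, z) gaze points; t_event: time of the stimulus
    jump from old_pos to new_pos (3-tuples). radius is the proximity
    threshold (an assumed value). Returns the three times in the input time
    unit, or None if the gaze never reaches the new position.
    """
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    t_leave = t_arrive = None
    for t, *p in samples:
        if t < t_event:
            continue
        if t_leave is None and dist(p, old_pos) > radius:
            t_leave = t            # PoG leaves the old stimulus proximity
        if t_leave is not None and dist(p, new_pos) <= radius:
            t_arrive = t           # PoG first enters the new proximity
            break
    if t_leave is None or t_arrive is None:
        return None
    return (t_leave - t_event, t_arrive - t_leave, t_arrive - t_event)
```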
Most candidate methods for compressing mobile stereo video apply block-transform-based compression
based on the H.264 standard, with quantization of transform coefficients driven by a quantization parameter
(QP). The compression ratio and the resulting bit rate are directly determined by the QP level, and high
compression is achieved at the price of visually noticeable blocking artifacts. Previous studies on perceived quality
of mobile stereo video have revealed that blocking artifacts are the most annoying and most influential in the
acceptance/rejection of mobile stereo video and can even completely cancel the 3D effect and the corresponding
quality added value. In this work, we address the problem of deblocking of mobile stereo video. We modify a
powerful non-local transform-domain collaborative filtering method originally developed for denoising of images
and video. The method groups similar block patches residing in the spatial and temporal vicinity of a
reference block and filters them collaboratively in a suitable transform domain. We study the most suitable way
of finding similar patches in both channels of stereo video and suggest a hybrid four-dimensional transform to
process the collected synchronized (stereo) volumes of grouped blocks. The results benefit from the additional
correlation available between the left and right channels of the stereo video. Furthermore, additional sharpening is
applied through embedded alpha-rooting in the transform domain, which improves the visual appearance of the processed video.
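Alpha-rooting itself is compact enough to sketch. The version below operates on the FFT spectrum of a single block for self-containment, whereas the paper embeds it in the 4-D collaborative transform; the alpha value is an assumed parameter.

```python
import numpy as np

def alpha_root_sharpen(block, alpha=0.6):
    """Sharpen one image block by alpha-rooting its 2-D spectrum.

    Coefficient magnitudes are rescaled as |X| -> dc * (|X|/dc)^alpha
    (relative to the DC term) with phase kept, which boosts the weaker
    high frequencies when 0 < alpha < 1. FFT is used here only for
    self-containment; the paper applies the idea inside its own transform.
    """
    X = np.fft.fft2(block.astype(float))
    dc = np.abs(X[0, 0]) + 1e-12
    mag, phase = np.abs(X), np.angle(X)
    X_sharp = dc * (mag / dc) ** alpha * np.exp(1j * phase)
    return np.real(np.fft.ifft2(X_sharp))
```

With alpha = 1 the block is returned unchanged, which makes the sharpening strength easy to tune continuously.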
Conventional Global Positioning System (GPS) receivers operate well in open-sky environments, but their performance
degrades in urban canyons, indoors and underground due to multipath, foliage, attenuation, etc. To overcome such
situations, several enhancements have been suggested such as Assisted GPS (A-GPS). Using this approach, orbital
parameters including ephemeris and almanac along with reference time and coarse location information are provided to
GPS receivers to assist in acquisition of weak signals. To test A-GPS enabled receivers high-end simulators are used,
which are not affordable by many academic institutions. This paper presents an economical A-GPS supplement for
inexpensive simulators that operates at the application layer. In particular, the proposed solution is integrated with National
Instruments' (NI) GPS Simulation Toolkit and implemented using NI's LabVIEW environment. This A-GPS support
works for J2ME and Android platforms. The communication between the simulator and the receiver is in accordance
with the Secure User Plane Location (SUPL) protocol encapsulating the Radio Resource Location Protocol (RRLP),
which applies to Global System for Mobile Communications (GSM) and Universal Mobile Telecommunications System
(UMTS) cellular networks.
In cases of nuclear disasters it is desirable to know one's personal exposure to radioactivity and the related health risk.
Usually, Geiger-Mueller tubes are used to assess the situation. Equipping everyone with such a device in a short period
of time is very expensive. We propose a method to detect ionizing radiation using the integrated camera of a mobile
consumer device, e.g., a cell phone. In emergency cases, millions of existing mobile devices could then be used to
monitor the exposure of their owners. In combination with internet access and GPS, measured data can be collected by a
central server to get an overview of the situation.
During a measurement, the CMOS sensor of a mobile device is shielded from surrounding light by an attachment in front
of the lens or an internal shutter. The high-energy radiation produces free electrons on the sensor chip resulting in an
image signal. Through image analysis on the mobile device, signal components due to incident ionizing radiation are
separated from the sensor noise. With radioactive sources present, significant increases in detected pixels can be seen.
Furthermore, the cell phone application can make a preliminary estimate on the collected dose of an individual and the
associated health risks.
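A minimal sketch of the hot-pixel counting idea, assuming a robust median/MAD noise estimate and a 5-sigma threshold; both are illustrative choices, not the authors' calibration.

```python
import numpy as np

def count_radiation_hits(dark_frame, noise_sigma_k=5.0):
    """Count pixels plausibly hit by ionizing radiation in a shielded frame.

    dark_frame: 2-D array captured with the lens covered. Pixels more than
    noise_sigma_k robust standard deviations above the median are counted
    as hits; the threshold factor is an assumed calibration parameter.
    """
    frame = dark_frame.astype(float)
    median = np.median(frame)
    # Robust noise estimate from the median absolute deviation (MAD).
    mad = np.median(np.abs(frame - median))
    sigma = 1.4826 * mad + 1e-12
    return int(np.count_nonzero(frame > median + noise_sigma_k * sigma))
```

Repeating this per frame and accumulating hit counts over time would then feed the dose estimate mentioned above.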
This article primarily describes the development and empirical validation of a design for security warning messages on
smartphones for primary school children (7-10 years old). Our design approach for security warnings for children uses a
specific character and is based on recommendations of a paediatric expert. The design criteria are adapted to
children's skills, e.g. their visual, acoustic, and haptic perception and their literacy.
The developed security warnings are prototypically implemented in an iOS application (on the iPhone 3G/4G) where
children are warned by a simulated anti-malware background service, while they are busy with another task. For the
evaluation we selected methods for the empirical validation of the design approach from the field of usability testing ("think
aloud" test, questionnaires, log-files, etc.). Our security warnings prototype is evaluated in an empiric user study with 13
primary school children, aged between 8 and 9 years (5 girls, 8 boys). The evaluation analysis
shows that nearly all children liked the design of our security warnings. Surprisingly, for several security warning
messages most of the children reacted in the right way after reading the warning, although they could not interpret the
meaning correctly. Another interesting result is that several children associate specific information, e.g. an update, with
a specific character. Furthermore, most of the primary school test candidates showed little awareness of
security threats on smartphones. This is a strong argument for developing e.g. tutorials or websites to raise
awareness and teach children how to recognize security threats and how to react to them. Our design approach to
security warnings for children's smartphones can serve as a basis for warnings in other systems or applications, such as tutorials,
which are used by children.
In a second investigation, we focus on webpages designed for children, since smartphones and webpages (and the services
behind them) are increasingly interconnected. From this point of view, those services should continue the security approaches
of children's smartphones. The web services were evaluated against different criteria, e.g. data protection.
The results of a first investigation are reported in this paper.
Frame rate up conversion (FRC) is the process of converting between different frame rates for targeted display
formats. Besides scanning format applications for large displays, FRC can be used to increase the frame rate of
video at the receiver end for video telephony, video streaming or playback applications for mobile platforms where
bandwidth savings are crucial. Many algorithms have been proposed for decoder/receiver-side FRC. However,
most of them approach the problem from a video encoding/decoding point of view. We have systematically studied
strategies for utilizing camera 3A (auto exposure, auto white balance and auto focus) information to assist the FRC
process; in this paper we focus on the technique of using camera exposure information to assist decoder-side FRC.
In the proposed strategy, the exposure information, together with other camera 3A-related information, is packetized
as metadata that is attached to the corresponding frame and transmitted with the main video bit stream to the
decoder side for FRC assistance. The metadata contains information such as zooming, auto focus, AE (auto exposure)
and AWB (auto white balance) statistics, scene change detection, and global motion detected from motion sensors.
The proposed metadata consists of camera-specific information, which differs from simply sending motion vectors or
mode information to aid the FRC process. Compared to traditional FRC approaches used in mobile platforms, the
proposed approach is a low-complexity, low-power solution, which is crucial in resource-constrained environments
such as mobile platforms.
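The metadata packet and one decoder-side decision it enables might look as follows. Field names, the packet layout, and the luma threshold are illustrative assumptions; the paper names the metadata categories but not an exact format.

```python
from dataclasses import dataclass

@dataclass
class Camera3AMeta:
    """Per-frame camera metadata packet for decoder-side FRC assistance.

    Field names are illustrative; the paper specifies the categories
    (zoom, AF, AE, AWB statistics, scene change, sensor global motion)
    but not a wire format.
    """
    frame_id: int
    zoom_ratio: float = 1.0
    af_lens_position: float = 0.0
    ae_mean_luma: float = 0.0
    awb_gains: tuple = (1.0, 1.0, 1.0)   # per-channel R, G, B gains
    scene_change: bool = False
    global_motion: tuple = (0.0, 0.0)    # (dx, dy) from motion sensors

def should_interpolate(prev: Camera3AMeta, cur: Camera3AMeta) -> bool:
    """Decide whether to synthesize a frame between two decoded frames.

    Skip interpolation across scene changes or large exposure jumps, where
    motion-compensated interpolation tends to produce artifacts; the luma
    threshold is an assumed tuning value.
    """
    if cur.scene_change:
        return False
    return abs(cur.ae_mean_luma - prev.ae_mean_luma) < 16.0
```

Gating interpolation on camera-side signals like this is what lets the decoder avoid expensive motion re-estimation in exactly the cases where it would fail.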
A double-base number system (DBNS) has recently been introduced and investigated. This system has been
shown to have some interesting and potentially far-reaching applications in digital filtering, encryption, digital
electronics, and image enhancement. In this paper we present a new concept of generating parametric number
representations by fusing systems such as DBNS using multiplication and addition operations. We introduce
Fibonacci-like (p,q)-sequences and determine their efficiency in representing data. We develop an algorithm to test the
sparsity of fused number representation systems and explore the dual relationship between sparsity and memory. We
also consider the applications of these representations in data compression and barcoding. Simulation results are
presented to demonstrate the performance of the new class of systems. A comparison with commonly used double-base
number systems is also presented.
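For reference, the classic greedy double-base expansion that such fused systems generalize can be sketched as follows; this is the textbook greedy algorithm, not the paper's (p,q)-sequence construction.

```python
def greedy_dbns(n):
    """Greedily represent a positive integer as a sum of {2,3}-integers.

    Each term has the form 2^a * 3^b; at every step the largest such term
    not exceeding the remainder is subtracted. The sparsity of the
    resulting representation is exactly the kind of property the paper's
    algorithm tests for fused systems.
    """
    terms = []
    while n > 0:
        # Find the largest 2^a * 3^b <= n by scanning powers of 3.
        best, p3 = 1, 1
        while p3 <= n:
            t = p3
            while t * 2 <= n:
                t *= 2
            best = max(best, t)
            p3 *= 3
        terms.append(best)
        n -= best
    return terms
```

For example, 100 = 96 + 4 = (2^5 * 3) + 2^2, a two-term representation where binary would need three set bits.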
In this paper we extend the manual white balancing technique available on most imaging devices by allowing
a user to specify arbitrary colors in the scene. We derive an interpolation technique to assign weights to the
arbitrary colors which are then used to estimate the RGB complements corresponding to a white target. We
obtain the user input by displaying a captured image alongside a color grid of commonly occurring colors. The
user specifies color pairs - patches in the scene and veridical colors on the grid. We then use these pairs to
estimate the white point with our interpolation method. The estimated white point is then used to construct a
diagonal transform to determine the camera output under a desired illuminant.
We will present results from testing our methods on images acquired under several illumination conditions.
Our approach is very suitable for mobile devices because most mobile devices are equipped with moderately
sophisticated imaging systems and our method allows better color capture with relatively little user input.
Further, we can realize our method on mobile devices since these devices have built-in tools for graphical user
input. Our method can be useful in several photography and image analysis applications.
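The final diagonal-transform step can be sketched compactly. The von Kries-style scaling below is a standard formulation; the normalization convention (mapping the estimated white to equal channels) is an assumption, not necessarily the paper's exact transform.

```python
import numpy as np

def diagonal_white_balance(image, white_estimate):
    """Apply a von Kries diagonal transform from an estimated white point.

    image: HxWx3 float RGB array; white_estimate: the RGB value the method
    assigns to a white target under the capture illuminant. Each channel is
    scaled so that the estimated white maps to equal channels (an assumed
    normalization convention).
    """
    w = np.asarray(white_estimate, dtype=float)
    gains = w.mean() / w          # diagonal entries of the transform
    return np.clip(image * gains, 0.0, None)
```

The user-supplied color pairs feed the interpolation that produces `white_estimate`; everything after that is this per-channel scaling, which is why the method stays cheap enough for mobile hardware.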
We present a light-weight method for automatically detecting shapes that have an approximate rotational
symmetry (e.g., a square or equilateral triangle) on discrete-space images. Our motivation is the problem
of automatically detecting and recognizing hazardous material placards on a mobile platform (e.g., a mobile
telephone) equipped with a camera. The proposed method is
well-suited for mobile device applications,
which are characterized by limited memory, processing power and battery life. It is based on comparing the
magnitude of the coefficients of the Fourier series of the centralized moments of the Radon transform of the
image after segmentation. However, in our approach, the computation of the Radon transform is bypassed
as we obtain these coefficients directly from the rows of the Pascal Triangle of the segmented image. The
Pascal Triangle of an image is composed of complex moments arranged in a pyramidal fashion similar to
the binomial coefficients. These complex moments are obtained from a coarse segmentation of the shape
represented by a gray-scale image. In particular, the contours of the object do not need to be precisely
defined, and the shape need not be connected. Moreover, our approach is invariant under translation,
rotation, and scaling. We tested our method on images from the
MPEG-7 shape database as well as images
from our own database of hazardous material placards.
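The moment property underlying the method can be illustrated directly. The sketch below computes centralized complex moments by brute force rather than through the Pascal Triangle shortcut, and uses a standard scale normalization rather than the paper's exact formulation.

```python
import numpy as np

def complex_moments_symmetry(mask, max_order=8):
    """Score candidate rotational-symmetry orders from complex moments.

    mask: 2-D binary (or gray-scale) image of the segmented shape.
    Centralized complex moments c_p = sum f(x, y) * z^p, with
    z = (x - xc) + i(y - yc), vanish unless p is a multiple of the
    symmetry order, so a large normalized |c_p| flags p as a candidate.
    """
    ys, xs = np.nonzero(mask)
    f = mask[ys, xs].astype(float)
    xc = np.average(xs, weights=f)
    yc = np.average(ys, weights=f)
    z = (xs - xc) + 1j * (ys - yc)
    m00 = f.sum()
    scores = {}
    for p in range(2, max_order + 1):
        c = np.sum(f * z ** p)
        scores[p] = abs(c) / m00 ** ((p + 2) / 2)  # scale-invariant norm
    return scores
```

A filled square scores highest at order 4, since c_2 and c_3 cancel exactly under its 90-degree symmetry while c_4 does not.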
Focusing on digital imagery, this paper introduces a strategy to handle heterogeneous hardware in mobile environments.
Constrained system resources of most mobile viewing devices require contents that are tailored to the
requirements of the user and the capabilities of the device. Appropriate image adaptation is still an unsolved
research question. Due to the complexity of the problem, available solutions are either too resource-intensive or
inflexible to be more generally applicable.
The proposed approach is based on scalable image compression and progressive refinement as well as data
and user profiles. A scalable image is created once and used multiple times for different kinds of devices and
user requirements. Profiles available on the server side allow for an image representation that is adapted to
the most important resources in mobile computing: screen space, computing power, and the volume of the
transmitted data. Options for progressively refining content thereby allow for a fluent viewing experience during
adaptation. Due to its flexibility and low complexity, the proposed solution is much more general compared
to related approaches. To document the advantages of our approach we provide empirical results obtained in
experiments with an implementation of the method.
This paper is dedicated to entropy coding for scalable video compression based on three-dimensional discrete
wavelet transform (3-D DWT). A new simple bit-plane entropy coding of wavelet subband matrices is proposed.
Practical results show that a 3-D DWT video codec with the proposed entropy coding increases the encoding
speed 2-3 times at the same quality level in comparison with the x264 codec, one of the fastest software
implementations of the H.264/AVC standard.
Establishing correspondences between two hyper-graphs is a fundamental issue in computer vision, pattern recognition,
and machine learning. A hyper-graph is modeled by a feature set in which complex relations are represented by hyper-edges.
Hence, a match between two vertex sets determines a hyper-graph matching problem. We propose a new
bidirectional probabilistic hyper-graph matching method using Bayesian inference principle. First, we formulate the
corresponding hyper-graph matching problem as the maximization of a matching score function over all permutations of
the vertices. Second, we establish an algebraic relation between the hyper-edge weight matrices and derive the desired
vertex-to-vertex probabilistic matching algorithm using Bayes' theorem. Third, we apply the well-known convex
relaxation procedure with the probabilistic soft matching matrix to obtain a complete hard matching result. Finally, we have
conducted the comparative experiments on synthetic data and real images. Experimental results show that the proposed
method clearly outperforms existing algorithms especially in the presence of noise and outliers.
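The final soft-to-hard step can be illustrated with a simple greedy rounding; the paper's convex relaxation is more sophisticated, so this is only a stand-in discretization, not the authors' procedure.

```python
import numpy as np

def harden_soft_matching(P):
    """Round a probabilistic (soft) matching matrix to a hard assignment.

    P[i, j] is the soft score that vertex i of graph A matches vertex j of
    graph B (square matrix assumed). Greedy rounding repeatedly takes the
    largest remaining entry and eliminates its row and column; an optimal
    alternative would be the Hungarian algorithm.
    """
    P = np.array(P, dtype=float)
    n = P.shape[0]
    match = [-1] * n
    for _ in range(n):
        i, j = np.unravel_index(np.argmax(P), P.shape)
        match[i] = int(j)
        P[i, :] = -np.inf   # this row and column may no longer be chosen
        P[:, j] = -np.inf
    return match
```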
In this paper, we present a novel approach for the adaptation of large images to small display sizes. As a recent
study suggests, most viewers prefer the loss of content over the insertion of deformations in the retargeting
process.1 Therefore, we combine the two image retargeting operators seam carving and cropping in order to
resize an image without manipulating the important objects in an image at all. First, seams are removed carefully
until a dynamic energy threshold is reached to prevent the creation of visible artifacts. Then, a cropping window
is selected in the image that has the smallest possible window size without having the removed energy rise above
a second dynamic threshold. As the number of removed seams and the size of the cropping window are not fixed,
the process is repeated iteratively until the target size is reached. Our results show that by using this method,
more important content of an image can be included in the cropping window than in normal cropping. The
"squeezing" of objects which might occur in approaches based on warping or scaling is also prevented.
We present in this paper a sample quality control approach for the case of using a mobile phone's camera as a fingerprint
sensor for fingerprint recognition. Our approach directly estimates the maximum ridge frequency orientation from the
amplitude-frequency features of the Fast Fourier Transform and takes the frequency features' difference in two
perpendicular orientations as a distinguishing feature for ridge-like patterns. Then a decision criterion which combines
the frequency components' energy and ridge orientation features is used to determine if an image block should be
classified as high-quality fingerprint area or not. The number of such high-quality blocks can thus be used to indicate the
whole fingerprint sample's quality. Experiments show this approach's effectiveness in distinguishing the high-quality
blocks from other low-quality ones or from the background area. A mapping of the quality metric to the sample utility, as
derived from the NIST minutiae extractor "mindtct", is also given to verify the approach's quality prediction effectiveness.
Keywords: Fingerprint, quality assessment, mobile phone camera
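The orientation-contrast idea behind the block criterion can be sketched as a spectral energy ratio; the frequency band, the angular sector width, and the ratio itself are simplified stand-ins for the paper's actual decision criterion.

```python
import numpy as np

def ridge_block_quality(block, band=(0.05, 0.25)):
    """Score a block's ridge-likeness from its FFT amplitude spectrum.

    The dominant in-band spectral peak direction is found, then band
    energy along that direction is compared with energy in the
    perpendicular direction; strongly oriented blocks (ridges) give a
    high ratio. Band limits and sector width are assumed parameters.
    """
    F = np.abs(np.fft.fftshift(np.fft.fft2(block - block.mean())))
    h, w = F.shape
    yy, xx = np.mgrid[0:h, 0:w]
    fy, fx = (yy - h // 2) / h, (xx - w // 2) / w
    r = np.hypot(fx, fy)
    in_band = (r > band[0]) & (r < band[1])
    # Direction of the strongest in-band coefficient.
    py, px = np.unravel_index(np.argmax(np.where(in_band, F, 0)), F.shape)
    theta = np.arctan2(fy[py, px], fx[py, px])
    ang = np.arctan2(fy, fx)

    def energy(t):
        d = np.abs(np.angle(np.exp(1j * (ang - t))))  # angular distance
        sector = in_band & ((d < 0.3) | (d > np.pi - 0.3))
        return F[sector].sum()

    return energy(theta) / (energy(theta + np.pi / 2) + 1e-12)
```

Thresholding this ratio per block and counting the blocks that pass would yield a whole-sample quality indicator in the spirit of the approach above.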
This paper deals with the forensic examination of Android smartphones. The structure of the Android system
was analyzed and a forensic guide was created. As an example this guide was used to examine a HTC Desire.
The conclusion of this paper is the fact that all data stored on the smartphone can be examined. The main
problem is that some of the used procedures lack forensic requirements.
This paper deals with forensic investigation of stored location data collected by Android mobile devices. The main
aspects of the study are the extraction and examination of the location data and the possibilities for additional use of the extracted data.
Nowadays, mobile phones are the most widely used portable devices; they evolve very fast, adding new features and
improving user experiences. The latest generation of hand-held devices, called smartphones, is equipped with superior
memory, cameras and rich multimedia features, empowering people to use their mobile phones not only as a
communication tool but also for entertainment purposes. With many young students showing interest in learning mobile
application development, one should introduce novel learning methods that may adapt to fast technology changes and
introduce students to application development. Mobile phones have become common devices, and the engineering
community incorporates phones in various solutions. To overcome the limitations of conventional undergraduate
electrical engineering (EE) education, this paper explores the concept of template-based education in mobile phone
programming. The concept is based on developing small exercise templates that students can manipulate and revise for
a quick hands-on introduction to application development and integration. The Android platform is used as a popular
open-source environment for application development. The exercises relate to image processing topics typically studied
by many students. The goal is to enable conventional course enhancements by incorporating in them short hands-on exercises.
Many multimedia processing algorithms, as well as communication algorithms implemented in mobile devices, rely on
intensive use of linear algebra methods and, in particular, require computing a large number of inner
products in real time. Among the most efficient approaches for performing inner products are the Associative Computing
(ASC) approach and the Distributed Arithmetic (DA) approach. In ASC, computations are performed on Associative Processors
(ASP), where Content-Addressable memories (CAMs) are used instead of traditional processing elements to perform
basic arithmetic operations. In the DA approach, computations are reduced to look-up table reads with respect to binary
planes of inputs. In this work, we propose a modification of Associative processors that supports efficient
implementation of the DA method. Thus, the two powerful methods are combined to further improve the efficiency of
multiple inner product computation. Computational complexity analysis of the proposed method illustrates significant
speed-up when computing multiple inner products as compared both to the pure ASC method and to the pure DA method
as well as to other state-of-the-art traditional methods for inner product calculation.
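The DA half of the combination is easy to illustrate in software: a 2^K-entry look-up table replaces the multiplications, and bit planes of the inputs address it. Unsigned inputs are assumed for brevity; this is a generic DA sketch, not the proposed associative-processor modification.

```python
def da_inner_product(coeffs, xs, bits=8):
    """Compute an inner product by Distributed Arithmetic table look-ups.

    A 2^K-entry table stores every possible sum of the K fixed
    coefficients; the b-th bit plane of the unsigned inputs selects one
    entry per bit position, and the shifted entries are accumulated.
    Signed/fractional input handling is omitted for brevity.
    """
    K = len(coeffs)
    # Precompute the LUT: entry m holds the sum of coeffs[k] over set bits of m.
    lut = [sum(c for k, c in enumerate(coeffs) if m >> k & 1)
           for m in range(1 << K)]
    acc = 0
    for b in range(bits):
        addr = sum(((x >> b) & 1) << k for k, x in enumerate(xs))
        acc += lut[addr] * (1 << b)   # shift-accumulate one bit plane
    return acc
```

In hardware the inner loop becomes one memory read and one shift-add per bit plane, which is what makes a CAM-based (associative) memory an attractive host for the table.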
This paper deals with forensically interesting features of the Microsoft Xbox 360 game console. The construction
and the internal structure are analysed more precisely. One of the main aspects of the study is to analyse the
used file system which was examined for forensic features. Possible difficulties that might be of importance to
the forensic investigator are discussed.
This paper deals with forensically interesting features of the Sony Playstation 3 game console. The construction and the
internal structure are analyzed more precisely. Interesting forensic features of the operating system and the file system
are presented. Differences between a PS3 with and without jailbreak are introduced and possible forensic attempts when
using an installed Linux are discussed.
We present here the results obtained by including a new image descriptor, which we call the prosemantic feature
vector, within the framework of the QuickLook2 image retrieval system. By coupling the prosemantic features and
the relevance feedback mechanism provided by QuickLook2, the user can move in a more rapid and precise way
through the feature space toward the intended goal. The prosemantic features are obtained by a two-step feature
extraction process. At the first step, low level features related to image structure and color distribution are
extracted from the images. At the second step, these features are used as input to a bank of classifiers, each
one trained to recognize a given semantic category, to produce score vectors. We evaluated the efficacy of the
prosemantic features under search tasks on a dataset provided by Fratelli Alinari Photo Archive.
Traffic sign inventories are important to governmental agencies as they facilitate evaluation of traffic sign locations
and are beneficial for road and sign maintenance. These inventories can be created (semi-)automatically based
on street-level panoramic images. In these images, object detection is employed to detect the signs in each
image, followed by a classification stage to retrieve the specific sign type. Classification of traffic signs is a
complicated matter, since sign types are very similar with only minor differences within the sign, a high number of
different signs is involved and multiple distortions occur, including variations in capturing conditions, occlusions,
viewpoints and sign deformations. Therefore, we propose a method for robust classification of traffic signs, based
on the Bag of Words approach for generic object classification. We extend the approach with a flexible, modular
codebook to model the specific features of each sign type independently, in order to emphasize the inter-sign
differences instead of the parts common to all sign types. Additionally, this allows us to model and label the
false detections present. Furthermore, analysis of the classification output identifies the unreliable results. This
classification system has been extensively tested for three different sign classes, covering 60 different sign types
in total. These three data sets contain the sign detection results on street-level panoramic images, extracted
from a country-wide database. The introduction of the modular codebook shows a significant improvement for
all three sets, where the system is able to classify about 98% of the reliable results correctly.
We model the sequence of human actions operating an infusion pump using a Markovian conditional exponential model.
We divide each video recorded by a camera into video action units. A video action unit spans from the start of a unique
human action operating the infusion pump to the end of that action. We calculate the MOSIFT features of the video
action units, which combine the spatial and temporal dimensions of the videos. We vector quantize the MOSIFT
features of the video action units using K-means clustering to form video codebook elements. We estimate
the conditional exponential model parameters from a training set using maximum entropy constraint and use the video
codebook elements as maximum entropy constraint features. We estimate the parameters of the Markovian conditional
exponential model from a training set. This Markovian conditional exponential model has 6 states which correspond to
the 6 classes of infusion pump operation. To find the optimal state sequence of the Markovian conditional exponential
model we use the Viterbi algorithm. This optimal state sequence corresponds to the class label sequence. The infusion
pump operation is recorded from 4 video cameras. We report classification results for the 6 classes of infusion pump
operation using both the conditional exponential model and the Markovian conditional exponential model for the 4
video cameras. The classification performance of the Markovian conditional exponential model is better than that of
the conditional exponential model.
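The decoding step is standard Viterbi dynamic programming; the generic sketch below is not tied to the authors' trained parameters, and per-state observation scores stand in for the conditional exponential model's class scores.

```python
def viterbi(obs_logprob, trans_logprob, init_logprob):
    """Find the most likely state (class-label) sequence.

    obs_logprob[t][s]: log-score of observation t under state s;
    trans_logprob[s][s2] and init_logprob[s] define the Markov chain.
    Standard Viterbi dynamic programming with back-pointers.
    """
    T, S = len(obs_logprob), len(init_logprob)
    delta = [init_logprob[s] + obs_logprob[0][s] for s in range(S)]
    back = []
    for t in range(1, T):
        new, ptr = [], []
        for s in range(S):
            best_prev = max(range(S),
                            key=lambda p: delta[p] + trans_logprob[p][s])
            new.append(delta[best_prev] + trans_logprob[best_prev][s]
                       + obs_logprob[t][s])
            ptr.append(best_prev)
        delta, back = new, back + [ptr]
    # Trace back from the best final state.
    path = [max(range(S), key=lambda s: delta[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

With 6 states for the 6 operation classes, the returned state sequence is directly the class-label sequence described above.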
In this work, we propose a novel approach to automatically detect a swimmer and continuously estimate his/her pose
in order to derive an estimate of his/her stroke rate, assuming we observe the swimmer from the side.
We divide a swimming cycle of each stroke into several intervals. Each interval represents a pose of the stroke.
We use specifically trained object detectors to detect each pose of a stroke within a video and count the number
of occurrences per time unit of the most distinctive poses (so-called key poses) of a stroke to continuously infer
the stroke rate. We extensively evaluate the overall performance and the influence of the selected poses for all
swimming styles on a data set consisting of a variety of swimmers.
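The final stroke-rate inference from key-pose detections reduces to counting detector firings per time unit; in the sketch below the window length is an assumed smoothing parameter, and one firing per stroke cycle is assumed.

```python
def stroke_rate(key_pose_times, window=10.0):
    """Estimate strokes per minute from key-pose detection timestamps.

    key_pose_times: sorted times (seconds) at which the detector fired on
    the chosen key pose, with one firing expected per stroke cycle. The
    rate is the count inside a trailing window, converted to strokes per
    minute; the window length is an assumed smoothing parameter.
    """
    if not key_pose_times:
        return 0.0
    t_end = key_pose_times[-1]
    recent = [t for t in key_pose_times if t > t_end - window]
    return 60.0 * len(recent) / window
```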
In this paper, we propose a multi-view face detection system that locates head positions and indicates the direction of each face in
3-D space over a multi-camera surveillance system. To locate 3-D head positions, conventional methods relied on face detection in 2-D images and projected the face regions back to 3-D space for correspondence. However, inevitable false face detections and rejections usually degrade the system performance. Instead, our system searches for heads and face directions over the 3-D space using a sliding cube. Each searched 3-D cube is projected onto the
2-D camera views to determine the existence and direction of human faces. Moreover, a pre-processing step that estimates the locations of candidate targets is introduced to speed up the searching process over the 3-D space. In summary, our proposed method can efficiently fuse multi-camera information and suppress the ambiguity caused by detection errors. Our evaluation shows that the proposed approach can efficiently indicate the head position and face direction in real video sequences even under serious occlusion.
This paper proposes a novel method to generate keyframes from cartoon animation with the aim of improving the detail
and accuracy of the content represented by keyframes. Since general video summarization techniques usually drop
some important content due to aspect-ratio restrictions, this paper proposes a new method that uses panorama
technology to include more detail in each keyframe. The concept is to mark time codes based
on shot boundary and optical flow direction. The period of time between every two consecutive marked time codes is
used to form a shot sequence which is actually a sequence of frames. The global and local optical flows are also used to
determine how to select the frames and when to stitch them together according to the rules. The results of the
proposed method are keyframes generated from various types of cartoon animation, which are outstanding compared to
their comic adaptations.
This PDF file contains the front matter associated with SPIE Proceedings Volume 8304, including the Title Page, Copyright information, Table of Contents, Introduction, and Conference Committee listing.