The increased sensing and computing capabilities of mobile devices can provide an enhanced mobile user experience. Integrating data from different sensors offers a way to improve performance in camera-based applications. A key advantage of using cameras as an input modality is that they enable context recognition, so computer vision has traditionally been utilized in user interfaces to observe and automatically detect user actions. Imaging applications can also make use of various sensors to improve the interactivity and robustness of the system. In this context, two applications fusing sensor data with the results of video analysis have been implemented on a Nokia Nseries mobile device. The first is a real-time user interface for browsing large images; it lets the display be controlled by the motion of the user's hand, using the built-in sensors as complementary information. The second is a real-time panorama builder that uses the device's accelerometers to improve the overall quality and also provides instructions during capture. The experiments show that fusing sensor data improves camera-based applications, especially when conditions are not optimal for approaches relying on camera data alone.
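The abstract above does not spell out how the sensor data and the video analysis are combined. A minimal sketch of one common complementary-fusion scheme is shown below: the camera-based motion estimate is trusted in proportion to its confidence, and the accelerometer-derived estimate takes over when imaging conditions degrade. The function name, the confidence weighting, and the threshold are illustrative assumptions, not the paper's implementation.

```python
def fuse_motion(camera_dx, camera_conf, accel_dx, conf_threshold=0.5):
    """Blend a camera-based motion estimate with an accelerometer-based
    one, weighting by the confidence of the visual measurement.
    All parameters and the weighting scheme are illustrative."""
    w = max(0.0, min(1.0, camera_conf))
    if w < conf_threshold:
        # Poor imaging conditions (blur, low light): trust the sensor more.
        w *= 0.5
    return w * camera_dx + (1.0 - w) * accel_dx
```

With full visual confidence the camera estimate is used as-is; with zero confidence the accelerometer estimate is returned unchanged.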
The future multimodal user interfaces of battery-powered mobile devices are expected to require computationally costly image analysis techniques. Graphics Processing Units are very well suited for parallel processing, and the addition of programmable stages and high-precision arithmetic provides opportunities to implement complete algorithms energy-efficiently. The first mobile graphics accelerators with programmable pipelines are now available, enabling GPGPU implementations of several image processing algorithms. In this context, we consider a face tracking approach that uses efficient gray-scale invariant texture features and boosting. The solution is based on Local Binary Pattern (LBP) features and makes use of the GPU in the pre-processing and feature extraction phases. We have implemented a series of image processing techniques in the shader language of OpenGL ES 2.0, compiled them for a mobile graphics processing unit, and performed tests on a mobile application processor platform (OMAP3530). In our contribution, we describe the challenges of designing on a mobile platform, present the performance achieved, and provide measurement results for the actual power consumption in comparison to using the CPU (ARM) on the same platform.
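The LBP operator referenced above thresholds each pixel's neighbours against the centre value and packs the results into a binary code. The basic 3x3 variant can be sketched in a few lines; this is a pure-Python illustration of the operator itself, not the OpenGL ES 2.0 shader implementation described in the abstract, and the bit ordering chosen here is an assumption.

```python
def lbp_code(img, y, x):
    """Basic 8-neighbour Local Binary Pattern: threshold each neighbour
    of pixel (y, x) against the centre value and pack the results into
    one byte, clockwise from the top-left neighbour (ordering is an
    illustrative choice)."""
    c = img[y][x]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        if img[y + dy][x + dx] >= c:
            code |= 1 << bit
    return code
```

Because the code depends only on the sign of local differences, it is invariant to monotonic gray-scale changes, which is the robustness property the abstract refers to.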
Since more processing power and new sensing and display technologies are already available in mobile devices, there has been increased interest in building systems that communicate via different modalities such as speech, gesture, expression, and touch. In user interfaces based on context identification, these independent modalities are combined to create new ways for users to interact with handheld devices. While such interfaces are unlikely to completely replace traditional ones, they can considerably enrich and improve the user experience and task performance. We demonstrate a set of novel user interface concepts that rely on the multiple built-in sensors of modern mobile devices to recognize the context and sequences of actions. In particular, we use the camera to detect whether the user is watching the device, for instance to decide whether to turn on the display backlight. In our approach, the motion sensors are first employed to detect handling of the device. Then, based on ambient illumination information provided by a light sensor, the cameras are turned on. The front camera is used for face detection, while the back camera provides supplemental contextual information. The subsequent applications triggered by the context can be, for example, image capture or bar code reading.
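The staged sensing described above (motion sensors, then light sensor, then camera) forms a cascade in which cheap always-on sensors gate the expensive camera-based check. The steps can be sketched as follows; the function names, the lux threshold, and the callable interface are illustrative assumptions rather than the paper's implementation.

```python
def should_enable_backlight(device_moving, ambient_lux, detect_face):
    """Cascaded context check: cheap sensors gate the expensive one.
    The camera-based face check (a callable) is invoked only when the
    motion sensors report handling and there is enough ambient light
    to image a face. The 10-lux threshold is an illustrative value."""
    if not device_moving:      # motion sensors: device is idle
        return False
    if ambient_lux < 10:       # light sensor: too dark for the camera
        return False
    return detect_face()       # front camera is powered on only now
```

Passing the face detector as a callable mirrors the power-saving intent: the camera pipeline is never started unless the earlier, cheaper stages have already fired.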
Modern mobile communication devices frequently contain built-in cameras that allow users to capture high-resolution still images, but at the same time the imaging applications face both usability and throughput bottlenecks. The difficulty of taking ad hoc pictures of printed paper documents with a multi-megapixel camera phone, a common business use case, illustrates these problems. The result can be examined only after several seconds and is often blurry, so a new picture is needed even though the view-finder image had looked good. The process can be frustrating, with long waits and no way for the user to predict the quality beforehand. The problems can be traced to the mismatch between processor speed and camera resolution, and to the interactivity demands of the application. In this context, we analyze building mosaic images of printed documents from frames selected from VGA resolution (640x480 pixel) video. High interactivity is achieved by providing real-time feedback on quality while simultaneously guiding the user's actions. The graphics processing unit of the mobile device can be used to speed up the reconstruction computations. To demonstrate the viability of the concept, we present an interactive document scanning application implemented on a Nokia N95 mobile phone.
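The abstract mentions real-time feedback on frame quality but does not specify the metric. One common, cheap choice is a gradient-energy sharpness measure that rejects blurry frames before they enter the mosaic; the sketch below, including the function names and threshold, is an assumption for illustration, not the application's actual metric.

```python
def sharpness(gray):
    """Crude focus measure: mean squared horizontal and vertical
    differences of a 2-D grayscale image (a list of rows). Blurry
    frames score low, so the UI can ask the user to hold steady."""
    h, w = len(gray), len(gray[0])
    total = 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                total += (gray[y][x + 1] - gray[y][x]) ** 2
            if y + 1 < h:
                total += (gray[y + 1][x] - gray[y][x]) ** 2
    return total / (h * w)

def select_frames(frames, threshold=50.0):
    """Keep a frame for the mosaic only if it is sharp enough.
    The threshold is an illustrative value."""
    return [f for f in frames if sharpness(f) >= threshold]
```

On the device, a check like this would run per video frame, which is why the abstract points to the GPU for the heavier reconstruction computations.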
Video applications on mobile communication devices have usually been designed for content creation, access, and playback. For instance, many recent mobile devices replicate the functionality of portable video cameras, video recorders, and digital TV receivers. These are all demanding uses, but nothing new from the consumer's point of view. However, many current devices have two built-in cameras, one for capturing high-resolution images and the other for lower-resolution, typically VGA (640x480 pixel), video telephony. We employ video to enable new applications and describe four actual solutions implemented on mobile communication devices. The first is a real-time motion-based user interface that can be used for browsing large images or documents, such as maps, on small screens; the motion information is extracted from the image sequence captured by the camera. The second solution is a real-time panorama builder, while the third assembles document panoramas, both from individual video frames. The fourth solution is a real-time face and eye detector. It provides another type of foundation for motion-based user interfaces, as knowledge of the presence and motion of a human face in the view of the camera can be a powerful application enabler.
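In the motion-based browsing interface described above, the motion extracted from the camera's image sequence drives the visible window over a large image or map. The mapping can be sketched as a clamped pan; the function, its `gain` parameter, and the coordinate convention are illustrative assumptions, not the implemented interface.

```python
def pan_viewport(view_x, view_y, motion_dx, motion_dy,
                 image_w, image_h, view_w, view_h, gain=1.0):
    """Map an estimated camera motion vector to a pan of the visible
    window over a large image, clamped to the image borders. The gain
    parameter (an assumption) scales pixels of camera motion to pixels
    of pan on the document."""
    new_x = view_x + gain * motion_dx
    new_y = view_y + gain * motion_dy
    # Keep the viewport entirely inside the image.
    new_x = max(0, min(image_w - view_w, new_x))
    new_y = max(0, min(image_h - view_h, new_y))
    return new_x, new_y
```

Clamping at the borders gives the user a natural "edge of the map" stop instead of scrolling into empty space.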