Human lip-readers are increasingly presented as a useful source of forensic evidence but, like all humans, they are unreliable. Here we report the results of a long-term study of automatic lip-reading with the objective of converting video to text (V2T). The V2T problem is surprising: some aspects that look tricky, such as real-time tracking of the lips in poor-quality interlaced video from hand-held cameras, prove to be relatively tractable, whereas speaker-independent lip-reading is very demanding because of unpredictable variations between people. We review the problem of automatic lip-reading for crime fighting and identify its critical parts.
A recent trend in law enforcement has been the use of forensic lip-readers. Criminal activities are often recorded
on CCTV or other video-gathering systems, and knowing what suspects are saying enriches the evidence gathered.
However, lip-readers are, by their own admission, fallible. Building on long-term studies of automated lip-reading,
we are therefore investigating the possibilities and limitations of applying this technique under realistic
conditions. We have adopted a step-by-step approach and are developing a capability for the case where prior
video of the suspect of interest is available. We use the term video-to-text (V2T) for this technique, by analogy
with speech-to-text (S2T), which also has applications in security and law enforcement.
Optics and Photonics for Counter-Terrorism and Crime-Fighting