Fusing video and text data by integrating appearance and behavior similarity (28 May 2013)
In this paper, we describe an algorithm for multi-modal entity co-reference resolution and present experimental results using text and motion imagery data sources. Our model generates probabilistic associations between entities mentioned in text and entities detected in video data by jointly optimizing measures of appearance and behavior similarity. Appearance similarity is calculated as a match between proposition-derived entity attributes mentioned in text and the object appearance classification from video sources. Behavior similarity is calculated from semantic information about entity movements, actions, and interactions with other entities mentioned in text and detected in video sources. Our model achieved a 79% F-score for text-to-video entity co-reference resolution; we show that entity interactions provide unique features for resolving the variability present in text data and the ambiguity of the visual appearance of entities.
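As an illustration only (not the authors' implementation), the fusion step described above can be sketched as a weighted combination of the two similarity measures, followed by picking the best-scoring video track for each text mention. The mixing weight `alpha`, the mention/track names, and the similarity values below are all hypothetical; the paper jointly optimizes both terms rather than using a fixed weight.

```python
# Sketch: fuse appearance and behavior similarity into one score and resolve
# text-to-video entity co-reference by choosing the highest-scoring video
# track for each text mention. All names and values are illustrative.

def fuse_similarity(appearance, behavior, alpha=0.5):
    """Weighted combination of appearance and behavior similarity.

    `alpha` is a hypothetical mixing weight; the paper instead jointly
    optimizes both similarity measures."""
    return {
        (m, t): alpha * appearance[(m, t)] + (1 - alpha) * behavior[(m, t)]
        for (m, t) in appearance
    }

def resolve_coreference(mentions, tracks, appearance, behavior, alpha=0.5):
    """Associate each text mention with its most similar video track."""
    score = fuse_similarity(appearance, behavior, alpha)
    return {m: max(tracks, key=lambda t: score[(m, t)]) for m in mentions}

# Toy example: two text mentions, two detected video tracks.
mentions = ["white truck", "man in red shirt"]
tracks = ["track_1", "track_2"]
appearance = {("white truck", "track_1"): 0.9, ("white truck", "track_2"): 0.2,
              ("man in red shirt", "track_1"): 0.1, ("man in red shirt", "track_2"): 0.8}
behavior = {("white truck", "track_1"): 0.7, ("white truck", "track_2"): 0.3,
            ("man in red shirt", "track_1"): 0.2, ("man in red shirt", "track_2"): 0.6}

print(resolve_coreference(mentions, tracks, appearance, behavior))
# → {'white truck': 'track_1', 'man in red shirt': 'track_2'}
```

In this toy case the appearance and behavior cues agree; the paper's contribution is that behavior similarity (movements, actions, interactions) disambiguates cases where appearance alone does not.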
© 2013 Society of Photo-Optical Instrumentation Engineers (SPIE).
Georgiy Levchuk and Charlotte Shabarekh, "Fusing video and text data by integrating appearance and behavior similarity", Proc. SPIE 8751, Machine Intelligence and Bio-inspired Computation: Theory and Applications VII, 875107 (28 May 2013); https://doi.org/10.1117/12.2014878

