Video segmentation using keywords
13 April 2018
Proceedings Volume 10696, Tenth International Conference on Machine Vision (ICMV 2017); 106960U (2018) https://doi.org/10.1117/12.2310102
Event: Tenth International Conference on Machine Vision, 2017, Vienna, Austria
At the DAVIS-2016 Challenge, many state-of-the-art video segmentation methods achieved promising results, but they still depend heavily on annotated frames to distinguish the foreground from the background, and creating these annotations accurately takes considerable time and effort. In this paper, we introduce a method that segments objects in a video based on keywords given by the user. First, we use YOLOv2, a real-time object detection system, to identify regions in the first frame that contain objects whose labels match the given keywords. Then, for each region identified in the previous step, we use the Pyramid Scene Parsing Network (PSPNet) to classify each pixel as foreground or background. The resulting annotated frames can be used as input to the Object Flow algorithm to segment the entire video. We conduct experiments on a subset of the DAVIS-2016 dataset that is half its original size; the results show that our method handles many popular classes of the PASCAL VOC 2012 dataset with an acceptable accuracy of about 75.03%. In future work, we suggest more extensive testing and combining our method with other methods to improve this result.
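The first-frame step described above (keep only detections whose labels match the user's keywords, segment each kept region into foreground and background, and paste the region masks back into one full-frame mask that a propagation method such as Object Flow could consume) can be sketched as follows. This is a minimal, hypothetical sketch, not the authors' implementation: detections are assumed to arrive as (label, score, box) records rather than from a real YOLOv2 model, `segment_region` is a hypothetical callable standing in for PSPNet, the 0.5 score threshold is an arbitrary assumption, and Object Flow propagation is not implemented. All names (`Detection`, `filter_by_keywords`, `first_frame_mask`, `jaccard`) are illustrative. The `jaccard` helper computes intersection over union, which is DAVIS-2016's region similarity J; the abstract does not state which metric its 75.03% figure refers to.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

import numpy as np


@dataclass
class Detection:
    # Assumption: produced elsewhere by an object detector such as YOLOv2.
    label: str                       # class name, e.g. "dog"
    score: float                     # detector confidence in [0, 1]
    box: Tuple[int, int, int, int]   # (x0, y0, x1, y1) in pixels; x1, y1 exclusive


def filter_by_keywords(detections: Iterable[Detection], keywords: Iterable[str],
                       min_score: float = 0.5) -> List[Detection]:
    """Keep confident detections whose label matches a user keyword (case-insensitive)."""
    wanted = {k.strip().lower() for k in keywords}
    return [d for d in detections
            if d.label.lower() in wanted and d.score >= min_score]


# Hypothetical stand-in for PSPNet: takes an image crop of shape (h, w, 3)
# and returns a boolean (h, w) array marking foreground pixels.
RegionSegmenter = Callable[[np.ndarray], np.ndarray]


def first_frame_mask(frame: np.ndarray, detections: Iterable[Detection],
                     segment_region: RegionSegmenter) -> np.ndarray:
    """Union of per-region foreground masks, placed back into full-frame coordinates."""
    h, w = frame.shape[:2]
    mask = np.zeros((h, w), dtype=bool)
    for d in detections:
        x0, y0, x1, y1 = d.box
        # Clip the box to the frame so slicing never goes out of bounds.
        x0, y0, x1, y1 = max(0, x0), max(0, y0), min(w, x1), min(h, y1)
        if x1 <= x0 or y1 <= y0:
            continue  # box lies entirely outside the frame
        region = segment_region(frame[y0:y1, x0:x1])
        mask[y0:y1, x0:x1] |= region.astype(bool)
    return mask


def jaccard(pred: np.ndarray, gt: np.ndarray) -> float:
    """Region similarity J: intersection over union of two boolean masks."""
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return float(np.logical_and(pred, gt).sum() / union)


if __name__ == "__main__":
    frame = np.zeros((8, 8, 3), dtype=np.uint8)
    frame[2:6, 2:6] = 255  # a bright square plays the role of the object
    dets = [Detection("dog", 0.9, (1, 1, 7, 7)), Detection("car", 0.8, (0, 0, 2, 2))]
    kept = filter_by_keywords(dets, ["Dog"])
    # Toy segmenter for the demo: bright pixels are foreground.
    mask = first_frame_mask(frame, kept, lambda crop: crop.mean(axis=2) > 127)
    gt = np.zeros((8, 8), dtype=bool)
    gt[2:6, 2:6] = True
    print(len(kept), int(mask.sum()), jaccard(mask, gt))  # prints: 1 16 1.0
```

Passing the region segmenter in as a callable keeps the keyword filtering and mask assembly independent of any particular detection or segmentation model, so a real PSPNet wrapper could be swapped in without changing the surrounding logic.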
© (2018) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Vinh Ton-That, Chi-Tai Vong, Xuan-Truong Nguyen-Dao, Minh-Triet Tran, "Video segmentation using keywords", Proc. SPIE 10696, Tenth International Conference on Machine Vision (ICMV 2017), 106960U (13 April 2018); doi: 10.1117/12.2310102; https://doi.org/10.1117/12.2310102
