With the growing mobility of the population and the popularity of the Internet, real estate agents have ever larger image databases to manage. This paper presents a transfer-learning solution for classifying the images of a house collected by a real estate agent into five categories: living room, kitchen, bathroom, layout sketch, and external appearance. The pictures resemble those posted on real estate websites to show prospective buyers what a house looks like inside and outside. We employ a transfer learning approach based on the VGG-19 architecture: starting from a network pre-trained on the general ImageNet dataset, we perform supervised fine-tuning on the last fully connected layer, changing its output size from 1000 to 5. Experimental results with 5-fold cross-validation show that this fine-tuning approach achieves a high test accuracy of 99.4%.
Human detection is a technology used in various fields. There is, however, a trade-off between detection accuracy and precise extraction of human regions. The purpose of this paper is therefore to extract precise human regions using either a Coarse-to-Fine Method or a Human Skeleton Method. In the Coarse-to-Fine Method, a Coarse Detector first detects humans in multiple postures; a Fine Detector then extracts precise human regions. In the Human Skeleton Method, human skeletons are first extracted using OpenPose. Next, the skeleton images are dilated according to the person's physique. Finally, human regions are extracted using GrabCut. The extracted results are evaluated in terms of detection accuracy and extraction preciseness, measured by F-measure and IoU (Intersection over Union), respectively. The Coarse-to-Fine Method achieves a detection accuracy of 0.523 and an extraction preciseness of 0.807, while the Human Skeleton Method reaches a detection accuracy of 0.928 and an extraction preciseness of 0.868. The Human Skeleton Method thus performs excellently in both detection accuracy and extraction preciseness.
This paper describes an assistance method for annotating sign language words using binary action segmentation. Binary action segmentation divides a sign video into binary units corresponding to signing motion and static posture. The user's annotation task is thereby reduced from fully manual work to inputting labels and correcting the segmented units. The proposed binary action segmentation combines a Support Vector Machine and Graphcuts: the trained Support Vector Machine classifies each frame as Motion or Pause, and Graphcuts refines the initial segmentation. We evaluated the proposed method on a Japanese sign language word database containing 92 words signed by ten native signers. Of the 4,590 videos in total, 3,800 videos of 76 words were used for the evaluation, excluding videos with recording and signing errors. The proposed method achieves results comparable to the previous method with a smaller amount of training data. Moreover, with an annotation interface, the work reduction ratios of the annotation tasks were 26.17%, 26.34%, and 17.88% for the sets whose numbers of segmented units were 2, 3, and 4, respectively.
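The SVM-then-refine pipeline can be sketched as follows. This is an illustrative stand-in, not the paper's code: the features are hypothetical, and the Graphcuts refinement is replaced by an equivalent binary Potts model on the 1-D frame chain solved exactly by dynamic programming (for a binary chain this gives the same optimum as a graph cut):

```python
import numpy as np
from sklearn.svm import SVC

def segment_motion_pause(features, labels, test_features, smooth=1.0):
    """Label each frame Pause (0) or Motion (1), then smooth the sequence.

    features/labels: training frames and their Motion/Pause labels;
    test_features: frames of one video; smooth: penalty for a label change
    between neighboring frames. All names and the feature design are
    illustrative assumptions, not the paper's.
    """
    svm = SVC(probability=True).fit(features, labels)
    prob = svm.predict_proba(test_features)          # (T, 2) posteriors
    unary = -np.log(np.clip(prob, 1e-9, 1.0))        # per-frame label costs

    # Dynamic programming over the chain: Viterbi-style min-cost path.
    T = len(test_features)
    cost = unary[0].copy()
    back = np.zeros((T, 2), dtype=int)
    for t in range(1, T):
        new_cost = np.empty(2)
        for lbl in (0, 1):
            trans = cost + smooth * (np.arange(2) != lbl)
            back[t, lbl] = int(np.argmin(trans))
            new_cost[lbl] = trans[back[t, lbl]] + unary[t, lbl]
        cost = new_cost

    # Backtrack the optimal (smoothed) label sequence.
    seq = np.zeros(T, dtype=int)
    seq[-1] = int(np.argmin(cost))
    for t in range(T - 1, 0, -1):
        seq[t - 1] = back[t, seq[t]]
    return seq
```

The smoothing term plays the role of the Graphcuts step in the abstract: it suppresses isolated single-frame label flips so the output forms clean Motion/Pause units.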