Semantic segmentation of urban areas can provide useful information for analyzing and detecting changes in urban development. Recently, numerous remote sensing image datasets acquired from various platforms have become available, and many semantic segmentation studies have been conducted with them. However, these datasets tend to contain relatively few images because of the large size of the imagery and the difficulty of constructing label data. Furthermore, it is difficult to use them together because each dataset differs in spatial resolution, viewing angle, and target object classes. In this study, two different UAV image datasets, the UAVid semantic segmentation dataset and the Semantic Drone Dataset, were used to train a combined U-net model so that heterogeneous remote sensing datasets could be used simultaneously for semantic segmentation tasks. The UAVid dataset was captured at a flight height of 50 m and comprises 300 images with eight classes, whereas the Semantic Drone Dataset was acquired at altitudes of 5–30 m above the ground and contains 598 images with 20 classes. The combined U-net model is based on the U-net architecture but receives input from two different data sources. The experimental results showed that training a combined U-net on both datasets improved semantic segmentation accuracy over training a separate U-net on each dataset. This study confirms that two datasets acquired at different places and from different platforms can be trained on simultaneously, thereby demonstrating the applicability of heterogeneous remote sensing datasets to semantic segmentation studies.
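The abstract describes the combined U-net only at a high level (a shared U-net backbone fed from two data sources with differing class sets). The sketch below shows one plausible dual-input design in PyTorch, with per-dataset encoders and output heads around a shared bottleneck and decoder; the branch layout, channel widths, and the `source` routing argument are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU: the standard U-net building block
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class CombinedUNet(nn.Module):
    """Minimal two-branch U-net sketch: dataset-specific encoders and
    output heads, with a shared bottleneck and decoder. Assumed design,
    not the paper's exact architecture."""
    def __init__(self, n_classes_a=8, n_classes_b=20):
        super().__init__()
        # One shallow encoder per data source (widths kept small for brevity)
        self.enc_a1, self.enc_a2 = conv_block(3, 32), conv_block(32, 64)
        self.enc_b1, self.enc_b2 = conv_block(3, 32), conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(64, 128)                    # shared
        self.up1 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec1 = conv_block(128, 64)                          # shared
        self.up2 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec2 = conv_block(64, 32)                           # shared
        # Separate 1x1 heads because the class sets differ (8 vs. 20)
        self.head_a = nn.Conv2d(32, n_classes_a, 1)
        self.head_b = nn.Conv2d(32, n_classes_b, 1)

    def forward(self, x, source):
        # Route the input through the encoder matching its dataset
        e1 = (self.enc_a1 if source == "a" else self.enc_b1)(x)
        e2 = (self.enc_a2 if source == "a" else self.enc_b2)(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        # Shared decoder with skip connections from the chosen encoder
        d1 = self.dec1(torch.cat([self.up1(b), e2], dim=1))
        d2 = self.dec2(torch.cat([self.up2(d1), e1], dim=1))
        head = self.head_a if source == "a" else self.head_b
        return head(d2)

model = CombinedUNet()
uavid_batch = torch.randn(2, 3, 256, 256)
logits = model(uavid_batch, source="a")   # shape: (2, 8, 256, 256)
```

Under this assumed layout, mini-batches from the two datasets would alternate during training so the shared bottleneck and decoder see both domains, which is one common way such joint training yields the accuracy gain the abstract reports.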