With the proliferation of the user-generated videos, temporal segmentation is becoming a challengeable problem. Traditional video temporal segmentation methods like shot detection are not able to work on unedited user-generated videos, since they often only contain one single long shot. We propose a novel temporal segmentation framework for user-generated video. It finds similar frames with a tree partitioning min-Hash technique, constructs sparse temporal constrained affinity sub-graphs, and finally divides the video into sub-shot-level segments with a dense-neighbor-based clustering method. Experimental results show that our approach outperforms all the other related works. Furthermore, it is indicated that the proposed approach is able to segment user-generated videos at an average human level.