We focus on motion saliency detection, which has attracted considerable attention in recent years. Unlike conventional algorithms, our method does not exploit the spatial information of the input frames; it relies solely on the temporal differences of corresponding pixels. Specifically, the difference is modeled as two parts, i.e., the symmetric frame difference and the background sample difference. The first term computes the differences between the current frame and its adjacent frames, covering not only the n previous frames but also the n subsequent frames. To obtain the background sample difference, N samples drawn from previous frames are stored, and the differences between each incoming pixel and all of its corresponding samples are accumulated. The initialization and update policies of the samples follow the off-the-shelf visual background extractor (ViBe) method. Finally, the motion saliency map is generated by fusing the aforementioned two features. We describe our method in full detail and compare it with three state-of-the-art motion detection techniques. Experiments on a variety of visual surveillance databases verify that the proposed algorithm consistently outperforms the compared methods. We also demonstrate how the generated saliency maps can be used to create high-quality segmentation masks for moving object detection.
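The two temporal cues described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's exact formulation: the averaging of the accumulated differences, the weighted-sum fusion with weight `alpha`, and the final normalization are all assumptions made for the sake of a runnable example, and the ViBe-style sample initialization and update policies are omitted.

```python
import numpy as np


def symmetric_frame_difference(frames, t, n):
    """Accumulate absolute differences between frame t and its n previous
    and n subsequent frames (illustrative averaging, not the paper's)."""
    ref = frames[t].astype(np.float64)
    diff = np.zeros_like(ref)
    for i in range(1, n + 1):
        diff += np.abs(ref - frames[t - i])  # previous frames
        diff += np.abs(ref - frames[t + i])  # subsequent frames
    return diff / (2 * n)


def background_sample_difference(frame, samples):
    """Accumulate differences between the incoming frame and the N stored
    background samples per pixel; samples has shape (N, H, W)."""
    diffs = np.abs(samples.astype(np.float64) - frame.astype(np.float64))
    return diffs.mean(axis=0)


def motion_saliency(frames, t, n, samples, alpha=0.5):
    """Fuse the two cues into a saliency map in [0, 1].
    The weighted-sum fusion rule here is an assumption."""
    sfd = symmetric_frame_difference(frames, t, n)
    bsd = background_sample_difference(frames[t], samples)
    sal = alpha * sfd + (1 - alpha) * bsd
    return sal / (sal.max() + 1e-12)  # normalize for display


# Toy demo on random grayscale frames (4x4 pixels, 7 frames).
rng = np.random.default_rng(0)
frames = rng.integers(0, 256, size=(7, 4, 4)).astype(np.float64)
samples = frames[:3]  # pretend the first N=3 frames seeded the samples
sal = motion_saliency(frames, t=3, n=2, samples=samples)
```

In practice the background samples would be initialized and updated per pixel following ViBe's random-replacement policy rather than taken wholesale from the first frames as in this toy demo.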