We present a compressive sensing video acquisition scheme that relies on the sparsity properties of video in the spatial domain. In this scheme, the video sequence is represented by a reference frame, followed by the differences between the measurement results of each pair of neighboring frames. The video signal is reconstructed by first recovering the frame differences with an $\ell_1$-minimization algorithm and then adding them sequentially to the reference frame. Experiments on both simulated and real video sequences show that when the spatial changes between neighboring frames are small, this scheme provides better reconstruction results than existing compressive sensing video acquisition schemes, such as 2-D or 3-D wavelet methods and the minimum total-variation (TV) method. The scheme is therefore suited to compressive sensing acquisition of video sequences with relatively small spatial changes. We also present a method that estimates the amount of spatial change from the statistical properties of the measurement results.
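
The acquisition and reconstruction steps described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that are not in the text: frames are flattened to vectors, the measurement matrix `phi` is a random Gaussian matrix, the reference frame is available exactly, and the $\ell_1$ minimization is approximated by an $\ell_1$-regularized least-squares problem solved with iterative soft-thresholding (ISTA). All function names and parameter values (`lam`, `n_iter`, the problem sizes) are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (element-wise shrinkage toward zero).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def ista_l1(phi, y, lam=0.01, n_iter=2000):
    # Approximately solve  min_d 0.5 * ||phi @ d - y||^2 + lam * ||d||_1.
    # This is a stand-in for the l1 minimization named in the text.
    step = 1.0 / np.linalg.norm(phi, 2) ** 2  # 1 / (spectral norm squared)
    d = np.zeros(phi.shape[1])
    for _ in range(n_iter):
        grad = phi.T @ (phi @ d - y)
        d = soft_threshold(d - step * grad, step * lam)
    return d


def acquire(frames, phi):
    # Measure every frame, then keep only the differences between the
    # measurements of neighboring frames.
    meas = [phi @ f for f in frames]
    return [meas[t] - meas[t - 1] for t in range(1, len(meas))]


def reconstruct(reference, meas_diffs, phi):
    # Recover each sparse frame difference and add it to the previous frame.
    frames = [reference]
    for dy in meas_diffs:
        frames.append(frames[-1] + ista_l1(phi, dy))
    return frames


def demo(seed=0, n=64, m=32, k=3, n_frames=4):
    # Synthetic "video": each frame differs from the previous one in k pixels.
    rng = np.random.default_rng(seed)
    phi = rng.standard_normal((m, n)) / np.sqrt(m)
    frames = [rng.random(n)]
    for _ in range(n_frames - 1):
        delta = np.zeros(n)
        idx = rng.choice(n, size=k, replace=False)
        delta[idx] = rng.choice([-1.0, 1.0], size=k)
        frames.append(frames[-1] + delta)
    recon = reconstruct(frames[0], acquire(frames, phi), phi)
    # Relative reconstruction error of the last frame.
    return np.linalg.norm(recon[-1] - frames[-1]) / np.linalg.norm(frames[-1])
```

Because the measurement is linear, the difference of two measurement vectors equals the measurement of the frame difference, so a sparse frame difference can be recovered from far fewer measurements than pixels. Calling `demo()` returns the relative error of the last reconstructed frame for this synthetic setup. Reconstruction errors accumulate as differences are added sequentially.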