Today, video surveillance systems produce thousands of terabytes of data. This data can be very valuable, as it contains spatio-temporal information about abnormal, similar, or periodic activities. However, searching unstructured large-scale video footage for particular situations or activities can be exhausting or even futile, and it is especially difficult for human observers because many situations look alike. To keep this amount of data manageable and hence usable, this paper aims at clustering situations by their visual content as well as their motion patterns. Besides standard image content descriptors such as HOG, we present and investigate novel descriptors, called Franklets, which explicitly encode motion patterns for certain image regions. Slow feature analysis (SFA) is used for dimension reduction based on the temporal variation of the features, and we investigate its effects and compare it with standard PCA dimension reduction. Clustering results on real data from the Hamburg Harbour Anniversary 2014 are presented for both HOG feature descriptors and Franklets. We show that reducing the dimension with SFA yields higher feature discrimination than standard PCA. Finally, an application to visual clustering with self-organizing maps is introduced.
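To make the SFA-based dimension reduction mentioned above more concrete, the following is a minimal sketch of linear SFA in NumPy. It is not the implementation used in this paper, which the abstract does not describe. The function name `linear_sfa`, its parameters, and the eigenvalue threshold are illustrative assumptions. The sketch whitens the time-ordered feature vectors and then keeps the directions along which the whitened signal changes most slowly over time, whereas PCA would keep the directions of largest variance.

```python
import numpy as np

def linear_sfa(X, n_components):
    """Hypothetical helper: linear slow feature analysis.

    X is a (T, D) array of feature vectors ordered in time.
    Returns the slow features Y (T, n_components), the projection
    matrix W (D, n_components), and the feature mean.
    """
    mean = X.mean(axis=0)
    Xc = X - mean

    # Step 1: whiten the data using the eigendecomposition of its covariance.
    cov = Xc.T @ Xc / (len(Xc) - 1)
    evals, evecs = np.linalg.eigh(cov)
    keep = evals > 1e-10 * evals.max()  # assumed threshold for degenerate directions
    whiten = evecs[:, keep] / np.sqrt(evals[keep])
    Z = Xc @ whiten

    # Step 2: covariance of the temporal differences of the whitened signal.
    dZ = np.diff(Z, axis=0)
    dcov = dZ.T @ dZ / (len(dZ) - 1)

    # Step 3: eigh returns eigenvalues in ascending order, so the first
    # eigenvectors are the directions with the slowest temporal variation.
    _, d_evecs = np.linalg.eigh(dcov)
    W = whiten @ d_evecs[:, :n_components]
    return Xc @ W, W, mean
```

In a pipeline like the one outlined here, `X` would hold the per-frame descriptors (for example HOG vectors), and the returned slow features would be passed on to the clustering stage. The returned features have unit variance and are defined only up to sign.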