We propose a supervised approach to detecting falls in a home environment, based on an optimized descriptor suited to real-time processing. We introduce a realistic dataset of 222 videos, a new metric for evaluating fall detection performance on a video stream, and an automatically optimized set of spatio-temporal descriptors that feeds a supervised classifier. We build the initial spatio-temporal descriptor, named STHF, from several combinations of transformations of geometrical features (height and width of the human body bounding box, the user’s trajectory and orientation, projection histograms, and moments of orders 0, 1, and 2). We study combinations of common transformations of these features (Fourier transform, wavelet transform, first and second derivatives) and show experimentally that high performance can be achieved with support vector machine and AdaBoost classifiers. Automatic feature selection shows that the best tradeoff between classification performance and processing time is obtained by combining the original low-level features with their first derivative. We then evaluate the robustness of fall detection to changes of location. Finally, we propose a realistic and pragmatic protocol that improves performance by updating the training with recordings of normal activities made in the current location.
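As an illustration of the selected configuration (low-level features combined with their first derivative), the following minimal sketch builds such a descriptor for a short window of frames. It is not the STHF implementation: the function names, the forward finite-difference scheme, the window length, and the sample bounding-box values are all assumptions made for the example, and only two of the geometrical features (bounding-box height and width) are used.

```python
from typing import List, Sequence


def first_derivative(series: Sequence[float]) -> List[float]:
    """Forward finite difference; the last value is repeated to keep the length."""
    if len(series) < 2:
        return [0.0] * len(series)
    diffs = [series[i + 1] - series[i] for i in range(len(series) - 1)]
    diffs.append(diffs[-1])
    return diffs


def build_descriptor(frames: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate each per-frame feature track with its first derivative.

    `frames` holds one feature tuple per frame, e.g. (height, width).
    This layout is an assumption for illustration, not the paper's format.
    """
    if not frames:
        return []
    n_features = len(frames[0])
    descriptor: List[float] = []
    for j in range(n_features):
        track = [float(frame[j]) for frame in frames]
        descriptor.extend(track)                    # original low-level feature
        descriptor.extend(first_derivative(track))  # its first derivative
    return descriptor


# Hypothetical 4-frame window: bounding-box (height, width) in pixels,
# loosely mimicking a person going from standing to lying down.
window = [(180, 60), (150, 80), (90, 130), (60, 170)]
descriptor = build_descriptor(window)
```

A vector of this kind, computed on a sliding window, is what would be passed to a supervised classifier such as a support vector machine; the derivative terms capture how quickly the silhouette changes, at little extra computational cost.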