High-performance Activity Recognition models trained on video data are difficult to train and deploy efficiently. We measure efficiency along three axes: predictive performance, model size, and run-time, during both training and inference. Researchers have demonstrated that 3D convolutions capture space-time dynamics well. The challenge is that 3D convolutions are computationally expensive. Lin et al. propose the Temporal Shift Module (TSM) for training efficiency, and Han et al. propose Deep Compression for inference efficiency. TSM is a simple yet effective way to attain near-3D-convolution performance at 2D-convolution computational cost (sketched below). We apply these efficiency techniques, via transfer learning, to a newly labeled activity recognition dataset. Our labeling strategy is designed to produce highly temporal activities, i.e., activities that are recognizable from motion across frames rather than from any single frame. We benchmark against a 2D ResNet-50 backbone trained on individual frames and a multilayer 3D CNN trained on short multi-frame clips.

Our contributions are:
1. A new highly temporal activity recognition dataset based on EgoHands.
2. Results showing that a 3D backbone on videos outperforms a 2D backbone on individual frames.
3. With TSM, a 5x improvement in training run-time with negligible performance loss.
4. With quantization alone, a 10x reduction in model size at inference with negligible performance loss (see the quantization sketch below).
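To make the TSM operation concrete, below is a minimal sketch of the temporal shift in PyTorch, assuming the (N, T, C, H, W) clip layout used in the TSM paper; the function name temporal_shift and the shift_div default are illustrative choices, not the original implementation.

```python
import torch

def temporal_shift(x: torch.Tensor, shift_div: int = 8) -> torch.Tensor:
    """Shift a fraction of channels along the temporal axis (TSM-style).

    x: clip features of shape (N, T, C, H, W). One 1/shift_div slice of
    channels is shifted backward in time, another forward; the remaining
    channels are left in place. The shift itself costs zero FLOPs.
    """
    n, t, c, h, w = x.size()
    fold = c // shift_div
    out = torch.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                # frame t sees frame t+1
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]  # frame t sees frame t-1
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]           # unshifted channels
    return out
```

In a TSM network this shift is inserted inside each residual block just before the 2D convolution, so temporal information mixes across neighboring frames while the network keeps 2D-convolution compute cost.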
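For the inference-efficiency result, the text above does not spell out the quantization scheme, so the following is only a generic post-training quantization sketch using PyTorch's dynamic quantization API; resnet50 here is a stand-in for whatever trained backbone is being compressed, and the achievable size reduction depends on which layers are quantized.

```python
import torch
from torchvision.models import resnet50

# Stand-in for the trained activity recognition backbone.
model = resnet50()
model.eval()

# Post-training dynamic quantization: weights of the listed module types
# are stored as int8 and activations are quantized on the fly at run-time.
# Convolutions are not covered by dynamic quantization; for conv-heavy
# backbones, static (calibrated) post-training quantization is needed to
# realize most of the size savings.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Compare serialized checkpoint sizes to estimate the compression ratio.
torch.save(model.state_dict(), "fp32.pt")
torch.save(quantized.state_dict(), "int8.pt")
```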