Because of the worldwide shortage of nurses, realizing a robotic nurse that can support surgery autonomously is very important. More specifically, the robotic nurse should be able to recognize the different stages of a surgical procedure on its own so that it can pass the necessary surgical tools to the surgeon in a timely manner. This paper proposes and explores methods for classifying suture and tying actions during suturing from a video sequence of the surgical scene that includes the surgeon's hands. First, the proposed method detects the hand area using skin pixel detection and foreground extraction. Then, interest points are randomly chosen from the hand area and their 3D SIFT descriptors are computed. A word vocabulary is built by applying hierarchical K-means to these descriptors, and the frequency histogram of the words, which serves as the feature vector, is computed. Finally, the actions are classified in this feature space by one of three methods: a Support Vector Machine (SVM), the Nearest Neighbor (NN) rule, or a method that combines a sliding window with NN. We collected 53 suture videos and 53 tying videos to build the training set and to test the proposed method experimentally. The NN rule achieves accuracies above 90%, outperforming the SVM. Negative actions, that is, actions that are neither suture nor tying, are also recognized with good accuracy, whereas the sliding-window method shows no significant improvement for suture and tying actions and cannot recognize negative actions.
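The last two steps of the pipeline described above (word-frequency histogram and NN classification) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a flat, precomputed vocabulary instead of hierarchical K-means, plain Euclidean distance, and generic descriptor vectors in place of 3D SIFT. All function names (`nearest_word`, `word_histogram`, `nn_classify`) are hypothetical.

```python
import math


def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary centre closest to the descriptor (Euclidean).

    Assumption: the vocabulary is a flat list of centres; the paper builds it
    with hierarchical K-means, which this sketch does not reproduce.
    """
    return min(range(len(vocabulary)),
               key=lambda i: math.dist(descriptor, vocabulary[i]))


def word_histogram(descriptors, vocabulary):
    """Normalised frequency histogram of visual words for one video clip."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        hist[nearest_word(d, vocabulary)] += 1.0
    total = sum(hist)
    # Leave the all-zero histogram unchanged if the clip has no descriptors.
    return [h / total for h in hist] if total else hist


def nn_classify(query_hist, train_hists, train_labels):
    """Label of the training histogram closest to the query histogram."""
    best = min(range(len(train_hists)),
               key=lambda i: math.dist(query_hist, train_hists[i]))
    return train_labels[best]
```

In use, one histogram would be computed per training video and stored with its label ("suture" or "tying"); a new clip is then assigned the label of its closest stored histogram. The distance measure is an assumption here, since the abstract does not state which one is used.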