Viewpoint variation has long been a major challenge in comparison-based image processing, and reducing or eliminating it is a common goal of applications such as human motion analysis and gesture analysis. By exploiting the three-dimensional (3-D) skeletal joint information provided by RGB-D cameras such as the Kinect, this study proposes a skeleton-based viewpoint invariant transformation (SVIT) technique that transforms 3-D skeleton data into an orthogonal coordinate system constructed from the three most stable joints of the detected person's upper torso: the left shoulder, the right shoulder, and the spine. The proposed 3-D transformation eliminates observation variations caused not only by viewpoint differences but also by individual differences in body-swing movement. Using the human activity database MSRDailyAct3D together with our own recorded data, experiments on human motion analysis were designed and conducted to evaluate the effectiveness of the proposed SVIT as a preprocessing step before comparing test data and sample data captured from very different viewpoints. The results show that, with SVIT as preprocessing for activity analysis, the rate of correct activity identification increases from 45% to 60%. SVIT outperforms other state-of-the-art view-invariant human motion analysis methods based on body-centric approaches.
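The core idea of building a body-centric orthogonal frame from the two shoulders and the spine can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the axis construction (x along the shoulder line, z normal to the torso plane, y completing a right-handed frame, origin at the spine joint) is an assumption made here for concreteness.

```python
import numpy as np

def svit_transform(joints, l_shoulder, r_shoulder, spine):
    """Map 3-D joint positions into a torso-anchored orthonormal frame.

    Assumed sketch of the SVIT idea: x-axis along the shoulder line,
    z-axis normal to the torso plane, y-axis completing a right-handed
    frame, with the origin placed at the spine joint.
    """
    # x-axis: unit vector from left shoulder to right shoulder
    x = r_shoulder - l_shoulder
    x /= np.linalg.norm(x)

    # a second in-plane vector: spine to shoulder midpoint
    mid = (l_shoulder + r_shoulder) / 2.0
    u = mid - spine

    # z-axis: normal to the torso plane spanned by x and u
    z = np.cross(x, u)
    z /= np.linalg.norm(z)

    # y-axis completes the right-handed orthonormal basis
    y = np.cross(z, x)

    R = np.stack([x, y, z])           # rows are the new basis vectors
    return (joints - spine) @ R.T     # translate to spine, rotate into frame

# Example: a rotated/translated skeleton maps to the same body frame
l_sh = np.array([0.0, 1.0, 0.0])
r_sh = np.array([2.0, 1.0, 0.0])
sp = np.array([1.0, 0.0, 0.0])
joints = np.stack([l_sh, r_sh, sp])
body_frame = svit_transform(joints, l_sh, r_sh, sp)
```

Because every skeleton is expressed relative to its own torso frame, two recordings of the same activity taken from different camera viewpoints become directly comparable after this step.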