Tracking of a handheld device’s three-dimensional (3-D) position and orientation is fundamental to various application domains, including augmented reality (AR), virtual reality, and interaction in smart spaces. Existing systems still offer limited performance in terms of accuracy, robustness, computational cost, and ease of deployment. We present a low-cost, accurate, and robust system for handheld pose tracking using fused vision and inertial data. The integration of measurements from embedded accelerometers reduces the number of unknown parameters in the six-degree-of-freedom pose calculation. The proposed system requires two light-emitting diode (LED) markers to be attached to the device, which are tracked by external cameras through a robust algorithm against illumination changes. Three data fusion methods have been proposed, including the triangulation-based stereo-vision system, constraint-based stereo-vision system with occlusion handling, and triangulation-based multivision system. Real-time demonstrations of the proposed system applied to AR and 3-D gaming are also included. The accuracy assessment of the proposed system is carried out by comparing with the data generated by the state-of-the-art commercial motion tracking system OptiTrack. Experimental results show that the proposed system has achieved high accuracy of few centimeters in position estimation and few degrees in orientation estimation.