This paper proposes a vision-based pointing device with gaze control that operates by tracking and recognizing the user's hand. To realize gaze control, i.e., restricting processing to a certain region and suppressing data outside that region, we introduce the 'local-moment feature', defined as the moments of a local area of an image. Because the local moment suppresses unnecessary information outside the area, it can be applied effectively to images containing considerable noise and obstructing objects. Compared with the ordinary moment, the computational cost of the local moment is fairly small, so it can easily be applied to a real-time system. Using the proposed system, a user can point to a position on the screen by moving the hand and can manipulate on-screen objects in several ways, such as 'pick', 'release', or 'tap', by forming the fingers into particular shapes. The proposed system has been implemented on a personal computer with a video capture board, and its validity has been shown by the results of tracking and recognition experiments.
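As an illustration of the idea of a moment restricted to a local area, the following is a minimal sketch, not the formulation used in this paper. It assumes a hard square window and a grayscale image stored as a list of rows; the names `local_moment`, `local_centroid`, and the parameter `half` are hypothetical, and the actual feature may use a different window shape or a weighting function.

```python
def local_moment(image, cx, cy, half, p, q):
    """Raw moment of order (p, q) over a square window centred at (cx, cy).

    Assumption: a hard square window of side 2*half + 1. Coordinates are
    measured relative to the window centre, and pixels outside the window
    (or outside the image) contribute nothing.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    total = 0.0
    for y in range(max(0, cy - half), min(height, cy + half + 1)):
        for x in range(max(0, cx - half), min(width, cx + half + 1)):
            total += ((x - cx) ** p) * ((y - cy) ** q) * image[y][x]
    return total


def local_centroid(image, cx, cy, half):
    """Centroid of the intensity inside the window, or None if it is empty."""
    m00 = local_moment(image, cx, cy, half, 0, 0)
    if m00 == 0:
        return None
    dx = local_moment(image, cx, cy, half, 1, 0) / m00
    dy = local_moment(image, cx, cy, half, 0, 1) / m00
    return (cx + dx, cy + dy)
```

In this sketch the cost depends only on the window size, not on the image size, and a bright object outside the window has no effect on the result. One plausible use, again an assumption rather than the method described here, is to re-centre the window on the returned centroid at every frame so that it follows the hand.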