Eye tracking technology allows researchers to monitor the position of the eye and infer a person's gaze direction, which is used to study human attention in psychology, cognitive science, marketing, and artificial intelligence. Commercially available head-mounted eye trackers allow researchers to track eye movements (saccades and fixations) with an infrared camera and to capture the field of view with a front-facing scene camera. Wearable eye trackers have opened new avenues for research in unconstrained environments; however, the recorded scene video typically suffers from non-uniform illumination, low-quality image frames, and moving scene objects. One of the most important tasks in analyzing the recorded scene video is finding the boundaries between different objects within a single frame. This paper presents a multi-level fixation-oriented object segmentation method (MFoOS) that addresses these challenges in segmenting scene objects in eye-tracker video, in support of cognition research. MFoOS is position-invariant, tolerant to illumination changes and noise, and task-driven. The proposed method is tested on real-world case studies, designed by our team of psychologists, that focus on understanding visual attention in human problem solving. Extensive computer simulations demonstrate the method's accuracy and robustness for fixation-oriented object segmentation. Moreover, a deep-learning semantic image segmentation approach that uses MFoOS results as label data was explored to demonstrate the feasibility of online deployment of fixation-oriented object segmentation for eye trackers.