Learning over the Internet has become mainstream in the deployment of distance education and training. In such an environment, learners and teachers, as well as peers, can join learning activities both synchronously and asynchronously. However, present user interfaces still do not provide enough flexibility and interactivity for either learners or teachers. As a result, users may lose interest in continuing to use the prepared learning environment. In this paper, we design and implement a multimedia user interface that contains video, audio, and text content together with authoring tools. We integrate state-of-the-art technologies such as image segmentation, object tracking, and voice over IP. In the video portion, the system automatically tracks the teacher's movement, so teachers have a high degree of freedom in presenting themselves to attract learners' attention. To achieve this, the developed region-based image segmentation and tracking schemes run in real time. A special voice transmission scheme based on forward error correction is also designed to cope with the transmission difficulties of the present Internet. Authoring tools are provided to teachers and learners to make the system more convenient to use.
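
As a rough illustration of the forward error correction idea mentioned above, the following minimal sketch uses a single XOR parity packet per group of voice packets, which lets a receiver rebuild any one lost packet without retransmission. This is only an assumed, simplest-case scheme and is not necessarily the one designed in this paper; the function names `make_parity` and `recover` are hypothetical, and equal-length packets are assumed.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets):
    """Sender side: XOR all packets of a group into one parity packet.

    Assumption: every packet in the group has the same length.
    """
    return reduce(xor_bytes, packets)


def recover(received, parity):
    """Receiver side: rebuild at most one lost packet (marked as None).

    XOR-ing the parity with every packet that did arrive yields the
    missing one. Two or more losses in a group cannot be repaired by a
    single parity packet, so that case raises ValueError.
    """
    missing = [i for i, p in enumerate(received) if p is None]
    if not missing:
        return list(received)
    if len(missing) > 1:
        raise ValueError("single XOR parity can repair only one lost packet")
    present = [p for p in received if p is not None]
    restored = reduce(xor_bytes, present, parity)
    out = list(received)
    out[missing[0]] = restored
    return out


if __name__ == "__main__":
    group = [b"abcd", b"efgh", b"ijkl"]        # three hypothetical voice frames
    parity = make_parity(group)                # sent alongside the group
    arrived = [group[0], None, group[2]]       # second frame lost in transit
    print(recover(arrived, parity) == group)
```

The trade-off in this kind of scheme is extra bandwidth (one additional packet per group) in exchange for avoiding retransmission delay, which matters for interactive voice.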