Visualization of speech signals, including the ability to view waveforms while simultaneously hearing the speech, is an essential requirement in speech processing research. In speech labeling tasks, multiple users may need to perform visualization activities on a centralized collection of speech data. When labeling involves perceptual judgments, human factors issues, including functionality tradeoffs, are particularly important, since the burden on the user (tiredness, annoyance) can affect perceptual responses. We developed VideVox (pronounced 'Veedeh-Vox'), a speech visualization facility in which visualization activities may be performed by a large number of users in geographically, dialectally, and linguistically diverse locations. Developed in Java and capable of operating both as an Internet Java applet and as a Java application, VideVox is platform independent. Its client-server architecture allows distributed visualization work. This Internet orientation makes VideVox a promising direction for speech signal visualization in labeling activities that require a large number of users in multiple locations. In this paper, we describe our approach, the features of VideVox, its modes of audio data exploration and audio-synchronous animation for speech visualization, operations related to the identification of perceptual events, and the human factors issues related to perception-oriented visualization of speech.