Measuring voice quality for telephony is not a new problem. However, packet-switched, best-effort networks such as the Internet present new challenges for the delivery of real-time voice traffic. Unlike the circuit-switched PSTN, Internet Protocol (IP) networks guarantee neither sufficient bandwidth for the voice traffic nor a constant, minimal delay. Dropped packets and varying delays introduce distortions not found in traditional telephony. In addition, if a low-bitrate codec is used in voice over IP (VoIP) to achieve a high compression ratio, the original waveform can be significantly distorted. These new sources of signal distortion make it harder to measure speech quality objectively, and measurement techniques designed for the PSTN may not perform well in VoIP environments. Our objective is to find a speech quality metric that accurately predicts subjective human perception under the conditions present in VoIP systems. To do this, we compared three types of measures: perceptually weighted distortion measures such as enhanced modified Bark spectral distortion (EMBSD) and measuring normalizing blocks (MNB), word-error rates of continuous speech recognizers, and the ITU E-model. We tested the performance of these measures under conditions typical of a VoIP system. We found that the E-model had the highest correlation with mean opinion scores (MOS). The E-model is also well suited to online monitoring because it does not need the original (undistorted) signal to compute its quality metric and because it is computationally simple.
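To illustrate why the E-model is computationally simple, the following minimal sketch combines impairment terms into a transmission rating factor R and maps R to an estimated MOS using the conversion formula from ITU-T Recommendation G.107. The function names and the way the impairment values are supplied are assumptions made for illustration; the abstract does not state how the E-model was parameterized in the experiments, and computing the individual impairment terms is outside this sketch.

```python
def r_factor(ro: float, i_s: float, i_d: float, i_e: float, a: float = 0.0) -> float:
    """Combine E-model terms as R = Ro - Is - Id - Ie + A.

    ro:  basic signal-to-noise ratio term
    i_s: impairments simultaneous with the voice signal
    i_d: impairments caused by delay
    i_e: equipment impairment (codec and packet loss)
    a:   advantage factor
    All inputs are supplied by the caller (hypothetical interface).
    """
    return ro - i_s - i_d - i_e + a


def r_to_mos(r: float) -> float:
    """Map a rating factor R to an estimated MOS (G.107 conversion)."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7.0e-6


if __name__ == "__main__":
    # Illustrative input values only, not measurements from this study.
    r = r_factor(ro=90.0, i_s=0.0, i_d=10.0, i_e=20.0)
    print(f"R = {r:.1f}, estimated MOS = {r_to_mos(r):.2f}")
```

Note that no reference signal appears anywhere in the calculation: every input is a network or equipment parameter, which is what makes this style of metric usable for online monitoring.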