In this paper, we present a system that automatically summarizes highlights in golf videos using audio information
alone. The proposed system is based on semantic audio segmentation and on the detection of action units in the audio
signal. Studio speech, field speech, music, and applause are
segmented by means of sound classification. Swings are detected by impulse onset detection. Sounds such as a swing
and the applause together form a complete action unit, while studio speech and music segments are used to anchor the
program structure. Because applause is detected with high precision, highlights are extracted effectively. Our
experiments on 18 golf games yield high classification precision. These results show that the proposed system is
effective and computationally efficient enough to be applied in embedded consumer electronic devices.
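
As a rough illustration of the two detection steps summarized above, the following minimal sketch flags impulse-like onsets from a jump in short-time energy and then pairs each swing with a nearby applause segment to form an action unit. It is not the detector used in this work: the function names, the energy-ratio rule, and the parameters `frame_ms`, `ratio`, `min_gap_s`, and `max_delay_s` are all assumptions chosen for illustration, and the applause segments are assumed to come from a separate sound classifier.

```python
import numpy as np


def detect_impulse_onsets(signal, sample_rate, frame_ms=10.0, ratio=8.0, min_gap_s=0.5):
    """Return times (seconds) of frames whose energy jumps sharply.

    Hypothetical rule: a frame is an onset when its mean energy exceeds
    `ratio` times that of the previous frame. Onsets closer together than
    `min_gap_s` are merged into the first one.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000.0))
    n_frames = len(signal) // frame_len
    samples = np.asarray(signal[: n_frames * frame_len], dtype=float)
    frames = samples.reshape(n_frames, frame_len)
    # Small constant avoids a zero baseline in silent frames.
    energy = (frames ** 2).mean(axis=1) + 1e-10

    onsets = []
    for i in range(1, n_frames):
        t = i * frame_len / sample_rate
        is_jump = energy[i] > ratio * energy[i - 1]
        far_enough = not onsets or t - onsets[-1] >= min_gap_s
        if is_jump and far_enough:
            onsets.append(t)
    return onsets


def pair_action_units(swing_times, applause_segments, max_delay_s=15.0):
    """Pair each swing with the first applause that starts within `max_delay_s`.

    `applause_segments` is a list of (start, end) times in seconds.
    Each action unit is returned as (swing_time, applause_end).
    """
    units = []
    for swing in swing_times:
        for start, end in applause_segments:
            if 0.0 <= start - swing <= max_delay_s:
                units.append((swing, end))
                break
    return units
```

For example, a silent two-second signal with a single click at one second gives one onset, and a swing with no applause within the allowed delay is left out of the action units. A simple energy-ratio rule of this kind needs only one pass over the frames, which is in keeping with the low computational cost targeted for embedded devices.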