Multiple video cameras are now used for the live broadcast and recording of almost all major social events in the TV industry, and the resulting camera streams must be aggregated and rendered into a single video program for the audience. While this content composition process aims to present the most interesting perspective of an event, it raises the problem of how to customize the final composed program to different audience interests without requiring too much input from the audience. This work addresses the problem by proposing the Automatic Video Production with User Customization (AVPUC) system, which separates the comparison of video stream interestingness from video program rendering to leave room for maximal customization. The distinguishing feature of the AVPUC system is that human-controlled video selection and automatic video evaluation are combined to support video content customization and reduce redundant audience input. Preliminary evaluation results confirm that AVPUC's capture-evaluate-render model for video production improves audience satisfaction with customized multi-perspective viewing of social events.