Video quality assessment (VQA) plays a crucial role in video transmission and storage, where it guides compression toward optimal perceptual quality. Existing research commonly employs deep neural networks to approximate the complexities of the Human Visual System (HVS) and achieve superior performance. However, because the HVS is so complex, many of its characteristics remain unmodeled, resulting in a suboptimal fit. To address this issue, this paper proposes a novel VQA method grounded in the HVS. Notably, our method introduces brightness information from video frames as embeddings, thereby improving the model's alignment with the HVS. Additionally, we observe that feeding every single frame into the network for feature extraction introduces substantial redundancy. To improve efficiency while maintaining comparable performance, our method samples video frames at an interval of eight, reducing the number of processed frames, and thus the feature-extraction cost, by a factor of eight. The effectiveness of our method is validated through extensive experiments on four mainstream VQA datasets. Ablation studies further corroborate its effectiveness and efficiency by demonstrating the positive impact of incorporating brightness information into VQA.
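To make the two ideas above concrete, the following is a minimal sketch, not the paper's actual implementation: it assumes the brightness embedding is a learned projection of each frame's mean luma, and that temporal sampling simply keeps every eighth frame. All names (`BrightnessEmbedding`, `sample_frames`, `feat_dim`) are hypothetical.

```python
import torch
import torch.nn as nn

class BrightnessEmbedding(nn.Module):
    """Hypothetical sketch: project a per-frame brightness statistic
    into the feature space so it can be fused with frame features."""
    def __init__(self, feat_dim: int):
        super().__init__()
        # Assumption: a scalar mean-luma value is mapped to an embedding.
        self.proj = nn.Linear(1, feat_dim)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (N, 3, H, W), RGB in [0, 1]
        r, g, b = frames[:, 0], frames[:, 1], frames[:, 2]
        luma = 0.299 * r + 0.587 * g + 0.114 * b   # ITU-R BT.601 luma
        mean_luma = luma.mean(dim=(1, 2)).unsqueeze(-1)  # (N, 1)
        return self.proj(mean_luma)                 # (N, feat_dim)

def sample_frames(video: torch.Tensor, interval: int = 8) -> torch.Tensor:
    """Keep every `interval`-th frame, cutting the number of frames
    fed to the feature extractor by roughly that factor.
    video: (T, 3, H, W)."""
    return video[::interval]

# Usage: a 128-frame clip is reduced to 16 frames, each of which
# contributes a brightness embedding alongside its visual features.
video = torch.rand(128, 3, 224, 224)
frames = sample_frames(video)                       # (16, 3, 224, 224)
emb = BrightnessEmbedding(feat_dim=768)(frames)     # (16, 768)
```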