Going from small to large data in steganalysis
13 February 2012
Most image steganalysis has traditionally been based on supervised machine learning, with the size of the training set remaining static at up to 20000 examples. This potentially leaves the classifier undertrained for larger feature sets, and it may become too narrowly focused on the characteristics of one source of cover images, degrading performance when the testing source is mismatched or heterogeneous. However, it is not difficult to obtain larger training sets for steganalysis, simply by taking more photos or downloading additional images. Here, we investigate the possibilities for creating steganalysis classifiers trained on large data sets using large feature vectors. With up to 1.6 million examples, simpler classification engines must naturally be used, and we examine the hypothesis that simpler classifiers avoid overtraining and so perform better on heterogeneous data. We highlight the potential of online learners, showing that, given sufficient training data, they can match or exceed the performance of complex classifiers such as Support Vector Machines, in both accuracy and training time. We also include experiments, not previously reported in the literature, which benchmark some known feature set and classifier combinations.
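The key property of the online learners discussed in the abstract is that they process one example at a time, with a constant-cost weight update per example, so training scales linearly with data set size. As an illustration only (the paper's actual feature sets and learners are not reproduced here), the following sketch trains a simple online perceptron on synthetic "cover vs. stego" data; all data and dimensions are invented stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, d):
    """Synthetic linearly separable examples; labels +1/-1 are an
    arbitrary stand-in for stego/cover, not real steganalysis features."""
    w_true = rng.normal(size=d)
    X = rng.normal(size=(n, d))
    y = np.sign(X @ w_true)
    return X, y

def train_perceptron(X, y, passes=2):
    """Online perceptron: one cheap weight update per mistaken example,
    so a pass over n examples costs O(n * d) regardless of n."""
    w = np.zeros(X.shape[1])
    for _ in range(passes):
        for x_i, y_i in zip(X, y):
            if y_i * (w @ x_i) <= 0:   # misclassified -> update
                w += y_i * x_i
    return w

X, y = make_data(5000, 50)
w = train_perceptron(X, y)
acc = np.mean(np.sign(X @ w) == y)
```

Because each update touches only the current example, such a learner never needs the whole training set in memory, which is what makes the 1.6-million-example regime described above feasible.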
© 2012 Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Ivans Lubenko, Andrew D. Ker, "Going from small to large data in steganalysis", Proc. SPIE 8303, Media Watermarking, Security, and Forensics 2012, 83030M (13 February 2012); https://doi.org/10.1117/12.910214
