Convolutional autoencoders (CAEs) are widely used as unsupervised feature extractors for high-resolution images. Pooling, a key component of CAEs, is a biologically inspired operation that provides scale and shift invariance, and the pooled representation directly affects CAE performance. Fine-grained pooling, which uses small, dense pooling regions, encodes fine-grained visual cues and enhances local characteristics, but it tends to be sensitive to spatial rearrangements. In most previous work, pooled features were obtained by empirically tuning CAE parameters. We instead treat the CAE as a whole and propose a fine-grained representation learning law for extracting better fine-grained features. This law suggests two directions for improvement. First, we probabilistically evaluate the discrimination-invariance tradeoff at fine granularity in the pooled feature maps, and suggest a proper filter scale for the convolutional layer and appropriate whitening parameters for the preprocessing step. Second, we consider pooling approaches jointly with the degree of sparsity in the pooling regions and propose the preferable pooling approach. Experimental results on two independent benchmark datasets demonstrate that our representation learning law can guide CAEs to extract better fine-grained features and to perform better in multiclass classification tasks. This paper also provides guidance for selecting appropriate parameters to obtain better fine-grained representations in other convolutional neural networks.
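
As a purely illustrative aid, the following minimal sketch contrasts max and average pooling over small pooling regions and measures the sparsity of a feature map. It is not the method proposed in this paper: the function names `pool2d` and `sparsity`, the region sizes, the stride, and the toy feature map are all assumptions chosen for illustration.

```python
import numpy as np


def pool2d(feature_map, size, stride, mode="max"):
    """Pool a 2-D feature map over size x size regions taken every `stride` pixels.

    Hypothetical helper for illustration only; a small `size` with
    `stride <= size` mimics small, dense (fine-grained) pooling regions.
    """
    if mode not in ("max", "mean"):
        raise ValueError("mode must be 'max' or 'mean'")
    reduce = np.max if mode == "max" else np.mean
    h, w = feature_map.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = reduce(feature_map[r:r + size, c:c + size])
    return out


def sparsity(feature_map, eps=1e-8):
    """Fraction of activations whose magnitude is below `eps` (assumed definition)."""
    return float(np.mean(np.abs(feature_map) < eps))


# Toy sparse 8x8 feature map (an assumption, not data from the paper).
rng = np.random.default_rng(0)
fmap = rng.random((8, 8))
fmap[fmap < 0.7] = 0.0

fine_max = pool2d(fmap, size=2, stride=2, mode="max")     # 4x4 output, fine-grained
fine_mean = pool2d(fmap, size=2, stride=2, mode="mean")   # 4x4 output, fine-grained
coarse_max = pool2d(fmap, size=4, stride=4, mode="max")   # 2x2 output, coarser

print("input sparsity:", sparsity(fmap))
print("fine max-pooled shape:", fine_max.shape)
print("coarse max-pooled shape:", coarse_max.shape)
```

In this sketch, smaller regions keep more spatial detail (a larger pooled map), while larger regions discard position information in exchange for more tolerance to shifts. Max pooling keeps only the strongest activation in each region, whereas average pooling dilutes it by the zeros around it, which is one reason the choice of pooling approach interacts with sparsity.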