Self-learning equivalent-convolutional neural structures (SLECNS) for auto-coding-decoding and image clustering are discussed. The SLECNS architectures and their spatially invariant equivalent models (SI EMs) using the corresponding matrix-matrix procedures with basic operations of continuous logic and non-linear processing are proposed. These SI EMs have several advantages, such as the ability to recognize image fragments with better efficiency and strong cross correlation. The proposed clustering method of fragments with regard to their structural features is suitable not only for binary, but also color images and combines self-learning and the formation of weight clustered matrix-patterns. Its model is constructed and designed on the basis of recursively processing algorithms and to k-average method. The experimental results confirmed that larger images and 2D binary fragments with a large numbers of elements may be clustered. For the first time the possibility of generalization of these models for space invariant case is shown. The experiment for an image with dimension of 256x256 (a reference array) and fragments with dimensions of 7x7 and 21x21 for clustering is carried out. The experiments, using the software environment Mathcad, showed that the proposed method is universal, has a significant convergence, the small number of iterations is easily, displayed on the matrix structure, and confirmed its prospects. Thus, to understand the mechanisms of self-learning equivalence-convolutional clustering, accompanying her to the competitive processes in neurons, and the neural auto-encoding-decoding and recognition principles with the use of self-learning cluster patterns is very important which used the algorithm and the principles of non-linear processing of two-dimensional spatial functions of images comparison. These SIEMs can simply describe the signals processing during the all training and recognition stages and they are suitable for unipolar-coding multilevel signals. We show that the implementation of SLECNS based on known equivalentors or traditional correlators is possible if they are based on proposed equivalental two-dimensional functions of image similarity. The clustering efficiency in such models and their implementation depends on the discriminant properties of neural elements of hidden layers. Therefore, the main models and architecture parameters and characteristics depends on the applied types of non-linear processing and function used for image comparison or for adaptive-equivalental weighing of input patterns. Real model experiments in Mathcad are demonstrated, which confirm that non-linear processing on equivalent functions allows you to determine the neuron winners and adjust the weight matrix. Experimental results have shown that such models can be successfully used for auto- and hetero-associative recognition. They can also be used to explain some mechanisms known as "focus" and "competing gain-inhibition concept". The SLECNS architecture and hardware implementations of its basic nodes based on multi-channel convolvers and correlators with time integration are proposed. The parameters and performance of such architectures are estimated.