Optimization of large neural networks is essential for improving network speed and generalization power while simultaneously reducing training error and network complexity. Boltzmann methods have been used as a statistical technique for combinatorial optimization and for the design of learning algorithms. In the networks studied here, Adaptive Resonance Theory (ART) serves as a connection creation operator, and the Boltzmann method serves as a competitive connection annihilation operator. By combining these two methods it is possible to generate small networks that have similar testing and training accuracy and good generalization from small training sets. Our findings demonstrate that for a character recognition problem the number of weights in a fully connected network can be reduced by over 80%. We have applied the Boltzmann criterion to differential pruning of connections, basing removal decisions on the weight values themselves rather than on the number of connections.
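To make the annihilation step concrete, the following is a minimal sketch of Boltzmann-style stochastic pruning. It assumes an acceptance rule of the form P(remove) = exp(-|w| / T), so that small-magnitude weights are annihilated with high probability while large weights tend to survive; this specific rule and the function names are illustrative assumptions, not the exact criterion used in the paper.

```python
import math
import random


def boltzmann_prune(weights, temperature, seed=0):
    """Stochastically remove connections: each weight w is deleted with
    probability exp(-|w| / temperature). This is an illustrative
    Boltzmann-style acceptance rule, not the paper's exact criterion."""
    rng = random.Random(seed)
    kept = []
    for w in weights:
        p_remove = math.exp(-abs(w) / temperature)
        if rng.random() >= p_remove:
            kept.append(w)  # connection survives this annealing step
    return kept


# Small weights are likely to be annihilated; large weights almost never are.
weights = [0.01, -0.02, 0.5, -1.3, 0.003, 2.1, -0.8, 0.05]
pruned = boltzmann_prune(weights, temperature=0.1)
print(f"{len(weights)} connections -> {len(pruned)} after pruning")
```

Lowering the temperature over successive passes makes the rule increasingly deterministic, in the spirit of simulated annealing: at high T even large weights are occasionally removed, while at low T only near-zero weights are pruned.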