As a step toward understanding the complex information in data and their relationships, structural and discriminative knowledge reveals insights that may prove useful in data interpretation and exploration. This paper reports the development of an automated and intelligent procedure for generating a hierarchy of minimax entropy models and principal component visualization spaces for improved data explanation. The proposed hierarchical minimax entropy modeling and probabilistic principal component projection are both statistically principled and visually effective at revealing the interesting aspects of a data set. The methods involve multiple uses of standard finite normal mixture models and probabilistic principal component projections. The strategy is that the top-level model and projection should explain the entire data set, best revealing the presence of clusters and relationships, while lower-level models and projections should display internal structure within individual clusters, such as subclusters and attribute trends, which might not be apparent in the higher-level models and projections. With many complementary mixture models and visualization projections, each level remains relatively simple while the complete hierarchy maintains overall flexibility and still conveys considerable structural information. In particular, a model identification procedure is developed to select the optimal number and kernel shapes of local clusters from a class of data, resulting in a standard finite normal mixture with minimum conditional bias and variance, and a probabilistic principal component neural network is advanced to generate optimal projections, leading to a hierarchical visualization algorithm in which the complete data set is analyzed at the top level and well-separated subclusters of data points are analyzed at deeper levels.
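As a minimal sketch of the model identification step described above, the following code fits spherical finite normal mixtures of increasing order by EM and selects the order that minimizes an information criterion. A generic BIC-type penalty is used here as a stand-in for the paper's minimax-entropy criterion, and the function names (`em_gmm`, `select_order`) are illustrative, not from the paper.

```python
import numpy as np

def em_gmm(X, K, iters=60, seed=0):
    """Fit a spherical K-component normal mixture by EM.
    Returns posterior responsibilities z_ik and the log-likelihood."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, K, replace=False)]      # initial means
    var = np.full(K, X.var())                    # per-component variances
    pi = np.full(K, 1.0 / K)                     # mixing weights
    ll = -np.inf
    for _ in range(iters):
        # E-step: posterior responsibilities z_ik (log-domain for stability)
        d2 = ((X[:, None, :] - mu[None]) ** 2).sum(-1)
        logp = np.log(pi) - 0.5 * d * np.log(2 * np.pi * var) - 0.5 * d2 / var
        m = logp.max(axis=1, keepdims=True)
        z = np.exp(logp - m)
        tot = z.sum(axis=1, keepdims=True)
        z /= tot
        ll = (m[:, 0] + np.log(tot[:, 0])).sum()
        # M-step: responsibility-weighted parameter updates
        nk = z.sum(axis=0) + 1e-12
        pi = nk / n
        mu = (z.T @ X) / nk[:, None]
        d2 = ((X[:, None, :] - mu[None]) ** 2).sum(-1)
        var = np.maximum((z * d2).sum(axis=0) / (d * nk), 1e-6)
    return z, ll

def select_order(X, K_max=3):
    """Choose the number of local clusters by minimizing a BIC-type
    criterion (an assumed stand-in for the paper's criterion)."""
    n, d = X.shape
    scores = {}
    for K in range(1, K_max + 1):
        _, ll = em_gmm(X, K)
        p = K * (d + 2) - 1                      # free parameters
        scores[K] = -2.0 * ll + p * np.log(n)
    return min(scores, key=scores.get)
```

On two well-separated clusters, `select_order` recovers two components; richer kernel shapes (full covariances) would only change the E-step density and the parameter count.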
Hierarchical probabilistic principal component visualization involves (1) evaluation of posterior probabilities for the mixture data set, (2) estimation of multiple principal component axes from the probabilistic data set, and (3) generation of a complete hierarchy of visual projections. With a soft clustering of the data set t_i via the EM algorithm, data points effectively belong to more than one cluster at any given level, with posterior probabilities denoted by z_{ik}. Thus, the effective input values are z_{ik}t_i for an independent visualization space k in the hierarchy. Further projections can again be performed using the effective input values z_{ik}z_{j|k}t_i for the visualization subspace j. The complete visual explanation hierarchy is generated by performing principal projection and model identification in two iterative steps using information theoretic criteria, the EM algorithm, and probabilistic principal component analysis.
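The per-cluster projection step can be sketched as follows: given responsibilities z_{ik}, each visualization space k is obtained from the principal axes of the responsibility-weighted covariance, so each point contributes to view k in proportion to z_{ik}. This is a plain weighted-PCA sketch of the idea, not the paper's full probabilistic principal component neural network; the function name `cluster_projections` is illustrative.

```python
import numpy as np

def cluster_projections(X, Z, n_dims=2):
    """For each mixture component k, build a local principal-component
    view from the responsibility-weighted statistics (Z[:, k] = z_ik).
    Second-level views would reuse this with weights z_ik * z_{j|k}."""
    views = []
    for k in range(Z.shape[1]):
        w = Z[:, k]
        nk = w.sum() + 1e-12
        mu = (w @ X) / nk                        # weighted mean of cluster k
        Xc = X - mu
        S = (Xc * w[:, None]).T @ Xc / nk        # weighted covariance
        vals, vecs = np.linalg.eigh(S)           # ascending eigenvalues
        W = vecs[:, ::-1][:, :n_dims]            # top principal axes first
        views.append(Xc @ W)                     # project all points into view k
    return views
```

Iterating model identification (to get Z) and this projection step level by level yields the complete hierarchy of visual explanations described above.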