Psychotic disorders such as schizophrenia and bipolar disorder are difficult to classify because their symptoms overlap. Deriving biomarkers of illness from structural MRI data is essential because it may lead to improved diagnosis. Previous studies typically predict diagnosis labels using supervised classifiers, which rely on correctly labeled data. Mislabeled subjects may increase the complexity of the predictive model and degrade its performance. In this work, we address the problem of inaccurate diagnosis labeling of psychotic disorders using a data-driven approach. We perform dimension reduction using PCA on the vectorized images and then apply k-means clustering to the resulting components. We evaluate our method on a structural MRI dataset of over 900 subjects labeled using DSM-IV and biotypes. An ANOVA test of statistical significance was performed on the groups defined by each labeling approach (DSM-IV and biotype) and on the clusters produced by our method. Our method grouped subjects into 5 clusters, and each cluster includes all types of patients. Nevertheless, we found statistically significant group differences in brain regions across the 5 clusters, whereas the DSM-IV and biotype groupings showed no significant differences. Our results also show that the performance of the predictive model improved significantly when using the data-driven labels. These findings suggest that underlying biological changes associated with mental illness may be identified by studying features of the brain imaging data, and that annotating brain imaging data using a data-driven approach may eventually lead to improved diagnosis, advance drug discovery, and help patients.
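The pipeline described above (PCA on vectorized images, k-means on the components, then ANOVA across the resulting groups) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `cluster_and_test`, the number of PCA components, the `n_init` setting, the per-feature ANOVA, and the synthetic data are all hypothetical choices made for the example. Only the use of PCA, k-means, 5 clusters, and ANOVA comes from the text, and no correction for multiple comparisons is applied here.

```python
import numpy as np
from scipy.stats import f_oneway
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def cluster_and_test(images, n_components=10, n_clusters=5, seed=0):
    """PCA -> k-means -> one-way ANOVA per feature (illustrative sketch).

    images: array of shape (n_subjects, n_features), one flattened image per row.
    n_components is an assumed value; the source does not state it.
    """
    # Reduce the vectorized images to a small number of principal components.
    components = PCA(n_components=n_components, random_state=seed).fit_transform(images)

    # Cluster subjects in component space to obtain data-driven labels.
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(components)

    # One-way ANOVA across clusters, computed independently for each feature.
    groups = [images[labels == k] for k in range(n_clusters)]
    _, p_values = f_oneway(*groups)
    return labels, p_values


# Synthetic stand-in for real MRI data (assumption): 200 subjects, 200 features,
# drawn around 5 hypothetical group centers.
rng = np.random.default_rng(0)
centers = rng.normal(0.0, 3.0, size=(5, 200))
images = centers[np.repeat(np.arange(5), 40)] + rng.normal(size=(200, 200))

labels, p_values = cluster_and_test(images)
print("cluster sizes:", np.bincount(labels))
print("features with p < 0.05:", int((p_values < 0.05).sum()))
```

The same ANOVA step could be run with the DSM-IV or biotype labels in place of `labels` to compare how well each labeling separates the imaging features.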