Understanding stroke etiology and its genetic pathways is critical for planning, implementing, and evaluating treatments for stroke patients. However, this knowledge discovery requires phenotyping stroke and integrating multiple demographic, clinical, genetic, and imaging phenotypes by developing and running sophisticated processing pipelines at massive scale. The Stroke Neuroimaging Phenotype Repository (SNIPR) was developed in 2018 as a large, multi-center, centralized imaging repository of clinical CT and MRI scans from stroke patients worldwide, based on the Extensible Neuroimaging Archive Toolkit (XNAT). The aims of this repository are to: (i) create a central retrospective repository to host and provide secure access to data from anonymized acute stroke patients with serial clinical imaging; (ii) facilitate integration of independent stroke phenotypic studies via data aggregation techniques; and (iii) expedite the development of containerized deep learning pipelines to perform large-scale analysis of complications after stroke. Currently, SNIPR hosts 8 projects, 1877 subjects, and 5281 imaging sessions from Washington University Medical Center’s clinical image archive as well as contributions from collaborators in different countries, including the US, Finland, Poland, and Spain. Moreover, we have used XNAT’s standard XML Schema extension mechanism to create data type extensions that support stroke phenotypic studies, including clinical phenotypes such as the National Institutes of Health Stroke Scale (NIHSS) score and imaging phenotypes such as infarct and cerebrospinal fluid (CSF) volume. We have developed deep learning pipelines to facilitate image processing and analysis and deployed these pipelines through XNAT’s container service. The container service enables these pipelines to execute at large scale with Docker Swarm on an attached compute cluster.
Our pipelines include a scan-type classifier that uses both a convolutional neural network (CNN) approach and a natural language processing approach to automatically categorize uploaded CT sequences into defined classes, facilitating selection for further analysis. We deployed this containerized classifier within a broader pipeline to facilitate big data analysis of cerebral edema after stroke, and it achieved 99.4% test accuracy on 10000 scans. SNIPR enables the developed automatic pipelines to use this automatic scan selection, to develop and validate imaging phenotypes, and to couple them with clinical and genetic data, with the overarching aim of enabling a broad understanding of stroke progression and outcomes.