Because of variations in patient populations, imaging hardware, and data drift, artificial intelligence (AI) models perform differently across clinical practices. Validation of externally developed models is a critical step in implementation, but it raises challenges related to data de-identification, security, and exchange. We created a workflow that allows our clinical radiology practice to safely evaluate external AI models. The workflow encompasses four steps: study selection, extraction, inference, and assessment. A commercially available AI model for intracranial hemorrhage (ICH) detection was used as a proof of concept. Noncontrast head CT cases were collected using both an internal search engine and a neuroradiologist teaching file containing 16 exams in which ICH had been missed on the original radiology interpretation; these challenge cases were included to enrich the cohort. Our DICOM de-identification and processing pipeline (D2P2) processed the headers and stripped identifiable information, and the cleaned data were made available to the external party for AI model inference. The model outputs were matched against the ground truth, and performance metrics were calculated, including subgroup analyses. Overall precision was 0.98, recall was 0.75, and specificity was 0.97, with an F1 score of 0.85, similar to the original radiologists' performance on this enriched challenge cohort. The estimated combined performance of the original radiologist plus AI improved recall to 0.91. Subgroup analyses suggested that combined performance exceeded that of radiologists or AI alone for subtle hemorrhages and for those without mass effect. An integrated workflow for assessing and validating external deep learning models before implementation is feasible and can aid radiology practices seeking to deploy AI products, particularly in assessing performance in the local setting and in clinical value-added scenarios.
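The reported metrics follow the standard confusion-matrix definitions. As a minimal sketch of how such numbers are computed (the counts passed in below are hypothetical, chosen only to illustrate the formulas, not the study's actual data):

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute precision, recall (sensitivity), specificity, and F1
    from confusion-matrix counts: true/false positives and negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}

# Illustrative call with made-up counts:
m = classification_metrics(tp=1, fp=1, tn=1, fn=1)
```

As a consistency check, the abstract's F1 of 0.85 follows from its reported precision and recall: 2 * 0.98 * 0.75 / (0.98 + 0.75) ≈ 0.85.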