When a CAD-AI network is developed and deployed, both safety and effectiveness must be guaranteed for all subgroups of the target population. We present a novel toolbox for automatic slicing and performance assessment that follows a generic, modular approach and helps to find the subpopulations where caution is warranted and the model may need improvement. In a first step, slices are generated, following an approach inspired by the existing ‘Slice Finder’ algorithm, and saved for further analysis. Depending on the type of AI task (classification, object detection, segmentation, instance segmentation, ...), multiple metrics are then evaluated per slice. Labeled metrics (specificity, sensitivity, ...), unlabeled metrics (outlier score, confidence score, ...), and user-defined metrics can all be included. Optionally, a confidence interval (CI) is calculated for each metric. In a last step, the metric values and CIs are used to rank the slices so that slices of interest can be found quickly. Custom ranking methods can be added, keeping the full process, from slice generation up to and including visualization, modular and customizable. We illustrate the toolbox with a dermatology classification and object detection use case. First, a single model is evaluated down to crosses of three slices; slices of interest are detected at degree three that would be difficult to find without automation. Additionally, the use of unlabeled metrics such as the outlier score to automatically find slices of interest is illustrated.
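The three steps described above (slice generation, per-slice metric evaluation with an optional CI, and ranking) can be sketched as follows. This is only an illustrative sketch and not the toolbox's actual interface: all function names, the record fields `label` and `pred`, and the metadata features are hypothetical. It also makes several assumptions that the text does not state. Slices are enumerated exhaustively as conjunctions of up to three feature values, which is simpler than the search used by ‘Slice Finder’. Sensitivity is the only metric. The CI is a Wilson score interval. Slices are ranked by the upper CI bound, so that slices whose sensitivity is low even under an optimistic estimate come first.

```python
from itertools import combinations, product
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n == 0:
        return (0.0, 1.0)  # no data: the interval is uninformative
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def generate_slices(records, features, max_degree=3):
    """Yield (conditions, members) for every non-empty slice.

    A slice is a conjunction of (feature, value) conditions; its degree is
    the number of conditions that are crossed.
    """
    values = {f: sorted({r[f] for r in records}) for f in features}
    for degree in range(1, max_degree + 1):
        for feats in combinations(features, degree):
            for combo in product(*(values[f] for f in feats)):
                conds = tuple(zip(feats, combo))
                members = [r for r in records
                           if all(r[f] == v for f, v in conds)]
                if members:
                    yield conds, members


def sensitivity_counts(members):
    """Return (true positives, number of positives) for one slice."""
    positives = [r for r in members if r["label"] == 1]
    tp = sum(1 for r in positives if r["pred"] == 1)
    return tp, len(positives)


def rank_slices(records, features, max_degree=3, min_positives=1):
    """Rank slices by the upper bound of the sensitivity CI, lowest first."""
    rows = []
    for conds, members in generate_slices(records, features, max_degree):
        tp, n_pos = sensitivity_counts(members)
        if n_pos < min_positives:
            continue  # sensitivity is undefined or too uncertain here
        rows.append({
            "slice": conds,
            "n_pos": n_pos,
            "sensitivity": tp / n_pos,
            "ci": wilson_interval(tp, n_pos),
        })
    rows.sort(key=lambda row: row["ci"][1])
    return rows
```

Ranking on a CI bound rather than on the raw metric value keeps very small slices, whose estimates are noisy, from dominating the top of the list. A custom ranking method would only need to replace the sort key.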