Segmentation plays a critical role in exposing connections between biological structure and function. The process of
label fusion collects and combines multiple observations into a single estimate. Statistically driven techniques provide
mechanisms to optimally combine segmentations; yet, optimality hinges upon accurate modeling of rater behavior.
Traditional approaches, e.g., Majority Vote and Simultaneous Truth and Performance Level Estimation (STAPLE), have
been shown to yield excellent performance in some cases, but do not account for spatial dependencies of rater
performance (i.e., regional task difficulty). Recently, the COnsensus Level, Labeler Accuracy and Truth Estimation
(COLLATE) label fusion technique augmented the seminal STAPLE approach to simultaneously estimate regions of
relative consensus versus confusion along with rater performance. Herein, we extend the COLLATE framework to
account for multiple consensus levels. Toward this end, we posit a generalized model of rater behavior of which
Majority Vote, STAPLE, STAPLE Ignoring Consensus Voxels, and COLLATE are special cases. The new algorithm, Multi-COLLATE, is
evaluated with simulations and shown to yield improved performance in cases with complex regional difficulty. Multi-COLLATE
achieves these results by capturing different consensus levels. The potential impacts and applications of this generative
model to label fusion problems are discussed.
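For reference, the two classic baselines named above, Majority Vote and STAPLE, can be sketched for the binary case as follows. This is a minimal illustration under stated assumptions: voxels are a flat list, STAPLE uses one global foreground prior estimated from the mean rater label, and EM starts at sensitivity = specificity = 0.9. The function names `majority_vote` and `staple_binary` are hypothetical. The sketch does not implement COLLATE or Multi-COLLATE, which additionally estimate consensus regions.

```python
from typing import List, Tuple

Labels = List[List[int]]  # labels[j][i]: binary label from rater j for voxel i


def majority_vote(labels: Labels) -> List[int]:
    """Per-voxel majority of binary labels; ties go to foreground (1)."""
    n_raters, n_voxels = len(labels), len(labels[0])
    return [1 if 2 * sum(labels[j][i] for j in range(n_raters)) >= n_raters else 0
            for i in range(n_voxels)]


def staple_binary(labels: Labels, n_iter: int = 50,
                  eps: float = 1e-6) -> Tuple[List[int], List[float], List[float]]:
    """Binary STAPLE via EM with a single global foreground prior (assumption).

    Returns the thresholded estimate plus per-rater sensitivity p and specificity q.
    """
    n_raters, n_voxels = len(labels), len(labels[0])
    p = [0.9] * n_raters  # sensitivity, P(D=1 | T=1); initial value is an assumption
    q = [0.9] * n_raters  # specificity, P(D=0 | T=0); initial value is an assumption
    # Global prior P(T=1) from the mean rater label, clamped away from 0 and 1.
    prior = sum(map(sum, labels)) / (n_raters * n_voxels)
    prior = min(max(prior, eps), 1.0 - eps)
    w = [0.5] * n_voxels  # posterior P(T_i=1 | D, p, q)

    for _ in range(n_iter):
        # E-step: posterior foreground probability of each voxel.
        for i in range(n_voxels):
            a, b = prior, 1.0 - prior
            for j in range(n_raters):
                if labels[j][i]:
                    a *= p[j]
                    b *= 1.0 - q[j]
                else:
                    a *= 1.0 - p[j]
                    b *= q[j]
            w[i] = a / (a + b)
        # M-step: re-estimate each rater's sensitivity and specificity.
        sw = max(sum(w), eps)
        sw0 = max(n_voxels - sum(w), eps)
        for j in range(n_raters):
            tp = sum(w[i] for i in range(n_voxels) if labels[j][i])
            tn = sum(1.0 - w[i] for i in range(n_voxels) if not labels[j][i])
            p[j] = min(max(tp / sw, eps), 1.0 - eps)
            q[j] = min(max(tn / sw0, eps), 1.0 - eps)

    return [1 if wi >= 0.5 else 0 for wi in w], p, q
```

With two raters who agree and a third who labels every voxel inversely, both methods recover the same estimate, but STAPLE additionally reports low sensitivity and specificity for the third rater. Because STAPLE's rater parameters are global, it cannot express the region-dependent difficulty that motivates COLLATE. For many raters, the products in the E-step would typically be computed in log space to avoid underflow.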