Multiatlas segmentation offers a convenient process by which image segmentation tools can be created from a series of labeled atlases (i.e., raters). However, creating the atlases is exceedingly time-consuming, and the atlases are vulnerable to shifts in clinical/research demands as anatomical definitions are refined, combined, or subdivided. Hence, a process by which atlases from distinct, but complementary, anatomical “protocols” could be combined would allow for greater innovation in structural analysis and greater efficiency of data (re)use. Recent innovations in protocol fusion have shown that propagation of information across distinct protocols is feasible. However, how to effectively include this information in simultaneous truth and performance level estimation (STAPLE) has remained elusive. We present a generalization of the STAPLE framework that accounts for multiprotocol rater performance (i.e., accuracy of registered atlases). This approach, multiset STAPLE (MS-STAPLE), provides a statistical framework for combining label information from atlases that have been labeled with distinct protocols (e.g., whole brain versus subcortical) and is compatible with the current local, nonlocal, probabilistic, log-odds, and hierarchical innovations in STAPLE theory. Using the MS-STAPLE approach, information from a broad range of datasets can be combined so that each available dataset contributes in a spatially dependent manner to local labels. We evaluate the model in simulations and in an experiment in which an existing set of whole-brain labels (14 structures) is refined to include parcellation of subcortical structures (26 structures). In the empirical results, we observe significant improvement in the Dice similarity coefficient when comparing MS-STAPLE to STAPLE and nonlocal MS-STAPLE to nonlocal STAPLE.
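For readers unfamiliar with the evaluation metric, the Dice similarity coefficient between an estimated and a reference segmentation of one structure is 2|A ∩ B| / (|A| + |B|), computed separately for each label. The following is a minimal sketch of that per-label computation, not the evaluation code used in this work; the function name `dice_per_label` and the toy label arrays are hypothetical and chosen only for illustration.

```python
def dice_per_label(seg_a, seg_b, labels):
    """Return {label: Dice coefficient} for two flat label sequences.

    Hypothetical helper for illustration; seg_a and seg_b are equally
    sized sequences of integer labels (e.g., flattened label volumes).
    """
    if len(seg_a) != len(seg_b):
        raise ValueError("segmentations must have the same number of voxels")
    scores = {}
    for label in labels:
        # Voxel counts for this label in each segmentation.
        size_a = sum(1 for v in seg_a if v == label)
        size_b = sum(1 for v in seg_b if v == label)
        # Voxels where both segmentations assign this label.
        overlap = sum(
            1 for a, b in zip(seg_a, seg_b) if a == label and b == label
        )
        denom = size_a + size_b
        # Dice is undefined when the label is absent from both inputs.
        scores[label] = 2.0 * overlap / denom if denom else float("nan")
    return scores


# Toy data (not from the study): 0 is background, 1 and 2 are structures.
estimate = [0, 1, 1, 2, 2, 2, 0, 1]
reference = [0, 1, 1, 1, 2, 2, 0, 0]
scores = dice_per_label(estimate, reference, labels=[1, 2])
```

A value of 1 indicates perfect overlap for a structure and 0 indicates none, so comparing fusion methods amounts to comparing these per-structure scores against the same reference labels.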