Segmentation of medical images is fundamental for many high-level applications. Unsupervised techniques such as region growing or merging allow automated processing of large amounts of data. The regions are usually described by a mean feature vector, and the merging decisions are based on the Euclidean distance. This kind of similarity model is strictly local, since the feature vector of each region is calculated without evaluating the region's surroundings. Therefore, region merging often fails to extract visually comprehensible and anatomically relevant regions. In our approach, the local model is extended. Regional similarity is calculated for a pair of adjacent regions, e.g. considering the contrast along their common border. Global similarity components are obtained by analyzing the entire image partitioning before and after a hypothetical merge. Hierarchical similarities are derived from the iteration history. Local, regional, global, and hierarchical components are combined task-specifically to guide the iterative region merging process. Starting with an initial watershed segmentation, the process terminates when the entire image is represented as a single region. A complete segmentation takes only a few seconds. Our approach is evaluated contextually on plain radiographs of human hands acquired for bone age determination. Region merging based on a local model fails to detect most bones, while correct localization and delineation are obtained with the combined model. A gold standard is computed from ten manual segmentations of each radiograph to evaluate the quality of delineation. The relative error of labeled pixels is 15.7%, which is slightly more than the mean error of the ten manual references with respect to the gold standard (12%). The flexible and powerful similarity model can be adapted to many other segmentation tasks in medical imaging.
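The iterative merging described above can be sketched as a greedy loop over a set of regions: at each step, the most similar adjacent pair under a combined dissimilarity is merged, until a single region remains. The sketch below is a minimal illustration, not the paper's model: it combines only a local component (Euclidean distance of mean feature vectors) and a simplified regional component (difference of border contrast values); the global and hierarchical components, the adjacency graph, and the weighting `w_local`/`w_regional` are all assumptions for illustration.

```python
import itertools
import numpy as np

def merge_regions(regions, w_local=0.7, w_regional=0.3):
    """Illustrative greedy region merging with a combined dissimilarity.

    `regions` maps a region id to {"mean": mean feature vector,
    "size": pixel count, "contrast": mean gradient along the border}.
    All field names and the linear weighting are hypothetical; a real
    implementation would also track region adjacency, a global term
    (partitioning before/after a hypothetical merge), and a hierarchical
    term derived from the iteration history.
    Returns the merge history as a list of merged id pairs.
    """
    regions = {k: dict(v) for k, v in regions.items()}
    history = []
    while len(regions) > 1:
        best, best_d = None, np.inf
        for a, b in itertools.combinations(regions, 2):
            # local component: distance of mean feature vectors
            local = np.linalg.norm(regions[a]["mean"] - regions[b]["mean"])
            # simplified regional component: border-contrast difference
            regional = abs(regions[a]["contrast"] - regions[b]["contrast"])
            d = w_local * local + w_regional * regional
            if d < best_d:
                best, best_d = (a, b), d
        a, b = best
        ra, rb = regions.pop(a), regions.pop(b)
        n = ra["size"] + rb["size"]
        regions[a] = {  # size-weighted update of the merged region
            "mean": (ra["size"] * ra["mean"] + rb["size"] * rb["mean"]) / n,
            "size": n,
            "contrast": (ra["contrast"] + rb["contrast"]) / 2,
        }
        history.append((a, b))
    return history

regions = {
    0: {"mean": np.array([10.0]), "size": 5, "contrast": 1.0},
    1: {"mean": np.array([11.0]), "size": 4, "contrast": 1.2},
    2: {"mean": np.array([50.0]), "size": 6, "contrast": 5.0},
}
print(merge_regions(regions))  # the two similar regions 0 and 1 merge first
```

In practice the loop would start from the label image of an initial watershed segmentation, and the recorded merge history yields the hierarchy from which the hierarchical similarity component is derived.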