For the past two decades, the detection of buried explosive hazards has been studied extensively, and several machine learning algorithms have been developed and adapted to this application. Typically, a pre-screener is first used to identify areas of interest, or alarms. Each alarm consists of a 3D data cube whose dimensions correspond to spatial down-track, spatial cross-track, and time, respectively, where time serves as a proxy for depth. Ground truth information is then used to label each alarm and generate labeled data to train a classifier to discriminate between targets and clutter objects. One of the main challenges in this approach is localizing the true depth of the alarm. On one hand, the signature of a buried object is not expected to span all depth values, so a single global feature extracted from all depth bins may not discriminate effectively between target and clutter signatures. On the other hand, depth ground truth is not available, as the observed depth depends on the target type and size, soil properties, and other environmental conditions. Moreover, visually inspecting each alarm to select the optimal depth location is tedious, ambiguous, and impractical for very large training sets. Two different approaches have been considered to train learning algorithms. The first uses simple rules and machine learning algorithms to automate the selection of the optimal depth(s) for each alarm. The second avoids labeling at the depth level and uses multiple instance learning (MIL) algorithms. In this context, each alarm is represented by a bag of multiple instances, where each instance corresponds to a feature extracted at a different depth. Since labels are needed only at the bag level and not at the instance level, MIL does not require true depth information. In this paper, we present a large-scale evaluation of the two approaches. For the first approach, we consider three methods to identify optimal depth locations and analyze their effect on k-nearest neighbor (KNN) and support vector machine (SVM) classifiers.
For the second approach, we consider four MIL algorithms that do not require depth information. For our analysis, we use large data collections accumulated across multiple dates and multiple test sites by a vehicle-mounted, downward-looking ground penetrating radar (GPR) sensor. The data include a diverse set of buried explosive objects of varying shapes, metal content, and burial depths. The performance of all algorithms is analyzed using receiver operating characteristic (ROC) curves.
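
To make the bag representation concrete, the following minimal sketch shows one way an alarm cube could be turned into a MIL bag. It is an illustration under stated assumptions, not the feature extraction evaluated in this paper: the function name `alarm_to_bag`, the window and stride sizes, the cube dimensions, and the two summary statistics used as features are all hypothetical placeholders.

```python
import numpy as np

def alarm_to_bag(cube, window=16, stride=8):
    """Turn a (down_track, cross_track, time) cube into a bag of instances.

    Each instance is a feature vector computed from one window along the
    time (depth) axis. Window and stride values are assumed, not taken
    from the paper.
    """
    n_time = cube.shape[2]
    instances = []
    for start in range(0, n_time - window + 1, stride):
        patch = cube[:, :, start:start + window]
        # Placeholder feature: mean absolute amplitude and standard deviation.
        # A real system would use a richer descriptor at each depth.
        instances.append([np.abs(patch).mean(), patch.std()])
    return np.asarray(instances)

# Synthetic alarm: 15 x 15 spatial samples and 64 time samples (assumed sizes).
rng = np.random.default_rng(0)
cube = rng.normal(size=(15, 15, 64))

bag = alarm_to_bag(cube)   # one row per depth window
bag_label = 1              # label is attached to the whole bag, not to a depth
print(bag.shape)
```

The label is stored once per bag, so no instance-level (depth-level) annotation is needed; it is left to the MIL algorithm to determine which instances, if any, carry the target signature.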