Urban search and rescue (USAR) is one of the most dangerous and time-critical non-wartime activities. Researchers have been developing hardware and software that enable robots to perform some search and rescue functions, to minimize the exposure of human rescue personnel to danger and maximize the survival of victims. Significant progress has been achieved, but much work remains. USAR demands a blend of numerous specialized technologies. An effective USAR robot must be endowed with key competencies, such as being able to negotiate collapsed structures, find victims and assess their condition, identify potential hazards, generate maps of the structure and victim locations, and communicate with rescue personnel. These competencies draw on work in numerous sub-disciplines of intelligent systems (or artificial intelligence), such as sensory processing, world modeling, behavior generation, path planning, and human-robot interaction, in addition to work in communications, mechanism design, and advanced sensors. To stimulate progress in the field, reference USAR challenges are being developed and propagated worldwide. To make efficient use of finite research resources, the robotic USAR community must share a common understanding of what is required, technologically, to attain each competency, and must have a rigorous measure of the current effectiveness of the various technologies. The National Institute of Standards and Technology (NIST) is working with partner organizations to measure the performance of robotic USAR competencies and technologies. In this paper, we describe the reference test arenas for USAR robots, assess the current challenges within the field, and discuss experiences thus far in the testing effort.