This paper presents a method for evaluating expert systems. The method consists of criteria development, empirical performance assessment, and feedback, and it incorporates assessments by the domain expert, knowledge engineers, end users, and management. The evaluation criteria include factors traditionally used to evaluate expert systems (quality of advice, correctness of the reasoning strategy, user interface, hardware environment, and response time) as well as factors that bear on field deployment as a product (the expectations of end users, the domain expert, and management). The criteria are developed and applied by incorporating the viewpoints of the various parties concerned with the development and field use of the expert system. The problem-solving performance of the expert system is evaluated against a database of correctly diagnosed cases obtained from the field. The method was developed to test and refine a prototype knowledge-based system, GEMS (Generalized Expert Maintenance System), and has been successfully used to evaluate the first phase of GEMS, the Trunk Trouble Analyzer (TTA). GEMS-TTA analyzes outage codes that can occur on trunks terminating on the 4ESS switch. The evaluation has shown that GEMS-TTA covers 60% of all possible outage codes, correctly analyzes 100% of the trouble tickets in the test sample, and uses reasoning strategies identical to those of an expert technician. The evaluation method is comprehensive and general enough to test and refine expert systems in other domains as well.
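
The empirical performance assessment described above, scoring a system's coverage of the possible outage codes and its agreement with expert diagnoses on a database of correctly diagnosed field cases, might be sketched as follows. This is a minimal illustration only: the `evaluate` function, the `ToySystem` rule table, and the sample codes and tickets are hypothetical assumptions for exposition, not part of GEMS-TTA.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    """A field trouble ticket with the expert technician's verified diagnosis."""
    outage_code: str
    expert_diagnosis: str


class ToySystem:
    """Stand-in for an expert system; a real system would apply a knowledge base."""
    RULES = {"FE01": "card failure", "FE02": "timing fault"}  # illustrative rules

    def handles(self, code: str) -> bool:
        return code in self.RULES

    def diagnose(self, ticket: Ticket) -> str:
        return self.RULES[ticket.outage_code]


def evaluate(system, tickets, all_outage_codes):
    """Return (coverage, accuracy):
    coverage -- fraction of all possible outage codes the system handles;
    accuracy -- fraction of in-scope field tickets whose diagnosis matches
                the expert's, mirroring a case-database assessment."""
    handled = {c for c in all_outage_codes if system.handles(c)}
    coverage = len(handled) / len(all_outage_codes)

    in_scope = [t for t in tickets if t.outage_code in handled]
    correct = sum(1 for t in in_scope
                  if system.diagnose(t) == t.expert_diagnosis)
    accuracy = correct / len(in_scope) if in_scope else 0.0
    return coverage, accuracy


# Illustrative data: five possible codes, three field tickets.
CODES = ["FE01", "FE02", "FE09", "FE10", "FE11"]
TICKETS = [
    Ticket("FE01", "card failure"),
    Ticket("FE02", "timing fault"),
    Ticket("FE09", "facility alarm"),  # out of scope for ToySystem
]
```

Under these assumptions, `evaluate(ToySystem(), TICKETS, CODES)` yields a coverage of 0.4 (two of five codes handled) and an accuracy of 1.0 on the in-scope tickets, paralleling how the paper reports coverage and correct-analysis rates as separate figures.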