Proceedings Volume Medical Imaging 2004: PACS and Imaging Informatics (2004), https://doi.org/10.1117/12.536404
Medical records are written mainly in natural language. This study focuses on narrative radiological reports written in Japanese. Such reports cannot be used for advanced retrieval, data mining, and similar applications unless they are stored in a structured format such as DICOM-SR. The goal is to structure narrative reports progressively using natural language processing (NLP). Structuring can be done at several levels; DICOM-SR, for example, defines three: basic text, enhanced, and comprehensive. The enhanced level requires numerical measurements and spatial and temporal coordinates. In this study, the wording used in the reports was first standardized, dictionaries were organized, and morphological analysis was performed. Next, numerical measurements and temporal coordinates were extracted, and the objects to which they referred were analyzed. A total of 10,000 CT and MR reports were separated into 82,122 sentences, and 34,269 of the 36,444 numerical descriptions (about 94%) were tagged. Periods, slashes, hyphens, and parentheses are used ambiguously in enumerated lists, dates, image numbers, and anatomical names, as well as at the ends of sentences. To resolve this ambiguity, descriptions were processed in the following order: date, size, unit, enumerated list, and abbreviation. The tagged reports were then separated into sentences.
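The ordered processing described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the regular expressions are hypothetical English stand-ins for the Japanese dictionary-based rules, the split between "size" and "unit" patterns is a guess, and the placeholder trick (swapping each tagged span for a private-use character so that its punctuation cannot be mistaken for a sentence end) is an assumption made for illustration. The function name `tag_and_split` and the tag labels are likewise invented here.

```python
import re

# Hypothetical patterns, applied in the order given in the abstract:
# date, size, unit, enumerated list, abbreviation.
RULES = [
    ("date", re.compile(r"\b\d{4}[./-]\d{1,2}[./-]\d{1,2}\b")),
    ("size", re.compile(
        r"\b\d+(?:\.\d+)?(?:\s*x\s*\d+(?:\.\d+)?)*\s*(?:mm|cm)\b")),
    ("unit", re.compile(r"\b\d+(?:\.\d+)?\s*(?:HU|ml)\b")),
    ("enum", re.compile(r"(?:(?<=\s)|^)\(?\d{1,2}[.)](?=\s)")),
    ("abbr", re.compile(r"\b(?:Rt|Lt|Fig|No)\.")),
]

PUA_START = 0xE000  # first code point of the Unicode private-use area


def tag_and_split(text):
    """Tag ambiguous spans in a fixed order, then split into sentences."""
    spans = []  # (label, original text), indexed by placeholder offset

    def protect(label):
        # Replace each match with a single private-use character so that
        # later rules and the sentence splitter cannot see its punctuation.
        def repl(match):
            spans.append((label, match.group(0)))
            return chr(PUA_START + len(spans) - 1)
        return repl

    for label, pattern in RULES:
        text = pattern.sub(protect(label), text)

    # Only periods that survived tagging are treated as sentence ends.
    sentences = [s.strip() for s in re.split(r"(?<=\.)\s+", text) if s.strip()]

    def restore(sentence):
        # Put the original text back, wrapped in its tag.
        return re.sub(
            r"[\uE000-\uF8FF]",
            lambda m: "<{0}>{1}</{0}>".format(*spans[ord(m.group(0)) - PUA_START]),
            sentence,
        )

    return [restore(s) for s in sentences]


if __name__ == "__main__":
    report = "Compared with 2003.10.05. Rt. kidney mass 2.5 x 3.0 cm. Density 35 HU."
    for sentence in tag_and_split(report):
        print(sentence)
```

In this sketch the order matters because each rule only sees text that earlier rules left untagged. For example, the period in "Rt." is consumed by the abbreviation rule before sentence splitting, so it does not produce a spurious sentence boundary.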