A new era: Automated extraction of detailed prostate cancer information from narratively written health records. Pioneer work from a European tertiary care center

Abstract: PD37-06
Sources of Funding: none

Introduction

Detailed pathological information is necessary for follow-up analyses and for the development of new prediction tools. Most institutional databases lack such data in either quantity or quality, and collecting it manually in daily clinical practice requires expensive manpower.

Natural language processing (NLP) presents immense potential to automate the information-gathering process in the field of urology. We aimed to propose and validate a novel tool that extracts specific, detailed pathological information from written health records containing continuous, narrative text in an automated and precise way.

Methods

Overall, 1500 postoperative narrative pathology reports of patients who underwent radical prostatectomy in 2015 were analyzed. Of these, 750 reports were randomly selected as training data, and the remaining 750 reports were used as validation data. Using domain knowledge from clinical experts and the Stanford treebank parser for German, we created rule-based extraction algorithms for pathological staging, Gleason percentages, prostate dimensions and laterality of the tumor. The rules were refined by iterative review of misclassified reports until no misclassified reports remained in the training data. Each number found in the reports was also verified by clinical experts.
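The workflow described above (random split, rule-based extraction, review of misclassified training reports) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not publish its rules, and the actual system also relied on a syntactic parser, which is omitted here. The regular expressions, the function names `split_reports`, `extract_fields` and `misclassified`, and the example report wording are all assumptions made for illustration.

```python
import random
import re

# Hypothetical patterns for illustration only; the abstract does not publish
# its actual rules, and real report wording will differ.
PT_STAGE = re.compile(r"\bpT\s?([0-4][a-c]?)\b")
PN_STAGE = re.compile(r"\bpN\s?([0-3xX])\b")
GLEASON_PCT = re.compile(
    r"Gleason[- ]?(?:Muster|pattern)?\s*([3-5])\s*[:=]?\s*(\d{1,3})\s?%",
    re.IGNORECASE,
)


def split_reports(reports, n_train, seed=0):
    """Randomly split reports into a training list and a validation list."""
    shuffled = list(reports)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def extract_fields(text):
    """Apply each rule to one report; a field without a match is None."""
    pt = PT_STAGE.search(text)
    pn = PN_STAGE.search(text)
    # Map each Gleason pattern (3, 4, 5) to its reported percentage.
    gleason = {int(p): int(pct) for p, pct in GLEASON_PCT.findall(text)}
    return {
        "pT": pt.group(1) if pt else None,
        "pN": pn.group(1) if pn else None,
        "gleason_pct": gleason or None,
    }


def misclassified(texts, expert_labels):
    """Indices of reports whose extraction differs from the expert labels.

    These are the reports a reviewer would inspect before revising a rule.
    """
    return [
        i
        for i, (text, expected) in enumerate(zip(texts, expert_labels))
        if extract_fields(text) != expected
    ]
```

In this sketch, rule development would loop over the training split: call `misclassified`, inspect the returned reports, adjust a pattern, and repeat until the list is empty. The validation split is left untouched until the rules are frozen.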

Results

By applying the developed NLP system to the validation data, we assessed the accuracy of each extracted item of information. The NLP-derived accuracies for pT-stage, pN-stage, Gleason percentage, prostate weight, prostate volume and laterality of the tumour were 100%, 100%, 100%, 100% and 97%, respectively.
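Per-field accuracy of this kind can be computed as the share of validation reports in which the extracted value equals the expert-verified value. The following Python sketch shows one way to do that. The function name `field_accuracy` and the dictionary layout of the records are assumptions, not details taken from the abstract.

```python
def field_accuracy(predicted, reference, fields):
    """Per-field fraction of reports where extraction equals the expert value.

    `predicted` and `reference` are parallel lists of dicts, one per report.
    The dict layout is a hypothetical choice made for this sketch.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one report is required")
    return {
        field: sum(p.get(field) == r.get(field) for p, r in zip(predicted, reference))
        / len(reference)
        for field in fields
    }
```

Reporting each field separately, rather than one pooled figure, makes it visible when a single item such as laterality is harder to extract than the others.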

Conclusions

We developed a novel NLP method that extracts detailed pathological information from narrative, written pathology reports with very high accuracy. This automated method can be implemented to greatly increase the efficiency and accessibility of research data in academic centers.

Authors
Sami-Ramzi Leyh-Bannurah
Zhe Tian
Pierre Karakiewicz
Dirk Pehrke
Hartwig Huland
Markus Graefen
Lars Budäus