Harnessing full text pathology data from the electronic health record to advance bladder cancer care – Development of a Natural Language Processing system to generate longitudinal pathology data

Abstract: MP32-12
Sources of Funding: Dept of Veterans Affairs VISN 1 Career Development Award; Conquer Cancer Foundation Career Development Award; DHMC Dept of Surgery internal Career Development Award

Introduction

Population-based studies to advance bladder cancer care require longitudinal pathology data that allow measurement of disease recurrence and progression. The primary data source for population-based studies has been SEER-Medicare, but SEER data are limited because pathologic information is abstracted only at the time of diagnosis. We set out to obtain longitudinal pathology data by developing a natural language processing (NLP) engine to automate abstraction of important details from full text pathology reports.

Methods

We selected a national random sample of 600 bladder pathology reports from the Department of Veterans Affairs (VA) Corporate Data Warehouse. To develop a gold standard, two reviewers independently annotated these reports, and a third reviewer resolved discrepancies. We used Cohen's kappa to evaluate inter-rater reliability for histology, invasion (presence versus absence, and depth), grade, and statements regarding the presence of muscularis propria and of carcinoma in situ. Next, we iteratively trained, developed, and tested the NLP engine's ability to abstract these variables from the reports. We assessed NLP performance by calculating accuracy, precision (positive predictive value), and recall (sensitivity).
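To make the agreement and performance measures above concrete, the following is a minimal Python sketch of how Cohen's kappa and accuracy, precision, and recall can be computed from paired labels. It is an illustration under stated assumptions, not the authors' code: the function names `cohens_kappa` and `binary_metrics` are hypothetical, labels are assumed to be plain strings, and precision and recall are computed for one chosen positive label because the abstract does not state how multi-category variables such as histology were scored.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items.

    Hypothetical helper for illustration; labels can be any hashable values.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every item; kappa is
        # undefined here, and this sketch returns 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def binary_metrics(gold, predicted, positive):
    """Accuracy, precision (PPV), and recall (sensitivity) against a gold standard.

    Assumption: precision and recall treat `positive` as the positive class
    and every other label as negative.
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall
```

As a toy check (illustrative labels, not study data): two reviewers who agree on 5 of 6 reports, with one labelling 3 reports "present" and the other 2, have observed agreement 5/6 and chance agreement 0.5, giving a kappa of 2/3. Kappa corrects raw agreement for the agreement expected by chance from each reviewer's label frequencies, which is why it is preferred over simple percent agreement for inter-rater reliability.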

Results

Inter-rater reliability between the two reviewers was excellent (kappa ranging from 0.82 to 0.90). NLP achieved the highest accuracy for presence of carcinoma in situ (0.98); accuracy for histology, invasion, grade, and presence of muscularis propria ranged from 0.82 to 0.93 (Table). The most challenging variable was depth of invasion, owing to high variability in the language used to describe these findings. Nevertheless, we achieved acceptable accuracy (0.82) and precision (0.79) for depth of invasion (Table).

Conclusions

We developed an NLP engine that accurately abstracts important pathologic details from full text bladder cancer pathology reports. The engine now allows abstraction of data from tens of thousands of bladder cancer pathology reports, enabling us to develop a population-based cohort of patients with longitudinal pathology data. This unique dataset will be used to examine the extent to which bladder cancer care affects disease recurrence and progression.

Authors
Florian Schroeck
Olga Patterson
Patrick Alba
Scott DuVall
Brenda Sirovich
Douglas Robertson
John Seigne
Philip Goodney