Electronic health records (EHRs) are digitizing valuable medical data on a massive scale. However, up to 70% of the information that matters to medical registries, outcomes researchers, and clinicians is held in free-text practitioner notes. Because these fields are unstructured, there is little to no standardization of their content, format, or quality. Consequently, transforming free text into useful, quantified data remains a difficult problem.

Natural language processing (NLP) offers a computational means to synthesize this text. At a high level, NLP here involves feeding an algorithm large amounts of EHR notes, from which it “learns” a set of rules, in the form of probabilistic and statistical functions, for identifying what is meaningful. Lin et al. (2013) offers a simple example: an NLP-based algorithm should determine that “hypertension” and “elevated blood pressure” refer to the same concept. It should also be able to evaluate the context of these phrases and determine whether they reflect a new diagnosis, patient-reported symptoms, family history, and so on.
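
As a toy illustration of those two tasks, here is a minimal, stdlib-only Python sketch that maps surface phrases onto a shared concept and scans the nearby context to classify each mention. The concept dictionary and cue phrases are hypothetical stand-ins; production clinical NLP systems learn far richer probabilistic rules from training data.

```python
# A minimal sketch of the two tasks described above: mapping surface
# phrases to a shared concept, and scanning nearby context to classify
# how that concept is mentioned. The dictionary and cue phrases below
# are illustrative, not real system output.
import re

# Hypothetical mapping of surface phrases to a normalized concept.
CONCEPTS = {
    "hypertension": "HYPERTENSION",
    "elevated blood pressure": "HYPERTENSION",
    "high blood pressure": "HYPERTENSION",
}

# Illustrative context cues that may precede a mention.
CONTEXT_CUES = {
    "family history of": "FAMILY_HISTORY",
    "patient reports": "PATIENT_REPORTED",
    "newly diagnosed with": "NEW_DIAGNOSIS",
}

def extract_mentions(note: str):
    """Find known concepts in a note and classify each mention's context."""
    mentions = []
    lowered = note.lower()
    for phrase, concept in CONCEPTS.items():
        for match in re.finditer(re.escape(phrase), lowered):
            # Inspect a short window of text before the mention for a cue.
            window = lowered[max(0, match.start() - 40):match.start()]
            context = "UNSPECIFIED"
            for cue, label in CONTEXT_CUES.items():
                if cue in window:
                    context = label
            mentions.append((concept, phrase, context))
    return mentions

print(extract_mentions(
    "Patient newly diagnosed with elevated blood pressure. "
    "Family history of hypertension."
))
# [('HYPERTENSION', 'hypertension', 'FAMILY_HISTORY'),
#  ('HYPERTENSION', 'elevated blood pressure', 'NEW_DIAGNOSIS')]
```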

Ultimately, NLP promises a method to quickly and cost-effectively sift through huge amounts of EHR notes and quantify phenomena of interest. And this is already happening in published research.

However, NLP successes remain largely confined to the lab rather than the clinic, because developing highly accurate NLP programs is still a very difficult problem. Processing free text in a clinically meaningful way faces a number of challenges, such as:

  • Syntax and grammar subtleties, such as punctuation (e.g., the period after “Dr.” does not indicate a sentence’s end; see the sketch after this list);
  • Identifying where meaningful phrases start and end (e.g., is “extremely irritable” different from just “irritable”?);
  • Synonyms and inflection (e.g., “swelling”, “swollen”, and “inflammation” can reflect the same phenomenon);
  • Spelling errors and abbreviations;
  • Sequencing, ordering, and temporality (“primary patient complaint…” vs. “patient initially experienced…” vs. “patient presented in March…”).
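
To make the first of those challenges concrete, below is a naive sentence splitter in Python. The abbreviation list is a hypothetical stand-in, and the approach is purely rule-based; production systems typically use trained sentence-boundary models instead.

```python
# A naive sentence splitter that protects a few known abbreviations
# before splitting on sentence-ending punctuation. The abbreviation
# list is illustrative; real clinical text needs far broader coverage.
import re

ABBREVIATIONS = ["Dr.", "Mr.", "Mrs.", "e.g.", "i.e.", "b.i.d."]

def split_sentences(text: str):
    protected = text
    # Temporarily mask the periods in known abbreviations so they
    # are not mistaken for sentence boundaries.
    for abbr in ABBREVIATIONS:
        protected = protected.replace(abbr, abbr.replace(".", "<DOT>"))
    # Split on ., !, or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", protected)
    return [s.replace("<DOT>", ".") for s in sentences if s]

print(split_sentences("Patient was seen by Dr. Smith. Swelling noted."))
# ['Patient was seen by Dr. Smith.', 'Swelling noted.']
```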

These factors are on top of the intra- and inter-individual variability classically associated with outcomes research, and on top of the task of determining what metrics are relevant to a research question in the first place.

Altogether, NLP software is not yet at a point where it can be bought off the shelf, fed a handful of clinician notes, and expected to produce highly accurate, statistically significant inferences. Consequently, researchers considering NLP tools should be cognizant that successful studies generally benefit from the following:

  • Large amounts of data to train and validate an NLP algorithm;
  • Computing power closer to the scale of supercomputers than personal computers;
  • A narrow research question (versus exploratory analyses, which can become unwieldy given that “teaching” an algorithm what is relevant is already difficult);
  • Additional structured data to be used in conjunction with EHR notes (e.g., ICD-9 codes, prescription claims data); a sketch of this pairing appears below.
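
As a rough sketch of that last point, the following Python snippet cross-checks patients flagged by a hypothetical note-mining step against structured ICD-9 billing codes. All patient data here is fabricated for illustration; 401.9 is the ICD-9 code for unspecified essential hypertension.

```python
# Cross-checking NLP-derived findings against structured ICD-9 codes.
# All data is fabricated for illustration.

# Output of a hypothetical note-mining step: patient -> concepts found.
nlp_findings = {
    "patient_001": {"HYPERTENSION"},
    "patient_002": {"HYPERTENSION"},
    "patient_003": set(),
}

# Structured billing codes for the same patients.
icd9_codes = {
    "patient_001": {"401.9", "250.00"},
    "patient_002": {"250.00"},
    "patient_003": {"401.9"},
}

def classify(patient: str) -> str:
    """Label agreement between the notes and the structured codes."""
    in_notes = "HYPERTENSION" in nlp_findings[patient]
    in_codes = "401.9" in icd9_codes[patient]
    if in_notes and in_codes:
        return "confirmed by both sources"
    if in_notes:
        return "notes only: possible uncoded diagnosis"
    if in_codes:
        return "codes only: possible NLP miss"
    return "absent from both sources"

for patient in sorted(nlp_findings):
    print(patient, "->", classify(patient))
# patient_001 -> confirmed by both sources
# patient_002 -> notes only: possible uncoded diagnosis
# patient_003 -> codes only: possible NLP miss
```

Agreement between the two sources raises confidence in a finding, while disagreement flags records for manual review.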

See our recent blog post for information on the larger role RexDB can play in transforming EHR data into reliable research findings.