Case Study: Document Recognition

A nationwide workers’ compensation outcomes provider founded in the early 90s, it is an industry leader in solving catastrophic and complex healthcare challenges. With growth and expansion, the company expressed an interest in the automation of workflows. By leaning into the capabilities of Artificial Intelligence (AI), the workers’ compensation provider partnered with NLP Logix to automate the matching identifiers of patients, such as name and/or date of birth, to the other documents submitted by healthcare providers for specific cases.

On average, the task of matching documents was processing approximately 10,000+ records per month. The process involved the submission of various medical patient forms, each of which needed to be directed to one of three possible destinations based on the patient. The critical task was determining the appropriate routing for each form. Within this process, an employee received either a completed CMS-1500 or a UB04 document, health care financial forms, which can be either typed or handwritten.

Upon document receipt, a systematic approach processing methodology was followed:

Identifier of a patient
Cross-reference patient with a predefined list {of where the patient is routed to}
Ascertain which one of three categories the patient falls into

In cases where the patient’s name was not found on the list, an employee was to make sense of the form and take action correctly. It was the primary responsibility of the employee to ensure the accurate routing of documents based on the patient’s information.

The primary automation goal was to automate the process while keeping humans in the loop to oversee the workflow; thus, allowing employees to expand their knowledge to other areas of the business.

Gnarly Problem

One of this project’s most significant challenges was false positives. These were sizable concerns as patient-sensitive medical documents cannot be misdirected. The AI process developed must process and direct documents to the proper destination only if there is enough confidence that the patient match is correct.

Strategic Approach

NLP Logix was provided years of historical CMS-1500 and UB04 forms. These included well-scanned documents with little to no flaws, as well as documents with blurred backgrounds, documents with handwriting, and other complicating conditions.

AI-driven Optical Character Recognition (OCR) document processing was conducted in the beginning employing Azure tools. OCR technology was used to convert different types of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data. By analyzing the shapes, patterns, and features of the characters in these documents OCR was able to translate them into machine-readable text. Utilizing the given years of records, necessary fields were identified within documents for AI processing. Fuzzy matching was completed on the names found in submitted forms to the names that exist in internal records. Fuzzy matching is a technique used in computer science and data analysis to compare two or more strings or sets of data and determine their similarity rather than exact equality. Fuzzy matching is particularly useful when dealing with data that may contain errors, typos, abbreviations, or variations in spelling and when you want to find approximate matches between records. The output from this technical process of acquiring data are JSON ‘ugly’ raw data objects files, at which point NLP Logix puts in the keyboard time into transforming this unusable data into a desirable outcome.

To prevent false matches, a confidence score was assigned each time a match on form fields was made. The confidence score for each document processed determines the outcome. Here is what this looks like broken down: if there are exact matches between patient-filled forms and the internal destination records, the exact name and the date of birth match perfectly, the confidence score will be high. Thus, the form will be forwarded to the right destination. Separately, let’s say a form was handwritten and the process, looking for a match for a fictional patient John Smith. However, John Smath (Smath not Smith) was only found on internal destination records, not John Smith. The match has a high confidence score, but will not be one hundred percent accurate because often times handwriting is difficult to decipher, and often misread. For validation, as the process continues forward, checks continue for additional matching fields between the form and internal records. Is the address for the patient a match, is the zip code a match, or is the state a match? Each sequential match has an effect on the confidence score for correctly matching the patient to the destination reference. The confidence score directs the process. If there is a low confidence score the process continues to read and match additional patient info fields to boost the confidence score. If there is a high confidence score then it will direct the document to the correct location. If the confidence score is too low after completing the process, documents are directed to the “exceptions” for human review.

Operational Success

An Azure-based form recognizer, layered with a confidence score custom logic, allowed for the identification of form type (CMS-1500 or UB04) and the subsequent extraction of data from the form into a database structure. Utilizing the NLP Logix document image processing solution, the workers’ compensation outcomes provider was able to maintain the privacy requirement and increase the document processing capability to 20,000+ per month. The initial routing success rate was above 90%. Due to the success, the client expanded the use case of the solution by routing different form types with a continued routing success rate above 85%.

Learn more about: Our Services – NLP Logix – AI Realized