3 Ways to Improve Conversion of PDF and Image Files

  • Written by Sam Edens

It can be argued that a large percentage of hospitals and medical facilities use technology that is decades behind present-day patient and billing management systems. Agency owners are reminded of that debate every month when their new business file is once again delivered as a PDF or, even worse, as a scanned image. The painstaking process of manually entering every account and patient record resumes. Inefficiencies aside, the risk of human error is ever-present and high. It does not take much for someone to mistype a patient account number, procedure code, patient balance or some other critical data element. When you ask your client for an electronically formatted file, you are left with a response like “we are working on it” or simply “no.” What else can be done?

A simple Internet search will return a seemingly endless list of options, but you are unlikely to find one that works or, more importantly, works for your use case. Essentially, the options fall into three categories, which are detailed below. If you are not lucky enough to stumble upon the perfect solution, consult with your end users and your IT staff or vendor to understand the process of getting the data from the unstructured format into the collection software. This type of conversation will help direct you to the most logical path.


The ideal option is working directly with the PDF or image file to electronically extract the data. This does not mean the task can be completed without additional software. Rather, it means working with the raw file (the source of the data). One positive attribute of the PDF or image file is that it has structure. This is obvious because, in most cases, you can see it. The page is clearly organized into columns and rows, usually with headings and summaries as well. Copying and pasting the contents is a common mistake, because any structure that was there is immediately corrupted. Without that structure it is tough to make sense of the data, and you may have unknowingly lost some of it in the transfer.


The first option assumes you discovered some out-of-the-box software that accurately reads the image file you are working with. The second option is very much like the first, but involves a completely custom software application to understand and parse the image file. This is going to require a skilled IT staff or services from an outside vendor. After careful analysis of the image file to identify regular patterns, splits, and trends, languages such as Java or Python can be used to identify and accurately extract the data fields into a more structured and workable format. Many custom-developed applications output HTML or XML, which is often easier than trying to move data directly from the image file to Excel or CSV. This output becomes the new source for further processing or for electronically loading into your collection software.
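As a rough illustration of that custom route, the Python sketch below assumes the text has already been pulled out of the PDF or image (for example by an OCR or text-extraction step, which is not shown) and that each account appears on one line. The row layout, the field names, and the sample lines are all hypothetical; a real client file would need its own pattern, worked out from the analysis described above.

```python
import re
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical row layout (an assumption, not a real client format):
# account number, 5-digit procedure code, patient name, balance.
ROW_PATTERN = re.compile(
    r"^\s*(?P<account_number>\d{6,10})\s+"
    r"(?P<procedure>\d{5})\s+"
    r"(?P<name>.+?)\s+"
    r"\$?(?P<balance>[\d,]+\.\d{2})\s*$"
)


def parse_rows(lines):
    """Split extracted text lines into parsed records and rejected lines."""
    records, rejected = [], []
    for line in lines:
        match = ROW_PATTERN.match(line)
        if match is None:
            # Keep non-blank lines that did not match so a person can
            # review them instead of losing data silently.
            if line.strip():
                rejected.append(line)
            continue
        record = match.groupdict()
        # Normalize the balance: drop thousands separators.
        record["balance"] = str(Decimal(record["balance"].replace(",", "")))
        records.append(record)
    return records, rejected


def to_xml(records):
    """Write the parsed records to an XML string, one <record> per account."""
    root = ET.Element("accounts")
    for record in records:
        node = ET.SubElement(root, "record")
        for field, value in record.items():
            ET.SubElement(node, field).text = value
    return ET.tostring(root, encoding="unicode")


# Made-up sample lines standing in for text extracted from a client file.
SAMPLE_LINES = [
    "Account    Proc   Patient Name        Balance",
    "10004521   99213  DOE, JANE           $1,250.00",
    "10004522   71046  SMITH, JOHN            $85.50",
    "Page 1 of 3",
]

records, rejected = parse_rows(SAMPLE_LINES)
xml_output = to_xml(records)
```

Lines that do not fit the pattern, such as the column headings and the page footer in the sample, are collected rather than discarded, so a person can confirm that nothing important was skipped before the XML is loaded anywhere.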


A more extreme option involves utilizing features of next-generation tools that may otherwise be of no use to you. (If these tools are of use to you, it is likely in another division of the business, so if you are part of a large organization, check with other departments because they may have something you can utilize.) Big Data technology, like Hadoop, is designed for extremely large datasets and robust environments, which are uncommon in most agencies, but it may have features for reducing image files to understand and extract the data fields. Similarly, some Business Intelligence (BI) tools have features for reading PDF or image file sources, identifying the structure, and accurately extracting the data. For example, one of the latest releases of Tableau contains such features. If you opt for this route, there is a good chance you will find some other cool features and potential use cases for the Big Data or BI technology.

Finding a way to electronically manage PDF and image files is critical. It greatly improves operational efficiency and reduces the risk of human error. An element I haven’t even touched on involves culture and the work environment. In most cases, those keying new account and patient information into the system are not doing the work they were hired to do, and they could be doing work that is more rewarding, both for them personally and for the company. I hope you make it a priority to create a more exciting and productive environment through the elimination of manual processes.

Sam Edens has been with Emprise Technologies since 2006 and is currently serving as Vice President. Prior to his time with Emprise, Sam designed and developed performance and flow management software for UPS.