Riders for Health offers medical transportation and logistics services in Africa, especially in rural areas, using fleets of motorcycles to handle rough terrain. However, their riders don’t just get to be awesome riding motorcycles around; they also have to spend considerable time keeping careful paper logbooks of when, where and what they collect and transport. To assist with this, we built ROCR (Riders for Health OCR), a prototype automated form processing and handwriting prediction tool that helps community health workers collect data digitally and more efficiently. The ultimate goal of the project is to let Riders for Health do what they do best (manage the logistics of getting into remote healthcare outposts and picking up medical samples) by minimizing the time spent in the field writing in paper logbooks.
ROCR extracts and predicts the key handwritten information from regularly used forms. The figure below shows an example of this process: a photo of the World Health Organization case report for confirmed COVID-19 cases is passed through the ROCR tool, which extracts and predicts two key fields, the reporting country and the unique case identifier.
This task presents several challenges: How do you find the key field you are searching for in each new image? How do you predict handwriting (which is much more difficult than computer-generated text)? Can you use domain knowledge of the form’s use case to improve the predictions? To address these challenges, ROCR processes forms using the following steps:
- Image Alignment – Warp the input form image to align with the blank template form
- OCR/Handwriting Prediction – Predict the form’s text
- Key Field Extraction – Find the OCR prediction text that falls within each region of interest
- Post Processing – Apply domain knowledge to improve OCR prediction outputs
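To make the image alignment step more concrete, here is a minimal sketch (not ROCR's actual code) of one way to judge whether an estimated alignment can be trusted. It assumes feature matching has already produced pairs of corresponding points and a 3x3 homography `H` that maps photo coordinates to template coordinates. The function names, the pixel threshold and the minimum inlier ratio are all hypothetical and chosen only for illustration.

```python
from math import hypot


def project(H, point):
    """Apply a 3x3 homography H (nested lists) to an (x, y) point."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (
        (H[0][0] * x + H[0][1] * y + H[0][2]) / w,
        (H[1][0] * x + H[1][1] * y + H[1][2]) / w,
    )


def inlier_ratio(H, matches, max_error_px):
    """Fraction of matches whose projected photo point lands within
    max_error_px of the corresponding template point."""
    if not matches:
        return 0.0
    inliers = 0
    for photo_pt, template_pt in matches:
        px, py = project(H, photo_pt)
        if hypot(px - template_pt[0], py - template_pt[1]) <= max_error_px:
            inliers += 1
    return inliers / len(matches)


def needs_human_review(H, matches, max_error_px=10.0, min_inlier_ratio=0.5):
    """Flag the alignment when too few matches agree with the transform.
    Both defaults are illustrative assumptions, not tuned values."""
    return inlier_ratio(H, matches, max_error_px) < min_inlier_ratio


# Toy data: H is a pure translation by (+5, -3).
H = [[1.0, 0.0, 5.0], [0.0, 1.0, -3.0], [0.0, 0.0, 1.0]]
matches = [
    ((0.0, 0.0), (5.0, -3.0)),     # agrees with H exactly
    ((10.0, 10.0), (15.0, 7.0)),   # agrees with H exactly
    ((20.0, 5.0), (26.0, 2.5)),    # about 1.1 px off
    ((30.0, 30.0), (90.0, 90.0)),  # spurious match
]

print(inlier_ratio(H, matches, max_error_px=10.0))
print(needs_human_review(H, matches))
```

A larger `max_error_px` tolerates more distortion from a hand-held photo of a paper form, while a low inlier ratio is a reasonable signal that a person should check the alignment.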
ROCR uses two off-the-shelf OCR engines, Google Cloud Vision and Azure Form Recognizer, but they need “glue” code (proper image alignment, key field extraction, OCR cloud response handling and OCR engine post processing) to produce reasonable results within the ROCR tool.
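As an illustration of the key field extraction part of that glue code, the sketch below picks out the OCR words that fall inside a region of interest defined on the blank template. The `Box` and `Word` types are simplified stand-ins, not the actual response schema of Google Cloud Vision or Azure Form Recognizer, and the region coordinates are made up. It also assumes single-line fields, so words are simply ordered left to right.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in template pixel coordinates (hypothetical type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def center(self):
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)

    def contains(self, x, y):
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


@dataclass(frozen=True)
class Word:
    """One OCR word with its bounding box (simplified stand-in for an OCR response)."""
    text: str
    box: Box


def extract_field(words, region):
    """Join the words whose box center lies inside the region, left to right."""
    inside = [w for w in words if region.contains(*w.box.center)]
    inside.sort(key=lambda w: w.box.x_min)
    return " ".join(w.text for w in inside)


# Made-up regions of interest, as they might be marked on the blank template.
regions = {
    "reporting_country": Box(100, 90, 300, 130),
    "case_id": Box(100, 140, 300, 180),
}

# Made-up OCR output, already in template coordinates after alignment.
words = [
    Word("Kingdom", Box(180, 102, 260, 120)),
    Word("United", Box(110, 100, 170, 118)),
    Word("Reporting", Box(10, 100, 90, 118)),  # printed label, outside the regions
    Word("ABC-123", Box(110, 150, 190, 168)),
]

fields = {name: extract_field(words, box) for name, box in regions.items()}
print(fields)
```

Testing the box center rather than requiring full containment is a deliberate simplification: handwriting often spills slightly over the edge of a printed field.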
Like all interesting projects, this work was challenging and we learned a lot. Some of the key technical takeaways about ROCR and its implementation are:
- ROCR uses off-the-shelf tools like Google Cloud Vision and Azure Form Recognizer and consequently benefits from a wealth of prior research and expertise in handwriting recognition. While we experimented with custom OCR models, we ultimately found these cloud solutions worked best.
- Image alignment is hard. We perform image alignment here with feature matching. We had to do a fair amount of tuning to get it to work and found that calibrating the feature match outlier detection (to a threshold much greater than the traditional parameters) gave us the most improvement. ROCR also features bad image alignment detection that flags to the user when human intervention is required.
- We improved ROCR’s performance by applying post processing techniques to OCR model outputs. These techniques let ROCR use domain knowledge about a form, or about known issues with a particular OCR engine, to improve accuracy. One technique applied in ROCR is to compare the OCR prediction to a list of known possible values in order to determine the most likely output. In the WHO case report for confirmed COVID-19 cases example above, we know that the reporting country has to be one of the 194 WHO member countries. So, if the OCR prediction is “United Bingdom”, we can compare that result to a list of all WHO member countries and update the output to be “United Kingdom”.
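The known-values technique from the last takeaway can be sketched with Python's standard `difflib` module. This is an assumption about one reasonable way to do the comparison, not necessarily how ROCR does it; the `snap_to_known_value` name, the `0.8` cutoff and the shortened country list are illustrative only.

```python
from difflib import get_close_matches

# Illustrative subset only; a real list would hold all WHO member countries.
KNOWN_COUNTRIES = [
    "Kenya",
    "Lesotho",
    "United Kingdom",
    "United States of America",
    "Zambia",
    "Zimbabwe",
]


def snap_to_known_value(prediction, known_values, cutoff=0.8):
    """Return the known value most similar to the OCR prediction,
    or None when nothing is similar enough to trust."""
    # Compare case-insensitively, but return the canonical spelling.
    by_key = {value.casefold(): value for value in known_values}
    hits = get_close_matches(
        prediction.strip().casefold(), list(by_key), n=1, cutoff=cutoff
    )
    return by_key[hits[0]] if hits else None


print(snap_to_known_value("United Bingdom", KNOWN_COUNTRIES))  # United Kingdom
print(snap_to_known_value("xyz", KNOWN_COUNTRIES))             # None
```

Returning `None` instead of forcing a match keeps a badly misread field from being silently replaced with the wrong country; such cases can be left for a person to review.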
ROCR was built for Riders for Health, an international nonprofit that offers medical transportation and logistics services in Africa. It’s difficult to overstate the challenge of bringing healthcare to rural villages in developing countries. Riders for Health meets this challenge mostly with a fleet of motorcycles capable of handling rough terrain effectively and efficiently. Working with Riders for Health is basically like working with superheroes riding motorcycles, with kindness and humility to boot.
This work is driven by DataKind, a nonprofit committed to the application of data science for social good. DataKind brings together pro bono data scientists and social impact organizations that benefit from their time and expertise. Alex Fried, Karry Lu, Amy Roberts, Alexander Sack and Anna Dixon formed the team that built and tested the ROCR tool. ROCR is part of a portfolio of projects designed to impact community health workers.