ROCR: Turning State-of-the-art OCR into Automated Form Processing

Riders for Health offers medical transportation and logistics services in Africa, especially in rural areas, using fleets of motorcycles to handle rough terrain. However, their riders don’t just get to be awesome riding motorcycles around; they also have to spend considerable time keeping careful paper logbooks of when, where, and what they collect and transport. To assist with this, we built ROCR. ROCR (Riders for Health OCR) is a prototype automated form processing and handwriting Read more…

With great data comes great responsibility

The world seems to be full of great data, and as a data scientist, engineer, and researcher, I am in awe of what humankind has accomplished in the data science field. Despite all that we have accomplished, as a human being I started to ask, “Are we doing great things with all this great data?” So I set myself on a new and exciting path, looking for ways to give back while still doing what Read more…

Data Behind Beer #1: Does he have a type (of beer)?

My husband and I love craft beer. We enjoy searching for rare and yummy beers, making our own home-brews, discussing the different flavors, and most importantly, drinking craft beers! This blog post is the first of a series where I explore different aspects of that passion. The Data: The dataset is the records from my husband Colin’s Untappd account. He has graciously volunteered to let me use his data here… thank you hubby! Untappd is a craft Read more…

Happy Reading with Goodreads: Naïve Bayes Classification and Feature Selection

For this blog post, I’m focusing on a friend’s Goodreads data. Goodreads is a social media and rating platform for books. I love fantasy novels and enjoy talking about them with other people even more. I used this user’s data because she has been using the Goodreads platform longer than me (but I’m aiming to catch up!). The Data and Data Cleaning: Goodreads allows users easy access to all their data. Hooray! I loaded the Read more…