Chargemaster: Californian hospital service prices for the uninsured

The App

This is my Insight Data Science (end of 2017 cohort) project: a web app that predicts hospital service prices for the uninsured, powered by data made available by the state of California (OSHPD).

Select a service, enter an address and a radius in miles, and the app will predict the current unnegotiated price an uninsured person would pay at each hospital within that radius. It then displays the 10 hospitals with the lowest prices, along with their contact info.


Chargemasters are lists, kept by hospitals, of starting prices for negotiating what procedures will cost. Since the prices on these lists are used for negotiating, they tend to start higher than what is eventually billed to insurance providers. Additionally, insurance usually covers a certain percentage of what the healthcare provider ultimately charges, leaving the rest for the patient to pay. Without someone to negotiate on their behalf, the uninsured often end up paying even more than would be charged to the insurance company. Having access to current negotiating starting prices might be helpful for comparison shopping and even for negotiating with healthcare providers.

How It Started

Insight Data Science interviews candidates with PhDs and accepts those who already have most of the skills needed to work as a data scientist. These Insight Fellows either come up with their own project or work on one for a company.

I was accepted into the end-of-2017 cohort and discovered hospital chargemaster data provided by what was then the Office of Statewide Health Planning and Development (OSHPD). Despite having no experience building websites, I came up with the idea, scraped and cleaned the data, and implemented the app in a couple of weeks.

How It Works

Currently, the app has an HTML/CSS front-end linked to a Python and FastAPI backend via Jinja2 templating. The Python code preloads the weights of pre-trained models (linear regression or ARIMA) for each hospital and service combination from a SQLite3 file into a Pandas dataframe. The address is converted to latitude/longitude coordinates using geopy. These coordinates are used to filter the data down to hospitals within the desired radius. The expected current chargemaster prices are calculated from the current timestamp and the model weights, and the results are shown for the hospitals with the lowest prices.
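The filter-then-predict step described above can be sketched roughly as follows. This is a minimal illustration and not the app's actual code: the function names, the dictionary keys, and the assumption that each model is a straight line in time (a slope and an intercept) are all hypothetical, and the haversine formula stands in for whatever distance calculation the app really uses.

```python
import math
import time

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def predict_price(slope, intercept, timestamp):
    """Evaluate a (hypothetical) linear price model at a given timestamp."""
    return slope * timestamp + intercept


def cheapest_nearby(hospitals, user_lat, user_lon, radius_miles,
                    timestamp=None, limit=10):
    """Return up to `limit` (name, predicted price) pairs, cheapest first.

    `hospitals` is assumed to be a list of dicts with the keys
    "name", "lat", "lon", "slope", and "intercept".
    """
    if timestamp is None:
        timestamp = time.time()
    # Keep only hospitals inside the requested radius.
    nearby = [
        h for h in hospitals
        if haversine_miles(user_lat, user_lon, h["lat"], h["lon"]) <= radius_miles
    ]
    # Predict the current price for each remaining hospital.
    priced = [
        (h["name"], predict_price(h["slope"], h["intercept"], timestamp))
        for h in nearby
    ]
    return sorted(priced, key=lambda pair: pair[1])[:limit]
```

In the real app the user's coordinates would come from geopy and the weights from the preloaded dataframe; here they are plain arguments so the sketch runs on its own.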

Originally, the models were trained on each user submission and accessed the original data in a PostgreSQL database. To save computation and speed up the app, I pre-trained the models. I chose SQLite because there are only about 9k rows and 11 columns in the pre-trained model data and it's a lot lighter than Postgres.
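As a rough idea of what loading pre-trained weights once at startup looks like, here is a small sketch using Python's built-in sqlite3 module. The table name, column names, and sample rows are made up for illustration, and a plain dict stands in for the Pandas dataframe so the example has no dependencies.

```python
import sqlite3


def load_weights(conn):
    """Read every row of a (hypothetical) model_weights table into a dict
    keyed by (hospital_id, service)."""
    rows = conn.execute(
        "SELECT hospital_id, service, slope, intercept FROM model_weights"
    )
    return {
        (hospital_id, service): (slope, intercept)
        for hospital_id, service, slope, intercept in rows
    }


# An in-memory database with invented rows stands in for the real SQLite file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE model_weights "
    "(hospital_id INTEGER, service TEXT, slope REAL, intercept REAL)"
)
conn.executemany(
    "INSERT INTO model_weights VALUES (?, ?, ?, ?)",
    [(1, "x-ray", 0.5, 100.0), (2, "x-ray", 0.0, 250.0)],
)

# Loaded once at startup, so no training or database query happens per request.
WEIGHTS = load_weights(conn)
```

With only about 9k rows, holding everything in memory like this is cheap, which is part of why a single SQLite file is enough.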

Please Don't Trust It :(

The latest data that powers this app is from 2016. A lot has happened in the last 6 years and I suspect many (most?) of these prices are much higher. Additionally, the data was very messy, so it's totally possible I made a mistake, and it was so sparse that I used a couple of simple high-bias models. The main purpose of this app is to demonstrate my ability to create a live app that uses machine learning, even if the ML itself is a bit lame ;)