The US$3 million Heritage Health Prize competition is on, so we take a look at how to get started using the R statistical computing and analysis platform.
We do not have the full set of data yet, so this is a simple warm-up
session to predict the days in hospital in year 2 based on the year 1
data.
Obviously you need to have R installed,
and you should also have signed up for the competition (be sure to
read the terms carefully) and downloaded and extracted the release 1
data.
Let’s load the data into R and do some basic housekeeping:
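A minimal sketch of the loading step might look like the following. The helper name `read_release` and the file names in the comments are assumptions for illustration; substitute whatever names your extracted files actually have.

```r
# Hypothetical helper: read one extracted CSV file and tidy it up.
# Assumes comma-separated text with a header row.
read_release <- function(path) {
  d <- read.csv(path, stringsAsFactors = FALSE)
  names(d) <- tolower(names(d))  # lower-case names make later merges less error-prone
  d
}

# Usage (file names are placeholders, adjust to your download):
# members <- read_release("Members_Y1.csv")
# dih     <- read_release("DayInHospital_Y2.csv")
# str(members)
```

Lower-casing the column names is only one possible piece of housekeeping; it helps when the same identifier column is capitalised differently across files.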
We will need a function to score our predictions p against the actual
values a. The formula is on the evaluation page, and we implement it
in R.
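As a sketch, assuming the evaluation page defines the score as the root mean squared error between log(p + 1) and log(a + 1), the function could be written as follows. The name `hhp_score` is our own choice, not something the competition prescribes.

```r
# Root mean squared logarithmic error between predictions p and actuals a.
# Assumption: this matches the formula on the competition's evaluation page.
hhp_score <- function(p, a) {
  stopifnot(length(p) == length(a))
  sqrt(mean((log1p(p) - log1p(a))^2))  # log1p(x) computes log(1 + x)
}

hhp_score(c(0, 1, 2), c(0, 1, 2))  # a perfect prediction scores 0
```

Lower scores are better, and because of the logarithm an error of one day matters more for a short stay than for a long one.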
The simplest benchmarks
The simplest models don’t really model at all: they predict the same
average value for everyone and serve as simple benchmarks.
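Two such constant benchmarks can be sketched as below. The vector `a` is a made-up stand-in for the actual days in hospital, not competition data, and the score function assumes the log-based formula described for the evaluation.

```r
# Score function, assumed to be the root mean squared logarithmic error.
hhp_score <- function(p, a) sqrt(mean((log1p(p) - log1p(a))^2))

# Made-up stand-in for the actual year 2 days in hospital (NOT competition data).
a <- c(0, 0, 0, 0, 0, 0, 1, 2, 5, 15)

# Benchmark 1: predict the plain average for everyone.
p_mean <- rep(mean(a), length(a))

# Benchmark 2: average on the log(1 + x) scale, then transform back.
# For this score, that is the best possible constant prediction.
p_logmean <- rep(expm1(mean(log1p(a))), length(a))

hhp_score(p_mean, a)
hhp_score(p_logmean, a)
```

The second benchmark can never score worse than the first under this metric, because the mean of log(1 + a) minimises the squared error on the log scale.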
Simple single-variable linear models
OK, a model that doesn’t use past data isn’t much of a model, so let’s
improve on that:
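One way to sketch a single-variable linear model is shown here. Everything in it is an assumption for illustration: the data are synthetic, the predictor `claims_y1` (number of year 1 claims per member) is hypothetical, and fitting on the log(1 + x) scale is our choice to match the assumed log-based score.

```r
# Score function, assumed to be the root mean squared logarithmic error.
hhp_score <- function(p, a) sqrt(mean((log1p(p) - log1p(a))^2))

# Synthetic stand-in data (NOT competition data).
set.seed(1)
n <- 200
claims_y1 <- rpois(n, lambda = 5)                 # hypothetical predictor
days_y2   <- rpois(n, lambda = 0.2 * claims_y1)   # hypothetical outcome
d <- data.frame(claims_y1 = claims_y1, days_y2 = days_y2)

# Fit on the log(1 + days) scale, since the score compares logs.
fit <- lm(log1p(days_y2) ~ claims_y1, data = d)

# Transform back and clamp at zero: days in hospital cannot be negative.
p_lm <- pmax(expm1(predict(fit, newdata = d)), 0)

# Constant benchmark for comparison.
p_const <- rep(expm1(mean(log1p(d$days_y2))), n)

score_lm    <- hhp_score(p_lm, d$days_y2)
score_const <- hhp_score(p_const, d$days_y2)
c(linear = score_lm, constant = score_const)
```

On the data used for fitting, the linear model cannot do worse than the constant benchmark; whether it helps on unseen members is a separate question that needs a hold-out set.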
Let the competition begin.