Faster R through better BLAS

Can we make our analysis using the R statistical computing and analysis platform run faster? Usually the answer is yes, and the best way is to improve your algorithm and variable selection.

But recently David Smith suggested that a big benefit of their (commercial) version of R is that it is linked to a better linear algebra library. So I decided to investigate.

The quick summary is that it only really makes a difference for fairly artificial benchmark tests. For “normal” work you are unlikely to see a difference most of the time.

The environment

I use R on a 64-bit Fedora 12 Linux system. Fortunately, it is very easy to rebuild R using different libraries on this platform. For the following, I will assume that you have a working rpmbuild environment. The test system has a quad core Intel Xeon E5420 CPU with each core running at 2.50 GHz.

Benchmarks

Benchmarking R is complex. Very complex. But for this simple test we use two tests from the R Benchmarks page: MASS-ex.R and R-benchmark-25.R. The first is a simple benchmark using the examples from the MASS package, and has the advantage that it reflects real-world problems and real-world analysis, albeit small problems and short analysis. The second is a much more artificial example and primarily tests matrix operations.

We run the MASS benchmark as:

/usr/bin/time -p R --vanilla CMD BATCH MASS-ex.R /dev/null

While the R-benchmark-25 is simply:

Rscript --vanilla R-benchmark-25.R

For the MASS benchmark we simply capture the real elapsed time, while R benchmark 2.5 provides more detailed output for its three classes of tests (matrix calculation, matrix functions, and program execution) as well as overall summaries. They are all shown in the table below.
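To capture just the elapsed time in a script, something like the following sketch should work. It relies on the POSIX `time -p` report format (lines of the form `real 19.00`, written to standard error). The helper names `extract_real` and `time_real` are my own, not part of any tool, and the filter would be confused by a benchmarked command that itself prints a line starting with `real`.

```shell
#!/bin/sh
# Extract the "real" (wall-clock) seconds from `time -p` output read on stdin.
extract_real() {
    awk '$1 == "real" { print $2 }'
}

# Run a command under /usr/bin/time -p and print only its elapsed seconds.
# The command's stdout is discarded; the timing report arrives on stderr,
# which is redirected into the pipe.
time_real() {
    { /usr/bin/time -p "$@" >/dev/null; } 2>&1 | extract_real
}

# Example (assumes R is installed and MASS-ex.R is in the current directory):
#   time_real R --vanilla CMD BATCH MASS-ex.R /dev/null
```

Running the benchmark a few times and comparing the numbers gives a feel for the noise before reading too much into small differences.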

Compiler-optimized R

For the experiments that follow, the first thing to do is to grab copies of the source RPMs for R and for ATLAS:

cd ~/rpmbuild/SRPMS
yumdownloader --source atlas R
cd ..

At the time I did this, I got R-2.11.0-1.fc12.src.rpm and atlas-3.8.3-12.fc12.src.rpm. I like to crank up the optimization level when building from source, so the first step is to edit ~/.rpmrc to include the line optflags: x86_64 -O3 -march=native -m64 -g. With that in place we can simply do:
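If you prefer to script the ~/.rpmrc change rather than edit the file by hand, a minimal sketch is below. The function name `ensure_line` is hypothetical. It appends the line only when an identical line is not already there, so it is safe to run more than once.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already present.
ensure_line() {
    file=$1
    line=$2
    touch "$file"                      # create the file if it does not exist
    # -x: match whole lines only, -F: treat the pattern as a fixed string
    grep -qxF -e "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage, with the flags quoted in the text:
#   ensure_line "$HOME/.rpmrc" 'optflags: x86_64 -O3 -march=native -m64 -g'
```

Note that it does not replace an existing, different optflags line for x86_64; if you already have one, edit it by hand instead.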

rpmbuild --rebuild SRPMS/R-2.11.0-1.fc12.src.rpm  #  Change version numbers as needed
su -c 'rpm -Uhv --force RPMS/x86_64/R*2.11.0-1*.rpm RPMS/x86_64/libRmath*2.11.0-1*.rpm'

We now have a compiler-optimized version of R and we can re-run our tests. It doesn’t make much difference, but that is also good to know.

ATLAS BLAS libraries

Now let’s try linking to the ATLAS BLAS libraries instead. I assume you have them installed (yum install atlas if not) so you can just grab a copy of R-atlas.diff to change the spec file like this:

rpm -ihv SRPMS/R-2.11.0-1.fc12.src.rpm   # Install to your rpmbuild environment
cd SPECS
wget http://static.cybaea.net/files/R-atlas.diff
patch -o R-atlas.spec R.spec R-atlas.diff
cd ..
rpmbuild -bb SPECS/R-atlas.spec
su -c 'rpm -Uhv --force RPMS/x86_64/R*2.11.0-1*.rpm RPMS/x86_64/libRmath*2.11.0-1*.rpm'

You now have a version of R that uses the ATLAS BLAS libraries, so you can re-run the tests. The results are in the table below in the “Optimized R + Standard ATLAS” row.
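Before re-running the tests it is worth checking that the new build really picks up the external library. One way, sketched here, is to look at what the R shared library is dynamically linked against. This assumes R was built as a shared library, so that libR.so exists under the directory printed by `R RHOME`; the exact library names you see will depend on your system. The helper name `blas_libs` is my own.

```shell
#!/bin/sh
# Keep only the linear algebra libraries from `ldd` output read on stdin,
# printing the library name (first column) of each matching line.
blas_libs() {
    grep -i -E 'blas|atlas|lapack' | awk '{ print $1 }'
}

# Example (assumes a shared-library build of R):
#   ldd "$(R RHOME)/lib/libR.so" | blas_libs
```

If nothing ATLAS-related shows up, the rebuilt packages were probably not the ones that got installed.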

As expected, the matrix operations from R-benchmark-25.R run a lot faster: the two matrix sections complete in roughly 27-39% of the base time, and the total running time drops to about 21%. Much of the gain comes from multi-threading, which puts all four CPU cores to work.

However, for the analysis-heavy code in MASS-ex.R there is little difference. If anything, we see a tiny increase in running time.

Multi-threaded BLAS libraries make no significant difference to real-world analysis problems using R.

Other BLAS libraries

For good measure we also try an optimized version of ATLAS, but it does not make much difference on the x86_64 architecture:

rpmbuild -D "enable_native_atlas 1" --rebuild SRPMS/atlas-3.8.3-12.fc12.src.rpm
su -c 'rpm -Uhv --force RPMS/x86_64/atlas*3.8.3-12*.rpm'

And (only) for completeness, we also try the standard Netlib BLAS and LAPACK libraries (yum install blas lapack) by the same method as for the ATLAS library above, but with a slightly different change to the spec file: R-blas.diff. It performs a little better than vanilla R.

For more information about rebuilding R with different BLAS libraries, see the linear algebra section in the R Installation and Administration manual.

Benchmark results

Benchmark results for various optimizations of R and the BLAS library
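The "Real" columns are from MASS-ex.R; all other columns are from R benchmark 2.5. Times are in seconds; each index is relative to the base install.

| R version | Real (secs) | Real (index) | Total time (secs) | Total time (index) | Overall mean (secs) | Overall mean (index) | Ⅰ. Matrix calc. (secs) | Ⅰ. Matrix calc. (index) | Ⅱ. Matrix functions (secs) | Ⅱ. Matrix functions (index) | Ⅲ. Program. (secs) | Ⅲ. Program. (index) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Base install | 19.00 | 1.00 | 78.49 | 1.00 | 2.11 | 1.00 | 2.32 | 1.00 | 3.86 | 1.00 | 1.05 | 1.00 |
| Optimized R | 18.98 | 1.00 | 76.11 | 0.97 | 2.02 | 0.96 | 2.36 | 1.02 | 3.46 | 0.90 | 1.02 | 0.97 |
| Optimized R + Netlib BLAS | 18.56 | 0.98 | 73.22 | 0.93 | 1.81 | 0.86 | 2.36 | 1.02 | 2.41 | 0.62 | 1.04 | 0.99 |
| Optimized R + Standard ATLAS | 19.43 | 1.02 | 16.74 | 0.21 | 0.97 | 0.46 | 0.90 | 0.39 | 1.04 | 0.27 | 0.99 | 0.95 |
| Optimized R + Optimized ATLAS | 19.31 | 1.02 | 16.36 | 0.21 | 0.95 | 0.45 | 0.84 | 0.36 | 1.02 | 0.26 | 1.00 | 0.95 |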
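
The index columns in the table above are consistent with each time divided by the corresponding base install time and rounded to two decimals. A small sketch to reproduce them, using a helper I have called `index` (a hypothetical name) and a few values taken from the table:

```shell
#!/bin/sh
# Print a timing relative to a baseline, rounded to two decimals.
index() {
    awk -v t="$1" -v base="$2" 'BEGIN { printf "%.2f\n", t / base }'
}

index 76.11 78.49   # Optimized R, total time                    -> 0.97
index 16.74 78.49   # Optimized R + Standard ATLAS, total time   -> 0.21
index 19.43 19.00   # Optimized R + Standard ATLAS, MASS-ex.R    -> 1.02
```

An index below 1.00 means faster than the base install; above 1.00 means slower.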