On 2009-06-12 10:23:00, Allan Engelhardt wrote in CYBAEA Data and Analysis:
Somebody on the R-help mailing list asked how to get Rmpi working on his Fedora Linux machine so he could do high-performance computing on a cluster of machines (or a single multicore machine) using the R statistical computing and analysis platform. Since it is unusually painful to get working, I might as well copy the instructions here.
First install the openmpi libraries using:
yum install openmpi openmpi-devel openmpi-libs
The default installation on Fedora still doesn’t quite work, so you need to execute the following command as root (you only need to do this once, after installing the packages):
ldconfig /usr/lib64/openmpi/lib/
You are not quite done: for R to work right with the libraries, you need to modify the LD_LIBRARY_PATH environment variable to include the path to the Open MPI libraries. I have the following in my ~/.bash_profile:
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}${LD_LIBRARY_PATH:+:}/usr/lib64/openmpi/lib/"
Edit your file to contain the same line, then execute that line at the command prompt, and you are ready to continue.
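The `${LD_LIBRARY_PATH:+:}` part of that export line expands to a colon only when the variable is already set and non-empty, so the new path is appended without leaving an empty entry at the front (the dynamic loader reads an empty entry as the current directory). A minimal sketch of the same idiom, using a hypothetical helper name `append_path` that is not part of the original instructions:

```shell
# append_path is a hypothetical helper, used only to illustrate the idiom.
# It prints CURRENT with DIR appended, adding ":" only if CURRENT is non-empty.
append_path() {
    current=$1
    dir=$2
    printf '%s\n' "${current}${current:+:}${dir}"
}

# Empty starting value: no leading colon is produced.
append_path "" "/usr/lib64/openmpi/lib/"
# prints /usr/lib64/openmpi/lib/

# Non-empty starting value: a single colon separates the entries.
append_path "/opt/lib" "/usr/lib64/openmpi/lib/"
# prints /opt/lib:/usr/lib64/openmpi/lib/
```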
Rmpi package for R
Now that your Open MPI libraries are set up, what you do next depends on which version of Rmpi you are installing. Most likely you are installing the latest version, in which case the following section applies. The instructions for older versions are retained in a later section for reference.
Rmpi package (version 0.5-8 and later)
Make sure you have executed the ldconfig command and set the LD_LIBRARY_PATH environment variable as described in the previous section before you continue.
Since at least version 0.5-8 of the Rmpi library you can install it from the R command line after you have fixed the Open MPI install. At the R prompt do:
install.packages("Rmpi",
                 configure.args =
                 c("--with-Rmpi-include=/usr/include/openmpi-x86_64/",
                   "--with-Rmpi-libpath=/usr/lib64/openmpi/lib/",
                   "--with-Rmpi-type=OPENMPI"))
It should work and install OK. This is obviously quite a mouthful to remember, but help is at hand through the options() mechanism in R. In your ~/.Rprofile you can add something like:
local({
    my.configure.args <-
        list("Rmpi" =
             c("--with-Rmpi-include=/usr/include/openmpi-x86_64/",
               "--with-Rmpi-libpath=/usr/lib64/openmpi/lib/",
               "--with-Rmpi-type=OPENMPI"),
             ## Not needed for Rmpi but shown to illustrate the format
             "ncdf" =
             c("-with-netcdf_incdir=/usr/include/netcdf",
               "-with-netcdf_libdir=/usr/lib64/")
             );
    options("configure.args" = my.configure.args)
})
Then you can just type install.packages("Rmpi") at the R command prompt to install the package.
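If you prefer to install from a downloaded source tarball at the shell prompt, the same three flags can be passed through the --configure-args option of R CMD INSTALL. The sketch below only assembles the argument string; the helper name `rmpi_configure_args` and the tarball file name in the final comment are assumptions for illustration, and the paths are the Fedora x86_64 ones used in this post.

```shell
# rmpi_configure_args is a hypothetical helper: it prints the three Rmpi
# configure flags for a given include directory and library directory.
rmpi_configure_args() {
    include_dir=$1
    lib_dir=$2
    printf '%s %s %s\n' \
        "--with-Rmpi-include=${include_dir}" \
        "--with-Rmpi-libpath=${lib_dir}" \
        "--with-Rmpi-type=OPENMPI"
}

args=$(rmpi_configure_args /usr/include/openmpi-x86_64/ /usr/lib64/openmpi/lib/)
echo "$args"

# With a downloaded source tarball (the file name here is an assumption):
#   R CMD INSTALL --configure-args="$args" Rmpi_0.5-8.tar.gz
```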
Rmpi package (version 0.5-7 and earlier)
The problem is the configuration file configure.ac which is, unfortunately, completely brain-damaged with hard-coded assumptions about which subdirectories should contain header and library files and no way of overriding it.
Download the latest Rmpi package from CRAN and unpack it using tar zxvf Rmpi_0.5-7.tar.gz. Go to the new Rmpi directory and replace the file configure.ac with the one below (for an x86_64 system; for 32-bit you probably need to change -64 to -32):
# Process this file with autoconf to produce a configure script.
AC_INIT(DESCRIPTION)
AC_PROG_CC
MPI_LIBS=`pkg-config --libs openmpi-1.3.1-gcc-64`
MPI_INCLUDE=`pkg-config --cflags openmpi-1.3.1-gcc-64`
MPITYPE="OPENMPI"
MPI_DEPS="-DMPI2"
AC_CHECK_LIB(util, openpty, [ MPI_LIBS="$MPI_LIBS -lutil" ])
AC_CHECK_LIB(pthread, main, [ MPI_LIBS="$MPI_LIBS -lpthread" ])
PKG_LIBS="${MPI_LIBS} -fPIC"
PKG_CPPFLAGS="${MPI_INCLUDE} ${MPI_DEPS} -D${MPITYPE} -fPIC"
AC_SUBST(PKG_LIBS)
AC_SUBST(PKG_CPPFLAGS)
AC_SUBST(DEFS)
AC_OUTPUT(src/Makevars)
The number 1.3.1 may change in future releases of Fedora: see /usr/lib64/pkgconfig/openmpi-*.pc for the current value.
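One way to find the module name to put in the two pkg-config lines is to list the matching .pc files and strip the extension. This is a minimal sketch; the helper name `list_openmpi_modules` is hypothetical, and the default directory is the one mentioned above.

```shell
# list_openmpi_modules is a hypothetical helper: it prints the pkg-config
# module name (file name without ".pc") for each openmpi-*.pc file found
# in the given directory (default: /usr/lib64/pkgconfig).
list_openmpi_modules() {
    pc_dir=${1:-/usr/lib64/pkgconfig}
    for pc in "$pc_dir"/openmpi-*.pc; do
        [ -e "$pc" ] || continue   # the glob matched nothing
        basename "$pc" .pc
    done
}

list_openmpi_modules
```

Whatever name it prints is the one to use in place of openmpi-1.3.1-gcc-64 in configure.ac.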
Still in the Rmpi directory do the following in your shell:
autoconf
cd ..
tar zcvf Rmpi_0.5-7-F11.tar.gz Rmpi
R CMD INSTALL Rmpi_0.5-7-F11.tar.gz
Now Rmpi should be working in R:
> library("Rmpi")
> mpi.spawn.Rslaves(nslaves=2)
2 slaves are spawned successfully. 0 failed.
master (rank 0, comm 1) of size 3 is running on: server
slave1 (rank 1, comm 1) of size 3 is running on: server
slave2 (rank 2, comm 1) of size 3 is running on: server
> x <- c(10,20)
> mpi.apply(x,runif)
[[1]]
[1] 0.25142616 0.93505554 0.03162852 0.71783194 0.35916139 0.85082154
[7] 0.35404191 0.14221315 0.60063773 0.71805190
[[2]]
[1] 0.84157864 0.63481773 0.38217188 0.67839089 0.27827728 0.35429266
[7] 0.04898744 0.96601584 0.25687905 0.77381186 0.69011927 0.37391028
[13] 0.19017369 0.51196594 0.51970563 0.15791524 0.21358237 0.69642478
[19] 0.12690207 0.44177656
On 2010-07-13 07:47:00, Allan Engelhardt wrote in CYBAEA Data and Analysis:
I am not sure apeescape’s ggplot2 area plot with intensity colouring is really the best way of presenting the information, but it had me intrigued enough to replicate it using base R graphics.
The key technique is to draw a gradient line, which R does not support natively, so we have to roll our own code for that. Unfortunately, lines(..., type="l") does not recycle the colour col= argument, so we end up with rather more loops than I thought would be necessary.
We also get a nice opportunity to use the under-appreciated read.fwf function.
On 2010-06-22 11:45:00, Allan Engelhardt wrote in CYBAEA Journal:
We have a mild obsession with employee productivity and how that declines as companies get bigger. We have previously found that when you treble the number of workers, you halve their individual productivity which is scary.
We now re-do the analysis four years later and, just because we can, we are using the leading companies of the London stock exchange instead of the largest American companies.
The results still hold. We called it the 3/2 rule: treble the number of workers and you halve their individual productivity. Large companies with ten times the number of employees are ¼ as productive as their smaller competitors.
Employee productivity is a big issue. If all the FTSE-100 companies achieved their average profits per employee, then the index would generate almost £1 trn of additional net profits for the economy.
On 2010-06-22 11:20:00, Allan Engelhardt wrote in CYBAEA Data and Analysis:
We have a mild obsession with employee productivity and how that declines as companies get bigger. We have previously found that when you treble the number of workers, you halve their individual productivity which is mildly scary.
We revisit the analysis for the FTSE-100 constituent companies and find that the relation still holds four years later and across a continent.
On 2010-06-17 09:05:00, Allan Engelhardt wrote in CYBAEA Data and Analysis:
Following on from my previous post about improving performance of R by linking with optimized linear algebra libraries, I thought it would be useful to try out the five benchmarks Revolution Analytics have on their Revolutionary Performance pages.
On 2010-06-15 10:21:00, Allan Engelhardt wrote in CYBAEA Data and Analysis:
Can we make our analysis using the R statistical computing and analysis platform run faster? Usually the answer is yes, and the best way is to improve your algorithm and variable selection.
But recently David Smith was suggesting that a big benefit of their (commercial) version of R was that it was linked to a better linear algebra library. So I decided to investigate.
The quick summary is that it only really makes a difference for fairly artificial benchmark tests. For “normal” work you are unlikely to see a difference most of the time.
Join the discussion
many thanks
@Allan: Thanks for the tip! That works well on a CentOS 5.4 system too.
On my machine, OpenMPI is installed in the path /opt/open-mpi/, and the directory contains multiple versions with different compiler types. They are under subdirectories named 'tcp-gnu41', 'tcp-gnu42', 'tcp-gnu43', etc. So I chose the gnu41 version and in R I typed the command as Allan suggested:
> install.packages('Rmpi',repos='cran.r-project.org', configure.args=c("--with-Rmpi-include=/opt/open-mpi/tcp-gnu41/include/", "--with-Rmpi-libpath=/opt/open-mpi/tcp-gnu41/lib/", "--with-Rmpi-type=OPENMPI"))
Bin, there we go :) Thanks also to everybody that shares their information.
Simpler version
I have successfully built the latest version of Rmpi by adding the library to LD_LIBRARY_PATH in ~/.Renviron
LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/lib64/openmpi/lib/
(or, equivalently, running interactively ‘Sys.setenv("LD_LIBRARY_PATH" = paste(Sys.getenv("LD_LIBRARY_PATH"), "/usr/lib64/openmpi/lib/", sep=":"))’)
and then from the R session running:
install.packages("Rmpi", configure.args=c("--with-Rmpi-include=/usr/include/openmpi-x86_64/", "--with-Rmpi-libpath=/usr/lib64/openmpi/lib/", "--with-Rmpi-type=OPENMPI"))
My thanks to the package maintainers.
Errors loading library Rmpi in CentOS
@CentOS
It should be an ldconfig issue.
Try starting R with
LD_LIBRARY_PATH=/path/to/mpi/libraries/ R --no-save -q
as a temporary workaround.
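To see which shared library the loader cannot resolve, the output of ldd on the installed Rmpi.so can be filtered for "not found" entries. This is only a sketch: the helper name `missing_libs` is hypothetical, and the sample input line is illustrative rather than captured from a real system.

```shell
# missing_libs is a hypothetical helper: it reads ldd output on stdin and
# prints the names of libraries the dynamic loader could not resolve.
missing_libs() {
    awk '/not found/ { print $1 }'
}

# Illustrative sample input (not real output from any machine):
printf '\tlibmpi.so.0 => not found\n\tlibc.so.6 => /lib64/libc.so.6 (0x0)\n' | missing_libs
# prints libmpi.so.0

# On a real system, using the path from the error message in this thread:
#   ldd /usr/lib64/R/library/Rmpi/libs/Rmpi.so | missing_libs
```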
Installing Rmpi on centos
I tried the same recipe on a centos 5.3 machine, and it installs fine, but when I try to use it, I get this:
> library(Rmpi)
Error in dyn.load(file, DLLpath = DLLpath, ...) :
unable to load shared library '/usr/lib64/R/library/Rmpi/libs/Rmpi.so':
libmpi.so.0: cannot open shared object file: No such file or directory
Error in library(Rmpi) : .First.lib failed for 'Rmpi'
Error in dyn.unload(file.path(libpath, "libs", paste("Rmpi", .Platform$dynlib.ext, :
dynamic/shared library '/usr/lib64/R/library/Rmpi/libs/Rmpi.so' was not loaded
Any idea why?
Able to install unedited Rmpi library
I was able to install the unedited Rmpi library.
R CMD INSTALL Rmpi_0.5-7.tar.gz
R
>mpi.spawn.Rslaves(nslaves=2)
2 slaves are spawned successfully. 0 failed.
master (rank 0, comm 1) of size 3 is running on: localhost
slave1 (rank 1, comm 1) of size 3 is running on: localhost
slave2 (rank 2, comm 1) of size 3 is running on: localhost
I think the main problem was getting openMPI installed correctly.
Thanks Again.
Still having problems
I have attempted the fix you have graciously created for Rmpi, but I am running into a problem.
When I run the command
R CMD INSTALL Rmpi-0.5-7-F11.tar.gz
I get the error.
ERROR: cannot extract package from ‘Rmpi-0.5-7-F11.tar.gz’
I used gedit to open the configure.ac file and replace its contents. I also deleted the residual configure.ac~ file left after editing. Otherwise I followed the advice exactly. I have tried several times, each time giving the same result.
Any Advice?
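Two things may be worth checking here, neither confirmed as the cause. First, the command in this comment names Rmpi-0.5-7-F11.tar.gz with a hyphen after Rmpi, whereas the post creates Rmpi_0.5-7-F11.tar.gz with an underscore. Second, the archive should unpack to a top-level Rmpi directory. A minimal sketch of both checks, using a hypothetical helper name `check_pkg_tarball`:

```shell
# check_pkg_tarball is a hypothetical helper: it verifies that the file
# exists and that its first entry sits under the expected top-level directory.
check_pkg_tarball() {
    tarball=$1
    pkg_dir=$2
    [ -f "$tarball" ] || { echo "no such file: $tarball"; return 1; }
    first=$(tar ztf "$tarball" | head -n 1)
    case $first in
        "$pkg_dir"|"$pkg_dir"/*) echo "ok: $tarball contains $pkg_dir/" ;;
        *) echo "unexpected top-level entry: $first"; return 1 ;;
    esac
}

# Usage, with the file name used in the post:
#   check_pkg_tarball Rmpi_0.5-7-F11.tar.gz Rmpi
```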