On 2009-06-12 10:23:00, Allan Engelhardt wrote in CYBAEA Data and Analysis:
Somebody on the R-help mailing list asked how to get Rmpi working on his Fedora Linux machine so he could do high-performance computing on a cluster of machines (or a single multicore machine) using the R statistical computing and analysis platform. Since it is unusually painful to get working, I might as well copy the instructions here.
The problem is the configuration file configure.ac which is, unfortunately, completely brain-damaged, with hard-coded assumptions about which subdirectories should contain header and library files and no way of overriding them.
First install the openmpi libraries using:
yum install openmpi openmpi-devel openmpi-libs
Then download the latest Rmpi package from CRAN and unpack it using tar zxvf Rmpi_0.5-7.tar.gz. Go to the new Rmpi directory and replace the file configure.ac with the one below (for an x86_64 system; for 32-bit you probably need to change -64 to -32):
# Process this file with autoconf to produce a configure script.
AC_INIT(DESCRIPTION)
AC_PROG_CC
MPI_LIBS=`pkg-config --libs openmpi-1.3.1-gcc-64`
MPI_INCLUDE=`pkg-config --cflags openmpi-1.3.1-gcc-64`
MPITYPE="OPENMPI"
MPI_DEPS="-DMPI2"
AC_CHECK_LIB(util, openpty, [ MPI_LIBS="$MPI_LIBS -lutil" ])
AC_CHECK_LIB(pthread, main, [ MPI_LIBS="$MPI_LIBS -lpthread" ])
PKG_LIBS="${MPI_LIBS} -fPIC"
PKG_CPPFLAGS="${MPI_INCLUDE} ${MPI_DEPS} -D${MPITYPE} -fPIC"
AC_SUBST(PKG_LIBS)
AC_SUBST(PKG_CPPFLAGS)
AC_SUBST(DEFS)
AC_OUTPUT(src/Makevars)
The number 1.3.1 may change in future releases of Fedora: see /usr/lib64/pkgconfig/openmpi-*.pc for the current value.
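If you would rather not read the directory by hand, something like the sketch below may help. It lists the module names you could pass to pkg-config in the configure.ac above. Note that openmpi_pc_modules is a made-up helper name, and the default directory is simply the one mentioned above; on a 32-bit system it is probably different.

```shell
# Print the pkg-config module name for each Open MPI .pc file in a directory.
# openmpi_pc_modules is a hypothetical helper, not part of Fedora or Open MPI.
# The default directory is an assumption taken from the text above.
openmpi_pc_modules() {
    dir="${1:-/usr/lib64/pkgconfig}"
    for pc in "$dir"/openmpi-*.pc; do
        [ -e "$pc" ] || continue   # the glob matched nothing
        basename "$pc" .pc         # strip the directory and the .pc suffix
    done
}

# List whatever is installed in the default location (prints nothing if none).
openmpi_pc_modules
```

Whatever name it prints is what should replace openmpi-1.3.1-gcc-64 in the two pkg-config lines.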
Still in the Rmpi directory do the following in your shell:
autoconf
cd ..
tar zcvf Rmpi_0.5-7-F11.tar.gz Rmpi
R CMD INSTALL Rmpi_0.5-7-F11.tar.gz
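Before installing, it can be worth checking that the repacked tarball still looks like an R source package: a top-level directory named after the package with a DESCRIPTION file inside it. The sketch below does only that check. Here check_pkg_tarball is a hypothetical helper of my own naming, not something shipped with R or Rmpi.

```shell
# Check that a tarball contains <pkg>/DESCRIPTION at the top level.
# check_pkg_tarball is a hypothetical helper, not part of R or Rmpi.
check_pkg_tarball() {
    tarball="$1"
    pkg="$2"
    # List the archive members and look for an exact, whole-line match.
    if tar tzf "$tarball" | grep -Fx "$pkg/DESCRIPTION" >/dev/null; then
        echo "ok: $tarball contains $pkg/DESCRIPTION"
    else
        echo "problem: $pkg/DESCRIPTION not found in $tarball" >&2
        return 1
    fi
}

# Typical use, from the directory that holds the Rmpi directory:
#   check_pkg_tarball Rmpi_0.5-7-F11.tar.gz Rmpi && R CMD INSTALL Rmpi_0.5-7-F11.tar.gz
```

If the check fails, the usual cause is running tar from inside the Rmpi directory rather than from its parent.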
Now it should be working in R:
> library("Rmpi")
> mpi.spawn.Rslaves(nslaves=2)
2 slaves are spawned successfully. 0 failed.
master (rank 0, comm 1) of size 3 is running on: server
slave1 (rank 1, comm 1) of size 3 is running on: server
slave2 (rank 2, comm 1) of size 3 is running on: server
> x <- c(10,20)
> mpi.apply(x,runif)
[[1]]
[1] 0.25142616 0.93505554 0.03162852 0.71783194 0.35916139 0.85082154
[7] 0.35404191 0.14221315 0.60063773 0.71805190
[[2]]
[1] 0.84157864 0.63481773 0.38217188 0.67839089 0.27827728 0.35429266
[7] 0.04898744 0.96601584 0.25687905 0.77381186 0.69011927 0.37391028
[13] 0.19017369 0.51196594 0.51970563 0.15791524 0.21358237 0.69642478
[19] 0.12690207 0.44177656
Painful.
On 2010-03-08 14:46:00, Allan Engelhardt wrote in CYBAEA Data and Analysis:
I needed a fast way of eliminating observed values with zero variance from large data sets using the R statistical computing and analysis platform. In other words, I want to find the columns in a data frame that have zero variance. And as fast as possible, because my data sets are large, many, and changing fast. The final result surprised me a little.
On 2009-08-17 09:18:00, Allan Engelhardt wrote in CYBAEA Journal:
We knew the potential existed already, of course. Mobile devices in the USA generate some 600 billion transactions per day, each tagged with the location and time. Jeff Jonas: Every call, text message, email and data transfer handled by your mobile device creates a transaction with your space-time coordinate[...].
The mobile operators have this data, of course. We all know this (especially here where we have been using some of it for social network analysis). No real surprises here, except perhaps in the volumes.
But did you know that the operators are sharing your data? What is new, at least to me, is that this data is being provided to third parties that are leveraging specially designed analytics to make sense of our space-time-travel data.
On 2009-07-27 19:38:00, Allan Engelhardt wrote in CYBAEA Data and Analysis:
O'Reilly's recent publication Beautiful Data has a chapter by Jeff Jonas which is enough reason in itself for me to recommend it. The chapter, Data Finds Data, is also available as a PDF download.
On 2009-07-22 13:37:00, Allan Engelhardt wrote in CYBAEA Data and Analysis:
This is by far the best description of why traditional parallel databases (like Teradata, Greenplum et al.) are an evolutionary dead end. But much more than a theoretical discussion, they have built a solution which they call HadoopDB. It is based on Hadoop, PostgreSQL, and Hive and is completely Open Source. Alternative, column-based, backends to PostgreSQL are being implemented now. Read: Announcing release of HadoopDB.
On 2009-07-22 06:59:00, Allan Engelhardt wrote in CYBAEA Journal:
The nice people at Velocity have released The B2B Content Marketing Workbook. It is behind a registration wall, which means we wouldn’t normally recommend it, but you can just type junk in the fields if you are not comfortable with giving your personal details to a marketing agency. (Think about it....) If you are relatively new in the B2B world, say having joined a professional services or consulting organization, you may find this one useful.
Join the discussion
Installing Rmpi on CentOS
I tried the same recipe on a CentOS 5.3 machine, and it installs fine, but when I try to use it, I get this:
> library(Rmpi)
Error in dyn.load(file, DLLpath = DLLpath, ...) :
unable to load shared library '/usr/lib64/R/library/Rmpi/libs/Rmpi.so':
libmpi.so.0: cannot open shared object file: No such file or directory
Error in library(Rmpi) : .First.lib failed for 'Rmpi'
Error in dyn.unload(file.path(libpath, "libs", paste("Rmpi", .Platform$dynlib.ext, :
dynamic/shared library '/usr/lib64/R/library/Rmpi/libs/Rmpi.so' was not loaded
Any idea why?
Able to install unedited Rmpi library
I was able to install the unedited Rmpi library.
R CMD INSTALL Rmpi_0.5-7.tar.gz
R
>mpi.spawn.Rslaves(nslaves=2)
2 slaves are spawned successfully. 0 failed.
master (rank 0, comm 1) of size 3 is running on: localhost
slave1 (rank 1, comm 1) of size 3 is running on: localhost
slave2 (rank 2, comm 1) of size 3 is running on: localhost
I think the main problem was getting openMPI installed correctly.
Thanks Again.
Still having problems
I have attempted the fix you have graciously created for Rmpi, but I am running into a problem.
When I run the command
R CMD INSTALL Rmpi-0.5-7-F11.tar.gz
I get the error.
ERROR: cannot extract package from ‘Rmpi-0.5-7-F11.tar.gz’
I used gedit to open the configure.ac file and replace its contents. I also deleted the residual configure.ac~ file left after editing. Otherwise I followed the advice exactly. I have tried several times, each time with the same result.
Any Advice?