On 2009-03-25 20:59:00, Allan Engelhardt wrote in CYBAEA Data and Analysis:
In this entry in a small series of tips for the use of the R statistical analysis and computing tool, we look at how to keep your add-on packages up-to-date.
One of the great strengths of R is the many packages available. All the new approaches, as well as some of the best implementations of your old favorites, are there. But the sheer number can also be a little daunting, so the CRAN task views are often the best way to get started and to download a reasonable “bundle” of packages for your analysis.
First we need a place to store the packages. On Linux (and other Unix-like systems) I use the file ~/.Renviron to set the R_LIBS variable to where I want the files:
## R environment
R_LIBS="~/R"
On Windows, I set the same variable as a user environment variable. Don’t forget to create the directory: R ignores library directories that do not exist when it starts.
Now you can start R and install the CRAN task view package:
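If you would rather create the directory from R itself, here is a minimal sketch. ensure_lib_dir() is a hypothetical helper, not part of the setup described here, and it assumes R_LIBS holds a single directory (falling back to ~/R if the variable is unset).

```r
## Hypothetical helper: make sure the personal library exists and is
## searched first. Assumes R_LIBS names a single directory; falls back
## to "~/R" when the variable is not set.
ensure_lib_dir <- function(path = Sys.getenv("R_LIBS", unset = "~/R")) {
  path <- path.expand(path)                # e.g. "~/R" -> "/home/you/R"
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE, showWarnings = FALSE)
  }
  # .libPaths() silently drops directories that do not exist,
  # which is why the directory is created first
  .libPaths(c(path, .libPaths()))
  invisible(path)
}
```

Calling ensure_lib_dir() once in a session makes the new directory usable straight away, without restarting R.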
> install.packages("ctv")
Then I have a few things in my ~/.Rprofile startup file. The previous command probably prompted you to choose a download mirror, which is annoying, so let’s exit R and edit the startup file to contain:
## Default CRAN mirror
local({r <- getOption("repos"); r["CRAN"] <- "http://cran.uk.r-project.org"; options(repos=r)})
## Libraries
require("utils", quietly=TRUE)
require("ctv", quietly=TRUE)
Then I define three functions. The first installs the views I need. I like to try new things, so my list is long; edit it to suit your needs:
install.myviews <- function() {
require("ctv", quietly=TRUE)
my.views = c("Bayesian", "Cluster", "Graphics", "gR", "HighPerformanceComputing", "MachineLearning", "Multivariate", "NaturalLanguageProcessing", "Robust", "SocialSciences", "Spatial", "Survival", "TimeSeries")
install.views(views=my.views, lib=Sys.getenv("R_LIBS"), dependencies=c("Depends", "Imports", "LinkingTo", "Suggests"))
}
Try it out! Save the file, start R, and type install.myviews() at the prompt. If your list is as long as mine, this may take some time, and you may see some warnings and errors. We might add a tip on these later, but the most likely cause of the errors is missing development files for external libraries (or R failing to find them).
Now that we finally have the packages, we need to make sure they stay up-to-date. I add the other two functions to ~/.Rprofile:
update.local <- function() {
update.packages(lib.loc=Sys.getenv("R_LIBS"), ask=FALSE)
}
update.myviews <- function() {
require("ctv", quietly=TRUE)
my.views = c("Bayesian", "Cluster", "Graphics", "gR", "HighPerformanceComputing", "MachineLearning", "Multivariate", "NaturalLanguageProcessing", "Robust", "SocialSciences", "Spatial", "Survival", "TimeSeries")
update.views(views=my.views, lib.loc=Sys.getenv("R_LIBS"))
}
The first allows me to easily update all my locally installed libraries (not just those installed from views). The second updates my views, which is useful when the view definitions change (rarely, but it happens as the recommended packages evolve).
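Because install.myviews() and update.myviews() repeat the same list of views, one possible tidy-up (a suggestion, not something the setup requires) is to define the list once in ~/.Rprofile. The name .my.views is hypothetical, and the leading dot keeps it out of a plain ls(). Some of these view names may have been renamed or retired since this was written, so it is worth checking them against ctv::available.views() first.

```r
## Sketch only: one shared list of task views, so the install and update
## functions cannot drift apart. `.my.views` is a hypothetical name.
.my.views <- c("Bayesian", "Cluster", "Graphics", "gR",
               "HighPerformanceComputing", "MachineLearning", "Multivariate",
               "NaturalLanguageProcessing", "Robust", "SocialSciences",
               "Spatial", "Survival", "TimeSeries")

install.myviews <- function(views = .my.views, lib = Sys.getenv("R_LIBS")) {
  # lib= and dependencies= are passed through to install.packages()
  ctv::install.views(views = views, lib = lib,
                     dependencies = c("Depends", "Imports", "LinkingTo", "Suggests"))
}

update.myviews <- function(views = .my.views, lib.loc = Sys.getenv("R_LIBS")) {
  ctv::update.views(views = views, lib.loc = lib.loc)
}
```

Using ctv:: instead of require("ctv") means a missing ctv package produces an immediate error rather than a silent FALSE.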
Now I can of course update from the R command prompt using update.local() or update.myviews(). But that is not the main benefit. I can now update directly from the shell command line using commands like:
echo "update.local()" > /tmp/r.cmd
R CMD BATCH /tmp/r.cmd /tmp/r.out
The beauty of this is that I can add it to my crontab(5) and have it run automatically every night or every week as I feel I need it. This way I always have the latest versions installed.
On 2010-07-13 07:47:00, Allan Engelhardt wrote in CYBAEA Data and Analysis:
I am not sure apeescape’s ggplot2 area plot with intensity colouring is really the best way of presenting the information, but it had me intrigued enough to replicate it using base R graphics.
The key technique is to draw a gradient line, which R does not support natively, so we have to roll our own code for that. Unfortunately, lines(..., type="l") does not recycle the col= argument, so we end up with rather more loops than I thought would be necessary.
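For comparison, here is a minimal sketch of a gradient line built with segments(), which, unlike lines(), does vectorise its col= argument. This is a different technique from the loops used in the post, and gradient_line() and its arguments are hypothetical: each segment is coloured by the average of z at its two end points.

```r
## Sketch: draw a polyline whose colour follows z, using segments(),
## which (unlike lines()) applies col= per segment.
## gradient_line() is a hypothetical helper, not the post's code.
gradient_line <- function(x, y, z = y, palette = c("blue", "red"),
                          n = 100, lwd = 2) {
  stopifnot(length(x) == length(y), length(y) == length(z), length(x) >= 2)
  cols <- colorRampPalette(palette)(n)
  # One value per segment: the mean of z at the segment's two end points
  zmid <- (head(z, -1) + tail(z, -1)) / 2
  rng <- range(zmid)
  idx <- if (diff(rng) > 0) {
    1L + floor((n - 1) * (zmid - rng[1]) / diff(rng))   # map onto 1..n
  } else {
    rep(1L, length(zmid))
  }
  segments(head(x, -1), head(y, -1), tail(x, -1), tail(y, -1),
           col = cols[idx], lwd = lwd)
  invisible(cols[idx])    # the colour used for each segment
}

x <- seq(0, 2 * pi, length.out = 200)
plot(x, sin(x), type = "n", xlab = "x", ylab = "sin(x)")
gradient_line(x, sin(x))
```

Drawing many short segments avoids an explicit loop, at the cost of visible joins when lwd is large.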
We also get a nice opportunity to use the under-appreciated read.fwf function.
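For readers who have not met read.fwf(), here is a small sketch with made-up fixed-width data (the column names and values are hypothetical, not the data behind the plot). The widths= argument gives the number of characters in each field, read left to right.

```r
## A small read.fwf() illustration with made-up fixed-width data.
fwf_lines <- c("2008  1  12.5",
               "2008  2  13.1",
               "2008  3   9.8")
tmp <- tempfile(fileext = ".txt")
writeLines(fwf_lines, tmp)

# Fields are 4, 3 and 6 characters wide; extra arguments such as
# strip.white are passed on to read.table()
df <- read.fwf(tmp, widths = c(4, 3, 6),
               col.names = c("year", "month", "value"),
               strip.white = TRUE)
unlink(tmp)
str(df)
```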
Read more (~535 words).
On 2010-06-22 11:45:00, Allan Engelhardt wrote in CYBAEA Journal:
We have a mild obsession with employee productivity and how that declines as companies get bigger. We have previously found that when you treble the number of workers, you halve their individual productivity, which is scary.
We now redo the analysis four years later and, just because we can, use the leading companies of the London Stock Exchange instead of the largest American companies.
The results still hold. We called it the 3/2 rule: treble the number of workers and you halve their individual productivity. Large companies with ten times the number of employees are roughly ¼ as productive as their smaller competitors.
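The two figures are consistent if the 3/2 rule is read as a power law, which is an assumption here rather than something stated in the post: productivity falls as N^(-b), and trebling N halves it, so b = log 2 / log 3 ≈ 0.63. A quick check in R:

```r
## Assumption: the 3/2 rule is a power law, productivity ~ N^(-b).
## Trebling N halves productivity, so 3^(-b) = 1/2.
b <- log(2) / log(3)
b                           # about 0.631

# Relative productivity of a company with k times as many employees
rel_productivity <- function(k) k^(-b)

rel_productivity(3)         # 0.5 by construction
rel_productivity(10)        # about 0.23, i.e. roughly a quarter
```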
Employee productivity is a big issue. If all the FTSE-100 companies achieved their average profits per employee, then the index would generate almost £1 trn of additional net profits for the economy.
Read more (~245 words).
On 2010-06-22 11:20:00, Allan Engelhardt wrote in CYBAEA Data and Analysis:
We have a mild obsession with employee productivity and how that declines as companies get bigger. We have previously found that when you treble the number of workers, you halve their individual productivity, which is mildly scary.
We revisit the analysis for the FTSE-100 constituent companies and find that the relation still holds four years later and across a continent.
Read more (~763 words, 5 comments).
On 2010-06-17 09:05:00, Allan Engelhardt wrote in CYBAEA Data and Analysis:
Following on from my previous post about improving performance of R by linking with optimized linear algebra libraries, I thought it would be useful to try out the five benchmarks Revolution Analytics has on its Revolutionary Performance pages.
Read more (~300 words, 2 comments).
On 2010-06-15 10:21:00, Allan Engelhardt wrote in CYBAEA Data and Analysis:
Can we make our analyses on the R statistical computing and analysis platform run faster? Usually the answer is yes, and the best way is to improve your algorithm and variable selection.
But recently David Smith was suggesting that a big benefit of their (commercial) version of R was that it was linked to a better linear algebra library. So I decided to investigate.
The quick summary is that it only really makes a difference for fairly artificial benchmark tests. For “normal” work you are unlikely to see a difference most of the time.
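If you want a rough feel for whether the linear algebra library matters on your own machine, one simple approach (a sketch, not the benchmark used in the post) is to time a dense matrix product and a linear solve, then compare the numbers across R builds linked against different libraries. blas_check() is a hypothetical helper, and the timings depend entirely on your hardware and setup.

```r
## Rough BLAS/LAPACK sensitivity check: time two dense linear algebra
## operations. blas_check() is a hypothetical helper.
blas_check <- function(n = 1000, seed = 42) {
  set.seed(seed)
  m <- matrix(rnorm(n * n), n, n)
  b <- rnorm(n)
  c(crossprod = system.time(crossprod(m))[["elapsed"]],   # BLAS-heavy
    solve     = system.time(solve(m, b))[["elapsed"]])    # LAPACK-heavy
}

blas_check(n = 500)
```

This is exactly the kind of artificial workload where a faster library shows up most, so a large gap here need not translate into a gap for everyday analysis.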
Read more (~934 words, 1 comment).