The results from the KDD Cup 2009 challenge (which we wrote about before) are in, and the winner of the slow challenge used the R statistical computing and analysis platform for their winning submission.
Someone on the R-help mailing list had a data frame with a column containing IP addresses in quad-dot format (e.g. 126.96.36.199). He wanted to sort by this column, and I proposed a solution involving strsplit. But Peter Dalgaard came up with a much nicer method using read.table on a
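For reference, here is a minimal sketch of the strsplit-style approach: split each address into its four octets, convert them to integers, and pass the octets to order() as successive sort keys. This is not Peter Dalgaard's read.table method, and the data frame and its "ip" column are hypothetical example data.

```r
# Hypothetical example data: a data frame with an "ip" column in dotted-quad form
df <- data.frame(ip = c("10.0.0.2", "192.168.1.10", "10.0.0.10", "192.168.1.9"),
                 stringsAsFactors = FALSE)

# Split each address on the literal "." and convert the four pieces to integers;
# rbind() stacks them into a matrix with one row per address
octets <- do.call(rbind, lapply(strsplit(df$ip, ".", fixed = TRUE), as.integer))

# order() with several keys sorts by the first octet, then the second, and so on,
# so "10.0.0.10" correctly lands after "10.0.0.2" (a plain string sort would not)
sorted <- df[order(octets[, 1], octets[, 2], octets[, 3], octets[, 4]), , drop = FALSE]
print(sorted$ip)
```

Sorting numerically per octet is the whole point: a plain character sort would put "192.168.1.10" before "192.168.1.9".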
Harvard Business School has an interesting study titled Do Friends Influence Purchases in a Social Network?. I would like to get my hands on the raw data (which is from the Korean social site Cyworld), but the headline conclusions seem plausible:
I am always on the lookout for useful data sources for training in statistics, so I am excited that data.gov has opened for business. The purpose of Data.gov is to increase public access to high value, machine readable datasets generated by the US Government.
The results from the KDD Cup 2009 are both interesting and fundamentally not interesting. For this public data mining challenge, Orange, the mobile telecommunications company, provided anonymized data sets on mobile customers: 50,000 records each of training and testing data, with 15,000 variables. (The data sets are still available for download, and there are also smaller data sets with only 230 variables.) The competition was to provide the best models for churn, cross-sell ("appetency"), and up-sell.
We are interested in Social Network Analysis using the statistical analysis and computing platform R. The documentation for R is voluminous but typically not very good, so this entry is part of a series where we document what we learn as we explore the tool and the packages.
We are interested in Social Network Analysis using the statistical analysis and computing platform R. As usual with R, the documentation is pretty bad, so this series collects our notes as we learn more about the available packages and how they work. Here we use the statnet group of packages, which seems to be the most comprehensive and most actively maintained collection of network analysis packages.
Using R, the statistical analysis and computing platform, swapping two columns in a matrix is really easy:
m[ , c(1,2)] <- m[ , c(2,1)]
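A small worked example of the one-liner, on a made-up 2x3 matrix. One detail worth knowing: the in-place assignment swaps the values but leaves the column names where they were, so if the names should travel with the data, reordering the columns is the alternative.

```r
# Small example matrix with named columns (values chosen arbitrarily)
m <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))

# Swap the contents of columns 1 and 2 in place
m[, c(1, 2)] <- m[, c(2, 1)]
print(m)
#      a b c
# [1,] 3 1 5
# [2,] 4 2 6

# In-place assignment moves the values but not the column names.
# To move the names along with the data, reorder the columns instead.
m2 <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))
m2 <- m2[, c(2, 1, 3)]
print(m2)
#      b a c
# [1,] 3 1 5
# [2,] 4 2 6
```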
When using R, the statistical analysis and computing platform, I find it really annoying that it always prompts me to save the workspace when I exit. This is how I turn it off.
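Two standard ways to skip the prompt, which may or may not be the method the full post describes: start R with the --no-save command-line option, or quit with q(save = "no"). The sketch below wraps the second option in a small helper; the name quit_no_save is hypothetical and not part of base R, and you could put it in your .Rprofile.

```r
# Hypothetical helper (not part of base R): quit without saving the workspace
# and without the "Save workspace image?" prompt.
# q() accepts save = "no" to discard the workspace; status is the exit code.
quit_no_save <- function(status = 0) {
  q(save = "no", status = status)
}

# To exit the session: quit_no_save()
```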
In this entry in a small series of tips for the use of the R statistical analysis and computing tool, we look at how to keep your add-on packages up to date.
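As a starting point, base R's utils package provides old.packages() to list installed packages that have newer versions on a repository, and update.packages() to install those updates. Below is a minimal sketch wrapping the two; the function name update_my_packages and the choice of the cloud.r-project.org CRAN mirror are assumptions for illustration, and running it needs network access and write permission to your library.

```r
# Hypothetical helper: report outdated packages, then update them all.
# Assumes the cloud.r-project.org CRAN mirror; change repos as needed.
update_my_packages <- function(repos = "https://cloud.r-project.org") {
  # old.packages() returns NULL when nothing is outdated, otherwise a matrix
  # with columns such as "Package", "Installed" and "ReposVer"
  outdated <- utils::old.packages(repos = repos)
  if (is.null(outdated)) {
    message("All packages are up to date.")
    return(invisible(character(0)))
  }
  print(outdated[, c("Package", "Installed", "ReposVer"), drop = FALSE])
  # ask = FALSE updates everything without prompting for each package
  utils::update.packages(repos = repos, ask = FALSE)
  invisible(rownames(outdated))
}

# Only run when used interactively, since it downloads and installs packages
if (interactive()) update_my_packages()
```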