Blog posts from CYBAEA

R tips: Use read.table instead of strsplit to split a text column into multiple columns

29 May 2009

Someone on the R-help mailing list had a data frame with a column containing IP addresses in quad-dot format. He wanted to sort by this column, and I proposed a solution involving strsplit. But Peter Dalgaard came up with a much nicer method using read.table on a textConnection object:
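
As a rough illustration of the idea (a sketch, not the code from the mailing list thread), the octets can be read into numeric columns and used as sort keys. The data frame `df`, its `ip` column, and the column names `a` to `d` are hypothetical.

```r
# Hypothetical example data: a data frame with a column of IP addresses
df <- data.frame(ip = c("10.0.0.2", "192.168.1.10", "9.255.3.4", "10.0.0.10"),
                 stringsAsFactors = FALSE)

# Treat the column as a text "file" and let read.table split it on "."
con <- textConnection(df$ip)
octets <- read.table(con, sep = ".", col.names = c("a", "b", "c", "d"))
close(con)

# Sort numerically by each octet in turn rather than alphabetically
sorted <- df[order(octets$a, octets$b, octets$c, octets$d), , drop = FALSE]
print(sorted)
```

A plain character sort would place "10.0.0.10" before "10.0.0.2"; ordering on the numeric octets avoids that.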

Read more (~100 words)

Do social networks influence purchases?

22 May 2009

Harvard Business School has an interesting study titled Do Friends Influence Purchases in a Social Network?. I would like to get my hands on the raw data (which is from the Korean social site Cyworld), but the outline conclusions seem plausible:

Read more (~170 words)

22 May 2009

I am always on the lookout for useful data sources for training in statistics, so I am excited that a new site for US Government data has opened for business. Its purpose is to increase public access to high-value, machine-readable datasets generated by the US Government.

Read more (~90 words)

KDD Cup 2009

12 May 2009

The results from the KDD Cup 2009 are both interesting and fundamentally not interesting. For this public data mining challenge, Orange, the mobile telecommunications company, provided anonymous data sets on mobile customers: 50,000 records each of training and testing data with 15,000 variables. (The data sets are still available for download, and there are also smaller data sets with only 230 variables.) The competition was to provide the best models for churn, cross-sell (“appetency”), and up-sell.

Read more (~440 words)

SNA with R: Loading large networks using the igraph library

6 May 2009

We are interested in Social Network Analysis using the statistical analysis and computing platform R. The documentation for R is voluminous but typically not very good, so this entry is part of a series where we document what we learn as we explore the tool and the packages.

Read more (~620 words)

SNA with R: Loading your network data in statnet

1 April 2009

We are interested in Social Network Analysis using the statistical analysis and computing platform R. As usual with R, the documentation is pretty bad, so this series collects our notes as we learn more about the available packages and how they work. Here we use the statnet group of packages, which seems to be the most comprehensive and most actively maintained set of network analysis packages.

Read more (~1170 words)

R tips: Swapping columns in a matrix

31 March 2009

Using R, the statistical analysis and computing platform, swapping two columns in a matrix is really easy: m[ , c(1,2)] <- m[ , c(2,1)].
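
A small worked example of the one-liner above, using a made-up 2 x 3 matrix:

```r
# Example matrix with columns (1,2), (3,4), (5,6)
m <- matrix(1:6, nrow = 2)

# Swap columns 1 and 2: the right-hand side is evaluated in full
# before the assignment, so no temporary variable is needed
m[, c(1, 2)] <- m[, c(2, 1)]

print(m)
```

The third column is left untouched; only the two indexed columns change places.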

Read more (~70 words)

R tips: Eliminating the “save workspace image” prompt on exit

26 March 2009

When using R, the statistical analysis and computing platform, I find it really annoying that it always prompts to save the workspace when I exit. This is how I turn it off.
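
One common way to do this, which may or may not be the approach described in the full post, is to start R with the `--no-save` command line option. Another is to mask `q()` in your `~/.Rprofile` so that it defaults to not saving. A minimal sketch of the latter, assuming you are happy to override the default:

```r
# Candidate line for ~/.Rprofile (an assumption, not necessarily the post's method):
# mask base::q so that the default is to quit without saving the workspace.
q <- function(save = "no", ...) quit(save = save, ...)

# Calling q() would now exit without prompting; q(save = "yes") still saves.
```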

Read more (~210 words)

R tips: Keep your packages up-to-date

25 March 2009

In this entry in a small series of tips for the use of the R statistical analysis and computing tool, we look at how to keep your add-on packages up-to-date.
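
The base function for this is `update.packages()`. The wrapper below is only a sketch: the name `update_all` is hypothetical, and the repository URL is an assumed default that you may want to replace with your preferred CRAN mirror.

```r
# Hypothetical helper: update all installed packages without prompting.
# The repos URL is an assumption; substitute your preferred CRAN mirror.
update_all <- function(repos = "https://cloud.r-project.org") {
  update.packages(ask = FALSE, repos = repos)
}

# Usage (requires network access and write permission to the library):
# update_all()
```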

Read more (~510 words)

The financial crisis and physicists

20 March 2009

The financial crisis is all my fault. Or so David Smith from our friends at REvolution seems to suggest in his post Physicists, models, and the credit crisis:

Read more (~950 words)