Blog posts from CYBAEA

A warning on the R save format

23 August 2011

The save() function in the R platform for statistical computing is very convenient and I suspect many of us use it a lot. But I was recently bitten by a “feature” of the format which meant I could not recover my data.

Read more (~640 words)

Friday quote: the handmaiden and the whore

19 August 2011

Because it is Friday and because we collect quotes, here is one on statistics being the best and worst of disciplines. Which of the two views is closest to your opinion?

Read more (~170 words)

Spreadsheet errors

20 April 2011

For my sins, I have done more than my fair share of analysis in Excel. I am quite capable of building and maintaining 130 MB spreadsheets (I had a dozen of them for one client). Excel is installed pretty much everywhere, so it is sometimes the only way to start getting commercial value out of the data in the organisation. But I don’t like it, so let’s have a look at one reason why. In order not to always pick on Microsoft, we use another application, but you get the same results with Excel.

Read more (~640 words)

Getting started with the Heritage Health Prize competition

8 April 2011

The US$ 3 million Heritage Health Prize competition is on, so we take a look at how to get started using the R statistical computing and analysis platform.

Read more (~630 words)

Inflow segmentation – measuring new customers by value not volume

6 January 2011

Do you have accurate and timely analysis of the quality of the customers you are acquiring? Most companies carefully track the quantity of new customers by the hour, day, or certainly the week, but it is still less common to track the quality of the inflow as it happens. It is interesting to know that we have acquired, say, 1000 new customers today, but so very much more informative to know that this inflow will bring in £22,000 of revenue over the next year at a 35% margin. Break it down by channel and product to see who is performing and who is not, and as a marketing manager I get really excited: I have the tools to do my job!
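To make the arithmetic in that example concrete, here is a minimal sketch in Python. The class and field names are our own invention for illustration, and the figures are simply the ones quoted above; a real inflow report would of course take its revenue forecast from a model rather than from a constant.

```python
from dataclasses import dataclass


@dataclass
class InflowSummary:
    """Hypothetical summary of one day's customer inflow (illustrative names)."""

    customers: int       # number of new customers acquired
    revenue: float       # forecast revenue over the next year, in pounds
    margin_rate: float   # expected margin as a fraction of revenue

    @property
    def margin(self) -> float:
        # Margin in pounds is forecast revenue times the margin rate.
        return self.revenue * self.margin_rate

    @property
    def revenue_per_customer(self) -> float:
        # Average forecast revenue per newly acquired customer.
        return self.revenue / self.customers


# The figures quoted in the paragraph above.
today = InflowSummary(customers=1000, revenue=22_000.0, margin_rate=0.35)

print(f"Forecast margin: £{today.margin:,.0f}")
print(f"Revenue per new customer: £{today.revenue_per_customer:.2f}")
```

With these numbers the inflow is worth about £7,700 of margin, or £22 of revenue per new customer. Computing one such summary per channel and product is what turns a head count into a view of inflow quality.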

Read more (~800 words)

Understanding reasons for churn – and what you can do about it

5 January 2011

We argued in our article on commercial churn modelling that you want to predict not only the probability of a customer leaving you but, even more importantly, what you can do about it. We want to predict why the customer is churning or, more precisely, his likelihood of staying (given that he was likely to leave) after we extend an offer or perform an action from a list of activities for churn management, as well as his profitability after the save.

Read more (~740 words)

Commercial churn modelling

4 January 2011

Churn modelling is easy; commercial churn modelling is hard. Let us compare the two to explain what we mean by the latter.

Read more (~1150 words)

Why?

3 January 2011

Why do we do analytics? “You will come to know the truth, and the truth will set you free,” said the teacher, and while he wasn’t talking about commercial data mining, we think he could have been.

Read more (~370 words)

Benchmarking feature selection with Boruta and caret

25 November 2010

Feature selection is the data mining process of selecting the variables from our data set that may have an impact on the outcome we are considering. For commercial data mining, which is often characterised by having too many variables for model building, this is an important step in the analysis process. And since we often work on very large data sets, the performance of our process is very important to us.

Read more (~1290 words)

Feature selection: Using the caret package

16 November 2010

Feature selection is an important step for practical commercial data mining, which is often characterised by data sets with far too many variables for model building. In a previous post we looked at all-relevant feature selection using the Boruta package, while in this post we consider the same (artificial, toy) examples using the caret package. Max Kuhn kindly listed me as a contributor for some performance enhancements I submitted, but the genius behind the package is all his.

Read more (~990 words)