Blog posts from CYBAEA

Inflow segmentation – measuring new customers by value not volume

6 January 2011

Do you have accurate and timely analysis of the quality of the customers you are acquiring? Most companies carefully track the quantity of new customers by the hour, day, or certainly the week, but it is still less common to track the quality of the inflow as it happens. It is interesting to know that we have acquired, say, 1000 new customers today, but far more informative to know that this inflow will bring in £22,000 of revenue over the next year at a 35% margin. Break it down by channel and product to see who is performing and who is not, and as a marketing manager I get really excited: I have the tools to do my job!
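As a rough illustration of that arithmetic, here is a minimal Python sketch that rolls up one day's inflow by channel. The channel names, customer counts and per-customer revenue forecasts are hypothetical, chosen only so that the totals match the 1000 customers, £22,000 and 35% margin of the example; in practice the per-customer forecast would come from a value model, and the margin would probably differ by channel and product.

```python
from collections import defaultdict

# Hypothetical inflow for one day:
# (channel, new customers, forecast 12-month revenue per customer in GBP).
INFLOW = [
    ("web", 600, 20.0),
    ("retail", 300, 25.0),
    ("partner", 100, 25.0),
]
MARGIN_RATE = 0.35  # assumption: the same 35% margin applies to every channel


def summarise_inflow(inflow, margin_rate):
    """Return per-channel customers, forecast revenue and forecast margin."""
    summary = defaultdict(lambda: {"customers": 0, "revenue": 0.0, "margin": 0.0})
    for channel, customers, revenue_per_customer in inflow:
        revenue = customers * revenue_per_customer
        row = summary[channel]
        row["customers"] += customers
        row["revenue"] += revenue
        row["margin"] += revenue * margin_rate
    return dict(summary)


def totals(summary):
    """Add the per-channel rows up into a single total row."""
    return {
        key: sum(row[key] for row in summary.values())
        for key in ("customers", "revenue", "margin")
    }


by_channel = summarise_inflow(INFLOW, MARGIN_RATE)
overall = totals(by_channel)
for channel, row in list(by_channel.items()) + [("total", overall)]:
    print(
        f"{channel:8s} {row['customers']:5d} customers  "
        f"GBP {row['revenue']:,.0f} revenue  GBP {row['margin']:,.0f} margin"
    )
```

The point of the roll-up is that two channels with similar volumes can look very different once value and margin are attached.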

Read more (~800 words)

Understanding reasons for churn – and what you can do about it

5 January 2011

We argued in our article on commercial churn modelling that you want to predict not only the probability of a customer leaving you but, even more importantly, what you can do about it. We want to predict why the customer is churning or, more precisely, the customer's likelihood of staying (given that they were likely to leave) after we extend an offer or perform an action from a list of churn management activities, as well as their profitability after the save.
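One way to combine these quantities is a simple expected-value comparison across the available actions. The sketch below is only an illustration under stated assumptions: the action names, probabilities and profit figures are hypothetical, and the per-customer cost of making an offer is an extra term that the text above does not mention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    p_stay_given_offer: float  # P(stays | was going to leave, receives this action)
    profit_after_save: float   # forecast profit from the customer if saved
    cost: float                # assumed cost of making the offer (not from the text)


def expected_value(p_churn, action):
    """Expected gain from applying the action to one customer.

    Simplification: the offer cost is paid for every targeted customer, and
    customers who were not going to leave are assumed to be unaffected.
    """
    return p_churn * action.p_stay_given_offer * action.profit_after_save - action.cost


def best_action(p_churn, actions):
    """Pick the action with the highest expected value for this customer."""
    return max(actions, key=lambda action: expected_value(p_churn, action))


# Hypothetical list of churn management activities.
ACTIONS = [
    Action("discount", p_stay_given_offer=0.4, profit_after_save=100.0, cost=5.0),
    Action("service_call", p_stay_given_offer=0.2, profit_after_save=150.0, cost=2.0),
]

choice = best_action(0.5, ACTIONS)
print(choice.name, round(expected_value(0.5, choice), 2))
```

Note that the best action can change with the customer's churn probability, which is why the save probability and the post-save profitability matter as much as the churn score itself.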

Read more (~740 words)

Commercial churn modelling

4 January 2011

Churn modelling is easy; commercial churn modelling is hard. Let us compare the two to explain what we mean by the latter.

Read more (~1150 words)


3 January 2011

Why do we do analytics? “You will come to know the truth, and the truth will set you free,” said the teacher, and while he wasn’t talking about commercial data mining, we think he could have been.

Read more (~370 words)

Benchmarking feature selection with Boruta and caret

25 November 2010

Feature selection is the data mining process of selecting the variables from our data set that may have an impact on the outcome we are considering. For commercial data mining, which is often characterised by having too many variables for model building, this is an important step in the analysis process. And since we often work on very large data sets, the performance of our process is very important to us.

Read more (~1290 words)

Feature selection: Using the caret package

16 November 2010

Feature selection is an important step for practical commercial data mining, which is often characterised by data sets with far too many variables for model building. In a previous post we looked at all-relevant feature selection using the Boruta package, while in this post we consider the same (artificial, toy) examples using the caret package. Max Kuhn kindly listed me as a contributor for some performance enhancements I submitted, but the genius behind the package is all his.

Read more (~990 words)

Feature selection: All-relevant selection with the Boruta package

15 November 2010

Feature selection is an important step for practical commercial data mining, which is often characterised by data sets with far too many variables for model building. There are two main approaches to selecting the features (variables) we will use for the analysis. Minimal-optimal feature selection identifies a small (ideally minimal) set of variables that gives the best possible classification result (for a class of classification models). All-relevant feature selection identifies all variables that are in some circumstances relevant for the classification.
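To make the distinction concrete, here is a toy Python sketch. It is not how Boruta or caret work: it uses a plain correlation filter as a stand-in for "relevance", and the data, the variable names and both thresholds are hypothetical. The point is only that a redundant copy of a useful variable is kept by an all-relevant selection and dropped by a minimal-optimal one.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def all_relevant(features, target, threshold=0.5):
    """Keep every variable whose correlation with the target passes the threshold."""
    return [
        name for name, values in features.items()
        if abs(pearson(values, target)) >= threshold
    ]


def minimal_optimal(features, target, threshold=0.5, redundancy=0.95):
    """Greedy stand-in: keep relevant variables, skipping near-duplicates."""
    relevant = all_relevant(features, target, threshold)
    ranked = sorted(relevant, key=lambda name: -abs(pearson(features[name], target)))
    chosen = []
    for name in ranked:
        if all(abs(pearson(features[name], features[kept])) < redundancy for kept in chosen):
            chosen.append(name)
    return chosen


# Hypothetical toy data: one useful variable, a rescaled copy of it, and noise.
FEATURES = {
    "spend_pounds": [1, 2, 3, 4, 5, 6],
    "spend_pence": [100, 200, 300, 400, 500, 600],
    "noise": [1, -1, 1, -1, 1, -1],
}
TARGET = [0, 0, 0, 1, 1, 1]

print("all-relevant:   ", all_relevant(FEATURES, TARGET))
print("minimal-optimal:", minimal_optimal(FEATURES, TARGET))
```

Which of the two spend variables the minimal-optimal stand-in keeps is arbitrary here, since they carry the same information.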

Read more (~1210 words)

Big data for R

5 August 2010

Revolution Analytics recently announced their “big data” solution for R. This is great news and a lovely piece of work by the team at Revolution.

Read more (~760 words)

Area Plots with Intensity Coloring

13 July 2010

I am not sure apeescape’s ggplot2 area plot with intensity colouring is really the best way of presenting the information, but it had me intrigued enough to replicate it using base R graphics.

Read more (~420 words)

Employee productivity revisited

22 June 2010

We have a mild obsession with employee productivity and how it declines as companies get bigger. We have previously found that when you treble the number of workers, you halve their individual productivity, which is scary.
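If we assume that this relationship is a power law (an assumption made here for illustration, not something stated above), the "treble the workers, halve the productivity" rule pins down the exponent, and we can ask what it implies for other changes in headcount. A minimal Python sketch:

```python
from math import log

# Assumption: productivity per worker is proportional to workers ** EXPONENT.
# Trebling headcount halves productivity, so 3 ** EXPONENT == 0.5.
EXPONENT = log(0.5) / log(3)  # about -0.63


def productivity_ratio(headcount_ratio):
    """Relative productivity per worker after headcount grows by the given factor."""
    return headcount_ratio ** EXPONENT


for ratio in (2, 3, 9, 10):
    print(f"{ratio:3d}x workers -> {productivity_ratio(ratio):.2f}x productivity per worker")
```

Under this assumption total output still grows with headcount, but much more slowly than the number of workers.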

Read more (~300 words)