List of posts in CYBAEA
Analytics for Marketing online training 25 - 28 September 2012
I am excited to be giving the Analytics for Marketing online training course on 25-28 September 2012. Sign up before 25 August 2012 for the early bird discount. Our friends at Revolution Analytics will provide the infrastructure to host the event.
Update: For clarification, this is an online, instructor-led training course. We are using the Cisco WebEx Training Center to provide the training room. This allows us to keep the interactivity of classroom training without everybody having to physically travel. There is a limit on the number of participants, so book early to ensure your seat (and for the early bird discount).
When Big Data Matters
Big Data is a buzzword, but is it real: does it address real business issues or is it just an excuse to sell more computers, software, and consulting services?
We argue that it is real and it does matter, but only in some well-defined circumstances: it is neither a universal solution nor a requirement for every problem. We provide a framework for determining where the Big Data applications are within your work and where traditional approaches apply.
Get this article as a PDF: When Big Data matters.
R code for Chapter 2 of Non-Life Insurance Pricing with GLM
We continue working our way through the examples, case studies, and exercises of what is affectionately known here as “the two bears book” (Swedish björn = bear) and more formally as Non-Life Insurance Pricing with Generalized Linear Models by Esbjörn Ohlsson and Björn Johansson (Amazon UK | US).
At this stage, our purpose is to reproduce the analysis from the book using the R statistical computing and analysis platform, and to answer the data analysis elements of the exercises and case studies. Any critique of the approach and of pricing and modeling in the Insurance industry in general will wait for a later article.
R code for Chapter 1 of Non-Life Insurance Pricing with GLM
Insurance pricing is backwards and primitive, harking back to an era before computers. One standard (and good) textbook on the topic is Non-Life Insurance Pricing with Generalized Linear Models by Esbjörn Ohlsson and Björn Johansson. We have been doing some work in this area recently. Needing a robust internal training course and documented methodology, we have been working our way through the book again and converting the examples and exercises to R, the statistical computing and analysis platform. This is part of a series of posts containing elements of the R code.
They have finally pulled that buggy, unreliable piece of code that was doSMP from the CRAN mirrors while (I hear) Revolution Analytics is rewriting it. To use all your cores for analysis on the Windows platform, you can try doSNOW instead; my code is something like the fragment below. Neither option is as attractive as doMC on anything-but-Windows platforms, but sometimes you have to work with legacy systems.
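A minimal sketch of the doSNOW approach, assuming the doSNOW package (with its dependencies foreach and snow) is installed. This is not the original fragment: the worker count and the toy workload are illustrative assumptions only.

```r
library(foreach)
library(doSNOW)   # attaches snow, which provides makeCluster()/stopCluster()

# Assumption: two socket workers; set this to the number of cores you have.
n_workers <- 2
cl <- makeCluster(n_workers, type = "SOCK")
registerDoSNOW(cl)   # make %dopar% use the snow cluster

# Illustrative workload: the mean of 1000 random draws, repeated 8 times.
results <- foreach(i = 1:8, .combine = c) %dopar% {
  mean(rnorm(1000))
}

stopCluster(cl)      # always release the workers when done
```

The same foreach loop runs unchanged on other back ends; only the registration lines differ.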
R versus SAS/SPSS in corporations
A recent question on one of the LinkedIn groups about the advantages of using R over commercial tools like SAS or IBM SPSS Modeler drew lots of comments in favor of R. We like R a lot and we use it extensively, but I also wanted to balance the discussion. R is great, but looking at commercial organizations near the end of 2011, it is not necessarily the right choice to make.
Commercial Analytics: The Capabilities
Commercial Analytics is the kind that makes money. From data to dollars, insights to income, this is all about how to run the business better. To do it and to do it well you need certain capabilities in place. This article builds a map of those business capabilities to help you assess, understand, and plan your business.
Usually we talk about this in person, and we are happy to talk to you about it (just contact us), but we recently had occasion to make a slide pack that covers some of the material as a stand-alone presentation. This article is based on that pack, which is also available for download.
5 common pitfalls of commercial analytics projects
We have seen data mining and other analytics projects fail; we have seen insights teams unable to deliver the insights needed to actually improve the business; we have seen marketing teams unable to use data effectively to guide and quantify their activities; we have seen business leaders who are sitting on piles of data but are effectively flying blind because they cannot get from the data to the knowledge they need to inform their decisions.
Below we have listed five common pitfalls of analytics in a commercial environment, their warning signs, and what you can do differently.
Friday quote: what is the question to which this number is the answer?
John Kay muses on interpreting statistical data:
Always ask of such data “what is the question to which this number is the answer?”. “Earnings before interest, tax, depreciation and amortisation on a like-for-like basis before allowance for exceptional restructuring costs” is the answer to the question “what is the highest profit number we can present without attracting flat disbelief?”.
A warning on the R save format
The save() function in the R platform for statistical computing is very convenient and I suspect many of us use it a lot. But I was recently bitten by a “feature” of the format which meant I could not recover my data.
I recommend that you save data in a data format (e.g. CSV or CDF), not using the save() function, which is really for objects (data and code). What is your approach?
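As a minimal sketch of that recommendation, the base R example below writes a data frame to CSV and reads it back. The data frame and its column names are made up for illustration, and the temporary file stands in for wherever you keep your data.

```r
# Illustrative data; the column names are assumptions for this example.
df <- data.frame(id = 1:3, value = c(1.5, 2.5, 3.5))

# Write to a plain-text data format instead of calling save(df, file = ...).
csv_file <- tempfile(fileext = ".csv")
write.csv(df, csv_file, row.names = FALSE)

# Read it back and check that the round trip preserved the values.
restored <- read.csv(csv_file)
round_trip_ok <- isTRUE(all.equal(df, restored))
```

CSV keeps only the values, so column classes such as factors or dates have to be restored after reading.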