Blog posts from CYBAEA

Employee productivity as function of number of workers revisited

22 June 2010

We have a mild obsession with employee productivity and how it declines as companies get bigger. We have previously found that when you treble the number of workers, you halve their individual productivity, which is mildly scary.
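As an illustration of what "treble the workers, halve the productivity" implies, here is a minimal sketch. It assumes (our assumption, not a formula stated in this teaser) that per-worker productivity follows a power law p(N) = c·N^(-α). Halving on trebling then fixes α = ln 2 / ln 3 ≈ 0.63. The function names are hypothetical.

```python
import math

# Assumption (not stated in the post): per-worker productivity is a power law
#   p(N) = c * N**(-alpha)
# "Treble the workers, halve the productivity" means 3**(-alpha) == 1/2,
# so alpha = ln 2 / ln 3 (about 0.63).
ALPHA = math.log(2) / math.log(3)


def productivity_ratio(n_before: float, n_after: float, alpha: float = ALPHA) -> float:
    """Ratio p(n_after) / p(n_before) under the assumed power law."""
    return (n_after / n_before) ** (-alpha)


def total_output_ratio(n_before: float, n_after: float, alpha: float = ALPHA) -> float:
    """Ratio of total output N * p(N) after versus before the change."""
    return (n_after / n_before) * productivity_ratio(n_before, n_after, alpha)


if __name__ == "__main__":
    print(f"alpha = {ALPHA:.4f}")
    print(f"per-worker productivity, 100 -> 300 workers: x{productivity_ratio(100, 300):.2f}")
    print(f"total output, 100 -> 300 workers: x{total_output_ratio(100, 300):.2f}")
```

Under this assumption, trebling the workforce raises total output by only about 50% (3 × 0.5 = 1.5), because total output grows like N^(1-α) ≈ N^0.37.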

Read more (~670 words)

Comparing standard R with Revolution R for performance

17 June 2010

Following on from my previous post about improving the performance of R by linking it with optimized linear algebra libraries, I thought it would be useful to try out the five benchmarks Revolution Analytics have on their Revolutionary Performance pages.

Read more (~270 words)

Faster R through better BLAS

15 June 2010

Can we make our analyses in the R statistical computing and analysis platform run faster? Usually the answer is yes, and the best way is to improve the algorithm and the variable selection.

Read more (~750 words)

Eliminating observed values with zero variance in R

8 March 2010

I needed a fast way of eliminating observed values with zero variance from large data sets using the R statistical computing and analysis platform. In other words, I wanted to find the columns in a data frame that have zero variance, and as fast as possible, because my data sets are large, numerous, and changing fast. The final result surprised me a little.
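The R approach the post settles on is not shown in this teaser. As a language-neutral illustration of the idea, here is a minimal Python sketch that finds zero-variance columns. It assumes the data frame is represented as a dict of column lists and that `None` marks a missing value; both are our assumptions, and the function names are hypothetical. A column has zero variance exactly when all its non-missing values are equal, so the check compares against the first value and stops at the first difference instead of computing a full variance.

```python
from typing import Any, Dict, List, Sequence


def is_constant(values: Sequence[Any]) -> bool:
    """True if all non-missing values are equal, i.e. the column has zero variance.

    Compares each value with the first non-missing one and returns at the
    first difference, so varying columns are usually rejected after a few rows.
    """
    first: Any = None
    seen = False
    for v in values:
        if v is None:  # assumption: None marks a missing value and is ignored
            continue
        if not seen:
            first, seen = v, True
        elif v != first:
            return False
    return True  # empty or all-missing columns also count as constant


def zero_variance_columns(table: Dict[str, Sequence[Any]]) -> List[str]:
    """Names of the columns whose non-missing values are all identical."""
    return [name for name, col in table.items() if is_constant(col)]


def drop_zero_variance(table: Dict[str, Sequence[Any]]) -> Dict[str, Sequence[Any]]:
    """Copy of the table without its zero-variance columns."""
    return {name: col for name, col in table.items() if not is_constant(col)}


if __name__ == "__main__":
    data = {"a": [1, 1, 1], "b": [1, 2, 1], "c": ["x", "x", None]}
    print(zero_variance_columns(data))
    print(list(drop_zero_variance(data)))
```

Note that the comparison is exact, so floating-point columns that differ only by rounding noise are treated as varying.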

Read more (~470 words)

Your mobile phone knows everything about you … and it is telling

17 August 2009

We knew the potential existed already, of course. Mobile devices in the USA generate some 600 billion transactions per day, each tagged with the location and time. Jeff Jonas says: "Every call, text message, email and data transfer handled by your mobile device creates a transaction with your space-time coordinate[…]. Got a Blackberry? Every few minutes, it sends a heartbeat, creating a transaction whether you are using the phone or not." That is some 7 million transactions per second, on average.
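The "7 million per second" figure is simply the daily total spread evenly over the 86,400 seconds in a day. A small sketch of that arithmetic (the helper name is ours):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(per_day: float) -> float:
    """Average rate per second, assuming the daily total is spread evenly."""
    return per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    rate = per_second(600e9)  # 600 billion transactions per day
    print(f"{rate:,.0f} transactions per second")  # prints "6,944,444 transactions per second"
```

That is roughly 6.9 million, which rounds to the 7 million quoted above. Peak rates will be higher, since traffic is not uniform over the day.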

Read more (~440 words)

Beautiful Data

27 July 2009

O’Reilly’s recent publication Beautiful Data has a chapter by Jeff Jonas, which in itself is reason enough for me to recommend the book. The chapter, Data Finds Data, is also available as a PDF download.

Read more (~60 words)

Massively parallel database for analytics

22 July 2009

This is by far the best description of why traditional parallel databases (like Teradata, Greenplum et al.) are an evolutionary dead end. But it is much more than a theoretical discussion: they have built a solution, which they call HadoopDB. It is based on Hadoop, PostgreSQL, and Hive and is completely Open Source. Alternative column-based backends to PostgreSQL are being implemented now. Read: Announcing release of HadoopDB.

Read more (~80 words)

B2B Content Marketing

22 July 2009

The nice people at Velocity have released The B2B Content Marketing Workbook. It is behind a registration wall, which means we wouldn’t normally recommend it, but you can just type junk in the fields if you are not comfortable with giving your personal details to a marketing agency. (Think about it…) If you are relatively new to the B2B world, say, having just joined a professional services or consulting organization, you may find this one useful.

Read more (~260 words)

Marketing lessons from antiquity

10 July 2009

A story from antiquity involving a king of Rome and a Greek Sibyl has lovely marketing lessons.

Read more (~260 words)

The Knapsack Problem

10 July 2009

David posts a question about how to solve this knapsack problem using the R statistical computing and analysis platform. My reply in the comments seems to have disappeared for a while, so here is my proposed solution. See David’s blog for my earlier attempt, which contains a very common error.
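The specifics of David’s problem and the R solution proposed in the post are not reproduced in this teaser. For readers new to the problem, here is a minimal sketch of the textbook 0/1 knapsack solved by dynamic programming in Python. It is a generic illustration, not the post’s solution, and it assumes non-negative integer weights and an integer capacity.

```python
from typing import List, Sequence, Tuple


def knapsack(weights: Sequence[int], values: Sequence[float], capacity: int) -> Tuple[float, List[int]]:
    """Textbook 0/1 knapsack by dynamic programming over integer capacities.

    Returns the best total value and the 0-based indices of the chosen items.
    Assumes non-negative integer weights and an integer capacity.
    """
    n = len(weights)
    # best[i][c] = best value achievable with the first i items and capacity c
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, v = weights[i - 1], values[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # option 1: leave item i-1 out
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v  # option 2: take item i-1
    # Walk back through the table to recover which items were taken
    chosen: List[int] = []
    c = capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    chosen.reverse()
    return best[n][capacity], chosen


if __name__ == "__main__":
    print(knapsack([1, 3, 4, 5], [1, 4, 5, 7], 7))
```

The table costs O(n × capacity) time and memory, which is fine for modest integer capacities. Greedy rules such as "take the best value-per-weight items first" are a tempting shortcut but do not always give the optimum for the 0/1 problem.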

Read more (~140 words)