Explorers and farmers: What is a ‘Data Scientist’?

data
business
Published 27 March 2017

I am not sure the term ‘Data Scientist’ means anything anymore. As often happens with new buzzwords, it has taken on a life of its own and its original meaning has become diluted. But I do think the term was useful, and could still be useful, for distinguishing this role from other data-related functions in the organization, and I feel it is worth reclaiming the term for this precise usage.

The image of the scientist that I want to evoke with this term is the eureka of scientific discovery. To me, the defining characteristic of a data scientist is that he or she applies the scientific method to data and aims to discover new insights from the data itself. Knowledge discovery from data is the key. Because this often means working with large volumes of possibly difficult data, the term may also extend to the effective management of ‘big data’, as an adjacent space and key enabler.

Gustave Doré: Return of the Spies From the Land of Promise

It is useful to think of two types of data scientists; let’s call them ‘explorers’ and ‘farmers’. The explorer is the intrepid traveller who enters the new country and brings back fruits of the land, as in the old story. The farmer cultivates once the land is conquered and the fields are fenced. An explorer data scientist will get excited about building your first ever churn model, while the farmer looks forward to optimising the run time, the model parameters, and the automation of the solution. For the next 30 years or so…

Do not confuse data scientist explorers and farmers. They are different people with complementary skills and very different temperaments. You would normally want them in different parts of your organization: the explorer clearly performs a business function and needs to report to the commercial business. She needs to measure her success in big strategic numbers: financial terms or terms that have a clear financial impact such as revenues, EBITDA, NPS, product innovation, and so on.

The farmer is an operational role; he may well sit in an operational function such as IT, Marketing, or Service, and he will measure his success in operational numbers: direct costs, incremental improvements in internal metrics such as model lift or NPS survey responses, etc.

Do not confuse the two. If you are recruiting, be clear about which one you want. And for how long. And for which outcomes. If you try to find one person to do both roles, then you are searching for a unicorn.

Data scientist explorers

Data scientist explorers are supremely comfortable with being uncomfortable; with a life that is full of the unknown, uncertain, uncharted, and untried. Science is about what we don’t know, as legendary physicist Richard Feynman understood:

The scientist has a lot of experience with ignorance and doubt and uncertainty, and this experience is of very great importance, I think. When a scientist doesn’t know the answer to a problem, he is ignorant. When he has a hunch as to what the result is, he is uncertain. And when he is pretty darn sure of what the result is going to be, he is still in some doubt. We have found it of paramount importance that in order to progress we must recognize our ignorance and leave room for doubt. Scientific knowledge is a body of statements of varying degrees of certainty — some most unsure, some nearly sure, but none absolutely certain.

Now, we scientists are used to this, and we take it for granted that it is perfectly consistent to be unsure, that it is possible to live and not know. But I don’t know whether everyone realizes this is true.

To Feynman’s last point, I know that not everyone realises this truth, and I know that even some of those who realise it are not comfortable having to ‘live and not know’. I am fine with that: we are all different, and our differences enrich our lives. But when I interview for a data science role I really want to know if you are comfortable with ‘ignorance and doubt and uncertainty’.

Explorers come back with the fruits of the land. They explore data for insights, build models to use those insights, and deploy them in the business. They are focused on outcomes, and they work quickly. Great data science explorers are like Joshua and Caleb in the story: they come back and inspire and lead their organizations through change to exploit the opportunities in the data. ‘We should go up and take possession of the land, for we can certainly do it.’

Poor data scientist explorers are like the ten other explorers in the story who feared the land of the giants, failed to inspire their community to change, and thereby condemned it to forty years of wandering in the wilderness. ‘The land we explored devours those living in it.’

You want to recruit people like Joshua and Caleb, and you want to avoid the others at all costs. McKinsey talks about ‘business translators’, who combine data savvy with industry and functional expertise, as a key skill, and that is certainly part of it. You need people who love creating new things; who fluently ‘speak’ commercials, customers, and data; and who can inspire and lead the change. To make them effective, give them ‘big strategic numbers’ as their goals.

Data scientist farmers

I spent some time about a decade ago establishing and running analytics and insights teams in several countries for a large mobile telecommunications company. With a big prestigious brand behind me, it was easy to recruit good talent. But getting them to stay for the long haul was almost impossible. I identified two reasons. First, I was attracting explorers. The reality of telco insights is that there is a lot of incremental work. You need to get that churn model and the fraud detection incrementally better. All the time. And you need to model, analyse, and optimize campaign performance for the same sort of campaigns again and again. In other words, very quickly you turn to farming.

Second, at the time, the people who worked as analysts saw their career not as ‘telecommunications’ but as ‘analytics’ or, more often and even more narrowly, as ‘SAS’. With a narrow technical focus they were never going to be satisfied in a commercial business function such as Marketing or Customer Value Management, which is where we placed them. And with that narrow focus there really wasn’t much of a career path for them within the organization. So they joined, got the brand on their CV, and moved on. Perfectly rational.

The data scientist farmer does have strong technical skills. He is an expert at designing, building, and maintaining robust, high-performance solutions. He loves to make things better. He enjoys and appreciates process.

I have built successful farming functions within IT or Finance, though you could also look at Service or other operational functions. You want to build strong data science processes supported by strong data quality and data infrastructure capabilities. You want to set operational metrics, such as model improvements, uptime, direct costs, response time, etc.