Telling the truth: technique for unbiased survey answers – Make money from data

The truth. It is not always easy to get to it. If you run surveys or questionnaires, you will know that people do not always reply honestly; their answers are biased towards social norms. Ask people which sexually transmitted diseases they currently have, and the reported rates will be lower than the actual frequency of those diseases in the population.

However, there is a technique for obtaining honest answers which we have used before and which is widely applicable in practical situations.

Figure 1 illustrates a typical survey form using the example of sexually transmitted diseases (STDs). The problem is that some people consider it a social stigma to have an STD, so you will see a lower frequency of the diseases in your results than in the population. Another example is asking for the number of sexual partners, which young men in particular tend to over-report.

Biased answers

Figure 1: Typical positive questionnaire.

But there is a way to get honest answers. Consider the inverse question: name one sexually transmitted disease you do not currently have.

(Possibly) unbiased answers

Figure 2: Example negative questionnaire.

Now that is a quite different question from a psychological and social point of view. There is no particular (real or imagined) social stigma associated with not having herpes. You are much more likely to get an honest answer to this question.

But even if everybody answers honestly, what does a negative questionnaire like Figure 2 tell you? It does not tell you directly the prevalence of the diseases in the population, but it does tell you the difference in probability of the options, and combined with a positive survey like Figure 1 it can give you true unbiased results.

Consider first a negative survey with two options which we might call “¬A” and “¬B” (the negation sign “¬” reminding us that we are asking if you do not have A or B).

Assume for the moment that there is no overlap in the options, i.e. nobody has both A and B. Let’s call the probabilities in the population for each of the options a and b, and let us call the probability for having neither X. Further assume for the moment that there is no bias in selecting one option over another when both apply (so the healthy people have no preference for selecting “¬A” or “¬B”). Then you expect a distribution of answers as in the diagram below.

[Frequency distribution] — Figure 3: Frequency distribution of negative 2-question survey.

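All the people who have A must answer “¬B” and all the people who have B must answer “¬A”, while the people with neither disease select either option with equal probability. The expected share of “¬B” answers is therefore a + X/2, and the expected share of “¬A” answers is b + X/2.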

Subtracting the responses to “¬A” from the responses to “¬B” gives you a true unbiased estimate for the probability difference a-b, because the X/2 contributed by the healthy respondents cancels.
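The two-option case can be checked with a small simulation. This is only a sketch under the assumptions stated above (no overlap, honest answers, healthy respondents choosing either option with equal probability). The function names and the example prevalences 0.10 and 0.04 are made up for illustration. Since people with A can only answer “¬B”, the share of “¬B” answers minus the share of “¬A” answers is what estimates a-b.

```python
import random


def simulate_negative_survey(a, b, n, seed=0):
    """Simulate n honest answers to a two-option negative survey.

    a, b: population probabilities of having A or B (no overlap assumed).
    Returns the observed shares of "not A" and "not B" answers.
    """
    rng = random.Random(seed)
    counts = {"not_A": 0, "not_B": 0}
    for _ in range(n):
        u = rng.random()
        if u < a:                # has A, so can only answer "not B"
            counts["not_B"] += 1
        elif u < a + b:          # has B, so can only answer "not A"
            counts["not_A"] += 1
        else:                    # has neither: picks either option, 50/50
            counts[rng.choice(["not_A", "not_B"])] += 1
    return counts["not_A"] / n, counts["not_B"] / n


def estimate_difference(share_not_a, share_not_b):
    """Estimate a - b, using E[share_not_b] = a + X/2 and E[share_not_a] = b + X/2."""
    return share_not_b - share_not_a


share_not_a, share_not_b = simulate_negative_survey(a=0.10, b=0.04, n=200_000)
# Should land near the true difference 0.06, up to sampling noise.
print(round(estimate_difference(share_not_a, share_not_b), 3))
```

The share contributed by healthy respondents is the same for both options, which is why it drops out of the difference regardless of how large X is.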

The case with three options in the survey is similar and shown below:

[Frequency distribution of negative 3-question survey] — Figure 4: Frequency distribution of negative 3-question survey.

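Subtractions of the response frequencies will again give you true unbiased estimates for the differences in population probabilities a-b, b-c, and a-c, with one adjustment. A person who has A can now choose between “¬B” and “¬C”. If we assume they pick either with equal probability, the expected share of “¬A” answers is (b+c)/2 + X/3 and that of “¬B” answers is (a+c)/2 + X/3, so the difference between the two shares is (a-b)/2 and must be multiplied by two to recover a-b.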
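For a negative survey with k options, the same reasoning can be sketched as follows. The assumptions here go slightly beyond the text and are labeled as such: no overlap, honest answers, and every respondent choosing with equal probability among all options that truthfully apply to them. Under those assumptions the difference between two response shares equals the prevalence difference divided by k-1, so the sketch scales it back up. The function names and example numbers are hypothetical.

```python
def expected_shares(prevalence):
    """Expected share of "not <option>" answers for each option.

    prevalence: dict mapping option name to its population probability.
    Assumes no overlap and equal-probability choice among truthful options.
    """
    k = len(prevalence)
    total = sum(prevalence.values())
    healthy = 1.0 - total
    # Someone with another disease picks this option with probability 1/(k-1);
    # a healthy respondent picks it with probability 1/k.
    return {name: (total - p) / (k - 1) + healthy / k
            for name, p in prevalence.items()}


def estimate_differences(shares):
    """Estimate pairwise prevalence differences from negative-survey shares.

    shares: dict mapping option name to the observed share of respondents
            who ticked "not <option>".
    Returns a dict mapping (first, second) to an estimate of p_first - p_second.
    """
    k = len(shares)
    names = sorted(shares)
    diffs = {}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            # share(not second) - share(not first) = (p_first - p_second) / (k - 1)
            diffs[(first, second)] = (k - 1) * (shares[second] - shares[first])
    return diffs


shares = expected_shares({"A": 0.10, "B": 0.04, "C": 0.02})
for pair, diff in estimate_differences(shares).items():
    print(pair, round(diff, 3))
```

With k = 2 the scaling factor is one, which matches the two-option case where the plain difference of the response shares is already the estimate.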

This may already be enough for your needs. You may know one of the probabilities already. For example, one of the diseases may lead to a quick death if not treated and you are confident that the treatments and death certificates are reported accurately. Or in a marketing setting, you may know very well the sales of model A and are interested in knowing how well two new models B and C will perform.

If not, you can often combine a negative survey (Figure 2) with a positive one (Figure 1) to get a good unbiased estimate for the probabilities.

Assume for the moment that all options in Figure 1 are equally difficult to admit. Then you would expect that only a certain proportion of the respondents who have a disease, let us call it F, will check the box, while the rest leave it unchecked, and you get a distribution like the one below:

[Frequency distribution of positive 2-question survey with bias F] — Figure 5: Frequency distribution of positive 2-question survey with bias F.

Of course you do not know the bias, i.e. the value of F. That is the problem. But subtracting the two response frequencies gives you (a-b)*F. From the negative survey you know the difference a-b, so, provided a and b are not equal, you can easily solve for F. Dividing the frequencies in the positive survey by this value gives you the true unbiased probability estimates.
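The combination step can be sketched in a few lines. The sketch assumes that F is the fraction of affected respondents who truthfully check the box in the positive survey, so that the observed positive frequencies are a*F and b*F, and that the negative survey is answered honestly. The function name and the example numbers (a = 0.10, b = 0.04, F = 0.5) are hypothetical.

```python
def correct_positive_survey(pos_a, pos_b, neg_not_a, neg_not_b):
    """Recover F and the unbiased prevalences a and b.

    pos_a, pos_b:         observed shares ticking A and B in the positive survey
                          (assumed to equal a*F and b*F).
    neg_not_a, neg_not_b: observed shares ticking "not A" and "not B" in the
                          two-option negative survey.
    Returns (F, a, b).
    """
    diff_true = neg_not_b - neg_not_a    # estimates a - b
    diff_observed = pos_a - pos_b        # estimates (a - b) * F
    if abs(diff_true) < 1e-12:
        raise ValueError("a and b appear equal, so F cannot be identified")
    f = diff_observed / diff_true
    if f <= 0:
        raise ValueError("estimated F is not positive; check the inputs")
    return f, pos_a / f, pos_b / f


# Example: true a = 0.10, b = 0.04, and half of the affected respondents admit it.
f, a, b = correct_positive_survey(pos_a=0.05, pos_b=0.02,
                                  neg_not_a=0.47, neg_not_b=0.53)
print(round(f, 3), round(a, 3), round(b, 3))
```

When a and b are close together, both differences are small and the estimate of F becomes very sensitive to sampling noise, so in practice the method works best when the options differ clearly in prevalence.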

Summary

By combining positive and negative question surveys we are able to get an estimate for the true population probability independent of the bias in responses.

Of course we have simplified somewhat in this discussion. Maybe there isn’t an equal probability of selecting among the true answers. Maybe the bias is not a single number like above. We’ll leave our readers to think about these objections before perhaps returning to them later. But in many practical cases you can make reasonable assumptions about the distributions that skew the results, or you can change the approach to reduce or eliminate a bias (e.g. random order of the options). The result will not be perfect, but you are not aiming for perfection, only for an estimate that is accurate to within maybe a tenth of the true probability (roughly one significant digit).