This example is designed to give you a better feel for the importance of noise and its effects on decision-making. It is written using R Markdown. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

You can download the code file used to create this page at http://decisionneurolab.com/resources/Intro_to_DDM/exercises/noise.Rmd.

Author: Cendri Hutcherson

Date last modified: May 1, 2019

Let’s imagine that you are deciding whether or not to eat a piece of chocolate, and that the “true” reward to you of consuming that chocolate is +5, which means you should choose to eat it. Unfortunately, you don’t know that value until you’ve actually consumed it. Instead, you have to infer it from predictions, and this inference is the heart of decision making. Those predictions constitute “evidence” either for or against eating the chocolate, and should help to determine your choice. That evidence might be based on rewards you’ve experienced in the past from consuming chocolate, or on predictions derived from your current goals (e.g. enjoying a nice dessert vs. losing a meal). Importantly, we assume that this evidence is generated by the brain with noise at any given moment in time (say, due to the noise inherent to neural firing). This first exercise explores the implications of this noise for making a good or correct choice.

Example 1.1. Evidence with Gaussian noise

Let’s assume that your brain computes the true value of +5 with noise that has a standard deviation of 2. Assuming you drew only a single piece of evidence before making your choice, what is the likelihood that you would correctly say yes to eating the chocolate? Here is some code that simulates this situation and computes the complementary error rate (the probability that a single sample of evidence falls below 0).

E = 5 # set the evidence to 5
noise = 2 # set the noise to +/- 2

# generate 1000 samples of evidence
E_t = rnorm(1000, mean = E, sd = noise)

# plot a histogram of the samples
hist(E_t)

# determine what percentage are below 0
mean(E_t < 0)
## [1] 0.006
# Or we could compute this value analytically
pnorm(0, mean = E, sd = noise, lower.tail=TRUE)
## [1] 0.006209665

In this example, it appears that we wouldn’t choose incorrectly very often: a single sample of evidence falls below 0 only about 0.6% of the time.

Example 1.2. Effects of increasing noise

What if the noise were much higher, relative to the evidence (say, +/- 10 instead of +/- 2)? What do you think that would do to the likelihood of a single sample of evidence leading to the correct choice? Well, let’s see.

E = 5 # set the evidence to 5
noise = 10 # set the noise to +/- 10

# generate 1000 samples of evidence
E_t = rnorm(1000, mean = E, sd = noise)

# plot a histogram of the samples
hist(E_t)

# determine what percentage are below 0
mean(E_t < 0)
## [1] 0.289
# Or we could compute this value analytically
pnorm(0, mean = E, sd = noise, lower.tail=TRUE)
## [1] 0.3085375

30.9%. That’s a pretty high error rate!

Example 1.3. Effects of decreasing evidence.

What if the noise were set back to +/- 2, but the evidence itself were lower, say 1 instead of 5?

E = 1 # set the evidence to 1
noise = 2 # set the noise to +/- 2

# generate 1000 samples of evidence
E_t = rnorm(1000, mean = E, sd = noise)

# plot a histogram of the samples
hist(E_t)

# determine what percentage are below 0
mean(E_t < 0)
## [1] 0.297
# Or we could compute this value analytically
pnorm(0, mean = E, sd = noise, lower.tail=TRUE)
## [1] 0.3085375

Notice that the error rate here is EXACTLY the same as in our example above when the evidence was 5 but the noise was 10. This isn’t a coincidence. It illustrates an important principle: the likelihood of an error does not depend on the absolute magnitude of the evidence or the noise, but on the ratio between them.
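
As a quick optional check of this principle (not part of the original code), we can compare the analytic error rates for several evidence/noise pairs that all share the same ratio of 0.5:

# pairs of evidence and noise with the same ratio give the same error rate
pnorm(0, mean = 1, sd = 2, lower.tail = TRUE)    # evidence 1, noise 2
pnorm(0, mean = 5, sd = 10, lower.tail = TRUE)   # evidence 5, noise 10
pnorm(0, mean = 50, sd = 100, lower.tail = TRUE) # evidence 50, noise 100
# all three calls return 0.3085375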

Example 1.4. Taking more samples

So far, we’ve seen that taking only a single sample might not be the best strategy in the presence of noisy evidence. But, you might be asking, what if we took not ONE, but TWO samples of evidence? What is the likelihood that we would make an error if we took TWO samples and decided based on whether their sum was positive? Let’s see!

E = 5 # set the evidence to 5
noise = 10 # set the noise to +/- 10

# generate 1000 simulations of evidence for the first sample
E_t1 = rnorm(1000, mean = E, sd = noise)

# generate 1000 simulations of evidence for the second sample
E_t2 = rnorm(1000, mean = E, sd = noise)

# compute their sum
E_total = E_t1 + E_t2

# plot a histogram of the samples
hist(E_total)

# determine what percentage are below 0
mean(E_total < 0)
## [1] 0.209
# Or we could compute this value analytically
pnorm(0, mean = 2*E, sd = sqrt(2*noise^2), lower.tail=TRUE)
## [1] 0.2397501

Why is the standard deviation in this example computed as \(\sqrt{2\times noise^2}\)? Because the standard deviation of the sum of two independent random variables is the square root of the sum of their variances.
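
If you would like to verify this empirically (a quick optional check, not part of the original example), compare the standard deviation of the simulated sums with the analytic value:

# the SD of the 1000 simulated sums should be close to sqrt(2 * noise^2)
sd(E_total)       # empirical SD (roughly 14; will vary from run to run)
sqrt(2*noise^2)   # analytic SD: 14.14214 when noise = 10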

Note that our error rate has dropped from 30.9% with one sample to 24.0% with two. That’s a pretty good return on investment. And if we had taken three samples, our error rate would be lower still:

pnorm(0, mean = 3*E, sd = sqrt(3*noise^2), lower.tail=TRUE)
## [1] 0.1932381

Example 1.5. Error rate as a function of the number of samples

In fact, we can plot how our error rate decreases as we increase the number of noisy samples we take (say, up to 100 samples):

# function to compute likelihood of error for given no. of samples
pError = function(x)pnorm(0, mean = x*E, sd = sqrt(x*noise^2), lower.tail=TRUE)

plot(1:100, pError(1:100), pch = 16, xlab = "No. of samples", ylab = "Error probability")

Of course, if our evidence is weaker relative to the noise, our error rate will decrease more slowly. But the point is that it will decrease!

E = 2.5 # set evidence to 2.5 instead of 5
pError = function(x)pnorm(0, mean = x*E, sd = sqrt(x*noise^2), lower.tail=TRUE)

plot(1:100, pError(1:100), pch = 16, xlab = "No. of samples", ylab = "Error probability")

Thought Exercises/Test Yourself

  1. How would you calculate the error rate for a single sample of evidence if the “true” value of the evidence were -5 instead of +5?

  2. If the evidence had a strength of +5 and a noise level of 15, how many samples would be required to ensure that a decision maker had at least a 90% probability of making the correct choice?