What is Bessel’s Correction? Is it really needed?

Swapna Samir Shukla
Dec 30, 2020

A few days into an introductory statistics course, you encounter one of the basic formulas for estimating the variance or standard deviation of a population. The formula seems fairly intuitive except for the (n-1) in the denominator, and you may well ask yourself: why not divide the sum of squared differences between the sample observations and the sample mean by the number of observations (n)?

Before we delve any further and try to answer the above question, let’s have a small refresher on some basic statistical terminology:

Population and Sample: A population represents the entire set of observations needed to answer a specific research question. For example: what is the average number of days needed for people (from a particular city, say Bangalore) to earn an Orange Belt in Karate? The target population includes everyone enrolled in registered Karate Dojos in that city. Collecting data from the entire population is often difficult and expensive, so we settle for just a subset of that population. This subset, commonly referred to as a sample, is assumed to be representative of the entire population. The sampling strategy depends on how the data are distributed within the population and on the research objectives. Some popular sampling strategies are Simple Random Sampling, Stratified Sampling (proportionate and disproportionate), Cluster Sampling, etc. In the Karate example, if the enrollment patterns (age group, body types, etc.) are similar across the Dojos, we may go for Multistage Sampling, wherein we choose a cluster of Dojos in the first stage and randomly choose Karate practitioners within that cluster in the second stage.

Population Mean and Sample Mean:

μ: population mean; x̅: sample mean; N: number of observations in the population; n: number of observations in the sample
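Written out with this notation, the two means take their standard textbook form. The index i and the individual observations x_i are notation introduced here for illustration:

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i
\qquad\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
```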

Population Standard Deviation and Sample Standard Deviation:

σ: population standard deviation; s: uncorrected sample standard deviation; (s-hat): sample standard deviation (with Bessel’s correction)

μ and σ are commonly referred to as population parameters, while x̅ and (s-hat) are referred to as sample statistics. We shall use this notation throughout this article.
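In the same notation, the three standard deviations can be written as follows. This is the usual textbook form, with (s-hat) typeset as \hat{s}; the index i and the observations x_i are introduced here for illustration:

```latex
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i-\mu\right)^2}
\qquad
s = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2}
\qquad
\hat{s} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2}
```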

Addressing the elephant in the room! Why n-1?

In many real-world scenarios, we do not know the population mean (μ), and therefore we can’t calculate the population standard deviation (σ). To circumvent this issue, we replace μ with its unbiased estimator, the sample mean (x̅). However, when we do this, we systematically underestimate the sum of squared differences and therefore the population variance: on average, the uncorrected sample variance equals (n-1)/n times σ². This “uncorrected sample standard deviation” is therefore a biased estimator of the population standard deviation!

In fact, replacing μ with x̅ generates the least possible value for the sum of squared differences: among all constants c, the sum of the (xᵢ - c)² is smallest at c = x̅, so it can never exceed the sum taken around μ.
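One way to write this argument out is to split each deviation from μ into a deviation from x̅ plus the offset between x̅ and μ. The derivation below is a standard one, reconstructed here in the article's notation rather than copied from the original figure:

```latex
\begin{aligned}
\sum_{i=1}^{n}(x_i-\mu)^2
  &= \sum_{i=1}^{n}\bigl[(x_i-\bar{x}) + (\bar{x}-\mu)\bigr]^2 \\
  &= \sum_{i=1}^{n}(x_i-\bar{x})^2
     + 2(\bar{x}-\mu)\sum_{i=1}^{n}(x_i-\bar{x})
     + n(\bar{x}-\mu)^2 \\
  &= \sum_{i=1}^{n}(x_i-\bar{x})^2 + n(\bar{x}-\mu)^2
  \;\ge\; \sum_{i=1}^{n}(x_i-\bar{x})^2 .
\end{aligned}
```

The middle term vanishes because the deviations from the sample mean always sum to zero. The leftover term n(x̅ - μ)² is never negative, which is why the sum around x̅ is the smaller of the two.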

Now, one would like to adjust this uncorrected sample standard deviation so that it comes closer to the population standard deviation. Friedrich Wilhelm Bessel proposed that the sample variance (s-hat²) can be an unbiased estimator of the population variance (σ²) if we multiply the uncorrected sample variance by n/(n-1), an adjustment commonly known as “Bessel’s Correction” in the literature. The bias arises mostly due to finite sample size [1]. A mathematical proof can be found in the Wikipedia article on Bessel’s correction [3]. Please note that estimators of many other population parameters, such as skewness and kurtosis, also suffer from finite-sample bias.

Now that we have understood why n-1 is used, we move on to the next question: is Bessel’s Correction always necessary? We attempt to answer this with some empirical evidence:
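The bias is easy to see numerically. The sketch below is a minimal illustration, not part of the original experiment: it assumes a normal population with variance 4, a sample size of 5, and an arbitrary random seed, and averages the two variance estimates over many samples using NumPy's `ddof` argument.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, fixed only for reproducibility

sigma2 = 4.0      # true population variance (assumed for this illustration)
n = 5             # small sample size, where the bias is easiest to see
trials = 200_000  # number of independent samples to average over

# Each row is one independent sample of size n drawn from N(0, sigma2).
samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(trials, n))

uncorrected = samples.var(axis=1, ddof=0)  # divides by n
corrected = samples.var(axis=1, ddof=1)    # divides by n - 1 (Bessel's correction)

mean_uncorrected = uncorrected.mean()
mean_corrected = corrected.mean()
theory_uncorrected = (n - 1) / n * sigma2  # expected value of the uncorrected estimate

print(f"true variance:             {sigma2:.3f}")
print(f"mean uncorrected (ddof=0): {mean_uncorrected:.3f} (theory: {theory_uncorrected:.3f})")
print(f"mean corrected   (ddof=1): {mean_corrected:.3f}")
```

Under these assumptions, the average of the uncorrected estimates should settle near (n-1)/n times the true variance, while the average of the corrected estimates should settle near the true variance itself.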

The code for this experiment produces an interactive dashboard to experiment with (Plotly Dash is awesome!). The experiment varies the sample size from 10 to 1000, with samples drawn from a population of 50,000 integers between 0 and 100,000. Once a sample size is selected with the slider in the dashboard, 300 samples of that size are drawn without replacement (because the population is much larger than any sample, the draws are close to independent), and for each sample the correction factor added to n in the denominator is varied from -3 to 2 in steps of 0.1. The black horizontal line is the population standard deviation, and the blue line charts the sample standard deviation at the different values of the correction factor.
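For readers who want to reproduce the numbers without the dashboard, here is a minimal NumPy sketch of the experiment as described. It is not the author's Dash code: the function names, the random seed, and the choice to average the standard deviation over the 300 samples are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed (assumption)

# Population: 50,000 integers between 0 and 100,000, as described in the text.
population = rng.integers(0, 100_001, size=50_000)
sigma = population.std(ddof=0)  # true population standard deviation


def mean_sd_by_correction(sample_size, n_samples=300):
    """Average sample SD over n_samples draws for each denominator n + c."""
    # Correction factors c from -3 to 2 in steps of 0.1 (51 values).
    corrections = np.round(np.arange(-3.0, 2.0 + 1e-9, 0.1), 1)
    sums_of_squares = np.empty(n_samples)
    for k in range(n_samples):
        sample = rng.choice(population, size=sample_size, replace=False)
        sums_of_squares[k] = np.sum((sample - sample.mean()) ** 2)
    # One SD per (sample, correction) pair, then average over the samples.
    sds = np.sqrt(sums_of_squares[:, None] / (sample_size + corrections[None, :]))
    return corrections, sds.mean(axis=0)


def percent_deviation(estimate, truth):
    """Absolute difference between estimate and truth, as a percentage of truth."""
    return 100.0 * abs(estimate - truth) / truth


corrections, mean_sds = mean_sd_by_correction(sample_size=10)
idx_0 = np.flatnonzero(np.isclose(corrections, 0.0))[0]         # uncorrected: divide by n
idx_minus_1 = np.flatnonzero(np.isclose(corrections, -1.0))[0]  # Bessel: divide by n - 1

dev_at_0 = percent_deviation(mean_sds[idx_0], sigma)
dev_at_minus_1 = percent_deviation(mean_sds[idx_minus_1], sigma)
print(f"Dev@0  = {dev_at_0:.2f}%")
print(f"Dev@-1 = {dev_at_minus_1:.2f}%")
```

Changing `sample_size` plays the role of the slider. The exact percentages depend on the random draws, so they will not match the figures quoted in the article digit for digit.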

Simulation for effect of sample size on Bessel’s Correction

Dev@0 is the percentage difference between the true population standard deviation (σ) and the uncorrected standard deviation (s). Dev@-1 is the percentage difference between the true population standard deviation (σ) and the sample standard deviation with Bessel’s correction (s-hat).

We observe that Bessel’s Correction helps when the sample size is small. As shown in the above illustration, for a sample of size 10, the n-1 correction reduces the difference between the population standard deviation and the sample standard deviation to 0.28%, compared with around 4.8% for the uncorrected standard deviation. As we keep increasing the sample size, the difference between Dev@0 and Dev@-1 keeps decreasing.
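Part of this shrinking gap is pure arithmetic. On any given sample, the corrected and uncorrected standard deviations differ by the exact factor sqrt(n/(n-1)), whatever the data. The short sketch below tabulates that factor for a few sample sizes; the function name is a hypothetical helper introduced for illustration.

```python
import math


def bessel_gap_percent(n):
    """Percentage by which the corrected SD exceeds the uncorrected SD at sample size n."""
    return 100.0 * (math.sqrt(n / (n - 1)) - 1.0)


for n in (10, 50, 100, 500, 1000):
    print(f"n = {n:>4}: corrected SD is {bessel_gap_percent(n):.3f}% larger than uncorrected")
```

The gap is a few percent at n = 10 and a small fraction of a percent by n = 1000, which is why the correction matters less and less as the sample grows.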

This is intuitive: larger samples contain more data points from the population, and therefore the sample mean converges to the population mean. This is even more true for Gaussian distributions, where points close to the mean are more likely to be selected than outliers [4]. In smaller samples, there’s a possibility that a few outliers may pull the sample mean away from the true population mean.

As we keep increasing the sample size beyond 1000, in many cases the n-1 correction factor would seem to overestimate the population standard deviation. Even though we have shown empirically that Bessel’s Correction is more useful for smaller samples, there isn’t a rule of thumb as to when it should be used. In the absence of other bias-correction measures, Bessel’s correction is still the best bet for an unbiased estimate of the population variance and a less biased estimate of the population standard deviation.

References:

  1. Hayes, K. (2014). Finite-sample bias-correction factors for the median absolute deviation. Communications in Statistics: Simulation and Computation, 43:2205-2212.
  2. Glen, S. “Bessel’s Correction: Why Use N-1 For Variance/Standard Deviation?” From StatisticsHowTo.com: Elementary Statistics for the rest of us! https://www.statisticshowto.com/bessels-correction/
  3. Wikipedia. Bessel’s correction. https://en.wikipedia.org/wiki/Bessel%27s_correction
  4. Hall, B. (August 23, 2020). The reasoning behind Bessel’s Correction. https://towardsdatascience.com/the-reasoning-behind-bessels-correction-n-1-eeea25ec9bc9
