> It’s very subjective, but I think the uniform starts looking reasonably good at a sample size of 8. The exponential, however, takes much longer to converge to a normal.
That's a good observation. The main idea behind the usual proof of the Central Limit Theorem is to take the Fourier transform (the characteristic function), do the work there, and then go back. After normalization, the result is that the distribution of the sum of N variables looks something like
Normal(X) + ("Skewness" / sqrt(N)) * Something(X) + (1/N) * (terms I don't remember) + ...
where "Skewness" is the number defined in https://en.wikipedia.org/wiki/Skewness. The uniform distribution is symmetric, so skewness = 0 and the leading correction decreases like 1/N.
The exponential distribution is very asymmetric and skewness != 0, so the main correction goes like 1/sqrt(N) and takes longer to disappear.
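A quick numeric check of this (my own sketch with numpy/scipy, not taken from the article): the skewness of the sample mean should shrink like skewness/sqrt(N), so it is already zero for the symmetric uniform but decays only slowly for the exponential, whose skewness is 2.

    import numpy as np
    from scipy.stats import skew

    rng = np.random.default_rng(0)
    n_means = 20_000  # how many sample means we simulate per sample size

    for n in (2, 8, 32, 128):
        unif_means = rng.uniform(0, 1, size=(n_means, n)).mean(axis=1)
        expo_means = rng.exponential(1.0, size=(n_means, n)).mean(axis=1)
        print(f"N={n:4d}  skew(uniform means)={skew(unif_means):+.3f}  "
              f"skew(exponential means)={skew(expo_means):+.3f}  "
              f"2/sqrt(N)={2 / np.sqrt(n):.3f}")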
The intuition behind it is that when we take batches of samples from some arbitrarily shaped distribution and summarize the information by looking at the mean values of those batches, we find that those mean values move away from the arbitrarily shaped distribution. The larger the batches, the more those means approach a normal distribution.
In other words, the means of large batches of samples from some funny-shaped distribution themselves constitute a sequence of numbers, and that sequence follows a normal distribution, or something closer and closer to one the larger the batches are.
This observation legitimizes our use of statistical inference tools derived from the normal distribution, like confidence intervals, provided we are working with large enough batches of samples.
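If you want to see that batch-means picture directly, here is a minimal sketch (my own example, not the article's code): draw batches from a lopsided distribution, keep only each batch's mean, and compare the quantiles of those means against a normal with the same mean and standard deviation.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    batch_size, n_batches = 50, 10_000

    # Each row is one batch of Exp(1) draws; keep only the row means.
    batch_means = rng.exponential(1.0, size=(n_batches, batch_size)).mean(axis=1)

    # Compare a few empirical quantiles against a fitted normal.
    fitted = stats.norm(loc=batch_means.mean(), scale=batch_means.std())
    for q in (0.05, 0.25, 0.50, 0.75, 0.95):
        print(f"q={q:.2f}  batch means={np.quantile(batch_means, q):.4f}  "
              f"normal fit={fitted.ppf(q):.4f}")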
I love the simulations. They are such a good way to learn STATS... you can still look at the theorem using math notation after, but if you've seen it work first using simulated random samples, then the math will make a lot more sense.
Here is a notebook with some more graphs and visualizations of the CLT: https://nobsstats.com/site/notebooks/28_random_samples/#samp...
runnable link: https://mybinder.org/v2/gh/minireference/noBSstats/main?labp...
Highly entertaining. Here's a little fun fact: there exists a generalisation of the central limit theorem for distributions without finite variance.
For some reason this is much less well known, even though the implications are vast. Via the detour of stable distributions and limiting distributions, this generalised central limit theorem plays an important role in the emergence of power laws in physics.
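As a small illustration of that point (my own sketch, assuming numpy): with infinite variance the generalised theorem gives an alpha-stable limit rather than a normal one. The standard Cauchy is the simplest case, since it is itself stable: the mean of N Cauchy draws is again standard Cauchy, so its spread never shrinks no matter how large N gets.

    import numpy as np

    rng = np.random.default_rng(2)
    for n in (10, 100, 10_000):
        # 2,000 repetitions of "average n standard Cauchy draws".
        means = rng.standard_cauchy(size=(2_000, n)).mean(axis=1)
        iqr = np.quantile(means, 0.75) - np.quantile(means, 0.25)
        print(f"N={n:6d}  IQR of the sample means ~ {iqr:.2f}  (standard Cauchy IQR = 2)")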
> I always avoided statistics subjects.
I don't believe you. Even if you had a good control group, the fact that one subject engaged in fewer statistics subjects than the control group doesn't lead to the conclusion that there is an avoidance mechanism (or any mechanism). You need a sample of something like 30 or 40 more of you to detect a statistically valid pattern of diminished engagement with statistics subjects that could then be hypothesized as being caused by avoidance.
There's an interesting extension of the Central Limit Theorem called the Edgeworth series. If you have a large but finite sample, the distribution of the (standardized) sample mean will be approximately Gaussian, but it will deviate from a Gaussian in a predictable way described by Hermite polynomials.
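For reference, one common textbook form of the first correction term for the CDF of the standardized sum (standard notation, not taken from the parent comment; gamma_1 is the skewness and He_2 is the probabilist's Hermite polynomial):

    % First-order Edgeworth expansion of the CDF of the standardized sum
    % (standard textbook form; He_2 is the probabilist's Hermite polynomial)
    P\!\left(\frac{S_n - n\mu}{\sigma\sqrt{n}} \le x\right)
      \approx \Phi(x) - \varphi(x)\,\frac{\gamma_1}{6\sqrt{n}}\,\mathrm{He}_2(x)
      + O\!\left(n^{-1}\right),
    \qquad \mathrm{He}_2(x) = x^2 - 1 .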
This is a very neat illustration, but I want to leave a reminder that when we cherry-pick well-behaved distributions for illustrating the CLT, people get unrealistic expectations of what it means: https://entropicthoughts.com/it-takes-long-to-become-gaussia...
I was definitely expecting you'd need a higher sample size for the Q-Q plots to start looking good. All the points in other comments about drawbacks or poorly behaved distributions are well taken, and this is nothing new, but wow it really does work well.
Speaking of CLTs, is there a good book or reference paper that discusses various CLTs (not just the basic IID one) in a somewhat introductory manner?
Looking at the R code in this article, I'm having a hard time understanding the appeal of tidyverse.
> You’re also likely not going to have the resources to take twenty-thousand different samples.
There are methods to estimate how many samples you need. It's not in the 20k range unless your population is extremely large.
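For example, here is a minimal sketch of one standard back-of-the-envelope method (my own example, not from the article): to estimate a mean to within a margin of error E at a given confidence level, with a rough guess sigma for the population standard deviation, you need roughly n = (z * sigma / E)^2 samples.

    from math import ceil
    from scipy.stats import norm

    def required_sample_size(sigma, margin_of_error, confidence=0.95):
        # Smallest n for which a CI at the given confidence level has
        # half-width <= margin_of_error, given a guessed sd of sigma.
        z = norm.ppf(1 - (1 - confidence) / 2)   # two-sided critical value, ~1.96 for 95%
        return ceil((z * sigma / margin_of_error) ** 2)

    # e.g. with a guessed population sd of 15 and a desired margin of +/- 2:
    print(required_sample_size(sigma=15, margin_of_error=2))   # -> 217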
The definition under "A Brief Recap" seems incorrect. The sample size doesn't approach infinity, the number of samples does. I'm in a similar situation to the author, I skipped stats, so I could be wrong. Overall great article though.
Bravo
> Maybe there’s a story to be told about a young person finding uncertainty uncomfortable,
I really like this blog post, but I also want to talk about this for a minute. We data-oriented, STEM-loving types love being right, right? So I find it weird that this makes many of us dislike statistics, especially considering how many people love to talk about quantum mechanics. But I think one of the issues here is that people have the wrong view of statistics and misunderstand what probability is really about. OP is exactly right: it is about uncertainty.
So if we're concerned with being right, you have to use probability and statistics. In your physics and/or engineering classes you probably had a teacher or TA who was really picky about things like sigfigs[0] or about including your errors/uncertainty (like ±). The reason is that these subtle details are actually incredibly powerful. I'm biased because I came over from physics and moved into CS, but I found these concepts translated quite naturally and were still very important over here. Everything we work with is discrete and much of it is approximating continuous functions. Probabilities give us a really powerful tool to be more right!
Think about any measurement you make. Go grab a ruler. Which is more accurate? Writing 10cm or 10cm ± 1cm? It's clearly the latter, right? But this isn't so different than writing something like U(9cm,11cm) or N(10cm,0.6cm). In fact, you'd be even more correct if you wrote down your answer distributionally![1] It gives us much more information!
So honestly I'd love to see a cultural shift in our nerd world: more appreciation for probabilities and randomness. While motivated by being more right, it opens the door to a lot of new and powerful ways of thinking. You have to constantly estimate your confidence levels and challenge yourself. You can no longer read data as absolute; instead you read it as existing with noise. You no longer take measurements with absolute confidence, because you're forced to understand that every measurement is a proxy for what you want to measure.

These concepts are paradigm shifting in how one thinks about the world. They will help you be more right, they will help you solve greater challenges, and at the end of the day, when people are on the same page it makes it easier to communicate. Because it's no longer about being right or wrong, it's about being more or less right. You're always wrong to some degree, so it never really hurts when someone points out something you hadn't considered. There's no ego to protect, just updating your priors. Okay, maybe that last one is a little too far lol.

But I absolutely love this space and I just want to share that with others. There's just a lot of mind-opening stuff to be learned from this field of math (and others), especially as you get into measure theory. Even if you never run the numbers or write the equations, there are still really powerful lessons to learn that can be used in your day-to-day life. Math, at the end of the day, is about abstraction and representation. As programmers, I think we've all experienced how powerful these tools are.
[0] https://en.wikipedia.org/wiki/Significant_figures
[1] Technically 10cm ± 1cm is going to be Uniform(9cm,11cm) but realistically that variance isn't going to be uniformly distributed and much more likely to be normal-like. You definitely have a bias towards the actual mark, right?! (Usually we understand ± through context. I'm not trying to be super precise here and instead focusing on the big picture. Please dig in more if you're interested and please leave more nuance if you want to expand on this, but let's also make sure big picture is understood before we add complexity :)
Obligatory 3Blue1Brown reference
Edit: OP confirms there's no AI-generated code, so do ignore me.
The code style - and in particular the *comments - indicate most of the code was written by AI. My apologies if you are not trying to hide this fact, but it seems like common decency to label that you're heavily using AI?
*Comments like this: "# Anonymous function"
There is an analogue of the CLT for extreme values: the Fisher–Tippett–Gnedenko theorem. If the properly normalized maximum of an i.i.d. sample converges, the limit must be a Gumbel, Fréchet, or Weibull distribution, unified as the Generalized Extreme Value distribution. Unlike the CLT, whose assumptions (in my experience) rarely hold in practice, this result is extremely general and underpins methods like wavelet thresholding and signal denoising. It's easy to demonstrate with a quick simulation.
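In that spirit, a quick simulation sketch (mine, assuming numpy/scipy): block maxima of i.i.d. Exp(1) draws, shifted by log(block size), should line up with a standard Gumbel.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    block_size, n_blocks = 1_000, 10_000

    # Maximum of each block of Exp(1) draws, shifted by log(block_size),
    # which is the standard normalization in the exponential case.
    maxima = rng.exponential(1.0, size=(n_blocks, block_size)).max(axis=1)
    shifted = maxima - np.log(block_size)

    for q in (0.1, 0.5, 0.9):
        print(f"q={q:.1f}  empirical={np.quantile(shifted, q):+.3f}  "
              f"Gumbel={stats.gumbel_r.ppf(q):+.3f}")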