
[Image source]
We consider the problem of picking a random sample of a given size k from a large dataset of some unknown size n. The hidden assumption here is that n is large enough that the whole dataset does not fit into main memory, whereas the desired sample does....