Reservoir sampling is a family of randomized algorithms for randomly choosing k samples from a list of n items, where n is either a very large number. Typically n is large enough that the list doesn’t fit into main memory. For example, a list of search queries in Google and Facebook.
Question
given a big array (or stream) of numbers (to simplify), and we need to write an efficient function to randomly select k numbers where 1 <= k <= n. Let the input array be stream[].
Solution
O(n) time!
Create an array sample[0..k-1] and copy first k items of stream[] to it.
Now one by one consider all items from (k+1)th item to nth item.
Generate a random number ‘j’ from 0 to i where i is index of current item in stream[].
If j is in range 0 to k-1, replace sample[j] with stream[i]
Code
not written