Sunday, March 22, 2015

Benchmarking shuf vs fast_sample

Sometimes we want to randomly select a proportion of the lines from a text file. The easiest way is to use the Unix command shuf (part of GNU coreutils). On a Mac, you can install it with Homebrew:
brew install coreutils
Homebrew installs the coreutils commands with a g prefix, so you need to invoke it as gshuf.
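As a rough sketch of how a proportion (rather than a fixed count) of lines could be sampled: shuf's -n option takes a number of lines, so the count has to be worked out first from wc -l. The function name sample_lines and its percent argument are made up for illustration and are not part of coreutils; the SHUF variable simply falls back to gshuf when plain shuf is not on the PATH.

```shell
# Pick GNU shuf: plain `shuf` on Linux, `gshuf` from Homebrew coreutils on a Mac.
if command -v shuf >/dev/null 2>&1; then
    SHUF=shuf
else
    SHUF=gshuf
fi

# sample_lines FILE PERCENT: print a random PERCENT% of the lines in FILE.
# (sample_lines is a hypothetical helper, not a coreutils command.)
sample_lines() {
    file=$1
    percent=$2
    # Count the lines in the file.
    total=$(wc -l < "$file")
    # Integer arithmetic, so the number of lines is rounded down.
    n=$(( total * percent / 100 ))
    # -n limits the output to n randomly chosen lines, without repeats.
    "$SHUF" -n "$n" "$file"
}
```

For example, sample_lines input.txt 10 > sample.txt would write roughly 10% of the lines of a (hypothetical) input.txt to sample.txt, in random order.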
I also came across a tool called fast_sample that can do the same thing, so I benchmarked the two.

The take-home message from the benchmarking is that standard Unix tools are sometimes faster and more memory-efficient than tools you write yourself.

