Creative Commons License
This blog by Tommy Tang is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


Sunday, March 22, 2015

benchmarking for shuf vs fast_sample

Sometimes we want to randomly select a proportion of the lines in a text file. The easiest way is the shuf command from GNU coreutils. On a Mac, you can install it with Homebrew:
brew install coreutils
Homebrew installs the GNU tools with a g prefix, so you need to invoke it as gshuf. See https://www.topbug.net/blog/2013/04/14/install-and-use-gnu-command-line-tools-in-mac-os-x/
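As a minimal sketch of the sampling itself: shuf takes a number of lines with -n rather than a proportion, so one way to get a fraction is to count the lines first. The file names input.txt and sample.txt and the 10% fraction are made up for illustration.

```shell
#!/usr/bin/env bash
# Use gshuf where Homebrew installed it (macOS), otherwise plain shuf (Linux).
SHUF=$(command -v gshuf || command -v shuf)

# Hypothetical input: 1000 numbered lines standing in for a real text file.
seq 1 1000 > input.txt

# Sample 10% of the lines: count them, then ask shuf for that many.
total=$(wc -l < input.txt)
n=$(( total / 10 ))
"$SHUF" -n "$n" input.txt > sample.txt

# Report how many lines were sampled.
wc -l < sample.txt
```

Without the -r option, shuf samples without replacement, so no input line is picked twice.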
I also came across a tool called fast_sample that does the same thing:
https://github.com/earino/fast_sample
I benchmarked the two against each other.
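A rough sketch of how such a timing could be run for the shuf side, assuming a made-up test file big.txt of one million lines. The fast_sample command line is not shown in this post, so it is left out here rather than guessed.

```shell
#!/usr/bin/env bash
# Use gshuf where Homebrew installed it (macOS), otherwise plain shuf (Linux).
SHUF=$(command -v gshuf || command -v shuf)

# Hypothetical test file: one million numbered lines.
seq 1 1000000 > big.txt

# bash's time keyword prints real, user and sys time to stderr.
time "$SHUF" -n 100000 big.txt > shuf_sample.txt

# For peak memory, the external time program can be used instead:
#   /usr/bin/time -v ...   (GNU time on Linux)
#   /usr/bin/time -l ...   (BSD time on macOS)
```

Running the other tool under the same time wrapper on the same file gives numbers that can be compared directly.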

The take-home message from the benchmarking is that standard Unix tools are sometimes faster and more memory-efficient than tools you write yourself.
