Creative Commons License
This blog by Tommy Tang is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

My GitHub page

Sunday, March 22, 2015

benchmarking for shuf vs fast_sample

Sometimes we want to randomly select a proportion of the lines from a text file. The easiest way is to use the Unix command shuf. On a Mac, you can install it with Homebrew:
brew install coreutils
Note that Homebrew installs it with a g prefix, so you need to invoke it as gshuf. See https://www.topbug.net/blog/2013/04/14/install-and-use-gnu-command-line-tools-in-mac-os-x/
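As a rough illustration of sampling a proportion of lines, here is a minimal sketch that draws 10% of a file with the -n option of shuf. The file names input.txt and sample.txt and the 10% fraction are assumptions made for the example, not part of the original benchmark.

```shell
# Use whichever name GNU shuf is installed under:
# shuf on Linux, gshuf when installed through Homebrew coreutils.
SHUF=$(command -v shuf || command -v gshuf)

# Build a small hypothetical input file with 1000 numbered lines.
seq 1 1000 > input.txt

# Count the lines and compute 10% of them with integer arithmetic.
total=$(wc -l < input.txt)
n=$((total / 10))

# Draw n distinct lines at random (sampling without replacement).
"$SHUF" -n "$n" input.txt > sample.txt

# Report how many lines were sampled.
wc -l < sample.txt
```

Because shuf is told an exact number of lines rather than a fraction, the line count is computed first. Prefixing the shuf command with time is one simple way to get a wall-clock measurement.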
I also came across a tool called fast_sample that does the same thing:
https://github.com/earino/fast_sample
I benchmarked the two tools against each other.

The take-home message from the benchmarking is that standard Unix tools are sometimes faster and more memory-efficient than tools you write yourself.
