Creative Commons License
This blog by Tommy Tang is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Monday, October 7, 2013

choose random lines from a file

I have a BED file that contains promoter regions, and I want to pick 1000 random promoters from it. My first thought was to write a Python script using the random module, but since this is such a common task, I did a quick Google search first to see whether a tool for it already exists.

http://stackoverflow.com/questions/9245638/select-random-lines-from-a-file-in-bash
http://stackoverflow.com/questions/448005/whats-an-easy-way-to-read-random-line-from-a-file-in-unix-command-line

I like the sort -R solution.
Sort the file randomly and pick the first 100 lines (change 100 to however many lines you need):
$ sort -R input | head -n 100 > output

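However, sort -R has to sort the entire file just to keep a few lines, so it gets slow on very large files. Note also that GNU sort -R sorts by a random hash of the keys, so identical lines end up next to each other. The promoter file only contains 50k lines, so this is not a big deal here.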

It turns out that shuf, which ships with GNU coreutils, is made for exactly this kind of task.
Use shuf as shown below to get N random lines:
shuf -n N input > output
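
To make this concrete for the promoter case, here is a small sketch. The file names promoters.bed and random_promoters.bed and the fake intervals are made up for illustration; only shuf -n comes from the command above.

```shell
# Build a stand-in BED file with 50,000 fake promoter intervals.
# (File name and coordinates are invented for this example.)
seq 1 50000 | awk 'BEGIN { OFS = "\t" } { print "chr1", $1 * 1000, $1 * 1000 + 500, "promoter_" $1 }' > promoters.bed

# Pick 1000 lines at random, without replacement.
shuf -n 1000 promoters.bed > random_promoters.bed

# Sanity checks: the line count, and the count of distinct lines.
wc -l < random_promoters.bed
sort -u random_promoters.bed | wc -l
```

Because shuf samples without replacement, both counts should be the same as long as the input has no duplicate lines. The output order is random, so pipe it through sort -k1,1 -k2,2n if a downstream tool expects a sorted BED file.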

Of course, there are other solutions using awk and sed, but why not use the simple one :)
Linux is awesome!
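
For the curious, one of the awk alternatives could look roughly like the sketch below. It uses reservoir sampling, which is my choice for illustration and not necessarily what the linked answers do. The file names and the value of k are assumptions. It can be handy on a system without shuf, since it reads the file once and only keeps k lines in memory.

```shell
# Stand-in input: 50,000 numbered lines (made up for this example).
seq 1 50000 > input.txt

# Reservoir sampling: keep the first k lines, then let line NR
# replace a random slot with probability k/NR.
# srand() seeds from the clock, so two runs within the same second
# produce the same sample.
awk -v k=1000 '
BEGIN { srand() }
NR <= k { sample[NR] = $0; next }
{
    i = int(rand() * NR) + 1      # uniform integer in 1..NR
    if (i <= k) sample[i] = $0
}
END {
    n = (NR < k) ? NR : k         # handle files shorter than k lines
    for (j = 1; j <= n; j++) print sample[j]
}' input.txt > sample.txt

wc -l < sample.txt
```

Every line ends up in the sample with the same probability, but the output is not in random order, so this is a sampler and not a shuffler.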

By the way, the random command from bedtools (http://bedtools.readthedocs.org/en/latest/content/tools/random.html) generates random BED intervals from a genome instead of sampling existing lines. If that's what you want, use it.


