I have been using Unix since 2013, yet I still learn Unix tricks almost every day. The most powerful commands I have learned so far are find, xargs, and parallel.
Please check the GNU parallel page for documentation. GNU parallel has changed the way I do repetitive work: I now use fewer and fewer for loops.
Use case 1: I have 100 folders whose names start with H3K4me3, and inside each folder there are 5 .gz files that I want to cat together. The usual way to do it is a for loop:
#!/bin/bash
for dir in H3K4me3*/
do
    dir=${dir%/}                # drop the trailing "/" that the glob adds
    cd "$dir" || continue       # skip any folder we cannot enter
    cat *H3K4me3.bed.gz > "${dir}_merged.gz"
    cd ..
done
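Note that cat works directly on .gz files: concatenating gzip files produces a valid gzip file that decompresses to the concatenated contents.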
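This works because the gzip format allows a file to consist of several compressed members back to back, and gzip -d decompresses them in order. A minimal sketch in a scratch directory (the file names and BED records are made up for illustration):

```shell
#!/bin/bash
# Show that concatenated .gz files form one valid gzip stream.
set -euo pipefail

tmp=$(mktemp -d)               # scratch directory so nothing real is touched
cd "$tmp"

# Two tiny, hypothetical BED files, compressed separately
printf 'chr1\t100\t200\n' | gzip > rep1_H3K4me3.bed.gz
printf 'chr2\t300\t400\n' | gzip > rep2_H3K4me3.bed.gz

# Plain cat, no decompression needed
cat rep1_H3K4me3.bed.gz rep2_H3K4me3.bed.gz > merged.bed.gz

gzip -t merged.bed.gz          # integrity check succeeds on the multi-member file
gzip -dc merged.bed.gz         # prints both records, rep1 first
```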
The parallel way:
ls -d H3K4me3* | parallel 'find {} -name "*H3K4me3*bed.gz" | xargs cat > {}_H3K4me3.bed.gz'
Using parallel, each folder becomes a separate job, so I can take full advantage of the multi-core nodes on the computing cluster, and the whole task finishes much faster.
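If GNU parallel is not installed on a machine, xargs -P gives a similar one-job-per-folder fan-out, though without parallel's extras such as --noswap. The sketch below is a stand-in using xargs, not the parallel command above; it builds a hypothetical toy layout (three H3K4me3_* folders with two files each) in a scratch directory and merges each folder with at most four jobs running at once.

```shell
#!/bin/bash
# Stand-in for the parallel one-liner using xargs -P (not GNU parallel).
set -euo pipefail

tmp=$(mktemp -d)
cd "$tmp"

# Hypothetical toy layout: three folders, two .bed.gz files in each
for d in H3K4me3_a H3K4me3_b H3K4me3_c; do
    mkdir "$d"
    for i in 1 2; do
        printf '%s\t%s\n' "$d" "$i" | gzip > "$d/rep${i}_H3K4me3.bed.gz"
    done
done

# One folder per job (NUL-separated names); -P 4 caps concurrency, much like parallel -j 4.
# Each job strips the trailing "/" and writes <folder>_H3K4me3.bed.gz next to the folder.
printf '%s\0' H3K4me3*/ | xargs -0 -n 1 -P 4 sh -c '
    dir=${1%/}
    cat "$dir"/*H3K4me3.bed.gz > "${dir}_H3K4me3.bed.gz"
' sh
```

Writing the merged file next to the folder, rather than inside it, keeps a rerun from picking up its own output.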
Use case 2: I have 100 folders (50 whose names start with H3K4me3 and 50 whose names start with H3K4me), and each folder has multiple levels of sub-folders. I want to delete the bam files in the 50 folders whose names start with H3K4me3, but I do not know which sub-folders the bam files are in.
I do not really know a way to do it without using find. My solution would be:
ls -d H3K4me3* | parallel 'find {} -name "*bam"' | parallel rm {}
Piping into two parallel commands is the magic of this solution. Unix commands are elegant and efficient!
Edit on 04/04/2016:
With great power comes great responsibility. When you have too many files to process, it is a good idea to limit parallel to a fixed number of simultaneous jobs with -j, and to stop it from starting new jobs while the machine is swapping with --noswap.
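Because the use case 2 pipeline deletes files, it can be worth previewing the matches first. The sketch below swaps in a single find over all H3K4me3* folders instead of the two parallel stages, with -type f so only regular files match (the approach discussed in the comments). It runs on a hypothetical toy tree in a scratch directory, including an H3K4me1 folder that must be left alone.

```shell
#!/bin/bash
# Preview, then delete, .bam files under H3K4me3* folders only (plain find, no parallel).
set -euo pipefail

tmp=$(mktemp -d)
cd "$tmp"

# Hypothetical toy tree: bam files at different depths, plus an H3K4me1 folder to keep
mkdir -p H3K4me3_a/x/y H3K4me3_b H3K4me1_a/x
touch H3K4me3_a/x/y/s1.bam H3K4me3_b/s2.bam H3K4me3_b/s2.bed H3K4me1_a/x/s3.bam

# Dry run: list what would be removed
find H3K4me3* -type f -name '*bam' -print

# Same expression with -delete once the list looks right
find H3K4me3* -type f -name '*bam' -delete
```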
Nice intro to parallel...
In your bash script in use case #1, you're missing a 'cd ..' in the loop it seems.
For use case #2, could your 'double parallel' be replaced with this?
find H3K4me3* -name "*bam" -delete
Nico Stransky
Thx Nico, I edited #1 accordingly. For case #2, it should be find H3K4me3* -name "*bam" -exec rm -rf {} \;
You are correct in the case of directories. For simple files, '-delete' works.
Nico
Because you have -name "*bam" in the command, I assumed you were only looking to delete files. -delete will work in that case. To ensure that 'find' only returns files, you can add '-type f'.
Nico