You can find a `body` function here, but it only accepts streams and assumes a one-line header.
Subshells and awk can do the same job and are potentially more flexible.
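For context, a `body` helper of this kind is commonly written along the following lines. This is a minimal sketch and not necessarily the exact function referred to above; it assumes bash (or another POSIX shell), input on stdin, and exactly one header line. The sample input is made up.

```shell
# body: print the first line of stdin unchanged, then run the given
# command on the remaining lines. Assumes exactly one header line.
body() {
    IFS= read -r header
    printf '%s\n' "$header"
    "$@"
}

# Example: the header stays on top, the data rows are sorted numerically.
printf 'value\n3\n1\n2\n' | body sort -n
```

Because `read` consumes only the first line, it works on pipes, but it cannot keep more than one header line without changes.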
## imagine we have a file with one line header, and we want to keep the header after sorting
## use subshells http://bash.cyberciti.biz/guide/What_is_a_Subshell%3F
(sed -n '1p' your_file; cat your_file | sed '1d' | sort) > sort_header.txt
## if you have two header lines and want to keep both of them:
(sed -n '1,2p' your_file; cat your_file | sed '1,2d' | sort) > sort_header.txt
## if you have many lines starting with "#" as header, like vcf files
(grep "^#" my_vcf; grep -v "^#" my_vcf | sort -k1,1V -k2,2n) > sorted.vcf
## one can also use awk
cat my_vcf | awk '$0~"^#" { print $0; next } { print $0 | "LC_ALL=C sort -k1,1V -k2,2n" }'
## I am a useless cat user :) http://stackoverflow.com/questions/11710552/useless-use-of-cat
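The one-line and two-line header cases above can be folded into one helper that avoids `cat`. This is only a sketch: `sort_keep_header` is a hypothetical name, it needs a regular file rather than a stream (the file is read twice), and the demo table is made up.

```shell
# sort_keep_header FILE [N]: print the first N lines (default 1) of FILE
# unchanged, then the remaining lines sorted. Hypothetical helper name.
sort_keep_header() {
    file=$1
    n=${2:-1}
    head -n "$n" "$file"
    tail -n "+$((n + 1))" "$file" | sort
}

# Example with a made-up two-column table and a one-line header.
demo=$(mktemp)
printf 'id\tname\n3\tc\n1\ta\n2\tb\n' > "$demo"
sort_keep_header "$demo"
rm -f "$demo"
```

Passing `2` as the second argument keeps two header lines, matching the `sed -n '1,2p'` variant.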
The original credit goes to Aaron Quinlan. See the gist below for sorting VCF files in natural chromosome order (chr1, chr2, chr3, ...) rather than lexical order (chr1, chr10, chr11, ...).
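To see what natural order means in practice, the sketch below compares a plain sort with the `-k1,1V -k2,2n` keys used in the scripts. It assumes a `sort` that supports `-V` (GNU coreutils); `natural_sort` is a hypothetical wrapper name and the positions are made up.

```shell
# natural_sort: version-sort column 1 (chromosome), then sort column 2
# (position) numerically. Needs a sort that supports -V.
natural_sort() {
    LC_ALL=C sort -k1,1V -k2,2n
}

# Plain lexical sort puts chr10 before chr2.
printf 'chr10\t500\nchr2\t300\nchr2\t40\nchr1\t7\n' | LC_ALL=C sort

# Natural order: chr1, chr2 (40 before 300), then chr10.
printf 'chr10\t500\nchr2\t300\nchr2\t40\nchr1\t7\n' | natural_sort
```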
chmod a+x vcfsort.sh
vcfsort.sh trio.trim.vep.vcf.gz
sort VCF and keep (only the first) header as-is:
awk 'BEGIN{x=0;} $0 ~/^#/{ if(x==0) {print;} next}{x=1; print $0 | "sort -k1,1 -k2,2n"}'
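Keeping only the first header matters when several VCFs have been concatenated, since every later `#` line would otherwise end up in the output. The sketch below wraps the same idea in a function and runs it on made-up input; `sort_vcf_first_header` is a hypothetical name, and `LC_ALL=C` is added here only to make the ordering reproducible.

```shell
# sort_vcf_first_header: read a VCF on stdin, print the header lines that
# come before any data line, drop later "#" lines, and sort the data.
sort_vcf_first_header() {
    awk '
        /^#/ { if (!seen_data) print; next }
        { seen_data = 1; print | "LC_ALL=C sort -k1,1 -k2,2n" }
    '
}

# Made-up input: two tiny VCFs concatenated, each with its own header.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\nchr2\t300\nchr1\t7\n##fileformat=VCFv4.2\n#CHROM\tPOS\nchr1\t5\n' |
    sort_vcf_first_header
```

The second header block is discarded and the three data lines come out as one sorted block under the first header.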
#!/bin/bash
# Faster, but can't handle streams
[ $# -eq 0 ] && {
    echo "Sorts a VCF file in natural chromosome order"
    echo "Usage: $0 [my.vcf | my.vcf.gz]"
    exit 1
}
# cheers, @michaelhoffman
# Print the header lines first, then the sorted records (the file is read twice).
if (zless "$1" | grep '^#'; zless "$1" | grep -v '^#' | LC_ALL=C sort -k1,1V -k2,2n)
then
    exit 0
else
    printf 'sort failed. Does your version of sort support the -V option?\n'
    printf 'If not, you should update sort with the latest from GNU coreutils:\n'
    printf 'git clone git://git.sv.gnu.org/coreutils\n'
    exit 1
fi
#!/bin/bash
# Slower, but handles streams.
# Show usage only when there is no file argument and nothing is piped in.
if [ $# -eq 0 ] && [ -t 0 ]; then
    echo "Sorts a VCF file in natural chromosome order"
    echo "Usage: $0 [my.vcf | my.vcf.gz]"
    echo "Usage: cat [my.vcf | my.vcf.gz] | $0"
    exit 1
fi
# cheers, @colbychiang
# gzip -cdf decompresses gzipped input and passes plain text through unchanged.
if gzip -cdf -- "$@" | awk -v cmd="LC_ALL=C sort -k1,1V -k2,2n" '
    /^#/ { print; next }                  # header lines go straight to stdout
    { started = 1; print | cmd }          # records are piped through sort
    END { if (started && close(cmd) != 0) exit 1 }'
then
    exit 0
else
    printf 'sort failed. Does your version of sort support the -V option?\n'
    printf 'If not, you should update sort with the latest from GNU coreutils:\n'
    printf 'git clone git://git.sv.gnu.org/coreutils\n'
    exit 1
fi
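Both scripts only find out that `-V` is unsupported after the sort has already failed. A cheap up-front check could look like the sketch below; `sort_supports_version_sort` is a hypothetical helper that is not part of the scripts above, and it simply tests whether the local `sort` accepts the option.

```shell
# sort_supports_version_sort: succeed if the local sort accepts -V.
# The pipeline status is the exit status of sort.
sort_supports_version_sort() {
    printf 'chr2\nchr10\n' | sort -V >/dev/null 2>&1
}

if sort_supports_version_sort; then
    echo "sort -V is available"
else
    echo "sort -V is not available; consider updating to GNU coreutils sort" >&2
fi
```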