The protocol requires FASTQ files named like: *001_R1.fastq.gz
001 is the replicate number; it can be 002, 003, and so on, for however many replicates you have. (For RNA-seq, sequence as many biological samples as possible!)
R1 marks read 1 of a paired-end run; it can also be R2.
What he has is something like:
1_egg_r1_01_sub.fastq.gz
1 is the stage of the egg. He sequenced 4 eggs, so he has 1_egg, 2_egg, 3_egg and 4_egg.
r1 is paired-end read 1.
01 is the first replicate. He has two replicates for each egg.
Basically, he wants to rename these files to the khmer convention.
This problem comes down to writing a regular expression.
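Before touching any files, the pattern can be tried on a single name. The following is a minimal sketch using bash's built-in `[[ =~ ]]` matching. The function name `to_khmer_name` is made up for illustration, the optional underscore before `egg` is an assumption so that both `1egg_...` and `1_egg_...` spellings are accepted, and the output layout follows the renamed files listed at the end of this post.

```shell
#!/usr/bin/env bash
# to_khmer_name is a hypothetical helper, not part of khmer.
# It maps e.g. 1_egg_r1_02_sub.fastq.gz to 1egg_R1_002.fastq.gz
to_khmer_name() {
    local old=$1
    # group 1 = egg stage, group 2 = read number, group 3 = replicate
    local re='^([0-9]+)_?egg_r([12])_0([0-9])_sub\.fastq\.gz$'
    if [[ $old =~ $re ]]; then
        printf '%segg_R%s_00%s.fastq.gz\n' \
            "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
    else
        # Print names that do not match unchanged and signal failure.
        printf '%s\n' "$old"
        return 1
    fi
}

to_khmer_name 1_egg_r1_02_sub.fastq.gz
```

Keeping the regular expression in a variable avoids quoting surprises on the right-hand side of `=~`.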
To reproduce the problem, I made some dummy files:
mkdir foo && cd foo
I have a text file that contains the file names:
foo$ cat files.txt
1egg_r1_01_sub.fastq.gz
1egg_r2_01_sub.fastq.gz
1egg_r1_02_sub.fastq.gz
1egg_r2_02_sub.fastq.gz
2egg_r1_01_sub.fastq.gz
2egg_r2_01_sub.fastq.gz
2egg_r1_02_sub.fastq.gz
2egg_r2_02_sub.fastq.gz
3egg_r1_01_sub.fastq.gz
3egg_r2_01_sub.fastq.gz
3egg_r1_02_sub.fastq.gz
3egg_r2_02_sub.fastq.gz
4egg_r1_01_sub.fastq.gz
4egg_r2_01_sub.fastq.gz
4egg_r1_02_sub.fastq.gz
4egg_r2_02_sub.fastq.gz
Now I want to make dummy files with the names in this file.
One can also make the dummy files on the fly.
=====update on 08/26/14======
One can use {} (brace) expansion to create the dummy files:
tommy@tommy-ThinkPad-T420[foo] touch {1,2,3,4}egg_r{1,2}_0{1,2}_sub.fastq.gz
tommy@tommy-ThinkPad-T420[foo] ls
1egg_r1_01_sub.fastq.gz  2egg_r2_01_sub.fastq.gz  4egg_r1_01_sub.fastq.gz
1egg_r1_02_sub.fastq.gz  2egg_r2_02_sub.fastq.gz  4egg_r1_02_sub.fastq.gz
1egg_r2_01_sub.fastq.gz  3egg_r1_01_sub.fastq.gz  4egg_r2_01_sub.fastq.gz
1egg_r2_02_sub.fastq.gz  3egg_r1_02_sub.fastq.gz  4egg_r2_02_sub.fastq.gz
2egg_r1_01_sub.fastq.gz  3egg_r2_01_sub.fastq.gz
2egg_r1_02_sub.fastq.gz  3egg_r2_02_sub.fastq.gz
========================
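The same kind of expansion can also write the name list itself, so files.txt does not have to be typed by hand. A small sketch, assuming bash (brace expansion is not available in plain sh) and a scratch directory made with mktemp; the names are the `1egg_...` spellings from files.txt, although the expansion emits them in a different order.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# printf reuses its format once per argument: one name per line.
printf '%s\n' {1..4}egg_r{1,2}_0{1,2}_sub.fastq.gz > files.txt

# Create the empty files from the list, then count them.
while IFS= read -r name; do
    touch "$name"
done < files.txt
ls *_sub.fastq.gz | wc -l
```

With 4 stages, 2 reads and 2 replicates, the count should come out as 16.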
make_dummy_file.sh:
#!/usr/bin/env bash
# Create one empty file per name listed in the file given as the first argument.
while IFS= read -r name
do
    echo "Name read from file - $name"
    touch "$name"
done < "$1"
make_dummy_file_1.sh:
# Same loop, but the list file name is hard-coded.
while IFS= read -r name
do
    echo "Name read from file - $name"
    touch "$name"
done < files.txt
make_dummy_file_2.sh:
# i = egg stage, j = read number, k = replicate
for i in 1 2 3 4
do
    for j in 1 2
    do
        for k in 1 2
        do
            touch "${i}_egg_r${j}_0${k}_sub.fastq.gz"
        done
    done
done
The difference between make_dummy_file.sh and make_dummy_file_1.sh is that I specified a shebang line in make_dummy_file.sh to tell the system that it is a bash script. To invoke it (after making it executable with chmod +x make_dummy_file.sh): ./make_dummy_file.sh files.txt
In contrast, the other two have no shebang line, so they are invoked through bash explicitly: bash make_dummy_file_1.sh and bash make_dummy_file_2.sh
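How the two invocation styles differ can be checked with a throwaway script. This is only a sketch: `hello.sh` is a made-up file name, and `#!/usr/bin/env bash` is one common way to write the shebang line, assumed here rather than required.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The shebang must be the very first line and must start with "#!".
printf '%s\n' '#!/usr/bin/env bash' \
    'echo "running under bash $BASH_VERSION"' > hello.sh

# Without the execute bit, ./hello.sh fails with "Permission denied".
chmod +x hello.sh

./hello.sh        # the kernel reads the shebang and starts bash
bash hello.sh     # bash is named explicitly; the shebang is only a comment
```

Both calls run the same commands; the shebang only matters when the file is executed directly.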
Rename the files with a regular expression, using either sed or the rename command
rename.sh:
for fspec1 in *.gz
do
    #echo "$fspec1"
    # \1 = egg stage, \2 = read number (r1/r2), \3 = replicate (01/02).
    # _* also accepts names written as 1_egg_...; the dots are escaped to match literally.
    fspec2=$(echo "${fspec1}" | sed 's/\([1-4]\)_*egg_r\([1-2]\)_0\([1-2]\)_sub\.fastq\.gz/\1egg_R\2_00\3.fastq.gz/')
    echo "$fspec2"
    mv "${fspec1}" "${fspec2}"
done
rename_one_liner.sh:
rename 's/([1-4])_*egg_r([1-2])_0([1-2])_sub\.fastq\.gz/${1}egg_R${2}_00${3}.fastq.gz/' *fastq.gz
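The sed command needs the parentheses escaped as \( and \): sed uses basic regular expressions by default, where only escaped parentheses capture a group for back-references. The rename command used here is the Perl-based one (as shipped on Debian/Ubuntu); it takes a Perl substitution, so its parentheses are not escaped.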
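To see the escaping difference side by side, both sed dialects can be run on one name piped in with printf, so no files are renamed. A sketch, assuming a sed that supports `-E` for extended regular expressions (GNU and BSD sed both do), and assuming the read number stays after `R` while the replicate becomes the three-digit suffix:

```shell
#!/usr/bin/env bash
name=1egg_r2_01_sub.fastq.gz

# Basic regular expressions (the default): groups are written \( ... \)
bre=$(printf '%s\n' "$name" |
    sed 's/\([1-4]egg\)_r\([12]\)_0\([12]\)_sub\.fastq\.gz/\1_R\2_00\3.fastq.gz/')

# Extended regular expressions (-E): plain ( ... ) captures a group
ere=$(printf '%s\n' "$name" |
    sed -E 's/([1-4]egg)_r([12])_0([12])_sub\.fastq\.gz/\1_R\2_00\3.fastq.gz/')

echo "$bre"
echo "$ere"
```

Both lines print 1egg_R2_001.fastq.gz; only the way the groups are written differs.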
before:
tommy@tommy-ThinkPad-T420:~/foo$ ls
1_egg_r1_01_sub.fastq.gz  2_egg_r1_01_sub.fastq.gz  3_egg_r1_01_sub.fastq.gz  4_egg_r1_01_sub.fastq.gz  copy                make_dummy_file_1.sh
1_egg_r1_02_sub.fastq.gz  2_egg_r1_02_sub.fastq.gz  3_egg_r1_02_sub.fastq.gz  4_egg_r1_02_sub.fastq.gz  dummy               make_dummy_file_2.sh
1_egg_r2_01_sub.fastq.gz  2_egg_r2_01_sub.fastq.gz  3_egg_r2_01_sub.fastq.gz  4_egg_r2_01_sub.fastq.gz  files.txt           rename.sh
1_egg_r2_02_sub.fastq.gz  2_egg_r2_02_sub.fastq.gz  3_egg_r2_02_sub.fastq.gz  4_egg_r2_02_sub.fastq.gz  make_dummy_file.sh  rename_one_liner.sh
after:
tommy@tommy-ThinkPad-T420:~/foo$ ls
1egg_R1_001.fastq.gz  2egg_R1_001.fastq.gz  3egg_R1_001.fastq.gz  4egg_R1_001.fastq.gz  copy                make_dummy_file_1.sh
1egg_R1_002.fastq.gz  2egg_R1_002.fastq.gz  3egg_R1_002.fastq.gz  4egg_R1_002.fastq.gz  dummy               make_dummy_file_2.sh
1egg_R2_001.fastq.gz  2egg_R2_001.fastq.gz  3egg_R2_001.fastq.gz  4egg_R2_001.fastq.gz  files.txt           rename.sh
1egg_R2_002.fastq.gz  2egg_R2_002.fastq.gz  3egg_R2_002.fastq.gz  4egg_R2_002.fastq.gz  make_dummy_file.sh  rename_one_liner.sh
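Because mv overwrites an existing target without asking, it can be worth previewing the mapping first. Below is a hedged sketch of a dry-run wrapper. `rename_eggs` and its `-n` flag are invented for this example, and the sed pattern is an assumed variant of the one in rename.sh that keeps the read number after `R` and accepts an optional underscore before `egg`.

```shell
#!/usr/bin/env bash
# rename_eggs [-n]: rename *_sub.fastq.gz files in the current directory.
# With -n (a hypothetical flag), only print what would be done.
rename_eggs() {
    local dry_run=0 old new
    [ "${1:-}" = "-n" ] && dry_run=1
    for old in *_sub.fastq.gz; do
        [ -e "$old" ] || continue          # the glob matched nothing
        new=$(printf '%s\n' "$old" |
            sed 's/\([1-4]\)_*egg_r\([12]\)_0\([12]\)_sub\.fastq\.gz/\1egg_R\2_00\3.fastq.gz/')
        [ "$new" = "$old" ] && continue    # the pattern did not match
        if [ -e "$new" ]; then
            echo "skip $old: $new already exists" >&2
            continue
        fi
        echo "$old -> $new"
        [ "$dry_run" -eq 1 ] || mv -- "$old" "$new"
    done
}
```

Running `rename_eggs -n` first prints the planned `old -> new` pairs; running `rename_eggs` afterwards performs the same renames.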
References:
http://stackoverflow.com/questions/399078/what-special-characters-must-be-escaped-in-regular-expressions
http://stackoverflow.com/questions/10929453/bash-scripting-read-file-line-by-line
https://www.cs.tut.fi/~jkorpela/perl/regexp.html