Creative Commons License
This blog by Tommy Tang is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

My github papge

Tuesday, March 18, 2014

qseq to fastq conversion

I was looking at some old ChIP-seq data with raw file in qseq format. I want to convert them to fastq file for mapping with bowtie.

A quick google I found:
http://www.biostars.org/p/6682/

the qseqtofastq C++ program http://www.dna.bio.keio.ac.jp/~krisp/qseq2fastq/ has to be compiled by scons, and I encountered some compiling problem. I gave it up and used this java program http://sourceforge.net/projects/snpeff/files/qseq2fastq.jar/download

Usage:

zcat myfile.qseq.tgz | java -jar qseqtofastq.jar -phred64 > myfile.fastq

it took me around 1 hour to finish the conversion of a 600MB tgz file on the computing cluster (single cpu 4GB ram).

I did have a problem at the end:

Exception in thread "main" java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 8
        at ca.mcgill.mcb.pcingola.Qseq2Fastq.main(Qseq2Fastq.java:51)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 8
        at ca.mcgill.mcb.pcingola.Qseq2Fastq.main(Qseq2Fastq.java:43)


I counted the lines of the fastq file (divided by 4) and the original qseq file  , and they are equal.  So, I went ahead and mapped the fastq file with bowtie.



Wednesday, March 12, 2014

Several NGS bioinformatics training materials

1.  Next Generation sequencing wiki book http://en.wikibooks.org/wiki/Next_Generation_Sequencing_(NGS)

2. UCDavis training course https://training.bioinformatics.ucdavis.edu/documentation/

3. MSU training course by Titus Brown http://ged.msu.edu/angus/index.html

4. GOBLET http://www.mygoblet.org/training-portal

5.  more http://ged.msu.edu/angus/bioinformatics-courses.html

6. a list maintained by Stephen Turner in UVA http://stephenturner.us/edu.html

7. GVL practial protocols for ChIP-seq, RNA-seq, variant calling etc https://genome.edu.au/wiki/GVL

These will keep me busy for a while...I feel I do not have enough time to learn :)

Monday, March 3, 2014

Unix sort except the first line

Many times I have a file with a header  needs  to be sorted, but I do not want to sort the header.
I saw it on Twitter:

 Retweeted by 
todays fav linux oneliner: command | (read -r; printf "%s\n" "$REPLY"; sort) > output sort everything except first line.

REPLY is the default variable for the read command.

one more general use of the "body" function
http://stackoverflow.com/questions/9281449/unix-skip-header-bash-function
http://unix.stackexchange.com/questions/11856/sort-but-keep-header-line-in-the-at-the-top

# print the header (the first line of input)
# and then run the specified command on the body (the rest of the input)
# use it in a pipeline, e.g. ps | body grep somepattern
body() {
    IFS= read -r header
    printf '%s\n' "$header"
    "$@"
}
IFS is the bash Shell Bourne variable: A list of characters that separate fields used by the shell to split text strings.

$#    Stores the number of command-line arguments that 
      were passed to the shell program.
$?    Stores the exit value of the last command that was 
      executed.
$0    Stores the first word of the entered command (the 
      name of the shell program).
$*    Stores all the arguments that were entered on the
      command line ($1 $2 ...).
"$@"  Stores all the arguments that were entered
      on the command line, individually quoted ("$1" "$2" ...).

Or an awk solution awk 'NR == 1; NR > 1 {print $0 | "sort -n"}'
It is very handy to use.

Test:
tommy@tommy-ThinkPad-T420:~$ cat body_sort.txt 
header
4
6
9
7
14
8

tommy@tommy-ThinkPad-T420:~$ cat body_sort.txt |( read -r; printf "%s\n" "$REPLY"; sort -n)
header
4
6
7
8
9
14