Creative Commons License
This blog by Tommy Tang is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

My github papge

Tuesday, March 18, 2014

qseq to fastq conversion

I was looking at some old ChIP-seq data with raw file in qseq format. I want to convert them to fastq file for mapping with bowtie.

A quick google I found:

the qseqtofastq C++ program has to be compiled by scons, and I encountered some compiling problem. I gave it up and used this java program


zcat myfile.qseq.tgz | java -jar qseqtofastq.jar -phred64 > myfile.fastq

it took me around 1 hour to finish the conversion of a 600MB tgz file on the computing cluster (single cpu 4GB ram).

I did have a problem at the end:

Exception in thread "main" java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 8
        at ca.mcgill.mcb.pcingola.Qseq2Fastq.main(
Caused by: java.lang.ArrayIndexOutOfBoundsException: 8
        at ca.mcgill.mcb.pcingola.Qseq2Fastq.main(

I counted the lines of the fastq file (divided by 4) and the original qseq file  , and they are equal.  So, I went ahead and mapped the fastq file with bowtie.

Wednesday, March 12, 2014

Several NGS bioinformatics training materials

1.  Next Generation sequencing wiki book

2. UCDavis training course

3. MSU training course by Titus Brown


5.  more

6. a list maintained by Stephen Turner in UVA

7. GVL practial protocols for ChIP-seq, RNA-seq, variant calling etc

These will keep me busy for a while...I feel I do not have enough time to learn :)

Monday, March 3, 2014

Unix sort except the first line

Many times I have a file with a header  needs  to be sorted, but I do not want to sort the header.
I saw it on Twitter:

 Retweeted by 
todays fav linux oneliner: command | (read -r; printf "%s\n" "$REPLY"; sort) > output sort everything except first line.

REPLY is the default variable for the read command.

one more general use of the "body" function

# print the header (the first line of input)
# and then run the specified command on the body (the rest of the input)
# use it in a pipeline, e.g. ps | body grep somepattern
body() {
    IFS= read -r header
    printf '%s\n' "$header"
IFS is the bash Shell Bourne variable: A list of characters that separate fields used by the shell to split text strings.

$#    Stores the number of command-line arguments that 
      were passed to the shell program.
$?    Stores the exit value of the last command that was 
$0    Stores the first word of the entered command (the 
      name of the shell program).
$*    Stores all the arguments that were entered on the
      command line ($1 $2 ...).
"$@"  Stores all the arguments that were entered
      on the command line, individually quoted ("$1" "$2" ...).

Or an awk solution awk 'NR == 1; NR > 1 {print $0 | "sort -n"}'
It is very handy to use.

tommy@tommy-ThinkPad-T420:~$ cat body_sort.txt 

tommy@tommy-ThinkPad-T420:~$ cat body_sort.txt |( read -r; printf "%s\n" "$REPLY"; sort -n)