Creative Commons License
This blog by Tommy Tang is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

My github papge

Thursday, June 30, 2016

How to deal with paired samples for snakemake

I am currently experimenting snakemake for pipelines. We will have paired samples when processing data. e.g. ChIP-seq (IP vs Input), DNA-seq (Normal vs Tumor). How to specify them in snakemake?
The trick is to use a python dictionary to pair samples and a function to define Input files.

I will show an toy example with ChIP-seq processing.

aDict = {"B":"inputG1", "A":"inputG1", "C":"inputG2"}
rule all:
input: ["C.bed", "A.bed", "B.bed"]
def get_files(wildcards):
case = wildcards.case
control = aDict[case]
return [case + ".sorted.bam", control + ".sorted.bam"]
rule call_peak:
input: get_files
output: "{case}.bed"
run:
case = input[0]
control = input[1]
shell("echo macs14 -t {case} -c {control} -n {wildcards.case}")
shell("touch {output}")
# read https://groups.google.com/forum/#!searchin/snakemake/dependencies/snakemake/iDnr3PIcsfE/x-qQvzWBBgAJ
# https://groups.google.com/forum/#!searchin/snakemake/dependencies/snakemake/1QelazgzilY/oBgZoP19BL4J
aDict = {"B":"inputG1", "A":"inputG1", "C":"inputG2"}
rule all:
input: ["C.bed", "A.bed", "B.bed"]
def get_files(wildcards):
case = wildcards.case
control = aDict[case]
return [case + ".sorted.bam", control + ".sorted.bam"]
rule call_peak:
input: get_files
output: "{case}.bed"
shell:
"""
echo macs14 -t {input[0]} -c {input[1]} -n {wildcards.case}
touch {output}
"""