snakemake进阶 - 扩展自定义参数进行流程控制

分支的实现

在进行snakemake分析时,所有的参数都是通过 -C 进行传递的,例如 -C S=”string” 则在smk中可以通过使用 config[‘S’] 调用相关参数完成参数传递。 -C 传递的值统一保存在字典 “config” 中。

1
2
3
4
5
6
7
-C \
# 指定一些分支的选择(例如688中是否进行某些数据库的分析),
Tequila=T \
# 指定输入文件
S=/ifstj2/B2C_RD_H1/Project/Halos_VersionUpdate_test/v1.4.4.6/4Pipe.path \
# 指定输出文件
O=/ifstj2/B2C_RD_H1/Project/Halos_VersionUpdate_test/v1.4.4.6

向流程传递参数

数据的二次加工

snakemake本身是基于python语法解析的,因此整个snakemake中是支持python语法进行数据处理的。借助这个特点,我们可以在snakemake中进行一些高级的操作来帮助我们更便捷的进行数据的处理。
比如针对输入文件,生成一些相对复杂的数据结构,实现一些复杂的功能或者业务逻辑

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
aln_bam = "{0}__{1}/{2}/3.BWA_Aln/{3}__{4}__{5}__{6}_sort.bam".format(product, sample, type, chip, lane, barcode, umi)
if not aln_bam in library_bams[(product, sample, type, library)]:
library_bams[(product, sample, type, library)].append(aln_bam)
if library_bamnum[(product, sample, type, library)] == "S" or library_bamnum[(product, sample, type, library)] == "M" :
library_bamnum[(product, sample, type, library)] = "M";
else:
library_bamnum[(product, sample, type, library)]="S";

######## Save Sample Bam ############
Markdup_bam = "{0}__{1}/{2}/4.MarkDup.lib/{3}.markdup.bam".format(product, sample, type, library)
if not Markdup_bam in sample_bams[(product, sample, type)]:
sample_bams[(product, sample, type)].append(Markdup_bam)
if sample_bamnum[(product, sample, type)] == "S" or sample_bamnum[(product, sample, type)] == "M" :
sample_bamnum[(product, sample, type)] = "M";
else:
sample_bamnum[(product, sample, type)] = "S";

补充日志

补充一个相对完善的流程说明帮助文档

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
try:
SampleFile = {config["S"]}.pop()
except:
Errlog = """\033[0;32m\nMiss The Config File for Sample Infomation\nPlease Add Sample Information By ' -C S=Sample_Infomation O=OutDir ')

Usage:

snakemake -s PanCancer.Pipeline.smk -p -C S=Sample_Infomation O=OutDir \\
Analysis="MSI,QC21,filt_variation,filt_sv,Drug_ct,final.CNV,CNV.detail,final.hot,CisMut,final_hereditary_tumor" ##: Analysis Some part of this pipeline \\
UnAnalysis="MSI,QC21,filt_variation,filt_sv,Drug_ct,final.CNV,CNV.detail,final.hot,CisMut,final_hereditary_tumor" ## : skip some part of this pipeline ; \\
Tequila=T (T|F Use Tequila Online Database or Not ) \\
--cluster "qsub -clear -cwd -P P19Z11900N0186 -q b2c_rd.q -l num_proc={threads} -l vf={resources.mem_mb}M -binding linear:{threads}" \\
--jobs 500 \\
--rerun-incomplete \\
--restart-times 5

Version:
v1.3.1(20200903@ Liubo4)

=====================Sample_Infomation format==================
#Product Sample Type chipInfo Lane Barcode UMI_ID Fq1 Fq2
PanCancer Test Tissue V300014810 L04 14 14 XXXXX.14_1.fq.gz XXXXX.14_2.fq.gz
PanCancer Test Normal V300014810 L04 15 15 XXXXX.15_1.fq.gz XXXXX.15_2.fq.gz
================================End============================
\033[0m\n"""
sys.exit(Errlog)

生成一系列的记录文档,记录分析的相关信息。

1
2
3
4
5
6
7
8
9
# 获取分析日期
Analysisdate =datetime.datetime.now().strftime('%Y-%m-%d')
# 基于git管理的流程,获取流程的版本
PipeVersion = subprocess.getoutput("cut -d ' ' -f2 "+bin_path+"/.git/logs/HEAD| tail -n1 ")

with open("log","w+") as LOG:
LOG.write("Start Date :"+Analysisdate+"\n\n")
LOG.write("Analysis Version :" + PipeVersion + "\n\n")
LOG.write("Analysis Dir :"+config['O']+"\n\n")

-------------本文结束感谢您的阅读-------------