WDL - Code Snippet Templates (VS Code)

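The WDL snippets are organized by dimension: each snippet provides a single main framework, and alternative scenarios sit beside it as comments that you uncomment as needed. The syntax follows WDL v1.0; the default combination is runtime docker + command <<< >>>.

Use together with Common Components, Built-in Functions, and Demo Examples.

Note: WDL (Workflow Description Language) is a workflow description language introduced by the Broad Institute.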


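1. Configuration

Global User Snippets (already configured), available in any project:

| Editor | Path (macOS) |
| --- | --- |
| Cursor | ~/Library/Application Support/Cursor/User/snippets/ |
| VS Code | ~/Library/Application Support/Code/User/snippets/ |

| File | Prefixes |
| --- | --- |
| wdl.code-snippets | wdl-file / wdl-workflow / wdl-task / wdl-structure / wdl-tooling |
| snakemake.code-snippets | Snakemake snippets (see Snakemake Templates) |

WDL syntax highlighting requires installing a WDL extension.
To modify the snippets: Command Palette → Snippets: Configure User Snippets → select wdl.code-snippets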
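
To confirm that the snippet files are where the editors expect them, the paths listed above can be assembled programmatically. This is a minimal sketch that only covers the macOS locations given in this section; the helper name `snippet_path` and the `EDITOR_DIRS` mapping are hypothetical names introduced for illustration.

```python
from pathlib import Path

# macOS application-support folder names, taken from the paths listed above.
EDITOR_DIRS = {"Cursor": "Cursor", "VS Code": "Code"}


def snippet_path(editor: str, home: Path, filename: str = "wdl.code-snippets") -> Path:
    """Build the expected location of a global snippet file on macOS."""
    app_dir = EDITOR_DIRS[editor]
    return home / "Library" / "Application Support" / app_dir / "User" / "snippets" / filename


if __name__ == "__main__":
    # Report whether each editor already has the WDL snippet file installed.
    for editor in EDITOR_DIRS:
        path = snippet_path(editor, Path.home())
        status = "found" if path.is_file() else "missing"
        print(f"{editor}: {path} ({status})")
```

Passing `filename="snakemake.code-snippets"` checks the Snakemake file in the same way.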

Toolchain: womtool validate / womtool inputs / cromwell run; see Development Environment Setup for details.


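2. Snippet Quick Reference

| Prefix | Dimension | Description |
| --- | --- | --- |
| wdl-file | Complete script | file header + import + workflow |
| wdl-workflow | Workflow orchestration | file header + import task/subworkflow + call chain |
| wdl-task | Compute task | command + runtime; Docker enabled by default |
| wdl-structure | Data structures | struct / Array / Map / Object / Pair |
| wdl-tooling | Tool commands | womtool validation, inputs generation, Cromwell submission |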

3. Main Framework Preview

3.1 wdl-task: compute task

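task align {
  input {
    File fq1
    File fq2
    String sample_id
  }

  String out_bam = sample_id + ".bam"

  # NOTE: bwa mem also expects the reference index prefix as its first argument; it is omitted in this template.
  command <<<
    bwa mem ~{fq1} ~{fq2} | samtools sort -o ~{out_bam}
  >>>
  # Alternative: curly-brace command (draft-2 compatible)
  # command {
  #   bwa mem ${fq1} ${fq2} | samtools sort -o ${out_bam}
  # }

  output {
    File bam = out_bam
  }

  runtime {
    docker: "biocontainers/bwa:v0.7.17_cv1"
    cpu: 8
    memory: "16 GB"
    disks: "local-disk 50 SSD"
  }
  # Alternative: cpu / memory only (local or container-less backends)
  # runtime {
  #   cpu: 8
  #   memory: "16 GB"
  # }
}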

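3.2 wdl-structure: data structures

Intended for structs/*.wdl. Three struct levels are expanded by default; compound types, reading functions, and scatter are provided as comments.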

version 1.0

# --- Primitive types ---
# Int i = 0
# Float f = 27.3
# Boolean b = true
# String s = "hello"
# File in_file = "path/to/file"

# --- Custom structs ---
struct SampleInfo {
  String sample_id
  File fq1
  File fq2
  String? sample_type  # optional field (?)
}

struct LibraryInfo {
  String lib_id
  Array[Array[String]] lib_data  # nested FASTQ pairs
}

struct SampleBatch {
  String sample_id
  Array[LibraryInfo] cancer
  Array[LibraryInfo] normal
}

# --- Compound types ---
# Array[String] samples = ["S1", "S2"]
# Map[String, Int] counts = {"a": 1, "b": 2}
# Object config = {"pipeline_root": "/path"}
# Pair[File, File] fq_pair = (fq1, fq2)
# Array[Pair[File, File]]+ sequence_data  # non-empty array (+)

# --- Reading ---
# SampleBatch batch = read_json(sample_json)
# Map[String, String] software = read_map(config_file)

# --- Iterating with scatter ---
# scatter (lib in batch.cancer) { ... }
# scatter (pair in counts) {
#   String key = pair.left
#   Int value = pair.right
# }

See Variable Data Structures for details.
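
For `read_json` to populate a `SampleBatch`, the JSON file needs keys that match the struct member names. The following is a rough sketch of generating such a file; the helper names `library` and `sample_batch`, the sample and library IDs, and the FASTQ file names are all hypothetical and only serve as an illustration.

```python
import json


def library(lib_id, fastq_pairs):
    """One LibraryInfo record: lib_id plus a list of [fq1, fq2] path pairs."""
    return {"lib_id": lib_id, "lib_data": [list(pair) for pair in fastq_pairs]}


def sample_batch(sample_id, cancer, normal):
    """One SampleBatch record; the keys mirror the struct member names."""
    return {"sample_id": sample_id, "cancer": cancer, "normal": normal}


# Hypothetical sample with one tumor library and one normal library.
batch = sample_batch(
    "S1",
    cancer=[library("T_lib1", [("T_L1_1.fq.gz", "T_L1_2.fq.gz")])],
    normal=[library("N_lib1", [("N_L1_1.fq.gz", "N_L1_2.fq.gz")])],
)

sample_json = json.dumps(batch, indent=2)
print(sample_json)
```

Writing `sample_json` to a file and passing that file as the `sample_json` input gives the workflow something to scatter over via `batch.cancer` and `batch.normal`.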

3.3 wdl-workflow: workflow orchestration

version 1.0
## language: WDL
## File : main.wdl
## Time : 2023/02/14 16:06:20
## Author : Liu.Bo
## Version : v1.0
## Contact : liubo4@genomics.cn/614347533@qq.com
## WebSite : http://www.ben-air.cn/
##
## Cromwell version support :
## - Successfully tested on v36
## - Does not work on versions < v23 due to output syntax
##
## WORKFLOW DEFINITIONS
# description

import "tasks/align.wdl" as align_lib
import "subworkflow/variant.wdl" as variant_lib

workflow my_pipeline {
  input {
    File fq1
    File fq2
    String sample_id
  }

  # --- Call an external task (tasks/*.wdl) ---
  call align_lib.align as align {
    input:
      fq1 = fq1,
      fq2 = fq2,
      sample_id = sample_id
  }

  # --- Call an external subworkflow (subworkflow/*.wdl) ---
  call variant_lib.variant_call as variant_call {
    input:
      bam = align.bam,  # reference the upstream call output
      sample_id = sample_id
  }

  output {
    File result_vcf = variant_call.vcf
  }
}

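Import and reference rules:

| Type | Syntax | call example |
| --- | --- | --- |
| External task | import "tasks/align.wdl" as align_lib | call align_lib.align as align |
| External subworkflow | import "subworkflow/variant.wdl" as variant_lib | call variant_lib.variant_call |
| Referencing an output | align.bam (call alias + output variable name) | |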
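
The workflow above takes `fq1`, `fq2`, and `sample_id` as inputs, and the inputs JSON addresses them with keys of the form `workflow.param` (the format noted in the wdl-tooling snippet). As a small sketch, an inputs file for `my_pipeline` could be generated as follows; the helper name `make_inputs` and the FASTQ paths are hypothetical.

```python
import json


def make_inputs(workflow: str, params: dict) -> dict:
    """Prefix every parameter name with the workflow name ("workflow.param")."""
    return {f"{workflow}.{name}": value for name, value in params.items()}


# Hypothetical input values for the my_pipeline workflow shown above.
inputs = make_inputs(
    "my_pipeline",
    {"fq1": "data/S1_1.fq.gz", "fq2": "data/S1_2.fq.gz", "sample_id": "S1"},
)

print(json.dumps(inputs, indent=2))
```

In practice `womtool inputs` produces the same key layout as a template; a generator like this is mainly useful when many samples need one inputs file each.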

3.4 wdl-file: complete script

Uses the same file header and imports as wdl-workflow; inside the workflow, external tasks/subworkflows are called (no inline task definitions; tasks live in tasks/*.wdl).


4. Snippets JSON

The global configuration file paths are listed in §1. The complete JSON follows (for backup / migration):

{
  "WDL File": {
    "prefix": "wdl-file",
    "body": [
      "version 1.0",
      "",
      "# import \"subworkflow/${1:module}.wdl\" as ${2:sub}",
      "",
      "workflow ${3:main} {",
      "  input {",
      "    ${4|File,String|} ${5:input_param}",
      "    String sample_id",
      "  }",
      "",
      "  call ${6:process} {",
      "    input:",
      "      ${5:input_param} = ${5:input_param},",
      "      sample_id = sample_id",
      "  }",
      "",
      "  call ${9:downstream} {",
      "    input:",
      "      in_file = ${6:process}.${8:out},  # reference the upstream task output",
      "      sample_id = sample_id",
      "  }",
      "",
      "  output {",
      "    File ${7:result} = ${9:downstream}.${10:final_out}",
      "  }",
      "}",
      "",
      "task ${6:process} {",
      "  input {",
      "    ${4|File,String|} ${5:input_param}",
      "    String sample_id",
      "  }",
      "",
      "  String out_file = sample_id + \".${11:txt}\"",
      "",
      "  command <<<",
      "    ${12:echo done > ~{out_file\\}}",
      "  >>>",
      "",
      "  output {",
      "    File ${8:out} = out_file",
      "  }",
      "",
      "  runtime {",
      "    docker: \"${13:ubuntu:22.04}\"",
      "    cpu: ${14:2}",
      "    memory: \"${15:4 GB}\"",
      "  }",
      "}"
    ],
    "description": "Complete WDL: version + workflow + task"
  },
  "WDL Workflow": {
    "prefix": "wdl-workflow",
    "body": [
      "workflow ${1:my_pipeline} {",
      "  input {",
      "    File ${2:fq1}",
      "    File ${3:fq2}",
      "    String ${4:sample_id}",
      "    # Boolean run_qc = true",
      "  }",
      "",
      "  # --- Data loading (enable one as needed) ---",
      "  # Array[Array[String]] batch = read_tsv(sample_tsv)",
      "  # Object all_sample = read_json(sample_json)",
      "",
      "  call ${5:align} {",
      "    input:",
      "      fq1 = ${2:fq1},",
      "      fq2 = ${3:fq2},",
      "      sample_id = ${4:sample_id}",
      "  }",
      "",
      "  call ${6:variant_call} {",
      "    input:",
      "      bam = ${5:align}.bam,  # reference the upstream task output",
      "      sample_id = ${4:sample_id}",
      "  }",
      "",
      "  # call ${5:align} as ${5:align}_tumor { input: fq1 = ..., fq2 = ..., sample_id = \"tumor\" }",
      "  # call ${5:align} as ${5:align}_normal { input: fq1 = ..., fq2 = ..., sample_id = \"normal\" }",
      "  # call ${7:merge_bam} {",
      "  #   input: bam1 = ${5:align}_tumor.bam, bam2 = ${5:align}_normal.bam  # reference aliased call outputs",
      "  # }",
      "",
      "  # scatter (row in batch) {",
      "  #   call ${5:align} { input: fq1 = row[1], fq2 = row[2], sample_id = row[0] }",
      "  #   call ${6:variant_call} { input: bam = ${5:align}.bam, sample_id = row[0] }",
      "  # }",
      "",
      "  # if (run_qc) {",
      "  #   call ${8:qc} { input: bam = ${5:align}.bam }",
      "  # }",
      "  # File qc_report = select_first([${8:qc}.report, \"\"])",
      "",
      "  # import \"subworkflow/qc.wdl\" as sub  (imports belong at the top of the file)",
      "  # call sub.qc_workflow { input: bam = ${5:align}.bam }",
      "",
      "  output {",
      "    File ${9:result} = ${6:variant_call}.vcf",
      "  }",
      "}"
    ],
    "description": "Workflow orchestration: chained calls referencing task outputs; scatter / if / alias variants as comments"
  },
  "WDL Task": {
    "prefix": "wdl-task",
    "body": [
      "task ${1:task_name} {",
      "  input {",
      "    File ${2:input_file}",
      "    String ${3:sample_id}",
      "  }",
      "",
      "  String ${4:out_name} = ${3:sample_id} + \".${5:bam}\"",
      "",
      "  command <<<",
      "    ${6:tool -i ~{input_file\\} -o ~{out_name\\}}",
      "  >>>",
      "  # command {",
      "  #   tool -i \\${input_file} -o \\${out_name}",
      "  # }",
      "",
      "  output {",
      "    File ${7:out} = ${4:out_name}",
      "  }",
      "",
      "  runtime {",
      "    docker: \"${8:biocontainers/tool:latest}\"",
      "    cpu: ${9:4}",
      "    memory: \"${10:8 GB}\"",
      "    disks: \"local-disk ${11:50} SSD\"",
      "  }",
      "  # runtime {",
      "  #   cpu: ${9:4}",
      "  #   memory: \"${10:8 GB}\"",
      "  # }",
      "",
      "  # parameter_meta {",
      "  #   ${2:input_file}: \"input file\"",
      "  # }",
      "  # meta { version: \"1.0.0\" }",
      "}"
    ],
    "description": "task: docker + command <<< >>> by default; alternatives as comments"
  },
  "WDL Structure": {
    "prefix": "wdl-structure",
    "body": [
      "version 1.0",
      "struct SampleInfo { ... }",
      "struct LibraryInfo { ... }",
      "struct SampleBatch { ... }",
      "# Array / Map / Object / Pair / ? / +: see comments",
      "# read_json / read_map / read_tsv / scatter: see comments"
    ],
    "description": "Data structures: struct + compound types + reading and scatter as comments"
  },
  "WDL Tooling": {
    "prefix": "wdl-tooling",
    "body": [
      "# Syntax validation",
      "java -jar womtool.jar validate ${1:main.wdl}",
      "",
      "# Generate an inputs template",
      "java -jar womtool.jar inputs ${1:main.wdl} > ${2:main.input.json}",
      "",
      "# Workflow graph",
      "java -jar womtool.jar graph ${1:main.wdl} > ${5:main}.dot",
      "dot -Tsvg -o ${5:main}.svg ${5:main}.dot",
      "",
      "# Run locally with Cromwell",
      "java -jar cromwell.jar run ${1:main.wdl} --inputs ${2:main.input.json}",
      "",
      "# Submit to a Cromwell server",
      "# java -jar cromwell.jar submit ${1:main.wdl} -t wdl -i ${2:main.input.json} -h http://localhost:8000",
      "",
      "# inputs JSON key format: \"${3:workflow}.${4:param}\": \"value\""
    ],
    "description": "Common womtool + Cromwell commands"
  }
}
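
VS Code links every occurrence of the same tabstop number, so a number that appears with two different default texts is usually a numbering slip. The following is a minimal sketch of a checker for that, not part of the original snippets; the function name `conflicting_tabstops`, the example body, and the simple regular expression (which reads a default up to the first closing brace and ignores choice placeholders such as `${4|File,String|}`) are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Matches ${N:default}; the default text is read up to the first closing brace.
PLACEHOLDER = re.compile(r"\$\{(\d+):([^}]*)\}")


def conflicting_tabstops(body):
    """Map each tabstop number reused with different defaults to those defaults."""
    defaults = defaultdict(set)
    for line in body:
        for number, text in PLACEHOLDER.findall(line):
            defaults[int(number)].add(text)
    return {n: sorted(texts) for n, texts in defaults.items() if len(texts) > 1}


# Hypothetical body lines, shaped like the "body" arrays in a snippet file.
example_body = [
    "call ${1:align} {",
    "  input: sample_id = ${2:sample_id}",
    "}",
    "File ${3:result} = ${1:align}.bam",
    "String suffix = \".${3:txt}\"",
]

print(conflicting_tabstops(example_body))
```

To check a real file, load it with `json.load` and pass each entry's `"body"` list to the function; this assumes the file is plain JSON without comments.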

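5. Switching Between Scenarios

| Dimension | Default | Commented alternatives |
| --- | --- | --- |
| Data structures | three nested struct levels | primitive, Array, Map, Object, Pair, ? / + |
| command | <<< >>> (v1.0) | { } curly braces (draft-2 compatible) |
| runtime | docker | cpu / memory only (local or container-less backends) |
| Workflow orchestration | linear call chain (align.bam references the upstream output) | as aliases, scatter / if + select_first, import of subworkflows |
| Data input | workflow input | read_tsv / read_json (see Built-in Functions) |

For a complete nested-scatter example, see Demo Examples.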
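
When switching the data input to `read_tsv`, the scatter comment in the wdl-workflow snippet reads `row[0]` as the sample ID and `row[1]` / `row[2]` as the two FASTQ files. A sample sheet in that column order could be produced as sketched below; the helper name `sample_sheet` and the file paths are hypothetical. The sheet is written without a header row, since `read_tsv` returns every line as a row and a header would be scattered over like a sample.

```python
import csv
import io


def sample_sheet(rows):
    """Render (sample_id, fq1, fq2) rows as header-less, tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical samples; column order matches row[0], row[1], row[2] in the scatter.
rows = [
    ("S1", "data/S1_1.fq.gz", "data/S1_2.fq.gz"),
    ("S2", "data/S2_1.fq.gz", "data/S2_2.fq.gz"),
]

tsv_text = sample_sheet(rows)
print(tsv_text, end="")
```

Saving `tsv_text` to a file and supplying it as the `sample_tsv` input lets the commented `read_tsv` line in the snippet be enabled as is.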


6. Related Documents
