WDL - Writing Syntax

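As with other programming languages, WDL syntax differs between versions, so when writing a workflow in WDL you need to be clear about which version of the specification you are following.
Unless otherwise noted, everything below is based on WDL v1.0.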

Variable types

Primitive types

Int i = 0                   # An integer value
Float f = 27.3              # A floating point number
Boolean b = true            # A boolean true/false
String s = "hello, world"   # A string value
File x = "path/to/file"     # A file (renamed from f, which was already declared above)

Compound types

# In the examples below P represents any of the primitive types above, and X and Y represent any valid type (even nested compound types)
Array[X] xs = [x1, x2, x3]                      # An array of Xs
Map[P,Y] p_to_y = { p1: y1, p2: y2, p3: y3 }    # A map from Ps to Ys
Pair[X,Y] x_and_y = (x, y)                      # A pair of one X and one Y
Object o = object { field1: f1, field2: f2 }    # WDL 1.0 object literal; member names are always strings

Struct definition

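A struct is a C-like construct that lets users build new compound types out of previously existing types. A struct can then be used as a declaration type in a Task or Workflow definition, in place of any other regular type. In many cases, structs replace the Object type and allow their members to be properly typed.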

struct SampleData{}

Compound types can also be used inside a struct, which makes it easy to encapsulate them in a single object.

JSON type to WDL type (read_json)

JSON type   WDL type
object      Map[String, ?]
array       Array[?]
number      Int or Float
string      String
boolean     Boolean
null        ??? (not defined)

Whether object really converts to Map is questionable: in my tests, accessing the data as an Object worked, but iterating over it as a Map failed.
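To check which entry of the table above a given input value falls under, here is a minimal Python sketch. `wdl_type_of` is a hypothetical helper, not part of WDL or Cromwell; it infers a single element type per array or object and writes `?` when the elements disagree. On a library entry from the demos below, the mixed `String` and `Array` values leave no single Map value type, which may be one reason Map-style traversal failed while Object access worked.

```python
import json


def _common(types):
    # One shared type when all elements agree, otherwise "?" (mixed or empty)
    unique = set(types)
    return unique.pop() if len(unique) == 1 else "?"


def wdl_type_of(value) -> str:
    """Name the WDL type that the JSON-to-WDL table assigns to a parsed JSON value."""
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return "Boolean"
    if isinstance(value, int):
        return "Int"
    if isinstance(value, float):
        return "Float"
    if isinstance(value, str):
        return "String"
    if value is None:
        return "???"  # the table leaves null undefined
    if isinstance(value, list):
        return f"Array[{_common([wdl_type_of(v) for v in value])}]"
    if isinstance(value, dict):
        # Keys are always strings; mixed value types end up as "?"
        return f"Map[String, {_common([wdl_type_of(v) for v in value.values()])}]"
    raise TypeError(f"not a JSON value: {value!r}")


lib = json.loads('{"LibID": "1C-Lib-1", "LibData": [["FastqA1", "FastqA2"]]}')
print(wdl_type_of(lib))             # Map[String, ?]
print(wdl_type_of(lib["LibData"]))  # Array[Array[String]]
```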

Example demos (archive)

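As shown above, WDL provides a fairly rich set of variable types. However, its functions for processing text and variables are very limited, so unlike Snakemake you cannot read the data and then run custom processing to build the data structure you need. When applying WDL to bioinformatics testing pipelines, complex relationships between inputs therefore have to be worked out in advance and encoded in a specific data structure. This article records some of the complex data structures I ran into while learning and developing, along with how WDL parses them, for future reference.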
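Since the reshaping has to happen before WDL sees the data, one option is to generate the nested JSON upstream. The following is a minimal Python sketch, assuming a hypothetical tab-separated sample sheet with one row per FASTQ pair and columns `type`, `lib_id`, `fq1`, `fq2` (these column names are illustrative, not from the original). It groups rows by type and then by library into the `LibID`/`LibData` layout used in the single-sample demo.

```python
import csv
import io
import json

# Hypothetical flat sample sheet: one row per FASTQ pair (tab-separated)
SHEET = """type\tlib_id\tfq1\tfq2
cancer\t1C-Lib-1\tFastqA1\tFastqA2
cancer\t1C-Lib-1\tFastqB1\tFastqB2
cancer\t1C-Lib-2\tL2FastqA1\tL2FastqA2
normal\t1N-Lib-1\tNFastqA1\tNFastqA2
"""


def build_sample_json(sheet_text: str) -> dict:
    """Group FASTQ pairs by type, then by library, into
    {type: [{"LibID": ..., "LibData": [[fq1, fq2], ...]}, ...]}."""
    grouped = {}  # type -> {lib_id -> list of [fq1, fq2]}
    for row in csv.DictReader(io.StringIO(sheet_text), delimiter="\t"):
        libs = grouped.setdefault(row["type"], {})
        # dicts keep insertion order, so libraries stay in sheet order
        libs.setdefault(row["lib_id"], []).append([row["fq1"], row["fq2"]])
    return {
        sample_type: [{"LibID": lib, "LibData": pairs} for lib, pairs in libs.items()]
        for sample_type, libs in grouped.items()
    }


if __name__ == "__main__":
    print(json.dumps(build_sample_json(SHEET), indent=1))
```

Writing the printed JSON to a file gives an input of the same shape as the single-sample example below.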

Multi-level dependencies between samples, libraries, and sequencer output

The input file actually used is JSON; WDL parses it into an Object plus nested arrays, which carries the multi-level structure.

Single-sample parsing

Input file (.json)

{
"cancer":[{
"LibID":"1C-Lib-1",
"LibData":[["FastqA1","FastqA2"],["FastqB1","FastqB2"],["FastqC1","FastqC2"],["FastqD1","FastqD2"]]
},{
"LibID":"1C-Lib-2",
"LibData":[["L2FastqA1","L2FastqA2"],["L2FastqB1","L2FastqB2"],["L2FastqC1","L2FastqC2"]]
}],
"normal":[{
"LibID":"1N-Lib-1",
"LibData":[["NFastqA1","NFastqA2"],["NFastqB1","NFastqB2"],["NFastqC1","NFastqC2"]]
}]
}
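Because task echoa below reads Fastq[0] and Fastq[1], every LibData entry has to be a pair. A hypothetical pre-flight check in Python (the function name and messages are illustrative, not part of any WDL tooling) can catch malformed input before the workflow is submitted:

```python
import json


def check_input(data: dict) -> list:
    """Return a list of problems in the single-sample input; empty means OK."""
    problems = []
    for sample_type, libs in data.items():
        for i, lib in enumerate(libs):
            where = f"{sample_type}[{i}]"
            if not isinstance(lib.get("LibID"), str):
                problems.append(f"{where}: LibID must be a string")
            for j, pair in enumerate(lib.get("LibData", [])):
                # task echoa indexes Fastq[0] and Fastq[1], so each entry must be a pair
                if not (isinstance(pair, list) and len(pair) == 2):
                    problems.append(f"{where}.LibData[{j}]: expected 2 FASTQ paths")
    return problems


if __name__ == "__main__":
    example = json.loads(
        '{"cancer": [{"LibID": "1C-Lib-1", "LibData": [["FastqA1", "FastqA2"], ["FastqB1"]]}]}'
    )
    print(check_input(example))
```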

Parsing script (.wdl)

version 1.0

workflow wf_echo {
    input {
        File data
    }

    # Parse the whole JSON file into an Object; members are accessed with "."
    Object All_Sample = read_json(data)

    # Outer scatter: one iteration per cancer library
    scatter (singleLin in All_Sample.cancer) {
        String lib = singleLin.LibID
        Array[Array[String]] FastqList = singleLin.LibData

        # Inner scatter: one iteration per FASTQ pair of this library
        scatter (singleFastq in FastqList) {
            Array[String] Fastq = singleFastq
            call echoa as cancer_fastq {
                input:
                    Fastq = Fastq
            }
        }

        # Library level: here cancer_fastq.outputa is the inner scatter gathered into Array[String]
        call Singlelib as Cancer_lib {
            input:
                lib = lib,
                Fastq = cancer_fastq.outputa
        }
    }

    # Sample level: Cancer_lib.Tag is gathered into Array[String] across libraries
    call Singlelib as Cancer_sample {
        input:
            lib = "All_cancer",
            Fastq = Cancer_lib.Tag
    }
}

task Singlelib {
    input {
        String lib
        Array[String] Fastq
    }
    String out = lib + ".txt"
    String sample = lib
    command <<<
        echo ~{sep="," Fastq} > ~{sample}.txt
    >>>

    output {
        String outFile = out
        String Tag = lib
    }
}

task echoa {
    input {
        Array[String] Fastq
    }
    String Fastq1 = Fastq[0]
    String Fastq2 = Fastq[1]
    command <<<
        echo ~{Fastq1} ~{Fastq2}
    >>>
    output {
        String outputa = Fastq1 + ":" + Fastq2
    }
}

Running the example script (Cromwell log excerpt)

[2023-02-09 20:13:43,87] [info] BackgroundConfigAsyncJobExecutionActor [70365163wf_echo.cancer_fastq:0:1]: echo L2FastqA1 L2FastqA2
[2023-02-09 20:13:43,87] [info] BackgroundConfigAsyncJobExecutionActor [3956881dwf_echo.cancer_fastq:3:1]: echo FastqD1 FastqD2
[2023-02-09 20:13:43,87] [info] BackgroundConfigAsyncJobExecutionActor [70365163wf_echo.cancer_fastq:2:1]: echo L2FastqC1 L2FastqC2
[2023-02-09 20:13:43,87] [info] BackgroundConfigAsyncJobExecutionActor [3956881dwf_echo.cancer_fastq:2:1]: echo FastqC1 FastqC2
[2023-02-09 20:13:43,87] [info] BackgroundConfigAsyncJobExecutionActor [3956881dwf_echo.cancer_fastq:0:1]: echo FastqA1 FastqA2
[2023-02-09 20:13:43,87] [info] BackgroundConfigAsyncJobExecutionActor [70365163wf_echo.cancer_fastq:1:1]: echo L2FastqB1 L2FastqB2
[2023-02-09 20:13:43,87] [info] BackgroundConfigAsyncJobExecutionActor [3956881dwf_echo.cancer_fastq:1:1]: echo FastqB1 FastqB2

[2023-02-09 20:13:50,36] [info] 70365163-898e-417f-b7a7-7ce6866b4dae-SubWorkflowActor-SubWorkflow-ScatterAt45_12:1:1 [70365163]: Workflow ScatterAt45_12 complete. Final Outputs:
{
"cancer_fastq.outputa": ["L2FastqA1:L2FastqA2", "L2FastqB1:L2FastqB2", "L2FastqC1:L2FastqC2"],
"Fastq": [["L2FastqA1", "L2FastqA2"], ["L2FastqB1", "L2FastqB2"], ["L2FastqC1", "L2FastqC2"]]
}
[2023-02-09 20:13:50,36] [info] 3956881d-5958-469c-bd04-e4f41509fd51-SubWorkflowActor-SubWorkflow-ScatterAt45_12:0:1 [3956881d]: Workflow ScatterAt45_12 complete. Final Outputs:
{
"Fastq": [["FastqA1", "FastqA2"], ["FastqB1", "FastqB2"], ["FastqC1", "FastqC2"], ["FastqD1", "FastqD2"]],
"cancer_fastq.outputa": ["FastqA1:FastqA2", "FastqB1:FastqB2", "FastqC1:FastqC2", "FastqD1:FastqD2"]
}


[2023-02-09 20:14:03,75] [info] BackgroundConfigAsyncJobExecutionActor [00e87cbbwf_echo.Cancer_lib:1:1]: echo L2FastqA1:L2FastqA2,L2FastqB1:L2FastqB2,L2FastqC1:L2FastqC2 > 1C-Lib-2.txt
[2023-02-09 20:14:03,75] [info] BackgroundConfigAsyncJobExecutionActor [00e87cbbwf_echo.Cancer_lib:0:1]: echo FastqA1:FastqA2,FastqB1:FastqB2,FastqC1:FastqC2,FastqD1:FastqD2 > 1C-Lib-1.txt

[2023-02-09 20:14:13,74] [info] BackgroundConfigAsyncJobExecutionActor [00e87cbbwf_echo.Cancer_sample:NA:1]: echo 1C-Lib-1,1C-Lib-2 > All_cancer.txt

[2023-02-09 20:14:18,89] [info] WorkflowExecutionActor-00e87cbb-2f53-4a0e-ae74-4ff2d8eaf88c [00e87cbb]: Workflow wf_echo complete. Final Outputs:
{
"wf_echo.cancer_fastq.outputa": [["FastqA1:FastqA2", "FastqB1:FastqB2", "FastqC1:FastqC2", "FastqD1:FastqD2"], ["L2FastqA1:L2FastqA2", "L2FastqB1:L2FastqB2", "L2FastqC1:L2FastqC2"]],
"wf_echo.lib": ["1C-Lib-1", "1C-Lib-2"],
"wf_echo.Cancer_sample.Tag": "All_cancer",
"wf_echo.Fastq": [[["FastqA1", "FastqA2"], ["FastqB1", "FastqB2"], ["FastqC1", "FastqC2"], ["FastqD1", "FastqD2"]], [["L2FastqA1", "L2FastqA2"], ["L2FastqB1", "L2FastqB2"], ["L2FastqC1", "L2FastqC2"]]],
"wf_echo.Cancer_lib.Tag": ["1C-Lib-1", "1C-Lib-2"],
"wf_echo.Cancer_lib.outFile": ["1C-Lib-1.txt", "1C-Lib-2.txt"],
"wf_echo.Cancer_sample.outFile": "All_cancer.txt",
"wf_echo.FastqList": [[["FastqA1", "FastqA2"], ["FastqB1", "FastqB2"], ["FastqC1", "FastqC2"], ["FastqD1", "FastqD2"]], [["L2FastqA1", "L2FastqA2"], ["L2FastqB1", "L2FastqB2"], ["L2FastqC1", "L2FastqC2"]]]
}
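To make the scatter/gather shapes explicit, the following Python sketch replays wf_echo sequentially on the input above. It models the workflow logic only, not Cromwell: the function names mirror the WDL tasks but are otherwise hypothetical, and real shards run in parallel.

```python
import json

# Single-sample input from above; the normal library is omitted because
# the workflow only scatters over "cancer"
INPUT = json.loads("""
{"cancer": [
  {"LibID": "1C-Lib-1",
   "LibData": [["FastqA1", "FastqA2"], ["FastqB1", "FastqB2"],
               ["FastqC1", "FastqC2"], ["FastqD1", "FastqD2"]]},
  {"LibID": "1C-Lib-2",
   "LibData": [["L2FastqA1", "L2FastqA2"], ["L2FastqB1", "L2FastqB2"],
               ["L2FastqC1", "L2FastqC2"]]}
]}
""")


def echoa(fastq):
    # Mirrors task echoa: outputa = Fastq[0] + ":" + Fastq[1]
    return f"{fastq[0]}:{fastq[1]}"


def singlelib_command(lib, fastq):
    # Mirrors task Singlelib with sample = lib: echo ~{sep="," Fastq} > ~{sample}.txt
    return f"echo {','.join(fastq)} > {lib}.txt"


def run_wf_echo(data):
    """Sequentially replay the nested scatter/gather of wf_echo."""
    outputa, tags, commands = [], [], []
    for single_lib in data["cancer"]:                               # outer scatter
        per_lib = [echoa(pair) for pair in single_lib["LibData"]]   # inner scatter
        outputa.append(per_lib)                                     # gathered per library
        commands.append(singlelib_command(single_lib["LibID"], per_lib))
        tags.append(single_lib["LibID"])                            # Cancer_lib.Tag = lib
    commands.append(singlelib_command("All_cancer", tags))          # Cancer_sample
    return outputa, tags, commands


if __name__ == "__main__":
    for cmd in run_wf_echo(INPUT)[2]:
        print(cmd)
```

The printed commands should match the Cancer_lib and Cancer_sample lines in the log; the log lists shards in a different order because Cromwell runs them in parallel.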

Batch (multi-sample) parsing

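When WDL parses the JSON, the outermost level is always forced into an Object (a top-level list is not recognized as an Array that can be scattered over), so the top level must be an object.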
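If the per-sample records are produced by a script, wrapping them under a single key keeps the top level an object. A minimal Python sketch; `wrap_batch` is a hypothetical helper, and the key has to match the member name the WDL accesses (All_Sample here):

```python
import json


def wrap_batch(samples: list, key: str = "All_Sample") -> str:
    """Serialize per-sample records under one top-level key so that
    read_json sees an object instead of a bare array."""
    return json.dumps({key: samples}, indent=1)


batch = wrap_batch([
    {"sampleID": "SampleA", "cancer": [], "normal": []},
    {"sampleID": "SampleB", "cancer": [], "normal": []},
])
print(batch)
```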

{"All_Sample":[{
"sampleID":"SampleA",
"cancer":[{
"LibID":"1C-Lib-1",
"LibData":[["FastqA1","FastqA2"],["FastqB1","FastqB2"],["FastqC1","FastqC2"],["FastqD1","FastqD2"]]
},{
"LibID":"1C-Lib-2",
"LibData":[["L2FastqA1","L2FastqA2"],["L2FastqB1","L2FastqB2"],["L2FastqC1","L2FastqC2"]]
}],
"normal":[{
"LibID":"1N-Lib-1",
"LibData":[["NFastqA1","NFastqA2"],["NFastqB1","NFastqB2"],["NFastqC1","NFastqC2"]]
}]
},{
"sampleID":"SampleB",
"cancer":[{
"LibID":"1C-Lib-1",
"LibData":[["FastqA1","FastqA2"],["FastqB1","FastqB2"],["FastqC1","FastqC2"],["FastqD1","FastqD2"]]
},{
"LibID":"1C-Lib-2",
"LibData":[["L2FastqA1","L2FastqA2"],["L2FastqB1","L2FastqB2"],["L2FastqC1","L2FastqC2"]]
}],
"normal":[{
"LibID":"1N-Lib-1",
"LibData":[["NFastqA1","NFastqA2"],["NFastqB1","NFastqB2"],["NFastqC1","NFastqC2"]]
}]
}]
}

Parsing script (.wdl)

version 1.0

workflow wf_echo {
    input {
        File data
    }

    # The top-level JSON object wraps the sample list under the key "All_Sample"
    Object All_Sample = read_json(data)

    # Level 1: one iteration per sample
    scatter (singlesample in All_Sample.All_Sample) {
        String sampleID = singlesample.sampleID

        # Level 2: one iteration per cancer library of this sample
        scatter (singleLin in singlesample.cancer) {
            String Type = "cancer"
            String lib = singleLin.LibID
            Array[Array[String]] FastqList = singleLin.LibData

            # Level 3: one iteration per FASTQ pair of this library
            scatter (singleFastq in FastqList) {
                Array[String] Fastq = singleFastq
                call echoa as cancer_fastq {
                    input:
                        Fastq = Fastq
                }
            }

            call Singlelib as Cancer_lib {
                input:
                    sample = sampleID,
                    Type = Type,
                    lib = lib,
                    Fastq = cancer_fastq.outputa
            }
        }

        # Per sample: gather the library tags of this sample
        call Singlelib as Cancer_sample {
            input:
                sample = sampleID,
                Type = "cancer",
                lib = "All_lib",
                Fastq = Cancer_lib.Tag
        }
    }

    # Batch level: gather the per-sample tags
    call Singlelib as sample_level {
        input:
            sample = "All_sample",
            Type = "All_type",
            lib = "All_case",
            Fastq = Cancer_sample.Tag
    }
}

task Singlelib {
    input {
        String sample
        String Type
        String lib
        Array[String] Fastq
    }
    # outFile is only a String label; the command below writes ~{sample}.txt
    String out = lib + ".txt"
    command <<<
        echo ~{sep="," Fastq} > ~{sample}.txt
    >>>

    output {
        String outFile = out
        String Tag = lib
    }
}

task echoa {
    input {
        Array[String] Fastq
    }
    String Fastq1 = Fastq[0]
    String Fastq2 = Fastq[1]
    command <<<
        echo ~{Fastq1} ~{Fastq2}
    >>>
    output {
        String outputa = Fastq1 + ":" + Fastq2
    }
}
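As with the single-sample case, a sequential Python model can show what each of the three scatter levels produces. This is a sketch under stated assumptions: `DATA` is a hypothetical, trimmed-down version of the batch input above, the function names only mirror the WDL tasks, and commands are listed in loop order rather than Cromwell's parallel execution order.

```python
# Hypothetical trimmed batch input with the same layout as the JSON above
DATA = {"All_Sample": [
    {"sampleID": "SampleA", "cancer": [
        {"LibID": "1C-Lib-1", "LibData": [["FastqA1", "FastqA2"], ["FastqB1", "FastqB2"]]},
        {"LibID": "1C-Lib-2", "LibData": [["L2FastqA1", "L2FastqA2"]]}]},
    {"sampleID": "SampleB", "cancer": [
        {"LibID": "1C-Lib-1", "LibData": [["FastqA1", "FastqA2"]]}]},
]}


def echoa(fastq):
    # Mirrors task echoa: outputa = Fastq[0] + ":" + Fastq[1]
    return f"{fastq[0]}:{fastq[1]}"


def singlelib(sample, lib, fastq):
    # Mirrors task Singlelib: the command writes <sample>.txt and Tag = lib
    return f"echo {','.join(fastq)} > {sample}.txt", lib


def run_batch(data):
    """Replay the three nested scatters; returns gathered outputa and all commands."""
    commands, outputa, sample_tags = [], [], []
    for sample in data["All_Sample"]:                         # level 1: samples
        per_sample, lib_tags = [], []
        for lib in sample["cancer"]:                          # level 2: libraries
            per_lib = [echoa(p) for p in lib["LibData"]]      # level 3: FASTQ pairs
            per_sample.append(per_lib)
            cmd, tag = singlelib(sample["sampleID"], lib["LibID"], per_lib)  # Cancer_lib
            commands.append(cmd)
            lib_tags.append(tag)
        outputa.append(per_sample)
        cmd, tag = singlelib(sample["sampleID"], "All_lib", lib_tags)        # Cancer_sample
        commands.append(cmd)
        sample_tags.append(tag)
    cmd, _ = singlelib("All_sample", "All_case", sample_tags)               # sample_level
    commands.append(cmd)
    return outputa, commands


if __name__ == "__main__":
    gathered, cmds = run_batch(DATA)
    print(gathered)  # three levels of nesting: sample -> library -> pair
    for c in cmds:
        print(c)
```

One thing the model makes visible: because Singlelib sets Tag = lib, every Cancer_sample call returns "All_lib", so sample_level only receives ["All_lib", "All_lib"] and the sample names do not reach the batch level. Passing sampleID as lib in the Cancer_sample call would carry them through, if that is the intent.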
