Add a configuration file, nextflow.config, in the same directory as main.nf, then run nf run sample.nf -resume.
executor {
    name = 'slurm'
    cpus = 36
    memory = '150 GB'
    queueSize = 10
}

process {
    cpus = 8
    memory = '80 GB'
    clusterOptions = '-p low'
    queueSize = 5   // note: queueSize is an executor setting, not a process directive
}
Inspect the job on Slurm:
scontrol show job 578447
JobId=578447 JobName=nf-SAMPLE
   UserId=zyd(1001) GroupId=zyd(1001) MCS_label=N/A
   Priority=1 Nice=0 Account=test QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:13 TimeLimit=UNLIMITED TimeMin=N/A
   SubmitTime=2023-03-16T18:24:33 EligibleTime=2023-03-16T18:24:33
   AccrueTime=2023-03-16T18:24:33
   StartTime=2023-03-16T18:24:33 EndTime=Unknown Deadline=N/A
   PreemptEligibleTime=2023-03-16T18:24:33 PreemptTime=None
   SuspendTime=None SecsPreSuspend=0
   LastSchedEval=2023-03-16T18:24:33 Scheduler=Main
   Partition=low AllocNode:Sid=node1:6453
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=node4
   BatchHost=node4
   NumNodes=1 NumCPUs=8 NumTasks=1 CPUs/Task=8 ReqB:S:C:T=0:0:*:*
   TRES=cpu=8,mem=80G,node=1,billing=8
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=8 MinMemoryNode=80G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=.command.run
   WorkDir=/data/metagenomics/pml_nextflow/work/a4/afdcae3a30f9777ff14a41e5b23254
   StdErr=/data/metagenomics/pml_nextflow/work/a4/afdcae3a30f9777ff14a41e5b23254/.command.log
   StdIn=/dev/null
   StdOut=/data/metagenomics/pml_nextflow/work/a4/afdcae3a30f9777ff14a41e5b23254/.command.log
   Power=
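The resources Slurm actually allocated are TRES=cpu=8,mem=80G, which are exactly the resources configured in the process scope. For how resources are configured in Slurm itself, see "Resource Management".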
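Because the scontrol output is long, it can help to print only the TRES field. The following is a minimal sketch; the helper name extract_tres is hypothetical, and it assumes the field appears as a single whitespace-delimited TRES=... token, as in the output above.

```shell
# Hypothetical helper: read `scontrol show job` output on stdin
# and print only the TRES=... token (the allocated resources).
extract_tres() {
    grep -o 'TRES=[^[:space:]]*'
}

# Usage on the cluster, with the job ID from the example above:
#   scontrol show job 578447 | extract_tres
```

Filtering on stdin keeps the helper independent of the job ID, so the same function also works on saved scontrol output.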
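Since it is the resources configured in process that take effect, what do the resources configured in executor do? According to the Nextflow documentation, executor.cpus and executor.memory are used only by the local executor, so they do not affect jobs submitted to Slurm. If the process configuration is commented out, the job gets TRES=cpu=1,mem=1G, which is the Slurm default configuration here.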
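One way to check what Nextflow actually requested is to look at the job script it submitted. The scontrol output reports Command=.command.run, located in the task's WorkDir. The sketch below assumes that the resource requests appear as #SBATCH lines at the top of that script; the helper name show_sbatch_headers is hypothetical.

```shell
# Hypothetical helper: list the #SBATCH header lines of a task's job script.
# $1 is the task work directory (the WorkDir value reported by scontrol).
show_sbatch_headers() {
    grep '^#SBATCH' "$1/.command.run"
}

# Usage, with the WorkDir from the scontrol output above:
#   show_sbatch_headers /data/metagenomics/pml_nextflow/work/a4/afdcae3a30f9777ff14a41e5b23254
```

Comparing these lines with and without the process settings shows which configuration values reach Slurm.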
The Nextflow documentation shows that settings such as memory and cpus can also be given as process directives, directly in the process definition.
process SAMPLE {
    publishDir "test_data",
        mode: 'copy',
        saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
    // Inside a process body, directives are written without '='
    cpus 8
    memory '80 GB'
    clusterOptions '-p low'
    // queueSize is an executor setting, not a process directive, so it is omitted here

    input:
    tuple val(pair_id), path(reads)

    output:
    path "${pair_id}/*.gz"

    script:
    // println reads[0]
    """
    # Make sure the output directory exists before seqkit writes into it
    mkdir -p ${pair_id}
    zcat ${reads[0]} | seqkit sample -n 1000 -o ${pair_id}/${pair_id}_1.fastq.gz
    zcat ${reads[1]} | seqkit sample -n 1000 -o ${pair_id}/${pair_id}_2.fastq.gz
    """
}

workflow {
    SAMPLE(["ZF230216-57", ["/xxx/ZF230216-57/ZF230216-57_1.fq.gz", "/xxx/ZF230216-57/ZF230216-57_2.fq.gz"]])
}
In the same way, we can give the process a label and configure the resources for that label:
process SAMPLE {
    publishDir "test_data",
        mode: 'copy',
        saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
    label 'process_low'

    input:
    tuple val(pair_id), path(reads)

    output:
    path "${pair_id}/*.gz"

    ....
}
The nextflow.config file then contains:
executor {
    name = 'slurm'
    cpus = 36
    memory = '150 GB'
    queueSize = 10
}

process {
    withLabel: 'process_low' {
        cpus = 8
        memory = '80 GB'
        clusterOptions = '-p low'
    }
}
scontrol show job 578450
JobId=578450 JobName=nf-SAMPLE
   UserId=zyd(1001) GroupId=zyd(1001) MCS_label=N/A
   Priority=1 Nice=0 Account=test QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:29 TimeLimit=365-00:00:00 TimeMin=N/A
   SubmitTime=2023-03-16T19:53:01 EligibleTime=2023-03-16T19:53:01
   AccrueTime=2023-03-16T19:53:01
   StartTime=2023-03-16T19:53:01 EndTime=2024-03-15T19:53:01 Deadline=N/A
   PreemptEligibleTime=2023-03-16T19:53:01 PreemptTime=None
   SuspendTime=None SecsPreSuspend=0
   LastSchedEval=2023-03-16T19:53:01 Scheduler=Backfill
   Partition=low AllocNode:Sid=node1:103542
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=node4
   BatchHost=node4
   NumNodes=1 NumCPUs=8 NumTasks=1 CPUs/Task=8 ReqB:S:C:T=0:0:*:*
   TRES=cpu=8,mem=80G,node=1,billing=8
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=8 MinMemoryNode=80G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=.command.run
   WorkDir=/data/metagenomics/pml_nextflow/work/d4/366249842debfb588868e690f85f12
   StdErr=/data/metagenomics/pml_nextflow/work/d4/366249842debfb588868e690f85f12/.command.log
   StdIn=/dev/null
   StdOut=/data/metagenomics/pml_nextflow/work/d4/366249842debfb588868e690f85f12/.command.log
   Power=