Running Nextflow on Kubernetes
Learning resources
- https://www.nextflow.io/docs/latest/kubernetes.html
- https://nextflow.io/blog/2023/the-state-of-kubernetes-in-nextflow.html
- https://github.com/nextflow-io/nextflow/blob/ea7d9f3a42641b7dcc8c169a8a075eb43d60f831/plugins/nf-google/src/main/nextflow/cloud/google/batch/GoogleBatchTaskHandler.groovy#L409-L418
- https://github.com/nextflow-io/nextflow/blob/a5a6a1fc8f76a9a7ebbe98cc94b5356f2a23bdec/modules/nextflow/src/main/groovy/nextflow/k8s/K8sTaskHandler.groovy#L347-L360
Setting k8s.debug.yaml = true makes Nextflow save the YAML spec of every pod/job it submits, which is handy for debugging (see also https://github.com/nextflow-io/nextflow/issues/4530).
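In nextflow.config it is a one-liner:

// dump the spec of every pod/job Nextflow submits
k8s.debug.yaml = true

With it enabled, a task submitted with computeResourceType = 'Job' yields a spec like the following: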
apiVersion: batch/v1
kind: Job
metadata: &id001
  name: nf-38f9d12733091e2f0be300f7f3127c01
  namespace: default
  labels: {nextflow.io/processName: METAPHLAN, nextflow.io/runName: compassionate_cuvier,
    nextflow.io/sessionId: uuid-42a39443-208d-4d3e-9d86-e73dfc0ae68b, nextflow.io/app: nextflow,
    nextflow.io/taskName: METAPHLAN_name_m5_dataKey_KY_2312220836530126_species_human}
spec:
  backoffLimit: 0
  template:
    metadata: *id001
    spec:
      restartPolicy: Never
      containers:
      - name: nf-38f9d12733091e2f0be300f7f3127c01
        image: 192.168.3.60:5001/metaphlan:4.0.2
        args: [/bin/bash, -ue, /data/k8sData/42a39443-208d-4d3e-9d86-e73dfc0ae68b/38/f9d12733091e2f0be300f7f3127c01/.command.run]
        resources:
          requests: {cpu: 30, memory: 102400Mi}
          limits: {memory: 102400Mi}
        volumeMounts:
        - {name: vol-1, mountPath: /data}
      serviceAccountName: default
      volumes:
      - name: vol-1
        persistentVolumeClaim: {claimName: nfdata}
The k8s executor allows you to run a pipeline on a Kubernetes cluster.
Using Docker as the Nextflow driver
The workflow execution needs to be submitted from a computer able to connect to the Kubernetes cluster.
Nextflow uses the Kubernetes configuration file available at the path $HOME/.kube/config, or the file specified by the environment variable KUBECONFIG.
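For example, to point Nextflow at a non-default kubeconfig (the path below is illustrative):

# Nextflow falls back to $HOME/.kube/config when KUBECONFIG is unset
export KUBECONFIG=/path/to/my-cluster.kubeconfig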
Prerequisites
- The directory $HOME/.kube/ must be mounted into the Docker container; inside the container you can verify connectivity with kubectl cluster-info, as shown below.
- At least one Persistent Volume with access mode ReadWriteMany that can be bound by a Persistent Volume Claim.
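A quick way to run that connectivity check (this assumes the wybioinfo/nextflow image used later also ships kubectl):

docker run --rm -e HOME=$HOME -v $HOME/.kube:$HOME/.kube \
    wybioinfo/nextflow kubectl cluster-info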
Kubernetes configuration
Create a file nextflow.yml with the following content:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nextflowdata
spec:
  capacity:
    storage: 2Gi
  volumeMode: Filesystem # Filesystem or Block
  accessModes:
    - ReadWriteMany # the volume can be mounted read-write by many nodes
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /ssd1/wy/workspace/nf-hello/workDir
  nodeAffinity:
    required:
      # pin the volume to a specific node via its hostname
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - node1
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nextflowdata
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: "local-storage"
  resources:
    requests:
      storage: 2Gi
Note that the value of accessModes must be ReadWriteMany.
Apply it with the command:
kubectl apply -f nextflow.yml
You can inspect the created StorageClass, PV, and PVC with:
kubectl get sc
kubectl get pv
kubectl get pvc
Nextflow configuration
Create a test main.nf:
#!/usr/bin/env nextflow
nextflow.enable.dsl=2

process sayHello {
    container "ubuntu:rolling"

    input:
    val x

    output:
    stdout

    script:
    """
    echo '$x world!'
    """
}

println params.name1
println params.name2

workflow {
    Channel.of('Bonjour', 'Ciao', 'Hello', 'Hola') | sayHello | view
}
Create a configuration file nextflow.config:
process.container = 'quay.io/nextflow/bash'

params {
    name1 = 'value1'
    name2 = 'value2'
}

process {
    executor = 'k8s'
}

k8s {
    storageClaimName = 'nextflowdata'   // must match the PVC name (Kubernetes names are lowercase)
    computeResourceType = 'Job'
    storageMountPath = '/data/wangyang/nf-hello/workDir'
    // list contexts: kubectl config get-contexts
    context = 'kubernetes-admin@kubernetes'
}
Note that the value of storageMountPath must be identical to the value of path in nextflow.yml; otherwise the run fails with an error like the following:
[9d/83fa37] process > sayHello (2) [100%] 4 of 4, failed: 4
Error executing process > 'sayHello (3)'
Caused by:
Process `sayHello (3)` terminated for an unknown reason -- Likely it has been terminated by the external system
Command executed:
echo 'Hello world!'
Command exit status:
-
Command output:
(empty)
Command wrapper:
/bin/bash: .command.run: No such file or directory
Work dir:
/data/wangyang/nf-hello/workDir/3b/8ce589cd8a8a34006b56153378bf7a
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
This happens because, when Nextflow's executor is set to k8s, Nextflow creates each job as a Kubernetes pod. storageMountPath then plays the same role as Docker's -v /data/wangyang/nf-hello/workDir:/data/wangyang/nf-hello/workDir, i.e. it mounts the host directory /data/wangyang/nf-hello/workDir into the container. As shown below, when launching Nextflow the work directory (-w) must point into the Persistent Volume (a directory shared across all nodes of the cluster), so the intermediate files generated during the run are necessarily created under /data/wangyang/nf-hello/workDir/3b/8ce589cd8a8a34006b56153378bf7a; the mount path inside the container must therefore also be /data/wangyang/nf-hello/workDir for the tasks to find them.
Running the Docker container
Here, all necessary host directories are mounted into the container.
Start the container interactively:
docker run --rm \
--user $(id -u):$(id -g) \
-v /etc/passwd:/etc/passwd \
-v /etc/group:/etc/group \
-v $HOME:$HOME \
-v /data:/data \
-v /ssd2:/ssd2 -it \
wybioinfo/nextflow bash
Run the Nextflow script:
cd /ssd1/wy/Down # a directory where user wy has write permission
nextflow run /ssd1/wy/workspace/nf-hello/main.nf -w /data/wangyang/nf-hello/workDir
Run the container directly (non-interactively):
docker run --rm \
--user $(id -u):$(id -g) \
-v /etc/passwd:/etc/passwd \
-v /etc/group:/etc/group \
-v $HOME:$HOME \
-v /data:/data \
-v /ssd2:/ssd2 -it \
-w /ssd1/wy/Down \
wybioinfo/nextflow \
nextflow run /ssd1/wy/workspace/nf-hello/main.nf -w /data/wangyang/nf-hello/workDir
Note that at this point you can add Docker's -d flag to run the process in the background, then monitor it with the following commands:
docker ps
docker stop xxx
docker logs xxx
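For example, a detached variant of the command above (dropping --rm and -it so the logs survive the container's exit; docker run prints the container ID):

docker run -d \
    --user $(id -u):$(id -g) \
    -v /etc/passwd:/etc/passwd \
    -v /etc/group:/etc/group \
    -v $HOME:$HOME \
    -v /data:/data \
    -v /ssd2:/ssd2 \
    -w /ssd1/wy/Down \
    wybioinfo/nextflow \
    nextflow run /ssd1/wy/workspace/nf-hello/main.nf -w /data/wangyang/nf-hello/workDir
docker logs -f <container-id>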
Using Kubernetes as the Nextflow driver
Debugging workflow:
kubectl apply -f nextflow-k8s.yml
kubectl get pod
kubectl describe pod nextflow
sudo crictl images
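The contents of nextflow-k8s.yml are not shown here; a minimal sketch of what such a driver pod could look like (the pod name matches the describe command above, everything else is an assumption):

apiVersion: v1
kind: Pod
metadata:
  name: nextflow
spec:
  restartPolicy: Never
  serviceAccountName: default   # must be allowed to create pods/jobs via RBAC
  containers:
  - name: nextflow
    image: wybioinfo/nextflow   # assumed: any image with nextflow on PATH works
    # assumes main.nf and nextflow.config were copied onto the shared volume
    command: [nextflow, run, /data/wangyang/nf-hello/main.nf, -w, /data/wangyang/nf-hello/workDir]
    volumeMounts:
    - name: vol-1
      mountPath: /data
  volumes:
  - name: vol-1
    persistentVolumeClaim:
      claimName: nextflowdata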
Using the host as the Nextflow driver
nf run /ssd1/wy/workspace/nf-hello/main.nf -w /data/wangyang/nf-hello/workDir
(here nf is presumably a local alias for the nextflow binary installed on the host)
Comparison of the three approaches
- Docker as the Nextflow driver: the Nextflow process can be managed with Docker.
- Kubernetes as the Nextflow driver: the Nextflow process can be managed with Kubernetes.
- Host as the Nextflow driver: convenient for local development and debugging, but the Nextflow process must be managed by hand.
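Appendix: an NFS-based variant of the same storage configuration. Since the NFS export is mounted at the same path on every node, ReadWriteMany holds across the whole cluster, unlike the local PV above, which is pinned to node1: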
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nextflowdata
spec:
  capacity:
    storage: 2Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  nfs:
    server: 192.168.3.60
    path: /data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nextflowdata
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: ""
  resources:
    requests:
      storage: 2Gi