Learning materials
https://github.com/nextflow-io/nextflow/blob/ea7d9f3a42641b7dcc8c169a8a075eb43d60f831/plugins/nf-google/src/main/nextflow/cloud/google/batch/GoogleBatchTaskHandler.groovy#L409-L418
https://github.com/nextflow-io/nextflow/blob/a5a6a1fc8f76a9a7ebbe98cc94b5356f2a23bdec/modules/nextflow/src/main/groovy/nextflow/k8s/K8sTaskHandler.groovy#L347-L360
k8s.debug.yaml = true (saves the Job/Pod spec Nextflow generates for each task into the task work directory)
https://github.com/nextflow-io/nextflow/issues/4530

An example of a Job spec generated this way:
apiVersion: batch/v1
kind: Job
metadata: &id001
  name: nf-38f9d12733091e2f0be300f7f3127c01
  namespace: default
  labels: {nextflow.io/processName: METAPHLAN, nextflow.io/runName: compassionate_cuvier, nextflow.io/sessionId: uuid-42a39443-208d-4d3e-9d86-e73dfc0ae68b, nextflow.io/app: nextflow, nextflow.io/taskName: METAPHLAN_name_m5_dataKey_KY_2312220836530126_species_human}
spec:
  backoffLimit: 0
  template:
    metadata: *id001
    spec:
      restartPolicy: Never
      containers:
      - name: nf-38f9d12733091e2f0be300f7f3127c01
        image: 192.168.3.60:5001/metaphlan:4.0.2
        args: [/bin/bash, -ue, /data/k8sData/42a39443-208d-4d3e-9d86-e73dfc0ae68b/38/f9d12733091e2f0be300f7f3127c01/.command.run]
        resources:
          requests: {cpu: 30, memory: 102400Mi}
          limits: {memory: 102400Mi}
        volumeMounts:
        - {name: vol-1, mountPath: /data}
      serviceAccountName: default
      volumes:
      - name: vol-1
        persistentVolumeClaim: {claimName: nfdata}
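To capture a spec like the one above from your own run, the option can be supplied through an extra config file. A minimal sketch in shell form (the file name k8s-debug.config and the work directory are illustrative, and the exact name of the saved file may vary with the Nextflow version):

# Write the option into a small extra config file and pass it with -c.
cat > k8s-debug.config <<'EOF'
k8s.debug.yaml = true
EOF

nextflow run main.nf -c k8s-debug.config -w /data/wangyang/nf-hello/workDir

# After the run, the submitted spec should sit next to the other .command.* files
# in each task work directory (e.g. <workDir>/<hash>/.command.yaml).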
The k8s executor allows you to run a pipeline on a Kubernetes cluster.
The workflow execution needs to be submitted from a computer able to connect to the Kubernetes cluster. Nextflow uses the Kubernetes configuration file available at the path $HOME/.kube/config, or the file specified by the environment variable KUBECONFIG, i.e. the same credentials kubectl reads from $HOME/.kube/ (you can verify the connection with kubectl cluster-info). In addition, the pipeline work directory must live on shared storage: a Persistent Volume with ReadWriteMany access mode, made available to the task pods through a Persistent Volume Claim.
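Before submitting anything it is worth checking that the machine you launch from can actually reach the cluster with the credentials Nextflow will use. A quick sanity check (the context listed here is the one referenced later in nextflow.config):

kubectl config get-contexts      # lists the contexts available in $HOME/.kube/config or $KUBECONFIG
kubectl cluster-info             # should print the control-plane address if the connection works
kubectl auth can-i create pods   # the submitting user needs permission to create pods/jobs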
Create a file named nextflow.yml with the following content:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nextflowdata
spec:
  capacity:
    storage: 2Gi
  volumeMode: Filesystem        # Filesystem or Block
  accessModes:
    - ReadWriteMany             # the volume can be mounted read-write by many nodes
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /ssd1/wy/workspace/nf-hello/workDir
  nodeAffinity:
    required:                   # restrict the volume to a specific node via its hostname
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - node1
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nextflowdata
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: "local-storage"
  resources:
    requests:
      storage: 2Gi
Note that the value of accessModes must be ReadWriteMany.
Run the following command:
kubectl apply -f nextflow.yml
You can inspect the created StorageClass (sc), PersistentVolume (pv) and PersistentVolumeClaim (pvc) with the following commands:
kubectl get sc
kubectl get pv
kubectl get pvc
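In particular, you can confirm the ReadWriteMany requirement mentioned above (resource names taken from nextflow.yml):

kubectl get pv  nextflowdata -o jsonpath='{.spec.accessModes}{"\n"}'
kubectl get pvc nextflowdata -o jsonpath='{.spec.accessModes}{"\n"}'
# both commands should print ["ReadWriteMany"]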
Create a test main.nf:
#!/usr/bin/env nextflow
nextflow.enable.dsl=2

process sayHello {
    container "ubuntu:rolling"

    input:
    val x

    output:
    stdout

    script:
    """
    echo '$x world!'
    """
}

println params.name1
println params.name2

workflow {
    Channel.of('Bonjour', 'Ciao', 'Hello', 'Hola') | sayHello | view
}
Create the configuration file nextflow.config:
process.container = 'quay.io/nextflow/bash'

params {
    name1 = 'value1'
    name2 = 'value2'
}

process {
    executor = 'k8s'
}

k8s {
    storageClaimName = 'nextflowdata'   // must match the PVC name created in nextflow.yml
    computeResourceType = 'Job'
    storageMountPath = '/data/wangyang/nf-hello/workDir'
    // list the available contexts with: kubectl config get-contexts
    context = 'kubernetes-admin@kubernetes'
}
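Two settings here are easy to get wrong: storageClaimName must exactly match the name of the PVC created earlier, and context must be one of the contexts known to kubectl. The params block is ordinary Nextflow configuration and can be overridden per run. A quick check plus an illustrative run (the --name1/--name2 values are arbitrary):

kubectl get pvc                  # NAME must equal k8s.storageClaimName (here: nextflowdata)
kubectl config get-contexts      # NAME gives the value for k8s.context

# params defined in nextflow.config can be overridden on the command line:
nextflow run main.nf -w /data/wangyang/nf-hello/workDir --name1 Bonjour --name2 Ciao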
Note that the value of storageMountPath must be the same as the value of path in nextflow.yml, otherwise you will get the following error:
[9d/83fa37] process > sayHello (2) [100%] 4 of 4, failed: 4
Error executing process > 'sayHello (3)'

Caused by:
  Process `sayHello (3)` terminated for an unknown reason -- Likely it has been terminated by the external system

Command executed:

  echo 'Hello world!'

Command exit status:
  -

Command output:
  (empty)

Command wrapper:
  /bin/bash: .command.run: No such file or directory

Work dir:
  /data/wangyang/nf-hello/workDir/3b/8ce589cd8a8a34006b56153378bf7a

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
This happens because, when Nextflow's executor is set to k8s, every job is created as a Kubernetes pod. The storageMountPath setting then plays the same role as Docker's -v /data/wangyang/nf-hello/workDir:/data/wangyang/nf-hello/workDir, i.e. the host directory /data/wangyang/nf-hello/workDir is mounted into the container at the same path.
As we will see shortly, when running Nextflow we must set its work directory (-w) to a directory created on the Persistent Volume (this directory is shared across all nodes of the cluster). The intermediate files Nextflow produces are therefore created under paths such as /data/wangyang/nf-hello/workDir/3b/8ce589cd8a8a34006b56153378bf7a, so the directory inside the container must also be /data/wangyang/nf-hello/workDir for the tasks to access them.
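A simple way to confirm that the driver and the task containers really see the same directory is to look for the generated wrapper scripts under the shared path; the task hash below is the one from the error message above and is only an example:

# Run this on any node that mounts the Persistent Volume (or inside a task container):
ls -la /data/wangyang/nf-hello/workDir/3b/8ce589cd8a8a34006b56153378bf7a/
# You should see .command.run, .command.sh and the other task files.
# If the directory is missing or empty inside the container, the mount path does not
# match the work directory and tasks will fail exactly as shown above.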
Here all the necessary host directories are mounted into the container:
docker run --rm \
    --user $(id -u):$(id -g) \
    -v /etc/passwd:/etc/passwd \
    -v /etc/group:/etc/group \
    -v $HOME:$HOME \
    -v /data:/data \
    -v /ssd2:/ssd2 \
    -it \
    wybioinfo/nextflow bash
Run the Nextflow script:
cd /ssd1/wy/Down   # a directory the user wy has write permission for
nextflow run /ssd1/wy/workspace/nf-hello/main.nf -w /data/wangyang/nf-hello/workDir
Alternatively, the container can be started with just the current directory mounted:

docker run --rm -v $PWD:$PWD -it wybioinfo/nextflow bash
The container start and the Nextflow run can also be combined into a single command:

docker run --rm \
    --user $(id -u):$(id -g) \
    -v /etc/passwd:/etc/passwd \
    -v /etc/group:/etc/group \
    -v $HOME:$HOME \
    -v /data:/data \
    -v /ssd2:/ssd2 \
    -it \
    -w /ssd1/wy/Down \
    wybioinfo/nextflow \
    nextflow run /ssd1/wy/workspace/nf-hello/main.nf -w /data/wangyang/nf-hello/workDir
Note that you can add Docker's -d flag to run this process in the background, and then monitor it with the following commands:
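For example, the single-command variant above can be detached like this (a sketch: --name nf-hello is added here only so the container is easy to find, and -it/--rm are dropped because the run is non-interactive):

docker run -d --name nf-hello \
    --user $(id -u):$(id -g) \
    -v /etc/passwd:/etc/passwd \
    -v /etc/group:/etc/group \
    -v $HOME:$HOME \
    -v /data:/data \
    -v /ssd2:/ssd2 \
    -w /ssd1/wy/Down \
    wybioinfo/nextflow \
    nextflow run /ssd1/wy/workspace/nf-hello/main.nf -w /data/wangyang/nf-hello/workDir
# follow the Nextflow console output with: docker logs -f nf-hello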
docker ps
docker stop xxx
docker logs xxx
kubectl apply -f nextflow-k8s.yml
kubectl get pod
kubectl describe pod nextflow
sudo crictl images
nf run /ssd1/wy/workspace/nf-hello/main.nf -w /data/wangyang/nf-hello/workDir
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nextflowdata
spec:
  capacity:
    storage: 2Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  nfs:
    server: 192.168.3.60
    path: /data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nextflowdata
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: ""
  resources:
    requests:
      storage: 2Gi
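If you go with this NFS variant, it is worth verifying that the export is actually reachable from the worker nodes before pointing the PV at it; a rough check (assuming the NFS client utilities are installed on the node):

showmount -e 192.168.3.60                  # the export list should contain /data
sudo mount -t nfs 192.168.3.60:/data /mnt  # try a manual mount ...
ls /mnt && sudo umount /mnt                # ... list the contents, then unmount again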