Learning materials
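SLURM (Simple Linux Utility for Resource Management) is an open-source cluster management and job scheduling system. It is widely used in high-performance computing (HPC) environments. It manages compute resources and schedules jobs so that the cluster's resources are used efficiently. For how to use Slurm, see here.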
This article shows how to install Slurm inside a single Docker container. For the complete Dockerfile and configuration files, see here.
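The configuration files referenced above are not reproduced in this article. The hypothetical helper `write_slurm_conf` below is a rough sketch of what a single-node slurm.conf can look like. Every value in it is an assumption to check against the actual files and against the Slurm version from Ubuntu 22.04's packages. That includes the cluster name, the plugin choices, and the state, PID and log paths. It uses `proctrack/linuxproc` rather than cgroup-based process tracking, so that it does not depend on cgroup support inside the container.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes a minimal single-node slurm.conf to the given path.
# All values are assumptions for a one-container setup, not the author's file.
write_slurm_conf() {
  local out=$1 host cpus
  # slurmd finds its NodeName by the short hostname, so use it for both hosts
  host=$(uname -n | cut -d. -f1)
  # Advertise every CPU visible inside the container
  cpus=$(nproc)
  cat > "$out" <<EOF
ClusterName=docker
SlurmctldHost=${host}
AuthType=auth/munge
SlurmUser=slurm
ProctrackType=proctrack/linuxproc
TaskPlugin=task/none
ReturnToService=2
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_CPU
StateSaveLocation=/var/lib/slurm/slurmctld
SlurmdSpoolDir=/var/lib/slurm/slurmd
SlurmctldPidFile=/run/slurmctld.pid
SlurmdPidFile=/run/slurmd.pid
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdLogFile=/var/log/slurm/slurmd.log
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=localhost
NodeName=${host} CPUs=${cpus} State=UNKNOWN
PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
EOF
}
```

For example, source the file and call `write_slurm_conf /etc/slurm/slurm.conf` as root before the Slurm daemons start. Generating the file at runtime instead of COPYing it at build time lets NodeName and CPUs follow the machine the container actually runs on. The commented-out `sed` line in the entrypoint script aims at the same goal.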
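The container must run the following services: slurmctld, slurmd, slurmdbd, slurmrestd, and JupyterLab with the jupyterlab_slurm extension. The entrypoint script also starts MUNGE and MariaDB, which Slurm authentication and slurmdbd depend on.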
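To confirm that these daemons are actually running (for example after `docker exec` into the container), a small check can scan /proc for matching process names. This is a sketch. The function names are hypothetical, and the process names are assumptions: `munged` for the MUNGE daemon, plus `slurmd`, `slurmdbd`, `slurmctld` and `slurmrestd`. JupyterLab runs as a Python process, so it is not matched by name here.

```shell
#!/usr/bin/env bash
# Sketch: report whether a process with each given name exists.
# Reads /proc/<pid>/comm directly, so it is Linux-only and needs no extra packages.
is_running() {
  local want=$1 f name
  for f in /proc/[0-9]*/comm; do
    # A process may exit between the glob and the read; skip it quietly
    { read -r name < "$f"; } 2>/dev/null || continue
    [[ $name == "$want" ]] && return 0
  done
  return 1
}

# Print one status line per daemon name passed as an argument
check_services() {
  local d
  for d in "$@"; do
    if is_running "$d"; then
      echo "$d: running"
    else
      echo "$d: NOT running"
    fi
  done
}
```

Usage: `check_services munged slurmd slurmdbd slurmctld slurmrestd`. This complements the `service --status-all` output that the entrypoint script prints.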
cd master
docker build -t wangyang1749/slurm-all:1.0 .
docker run --name slurm-all --privileged --network=host --rm -it wangyang1749/slurm-all:1.0
docker exec -it slurm-all bash
FROM ubuntu:22.04

# Set before any apt install so package installation never stops at an interactive prompt
ARG DEBIAN_FRONTEND=noninteractive

# Base tools, MUNGE (authentication), MariaDB (accounting database) and the Slurm daemons
RUN apt-get update -y && apt-get install -y \
        munge vim build-essential git wget sudo mariadb-server \
        slurmd slurm-client slurmctld slurmdbd slurmrestd
# RUN apt install sudo -y && apt install python3.9 python3-pip -y

# User "admin" (password "admin") with passwordless sudo
RUN useradd -m admin -s /usr/bin/bash -d /home/admin && \
    echo "admin:admin" | chpasswd && \
    adduser admin sudo && \
    echo "admin ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers
RUN echo "root ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers
# RUN apt update -y && apt install libopenmpi-dev -y && pip3 install mpi4py

# Node.js 18 from NodeSource, then JupyterLab and the jupyterlab_slurm extension
RUN wget https://deb.nodesource.com/setup_18.x
RUN bash ./setup_18.x
RUN apt-get install nodejs -y
RUN apt-get install python3-pip -y
RUN pip install jupyterlab
RUN pip install jupyterlab_slurm

# Slurm configuration and the startup script
COPY slurm.conf /etc/slurm/
COPY cgroup.conf /etc/slurm/
COPY slurmdbd.conf /etc/slurm/
COPY docker-entrypoint.sh /etc/slurm/
# 6817: slurmctld, 6818: slurmd, 6819: slurmdbd, 3306: MariaDB
EXPOSE 6817 6818 6819 3306
# slurmdbd requires its config file to be readable by its owner only
RUN chmod 600 /etc/slurm/slurmdbd.conf

WORKDIR /home/admin
COPY initialize-mariadb.sh .
COPY start-slurmrestd.sh .
ENV USER=admin
ENTRYPOINT ["/etc/slurm/docker-entrypoint.sh"]
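The Dockerfile copies `slurmdbd.conf` and `initialize-mariadb.sh` but their contents are not shown here. The entrypoint feeds `initialize-mariadb.sh` to `mysql` on standard input, so despite its `.sh` extension it has to contain SQL. The hypothetical helper `write_dbd_files` below sketches both files. It assumes a database named `slurm_acct_db`, a MariaDB user `slurm` and a placeholder password; the two files must agree on all three values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes slurmdbd.conf and the SQL that the entrypoint
# pipes into `mysql -u root`. Database, user and password are placeholders.
write_dbd_files() {
  local dbd_conf=$1 sql_file=$2 db_pass=$3

  cat > "$dbd_conf" <<EOF
AuthType=auth/munge
DbdHost=localhost
SlurmUser=slurm
StorageType=accounting_storage/mysql
StorageHost=localhost
StorageUser=slurm
StoragePass=${db_pass}
StorageLoc=slurm_acct_db
LogFile=/var/log/slurm/slurmdbd.log
PidFile=/run/slurmdbd.pid
EOF
  # Same mode as the Dockerfile's chmod: slurmdbd rejects a world-readable config
  chmod 600 "$dbd_conf"

  # Plain SQL, even though the file name ends in .sh
  cat > "$sql_file" <<EOF
CREATE DATABASE IF NOT EXISTS slurm_acct_db;
CREATE USER IF NOT EXISTS 'slurm'@'localhost' IDENTIFIED BY '${db_pass}';
GRANT ALL PRIVILEGES ON slurm_acct_db.* TO 'slurm'@'localhost';
FLUSH PRIVILEGES;
EOF
}
```

For example, running `write_dbd_files slurmdbd.conf initialize-mariadb.sh change-me` in the build directory before `docker build` produces both files for the COPY steps. The password must not contain a single quote, because it is pasted into an SQL string literal.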
docker-entrypoint.sh
#!/bin/bash
# Optional: write this machine's CPU count into slurm.conf (the Dockerfile copies it to /etc/slurm)
# sudo sed -i "s/REPLACE_IT/CPUs=$(nproc)/g" /etc/slurm/slurm.conf

###################################################
# Authentication service
sudo service munge start
###################################################

###################################################
# Compute node
sudo service slurmd start
###################################################

###################################################
# Database setup
sudo service mariadb start
sudo mysql -u root < initialize-mariadb.sh
sudo service slurmdbd start
###################################################

###################################################
# Controller node
sleep 2
echo 'starting slurmctld'
sudo service slurmctld start
###################################################

###################################################
# slurmrestd setup: running slurmrestd requires starting the container with --privileged
# docker run --name slurm-all --privileged --network=host --rm -it wangyang1749/slurm-all:1.0
# echo 'starting slurmrestd'
# sudo -u admin sh start-slurmrestd.sh &
###################################################

sudo -u admin jupyter lab --no-browser --allow-root --ip=0.0.0.0 --NotebookApp.token='' --NotebookApp.password='' &

echo "Started services:"
service --status-all

# slurmctld -D
# Keep the container running in the foreground
tail -f /dev/null

# docker build -t wangyang1749/slurm-all:1.0 .
# docker run --name slurm-all-master --network=host --rm wangyang1749/slurm-all-master:1.0
# docker image push wangyang1749/slurm-all-master:1.0
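The entrypoint relies on a fixed `sleep 2` to give slurmdbd time to start before slurmctld. A possibly more robust alternative polls slurmdbd's port until it accepts connections. That port is 6819, Slurm's default DbdPort and one of the ports in the Dockerfile's EXPOSE line. `wait_for_port` is a hypothetical helper built on bash's `/dev/tcp` redirection, so it needs no extra packages.

```shell
#!/usr/bin/env bash
# Sketch: wait until host:port accepts TCP connections, giving up after
# roughly `timeout` seconds. Returns 0 on success, 1 on timeout.
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-30} i
  for ((i = 0; i < timeout; i++)); do
    # The subshell opens the connection and closes it again when it exits
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Possible use in docker-entrypoint.sh, in place of `sleep 2`:
#   sudo service slurmdbd start
#   wait_for_port 127.0.0.1 6819 30 || echo "slurmdbd did not come up" >&2
#   sudo service slurmctld start
```

The same helper could also wait for MariaDB on port 3306 before running `initialize-mariadb.sh`.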
docker run --name slurm-all-master -p 8888:8888 --rm -it wangyang1749/slurm-all-master:1.0
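Then open http://localhost:8888 in a browser to reach JupyterLab. When the container is started with --network=host, as in the first docker run command, port 8888 is reachable without -p.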
Submitting a job to Slurm
#!/bin/bash
sleep 200
hostname
Submit the job script with sbatch.
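The job script above has no `#SBATCH` directives. It therefore runs with the cluster defaults and writes its output to `slurm-<jobid>.out` in the submission directory. The sketch below writes the same job with a few common directives and submits it with `sbatch --parsable`, which prints only the job ID. The job name and time limit are illustrative assumptions, and `write_job_script` and `submit_job` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes the sleep/hostname job with #SBATCH directives.
write_job_script() {
  local out=$1
  # Quoted heredoc: the script body is written literally, with no expansion
  cat > "$out" <<'EOF'
#!/bin/bash
#SBATCH --job-name=sleep-test
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --output=slurm-%j.out
sleep 200
hostname
EOF
}

# Hypothetical helper: submits a script and prints the job ID.
submit_job() {
  local script=$1
  if command -v sbatch >/dev/null 2>&1; then
    # --parsable prints "jobid" (or "jobid;cluster") instead of a sentence
    sbatch --parsable "$script"
  else
    echo "sbatch not found; run this inside the slurm-all container" >&2
    return 127
  fi
}
```

After `write_job_script job.sh && submit_job job.sh`, `squeue` lists the job for about 200 seconds (because of `sleep 200`), and `scontrol show job <jobid>` shows its details. The `hostname` output appears in `slurm-<jobid>.out` once the job finishes.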