MPI


Install Open MPI on Debian/Ubuntu:

sudo apt install openmpi-bin libopenmpi-dev
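
A quick sanity check that the runtime and the compiler wrappers landed on PATH:

mpirun --version
mpicc --version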
http://www.xtaohub.com/

https://blog.csdn.net/qq_22370527/article/details/109129567

https://blog.csdn.net/weixin_40729260/article/details/125435633


https://www.cnblogs.com/aobaxu/p/16195237.html
List the MPI plugin types your Slurm build supports:

srun --mpi=list

The following notes on building Slurm's PMI libraries come from this Slurm bug report:
https://bugs.schedmd.com/show_bug.cgi?id=7236

alex@polaris:~/slurm/19.05/install/lib$ ls -l
total 70992
-rw-r--r-- 1 alex alex 62102336 Jun 13 16:36 libslurm.a
-rwxr-xr-x 1 alex alex      987 Jun 13 16:36 libslurm.la
lrwxrwxrwx 1 alex alex       18 Jun 13 16:36 libslurm.so -> libslurm.so.34.0.0
lrwxrwxrwx 1 alex alex       18 Jun 13 16:36 libslurm.so.34 -> libslurm.so.34.0.0
-rwxr-xr-x 1 alex alex 10562200 Jun 13 16:36 libslurm.so.34.0.0
drwxr-xr-x 3 alex alex    20480 Jun 13 16:38 slurm
alex@polaris:~/slurm/19.05/install/lib$

(note that there is no libpmi or libpmi2)

You can manually install the libpmi or libpmi2 shipped with Slurm by building contribs/pmi or contribs/pmi2, respectively. Here's an example of installing libpmi2:

alex@polaris:~/slurm/19.05/build/contribs/pmi2$ make -j install

alex@polaris:~/slurm/19.05/install/lib$ ls -l
total 71688
-rw-r--r-- 1 alex alex   490536 Jun 13 18:17 libpmi2.a
-rwxr-xr-x 1 alex alex      961 Jun 13 18:17 libpmi2.la
lrwxrwxrwx 1 alex alex       16 Jun 13 18:17 libpmi2.so -> libpmi2.so.0.0.0
lrwxrwxrwx 1 alex alex       16 Jun 13 18:17 libpmi2.so.0 -> libpmi2.so.0.0.0
-rwxr-xr-x 1 alex alex   214400 Jun 13 18:17 libpmi2.so.0.0.0
-rw-r--r-- 1 alex alex 62102336 Jun 13 16:36 libslurm.a
-rwxr-xr-x 1 alex alex      987 Jun 13 16:36 libslurm.la
lrwxrwxrwx 1 alex alex       18 Jun 13 16:36 libslurm.so -> libslurm.so.34.0.0
lrwxrwxrwx 1 alex alex       18 Jun 13 16:36 libslurm.so.34 -> libslurm.so.34.0.0
-rwxr-xr-x 1 alex alex 10562200 Jun 13 16:36 libslurm.so.34.0.0
drwxr-xr-x 3 alex alex    20480 Jun 13 16:38 slurm
alex@polaris:~/slurm/19.05/install/lib$
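
Presumably libpmi (PMI-1) installs the same way from its own contribs directory; a sketch mirroring the pmi2 example above:

alex@polaris:~/slurm/19.05/build/contribs/pmi$ make -j install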

Without PMI support compiled into Open MPI, direct launch with srun fails like this:

[wangyang-PC:331801] OPAL ERROR: Unreachable in file ext3x_client.c at line 112
--------------------------------------------------------------------------
The application appears to have been direct launched using "srun",
but OMPI was not built with SLURM's PMI support and therefore cannot
execute. There are several options for building PMI support under
SLURM, depending upon the SLURM version you are using:

  version 16.05 or later: you can use SLURM's PMIx support. This
  requires that you configure and build SLURM --with-pmix.

  Versions earlier than 16.05: you must use either SLURM's PMI-1 or
  PMI-2 support. SLURM builds PMI-1 by default, or you can manually
  install PMI-2. You must then build Open MPI using --with-pmi pointing
  to the SLURM PMI library location.

Please configure as appropriate and try again.
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[wangyang-PC:331801] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
srun: error: wangyang-PC: task 0: Exited with exit code 1
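
A minimal sketch of the two configure routes the message describes; the prefixes are assumptions reusing paths that appear elsewhere on this page:

# Slurm 16.05 or later: build Slurm itself against an external PMIx
./configure --prefix=$HOME/slurm/19.05/install --with-pmix=/usr/local/pmix3

# earlier Slurm: instead build Open MPI against Slurm's PMI/PMI2 libraries
./configure --prefix=/usr/local/openmpi --with-slurm --with-pmi=$HOME/slurm/19.05/install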

$ cat > hello.cpp <<'EOF'
#include "mpi.h"
#include <iostream>
int main(int argc, char* argv[])
{
        int rank;
        int size;
        MPI_Init(&argc, &argv);                  // initialize the MPI runtime
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);    // this process's rank
        MPI_Comm_size(MPI_COMM_WORLD, &size);    // total number of processes

        std::cout << "Hello world from process " << rank << " of " << size << std::endl;

        MPI_Finalize();                          // shut the runtime down

        return 0;
}
EOF
$ mpicxx hello.cpp -o hello
$ srun -p test --mpi=pmi2 -n 4 ./hello
Hello world from process 1 of 4
Hello world from process 2 of 4
Hello world from process 0 of 4
Hello world from process 3 of 4

Python MPI (mpi4py):
https://blog.csdn.net/weixin_39594457/article/details/110780781
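
mpi4py itself usually installs with pip against whatever MPI compiler wrapper is on PATH (a sketch; nothing beyond the stock package is assumed):

pip install mpi4py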

cat > env.sh <<'EOF'
#!/bin/bash
UCX=/usr/local/ucx
OPENMPI=/usr/local/openmpi
PMIX3=/usr/local/pmix3
LIBEVENT=/usr/local/libevent
HWLOC=/usr/local/hwloc
export PATH=$UCX/bin:$OPENMPI/bin:$PMIX3/bin:$LIBEVENT/bin:$HWLOC/bin:$PATH
export LD_LIBRARY_PATH=$UCX/lib:$OPENMPI/lib:$PMIX3/lib:$LIBEVENT/lib:$HWLOC/lib:$LD_LIBRARY_PATH
export OMPI_ALLOW_RUN_AS_ROOT=1
export OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1
EOF
source env.sh
cd openmpi-4.0.5 
cd examples
make hello_c
[root@mn0 examples]# mpirun -np 1 ./hello_c
Hello, world, I am 0 of 1, (Open MPI v4.0.5, package: Open MPI root@mn0 Distribution, ident: 4.0.5, repo rev: v4.0.5, Aug 26, 2020, 103)
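
With this PMIx-enabled stack, the same binary should also direct-launch under Slurm, assuming pmix appears in srun --mpi=list:

srun -n 1 --mpi=pmix ./hello_c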
cat > demo.py <<'EOF'
from mpi4py import MPI
import sys
import time

size = MPI.COMM_WORLD.Get_size()
rank = MPI.COMM_WORLD.Get_rank()
name = MPI.Get_processor_name()

sys.stdout.write("Hello, World! I am process %d of %d on %s.\n" % (rank, size, name))
time.sleep(2)
EOF

mpirun python demo.py
srun -n 2 --mpi=pmix python demo.py
https://mpi4py.readthedocs.io/en/stable/index.html

If Anaconda's mpicc wrapper fails because its cross-compiler binary is missing:

  /data3/cluster/anaconda3/bin/mpicc: line 301: x86_64-conda_cos6-linux-gnu-cc: command not found
  failure.

install the conda compiler toolchain:

conda install gxx_linux-64
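
Afterwards the wrapper should find its underlying compiler again; a quick check (the wrapper just forwards --version to the real compiler):

mpicc --version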

https://github.com/RcppCore/Rcpp/issues/770

https://hcc.unl.edu/docs/submitting_jobs/