
I'm using a Slurm-based HPC cluster at my university to run memory-intensive software, and I need to know whether it is possible to distribute the required RAM across multiple nodes and partitions. My lab has exclusive access to one node in the 'uni' partition with 384 GB of RAM. However, my current models require more memory than that, so I also need to use an additional partition called 'work'.

I'm aiming to combine RAM from both the 'uni' and 'work' partitions: the 'uni' node provides 384 GB, and I plan to add two nodes from the 'work' partition, each with approximately 255 GB of idle RAM, for a total of around 800 GB.
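(For reference, the configured and currently free memory of each node can be listed with sinfo, for example:

sinfo -N -p uni,work -o "%N %P %m %e"

where %m is the configured memory in MB and %e the currently free memory.)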

I've attempted the following configurations:

Attempt 1 (using both partitions):

#SBATCH --partition=uni,work
#SBATCH --time=10:00:00
#SBATCH --nodes=3
#SBATCH --nodelist=n008,n010,n011
#SBATCH --ntasks=64
#SBATCH --cpus-per-task=1
#SBATCH --mem=800G

Attempt 2 (using only 'work' partition):

#SBATCH --partition=work
#SBATCH --time=10:00:00
#SBATCH --nodes=4
#SBATCH --nodelist=n010,n011,n012,n013
#SBATCH --ntasks=64
#SBATCH --cpus-per-task=1
#SBATCH --mem=800G

Attempt 3 (without specifying node list):

#SBATCH --partition=work
#SBATCH --time=10:00:00
#SBATCH --nodes=4
#SBATCH --ntasks=64
#SBATCH --cpus-per-task=1
#SBATCH --mem=800G

All attempts resulted in the following error: "sbatch: error: Batch job submission failed: Requested node configuration is not available"

Can you advise on how to properly configure my job to use the required memory across multiple nodes or partitions? I think the --mem parameter applies to just one node, right?
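From re-reading the sbatch man page, --mem is indeed a per-node limit, so my current understanding is that a satisfiable request has to keep --mem within what a single node offers, something like the sketch below (untested; ./my_solver is a placeholder for the actual MPI-enabled executable):

#!/bin/bash
#SBATCH --partition=work
#SBATCH --time=10:00:00
#SBATCH --nodes=4
#SBATCH --ntasks=64
#SBATCH --cpus-per-task=1
#SBATCH --mem=200G            # per NODE: 4 x 200 GB = 800 GB aggregate

# srun spreads the 64 MPI ranks over the 4 allocated nodes;
# ./my_solver stands in for the real MPI-enabled binary (placeholder)
srun ./my_solver

Even then, each rank can presumably only address the memory of the node it runs on.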

  • Without rewriting your program to use a distributed-memory paradigm, such as MPI or co-array Fortran, it is almost certainly impossible. I'm assuming you can only run your program as a single process; is this correct?
    – Ian Bush
    Commented Jul 1 at 18:00
  • You can use --mem=0 to request all available memory on the nodes you are requesting. This doesn't get past the restriction that a single process cannot access memory beyond what's on its own node. As Ian mentioned, you need a multi-process program where every node runs at least one process, possibly more. (A sketch applying this appears after the comments below.)
    Commented Jul 2 at 0:05
  • The software (link) uses OpenMPI to run. I can set CPUs (ntasks) beyond the capacity of a single node, but when I tried the same with RAM it did not work. Occasionally, the following error also occurs: "ORTE has lost communication with a remote daemon. HNP daemon : [[41960,0],0] on node n024 Remote daemon: [[41960,0],6] on node n033 This is usually due to either a failure of the TCP network connection to the node, or possibly an internal failure of the daemon itself. We cannot recover from this failure, and therefore will terminate..."
    – Zoranis
    Commented Jul 2 at 2:10
  • OK, if the program already uses MPI, it should in principle be able to do what you want. What is possible and how to do it depends on exactly what your local Slurm setup is, which we don't know. I suggest you are better off asking your local support, who will know this.
    – Ian Bush
    Commented Jul 2 at 6:30
  • Thank you @IanBush, I will contact the HPC team from the university to check this issue.
    – Zoranis
    Commented Jul 2 at 7:20
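Following up on the comments: a sketch combining --mem=0 with a Slurm heterogeneous job, which is the usual mechanism for letting a single job span two partitions. This is untested and assumes a reasonably recent Slurm (the hetjob syntax) and an OpenMPI build that can launch across het groups; the local HPC team would have to confirm both. As before, ./my_solver is a placeholder:

#!/bin/bash
#SBATCH --time=10:00:00
# Component 0: the lab's node in 'uni'
#SBATCH --partition=uni
#SBATCH --nodes=1
#SBATCH --ntasks=32
#SBATCH --mem=0               # request all memory on the node (384 GB)
#SBATCH hetjob
# Component 1: two nodes in 'work'
#SBATCH --partition=work
#SBATCH --nodes=2
#SBATCH --ntasks=32
#SBATCH --mem=0               # all memory on each node (~255 GB currently idle)

# One MPI launch across both components (64 ranks in total);
# ./my_solver stands in for the actual MPI-enabled binary.
srun --het-group=0,1 ./my_solver

Note that --mem=0 only removes the per-node cap on the request; each MPI rank is still limited to the RAM of the node it runs on.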
