The GPU compute cluster currently offers several hybrid nodes with a balanced CPU/RAM-to-GPU ratio. Access may be granted on request by assigning additional user permissions. The cluster uses the workload manager Slurm. You may start your jobs via the login node *login.gpu.cit-ec.net*. For compute tasks, using the Slurm scheduler is mandatory.
Although the cluster nodes run the TechFak netboot installation, the locations `/homes` and `/vol` are not available. Instead, dedicated homes and vol directories are located under `/media/compute/`. On initial login your compute home directory will be provisioned under `/media/compute/homes/user`. It is separate from your regular TechFak home location `/homes/user`. The compute home is accessible via [files.techfak.de](https://www.techfak.net/dienste/remote/files) and from a regular TechFak netboot system like *compute.techfak.de*.
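If you need to move data from your regular TechFak home into the compute home, a simple copy from a netboot machine works. The following is a minimal sketch, assuming the compute tree is mounted under the same `/media/compute/` path there; replace *user* and the directory names with your own:

```bash
# Run on a TechFak netboot system such as compute.techfak.de.
# Copies a dataset from the regular home into the compute home
# (paths and names are illustrative).
rsync -av ~/datasets/ /media/compute/homes/user/datasets/
```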
## Support Channel
On the university's Matrix service, join the channel `#citec-gpu:uni-bielefeld.de`.
## Slurm Basics
Slurm jobs are scheduled with the Slurm client tools. For a brief introduction, check out the [Slurm Quickstart](https://slurm.schedmd.com/quickstart.html).
There are several ways to schedule a task in a Slurm-controlled system.
The main paradigm for job scheduling in a Slurm cluster is `sbatch`. It schedules a job and requests the resources claimed by the user.
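A minimal sketch of such a batch script could look like the following (job name, resource values, and the payload script are made up for illustration):

```bash
#!/bin/bash
#SBATCH --job-name=example          # illustrative job name
#SBATCH --cpus-per-task=2           # resources claimed for the job
#SBATCH --output=example.%j.out     # %j expands to the Slurm job ID

srun ./my_payload.sh                # hypothetical payload script
```

Submit it with `sbatch my_script.sbatch`; Slurm queues the job until the requested resources become free.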
The cluster provides two types of resources: CPUs and GPUs, which can be requested for jobs in variable amounts.
The GPUs in the cluster come in two flavours: the GPU objects *tesla* and *gtx*.
You may request a single GPU object via the option `--gres=gpu:1`. The Slurm scheduler reserves one GPU object exclusively for your job and schedules jobs according to the free resources.
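For example, an interactive test could request one GPU object directly with `srun` (the command run on the node is just an illustration):

```bash
# Request one GPU object and check which device was assigned.
srun --gres=gpu:1 nvidia-smi
```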
CPUs are requested with the `-c` or `--cpus-per-task=` option. For further information, have a look at the man pages of `srun` and `sbatch`. Reading the [Slurm documentation](https://slurm.schedmd.com/documentation.html) is also highly recommended.
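A short sketch of a CPU request on the command line (core count and payload name are illustrative):

```bash
# Request four CPU cores for a single task.
srun --cpus-per-task=4 ./my_cpu_program
```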
The commands `sinfo` and `squeue` provide detailed information about the cluster's state and the jobs running.
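Typical invocations (the `-u` filter is a standard Slurm option for showing only your own jobs):

```bash
sinfo                # partitions, node states, and availability
squeue               # all queued and running jobs
squeue -u "$USER"    # only your own jobs
```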
### CPU management and CPU-only jobs
Though the facility is called a GPU cluster, it is also appropriate for CPU-only computing, as it provides not only 12 GPUs but also 240 CPU cores. Effective utilization of the CPU resources can be tricky, so you should familiarize yourself with [CPU management](https://slurm.schedmd.com/cpu_management.html).
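A CPU-only batch job could look like the following sketch (core count, names, and payload are illustrative; partition selection is covered in the next section):

```bash
#!/bin/bash
#SBATCH --job-name=cpu-only
#SBATCH --cpus-per-task=16        # CPU cores only, no --gres=gpu requested
#SBATCH --output=cpu-only.%j.out

srun ./analysis.sh                # hypothetical CPU-bound payload
```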
### Choosing the appropriate partition
The cluster offers two partitions. Partitions can be considered as separate queues with slightly different features.
Partition selection is done with the parameter `-p` or `--partition=` in your *srun* commands and *sbatch* scripts. The default partition is *cpu*. Jobs which aren't mapped to a partition will be started there.
We have the *cpu* and the *gpu* partition. If you have a CPU-only job (not requesting any GPU resources with `--gres=gpu:n`), you should start it on the *cpu* partition.
A job using a GPU should be started on the *gpu* partition, with one exception: jobs which request one GPU (with `--gres=gpu:1`) and more than 2 CPUs (with the `-c` or `--cpus-per-task` option) should use the *cpu* partition.
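The partition rules above could translate into submissions like these (script names are made up for illustration):

```bash
# CPU-only job -> cpu partition
sbatch -p cpu cpu_only.sbatch

# Multi-GPU job -> gpu partition
sbatch -p gpu --gres=gpu:2 train.sbatch

# The exception: one GPU but more than 2 CPUs -> cpu partition
sbatch -p cpu --gres=gpu:1 -c 8 mixed.sbatch
```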
The reason for this policy is not obvious and will be explained under *GPU Blocking*.
The example `example.job.sbatch` requests one GTX 1080 Ti for the job and calls the payload `example.job.sh` via `srun`.
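For orientation, a sketch of what such a script might contain follows; the actual `example.job.sbatch` on the cluster may differ, and the `gtx` GRES type name is an assumption based on the GPU objects named above:

```bash
#!/bin/bash
#SBATCH --partition=gpu            # one GPU, at most 2 CPUs -> gpu partition
#SBATCH --gres=gpu:gtx:1           # assumed GRES type for the GTX 1080 Ti
#SBATCH --output=example.job.%j.out

srun ./example.job.sh              # payload script named in the text
```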