The GPU compute cluster currently offers several hybrid nodes with a balanced CPU/RAM-to-GPU ratio. Access may be granted upon request through additional user permissions. The cluster uses the Slurm workload manager. You may start your jobs via the login node *login.gpu.cit-ec.net*. For compute tasks, using the Slurm scheduler is mandatory.
Although the cluster nodes run the TechFak netboot installation, the locations `/homes` and `/vol` are not available. Instead, dedicated homes and vol directories are located under `/media/compute/`. On initial login your compute home directory will be provisioned under `/media/compute/homes/user`. It is separate from your regular TechFak home location `/homes/user`. The compute home is accessible via [files.techfak.de](https://www.techfak.net/dienste/remote/files) and from a regular TechFak netboot system such as *compute.techfak.de*.
## Support Channel
On the university's Matrix service, join the channel `#citec-gpu:uni-bielefeld.de`.
## Slurm Basics
Slurm jobs may be scheduled via the Slurm client tools. For a brief introduction, check out the [Slurm Quickstart](https://slurm.schedmd.com/quickstart.html).
There are several ways to schedule a task in a Slurm-controlled system.
The main paradigm for job scheduling in a Slurm cluster is `sbatch`. It schedules a job and requests the resources claimed by the user.
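A minimal batch script might look like the following sketch; the job name, resource values, and payload script are placeholders, not cluster-specific defaults:

```bash
#!/bin/bash
#SBATCH --job-name=demo          # hypothetical job name
#SBATCH --cpus-per-task=2        # request 2 CPU cores
#SBATCH --mem=4G                 # request 4 GB of RAM
#SBATCH --time=01:00:00          # wall-clock limit of one hour

# srun launches the payload inside the allocated resources.
srun ./my_payload.sh
```

Submitting it with `sbatch demo.sbatch` queues the job until the requested resources become free.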
The cluster provides two types of resources, CPUs and GPUs, which can be requested for jobs in variable amounts.
The GPUs in the cluster come in two flavours: the GPU objects *tesla* and *gtx*.
You may request a single GPU object via the option `--gres=gpu:1`. The Slurm scheduler reserves one GPU object exclusively for your job and therefore schedules jobs according to free resources.
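For example, using standard Slurm GRES syntax (the *tesla* type name follows the flavour naming above):

```bash
# Request any single GPU; nvidia-smi then lists the device
# that Slurm made visible to the job.
srun --gres=gpu:1 nvidia-smi

# Request one GPU of a specific flavour, e.g. a tesla object.
srun --gres=gpu:tesla:1 nvidia-smi
```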
CPUs are requested with the `-c` or `--cpus-per-task=` option. For further information, have a look at the man pages of `srun` and `sbatch`. Reading the [Slurm documentation](https://slurm.schedmd.com/documentation.html) is also highly recommended.
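For instance (the program name is a placeholder):

```bash
# Request 4 CPU cores for a single task; -c is the short form
# of --cpus-per-task.
srun -c 4 ./my_program
```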
The commands `sinfo` and `squeue` provide detailed information about the cluster's state and the running jobs.
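Typical invocations, using only standard Slurm options:

```bash
sinfo               # partitions and node states at a glance
sinfo -p gpu        # state of the gpu partition only
squeue              # all pending and running jobs
squeue -u "$USER"   # only your own jobs
```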
### CPU management and CPU-only jobs
Though the facility is called GPU cluster, it is also appropriate for CPU-only computing, as it provides not only 12 GPUs but also 240 CPU cores. Effective utilization of the CPU resources can be tricky, so you should familiarize yourself with [CPU management](https://slurm.schedmd.com/cpu_management.html).
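A CPU-only batch job could look like this sketch (job name, payload, and resource values are placeholders; the *cpu* partition is explained in the next section):

```bash
#!/bin/bash
#SBATCH --job-name=cpu-demo     # hypothetical job name
#SBATCH --partition=cpu         # CPU-only jobs belong on the cpu partition
#SBATCH --cpus-per-task=16      # a slice of the 240 available cores
#SBATCH --mem=32G               # example memory request

srun ./my_cpu_payload.sh
```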
### Choosing the appropriate partition
The cluster offers two partitions. Partitions can be considered separate queues with slightly different features.
Partition selection is done with the parameter `-p` or `--partition=` in your `srun` commands and `sbatch` scripts. The default partition is *cpu*; jobs that do not specify a partition will be started there.
The cluster has a *cpu* and a *gpu* partition. If you have a CPU-only job (not requesting any GPU resources with `--gres=gpu:n`), you should start it on the *cpu* partition.
A job using GPUs should be started on the *gpu* partition, with one exception: jobs that request one GPU (with `--gres=gpu:1`) and more than 2 CPUs (with the `-c` or `--cpus-per-task` option) should use the *cpu* partition.
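Applied to concrete commands (payload names are placeholders):

```bash
# CPU-only job: cpu partition.
srun -p cpu -c 8 ./analysis.sh

# GPU job: gpu partition.
srun -p gpu --gres=gpu:2 ./train.sh

# The exception: one GPU plus more than 2 CPUs goes to the cpu partition.
srun -p cpu --gres=gpu:1 -c 6 ./train.sh
```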
The reason for this policy is not obvious; it is explained under *GPU Blocking*.
The example `example.job.sbatch` requests one GTX 1080 Ti for the job and calls the payload `example.job.sh` via `srun`.
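The file itself is not reproduced here; a sketch of what such a script could look like, assuming the *gtx* GRES type maps to the GTX 1080 Ti cards:

```bash
#!/bin/bash
#SBATCH --job-name=example.job   # hypothetical job name
#SBATCH --partition=gpu          # one GPU, few CPUs: gpu partition per the policy above
#SBATCH --gres=gpu:gtx:1         # one gtx object (GTX 1080 Ti); type name is an assumption
#SBATCH --cpus-per-task=2        # at most 2 CPUs keeps the job on the gpu partition

# Launch the payload inside the allocation.
srun ./example.job.sh
```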