What's New

Major Updates in Slurm Version 15.08

Slurm Version 15.08 was released in August 2015. Major enhancements include:

  • Added TRES (Trackable resources) to track utilization of memory, GRES, burst buffer, license, and any other configurable resources in the accounting database.
  • Add configurable billing weights that take into consideration any TRES when calculating a job's resource utilization.
  • Add configurable prioritization factors that take into consideration any TRES when calculating a job's priority (a configuration sketch follows this list).
  • Add burst buffer support infrastructure. Currently available plugins include burst_buffer/generic (uses administrator supplied programs to manage file staging) and burst_buffer/cray (uses Cray APIs to manage buffers).
  • Add power capping support for Cray systems with automatic rebalancing of power allocation between nodes.
  • Add support for job dependencies joined with an OR operator (e.g. "--depend=afterok:123?afternotok:124"); see the example following this list.
  • Add an advance reservation flag of "replace" that causes allocated resources to be replaced with idle resources, maintaining a pool of available resources of constant size to the extent possible (see the scontrol example following this list).
  • Permit PreemptType=qos and PreemptMode=suspend,gang to be used together. A high-priority QOS job will now oversubscribe resources and gang schedule, but only if there are insufficient resources for the job to be started without preemption. NOTE: With PreemptType=qos, the partition's Shared=FORCE:# configuration option will permit one more job per resource to be run than specified, but only if started by preemption. (A configuration sketch follows this list.)
  • A partition can now have an associated QOS, allowing a partition to have all the limits a QOS has. If a limit is set in both QOSes, the partition QOS will override the job's QOS unless the job's QOS has the 'OverPartQOS' flag set (see the sketch following this list).
  • Expanded --cpu-freq parameters to include min-max:governor specifications. The --cpu-freq option is now supported by salloc and sbatch (see the example following this list).
  • Add support for optimized job allocations with respect to SGI Hypercube topology.
  • Optimize resource allocation for systems with dragonfly networks.
  • Add the ability for a compute node to be allocated to multiple jobs, but restricted to a single user. Added a "--exclusive=user" option to the salloc, scontrol and sview commands. Added a new partition configuration parameter "ExclusiveUser=yes|no" (see the example following this list).
  • Added a plugin to record job completion information using Elasticsearch (a configuration sketch follows this list).
  • Add support for PMI Exascale (PMIx) for improved MPI scalability.
  • Add support for communication gateway nodes to improve scalability.
  • Add layouts framework, which will be the basis for further developments toward optimizing scheduling with respect to additional parameters such as temperature and power consumption.
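
The TRES billing and prioritization items above are controlled through configuration parameters. The following slurm.conf excerpt is a minimal sketch, assuming a hypothetical partition and arbitrary weight values; the parameter names TRESBillingWeights and PriorityWeightTRES should be checked against the documentation for the release in use.

    # slurm.conf (excerpt) -- illustrative values only
    # Weight each TRES when computing billable usage for jobs in this partition.
    PartitionName=batch Nodes=node[01-16] TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0"
    # Weight each TRES in the multifactor priority calculation.
    PriorityWeightTRES=CPU=1000,Mem=2000,GRES/gpu=4000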
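
The OR-style dependency can be exercised directly from sbatch. The job IDs and script name below are placeholders.

    # Start the cleanup job if job 123 completes successfully OR job 124 fails.
    sbatch --dependency=afterok:123?afternotok:124 cleanup.sh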
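
A sketch of creating an advance reservation with the new "replace" flag; the reservation name, user, node count and duration are made up for illustration.

    # Reserve four nodes; nodes allocated to jobs are replaced with idle nodes
    # so the pool of available resources keeps a roughly constant size.
    scontrol create reservation ReservationName=pool Users=alice \
             StartTime=now Duration=120 NodeCnt=4 Flags=REPLACE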
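
The QOS preemption with gang scheduling described above might be configured roughly as follows (the plugin name is written in full as preempt/qos in slurm.conf); the partition definition is hypothetical.

    # slurm.conf (excerpt) -- illustrative only
    PreemptType=preempt/qos
    PreemptMode=SUSPEND,GANG
    # Shared=FORCE:1 normally allows one job per resource; with QOS preemption,
    # one additional job per resource may be started, but only by preemption.
    PartitionName=batch Nodes=node[01-16] Shared=FORCE:1 Default=YES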
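
A partition QOS might be set up along these lines; the QOS and partition names and the limit value are hypothetical, and the sacctmgr syntax should be verified against the accounting documentation.

    # Create a QOS carrying the limits to be applied to the partition.
    sacctmgr add qos part_normal GrpTRES=cpu=256
    # Allow a job's own QOS to override the partition QOS limits.
    sacctmgr modify qos special set Flags=OverPartQOS
    # slurm.conf (excerpt): attach the QOS to the partition.
    PartitionName=batch Nodes=node[01-16] QOS=part_normal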
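
An illustrative use of the expanded --cpu-freq option; the frequency range (values in kHz) and governor name are arbitrary examples.

    # Run the job's steps between 1.2 and 2.4 GHz under the OnDemand governor.
    sbatch --cpu-freq=1200000-2400000:OnDemand job.sh
    # The option is now accepted by salloc (and sbatch) in addition to srun.
    salloc --cpu-freq=1200000-2400000:OnDemand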
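
The single-user node sharing feature involves both a command-line option and a partition parameter; the node and partition names below are placeholders.

    # Request nodes that may be shared by this user's jobs, but by no other user.
    salloc --exclusive=user -N 2
    # slurm.conf (excerpt): make single-user node sharing the partition default.
    PartitionName=interactive Nodes=node[17-20] ExclusiveUser=yes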
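
The Elasticsearch job completion plugin is enabled through the job completion parameters in slurm.conf; the server URL below is a placeholder.

    # slurm.conf (excerpt) -- record job completion records in Elasticsearch.
    JobCompType=jobcomp/elasticsearch
    JobCompLoc=http://elasticsearch.example.com:9200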

Major Updates in Slurm Version 16.05 and beyond

Detailed plans for release dates and contents of additional Slurm releases have not been finalized. Anyone desiring to perform Slurm development should notify slurm-dev@schedmd.com to coordinate activities. Future development plans include:

  • Add support for asymmetric resource allocation and MPMD programming. Multiple resource allocation specifications (memory, CPUs, GPUs, etc.) will be supported in a single job allocation.
  • Add support for Remote CUDA (rCUDA).
  • Distributed architecture to support the management of resources with Intel Knights Landing processors.
  • Fault tolerance and dynamic job adaptation through a communication protocol between Slurm, MPI libraries and the application.
  • Improved support for provisioning and virtualization.

Last modified 22 October 2015