ARCC's SLURM HPC Scheduling Policy is subject to adjustment; proposed changes are intended to improve scheduling behavior and incentivize efficient use of cluster resources.
Login nodes are provided for authorized users to access ARCC HPC cluster resources. They are intended only for specific use cases: setting up and submitting jobs, accessing results from jobs, and transferring data to/from the cluster.
As a courtesy to your colleagues, users should refrain from the following on login nodes:
anything compute-intensive (tasks that use significant computational/hardware resources, for example utilizing 100% of a CPU)
long-running tasks (over 10 minutes)
any collection of tasks large enough to have a similar hardware-use footprint to the items above.
Computationally intensive work performed on login nodes WILL interfere with others' ability to use the node's resources and with community users' ability to log into HPC resources. To prevent this, users should submit compute-intensive tasks as jobs to the compute nodes, which are intended for that work. Submitting batch jobs and requesting interactive jobs with salloc should be performed from the login nodes. If you have any questions or need clarification, please contact arcc-help@uwyo.edu.
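For illustration, a minimal batch submission from a login node might look like the following sketch. The account name, program, and resource values are placeholders rather than ARCC defaults; substitute your own project and job script.

#!/bin/bash
#SBATCH --account=myproject   # placeholder: replace with your project/account name
#SBATCH --time=01:00:00       # requested wall time (1 hour)
#SBATCH --ntasks=1            # a single task
#SBATCH --mem=4G              # explicit memory request
srun ./my_program             # placeholder: the actual compute work

Save the script as, for example, my_job.sh and submit it from a login node with sbatch my_job.sh; the work itself then runs on compute nodes rather than on the login node.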
Short code compilations are permissible. For highly parallel or long compilations, users should request an interactive job with salloc and perform the compilation on the allocated nodes as a courtesy to their colleagues. If you're not sure how to do this, see the example of requesting an interactive job on our wiki.
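As a sketch of the interactive-compilation workflow described above (the account name and resource values are placeholders), a parallel build could be run on allocated compute nodes rather than on the login node:

# Request an interactive allocation with 8 CPU cores for one hour:
salloc --account=myproject --cpus-per-task=8 --time=01:00:00
# Once the allocation is granted, run the build on the allocated node:
srun make -j 8

The exact behavior of salloc (for example, whether it places you in a shell on the allocated node or on the login node) depends on the cluster's configuration; see the wiki example referenced above for the ARCC-specific steps.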
If ARCC staff are alerted to computationally intensive work or jobs being performed on a login node, they may kill them without prior notification.
Do NOT run compute-intensive tasks, long-running tasks, or large numbers of tasks on the login nodes.
Tasks violating these rules will be terminated immediately and the owner will be warned; continued violations may result in the suspension of access to the cluster(s). Access will not be restored until the ARCC director receives a written request from the user's PI.
HPC/HPS accounts are available to all University faculty, staff, and students for research. A Principal Investigator (PI) may request an account and project on any ARCC resource for research purposes.
In order to create an account on a resource, an associated research project must be
created on that resource.
Who qualifies as a PI?
A Principal Investigator (PI) is any University of Wyoming faculty member holding an extended-term position with UW, or a Graduate Student in a limited capacity (see below). The faculty position designation does not extend to Adjunct or Emeritus Faculty.
See more about PI Responsibilities above (Under General Policies: 1 -> User Responsibilities:
B -> PI Responsibilities: iii)
Graduate Students/Researchers serving as Project PIs
Graduate students are able to serve as PIs on Projects in a limited academic research
capacity with the following conditions:
All Graduate level project requests are subject to ARCC leadership approval.
Under the Graduate PI designation, a Graduate Student or Graduate Researcher is granted access to an approved project with half the storage associated with normal projects and limited computational hours.
All accounts with access to ARCC resources must be approved by a University of Wyoming Principal Investigator (PI).
PIs give permission to a user to use the HPC/HPS resource allocations as a member
of their project(s).
PIs may sponsor multiple users.
Users can be sponsored by multiple PIs. Removing a user from a project does not necessarily mean the user is removed from the resource if they retain membership in another PI's project on that resource.
HPC/HPS resource allocations are granted to PIs as projects. Users utilize resources from their sponsor's allocated project on that resource.
The following conditions apply to all account types. Additional details on how the different account types work can be found elsewhere on this page.
All HPC accounts are for academic purposes only.
Commercial activities are prohibited.
Password sharing and all other forms of account sharing are prohibited.
Account-holders found to be circumventing the system policies and procedures will have their accounts locked or removed.
All HPC accounts can be requested through the ARCC Account Request Form. Note that all requests for creating projects or making changes to a project must be made by the project PI as specified above, and only the listed project PI may request that a user be granted access to their project.**
For questions about HPC account procedures not addressed below, please e-mail us at arcc-help@uwyo.edu.
PI Accounts
PI accounts are for individual PIs only. These accounts are for research only and
are not to be shared with anyone else. PI accounts are subject to periodic review
and can be deleted if the PI's University affiliation changes or they fail to comply
with UW and ARCC account policies.
Project Member Accounts
PIs may request accounts for any number of project members, but these accounts must
be used for research only. UW PIs are responsible for all of their project members.
These accounts are subject to periodic review and will be deleted if the PI faculty
member or the account holders change their University affiliation or fail to comply
with UW and ARCC account policies.
External Collaborator Accounts (To be replaced by ARCC-Only Accounts) **
Prior to the implementation of ARCC-Only accounts, PIs went through UW Information
Technology to request UWYO federated accounts for non-UW users collaborating on their
research projects. These accounts granted the external collaborator access to the
PI's project on ARCC resources.
ARCC-Only Accounts **
These accounts are not associated with a UWYO federated login or UWYO enterprise technology resources. They are created by ARCC and explicitly grant access to designated ARCC resources. Like all other project member accounts, they must be requested by the project PI, and PIs are responsible for the account user's access and actions on the resource.
Instructional Accounts
PIs may sponsor HPC accounts and projects for instructional purposes on the ARCC systems
by submitting a request through the ARCC Account Request Form. Instructional requests are subject to denial only when the proposed use is inappropriate
for the systems and/or when the instructional course would require resources that
exceed available capacity on the systems or substantially interfere with research
computations. HPC accounts for instructional purposes will be added by the Sponsor
into a separate group created with the 'class group' designation. Class group membership
is to be sponsored for one semester and the Sponsor will remove the group at the end
of the semester. Class/Instructional group jobs should only be submitted to the 'class'
queue, which will be equivalent in priority to the 'windfall' queue, and only available
on the appropriate nodes of the ARCC systems.
System Accounts
System accounts are for staff members who have a permanent relationship with UW and
are responsible for system administration.
**ARCC-Only and external-collaborator accounts do not have access to Globus by default,
but may request access, subject to PI and ARCC approval.
Account Creations
ARCC HPC/HPS accounts will be created to match existing UWYO accounts whenever possible.
PIs may request accounts for existing projects/allocations or courses.
Account Renewal
If an account or project is specified with an end date and needs to be extended beyond that date, a renewal of the account (and, if necessary, the associated project) is required.
Account Transfer
A PI who is leaving a project or the University can request that their project be transferred to a new PI. Non-PI accounts can be transferred from one PI's zone of control to another as needed when students move from working with one researcher to another. Account transfer requests can also be made by contacting the Help Desk (766-4357).
Account Terminations
The VP of Research, the UW CIO, and the University Provost comprise the University of Wyoming's Research Computing Executive Steering Committee (UW-ESC). The UW-ESC will govern the termination of Research Computing accounts, following other University policies as needed. Non-PI accounts may be terminated at the request of the UW-ESC. Any users found in violation of this Research Computing Allocation Policy or any other UWyo policies may have access to their accounts suspended for review by the Director of Research Support, IT, and the UW-ESC.
** - unless otherwise designated by the PI with documentation (i.e., in cases of a designated project manager), and acknowledged by ARCC.
This policy is subject to proposed revisions and approval. Please see this page to review a detailed description of proposed changes.
a. All job submissions require the specification of an account. Based on wall time and/or quality of service (QoS), a job will be placed in one of several queues. If no partition, QoS, or wall time is specified, the job will be placed in the Normal queue by default, with a 3-day wall time.
b. Job submission syntax will remain largely the same, except that users should supply a Quality of Service (QoS) to place their work into a prioritization queue. If a user does not supply a QoS or wall time in their job submission, the job will be placed in the Normal queue by default. Queue information is detailed below under Queues/QoS Types. Users are incentivized to specify shorter wall times: their jobs are given priority to run and are placed in a queue with shorter wait times when the cluster is under high utilization (see the sketch after this list).
c. Job submissions will have a partition set for them if one is not supplied. Partitions are assigned in a specific order from 'oldest' hardware to 'newest', ensuring that newer hardware remains available for jobs that require it while less intensive jobs are sent to older hardware.
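As a sketch of item (b), the QoS and wall time can be supplied in the batch script header. The QoS names are assumed to mirror the queue names in the table under Queues/QoS Types (e.g., 'fast'); the account name and program are placeholders, not ARCC defaults.

#!/bin/bash
#SBATCH --account=myproject   # placeholder project/account (always required)
#SBATCH --qos=fast            # assumed QoS name for the Fast queue
#SBATCH --time=08:00:00       # 8 hours, within the 12-hour Fast limit
#SBATCH --ntasks=1
srun ./my_program             # placeholder workload

Omitting --qos and --time would place the same job in the Normal queue with the 3-day default wall time, and no --partition is needed because the scheduler assigns one as described in item (c).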
a. Our department has found that PIs are more frequently using ARCC systems in the classroom or for instructional purposes. If resources are required during a specific time window, we strongly encourage PIs to request reservations when using part of the cluster in an instructional setting.
b. Reservations must be requested 21 days prior to the start of the reservation period. This advance notice is necessary to account for the scheduling of personnel needed to configure reservations, and the 14-day maximum job wall time.
c. This is critically important for classes running interactive sessions. Reservations must be requested and configured to guarantee timely access.
a. No single SLURM-defined account or HPC project may occupy more than 33% of the cluster plus their investment allocation. This limit is set per SLURM account (i.e., per project, not per user).
b. Users can no longer specify all memory on a node using the --mem=0 specification in their submission. Users must explicitly specify how much memory they need. This reduces the likelihood that users request a disproportionately large share of available HPC memory and that the SLURM scheduler assigns such a share to any single job. Alternatively, --exclusive may be used, but it requests all resources on the node; this ensures accurate utilization reporting. A sketch of both the memory and GPU directives appears after this list.
c. Users cannot use a GPU partition without requesting a GPU. Users will be required to request a GPU (and potentially be billed for that use) if they use a GPU node. This does not apply to investments with GPUs.
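As a sketch of items (b) and (c), the relevant batch directives might look like the following; the specific values are placeholders, not ARCC limits.

#SBATCH --mem=64G        # explicit memory request, replacing the disallowed --mem=0
#SBATCH --gres=gpu:1     # a GPU must be requested when submitting to a GPU partition
# Alternatively, to claim every resource on the node (and be accounted for all of it):
#SBATCH --exclusive

Use --exclusive only when the job genuinely needs the whole node, since the project is accounted for everything the allocation claims.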
ARCC has created the following queues and Quality of Service (QoS) levels as functions within SLURM, allowing configuration of preemption, priority, and resource quotas for different purposes. Each is detailed below:
| Queue/QoS Name | Priority | Maximum Specified Wall Time | Limitations | Purpose |
| Debug | n/a | 1 hour | Limited to a small hardware partition; limited in job size | For debugging job submissions, code issues, etc. |
| Interactive | 1 | 8 hours | Limited to interactive jobs | Hosting interactive jobs; the short wall time ensures fair use. |
| Fast | 2 | 12 hours | None outside of the general account/project limitations** | For normal jobs that will not take an extended amount of time. The higher priority means the user's job will run quickly, and the shorter wall time lets the scheduler prioritize more efficiently. |
| Normal (Default) | 3 | 3 days | None outside of the general account/project limitations** | The default queue. The shorter wall time is still liberal but allows the scheduler to manage resources effectively. |
| Long | 4 | 7 days | Limited to 20% of overall cluster + investment** | For jobs requiring a longer wall time. This flexibility benefits users with longer-running workloads without overwhelming the entire cluster. |
| Extended | 5 | 14 days | Limited to investors; limited to 15% of cluster total; limited with respect to number of jobs in queue per project** | To encourage investment in ARCC by providing investors more flexibility. |
** Interactive jobs may not be run in these queues.
This queue consists of a small subset of HPC hardware with rigid limitations. Priority is not applicable since this queue falls outside of normal Slurm scheduling and standard scheduling policies. This is a specialty partition. Submissions will be subject to partial node allocations, and debug jobs may not request the entire node.
The Interactive queue is used for all interactive jobs, and such jobs are given the
highest priority when jobs are submitted. All interactive jobs are subject to a
maximum 8 hour wall time.
This includes:
1. Interactive desktops (including On-Demand XFCE Desktop Sessions)
2. On-Demand applications (Jupyter, XFCE Desktop Sessions, Matlab, Paraview, and all sessions launched through the web-based OnDemand platform, https://medicinebow.arcc.uwyo.edu)
3. Jobs requested via the salloc command (see the sketch after this list).
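For reference, an salloc request that stays within the interactive limit might look like the sketch below. The QoS name 'interactive' is assumed to mirror the queue name above, and the account is a placeholder.

salloc --account=myproject --qos=interactive --time=08:00:00 --ntasks=1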
The Fast queue is given the highest priority for typical job specifications. Its shorter 12-hour wall time encourages rapid clearing of resources for reuse and allows the Slurm job scheduler to manage job backlogs more quickly and effectively.
This queue is the default for any job submitted without a specified wall time or QoS. It has standard priority and may see longer wait times than the Fast queue.
The Long queue allows 7-day jobs to accommodate workloads that require this substantial duration, but it is limited in scope to encourage shorter wall times and preserve overall HPC resource availability. It is subject to the 20% of overall cluster + investment limit detailed in the table above, and can be requested as sketched below.
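As a sketch (the QoS name 'long' is assumed to mirror the queue name, and the account is a placeholder), a week-long job might be requested with:

#SBATCH --account=myproject
#SBATCH --qos=long
#SBATCH --time=7-00:00:00    # 7 days, the maximum for the Long queue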
To encourage investment, ARCC investors and their project members will be able to use a 14-day wall time on any node. This allows them to run longer jobs while still allowing ARCC to implement maintenance windows as required. The priority level is 5 (outside of their investment, where use is discretionary). Investors may use the longer wall time on ANY node but can preempt jobs only on nodes in which they have invested. Resource use outside of the investment is limited to 15% of HPC resources + investment.
Users may e-mail arcc-help@uwyo.edu to request an extension for long-running jobs. Approval is not guaranteed; it is discretionary and depends on current HPC usage and justification.
The following table describes HPC service quotas applicable only to UWYO internal users; these quotas apply on top of any investment allocations.
HPC Compute
| Service | Description | Quota |
| MedicineBow CPU Hours | Compute hours on MedicineBow CPU nodes | Up to 100,000 total non-investment CPU core hrs/year |
| MedicineBow A30 GPU Hours | Compute hours on MedicineBow A30 GPU nodes | Up to 20,000 total non-investment GPU hrs/year (GPU hrs are calculated as the combined sum of all GPU hours over 1 year, independent of GPU hardware) |
| MedicineBow L40S GPU Hours | Compute hours on MedicineBow L40S GPU nodes | Counted toward the combined non-investment GPU hrs/year quota above |
| MedicineBow H100 GPU Hours | Compute hours on MedicineBow H100 GPU nodes | Counted toward the combined non-investment GPU hrs/year quota above |

Data Storage
| Service | Description | Quota |
| MedicineBow HPC storage | HPC/Cluster storage. *All /gscratch storage is subject to a 90-day purge | |

** Graduate projects are subject to 50% of default data storage quotas.
*** Wildiris projects are subject to a 1 TB /gscratch and 2.5 TB /project quota.