A. HPC Login Node Use
Login nodes are provided for authorized users to access ARCC HPC cluster resources. They are intended only for specific purposes: setting up and submitting jobs, accessing job results, and transferring data to and from the cluster.
ii. Unauthorized Login Node Activities
a. Prohibited cluster computations
As a courtesy to your colleagues, users should refrain from the following on login
nodes:
1. Anything compute-intensive (tasks that use significant computational/hardware resources; for example, sustained use of 100% of a login node's CPUs)
2. Long-running tasks (over 10 minutes)
3. Any large collection of tasks that produces a hardware-use footprint similar to the items listed above.
Computationally intensive work performed on login nodes WILL interfere with other users' ability to use node resources and to log into HPC resources. To prevent this, users should submit compute-intensive tasks as jobs to the compute nodes, which are designed for that purpose. Batch jobs should be submitted, and interactive jobs requested via salloc, from the login nodes. If you have any questions or need clarification, please contact arcc-help@uwyo.edu.
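For example, a minimal batch script submitted from a login node might look like the following sketch; the account name, resource amounts, and program name are placeholders:

```shell
#!/bin/bash
# Minimal batch script sketch (my_job.sh); account, time, and program are placeholders.
#SBATCH --account=myproject    # SLURM account (project) to charge
#SBATCH --time=02:00:00        # requested wall time
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
srun ./my_analysis             # runs on a compute node, not the login node
```

Submit it from a login node with `sbatch my_job.sh`, or request an interactive session with `salloc --account=myproject --time=01:00:00`.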
b. Compiling code
Short compilations of code are permissible. For highly parallel or long compilations, users should request an interactive job using salloc and perform the compilation on the allocated compute nodes, as a courtesy to their colleagues. If you're not sure how to do this, see the example of requesting an interactive job on our wiki.
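A compilation-sized interactive request might look like this sketch; the account name, core count, and time are illustrative:

```shell
# Request an interactive allocation for a parallel build; values are placeholders.
salloc --account=myproject --cpus-per-task=8 --time=00:30:00
# Once the allocation is granted, run the build on the compute node:
make -j 8
```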
c. Work preventing authorized use by other cluster users
If ARCC staff are alerted to computationally intensive work or jobs running on a login node, or other work that prevents authorized cluster users from using shared resources, they may kill that work without prior notification.
Do NOT run compute-intensive, long-running tasks, or large numbers of tasks on the
login nodes.
Inbound VPN connections are disallowed without ARCC approval.
iii. Violations & Repercussions
Tasks violating these rules will be terminated immediately and the owner will be warned; continued violation may result in the suspension of access to the cluster(s). Access will not be restored until the ARCC director receives a written request from the user's PI.
B. Accounts
I. HPC/HPS accounts are available for all University faculty, staff, and students for research
A Principal Investigator (PI) may request an account and project on any ARCC resource for research purposes.
To create an account on a resource, an associated research project must first be created on that resource.
a. Who qualifies as a PI?
Principal Investigators (PIs) are any University of Wyoming faculty member holding an extended-term position with UW, Research-Focused Staff (i.e., Research Scientists), or Graduate Students in a limited capacity (see below).
The faculty designation does not extend to Adjunct or Emeritus Faculty.
See more about PI Responsibilities above (Under General Policies: 1 -> User Responsibilities:
B -> PI Responsibilities: iii)
b. Graduate Students/Researchers serving as Project PIs
Graduate students are able to serve as PIs on Projects in a limited academic research
capacity with the following conditions:
1. All Graduate level project requests are subject to ARCC leadership approval.
2. Under the Graduate PI designation, a Graduate Student or Graduate Researcher is granted access to an approved project with half the storage quota associated with normal projects, and limited computational hours.
II. Account as a member of an HPC Resource Project
All accounts with access to ARCC resources must be approved by a University of Wyoming Principal Investigator (PI).
PIs give permission to a user to use the HPC/HPS resource allocations as a member
of their project(s).
a. PIs may sponsor multiple users
b. Users may be sponsored by multiple PIs and be members of multiple projects.
c. HPC/HPS resource allocations are granted to PIs as projects.
Users then utilize resources from their sponsor's allocated project on that resource.
III. General Terms of Account Use
The following conditions apply to all account types. Additional details on how the different account types work can be found elsewhere on this page.
a. All HPC accounts are for academic or research purposes only.
b. Commercial activities are prohibited
c. Password sharing and other forms of account sharing are prohibited
d. Account-holders circumventing the policies and procedures will have accounts locked or removed.
All HPC accounts can be requested through the ARCC Account Request Form. Note that all requests for creating projects and making changes to a project must be made by the project PI as specified above, and only the listed project PI may request that a user be granted access to their project.**
For questions about HPC account procedures not addressed below, please e-mail us at arcc-help@uwyo.edu
a. PI Accounts
PI accounts are for individual PIs only. These accounts are for research only and are not to be shared with anyone else. PI accounts are subject to periodic review and can be deleted if the PI's University affiliation changes or they fail to comply with UW and ARCC account policies. PI accounts will have elevated permissions on their projects.
b. Project Member Accounts
PIs may request accounts for any number of project members, but these accounts must be used for research only. UW PIs are responsible for all of their project members. These accounts are subject to periodic review and will be deleted if the PI faculty member or the account holders change their University affiliation or fail to comply with UW and ARCC account policies.
c. External Collaborators (To be replaced by ARCC-only Accounts) **
Prior to the implementation of ARCC-Only accounts, PIs went through UW Information Technology to request UWYO federated accounts for non-UW users collaborating on their research projects. These accounts granted the external collaborator access to the PI's project on ARCC resources.
d. ARCC-Only Accounts
These accounts are not associated with UWYO federated login or UWYO enterprise technology resources. They are created by ARCC and explicitly grant access to designated ARCC resources. Like all other project member accounts, they must be requested by the project PI, and PIs are responsible for the account users' access and actions on the resource.
e. Instructional Accounts
PIs may sponsor HPC accounts and projects for instructional purposes on the ARCC systems by submitting a request through the ARCC Account Request Form. Instructional requests are subject to denial only when the proposed use is inappropriate for the systems, or when the course would require resources that exceed available capacity or substantially interfere with research computations. HPC accounts for instructional purposes will be added by the sponsor into a separate group created with the 'class group' designation. Class group membership is sponsored for one semester, and the sponsor will remove the group at the end of the semester. Class/instructional group jobs should only be submitted to the 'class' queue, which is equivalent in priority to the 'windfall' queue and only available on the appropriate nodes of the ARCC systems.
f. System Accounts
System accounts are for staff members who have a permanent relationship with UW and are responsible for system administration.
**ARCC-Only and external-collaborator accounts do not have access to Globus by default, but may request access, subject to PI and ARCC approval.
a. Account Creation
ARCC HPC/HPS accounts will be created to match existing UWYO accounts whenever possible.
PIs may request accounts for existing projects/allocations or courses.
If an account or project is specified with an end date and needs to be extended beyond that date, a renewal of the account (and, if necessary, the member project) is required.
A PI who is leaving the project or the University can request that their project be transferred to a new PI. Non-PI accounts can be transferred from one PI's zone of control to another as students move from working with one researcher to another. Account transfer requests may also be made by contacting the Help Desk (766-4357).
The UW VP of Research, the UW CIO, and the University Provost comprise the University of Wyoming's Research Computing Executive Steering Committee (UW-ESC). The UW-ESC governs the termination of Research Computing accounts, following other University policies as needed. Non-PI accounts may be terminated at the request of the UW-ESC. Any users found in violation of this Research Computing Allocation Policy or any other UWyo policies may have access to their accounts suspended for review by the Director of Research Support, IT, and the UW-ESC.
** - unless otherwise designated by the PI with documentation (i.e., in cases of a designated project manager) and acknowledged by ARCC.
C. Job Scheduling Policy (Revised)
This policy is subject to proposed revisions and approval. Please see this page to review a detailed description of proposed changes.
All job submissions require the specification of an account. Based on wall time and/or quality of service, a job will be placed in one of several queues. If no partition, QoS, or wall time is specified, a job will, by default, be placed in the Normal queue with a 3-day wall time.
b. Quality of Service
Job submission syntax remains largely the same, except that users should supply a Quality of Service (QoS) to place their work into a prioritization queue. If a user does not supply a QoS or wall time in their job submission, the job will, by default, be placed into the Normal queue. Queue information is detailed below under Queues/QoS Types. Users are incentivized to specify shorter wall times: their jobs are given priority to run and are placed in a queue with a shorter wait when the cluster is under high utilization.
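For illustration, a job could opt into a higher-priority queue by supplying a QoS and a matching wall time. The exact QoS name strings accepted by the scheduler are assumptions here; consult the Queues/QoS table for the queues that exist:

```shell
#SBATCH --account=myproject   # placeholder account name
#SBATCH --qos=fast            # assumed QoS name for the Fast queue
#SBATCH --time=06:00:00       # wall time within the 12-hour Fast limit
```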
c. Partitions
Job submissions will have a partition set for them if one is not supplied, assigned in a specific order from 'oldest' hardware to 'newest'. This ensures that newer hardware remains available for jobs that require it, while less intense jobs are sent to older hardware.
Our department has found that PIs are increasingly using ARCC systems in the classroom or for instructional purposes. If resources are required during a specific time window, we strongly encourage PIs to request reservations when using part of the cluster in an instructional setting.
Reservations must be requested at least 21 days prior to the start of the reservation period. This advance notice is necessary to account for the scheduling of personnel needed to configure reservations and for the 14-day maximum job wall time. This is critically important for classes running interactive sessions. Reservations must be requested and configured in advance to guarantee timely access.
a. Cluster account and project limits
No single SLURM-defined account or HPC project may occupy more than 33% of the cluster plus their investment allocation. This limit is set per SLURM account (i.e., per project, not per user).
b. Memory Directives
Users can no longer request all memory on a node using the --mem=0 specification in their submission. Users must explicitly specify how much memory they need. This reduces the likelihood that users request a disproportionately large segment of available HPC memory, and thereby the likelihood that such portions would be assigned by the SLURM scheduler to any given job. Alternatively, --exclusive may be used, but it will request all resources on the node. This ensures accurate utilization in reporting.
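In practice, this means replacing --mem=0 with an explicit value in the job script, for example:

```shell
# Explicitly request the memory the job needs (the amount here is illustrative):
#SBATCH --mem=16G
# Or request the whole node, including all of its CPUs and memory:
#SBATCH --exclusive
```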
c. GPU and Partition Directives
Users cannot use a GPU partition without requesting a GPU. Users will be required to request a GPU (and potentially be billed for that use) if they use a GPU node. This does not apply to investments with GPUs.
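A GPU job submission would therefore pair the partition with an explicit GPU request; the partition name below is a placeholder, not a documented ARCC partition:

```shell
#SBATCH --partition=mb-a30    # hypothetical GPU partition name
#SBATCH --gres=gpu:1          # a GPU must be requested explicitly to use a GPU partition
```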
** This set of queues applies only to the MedicineBow cluster. ARCC has created the following queues and Quality of Service (QoS) levels as a function within SLURM, allowing for the configuration of preemption, priority, and resource quotas for different purposes. Each is detailed below:
| Queue/QoS Name | Priority | Maximum Specified Wall Time | Limitations | Purpose |
| --- | --- | --- | --- | --- |
| Debug | n/a | 1 hour | Limited to a small hardware partition; limited in job size | For debugging job submissions, code issues, etc. |
| Interactive | 1 | 8 hours | Limited to interactive jobs | Hosting interactive jobs and imposing a short wall time to ensure fair use. |
| Fast | 2 | 12 hours | None outside of the general account/project limitations** | For any normal jobs that will not take an extended amount of time. Higher priority means the user's job will run quickly, and the shorter wall time means the scheduler may prioritize more efficiently. |
| Normal (Default) | 3 | 3 days | None outside of the general account/project limitations** | This is the default queue. The shorter wall time is still liberal, but allows the scheduler to manage resources effectively. |
| Long | 4 | 7 days | Limited to 20% of overall cluster + Investment** | To allow for jobs requiring a longer wall time. This flexibility benefits users with longer-running workloads while not overwhelming the entire cluster. |
| Extended | 5 | 14 days | Limited to investors; limited to 15% of cluster total; limited with respect to number of jobs in queue per project** | To encourage investment in ARCC by providing investors more flexibility. |
** Interactive jobs may not be run in this queue
a. Debug
This queue consists of a small subset of HPC hardware with rigid limitations. Priority is not applicable since this queue falls outside of normal SLURM scheduling and standard scheduling policies. This is a specialty partition. Submissions will be subject to partial node allocations, and debug jobs may not request an entire node.
b. Interactive
The Interactive queue is used for all interactive jobs, and such jobs are given the highest priority when submitted. All interactive jobs are subject to a maximum 8-hour wall time.
This includes:
1. Interactive desktops (Including On-Demand XFCE Desktop Sessions)
2. On-Demand applications (Jupyter, XFCE Desktop Sessions, Matlab, Paraview, and all sessions launched through the web-based OnDemand platform, https://medicinebow.arcc.uwyo.edu)
3. Jobs requested via an salloc command.
c. Fast
The Fast queue is given the highest priority for typical job specifications. Its shorter wall time of 12 hours encourages rapid turnover of resources and allows the SLURM job scheduler to manage any job backlogs more quickly and effectively.
d. Normal (Default)
This queue is the default for any job submitted without a specified wall time or QoS. It has standardized priority and is subject to a longer wait time than the Fast queue.
e. Long
The Long queue allows 7-day jobs to accommodate workloads that require this substantial duration, but is limited in scope to encourage shorter wall times and higher overall HPC resource availability. It is subject to the 20% of overall cluster + Investment limit detailed in the table above.
f. Extended (Investor-Only/Discretionary) **
To encourage investment, ARCC investors and their project members will be able to use a 14-day wall time on any node. This allows them to run longer jobs while still allowing ARCC to implement maintenance windows as required. The priority level is 5 (outside of investments, when discretionary). Investors may use the longer wall time on ANY node; however, they can preempt jobs only on nodes in which they have invested. Use of resources outside of the investment is limited to 15% of HPC resources + Investment.
Users may e-mail arcc-help@uwyo.edu to request an extension for long-running jobs. Approval will not be guaranteed, is discretionary, and will be dependent upon current HPC usage, and justification.
The table below describes HPC service quotas applicable only to UWYO internal users; these quotas apply on top of any Investment Allocations.

| Service | Description | Quota |
| --- | --- | --- |
| **HPC Compute** | | |
| MedicineBow CPU Hours | Compute hours on MedicineBow CPU nodes | Up to 100,000 total non-investment CPU core hrs/year |
| MedicineBow A30 GPU Hours | Compute hours on MedicineBow A30 GPU nodes | Up to 20,000 total non-investment GPU hrs/year (GPU hrs are calculated as the combined sum of all GPU hours over 1 year, independent of GPU hardware) |
| MedicineBow L40S GPU Hours | Compute hours on MedicineBow L40S GPU nodes | |
| MedicineBow H100 GPU Hours | Compute hours on MedicineBow H100 GPU nodes | |
| **Data Storage** | | |
| MedicineBow HPC storage | HPC/Cluster storage. *All /gscratch storage is subject to purge after 90 days of inactivity. Data will only be purged if deemed necessary by ARCC, and warnings will be sent before any data is deleted if the purge policy must be enacted. | |

** Graduate projects are subject to 50% of the default data storage quotas.
*** Wildiris projects are subject to a 1TB /gscratch and 2.5TB /project quota.