University of California

Policies

Overview

The mission of the Shared Research Computing Services (ShaRCS) project is to provide highly available, extremely powerful, leading-edge research computing resources to all campuses of the University of California. These resources are provided through the UC system, exclusively for UC research projects, with costs borne by the university rather than directly by the projects. This allows projects to concentrate on producing results and delivering scientific breakthroughs without the overhead of competing for and managing high-performance computing infrastructure.

The policies of ShaRCS regarding use of the computing resources are designed to support this mission to the fullest. Because this concept is new within the UC community and the Office of the President, its definitions and boundaries are not fixed. One of the primary goals of the pilot phase is to identify and clarify how the ShaRCS objectives may best be met. Pilot-phase users will bear much of the responsibility for shaping and fleshing out the functions and capabilities these services will provide in later phases.

The user-oriented policies of ShaRCS are evolving. During the pilot phase, observed user behavior and workflow will contribute to the development of these policies. Feedback from PIs, users, and project managers is welcome and encouraged; it helps the ShaRCS project administration team provide the most valuable, effective, and research-friendly computing environment possible for all research projects at the University of California. The project's sponsors depend on your suggestions and comments regarding improvements to the policies outlined below.

Open Use Policy

The ShaRCS resources follow an open use policy during the pilot project phase. This means that users are not given a limit on the number of CPU hours they may consume. Although the hours they use will be tracked, the general policy is to allow users as much time on the clusters as they determine is necessary for their research. This is a significant benefit to researchers, and contrasts favorably with many HPC resources, which require job runs to be estimated in advance and project time allocations to be requested, justified, paid for, and extended or supplemented as they are depleted.

A single job is limited to 256 compute nodes, which is over 95% of the cluster's compute nodes. With no allocations to restrict them, users could overtax the system and unfairly monopolize the clusters. Users are expected to self-regulate and to recognize the needs of others with access to the system. If users are unable to run jobs because other users are monopolizing the system, administrators will ask the offending parties to reduce their demand and will help all involved find a cooperative strategy, without imposing limits on anyone. Of course, when there is no other demand for the system, users are welcome to consume all the compute time they wish.

A mechanism for users to communicate about such matters has not yet been defined, so these issues will likely be moderated through the ShaRCS Help email account. A mailing list that would let users manage this among themselves is being considered and may be in place by the time all 24 pilot projects have access.

The User Support staff will also try to suggest ways that project tasks might be modified to better fit the scheduling strategies of the queue system. This could entail modifying job scripts or compute algorithms so that resources are more readily available to more projects, perhaps at the expense of a project's development time or task performance.

Job Submissions

Users should submit jobs only to compute nodes in the cluster to which they are assigned. Login nodes are not configured to handle production runs and should be used only for compilation and for testing whether codes or job submission scripts are ready to run on the regular compute nodes. Compute jobs on the login nodes are limited to fifteen minutes, and jobs exceeding this limit may be killed. Compilation jobs may run longer and are not time-restricted.

Since there is no charge for running jobs, users are strongly encouraged to offload as much of their compute workload as possible to the compute nodes. There is no penalty for running a job on the compute nodes, finding that a fix is required, and resubmitting it. Keeping this type of work off the login nodes makes access easier for other users when the system is busy.

As ShaRCS is a parallel programming environment, tasks that lend themselves to parallelization should be developed that way. Serial codes that fit the parallel programming model can be modified to greatly improve resource usage and job performance. Likewise, new programming tasks that embody parallel algorithms should be developed to make as much use of the parallel architecture as possible. Please contact User Support via the ShaRCS Help email account if you would like advice on how your project might benefit from this strategy.
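
As one illustration, the sketch below shows how a serial loop might be split across MPI processes so that each processor handles only a slice of the work. It is a minimal example and not specific to ShaRCS: it assumes an MPI library and compiler wrapper (such as mpicc) are available on the clusters, and the work() function is a hypothetical stand-in for a project's real per-element computation.

    /* Minimal sketch: distributing a serial loop across MPI ranks.
     * Assumes an MPI library is available; work() is a hypothetical
     * placeholder for the project's actual per-element computation. */
    #include <mpi.h>
    #include <stdio.h>

    #define N 1000000

    static double work(long i) {
        return (double)i * 0.5;   /* placeholder computation */
    }

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each rank takes a contiguous slice of the index range that
         * a serial code would cover in one loop. */
        long chunk = (N + size - 1) / size;
        long start = rank * chunk;
        long end   = (start + chunk < N) ? start + chunk : N;

        double local = 0.0, total = 0.0;
        for (long i = start; i < end; i++)
            local += work(i);

        /* Combine the partial results on rank 0. */
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("total = %f\n", total);

        MPI_Finalize();
        return 0;
    }

Compiled with an MPI wrapper and launched through the queue system, each process computes its partial sum independently and the results are combined with a single reduction, so adding more nodes shortens the loop rather than repeating it.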

Expected User Behavior

Certain limitations may be enforced on users who are not cooperative. Job queues are detailed on the Running Jobs page. Users may obtain approval to run jobs outside the queue specifications by sending a request and justification to the ShaRCS Help email account.

Support Service Expectations

The User Support staff will respond within a reasonable time to questions and requests for assistance delivered through the proper channels. The only accepted channel for support requests at this time is the ShaRCS Help email account.