This job has expired

Research Computing HPC Cluster Administrator

University of Colorado Boulder
Colorado, United States
Salary Not Specified
Posted date
Nov 8, 2021

Position Type
Administrative, Business & Administrative Affairs, Computer Services & Information Technology, Technology Administration/Other
Employment Type
Full Time

Job Summary

This is an exciting opportunity to be part of the team starting up our newest supercomputer! The RC HPC Cluster Administrator will be primarily responsible for the planning, configuration, and maintenance of our HPC, HTC, and "condo" cluster environments, including daily monitoring and operations, problem investigation and resolution, and providing technical assistance to end-users and other members of the research computing team. This position also collaborates with storage administrators to support cluster-attached "scratch" storage resources.
The University of Colorado Boulder is committed to building a culturally diverse community of faculty, staff, and students dedicated to contributing to an inclusive campus environment. We are an Equal Opportunity employer, including veterans and individuals with disabilities.

Who We Are

CU Boulder Research Computing (RC), formed in 2010 and with a growing regional and national presence, is an exciting and fast-paced innovator and campus leader, as well as a provider of comprehensive cyberinfrastructure supporting the CU Boulder campus's research computing and data needs. RC supports fair and equitable access for a diverse set of stakeholders to innovative, large-scale computing and data resources, emphasizing open, stable, and secure access to data while maintaining required measures of compliance for our sponsors.

Research Computing is an important part of the Office of Information Technology:

  • OIT will be valued by campus as a strategic, inclusive, and innovative partner in advancing learning and discovery in order to enable CU Boulder to be a premier public university.

  • OIT enables campus priorities by providing high-value IT services and solutions.

  • Trust, as a foundation for how we engage with one another and with campus partners, along with
  • Avid curiosity in how to better support the campus and our stakeholders while fostering empowerment and authentic engagement among ourselves and
  • Celebrating inclusivity that promotes a sense of belonging while acknowledging that each person is unique and valued.

  • OIT will advance learning and discovery by delivering high-value reliable IT services and solutions that:
  • Provide a fluid and adaptable academic and student experience
  • Enable research competitiveness and
  • Deliver core infrastructure and enterprise IT services for business effectiveness.

Based on our departmental goals and our commitment to diversity and inclusive excellence, OIT particularly welcomes applications from candidates whose knowledge, skills, and abilities, and desire to contribute to an inclusive campus environment, will help us achieve our vision of a diverse and inclusive community.

What Your Key Responsibilities Will Be
  • Plan, propose, and implement new solutions in and improvements to the compute cluster environments. This may include cluster architecture, hardware repairs, operating system provisioning and configuration, system software updates, and procedure automation. Respond to end-user queries. Work with external suppliers during proposal, quoting, and deployment processes when expanding an existing system or deploying a new system.
  • Proactively monitor the compute cluster infrastructure health and performance using automated monitoring systems.
  • Test and tune compute cluster and associated storage systems to optimize performance and reliability.
  • Participate in associated storage administration and hardware maintenance, file system configuration, storage server updates, and access provisioning and control.
  • Maintain and/or create documentation in support of the research computing infrastructure for the benefit of the user community and members of the internal support team.
  • Coordinate with "Science Network" administrators.
What You Should Know
  • This position carries a general expectation to respond to critical issues and incidents that arise outside of normal business hours within a reasonable time frame, as established by the position’s supervisor. This expectation is in support of commitments RC has made that many of its services will have “best-effort” coverage outside of regular business hours.
  • This position has the possibility of being primarily remote, with visits to campus 3-4 times per month, or the incumbent may choose to be onsite as much as 100% of the time.
  • All University of Colorado Boulder employees are required to comply with the campus COVID-19 vaccine requirement. New employees must provide proof of vaccination or receive a medical or religious exemption within 30 days of employment.
What We Can Offer
  • Salary range for this position is $86,000 to $97,000.

The University of Colorado offers excellent benefits, including medical, dental, retirement, paid time off, tuition benefit and ECO Pass. The University of Colorado Boulder is one of the largest employers in Boulder County and offers an inspiring higher education environment. Learn more about the University of Colorado Boulder.

Be Statements
Be collaborative. Be game-changing. Be Boulder.

What We Require
  • Bachelor's degree in science, engineering, or a related field. A combination of education and relevant experience as described below may be substituted for a degree on a year-for-year basis.
  • Detailed knowledge of and 4 years of professional experience in a combination of the following:
    • Design, deployment, configuration, and administration of clustered Linux or Unix computer systems.
    • Evaluating, configuring, and maintaining systems software.
    • Monitoring and maintaining server hardware.

What You Will Need
  • Exceptional ability to work effectively both within a team and independently, as circumstances warrant.
  • Ability to follow through with assignments and commitments in a timely and professional manner.
  • Ability to work from a set of requirements to build complex computing systems.
  • Ability to develop and advocate independent solutions and system designs.
  • Experience in system and related network administration of complex computer systems, specifically Linux systems and preferably Linux clusters.
  • Experience diagnosing and repairing computer hardware.
  • Experience with batch queueing systems, preferably Slurm.
  • Familiarity with network interconnects (e.g., Intel Omni-Path Architecture, Mellanox InfiniBand, RoCE).
What We Would Like You To Have
  • Familiarity with stateless and/or diskless server provisioning.
  • Experience with parallel file systems (e.g. GPFS, Lustre, or BeeGFS).
  • Knowledge of or experience in networking systems and software, including TCP/IP, DNS, DHCP, PXE, LDAP, and NFS.
  • Familiarity with configuration management using tools such as Puppet, Ansible, Foreman, and Git.
  • Scripting experience (Bash and/or Python preferred).
  • Experience providing end-user support, particularly with ticket tracking systems.
Special Instructions

To apply, please submit the following materials:
  1. A current resume.
  2. A cover letter that specifically tells us how your background and experience align with the requirements, qualifications, and responsibilities of the position.

We may request references at a later time.

Please apply by December 3, 2021 for consideration.

Note: Application materials will not be accepted via email. For consideration, please apply through CU Boulder Jobs.

Posting Contact Information

Posting Contact Name: Boulder Campus Human Resources

Posting Contact Email:
